# Lab 7: Deep Learning for Multiclass Classification
## Multiclass Classification for the Covertype Dataset
This notebook walks through a step-by-step implementation of a neural network for multiclass classification on the Covertype dataset. The goal is to predict the forest cover type (one of seven classes) from cartographic input features.

In [12]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

## Step 1: Load and Preprocess the Data
We load the Covertype dataset, split it into training and test sets, standardize the features, and one-hot encode the target labels for multiclass classification.

- Classes: 7
- Samples total: 581,012
- Features: 54
- Feature type: int

| # | Name | Type | Measurement | Description |
|---|------|------|-------------|-------------|
| 1 | Elevation | quantitative | meters | Elevation in meters |
| 2 | Aspect | quantitative | azimuth | Aspect in degrees azimuth |
| 3 | Slope | quantitative | degrees | Slope in degrees |
| 4 | Horizontal_Distance_To_Hydrology | quantitative | meters | Horizontal distance to nearest surface water features |
| 5 | Vertical_Distance_To_Hydrology | quantitative | meters | Vertical distance to nearest surface water features |
| 6 | Horizontal_Distance_To_Roadways | quantitative | meters | Horizontal distance to nearest roadway |
| 7 | Hillshade_9am | quantitative | 0 to 255 index | Hillshade index at 9am, summer solstice |
| 8 | Hillshade_Noon | quantitative | 0 to 255 index | Hillshade index at noon, summer solstice |
| 9 | Hillshade_3pm | quantitative | 0 to 255 index | Hillshade index at 3pm, summer solstice |
| 10 | Horizontal_Distance_To_Fire_Points | quantitative | meters | Horizontal distance to nearest wildfire ignition points |
| 11 | Wilderness_Area (4 binary columns) | qualitative | 0 (absence) or 1 (presence) | Wilderness area designation |
| 12 | Soil_Type (40 binary columns) | qualitative | 0 (absence) or 1 (presence) | Soil type designation |
| 13 | Cover_Type (7 types) | integer | 1 to 7 | Forest cover type designation |

Target: `Cover_Type` (stored in the `target` column of the CSV loaded below)
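To make the column layout above concrete, here is a minimal sketch that rebuilds the list of 54 feature names and shows how the 40 one-hot `Soil_Type` columns can be collapsed back into a single soil code with `argmax`. The `Wilderness_Area1` to `Wilderness_Area4` names are an assumption (only `Soil_Type32` to `Soil_Type40` are visible in `df.head()` below), and the toy rows are hypothetical.

```python
import pandas as pd

# Column layout from the table above: 10 quantitative columns,
# 4 wilderness-area indicators and 40 soil-type indicators.
quantitative = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways", "Hillshade_9am", "Hillshade_Noon",
    "Hillshade_3pm", "Horizontal_Distance_To_Fire_Points",
]
wilderness = [f"Wilderness_Area{i}" for i in range(1, 5)]  # assumed column names
soil = [f"Soil_Type{i}" for i in range(1, 41)]              # Soil_Type1 .. Soil_Type40
feature_columns = quantitative + wilderness + soil
print(len(feature_columns))  # 54

# Hypothetical toy rows: exactly one soil indicator is set per row
toy = pd.DataFrame(0, index=range(3), columns=soil)
toy.loc[0, "Soil_Type29"] = 1
toy.loc[1, "Soil_Type12"] = 1
toy.loc[2, "Soil_Type40"] = 1

# Collapse the 40 indicator columns into one soil code in the range 1-40
soil_code = toy.to_numpy().argmax(axis=1) + 1
print(soil_code)  # [29 12 40]
```

The network in this lab consumes the one-hot form directly; the collapsed code is only useful for inspection or for models that prefer a single categorical column.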

In [13]:
# Load the Covertype dataset into a pandas DataFrame (df)
# and separate the features (X) from the labels (y)

# -------------------------------
# Your code here
CSV_PATH = "cover_dataset.csv"
df = pd.read_csv(CSV_PATH)
# -------------------------------

# Shift the labels from 1-7 to 0-6 so they can be used as zero-based class indices
# -------------------------------
# Your code here
X = df.drop(columns=['target'])  # the 54 feature columns
y = df['target']

y = y - 1

# -------------------------------
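The `y - 1` shift matters because `to_categorical(..., num_classes=7)` expects class indices 0 to 6; a raw label of 7 would fall outside the seven one-hot columns. A small sketch on hypothetical toy labels (the first five mirror the `target` values shown in `df.head()` below) illustrates the mapping, plus a class-frequency check that is worth running on the full `y`, since Covertype's classes are not evenly represented:

```python
import pandas as pd

# Hypothetical labels in the dataset's original 1-7 range
y_raw = pd.Series([5, 5, 2, 2, 5, 1, 7])

# Subtracting 1 maps Cover_Type 1-7 onto zero-based indices 0-6
y_shifted = y_raw - 1
print(y_shifted.tolist())  # [4, 4, 1, 1, 4, 0, 6]

# Relative frequency of each class index (run on the real y to see the imbalance)
print(y_shifted.value_counts(normalize=True).sort_index())
```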

In [14]:
# Show the first 5 rows of the dataset
df.head()

|  | Elevation | Aspect | Slope | Horizontal_Distance_To_Hydrology | Vertical_Distance_To_Hydrology | Horizontal_Distance_To_Roadways | Hillshade_9am | Hillshade_Noon | Hillshade_3pm | Horizontal_Distance_To_Fire_Points | ... | Soil_Type32 | Soil_Type33 | Soil_Type34 | Soil_Type35 | Soil_Type36 | Soil_Type37 | Soil_Type38 | Soil_Type39 | Soil_Type40 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.368684 | 0.141667 | 0.045455 | 0.184681 | 0.223514 | 0.071659 | 0.870079 | 0.913386 | 0.582677 | 0.875366 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 1 | 0.365683 | 0.155556 | 0.030303 | 0.151754 | 0.215762 | 0.054798 | 0.866142 | 0.925197 | 0.594488 | 0.867838 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
| 2 | 0.472736 | 0.386111 | 0.136364 | 0.19184 | 0.307494 | 0.446817 | 0.92126 | 0.937008 | 0.531496 | 0.853339 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 3 | 0.463232 | 0.430556 | 0.272727 | 0.173228 | 0.375969 | 0.434172 | 0.937008 | 0.937008 | 0.480315 | 0.865886 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 4 | 0.368184 | 0.125 | 0.030303 | 0.10952 | 0.222222 | 0.054939 | 0.866142 | 0.92126 | 0.590551 | 0.860449 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 |


## Split and Scale the Dataset
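We split the dataset into training (80%) and test (20%) sets, then standardize the features with scikit-learn's `StandardScaler`. The scaler is fitted on the training set only and then applied to the test set, so no test-set statistics leak into training.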

In [15]:
# Split into training and test sets using 80% training and 20% testing
# Set random_state to 42 for reproducibility
# Use train_test_split function from scikit-learn
# -------------------------------
# Your code here
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=42)
# -------------------------------

# Standardize the feature data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
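`StandardScaler` subtracts each column's training-set mean and divides by its training-set standard deviation (population form, `ddof=0`). A minimal sketch on hypothetical random data checks that the fitted scaler and the hand-written formula agree when applied to unseen rows:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train_toy = rng.random((100, 3))  # hypothetical training features
X_test_toy = rng.random((20, 3))    # hypothetical test features

scaler_toy = StandardScaler().fit(X_train_toy)

# Same transform by hand: statistics come from the training data only
mu = X_train_toy.mean(axis=0)
sigma = X_train_toy.std(axis=0)     # numpy's default ddof=0 (population std)
manual_test = (X_test_toy - mu) / sigma

print(np.allclose(scaler_toy.transform(X_test_toy), manual_test))  # True
```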


## Encode the Target Labels
We one-hot encode the target labels with the `to_categorical` function from `tensorflow.keras.utils`, turning each integer label 0-6 into a 7-element vector containing a single 1.

In [16]:
# One-hot encode the labels using to_categorical function from Keras
# -------------------------------
# Your code here
y_train = to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)
# -------------------------------
print(f"{'Data':<50} {'Label'}")
for i in range(10):
    print(f'{str(X_train[i]):<50} {y_train[i]}')


Data                                               Label
[ 1.17786285e+00 -1.19400849e+00  6.54796257e-01 -1.37987827e-01
  8.00187107e-01 -4.11852084e-01 -2.67395837e-01 -1.38183047e+00
 -5.36633187e-01  4.65720940e-01 -9.03292202e-01 -2.32806838e-01
  1.13853924e+00 -2.61005253e-01 -7.23442362e-02 -1.14539478e-01
 -9.17007702e-02 -1.47590004e-01 -5.26109663e-02 -1.06996288e-01
 -1.27036467e-02 -1.76650344e-02 -4.44363959e-02 -2.44188447e-01
 -1.47513892e-01 -2.33785999e-01 -1.75236284e-01 -3.21519794e-02
 -2.07433391e-03 -6.99307790e-02 -7.68357133e-02 -5.77497310e-02
 -8.30766865e-02 -1.27376360e-01 -3.80506028e-02 -2.47460780e-01
 -3.32544316e-01 -1.94220522e-01 -2.86419967e-02 -6.65741856e-02
 -4.30791298e-02 -4.06289194e-02 -4.97350313e-01 -2.33949729e-01
 -2.14827451e-01 -3.15216807e-01  3.44496063e+00 -5.24874638e-02
 -5.74300815e-02 -1.45957558e-02 -2.27290094e-02 -1.65864496e-01
 -1.56362447e-01 -1.23168489e-01] [1. 0. 0. 0. 0. 0. 0.]
[ 1.40493809e-02 -1.20294544e+00  5.21220
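Conceptually, `to_categorical` turns each integer label into the matching row of an identity matrix. This NumPy-only sketch (hypothetical labels, not the real `y_train`) reproduces that encoding and shows that `argmax` inverts it, which is how Step 7 recovers the true labels from `y_test`:

```python
import numpy as np

labels = np.array([4, 4, 1, 1, 4, 0, 6])  # hypothetical zero-based labels
num_classes = 7

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot[0])  # [0. 0. 0. 0. 1. 0. 0.]

# argmax along the class axis returns the original integer labels
print(one_hot.argmax(axis=1))  # [4 4 1 1 4 0 6]
```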

## Step 2: Build the Model
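We define a feed-forward network with three hidden `Dense` layers (256, 128 and 64 units, ReLU activation), a `Dropout` layer (rate 0.3) after each of the first two hidden layers to reduce overfitting, and a 7-unit output layer with softmax activation that produces one probability per class.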

In [19]:
# Define the model
# Complete the code to build a Sequential model
# The model should have 4 Dense layers with 256, 128, 64, and 7 units
# Use 'relu' activation function for the first 3 layers and 'softmax' for the last layer
# Add Dropout layers with 0.3 dropout rate after the first 2 Dense layers

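from tensorflow.keras import Input

model = Sequential([
    Input(shape=(54,)),  # 54 input features; preferred over passing input_shape to Dense
    Dense(256, activation='relu'),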
    Dropout(0.3),
    Dense(128, activation='relu'),
    Dropout(0.3),
    Dense(64, activation='relu'),
    Dense(7, activation='softmax')
])


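As a sanity check on the architecture, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit, and `Dropout` layers have none. Assuming 54 input features, the total below should match what `model.summary()` reports for this model:

```python
# Layer widths of the Sequential model: 54 inputs -> 256 -> 128 -> 64 -> 7
widths = [54, 256, 128, 64, 7]

# Dense parameters = inputs * units (weights) + units (biases)
params_per_layer = [n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:])]
print(params_per_layer)       # [14080, 32896, 8256, 455]
print(sum(params_per_layer))  # 55687
```

Most of the capacity sits in the 256-to-128 layer, so that is the layer to shrink first if the model needs to be made cheaper.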


## Step 3: Compile the Model
We compile the model using the Adam optimizer with a learning rate of 0.001, categorical crossentropy loss, and accuracy as the evaluation metric.

In [20]:
# Compile the model
# Use Adam optimizer with learning rate of 0.001

# -------------------------------
# Your code here
optimizer = Adam(learning_rate=0.001)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# -------------------------------
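The softmax output and the categorical cross-entropy loss work together: softmax turns the last layer's raw scores into probabilities, and the loss for one sample is the negative log-probability assigned to its true class. A NumPy sketch with hypothetical logits; the clipping constant `eps=1e-7` mirrors the small epsilon Keras uses to avoid `log(0)`, but this is an illustration, not the Keras implementation:

```python
import numpy as np

def softmax(logits):
    # Subtracting the row max leaves the result unchanged but avoids overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Per-sample loss: -sum_k y_k * log(p_k); clipping keeps log() finite
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_prob), axis=1)

logits = np.array([[2.0, 0.5, -1.0, 0.0, 0.0, 0.0, 1.0],
                   [0.1, 3.0, 0.2, -0.5, 0.0, 0.0, 0.0]])  # hypothetical scores
y_true = np.eye(7)[[0, 1]]  # one-hot targets for classes 0 and 1

probs = softmax(logits)
losses = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))  # each row sums to 1
print(losses.mean())      # the reported loss is the mean over the batch
```

With one-hot targets only the true-class term survives the sum, so the loss reduces to `-log(p_true)` for each sample.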

## Step 4: Train the Model
We train the model on the training data for 20 epochs with a batch size of 256, validating on the test set after each epoch.
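The step counts in the logs follow from the split sizes: the training log shows 1816 steps per epoch and the evaluation in Step 5 shows 3632 steps. A small arithmetic check, assuming scikit-learn rounds the 20% test share up (as recent versions do for a float `test_size`), the `batch_size=256` used in the training cell, and the default batch size of 32 for `model.evaluate`:

```python
import math

n_samples = 581012                   # total rows in Covertype
n_test = math.ceil(0.2 * n_samples)  # test share rounded up
n_train = n_samples - n_test

steps_per_epoch = math.ceil(n_train / 256)  # batch_size=256 in model.fit
eval_steps = math.ceil(n_test / 32)         # model.evaluate default batch_size=32

print(n_train, n_test)              # 464809 116203
print(steps_per_epoch, eval_steps)  # 1816 3632
```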

In [25]:
# Train the model
# Use 20 epochs and a batch size of 256
# Train on the training set and validate on the test set
# Save the training history to a variable(history)

# -------------------------------
# Your code here
history = model.fit(X_train, y_train, 
                    validation_data=(X_test, y_test), 
                    epochs=20, 
                    batch_size=256)
# -------------------------------

Epoch 1/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8614 - loss: 0.3379 - val_accuracy: 0.8968 - val_loss: 0.2587
Epoch 2/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8639 - loss: 0.3320 - val_accuracy: 0.8973 - val_loss: 0.2587
Epoch 3/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8634 - loss: 0.3327 - val_accuracy: 0.8984 - val_loss: 0.2544
Epoch 4/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8641 - loss: 0.3312 - val_accuracy: 0.8989 - val_loss: 0.2537
Epoch 5/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8653 - loss: 0.3285 - val_accuracy: 0.8997 - val_loss: 0.2515
Epoch 6/20
1816/1816 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.8653 - loss: 0.3290 - val_accuracy: 0.9004 - val_loss: 0.2531
Epoch 7/20
1

## Step 5: Evaluate the Model
We evaluate the model on the test set and print the test accuracy.

In [26]:
# Evaluate the model
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=2)
print(f"Test accuracy: {test_accuracy:.2f}")


3632/3632 - 4s - 1ms/step - accuracy: 0.9067 - loss: 0.2378
Test accuracy: 0.91
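With one-hot targets and `categorical_crossentropy`, the `'accuracy'` metric reported above behaves as categorical accuracy: the fraction of samples whose highest-probability class matches the true class. A toy NumPy sketch of that computation (hypothetical probabilities over 3 classes, not model output):

```python
import numpy as np

# Hypothetical softmax outputs for 4 samples and their one-hot targets
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
y_true = np.eye(3)[[0, 1, 2, 1]]

# A prediction counts as correct when its argmax equals the target's argmax
accuracy = np.mean(y_prob.argmax(axis=1) == y_true.argmax(axis=1))
print(accuracy)  # 0.75
```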


## Step 6: Make Predictions
We use the trained model to make predictions for the first 10 samples in the test set.

In [27]:
# Make predictions
# Use the first 10 test data points to make predictions(predictions)

# -------------------------------
# Your code here
predictions = model.predict(X_test[:10])
# -------------------------------

# Show the predicted probabilities
print('\n Predicted Probabilities:')
print(''.join(f"{'Class ' + str(k):<10}" for k in range(7)))
for pred_prob in predictions:
    print(''.join(f"{p:<10.3}" for p in pred_prob))

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step

 Predicted Probabilities:
Class 0   Class 1   Class 2   Class 3   Class 4   Class 5   Class 6   
0.928     0.000227  1.87e-12  1.34e-14  6.56e-12  1.09e-11  0.0714    
0.164     0.795     0.00236   2.15e-05  0.00696   0.00679   0.0244    
0.000326  0.924     0.0139    7.17e-07  0.0311    0.0305    6.4e-09   
0.065     0.935     2.46e-12  1.5e-15   1.88e-06  3.92e-09  7.85e-09  
0.00102   0.898     2.83e-06  7.69e-12  0.101     3.95e-06  0.000377  
3.02e-14  4.1e-06   1.0       1.11e-05  3.99e-10  0.00035   6.53e-21  
0.0713    0.929     5.88e-11  1.33e-21  8.73e-14  1.18e-13  3.21e-08  
0.98      0.0083    1.22e-08  2.24e-10  2.93e-05  8.48e-07  0.0118    
0.0889    0.911     5.93e-14  1.63e-20  4.73e-07  1.05e-12  3.55e-07  
0.00376   0.982     0.00045   8.18e-07  0.0137    0.000242  5.31e-08  
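The hand-written format string in the prediction cell can also be replaced by a pandas `DataFrame`, which labels the columns once and offers `idxmax` to read off the most likely class. A sketch using the first two probability rows copied from the output above:

```python
import numpy as np
import pandas as pd

# First two rows of the predicted probabilities printed above
predictions_sample = np.array([
    [0.928, 0.000227, 1.87e-12, 1.34e-14, 6.56e-12, 1.09e-11, 0.0714],
    [0.164, 0.795, 0.00236, 2.15e-05, 0.00696, 0.00679, 0.0244],
])

# One labeled column per class instead of seven hand-written format fields
table = pd.DataFrame(predictions_sample,
                     columns=[f"Class {k}" for k in range(predictions_sample.shape[1])])
print(table.round(3))
print(table.idxmax(axis=1).tolist())  # ['Class 0', 'Class 1']
```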


## Step 7: Interpret the Predictions
We convert the predicted probabilities into class labels with `np.argmax` and compare them with the true labels, which are recovered from the one-hot `y_test` in the same way.

In [28]:
# Interpret the predictions
# Use np.argmax() to get the predicted class labels from the predicted probabilities

# -------------------------------
# Your code here
predicted_labels = np.argmax(predictions, axis=1)
true_labels = np.argmax(y_test[:10], axis=1)
# -------------------------------


# Show the predicted and true labels
print("Predicted labels:", predicted_labels)
print("True labels:", true_labels)


Predicted labels: [0 1 1 1 1 2 1 0 1 1]
True labels: [0 1 1 1 1 2 1 0 1 1]
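The predicted indices are zero-based because of the `y - 1` shift in Step 1, so adding 1 recovers the original `Cover_Type` codes. A short sketch that maps them to the class names given in the UCI Covertype dataset description; the name dictionary is added here for illustration and is not part of the CSV:

```python
import numpy as np

# Cover type names from the UCI Covertype dataset description (codes 1-7)
COVER_TYPE_NAMES = {
    1: "Spruce/Fir", 2: "Lodgepole Pine", 3: "Ponderosa Pine",
    4: "Cottonwood/Willow", 5: "Aspen", 6: "Douglas-fir", 7: "Krummholz",
}

predicted_labels = np.array([0, 1, 1, 1, 1, 2, 1, 0, 1, 1])  # copied from the output above

# Undo the y - 1 shift to get back to the original 1-7 codes
cover_codes = predicted_labels + 1
print(cover_codes)  # [1 2 2 2 2 3 2 1 2 2]
print([COVER_TYPE_NAMES[int(c)] for c in cover_codes[:3]])
```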
