## C. Neural Network: Multi-Class Classification

Modify the previous architecture to handle a multi-class classification task. Test your architecture on the **Vehicle Silhouettes** Data Set ('Vehicles.csv'). Save your solution as a separate notebook file with an appropriate filename.

**Note:**
1. Perform the train/validate/test split as 70/15/15.
2. Use '777' as the random seed wherever one is needed.
3. Report appropriate measures in addition to accuracy and also plot the confusion matrix.

More details on the dataset can be found at: https://archive.ics.uci.edu/ml/datasets/Statlog+%28Vehicle+Silhouettes%29

In [1]:
# Package imports (only the packages this notebook actually uses)
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

%matplotlib inline

np.random.seed(777)






## 1. Loading the dataset and preprocessing

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Load the dataset
df = pd.read_csv('Vehicles.csv', header=None)

# Preprocessing
# The last column holds the class label; all other columns are features
X = df.iloc[:, :-1]
Y = df.iloc[:, -1]

# One-hot encode the labels (the dataset has 4 vehicle classes).
# .toarray() densifies the default sparse output and works across scikit-learn
# versions (the `sparse` argument was renamed to `sparse_output` in 1.2).
label_encoder = OneHotEncoder()
Y_encoded = label_encoder.fit_transform(Y.values.reshape(-1, 1)).toarray()

# Normalize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split the dataset into train, validation, and test sets
X_train, X_temp, Y_train, Y_temp = train_test_split(X_scaled, Y_encoded, test_size=0.3, random_state=777)
X_val, X_test, Y_val, Y_test = train_test_split(X_temp, Y_temp, test_size=0.5, random_state=777)

# Transpose so that each column is one example: shapes become (features, m) and (classes, m)
X_train, X_val, X_test = X_train.T, X_val.T, X_test.T
Y_train, Y_val, Y_test = Y_train.T, Y_val.T, Y_test.T

In [3]:
print(X_train.shape)
print(Y_train.shape)

(18, 592)
(4, 592)
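The printed shapes show 592 training examples; with a 70/15/15 split done in two stages, that corresponds to 846 rows in total (592/127/127). The sketch below reproduces those sizes on synthetic data. As a variation on the cell above, it fits the scaler on the training rows only, so validation and test statistics do not leak into preprocessing. The synthetic arrays and the `*_demo` names are assumptions for illustration, not the notebook's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 846 rows, 18 features, 4 classes (assumed sizes)
rng = np.random.default_rng(777)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(846, 18))
y_demo = rng.integers(0, 4, size=846)

# Stage 1: 70% train, 30% held out. Stage 2: split the held-out part in half.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X_demo, y_demo, test_size=0.3, random_state=777)
X_va, X_te, y_va, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=777)

# Fit the scaler on the training rows only, then reuse it for validation and test
scaler_demo = StandardScaler().fit(X_tr)
X_tr_s = scaler_demo.transform(X_tr)
X_va_s = scaler_demo.transform(X_va)
X_te_s = scaler_demo.transform(X_te)

split_sizes = (len(X_tr), len(X_va), len(X_te))
print(split_sizes)
```

Only the training columns are guaranteed to have zero mean and unit variance after this transform; the validation and test columns will be close but not exact, which is expected.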


In [4]:
def model_architecture(X, num_classes):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    num_classes -- total number of classes in the classification problem
    
    Returns:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer (equal to the number of classes)
    """
    ### START CODE HERE ### 
    n_x = X.shape[0] # size of input layer
    n_h = 10         # size of hidden layer (can be adjusted based on model complexity)
    n_y = 4          # size of output layer: the 4 vehicle classes (hard-coded here)
    ### END CODE HERE ###
    return (n_x, n_h, n_y)


In [5]:
def initialize_parameters(n_x, n_h, n_y):
    """
    Argument:
    n_x -- size of the input layer
    n_h -- size of the hidden layer
    n_y -- size of the output layer
    
    Returns:
    params -- python dictionary containing your parameters:
                    W1 -- weight matrix of shape (n_h, n_x)
                    b1 -- bias vector of shape (n_h, 1)
                    W2 -- weight matrix of shape (n_y, n_h)
                    b2 -- bias vector of shape (n_y, 1)
    """
    
    np.random.seed(2)

    
    ### START CODE HERE ### 
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    ### END CODE HERE ###
    
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters
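As a quick sanity check on the shapes listed in the docstring, the number of trainable parameters can be counted directly. This is a small sketch; `count_parameters` is a hypothetical helper, and the sizes used (18 inputs, 8 hidden units, 4 classes) are taken from the shapes and experiments in this notebook.

```python
def count_parameters(n_x, n_h, n_y):
    """Total entries in W1 (n_h x n_x), b1 (n_h), W2 (n_y x n_h) and b2 (n_y)."""
    return n_h * n_x + n_h + n_y * n_h + n_y

# 18 input features, 8 hidden units, 4 output classes
total_params = count_parameters(18, 8, 4)
print(total_params)
```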

In [6]:
def softmax(Z, axis=0):
    """
    Compute the softmax of matrix Z along the specified axis.

    Arguments:
    Z -- A numpy array of shape (n_classes, m)

    Returns:
    A -- Softmax of Z, same shape as Z
    """
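    # Subtract the max along `axis` before exponentiating for numerical stability;
    # the shift cancels in the ratio, so the softmax output is unchanged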
    e_Z = np.exp(Z - np.max(Z, axis=axis, keepdims=True))

    # Compute softmax values along the specified axis
    A = e_Z / np.sum(e_Z, axis=axis, keepdims=True)

    return A
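The max subtraction inside `softmax` matters for large logits. A minimal sketch comparing it with a naive version (the names `softmax_naive` and `softmax_stable` are introduced here for illustration only):

```python
import numpy as np

def softmax_naive(Z, axis=0):
    # Exponentiates the raw logits; overflows to inf for large values
    e_Z = np.exp(Z)
    return e_Z / np.sum(e_Z, axis=axis, keepdims=True)

def softmax_stable(Z, axis=0):
    # Same computation as the notebook's softmax: shift by the max first
    e_Z = np.exp(Z - np.max(Z, axis=axis, keepdims=True))
    return e_Z / np.sum(e_Z, axis=axis, keepdims=True)

Z_big = np.array([[1000.0], [1001.0], [1002.0]])  # one example, three classes

with np.errstate(over="ignore", invalid="ignore"):
    naive_out = softmax_naive(Z_big)   # inf / inf gives nan
stable_out = softmax_stable(Z_big)     # well-defined probabilities

print(naive_out.ravel())
print(stable_out.ravel())
```

The stable version gives the same probabilities as a softmax over `[-2, -1, 0]`, since only the differences between logits matter.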


In [7]:
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A2 -- The softmax output of the second activation
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2"
    """
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # Implement Forward Propagation to calculate A2 (probabilities)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)  # tanh is often used for hidden layers
    Z2 = np.dot(W2, A1) + b2
    A2 = softmax(Z2)  # Using softmax at the output layer
    
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    return A2, cache


In [8]:
def compute_cost(A2, Y):
    """
    Arguments:
    A2 -- The softmax output of the second activation, of shape (n_y, number of examples), where n_y is the number of classes
    Y -- "true" labels vector of shape (n_y, number of examples), one-hot encoded

    Returns:
    cost -- categorical cross-entropy cost
    """
    
    m = Y.shape[1] # number of examples

    # Compute the categorical cross-entropy cost
    logprobs = np.multiply(Y, np.log(A2))
    cost = - np.sum(logprobs) / m
    
    cost = float(np.squeeze(cost))  # ensures cost is the dimension we expect. 

    assert(isinstance(cost, float))
    
    return cost
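A worked example helps interpret the cost values: if the network assigns probability 0.25 to each of 4 classes, the categorical cross-entropy is ln 4 ≈ 1.3863, whatever the true labels are. That is roughly where an untrained network with near-zero weights should start. The sketch below checks this on made-up labels; `cross_entropy` mirrors the formula in `compute_cost`, and the demo arrays are assumptions.

```python
import numpy as np

def cross_entropy(A, Y):
    # Mean over examples of -sum_k Y_k * log(A_k); A and Y have shape (n_y, m)
    m = Y.shape[1]
    return float(-np.sum(Y * np.log(A)) / m)

# Five made-up examples with one-hot labels as columns, shape (4, 5)
Y_demo = np.eye(4)[:, [0, 2, 3, 1, 0]]

# A "know-nothing" prediction: probability 0.25 for every class
A_uniform = np.full((4, 5), 0.25)

uniform_cost = cross_entropy(A_uniform, Y_demo)
print(uniform_cost, np.log(4))
```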


In [9]:
def backprop(parameters, cache, X, Y):
    """
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2" and "A2".
    X -- input data
    Y -- "true" labels (one-hot encoded)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # Retrieve W1 and W2 from the dictionary "parameters".
    W1 = parameters["W1"]
    W2 = parameters["W2"]
        
    # Retrieve A1 and A2 from dictionary "cache".
    A1 = cache["A1"]
    A2 = cache["A2"]
    
    # Backward propagation: calculate dW1, db1, dW2, db2. 
    dZ2 = A2 - Y  # Simplification due to softmax and categorical cross-entropy
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))  # Assuming tanh activation in hidden layer
    dW1 = (1/m) * np.dot(dZ1, X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads
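One way to gain confidence in these gradient formulas is a finite-difference check. The sketch below re-implements the same forward and backward pass compactly and compares each analytic gradient entry against a central-difference estimate. It is a self-contained illustration on small random data; the function names and sizes are assumptions, not part of the assignment code.

```python
import numpy as np

def softmax(Z):
    e = np.exp(Z - Z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def cost_and_grads(params, X, Y):
    # Same network as above: tanh hidden layer, softmax output, cross-entropy cost
    W1, b1, W2, b2 = (params[k] for k in ("W1", "b1", "W2", "b2"))
    m = X.shape[1]
    A1 = np.tanh(W1 @ X + b1)
    A2 = softmax(W2 @ A1 + b2)
    cost = -np.sum(Y * np.log(A2)) / m
    dZ2 = A2 - Y
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    grads = {"W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
             "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}
    return cost, grads

def max_gradient_error(params, X, Y, eps=1e-6):
    # Largest absolute gap between analytic and central-difference gradients
    _, grads = cost_and_grads(params, X, Y)
    worst = 0.0
    for name, value in params.items():
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus, _ = cost_and_grads(params, X, Y)
            value[idx] = original - eps
            cost_minus, _ = cost_and_grads(params, X, Y)
            value[idx] = original  # restore the parameter
            numeric = (cost_plus - cost_minus) / (2 * eps)
            worst = max(worst, abs(numeric - grads[name][idx]))
    return worst

rng = np.random.default_rng(777)
X_demo = rng.normal(size=(5, 12))                    # 5 features, 12 examples
Y_demo = np.eye(3)[:, rng.integers(0, 3, size=12)]   # one-hot labels, shape (3, 12)
params_demo = {"W1": rng.normal(size=(4, 5)) * 0.5, "b1": np.zeros((4, 1)),
               "W2": rng.normal(size=(3, 4)) * 0.5, "b2": np.zeros((3, 1))}

gradient_error = max_gradient_error(params_demo, X_demo, Y_demo)
print(gradient_error)
```

A very small maximum error (several orders of magnitude below the gradient values themselves) suggests the `A2 - Y` simplification and the tanh derivative are implemented consistently.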


In [10]:
def update(parameters, grads, learning_rate = 0.01):
    """
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    learning_rate -- The learning rate
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### 
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    ### END CODE HERE ###
    
    # Retrieve each gradient from the dictionary "grads"
    ### START CODE HERE ### 
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    ## END CODE HERE ###
    
    # Update rule for each parameter
    ### START CODE HERE ### 
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [11]:
def NeuralNetwork(X, Y, n_h, num_iterations = 10000, learning_rate = 0.01, print_cost=False):
    """
    Arguments:
    X -- dataset
    Y -- labels 
    n_h -- size of the hidden layer
    num_iterations -- Number of iterations in gradient descent loop
    learning_rate -- The learning rate
    print_cost -- if True, print the cost every 100 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to make predictions.
    """
    
    np.random.seed(3)
    # Y is one-hot encoded with shape (n_y, m), so its row count is the number of classes
    n_x, _, n_y = model_architecture(X, Y.shape[0])
    
    # Initialize parameters
    ### START CODE HERE ### 
    parameters = initialize_parameters(n_x, n_h, n_y)
    ### END CODE HERE ###
    
    # Loop (gradient descent)

    for i in range(0, num_iterations):
         
        ### START CODE HERE ### 
        # Forward propagation. Inputs: "X, parameters". Outputs: "A2, cache".
        A2, cache = forward_propagation(X, parameters)
        
        # Cost function. Inputs: "A2, Y". Outputs: "cost".
        cost = compute_cost(A2, Y)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backprop(parameters, cache, X, Y)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters =  update(parameters, grads, learning_rate)
        
        ### END CODE HERE ###
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

In [12]:
def predict(parameters, X):
    """
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data 
    
    Returns:
    predictions -- vector of predictions of our model
    """
    
    # Computes probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)

    # Convert probabilities to class predictions
    predictions = np.argmax(A2, axis=0)  # axis=0 as we're considering each column as an example

    return predictions
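`predict` returns class indices (0 to 3), not the original label strings. If the labels were encoded with `OneHotEncoder`, the fitted encoder's `categories_` attribute can map indices back to names. A small sketch follows; the label strings below are assumed for illustration and may not match the values stored in 'Vehicles.csv'.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Assumed label strings, for illustration only
labels_demo = np.array(["van", "bus", "saab", "opel", "bus"]).reshape(-1, 1)

enc_demo = OneHotEncoder()
Y_onehot = enc_demo.fit_transform(labels_demo).toarray().T  # shape (4, 5), one column per example

# argmax over the class axis recovers the column index used by the encoder
class_index = np.argmax(Y_onehot, axis=0)

# categories_[0] lists the classes in encoder order (sorted alphabetically here)
decoded = enc_demo.categories_[0][class_index]
print(class_index, decoded)
```

The same indexing applies to the output of `predict`, since the network's output rows follow the encoder's column order.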


In [13]:
def run_experiment(n_h, num_iterations, learning_rate, X_train, Y_train, X_val, Y_val):
    parameters = NeuralNetwork(X_train, Y_train, n_h, num_iterations, learning_rate, print_cost=False)
    predictions_val = predict(parameters, X_val)
    accuracy = accuracy_score(np.argmax(Y_val, axis=0), predictions_val)
    return accuracy

# Initialize an empty list to store the results
results = []

# Experiment 1
acc1 = run_experiment(4, 10000, 0.01, X_train, Y_train, X_val, Y_val)
results.append(('Experiment 1', acc1))

# Experiment 2
acc2 = run_experiment(8, 15000, 0.01, X_train, Y_train, X_val, Y_val)
results.append(('Experiment 2', acc2))

# Experiment 3
acc3 = run_experiment(4, 10000, 0.005, X_train, Y_train, X_val, Y_val)
results.append(('Experiment 3', acc3))

# Experiment 4
acc4 = run_experiment(8, 15000, 0.005, X_train, Y_train, X_val, Y_val)
results.append(('Experiment 4', acc4))

# Print results
for exp_result in results:
    print(f"{exp_result[0]}: Accuracy = {exp_result[1]*100}%")


Experiment 1: Accuracy = 75.59055118110236%
Experiment 2: Accuracy = 81.10236220472441%
Experiment 3: Accuracy = 66.14173228346458%
Experiment 4: Accuracy = 77.16535433070865%
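The best configuration can also be selected programmatically rather than by eye. The sketch below uses the validation accuracies printed above; the dictionary layout, keyed by `(n_h, num_iterations, learning_rate)`, is an assumption made for this example.

```python
# Validation accuracies reported above, keyed by (n_h, num_iterations, learning_rate)
reported = {
    (4, 10000, 0.01): 0.7559055118110236,
    (8, 15000, 0.01): 0.8110236220472441,
    (4, 10000, 0.005): 0.6614173228346458,
    (8, 15000, 0.005): 0.7716535433070865,
}

# Pick the configuration with the highest validation accuracy
best_config = max(reported, key=reported.get)
best_accuracy = reported[best_config]
print(best_config, round(best_accuracy, 4))
```

Note that the four experiments change the hidden size and the iteration count together, so these numbers alone cannot separate the effect of one from the other.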


In [14]:
# Retrain with the hyperparameters of the best model above (Experiment 2)
parameters = NeuralNetwork(X_train, Y_train, 8, num_iterations=15000, learning_rate=0.01, print_cost=True)

Cost after iteration 0: 1.386517
Cost after iteration 100: 1.383564
Cost after iteration 200: 1.378010
Cost after iteration 300: 1.363870
Cost after iteration 400: 1.335744
Cost after iteration 500: 1.300696
Cost after iteration 600: 1.271036
Cost after iteration 700: 1.248848
Cost after iteration 800: 1.230650
Cost after iteration 900: 1.213228
Cost after iteration 1000: 1.194381
Cost after iteration 1100: 1.172599
Cost after iteration 1200: 1.147039
Cost after iteration 1300: 1.117445
Cost after iteration 1400: 1.084018
Cost after iteration 1500: 1.047318
Cost after iteration 1600: 1.008240
Cost after iteration 1700: 0.968110
Cost after iteration 1800: 0.928597
Cost after iteration 1900: 0.891234
Cost after iteration 2000: 0.856975
Cost after iteration 2100: 0.826145
Cost after iteration 2200: 0.798644
Cost after iteration 2300: 0.774168
Cost after iteration 2400: 0.752353
Cost after iteration 2500: 0.732840
Cost after iteration 2600: 0.715309
Cost after iteration 2700: 0.699479
...

## Check the accuracy on the training set

In [15]:
# Predict on the training set and compare shapes with the one-hot labels
predictions = predict(parameters, X_train)
print("predictions shape", predictions.shape)
print("Y train shape", Y_train.shape)

predictions shape (592,)
Y train shape (4, 592)


In [16]:
# Convert one-hot encoded labels to class labels
Y_train_labels = np.argmax(Y_train, axis=0)

# Now Y_train_labels should have the same shape as predictions
print("Converted Y_train shape:", Y_train_labels.shape)


Converted Y_train shape: (592,)


In [17]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
print("Training Error and other metrics: ")
# Calculate accuracy
accuracy = accuracy_score(Y_train_labels, predictions)
print(f"Accuracy: {accuracy}")

# Generate a classification report
report = classification_report(Y_train_labels, predictions)
print("Classification Report:\n", report)

# Generate a confusion matrix
conf_matrix = confusion_matrix(Y_train_labels, predictions)
print("Confusion Matrix:\n", conf_matrix)


Training Error and other metrics: 
Accuracy: 0.839527027027027
Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.97      0.96       154
           1       0.74      0.72      0.73       158
           2       0.71      0.72      0.71       148
           3       0.95      0.98      0.96       132

    accuracy                           0.84       592
   macro avg       0.84      0.84      0.84       592
weighted avg       0.84      0.84      0.84       592

Confusion Matrix:
 [[149   1   2   2]
 [  2 113  40   3]
 [  3  37 106   2]
 [  1   1   1 129]]


## Check the accuracy on the test set

In [18]:
predictions_test = predict(parameters, X_test)
Y_test_labels = np.argmax(Y_test, axis=0)
print("Test Error and other metrics: ")
# Calculate accuracy
accuracy = accuracy_score(Y_test_labels, predictions_test)
print(f"Accuracy: {accuracy}")

# Generate a classification report
report = classification_report(Y_test_labels, predictions_test)
print("Classification Report:\n", report)

# Generate a confusion matrix
conf_matrix = confusion_matrix(Y_test_labels, predictions_test)
print("Confusion Matrix:\n", conf_matrix)


Test Error and other metrics: 
Accuracy: 0.8267716535433071
Classification Report:
               precision    recall  f1-score   support

           0       0.91      0.97      0.94        33
           1       0.61      0.54      0.57        26
           2       0.72      0.74      0.73        31
           3       0.97      0.97      0.97        37

    accuracy                           0.83       127
   macro avg       0.80      0.81      0.80       127
weighted avg       0.82      0.83      0.82       127

Confusion Matrix:
 [[32  0  0  1]
 [ 3 14  9  0]
 [ 0  8 23  0]
 [ 0  1  0 36]]
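The task also asks for a plot of the confusion matrix. One possible way, sketched here with scikit-learn's `ConfusionMatrixDisplay`, is to plot the test-set matrix printed above. The matrix is copied in by hand so that the block is self-contained; in the notebook, `conf_matrix` could be passed instead. The figure title and colour map are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Test-set confusion matrix as printed above (rows: true class, columns: predicted class)
test_cm = np.array([[32, 0, 0, 1],
                    [3, 14, 9, 0],
                    [0, 8, 23, 0],
                    [0, 1, 0, 36]])

fig, ax = plt.subplots(figsize=(5, 4))
ConfusionMatrixDisplay(confusion_matrix=test_cm).plot(ax=ax, cmap="Blues")
ax.set_title("Test-set confusion matrix")

# The same matrix also yields per-class recall and overall accuracy
per_class_recall = test_cm.diagonal() / test_cm.sum(axis=1)
test_accuracy = test_cm.trace() / test_cm.sum()
print(per_class_recall.round(2), round(test_accuracy, 4))
```

Most of the errors sit in the off-diagonal cells between classes 1 and 2, which matches their lower precision and recall in the classification report.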


References:
- https://www.coursera.org/learn/neural-networks-deep-learning
- http://scs.ryerson.ca/~aharley/neural-networks/
- http://cs231n.github.io/neural-networks-case-study/
- https://archive.ics.uci.edu/ml/datasets.php