# Assignment1: Classification of Sonar dataset With one Hidden Layer

In this assignment, you will implement a neural network with one hidden layer from scratch using numpy operations to classify the UCI sonar dataset to Rock or Mine: https://www.kaggle.com/datasets/shrutimehta/nasa-asteroids-classification.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
import pandas as pd
from sklearn.model_selection import train_test_split

np.random.seed(10) 

## Load  and Prepare Dataset

In [2]:
from google.colab import files
uploaded = files.upload()

Saving sonar.all-data.csv to sonar.all-data.csv


Load the dataset into a dataframe and show the first few rows:

In [3]:
sonar_dataframe = pd.read_csv("sonar.all-data.csv", header=None)
sonar_dataframe.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
0,0.02,0.0371,0.0428,0.0207,0.0954,0.0986,0.1539,0.1601,0.3109,0.2111,...,0.0027,0.0065,0.0159,0.0072,0.0167,0.018,0.0084,0.009,0.0032,R
1,0.0453,0.0523,0.0843,0.0689,0.1183,0.2583,0.2156,0.3481,0.3337,0.2872,...,0.0084,0.0089,0.0048,0.0094,0.0191,0.014,0.0049,0.0052,0.0044,R
2,0.0262,0.0582,0.1099,0.1083,0.0974,0.228,0.2431,0.3771,0.5598,0.6194,...,0.0232,0.0166,0.0095,0.018,0.0244,0.0316,0.0164,0.0095,0.0078,R
3,0.01,0.0171,0.0623,0.0205,0.0205,0.0368,0.1098,0.1276,0.0598,0.1264,...,0.0121,0.0036,0.015,0.0085,0.0073,0.005,0.0044,0.004,0.0117,R
4,0.0762,0.0666,0.0481,0.0394,0.059,0.0649,0.1209,0.2467,0.3564,0.4459,...,0.0031,0.0054,0.0105,0.011,0.0015,0.0072,0.0048,0.0107,0.0094,R


How many rows and columns does this data set have:

In [4]:
print("Number of rows:", sonar_dataframe.shape[0])
print("Number of columns:", sonar_dataframe.shape[1])

Number of rows: 208
Number of columns: 61


Check the columns of the dataframe using info() function:

In [5]:
sonar_dataframe.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 208 entries, 0 to 207
Data columns (total 61 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       208 non-null    float64
 1   1       208 non-null    float64
 2   2       208 non-null    float64
 3   3       208 non-null    float64
 4   4       208 non-null    float64
 5   5       208 non-null    float64
 6   6       208 non-null    float64
 7   7       208 non-null    float64
 8   8       208 non-null    float64
 9   9       208 non-null    float64
 10  10      208 non-null    float64
 11  11      208 non-null    float64
 12  12      208 non-null    float64
 13  13      208 non-null    float64
 14  14      208 non-null    float64
 15  15      208 non-null    float64
 16  16      208 non-null    float64
 17  17      208 non-null    float64
 18  18      208 non-null    float64
 19  19      208 non-null    float64
 20  20      208 non-null    float64
 21  21      208 non-null    float64
 22  22

Convert the target column to 0 and 1:

In [6]:
sonar_dataframe[60]=sonar_dataframe[60].map({'R': 0,'M' :1 })

Convert the sonar_dataframe to numpy array using the values function:


In [7]:
sonar_np_array = sonar_dataframe.values

Split the dataset into  80\% train and 20\% validation usig the train_test_split command:

In [8]:
train, test = train_test_split(sonar_np_array,test_size=0.2)

split the last column as the label:

In [9]:
X_train = train[:,0:60].astype(float)
Y_train = train[:,60]

In [10]:
X_test = test[:,0:60].astype(float)
Y_test = test[:,60]

## Train a logistic Regression Model
Use sklearn to train a logistic regression model:

In [11]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Train a logistic regression model
logreg = LogisticRegression()
logreg.fit(X_train, Y_train)

# Make predictions on the test set
predictions = logreg.predict(X_test)


What is the accuracy of the model:

In [12]:
# Calculate the accuracy score
accuracy = accuracy_score(Y_test, predictions)
print(f"Accuracy: {accuracy}")

Accuracy: 0.7142857142857143


## Building a Neural Network Model
In this section, you will create an NN model with one hidden layer and a sigmoid function for the output layer. Use a tanh functiomn for the hidden layer activation. Use average cross entropy for the loss function.
Fill in the missing code wherever you see \#CODE HERE comment

In [13]:
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2) 
    
    W1 = np.random.randn(n_h,n_x) * 0.01
    b1 = np.zeros((n_h,1))
    W2 = np.random.randn(n_y,n_h) * 0.01
    b2 = np.zeros((n_y,1))
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

In [14]:
def sigmoid(z):
    return 1/(1 + np.exp(-z))

In [15]:
def forward_propagation(X, parameters):
    
    #Computes the forward propagation for a neural network with two layers.
    
    #Arguments:
    #X -- input dataset (number of features, number of examples)
    #parameters -- dictionary containing the weights and biases for the two layers (W1, b1, W2, b2)
    
    #Returns:
    #A2 -- the output of the second layer
    #cache -- dictionary containing the values of the intermediate calculations
        
    # Retrieve the parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # Compute the first layer (linear transformation + activation)
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    
    # Compute the second layer (linear transformation + activation)
    Z2 = np.dot(W2, A1) + b2
    A2 =  sigmoid(Z2) 
    
    # Store the intermediate calculations in a cache dictionary
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    
    # Return the output of the second layer and the cache dictionary
    return A2, cache

In [16]:
def cross_entropy(Y_hat, Y, parameters):
    m = Y.shape[1] # number of example
    logprobs = logprobs = np.multiply(Y ,np.log(Y_hat)) + np.multiply((1-Y), np.log(1-Y_hat))
    cost = (-1/m) * np.sum(logprobs)
    cost = float(np.squeeze(cost)) 
                                    
    return cost

In [17]:
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[0]

    # retrieve parameters from dictionary
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # retrieve cached activations and linear combinations
    A1 = cache["A1"]
    A2 = cache["A2"]
    Z1 = cache["Z1"]
    Z2 = cache["Z2"]
    
    # compute derivatives using chain rule
    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2,A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)

    # Calculate the derivative of the activation function of the hidden layer
    dA1 = np.dot(W2.T,dZ2)
    dZ1 = dA1 * (1 - np.power(np.tanh(Z1), 2))
    
    # Calculate the gradients for the first layer
    dW1 = (1/m) * np.dot(dZ1,X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)
    
    # store derivatives in a dictionary
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

In [18]:
def update_parameters(parameters, grads, learning_rate):
    
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
 
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    

    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2

    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    
    return parameters

Below is the main function which puts all the previous functions together to train the model

In [19]:
def train_nn_model(X, Y, num_of_hidden_units, learning_rate, num_iterations = 10000, print_cost=False):
    
    input_size=X_train.shape[1] 
    num_of_output_units=1
    parameters = initialize_parameters(input_size, num_of_hidden_units, num_of_output_units)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    
    # gradient descent Loop
    for i in range(0, num_iterations):
        A2, cache = forward_propagation(X, parameters)
        grads = backward_propagation(parameters, cache, X, Y)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 10000 == 0:
            cost = cross_entropy(A2, Y, parameters)
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

Predict is the scoring function to get predictions for new instances using the trained model: 

In [20]:
def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)    
    return predictions


Train the model using the train_nn_model defined above with 5 units in the hidden layer

In [21]:
print(X_train.shape) # prints (166, 5)
print(Y_train.shape) # prints (166,)


(166, 60)
(166,)


In [22]:
X = X_train.T
Y = Y_train.reshape(1, -1)
num_of_hidden_units = 5
learning_rate = 0.1


In [41]:
num_iterations = 100000
parameters = train_nn_model(X, Y, num_of_hidden_units, learning_rate, num_iterations, print_cost=True)

Cost after iteration 0: 0.693185
Cost after iteration 10000: 0.002334
Cost after iteration 20000: 0.000887
Cost after iteration 30000: 0.000527
Cost after iteration 40000: 0.000370
Cost after iteration 50000: 0.000283
Cost after iteration 60000: 0.000228
Cost after iteration 70000: 0.000190
Cost after iteration 80000: 0.000163
Cost after iteration 90000: 0.000142


Use the predict function to generate the output for the X_test data. What is the accuracy of the model?

In [42]:
predictions = predict(parameters, X_test.T)
accuracy = accuracy_score(Y_test, predictions.flatten())
print("Accuracy:", accuracy)

Accuracy: 0.8571428571428571


In [43]:
print ('Accuracy: %d' % float((np.dot(Y_test,predictions.T) + np.dot(1-Y_test,1-predictions.T))/float(Y_test.size)*100) + '%')

Accuracy: 85%


## Tunning the Size of Hidden Layer

Run the following code to see which size for the hodden layer gives you the best performance

In [44]:
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20, 30,50]
for i, num_hidden_units in enumerate(hidden_layer_sizes):
    parameters = train_nn_model(X_train.T, np.expand_dims(Y_train,axis=0),num_hidden_units ,0.01, num_iterations = 100000)
    predictions = predict(parameters, X_test.T)
    accuracy = float((np.dot(Y_test,predictions.T) + np.dot(1-Y_test,1-predictions.T))/float(Y_test.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(num_hidden_units, accuracy))

Accuracy for 1 hidden units: 76.19047619047619 %
Accuracy for 2 hidden units: 78.57142857142857 %
Accuracy for 3 hidden units: 78.57142857142857 %
Accuracy for 4 hidden units: 85.71428571428571 %
Accuracy for 5 hidden units: 85.71428571428571 %
Accuracy for 10 hidden units: 83.33333333333334 %
Accuracy for 20 hidden units: 83.33333333333334 %
Accuracy for 30 hidden units: 80.95238095238095 %
Accuracy for 50 hidden units: 80.95238095238095 %


<Figure size 1152x2304 with 0 Axes>

Which one was the best model?

* Accuracy for 1 hidden units: 76.19047619047619 %
* Accuracy for 2 hidden units: 78.57142857142857 %
* Accuracy for 3 hidden units: 78.57142857142857 %
* Accuracy for 4 hidden units: 85.71428571428571 %
* Accuracy for 5 hidden units: 85.71428571428571 %
* Accuracy for 10 hidden units: 83.33333333333334 %
* Accuracy for 20 hidden units: 83.33333333333334 %
* Accuracy for 30 hidden units: 80.95238095238095 %
* Accuracy for 50 hidden units: 80.95238095238095 %

With 100,000 iterations set for all, model with 4 hidden units seems to achieve the highest accuracy with lowest processing cost.