# **Dropout Regularization in Deep Neural Networks**

- This dataset describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

- Dataset information: https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Sonar,+Mines+vs.+Rocks) Download it from here: https://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data

In [27]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow import keras 

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [20]:
df = pd.read_csv("./sonar_dataset.csv", header=None) # The dataset has no header row, so header=None
df.sample(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,51,52,53,54,55,56,57,58,59,60
152,0.0131,0.0201,0.0045,0.0217,0.023,0.0481,0.0742,0.0333,0.1369,0.2079,...,0.0168,0.0086,0.0045,0.0062,0.0065,0.003,0.0066,0.0029,0.0053,M
6,0.0317,0.0956,0.1321,0.1408,0.1674,0.171,0.0731,0.1401,0.2083,0.3513,...,0.0201,0.0248,0.0131,0.007,0.0138,0.0092,0.0143,0.0036,0.0103,R
38,0.0123,0.0022,0.0196,0.0206,0.018,0.0492,0.0033,0.0398,0.0791,0.0475,...,0.0125,0.0134,0.0026,0.0038,0.0018,0.0113,0.0058,0.0047,0.0071,R
104,0.0307,0.0523,0.0653,0.0521,0.0611,0.0577,0.0665,0.0664,0.146,0.2792,...,0.0321,0.0189,0.0137,0.0277,0.0152,0.0052,0.0121,0.0124,0.0055,M
136,0.1088,0.1278,0.0926,0.1234,0.1276,0.1731,0.1948,0.4262,0.6828,0.5761,...,0.0455,0.0213,0.0082,0.0124,0.0167,0.0103,0.0205,0.0178,0.0187,M


In [21]:
print(df.shape) # Shape identification
print(df.isna().sum()) # Check if NANs
print(df.columns) # Print Column names 

(208, 61)
0     0
1     0
2     0
3     0
4     0
     ..
56    0
57    0
58    0
59    0
60    0
Length: 61, dtype: int64
Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
       36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
       54, 55, 56, 57, 58, 59, 60],
      dtype='int64')


In [22]:
df[60].value_counts() # Column 60 is the target: M = metal cylinder, R = rock

60
M    111
R     97
Name: count, dtype: int64

In [23]:
X = df.drop(60, axis = 'columns') # axis = 'columns' is equivalent to axis = 1
y = df[60]

# Encode the string labels R and M as integers (one dummy column, R)
y = pd.get_dummies(y, drop_first = True)
y = y.astype(int)
y.sample(5) # R = 1 and M = 0

Unnamed: 0,R
126,0
207,0
7,1
93,1
1,1
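
The `R = 1`, `M = 0` direction comes from `get_dummies` sorting the labels and `drop_first=True` discarding the first one (`M`). As a minimal sketch on a made-up four-row Series (the toy data and the names `labels`, `encoded`, `dummies` are illustrative, not part of the notebook), an explicit mapping gives the same result while stating the direction in the code:

```python
import pandas as pd

# Toy labels standing in for df[60]; illustrative values only.
labels = pd.Series(["M", "R", "R", "M"])

# Explicit mapping: the encoding direction is visible in the code.
encoded = labels.map({"M": 0, "R": 1})

# Same result via get_dummies: categories are sorted (M, R) and
# drop_first=True removes the M column, leaving a single R column.
dummies = pd.get_dummies(labels, drop_first=True).astype(int)

print(encoded.tolist())
print(dummies["R"].tolist())
```

Either form works here; the explicit map simply does not depend on the alphabetical order of the class names.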


In [24]:
y.value_counts()

R
0    111
1     97
Name: count, dtype: int64

In [25]:
# Split Data 
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 1)

In [26]:
X_train.shape, X_test.shape

((156, 60), (52, 60))

# Artificial Neural Network (ANN)

In [32]:
model = keras.Sequential([
    keras.layers.Dense(60, input_dim = 60, activation= 'relu'),
    keras.layers.Dense(30,  activation= 'relu'),
    keras.layers.Dense(15,  activation= 'relu'),
    keras.layers.Dense(1,activation= 'sigmoid') # 1 neuron because we want a binary classification problem. For the same reason the activation is sigmoid
])

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])


model.fit(X_train, y_train, epochs = 100, batch_size = 8)

Epoch 1/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.4732 - loss: 0.6913
Epoch 2/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6457 - loss: 0.6665
Epoch 3/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6074 - loss: 0.6568
Epoch 4/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6548 - loss: 0.6288
Epoch 5/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6503 - loss: 0.6078
Epoch 6/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6706 - loss: 0.5766
Epoch 7/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7252 - loss: 0.5536
Epoch 8/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7415 - loss: 0.5304
Epoch 9/100
20/20 ━━━━━━━━━

<keras.src.callbacks.history.History at 0x1807b9726d0>

In [33]:
model.evaluate(X_test, y_test) # Training accuracy can approach 1.0, but test accuracy is only about 73%

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7163 - loss: 1.1078


[0.9286364912986755, 0.7307692170143127]

In [36]:
y_pred = model.predict(X_test).reshape(-1)
print(y_pred[:10])

# The sigmoid output is a probability between 0 and 1; round it to the nearest integer (0 or 1) to get a class label
y_pred = np.round(y_pred)
print(y_pred[:10])

print(y_test[:10])

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step
[5.9991862e-07 9.7589672e-01 9.9790227e-01 4.4954359e-05 9.9999899e-01
 9.9999559e-01 8.6287651e-03 9.9999994e-01 3.1974705e-05 9.9999982e-01]
[0. 1. 1. 0. 1. 1. 0. 1. 0. 1.]
     R
186  0
155  0
165  0
200  0
58   1
34   1
151  0
18   1
202  0
62   1
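
Rounding the sigmoid outputs amounts to thresholding at 0.5. The sketch below applies an explicit threshold to the ten probabilities printed above; the names `probs`, `threshold`, and `labels` are illustrative. One detail worth knowing: `np.round` rounds exact halves to the nearest even value, so a probability of exactly 0.5 becomes 0, whereas a `>=` comparison would assign it to class 1.

```python
import numpy as np

# The first ten predicted probabilities, copied from the output above.
probs = np.array([5.9991862e-07, 9.7589672e-01, 9.9790227e-01, 4.4954359e-05,
                  9.9999899e-01, 9.9999559e-01, 8.6287651e-03, 9.9999994e-01,
                  3.1974705e-05, 9.9999982e-01])

threshold = 0.5  # illustrative choice; it can be tuned per application

# Explicit thresholding yields integer class labels directly.
labels = (probs >= threshold).astype(int)
print(labels)

# Edge case: np.round uses round-half-to-even, so exactly 0.5 rounds to 0.
edge = np.array([0.5])
print(np.round(edge), (edge >= threshold).astype(int))
```

An explicit threshold also makes it easy to trade precision against recall later by moving the cut-off away from 0.5.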


In [38]:
# Display classification metrics (model metrics)
from sklearn.metrics import confusion_matrix, classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.70      0.85      0.77        27
           1       0.79      0.60      0.68        25

    accuracy                           0.73        52
   macro avg       0.74      0.73      0.72        52
weighted avg       0.74      0.73      0.73        52
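
The f1-score in the report is the harmonic mean of precision and recall. As a small worked check, the sketch below recomputes it from the rounded precision and recall values printed above; the helper name `f1_score_from` is hypothetical, and because the inputs are already rounded to two decimals the result is only expected to agree with the report to two decimals.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Rounded values taken from the classification report above.
f1_class0 = f1_score_from(0.70, 0.85)  # class 0 (M)
f1_class1 = f1_score_from(0.79, 0.60)  # class 1 (R)

print(round(f1_class0, 2), round(f1_class1, 2))
```

Because it is a harmonic mean, the f1-score is pulled toward the smaller of the two inputs, which is why the low recall for class 1 drags its f1-score down.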



# Rebuilding the Model with Dropout Layers

In [39]:
model = keras.Sequential([
    keras.layers.Dense(60, input_dim = 60, activation= 'relu'),
    keras.layers.Dropout(0.5),  # Randomly zeroes 50% of the previous layer's outputs at each training step (inactive at inference)
    keras.layers.Dense(30,  activation= 'relu'),
    keras.layers.Dropout(0.5), 
    keras.layers.Dense(15,  activation= 'relu'),
    keras.layers.Dropout(0.5), 
    keras.layers.Dense(1,activation= 'sigmoid') # 1 neuron because we want a binary classification problem. For the same reason the activation is sigmoid
])

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])


model.fit(X_train, y_train, epochs = 100, batch_size = 8)

Epoch 1/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 3s 3ms/step - accuracy: 0.5834 - loss: 0.6598
Epoch 2/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.5942 - loss: 0.6852
Epoch 3/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4635 - loss: 0.7456
Epoch 4/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.4760 - loss: 0.7445
Epoch 5/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6492 - loss: 0.6577
Epoch 6/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5714 - loss: 0.7139
Epoch 7/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.5387 - loss: 0.7095
Epoch 8/100
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.5695 - loss: 0.6944
Epoch 9/100
20/20 ━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x1807b9a26d0>

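We can see that the training set accuracy is lower than for the model without dropout, which is expected because dropout randomly disables units while the model is being trained.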
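
To make the training-time behaviour concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant in which the surviving activations are scaled up by `1 / (1 - rate)` during training so that nothing needs rescaling at inference. The function `dropout_forward` and its arguments are hypothetical names for illustration; this is not the Keras implementation, only the same idea under simplified assumptions.

```python
import numpy as np

def dropout_forward(activations, rate, training, rng):
    """Inverted dropout: zero a random fraction `rate` of the units
    during training and rescale the rest; pass through at inference."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Boolean mask: True for units that survive this training step.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale so the expected value of each unit stays unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1000, 60))  # dummy activations for a 60-unit layer

train_out = dropout_forward(acts, rate=0.5, training=True, rng=rng)
infer_out = dropout_forward(acts, rate=0.5, training=False, rng=rng)

print("fraction zeroed during training:", (train_out == 0).mean())
print("mean activation during training:", train_out.mean())
print("unchanged at inference:", np.array_equal(infer_out, acts))
```

Since the accuracy reported during `fit` is computed with units being dropped, it tends to understate what the full network achieves once dropout is switched off for evaluation and prediction.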

In [40]:
model.evaluate(X_test, y_test)

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.7628 - loss: 0.4169


[0.38432276248931885, 0.7692307829856873]

In [41]:
y_pred = model.predict(X_test).reshape(-1)
print(y_pred[:10])
# The sigmoid output is a probability between 0 and 1; round it to the nearest integer (0 or 1) to get a class label
y_pred = np.round(y_pred)
print(y_pred[:10])
print(y_test[:10])

2/2 ━━━━━━━━━━━━━━━━━━━━ 0s 70ms/step
[1.9320838e-04 7.4747699e-01 7.6553971e-01 2.1801354e-02 9.9922186e-01
 8.7637490e-01 4.9905694e-01 9.9942636e-01 3.8489480e-02 9.9963582e-01]
[0. 1. 1. 0. 1. 1. 0. 1. 0. 1.]
     R
186  0
155  0
165  0
200  0
58   1
34   1
151  0
18   1
202  0
62   1


Although the training set accuracy is lower, the test set results are better than for the model without dropout layers: test accuracy rises from about 0.73 to about 0.77, and test loss falls from about 0.93 to about 0.38.
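
One way to read the difference in test loss is to look at a single misclassified sample. The second test row (index 155, true class 0) is predicted as class 1 by both models, but with probability 0.976 without dropout and 0.747 with dropout. The sketch below computes the per-sample binary cross-entropy for both; the helper `binary_cross_entropy` and its clipping constant `eps` are illustrative assumptions, not code from the notebook.

```python
import math

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Per-sample binary cross-entropy; p is the predicted P(class 1).
    Clipping with eps (an illustrative choice) avoids log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

y_true = 0                  # test row 155 is class 0 (M)
p_plain = 9.7589672e-01     # prediction of the model without dropout
p_dropout = 7.4747699e-01   # prediction of the model with dropout

loss_plain = binary_cross_entropy(y_true, p_plain)
loss_dropout = binary_cross_entropy(y_true, p_dropout)

print(f"without dropout: {loss_plain:.3f}")
print(f"with dropout:    {loss_dropout:.3f}")
```

Both predictions are wrong, yet the less confident one is penalized far less, so a model that makes fewer overconfident mistakes can show a much lower loss even when its accuracy improves only slightly.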

In [42]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.73      0.89      0.80        27
           1       0.84      0.64      0.73        25

    accuracy                           0.77        52
   macro avg       0.78      0.76      0.76        52
weighted avg       0.78      0.77      0.77        52



Comparing the two classification reports, the f1-score, which we consider the most important metric in this application, is higher for the model with dropout layers: it rises from 0.77 to 0.80 for class 0 (M) and from 0.68 to 0.73 for class 1 (R).