**Machine Learning Lab - CSE 432**

# 09 Neural Networks

**Neural networks** are a fascinating and complex area of machine learning. They are often compared to the human brain because of their ability to learn from data. A neural network consists of interconnected nodes, or neurons, that process information and make decisions. Training adjusts the network's weights and biases, which improves its accuracy over time. This makes neural networks powerful tools for tasks like speech and image recognition. As they evolve, they continue to push the boundaries of what machines can learn and accomplish.

In [None]:
import pandas as pd
import numpy as np

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
df = pd.read_csv("/content/drive/MyDrive/ML_Lab/diabetes (1).csv")
df.head()

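|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |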


In [None]:
df.shape

(768, 9)

Next, we divide the attributes into two parts: the features and the label. The features (X) are all the columns except **Outcome**. **Outcome** itself is the label (y).

In [None]:
X = df.drop(['Outcome'], axis=1)
y = df['Outcome']
X.head()

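|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 |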


In classification, the data is usually divided into a training set and a test set, and we will do the same here. Scikit-learn's train_test_split() function handles this. We will use 80% of the data for training and 20% for testing. The split is random, but fixing random_state=42 makes it reproducible. There are other ways to split a dataset, but we will use this percentage method.

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(614, 8) (154, 8) (614,) (154,)
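The two classes in this dataset are not equally frequent, so one alternative worth knowing is a stratified split. A stratified split keeps the class proportions the same in the training and test sets. The sketch below uses the stratify parameter of train_test_split on a small made-up label vector of 80 zeros and 20 ones. In this notebook, the equivalent would be adding stratify=y to the call above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([0] * 80 + [1] * 20)

# stratify=y_toy asks scikit-learn to preserve the 80/20 class ratio in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print(np.bincount(y_tr), np.bincount(y_te))  # [64 16] [16  4]
```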


In previous classes, we tried different classifiers such as decision trees, support vector machines (SVM), and naive Bayes. This time, we will use an **artificial neural network (ANN)**. An artificial neural network has an input layer, an output layer, and one or more hidden layers in between. We will build the ANN with TensorFlow and its Keras API.
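Here is what each layer computes. A fully connected (Dense) layer computes activation(x @ W + b) from its input x, a weight matrix W and a bias vector b. A multilayer perceptron simply chains such layers together. The NumPy sketch below runs one forward pass through a tiny network with made-up random weights: 8 inputs, a hidden ReLU layer of 4 neurons, and a single sigmoid output. It only illustrates the computation. It is not how Keras implements its layers internally.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, W, b, activation):
    # One fully connected layer: activation(x @ W + b)
    return activation(x @ W + b)

rng = np.random.default_rng(0)
# Hypothetical random weights: 8 input features -> 4 hidden neurons -> 1 output
W1, b1 = rng.normal(size=(8, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

x = rng.normal(size=(3, 8))            # a batch of 3 samples with 8 features each
hidden = dense(x, W1, b1, relu)        # shape (3, 4), all values >= 0
prob = dense(hidden, W2, b2, sigmoid)  # shape (3, 1), values between 0 and 1
print(prob.shape)  # (3, 1)
```

Keras learns W and b during training instead of drawing them at random.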

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

First, we need to create the model. For this task, we will use a simple multilayer perceptron. **Next time you change the model, start executing code from here.**

In [None]:
def create_model():
  # This line creates a blank sequential neural network
  model = Sequential()

  # The input layer expects 8 features, so shape=(8,)
  # Then we add a hidden layer with 2 neurons
  # 'relu' will be our activation function. Other options include 'sigmoid' (outputs between 0 and 1), 'tanh', and 'softmax' (multiclass outputs)
  model.add(Input(shape=(8,)))
  model.add(Dense(2, activation='relu'))

  # Add more layers if you want
  model.add(Dense(4, activation='relu'))
  model.add(Dense(16, activation='relu'))

  # The label is either 0 or 1, so a single output neuron is enough
  # The output layer uses sigmoid activation, so it outputs a probability between 0 and 1
  model.add(Dense(1, activation='sigmoid'))

  # The model now needs to be compiled
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  return model
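The model is compiled with binary cross-entropy, the usual loss for a single sigmoid output. For true labels y and predicted probabilities p, it averages -[y*log(p) + (1 - y)*log(1 - p)] over the samples. This penalizes confident wrong predictions heavily. Below is a small NumPy sketch with two made-up predictions. The clipping epsilon is an assumption added here to avoid log(0).

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    # Clip probabilities so that log(0) never occurs
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1.0, 0.0])
p_pred = np.array([0.9, 0.2])  # hypothetical model outputs
print(round(binary_crossentropy(y_true, p_pred), 4))  # 0.1643
```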

In [None]:
# Training step
model = create_model()
model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.2, verbose=1)

Epoch 1/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 3s 18ms/step - accuracy: 0.3411 - loss: 3.6497 - val_accuracy: 0.3577 - val_loss: 0.8523
Epoch 2/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.4527 - loss: 0.7324 - val_accuracy: 0.6098 - val_loss: 0.6969
Epoch 3/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6523 - loss: 0.6625 - val_accuracy: 0.6098 - val_loss: 0.6847
Epoch 4/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.6622 - loss: 0.6513 - val_accuracy: 0.6098 - val_loss: 0.6868
Epoch 5/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 1s 13ms/step - accuracy: 0.6757 - loss: 0.6412 - val_accuracy: 0.6098 - val_loss: 0.6851
Epoch 6/100
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - accuracy: 0.6507 - loss: 0.6591 - val_accuracy: 0.6098 - val_loss: 0.6870
Epoch 7/100
31/31 ...

<keras.src.callbacks.history.History at 0x7b0671b45040>

Our model is ready (yes, it's that easy). Now we need to check how accurate it is. First, we predict labels for the data in X_test by thresholding the predicted probabilities at 0.5. Then we compare those predictions with the true labels in y_test.

In [None]:
from sklearn.metrics import classification_report

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist() # Convert probabilities to binary predictions

print(classification_report(y_test, y_pred))

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step 
              precision    recall  f1-score   support

           0       0.64      1.00      0.78        99
           1       0.00      0.00      0.00        55

    accuracy                           0.64       154
   macro avg       0.32      0.50      0.39       154
weighted avg       0.41      0.64      0.50       154



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
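A constant predictor that always outputs the majority class is a useful reference point for any classifier. The sketch below scores such a predictor on labels with the same counts as this test set (99 zeros and 55 ones, from the support column). For real data, scikit-learn's DummyClassifier(strategy="most_frequent") provides the same baseline.

```python
import numpy as np

# Labels with the same class counts as the test set above (order does not matter here)
y_true = np.array([0] * 99 + [1] * 55)

# A constant "model" that always predicts the majority class 0
y_majority = np.zeros_like(y_true)

accuracy = (y_majority == y_true).mean()
recall_class_1 = ((y_majority == 1) & (y_true == 1)).sum() / (y_true == 1).sum()
print(round(accuracy, 2), recall_class_1)  # 0.64 0.0
```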


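The model predicted class 0 for every test sample. Class 1 has a recall of 0.00, and the 0.64 accuracy is just the share of class 0 in the test set (99 of 154 samples). Because no sample was predicted as class 1, its precision is undefined. Scikit-learn reports it as 0.00 and emits the warnings shown above. Now, let's try to improve the performance. We will start by binning the Age column.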

In [None]:
from sklearn.preprocessing import KBinsDiscretizer

# Initialize KBinsDiscretizer
discretizer = KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile')

# Fit and transform the 'Age' column in X_train
X_train['Age_bin'] = discretizer.fit_transform(X_train[['Age']])
X_train = X_train.drop('Age', axis=1)

# Transform the 'Age' column in X_test using the same bins
X_test['Age_bin'] = discretizer.transform(X_test[['Age']])
X_test = X_test.drop('Age', axis=1)

X_train.head()

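|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age_bin |
|---|---|---|---|---|---|---|---|---|
| 60 | 2 | 84 | 0 | 0 | 0 | 0.0 | 0.304 | 0.0 |
| 618 | 9 | 112 | 82 | 24 | 0 | 28.2 | 1.282 | 4.0 |
| 346 | 1 | 139 | 46 | 19 | 83 | 28.7 | 0.654 | 0.0 |
| 294 | 0 | 161 | 50 | 0 | 0 | 21.9 | 0.254 | 4.0 |
| 231 | 6 | 134 | 80 | 37 | 370 | 46.2 | 0.238 | 4.0 |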
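With strategy='quantile', KBinsDiscretizer places the bin edges at quantiles of the training data, so each of the 5 bins holds roughly the same number of training samples. The test set is then binned with those same training edges. The NumPy sketch below mimics the idea on ten made-up ages. It is a simplified illustration, and scikit-learn's exact edge computation may differ slightly between versions.

```python
import numpy as np

ages = np.array([21, 22, 25, 29, 33, 38, 41, 47, 52, 60])  # hypothetical ages

# 5 bins need 6 edges: the 0%, 20%, 40%, 60%, 80% and 100% quantiles
edges = np.quantile(ages, np.linspace(0, 1, 6))

# Assign each age to a bin using only the inner edges, giving ordinal codes 0..4
bins = np.searchsorted(edges[1:-1], ages, side="right")
print(bins)  # [0 0 1 1 2 2 3 3 4 4]
```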


In [None]:
model = create_model()
model.fit(X_train, y_train, epochs=200, batch_size=16, validation_split=0.2, verbose=1)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist()
print(classification_report(y_test, y_pred))

Epoch 1/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 5s 12ms/step - accuracy: 0.6210 - loss: 1.0035 - val_accuracy: 0.6260 - val_loss: 0.6789
Epoch 2/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6095 - loss: 0.7076 - val_accuracy: 0.6585 - val_loss: 0.6772
Epoch 3/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6653 - loss: 0.6730 - val_accuracy: 0.6585 - val_loss: 0.6688
Epoch 4/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6556 - loss: 0.6711 - val_accuracy: 0.6423 - val_loss: 0.6756
Epoch 5/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6461 - loss: 0.6632 - val_accuracy: 0.6341 - val_loss: 0.6727
Epoch 6/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6697 - loss: 0.6536 - val_accuracy: 0.6260 - val_loss: 0.6593
Epoch 7/200
31/31 ━━ ...

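Let's also try normalizing the features. MinMaxScaler rescales each column to the range [0, 1]. It is fitted on the training set only, and the same transformation is then applied to the test set.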

In [None]:
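# Normalize the data: rescale every feature to [0, 1]
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
feature_columns = X_train.columns

# Learn each column's min and max from the training set, then transform it
X_train = scaler.fit_transform(X_train)
X_train = pd.DataFrame(X_train, columns=feature_columns)

# Apply the same (training-set) min and max to the test set
X_test = scaler.transform(X_test)
X_test = pd.DataFrame(X_test, columns=feature_columns)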

X_train.head()

|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age_bin |
|---|---|---|---|---|---|---|---|---|
| 0 | 0.117647 | 0.422111 | 0.0 | 0.0 | 0.0 | 0.0 | 0.096499 | 0.0 |
| 1 | 0.529412 | 0.562814 | 0.672131 | 0.380952 | 0.0 | 0.420268 | 0.514091 | 1.0 |
| 2 | 0.058824 | 0.698492 | 0.377049 | 0.301587 | 0.098109 | 0.42772 | 0.245944 | 0.0 |
| 3 | 0.0 | 0.809045 | 0.409836 | 0.0 | 0.0 | 0.326379 | 0.075149 | 1.0 |
| 4 | 0.352941 | 0.673367 | 0.655738 | 0.587302 | 0.437352 | 0.688525 | 0.068318 | 1.0 |
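MinMaxScaler applies x' = (x - min) / (max - min) to each column, where min and max come from the training set. That is why the Age_bin codes 0 to 4 map to values between 0.0 and 1.0 above. Below is a small NumPy sketch of the same formula. The guard for constant columns is an assumption of this sketch; scikit-learn also avoids dividing by zero in that case.

```python
import numpy as np

def min_max_scale(train, test):
    # Column-wise min and max are learned from the training data only
    col_min = train.min(axis=0)
    col_max = train.max(axis=0)
    # Avoid division by zero for constant columns
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (train - col_min) / span, (test - col_min) / span

train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # e.g. Age_bin codes
test = np.array([[2.0], [5.0]])                        # 5 lies outside the training range

train_scaled, test_scaled = min_max_scale(train, test)
print(train_scaled.ravel().tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(test_scaled.ravel().tolist())   # [0.5, 1.25]
```

Test values outside the training range fall outside [0, 1]. That is expected, because the scaler only knows the training data.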


In [None]:
model = create_model()
model.fit(X_train, y_train, epochs=200, batch_size=16, validation_split=0.2, verbose=1)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist()
print(classification_report(y_test, y_pred))

Epoch 1/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 3s 12ms/step - accuracy: 0.6413 - loss: 0.6791 - val_accuracy: 0.6098 - val_loss: 0.6631
Epoch 2/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6374 - loss: 0.6458 - val_accuracy: 0.6098 - val_loss: 0.6580
Epoch 3/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6622 - loss: 0.6208 - val_accuracy: 0.6098 - val_loss: 0.6513
Epoch 4/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6869 - loss: 0.5816 - val_accuracy: 0.6098 - val_loss: 0.6356
Epoch 5/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6598 - loss: 0.5872 - val_accuracy: 0.6098 - val_loss: 0.6253
Epoch 6/200
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6596 - loss: 0.5965 - val_accuracy: 0.6098 - val_loss: 0.6231
Epoch 7/200
31/31 ━━ ...

5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step
              precision    recall  f1-score   support

           0       0.81      0.75      0.78        99
           1       0.60      0.69      0.64        55

    accuracy                           0.73       154
   macro avg       0.71      0.72      0.71       154
weighted avg       0.74      0.73      0.73       154



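Next, let's apply one-hot encoding to the binned age feature. Each age bin then becomes its own binary column instead of a single numeric value.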

In [None]:
import pandas as pd
# Perform one-hot encoding on 'Age_bin'
X_train = pd.get_dummies(X_train, columns=['Age_bin'], prefix='Age')
X_test = pd.get_dummies(X_test, columns=['Age_bin'], prefix='Age')

# Align columns in X_train and X_test to handle potential missing columns after one-hot encoding
X_train, X_test = X_train.align(X_test, join='outer', axis=1, fill_value=0)

X_train.head()


|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age_0.0 | Age_0.25 | Age_0.5 | Age_0.75 | Age_1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.117647 | 0.422111 | 0.0 | 0.0 | 0.0 | 0.0 | 0.096499 | True | False | False | False | False |
| 1 | 0.529412 | 0.562814 | 0.672131 | 0.380952 | 0.0 | 0.420268 | 0.514091 | False | False | False | False | True |
| 2 | 0.058824 | 0.698492 | 0.377049 | 0.301587 | 0.098109 | 0.42772 | 0.245944 | True | False | False | False | False |
| 3 | 0.0 | 0.809045 | 0.409836 | 0.0 | 0.0 | 0.326379 | 0.075149 | False | False | False | False | True |
| 4 | 0.352941 | 0.673367 | 0.655738 | 0.587302 | 0.437352 | 0.688525 | 0.068318 | False | False | False | False | True |


In [None]:
X_train.shape

(614, 12)
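The align step matters when a bin value appears in one split but not the other, because get_dummies only creates columns for the values it actually sees. Below is a small pandas sketch with made-up data, where the test set has no sample in the middle bin.

```python
import pandas as pd

train = pd.DataFrame({"Glucose": [0.2, 0.5, 0.9], "Age_bin": [0.0, 0.5, 1.0]})
test = pd.DataFrame({"Glucose": [0.4, 0.7], "Age_bin": [0.0, 1.0]})  # no 0.5 bin

train_enc = pd.get_dummies(train, columns=["Age_bin"], prefix="Age")
test_enc = pd.get_dummies(test, columns=["Age_bin"], prefix="Age")
print(sorted(test_enc.columns))  # ['Age_0.0', 'Age_1.0', 'Glucose']

# An outer join on the columns adds the missing Age_0.5 column to the test set, filled with 0
train_enc, test_enc = train_enc.align(test_enc, join="outer", axis=1, fill_value=0)
print(sorted(test_enc.columns))  # ['Age_0.0', 'Age_0.5', 'Age_1.0', 'Glucose']
```

Without the align step, the test set would have one column fewer than the training set, and the model's input shape would not match.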

One-hot encoding changes the number of features from 8 to 12: 7 numeric columns plus 5 age-bin columns. We therefore need a new ANN architecture whose input layer accepts 12 features.

In [None]:
def create_model_2():
  # This line creates a blank sequential neural network
  model = Sequential()

  # The input layer expects 12 features, so shape=(12,)
  # Then we add a hidden layer with 16 neurons
  # 'relu' will be our activation function. We could also use 'sigmoid', 'softmax', 'selu', etc.
  model.add(Input(shape=(12,)))
  model.add(Dense(16, activation='relu'))

  # Add more layers if you want
  model.add(Dense(16, activation='relu'))
  model.add(Dense(16, activation='relu'))

  # The label is either 0 or 1, so a single output neuron is enough
  # The output layer uses sigmoid activation, so it outputs a probability between 0 and 1
  model.add(Dense(1, activation='sigmoid'))

  # The model now needs to be compiled
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  return model

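We will also add early stopping. Training stops once the validation loss has not improved for 5 consecutive epochs (patience=5), and the weights from the best epoch are restored (restore_best_weights=True).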
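The stopping rule can be sketched in a few lines of plain Python. This is a simplified model, not Keras's source code; the real EarlyStopping callback has more options, such as min_delta, baseline and start_from_epoch. The sketch runs on a made-up sequence of validation losses.

```python
def early_stopping_epochs(val_losses, patience=5):
    """Return (epoch at which training stops, epoch with the best val_loss), 1-indexed."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Hypothetical validation losses: best at epoch 3, then 5 epochs without improvement
losses = [0.70, 0.65, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.60]
print(early_stopping_epochs(losses, patience=5))  # (8, 3)
```

With restore_best_weights=True, training would end after epoch 8 with the weights from epoch 3. The later 0.60 is never reached.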

In [None]:
from tensorflow.keras.callbacks import EarlyStopping

model = create_model_2()
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

model.fit(
    X_train, y_train,
    epochs=300,
    batch_size=16,
    validation_split=0.2,
    verbose=1,
    callbacks=[early_stopping]
)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist()
print(classification_report(y_test, y_pred))

Epoch 1/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 2s 11ms/step - accuracy: 0.5397 - loss: 0.6888 - val_accuracy: 0.6098 - val_loss: 0.6740
Epoch 2/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6555 - loss: 0.6608 - val_accuracy: 0.6098 - val_loss: 0.6588
Epoch 3/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6765 - loss: 0.6269 - val_accuracy: 0.6098 - val_loss: 0.6518
Epoch 4/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6370 - loss: 0.6217 - val_accuracy: 0.6098 - val_loss: 0.6433
Epoch 5/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6698 - loss: 0.5941 - val_accuracy: 0.6098 - val_loss: 0.6398
Epoch 6/300
31/31 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6799 - loss: 0.5785 - val_accuracy: 0.6504 - val_loss: 0.6314
Epoch 7/300
31/31 ━━ ...

Next, let's apply SMOTE (Synthetic Minority Over-sampling Technique) to the training set to correct the class imbalance.

In [None]:
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

y_train_resampled.value_counts()

| Outcome | count |
|---|---|
| 0 | 401 |
| 1 | 401 |
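The counts confirm that SMOTE added synthetic class-1 samples until both classes have 401 training samples. The core idea is interpolation. Each new sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. Below is a simplified NumPy sketch of that idea on four made-up 2-D points. It is not imbalanced-learn's implementation, and the function name smote_sketch is hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority-class neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbors

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))              # random minority sample
        neighbor = neighbors[base, rng.integers(k)]  # one of its k nearest neighbors
        gap = rng.random()                           # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[neighbor] - X_min[base])
    return synthetic

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_points = smote_sketch(X_min, n_new=6, k=2)
print(new_points.shape)  # (6, 2)
```

Every synthetic point lies between two real minority points, so here all of them fall inside the unit square.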


In [None]:
model = create_model_2()
model.fit(X_train_resampled, y_train_resampled, epochs=100, batch_size=16, validation_split=0.2, verbose=1)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist()
print(classification_report(y_test, y_pred))

Epoch 1/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 3s 15ms/step - accuracy: 0.6681 - loss: 0.6637 - val_accuracy: 0.0000e+00 - val_loss: 0.9392
Epoch 2/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6362 - loss: 0.6235 - val_accuracy: 0.0000e+00 - val_loss: 0.9235
Epoch 3/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6264 - loss: 0.6251 - val_accuracy: 0.1118 - val_loss: 0.9116
Epoch 4/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6775 - loss: 0.5915 - val_accuracy: 0.4099 - val_loss: 0.8773
Epoch 5/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7014 - loss: 0.5779 - val_accuracy: 0.4472 - val_loss: 0.8450
Epoch 6/100
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7107 - loss: 0.5709 - val_accuracy: 0.5466 - val_loss: 0.7681
Epoch 7/100
41/41 ...

Designing neural networks is both an art and a science. Create your own architecture and see if you can get above 85% accuracy. This time, we will also use Dropout layers to reduce overfitting.
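During training, a Dropout(0.3) layer randomly sets about 30% of its inputs to zero. It scales the remaining inputs by 1 / (1 - 0.3), so the expected activation stays the same. At prediction time it passes inputs through unchanged. This scheme is known as inverted dropout. Below is a NumPy sketch of that behavior. It is an illustration, not Keras's code.

```python
import numpy as np

def dropout(x, rate, training, rng):
    if not training or rate == 0.0:
        return x  # inference: Dropout does nothing
    keep = rng.random(x.shape) >= rate             # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)   # rescale the surviving units

rng = np.random.default_rng(0)
x = np.ones((1000, 32))
out = dropout(x, rate=0.3, training=True, rng=rng)
print((out == 0).mean(), out.mean())  # close to 0.3 and 1.0, respectively
```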

In [None]:
from tensorflow.keras.layers import Dropout

def create_model_2():
  # This line creates a blank sequential neural network
  model = Sequential()

  # The input layer expects 12 features, so shape=(12,)
  # Then we add a hidden layer with 32 neurons, followed by Dropout
  # 'relu' will be our activation function.
  model.add(Input(shape=(12,)))
  model.add(Dense(32, activation='relu'))
  model.add(Dropout(0.3)) # Added Dropout layer

  # Add more layers
  model.add(Dense(16, activation='relu'))
  model.add(Dropout(0.3)) # Added Dropout layer
  model.add(Dense(8, activation='relu'))
  model.add(Dropout(0.3)) # Added Dropout layer

  # The label is either 0 or 1, so a single output neuron is enough
  # The output layer uses sigmoid activation, so it outputs a probability between 0 and 1
  model.add(Dense(1, activation='sigmoid'))

  # The model now needs to be compiled
  model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
  return model

model = create_model_2()
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True
)

# Convert all columns, including the boolean one-hot columns, to float32 so Keras receives a purely numeric array
X_train_resampled = X_train_resampled.astype('float32')
X_test = X_test.astype('float32')


model.fit(
    X_train_resampled.to_numpy(), y_train_resampled,
    epochs=200,
    batch_size=16,
    validation_split=0.2,
    verbose=1,
    callbacks=[early_stopping]
)

y_pred_prob = model.predict(X_test)
y_pred = (y_pred_prob > 0.5).astype(int).flatten().tolist()

print(classification_report(y_test, y_pred))

Epoch 1/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - accuracy: 0.5641 - loss: 0.6868 - val_accuracy: 0.0000e+00 - val_loss: 0.7596
Epoch 2/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6043 - loss: 0.6715 - val_accuracy: 0.0000e+00 - val_loss: 0.8212
Epoch 3/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6416 - loss: 0.6510 - val_accuracy: 0.0000e+00 - val_loss: 0.8635
Epoch 4/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6201 - loss: 0.6447 - val_accuracy: 0.0000e+00 - val_loss: 0.8813
Epoch 5/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.6360 - loss: 0.6539 - val_accuracy: 0.0000e+00 - val_loss: 0.8763
Epoch 6/200
41/41 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.6129 - loss: 0.6548 - val_accuracy: 0.0000e+00 - val_loss: 0.9105
5/5 ...

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
