# Activity 1.2: Training Neural Networks


#### Objective(s):

This activity aims to demonstrate how to train neural networks using Keras.

#### Intended Learning Outcomes (ILOs):
* Demonstrate how to build and train neural networks 
* Demonstrate how to evaluate and plot the model using training and validation loss


#### Resources:
* Jupyter Notebook

UCI Pima Diabetes Dataset

* pima-indians-diabetes.csv


#### Procedures

Load the necessary libraries 

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_auc_score, roc_curve, accuracy_score
from sklearn.ensemble import RandomForestClassifier

import seaborn as sns

%matplotlib inline

In [None]:
## Import Keras objects for Deep Learning

from keras.models import Sequential
from keras.layers import Input, Dense, Flatten, Dropout, BatchNormalization
from keras.optimizers import Adam, SGD, RMSprop

Load the dataset

In [None]:

filepath = "pima-indians-diabetes.csv"
names = ["times_pregnant", "glucose_tolerance_test", "blood_pressure", "skin_thickness", "insulin", 
         "bmi", "pedigree_function", "age", "has_diabetes"]
diabetes_df = pd.read_csv(filepath, names=names)

Check 5 randomly sampled rows of the data

In [None]:

print(diabetes_df.shape)
diabetes_df.sample(5)

In [None]:
diabetes_df.dtypes

In [None]:
X = diabetes_df.iloc[:, :-1].values
y = diabetes_df["has_diabetes"].values

Split the data into train and test sets (75%, 25%)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=11111)

In [None]:
np.mean(y), np.mean(1-y)

Build a neural network with a single hidden layer of 12 nodes. 
Use the Sequential model and set the input shape to 8. 



Normalize the data

In [None]:
normalizer = StandardScaler()
X_train_norm = normalizer.fit_transform(X_train)
X_test_norm = normalizer.transform(X_test)
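
As a rough illustration of what the scaler does, the sketch below standardizes a small made-up array with plain NumPy. The mean and standard deviation come from the training rows only and are then reused on the test rows. The arrays `toy_train` and `toy_test` are invented for this example and are not part of the dataset.

```python
import numpy as np

# Made-up data for illustration only (2 features)
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
toy_test = np.array([[2.5, 25.0], [5.0, 50.0]])

# "fit": statistics come from the training rows only
mu = toy_train.mean(axis=0)
sigma = toy_train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

# "transform": the same statistics are applied to both splits
toy_train_norm = (toy_train - mu) / sigma
toy_test_norm = (toy_test - mu) / sigma

print(toy_train_norm.mean(axis=0))  # approximately 0 for each feature
print(toy_train_norm.std(axis=0))   # approximately 1 for each feature
print(toy_test_norm)                # scaled with the training statistics
```

Fitting on the training split alone keeps information about the test rows out of the preprocessing step, which is why the cell above calls `fit_transform` on `X_train` but only `transform` on `X_test`.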

Define the model:
* Input size is 8-dimensional
* 1 hidden layer, 12 hidden nodes, ReLU activation
* Final layer with one node and sigmoid activation (standard for binary classification)

In [None]:

model = Sequential([
    Dense(12, input_shape=(8,), activation="relu"),
    Dense(1, activation="sigmoid")
])

View the model summary 

In [None]:

model.summary()
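
The parameter counts that the summary reports for a network of this shape can be checked by hand. The helper `dense_param_count` below is a hypothetical name introduced for this sketch. It assumes each fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases for one fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_param_count(8, 12)   # 8 inputs -> 12 hidden nodes
output = dense_param_count(12, 1)   # 12 hidden nodes -> 1 output node
total = hidden + output

print(hidden, output, total)  # 108 13 121
```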

Train the model 
* Compile the model with optimizer, loss function and metrics
* Use the fit function to return the run history. 


In [None]:

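# `lr` is deprecated in recent Keras releases; use `learning_rate` instead
model.compile(optimizer=SGD(learning_rate=0.003), loss="binary_crossentropy", metrics=["accuracy"])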
run_hist_1 = model.fit(X_train_norm, y_train, validation_data=(X_test_norm, y_test), epochs=200)


In [None]:
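## Like we did for the Random Forest, we generate two kinds of predictions
#  One is a hard decision, the other is a probabilistic score.

# predict() returns the sigmoid output, i.e. the estimated probability of class 1
y_pred_prob_nn_1 = model.predict(X_test_norm)
# Threshold the probabilities at 0.5 to get hard 0/1 class labels
y_pred_class_nn_1 = (y_pred_prob_nn_1 > 0.5).astype(int)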

In [None]:
y_pred_class_nn_1[:10]

In [None]:
y_pred_prob_nn_1[:10]

Create the plot_roc function

In [None]:
def plot_roc(y_test, y_pred, model_name):
    fpr, tpr, thr = roc_curve(y_test, y_pred)
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.plot(fpr, tpr, 'k-')
    ax.plot([0, 1], [0, 1], 'k--', linewidth=.5)  # roc curve for random model
    ax.grid(True)
    ax.set(title='ROC Curve for {} on PIMA diabetes problem'.format(model_name),
           xlim=[-0.01, 1.01], ylim=[-0.01, 1.01])



Evaluate the model performance and plot the ROC CURVE

In [None]:

print('accuracy is {:.3f}'.format(accuracy_score(y_test,np.round(y_pred_class_nn_1))))
print('roc-auc is {:.3f}'.format(roc_auc_score(y_test,y_pred_prob_nn_1)))

plot_roc(y_test, y_pred_prob_nn_1, 'NN')
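
One way to read the ROC-AUC value printed above is as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly on four invented points. `pairwise_auc`, `toy_labels`, and `toy_scores` are hypothetical names used only for this illustration, and the double loop is meant for small inputs.

```python
def pairwise_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented labels and scores for illustration only
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

print(pairwise_auc(toy_labels, toy_scores))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly here, so the result is 0.75. A value of 0.5 would correspond to the dashed diagonal in the plot.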

Plot the training loss and the validation loss over the different epochs and see how it looks

In [None]:
run_hist_1.history.keys()

In [None]:
fig, ax = plt.subplots()
ax.plot(run_hist_1.history["loss"],'r', marker='.', label="Train Loss")
ax.plot(run_hist_1.history["val_loss"],'b', marker='.', label="Validation Loss")
ax.legend()

What is your interpretation of the training and validation loss results?

The training and validation loss both decrease steadily over time, indicating that the model is learning meaningful patterns from the data. The validation loss remains slightly higher than the training loss, which suggests mild overfitting, but the gap is small and stable, indicating that the model still generalizes reasonably well to unseen data.
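
A reading like this can be backed with a couple of numbers taken from the history dictionary. The sketch below is one possible helper, with `summarize_history` and `toy_history` as hypothetical names. The loss values are invented for illustration and are not results from this notebook.

```python
def summarize_history(history):
    """Return the best validation epoch (1-based), its loss, and the final train/val gap."""
    val = history["val_loss"]
    train = history["loss"]
    best_idx = min(range(len(val)), key=val.__getitem__)
    return {
        "best_epoch": best_idx + 1,
        "best_val_loss": val[best_idx],
        "final_gap": val[-1] - train[-1],
    }

# Invented numbers for illustration; not results from this notebook
toy_history = {
    "loss": [0.70, 0.60, 0.52, 0.47, 0.43],
    "val_loss": [0.72, 0.63, 0.56, 0.55, 0.57],
}

summary = summarize_history(toy_history)
print(summary)
```

In the notebook, the same helper could be called as `summarize_history(run_hist_1.history)`, since the plot above reads the same `loss` and `val_loss` keys. A validation minimum well before the last epoch, or a gap that keeps widening, would point to overfitting more strongly than a small, stable gap.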

#### Supplementary Activity

* Build a model with two hidden layers, each with 6 nodes
* Use the "relu" activation function for the hidden layers, and "sigmoid" for the final layer
* Use a learning rate of .003 and train for 1500 epochs
* Graph the trajectory of the loss functions, accuracy on both train and test set
* Plot the roc curve for the predictions
* Use different learning rates, numbers of epochs, and network structures. 
* Plot the training and validation loss for the different learning rates, numbers of epochs, and network structures
* Interpret your result

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# =============================
# Utilities
# =============================
def sigmoid(x):
    return 1/(1+np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    return (x > 0).astype(float)

def logloss(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1-eps)
    return -np.mean(y_true*np.log(y_pred) + (1-y_true)*np.log(1-y_pred))

def accuracy(y_true, y_pred):
    return np.mean((y_pred >= 0.5) == (y_true == 1))

def train_test_split(X, y, test_size=0.25, seed=1241):
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    rng.shuffle(idx)
    cut = int(len(X) * (1 - test_size))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], X[te], y[tr], y[te]

# =============================
# ROC (manual)
# =============================
def roc_curve_manual(y_true, y_score, num_thresh=200):
    thresholds = np.linspace(1, 0, num_thresh)
    tpr_list, fpr_list = [], []
    P = np.sum(y_true == 1)
    N = np.sum(y_true == 0)
    for t in thresholds:
        y_hat = (y_score >= t)
        TP = np.sum((y_hat == 1) & (y_true == 1))
        FP = np.sum((y_hat == 1) & (y_true == 0))
        tpr = TP / P if P else 0.0
        fpr = FP / N if N else 0.0
        tpr_list.append(tpr)
        fpr_list.append(fpr)
    return np.array(fpr_list), np.array(tpr_list)

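def auc_trapz(x, y):
    # Stable sort keeps the original order of points that share the same FPR
    order = np.argsort(x, kind="stable")
    # np.trapz was renamed to np.trapezoid in NumPy 2.0; support both
    integrate = getattr(np, "trapezoid", None) or np.trapz
    return integrate(y[order], x[order])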

# =============================
# 2-hidden-layer MLP
# =============================
def init_params(input_dim, h1, h2, seed=1241):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, np.sqrt(2/input_dim), size=(input_dim, h1))
    b1 = np.zeros((1, h1))
    W2 = rng.normal(0, np.sqrt(2/h1), size=(h1, h2))
    b2 = np.zeros((1, h2))
    W3 = rng.normal(0, np.sqrt(1/h2), size=(h2, 1))
    b3 = np.zeros((1, 1))
    return W1, b1, W2, b2, W3, b3

def forward(X, W1, b1, W2, b2, W3, b3):
    Z1 = X @ W1 + b1
    A1 = relu(Z1)
    Z2 = A1 @ W2 + b2
    A2 = relu(Z2)
    Z3 = A2 @ W3 + b3
    Yhat = sigmoid(Z3)
    cache = (X, Z1, A1, Z2, A2, Z3, Yhat)
    return Yhat, cache

def backward(y, cache, W2, W3):
    X, Z1, A1, Z2, A2, Z3, Yhat = cache
    N = X.shape[0]
    y = y.reshape(-1,1)

    dZ3 = (Yhat - y) / N
    dW3 = A2.T @ dZ3
    db3 = np.sum(dZ3, axis=0, keepdims=True)

    dA2 = dZ3 @ W3.T
    dZ2 = dA2 * relu_grad(Z2)
    dW2 = A1.T @ dZ2
    db2 = np.sum(dZ2, axis=0, keepdims=True)

    dA1 = dZ2 @ W2.T
    dZ1 = dA1 * relu_grad(Z1)
    dW1 = X.T @ dZ1
    db1 = np.sum(dZ1, axis=0, keepdims=True)

    return dW1, db1, dW2, db2, dW3, db3

def fit_mlp(X_train, y_train, X_test, y_test,
            h1=6, h2=6, lr=0.003, epochs=1500,
            seed=1241, print_every=300):

    W1,b1,W2,b2,W3,b3 = init_params(X_train.shape[1], h1, h2, seed)

    tr_loss, te_loss = [], []
    tr_acc, te_acc = [], []

    for ep in range(1, epochs+1):
        yhat_tr, cache = forward(X_train, W1,b1,W2,b2,W3,b3)

        yhat_tr_flat = yhat_tr.reshape(-1)
        L_tr = logloss(y_train, yhat_tr_flat)
        A_tr = accuracy(y_train, yhat_tr_flat)

        dW1,db1,dW2,db2,dW3,db3 = backward(y_train, cache, W2, W3)

        W1 -= lr*dW1; b1 -= lr*db1
        W2 -= lr*dW2; b2 -= lr*db2
        W3 -= lr*dW3; b3 -= lr*db3

        yhat_te, _ = forward(X_test, W1,b1,W2,b2,W3,b3)
        yhat_te_flat = yhat_te.reshape(-1)

        L_te = logloss(y_test, yhat_te_flat)
        A_te = accuracy(y_test, yhat_te_flat)

        tr_loss.append(L_tr); te_loss.append(L_te)
        tr_acc.append(A_tr); te_acc.append(A_te)

        if ep % print_every == 0 or ep == 1:
            print(f"epoch {ep} | train loss {L_tr:.4f} acc {A_tr:.4f} | "
                  f"val loss {L_te:.4f} acc {A_te:.4f}")

    return (W1,b1,W2,b2,W3,b3), {
        "train_loss": np.array(tr_loss),
        "val_loss": np.array(te_loss),
        "train_acc": np.array(tr_acc),
        "val_acc": np.array(te_acc),
    }

def predict_proba(X, params):
    W1,b1,W2,b2,W3,b3 = params
    yhat, _ = forward(X, W1,b1,W2,b2,W3,b3)
    return yhat.reshape(-1)

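# =============================
# DATA (reuses diabetes_df loaded earlier in the notebook)
# =============================
X = diabetes_df.iloc[:, :-1].values.astype(float)
y_vec = diabetes_df["has_diabetes"].values.reshape(-1).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y_vec)

# Standardize with training-set statistics only, as in the Keras section
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma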

# =============================
# BASE MODEL
# =============================
params, hist = fit_mlp(
    X_train, y_train, X_test, y_test,
    h1=6, h2=6, lr=0.003, epochs=1500
)

# Loss plot
plt.figure(figsize=(10,4))
plt.plot(hist["train_loss"], label="Train Loss")
plt.plot(hist["val_loss"], label="Validation Loss")
plt.legend(); plt.grid(); plt.title("Loss"); plt.show()

# Accuracy plot
plt.figure(figsize=(10,4))
plt.plot(hist["train_acc"], label="Train Accuracy")
plt.plot(hist["val_acc"], label="Validation Accuracy")
plt.legend(); plt.grid(); plt.title("Accuracy"); plt.show()

# =============================
# ROC
# =============================
y_score = predict_proba(X_test, params)
fpr, tpr = roc_curve_manual(y_test, y_score)
auc = auc_trapz(fpr, tpr)

plt.figure(figsize=(6,6))
plt.plot(fpr, tpr, label=f"AUC={auc:.3f}")
plt.plot([0,1],[0,1],'--')
plt.legend(); plt.grid()
plt.xlabel("FPR"); plt.ylabel("TPR")
plt.title("ROC Curve"); plt.show()

# =============================
# EXPERIMENTS
# =============================
experiments = [
    (0.001, 1500, 6, 6),
    (0.003, 1500, 6, 6),
    (0.01, 1500, 6, 6),
    (0.003, 1500, 4, 4),
    (0.003, 1500, 8, 8),
    (0.003, 800, 6, 6)
]

plt.figure(figsize=(12,6))
for lr, ep, h1, h2 in experiments:
    _, h = fit_mlp(X_train, y_train, X_test, y_test,
                   h1=h1, h2=h2, lr=lr, epochs=ep,
                   print_every=999999)
    plt.plot(h["val_loss"], label=f"lr={lr}, ({h1},{h2}), ep={ep}")

plt.legend(); plt.grid()
plt.title("Validation Loss Comparison")
plt.xlabel("Epoch")
plt.show()


#### Conclusion

This activity demonstrated how to build and train neural networks and how to evaluate model performance using training and validation loss plots. However, the Keras-based model could not be fully executed on my home desktop due to persistent environment and dependency issues, and I was unable to debug and resolve the problem despite external assistance.