Use the Network class from Lecture 3 to train a simple MLP for binary classification on the
Hepatitis dataset. Tune the architecture and learning parameters to improve performance.
Dataset:
The dataset contains medical records of hepatitis patients. Each record consists of 19 input
features (such as age, bilirubin level, etc.) and a binary class label:
1 = Die
2 = Live
The data comes from http://archive.ics.uci.edu/ml/datasets/Hepatitis.
Number of Instances: 155
Number of Attributes: 20 (including the class attribute)
Use one-hot encoding for the output labels:
Die → [1, 0]
Live → [0, 1]
This lets the network use sigmoid activations and represent classification outputs as
vectors.
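
As a minimal sketch of the encoding described above, the snippet below maps the raw class codes (1 = Die, 2 = Live) to one-hot vectors with NumPy and decodes them again. The `labels` array is made-up illustration data, not taken from the dataset.

```python
import numpy as np

# Raw class labels as coded in the dataset: 1 = Die, 2 = Live (illustrative values).
labels = np.array([1, 2, 2, 1])

# Shift to zero-based indices (Die -> 0, Live -> 1), then select rows of the identity matrix.
one_hot = np.eye(2)[labels - 1]
print(one_hot)

# Recover the class index from a one-hot vector (or from a two-element network output).
decoded = np.argmax(one_hot, axis=1)
print(decoded)
```

The same `np.argmax` step is what turns the two output activations of the network into a predicted class.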

In [1]:
import numpy as np
import pandas as pd
import random
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score
from sklearn.impute import SimpleImputer
from network import Network
from network import sigmoid, sigmoid_prime
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report, confusion_matrix



1. Load the dataset:

In [2]:
data = pd.read_csv('hepatitis_data.csv')
SEED = 62
np.random.seed(SEED)
random.seed(SEED)


2. Preprocess the data:
o Replace every missing value with a single value: the mean
of that attribute.
o Normalize the input features (recommended).
o Split the dataset into training and test sets (e.g., 80%/20%).

In [3]:
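# Separate the input features from the class column
X = data.drop(columns=['Die_Live']).values
y = data['Die_Live'].values

# Replace missing values with the mean of each attribute
imputer = SimpleImputer(strategy='mean')
X = imputer.fit_transform(X)

# Scale every feature to the range [0, 1]
scaler = MinMaxScaler()
X = scaler.fit_transform(X)

# One-hot encode the class labels
encoder = OneHotEncoder(sparse_output=False)
y_encoded = encoder.fit_transform(y.reshape(-1, 1))

# Train/test split (80%/20%), stratified so both sets keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, test_size=0.2, stratify=y, random_state=SEED)

# Oversample the minority class in the training set only; SMOTE takes integer labels
smote = SMOTE(random_state=SEED)
X_train, y_train = smote.fit_resample(X_train, np.argmax(y_train, axis=1))
y_train = np.eye(2)[y_train]  # back to one-hot

# Column vectors for the network: one-hot targets for training, class indices for testing
train_data = [(x.reshape(-1, 1), t.reshape(-1, 1)) for x, t in zip(X_train, y_train)]
test_data  = [(x.reshape(-1, 1), int(np.argmax(t))) for x, t in zip(X_test, y_test)]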
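
The cell above fits the imputer and the scaler on the full dataset before splitting, so the test rows influence the means and the min/max values. A possible variant, sketched here with plain NumPy on synthetic data (the array `X` below is a made-up stand-in, not the hepatitis data), estimates both from the training rows only and reuses them for the test rows. With scikit-learn the same effect comes from calling `fit_transform` on the training split and `transform` on the test split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a feature matrix: 10 rows, 3 columns, two missing values.
X = rng.uniform(0.0, 10.0, size=(10, 3))
X[1, 0] = np.nan
X[8, 2] = np.nan

# Split first (here simply the first 8 rows vs. the last 2).
X_train, X_test = X[:8].copy(), X[8:].copy()

# Column means are estimated on the training rows only ...
col_means = np.nanmean(X_train, axis=0)
# ... and reused to fill the gaps in both parts.
X_train = np.where(np.isnan(X_train), col_means, X_train)
X_test = np.where(np.isnan(X_test), col_means, X_test)

# Min-max parameters also come from the training rows only.
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
X_train_scaled = (X_train - col_min) / span
X_test_scaled = (X_test - col_min) / span
```

With this ordering, scaled test values can fall slightly outside [0, 1], which is expected because the test rows were not used to set the range.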

3. Initial Network architecture:
o Input layer: 19 neurons
o Hidden layer 1: 30 neurons
o Hidden layer 2: 15 neurons
o Output layer: 2 neurons
o Activation function: Sigmoid for all layers

In [4]:
net = Network([19,30,15,2])
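
The Lecture 3 `Network` class is not shown here, so the following is only a sketch under an assumed layout: one weight matrix of shape (n_out, n_in) and one bias column per layer after the input, with sigmoid activations throughout. The helper names `sigmoid` and `feedforward` are local to this example. It shows how many trainable parameters the [19, 30, 15, 2] architecture has and what shape the output takes.

```python
import numpy as np

sizes = [19, 30, 15, 2]

# Assumed layout: one (n_out, n_in) weight matrix and one (n_out, 1) bias per non-input layer.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in sizes[1:]]

n_params = sum(w.size for w in weights) + sum(b.size for b in biases)
print(n_params)  # 1097 = (19*30 + 30*15 + 15*2) + (30 + 15 + 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a):
    """Propagate a (19, 1) column vector through all layers with sigmoid activations."""
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

out = feedforward(np.zeros((19, 1)))
print(out.shape)  # (2, 1)
```

With roughly 1,100 parameters and 155 records in total, even this small network has more weights than training samples, which is worth keeping in mind when comparing larger architectures.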

4. Training:
o Use the Network class implemented in Lecture 3.
o Apply Stochastic Gradient Descent (SGD) to train the model.
o Use a suitable learning rate and mini-batch size.
o Train for at least 20 epochs.

In [5]:
# This run uses 10 epochs; the task asks for at least 20, which the comparison further down covers
net.SGD(training_data=train_data,
        epochs=10,
        mini_batch_size=5,
        eta=0.3,
        test_data=test_data)
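
How `Network.SGD` handles mini-batches internally is not shown in this notebook. A common approach, sketched below as an assumption rather than as the lecture's implementation, is to shuffle the training data at the start of each epoch and cut it into consecutive slices of `mini_batch_size`, performing one gradient step per slice. The function name `make_mini_batches` is hypothetical.

```python
import random

def make_mini_batches(training_data, mini_batch_size, rng):
    """Shuffle a copy of the data and cut it into consecutive mini-batches."""
    shuffled = list(training_data)
    rng.shuffle(shuffled)
    return [shuffled[k:k + mini_batch_size]
            for k in range(0, len(shuffled), mini_batch_size)]

# 12 placeholder samples, mini-batches of 5 -> batch sizes 5, 5, 2.
samples = list(range(12))
batches = make_mini_batches(samples, 5, random.Random(62))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [5, 5, 2]
```

Under this scheme a smaller batch size means more weight updates per epoch, which is one reason batch size and learning rate interact.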

5. Evaluation:
o Report classification accuracy on the test set.
o Show a few example predictions.
o Compare results for different hyperparameters (number of hidden layers, neurons per layer,
learning rate, epochs and batch size).

In [6]:
# Predicted class = index of the larger of the two output activations
predictions = [np.argmax(net.feedforward(x)) for (x, _) in test_data]
y_true = [y for (_, y) in test_data]

acc = accuracy_score(y_true, predictions)
print(f"\nTest Accuracy: {acc:.2%}\n")

print("\nConfusion Matrix:")
print(confusion_matrix(y_true, predictions))

print("\nClassification Report:")
print(classification_report(y_true, predictions, target_names=["Die","Live"]))

# Show the first five test samples with their true and predicted labels
label_map = {0: "Die", 1: "Live"}
for i in range(5):
    print(f"Sample {i+1}:")
    print(f"  True Label     : {label_map[y_true[i]]} ({y_true[i]})")
    print(f"  Predicted Label: {label_map[predictions[i]]} ({predictions[i]})")
    print("-" * 40)
    
    


Test Accuracy: 90.32%


Confusion Matrix:
[[ 5  1]
 [ 2 23]]

Classification Report:
              precision    recall  f1-score   support

         Die       0.71      0.83      0.77         6
        Live       0.96      0.92      0.94        25

    accuracy                           0.90        31
   macro avg       0.84      0.88      0.85        31
weighted avg       0.91      0.90      0.91        31

Sample 1:
  True Label     : Live (1)
  Predicted Label: Live (1)
----------------------------------------
Sample 2:
  True Label     : Live (1)
  Predicted Label: Live (1)
----------------------------------------
Sample 3:
  True Label     : Die (0)
  Predicted Label: Die (0)
----------------------------------------
Sample 4:
  True Label     : Live (1)
  Predicted Label: Live (1)
----------------------------------------
Sample 5:
  True Label     : Live (1)
  Predicted Label: Live (1)
----------------------------------------
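
The figures in the classification report follow directly from the confusion matrix printed above. As a worked check, the sketch below recomputes accuracy, precision, recall, and F1 from that matrix, using scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix from the run above: rows = true class, columns = predicted class.
cm = np.array([[5, 1],
               [2, 23]])

accuracy = np.trace(cm) / cm.sum()
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all truly in that class
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(["Die", "Live"], precision, recall, f1):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.4f}")
```

For example, precision for "Die" is 5 / (5 + 2) and recall is 5 / (5 + 1). With only 6 "Die" samples in the test set, a single misclassification changes that recall by about 17 percentage points.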


Trying various options

In [7]:
# Hyperparameter options
epoch_options = [10, 20, 50]
batch_options = [5, 10, 20]
eta_options   = [0.5, 0.1, 0.05]
architectures = [
    [19, 10, 2],
    [19, 20, 2],
    [19, 30, 15, 2],
    [19, 50, 25, 2],
    [19, 64, 32, 16, 2]
]


# Store results
results = []
for arch in architectures:
    for epochs in epoch_options:
        for batch in batch_options:
            for eta in eta_options:
                print(f"\nTraining with arch={arch}, epochs={epochs}, batch={batch}, eta={eta}")

                # New network for each run
                net = Network(arch)  

                # Train
                net.SGD(train_data, epochs=epochs, mini_batch_size=batch, eta=eta, test_data=test_data)

                # Evaluate after training
                predictions = [np.argmax(net.feedforward(x)) for (x, _) in test_data]
                y_true = [y for (_, y) in test_data]

                acc = accuracy_score(y_true, predictions)
                print(f"Test Accuracy: {acc:.2%}")

                # Save results
                results.append((arch, epochs, batch, eta, acc))

print("\nSummary of results:")
for (arch, epochs, batch, eta, acc) in results:
    print(f"arch={arch}, epochs={epochs}, batch={batch}, eta={eta} → acc={acc:.2%}")
    
best = max(results, key=lambda x: x[-1])  

best_arch, best_epochs, best_batch, best_eta, best_acc = best

print("\nBest Result:")
print(f"  Architecture : {best_arch}")
print(f"  Epochs       : {best_epochs}")
print(f"  Batch Size   : {best_batch}")
print(f"  Learning Rate: {best_eta}")
print(f"  Accuracy     : {best_acc:.2%}")


Training with arch=[19, 10, 2], epochs=10, batch=5, eta=0.5
Test Accuracy: 90.32%

Training with arch=[19, 10, 2], epochs=10, batch=5, eta=0.1
Test Accuracy: 80.65%

Training with arch=[19, 10, 2], epochs=10, batch=5, eta=0.05
Test Accuracy: 67.74%

Training with arch=[19, 10, 2], epochs=10, batch=10, eta=0.5
Test Accuracy: 77.42%

Training with arch=[19, 10, 2], epochs=10, batch=10, eta=0.1
Test Accuracy: 61.29%

Training with arch=[19, 10, 2], epochs=10, batch=10, eta=0.05
Test Accuracy: 77.42%

Training with arch=[19, 10, 2], epochs=10, batch=20, eta=0.5
Test Accuracy: 83.87%

Training with arch=[19, 10, 2], epochs=10, batch=20, eta=0.1
Test Accuracy: 19.35%

Training with arch=[19, 10, 2], epochs=10, batch=20, eta=0.05
Test Accuracy: 38.71%

Training with arch=[19, 10, 2], epochs=20, batch=5, eta=0.5
Test Accuracy: 90.32%

Training with arch=[19, 10, 2], epochs=20, batch=5, eta=0.1
Test Accuracy: 80.65%

Training with arch=[19, 10, 2], epochs=20, batch=5, eta=0.05
Test Accuracy: 9

Findings:
Best performance: epochs = 50, batch = 5, eta = 0.5, with 93.55% test accuracy.
Many configurations clustered around 80-90%.
Some of the lowest scores (e.g. 22.58%) occurred when the learning rate was small (0.05),
especially when the number of epochs was also low.
The model is sensitive to the learning rate (eta) and the batch size.
Too low a learning rate gives poor results.
Too many epochs can hurt because the model overfits.
For this dataset, a small batch size combined with a high learning rate works best;
larger batch sizes with smaller learning rates performed worse.
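
One way to check these observations against the logged runs is to average the accuracy per hyperparameter value. The sketch below does this for the nine runs with arch=[19, 10, 2] and epochs=10, with accuracies copied from the training log above; the helper `mean_accuracy_by` is a hypothetical name introduced for this example.

```python
from collections import defaultdict

# (batch, eta, accuracy) for arch=[19, 10, 2], epochs=10, copied from the log above.
runs = [
    (5, 0.5, 0.9032), (5, 0.1, 0.8065), (5, 0.05, 0.6774),
    (10, 0.5, 0.7742), (10, 0.1, 0.6129), (10, 0.05, 0.7742),
    (20, 0.5, 0.8387), (20, 0.1, 0.1935), (20, 0.05, 0.3871),
]

def mean_accuracy_by(index):
    """Average accuracy grouped by the tuple field at `index` (0 = batch, 1 = eta)."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[index]].append(run[2])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

by_eta = mean_accuracy_by(1)
by_batch = mean_accuracy_by(0)
print(by_eta)
print(by_batch)
```

In this subset, eta = 0.5 and batch = 5 have the highest average accuracy. Each configuration was run only once and the test set has 31 samples, so one sample is worth about 3.2 percentage points and individual differences should be read with caution.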


Debugging

In [8]:
print("\n--- Training Data (first 5 samples) ---")
for i in range(5):
    print("X_train:", X_train[i])
    print("y_train (one-hot):", y_train[i])
    print("---")

print("\n--- Test Data (first 5 samples) ---")
for i in range(5):
    print("X_test:", X_test[i])
    print("y_test (int):", np.argmax(y_test[i]))   
    print("---")


--- Training Data (first 5 samples) ---
X_train: [0.43661972 0.         0.         0.         1.         1.
 1.         0.         0.         1.         1.         1.
 1.         0.05194805 0.16356877 0.02208202 0.48837209 0.62
 0.        ]
y_train (one-hot): [0. 1.]
---
X_train: [0.76056338 0.         0.         1.         0.         1.
 1.         0.         0.         1.         1.         1.
 1.         0.12987013 0.19330855 0.01735016 0.39534884 1.
 0.        ]
y_train (one-hot): [0. 1.]
---
X_train: [0.63380282 0.         0.         0.         1.         1.
 1.         1.         1.         1.         1.         1.
 1.         0.05194805 0.18215613 0.06466877 0.44186047 0.21
 0.        ]
y_train (one-hot): [0. 1.]
---
X_train: [0.32394366 0.         1.         1.         0.         1.
 1.         1.         1.         1.         1.         1.
 1.         0.05194805 0.08921933 0.10094637 0.48837209 0.74
 0.        ]
y_train (one-hot): [0. 1.]
---
X_train: [0.56338028 0.         1