## Week 1: Neural Network with PyTorch Step by Step

### Load Data

In [2]:
import requests
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

In [3]:
# For more details of the dataset: https://machinelearningmastery.com/develop-your-first-neural-network-with-pytorch-step-by-step/
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv"
filename = "pima-indians-diabetes.csv"

response = requests.get(url)
if response.status_code == 200:
    with open(filename, 'wb') as file: 
        file.write(response.content) 
    print("Download successful.")
else:
    print("Download failed with status code:", response.status_code)

Download successful.


In [5]:
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]

In [6]:
# Convert the NumPy arrays to tensors.
# PyTorch layers use 32-bit floats by default, while NumPy defaults to 64-bit floats.
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

In [15]:
print(y[:5])

tensor([[1.],
        [0.],
        [1.],
        [0.],
        [1.]])
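
The effect of the `dtype` argument and of `reshape(-1, 1)` can be checked on a small array. This is a minimal sketch with made-up values; the array `raw` is hypothetical and only stands in for the loaded dataset.

```python
import numpy as np
import torch

# Hypothetical 3-row array standing in for the loaded dataset
raw = np.array([[1.0, 2.0, 0.0],
                [3.0, 4.0, 1.0],
                [5.0, 6.0, 1.0]])

print(raw.dtype)                     # NumPy defaults to float64

as_is = torch.tensor(raw[:, 0:2])    # dtype is inherited from NumPy
features = torch.tensor(raw[:, 0:2], dtype=torch.float32)
labels = torch.tensor(raw[:, 2], dtype=torch.float32).reshape(-1, 1)

print(as_is.dtype, features.dtype)   # torch.float64 torch.float32
print(labels.shape)                  # torch.Size([3, 1]): one row per sample
```

The column shape matters because `nn.BCELoss` expects predictions and targets of the same shape, and the model defined below outputs one value per sample in a single column.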


### Define the Model

##### 1. **Sequential Model Definition**:
- A `Sequential` model in PyTorch is created by listing out layers in sequence. 
- The example provided constructs a model for data with 8 input features using fully connected or dense layers (`nn.Linear`), specifying the input and output dimensions for each layer.
- The model includes two hidden layers with 12 and 8 neurons, respectively, each followed by a ReLU activation function to introduce non-linearity.
- The output layer consists of a single neuron with a sigmoid activation function, making the model's output suitable for binary classification tasks, mapping the output to a probability between 0 and 1.

##### 2. **Class-Based Model Definition**:
- The alternative approach to defining a PyTorch model involves creating a Python class that inherits from `nn.Module`.
- This class-based model explicitly defines each layer and the activation function as class attributes in the constructor (`__init__` method). It also requires defining a `forward()` method, detailing how the input tensor is processed through the layers to produce an output tensor.
- The same structure is used as in the Sequential model, with the layers and activations defined as attributes of the class and the data flow specified in the `forward` method.

Both methods allow for the flexibility to experiment with different architectures and layer configurations to optimize model performance. The choice between using a Sequential model or a class-based model typically depends on the complexity of the model and personal preference, with the class-based approach offering more control over the forward pass and the ability to include custom operations.
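
To make the equivalence of the two styles concrete, the sketch below builds both versions, copies the weights from one to the other, and checks that they produce the same output. The class name `TinyClassifier` and the random input batch are illustrative assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):  # hypothetical name, same layout as described above
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        return self.act_output(self.output(x))

seq_model = nn.Sequential(
    nn.Linear(8, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)
cls_model = TinyClassifier()

# Copy weights so both models hold identical parameters
# (both yield their parameters in registration order).
with torch.no_grad():
    for dst, src in zip(cls_model.parameters(), seq_model.parameters()):
        dst.copy_(src)

batch = torch.rand(5, 8)  # 5 random samples with 8 features
with torch.no_grad():
    same = torch.allclose(seq_model(batch), cls_model(batch))
print(same)
```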

##### Sequential Model

In [29]:
#nn.Sequential automatically handles the forward pass in the order the layers are added.
model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

print(model)

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)


##### Class-Based Model

In [22]:
class nn_classifier(nn.Module):
    def __init__(self):
        super().__init__()  # Call the parent's __init__ first
        # Layers and activations are registered as attributes
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()

    def forward(self, x):
        # Specify how data flows through the layers
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x  # return the transformed tensor, not the global dataset X

model = nn_classifier()
print(model)

nn_classifier(
  (hidden1): Linear(in_features=8, out_features=12, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=12, out_features=8, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=8, out_features=1, bias=True)
  (act_output): Sigmoid()
)


### Preparation for Training

To train a neural network model, you must define a **loss function** and an **optimizer**.
- The **loss function** measures how far the model's predictions are from the true target values. For binary classification, **binary cross-entropy** is a suitable choice. (Mean squared error is typical for regression, and cross-entropy loss for multi-class classification.)
- An **optimizer**, such as **Adam**, updates the model's weights using the gradients of the loss so that the predictions improve.
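
As a rough illustration of what binary cross-entropy measures, the sketch below computes the standard formula, the mean of `-[y*log(p) + (1-y)*log(1-p)]`, by hand. The helper name `binary_cross_entropy` and the sample probabilities are made up for this example.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

# Confident and correct predictions give a small loss ...
confident_right = binary_cross_entropy([0.9, 0.1], [1, 0])
# ... while confident but wrong predictions are penalized heavily.
confident_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])

print(confident_right, confident_wrong)
```

The first value is about 0.105 and the second about 2.303, which shows how the loss grows as predicted probabilities move away from the true labels.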

In [27]:
loss_fn = nn.BCELoss() #binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)

### Training a Model

A neural network is usually trained in **epochs** and **batches**.
- An epoch is one full pass over the training data. The data is split into batches, and each batch triggers one gradient descent update of the model's weights.
- Training is repeated over several epochs until the model's performance is satisfactory, trading prediction quality against training time.
- The choice of batch size and the number of epochs typically involves experimentation. 
- As training progresses, the model's error decreases and eventually stabilizes, indicating convergence.
- A training loop is commonly implemented using two nested for-loops, one iterating over epochs and the other over batches.
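
The batch arithmetic behind the inner loop can be sketched without any model. The row count of 768 is an assumption for illustration (in the notebook it would be `len(X)`); the batch size of 10 matches the training cell below.

```python
n_samples = 768   # assumed dataset size, for illustration only
batch_size = 10

# Start index of every batch, exactly as range(0, len(X), batch_size) produces them
batch_starts = list(range(0, n_samples, batch_size))
batch_sizes = [min(batch_size, n_samples - start) for start in batch_starts]

print(len(batch_starts))   # 77 weight updates per epoch
print(batch_sizes[-1])     # 8: the final batch is shorter
```

Slicing past the end of a tensor simply returns the remaining rows, so the last batch needs no special handling in the loop.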


**optimizer.zero_grad()**
- A gradient tells how much the loss would change if a given weight were changed slightly (the partial derivative of the loss with respect to that weight).
- In PyTorch, gradients accumulate by default: each `backward()` call adds to the stored `.grad` instead of overwriting it. This supports dynamic computations and recurrent neural networks.
- `optimizer.zero_grad()` resets the gradients of all model parameters, so it is called once per iteration before `loss.backward()`.
- Without this call, gradients would accumulate across batches, leading to incorrect updates.
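
The accumulation behaviour is easy to observe on a single scalar. This is a minimal sketch with a made-up parameter `w`; it resets the gradient with `w.grad.zero_()`, which has the same effect on one tensor that `optimizer.zero_grad()` has on all parameters of a model.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()           # d(3w)/dw = 3
first = w.grad.item()

(3 * w).backward()           # adds another 3 to the stored gradient
accumulated = w.grad.item()

w.grad.zero_()               # reset, as optimizer.zero_grad() would
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)   # 3.0 6.0 3.0
```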

**loss.backward()**
- It runs the backpropagation algorithm.
- It does not compute the loss itself; the loss value has already been computed in the forward pass.
- **Starting from that loss value, it computes the gradient (rate of change) of the loss with respect to each model parameter** and stores it in the parameter's `.grad` attribute. The optimizer then uses these gradients to update the parameters.
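
A one-parameter example shows the split between the forward pass (loss value) and the backward pass (gradient). The numbers are invented for illustration; the analytic derivative of `(w*x - y)^2` with respect to `w` is `2*(w*x - y)*x`.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x, y = torch.tensor(2.0), torch.tensor(5.0)

loss = (w * x - y) ** 2      # forward pass: the loss value is computed here
print(loss.item())           # 9.0
print(w.grad)                # None: no gradient exists before backward()

loss.backward()              # backward pass: fills w.grad
analytic = 2 * (w.item() * 2.0 - 5.0) * 2.0   # 2*(w*x - y)*x
print(w.grad.item(), analytic)   # -12.0 -12.0
```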

**optimizer.step()**
- After the gradients are calculated and stored, you need to update the model parameters in the direction that minimizes the loss.
- The optimizer, which was previously defined (e.g., using torch.optim.Adam), knows how to update each parameter given its current gradient stored in .grad. 
- Adam adjusts the effective learning rate of each parameter dynamically.
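
One way to see Adam's per-parameter scaling is to give two parameters gradients of very different size and compare the first update. In this sketch (made-up values, default Adam settings apart from `lr`), both parameters move by roughly `lr`, because Adam divides each gradient by a running estimate of its own magnitude.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0, 1.0], requires_grad=True)
scales = torch.tensor([0.01, 100.0])   # produces very different gradient sizes
optimizer = optim.Adam([w], lr=0.1)

loss = (scales * w).sum()              # d(loss)/dw equals scales
optimizer.zero_grad()
loss.backward()
optimizer.step()

step_taken = 1.0 - w.detach()
print(w.grad)        # gradients differ by a factor of 10,000
print(step_taken)    # both steps are close to lr = 0.1
```

With plain gradient descent the two steps would differ by the same factor as the gradients.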

In [33]:
n_epochs = 10
batch_size = 10

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i: i+batch_size]
        y_pred = model(Xbatch) # Forward pass
        ybatch = y[i: i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad() 
        loss.backward() # Initiates the backpropagation algorithm
        optimizer.step() # Update model parameters
    print(f'Finished epoch {epoch}, latest loss {loss}')

Finished epoch 0, latest loss 0.5624324679374695
Finished epoch 1, latest loss 0.5624324679374695
Finished epoch 2, latest loss 0.5624324679374695
Finished epoch 3, latest loss 0.5624324679374695
Finished epoch 4, latest loss 0.5624324679374695
Finished epoch 5, latest loss 0.5624324679374695
Finished epoch 6, latest loss 0.5624324679374695
Finished epoch 7, latest loss 0.5624324679374695
Finished epoch 8, latest loss 0.5624324679374695
Finished epoch 9, latest loss 0.5624324679374695
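
The loss printed above is identical in every epoch, which suggests that the weights of `model` were not being updated. One plausible cause, going by the cell numbers, is that the optimizer (In [27]) was created before `model` was re-created (In [29]), so it still holds the parameters of the earlier model object. The minimal sketch below reproduces that effect with two small, hypothetical models; it uses SGD only to keep the example short.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

old_model = nn.Linear(2, 1)
optimizer = optim.SGD(old_model.parameters(), lr=0.1)  # bound to old_model

new_model = nn.Linear(2, 1)   # re-created AFTER the optimizer
before = [p.detach().clone() for p in new_model.parameters()]

inputs = torch.ones(4, 2)
targets = torch.zeros(4, 1)

loss = ((new_model(inputs) - targets) ** 2).mean()
optimizer.zero_grad()
loss.backward()     # gradients land on new_model's parameters
optimizer.step()    # but the optimizer only knows old_model's parameters

unchanged = all(
    torch.equal(b, p.detach()) for b, p in zip(before, new_model.parameters())
)
print(unchanged)    # True: new_model was never updated
```

Re-running the optimizer cell after defining the model, so that `model.parameters()` refers to the current object, avoids this situation.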


### Evaluate the Model

In [34]:
# compute accuracy (no_grad is optional)
with torch.no_grad():
    y_pred = model(X)
 
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")

Accuracy 0.6510416865348816
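
The accuracy expression can be traced on a handful of values. The probabilities and labels below are invented for illustration; `round()` acts as a 0.5 threshold on the sigmoid output.

```python
import torch

probs = torch.tensor([[0.2], [0.7], [0.4], [0.9]])    # hypothetical model outputs
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])   # hypothetical true labels

predicted = probs.round()                 # [[0.], [1.], [0.], [1.]]
correct = (predicted == labels).float()   # 1.0 where the prediction matches
accuracy = correct.mean()

print(accuracy.item())   # 0.5
```

Because the notebook evaluates on the same rows it trained on, the reported figure is a training accuracy rather than an estimate of performance on unseen data.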


In [36]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
 
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
 
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
 
# define the model
model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)
print(model)
 
# train the model
loss_fn   = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)
 
n_epochs = 100
batch_size = 10
 
for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')
 
# compute accuracy (no_grad is optional)
with torch.no_grad():
    y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)
Finished epoch 0, latest loss 0.5492443442344666
Finished epoch 1, latest loss 0.6172183156013489
Finished epoch 2, latest loss 0.5994867086410522
Finished epoch 3, latest loss 0.5781427025794983
Finished epoch 4, latest loss 0.560743510723114
Finished epoch 5, latest loss 0.5461434721946716
Finished epoch 6, latest loss 0.5438551306724548
Finished epoch 7, latest loss 0.5339827537536621
Finished epoch 8, latest loss 0.5248621106147766
Finished epoch 9, latest loss 0.51628178358078
Finished epoch 10, latest loss 0.5057646036148071
Finished epoch 11, latest loss 0.4977971017360687
Finished epoch 12, latest loss 0.4928841292858124
Finished epoch 13, latest loss 0.48636361956596375
Finished epoch 14, latest loss 0.4819941520690918
Finished epoch 15, latest loss 0.

### Add L1 and L2 Regularization to prevent overfitting
- L1 regularization: `l1_loss` is the sum of the absolute values of all model parameters; it is scaled by `l1_lambda` and added to the loss.
- L2 regularization: applied through the `weight_decay` argument of the optimizer.
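
Both penalties can be checked on tiny, hand-set values. In this sketch the layer weights are made up, and the `weight_decay` part uses plain SGD instead of the notebook's Adam so that the arithmetic stays visible (an assumption made for clarity; Adam also adds the decay term to the gradient but then rescales the step).

```python
import torch
import torch.nn as nn
import torch.optim as optim

layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, -2.0]]))
    layer.bias.copy_(torch.tensor([0.5]))

# L1 penalty: sum of absolute values of every parameter (weights and bias)
l1_penalty = sum(p.abs().sum() for p in layer.parameters())
print(l1_penalty.item())   # 3.5

# L2 via weight_decay with SGD: w <- w - lr * (grad + weight_decay * w)
w = torch.tensor([1.0], requires_grad=True)
sgd = optim.SGD([w], lr=0.1, weight_decay=0.5)
loss = (0.0 * w).sum()     # zero data gradient, so only the decay term acts
sgd.zero_grad()
loss.backward()
sgd.step()
print(w.item())            # approximately 1 - 0.1 * (0 + 0.5 * 1) = 0.95
```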


In [37]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:, 0:8]
y = dataset[:, 8]

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)
print(model)

# Define loss function (binary cross entropy) and optimizer with L2 regularization
loss_fn = nn.BCELoss()
l2_lambda = 0.001  # L2 regularization weight
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=l2_lambda)

n_epochs = 100
batch_size = 10

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i + batch_size]
        ybatch = y[i:i + batch_size]
        y_pred = model(Xbatch)

        loss = loss_fn(y_pred, ybatch)

        # L1 regularization: manually add L1 loss for all parameters
        l1_lambda = 0.0005  # L1 regularization weight
        l1_loss = sum(p.abs().sum() for p in model.parameters())
        total_loss = loss + l1_lambda * l1_loss

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

    print(f'Finished epoch {epoch}, latest total loss {total_loss.item()}')

with torch.no_grad():
    y_pred = model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy: {accuracy.item()}")

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)
Finished epoch 0, latest total loss 0.6232933402061462
Finished epoch 1, latest total loss 0.6222648024559021
Finished epoch 2, latest total loss 0.6221972703933716
Finished epoch 3, latest total loss 0.6166897416114807
Finished epoch 4, latest total loss 0.6068466305732727
Finished epoch 5, latest total loss 0.5833206176757812
Finished epoch 6, latest total loss 0.5786837339401245
Finished epoch 7, latest total loss 0.5704125761985779
Finished epoch 8, latest total loss 0.5675392746925354
Finished epoch 9, latest total loss 0.5628054738044739
Finished epoch 10, latest total loss 0.5587262511253357
Finished epoch 11, latest total loss 0.5553116798400879
Finished epoch 12, latest total loss 0.5542548298835754
Finished epoch 13, latest total loss 0.54895550012588

### Logistic regression for a classification model

- Logistic regression is essentially a neural network with no hidden layers: a single linear layer followed by a sigmoid activation, used for binary classification.
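
The correspondence can be verified numerically: the output of a linear layer followed by a sigmoid should match the logistic regression formula `1 / (1 + exp(-(x·w + b)))`. The sketch below uses random, hypothetical inputs and an untrained model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_reg = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())

features = torch.rand(4, 8)    # 4 hypothetical samples with 8 features
weight = log_reg[0].weight     # shape (1, 8)
bias = log_reg[0].bias         # shape (1,)

with torch.no_grad():
    from_model = log_reg(features)
    by_formula = 1.0 / (1.0 + torch.exp(-(features @ weight.T + bias)))

matches = torch.allclose(from_model, by_formula, atol=1e-6)
print(matches)
```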

**Logistic Regression Model with PyTorch**

In [38]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:, 0:8]
y = dataset[:, 8]

X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# Define the logistic regression model
log_reg_model = nn.Sequential(
    nn.Linear(8, 1),
    nn.Sigmoid()
)
print(log_reg_model)

loss_fn = nn.BCELoss()
l2_lambda = 0.001
optimizer = optim.Adam(log_reg_model.parameters(), lr=0.001, weight_decay=l2_lambda)

n_epochs = 100
batch_size = 10

for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i + batch_size]
        ybatch = y[i:i + batch_size]
        y_pred = log_reg_model(Xbatch)

        loss = loss_fn(y_pred, ybatch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Finished epoch {epoch}, latest loss {loss.item()}')

# Compute accuracy
with torch.no_grad():
    y_pred = log_reg_model(X)
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy: {accuracy.item()}")

Sequential(
  (0): Linear(in_features=8, out_features=1, bias=True)
  (1): Sigmoid()
)
Finished epoch 0, latest loss 75.0
Finished epoch 1, latest loss 75.0
Finished epoch 2, latest loss 75.0
Finished epoch 3, latest loss 75.0
Finished epoch 4, latest loss 75.0
Finished epoch 5, latest loss 3.3385066986083984
Finished epoch 6, latest loss 1.7819201946258545
Finished epoch 7, latest loss 1.4942830801010132
Finished epoch 8, latest loss 1.2700145244598389
Finished epoch 9, latest loss 1.1119365692138672
Finished epoch 10, latest loss 0.9939742684364319
Finished epoch 11, latest loss 0.907599925994873
Finished epoch 12, latest loss 0.8482560515403748
Finished epoch 13, latest loss 0.8095002174377441
Finished epoch 14, latest loss 0.7833399772644043
Finished epoch 15, latest loss 0.7631639242172241
Finished epoch 16, latest loss 0.7454894185066223
Finished epoch 17, latest loss 0.7291943430900574
Finished epoch 18, latest loss 0.7140536308288574
Finished epoch 19, latest loss 0.69998615980

**Logistic Regression Model with Scikit-learn**

In [46]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:, 0:8]
y = dataset[:, 8]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(C= 0.0018329807108324356, penalty='l2', solver='newton-cg', max_iter=5000)

model.fit(X_train, y_train)

y_pred_train = model.predict(X_train)
train_accuracy = accuracy_score(y_train, y_pred_train)
print(f"Training Accuracy: {train_accuracy}")

y_pred_test = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred_test)
print(f"Testing Accuracy: {test_accuracy}")

Training Accuracy: 0.7736156351791531
Testing Accuracy: 0.7402597402597403


**Hyperparameter tuning using GridSearchCV from Scikit-learn on the logistic regression model**

In [44]:
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

dataset = np.loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:, 0:8]
y = dataset[:, 8]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000)

params_grid = [
    {'solver': ['liblinear', 'saga'], 'penalty': ['l1', 'l2'], 
     'C': np.logspace(-4, 4, 20)},
    {'solver': ['newton-cg', 'lbfgs', 'sag'], 'penalty': ['l2'], 
     'C': np.logspace(-4, 4, 20)},
    {'solver': ['saga'], 'penalty': ['elasticnet'], 'C': np.logspace(-4, 4, 20), 
     'l1_ratio': np.linspace(0, 1, 10)},
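    # Note: recent scikit-learn releases expect penalty=None instead of the string 'none',
    # so these candidates can fail parameter validation (see the warning in the output below).
    {'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'], 'penalty': ['none']}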
]

grid_search = GridSearchCV(estimator=model, param_grid=params_grid, cv=5, verbose=1, scoring='accuracy', n_jobs=-1)

grid_search.fit(X_train, y_train)

print("Best Parameters:", grid_search.best_params_)
print("Best Cross-validation Score:", grid_search.best_score_)

y_pred_test = grid_search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred_test)
print(f"Testing Accuracy with Best Parameters: {test_accuracy}")

Fitting 5 folds for each of 345 candidates, totalling 1725 fits
Best Parameters: {'C': 0.0018329807108324356, 'penalty': 'l2', 'solver': 'newton-cg'}
Best Cross-validation Score: 0.7687724910035986
Testing Accuracy with Best Parameters: 0.7402597402597403
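
The candidate count reported by `GridSearchCV` can be reproduced by multiplying the list lengths in each sub-grid of `params_grid`. This is plain arithmetic on the grid sizes shown in the cell above, with no scikit-learn call involved.

```python
n_C = 20          # np.logspace(-4, 4, 20)
n_l1_ratio = 10   # np.linspace(0, 1, 10)

candidates = (
    2 * 2 * n_C                   # liblinear/saga x l1/l2 x C
    + 3 * 1 * n_C                 # newton-cg/lbfgs/sag x l2 x C
    + 1 * 1 * n_C * n_l1_ratio    # saga x elasticnet x C x l1_ratio
    + 5 * 1                       # five solvers x penalty 'none'
)
cv_folds = 5

print(candidates, candidates * cv_folds)   # 345 1725
```

The last sub-grid contributes 5 candidates, or 25 fits over 5 folds, which is consistent with the number of failed fits reported in the warning.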


25 fits failed out of a total of 1725.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
25 fits failed with the following error:
Traceback (most recent call last):
  File "C:\Users\alice\AppData\Roaming\Python\Python312\site-packages\sklearn\model_selection\_validation.py", line 895, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\alice\AppData\Roaming\Python\Python312\site-packages\sklearn\base.py", line 1467, in wrapper
    estimator._validate_params()
  File "C:\Users\alice\AppData\Roaming\Python\Python312\site-packages\sklearn\base.py", line 666, in _validate_params
    validate_parameter_constraints(
  File "C:\Users\alice\AppData\Roaming\Python\Python312\site-packages\sklearn\utils\_param_validat