### Neural Networks

Neural networks are computational models inspired by the human brain, designed to recognize patterns and
make decisions based on data. They consist of interconnected layers of nodes, or "neurons," which process
and transform input information. Each layer performs an operation on the data it receives and passes the result to the next layer. Through training, neural networks learn to improve their accuracy in tasks like image recognition, language processing, and more.

In [None]:
import os
# The jupyter notebook is launched from your $HOME directory.
# Change the working directory to the workshop directory
# which was created in your username directory under /scratch/vp91
os.chdir(os.path.expandvars("/scratch/vp91/$USER/"))

In [None]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

### Dataset
The Pima Indians Diabetes dataset is a popular dataset in the field of machine learning and statistics, particularly for those working on classification problems. 

Dataset Overview:

* **Source**: The dataset was created by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and is available in the UCI Machine Learning Repository.
* **Purpose**: The dataset is used to predict the onset of diabetes within five years based on diagnostic measures.
* **Features**: The dataset contains 768 samples, each with 8 features.

The features are:

1. Pregnancies: Number of times pregnant.
2. Glucose: Plasma glucose concentration (mg/dL) at 2 hours in an oral glucose tolerance test.
3. Blood Pressure: Diastolic blood pressure (mm Hg) at the time of screening.
4. Skin Thickness: Triceps skinfold thickness (mm) measured at the back of the upper arm.
5. Insulin: 2-Hour serum insulin (mu U/ml).
6. BMI: Body mass index (weight in kg/(height in m)^2).
7. Diabetes Pedigree Function: A function that scores likelihood of diabetes based on family history.
8. Age: Age of the individual (years).

**Outcome**: Whether or not the individual has diabetes (1 for positive, 0 for negative).

In [None]:
!head /scratch/vp91/$USER/intro-to-pytorch/data/pima-indians-diabetes.data.csv

In [None]:
datapath = os.path.expandvars('/scratch/vp91/$USER/intro-to-pytorch/data/pima-indians-diabetes.data.csv')
print(datapath)

### Curate the dataset
Load the dataset and split it into the input features (X) and the output variable (y).

In [None]:
dataset = np.loadtxt(datapath, delimiter=',')
X = dataset[:,0:8] 
y = dataset[:,8]
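The column slicing can be tried on a tiny stand-in before touching the real file. The sketch below assumes the same 9-column layout (8 features followed by the outcome); the two rows are made-up values for illustration, not rows from the dataset, and `csv_text`, `demo`, `X_demo`, and `y_demo` are hypothetical names.

```python
import io
import numpy as np

# Two made-up rows in the 9-column layout (8 features + outcome).
# Illustrative values only, not taken from the real file.
csv_text = "1,100,70,20,80,25.0,0.5,30,0\n2,150,80,30,90,35.0,0.7,45,1\n"

# np.loadtxt accepts a file-like object as well as a path
demo = np.loadtxt(io.StringIO(csv_text), delimiter=",")

X_demo = demo[:, 0:8]   # columns 0..7 -> input features
y_demo = demo[:, 8]     # column 8     -> outcome label

print(X_demo.shape, y_demo.shape)  # (2, 8) (2,)
```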

### Convert the data to tensors

In [None]:
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
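A short sketch of why the conversion uses `dtype=torch.float32` and `reshape(-1, 1)`: `np.loadtxt` returns 64-bit floats, while `nn.Linear` layers use 32-bit weights by default, and the model defined later produces one column of predictions, so the labels are reshaped into a matching column. The label values and the names `y_demo`, `flat`, and `column` are made up for illustration.

```python
import numpy as np
import torch

# Hypothetical labels for four samples (illustrative only)
y_demo = np.array([0.0, 1.0, 1.0, 0.0])   # NumPy default dtype is float64

flat = torch.tensor(y_demo, dtype=torch.float32)  # shape (4,)
column = flat.reshape(-1, 1)                      # shape (4, 1): one label per row

print(flat.shape, column.shape, column.dtype)
```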

### Defining the Model

When designing the model, keep the following points in mind:

1. The number of input features expected by the first layer must match the number of feature columns in the dataset (`X_tensor`).
2. A high number of layers can increase computation time, while too few layers may result in poor predictions.
3. Each layer should be followed by an activation function.

In this example, we will use a 3-layer neural network:

1. The input layer expects 8 features.
2. The first hidden layer has 12 neurons, followed by a ReLU activation function.
3. The second hidden layer has 8 neurons, followed by another ReLU activation function.
4. The output layer has one neuron, followed by a sigmoid activation function.

The sigmoid function outputs values between 0 and 1, so the output can be read as the predicted probability that the individual has diabetes, which is what a binary classifier needs.
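To make the two activation functions concrete, the following sketch applies them to a handful of arbitrary values (the tensor `z` is a made-up example, not model output).

```python
import torch

z = torch.tensor([-4.0, -1.0, 0.0, 1.0, 4.0])  # arbitrary pre-activation values

relu_out = torch.relu(z)        # negative values become 0, positive values pass through
sigmoid_out = torch.sigmoid(z)  # every value is squashed into the open interval (0, 1)

print(relu_out)
print(sigmoid_out)
```

Note that an input of 0 maps to a sigmoid output of 0.5, the value that separates the two classes when predictions are rounded.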


In PyTorch, neural networks can be defined using different approaches, and two common ones are the Sequential model and the class-based model.

#### Sequential model

* The Sequential model is a simple, linear stack of layers where each layer has a single input and output. It is useful for straightforward feedforward networks where layers are applied in a sequential order.
* It is easier to use for simple architectures where layers are applied in a linear fashion.
* Defined Using: *torch.nn.Sequential*.

In [None]:
seq_model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

In [None]:
print(seq_model)
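As a quick sanity check of this architecture, the sketch below rebuilds the same stack under a hypothetical name (`demo_model`) so that it runs on its own, counts the trainable parameters, and pushes a random batch through it.

```python
import torch
import torch.nn as nn

# Same layer sizes as the Sequential model, rebuilt so this sketch is self-contained
demo_model = nn.Sequential(
    nn.Linear(8, 12), nn.ReLU(),
    nn.Linear(12, 8), nn.ReLU(),
    nn.Linear(8, 1), nn.Sigmoid(),
)

# Weights + biases per layer: (8*12 + 12) + (12*8 + 8) + (8*1 + 1) = 221
n_params = sum(p.numel() for p in demo_model.parameters())

batch = torch.rand(5, 8)   # 5 random samples with 8 features each
out = demo_model(batch)    # one probability per sample

print(n_params, out.shape)
```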

#### Class-based model

The class-based model allows you to define a network by subclassing torch.nn.Module. This approach provides greater flexibility and control, making it suitable for complex models and custom behaviors.

* Offers full control over the network architecture, including complex data flows, multiple inputs/outputs, and custom forward methods.
* Custom Forward Pass: You can define complex forward passes and control data flow through the network.
* Dynamic Behavior: Allows for dynamic computations, such as conditional layers or operations.
* Defined Using: Subclass of torch.nn.Module

In [None]:
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()
 
    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x

In [None]:
class_model = PimaClassifier()
print(class_model)
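The "dynamic behavior" point can be illustrated with a small variation. This is only a sketch: `BranchingClassifier` and its `return_hidden` flag are hypothetical and not part of the workshop model. It shows that `forward` is ordinary Python, so control flow can change what the network computes.

```python
import torch
import torch.nn as nn

class BranchingClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.hidden2 = nn.Linear(12, 8)
        self.output = nn.Linear(8, 1)

    def forward(self, x, return_hidden=False):
        h = torch.relu(self.hidden1(x))
        h = torch.relu(self.hidden2(h))
        if return_hidden:
            # Plain Python control flow decides what the forward pass returns
            return h
        return torch.sigmoid(self.output(h))

demo = BranchingClassifier()
x = torch.rand(3, 8)                        # 3 random samples
probs = demo(x)                             # shape (3, 1): probabilities
hidden = demo(x, return_hidden=True)        # shape (3, 8): hidden features
print(probs.shape, hidden.shape)
```

A branch like this cannot be expressed with `nn.Sequential`, which always applies every layer in order.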

### Define the loss function
Binary Cross-Entropy (BCE) Loss: Measures the performance of a classification model whose output is a probability value between 0 and 1. It calculates the difference between the predicted probabilities and the actual binary labels (0 or 1) and penalizes the model more when the predictions are further from the true labels.

$$\mathrm{BCELoss}(y', y) = -\left[\, y \log(y') + (1 - y) \log(1 - y') \,\right]$$

where $y'$ is the predicted output and $y$ is the actual output.

In [None]:
loss_fn = nn.BCELoss()
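The formula can be checked against `nn.BCELoss` on a few made-up predictions. With its default settings, `nn.BCELoss` averages the per-sample losses over the batch, so the manual version below takes the mean as well. The tensors `y_hat` and `y_true` are illustrative values.

```python
import torch
import torch.nn as nn

y_hat = torch.tensor([[0.9], [0.2], [0.6]])    # made-up predicted probabilities
y_true = torch.tensor([[1.0], [0.0], [0.0]])   # made-up labels

# Per-sample loss, written exactly as in the formula
per_sample = -(y_true * torch.log(y_hat) + (1 - y_true) * torch.log(1 - y_hat))
manual = per_sample.mean()

builtin = nn.BCELoss()(y_hat, y_true)   # default reduction is the mean

print(per_sample.flatten())
print(manual.item(), builtin.item())
```

The third sample (prediction 0.6 for a true label of 0) is the furthest from its label and receives the largest per-sample loss.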

### Optimizer

The optimizer's main role is to update the model's parameters based on the gradients computed during backpropagation.

1. **Parameter Updates**: Optimizers adjust the weights and biases of the neural network to reduce the loss. This involves applying algorithms that modify the parameters to minimize the difference between the predicted outputs and the actual targets.
2. **Learning Rate Management**: The learning rate controls how large each parameter update is. It can be kept fixed, or the optimizer can scale the effective step size dynamically during training.

In this example we use an optimizer called Adaptive Moment Estimation (Adam). Adam computes an adaptive learning rate for each parameter using running estimates of the mean (first moment) and the uncentered variance (second moment) of the gradients.

In [None]:
optimizer = optim.Adam(class_model.parameters(), lr=0.001)
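One way to see what "adaptive" means is to compare the first update made by SGD and by Adam when the gradient is small or large. This is a toy sketch on a single parameter. The helper `first_step` and the loss `scale * p` are assumptions chosen so that the gradient equals `scale`.

```python
import torch
import torch.optim as optim

def first_step(optimizer_cls, scale):
    """Return how far a single parameter moves after one update."""
    p = torch.tensor([1.0], requires_grad=True)
    opt = optimizer_cls([p], lr=0.001)
    loss = (scale * p).sum()   # gradient of the loss w.r.t. p equals `scale`
    loss.backward()
    opt.step()
    return 1.0 - p.item()

for scale in (1.0, 100.0):
    print(scale, first_step(optim.SGD, scale), first_step(optim.Adam, scale))
```

In this setup the SGD step grows in proportion to the gradient (about 0.001 and 0.1), while Adam's first step stays close to the learning rate (about 0.001) in both cases, because the gradient is normalized by its own magnitude estimate.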

### Training the Model

Training a neural network involves epochs and batches, which define how data is fed to the model:

- **Epoch:** A full pass through the entire training dataset.
- **Batch:** A subset of samples processed at a time, with gradient descent performed after each batch.

In practice, the dataset is divided into batches, and each batch is processed sequentially in a training loop. Completing all batches constitutes one epoch. The process is repeated for multiple epochs to refine the model.

Batch size is constrained by available memory (GPU memory when training on a GPU), and the computation per update scales with batch size. Training for more epochs usually lowers the training loss but increases training time and, beyond some point, may lead to overfitting. The optimal number of epochs and batch size is often determined through experimentation.
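The relationship between samples, batches, and epochs is simple arithmetic. The sketch below assumes the 768 samples of this dataset together with a batch size of 10 and 100 epochs, which are example settings.

```python
import math

n_samples = 768    # rows in the dataset
batch_size = 10    # example setting
n_epochs = 100     # example setting

batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * n_epochs

print(batches_per_epoch, last_batch_size, total_updates)  # 77 8 7700
```

Because 768 is not a multiple of 10, the last batch of every epoch is smaller than the others.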

#### Purpose of optimizer.zero_grad(), loss.backward(), optimizer.step()

**optimizer.zero_grad()**: During training, gradients accumulate by default in PyTorch. This means that if you don’t clear them, gradients from multiple backward passes (from different batches) will be added together, which can lead to incorrect updates to the model parameters.
By calling optimizer.zero_grad(), you ensure that gradients from previous steps are reset to zero, preventing them from affecting the current update.

**loss.backward()**:  Calculates the gradients of the loss with respect to each parameter of the model. This is done using backpropagation, a key algorithm for training neural networks.

**optimizer.step()**: Updates the model's parameters based on the gradients computed during the backward pass (**loss.backward()**).
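The accumulation behavior is easy to observe on a single parameter. In this toy sketch, the loss `3 * w` is an assumption chosen so that the gradient is always 3.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

(3 * w).backward()
g1 = w.grad.item()    # 3.0

(3 * w).backward()    # no zero_grad() in between
g2 = w.grad.item()    # 6.0: the new gradient was added to the old one

opt.zero_grad()       # clear the accumulated gradient
(3 * w).backward()
g3 = w.grad.item()    # 3.0 again

opt.step()            # w <- w - lr * grad = 2.0 - 0.1 * 3.0
print(g1, g2, g3, w.item())
```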

In [None]:
%%time
n_epochs = 100
batch_size = 10
 
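for epoch in range(n_epochs):
    for i in range(0, len(X_tensor), batch_size):
        # Slice out the current mini-batch of features and labels
        Xbatch = X_tensor[i:i+batch_size]
        ybatch = y_tensor[i:i+batch_size]
        # Forward pass: predict, then measure the error against the labels
        y_pred = class_model(Xbatch)
        loss = loss_fn(y_pred, ybatch)
        # Backward pass: clear old gradients, compute new ones, update parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # loss.item() is the loss of the last batch of this epoch as a Python float
    print(f'Finished epoch {epoch}, latest loss {loss.item()}')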
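As an alternative to slicing by index, PyTorch's `DataLoader` can produce the batches and shuffle them every epoch. The sketch below is not the workshop's loop: it uses random stand-in data with the same shapes as `X_tensor` and `y_tensor`, a smaller hypothetical model, and only two epochs, and every `demo_` name is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Random stand-in data (not the real dataset), same shapes as X_tensor / y_tensor
X_demo = torch.rand(768, 8)
y_demo = torch.randint(0, 2, (768, 1)).float()

demo_model = nn.Sequential(nn.Linear(8, 12), nn.ReLU(), nn.Linear(12, 1), nn.Sigmoid())
demo_loss_fn = nn.BCELoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)

# The loader yields (features, labels) batches in a new random order each epoch
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=10, shuffle=True)

for epoch in range(2):
    epoch_loss = 0.0
    for Xbatch, ybatch in loader:
        loss = demo_loss_fn(demo_model(Xbatch), ybatch)
        demo_optimizer.zero_grad()
        loss.backward()
        demo_optimizer.step()
        epoch_loss += loss.item() * len(Xbatch)   # weight by batch size
    mean_loss = epoch_loss / len(X_demo)          # average over all samples
    print(f"epoch {epoch}: mean loss {mean_loss:.4f}")
```

Reporting the mean loss over the whole epoch, rather than the loss of the last batch, gives a less noisy picture of training progress.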

### Evaluate the Model

For simplicity, we evaluate the model on the same data used for training. Ideally, the data should be split into separate training and testing datasets, or a distinct dataset should be used for evaluation, because accuracy measured on the training data tends to overestimate how well the model generalizes.



In [None]:
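with torch.no_grad():  # gradients are not needed for evaluation
    y_pred = class_model(X_tensor)
 
# Round probabilities to 0 or 1 (threshold 0.5) and compare with the labels
accuracy = (y_pred.round() == y_tensor).float().mean()
print(f"Accuracy {accuracy.item():.4f}")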
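A held-out split could look like the following sketch. It uses random stand-in tensors rather than the real data, and the 80/20 ratio and all variable names are assumptions for illustration. The model would then be trained on the training part only and evaluated on the test part.

```python
import torch

torch.manual_seed(0)
# Random stand-in data (not the real dataset), same shapes as X_tensor / y_tensor
X_demo = torch.rand(768, 8)
y_demo = torch.randint(0, 2, (768, 1)).float()

perm = torch.randperm(len(X_demo))    # shuffled row indices
n_train = int(0.8 * len(X_demo))      # 80% of the rows for training
train_idx, test_idx = perm[:n_train], perm[n_train:]

X_train, y_train = X_demo[train_idx], y_demo[train_idx]
X_test, y_test = X_demo[test_idx], y_demo[test_idx]

print(X_train.shape, X_test.shape)
```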

### Exercise

1. **Increase the number of layers in the neural network.** Observe any changes in accuracy.
2. **Change the optimizer from Adam to [Stochastic Gradient Descent (SGD)](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html).** Evaluate how this affects the training loss and the final accuracy.