### Neural Networks

Neural networks are computational models inspired by the human brain, designed to recognize patterns and
make decisions based on data. They consist of interconnected layers of nodes, or "neurons," which process
and transform input information. Through training, neural networks learn to improve their accuracy in tasks such as image recognition and language processing. Each layer performs an operation on the data it receives and passes the result to the next layer.

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

In [11]:
import os
# The jupyter notebook is launched from your $HOME directory.
# Change the working directory to the workshop directory
# which was created in your username directory under /scratch/vp91
os.chdir(os.path.expandvars("/scratch/vp91/$USER/"))

### Dataset
The Pima Indians Diabetes dataset is a popular dataset in the field of machine learning and statistics, particularly for those working on classification problems. 

Dataset Overview:

* **Source**: The dataset was created by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and is available in the UCI Machine Learning Repository.
* **Purpose**: The dataset is used to predict the onset of diabetes within five years based on diagnostic measures.
* **Features**: The dataset contains 768 samples, each with 8 features.

The features are:

1. Pregnancies: Number of times pregnant.
2. Glucose: Plasma glucose concentration (mg/dL) at 2 hours in an oral glucose tolerance test.
3. Blood Pressure: Diastolic blood pressure (mm Hg) at the time of screening.
4. Skin Thickness: Triceps skinfold thickness (mm) measured at the back of the upper arm.
5. Insulin: 2-Hour serum insulin (mu U/ml).
6. BMI: Body mass index (weight in kg/(height in m)^2).
7. Diabetes Pedigree Function: A function that scores likelihood of diabetes based on family history.
8. Age: Age of the individual (years).

**Outcome**: Whether or not the individual has diabetes (1 for positive, 0 for negative).

In [7]:
!head /scratch/vp91/$USER/intro-to-pytorch/data/pima-indians-diabetes.data.csv

6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
5,116,74,0,0,25.6,0.201,30,0
3,78,50,32,88,31.0,0.248,26,1
10,115,0,0,0,35.3,0.134,29,0
2,197,70,45,543,30.5,0.158,53,1
8,125,96,0,0,0.0,0.232,54,1


In [13]:
datapath = os.path.expandvars('/scratch/vp91/$USER/intro-to-pytorch/data/pima-indians-diabetes.data.csv')
print(datapath)

/scratch/vp91/jxj900/intro-to-pytorch/data/pima-indians-diabetes.data.csv


### Curate the dataset
Load the dataset and split it into the input features (X) and the output variable (y).

In [19]:
dataset = np.loadtxt(datapath, delimiter=',')
X = dataset[:,0:8] 
y = dataset[:,8]
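
The column slicing above can be checked on a small scale. The following is a minimal sketch that parses three rows copied from the `head` output shown earlier, rather than the full file; the names `sample_csv`, `sample`, `X_sample`, and `y_sample` are illustrative and not part of the workshop code.

```python
import io
import numpy as np

# Three rows copied from the `head` output above.
# Columns 0-7 are the features; column 8 is the outcome.
sample_csv = """6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""

# np.loadtxt accepts a file-like object as well as a path.
sample = np.loadtxt(io.StringIO(sample_csv), delimiter=',')

X_sample = sample[:, 0:8]  # all rows, first 8 columns (features)
y_sample = sample[:, 8]    # all rows, last column (outcome)

print(sample.shape)    # (3, 9)
print(X_sample.shape)  # (3, 8)
print(y_sample)        # [1. 0. 1.]
```

Note that `y_sample` is one-dimensional at this point, with shape `(3,)`.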

### Convert the data to tensors

In [18]:
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
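
The `reshape(-1, 1)` call turns the flat label vector into a single column. A model whose last layer has one output neuron produces one row per sample with one column, so the labels need the same shape when they are compared with the predictions. A minimal sketch with stand-in labels (the values in `y_sample` are made up for illustration):

```python
import numpy as np
import torch

# Stand-in labels; np.loadtxt would return float64 values like these.
y_sample = np.array([1.0, 0.0, 1.0])

y_flat = torch.tensor(y_sample, dtype=torch.float32)
y_col = y_flat.reshape(-1, 1)  # -1 lets PyTorch infer the number of rows

print(y_flat.shape)  # torch.Size([3])
print(y_col.shape)   # torch.Size([3, 1])
print(y_col.dtype)   # torch.float32
```

The conversion to `float32` matters because NumPy loads the data as `float64`, while the weights of `nn.Linear` layers are `float32` by default.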

### Defining the Model

When designing the model, keep the following points in mind:

1. The number of input features expected by the first layer must match the number of feature columns in the dataset (`X_tensor` has 8).
2. A high number of layers can increase computation time, while too few layers may result in poor predictions.
3. Each layer should be followed by an activation function.

In this example, we will use a neural network with three fully connected layers:

1. The input layer expects 8 features.
2. The first hidden layer has 12 neurons, followed by a ReLU activation function.
3. The second hidden layer has 8 neurons, followed by another ReLU activation function.
4. The output layer has one neuron, followed by a sigmoid activation function.

The sigmoid function outputs values between 0 and 1, which can be interpreted as the predicted probability of the positive class (diabetes).
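
To illustrate the sigmoid output range, here is a small sketch that applies `torch.sigmoid` to a few hand-picked values and converts the results to class labels. The 0.5 threshold is an assumption made for this illustration; it is a common default, not something fixed by the model.

```python
import torch

# Raw (pre-activation) values: strongly negative, neutral, strongly positive.
raw = torch.tensor([-4.0, 0.0, 4.0])

probs = torch.sigmoid(raw)        # squashes each value into (0, 1)
preds = (probs >= 0.5).float()    # assumed threshold of 0.5

print(probs)  # roughly 0.018, 0.5, 0.982
print(preds)  # tensor([0., 1., 1.])
```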


In PyTorch, neural networks can be defined using different approaches, and two common ones are the Sequential model and the class-based model.

#### Sequential model

* The Sequential model is a simple, linear stack of layers where each layer has a single input and output. It is useful for straightforward feedforward networks where layers are applied in a sequential order.
* It is easier to use for simple architectures where layers are applied in a linear fashion.
* Defined Using: *torch.nn.Sequential*.

In [22]:
seq_model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

In [23]:
print(seq_model)

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)
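
As a quick sanity check on the architecture, the sketch below rebuilds the same stack, counts its trainable parameters, and runs a forward pass on random input. The random batch is a stand-in for real data and is used only to confirm the shapes.

```python
import torch
import torch.nn as nn

seq_model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

# Each Linear layer has (in_features * out_features) weights plus out_features biases:
# (8*12 + 12) + (12*8 + 8) + (8*1 + 1) = 108 + 104 + 9 = 221
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)  # 221

# Random stand-in batch: 5 samples with 8 features each.
batch = torch.rand(5, 8)
with torch.no_grad():  # no gradients needed for a shape check
    out = seq_model(batch)

print(out.shape)  # torch.Size([5, 1])
```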


#### Class-based model

The class-based model allows you to define a network by subclassing torch.nn.Module. This approach provides greater flexibility and control, making it suitable for complex models and custom behaviors.

* Offers full control over the network architecture, including complex data flows, multiple inputs/outputs, and custom forward methods.
* Custom Forward Pass: You can define complex forward passes and control data flow through the network.
* Dynamic Behavior: Allows for dynamic computations, such as conditional layers or operations.
* Defined Using: Subclass of torch.nn.Module

In [24]:
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(12, 8)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(8, 1)
        self.act_output = nn.Sigmoid()
 
    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x

In [25]:
class_model = PimaClassifier()
print(class_model)

PimaClassifier(
  (hidden1): Linear(in_features=8, out_features=12, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=12, out_features=8, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=8, out_features=1, bias=True)
  (act_output): Sigmoid()
)
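
The flexibility of a custom `forward` method can be illustrated with a small variation. The sketch below is a hypothetical variant, not part of the workshop code: the class name `FlexiblePimaClassifier` and the `return_logits` argument are invented here to show a conditional branch in the forward pass, which `nn.Sequential` cannot express directly.

```python
import torch
import torch.nn as nn

class FlexiblePimaClassifier(nn.Module):
    """Hypothetical variant with a conditional branch in forward()."""

    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 12)
        self.hidden2 = nn.Linear(12, 8)
        self.output = nn.Linear(8, 1)

    def forward(self, x, return_logits=False):
        x = torch.relu(self.hidden1(x))
        x = torch.relu(self.hidden2(x))
        logits = self.output(x)
        if return_logits:
            # Skip the sigmoid and return the raw scores.
            return logits
        return torch.sigmoid(logits)

model = FlexiblePimaClassifier()
batch = torch.rand(4, 8)  # random stand-in batch: 4 samples, 8 features

with torch.no_grad():
    probs = model(batch)
    logits = model(batch, return_logits=True)

# Applying the sigmoid to the raw scores reproduces the default output.
print(torch.allclose(probs, torch.sigmoid(logits)))  # True
```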


### Define the loss function
Binary Cross-Entropy (BCE) Loss: Measures the performance of a classification model whose output is a probability value between 0 and 1. It calculates the difference between the predicted probabilities and the actual binary labels (0 or 1) and penalizes the model more when the predictions are further from the true labels.

$$\mathrm{BCELoss}(y', y) = -\left[\, y \log(y') + (1 - y) \log(1 - y') \,\right]$$

where $y'$ is the predicted output and $y$ is the actual output.

In [26]:
loss_fn = nn.BCELoss()
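
To connect the formula with `nn.BCELoss`, the following sketch computes the loss by hand on three made-up predictions and compares it with the built-in result. The values in `y_pred` and `y_true` are illustrative only. By default, `nn.BCELoss` averages the per-sample losses over the batch, so the manual version takes the mean as well.

```python
import torch
import torch.nn as nn

# Made-up predicted probabilities and true labels, both with shape (3, 1).
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y_true = torch.tensor([[1.0], [0.0], [1.0]])

loss_fn = nn.BCELoss()  # default reduction is 'mean'
builtin = loss_fn(y_pred, y_true)

# Same formula written out, averaged over the batch.
per_sample = -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred))
manual = per_sample.mean()

print(builtin.item())  # about 0.28
print(manual.item())   # same value, up to floating-point rounding
```

The third prediction (0.6 for a positive label) contributes the most to the loss, since it is the furthest from its true label.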