In [2]:
! pip install torch torchvision torchaudio

Collecting torch
  Downloading torch-2.7.0-cp312-cp312-win_amd64.whl.metadata (29 kB)
Collecting torchvision
  Downloading torchvision-0.22.0-cp312-cp312-win_amd64.whl.metadata (6.3 kB)
Collecting torchaudio
  Downloading torchaudio-2.7.0-cp312-cp312-win_amd64.whl.metadata (6.7 kB)
Collecting sympy>=1.13.3 (from torch)
  Downloading sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Downloading torch-2.7.0-cp312-cp312-win_amd64.whl (212.5 MB)

In [3]:
import numpy as np
import torch
import torch.nn as nn
import pandas as pd
from sklearn.preprocessing import StandardScaler
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

**Pandas** and **Scikit-learn** will help us preprocess our dataset. Once everything is ready, we will build our custom PyTorch dataset using **torch.utils.data**.

**Pandas:** we will use Pandas to load, manipulate, and preprocess our raw dataset. Pandas is a powerful Python library for data analysis that allows us to perform various data cleaning, transformation, and exploration tasks.

**Scikit-learn (sklearn):** Scikit-learn provides a range of tools to prepare data for machine learning, such as handling missing values, feature scaling, encoding categorical variables, and splitting the data into training and testing sets. In this notebook we use its **StandardScaler** for feature scaling.

**torch.utils.data:** we will use this module to create a custom dataset that can be fed directly into our PyTorch neural network for training.

In [4]:
#Load the dataset using pandas
#this dataset has 700+ samples and each sample has 7 features and 1 output.
data = pd.read_csv('diabetes.csv')
data

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure,Triceps skin fold thickness,2-Hour serum insulin,Body mass index,Age,Class
0,6,148,72,35,0,33.6,50,positive
1,1,85,66,29,0,26.6,31,negative
2,8,183,64,0,0,23.3,32,positive
3,1,89,66,23,94,28.1,21,negative
4,0,137,40,35,168,43.1,33,positive
...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,63,negative
764,2,122,70,27,0,36.8,27,negative
765,5,121,72,23,112,26.2,30,negative
766,1,126,60,0,0,30.1,47,positive


Now we need to pre-process the dataset. First, we extract X and Y. X = input = all the feature columns, i.e. everything except "Class", which is the output class.
Y = output = only the "Class" column.

In [5]:
# For x: take all the rows (all samples) and the first 7 columns (all features)
# For y: take the last column (which is the label)
# .values converts the selected part of the DataFrame to a NumPy array
x = data.iloc[:, :7].values
# The output classes are strings, so first store them in a list, then convert them to numbers
y_string = list(data.iloc[:, -1])

In [6]:
# Our neural network only understands numbers! So convert the string to labels(numbers)
y_int = []
for string in y_string:
    if string == 'positive':
        y_int.append(1)
    else:
        y_int.append(0)
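The same conversion can also be written without an explicit loop. The following is only a sketch on a small made-up list (the name `y_string_demo` and its values are placeholders, not the real "Class" column): comparing a NumPy array of strings with `'positive'` gives a boolean array, and casting it to float turns `True` into 1 and `False` into 0.

```python
import numpy as np

# Hypothetical stand-in for the "Class" column of the dataset
y_string_demo = ["positive", "negative", "positive", "negative"]

# Element-wise comparison gives a boolean array; the cast maps True -> 1.0, False -> 0.0
y_vectorized = (np.array(y_string_demo) == "positive").astype("float64")

# The explicit loop from the cell above, written as a list comprehension
y_loop = np.array([1 if s == "positive" else 0 for s in y_string_demo], dtype="float64")

print(y_vectorized)
print(np.array_equal(y_vectorized, y_loop))
```

Both versions give the same labels; the vectorized one is simply shorter for large columns.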

Now, convert the **y_int** list to a NumPy array. A neural network works on arrays and matrices, and plain Python lists are slow and do not support matrix operations directly. That's why we first convert everything to a NumPy array and then convert that array to a PyTorch tensor. (We could also convert a list directly to a PyTorch tensor.)

In [7]:
# Now convert to an array
y = np.array(y_int, dtype = 'float64')
# since x has float numbers, so we specify y to be float as well.

In [8]:
x

array([[  6. , 148. ,  72. , ...,   0. ,  33.6,  50. ],
       [  1. ,  85. ,  66. , ...,   0. ,  26.6,  31. ],
       [  8. , 183. ,  64. , ...,   0. ,  23.3,  32. ],
       ...,
       [  5. , 121. ,  72. , ..., 112. ,  26.2,  30. ],
       [  1. , 126. ,  60. , ...,   0. ,  30.1,  47. ],
       [  1. ,  93. ,  70. , ...,   0. ,  30.4,  23. ]])

Each feature has a different range, so we need to normalize the data. Normalization is needed in deep learning to ensure that input features are on a consistent scale. It prevents certain features from dominating others during training, which can help the model converge faster and improve its ability to generalize to new data. The variant we use here, standardization, rescales each feature to have a mean of 0 and a standard deviation of 1.

There are many other reasons why Normalization is needed.
- **Stabilizes Training:** It ensures that when the model learns, it doesn't get confused by the differences in how big or small the data is.
- **Accelerates Convergence:** Speeds up learning by ensuring balanced gradients.
- **Improves Generalization:** Enhances model's ability to make accurate predictions on unseen data.
- **Enhances Gradient Descent:** Facilitates more efficient gradient descent optimization.
- **Facilitates Weight Initialization:** Helps weight initialization methods work effectively.
- **Mitigates Overfitting:** Acts as implicit regularization by reducing the impact of outliers.
- **Ensures Model Robustness:** Makes the model less sensitive to variations in input scale.
- **Compatibility with Activation Functions:** Ensures activations stay within desired ranges.
- **Interpretability:** Enables better understanding of learned features and model behavior.
- **Prevents Numerical Instabilities:** Guards against numerical issues in computations.

 $x' = \frac{x - \mu}{\sigma}$

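This is the formula we are going to apply for normalization, where $\mu$ is the mean and $\sigma$ is the standard deviation of a feature. This technique is called standardization, and scikit-learn implements it as the Standard Scaler.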
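To see what the formula does, here is a minimal sketch that applies it by hand with NumPy to a tiny made-up array (`x_demo` is a placeholder, not our diabetes data). It assumes the population standard deviation (`ddof=0`), which is NumPy's default for `std`.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features on very different scales
x_demo = np.array([[1.0, 100.0],
                   [2.0, 200.0],
                   [3.0, 300.0],
                   [4.0, 400.0]])

mu = x_demo.mean(axis=0)      # per-feature mean
sigma = x_demo.std(axis=0)    # per-feature population standard deviation (ddof=0)
x_scaled = (x_demo - mu) / sigma

print(x_scaled)
print(x_scaled.mean(axis=0))  # approximately 0 for every feature
print(x_scaled.std(axis=0))   # approximately 1 for every feature
```

After scaling, both columns contain the same values even though the raw features differed by a factor of 100, which is exactly the point of putting features on a common scale.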

In [9]:
# Feature Normalization. All features should have the same range.
# We're going to do this using scikit learn library
sc = StandardScaler()
x = sc.fit_transform(x)

In the previous cell, we created an instance of the StandardScaler class, **sc**, to act as our normalization object. Its **fit_transform** method first computes the mean and standard deviation of each feature (fit) and then applies the normalization to our data (transform).

In [10]:
x

array([[ 0.63994726,  0.84832379,  0.14964075, ..., -0.69289057,
         0.20401277,  1.4259954 ],
       [-0.84488505, -1.12339636, -0.16054575, ..., -0.69289057,
        -0.68442195, -0.19067191],
       [ 1.23388019,  1.94372388, -0.26394125, ..., -0.69289057,
        -1.10325546, -0.10558415],
       ...,
       [ 0.3429808 ,  0.00330087,  0.14964075, ...,  0.27959377,
        -0.73518964, -0.27575966],
       [-0.84488505,  0.1597866 , -0.47073225, ..., -0.69289057,
        -0.24020459,  1.17073215],
       [-0.84488505, -0.8730192 ,  0.04624525, ..., -0.69289057,
        -0.20212881, -0.87137393]])

There are other normalization techniques as well, such as mean normalization and min-max scaling.
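For comparison, here is a small sketch of those two alternatives on made-up data (again, `x_demo` is only a placeholder). Min-max scaling maps every feature to the range [0, 1]; mean normalization centers each feature at 0 and divides by its range.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features
x_demo = np.array([[1.0, 100.0],
                   [2.0, 250.0],
                   [4.0, 400.0]])

x_min = x_demo.min(axis=0)
x_max = x_demo.max(axis=0)

# Min-max scaling: x' = (x - min) / (max - min)
x_minmax = (x_demo - x_min) / (x_max - x_min)

# Mean normalization: x' = (x - mean) / (max - min)
x_meannorm = (x_demo - x_demo.mean(axis=0)) / (x_max - x_min)

print(x_minmax)
print(x_meannorm)
```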

In [11]:
# Now we convert the arrays to PyTorch tensors.
x = torch.tensor(x)
y = torch.tensor(y)

In [12]:
x.shape
#768 samples and 7 features

torch.Size([768, 7])

In [13]:
y.shape

torch.Size([768])

Since we'll be using binary cross entropy as our loss function, the target variable (y) must have the same shape as the output of the network, so it must be 2-D: one row per sample and one column.

In [14]:
y = y.unsqueeze(1)
y.shape

torch.Size([768, 1])
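Here is a small sketch of what **unsqueeze(1)** does, using made-up tensors (`y_demo` and `pred_demo` are placeholders; the predictions are invented numbers, not outputs of our model). It adds a dimension of size 1 at position 1, so the labels line up with predictions of shape (N, 1).

```python
import torch

y_demo = torch.tensor([1.0, 0.0, 1.0])   # shape (3,)
y_col = y_demo.unsqueeze(1)               # shape (3, 1)

# view(-1, 1) is an equivalent way to get the same column shape
same = torch.equal(y_col, y_demo.view(-1, 1))

# Hypothetical sigmoid outputs with shape (3, 1), like the output of a network with one output unit
pred_demo = torch.tensor([[0.9], [0.2], [0.7]])
loss_demo = torch.nn.BCELoss()(pred_demo, y_col)

print(y_demo.shape, y_col.shape, same)
print(loss_demo.shape)  # a scalar: the loss averaged over the 3 samples
```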

Let's continue building our custom PyTorch dataset. First, we create a class named "Dataset" that inherits from the ***Dataset*** class we imported from ***torch.utils.data***, which is the base class for creating custom datasets in PyTorch.

In [15]:
class Dataset(Dataset):
    # Note: this class reuses the name of the imported base class, so from here on
    # "Dataset" refers to our custom class. A distinct class name would be safer.

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __getitem__(self, index):
        # We override __getitem__ from the inherited Dataset class because we are
        # building a custom dataset and want it to behave our way.
        # This function is in charge of returning one sample: (features, label).
        return self.x[index], self.y[index]

    def __len__(self):
        # This is also an overridden function: it returns the number of samples.
        return len(self.x)


In [16]:
dataset = Dataset(x,y) # done creating our object

In [17]:
len(dataset)

768

In [18]:
# Load the data to our dataloader for batch processing and shuffling
train_loader = DataLoader(dataset= dataset,
          batch_size = 32,
          shuffle = True)
# shuffle=True mixes up the data for training, while 
#shuffle=False keeps it in order for evaluation.

In [19]:
train_loader

<torch.utils.data.dataloader.DataLoader at 0x19e94a38920>

In [20]:
# Let's have a look at the data loader
print("There are {} batches in the dataset".format(len(train_loader)))
# Note: this loop rebinds x and y, so afterwards they refer to one mini-batch,
# not to the full tensors (the dataset object still holds the full tensors).
for (x,y) in train_loader:
    print("For one iteration (batch), there is:")
    print("Data:   {}".format(x.shape))
    print("Labels: {}".format(y.shape))
    break

There are 24 batches in the dataset
For one iteration (batch), there is:
Data:   torch.Size([32, 7])
Labels: torch.Size([32, 1])


***len(train_loader)*** gives the number of mini-batches that the dataset has been divided into.

The ***for*** loop iterates over the mini-batches in *train_loader*, not over the individual samples inside each mini-batch.

***x.shape*** gives the dimensions of the data for that mini-batch: the batch size and the number of input features.

***y.shape*** gives the dimensions of the labels, which typically correspond to the batch size and the number of classes or regression targets (here, 1 label per sample).
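The number of mini-batches follows from the dataset size and the batch size. Our 768 samples divide evenly into 768 / 32 = 24 batches. The sketch below uses a made-up dataset of 100 random samples (built with `TensorDataset`, which is not used elsewhere in this notebook) to show what happens when the division is not exact: the last batch is simply smaller, unless `drop_last=True` is passed.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 100 samples, 7 features, 1 label each
n_samples, n_features, batch_size = 100, 7, 32
demo_dataset = TensorDataset(torch.randn(n_samples, n_features),
                             torch.zeros(n_samples, 1))

demo_loader = DataLoader(demo_dataset, batch_size=batch_size, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in demo_loader]

print(len(demo_loader), math.ceil(n_samples / batch_size))  # both are 4
print(batch_sizes)                                           # the last batch holds the 4 leftover samples

# With drop_last=True the incomplete final batch is discarded
print(len(DataLoader(demo_dataset, batch_size=batch_size, drop_last=True)))
```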

![my image](diabetes_neural_network.png)

It's up to us whether we want to use 3 hidden layers, 2 hidden layers, or only 1.

In [23]:
# Now let's build the above network
class Model(nn.Module):
    def __init__(self, input_features, output_features):
        super().__init__()
        # Now we're defining the attributes (layers) of our NN
        self.fc1 = nn.Linear(input_features, 5)  # for the 1st layer, output features=5
        self.fc2 = nn.Linear(5, 4)
        self.fc3 = nn.Linear(4, 3)
        self.fc4 = nn.Linear(3, output_features)

        # For the hidden layers we'll use Tanh; for the output layer we'll use Sigmoid,
        # because BCE loss expects probabilities between 0 and 1
        self.sigmoid = nn.Sigmoid()
        self.tanh = nn.Tanh()

    # Defining the functionalities (the forward pass)
    def forward(self, x):
        out = self.fc1(x)
        out = self.tanh(out)
        out = self.fc2(out)
        out = self.tanh(out)
        out = self.fc3(out)
        out = self.tanh(out)
        out = self.fc4(out)
        out = self.sigmoid(out)
        return out
        

We're done building our neural network now. We don't need to code the backpropagation ourselves: all we need to do is supply the forward propagation function, and PyTorch computes the gradients for the backward pass automatically.

fc = fully connected layer (also called a linear layer). Stacking several such layers with activation functions in between gives a multi-layer perceptron.
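Each fully connected layer holds a weight matrix of shape (outputs, inputs) plus one bias per output. As a quick sketch, we can count the learnable parameters for the layer sizes used above (7 → 5 → 4 → 3 → 1); the list `demo_layers` is only for illustration and is separate from our **Model** class.

```python
import torch.nn as nn

# The same layer sizes as fc1..fc4 in the network above
demo_layers = [nn.Linear(7, 5), nn.Linear(5, 4), nn.Linear(4, 3), nn.Linear(3, 1)]

for layer in demo_layers:
    # weight has shape (out_features, in_features); bias has shape (out_features,)
    print(tuple(layer.weight.shape), tuple(layer.bias.shape),
          layer.weight.numel() + layer.bias.numel())

total_params = sum(p.numel() for layer in demo_layers for p in layer.parameters())
print(total_params)  # 40 + 24 + 15 + 4 = 83
```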

In a neural network, there are ***attributes*** and ***functionalities*** that define its architecture and behavior. Here are common attributes and functionalities associated with a neural network:

## Attributes

  **Layers:** A neural network typically consists of multiple layers, including input, hidden, and output layers. Each layer is an attribute of the neural network.

  **Weights and Biases:** Neural networks learn from data by adjusting weights and biases associated with each connection between neurons. These weights and biases are learned during training and are crucial attributes of the network.

  **Activation Functions:** The type of activation functions used in each layer (e.g., ReLU, Sigmoid, Tanh) is an attribute that defines how the neurons in that layer process input.

  **Loss Function:** The loss function used for training, which measures the error between predicted and actual outputs, is an attribute.

  **Optimization Algorithm:** The optimization algorithm used for updating weights and biases during training (e.g., SGD, Adam) is another attribute.

  **Learning Rate:** The learning rate, which determines the step size for weight updates during training, is often an attribute that can be adjusted.

## Functionalities:

  **Forward Propagation:** The neural network should have a functionality to perform forward propagation, which involves passing input data through the network to make predictions.

  **Backpropagation:** During training, the network should be able to compute gradients and update weights and biases using backpropagation, a core functionality for learning from data.

  **Inference:** After training, the network should be capable of making predictions (inference) on new, unseen data.

  **Regularization:** The network can implement regularization techniques (e.g., dropout, L1/L2 regularization) to prevent overfitting.

  **Evaluation:** It should provide functionality for evaluating its performance on validation or test datasets using metrics such as accuracy, precision, recall, etc.

  **Saving and Loading:** Often, neural networks need to save their learned weights and architecture to disk and load them for future use.

  **Hyperparameter Tuning:** Some neural networks may have functionality to search for optimal hyperparameters (e.g., learning rate, batch size) automatically.

  **Visualization:** Tools for visualizing model architecture, training curves, and feature maps can be helpful for debugging and analysis.

The specific attributes and functionalities of a neural network can vary depending on the type of network (e.g., feedforward, convolutional, recurrent), the problem it is designed to solve (e.g., classification, regression, generation), and the framework or library used for implementation (e.g., PyTorch, TensorFlow).

$ H_{p}(q) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \cdot \log(p(y_{i})) + (1 - y_{i}) \cdot \log(1 - p(y_{i})) \right] $

`cost = -(Y * torch.log(hypothesis) + (1 - Y) * torch.log(1 - hypothesis)).mean()`
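As a sanity check, the sketch below evaluates this formula by hand on three made-up predictions and compares it with PyTorch's built-in **BCELoss**. The values in `hypothesis` and `Y` are invented for illustration only.

```python
import torch

# Hypothetical predicted probabilities and true labels, both of shape (3, 1)
hypothesis = torch.tensor([[0.9], [0.2], [0.6]])
Y = torch.tensor([[1.0], [0.0], [1.0]])

# Binary cross entropy written out term by term
cost = -(Y * torch.log(hypothesis) + (1 - Y) * torch.log(1 - hypothesis)).mean()

# The built-in loss with its default reduction (mean over all elements)
builtin = torch.nn.BCELoss()(hypothesis, Y)

print(cost.item(), builtin.item())  # both approximately 0.2798
```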

In [24]:
# Now, create the network (an object of the Model class)
net = Model(7,1)
# Since our output is either 0 or 1, we use the BCE loss function
# In the BCE loss function, the input and the target should have the same shape
# reduction='mean' --> the losses are averaged over the observations in each mini-batch
# (this replaces the deprecated size_average=True argument)
criterion = torch.nn.BCELoss(reduction='mean')

# Finally, our optimization algorithm!
# We will use SGD with momentum and a learning rate of 0.1
optimizer = torch.optim.SGD(net.parameters(), lr = 0.1, momentum =0.9)



In current PyTorch, ***size_average*** is deprecated in favour of the ***reduction*** argument: **size_average=True** corresponds to **reduction='mean'**, and **size_average=False** corresponds to **reduction='sum'**.

If ***size_average*** is set to **True**, the loss value is the average of the errors for each item in the batch. This is useful when we want the loss to be roughly the same scale, regardless of the batch size.

If ***size_average*** is set to **False**, the loss value is the sum of the errors for each item in the batch. This can be useful when we want to know the total error for the entire batch without considering the batch size.

The choice between **True** and **False** depends on our specific problem and on how we want to interpret the loss value.
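The relationship between the two settings can be checked on a small batch. This is only a sketch with invented numbers (`pred_demo` and `target_demo` are placeholders), written with the **reduction** argument of **BCELoss**: the summed loss equals the averaged loss times the number of items in the batch.

```python
import torch

# Hypothetical batch of 4 predictions and labels
pred_demo = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
target_demo = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

loss_mean = torch.nn.BCELoss(reduction="mean")(pred_demo, target_demo)
loss_sum = torch.nn.BCELoss(reduction="sum")(pred_demo, target_demo)

print(loss_mean.item(), loss_sum.item())
print(torch.allclose(loss_sum, loss_mean * pred_demo.numel()))  # sum = mean * batch size
```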

***net.parameters()*** specifies the parameters (weights and biases) that the optimizer will update during training. We know here **net** is an instance of our neural network model, and ***net.parameters()*** retrieves all the learnable parameters from the model.

In [25]:
# Training the network
# Let's train the parameters (weights and biases) of our network
epochs = 200
for epoch in range(epochs):
    for inputs,labels in train_loader:
        # our tensors were created from float64 NumPy arrays, so convert them
        # to float32 to match the dtype of the network's weights
        inputs = inputs.float()
        labels = labels.float()
        # let's feed our data to the NN
        # Forward prop
        outputs = net(inputs)
        # loss calculation
        loss = criterion(outputs, labels) #predicted val=outputs, Actual val=labels

        # let's go ahead and begin back prop. There are 3 steps for the BP in PyTorch

        # firstly, clear the gradient buffer
        optimizer.zero_grad()

        # secondly, calculate the gradient
        loss.backward()

        # thirdly, update the weights
        optimizer.step() # plain SGD rule: new_weight <-- old_weight - lr*gradient (momentum adds a velocity term)

    # After each epoch, report the loss and accuracy of the last mini-batch of that epoch.
    output = (outputs>0.5).float()  # 0.5 is a threshold value.
    accuracy = (output == labels).float().mean()
    # Print statistics
    print("Epoch {}/{}, Loss: {:.3f}, Accuracy: {:.3f}".format(epoch+1, epochs, loss, accuracy))
        

Epoch 1/200, Loss: 0.606, Accuracy: 0.719
Epoch 2/200, Loss: 0.512, Accuracy: 0.781
Epoch 3/200, Loss: 0.375, Accuracy: 0.906
Epoch 4/200, Loss: 0.560, Accuracy: 0.688
Epoch 5/200, Loss: 0.603, Accuracy: 0.625
Epoch 6/200, Loss: 0.433, Accuracy: 0.781
Epoch 7/200, Loss: 0.588, Accuracy: 0.781
Epoch 8/200, Loss: 0.522, Accuracy: 0.781
Epoch 9/200, Loss: 0.560, Accuracy: 0.656
Epoch 10/200, Loss: 0.421, Accuracy: 0.844
Epoch 11/200, Loss: 0.488, Accuracy: 0.688
Epoch 12/200, Loss: 0.363, Accuracy: 0.875
Epoch 13/200, Loss: 0.593, Accuracy: 0.625
Epoch 14/200, Loss: 0.513, Accuracy: 0.750
Epoch 15/200, Loss: 0.389, Accuracy: 0.844
Epoch 16/200, Loss: 0.343, Accuracy: 0.844
Epoch 17/200, Loss: 0.442, Accuracy: 0.750
Epoch 18/200, Loss: 0.511, Accuracy: 0.750
Epoch 19/200, Loss: 0.333, Accuracy: 0.875
Epoch 20/200, Loss: 0.497, Accuracy: 0.656
Epoch 21/200, Loss: 0.563, Accuracy: 0.656
Epoch 22/200, Loss: 0.536, Accuracy: 0.656
Epoch 23/200, Loss: 0.425, Accuracy: 0.812
Epoch 24/200, Loss: 

There are 3 steps for the backward propagation in PyTorch. 
### Clearing the Gradient Buffer:
First, clear the gradient buffer so that gradients do not accumulate. PyTorch adds newly computed gradients to whatever is already stored, so if gradients are not cleared between batches, the gradients from the current batch are added to the gradients from previous batches. As a result, the parameter updates in subsequent batches would be influenced by the cumulative gradients, potentially leading to incorrect updates and unstable training. To avoid this, it is common practice to clear the gradient buffer (i.e., set all gradients to zero) for each batch before the backward pass. This ensures that the gradients computed for the current batch are independent of any previous batches. In PyTorch this is done with **optimizer.zero_grad()**.
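The accumulation is easy to see on a single made-up parameter. This is a minimal sketch, not part of our training loop: `w` is a placeholder tensor, and the gradient is cleared directly with `w.grad.zero_()` instead of going through an optimizer.

```python
import torch

# One hypothetical parameter; the "loss" 3*w has gradient 3 with respect to w
w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()         # gradient after the first backward pass

(3 * w).sum().backward()       # no clearing in between
accumulated = w.grad.clone()   # the new gradient was added to the old one

w.grad.zero_()                 # clear the buffer
(3 * w).sum().backward()
after_reset = w.grad.clone()   # back to the gradient of a single pass

print(first, accumulated, after_reset)  # tensor([3.]) tensor([6.]) tensor([3.])
```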
### Calculating the Gradient (Backward Pass):
Second, we calculate the gradients. We are not applying the weight update rule yet; all we do at this point is compute the gradients by calling **loss.backward()**. This performs the backpropagation and computes the gradient of the loss with respect to each model parameter, so every parameter ends up with a gradient of its own shape. This step is the backpropagation itself.

### Updating the Weights (Optimization Step):
Third, update the model's weights based on the computed gradients. This step applies the optimization algorithm (e.g., stochastic gradient descent) to make small adjustments to the weights in the direction that reduces the loss. **optimizer.step()** applies the weight update rule to all model parameters, using the gradients computed in the backward pass.
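To make the update rule concrete, here is a sketch with one made-up parameter and plain SGD (no momentum, which is an assumption for this example; our training cell uses momentum = 0.9, where later updates also include a velocity term). The names `w_demo` and `demo_optimizer` are placeholders.

```python
import torch

w_demo = torch.tensor([2.0], requires_grad=True)
demo_optimizer = torch.optim.SGD([w_demo], lr=0.1)   # plain SGD, no momentum

loss_demo = (w_demo ** 2).sum()   # d(loss)/dw = 2 * w = 4

demo_optimizer.zero_grad()        # step 1: clear the gradient buffer
loss_demo.backward()              # step 2: compute the gradient

# What the update rule predicts: new_weight = old_weight - lr * gradient
expected = w_demo.detach().clone() - 0.1 * w_demo.grad

demo_optimizer.step()             # step 3: update the weight in place

print(w_demo.item(), expected.item())  # both approximately 1.6
```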


## Training Accuracy Calculation:

**output = (outputs > 0.5).float()**: Converts the model's output to binary values (0 or 1) by comparing them to a threshold (0.5).

**accuracy = (output == labels).float().mean()**: Calculates the training accuracy by comparing the binary predictions (**output**) to the actual labels (**labels**) and computing the mean accuracy for the current mini-batch.
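Because this accuracy comes from a single mini-batch, it jumps around from epoch to epoch. A steadier number can be obtained by counting correct predictions over every batch in the loader. The helper below is a sketch of that idea, not code from this notebook: `evaluate_accuracy` and the `AlwaysPositive` stand-in model are hypothetical names used only for the demonstration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader, threshold=0.5):
    """Fraction of correct binary predictions over all batches in the loader."""
    model.eval()                      # switch to evaluation mode
    correct, total = 0, 0
    with torch.no_grad():             # no gradients are needed for evaluation
        for inputs, labels in loader:
            preds = (model(inputs.float()) > threshold).float()
            correct += (preds == labels.float()).sum().item()
            total += labels.numel()
    return correct / total

class AlwaysPositive(nn.Module):
    """Hypothetical stand-in model that predicts probability 1 for every sample."""
    def forward(self, x):
        return torch.ones(x.shape[0], 1)

# Four made-up samples, three of which are truly positive
demo_x = torch.zeros(4, 7)
demo_y = torch.tensor([[1.0], [1.0], [0.0], [1.0]])
demo_loader = DataLoader(TensorDataset(demo_x, demo_y), batch_size=2)

demo_accuracy = evaluate_accuracy(AlwaysPositive(), demo_loader)
print(demo_accuracy)  # 0.75
```

With our own objects, the call would look like `evaluate_accuracy(net, train_loader)`.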