# A Simple Neural Network in PyTorch

This tutorial builds a simple feed-forward neural network in PyTorch and trains it as a binary classifier on a diabetes dataset, predicting the `Outcome` column from the remaining columns.

### Step 1: Import Libraries

In [1]:
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

### Step 2: Load the Dataset

In [2]:
# Load the dataset
data = pd.read_csv("https://raw.githubusercontent.com/yangliuiuk/data/main/diabetes.csv")

# Display the first few rows of the dataset
data.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


### Step 3: Preprocess the Data

In [3]:
# Split the data into features (X) and target variable (y)
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert the data to PyTorch tensors.
# X_train and X_test are already NumPy arrays, because the scaler returns arrays.
# y_train and y_test are still pandas Series, so .values extracts the underlying NumPy array.
# unsqueeze(1) reshapes the targets from (N,) to (N, 1) so they match the model's output shape.
X_train_tensor = torch.tensor(X_train.astype('float32'))
X_test_tensor = torch.tensor(X_test.astype('float32'))
y_train_tensor = torch.tensor(y_train.values.astype('float32')).unsqueeze(1)
y_test_tensor = torch.tensor(y_test.values.astype('float32')).unsqueeze(1)
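
The `unsqueeze(1)` call matters because `nn.BCELoss` expects predictions and targets of the same shape, and the network's final layer produces one column per sample. The following is a minimal sketch of the reshaping, using a made-up three-element label Series (the values are illustrative and not taken from the dataset):

```python
import pandas as pd
import torch

# Hypothetical labels standing in for y_train
labels = pd.Series([1, 0, 1])

flat = torch.tensor(labels.values.astype('float32'))  # shape: (3,)
column = flat.unsqueeze(1)                             # shape: (3, 1)

print(flat.shape)    # torch.Size([3])
print(column.shape)  # torch.Size([3, 1])
```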

### Step 4: Define the Neural Network Architecture

In [4]:
class NeuralNetwork(nn.Module):
    def __init__(self, input_size):
        super(NeuralNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, 16)  # Input layer to hidden layer
        self.fc2 = nn.Linear(16, 8)           # Hidden layer to hidden layer
        self.fc3 = nn.Linear(8, 1)            # Hidden layer to output layer
        self.relu = nn.ReLU()                  # Activation function

    def forward(self, x):
        x = self.relu(self.fc1(x))  # Pass through first hidden layer
        x = self.relu(self.fc2(x))  # Pass through second hidden layer
        x = torch.sigmoid(self.fc3(x))  # Pass through output layer with sigmoid activation
        return x
    

In PyTorch, **nn.Linear** is a class that represents a fully connected (linear) layer. Each of the `fc1`, `fc2`, and `fc3` layers in the model above is an nn.Linear that maps the outputs of one layer to the inputs of the next.

The nn.Linear layer multiplies the input by the transpose of its weight matrix and adds a bias term. Mathematically, the output of an nn.Linear layer can be represented as:

output = input × weightᵀ + bias

where weight has shape (out_features, in_features) and bias has shape (out_features).

Here's a brief explanation of the parameters:

- in_features: The number of input features or dimensions.
- out_features: The number of output features or dimensions.
- bias: A boolean parameter indicating whether to include a bias term in the linear transformation. If set to True (the default), a learnable bias vector is added to the output.

When you create an instance of nn.Linear, you specify the input and output dimensions. For example, nn.Linear(10, 5) creates a linear layer with 10 input features and 5 output features.
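
As a quick check of the description above, the sketch below creates the `nn.Linear(10, 5)` layer from the example, passes a random batch through it, and reproduces the result by hand. The batch size of 3 and the variable names are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(10, 5)   # 10 input features, 5 output features
x = torch.randn(3, 10)     # a batch of 3 samples with 10 features each

out = layer(x)

# The same computation written out by hand: input x weight^T + bias
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)  # torch.Size([5, 10])
print(layer.bias.shape)    # torch.Size([5])
print(out.shape)           # torch.Size([3, 5])
print(torch.allclose(out, manual, atol=1e-5))
```

Note that the weight matrix is stored as (out_features, in_features), which is why the transpose appears in the manual version.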

In PyTorch, **nn.ReLU** is a class representing the Rectified Linear Unit (ReLU) activation function. It's one of the most commonly used activation functions in deep learning models.

The ReLU function is defined as:

f(x)=max(0,x)

It simply replaces any negative input with zero, while leaving positive values unchanged. Geometrically, it looks like a ramp starting from the origin.

In the context of neural networks, nn.ReLU is applied element-wise to the output of a linear transformation (or any other layer) to introduce non-linearity into the model. This non-linearity is crucial for the network to learn complex patterns and relationships in the data.

Here's a brief overview of how nn.ReLU works:

- For each element in the input tensor, if the element is less than zero, it is replaced with zero. Otherwise, it remains unchanged.
- The operation is performed independently on each element of the input tensor.
- The output tensor has the same shape as the input tensor.
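
The behavior listed above can be seen on a small hand-written tensor. This is only an illustrative sketch; the input values are arbitrary.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

x = torch.tensor([[-2.0, 0.0, 3.0],
                  [1.5, -0.5, -4.0]])
y = relu(x)

print(y)
# tensor([[0.0000, 0.0000, 3.0000],
#         [1.5000, 0.0000, 0.0000]])
print(y.shape == x.shape)  # True
```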

### Step 5: Initialize the Neural Network

In [5]:
# Create an instance of the neural network, sized to the number of input features
input_size = X_train_tensor.shape[1]  # Number of features
model = NeuralNetwork(input_size)
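
Before training, it can be useful to confirm that a freshly created model produces outputs of the expected shape and range. The sketch below repeats the class definition so that it runs on its own, and it assumes 8 input features and a random dummy batch purely for illustration; in the notebook, `input_size` comes from `X_train_tensor.shape[1]`.

```python
import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, 16)
        self.fc2 = nn.Linear(16, 8)
        self.fc3 = nn.Linear(8, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        return torch.sigmoid(self.fc3(x))

model = NeuralNetwork(input_size=8)  # 8 is an assumed feature count for this sketch
dummy_batch = torch.randn(4, 8)      # 4 random "samples", not real data

with torch.no_grad():
    probs = model(dummy_batch)

# Count the trainable parameters: (8*16 + 16) + (16*8 + 8) + (8*1 + 1)
n_params = sum(p.numel() for p in model.parameters())

print(probs.shape)  # torch.Size([4, 1])
print(n_params)     # 289
```

Because the last step is a sigmoid, every output lies between 0 and 1 and can be read as the predicted probability of the positive class.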

### Step 6: Define Loss Function and Optimizer

In [6]:
# Define loss function and optimizer
criterion = nn.BCELoss()  # Binary cross-entropy loss for binary classification
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer
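
To make the loss function less of a black box, the sketch below compares `nn.BCELoss` with the binary cross-entropy formula written out by hand, -mean(y·log(p) + (1 - y)·log(1 - p)). The probabilities and targets are made-up values chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels
probs = torch.tensor([[0.9], [0.2], [0.6]])
targets = torch.tensor([[1.0], [0.0], [1.0]])

loss_builtin = nn.BCELoss()(probs, targets)

# Binary cross-entropy written out explicitly, averaged over the samples
loss_manual = -(targets * torch.log(probs)
                + (1 - targets) * torch.log(1 - probs)).mean()

print(loss_builtin.item(), loss_manual.item())
print(torch.allclose(loss_builtin, loss_manual))
```

The loss is small when confident predictions are correct and grows quickly when a confident prediction is wrong, which is what drives the weight updates during training.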

### Step 7: Train the Model

In [7]:
# Training the model
num_epochs = 1000
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass and optimization
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Print progress
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluate the model on the training set
with torch.no_grad():
    outputs = model(X_train_tensor)
    predicted = (outputs >= 0.5).float()
    accuracy = (predicted == y_train_tensor).sum().item() / len(y_train_tensor)
    print(f'Accuracy on training set: {accuracy:.4f}')

Epoch [100/1000], Loss: 0.5447
Epoch [200/1000], Loss: 0.4372
Epoch [300/1000], Loss: 0.4192
Epoch [400/1000], Loss: 0.4021
Epoch [500/1000], Loss: 0.3857
Epoch [600/1000], Loss: 0.3708
Epoch [700/1000], Loss: 0.3593
Epoch [800/1000], Loss: 0.3480
Epoch [900/1000], Loss: 0.3369
Epoch [1000/1000], Loss: 0.3274
Accuracy on training set: 0.8632


### Step 8: Evaluate the Model

In [8]:
# Evaluate the model on the test set
with torch.no_grad():
    outputs = model(X_test_tensor)
    predicted = (outputs >= 0.5).float()
    accuracy = (predicted == y_test_tensor).sum().item() / len(y_test_tensor)
    print(f'Accuracy on test set: {accuracy:.4f}')

Accuracy on test set: 0.7403
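
The accuracy computation used in the evaluation cells can be traced on a tiny example. The four probabilities and labels below are invented for illustration; they show how thresholding at 0.5 turns probabilities into class predictions and how the comparison counts the correct ones.

```python
import torch

# Hypothetical model outputs (probabilities) and true labels
outputs = torch.tensor([[0.80], [0.30], [0.55], [0.10]])
targets = torch.tensor([[1.0], [0.0], [0.0], [0.0]])

predicted = (outputs >= 0.5).float()  # -> [[1.], [0.], [1.], [0.]]
correct = (predicted == targets).sum().item()
accuracy = correct / len(targets)

print(correct)   # 3
print(accuracy)  # 0.75
```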


### Step 9: Tuning the Neural Network
Comparing the accuracy on the test set (0.7403) with the accuracy on the training set (0.8632) shows that the model is overfitting: it performs noticeably better on the data it was trained on than on unseen data. A likely cause is that the network is too complex for the amount of training data available. There are many methods to address this issue, including:

- **Increase the Size of the Training Data**: Providing more diverse and representative data to the model can help it learn the underlying patterns better and reduce overfitting.
- **Early Stopping**: Monitor the performance of the model on a separate validation dataset during training and stop training when the performance on the validation set starts to degrade.
- **Regularization Techniques**:
  - **L2 Regularization (Weight Decay)**: Add a penalty term to the loss function that penalizes large weights, preventing them from becoming too large and causing overfitting.
  - **Dropout**: Randomly drop neurons during training to prevent them from relying too much on each other and thus reduce overfitting.
  - **Batch Normalization**: Normalize the activations of each layer to reduce internal covariate shift, which can help stabilize and regularize the training process.
- **Reduce Model Complexity**:
  - **Decrease the Number of Parameters**: Reduce the number of layers or neurons in the network to decrease its capacity and make it less prone to overfitting.
  - **Simplify the Architecture**: Use a simpler architecture so that the model cannot learn overly complex representations.
- **Cross-Validation**: Use techniques like k-fold cross-validation to evaluate the model's performance on multiple validation sets and obtain a more robust estimate of its generalization performance.
- **Data Augmentation**: Generate additional training data by applying transformations to the existing data (for image data, these include rotation, scaling, or cropping), which can help the model generalize better.
- **Ensemble Learning**: Train multiple models with different architectures or random initializations and combine their predictions to improve generalization performance.
- **Hyperparameter Tuning**: Experiment with different hyperparameters such as learning rate, batch size, and optimizer to find the configuration that reduces overfitting.
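
Several of the techniques listed above can be combined in a few lines. The following is a minimal sketch, not a tuned solution: it adds dropout layers, enables L2 regularization through Adam's `weight_decay` argument, and stops early when the validation loss stops improving. The synthetic data, the class name `RegularizedNetwork`, the dropout rate, the weight decay value, and the patience of 20 epochs are all assumptions made for illustration; with the diabetes data, a validation split would be carved out of the training set instead.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in data: 200 samples, 8 features, noisy binary labels
X = torch.randn(200, 8)
y = (X[:, :1] + 0.5 * torch.randn(200, 1) > 0).float()
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

class RegularizedNetwork(nn.Module):
    def __init__(self, input_size, p_drop=0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 16), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(16, 8), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(8, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

model = RegularizedNetwork(input_size=8)
criterion = nn.BCELoss()
# weight_decay applies an L2 penalty to the parameters
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

best_val_loss = float('inf')
best_state = None
patience, bad_epochs = 20, 0
epochs_run = 0

for epoch in range(1000):
    model.train()                  # dropout is active in training mode
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()                   # dropout is disabled in evaluation mode
    with torch.no_grad():
        val_loss = criterion(model(X_val), y_val).item()

    epochs_run = epoch + 1
    if val_loss < best_val_loss:
        # Keep a copy of the best weights seen so far
        best_val_loss = val_loss
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping
            break

model.load_state_dict(best_state)  # restore the best weights
print(f'Stopped after {epochs_run} epochs, best validation loss: {best_val_loss:.4f}')
```

Calling `model.train()` and `model.eval()` matters once dropout is present, because dropout should only be active while training. The same applies when computing the final accuracy on the test set.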