<h1>Building and Training a Simple Neural Network in PyTorch</h1>

Now that we have a grasp of tensors and autograd, let's build a simple neural network from scratch using PyTorch.<br>
We'll cover <b>defining the network architecture</b>, <b>preparing the dataset</b>, <b>defining the loss function and optimizer</b>, and <b>training the model</b>.

Step-by-Step Guide to Building a Neural Network
<ol>
<li><b>Define the Neural Network Architecture:</b> Create a class inheriting from nn.Module.</li>
<li><b>Prepare the Dataset:</b> Use PyTorch's Dataset and DataLoader classes.</li>
<li><b>Define the Loss Function and Optimizer:</b> Use built-in loss functions and optimizers.</li>
<li><b>Train the Model:</b> Implement the training loop.</li>
</ol>

<h4>1. Define the Neural Network Architecture</h4>
We'll define a simple feedforward neural network with one hidden layer.

In [14]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.hidden = nn.Linear(2, 5)  # Hidden layer: 2 input features -> 5 neurons; its 5 outputs are the inputs to the next layer
        self.output = nn.Linear(5, 1)  # Output layer: 5 input features -> 1 output feature
    '''
    Changing the number of neurons in the hidden layer (5 in the example above) changes the model's capacity to learn and represent the data.

    Case 1: Increasing the number of neurons
            Effects:
            a) Capacity: More hidden neurons let the model learn and represent more complex functions.
            This can improve performance on complex datasets.
            b) Overfitting: A larger model can overfit the training data more easily, especially if the dataset is small or noisy.
            Regularization techniques such as dropout or weight decay may be needed to reduce this risk.
            c) Computational cost: More neurons mean more parameters to train, which increases computation and memory usage during training and
            inference.

    Case 2: Decreasing the number of neurons
            Effects:
            a) Capacity: Fewer hidden neurons reduce the model's capacity to learn and represent complex functions.
            This can lead to underfitting, where the model fails to capture important patterns in the data.
            b) Generalization: A smaller model is less likely to overfit and may generalize better on simpler or smaller datasets.
            c) Computational cost: Fewer neurons mean fewer parameters to train, which reduces computation and memory usage.

    Choosing the number of neurons
    The best number of neurons in the hidden layer depends on several factors:
        a) Dataset complexity: More complex datasets generally need more neurons to capture the underlying patterns.
        b) Dataset size: Larger datasets can support larger models without overfitting, while smaller datasets may call for smaller models.
        c) Generalization: The model's capacity must be balanced to avoid both underfitting and overfitting. Cross-validation and hyperparameter tuning can help find a good number of neurons.

    '''
    def forward(self, x):
        x = torch.relu(self.hidden(x))  # Apply ReLU activation
        x = self.output(x)
        return x

# Instantiate the network
net = SimpleNet()
print(net)

SimpleNet(
  (hidden): Linear(in_features=2, out_features=5, bias=True)
  (output): Linear(in_features=5, out_features=1, bias=True)
)
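
The link between hidden-layer width and parameter count mentioned in the comments above can be checked directly. The following is a minimal sketch, not part of the original notebook: the helper name <code>count_parameters</code> is hypothetical, and it rebuilds the same 2 → hidden → 1 layout with <code>nn.Sequential</code> purely for brevity.

```python
import torch.nn as nn


def count_parameters(hidden_size: int) -> int:
    """Count trainable parameters of a 2 -> hidden_size -> 1 network.

    Hypothetical helper for illustration; mirrors the SimpleNet layout.
    """
    model = nn.Sequential(
        nn.Linear(2, hidden_size),   # weights: 2 * hidden_size, biases: hidden_size
        nn.ReLU(),                   # no parameters
        nn.Linear(hidden_size, 1),   # weights: hidden_size, biases: 1
    )
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


for size in (5, 16, 64):
    print(f"hidden_size={size}: {count_parameters(size)} parameters")
```

For this layout the count works out to 4 × hidden_size + 1, so the 5-neuron network above has 21 trainable parameters.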


<h4>2. Prepare the Dataset</h4>
We'll create a simple synthetic dataset for training.

In [20]:
from torch.utils.data import Dataset, DataLoader

# Create a synthetic dataset
class SimpleDataset(Dataset):
    def __init__(self):
        self.data = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]]) # (4,2)
        self.targets = torch.tensor([[1.0], [2.0], [3.0], [4.0]]) # (4,1)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]

# Instantiate the dataset and dataloader
dataset = SimpleDataset()
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)  # Yields batches of 2 samples, reshuffled every epoch

for data, target in dataloader:
    print("Data:", data)
    print("Target:", target)


Data: tensor([[2., 3.],
        [1., 2.]])
Target: tensor([[2.],
        [1.]])
Data: tensor([[4., 5.],
        [3., 4.]])
Target: tensor([[4.],
        [3.]])
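
When the data already lives in tensors, a custom <code>Dataset</code> subclass is not strictly required. As a hedged alternative sketch, PyTorch's built-in <code>TensorDataset</code> can wrap the same tensors; the variable names below are illustrative, and <code>shuffle=False</code> is used only so the batches come out in a fixed order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Same toy data as SimpleDataset above
features = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
targets = torch.tensor([[1.0], [2.0], [3.0], [4.0]])

# TensorDataset pairs up rows of the tensors it is given
tensor_dataset = TensorDataset(features, targets)
loader = DataLoader(tensor_dataset, batch_size=2, shuffle=False)

# 4 samples with batch_size=2 give 2 batches per epoch
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(len(tensor_dataset), len(loader), batch_shapes)
```

A custom class like <code>SimpleDataset</code> is still the better fit when samples need per-item loading or transformation.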


<h4>3. Define the Loss Function and Optimizer</h4>
We'll use Mean Squared Error (MSE) as the loss function and Stochastic Gradient Descent (SGD) as the optimizer.

In [25]:
# Define the loss function and optimizer
criterion = nn.MSELoss()                            # Mean Squared Error (MSE)
optimizer = optim.SGD(net.parameters(), lr=0.01)    # Stochastic Gradient Descent (SGD) 
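
To make the loss concrete, the sketch below computes MSE by hand and compares it with <code>nn.MSELoss</code> using its default mean reduction. The prediction and target values are made up for illustration and do not come from the network above.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets, shape (3, 1)
pred = torch.tensor([[1.5], [2.0], [2.5]])
target = torch.tensor([[1.0], [2.0], [4.0]])

# MSE = mean of squared differences over all elements
manual_mse = ((pred - target) ** 2).mean()

# Built-in loss with the default reduction ("mean")
builtin_mse = nn.MSELoss()(pred, target)

print(manual_mse.item(), builtin_mse.item())
```

Here the squared errors are 0.25, 0 and 2.25, so both values are 2.5 / 3 ≈ 0.8333.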

<h4>4. Train the Model</h4>
Implement the training loop to train the model over multiple epochs.

In [30]:
# Training loop
num_epochs = 1000

for epoch in range(num_epochs):
    for data, target in dataloader:
        # Zero gradients
        optimizer.zero_grad()     # Clears gradients to avoid accumulation.
        
        # Forward pass
        output = net(data)
        
        # Compute loss
        loss = criterion(output, target)
        
        # Backward pass (compute gradients)
        loss.backward()
        
        # Update weights
        optimizer.step()
    
    if (epoch + 1) % 100 == 0:           # Every 100 epochs, print the loss of the last batch in that epoch
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')


Epoch [100/1000], Loss: 0.0169
Epoch [200/1000], Loss: 0.0013
Epoch [300/1000], Loss: 0.0002
Epoch [400/1000], Loss: 0.0002
Epoch [500/1000], Loss: 0.0000
Epoch [600/1000], Loss: 0.0000
Epoch [700/1000], Loss: 0.0000
Epoch [800/1000], Loss: 0.0000
Epoch [900/1000], Loss: 0.0000
Epoch [1000/1000], Loss: 0.0000
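
To show what <code>optimizer.step()</code> does in the loop above, the following sketch applies one plain SGD update by hand (parameter minus learning rate times gradient) and compares it with <code>optim.SGD</code>. It assumes SGD without momentum or weight decay, as configured in this notebook; the single <code>nn.Linear</code> layer and the two-sample batch are illustrative stand-ins for the full network.

```python
import copy

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
lr = 0.01
x = torch.tensor([[1.0, 2.0], [2.0, 3.0]])
y = torch.tensor([[1.0], [2.0]])

layer_a = nn.Linear(2, 1)
layer_b = copy.deepcopy(layer_a)  # identical starting weights

# Path 1: let the optimizer apply the update
opt = optim.SGD(layer_a.parameters(), lr=lr)
opt.zero_grad()
nn.functional.mse_loss(layer_a(x), y).backward()
opt.step()

# Path 2: apply the same update manually, p <- p - lr * grad
nn.functional.mse_loss(layer_b(x), y).backward()
with torch.no_grad():  # the update itself must not be tracked by autograd
    for p in layer_b.parameters():
        p -= lr * p.grad

same = all(
    torch.allclose(pa, pb)
    for pa, pb in zip(layer_a.parameters(), layer_b.parameters())
)
print(same)
```

Both paths should leave the two layers with matching weights, which is why skipping <code>zero_grad()</code> matters: stale gradients would be added into <code>p.grad</code> and change the step.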


<hr>

<h4><b>*Additional</b> - Hyperparameter Tuning</h4>

As mentioned in the comments in the first section, the <b>number of neurons</b> in a layer affects the model's performance. <br>
Choosing its value is therefore important. We can pick it by trial and error, gradually increasing the number, or we can use "<b>Hyperparameter Tuning Using Grid Search</b>" to choose the number of neurons.<br>

In [50]:
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Dummy dataset
X = torch.tensor([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = torch.tensor([[1.0], [2.0], [3.0], [4.0]])

# Split dataset
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25)

# Define model class
class SimpleNet(nn.Module):
    def __init__(self, hidden_size):
        super(SimpleNet, self).__init__()
        self.hidden = nn.Linear(2, hidden_size)
        self.output = nn.Linear(hidden_size, 1)

    def forward(self, x):
        x = torch.relu(self.hidden(x))
        x = self.output(x)
        return x

# Function to train and evaluate the model
def train_and_evaluate(hidden_size):
    model = SimpleNet(hidden_size)
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    
    # Training
    for epoch in range(100):
        model.train()
        optimizer.zero_grad()
        outputs = model(X_train)
        loss = criterion(outputs, y_train)
        loss.backward()
        optimizer.step()
    
    # Evaluation
    model.eval()
    with torch.no_grad():
        val_outputs = model(X_val)
        val_loss = mean_squared_error(y_val.numpy(), val_outputs.numpy())
    
    return val_loss

# Hyperparameter tuning
hidden_sizes = [2, 4, 8, 16, 32, 64]
best_hidden_size = None
best_val_loss = float('inf')

for hidden_size in hidden_sizes:
    val_loss = train_and_evaluate(hidden_size)
    print(f'Hidden size: {hidden_size}, Validation Loss: {val_loss}')
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_hidden_size = hidden_size

print(f'\n Best hidden size: {best_hidden_size}, Best validation loss: {best_val_loss}')


Hidden size: 2, Validation Loss: 5.2197489738464355
Hidden size: 4, Validation Loss: 5.365090847015381
Hidden size: 8, Validation Loss: 0.13457150757312775
Hidden size: 16, Validation Loss: 0.09531105309724808
Hidden size: 32, Validation Loss: 0.045698389410972595
Hidden size: 64, Validation Loss: 0.010166753083467484

 Best hidden size: 64, Best validation loss: 0.010166753083467484
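
Because the split and the weight initialization above are both random, and the validation set holds a single sample, the ranking of hidden sizes may change from run to run. One hedged way to make comparisons repeatable is to fix the seeds. The sketch below shows that <code>torch.manual_seed</code> makes initialization reproducible; the helper <code>make_model</code> is hypothetical, and <code>train_test_split</code> accepts a <code>random_state</code> argument that plays the same role for the split.

```python
import torch
import torch.nn as nn


def make_model(hidden_size: int, seed: int) -> nn.Sequential:
    """Build a 2 -> hidden_size -> 1 network with seeded initialization.

    Hypothetical helper for illustration only.
    """
    torch.manual_seed(seed)  # fixes the random initial weights
    return nn.Sequential(
        nn.Linear(2, hidden_size),
        nn.ReLU(),
        nn.Linear(hidden_size, 1),
    )


model_a = make_model(8, seed=0)
model_b = make_model(8, seed=0)

# Same seed, same architecture: the initial parameters match exactly
identical = all(
    torch.equal(pa, pb)
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(identical)
```

With seeds fixed, differences in validation loss between hidden sizes reflect the architecture rather than a lucky initialization, although four samples remain too few to draw firm conclusions.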
