A typical neural network training pipeline involves:

```
Neural Network Training
│
├── Data
│   ├── Dataset Preparation
│   └── DataLoader
│
├── Model Initialization
│
└── Model Training
    ├── Forward Pass
    ├── Loss Calculation
    ├── Backpropagation
    └── Weight Update

```

<img src=https://learnopencv.com/wp-content/uploads/2024/08/c2-Module01-training-neural-networks-02.png height=500>

# Dataset Preparation

Before training begins, it's essential to prepare the dataset properly. The quality of the data directly impacts the model's performance, as the GIGO principle puts it: **Garbage In, Garbage Out**. High-quality, well-curated data lets the model learn meaningful patterns.

[Torchvision datasets](https://pytorch.org/vision/stable/datasets.html#datasets) provides a set of well-structured, ready-to-use datasets for spinning up training quickly, for example:
- MNIST and FashionMNIST
- CIFAR10
- ImageNet (the archives must be downloaded manually)
- Caltech101



```python
from torchvision import datasets, transforms

# Download and load the MNIST dataset
mnist_data = datasets.MNIST(
    root=".",         # Directory where the dataset will be stored
    download=True,    # Download the dataset if it's not already available
    transform=transforms.Compose([
        transforms.ToTensor(),              # Convert image to tensor
        transforms.Normalize((0.5,), (0.5,)) # Normalize with mean and std (for grayscale)
    ])
)

```

Beyond Torchvision, Kaggle competitions and institutional data repositories are other good sources of quality datasets.

## DataLoaders

Once the dataset is prepared, the next step is to load the data in batches using a [torch.utils.data.DataLoader](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#datasets-dataloaders). The DataLoader is a PyTorch utility that efficiently loads data in mini-batches, shuffles it, and handles other aspects such as parallel loading with worker processes. This is crucial for training: it lets the model process the data in manageable chunks and in a randomized order, which helps reduce overfitting by preventing the model from learning spurious patterns in the ordering of the dataset.

```python
from torch.utils.data import DataLoader

# Initialize DataLoader
batch_size = 64
train_loader = DataLoader(mnist_data, batch_size=batch_size, shuffle=True)

# Now 'train_loader' can be used to iterate through the data in mini-batches
for batch in train_loader:
    inputs, labels = batch

```
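To make batching and shuffling concrete, the sketch below uses a small synthetic `TensorDataset` (a hypothetical stand-in for MNIST) in which each sample stores its own index. With 150 samples and `batch_size=64`, the loader yields two full batches and a final partial batch, and every sample is visited exactly once per epoch, in shuffled order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy dataset: 150 samples whose single "feature" is the sample's own index
features = torch.arange(150).float().unsqueeze(1)  # shape (150, 1)
labels = torch.zeros(150, dtype=torch.long)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=64, shuffle=True)

batch_sizes = []
seen = []
for inputs, targets in loader:
    batch_sizes.append(inputs.shape[0])
    seen.extend(inputs.squeeze(1).long().tolist())

print(batch_sizes)                       # [64, 64, 22]: the last batch holds the remainder
print(sorted(seen) == list(range(150)))  # True: every sample appears exactly once
print(seen[:5])                          # a random order that changes from epoch to epoch
```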

# Model Initialization

Next, we move on to model preparation. [Torchvision](https://pytorch.org/vision/stable/models.html#models-and-pre-trained-weights), [HuggingFace](https://huggingface.co/models?other=computer-vision) and [PyTorch Hub](https://pytorch.org/hub/) provide a range of pre-trained models for various computer vision tasks.

These models are pre-trained on large datasets containing millions of images across many classes, and can be fine-tuned on a specific dataset, which saves computational resources and often achieves excellent accuracy.


```python
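import torch
import torch.nn as nn
import torchvision.models as models

num_classes = 10  # number of classes in the training dataset (e.g., 10 for MNIST)

# Load a ResNet-18 pre-trained on ImageNet
# (torchvision >= 0.13 uses `weights=`; older releases used `pretrained=True`)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the final fully connected layer to match the number of classes in the training dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# (OR)

# PyTorch Hub
model = torch.hub.load('datvuthanh/hybridnets', 'hybridnets', pretrained=True)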
```

# Model Training

Training a neural network involves several steps that are repeated over many iterations (steps) and epochs. During each epoch, the model processes the whole dataset in batches, which keeps memory and computational overhead manageable, especially for large datasets.
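As a quick worked example of steps versus epochs: one epoch is one full pass over the dataset and each step processes one batch, so an epoch takes ⌈N / batch size⌉ steps. The sketch below does this arithmetic for the 60,000-image MNIST training split with a batch size of 64; the 5 epochs are an arbitrary assumption.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

num_samples = 60_000  # size of the MNIST training split
batch_size = 64
num_epochs = 5        # hypothetical training length

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)  # 938 4690

# len(DataLoader) reports the same count; drop_last=True discards the final partial batch
dummy = TensorDataset(torch.zeros(num_samples, 1))
print(len(DataLoader(dummy, batch_size=batch_size)))                  # 938
print(len(DataLoader(dummy, batch_size=batch_size, drop_last=True)))  # 937
```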

Key Steps in Training:

- Forward Pass
- Loss Calculation
- Backpropagation
- Weight Updates
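These four steps map onto a short loop. Below is a minimal sketch that runs them end to end on a synthetic, linearly separable classification problem, so it needs no downloads; the data, the tiny `nn.Sequential` model, and the hyperparameters are illustrative assumptions rather than values from this tutorial.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical toy data: 2-D points labelled by which side of the line x + y = 0 they fall on
points = torch.randn(512, 2)
labels = (points.sum(dim=1) > 0).long()
loader = DataLoader(TensorDataset(points, labels), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(10):
    running_loss = 0.0
    for inputs, targets in loader:
        outputs = model(inputs)                   # 1. forward pass
        loss = F.cross_entropy(outputs, targets)  # 2. loss calculation
        loss.backward()                           # 3. backpropagation
        optimizer.step()                          # 4. weight update
        optimizer.zero_grad()                     # reset gradients before the next batch
        running_loss += loss.item() * inputs.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))
    print(f"epoch {epoch + 1}: loss {epoch_losses[-1]:.4f}")
```

The average loss per epoch should trend downward as the weights are updated; each of the four steps is covered in detail in the sections that follow.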


Optimizers adjust the model weights during training based on gradients computed in backpropagation. Here's how we define an optimizer.

```python
import torch.optim as optim

# Initialize the optimizer (e.g., SGD, Adam)
optimizer = optim.SGD(model.parameters(), lr=0.01)
```
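For plain SGD (no momentum or weight decay, which are the defaults of `optim.SGD`), `optimizer.step()` moves each parameter against its gradient, scaled by the learning rate: w ← w − lr · ∇L(w). The short sketch below checks this on a single hypothetical parameter tensor.

```python
import torch
import torch.optim as optim

# A single hypothetical parameter tensor standing in for model weights
w = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
optimizer = optim.SGD([w], lr=0.1)

# A loss whose gradient is easy to verify by hand: d/dw sum(w^2) = 2w
loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # 2 * w = [2., -4., 6.]

expected = w.detach() - 0.1 * w.grad  # w - lr * grad, computed manually (a new tensor)
optimizer.step()                      # updates w in place
print(w.detach())  # approximately [0.8, -1.6, 2.4], matching `expected`
```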

## Forward Pass

At each training step, a batch of input images is passed forward through the network's layers to produce predictions for the entire batch.

```python
for images, targets in train_loader:

    # Forward pass
    outputs = model(images) # Model prediction
```
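Each forward pass returns one row of raw, unnormalized scores (logits) per image. The sketch below uses a tiny hypothetical model (`nn.Flatten` followed by `nn.Linear`) on a random MNIST-shaped batch to show the shapes involved; any classifier with 10 outputs behaves the same way.

```python
import torch
import torch.nn as nn

# Hypothetical minimal classifier for 1x28x28 images and 10 classes
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

images = torch.randn(64, 1, 28, 28)  # a random batch shaped like MNIST
outputs = model(images)               # forward pass over the whole batch

print(images.shape)   # torch.Size([64, 1, 28, 28])
print(outputs.shape)  # torch.Size([64, 10]): one score per class for each image

predictions = outputs.argmax(dim=1)  # predicted class index per image
print(predictions.shape)  # torch.Size([64])
```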


## Loss Calculation

The loss function measures how far the predicted outputs are from the actual targets (the ground truth). Training minimizes this loss, which guides the network toward better predictions.

```python
import torch.nn.functional as F

# Calculate loss (cross-entropy for classification; functional form of nn.CrossEntropyLoss)
loss = F.cross_entropy(outputs, targets)
```
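Cross-entropy takes the raw logits, applies a softmax, and averages the negative log-probability of the correct class over the batch. The sketch below (with made-up logits and targets) reproduces `F.cross_entropy` from that definition.

```python
import torch
import torch.nn.functional as F

# Made-up logits for a batch of 2 samples and 3 classes, plus their true classes
outputs = torch.tensor([[2.0, 0.5, -1.0],
                        [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

# Manual version: log-softmax, pick each sample's true-class log-probability, negate, average
log_probs = F.log_softmax(outputs, dim=1)
manual_loss = -log_probs[torch.arange(len(targets)), targets].mean()

builtin_loss = F.cross_entropy(outputs, targets)  # default reduction is 'mean'
print(manual_loss.item(), builtin_loss.item())    # the two values agree
```

Because `F.cross_entropy` applies the softmax internally, the model should output raw logits rather than probabilities.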



## Backpropagation

Backpropagation is the process of computing the gradient of the loss function with respect to each of the model's trainable parameters (those with `requires_grad=True`). It works by applying the chain rule backward through the network, starting from the loss.

```python
loss.backward()  # Gradient computation
```
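Here is a tiny hand-checkable example of what `loss.backward()` computes. For a scalar loss L = (w·x − y)², the chain rule gives ∂L/∂w = 2(w·x − y)·x; with the arbitrary values w = 2, x = 3, y = 1 this is 2·5·3 = 30, which is what autograd stores in `w.grad`. Tensors created without `requires_grad=True` receive no gradient.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # trainable parameter
x = torch.tensor(3.0)                       # input: requires_grad defaults to False
y = torch.tensor(1.0)                       # target

loss = (w * x - y) ** 2  # (2*3 - 1)^2 = 25
loss.backward()          # backpropagate from the loss to every tensor that requires grad

print(w.grad)  # tensor(30.), i.e. 2 * (w*x - y) * x = 2 * 5 * 3
print(x.grad)  # None: x does not require gradients
```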

## Weight Update

Finally, the optimizer updates the model's weights for each batch using the gradients computed during backpropagation.


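**For plain SGD, the weight update formula is:**

$$ \mathbf{w}_{\text{new}} = \mathbf{w} - \eta \nabla L(\mathbf{w}) $$

where $\eta$ is the learning rate (`lr` in the optimizer) and $\nabla L(\mathbf{w})$ is the gradient of the loss with respect to the weights. Other optimizers, such as Adam, modify this basic rule.

After the weight update, the gradients are reset to zero to prepare for the next step. This is necessary because PyTorch accumulates gradients across `backward()` calls by default.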

```python
# Update weights
optimizer.step()

# Zero the gradients after updating
optimizer.zero_grad()
```
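To see why the reset matters, the sketch below calls `backward()` twice on a hypothetical parameter, once without and once with `zero_grad()` in between. PyTorch adds each new gradient to whatever is already stored in `.grad`.

```python
import torch
import torch.optim as optim

w = torch.tensor(1.0, requires_grad=True)  # hypothetical single parameter
optimizer = optim.SGD([w], lr=0.1)

def compute_loss():
    return 3 * w  # d(loss)/dw = 3

# Without zeroing, the gradients from two backward() calls are summed
compute_loss().backward()
compute_loss().backward()
grad_without_reset = w.grad.item()
print(grad_without_reset)  # 6.0

# After zero_grad(), only the current gradient is kept
optimizer.zero_grad()
compute_loss().backward()
grad_with_reset = w.grad.item()
print(grad_with_reset)  # 3.0
```

Accumulation is sometimes used on purpose to simulate a larger batch, but in the standard loop forgetting `zero_grad()` would mix gradients from different batches.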