## GitHub setup

In [1]:
from google.colab import drive, userdata
import os

# 1. Mount Drive
drive.mount('/content/drive')

# 2. Setup Paths (Change to your actual repo name)
REPO_PATH = "/content/drive/MyDrive/ML/DL_With_Pytorch"
%cd {REPO_PATH}

# 3. Secure Auth
token = userdata.get('GH_TOKEN')
username = "barada02"
repo = "DL_With_Pytorch"
!git remote set-url origin https://{token}@github.com/{username}/{repo}.git

# 4. Identity
!git config --global user.email "Chandanbarada2@gmail.com"
!git config --global user.name "Kumar"

!git pull origin main
print("✅ Environment Ready!")

Mounted at /content/drive
/content/drive/MyDrive/ML/DL_With_Pytorch
From https://github.com/barada02/DL_With_Pytorch
 * branch            main       -> FETCH_HEAD
Already up to date.
✅ Environment Ready!


## Commit and Push

In [12]:
# 2. Push notebook changes to GitHub
# IMPORTANT: Press Ctrl+S (Save) before running this!
!git add .
!git commit -m "Optimization loop"
!git push origin main

[main ed803d8] Optimization loop
 1 file changed, 1 insertion(+), 1 deletion(-)
 rewrite 02_Workflow.ipynb (73%)
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 2 threads
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 2.53 KiB | 216.00 KiB/s, done.
Total 3 (delta 2), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/barada02/DL_With_Pytorch.git
   0491d13..ed803d8  main -> main


# Workflow

# PyTorch Deep Learning Workflow

This notebook demonstrates the **complete end-to-end workflow** for training a neural network in PyTorch. 

## 📋 Workflow Overview:
1. **Setup & Import** - Import necessary libraries
2. **Data Preparation** - Load and prepare datasets
3. **Model Definition** - Create neural network architecture
4. **Training Configuration** - Set hyperparameters, loss function, optimizer
5. **Training Loop** - Train the model
6. **Evaluation** - Test model performance
7. **Model Persistence** - Save and load trained models

This workflow is fundamental to all deep learning projects and can be adapted for various tasks.

## Step 1: Import Required Libraries

**Purpose**: Import PyTorch and related libraries needed for the deep learning workflow.

**Key Imports**:
- `torch` - Core PyTorch library
- `torch.nn` - Neural network modules and building blocks
- `torch.utils.data.DataLoader` - Efficient data loading with batching
- `torchvision.datasets` - Pre-built datasets (FashionMNIST in this case)
- `torchvision.transforms` - Data preprocessing and augmentation

In [3]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
print(torch.__version__)

2.9.0+cpu


In [4]:

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

100%|██████████| 26.4M/26.4M [00:01<00:00, 16.2MB/s]
100%|██████████| 29.5k/29.5k [00:00<00:00, 282kB/s]
100%|██████████| 4.42M/4.42M [00:00<00:00, 5.03MB/s]
100%|██████████| 5.15k/5.15k [00:00<00:00, 22.5MB/s]


## Step 2: Load and Prepare Data

**Purpose**: Download and prepare the FashionMNIST dataset for training and testing.

**What is FashionMNIST?**
- 70,000 grayscale images of clothing items (28x28 pixels)
- 10 classes: T-shirt, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot
- Split into 60,000 training and 10,000 test images

**Key Parameters**:
- `root="data"` - Directory to store the dataset
- `train=True/False` - Specify training or test set
- `download=True` - Automatically download if not present
- `transform=ToTensor()` - Convert PIL images to PyTorch tensors (0-1 range)

In [5]:

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)


## Step 3: Create DataLoaders

**Purpose**: Wrap datasets in DataLoader for efficient batch processing during training.

**Why DataLoader?**
- Automatically batches data into manageable chunks
- Can shuffle data between epochs when `shuffle=True` is passed (this notebook keeps the default `shuffle=False`)
- Enables parallel data loading with multiple workers
- Memory efficient - loads only the current batch

**Batch Size = 64**: Each iteration processes 64 images at once, balancing memory usage and training speed.
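
The batching behavior can be checked on a small synthetic dataset. This is a sketch under assumptions: the 200 random "images" and labels below are stand-ins for FashionMNIST, and `shuffle=True` is enabled here only to show the option.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data (assumption): 200 fake 1x28x28 images with labels 0..9.
images = torch.rand(200, 1, 28, 28)
labels = torch.randint(0, 10, (200,))
dataset = TensorDataset(images, labels)

# shuffle=True reorders the samples at the start of every epoch.
loader = DataLoader(dataset, batch_size=64, shuffle=True)

num_batches = len(loader)
batch_shapes = [tuple(X.shape) for X, y in loader]
print(num_batches, batch_shapes)
```

Because 200 is not a multiple of 64, the last batch is smaller than the others. Pass `drop_last=True` if every batch must have the same size.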

In [6]:


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits


## Step 4: Define the Neural Network

**Purpose**: Create the model architecture that will learn to classify images.

**Architecture Breakdown**:
1. **Input Layer**: 28×28 = 784 pixels (flattened)
2. **Hidden Layer 1**: 784 → 512 neurons with ReLU activation
3. **Hidden Layer 2**: 512 → 512 neurons with ReLU activation
4. **Output Layer**: 512 → 10 neurons (one per class)

**Key Components**:
- `nn.Flatten()` - Converts 2D image (28×28) to 1D vector (784)
- `nn.Linear()` - Fully connected layers (neurons)
- `nn.ReLU()` - Activation function (introduces non-linearity)
- `nn.Sequential()` - Chains layers together

**Forward Pass**: Defines how data flows through the network from input to output.
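
One quick way to sanity-check the architecture is to push a dummy batch through it and count the trainable parameters. The sketch below repeats the class definition so it runs on its own; the random input batch is an assumption for illustration.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        return self.linear_relu_stack(x)


model = NeuralNetwork()

# Dummy batch of 3 fake images (random values, not real data).
dummy_batch = torch.rand(3, 1, 28, 28)
logits = model(dummy_batch)

# Each Linear layer has in_features * out_features weights plus out_features biases.
num_params = sum(p.numel() for p in model.parameters())
print(logits.shape, num_params)
```

The count works out to (784×512 + 512) + (512×512 + 512) + (512×10 + 10) = 669,706 parameters, and the output has one row of 10 logits per input image.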

In [7]:

model = NeuralNetwork()
model

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

## Step 5: Instantiate the Model

**Purpose**: Create an instance of the neural network.

This creates an untrained model with randomly initialized weights. The model is now ready to be trained.

## Hyperparameters

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect model training and convergence rates ([read more](https://docs.pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html)).

We define the following hyperparameters for training:
* Number of Epochs - the number of times to iterate over the dataset

* Batch Size - the number of data samples propagated through the network before the parameters are updated

* Learning Rate - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning, while large values may result in unpredictable behavior during training.

In [9]:
learning_rate = 1e-3
batch_size = 64
epochs = 5

## Step 6: Set Hyperparameters

**Purpose**: Configure the training process parameters.

**Hyperparameter Definitions**:
- **Learning Rate (1e-3 = 0.001)**: Controls how much to adjust weights after each batch
  - Too high → unstable training
  - Too low → slow learning
  
- **Batch Size (64)**: Number of samples processed before updating weights
  - Larger → more stable gradients, more memory
  - Smaller → faster updates, noisier gradients
  
- **Epochs (5)**: Number of times to iterate through the entire dataset
  - More epochs → better learning (but risk overfitting)

These are the "knobs" you tune to improve model performance!
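
To see how these values interact, the following sketch works out how many weight updates they imply for the 60,000-image training set. It is plain arithmetic with the numbers from this step; the variable names are chosen here for illustration.

```python
import math

dataset_size = 60_000  # FashionMNIST training images
batch_size = 64
epochs = 5

# One weight update happens per batch, so updates per epoch = number of batches.
batches_per_epoch = math.ceil(dataset_size / batch_size)
total_updates = batches_per_epoch * epochs

# The final batch of each epoch holds whatever samples are left over.
last_batch_size = dataset_size - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, total_updates, last_batch_size)
```

Halving the batch size would double the number of updates per epoch, which is one reason batch size and learning rate are usually tuned together.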

## Optimization Loop
Once we set our hyperparameters, we can train and optimize our model with an optimization loop. Each iteration of the optimization loop is called an epoch.

**Each epoch consists of two main parts:**

- **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
- **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.

Let's briefly familiarize ourselves with some of the concepts used in the training loop. The complete code for the optimization loop appears in the Full Implementation section.

## Loss Function
When presented with some training data, our untrained network is unlikely to give the correct answer. The loss function measures how far the obtained result is from the target value, and it is the loss function that we want to minimize during training. To calculate the loss, we make a prediction using the inputs of a given data sample and compare it against the true label.

Common loss functions include `nn.MSELoss` (Mean Square Error) for regression tasks and `nn.NLLLoss` (Negative Log Likelihood) for classification. `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss`.

We pass our model's output logits to `nn.CrossEntropyLoss`, which will normalize the logits and compute the prediction error.

In [10]:
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()

## Step 7: Initialize Loss Function

**Purpose**: Measure how far the model's predictions are from actual labels.

**CrossEntropyLoss**:
- Best for multi-class classification (choosing 1 class from many)
- Combines LogSoftmax + NLLLoss
- Lower loss = better predictions

**How it works**: 
- Compares predicted probabilities with true labels
- Penalizes confident wrong predictions more heavily
- Goal: Minimize this loss during training
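
The claim that `nn.CrossEntropyLoss` combines `nn.LogSoftmax` and `nn.NLLLoss` can be checked numerically. The sketch below uses random logits and arbitrary target labels (both assumptions for illustration) and computes the loss both ways.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Fake model outputs for 4 samples and 10 classes, with made-up target labels.
logits = torch.randn(4, 10)
targets = torch.tensor([0, 3, 9, 1])

# Route 1: CrossEntropyLoss applied directly to raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Route 2: LogSoftmax over the class dimension, then NLLLoss.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())
```

Both routes give the same value up to floating-point error, which is why the model returns raw logits and has no softmax layer of its own.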

## Optimizer
Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed (in this example we use Stochastic Gradient Descent). All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; PyTorch also provides many other optimizers, such as Adam and RMSProp, that work better for different kinds of models and data.

We initialize the optimizer by registering the model's parameters that need to be trained, and passing in the learning rate hyperparameter.

In [11]:
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

## Step 8: Initialize Optimizer

**Purpose**: Algorithm that updates model weights to minimize loss.

**SGD (Stochastic Gradient Descent)**:
- Classic optimization algorithm
- Updates weights based on gradients (derivatives of loss)
- "Descends" toward minimum loss

**Alternative Optimizers**:
- `Adam` - Adaptive learning rate, often faster convergence
- `RMSProp` - Good for recurrent networks
- `AdaGrad` - Adapts learning rate per parameter

The optimizer needs:
1. Model parameters to update
2. Learning rate to control update size

Inside the training loop, optimization happens in three steps:

1. Call `optimizer.zero_grad()` to reset the gradients of model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
2. Backpropagate the prediction loss with a call to `loss.backward()`. PyTorch deposits the gradients of the loss w.r.t. each parameter.
3. Once we have our gradients, we call `optimizer.step()` to adjust the parameters by the gradients collected in the backward pass.

### The Three Steps of Optimization (per batch):

**Visual Flow**: 
```
Data Batch → Forward Pass → Calculate Loss → Backward Pass → Update Weights → Repeat
```

1. **`optimizer.zero_grad()`** 
   - Clears old gradients from previous batch
   - Prevents gradient accumulation

2. **`loss.backward()`** 
   - Backpropagation: Calculates gradients for all parameters
   - Determines how to adjust each weight

3. **`optimizer.step()`** 
   - Updates weights using calculated gradients
   - Moves model toward better predictions

**Remember**: Zero gradients → Calculate gradients → Apply gradients
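
The three calls can be traced on a toy problem small enough to verify by hand. This is a sketch under assumptions: the two-element weight tensor, the input, and the learning rate of 0.1 are made up for illustration and are not part of the notebook's model.

```python
import torch

# Toy setup (assumption): loss = sum(w * x), so d(loss)/dw = x.
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])
optimizer = torch.optim.SGD([w], lr=0.1)

# First backward pass: the gradient equals x.
(w * x).sum().backward()
first_grad = w.grad.clone()

# Second backward pass WITHOUT zeroing: gradients add up.
(w * x).sum().backward()
accumulated_grad = w.grad.clone()

# Reset, recompute, and apply one SGD update: w <- w - lr * grad.
optimizer.zero_grad()
(w * x).sum().backward()
optimizer.step()
updated_w = w.detach().clone()

print(first_grad, accumulated_grad, updated_w)
```

The accumulated gradient is twice the correct one, which is the double-counting that `optimizer.zero_grad()` prevents.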

# Full Implementation

We define `train_loop`, which loops over our optimization code, and `test_loop`, which evaluates the model's performance against our test data.

## Step 9: Define Training and Testing Functions

**Purpose**: Encapsulate the training and evaluation logic.

### `train_loop()` Function:
**What it does**: Trains the model for one epoch
- Loops through all training batches
- For each batch:
  1. Forward pass (get predictions)
  2. Calculate loss
  3. Backward pass (compute gradients)
  4. Update weights
  5. Zero gradients for next iteration
- Prints loss every 100 batches to monitor progress

**Key setting**: `model.train()` - Enables training mode (dropout is active and batch norm uses per-batch statistics; this model has neither layer, so the call is only good practice)

---

### `test_loop()` Function:
**What it does**: Evaluates model performance on test data
- Loops through all test batches
- Calculates predictions without updating weights
- Computes accuracy and average loss
- Reports overall performance

**Key settings**: 
- `model.eval()` - Evaluation mode (disables dropout; batch norm uses its running statistics)
- `torch.no_grad()` - Disables gradient computation (saves memory/time)

**Why separate test loop?**
- Measures performance on data the model has not trained on
- Shows whether the model generalizes to new data, which makes overfitting visible
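
The network in this notebook has no dropout or batch norm layers, so the mode switch has no visible effect on it. The sketch below uses a standalone `nn.Dropout` layer and a small `nn.Linear` layer (both assumptions for illustration) to show what `train()`, `eval()`, and `torch.no_grad()` actually change.

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p) = 2.
dropout.train()
train_out = dropout(x)

# Evaluation mode: dropout passes the input through unchanged.
dropout.eval()
eval_out = dropout(x)

# no_grad: the result is not tracked by autograd, even though
# the layer's parameters have requires_grad=True.
linear = nn.Linear(4, 2)
with torch.no_grad():
    y = linear(torch.rand(1, 4))

print(train_out.unique(), torch.equal(eval_out, x), y.requires_grad)
```

`model.eval()` and `torch.no_grad()` are independent: the first changes layer behavior, the second turns off gradient tracking, so evaluation code normally uses both.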

In [13]:
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode
    # also serves to reduce unnecessary gradient computations and memory usage for tensors with requires_grad=True
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
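
The accuracy line in `test_loop` is compact, so here is a small worked sketch of the same computation. The four rows of logits and the labels are made up for illustration.

```python
import torch

# Made-up logits for 4 samples and 3 classes, with their true labels.
pred = torch.tensor([[2.0, 0.1, 0.3],
                     [0.2, 1.5, 0.1],
                     [0.9, 0.8, 0.1],
                     [0.1, 0.2, 3.0]])
y = torch.tensor([0, 1, 1, 2])

# argmax(1) picks the index of the largest logit in each row.
predicted_classes = pred.argmax(1)

# Comparing with the labels gives booleans; cast to float and sum to count hits.
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
```

The third sample is misclassified (class 0 has the largest logit but the label is 1), so three of four predictions are correct. In `test_loop` the same count is accumulated over all batches and divided by the dataset size.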

We initialize the loss function and optimizer, and pass them to `train_loop` and `test_loop`. Feel free to increase the number of epochs to track the model's improving performance.

## Step 10: Train the Model

**Purpose**: Execute the complete training process.

**Training Process** (10 epochs; the training cell sets `epochs = 10`, overriding the value of 5 from Step 6):
- **Epoch** = One complete pass through training data
- For each epoch:
  1. Train on entire training set (`train_loop`)
  2. Evaluate on entire test set (`test_loop`)
  3. Print progress

**What to Watch**:
- **Training Loss**: Should decrease over epochs
- **Test Accuracy**: Should increase over epochs
- **Gap between train and test**: If test accuracy plateaus while training improves, model may be overfitting

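**Expected Behavior**: 
- Initial accuracy ~10% (random guessing among 10 classes)
- Accuracy should climb each epoch; the run recorded below reaches 43.6% after epoch 1 and 58.2% after epoch 2. With plain SGD at `lr=1e-3` progress is gradual, and the final value after 10 epochs depends on the run (the log below is cut off before the end)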

In [14]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.306856  [   64/60000]
loss: 2.298582  [ 6464/60000]
loss: 2.283126  [12864/60000]
loss: 2.264119  [19264/60000]
loss: 2.248101  [25664/60000]
loss: 2.226551  [32064/60000]
loss: 2.225410  [38464/60000]
loss: 2.193868  [44864/60000]
loss: 2.198018  [51264/60000]
loss: 2.159197  [57664/60000]
Test Error: 
 Accuracy: 43.6%, Avg loss: 2.157352 

Epoch 2
-------------------------------
loss: 2.168603  [   64/60000]
loss: 2.159842  [ 6464/60000]
loss: 2.107098  [12864/60000]
loss: 2.107127  [19264/60000]
loss: 2.065855  [25664/60000]
loss: 2.007953  [32064/60000]
loss: 2.022113  [38464/60000]
loss: 1.944845  [44864/60000]
loss: 1.959247  [51264/60000]
loss: 1.875185  [57664/60000]
Test Error: 
 Accuracy: 58.2%, Avg loss: 1.881718 

Epoch 3
-------------------------------
loss: 1.915567  [   64/60000]
loss: 1.884475  [ 6464/60000]
loss: 1.778140  [12864/60000]
loss: 1.798797  [19264/60000]
loss: 1.709102  [25664/60000]
loss: 1.660662  [32064/600

# Save and Load the Model

## Step 11: Model Persistence

**Purpose**: Save trained model for later use without retraining.

### Saving the Model:
`torch.save(model, 'model.pth')`
- Saves entire model (architecture + weights)
- `.pth` or `.pt` are common PyTorch model file extensions

**Alternative**: `torch.save(model.state_dict(), 'model.pth')`
- Saves only weights (smaller file, more flexible)
- Requires model architecture definition when loading

---

### Loading the Model:
`torch.load('model.pth', weights_only=False)`
- Loads entire saved model
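- `weights_only=False` - Loads the full pickled model object (not just weights); because this can execute arbitrary code during unpickling, use it only with files you trust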
- Model is ready to use for predictions immediately

**Why Save Models?**
- Avoid retraining (saves time and compute)
- Deploy models to production
- Share models with others
- Resume training later
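
The `state_dict` alternative mentioned above can be sketched as follows. To stay self-contained the example uses a small stand-in model and a temporary directory; the file name `model_weights.pth` and the stand-in architecture are assumptions for illustration, not the notebook's own.

```python
import os
import tempfile

import torch
from torch import nn


def build_model():
    # Small stand-in architecture (assumption); loading requires rebuilding it exactly.
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))


model = build_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_weights.pth")

    # Save only the parameter tensors, not the Python class.
    torch.save(model.state_dict(), path)

    # Recreate the architecture, then load the saved weights into it.
    restored = build_model()
    restored.load_state_dict(torch.load(path, weights_only=True))

restored.eval()
x = torch.rand(2, 1, 28, 28)
with torch.no_grad():
    same_output = torch.equal(model(x), restored(x))

print(same_output)
```

Saving the `state_dict` lets `torch.load` run with `weights_only=True`, which restricts what can be unpickled. The cost is that the model class must be available in code when loading.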

In [15]:
torch.save(model, 'model.pth')  # saves the whole model object; pass model.state_dict() instead to save only the weights

In [17]:
model = torch.load('model.pth', weights_only=False)


NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

---

## 🎓 Summary: Complete PyTorch Workflow

### The 11-Step Process:

1. ✅ **Import** libraries
2. ✅ **Load** datasets
3. ✅ **Create** DataLoaders
4. ✅ **Define** model architecture
5. ✅ **Instantiate** model
6. ✅ **Set** hyperparameters
7. ✅ **Initialize** loss function
8. ✅ **Initialize** optimizer
9. ✅ **Define** train/test loops
10. ✅ **Train** the model
11. ✅ **Save/Load** model

### Key Concepts to Remember:

- **Epochs**: Full passes through the dataset
- **Batches**: Subsets of data processed together
- **Loss**: Measure of prediction error (lower is better)
- **Optimizer**: Updates weights to minimize loss
- **Forward Pass**: Data → Model → Predictions
- **Backward Pass**: Loss → Gradients → Weight Updates
- **Train/Eval Modes**: Different behaviors for training vs testing

### This workflow applies to most deep learning problems!
Just swap out the dataset, model architecture, and hyperparameters for your specific task.

---
**Next Steps**: Experiment with different architectures, optimizers, or datasets to deepen understanding!