# PyTorch Techniques for Model Optimization

## Table of Contents

[PyTorch Techniques for Model Optimization](#pytorch-techniques-for-model-optimization)

- [Saving Progress w/ Model Checkpointing](#saving-progress-w-model-checkpointing)

- [Model Training w/ Mini-Batches in PyTorch](#model-training-w-mini-batches-in-pytorch)

## Saving Progress w/ Model Checkpointing

**Model Checkpointing Intro.** - This section focuses on model checkpointing in PyTorch, a vital technique in machine learning that saves the state of a model during training so that the best-performing version is preserved. The goal is to implement checkpointing that saves the model whenever it achieves its best performance so far on a validation set.

Model checkpointing involves saving the state of a neural network model at various points during the training process. It is crucial for several reasons:
- **Prevent Loss of Progress**: In case of unexpected interruptions (e.g., power failure, hardware failure), checkpointing allows training to resume from the last saved state.
- **Save Best Performing Models**: By saving the model whenever it achieves a new best performance on the validation set, we ensure that the best version of our model is retained.
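To make the first point concrete, here is a minimal sketch of a checkpoint that could be used to resume training after an interruption. It is an illustration under assumptions, not part of the code used in this section: the file name `resume_checkpoint.pth`, the dictionary keys, and the epoch value are all hypothetical, and the model is a small stand-in. The idea is to store the optimizer state and the epoch counter alongside the model weights.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical small model and optimizer, used only for illustration.
model = nn.Sequential(nn.Linear(13, 10), nn.ReLU(), nn.Linear(10, 3))
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Hypothetical path; a temp directory keeps the example self-contained.
resume_path = os.path.join(tempfile.mkdtemp(), "resume_checkpoint.pth")

# Save everything needed to continue training later.
torch.save(
    {
        "epoch": 42,  # last completed epoch (illustrative value)
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    },
    resume_path,
)

# Later (e.g. after an interruption): rebuild the same architecture, then restore.
restored_model = nn.Sequential(nn.Linear(13, 10), nn.ReLU(), nn.Linear(10, 3))
restored_optimizer = optim.Adam(restored_model.parameters(), lr=0.001)

checkpoint = torch.load(resume_path)
restored_model.load_state_dict(checkpoint["model_state_dict"])
restored_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # training would continue from this epoch
```

Restoring the optimizer state matters for optimizers such as Adam, which keep running statistics per parameter; without it, resumed training starts with those statistics reset.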

**Setting up Environment** - The setup is the same as in previous sections: import the necessary libraries, preprocess the Wine dataset, define the model, and define the loss and optimizer. The training loop with evaluation, the loss plots, and the final save/load check confirming the same `val_loss` are omitted for now, since that is the code that gets modified to add checkpointing.

In [None]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import sklearn.utils as skUtils

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

from typing import List, Tuple, Dict

wine_set: skUtils.Bunch = load_wine()

def load_preprocessed_data(wine: skUtils.Bunch) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    X: np.ndarray = wine.data
    y: np.ndarray = wine.target

    Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, stratify=y)

    scaler: StandardScaler = StandardScaler().fit(Xtrain)
    Xtrain_scaled: np.ndarray = scaler.transform(Xtrain)
    Xtest_scaled: np.ndarray = scaler.transform(Xtest)

    Xtrain_tensor: torch.Tensor = torch.tensor(Xtrain_scaled, dtype=torch.float32)
    Xtest_tensor: torch.Tensor = torch.tensor(Xtest_scaled, dtype=torch.float32)
    ytrain_tensor: torch.Tensor = torch.tensor(ytrain, dtype=torch.long)
    ytest_tensor: torch.Tensor = torch.tensor(ytest, dtype=torch.long)

    return Xtrain_tensor, Xtest_tensor, ytrain_tensor, ytest_tensor


X_train, X_test, y_train, y_test = load_preprocessed_data(wine_set)

model: nn.Sequential = nn.Sequential( # MAKE SURE YOU CAN CUSTOM DEFINE TOO!!
    nn.Linear(in_features=13, out_features=10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

criterion: nn.CrossEntropyLoss = nn.CrossEntropyLoss()
optimizer: optim.Adam = optim.Adam(model.parameters(), lr=0.001)

**Initialize Checkpoint Parameters** - Before changing the training loop, first set up the initial parameters for checkpointing. These make it possible to track the model's performance and save the best version. Specifically, we need:
- `best_loss` to keep track of the best validation loss. Initialize `best_loss` to `float('inf')` so that the first validation loss always triggers a model save.
- `checkpoint_path`, the file path where the model will be saved.

In [2]:
best_loss: float = float('inf')
checkpoint_path: str = "best_model.pth"

**Training Loop with Checkpointing** - Now implement the training loop with validation and model checkpointing.

In [None]:
num_epochs: int = 150
history: Dict[str, List[float]] = {"loss": [], "val_loss": []}
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    history["loss"].append(loss.item())

    model.eval()
    with torch.no_grad():
        outputs_test = model(X_test)
        val_loss: float = criterion(outputs_test, y_test).item()
        history["val_loss"].append(val_loss)
    
    if val_loss < best_loss:
        best_loss = val_loss
        torch.save(model, checkpoint_path)
        print(f'Model saved at epoch {epoch} with validation loss {val_loss:.4f}')

    if epoch % 10 == 0:
        print(f'At epoch {epoch}/{num_epochs}, Loss is: {loss.item():.4f} and Val. Loss is: {val_loss:.4f}')

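In this training loop:
- The model is trained on the training set.
- The model's performance is validated on the validation set (here, the held-out `X_test` / `y_test` split).
- If the validation loss improves, the model is saved using `torch.save()`. Because each save overwrites the same file at `checkpoint_path`, only the best-performing model so far is kept.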
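Once training finishes, the saved checkpoint can be loaded back and checked against the recorded best loss. The sketch below is a self-contained illustration under assumptions: it uses synthetic stand-in data rather than the Wine dataset, a hypothetical `build_model` helper, and a hypothetical file name `best_weights.pth`. It also swaps in a common alternative to saving the whole model object: saving only `model.state_dict()` and loading it into a freshly built model of the same architecture.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in data (13 features, 3 classes), not the Wine dataset.
X_tr, y_tr = torch.randn(60, 13), torch.randint(0, 3, (60,))
X_val, y_val = torch.randn(30, 13), torch.randint(0, 3, (30,))

def build_model() -> nn.Sequential:
    """Hypothetical helper that rebuilds the same architecture each time."""
    return nn.Sequential(nn.Linear(13, 10), nn.ReLU(), nn.Linear(10, 3))

demo_model = build_model()
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.01)

best_loss = float("inf")
weights_path = os.path.join(tempfile.mkdtemp(), "best_weights.pth")

for epoch in range(20):
    demo_model.train()
    demo_optimizer.zero_grad()
    demo_criterion(demo_model(X_tr), y_tr).backward()
    demo_optimizer.step()

    demo_model.eval()
    with torch.no_grad():
        val_loss = demo_criterion(demo_model(X_val), y_val).item()
    if val_loss < best_loss:
        best_loss = val_loss
        # Save only the weights instead of the whole model object.
        torch.save(demo_model.state_dict(), weights_path)

# Restore the best weights into a fresh model and re-evaluate.
best_model = build_model()
best_model.load_state_dict(torch.load(weights_path))
best_model.eval()
with torch.no_grad():
    reloaded_loss = demo_criterion(best_model(X_val), y_val).item()

print(f"best_loss={best_loss:.4f}, reloaded_loss={reloaded_loss:.4f}")
```

The two printed values should agree, which confirms that the file holds the weights from the best epoch rather than from the last one.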

This section covered the concept and importance of model checkpointing, as well as how to implement it in a PyTorch training loop. In real-world machine learning tasks, effective checkpointing protects against lost training time and keeps the best-performing model available even if later epochs get worse.

## Model Training w/ Mini-Batches in PyTorch

This section shows how to efficiently train a neural network model using mini-batches in PyTorch. The focus is on understanding the concept of mini-batches, creating them with PyTorch's `DataLoader`, and training a model on them. By the end, you will be equipped to implement mini-batch gradient descent in machine learning projects.

**Intro to Mini-Batch Training** - In machine learning there are three main methods for training models: stochastic gradient descent (SGD), full-batch gradient descent, and mini-batch gradient descent. They are explained here using a simple analogy.

Imagine learning to shoot basketballs into a hoop:
1. **Stochastic Gradient Descent (SGD)**: This is like shooting one basketball and adjusting your aim after each shot. You get feedback quickly, but each shot is influenced by random factors, making the learning process noisy.
2. **Full-Batch Gradient Descent**: This is like shooting all the basketballs you have, then reviewing your overall performance to adjust your aim. It gives a clear picture but is slow and tiring, because you have to shoot every ball before making any adjustment.
3. **Mini-Batch Gradient Descent**: This method is the middle ground. It is like shooting a few basketballs (say 10) before adjusting your aim. It is faster than shooting all the balls at once and more stable than adjusting after every single shot, offering a balanced approach.
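In code, the three methods differ only in how many samples are used per parameter update. The short sketch below counts the updates per epoch for each method; the dataset size of 124 and the helper name `updates_per_epoch` are assumptions chosen for illustration.

```python
import math

def updates_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of parameter updates needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

num_samples = 124  # assumed training-set size, for illustration only

updates = {
    "SGD (batch_size=1)": updates_per_epoch(num_samples, 1),
    "mini-batch (batch_size=32)": updates_per_epoch(num_samples, 32),
    "full-batch (batch_size=N)": updates_per_epoch(num_samples, num_samples),
}

for name, count in updates.items():
    print(f"{name}: {count} update(s) per epoch")
```

With these numbers, SGD makes 124 updates per epoch, mini-batch training makes 4, and full-batch training makes 1.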

**Why Use Mini-Batch Training?**
1. **Efficiency**: Processing smaller subsets of data at a time uses far less memory than a full batch and can take advantage of parallel processing hardware.
2. **Convergence**: Provides a balance between noisy updates (SGD) and slow updates (full-batch), which can stabilize convergence.
3. **Regularization**: Each mini-batch introduces some noise into the parameter updates, which can help reduce overfitting.

**Loading the Dataset** - With the dataset loaded, preprocessed, and returned as PyTorch tensors, `DataLoader` can divide it into mini-batches and iterate over them efficiently.

In [3]:
from torch.utils.data import TensorDataset, DataLoader
batch_size: int = 32
dataset: TensorDataset = TensorDataset(X_train, y_train)
dataloader: DataLoader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

In the code above:
- `TensorDataset`: Combines the features `X_train` and targets `y_train` into a single dataset.
- `DataLoader`: Splits the dataset into mini-batches of the size specified by `batch_size`, making it easy to iterate over the dataset in chunks during training.

With `batch_size=32`, each mini-batch contains 32 samples, except possibly the last one, which holds whatever remains. The `shuffle=True` parameter reshuffles the data at every epoch, so the model does not see the samples in the same order each time, which can help generalization. The `DataLoader` simplifies batching and shuffling, both of which are essential for efficient mini-batch training.
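To see what the `DataLoader` actually yields, the sketch below iterates over one epoch and records the size of each mini-batch. It uses synthetic tensors of an assumed size (100 samples, 13 features) instead of the Wine data, and the `X_demo`, `y_demo`, and `loader` names are illustrative.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Synthetic stand-in data: 100 samples, 13 features, 3 classes.
X_demo = torch.randn(100, 13)
y_demo = torch.randint(0, 3, (100,))

loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

# Each iteration yields one (features, targets) pair for a mini-batch.
batch_sizes = [xb.shape[0] for xb, yb in loader]

print(len(loader))   # 4 mini-batches per epoch
print(batch_sizes)   # [32, 32, 32, 4]
```

Since 100 is not a multiple of 32, the last mini-batch holds only the 4 remaining samples. Passing `drop_last=True` to `DataLoader` would discard that incomplete batch instead.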

**Building and Compiling Model** - Before training on the mini-batches, the usual model definition, loss function, and optimizer are needed. These are copied from the previous section; `del` is called first so that the previously trained model and its optimizer state are discarded and training starts from freshly initialized weights.

In [8]:
del model, criterion, optimizer

model: nn.Sequential = nn.Sequential(
    nn.Linear(in_features=13, out_features=10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 3)
)

criterion: nn.CrossEntropyLoss = nn.CrossEntropyLoss()
optimizer: optim.Adam = optim.Adam(model.parameters(), lr=0.001)
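With the model, loss, and optimizer defined, a mini-batch training loop performs one optimizer step per mini-batch rather than one per epoch. The following is a minimal sketch of that pattern under assumptions, not the definitive loop for this section: it uses synthetic stand-in data, `demo_`-prefixed names to avoid clashing with the variables above, and an arbitrary 5 epochs.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Synthetic stand-in data: 100 samples, 13 features, 3 classes.
X_demo = torch.randn(100, 13)
y_demo = torch.randint(0, 3, (100,))
demo_loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=32, shuffle=True)

demo_model = nn.Sequential(
    nn.Linear(13, 10), nn.ReLU(), nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 3)
)
demo_criterion = nn.CrossEntropyLoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)

epoch_losses = []
for epoch in range(5):
    demo_model.train()
    running_loss = 0.0
    for xb, yb in demo_loader:
        demo_optimizer.zero_grad()        # clear gradients from the previous batch
        loss = demo_criterion(demo_model(xb), yb)
        loss.backward()                   # gradients for this mini-batch only
        demo_optimizer.step()             # one parameter update per mini-batch
        running_loss += loss.item() * xb.size(0)  # weight by batch size
    # Average loss per sample over the whole epoch.
    epoch_losses.append(running_loss / len(demo_loader.dataset))

print(epoch_losses)
```

Weighting each batch loss by its size before averaging keeps the epoch loss correct even when the last mini-batch is smaller than the others.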