<div style="text-align:left;">
  <a href="https://code213.tech/" target="_blank">
    <img src="code213.PNG" alt="code213">
  </a>
  <p><em>prepared by Latreche Sara</em></p>
</div>

# 10.8 - Best Practices in PyTorch

This notebook covers **best practices** for building, training, and deploying PyTorch models.  

### Key Points
1. **Device management**
   - Always check if GPU is available  
   - Move tensors and models to the same device

2. **Reproducibility**
   - Set random seeds for `torch`, `numpy`, and Python `random`  
   - Use `torch.backends.cudnn.deterministic=True` for deterministic GPU results

3. **Data Handling**
   - Use `Dataset` and `DataLoader` for efficient batching  
   - Normalize and transform data properly

4. **Model Saving and Loading**
   - Save only model weights (`state_dict`) for flexibility  
   - Use `torch.save()` and `torch.load()`  

5. **Training Loop**
   - Zero gradients each step: `optimizer.zero_grad()`  
   - Track loss and metrics  
   - Use `with torch.no_grad()` for validation

6. **Mixed Precision & Gradient Clipping**
   - For large models, consider `torch.cuda.amp` for faster training  
   - Clip gradients to prevent exploding gradients


## Table of Contents

- [1 - Device Management](#1)
- [2 - Reproducibility](#2)
- [3 - Data Handling](#3)
- [4 - Model Saving & Loading](#4)
- [5 - Training Loop Best Practices](#5)
- [6 - Mixed Precision & Gradient Clipping](#6)
- [7 - Practice Exercises](#7)


<a name='1'></a>
## 1 - Device Management

Proper device management ensures that your **model and data are on the same device**.  

### Key Points
- Use `torch.cuda.is_available()` to check for GPU  
- Use `torch.device('cuda')` for GPU or `torch.device('cpu')` for CPU  
- Move tensors and models to the chosen device using `.to(device)`  

Benefits:
- Avoid runtime errors due to mismatched devices  
- Take advantage of GPU acceleration


In [None]:
import torch
import torch.nn as nn

# Check for GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print("Using device:", device)

# Sample tensor
x = torch.randn(2, 3).to(device)
print("Tensor device:", x.device)

# Sample model
model = nn.Linear(3, 2).to(device)
print("Model device:", next(model.parameters()).device)
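
Real batches are often dictionaries or tuples of tensors rather than a single tensor. The sketch below shows one way to move such a nested batch to the chosen device. `to_device` is a hypothetical helper written for this notebook, not a PyTorch function, and it assumes plain dicts, lists, and tuples.

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def to_device(batch, device):
    """Recursively move tensors inside dicts, lists, and tuples to `device`.

    Hypothetical helper for illustration; non-tensor values are returned unchanged.
    """
    if isinstance(batch, torch.Tensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(to_device(item, device) for item in batch)
    return batch

# Example batch mixing tensors with plain Python metadata
batch = {
    "inputs": torch.randn(4, 3),
    "labels": torch.randint(0, 2, (4,)),
    "ids": [1, 2, 3, 4],
}
batch = to_device(batch, device)
print({k: v.device for k, v in batch.items() if isinstance(v, torch.Tensor)})
```

Keeping the move in one function means the training loop has a single place where device placement happens, which makes mismatches easier to track down.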


<a name='2'></a>
## 2 - Reproducibility

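Setting seeds makes your experiments **repeatable**: rerunning the same code on the same hardware and PyTorch version draws the same random numbers. Results can still differ across PyTorch releases, platforms, or between CPU and GPU runs.  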

### Key Points
- Set seeds for:
  - PyTorch: `torch.manual_seed()`  
  - CUDA: `torch.cuda.manual_seed_all()`  
  - NumPy: `np.random.seed()`  
  - Python: `random.seed()`  
- For GPU determinism:
  - `torch.backends.cudnn.deterministic = True`  
  - `torch.backends.cudnn.benchmark = False`  

Benefits:
- Helps debug and compare models  
- Produces consistent results across runs


In [1]:
import torch
import numpy as np
import random

# Set seeds
seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)

# Ensure deterministic behavior
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Draw two tensors; rerunning this cell with the same seed prints the same values
x1 = torch.randn(3, 3)
x2 = torch.randn(3, 3)
print("Random tensor x1:\n", x1)
print("Random tensor x2:\n", x2)


Random tensor x1:
 tensor([[ 0.3367,  0.1288,  0.2345],
        [ 0.2303, -1.1229, -0.1863],
        [ 2.2082, -0.6380,  0.4617]])
Random tensor x2:
 tensor([[ 0.2674,  0.5349,  0.8094],
        [ 1.1103, -1.6898, -0.9890],
        [ 0.9580,  1.3221,  0.8172]])
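
The two tensors above differ from each other because they are consecutive draws from the same random stream. To check reproducibility directly, re-seed and draw again. The sketch below does that and also shows a seeded `torch.Generator` for repeatable `DataLoader` shuffling. `set_seed` and `first_epoch_order` are hypothetical helper names chosen for this example.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def set_seed(seed):
    """Seed Python, NumPy, and PyTorch (CPU and all CUDA devices)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU

# Re-seeding restarts the random stream, so the same draw comes out again
set_seed(42)
a = torch.randn(3, 3)
set_seed(42)
b = torch.randn(3, 3)
print("Identical after re-seeding:", torch.equal(a, b))

# A dedicated generator makes the shuffle order repeatable on its own
dataset = TensorDataset(torch.arange(10))

def first_epoch_order(seed):
    """Return the sample order of one shuffled epoch for a given seed."""
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=5, shuffle=True, generator=g)
    return torch.cat([batch[0] for batch in loader]).tolist()

print("Same shuffle order:", first_epoch_order(0) == first_epoch_order(0))
```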


<a name='3'></a>
## 3 - Data Handling

Efficient data handling keeps training from stalling on input: batches should be loaded and preprocessed at least as fast as the model consumes them.  

### Key Points
- Use **Dataset** and **DataLoader** for:
  - Efficient batching  
  - Shuffling data  
  - Parallel data loading (`num_workers`)  
- Apply **transformations** to preprocess data:
  - Normalization  
  - Augmentation (flip, rotation, crop)  
- Always move batches to the same device as the model


In [2]:
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms

# Sample dataset
class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(10, 3, 32, 32)  # e.g., 32x32 images
        self.labels = torch.randint(0, 2, (10,))
        self.transform = transforms.Normalize(mean=[0.5]*3, std=[0.5]*3)
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        x = self.data[idx]
        y = self.labels[idx]
        x = self.transform(x)
        return x, y

dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

# Iterate through a batch
for x_batch, y_batch in dataloader:
    print("Batch x shape:", x_batch.shape)
    print("Batch y:", y_batch)
    break


Batch x shape: torch.Size([4, 3, 32, 32])
Batch y: tensor([0, 0, 1, 0])
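
The dataset above hard-codes a mean and standard deviation of 0.5 for normalization. In practice these statistics are usually computed from the training data. The sketch below is a minimal version of that, assuming the whole training set fits in a single tensor. The random `images` tensor is a stand-in, not real data.

```python
import torch

# Stand-in for a training set of 100 RGB images of size 32x32 (random values)
images = torch.rand(100, 3, 32, 32)

# Per-channel statistics: reduce over batch, height, and width
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Broadcast the (3,) statistics over the (N, 3, H, W) tensor
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print("Per-channel mean before:", mean)
print("Per-channel mean after: ", normalized.mean(dim=(0, 2, 3)))
print("Per-channel std after:  ", normalized.std(dim=(0, 2, 3)))
```

The resulting `mean` and `std` can be passed to `transforms.Normalize`, which applies the same `(x - mean) / std` operation per channel. Compute them on the training split only and reuse them for validation and test data.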


<a name='4'></a>
## 4 - Model Saving & Loading

Saving and loading models properly ensures **flexibility and reproducibility**.  

### Key Points
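- Save only the **model weights** (`state_dict`) for flexibility: the file holds only parameter and buffer tensors, so it loads into any model with matching layer names and shapes  
- Save the **entire model** (`torch.save(model, path)`) only if you want to load it directly; the pickle records the class's import path, so it breaks when the code is renamed or moved  
- Use `torch.save()` to save and `torch.load()` to load models  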

Example:
- `torch.save(model.state_dict(), "model.pth")`  
- `model.load_state_dict(torch.load("model.pth"))`  
- `model.eval()` sets the model to evaluation mode


In [4]:
import torch
import torch.nn as nn

# Sample model
model = nn.Linear(3, 2)

# Save model state_dict
torch.save(model.state_dict(), "model_weights.pth")
print("Model weights saved.")

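# Load model state_dict into a freshly constructed model of the same architecture
model_loaded = nn.Linear(3, 2)
# map_location lets weights saved on a GPU load on a CPU-only machine;
# weights_only=True restricts unpickling to tensors and plain containers
state_dict = torch.load("model_weights.pth", map_location="cpu", weights_only=True)
model_loaded.load_state_dict(state_dict)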
model_loaded.eval()
print("Model loaded and set to eval mode.")


Model weights saved.
Model loaded and set to eval mode.
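
To resume training rather than only run inference, the optimizer state and the epoch counter are needed alongside the weights. The sketch below bundles them into one checkpoint file. The dictionary keys (`"epoch"`, `"model_state"`, `"optimizer_state"`) and the temporary file path are choices made for this example, not a PyTorch convention.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(3, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Take one step so the optimizer has momentum buffers worth saving
loss = model(torch.randn(4, 3)).sum()
loss.backward()
optimizer.step()

# Bundle everything needed to resume training
checkpoint = {
    "epoch": 3,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
}
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
torch.save(checkpoint, path)

# Restore into freshly constructed objects
restored_model = nn.Linear(3, 2)
restored_optimizer = optim.SGD(restored_model.parameters(), lr=0.01, momentum=0.9)
state = torch.load(path, map_location="cpu")
restored_model.load_state_dict(state["model_state"])
restored_optimizer.load_state_dict(state["optimizer_state"])
start_epoch = state["epoch"] + 1
print("Resuming from epoch", start_epoch)
```

Without the optimizer state, momentum buffers (or the moment estimates in Adam-style optimizers) restart from zero, which can cause a visible jump in the loss after resuming.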


<a name='5'></a>
## 5 - Training Loop Best Practices

Proper training loop management ensures **stable and efficient training**.  

### Key Points
- **Zero gradients** before each backward pass: `optimizer.zero_grad()`  
- **Track loss and metrics** for monitoring  
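- Call `model.eval()` and use `with torch.no_grad()` for validation: the first switches layers such as dropout and batch norm to inference behavior, the second skips gradient tracking to save memory  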
- Move **both model and data** to the same device  
- Save checkpoints periodically to resume training if needed


In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Sample dataset
x = torch.randn(20, 3)
y = torch.randint(0, 2, (20,))
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=5, shuffle=True)

# Sample model
model = nn.Linear(3, 2).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

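# Training loop with best practices
for epoch in range(2):
    model.train()  # training-mode behavior for layers such as dropout and batch norm
    for x_batch, y_batch in dataloader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        
        # Forward pass
        outputs = model(x_batch)
        loss = loss_fn(outputs, y_batch)
        
        # Backward pass: clear old gradients, compute new ones, update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    # Reports the loss of the last batch in the epoch, not the epoch average
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# Validation example
model.eval()  # switch to evaluation mode before validating
with torch.no_grad():
    x_val = torch.randn(5, 3).to(device)
    val_output = model(x_val)
    print("Validation output:\n", val_output)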


Epoch 1, Loss: 0.7349
Epoch 2, Loss: 0.4746
Validation output:
 tensor([[-0.5966,  0.6238],
        [-0.8944,  0.8310],
        [-0.6978,  0.6653],
        [-0.0489,  0.1887],
        [-0.3532,  0.8180]])
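
The loop above prints the loss of the last batch only and validates on a single random tensor. A fuller validation pass averages the loss and accuracy over a whole loader. The sketch below shows one way to do that. `evaluate` is a hypothetical helper, and the random validation data is a stand-in.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def evaluate(model, loader, loss_fn, device):
    """Return the mean loss and accuracy over every sample in `loader`."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for x_batch, y_batch in loader:
            x_batch, y_batch = x_batch.to(device), y_batch.to(device)
            logits = model(x_batch)
            # Weight by batch size so a smaller final batch does not skew the mean
            total_loss += loss_fn(logits, y_batch).item() * y_batch.size(0)
            correct += (logits.argmax(dim=1) == y_batch).sum().item()
            count += y_batch.size(0)
    return total_loss / count, correct / count

# 22 samples with batch size 5 leaves a final batch of 2
val_data = TensorDataset(torch.randn(22, 3), torch.randint(0, 2, (22,)))
val_loader = DataLoader(val_data, batch_size=5)

model = nn.Linear(3, 2).to(device)
val_loss, val_acc = evaluate(model, val_loader, nn.CrossEntropyLoss(), device)
print(f"Validation loss: {val_loss:.4f}, accuracy: {val_acc:.2%}")
```

Remember to call `model.train()` again before the next training epoch, since `evaluate` leaves the model in evaluation mode.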


<a name='6'></a>
## 6 - Mixed Precision & Gradient Clipping

Advanced techniques for **efficient and stable training**.

### Key Points
- **Mixed precision training** (`torch.amp`, exposed as `torch.cuda.amp` in older releases)  
  - Speeds up training on GPUs  
  - Reduces memory usage  
- **Gradient clipping**  
  - Prevents exploding gradients in deep networks  
  - Use `torch.nn.utils.clip_grad_norm_()` or `clip_grad_value_()`


In [6]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Sample dataset
x = torch.randn(20, 3)
y = torch.randint(0, 2, (20,))
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=5, shuffle=True)

# Sample model
model = nn.Linear(3, 2).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

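# Mixed precision training: enable it only when a CUDA device is present
use_amp = device.type == 'cuda'
# torch.amp supersedes the deprecated torch.cuda.amp entry points in recent PyTorch releases
scaler = torch.amp.GradScaler('cuda', enabled=use_amp)

for epoch in range(2):
    for x_batch, y_batch in dataloader:
        x_batch, y_batch = x_batch.to(device), y_batch.to(device)
        
        optimizer.zero_grad()
        with torch.autocast(device_type=device.type, enabled=use_amp):
            outputs = model(x_batch)
            loss = loss_fn(outputs, y_batch)
        
        # Backward pass with gradient scaling
        scaler.scale(loss).backward()
        
        # Gradient clipping: unscale first so the threshold applies to the true gradients
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        
        # Optimizer step (skipped by the scaler if gradients contain inf or NaN)
        scaler.step(optimizer)
        scaler.update()
    
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")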


Epoch 1, Loss: 0.7234
Epoch 2, Loss: 1.0122
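
To see what gradient clipping does in isolation, the sketch below inflates the loss on purpose so the gradients are large, then measures the gradient norm before and after clipping. The factor of 1000 and the toy model are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 2)

# Scale the loss up on purpose so the gradients are large
loss = 1000.0 * model(torch.randn(8, 3)).pow(2).mean()
loss.backward()

max_norm = 1.0
# clip_grad_norm_ rescales gradients in place and returns the norm measured before clipping
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
# Recompute the total norm across all parameters after clipping
norm_after = torch.norm(torch.stack([p.grad.norm() for p in model.parameters()]))

print(f"Gradient norm before clipping: {norm_before.item():.2f}")
print(f"Gradient norm after clipping:  {norm_after.item():.2f}")
```

Logging the value returned by `clip_grad_norm_` during training is a cheap way to notice when gradients start to grow.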




<a name='7'></a>
## 7 - Practice Exercises

Try the following exercises to reinforce your understanding of **PyTorch best practices**:



### **Exercise 1: Device Management**
- Check if GPU is available and set the device.  
- Create a tensor and move it to the selected device.



### **Exercise 2: Reproducibility**
- Set seeds for `torch`, `numpy`, and Python `random`  
- Ensure deterministic GPU results



### **Exercise 3: Data Handling**
- Create a custom `Dataset` with 10 samples of 3 features each  
- Use `DataLoader` to batch the data (batch size = 2) and shuffle it



### **Exercise 4: Model Saving & Loading**
- Create a simple linear model  
- Save its `state_dict` and load it back into a new model  
- Set the loaded model to evaluation mode



### **Exercise 5: Mixed Precision & Gradient Clipping**
- Train a simple linear model with:
  - Mixed precision (`torch.cuda.amp`)  
  - Gradient clipping (`clip_grad_norm_`)
- Use a small dataset with 12 samples


In [7]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
import random

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# ----------------------------
# Exercise 1: Device Management
# ----------------------------
x = torch.randn(2, 3).to(device)
print("Tensor device:", x.device)

# ----------------------------
# Exercise 2: Reproducibility
# ----------------------------
seed = 42
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
np.random.seed(seed)
random.seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# ----------------------------
# Exercise 3: Data Handling
# ----------------------------
class MyDataset(Dataset):
    def __init__(self):
        self.data = torch.randn(10, 3)
        self.labels = torch.randint(0, 2, (10,))
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

dataset = MyDataset()
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
for xb, yb in dataloader:
    print("Batch x:", xb)
    print("Batch y:", yb)
    break

# ----------------------------
# Exercise 4: Model Saving & Loading
# ----------------------------
model = nn.Linear(3, 2)
torch.save(model.state_dict(), "model_weights.pth")

model_loaded = nn.Linear(3, 2)
model_loaded.load_state_dict(torch.load("model_weights.pth", map_location="cpu", weights_only=True))
model_loaded.eval()
print("Loaded model:", model_loaded)

# ----------------------------
# Exercise 5: Mixed Precision & Gradient Clipping
# ----------------------------
x_train = torch.randn(12, 3)
y_train = torch.randint(0, 2, (12,))
train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)

model = nn.Linear(3, 2).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
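use_amp = device.type == 'cuda'  # enable mixed precision only on a CUDA device
scaler = torch.amp.GradScaler('cuda', enabled=use_amp)

for epoch in range(2):
    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        with torch.autocast(device_type=device.type, enabled=use_amp):
            out = model(xb)
            loss = loss_fn(out, yb)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)  # clip the true gradients, not the scaled ones
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(optimizer)
        scaler.update()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")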


Tensor device: cpu
Batch x: tensor([[ 1.6487, -0.3925, -1.4036],
        [ 1.9269,  1.4873,  0.9007]])
Batch y: tensor([1, 1])
Loaded model: Linear(in_features=3, out_features=2, bias=True)
Epoch 1, Loss: 0.6941
Epoch 2, Loss: 0.4391


