# PyTorch Deep Learning Tutorials

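This notebook mirrors the Python scripts in this directory and walks through core PyTorch functionality: tensors and autograd, an MLP and a CNN trained on MNIST, fine-tuning a pretrained ResNet on CIFAR-10, and an MNIST autoencoder.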

PyTorch documentation: [PyTorch Docs](https://pytorch.org/docs/stable/index.html)

In [None]:
import torch
from torch import nn
from torchvision import datasets, transforms, models
from torch.utils.data import DataLoader, random_split
import matplotlib.pyplot as plt

## 0. Tensor basics

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', device)

In [None]:
a = torch.tensor([[1.,2.],[3.,4.]], device=device)
b = torch.rand(2,2, device=device)
print('Tensor a:', a)
print('Tensor b:', b)
print('a + b:', a + b)
print('a @ b:', a @ b)

In [None]:
x = torch.tensor([2.0], requires_grad=True, device=device)
y = x**2 + 3*x + 1
y.backward()
print('dy/dx at x=2:', x.grad.item())
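
As a sanity check on the autograd cell above, the derivative can be worked out by hand: for y = x² + 3x + 1, dy/dx = 2x + 3, which is 7 at x = 2. The sketch below uses plain Python (no PyTorch) and compares that analytic value with a central finite difference. The helper names `f`, `analytic_grad`, and `numeric_grad` are illustrative and not part of the original scripts.

```python
def f(x):
    # Same function as in the autograd cell
    return x**2 + 3*x + 1

def analytic_grad(x):
    # Derivative worked out by hand: d/dx (x^2 + 3x + 1) = 2x + 3
    return 2*x + 3

def numeric_grad(func, x, h=1e-5):
    # Central finite difference approximation of the derivative
    return (func(x + h) - func(x - h)) / (2*h)

x0 = 2.0
print('analytic dy/dx:', analytic_grad(x0))
print('numeric  dy/dx:', numeric_grad(f, x0))
```

Both values should agree with `x.grad.item()` up to floating-point error.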

In [None]:
w = torch.randn(1, requires_grad=True, device=device)
b_param = torch.randn(1, requires_grad=True, device=device)

xs = torch.linspace(-1, 1, 10, device=device)
ys = 2 * xs + 1

In [None]:
opt = torch.optim.SGD([w, b_param], lr=0.1)
loss_fn = torch.nn.MSELoss()

In [None]:
num_iterations = 100
for i in range(num_iterations):
    opt.zero_grad()
    preds = w * xs + b_param
    loss = loss_fn(preds, ys)
    print('Iteration:', i, 'Loss:', loss.item())
    loss.backward()
    opt.step()

print('Fitted parameters:', w.item(), b_param.item())
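
To make the optimizer loop less of a black box, here is a minimal plain-Python sketch of the same fit, with the MSE gradients written out by hand instead of computed by `loss.backward()`. It assumes a starting point of zero for both parameters (the cells above use random initial values), and the names `xs_list`, `ys_list`, `w_hat`, and `b_hat` are illustrative only.

```python
# Same data as above: 10 evenly spaced points in [-1, 1], targets y = 2x + 1
n = 10
xs_list = [-1 + 2*i/(n - 1) for i in range(n)]
ys_list = [2*x + 1 for x in xs_list]

w_hat, b_hat = 0.0, 0.0   # assumed initial values, not the random ones used above
lr = 0.1

for _ in range(100):
    # Residuals of the current linear model
    errors = [w_hat*x + b_hat - y for x, y in zip(xs_list, ys_list)]
    # Gradients of mean((w*x + b - y)^2) with respect to w and b
    grad_w = 2 * sum(e*x for e, x in zip(errors, xs_list)) / n
    grad_b = 2 * sum(errors) / n
    # Plain gradient descent step, the same update rule SGD applies without momentum
    w_hat -= lr * grad_w
    b_hat -= lr * grad_b

print('Fitted parameters:', w_hat, b_hat)
```

With this data and learning rate the parameters approach the true values w = 2 and b = 1.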

## 1. Train a simple MLP

In [None]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])
full_train = datasets.MNIST(root='data', train=True, download=True, transform=transform)

In [None]:
train_size = int(0.8 * len(full_train))
val_size = len(full_train) - train_size
train_ds, val_ds = random_split(full_train, [train_size, val_size])

In [None]:
test_ds = datasets.MNIST(root='data', train=False, download=True, transform=transform)

In [None]:
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=64)
test_loader = DataLoader(test_ds, batch_size=64)

### Architecture Explanation: Simple MLP for MNIST

The model defined here is a simple Multi-Layer Perceptron (MLP) designed for image classification on the MNIST dataset. The architecture consists of the following layers:

- **Flatten**: Converts each 28x28 input image into a 784-dimensional vector.
- **Linear (784 → 128)**: A fully connected layer mapping the flattened input to 128 hidden units.
- **ReLU**: Applies the Rectified Linear Unit activation function, introducing non-linearity.
- **Linear (128 → 10)**: A fully connected output layer mapping the 128 features to 10 classes (digits 0–9).

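The final output is a vector of 10 unnormalized class scores (logits), one per digit. `nn.CrossEntropyLoss` applies the softmax internally, so the model itself has no softmax layer.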
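
The layer sizes listed above determine the parameter count. The short sketch below does the arithmetic in plain Python, assuming each `Linear` layer has a bias term (the PyTorch default). The names `mlp_layers`, `linear_params`, `mlp_per_layer`, and `mlp_total` are illustrative.

```python
# (in_features, out_features) for the two Linear layers described above
mlp_layers = [(28 * 28, 128), (128, 10)]

def linear_params(n_in, n_out):
    # Weight matrix (n_out x n_in) plus one bias per output unit
    return n_in * n_out + n_out

mlp_per_layer = [linear_params(n_in, n_out) for n_in, n_out in mlp_layers]
mlp_total = sum(mlp_per_layer)

print('Parameters per Linear layer:', mlp_per_layer)
print('Total trainable parameters:', mlp_total)
```

Once the model cell below has been run, `sum(p.numel() for p in model.parameters())` should report the same total, since `Flatten` and `ReLU` have no parameters.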

In [None]:
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28*28, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
).to(device)

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

In [None]:
train_losses, val_losses = [], []
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    epoch_loss = 0
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        opt.zero_grad()
        out = model(imgs)
        loss = loss_fn(out, labels)
        loss.backward()
        opt.step()
        epoch_loss += loss.item()
    
    train_losses.append(epoch_loss / len(train_loader))
    model.eval()
    val_loss = 0
    
    with torch.no_grad():
        for imgs, labels in val_loader:
            imgs, labels = imgs.to(device), labels.to(device)
            out = model(imgs)
            val_loss += loss_fn(out, labels).item()
    
    val_losses.append(val_loss / len(val_loader))
    print(f'Epoch {epoch+1}: train_loss={train_losses[-1]:.4f}, val_loss={val_losses[-1]:.4f}')


### Evaluate MLP

In [None]:
plt.figure()
plt.plot(range(1, num_epochs+1), train_losses, label='Train')
plt.plot(range(1, num_epochs+1), val_losses, label='Val')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
model.eval()
correct = 0
with torch.no_grad():
    for imgs, labels in test_loader:
        preds = model(imgs.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
acc = correct / len(test_ds)
print('Test accuracy:', acc)

In [None]:
imgs, labels = next(iter(test_loader))

with torch.no_grad():
    preds = model(imgs.to(device)).argmax(dim=1).cpu()
    
fig, axes = plt.subplots(1,6,figsize=(12,2))

for i, ax in enumerate(axes):
    ax.imshow(imgs[i].squeeze(), cmap='gray')
    ax.set_title(f'{preds[i].item()} / {labels[i].item()}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## 2. Train a basic CNN

### Architecture Explanation: SimpleCNN for MNIST

The `SimpleCNN` model is a basic convolutional neural network designed for image classification on the MNIST dataset. Its architecture consists of two main parts:

- **Convolutional Feature Extractor (`self.conv`)**:
    - `Conv2d(1, 32, 3, padding=1)`: Applies 32 filters of size 3x3 to the input image (1 channel), preserving spatial dimensions with padding.
    - `ReLU()`: Introduces non-linearity.
    - `MaxPool2d(2)`: Reduces spatial dimensions by half (from 28x28 to 14x14).
    - `Conv2d(32, 64, 3, padding=1)`: Applies 64 filters of size 3x3, again preserving spatial dimensions.
    - `ReLU()`: Non-linearity.
    - `MaxPool2d(2)`: Further reduces spatial dimensions by half (from 14x14 to 7x7).

- **Fully Connected Classifier (`self.fc`)**:
    - `Flatten()`: Flattens the output from the convolutional layers (shape: 64 channels × 7 × 7 = 3136 features).
    - `Linear(3136, 128)`: Fully connected layer with 128 hidden units.
    - `ReLU()`: Non-linearity.
    - `Linear(128, 10)`: Output layer mapping to 10 classes (digits 0–9).

The model processes input images through the convolutional layers to extract features, then classifies them using the fully connected layers.
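
The spatial sizes quoted above (28 → 14 → 7) and the 3136 flattened features follow from the standard convolution and pooling output-size formulas. The plain-Python sketch below traces them and counts the parameters, assuming stride 1 and dilation 1 for the convolutions, a pooling stride equal to the kernel size, and bias terms everywhere (all PyTorch defaults). The helper names are illustrative.

```python
def conv_out(size, kernel, padding=0, stride=1):
    # Output size of a convolution along one spatial dimension (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # MaxPool2d(kernel) uses stride == kernel by default
    return (size - kernel) // kernel + 1

def conv_params(c_in, c_out, kernel):
    # One kernel x kernel filter per (input, output) channel pair, plus biases
    return c_in * c_out * kernel * kernel + c_out

def fc_params(n_in, n_out):
    # Weight matrix plus biases
    return n_in * n_out + n_out

cnn_sizes = [28]
cnn_sizes.append(conv_out(cnn_sizes[-1], kernel=3, padding=1))  # Conv2d(1, 32, 3, padding=1)
cnn_sizes.append(pool_out(cnn_sizes[-1], 2))                    # MaxPool2d(2)
cnn_sizes.append(conv_out(cnn_sizes[-1], kernel=3, padding=1))  # Conv2d(32, 64, 3, padding=1)
cnn_sizes.append(pool_out(cnn_sizes[-1], 2))                    # MaxPool2d(2)

flat_features = 64 * cnn_sizes[-1] ** 2
cnn_total = (conv_params(1, 32, 3) + conv_params(32, 64, 3)
             + fc_params(flat_features, 128) + fc_params(128, 10))

print('Spatial size after each conv/pool layer:', cnn_sizes)
print('Flattened features:', flat_features)
print('Total trainable parameters:', cnn_total)
```

Most of the parameters sit in the first fully connected layer, not in the convolutions.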

In [None]:
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1,32,3,padding=1),
            nn.ReLU(), 
            nn.MaxPool2d(2),
            nn.Conv2d(32,64,3,padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )

        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(7*7*64,128), 
            nn.ReLU(), 
            nn.Linear(128,10)
        )
        
    def forward(self,x):
        return self.fc(self.conv(x))

In [None]:
model = SimpleCNN().to(device)
criterion = nn.CrossEntropyLoss()
learning_rate = 1e-3
opt = torch.optim.Adam(model.parameters(), lr=learning_rate)
train_losses, val_losses = [], []
num_epochs = 10

In [None]:
for epoch in range(num_epochs):
    
    model.train()
    epoch_loss=0
    
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        opt.zero_grad()
        out = model(imgs)
        loss = criterion(out, labels)
        loss.backward()
        opt.step()
        epoch_loss += loss.item()
    train_losses.append(epoch_loss/len(train_loader))
    
    model.eval()
    val_loss=0
    
    with torch.no_grad():
        for imgs, labels in val_loader:
            val_loss += criterion(model(imgs.to(device)), labels.to(device)).item()
    val_losses.append(val_loss/len(val_loader))
    
    print(f'Epoch {epoch+1}: train_loss={train_losses[-1]:.4f}, val_loss={val_losses[-1]:.4f}')

### Evaluate CNN

In [None]:
plt.figure()
plt.plot(range(1,num_epochs+1), train_losses, label='Train')
plt.plot(range(1,num_epochs+1), val_losses, label='Val')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
model.eval()
correct=0
with torch.no_grad():
    for imgs, labels in test_loader:
        preds = model(imgs.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
acc = correct / len(test_ds)
print('Test accuracy:', acc)

In [None]:
imgs, labels = next(iter(test_loader))

with torch.no_grad():
    preds = model(imgs.to(device)).argmax(dim=1).cpu()

fig, axes = plt.subplots(1,6,figsize=(12,2))

for i, ax in enumerate(axes):
    ax.imshow(imgs[i].squeeze(), cmap='gray')
    ax.set_title(f'{preds[i].item()} / {labels[i].item()}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## 3. Fine-tune a pretrained ResNet

In [None]:
# Define the CIFAR-10 transform first so the dataset below does not pick up the earlier MNIST transform
transform = transforms.Compose(
                [transforms.Resize((224,224)), 
                 transforms.ToTensor(), 
                 transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))]
            )

In [None]:
full_train = datasets.CIFAR10(root='data', train=True, download=True, transform=transform)

train_size = int(0.8 * len(full_train))
val_size = len(full_train) - train_size

In [None]:
train_ds, val_ds = random_split(full_train, [train_size, val_size])

In [None]:
test_ds = datasets.CIFAR10(root='data', train=False, download=True, transform=transform)

In [None]:
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)
test_loader = DataLoader(test_ds, batch_size=32)

In [None]:
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model

In [None]:
model.fc = nn.Linear(model.fc.in_features, 10)

In [None]:
model.to(device)
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

In [None]:
train_losses, val_losses = [], []
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    epoch_loss=0
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        opt.zero_grad()
        out = model(imgs)
        loss = criterion(out, labels)
        loss.backward()
        opt.step()
        epoch_loss += loss.item()

    train_losses.append(epoch_loss/len(train_loader))
    
    model.eval()
    val_loss=0
    
    with torch.no_grad():
        for imgs, labels in val_loader:
            val_loss += criterion(model(imgs.to(device)), labels.to(device)).item()
    
    val_losses.append(val_loss/len(val_loader))
    print(f'Epoch {epoch+1}: train_loss={train_losses[-1]:.4f}, val_loss={val_losses[-1]:.4f}')

### Evaluate ResNet

In [None]:
plt.figure()
plt.plot(range(1,num_epochs+1), train_losses, label='Train')
plt.plot(range(1,num_epochs+1), val_losses, label='Val')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
model.eval()
correct=0
with torch.no_grad():
    for imgs, labels in test_loader:
        preds = model(imgs.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
acc = correct / len(test_ds)
print('Test accuracy:', acc)

In [None]:
imgs, labels = next(iter(test_loader))
with torch.no_grad():
    preds = model(imgs.to(device)).argmax(dim=1).cpu()

imgs = imgs * 0.5 + 0.5
fig, axes = plt.subplots(2,3,figsize=(9,6))
for i, ax in enumerate(axes.flat):
    ax.imshow(imgs[i].permute(1,2,0))
    ax.set_title(f'{preds[i].item()} / {labels[i].item()}')
    ax.axis('off')

plt.tight_layout()
plt.show()

## 4. Autoencoder for MNIST

In [None]:
class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28*28,64),
            nn.ReLU()
        )

        self.decoder = nn.Sequential(
            nn.Linear(64,28*28), 
            nn.Sigmoid(), 
            nn.Unflatten(1,(1,28,28))
        )

    def forward(self,x):
        return self.decoder(self.encoder(x))

In [None]:
# Redefine the MNIST dataset here. Pixels stay in [0, 1] (no Normalize) so the
# reconstruction targets match the range of the decoder's Sigmoid output.
transform = transforms.ToTensor()
full_train = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_size = int(0.8 * len(full_train))
val_size = len(full_train) - train_size
train_ds, val_ds = random_split(full_train, [train_size, val_size])
test_ds = datasets.MNIST(root='data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=64)
test_loader = DataLoader(test_ds, batch_size=64)

In [None]:
model = AutoEncoder().to(device)
criterion = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

In [None]:
train_losses, val_losses = [], []
num_epochs = 10

for epoch in range(num_epochs):
    model.train()
    epoch_loss=0
    for imgs, _ in train_loader:
        imgs = imgs.to(device)
        opt.zero_grad()
        out = model(imgs)
        loss = criterion(out, imgs)
        loss.backward()
        opt.step()
        epoch_loss += loss.item()
    
    train_losses.append(epoch_loss/len(train_loader))
    
    model.eval()
    val_loss=0
    with torch.no_grad():
        for imgs, _ in val_loader:
            val_loss += criterion(model(imgs.to(device)), imgs.to(device)).item()
    
    val_losses.append(val_loss/len(val_loader))
    print(f'Epoch {epoch+1}: train_loss={train_losses[-1]:.4f}, val_loss={val_losses[-1]:.4f}')

### Evaluate Autoencoder

In [None]:
plt.figure()
plt.plot(range(1,num_epochs+1), train_losses, label='Train')
plt.plot(range(1,num_epochs+1), val_losses, label='Val')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [None]:
model.eval()
test_loss=0
with torch.no_grad():
    for imgs, _ in test_loader:
        out = model(imgs.to(device))
        test_loss += criterion(out, imgs.to(device)).item()
test_loss /= len(test_loader)
print('Test reconstruction loss:', test_loss)

In [None]:
imgs, _ = next(iter(test_loader))
with torch.no_grad():
    recon = model(imgs.to(device)).cpu()

fig, axes = plt.subplots(2,6,figsize=(12,4))
for i in range(6):
    axes[0,i].imshow(imgs[i].squeeze(), cmap='gray'); axes[0,i].axis('off')
    axes[1,i].imshow(recon[i].squeeze(), cmap='gray'); axes[1,i].axis('off')

plt.tight_layout()
plt.show()