# Lab 1: PyTorch Foundations for Computer Vision

**Course**: Deep Learning for Image Analysis

**Class**: M2 IASD App  

**Professor**: Mehyar MLAWEH

---

## Objectives
By the end of this lab, you should be able to:

- Understand how **neurons and layers** are implemented in PyTorch
- Manipulate **tensors** and reason about shapes
- Use **autograd** to compute gradients
- Implement a **training loop** yourself
- Connect theory (neurons, loss, backprop) to actual code

⚠️ This notebook is **intentionally incomplete**.  
Whenever you see **`# TODO`**, you are expected to write code.


**Deadline:** 🗓️ **Saturday, February 7th (23:59)**

## 🤖 A small (honest) note before you start

Let’s be real for a second.

I know you **can use LLMs (ChatGPT, Copilot, Claude, etc.)** to help you with this lab.  
And yes, **I use them too**, so don’t worry 😄

👉 **You are allowed to use AI tools.**  
But here’s the deal:

- Don’t just **copy-paste** code you don’t understand  
- Take time to **read, question, and modify** what the model gives you  
- If you can solve a block **by yourself, without AI**, that’s excellent

Remember:

> AI can write code for you, but **only you can understand it**, and understanding is what matters for exams, projects, and real work.

Use these tools **as assistants, not as replacements for thinking**.

---

## 📚 Useful documentation (highly recommended)

You will often find answers faster (and more reliably) by checking the official documentation:

- **PyTorch main documentation**  
  https://pytorch.org/docs/stable/index.html

- **PyTorch tensors**  
  https://pytorch.org/docs/stable/tensors.html

- **Neural network modules (`torch.nn`)**  
  https://pytorch.org/docs/stable/nn.html

- **Loss functions** (`BCEWithLogitsLoss`, CrossEntropy, etc.)  
  https://pytorch.org/docs/stable/nn.html#loss-functions

- **Optimizers** (`SGD`, `Adam`, …)  
  https://pytorch.org/docs/stable/optim.html

If you learn how to **navigate the documentation**, you are already thinking like a real AI engineer 👌

---

## PART I

## 0) Colab setup: GPU check

**Instructions**
1. In Colab: `Runtime → Change runtime type`
2. Select **GPU** (T4)
3. Save and restart the runtime

Then run the cell below.


In [2]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# TODO: set the device correctly (cuda if available, else cpu)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Using device:", device)


PyTorch version: 2.9.0+cu126
CUDA available: True
Using device: cuda


## 1) Imports and reproducibility


In [3]:
import torch
import torch.nn as nn
import torch.optim as optim

# TODO: fix the random seed for reproducibility
torch.manual_seed(42)


<torch._C.Generator at 0x7c9f5017c5b0>
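
As a quick check of what seeding buys you, the sketch below re-seeds the generator and draws the same random tensor twice. The seed value 42 matches the cell above; the names `first` and `second` are introduced only for this illustration.

```python
import torch

# Re-seeding resets PyTorch's default random number generator,
# so the same sequence of random numbers is produced again.
torch.manual_seed(42)
first = torch.randn(3)

torch.manual_seed(42)
second = torch.randn(3)

print(torch.equal(first, second))  # True: identical draws after identical seeds
```

If you re-run a cell without re-seeding, the generator has advanced and the draws will differ.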

## 2) PyTorch tensors and shapes

Tensors are multi-dimensional arrays that support:
- GPU acceleration
- automatic differentiation

Understanding **shapes** is critical in deep learning.


In [4]:
# Examples
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.randn(4, 5)

print("a shape:", a.shape)
print("b shape:", b.shape)


a shape: torch.Size([3])
b shape: torch.Size([4, 5])


### 🔍 Question (answer inside the markdown)
- How many dimensions does tensor `b` have?
- What does each dimension represent conceptually?


### ✅ Tensor operations

Complete the following:

1. Create a tensor `x` of shape `(8, 3)` with random values  
2. Compute:
   - the **mean of each column**
   - the **L2 norm of each row**
3. Normalize `x` **row-wise** using the L2 norm

In [5]:
# TODO: create x
x = torch.randn(8, 3)

# TODO: column mean
col_mean = torch.mean(x, dim=0)

# TODO: row-wise L2 norm
row_norm = torch.norm(x, p=2, dim=1)

# TODO: normalized tensor
x_normalized = x / row_norm.view(-1, 1)

print(x.shape, col_mean.shape, row_norm.shape, x_normalized.shape)


torch.Size([8, 3]) torch.Size([3]) torch.Size([8]) torch.Size([8, 3])
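
One way to sanity-check the row-wise normalization: every row of the result should have an L2 norm of 1. The sketch below also uses `keepdim=True`, an alternative to `view(-1, 1)` that keeps the reduced dimension so broadcasting works directly. It generates its own `x` with its own seed, so the values differ from the cell above.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 3)

# keepdim=True keeps the reduced axis: shape (8, 1) instead of (8,)
row_norm = torch.linalg.vector_norm(x, ord=2, dim=1, keepdim=True)
x_normalized = x / row_norm  # (8, 3) / (8, 1) broadcasts across columns

# Each row should now have unit L2 norm
unit_norms = torch.linalg.vector_norm(x_normalized, ord=2, dim=1)
print(row_norm.shape)
print(torch.allclose(unit_norms, torch.ones(8)))
```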


## 3) Artificial neuron: from math to code

A neuron computes:

$$
z = \sum_i w_i x_i + b
$$

Then applies an activation function:

$$
y = g(z)
$$

This section connects directly to the theory seen in class.


In [6]:
x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

z = torch.sum(x * w) + b
z


tensor(-0.8000)
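
The weighted sum above is a dot product, which is also what a single-output `nn.Linear` layer computes. The sketch below reuses the same `x`, `w`, and `b` and copies them into an `nn.Linear(3, 1)` to show the match. Loading weights by hand like this is only for illustration; in practice the layer learns them.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

# Same neuron written as a dot product
z_dot = torch.dot(w, x) + b

# Same neuron as a one-output linear layer
# (weight shape: (1, 3), bias shape: (1,))
layer = nn.Linear(3, 1)
with torch.no_grad():
    layer.weight.copy_(w.view(1, 3))
    layer.bias.copy_(b.view(1))
    z_layer = layer(x)

print(z_dot)    # close to -0.8
print(z_layer)  # one-element tensor holding the same value
```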

### Activation functions

1. Implement **ReLU**
2. Implement **Sigmoid**
3. Apply both to `z` and compare the outputs

Which activation preserves negative values?


In [7]:
# TODO
def relu(z):
  # Element-wise max(z, 0)
  return torch.max(z, torch.zeros_like(z))

def sigmoid(z):
  # Use the argument z here, not the global x
  return 1 / (1 + torch.exp(-z))


y_relu = relu(z)
y_sigmoid = sigmoid(z)
y_relu, y_sigmoid


(tensor(0.), tensor(0.3100))

## 4) Autograd and gradients

PyTorch uses **automatic differentiation** to compute gradients
using the **chain rule** (backpropagation).


In [8]:
x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2

loss.backward()

print("loss:", loss.item())
print("grad w:", w.grad)
print("grad b:", b.grad)


loss: 2.890000104904175
grad w: tensor([-3.4000, -6.8000,  3.4000])
grad b: tensor(-3.4000)
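
These gradients can be checked by hand with the chain rule. With `z = sum(w * x) + b` and `loss = (z - 1)^2`, we get `d(loss)/dw = 2 (z - 1) x` and `d(loss)/db = 2 (z - 1)`. Here `z = 0.5 - 0.6 - 0.8 + 0.2 = -0.7`, so `2 (z - 1) = -3.4`, which matches the printed values. The sketch below compares the hand-derived formulas with autograd; `manual_grad_w` and `manual_grad_b` are names introduced for this illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2
loss.backward()

# Chain rule by hand: d(loss)/dz = 2 (z - 1), dz/dw = x, dz/db = 1
with torch.no_grad():
    dloss_dz = 2 * (z - 1.0)
    manual_grad_w = dloss_dz * x
    manual_grad_b = dloss_dz

print(torch.allclose(w.grad, manual_grad_w))
print(torch.allclose(b.grad, manual_grad_b))
```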


### 🔍 Conceptual question

- If `b.grad > 0`, should `b` increase or decrease after a gradient descent step?
Explain **why** in one sentence.


`b` should decrease: gradient descent minimizes the loss by moving in the direction opposite to the gradient (`b ← b - lr * b.grad`), so subtracting a positive gradient reduces the value of `b`.
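
To make the update rule concrete, the sketch below applies one manual gradient descent step to a scalar parameter whose gradient is positive. The toy loss `(b - 1)^2`, the starting value 3.0, and the learning rate `lr = 0.1` are assumptions chosen for illustration.

```python
import torch

b = torch.tensor(3.0, requires_grad=True)
loss = (b - 1.0) ** 2   # minimized at b = 1
loss.backward()         # b.grad = 2 * (3 - 1) = 4, which is positive

lr = 0.1
b_before = b.item()
with torch.no_grad():
    b -= lr * b.grad    # step against the gradient

print(b.grad.item())              # 4.0
print(b_before, "->", b.item())   # b moves from 3.0 toward the minimum at 1.0
```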

## 5) Toy classification dataset

We create a **linearly separable** dataset.

Label rule:
- class = 1 if `x₁ + x₂ + x₃ > 0`
- else class = 0

This mimics a very simple classification problem.


In [9]:
# TODO: generate a dataset of size N=500 with 3 features
X = torch.randn(500, 3)
y = (torch.sum(X, dim=1) > 0).float()

# TODO: split into train (80%) and validation (20%)
train_size = int(0.8 * len(X))
X_train, X_val = X[:train_size], X[train_size:]
y_train, y_val = y[:train_size], y[train_size:]
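
Because the features are drawn from a zero-mean Gaussian, the two classes should be roughly balanced, which is worth confirming before training. The sketch below generates its own data with its own seed (so values differ from the cell above), checks the fraction of positive labels, and shows a shuffled split with `torch.randperm`, a common choice when rows are not already in random order.

```python
import torch

torch.manual_seed(0)
X = torch.randn(500, 3)
y = (X.sum(dim=1) > 0).float()

# Fraction of positive labels; expected to be near 0.5 for this rule
print("positive fraction:", y.mean().item())

# Shuffled 80/20 split: permute the indices, then slice
perm = torch.randperm(len(X))
train_size = int(0.8 * len(X))
train_idx, val_idx = perm[:train_size], perm[train_size:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)
```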


## 6) Model definition

We define a small **MLP** (fully-connected network):

`3 → 16 → 8 → 1`

Activation: ReLU  
Output: raw logits (no sigmoid)


In [10]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        # 3 -> 16 -> 8 -> 1, ReLU between layers, raw logits at the output
        self.net = nn.Sequential(
            nn.Linear(3, 16),
            nn.ReLU(),
            nn.Linear(16, 8),
            nn.ReLU(),
            nn.Linear(8, 1),
        )

    def forward(self, x):
        return self.net(x)

# TODO: create model and move it to the GPU
model = MLP().to(device)



### Parameters

1. Compute **by hand** the total number of parameters
2. Verify your answer using PyTorch


$$\text{parameters} = (3 \times 16) + 16 + (16 \times 8) + 8 + (8 \times 1) + 1 = 209$$

In [11]:
# TODO: count parameters with PyTorch
total_params = sum(p.numel() for p in model.parameters())
total_params


209
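
To see where the 209 comes from, the count can be broken down per tensor with `named_parameters()`. The sketch below rebuilds the same `3 → 16 → 8 → 1` stack as a plain `nn.Sequential` so that it runs on its own; `counts` and `total` are names introduced for this illustration.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

# Each Linear layer holds a weight of shape (out, in) and a bias of shape (out,)
counts = {name: p.numel() for name, p in net.named_parameters()}
for name, n in counts.items():
    print(f"{name:10s} {n}")

total = sum(counts.values())
print("total:", total)  # 48 + 16 + 128 + 8 + 8 + 1 = 209
```

The ReLU layers contribute nothing to the count because they have no learnable parameters.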

## 7) Training loop

You must complete the full training loop:
- forward pass
- loss computation
- backward pass
- optimizer step

Loss: `BCEWithLogitsLoss`
Optimizer: `SGD`


In [13]:
# TODO: move data to device
X_train_d = X_train.to(device)
y_train_d = y_train.to(device)
X_val_d = X_val.to(device)
y_val_d = y_val.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()

    # TODO: forward
    logits = model(X_train_d)

    # TODO: loss
    loss = criterion(logits.squeeze(), y_train_d)

    # TODO: backward
    loss.backward()

    # TODO: update
    optimizer.step()

    if epoch % 5 == 0:
        print("Epoch", epoch, "| loss =", float(loss))


Epoch 0 | loss = 0.3635922074317932
Epoch 5 | loss = 0.32014164328575134
Epoch 10 | loss = 0.28147804737091064
Epoch 15 | loss = 0.24821636080741882
Epoch 20 | loss = 0.22020456194877625
Epoch 25 | loss = 0.19692149758338928
Epoch 30 | loss = 0.17761117219924927
Epoch 35 | loss = 0.16155460476875305
Epoch 40 | loss = 0.1481115072965622
Epoch 45 | loss = 0.13669680058956146
Epoch 50 | loss = 0.12695583701133728
Epoch 55 | loss = 0.11857117712497711
Epoch 60 | loss = 0.11131421476602554
Epoch 65 | loss = 0.10496898740530014
Epoch 70 | loss = 0.09938684105873108
Epoch 75 | loss = 0.09442509710788727
Epoch 80 | loss = 0.08998428285121918
Epoch 85 | loss = 0.0859915018081665
Epoch 90 | loss = 0.08237683773040771
Epoch 95 | loss = 0.07908732444047928
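
The loop feeds raw logits to `BCEWithLogitsLoss`, which applies the sigmoid internally. As a sanity check, the sketch below compares it with an explicit `torch.sigmoid` followed by `nn.BCELoss` on a few hand-picked logits; the values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
targets = torch.tensor([0.0, 0.0, 1.0, 1.0, 1.0])

# Fused: sigmoid + binary cross-entropy in one call, on raw logits
fused = nn.BCEWithLogitsLoss()(logits, targets)

# Two-step: explicit sigmoid, then BCELoss on probabilities
two_step = nn.BCELoss()(torch.sigmoid(logits), targets)

print(fused.item(), two_step.item())
print(torch.allclose(fused, two_step, atol=1e-6))
```

For moderate logits like these the two agree. The PyTorch documentation describes the fused version as more numerically stable, which matters once logits become large in magnitude.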


## 8) Evaluation

1. Apply `sigmoid` to the logits
2. Convert probabilities to predictions
3. Compute **accuracy** on the validation set


In [14]:
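# TODO: evaluation
model.eval()
with torch.no_grad():
    # squeeze(1) turns the (N, 1) logits into shape (N,), matching y_val_d.
    # Comparing (N, 1) with (N,) would broadcast to (N, N) and report
    # roughly 50% on balanced labels, whatever the model has learned.
    logits = model(X_val_d).squeeze(1)
    probs = torch.sigmoid(logits)
    preds = (probs > 0.5).float()

accuracy = (preds == y_val_d).float().mean()
print(f"Accuracy: {accuracy.item() * 100:.1f} %")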
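
A shape detail worth checking in this step: the model outputs logits of shape `(N, 1)`, while the labels have shape `(N,)`. Comparing the two with `==` broadcasts to an `(N, N)` matrix, so the mean no longer measures accuracy. The sketch below uses small made-up tensors (perfect predictions, by construction) to show the difference.

```python
import torch

y_true = torch.tensor([1.0, 0.0, 1.0, 0.0])            # shape (4,)
preds_2d = torch.tensor([[1.0], [0.0], [1.0], [0.0]])  # shape (4, 1), all correct

# Mismatched shapes: (4, 1) == (4,) broadcasts to (4, 4)
wrong = (preds_2d == y_true)
print(wrong.shape)                  # torch.Size([4, 4])
print(wrong.float().mean().item())  # 0.5, although every prediction is right

# Matching shapes: drop the trailing dimension first
right = (preds_2d.squeeze(1) == y_true)
print(right.shape)                  # torch.Size([4])
print(right.float().mean().item())  # 1.0
```

Printing `preds.shape` and the label shape before comparing is a cheap way to catch this.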

## 9) Reflection questions (answer inside the markdown)

1. Why do we **not** apply sigmoid inside the model?
2. What would happen if we removed all ReLU activations?
3. How does this toy problem relate to image classification?

Write short answers (2 to 3 lines each).


1. Why do we *not* apply sigmoid inside the model?

We do not apply it in `forward` because the loss function, `BCEWithLogitsLoss`, already applies the sigmoid internally, in a numerically more stable way than a separate sigmoid followed by `BCELoss`.

2. What would happen if we removed all ReLU activations?

The network would become equivalent to a simple linear model, because a composition of linear transformations is still linear. It would then be unable to learn complex (non-linear) decision boundaries.

3. How does this toy problem relate to image classification?

It is exactly the same mathematical principle: the model learns weights that map inputs to an output. The only difference is scale: here we process 3 features, whereas an image is just a much larger input vector made of thousands of pixels.
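
The claim in answer 2 can be verified numerically: two stacked `nn.Linear` layers with no activation in between compute the same function as one linear map with weight `W2 @ W1` and bias `W2 @ b1 + b2`. The layer sizes below mirror the first two layers of the MLP; everything else is a sketch for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin1 = nn.Linear(3, 16)
lin2 = nn.Linear(16, 8)
x = torch.randn(5, 3)

with torch.no_grad():
    # Two linear layers applied back to back, no ReLU in between
    stacked = lin2(lin1(x))

    # Equivalent single linear map: W = W2 W1, b = W2 b1 + b2
    W = lin2.weight @ lin1.weight            # shape (8, 3)
    b = lin2.weight @ lin1.bias + lin2.bias  # shape (8,)
    collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-5))
```

Inserting a ReLU between the two layers breaks this equivalence, which is what gives the network its extra expressive power.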

## 10) Bridge to Computer Vision

So far:
- inputs = vectors of size 3
- layers = fully-connected

Next session:
- inputs = images `(B, C, H, W)`
- layers = convolutions
- same training logic

👉 **Architecture changes, learning principles stay the same.**
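
To preview the `(B, C, H, W)` convention, the sketch below pushes a batch of random "images" through a single `nn.Conv2d`. The batch size, the 28×28 single-channel size (chosen to resemble MNIST), and the 8 output channels are assumptions for illustration, not the architecture of the next lab.

```python
import torch
import torch.nn as nn

# Batch of 4 grayscale images: (B, C, H, W) = (4, 1, 28, 28)
images = torch.randn(4, 1, 28, 28)

# 3x3 convolution, 1 input channel -> 8 output channels;
# padding=1 keeps the spatial size unchanged
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
features = conv(images)

print(images.shape)    # torch.Size([4, 1, 28, 28])
print(features.shape)  # torch.Size([4, 8, 28, 28])
```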


## Part II: Training on MNIST

See the next notebook.