# Lab1: PyTorch Foundations for Computer Vision

**Course**: Deep Learning for Image Analysis 

**Class**: M2 IASD App  

**Professor**: Mehyar MLAWEH

---

## Objectives
By the end of this lab, you should be able to:

- Understand how **neurons and layers** are implemented in PyTorch
- Manipulate **tensors** and reason about shapes
- Use **autograd** to compute gradients
- Implement a **training loop** yourself
- Connect theory (neurons, loss, backprop) to actual code

⚠️ This notebook is **intentionally incomplete**.  
Whenever you see **`# TODO`**, you are expected to write code.


**Deadline:** 🗓️ **Saturday, February 7th (23:59)**

## 🤖 A small (honest) note before you start

Let’s be real for a second.

I know you **can use LLMs (ChatGPT, Copilot, Claude, etc.)** to help you with this lab.  
And yes, **I use them too**, so don’t worry 😄

👉 **You are allowed to use AI tools.**  
But here’s the deal:

- Don’t just **copy-paste** code you don’t understand  
- Take time to **read, question, and modify** what the model gives you  
- If you can solve a block **by yourself, without AI**, that’s excellent 

Remember:

> AI can write code for you, but **only you can understand it**, and understanding is what matters for exams, projects, and real work.

Use these tools **as assistants, not as replacements for thinking**.

---

## 📚 Useful documentation (highly recommended)

You will often find answers faster (and more reliably) by checking the official documentation:

- **PyTorch main documentation**  
  https://pytorch.org/docs/stable/index.html

- **PyTorch tensors**  
  https://pytorch.org/docs/stable/tensors.html

- **Neural network modules (`torch.nn`)**  
  https://pytorch.org/docs/stable/nn.html

- **Loss functions** (`BCEWithLogitsLoss`, CrossEntropy, etc.)  
  https://pytorch.org/docs/stable/nn.html#loss-functions

- **Optimizers** (`SGD`, `Adam`, …)  
  https://pytorch.org/docs/stable/optim.html

If you learn how to **navigate the documentation**, you are already thinking like a real AI engineer 👌

---

## PART I

## 0) Colab setup: GPU check

**Instructions**
1. In Colab: `Runtime → Change runtime type`
2. Select **T4 GPU**
3. Save and restart runtime

Then run the cell below.


In [None]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# TODO: set the device correctly (cuda if available, else cpu)

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print("Using device:", device)


PyTorch version: 2.8.0+cpu
CUDA available: False
Using device: cpu
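Once `device` is set, tensors have to live on that device before they can be combined. The snippet below is a minimal sketch of the two usual ways to get them there; the tensor names `t`, `t_dev` and `u` are only illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.randn(2, 3)                  # created on the CPU by default
t_dev = t.to(device)                   # copy (or no-op) onto the chosen device
u = torch.zeros(2, 3, device=device)   # created directly on the chosen device

print(t_dev.device, u.device)
print((t_dev + u).shape)               # both operands are on the same device
```

On a CPU-only runtime `.to(device)` simply returns the same tensor, so the code runs unchanged with or without a GPU.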


## 1) Imports and reproducibility


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim

# TODO: fix the random seed for reproducibility

torch.manual_seed(42)

# Three main consequences:
# - weight initialization: every time the same network is retrained from scratch, its weights are initialized identically
# - data shuffling: the shuffle order used during training is reproduced from one run to the next
# - dropout: if dropout is used, the deactivated neurons are the same from one run to the next

<torch._C.Generator at 0x2376f4da6f0>
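To see what the seed actually does, here is a small sketch that re-seeds the global generator and draws the same tensor twice. The names `first`, `second` and `third` are made up for this example.

```python
import torch

# Re-seeding the global generator replays the same random sequence.
torch.manual_seed(42)
first = torch.randn(3)

torch.manual_seed(42)
second = torch.randn(3)

print(torch.equal(first, second))  # True: both draws are identical

# Without re-seeding, the generator keeps advancing, so the next draw differs.
third = torch.randn(3)
print(torch.equal(first, third))
```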

## 2) PyTorch tensors and shapes

Tensors are multi-dimensional arrays that support:
- GPU acceleration
- automatic differentiation

Understanding **shapes** is critical in deep learning.


In [9]:
# Examples
a = torch.tensor([1.0, 2.0, 3.0]) # vector of size 3
b = torch.randn(4, 5) # 4 x 5 matrix filled with samples from N(0, 1)

print("a shape:", a.shape)
print("b shape:", b.shape)


a shape: torch.Size([3])
b shape: torch.Size([4, 5])


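### 🔍 Question (answer inside the markdown)
- How many dimensions does tensor `b` have? It has two dimensions (`b.ndim == 2`).
- What does each dimension represent conceptually? The question is ambiguous for a random matrix: dim 0 indexes the 4 rows and dim 1 indexes the 5 columns. In a dataset, these would typically be samples and features.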
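To make shape reasoning a bit more concrete, here is a sketch with a hypothetical batch of images. The sizes (4 images, 3 channels, 32 x 32 pixels) are assumptions chosen only for illustration.

```python
import torch

# Hypothetical batch: 4 RGB images of 32 x 32 pixels, laid out as (B, C, H, W).
images = torch.randn(4, 3, 32, 32)
print(images.ndim)    # 4
print(images.shape)   # torch.Size([4, 3, 32, 32])

# Flatten every image into one vector, keeping the batch dimension.
flat = images.flatten(start_dim=1)
print(flat.shape)     # torch.Size([4, 3072])

# unsqueeze adds a dimension of size 1 at the given position.
v = torch.tensor([1.0, 2.0, 3.0])
print(v.unsqueeze(0).shape)  # torch.Size([1, 3])
print(v.unsqueeze(1).shape)  # torch.Size([3, 1])
```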


### ✅ Tensor operations

Complete the following:

1. Create a tensor `x` of shape `(8, 3)` with random values  
2. Compute:
   - the **mean of each column**
   - the **L2 norm of each row**
3. Normalize `x` **row-wise** using the L2 norm

In [None]:
# TODO: create x
x = torch.randn(8,3)
print(x)

# TODO: column mean
col_mean = torch.mean(x, 0) # signature: torch.mean(input, dim, keepdim=False, *, dtype=None, out=None)
# -> `dim` is the dimension that gets collapsed by the mean. To get one mean per column, we collapse the rows, hence dim=0.
# -> NOTE: dim=d removes dimension d from the output tensor. Use keepdim=True to keep it as a dimension of size 1.

# TODO: row-wise L2 norm
row_norm = torch.linalg.norm(x, dim=1, keepdim=True) # keepdim=True is required for the broadcasted division below

# TODO: normalized tensor
x_normalized = x / row_norm
print(x_normalized)

print(x.shape, col_mean.shape, row_norm.shape, x_normalized.shape)


tensor([[-0.4138,  0.5184, -0.7015],
        [-0.4323,  0.1415,  0.0711],
        [ 0.5634, -0.5786, -0.9437],
        [ 0.1730, -1.8815,  0.5851],
        [ 1.5287, -0.9324,  1.3527],
        [ 0.1603,  0.5374,  0.7817],
        [ 1.0477, -0.3948,  1.6077],
        [-0.8064,  0.0732, -2.0952]])
tensor([[-0.4286,  0.5370, -0.7266],
        [-0.9390,  0.3073,  0.1544],
        [ 0.4535, -0.4659, -0.7598],
        [ 0.0875, -0.9512,  0.2958],
        [ 0.6812, -0.4155,  0.6028],
        [ 0.1666,  0.5586,  0.8126],
        [ 0.5348, -0.2015,  0.8206],
        [-0.3590,  0.0326, -0.9328]])
torch.Size([8, 3]) torch.Size([3]) torch.Size([8, 1]) torch.Size([8, 3])


Visualizing the normalization

$$X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23} \\
\vdots & \vdots & \vdots \\
x_{81} & x_{82} & x_{83}
\end{pmatrix}$$

$$row\_norm = \begin{pmatrix}
n_1 \\
n_2 \\
\vdots \\
n_8
\end{pmatrix}$$

$$X_{norm} = \begin{pmatrix}
\frac{x_{11}}{n_1} & \frac{x_{12}}{n_1} & \frac{x_{13}}{n_1} \\
\frac{x_{21}}{n_2} & \frac{x_{22}}{n_2} & \frac{x_{23}}{n_2} \\
\vdots & \vdots & \vdots \\
\frac{x_{81}}{n_8} & \frac{x_{82}}{n_8} & \frac{x_{83}}{n_8}
\end{pmatrix}$$
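The picture above can be checked numerically. The sketch below recomputes the normalization on a fresh random `x`, verifies that each row ends up with unit L2 norm, and compares the result with `torch.nn.functional.normalize`, which performs the same operation.

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3)

# keepdim=True keeps the norm as shape (8, 1), so it broadcasts across the 3 columns.
row_norm = torch.linalg.norm(x, dim=1, keepdim=True)
x_normalized = x / row_norm

# Every row of the normalized tensor should now have L2 norm 1.
print(torch.linalg.norm(x_normalized, dim=1))

# F.normalize performs the same row-wise L2 normalization (p=2 by default).
print(torch.allclose(x_normalized, F.normalize(x, dim=1)))
```

Without `keepdim=True`, `row_norm` would have shape `(8,)`, which does not broadcast against `(8, 3)`, so the division would raise an error.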

## 3) Artificial neuron: from math to code

A neuron computes:

$$
z = \sum_i w_i x_i + b
$$

Then applies an activation function:

$$
y = g(z)
$$

This section connects directly to the theory seen in class.


In [23]:
x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

z = torch.sum(x * w) + b
z


tensor(-0.8000)
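The same neuron can be written in several equivalent ways. The sketch below compares the explicit sum, a dot product, and an `nn.Linear(3, 1)` layer whose parameters are overwritten with `w` and `b`; the names `z_sum`, `z_dot`, `z_linear` and `neuron` are only illustrative.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

# Two equivalent ways to compute z = sum_i w_i x_i + b.
z_sum = torch.sum(x * w) + b
z_dot = torch.dot(w, x) + b

# nn.Linear(3, 1) is a single neuron; here its parameters are overwritten with w and b.
neuron = nn.Linear(3, 1)
with torch.no_grad():
    neuron.weight.copy_(w.unsqueeze(0))  # weight has shape (out_features, in_features) = (1, 3)
    neuron.bias.copy_(b.unsqueeze(0))    # bias has shape (1,)
z_linear = neuron(x)

print(z_sum, z_dot, z_linear)
```

All three values should agree up to floating-point rounding, which is why a layer of the MLP later in this lab is just many such neurons computed at once.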

### Activation functions

1. Implement **ReLU**
2. Implement **Sigmoid**
3. Apply both to `z` and compare the outputs

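Which activation preserves negative values?
-> NEITHER. ReLU maps every negative input to 0 and sigmoid maps every input into (0, 1). To keep negative values, use Leaky ReLU (which scales them down) or Tanh (which keeps their sign).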


In [46]:
# TODO
def relu(z):
    # torch.clamp works element-wise on tensors of any shape and always returns a tensor
    return torch.clamp(z, min=0)

def sigmoid(z):
    return 1/(1+torch.exp(-z))

y_relu = relu(z)
y_sigmoid = sigmoid(z)
y_relu, y_sigmoid


(tensor(0.), tensor(0.3100))
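For comparison, PyTorch ships built-in versions of these activations. The sketch below applies four of them to a small hand-picked vector (the values in `z` are arbitrary) to show how each one treats negative inputs.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -0.8, 0.0, 1.5])

print(torch.relu(z))                        # negatives become 0
print(torch.sigmoid(z))                     # every value lands in (0, 1)
print(torch.tanh(z))                        # sign is kept, values squashed into (-1, 1)
print(F.leaky_relu(z, negative_slope=0.01)) # negatives are scaled by 0.01
```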

## 4) Autograd and gradients

PyTorch uses **automatic differentiation** to compute gradients
using the **chain rule** (backpropagation).


In [None]:
x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True) # requires_grad=True tells PyTorch to record every operation applied to this tensor so gradients can be computed
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2

loss.backward() # .backward() runs backpropagation
# -> before .backward(): w.grad is None and b.grad is None
# -> after .backward(): w.grad and b.grad hold the computed gradients

print("loss:", loss.item())
print("grad w:", w.grad)
print("grad b:", b.grad)


loss: 2.890000104904175
grad w: tensor([-3.4000, -6.8000,  3.4000])
grad b: tensor(-3.4000)
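These gradients can be recovered by hand with the chain rule: since $L = (z - 1)^2$ and $z = \sum_i w_i x_i + b$, we get $\partial L / \partial w_i = 2(z - 1)\,x_i$ and $\partial L / \partial b = 2(z - 1)$. The sketch below checks this against autograd; `manual_grad_w` and `manual_grad_b` are names introduced for this example.

```python
import torch

x = torch.tensor([1.0, 2.0, -1.0])
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2
loss.backward()

# Chain rule by hand: dL/dz = 2 (z - 1), dz/dw_i = x_i, dz/db = 1.
dL_dz = 2 * (z.detach() - 1.0)
manual_grad_w = dL_dz * x
manual_grad_b = dL_dz

print(torch.allclose(w.grad, manual_grad_w))  # True
print(torch.allclose(b.grad, manual_grad_b))  # True
```

Here $z = -0.7$, so $2(z - 1) = -3.4$, which matches the printed `grad b`.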


### 🔍 Conceptual question

- If `b.grad > 0`, should `b` increase or decrease after a gradient descent step?
Explain **why** in one sentence.<br><br>

$b.grad = \frac{\partial L}{\partial b}$, so a positive `b.grad` means the loss increases when `b` increases. Gradient descent updates $b \leftarrow b - \eta \, \frac{\partial L}{\partial b}$, so `b` must decrease at the next step.
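A single gradient descent step can be written out by hand. The sketch below uses a hypothetical one-parameter loss $(b + 1)^2$, minimized at $b = -1$, and an assumed learning rate of 0.1; neither comes from the lab itself.

```python
import torch

b = torch.tensor(0.2, requires_grad=True)
loss = (b + 1.0) ** 2      # hypothetical loss, minimized at b = -1
loss.backward()
print(b.grad)              # positive, because b sits to the right of the minimum

lr = 0.1                   # assumed learning rate
with torch.no_grad():
    b_new = b - lr * b.grad   # gradient descent update

new_loss = (b_new + 1.0) ** 2
print(b_new)               # smaller than b
print(new_loss < loss)     # the step reduced the loss
```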

## 5) Toy classification dataset

We create a **linearly separable** dataset.

Label rule:
- class = 1 if `x₁ + x₂ + x₃ > 0`
- class = 0 otherwise

This mimics a very simple classification problem.


In [3]:
# TODO: generate a dataset of size N=500 with 3 features

N=500
X = torch.randn(N,3)
y = torch.where(torch.sum(X, dim=1) > 0, 1, 0) # torch.where(condition, input, other) returns `input` where the condition is True and `other` elsewhere

# TODO: split into train (80%) and validation (20%)

split_size = int(0.8*N)

X_train = X[:split_size , :]
X_val = X[split_size: , :]

y_train = y[:split_size]
y_val = y[split_size:]

# NOTE: this split assumes the data is already shuffled (true here, since the rows of X are i.i.d. random samples)
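When the rows are not guaranteed to be in random order, one common approach is to shuffle the indices before splitting. The following is a sketch of that variant using `torch.randperm`; the seed value and the names `perm`, `train_idx` and `val_idx` are assumptions made for this example.

```python
import torch

torch.manual_seed(0)
N = 500
X = torch.randn(N, 3)
y = (X.sum(dim=1) > 0).long()   # same label rule: 1 if x1 + x2 + x3 > 0

# Shuffle the indices once, then cut at 80 %.
perm = torch.randperm(N)
split_size = int(0.8 * N)
train_idx, val_idx = perm[:split_size], perm[split_size:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

print(X_train.shape, X_val.shape)   # torch.Size([400, 3]) torch.Size([100, 3])
print(y_train.float().mean())       # fraction of class 1 in the training set
```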

## 6) Model definition

We define a small **MLP** (fully-connected network):

`3 → 16 → 8 → 1`

Activation: ReLU  
Output: raw logits (no sigmoid)


In [37]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3,16),
            nn.ReLU(),
            nn.Linear(16,8),
            nn.ReLU(),
            nn.Linear(8,1)
        )

    def forward(self, x):
        return self.net(x)

# TODO: create model and move it to the GPU

model = MLP()

model.to(device)


MLP(
  (net): Sequential(
    (0): Linear(in_features=3, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=8, bias=True)
    (3): ReLU()
    (4): Linear(in_features=8, out_features=1, bias=True)
  )
)

### Parameters

1. Compute **by hand** the total number of parameters

The network should have (3 x 16 + 16) + (16 x 8 + 8) + (8 x 1 + 1) = 64 + 136 + 9 = 209 parameters.

2. Verify your answer using PyTorch


In [38]:
# TODO: count parameters with PyTorch

total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params}")


Total number of parameters: 209
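To see where the 209 comes from, the count can be broken down per tensor with `named_parameters()`. The sketch below rebuilds the same `3 → 16 → 8 → 1` stack as a bare `nn.Sequential` called `net` (a name introduced for this example) so that it runs on its own.

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

# One entry per weight matrix and bias vector; ReLU layers have no parameters.
counts = {name: p.numel() for name, p in net.named_parameters()}
for name, n in counts.items():
    print(f"{name:10s} {n}")

total = sum(counts.values())
print("total", total)  # 209
```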


## 7) Training loop 

You must complete the full training loop:
- forward pass
- loss computation
- backward pass
- optimizer step

Loss: `BCEWithLogitsLoss`<br>
Optimizer: `SGD`


In [None]:
# TODO: move data to device
X_train_d = X_train.to(device)
y_train_d = y_train.to(device).float().unsqueeze(1) # .float(): torch.where() returned int64 labels, but BCEWithLogitsLoss expects float targets; .unsqueeze(1): reshape (N,) to (N, 1) to match the logits
X_val_d = X_val.to(device)
y_val_d = y_val.to(device).float().unsqueeze(1)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    model.train()
    optimizer.zero_grad() # REQUIRED: otherwise gradients from previous iterations accumulate

    # TODO: forward
    logits = model(X_train_d) # calling the module (rather than .forward()) also runs any registered hooks

    # TODO: loss
    loss = criterion(logits, y_train_d)

    # TODO: backward
    loss.backward()

    # TODO: update
    optimizer.step() # THIS is where the gradient descent update rule is applied to the weights

    if epoch % 5 == 0:
        print("Epoch", epoch, "| loss =", loss.item())


Epoch 0 | loss = 0.6947555541992188
Epoch 5 | loss = 0.689077615737915
Epoch 10 | loss = 0.6831324100494385
Epoch 15 | loss = 0.676664412021637


Remark:<br>

`model.train()` tells PyTorch how certain layers should behave. Some layers do not behave the same way in training and in evaluation (e.g. `nn.Dropout()`, `nn.BatchNorm1d()`). For example, in train mode a Dropout layer deactivates neurons, whereas in eval mode all neurons are active.<br>

In this code, `model.train()` is not required because the model contains neither of these layers, but calling it is good practice.
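The difference between the two modes is easy to observe on a standalone Dropout layer. This is a minimal sketch; the drop probability `p=0.5` and the all-ones input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
out_train = drop(x)   # entries are zeroed at random, survivors are scaled by 1 / (1 - p) = 2

drop.eval()
out_eval = drop(x)    # identity in eval mode

print((out_train == 0).float().mean())  # fraction of dropped entries, close to 0.5
print(torch.equal(out_eval, x))         # True
```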

## 8) Evaluation

1. Apply `sigmoid` to the logits
2. Convert probabilities to predictions
3. Compute **accuracy** on the validation set


In [None]:
# TODO: evaluation
model.eval() # good practice: switches layers such as Dropout/BatchNorm to evaluation mode (no effect on this MLP)
with torch.no_grad(): # torch.no_grad() disables gradient tracking inside the block, which reduces memory usage
    logits = model(X_val_d)
    probs = sigmoid(logits)
    preds = torch.where(probs > 0.5, 1, 0)

accuracy = (preds == y_val_d).float().mean()
# -> preds == y_val_d returns a tensor of booleans
# -> .float() converts these booleans to floats: True = 1.0, False = 0.0
accuracy


tensor(0.7400)
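Because sigmoid is monotonic and `sigmoid(0) = 0.5`, thresholding the probabilities at 0.5 is the same as thresholding the raw logits at 0. The sketch below illustrates this on a tiny hand-made batch; the logits and targets are invented for the example.

```python
import torch

logits = torch.tensor([[-2.0], [-0.1], [0.3], [4.0]])
targets = torch.tensor([[0.0], [1.0], [1.0], [1.0]])

preds_from_probs = (torch.sigmoid(logits) > 0.5).float()
preds_from_logits = (logits > 0).float()
print(torch.equal(preds_from_probs, preds_from_logits))  # True

# 3 of the 4 predictions match the targets.
accuracy = (preds_from_logits == targets).float().mean().item()
print(accuracy)  # 0.75
```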

## 9) Reflection questions (answer inside the markdown)

1. Why do we **not** apply sigmoid inside the model?

Stability: `BCEWithLogitsLoss` already applies the sigmoid internally and fuses it with the binary cross-entropy (using the log-sum-exp trick), which is more numerically stable than a separate sigmoid followed by `BCELoss`.

Saturation: this prevents the gradients from becoming vanishingly small too quickly during training when the sigmoid saturates.

2. What would happen if we removed all ReLU activations?

A stack of linear layers with no activation collapses into a single linear map, so the network could only learn linear decision boundaries and would fail on problems that are not linearly separable.

3. How does this toy problem relate to image classification?

It is only a matter of scale. Instead of 2D input tensors `(B, 3)`, we will have 4D tensors `(B, C, H, W)`: each image is a 3D tensor whose first dimension is the number of color channels.

Write short answers (2–3 lines each).
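The answer to question 2 can be checked numerically: two `nn.Linear` layers with no activation in between compute exactly one linear map, with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$. The sketch below uses two small layers (`lin1`, `lin2`, names and sizes chosen for this example) rather than the full MLP.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin1 = nn.Linear(3, 16)
lin2 = nn.Linear(16, 1)
x = torch.randn(5, 3)

with torch.no_grad():
    # Two linear layers with no activation in between.
    stacked = lin2(lin1(x))

    # Equivalent single linear map: W = W2 W1, b = W2 b1 + b2.
    W = lin2.weight @ lin1.weight              # shape (1, 3)
    b = lin2.weight @ lin1.bias + lin2.bias    # shape (1,)
    collapsed = x @ W.T + b

print(torch.allclose(stacked, collapsed, atol=1e-5))
```

Inserting a ReLU between the two layers breaks this equivalence, which is what gives the network its extra expressive power.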


## 10) Bridge to Computer Vision

So far:
- inputs = vectors of size 3
- layers = fully-connected

Next session:
- inputs = images `(B, C, H, W)`
- layers = convolutions
- same training logic

👉 **Architecture changes, learning principles stay the same.**
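As a preview, here is a rough sketch of what the input and first layer look like once images replace vectors. The batch size, image size (28 x 28 grayscale) and layer sizes are assumptions picked for illustration, not the architecture of the next notebook.

```python
import torch
import torch.nn as nn

# Hypothetical batch of 8 grayscale 28 x 28 images: (B, C, H, W).
images = torch.randn(8, 1, 28, 28)

# A convolution maps 1 input channel to 4 feature maps; padding=1 keeps H and W unchanged.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
features = conv(images)
print(features.shape)  # torch.Size([8, 4, 28, 28])

# Flattening the feature maps lets a fully-connected layer produce one logit per image.
head = nn.Linear(4 * 28 * 28, 1)
logits = head(features.flatten(start_dim=1))
print(logits.shape)    # torch.Size([8, 1])
```

The logits have the same `(B, 1)` shape as in this lab, so the loss, backward pass and optimizer step would be written exactly as before.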


## Part II: Training on MNIST

Check the next notebook