# Lab1: PyTorch Foundations for Computer Vision

**Course**: Deep Learning for Image Analysis

**Class**: M2 IASD App  

**Professor**: Mehyar MLAWEH

---

## Objectives
By the end of this lab, you should be able to:

- Understand how **neurons and layers** are implemented in PyTorch
- Manipulate **tensors** and reason about shapes
- Use **autograd** to compute gradients
- Implement a **training loop** yourself
- Connect theory (neurons, loss, backprop) to actual code

⚠️ This notebook is **intentionally incomplete**.  
Whenever you see **`# TODO`**, you are expected to write code.


**Deadline:** 🗓️ **Saturday, February 7th (23:59)**

## 🤖 A small (honest) note before you start

Let's be real for a second.

I know you **can use LLMs (ChatGPT, Copilot, Claude, etc.)** to help you with this lab.  
And yes, **I use them too**, so don't worry 😄

👉 **You are allowed to use AI tools.**  
But here's the deal:

- Don't just **copy-paste** code you don't understand  
- Take time to **read, question, and modify** what the model gives you  
- If you can solve a block **by yourself, without AI**, that's excellent

Remember:

> AI can write code for you, but **only you can understand it**, and understanding is what matters for exams, projects, and real work.

Use these tools **as assistants, not as replacements for thinking**.

---

## 📚 Useful documentation (highly recommended)

You will often find answers faster (and more reliably) by checking the official documentation:

- **PyTorch main documentation**  
  https://pytorch.org/docs/stable/index.html

- **PyTorch tensors**  
  https://pytorch.org/docs/stable/tensors.html

- **Neural network modules (`torch.nn`)**  
  https://pytorch.org/docs/stable/nn.html

- **Loss functions** (`BCEWithLogitsLoss`, CrossEntropy, etc.)  
  https://pytorch.org/docs/stable/nn.html#loss-functions

- **Optimizers** (`SGD`, `Adam`, …)  
  https://pytorch.org/docs/stable/optim.html

If you learn how to **navigate the documentation**, you are already thinking like a real AI engineer 👌

---

## PART I

## 0) Colab setup: GPU check

**Instructions**
1. In Colab: `Runtime → Change runtime type`
2. Select **GPU** (T4)
3. Save and restart the runtime

Then run the cell below.


In [2]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# Set the device correctly (cuda if available, else cpu)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Using device:", device)


PyTorch version: 2.9.0+cpu
CUDA available: False
Using device: cpu


## 1) Imports and reproducibility


In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

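# Fix the random seed for reproducibility
# (torch.manual_seed seeds the generators on all devices, not only the CPU)
torch.manual_seed(42)

# Why? So that random tensors and weight initializations are identical on every run.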
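A quick way to see what the seed buys you is to reseed and draw again. The sketch below assumes only `torch`; the variable names are illustrative and not part of the lab.

```python
import torch

# Seeding resets PyTorch's random number generator to a known state.
torch.manual_seed(42)
first = torch.randn(3)

# Reseeding with the same value replays the same sequence of draws.
torch.manual_seed(42)
second = torch.randn(3)

# Without reseeding, the generator keeps advancing, so this draw differs.
third = torch.randn(3)

print("first == second:", torch.equal(first, second))
print("first == third: ", torch.equal(first, third))
```

This is why a notebook that is restarted and run top to bottom produces the same random tensors and the same initial weights each time.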

## 2) PyTorch tensors and shapes

Tensors are multi-dimensional arrays that support:
- GPU acceleration
- automatic differentiation

Understanding **shapes** is critical in deep learning.


In [3]:
# Examples
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.randn(4, 5)

print("a shape:", a.shape)
print("b shape:", b.shape)


a shape: torch.Size([3])
b shape: torch.Size([4, 5])


### 🔍 Question

- **How many dimensions does tensor `b` have?**  
  Tensor **`b` has 2 dimensions**.

- **What does each dimension represent conceptually?**  
  The **first dimension (size 4)** is the **number of rows**.  
  The **second dimension (size 5)** is the **number of columns**.  

  The **rows can be read as observations (samples)** and the **columns as features**.


### ✅ Tensor operations

Complete the following:

1. Create a tensor `x` of shape `(8, 3)` with random values  
2. Compute:
   - the **mean of each column**
   - the **L2 norm of each row**
3. Normalize `x` **row-wise** using the L2 norm

In [4]:
# TODO: create x
x = torch.randn(8, 3)

# TODO: column mean
col_mean = x.mean(dim=0)

# TODO: row-wise L2 norm
row_norm = torch.norm(x, p=2, dim=1)

# TODO: normalized tensor
x_normalized = x / row_norm.unsqueeze(1)

print(x.shape, col_mean.shape, row_norm.shape, x_normalized.shape)


torch.Size([8, 3]) torch.Size([3]) torch.Size([8]) torch.Size([8, 3])
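The row-wise normalization relies on broadcasting a column of norms against the `(8, 3)` tensor. As a hedged sketch, here are two equivalent ways to write it: `keepdim=True` instead of `unsqueeze(1)`, and the built-in `torch.nn.functional.normalize`. The variable names and the seed are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 3)

# keepdim=True keeps the reduced axis as size 1, so the (8, 1) norms
# broadcast against the (8, 3) tensor without an explicit unsqueeze.
row_norm = x.norm(p=2, dim=1, keepdim=True)
x_manual = x / row_norm

# Built-in equivalent; its eps argument guards against all-zero rows.
x_builtin = F.normalize(x, p=2, dim=1)

print(row_norm.shape)
print(torch.allclose(x_manual, x_builtin))
print(x_manual.norm(dim=1))  # every row should now have norm close to 1
```

A useful sanity check after any normalization is to recompute the norms, as in the last line.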


## 3) Artificial neuron: from math to code

A neuron computes:

$$
z = \sum_i w_i x_i + b
$$

Then applies an activation function:

$$
y = g(z)
$$

This section connects directly to the theory seen in class.


In [5]:
x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

z = torch.sum(x * w) + b
z


tensor(-0.8000)
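To connect this hand-written neuron to PyTorch layers, the sketch below loads the same `w` and `b` into an `nn.Linear(3, 1)` and checks that both give the same `z`. Copying values into the layer by hand is only for illustration; in practice the layer initializes and learns its own weights.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

# Hand-written neuron: weighted sum plus bias.
z_manual = torch.dot(x, w) + b

# The same neuron expressed as a layer with 3 inputs and 1 output.
layer = nn.Linear(in_features=3, out_features=1)
with torch.no_grad():
    layer.weight.copy_(w.reshape(1, 3))  # weight has shape (out_features, in_features)
    layer.bias.copy_(b.reshape(1))       # bias has shape (out_features,)

z_layer = layer(x)

print(z_manual.item(), z_layer.item())
```

A layer with `out_features=16` is simply 16 such neurons sharing the same input, stored as one `(16, 3)` weight matrix.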

### Activation functions

1. Implement **ReLU**
2. Implement **Sigmoid**
3. Apply both to `z` and compare the outputs

Which activation preserves negative values?

### Answer

**Neither activation function (ReLU or Sigmoid) preserves negative values.**

- **ReLU** sets every negative value to **0**.  
- **Sigmoid** maps negative values to **positive values strictly between 0 and 0.5**.

So **the negative sign is lost in both cases**.


In [6]:
# ReLU
def relu(z):
    return torch.clamp(z, min=0)

# Sigmoid
def sigmoid(z):
    return 1 / (1 + torch.exp(-z))

# Apply activations
y_relu = relu(z)
y_sigmoid = sigmoid(z)

y_relu, y_sigmoid


(tensor(0.), tensor(0.3100))
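As a sanity check, the hand-written activations can be compared with PyTorch's built-in `torch.relu` and `torch.sigmoid` over a range of inputs. This is a minimal sketch; the test points in `zs` are arbitrary.

```python
import torch

def relu(z):
    return torch.clamp(z, min=0)

def sigmoid(z):
    return 1 / (1 + torch.exp(-z))

# A few negative, zero, and positive inputs.
zs = torch.linspace(-3.0, 3.0, steps=7)

print(torch.allclose(relu(zs), torch.relu(zs)))
print(torch.allclose(sigmoid(zs), torch.sigmoid(zs)))

# Neither activation ever outputs a negative number.
print((relu(zs) >= 0).all().item(), (sigmoid(zs) > 0).all().item())
```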

## 4) Autograd

In [7]:
x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2

loss.backward()

print("loss:", loss.item())
print("grad w:", w.grad)
print("grad b:", b.grad)


loss: 2.890000104904175
grad w: tensor([-3.4000, -6.8000,  3.4000])
grad b: tensor(-3.4000)
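These gradients can be reproduced with the chain rule: since `loss = (z - 1)^2` and `z = sum(w_i * x_i) + b`, we have `dL/dw_i = 2 (z - 1) x_i` and `dL/db = 2 (z - 1)`. The sketch below recomputes them by hand and compares them with autograd; the `*_manual` names are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, -1.0])
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2
loss.backward()

# Chain rule by hand: dL/dz = 2 (z - 1), dz/dw_i = x_i, dz/db = 1.
dL_dz = 2 * (z.detach() - 1.0)
grad_w_manual = dL_dz * x
grad_b_manual = dL_dz

print(torch.allclose(w.grad, grad_w_manual))
print(torch.allclose(b.grad, grad_b_manual))
```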


### 🔍 Conceptual question

- If `b.grad > 0`, should `b` increase or decrease after a gradient descent step?
Explain **why** in one sentence.


### Conceptual answer

- **If `b.grad > 0`, `b` should decrease after a gradient descent step.**

Gradient descent updates each parameter in the **direction opposite to its gradient** (`b ← b - lr * b.grad`) in order to reduce the loss.
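The sign argument can be checked on a one-parameter example. The sketch below uses a made-up loss `(b - 1)^2` starting from `b = 3` and an illustrative learning rate of `0.1`; none of these values come from the lab.

```python
import torch

b = torch.tensor(3.0, requires_grad=True)
lr = 0.1  # illustrative learning rate

loss_before = (b - 1.0) ** 2
loss_before.backward()
print("gradient:", b.grad.item())  # positive, because b is above the minimum at 1

# One manual gradient descent step: move against the gradient.
with torch.no_grad():
    b_new = b - lr * b.grad

loss_after = (b_new - 1.0) ** 2
print("b:", b.item(), "->", b_new.item())
print("loss:", loss_before.item(), "->", loss_after.item())
```

A positive gradient means the loss grows as `b` grows, so stepping against it lowers both `b` and the loss.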



## 5) Toy classification dataset

We create a **linearly separable** dataset.

Label rule:
- class = 1 if `x₁ + x₂ + x₃ > 0`
- else class = 0

This mimics a very simple classification problem.


In [8]:
# TODO: generate a dataset of size N=500 with 3 features
N = 500

X = torch.randn(N, 3)

# class = 1 if x1 + x2 + x3 > 0 else 0
y = (X.sum(dim=1) > 0).float().unsqueeze(1)  # shape (N, 1)

# TODO: split into train (80%) and validation (20%)
n_train = int(0.8 * N)

X_train, X_val = X[:n_train], X[n_train:]
y_train, y_val = y[:n_train], y[n_train:]

# check
print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)



torch.Size([400, 3]) torch.Size([400, 1])
torch.Size([100, 3]) torch.Size([100, 1])
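Slicing the first 80% works here because the samples are already in random order. For data that may be sorted, a shuffled split is safer. The sketch below is one possible alternative using `TensorDataset` and `random_split` from `torch.utils.data`; the seeds and the names `train_ds` / `val_ds` are assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(42)

N = 500
X = torch.randn(N, 3)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)  # shape (N, 1)

# Wrap features and labels so they stay aligned when shuffled.
dataset = TensorDataset(X, y)

# A dedicated generator makes the split itself reproducible.
generator = torch.Generator().manual_seed(0)
train_ds, val_ds = random_split(dataset, [400, 100], generator=generator)

print(len(train_ds), len(val_ds))
print("fraction of class 1:", y.mean().item())
```

Printing the class balance is a cheap check that the label rule produces roughly as many positives as negatives.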


## 6) Model definition

We define a small **MLP** (fully-connected network):

`3 → 16 → 8 → 1`

Activation: ReLU  
Output: raw logits (no sigmoid)


In [10]:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 16),  # 3 → 16
            nn.ReLU(),
            nn.Linear(16, 8),  # 16 → 8
            nn.ReLU(),
            nn.Linear(8, 1)    # 8 → 1 (logits)
        )

    def forward(self, x):
        return self.net(x)

# Create model and move it to the device (CPU here)
model = MLP().to(device)

print(model)



MLP(
  (net): Sequential(
    (0): Linear(in_features=3, out_features=16, bias=True)
    (1): ReLU()
    (2): Linear(in_features=16, out_features=8, bias=True)
    (3): ReLU()
    (4): Linear(in_features=8, out_features=1, bias=True)
  )
)
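One way to see what the `nn.ReLU()` layers contribute is to remove them and check that the three `Linear` layers then fold into a single affine map `y = W_eff x + b_eff`. The sketch below is illustrative; `linear_stack`, `W_eff`, and `b_eff` are names introduced here, not part of the lab.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as the MLP, but with the ReLU layers removed.
linear_stack = nn.Sequential(nn.Linear(3, 16), nn.Linear(16, 8), nn.Linear(8, 1))
l1, l2, l3 = linear_stack

with torch.no_grad():
    # Fold the three affine maps into one.
    W_eff = l3.weight @ l2.weight @ l1.weight                      # shape (1, 3)
    b_eff = l3.weight @ (l2.weight @ l1.bias + l2.bias) + l3.bias  # shape (1,)

    x = torch.randn(5, 3)
    out_stack = linear_stack(x)       # three layers applied in sequence
    out_single = x @ W_eff.T + b_eff  # one equivalent affine map

print(torch.allclose(out_stack, out_single, atol=1e-5))
```

Without a non-linearity between layers, extra depth adds parameters but no extra expressive power.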


### Parameters

1. Compute **by hand** the total number of parameters
2. Verify your answer using PyTorch

### 1) Calculation by hand

Rule: a `Linear(a → b)` layer has `a × b` weights plus `b` biases  
→ number of parameters = `a × b + b`

- **Layer 1: Linear(3 → 16)**  
  3 × 16 + 16 = **64**

- **Layer 2: Linear(16 → 8)**  
  16 × 8 + 8 = **136**

- **Layer 3: Linear(8 → 1)**  
  8 × 1 + 1 = **9**

- **Total = 64 + 136 + 9 = 209 parameters**

---



In [11]:
# TODO: count parameters with PyTorch
total_params = sum(p.numel() for p in model.parameters())

total_params


209
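To compare the hand calculation with PyTorch layer by layer, `named_parameters()` lists each weight and bias tensor with its shape. The sketch below rebuilds the same architecture as a plain `nn.Sequential` so that it runs on its own; `linear_params` is a hypothetical helper introduced for illustration.

```python
import torch.nn as nn

# Same 3 -> 16 -> 8 -> 1 architecture, rebuilt so the snippet is self-contained.
model = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a Linear layer."""
    return n_in * n_out + n_out

by_hand = [linear_params(3, 16), linear_params(16, 8), linear_params(8, 1)]

# Per-tensor breakdown as PyTorch sees it.
for name, p in model.named_parameters():
    print(name, tuple(p.shape), p.numel())

total = sum(p.numel() for p in model.parameters())
print(by_hand, sum(by_hand), total)
```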

## 7) Training loop

You must complete the full training loop:
- forward pass
- loss computation
- backward pass
- optimizer step

Loss: `BCEWithLogitsLoss`
Optimizer: `SGD`


In [17]:
import torch.optim as optim

# TODO: move data to device
X_train_d = X_train.to(device)
y_train_d = y_train.to(device)
X_val_d = X_val.to(device)
y_val_d = y_val.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(51):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    logits = model(X_train_d)

    # Loss computation
    loss = criterion(logits, y_train_d)

    # Backward pass
    loss.backward()

    # Optimizer step
    optimizer.step()

    if epoch % 5 == 0:
        print("Epoch", epoch, "| loss =", float(loss))



Epoch 0 | loss = 0.5832364559173584
Epoch 5 | loss = 0.5511772036552429
Epoch 10 | loss = 0.5140101909637451
Epoch 15 | loss = 0.47297462821006775
Epoch 20 | loss = 0.4294820725917816
Epoch 25 | loss = 0.38542649149894714
Epoch 30 | loss = 0.34272757172584534
Epoch 35 | loss = 0.302848219871521
Epoch 40 | loss = 0.26720693707466125
Epoch 45 | loss = 0.23636604845523834
Epoch 50 | loss = 0.2101871818304062
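The loop above performs one update per epoch on the full training set. A common variant, sketched here as an assumption rather than as part of the lab, splits the data into mini-batches with a `DataLoader`. The forward, loss, backward, and step sequence is unchanged; the batch size and number of epochs are illustrative.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(42)

# Toy data following the same label rule as the lab.
X = torch.randn(400, 3)
y = (X.sum(dim=1) > 0).float().unsqueeze(1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

with torch.no_grad():
    initial_loss = criterion(model(X), y).item()

for epoch in range(20):
    model.train()
    for xb, yb in loader:              # one parameter update per mini-batch
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

with torch.no_grad():
    final_loss = criterion(model(X), y).item()

print("loss before:", initial_loss, "| loss after:", final_loss)
```

With 400 samples and a batch size of 32, each epoch performs 13 updates instead of one, which usually speeds up convergence per epoch at the cost of noisier gradients.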


## 8) Evaluation

1. Apply `sigmoid` to the logits
2. Convert probabilities to predictions
3. Compute **accuracy** on the validation set


In [19]:
# TODO: evaluation

# Evaluation on validation set
model.eval()

with torch.no_grad():
    # Forward pass
    logits = model(X_val_d)
    # Apply sigmoid to get probabilities
    probs = torch.sigmoid(logits)
    # Convert probabilities to predictions (0 or 1)
    preds = (probs > 0.5).float()

# Compute accuracy
accuracy = (preds == y_val_d).float().mean()

accuracy


tensor(0.9800)
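Because the sigmoid is monotonic and `sigmoid(0) = 0.5`, thresholding probabilities at 0.5 gives the same predictions as thresholding the raw logits at 0. The sketch below checks this on a few hand-picked logits, which are illustrative values.

```python
import torch

logits = torch.tensor([-2.0, -0.5, 0.0, 0.3, 4.0])

# Route 1: convert to probabilities, then threshold at 0.5.
preds_from_probs = (torch.sigmoid(logits) > 0.5).float()

# Route 2: threshold the logits directly at 0.
preds_from_logits = (logits > 0).float()

print(preds_from_probs)
print(preds_from_logits)
print(torch.equal(preds_from_probs, preds_from_logits))
```

The sigmoid is still useful when you need calibrated-looking probabilities, but it is not required to obtain the predicted class.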

## 9) Reflection questions (answer inside the markdown)

1. Why do we **not** apply sigmoid inside the model?
2. What would happen if we removed all ReLU activations?
3. How does this toy problem relate to image classification?

Write short answers (2-3 lines each).



### Answers

1.
We use `BCEWithLogitsLoss`, which already applies the sigmoid internally.  
Fusing the sigmoid with the loss is more numerically stable than a separate sigmoid followed by `BCELoss`, and it avoids the overflow and gradient problems that appear when the sigmoid saturates.

2.
The network would collapse into a single linear (affine) transformation, equivalent to logistic regression.  
It would lose its ability to model non-linear relationships.

3.
Each input can be seen as a vector of pixels, and the network learns a decision boundary.  
Image classification follows the same principle, but with many more dimensions and more complex models.
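The numerical-stability argument for keeping the sigmoid out of the model can be checked on a single extreme logit. In the sketch below (illustrative values), the true loss for a logit of 200 with target 0 is about 200. `BCEWithLogitsLoss` recovers it, while a separate sigmoid rounds the probability to exactly 1.0 in float32 before `BCELoss` sees it. PyTorch documents that `BCELoss` clamps its log terms at -100, so the naive route cannot return the true value.

```python
import torch
import torch.nn as nn

logit = torch.tensor([200.0])   # very confident prediction of class 1
target = torch.tensor([0.0])    # but the true class is 0

# Stable route: sigmoid and log are fused inside the loss.
stable = nn.BCEWithLogitsLoss()(logit, target)

# Naive route: the probability saturates before the loss is computed.
prob = torch.sigmoid(logit)
naive = nn.BCELoss()(prob, target)

print("probability:", prob.item())
print("BCEWithLogitsLoss:", stable.item())
print("sigmoid + BCELoss:", naive.item())
```

Once the probability has saturated, the information about how wrong the logit was is gone, and so is the corresponding gradient signal.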


## 10) Bridge to Computer Vision

So far:
- inputs = vectors of size 3
- layers = fully-connected

Next session:
- inputs = images `(B, C, H, W)`
- layers = convolutions
- same training logic

👉 **Architecture changes, learning principles stay the same.**
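To preview the shape change, the sketch below builds a fake image batch of shape `(B, C, H, W)` and passes it through both a fully-connected layer (after flattening) and a convolution. The 28×28 grayscale size matches MNIST; the batch size, the 10 output classes, and the 8 convolution channels are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fake batch: 4 grayscale images of 28 x 28 pixels, layout (B, C, H, W).
images = torch.randn(4, 1, 28, 28)

# Fully-connected route: flatten each image into a vector of 784 pixels.
flat = images.flatten(start_dim=1)
fc = nn.Linear(28 * 28, 10)
fc_out = fc(flat)

# Convolutional route: keep the spatial layout, learn 8 feature maps.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
conv_out = conv(images)

print(flat.shape)      # flattened input for the linear layer
print(fc_out.shape)    # one score per class
print(conv_out.shape)  # padding=1 with a 3x3 kernel preserves H and W
```

In both cases the training loop (forward pass, loss, `backward()`, optimizer step) is written exactly as before.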


## Part II: Training on MNIST

Continue in the next notebook.