# Lab 1: PyTorch Foundations for Computer Vision

**Course**: Deep Learning for Image Analysis 

**Class**: M2 IASD App  

**Professor**: Mehyar MLAWEH

---

## Objectives
By the end of this lab, you should be able to:

- Understand how **neurons and layers** are implemented in PyTorch
- Manipulate **tensors** and reason about shapes
- Use **autograd** to compute gradients
- Implement a **training loop** yourself
- Connect theory (neurons, loss, backprop) to actual code

⚠️ This notebook is **intentionally incomplete**.  
Whenever you see **`# TODO`**, you are expected to write code.


**Deadline:** 🗓️ **Saturday, February 7th (23:59)**

## 🤖 A small (honest) note before you start

Let's be real for a second.

I know you **can use LLMs (ChatGPT, Copilot, Claude, etc.)** to help you with this lab.  
And yes, **I use them too**, so don't worry 😄

👉 **You are allowed to use AI tools.**  
But here‚Äôs the deal:

- Don't just **copy-paste** code you don't understand  
- Take time to **read, question, and modify** what the model gives you  
- If you can solve a block **by yourself, without AI**, that's excellent

Remember:

> AI can write code for you, but **only you can understand it**, and understanding is what matters for exams, projects, and real work.

Use these tools **as assistants, not as replacements for thinking**.

---

## 📚 Useful documentation (highly recommended)

You will often find answers faster (and more reliably) by checking the official documentation:

- **PyTorch main documentation**  
  https://pytorch.org/docs/stable/index.html

- **PyTorch tensors**  
  https://pytorch.org/docs/stable/tensors.html

- **Neural network modules (`torch.nn`)**  
  https://pytorch.org/docs/stable/nn.html

- **Loss functions** (`BCEWithLogitsLoss`, `CrossEntropyLoss`, etc.)  
  https://pytorch.org/docs/stable/nn.html#loss-functions

- **Optimizers** (`SGD`, `Adam`, …)  
  https://pytorch.org/docs/stable/optim.html

If you learn how to **navigate the documentation**, you are already thinking like a real AI engineer 👌

---

## Part I

## 0) Colab setup: GPU check

**Instructions**
1. In Colab, open `Runtime → Change runtime type`
2. Select **GPU** (T4)
3. Save and restart the runtime

Then run the cell below.


In [1]:
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# TODO: set the device correctly (cuda if available, else cpu)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Using device:", device)


PyTorch version: 2.9.0+cu126
CUDA available: True
Using device: cuda
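Once `device` is set, tensors (and later the model) have to live on the same device before they can be combined. Below is a minimal sketch that reuses only the `device` logic from the cell above. The tensors `a`, `b`, `c` are illustrative.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the device, or move an existing CPU tensor there
a = torch.randn(3, 3, device=device)
b = torch.randn(3, 3).to(device)

# Both operands are on the same device, so the matrix product runs there
c = a @ b
print(c.device)
```

On a Colab GPU runtime this should print `cuda:0`. On a CPU-only machine the same code runs unchanged and prints `cpu`.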


## 1) Imports and reproducibility


In [18]:
import torch
import torch.nn as nn
import torch.optim as optim

# TODO: fix the random seed for reproducibility
seed = 43
torch.manual_seed(seed)

torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
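Seeding works by resetting the random number generator, so the same seed replays the same sequence of random values. It only affects random calls made *after* it. That means the seed cell should run before any cell that generates data or initializes a model. A minimal sketch on the CPU generator:

```python
import torch

# Re-seeding replays the same sequence of random numbers
torch.manual_seed(43)
first = torch.randn(3)

torch.manual_seed(43)
second = torch.randn(3)

print(torch.equal(first, second))  # True
```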


## 2) PyTorch tensors and shapes

Tensors are multi-dimensional arrays that support:
- GPU acceleration
- automatic differentiation

Understanding **shapes** is critical in deep learning.


In [3]:
# Examples
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.randn(4, 5)

print("a shape:", a.shape)
print("b shape:", b.shape)


a shape: torch.Size([3])
b shape: torch.Size([4, 5])


### üîç Question (answer inside the markdown)
- How many dimensions does tensor `b` have? `b` has shape `(4, 5)`, so it has 2 dimensions.
- What does each dimension represent conceptually? Each dimension is an axis: the first indexes the 4 rows and the second indexes the 5 columns (for a data matrix, typically samples and features).
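These answers can also be checked in code. The sketch below uses standard tensor attributes (`ndim`, `shape`, `size()`) and shows what indexing along each axis returns.

```python
import torch

b = torch.randn(4, 5)

# Number of axes and the size along each one
print(b.ndim)                # 2
print(b.shape)               # torch.Size([4, 5])
print(b.size(0), b.size(1))  # 4 5

# One row holds 5 entries (one per column); one column holds 4 entries (one per row)
print(b[0].shape)     # torch.Size([5])
print(b[:, 0].shape)  # torch.Size([4])
```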


### ✅ Tensor operations

Complete the following:

1. Create a tensor `x` of shape `(8, 3)` with random values  
2. Compute:
   - the **mean of each column**
   - the **L2 norm of each row**
3. Normalize `x` **row-wise** using the L2 norm

In [4]:
# TODO: create x
x = torch.randn(8, 3)
print(x)


# TODO: column mean
col_mean = torch.mean(x, dim=0)

# TODO: row-wise L2 norm
row_norm = torch.norm(x, p=2, dim=1, keepdim=True)

# TODO: normalized tensor
x_normalized = x / row_norm

print(x.shape, col_mean.shape, row_norm.shape, x_normalized.shape)


tensor([[ 0.3189, -0.4245,  0.3057],
        [-0.7746,  0.0349,  0.3211],
        [ 1.5736, -0.8455,  0.3672],
        [ 0.1754,  1.3852, -0.4459],
        [ 1.4451,  0.8564,  2.2181],
        [ 0.5232,  1.1754,  0.5612],
        [-0.4527, -0.7718, -0.1722],
        [ 0.5238,  0.0566,  0.4263]])
torch.Size([8, 3]) torch.Size([3]) torch.Size([8, 1]) torch.Size([8, 3])
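A useful sanity check after row-wise normalization: every row should now have an L2 norm of 1. This sketch uses `torch.linalg.vector_norm`, which the PyTorch docs recommend over `torch.norm`. It then compares the result with the built-in `torch.nn.functional.normalize`, which also guards against division by zero with a small `eps`.

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3)

# Row-wise L2 normalization, as in the cell above
row_norm = torch.linalg.vector_norm(x, ord=2, dim=1, keepdim=True)  # shape (8, 1)
x_normalized = x / row_norm  # broadcasting divides each row by its own norm

# Every row of the result should have unit L2 norm
unit_norms = torch.linalg.vector_norm(x_normalized, dim=1)
print(torch.allclose(unit_norms, torch.ones(8)))  # True

# Built-in equivalent: divides each row by max(norm, eps)
print(torch.allclose(F.normalize(x, p=2, dim=1), x_normalized))  # True
```

`keepdim=True` is what makes the division work. It keeps `row_norm` at shape `(8, 1)` instead of `(8,)`, so broadcasting lines it up with the rows of `x`.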


## 3) Artificial neuron: from math to code

A neuron computes:

$$
z = \sum_i w_i x_i + b
$$

Then applies an activation function:

$$
y = g(z)
$$

This section connects directly to the theory seen in class.


In [5]:
x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

z = torch.sum(x * w) + b
z


tensor(-0.8000)
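The weighted sum $z = \sum_i w_i x_i + b$ is a dot product plus a bias. It is also exactly what a single-output `nn.Linear` layer computes. The sketch below copies the same `w` and `b` into an `nn.Linear(3, 1)` to show that the two agree. Note that PyTorch stores the layer's weight with shape `(out_features, in_features)`.

```python
import torch
import torch.nn as nn

x = torch.tensor([1.0, -2.0, 3.0])
w = torch.tensor([0.2, 0.4, -0.1])
b = torch.tensor(0.1)

# Same pre-activation z, written as a dot product
z_dot = torch.dot(w, x) + b

# Same neuron as a linear layer with 3 inputs and 1 output
neuron = nn.Linear(in_features=3, out_features=1)
with torch.no_grad():
    neuron.weight.copy_(w.unsqueeze(0))  # weight shape: (1, 3)
    neuron.bias.copy_(b.unsqueeze(0))    # bias shape: (1,)
z_linear = neuron(x)  # unbatched input of shape (3,) gives output of shape (1,)

print(z_dot, z_linear)
```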

### Activation functions

1. Implement **ReLU**
2. Implement **Sigmoid**
3. Apply both to `z` and compare the outputs

Which activation preserves negative values?


In [6]:
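# TODO
def relu(z):
    # Element-wise max(z, 0); also works on tensors with more than one element
    return torch.clamp(z, min=0.0)

def sigmoid(z):
    # Element-wise 1 / (1 + exp(-z))
    return 1 / (1 + torch.exp(-z))

y_relu = relu(z)
y_sigmoid = sigmoid(z)
y_relu, y_sigmoid


(tensor(0.), tensor(0.3100))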
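To compare the two activations on more than a single value, apply them to a range of inputs and check them against PyTorch's built-ins `torch.relu` and `torch.sigmoid`. This is a sketch, and the input range is arbitrary. It also makes the behaviour on negative inputs visible: ReLU sends all of them to 0, while sigmoid still maps different negative inputs to different values in (0, 0.5).

```python
import torch

def relu(z):
    # Element-wise max(z, 0)
    return torch.clamp(z, min=0.0)

def sigmoid(z):
    # Element-wise 1 / (1 + exp(-z))
    return 1 / (1 + torch.exp(-z))

z = torch.linspace(-3.0, 3.0, steps=7)  # -3, -2, -1, 0, 1, 2, 3

# Compare against the built-in implementations
print(torch.allclose(relu(z), torch.relu(z)))        # True
print(torch.allclose(sigmoid(z), torch.sigmoid(z)))  # True

print(relu(z))
print(sigmoid(z))
```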

## 4) Autograd and gradients

PyTorch uses **automatic differentiation** to compute gradients
using the **chain rule** (backpropagation).


In [7]:
x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2

loss.backward()

print("loss:", loss.item())
print("grad w:", w.grad)
print("grad b:", b.grad)


loss: 2.890000104904175
grad w: tensor([-3.4000, -6.8000,  3.4000])
grad b: tensor(-3.4000)
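The printed gradients can be checked by hand with the chain rule. With $L = (z - 1)^2$ and $z = \sum_i w_i x_i + b$:

$$
\frac{\partial L}{\partial z} = 2(z - 1), \quad
\frac{\partial L}{\partial w_i} = 2(z - 1)\,x_i, \quad
\frac{\partial L}{\partial b} = 2(z - 1)
$$

Here $z = 0.5 - 0.6 - 0.8 + 0.2 = -0.7$, so $\partial L / \partial z = -3.4$. That matches `grad b`, and multiplying by `x` gives `grad w`. A minimal sketch that compares autograd with these formulas (including `x.grad`, which exists because `x` also has `requires_grad=True`):

```python
import torch

x = torch.tensor([1.0, 2.0, -1.0], requires_grad=True)
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)

z = torch.sum(x * w) + b
loss = (z - 1.0) ** 2
loss.backward()

# Chain rule by hand: dL/dz = 2 (z - 1), dz/dw = x, dz/db = 1, dz/dx = w
with torch.no_grad():
    dL_dz = 2 * (z - 1.0)
    grad_w_manual = dL_dz * x
    grad_b_manual = dL_dz
    grad_x_manual = dL_dz * w

print(torch.allclose(w.grad, grad_w_manual))  # True
print(torch.allclose(b.grad, grad_b_manual))  # True
print(torch.allclose(x.grad, grad_x_manual))  # True
```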


### üîç Conceptual question

- If `b.grad > 0`, should `b` increase or decrease after a gradient descent step?
Explain **why** in one sentence.
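One gradient descent step applies `parameter ← parameter - lr * gradient` to every parameter. Each parameter therefore moves in the direction opposite to the sign of its gradient. The sketch below runs one manual step on the neuron above. The learning rate `lr = 0.1` is an arbitrary choice, and `x` is treated as fixed input data. Here `b.grad` is negative (-3.4), so `b` goes up, and the loss drops.

```python
import torch

x = torch.tensor([1.0, 2.0, -1.0])  # input data, no gradient needed
w = torch.tensor([0.5, -0.3, 0.8], requires_grad=True)
b = torch.tensor(0.2, requires_grad=True)
lr = 0.1  # arbitrary learning rate for this sketch

def compute_loss():
    z = torch.sum(x * w) + b
    return (z - 1.0) ** 2

loss_before = compute_loss()
loss_before.backward()

# Gradient descent update, done outside autograd tracking
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad

loss_after = compute_loss()
print(b.item(), loss_before.item(), loss_after.item())
```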


## 5) Toy classification dataset

We create a **linearly separable** dataset.

Label rule:
- class = 1 if `x₁ + x₂ + x₃ > 0`
- else class = 0

This mimics a very simple classification problem.


In [8]:
# TODO: generate a dataset of size N=500 with 3 features
X = torch.randn(500, 3)
y = (X.sum(dim=1) > 0).int()

# TODO: split into train (80%) and validation (20%)
N = X.shape[0]
split_idx = int(0.8 * N)
# X_train, X_val = X[:split_idx], X[split_idx:]
# y_train, y_val = y[:split_idx], y[split_idx:]

X_train, X_val = torch.split(X, [split_idx, N - split_idx])
y_train, y_val = torch.split(y, [split_idx, N - split_idx])
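Taking the first 80% as training data is fine here, because the samples are drawn independently at random. With real datasets stored in some order (for example sorted by class), shuffling before splitting is safer. A minimal sketch of a shuffled split with `torch.randperm`, plus a quick class-balance check. The seed value is arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for this sketch

N = 500
X = torch.randn(N, 3)
y = (X.sum(dim=1) > 0).int()

# Shuffle the indices, then split them 80/20
perm = torch.randperm(N)
split_idx = int(0.8 * N)
train_idx, val_idx = perm[:split_idx], perm[split_idx:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

# Split sizes and fraction of class 1 in each split
print(X_train.shape, X_val.shape)  # torch.Size([400, 3]) torch.Size([100, 3])
print(y_train.float().mean().item(), y_val.float().mean().item())
```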

## 6) Model definition

We define a small **MLP** (fully-connected network):

`3 → 16 → 8 → 1`

Activation: ReLU  
Output: raw logits (no sigmoid)
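To see how shapes flow through this architecture, you can push a batch through the layers one at a time. The sketch below rebuilds the same `3 → 16 → 8 → 1` stack as a plain `nn.Sequential`, using a hypothetical batch of 32 samples. `nn.Linear` acts on the last dimension only, so the batch dimension stays first throughout.

```python
import torch
import torch.nn as nn

# Same 3 -> 16 -> 8 -> 1 architecture as described above
net = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

x = torch.randn(32, 3)  # batch of B = 32 samples, 3 features each

# Print the output shape after every layer
h = x
for layer in net:
    h = layer(h)
    print(f"{layer.__class__.__name__:>6}: {tuple(h.shape)}")
```

The final shape is `(32, 1)`: one raw logit per sample. The trailing dimension of size 1 is what the training loop removes before comparing with the labels of shape `(32,)`.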


In [9]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # TODO: Linear 3 → 16
            nn.Linear(3, 16),
            # TODO: ReLU
            nn.ReLU(),
            # TODO: Linear 16 → 8
            nn.Linear(16, 8),
            # TODO: ReLU
            nn.ReLU(),
            # TODO: Linear 8 → 1
            nn.Linear(8, 1)
        )

    def forward(self, x):
        return self.net(x)

# TODO: create model and move it to the GPU 
model = MLP().to(device)


### Parameter count

1. Compute **by hand** the total number of parameters
2. Verify your answer using PyTorch


In [10]:
# TODO: count parameters with PyTorch
total_params = sum(p.numel() for p in model.parameters())
total_params


209
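By hand, each `Linear(in, out)` layer has `in × out` weights plus `out` biases, and ReLU has no parameters:

$$
(3 \cdot 16 + 16) + (16 \cdot 8 + 8) + (8 \cdot 1 + 1) = 64 + 136 + 9 = 209
$$

A sketch that prints the per-tensor breakdown with `named_parameters()` and compares it with the formula, on a standalone copy of the same architecture:

```python
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(3, 16), nn.ReLU(),
    nn.Linear(16, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

# One line per parameter tensor (ReLU layers contribute none)
for name, p in net.named_parameters():
    print(f"{name:10s} {str(tuple(p.shape)):10s} {p.numel()}")

# Formula for one Linear(in, out) layer: in * out weights + out biases
by_hand = sum(i * o + o for i, o in [(3, 16), (16, 8), (8, 1)])
by_torch = sum(p.numel() for p in net.parameters())
print(by_hand, by_torch)  # 209 209
```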

## 7) Training loop 

You must complete the full training loop:
- forward pass
- loss computation
- backward pass
- optimizer step

Loss: `BCEWithLogitsLoss`
Optimizer: `SGD`
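`BCEWithLogitsLoss` takes raw logits and combines the sigmoid with binary cross-entropy in one numerically stable operation. That is why the model outputs logits. The sketch below uses random, hypothetical logits and labels to check that it matches the explicit formula $-\big[y \log p + (1 - y)\log(1 - p)\big]$ with $p = \sigma(\text{logit})$, averaged over samples. It also compares it with `BCELoss` applied to sigmoid probabilities. The targets must be floats with the same shape as the logits, which is why the training loop calls `.float()` on the labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(10)                      # hypothetical raw model outputs
targets = torch.randint(0, 2, (10,)).float()  # hypothetical 0/1 labels, as floats

# Built-in: sigmoid + binary cross-entropy in one op
loss_builtin = nn.BCEWithLogitsLoss()(logits, targets)

# Explicit formula, averaged over the samples
p = torch.sigmoid(logits)
loss_manual = -(targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).mean()

# Two-step version: probabilities first, then BCELoss
loss_two_step = nn.BCELoss()(p, targets)

print(loss_builtin.item(), loss_manual.item(), loss_two_step.item())
```

For moderate logits like these, the three values agree. For very large logits, the sigmoid saturates to exactly 0 or 1 in float32, and the two-step version becomes less reliable than the fused loss.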


In [11]:
# TODO: move data to device
X_train_d = X_train.to(device)
y_train_d = y_train.to(device)
X_val_d = X_val.to(device)
y_val_d = y_val.to(device)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    model.train()
    optimizer.zero_grad()

    # TODO: forward
    # squeeze(1) turns the (N, 1) output into (N,) to match the targets,
    # and never drops the batch dimension (even when N == 1)
    logits = model(X_train_d).squeeze(1)

    # TODO: loss
    loss = criterion(logits, y_train_d.float())

    # TODO: backward
    loss.backward()

    # TODO: update
    optimizer.step()

    if epoch % 5 == 0:
        # .item() returns a plain Python float, detached from the autograd graph
        print("Epoch", epoch, "| loss =", loss.item())


Epoch 0 | loss = 0.7006815671920776
Epoch 5 | loss = 0.6826688051223755
Epoch 10 | loss = 0.6627987623214722
Epoch 15 | loss = 0.6406792998313904
Epoch 20 | loss = 0.6144785284996033
Epoch 25 | loss = 0.5832680463790894
Epoch 30 | loss = 0.5466195344924927
Epoch 35 | loss = 0.5044291019439697
Epoch 40 | loss = 0.45812201499938965
Epoch 45 | loss = 0.4103253185749054
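`optimizer.zero_grad()` sits at the top of the loop because PyTorch *accumulates* gradients in `.grad` across `backward()` calls instead of overwriting them. Below is a minimal sketch on a single scalar parameter (unrelated to the MLP above). In recent PyTorch versions, `zero_grad()` resets gradients by setting `.grad` to `None` by default, which is what the sketch does by hand.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Two backward passes without resetting: d(3w)/dw = 3 is added twice
(3 * w).backward()
(3 * w).backward()
accumulated = w.grad.clone()
print(accumulated)  # tensor(6.)

# Reset, then a single backward pass gives the true gradient again
w.grad = None
(3 * w).backward()
print(w.grad)  # tensor(3.)
```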


## 8) Evaluation

1. Apply `sigmoid` to the logits
2. Convert probabilities to predictions
3. Compute **accuracy** on the validation set


In [15]:
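# TODO: evaluation
model.eval()  # switches layers such as dropout/batchnorm to inference mode (none here, but good practice)
with torch.no_grad():
    logits = model(X_val_d).squeeze(1)
    probs = torch.sigmoid(logits)
    preds = (probs >= 0.5).int()

accuracy = (preds == y_val_d).float().mean().item()
accuracy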


0.9599999785423279
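Since the sigmoid is increasing and $\sigma(0) = 0.5$, thresholding probabilities at 0.5 is the same as thresholding logits at 0. The only possible exception is float rounding for logits extremely close to 0. The sketch below checks this on hypothetical random logits and labels, so the accuracy it prints has no meaning beyond the check.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1000) * 5          # hypothetical logits over a wide range
labels = torch.randint(0, 2, (1000,))   # hypothetical 0/1 labels

# Route 1: probabilities, then threshold at 0.5
preds_prob = (torch.sigmoid(logits) >= 0.5).int()

# Route 2: threshold the logits directly at 0
preds_logit = (logits >= 0).int()

print(torch.equal(preds_prob, preds_logit))

accuracy = (preds_logit == labels).float().mean().item()
print(accuracy)
```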


In [28]:
mean = 10
std = 2
XX = torch.randn(500, 3).to(device) * std + mean
yy = (XX.sum(dim=1) > 30).int()

with torch.no_grad():
    logits = model(XX).squeeze()
    probs = torch.sigmoid(logits)
    preds = (probs >= 0.5).int()

accuracy = (preds == yy).float().mean().item()
accuracy


0.4960000216960907
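One plausible reading of this ~0.5 accuracy: the model was trained to imitate the rule `x₁ + x₂ + x₃ > 0` on inputs centered at 0. `XX`, however, is centered at 10 and labeled with the threshold 30. Every sum in `XX` is far above 0, so a classifier that follows the training rule predicts class 1 everywhere. Its accuracy is then just the fraction of positive labels, about one half. The sketch below checks this with the ideal training rule instead of the trained model. It also shows that subtracting the (here, known) mean makes the two rules coincide, since $\sum_i (x_i - 10) > 0 \iff \sum_i x_i > 30$. The seed is arbitrary, and in practice the shift would have to be estimated, not assumed.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for this sketch
mean, std = 10, 2
XX = torch.randn(500, 3) * std + mean
yy = (XX.sum(dim=1) > 30).int()  # shifted labeling rule

# The rule the model was trained to imitate
training_rule = (XX.sum(dim=1) > 0).int()
print(training_rule.float().mean().item())  # fraction predicted as class 1
acc_raw = (training_rule == yy).float().mean().item()

# Re-center the inputs: sum(XX - 10) > 0  <=>  sum(XX) > 30
XX_centered = XX - mean
centered_rule = (XX_centered.sum(dim=1) > 0).int()
acc_centered = (centered_rule == yy).float().mean().item()

print(acc_raw, acc_centered)
```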

## 9) Reflection questions (answer inside the markdown)

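1. Why do we **not** apply sigmoid inside the model? Because some losses, such as `BCEWithLogitsLoss`, expect raw logits and apply the sigmoid internally in a numerically stable way. Keeping logits as the output also lets us apply the sigmoid only when we need probabilities, as in the evaluation step.
2. What would happen if we removed all ReLU activations? The network would become a composition of linear maps, which collapses into a single linear layer, so the model could no longer learn non-linear relationships.
3. How does this toy problem relate to image classification? Same training loop, same principles; image classification mainly changes the input (many more dimensions, with spatial structure) and usually the layers (convolutions).

Write short answers (2–3 lines each).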
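The collapse in question 2 can be checked numerically. Two stacked linear layers compute $W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)$, which is a single linear layer. A minimal sketch with randomly initialized layers of hypothetical sizes `3 → 16 → 1`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between
lin1 = nn.Linear(3, 16)
lin2 = nn.Linear(16, 1)

with torch.no_grad():
    # Merge them into one linear map: W = W2 W1, b = W2 b1 + b2
    W = lin2.weight @ lin1.weight            # shape (1, 3)
    b = lin2.weight @ lin1.bias + lin2.bias  # shape (1,)

    x = torch.randn(5, 3)
    out_stacked = lin2(lin1(x))
    out_single = x @ W.T + b

print(torch.allclose(out_stacked, out_single, atol=1e-6))  # True
```

Inserting a ReLU between `lin1` and `lin2` breaks this identity, which is exactly what gives the MLP its non-linear capacity.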


## 10) Bridge to Computer Vision

So far:
- inputs = vectors of size 3
- layers = fully-connected

Next session:
- inputs = images `(B, C, H, W)`
- layers = convolutions
- same training logic

👉 **Architecture changes, learning principles stay the same.**
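A short preview of what the `(B, C, H, W)` layout means in practice. The sketch uses a hypothetical batch of 16 grayscale 28×28 images (the MNIST format) and random values instead of real data. A fully-connected layer needs each image flattened into a vector, while a convolution keeps the spatial layout. The layer sizes (10 outputs, 8 filters of size 3×3) are illustrative choices.

```python
import torch
import torch.nn as nn

# Hypothetical batch: B=16 images, C=1 channel, H=W=28 pixels
images = torch.randn(16, 1, 28, 28)

# Fully-connected route: flatten each image into 1 * 28 * 28 = 784 features
flat = images.flatten(start_dim=1)
print(flat.shape)             # torch.Size([16, 784])
mlp_layer = nn.Linear(784, 10)
print(mlp_layer(flat).shape)  # torch.Size([16, 10])

# Convolutional route: 8 learned 3x3 filters; padding=1 keeps H and W unchanged
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
print(conv(images).shape)     # torch.Size([16, 8, 28, 28])
```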


## Part II: Training on MNIST

See the next notebook.