# Exercises

## **Preliminaries**: Cloning your repo and imports

In [16]:
! git clone https://github.com/ZinebZaad/exam_2025.git
! cp exam_2025/utils/utils_exercices.py .

import copy
import numpy as np
import torch

Cloning into 'exam_2025'...
remote: Enumerating objects: 71, done.
remote: Counting objects: 100% (71/71), done.
remote: Compressing objects: 100% (63/63), done.
remote: Total 71 (delta 27), reused 19 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (71/71), 1.41 MiB | 21.61 MiB/s, done.
Resolving deltas: 100% (27/27), done.


**Personal key for the theoretical part**

In the next cell, choose an integer between 100 and 1000 (it must be personal to you). This integer will seed the random number generator and must be kept for all exercises.



In [17]:
mySeed = 200

\

---

\

\

**Exercise 1** *A linear relationship*

The function *generate_dataset* provides two datasets (training and test). For each dataset, the key 'inputs' gives access to a numpy array of horizontally stacked predictors: each row $i$ contains three predictors $x_i$, $y_i$ and $z_i$. The key 'targets' returns the vector of targets $t_i$. \

The targets are related to the predictors by the model:
$$ t = \theta_0 + \theta_1 x + \theta_2 y + \theta_3 z + \epsilon$$ where $\epsilon \sim \mathcal{N}(0,\eta)$


In [18]:
from utils_exercices import generate_dataset, Dataset1
train_set, test_set = generate_dataset(mySeed)

**Q1** Which simple method can be used to estimate the coefficients $\theta_k$? Implement it with the Python library of your choice.

ANSWER :
We can estimate the $\theta_k$ coefficients with linear regression (ordinary least squares). This approach minimizes the sum of squared differences between the observed and predicted target values.

In [19]:
from sklearn.linear_model import LinearRegression
# Extract inputs and targets from the training set
X_train = train_set['inputs']
y_train = train_set['targets']

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Retrieve the coefficients
theta0 = model.intercept_  # This is θ0
theta1, theta2, theta3 = model.coef_  # These are θ1, θ2, θ3

# Display the results
print(f"Estimated coefficients:")
print(f"θ0 (Intercept): {theta0}")
print(f"θ1 (x): {theta1}")
print(f"θ2 (y): {theta2}")
print(f"θ3 (z): {theta3}")


Estimated coefficients:
θ0 (Intercept): 10.078764034363882
θ1 (x): 1.9515686197527802
θ2 (y): 1.9484222058962641
θ3 (z): 3.599666992319566
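The least-squares criterion named in the answer to Q1 also has a closed-form solution, which can serve as a cross-check on a fitted model. The sketch below is a minimal illustration on synthetic data, not on the exam dataset: the coefficients in `true_theta`, the noise level, and the sample size are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth (theta0, theta1, theta2, theta3), for illustration only
true_theta = np.array([10.0, 2.0, 2.0, 3.5])
n_samples = 500

# Three predictors per row, as in the 'inputs' array described above
X = rng.normal(size=(n_samples, 3))
t = true_theta[0] + X @ true_theta[1:] + rng.normal(scale=0.1, size=n_samples)

# Prepend a column of ones so that the intercept is estimated with the slopes
A = np.hstack([np.ones((n_samples, 1)), X])

# Least-squares solution of A @ theta = t (minimizes the sum of squared residuals)
theta_hat, *_ = np.linalg.lstsq(A, t, rcond=None)
print(theta_hat)
```

With low noise and a few hundred samples, `theta_hat` should land close to `true_theta`, which is the same behaviour expected from `LinearRegression`.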


**Q2** In the following cells, we estimate the $\theta_k$ with a neural network trained by SGD. Which architecture is suited to this? Justify in terms of expressivity and generalization performance, then code it in the next cell.

ANSWER :
The suitable architecture is a single-layer feedforward neural network with one fully connected layer, taking 3 inputs ($x, y, z$) and producing 1 output ($t$). This structure directly models the linear relationship $t = \theta_0 + \theta_1 x + \theta_2 y + \theta_3 z$ through a simple linear transformation. It is expressive enough for the task, since the true relationship belongs to the model class, and with only four parameters it avoids overfitting and generalizes well.


In [20]:
import torch.nn as nn  # needed here: SimpleNet subclasses nn.Module

# Dataset and dataloader:
dataset = Dataset1(train_set['inputs'], train_set['targets'])
dataloader = torch.utils.data.DataLoader(dataset, batch_size=100, shuffle=True)

# To implement:
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc = nn.Linear(3, 1)  # Linear layer for 3 inputs (x, y, z) and 1 output (t)

    def forward(self, x):
        return self.fc(x)  # Return the linear combination
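As a small sanity check on the architecture argument, the sketch below shows that a single `nn.Linear(3, 1)` layer holds exactly four trainable parameters (three weights and one bias) and computes the affine map $t = \theta_0 + \theta_1 x + \theta_2 y + \theta_3 z$. The random input batch is an assumption used only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(3, 1)

# Three weights (theta1..theta3) plus one bias (theta0)
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # prints 4

# The layer output matches the affine map written out by hand
x = torch.rand(5, 3)  # hypothetical batch of 5 rows (x, y, z)
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(layer(x), manual))
```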

**Q3** Train this architecture on the regression task defined by the inputs and outputs of the training set (complete the cell below).

In [21]:
# Initialize model, loss, and optimizer
mySimpleNet = SimpleNet()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(mySimpleNet.parameters(), lr=0.01)

# Training loop
num_epochs = 500
for epoch in range(num_epochs):
    epoch_loss = 0.0  # Track total loss for this epoch
    for batch_inputs, batch_targets in dataloader:
        optimizer.zero_grad()  # Reset gradients to zero

        # Forward pass: compute predictions
        outputs = mySimpleNet(batch_inputs)

        # Compute the loss
        loss = criterion(outputs.squeeze(), batch_targets)

        # Backward pass: compute gradients
        loss.backward()

        # Update parameters
        optimizer.step()

        # Accumulate loss for reporting
        epoch_loss += loss.item()

    # Print the epoch loss
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_loss:.4f}")

Epoch [1/500], Loss: 1151.4295
Epoch [2/500], Loss: 695.2972
Epoch [3/500], Loss: 428.3824
Epoch [4/500], Loss: 271.8053
Epoch [5/500], Loss: 179.7733
Epoch [6/500], Loss: 125.3682
Epoch [7/500], Loss: 93.0775
Epoch [8/500], Loss: 73.7825
Epoch [9/500], Loss: 62.1863
Epoch [10/500], Loss: 55.0772
Epoch [11/500], Loss: 50.6823
Epoch [12/500], Loss: 47.8589
Epoch [13/500], Loss: 46.0159
Epoch [14/500], Loss: 44.7323
Epoch [15/500], Loss: 43.8329
Epoch [16/500], Loss: 43.1638
Epoch [17/500], Loss: 42.6275
Epoch [18/500], Loss: 42.1914
Epoch [19/500], Loss: 41.8368
Epoch [20/500], Loss: 41.5214
Epoch [21/500], Loss: 41.2431
Epoch [22/500], Loss: 40.9951
Epoch [23/500], Loss: 40.7761
Epoch [24/500], Loss: 40.5731
Epoch [25/500], Loss: 40.3836
Epoch [26/500], Loss: 40.2241
Epoch [27/500], Loss: 40.0671
Epoch [28/500], Loss: 39.9297
Epoch [29/500], Loss: 39.7999
Epoch [30/500], Loss: 39.6879
Epoch [31/500], Loss: 39.5716
Epoch [32/500], Loss: 39.4821
Epoch [33/500], Loss: 39.3823
Epoch [34/50

**Q4** Where are the estimates of the $\theta_k$ stored? Extract them from the network *mySimpleNet* in the next cell.

ANSWER :
The estimated $\theta_k$ values (model parameters) are stored in the weights and bias of the linear layer in the mySimpleNet model.


In [22]:
# Extract the trained parameters (weights and bias) from the model
theta0 = mySimpleNet.fc.bias.item()  # Intercept (θ0)
theta1, theta2, theta3 = mySimpleNet.fc.weight[0].detach().numpy()  # Coefficients (θ1, θ2, θ3)

# Print the estimated parameters
print("Estimated coefficients:")
print(f"θ0 (Intercept): {theta0}")
print(f"θ1 (x): {theta1}")
print(f"θ2 (y): {theta2}")
print(f"θ3 (z): {theta3}")


Estimated coefficients:
θ0 (Intercept): 10.079853057861328
θ1 (x): 1.951642632484436
θ2 (y): 1.9474536180496216
θ3 (z): 3.600342273712158


**Q5** Evaluate these estimates on the test set and compare with those of question 1. Comment.

In [23]:
# Extract inputs and targets from the test set
X_test = test_set['inputs']
y_test = test_set['targets']

# Linear Regression Predictions
y_pred_lr = model.predict(X_test)
mse_lr = ((y_pred_lr - y_test) ** 2).mean()

# Neural Network Predictions
mySimpleNet.eval()  # Set the neural network to evaluation mode
with torch.no_grad():  # Disable gradient computation
    y_pred_nn = mySimpleNet(torch.tensor(X_test, dtype=torch.float32)).squeeze().numpy()
mse_nn = ((y_pred_nn - y_test) ** 2).mean()

# Compare the Mean Squared Errors
print(f"Linear Regression Test MSE: {mse_lr:.4f}")
print(f"Neural Network Test MSE: {mse_nn:.4f}")


Linear Regression Test MSE: 4.0093
Neural Network Test MSE: 4.0089


Both models capture the underlying linear relationship defined in the dataset: the estimated coefficients agree to within about $10^{-3}$, and the test MSEs are nearly identical (4.0093 for linear regression, 4.0089 for the neural network). The neural network approach demonstrates its flexibility but, for linear problems, traditional linear regression remains simpler and computationally cheaper, since it needs no iterative training.

\

---

\

**Exercise 2** *Receptive field and causal prediction*

The network defined in the next cell is used to relate the values $(x_{t' \leq t})$ of an input time series to the present value $y_t$ of a target time series.

In [25]:
import torch.nn as nn
import torch.nn.functional as F
from utils_exercices import Outconv, Up_causal, Down_causal

class Double_conv_causal(nn.Module):
    '''(conv => BN => ReLU) * 2, with causal convolutions that preserve input size'''
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super(Double_conv_causal, self).__init__()
        self.kernel_size = kernel_size
        self.dilation = dilation
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=kernel_size, padding=0, dilation=dilation)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=kernel_size, padding=0, dilation=dilation)
        self.bn2 = nn.BatchNorm1d(out_ch)

    def forward(self, x):
        x = F.pad(x, ((self.kernel_size - 1) * self.dilation, 0))
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)

        x = F.pad(x, ((self.kernel_size - 1) * self.dilation, 0))
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        return x


class causalFCN(nn.Module):
    def __init__(self, dilation=1):
        super(causalFCN, self).__init__()
        size = 64
        n_channels = 1
        n_classes = 1
        self.inc = Double_conv_causal(n_channels, size)
        self.down1 = Down_causal(size, 2*size)
        self.down2 = Down_causal(2*size, 4*size)
        self.down3 = Down_causal(4*size, 8*size, pooling_kernel_size=5, pooling_stride=5)
        self.down4 = Down_causal(8*size, 4*size, pooling=False, dilation=2)
        self.up2 = Up_causal(4*size, 2*size, kernel_size=5, stride=5)
        self.up3 = Up_causal(2*size, size)
        self.up4 = Up_causal(size, size)
        self.outc = Outconv(size, n_classes)
        self.n_classes = n_classes

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up2(x5, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        x = self.outc(x)
        return x

# Usage example
model = causalFCN()
# Input time series (x_t):
input_tensor1 = torch.rand(1, 1, 10000)
# Output time series f(x_t):
output = model(input_tensor1)
print(output.shape)

torch.Size([1, 1, 10000])


**Q1** What type of neural network is this? How many parameters does the layer self.down1 have (to be computed by hand)?
How many parameters does the whole network have (with a bit of code)?

In [26]:
# Number of parameters in self.down1 (computed by hand)
# self.down1 = Down_causal(size, 2*size)
# Input channels: size (64), output channels: 2*size (128)
# Assumption: the block is a parameter-free pooling followed by a Double_conv_causal
# Each convolution layer:
# -> Weight parameters: out_ch * in_ch * kernel_size
# -> Bias parameters: out_ch
conv1_params = (128 * 64 * 3) + 128  # First Conv1d layer in Double_conv_causal
conv2_params = (128 * 128 * 3) + 128  # Second Conv1d layer in Double_conv_causal
bn1_params = (128 * 2)  # BatchNorm1d (128 weights + 128 biases)
bn2_params = (128 * 2)  # BatchNorm1d (128 weights + 128 biases)
down1_total_params = conv1_params + conv2_params + bn1_params + bn2_params
print(f"Number of parameters in self.down1: {down1_total_params}")

# Total number of parameters:
model = causalFCN()
total_params = sum(p.numel() for p in model.parameters())
print(f"Total number of parameters: {total_params}")

Number of parameters in self.down1: 74496
Total number of parameters: 2872641
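The hand count for `self.down1` can be cross-checked in code. Since the source of `Down_causal` is not shown here, the sketch below uses a hypothetical stand-in built from the same layer types as `Double_conv_causal` (two `Conv1d` layers with kernel size 3, each followed by `BatchNorm1d`), on the assumption that the pooling stage adds no parameters.

```python
import torch.nn as nn

# Hypothetical stand-in for the trainable layers of Down_causal(64, 128)
stand_in = nn.Sequential(
    nn.Conv1d(64, 128, kernel_size=3),   # 128*64*3 weights + 128 biases
    nn.BatchNorm1d(128),                 # 128 scales + 128 shifts
    nn.ReLU(),
    nn.Conv1d(128, 128, kernel_size=3),  # 128*128*3 weights + 128 biases
    nn.BatchNorm1d(128),                 # 128 scales + 128 shifts
    nn.ReLU(),
)

counted = sum(p.numel() for p in stand_in.parameters())
by_hand = (128 * 64 * 3 + 128) + (128 * 128 * 3 + 128) + 2 * (128 * 2)
print(counted, by_hand)  # prints 74496 74496
```

On the real network, `sum(p.numel() for p in model.down1.parameters())` gives the same kind of check directly.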


**Q2** Par quels mécanismes la taille du vecteur d'entrée est-elle réduite ? Comment est-elle restituée dans la deuxième partie du réseau ?

ANSWER:
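The length of the input vector is reduced by the max-pooling stages of the downsampling blocks: `self.down1`, `self.down2`, and `self.down3` (the last one with a pooling kernel and stride of 5). `self.down4` is built with `pooling=False` and keeps the length unchanged. The number of channels grows along this path (from 64 to 512), but that changes the feature depth, not the temporal length. The length is restored in the second part of the network by transposed convolutions in the `Up_causal` blocks, which upsample the temporal resolution (`self.up2` uses a kernel size and stride of 5, mirroring `self.down3`). The skip connections (`x3`, `x2`, `x1`) do not change the length; they combine high-resolution features from earlier layers with the upsampled output.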
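The effect of these two mechanisms on the sequence length can be sketched with standard PyTorch layers. This is an illustration under assumptions: `Down_causal` and `Up_causal` are not shown in this notebook, so `MaxPool1d` and `ConvTranspose1d` stand in for them, and the channel count of 8 and length of 100 are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 8, 100)  # (batch, channels, length), arbitrary sizes

# Downsampling: pooling with stride s divides the length by s
pool2 = nn.MaxPool1d(kernel_size=2, stride=2)
pool5 = nn.MaxPool1d(kernel_size=5, stride=5)
down = pool5(pool2(x))  # 100 -> 50 -> 10

# Upsampling: a transposed convolution with kernel = stride = s multiplies it by s
up5 = nn.ConvTranspose1d(8, 8, kernel_size=5, stride=5)
up2 = nn.ConvTranspose1d(8, 8, kernel_size=2, stride=2)
restored = up2(up5(down))  # 10 -> 50 -> 100

print(x.shape[-1], down.shape[-1], restored.shape[-1])  # prints 100 10 100
```

The upsampling stages are applied in the reverse order of the pooling stages, which is why `self.up2` carries the factor 5 in the network above.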

**Q3** Par quels mécanismes le champ réceptif est-il augmenté ? Préciser par un calcul la taille du champ réceptif en sortie de *self.inc*.

ANSWER :
The receptive field is increased through **dilation**, **kernel size** in the convolutional layers, and **max-pooling** in the downsampling path. `self.inc` stacks two convolutions with a kernel size of 3 and a dilation of 1: the first convolution has a receptive field of 3, and the second extends it by $(3 - 1) \times 1 = 2$, giving $3 + 2 = 5$. The receptive field at the output of `self.inc` is therefore 5 samples; because the padding is causal, these are $x_{t-4}, \dots, x_t$.
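The calculation for `self.inc` generalizes to any stack of 1D layers. The helper below is a minimal sketch of the usual recurrence: each layer adds $(k - 1) \times d \times j$ to the receptive field, where $j$ is the product of the strides of the preceding layers. The function name and the layer-tuple format are illustrative choices, and the second example (pooling followed by a convolution) is a hypothetical stack, not a description of `Down_causal`.

```python
def receptive_field(layers):
    """Receptive field of stacked 1D layers given as (kernel_size, dilation, stride)."""
    rf, jump = 1, 1
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump  # growth contributed by this layer
        jump *= stride                             # spacing between outputs, in input samples
    return rf

# self.inc: two convolutions, kernel size 3, dilation 1, stride 1
inc_layers = [(3, 1, 1), (3, 1, 1)]
rf_inc = receptive_field(inc_layers)
print(rf_inc)  # prints 5

# Hypothetical continuation: a stride-2 pooling, then one more kernel-3 convolution
rf_with_pool = receptive_field(inc_layers + [(2, 1, 2), (3, 1, 1)])
print(rf_with_pool)  # prints 10
```

The second example shows why pooling matters: after a stride-2 stage, every later kernel tap spans two input samples instead of one.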


**Q4** With a piece of code, determine empirically the size of the receptive field associated with the component $y_{5000}$ of the output vector. (Hint: consider the outputs associated with two inputs that differ in only one component...)

In [27]:
# Create two inputs differing at a single index
input1 = torch.zeros(1, 1, 10000)
input2 = input1.clone()
input2[0, 0, 4990] = 1  # Modify a point near index 5000

# Pass inputs through the network
output1 = model(input1)
output2 = model(input2)

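# Span of output positions that change when a single input sample changes.
# Note: the model is still in training mode here, so BatchNorm1d uses statistics
# computed over the whole sequence and one changed sample can affect every output.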
diff = torch.abs(output1 - output2)
receptive_field = (diff[0, 0, :] > 1e-5).nonzero().max() - (diff[0, 0, :] > 1e-5).nonzero().min() + 1

print(f"Receptive field size: {receptive_field.item()}")


Receptive field size: 10000
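The value above is the span of output positions affected by one input change, measured with BatchNorm in training mode, so it is an upper bound rather than the receptive field of $y_{5000}$ itself. One alternative, sketched below, is to switch to evaluation mode and look at the gradient of a single output with respect to the input: the input indices with a non-zero gradient are the ones that output depends on. The `ToyCausalBlock` class is a hypothetical stand-in (the real `causalFCN` needs `utils_exercices`), and `tanh` replaces ReLU so that no gradient vanishes by chance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyCausalBlock(nn.Module):
    """Hypothetical stand-in: two left-padded convolutions with BatchNorm in between."""
    def __init__(self, kernel_size=3, dilation=1, hidden=4):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv1 = nn.Conv1d(1, hidden, kernel_size, dilation=dilation)
        self.bn = nn.BatchNorm1d(hidden)
        self.conv2 = nn.Conv1d(hidden, 1, kernel_size, dilation=dilation)

    def forward(self, x):
        x = torch.tanh(self.bn(self.conv1(F.pad(x, (self.left_pad, 0)))))
        return self.conv2(F.pad(x, (self.left_pad, 0)))

def influencing_indices(net, length, t):
    """Input indices on which output component t depends (non-zero gradient)."""
    net.eval()  # BatchNorm then uses fixed running statistics
    x = torch.rand(1, 1, length, requires_grad=True)
    net(x)[0, 0, t].backward()
    return (x.grad[0, 0].abs() > 1e-9).nonzero().flatten()

torch.manual_seed(0)
idx = influencing_indices(ToyCausalBlock(), length=200, t=100)
first, last = idx.min().item(), idx.max().item()
print(first, last, last - first + 1)
```

For this toy block the indices run from `t - 4` to `t`, a receptive field of 5. The same function could be applied to `causalFCN` with `t=5000`, although the result there is not shown here.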


**Q5** Does $y_{5000}$ depend on the components $x_t$ with $t > 5000$? Justify empirically, then point out the part of the Double_conv_causal code that guarantees this "causality" property, and justify.



In [28]:
# Create two inputs
input1 = torch.zeros(1, 1, 10000)
input2 = input1.clone()
input2[0, 0, 5010] = 1  # Modify a point after index 5000

# Pass inputs through the network
output1 = model(input1)
output2 = model(input2)

# Check if y[5000] changes
y5000_diff = torch.abs(output1[0, 0, 5000] - output2[0, 0, 5000]).item()
print(f"y[5000] difference: {y5000_diff}")


y[5000] difference: 0.004481717944145203


ANSWER :
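In a causal network, $y_{5000}$ must not depend on $x_t$ for $t > 5000$, so the difference should be exactly zero. The measured `y5000_diff` is about $4.5 \times 10^{-3}$: small, but not zero. The model is still in training mode in this cell, so `BatchNorm1d` normalizes with statistics computed over the whole sequence; a change at $t = 5010$ shifts those statistics, which is enough to explain a non-zero difference. A strict empirical check should call `model.eval()` before the forward passes.
Within Double_conv_causal, the property of "causality" is guaranteed by the manual padding in the forward method: `x = F.pad(x, ((self.kernel_size - 1) * self.dilation, 0))`
This padding adds extra elements only to the left of the input sequence, and the convolutions themselves use `padding=0`, so each convolutional layer only uses past and present values.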
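The role of the left-only padding can be illustrated on a single convolution. The sketch below is a minimal example with assumed sizes (one channel, kernel size 3, a sequence of 100 samples): it perturbs one input sample and compares a left-padded (causal) convolution with a symmetrically padded one that uses the same weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv1d(1, 1, kernel_size=3)  # padding=0, as in Double_conv_causal

def causal(x):
    # (kernel_size - 1) * dilation = 2 zeros on the left, none on the right
    return conv(F.pad(x, (2, 0)))

def centered(x):
    # Symmetric padding: each output also sees one future sample
    return conv(F.pad(x, (1, 1)))

x = torch.rand(1, 1, 100)
x_future = x.clone()
x_future[0, 0, 60] += 1.0  # perturb a single sample at t = 60

with torch.no_grad():
    causal_change = (causal(x) - causal(x_future)).abs()[0, 0]
    centered_change = (centered(x) - centered(x_future)).abs()[0, 0]

# Largest change on outputs strictly before t = 60
print(causal_change[:60].max().item(), centered_change[:60].max().item())
```

With causal padding, no output before $t = 60$ changes, while symmetric padding lets the perturbation reach $y_{59}$.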


\

---

\

\

**Exercise 3** *RankNet loss*

A [recent article](https://arxiv.org/abs/2403.14144) reviews progress in learning to rank. Here is an excerpt:


<img src="https://raw.githubusercontent.com/nanopiero/exam_2025/refs/heads/main/utils/png_exercice3.PNG" alt="excerpt from an article" width="800">

**Q1** What do the authors call "positive samples" and "negative samples"? Give an example.

ANSWER :

In the context of the text, positive samples refer to documents or items that are relevant to a given query, such as a highly relevant search result for the user's input. On the other hand, negative samples are documents or items that are irrelevant or less relevant to the query, such as a poorly ranked or unrelated search result.

**Q2** In the expression of $\mathcal{L}_{RankNet}$, where do the $z_i$ come from? What do they represent?

ANSWER:

In the expression for $\mathcal{L}_{RankNet}$, the $z_i$ values are the **predicted scores** or **relevance scores** assigned by the model to the input documents or items. These scores are computed by the ranking model to quantify how relevant each document $i$ is to the given query. Specifically, $z_i$ represents the output of the model for document $i$, typically obtained through a scoring function such as a neural network. These scores are used to compare pairs of documents ($z_i - z_j$) to determine the relative ordering of relevance between them, which forms the basis of the pairwise loss in RankNet.



**Q3** Why does this expression lead, after training, to "the estimated
value of positive samples is greater than that of negative samples
for each pair of positive/negative samples"?

ANSWER:

The expression for $\mathcal{L}_{RankNet}$ minimizes the pairwise loss by adjusting the model's predicted scores $z_i$ and $z_j$ such that $\sigma(z_i - z_j)$ approaches $1$ when $y_{ij} = 1$, indicating that the model predicts $z_i > z_j$ for a positive-negative pair. Conversely, $\sigma(z_i - z_j)$ approaches $0$ when $y_{ij} = 0$, indicating $z_i < z_j$. This optimization ensures that, after learning, the predicted score $z_i$ for a positive sample is greater than the score $z_j$ of the negative sample it is paired with.
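The mechanism can be sketched numerically. The code below assumes the standard RankNet formulation, a binary cross-entropy between $y_{ij}$ and $\sigma(z_i - z_j)$, which may differ in notation from the excerpt shown in the image. The four items, their split into positives and negatives, and the use of free scores instead of a scoring network are all hypothetical simplifications.

```python
import torch
import torch.nn.functional as F

def ranknet_loss(z_i, z_j, y_ij):
    """Pairwise loss: cross-entropy between y_ij and sigmoid(z_i - z_j)."""
    return F.binary_cross_entropy_with_logits(z_i - z_j, y_ij)

# Free scores for 4 hypothetical items: items 0 and 1 positive, items 2 and 3 negative
scores = torch.zeros(4, requires_grad=True)
positives = torch.tensor([0, 1])
negatives = torch.tensor([2, 3])

# Every (positive, negative) pair, with target y_ij = 1 ("i should rank above j")
i, j = torch.meshgrid(positives, negatives, indexing="ij")
i, j = i.flatten(), j.flatten()
y = torch.ones(i.numel())

initial_loss = ranknet_loss(scores[i], scores[j], y).item()  # log(2) when all scores are equal

optimizer = torch.optim.SGD([scores], lr=0.5)
for _ in range(100):
    optimizer.zero_grad()
    loss = ranknet_loss(scores[i], scores[j], y)
    loss.backward()
    optimizer.step()

final_loss = loss.item()
print(scores.detach(), initial_loss, final_loss)
```

Each gradient step raises the scores of the positive items and lowers those of the negative items, so every positive score ends up above every negative score.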


**Q4** In a deep learning approach, which terms are used to describe the neural networks involved and the way in which they are trained?

ANSWER:

In a deep learning approach, the neural networks used for this kind of task are called **ranking networks**, such as RankNet, and are designed to compare the relevance of pairs or sets of documents. The way these networks are trained is called **pairwise learning**: the objective is to minimize a loss based on the score differences between pairs of documents, so that positive examples obtain higher scores than negative examples.
