# Chapter 6 - Exercises

> Author: Badr TAJINI - Large Language Models (LLMs) - ESIEE 2024-2025

---


## Exercise 6.1: Increasing the context length

**Padding Input Sequences in Neural Language Models**

**Key Research Question: How does padding inputs to the maximum `token` length affect model predictive performance?**

*Methodological Approach:*
- Pad every input sequence to the maximum `token` length the model supports
- Analyze padding's impact on model performance
- Explore input representation interactions

*Critical Parameters:*
- Input `padding` strategy
- Maximum `token` length
- Predictive performance metrics

*Recommended Investigation:*
1. Implement maximum-length input `padding`
2. Measure performance variations
3. Compare padded versus non-padded inputs
4. Assess computational implications
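
The padding step in the investigation above can be sketched in plain Python. This is a minimal illustration, not the course's dataset code: the helper name `pad_to_length` and the toy sequences are hypothetical, and it assumes GPT-2's `<|endoftext|>` token (ID 50256) is used as the padding token.

```python
# Hypothetical helper, not part of the course code.
PAD_TOKEN_ID = 50256  # GPT-2's <|endoftext|> token, assumed here as padding


def pad_to_length(token_ids, max_length, pad_token_id=PAD_TOKEN_ID):
    """Truncate to max_length, then right-pad with pad_token_id."""
    truncated = token_ids[:max_length]
    return truncated + [pad_token_id] * (max_length - len(truncated))


# Toy token-ID sequences of different lengths (made-up values)
sequences = [[11, 22, 33], [44, 55], [66, 77, 88, 99, 100]]

# Option A: pad only up to the longest sequence in the dataset
longest = max(len(seq) for seq in sequences)
padded_to_longest = [pad_to_length(seq, longest) for seq in sequences]

# Option B: pad to a fixed maximum length
# (8 here for readability; the model's context length in the exercise)
fixed_length = 8
padded_to_fixed = [pad_to_length(seq, fixed_length) for seq in sequences]

# Share of positions that hold padding, a rough proxy for wasted compute
real_tokens = sum(len(seq) for seq in sequences)
pad_share_longest = 1 - real_tokens / (len(sequences) * longest)
pad_share_fixed = 1 - real_tokens / (len(sequences) * fixed_length)

print(f"Padding share (longest sequence): {pad_share_longest:.2f}")
print(f"Padding share (fixed length):     {pad_share_fixed:.2f}")
```

Padding to a larger length raises the share of padding tokens the model has to process, which is the computational cost to weigh in step 4. Whether accuracy changes as well has to be measured on the actual dataset.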

In [6]:
import torch
from previous_labs import GPTModel

# Updated base configuration
# Note: the pretrained GPT-2 checkpoints use 1024 positions, so a 2048-token
# context only applies to a model whose positional embeddings have that size.
BASE_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary size
    "context_length": 2048,  # Maximum number of input tokens
    "drop_rate": 0.0,        # Dropout disabled
    "qkv_bias": True         # Bias terms in the query/key/value projections
}

# Architecture settings for each GPT-2 model size
model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

# Example with GPT-2 Small
CHOOSE_MODEL = "gpt2-small (124M)"
BASE_CONFIG.update(model_configs[CHOOSE_MODEL])

# Instantiate the model with the new configuration
model = GPTModel(BASE_CONFIG)

# Print the configuration to validate it
print(f"Model configuration: {BASE_CONFIG}")

Model configuration: {'vocab_size': 50257, 'context_length': 2048, 'drop_rate': 0.0, 'qkv_bias': True, 'emb_dim': 768, 'n_layers': 12, 'n_heads': 12}


## Exercise 6.2: Finetuning the whole model

**Model-Wide Fine-Tuning Performance Assessment**

**Key Research Question: What is the impact of `fine-tuning` the entire transformer model versus a single final block on predictive performance?**


*Methodological Approach:*
- Implement comprehensive model `fine-tuning`
- Compare performance against single block tuning
- Assess computational and representational changes

*Critical Parameters:*
- Full model `fine-tuning` strategy
- Performance evaluation metrics
- Comparative analysis methodology

*Recommended Investigation:*
1. `Fine-tune` the entire transformer model
2. Measure predictive performance metrics
3. Compare with previous single-block tuning results
4. Analyze performance variation mechanisms
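
To put the computational side of this comparison in numbers, the sketch below estimates how many parameters receive gradients when one transformer block is trained versus all of them. It is a back-of-the-envelope calculation, not a measurement of `GPTModel`: it assumes the standard GPT-2 block layout (query, key, value and output projections, a feed-forward network with a 4x hidden size, and two layer norms with scale and shift), and the function names are hypothetical.

```python
# Hypothetical helpers; they assume the standard GPT-2 block layout.


def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


def block_params(emb_dim, qkv_bias=True):
    """Approximate parameter count of one transformer block."""
    # Query, key and value projections plus the output projection
    attention = (3 * linear_params(emb_dim, emb_dim, bias=qkv_bias)
                 + linear_params(emb_dim, emb_dim))
    # Feed-forward network: expand to 4 * emb_dim, then project back
    feed_forward = (linear_params(emb_dim, 4 * emb_dim)
                    + linear_params(4 * emb_dim, emb_dim))
    # Two layer norms, each with a scale and a shift vector
    layer_norms = 2 * (2 * emb_dim)
    return attention + feed_forward + layer_norms


# GPT-2 small settings from the configuration used in these exercises
emb_dim, n_layers = 768, 12

per_block = block_params(emb_dim, qkv_bias=True)
all_blocks = n_layers * per_block

print(f"One block:  {per_block:,} parameters")
print(f"All blocks: {all_blocks:,} parameters")
```

Under these assumptions, training every block updates twelve times as many block parameters as training the last block alone, before counting embeddings and the output head. That difference in optimizer state and gradient computation is the cost to set against any accuracy gain.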

In [7]:
# Unfreeze all parameters for full fine-tuning
for param in model.parameters():
    param.requires_grad = True

# Initialize the optimizer for full fine-tuning
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.1)

# Print the configuration to validate it
print(f"Model configuration for full fine-tuning: {BASE_CONFIG}")

Model configuration for full fine-tuning: {'vocab_size': 50257, 'context_length': 2048, 'drop_rate': 0.0, 'qkv_bias': True, 'emb_dim': 768, 'n_layers': 12, 'n_heads': 12}


## Exercise 6.3: Finetuning the first versus last token

**First Token Fine-Tuning: Predictive Performance Analysis**

**Key Research Question: How does predictive performance change when the model is fine-tuned on the first output `token` rather than the last output `token`?**

*Methodological Approach:*
- Fine-tune first output `token`
- Compare performance against last `token` fine-tuning
- Assess representational learning variations

*Critical Parameters:*
- Initial `token` fine-tuning strategy
- Performance evaluation metrics
- Comparative analysis methodology

*Recommended Investigation:*
1. Implement first `token` fine-tuning
2. Measure predictive performance
3. Compare with last `token` fine-tuning results
4. Analyze performance variation mechanisms
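
The difference between the two setups comes down to which output position feeds the loss. The sketch below illustrates this with plain Python lists. It assumes the model returns outputs shaped `[batch][tokens][classes]` and uses causal attention, as GPT-style models do; the helper names and the toy values are hypothetical.

```python
# Hypothetical helpers illustrating first- versus last-token selection.


def visible_positions(num_tokens):
    """Number of positions each token can attend to under a causal mask."""
    # Position i sees positions 0..i, so i + 1 positions in total
    return [i + 1 for i in range(num_tokens)]


def select_token_outputs(outputs, position):
    """Pick one token position per sample from [batch][tokens][classes]."""
    return [sample[position] for sample in outputs]


# Toy outputs: 2 samples, 3 tokens, 2 classes (made-up values)
outputs = [
    [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]],
    [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]],
]

# With tensors this would correspond to outputs[:, 0, :] and outputs[:, -1, :]
first_token = select_token_outputs(outputs, 0)
last_token = select_token_outputs(outputs, -1)

print("Visible positions per token:", visible_positions(3))
print("First-token outputs:", first_token)
print("Last-token outputs: ", last_token)
```

Under a causal mask the first position attends only to itself, while the last position attends to the whole sequence. This is one plausible mechanism to examine in step 4 when the two setups perform differently.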

In [8]:
import torch
from previous_labs import GPTModel

# Updated base configuration
BASE_CONFIG = {
    "vocab_size": 50257,     # GPT-2 BPE vocabulary size
    "context_length": 2048,  # Maximum number of input tokens
    "drop_rate": 0.0,        # Dropout disabled
    "qkv_bias": True         # Bias terms in the query/key/value projections
}

# Architecture settings for each GPT-2 model size
model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

# Example with GPT-2 Small
CHOOSE_MODEL = "gpt2-small (124M)"
BASE_CONFIG.update(model_configs[CHOOSE_MODEL])

# Instantiate the model with the new configuration
model = GPTModel(BASE_CONFIG)

# Unfreeze only the first and last transformer blocks and freeze the blocks
# in between. Parameters outside trf_blocks are left unchanged by this loop.
last_idx = len(model.trf_blocks) - 1
for i, block in enumerate(model.trf_blocks):
    trainable = i in (0, last_idx)
    for param in block.parameters():
        param.requires_grad = trainable

# Initialize the optimizer with the trainable parameters only
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=5e-5, weight_decay=0.1
)

# Print the configuration to validate it
print(f"Model configuration for fine-tuning the first and last blocks: {BASE_CONFIG}")

Model configuration for fine-tuning the first and last blocks: {'vocab_size': 50257, 'context_length': 2048, 'drop_rate': 0.0, 'qkv_bias': True, 'emb_dim': 768, 'n_layers': 12, 'n_heads': 12}
