# NLP Lab T2: **Bidirectional** Encoder-Decoder (BiLSTM), AI Master's

This notebook is **Tutorial 2 (T2)** of the NLP module.
It introduces a **bidirectional encoder** (BiLSTM) to improve on the Seq2Seq model from T1.

---
## 🎯 Learning objectives
- Understand the limitations of a unidirectional Seq2Seq model
- Explain how a **bidirectional RNN** works
- Implement a **BiLSTM encoder + LSTM decoder**
- Compare it empirically with the T1 model
- Lay the conceptual groundwork for the **attention** mechanism (T3)

---
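## 🧠 Motivation
In a standard encoder, the sequence is read from left to right.
By the time the final state is produced, the first tokens were read many steps earlier, so they tend to be **less well represented** in it.

A **BiLSTM** reads the sequence:
- once from left to right,
- once from right to left,

and concatenates the two representations, so the summary handed to the decoder contains one state that has just read the last token and one that has just read the first.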
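
The two reading directions can be inspected directly on the tensors returned by `nn.LSTM`. The snippet below is a minimal sketch with illustrative sizes (they are assumptions, not the notebook's hyperparameters); it checks that the forward direction ends on the last position and the backward direction ends on the first.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only, not the notebook's hyperparameters
batch, seq_len, emb_dim, hidden = 2, 5, 8, 16

lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
x = torch.randn(batch, seq_len, emb_dim)

out, (h, c) = lstm(x)

# out: one vector per position, forward and backward halves concatenated
print(out.shape)  # torch.Size([2, 5, 32])
# h: final hidden state of each direction (index 0 = forward, 1 = backward)
print(h.shape)    # torch.Size([2, 2, 16])

# The forward direction finishes on the last token ...
fwd_matches = torch.allclose(out[:, -1, :hidden], h[0])
# ... while the backward direction finishes on the first token.
bwd_matches = torch.allclose(out[:, 0, hidden:], h[1])
print(fwd_matches, bwd_matches)  # True True
```

This is why concatenating `h[0]` and `h[1]` gives the decoder a summary anchored at both ends of the source sequence.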

---
## 🧩 Task
We reuse the **sequence reversal problem**:
```
[1, 5, 7, 3] → [3, 7, 5, 1]
```
This task makes the gain from bidirectionality easy to observe: the tokens a left-to-right encoder reads first are the ones the decoder must output last.

---


In [None]:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import random
import numpy as np

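# Fix every seed so that data generation and weight initialisation are reproducible
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)

# Use the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device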


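## 1) Parameters

In [None]:

V = 20                       # number of data tokens (ids 1..V)
MIN_LEN, MAX_LEN = 3, 12     # range of source sequence lengths

TRAIN_SIZE = 8000
VALID_SIZE = 1000

BATCH_SIZE = 64
EMBED_DIM = 64
HIDDEN_DIM = 128             # hidden size of each encoder direction and of the decoder

EPOCHS = 10
LR = 1e-3
TEACHER_FORCING_RATIO = 0.7  # probability of feeding the gold token at each decoding step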


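## 2) Vocabulary

In [None]:

PAD = 0             # padding id, ignored by the loss
SOS = V + 1         # start-of-sequence marker
EOS = V + 2         # end-of-sequence marker
VOCAB_SIZE = V + 3  # PAD + V data tokens + SOS + EOS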


## 3) Dataset

In [None]:

def generate_pair():
    # Random source sequence; the target is its reverse, wrapped in SOS ... EOS
    L = random.randint(MIN_LEN, MAX_LEN)
    src = [random.randint(1, V) for _ in range(L)]
    tgt = [SOS] + list(reversed(src)) + [EOS]
    return src, tgt

class ReverseDataset(Dataset):
    def __init__(self, n):
        self.data = [generate_pair() for _ in range(n)]
    def __len__(self): return len(self.data)
    def __getitem__(self, i): return self.data[i]

def pad(seqs):
    # Right-pad every sequence with PAD up to the longest one in the batch
    m = max(len(s) for s in seqs)
    return torch.tensor([s + [PAD]*(m-len(s)) for s in seqs])

def collate(batch):
    src = pad([b[0] for b in batch])
    tgt = pad([b[1] for b in batch])
    # Decoder input drops the last column, the target drops SOS (shift by one)
    return src, tgt[:,:-1], tgt[:,1:]

train_loader = DataLoader(ReverseDataset(TRAIN_SIZE), batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate)
valid_loader = DataLoader(ReverseDataset(VALID_SIZE), batch_size=BATCH_SIZE, collate_fn=collate)
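
To make the one-step shift performed by `collate` concrete, here is a small sketch that runs the same `pad` and `collate` logic on two hand-written pairs. The ids assume `V = 20` as in the parameters cell; the batch itself is made up for illustration.

```python
import torch

# Same id layout as in the vocabulary cell, assuming V = 20
PAD, SOS, EOS = 0, 21, 22

def pad(seqs):
    m = max(len(s) for s in seqs)
    return torch.tensor([s + [PAD] * (m - len(s)) for s in seqs])

def collate(batch):
    src = pad([b[0] for b in batch])
    tgt = pad([b[1] for b in batch])
    return src, tgt[:, :-1], tgt[:, 1:]

# Two hand-written pairs of different lengths (illustrative data)
batch = [
    ([1, 5, 7, 3], [SOS, 3, 7, 5, 1, EOS]),
    ([4, 2],       [SOS, 2, 4, EOS]),
]
src, tgt_in, tgt_out = collate(batch)

print(src.tolist())      # [[1, 5, 7, 3], [4, 2, 0, 0]]
print(tgt_in.tolist())   # [[21, 3, 7, 5, 1], [21, 2, 4, 22, 0]]
print(tgt_out.tolist())  # [[3, 7, 5, 1, 22], [2, 4, 22, 0, 0]]
```

For the shorter pair, the decoder input still contains `EOS` and a `PAD`, but the matching target positions are `PAD`, so a loss built with `ignore_index=PAD` does not count them.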


## 4) **Bidirectional** Encoder

In [None]:

class BiEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=PAD)
        self.lstm = nn.LSTM(
            EMBED_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True
        )
        # Project the concatenated (forward, backward) states back to HIDDEN_DIM
        # so that they can initialise the unidirectional decoder
        self.fc_h = nn.Linear(HIDDEN_DIM*2, HIDDEN_DIM)
        self.fc_c = nn.Linear(HIDDEN_DIM*2, HIDDEN_DIM)

    def forward(self, x):
        emb = self.emb(x)                                # (B, S, EMBED_DIM)
        _, (h, c) = self.lstm(emb)                       # h, c: (2, B, HIDDEN_DIM)
        h_cat = torch.cat([h[0], h[1]], dim=1)           # (B, 2*HIDDEN_DIM)
        c_cat = torch.cat([c[0], c[1]], dim=1)
        h0 = torch.tanh(self.fc_h(h_cat)).unsqueeze(0)   # (1, B, HIDDEN_DIM)
        c0 = torch.tanh(self.fc_c(c_cat)).unsqueeze(0)
        return h0, c0
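
One point worth knowing about the encoder above: it runs the LSTM over the padded batch as is, so for shorter sequences the forward direction keeps updating its state on `PAD` positions and the backward direction starts on them. A possible refinement, sketched below, packs the batch with `pack_padded_sequence` so that each direction stops at the true sequence length. The class name `PackedBiEncoder` and its constructor arguments are hypothetical and not part of the notebook.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

PAD = 0

class PackedBiEncoder(nn.Module):
    """Hypothetical variant of BiEncoder that skips PAD positions."""
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, embed_dim, padding_idx=PAD)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc_h = nn.Linear(hidden_dim * 2, hidden_dim)
        self.fc_c = nn.Linear(hidden_dim * 2, hidden_dim)

    def forward(self, x):
        # True length of each sequence (number of non-PAD tokens), kept on CPU
        lengths = (x != PAD).sum(dim=1).cpu()
        packed = pack_padded_sequence(
            self.emb(x), lengths, batch_first=True, enforce_sorted=False
        )
        _, (h, c) = self.lstm(packed)
        h_cat = torch.cat([h[0], h[1]], dim=1)
        c_cat = torch.cat([c[0], c[1]], dim=1)
        h0 = torch.tanh(self.fc_h(h_cat)).unsqueeze(0)
        c0 = torch.tanh(self.fc_c(c_cat)).unsqueeze(0)
        return h0, c0

torch.manual_seed(0)
enc = PackedBiEncoder(vocab_size=23, embed_dim=8, hidden_dim=16).eval()

short = torch.tensor([[4, 2]])
padded = torch.tensor([[4, 2, 0, 0]])   # same sequence followed by two PAD tokens

with torch.no_grad():
    h_short, _ = enc(short)
    h_padded, _ = enc(padded)

# With packing, trailing PAD tokens no longer change the encoding
same = torch.allclose(h_short, h_padded, atol=1e-6)
print(h_short.shape, same)
```

Whether this changes the results on the reversal task is an empirical question; the sketch only shows that the encoding becomes independent of how much padding a batch happens to contain.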


## 5) Decoder

In [None]:

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB_SIZE, EMBED_DIM, padding_idx=PAD)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.fc = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, x, h, c):
        # x: (B, 1) token ids for one decoding step; h, c: (1, B, HIDDEN_DIM)
        emb = self.emb(x)
        out, (h, c) = self.lstm(emb, (h, c))
        return self.fc(out), h, c   # logits: (B, 1, VOCAB_SIZE)


## 6) Bidirectional Seq2Seq

In [None]:

class Seq2Seq(nn.Module):
    def __init__(self, enc, dec):
        super().__init__()
        self.enc = enc
        self.dec = dec

    def forward(self, src, tgt_in, tf=0.7):
        # tf: probability of feeding the gold token (teacher forcing) at each step
        B, T = tgt_in.shape
        h, c = self.enc(src)
        outputs = torch.zeros(B, T, VOCAB_SIZE, device=src.device)

        x = tgt_in[:, 0].unsqueeze(1)          # first decoder input is SOS
        for t in range(T):
            logits, h, c = self.dec(x, h, c)   # logits: (B, 1, VOCAB_SIZE)
            outputs[:, t:t+1, :] = logits
            pred = logits.argmax(-1)           # greedy prediction, shape (B, 1)
            if t + 1 < T:
                # One random draw per step, shared by the whole batch
                x = tgt_in[:, t+1].unsqueeze(1) if random.random() < tf else pred
        return outputs


## 7) Training

In [None]:

model = Seq2Seq(BiEncoder(), Decoder()).to(device)
optimizer = optim.Adam(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss(ignore_index=PAD)

def run_epoch(loader, train=True):
    model.train() if train else model.eval()
    total = 0.0
    # Track gradients only during training
    with torch.set_grad_enabled(train):
        for src, tin, tout in loader:
            src, tin, tout = src.to(device), tin.to(device), tout.to(device)
            if train: optimizer.zero_grad()
            # No teacher forcing at validation time: the decoder feeds on its own predictions
            logits = model(src, tin, TEACHER_FORCING_RATIO if train else 0.0)
            B, T, n_cls = logits.shape
            # reshape rather than view: tout is a non-contiguous slice of the padded targets
            loss = criterion(logits.reshape(B*T, n_cls), tout.reshape(B*T))
            if train:
                loss.backward()
                optimizer.step()
            total += loss.item()
    return total/len(loader)

for e in range(1, EPOCHS+1):
    tr = run_epoch(train_loader, True)
    va = run_epoch(valid_loader, False)
    print(f"Epoch {e:02d} | train {tr:.4f} | valid {va:.4f}")
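
The loop above reports only the cross-entropy loss. To compare this model with the T1 model on what the task actually asks for, an accuracy measure can help. The helper below is a sketch; the name `sequence_accuracy` and its signature are assumptions, not part of the notebook. It computes the share of correct non-PAD tokens and the share of sequences reproduced without any error.

```python
import torch

PAD = 0

def sequence_accuracy(logits, targets, pad_id=PAD):
    """Token-level and exact-match accuracy, ignoring padded positions.

    logits:  (B, T, vocab) scores from the decoder
    targets: (B, T) gold token ids, padded with pad_id
    """
    pred = logits.argmax(dim=-1)
    mask = targets != pad_id
    correct = (pred == targets) & mask
    token_acc = correct.sum().item() / mask.sum().item()
    # A sequence counts as correct only if every non-PAD token matches
    seq_ok = (correct | ~mask).all(dim=1)
    exact_acc = seq_ok.float().mean().item()
    return token_acc, exact_acc

# Toy check: 2 sequences, 5 possible ids, the second one has one wrong token
targets = torch.tensor([[3, 1, 4], [2, 4, 0]])
pred_ids = torch.tensor([[3, 1, 4], [2, 1, 0]])
logits = torch.nn.functional.one_hot(pred_ids, num_classes=5).float()

token_acc, exact_acc = sequence_accuracy(logits, targets)
print(token_acc, exact_acc)  # 0.8 0.5
```

In the notebook, it could be applied to the `logits` and `tout` tensors of each validation batch, for instance inside `run_epoch` when `train` is `False`.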


---
## 8) Discussion

### What this lab shows
- The BiLSTM encodes **long-range dependencies** better
- The first source tokens are recovered more reliably
- The bottleneck remains: the whole source sequence is still compressed into a single fixed-size state (`h0`, `c0`)

👉 The next natural question:
**Why not let the decoder access all of the encoder's states?**

➡️ Answer: **Attention (T3)**.

---
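
As a preview of that idea, the sketch below shows one way a decoder state could look at every encoder state instead of a single summary vector. Dot-product scoring is one common choice and is an assumption here; T3 may use a different scoring function. The tensor sizes and the mask are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes only
B, S, H = 2, 4, 16

enc_states = torch.randn(B, S, H)   # one encoder state per source position
dec_state = torch.randn(B, H)       # current decoder hidden state
src_mask = torch.tensor([[1, 1, 1, 1],
                         [1, 1, 0, 0]], dtype=torch.bool)  # False = PAD

# Score every source position against the decoder state (dot product)
scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)  # (B, S)
scores = scores.masked_fill(~src_mask, float("-inf"))

# Normalise the scores into weights, then average the encoder states
weights = F.softmax(scores, dim=1)                                 # (B, S)
context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)   # (B, H)

print(weights.sum(dim=1))  # each row sums to 1
print(context.shape)       # torch.Size([2, 16])
```

The `context` vector is recomputed at every decoding step, so the decoder is no longer limited to what fits in `h0` and `c0`.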
