### Fine Tuning Techniques in NLP

### Discriminative Fine Tuning

- Different layers of a pre-trained model capture different types of information.
- **Approach:**  
    - Use different learning rates for different layers of the model.
    - Lower learning rates for early layers (which capture general features).
    - Higher learning rates for later layers (which capture task-specific features).
- **Benefit:**  
    - Allows fine-tuning to adapt higher layers to the new task while preserving useful representations in lower layers.
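
The approach above can be sketched with PyTorch optimizer parameter groups. This is a minimal illustration, not a definitive recipe: the helper `build_param_groups`, the toy layers, and the decay factor of 2.6 (the value suggested in the ULMFiT paper) are assumptions made for the example.

```python
import torch
from torch import nn


def build_param_groups(layers, top_lr=2e-5, decay=2.6):
    """Return optimizer parameter groups with smaller learning rates for earlier layers.

    `layers` is ordered from the input side to the output side. The last layer
    gets `top_lr`; each earlier layer's rate is divided by `decay` once more.
    """
    groups = []
    num_layers = len(layers)
    for depth, layer in enumerate(layers):
        lr = top_lr / (decay ** (num_layers - 1 - depth))
        groups.append({"params": list(layer.parameters()), "lr": lr})
    return groups


# Toy stand-in for a pre-trained encoder plus a task head (hypothetical sizes).
toy_layers = [nn.Linear(16, 16), nn.Linear(16, 16), nn.Linear(16, 2)]
param_groups = build_param_groups(toy_layers)
optimizer = torch.optim.AdamW(param_groups)

for index, group in enumerate(optimizer.param_groups):
    print(f"layer {index}: lr = {group['lr']:.2e}")
```

For a real transformer, the same idea would apply with one group per encoder block and a separate group for the classification head.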

---

### Slanted Triangular Learning Rates (STLR)

- Adjusts the learning rate over the course of training: a short linear increase followed by a longer linear decay.
- **Phases:**
    - **Warm-up:** Linearly increase the learning rate over a small initial fraction of the training steps, so the model moves quickly toward a suitable region of parameter space.
    - **Decay:** Linearly decrease the learning rate over the remaining steps to refine the parameters and ensure stable convergence.
- **Use Case:**  
    - Introduced with ULMFiT; closely related warm-up-then-decay schedules are common when fine-tuning pre-trained models like BERT and GPT, where careful learning rate scheduling helps limit catastrophic forgetting and improve performance.
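
The two phases can be written as a small function of the training step. The sketch below follows the schedule as formulated in the ULMFiT paper; the function name `stlr` is hypothetical, and the defaults (`lr_max=0.01`, `cut_frac=0.1`, `ratio=32`) are that paper's reported values, assumed here for illustration.

```python
import math


def stlr(step, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at `step` (0-based) under a slanted triangular schedule.

    cut_frac: fraction of steps spent in the warm-up phase.
    ratio: how much smaller the lowest rate is than lr_max.
    """
    cut = max(1, math.floor(total_steps * cut_frac))  # step at which the rate peaks
    if step < cut:
        p = step / cut  # linear warm-up
    else:
        p = 1 - (step - cut) / (cut * (1 / cut_frac - 1))  # linear decay
    return lr_max * (1 + p * (ratio - 1)) / ratio


total = 1000
schedule = [stlr(t, total) for t in range(total)]
peak_step = max(range(total), key=lambda t: schedule[t])
print(f"start={schedule[0]:.6f} peak={schedule[peak_step]:.6f} at step {peak_step}")
```

The `"linear"` schedule with warm-up in the `transformers` library has the same triangular shape, except that it starts from and decays to zero rather than `lr_max / ratio`.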

---

### Regularization and Dropout for Preventing Overfitting in NLP Models

- **Regularization:**
    - **L1 Regularization:** Encourages sparsity by penalizing the absolute values of weights, leading to some weights becoming zero.
    - **L2 Regularization (Ridge):** Penalizes large weights by adding the squared magnitude of weights to the loss function, helping to prevent overfitting.
- **Dropout:**
    - Randomly drops units (neurons) during training.
    - Prevents over-reliance on specific neurons, encouraging the model to learn more robust features.
    - Commonly used in transformer-based models and other deep learning architectures.
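
A minimal sketch of how these pieces might be combined in PyTorch. The layer sizes, penalty strengths, and the helper `l1_penalty` are assumptions chosen for illustration, not recommended settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Small classifier with dropout between layers (sizes are arbitrary).
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(16, 2),
)

# weight_decay applies an L2-style penalty through the optimizer
# (AdamW applies it in decoupled form).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)


def l1_penalty(module, strength=1e-4):
    """Sum of absolute parameter values, scaled by `strength`."""
    return strength * sum(p.abs().sum() for p in module.parameters())


inputs = torch.randn(4, 8)
targets = torch.tensor([0, 1, 0, 1])

model.train()  # dropout is active: units are zeroed at random
loss = nn.functional.cross_entropy(model(inputs), targets) + l1_penalty(model)
loss.backward()
optimizer.step()
optimizer.zero_grad()

model.eval()  # dropout is disabled: repeated forward passes agree
with torch.no_grad():
    eval_out_a = model(inputs)
    eval_out_b = model(inputs)
```

Switching between `model.train()` and `model.eval()` matters here because dropout should only be applied while training.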

---

### Evaluating Model Performance with NLP-Specific Metrics

- **Key Metrics:**
    - **F1 Score:**
        - Harmonic mean of precision and recall.
        - Suitable for classification tasks, especially with imbalanced datasets.
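    - **BLEU Score:**
        - Measures n-gram precision of generated text against one or more reference texts, with a penalty for outputs that are too short.
        - Most commonly used for machine translation, and sometimes for other text generation tasks.
    - **ROUGE Score:**
        - Measures overlap (n-grams or longest common subsequence) between generated and reference summaries, with an emphasis on recall.
        - Widely used for evaluating summarization models.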
    - **Perplexity:**
        - Measures how well a language model predicts a sample.
        - Lower perplexity indicates better performance for language modeling tasks.
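
To make two of these definitions concrete, here is a small pure-Python sketch of F1 and perplexity. The helper names `f1_binary` and `perplexity` and the toy inputs are made up for illustration; in practice a library such as scikit-learn would normally be used for F1.

```python
import math


def f1_binary(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def perplexity(token_probs):
    """exp of the average negative log-probability of the observed tokens."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
toy_f1 = f1_binary(y_true, y_pred)  # precision = recall = 2/3

# A model that gives every observed token probability 0.25 is as uncertain
# as a uniform choice among 4 options.
toy_ppl = perplexity([0.25, 0.25, 0.25, 0.25])

print(f"F1 = {toy_f1:.3f}, perplexity = {toy_ppl:.3f}")
```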

In [None]:
import torch
from torch.utils.data import DataLoader
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification, get_scheduler
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

In [None]:
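# load dataset and hold out 20% of the IMDB training split for evaluation
dataset = load_dataset("imdb")
train_texts, test_texts, train_labels, test_labels = train_test_split(
    list(dataset["train"]["text"]),  # plain lists, as the tokenizer expects
    list(dataset["train"]["label"]),
    test_size=0.2,
    random_state=42,
)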

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")


def tokenize_data(texts, labels, tokenizer, max_length=128):
    encodings = tokenizer(texts, truncation=True, padding=True, max_length=max_length)
    return {
        "input_ids": encodings["input_ids"],
        "attention_mask": encodings["attention_mask"],
        "labels": labels,
    }


train_data = tokenize_data(train_texts, train_labels, tokenizer)
test_data = tokenize_data(test_texts, test_labels, tokenizer)


class IMDBDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.input_ids = data["input_ids"]
        self.attention_mask = data["attention_mask"]
        self.labels = data["labels"]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        return {
            "input_ids": torch.tensor(self.input_ids[index]),
            "attention_mask": torch.tensor(self.attention_mask[index]),
            "labels": torch.tensor(self.labels[index]),
        }


train_dataset = IMDBDataset(train_data)
test_dataset = IMDBDataset(test_data)

train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# load model (the checkpoint should match the tokenizer loaded above)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# define optimizer and STLR-style scheduler
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
# get_scheduler has no "slanted_triangular" option; "linear" gives the same shape:
# linear warm-up to the peak learning rate, then linear decay to zero
num_training_steps = len(train_dataloader) * 3  # 3 epochs
warmup_steps = int(0.1 * num_training_steps)  # warm up for the first 10% of steps
scheduler = get_scheduler(
    "linear",
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=num_training_steps,
)

# training loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def train_model():
    model.train()
    for epoch in range(3):
        for batch in train_dataloader:
            # move every tensor in the batch to the selected device
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(**batch)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            scheduler.step()  # update the learning rate once per batch
            optimizer.zero_grad()

train_model()

# evaluate model using f1 score
model.eval()
all_preds, all_labels = [], []

with torch.no_grad():
    for batch in test_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        preds = torch.argmax(outputs.logits, dim=-1)
        all_preds.extend(preds.cpu().numpy())
        all_labels.extend(batch["labels"].cpu().numpy())

f1 = f1_score(all_labels, all_preds, average="macro")
print(f"F1 score: {f1}")

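from sacrebleu.metrics import BLEU

# evaluate model using BLEU score
# note: BLEU compares generated text with reference text, so it is not a natural
# metric for classification; the label names below only illustrate the API
label_names = ["negative", "positive"]  # IMDB labels: 0 = negative, 1 = positive
model.eval()
all_preds, all_labels = [], []

with torch.no_grad():
    for batch in test_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        preds = torch.argmax(outputs.logits, dim=-1)
        all_preds.extend(label_names[i] for i in preds.cpu().tolist())
        all_labels.extend(label_names[i] for i in batch["labels"].cpu().tolist())

# corpus_score takes the hypotheses and a list of reference streams;
# effective_order=True keeps the missing higher-order n-grams of these
# one-word "sentences" from zeroing the score
bleu = BLEU(effective_order=True).corpus_score(all_preds, [all_labels])
print(f"BLEU score: {bleu.score}")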