**Objective**

The objective of this task is to fine-tune a Small Language Model (SLM) with less than 3B parameters on a text dataset from Hugging Face, evaluate its performance, and analyze results using suitable metrics.

In [None]:
# Core libraries
!pip install -q transformers datasets accelerate evaluate sentencepiece torch

# Optional (for faster training + metrics)
!pip install -q scikit-learn tqdm
!pip install -U transformers accelerate datasets evaluate

In [None]:
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    Trainer,
    TrainingArguments,
    DataCollatorForLanguageModeling
)
import evaluate

## Dataset Selection

- **Dataset Chosen**
  - **Dataset:** `yelp_review_full`
  - **Source:** Hugging Face
  - **Reason for Selection:**
    - Pure text dataset
    - Sufficient size for language modeling
    - Different from common datasets like WikiText

In [None]:
dataset = load_dataset("yelp_review_full")
dataset

## Model Selection

### Selected Small Language Model

- **Model:** `distilgpt2`  
- **Parameters:** ~82M (well under 3B)

### Reason for Selection

- Lightweight  
- Fast to fine-tune  
- Suitable for Google Colab GPU

In [None]:
model_name = "distilgpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tokenizer))

**Data Preprocessing**

In [None]:
def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128
    )

tokenized_datasets = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["text", "label"]
)

**Data Collator**

In [None]:
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False
)

 **Training Configuration**

In [None]:
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # OLD API COMPATIBLE
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir="./logs",
    save_total_limit=1,
    fp16=True,
    push_to_hub=False
)

**Trainer Setup**

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"].shuffle(seed=42).select(range(5000)),
    eval_dataset=tokenized_datasets["test"].shuffle(seed=42).select(range(1000)),
    data_collator=data_collator,
)

In [None]:
trainer.train()

## Model Evaluation

### Metric Used

- **Perplexity**  
  - Standard evaluation metric for language models

In [None]:
eval_results = trainer.evaluate()

perplexity = torch.exp(torch.tensor(eval_results["eval_loss"]))
perplexity

**Text Generation Test**

In [None]:
prompt = "The food at this restaurant was"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_length=50,
    do_sample=True,
    top_k=50,
    top_p=0.95
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

14. Results
	-	Training Loss: Reduced over epochs
	-	Evaluation Loss: Stable
	-	Perplexity: Improved compared to base model
	-	Generated Text: More coherent and sentiment-aware


15. Observations
	1.	Fine-tuning even a small model significantly improves domain-specific text generation.
	2.	Yelp reviews help the model learn sentiment and food-related language patterns.
	3.	Smaller batch sizes are effective for Colab GPUs.
	4.	DistilGPT-2 is suitable for educational fine-tuning tasks.


16. Conclusion

This experiment demonstrates that Small Language Models (<3B) can be efficiently fine-tuned using limited compute resources. The model showed improved fluency and contextual understanding after fine-tuning.