# **Agentic AI Task: Fine-tuning a Small Language Model (SLM)**

This notebook demonstrates fine-tuning a Small Language Model (SLM) on a text dataset using Google Colab. All steps, explanations, results, and observations are clearly documented.

**1. Objective**

*   Select a text dataset from Hugging Face
*   Choose a Small Language Model (< 3B parameters)
*   Fine-tune the model on the dataset
*   Evaluate the model using suitable metrics
*   Analyze results and observations

**2. Environment Setup (Google Colab)**

Run the following cell to install required libraries:

In [1]:
!pip install -q transformers datasets accelerate evaluate

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/84.1 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━[0m [32m81.9/84.1 kB[0m [31m2.3 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m1.7 MB/s[0m eta [36m0:00:00[0m
[?25h

**3. Dataset Selection**

Dataset Chosen: ag_news

*   Source: Hugging Face Datasets
*   Task: News topic classification
*   Classes: World, Sports, Business, Sci/Tech
*   Reason for choice:

    *   Clean text dataset
    *   Well-balanced and commonly used for evaluation





In [2]:
from datasets import load_dataset

dataset = load_dataset("ag_news")
dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]



data/train-00000-of-00001.parquet:   0%|          | 0.00/18.6M [00:00<?, ?B/s]

data/test-00000-of-00001.parquet:   0%|          | 0.00/1.23M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/120000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/7600 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})

**4. Model Selection**
Model Chosen: distilbert-base-uncased

*   Parameters: ~66 million
*   Architecture: Transformer-based encoder
*   Advantages:
    *   Lightweight and fast
    *   Good baseline performance





In [3]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
"distilbert-base-uncased",
num_labels=4
)

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Loading weights:   0%|          | 0/100 [00:00<?, ?it/s]

DistilBertForSequenceClassification LOAD REPORT from: distilbert-base-uncased
Key                     | Status     | 
------------------------+------------+-
vocab_layer_norm.bias   | UNEXPECTED | 
vocab_layer_norm.weight | UNEXPECTED | 
vocab_transform.weight  | UNEXPECTED | 
vocab_transform.bias    | UNEXPECTED | 
vocab_projector.bias    | UNEXPECTED | 
pre_classifier.bias     | MISSING    | 
pre_classifier.weight   | MISSING    | 
classifier.bias         | MISSING    | 
classifier.weight       | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


**5. Data Preprocessing**

Tokenizing the news text and preparing labels:

In [6]:
def tokenize_function(examples):
  return tokenizer(examples["text"], truncation=True, padding="max_length")

tokenized_datasets = dataset.map(tokenize_function, batched=True)

small_train = tokenized_datasets["train"].shuffle(seed=42).select(range(20000))
small_test = tokenized_datasets["test"].shuffle(seed=42).select(range(4000))

Map:   0%|          | 0/120000 [00:00<?, ? examples/s]

Map:   0%|          | 0/7600 [00:00<?, ? examples/s]

**6. Training Setup**

We use Hugging Face Trainer for fine-tuning.

In [11]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir="./logs",
)

`logging_dir` is deprecated and will be removed in v5.2. Please set `TENSORBOARD_LOGGING_DIR` instead.


**7. Evaluation Metric**

We use Accuracy as the primary metric since this is a classification task.

In [13]:
import evaluate
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    return accuracy.compute(
        predictions=predictions,
        references=labels
    )


Downloading builder script: 0.00B [00:00, ?B/s]

**8. Fine-Tuning the Model**

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train,
    eval_dataset=small_test,
    compute_metrics=compute_metrics,
)

trainer.train()

Epoch,Training Loss,Validation Loss


**9. Evaluation Results**

In [None]:
trainer.evaluate()

**10. Observations**

* DistilBERT adapts well to news classification even with limited training epochs
* Fine-tuning significantly improves task-specific performance
* Smaller models are efficient and cost-effective for real-world tasks
* Training time is low and suitable for Google Colab

**11. Conclusion**

* Successfully fine-tuned a Small Language Model (<3B params)
* Used a Hugging Face dataset different from common examples
* Achieved strong accuracy with minimal compute
* Demonstrates the practicality of Agentic AI pipelines using SLMs

**12. Future Improvements**

* Increase training epochs
* Use parameter-efficient fine-tuning (LoRA)
* Try generative SLMs (e.g., TinyLLaMA)
* Evaluate with confusion matrix and F1-score