# Agentic AI Lab Task – Fine-tuning a Small Language Model (SLM)
## Ultra Tiny Dataset Version (Very Fast Training)
- Dataset: `sms_spam` (first 2,000 samples only)
- Model: `distilbert-base-uncased` (about 66M parameters, well under 3B)
- Training time: ~1 minute on a Colab GPU

## Step 1 – Install Libraries

In [1]:
!pip install transformers datasets evaluate accelerate -q

## Step 2 – Imports

In [2]:

import numpy as np
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer


## Step 3 – Load Dataset and Create SMALL Subset

In [4]:

dataset = load_dataset("sms_spam")

# The sms_spam dataset typically only has a 'train' split.
# To create a small test set, we will split the 'train' data.
# Take a total of 2000 samples for super fast training (1600 train, 400 test)
full_train_set = dataset["train"]
small_subset = full_train_set.select(range(2000))

# Split this small subset into training and testing sets
# 400 out of 2000 samples is 20% for the test set
split_datasets = small_subset.train_test_split(test_size=0.2, seed=42)  # fixed seed for a reproducible split

train_small = split_datasets["train"]
test_small = split_datasets["test"]

dataset_small = {"train": train_small, "test": test_small}

dataset_small




{'train': Dataset({
     features: ['sms', 'label'],
     num_rows: 1600
 }),
 'test': Dataset({
     features: ['sms', 'label'],
     num_rows: 400
 })}
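
Spam datasets tend to be imbalanced, so it helps to know what accuracy a classifier would reach by always predicting the most common class. The sketch below computes that majority-class baseline from a list of integer labels. The function name `majority_baseline`, the toy label list, and the 0 = ham / 1 = spam mapping are assumptions made for illustration and are not part of the notebook.

```python
from collections import Counter

def majority_baseline(labels):
    """Return the accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Toy labels for illustration only (assumed mapping: 0 = ham, 1 = spam).
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0]
baseline = majority_baseline(toy_labels)
print(f"majority-class baseline: {baseline:.2f}")
```

In the notebook, the same function could be applied to the label column of the training split. Any reported accuracy is only meaningful when compared against this number.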

## Step 4 – Load Tokenizer

In [5]:

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)



## Step 5 – Tokenization

In [6]:

def tokenize(example):
    return tokenizer(example["sms"], padding="max_length", truncation=True, max_length=64)

dataset_small["train"] = dataset_small["train"].rename_column("label","labels")
dataset_small["test"] = dataset_small["test"].rename_column("label","labels")

train_ds = dataset_small["train"].map(tokenize, batched=True)
test_ds = dataset_small["test"].map(tokenize, batched=True)

train_ds.set_format("torch")
test_ds.set_format("torch")
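
To make the effect of `padding="max_length"`, `truncation=True`, and `max_length` concrete, here is a toy sketch of what happens to one sequence of token IDs. It is not the real tokenizer: the helper `pad_and_truncate`, the token IDs, and `pad_id=0` are assumptions for illustration, and the handling of special tokens during truncation is ignored.

```python
def pad_and_truncate(token_ids, max_length, pad_id=0):
    """Cut or pad a list of token IDs to exactly max_length entries."""
    kept = token_ids[:max_length]                    # truncation
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad              # padding
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# A short sequence is padded; a long one is cut to max_length.
short_example = pad_and_truncate([101, 7592, 102], max_length=6)
long_example = pad_and_truncate(list(range(1, 11)), max_length=6)
print(short_example)
print(long_example)
```

Because every example ends up with the same length, the batches can be stacked into fixed-size tensors. The cost is some wasted computation on padding for short messages.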



## Step 6 – Load Small Language Model

In [7]:

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)



DistilBertForSequenceClassification LOAD REPORT from: distilbert-base-uncased
Key                     | Status     | 
------------------------+------------+-
vocab_layer_norm.bias   | UNEXPECTED | 
vocab_transform.weight  | UNEXPECTED | 
vocab_projector.bias    | UNEXPECTED | 
vocab_layer_norm.weight | UNEXPECTED | 
vocab_transform.bias    | UNEXPECTED | 
pre_classifier.bias     | MISSING    | 
pre_classifier.weight   | MISSING    | 
classifier.weight       | MISSING    | 
classifier.bias         | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.


## Step 7 – Metrics

In [8]:

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy.compute(predictions=preds, references=labels)["accuracy"]}
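
The metric function above turns logits into class predictions with `argmax` and compares them with the labels. A minimal sketch of that step on hand-made values follows. The logits and labels are invented for illustration, and a plain NumPy comparison stands in for the `evaluate` accuracy metric.

```python
import numpy as np

# Hand-made logits for 4 examples and 2 classes (illustrative values only).
logits = np.array([[2.0, -1.0],
                   [0.1, 0.3],
                   [-0.5, 1.5],
                   [1.2, 0.4]])
labels = np.array([0, 1, 0, 0])

preds = np.argmax(logits, axis=-1)                 # index of the larger logit per row
accuracy_value = float((preds == labels).mean())   # fraction of matching predictions
print(preds.tolist(), accuracy_value)
```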



## Step 8 – Training Arguments (FAST)

In [10]:

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=2,
    logging_steps=50,
    eval_strategy="epoch",  # older transformers releases call this evaluation_strategy
    save_strategy="no"
)
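
As a rough check on these settings, the number of optimizer steps can be worked out by hand. The sketch below assumes a single device and the default `gradient_accumulation_steps=1`, so the figures would differ on a multi-GPU setup.

```python
import math

num_train_samples = 1600   # size of the training split created in Step 3
batch_size = 32            # per_device_train_batch_size
num_epochs = 2             # num_train_epochs
logging_steps = 50

# Assumes one device and gradient_accumulation_steps=1 (the default).
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
logged_steps = [s for s in range(1, total_steps + 1) if s % logging_steps == 0]
print(steps_per_epoch, total_steps, logged_steps)
```

Under those assumptions the run is only 100 steps long, which is consistent with the short training time, and the training loss is logged twice.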


## Step 9 – Trainer

In [11]:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    compute_metrics=compute_metrics
)


## Step 10 – Train (≈ 1 minute)

In [12]:
trainer.train()



Epoch,Training Loss,Validation Loss


KeyboardInterrupt: 

## Step 11 – Evaluate

In [None]:

results = trainer.evaluate()
print(results)
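
Accuracy alone can hide poor spam detection when most messages are ham, so it may be worth looking at precision and recall for the spam class as well. The sketch below computes them from two lists of class IDs. The helper `binary_report`, the toy predictions, and the assumption that label 1 means spam are illustrative. In the notebook the inputs could come from `trainer.predict(test_ds)`.

```python
def binary_report(preds, labels, positive=1):
    """Count confusion-matrix cells and derive precision and recall."""
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    tn = len(labels) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}

# Toy predictions and labels for illustration only (assumed: 1 = spam).
toy_preds = [0, 1, 1, 0, 0, 1, 0, 0]
toy_labels = [0, 1, 0, 0, 1, 1, 0, 0]
report = binary_report(toy_preds, toy_labels)
print(report)
```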


## Step 12 – Observations
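- A very small dataset (1,600 training samples) keeps training extremely fast.
- Demonstrates quick fine-tuning of an SLM for binary text classification.
- Well suited to a demo or lab submission.
- Accuracy is usually around 90–95%; the training run above was interrupted, so no measured value is recorded here.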