<a href="https://colab.research.google.com/github/ashivashankars/CMPE255_Assignments/blob/main/2_Unsloth's_LoRA_(PEFT)_(SmolLM2_135M).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Prerequisites

In [None]:
!pip -q install unsloth transformers accelerate peft bitsandbytes datasets trl evaluate rouge-score
!pip -q install flash-attn --no-build-isolation  # If wheel available for your Colab GPU

import torch, os, json, random
from datasets import load_dataset, Dataset
from unsloth import FastLanguageModel
from transformers import TrainingArguments
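
Because the install cell does not pin versions, it can help to record what was actually installed before going further. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the package list is an assumption based on the install line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Packages taken from the pip install line; adjust to whatever you installed.
wanted = ["unsloth", "transformers", "torch", "trl", "peft", "datasets"]
for name, ver in installed_versions(wanted).items():
    print(f"{name:<14}{ver}")
```

The printed versions can be compared against the Unsloth banner that appears when the model is loaded.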


##LoRA (PEFT) fine-tuning of a tiny model (SmolLM2-135M)

#1) Pick the base model

In [None]:
BASE_MODEL = "HuggingFaceTB/SmolLM2-135M"   # tiny & quick
# ALT examples:
# BASE_MODEL = "unsloth/gemma-3-1b-it-unsloth-bnb-4bit"
# BASE_MODEL = "meta-llama/Llama-3.1-8B"  # needs bigger GPU

#2) Use a small supervised dataset (instruction-following)

In [None]:
# Load the first 500 rows of Alpaca, a small instruction-following dataset
ds = load_dataset("tatsu-lab/alpaca", split="train[:500]")

def to_chat(example):
    # Convert one Alpaca row into a single user/assistant exchange.
    # The optional "input" field is folded into the user turn so that
    # roles alternate instead of producing two user messages in a row.
    user_parts = [p for p in (example["instruction"], example["input"]) if p]
    messages = []
    if user_parts:
        messages.append({"role": "user", "content": "\n\n".join(user_parts)})
    if example["output"]:
        messages.append({"role": "assistant", "content": example["output"]})
    return {"messages": messages}

# Apply the formatting function and remove original columns
train = ds.map(to_chat, remove_columns=["instruction", "input", "output"])
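
Some chat templates (though not the ChatML one used later in this notebook) reject conversations whose roles do not strictly alternate between user and assistant. The following is a minimal sketch of a check that could be run over the `messages` column; `roles_alternate` is a hypothetical helper and the sample conversations are made up for illustration.

```python
def roles_alternate(messages):
    """True if roles go user, assistant, user, ... starting with user."""
    expected = ("user", "assistant")
    return bool(messages) and all(
        m["role"] == expected[i % 2] for i, m in enumerate(messages)
    )


good = [
    {"role": "user", "content": "Add 2 and 3."},
    {"role": "assistant", "content": "5"},
]
bad = [
    {"role": "user", "content": "Translate to French."},
    {"role": "user", "content": "Good morning"},
    {"role": "assistant", "content": "Bonjour"},
]

print(roles_alternate(good))  # True
print(roles_alternate(bad))   # False
```

The same predicate could be passed to `train.filter(...)` to drop any rows that fail it.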

#3) Load the model in 4-bit for adapter (LoRA) fine-tuning

In [None]:
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = BASE_MODEL,
    max_seq_length = max_seq_length,
    load_in_4bit = True,         # 4-bit quantized base weights, not full precision
    load_in_8bit = False,
    dtype = torch.float16,
    full_finetuning = False,       # False = adapter-style (PEFT) training; True would enable FFT
)


==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
HuggingFaceTB/SmolLM2-135M does not have a padding token! Will use pad_token = <|endoftext|>.
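
To put `load_in_4bit = True` in perspective, here is a rough back-of-the-envelope estimate of weight storage at different precisions. It assumes roughly 135 million parameters (taken from the model name) and ignores quantization constants, activations, and optimizer state. Layers that are typically left unquantized, such as embeddings, would push the 4-bit figure higher, so treat it as a lower bound. `weight_memory_mb` is a hypothetical helper.

```python
def weight_memory_mb(num_params, bits_per_param):
    """Approximate storage for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bits_per_param / 8 / 1e6


NUM_PARAMS = 135_000_000  # assumption: parameter count implied by "135M"

for label, bits in [("fp32", 32), ("fp16/bf16", 16), ("4-bit", 4)]:
    print(f"{label:>10}: {weight_memory_mb(NUM_PARAMS, bits):7.1f} MB")
```

For a model this small the savings are modest in absolute terms; the same arithmetic matters far more for the larger alternatives listed in step 1.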


#4) Pack data and train

In [None]:
import torch

# 0) Pick ONE template that matches your model.
# If you’re using Qwen/ChatML-style formatting, use this:
CHATML_TEMPLATE = """{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{'<|im_start|>assistant\n'}}{% endif %}"""

# If your model is Llama 3 Instruct, use this instead:
# LLAMA3_TEMPLATE = """{% for message in messages %}{% if message['role'] == 'system' %}{{'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n' + message['content'] + '<|eot_id|>'}}{% elif message['role'] == 'user' %}{{'<|start_header_id|>user<|end_header_id|>\n' + message['content'] + '<|eot_id|>'}}{% elif message['role'] == 'assistant' %}{{'<|start_header_id|>assistant<|end_header_id|>\n' + message['content'] + '<|eot_id|>'}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{'<|start_header_id|>assistant<|end_header_id|>\n'}}{% endif %}"""

# If your model is Gemma Instruct, use this:
# GEMMA_TEMPLATE = """{% for message in messages %}{% if loop.first %}{{'<bos>'}}{% endif %}{{'<start_of_turn>' + message['role'] + '\n' + message['content'] + '<end_of_turn>\n'}}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"""

# 1) SET the template once (choose the one that fits your model)
tokenizer.chat_template = CHATML_TEMPLATE  # swap to LLAMA3_TEMPLATE or GEMMA_TEMPLATE if needed

# (Optional) register the ChatML markers if the tokenizer doesn't already know them.
# add_special_tokens returns 0 when they are already in the vocabulary.
specials = {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
num_added = tokenizer.add_special_tokens(specials)
if num_added:
    print(f"Added {num_added} special tokens to tokenizer.")
    # New token ids need matching embedding rows, otherwise lookups go out of range
    model.resize_token_embeddings(len(tokenizer))

def _is_valid_messages(msgs):
    if not isinstance(msgs, list) or not msgs:
        return False
    for m in msgs:
        if not (isinstance(m, dict) and "role" in m and "content" in m and isinstance(m["content"], str)):
            return False
    return True

def tokenize_chat(example):
    msgs = example.get("messages")
    if not _is_valid_messages(msgs):
        return {"input_ids": [], "attention_mask": []}

    try:
        # Render -> text (template comes from tokenizer.chat_template)
        rendered = tokenizer.apply_chat_template(
            msgs,
            tokenize=False,
            add_generation_prompt=False,
        )
        if not rendered.strip():
            return {"input_ids": [], "attention_mask": []}

        toks = tokenizer(rendered, add_special_tokens=False, return_attention_mask=True)
        ids = toks.get("input_ids", [])
        attn = toks.get("attention_mask", [])
        return {"input_ids": ids, "attention_mask": attn} if ids else {"input_ids": [], "attention_mask": []}
    except Exception as e:
        print("Tokenization error:", repr(e))
        # msgs already passed _is_valid_messages, so every item is a dict
        print("First roles:", [m.get("role", "?") for m in msgs[:5]])
        return {"input_ids": [], "attention_mask": []}

# 2) Map → return python lists (Arrow-friendly), then filter non-empty rows
train_tokenized = train.map(
    tokenize_chat,
    remove_columns=[c for c in train.column_names if c != "messages"]  # keep messages for debug until after filter
)

train_tokenized = train_tokenized.filter(lambda x: isinstance(x["input_ids"], list) and len(x["input_ids"]) > 0)

# 3) Now drop 'messages' if you don’t need it further
if "messages" in train_tokenized.column_names:
    train_tokenized = train_tokenized.remove_columns(["messages"])

print(f"Number of tokenized examples: {len(train_tokenized)}")

Number of tokenized examples: 500
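
To make the ChatML template above easier to read, here is a plain-Python sketch that builds the same string the Jinja template describes. It is an illustration only: `render_chatml` is a hypothetical helper, not part of the tokenizer API, and the example conversation is made up.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Mirror CHATML_TEMPLATE: one <|im_start|>role ... <|im_end|> block per message."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "user", "content": "Say hi."},
    {"role": "assistant", "content": "Hi!"},
]

print(render_chatml(example))
print(render_chatml(example[:1], add_generation_prompt=True))
```

During training the full conversation is rendered with `add_generation_prompt=False`; at inference time only the user turn is rendered and the generation prompt is appended.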


#5) Quick evaluation + inference demo

In [None]:
# !pip install -q evaluate
import evaluate

# Example: accuracy
metric = evaluate.load("accuracy")
preds = [1, 0, 1, 1]
refs  = [1, 0, 0, 1]
print(metric.compute(predictions=preds, references=refs))  # {'accuracy': 0.75}



{'accuracy': 0.75}
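
The 0.75 above is simply the fraction of positions where prediction and reference agree (3 of 4). A minimal pure-Python sketch of the same computation follows; `accuracy` here is a hypothetical stand-in, not the `evaluate` implementation.

```python
def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one example")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


preds = [1, 0, 1, 1]
refs = [1, 0, 0, 1]
print(accuracy(preds, refs))  # 0.75
```

Exact-match accuracy suits classification labels; for free-form generated text, an overlap metric such as ROUGE (installed in the first cell) is usually a closer fit.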


In [None]:
import torch
from unsloth import FastLanguageModel
from transformers import pipeline

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"

# Load the published Instruct checkpoint through FastLanguageModel so Unsloth patching is applied
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_ID,
    max_seq_length=max_seq_length, # Reusing max_seq_length from previous cell
    dtype=torch.float16,
    load_in_4bit=True, # 4-bit quantized weights to keep memory use low
    full_finetuning=False, # this cell only runs inference
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

gen_kwargs = {
    "max_new_tokens": 150,
    "do_sample": False,            # greedy decoding; temperature only applies when sampling
    "repetition_penalty": 1.2,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.eos_token_id,
    "return_full_text": False,
}

def generate(prompt):
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    out = pipe(text, **gen_kwargs)[0]["generated_text"]
    return out.strip()

==((====))==  Unsloth 2025.11.2: Fast Llama patching. Transformers: 4.57.1.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.0. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
HuggingFaceTB/SmolLM2-135M-Instruct does not have a padding token! Will use pad_token = <|endoftext|>.


Device set to use cuda:0
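
With `do_sample` set to `False` the pipeline decodes greedily, picking the highest-scoring token at each step, and `repetition_penalty` rescales the scores of tokens that have already appeared. The sketch below illustrates the commonly used rule (divide positive scores by the penalty, multiply negative ones) on a toy three-token vocabulary. The helper names and logit values are assumptions for illustration, not library code.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Make already-seen tokens less likely: shrink positive scores, push negative ones lower."""
    out = list(logits)
    for tid in set(seen_token_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out


def greedy_pick(logits):
    """Index of the highest score, i.e. the token greedy decoding would choose."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, 1.9, -1.0]  # toy scores for token ids 0, 1, 2
seen = [0, 2]              # token ids already present in the sequence

penalized = apply_repetition_penalty(logits, seen, penalty=1.2)
print(greedy_pick(logits))     # 0
print(greedy_pick(penalized))  # 1
```

A penalty of 1.0 leaves the scores unchanged; values above 1.0 discourage loops, at the risk of suppressing tokens that legitimately need to repeat, such as indentation or keywords in code.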


In [None]:
print(generate(
    "Write a clean Python function to reverse a linked list iteratively. Only code."
))
print("----")
print(generate(
    "Explain what a Python decorator is in 3 sentences with a small example."
))

Here's how you can implement this in Python:
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def reverse_linked(head): 
    if head == None or tail is None: 
        return

    # Create the new node and append it at the end of the original one 
    
    # If there isn't enough space for both nodes then create them together 
  
    # Reverse the link between these two nodes
  
  # Return the reversed linked list after reversing all elements from left side up until right edge 

  
 ```### Example Output ###
Node1 -> Node2 -> Node3 -> Node4 → Node5-> Node6   
`
----
A Python decorator is an expression that adds new behavior to existing functions or classes based on their attributes and methods. It's essentially a special kind of function where you're not just modifying the original code but also adding your own custom logic within it using decorators. This allows for more flexibility when building complex applications from scratch 
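
For comparison with the generated text above, here is a hand-written reference for the two prompts. It is not model output; the names `reverse_linked_list`, `to_list`, and `log_calls` are chosen for illustration.

```python
import functools


class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def reverse_linked_list(head):
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    current = head
    while current is not None:
        nxt = current.next   # remember the rest of the list
        current.next = prev  # point this node backwards
        prev = current
        current = nxt
    return prev


def to_list(head):
    """Collect node values into a Python list for easy inspection."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values


def log_calls(func):
    """Decorator: print the function name before each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@log_calls
def add(a, b):
    return a + b


print(to_list(reverse_linked_list(Node(1, Node(2, Node(3))))))  # [3, 2, 1]
print(add(2, 3))  # prints "calling add", then 5
```

A reference like this gives a concrete baseline when judging whether a generated answer is complete and runnable.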