# Lab 4 – LoRA Fine-Tuning
**Part 4 of the 7-Lab Hands-On SLM Training Series**

In this lab, we fine-tune a Small Language Model (SLM) on your domain data using **LoRA** (Low‑Rank Adaptation) with the `peft` library. This notebook is designed to run stably on a local machine and to load the processed dataset saved in Lab 3 from your local filesystem.

**Outcome**
• Attach LoRA adapters to a base model
• Run a short fine‑tuning loop on your tokenized dataset
• Save the LoRA adapters to your local filesystem for reuse


## Step 0. Stable installs for local machine

In [None]:
%pip install -q --force-reinstall "numpy==2.0.2" "pandas==2.2.2" "pyarrow==17.0.0"
%pip install -q "datasets>=3.0.0" "transformers>=4.41.0" "peft>=0.11.0" "accelerate>=0.29.0" "sentencepiece>=0.1.99" "tqdm>=4.66.0" bitsandbytes

import importlib
mods = ["numpy", "pandas", "pyarrow", "datasets", "transformers", "peft", "accelerate", "sentencepiece", "tqdm"]
for m in mods:
    try:
        mod = importlib.import_module(m)
        print(f"{m}: {getattr(mod, '__version__', 'unknown')}")
    except Exception as e:
        print(f"[Import error] {m}: {e}")
print("If any import failed, restart the kernel, then re-run this cell.")

## Step 1. Load the prepared dataset from your local filesystem

In [None]:
from datasets import load_from_disk
# from google.colab import drive  # not needed locally
# drive.mount('./drive')  # not needed locally

DATA_DIR = "./lab3_tokenized"  # Path where Lab 3 saved the tokenized dataset
dataset = load_from_disk(DATA_DIR)
print(dataset)
print("Train rows:", len(dataset["train"]))
print("Columns:", dataset["train"].column_names)

## Step 2. Load a base model (4‑bit on GPU if available)

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Choose a public model that works well with chat templates and LoRA
PREFERRED_MODELS = [
    "HuggingFaceH4/zephyr-7b-beta",
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
]

def load_base_model(name: str):
    use_gpu = torch.cuda.is_available()
    print(f"CUDA available: {use_gpu}")
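    import importlib.util

    kwargs = {}
    # Depending on the transformers version, a missing bitsandbytes install may only
    # surface inside from_pretrained, so check for the package explicitly.
    has_bnb = importlib.util.find_spec("bitsandbytes") is not None
    if use_gpu and has_bnb:
        # 4-bit NF4 quantization with fp16 compute and double quantization
        quant_cfg = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
        )
        kwargs.update(dict(device_map="auto", quantization_config=quant_cfg, torch_dtype=torch.float16))
    elif use_gpu:
        print("bitsandbytes not available, falling back to non-quantized load.")
        kwargs.update(dict(torch_dtype=torch.float16))
    else:
        # CPU only: load in full precision
        kwargs.update(dict(torch_dtype=torch.float32))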

    tok = AutoTokenizer.from_pretrained(name, use_fast=True)
    mdl = AutoModelForCausalLM.from_pretrained(name, **kwargs)
    if tok.pad_token is None:
        tok.pad_token = tok.eos_token
    return tok, mdl

tokenizer = model = None
last_err = None
for cand in PREFERRED_MODELS:
    try:
        print(f"Attempting model: {cand}")
        tokenizer, model = load_base_model(cand)
        model_name = cand
        print(f"Loaded: {cand}")
        break
    except Exception as e:
        last_err = e
        print(f"Failed to load {cand}: {e}")

if model is None:
    raise RuntimeError(f"Could not load any model. Last error: {last_err}")
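
To see why the loader prefers 4‑bit weights on GPU, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The parameter counts are nominal values read off the model names (an assumption, not measured), and the helper `weight_memory_gb` is hypothetical. The estimate covers weights only; quantization constants, activations, gradients, and optimizer state all add to it.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumptions, rounded from the model names).
NOMINAL_PARAMS = {
    "HuggingFaceH4/zephyr-7b-beta": 7e9,
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0": 1.1e9,
}

for name, n in NOMINAL_PARAMS.items():
    fp32 = weight_memory_gb(n, 32)
    fp16 = weight_memory_gb(n, 16)
    nf4 = weight_memory_gb(n, 4)
    print(f"{name}: fp32 ~{fp32:.1f} GB, fp16 ~{fp16:.1f} GB, 4-bit ~{nf4:.1f} GB")
```

Under these assumptions, a nominal 7B model needs roughly a quarter of its fp16 weight memory when stored in 4 bits, which is often the difference between fitting and not fitting on a single consumer GPU.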

## Step 3. Attach LoRA adapters with PEFT

In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare model for k-bit training if quantized
model = prepare_model_for_kbit_training(model)

# Common LoRA target modules for decoder-only models (LLaMA/Mistral/Zephyr families)
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=TARGET_MODULES,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
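
The count reported by `print_trainable_parameters()` can be sanity-checked by hand. LoRA adds two low-rank matrices to each targeted linear layer, `A` of shape `r × d_in` and `B` of shape `d_out × r`, so each layer gains `r * (d_in + d_out)` trainable parameters. The sketch below applies that formula to a set of layer shapes. The shapes and layer count are illustrative assumptions; read the real values from `model.config` for the model you actually loaded.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer (A: r x d_in, B: d_out x r)."""
    return r * (d_in + d_out)


# Illustrative shapes only (assumptions); check model.config for real values.
HIDDEN = 2048          # hidden size
KV_DIM = 256           # key/value projection width (smaller under grouped-query attention)
INTERMEDIATE = 5632    # MLP intermediate size
NUM_LAYERS = 22        # number of decoder blocks
R = 16                 # LoRA rank, matching lora_cfg above

# (d_in, d_out) for each module name listed in TARGET_MODULES
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"LoRA params per decoder block: {per_layer:,}")
print(f"LoRA params across {NUM_LAYERS} blocks: {total:,}")
```

Doubling `r` doubles the adapter size, while dropping the MLP projections from `TARGET_MODULES` removes the largest contributors in this example.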

## Step 4. Fine‑tune with transformers.Trainer

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

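# Use a validation (or test) split if present; otherwise train only.
# With the default evaluation strategy ("no"), eval_ds is only used if you call trainer.evaluate().
train_ds = dataset["train"]
eval_ds = dataset.get("validation") or dataset.get("test")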

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    weight_decay=0.0,
    warmup_steps=10,
    max_steps=100,  # keep small for demo; increase for real training
    logging_steps=10,
    save_strategy="no",
    fp16=torch.cuda.is_available(),
    bf16=False,
    report_to=[],
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=collator,
)

train_result = trainer.train()
print(train_result)
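
To put the `TrainingArguments` above in perspective, the following sketch works out how much data the run actually sees and how the learning rate moves over the 100 steps. It assumes a single device and the default linear schedule (linear warmup, then linear decay to zero). The helper names and the 1,000-row dataset size are hypothetical and exist only for illustration.

```python
def training_budget(per_device_bs, grad_accum, max_steps, n_devices=1, n_train_rows=None):
    """Return (effective batch size, sequences seen, epochs covered or None)."""
    effective_bs = per_device_bs * grad_accum * n_devices
    samples_seen = effective_bs * max_steps
    epochs = samples_seen / n_train_rows if n_train_rows else None
    return effective_bs, samples_seen, epochs


def linear_lr(step, base_lr, warmup_steps, max_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))


# Values from the TrainingArguments above; 1000 rows is a made-up dataset size.
eff_bs, seen, epochs = training_budget(2, 4, 100, n_train_rows=1000)
print(f"Effective batch size: {eff_bs}")
print(f"Sequences seen in 100 steps: {seen} (~{epochs:.2f} epochs of 1000 rows)")

for step in (0, 5, 10, 55, 100):
    print(f"step {step:3d}: lr = {linear_lr(step, 2e-4, 10, 100):.2e}")
```

If the run covers only a fraction of an epoch on your real dataset, raising `max_steps` is the most direct way to see more data.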

## Step 5. Save LoRA adapters to your local filesystem

In [None]:
from pathlib import Path
SAVE_DIR = "./lab4_lora_adapters"
Path(SAVE_DIR).mkdir(parents=True, exist_ok=True)
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
print("Saved LoRA adapters to:", SAVE_DIR)
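
For later reuse, the saved directory holds only the adapter weights and an `adapter_config.json`, not the base model. The sketch below shows one way to inspect that config and reattach the adapters to a freshly loaded base model. The helper names `read_adapter_config` and `load_finetuned` are hypothetical. `load_finetuned` loads the base model without quantization and downloads it if it is not already cached, so treat it as a starting point and not a definitive loader.

```python
import json
from pathlib import Path


def read_adapter_config(save_dir: str) -> dict:
    """Load the adapter_config.json that PEFT's save_pretrained writes."""
    cfg_path = Path(save_dir) / "adapter_config.json"
    if not cfg_path.exists():
        raise FileNotFoundError(f"No adapter_config.json in {save_dir}")
    return json.loads(cfg_path.read_text())


def load_finetuned(save_dir: str):
    """Rebuild base model + LoRA adapters for inference (sketch; no quantization)."""
    # Imported lazily so the config helper above works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    cfg = read_adapter_config(save_dir)
    base = AutoModelForCausalLM.from_pretrained(cfg["base_model_name_or_path"])
    tok = AutoTokenizer.from_pretrained(save_dir)
    model = PeftModel.from_pretrained(base, save_dir)
    model.eval()  # disable dropout for inference
    return tok, model
```

A typical call would be `tokenizer, model = load_finetuned("./lab4_lora_adapters")`, after which `model.generate(...)` works as in the generation check at the end of this lab.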

## Optional: Quick generation check

In [None]:
from transformers import TextStreamer

prompt = "Summarize the key considerations when drafting a cardiology discharge note."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.eval()  # switch off LoRA dropout, which is still active after training
with torch.no_grad():
    _ = model.generate(
        **inputs,
        max_new_tokens=200,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        streamer=streamer,
        pad_token_id=tokenizer.eos_token_id,
    )

### Wrap‑up
You have:
• Loaded your tokenized dataset from your local filesystem
• Attached LoRA adapters with PEFT
• Run a short fine‑tuning loop using `transformers.Trainer`
• Saved the adapters to your local filesystem for reuse in inference or future training

Next up: **Lab 5 – Hyperparameter Tuning and Optimization**.