# 🧪 Lab 4: Fine-Tuning Demo — Google Colab

This lab demonstrates **two practical ways** to fine-tune an LLM for software-engineering tasks:

1. **Option A (OpenAI SFT)**: Supervised fine-tuning via the OpenAI API using a tiny JSONL dataset.
2. **Option B (Local LoRA)**: Parameter-efficient fine-tuning (LoRA) on a small open model (`distilgpt2`) using Hugging Face PEFT on Colab GPU.

### What you'll learn
- When to pick Fine-Tuning vs RAG
- How to prepare **instruction JSONL** datasets
- How to launch a **fine-tuning job** (OpenAI) and **consume** the tuned model
- How to **train adapters** locally with **LoRA** and generate improved outputs

> **Why are we doing this?**
> Fine-tuning is well suited to enforcing **style, tone, or constrained behavior** (e.g., your team's docstring format). RAG is better for **fresh facts**. In practice, teams often combine both.


## ✅ Step 0 — Colab Runtime & (Optional) GPU
GPU helps for Option B (LoRA). For Option A (OpenAI SFT), CPU is fine.

In [None]:
import sys, platform
print('Python:', sys.version)
try:
    import google.colab  # type: ignore
    print('✅ Running in Google Colab')
except Exception:
    print('ℹ️ Not in Colab (that is okay).')
try:
    import torch
    print('Torch:', torch.__version__)
    print('CUDA available:', torch.cuda.is_available())
    if torch.cuda.is_available():
        print('GPU:', torch.cuda.get_device_name(0))
except Exception as e:
    print('Torch not available yet:', e)

---
## 🔁 Fine-Tuning vs RAG (1 slide recap)
- **RAG**: Best for *factual grounding* from your knowledge base; no weight changes.
- **Fine-Tuning**: Best for *style, format, persona, guardrails*; changes model behavior on patterns you show.

In this lab we’ll cover **both fine-tuning options** (A: OpenAI SFT, B: local LoRA) so you can adopt whichever fits your environment.

---
## ▶️ Option A — OpenAI Supervised Fine-Tuning (SFT)

We’ll create a **tiny** instruction dataset (JSONL), upload it, start a fine-tune job, poll status, and then use the tuned model.

**Prereqs:** OpenAI account & API key; model access that supports fine-tuning.

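**Note:** This is a minimal demo. The OpenAI fine-tuning API requires at least 10 training examples, so extend the three-example dataset below before launching a job. Real datasets need more examples (~100–1000+) and careful review.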


### 🔐 Step A1 — Install SDK & Load API Key

In [None]:
# Quote the requirement so the shell does not treat '>' as an output redirect.
!pip -q install --upgrade "openai>=1.40"

import os, json, time
from getpass import getpass
from openai import OpenAI

# Prompt for the key only if it is not already set in the environment.
if not os.environ.get('OPENAI_API_KEY'):
    print('Enter your OpenAI API key (hidden):')
    os.environ['OPENAI_API_KEY'] = getpass()
print('✅ API key set in environment.')
client = OpenAI()

### 🧱 Step A2 — Create a Tiny Instruction Dataset (JSONL)

Each line is one training example with a `messages` list (system/user/assistant). Keep the examples **consistent** and **on-style**. The examples below bias the model toward **concise, testable Python with docstrings**.

In [None]:
# Outer strings use ''' so the """docstrings""" inside each example remain valid Python.
dataset = [
  {"messages": [
    {"role": "system", "content": "You are a senior Python tutor. Use docstrings, type hints, and short examples."},
    {"role": "user", "content": "Write a function that returns the Fibonacci sequence up to n."},
    {"role": "assistant", "content": '''
def fibonacci(n: int) -> list:
    """Return Fibonacci numbers from 0 up to n (inclusive)."""
    a, b, out = 0, 1, []
    while a <= n:
        out.append(a)
        a, b = b, a + b
    return out
'''}
  ]},
  {"messages": [
    {"role": "system", "content": "You are a senior Python tutor. Use docstrings, type hints, and short examples."},
    {"role": "user", "content": "Explain Python decorators with one short example."},
    {"role": "assistant", "content": '''
"""Decorators wrap a function to add behavior without changing its code."""
def log_calls(fn):
    def wrapper(*args, **kwargs):
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@log_calls
def hello(name: str) -> str:
    return f"Hello {name}"
'''}
  ]},
  {"messages": [
    {"role": "system", "content": "You are a senior Python tutor. Use docstrings, type hints, and short examples."},
    {"role": "user", "content": "Show a robust average function that avoids ZeroDivisionError and handles None."},
    {"role": "assistant", "content": '''
from statistics import mean
from typing import Iterable, Optional

def safe_avg(xs: Optional[Iterable[float]]) -> float:
    """Return the average; 0.0 if xs is None/empty."""
    xs = list(x for x in (xs or []) if x is not None)
    return mean(xs) if xs else 0.0
'''}
  ]}
]

# Write one JSON object per line (JSONL).
jsonl_path = 'tiny_python_tutor.jsonl'
with open(jsonl_path, 'w', encoding='utf-8') as f:
    for row in dataset:
        f.write(json.dumps(row, ensure_ascii=False) + '\n')
print('Wrote', jsonl_path)
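
Before uploading, it can help to sanity-check the file locally. The sketch below is a minimal, assumed validator (the name `validate_chat_jsonl_lines` and its rules are illustrative, not part of the OpenAI SDK): it checks that each line parses as JSON, has a non-empty `messages` list with known roles and string content, and ends with an assistant turn.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_chat_jsonl_lines(lines):
    """Return a list of (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((i, f"invalid JSON: {exc.msg}"))
            continue
        messages = row.get("messages") if isinstance(row, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((i, "missing or empty 'messages' list"))
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES:
                problems.append((i, "message with unknown or missing role"))
                break
            if not isinstance(m.get("content"), str):
                problems.append((i, "message content is not a string"))
                break
        else:
            # Only reached when every message passed the checks above.
            if messages[-1]["role"] != "assistant":
                problems.append((i, "last message is not from the assistant"))
    return problems

# Hypothetical rows for illustration only.
good = json.dumps({"messages": [{"role": "user", "content": "hi"},
                                {"role": "assistant", "content": "hello"}]})
bad = json.dumps({"messages": [{"role": "user", "content": "hi"}]})
print(validate_chat_jsonl_lines([good, bad, "{not json"]))
```

In the notebook, the same function could be applied to the written file by passing `open(jsonl_path, encoding='utf-8')` as `lines`.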

### ⬆️ Step A3 — Upload File & Start Fine-Tune Job

- `purpose` must be `'fine-tune'` for training data.
- Use an available base model that supports SFT. Here we use a small, cost-efficient model as a placeholder (adjust to what your account supports).

In [None]:
# Adjust to a fine-tunable model available to your account; the API may expect
# a dated snapshot name rather than an alias.
ft_base_model = 'gpt-4o-mini'

# Upload the training file; the context manager closes the handle afterwards.
with open(jsonl_path, 'rb') as fh:
    file_obj = client.files.create(file=fh, purpose='fine-tune')
print('Uploaded file id:', file_obj.id)

job = client.fine_tuning.jobs.create(
    training_file=file_obj.id,
    model=ft_base_model,
    # optional: validation_file=..., suffix='python-tutor-style', hyperparameters={...}
)
job_id = job.id
job_id
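
The optional `validation_file` mentioned in the comment above needs a held-out set. One possible way to produce it is a seeded shuffle-and-split, sketched here; `split_examples` and its defaults are assumptions for illustration, and each resulting list would still have to be written to its own JSONL file and uploaded separately.

```python
import random

def split_examples(rows, val_fraction=0.2, seed=0):
    """Shuffle rows deterministically and return (train_rows, val_rows)."""
    rows = list(rows)
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    rng.shuffle(rows)
    # Hold out at least one example whenever there is more than one row.
    n_val = max(1, int(len(rows) * val_fraction)) if len(rows) > 1 else 0
    return rows[n_val:], rows[:n_val]

# Stand-in data: integers instead of real chat examples.
train_rows, val_rows = split_examples(range(10), val_fraction=0.2, seed=0)
print(len(train_rows), len(val_rows))  # → 8 2
```

Fixing the seed keeps the split reproducible between runs, which makes validation metrics comparable across fine-tuning iterations.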

### ⏳ Step A4 — Poll Job Status
We’ll poll the job until it succeeds, fails, or is cancelled (this is just a helper loop for the demo).

In [None]:
import time

def wait_for_job(jid, sleep_s=10, max_wait_s=1200):
    """Poll the fine-tuning job until it reaches a terminal state or the timeout expires."""
    start = time.time()
    while True:
        j = client.fine_tuning.jobs.retrieve(jid)
        print('status:', j.status)
        if j.status in ('succeeded', 'failed', 'cancelled'):
            return j
        if time.time() - start > max_wait_s:
            print('⏱️ Timed out waiting. Check job in dashboard.')
            return j
        time.sleep(sleep_s)

final = wait_for_job(job_id, sleep_s=15)
final
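
For longer jobs, a fixed sleep interval can be replaced with exponential backoff. The following is a minimal sketch of that idea, decoupled from the OpenAI client so it runs offline: `poll_until_done`, `fetch_status`, and the injected `sleep`/`clock` parameters are hypothetical names introduced for illustration, and the non-terminal status strings in the demo are assumed examples.

```python
import time

TERMINAL_STATES = {"succeeded", "failed", "cancelled"}

def poll_until_done(fetch_status, initial_sleep_s=5.0, max_sleep_s=60.0,
                    max_wait_s=1200.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal state or max_wait_s elapses."""
    start = clock()
    delay = initial_sleep_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if clock() - start > max_wait_s:
            return status  # timed out; the caller decides what to do next
        sleep(delay)
        delay = min(delay * 2, max_sleep_s)  # double the wait, capped at max_sleep_s

# Demo with a fake status source and a recording "sleep", so no network or waiting is involved.
fake_statuses = iter(["validating_files", "queued", "running", "succeeded"])
waits = []
result = poll_until_done(lambda: next(fake_statuses), sleep=waits.append)
print(result, waits)  # → succeeded [5.0, 10.0, 20.0]
```

In the notebook, `fetch_status` could be something like `lambda: client.fine_tuning.jobs.retrieve(job_id).status`.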

### 🤖 Step A5 — Use the Fine-Tuned Model
If the job's `status` is `succeeded`, its `fine_tuned_model` field holds the name of the tuned model. Use it like any other chat model.

In [None]:
# fine_tuned_model is populated only after the job succeeds.
ft_model = getattr(final, 'fine_tuned_model', None)
print('Fine-tuned model:', ft_model)
if ft_model:
    resp = client.chat.completions.create(
        model=ft_model,
        messages=[
            # Reuse the training system prompt so inference matches the fine-tuning examples.
            {"role": "system", "content": "You are a senior Python tutor. Use docstrings, type hints, and short examples."},
            {"role": "user", "content": "Write a Python function to compute factorial with docstring and type hints."}
        ],
        temperature=0.2,
        max_tokens=350
    )
    print(resp.choices[0].message.content)
else:
    print('⚠️ Fine-tuned model not available yet. Check job status or dashboard.')

---
## ▶️ Option B — Local LoRA (PEFT) on Colab GPU

We’ll fine-tune a small open model (`distilgpt2`) with **LoRA adapters** using PEFT. This is quick & cheap and demonstrates the mechanics of SFT without external services.

**Tip**: For faster runs, keep dataset tiny and epochs low. For quality, expand dataset and tune hyperparameters.

### 📦 Step B1 — Install Libraries

In [None]:
!pip -q install --upgrade transformers datasets accelerate peft bitsandbytes
import torch, os
print('CUDA available:', torch.cuda.is_available())
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device

### 🧱 Step B2 — Tiny Instruction Dataset (Same Intent as Option A)
We’ll build a very small list of instruction/response pairs to bias `distilgpt2` towards **Python-tutor** style outputs.

In [None]:
# Outer strings use ''' so the """docstrings""" inside each response remain valid Python.
train_pairs = [
    {"instruction": "Write fibonacci(n) with docstring and type hints.",
     "response": '''
def fibonacci(n: int) -> list:
    """Return Fibonacci numbers from 0 up to n (inclusive)."""
    a, b, out = 0, 1, []
    while a <= n:
        out.append(a)
        a, b = b, a + b
    return out
'''},
    {"instruction": "Explain Python decorators with a short example.",
     "response": '''
"""Decorators wrap a function to add behavior without changing its code."""
def log_calls(fn):
    def wrapper(*args, **kwargs):
        print(f"calling {fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@log_calls
def hello(name: str) -> str:
    return f"Hello {name}"
'''},
    {"instruction": "Provide a safe average function handling empty list and None.",
     "response": '''
from statistics import mean
from typing import Iterable, Optional
def safe_avg(xs: Optional[Iterable[float]]) -> float:
    """Return the average; 0.0 if xs is None/empty."""
    xs = list(x for x in (xs or []) if x is not None)
    return mean(xs) if xs else 0.0
'''}
]
len(train_pairs)

### 🔤 Step B3 — Tokenization & Formatting
We’ll build a simple prompt template: `Instruction:\n...\nResponse:\n...` and train the model to produce the **response** when given the **instruction** prefix.

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import Dataset

base_model = 'distilgpt2'
tok = AutoTokenizer.from_pretrained(base_model)
# GPT-2 tokenizers ship without a pad token; reuse EOS for padding.
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

def build_text(rec):
    # Append EOS so the model sees where a response ends.
    return f"Instruction:\n{rec['instruction']}\n\nResponse:\n{rec['response']}{tok.eos_token}"

ds = Dataset.from_list([{"text": build_text(x)} for x in train_pairs])

def tok_fn(batch):
    out = tok(batch['text'], truncation=True, padding='max_length', max_length=512)
    # Exclude padding positions from the loss (-100 is the ignore index).
    out['labels'] = [
        [t if m == 1 else -100 for t, m in zip(ids, mask)]
        for ids, mask in zip(out['input_ids'], out['attention_mask'])
    ]
    return out

ds_tok = ds.map(tok_fn, batched=True, remove_columns=['text'])
ds_tok
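
The cell above computes the loss over the whole sequence, instruction included. A common variant, shown here only as a sketch and not as this lab's method, trains on the response tokens alone by setting the prompt positions in `labels` to `-100`, the default ignore index of PyTorch's cross-entropy loss. The helper `mask_prompt_labels` and the toy token ids are hypothetical.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

def mask_prompt_labels(prompt_ids, response_ids, max_length, pad_id):
    """Build padded input_ids, attention_mask, and labels that only score the response."""
    input_ids = (prompt_ids + response_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_length]
    n_pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * n_pad,
        "attention_mask": [1] * len(input_ids) + [0] * n_pad,
        "labels": labels + [IGNORE_INDEX] * n_pad,  # padding is ignored too
    }

# Toy ids: three prompt tokens, two response tokens, padded to length 8.
example = mask_prompt_labels([11, 12, 13], [21, 22], max_length=8, pad_id=0)
print(example["labels"])  # → [-100, -100, -100, 21, 22, -100, -100, -100]
```

With real data, `prompt_ids` would come from tokenizing the `Instruction:` prefix and `response_ids` from tokenizing the response text separately.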

### 🧩 Step B4 — LoRA Configuration & Trainer
We apply LoRA adapters to the attention projection modules and train for a couple of epochs (very quick).

In [None]:
from peft import LoraConfig, get_peft_model, TaskType
from transformers import TrainingArguments, Trainer

model = AutoModelForCausalLM.from_pretrained(base_model).to(device)

peft_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8, lora_alpha=16, lora_dropout=0.05,
    # GPT-2 style models store attention weights in Conv1D layers (transposed layout).
    fan_in_fan_out=True,
    # Target the fused QKV projection; include q_attn only if the model has it.
    target_modules=['c_attn', 'q_attn'] if any('q_attn' in n for n,_ in model.named_modules()) else ['c_attn']
)
model = get_peft_model(model, peft_cfg)
model.print_trainable_parameters()

args = TrainingArguments(
    output_dir='out-lora',
    per_device_train_batch_size=2,
    num_train_epochs=2,
    learning_rate=2e-4,
    fp16=torch.cuda.is_available(),
    gradient_accumulation_steps=2,
    logging_steps=5,
    save_strategy='no'
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds_tok
)
trainer.train()
# Only the adapter weights are saved here, not the full base model.
model.to('cpu').save_pretrained('out-lora/model')
tok.save_pretrained('out-lora/tokenizer')
print('✅ Saved LoRA adapter and tokenizer to out-lora/')
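
To see why LoRA is cheap, it helps to count the parameters it adds. For a weight matrix of shape `d_in × d_out`, rank-`r` LoRA trains two small matrices with `r * (d_in + d_out)` parameters in total. The sketch below applies that formula under assumed `distilgpt2` dimensions (hidden size 768, a fused `c_attn` projection of 768 → 2304, and 6 transformer blocks); check these against `model.config` and the output of `print_trainable_parameters()` rather than relying on them.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed distilgpt2 dimensions (not read from the model here).
HIDDEN, QKV_OUT, N_LAYERS = 768, 2304, 6
R, LORA_ALPHA = 8, 16  # same values as the LoraConfig above

per_layer = lora_param_count(HIDDEN, QKV_OUT, R)
total = N_LAYERS * per_layer
full = N_LAYERS * HIDDEN * QKV_OUT  # weights in the targeted c_attn matrices

print('LoRA params per layer:', per_layer)        # → 24576
print('LoRA params total:', total)                # → 147456
print('Fraction of targeted weights:', round(total / full, 4))
print('Update scaling (alpha / r):', LORA_ALPHA / R)  # → 2.0
```

Doubling `r` doubles the adapter size, so rank is the main knob trading capacity against memory and training time.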

### 🧪 Step B5 — Inference with the LoRA-tuned Model
Generate responses in the new **Python-tutor** style.

In [None]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Reload the base model and attach the saved LoRA adapter.
tok = AutoTokenizer.from_pretrained('out-lora/tokenizer')
base = AutoModelForCausalLM.from_pretrained(base_model)
model = PeftModel.from_pretrained(base, 'out-lora/model').to(device)

def generate(prompt, max_new_tokens=180, temperature=0.2):
    # Use the same template as in training, ending at the response prefix.
    text = f"Instruction:\n{prompt}\n\nResponse:\n"
    ids = tok(text, return_tensors='pt').to(device)
    out = model.generate(**ids, max_new_tokens=max_new_tokens, do_sample=True, temperature=temperature, pad_token_id=tok.eos_token_id)
    # Keep only the text generated after the response prefix.
    return tok.decode(out[0], skip_special_tokens=True).split('Response:\n', 1)[-1]

print(generate('Write factorial(n) with docstring and type hints.'))
print('\n---\n')
print(generate('Explain Python context managers with a tiny example.'))
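
The `temperature=0.2` setting above makes sampling close to greedy. As a rough illustration of why, the sketch below applies a temperature-scaled softmax to made-up logits in plain Python; it illustrates the general technique and is not the code path `generate` uses internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

At `T=1.0` the lower-scoring tokens keep a meaningful share of the probability mass, while at `T=0.2` nearly all of it moves to the top token, which is why low temperatures give more repeatable output.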

---
## ✅ Wrap-Up & Next Steps
You built two **fine-tuning** paths:
- **OpenAI SFT**: Upload JSONL → fine-tune → use tuned model.
- **Local LoRA**: Train lightweight adapters on `distilgpt2` and generate tutor-style code.

### What to try next
1. Expand your dataset to 100–1000+ high-quality examples (consistent tone & formatting).
2. Add **evals** (prompted unit tests, BLEU/ROUGE for style tasks) to measure gains.
3. Mix with **RAG** for factual grounding + fine-tuned style.
4. For LoRA: try bigger backbones (e.g., `gpt2`, `TinyLlama`), more steps, and better prompt templates.
5. For OpenAI: add **validation_file**, **suffix**, and **hyperparameters**; track results and iterate.
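
As a starting point for the evals in item 2, a cheap structural check can score whether generated code follows the target style. The sketch below uses the standard-library `ast` module to count functions that have a docstring and full type hints; `style_report` and its metric names are assumptions for illustration, and this checks form only, not correctness.

```python
import ast

def style_report(source: str) -> dict:
    """Summarize how many functions in `source` have docstrings and full type hints."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return {"parses": False, "functions": 0, "with_docstring": 0, "fully_typed": 0}

    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

    def fully_typed(fn: ast.FunctionDef) -> bool:
        # Require a return annotation and an annotation on every named argument.
        named_args = fn.args.posonlyargs + fn.args.args + fn.args.kwonlyargs
        return fn.returns is not None and all(a.annotation is not None for a in named_args)

    return {
        "parses": True,
        "functions": len(funcs),
        "with_docstring": sum(ast.get_docstring(f) is not None for f in funcs),
        "fully_typed": sum(fully_typed(f) for f in funcs),
    }

# Hypothetical model output used as a test input.
sample = '''
def factorial(n: int) -> int:
    """Return n! for n >= 0."""
    return 1 if n <= 1 else n * factorial(n - 1)
'''
print(style_report(sample))
# → {'parses': True, 'functions': 1, 'with_docstring': 1, 'fully_typed': 1}
```

Running the same report on outputs from the base model and the tuned model over a fixed prompt set gives a simple before/after comparison.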

**You’ve completed Lab 4. 🎉**