# 🧡 Track B – Unsloth Instruction Tuning (Memory Optimized)

Fine-tune `Qwen/Qwen2.5-Coder-1.5B-Instruct` (or 7B!) on the synthetic Python coding dataset using **Unsloth** for memory efficiency and speed on a free Colab T4 GPU.

### Why Unsloth?
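- **Lower memory use**: 4-bit quantization shrinks the model weights to roughly a quarter of their fp16 size with minimal accuracy loss, which makes out-of-memory errors on a 16 GB T4 far less likely.
- **Faster Training**: Up to 2x speedup compared to standard HF+PEFT.
- **More headroom**: The memory saved can go toward longer context lengths or larger batch sizes on a T4.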

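To make the memory argument concrete, here is a rough back-of-the-envelope sketch of weight storage alone. The parameter counts are nominal sizes assumed from the model names, and the estimate ignores quantization constants, LoRA adapters, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

# Nominal parameter counts (assumed from the model names, not measured).
models = {"1.5B": 1.5e9, "7B": 7e9}

for name, n_params in models.items():
    fp16 = weight_memory_gib(n_params, 16)
    nf4 = weight_memory_gib(n_params, 4)
    print(f"{name}: fp16 ~ {fp16:.2f} GiB, 4-bit ~ {nf4:.2f} GiB")
```
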
---

## 📦 Dataset: [`archit11/track_b_sft`](https://huggingface.co/datasets/archit11/track_b_sft)

| Field | Detail |
|-------|--------|
| **Source** | [verl](https://github.com/volcengine/verl) Python library (AST-extracted functions) |
| **Train** | 514 examples |
| **Test** | 46 examples |
| **Pair format** | `instruction` → `response` (Python code) |
| **License** | Apache 2.0 |

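The pair format can be checked with a small validator before training. This is a minimal sketch: the `validate_record` helper and the sample record are hypothetical and assume only the two field names listed in the table.

```python
def validate_record(record: dict) -> bool:
    """Return True if `instruction` and `response` are non-empty strings."""
    for field in ("instruction", "response"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return True

# Hypothetical record, invented for illustration (not taken from the dataset).
sample = {
    "instruction": "Explain what this function does:\n\ndef add(a, b):\n    return a + b",
    "response": "It returns the sum of its two arguments.",
}

print(validate_record(sample))
print(validate_record({"instruction": "Explain this function.", "response": ""}))
```
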
### Data Card Summary

| Category | Description | Count (Train) |
|----------|-------------|---------------|
| `docstring` | Write Google-style docstrings | ~100 |
| `explain` | Explain function logic | ~100 |
| `bugfix` | Fix injected bugs (off-by-one, logic errors) | ~100 |
| `complete` | Complete function bodies | ~100 |
| `unit_test` | Write pytest unit tests | ~100 |

---

## 🤖 Model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (4-bit)

| Field | Detail |
|-------|--------|
| **Base** | `Qwen/Qwen2.5-Coder-1.5B-Instruct` |
| **Method** | LoRA + 4-bit Quantization (QLoRA) |
| **Quantization** | 4-bit NF4 |
| **Gradient Checkpointing** | Enabled (Unsloth optimized) |
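| **Target Modules** | All attention and MLP projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) |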
| **LoRA Rank** | 16 |

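As a rough check on how small the trainable part is, the number of LoRA weights can be estimated from the rank: a rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The layer shapes below are assumptions based on the published Qwen2.5 1.5B configuration (hidden size 1536, MLP size 8960, 28 layers, 2 key/value heads of dimension 128); it is worth verifying them against `model.config` before relying on the number.

```python
# Assumed shapes for Qwen2.5-Coder-1.5B (verify against model.config).
HIDDEN = 1536        # hidden_size
INTERMEDIATE = 8960  # intermediate_size of the MLP
KV_DIM = 2 * 128     # num_key_value_heads * head_dim
NUM_LAYERS = 28      # num_hidden_layers
RANK = 16            # LoRA rank

# (d_in, d_out) for each targeted projection in one transformer block.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in PROJECTIONS.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```
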
> ⚡ **Make sure Runtime → Change runtime type → T4 GPU is selected before running.**

In [None]:
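%%capture
# Cell 1 – Install Unsloth (Robust Setup)
# Note: %%capture must be the first line of the cell. It hides all output, including the final print.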
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2

print("✓ Unsloth installed")
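
The version handling in the Colab branch can be tried in isolation. This sketch wraps the same regular expression and version table in a hypothetical helper, `pick_xformers`, so the mapping can be checked without installing anything. The sample version strings are illustrative.

```python
import re

# Same torch -> xformers table as in the install cell.
XFORMERS_PINS = {"2.10": "0.0.34", "2.9": "0.0.33.post1", "2.8": "0.0.32.post2"}

def pick_xformers(torch_version: str, default: str = "0.0.34") -> str:
    """Map a full torch version string to an xformers requirement string."""
    major_minor = re.match(r"\d+\.\d+", torch_version).group(0)
    return "xformers==" + XFORMERS_PINS.get(major_minor, default)

print(pick_xformers("2.8.0+cu126"))  # known version: uses the pinned release
print(pick_xformers("2.10.0"))       # two-digit minor version is matched whole
print(pick_xformers("2.7.1"))        # unknown version: falls back to the default
```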

In [None]:
# Cell 2 – Load Model & Tokenizer
from unsloth import FastLanguageModel
import torch

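max_seq_length = 2048  # Maximum tokens per training example
dtype = None           # None = auto-detect (float16 on T4, bfloat16 on Ampere or newer)
load_in_4bit = True    # Load the base weights in 4-bit to cut memory use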

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2.5-Coder-1.5B-Instruct",  # Can swap to "Qwen/Qwen2.5-Coder-7B-Instruct" if you feel adventurous!
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
)


In [None]:
# Cell 3 – Format Dataset
from datasets import load_dataset

dataset = load_dataset("archit11/track_b_sft", split = "train")

# Attach a ChatML chat template; tokenizer.apply_chat_template uses it below
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "chatml", # Qwen uses a ChatML-style format
    # No ShareGPT mapping here: the conversations below already use
    # "role"/"content" keys with "user"/"assistant" roles.
)

def formatting_prompts_func(examples):
    convos = []
    for i in range(len(examples["instruction"])):
        # Construct conversation
        c = [
            {"role": "user", "content": examples["instruction"][i]},
            {"role": "assistant", "content": examples["response"][i]},
        ]
        # Apply template
        text = tokenizer.apply_chat_template(c, tokenize=False, add_generation_prompt=False)
        convos.append(text)
    return { "text" : convos }

dataset = dataset.map(formatting_prompts_func, batched = True)
print(dataset["text"][0])
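
For intuition about what the formatted `text` column contains, the ChatML layout can be reproduced by hand. This is a simplified sketch, not the tokenizer's actual template: `render_chatml` is a hypothetical helper, and the real template may add details such as a default system prompt.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Render role/content messages in the ChatML layout."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        text += "<|im_start|>assistant\n"
    return text

# Illustrative conversation, not taken from the dataset.
conversation = [
    {"role": "user", "content": "Explain what this function does."},
    {"role": "assistant", "content": "It returns the sum of two numbers."},
]
print(render_chatml(conversation))
```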

In [None]:
# Cell 4 – Train
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Set to True to pack short examples together for speed; not needed for a dataset this small
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer.train()
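
The training arguments imply a small number of optimizer steps. The sketch below works through the arithmetic for the 514 training examples and mirrors the shape of a linear schedule with warmup. The helper names are hypothetical, and the trainer's exact step count can differ slightly depending on how it rounds the final partial accumulation.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_devices

def linear_lr(step: int, total_steps: int, warmup_steps: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return peak_lr * remaining

batch = effective_batch_size(per_device=2, grad_accum=4)
steps = math.ceil(514 / batch)  # approximate optimizer steps for one epoch
print(f"effective batch size: {batch}, approx. steps per epoch: {steps}")

# Learning rate at the start, end of warmup, midpoint, and final step.
for step in (0, 5, steps // 2, steps):
    lr = linear_lr(step, steps, warmup_steps=5, peak_lr=2e-4)
    print(f"step {step:>2}: lr = {lr:.2e}")
```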

In [None]:
# Cell 5 – Inference
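FastLanguageModel.for_inference(model)  # Switch Unsloth to its faster inference mode

msg = [
    {"role": "user", "content": "Write a Python function to check if a number is prime."}
]
# add_generation_prompt=True opens the assistant turn so the model writes the reply
inputs = tokenizer.apply_chat_template(msg, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(input_ids = inputs, max_new_tokens = 256, use_cache = True)
# Decode only the newly generated tokens, without the prompt or special tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens = True))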