
# 🧠 Synthetic Fine‑Tuner — QLoRA on Mistral‑7B

This notebook demonstrates **LoRA/QLoRA** fine‑tuning of `mistralai/Mistral-7B-Instruct-v0.3` on a custom Q&A dataset about Transformers.

**Pipeline**
1. Load & clean the dataset  
2. Configure LoRA adapters (+ 4‑bit quantization)  
3. Fine‑tune with Hugging Face `Trainer`  
4. Monitor training loss live  
5. Evaluate (BLEU + semantic similarity)  
6. Quick inference sanity checks

> Tip: Run `Kernel → Restart & Run All` once dependencies are installed.


## ⚙️ Environment Setup

In [None]:
# %pip install -U transformers==4.57.1 peft==0.17.1 accelerate bitsandbytes
# %pip install -U datasets pandas sentencepiece tokenizers sacrebleu sentence-transformers tqdm wandb


In [1]:
import os, json, math
from pathlib import Path
import pandas as pd
from datasets import Dataset
import torch
import re
from typing import List, Dict
from PyPDF2 import PdfReader

from transformers import (
    AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig,
    TrainingArguments, Trainer
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training


  from .autonotebook import tqdm as notebook_tqdm
2025-10-20 00:30:01.060423: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-10-20 00:30:01.126375: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-10-20 00:30:17.055965: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


## 📁 Configuration

In [4]:

BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.3"         # or a local path if pre-downloaded
DATA_PATH  = "transformers_200Q.jsonl"  # <-- update if different
OUTPUT_DIR = "outputs/mistral7b-transformers-lora"
WANDB_PROJECT = None   # e.g., "transformers-qa-finetune" to enable W&B

# Training knobs
EPOCHS = 3
BATCH_SIZE = 2
GRAD_ACCUM = 16
LR = 1e-4
MAX_SEQ_LEN = 2048
LOG_STEPS = 1
EVAL_STEPS = 100

## \

In [None]:
def extract_text(pdf_path: str) -> str:
    """Extract full text from the PDF."""
    reader = PdfReader(pdf_path)
    text = []
    for page in reader.pages:
        txt = page.extract_text()
        if txt:
            text.append(txt)
    return "\n".join(text)

def extract_qa_pairs(text: str) -> List[Dict]:
    """
    Extract question-answer pairs of the form:
    Question N:\n<question>\n<answer>
    """
    # Normalize spacing
    text = re.sub(r'\s+', ' ', text)
    # Match patterns like "Question 12: What is ..." capturing Q & A
    pattern = re.compile(r'(Question\s*\d+\s*:\s*)([^?]+?\?)\s*(.*?)(?=Question\s*\d+\s*:|$)', re.IGNORECASE)
    pairs = []
    for i, match in enumerate(pattern.finditer(text), 1):
        question = match.group(2).strip()
        answer = match.group(3).strip()
        if len(answer) > 15:  # simple filter to skip broken lines
            pairs.append({"id": i, "question": question, "answer": answer})
    return pairs

def save_jsonl(pairs: List[Dict], out_path: str):
    """Write extracted pairs to JSONL."""
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        for p in pairs:
            f.write(json.dumps(p, ensure_ascii=False) + "\n")

# if __name__ == "__main__":
pdf_path = "transformers_200Qs_v11.pdf"  #  file path
out_path = "./transformers_200Q.jsonl"

text = extract_text(pdf_path)
qa_pairs = extract_qa_pairs(text)
# save_jsonl(qa_pairs, out_path)

print(f"✅ Extracted {len(qa_pairs)}")

## 🧹 Load & Preview Dataset

In [5]:

# Load JSONL with fields: id (opt), question, answer
with open(DATA_PATH, "r", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f if line.strip()]

print(f"Loaded {len(rows)} rows")
pd.DataFrame(rows[:5])


Loaded 200 rows


Unnamed: 0,id,question,answer
0,1,What is a neural network in the context of art...,A neural network is a computational model insp...
1,2,What is deep learning and how is it different ...,Deep learning is a subfield of machine learnin...
2,3,Why are neural networks considered universal f...,Neural networks are capable of approximating a...
3,4,What are activation functions and why are they...,Activation functions introduce non-linearity i...
4,5,What is backpropagation and how does it enable...,Backpropagation is an algorithm used to train ...


## 🧩 Tokenizer & Model (QLoRA config)

In [6]:

# Tokenizer
tok = AutoTokenizer.from_pretrained(BASE_MODEL)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# 4-bit quantization via bitsandbytes (QLoRA base)
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    quantization_config=bnb,
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"]
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()


`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|██████████| 3/3 [00:28<00:00,  9.64s/it]


trainable params: 41,943,040 || all params: 7,289,966,592 || trainable%: 0.5754


## 🔧 Data Collator & Train/Test Split

In [7]:

# Convert to HF datasets and split
ds = Dataset.from_pandas(pd.DataFrame(rows)).train_test_split(test_size=0.2, seed=42)

# Keep only needed columns for collator
for split in ["train","test"]:
    keep = {"question","answer"}
    drop = [c for c in ds[split].column_names if c not in keep]
    ds[split] = ds[split].remove_columns(drop)

def collate(batch):
    qs  = [b["question"] for b in batch]
    ans = [b["answer"]   for b in batch]
    texts = [f"<s>[INST] {q} [/INST] {a}" for q,a in zip(qs, ans)]
    toks = tok(texts, truncation=True, padding=True, max_length=MAX_SEQ_LEN, return_tensors="pt")
    toks["labels"] = toks["input_ids"].clone()
    toks["labels"][toks["labels"] == tok.pad_token_id] = -100
    return toks


## 🔁 Training (with live loss logging)

In [9]:

report_to = ["wandb"] if WANDB_PROJECT else []
if WANDB_PROJECT:
    import wandb
    wandb.init(project=WANDB_PROJECT, name=Path(OUTPUT_DIR).name, config={
        "epochs": EPOCHS, "batch_size": BATCH_SIZE, "grad_accum": GRAD_ACCUM,
        "lr": LR, "max_seq_len": MAX_SEQ_LEN
    })

args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUM,
    learning_rate=LR,
    bf16=True,
    logging_strategy="steps",
    logging_steps=LOG_STEPS,
    # evaluation_strategy="steps",
    eval_steps=EVAL_STEPS,
    save_steps=EVAL_STEPS,
    save_total_limit=1,
    gradient_checkpointing=True,
    report_to=report_to,
    remove_unused_columns=False,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
    data_collator=collate,
    tokenizer=tok,
)

train_output = trainer.train()
print("✅ Training complete →", OUTPUT_DIR)
print("Final train loss:", train_output.training_loss)


  trainer = Trainer(
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 2}.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Step,Training Loss
1,3.7301
2,2.8907
3,2.2747
4,2.3135
5,2.2976
6,2.0768
7,1.931
8,1.9506
9,1.8797
10,1.6266


✅ Training complete → outputs/mistral7b-transformers-lora
Final train loss: 2.07542044321696


## 📉 Evaluation (Perplexity, BLEU, Semantic Similarity)

In [11]:

# Optional extra metrics
# %pip install -U sacrebleu sentence-transformers tqdm

import sacrebleu
from tqdm import tqdm
from sentence_transformers import SentenceTransformer, util

def generate_answer(question: str, max_new_tokens: int = 128) -> str:
    # Safer: use chat template to avoid prompt echo
    messages = [{"role": "user", "content": question}]
    input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    input_len = input_ids.shape[-1]
    with torch.inference_mode():
        out = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=False, temperature=0.0,
            eos_token_id=tok.eos_token_id, pad_token_id=tok.eos_token_id,
            return_dict_in_generate=True
        )
    gen_ids = out.sequences[:, input_len:]
    return tok.decode(gen_ids[0], skip_special_tokens=True).strip()

# Quick perplexity on eval split
import math
eval_metrics = trainer.evaluate()
ppl = math.exp(eval_metrics["eval_loss"])
print(f"Eval loss: {eval_metrics['eval_loss']:.4f} | Perplexity: {ppl:.3f}")

# BLEU + cosine on a capped subset for speed
CAP = 10  # e.g., 100
subset = ds["test"].select(range(min(len(ds["test"]), CAP))) if CAP else ds["test"]

sbert = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="cuda" if torch.cuda.is_available() else "cpu")
records = []
for ex in tqdm(subset, desc="Evaluating"):
    q, ref = ex["question"].strip(), ex["answer"].strip()
    hyp = generate_answer(q, max_new_tokens=96)
    bleu = sacrebleu.sentence_bleu(hyp, [ref]).score
    emb = sbert.encode([hyp, ref], convert_to_tensor=True, normalize_embeddings=True)
    cos = float(util.cos_sim(emb[0], emb[1]).item())
    records.append({"question": q, "reference": ref, "hypothesis": hyp, "bleu": bleu, "cosine": cos})

df = pd.DataFrame.from_records(records)
Path(OUTPUT_DIR).mkdir(parents=True, exist_ok=True)
df.to_csv(Path(OUTPUT_DIR) / "eval_results.csv", index=False)
print("Saved →", Path(OUTPUT_DIR) / "eval_results.csv")
print("BLEU mean:", df["bleu"].mean() if not df.empty else "n/a")
print("Cosine mean:", df["cosine"].mean() if not df.empty else "n/a")
df.head(5)


Eval loss: 2.0016 | Perplexity: 7.401


Evaluating: 100%|██████████| 10/10 [01:28<00:00,  8.87s/it]

Saved → outputs/mistral7b-transformers-lora/eval_results.csv
BLEU mean: 7.763581669168518
Cosine mean: 0.7954990267753601





Unnamed: 0,question,reference,hypothesis,bleu,cosine
0,What is TensorFlow Lite?,TensorFlow Lite is a lightweight version of Te...,TensorFlow Lite is a framework for deploying m...,7.45287,0.877083
1,What is the encoder-decoder structure in trans...,The encoder-decoder structure consists of two ...,Encoder-decoder structures are used for tasks ...,7.314799,0.748851
2,What is attention dropout?,Attention dropout is applied to the attention ...,Attention dropout randomly masks attention wei...,3.489212,0.780709
3,How are image patches encoded in Vision Transf...,"In ViT, an image is divided into non-overlappi...",Vision Transformers (ViTs) encode image patche...,5.582553,0.705827
4,How are embeddings used in transformers?,Transformers use learned embeddings to represe...,Embeddings are used to represent input tokens ...,6.086276,0.727321


## 💬 Example Inference

In [17]:

question = "What is the exploding gradient problem?"
print(generate_answer(question, max_new_tokens=56))


The exploding gradient problem occurs when gradients grow exponentially during backpropagation, causing instability in training. This is often seen in transformer models with large vocabularies or deep architectures. To mitigate this, techniques like gradient clipping or layer normalization



## 📊 Results Summary (Fill after running)
- **Training loss (final): *2.07* ``  
- **Eval loss: *2.0016* ``  
- **Perplexity:** `7.401`  
- **BLEU mean:** `7.763`  
- **Cosine mean:** `0.795`  

**Key Takeaways**
- QLoRA enables efficient 7B fine‑tuning on modest GPUs via 4‑bit quantization + LoRA adapters
- Trainer logs training loss every few steps; `eval_results.csv` contains per‑sample metrics
- Use the chat template for clean decoding without prompt echo


## 💾 Save Adapters & Tokenizer

In [18]:

model.save_pretrained(OUTPUT_DIR)
tok.save_pretrained(OUTPUT_DIR)
print("Saved LoRA adapters & tokenizer →", OUTPUT_DIR)


Saved LoRA adapters & tokenizer → outputs/mistral7b-transformers-lora
