<a href="https://colab.research.google.com/github/ZombieSwan/qlora-mistral-finetune/blob/main/qlora-mistral-alpaca.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install transformers peft datasets bitsandbytes accelerate --quiet


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, TaskType

model_name = "mistralai/Mistral-7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16"
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
tokenizer.pad_token = tokenizer.eos_token

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()


In [None]:
!wget https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json


In [None]:
from datasets import Dataset
import json

with open("alpaca_data.json", "r") as f:
    data = json.load(f)

dataset = Dataset.from_list(data)


In [None]:
def format_instruction(example):
    if example["input"]:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return {"text": prompt + example["output"]}

dataset = dataset.map(format_instruction)


In [None]:
def tokenize(example):
    return tokenizer(example["text"], padding="max_length", truncation=True, max_length=512)

tokenized_dataset = dataset.map(tokenize)


## So far, we have:

✅ Downloaded and loaded the Alpaca dataset

✅ Formatted it as instruction → response

✅ Tokenized it for the model

## Set Up the Data Collator
The collator prepares batches for causal language modeling: the model learns by predicting the next token, not by filling in [MASK] tokens.

In [None]:
from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False  # Causal language modeling = predict next token
)
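
To make `mlm=False` concrete, here is a pure-Python sketch of what the collator does to one padded example, as far as I understand its behavior: the labels are a copy of `input_ids`, with padding positions replaced by -100 so the loss skips them. The token IDs and the helper name `make_causal_lm_labels` are made up for illustration, and padding is shown on the right only for readability. The one-position shift between inputs and targets happens inside the model, not in the collator.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def make_causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids to labels, masking out padding positions."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical token IDs; 2 plays the role of the pad token here.
pad_token_id = 2
input_ids = [1, 733, 16289, 28793, 2, 2, 2]

labels = make_causal_lm_labels(input_ids, pad_token_id)
print(labels)
```

One consequence worth keeping in mind: this notebook sets `pad_token = eos_token`, so under this masking rule any end-of-sequence token in the data would be ignored by the loss as well.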


## Define TrainingArguments

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./mistral-qlora-instruct",
    per_device_train_batch_size=2,            # If you're on Colab T4
    gradient_accumulation_steps=4,            # Simulates larger batch
    learning_rate=2e-4,
    num_train_epochs=1,                       # Start with 1 for testing
    fp16=True,                                # Mixed-precision (fp16) training on the GPU
    logging_steps=10,
    save_steps=50,
    save_total_limit=2,
    report_to="none"                          # No WANDB, simple logs
)
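
A quick back-of-the-envelope check of what these settings mean in practice. This sketch assumes a single GPU and assumes the Alpaca file holds 52,002 examples (check `len(dataset)` to confirm); the exact step count the `Trainer` reports can differ by one depending on how it rounds.

```python
import math

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1           # assumption: one Colab GPU
num_examples = 52_002  # assumption: size of alpaca_data.json

# Examples that contribute to each optimizer update
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)

# Approximate number of optimizer updates in one epoch
steps_per_epoch = math.ceil(num_examples / effective_batch_size)

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps per epoch: ~{steps_per_epoch}")
```

With `logging_steps=10` and `save_steps=50`, that works out to a log line every 80 examples and a checkpoint every 400 examples.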


## Set Up the Trainer
This combines the model, dataset, and training settings into one object that manages training.

In [None]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator
)


## Start Fine-Tuning

In [None]:
trainer.train()


Step,Training Loss
10,1.4737
20,1.3069
30,1.2391
40,1.2328
50,1.0526
60,1.0484
70,1.1839
80,1.1218
90,1.1752
100,1.215
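
One way to read these numbers: assuming the logged value is the mean per-token cross-entropy in nats (which is what causal LM training in `transformers` normally reports), `exp(loss)` gives the perplexity. The sketch below just reuses the values from the log above.

```python
import math

# (step, training loss) pairs copied from the log above
log = [
    (10, 1.4737), (20, 1.3069), (30, 1.2391), (40, 1.2328), (50, 1.0526),
    (60, 1.0484), (70, 1.1839), (80, 1.1218), (90, 1.1752), (100, 1.215),
]

for step, loss in log:
    print(f"step {step:>3}: loss={loss:.4f}  perplexity={math.exp(loss):.2f}")

# Step with the lowest logged loss
best_step, best_loss = min(log, key=lambda pair: pair[1])
first_perplexity = math.exp(log[0][1])
print(f"lowest loss so far: {best_loss} at step {best_step}")
```

The loss is noisy from step to step because each logged value averages only a handful of small batches, so the overall trend matters more than any single line.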


## Save LoRA-Tuned Model
This saves only the LoRA adapter weights (plus the tokenizer files), not the 7B base model.

In [None]:
model.save_pretrained("./mistral-lora-instruct")
tokenizer.save_pretrained("./mistral-lora-instruct")


## Reuse It Later
Load the base model again and attach the saved LoRA adapter to use your instruction-tuned model:

In [None]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,  # BitsAndBytesConfig from the loading cell; re-create it in a fresh session
    device_map="auto"
)

model = PeftModel.from_pretrained(base_model, "./mistral-lora-instruct")
tokenizer = AutoTokenizer.from_pretrained("./mistral-lora-instruct")


In [None]:
inputs = tokenizer("### Instruction:\nTell me a joke.\n\n### Response:\n", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))


✅ What we did

🔹 Step 1: Loaded a Big Pretrained Model (Mistral 7B)
You loaded Mistral:

With 4-bit (NF4) quantization, which is very memory-efficient

On a free Colab GPU
🔧 This keeps most of the large model's ability while making it light enough to fit in a small GPU's memory.
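
To see why 4-bit loading matters, here is a rough estimate of the memory taken by the weights alone. It assumes about 7.24 billion parameters for Mistral 7B and ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so real usage will be higher.

```python
NUM_PARAMS = 7.24e9  # assumption: approximate parameter count of Mistral 7B


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{nf4_gb:.1f} GB")
```

The 4-bit figure is a quarter of the fp16 one, which is the difference between not fitting and fitting on a small GPU.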

🔹 Step 2: Added LoRA Adapters
Instead of changing the huge model, you added small "notes" (LoRA adapters) that:

Make small, low-rank adjustments to selected layers

Leave the original weights frozen and train only a few new ones
📌 It is like writing sticky notes on a textbook instead of rewriting the whole book.
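
How few is "a few"? The sketch below counts the adapter weights by hand. It rests on assumptions you should verify against `model.config` and the output of `model.print_trainable_parameters()`: the layer shapes are the ones I believe Mistral 7B uses, and the adapters are assumed to sit on `q_proj` and `v_proj` only, since the `LoraConfig` in this notebook does not set `target_modules` and leaves the choice to PEFT's default.

```python
# Assumed Mistral 7B shapes; verify with model.config
hidden_size = 4096
num_layers = 32
kv_dim = 1024  # 8 key/value heads x 128 dims (grouped-query attention)
r = 8          # LoRA rank from the notebook's LoraConfig


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds two matrices: A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


# Assumption: adapters on q_proj and v_proj only
per_layer = (
    lora_params(hidden_size, hidden_size, r)  # q_proj: 4096 -> 4096
    + lora_params(hidden_size, kv_dim, r)     # v_proj: 4096 -> 1024
)
total = per_layer * num_layers

print(f"trainable LoRA parameters: {total:,}")
print(f"share of a ~7.24B model: {total / 7.24e9:.4%}")
```

Under these assumptions only a few million weights are trained, a tiny fraction of the base model.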

🔹 Step 3: Loaded the Alpaca Dataset
You downloaded instruction examples like:

```json
{
  "instruction": "Describe a cat.",
  "input": "",
  "output": "A cat is a small furry animal often kept as a pet."
}
```
📌 These teach the model how to follow commands.

🔹 Step 4: Formatted the Data for Instruction Tuning
You turned each row into a prompt like:

```text
### Instruction:
Describe a cat.

### Response:
A cat is a small furry animal...
```
📌 This teaches the model to reply like a chatbot.
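
Here is a standalone copy of the notebook's `format_instruction` that you can run without the dataset, to see both prompt shapes. The second record (`with_input`) is a made-up example, not taken from Alpaca.

```python
def format_instruction(example):
    """Build the Alpaca-style training text for one record."""
    if example["input"]:
        prompt = (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:\n"
        )
    else:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return {"text": prompt + example["output"]}


no_input = {
    "instruction": "Describe a cat.",
    "input": "",
    "output": "A cat is a small furry animal often kept as a pet.",
}
# Hypothetical record with a non-empty "input" field
with_input = {
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}

print(format_instruction(no_input)["text"])
print("-" * 20)
print(format_instruction(with_input)["text"])
```

Records with an empty `input` skip the `### Input:` section entirely, so at inference time the prompt should follow the same rule.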


🔹 Step 5: Tokenized the Text
You converted the text into token IDs (numbers) the model understands, padding or truncating each example to 512 tokens.

📌 This is like translating human words into computer language.
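
To show what `padding="max_length"` and `truncation=True` do, here is a toy tokenizer in plain Python. It is only an illustration: it splits on whitespace and uses a made-up vocabulary, whereas the real Mistral tokenizer works on subword pieces. It also pads on the right, while a real tokenizer pads on whichever side its `padding_side` setting says.

```python
def toy_tokenize(text, vocab, max_length, pad_id=0, unk_id=1):
    """Whitespace 'tokenizer' that pads or truncates to max_length."""
    ids = [vocab.get(word, unk_id) for word in text.split()]
    ids = ids[:max_length]           # truncation: drop anything past max_length
    attention_mask = [1] * len(ids)  # 1 = real token
    padding = max_length - len(ids)  # padding: fill up to max_length
    return {
        "input_ids": ids + [pad_id] * padding,
        "attention_mask": attention_mask + [0] * padding,  # 0 = padding
    }


vocab = {"Describe": 2, "a": 3, "cat.": 4}  # made-up vocabulary

short = toy_tokenize("Describe a cat.", vocab, max_length=6)
long = toy_tokenize("Describe a cat. Describe a cat.", vocab, max_length=4)

print(short)
print(long)
```

Every example comes out with the same length, which is what lets them be stacked into a batch.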


🔹 Step 6: Prepared the Training Settings
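You told the trainer to:

Use small batches (2 examples at a time, accumulated over 4 steps)

Use 1–3 training loops (epochs); this notebook starts with 1

Print progress every 10 steps

Save a checkpoint every 50 steps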

📌 This is like setting the rules for a classroom session.

🔹 Step 7: Started Fine-Tuning!
You used:

```python
trainer.train()
```
This:

Showed your model the formatted Alpaca examples in small batches

Let it learn to follow instructions

Updated only the LoRA adapter weights (tiny weight updates)

🎉 Your model learned to follow Alpaca-style instructions, much like a chat assistant, using the Alpaca data.

🔹 Step 8: Saved the Fine-Tuned Model
You ran:

```python
model.save_pretrained("mistral-lora-instruct")
```

📌 Now you can reload your fine-tuned assistant later without retraining.