In [1]:

print("✅ STEP 1: Installing necessary libraries...")

# Install unsloth for efficient memory usage and faster training
# The "colab-new" extra selects the dependency set intended for current Colab runtimes
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

# Install other required libraries
!pip install "transformers>=4.41.0" "peft>=0.10.0" "accelerate>=0.30.0" "datasets>=2.18.0" "evaluate>=0.4.0" "sentence-transformers>=2.2.2" "rouge_score" --quiet
print("\n✅ Installation complete. Please restart the runtime if prompted.")
print("Go to Runtime -> Restart session in the menu bar.")

✅ STEP 1: Installing necessary libraries...
Collecting unsloth@ git+https://github.com/unslothai/unsloth.git (from unsloth[colab-new]@ git+https://github.com/unslothai/unsloth.git)
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-fjqt1rqr/unsloth_79b0551a8e794e798af3e995052195c6
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-fjqt1rqr/unsloth_79b0551a8e794e798af3e995052195c6
  Resolved https://github.com/unslothai/unsloth.git to commit dc26a7a0eb20c31549318396f53639ba8c01025e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

✅ Installation complete. Please restart the runtime if prompted.
Go to Runtime -> Restart session in the menu bar.
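The install step above pins only minimum versions, so it can help to record what was actually installed once the runtime has restarted. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Packages this notebook imports later on.
report = installed_versions(
    ["unsloth", "transformers", "peft", "accelerate", "datasets", "trl"]
)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```

Printing this next to the results makes it easier to reproduce a run later, since the `git+https` install of unsloth resolves to whatever commit is current at install time.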


In [2]:

print("✅ STEP 2: Loading the Qwen2-0.5B model and tokenizer...")

import torch
from unsloth import FastLanguageModel
import pandas as pd
from tqdm import tqdm

# --- Model Configuration ---
model_name = "Qwen/Qwen2-0.5B-Instruct"
max_seq_length = 2048
dtype = None # Unsloth will decide the best dtype
load_in_4bit = True

# Load the model and tokenizer using unsloth for memory efficiency
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

print("\n✅ Model and tokenizer loaded successfully.")
print(f"Model: {model_name}")
print("Parameters: ~0.5 Billion")

✅ STEP 2: Loading the Qwen2-0.5B model and tokenizer...
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.6: Fast Qwen2 patching. Transformers: 4.55.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

✅ Model and tokenizer loaded successfully.
Model: Qwen/Qwen2-0.5B-Instruct
Parameters: ~0.5 Billion
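To give a feel for why `load_in_4bit = True` matters on a 15 GB T4, here is a rough back-of-the-envelope sketch of weight storage at different precisions. It is only an approximation: it ignores quantization constants, layers that stay in higher precision, activations, and optimizer state. The parameter count is the total printed in the training banner in Step 5; the helper name is illustrative.

```python
def approx_weight_memory_gb(num_params, bits_per_param):
    """Rough size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Total parameter count as reported by the Unsloth training banner in Step 5.
NUM_PARAMS = 502_830_976

for bits in (16, 4):
    size = approx_weight_memory_gb(NUM_PARAMS, bits)
    print(f"{bits}-bit weights: ~{size:.2f} GB")
```

For a model this small either precision fits comfortably; the 4-bit setting becomes decisive with models in the multi-billion-parameter range.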


In [3]:

print("✅ STEP 3: Loading and preparing the Alpaca dataset...")

from datasets import load_dataset

# --- Prompt Formatting ---
# This template structures the data for the model to learn instruction-following
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Append EOS_TOKEN so the model learns where a response ends;
        # without it, generation tends to run until max_new_tokens.
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# --- Load and Prepare Dataset ---
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Split the dataset (80% train, 20% test)
dataset = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = dataset["train"]
test_dataset = dataset["test"]

print("\n✅ Dataset prepared and split.")
print(f"Training samples: {len(train_dataset)}")
print(f"Testing samples: {len(test_dataset)}")
print("\nExample prompt:")
print(train_dataset[0]['text'])

✅ STEP 3: Loading and preparing the Alpaca dataset...


Map:   0%|          | 0/51760 [00:00<?, ? examples/s]


✅ Dataset prepared and split.
Training samples: 41408
Testing samples: 10352

Example prompt:
### Instruction:
Convert the decimal number 0.425 into a fraction.

### Input:


### Response:
To convert the decimal number 0.425 into a fraction, you can follow these steps:

1. Count the number of decimal places: The decimal 0.425 has 3 decimal places.
2. Write down the decimal as a fraction, using the place value of the last digit: Since the last digit is in the thousandth place, we can write 0.425 as 425/1000
3. Simplify the fraction if possible: Both the numerator (425) and the denominator (1000) can be divided by 25, resulting in (425 ÷ 25) / (1000 ÷ 25) = 17/40. 

So, the decimal number 0.425 can be expressed as the fraction 17/40.<|im_end|>
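The relationship between the training text and the prompt used later for generation can be sketched without loading a tokenizer. This is a simplified stand-in, not the notebook's own function: the EOS string is assumed to be the `<|im_end|>` visible at the end of the example above, and the toy record is made up for illustration.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS = "<|im_end|>"  # assumption: the EOS string shown in the example output above


def format_example(instruction, input_text, output, eos=EOS):
    """Fill the template and append the end-of-sequence marker (training text)."""
    return ALPACA_PROMPT.format(instruction, input_text, output) + eos


def inference_prompt(instruction, input_text):
    """Same template with an empty response slot and no EOS (generation prompt)."""
    return ALPACA_PROMPT.format(instruction, input_text, "")


# Hypothetical record, used only to show the two string shapes.
train_text = format_example("Add the numbers.", "2 and 3", "5")
prompt = inference_prompt("Add the numbers.", "2 and 3")
print(train_text)
```

The training text is exactly the inference prompt followed by the reference answer and the EOS marker, which is why the model can learn to stop after its response.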


In [4]:

print("✅ STEP 4: Configuring LoRA for parameter-efficient fine-tuning...")

# Prepare the model for LoRA training
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Rank of the update matrices. Higher rank means more parameters.
    lora_alpha = 32, # Scaling factor for the LoRA updates.
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_dropout = 0.05, # Unsloth's fast patching expects 0; a nonzero value triggers the warning in the output
    bias = "none",
    use_gradient_checkpointing = True,
    random_state = 42,
    max_seq_length = max_seq_length,
)

print("\n✅ LoRA configured and applied to the model.")

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.


✅ STEP 4: Configuring LoRA for parameter-efficient fine-tuning...


Unsloth 2025.8.6 patched 24 layers with 0 QKV layers, 0 O layers and 0 MLP layers.



✅ LoRA configured and applied to the model.
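The number of weights that LoRA adds can be estimated by hand: each adapted linear layer of shape `in x out` gains two small matrices with `r * (in + out)` weights in total. The sketch below applies that to the seven target modules with `r = 16`. The layer dimensions are assumptions about the Qwen2-0.5B architecture (they are not printed anywhere in this notebook); with them, the result agrees with the "Trainable parameters" figure in the Step 5 training banner.

```python
def lora_params(in_features, out_features, r):
    """A LoRA adapter adds A (r x in) and B (out x r): r * (in + out) weights."""
    return r * (in_features + out_features)


# Assumed Qwen2-0.5B dimensions (not shown in this notebook).
HIDDEN = 896
INTERMEDIATE = 4864
KV_DIM = 128        # assumed: 2 key/value heads with head size 64
NUM_LAYERS = 24
R = 16

# (in_features, out_features) for each targeted projection in one decoder layer.
per_layer_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(i, o, R) for i, o in per_layer_shapes.values())
total_trainable = per_layer * NUM_LAYERS
print(f"Trainable LoRA parameters: {total_trainable:,}")
```

Doubling `r` doubles this count. With `lora_alpha = 32` and `r = 16`, the LoRA update is scaled by `lora_alpha / r = 2`.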


In [5]:

print("✅ STEP 5: Starting the fine-tuning process...")

from transformers import TrainingArguments
from trl import SFTTrainer

# --- Training Configuration ---
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
        report_to = "none", # Disable Weights & Biases and other logging integrations
    ),
)

# --- Start Training ---
trainer_stats = trainer.train()

print("\n✅ Fine-tuning complete.")

# trainer.train() returns a TrainOutput; the runtime is stored in its metrics dict
print(f"Training Time: {trainer_stats.metrics['train_runtime']:.2f} seconds")

✅ STEP 5: Starting the fine-tuning process...


Unsloth: Tokenizing ["text"]:   0%|          | 0/41408 [00:00<?, ? examples/s]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 41,408 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 8,798,208 of 502,830,976 (1.75% trained)


Step    Training Loss
10      1.62
20      1.5338
30      1.5365
40      1.4533
50      1.4023
60      1.5542
70      1.4831
80      1.4669
90      1.4611
100     1.4747



✅ Fine-tuning complete.
Training Time: 221.27 seconds
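The banner reports "Num Epochs = 1", but `max_steps = 100` stops training long before a full pass over the data. A small sketch of the arithmetic, using only numbers printed above (the variable names are illustrative):

```python
# Values taken from the training configuration and the output above.
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
max_steps = 100
num_train_examples = 41_408
train_runtime_s = 221.27

effective_batch = per_device_batch * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_train_examples
throughput = examples_seen / train_runtime_s

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({epoch_fraction:.1%} of the training split)")
print(f"Throughput: {throughput:.1f} examples/s")
```

The run therefore touches only a small slice of the training split, which is worth keeping in mind when reading the evaluation numbers in Step 6.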


In [6]:

print("✅ STEP 6: Running inference on the test set...")

import evaluate
from sentence_transformers import SentenceTransformer, util

# --- Setup for Inference ---
# Switch the fine-tuned model to Unsloth's faster inference mode
FastLanguageModel.for_inference(model)

# Load a fresh copy of the base model to compare against
base_model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)
FastLanguageModel.for_inference(base_model) # Same inference mode for a fair comparison

# Define a function to generate responses
def get_model_response(model, tokenizer, prompt_text, max_new_tokens=100):
    # Re-format the prompt for inference
    instruction = prompt_text.split("### Instruction:\n")[1].split("\n\n### Input:")[0].strip()
    input_text = prompt_text.split("### Input:\n")[1].split("\n\n### Response:")[0].strip()

    inference_prompt = alpaca_prompt.format(instruction, input_text, "")

    inputs = tokenizer([inference_prompt], return_tensors = "pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache = True)
    response = tokenizer.batch_decode(outputs)[0]

    # Extract only the generated response part
    return response.split("### Response:\n")[1].replace(EOS_TOKEN, "").strip()


# --- Generate Responses ---
# We'll use a subset of the test data for faster evaluation
num_samples = 50
test_subset = test_dataset.select(range(num_samples))

ground_truths = []
base_model_responses = []
finetuned_model_responses = []

for sample in tqdm(test_subset, desc="Generating responses"):
    ground_truth_response = sample['text'].split("### Response:\n")[1].replace(EOS_TOKEN, "").strip()
    ground_truths.append(ground_truth_response)

    # Base model response
    base_response = get_model_response(base_model, tokenizer, sample['text'])
    base_model_responses.append(base_response)

    # Fine-tuned model response
    finetuned_response = get_model_response(model, tokenizer, sample['text'])
    finetuned_model_responses.append(finetuned_response)

print("\n✅ Inference complete for both models.")


# --- Calculate Metrics ---
print("\nCalculating evaluation metrics...")

# Load metrics
rouge = evaluate.load('rouge')
bleu = evaluate.load('bleu')
similarity_model = SentenceTransformer('all-MiniLM-L6-v2', device='cuda')

def calculate_metrics(predictions, references):
    rouge_results = rouge.compute(predictions=predictions, references=references)
    bleu_results = bleu.compute(predictions=predictions, references=[[ref] for ref in references])

    # Semantic Similarity
    embeddings1 = similarity_model.encode(predictions, convert_to_tensor=True)
    embeddings2 = similarity_model.encode(references, convert_to_tensor=True)
    cosine_scores = util.cos_sim(embeddings1, embeddings2)
    semantic_avg = torch.mean(torch.diag(cosine_scores)).item()

    return {
        "rougeL": rouge_results['rougeL'],
        "bleu": bleu_results['bleu'],
        "semantic_similarity": semantic_avg
    }

# Calculate for both models
base_model_metrics = calculate_metrics(base_model_responses, ground_truths)
finetuned_model_metrics = calculate_metrics(finetuned_model_responses, ground_truths)

# --- Display Results ---
results_df = pd.DataFrame([base_model_metrics, finetuned_model_metrics],
                          index=["Base Model", "Fine-Tuned Model"])

print("\n--- Performance Comparison ---")
display(results_df)

✅ STEP 6: Running inference on the test set...
==((====))==  Unsloth 2025.8.6: Fast Qwen2 patching. Transformers: 4.55.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Generating responses: 100%|██████████| 50/50 [04:38<00:00,  5.56s/it]



✅ Inference complete for both models.
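The response extraction inside `get_model_response` relies on `split("### Response:\n")[1]`, which raises an `IndexError` if the marker is missing and cuts the answer short if the model repeats the marker. A slightly more defensive variant could look like the sketch below. It is an alternative, not the code that produced the results in this notebook; the function name and the EOS string are assumptions.

```python
RESPONSE_MARKER = "### Response:\n"
EOS = "<|im_end|>"  # assumption: the EOS string seen in the Step 3 example


def extract_response(decoded_text, marker=RESPONSE_MARKER, eos=EOS):
    """Return the text after the first response marker, with EOS removed."""
    _, found, tail = decoded_text.partition(marker)
    if not found:
        # Marker missing: return an empty response instead of raising.
        return ""
    return tail.replace(eos, "").strip()


# Hypothetical decoded output, used only to exercise the helper.
decoded = (
    "### Instruction:\nSay hi.\n\n### Input:\n\n\n"
    "### Response:\nHi there!<|im_end|>"
)
print(extract_response(decoded))
```

Because `str.partition` splits only at the first occurrence, everything the model generates after the prompt is kept, even if it happens to emit the marker again.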

Calculating evaluation metrics...




--- Performance Comparison ---


                    rougeL    bleu      semantic_similarity
Base Model          0.22744   0.03152   0.634131
Fine-Tuned Model    0.33041   0.045207  0.69262
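The absolute scores are easier to compare as relative changes. The sketch below recomputes them from the two rows of the table; the helper name is illustrative. Since the evaluation uses only 50 test samples and a single seed, the percentages should be read as indicative rather than precise.

```python
# Scores copied from the comparison table above.
base = {"rougeL": 0.22744, "bleu": 0.03152, "semantic_similarity": 0.634131}
tuned = {"rougeL": 0.33041, "bleu": 0.045207, "semantic_similarity": 0.69262}


def relative_gain_pct(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


gains = {name: relative_gain_pct(base[name], tuned[name]) for name in base}
for name, pct in gains.items():
    print(f"{name}: {pct:+.1f}%")
```

The overlap-based metrics (ROUGE-L and BLEU) move much more than the embedding-based similarity, which fits a model that has mostly learned the format and phrasing of the reference answers.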


In [7]:

print("✅ STEP 7: Generating final report and displaying examples...")

# --- Display Qualitative Examples ---
example_df = pd.DataFrame({
    "Instruction": [s['instruction'] for s in test_subset.select(range(5))],
    "Input": [s['input'] for s in test_subset.select(range(5))],
    "Ground Truth": ground_truths[:5],
    "Fine-Tuned Model Response": finetuned_model_responses[:5],
})

print("\n--- Example Model Outputs ---")
pd.set_option('display.max_colwidth', None)
display(example_df.style.set_properties(**{'text-align': 'left', 'white-space': 'normal'}))

# --- Generate Report File ---
def generate_report():
    report_str = "======================================================================\n"
    report_str += "         ASSIGNMENT 5: QWEN2-0.5B FINE-TUNING REPORT\n"
    report_str += "======================================================================\n\n"

    # Section 1: Model Details
    report_str += "----------------------------------------------------------------------\n"
    report_str += "SECTION 1: MODEL DETAILS\n"
    report_str += "----------------------------------------------------------------------\n"
    report_str += "1.  **Model Name**: Qwen/Qwen2-0.5B-Instruct\n"
    report_str += "2.  **Number of Parameters**: 0.5 Billion\n"
    report_str += "3.  **Fine-Tuning Method**: LoRA (Low-Rank Adaptation) using Unsloth\n\n"

    # Section 2: Experiment Results
    report_str += "----------------------------------------------------------------------\n"
    report_str += "SECTION 2: EXPERIMENT RESULTS\n"
    report_str += "----------------------------------------------------------------------\n"
    report_str += "1.  **Prompt Used for Generation**:\n"
    report_str += f"    ```\n    {alpaca_prompt.format('{instruction}', '{input}', '{response}')}\n    ```\n\n"

    # Add metrics table
    report_str += "2.  **Performance Metrics**:\n\n"
    report_str += results_df.to_markdown() + "\n\n"

    # Add five examples
    report_str += "3.  **Five Example Results**:\n"
    for i in range(5):
        report_str += f"\n    ------------------------------- Example {i + 1} -------------------------------\n"
        report_str += f"    **Instruction**: {example_df.iloc[i]['Instruction']}\n"
        report_str += f"    **Input**: {example_df.iloc[i]['Input']}\n"
        report_str += f"    **Ground Truth Response**: {example_df.iloc[i]['Ground Truth']}\n"
        report_str += f"    **Fine-Tuned Model Response**: {example_df.iloc[i]['Fine-Tuned Model Response']}\n"

    report_str += "\n========================= END OF REPORT ========================="
    return report_str

# Write the report to a file
report_content = generate_report()
with open("report.txt", "w", encoding="utf-8") as f:
    f.write(report_content)

print("\n✅ `report.txt` has been generated. You can download it from the file panel on the left.")

✅ STEP 7: Generating final report and displaying examples...

--- Example Model Outputs ---


--- Row 0 ---
Instruction: Rearrange the following sentence to make the sentence more interesting.
Input: She left the party early
Ground Truth: Early, she left the party.
Fine-Tuned Model Response: She made an early departure from the party.

--- Row 1 ---
Instruction: Let f(x) = {[ -x - 3 if x ≤ 1,; x/2 + 1 if x > 1. ]. Find the sum of all values of x such that f(x) = 0.
Input: (empty)
Ground Truth: We solve the equation f(x) = 0 on the domains x ≤ 1 and x > 1. If x ≤ 1, then f(x) = -x - 3, so we want to solve -x - 3 = 0. The solution is x = -3, which satisfies x ≤ 1. If x > 1, then f(x) = x/2 + 1, so we want to solve x/2 + 1 = 0. The solution is x = -2, but this value does not satisfy x > 1. Therefore, the only solution is x = -3.
Fine-Tuned Model Response: To find the sum of all values of x for which \(f(x) = 0\), we need to solve the equation \(-x - 3 = 0\) and \(\frac{x}{2} + 1 = 0\). For the first equation, solving for \(x\) gives us \(x = 3\). For the second equation, solving for \(x\) gives us \(x = -2\). So, the sum of all values of \(

--- Row 2 ---
Instruction: Compose a haiku poem about a summer day.
Input: (empty)
Ground Truth: Warm breeze on my face Endless sun brings joy and peace Summer, please don't fade
Fine-Tuned Model Response: Summer's warm breeze, As flowers bloom in every hue, Nature's rhythm is at work.

--- Row 3 ---
Instruction: What methods can be used to improve the accuracy of machine learning models?
Input: (empty)
Ground Truth: Here are several methods to improve the accuracy of machine learning models: 1. Gathering and cleaning more data: The more data the model is trained on, the more patterns it can recognize, and the better it can generalize to new data. It’s also important to clean the data to handle missing, inconsistent or duplicate information. 2. Data preprocessing: Several preprocessing techniques like normalization, standardization, encoding, and feature selection can improve model accuracy by making the data more suitable for the learning algorithm. 3. Selecting the right algorithm: Different algorithms are suited to different types of data and problems. Experimenting with various algorithms, including ensemble methods that combine several algorithms, can boost accuracy. 4. Hyperparameter tuning: Hyperparameters are the fixed parameters that control the behavior of the learning algorithm. Tuning them can sometimes result in large improvements in accuracy. 5. Regularization: Regularization is a technique to prevent overfitting by adding a penalty term to the loss function. It encourages the model to use all the input features in a more balanced way, thus resulting in better generalization on new data. 6. Cross-validation: Cross-validation is a technique to evaluate how well the model generalizes on new, unseen data. By using cross-validation to help choose the best combination of preprocessing techniques, algorithms, and hyperparameters, it’s possible to increase model accuracy.
Fine-Tuned Model Response: There are several methods that can be used to improve the accuracy of machine learning models. Here are some examples: 1. Regularization: This technique is commonly used in machine learning to prevent overfitting, where a model learns patterns or relationships too well from training data. Regularization helps reduce the size of the model's parameters and prevents it from learning patterns that are similar to those already learned. 2. Tuning hyperparameters: By tuning the hyperparameters of your model, you can adjust how much

--- Row 4 ---
Instruction: Fill in the blanks to complete the sentence.
Input: Global warming can be reversed by reducing ________ and __________.
Ground Truth: Global warming can be reversed by reducing greenhouse gas emissions and deforestation.
Fine-Tuned Model Response: The global warming can be reversed by reducing our consumption of fossil fuels and increasing renewable energy use.



✅ `report.txt` has been generated. You can download it from the file panel on the left.
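One practical note on the report step: `results_df.to_markdown()` depends on the optional `tabulate` package, which the install cell in Step 1 does not list explicitly. If it is unavailable, a small pure-Python fallback along these lines could render the metrics table instead. This is a sketch with a hypothetical helper name and a fixed four-decimal format, not a drop-in replacement for pandas' output.

```python
def dict_rows_to_markdown(rows, index_header=""):
    """Render {row_label: {column: value}} as a pipe-delimited Markdown table."""
    columns = list(next(iter(rows.values())).keys())
    lines = [
        "| " + " | ".join([index_header] + columns) + " |",
        "|" + "|".join(["---"] * (len(columns) + 1)) + "|",
    ]
    for label, values in rows.items():
        cells = [label] + [f"{values[c]:.4f}" for c in columns]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Scores copied from the Step 6 comparison table.
metrics = {
    "Base Model": {"rougeL": 0.22744, "bleu": 0.03152, "semantic_similarity": 0.634131},
    "Fine-Tuned Model": {"rougeL": 0.33041, "bleu": 0.045207, "semantic_similarity": 0.69262},
}
print(dict_rows_to_markdown(metrics, index_header="Model"))
```

Inside the notebook the same helper could be fed with `results_df.to_dict(orient="index")`, which yields the nested dictionary shape used here.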
