<a href="https://colab.research.google.com/github/HasinduUdantha/huggingface/blob/main/llama_3_3b_instruct_finetune_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Step 1.** Install Python Packages

In [1]:
!pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
!pip install --no-deps packaging ninja einops flash-attn trl peft accelerate bitsandbytes
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

Looking in indexes: https://download.pytorch.org/whl/cu121
Collecting xformers
  Downloading https://download.pytorch.org/whl/cu121/xformers-0.0.29.post1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
Collecting torch==2.5.1 (from xformers)
  Downloading https://download.pytorch.org/whl/cu121/torch-2.5.1%2Bcu121-cp311-cp311-linux_x86_64.whl (780.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m780.5/780.5 MB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch==2.5.1->xformers)
  Downloading https://download.pytorch.org/whl/cu121/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m23.7/23.7 MB[0m [31m15.4 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting nvidia-cuda-runtime-cu12==12.

# **Step 2.** Import Python Packages

In [1]:
import torch
import os
import json
import pandas as pd
from datasets import Dataset, DatasetDict
from datasets import load_dataset
from huggingface_hub import notebook_login
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


# **Step 3.** Login to Your Hugging Face with hf_token. (write access token)

In [2]:
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [29]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-bnb-4bit", # or choose "unsloth/Llama-3.2-1B-Instruct"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [30]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",

    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

# **Step 4.** Convert your JSON dataset to Llama3 finetuning format


In [8]:
import os
import json
import pandas as pd
from datasets import Dataset, DatasetDict

# Hugging Face User and Dataset Name
huggingface_user = "Hasindu21"
dataset_name = "study-plans"

class Llama3InstructDataset:
    def __init__(self, data):
        self.data = data
        self.prompts = self.create_prompts()

    def create_prompt(self, row):
        """Generates a structured study plan prompt."""
        skill_level = row.get("level", "unknown level")
        available_time = row.get("availableTime", "unspecified")
        daily_study_time = row.get("dailyStudyTime", "unspecified")
        subject = row.get("subject", "a subject")
        goals = ", ".join(row.get("goals", [])) if "goals" in row else "no specific goals"
        preferred_style = ", ".join(row.get("preferredStyle", [])) if "preferredStyle" in row else "any method"
        study_plan = row.get("studyPlan", "Your AI-generated study plan will appear here.")

        return (
            f"Generate a study plan for a {skill_level} student who has {available_time} "
            f"to study {subject}. They can study {daily_study_time} daily and prefer {preferred_style}. "
            f"Their goal is to {goals}.\n\n"
            f"### Study Plan:\n"
            f"{study_plan}"
        )

    def create_prompts(self):
        """Creates prompts for all data entries."""
        return [self.create_prompt(row) for row in self.data]

    def get_dataset(self):
        """Converts prompts to a Pandas DataFrame."""
        return pd.DataFrame({'prompt': self.prompts})


def create_hf_dataset(dataset_df):
    """Converts Pandas DataFrame to Hugging Face DatasetDict."""
    dataset_df.reset_index(drop=True, inplace=True)
    return DatasetDict({"train": Dataset.from_pandas(dataset_df)})


def main():
    """Main function to process data and upload to Hugging Face Hub."""
    input_file = '/content/training-data.json'
    output_dir = 'processed_data'
    os.makedirs(output_dir, exist_ok=True)

    # Load data
    with open(input_file, 'r') as f:
        data = json.load(f)

    # Process dataset
    dataset = Llama3InstructDataset(data)
    df = dataset.get_dataset()
    llama3_dataset = create_hf_dataset(df)

    # Save and push to Hugging Face
    dataset_path = os.path.join(output_dir, "llama3_dataset_eduplanner")
    llama3_dataset.save_to_disk(dataset_path)
    llama3_dataset.push_to_hub(f"{HUGGINGFACE_USER}/{DATASET_NAME}")

    print("Dataset successfully processed and uploaded to Hugging Face Hub!")


In [31]:
import json

with open('/content/training-data.json', 'r') as f:
    data = json.load(f)

# Print sample entry
print(data[:3])

[{'instruction': 'You are EduPlanner - an AI-powered adaptive study planner. Follow these rules:\nInput Analysis: Collect these parameters from users:\nSubject/Topic, Skill Level, Total Available Time, Daily Study Time, Specific Goals, Preferred Learning Style.\nPlan Generation: Create structured daily/weekly schedules.\nFormat output as markdown tables. Include time estimates.\nAdaptive Features: Adjust plans via natural language feedback.\nMotivational Elements: Encourage progress with milestones.', 'input': 'Generate a study plan for a Beginner Level level student who wants to learn Web Development and has 1.5 hours hours per session.', 'output': {'Recommended Course': 'JavaScript project Learn to create a memory Game and more', 'Duration': '1.5 hours', 'Level': 'Beginner Level', 'Topics Covered': 'Web Development'}}, {'instruction': 'You are EduPlanner - an AI-powered adaptive study planner. Follow these rules:\nInput Analysis: Collect these parameters from users:\nSubject/Topic, S

# **Step 5.** LoRa Finetuning Configurations
- "finetuned_model" sets your models name on HF
- "num_train_epochs" sets the number of epochs for training

    (epoch = 1 pass through your entire dataset)

In [32]:
# Defining the configuration for the base model, LoRA and training
config = {
    "hugging_face_username":huggingface_user,
    "model_config": {
        "base_model":"unsloth/Llama-3.2-3B-Instruct-bnb-4bit", # The base model
        "finetuned_model":"Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner", # The finetuned model
        "max_seq_length": 2048, # The maximum sequence length
        "dtype":torch.float16, # The data type
        "load_in_4bit": True, # Load the model in 4-bit
    },
    "lora_config": {
      "r": 16, # The number of LoRA layers 8, 16, 32, 64
      "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"], # The target modules
      "lora_alpha":16, # The alpha value for LoRA
      "lora_dropout":0, # The dropout value for LoRA
      "bias":"none", # The bias for LoRA
      "use_gradient_checkpointing":True, # Use gradient checkpointing
      "use_rslora":False, # Use RSLora
      "use_dora":False, # Use DoRa
      "loftq_config":None # The LoFTQ configuration
    },
    "training_dataset":{
        "name":f"{huggingface_user}/{dataset_name}", # The dataset name(huggingface/datasets)
        "split":"train", # The dataset split
        "input_field":"prompt", # The input field
    },
    "training_config": {
        "per_device_train_batch_size": 2, # The batch size
        "gradient_accumulation_steps": 4, # The gradient accumulation steps
        "warmup_steps": 5, # The warmup steps
        "max_steps":0, # The maximum steps (0 if the epochs are defined)
        "num_train_epochs": 1, # The number of training epochs(0 if the maximum steps are defined)
        "learning_rate": 2e-4, # The learning rate
        "fp16": not torch.cuda.is_bf16_supported(),  # The fp16
        "bf16": torch.cuda.is_bf16_supported(), # The bf16
        "logging_steps": 1, # The logging steps
        "optim" :"adamw_8bit", # The optimizer
        "weight_decay" : 0.01,  # The weight decay
        "lr_scheduler_type": "linear", # The learning rate scheduler
        "seed" : 42, # The seed
        "output_dir" : "outputs", # The output directory
    }
}

# **Step 6.** Load Llama3-3B, QLoRA & Trainer Model

In [33]:
# Loading the model and the tokinizer for the model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = config.get("model_config").get("base_model"),
    max_seq_length = config.get("model_config").get("max_seq_length"),
    dtype = config.get("model_config").get("dtype"),
    load_in_4bit = config.get("model_config").get("load_in_4bit"),
)

# Setup for QLoRA/LoRA peft of the base model
model = FastLanguageModel.get_peft_model(
    model,
    r = config.get("lora_config").get("r"),
    target_modules = config.get("lora_config").get("target_modules"),
    lora_alpha = config.get("lora_config").get("lora_alpha"),
    lora_dropout = config.get("lora_config").get("lora_dropout"),
    bias = config.get("lora_config").get("bias"),
    use_gradient_checkpointing = config.get("lora_config").get("use_gradient_checkpointing"),
    random_state = 42,
    use_rslora = config.get("lora_config").get("use_rslora"),
    use_dora = config.get("lora_config").get("use_dora"),
    loftq_config = config.get("lora_config").get("loftq_config"),
)

# Loading the training dataset
dataset_train = load_dataset(config.get("training_dataset").get("name"), split = config.get("training_dataset").get("split"))

# Setting up the trainer for the model
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset_train,
    dataset_text_field = config.get("training_dataset").get("input_field"),
    max_seq_length = config.get("model_config").get("max_seq_length"),
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = config.get("training_config").get("per_device_train_batch_size"),
        gradient_accumulation_steps = config.get("training_config").get("gradient_accumulation_steps"),
        warmup_steps = config.get("training_config").get("warmup_steps"),
        max_steps = config.get("training_config").get("max_steps"),
        num_train_epochs= config.get("training_config").get("num_train_epochs"),
        learning_rate = config.get("training_config").get("learning_rate"),
        fp16 = config.get("training_config").get("fp16"),
        bf16 = config.get("training_config").get("bf16"),
        logging_steps = config.get("training_config").get("logging_steps"),
        optim = config.get("training_config").get("optim"),
        weight_decay = config.get("training_config").get("weight_decay"),
        lr_scheduler_type = config.get("training_config").get("lr_scheduler_type"),
        seed = 42,
        output_dir = config.get("training_config").get("output_dir"),
    ),
)

==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


# **Step 7.** Train Your Finetuned Model

In [34]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 250 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 31
 "-____-"     Number of trainable parameters = 24,313,856


Step,Training Loss
1,4.7302
2,4.7302
3,4.6599
4,4.3935
5,4.0125
6,3.5591
7,2.9969
8,2.4278
9,1.8521
10,1.3071


# **Step 8.** Save Trainer Stats

In [35]:
with open("trainer_stats.json", "w") as f:
    json.dump(trainer_stats, f, indent=4)

# **Step 9.** Save Finetuned Model & Push to HF Hub

In [36]:
model.save_pretrained_gguf(config.get("model_config").get("finetuned_model"), tokenizer, quantization_method = "q4_k_m")
model.push_to_hub_gguf(config.get("model_config").get("finetuned_model"), tokenizer, quantization_method = "q4_k_m")

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 6.19 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 28.34it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/pytorch_model-00001-of-00002.bin...
Unsloth: Saving Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/pytorch_model-00002-of-00002.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner into f16 GGUF format.
The output location will be /content/Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/unsloth.F16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner
INFO:gguf.gguf_write

100%|██████████| 28/28 [00:00<00:00, 29.28it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/pytorch_model-00001-of-00002.bin...
Unsloth: Saving Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/pytorch_model-00002-of-00002.bin...
Done.
==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp might take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits might take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q4_k_m'] might take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: Installing llama.cpp. This might take 3 minutes...
Unsloth: [1] Converting model at Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner into f16 GGUF format.
The output location will be /content/Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner/unsloth.F16.gguf
This might take 3 minutes...
INFO:hf-to-gguf:Loading model: Llama-3.2-3B-Instruct-bnb-4bit-Hasindu21-Eduplanner
INFO:gguf.gguf_write

# **Step 10.** Test your pretrained model in Colab

In [38]:
# Loading the fine-tuned model and the tokenizer for inference
model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = config.get("model_config").get("finetuned_model"),
        max_seq_length = config.get("model_config").get("max_seq_length"),
        dtype = config.get("model_config").get("dtype"),
        load_in_4bit = config.get("model_config").get("load_in_4bit"),
    )

# Using FastLanguageModel for fast inference
FastLanguageModel.for_inference(model)

system_prompt = f"You are EduPlanner - an AI-powered adaptive study planner. Follow these rules:1. Input Analysis:- Collect these parameters from users: • Subject/Topic (e.g., Python Programming) • Skill Level (beginner/intermediate/expert) • Total Available Time (e.g., 4 weeks)• Daily Study Time (hours)• Specific Goals (max 3)• Preferred Learning Style (videos/reading/practice)2. Plan Generation:- Create structured daily/weekly schedules with: [1] Topic Breakdown[2] Curated Resources[3] Practice Activities[4] Progress Checkpoints- Format output as markdown tables- Include time estimates for each task3. Adaptive Features:- Allow plan adjustments via natural language feedback- Maintain progress tracking between sessions"
# Tokenizing the input and generating the output
prompt = input('TYPE PROMPT TO LLAMA3: ')
inputs = tokenizer(
[
    f"<|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>{prompt}<|end_header_id|>"
], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 256, use_cache = True)
tokenizer.batch_decode(outputs, skip_special_tokens = True)

==((====))==  Unsloth 2025.2.4: Fast Llama patching. Transformers: 4.48.2.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

TYPE PROMPT TO LLAMA3: Generate a study plan for an intermediate-level student who wants to learn Python in 4 weeks with 2 hours per day.


['systemYou are EduPlanner - an AI-powered adaptive study planner. Follow these rules:1. Input Analysis:- Collect these parameters from users: • Subject/Topic (e.g., Python Programming) • Skill Level (beginner/intermediate/expert) • Total Available Time (e.g., 4 weeks)• Daily Study Time (hours)• Specific Goals (max 3)• Preferred Learning Style (videos/reading/practice)2. Plan Generation:- Create structured daily/weekly schedules with: [1] Topic Breakdown[2] Curated Resources[3] Practice Activities[4] Progress Checkpoints- Format output as markdown tables- Include time estimates for each task3. Adaptive Features:- Allow plan adjustments via natural language feedback- Maintain progress tracking between sessionsuserGenerate a study plan for an intermediate-level student who wants to learn Python in 4 weeks with 2 hours per day.assistant)**Intermediate Python Study Plan (4 weeks, 2 hours/day)**\n\n**Week 1: Fundamentals and Data Structures**\n\n| Day | Topic | Time Estimate | Resources |\n

In [39]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
