<a href="https://colab.research.google.com/github/Linux-Server/AI_Engineering/blob/main/Phi_4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### [Choose the Right Model + Method](https://docs.unsloth.ai/get-started/fine-tuning-llms-guide)
 - LoRA: Fine-tunes small, trainable matrices in 16-bit without updating all model weights.  
 - QLoRA: Combines LoRA with 4-bit quantization to handle very large models with minimal resources.
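
To make the resource difference concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold the base weights at 16-bit versus 4-bit precision. The parameter counts are taken from the Unsloth training banner further down in this notebook. The estimate deliberately ignores quantization metadata, activations, gradients, and optimizer state, so real usage will be higher.

```python
# Rough weight-memory estimate. Assumption of this sketch: only the stored
# base weights are counted (no quantization constants, activations,
# gradients, or optimizer state).
TOTAL_PARAMS = 14_725_043_200   # total reported in the training banner
ADAPTER_PARAMS = 65_536_000     # trainable LoRA parameters, same banner
BASE_PARAMS = TOTAL_PARAMS - ADAPTER_PARAMS


def weight_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to store num_params weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


lora_16bit_gib = weight_gib(BASE_PARAMS, 16)
qlora_4bit_gib = weight_gib(BASE_PARAMS, 4)

print(f"16-bit base weights: {lora_16bit_gib:.1f} GiB")
print(f" 4-bit base weights: {qlora_4bit_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone exceed the roughly 22 GB the L4 GPU reports in the load banner, while the 4-bit weights leave headroom for adapters and activations. This is presumably why the notebook loads the model with `load_in_4bit=True`.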



In [None]:
%%capture
#@title Install unsloth
!pip install unsloth
!pip install trl
!pip install weave
!pip install wandb --upgrade

 - Load the model and tokenizer

In [None]:
from unsloth import FastLanguageModel

model_name = "unsloth/Phi-4"

# Load the base model in 4-bit (QLoRA-style) with a 2048-token context window.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.11: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

## Inference

In [None]:

FastLanguageModel.for_inference(model)

prompt = [{"role": "user", "content": "Tell me a kerala?" }]

raw_model_input = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

model_input = tokenizer(raw_model_input, return_tensors="pt").to(model.device)

raw_model_input

'<|im_start|>user<|im_sep|>Tell me a kerala?<|im_end|><|im_start|>assistant<|im_sep|>'
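
The string above shows what the chat template produces. As a minimal sketch, the same layout can be reproduced in plain Python. `render_phi4_chat` is a hypothetical helper written for illustration. The real template is a Jinja template shipped with the tokenizer and may treat system messages or edge cases differently.

```python
def render_phi4_chat(messages, add_generation_prompt=False):
    """Hypothetical re-implementation of the layout seen in this notebook.

    Each turn becomes <|im_start|>{role}<|im_sep|>{content}<|im_end|>.
    With add_generation_prompt=True an open assistant turn is appended so
    the model continues from there.
    """
    parts = [
        f"<|im_start|>{m['role']}<|im_sep|>{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)


prompt = [{"role": "user", "content": "Tell me a kerala?"}]
rendered = render_phi4_chat(prompt, add_generation_prompt=True)
print(rendered)
```

The open assistant turn at the end is what `add_generation_prompt=True` contributes. During data preparation the notebook sets it to `False`, because each training example already contains the assistant's reply.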

In [None]:
# Greedy decoding: do_sample=False picks the highest-probability token at each step.
tokenizer.batch_decode(model.generate(**model_input, max_new_tokens=100, do_sample=False))

['<|im_start|>user<|im_sep|>Tell me a kerala?<|im_end|><|im_start|>assistant<|im_sep|>Kerala, a state in the southwestern region of India, is renowned for its unique culture, stunning natural beauty, and rich history. Here are some highlights about Kerala:\n\n1. **Geography and Climate**: Kerala is bordered by the Western Ghats mountain range to the east and the Arabian Sea to the west. It has a tropical climate with high humidity and abundant rainfall, especially during the monsoon season.\n\n2. **Backwaters**: Kerala is famous for its extensive network of interconnected canals,']

In [None]:
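# Note: temperature and min_p only take effect when do_sample=True is also passed.
# Without it, generate() stays greedy, which is consistent with the output below
# being identical to the previous cell.
tokenizer.batch_decode(model.generate(**model_input, max_new_tokens=100, temperature=1.9, min_p = 0.1))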

['<|im_start|>user<|im_sep|>Tell me a kerala?<|im_end|><|im_start|>assistant<|im_sep|>Kerala, a state in the southwestern region of India, is renowned for its unique culture, stunning natural beauty, and rich history. Here are some highlights about Kerala:\n\n1. **Geography and Climate**: Kerala is bordered by the Western Ghats mountain range to the east and the Arabian Sea to the west. It has a tropical climate with high humidity and abundant rainfall, especially during the monsoon season.\n\n2. **Backwaters**: Kerala is famous for its extensive network of interconnected canals,']
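
For intuition about what `temperature` and `min_p` do once sampling is enabled, here is a small sketch on toy logits. It is not the Transformers implementation: the helper names and the logits are made up for illustration, and it assumes the common ordering in which temperature scaling is applied before the min-p filter.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p * (top probability)."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


toy_logits = [4.0, 3.0, 1.0, -2.0]                # hypothetical 4-token vocabulary
sharp = softmax(toy_logits, temperature=0.5)      # low temperature: peaked
flat = softmax(toy_logits, temperature=1.9)       # high temperature: flatter
filtered = min_p_filter(flat, min_p=0.1)          # prune the unlikely tail

print("sharp:   ", [round(p, 3) for p in sharp])
print("flat:    ", [round(p, 3) for p in flat])
print("filtered:", [round(p, 3) for p in filtered])
```

A high temperature spreads probability mass across more tokens, and min-p then removes the tokens that remain far less likely than the top candidate. This is why a high temperature such as 1.9 is usually paired with a min-p cutoff.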

### PEFT

In [None]:
model = FastLanguageModel.get_peft_model(
    model=model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    use_gradient_checkpointing="unsloth",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ

)

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.1.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.9.11 patched 40 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
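
As a sanity check on this configuration, the number of trainable LoRA parameters can be worked out by hand. Each adapted linear layer with `in_features` inputs and `out_features` outputs gains two matrices holding `r * (in_features + out_features)` parameters in total. The layer dimensions below are assumptions taken from the published Phi-4 model configuration, not from this notebook. Only `r=16`, the target modules, and the 40 layers come from the cell and output above.

```python
# Assumed Phi-4 dimensions (from the published model config, not this notebook).
HIDDEN = 5120
INTERMEDIATE = 17920
KV_DIM = 10 * 128      # assumed: 10 key/value heads with head size 128
NUM_LAYERS = 40        # matches "patched 40 layers" in the output above
RANK = 16              # r in get_peft_model
LORA_ALPHA = 16

# (in_features, out_features) for every targeted projection in one layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features, out_features, rank):
    """A is (rank x in_features), B is (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in shapes.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / RANK  # standard LoRA scaling (use_rslora=False)

print(f"per layer: {per_layer:,}")
print(f"total:     {total_trainable:,}")
print(f"scaling:   {scaling}")
```

Under these assumed dimensions the total agrees with the trainable-parameter count that Unsloth prints when training starts.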


### Data Prep

In [None]:
from unsloth.chat_templates import get_chat_template


tokenizer = get_chat_template(tokenizer=tokenizer, chat_template="phi-4")


In [None]:
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:1%]")

dataset = standardize_sharegpt(dataset)


In [None]:
dataset[5]["conversations"]

[{'content': 'How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?',
  'role': 'user'},
 {'content': 'Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.',
  'role': 'assistant'}]
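
The `role`/`content` layout shown above is what the chat template expects. A minimal sketch of the kind of conversion `standardize_sharegpt` performs is given below, assuming the raw dataset stores each turn in ShareGPT style with `from` and `value` keys. The helper name, the role mapping, and the sample conversation are illustrative, not Unsloth's actual implementation.

```python
# Assumed ShareGPT speaker names mapped to chat-template roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_role_content(conversation):
    """Convert ShareGPT-style turns into role/content dictionaries."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversation
    ]


# Hypothetical raw conversation, for illustration only.
raw_conversation = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "2 + 2 equals 4."},
]

converted = to_role_content(raw_conversation)
print(converted)
```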

In [None]:
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(
            convo, tokenize = False, add_generation_prompt = False
        )
        for convo in convos
    ]
    return { "text" : texts, }


In [None]:
dataset = dataset.map(formatting_prompts_func, batched=True)

In [None]:
dataset[5]["text"]

'<|im_start|>user<|im_sep|>How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|im_end|><|im_start|>assistant<|im_sep|>Astronomers make use of the unique spectral fingerprints of elements found in stars. These elements emit and absorb light at specific, known wavelengths, forming an absorption spectrum. By analyzing the light received from distant stars and comparing it to the laboratory-measured spectra of these elements, astronomers can identify the shifts in these wavelengths due to the Doppler effect. The observed shift tells them the extent to which the light has been redshifted or blueshifted, thereby allowing them to calculate the speed of the star along the line of sight relative to Earth.<|im_end|>'

In [None]:
from trl import SFTConfig, SFTTrainer
from transformers import DataCollatorForSeq2Seq

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = 2048,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    packing = False, # Can make training 5x faster for short sequences.
    args = SFTConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 15,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 30,
        learning_rate = 1e-5,
        logging_steps = 10,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "./Phi-unsloth-1k",
        report_to = "wandb", # Use this for WandB etc
    ),
)
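
The batch-size and schedule settings above determine how many optimizer steps the run takes and how the learning rate evolves. The sketch below works through that arithmetic for the 1,000 examples selected by `train[:1%]`. The schedule function is a simplified stand-in for a linear warmup followed by linear decay, not the exact Transformers scheduler.

```python
import math

num_examples = 1_000        # train[:1%] of FineTome-100k
per_device_batch = 2
grad_accum_steps = 4
num_gpus = 1
num_epochs = 1
warmup_steps = 15
peak_lr = 1e-5

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
total_steps = math.ceil(num_examples / effective_batch) * num_epochs


def linear_lr(step):
    """Simplified linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))


print(f"effective batch size: {effective_batch}")
print(f"total steps:          {total_steps}")
for step in (0, 15, 70, 125):
    print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```

The effective batch size and step count computed here agree with the figures in the Unsloth training banner printed when `trainer.train()` starts.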


In [None]:
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user<|im_sep|>",
    response_part="<|im_start|>assistant<|im_sep|>",
)
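
Training on responses only means the loss is computed on the assistant's tokens and not on the user's prompt. A common way to achieve this is to set the labels of all other positions to `-100`, the index that PyTorch's cross-entropy loss ignores by default. The sketch below illustrates the idea on toy token IDs. The IDs and the helper are hypothetical and this is not Unsloth's implementation.

```python
IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def mask_non_response(token_ids, instruction_ids, response_ids):
    """Keep labels only for tokens that follow a response marker."""
    labels = [IGNORE_INDEX] * len(token_ids)
    in_response = False
    i = 0
    while i < len(token_ids):
        if token_ids[i:i + len(response_ids)] == response_ids:
            in_response = True           # assistant turn starts after the marker
            i += len(response_ids)
            continue
        if token_ids[i:i + len(instruction_ids)] == instruction_ids:
            in_response = False          # user turn: mask until next response
            i += len(instruction_ids)
            continue
        if in_response:
            labels[i] = token_ids[i]
        i += 1
    return labels


# Toy vocabulary: 1=<|im_start|>, 2=user, 3=<|im_sep|>, 4=assistant, 5=<|im_end|>
instruction_marker = [1, 2, 3]
response_marker = [1, 4, 3]
tokens = [1, 2, 3, 10, 11, 5, 1, 4, 3, 20, 21, 5]

labels = mask_non_response(tokens, instruction_marker, response_marker)
print(labels)
```

Only the assistant's reply and its closing token keep their labels, so the model is not trained to reproduce the user's question.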

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1 | Total steps = 125
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 65,536,000 of 14,725,043,200 (0.45% trained)
[34m[1mwandb[0m: Currently logged in as: [33msachin6624[0m ([33msachin6624-axomium-labs[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: Detected [huggingface_hub.inference, openai] in use.
[34m[1mwandb[0m: Use W&B Weave for improved LLM call tracing. Weave is installed but not imported. Add `import weave` to the top of your script.
[34m[1mwandb[0m: For more information, check out the docs at: https://weave-docs.wandb.ai/


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|------|---------------|
| 10   | 0.7308 |
| 20   | 0.7316 |
| 30   | 0.7830 |
| 40   | 0.6854 |
| 50   | 0.6262 |
| 60   | 0.6246 |
| 70   | 0.6919 |
| 80   | 0.6982 |
| 90   | 0.6834 |
| 100  | 0.6726 |


In [None]:
from unsloth import FastLanguageModel

# Reload from the final checkpoint written by the trainer.
model_name = "./Phi-unsloth-1k/checkpoint-125"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    load_in_4bit=True,
    max_seq_length=2048,
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.11: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Unsloth: Will load ./Phi-unsloth-1k/checkpoint-125 as a legacy tokenizer.
Unsloth 2025.9.11 patched 40 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


In [None]:
FastLanguageModel.for_inference(model)

prompt = [{"role": "user", "content": "Describe how the COVID-19 virus spreads." }]

raw_model_input = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

model_input = tokenizer(raw_model_input, return_tensors="pt").to(model.device)

tokenizer.batch_decode(model.generate(**model_input, max_new_tokens=100, temperature=0))

['<|im_start|> user <|im_sep|> Describe how the COVID-19 virus spreads.<|im_end|> <|im_start|> assistant <|im_sep|> The COVID-19 virus, caused by the SARS-CoV-2 virus, primarily spreads from person to person through respiratory droplets. Here are the main ways it spreads:\n\n1. **Respiratory Droplets**: When an infected person coughs, sneezes, talks, or breathes, they release respiratory droplets into the air. These droplets can be inhaled by people nearby, typically within about 6 feet, leading to infection.\n\n2. **Close Contact**: The']

In [None]:
from google.colab import userdata
from huggingface_hub import login

# Authenticate before pushing. "hugging_face" is the name of the Colab secret that holds the token.
login(userdata.get("hugging_face"))

In [None]:
model.push_to_hub("sachin6624/Phi-unsloth-1k-FineTome-100k-1epoch")
tokenizer.push_to_hub("sachin6624/Phi-unsloth-1k-FineTome-100k-1epoch")

README.md:   0%|          | 0.00/583 [00:00<?, ?B/s]

Processing Files (0 / 0)      : |          |  0.00B /  0.00B            

New Data Upload               : |          |  0.00B /  0.00B            

  ...adapter_model.safetensors:   0%|          | 62.9kB /  262MB            

Saved model to https://huggingface.co/sachin6624/Phi-unsloth-1k-FineTome-100k-1epoch