**Environment Setup**


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install unsloth



In [None]:
import torch
torch.cuda.is_available()  # Should return True


True

In [None]:
# Paste the API key at the prompt instead of hard-coding it in the notebook.
!wandb login


wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin


**Dataset Preparation**
***Load the Medical CoT dataset from Hugging Face***


In [None]:
from datasets import load_dataset

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en")


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


**Preprocessing**


In [None]:
dataset["train"].column_names


['Question', 'Complex_CoT', 'Response']

In [None]:
def format_to_think_response(example):
    return {
        "text": f"<think>{example['Complex_CoT']}</think> <response>{example['Response']}</response>"
    }
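
To make the target format concrete, here is a small sketch that applies the same function to a toy record. The record contents are hypothetical placeholders; only the column names come from the dataset.

```python
def format_to_think_response(example):
    # Same template as in the cell above.
    return {
        "text": f"<think>{example['Complex_CoT']}</think> <response>{example['Response']}</response>"
    }

# Hypothetical toy record with the same three columns as the dataset.
toy = {"Question": "Q?", "Complex_CoT": "step 1, step 2", "Response": "final answer"}
formatted = format_to_think_response(toy)
print(formatted["text"])
```

Note that the `Question` column does not appear in the `text` field as written, so the model is trained on the reasoning and the response only.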


In [None]:
formatted_dataset = dataset["train"].map(format_to_think_response)


Map:   0%|          | 0/19704 [00:00<?, ? examples/s]

In [None]:
formatted_dataset[0]["text"]

"<think>Okay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is swollen and tender, which is like waving a big flag for deep vein thrombosis, especially after a long flight or sitting around a lot.\n\nSo, now I'm thinking, how could a clot in the leg end up causing issues like weakness or stroke symptoms?\n\nOh, right! There's this thing called a paradoxical embolism. It can happen if there's some kind of short circuit in the heart - like a hole that shouldn't be there.\n\nLet's put this together: if a blood clot from the leg somehow travels to the left side of the heart, it could shoot off to the brain and cause that sudden weakness by blocking blood flow there.\n\nHmm, but how would the clot get from the right side of the heart to the left without going through the lungs and getting filtered out?\n\nHere's where our cardiac anomaly com

In [None]:
dataset["train"][0]

{'Question': 'Given the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?',
 'Complex_CoT': "Okay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is swollen and tender, which is like waving a big flag for deep vein thrombosis, especially after a long flight or sitting around a lot.\n\nSo, now I'm thinking, how could a clot in the leg end up causing issues like weakness or stroke symptoms?\n\nOh, right! There's this thing called a paradoxical embolism. It can happen if there's some kind of short circuit in the heart - like a hole that shouldn't be there.\n\nLet's put this together: if a blood clot from the leg somehow travels to the l

**Split Dataset**

In [None]:
train_dataset = formatted_dataset.select(range(100, len(formatted_dataset)))
val_dataset = formatted_dataset.select(range(100))
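
The split above keeps the first 100 rows for validation and everything else for training, without shuffling. A sketch of the same index arithmetic in plain Python, using the 19,704-row count shown in the `Map` progress bar (the helper name `head_tail_split` is hypothetical):

```python
def head_tail_split(n_total, n_val):
    """Return (train_indices, val_indices): first n_val rows go to validation."""
    val_idx = range(n_val)
    train_idx = range(n_val, n_total)
    return train_idx, val_idx

train_idx, val_idx = head_tail_split(19704, 100)
print(len(train_idx), len(val_idx))
```

The two ranges are disjoint, so no validation example leaks into training.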


**Model Selection and Fine-Tuning Strategy**
*Load the 4-bit quantized Llama 3.2 1B Instruct model using Unsloth*

In [None]:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct-unsloth-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.2: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


**Apply LoRA Fine-Tuning**

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                             # LoRA rank
    lora_alpha = 16,                    # LoRA scaling factor
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,  # recompute activations to save VRAM
)


Unsloth 2025.6.2 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.
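
A rank-`r` LoRA adapter on a linear layer with `in` inputs and `out` outputs adds `r * (in + out)` trainable weights. The sketch below counts them for this run. The layer dimensions are assumptions taken from the published Llama 3.2 1B configuration, and the set of adapted projections (q, k, v, o, gate, up, down) is assumed from the "QKV layers, O layers and MLP layers" log line; neither is stated explicitly in this notebook.

```python
# Assumed Llama 3.2 1B dimensions (not stated in the notebook).
HIDDEN = 2048
KV_DIM = 512          # 8 key/value heads x 64 dims per head
INTERMEDIATE = 8192
NUM_LAYERS = 16       # matches "patched 16 layers" in the log
RANK = 16             # r passed to get_peft_model

# (in_features, out_features) of each adapted projection in one layer.
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    """Weights in the A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in projections.values())
total = per_layer * NUM_LAYERS
print(total)
```

Under these assumptions the count comes to 11,272,192, the same figure the trainer reports as trainable parameters.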


**Fine-Tuning & Tracking with wandb**


In [None]:
import os
os.environ["TRITON_DISABLE_LINE_INFO"] = "1"
os.environ["TRITON_CACHE_DIR"] = "/tmp/triton_cache"

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
args = TrainingArguments(
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 4,
    warmup_steps = 5,
    max_steps = 100,
    learning_rate = 2e-5,
    logging_steps = 1,
    output_dir = "outputs",
    report_to = "wandb",
    optim = "paged_adamw_8bit",
)
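
A quick sketch of what these settings imply for how much data the run actually sees. The GPU count and the 19,604 training examples are taken from the training log further down; the variable names are illustrative.

```python
per_device_batch = 1
grad_accum_steps = 4
num_gpus = 1            # from the training log
max_steps = 100
train_examples = 19604  # from the training log

# One optimizer step consumes batch x accumulation x devices examples.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
fraction_of_data = examples_seen / train_examples

print(effective_batch, examples_seen, f"{fraction_of_data:.1%}")
```

With `max_steps = 100` the run covers roughly 2% of the training split, so it is a short smoke test rather than a full epoch.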



In [None]:
def formatting_func(example):
    return example["text"]

In [None]:
trainer = SFTTrainer(
    model = model,                # already wrapped with LoRA adapters, so no peft_config is passed
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,
    dataset_text_field = "text",  # column produced by format_to_think_response
    max_seq_length = 1024,
    args = args,
)


Unsloth: Tokenizing ["text"]:   0%|          | 0/19604 [00:00<?, ? examples/s]

In [None]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 19,604 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 11,272,192/1,000,000,000 (1.13% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,2.5166
2,2.3829
3,2.1449
4,2.3398
5,2.5549
6,2.3307
7,2.1829
8,2.2359
9,2.123
10,2.3499


TrainOutput(global_step=100, training_loss=2.114664553403854, metrics={'train_runtime': 136.7248, 'train_samples_per_second': 2.926, 'train_steps_per_second': 0.731, 'total_flos': 1305763939811328.0, 'train_loss': 2.114664553403854})
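
The throughput figures in `TrainOutput` can be cross-checked from the runtime and step count. A small sketch, using only numbers printed in the log above:

```python
train_runtime = 136.7248  # seconds, from TrainOutput
global_step = 100
total_batch_size = 4      # from the "Total batch size" log line

steps_per_second = global_step / train_runtime
samples_per_second = global_step * total_batch_size / train_runtime

print(round(steps_per_second, 3), round(samples_per_second, 3))
```

Both values agree with the reported `train_steps_per_second` and `train_samples_per_second`.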

**Save and Upload to Hugging Face**

In [None]:
model.save_pretrained("lora_medical_adapter")

In [None]:
tokenizer.save_pretrained("tokenizer")

('tokenizer/tokenizer_config.json',
 'tokenizer/special_tokens_map.json',
 'tokenizer/chat_template.jinja',
 'tokenizer/tokenizer.json')

**Upload Via Hugging Face**

In [None]:
!pip install -q huggingface_hub
!huggingface-cli login




    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: fineGrained).
The token `Medical_Sft_finetune` has been saved to /root/.cache/huggingface/stored_tokens
[1m[31mCannot authenticate through git-credential as no helper is defined on your machine.
You might ha

In [None]:
from huggingface_hub import notebook_login
from peft import PeftModel

notebook_login()


**Save LoRA Adapter**

In [None]:
model.save_pretrained("llama3-3b-lora-medcot")

**Save Tokenizer**

In [None]:
tokenizer.save_pretrained("llama3-3b-lora-medcot")

('llama3-3b-lora-medcot/tokenizer_config.json',
 'llama3-3b-lora-medcot/special_tokens_map.json',
 'llama3-3b-lora-medcot/chat_template.jinja',
 'llama3-3b-lora-medcot/tokenizer.json')

**Push to Hugging Face Hub**

In [None]:
model.push_to_hub("noor_ul_saballama3-3b-lora-medcot")
tokenizer.push_to_hub("noor_ul_saballama3-3b-lora-medcot")


Uploading...:   0%|          | 0.00/45.1M [00:00<?, ?B/s]

Saved model to https://huggingface.co/noor_ul_saballama3-3b-lora-medcot


README.md:   0%|          | 0.00/5.18k [00:00<?, ?B/s]

Uploading...:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

**Load for Inference**

In [None]:
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    "NoorUlSaba/Noor_ul_saballama3-3b-lora-medcot",  # LoRA adapter repo on the Hub
    device_map="auto",
    trust_remote_code=True
)

adapter_config.json:   0%|          | 0.00/876 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/45.1M [00:00<?, ?B/s]

In [None]:
from transformers import BitsAndBytesConfig  # replaces the deprecated load_in_4bit argument
base_model = AutoModelForCausalLM.from_pretrained("NoorUlSaba/Noor_ul_saballama3-3b-lora-medcot", quantization_config=BitsAndBytesConfig(load_in_4bit=True))


In [None]:
tokenizer = AutoTokenizer.from_pretrained("NoorUlSaba/Noor_ul_saballama3-3b-lora-medcot")

tokenizer_config.json:   0%|          | 0.00/50.6k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

chat_template.jinja:   0%|          | 0.00/3.83k [00:00<?, ?B/s]

In [None]:
model = PeftModel.from_pretrained(base_model, "NoorUlSaba/Noor_ul_saballama3-3b-lora-medcot")
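
Because the training text wraps reasoning in `<think>` tags and the answer in `<response>` tags, generated text can be split back into the two parts. A minimal sketch, assuming the model reproduces both tag pairs; the helper name `split_think_response` and the sample string are hypothetical:

```python
import re

# Non-greedy match of the two tagged spans; DOTALL lets them span newlines.
_PATTERN = re.compile(
    r"<think>(.*?)</think>\s*<response>(.*?)</response>", re.DOTALL
)

def split_think_response(text):
    """Return (reasoning, answer), or None if the tags are missing."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

sample = "<think>reasoning goes here</think> <response>answer goes here</response>"
print(split_think_response(sample))
```

Returning `None` for untagged output makes it easy to count how often the fine-tuned model fails to follow the format.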



In [2]:
from google.colab import files
uploaded = files.upload()


Saving PEFT_Fine_Tune_of_LLAMA_on_Medical_Chain_of_thought (3).ipynb to PEFT_Fine_Tune_of_LLAMA_on_Medical_Chain_of_thought (3).ipynb


In [3]:
for fn in uploaded.keys():
    print(f"Uploaded file path: /content/{fn}")


Uploaded file path: /content/PEFT_Fine_Tune_of_LLAMA_on_Medical_Chain_of_thought (3).ipynb


In [5]:
import nbformat

file_path = "/content/PEFT_Fine_Tune_of_LLAMA_on_Medical_Chain_of_thought (3).ipynb"  # replace with your filename

with open(file_path, "r", encoding="utf-8") as f:
    notebook = nbformat.read(f, as_version=4)
if "widgets" in notebook.metadata:
    del notebook.metadata["widgets"]

with open(file_path, "w", encoding="utf-8") as f:
    nbformat.write(notebook, f)

print("Fixed metadata and saved notebook.")


Fixed metadata and saved notebook.
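
The same cleanup can be expressed on the notebook's JSON structure directly, which is handy when `nbformat` is not installed. A sketch using only the standard library; the helper name `strip_widget_metadata` and the inline notebook are hypothetical:

```python
import json

def strip_widget_metadata(notebook):
    """Delete metadata.widgets from a notebook dict; return True if it was present."""
    metadata = notebook.get("metadata", {})
    if "widgets" in metadata:
        del metadata["widgets"]
        return True
    return False

# Tiny stand-in for a notebook file's contents.
raw = '{"cells": [], "metadata": {"widgets": {"state": {}}, "kernelspec": {}}, "nbformat": 4, "nbformat_minor": 5}'
nb = json.loads(raw)
print(strip_widget_metadata(nb), sorted(nb["metadata"]))
```

For a real file, read it with `json.load`, apply the function, and write it back with `json.dump`.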
