## Checking GPU availability

We should make sure Colab is using a GPU.

In [2]:
import torch
torch.cuda.is_available()

True

Perfect, we're good to go now.
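It can also help to know which GPU was assigned and how much memory it has, since that decides whether the quantized model will fit. The following is a minimal sketch, assuming PyTorch is installed; `describe_device` is a hypothetical helper name, not part of the notebook.

```python
def describe_device() -> str:
    """Return a one-line summary of the accelerator PyTorch can see."""
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "No CUDA GPU detected"

    # Properties of the first visible GPU (index 0)
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024**3
    return f"{props.name}: {total_gib:.1f} GiB total memory"


print(describe_device())
```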

## Installing the required libraries

In [3]:
%pip install unsloth wandb torch transformers datasets accelerate peft bitsandbytes

Collecting unsloth
  Downloading unsloth-2025.1.8-py3-none-any.whl.metadata (53 kB)
Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.45.1-py3-none-manylinux_2_24_x86_64.whl.metadata (5.8 kB)
Collecting unsloth_zoo>=2025.1.4 (from unsloth)
  Downloading unsloth_zoo-2025.1.5-py3-none-any.whl.metadata (16 kB)
Collecting xformers>=0.0.27.post2 (from unsloth)
  Downloading xformers-0.0.29.post2-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting tyro (from unsloth)
  Downloading tyro-0.9.13-py3-none-any.whl.metadata (9.4 kB)
Collecting trl!=0.9.0,!=0.9.1,!=0.9.2,!=0.9.3,>=0.7.9 (from unsloth)
  Downloading trl-0.14.0-py3-none-any.whl.metadata (12 kB)
Collecting

## Importing all packages


In [1]:
# Fine-tuning utilities
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

# Hugging Face modules
from huggingface_hub import login  # Logs in to the Hugging Face API
from transformers import TrainingArguments  # Defines the training hyperparameters
from datasets import load_dataset  # Loads the fine-tuning dataset

# Weights & Biases for experiment tracking
import wandb




🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Adding the API keys to the Colab notebook and logging in to Hugging Face and W&B

In [4]:
import os
hugging_face_token = os.getenv("HUGGING_FACE_TOKEN")
wnb_token = os.getenv("WANDB_API_KEY")

print("Hugging Face Token:", hugging_face_token[:5] + "*****")  # Mask the output
print("W&B Token:", wnb_token[:5] + "*****")  # Mask the output

Hugging Face Token: hf_WN*****
W&B Token: 92195*****
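Note that `os.getenv` returns `None` when a variable is missing, in which case slicing the token fails with a `TypeError`. A small sketch of a more defensive version follows; `require_env` and `mask_secret` are hypothetical helper names, and the demo uses a dummy value rather than a real credential.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask_secret(secret: str, visible: int = 5) -> str:
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*****"


# Dummy value for demonstration only
os.environ["DEMO_TOKEN"] = "abcdefghijkl"
print("Demo token:", mask_secret(require_env("DEMO_TOKEN")))
```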


In [5]:
hugging_face_token = os.getenv("HUGGING_FACE_TOKEN")
wnb_token = os.getenv("WANDB_API_KEY")

# Log in to Hugging Face
login(hugging_face_token)
print("Connected successfully to Hugging Face")

# Log in to W&B and start a run that tracks the training metrics
wandb.login(key=wnb_token)
print("Connected successfully to Weights & Biases")
run = wandb.init(
    project="Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical Database",
    job_type="training",
    anonymous="allow"
)


Connected successfully to Hugging Face


wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: fifahdgaming00 (fifahdgaming00-insea) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


Connected successfully to Weights & Biases


## Loading DeepSeek R1 and the Tokenizer

We'll use 4-bit quantization to **let the LLM run efficiently on the GPU** without needing massive amounts of memory.
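A rough back-of-the-envelope estimate shows why this matters. The sketch below counts weight storage only and assumes exactly 8 billion parameters; the real footprint is higher, since some layers are typically kept in higher precision and activations need memory too. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # assumption: "8B" taken as exactly 8 billion parameters

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>6}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
```

Under these assumptions, 16-bit weights alone would need about 16 GB, while 4-bit weights need about 4 GB.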

In [6]:
# Setting the parameters
max_seq_length = 2048  # Maximum sequence length (in tokens) the model will handle
dtype = None  # None lets Unsloth pick the dtype automatically
load_in_4bit = True  # Load the weights with 4-bit quantization

# Loading the DeepSeek R1 model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit
)

==((====))==  Unsloth 2025.1.8: Fast Llama patching. Transformers: 4.47.1.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/231 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/52.9k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Perfect! We're good to go. Let's test the model on a **medical use case before fine-tuning it.**

## Testing DeepSeek R1 on a medical use case before fine-tuning it

Defining the prompt template:

In [7]:
prompt_style=""" Below is an instruction that describes a task. Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}""" # The <think> tag opens the model's chain-of-thought reasoning.

## Running inference on the model
Let's test the model by asking it a medical question and generating a response.
This process involves the following steps:

1. Define a test question related to a medical case.

2. Format the question using the prompt above (`prompt_style`) so that the model follows a logical reasoning process.

3. Tokenize the input and move it to the GPU for faster inference.

4. Generate the output tokens with the model.

5. Decode the output tokens into text to obtain a readable final answer.
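The formatting and extraction steps are plain string operations, so they can be tried without a GPU. The sketch below uses a shortened stand-in template and a hand-written stand-in for the decoded generation; `demo_template` and `fake_generation` are hypothetical names, not part of the notebook.

```python
# Shortened stand-in for the prompt template: two "{}" placeholders,
# one for the question and one for the (initially empty) reasoning.
demo_template = """### Question:
{}

### Response:
<think>{}"""

question = "What does ST-segment elevation in leads II, III, and aVF suggest?"

# Fill the placeholders; the second one stays empty at inference time
# so that the model continues writing right after the <think> tag.
prompt = demo_template.format(question, "")

# Stand-in for tokenization and generation: a decoded output is
# the prompt followed by the model's continuation.
fake_generation = prompt + "\nThe inferior leads are involved.\n</think>\nInferior wall MI."

# Keep only what follows the response marker.
answer_part = fake_generation.split("### Response:")[1]

# Optionally separate the reasoning from the final answer at the closing tag.
reasoning, _, final_answer = answer_part.partition("</think>")
print(final_answer.strip())
```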

In [8]:
# Creating a test question
question= """ A 56-year-old male with a history of hypertension and type 2 diabetes
              presents to the emergency department with acute-onset chest pain radiating to the left arm.
              His ECG shows ST-segment elevation in leads II, III, and aVF.
              Based on this presentation, what is the most likely diagnosis,
              and what immediate steps should be taken for management? Provide a step-by-step clinical reasoning process leading to your conclusion."""

# Enable optimized inference mode for Unsloth models in order to improve speed and memory efficiency
FastLanguageModel.for_inference(model)

# Format the question using the structured prompt ('prompt_style') and tokenize it
inputs=tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generating the response using the pre-trained model (non fine-tuned)
outputs=model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,  # marks which tokens are real input and which are padding
    max_new_tokens=1200,
    use_cache=True,

)

# Decode the generated output tokens into readable text
response=tokenizer.batch_decode(outputs)

# Extract and print only the relevant answer part (without the prompt)
print(response[0].split("### Response:")[1])





<think>
Okay, so I'm trying to figure out the most likely diagnosis for this 56-year-old man who comes into the emergency department with acute-onset chest pain radiating to his left arm. He's got a history of hypertension and type 2 diabetes. His ECG shows ST-segment elevation in leads II, III, and aVF. 

Alright, let's break this down. The key symptoms here are the chest pain and the ECG changes. The pain is radiating to the left arm, which makes me think about the location of the heart. The left arm corresponds to the left anterior descending (LAD) coronary artery. 

ST-segment elevation is a common finding in acute coronary syndrome (ACS). Specifically, the presence of ST elevation in multiple leads, especially II, III, and aVF, points towards a left ventricular (LV) aneurysm or a transmural myocardial infarction (MI). The aVF lead is in the left lower quadrant, so if there's elevation there along with II and III, it's more indicative of a left-sided issue rather than right.

Now,

After reading the 'thinking' part of the response above, we'll **notice that the reasoning process wasn't concise or specific, and that the answer was short**.
We want the final answer to follow a consistent, specific style (that of a medical expert, for example).

=> **Fine-tuning** 🙂

# Fine-tuning Process

In [9]:
train_prompt_style=""" Below is an instruction that describes a task. Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}

### Response:
<think>
{}
</think>
{}"""

### Download the fine-tuning dataset and format it for fine-tuning

Since we are focusing on the medical domain, we will use the [**Medical O1 Reasoning SFT dataset**](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT) from Hugging Face.

This dataset was used to fine-tune HuatuoGPT-O1, a medical LLM specifically designed for advanced medical reasoning.
It incorporates Chain of Thought (CoT) reasoning, making it well-suited for complex diagnostic and treatment-based tasks.


**Reference:**
@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs},
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925},
}


In [10]:
# Download the first 500 training examples of the dataset from Hugging Face
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset

README.md:   0%|          | 0.00/1.25k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/74.1M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/25371 [00:00<?, ? examples/s]

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [11]:
# Show an entry from the dataset
dataset[1]

{'Question': 'A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis?',
 'Complex_CoT': "Alright, let’s break this down. We have a 45-year-old man here, who suddenly starts showing some pretty specific symptoms: dysarthria, shuffling gait, and those intention tremors. This suggests something's going wrong with motor control, probably involving the cerebellum or its connections.\n\nNow, what's intriguing is that he's had a history of alcohol use, but he's been off it for the past 10 years. Alcohol can do a number on the cerebellum, leading to degeneration, and apparently, the effects can hang around or even appear long after one stops drinking.\n\nAt first glance, these symptoms look like they could be some kind of chronic degeneration, maybe something like alcoholic cerebellar degeneration, 

Now, let's structure the fine-tuning dataset according to the updated prompt style (`train_prompt_style`).



  *   **Structure:** *question => CoT reasoning => final response*
  *   Ensures the model is following **a consistent pattern.**




In [12]:
# Get the end-of-sequence token to append to every training example
EOS_TOKEN=tokenizer.eos_token
EOS_TOKEN

'<｜end▁of▁sentence｜>'

During fine-tuning, we want to ensure that each response **properly ends with this token.**



In [13]:
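# Define the prompt-formatting function
def formatting_prompts_function(examples):
    # With batched=True, each column arrives as a list of values
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]

    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # Fill the training template and terminate the example with the EOS token
        text = train_prompt_style.format(question, cot, response + EOS_TOKEN)
        texts.append(text)

    return {"text": texts}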
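To see what such a function receives and returns, it can be run on a tiny hand-made batch. With `batched=True`, `Dataset.map` passes a dict that maps each column name to a list of values. The sketch below is self-contained: the template is shortened, `"<eos>"` is a stand-in for the real `tokenizer.eos_token`, and `format_batch` and `toy_batch` are hypothetical names.

```python
EOS = "<eos>"  # stand-in; the notebook reads the real token from tokenizer.eos_token

# Shortened stand-in for the training template: question, reasoning, answer
template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def format_batch(batch: dict) -> dict:
    """Turn a batch of columns (dict of lists) into a list of training strings."""
    texts = [
        template.format(question, cot, answer + EOS)
        for question, cot, answer in zip(
            batch["Question"], batch["Complex_CoT"], batch["Response"]
        )
    ]
    return {"text": texts}


toy_batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["A1", "A2"],
}

out = format_batch(toy_batch)
print(out["text"][0])
```

Every formatted example ends with the EOS marker, which is what teaches the model where a response stops.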



In [14]:
# Apply the formatting function to produce the dataset used for fine-tuning

dataset_finetune=dataset.map(formatting_prompts_function,batched=True)
dataset_finetune['text'][0]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

" Below is an instruction that describes a task. Write a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\nPlease answer the following medical question.\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a 

## Setting up the model using LoRA (Low-Rank Adaptation)

Instead of updating all *8B parameters*, we'll train only a small set of added low-rank adapter weights (about 42 million here, roughly 0.5% of the model).

=> **Less memory will be used, and the training will be accelerated.**

In [18]:
# Apply LoRA fine-tuning to the model

model_lora=FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: sets the size of the trainable adapter matrices
    target_modules=["q_proj",
                    "k_proj",
                    "v_proj",
                    "o_proj",
                    "gate_proj",
                    "up_proj",
                    "down_proj",
    ],  # Attention and MLP projections that receive LoRA adapters
    lora_alpha=16,  # Scaling factor applied to the adapter update
    lora_dropout=0,  # No dropout on the adapter inputs
    bias="none",  # Bias terms are not trained
    use_gradient_checkpointing="unsloth",  # Saves memory by recomputing activations
    random_state=3407,
    use_rslora=False,  # Rank-stabilized LoRA is not used
    loftq_config=None,  # LoftQ initialization is not used

)

Unsloth 2025.1.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
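The number of trainable parameters follows directly from the rank and the shapes of the targeted layers: a LoRA adapter on a linear layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` weights. The sketch below works this out; the layer sizes are an assumption (the standard Llama 3.1 8B architecture), since the notebook does not state them.

```python
# Assumed Llama-3.1-8B shapes (not stated in the notebook)
HIDDEN = 4096          # model width
INTERMEDIATE = 14336   # MLP width
KV_DIM = 1024          # 8 key/value heads x 128 dimensions each
N_LAYERS = 32
RANK = 16              # r in the LoRA configuration

# (input size, output size) of every targeted projection in one layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Adapter weights: matrix A is (r x d_in), matrix B is (d_out x r)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in shapes.values())
total = per_layer * N_LAYERS
print(f"{total:,}")  # prints 41,943,040
```

Under these assumptions the result agrees with the trainable-parameter count that Unsloth reports when training starts.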


Below, we'll **fine-tune the model with minimal GPU memory** using optimisation techniques such as:


*  Gradient accumulation
*  16-bit floating point precision (FP16/BF16)
*  An 8-bit optimiser (`adamw_8bit`)
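Gradient accumulation keeps the per-step memory of a small batch while the optimizer still sees the gradient of a larger one. A short arithmetic sketch, using the batch size, accumulation steps, step limit, and dataset size that appear in this notebook:

```python
per_device_batch_size = 2        # examples processed per forward/backward pass
gradient_accumulation_steps = 4  # passes accumulated before each optimizer step
num_gpus = 1
max_steps = 60                   # optimizer steps
dataset_size = 500               # training examples

# Examples that contribute to a single optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Total examples consumed and the fraction of the dataset that represents
examples_seen = effective_batch * max_steps
epochs = examples_seen / dataset_size

print(effective_batch, examples_seen, epochs)  # prints 8 480 0.96
```

So the run covers slightly less than one full pass over the 500 examples.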






In [21]:
# Initialise the trainer
trainer=SFTTrainer(
    model=model_lora,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,  # Number of processes used to preprocess the dataset
    tokenizer=tokenizer,

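    # Define the training hyperparameters (arguments)
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size = 2 x 4 = 8
        num_train_epochs=1,  # overridden by max_steps below
        warmup_steps=5,
        max_steps=60,  # training stops after 60 optimizer steps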
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01, # small regularization to avoid overfitting
        lr_scheduler_type = "linear",
        seed=3407,
        output_dir= "outputs",
    )
)




Map (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]
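The `warmup_steps`, `max_steps`, `learning_rate`, and `lr_scheduler_type="linear"` arguments together describe a learning rate that ramps up and then decays to zero. The sketch below reproduces that shape in plain Python; it is an illustration of the usual linear warmup and decay rule, assumed to mirror the scheduler, and `linear_lr` is a hypothetical helper rather than library code.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

The rate peaks right after warmup and reaches zero at the last step.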

## Model Training

In [22]:
# Start the fine-tuning process
trainer_stats=trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|------|---------------|
| 10 | 1.9014 |
| 20 | 1.4845 |
| 30 | 1.4254 |
| 40 | 1.3311 |
| 50 | 1.3673 |
| 60 | 1.3382 |


In [23]:
# Close the W&B run and upload the final training summary
wandb.finish()

| Metric | History |
|--------|---------|
| train/epoch | ▁▂▄▅▇██ |
| train/global_step | ▁▂▄▅▇██ |
| train/grad_norm | █▂▃▁▂▂ |
| train/learning_rate | █▇▅▄▂▁ |
| train/loss | █▃▂▁▁▁ |

| Metric | Value |
|--------|-------|
| total_flos | 1.7732935437828096e+16 |
| train/epoch | 0.96 |
| train/global_step | 60.0 |
| train/grad_norm | 0.26339 |
| train/learning_rate | 0.0 |
| train/loss | 1.3382 |
| train_loss | 1.47465 |
| train_runtime | 1263.4209 |
| train_samples_per_second | 0.38 |
| train_steps_per_second | 0.047 |


## Running the fine-tuned model and testing it

In [24]:
# Creating a test question
question= """ A 56-year-old male with a history of hypertension and type 2 diabetes
              presents to the emergency department with acute-onset chest pain radiating to the left arm.
              His ECG shows ST-segment elevation in leads II, III, and aVF.
              Based on this presentation, what is the most likely diagnosis,
              and what immediate steps should be taken for management? Provide a step-by-step clinical reasoning process leading to your conclusion."""

# Enable optimized inference mode for Unsloth models in order to improve speed and memory efficiency
FastLanguageModel.for_inference(model_lora)

# Format the question using the structured prompt ('prompt_style') and tokenize it
inputs=tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generating the response using the fine-tuned model
outputs=model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,  # marks which tokens are real input and which are padding
    max_new_tokens=1200,
    use_cache=True,

)

# Decode the generated output tokens into readable text
response=tokenizer.batch_decode(outputs)

# Extract and print only the relevant answer part (without the prompt)
print(response[0].split("### Response:")[1])





<think>
Alright, let's see what we've got here. We've got a 56-year-old guy who's been dealing with hypertension and type 2 diabetes. He's come in with sudden chest pain that's spreading to his left arm. Hmm, that's definitely a red flag for a heart issue. 

Now, his ECG shows ST-segment elevation in leads II, III, and aVF. That's interesting. I know that ST-segment elevation usually points to a heart attack, but it's not always straightforward. Let's think about this. If it's a heart attack, the ST elevation is usually in the left precordial leads (I, II, III) or the inferior wall leads (aVF). Here, we're seeing ST elevation in II, III, and aVF. That makes me think it's a inferior wall myocardial infarction. 

Oh, and there's also chest pain. That's classic for a heart attack, right? But let's not jump to conclusions yet. We need to make sure it's not something else. 

So, I'm thinking, what else could it be? Could it be a left ventricular aneurysm? That's a possibility, but I haven'