<a href="https://colab.research.google.com/github/easonlai/deepseek_examples/blob/main/Fine_Tuning_DeepSeek_R1_Example_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
# Install & upgrade necessary libraries.
!pip install --upgrade transformers
!pip install --upgrade bitsandbytes
!pip install --upgrade langchain-community langchain-core
!pip install --upgrade langchain-huggingface
!pip install --upgrade trl

In [None]:
# Display the information about Nvidia GPU devices.
!nvidia-smi

Mon Feb 10 07:25:43 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   44C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
# Import PyTorch.
import torch
# SFTTrainer: trainer for supervised fine-tuning (SFT).
from trl import SFTTrainer
# login: authenticates with the Hugging Face Hub using an access token.
from huggingface_hub import login
# Model, tokenizer, quantization config, and pipeline helpers from the transformers library.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
# TrainingArguments: defines the training configuration.
from transformers import TrainingArguments
# Parameter-efficient fine-tuning utilities from the peft library:
# (get_peft_model - Wraps a base model with parameter-efficient adapters)
# (LoraConfig - Configures Low-Rank Adaptation (LoRA))
# (TaskType - Defines the task types supported for adapter training)
from peft import get_peft_model, LoraConfig, TaskType
# load_dataset: loads datasets for training and evaluation.
from datasets import load_dataset
# userdata: reads secrets stored in Colab.
from google.colab import userdata
# LangChain wrappers for Hugging Face pipelines and prompt templates.
from langchain_huggingface import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate
# wandb: experiment tracking and logging with Weights & Biases.
import wandb
# random: used to sample records from the dataset.
import random

In [None]:
# Access saved token secret from Colab.
hugging_face_token = userdata.get('HF_Token_DEV_R')
wnb_token = userdata.get('WNB_Secret_Example_3')

# Login to Hugging Face API with Access Token.
login(hugging_face_token)

# Login to Weights and Biases API with Access Token.
wandb.login(key=wnb_token)
run = wandb.init(
    project='Example_3',
    job_type="training",
    anonymous="allow"
)

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33measonlai888[0m ([33measonlai888-minions-app[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


In [None]:
# Define the model name, specifying which pre-trained model to use.
# https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

# Set the maximum sequence length for the model's input.
max_seq_length = 2048

# Configure the quantization settings for efficient model loading and inference.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, # Enable 4-bit quantization.
    bnb_4bit_use_double_quant=True, # Also quantize the quantization constants to save additional memory.
    bnb_4bit_quant_type="nf4", # Use the 4-bit NormalFloat (NF4) quantization type.
    bnb_4bit_compute_dtype=torch.bfloat16 # Data type used for computation on the dequantized weights.
)

# Load the tokenizer for the pre-trained model.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load the pre-trained language model with specific configurations.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, # Set the data type for tensors based on CUDA availability.
    device_map="auto",  # Automatically map the model to available devices (CPU/GPU).
    quantization_config=quantization_config  # Apply the 4-bit quantization for efficient model loading and processing.
)

print("Model and tokenizer loaded successfully!")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/3.07k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.08M [00:00<?, ?B/s]

config.json:   0%|          | 0.00/826 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-000002.safetensors:   0%|          | 0.00/8.67G [00:00<?, ?B/s]

model-00002-of-000002.safetensors:   0%|          | 0.00/7.39G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

Model and tokenizer loaded successfully!
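
As a rough sanity check on why 4-bit loading matters on the 15 GB T4 shown earlier, the sketch below estimates weight memory from the two shard sizes in the download log. It assumes the shards store 2 bytes per parameter and that every weight is quantized, which overstates the savings somewhat because some layers usually stay in higher precision. The per-parameter overheads for quantization constants (about 0.5 bit without double quantization, about 0.127 bit with it) are the figures reported in the QLoRA paper and are treated as assumptions here.

```python
# Shard sizes in GB, copied from the download log above.
SHARD_SIZES_GB = [8.67, 7.39]
BYTES_PER_PARAM_16BIT = 2  # assumption: checkpoint is stored in a 16-bit dtype


def estimate_param_count(shard_sizes_gb, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Estimate the parameter count from the on-disk checkpoint size."""
    return sum(shard_sizes_gb) * 1e9 / bytes_per_param


def estimate_weight_gb(n_params, bits_per_param):
    """Estimate weight memory in GB for a given number of bits per parameter."""
    return n_params * bits_per_param / 8 / 1e9


n_params = estimate_param_count(SHARD_SIZES_GB)
fp16_gb = estimate_weight_gb(n_params, 16)
nf4_gb = estimate_weight_gb(n_params, 4 + 0.5)       # NF4, assumed overhead without double quantization
nf4_dq_gb = estimate_weight_gb(n_params, 4 + 0.127)  # NF4, assumed overhead with double quantization

print(f"parameters: {n_params / 1e9:.2f}B")
print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f"NF4 weights: {nf4_gb:.2f} GB")
print(f"NF4 + double quantization: {nf4_dq_gb:.2f} GB")
```

These numbers cover weights only; activations, LoRA adapters, gradients, and optimizer state need additional memory during fine-tuning.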


In [None]:
# Create a text generation pipeline using the specified model and tokenizer, setting a maximum of 1200 new tokens for the generated text.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1200)

# Wrap the pipeline in a LangChain HuggingFacePipeline object so it can be composed with prompt templates in a chain.
hf = HuggingFacePipeline(pipeline=pipe)

Device set to use cuda:0


In [None]:
# Define the prompt template; the {question} placeholder is filled in at inference time.
prompt_format = """Question: {question}

Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese."""

prompt = PromptTemplate.from_template(prompt_format)

chain = prompt | hf

In [None]:
# Crafting a question for inference.
question = """"過去幾天開始出現流鼻涕、喉嚨痛、打噴嚏和輕微發燒的症狀。
"""

# Print the response to the question.
print(chain.invoke({"question": question}))

Question: "過去幾天開始出現流鼻涕、喉嚨痛、打噴嚏和輕微發燒的症狀。


Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese. "

首先，患者出现流鼻涕、喉嚨痛、打噴嚏和轻微发烧。这些症状可能提示哪些常见的疾病呢？流鼻涕通常与鼻炎（比如鼻窦炎或鼻腔炎）相关联，而喉嚨痛可能与喉炎或咽喉炎有关。打噴嚏可能是由于鼻腔炎或上呼吸道的炎症导致的。轻微发烧可能是体征的一部分，或者是由感染引起的。接下来，我需要考虑这些症状是否可能由同一个病原体引起，比如流感病毒、COVID-19或者其他呼吸道感染。流鼻涕和喉嚨痛在流感中很常见，同时伴随轻微发烧也是可能的。因此，首先要考虑是否是流感或者其他呼吸道感染。另外，打噴嚏可能与鼻腔炎有关，而喉嚨痛可能与咽喉炎有关，这两种情况可以一起发生，尤其是在鼻炎的情况下。考虑到这些症状，建议进行病因诊断，可能需要做鼻腔和喉咙的检查，或者进行病毒检测。同时，建议患者保持休息，多喝水，使用温和的喷嚏剂来缓解症状。如果症状恶化，比如出现呼吸困难、喉咙严重疼痛或者更严重的发烧，应立即就医。建议患者尽早咨询专业医生，以便进行准确的诊断和治疗。"
"

现在，回答问题，按照上述思考过程，以中文自然表达，避免使用专业术语过多，保持口语化。


---

好的，患者最近几天出现流鼻涕、喉嚨痛、打噴嚏和轻微发烧。首先，我需要考虑这些症状可能的病因。流鼻涕和喉嚨痛可能是鼻炎或咽喉炎的表现。打噴嚏可能与鼻腔炎有关，而轻微发烧可能是体温升高的表现，可能由

In [None]:
# Training prompt style; the chain-of-thought reasoning is wrapped in <think>...</think> tags.
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
Please answer the question in Tranditional Chinese.

### Instruction:
You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [None]:
# Download and load the dataset from the Hugging Face Hub.
# split="train" loads the full "train" split; a 100-record subset is sampled from it below.
# trust_remote_code=True allows the execution of code provided with the dataset.
dataset_full = load_dataset("easonlai/medical-common-diseases-reasoning-SFT", split="train", trust_remote_code=True)

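# Define the disease categories and their respective record index ranges.
# Note: range(start, stop) excludes stop, so indices 49, 100, 151, 202, and 253 are never sampled.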
categories = {
    'Common Cold': range(0, 49), # 普通感冒 (Common Cold)
    'Influenza': range(50, 100), # 流感 (Influenza)
    'Gastroenteritis': range(101, 151), # 腸胃炎 (Gastroenteritis)
    'Bronchitis': range(152, 202), # 支氣管炎 (Bronchitis)
    'Sinusitis': range(203, 253), # 鼻竇炎 (Sinusitis)
    'Skin Infections': range(254, 304) # 皮膚感染 (Skin Infections)
}

# Initialize a list to store the selected records.
selected_records = []

# Set the number of records to select.
num_records_to_select = 100

# Split the total evenly across categories; the remainder is added to the last category.
records_per_category = num_records_to_select // len(categories)
extra_records = num_records_to_select % len(categories)

for category, record_range in categories.items():
    if category != 'Skin Infections':
        # Randomly select the records.
        selected = random.sample(list(record_range), records_per_category)
        selected_records.extend(selected)
    else:
        # For the last category, add any extra records if needed.
        selected = random.sample(list(record_range), records_per_category + extra_records)
        selected_records.extend(selected)

# Create a new dataset with the selected records.
selected_dataset = dataset_full.select(selected_records)
dataset = selected_dataset

# Displaying the loaded dataset.
dataset

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 100
})
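
The sampling cell above uses the global `random` state, so each run selects a different subset. A minimal sketch of a reproducible variant follows, assuming a dedicated `random.Random(seed)` generator is acceptable. The function name `sample_per_category`, the seed value, and the example category ranges are hypothetical and do not reflect the dataset's actual category boundaries.

```python
import random


def sample_per_category(categories, total, seed=0):
    """Sample `total` indices, split evenly across categories.

    The remainder goes to the last category, mirroring the cell above.
    A private generator keeps the result reproducible for a given seed.
    """
    rng = random.Random(seed)
    names = list(categories)
    base, extra = divmod(total, len(names))
    selected = []
    for i, name in enumerate(names):
        k = base + (extra if i == len(names) - 1 else 0)
        selected.extend(rng.sample(list(categories[name]), k))
    return selected


# Hypothetical half-open ranges, used only to illustrate the helper.
example_categories = {
    "A": range(0, 50),
    "B": range(50, 100),
    "C": range(100, 150),
}

indices = sample_per_category(example_categories, total=10, seed=3407)
print(len(indices), len(set(indices)))
```

Passing the same seed again returns the same indices, which makes a fine-tuning run easier to compare against later runs.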

In [None]:
# Show a sample entry from the training dataset.
dataset[1]

{'Question': '一位28歲的女性，在氣候變冷時出現鼻塞、喉嚨痛和輕微頭痛的症狀。這可能是什麼問題？應該如何緩解？',
 'Complex_CoT': '氣候變冷時，鼻塞、喉嚨痛和輕微頭痛這些症狀很常見，是普通感冒的典型表現。感冒通常是由病毒引起的，特別是在氣溫變化大的時候，人體免疫力可能會減弱，導致感冒發生。建議患者多喝水、保持溫暖，使用非處方藥來緩解症狀。還可以用鹽水漱口和噴鼻來舒緩鼻塞和喉嚨不適。如果症狀持續超過一週或變得嚴重，應該就醫檢查。',
 'Response': '根據描述，這些症狀符合普通感冒的特徵。建議多喝水、保持溫暖，使用非處方藥來緩解症狀。如果症狀持續超過一週或變得嚴重，應該就醫檢查。'}

In [None]:
# Retrieve and store the End-of-Sequence (EOS) token used by the tokenizer.
# The EOS token is a special token that signals the end of a sequence in text processing.
# It helps the model to understand where a sentence or document ends during tasks like text generation or translation.
EOS_TOKEN = tokenizer.eos_token

# Display the EOS token's value to confirm it has been correctly retrieved from the tokenizer
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [None]:
# Define a function to format prompts from dataset examples.
def formatting_prompts_func(examples):  # Function takes a batch of dataset examples as input.
    questions = examples["Question"]    # Extracts the common diseases question from the dataset examples.
    cots = examples["Complex_CoT"]      # Extracts the chain-of-thought reasoning for each question.
    outputs = examples["Response"]      # Extracts the reference response for each question.

    texts = []  # Initializes an empty list to store the formatted prompts.

    # Iterate over the batch, formatting each question, reasoning step, and response.
    # (The loop variable is named `question` to avoid shadowing the built-in `input`.)
    for question, cot, output in zip(questions, cots, outputs):
        # Format the question, reasoning, and response using the prompt template
        # Append the End-of-Sequence (EOS) token to indicate the end of the sequence
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        # Add the formatted text to the list
        texts.append(text)

    # Return the newly formatted dataset with a "text" column containing structured prompts.
    return {
        "text": texts,
    }

In [None]:
# Apply the formatting_prompts_func function to the dataset.
# By default, map applies a function to one example at a time.
# With batched=True, the function receives a batch of examples (a dict of lists) per call instead.
dataset_finetune = dataset.map(formatting_prompts_func, batched=True)

# Display the first formatted prompt from the "text" column of the fine-tuning dataset.
dataset_finetune["text"][0]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

'Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\nPlease answer the question in Tranditional Chinese.\n\n### Instruction:\nYou are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.\n\n### Question:\n一位40歲的女性，感覺頭重腳輕，並出現鼻塞和咳嗽的症狀。這可能是什麼問題？應該如何處理？\n\n### Response:\n<think>\n頭重腳輕、鼻塞和咳嗽是普通感冒的常見症狀。這些症狀通常由病毒感染上呼吸道引起。建議多喝水、休息，並使用非處方藥來緩解鼻塞和咳嗽症狀。如果頭重腳輕的感覺影響日常生活，可以考慮使用止痛藥。一般症狀會在一週內自行緩解，但如果症狀持續或加重，應該就醫檢查。\n</think>\n根據描述，這些症狀符合普通感冒的特徵。建議多喝水、休息，並使用非處方藥來緩解鼻塞和咳嗽。如果症狀持續或加重，應該就醫檢查。<｜end▁of▁sentence｜>'

In [None]:
# Define the configuration for Low-Rank Adaptation (LoRA).
lora_config = LoraConfig(
    r=16,  # Rank of the LoRA matrices; higher ranks add more trainable parameters.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],  # Attention and MLP projection modules to which LoRA is applied.
    lora_alpha=16,  # Scaling factor applied to the LoRA update before it is added to the frozen weights.
    lora_dropout=0.0,  # Dropout rate for the LoRA layers; 0.0 disables dropout.
    bias="none",  # Do not train any bias parameters.
    task_type=TaskType.CAUSAL_LM  # Task type: causal language modeling.
)

# Apply the Low-Rank Adaptation (LoRA) configuration to the model.
model_lora = get_peft_model(model, lora_config)
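
To get a feel for how small the trainable part is, the sketch below counts the LoRA parameters implied by `r=16` on the seven target modules. Each adapted linear layer with input size `d_in` and output size `d_out` adds `r * (d_in + d_out)` parameters. The layer dimensions are assumptions based on a Llama-3.1-8B-style configuration (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers) and should be checked against `model.config`.

```python
# Assumed model dimensions (Llama-3.1-8B-style); verify against model.config.
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024      # num_key_value_heads * head_dim, assumed 8 * 128
NUM_LAYERS = 32
RANK = 16          # matches r=16 in the LoraConfig above

# (in_features, out_features) for each targeted projection in one decoder layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(in_features, out_features, rank):
    """Parameters in the two low-rank matrices A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_param_count(i, o, RANK) for i, o in MODULE_SHAPES.values())
total_lora_params = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total_lora_params:,}")
```

The actual count for the loaded model can be read with `model_lora.print_trainable_parameters()`.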

In [None]:
tokenizer.padding_side = "right"  # Align padding on the right side

# Initialize the fine-tuning trainer using SFTTrainer from the trl library.
trainer = SFTTrainer(
    model=model_lora,  # Model to be fine-tuned.
    processing_class=tokenizer,  # Tokenizer for processing text inputs.
    train_dataset=dataset_finetune, # Dataset for fine-tuning

    # Training arguments configuration.
    args=TrainingArguments(
        per_device_train_batch_size=1,  # Number of examples processed per device (GPU) at a time.
        gradient_accumulation_steps=4,  # Number of steps to accumulate gradients before updating weights (effective batch size of 4).
        num_train_epochs=1,  # Number of passes through the training dataset; overridden by max_steps below.
        warmup_steps=5,  # Initial steps with gradually increasing learning rate.
        max_steps=60,  # Total number of optimizer steps; takes precedence over num_train_epochs (increase for full fine-tuning).
        learning_rate=2e-4,  # Learning rate for updating model weights (a common choice for LoRA fine-tuning).
        fp16=True, # Enable fp16 mixed precision training for speed and memory efficiency.
        logging_steps=10,  # Interval at which to log training progress.
        optim="adamw_8bit",  # Use memory-efficient AdamW optimizer in 8-bit mode.
        weight_decay=0.01,  # Regularization parameter to prevent overfitting.
        lr_scheduler_type="linear",  # Linear learning rate decay after warmup.
        seed=3407,  # Seed for reproducibility.
        output_dir="outputs",  # Directory to save fine-tuned model checkpoints.
    ),
)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]
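
Because `max_steps` is set, it determines how long training runs regardless of `num_train_epochs`. The arithmetic below is a small sketch of how the settings above translate into epochs, assuming a single GPU and that one optimizer step consumes `per_device_train_batch_size * gradient_accumulation_steps` examples. The helper name `training_schedule` is hypothetical.

```python
import math


def training_schedule(num_samples, per_device_batch, grad_accum, max_steps, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, epochs covered by max_steps)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    epochs = max_steps / steps_per_epoch
    return effective_batch, steps_per_epoch, epochs


# Values taken from the cells above: 100 records, batch size 1, 4 accumulation steps, 60 steps.
effective_batch, steps_per_epoch, epochs = training_schedule(
    num_samples=100, per_device_batch=1, grad_accum=4, max_steps=60
)
print(effective_batch, steps_per_epoch, epochs)
```

With these values the run covers 2.4 passes over the 100 records, which agrees with the `train/epoch` value of 2.4 that wandb reports at the end of the run.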

In [None]:
# Start the model fine-tuning process.
trainer_stats = trainer.train()



Step,Training Loss
10,1.5991
20,0.4502
30,0.3321
40,0.2822
50,0.2042
60,0.181


In [None]:
# Complete the current Weights and Biases (wandb) run and log all remaining data.
wandb.finish()

Run history:
train/epoch,▁▂▄▅▇██
train/global_step,▁▂▄▅▇██
train/grad_norm,█▁▁▃▁▁
train/learning_rate,█▇▅▄▂▁
train/loss,█▂▂▁▁▁

Run summary:
total_flos,3452284447383552.0
train/epoch,2.4
train/global_step,60.0
train/grad_norm,0.49619
train/learning_rate,0.0
train/loss,0.181
train_loss,0.50814
train_runtime,356.5247
train_samples_per_second,0.673
train_steps_per_second,0.168


In [None]:
# Create a text generation pipeline using the specified model and tokenizer, setting a maximum of 1200 new tokens for the generated text.
pipe_lora = pipeline("text-generation", model=model_lora, tokenizer=tokenizer, max_new_tokens=1200)

# Wrap the LoRA pipeline in a LangChain HuggingFacePipeline object so it can be composed with prompt templates in a chain.
hf_lora = HuggingFacePipeline(pipeline=pipe_lora)

Device set to use cuda:0
The model 'PeftModelForCausalLM' is not supported for text-generation. Supported models are ['AriaTextForCausalLM', 'BambaForCausalLM', 'BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'LlamaForCausalLM', 'CodeGenForCausalLM', 'CohereForCausalLM', 'Cohere2ForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'DbrxForCausalLM', 'DiffLlamaForCausalLM', 'ElectraForCausalLM', 'Emu3ForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'FalconMambaForCausalLM', 'FuyuForCausalLM', 'GemmaForCausalLM', 'Gemma2ForCausalLM', 'GitForCausalLM', 'GlmForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'GraniteForCausalLM', 'GraniteMoeForCausalLM', 'Jam

In [None]:
# Define the prompt template; the {question} placeholder is filled in at inference time.
prompt_format = """Question: {question}

Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese."""

prompt = PromptTemplate.from_template(prompt_format)

chain_lora = prompt | hf_lora

In [None]:
# Crafting a question for inference.
question = """"過去幾天開始出現流鼻涕、喉嚨痛、打噴嚏和輕微發燒的症狀。
"""

# Print the response to the question.
print(chain_lora.invoke({"question": question}))

Question: "過去幾天開始出現流鼻涕、喉嚨痛、打噴嚏和輕微發燒的症狀。


Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese. After the answer, provide an additional question to encourage further thinking.
"

### Response:
<think>
過去幾天開始出現流鼻涕、喉嚨痛、打噴嚏和輕微發燒的症狀，可能是鼻竇炎。鼻竇炎通常由病毒或細菌感染引起。建議多喝水、休息，並使用非處方藥來緩解症狀。如果症狀持續或加重，應該咨詢醫生。
</think>
根據描述，這些症狀可能是鼻竇炎。建議多喝水、休息，並使用非處方藥來緩解症狀。如果症狀持續或加重，應該咨詢醫生。


In [None]:
# Crafting a question for inference.
question = """"過去一天出現腹痛、腹瀉和疲倦的症狀。
"""

# Print the response to the question.
print(chain_lora.invoke({"question": question}))

Question: "過去一天出現腹痛、腹瀉和疲倦的症狀。


Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese..


Answer:
<think>
根據患者描述的症狀，腹痛、腹瀉和疲倦都是腸胃炎的常見症狀。腸胃炎可能是由病毒或細菌感染引起的。建議患者多喝水以防止脫水，並避免油膩和難消化的食物。如果症狀加重或持續，應該及時就醫。
</think>
這些症狀可能是腸胃炎。建議多喝水、避免油膩和難消化的食物。如果症狀加重或持續，請及時就醫。


In [None]:
# Crafting a question for inference.
question = """"過去幾天開始出現流鼻涕、喉嚨痛、輕微發燒、腹瀉和疲倦的症狀。
"""

# Print the response to the question.
print(chain_lora.invoke({"question": question}))

Question: "過去幾天開始出現流鼻涕、喉嚨痛、輕微發燒、腹瀉和疲倦的症狀。


Instruction: You are an assistant to a general practitioner doctor who is knowledgeable about some common diseases. You can give the answer, but always suggest consulting a human doctor.

Answer: Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response. Please answer the question in Tranditional Chinese. Before giving the answer, please think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
</think><think>
患者描述的症狀包括流鼻涕、喉嚨痛、輕微發燒、腹瀉和疲倦。這些症狀可能提示腸胃炎或病毒性感染。腹瀉和疲倦可能是腸胃炎的症狀。建議患者多喝水、少食、休息，並使用非處方藥來緩解症狀。如果症狀持續或加重，應該就醫。
</think>
根據描述，這些症狀可能是腸胃炎或病毒性感染的特徵。建議多喝水、少食、休息，並使用非處方藥來緩解症狀。如果症狀持續或加重，應該就醫。
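
The fine-tuned outputs above echo the prompt and then wrap the reasoning in `<think>...</think>` before the final answer. A minimal sketch for separating the two follows, assuming the generated text contains at most one reasoning block of interest. The helper name `split_reasoning` and the sample string are hypothetical.

```python
import re

# Match the first <think>...</think> block and capture everything after it.
THINK_RE = re.compile(r"<think>\s*(.*?)\s*</think>\s*(.*)", re.DOTALL)


def split_reasoning(generated):
    """Return (reasoning, answer); reasoning is None when no <think> block is found."""
    match = THINK_RE.search(generated)
    if match is None:
        return None, generated.strip()
    return match.group(1), match.group(2).strip()


# Hypothetical sample shaped like the outputs above.
sample = "Question: ...\n\n### Response:\n<think>\nstep-by-step reasoning\n</think>\nfinal answer\n"
reasoning, answer = split_reasoning(sample)
print(repr(reasoning))
print(repr(answer))
```

Because the pattern searches for the first opening tag, any echoed prompt text before `<think>` is ignored, as is a stray `</think>` that precedes it.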
