### Installation

In [None]:
%%capture
!pip install unsloth
# Install the latest version of Unsloth from GitHub
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [None]:
!pip install --upgrade transformers




In [None]:
!pip install git+https://github.com/huggingface/peft.git

Collecting git+https://github.com/huggingface/peft.git
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-req-build-5hektwt_
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-5hektwt_
  Resolved https://github.com/huggingface/peft.git to commit 1e2d6b5832401e07e917604dfb080ec474818f2b
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


### Unsloth

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
import os
from huggingface_hub import login

# Read the Hugging Face token from an environment variable instead of
# hardcoding it in the notebook (HF_TOKEN is an assumed variable name).
hf_token = os.environ["HF_TOKEN"]
login(hf_token)

In [None]:
import os
import wandb

# Read the W&B API key from an environment variable instead of
# hardcoding it in the notebook (WANDB_API_KEY is an assumed variable name).
wb_token = os.environ["WANDB_API_KEY"]

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit on SylvanL Traditional-Chinese-Medicine-Dataset-SFT',
    job_type="training",
    anonymous="allow"
)

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: chord-0322 (chord-0322-cc) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.


## Loading the model and tokenizer

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Qwen-1.5B-unsloth-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_token,
)

==((====))==  Unsloth 2025.2.12: Fast Qwen2 patching. Transformers: 4.49.0.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Model inference before fine-tuning

In [None]:
prompt_style = """下面是一条描述任务的指令。
输出一个适当的完成任务的响应。
在回答之前，仔细思考问题，并创建一个循序渐进的思路链，以确保逻辑和准确的回答。

### instruction:
你是一名中医专家，请回答下面的问题

### input:
{}

### output:
<think>{}"""

In [None]:
import torch

question = "什么是冬温？"

# Switch the model loaded above into inference mode
FastLanguageModel.for_inference(model)

# model.config.torch_dtype = torch.float16

# Fill the prompt_style template defined above and tokenize it
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a completion
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
# Decode the generated token IDs back into text
response = tokenizer.batch_decode(outputs)
# Print the decoded result
print(response[0])

<｜begin▁of▁sentence｜>下面是一条描述任务的指令。
输出一个适当的完成任务的响应。
在回答之前，仔细思考问题，并创建一个循序渐进的思路链，以确保逻辑和准确的回答。

### instruction:
你是一名中医专家，请回答下面的问题

### input:
什么是冬温？

### output:
<think>
好，我现在需要回答用户关于“冬温”的问题。冬温指的是冬季的温暖，通常与寒冷相伴随。首先，我应该确认冬温的定义，它是指冬季的温暖，通常与寒冷形成对比。然后，我应该解释它的意义，帮助用户理解其重要性。接着，我可以举一些例子，比如冬天的温暖环境对身体的影响，比如增强免疫力。最后，总结一下冬温在中医中的应用，比如在治疗中如何利用冬温来促进身体恢复。这样，用户就能全面了解冬温的含义及其在中医中的应用。
</think>

冬温是指冬季的温暖，通常与寒冷形成对比。它是指冬天的温度高于夏季，具有一定的温度优势。冬温在中医中具有重要的意义，它能够增强人体的免疫力，促进血液循环，帮助身体恢复。在中医的治疗中，冬温常被用于治疗各种健康问题，比如增强免疫力、促进消化、增强体质等。<｜end▁of▁sentence｜>
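
The decoded response above mixes the prompt, the reasoning inside `<think>...</think>`, and the final answer. A minimal sketch of separating them is shown here; the helper name `split_reasoning` is hypothetical, and it assumes the prompt ends with an opening `<think>` tag and that the end-of-sentence marker is the string printed in the output above.

```python
def split_reasoning(decoded: str, eos: str = "<｜end▁of▁sentence｜>"):
    """Split a decoded completion into (reasoning, answer)."""
    # Keep only the text generated after the opening <think> tag.
    _, _, completion = decoded.partition("<think>")
    completion = completion.replace(eos, "").strip()
    # If the model never closes the tag, treat everything as the answer.
    if "</think>" not in completion:
        return "", completion
    reasoning, _, answer = completion.partition("</think>")
    return reasoning.strip(), answer.strip()


# Hypothetical sample with the same layout as the response printed above.
sample = "### output:\n<think>\nstep one\n</think>\n\nfinal answer<｜end▁of▁sentence｜>"
reasoning, answer = split_reasoning(sample)
print(reasoning)
print(answer)
```

In the notebook this could be applied as `split_reasoning(response[0])`.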


## Loading and processing the dataset

In [None]:
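EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # NOTE: prompt_style has only two placeholders, so str.format silently
        # ignores the third argument (output): instruction fills the
        # "### input:" slot and input_text lands after <think>.
        text = prompt_style.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }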


In [None]:
from datasets import load_dataset
# dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT","en", split = "train[0:500]",trust_remote_code=True)
dataset = load_dataset("SylvanL/Traditional-Chinese-Medicine-Dataset-SFT","default", split = "train[0:500]",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

README.md:   0%|          | 0.00/19.2k [00:00<?, ?B/s]

SFT_medicalKnowledge_source1_548404.json:   0%|          | 0.00/152M [00:00<?, ?B/s]

SFT_medicalKnowledge_source2_99334.json:   0%|          | 0.00/23.0M [00:00<?, ?B/s]

SFT_medicalKnowledge_source3_556540.json:   0%|          | 0.00/110M [00:00<?, ?B/s]

SFT_nlpDiseaseDiagnosed_61486.json:   0%|          | 0.00/109M [00:00<?, ?B/s]

SFT_nlpSyndromeDiagnosed_48665.json:   0%|          | 0.00/43.1M [00:00<?, ?B/s]

SFT_structGeneral_310860.json:   0%|          | 0.00/267M [00:00<?, ?B/s]

SFT_structPrescription_92896.json:   0%|          | 0.00/42.9M [00:00<?, ?B/s]

_SFT_traditionalTrans_1959542.json:   0%|          | 0.00/683M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/3677727 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

'下面是一条描述任务的指令。\n输出一个适当的完成任务的响应。\n在回答之前，仔细思考问题，并创建一个循序渐进的思路链，以确保逻辑和准确的回答。\n\n### instruction:\n你是一名中医专家，请回答下面的问题\n\n### input:\n\n\n### output:\n<think>为什么远行奔走时脚上会起泡？<｜end▁of▁sentence｜>'
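
The sample above contains no answer text: the template has two `{}` slots, but the formatting function passes three values, and `str.format` drops the extra one. A minimal sketch of one possible fix follows. The names `train_prompt_style`, `format_examples`, and the `<eos>` stand-in are hypothetical, the sample row is invented for illustration, and the template omits the `<think>` prefix on the assumption that the dataset answers carry no reasoning traces.

```python
# Hypothetical training template: the question goes under "### input:" and
# the reference answer follows "### output:".
train_prompt_style = """### instruction:
你是一名中医专家，请回答下面的问题

### input:
{}

### output:
{}"""

EOS = "<eos>"  # stand-in for tokenizer.eos_token


def format_examples(examples):
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["output"]
    ):
        # Assumption: the question sits in whichever of the two fields is non-empty.
        question = "\n".join(part for part in (instruction, input_text) if part)
        texts.append(train_prompt_style.format(question, output) + EOS)
    return {"text": texts}


# Invented placeholder row, not taken from the dataset.
batch = {
    "instruction": [""],
    "input": ["What is winter warmth?"],
    "output": ["A placeholder answer."],
}
print(format_examples(batch)["text"][0])
```

With this shape every training text ends with the answer followed by the end-of-sequence token, so the loss covers the answer rather than only the prompt.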

## Setting up the model

Workaround for the error `'PeftModelForCausalLM' object has no attribute '_unwrapped_old_generate'`: switch the model back to training mode before adding the LoRA adapters.

In [None]:
FastLanguageModel.for_training(model);

In [None]:
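# Attach low-rank adaptation (LoRA) adapters to the model for fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Rank of the low-rank matrices; a smaller rank means fewer trainable parameters
    # Target modules: the layers whose weights receive LoRA adapters
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,  # LoRA scaling factor; controls the magnitude of the low-rank update
    lora_dropout=0,  # Dropout probability for the LoRA layers (0 disables dropout)
    bias="none",  # Which bias terms to train ("none" trains no biases)
    use_gradient_checkpointing="unsloth",  # Use Unsloth's gradient checkpointing to save GPU memory
    random_state=3407,  # Random seed for reproducibility
    use_rslora=False,  # Whether to use rank-stabilized LoRA (rsLoRA), a LoRA variant
    loftq_config=None,  # LoftQ (a quantization-aware initialization method) configuration
)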
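
The effect of `r` and `target_modules` on model size can be checked with a short calculation. The sketch below counts the LoRA weights added by the configuration above. The layer shapes are assumptions taken from the Qwen2 1.5B architecture and are not stated in this notebook; the result agrees with the `Number of trainable parameters = 18,464,768` line in the training banner.

```python
# Assumed layer shapes (in_features, out_features) for the Qwen2 1.5B
# architecture; these numbers are not stated in the notebook.
HIDDEN, KV_DIM, MLP_DIM, NUM_LAYERS = 1536, 256, 8960, 28
module_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}
r, lora_alpha = 16, 16

# Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights.
per_layer = sum(r * (i + o) for i, o in module_shapes.values())
trainable = per_layer * NUM_LAYERS
scaling = lora_alpha / r  # standard LoRA scaling (use_rslora=False)

print(trainable)  # 18464768
print(scaling)    # 1.0
```

Because `lora_alpha` equals `r`, the low-rank update is applied with a scaling factor of 1.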



Workaround for the `half != float` dtype mismatch error:

In [None]:
model.config.torch_dtype = torch.float16

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Setting this to True can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Ignored here: max_steps below takes precedence.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs"
    ),
)

Tokenizing train dataset (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
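
The `warmup_steps`, `max_steps`, `learning_rate`, and `lr_scheduler_type = "linear"` settings above combine into a simple schedule: the learning rate rises linearly for 5 steps, then decays linearly to zero at step 60. A minimal sketch of that shape follows; `linear_lr` is a hypothetical helper written for illustration, not a function from Transformers.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 5, 30, 60):
    print(step, linear_lr(step))
```

The schedule reaching zero at the last step is consistent with the final `train/learning_rate` of `0.0` in the run summary.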


## Model training

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 18,464,768


Step,Training Loss
1,4.8645
2,4.8002
3,4.8199
4,4.6929
5,4.4581
6,4.2188
7,3.9213
8,3.4978
9,3.1979
10,3.0898


In [None]:
# Finish the W&B run so that the logged metrics are uploaded
wandb.finish()

Metric,History
train/epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇▇▇███
train/global_step,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▆▆▆▆▇▇▇▇▇██
train/grad_norm,█▇▇▇▇▇▇███▅▃▃▃▄▄▄▅▅▅▆▆▆▅▄▂▂▂▂▁▁▁▂▂▁▁▂▁▁▁
train/learning_rate,▂▅▇██▇▇▇▇▇▆▆▆▆▅▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
train/loss,████▆▅▅▄▄▃▂▂▂▂▂▁▁▂▁▁▂▂▁▁▁▂▁▂▁▁▁▂▁▁▁▁▁▁▁▁

Metric,Final value
total_flos,338574194632704.0
train/epoch,0.96
train/global_step,60.0
train/grad_norm,0.47016
train/learning_rate,0.0
train/loss,0.772
train_loss,1.51298
train_runtime,109.7044
train_samples_per_second,4.375
train_steps_per_second,0.547
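
The summary figures follow directly from the trainer settings. The sketch below reproduces them from the batch size, the gradient accumulation steps, the step count, and the reported runtime; all inputs are values already shown in this notebook.

```python
per_device_batch, grad_accum, num_gpus = 2, 4, 1
max_steps, num_examples = 60, 500
train_runtime = 109.7044  # seconds, from the run summary

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum * num_gpus
samples_seen = effective_batch * max_steps
# Fraction of the 500-example dataset covered in 60 steps.
epochs = samples_seen / num_examples
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(effective_batch, samples_seen, epochs)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

Because `max_steps = 60` stops training after 480 examples, the run covers 0.96 of an epoch rather than the full pass that `num_train_epochs = 1` would suggest.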


## Model inference after fine-tuning

In [None]:
question = "什么是冬温？"
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0])


<｜begin▁of▁sentence｜>下面是一条描述任务的指令。
输出一个适当的完成任务的响应。
在回答之前，仔细思考问题，并创建一个循序渐进的思路链，以确保逻辑和准确的回答。

### instruction:
你是一名中医专家，请回答下面的问题

### input:
什么是冬温？

### output:
<think>在中医中，“冬温”是指在寒冷的冬季，通过中医理论来调节体温和人体生理活动的温热。具体来说，冬温可以指以下几种含义：

首先，冬温可能指的是在寒冷的季节，通过中医手法来调节体温，使其逐渐变得温暖。这可能包括中医在寒冷期间的体温调节方法，如补阳针或温补针的使用，以增强人体的免疫力和能量。

其次，冬温可能指的是在寒冷的季节，通过中医理论来增强人体的体温。这可能包括中医在寒冷期间的补救措施，如补阳汤或温补汤的使用，以帮助人体恢复正常的生理活动。

此外，冬温还可能指的是在寒冷的季节，通过中医理论来调节人体的温热。这可能包括中医在寒冷期间的温补方法，如补阳针或温补针的使用，以增强人体的免疫力和能量。

总的来说，冬温在中医中指的是通过中医手法来调节体温和人体生理活动的温热。它涵盖了多种具体含义，包括调节体温使其温暖、增强体温以适应寒冷环境，以及调节体温使其温暖以增强人体免疫力和能量。<｜end▁of▁sentence｜>


In [None]:
question = "什么是出血？"
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0])

<｜begin▁of▁sentence｜>下面是一条描述任务的指令。
输出一个适当的完成任务的响应。
在回答之前，仔细思考问题，并创建一个循序渐进的思路链，以确保逻辑和准确的回答。

### instruction:
你是一名中医专家，请回答下面的问题

### input:
什么是出血？

### output:
<think>在中医中，出血是指体内出现的一种常见的生理现象，主要表现为血液的流动障碍，通常表现为脉动不稳、脉搏轻快等症状。在中医理论中，出血可能与某些症状或体质有关，因此需要结合体质和证候进行辨治。<｜end▁of▁sentence｜>


## Saving the model locally

In [None]:
new_model_online = "FastHappySharp/DeepSeek-R1-SylvanL"
new_model_local = "DeepSeek-R1-SylvanL"
model.save_pretrained(new_model_local) # Local saving
tokenizer.save_pretrained(new_model_local)

('DeepSeek-R1-SylvanL/tokenizer_config.json',
 'DeepSeek-R1-SylvanL/special_tokens_map.json',
 'DeepSeek-R1-SylvanL/tokenizer.json')

## Pushing the model to Hugging Face hub

In [None]:
model.push_to_hub(new_model_online) # Online saving
tokenizer.push_to_hub(new_model_online) # Online saving

README.md:   0%|          | 0.00/635 [00:00<?, ?B/s]

  0%|          | 0/1 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/73.9M [00:00<?, ?B/s]

Saved model to https://huggingface.co/FastHappySharp/DeepSeek-R1-SylvanL


  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]
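
The size of the uploaded adapter file can be sanity-checked against the trainable parameter count. The sketch below assumes the adapter weights are stored as 32-bit floats and that the progress bar reports decimal megabytes; neither is stated in the notebook, but the result is consistent with the `73.9M` shown for `adapter_model.safetensors`.

```python
trainable_params = 18_464_768  # from the training banner
bytes_per_param = 4            # assumption: adapters stored as float32

# Raw weight bytes; the file also has a small metadata header not counted here.
adapter_bytes = trainable_params * bytes_per_param
adapter_mb = f"{adapter_bytes / 1e6:.1f} MB"
print(adapter_mb)  # 73.9 MB
```

Pushing only the adapter therefore uploads tens of megabytes, while a merged 16-bit checkpoint contains every model weight and is far larger.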

In [None]:
model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)
model.push_to_hub_merged(new_model_online, tokenizer, save_method = "merged_16bit")

Unsloth: You have 1 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 1.8G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.98 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 46.41it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving DeepSeek-R1-SylvanL/pytorch_model.bin...
Done.


Unsloth: You are pushing to hub, but you passed your HF username = FastHappySharp.
We shall truncate FastHappySharp/DeepSeek-R1-SylvanL to DeepSeek-R1-SylvanL


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 4.96 out of 12.67 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:00<00:00, 54.41it/s]


Unsloth: Saving to organization with address FastHappySharp/DeepSeek-R1-SylvanL
Unsloth: Saving tokenizer... Done.
Unsloth: Saving to organization with address FastHappySharp/DeepSeek-R1-SylvanL
Unsloth: Saving FastHappySharp/DeepSeek-R1-SylvanL/pytorch_model.bin...
Unsloth: Uploading all files... Please wait...


  0%|          | 0/1 [00:00<?, ?it/s]

pytorch_model.bin:   0%|          | 0.00/3.55G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/FastHappySharp/DeepSeek-R1-SylvanL
