## Setting up

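# Note ⚠️
If Hugging Face is unreachable, the deepseek-r1-llama-8b model and the dataset cannot be downloaded. For convenience, both are mirrored on a network drive; download the model and the matching training dataset from the links below:
> Model download: https://pan.quark.cn/s/d45a88061e3d
> Training dataset download: https://pan.quark.cn/s/fea6a395deff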

In [1]:
# import unsloth
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.6.0+cu126
CUDA available: True
CUDA version: 12.6
GPU name: NVIDIA GeForce RTX 4060 Laptop GPU


In [2]:

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


## Loading the model and tokenizer

In [4]:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "E:/DS_8b/DeepSeek-R1-Medical-COT_1110",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
    device_map = {"": 0},  # force every module onto GPU 0
    # or try this instead:
    # device_map = "cuda:0",
)

==((====))==  Unsloth 2025.6.2: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA GeForce RTX 4060 Laptop GPU. Num GPUs = 1. Max memory: 7.996 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.6.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Unsloth 2025.6.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


In [3]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "E:/Deepseek/tmp", # Change this to your local model path; here the model files from Hugging Face were already downloaded locally.
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.6.2: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    NVIDIA GeForce RTX 4060 Laptop GPU. Num GPUs = 1. Max memory: 7.996 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.6.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 
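The error above is about GPU memory headroom, so a back-of-the-envelope estimate of the weight footprint can help put it in context. The sketch below assumes a round 8 billion parameters for the "8b" model (the exact count is not stated in this notebook) and counts only the weights; quantization constants, unquantized layers, the CUDA context, and activations all add to this. The banner above reports about 7.996 GB on this GPU. The other loading cell in this notebook passes `device_map = {"": 0}` to keep every module on GPU 0.

```python
GIB = 1024 ** 3  # bytes per GiB


def weight_footprint_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate size of the raw weights only, in GiB."""
    return num_params * bits_per_param / 8 / GIB


# Assumption: a round figure for an "8b" model, not the exact parameter count.
N_PARAMS = 8_000_000_000

fp16_gib = weight_footprint_gib(N_PARAMS, 16)
int4_gib = weight_footprint_gib(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f" 4-bit weights: {int4_gib:.2f} GiB")
```

Under these assumptions the 4-bit weights alone fit in the available memory, but with limited slack, so anything else occupying the GPU can push part of the model off the device.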

## Model inference before fine-tuning

In [4]:
# Run this check after the model has loaded
try:
    import xformers.ops
    print("✅ xFormers ops module loaded successfully")

    # Check whether memory-efficient attention is available
    if hasattr(xformers.ops, 'memory_efficient_attention'):
        print("✅ Memory efficient attention is available")
    else:
        print("❌ Memory efficient attention is NOT available")

except ImportError as e:
    print(f"❌ xFormers import failed: {e}")

✅ xFormers ops module loaded successfully
✅ Memory efficient attention is available


In [5]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [7]:
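# English: "On a light-brown background, a fissures-and-ridges pattern with milia-like cysts, comedo-like openings, and yellowish-white scales."
question = "淡棕色背景上，沟嵴模式伴粟粒样囊肿、粉刺样开口，黄白色鳞屑。"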


FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>
好的，我来仔细分析一下这个问题。首先，背景是淡棕色，这可能提示皮肤问题，比如紫外线斑或者其他皮肤病变。接着是沟嵴模式，这可能与皮肤的结构有关，比如皮肤层的结构异常。粟粒样囊肿通常出现在皮肤下层，可能与脂肪代谢有关，或者是某种纤维化。粉刺样开口可能是指皮肤角或皮孔的异常，或者是某种皮肤增生。黄白色鳞屑通常是皮肤炎症的表现，比如湿疹或者荨麻疹。

接下来，考虑这些现象的可能原因。淡棕色背景可能与皮肤病变有关，比如色素沉着或皮肤增生。沟嵴模式可能提示皮肤的结构异常，可能与皮肤癌或其他良性肿瘤有关。粟粒样囊肿可能与脂肪瘤有关，或者是皮肤的脂肪代谢异常。粉刺样开口可能是皮肤角的异常，或者是皮肤上皮的增生。黄白色鳞屑可能是皮肤炎症的表现，比如湿疹或荨麻疹。

再考虑这些现象的可能诊断。淡棕色背景可能提示紫外线斑，但如果有其他结构异常，可能需要考虑其他皮肤病变。沟嵴模式可能提示皮肤的结构异常，比如皮肤层的变形，可能与皮肤癌或其他皮肤病有关。粟粒样囊肿可能与脂肪瘤或皮肤的脂肪代谢异常有关，但也可能与其他类型的肿瘤有关，比如纤维瘤。粉刺样开口可能是皮肤角的异常，或者是皮肤上皮的增生，比如角化性皮炎。黄白色鳞屑通常是炎症的表现，可能与湿疹、荨麻疹或其他皮肤炎症有关。

综合来看，这些现象可能提示皮肤的炎症性或结构性问题，需要进一步的皮肤检查和病理分析。可能需要做皮肤的分型，包括肿瘤学检查，来确定具体的诊断。同时，考虑患者的其他症状和病史，来评估是否有全身性疾病或其他系统的问题。

在治疗方面，如果诊断为皮肤炎症，可能需要使用抗炎药物，比如外用糖皮质激素或维生素D3激发剂。如果是皮肤结构性问题，可能需要切除或化疗。同时，需要关注是否有潜在的系统性疾病，比如免疫功能异常，需要进行相应的评估和治疗。

总结来说，这些皮肤表现可能提示多种可能性，需要通过进一步的检查和病理分析来明确诊断，并制定相应的治疗方案。
</think>

淡棕色背景、沟嵴模式伴粟粒样囊肿、粉刺样开口，黄白色鳞屑。

基于这些皮肤表现，可能的诊断包括：

1. **皮肤炎症**：黄白色鳞屑通常与炎症有关，可能是湿疹或荨麻疹。
2. **皮肤结构异常**：淡棕色背景和沟嵴模式可能提示皮肤病变，可能涉及皮肤癌或其他皮肤病。
3. **脂肪代谢异常**：粟粒样囊肿可能与脂肪瘤或脂肪代谢异常有关。
4. **皮
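The inference cell fills the two `{}` slots of `prompt_style` and then keeps everything after `### Response:`. A minimal sketch of that round trip follows, which also separates the `<think>` reasoning from the final answer and tolerates output that is cut off by `max_new_tokens`. The shortened template, the `<eos>` placeholder, and the helper names are illustrative assumptions, not part of the notebook.

```python
# Shortened stand-in for the notebook's prompt_style (same two placeholders).
PROMPT = """### Question:
{}

### Response:
<think>{}"""

EOS = "<eos>"  # placeholder; the real value comes from tokenizer.eos_token


def build_prompt(question: str) -> str:
    """Fill the question slot and leave the reasoning slot empty."""
    return PROMPT.format(question, "")


def split_response(decoded: str, eos_token: str = EOS):
    """Return (reasoning, answer) from decoded text that still contains the prompt."""
    completion = decoded.split("### Response:", 1)[1].replace(eos_token, "")
    reasoning, closed, answer = completion.partition("</think>")
    reasoning = reasoning.replace("<think>", "", 1).strip()
    if not closed:
        # Generation stopped before the reasoning block was closed.
        return reasoning, ""
    return reasoning, answer.strip()


prompt = build_prompt("toy question")
decoded = prompt + "\nstep 1, step 2\n</think>\nfinal answer" + EOS
reasoning, answer = split_response(decoded)
print(reasoning)
print(answer)
```

Splitting with `maxsplit=1` keeps the helper from breaking if the model happens to echo `### Response:` inside its own output.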

In [6]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)


Unsloth 2025.6.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
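To see what `r=16` over these seven target modules costs, the LoRA parameter count can be worked out by hand: each adapted linear layer gains two matrices, `A` (rank × in) and `B` (out × rank). The layer shapes below are assumed Llama-3.1-8B-style dimensions and are not stated in this notebook; with them, the total matches the `Trainable parameters = 41,943,040` figure in the training log.

```python
# Assumed Llama-3.1-8B-style shapes (not stated in the notebook).
HIDDEN = 4096         # model width
INTERMEDIATE = 14336  # MLP width
KV_DIM = 1024         # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32       # matches "patched 32 layers" in the log
RANK = 16             # r=16 from get_peft_model

# (in_features, out_features) of each targeted linear layer
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"per layer: {per_layer:,}")
print(f"total:     {total:,}")
```

Because the count scales linearly with `RANK`, doubling `r` to 32 would double the trainable parameters under the same assumptions.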


## Loading and processing the dataset

In [7]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""


In [8]:
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN


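def formatting_prompts_func(examples):
    """Render each (question, chain of thought, response) row into one training string."""
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    # Descriptive loop names avoid shadowing the built-in `input`
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }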
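With `batched=True`, the mapping function receives a dict of column lists rather than a single row, and must return a dict of lists of the same length. The toy run below illustrates that contract with a shortened template and a placeholder EOS string; the column names are taken from the cell above, while the sample values and the `format_batch` name are made up for illustration.

```python
# Shortened stand-in for train_prompt_style (same three placeholders).
TEMPLATE = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

EOS = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token


def format_batch(examples: dict) -> dict:
    """Turn a batch of columns into one formatted training string per row."""
    rows = zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    texts = [TEMPLATE.format(q, cot, resp) + EOS for q, cot, resp in rows]
    return {"text": texts}


# A batch is a dict of equal-length lists, one list per column.
batch = {
    "Question": ["q1", "q2"],
    "Complex_CoT": ["c1", "c2"],
    "Response": ["r1", "r2"],
}
out = format_batch(batch)
print(out["text"][0])
```

Appending the EOS token to every string is what teaches the model to stop after the answer instead of generating until `max_new_tokens` is exhausted.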


In [9]:
######################
# 1. Dataset split & preprocessing
######################
from datasets import load_dataset

data_path = "E:/Deepseek/data_sorce/HZ/medical_data_with_cot_reasoner_1110.json"

raw_dataset = load_dataset("json", data_files={"data": data_path})

split_dataset = raw_dataset["data"].train_test_split(
    test_size=0.3,
    seed=3407,
    shuffle=True
)

train_raw = split_dataset["train"]
test_raw  = split_dataset["test"]

train_dataset = train_raw.map(formatting_prompts_func, batched=True)
test_dataset  = test_raw.map(formatting_prompts_func,  batched=True)

print(f"Train set size: {len(train_dataset)}")
print(f"Test  set size: {len(test_dataset)}")

Train set size: 777
Test  set size: 333


## Setting up the model

In [10]:
######################
# 2. Configure SFTTrainer
######################
from trl import SFTTrainer
from transformers import TrainingArguments, EarlyStoppingCallback
from unsloth import is_bfloat16_supported

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    eval_strategy="steps",
    eval_steps=10,
    save_strategy="steps",
    save_steps=10,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    warmup_steps=5,
    learning_rate=2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    logging_steps=10,
    seed=3407,
    report_to=[],  # empty list: no logging integrations
)

# Recreate the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,
    args=training_args,
    callbacks=[
        EarlyStoppingCallback(early_stopping_patience=3)
    ],
)

Unsloth: Tokenizing ["text"]:   0%|          | 0/333 [00:00<?, ? examples/s]
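The configuration above evaluates every 10 steps and stops once `eval_loss` fails to improve for 3 evaluations in a row. The sketch below is a simplified pure-Python model of that patience rule, not the actual `EarlyStoppingCallback` code; the function name and the loss values are made up for illustration.

```python
def early_stop_step(eval_losses, patience=3, eval_every=10):
    """Return the step at which training would stop, or None if it never does.

    eval_losses[i] is the eval loss measured at step (i + 1) * eval_every.
    """
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if loss < best:
            # New best: remember it and reset the patience counter.
            best = loss
            bad_evals = 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return (i + 1) * eval_every
    return None


# Made-up losses: best value at step 30, then three non-improving evaluations.
losses = [1.90, 1.60, 1.50, 1.52, 1.51, 1.55, 1.40]
stop_step = early_stop_step(losses)
print(stop_step)
```

In this toy run the later improvement at step 70 is never reached, which is the trade-off a small patience value makes against wasted compute.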

## Model training

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 777 | Num Epochs = 3 | Total steps = 294
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/4,582,543,360 (0.92% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss,Validation Loss


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
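The `Total steps = 294` figure in the banner can be reproduced from the other logged numbers. This sketch assumes that a final, partially filled accumulation group still counts as one optimizer step, which is consistent with the logged total; the function name is illustrative.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Optimizer updates over the whole run, rounding partial groups up."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_gpus))
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


# Values taken from the training banner above.
effective_batch = 2 * 4 * 1
steps = total_optimizer_steps(num_examples=777, per_device_batch=2, grad_accum=4, epochs=3)

print(f"effective batch size: {effective_batch}")
print(f"total optimizer steps: {steps}")
```

With `eval_steps=10` and `save_steps=10`, a full run of this length would evaluate and checkpoint at most 29 times unless early stopping ends it sooner.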


In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import os

# Extract training and test loss data
log_history = trainer.state.log_history
train_steps, train_losses = [], []
eval_steps, eval_losses = [], []

for r in log_history:
    if "loss" in r:
        train_steps.append(r["step"])
        train_losses.append(r["loss"])
    if "eval_loss" in r:
        eval_steps.append(r["step"])
        eval_losses.append(r["eval_loss"])

# Plot the loss curves
plt.figure(figsize=(12, 7))
plt.plot(train_steps, train_losses, 'b-', label="Training loss", linewidth=2)
plt.plot(eval_steps, eval_losses, 'r-', label="Test loss", linewidth=2)
plt.xlabel("Step", fontsize=12)
plt.ylabel("Loss", fontsize=12)
plt.title("Training and test loss curves", fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()

# Show the figure
plt.show()

# Print summary statistics
print(f"Training steps logged: {len(train_steps)}")
print(f"Evaluation steps logged: {len(eval_steps)}")
print(f"Final training loss: {train_losses[-1]:.4f}")
print(f"Final test loss: {eval_losses[-1]:.4f}")
print(f"Lowest test loss: {min(eval_losses):.4f}")
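The loop above relies on training and evaluation records sitting in the same `log_history` list and being told apart by their keys. The toy run below shows that filtering on hand-written records; the values are made up, and the shape of the final summary row (a `train_loss` key rather than `loss`) is an assumption about how the trainer logs its end-of-run summary.

```python
def split_losses(log_history):
    """Separate (step, loss) pairs for training and evaluation records."""
    train, evals = [], []
    for record in log_history:
        if "loss" in record:
            train.append((record["step"], record["loss"]))
        if "eval_loss" in record:
            evals.append((record["step"], record["eval_loss"]))
    return train, evals


# Hypothetical records, not output from this notebook's run.
toy_history = [
    {"step": 10, "loss": 1.80},
    {"step": 10, "eval_loss": 1.70},
    {"step": 20, "loss": 1.50},
    {"step": 20, "eval_loss": 1.60},
    {"step": 20, "train_loss": 1.65},  # summary row: matches neither key
]

train, evals = split_losses(toy_history)
best_step, best_loss = min(evals, key=lambda pair: pair[1])
print(train)
print(evals)
print(best_step, best_loss)
```

Keeping steps and losses paired makes it easy to report which step produced the lowest evaluation loss, not just its value.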


In [None]:
# Save the loss data to CSV and the figure to disk
# Build two separate DataFrames
train_df = pd.DataFrame({
    'step': train_steps,
    'loss': train_losses,
    'type': ['train'] * len(train_steps)
})

eval_df = pd.DataFrame({
    'step': eval_steps,
    'loss': eval_losses,
    'type': ['eval'] * len(eval_steps)
})

# Merge the two DataFrames
all_data = pd.concat([train_df, eval_df], ignore_index=True)

# Save to a CSV file
csv_path = 'all_loss_1110.csv'
all_data.to_csv(csv_path, index=False)
print(f"All loss data saved to: {os.path.abspath(csv_path)}")

# Redraw the figure and save it
plt.figure(figsize=(12, 7))
plt.plot(train_steps, train_losses, 'b-', label="Training loss", linewidth=2)
plt.plot(eval_steps, eval_losses, 'r-', label="Test loss", linewidth=2)
plt.xlabel("Step", fontsize=12)
plt.ylabel("Loss", fontsize=12)
plt.title("Training and test loss curves", fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()

# Save the figure
fig_path = 'loss_curve_1110.png'
plt.savefig(fig_path, dpi=300, bbox_inches='tight')
print(f"Loss curve saved to: {os.path.abspath(fig_path)}")


In [16]:
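# English: "Yellow structureless area with a few irregularly distributed red, brown, and black dotted/linear hemorrhages. Verrucous projections are visible within the lesion, with marked keratinization."
question = "黄色无结构区域，少量不规则分布的红色、褐色、黑色、点状/线状出血。皮损内可见\n疣状突起，角化明显。"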


FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>




In [56]:
question = "A 59-year-old man presents with a fever, chills, night sweats, and generalized fatigue, and is found to have a 12 mm vegetation on the aortic valve. Blood cultures indicate gram-positive, catalase-negative, gamma-hemolytic cocci in chains that do not grow in a 6.5% NaCl medium. What is the most likely predisposing factor for this patient's condition?"

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
**推理链：**

1. **病理特征分析**  
   - **体征与病理表现**  
     - 发热（fever）、寒冷感（chills）、夜间出汗（night sweats）、全身性疲劳（generalized fatigue）提示慢性感染或炎症过程。  
     - 结合解剖发现存在于亚临床或轻度炎症状态，需进一步鉴别急性与慢性感染倾向。  

   - **微生物学数据**  
     - **血文化表现**  
       - 格氏正链菌（Staphylococcus aureus）的典型表现为圆形到椭圆形菌体，形成可见的"花生肉样"结构，但需与其他GRAM正链菌（如S. pyogenes）区分。  
       - "γ-hemolytic"提示该菌感染具有特征性破坏γ链的红细胞，需与其他非溶菌菌（如引起α-hemolytic感染的S. enteritidis）鉴别。  
     - **生理盐水培养基特性**  
       - 不适于6.5% NaCl培养基（常见于培养细菌如E. coli）可能提示双歧抗性或其他特异性培养基需求，需结合抗菌谱分析。  

2. **病原体定位与病理机制**  
   - **病原体类型**  
     - 格氏正链菌（S. aureus）是常见的急性外耳道、骨代谢相关感染的病原体，但也可造成慢性纤维化性炎症。  
     - "catalase-negative"表明菌体不产生催化性消除酶，可能与缺乏酶编码基因相关，需与其他GRAM正链菌（如S. pyogenes）区别。  
   - **感染特征**  
     - "vegetations"（肉芽组织）多见于端粒相关感染（如环形菌、真菌），但本例为$aortic valve$肉芽，需考虑慢性化脓性中耳炎继发肺部发热或直接肺炎继发心脏病变。  

3. **病理生理学关联**  
   - **心脏病变与感染关系**  
     - 肉芽形成可能源于长期炎症导致的血管新生物形成或胆脂瘤形成（如胆脂瘤性肉芽肿）。  
     - 需排除胆脂瘤性肉芽肿的先天性病变（常见于男孩）或慢性炎症导致的纤维化修复。  
   - **慢性感染与炎症的体征**  
     - 慢性髓膜炎（chronic osteomyelitis）

## Saving the model locally

In [None]:
# new_model_online = "kingabzpro/DeepSeek-R1-Medical-COT"
new_model_local = "DeepSeek-R1-Medical-COT_1110"
model.save_pretrained(new_model_local) # Local saving
tokenizer.save_pretrained(new_model_local)