### I. Getting Started with Unsloth

#### 1. Model inference with Unsloth

#### Environment: Jupyter kernel `deepseekR1_cu126`; virtual environment `deepseekR1_cu126`

In [None]:
!pip install --upgrade pip
!pip install "unsloth[cu126-torch260] @ git+https://github.com/unslothai/unsloth.git"

In [None]:
!pip install unsloth

In [None]:
!git config --global http.proxy http://127.0.0.1:7890

In [None]:
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [None]:
!pip show unsloth

In [None]:
!pip show pip

In [None]:
import sys
print(sys.prefix)

In [None]:
!pip list

In [None]:
!pip install ./triton-2.1.0-cp311-cp311-win_amd64.whl

In [None]:
!pip install ./psutil-6.1.1-cp37-abi3-win_amd64.whl

In [None]:
!python -m ipykernel install --user --name=your_env_name --display-name "Python (deepseekR1_cu126_2)"

In [None]:
import torch
print(torch.__version__)


In [None]:
import torch
print(torch.cuda.is_available())  # True means GPU support is configured correctly
print(torch.cuda.current_device())  # index of the device currently in use
print(torch.cuda.get_device_name(0))  # name of the GPU


In [None]:
!pip uninstall -y triton

In [None]:
!pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.2.0-windows.post9/triton-3.2.0-cp311-cp311-win_amd64.whl

In [None]:
! pip show triton

In [None]:
!pip show psutil
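
The repeated `pip show` calls above can be condensed into one check. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list is taken from the install cells above.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Package names taken from the install cells above.
checked = installed_versions(["unsloth", "torch", "triton", "psutil"])
for name, version in checked.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running it inside the notebook kernel confirms that the kernel, not just the shell, sees the packages.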

- Try running model inference with Unsloth

In [None]:
!pip install modelscope

&emsp;&emsp;First, download the model weights, then set the key parameters and load the model:

In [None]:
!mkdir ./QwQ-32B

In [None]:
!modelscope download --model Qwen/QwQ-32B --local_dir ./QwQ-32B

In [None]:
!mkdir ./DeepSeek-R1-Distill-Llama-70B

In [None]:
!mkdir ./DeepSeek-R1-Distill-Qwen-32B

In [None]:
!modelscope download --model deepseek-ai/Deepseek-R1-Distill-Llama-70B --local_dir ./DeepSeek-R1-Distill-Llama-70B

In [None]:
!modelscope download --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --local_dir ./DeepSeek-R1-Distill-Qwen-32B

In [None]:
!mkdir ./Deepseek-R1-Distill-Llama-8B

In [None]:
!modelscope download --model deepseek-ai/Deepseek-R1-Distill-Llama-8B --local_dir ./Deepseek-R1-Distill-Llama-8B


In [None]:
from unsloth import FastLanguageModel

In [None]:
max_seq_length = 4096
dtype = None 
load_in_4bit = True

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-32B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

At this point `model` is the loaded DeepSeek R1 distilled model (here the checkpoint in `./DeepSeek-R1-Distill-Qwen-32B`):

In [None]:
model

and `tokenizer` is the tokenizer:

In [None]:
tokenizer

Switch the model to inference mode:

In [None]:
FastLanguageModel.for_inference(model) 

Now we can chat with the model:

In [None]:
question = "hello"

Here we first use the tokenizer to convert the input question into token IDs:

In [None]:
inputs = tokenizer([question], return_tensors="pt").to("cuda")

In [None]:
inputs
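
To make the text to token IDs to text round trip concrete, here is a toy sketch. `ToyTokenizer` and its three-word vocabulary are hypothetical and are not part of `transformers` or Unsloth; real tokenizers use subword vocabularies and return tensors, but the encode/decode idea is the same.

```python
class ToyTokenizer:
    """Hypothetical whitespace tokenizer; illustrates encode/decode only."""

    def __init__(self, vocab):
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}
        self.unk_id = self.token_to_id["<unk>"]

    def encode(self, text):
        # Unknown words fall back to the <unk> ID.
        return [self.token_to_id.get(tok, self.unk_id) for tok in text.split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


toy = ToyTokenizer(["<unk>", "hello", "world"])
ids = toy.encode("hello world")
print(ids)              # [1, 2]
print(toy.decode(ids))  # hello world
```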

Finally, pass `inputs` to the model to generate a reply:

In [None]:
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

The reply is also returned as token IDs:

In [None]:
outputs

The tokenizer is needed again to convert them back into text:

In [None]:
response = tokenizer.batch_decode(outputs)

In [None]:
response

In [None]:
print(response[0])
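
For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded text repeats the question. A minimal sketch of keeping only the generated part, shown on plain lists with made-up IDs (the helper name `drop_prompt_tokens` is hypothetical):

```python
def drop_prompt_tokens(output_ids, prompt_length):
    """Keep only the tokens generated after the prompt, per sequence."""
    return [sequence[prompt_length:] for sequence in output_ids]


# Hypothetical IDs: a 3-token prompt followed by 2 generated tokens.
output_ids = [[101, 7, 8, 42, 43]]
new_tokens = drop_prompt_tokens(output_ids, prompt_length=3)
print(new_tokens)  # [[42, 43]]
```

With the tensors in this notebook, the equivalent slice would be `outputs[:, inputs.input_ids.shape[1]:]`, and `tokenizer.batch_decode(..., skip_special_tokens=True)` additionally drops special tokens from the text.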

This completes the Unsloth model inference workflow.

In [None]:
model_qwen, tokenizer_qwen = FastLanguageModel.from_pretrained(
    model_name = "./DeepSeek-R1-Distill-Qwen-7B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

In [None]:
FastLanguageModel.for_inference(model_qwen) 

In [None]:
inputs = tokenizer_qwen([question], return_tensors="pt").to("cuda")

outputs = model_qwen.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer_qwen.batch_decode(outputs)

print(response[0])

#### 2. Answering with a Q&A prompt template

- Structured input

In [None]:
prompt_style_chat = """Write an appropriate response to complete the current conversation task.

### Instruction:
You are a helpful assistant.

### Question:
{}

### Response:
<think>{}"""

In [None]:
question = "Hello, long time no see!"

In [None]:
[prompt_style_chat.format(question, "")]

In [None]:
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

In [None]:
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

In [None]:
response = tokenizer.batch_decode(outputs)

In [None]:
response

In [None]:
print(response[0].split("### Response:")[1])
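
Because the template ends with `<think>`, the text after `### Response:` contains both the reasoning and the final answer. The following is a minimal sketch for separating the two; the helper name `split_reasoning` is hypothetical, and it assumes the model closes its reasoning with `</think>`.

```python
def split_reasoning(decoded, response_marker="### Response:", think_close="</think>"):
    """Split decoded text into (reasoning, answer).

    If the closing think tag is missing (for example, generation stopped at
    max_new_tokens), everything is returned as reasoning.
    """
    # Keep only the text after the last response marker.
    body = decoded.split(response_marker)[-1]
    body = body.replace("<think>", "", 1)
    if think_close in body:
        reasoning, answer = body.split(think_close, 1)
    else:
        reasoning, answer = body, ""
    return reasoning.strip(), answer.strip()


sample = "### Question:\nHi\n\n### Response:\n<think>Greet back.</think>Hello!"
print(split_reasoning(sample))  # ('Greet back.', 'Hello!')
```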

- Testing a harder question

In [None]:
question = "Please prove that the square root of 2 is irrational."

In [None]:
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

In [None]:
outputs = model.generate(
    input_ids=inputs.input_ids,
    max_new_tokens=1200,
    use_cache=True,
)

In [None]:
response = tokenizer.batch_decode(outputs)

In [None]:
print(response[0].split("### Response:")[1])

### II. Minimum Viable Experiment

#### 1. Preparing the dataset

In [None]:
import subprocess
import os

result = subprocess.run('bash -c "source /etc/network_turbo && env | grep proxy"', shell=True, capture_output=True, text=True)
output = result.stdout
for line in output.splitlines():
    if '=' in line:
        var, value = line.split('=', 1)
        os.environ[var] = value

- Download the dataset

&emsp;&emsp;Next, use `datasets` to download the dataset.

In [None]:
!pip install datasets

In [None]:
import os
from datasets import load_dataset

Confirm the prompt template once more:

In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a hydrogen storage big data artificial intelligence.
Please answer the following question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

Then set and retrieve the token that marks the end of generated text:

In [None]:
tokenizer.eos_token = "</s>"


In [None]:
EOS_TOKEN = tokenizer.eos_token
tokenizer.eos_token


In [None]:
def formatting_prompts_func(examples):
    """Render each (question, chain of thought, answer) row into one training text."""
    questions = examples["input"]
    cots = examples["chain_of_thought"]
    answers = examples["output"]
    texts = []
    for question, cot, answer in zip(questions, cots, answers):
        # Append EOS so the model learns where a response ends.
        text = train_prompt_style.format(question, cot, answer) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }

In [None]:
import os

# Change this to your own proxy address (for example, the local proxy port provided by Clash, V2rayN, SSTap, or a similar tool)
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

from datasets import load_dataset

dataset = load_dataset("Yy245/cot_2000", split="train", trust_remote_code=True)


In [None]:
dataset[0]

Then apply the structured formatting:

In [None]:
dataset = dataset.map(formatting_prompts_func, batched = True,)


The dataset is now organized in the following form:

In [None]:
dataset["text"][0]
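
With `batched=True`, `map` passes the formatting function a dict of columns (each a list) and expects a dict of columns back. The sketch below mirrors `formatting_prompts_func` on a made-up two-row batch; `TEMPLATE`, `EOS`, and `format_batch` are shortened, hypothetical stand-ins for the notebook's `train_prompt_style`, `EOS_TOKEN`, and formatting function.

```python
# Shortened stand-in for train_prompt_style; the EOS string is a placeholder.
TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
EOS = "<eos>"


def format_batch(examples):
    """Dict of columns in, dict with one new 'text' column out."""
    texts = [
        TEMPLATE.format(question, cot, answer) + EOS
        for question, cot, answer in zip(
            examples["input"], examples["chain_of_thought"], examples["output"]
        )
    ]
    return {"text": texts}


# Hypothetical two-row batch, shaped the way a batched map call passes it.
batch = {
    "input": ["Q1", "Q2"],
    "chain_of_thought": ["reason 1", "reason 2"],
    "output": ["A1", "A2"],
}
formatted = format_batch(batch)
print(len(formatted["text"]))  # 2
print(formatted["text"][0])
```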

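- Where the dataset is saved

By default, the dataset is saved in the `.cache` folder under the home directory (for Hugging Face `datasets`, typically `~/.cache/huggingface/datasets`).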

#### 2. Starting fine-tuning

&emsp;&emsp;Next, put the model into fine-tuning mode:

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)
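
To get a feel for what `r` and `lora_alpha` control, the sketch below counts the trainable parameters LoRA adds to one linear layer and computes the scaling applied to the low-rank update. The 4096 x 4096 layer size is a hypothetical example, not a value read from this model, and both helper names are made up for illustration.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * d_in + d_out * r


def lora_scale(lora_alpha, r, use_rslora=False):
    """Scaling applied to the low-rank update B @ A."""
    return lora_alpha / (r ** 0.5) if use_rslora else lora_alpha / r


# Hypothetical 4096 x 4096 projection with the r=16 used above.
print(lora_param_count(4096, 4096, r=16))  # 131072
print(lora_scale(lora_alpha=16, r=16))     # 1.0
```

A full 4096 x 4096 weight has about 16.8 million parameters, so at `r=16` the adapter trains under 1% of that layer.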

Then import the required libraries:

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

Create the supervised fine-tuning trainer:

In [None]:
# Training run from February 15
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 5,

        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

In [None]:
# Alternative configuration: a short trial run capped at max_steps=60
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=5,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

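This code uses **`SFTTrainer`** for **supervised fine-tuning (SFT)** and applies to fine-tuning models in the `transformers` and `Unsloth` ecosystems:

**1. Importing the libraries**
- **`SFTTrainer`** (from the `trl` library):
  - `trl` (Transformer Reinforcement Learning) is a Hugging Face library that provides **supervised fine-tuning (SFT)** and **reinforcement learning from human feedback (RLHF)** functionality.
  - `SFTTrainer` handles **supervised fine-tuning** and works with low-rank adaptation methods such as `LoRA`.

- **`TrainingArguments`** (from the `transformers` library):
  - This class defines the **training hyperparameters**, such as batch size, learning rate, optimizer, and number of training steps.

- **`is_bfloat16_supported()`** (from `unsloth`):
  - This function checks **whether the current GPU supports `bfloat16` (BF16)** and returns `True` if it does, `False` otherwise.
  - `bfloat16` is a more efficient numeric format that performs better on **newer NVIDIA GPUs such as the A100 and H100**.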

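**2. Initializing `SFTTrainer` for fine-tuning**

##### **Parameter reference**
##### **① `SFTTrainer` parameters**
| Parameter                       | Purpose |
|---------------------------------|---------|
| `model=model`                   | The **pretrained model** to fine-tune |
| `tokenizer=tokenizer`           | The **tokenizer** used to process the text data |
| `train_dataset=dataset`         | The **training dataset** |
| `dataset_text_field="text"`     | The dataset column that holds the **training text** (produced by `formatting_prompts_func`) |
| `max_seq_length=max_seq_length` | **Maximum sequence length**, the upper limit on tokens per input |
| `dataset_num_proc=1`            | **Number of processes used for dataset preprocessing**; `1` means no parallelism, and larger values speed up preprocessing |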

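##### **② `TrainingArguments` parameters**
| Parameter | Purpose |
|-----------|---------|
| `per_device_train_batch_size=2` | Training batch size per **GPU/device** (small values suit large models) |
| `gradient_accumulation_steps=4` | **Gradient accumulation steps** (effective batch size = 2 × 4 = 8) |
| `warmup_steps=5` | **Warmup steps** (the learning rate starts low and ramps up) |
| `max_steps=60` | **Maximum number of training steps** (with an effective batch size of 8, this consumes about 60 × 8 = 480 samples) |
| `learning_rate=2e-4` | **Learning rate** (`2e-4` = 0.0002, controls the size of weight updates) |
| `fp16=not is_bfloat16_supported()` | Use **`fp16` (16-bit floating point) if the GPU does not support `bfloat16`** |
| `bf16=is_bfloat16_supported()` | Enable **`bfloat16` if the GPU supports it (more stable training)** |
| `logging_steps=10` | **Log training metrics every 10 steps** |
| `optim="adamw_8bit"` | Use **`adamw_8bit` (the 8-bit AdamW optimizer) to reduce GPU memory usage** |
| `weight_decay=0.01` | **Weight decay** (a regularization term similar to L2) to reduce overfitting |
| `lr_scheduler_type="linear"` | **Learning rate schedule** (linear decay) |
| `seed=3407` | **Random seed** (for reproducible results) |
| `output_dir="outputs"` | **Output directory** for training results |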
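
The batch size, step count, and schedule entries in the table interact, and a few lines of arithmetic make that visible. This is a minimal sketch, not the trainer's own code: the helper names are hypothetical, the 2000-row dataset size is an assumption, and `linear_lr` follows the usual linear warmup then linear decay shape.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Samples consumed per optimizer step."""
    return per_device * grad_accum * num_devices


def steps_per_epoch(num_samples, batch):
    """Approximate optimizer steps needed for one pass over the data."""
    return math.ceil(num_samples / batch)


def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


batch = effective_batch_size(per_device=2, grad_accum=4)
print(batch)                         # 8
print(60 * batch)                    # 480 samples seen in 60 steps
print(steps_per_epoch(2000, batch))  # 250, assuming 2000 training rows
print(linear_lr(5, 2e-4, warmup_steps=5, total_steps=60))  # 0.0002
```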

Then set up wandb (optional):

In [None]:
import wandb

In [None]:
wandb.login()  # reads the key from the WANDB_API_KEY environment variable, or prompts for it
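
Keeping the API key out of the notebook avoids leaking it when the notebook is shared. Here is a minimal sketch that reads it from the environment; the helper name `wandb_key_from_env` is hypothetical, while `WANDB_API_KEY` is the environment variable wandb itself looks for.

```python
import os


def wandb_key_from_env(env=os.environ, var="WANDB_API_KEY"):
    """Return the API key from the environment, or raise a clear error."""
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"Set {var} before logging in to wandb.")
    return key


# Usage inside the notebook (requires the wandb package):
# wandb.login(key=wandb_key_from_env())
```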

Then start fine-tuning:

In [None]:
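# Build the dataloader from the trainer, then inspect a single batch
train_dataloader = trainer.get_train_dataloader()
for batch in train_dataloader:
    inputs = batch["input_ids"]
    print(inputs.shape)  # confirm the shape is as expected
    break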


In [None]:
trainer_stats = trainer.train()

The wandb dashboard now shows the following:

<center><img src="https://ml2022.oss-cn-hangzhou.aliyuncs.com/img/image-20250206200441907.png" alt="image-20250206200441907" style="zoom:50%;" /></center>

In [None]:
trainer_stats

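Note that after fine-tuning finishes, the trained LoRA weights are already attached to the in-memory `model`, so the fine-tuned model can be called directly without manually merging the weights: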

In [None]:
FastLanguageModel.for_inference(model)

In [None]:
# Reuse the chat template and question defined earlier in the notebook
inputs = tokenizer([prompt_style_chat.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)

In [None]:
print(response[0].split("### Response:")[1])

Test the answer quality on a second question:

In [None]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a solid-state hydrogen storage big data artificial intelligence.
Please answer the following question. 


### Question:
{}

### Response:
<think>{}"""

In [None]:
question_1 = "What is the hydrogen storage data for component Ti"

In [None]:
inputs = tokenizer([prompt_style.format(question_1, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)

In [None]:
print(response[0].split("### Response:")[1])

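#### 3. Merging the model

The model weights saved locally are now in the `outputs` folder:

<center><img src="https://ml2022.oss-cn-hangzhou.aliyuncs.com/img/image-20250206195427494.png" alt="image-20250206195427494" style="zoom:50%;" /></center>

The following code can then be used to merge the model weights: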

In [None]:
new_model_local = "qing_gpt_100_tips_5_epoch"
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)
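
Conceptually, merging folds each low-rank update into its base weight using the standard LoRA formula W' = W + (lora_alpha / r) · B·A. The sketch below shows that arithmetic on a made-up 2 x 2 layer with plain Python lists; it illustrates the formula only and makes no claim about how `save_pretrained_merged` is implemented internally.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical 2 x 2 layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # shape (d_out, r)
A = [[0.5, 0.5]]    # shape (r, d_in)
merged = merge_lora(W, B, A, lora_alpha=1, r=1)
print(merged)  # [[1.5, 0.5], [1.0, 2.0]]
```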

Once saving finishes, the model appears in the current folder.

It can then be pushed to Hugging Face, saved as a GGUF file, and called from there.

In [None]:
%cd D:\llama.cpp

In [None]:
!python convert_lora_to_gguf.py DeepSeek-R1-Medical-COT

In [None]:
#model.save_pretrained_gguf("./model", tokenizer, quantization_method = "q4_k_m")
# model.save_pretrained_gguf("dir", tokenizer, quantization_method = "q8_0")
model.save_pretrained_gguf("./model", tokenizer, quantization_method = "f16")
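
The three `quantization_method` options trade file size against precision. Here is a rough size estimate, assuming approximate bits-per-weight figures: 16 for `f16` and 8.5 for `q8_0` follow from the formats, while the `q4_k_m` value is only a ballpark that varies by model. The 8-billion parameter count and the helper name are also illustrative assumptions.

```python
# Approximate bits per weight; the q4_k_m figure in particular is a rough assumption.
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5, "q4_k_m": 4.85}


def estimated_size_gb(num_params, quantization_method):
    """Rough file size in GB (10**9 bytes), ignoring metadata."""
    bits = BITS_PER_WEIGHT[quantization_method]
    return num_params * bits / 8 / 1e9


# Hypothetical 8-billion-parameter model.
for method in ("f16", "q8_0", "q4_k_m"):
    print(method, round(estimated_size_gb(8e9, method), 1))
```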

### III. Full Parameter-Efficient Fine-Tuning Experiment

&emsp;&emsp;Next, we fine-tune on the full dataset to improve the fine-tuning results.

In [None]:
dataset = load_dataset("Yy245/qing_chat","en", split = "train",trust_remote_code=True)
dataset = dataset.map(formatting_prompts_func, batched = True,)
dataset["text"][0]

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)

Here the number of epochs is set to 1, so the dataset is traversed once:

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs = 1,
        warmup_steps=5,
        # max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

In [None]:
trainer_stats = trainer.train()

Training took about 15 hours in total.

In [None]:
trainer_stats

Finally, save the model weights:

In [None]:
new_model_local = "DeepSeek-R1-Medical-COT-1-epoch"
model.save_pretrained(new_model_local) 
tokenizer.save_pretrained(new_model_local)

model.save_pretrained_merged(new_model_local, tokenizer, save_method = "merged_16bit",)