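# Outline of This Section

1. Introduction to LLM training, 10 min
2. Training and fine-tuning methods, 10 min
3. Fine-tuning lab 1: OpenAI GPT-3.5, 30 min
4. Fine-tuning lab 2: ChatGLM2 (P-Tuning), 30 min
5. Fine-tuning lab 3: ChatGLM3 (LoRA), 30 min
6. Q&A and summary, 10 min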

In [None]:
!pip install jsonlines openai tiktoken numpy  # csv is part of the Python standard library

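### Language models vs. large language models

In [None]:
# Single-task model: solves one task in one specific domain

# Large language model: handles all kinds of natural-language tasks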

### Limitations of LLMs

In [None]:
# 1. Token (context-length) limits

# 2. Training

# 3. Personalization

# 4. Cost

# 5. Data security

### Ways to add knowledge to an LLM

In [None]:
# 1. LLM + few-shot prompting

# 2. LLM + retrieval-augmented generation (RAG)

# 3. Fine-tuning a foundation LLM

# 4. Training a base LLM from scratch

### Steps in training an LLM

In [None]:
# 1. Unsupervised training on a very large dataset (Pretraining / Self-Supervised / Unsupervised)

# 2. Fine-tuning on a small dataset (Fine-Tuning / Supervised)

# 3. Alignment on a small dataset (Alignment / RLHF)

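### Ways to fine-tune a model

In [None]:
# 1. Full-parameter fine-tuning (full fine-tuning; not the same as SFT, which names the supervised stage above and can use any of these methods)
# As in pretraining, every parameter is updated: forward pass -> loss -> backward pass (compute gradients) -> parameter update.

# 2. Freeze fine-tuning (Freeze)
# Freeze most of the model's layers and train only the last few. This keeps the general features the model has already learned while letting it adapt to the new data.
# The idea comes from computer vision: to train a model that recognizes cows you need not start from scratch; you can fine-tune a model already trained to recognize sheep.

# 3. Adding an extra output layer
# Build on the language model's learned representations and add an extra output layer on top.
# The idea comes from NLP: instead of training a separate model for each task, add a different output layer per task.

# 4. Parameter-efficient fine-tuning (PEFT)
# Add a small number of extra trainable parameters to an already trained model and train only those, e.g. low-rank updates to the weights (LoRA) or trainable prompt embeddings (P-Tuning).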
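
Options 1 and 2 above differ only in which parameters the update step touches. A minimal NumPy sketch on a toy two-layer linear model (hypothetical, not an LLM) makes this concrete: one step runs forward pass, loss, backward pass and update, and with `freeze_first=True` the earlier layer keeps its "pretrained" values while only the last layer is trained.

```python
import numpy as np

# Hypothetical toy "pretrained" model: y = W2 @ (W1 @ x), two linear layers
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))       # earlier layer
W2 = rng.normal(size=(2, 4))       # last layer
x = rng.normal(size=3)             # one training input
y_target = np.array([1.0, -1.0])   # its label

def train_step(W1, W2, lr=0.01, freeze_first=False):
    """One step: forward pass -> loss -> backward pass (gradients) -> update."""
    h = W1 @ x                          # forward through layer 1
    y = W2 @ h                          # forward through layer 2
    err = y - y_target
    loss = 0.5 * float(err @ err)       # squared-error loss
    grad_W2 = np.outer(err, h)          # dL/dW2
    grad_W1 = np.outer(W2.T @ err, x)   # dL/dW1, backpropagated through W2
    new_W2 = W2 - lr * grad_W2
    # Freeze: skip the update for the earlier layer, train only the last one
    new_W1 = W1 if freeze_first else W1 - lr * grad_W1
    return new_W1, new_W2, loss

# Full fine-tuning vs. freeze fine-tuning, starting from the same weights
W1_full, W2_full, _ = train_step(W1, W2)
W1_frozen, W2_frozen, _ = train_step(W1, W2, freeze_first=True)
print(np.array_equal(W1_full, W1))    # False: layer 1 was updated
print(np.array_equal(W1_frozen, W1))  # True: layer 1 kept its pretrained values
```

In a real framework the same effect comes from excluding the frozen parameters from gradient computation or from the optimizer, which also saves memory.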

### Methods for fine-tuning LLMs

In [139]:
# PEFT (Parameter-Efficient Fine-Tuning), Hugging Face's open-source library for fine-tuning large models
# https://huggingface.co/blog/peft

#### Prefix Tuning

In [None]:
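# 1. Inspired by prompt engineering: without changing the model's parameters, adding suitable prompts to the context steers the model toward good output
# 2. Trainable prefixes are added to both the encoder and the decoder of the Transformer, at every layer
# 3. Schematically, W in Y = WX becomes W' = [W_p; W], so Y = W'X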

![Prefix-Tuning](./resource/images/Prefix-Tuning.png)
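
The matrix formulation above is a schematic. In the Prefix-Tuning paper the trainable prefix is a set of activations prepended at every layer, which inside attention amounts to extra key and value vectors that the real tokens can attend to. A single-head, single-layer NumPy sketch (names and sizes are hypothetical; no batching and no causal mask):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention, no masking."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def prefix_attention(Q, K, V, P_k, P_v):
    """Prefix tuning at one layer (sketch).
    Q, K, V: (n, d), computed by the frozen model from the real tokens.
    P_k, P_v: (p, d), the trainable prefix keys/values for this layer."""
    K_ext = np.concatenate([P_k, K], axis=0)   # (p + n, d)
    V_ext = np.concatenate([P_v, V], axis=0)   # (p + n, d)
    return attention(Q, K_ext, V_ext)          # output is still (n, d)

rng = np.random.default_rng(0)
n, d, p = 5, 8, 3                              # hypothetical sizes
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
P_k, P_v = rng.normal(size=(p, d)), rng.normal(size=(p, d))
out = prefix_attention(Q, K, V, P_k, P_v)
print(out.shape)   # (5, 8)
```

Only `P_k` and `P_v` (2 x p x d values per layer) would be trained; the prefix positions produce no queries, so the output length is unchanged.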

In [7]:
# Prefix-Tuning: Optimizing Continuous Prompts for Generation
# https://aclanthology.org/2021.acl-long.353.pdf

#### Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning

In [149]:
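# 1. The foundation model's parameters stay unchanged
# 2. For each task, train a small set of task-specific parameters and load them on demand when running that task
# 3. Prepend a fixed number of special (trainable) tokens to the input sequence X to raise the probability of generating the desired output
# 4. X = [x1, x2, ..., xm] becomes X' = [x'1, x'2, ..., x'k; x1, x2, ..., xm], and Y = WX'
# 5. Only the embedding stage is affected; no extra MLP is needed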

![PromptTuning](./resource/images/PromptTuning.png)
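
The X' construction in point 4 can be sketched in a few lines of NumPy. The vocabulary size, embedding width and number of virtual tokens are hypothetical; the frozen embedding table stands in for the foundation model's, and `soft_prompt` is the only thing that would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, k = 100, 16, 4               # hypothetical sizes

E = rng.normal(size=(vocab_size, d_model))        # frozen token-embedding table
soft_prompt = rng.normal(size=(k, d_model))       # trainable: x'1 ... x'k

def embed_with_soft_prompt(token_ids):
    """Return X' = [x'1, ..., x'k; x1, ..., xm] for the frozen model to consume."""
    x = E[np.asarray(token_ids)]                        # (m, d_model): x1 ... xm
    return np.concatenate([soft_prompt, x], axis=0)     # (k + m, d_model)

X_prime = embed_with_soft_prompt([5, 17, 42])
print(X_prime.shape)   # (7, 16)
```

Switching tasks then means swapping in a different `soft_prompt` array while the model itself stays the same.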

In [150]:
# The Power of Scale for Parameter-Efficient Prompt Tuning
# https://arxiv.org/pdf/2104.08691.pdf

#### P-Tuning: GPT Understands, Too

In [4]:
# 1. Turns the prompt into a learnable embedding layer
# 2. Processes the prompt embeddings with a prompt encoder (LSTM + MLP) before they reach the model

![P-Tuning](./resource/images/P-Tuning.png)

In [None]:
# GPT Understands, Too
# https://arxiv.org/pdf/2103.10385.pdf

#### P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks

![P-Tuning_v2](./resource/images/P-Tuning_v2.png)

In [None]:
# P-Tuning adds new parameters only at the model's embedding layer
# P-Tuning v2 adds new parameters at the embedding layer and in front of every layer
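
A back-of-the-envelope comparison of how many new parameters each variant adds. It assumes standard multi-head attention, where each layer's prefix is one key vector and one value vector of size `d_model` per virtual token (models with grouped or multi-query attention store smaller prefixes), and it ignores the weights of P-Tuning's LSTM/MLP prompt encoder. The sizes are hypothetical, roughly those of a 6B model.

```python
def ptuning_params(k, d_model):
    """Virtual-token embeddings at the input layer only."""
    return k * d_model

def ptuning_v2_params(k, d_model, n_layers):
    """A key prefix and a value prefix of length k in every layer."""
    return n_layers * 2 * k * d_model

k, d_model, n_layers = 32, 4096, 28             # hypothetical sizes
print(ptuning_params(k, d_model))               # 131072
print(ptuning_v2_params(k, d_model, n_layers))  # 7340032
```

Both counts are tiny next to billions of frozen weights, but v2 gives the prompt direct influence on every layer rather than only on the input.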

In [None]:
# https://arxiv.org/pdf/2110.07602.pdf

#### LoRA: Low-Rank Adaptation of Large Language Models

In [None]:
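# Fine-tuning LLMs via low-rank adaptation
# LoRA adds extra low-rank matrices to selected weight matrices of the LLM, and during training only these added parameters are trained.
# When the rank is much smaller than the original weight dimensions, the added matrices have very few parameters, so good results come from training only a small number of parameters.
# Difference from P-Tuning: LoRA learns a low-rank update to existing weight matrices, while P-Tuning leaves the weights untouched and learns continuous prompt embeddings fed in with the input.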

![LoRA](./resource/images/LoRA.png)
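
A NumPy sketch of a single LoRA-adapted linear layer. It follows the initialization and scaling described in the LoRA paper (Gaussian `A`, zero `B`, update scaled by alpha/r); the matrix sizes, rank and alpha are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 64, 64, 4, 8             # hypothetical: W is d x k, rank r << min(d, k)

W = rng.normal(size=(d, k))               # pretrained weight, frozen
A = rng.normal(scale=0.01, size=(r, k))   # trainable, Gaussian init
B = np.zeros((d, r))                      # trainable, zero init, so B @ A = 0 at the start

def lora_forward(x):
    """h = W x + (alpha / r) * B A x; only A and B receive gradient updates."""
    return W @ x + (alpha / r) * (B @ (A @ x))

def merged_weight():
    """After training, the update can be folded back into a single matrix."""
    return W + (alpha / r) * (B @ A)

x = rng.normal(size=k)
print(np.allclose(lora_forward(x), W @ x))   # True: untrained LoRA leaves outputs unchanged
print(d * k, r * (d + k))                    # 4096 512: full vs. LoRA trainable parameters
```

Because the update can be merged into `W`, a LoRA model adds no extra inference latency once training is done.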

In [152]:
# https://arxiv.org/pdf/2106.09685.pdf
# Microsoft Corporation

#### LLM-Adapters

In [None]:
# An extension of the PEFT library
# AdapterH: Parameter-Efficient Transfer Learning for NLP
# AdapterP: MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
# Parallel: Towards a Unified View of Parameter-Efficient Transfer Learning

#### Datasets

In [None]:
# Hugging Face Datasets Hub
# https://huggingface.co/datasets

### Lab 1 - OpenAI GPT-3.5 Fine-Tuning

In [None]:
# https://platform.openai.com/docs/guides/fine-tuning

#### Choosing a base model

In [140]:
# babbage-002 & davinci-002
# gpt-3.5-turbo & gpt-3.5-turbo-0613(4k) & gpt-3.5-turbo-1106(16k)
# gpt-4-0613

In [None]:
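# OpenAI does not document the method behind its fine-tuning; it is generally assumed
# to be a parameter-efficient method (PEFT) rather than full-parameter fine-tuning

In [None]:
# All of the following steps can also be done in OpenAI's fine-tuning web UI
# https://platform.openai.com/finetune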

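#### About token limits

In [None]:
# Models have a token limit, including at training time; e.g. for GPT-3.5 each training example is limited to 4096 tokens
# Count tokens with tiktoken as shown in Lesson 1, or truncate anything over the limit.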

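#### Training cost

In [2]:
# Fine-tuning OpenAI models
# https://openai.com/api/pricing

# Cost = total training tokens / 1000 * price per 1,000 tokens * number of epochs
# For local fine-tuning: training time in hours * hourly price of the rented machine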
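
The two cost formulas as code. The token count (9673) and epoch count (30) are the ones reported by the cost estimate later in this notebook; the price of 0.008 USD per 1K training tokens is an assumption for gpt-3.5-turbo at the time of writing, so check the pricing page before relying on it.

```python
def openai_fine_tuning_cost(total_tokens, price_per_1k_tokens, n_epochs):
    """Tokens in the training file / 1000 * price per 1K tokens * number of epochs."""
    return total_tokens / 1000 * price_per_1k_tokens * n_epochs

def local_fine_tuning_cost(hours, price_per_hour):
    """Rented-GPU cost: training time in hours * hourly price."""
    return hours * price_per_hour

# 0.008 USD per 1K training tokens is an ASSUMED price
print(f"${openai_fine_tuning_cost(9673, 0.008, 30):.2f}")   # $2.32
```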

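#### Data format

In [None]:
# Provide at least 10 examples (fine-tuning on 50 to 100 training examples usually brings a clear improvement)
# Start with around 50 carefully crafted examples

# Data format
# https://github.com/openai/openai-cookbook/blob/main/examples/data/toy_chat_fine_tuning.jsonl

# babbage-002 & davinci-002
# {"prompt": "<text submitted by the user>", "completion": "<text the LLM should reply with>"}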

# gpt-3.5-turbo & gpt-4
# {"messages": [{"role": "system", "content": "你是一个墨问西东知识助手"}, {"role": "user", "content": "谁是墨问西东颜值最高的人？"}, {"role": "assistant", "content": "肯定是瑶瑶，遥遥领先"}]}
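
A minimal sketch of producing both record formats with the standard library. The helper names are hypothetical; what the API needs is only that each line of the uploaded file is one JSON object in the shape shown above.

```python
import json

def chat_example(system, user, assistant):
    """One training record for gpt-3.5-turbo / gpt-4 (chat format)."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]}

def completion_example(prompt, completion):
    """One training record for babbage-002 / davinci-002."""
    return {"prompt": prompt, "completion": completion}

def to_jsonl(records):
    """One JSON object per line; ensure_ascii=False keeps Chinese text readable."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)

print(to_jsonl([chat_example("You are a helpful assistant.", "Hi", "Hello!")]), end="")
```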

#### Data preparation

In [6]:
import csv

# 1. Load the source file (each row: question in column 0, answer in column 1)
file_path = './resource/files/knowledge_base.csv'
with open(file_path, 'r', encoding='utf-8', newline='') as file:
    reader = csv.reader(file)
    # 2. Preprocess: turn each row into one chat-format training example
    data = []
    system_content = "你是墨问西东的智能体"
    for row in reader:
        user_content = row[0]
        assistant_content = row[1]
        data.append({"messages": [{"role": "system", "content": system_content}, {"role": "user", "content": user_content}, {"role": "assistant", "content": assistant_content}]})

In [7]:
import jsonlines

# 3. Write the final dataset to a JSONL file
with jsonlines.open('data.jsonl', 'w') as f:
    for item in data:
        f.write(item)

#### Data validation

In [8]:
# https://cookbook.openai.com/examples/chat_finetuning_data_prep

In [9]:
import json
import tiktoken
import numpy as np
from collections import defaultdict

data_path = "data.jsonl"
with open(data_path, 'r', encoding='utf-8') as f:
    dataset = [json.loads(line) for line in f]

# Quick check of the number of examples
print("Num examples:", len(dataset))
print("-----")
print("First example:")
for message in dataset[0]["messages"]:
    print(message)

Num examples: 63
-----
First example:
{'role': 'system', 'content': '你是墨问西东的智能体'}
{'role': 'user', 'content': '墨问西东的创始人是谁？'}
{'role': 'assistant', 'content': '池建强。'}


In [10]:
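# Checks: each example is a dict
# the messages list exists
# each message has the right keys (role, content)
# each role is valid (system, user, assistant)
# each content is a string
# each example contains at least one assistant message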

format_errors = defaultdict(int)

for ex in dataset:
    if not isinstance(ex, dict):
        format_errors["data_type"] += 1
        continue
        
    messages = ex.get("messages", None)
    if not messages:
        format_errors["missing_messages_list"] += 1
        continue
        
    for message in messages:
        if "role" not in message or "content" not in message:
            format_errors["message_missing_key"] += 1
        
        if any(k not in ("role", "content", "name", "function_call") for k in message):
            format_errors["message_unrecognized_key"] += 1
        
        if message.get("role", None) not in ("system", "user", "assistant", "function"):
            format_errors["unrecognized_role"] += 1
            
        content = message.get("content", None)
        function_call = message.get("function_call", None)
        
        if (not content and not function_call) or not isinstance(content, str):
            format_errors["missing_content"] += 1
    
    if not any(message.get("role", None) == "assistant" for message in messages):
        format_errors["example_missing_assistant_message"] += 1

if format_errors:
    print("Found errors:")
    for k, v in format_errors.items():
        print(f"{k}: {v}")
else:
    print("No errors found") 

No errors found


In [11]:
# Check whether any example exceeds the token limit

encoding = tiktoken.get_encoding("cl100k_base")

# not exact!
# simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3
    return num_tokens

def num_assistant_tokens_from_messages(messages):
    num_tokens = 0
    for message in messages:
        if message["role"] == "assistant":
            num_tokens += len(encoding.encode(message["content"]))
    return num_tokens

def print_distribution(values, name):
    print(f"\n#### Distribution of {name}:")
    print(f"min / max: {min(values)}, {max(values)}")
    print(f"mean / median: {np.mean(values)}, {np.median(values)}")
    print(f"p10 / p90: {np.quantile(values, 0.1)}, {np.quantile(values, 0.9)}")
    

In [12]:
# Warnings and tokens counts
n_missing_system = 0
n_missing_user = 0
n_messages = []
convo_lens = []
assistant_message_lens = []

for ex in dataset:
    messages = ex["messages"]
    if not any(message["role"] == "system" for message in messages):
        n_missing_system += 1
    if not any(message["role"] == "user" for message in messages):
        n_missing_user += 1
    n_messages.append(len(messages))
    convo_lens.append(num_tokens_from_messages(messages))
    assistant_message_lens.append(num_assistant_tokens_from_messages(messages))
    
print("Num examples missing system message:", n_missing_system)
print("Num examples missing user message:", n_missing_user)
print_distribution(n_messages, "num_messages_per_example")
print_distribution(convo_lens, "num_total_tokens_per_example")
print_distribution(assistant_message_lens, "num_assistant_tokens_per_example")
n_too_long = sum(l > 4096 for l in convo_lens)
print(f"\n{n_too_long} examples may be over the 4096 token limit, they will be truncated during fine-tuning")

Num examples missing system message: 0
Num examples missing user message: 0

#### Distribution of num_messages_per_example:
min / max: 3, 3
mean / median: 3.0, 3.0
p10 / p90: 3.0, 3.0

#### Distribution of num_total_tokens_per_example:
min / max: 47, 833
mean / median: 153.53968253968253, 129.0
p10 / p90: 62.2, 268.00000000000034

#### Distribution of num_assistant_tokens_per_example:
min / max: 6, 785
mean / median: 107.98412698412699, 83.0
p10 / p90: 21.6, 216.8000000000003

0 examples may be over the 4096 token limit, they will be truncated during fine-tuning


In [16]:
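# Cost estimate

# Pricing and n_epochs estimate (adapted from the OpenAI cookbook). The cookbook's default-epoch
# heuristic is overridden here so that the estimate uses 30 epochs, matching n_epochs=30 in the job below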
MAX_TOKENS_PER_EXAMPLE = 4096

TARGET_EPOCHS = 30
MIN_TARGET_EXAMPLES = 63
MAX_TARGET_EXAMPLES = 63
MIN_DEFAULT_EPOCHS = 30
MAX_DEFAULT_EPOCHS = 30

n_epochs = TARGET_EPOCHS
n_train_examples = len(dataset)
if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
    n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
    n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)

n_billing_tokens_in_dataset = sum(min(MAX_TOKENS_PER_EXAMPLE, length) for length in convo_lens)
print(f"Dataset has {n_billing_tokens_in_dataset} tokens that will be charged for during training")
print(f"By default, you'll train for {n_epochs} epochs on this dataset")
print(f"By default, you'll be charged for {n_epochs * n_billing_tokens_in_dataset} tokens")

Dataset has 9673 tokens that will be charged for during training
By default, you'll train for 30 epochs on this dataset
By default, you'll be charged for 290190 tokens


#### Upload the file

In [17]:
from openai import OpenAI
client = OpenAI()

In [18]:
client.files.create(
  file=open("data.jsonl", "rb"),
  purpose="fine-tune"
)

FileObject(id='file-fzMov5PZeEDD35HJh2d5z1vZ', bytes=30137, created_at=1699933235, filename='data.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)

#### List files

In [19]:
client.files.list()

SyncPage[FileObject](data=[FileObject(id='file-fzMov5PZeEDD35HJh2d5z1vZ', bytes=30137, created_at=1699933235, filename='data.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)], object='list', has_more=False)

#### Delete files

In [26]:
# Delete a single file (run only if needed)
client.files.delete(file_id='file-xxxxxxxx')  # replace with your own file ID (starts with "file-")

FileDeleted(id='file-vsKPhicaD3HsTJ7SuyR4RD8W', deleted=True, object='file')

In [230]:
# Delete all files (run only if needed)
for item in client.files.list().data:
    client.files.delete(file_id=item.id)

#### View file contents

In [21]:
print(client.files.content(file_id="file-fzMov5PZeEDD35HJh2d5z1vZ").text)

{"messages": [{"role": "system", "content": "你是墨问西东的智能体"}, {"role": "user", "content": "墨问西东的创始人是谁？"}, {"role": "assistant", "content": "池建强。"}]}
{"messages": [{"role": "system", "content": "你是墨问西东的智能体"}, {"role": "user", "content": "可以介绍下池建强吗？"}, {"role": "assistant", "content": "池建强曾任用友集团瑞友科技应用研究院副院长、锤子科技的研发总监，2017 年，他创建了极客时间App，2022 年，创立墨问西东科技。他70 后创业者，他从程序员起步，做到产品创始人，持续写作超过十年， 是公众号MacTalk的主理人，著有图书《MacTalk • 人生元编程》 和 《MacTallk • 跨越边界》。"}]}
{"messages": [{"role": "system", "content": "你是墨问西东的智能体"}, {"role": "user", "content": "墨问西东这个名字怎么理解？"}, {"role": "assistant", "content": "为什么是墨，因为在中国，笔墨从某种程度上代表了创作。写作、绘画，皆以笔墨为工具。我将何以回应世界？以深情，以热泪，以笔墨。西东怎么理解？但行好事，无问西东。"}]}
{"messages": [{"role": "system", "content": "你是墨问西东的智能体"}, {"role": "user", "content": "墨问西东这家公司主要做什么？"}, {"role": "assistant", "content": "公司的使命是“让创作成为一种生活方式”，他们想为创作者造一个好的创作和变现工具。"}]}
{"messages": [{"role": "system", "content": "你是墨问西东的智能体"}, {"role": "user", "content": "墨问西东有什么产品？"}, {"role": "assistant", "content": "微信小程序搜索墨问便

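#### Create a fine-tuning job

In [22]:
# Hyperparameters
# n_epochs: number of passes over the training file
# learning_rate_multiplier: multiplier applied to the base learning rate
# batch_size: number of examples per training step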

In [24]:
client.fine_tuning.jobs.create(
  training_file="file-fzMov5PZeEDD35HJh2d5z1vZ", 
  model="gpt-3.5-turbo", 
  hyperparameters={
    "n_epochs": 30
  }
)

FineTuningJob(id='ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv', created_at=1699933399, error=None, fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs=30, batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0613', object='fine_tuning.job', organization_id='org-gC0E2TIUytmQoqdPh9UI1M6f', result_files=[], status='validating_files', trained_tokens=None, training_file='file-fzMov5PZeEDD35HJh2d5z1vZ', validation_file=None)

#### List fine-tuning jobs

In [43]:
client.fine_tuning.jobs.list().model_dump()

{'data': [{'id': 'ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv',
   'created_at': 1699933399,
   'error': None,
   'fine_tuned_model': None,
   'finished_at': None,
   'hyperparameters': {'n_epochs': 30,
    'batch_size': 1,
    'learning_rate_multiplier': 2},
   'model': 'gpt-3.5-turbo-0613',
   'object': 'fine_tuning.job',
   'organization_id': 'org-gC0E2TIUytmQoqdPh9UI1M6f',
   'result_files': [],
   'status': 'running',
   'trained_tokens': None,
   'training_file': 'file-fzMov5PZeEDD35HJh2d5z1vZ',
   'validation_file': None},
  {'id': 'ftjob-SpVcNF6rw6x3Q3dg7P8Q568o',
   'created_at': 1699928568,
   'error': None,
   'fine_tuned_model': 'ft:gpt-3.5-turbo-0613:personal::8KdZpNvR',
   'finished_at': 1699929384,
   'hyperparameters': {'n_epochs': 30,
    'batch_size': 1,
    'learning_rate_multiplier': 2},
   'model': 'gpt-3.5-turbo-0613',
   'object': 'fine_tuning.job',
   'organization_id': 'org-gC0E2TIUytmQoqdPh9UI1M6f',
   'result_files': ['file-57h4l33sDvQDx9zd09Dir3oT'],
   'status': 'succee

#### List fine-tuning job events

In [68]:
client.fine_tuning.jobs.list_events(fine_tuning_job_id='ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv').model_dump()

{'data': [{'id': 'ftevent-7tsl3yTZOt35aJixrIADmAfD',
   'created_at': 1699937087,
   'level': 'info',
   'message': 'The job has successfully completed',
   'object': 'fine_tuning.job.event',
   'data': {},
   'type': 'message'},
  {'id': 'ftevent-mbRkzpKMYVzqWahAtK3FYhaz',
   'created_at': 1699937085,
   'level': 'info',
   'message': 'New fine-tuned model created: ft:gpt-3.5-turbo-0613:personal::8Kfa0sr2',
   'object': 'fine_tuning.job.event',
   'data': {},
   'type': 'message'},
  {'id': 'ftevent-TkySVFah0FvLAmowV1GNW7aP',
   'created_at': 1699936896,
   'level': 'info',
   'message': 'Step 1801/1890: training loss=0.00',
   'object': 'fine_tuning.job.event',
   'data': {'step': 1801,
    'train_loss': 9.308399057772476e-06,
    'train_mean_token_accuracy': 1.0},
   'type': 'metrics'},
  {'id': 'ftevent-mW125DKSCv61nClsSy7hIyoD',
   'created_at': 1699936695,
   'level': 'info',
   'message': 'Step 1701/1890: training loss=0.00',
   'object': 'fine_tuning.job.event',
   'data': {'st

#### Fine-tuning job details

In [61]:
# While training, the status is running and no model name is returned yet
client.fine_tuning.jobs.retrieve(fine_tuning_job_id='ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv').model_dump()

{'id': 'ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv',
 'created_at': 1699933399,
 'error': None,
 'fine_tuned_model': None,
 'finished_at': None,
 'hyperparameters': {'n_epochs': 30,
  'batch_size': 1,
  'learning_rate_multiplier': 2},
 'model': 'gpt-3.5-turbo-0613',
 'object': 'fine_tuning.job',
 'organization_id': 'org-gC0E2TIUytmQoqdPh9UI1M6f',
 'result_files': [],
 'status': 'running',
 'trained_tokens': None,
 'training_file': 'file-fzMov5PZeEDD35HJh2d5z1vZ',
 'validation_file': None}

In [70]:
# After training, the status is succeeded and the model name is given in fine_tuned_model
client.fine_tuning.jobs.retrieve(fine_tuning_job_id='ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv').model_dump()

{'id': 'ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv',
 'created_at': 1699933399,
 'error': None,
 'fine_tuned_model': 'ft:gpt-3.5-turbo-0613:personal::8Kfa0sr2',
 'finished_at': 1699937083,
 'hyperparameters': {'n_epochs': 30,
  'batch_size': 1,
  'learning_rate_multiplier': 2},
 'model': 'gpt-3.5-turbo-0613',
 'object': 'fine_tuning.job',
 'organization_id': 'org-gC0E2TIUytmQoqdPh9UI1M6f',
 'result_files': ['file-rxQxiPKeM7IaljGuSSGc8DNn'],
 'status': 'succeeded',
 'trained_tokens': 286410,
 'training_file': 'file-fzMov5PZeEDD35HJh2d5z1vZ',
 'validation_file': None}
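
Rather than re-running the retrieve cell by hand until the status changes from running to succeeded, the call can be wrapped in a polling loop. The helper below is a hypothetical sketch, not part of the openai SDK; it assumes the terminal statuses are succeeded, failed and cancelled, and takes `sleep` as a parameter so it can be tested without waiting.

```python
import time

# Statuses after which a fine-tuning job no longer changes (assumed list)
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}

def wait_for_fine_tune(client, job_id, poll_seconds=60, sleep=time.sleep):
    """Poll client.fine_tuning.jobs.retrieve until the job finishes.
    Returns the fine_tuned_model name on success, raises RuntimeError otherwise."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            break
        sleep(poll_seconds)
    if job.status != "succeeded":
        raise RuntimeError(f"Fine-tuning job {job_id} ended with status {job.status!r}")
    return job.fine_tuned_model

# Usage with the real client (blocks until the job finishes):
# model_name = wait_for_fine_tune(client, "ftjob-HwEOgdT6B5Wrpw9FLNyIAZWv")
```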

#### Cancel a fine-tuning job

In [None]:
client.fine_tuning.jobs.cancel("ftjob-xxxxxxxx")  # replace with your own job ID (starts with "ftjob-")

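#### Delete fine-tuned models

In [None]:
# Delete a single fine-tuned model
client.models.delete('ft:gpt-3.5-turbo-0613:personal::xxxxxxxx')  # replace with your own model ID

In [None]:
# Delete all fine-tuned models; jobs that produced no model (still running, failed or cancelled) are skipped
for item in client.fine_tuning.jobs.list().data:
    if item.fine_tuned_model:
        client.models.delete(model=item.fine_tuned_model)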

#### Generate text with the base model

In [79]:
def completion(prompt): 
    response = client.chat.completions.create(
      model="gpt-3.5-turbo-1106",
      messages=[
        {"role": "user", "content": prompt}
      ]
    )
    return response.choices[0].message.content

In [74]:
completion("池老师是谁")

'池老师的身份并不明确，可能是某个学校或机构的教师，也有可能是某个行业或领域的专家。如果想要了解更多关于池老师的信息，建议直接向相关的学校、机构或领域进行咨询。'

In [75]:
completion("大师是谁")

'“大师”可能指的是不同领域的专家或者精通某一技艺的人，因此具体指的是谁取决于上下文。常见的使用场景包括武术大师、音乐大师、艺术大师等等。如果你有特定的大师要问，还请提供更多信息。'

In [81]:
completion("瑶瑶是谁")

'抱歉，我无法回答这个问题，因为我不清楚具体的情况。能为您提供更多信息吗？'

#### Generate text with the fine-tuned model

In [82]:
def completion(prompt): 
    response = client.chat.completions.create(
      model="ft:gpt-3.5-turbo-0613:personal::8Kfa0sr2",
      messages=[
        {"role": "system", "content": "你是墨问西东的智能体"},
        {"role": "user", "content": prompt}
      ],
      temperature=0,
    )
    return response.choices[0].message.content

In [86]:
completion("墨问西东的CEO是谁")

'池建强。'

In [87]:
completion("介绍下池建强")

'池建强曾任用友集团瑞友科技高级总监、锤子科技的研发总监，2017 年，他加入极客时间，担任极客时间的 CTO，带领团队完成了这个从0到1的创业项目。池建强具有 10 年以上的研发和管理经验，技术广度和管理深度兼具。'

In [91]:
completion("田海是谁")

'田海是墨问西东的CTO，他负责的技术团队负责为公司和产品生成技术价值。'

In [92]:
completion("小盖和瑶瑶是什么关系")

'小盖和瑶瑶是同事，都是墨问西东的成员。小盖加入时，瑶瑶负责他的离职证明和入职手续。虽然并没有太大的关系，但是瑶瑶还是希望你们能够安心而来。'

### Lab 2 - ChatGLM2 Fine-Tuning (P-Tuning)

#### Dependencies

In [None]:
!git clone https://github.com/THUDM/ChatGLM2-6B

In [2]:
!pip install transformers==4.27.1
!pip install rouge_chinese nltk jieba datasets

Collecting transformers==4.27.1
  Using cached transformers-4.27.1-py3-none-any.whl (6.7 MB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.27.1)
  Using cached tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.14.1
    Uninstalling tokenizers-0.14.1:
      Successfully uninstalled tokenizers-0.14.1
  Attempting uninstall: transformers
    Found existing installation: transformers 4.34.1
    Uninstalling transformers-4.34.1:
      Successfully uninstalled transformers-4.34.1
Successfully installed tokenizers-0.13.3 transformers-4.27.1


In [None]:
!cd ChatGLM2-6B && pip install -r requirements.txt

#### Edit the scripts

In [None]:
# Edit ./ptuning/train.sh

PRE_SEQ_LEN=32
LR=2e-2
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --validation_file AdvertiseGen/dev.json \
    --preprocessing_num_workers 10 \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path '/home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/weights' \
    --output_dir output/adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 128 \
    --max_target_length 128 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --predict_with_generate \
    --max_steps 3000 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN

In [None]:
# Edit ./ptuning/evaluate.sh

PRE_SEQ_LEN=32
CHECKPOINT=adgen-chatglm2-6b-pt-32-2e-2  # must match train.sh's output_dir: adgen-chatglm2-6b-pt-$PRE_SEQ_LEN-$LR
STEP=3000
NUM_GPUS=1

torchrun --standalone --nnodes=1 --nproc-per-node=$NUM_GPUS main.py \
    --do_predict \
    --validation_file AdvertiseGen/dev.json \
    --test_file AdvertiseGen/dev.json \
    --overwrite_cache \
    --prompt_column content \
    --response_column summary \
    --model_name_or_path '/home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/weights' \
    --ptuning_checkpoint ./output/$CHECKPOINT/checkpoint-$STEP \
    --output_dir ./output/$CHECKPOINT \
    --overwrite_output_dir \
    --max_source_length 128 \
    --max_target_length 128 \
    --per_device_eval_batch_size 1 \
    --predict_with_generate \
    --pre_seq_len $PRE_SEQ_LEN \
    --quantization_bit 4

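#### Data preparation

In [None]:
# Ad-copy generation dataset (AdvertiseGen); train.sh and evaluate.sh read it from ptuning/AdvertiseGen/
!mkdir -p ChatGLM2-6B/ptuning/AdvertiseGen
!wget -O ChatGLM2-6B/ptuning/AdvertiseGen/dev.json "https://huggingface.co/datasets/shibing624/AdvertiseGen/resolve/main/dev.json?download=true"
!wget -O ChatGLM2-6B/ptuning/AdvertiseGen/train.json "https://huggingface.co/datasets/shibing624/AdvertiseGen/resolve/main/train.json?download=true"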
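
To fine-tune on your own data instead of AdvertiseGen, the files only need the same layout. A sketch, assuming the AdvertiseGen files are JSON Lines (one object per line) and that the keys must match `--prompt_column content` and `--response_column summary` in train.sh; the helper names and the file name in the usage comment are hypothetical.

```python
import json

def write_adgen_jsonl(pairs, path):
    """Write (prompt, response) pairs as JSON Lines using the keys that
    train.sh selects: --prompt_column content, --response_column summary."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, response in pairs:
            record = {"content": prompt, "summary": response}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")

def read_adgen_jsonl(path):
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical usage with your own data:
# write_adgen_jsonl([("类型#上衣*颜色#白色", "<your ad copy>")], "ChatGLM2-6B/ptuning/AdvertiseGen/my_train.json")
```

Point `--train_file` / `--validation_file` at the new files, or keep the column names and change the two `--*_column` flags instead.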

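#### Run the fine-tuning job

In [None]:
# Fine-tune (the scripts use relative paths, so run them from the ptuning directory)
!cd ChatGLM2-6B/ptuning && bash train.sh

In [None]:
# Evaluate
!cd ChatGLM2-6B/ptuning && bash evaluate.sh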

#### Before fine-tuning

In [1]:
import os 
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig


MODEL_PATH = "/home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/weights/"
 
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()

model = model.eval()

In [2]:
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

你好👋！我是人工智能助手 ChatGLM2-6B，很高兴见到你，欢迎问我任何问题。


In [3]:
response, history = model.chat(tokenizer, "类型#裙版型#显瘦风格#文艺风格#简约图案#印花图案#撞色裙下摆#压褶裙长#连衣裙裙领型#圆领", history=[])
print(response)

简约印花撞色裙，优雅文艺风，修身显瘦，柔软舒适，气质优雅，设计独特，穿着舒适，优雅清新，凸显气质，设计别致，时尚潮流，优雅简约，清新自然，流畅线条，舒适面料，修身剪裁，衬托完美身材，美丽清新，自然舒适。


In [4]:
response, history = model.chat(tokenizer, "类类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞", history=[])
print(response)

这是一些关于衣服的信息,其中包含了一些与衣服相关的词汇,以及一些描述性词汇,如材质、颜色、风格、图案、衣样式和衣款式等。

- 上衣:指衣服的上半部分,通常是由布料或纤维制成的。
- 材质:指衣服所使用的材料,如牛仔布、棉布、丝绸等。
- 颜色:指衣服的颜色,如白色、黑色、蓝色等。
- 风格:指衣服的样式,如简约、休闲、正式等。
- 图案:指衣服上的花纹或图案,如刺绣、织物图案等。
- 衣样式:指衣服的款式,如宽松、修身、短袖等。
- 衣款式:指衣服的样式,如连身、半身、长裙等。
- 破洞:指衣服上的破洞或缝补处。


In [5]:
!nvidia-smi

Mon Nov 13 20:59:30 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4090 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   44C    P0             135W / 150W |  12549MiB / 16376MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

#### After fine-tuning

In [1]:
import os 
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig


MODEL_PATH = "/home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/weights/"
CHECKPOINT_PATH = "/home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/ptuning/output/adgen-chatglm2-6b-pt-32-2e-2/checkpoint-3000/"
 

config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True, pre_seq_len=32)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, config=config).half().cuda()

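# Load the P-Tuning checkpoint and keep only the trained prefix-encoder weights
prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
new_prefix_state_dict = {}

# Strip the "transformer.prefix_encoder." key prefix so the names match the prefix_encoder submodule
for k, v in prefix_state_dict.items():
    if k.startswith("transformer.prefix_encoder."):
        new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)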

model = model.eval()

Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /home/ml/PycharmProjects/LLMs/language/ChatGLM2-6B/weights/ and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [2]:
response, history = model.chat(tokenizer, "你好", history=[])
print(response)

这款裙子采用经典的半裙款式，不仅穿着舒适，而且设计感十足。同时，裙子的裙摆采用前短后长的设计，能够有效的拉长腿部线条，让你拥有大长腿的迷人气质。


In [3]:
response, history = model.chat(tokenizer, "类型#裙版型#显瘦风格#文艺风格#简约图案#印花图案#撞色裙下摆#压褶裙长#连衣裙裙领型#圆领", history=[])
print(response)

这款连衣裙整体的设计简约大气，穿着舒适，修饰身材。圆领的设计，修饰脖颈，显瘦显高。撞色的印花图案，彰显文艺风，凸显女性气质。腰部压褶的设计，凸显腰身，修身显瘦。


In [4]:
response, history = model.chat(tokenizer, "类类型#上衣*材质#牛仔布*颜色#白色*风格#简约*图案#刺绣*衣样式#外套*衣款式#破洞", history=[])
print(response)

这款牛仔外套采用经典百搭的牛仔面料，上身舒适自然，穿着更加耐磨耐用。袖口和领口采用简约的白色刺绣装饰，衬托出整体的时尚感。胸前采用破洞设计，更加有层次感。


In [5]:
!nvidia-smi

Mon Nov 13 21:00:35 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 4090 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P0             136W / 150W |  12539MiB / 16376MiB |     85%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

### Lab 3 - ChatGLM3 Fine-Tuning (LoRA)

In [2]:
# LLaMA-Factory (fine-tune models through a web UI)
# https://github.com/hiyouga/LLaMA-Factory

In [3]:
# LoRA & QLoRA 
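
Why QLoRA helps on a single consumer GPU can be seen from simple arithmetic: LoRA keeps the frozen base weights in 16-bit, while QLoRA stores them in 4-bit and trains LoRA adapters on top. The sketch counts weight memory only (no activations, KV cache, adapter optimizer states or quantization constants) and uses a hypothetical round figure of 6 billion parameters. The 16-bit figure is roughly in line with the 12549MiB that nvidia-smi reported above with ChatGLM2-6B loaded in half precision.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Memory taken by the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3

n_params = 6e9   # hypothetical round figure for a "6B" model
for label, bits in [("16-bit base weights (LoRA)", 16), ("4-bit base weights (QLoRA)", 4)]:
    print(f"{label}: {weight_memory_gib(n_params, bits):.1f} GiB")
# 16-bit base weights (LoRA): 11.2 GiB
# 4-bit base weights (QLoRA): 2.8 GiB
```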

In [4]:
# Inference with the original model weights
# /home/ml/PycharmProjects/LLMs/language/ChatGLM3/web_demo.py

![ChatGLM3_Standard](./resource/images/ChatGLM3_Standard.png)

In [5]:
# Hands-on fine-tuning
# Inference after fine-tuning
# /home/ml/PycharmProjects/LLMs/language/LLaMA-Factory/src/train_web.py

![ChatGLM3_Fine-Tuning](./resource/images/ChatGLM3_Fine-Tuning.png)

### Summary and review