# ChatGLM-6B-LoRA

This example walks through LoRA fine-tuning of the ChatGLM-6B model.

The code is demonstrated below using the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset.

For the SageMaker Notebook kernel, select **conda_pytorch_p310**.

## Training

### LoRA

Download the code

In [1]:
%%script bash
rm -rf ChatGLM-Tuning
git clone https://github.com/mymusise/ChatGLM-Tuning.git
cd ChatGLM-Tuning
git reset --hard 997393046a49510e6cda36962f9a399297959311

Cloning into 'ChatGLM-Tuning'...


HEAD is now at 9973930 Merge pull request #142 from mymusise/fix/infer_with_int8


### Software dependencies
Fine-tuning requires `transformers` version 4.27.1. In addition to the dependencies of ChatGLM-6B itself, install the following:

In [2]:
!pip install -r ChatGLM-Tuning/requirements.txt

Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Collecting git+https://github.com/huggingface/peft.git (from -r ChatGLM-Tuning/requirements.txt (line 15))
  Cloning https://github.com/huggingface/peft.git to /tmp/pip-req-build-pauqpzfk
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-pauqpzfk
  Resolved https://github.com/huggingface/peft.git to commit 032fff92fb74b737a2934e91a08d82142fb79dc3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
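
Since the notebook states that `transformers` 4.27.1 is required, it may be worth confirming the installed version after `pip install` finishes. The following is a minimal sketch using only the standard library; the helper names `version_matches` and `check_transformers` are hypothetical and not part of ChatGLM-Tuning.

```python
from importlib import metadata

REQUIRED_TRANSFORMERS = "4.27.1"  # version stated in this notebook


def version_matches(installed: str, required: str = REQUIRED_TRANSFORMERS) -> bool:
    """Compare the first three numeric release fields, ignoring suffixes like '+cu118'."""
    def release(version: str) -> tuple:
        fields = version.split("+")[0].split(".")[:3]
        return tuple(int(f) for f in fields if f.isdigit())
    return release(installed) == release(required)


def check_transformers() -> str:
    """Report whether the installed transformers package matches the required version."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    status = "OK" if version_matches(installed) else "MISMATCH"
    return f"transformers {installed} (required {REQUIRED_TRANSFORMERS}): {status}"


print(check_transformers())
```

If the check reports a mismatch, pinning the version explicitly (for example `pip install transformers==4.27.1`) is one option.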


### Clean the dataset

In [3]:
!python ChatGLM-Tuning/cover_alpaca2jsonl.py \
    --data_path ChatGLM-Tuning/data/alpaca_data.json \
    --save_path ChatGLM-Tuning/data/alpaca_data.jsonl

formatting..: 100%|███████████████████| 52002/52002 [00:00<00:00, 180192.34it/s]
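
To make the conversion concrete, here is a hedged sketch of how one Alpaca record might be turned into a prompt. It is inferred from the `Instruction: ... Answer: ...` text and the `feature["context"]` lookup visible in the inference cells of this notebook; the `Input:` line and the `target` key are assumptions, and `format_example_sketch` is a hypothetical stand-in rather than the actual `format_example` from `cover_alpaca2jsonl.py`.

```python
def format_example_sketch(example: dict) -> dict:
    """Build a prompt ("context") and expected answer ("target") from one Alpaca record.

    Assumption: the optional "input" field, when present, is added on its own line.
    """
    context = f"Instruction: {example['instruction']}\n"
    if example.get("input"):
        context += f"Input: {example['input']}\n"
    context += "Answer: "
    return {"context": context, "target": example["output"]}


record = {
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow.",
}
print(format_example_sketch(record)["context"])
```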


### Build the dataset
1. Encode the data with the ChatGLM tokenizer
2. Format it for instruction fine-tuning following the Alpaca dataset format

**Adjust `max_seq_length` below to suit your dataset**

In [4]:
!python ChatGLM-Tuning/tokenize_dataset_rows.py \
    --jsonl_path ChatGLM-Tuning/data/alpaca_data.jsonl \
    --save_path ChatGLM-Tuning/data/alpaca \
    --max_seq_length 128

Found cached dataset generator (/home/ec2-user/.cache/huggingface/datasets/generator/default-35c9670637d69edf/0.0.0)
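
One way to choose `--max_seq_length` is to look at the distribution of tokenized lengths and pick a value that covers most examples. The sketch below assumes you already have a list of per-example token counts (for instance from the ChatGLM tokenizer); the function name `suggest_max_seq_length` and the sample lengths are hypothetical.

```python
import math


def suggest_max_seq_length(lengths, coverage=0.95):
    """Return the smallest length that covers `coverage` of the examples."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(lengths)
    # Number of examples that must fit within the returned length.
    k = math.ceil(coverage * len(ordered))
    return ordered[k - 1]


# Hypothetical token counts for four examples.
token_counts = [12, 30, 45, 300]
print(suggest_max_seq_length(token_counts, coverage=0.75))  # 45
print(suggest_max_seq_length(token_counts, coverage=1.0))   # 300
```

A lower coverage keeps sequences short and training cheap at the cost of cutting off the longest examples; a higher coverage preserves them but increases memory use per step.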
                                                                                

### Load the base model

In [5]:
from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
import torch
import torch.nn as nn


class CastOutputToFloat(nn.Sequential):
    def forward(self, x): return super().forward(x).to(torch.float32)


tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True, device_map='auto')
model.supports_gradient_checkpointing = True
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
model.lm_head = CastOutputToFloat(model.lm_head)
model.config.use_cache = False

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.



Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

### Test the base model

In [6]:
import sys

sys.path.append("./ChatGLM-Tuning")

In [7]:
from cover_alpaca2jsonl import format_example
import json


instructions = json.load(open("ChatGLM-Tuning/data/alpaca_data.json"))


with torch.no_grad():
    for idx, item in enumerate(instructions[:5]):
        feature = format_example(item)
        input_text = feature["context"]
        input_ids = tokenizer.encode(input_text, return_tensors='pt')
        out = model.generate(
            input_ids=input_ids,
            max_length=150,
            temperature=0
        )
        answer = tokenizer.decode(out[0])
        print(answer)
        item['infer_answer'] = answer
        print(f"### {idx+1}.Answer:\n", item.get('output'), '\n\n')

The dtype of attention mask (torch.int64) is not bool


Instruction: Give three tips for staying healthy.
Answer: 1. Eat a balanced diet that includes a variety of fruits, vegetables, whole grains, lean proteins, and healthy fats.
2. Exercise regularly, either through physical activity or simply by walking or engaging in other low-impact activities.
3. Get enough sleep each night, as sleep is essential for maintaining physical and mental health.
### 1.Answer:
 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. 
2. Exercise regularly to keep your body active and strong. 
3. Get enough sleep and maintain a consistent sleep schedule. 


Instruction: What are the three primary colors?
Answer: The three primary colors are red, blue, and yellow. These colors are used in painting, printing, and other visual arts to create colors that can be easily mixed together to create other colors.
### 2.Answer:
 The three primary colors are red, blue, and yellow. 


Instruction: Describe the structure of an atom.
Answer: The struc

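### Inspect trainable parameters

**1. The share of trainable parameters depends on the LoRA settings you choose**

**2. Once the LoRA model is loaded, only the LoRA matrix weights in each layer are updated during training**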

In [15]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {trainable_model_params*100/all_model_params:.4f}%"


def print_trainable_non_trainable_layers(model):
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(f"Trainable: {name}")
        else:
            print(f"Not trainable: {name}")

### Configure LoRA

In [9]:
from peft import get_peft_model, LoraConfig, TaskType


peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)

model = get_peft_model(model, peft_config)
model.is_parallelizable = True
model.model_parallel = True

In [10]:
print(print_number_of_trainable_model_parameters(model))
# print_trainable_non_trainable_layers(model)

trainable model parameters: 3670016
all model parameters: 6176956416
percentage of trainable model parameters: 0.0594%
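
The trainable-parameter count reported above can be reproduced by hand. The sketch below assumes LoRA is attached only to the `query_key_value` projection (4096 inputs, 12288 outputs) in each of the 28 GLM blocks, which is what the model printout in this notebook shows, and that each adapter consists of an `r x in` matrix A and an `out x r` matrix B without bias. The function name `lora_param_count` is hypothetical.

```python
def lora_param_count(in_features: int, out_features: int, r: int, num_layers: int) -> int:
    """Count LoRA weights: A is (r x in), B is (out x r), one pair per layer."""
    per_layer = r * in_features + out_features * r
    return per_layer * num_layers


trainable = lora_param_count(in_features=4096, out_features=12288, r=8, num_layers=28)
total = 6176956416  # "all model parameters" as reported by the notebook

print(trainable)                           # 3670016
print(f"{trainable * 100 / total:.4f}%")   # 0.0594%
```

Because the count scales linearly with `r`, doubling the rank to 16 would double the trainable parameters under the same assumptions.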


### Select a training subset

In [11]:
import datasets

dataset_path = "ChatGLM-Tuning/data/alpaca/"

dataset = datasets.load_from_disk(dataset_path)

train_num = 500
# train_num = 50

mini_train_dataset = datasets.Dataset.from_dict(dataset[:train_num])

### Define the data collator and Trainer

In [12]:
from transformers import Trainer, HfArgumentParser


def data_collator(features: list) -> dict:
    len_ids = [len(feature["input_ids"]) for feature in features]
    longest = max(len_ids)
    input_ids = []
    labels_list = []
    
    for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
        ids = feature["input_ids"]
        seq_len = feature["seq_len"]
        labels = (
            [-100] * (seq_len - 1) + ids[(seq_len - 1) :] + [-100] * (longest - ids_l)
        )
        ids = ids + [tokenizer.pad_token_id] * (longest - ids_l)
        _ids = torch.LongTensor(ids)
        labels_list.append(torch.LongTensor(labels))
        input_ids.append(_ids)

    input_ids = torch.stack(input_ids)
    labels = torch.stack(labels_list)
    
    return {
        "input_ids": input_ids,
        "labels": labels,
    }

class ModifiedTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            labels=inputs["labels"],
        ).loss
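
To illustrate what the collator does to a single example, the following sketch repeats its label-masking and padding arithmetic on plain Python lists. It assumes `seq_len` marks the length of the prompt portion of `input_ids`; the token ids and the pad id `0` are made up for the illustration, and `build_labels` and `pad_ids` are hypothetical helpers.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def build_labels(ids, seq_len, longest):
    """Mask the first seq_len - 1 positions and the padding; keep the rest as targets."""
    return (
        [IGNORE_INDEX] * (seq_len - 1)
        + ids[seq_len - 1:]
        + [IGNORE_INDEX] * (longest - len(ids))
    )


def pad_ids(ids, longest, pad_token_id):
    """Right-pad the input ids to the longest sequence in the batch."""
    return ids + [pad_token_id] * (longest - len(ids))


ids = [11, 12, 13, 21, 22]   # hypothetical: 3 prompt tokens followed by 2 answer tokens
labels = build_labels(ids, seq_len=3, longest=7)
padded = pad_ids(ids, longest=7, pad_token_id=0)

print(labels)  # [-100, -100, 13, 21, 22, -100, -100]
print(padded)  # [11, 12, 13, 21, 22, 0, 0]
```

Masked positions contribute nothing to the loss, so the model is trained only on the tokens kept as targets.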

### Define the training arguments

In [13]:
training_args = TrainingArguments(
    "output",
    fp16=True,
    gradient_accumulation_steps=1,
    per_device_train_batch_size = 1,
    learning_rate = 1e-4,
    max_steps=50,
    save_steps=50,
    # max_steps=1500,
    logging_steps=50,
    remove_unused_columns=False,
    seed=0,
    data_seed=0,
    group_by_length=False,
)

trainer = ModifiedTrainer(
    model=model,
    train_dataset=mini_train_dataset,
    args=training_args,
    data_collator=data_collator,
)


### Start training

In [14]:
trainer.train()



Step,Training Loss
50,2.6257


TrainOutput(global_step=50, training_loss=2.6257489013671873, metrics={'train_runtime': 90.6931, 'train_samples_per_second': 0.551, 'train_steps_per_second': 0.551, 'total_flos': 127662990409728.0, 'train_loss': 2.6257489013671873, 'epoch': 0.1})
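
The `epoch` and `train_steps_per_second` values in the output above follow from the training arguments. The sketch below reproduces them, assuming a single GPU (the device count is not stated in the notebook) and the 500-example subset selected earlier; `epochs_covered` is a hypothetical helper.

```python
def epochs_covered(max_steps, per_device_batch_size, grad_accum_steps, num_devices, dataset_size):
    """Fraction of the dataset seen after max_steps optimizer steps."""
    samples_seen = max_steps * per_device_batch_size * grad_accum_steps * num_devices
    return samples_seen / dataset_size


epochs = epochs_covered(
    max_steps=50,
    per_device_batch_size=1,
    grad_accum_steps=1,
    num_devices=1,        # assumption: one GPU
    dataset_size=500,     # train_num in this notebook
)
steps_per_second = 50 / 90.6931  # max_steps / train_runtime from the output above

print(epochs)                      # 0.1
print(round(steps_per_second, 3))  # 0.551
```

With these settings the run covers only a tenth of the subset, so raising `max_steps` (the commented-out value is 1500) trains for proportionally longer.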

In [16]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): ChatGLMForConditionalGeneration(
      (transformer): ChatGLMModel(
        (word_embeddings): Embedding(130528, 4096)
        (layers): ModuleList(
          (0-27): 28 x GLMBlock(
            (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
            (attention): SelfAttention(
              (rotary_emb): RotaryEmbedding()
              (query_key_value): Linear8bitLt(
                in_features=4096, out_features=12288, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=12288, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_e

### Run inference with the fine-tuned model

In [17]:
from cover_alpaca2jsonl import format_example
import json


instructions = json.load(open("ChatGLM-Tuning/data/alpaca_data.json"))


with torch.no_grad():
    for idx, item in enumerate(instructions[:5]):
        feature = format_example(item)
        input_text = feature["context"]
        input_ids = tokenizer.encode(input_text, return_tensors='pt')
        out = model.generate(
            input_ids=input_ids,
            max_length=150,
            temperature=0
        )
        answer = tokenizer.decode(out[0])
        print(answer)
        item['infer_answer'] = answer
        print(f"### {idx+1}.Answer:\n", item.get('output'), '\n\n')



Instruction: Give three tips for staying healthy.
Answer: 1. Eat a balanced diet rich in fruits, vegetables, whole grains, lean proteins, and healthy fats.
2. Exercise regularly to improve cardiovascular health and boost energy levels.
3. Get enough sleep each night to help regulate your body's internal clock and improve overall health.
### 1.Answer:
 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. 
2. Exercise regularly to keep your body active and strong. 
3. Get enough sleep and maintain a consistent sleep schedule. 


Instruction: What are the three primary colors?
Answer: The three primary colors are red, blue, and yellow.
### 2.Answer:
 The three primary colors are red, blue, and yellow. 


Instruction: Describe the structure of an atom.
Answer: The structure of an atom is made up of a central nucleus, surrounded by electrons. The electrons are in constant motion around the nucleus, and they are negatively charged. The electrons are in constant mot

### Save the model and parameters

In [18]:
import os


def save_tunable_parameters(model, path):
    saved_params = {
        k: v.to("cpu")
        for k, v in model.named_parameters()
        if v.requires_grad
    }
    torch.save(saved_params, path)

model.save_pretrained("output")
# save_tunable_parameters(model, os.path.join("output", "adapter_model.bin"))

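## Load the fine-tuned model and run inference
Because GPU memory on the notebook instance is limited, restart the kernel before running the cells below; otherwise loading the model may fail.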

In [1]:
import sys

sys.path.append("./ChatGLM-Tuning")

In [2]:
from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
import torch
import torch.nn as nn
from peft import PeftModel

# Load the base model
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

# Load the LoRA adapter
model = PeftModel.from_pretrained(model, "output/")

Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.



Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.8/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.


Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]



In [3]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): ChatGLMForConditionalGeneration(
      (transformer): ChatGLMModel(
        (word_embeddings): Embedding(130528, 4096)
        (layers): ModuleList(
          (0-27): 28 x GLMBlock(
            (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
            (attention): SelfAttention(
              (rotary_emb): RotaryEmbedding()
              (query_key_value): Linear8bitLt(
                in_features=4096, out_features=12288, bias=True
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=12288, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_e

In [None]:
from cover_alpaca2jsonl import format_example
import json


instructions = json.load(open("ChatGLM-Tuning/data/alpaca_data.json"))


with torch.no_grad():
    for idx, item in enumerate(instructions[:5]):
        feature = format_example(item)
        input_text = feature["context"]
        input_ids = tokenizer.encode(input_text, return_tensors='pt')
        out = model.generate(
            input_ids=input_ids,
            max_length=150,
            temperature=0
        )
        answer = tokenizer.decode(out[0])
        print(answer)
        item['infer_answer'] = answer
        print(f"### {idx+1}.Answer:\n", item.get('output'), '\n\n')

The dtype of attention mask (torch.int64) is not bool


Instruction: Give three tips for staying healthy.
Answer: 1. Eat a balanced diet rich in fruits, vegetables, whole grains, lean proteins, and healthy fats.
2. Exercise regularly to maintain a healthy weight and improve cardiovascular health.
3. Get enough sleep each night to help your body recover and function properly.
### 1.Answer:
 1.Eat a balanced diet and make sure to include plenty of fruits and vegetables. 
2. Exercise regularly to keep your body active and strong. 
3. Get enough sleep and maintain a consistent sleep schedule. 


Instruction: What are the three primary colors?
Answer: The three primary colors are red, blue, and yellow.
### 2.Answer:
 The three primary colors are red, blue, and yellow. 


