<a href="https://colab.research.google.com/github/Crossme0809/frenzyTechAI/blob/main/finetune_mistral-7b/Finetune_Mistral7B_on_a_single_GPU_with_PEFT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

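# Training Mistral-7B on a single GPU with PEFT LoRA and Google Colab

This notebook shows how to fine-tune Mistral-7B with the latest peft library, using bitsandbytes to load the large model in 4-bit precision.

Rather than updating every weight, fine-tuning relies on a recent method called Low-Rank Adaptation (LoRA): you train only a small set of adapter weights and load them on top of the frozen base model. Once fine-tuning is done, you can also share the adapters on the 🤗 Hub and load them easily.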
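To make the adapter idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is illustrative only: the helper names (`matvec`, `lora_forward`), the toy matrices, and the "trained" values of `B` are assumptions made up for this example, not code from peft.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (illustrative only).
# W is the frozen base weight; A and B are the small trainable adapter matrices.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, r, lora_alpha):
    """Return W @ x + (lora_alpha / r) * B @ (A @ x)."""
    base = matvec(W, x)        # frozen path
    down = matvec(A, x)        # project the input down to rank r
    up = matvec(B, down)       # project back up to the output size
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, up)]

# Toy sizes: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]          # shape (r, in_features)
B_zero = [[0.0], [0.0]]        # shape (out_features, r); B starts at zero
B_trained = [[1.0], [-1.0]]    # hypothetical values after some training
x = [1.0, 2.0, 3.0]

y_init = lora_forward(W, A, B_zero, x, r=1, lora_alpha=4)
y_trained = lora_forward(W, A, B_trained, x, r=1, lora_alpha=4)
print(y_init)     # [7.0, 2.0]  (identical to the frozen model's output)
print(y_trained)  # [19.0, -10.0]
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly; only `A` and `B` receive gradient updates, which is why so few parameters need to be trained.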

Note that this approach works for any model that supports device_map (that is, models loaded with accelerate).

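## Step 0 - Define some helper functions
1. Enable text wrapping so we don't have to scroll horizontally.
2. Define a wrapper function that passes our query to the model for inference and returns the decoded completion (the response).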


In [None]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))

get_ipython().events.register('pre_run_cell', set_css)


Next, define the wrapper function that returns the model's completion for a user question.

In [None]:
def get_completion(query: str, model, tokenizer) -> str:
  device = "cuda:0"

  prompt_template = """
  Below is an instruction that describes a task. Write a response that appropriately completes the request.
  ### Question:
  {query}

  ### Answer:
  """
  prompt = prompt_template.format(query=query)

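  # Tokenize the prompt and move the tensors to the GPU
  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
  model_inputs = encodeds.to(device)

  # Sample up to 1000 new tokens; the EOS token doubles as the padding token
  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  # The decoded text contains the prompt followed by the completion
  decoded = tokenizer.batch_decode(generated_ids)
  return decoded[0]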

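## Step 1 - Install the required packages

First, install the dependencies below. Because these features are only available on the main branch, we install transformers, peft, and accelerate from source.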

In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets



## Step 2 - Load and quantize the model

We load the model with 4-bit NF4 quantization, as used in QLoRA, to reduce memory usage.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
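As a rough illustration of why 4-bit loading matters on a single GPU, the sketch below estimates weights-only memory at 16-bit and 4-bit precision. The parameter count is an assumed figure for Mistral-7B-v0.1, and the estimate deliberately ignores activations, optimizer state, quantization constants, and any layers kept in higher precision.

```python
# Rough, weights-only memory estimate; ignores activations, optimizer state,
# quantization constants, and layers kept in higher precision.
N_PARAMS = 7_241_732_096  # assumed parameter count for Mistral-7B-v0.1

def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store n_params weights at the given bit width, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)
print(f"16-bit weights: ~{fp16_gib:.1f} GiB")
print(f"4-bit weights:  ~{nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit footprint, which is what leaves room for LoRA training on one GPU.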

Now specify the model ID and load it with the quantization configuration defined above.

In [None]:
model_id = "mistralai/Mistral-7B-v0.1"


# model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True)


Run inference on the base model. It does not seem to follow our instruction and instead returns a list of questions related to the query.

In [None]:
# result = get_completion(query="Will capital gains affect my tax bracket?", model=model, tokenizer=tokenizer)
# print(result)

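## Step 3 - Prepare the instruction fine-tuning dataset

Let's load a finance dataset to fine-tune the model on basic financial knowledge. The code below loads the full train split; for a quicker demonstration you can load only a slice, for example with split='train[:10%]'. The goal is simply to show how this workflow integrates with existing tools in the HF ecosystem.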

In [None]:
from datasets import load_dataset

data = load_dataset("gbharti/finance-alpaca", split='train')

# Explore the data
df = data.to_pandas()
df.head(10)




| | input | output | instruction | text |
|---|---|---|---|---|
| 0 | | The car deal makes money 3 ways. If you pay in... | For a car, what scams can be plotted with 0% f... | |
| 1 | | That is kind of the point, one of the hopes is... | Why does it matter if a Central Bank has a neg... | |
| 2 | | Pay off your debt. As you witnessed, no "inve... | Where should I be investing my money? | |
| 3 | | Equity options, at least those traded in the A... | Specifically when do options expire? | |
| 4 | | Automatic exercisions can be extremely risky, ... | Negative Balance from Automatic Options Exerci... | |
| 5 | | Generally "default" means that the company can... | Approximation of equity value for company in d... | |
| 6 | | The game is not zero sum. When a friend and I ... | Is it true that 90% of investors lose their mo... | |
| 7 | | In general, you can only be charged for servic... | Can a company charge you for services never re... | |
| 8 | | Being self employed just means you fill out so... | Working out if I should be registered as self-... | |
| 9 | | For eToro, just like any other brokerage firm,... | About eToro investments | |


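Instruction fine-tuning - prepare the dataset in a prompt format the model can follow more easily:

1. The generate_prompt function takes the instruction, the optional input, and the output, and builds the prompt.
2. Shuffle the dataset.
3. Tokenize the dataset.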

In [None]:
def generate_prompt(data_point):
    """Generate the training text from a prompt, task instruction, (context info.), and answer.

    :param data_point: dict: Data point
    :return: str: formatted prompt text
    """
    # Samples with additional context info.
    if data_point['input']:
        text = 'Below is an instruction that describes a task, paired with an input that provides' \
               ' further context. Write a response that appropriately completes the request.\n\n'
        text += f'### Instruction:\n{data_point["instruction"]}\n\n'
        text += f'### Input:\n{data_point["input"]}\n\n'
        text += f'### Response:\n{data_point["output"]}'

    # Samples without additional context.
    else:
        text = 'Below is an instruction that describes a task. Write a response that ' \
               'appropriately completes the request.\n\n'
        text += f'### Instruction:\n{data_point["instruction"]}\n\n'
        text += f'### Response:\n{data_point["output"]}'
    return text

# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in data]
data = data.add_column("prompt", text_column)

We need to tokenize the data so that the model can process it.

In [None]:
data = data.shuffle(seed=1234)  # Shuffle dataset here
data = data.map(lambda samples: tokenizer(samples["prompt"]), batched=True)

Split the dataset into 90% for training and 10% for testing.

In [None]:
data = data.train_test_split(test_size=0.1)
train_data = data["train"]
test_data = data["test"]


In [None]:
print(test_data)

Dataset({
    features: ['input', 'output', 'instruction', 'text', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 6892
})


## Step 4 - Apply LoRA

Let's prepare the model with PEFT's prepare_model_for_kbit_training method and then wrap it as a PeftModel with low-rank adapters (LoRA) using the get_peft_model utility function.

In [None]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj","o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

peft_model = get_peft_model(model, lora_config)
print_trainable_parameters(peft_model)

trainable params: 6815744 || all params: 3758886912 || trainable%: 0.18132346515244138
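The two numbers printed above can be reproduced by hand, which also explains why "all params" reads about 3.76B rather than about 7.24B. The sketch below is a worked calculation under stated assumptions: the architecture constants are taken to be those of the published Mistral-7B-v0.1 configuration, and the 4-bit weights are assumed to be packed two per stored element, so that `numel()` reports half the true count for quantized layers.

```python
# Assumed Mistral-7B-v0.1 architecture values (from its published config).
HIDDEN = 4096
INTERMEDIATE = 14336
N_LAYERS = 32
VOCAB = 32000
KV_DIM = 8 * 128  # 8 key/value heads of size 128 (grouped-query attention)
R = 8             # LoRA rank from lora_config

# (in_features, out_features) of each linear layer in one decoder block.
linear_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}
targets = ["q_proj", "k_proj", "v_proj", "o_proj"]

# Each adapted layer adds A (r x in_features) and B (out_features x r).
lora_per_layer = sum(R * (linear_shapes[t][0] + linear_shapes[t][1]) for t in targets)
trainable = lora_per_layer * N_LAYERS

# Assumption: 4-bit weights are packed two per stored element, so numel() is halved.
quantized = sum(i * o for i, o in linear_shapes.values()) * N_LAYERS // 2
embeddings = 2 * VOCAB * HIDDEN        # embed_tokens + lm_head (not quantized)
norms = (2 * N_LAYERS + 1) * HIDDEN    # two RMSNorms per block + final norm
all_params = quantized + embeddings + norms + trainable

print(trainable, all_params)  # 6815744 3758886912
```

Since the denominator is the packed count, the reported 0.18% overstates the trainable share; measured against the unpacked parameter count it is closer to 0.09%.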


Add the adapter to the model.

In [None]:
model.add_adapter(lora_config, adapter_name="adapter")

## Step 5 - Run the training

In [None]:
from huggingface_hub import notebook_login
notebook_login()


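Set the training arguments:

* For demonstration purposes, we run only a few steps (100), just to show how this integration works with existing tools in the HF ecosystem.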
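With so few steps, it helps to see how the learning rate evolves over the run. The sketch below shows linear warmup followed by linear decay, which is the shape of the Trainer's default `linear` schedule. The function name `linear_schedule_lr` and the choice of 3 warmup steps are assumptions for illustration; the peak value of 2e-4 matches the `learning_rate` set in the training arguments.

```python
def linear_schedule_lr(step, peak_lr, warmup_steps, total_steps):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

PEAK_LR = 2e-4
WARMUP_STEPS = 3      # hypothetical: 3% of 100 steps
TOTAL_STEPS = 100

for step in (0, 1, 3, 50, 100):
    print(step, linear_schedule_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS))
```

The rate climbs from zero to its peak during warmup and then falls back to zero by the last step, so in a 100-step run most updates happen at a learning rate below the nominal peak.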

In [None]:
# from datasets import load_dataset
# data = load_dataset("ronal999/finance-alpaca-demo", split='train')
# data = data.train_test_split(test_size=0.1)
# train_data = data["train"]
# test_data = data["test"]

In [None]:
# import transformers

# tokenizer.pad_token = tokenizer.eos_token


# trainer = transformers.Trainer(
#     model=model,
#     train_dataset=train_data,
#     eval_dataset=test_data,
#     args=transformers.TrainingArguments(
#         per_device_train_batch_size=1,
#         gradient_accumulation_steps=4,
#         warmup_steps=0.03,
#         max_steps=100,
#         learning_rate=2e-4,
#         fp16=True,
#         logging_steps=1,
#         output_dir="outputs_mistral_b_finance_finetuned_test",
#         optim="paged_adamw_8bit",
#         save_strategy="epoch",
#     ),
#     data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
# )


In [None]:
!pip install -q trl

In [None]:
# Reload the model on a single GPU to avoid the error "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" when resuming training
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})


In [None]:
# New code using SFTTrainer
import transformers

from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
torch.cuda.empty_cache()

trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    peft_config=lora_config,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.03,  # warm up over the first 3% of steps
        max_steps=100,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
        optim="paged_adamw_8bit",
        save_strategy="epoch",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
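The trainer above pairs `tokenizer.pad_token = tokenizer.eos_token` with `DataCollatorForLanguageModeling(tokenizer, mlm=False)`. The following is a simplified pure-Python sketch of what such a causal-LM collator does to a batch. It assumes right padding and uses made-up token ids; `collate_causal_lm` is a hypothetical helper written for this example, not the library's implementation.

```python
# Pure-Python illustration of causal-LM batching: pad to the longest sequence,
# copy input_ids into labels, and set labels at pad positions to -100 so the
# loss ignores them. Token ids below are made up for the example.
IGNORE_INDEX = -100

def collate_causal_lm(sequences, pad_token_id):
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_token_id] * n_pad
        # Every position holding the pad id is excluded from the loss.
        labels = [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        batch["labels"].append(labels)
    return batch

EOS = 2  # hypothetical id; here it doubles as the pad token
batch = collate_causal_lm([[1, 5, 6, 7, EOS], [1, 8, EOS]], pad_token_id=EOS)
print(batch["labels"])  # [[1, 5, 6, 7, -100], [1, 8, -100, -100, -100]]
```

Because the pad id equals the EOS id in this setup, the genuine end-of-sequence token is masked along with the padding. If the fine-tuned model tends not to stop on its own, this interaction is worth checking.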


Start training.

In [None]:
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()


Step,Training Loss
1,1.9379
2,2.1124
3,1.8149
4,1.3104
5,1.428
6,1.4236
7,1.5968
8,0.9395
9,1.2016
10,1.4021



TrainOutput(global_step=100, training_loss=1.200958268046379, metrics={'train_runtime': 215.2674, 'train_samples_per_second': 1.858, 'train_steps_per_second': 0.465, 'total_flos': 2902691701923840.0, 'train_loss': 1.200958268046379, 'epoch': 0.01})
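The reported metrics can be sanity-checked with a little arithmetic. The sketch below copies its inputs from the training arguments and the TrainOutput above, assumes a single GPU, and approximates the size of the training split as nine times the 6892-row test split (an assumption based on the 90/10 split).

```python
# Values copied from the training arguments and the TrainOutput above.
per_device_batch = 1
grad_accum_steps = 4
max_steps = 100
train_runtime_s = 215.2674
samples_per_second = 1.858
test_rows = 6892                  # printed earlier for the 10% test split

effective_batch = per_device_batch * grad_accum_steps   # samples per optimizer step
samples_seen = effective_batch * max_steps

approx_train_rows = 9 * test_rows  # assumption: train split is about 9x the test split
epoch_fraction = samples_seen / approx_train_rows

print(effective_batch, samples_seen)                  # 4 400
print(round(samples_per_second * train_runtime_s))    # 400
print(round(epoch_fraction, 2))                       # 0.01
```

In other words, the demo run sees roughly 400 examples, well under one percent of the training split, so the resulting adapter should be read as a smoke test rather than a fully trained model.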

Share the adapter on the 🤗 Hub.

In [None]:
# trainer.model is the PEFT-wrapped model, so only the adapter weights are uploaded
trainer.model.push_to_hub("mistral_b_finance_finetuned_test")
tokenizer.push_to_hub("mistral_b_finance_finetuned_test")

## Step 6 - Evaluate the model qualitatively: run inference!



In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets

Load the adapter directly from the Hub.

In [None]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "Ronal999/mistral_b_finance_finetuned_test"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_4bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)

Then run inference with the fine-tuned model loaded from the Hub. Ask the same question and check the result.

In [None]:
result = get_completion(query="Will capital gains affect my tax bracket?", model=model, tokenizer=tokenizer)
print(result)
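One detail worth noting: get_completion prompts with "### Question:" and "### Answer:" markers, while the training text built by generate_prompt uses "### Instruction:" and "### Response:". A possible refinement, sketched below, is to prompt at inference time in the same layout the model was fine-tuned on and to strip the prompt from the decoded text. Both helpers are hypothetical, the `</s>` end-of-sequence string is an assumption about the tokenizer, and the answer text is a placeholder rather than real model output.

```python
# Hypothetical helper: builds an inference prompt in the same layout that
# generate_prompt used for training samples without an "input" field.
def build_inference_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

def extract_response(decoded: str) -> str:
    """Keep only the text after the last response marker and drop the EOS tag."""
    answer = decoded.split("### Response:")[-1]
    return answer.replace("</s>", "").strip()

prompt = build_inference_prompt("Will capital gains affect my tax bracket?")
print(prompt)

# Stand-in for tokenizer.batch_decode output; the answer text is a placeholder.
fake_decoded = prompt + "Placeholder answer.</s>"
print(extract_response(fake_decoded))
```

Whether this changes the quality of the answers after only 100 training steps is an open question; the sketch only removes one avoidable mismatch between training and inference.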