# Fine-tune Mistral-7B on the Guanaco Dataset Using the Supervised Fine-Tuning Trainer (SFTTrainer)

Task description: Causal language modelling (CLM) is next-token prediction used for text generation. Given a prompt (the source sequence), the model generates tokens that continue it.

Original Tutorial: https://colab.research.google.com/drive/1DNenc5BpdqaS10prtklYyIe9qW_7gUnb?usp=sharing#scrollTo=ZwXZbQ2dSwzI

Credit to **Younes Belkada** for the original notebook.

The Guanaco dataset is a clean subset of the OpenAssistant dataset, adapted for training general-purpose chatbots.

In [1]:
!pip install -q -U trl accelerate git+https://github.com/huggingface/peft.git git+https://github.com/huggingface/transformers.git
!pip install -q datasets bitsandbytes evaluate loralib python-dotenv



# Load Guanaco dataset

In [2]:
from datasets import load_dataset

guanaco = load_dataset(
    "timdettmers/openassistant-guanaco",
    split = "train"
    )

Repo card metadata block was not found. Setting CardData to empty.


In [3]:
guanaco

Dataset({
    features: ['text'],
    num_rows: 9846
})

In [32]:
# Split the dataset into a train and test set
guanaco = guanaco.train_test_split(test_size=0.2)
guanaco

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 7876
    })
    test: Dataset({
        features: ['text'],
        num_rows: 1970
    })
})
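The split sizes above follow from the 9,846 rows and test_size=0.2. A minimal sketch of the arithmetic, assuming the test share is rounded up to a whole row and the remainder goes to the training set (the helper name split_sizes is hypothetical and not part of the datasets library):

```python
import math

def split_sizes(num_rows, test_size):
    """Return (train_rows, test_rows) for a fractional test_size.

    Assumption: the test share is rounded up and the rest is used for training.
    """
    test_rows = math.ceil(num_rows * test_size)
    return num_rows - test_rows, test_rows

train_rows, test_rows = split_sizes(9846, 0.2)
print(train_rows, test_rows)
```

Under that assumption the result agrees with the 7,876 / 1,970 split reported by the DatasetDict.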

In [33]:
# Look at the data
from pprint import pprint
pprint(guanaco['train'][1000])

# The text column is our model input


{'text': '### Human: 安装并使用诸如openpyxl的库，使用其读取方法即可### Assistant: '
         '好的，以下是使用openpyxl库读取Excel文件的示例代码：\n'
         '\n'
         '先安装openpyxl库，可以在命令行中使用如下命令：\n'
         '```\n'
         'pip install openpyxl\n'
         '```\n'
         '然后可以使用如下代码读取Excel文件：\n'
         '```\n'
         'import openpyxl\n'
         '\n'
         '# 打开Excel文件\n'
         "workbook = openpyxl.load_workbook('example.xlsx')\n"
         '\n'
         '# 选择表格\n'
         "worksheet = workbook['Sheet1']\n"
         '\n'
         '# 读取单元格的值\n'
         'cell_value = worksheet.cell(row=1, column=1).value\n'
         '\n'
         '# 打印单元格的值\n'
         'print(cell_value)\n'
         '```\n'
         '在上面的代码中，我们首先使用load_workbook()方法打开了一个Excel文件，并选择了一个表格。然后使用cell()方法读取第1行第1列的单元格的值，并将它打印出来。'}
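Each sample is a single string in which the turns are marked with "### Human: " and "### Assistant: ". The sketch below shows one way to split such a string into speaker/message pairs for inspection. The helper name split_turns and the short English example are illustrative assumptions, not part of the dataset or the original notebook.

```python
import re

def split_turns(text):
    """Split a '### Human: ...### Assistant: ...' string into (speaker, message) pairs."""
    parts = re.split(r"### (Human|Assistant): ", text)
    # With one capture group, re.split yields [prefix, speaker, message, speaker, message, ...]
    speakers, messages = parts[1::2], parts[2::2]
    return [(speaker, message.strip()) for speaker, message in zip(speakers, messages)]

example = "### Human: What is 2 + 2?### Assistant: 2 + 2 equals 4."
for speaker, message in split_turns(example):
    print(f"{speaker}: {message}")
```

This is only useful for looking at the data; SFTTrainer is given the raw text column unchanged.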


# Load Tokenizer

In [4]:
# Load Tokenizer
from transformers import AutoTokenizer

checkpoint = "ybelkada/Mistral-7B-v0.1-bf16-sharded"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Reuse the end-of-sequence token as the padding token
tokenizer.pad_token = tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


# Load a Model with BitsandBytes

In [5]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, LoraConfig, TaskType
import torch

# Create a config corresponding to the PEFT method
peft_config = LoraConfig(
    task_type = TaskType.CAUSAL_LM,
    inference_mode = False,
    r=64,
    target_modules = ["q_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.1
)

nf4_config = BitsAndBytesConfig(
    # Load the model in 4-bit precision
    load_in_4bit=True,
    # Use the 4-bit NormalFloat (NF4) data type instead of the default FP4
    bnb_4bit_quant_type="nf4",
    # Apply a second quantization after the first to save an additional 0.4 bits per parameter
    bnb_4bit_use_double_quant=True,
    # Data type in which computations are performed
    bnb_4bit_compute_dtype=torch.bfloat16
)

In [6]:
# Load the base model with the 4-bit quantization config
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    quantization_config = nf4_config,
    #device_map="auto"
    )



In [7]:
print(model)

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
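The layer shapes in the printout, together with r=64 from the LoRA config, are enough to estimate how many parameters LoRA will train. The sketch below is a back-of-the-envelope count, assuming each targeted projection gets one pair of low-rank matrices A (r x in_features) and B (out_features x r) and that all 32 decoder layers are adapted. The helper name lora_params is hypothetical.

```python
# (in_features, out_features) of each targeted projection, read off the model printout
target_shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
r, lora_alpha, num_layers = 64, 16, 32

def lora_params(in_features, out_features, rank):
    # LoRA adds A (rank x in_features) and B (out_features x rank) next to the frozen weight
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, r) for i, o in target_shapes.values())
total = per_layer * num_layers
print(f"LoRA parameters per decoder layer: {per_layer:,}")
print(f"LoRA parameters in total: {total:,}")
print(f"scaling factor lora_alpha / r: {lora_alpha / r}")
```

Under these assumptions the adapters hold roughly 151 million trainable parameters, and the low-rank update is scaled by lora_alpha / r = 0.25.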

# Train using the SFTTrainer API
The main training steps are:

1. Define the training hyperparameters with TrainingArguments. With the settings below, the trainer logs the training loss and saves a checkpoint every 10 steps; evaluation is run separately after training.

2. Pass the training arguments to SFTTrainer along with the model, the train and evaluation datasets, the tokenizer, and the LoRA configuration.

3. Call train() to fine-tune the model.

In [8]:
from transformers import TrainingArguments, Trainer
from trl import SFTTrainer

In [9]:
output_dir = "./guanaco_clm"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 10
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=False,
    lr_scheduler_type=lr_scheduler_type,
    gradient_checkpointing=True,

)
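Because gradients are accumulated over several forward passes before each optimizer step, the number of sequences per update is larger than the per-device batch size. A minimal sketch of that arithmetic, assuming a single GPU (num_devices is an assumption, not a value from the notebook):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 10
num_devices = 1  # assumption: training runs on a single GPU

# Sequences that contribute to one optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Sequences processed over the whole (very short) run
sequences_seen = effective_batch_size * max_steps
print(effective_batch_size, sequences_seen)
```

The total of 160 sequences is consistent with the training metrics reported further down (about 1.304 samples per second over roughly 122.7 seconds).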


In [35]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=guanaco['train'],
    eval_dataset=guanaco['test'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True,
)
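With packing=True, short examples are combined so that every training sequence is filled up to max_seq_length instead of being padded. The sketch below illustrates the general idea on toy token ids. It is a simplified stand-in rather than TRL's actual implementation, and the function name pack_sequences is hypothetical.

```python
def pack_sequences(tokenized_examples, eos_token_id, max_seq_length):
    """Concatenate examples (each followed by EOS) and cut fixed-length chunks."""
    buffer = []
    for token_ids in tokenized_examples:
        buffer.extend(token_ids)
        buffer.append(eos_token_id)
    # Keep only full chunks; the incomplete tail is dropped in this sketch
    num_chunks = len(buffer) // max_seq_length
    return [buffer[i * max_seq_length:(i + 1) * max_seq_length] for i in range(num_chunks)]

examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
packed = pack_sequences(examples, eos_token_id=2, max_seq_length=4)
print(packed)
```

Note that a packed chunk can start or end in the middle of an example, which is the trade-off for not spending compute on padding tokens.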




In [36]:
# Upcast the layer norms to float32 for more stable training
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)

In [37]:
# Train the model
trainer.train()

Step,Training Loss
10,1.2383


TrainOutput(global_step=10, training_loss=1.23828125, metrics={'train_runtime': 122.7214, 'train_samples_per_second': 1.304, 'train_steps_per_second': 0.081, 'total_flos': 3569248685260800.0, 'train_loss': 1.23828125, 'epoch': 0.02})

In [38]:
# Evaluate the fine tuned model and obtain the perplexity score
import math

eval_results = trainer.evaluate()
print(f"Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

Perplexity: 3.37
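Perplexity is the exponential of the average negative log-likelihood per token, which is why the cell above simply exponentiates eval_loss. A small self-contained sketch of that relationship, using made-up token probabilities (the helper name perplexity is hypothetical):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every correct token is as
# uncertain as a uniform choice among 4 options.
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))

# Inverting the formula recovers the evaluation loss implied by a perplexity of 3.37
implied_eval_loss = math.log(3.37)
print(round(implied_eval_loss, 3))
```

Read this way, a perplexity of 3.37 corresponds to an evaluation loss of about 1.21 nats per token.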


In [15]:
trainer.save_model("guanaco_causal_modell")

In [16]:
# The tokenizer is not saved automatically here, so save it manually in the model folder for inference
tokenizer.save_pretrained("guanaco_causal_modell", legacy_format=False)

('guanaco_causal_modell/tokenizer_config.json',
 'guanaco_causal_modell/special_tokens_map.json',
 'guanaco_causal_modell/tokenizer.json')

# Inference

Run inference with the fine-tuned model by calling generate() directly.

In [39]:
from peft import PeftModel, PeftConfig

# Warning: reloading the saved adapter onto a second copy of the base model can exhaust GPU memory
# while the trainer's model is still loaded. The rest of the notebook uses the in-memory trainer model.
peft_model_id = "guanaco_causal_modell"
config = PeftConfig.from_pretrained(peft_model_id)

model = AutoModelForCausalLM.from_pretrained(
    "ybelkada/Mistral-7B-v0.1-bf16-sharded",
    quantization_config = nf4_config,
    #device_map="auto",
    )
model = PeftModel.from_pretrained(model, peft_model_id)

In [None]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [19]:
device = "cuda"
model = trainer.model
model = model.to(device)
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): MistralForCausalLM(
      (model): MistralModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x MistralDecoderLayer(
            (self_attn): MistralAttention(
              (q_proj): Linear4bit(
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=64, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=64, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
              )
              (k_proj): Linear4bit(
                (lora_dropout): ModuleDict(


In [27]:
prompt = """
### Human: Can you please write a short paragraph about why philosophy is an important subject for study. ###"
Assistant:
"""
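The training samples shown earlier use the markers "### Human: " and "### Assistant: " on a single line, whereas the prompt above contains a stray quote and a line break before "Assistant:". A minimal sketch of a helper that reproduces the training format, on the assumption that matching it more closely is desirable (build_prompt is a hypothetical name, and the outputs below were produced with the original prompt, not with this helper):

```python
def build_prompt(question):
    """Format a single user turn the way the training samples are written."""
    return f"### Human: {question}### Assistant: "

prompt_example = build_prompt(
    "Can you please write a short paragraph about why philosophy is an important subject for study."
)
print(repr(prompt_example))
```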


In [28]:
# Tokenize the prompt into a PyTorch tensor of input ids
inputs = tokenizer(prompt, return_tensors="pt").input_ids

print(inputs)

tensor([[    1, 28705,    13, 27332, 10649, 28747,  2418,   368,  4665,  3324,
           264,  2485, 18438,   684,  2079, 13795,   349,   396,  2278,  3817,
           354,  3881, 28723,   774, 28739,    13,  7226, 11143, 28747,    13]])


In [29]:
import torch

with torch.no_grad():
    # Sample up to 100 new tokens using top-k and nucleus (top-p) sampling
    outputs = model.generate(
        input_ids=inputs.to(device),
        max_new_tokens=100,
        do_sample=True,
        top_k=50,
        top_p=0.95
    )
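The top_k and top_p arguments restrict which tokens can be sampled at each step. The sketch below illustrates the idea on a toy next-token distribution: keep the top_k most likely tokens, then keep the smallest prefix of them whose cumulative probability reaches top_p, and renormalize. It is a conceptual illustration on plain probabilities, not the code that generate() runs, and the function name top_k_top_p_filter is hypothetical.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Return a renormalized distribution restricted by top-k, then top-p."""
    # Rank tokens by probability, highest first, and keep at most top_k of them
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

next_token_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(next_token_probs, top_k=3, top_p=0.8)
print(filtered)
```

In this toy case "d" is removed by top-k and "c" by top-p, so sampling only ever chooses between "a" and "b".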

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


In [30]:
# Decode the generated token ids back into text
decoded_output = tokenizer.batch_decode(
    outputs.detach().cpu().numpy(),
    skip_special_tokens=True,
)
decoded_output

['\n### Human: Can you please write a short paragraph about why philosophy is an important subject for study. ###"\nAssistant:\nAbsolutely! Philosophy is an important subject because it helps us to understand the world, our place in it, and our potential to live fulfilled and meaningful lives. It provides a framework for critical thinking, helps us to ask important questions about the nature of reality, and develops our ability to engage in deep and meaningful conversations with others.\n\nAdditionally, philosophy is a critical component of a liberal arts education, and can help us to develop a well-rounded education and a broader']

In [31]:
print("".join(decoded_output))


### Human: Can you please write a short paragraph about why philosophy is an important subject for study. ###"
Assistant:
Absolutely! Philosophy is an important subject because it helps us to understand the world, our place in it, and our potential to live fulfilled and meaningful lives. It provides a framework for critical thinking, helps us to ask important questions about the nature of reality, and develops our ability to engage in deep and meaningful conversations with others.

Additionally, philosophy is a critical component of a liberal arts education, and can help us to develop a well-rounded education and a broader
