## Mistral-7B-Instruct fine-tuning with PEFT/LoRA

This notebook walks through fine-tuning Mistral-7B-Instruct with PEFT/LoRA on Amazon SageMaker.

Refer to this [AWS Blog](https://aws.amazon.com/blogs/machine-learning/deploy-large-models-on-amazon-sagemaker-using-djlserving-and-deepspeed-model-parallel-inference/) post for more details on deploying large models on Amazon SageMaker.

Steps:

1. Prepare training data
   - Prepare the training dataset in CSV format
   - Load the training data into a `Dataset`
   - Format the data with the Mistral prompt template tags

2. Fine-tuning
   - Load the Mistral-7B-Instruct-v0.2 model and tokenizer
   - Configure quantization (optional)
   - Set the PEFT/LoRA parameters
   - Define an `SFTTrainer` with the training arguments
   - Start the training process

3. Save the trained model locally


### 0. Initialization

In [None]:
#!pip install transformers
#!pip install datasets
#!pip install py7zr
#!pip install accelerate
#!pip install bitsandbytes
#!pip install peft
#!pip install trl
#!pip install einops

In [None]:
# Only the classes used later in the notebook; the unused torch Dataset import was
# removed because it would be shadowed by datasets.Dataset anyway
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments

import torch

### 1. Prepare training data

Load the training data into a Hugging Face `Dataset`. The data must be a CSV file with the following columns:
- `instruction`: the question
- `response`: the reference answer, used as the ground truth for training
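Before loading, it can help to confirm the CSV really has the two required columns and no empty cells. The sketch below writes a tiny sample file and validates it with the standard `csv` module; the two rows in `SAMPLE_ROWS` and the helper names `write_training_csv` / `validate_training_csv` are hypothetical placeholders, not part of the original workflow.

```python
import csv
import os

# Hypothetical placeholder rows illustrating the expected two-column layout
SAMPLE_ROWS = [
    {"instruction": "What is the capital of France?", "response": "The capital of France is Paris."},
    {"instruction": "Name a primary color.", "response": "Red is a primary color."},
]


def write_training_csv(path, rows):
    """Write rows to a CSV file with an `instruction` and a `response` column."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["instruction", "response"])
        writer.writeheader()
        writer.writerows(rows)


def validate_training_csv(path):
    """Return the number of data rows; raise ValueError on a missing column or empty cell."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {"instruction", "response"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing column(s): {sorted(missing)}")
        count = 0
        for i, row in enumerate(reader, start=1):
            # Short rows yield None for absent fields, which also fails this check
            if not row["instruction"] or not row["response"]:
                raise ValueError(f"Empty instruction or response in data row {i}")
            count += 1
    return count
```

For example, `write_training_csv("data/sample_train.csv", SAMPLE_ROWS)` followed by `validate_training_csv("data/sample_train.csv")` returns `2`. `csv.DictWriter` quotes fields that contain commas or newlines, so free-form answers survive the round trip.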

In [None]:
from datasets import Dataset, load_dataset

In [None]:
TRAINING_FILE = "data/your_train_dataset.csv"

# load_dataset("csv", ...) returns a DatasetDict with a single "train" split
dataset = load_dataset("csv", data_files=TRAINING_FILE)

Add the Mistral prompt tags to each training record. Every training example has the following format:

`<s>[INST] your question [/INST] your reference answer</s>`
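The template above can be sketched as a plain string function. This is a minimal illustration assuming the spacing of Mistral's published instruct format (a space after `[INST]` and around `[/INST]`); the function name `format_example` is hypothetical.

```python
# Tag strings from the Mistral instruct template
BOS, INST_OPEN, INST_CLOSE, EOS = "<s>", "[INST]", "[/INST]", "</s>"


def format_example(question: str, answer: str) -> str:
    """Wrap one question/answer pair in the Mistral [INST] template."""
    return f"{BOS}{INST_OPEN} {question.strip()} {INST_CLOSE} {answer.strip()}{EOS}"


print(format_example("What is 2 + 2?", "4"))  # <s>[INST] What is 2 + 2? [/INST] 4</s>
```

Everything between `[INST]` and `[/INST]` is the user turn; everything after `[/INST]` up to `</s>` is the answer the model learns to produce.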


In [None]:
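INSTRUCTION_KEY = "<s>[INST] "
RESPONSE_KEY = " [/INST] "
END_KEY = "</s>"
DEFAULT_SEED = 42

# This is a training prompt that does not contain an input string. The instruction by itself has enough information
# to respond. For example, the instruction might ask for the year a historic figure was born.
PROMPT_NO_INPUT_FORMAT = """{instruction_key}{instruction}{response_key}{response}{end_key}""".format(
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
    response="{response}",
    end_key=END_KEY,
)

# Prompt for records that also have a "context" column (used by _add_text below).
# Placing the context after the instruction inside the [INST] block is an assumption
# of this notebook, not an official Mistral format.
PROMPT_WITH_INPUT_FORMAT = """{instruction_key}{instruction}\n\n{input}{response_key}{response}{end_key}""".format(
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    input="{input}",
    response_key=RESPONSE_KEY,
    response="{response}",
    end_key=END_KEY,
)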


In [None]:
def _clean(text):
    # Collapse newlines and runs of spaces into single spaces (plain deletion would glue words together)
    return " ".join(text.split())


def _add_text(rec):
    instruction = rec["instruction"]
    response = rec["response"]
    context = rec.get("context")

    if not instruction:
        raise ValueError(f"Expected an instruction in: {rec}")

    if not response:
        raise ValueError(f"Expected a response in: {rec}")

    # For some instructions there is an input that goes along with the instruction, providing context for the
    # instruction. For example, the input might be a passage from Wikipedia and the instruction says to extract
    # some piece of information from it. The response is that information to extract. In other cases there is
    # no input. For example, the instruction might be open QA such as asking what year some historic figure was
    # born.
    if context:
        rec["text"] = PROMPT_WITH_INPUT_FORMAT.format(
            instruction=_clean(instruction),
            response=_clean(response),
            input=_clean(context),
        ).strip()
    else:
        rec["text"] = PROMPT_NO_INPUT_FORMAT.format(
            instruction=_clean(instruction),
            response=_clean(response),
        ).strip()

    return rec
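How whitespace is cleaned matters: deleting `'\n'` and `'  '` outright (the `replace`-based approach) can merge neighbouring words, while `" ".join(text.split())` turns every run of whitespace into one space. A small comparison sketch; both helper names are hypothetical.

```python
def naive_clean(text: str) -> str:
    """Delete newlines and double spaces outright (no replacement character)."""
    return text.replace("\n", "").replace("  ", "")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return " ".join(text.split())


sample = "First line\nsecond line  with  gaps"
print(naive_clean(sample))          # First linesecond linewithgaps
print(collapse_whitespace(sample))  # First line second line with gaps
```

Merged tokens such as `linesecond` are words the model never sees in normal text, so collapsing is usually the safer choice for training data.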

In [None]:
dataset = dataset.map(_add_text)

Remove the original columns so that only the formatted `text` field remains

In [None]:
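# Drop every column except "text"; listing a column that does not exist (e.g. "input") raises a ValueError
dataset = dataset.remove_columns(
    [col for col in dataset["train"].column_names if col != "text"]
)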

### 2. Fine tuning

Load the Mistral-7B-Instruct-v0.2 foundation model and configure the tokenizer 

In [None]:
BASE_MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

device_map = "auto"  # let Accelerate place the layers on the available GPUs/CPU
max_length = 2048    # maximum sequence length in tokens

In [None]:
# To fine-tune a 4-bit quantized model (QLoRA), uncomment the configuration below

#bnb_config = BitsAndBytesConfig(  
#    load_in_4bit= True,
#    bnb_4bit_quant_type= "nf4",
#    bnb_4bit_compute_dtype= torch.bfloat16,
#    bnb_4bit_use_double_quant= False,
#)
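A rough, weights-only estimate shows why 4-bit loading helps. This sketch assumes about 7.24 billion parameters for Mistral-7B and ignores activations, optimizer state, quantization constants, and layers that are typically left unquantized, so real usage is higher.

```python
# Approximate parameter count for Mistral-7B (assumption: ~7.24 billion weights)
NUM_PARAMS = 7.24e9

BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "nf4 (4-bit)": 0.5,
}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB (1 GiB = 2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:>12}: {weight_memory_gib(NUM_PARAMS, nbytes):5.1f} GiB")
```

The bfloat16 weights alone need roughly 13.5 GiB, while NF4 brings them to about a quarter of that, which is what makes QLoRA feasible on a single smaller GPU.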

In [None]:
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    #quantization_config=bnb_config,        # uncomment to train with QLoRA
    torch_dtype=torch.bfloat16,
    device_map=device_map,
)

In [None]:
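model.config.use_cache = False         # the KV cache is not needed for training and conflicts with gradient checkpointing
model.config.pretraining_tp = 1        # Llama-style tensor-parallel setting; not used by Mistral, harmless here
model.gradient_checkpointing_enable()  # trade extra compute for lower activation memory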

In [None]:
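tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

tokenizer.padding_side = "right"           # pad at the end of each sequence
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no dedicated pad token; reuse EOS
# Note: the formatted text already contains <s> and </s>, so these flags may add a second BOS/EOS token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token  # display the BOS/EOS flags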
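What right padding with EOS as the pad token produces can be sketched without the tokenizer. This assumes Mistral's `</s>` id is 2 and `<s>` is 1; the id 415 and the helper `right_pad` are made up for illustration.

```python
from typing import List, Tuple

# Assumption: Mistral's tokenizer uses id 2 for </s>; the notebook reuses EOS as the pad token
EOS_ID = 2


def right_pad(batch: List[List[int]], pad_id: int = EOS_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token id lists to the longest sequence and build matching attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * pad)
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask


ids, mask = right_pad([[1, 415, 2], [1, 2]])
print(ids)   # [[1, 415, 2], [1, 2, 2]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]
```

Because pad and EOS share one id, a collator that masks labels by pad id (as `DataCollatorForLanguageModeling` does) also masks the genuine EOS, which can weaken the model's ability to learn when to stop.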

Set PEFT/LoRA parameters
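LoRA keeps each targeted weight matrix `W` frozen and learns a low-rank update, so a layer computes `W x + (lora_alpha / r) * B (A x)`. A pure-Python sketch of that forward pass, assuming tiny hand-written matrices and omitting `lora_dropout`; by default PEFT initializes `B` to zeros, so training starts from the unchanged base model.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x); W is frozen, only A (r x d_in) and B (d_out x r) are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[0.5, 0.5]]              # rank r = 1
x = [1.0, 2.0]

print(lora_forward(W, A, [[0.0], [0.0]], x, alpha=64, r=1))  # [1.0, 2.0] (zero B: base output)
print(lora_forward(W, A, [[1.0], [2.0]], x, alpha=2, r=1))   # [4.0, 8.0]
```

With this notebook's `r=256` and `lora_alpha=64`, the update is scaled by 64 / 256 = 0.25.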

In [None]:
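from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Only needed for a quantized (QLoRA) model: on a full-precision model this step
# may upcast the frozen weights to float32 and greatly increase memory use
if getattr(model, "is_loaded_in_4bit", False) or getattr(model, "is_loaded_in_8bit", False):
    model = prepare_model_for_kbit_training(model)

# Define LoRA Config
lora_config = LoraConfig(
    r=256,              # rank of the low-rank update matrices
    lora_alpha=64,      # the update is scaled by lora_alpha / r
    lora_dropout=0.05,  # dropout applied to the LoRA input
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],
    bias="none",        # do not train bias terms
    task_type="CAUSAL_LM",
)

# Add the LoRA adapter
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()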
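The count reported by `print_trainable_parameters()` can be cross-checked by hand: each adapted linear layer of shape `in x out` adds `A` (`r x in`) and `B` (`out x r`), i.e. `r * (in + out)` weights. This sketch assumes the published Mistral-7B shapes (hidden size 4096, 32 layers, 8 key/value heads of 128 dims, MLP size 14336); if those assumptions hold, the result should match the printed trainable-parameter count.

```python
# Assumed Mistral-7B shapes
HIDDEN = 4096
KV_DIM = 8 * 128  # grouped-query attention: 8 KV heads x 128 dims
INTERMEDIATE = 14336
NUM_LAYERS = 32

# (in_features, out_features) of each targeted linear layer
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
}


def lora_param_count(r: int, shapes: dict, num_layers: int) -> int:
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights (bias="none")."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(f"{lora_param_count(256, MODULE_SHAPES, NUM_LAYERS):,}")  # 369,098,752
```

The count grows linearly with `r`: `r=16` would train 16 times fewer adapter weights (23,068,672), which is worth weighing against the quality gains of a larger rank.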

In [None]:
OUTPUT_DIR = 'models/your_model_artifacts_dir'

Set up the training arguments and define the SFTTrainer

In [None]:
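import os
os.environ["WANDB_DISABLED"] = "true"  # keep Weights & Biases from starting a run

training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=1e-5,
    num_train_epochs=10,
    logging_strategy="steps",
    logging_steps=20,
    save_strategy="steps",
    save_steps=20000,
    save_total_limit=10,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,                              # -1: train for num_train_epochs
    warmup_ratio=0.03,
    group_by_length=True,                      # batch samples of similar length to reduce padding
    lr_scheduler_type="constant_with_warmup",  # plain "constant" would ignore warmup_ratio
    report_to="none",                          # in transformers 4.x, None falls back to all installed integrations
)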
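It helps to know how many optimizer steps these arguments imply, since `save_steps` and `warmup_ratio` are interpreted in steps. A sketch assuming one step per gradient-accumulation cycle, warmup computed as `ceil(total_steps * warmup_ratio)`, and a linear ramp for `constant_with_warmup`; multi-GPU step counts are approximate, and the 1,000-example dataset size is hypothetical.

```python
import math


def training_steps(num_examples: int, per_device_batch: int, num_devices: int,
                   grad_accum: int, epochs: int) -> int:
    """Approximate optimizer steps for a full run (one step per gradient-accumulation cycle)."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return steps_per_epoch * epochs


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of warmup steps derived from a ratio of the total."""
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_factor(step: int, n_warmup: int) -> float:
    """LR multiplier: linear ramp from 0 to 1 over n_warmup steps, then constant 1."""
    if step < n_warmup:
        return step / max(1, n_warmup)
    return 1.0


total = training_steps(num_examples=1000, per_device_batch=1, num_devices=1, grad_accum=1, epochs=10)
print(total, warmup_steps(total, 0.03))  # 10000 300
```

With a hypothetical 1,000 examples and the settings above, the run has 10,000 steps, so `save_steps=20000` would never trigger an intermediate checkpoint; lower it if you want checkpoints during training.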

In [None]:
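from trl import SFTTrainer

# These keyword arguments follow older TRL releases; newer releases move dataset_text_field,
# max_seq_length and packing into SFTConfig and rename tokenizer to processing_class.
trainer = SFTTrainer(
    model=model,                    # already wrapped with the LoRA adapter, so no peft_config is passed
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    max_seq_length=max_length,
    dataset_text_field="text",
    args=training_args,
    packing=False,
)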
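`packing=False` keeps one example per sequence. With packing enabled, short examples are concatenated into fixed-length blocks to avoid wasting compute on padding. The sketch below shows only the general "concatenate, then chunk" idea; it is not TRL's implementation, whose details (such as how the trailing remainder is handled) vary by version, and the token ids are made up.

```python
from typing import List


def pack_sequences(sequences: List[List[int]], eos_id: int, block_size: int) -> List[List[int]]:
    """Concatenate tokenized examples (each followed by eos_id) and cut the stream into full blocks.

    The trailing remainder shorter than block_size is dropped.
    """
    stream: List[int] = []
    for seq in sequences:
        stream.extend(seq + [eos_id])
    n_blocks = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


print(pack_sequences([[5, 6, 7], [8, 9], [10, 11, 12, 13]], eos_id=2, block_size=4))
# [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]
```

Note how the second block mixes the end of one example with the start of the next; that cross-example context is the trade-off packing makes for throughput.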

Start the training process

In [None]:
import time

st = time.time()

trainer.train()

et = time.time()
elapsed_time = et - st
print(f"Training process time: {elapsed_time:.1f} seconds")
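For long runs, a monotonic clock and an `H:MM:SS` display are easier to work with. A small sketch using `time.perf_counter()`, which is not affected by system clock changes, and `datetime.timedelta`; `format_elapsed` is a hypothetical helper. `trainer.train()` also returns metrics that include the runtime, which can serve as a cross-check.

```python
import time
from datetime import timedelta


def format_elapsed(seconds: float) -> str:
    """Render a duration in seconds as H:MM:SS (sub-second part dropped)."""
    return str(timedelta(seconds=int(seconds)))


start = time.perf_counter()
# ... trainer.train() would run here ...
elapsed = time.perf_counter() - start
print("Training process time:", format_elapsed(elapsed))

print(format_elapsed(3725.6))  # 1:02:05
```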

### 3. Save the trained model to local storage

After training, save the model artifacts to local storage for future deployment

In [None]:
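# Saves the LoRA adapter weights and adapter config (not the full base model);
# a separate trainer.model.save_pretrained(OUTPUT_DIR) call would write the same files again
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)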
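Before packaging the artifacts for deployment, a quick check that the adapter files exist can catch a failed save. This sketch assumes PEFT writes `adapter_config.json` plus the weights as `adapter_model.safetensors` (newer PEFT versions) or `adapter_model.bin` (older ones); `missing_adapter_files` is a hypothetical helper.

```python
import os
from typing import List

# Assumption: adapter weight file names used by newer and older PEFT versions
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(output_dir: str) -> List[str]:
    """Return the adapter artifacts absent from output_dir (an empty list means it looks complete)."""
    missing = []
    if not os.path.isfile(os.path.join(output_dir, "adapter_config.json")):
        missing.append("adapter_config.json")
    if not any(os.path.isfile(os.path.join(output_dir, name)) for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

For example, `missing_adapter_files(OUTPUT_DIR)` should return `[]` after the save above. Because only the adapter is saved, loading it later requires the base model as well; PEFT's `AutoPeftModelForCausalLM.from_pretrained(OUTPUT_DIR)` can load both, and `merge_and_unload()` can fold the adapter into the base weights if a single standalone model is preferred for serving.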