<img src="https://res.cloudinary.com/dbl53sidm/image/upload/v1696398508/mistral-7b-v0.1_opibjl.jpg" width="100%">

## Instruct Fine-tuning [Mistral 7B Instruct](https://mistral.ai/news/announcing-mistral-7b/) using QLoRA and Supervised Fine-tuning

This is a comprehensive notebook and tutorial on how to fine-tune the Mistral-7B-Instruct model.

## Meet Mistral 7B Instruct

The team at [MistralAI](https://mistral.ai/news/announcing-mistral-) has created a language model called Mistral 7B Instruct. It has delivered strong results across a range of benchmarks, which makes it a good option for natural language generation and understanding. This guide concentrates on fine-tuning the model to decompose complex questions into simpler ones, but the methodology can be applied to other tasks.

All the code will be available on GitHub at [adithya-s-k](https://github.com/adithya-s-k).

## Prerequisites

Before diving into the fine-tuning process, make sure you have the following prerequisites in place:

1. **GPU**: While this tutorial can run on a free Google Colab notebook with a GPU, it's recommended to use more powerful GPUs like V100 or A100 for better performance.
2. **Python Packages**: Ensure you have the required Python packages installed. The install command appears below, after the GPU check.

Let's start by checking if your GPU is correctly detected:

In [2]:
!nvidia-smi

Wed Nov 22 09:28:36 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 462.75       Driver Version: 462.75       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce RTX 305... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   58C    P0    13W /  N/A |    105MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

## Let's Get Started

Let's install the Python packages:

In [3]:
!pip install torch transformers datasets peft bitsandbytes trl wandb gdown

Collecting peft
  Downloading peft-0.6.2-py3-none-any.whl.metadata (23 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.2.post2-py3-none-any.whl.metadata (9.8 kB)
Collecting trl
  Downloading trl-0.7.4-py3-none-any.whl.metadata (10 kB)
Collecting wandb
  Downloading wandb-0.16.0-py3-none-any.whl.metadata (9.8 kB)
Collecting tyro>=0.5.11 (from trl)
  Downloading tyro-0.5.17-py3-none-any.whl.metadata (7.5 kB)
Collecting GitPython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.40-py3-none-any.whl.metadata (12 kB)
Collecting sentry-sdk>=1.0.0 (from wandb)
  Downloading sentry_sdk-1.36.0-py2.py3-none-any.whl.metadata (9.7 kB)
Collecting docker-pycreds>=0.4.0 (from wandb)
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting setproctitle (from wandb)
  Downloading setproctitle-1.3.3-cp311-cp311-win_amd64.whl.metadata (10 kB)
Collecting gitdb<5,>=4.0.1 (from GitPython!=3.1.29,>=1.0.0->wandb)
  Downloading gitdb-4.0.11-py3-none-any.whl.metadata (1.2 k

DEPRECATION: omegaconf 2.0.6 has a non-standard dependency specifier PyYAML>=5.1.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of omegaconf or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063


In [5]:
import json
import re
from pprint import pprint

import pandas as pd
import torch
from datasets import Dataset, load_dataset, load_from_disk
from huggingface_hub import notebook_login
from peft import LoraConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
    logging,
)
from trl import SFTTrainer # For supervised finetuning


The following directories listed in your path were found to be non-existent: {WindowsPath('vs/workbench/api/node/extensionHostProcess')}
The following directories listed in your path were found to be non-existent: {WindowsPath('/matplotlib_inline.backend_inline'), WindowsPath('module')}
The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.6.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary C:\Users\minhd\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\bitsandbytes\libbitsandbytes_cuda118.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to 

RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

In [None]:
from huggingface_hub import notebook_login
# Log in to HF Hub
notebook_login()


### Let's Load the Dataset

We will fine-tune Mistral 7B Instruct to decompose a complex question into several simple questions.

We will be using this [dataset](https://drive.google.com/file/d/1-0EVUR6gD-lbhIO29-6e_8PHtE7aOYnZ/view?usp=share_link). The dataset structure should resemble the following:

```json
{
  "input": "your prompt/query/question",
  "response": "your label",
  "text": "combination of `input` and `response`"
}
```
The user's utterance is wrapped in the `[INST]` and `[/INST]` special tokens, so the model treats it as an instruction to follow.
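As a minimal sketch of how one such record could be assembled, the helper below wraps the prompt in `[INST]` and `[/INST]` and appends the response. The name `build_record` and the sample strings are illustrative assumptions, not part of the original notebook; the `<s>` and `</s>` markers mirror the format used later in this tutorial.

```python
def build_record(user_input: str, response: str) -> dict:
    """Assemble one training record in the input/response/text layout."""
    # The instruction sits between [INST] and [/INST]; the target follows it.
    text = f"<s> [INST] {user_input} [/INST]\n{response} </s>"
    return {"input": user_input, "response": response, "text": text}


# Placeholder strings, used only to show the resulting layout.
record = build_record("your prompt/query/question", "your label")
print(record["text"])
```

Only the `text` field is consumed during supervised fine-tuning here; `input` and `response` are kept so the pieces remain easy to inspect.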

In [None]:
!gdown 1yPswHVQTTLt0aR8zJhuilL672xEPu_cM

Downloading...
From: https://drive.google.com/uc?id=1yPswHVQTTLt0aR8zJhuilL672xEPu_cM
To: /content/datasetdict_20_samples.zip
  0% 0.00/16.1k [00:00<?, ?B/s]100% 16.1k/16.1k [00:00<00:00, 45.4MB/s]


In [None]:
!unzip /content/datasetdict_20_samples.zip

Archive:  /content/datasetdict_20_samples.zip
 extracting: datasetdict_20_samples/dataset_dict.json  
   creating: datasetdict_20_samples/train/
  inflating: datasetdict_20_samples/train/data-00000-of-00001.arrow  
  inflating: datasetdict_20_samples/train/dataset_info.json  
  inflating: datasetdict_20_samples/train/state.json  


In [None]:
dataset = load_from_disk("/content/datasetdict_20_samples")
dataset

DatasetDict({
    train: Dataset({
        features: ['entities', 'passages', 'answer', 'triples', 'complex_question'],
        num_rows: 20
    })
})

In [None]:
dataset["train"][0]["entities"]

['Move (1970 film)',
 'Méditerranée (1963 film)',
 'Stuart Rosenberg',
 'Jean-Daniel Pollet']

In [None]:
dataset["train"][0]["answer"]

'No, the director of the film "Move" (1970) is from the United States, and the director of the film "Méditerranée" (1963) is from France.'

In [None]:
dataset["train"][0]["complex_question"]

'Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?'

In [None]:
dataset["train"][0]["triples"]

[{'answer': ' The director of the film "Move" is Stuart Rosenberg.',
  'evidence': '"Move" is a 1970 American film directed by Stuart Rosenberg.',
  'question': 'Who is the director of the film "Move" (1970 Film)?'},
 {'answer': 'Stuart Rosenberg was from the United States.',
  'evidence': 'Stuart Rosenberg was an American film and television director.',
  'question': 'Where was Stuart Rosenberg from?'},
 {'answer': 'The director of the film "Méditerranée" (1963) is Jean-Daniel Pollet.',
  'evidence': '"Méditerranée" is a 1963 French short film directed by Jean-Daniel Pollet.',
  'question': 'Who is the director of the film "Méditerranée" (1963)?'},
 {'answer': ' Jean-Daniel Pollet was from France.',
  'evidence': 'Jean-Daniel Pollet was a French film director and screenwriter.',
  'question': ' Where was Jean-Daniel Pollet from?'}]

In [None]:
train_dataset = dataset["train"]

In [None]:
prompt_template = """Decompose the complex question to multiple simple questions.
Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
Complex question:
{complex_question}
Simple questions:"""

In [None]:
triples_sample = train_dataset["triples"]
triples_sample

[[{'answer': ' The director of the film "Move" is Stuart Rosenberg.',
   'evidence': '"Move" is a 1970 American film directed by Stuart Rosenberg.',
   'question': 'Who is the director of the film "Move" (1970 Film)?'},
  {'answer': 'Stuart Rosenberg was from the United States.',
   'evidence': 'Stuart Rosenberg was an American film and television director.',
   'question': 'Where was Stuart Rosenberg from?'},
  {'answer': 'The director of the film "Méditerranée" (1963) is Jean-Daniel Pollet.',
   'evidence': '"Méditerranée" is a 1963 French short film directed by Jean-Daniel Pollet.',
   'question': 'Who is the director of the film "Méditerranée" (1963)?'},
  {'answer': ' Jean-Daniel Pollet was from France.',
   'evidence': 'Jean-Daniel Pollet was a French film director and screenwriter.',
   'question': ' Where was Jean-Daniel Pollet from?'}],
 [{'answer': '"The Falcon" is a film title used in various contexts, and there is no specific information available without additional details

In [None]:
simple_question_samples = [[t["question"] for t in triple] for triple in triples_sample]

In [None]:
simple_question_samples

[['Who is the director of the film "Move" (1970 Film)?',
  'Where was Stuart Rosenberg from?',
  'Who is the director of the film "Méditerranée" (1963)?',
  ' Where was Jean-Daniel Pollet from?'],
 ['Who is the director of "The Falcon (Film)"?',
  'Who is the director of "Valentin the Good"?'],
 ['Who is the director of "Charge It to Me"?',
  'Who is the director of "Danger: Diabolik"?'],
 ["Follow up: Who is Mina Gerhardsen's father?"],
 ['Who is the director of "Wedding Night in Paradise" (1950 Film)?',
  'What nationality is Ralph Smart?'],
 [' Who is the composer of the film "Sruthilayalu"?',
  'The composer of the film "Sruthilayalu" is K. V. Mahadevan.'],
 ['Who is Rhescuporis I (Odrysian)?',
  "Who is Rhescuporis I (Odrysian)'s father?",
  'Who is Cotys III (Odrysian)?',
  "Who is Cotys III (Odrysian)'s paternal grandfather?"],
 ['Who is the director of "The Ex-Mrs. Bradford"?',
  ' What is the nationality of Stephen Roberts?',
  'Who is the director of "The Star of Santa Clara"

In [None]:
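def transform(examples):
    """Add a `text` column holding the full prompt plus target for each example."""
    # Collect the simple questions from each example's triples.
    simple_question_lst = [[t["question"] for t in triple] for triple in examples["triples"]]
    # Wrap the instruction in [INST] ... [/INST] and append the simple questions as the target.
    text = ["<s> [INST] " + prompt_template.format(complex_question=cq) + "[/INST]\n" + "\n".join(sq) + " </s>"
            for cq, sq in zip(examples["complex_question"], simple_question_lst)]
    examples["text"] = text
    return examples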

In [None]:
train_dataset = train_dataset.map(transform, batched=True)

In [None]:
print(train_dataset["text"][0])

<s> [INST] Decompose the complex question to multiple simple questions. 
Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
Complex question: 
Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?
Simple questions:[/INST]
Who is the director of the film "Move" (1970 Film)?
Where was Stuart Rosenberg from?
Who is the director of the film "Méditerranée" (1963)?
 Where was Jean-Daniel Pollet from? </s>


## Setting Model Parameters

We need to set various parameters for our fine-tuning process, including QLoRA (quantized LoRA) parameters, bitsandbytes parameters, and training arguments:

In [None]:
# The model that you want to train from the Hugging Face hub
model_name = "bn22/Mistral-7B-Instruct-v0.1-sharded" #"mistralai/Mistral-7B-Instruct-v0.1"

# Fine-tuned model name
new_model = "Mistral-7B-question-decomposition" #"mistralai-Code-Instruct"

Setting the QLoRA parameters:

In [None]:
################################################################################
# QLoRA parameters
################################################################################

# LoRA attention dimension
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

################################################################################
# bitsandbytes parameters
################################################################################

# Activate 4-bit precision base model loading
use_4bit = True

# Compute dtype for 4-bit base models
bnb_4bit_compute_dtype = "float16"

# Quantization type (fp4 or nf4)
bnb_4bit_quant_type = "nf4"

# Activate nested quantization for 4-bit base models (double quantization)
use_nested_quant = False

################################################################################
# TrainingArguments parameters
################################################################################

# Output directory where the model predictions and checkpoints will be stored
output_dir = "./results"

# Number of training epochs
num_train_epochs = 1

# Enable fp16/bf16 training (set bf16 to True with an A100)
fp16 = False
bf16 = False

# Batch size per GPU for training
per_device_train_batch_size = 2

# Batch size per GPU for evaluation
per_device_eval_batch_size = 2

# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 1

# Enable gradient checkpointing
gradient_checkpointing = True

# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3

# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4

# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001

# Optimizer to use
optim = "paged_adamw_32bit"

# Learning rate schedule (constant a bit better than cosine)
lr_scheduler_type = "constant"

# Number of training steps (overrides num_train_epochs)
max_steps = -1

# Ratio of steps for a linear warmup (from 0 to learning rate)
warmup_ratio = 0.03

# Group sequences into batches with same length
# Saves memory and speeds up training considerably
group_by_length = True

# Save checkpoint every X update steps
save_steps = 25

# Log every X update steps
logging_steps = 25

################################################################################
# SFT parameters
################################################################################

# Maximum sequence length to use
max_seq_length = None

# Pack multiple short examples in the same input sequence to increase efficiency
packing = False

# Load the entire model on the GPU 0
device_map = {"": 0}
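A few useful quantities follow directly from the settings above. The sketch below is only arithmetic and loads nothing. It assumes a single GPU, takes the dataset size of 20 from the `num_rows` shown earlier, and uses the step budget of 100 that is passed to `TrainingArguments` later in this notebook.

```python
import math

# Values copied from this notebook; a single GPU is assumed.
lora_r, lora_alpha = 64, 16
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_rows = 20      # size of the training split loaded earlier
max_steps = 100    # value passed to TrainingArguments later on

# LoRA scales its low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Samples consumed per optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Optimizer updates needed for one pass over the data.
steps_per_epoch = math.ceil(num_rows / effective_batch_size)

# Passes over the data implied by the fixed step budget.
epochs_covered = max_steps / steps_per_epoch

print(lora_scaling, effective_batch_size, steps_per_epoch, epochs_covered)
```

With these numbers the model sees each of the 20 examples about ten times, which is worth keeping in mind when reading the training loss.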

### Let's Load the Base Model
We load Mistral 7B Instruct in 4-bit precision and set up its tokenizer:

In [None]:
# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map={"": 0}
)

base_model.config.use_cache = False
base_model.config.pretraining_tp = 1

# Load MistralAI tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]

In [None]:
base_model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )

## Let's Check how the base model performs


In [None]:
train_dataset[5]["complex_question"]

"When is the composer of film Sruthilayalu 's birthday?"

In [None]:
eval_simple_questions = [t["question"] for t in train_dataset[5]["triples"]]
eval_simple_questions

[' Who is the composer of the film "Sruthilayalu"?',
 'The composer of the film "Sruthilayalu" is K. V. Mahadevan.']

In [None]:
tokenizer.pad_token_id

2

## Zero shot

In [None]:
def first_evaluate(complex_question):
    eval_prompt = """Decompose the complex question to multiple simple questions.
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question:
    {complex_question}
    Simple questions:"""
    input_text = "<s> [INST] "+eval_prompt.format(complex_question=complex_question) + " [/INST]\n"
    model_input = tokenizer(input_text, return_tensors="pt").to("cuda")

    base_model.eval()
    with torch.no_grad():
        output = tokenizer.decode(base_model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True)
    return output
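Because `eval_prompt` is a triple-quoted string indented inside the function, its continuation lines keep four leading spaces, so the evaluation prompt differs slightly from the training prompt built from `prompt_template`. One way to keep the two identical is to build every prompt through a single helper. The sketch below is an assumption about how that could look (`build_prompt` and `PROMPT_TEMPLATE` are hypothetical names); it reuses the wrapper from `transform` and does not load the model.

```python
PROMPT_TEMPLATE = """Decompose the complex question to multiple simple questions.
Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
Complex question:
{complex_question}
Simple questions:"""


def build_prompt(complex_question: str) -> str:
    """Return the instruction-wrapped prompt, with no stray indentation."""
    body = PROMPT_TEMPLATE.format(complex_question=complex_question)
    # Same wrapper as the training text, minus the target questions.
    return "<s> [INST] " + body + "[/INST]\n"


prompt = build_prompt(
    "Which film whose director is younger, Charge It To Me or Danger: Diabolik?"
)
print(prompt)
```

Passing the returned string to the tokenizer in place of `input_text` would give the fine-tuned model the same layout it saw during training.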

In [None]:
complex_question = train_dataset[0]["complex_question"]
print(first_evaluate(complex_question))

[INST] Decompose the complex question to multiple simple questions. 
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question: 
    Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?
    Simple questions: [/INST]

1. Who is the director of film Move (1970)?
2. Who is the director of film Méditerranée (1963)?
3. What is the country of origin of the director of film Move (1970)?
4. What is the country of origin of the director of film Méditerranée (1963)?
5. Are the countries of origin of the directors of film Move (1970) and film Méditerranée (1963) the same?


In [None]:
complex_question = train_dataset[1]["complex_question"]
print(first_evaluate(complex_question))

[INST] Decompose the complex question to multiple simple questions. 
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question: 
    Do both films The Falcon (Film) and Valentin The Good have the directors from the same country?
    Simple questions: [/INST]

1. What is the name of the film The Falcon?
2. What is the name of the film Valentin The Good?
3. What is the country of origin of the director of The Falcon?
4. What is the country of origin of the director of Valentin The Good?
5. Do the countries of origin of the directors of The Falcon and Valentin The Good match?


In [None]:
complex_question = train_dataset[2]["complex_question"]
print(first_evaluate(complex_question))

[INST] Decompose the complex question to multiple simple questions. 
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question: 
    Which film whose director is younger, Charge It To Me or Danger: Diabolik?
    Simple questions: [/INST]

1. Who is the director of Charge It To Me?
2. Who is the director of Danger: Diabolik?
3. Is the director of Charge It To Me younger than the director of Danger: Diabolik?


In [None]:
complex_question = train_dataset[-1]["complex_question"]
print(first_evaluate(complex_question))

[INST] Decompose the complex question to multiple simple questions. 
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question: 
    Are both businesses, Vakıfbank and Infopro Sdn Bhd, located in the same country?
    Simple questions: [/INST]

1. What is the name of the first business?
2. What is the name of the second business?
3. What is the country in which both businesses are located?



## One shot

In [None]:
print(train_dataset[0]["complex_question"])
[t["question"] for t in train_dataset[0]["triples"]]

Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?


['Who is the director of the film "Move" (1970 Film)?',
 'Where was Stuart Rosenberg from?',
 'Who is the director of the film "Méditerranée" (1963)?',
 ' Where was Jean-Daniel Pollet from?']

In [None]:
def oneshot_evaluate(complex_question):
    eval_prompt = """Decompose the complex question to multiple simple questions.
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question:
    Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?
    Simple questions:
    1. Who is the director of the film "Move" (1970 Film)?
    2. Where was Stuart Rosenberg from?
    3. Who is the director of the film "Méditerranée" (1963)?
    4. Where was Jean-Daniel Pollet from?

    Complex question:
    {complex_question}
    Simple questions:"""
    input_text = "<s> [INST] "+eval_prompt.format(complex_question=complex_question) + " [/INST]\n"
    model_input = tokenizer(input_text, return_tensors="pt").to("cuda")

    base_model.eval()
    with torch.no_grad():
        output = tokenizer.decode(base_model.generate(**model_input, max_new_tokens=256, pad_token_id=2)[0], skip_special_tokens=True)
    return output

In [None]:
complex_question = train_dataset[1]["complex_question"]
print(oneshot_evaluate(complex_question))

[INST] Decompose the complex question to multiple simple questions. 
    Each generated simple question is represented each sub-problem from the complex question so that after answering we have the necessary knowledge to answer the complex question.
    Complex question: 
    Are director of film Move (1970 Film) and director of film Méditerranée (1963 Film) from the same country?
    Simple questions: 
    1. Who is the director of the film "Move" (1970 Film)?
    2. Where was Stuart Rosenberg from?
    3. Who is the director of the film "Méditerranée" (1963)?
    4. Where was Jean-Daniel Pollet from?
    
    Complex question: 
    Do both films The Falcon (Film) and Valentin The Good have the directors from the same country?
    Simple questions: [/INST]

1. Who is the director of the film "The Falcon"?
2. Where was John Sturges from?
3. Who is the director of the film "Valentin The Good"?
4. Where was Jean-Jacques Annaud from?
5. Do John Sturges and Jean-Jacques Annaud come from th

In [None]:
[t["question"] for t in train_dataset[1]["triples"]]

['Who is the director of "The Falcon (Film)"?',
 'Who is the director of "Valentin the Good"?']

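The results from the base model tend to be of poor quality: the zero-shot decompositions are often generic, and the one-shot output above inserts director names that the model appears to have guessed. The decoded output also always repeats the input, because `generate` returns the prompt tokens followed by the newly generated ones.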
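The repeated input can be dropped by decoding only the tokens that come after the prompt. The sketch below shows the idea on plain Python lists with made-up token ids; `strip_prompt` is a hypothetical helper. With real tensors the same slice would apply to `base_model.generate(...)[0]`, using `model_input["input_ids"].shape[1]` as the prompt length.

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the newly generated token ids.

    For decoder-only models the generated sequence starts with the prompt
    tokens, so the completion is everything after the first prompt_length ids.
    """
    return output_ids[prompt_length:]


# Made-up token ids: the first four stand for the prompt, the rest for the answer.
prompt_ids = [1, 733, 16289, 28793]
generated = prompt_ids + [6526, 349, 272, 2]
completion_ids = strip_prompt(generated, len(prompt_ids))
print(completion_ids)
```

Decoding `completion_ids` instead of the full sequence leaves just the model's simple questions, which makes side-by-side comparison with the reference questions easier.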

## Fine-Tuning with QLoRA and Supervised Fine-Tuning

We're ready to fine-tune our model using QLoRA. For this tutorial, we'll use the `SFTTrainer` from the `trl` library for supervised fine-tuning. Ensure that you've installed the `trl` library with the other packages earlier.

In [None]:
# Load LoRA configuration
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=100, # hard-coded step budget; overrides num_train_epochs and ignores the max_steps variable defined earlier
    warmup_ratio=warmup_ratio,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    report_to="tensorboard"
)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)
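To get a feel for how many weights this LoRA configuration actually trains, the count can be estimated by hand: an adapter of rank `r` on a linear layer adds `r * (in_features + out_features)` parameters. The sketch below uses the layer shapes printed for `base_model` earlier. The `lm_head` shape of 4096 to 32000 is an assumption inferred from the embedding size, since the printed model is cut off before that layer.

```python
lora_r = 64
num_layers = 32

# (in_features, out_features) for each targeted projection in one decoder layer.
per_layer_shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

# Assumed shape: hidden size 4096 mapped to a 32000-token vocabulary.
lm_head_shape = (4096, 32000)


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one adapter: A is (r, in_features), B is (out_features, r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, lora_r) for i, o in per_layer_shapes.values())
total = num_layers * per_layer + lora_params(*lm_head_shape, lora_r)
print(f"{total:,} trainable LoRA parameters (estimate)")
```

On a loaded PEFT model, `print_trainable_parameters()` reports the actual figure, which is the number to trust if it differs from this estimate.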



## Let's start the training process

In [None]:
# Train model
trainer.train()

# Save trained model
trainer.model.save_pretrained(new_model)

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 25   | 0.9842        |
| 50   | 0.1486        |
| 75   | 0.0709        |
| 100  | 0.066         |


## Merge and Share

After fine-tuning, you can merge the LoRA weights into the base model and share the result on the Hugging Face Model Hub. This step is optional and depends on your specific use case.
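Conceptually, merging folds each adapter's low-rank update into the matching base weight: `W_merged = W + (alpha / r) * B @ A`. The toy sketch below illustrates that arithmetic with nested lists and made-up values. It is not the `peft` implementation, and `matmul` and `merge_lora` are hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Fold a LoRA update into a dense weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter (values are made up).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (r, in_features)
lora_b = [[0.5], [1.0]]    # shape (out_features, r)
merged = merge_lora(weight, lora_a, lora_b, alpha=2, r=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.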

In [None]:
# Empty VRAM
import gc
del base_model
gc.collect()

del trainer
gc.collect()

0

In [None]:
torch.cuda.empty_cache() # PyTorch thing

In [None]:
gc.collect()

0

In [None]:
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"": 0},
)
merged_model= PeftModel.from_pretrained(base_model, new_model)
merged_model= merged_model.merge_and_unload()

# Save the merged model
merged_model.save_pretrained("merged_model",safe_serialization=True)
tokenizer.save_pretrained("merged_model")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]

OutOfMemoryError: ignored

In [None]:
# Push the model and tokenizer to the Hugging Face Model Hub
merged_model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)

pytorch_model.bin:   0%|          | 0.00/663M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/khangnguyen2907/opt-350m-telegram-chat/commit/d6b725f4c74ded9b8f08016c9fc7b5b69f9f1726', commit_message='Upload tokenizer', commit_description='', oid='d6b725f4c74ded9b8f08016c9fc7b5b69f9f1726', pr_url=None, pr_revision=None, pr_num=None)