## Fine-Tuning an Open-Source Model with PEFT, LoRA, 4-Bit Quantization, and TRL

This notebook explores a way to build a task-specific model tailored to your requirements.

To create your model, go to the first code cell and describe the model you want in the prompt. Give clear, detailed instructions.

Choose a temperature (high for creativity, low for precision) and the number of training examples to generate. Once configured, run all the cells.

To change the model being fine-tuned, edit the `model_name` parameter in the Define Hyperparameters cell.

Split into train and test sets.

In [13]:
## Notebook created by Adil Jaleel

In [1]:
!pip install -q accelerate peft bitsandbytes transformers trl datasets einops

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
!git clone https://github.com/adil22jaleel/llm_finetuning_openassistant

Cloning into 'llm_finetuning_openassistant'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 9 (delta 1), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (9/9), 845.49 KiB | 22.85 MiB/s, done.
Resolving deltas: 100% (1/1), done.


In [4]:
# Training and validation dataset from HF Datasets
# https://huggingface.co/datasets/OpenAssistant/oasst1?row=1
import pandas as pd
from datasets import load_dataset

# Load datasets
train_dataset = load_dataset('json', data_files='/content/drive/MyDrive/data_prep/finetuning_data.jsonl', split="train")
valid_dataset = load_dataset('json', data_files='/content/drive/MyDrive/data_prep/finetuning_data_val.jsonl', split="train")


In [None]:
train_dataset

Dataset({
    features: ['prompt', 'response'],
    num_rows: 20587
})

In [None]:
valid_dataset

Dataset({
    features: ['prompt', 'response'],
    num_rows: 1095
})
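
The two files loaded above are JSON Lines: one JSON object per line, here with `prompt` and `response` keys. The row counts shown (20,587 and 1,095) put the validation share at roughly 5%. As a minimal sketch of how such a pair of files could be produced, assuming made-up records and hypothetical helpers (`split_records`, `write_jsonl`, `read_jsonl`) that are not the notebook's actual data-prep code:

```python
import json
import random
import tempfile
from pathlib import Path


def split_records(records, val_fraction=0.05, seed=42):
    """Shuffle a copy of the records and split off a validation slice."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records with the same two fields as the datasets above.
records = [{"prompt": f"Question {i}?", "response": f"Answer {i}."} for i in range(100)]
train_records, val_records = split_records(records)

# Write to a temporary directory; the notebook itself reads from Google Drive.
out_dir = Path(tempfile.mkdtemp())
write_jsonl(out_dir / "finetuning_data.jsonl", train_records)
write_jsonl(out_dir / "finetuning_data_val.jsonl", val_records)
print(len(train_records), len(val_records))
```

Files written this way can be loaded with the same `load_dataset('json', data_files=..., split="train")` call used in the cell above.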

# Import necessary libraries

In [5]:
import os
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer



# Define Hyperparameters

In [6]:
model_name = "microsoft/phi-2"
dataset_name = "finetuning_data_train.jsonl"
new_model = "phi_2_finetuned"

lora_r = 64
lora_alpha = 16
lora_dropout = 0.05
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
bnb_4bit_use_double_quant=True
# use_nested_quant = False

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=bnb_4bit_use_double_quant,
)

output_dir = "./outputs"
num_train_epochs = 1
fp16 = True
bf16 = False
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 1
gradient_checkpointing = False
max_grad_norm = 0.3
learning_rate = 2e-4
weight_decay = 0.001
optim = "paged_adamw_32bit"
lr_scheduler_type = "constant"
max_steps = -1
group_by_length = True
logging_steps=5000
logging_strategy="steps"
max_seq_length = 512
packing = False
device_map = {"": 0}


# Newly added: checkpoint save strategy (not yet passed to TrainingArguments below)
save_strategy="epoch"


model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    trust_remote_code=True)

model.config.use_cache = False
model.config.pretraining_tp = 1
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    target_modules=['Wqkv','out_proj'],
    task_type="CAUSAL_LM",
)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    fp16=fp16,
    bf16=bf16,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    group_by_length=group_by_length,
    lr_scheduler_type=lr_scheduler_type,
    gradient_checkpointing=gradient_checkpointing,
    # save_strategy=save_strategy,
    report_to="all",
    evaluation_strategy="steps",
    eval_steps=5000  # Evaluate every n steps
)


A new version of the following files was downloaded from https://huggingface.co/microsoft/phi-2:
- configuration_phi.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/microsoft/phi-2:
- modeling_phi.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
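
To make the LoRA settings above more concrete: in standard LoRA the low-rank update is scaled by `lora_alpha / r`, and each adapted linear layer gains `r * (in_features + out_features)` trainable parameters. The sketch below works through that arithmetic. The layer shapes are assumptions chosen for illustration (a fused QKV projection and an output projection with hidden size 2560) and are not read from the loaded model.

```python
def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update (standard LoRA: alpha / r)."""
    return lora_alpha / r


def lora_param_count(in_features, out_features, r):
    """Trainable parameters added to one linear layer.

    A has shape (r, in_features) and B has shape (out_features, r).
    """
    return r * in_features + out_features * r


lora_r, lora_alpha = 64, 16  # values from the hyperparameter cell
print(lora_scaling(lora_alpha, lora_r))

# Assumed shapes, for illustration only: a fused QKV projection (d -> 3d)
# and an output projection (d -> d) with hidden size d = 2560.
d = 2560
print(lora_param_count(d, 3 * d, lora_r))
print(lora_param_count(d, d, lora_r))
```

With `lora_alpha` smaller than `r`, the adapter update is scaled down before being added to the frozen weights, which is worth keeping in mind when changing either value.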


# Load Datasets and Train

In [7]:
# Preprocess datasets
system_message = "You are a question answering chatbot. Provide a clear and detailed explanation"

def format_examples(examples):
    """Wrap each prompt/response pair in the [INST] <<SYS>> template."""
    header = f'[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n'
    return {'text': [header + prompt + ' [/INST] ' + response
                     for prompt, response in zip(examples['prompt'], examples['response'])]}

train_dataset_mapped = train_dataset.map(format_examples, batched=True)
valid_dataset_mapped = valid_dataset.map(format_examples, batched=True)

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=train_dataset_mapped,
    eval_dataset=valid_dataset_mapped,  # Pass validation dataset here
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

trainer.train()
trainer.model.save_pretrained(new_model)

# Test the model
logging.set_verbosity(logging.CRITICAL)
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n Suggest some food dishes [/INST]" # replace the command here with something relevant to your task
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(prompt)
print(result[0]['generated_text'])

You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss | Validation Loss |
|---|---|---|
| 5000 | 1.5855 | 1.488493 |
| 10000 | 1.5462 | 1.474721 |
| 15000 | 1.5335 | 1.463478 |
| 20000 | 1.506 | 1.461445 |
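
The four evaluation points follow from the training settings: with 20,587 training examples, a per-device batch size of 1, and no gradient accumulation, one epoch is about 20,587 optimizer steps, so evaluating every 5,000 steps fires four times. A rough sketch of that arithmetic, assuming a single GPU and using illustrative helper names (the actual `Trainer` may round slightly differently):

```python
import math


def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              gradient_accumulation_steps=1, num_devices=1):
    """Approximate number of optimizer steps in one epoch."""
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    return math.ceil(num_examples / effective_batch)


def eval_points(total_steps, eval_steps):
    """Steps at which a periodic evaluation fires."""
    return list(range(eval_steps, total_steps + 1, eval_steps))


total = optimizer_steps_per_epoch(20587, per_device_batch_size=1)
print(total, eval_points(total, 5000))
```

Raising `gradient_accumulation_steps` or the batch size shrinks the step count, so `eval_steps` and `logging_steps` would need to shrink with it to keep several evaluation points per epoch.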




[INST] <<SYS>>
You are a question answering chatbot. Provide a clear and detailed explanation
<</SYS>>

 Suggest some food dishes [/INST] Sure, here are some food dishes that you can try:

1. Tacos: Tacos are a popular Mexican dish that consists of a tortilla filled with various ingredients such as meat, vegetables, and cheese.

2. Pizza: Pizza is an Italian dish that consists of a flatbread topped with tomato sauce, cheese, and various toppings such as vegetables, meats, and seafood.

3. Sushi: Sushi is a Japanese dish that consists of vinegared rice and various toppings such as raw or cooked seafood, vegetables, and sometimes tropical fruits.

4. Curry: Curry is a dish that originated in India and is made with a sauce made from a blend of spices and herbs, which is then cooked with meat, vegetables, or legumes.
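
The training texts and the test prompt in this cell are assembled by separate string expressions, and they differ slightly: the test prompt has a space before the question that the training template does not. One way to keep the two aligned is a single helper used for both. This is a sketch only; `build_prompt` is a hypothetical name, and stripping whitespace around the question is a choice made here, not something the notebook does.

```python
SYSTEM_MESSAGE = "You are a question answering chatbot. Provide a clear and detailed explanation"


def build_prompt(question, response=None, system_message=SYSTEM_MESSAGE):
    """Format one example in the [INST] <<SYS>> template.

    With a response, this returns a full training text; without one,
    it returns the prompt to feed the model at inference time.
    """
    header = f"[INST] <<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n"
    prompt = header + question.strip() + " [/INST]"
    if response is None:
        return prompt
    return prompt + " " + response


print(build_prompt("Suggest some food dishes"))
```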




# Run Inference

In [8]:
from transformers import pipeline

prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n What is LSTM used for [/INST]" # replace the command here with something relevant to your task
num_new_tokens = 500  # change to the number of new tokens you want to generate

# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])

# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))

 Long Short Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that is commonly used in natural language processing (NLP) and other applications where sequential data needs to be processed. LSTMs are designed to overcome the limitations of traditional RNNs, which can suffer from the vanishing gradient problem and struggle to remember long sequences of data.

LSTMs are composed of multiple layers of cells, each of which has a memory cell that stores information about the previous inputs. This allows the LSTM to remember long sequences of data and to learn from past experiences. LSTMs are particularly useful for tasks such as language translation, speech recognition, and time series forecasting.

In NLP, LSTMs are often used to process text data, such as in sentiment analysis or named entity recognition. They can also be used for tasks such as image captioning and language modeling. LSTM stands for Long Short Term Memory. It is a type of neural network that is used for proces
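
The cell above removes the prompt with `str.replace`, which deletes every occurrence of the prompt text, including any the model happens to repeat. A slightly stricter variant strips it only when it appears as a prefix. The strings below are hypothetical stand-ins for the pipeline output, and `strip_prompt` is an illustrative helper, not part of `transformers`.

```python
def strip_prompt(generated_text, prompt):
    """Remove the prompt only when it appears as a prefix of the generated text."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Hypothetical strings standing in for the pipeline output.
prompt = "[INST] What is LSTM used for [/INST]"
generated = prompt + " LSTM is a type of recurrent neural network."
print(strip_prompt(generated, prompt))
```

The text-generation pipeline also accepts `return_full_text=False` and `max_new_tokens`, which may make both the manual stripping and the token counting unnecessary, depending on the installed `transformers` version.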

# Merge the model and save

In [9]:
# Merge and save the fine-tuned model
model_path = "/content/drive/MyDrive/finetuned_phi2"  # change to your preferred path

# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Save the merged model
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)

The repository for microsoft/phi-2 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/phi-2.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


('/content/drive/MyDrive/finetuned_phi2/tokenizer_config.json',
 '/content/drive/MyDrive/finetuned_phi2/special_tokens_map.json',
 '/content/drive/MyDrive/finetuned_phi2/vocab.json',
 '/content/drive/MyDrive/finetuned_phi2/merges.txt',
 '/content/drive/MyDrive/finetuned_phi2/added_tokens.json',
 '/content/drive/MyDrive/finetuned_phi2/tokenizer.json')
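
Conceptually, merging folds each adapter into its base weight as `W + (lora_alpha / r) * (B @ A)`, after which the adapter matrices are no longer needed. The toy example below illustrates that formula in plain Python with made-up numbers. It is a sketch of the standard LoRA merge, not the PEFT implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying the inputs."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy 2x2 weight with a rank-1 adapter; all numbers are made up.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]      # shape (r, in_features) = (1, 2)
B = [[0.5], [1.0]]    # shape (out_features, r) = (2, 1)
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
print(merged)
```

Because the merged matrix has the same shape as the original weight, the saved model loads like an ordinary checkpoint, without needing `peft` at inference time.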

# Load a fine-tuned model from Drive and run inference

In [10]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/content/drive/MyDrive/finetuned_phi2"  # change to your preferred path

model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

In [12]:
from transformers import pipeline

system_message = "You are a question answering chatbot. Provide a clear and detailed explanation"
question = "How to beat a person in a sprint??"
prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n {question} [/INST]" # replace the command here with something relevant to your task

num_new_tokens = 500  # change to the number of new tokens you want to generate
# Count the number of tokens in the prompt
num_prompt_tokens = len(tokenizer(prompt)['input_ids'])
# Calculate the maximum length for the generation
max_length = num_prompt_tokens + num_new_tokens

gen = pipeline('text-generation', model=model, tokenizer=tokenizer, max_length=max_length)
result = gen(prompt)
print(result[0]['generated_text'].replace(prompt, ''))

 To beat a person in a sprint, you need to focus on improving your speed and endurance. Here are some tips to help you improve your sprinting performance:

1. Train regularly: Sprinting is a high-intensity exercise that requires a lot of training. You should aim to train at least 3-4 times a week, with each session lasting around 30-60 minutes.

2. Incorporate interval training: Interval training involves alternating between high-intensity sprints and periods of rest or low-intensity exercise. This type of training can help improve your speed and endurance.

3. Focus on your form: Proper form is essential for sprinting. Make sure you are running with a straight back, arms at your sides, and feet landing directly under your hips.

4. Build strength: Strength training can help improve your sprinting performance by building muscle and increasing your power. Focus on exercises that target your legs, core, and upper body.

5. Eat a healthy diet: A balanced diet that includes plenty of prote