## Difference Between Fine-tuning with LoRA and QLoRA

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method. It does not update every weight in the weight matrices of the pretrained large language model. Instead, it freezes those weights and trains two much smaller matrices whose product approximates the update to each adapted weight matrix. These two matrices make up the LoRA adapter. After fine-tuning, the adapter is loaded onto the pretrained model and used for inference.
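To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It is an illustration of the technique, not PEFT's implementation: the frozen weight `W0` (d_out × d_in) is combined with the trainable `A` (r × d_in) and `B` (d_out × r) as `h = W0 x + (alpha / r) · B (A x)`. As in the LoRA paper, `B` starts at zero so the adapted layer initially behaves exactly like the base layer, and after training the adapter can be merged back into a single matrix. The helper names and the small matrix sizes are assumptions chosen for the example.

```python
import random


def matvec(M, x):
    """Multiply a matrix M (list of rows) by a vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(M, N):
    """Multiply matrices M (p x q) and N (q x s)."""
    cols = list(zip(*N))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]


def lora_forward(W0, A, B, x, alpha, r):
    """Frozen path W0 x plus the scaled low-rank update (alpha / r) * B (A x)."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def merge(W0, A, B, alpha, r):
    """Fold the adapter into one matrix: W0 + (alpha / r) * B A."""
    BA = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W0, BA)]


def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning of W0 versus a rank-r adapter."""
    return d_out * d_in, r * (d_in + d_out)


d_out, d_in, r, alpha = 6, 5, 2, 2
rng = random.Random(0)
W0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero-initialised: the adapter starts as a no-op
x = [rng.gauss(0, 1) for _ in range(d_in)]

print(lora_forward(W0, A, B, x, alpha, r) == matvec(W0, x))  # True before any training
# Illustrative 3200 x 3200 matrix with rank 8: full vs. LoRA trainable parameters
print(param_counts(3200, 3200, 8))
```

The saving grows with matrix size: a rank-8 adapter on a 3200 × 3200 matrix trains 51,200 values instead of 10,240,000.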

QLoRA is an even more memory-efficient variant of LoRA: the pretrained model is loaded into GPU memory as quantized 4-bit weights (compared with the 8-bit weights used for LoRA in this notebook), while achieving effectiveness similar to LoRA. The focus here is to probe this method, compare the two methods where useful, and find the combination of QLoRA hyperparameters that gives the best performance with the shortest training time.
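The sketch below illustrates what storing weights in 4 bits means, using a deliberately simplified scheme: blockwise absmax quantization to the signed integers -7..7, with one floating-point scale per block. This is an assumption made for illustration and is not bitsandbytes' NF4, which places its 16 levels at quantiles of a normal distribution and, with double quantization, also compresses the per-block scales. The block size of 64 and the 3-billion-parameter model size are illustrative values, and the memory estimate covers the weights only.

```python
def quantize_block(values, levels=7):
    """Absmax-quantize one block to integers in [-levels, levels]; return (codes, scale)."""
    absmax = max(abs(v) for v in values)
    scale = absmax / levels if absmax > 0 else 1.0
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def quantize(weights, block_size=64):
    """Split a flat weight list into blocks, each quantized with its own scale."""
    return [quantize_block(weights[i:i + block_size]) for i in range(0, len(weights), block_size)]


def weight_memory_gb(num_params, bits):
    """Weights only: ignores scales, activations, LoRA parameters and optimizer state."""
    return num_params * bits / 8 / 1e9


weights = [0.02 * ((i * 37) % 101 - 50) for i in range(256)]
blocks = quantize(weights)
restored = [v for codes, scale in blocks for v in dequantize_block(codes, scale)]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"blocks={len(blocks)}, max reconstruction error={max_err:.4f}")

for bits in (16, 8, 4):
    print(f"{bits}-bit weights, hypothetical 3B-parameter model: {weight_memory_gb(3e9, bits):.1f} GB")
```

Each restored weight is within half a quantization step of the original, and halving the bit width halves the weight memory, which is where QLoRA's savings over 8-bit loading come from.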

LoRA is implemented in the Hugging Face Parameter-Efficient Fine-Tuning (PEFT) library, which makes it easy to use, and QLoRA is available by using bitsandbytes and PEFT together. The Hugging Face Transformer Reinforcement Learning (TRL) library provides a convenient trainer for supervised fine-tuning (`SFTTrainer`) that integrates seamlessly with LoRA. Together, these three libraries provide the tools needed to fine-tune the chosen pretrained model to generate coherent, convincing product descriptions when prompted with an instruction specifying the desired attributes.

In [1]:
# https://www.databricks.com/blog/efficient-fine-tuning-lora-guide-llms#:~:text=LoRA%20is%20an%20improved%20finetuning,matrices%20constitute%20the%20LoRA%20adapter.
#https://github.com/avisoori-databricks/Tuning-the-Finetuning/blob/main/Step%201%20Fine%20tuning%20using%20%20QLoRA.py
#https://github.com/databricks/databricks-ml-examples
! pip --quiet install datasets
! pip --quiet install transformers


In [3]:
import pandas as pd
from datasets import load_dataset
from datasets import Dataset

In [4]:
#Load the dataset from the HuggingFace Hub
rd_ds = load_dataset("xiyuez/red-dot-design-award-product-description")

#Convert to pandas dataframe for convenient processing
rd_df = pd.DataFrame(rd_ds['train'])

#Combine the two attributes into an instruction string
rd_df['instruction'] = 'Create a detailed description for the following product: '+ rd_df['product']+', belonging to category: '+ rd_df['category']

rd_df = rd_df[['instruction', 'description']]

#Get a 5000 sample subset for fine-tuning purposes
rd_df_sample = rd_df.sample(n=5000, random_state=42)

In [5]:
rd_df_sample.head()

|       | instruction | description |
|-------|-------------|-------------|
| 18952 | Create a detailed description for the followin... | The CG8565 is a gaming PC offering space for h... |
| 12584 | Create a detailed description for the followin... | The iSHOXS BullBar ProX mount can be used to a... |
| 5702  | Create a detailed description for the followin... | The S81 Pro focuses on two things: outstanding... |
| 20503 | Create a detailed description for the followin... | The CenFlex superfinish machine is designed fo... |
| 2480  | Create a detailed description for the followin... | The THALION S gas absorption heat pump uses na... |


In [6]:
print(rd_df_sample['instruction'][0:1])

18952    Create a detailed description for the followin...
Name: instruction, dtype: object


In [7]:
#Define template and format data into the template for supervised fine-tuning
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

{}

### Response:\n"""

rd_df_sample['prompt'] = rd_df_sample["instruction"].apply(lambda x: template.format(x))
for i in rd_df_sample['prompt']:
    print(i)
    break

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

Create a detailed description for the following product: CG8565, belonging to category: Desktop Computer

### Response:



In [8]:
rd_df_sample.rename(columns={'description': 'response'}, inplace=True)
rd_df_sample.head()


|       | instruction | response | prompt |
|-------|-------------|----------|--------|
| 18952 | Create a detailed description for the followin... | The CG8565 is a gaming PC offering space for h... | Below is an instruction that describes a task.... |
| 12584 | Create a detailed description for the followin... | The iSHOXS BullBar ProX mount can be used to a... | Below is an instruction that describes a task.... |
| 5702  | Create a detailed description for the followin... | The S81 Pro focuses on two things: outstanding... | Below is an instruction that describes a task.... |
| 20503 | Create a detailed description for the followin... | The CenFlex superfinish machine is designed fo... | Below is an instruction that describes a task.... |
| 2480  | Create a detailed description for the followin... | The THALION S gas absorption heat pump uses na... | Below is an instruction that describes a task.... |


In [9]:
rd_df_sample['response'] = rd_df_sample['response'] + "\n### End"
for i in rd_df_sample['response']:
    print(i)
    break

The CG8565 is a gaming PC offering space for high-quality equipment. The Windows 7 system works with an Intel Core i7 2600K processor and supports up to 32 GB of working memory. Two graphics cards, an SSD hard drive, an efficient water-cooling system and three chassis fans guarantee excellent performance. At the push of a button, the system can be overclocked by up to  35 per cent during continuous operation.
### End


In [10]:
rd_df_sample.head()

|       | instruction | response | prompt |
|-------|-------------|----------|--------|
| 18952 | Create a detailed description for the followin... | The CG8565 is a gaming PC offering space for h... | Below is an instruction that describes a task.... |
| 12584 | Create a detailed description for the followin... | The iSHOXS BullBar ProX mount can be used to a... | Below is an instruction that describes a task.... |
| 5702  | Create a detailed description for the followin... | The S81 Pro focuses on two things: outstanding... | Below is an instruction that describes a task.... |
| 20503 | Create a detailed description for the followin... | The CenFlex superfinish machine is designed fo... | Below is an instruction that describes a task.... |
| 2480  | Create a detailed description for the followin... | The THALION S gas absorption heat pump uses na... | Below is an instruction that describes a task.... |


In [11]:
# .copy() makes an independent frame, avoiding pandas' SettingWithCopyWarning on the next assignment
rd_df_sample = rd_df_sample[['prompt', 'response']].copy()
rd_df_sample['text'] = rd_df_sample["prompt"] + rd_df_sample["response"]

rd_df_sample.drop(columns=['prompt', 'response'], inplace=True)
rd_df_sample.head()

|       | text |
|-------|------|
| 18952 | Below is an instruction that describes a task.... |
| 12584 | Below is an instruction that describes a task.... |
| 5702  | Below is an instruction that describes a task.... |
| 20503 | Below is an instruction that describes a task.... |
| 2480  | Below is an instruction that describes a task.... |


In [12]:
# the final prompt
for i in rd_df_sample.text:
  print(i)
  break

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

Create a detailed description for the following product: CG8565, belonging to category: Desktop Computer

### Response:
The CG8565 is a gaming PC offering space for high-quality equipment. The Windows 7 system works with an Intel Core i7 2600K processor and supports up to 32 GB of working memory. Two graphics cards, an SSD hard drive, an efficient water-cooling system and three chassis fans guarantee excellent performance. At the push of a button, the system can be overclocked by up to  35 per cent during continuous operation.
### End
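The pandas cells above build each training example as the filled-in template, followed by the description, followed by `"\n### End"`. For a single row, the same concatenation can be sketched without pandas. The helper names and the sample description below are hypothetical; the template text is copied from the cell that defines `template`.

```python
TEMPLATE = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:

{}

### Response:\n"""

END_MARKER = "\n### End"  # appended to every response so the model learns where to stop


def build_instruction(product, category):
    """Combine the two dataset attributes into the instruction string."""
    return ("Create a detailed description for the following product: "
            + product + ", belonging to category: " + category)


def build_training_text(product, category, description):
    """Prompt (template filled with the instruction) + response + end marker."""
    return TEMPLATE.format(build_instruction(product, category)) + description + END_MARKER


example = build_training_text("CG8565", "Desktop Computer", "A short example description.")
print(example)
```

Keeping this layout identical at inference time matters: the fine-tuned model has only seen prompts in exactly this shape.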


In [13]:
! pip install --quiet torch

In [14]:
! pip install --quiet sentencepiece

In [15]:
! pip install --quiet  bitsandbytes

In [16]:
! pip install --quiet accelerate

In [17]:
#Testing model performance before fine-tuning
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto',
)

#Pass in a prompt and infer with the model
prompt = 'Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse\nA:'
# Move the input ids onto the same device as the model before generating
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

generation_output = model.generate(
    input_ids=input_ids, max_new_tokens=128
)

print(tokenizer.decode(generation_output[0]))

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


<s>Q: Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse
A: The Corelogic Smooth Mouse is a wireless optical mouse that is designed to be used with a computer. The mouse is designed to be used with a computer and is compatible with Windows 98, 2000, XP, Vista, and Windows 7. The mouse is designed to be used with a computer and is compatible with Windows 98, 2000, XP, Vista, and Windows 7. The mouse is designed to be used with a computer and is compatible with Windows 98, 2000, XP, Vista, and Windows 7. The mouse is designed to be


In [18]:
! pip install --quiet peft trl mlflow

In [19]:
from peft import get_peft_config, PeftModel, PeftConfig, get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch
from transformers.trainer_callback import TrainerCallback
import os
from transformers import BitsAndBytesConfig
from trl import SFTTrainer
import mlflow

In [20]:
from datasets import load_dataset
from datasets import Dataset
dataset = Dataset.from_pandas(rd_df_sample).train_test_split(test_size=0.05, seed=42)

In [21]:
dataset.keys(),len(dataset['train'])

(dict_keys(['train', 'test']), 4750)

In [22]:
# Optionally reduce the sample to 500 rows for a faster test run
rd_df_sample1=rd_df_sample.sample(n=500,random_state=42)
dataset1 = Dataset.from_pandas(rd_df_sample1).train_test_split(test_size=0.05, seed=42)

In [23]:
dataset1.keys(),len(dataset1['train']),len(dataset1['test'])

(dict_keys(['train', 'test']), 475, 25)
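As a rough check on training length, the 475-row training split above can be combined with the batch settings used in the QLoRA training cell (`per_device_train_batch_size = 2`, `gradient_accumulation_steps = 2`, `warmup_ratio = 0.03`, `lr_scheduler_type = "linear"`, `learning_rate = 1e-5`). This sketch assumes a single GPU and follows the shape of Transformers' linear warmup/decay schedule (`get_linear_schedule_with_warmup`); the Trainer's own step accounting may round slightly differently, and the helper names are hypothetical.

```python
import math


def optimizer_steps_per_epoch(num_examples, per_device_batch, grad_accum, num_devices=1):
    """Approximate optimizer updates per epoch: examples / effective batch, rounded up."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_examples / effective_batch)


def linear_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


steps = optimizer_steps_per_epoch(475, per_device_batch=2, grad_accum=2)
warmup = math.ceil(0.03 * steps)  # warmup_ratio = 0.03
print("optimizer steps per epoch:", steps, "warmup steps:", warmup)
for s in (0, warmup, steps // 2, steps):
    print(s, linear_lr(s, steps, warmup, 1e-5))
```

With one epoch this comes to roughly 119 optimizer updates, of which only a handful are warmup; at a peak learning rate of 1e-5, that is a short run.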

In [None]:
# Targeting all linear layers generally gives better results than targeting only the attention layers:
# target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head'] # all linear layers
# Here only the attention query and value projections are targeted; with rank 8 this gives about 2.662M trainable parameters
target_modules = ['q_proj','v_proj']
lora_config = LoraConfig(
    r=8,#or r=16
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    target_modules = target_modules,
    task_type="CAUSAL_LM",
)

base_dir = "/content/sample_data"

per_device_train_batch_size = 2
gradient_accumulation_steps = 2
optim = 'adamw_hf'
learning_rate = 1e-5
max_grad_norm = 0.3
warmup_ratio = 0.03
lr_scheduler_type = "linear"
from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir=base_dir,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    num_train_epochs = 1.0,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)




nf4_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_use_double_quant=True,
  bnb_4bit_compute_dtype=torch.bfloat16
)



model_path = 'openlm-research/open_llama_3b_v2'



tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})



model = LlamaForCausalLM.from_pretrained(
    model_path, device_map='auto', quantization_config=nf4_config,
)



model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# COMMAND ----------

trainer = SFTTrainer(
    model,
    train_dataset=dataset1['train'],
    eval_dataset = dataset1['test'],
    dataset_text_field="text",  #name of column
    max_seq_length=256,
    args=training_args,
)
#Upcast layer norms to float 32 for stability
for name, module in trainer.model.named_modules():
  if "norm" in name:
    module = module.to(torch.float32)

#model = get_peft_model(model, lora_config)
#model.print_trainable_parameters()
#Initiate the training process
with mlflow.start_run(run_name='first_experiment'):
   trainer.train()
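The comment in the training cell quotes about 2.662M trainable parameters for rank 8 on `q_proj` and `v_proj`. That figure can be reproduced by hand: each adapted `Linear(in, out)` gains an `A` matrix (r × in) and a `B` matrix (out × r), so it adds `r * (in + out)` parameters. The layer sizes below are assumed OpenLLaMA-3B values (hidden size 3200, MLP size 8640, 26 decoder layers, vocabulary 32,000); check them against `model.config`, and treat `model.print_trainable_parameters()` as the authoritative count.

```python
# Assumed OpenLLaMA-3B dimensions; verify against model.config before relying on totals.
HIDDEN, INTERMEDIATE, LAYERS, VOCAB = 3200, 8640, 26, 32000

# (in_features, out_features) of each projection inside one decoder layer
LAYER_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN), "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN), "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE), "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(target_modules, r):
    """Sum r * (in + out) over every adapted Linear layer."""
    total = 0
    for name in target_modules:
        if name == "lm_head":
            total += r * (HIDDEN + VOCAB)  # lm_head exists once, not once per layer
        else:
            d_in, d_out = LAYER_SHAPES[name]
            total += LAYERS * r * (d_in + d_out)
    return total


attention_only = lora_trainable_params(["q_proj", "v_proj"], r=8)
all_linear = lora_trainable_params(
    ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj", "lm_head"], r=8)
print(f"q/v only, r=8: {attention_only:,}")
print(f"all linear layers, r=8: {all_linear:,}")
print("LoRA scaling alpha / r =", 8 / 8)  # lora_alpha=8, r=8
```

Switching to `r=16` doubles both counts, and with `lora_alpha=8` it also halves the `alpha / r` scaling applied to the update.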

In [25]:
# #https://github.com/NVIDIA/apex/issues/965
# for param in model.parameters():
#     # Check if parameter dtype is  Half (float16)
#     if param.dtype == torch.float16:
#         param.data = param.data.to(torch.float32)

In [26]:
# If loading from saved adapter

#dbutils.fs.ls('<base_dir_location>')

# inference code

model_path = 'openlm-research/open_llama_3b_v2'



tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})



model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto',
)

# COMMAND ----------

peft_model_id = '<adapter_final_checkpoint_location>' # saved model path

# COMMAND ----------

peft_model = PeftModel.from_pretrained(model, peft_model_id)

# COMMAND ----------

test_strings = ["Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse",
"Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner",
"Create a detailed description for the following product: Flattronic Cinematron, belonging to category: High Definition Flatscreen TV"]

# COMMAND ----------

predictions = []
for test in test_strings:
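  # Use the same layout as the training template (no stray indentation inside the string)
  prompt = ("Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n\n{}\n\n"
            "### Response:\n").format(test)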
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

  generation_output = peft_model.generate(
      input_ids=input_ids, max_new_tokens=156
  )
  predictions.append(tokenizer.decode(generation_output[0]))



def extract_response_text(input_string):
    start_marker = '### Response:'
    end_marker = '###'

    start_index = input_string.find(start_marker)
    if start_index == -1:
        return None

    start_index += len(start_marker)

    end_index = input_string.find(end_marker, start_index)
    if end_index == -1:
        return input_string[start_index:].strip()

    return input_string[start_index:end_index].strip()
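As a more compact alternative to `extract_response_text`, the same extraction can be written as one regular expression. In this sketch (the helper name `extract_response_regex` is hypothetical), the match stops at the first `###` after the response, as the original function does, and also at the `</s>` end-of-sequence string that `tokenizer.decode` keeps for Llama models when special tokens are not skipped, or at the end of the string.

```python
import re

# Lazily capture everything after "### Response:" up to "###", "</s>" or end of text,
# trimming surrounding whitespace.
RESPONSE_RE = re.compile(r"### Response:\s*(.*?)\s*(?:###|</s>|$)", re.DOTALL)


def extract_response_regex(decoded):
    """Return the generated response text, or None if there is no response marker."""
    match = RESPONSE_RE.search(decoded)
    return match.group(1) if match else None


sample = "<s> Below is an instruction...\n### Response:\nA quiet optical mouse.\n### End</s>"
print(extract_response_regex(sample))
```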


# predictions[2]

# COMMAND ----------

for i in range(3):
  pred = predictions[i]
  text = test_strings[i]
  print(text+'\n')
  print(extract_response_text(pred))
  print('--------')


NameError: ignored

## Fine-tuning with LoRA

In [None]:
# Databricks notebook source
# MAGIC %pip install transformers==4.31.0 datasets==2.13.0 peft==0.4.0 accelerate==0.21.0 bitsandbytes==0.40.2 trl==0.4.7



from peft import get_peft_config, PeftModel, PeftConfig, get_peft_model, LoraConfig, TaskType
from transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer, LlamaForCausalLM
import torch
from transformers.trainer_callback import TrainerCallback
import os
from transformers import BitsAndBytesConfig
from trl import SFTTrainer
import mlflow

# COMMAND ----------

# MAGIC %sql
# MAGIC USE description_generator;

# COMMAND ----------

df = spark.sql("SELECT * FROM product_name_to_description").toPandas()
df['text'] = df["prompt"]+df["response"]
df.drop(columns=['prompt', 'response'], inplace=True)
display(df), df.shape

# COMMAND ----------

from datasets import load_dataset
from datasets import Dataset
dataset = Dataset.from_pandas(df).train_test_split(test_size=0.05, seed=42)

# COMMAND ----------

target_modules = ['q_proj','k_proj','v_proj','o_proj','gate_proj','down_proj','up_proj','lm_head']
#or
# target_modules = ['q_proj','v_proj']

lora_config = LoraConfig(
    r=8,#or r=16
    lora_alpha=8,
    lora_dropout=0.05,
    bias="none",
    target_modules = target_modules,
    task_type="CAUSAL_LM",
)

base_dir = "<base_dir_location>"

per_device_train_batch_size = 4
gradient_accumulation_steps = 4
optim = 'adamw_hf'
learning_rate = 1e-5
max_grad_norm = 0.3
warmup_ratio = 0.03
lr_scheduler_type = "linear"

# COMMAND ----------

from transformers import TrainingArguments
training_args = TrainingArguments(
    output_dir=base_dir,
    save_strategy="epoch",
    evaluation_strategy="epoch",
    num_train_epochs = 3.0,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)


# COMMAND ----------

model_path = 'openlm-research/open_llama_3b_v2'

# COMMAND ----------

tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# COMMAND ----------

model = LlamaForCausalLM.from_pretrained(
    model_path, device_map='auto', load_in_8bit=True,
)

# COMMAND ----------

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# COMMAND ----------

trainer = SFTTrainer(
    model,
    train_dataset=dataset['train'],
    eval_dataset = dataset['test'],
    dataset_text_field="text",
    max_seq_length=256,
    args=training_args,
)
#Upcast layer norms to float 32 for stability
for name, module in trainer.model.named_modules():
  if "norm" in name:
    module = module.to(torch.float32)

# COMMAND ----------

# Initiate the training process
with mlflow.start_run(run_name='run_name_of_choice'):
  trainer.train()

# COMMAND ----------

# #https://github.com/NVIDIA/apex/issues/965
# for param in model.parameters():
#     # Check if parameter dtype is  Half (float16)
#     if param.dtype == torch.float16:
#         param.data = param.data.to(torch.float32)

# COMMAND ----------

# MAGIC %md
# MAGIC ### If loading from saved adapter

# COMMAND ----------

dbutils.fs.ls('<base_dir_location>')

# COMMAND ----------

model_path = 'openlm-research/open_llama_3b_v2'

# COMMAND ----------

tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# COMMAND ----------

model = LlamaForCausalLM.from_pretrained(
    model_path, load_in_8bit=True, device_map='auto',
)

# COMMAND ----------

peft_model_id = '<adapter_final_checkpoint_location>'

# COMMAND ----------

peft_model = PeftModel.from_pretrained(model, peft_model_id)

# COMMAND ----------

test_strings = ["Create a detailed description for the following product: Corelogic Smooth Mouse, belonging to category: Optical Mouse",
"Create a detailed description for the following product: Hoover Lightspeed, belonging to category: Cordless Vacuum Cleaner",
"Create a detailed description for the following product: Flattronic Cinematron, belonging to category: High Definition Flatscreen TV"]

# COMMAND ----------

predictions = []
for test in test_strings:
  # Keep the prompt lines flush left so they match the training format
  prompt = ("Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
            "### Instruction:\n{}\n\n"
            "### Response:").format(test)
  input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to('cuda')

  # Generate with the adapter-wrapped model
  generation_output = peft_model.generate(
      input_ids=input_ids, max_new_tokens=156
  )
  predictions.append(tokenizer.decode(generation_output[0]))

# COMMAND ----------

def extract_response_text(input_string):
    start_marker = '### Response:'
    end_marker = '###'

    start_index = input_string.find(start_marker)
    if start_index == -1:
        return None

    start_index += len(start_marker)

    end_index = input_string.find(end_marker, start_index)
    if end_index == -1:
        return input_string[start_index:].strip()

    return input_string[start_index:end_index].strip()

# COMMAND ----------

# predictions[2]

# COMMAND ----------

for i in range(3):
  pred = predictions[i]
  text = test_strings[i]
  print(text+'\n')
  print(extract_response_text(pred))
  print('--------')

# COMMAND ----------

In [1]:
!wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

'wget' is not recognized as an internal or external command,
operable program or batch file.


In [2]:
! pip install wget



In [4]:
! wget https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin

'wget' is not recognized as an internal or external command,
operable program or batch file.
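Both `! wget` attempts fail with the Windows command prompt's "not recognized" error because no `wget` program is available on that machine. `pip install wget` installs a Python module, not a shell command, so the second attempt fails the same way. A minimal cross-platform sketch using only the standard library (`urllib.request`) is shown below; the `download` helper is hypothetical, and it saves the file under the last segment of the URL unless a destination is given.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest=None):
    """Stream `url` to `dest` (default: last URL path segment) without needing a wget binary."""
    dest = Path(dest or url.rstrip("/").split("/")[-1])
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of loading the whole file
    return dest


# Usage (large download, so left commented out):
# download("https://gpt4all.io/models/ggml-gpt4all-j-v1.3-groovy.bin")
```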
