# QLoRA Tuning on Llama2

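We will use QLoRA to fine-tune Llama 2 (7B) for a stylometry (authorship verification) task. The frozen base model is loaded in 4-bit precision and only small LoRA adapters are trained, which keeps VRAM usage low.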

Techniques applied:
- Quantization: Hugging Face Transformers integrates bitsandbytes, which can load a model's weights in 8-bit or 4-bit precision with little loss in quality. This notebook configures it through `BitsAndBytesConfig`. (GPTQ, which supports 8, 4, 3 or 2 bits, is a separate method available through the Optimum integration and `GPTQConfig`. It is not used here.)
- LoRA: Low-Rank Adaptation. Instead of updating all of a model's weights, LoRA freezes them and trains small low-rank matrices added to selected layers. It is a widely used and effective way to adapt LLMs to custom tasks.
- QLoRA: combining the two, the base model stays frozen in 4-bit precision while LoRA adapters are trained on top of it, which substantially reduces the memory needed for fine-tuning. Read the QLoRA paper [here](https://arxiv.org/abs/2305.14314).
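The LoRA update described above can be sketched with NumPy. This is an illustrative toy, not PEFT's implementation: the sizes are arbitrary, and in QLoRA the frozen `W` would be stored in 4-bit rather than float64. It shows the properties the technique relies on: only the small matrices `A` and `B` are trainable, `B` starts at zero so the adapted layer initially matches the base layer, and the scaled product `B @ A` can be merged back into `W` after training.

```python
import numpy as np

# Toy LoRA layer (illustrative sizes, not Llama 2's real dimensions).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 4

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable, random init
B = np.zeros((d_out, r))                   # trainable, zero init
scaling = alpha / r                        # LoRA scales the update by alpha / r


def lora_forward(x: np.ndarray) -> np.ndarray:
    """h = W x + (alpha / r) * B A x; only A and B would receive gradients."""
    return W @ x + scaling * (B @ (A @ x))


x = rng.standard_normal(d_in)

# With B = 0 the adapted layer starts out identical to the frozen layer.
assert np.allclose(lora_forward(x), W @ x)

# Pretend training changed B: the low-rank update can be merged into one matrix.
B = rng.standard_normal((d_out, r))
W_merged = W + scaling * (B @ A)
assert np.allclose(lora_forward(x), W_merged @ x)

print("trainable:", A.size + B.size, "frozen:", W.size)
```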

In [1]:
import os
import pandas as pd
import creds  # local module that stores HUGGINGFACE_TOKEN

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline, logging
from peft import LoraConfig
from trl import SFTTrainer




In [18]:
## Prepare some variables
## model from HF hub
base_model_name = 'meta-llama/Llama-2-7b-hf'

## New instructional dataset
instructional_dataset = 'datasets/stylometry/zhongxing0129-authorlist_train-v1.csv'

## Folder name to store the fine-tuned model
folder_name = 'meta-llama2-7b-stylometry'

### 1. Load the Dataset

In [7]:
from datasets import Dataset

df = pd.read_csv(instructional_dataset)
dataset = Dataset.from_pandas(df)

In [8]:
dataset

Dataset({
    features: ['text_a', 'label_a', 'text_b', 'label_b', 'model_response', 'text'],
    num_rows: 2800
})

In [9]:
## view the first training example
print(dataset[0]['text'])

<s>[INST] Author 0 wrote this: 'The observations of her uncle and aunt now began; and each of them
pronounced him to be infinitely superior to any thing they had expected.
"He is perfectly well behaved, polite, and unassuming," said her uncle.'. Did Author 0 also write this: 'The observations of her uncle and aunt now began; and each of them
pronounced him to be infinitely superior to any thing they had expected.
"He is perfectly well behaved, polite, and unassuming," said her uncle.'? [/INST] Yes </s>


In [10]:
print(len(dataset))

## This is enough: we were aiming for roughly 1,000 or more instruction prompts.

2800
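Each row's `text` field follows the Llama 2 `[INST] ... [/INST]` layout printed above, with the answer after the closing tag. A hypothetical helper that reproduces this layout from the other columns is sketched below. It assumes the answer is "Yes" exactly when `label_a == label_b`; the printed row (identical passages, answer "Yes") is consistent with that, but the notebook does not confirm how the column was built.

```python
# Hypothetical helper: rebuilds one training example in the layout shown above.
# Assumption: the answer is "Yes" exactly when label_a == label_b.


def build_training_text(text_a: str, label_a: int, text_b: str, label_b: int) -> str:
    """Return one Llama 2 [INST] example for authorship verification."""
    answer = "Yes" if label_a == label_b else "No"
    question = (
        f"Author {label_a} wrote this: '{text_a}'. "
        f"Did Author {label_a} also write this: '{text_b}'?"
    )
    return f"<s>[INST] {question} [/INST] {answer} </s>"


example = build_training_text("It was a dark night.", 0, "The sun rose.", 3)
print(example)
```

Such a helper could be used to check or regenerate the `text` column from `text_a`, `label_a`, `text_b` and `label_b`.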


### 2. Prepare 4-bit quantization configuration

In [11]:
compute_dtype = torch.float16

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # load the weights in 4-bit precision
    bnb_4bit_quant_type='nf4',             # 4-bit NormalFloat (NF4) data type introduced by QLoRA
    bnb_4bit_compute_dtype=compute_dtype,  # de-quantize to float16 for the matrix multiplications
    bnb_4bit_use_double_quant=False        # do not also quantize the quantization constants
)
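To put this configuration in numbers, here is a back-of-the-envelope estimate of weight memory for Llama-2-7B (about 6.74 billion parameters). It assumes the block-wise scheme described in the QLoRA paper: one fp32 scaling constant per 64 weights, and, with double quantization, those constants stored in 8 bits plus one fp32 constant per 256 blocks. Real usage is somewhat higher, since bitsandbytes leaves some layers (such as the embeddings and `lm_head`) in 16-bit, and activations, LoRA weights and optimizer state come on top.

```python
# Rough weight-memory estimate for Llama-2-7B under different storage formats.
# Assumptions: 6.74e9 parameters; NF4 block size 64 with fp32 absmax constants;
# double quantization stores constants in 8 bits with one fp32 constant per 256 blocks.
N_PARAMS = 6.74e9


def weight_gb(bits_per_param: float, n_params: float = N_PARAMS) -> float:
    """Convert bits per parameter into gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


fp16_bits = 16
nf4_bits = 4 + 32 / 64                       # 4.5 bits: use_double_quant=False (this notebook)
nf4_dq_bits = 4 + 8 / 64 + 32 / (64 * 256)   # about 4.127 bits: use_double_quant=True

for name, bits in [("fp16", fp16_bits), ("nf4", nf4_bits), ("nf4 + double quant", nf4_dq_bits)]:
    print(f"{name:>20}: {bits:.3f} bits/param -> {weight_gb(bits):.2f} GB")
```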

### 3. Load Llama2 Model

In [12]:
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map='auto',  # let Accelerate place the layers on the available GPU(s)/CPU
    token=creds.HUGGINGFACE_TOKEN  # Llama 2 is a gated model on the Hub
)

model.config.use_cache = False  # disable the KV cache: it only speeds up generation and is not needed for training
model.config.pretraining_tp = 1  # tensor-parallel slicing used in pretraining; 1 selects the standard linear layers

Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.23s/it]


### 4. Load Llama2 Tokenizer

In [13]:
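tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True, token=creds.HUGGINGFACE_TOKEN)
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token, so reuse </s>
tokenizer.padding_side = "right"           # pad at the end of each sequence for training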
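Setting `pad_token = eos_token` with right padding is common for Llama 2, but it interacts with how training labels are built. If the collator derives labels by masking every position whose id equals the pad id (which is what Transformers' `DataCollatorForLanguageModeling` does with `mlm=False`), the real `</s>` at the end of each example is masked too, so the model gets no training signal to stop. The sketch reproduces that logic with made-up token ids (only Llama 2's BOS = 1 and EOS = 2 are real); `right_pad` and `lm_labels` are hypothetical helpers, not library functions. This may explain why the fine-tuned model keeps generating after its Yes/No answer in section 8, though that is not verified here.

```python
# Illustrative only: token ids are made up except Llama 2's BOS (1) and EOS (2).
EOS_ID = 2
PAD_ID = EOS_ID  # tokenizer.pad_token = tokenizer.eos_token


def right_pad(batch: list[list[int]], pad_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad token id lists to equal length; return (input_ids, attention_mask)."""
    width = max(len(seq) for seq in batch)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in batch]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in batch]
    return input_ids, attention_mask


def lm_labels(input_ids: list[list[int]], pad_id: int) -> list[list[int]]:
    """Copy inputs to labels, ignoring (-100) every position whose id equals pad_id."""
    return [[-100 if tok == pad_id else tok for tok in seq] for seq in input_ids]


batch = [[1, 345, 678, EOS_ID], [1, 910, EOS_ID]]  # each example ends with a real </s>
ids, mask = right_pad(batch, PAD_ID)
labels = lm_labels(ids, PAD_ID)
print(ids)     # [[1, 345, 678, 2], [1, 910, 2, 2]]
print(mask)    # [[1, 1, 1, 1], [1, 1, 1, 0]]
print(labels)  # [[1, 345, 678, -100], [1, 910, -100, -100]]  <- real </s> is masked too
```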

### 5. Preparing the PEFT Parameters

In [44]:
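peft_params = LoraConfig(
    lora_alpha=16,         # the LoRA update is scaled by lora_alpha / r
    lora_dropout=0.1,      # dropout applied to the LoRA branch input during training
    r=64,                  # rank of the low-rank update matrices
    bias='none',           # keep all bias terms frozen
    task_type='CAUSAL_LM'  # causal LM trained with SFTTrainer ('SEQ_CLS' is for sequence-classification heads)
)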
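A rough count of what this configuration trains, assuming Llama-2-7B's dimensions (32 decoder layers, hidden size 4096) and PEFT's default LoRA targets for Llama models (`q_proj` and `v_proj`, each 4096 × 4096), since `target_modules` is not set. Each adapted matrix gets an `r × d_in` matrix A and a `d_out × r` matrix B. On the real model, `trainer.model.print_trainable_parameters()` reports the actual figure.

```python
# Trainable-parameter estimate for LoraConfig(r=64, lora_alpha=16).
# Assumptions: 32 layers, hidden size 4096, LoRA on q_proj and v_proj only.
HIDDEN, LAYERS = 4096, 32
R, ALPHA = 64, 16
TARGETS_PER_LAYER = 2  # q_proj, v_proj

per_module = R * (HIDDEN + HIDDEN)  # A: r x d_in, B: d_out x r
trainable = per_module * TARGETS_PER_LAYER * LAYERS
scaling = ALPHA / R

print(f"per module: {per_module:,}")            # per module: 524,288
print(f"trainable:  {trainable:,}")             # trainable:  33,554,432
print(f"share of 6.74B: {trainable / 6.74e9:.2%}")
print(f"update scaling alpha/r = {scaling}")    # update scaling alpha/r = 0.25
```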

### 6. Training Parameters

In [45]:
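training_params = TrainingArguments(
    output_dir="./results/meta-llama2-7b-stylometry",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",     # bitsandbytes paged AdamW: optimizer state can spill to CPU RAM on memory spikes
    save_steps=500,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,             # gradient clipping threshold
    max_steps=-1,                  # -1: train for num_train_epochs instead of a fixed step count
    warmup_ratio=0.03,             # note: ignored by the "constant" scheduler below
    group_by_length=True,          # batch sequences of similar length to reduce padding
    lr_scheduler_type="constant",  # use "constant_with_warmup" if warmup is intended
    report_to="tensorboard"
)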
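With `lr_scheduler_type="constant"`, Transformers keeps the learning rate fixed and does not apply warmup, so `warmup_ratio=0.03` has no effect here; `"constant_with_warmup"` is the scheduler that ramps up first. The two behaviours, rewritten as plain multiplier functions for illustration (not the library code itself), look like this. The logged learning rate (0.0002 from step 25 onward) cannot distinguish them, because a 21-step warmup would already be finished by then.

```python
# Learning-rate multipliers for two Transformers scheduler types, as plain functions.
LR = 2e-4
WARMUP_STEPS = 21  # assumption: 3% of the 700 training steps, as warmup_ratio=0.03 intends


def constant(step: int) -> float:
    """'constant': the same learning rate at every step; warmup is not applied."""
    return 1.0


def constant_with_warmup(step: int, warmup_steps: int) -> float:
    """'constant_with_warmup': linear ramp from 0 to 1, then flat."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return 1.0


for step in (0, 10, 21, 25):
    print(step, LR * constant(step), LR * constant_with_warmup(step, WARMUP_STEPS))
```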

### 7. Model Fine-tuning

In [46]:
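# keyword arguments as accepted by older trl releases (newer ones expect most of them in an SFTConfig)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_params,    # SFTTrainer wraps the model with the LoRA adapters
    dataset_text_field="text",  # the dataset column that holds the full prompt and answer
    max_seq_length=None,        # None: let SFTTrainer choose its default maximum length
    tokenizer=tokenizer,
    args=training_params,
    packing=False,              # one example per sequence; examples are not concatenated
)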

Map: 100%|██████████| 2800/2800 [00:00<00:00, 14086.29 examples/s]


In [47]:
## train the model
trainer.train()

{'loss': 2.1403, 'grad_norm': 0.5620078444480896, 'learning_rate': 0.0002, 'epoch': 0.04}
{'loss': 1.8612, 'grad_norm': 0.8981637358665466, 'learning_rate': 0.0002, 'epoch': 0.07}
{'loss': 1.5978, 'grad_norm': 0.6241077184677124, 'learning_rate': 0.0002, 'epoch': 0.11}
{'loss': 1.0918, 'grad_norm': 0.9207419157028198, 'learning_rate': 0.0002, 'epoch': 0.14}
{'loss': 1.2188, 'grad_norm': 0.6704062223434448, 'learning_rate': 0.0002, 'epoch': 0.18}
{'loss': 0.5832, 'grad_norm': 1.207627534866333, 'learning_rate': 0.0002, 'epoch': 0.21}
{'loss': 0.7325, 'grad_norm': 0.664181649684906, 'learning_rate': 0.0002, 'epoch': 0.25}
{'loss': 0.2873, 'grad_norm': 2.9383227825164795, 'learning_rate': 0.0002, 'epoch': 0.29}
{'loss': 0.405, 'grad_norm': 0.8287302851676941, 'learning_rate': 0.0002, 'epoch': 0.32}
{'loss': 0.1596, 'grad_norm': 0.7628010511398315, 'learning_rate': 0.0002, 'epoch': 0.36}
{'loss': 0.3239, 'grad_norm': 0.8792100548744202, 'learning_rate': 0.0002, 'epoch': 0.39}
{'loss': 0.13

TrainOutput(global_step=700, training_loss=0.43931936502456664, metrics={'train_runtime': 2144.5585, 'train_samples_per_second': 1.306, 'train_steps_per_second': 0.326, 'train_loss': 0.43931936502456664, 'epoch': 1.0})
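The numbers in `TrainOutput` can be checked by hand. Assuming a single training process (no data parallelism), one epoch over 2,800 rows at batch size 4 with no gradient accumulation gives 700 optimizer steps, so `logging_steps=25` yields 28 loss entries and the `save_steps=500` checkpoint is written once.

```python
import math

# Reproduce the step count and throughput reported in TrainOutput.
# Assumption: one data-parallel process.
num_rows = 2800
per_device_batch = 4
grad_accum = 1
num_processes = 1
epochs = 1

effective_batch = per_device_batch * grad_accum * num_processes
steps = math.ceil(num_rows / effective_batch) * epochs
runtime_s = 2144.5585  # train_runtime from TrainOutput

print(steps)                                     # 700
print(steps // 25)                               # 28
print(round(num_rows * epochs / runtime_s, 3))   # 1.306 (train_samples_per_second)
print(round(steps / runtime_s, 3))               # 0.326 (train_steps_per_second)
```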

In [48]:
## save the LoRA adapter and tokenizer
## trainer.model is a PEFT model, so save_pretrained writes only the adapter weights and config, not the base model
trainer.model.save_pretrained(folder_name)
trainer.tokenizer.save_pretrained(folder_name)

## Saving to a local folder needs no login. A token (token=creds.HUGGINGFACE_TOKEN) or `huggingface-cli login` is only needed to upload to the Hub with push_to_hub.

('meta-llama2-7b-stylometry/tokenizer_config.json',
 'meta-llama2-7b-stylometry/special_tokens_map.json',
 'meta-llama2-7b-stylometry/tokenizer.json')

In [49]:
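## load the model you had saved
## the folder holds only the LoRA adapter: with peft installed, from_pretrained reads its adapter_config.json,
## loads the base model it names (quantized to 4-bit again here) and attaches the adapter
loaded_model = AutoModelForCausalLM.from_pretrained(f"./{folder_name}",
                                                    quantization_config=quant_config,
                                                    device_map='auto')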
loaded_model.config.use_cache = False
loaded_model.config.pretraining_tp = 1

loaded_tokenizer = AutoTokenizer.from_pretrained(f"./{folder_name}", trust_remote_code=True)
loaded_tokenizer.pad_token = loaded_tokenizer.eos_token
loaded_tokenizer.padding_side = "right"

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.53s/it]


### 8. Inference

In [50]:
inf_dataset = load_dataset('Zhongxing0129/authorlist_test', trust_remote_code=True, split = 'train')
inf_dataset

Dataset({
    features: ['text', 'label'],
    num_rows: 646
})

In [51]:
pd.DataFrame(inf_dataset).iloc[20]

text     The Cossacks sold the horse for two gold piece...
label                                                    2
Name: 20, dtype: object

In [52]:
index = 20

## reference passage and its author from the training set; candidate passage from the test set
author_a = dataset[index]['label_a']
text_a = dataset[index]['text_a']
text_b = inf_dataset[index]['text']

prompt = f"Author {author_a} wrote this: '{text_a}'. Did Author {author_a} also write this: '{text_b}'?"
prompt

"Author 0 wrote this: 'When all of the house that was open to general inspection had been seen,\nthey returned down stairs, and taking leave of the housekeeper, were\nconsigned over to the gardener, who met them at the hall door.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the\nrichest of the officers now that he had received his money, bought it.'?"

In [59]:
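## Inference with SAVED MODEL and TOKENIZER
logging.set_verbosity(logging.CRITICAL)  # silence Transformers warnings during generation

## max_length counts the prompt tokens as well; max_new_tokens would cap only the generated reply
pipe_loaded = pipeline(task="text-generation", model=loaded_model, tokenizer=loaded_tokenizer, max_length=300)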
result_loaded = pipe_loaded(f"<s>[INST] {prompt} [/INST]")
print(result_loaded[0]['generated_text'])

<s>[INST] Author 0 wrote this: 'When all of the house that was open to general inspection had been seen,
they returned down stairs, and taking leave of the housekeeper, were
consigned over to the gardener, who met them at the hall door.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of the officers now that he had received his money, bought it.'? [/INST] Yes 0 also wrote this: '“How fond you are of saying dangerous things, Harry! In the present
instance, you are quite astray. I like the duchess very much, but I
don’t love her.”' Yes 0 also wrote this:'                                                 THE DEANERY, CHICHESTER,
                                                             27_th_ _May_.' No 0 also wrote this: '“And, if your excellency will allow me to express my opinion,” he
continued, “we owe today’s success chiefly to the action of that
battery and the heroic endurance of Captain Túshin and his company,”
and 
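Because the fine-tuned model keeps generating after its verdict, scoring its predictions needs a small post-processing step. A minimal sketch, assuming the verdict is the first word after the prompt's `[/INST]` tag; `extract_answer` is a hypothetical helper, not part of the pipeline API.

```python
import re
from typing import Optional


def extract_answer(generated_text: str) -> Optional[str]:
    """Return the first "Yes"/"No" after the first [/INST] tag, or None.

    The pipeline output echoes the prompt, so the reply starts after the
    prompt's [/INST]. Anything generated after the first word is ignored.
    """
    parts = generated_text.split("[/INST]", 1)
    if len(parts) < 2:
        return None  # no instruction tag: not in the expected prompt format
    match = re.match(r"\s*(Yes|No)\b", parts[1])
    return match.group(1) if match else None


sample = "<s>[INST] Did Author 0 also write this: 'x'? [/INST] Yes 0 also wrote this: 'y' [/INST] No"
print(extract_answer(sample))  # Yes
```

Applied to `result_loaded[0]['generated_text']` it would return "Yes". The test row used here has label 2 while the reference passage is by Author 0, so, assuming the training and test sets share the same author ids, the expected answer is "No".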

In [58]:
## Inference with IN-NOTEBOOK MODEL and TOKENIZER
logging.set_verbosity(logging.CRITICAL)

pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)
result = pipe(f"<s>[INST] {prompt} [/INST]")
print(result[0]['generated_text'])

<s>[INST] Author 0 wrote this: 'When all of the house that was open to general inspection had been seen,
they returned down stairs, and taking leave of the housekeeper, were
consigned over to the gardener, who met them at the hall door.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of the officers now that he had received his money, bought it.'? [/INST] Yes 0 also wrote this: '“Mercy! Your new white frock! Tanya! Grisha!” said their mother, trying
to save the frock, but with tears in her eyes, smiling a blissful,
rapturous smile.' [/INST] No 0 also wrote this: '“And, if your excellency will allow me to express my opinion,” he
contin


In [60]:
## load foundational model and tokenizer
foundational_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map='auto',
    # token = creds.HUGGINGFACE_TOKEN
)
foundational_model.config.use_cache = False
foundational_model.config.pretraining_tp = 1

foundational_tokenizer = AutoTokenizer.from_pretrained(
    base_model_name,
    trust_remote_code=True,
    # token = creds.HUGGINGFACE_TOKEN
)
foundational_tokenizer.pad_token = foundational_tokenizer.eos_token
foundational_tokenizer.padding_side = "right"

Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.25s/it]


In [61]:
## Inference with FOUNDATIONAL MODEL and TOKENIZER
pipe_og = pipeline(task="text-generation", model=foundational_model, tokenizer=foundational_tokenizer, max_length=200)
result_og = pipe_og(f"<s>[INST] {prompt} [/INST]")
print(result_og[0]['generated_text'])

<s>[INST] Author 0 wrote this: 'When all of the house that was open to general inspection had been seen,
they returned down stairs, and taking leave of the housekeeper, were
consigned over to the gardener, who met them at the hall door.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of the officers now that he had received his money, bought it.'? [/INST]

### EXAMPLE 3
[INST] Author 0 wrote this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of the officers now that he had received his money, bought it.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of


In [62]:
## Inference with FOUNDATIONAL MODEL and TOKENIZER (same cell re-run; the continuation differs from the previous run)
pipe_og = pipeline(task="text-generation", model=foundational_model, tokenizer=foundational_tokenizer, max_length=200)
result_og = pipe_og(f"<s>[INST] {prompt} [/INST]")
print(result_og[0]['generated_text'])

<s>[INST] Author 0 wrote this: 'When all of the house that was open to general inspection had been seen,
they returned down stairs, and taking leave of the housekeeper, were
consigned over to the gardener, who met them at the hall door.'. Did Author 0 also write this: 'The Cossacks sold the horse for two gold pieces, and Rostóv, being the
richest of the officers now that he had received his money, bought it.'? [/INST]

[HIDE]

### Context

_1812. The year of the French invasion._

_Near Moscow._

_Near the village of Schebokín, where the Russian army is encamped._

_The evening of the 24th of August._

_The first part of the first chapter._

[HIDE]

##
