# Experiment 2: PEFT LoRA with Full Training Dataset

## Objective
This experiment assesses Meta's Llama 3.1 8B after Parameter-Efficient Fine-Tuning (PEFT) with LoRA (Low-Rank Adaptation). The aim is to measure how well the fine-tuned model rewrites disfluent questions as fluent ones when it is trained on the full Disfl-QA training set.

## Experimental Setup

### Model Specifications
- **Architecture:** Meta's Llama 3.1 (8B)
- **Source:** Unsloth Hugging Face model repository (non-gated)

### Computational Environment
- **Platform:** Google Colab Notebook
- **Infrastructure Tier:** Premium
- **GPU Specification:** NVIDIA A100 (SXM4, 40 GB)

### Dataset
- **Corpus:** google-research-datasets/Disfl-QA
- **Training Data Subset:** Full Training Set (7182 rows)

## Methodology
In this experiment, we implement LoRA-based Parameter-Efficient Fine-Tuning (PEFT) by training rank-16 LoRA adapters on the attention and MLP projection layers of the 4-bit quantized base model. The model is trained on the complete training dataset for one full epoch. The training loss is logged at every step and gives insight into the model's convergence and adaptation during fine-tuning.

## Evaluation Metrics
To quantify and benchmark the model's performance, the following established natural language processing metrics are employed:

1. **BLEU Score (Bilingual Evaluation Understudy)**
2. **ROUGE Score (Recall-Oriented Understudy for Gisting Evaluation)**
3. **Training Loss**

These metrics collectively offer a detailed assessment of the model's linguistic precision and relevance within the Disfl-QA dataset context. The insights garnered from this experiment are used to inform the subsequent analysis in Experiment 3, which involves fine-tuning the Llama 3.1 model on a smaller subset of data (100 rows) to compare performance across Experiments 1, 2, and 3.
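To make the overlap metrics concrete, the sketch below computes clipped unigram precision, recall, and F1 between a prediction and a reference, which is the idea behind ROUGE-1 (BLEU additionally combines 1- to 4-gram precisions with a brevity penalty). It is a simplified illustration with whitespace tokenization, not the `evaluate` implementation used later in this notebook; the function name `unigram_overlap` and the example strings are hypothetical.

```python
from collections import Counter


def unigram_overlap(prediction: str, reference: str) -> dict:
    """Clipped unigram precision/recall/F1 with naive whitespace tokenization."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Counter intersection clips each token's count to the smaller of the two sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative strings: a clean rewrite and one that keeps a disfluent token.
reference = "What British General negotiated at Montreal?"
leftover = "What French British General negotiated at Montreal?"

perfect_scores = unigram_overlap(reference, reference)
leftover_scores = unigram_overlap(leftover, reference)
print(perfect_scores)
print(leftover_scores)
```

A prediction that keeps an extra disfluent token loses precision but not recall, which is why both directions of overlap are worth reporting.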


## Step 1: Install Required Dependencies

In [1]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install evaluate rouge_score

from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton

## Step 2: Load Data

In [2]:
import requests
import pandas as pd
import json

def process_github_json_files(base_url, file_names):
    """Download each JSON split, save it locally, and return (train, test, dev) DataFrames."""
    dataframes = {}

    for file_name in file_names:
        url = f"{base_url}/{file_name}"
        try:
            # A timeout keeps the cell from hanging on a stalled connection
            response = requests.get(url, timeout=30)
            if response.status_code != 200:
                raise Exception(f"Failed to download {file_name}. Status code: {response.status_code}")

            # Each file maps an example id to its fields, so ids become the index
            data = json.loads(response.text)
            df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'id'})

            # Keep a local copy in records format
            df.to_json(file_name, orient='records')

            # "train.json" -> "train"
            key = file_name.split('.')[0]
            dataframes[key] = df

        except Exception as e:
            print(f"An error occurred while processing {file_name}: {str(e)}")

    # A split that failed to download is returned as None
    return dataframes.get('train'), dataframes.get('test'), dataframes.get('dev')

base_url = "https://raw.githubusercontent.com/google-research-datasets/Disfl-QA/master"
file_names = ["train.json", "test.json", "dev.json"]

df_train, df_test, df_dev = process_github_json_files(base_url, file_names)

In [3]:
length_df_train = len(df_train)

In [4]:
print("Shape of train DataFrame:", df_train.shape if df_train is not None else "Not available")
print("Shape of test DataFrame:", df_test.shape if df_test is not None else "Not available")
print("Shape of dev DataFrame:", df_dev.shape if df_dev is not None else "Not available")

Shape of train DataFrame: (7182, 3)
Shape of test DataFrame: (3643, 3)
Shape of dev DataFrame: (1000, 3)


In [5]:
df_train.head(5)

| | id | original | disfluent |
|---|---|---|---|
| 0 | 5a5918ff3e1742001a15cf7e | What do unstable isotope studies indicate? | What do petrologists no what do unstable isoto... |
| 1 | 5ad4f40c5b96ef001a10a774 | What is the basic unit of territorial division... | What is the second level of territorial divisi... |
| 2 | 572684365951b619008f7543 | Which genus lack tentacles and sheaths? | Juvenile platyctenids no wow Which genus lack ... |
| 3 | 5729f799af94a219006aa70a | Long-lived memory cells can remember previous ... | When a pathogen is met again scratch that I me... |
| 4 | 5ad3b9cd604f3c001a3fee87 | What led to Newcastle's rise to power as milit... | What led to the Duke of Cumberland's rise to p... |
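The loader in this step relies on each split being a JSON object keyed by example id, which `DataFrame.from_dict(..., orient='index')` turns into one row per id. The sketch below reproduces that reshaping with the standard library only, assuming the layout implied by the `id`, `original`, and `disfluent` columns; the id `example-0001` and the record itself are made-up placeholders.

```python
import json

# Hypothetical miniature of one split: {id: {"original": ..., "disfluent": ...}}
raw = json.dumps({
    "example-0001": {
        "original": "What British General negotiated at Montreal?",
        "disfluent": "What French no, British General negotiated at Montreal?",
    }
})


def to_records(json_text: str) -> list:
    """Flatten {id: fields} into a list of {'id': id, **fields} rows."""
    data = json.loads(json_text)
    return [{"id": example_id, **fields} for example_id, fields in data.items()]


records = to_records(raw)
print(records[0]["id"], "->", records[0]["original"])
```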


## Step 3: Load Llama 3.1 Model

In [6]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## Step 4: Build a Training Dataset

In [7]:
instruction_template = """
You are an AI assistant that corrects disfluent questions.
Remove all disfluencies (filler words, false starts, hesitations, repetitions) and output a single, fluent, clear, and concise version of the input question.
Maintain the original meaning and intent. Use natural, formal English.
Do not change the subject, alter the question's meaning, or add any new information.
Provide only the corrected question as a single line, without explanations, examples, or additional formatting.
"""

prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}

### Input:
{}

### Response:
{}"""
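The same template serves both training and inference: for training, the response slot holds the fluent question followed by the EOS token, while for inference it is left empty so the model generates the response. A minimal sketch, using a shortened stand-in template and a placeholder EOS string (the notebook reads the real value from `tokenizer.eos_token`):

```python
# Shortened stand-in for prompt_template; the slots are instruction, input, response.
template = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
EOS = "<eos>"  # placeholder, not the model's actual EOS token

instruction = "Rewrite the disfluent question as a fluent one."
disfluent = "What French no, British General negotiated at Montreal?"
fluent = "What British General negotiated at Montreal?"

# Training text: response filled in and terminated, so the model learns where to stop.
train_text = template.format(instruction, disfluent, fluent) + EOS
# Inference prompt: response left empty, so generation continues after "### Response:".
infer_prompt = template.format(instruction, disfluent, "")

print(train_text)
print(infer_prompt)
```

Because the inference prompt is a prefix of the training text, the model sees at inference time exactly the context it was trained to complete.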


In [9]:
TRAINING_ROWS = length_df_train
EOS_TOKEN = tokenizer.eos_token

In [10]:
from datasets import Dataset
import pandas as pd


def formatting_prompts_func(train_df, num_rows):
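    df_subset = train_df.head(num_rows)

    # Size the instruction list from the subset so the columns stay aligned
    # even if num_rows exceeds the number of available rows
    instructions = [instruction_template] * len(df_subset)
    inputs = df_subset['disfluent'].tolist()
    outputs = df_subset['original'].tolist()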

    texts = []

    for instruction, input_text, output_text in zip(instructions, inputs, outputs):
        text = prompt_template.format(instruction, input_text, output_text) + EOS_TOKEN
        texts.append(text)

    return {
        "instruction": instructions,
        "input": inputs,
        "output": outputs,
        "text": texts
    }

formatted_data = formatting_prompts_func(df_train, TRAINING_ROWS)
dataset = Dataset.from_dict(formatted_data)


print(dataset)

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 7182
})


## Step 5: Configuring PEFT Model with LoRA

In [11]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
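The rank and target modules in this configuration determine how many weights are trained. For a linear layer with `d_in` inputs and `d_out` outputs, a rank-`r` adapter adds `r * (d_in + d_out)` parameters. The sketch below applies this to the seven target modules, assuming the published Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these dimensions are not stated in this notebook. The total matches the trainable-parameter count that the Unsloth training banner reports.

```python
R = 16            # LoRA rank, as configured in get_peft_model
HIDDEN = 4096     # assumed hidden size of Llama 3.1 8B
MLP = 14336       # assumed MLP intermediate size
KV = 8 * 128      # assumed key/value projection width (8 KV heads x 128 dims)
LAYERS = 32       # matches the "patched 32 layers" message

# (d_in, d_out) of each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# A rank-r adapter stores A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total_trainable = per_layer * LAYERS
print(f"{total_trainable:,}")  # 41,943,040
```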


In [12]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 32,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1,
        #max_steps = 10,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)


## Step 6: Training a PEFT Model with LoRA

In [21]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.564 GB.
12.719 GB of memory reserved.


In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 7,182 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 32 | Gradient Accumulation steps = 4
\        /    Total batch size = 128 | Total steps = 56
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|---|---|
| 1 | 2.7521 |
| 2 | 2.7519 |
| 3 | 2.7423 |
| 4 | 2.6671 |
| 5 | 2.401 |
| 6 | 1.9936 |
| 7 | 1.561 |
| 8 | 1.2604 |
| 9 | 0.9672 |
| 10 | 0.7817 |

## Step 7: Perform Inference on the dev dataset using the new fine-tuned model

#### 7.1 Inference on a single example:

In [15]:
FastLanguageModel.for_inference(model)
inputs = tokenizer(
[
    prompt_template.format(
        instruction_template,
        "What French no, British General negotiated at Montreal?",
        "",
    )
], return_tensors = "pt").to("cuda")

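output_tokens = model.generate(**inputs, max_new_tokens=64, use_cache=True, pad_token_id = tokenizer.eos_token_id)
# Drop the prompt tokens so only the newly generated response is decoded
prompt_length = inputs["input_ids"].shape[1]
tokenizer.batch_decode(output_tokens[:, prompt_length:], skip_special_tokens=True)[0]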

'What British General negotiated at Montreal?'

#### 7.2 Inference on the dev dataset (1000 rows):

In [16]:
df_dev_experiment = df_dev.copy()  # copy so the prediction column is not added to df_dev itself

In [17]:
def generate_prediction(disfluent_input):
    inputs = tokenizer(
        [
            prompt_template.format(
                instruction_template,
                disfluent_input,
                "",
            )
        ], return_tensors="pt"
    ).to("cuda")

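    output_tokens = model.generate(
        **inputs,
        max_new_tokens=64,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id  # same setting as the single-example cell
    )

    # Drop the prompt tokens so only the newly generated response is decoded
    prompt_length = inputs["input_ids"].shape[1]
    output_text = tokenizer.batch_decode(
        output_tokens[:, prompt_length:],
        skip_special_tokens=True
    )[0].replace('\n', ' ')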

    return output_text


df_dev_experiment['prediction'] = df_dev_experiment['disfluent'].apply(generate_prediction)

In [18]:
df_dev_experiment[['original', 'disfluent', 'prediction']].head(10)

| | original | disfluent | prediction |
|---|---|---|---|
| 0 | What did the government want Thoreau to do? | Who did no What did the government want Thorea... | What did the government want Thoreau to do? |
| 1 | What makes the Wells Fargo Center stand out? | What makes the Bank of America Tower or wait t... | What makes the Wells Fargo Center stand out? |
| 2 | What was the Colonia Agrippina's original name? | What was the Colonia Agrippina's original empi... | What was the Colonia Agrippina's original name? |
| 3 | Extended networking benefits helped those that... | Extended authorization limitations, no sorry n... | Extended authorization limitations helped thos... |
| 4 | Who is the emphasis on when there is a private... | What is the no make that who is the emphasis o... | Who is the emphasis on when there is a private... |
| 5 | What dynasties inspired the Chinese-like eleme... | What dynasties reflected no inspired the Chine... | What dynasties inspired the Chinese-like eleme... |
| 6 | What is the density of all primes compatible w... | What is the density of all primes compatible w... | What is the density of all primes compatible w... |
| 7 | What did European empires rely on to supply th... | When or uh what did European empires rely on t... | What did European empires rely on to supply th... |
| 8 | What did Karlen and Singer present to the US s... | What did Wahl and Ammann no no Karlen and Sing... | What did Karlen and Singer present to the US s... |
| 9 | What is the current status of the Haensch study? | What is the current status of Schuenemann's st... | What is the current status of the Haensch study? |


## Step 8: Computing BLEU and ROUGE metrics on the predictions:

In [19]:
originals_text = list(df_dev_experiment['original'])
predictions_text = list(df_dev_experiment['prediction'])

In [20]:
import evaluate
bleu = evaluate.load("bleu")

results = bleu.compute(predictions=predictions_text, references=originals_text)
print(results)

rouge = evaluate.load('rouge')
results = rouge.compute(predictions=predictions_text, references=originals_text)
print(results)


{'bleu': 0.8869485080199839, 'precisions': [0.9435812060673326, 0.9002242152466368, 0.8681343622333182, 0.8392217101894521], 'brevity_penalty': 1.0, 'length_ratio': 1.0067039106145252, 'translation_length': 10812, 'reference_length': 10740}



{'rouge1': 0.9533384148782855, 'rouge2': 0.911918115931697, 'rougeL': 0.9408844772036491, 'rougeLsum': 0.9409878868827071}
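The reported BLEU can be cross-checked from the other fields of the same result: BLEU is the brevity penalty times the geometric mean of the four n-gram precisions, and the brevity penalty is 1 whenever the predictions are at least as long as the references. A minimal sketch that recomputes it from the printed values (the formula is the standard BLEU definition; the variable names are illustrative):

```python
import math

# Values copied from the BLEU output in this step
precisions = [0.9435812060673326, 0.9002242152466368,
              0.8681343622333182, 0.8392217101894521]
translation_length = 10812
reference_length = 10740

length_ratio = translation_length / reference_length
# Penalize only when the predictions are shorter than the references.
brevity_penalty = 1.0 if length_ratio > 1.0 else math.exp(1 - 1 / length_ratio)
geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
bleu_score = brevity_penalty * geo_mean

print(round(length_ratio, 4), brevity_penalty, round(bleu_score, 4))
```

Since the predictions are slightly longer than the references here, the score is determined entirely by the n-gram precisions.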


## Step 9: Saving LoRA Adapters:

In [22]:
model.save_pretrained("lora_model_v1_adapters")
tokenizer.save_pretrained("lora_model_v1_adapters")

('lora_model_v1_adapters/tokenizer_config.json',
 'lora_model_v1_adapters/special_tokens_map.json',
 'lora_model_v1_adapters/tokenizer.json')

## Step 10: Load LoRA Adapters and perform Inference:

In [None]:
df_test[['disfluent']].head(3)

| | disfluent |
|---|---|
| 0 | In what country is Norse found no wait Normand... |
| 1 | From which countries no tell me when were the ... |
| 2 | From which Norse leader I mean countries did t... |


In [None]:
saved_model, saved_tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model_v1_adapters",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

FastLanguageModel.for_inference(saved_model)

disfluent_question = "From which countries no tell me when were the Normans in Normandy?"

inputs = saved_tokenizer(
[
    prompt_template.format(
        instruction_template,
        disfluent_question,
        "",
    )
], return_tensors = "pt").to("cuda")

output_tokens = saved_model.generate(**inputs, max_new_tokens=64, use_cache=True, pad_token_id = saved_tokenizer.eos_token_id)
# Drop the prompt tokens so only the newly generated response is decoded
prompt_length = inputs["input_ids"].shape[1]
output_text = saved_tokenizer.batch_decode(output_tokens[:, prompt_length:], skip_special_tokens=True)[0]

print(f"Disfluent Question : {disfluent_question}")
print(f"Corrected Question : {output_text}")

## Step 11: Save GGUF / llama.cpp

In [None]:
#model.save_pretrained_gguf("lora_model_v2_partial_gguf", tokenizer,)