# Experiment 3: PEFT LoRA with Partial Training Dataset (100 Rows)

## Objective
This experiment assesses Meta's Llama 3.1 8B fine-tuned with Parameter-Efficient Fine-Tuning (PEFT) using LoRA (Low-Rank Adaptation) on a reduced training set of 100 rows. The goal is to evaluate how well the fine-tuned model adapts under modified training parameters and to compare its performance with the previous experiments.

## Experimental Setup

### Model Specifications
- **Architecture:** Meta's Llama 3.1 8B (`unsloth/Meta-Llama-3.1-8B`, loaded in 4-bit)
- **Source:** Unsloth HuggingFace model repository (non-gated)

### Computational Environment
- **Platform:** Google Colab Notebook
- **Infrastructure Tier:** Colab Pro
- **GPU Specification:** NVIDIA L4 for the training run logged below; NVIDIA A100 (A100-SXM4-40GB) for the later adapter-reload cell

### Dataset
- **Corpus:** google-research-datasets/Disfl-QA
- **Training Data Subset:** 100 rows

## Methodology
In this experiment, I implement LoRA-based Parameter-Efficient Fine-Tuning (PEFT) by training rank-16 LoRA adapters (`r = 16`) on the reduced dataset (100 rows). I adjusted the fine-tuning parameters and trained the model for 5 epochs. Training loss remains the primary metric of interest because it shows how the model converges and adapts during fine-tuning.
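
To make the adapter idea concrete, here is a minimal pure-Python sketch of a LoRA-adapted linear layer. It assumes the standard LoRA formulation, in which the frozen weight `W` is augmented by a low-rank product scaled by `alpha / r`, and `B` starts at zero so the adapted layer initially matches the base layer. The toy matrices and the "trained" values of `B` are hypothetical and only illustrate the arithmetic; they are not taken from this notebook.

```python
# Illustrative LoRA forward pass: h = W x + (alpha / r) * B (A x).
# W is frozen; only the small matrices A (r x d_in) and B (d_out x r) are trained.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, r, alpha):
    """Return the base output plus the scaled low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes: d_in = 3, d_out = 2, rank r = 1 (the notebook uses r = 16, alpha = 16).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]          # r x d_in
B_zero = [[0.0], [0.0]]        # d_out x r, zero at initialization
B_trained = [[1.0], [-1.0]]    # hypothetical values after training
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B_zero, x, r=1, alpha=1))     # [7.0, 2.0], same as W x
print(lora_forward(W, A, B_trained, x, r=1, alpha=1))  # [10.0, -1.0]
```

With `r = 16` and `lora_alpha = 16`, as configured later in this notebook, the scale factor `alpha / r` is 1.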

## Evaluation Metrics
To quantify and benchmark the model's performance, I use the following established natural language processing metrics:

1. **BLEU Score (Bilingual Evaluation Understudy)**
2. **ROUGE Score (Recall-Oriented Understudy for Gisting Evaluation)**
3. **Training Loss**

BLEU and ROUGE measure n-gram overlap between each predicted question and its fluent reference in the Disfl-QA dataset, which allows performance comparisons across Experiments 1, 2, and 3.
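
As a rough illustration of what an overlap metric measures, the sketch below computes a simplified ROUGE-1 F1 score. It assumes lowercase whitespace tokenization and no stemming, so it is not the implementation used by the `evaluate` library and its numbers will not match the scores reported later exactly.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: unigram overlap between prediction and reference."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

reference = "What British General negotiated at Montreal?"
print(rouge1_f1("What British General negotiated at Montreal?", reference))  # 1.0
print(round(rouge1_f1("What French General negotiated at Montreal?", reference), 4))  # 0.8333
```

A single wrong word out of six lowers the score to 5/6, which is why near-perfect corrections still average below 1.0 over the whole dev set.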


## Step 1: Installing required dependencies.

In [1]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install evaluate rouge_score

from torch import __version__; from packaging.version import Version as V
xformers = "xformers==0.0.27" if V(__version__) < V("2.4.0") else "xformers"
!pip install --no-deps {xformers} trl peft accelerate bitsandbytes triton

## Step 2: Load Data

In [2]:
import requests
import pandas as pd
import json

def process_github_json_files(base_url, file_names):
    """Download each JSON split, save a records-oriented copy, and return (train, test, dev)."""
    dataframes = {}

    for file_name in file_names:
        url = f"{base_url}/{file_name}"
        try:
            # A timeout keeps the cell from hanging on a stalled connection.
            response = requests.get(url, timeout=30)
            if response.status_code != 200:
                raise Exception(f"Failed to download {file_name}. Status code: {response.status_code}")

            # Each file maps a question id to its fields; move the id into its own column.
            data = json.loads(response.text)
            df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'id'})

            # Keep a local copy as a list of records.
            df.to_json(file_name, orient='records')

            # "train.json" -> "train"
            key = file_name.split('.')[0]
            dataframes[key] = df

        except Exception as e:
            print(f"An error occurred while processing {file_name}: {str(e)}")

    return dataframes.get('train'), dataframes.get('test'), dataframes.get('dev')

base_url = "https://raw.githubusercontent.com/google-research-datasets/Disfl-QA/master"
file_names = ["train.json", "test.json", "dev.json"]

df_train, df_test, df_dev = process_github_json_files(base_url, file_names)

In [3]:
print("Shape of train DataFrame:", df_train.shape if df_train is not None else "Not available")
print("Shape of test DataFrame:", df_test.shape if df_test is not None else "Not available")
print("Shape of dev DataFrame:", df_dev.shape if df_dev is not None else "Not available")

Shape of train DataFrame: (7182, 3)
Shape of test DataFrame: (3643, 3)
Shape of dev DataFrame: (1000, 3)


In [4]:
df_train.head(5)

|   | id | original | disfluent |
|---|---|---|---|
| 0 | 5a5918ff3e1742001a15cf7e | What do unstable isotope studies indicate? | What do petrologists no what do unstable isoto... |
| 1 | 5ad4f40c5b96ef001a10a774 | What is the basic unit of territorial division... | What is the second level of territorial divisi... |
| 2 | 572684365951b619008f7543 | Which genus lack tentacles and sheaths? | Juvenile platyctenids no wow Which genus lack ... |
| 3 | 5729f799af94a219006aa70a | Long-lived memory cells can remember previous ... | When a pathogen is met again scratch that I me... |
| 4 | 5ad3b9cd604f3c001a3fee87 | What led to Newcastle's rise to power as milit... | What led to the Duke of Cumberland's rise to p... |
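
The loader relies on each split being a JSON object keyed by question id, which `orient='index'` turns into one row per id. A dependency-free sketch of the same reshaping is shown below. The id `"q1"` is a made-up placeholder, and the question pair is the example used later in this notebook.

```python
import json

# Hypothetical one-entry file in the same shape the loader expects:
# {question_id: {"original": ..., "disfluent": ...}}
raw = json.loads("""
{
  "q1": {
    "original": "What British General negotiated at Montreal?",
    "disfluent": "What French no, British General negotiated at Montreal?"
  }
}
""")

# Equivalent of from_dict(orient='index').reset_index().rename(index -> id):
# one record per question, with the key promoted to an 'id' field.
records = [{"id": question_id, **fields} for question_id, fields in raw.items()]

print(list(records[0].keys()))  # ['id', 'original', 'disfluent']
```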


## Step 3: Load Llama 3.1 Model

In [5]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA L4. Max memory: 22.168 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0+cu121. CUDA = 8.9. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



## Step 4: Build a Training Dataset

In [6]:
instruction_template = """
You are an AI assistant that corrects disfluent questions.
Remove all disfluencies (filler words, false starts, hesitations, repetitions) and output a single, fluent, clear, and concise version of the input question.
Maintain the original meaning and intent. Use natural, formal English.
Do not change the subject, alter the question's meaning, or add any new information.
Provide only the corrected question as a single line, without explanations, examples, or additional formatting.
"""

In [7]:
TRAINING_ROWS = 100
EOS_TOKEN = tokenizer.eos_token

In [8]:
from datasets import Dataset
import pandas as pd


prompt_template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}

### Input:
{}

### Response:
{}"""


def formatting_prompts_func(train_df, num_rows):
    # head() may return fewer rows than requested, so size every list from the subset itself.
    df_subset = train_df.head(num_rows)

    inputs = df_subset['disfluent'].tolist()
    outputs = df_subset['original'].tolist()
    instructions = [instruction_template] * len(df_subset)

    # Fill the prompt with the target question and append EOS so the model learns where to stop.
    texts = [
        prompt_template.format(instruction, input_text, output_text) + EOS_TOKEN
        for instruction, input_text, output_text in zip(instructions, inputs, outputs)
    ]

    return {
        "instruction": instructions,
        "input": inputs,
        "output": outputs,
        "text": texts
    }

formatted_data = formatting_prompts_func(df_train, TRAINING_ROWS)
dataset = Dataset.from_dict(formatted_data)
print(dataset)

Dataset({
    features: ['instruction', 'input', 'output', 'text'],
    num_rows: 100
})
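
The same template serves two purposes: during training the response slot holds the fluent question followed by the EOS token, and at inference the slot is left empty so the model completes it. The sketch below illustrates that difference with a shortened template and a placeholder EOS string; both are stand-ins, since the notebook uses the full `prompt_template` and `tokenizer.eos_token`.

```python
# Shortened stand-in for the notebook's prompt_template.
PROMPT = "### Instruction:\n{}\n\n### Input:\n{}\n\n### Response:\n{}"
EOS = "<EOS>"  # placeholder; the notebook uses tokenizer.eos_token

def build_training_text(instruction, disfluent, fluent):
    """Training example: the target fills the response slot and EOS marks the end."""
    return PROMPT.format(instruction, disfluent, fluent) + EOS

def build_inference_prompt(instruction, disfluent):
    """Inference prompt: the response slot is empty for the model to complete."""
    return PROMPT.format(instruction, disfluent, "")

instruction = "Correct the disfluent question."
disfluent = "What French no, British General negotiated at Montreal?"
fluent = "What British General negotiated at Montreal?"

train_text = build_training_text(instruction, disfluent, fluent)
infer_prompt = build_inference_prompt(instruction, disfluent)

print(train_text.endswith(fluent + EOS))        # True
print(train_text.startswith(infer_prompt))      # True
```

Because the training text begins with exactly the inference prompt, the model sees at inference time the same prefix it was trained to continue.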


## Step 5: Configure LoRA Adapters and the Trainer

In [9]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2024.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
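
The adapter size follows directly from the rank and the shapes of the targeted projections: a rank-`r` adapter on a layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The sketch below assumes the published Llama 3.1 8B dimensions (hidden size 4096, key/value projection size 1024, MLP size 14336, 32 layers), which are not stated in this notebook. Under those assumptions it reproduces the trainable-parameter count that Unsloth reports in the training banner.

```python
# Assumed Llama 3.1 8B dimensions (not taken from this notebook).
HIDDEN, KV_DIM, MLP_DIM, LAYERS = 4096, 1024, 14336, 32
R = 16  # LoRA rank, as passed to get_peft_model

# (d_in, d_out) for each module listed in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}

# Each adapter holds A (r x d_in) and B (d_out x r).
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * LAYERS

print(f"{per_layer:,} per layer, {total:,} in total")
# 1,310,720 per layer, 41,943,040 in total
```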


In [10]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 5,
        #max_steps = 10,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/100 [00:00<?, ? examples/s]
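
The trainer configuration above combines 5 warmup steps with a linear schedule. The sketch below shows the learning-rate curve this implies, assuming the usual "linear warmup, then linear decay to zero" rule and the 60 total steps reported in the training banner. It is an approximation for intuition, not the scheduler code that the Trainer actually runs.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 1, 5, 30, 60):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

Under these assumptions the rate peaks at `2e-4` on step 5 and falls to zero by step 60.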

## Step 6: Training

In [11]:
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = NVIDIA L4. Max memory = 22.168 GB.
5.984 GB of memory reserved.


In [12]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


| Step | Training Loss |
|---|---|
| 1 | 2.7486 |
| 2 | 2.7693 |
| 3 | 2.6848 |
| 4 | 2.6485 |
| 5 | 2.4155 |
| 6 | 2.0487 |
| 7 | 1.5122 |
| 8 | 1.2912 |
| 9 | 0.9895 |
| 10 | 0.7966 |
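
The training banner above reports a total batch size of 8 and 60 total steps. The arithmetic can be sketched as follows. The floor division by the accumulation steps is an assumption about how the Trainer counts optimizer updates per epoch; it is used here because it reproduces the reported figure.

```python
import math

def effective_batch_size(per_device_batch, grad_accum, num_gpus=1):
    """Examples consumed per optimizer update."""
    return per_device_batch * grad_accum * num_gpus

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs):
    """Optimizer updates over the whole run (assumed counting rule, see lead-in)."""
    batches_per_epoch = math.ceil(num_examples / per_device_batch)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return math.ceil(epochs * updates_per_epoch)

print(effective_batch_size(2, 4))            # 8
print(total_optimizer_steps(100, 2, 4, 5))   # 60
```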


##### Training time: 2 minutes

## Step 7: Perform Inference on the dev dataset using the new fine-tuned model.

#### 7.1 Inference on a single example:

In [13]:
FastLanguageModel.for_inference(model)
inputs = tokenizer(
[
    prompt_template.format(
        instruction_template,
        "What French no, British General negotiated at Montreal?",
        "",
    )
], return_tensors = "pt").to("cuda")

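# generate() returns the prompt followed by the new tokens, so slice off the prompt before decoding.
prompt_length = inputs["input_ids"].shape[1]
output_tokens = model.generate(**inputs, max_new_tokens=64, use_cache=True, pad_token_id=tokenizer.eos_token_id)
tokenizer.batch_decode(output_tokens[:, prompt_length:], skip_special_tokens=True)[0]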

'What British General negotiated at Montreal?'
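
For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, which is why the cell above slices the output before decoding. A small list-based sketch of that slicing is shown below. The token ids are made up for illustration.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first prompt_length ids from every generated sequence."""
    return [sequence[prompt_length:] for sequence in generated_ids]

prompt_ids = [101, 7, 8, 9]              # hypothetical ids for the prompt
generated = [prompt_ids + [42, 43, 2]]   # prompt echoed back, then three new ids

print(strip_prompt(generated, len(prompt_ids)))  # [[42, 43, 2]]
```

Without the slice, the decoded string would contain the full instruction and input in front of the corrected question.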

#### 7.2 Inference on the dev dataset (1000 rows):

In [14]:
df_dev_experiment = df_dev.copy()  # copy so the prediction column is not added to df_dev itself

In [15]:
def generate_prediction(disfluent_input):
    inputs = tokenizer(
        [
            prompt_template.format(
                instruction_template,
                disfluent_input,
                "",
            )
        ], return_tensors="pt"
    ).to("cuda")

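    output_tokens = model.generate(
        **inputs,
        max_new_tokens=64,
        use_cache=True,
        pad_token_id=tokenizer.eos_token_id,  # same setting as the single-example cell
    )

    # Slice off the prompt so only the generated answer is decoded.
    prompt_length = inputs["input_ids"].shape[1]
    output_text = tokenizer.batch_decode(
        output_tokens[:, prompt_length:],
        skip_special_tokens=True
    )[0].replace('\n', ' ')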

    return output_text


df_dev_experiment['prediction'] = df_dev_experiment['disfluent'].apply(generate_prediction)

In [19]:
df_dev_experiment[['original', 'disfluent', 'prediction']].tail(10)

|   | original | disfluent | prediction |
|---|---|---|---|
| 990 | WHen did ARPNET and SITA become operational | What year did ARPNET and SITA become operational? | What year did SITA become operational? |
| 991 | What causes the organism to attack more slowly... | What causes the adaptive immune system to reac... | What causes the organism to attack more slowly... |
| 992 | What advancements besides military technology ... | What did European chemists bah advancements be... | What did European chemists besides military te... |
| 993 | Who renovated the Santa Fe Railroad Depot? | Who renovated the San Joaquin Valley Railroad ... | Who renovated the Santa Fe Railroad Depot? |
| 994 | After reopening, where will the art pieces be ... | where will the art pieces be located after res... | Where will the art pieces be located after reo... |
| 995 | What river is larger than the Rhine? | Which or no make that what river is larger tha... | What river is larger than the Rhine? |
| 996 | Compared to other causes, the effect of trade ... | Compared to other causes, what is the effect o... | What effects does trade have on inequality in ... |
| 997 | In the layered model of the Earth there are se... | What do or instead in the layered model of the... | In the layered model of the Earth there are se... |
| 998 | What British General negotiated at Montreal? | What French no British General negotiated at M... | What British General negotiated at Montreal? |
| 999 | What president eliminated the Christian positi... | Which president signed no did away with the Ch... | Which president signed away with the Christian... |


## Step 8: Computing BLEU and ROUGE metrics on the predictions:

In [17]:
originals_text = list(df_dev_experiment['original'])
predictions_text = list(df_dev_experiment['prediction'])

In [18]:
import evaluate
bleu = evaluate.load("bleu")
rouge = evaluate.load('rouge')

bleu_results = bleu.compute(predictions=predictions_text, references=originals_text)
rouge_results = rouge.compute(predictions=predictions_text, references=originals_text)

print(f"Bleu SCORE: {bleu_results}\n\n")
print(f"Rouge SCORE: {rouge_results}")


Bleu SCORE: {'bleu': 0.8817856348041296, 'precisions': [0.9396455223880597, 0.8968106995884774, 0.8643348623853211, 0.8362694300518134], 'brevity_penalty': 0.9981360676417342, 'length_ratio': 0.9981378026070763, 'translation_length': 10720, 'reference_length': 10740}


Rouge SCORE: {'rouge1': 0.9418085659215288, 'rouge2': 0.8978316251358867, 'rougeL': 0.9326941061144526, 'rougeLsum': 0.932691457713838}
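
The BLEU output above can be cross-checked from its own components: the score is the brevity penalty multiplied by the geometric mean of the four n-gram precisions. The sketch below assumes that standard definition and uses only the numbers printed above.

```python
import math

# Values copied from the BLEU output above.
precisions = [0.9396455223880597, 0.8968106995884774,
              0.8643348623853211, 0.8362694300518134]
translation_length = 10720
reference_length = 10740

# Brevity penalty: penalize predictions that are shorter than the references overall.
if translation_length > reference_length:
    brevity_penalty = 1.0
else:
    brevity_penalty = math.exp(1 - reference_length / translation_length)

# Geometric mean of the 1- to 4-gram precisions.
geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

bleu = brevity_penalty * geometric_mean
print(f"brevity penalty = {brevity_penalty:.6f}, BLEU = {bleu:.6f}")
# brevity penalty = 0.998136, BLEU = 0.881786
```

The predictions are only 20 tokens shorter than the references in total, so the brevity penalty is close to 1 and the score is driven almost entirely by the n-gram precisions.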


## Step 9: Saving LoRA Adapters:

In [20]:
model.save_pretrained("lora_model_v2_adapters")
tokenizer.save_pretrained("lora_model_v2_adapters")

('lora_model_v2_adapters/tokenizer_config.json',
 'lora_model_v2_adapters/special_tokens_map.json',
 'lora_model_v2_adapters/tokenizer.json')

## Step 10: Load LoRA Adapters and perform Inference:

In [None]:
df_test[['disfluent']].head(3)

|   | disfluent |
|---|---|
| 0 | In what country is Norse found no wait Normand... |
| 1 | From which countries no tell me when were the ... |
| 2 | From which Norse leader I mean countries did t... |


In [None]:
saved_model, saved_tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model_v2_adapters",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

FastLanguageModel.for_inference(saved_model)

disfluent_question = "From which countries no tell me when were the Normans in Normandy?"

inputs = saved_tokenizer(
[
    prompt_template.format(
        instruction_template,
        disfluent_question,
        "",
    )
], return_tensors = "pt").to("cuda")

# Slice off the prompt so only the generated answer is decoded.
prompt_length = inputs["input_ids"].shape[1]
output_tokens = saved_model.generate(**inputs, max_new_tokens=64, use_cache=True, pad_token_id=saved_tokenizer.eos_token_id)
output_text = saved_tokenizer.batch_decode(output_tokens[:, prompt_length:], skip_special_tokens=True)[0]

print(f"Disfluent Question : {disfluent_question}")
print(f"Corrected Question : {output_text}")

==((====))==  Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.0+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



Disfluent Question : From which countries no tell me when were the Normans in Normandy?
Corrected Question : When were the Normans in Normandy?


## Step 11: Save GGUF / llama.cpp

In [None]:
#model.save_pretrained_gguf("lora_model_v2_partial_gguf", tokenizer,)

Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 5.7G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 60.13 out of 83.48 RAM for saving.


100%|██████████| 32/32 [00:00<00:00, 35.78it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving model... This might take 5 minutes for Llama-7b...
Done.


Unsloth: Converting llama model. Can use fast conversion = False.


==((====))==  Unsloth: Conversion from QLoRA to GGUF information
   \\   /|    [0] Installing llama.cpp will take 3 minutes.
O^O/ \_/ \    [1] Converting HF to GGUF 16bits will take 3 minutes.
\        /    [2] Converting GGUF 16bits to ['q8_0'] will take 10 minutes each.
 "-____-"     In total, you will have to wait at least 16 minutes.

Unsloth: [0] Installing llama.cpp. This will take 3 minutes...
Unsloth: [1] Converting model at lora_model_v2_partial_gguf into q8_0 GGUF format.
The output location will be ./lora_model_v2_partial_gguf/unsloth.Q8_0.gguf
This will take 3 minutes...
INFO:hf-to-gguf:Loading model: lora_model_v2_partial_gguf
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
INFO:hf-to-gguf:token_embd.weight,           torch.bfloat16 --> Q8_0, shape = {4096, 1282

In [None]:
import shutil

#folder_path = '/content/lora_model_v2_partial_gguf'
#output_path = '/content/lora_model_v2_partial_gguf.zip'
#shutil.make_archive(output_path.replace('.zip', ''), 'zip', folder_path)

In [None]:
folder_path = '/content/lora_model_v2_adapters'
output_path = '/content/lora_model_v2_adapters.zip'
shutil.make_archive(output_path.replace('.zip', ''), 'zip', folder_path)

'/content/lora_model_v2_adapters.zip'