In [None]:
'''
Fine-tuning Large Language Models (LLM) or Small Language Models (SLM) on a custom dataset using QLoRA

---

The goal is to build a system that rewrites sentences so they sound as if they were written by an angry person. Some examples are:
"Correct": "I still haven't received the report you promised yesterday."
"AngryTyped": "Where the hell is the report you swore you'd send yesterday?"

"Correct": "Could you please be more careful next time?"
"AngryTyped": "Be more freaking careful next time, seriously!"

---

The custom dataset was created using ChatGPT (GPT-4o) on its limited free plan.
The data contains repeated examples across the training, validation, and test sets, which makes it unsuitable for a reliable evaluation of the task. It also lacks diversity, which makes the model highly prone to overfitting.
Despite these issues, the code runs end to end and the fine-tuned model shows improvements over the base model.
The project would benefit from a larger, more varied dataset in the future.

---

The code is written with assistance from https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07 and its resources.

'''

In [None]:
'''
The data is created using the following prompt:

Generate unique sentences of varying lengths, ranging from one to multiple sentences. Then, for each one, rewrite them as if an angry, frustrated, impatient, and impolite person were writing them, while keeping them close to the original sentences.
# Guidelines to follow:
* Create 2000 examples and save them in a JSON file so that I can download them.
* Include examples from various topics such as daily life, work, education, sports, art, science, culture, etc. Create a diverse set of sentences, some containing only one sign of the mentioned feelings, and some containing multiple signs across the sentence. You can use some curse words, but try to make more examples with other signs, like tone of speaking, way of writing the words, etc. Use a diverse set of words, structures, and other signs.
* Always returns the response in JSON following the next format. The **array should have {TOTAL_LENGTH} items**.
```json
{
"DataArray: [
{
"Correct": "The correct string",
"AngryTyped": "The angry typed string"
},
{
"Correct": "The correct string",
"AngryTyped": "The angry typed string"
}
]
}
```

here are some examples you can see, but add more diversity in topics and structures:
"DataArray": [
{
"Correct": "Can you please move your car? It's blocking the driveway.",
"AngryTyped": "MOVE your freaking car! It's blocking the driveway!"
},
{
"Correct": "The meeting is at 10 AM. Please be on time.",
"AngryTyped": "The meeting is at 10 AM! Can you be on time for once?!"
},
{
"Correct": "I still haven't received the report you promised yesterday.",
"AngryTyped": "Where the hell is the report you swore you'd send yesterday?"
},
{
"Correct": "Could you lower the volume? It's a bit too loud.",
"AngryTyped": "Turn down that blaring noise! It's way too loud!"
},
{
"Correct": "I'll need the documents by 3 PM today.",
"AngryTyped": "I need those freaking documents by 3 PM today!"
},
{
"Correct": "We need to finish this task before the deadline tomorrow.",
"AngryTyped": "We HAVE to finish this stupid task before the freaking deadline tomorrow!"
},
{
"Correct": "Please remember to take out the trash before 6 PM.",
"AngryTyped": "Take out the trash before 6 PM, or I'll throw it all over the place!"
},
{
"Correct": "Can you help me with this math problem?",
"AngryTyped": "Help me with this damn math problem already! It's driving me nuts!"
},
{
"Correct": "I'll need that file back as soon as possible.",
"AngryTyped": "I need that file back ASAP, no more excuses!"
},
{
"Correct": "Could you please stop interrupting me while I'm talking?",
"AngryTyped": "STOP cutting me off when I'm talking! Are you even listening?!"
},
{
"Correct": "I've called you three times, but you haven't picked up.",
"AngryTyped": "I've called you THREE times already! What the hell are you doing?!"
},
{
"Correct": "Your assignment was due yesterday. Please submit it by today.",
"AngryTyped": "Your assignment was due YESTERDAY! Get it in today, no more bullshit!"
},
{
"Correct": "Please ensure you finish the project by Friday.",
"AngryTyped": "Finish the damn project by Friday, or we're all screwed!"
},
{
"Correct": "We're out of paper in the printer again.",
"AngryTyped": "Out of paper in the printer AGAIN?! Seriously, does no one check this stuff?!"
},
{
"Correct": "Please remember to log out of your account when you're done.",
"AngryTyped": "Log out of your account when you're done, for crying out loud!"
},
{
"Correct": "I can't believe I forgot my keys again.",
"AngryTyped": "How the hell did I forget my freaking keys AGAIN?!"
}
]
}

'''
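Before training on a file produced by a prompt like the one above, it can help to check that every item really follows the requested schema. The sketch below is only an illustration: `validate_pairs` and `REQUIRED_KEYS` are hypothetical names that do not appear in the notebook, and the sample payload is made up for the demonstration.

```python
import json

# Field names requested in the prompt above.
REQUIRED_KEYS = {"Correct", "AngryTyped"}

def validate_pairs(raw_json: str) -> list[dict]:
    """Parse generated JSON and keep only well-formed Correct/AngryTyped pairs."""
    payload = json.loads(raw_json)
    items = payload.get("DataArray", []) if isinstance(payload, dict) else []
    valid = []
    for item in items:
        # Skip anything that is not an object with both required keys.
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            continue
        # Skip items whose fields are not non-empty strings.
        if all(isinstance(item[k], str) and item[k].strip() for k in REQUIRED_KEYS):
            valid.append({k: item[k].strip() for k in sorted(REQUIRED_KEYS)})
    return valid

# Made-up payload: one valid pair, one missing a key, one with an empty field.
sample = json.dumps({
    "DataArray": [
        {"Correct": "Please be on time.", "AngryTyped": "Be on time for ONCE!"},
        {"Correct": "Missing the angry half."},
        {"Correct": "", "AngryTyped": "Empty input!"},
    ]
})
pairs = validate_pairs(sample)
print(len(pairs), "valid pair(s)")
```

Dropping malformed items silently is a design choice for this sketch; raising an error instead may be preferable when the file is expected to be clean.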

In [None]:
###########################################################################################################################################################################################################

In [1]:
!pip install -q -U bitsandbytes transformers peft accelerate datasets scipy einops evaluate trl rouge_score


In [2]:
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()




In [3]:
import os
# disable Weights and Biases - can change later
os.environ['WANDB_DISABLED']="true"

In [4]:
import json
import pandas as pd
from sklearn.model_selection import train_test_split
from datasets import Dataset, DatasetDict

# Load the JSON file
with open('angrydata.json', 'r') as file:
    data = json.load(file)

# Extract the DataArray
data_array = data["DataArray"]

# Convert to a DataFrame (assuming data_array is a list of dicts)
df = pd.DataFrame(data_array)

# Split the data into train, validation, and test sets
train_df, temp_df = train_test_split(df, test_size=0.3, random_state=42)
validation_df, test_df = train_test_split(temp_df, test_size=0.5, random_state=42)

# Create Dataset objects
train_dataset = Dataset.from_pandas(train_df)
validation_dataset = Dataset.from_pandas(validation_df)
test_dataset = Dataset.from_pandas(test_df)

# Create the dataset dictionary
dataset = DatasetDict({
    'train': train_dataset,
    'validation': validation_dataset,
    'test': test_dataset
})

# Example: Accessing the train, validation, and test data
print(f"Training set size: {len(dataset['train'])}")
print(f"Validation set size: {len(dataset['validation'])}")
print(f"Test set size: {len(dataset['test'])}")


Training set size: 1400
Validation set size: 300
Test set size: 300
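Because the introduction notes that the same examples recur across the three splits, one possible mitigation is to deduplicate the pairs before splitting and then confirm that no sentence is shared between splits. The sketch below uses only the standard library and toy data; `dedupe_pairs`, `split_pairs`, and `overlap` are hypothetical helpers, not part of the notebook, and the 70/15/15 proportions simply mirror the split above.

```python
import random

def dedupe_pairs(pairs):
    """Drop pairs whose normalized 'Correct' text has already been seen."""
    seen, unique = set(), []
    for pair in pairs:
        key = " ".join(pair["Correct"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique

def split_pairs(pairs, seed=42, train_frac=0.7, val_frac=0.15):
    """Shuffle deterministically, then cut into train/validation/test lists."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def overlap(a, b):
    """Return the 'Correct' sentences that appear in both splits."""
    return {p["Correct"] for p in a} & {p["Correct"] for p in b}

# Toy data: 40 rows but only 12 distinct sentences.
toy = [{"Correct": f"Sentence {i % 12}.", "AngryTyped": f"SENTENCE {i % 12}!"}
       for i in range(40)]
unique = dedupe_pairs(toy)
train, val, test = split_pairs(unique)
print(len(toy), "->", len(unique), "unique pairs")
print("train/test overlap:", len(overlap(train, test)))
```

Exact-match deduplication will not catch near-duplicates (for example, the same sentence with different punctuation beyond whitespace and case), so this is a lower bound on the cleanup the dataset may need.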


In [5]:
# # dataset
# huggingface_dataset_name = "neil-code/dialogsum-test"
# dataset = load_dataset(huggingface_dataset_name)

In [7]:
dataset['test'][0]

{'Correct': "I've told you this five times already.",
 'AngryTyped': "I've told you this FIVE freaking times already! Why don't you get it?!",
 '__index_level_0__': 771}

In [8]:
dataset['train'][0]

{'Correct': 'Could you please be more careful next time?',
 'AngryTyped': 'Be more freaking careful next time, seriously!',
 '__index_level_0__': 836}

In [9]:
# quantization setting
compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )

In [10]:
model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name,
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)




In [11]:
# tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token


In [12]:
eval_tokenizer = AutoTokenizer.from_pretrained(model_name, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

def gen(model, p, maxlen=13, sample=True):
    """Generate a continuation of prompt `p`; `maxlen` caps the number of new tokens."""
    toks = eval_tokenizer(p, return_tensors="pt")
    res = model.generate(
        **toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample,
        num_return_sequences=1, temperature=0.1, num_beams=1, top_p=0.95,
    ).to('cpu')
    return eval_tokenizer.batch_decode(res, skip_special_tokens=True)

In [13]:
%%time
# zero-shot check
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['Correct']
angrytyped = dataset['test'][index]['AngryTyped']

formatted_prompt = f"Instruct: Revise the sentences to make them sound like they were written by an angry person.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,13,)

print(res[0])
print("----------")

output = res[0].split('Output:\n')[1]

dash_line = '-' * 12
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE:\n{angrytyped}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Instruct: Revise the sentences to make them sound like they were written by an angry person.
I'll need that file back as soon as possible.
Output:
I demand that that file be returned to me immediately.

----------
------------
INPUT PROMPT:
Instruct: Revise the sentences to make them sound like they were written by an angry person.
I'll need that file back as soon as possible.
Output:

------------
BASELINE:
I need that file back ASAP, no more excuses!

------------
MODEL GENERATION - ZERO SHOT:
I demand that that file be returned to me immediately.

CPU times: user 2 s, sys: 276 ms, total: 2.27 s
Wall time: 3.4 s
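The cell above pulls the answer out of the decoded text with `split('Output:\n')[1]`, which raises an `IndexError` if the marker is missing. A more defensive variant could look like the sketch below; `extract_answer` is a hypothetical helper, and treating `###` as a stop marker is an assumption based on the `### End` key used in the training prompt.

```python
def extract_answer(generated: str, marker: str = "Output:\n", stop: str = "###") -> str:
    """Return the text after `marker`, cut at `stop`, or '' if `marker` is absent."""
    _, found, tail = generated.partition(marker)
    if not found:
        return ""
    answer, _, _ = tail.partition(stop)
    return answer.strip()

# Decoded text in the shape printed by the zero-shot cell above.
decoded = (
    "Instruct: Revise the sentences to make them sound like they were written "
    "by an angry person.\n"
    "I'll need that file back as soon as possible.\n"
    "Output:\n"
    "I demand that that file be returned to me immediately.\n"
)
print(extract_answer(decoded))
print(repr(extract_answer("no marker here")))
```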


In [14]:
# data preparation
def create_prompt_formats(sample):
    """
    Format the fields of the sample ('Correct', 'AngryTyped') into one training prompt,
    then concatenate the parts using two newline characters
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Revise the sentences to make them sound like they were written by an angry person."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"

    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['Correct']}" if sample["Correct"] else None
    response = f"{RESPONSE_KEY}\n{sample['AngryTyped']}"
    end = f"{END_KEY}"

    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt

    # print(type(sample))
    # print(sample)
    return sample
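To see what a single training example looks like after formatting, the same string logic can be run on one pair without loading any model. The sketch below re-implements the joining step as a pure function; `build_training_text` is a hypothetical name, while the constants are copied from the cell above. Note that the evaluation cells in this notebook prompt with `Instruct: ...` and `Output:` without the intro blurb or the `###` markers, so the training and inference prompts are not identical.

```python
INTRO_BLURB = ("Below is an instruction that describes a task. "
               "Write a response that appropriately completes the request.")
INSTRUCTION_KEY = ("### Instruct: Revise the sentences to make them sound "
                   "like they were written by an angry person.")
RESPONSE_KEY = "### Output:"
END_KEY = "### End"

def build_training_text(correct: str, angry: str) -> str:
    """Join blurb, instruction, input, response, and end key with blank lines."""
    parts = [
        f"\n{INTRO_BLURB}",
        INSTRUCTION_KEY,
        correct or None,              # an empty input is skipped
        f"{RESPONSE_KEY}\n{angry}",
        END_KEY,
    ]
    return "\n\n".join(part for part in parts if part)

text = build_training_text(
    "Could you please be more careful next time?",
    "Be more freaking careful next time, seriously!",
)
print(text)
```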

In [15]:
from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int,seed, dataset):
    """Format & tokenize it so it is ready for training
    :param tokenizer (AutoTokenizer): Model Tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    """

    # Add prompt to each sample
    print("Preprocessing dataset...")
    # dataset = Dataset.from_dict({'text': dataset})
    print(dataset)
    dataset = dataset.map(create_prompt_formats)#, batched=True)

    # Tokenize each batch of the dataset and remove the raw 'Correct' and 'AngryTyped' fields
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['Correct', 'AngryTyped'],
    )

    # Filter out samples that have input_ids exceeding max_length
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)

    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

In [16]:
## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])

Found max lenth: 2048
2048
Preprocessing dataset...
Dataset({
    features: ['Correct', 'AngryTyped', '__index_level_0__'],
    num_rows: 1400
})



Preprocessing dataset...
Dataset({
    features: ['Correct', 'AngryTyped', '__index_level_0__'],
    num_rows: 300
})



In [17]:
print(f"Shapes of the datasets:")
print(f"Training: {train_dataset.shape}")
print(f"Validation: {eval_dataset.shape}")
print(train_dataset)

Shapes of the datasets:
Training: (1400, 4)
Validation: (300, 4)
Dataset({
    features: ['__index_level_0__', 'text', 'input_ids', 'attention_mask'],
    num_rows: 1400
})


In [18]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 262364160
all model parameters: 1521392640
percentage of trainable model parameters: 17.24%
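The total of about 1.52 billion reported above is lower than the roughly 2.78 billion parameters usually quoted for phi-2. A plausible reading is that the 4-bit weights of the transformer-block linear layers are stored packed, two values per byte, so `numel()` counts half as many elements for them. The arithmetic sketch below reproduces the reported total under that assumption; the dimensions (hidden size 2560, 32 layers, vocabulary 51200, MLP width 10240) and the layer names in the comments are taken from the published phi-2 configuration, not from this notebook.

```python
# Assumed phi-2 dimensions (from the public model config, not from this notebook).
HIDDEN, LAYERS, VOCAB, MLP = 2560, 32, 51200, 10240

# Linear weights inside the transformer blocks (the part assumed to be quantized).
attn_weights = 4 * HIDDEN * HIDDEN        # q_proj, k_proj, v_proj, dense
mlp_weights = 2 * HIDDEN * MLP            # fc1, fc2
linear_weights = LAYERS * (attn_weights + mlp_weights)

# Everything assumed to stay unquantized.
linear_biases = LAYERS * (4 * HIDDEN + MLP + HIDDEN)
layer_norms = LAYERS * 2 * HIDDEN + 2 * HIDDEN   # per-block norm + final norm
embeddings = VOCAB * HIDDEN
lm_head = VOCAB * HIDDEN + VOCAB
unquantized = linear_biases + layer_norms + embeddings + lm_head

full_precision_total = linear_weights + unquantized
packed_total = linear_weights // 2 + unquantized  # two 4-bit values per stored element

print(f"full-precision count: {full_precision_total:,}")
print(f"packed 4-bit count:   {packed_total:,}")
print(f"unquantized minus linear biases: {unquantized - linear_biases:,}")
```

The last figure happens to equal the reported trainable count (262,364,160). That is an arithmetic observation only; why those particular tensors carry `requires_grad=True` at this point has not been verified here.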


In [19]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
original_model.gradient_checkpointing_enable()

# 2 - Using the prepare_model_for_kbit_training method from PEFT
original_model = prepare_model_for_kbit_training(original_model)

peft_model = get_peft_model(original_model, config)

In [20]:
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 20971520
all model parameters: 1542364160
percentage of trainable model parameters: 1.36%
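The trainable count above can be checked by hand. Each LoRA adapter on a linear layer with input size `d_in` and output size `d_out` adds two matrices with `r * d_in` and `d_out * r` entries. The sketch below assumes that all four target modules (`q_proj`, `k_proj`, `v_proj`, `dense`) are 2560 x 2560 and that there are 32 layers, which matches the public phi-2 configuration but is not stated in the notebook.

```python
HIDDEN, LAYERS, RANK = 2560, 32, 32   # RANK is r from LoraConfig; the rest is assumed
TARGETS_PER_LAYER = 4                 # q_proj, k_proj, v_proj, dense

# Matrix A is (r x d_in) and matrix B is (d_out x r); here d_in == d_out == HIDDEN.
params_per_module = RANK * HIDDEN + HIDDEN * RANK
lora_params = LAYERS * TARGETS_PER_LAYER * params_per_module

base_reported = 1_521_392_640         # "all model parameters" before adding LoRA
total = base_reported + lora_params
share = 100 * lora_params / total

print(f"LoRA parameters: {lora_params:,}")
print(f"total parameters: {total:,}")
print(f"trainable share: {share:.2f}%")
```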


In [21]:
output_dir = './peft-correct-angry-training'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=1000, #1000
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    overwrite_output_dir=True,
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

max_steps is given, it will override any value given in num_train_epochs


In [22]:
peft_trainer.train()



Step,Training Loss,Validation Loss
25,1.1675,0.377106
50,0.3934,0.251756
75,0.3114,0.213497
100,0.205,0.146293
125,0.1588,0.152621
150,0.1559,0.092089
175,0.1041,0.087252
200,0.0978,0.066628
225,0.0838,0.062705
250,0.0761,0.063623



TrainOutput(global_step=1000, training_loss=0.10838568139076232, metrics={'train_runtime': 2286.4432, 'train_samples_per_second': 1.749, 'train_steps_per_second': 0.437, 'total_flos': 5037760080322560.0, 'train_loss': 0.10838568139076232, 'epoch': 2.857142857142857})
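The `epoch` and throughput values in the `TrainOutput` above follow from the training arguments. The sketch below redoes that arithmetic, assuming a single GPU (the notebook loads the model with `device_map = {"": 0}`) so that the effective batch size is the per-device batch size times the gradient accumulation steps.

```python
max_steps = 1000
per_device_batch = 1
grad_accum = 4
train_rows = 1400          # training set size printed earlier
runtime_s = 2286.4432      # train_runtime from TrainOutput

effective_batch = per_device_batch * grad_accum
samples_seen = max_steps * effective_batch
epochs = samples_seen / train_rows
samples_per_s = samples_seen / runtime_s
steps_per_s = max_steps / runtime_s

print(f"effective batch size: {effective_batch}")
print(f"epochs: {epochs:.4f}")
print(f"samples/s: {samples_per_s:.3f}, steps/s: {steps_per_s:.3f}")
```

So each training example is seen a little under three times, which is worth keeping in mind given the repeated examples in the dataset.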

In [23]:
# human test

In [24]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [25]:
eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

In [26]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "peft-correct-angry-training/checkpoint-1000",torch_dtype=torch.float16,is_trainable=False)

In [27]:
%%time
from transformers import set_seed
set_seed(seed)

index = 10
correct = dataset['test'][index]['Correct']
angrytyped = dataset['test'][index]['AngryTyped']

prompt = f"Instruct: Revise the sentences to make them sound like they were written by an angry person.\n{correct}\nOutput:\n"

peft_model_res = gen(ft_model,prompt,13,)
peft_model_output = peft_model_res[0].split('Output:\n')[1]
print(peft_model_output)
print("---")
prefix, success, result = peft_model_output.partition('###')

dash_line = '-' * 12
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE:\n{angrytyped}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


I need that file back ASAP, no more excuses!


---
------------
INPUT PROMPT:
Instruct: Revise the sentences to make them sound like they were written by an angry person.
I'll need that file back as soon as possible.
Output:

------------
BASELINE:
I need that file back ASAP, no more excuses!

------------
PEFT MODEL:
I need that file back ASAP, no more excuses!


CPU times: user 1.14 s, sys: 12.9 ms, total: 1.16 s
Wall time: 1.21 s


In [28]:
# ROUGE test

In [29]:
original_model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [33]:
import pandas as pd

corrects = dataset['test'][0:10]['Correct']
baseline = dataset['test'][0:10]['AngryTyped']

original_model_angrytypeds = []
peft_model_angrytypeds = []

for correct in corrects:
    prompt = f"Instruct: Revise the sentences to make them sound like they were written by an angry person.\n{correct}\nOutput:\n"

    original_model_res = gen(original_model,prompt,13,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]

    peft_model_res = gen(ft_model,prompt,13,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_angrytypeds.append(original_model_text_output)
    peft_model_angrytypeds.append(peft_model_text_output)

zipped_angrytypeds = list(zip(baseline, original_model_angrytypeds, peft_model_angrytypeds))

df = pd.DataFrame(zipped_angrytypeds, columns = ['baseline', 'original_model_angrytypeds', 'peft_model_angrytypeds'])
df

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

I've told you this FIVE freaking times already! Why don
How the hell did I forget my freaking keys AGAIN?!
I've told you this FIVE freaking times already! Why don
Stop leaving your filthy dishes in the sink! Clean up after yourself
I expected way more detail in this presentation! This is pathetic!
Clean up your freaking workspace before you leave, it's a pig
MOVE your freaking car! It's blocking the driveway!
Turn down that blaring noise! It's way too loud!
Finish the damn project by Friday, or we're all screwed!
Where the hell is the report you swore you'd send yesterday?

,baseline,original_model_angrytypeds,peft_model_angrytypeds
0,I've told you this FIVE freaking times already...,"I've told you this five times already, and you...",I've told you this FIVE freaking times already...
1,How the hell did I forget my freaking keys AGA...,I can't believe I forgot my keys again!\n,How the hell did I forget my freaking keys AGA...
2,I've told you this FIVE freaking times already...,"I've told you this five times already, and you...",I've told you this FIVE freaking times already...
3,Stop leaving your filthy dishes in the sink! C...,Can you please stop leaving your dishes in the...,Stop leaving your filthy dishes in the sink! C...
4,I expected way more detail in this presentatio...,I expected the presentation to be more detaile...,I expected way more detail in this presentatio...
5,Clean up your freaking workspace before you le...,"Clean up your workspace before you leave, you ...",Clean up your freaking workspace before you le...
6,MOVE your freaking car! It's blocking the driv...,Move your car! It's blocking the driveway!\n,MOVE your freaking car! It's blocking the driv...
7,Turn down that blaring noise! It's way too loud!,Could you lower the volume? It's a bit too loud!,Turn down that blaring noise! It's way too loud!
8,"Finish the damn project by Friday, or we're al...","Make sure you finish the project by Friday, no...","Finish the damn project by Friday, or we're al..."
9,Where the hell is the report you swore you'd s...,I haven't received the report you promised yes...,Where the hell is the report you swore you'd s...


In [34]:
import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_angrytypeds,
    references=baseline[0:len(original_model_angrytypeds)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_angrytypeds,
    references=baseline[0:len(peft_model_angrytypeds)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

ORIGINAL MODEL:
{'rouge1': 0.5757371547652879, 'rouge2': 0.3734247651213596, 'rougeL': 0.5487082985888343, 'rougeLsum': 0.5480179028132992}
PEFT MODEL:
{'rouge1': 0.9608974358974358, 'rouge2': 0.9575757575757574, 'rougeL': 0.9608974358974358, 'rougeLsum': 0.9608974358974359}
Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: 38.52%
rouge2: 58.42%
rougeL: 41.22%
rougeLsum: 41.29%
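For intuition about what the `rouge1` numbers above measure, here is a simplified, standard-library sketch of a unigram-overlap F1 score. It is not the implementation used by `evaluate.load('rouge')`: it applies no stemming and uses a plain lowercase alphanumeric tokenizer, so its values will generally differ somewhat from the scores reported above. `tokenize` and `rouge1_f1` are hypothetical names.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rouge1_f1(prediction: str, reference: str) -> float:
    """F1 over clipped unigram overlap between prediction and reference."""
    pred, ref = Counter(tokenize(prediction)), Counter(tokenize(reference))
    overlap = sum((pred & ref).values())   # each token counted at most min(pred, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "I need that file back ASAP, no more excuses!"
zero_shot = "I demand that that file be returned to me immediately."

print(f"zero-shot vs reference: {rouge1_f1(zero_shot, reference):.4f}")
print(f"exact copy vs reference: {rouge1_f1(reference, reference):.4f}")
```

A model that reproduces the reference verbatim scores 1.0, which is why memorized test examples push the fine-tuned model's ROUGE so close to the maximum.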
