# Introduction to LoRA and Prompt Tuning using PEFT

In this lab, you will explore two efficient fine-tuning techniques, LoRA (Low-Rank Adaptation) and Prompt Tuning, using the [PEFT (Parameter-Efficient Fine-Tuning) framework](https://huggingface.co/docs/peft/index). These techniques are gaining popularity for their ability to adapt pre-trained language models like FLAN-T5 to specific tasks, while only modifying a small percentage of model parameters. This approach reduces the computational resources needed, making it more feasible to fine-tune large models on tasks like text summarization or translation. By the end of this lab, you will have a practical understanding of full fine-tuning, LoRA, and prompt tuning, comparing their performance in both qualitative and quantitative terms. You'll be using the DialogSum dataset to fine-tune FLAN-T5 models, analyzing their results with the ROUGE metric, and reflecting on the efficiency of each method.

In [4]:
!pip install evaluate

Collecting evaluate
  Downloading evaluate-0.4.6-py3-none-any.whl.metadata (9.5 kB)
Downloading evaluate-0.4.6-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m4.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: evaluate
Successfully installed evaluate-0.4.6


In [5]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

You are going to experiment with the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. It contains 10,000+ dialogues with the corresponding manually labeled summaries and topics.

In [6]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

train.csv:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

validation.csv: 0.00B [00:00, ?B/s]

test.csv: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/12460 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1500 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Load the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and its tokenizer directly from HuggingFace. Notice that you will be using the [small version of FLAN-T5](https://huggingface.co/google/flan-t5-small). Setting torch_dtype=torch.bfloat16 specifies the memory type to be used by this model.

In [7]:
model_name='google/flan-t5-small'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json: 0.00B [00:00, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

It is possible to pull out the number of model parameters and find out how many of them are trainable.

In [9]:
def print_number_of_trainable_model_parameters(model):
    """
    Prints the number of trainable and total model parameters.

    This function iterates through the parameters of a given model and calculates:
    1. The total number of model parameters.
    2. The number of trainable parameters (those with `requires_grad=True`).

    It then returns a formatted string with the number of trainable parameters, total parameters,
    and the percentage of parameters that are trainable.

    Args:
        model (torch.nn.Module): The neural network model from which parameters are being counted.

    Returns:
        str: A string displaying the total number of parameters, trainable parameters, and
        the percentage of trainable parameters.

    Example:
        >>> model = YourModel()
        >>> print(print_number_of_trainable_model_parameters(model))
        trainable model parameters: 123456
        all model parameters: 234567
        percentage of trainable model parameters: 52.63%
    """
    # TODO: Implement the function
    trainable_model_params = 0
    all_model_params = 0

    # TODO: Iterate through the parameters of the model and count the number of trainable and total parameters
    for _, param in model.named_parameters():
        if param.requires_grad == True:
          trainable_model_params += param.numel()
        all_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 76961152
all model parameters: 76961152
percentage of trainable model parameters: 100.00%


Test the model with the zero shot inferencing. You can see that the model struggles to summarize the dialogue compared to the baseline summary, but it does pull out some important information from the text which indicates the model can be fine-tuned to the task at hand.

In [10]:
index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.

Summary:

-------------------------------------------------------------------

## Perform Full Fine-Tuning

### Preprocess the Dialog-Summary Dataset

You need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM. Prepend an instruction to the start of the dialog with `Summarize the following conversation` and to the start of the summary with Summary as follows:

Training prompt (dialogue):
```
Summarize the following conversation.

    Chris: This is his part of the conversation.
    Antje: This is her part of the conversation.

Summary:
```
Training response (summary):

`Both Chris and Antje participated in the conversation.`

Then preprocess the prompt-response dataset into tokens and pull out their input_ids (1 per token).

In [11]:
def tokenize_function(example):
    """
    Tokenizes a given dialogue-summary example for model input.

    This function preprocesses an example from the dataset by constructing a prompt
    for summarization. It adds an instruction prompt before the dialogue and a "Summary"
    tag before the summary. Then, the input dialogue and the summary are tokenized,
    with padding and truncation applied to ensure the tokenized sequences fit the model's
    input size requirements.

    Args:
        example (dict): A dictionary containing two keys:
            - "dialogue" (list of str): List of dialogue strings to be summarized.
            - "summary" (list of str): Corresponding summaries for the dialogues.

    Returns:
        dict: A dictionary with the following updated keys:
            - "input_ids" (torch.Tensor): The tokenized input prompts.
            - "labels" (torch.Tensor): The tokenized summaries.
    """
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return example

# The dataset actually contains 3 diff splits: train, validation, test.
# The tokenize_function code is handling all data across all splits in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])

Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

To save some time in the lab, you will subsample the dataset:

In [12]:
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 1000 == 0, with_indices=True)

Filter:   0%|          | 0/12460 [00:00<?, ? examples/s]

Filter:   0%|          | 0/500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1500 [00:00<?, ? examples/s]

Check the shapes of all three parts of the dataset:

In [13]:
print(f"Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

print(tokenized_datasets)

Shapes of the datasets:
Training: (13, 2)
Validation: (1, 2)
Test: (2, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 13
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 2
    })
})


### Fine-Tune the Model
Now utilize the built-in Hugging Face `Trainer` class (see the documentation [here](https://huggingface.co/docs/transformers/main_classes/trainer)). Pass the preprocessed dataset with reference to the original model. Other training parameters are found experimentally and there is no need to go into details about those at the moment. This fully fine-tuned model will also be referred to as the instruct model in this lab.

In [99]:
from copy import deepcopy

instruct_model = deepcopy(original_model)

In [100]:
output_dir = f'./dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-3,
    num_train_epochs=40,
    weight_decay=0.01,
    logging_steps=2,
    save_strategy="no",
    report_to="none",
)

trainer = Trainer(
    model=instruct_model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation']
)

In [101]:
trainer.train()

Step,Training Loss
2,44.6875
4,21.0
6,10.4688
8,6.7656
10,5.6406
12,5.125
14,4.8125
16,4.6562
18,4.5156
20,4.375


TrainOutput(global_step=80, training_loss=5.0134765625, metrics={'train_runtime': 60.8318, 'train_samples_per_second': 8.548, 'train_steps_per_second': 1.315, 'total_flos': 96663062446080.0, 'train_loss': 5.0134765625, 'epoch': 40.0})

### Evaluate the model qualitatively

As with many GenAI applications, a qualitative approach where you ask yourself the question "Is my model behaving the way it is supposed to?" is usually a good starting point. In the example below (the same one we started this notebook with), you can see how the fine-tuned model is able to create a reasonable summary of the dialogue compared to the original inability to understand what is being asked of the model

In [113]:
index = 200
dialogue = dataset['test'][index]['dialogue']
human_baseline_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
#Person1# #Person1# wants to upgrade your system to make up your own flyers and banners for advertising. #Person1# the CD-ROM drive would likely allow you to make up your own flyer and banners for advertising. #Person1# your own printer, CD-ROM drives.


### Evaluate model quantitatively (with ROUGE metric)

The [ROUGE metric](https://en.wikipedia.org/wiki/ROUGE_(metric) ) helps quantify the validity of summarizations produced by models. It compares summarizations to a "baseline" summary which is usually created by a human. While not perfect, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning.

In [114]:
! pip install rouge_score



In [115]:
rouge = evaluate.load('rouge')

Generate the outputs for the sample of the test dataset (only 10 dialogues and summaries to save time), and save the results.

In [116]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []

for _, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
    original_model_summaries.append(original_model_text_output)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    instruct_model_summaries.append(instruct_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,"employee #Person1# reads ""Person1# the memo is..."
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,#Person1# wants to remove external communicati...
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,#Person1# yours. #Person1# nomenization for he...
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,#Person1# #Person1# thinks it's better for the...
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,#Person1# #Person1# doesn't think it's a lot o...
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,#Person1# wants to quit driving to work. #Pers...
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",#Person1# Person1# replace #Person1# custody. ...
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.",#Person1# the kids get custody of their kids g...
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",#Person1# #Person1# splits
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?",#Person1# is happy birthday with Brian. #Perso...


Evaluate the models computing ROUGE metrics. Notice the improvement in the results!

In [117]:
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.1852510274530866), 'rouge2': np.float64(0.03472490707996117), 'rougeL': np.float64(0.15324847934099545), 'rougeLsum': np.float64(0.15280317779821845)}


The results show substantial improvement in all ROUGE metrics:

In [118]:
print("Absolute percentage improvement of INSTRUCT MODEL over HUMAN BASELINE")

improvement = (np.array(list(instruct_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(instruct_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of INSTRUCT MODEL over HUMAN BASELINE
rouge1: 6.95%
rouge2: 2.36%
rougeL: 5.81%
rougeLsum: 5.72%


## Perform Parameter Efficient Fine-Tuning (PEFT)

Now, let's perform Parameter Efficient Fine-Tuning (PEFT) fine-tuning as opposed to "full fine-tuning" as you did above. PEFT is a form of instruction fine-tuning that is much more efficient than full fine-tuning - with comparable evaluation results as you will see soon.

PEFT is a generic term that includes Low-Rank Adaptation (LoRA) and prompt tuning (which is NOT THE SAME as prompt engineering!). In most cases, when someone says PEFT, they typically mean LoRA. LoRA, at a very high level, allows the user to fine-tune their model using fewer compute resources (in some cases, a single GPU). After fine-tuning for a specific task, use case, or tenant with LoRA, the result is that the original LLM remains unchanged and a newly-trained “LoRA adapter” emerges. This LoRA adapter is much, much smaller than the original LLM - on the order of a single-digit % of the original LLM size (MBs vs GBs).

That said, at inference time, the LoRA adapter needs to be reunited and combined with its original LLM to serve the inference request. The benefit, however, is that many LoRA adapters can re-use the original LLM which reduces overall memory requirements when serving multiple tasks and use cases.

### Brief introduction to LoRA Tuning
LoRA is a re-parameterization technique. Its operation is simple, complex, and brilliant at the same time. It involves reducing the size of the matrices to be trained by dividing them in such a way that when multiplied, they yield the original matrix.

The weights that are modified are those of the reduced matrices, not the original matrix. It's better visualized in an image.

![](resources/lora_matrix_multiplication.webp)

We have an original matrix of 50x50, which means we would have to modify about 2500 parameters. However, as we know, if we multiply two matrices of (2x50) and (50x2), we obtain a 50x50 matrix. Yet, these two matrices are formed by only 100 parameters each. In other words, for the reduced matrices, we need to modify a total of 200 parameters compared to the 2500 of the original matrix. This represents a 92% reduction, and the larger the original matrix, the greater the percentage of savings.

In Language Models like GPT-3 or any of the current ones with LoRA, it's possible that we only need to train about 0.02% of the original parameters. This varies for each model. The best part is that the obtained result is very similar to that of full fine-tuning, in some cases, it can even be better.

#### Setup the LoRA model for Fine-Tuning

You need to set up the LoRA model for fine-tuning with a new layer/parameter adapter. Using LoRA, you are freezing the underlying LLM and only training the adapter. Have a look at the LoRA configuration below. Note the rank (`r`) hyper-parameter, which defines the rank/dimension of the adapter to be trained.

In [173]:
from peft import LoraConfig, get_peft_model, TaskType

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
lora_config = LoraConfig(
    r=16, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="lora_only",  # this specifies if the bias parameter should be trained.
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

Add LoRA adapter layers/parameters to the original LLM to be trained.

In [174]:
lora_model = get_peft_model(deepcopy(original_model), lora_config)
print(print_number_of_trainable_model_parameters(lora_model))

trainable model parameters: 688128
all model parameters: 77649280
percentage of trainable model parameters: 0.89%


#### Train LoRA Adapter

In [175]:
output_dir = f'./lora-dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
lora_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=150,
    weight_decay=0.01,
    logging_steps=10,
    save_strategy="no",
    report_to="none",
)

lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [176]:
lora_trainer.train()

Step,Training Loss
10,41.225
20,20.0938
30,7.6875
40,5.5
50,4.7656
60,4.4531
70,4.15
80,3.8328
90,3.5234
100,3.2641


TrainOutput(global_step=300, training_loss=4.931666666666667, metrics={'train_runtime': 176.395, 'train_samples_per_second': 11.055, 'train_steps_per_second': 1.701, 'total_flos': 366608646144000.0, 'train_loss': 4.931666666666667, 'epoch': 150.0})

#### Evaluate the model qualitatively

In [178]:
index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')
print(dash_line)
print(f'LoRA MODEL: {lora_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
#Person1#Person2# Person1# wants to upgrade your system to make up your own flyers and a hard disc.
---------------------------------------------------------------------------------------------------
LoRA MODEL: #Person1# Would consider adding a CD-ROM drive to your software.


#### Evaluate the model quantitatively (with ROUGE metric)

In [179]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
lora_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    human_baseline_text_output = human_baseline_summaries[idx]

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

    lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    instruct_model_summaries.append(instruct_model_text_output)
    lora_model_summaries.append(lora_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, lora_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries', 'lora_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries,lora_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,#Person1# is #Person1# “Person1# yours. #Perso...,#Person1#: Suggesting #Person1# #Person1#: #Pe...
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,#Person1#. #Person1# #Person1# #Person1# #Pers...,#Person1# and #Person1#: #Person1#: #Person1#:...
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,#Person1# replaces any outside communication w...,#Person1# will go out as an intra-office memor...
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,#Person1# #Person1# thinks it's better for the...,#Person1# is going to quit driving to work on ...
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,#Person1# #Person1# weighs a different route t...,#Person1# is finally here.
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,#Person1#:#Person1#. #Person1#. #Person1#. #Pe...,#Person1# says #Person1# #Person1# doesn't thi...
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",sad about who get custody of her. #Person1#Per...,#Person1# and Masha and Hero are getting divor...
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.","#Person1# tells Kate, Masha and Hero get divor...",#Person1#: #Person1#: #Person1#: #Person1#: #P...
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",#Person1# fight #Person1#path about who get th...,"Masha and Hero, the perfect couple are getting..."
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?",tweets from @Person1#space #Person1#: #Person1#.,"#Person1# is happy birthday, Brian #Person1# a..."


In [182]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

lora_model_results = rouge.compute(
    predictions=lora_model_summaries,
    references=human_baseline_summaries[0:len(lora_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)
print('LoRA MODEL:')
print(lora_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.16970300576249103), 'rouge2': np.float64(0.02609376943166273), 'rougeL': np.float64(0.14865633675703868), 'rougeLsum': np.float64(0.1491569644403289)}
LoRA MODEL:
{'rouge1': np.float64(0.24831823832768496), 'rouge2': np.float64(0.04940322580645161), 'rougeL': np.float64(0.20849313714472512), 'rougeLsum': np.float64(0.2116521937635539)}


Calculate the improvement of LoRA over the original model:

In [183]:
print("Absolute percentage improvement of LoRA MODEL over HUMAN BASELINE")

improvement = (np.array(list(lora_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(lora_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of LoRA MODEL over HUMAN BASELINE
rouge1: 13.26%
rouge2: 3.83%
rougeL: 11.34%
rougeLsum: 11.61%


Now calculate the improvement of LoRA over a full fine-tuned model:

In [184]:
print("Absolute percentage improvement of LoRA MODEL over INSTRUCT MODEL")

improvement = (np.array(list(lora_model_results.values())) - np.array(list(instruct_model_results.values())))
for key, value in zip(lora_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of LoRA MODEL over INSTRUCT MODEL
rouge1: 7.86%
rouge2: 2.33%
rougeL: 5.98%
rougeLsum: 6.25%


### Brief introduction to Prompt Tuning

It’s an Additive Fine-Tuning technique for models. This means that we WILL NOT MODIFY ANY WEIGHTS OF THE ORIGINAL MODEL. You might be wondering, how are we going to perform fine-tuning then? Well, we will train additional layers that are added to the model. That’s why it’s called an Additive technique.

Considering it’s an Additive technique and its name is Prompt-Tuning, it seems clear that the layers we’re going to add and train are related to the prompt.

![](resources/prompt_tuning.jpg)

We are creating a type of superprompt by enabling a model to enhance a portion of the prompt with its acquired knowledge. However, that particular section of the prompt cannot be translated into natural language. It's as if we've mastered expressing ourselves in embeddings and generating highly effective prompts.

In each training cycle, the only weights that can be modified to minimize the loss function are those integrated into the prompt.

The primary consequence of this technique is that the number of parameters to train is genuinely small. However, we encounter a second, perhaps more significant consequence, namely that, since we do not modify the weights of the pretrained model, it does not alter its behavior or forget any information it has previously learned.

The training is faster and more cost-effective. Moreover, we can train various models, and during inference time, we only need to load one foundational model along with the new smaller trained models because the weights of the original model have not been altered

#### Setup the Prompt tuning model for Fine-Tuning

You need to set up the Prompt tuning model for fine-tuning with a new layer/parameter adapter.

In [199]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

NUM_VIRTUAL_TOKENS = 20 #Number of virtual tokens to be added and trained.

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
prompt_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, #This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  #The added virtual tokens are initializad with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, #Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name #The pre-trained model.
)

Add Prompt tuning adapter layers/parameters to the original LLM to be trained.

In [200]:
prompt_model = get_peft_model(deepcopy(original_model),
                            lora_config)
print(print_number_of_trainable_model_parameters(prompt_model))

trainable model parameters: 688128
all model parameters: 77649280
percentage of trainable model parameters: 0.89%


#### Train Prompt tuning Adapter

In [202]:
output_dir = f'./prompt-tuning-dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
prompt_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=70,
    logging_steps=10,
    save_strategy="no",
    report_to="none",
)

prompt_trainer = Trainer(
    model=prompt_model,
    args=prompt_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [203]:
prompt_trainer.train()

Step,Training Loss
10,41.375
20,21.025
30,8.025
40,5.6406
50,4.825
60,4.5156
70,4.325
80,4.125
90,3.9422
100,3.7469


TrainOutput(global_step=140, training_loss=8.251450892857143, metrics={'train_runtime': 82.1787, 'train_samples_per_second': 11.073, 'train_steps_per_second': 1.704, 'total_flos': 171084034867200.0, 'train_loss': 8.251450892857143, 'epoch': 70.0})

#### Evaluate the model qualitatively

In [206]:
index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

prompt_model_outputs = prompt_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
prompt_model_text_output = tokenizer.decode(prompt_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')
print(dash_line)
print(f'LoRA MODEL: {lora_model_text_output}')
print(dash_line)
print(f'PROMPT-TUNING MODEL: {prompt_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
#Person1# is a better person1# #Person1# tell1# #Person1# says #Person1# says, #Person1# says #Person1# says #Person1# says #Person1# and #Person1# thinks #Person1# needs. #Person1# is a better choice.
---------------------------------------------------------------------------------------------------
LoRA MODEL: #Person1#: I'm not sure what exactly you'd need.
---------------------------------------------------------------------------------------------------
PROMPT-TUNING MODEL: #Person1#: #Person1##: You 

#### Evaluate the model quantitatively (with ROUGE metric)

In [207]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
lora_model_summaries = []
prompt_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    human_baseline_text_output = human_baseline_summaries[idx]

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

    lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

    prompt_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    prompt_model_text_output = tokenizer.decode(prompt_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    instruct_model_summaries.append(instruct_model_text_output)
    lora_model_summaries.append(lora_model_text_output)
    prompt_model_summaries.append(prompt_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, lora_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries', 'lora_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries,lora_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,Removal of Instant Messaging from your email a...,#Person1#: This memo should go out as an intra...
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,#Person1# #Person1# #Person1# #Person1# is str...,#Person1# and #Person1#: This should go out as...
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,#Person1#: #Person1# is always on the automati...,"#Person1#: #Person1#: No, sir. #Person1#: #Per..."
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,#Person1# #Person1# #Person1# isn't good for t...,#Person1# is the best place to start when he's...
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,#Person1# puts the subway on the subway. #Pers...,#Person1#: Taking the subway to work is a good...
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,#Person1 #Person1# #Person1# is the freedom th...,#Person1# is a lot of freedom with a car.
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",#Person1# weighs in on # Person1# #Person1# th...,#Person1#: #Person1# and Hero are getting divo...
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.",#Person1# #Person1# wants to divorce. #Person1...,"Mashaha and Hero, Masha and Hero, the perfect ..."
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",#Person2# tells #Person1#’s the separation of ...,#Person2# #Person2## is having a separation fo...
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?",#Person1#: #Person1# share your birthday with ...,"#Person1# is happy birthday, #Person1# and Bri..."


In [208]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

lora_model_results = rouge.compute(
    predictions=lora_model_summaries,
    references=human_baseline_summaries[0:len(lora_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

prompt_model_results = rouge.compute(
    predictions=prompt_model_summaries,
    references=human_baseline_summaries[0:len(prompt_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)
print('LoRA MODEL:')
print(lora_model_results)
print('PROMPT-TUNING MODEL:')
print(prompt_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.18175019416944393), 'rouge2': np.float64(0.03328044704118771), 'rougeL': np.float64(0.14208754591310357), 'rougeLsum': np.float64(0.14211328504516635)}
LoRA MODEL:
{'rouge1': np.float64(0.24415563085845587), 'rouge2': np.float64(0.06465986394557822), 'rougeL': np.float64(0.1980433867532662), 'rougeLsum': np.float64(0.20118318618427566)}
PROMPT-TUNING MODEL:
{'rouge1': np.float64(0.2530931317850373), 'rouge2': np.float64(0.07603388248549539), 'rougeL': np.float64(0.19719055048538925), 'rougeLsum': np.float64(0.1985801650215555)}


Calculate the improvement of Prompt-tuning over the original model:

In [209]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over HUMAN BASELINE")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over HUMAN BASELINE
rouge1: 13.73%
rouge2: 6.49%
rougeL: 10.20%
rougeLsum: 10.30%


Calculate the improvement of LoRA over a full fine-tuned model:

In [210]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over INSTRUCT MODEL")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(instruct_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over INSTRUCT MODEL
rouge1: 7.13%
rouge2: 4.28%
rougeL: 5.51%
rougeLsum: 5.65%


Now, calculate the improvement of Prompt-tuning over a LoRA:

In [211]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over LoRA MODEL")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(lora_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over LoRA MODEL
rouge1: 0.89%
rouge2: 1.14%
rougeL: -0.09%
rougeLsum: -0.26%


# Questions

## Preprocessing and Tokenization:

- Why is it important to prepend instructions like "Summarize the following conversation" when constructing prompts for training a language model?
Including explicit instructions helps the model understand what task it is supposed to perform. Language models are trained to predict the next token given a context, so clearly defining the task guides the model toward producing the desired type of output.

- How does tokenization affect the model’s performance? What challenges might arise from long input sequences in tasks like summarization?
Tokenization breaks text into smaller units, which are the fundamental inputs the model processes. The quality of tokenization affects how efficiently the model represents meaning. Efficient tokenization improves both training speed and understanding of language structure.
## Model Performance and Training:

Why do you think full fine-tuning achieves better results than zero-shot learning but might be less efficient for large-scale applications?
Full fine-tuning performs better because the model is directly trained on task-specific data, allowing it to adapt its weights for optimal performance.However, it’s less efficient for large-scale applications because it requires retraining the entire model, consuming significant time, compute, and storage resources for each new task.

## LoRA Fine-Tuning:

- How does LoRA reduce the number of trainable parameters compared to full fine-tuning, and why might this be beneficial for larger models?
LoRA  reduces trainable parameters by learning small low-rank matrices added to existing weights instead of updating all model parameters. This is beneficial for larger models because it drastically lowers memory and compute requirements, making fine-tuning feasible on limited hardware while still adapting the model effectively to new tasks.

- LoRA modifies certain attention weights in the model. Why do you think only specific parts of the model are updated, and how does this affect its generalization to new tasks?
LoRA updates only key weights because these layers have the most influence on how the model processes information. By keeping most of the original weights frozen, the model retains its general knowledge while efficiently learning task-specific behavior, improving generalization to new tasks without overfitting.
## Prompt Tuning:

- In your own words, explain how prompt tuning differs from both full fine-tuning and LoRA. Why is it referred to as an additive fine-tuning technique?
Prompt tuning doesn’t modify the model’s internal weights. Instead, it prepends learnable tokens (soft prompts) to the input that guide the model’s behavior. It’s called an additive fine-tuning technique because these prompts are added to the input rather than changing the existing model parameters.

- How does prompt tuning impact the number of parameters that are trained? Why is this method more efficient than full fine-tuning?
Prompt tuning trains only the soft prompt tokens, leaving the majority of the model frozen. This drastically reduces the number of parameters that need updating, making it much more efficient than full fine-tuning in terms of memory, compute, and storage.

- How do the results from prompt-tuning compare to LoRA and full fine-tuning? Which technique performed best in terms of ROUGE scores?
Prompt tuning performs slightly worse than LoRA in terms of ROUGE scores. LoRA obtained the best scores, and full fine-tuning obtained the worst scores.
## Efficiency and Trade-offs:

- Given the results of your experiments, which fine-tuning method (LoRA, full fine-tuning, or prompt-tuning) do you think strikes the best balance between computational efficiency and model performance? Why?
Prompt-tuning strikes the best balance. It achieves ROUGE scores comparable to LoRA (even slightly higher in ROUGE-1 and ROUGE-2) while training far fewer parameters, making it more computationally efficient than full fine-tuning and easier to scale.
- If you were to deploy one of these models in a production system with limited computational resources, which approach would you choose and why?
I would choose prompt-tuning for deployment. It requires minimal memory and compute since only the soft prompts are trained, yet it maintains strong performance, making it ideal for production systems with resource constraints.
- How would you extend these methods to other tasks beyond summarization (e.g., machine translation or question-answering)?
These fine-tuning methods can be extended to other tasks like machine translation and question answering. For machine translation, you would prepare a dataset of source and target language sentence pairs. The prompt would instruct the model to translate the source sentence, and the target would be the corresponding translation. For question answering, the dataset would consist of context paragraphs and questions with their corresponding answers. The prompt would ask the model to answer the question based on the provided context, and the target would be the answer. The same PEFT configurations and training procedures used for summarization can be applied, adjusting hyperparameters and potentially the target modules for LoRA based on the specific task and model architecture.