# Introduction to LoRA and Prompt Tuning using PEFT

In this lab, you will explore two efficient fine-tuning techniques, LoRA (Low-Rank Adaptation) and Prompt Tuning, using the [PEFT (Parameter-Efficient Fine-Tuning) framework](https://huggingface.co/docs/peft/index). These techniques are gaining popularity for their ability to adapt pre-trained language models like FLAN-T5 to specific tasks, while only modifying a small percentage of model parameters. This approach reduces the computational resources needed, making it more feasible to fine-tune large models on tasks like text summarization or translation. By the end of this lab, you will have a practical understanding of full fine-tuning, LoRA, and prompt tuning, comparing their performance in both qualitative and quantitative terms. You'll be using the DialogSum dataset to fine-tune FLAN-T5 models, analyzing their results with the ROUGE metric, and reflecting on the efficiency of each method.

In [6]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

You are going to experiment with the [DialogSum](https://huggingface.co/datasets/knkarthick/dialogsum) Hugging Face dataset. It contains 10,000+ dialogues with the corresponding manually labeled summaries and topics.

In [7]:
huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md: 0.00B [00:00, ?B/s]

train.csv:   0%|          | 0.00/11.3M [00:00<?, ?B/s]

validation.csv: 0.00B [00:00, ?B/s]

test.csv: 0.00B [00:00, ?B/s]

Generating train split:   0%|          | 0/12460 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/500 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1500 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 12460
    })
    validation: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 500
    })
    test: Dataset({
        features: ['id', 'dialogue', 'summary', 'topic'],
        num_rows: 1500
    })
})

Load the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and its tokenizer directly from HuggingFace. Notice that you will be using the [small version of FLAN-T5](https://huggingface.co/google/flan-t5-small). Setting torch_dtype=torch.bfloat16 specifies the memory type to be used by this model.

In [8]:
model_name='google/flan-t5-small'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

config.json: 0.00B [00:00, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors:   0%|          | 0.00/308M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json: 0.00B [00:00, ?B/s]

It is possible to pull out the number of model parameters and find out how many of them are trainable.

In [9]:
def print_number_of_trainable_model_parameters(model):
    """
    Prints the number of trainable and total model parameters.

    This function iterates through the parameters of a given model and calculates:
    1. The total number of model parameters.
    2. The number of trainable parameters (those with `requires_grad=True`).

    It then returns a formatted string with the number of trainable parameters, total parameters,
    and the percentage of parameters that are trainable.

    Args:
        model (torch.nn.Module): The neural network model from which parameters are being counted.

    Returns:
        str: A string displaying the total number of parameters, trainable parameters, and
        the percentage of trainable parameters.

    Example:
        >>> model = YourModel()
        >>> print(print_number_of_trainable_model_parameters(model))
        trainable model parameters: 123456
        all model parameters: 234567
        percentage of trainable model parameters: 52.63%
    """
    # TODO: Implement the function
    trainable_model_params = 0
    all_model_params = 0

    # TODO: Iterate through the parameters of the model and count the number of trainable and total parameters
    for _, param in model.named_parameters():
        if param.requires_grad == True:
          trainable_model_params += 1
        all_model_params += 1
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 190
all model parameters: 190
percentage of trainable model parameters: 100.00%


Test the model with the zero shot inferencing. You can see that the model struggles to summarize the dialogue compared to the baseline summary, but it does pull out some important information from the text which indicates the model can be fine-tuned to the task at hand.

In [10]:
index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dash_line = '-'.join('' for x in range(100))
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

#Person1#: Have you considered upgrading your system?
#Person2#: Yes, but I'm not sure what exactly I would need.
#Person1#: You could consider adding a painting program to your software. It would allow you to make up your own flyers and banners for advertising.
#Person2#: That would be a definite bonus.
#Person1#: You might also want to upgrade your hardware because it is pretty outdated now.
#Person2#: How can we do that?
#Person1#: You'd probably need a faster processor, to begin with. And you also need a more powerful hard disc, more memory and a faster modem. Do you have a CD-ROM drive?
#Person2#: No.
#Person1#: Then you might want to add a CD-ROM drive too, because most new software programs are coming out on Cds.
#Person2#: That sounds great. Thanks.

Summary:

-------------------------------------------------------------------

## Perform Full Fine-Tuning

### Preprocess the Dialog-Summary Dataset

You need to convert the dialog-summary (prompt-response) pairs into explicit instructions for the LLM. Prepend an instruction to the start of the dialog with `Summarize the following conversation` and to the start of the summary with Summary as follows:

Training prompt (dialogue):
```
Summarize the following conversation.

    Chris: This is his part of the conversation.
    Antje: This is her part of the conversation.

Summary:
```
Training response (summary):

`Both Chris and Antje participated in the conversation.`

Then preprocess the prompt-response dataset into tokens and pull out their input_ids (1 per token).

In [11]:
def tokenize_function(example):
    """
    Tokenizes a given dialogue-summary example for model input.

    This function preprocesses an example from the dataset by constructing a prompt
    for summarization. It adds an instruction prompt before the dialogue and a "Summary"
    tag before the summary. Then, the input dialogue and the summary are tokenized,
    with padding and truncation applied to ensure the tokenized sequences fit the model's
    input size requirements.

    Args:
        example (dict): A dictionary containing two keys:
            - "dialogue" (list of str): List of dialogue strings to be summarized.
            - "summary" (list of str): Corresponding summaries for the dialogues.

    Returns:
        dict: A dictionary with the following updated keys:
            - "input_ids" (torch.Tensor): The tokenized input prompts.
            - "labels" (torch.Tensor): The tokenized summaries.
    """
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return example

# The dataset actually contains 3 diff splits: train, validation, test.
# The tokenize_function code is handling all data across all splits in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])

Map:   0%|          | 0/12460 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

Map:   0%|          | 0/1500 [00:00<?, ? examples/s]

To save some time in the lab, you will subsample the dataset:

In [12]:
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 1000 == 0, with_indices=True)

Filter:   0%|          | 0/12460 [00:00<?, ? examples/s]

Filter:   0%|          | 0/500 [00:00<?, ? examples/s]

Filter:   0%|          | 0/1500 [00:00<?, ? examples/s]

Check the shapes of all three parts of the dataset:

In [13]:
print(f"Shapes of the datasets:")
print(f"Training: {tokenized_datasets['train'].shape}")
print(f"Validation: {tokenized_datasets['validation'].shape}")
print(f"Test: {tokenized_datasets['test'].shape}")

print(tokenized_datasets)

Shapes of the datasets:
Training: (13, 2)
Validation: (1, 2)
Test: (2, 2)
DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 13
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 2
    })
})


### Fine-Tune the Model
Now utilize the built-in Hugging Face `Trainer` class (see the documentation [here](https://huggingface.co/docs/transformers/main_classes/trainer)). Pass the preprocessed dataset with reference to the original model. Other training parameters are found experimentally and there is no need to go into details about those at the moment. This fully fine-tuned model will also be referred to as the instruct model in this lab.

In [14]:
from copy import deepcopy

instruct_model = deepcopy(original_model)

In [15]:
output_dir = f'./dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    num_train_epochs=10,
    weight_decay=0.01,
    logging_steps=1,
    save_strategy="no",
    report_to="none",
)

trainer = Trainer(
    model=instruct_model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation']
)

In [16]:
trainer.train()

Step,Training Loss
1,58.25
2,56.25
3,57.0
4,58.75
5,57.25
6,57.5
7,56.75
8,58.25
9,58.25
10,55.5


TrainOutput(global_step=20, training_loss=57.3125, metrics={'train_runtime': 14.5739, 'train_samples_per_second': 8.92, 'train_steps_per_second': 1.372, 'total_flos': 24165765611520.0, 'train_loss': 57.3125, 'epoch': 10.0})

### Evaluate the model qualitatively

As with many GenAI applications, a qualitative approach where you ask yourself the question "Is my model behaving the way it is supposed to?" is usually a good starting point. In the example below (the same one we started this notebook with), you can see how the fine-tuned model is able to create a reasonable summary of the dialogue compared to the original inability to understand what is being asked of the model

In [17]:
index = 200
dialogue = dataset['test'][index]['dialogue']
human_baseline_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
You're going to need a CD-ROM drive.


### Evaluate model quantitatively (with ROUGE metric)

The [ROUGE metric](https://en.wikipedia.org/wiki/ROUGE_(metric) ) helps quantify the validity of summarizations produced by models. It compares summarizations to a "baseline" summary which is usually created by a human. While not perfect, it does indicate the overall increase in summarization effectiveness that we have accomplished by fine-tuning.

In [18]:
! pip install rouge_score

Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25l[?25hdone
  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=fd3b794b0092b9c21397d44085d257cca72d36dfeb75c2aca0474367a3b32c21
  Stored in directory: /root/.cache/pip/wheels/85/9d/af/01feefbe7d55ef5468796f0c68225b6788e85d9d0a281e7a70
Successfully built rouge_score
Installing collected packages: rouge_score
Successfully installed rouge_score-0.1.2


In [19]:
rouge = evaluate.load('rouge')

Downloading builder script: 0.00B [00:00, ?B/s]

Generate the outputs for the sample of the test dataset (only 10 dialogues and summaries to save time), and save the results.

In [20]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []

for _, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
    original_model_summaries.append(original_model_text_output)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    instruct_model_summaries.append(instruct_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,I am going to send you a copy of this memo.
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,"Please, please get this memo re-formatted and ..."
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,#Person1#: This should go out as an intra-offi...
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,You're right!
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,Don't worry!
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,#Person1#: I'm going to work.
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",#Person1: I'm not sure if I'm a therapist or a...
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.",Talk to the media.
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",#Person1#: I think they are getting a divorce ...
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?","Brian, thanks for the party."


Evaluate the models computing ROUGE metrics. Notice the improvement in the results!

In [21]:
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.1207622925973114), 'rouge2': np.float64(0.014829931972789114), 'rougeL': np.float64(0.09869851258581236), 'rougeLsum': np.float64(0.10031754745962032)}


The results show substantial improvement in all ROUGE metrics:

In [22]:
print("Absolute percentage improvement of INSTRUCT MODEL over HUMAN BASELINE")

improvement = (np.array(list(instruct_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(instruct_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of INSTRUCT MODEL over HUMAN BASELINE
rouge1: 0.50%
rouge2: 0.37%
rougeL: 0.36%
rougeLsum: 0.47%


## Perform Parameter Efficient Fine-Tuning (PEFT)

Now, let's perform Parameter Efficient Fine-Tuning (PEFT) fine-tuning as opposed to "full fine-tuning" as you did above. PEFT is a form of instruction fine-tuning that is much more efficient than full fine-tuning - with comparable evaluation results as you will see soon.

PEFT is a generic term that includes Low-Rank Adaptation (LoRA) and prompt tuning (which is NOT THE SAME as prompt engineering!). In most cases, when someone says PEFT, they typically mean LoRA. LoRA, at a very high level, allows the user to fine-tune their model using fewer compute resources (in some cases, a single GPU). After fine-tuning for a specific task, use case, or tenant with LoRA, the result is that the original LLM remains unchanged and a newly-trained “LoRA adapter” emerges. This LoRA adapter is much, much smaller than the original LLM - on the order of a single-digit % of the original LLM size (MBs vs GBs).

That said, at inference time, the LoRA adapter needs to be reunited and combined with its original LLM to serve the inference request. The benefit, however, is that many LoRA adapters can re-use the original LLM which reduces overall memory requirements when serving multiple tasks and use cases.

### Brief introduction to LoRA Tuning
LoRA is a re-parameterization technique. Its operation is simple, complex, and brilliant at the same time. It involves reducing the size of the matrices to be trained by dividing them in such a way that when multiplied, they yield the original matrix.

The weights that are modified are those of the reduced matrices, not the original matrix. It's better visualized in an image.

![](resources/lora_matrix_multiplication.webp)

We have an original matrix of 50x50, which means we would have to modify about 2500 parameters. However, as we know, if we multiply two matrices of (2x50) and (50x2), we obtain a 50x50 matrix. Yet, these two matrices are formed by only 100 parameters each. In other words, for the reduced matrices, we need to modify a total of 200 parameters compared to the 2500 of the original matrix. This represents a 92% reduction, and the larger the original matrix, the greater the percentage of savings.

In Language Models like GPT-3 or any of the current ones with LoRA, it's possible that we only need to train about 0.02% of the original parameters. This varies for each model. The best part is that the obtained result is very similar to that of full fine-tuning, in some cases, it can even be better.

#### Setup the LoRA model for Fine-Tuning

You need to set up the LoRA model for fine-tuning with a new layer/parameter adapter. Using LoRA, you are freezing the underlying LLM and only training the adapter. Have a look at the LoRA configuration below. Note the rank (`r`) hyper-parameter, which defines the rank/dimension of the adapter to be trained.

In [23]:
from peft import LoraConfig, get_peft_model, TaskType

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
lora_config = LoraConfig(
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="lora_only",  # this specifies if the bias parameter should be trained.
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)

Add LoRA adapter layers/parameters to the original LLM to be trained.

In [24]:
lora_model = get_peft_model(deepcopy(original_model), lora_config)
print(print_number_of_trainable_model_parameters(lora_model))

trainable model parameters: 96
all model parameters: 286
percentage of trainable model parameters: 33.57%


#### Train LoRA Adapter

In [25]:
output_dir = f'./lora-dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
lora_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=100,
    logging_steps=10,
    save_strategy="no",
    report_to="none",
)

lora_trainer = Trainer(
    model=lora_model,
    args=lora_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [26]:
lora_trainer.train()

Step,Training Loss
10,41.3125
20,20.4062
30,7.7719
40,5.5
50,4.8063
60,4.4906
70,4.2625
80,4.0406
90,3.8031
100,3.5547


TrainOutput(global_step=200, training_loss=6.48421875, metrics={'train_runtime': 122.9707, 'train_samples_per_second': 10.572, 'train_steps_per_second': 1.626, 'total_flos': 247153872076800.0, 'train_loss': 6.48421875, 'epoch': 100.0})

#### Evaluate the model qualitatively

In [27]:
index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')
print(dash_line)
print(f'LoRA MODEL: {lora_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
How about upgrading your computer?
---------------------------------------------------------------------------------------------------
LoRA MODEL: #Person1#: You can upgrade your system. #Person1#: You could upgrade your hardware.


#### Evaluate the model quantitatively (with ROUGE metric)

In [28]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
lora_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    human_baseline_text_output = human_baseline_summaries[idx]

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

    lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    instruct_model_summaries.append(instruct_model_text_output)
    lora_model_summaries.append(lora_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, lora_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries', 'lora_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries,lora_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,#Person2#: This memo is intended to inform emp...,#Person2# #Person1#: Intra-office communicatio...
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,Is it OK to take a dictation for me?,#Person1#: Is this going to be a dictation for...
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,#Person2#: Is you ready to take a dictation fo...,Your resume is ready for a dictation.
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,Taking the subway would be a good thing.,#Person1#'s the best path to work.
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,Taking the subway would be a lot less stressfu...,#Person1#: I'm not saying that a different rou...
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,How about taking the subway?,Posterson# and #Person# #Person# and #Person# ...
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",Masha and Hero are getting divorced.,Masha and Hero are getting divorced.
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.",Masha and Hero are getting divorced for the fi...,Masha and Hero's quarrelling about who got the...
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",#Person2#: I don't know. They're having a divo...,Masha and Hero is having a separation of the k...
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?","#Person1#: Happy Birthday, Brian.","Brian, Brian, and I'm happy to have a good tim..."


In [29]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

lora_model_results = rouge.compute(
    predictions=lora_model_summaries,
    references=human_baseline_summaries[0:len(lora_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)
print('LoRA MODEL:')
print(lora_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.2213246282040654), 'rouge2': np.float64(0.07594422568935313), 'rougeL': np.float64(0.19695837870966298), 'rougeLsum': np.float64(0.19837906053160367)}
LoRA MODEL:
{'rouge1': np.float64(0.22498099443199143), 'rouge2': np.float64(0.06841619525450307), 'rougeL': np.float64(0.17735059257215163), 'rougeLsum': np.float64(0.17879468077472965)}


Calculate the improvement of LoRA over the original model:

In [30]:
print("Absolute percentage improvement of LoRA MODEL over HUMAN BASELINE")

improvement = (np.array(list(lora_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(lora_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of LoRA MODEL over HUMAN BASELINE
rouge1: 10.92%
rouge2: 5.73%
rougeL: 8.22%
rougeLsum: 8.32%


Now calculate the improvement of LoRA over a full fine-tuned model:

In [31]:
print("Absolute percentage improvement of LoRA MODEL over INSTRUCT MODEL")

improvement = (np.array(list(lora_model_results.values())) - np.array(list(instruct_model_results.values())))
for key, value in zip(lora_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of LoRA MODEL over INSTRUCT MODEL
rouge1: 0.37%
rouge2: -0.75%
rougeL: -1.96%
rougeLsum: -1.96%


### Brief introduction to Prompt Tuning

It’s an Additive Fine-Tuning technique for models. This means that we WILL NOT MODIFY ANY WEIGHTS OF THE ORIGINAL MODEL. You might be wondering, how are we going to perform fine-tuning then? Well, we will train additional layers that are added to the model. That’s why it’s called an Additive technique.

Considering it’s an Additive technique and its name is Prompt-Tuning, it seems clear that the layers we’re going to add and train are related to the prompt.

![](resources/prompt_tuning.jpg)

We are creating a type of superprompt by enabling a model to enhance a portion of the prompt with its acquired knowledge. However, that particular section of the prompt cannot be translated into natural language. It's as if we've mastered expressing ourselves in embeddings and generating highly effective prompts.

In each training cycle, the only weights that can be modified to minimize the loss function are those integrated into the prompt.

The primary consequence of this technique is that the number of parameters to train is genuinely small. However, we encounter a second, perhaps more significant consequence, namely that, since we do not modify the weights of the pretrained model, it does not alter its behavior or forget any information it has previously learned.

The training is faster and more cost-effective. Moreover, we can train various models, and during inference time, we only need to load one foundational model along with the new smaller trained models because the weights of the original model have not been altered

#### Setup the Prompt tuning model for Fine-Tuning

You need to set up the Prompt tuning model for fine-tuning with a new layer/parameter adapter.

In [32]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

NUM_VIRTUAL_TOKENS = 20 #Number of virtual tokens to be added and trained.

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
prompt_config = PromptTuningConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, #This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  #The added virtual tokens are initializad with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, #Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name #The pre-trained model.
)

Add Prompt tuning adapter layers/parameters to the original LLM to be trained.

In [33]:
prompt_model = get_peft_model(deepcopy(original_model),
                            lora_config)
print(print_number_of_trainable_model_parameters(prompt_model))

trainable model parameters: 96
all model parameters: 286
percentage of trainable model parameters: 33.57%


#### Train Prompt tuning Adapter

In [34]:
output_dir = f'./prompt-tuning-dialogue-summary-training-{str(int(time.time()))}'

# TODO: Play with different hyperparameters and training configurations, be careful with the training time
prompt_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=100,
    logging_steps=10,
    save_strategy="no",
    report_to="none",
)

prompt_trainer = Trainer(
    model=prompt_model,
    args=prompt_training_args,
    train_dataset=tokenized_datasets["train"],
)

In [35]:
prompt_trainer.train()

Step,Training Loss
10,41.25
20,19.9563
30,7.5781
40,5.5687
50,4.8531
60,4.5531
70,4.3156
80,4.1063
90,3.8906
100,3.6609


TrainOutput(global_step=200, training_loss=6.527265625, metrics={'train_runtime': 119.7291, 'train_samples_per_second': 10.858, 'train_steps_per_second': 1.67, 'total_flos': 247153872076800.0, 'train_loss': 6.527265625, 'epoch': 100.0})

#### Evaluate the model qualitatively

In [36]:
index = 200
dialogue = dataset['test'][index]['dialogue']
baseline_human_summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

input_ids = tokenizer(prompt, return_tensors="pt").input_ids

original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

prompt_model_outputs = prompt_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
prompt_model_text_output = tokenizer.decode(prompt_model_outputs[0], skip_special_tokens=True)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{original_model_text_output}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{instruct_model_text_output}')
print(dash_line)
print(f'LoRA MODEL: {lora_model_text_output}')
print(dash_line)
print(f'PROMPT-TUNING MODEL: {prompt_model_text_output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# teaches #Person2# how to upgrade software and hardware in #Person2#'s system.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
How would you like to upgrade your computer?
---------------------------------------------------------------------------------------------------
INSTRUCT MODEL:
How about upgrading your computer?
---------------------------------------------------------------------------------------------------
LoRA MODEL: #Person1#: You can upgrade your system. #Person1#: You could upgrade your hardware.
---------------------------------------------------------------------------------------------------
PROMPT-TUNING MODEL: #Person1#: I've considered adding a painting program to your software.


#### Evaluate the model quantitatively (with ROUGE metric)

In [37]:
dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
instruct_model_summaries = []
lora_model_summaries = []
prompt_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """

    input_ids = tokenizer(prompt, return_tensors="pt").input_ids

    human_baseline_text_output = human_baseline_summaries[idx]

    original_model_outputs = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)

    instruct_model_outputs = instruct_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)

    lora_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    lora_model_text_output = tokenizer.decode(lora_model_outputs[0], skip_special_tokens=True)

    prompt_model_outputs = lora_model.generate(input_ids=input_ids.to(device), generation_config=GenerationConfig(max_new_tokens=200))
    prompt_model_text_output = tokenizer.decode(prompt_model_outputs[0], skip_special_tokens=True)

    original_model_summaries.append(original_model_text_output)
    instruct_model_summaries.append(instruct_model_text_output)
    lora_model_summaries.append(lora_model_text_output)
    prompt_model_summaries.append(prompt_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries, lora_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries', 'lora_model_summaries'])
df

Unnamed: 0,human_baseline_summaries,original_model_summaries,instruct_model_summaries,lora_model_summaries
0,Ms. Dawson helps #Person1# to write a memo to ...,Is this the first time you have a dictation fo...,#Person: I'd like to take a dictation for you.,#Person1#: Attention all staff.
1,In order to prevent employees from wasting tim...,Is this the first time you have a dictation fo...,Don't forget to take a dictation.,"#Person1#: This morning, sir, I will take a di..."
2,Ms. Dawson takes a dictation for #Person1# abo...,Is this the first time you have a dictation fo...,"I'm sorry, sir. I'm sorry, sir. I'm sorry, sir...",#Person1# and #Person1#: Thank you for taking ...
3,#Person2# arrives late because of traffic jam....,Talk to your boss.,Taking the subway would be a good way to get h...,@Person1#: I'm stuck in traffic again and I'm ...
4,#Person2# decides to follow #Person1#'s sugges...,Talk to your boss.,Don't worry.,Taking a subway to work and a subway would be ...
5,#Person2# complains to #Person1# about the tra...,Talk to your boss.,#Person: I'm not going to drive to work.,#Person1#: What took so long to get home.
6,#Person1# tells Kate that Masha and Hero get d...,"Kate, you know, I'm not sure.",Masha and Hero are getting divorced.,Masha and Hero are getting married.
7,#Person1# tells Kate that Masha and Hero are g...,"Kate, you know, I'm not sure.",#Person1#: I'm not sure. I'm not sure if they'...,"Hero and Masha have a separation, and they hav..."
8,#Person1# and Kate talk about the divorce betw...,"Kate, you know, I'm not sure.",The divorce is over.,Masha and Hero: They're having a separation fo...
9,#Person1# and Brian are at the birthday party ...,"Brian, how are you?",#Person1#: Happy Birthday to you.,"Brian, Brian's birthday party."


In [38]:
rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

lora_model_results = rouge.compute(
    predictions=lora_model_summaries,
    references=human_baseline_summaries[0:len(lora_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

prompt_model_results = rouge.compute(
    predictions=prompt_model_summaries,
    references=human_baseline_summaries[0:len(prompt_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)
print('LoRA MODEL:')
print(lora_model_results)
print('PROMPT-TUNING MODEL:')
print(prompt_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11575268414695844), 'rouge2': np.float64(0.01111111111111111), 'rougeL': np.float64(0.09514258109004686), 'rougeLsum': np.float64(0.09559844081242395)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.14382774417331007), 'rouge2': np.float64(0.034782608695652174), 'rougeL': np.float64(0.1291212693207578), 'rougeLsum': np.float64(0.13483323851098789)}
LoRA MODEL:
{'rouge1': np.float64(0.2470477356500325), 'rouge2': np.float64(0.07112362815063833), 'rougeL': np.float64(0.16231068055239856), 'rougeLsum': np.float64(0.16458009844706004)}
PROMPT-TUNING MODEL:
{'rouge1': np.float64(0.2709185118107144), 'rouge2': np.float64(0.06672150856389988), 'rougeL': np.float64(0.2000385115446785), 'rougeLsum': np.float64(0.2030274169221038)}


Calculate the improvement of Prompt-tuning over the original model:

In [39]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over HUMAN BASELINE")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over HUMAN BASELINE
rouge1: 15.52%
rouge2: 5.56%
rougeL: 10.49%
rougeLsum: 10.74%


Calculate the improvement of LoRA over a full fine-tuned model:

In [40]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over INSTRUCT MODEL")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(instruct_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over INSTRUCT MODEL
rouge1: 12.71%
rouge2: 3.19%
rougeL: 7.09%
rougeLsum: 6.82%


Now, calculate the improvement of Prompt-tuning over a LoRA:

In [41]:
print("Absolute percentage improvement of PROMPT-TUNING MODEL over LoRA MODEL")

improvement = (np.array(list(prompt_model_results.values())) - np.array(list(lora_model_results.values())))
for key, value in zip(prompt_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute percentage improvement of PROMPT-TUNING MODEL over LoRA MODEL
rouge1: 2.39%
rouge2: -0.44%
rougeL: 3.77%
rougeLsum: 3.84%


# Questions

## Preprocessing and Tokenization:

- Why is it important to prepend instructions like "Summarize the following conversation" when constructing prompts for training a language model?
>>> It is important to prepend instructions when constructing promtps because we need to explain the model the task that it should do, because if not, the model can't know if it has to summarize, translate, .... It is helpful to understand the task that it needs to folow

- How does tokenization affect the model’s performance? What challenges might arise from long input sequences in tasks like summarization?
>>> Tokenization is important for the model performance because it breaks raw text into smaller units, so that the model doesn't see raw text. It is important to use a good tokenization because it affects how the model captures meaning and structure. It also helps to reduce the computational cost, and to understand the relationships between words. However, long input sequences produce more tokens, which might exceed the maximum input length of the model, making it more computationally expensive, and less accurate.

## Model Performance and Training:

Why do you think full fine-tuning achieves better results than zero-shot learning but might be less efficient for large-scale applications?
>>> Full fine-tuning achieves better results than zero-shot learning because the model’s parameters are directly adjusted to the specific task. This allows it to learn task-specific patterns, which improves accuracy. However, it requires more computational resources, which makes it less efficient for larger applications. This makes it harder to maintain, scale, and update compared to zero-shot learning, which use a single general model for multiple tasks.

## LoRA Fine-Tuning:

- How does LoRA reduce the number of trainable parameters compared to full fine-tuning, and why might this be beneficial for larger models?
>>> LoRA reduces the number of trainable parameters by freezing the original model weights and only training small matrices (that have a lower rank) that are added to specific layers. This means that instead of updating all parameters we only learn a small subset. This reduces the memory and the computation costs. This is beneficial for larger models, because it makes the fine-tuning more efficient, it is faster, and consumes less resources. This method is very good because it is efficient, and it also achieves a performance close to fine-tuning.

- LoRA modifies certain attention weights in the model. Why do you think only specific parts of the model are updated, and how does this affect its generalization to new tasks?
>>> LoRA only updates the specific parts of the model that capture the relevant relationships between tokens and that have a strong influence on how the model processes information. By only updating this parts, LoRA can adapt to new tasks without altering its core knowledge. This helps mantain generalization across tasks, while also allowing efficient fine-tuning.

## Prompt Tuning:

- In your own words, explain how prompt tuning differs from both full fine-tuning and LoRA. Why is it referred to as an additive fine-tuning technique?
>>> Prompt tuning differs from fine-tuning and LoRA because instead of updating the model’s weights, it learns a small set of additional “prompt” embeddings. This helps it to be more specific for a certain task. The original model parameters remain frozen, and only these added prompt vectors are trained. It is referred as an additive fine-tuning technique, because it adds new parameters, instead of updating the already existing ones.

- How does prompt tuning impact the number of parameters that are trained? Why is this method more efficient than full fine-tuning?
>>> Prompt tuning reduces the number of parameters that are trained, because instead of updating the model´s weights, it only updates a small set of prompt embeddings. This makes it far more efficient than full fine-tuning, which requires retraining every parameter in the model.

- How do the results from prompt-tuning compare to LoRA and full fine-tuning? Which technique performed best in terms of ROUGE scores?
>>> Based on the previous results, prompt-tuning achieved the highest ROUGE scores overall, having a slightly better performance than LoRA and full fine-tuning. The prompt-tuning model showed improvements of about 2.39% (ROUGE-1) and 3.8% (ROUGE-Lsum) over LoRA. However LoRA performed slightly better on ROUGE-2. Both LoRA and prompt-tuning performed better than full fine-tuning. These results suggest that prompt-tuning provided the best overall summarization performance while remaining efficient compared to traditional full fine-tuning.

## Efficiency and Trade-offs:

- Given the results of your experiments, which fine-tuning method (LoRA, full fine-tuning, or prompt-tuning) do you think strikes the best balance between computational efficiency and model performance? Why?
>>> From my point of view, the method that strikes the best balance between computational efficiency and model performance is LoRA, because it delivers a strong performance, while reducing the number of parameters and making it more efficient.

- If you were to deploy one of these models in a production system with limited computational resources, which approach would you choose and why?
>>> I would choose LoRA for deployment in a production system with limited computational resources because it has a good performance but it is also very efficient. We only need to store and train a small set of parameters, which reduces memory and computation costs, but we still mantain a good performance.

- How would you extend these methods to other tasks beyond summarization (e.g., machine translation or question-answering)?
>>> This methods could be extended to tasks like machine translation, question-answering by fine-tuning the model on data that is specific for the task, while keeping most of the model frozen. For example, LoRA or prompt tuning can adapt the model to machine translation or question-answering with minimal parameter updates, allowing efficient learning without retraining the entire model.
