<a href="https://www.kaggle.com/code/yannicksteph/nlp-llm-fine-tuning-qa-lora-t5?scriptVersionId=159753452" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# | NLP | LLM | Fine-tuning | QA LoRA T5 |

## Natural Language Processing (NLP) and Large Language Models (LLM): Fine-Tuning Flan-T5 Large with LoRA for Question Answering (QA)

![Learning](https://t3.ftcdn.net/jpg/06/14/01/52/360_F_614015247_EWZHvC6AAOsaIOepakhyJvMqUu5tpLfY.jpg)


# <b>1 <span style='color:#78D118'>|</span> Overview</b>

In this notebook we're going to fine-tune an LLM:

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_2.png?raw=true" alt="Learning" width="50%">

Many LLMs are general-purpose models trained on a broad range of data and use cases. This enables them to perform well in a variety of applications, as shown in previous modules. It is not uncommon, though, to find situations where a general-purpose model performs unacceptably on a specific dataset or use case. This often does not mean that the general-purpose model is unusable. Perhaps, with some new data and additional training, the model could be improved, or fine-tuned, such that it produces acceptable results for the specific use case.

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_1.png?raw=true" alt="Learning" width="50%">

Fine-tuning uses a pre-trained model as a base and continues to train it with a new, task targeted dataset. Conceptually, fine-tuning leverages that which has already been learned by a model and aims to focus its learnings further for a specific task.

It is important to recognize that fine-tuning is model training. The training process remains a resource-intensive and time-consuming effort, although training time is greatly shortened by starting from a pre-trained model.

<img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-Trainer/blob/main/img_3.png?raw=true" alt="Learning" width="50%">

[Hugging Face Model](https://huggingface.co/YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts)

### Overview definitions

<br/>
<details>
  <summary style="list-style: none;"><b>▶️ What is T5 Model?</b></summary>
  <br/>
  Multiple formats of FLAN-T5 models are available on Hugging Face, from small to extra-large models, and the bigger the model, the more parameters it has.

  Below are the different model sizes available from the Hugging Face model card:
  <br/>
  <img src="https://images.datacamp.com/image/upload/v1699032555/image8_241fd08d9c.png" alt="Learning" width="50%">

  <i>FLAN-T5 variants with their parameters and memory usage</i>

  <b>Choosing the right model size</b>

  The choice of the right model size among the variants of FLAN-T5 highly depends on the following criteria:

  - The specific requirements of the project
  - The available computational resources
  - The level of performance expected

</details>

<br/>

<details>
  <summary style="list-style: none;"><b>▶️ Fine-Tuning with LoRA?</b></summary>
  <br/>
  Low-Rank Adaptation (LoRA) is a parameter-efficient approach to fine-tuning, the step that adapts a pre-trained model to a specific task. Unlike conventional fine-tuning, which updates all weights, LoRA freezes the pre-trained model weights and introduces trainable rank decomposition matrices into the layers of the Transformer architecture. This significantly reduces the number of trainable parameters, which speeds up fine-tuning, lowers memory use, and can help mitigate overfitting.

</details>

<br/>

<details>
  <summary style="list-style: none;"><b>▶️ Text Generation vs Text2Text Generation?</b></summary>
  <br/>

  <b>Text Generation:</b>

  Text Generation, also known as Causal Language Modeling, is the process of generating text that closely resembles human writing.

  <img src="https://miro.medium.com/v2/resize:fit:1400/0*XDtcpv-m0SJRGSGB.png" alt="Learning" width="40%">

  It utilizes a Decoder-only architecture and operates in a left-to-right context. Text Generation is often employed for tasks such as sentence completion and generating the next lines of poetry when given a few lines as input. Examples of Text Generation models include the GPT family, BLOOM, and PaLM, which find applications in Chatbots, Text Completion, and content generation.

  ```
  from transformers import pipeline

  task = "text-generation"
  model_name = "gpt2"
  max_output_length = 30
  num_of_return_sequences = 2
  input_text = "Hello, "

  text_generator = pipeline(task, model=model_name)

  text_generator(input_text, max_length=max_output_length, num_return_sequences=num_of_return_sequences)
  ```
    
  <br/>

  <b>Text2Text Generation:</b>

  Text-to-Text Generation, also known as Sequence-to-Sequence Modeling, is the process of converting one piece of text into another.

  <img src="https://miro.medium.com/v2/resize:fit:1400/0*7_yKVuJmhFxUAGPQ.png" alt="Learning" width="40%">

  Text-to-Text Generation involves transforming input text into a desired target text, making it a versatile approach. It is commonly used in tasks such as language translation, summarization, and question-answering.

  Examples of Text-to-Text Generation models include Transformer-based architectures like T5 (Text-to-Text Transfer Transformer) and BART (Bidirectional and Auto-Regressive Transformers).

  ```
  from transformers import pipeline

  task = "text2text-generation"
  model_name = "t5-small"
  max_output_length = 50
  num_of_return_sequences = 2
  input_text = "Translate the following English text to French: 'Hello, how are you?'"

  text_generator = pipeline(task, model=model_name)

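  # Beam search lets the pipeline return several sequences without sampling
  text_generator(input_text, max_length=max_output_length, num_beams=num_of_return_sequences, num_return_sequences=num_of_return_sequences)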
  ```
    
  <br/>
    
  In this example, we use the T5 model from Hugging Face to perform text-to-text generation. The input text is an English sentence that we want to translate into French. The model is capable of generating multiple possible translations.

</details>

<br/>

<details>
  <summary style="list-style: none;"><b>▶️ What is LoRA?</b></summary>

  <img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/0*kzZ2_LZqBO9_hTi3.png" alt="Learning" width="30%">

  LoRA is an efficient and effective fine-tuning strategy. By reducing the number of trainable parameters and the GPU memory required, it makes it practical to tailor large pre-trained models to specific tasks, such as the question-answering task in this notebook.

  <img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*SJtZupeQVgp3s5HOBymcQw.png" alt="Learning" width="40%">
  <img src="https://github.com/YanSte/NLP-LLM-Fine-tuning-T5-Small-Reviews/blob/main/img_1.png?raw=true" alt="Learning" width="50%">

</details>

<br/>

<details>
  <summary style="list-style: none;"><b>▶️ PeftModel vs get_peft_model?</b></summary>
    
  <br/>
    
  Note:
  1. <b>`PeftModel.from_pretrained`:</b>
     - By default, the loaded adapter of the PEFT model is frozen (non-trainable), which suits inference.
     - You can change this by passing `is_trainable=True`.

  2. <b>`get_peft_model` function:</b>
     - Parameters are not frozen by default.
     - Result: you obtain a trainable PEFT model for the SFT task.

  3. <b>Fine-tuning an already fine-tuned PEFT model:</b>
     - Utilize `from_pretrained`.
     - Set `is_trainable = True` to enable training of the previously fine-tuned model.
    
</details>

<br/>

<details>
  <summary style="list-style: none;"><b>▶️ What is ROUGE score?</b></summary>
  <br/>
  ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. Some key components of ROUGE for question-answering include:
  - ROUGE-L: Measures the longest common subsequence (LCS) between the candidate and reference answers. It rewards words that appear in the same order, without requiring them to be contiguous.
  - ROUGE-1, ROUGE-2, ROUGE-SU4: Compare unigram, bigram, and skip-bigram (with a gap of up to 4 words, plus unigrams) overlaps between candidate and reference. They focus on recall of key parts/chunks.

  Higher ROUGE scores generally indicate better performance for question answering. As a rough rule of thumb, scores of about 0.70 or higher are considered strong, though what is achievable depends heavily on the task and on the length of the answers.
  When using this metric, preprocessing such as stemming and removing stopwords can raise the measured overlap.
</details>
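
To make the ROUGE definitions above concrete, here is a minimal pure-Python sketch of the ROUGE-1 and ROUGE-L F-measures. It is an illustration only: it tokenizes on whitespace and applies no stemming, so its numbers may differ from those of the `rouge` metric loaded through `evaluate` later in this notebook. The helper names `rouge_1`, `rouge_l` and `lcs_length` are hypothetical and not part of any library.

```python
from collections import Counter


def _f_measure(matches, candidate_len, reference_len):
    """Combine precision and recall into an F1 score (0.0 when nothing matches)."""
    if matches == 0:
        return 0.0
    precision = matches / candidate_len
    recall = matches / reference_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    """ROUGE-1 F-measure: clipped unigram overlap between candidate and reference."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f_measure(overlap, len(cand), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F-measure based on the longest common subsequence."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return _f_measure(lcs_length(cand, ref), len(cand), len(ref))


reference = "the cat sat on the mat"
candidate = "the cat is on the mat"
print("ROUGE-1:", round(rouge_1(candidate, reference), 3))
print("ROUGE-L:", round(rouge_l(candidate, reference), 3))
```

In this toy pair, five of the six words match in order, so both scores come out equal; on longer free-form answers ROUGE-L is usually the lower of the two.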

<br/>


### Prompt Datasets

The utilization of chat prompts during the fine-tuning of a T5 model holds crucial significance due to several inherent advantages associated with the conversational nature of such data. Here is a more detailed explanation of using chat prompts in this context:

1. **Simulation of Human Interaction:** Chat prompts enable the simulation of human interactions, mirroring the dynamics of a real conversation. This approach facilitates the model's learning to generate responses that reflect the fluidity and coherence inherent in human exchanges.

2. **Contextual Awareness:** Chat prompts are essential for capturing contextual nuances in conversations. Each preceding turn of speech influences the understanding and generation of responses. The use of these prompts allows the model to grasp contextual subtleties and adjust its responses accordingly.

3. **Adaptation to Specific Language:** By incorporating chat prompts during fine-tuning, the model can adapt to specific languages, unique conversational styles, and even idiosyncratic expressions. This enhances the model's proficiency in generating responses that align with the particular expectations of end-users.

4. **Diversity in Examples:** Conversations inherently exhibit diversity, characterized by a variety of expressions, tones, and linguistic structures. Chat prompts inject this diversity into the training process, endowing the model with the ability to handle real-world scenarios and adapt to the richness of human interactions.

Using Chat prompts during the fine-tuning of a T5 model represents a potent strategy to enhance its capability in understanding and generating conversational texts. These prompts act as a bridge between training data and real-life situations, thereby strengthening the model's performance in applications such as chatbot response generation, virtual assistant systems, and other natural language processing tasks.

### Model Details

T5 is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a **text-to-text** format.

### Training procedure

Since T5 is a text-to-text model, each example in the dataset is converted into a pair of texts: the input is the task prefix followed by the question (`"Answer this question: " + question`), and the label is the answer sentence.
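
As a minimal sketch of how one training pair is formed in this notebook's preprocessing step: the task prefix defined in the Config cell is prepended to the question to form the input text, and the answer is used as the target text. The helper `build_example` and the sample answer are hypothetical and only serve the illustration.

```python
prefix = "Answer this question: "  # same prefix as in the Config cell


def build_example(question, answer):
    """Return the (input text, target text) pair fed to the text-to-text model."""
    return prefix + question, answer


source, target = build_example(
    "What is Sherlock Holmes' job?",
    "He is a consulting detective.",  # made-up answer, for illustration only
)
print(source)
print(target)
```

Both strings are tokenized later on; only the input carries the prefix, so the model learns to produce the bare answer.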

## Learning Objectives

By the end of this notebook, you will gain expertise in the following areas:

1. Learn how to effectively prepare datasets for training.
2. Understand the process of fine-tuning the T5 model manually, without relying on the Trainer module.
3. Explore the usage of Hugging Face Accelerate to optimize model training and inference.
4. Evaluate the performance of your model using metrics such as ROUGE scores.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Setup

In [None]:
%%capture
!pip install -U torch torchvision evaluate transformers datasets accelerate peft deepspeed rouge-score

In [None]:
from transformers import T5ForConditionalGeneration

# Replace 'your_hub_repo_name' with the name of your repository
hub_repo_name = "your_hub_repo_name"

# Load the fine-tuned T5 model
finetuned_model = T5ForConditionalGeneration.from_pretrained(hub_repo_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


OSError: your_hub_repo_name is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

In [None]:
import torch
assert torch.cuda.is_available()

AssertionError: 

In [None]:
###### General ######
import os
import gc
import platform
import pandas as pd
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

###### Torch ######

import torch
from torch.utils.data import DataLoader

###### Hugging face ######

# Hub
# --
from huggingface_hub import login

# Dataset
# --
from datasets import Dataset, DatasetDict, load_dataset  # load_metric is unused here and no longer available in recent datasets releases
import datasets

# Transformers
# --
from transformers import pipeline, T5ForConditionalGeneration, AutoTokenizer, AutoConfig, default_data_collator, get_linear_schedule_with_warmup, DataCollatorForSeq2Seq, set_seed
import transformers

# PEFT
# --
from peft import LoraConfig, TaskType, get_peft_model, get_peft_model_state_dict, PeftModel
from peft.utils.other import fsdp_auto_wrap_policy
from peft import PeftConfig

# Accelerator
# --
from accelerate import Accelerator

# Rouge score
# --
import evaluate
import nltk

###### Kaggle ######

# Kaggle
# --
from kaggle_secrets import UserSecretsClient

ModuleNotFoundError: No module named 'kaggle_secrets'

#### Config

In [None]:
# Hugging face
# ---
hub_repo_name="YanSte/t5_large_fine_tuning_lora_question_answering_hc3_and_chatgpt_prompts"

# General
# ---
cache_dir = "./cache"
seed = 42

# Model
# ---
model_name_or_path = "google/flan-t5-large"#"google/flan-t5-xxl"

# Data formatting
# ---
prefix = "Answer this question: "# We prefix our task

# Data formatting and tokenization
# ---
tokenizer_input_max_tokens = 256
tokenizer_output_max_tokens = 256
tokenizer_num_proc=16

# DataLoader
# ---
data_loader_batch_size = 4

# Lora
# ---
lora_task_type=TaskType.SEQ_2_SEQ_LM
lora_inference_mode=False
lora_r=8
lora_alpha=32
lora_dropout=0.1

# Model
# ---
lr = 1e-4
num_epochs = 3

def get_peft_model_name_or_path(model_name_or_path, peft_config):
    return f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_")

def get_optimizer(model, lr):
    return torch.optim.AdamW(model.parameters(), lr=lr)


# Pipeline
# ---
pipeline_task = "text2text-generation"
pipeline_min_length=20
pipeline_temperature=0.3
pipeline_max_length=256

In [None]:
login(token=UserSecretsClient().get_secret("HUGGINGFACEHUB_API_TOKEN"))

NameError: name 'UserSecretsClient' is not defined

In [None]:
#os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:1024"

#### Setup

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_seq_items', None)
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.expand_frame_repr', True)

In [None]:
set_seed(seed)

Accelerator

In [None]:
accelerator = Accelerator()

# Or
# Initialize accelerator with config from configs/accelerate_ds_z3.yaml
# accelerator = (
#     Accelerator(log_with=args.report_to, logging_dir=args.output_dir) if args.with_tracking else Accelerator()
# )

Log

In [None]:
datasets.utils.logging.set_verbosity_error()
transformers.utils.logging.set_verbosity_error()

Rouge metric

In [None]:
# Download NLTK punkt tokenizer
nltk.download("punkt", quiet=True)

# Load ROUGE metric
rouge_metric = evaluate.load("rouge")


#### Methods

In [None]:
def accelerator_print_sep(accelerator=accelerator):
    sep = "#" * 12
    accelerator.print(sep)

def clear_gpu_memory():
    """Clear GPU memory by emptying the cache and collecting garbage."""
    torch.cuda.empty_cache()
    gc.collect()

def print_system_specs():
    # Check if CUDA is available
    is_cuda_available = torch.cuda.is_available()
    print("CUDA Available:", is_cuda_available)
    # Get the number of available CUDA devices
    num_cuda_devices = torch.cuda.device_count()
    print("Number of CUDA devices:", num_cuda_devices)
    if is_cuda_available:
        for i in range(num_cuda_devices):
            # Get CUDA device properties
            device = torch.device('cuda', i)
            print(f"--- CUDA Device {i} ---")
            print("Name:", torch.cuda.get_device_name(i))
            print("Compute Capability:", torch.cuda.get_device_capability(i))
            print("Total Memory:", torch.cuda.get_device_properties(i).total_memory, "bytes")
    # Get CPU information
    print("--- CPU Information ---")
    print("Processor:", platform.processor())
    print("System:", platform.system(), platform.release())
    print("Python Version:", platform.python_version())

#### Specs

In [None]:
print_system_specs()

CUDA Available: False
Number of CUDA devices: 0
--- CPU Information ---
Processor: x86_64
System: Linux 6.1.58+
Python Version: 3.10.12


# <b>2 <span style='color:#78D118'>|</span> Fine-Tuning</b>

### Step 1 - Data Preparation

The first step of the fine-tuning process is to identify a specific task and supporting dataset.

We will use two datasets:

- [Hello-SimpleAI/HC3](https://huggingface.co/datasets/Hello-SimpleAI/HC3)
- [MohamedRashad/ChatGPT-prompts](https://huggingface.co/datasets/MohamedRashad/ChatGPT-prompts)

In [None]:
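# Note on naming: `hello_dataset` holds Hello-SimpleAI/HC3,
# while `hc3_dataset` holds MohamedRashad/ChatGPT-prompts.
hello_dataset = load_dataset("Hello-SimpleAI/HC3", name="all")
hc3_dataset = load_dataset("MohamedRashad/ChatGPT-prompts")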

Downloading data:   0%|          | 0.00/39.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/24322 [00:00<?, ? examples/s]

Downloading readme:   0%|          | 0.00/404 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/422k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/360 [00:00<?, ? examples/s]

In [None]:
hc3_dataset

In [None]:
hello_dataset

In [None]:
hello_df, hc3_df = pd.DataFrame(hello_dataset['train']), pd.DataFrame(hc3_dataset['train'])

# Test
# ---
#hello_df, hc3_df = hello_df.iloc[:10], hc3_df.iloc[:10]

In [None]:
questions, reference_answers = [], []

# Process the Hello-SimpleAI/HC3 DataFrame: one training pair per human or ChatGPT answer
for _, row in hello_df.iterrows():

    for reference in row["human_answers"]:
        questions.append(row["question"])
        reference_answers.append(reference)

    for reference in row["chatgpt_answers"]:
        questions.append(row["question"])
        reference_answers.append(reference)

# Process the ChatGPT-prompts DataFrame (named hc3_df in this notebook)
for _, row in hc3_df.iterrows():
    human_prompt = row["human_prompt"]
    chatgpt_response = row["chatgpt_response"]
    questions.append(human_prompt)
    reference_answers.append(chatgpt_response)

# Create a new DataFrame
df = pd.DataFrame()
df["question"] = questions
df["answer"] = reference_answers

# Save to CSV file
df.to_csv("./train.csv", index=False)

In [None]:
df.head()

In [None]:
dataset = Dataset.from_pandas(df, split='train')

Once the data is acquired, it is split into training and held-out sets at a proportion of 70% and 30%, respectively, using the `train_test_split` function. The held-out split is then stored under the `validation` key.

In [None]:
dataset = dataset.train_test_split(shuffle=True, test_size=0.3, seed=seed)
test_ds = dataset.pop("test")
dataset["validation"] = test_ds

In [None]:
dataset

In [None]:
# `dataset` is a DatasetDict, so pick a split before slicing
pd.DataFrame(dataset["train"][:15])

### Step 2 - Model and Tokenizer initialization


In [None]:
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path, cache_dir=cache_dir)

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir)

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

### Step 3 - Data formatting and tokenization



We have a significant amount of data in both training and testing datasets for the fine-tuning process.

Before that, however, we need to process the data to fit the fine-tuning format.

At inference time, the model is called with input in this format:
```
"Answer this question: <USER_QUESTION>"
```
where `<USER_QUESTION>` is the question the user wants answered. To achieve that, we format the training data by prefixing each question with the string `"Answer this question: "`; this is done by the `preprocess_tokenize` function below.

In addition to the formatting, the function also applies the tokenization of the inputs and outputs using the tokenizer function.


In [None]:
def preprocess_tokenize(examples):
    # Prepend the task prefix to every question
    inputs = [prefix + doc for doc in examples["question"]]

    # No padding here: DataCollatorForSeq2Seq pads each batch dynamically
    model_inputs = tokenizer(
        inputs,
        max_length=tokenizer_input_max_tokens,
        truncation=True
    )

    # Tokenize the targets. Leaving them unpadded lets the collator pad the
    # labels with -100, so padded positions are ignored by the loss.
    labels = tokenizer(
        examples["answer"],
        max_length=tokenizer_output_max_tokens,
        truncation=True
    )

    model_inputs["labels"] = labels["input_ids"]

    return model_inputs

In [None]:
pip install --upgrade datasets

In [None]:
# Single-process variant of the tokenization step. Use it instead of the
# multi-process cell below if multiprocessing fails in your environment.
# `dataset` is kept so that the next cell can still be run.
processed_datasets = dataset.map(
    preprocess_tokenize,
    batched=True,
    num_proc=1,  # a single process; no worker pool is spawned
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]

In [None]:
with accelerator.main_process_first():
    processed_datasets = dataset.map(
        preprocess_tokenize,
        batched=True,
        num_proc=tokenizer_num_proc,
        remove_columns=dataset["train"].column_names,
        load_from_cache_file=False,
        desc="Running tokenizer on dataset",
    )


del dataset

train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["validation"]

In [None]:
train_dataset

In [None]:
eval_dataset

### Step 4 - DataLoader

In [None]:
train_dataloader = DataLoader(
    train_dataset,
    shuffle=True,
    collate_fn=data_collator,
    batch_size=data_loader_batch_size,
    pin_memory=True
)

eval_dataloader = DataLoader(
    eval_dataset,
    collate_fn=data_collator,
    batch_size=data_loader_batch_size,
    pin_memory=True
)

#### Cleaning Memory

In [None]:
clear_gpu_memory()

### Step 5 - LoRA


Description of each argument for the `LoraConfig` class in the `peft` module:

1. **`r` (int, default=8):**  
   Dimension/Rank of the LoRA decomposition. For each layer to be trained, the weight update matrix \( \Delta W \) of dimension \( d \times k \) is represented by a low-rank decomposition \( BA \), where \( B \) is a \( d \times r \) matrix and \( A \) is an \( r \times k \) matrix. The rank of the decomposition \( r \) is typically much smaller than the minimum between \( d \) and \( k \). The default value for \( r \) is 8.
   
<img src="https://miro.medium.com/v2/resize:fit:1400/format:webp/1*EnUd1eXLvXCxRZj9NW2BeA.png" alt="Learning" width="50%">

2. **`lora_alpha` (int, default=8):**  
   Alpha parameter for LoRA scaling. According to the LoRA paper, \( \Delta W \) is scaled by \( \alpha / r \), where \( \alpha \) is a constant. When optimizing with Adam, tuning \( \alpha \) is roughly the same as tuning the learning rate if the initialization is scaled appropriately. The default value for \( \alpha \) is 8; this notebook uses 32.

3. **`target_modules` (list of str, default=None):**  
   Modules to apply LoRA to. This is a list of module names, such as "q" (query projection) and "v" (value projection) in T5's attention layers. When left as `None`, PEFT chooses the target modules according to the model architecture; if the architecture is not known to PEFT, the modules must be specified manually.

4. **`lora_dropout` (float, default=0.0):**  
   Dropout probability for the LoRA layers. During training, dropout is applied to the input of the low-rank branch, which helps regularize the adapter. The default value is 0.0 (no dropout); this notebook uses 0.1.

5. **`bias` (str, default="none"):**  
   Bias can be ‘none’, ‘all’ or ‘lora_only’. If ‘all’ or ‘lora_only’, the corresponding biases will be updated during training. Note that in that case, even with the adapters disabled, the model will not produce the same output as the base model would have without adaptation. The default is ‘none’.

6. **`task_type` (str or `TaskType`, default=None):**  
   Task type. This is the type of task for which the model is fine-tuned. Possible options include "SEQ_2_SEQ_LM" (Sequence-to-Sequence Language Model, used here for T5) and "CAUSAL_LM". There is no task-specific default, so this notebook sets `TaskType.SEQ_2_SEQ_LM` explicitly.

These parameters allow customization of the LoRA fine-tuning behavior to fit specific application needs.
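
To see why a small rank matters, the arithmetic behind the `r` and `lora_alpha` arguments above can be sketched as follows. This is a back-of-the-envelope illustration for a single weight matrix, not a measurement of the actual model: the dimensions `d` and `k` are assumed values, and `lora_parameter_counts` is a hypothetical helper.

```python
def lora_parameter_counts(d, k, r):
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k        # parameters updated by ordinary fine-tuning
    lora = r * (d + k)  # B is d x r and A is r x k
    return full, lora


# Assumed dimensions for a single attention projection (illustrative values)
d, k = 1024, 1024
r, lora_alpha = 8, 32  # the values used in this notebook's Config cell

full, lora = lora_parameter_counts(d, k, r)
scaling = lora_alpha / r  # factor applied to the product BA

print(f"Full fine-tuning: {full:,} parameters")
print(f"LoRA (r={r}): {lora:,} parameters ({100 * lora / full:.2f}% of full)")
print(f"Scaling alpha / r = {scaling}")
```

Under these assumptions the adapter trains under 2% of the parameters of the matrix it adapts, and doubling `r` doubles that count while halving the scaling factor for a fixed `lora_alpha`.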

In [None]:
peft_config = LoraConfig(
    r=lora_r,
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    task_type=lora_task_type,
    inference_mode=lora_inference_mode
)

In [None]:
model = get_peft_model(model, peft_config)
accelerator_print_sep()
# print_trainable_parameters() prints its summary itself and returns None
if accelerator.is_main_process:
    model.print_trainable_parameters()
accelerator_print_sep()

### Step 6 - Learning Rate Scheduler

In [None]:
# Setup optimizer
optimizer = get_optimizer(model, lr)

In [None]:
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
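
The shape of a linear schedule with warmup can be sketched in plain Python. This is a simplified re-implementation for illustration, not the library code: `linear_schedule_factor` is a hypothetical helper, and `total_steps` is a made-up stand-in for `len(train_dataloader) * num_epochs`.

```python
def linear_schedule_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 1e-4      # `lr` in the Config cell
total_steps = 1000  # hypothetical value of len(train_dataloader) * num_epochs

for step in (0, 250, 500, 1000):
    factor = linear_schedule_factor(step, num_warmup_steps=0, num_training_steps=total_steps)
    print(f"step {step:>4}: lr = {base_lr * factor:.2e}")
```

With `num_warmup_steps=0`, as in this notebook, the learning rate starts at its full value and decays linearly to zero by the last training step.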

In [None]:
if getattr(accelerator.state, "fsdp_plugin", None) is not None:
    accelerator.state.fsdp_plugin.auto_wrap_policy = fsdp_auto_wrap_policy(model)

### Step 7 -  Accelerator setup

In [None]:
model, train_dataloader, eval_dataloader, optimizer, lr_scheduler = accelerator.prepare(
    model,
    train_dataloader,
    eval_dataloader,
    optimizer,
    lr_scheduler
)

### DeepSpeed

This code checks whether training is running with DeepSpeed ZeRO stage 3 (`zero_stage == 3`). DeepSpeed is a library that optimizes model training on distributed architectures, particularly across multiple GPUs. Its ZeRO (Zero Redundancy Optimizer) stages partition the optimizer states, the gradients and, at stage 3, the model parameters themselves across devices.

Detailed explanations:
- `accelerator`: The `Accelerator` object from the Hugging Face Accelerate library, created earlier in this notebook. It manages device placement and distributed training.
- `accelerator.state`: Accesses the internal state of the accelerator object.
- `accelerator.state.deepspeed_plugin`: If DeepSpeed is in use, this accesses the object representing the DeepSpeed plugin within the accelerator state.
- `accelerator.state.deepspeed_plugin.zero_stage`: Accesses the configured ZeRO stage.

In [None]:
is_ds_zero_3 = False
if getattr(accelerator.state, "deepspeed_plugin", None):
    is_ds_zero_3 = accelerator.state.deepspeed_plugin.zero_stage == 3

### Step 8 - Train

In [None]:
def train_epoch(model, train_dataloader, optimizer, lr_scheduler, accelerator):
    """
    Train the model for one epoch.
    """
    model.train()
    total_loss = 0

    for step, batch in enumerate(tqdm(train_dataloader, disable=not accelerator.is_main_process)):
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        accelerator.backward(loss)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    return total_loss / len(train_dataloader)

def evaluate_epoch(model, eval_dataloader, tokenizer, accelerator):
    """
    Evaluate the model on the validation dataset for one epoch.
    """
    model.eval()
    eval_loss = 0
    eval_preds = []

    for step, batch in enumerate(tqdm(eval_dataloader, disable=not accelerator.is_main_process)):
        with torch.no_grad():
            outputs = model(**batch)

        loss = outputs.loss
        eval_loss += loss.detach().float()
        preds = accelerator.gather_for_metrics(torch.argmax(outputs.logits, -1)).detach().cpu().numpy()
        eval_preds.extend(tokenizer.batch_decode(preds, skip_special_tokens=True))

    return eval_loss / len(eval_dataloader), eval_preds

def save_model(model, peft_model_name_or_path, accelerator, hub_repo_name):
    """
    Save the model to the specified path.
    """
    accelerator.wait_for_everyone()
    model.save_pretrained(peft_model_name_or_path)
    tokenizer.save_pretrained(peft_model_name_or_path)
    model.push_to_hub(hub_repo_name)
    tokenizer.push_to_hub(hub_repo_name)
    accelerator.wait_for_everyone()

def plot_loss(train_losses, eval_losses):
    """
    Plot training and validation losses.
    """
    plt.figure(figsize=(10, 6))

    # Move tensors to CPU
    train_losses_cpu = [loss.cpu().item() for loss in train_losses]
    eval_losses_cpu = [loss.cpu().item() for loss in eval_losses]

    epochs = range(1, len(train_losses_cpu) + 1)

    plt.plot(epochs, train_losses_cpu, label='Training Loss')
    plt.plot(epochs, eval_losses_cpu, label='Validation Loss')

    plt.title('Training and Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()

In [None]:
peft_model_name_or_path = get_peft_model_name_or_path(model_name_or_path, peft_config)

train_losses = []
eval_losses = []
rouge_scores = []

# Main training loop
for epoch in range(num_epochs):
    # Train
    # ---
    accelerator_print_sep()
    accelerator.print(f"Train epoch: {epoch=}")

    train_loss = train_epoch(model, train_dataloader, optimizer, lr_scheduler, accelerator)
    train_ppl = torch.exp(train_loss)

    accelerator_print_sep()
    accelerator.print(f"{epoch=}: {train_ppl=} {train_loss=}")

    train_losses.append(train_loss)

    # Eval
    # ---
    accelerator_print_sep()
    accelerator.print(f"Eval epoch: {epoch=}")

    eval_loss, eval_preds = evaluate_epoch(model, eval_dataloader, tokenizer, accelerator)
    eval_ppl = torch.exp(eval_loss)

    accelerator_print_sep()
    accelerator.print(f"{epoch=}: {eval_ppl=} {eval_loss=}")

    eval_losses.append(eval_loss)

    # Save
    # ---
    accelerator_print_sep()
    accelerator.print(f"Save epoch: {epoch=}")
    save_model(model, peft_model_name_or_path, accelerator, hub_repo_name)

#### Plot

In [None]:
plot_loss(train_losses, eval_losses)

#### Cleaning Memory

In [None]:
del model
del train_losses
del eval_losses
del rouge_scores
del lr_scheduler
del optimizer
del peft_config
del train_dataloader
del eval_dataloader
del tokenizer
del data_collator

clear_gpu_memory()

# <b>3 <span style='color:#78D118'>|</span> Performance Evaluation</b>

### Step 1 - Load model and apply PEFT

In [None]:
# Local Load
# ----
# Load Model
#tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
#model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)

# Apply PEFT
#finetuned_model = PeftModel.from_pretrained(model, peft_model_name_or_path, device_map={"":0})

# Hub Load
# ----
tokenizer = AutoTokenizer.from_pretrained(hub_repo_name)
finetuned_model = T5ForConditionalGeneration.from_pretrained(hub_repo_name)

In [None]:
finetuned_model.eval()

### Step 2 - Pipeline

In [None]:
text_generation_pipeline = pipeline(
    task=pipeline_task,
    model=finetuned_model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=pipeline_max_length,
    min_length=pipeline_min_length,
    temperature=pipeline_temperature,
    device=0 # Set device to 0 for GPU, -1 for CPU
)

### Step 3 - Evaluation

#### Shot Evaluation

In [None]:
questions = [

    ## Linked questions
    "How do companies profit from war?",
    "Why do U.S. companies rebuild towns after wars?",
    "Is there a World War Hulk movie in production?",
    "What is the concept of paying off the principal of a home vs. investing in a mutual fund?",
    "I'm making real money for the first time, what should I do with it?",
    "How are the first days of each season chosen?",
    "Why are laws requiring identification for voting scrutinized by the media?",
    "Why aren't there many new operating systems being created?",
    "How can schools keep students after the normal school day?",
    "What are the duties of military personnel stationed in peaceful countries?",
    "Why did the world's magnetic field reverse 780,000 years ago?",
    "What is Sherlock Holmes' job?",
    "Explain the fifthworldproblems subreddit like I'm five.",
    "Why were Shaq and Kobe seen as rivals despite winning three finals in a row?",
    "If a filter-feeding whale swallows a turtle, can it digest it normally?",
    "Why does the pubic region have darker skin than the rest of the body?",
    "How does the Military-Industrial Complex work?",
    "Why is there no World War Hulk movie in production?",
    "What factors should be considered when deciding to pay off a home or invest in a mutual fund?",
    "What is a Roth IRA, and why is it a good idea to open one?",

    ## Generic questions
    "How do ants decide where to build their colonies?",
    "What would happen if all the bees disappeared from the Earth?",
    "Can robots ever have feelings like humans?",
    "If animals could talk, which species do you think would be the most chatty?",
    "What would a world without gravity be like?",
    "If you could time travel, would you go to the past or the future?",
    "How do plants know when it's time to bloom?",
    "If you could have any superpower, what would it be and why?",
    "What if our dreams were actually glimpses of alternate realities?",
    "How do birds know where to migrate each year?",
    "If you could create a new color, what would you name it?",
    "What if everyone on Earth spoke the same language?",
    "How would life be different if humans had tails?",
    "If you could be any fictional character for a day, who would you choose?",
    "What if we discovered a parallel universe right next to ours?",
    "How do animals in the wild know which plants are safe to eat?",
    "If you could design a new planet, what features would it have?",
    "What if we could communicate with dolphins?",
    "How do clouds decide when to rain?",
    "If you could swap lives with any person for a week, who would it be and why?"
]

In [None]:
transformed_questions = [prefix + question for question in questions]

generated_texts = text_generation_pipeline(transformed_questions, do_sample=True)

In [None]:
predictions = [output_text["generated_text"] for output_text in generated_texts]
questions = [question.split(":")[-1].strip() for question in questions]
data = []

for question, prediction in zip(questions, predictions):
    data.append({
        "Input": question,
        "Output": prediction
    })

df = pd.DataFrame(data)

display(df)

#### Rouge Evaluation

In [None]:
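# `df` now holds the shot-evaluation table, so reload the question/answer
# pairs saved to ./train.csv during data preparation before sampling
questions = pd.read_csv("./train.csv").sample(n=50, random_state=seed)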

In [None]:
questions.head()

In [None]:
reference_answers = questions['answer']
questions = [prefix + question for question in questions['question']]

In [None]:
questions[:1]

In [None]:
reference_answers[:1]

In [None]:
generated_texts = text_generation_pipeline(questions, do_sample=True)

Note: The minimum generation length is set to 20 tokens in the pipeline, so ROUGE scores may be lower for questions whose reference answers are very short.

In [None]:
predictions = [output_text["generated_text"] for output_text in generated_texts]
data = []

for question, reference, prediction in zip(questions, reference_answers, predictions):

    rouge_result = rouge_metric.compute(
        predictions=[prediction],
        references=[reference]
    )

    data.append({
        # Strip only the task prefix; splitting on ":" would cut questions that contain a colon
        "Input": question.removeprefix(prefix).strip(),
        "Reference": reference,
        "Output": prediction,
        "Rouge-1 Score": rouge_result['rouge1'],
        "Rouge-2 Score": rouge_result['rouge2'],
        "Rouge-L Score": rouge_result['rougeL'],
        "Rouge-Lsum Score": rouge_result['rougeLsum'],
    })

df_result = pd.DataFrame(data)

display(df_result)