# **Prompt Tuning for Sequence Classification**

In this example we will fine-tune *RoBERTa Large* to classify a sequence of tokens. For this purpose, we will use a PEFT method called **Prompt Tuning**, which prepends a trainable embedding matrix to the input embeddings. We will use **transformers** to download the model and run training, **datasets** to download the data, **peft** to initialize the Prompt Tuning model, **evaluate** to load evaluation metrics, and (optionally) **wandb** (Weights & Biases) to log the results.

You can also open this example in Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Wicwik/peft_tutorial/blob/main/examples/pt_classification.ipynb)

### **0. Install and import required modules**

In [None]:
# 4.36.0 for compatibility with adapters
%pip install -q --user transformers==4.36.0
%pip install -q --user datasets
%pip install -q --user peft
%pip install -q --user evaluate
# %pip install -q --user wandb

In [None]:
import torch
# import wandb
import evaluate

from peft import (
    get_peft_model,
    PromptTuningConfig,
    TaskType,
    PromptTuningInit,
)
from transformers import ( 
    AutoModelForSequenceClassification, 
    AutoTokenizer, 
    TrainingArguments,
    Trainer,
    default_data_collator
)

from datasets import load_dataset

### **1. Set variables**

We will be fine-tuning the pre-trained version of model [roberta-large](https://huggingface.co/FacebookAI/roberta-large) which has **355M** parameters. We will set the max **input length to 128** tokens and train for **3 epochs** with **batch size of 32**. 

For Prompt Tuning we will also set the *num_virtual_tokens* variable, which represents the length of the soft prompt.

In [None]:
device = "cuda"
model_name_or_path = "roberta-large"
tokenizer_name_or_path = "roberta-large"

max_length = 128
lr = 1e-3
num_epochs = 3
batch_size = 32  # if you hit "unable to allocate" (out-of-memory) errors, decrease the batch size (e.g. 8 or 16)
num_virtual_tokens = 10
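
To make the role of *num_virtual_tokens* concrete, here is a minimal sketch in plain Python of what prepending a soft prompt does to a single input sequence. The helper name `prepend_soft_prompt` and the toy hidden size are illustrative assumptions; PEFT performs the equivalent step on tensors inside the model.

```python
# Illustrative sketch only: plain lists stand in for embedding tensors.
def prepend_soft_prompt(soft_prompt, token_embeddings, attention_mask):
    """Put the trainable prompt vectors in front of the token embeddings."""
    extended_embeddings = soft_prompt + token_embeddings
    # virtual tokens are always attended to, so their mask entries are 1
    extended_mask = [1] * len(soft_prompt) + attention_mask
    return extended_embeddings, extended_mask


hidden_size = 4          # toy value; roberta-large uses 1024
num_virtual_tokens = 10  # same value as in this tutorial
seq_len = 128            # max_length from this tutorial

soft_prompt = [[0.0] * hidden_size for _ in range(num_virtual_tokens)]
token_embeddings = [[1.0] * hidden_size for _ in range(seq_len)]
attention_mask = [1] * seq_len

embeddings, mask = prepend_soft_prompt(soft_prompt, token_embeddings, attention_mask)
print(len(embeddings), len(mask))  # 138 138
```

With these settings the encoder sees 138 positions per example, which is still well below the 512-token limit of RoBERTa.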

### **2. Create PEFT model**

Next we will create the PEFT model. The Hugging Face PEFT library will freeze the base model weights and add the prompt encoder automatically.

We are also passing the *prompt_tuning_init_text* parameter to initialize the soft prompt from a text, which also requires the model's tokenizer to be specified. In this case the init text is converted to tokens, and those tokens are then used to look up embeddings in the model's vocabulary. These embeddings are then used to initialize the prompt encoder weights. Since we will be working with paraphrases, we would like the model to determine whether one sentence is a paraphrase of the other.
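
The init text rarely tokenizes to exactly *num_virtual_tokens* tokens. The sketch below shows one way the token ids can be fitted to the prompt length, by repeating a short text and truncating a long one. It reflects our understanding of how PEFT handles text initialization; the function name `fit_to_prompt_length` and the token ids are hypothetical, and the library's actual code may differ.

```python
import math


def fit_to_prompt_length(init_token_ids, num_virtual_tokens):
    """Repeat or truncate the init-text token ids to exactly num_virtual_tokens."""
    if not init_token_ids:
        raise ValueError("the init text must produce at least one token")
    repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repeats)[:num_virtual_tokens]


# hypothetical token ids, not real roberta-large vocabulary ids
short_text_ids = [101, 102, 103]
long_text_ids = list(range(200, 212))

print(fit_to_prompt_length(short_text_ids, 10))
print(fit_to_prompt_length(long_text_ids, 10))
```

The embeddings of the resulting ids would then serve as the starting values of the soft prompt.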

Compare the model architectures with and without the added prompt encoder weights.

In [None]:
peft_config = PromptTuningConfig(
    task_type=TaskType.SEQ_CLS,
    num_virtual_tokens=num_virtual_tokens,
    prompt_tuning_init=PromptTuningInit.TEXT,
    tokenizer_name_or_path=tokenizer_name_or_path,
    prompt_tuning_init_text="Is the meaning of these sentences equivalent:",
)


model = AutoModelForSequenceClassification.from_pretrained(model_name_or_path, return_dict=True)

# comment out the next 2 lines if you want to do full fine-tuning (FFT)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
model

We can also see that we have reduced the number of trainable parameters to a mere **0.3% of the original model parameters**.
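
The figure can be checked with a little arithmetic. This is a rough sketch that assumes the trainable parameters are the soft prompt plus a classification head made of one dense layer and one output projection (our understanding of how PEFT treats sequence classification models); compare it with the output of *print_trainable_parameters* above.

```python
hidden_size = 1024          # roberta-large hidden size
num_virtual_tokens = 10     # soft prompt length from this tutorial
num_labels = 2              # MRPC is a binary task
base_params = 355_000_000   # approximate model size quoted in this tutorial

# one trainable vector of size hidden_size per virtual token
prompt_params = num_virtual_tokens * hidden_size
# assumption: head = dense (hidden x hidden + bias) + output projection (hidden x labels + bias)
head_params = (hidden_size * hidden_size + hidden_size) + (hidden_size * num_labels + num_labels)

trainable = prompt_params + head_params
trainable_pct = 100 * trainable / base_params
print(prompt_params, head_params, f"{trainable_pct:.2f}%")
```

Note that under this assumption most of the trainable parameters belong to the classification head, while the soft prompt itself contributes only about ten thousand weights.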

### **3. Dataset and preprocessing**

The dataset that we will be using is the [Microsoft Research Paraphrase Corpus (MRPC)](https://huggingface.co/datasets/nyu-mll/glue), which is part of the GLUE benchmark. It is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in each pair are semantically equivalent. The dataset contains 5.8k samples, of which **3.67k are used for training**. We don't need to create any splits, since the dataset is fully annotated and already contains train, validation and test sets.

In [None]:
# the dataset already has a usable test split, so we don't need to create one
dataset = load_dataset("glue", "mrpc")
dataset["train"][0]

Now we will tokenize the dataset. We don't need to tokenize the labels, because they are already integer class ids and the model is trained to output one score (logit) per class.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

def preprocess_function(examples):
    model_inputs = tokenizer(examples["sentence1"], examples["sentence2"], max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
    return model_inputs

processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    remove_columns=["sentence1", "sentence2", "idx"],
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

processed_datasets = processed_datasets.rename_column("label", "labels")

train_dataset = processed_datasets["train"].shuffle()
eval_dataset = processed_datasets["validation"]
test_dataset = processed_datasets["test"]
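
To illustrate what the tokenizer call above produces for a sentence pair, here is a simplified sketch of the RoBERTa pair layout `<s> A </s></s> B </s>` with padding to a fixed length. The function `encode_pair` and the token ids of the two sentences are hypothetical, and truncation is deliberately left out; the real tokenizer handles it for us.

```python
BOS_ID, PAD_ID, EOS_ID = 0, 1, 2  # RoBERTa special token ids


def encode_pair(ids_a, ids_b, max_length):
    """Build <s> A </s></s> B </s>, then pad to max_length (truncation omitted)."""
    input_ids = [BOS_ID] + ids_a + [EOS_ID, EOS_ID] + ids_b + [EOS_ID]
    if len(input_ids) > max_length:
        raise ValueError("pair too long; this sketch does not implement truncation")
    attention_mask = [1] * len(input_ids)
    padding = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * padding,
        "attention_mask": attention_mask + [0] * padding,
    }


# hypothetical token ids for two short sentences
encoded = encode_pair([10, 11], [20], max_length=8)
print(encoded["input_ids"])       # [0, 10, 11, 2, 2, 20, 2, 1]
print(encoded["attention_mask"])  # [1, 1, 1, 1, 1, 1, 1, 0]
```

The attention mask marks padded positions with 0 so that the model ignores them.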

### **4. Training and evaluation**

For training we use the Hugging Face [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) and provide it with [TrainingArguments](https://huggingface.co/docs/transformers/v4.38.2/en/main_classes/trainer#transformers.TrainingArguments). The trainer also takes a *compute_metrics* function that will be used to compute metrics during evaluation.

For the GLUE MRPC dataset, *evaluate* computes F1 and accuracy.

In [None]:
metric = evaluate.load("glue", "mrpc")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    preds = preds.argmax(axis=1)

    return metric.compute(predictions=preds, references=labels)

training_args = TrainingArguments(
    "out",
    per_device_train_batch_size=batch_size,
    learning_rate=lr,
    num_train_epochs=num_epochs,
    evaluation_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="no",
)
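
For reference, the two metrics reported for MRPC can be written out by hand. The sketch below is a plain-Python illustration on toy data, with the positive class taken to be label 1; the function `accuracy_and_f1` is an illustrative helper and is not part of *evaluate*.

```python
def accuracy_and_f1(predictions, references):
    """Accuracy and binary F1 (positive class = 1) from two lists of 0/1 labels."""
    pairs = list(zip(predictions, references))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # true positives
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false positives
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # false negatives
    accuracy = sum(1 for p, r in pairs if p == r) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "f1": f1}


# toy predictions and labels, not real model output
toy_preds = [1, 0, 1, 1, 0, 1]
toy_labels = [1, 0, 0, 1, 1, 1]
scores = accuracy_and_f1(toy_preds, toy_labels)
print(scores)
```

In this toy case four of six predictions are correct, and precision and recall are both 3/4, so F1 is 0.75.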

Now we will run training and evaluation. Take a quick look at the GPU memory usage: how much are we using? How would the memory usage change if we did full fine-tuning (FFT) instead?

In [None]:

trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()

trainer.evaluate(eval_dataset=test_dataset, metric_key_prefix="test")

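# the wandb import in the first cell is commented out, so this would raise a NameError;
# uncomment these lines together with the import if you log to Weights & Biases
# if wandb.run is not None:
#     wandb.finish()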
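
As a rough aid for the memory question above, the following back-of-the-envelope sketch estimates the memory taken by weights, gradients and optimizer state only. It assumes fp32 training with an Adam-style optimizer that keeps two moments per trainable parameter, and it ignores activations, which often dominate in practice and which Prompt Tuning still needs because gradients flow back through the frozen model. All numbers are estimates, not measurements.

```python
BYTES_FP32 = 4
base_params = 355_000_000   # approximate model size quoted in this tutorial
trainable_pt = 1_061_890    # assumed: 10 x 1024 prompt weights plus the classification head


def training_state_bytes(total_params, trainable_params):
    """Weights for all parameters; gradients and two Adam moments for trainable ones."""
    weights = total_params * BYTES_FP32
    grads_and_optimizer = trainable_params * BYTES_FP32 * 3
    return weights + grads_and_optimizer


fft_gb = training_state_bytes(base_params, base_params) / 1e9
pt_gb = training_state_bytes(base_params, trainable_pt) / 1e9
print(f"FFT: {fft_gb:.2f} GB, Prompt Tuning: {pt_gb:.2f} GB (activations excluded)")
```

Under these assumptions Prompt Tuning saves mostly on gradients and optimizer state, so the measured difference on a real GPU will be smaller than this ratio suggests.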

### **5. Save and load**

Now we can save the model with the *save_pretrained* method (as we would for other Hugging Face transformers models).

In [None]:
# workaround for locale errors that can appear in Colab when running shell commands
import locale
locale.getpreferredencoding = lambda: "UTF-8"

peft_model_id = f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}"
model.save_pretrained(peft_model_id)

ckpt = f"{peft_model_id}/adapter_model.safetensors"
!du -h $ckpt

We can now load the pre-trained model and give it a custom example.

Notice that we have saved the last version of the model. In a more realistic scenario, we would want to save the model with the best validation score and load it at the end of training. We can do this with the training arguments.
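
One possible way to keep the best checkpoint is sketched below as keyword arguments for *TrainingArguments*, written for the transformers 4.36.0 version pinned in this tutorial. The name `best_model_kwargs` and the choice of F1 as the selection metric are assumptions made for illustration.

```python
# Keyword arguments meant to replace evaluation_strategy and save_strategy="no"
# in the TrainingArguments call used earlier in this tutorial.
best_model_kwargs = dict(
    evaluation_strategy="epoch",   # evaluate after every epoch
    save_strategy="epoch",         # must match the evaluation strategy
    load_best_model_at_end=True,   # reload the best checkpoint when training ends
    metric_for_best_model="f1",    # assumption: select on validation F1
    greater_is_better=True,        # a higher F1 is better
    save_total_limit=1,            # limit how many checkpoints stay on disk
)

# training_args = TrainingArguments("out", ..., **best_model_kwargs)
print(best_model_kwargs)
```

With these settings the trainer writes a checkpoint per epoch, so expect some extra disk usage compared with `save_strategy="no"`.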

In [None]:
from peft import PeftModel, PeftConfig

peft_model_id = f"{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}"

config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSequenceClassification.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(model, peft_model_id)

In [None]:
inputs = tokenizer("this is an apple", "this is a fruit", return_tensors="pt")
print(inputs)
with torch.no_grad():
    outputs = model(**inputs)
    print(outputs.logits.argmax(axis=1))
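
The printed class index can be mapped back to a readable label. In GLUE MRPC, class 0 is `not_equivalent` and class 1 is `equivalent`. The sketch below uses hypothetical logits instead of real model output and a small hand-written softmax to show the conversion to probabilities.

```python
import math

LABEL_NAMES = ["not_equivalent", "equivalent"]  # GLUE MRPC class names


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [-0.3, 1.2]  # hypothetical values, not real model output
probs = softmax(example_logits)
predicted = LABEL_NAMES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```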