# PEFT - Prompt Tuning (SST2)

This notebook explore prompt tuning techniques OPT 1.3b on SST-2 dataset.

The Stanford Sentiment Treebank (SST) is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

Binary classification experiments on full sentences (negative or somewhat negative vs somewhat positive or positive with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.

Adapted from Hugging face tutorial:
https://huggingface.co/docs/peft/main/en/task_guides/ptuning-seq-classification
And Databricks course:
https://github.com/databricks-academy/llm-foundation-models/tree/published/LLM%2002%20-%20PEFT

> Add blockquote



In [1]:
%pip install peft==0.4.0 torch datasets

Collecting peft==0.4.0
  Downloading peft-0.4.0-py3-none-any.whl.metadata (21 kB)
Collecting datasets
  Downloading datasets-2.20.0-py3-none-any.whl.metadata (19 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
  Using cached nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
  Using cached nvidia_cufft_cu

In [2]:
!pip install -U bitsandbytes

Collecting bitsandbytes
  Downloading bitsandbytes-0.43.2-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading bitsandbytes-0.43.2-py3-none-manylinux_2_24_x86_64.whl (137.5 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m137.5/137.5 MB[0m [31m8.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: bitsandbytes
Successfully installed bitsandbytes-0.43.2


In [3]:
!pip install --upgrade peft

Collecting peft
  Downloading peft-0.12.0-py3-none-any.whl.metadata (13 kB)
Downloading peft-0.12.0-py3-none-any.whl (296 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m296.4/296.4 kB[0m [31m7.7 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: peft
  Attempting uninstall: peft
    Found existing installation: peft 0.4.0
    Uninstalling peft-0.4.0:
      Successfully uninstalled peft-0.4.0
Successfully installed peft-0.12.0


In [4]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    DataCollatorForLanguageModeling,
    DataCollatorWithPadding,
    BitsAndBytesConfig,
)
from datasets import load_dataset

In [5]:
model_name = "facebook/opt-1.3b"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
if getattr(tokenizer, "pad_token_id") is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Create a BitsAndBytesConfig object with the desired quantization settings
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,  # or load_in_8bit=True, as needed
    load_in_8bit=False  # if using 4-bit, set this to False
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=quantization_config,
    num_labels=2,
    id2label={"0": "Positive", "1": "Negative"},
    label2id={"Positive": "0", "Negative": "1"},
)

Before doing any fine-tuning, we will ask the model to generate a new phrase to the following input sentence.

In [7]:
# input1 = tokenizer("Two things are infinite: ", return_tensors="pt")

In [8]:
# foundation_outputs = foundation_model.generate(
#     input_ids=input1["input_ids"],
#     attention_mask=input1["attention_mask"],
#     max_new_tokens=7,
#     eos_token_id=tokenizer.eos_token_id
#     )
# print(tokenizer.batch_decode(foundation_outputs, skip_special_tokens=True))

In [9]:
# Function to perform inference
def run_inference(text, tokenizer, model, device):
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Run the input through the model
    with torch.no_grad():
        outputs = model(**inputs)

    # Get the logits and apply softmax to get probabilities
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)

    return probabilities

In [15]:
# Ensure the models are on the correct device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model.to(device)

In [9]:
input_sentence = "It was a bad movie."

examples = [
        "Example: This is the worst movie I have ever seen. I would never recommend it to anyone.\nClassification: Negative",
        "Example: I hate this product. It broke after one use. Classification: Negative",
        "Example: I love this product! It works exactly as advertised. Classification: Positive",
        "Example: This is the best movie I have ever seen. I will watch it again. Classification: Positive"
    ]

prompt = "\n".join(examples) + f"\nInput sentence: {input_sentence}\nClassification:"

# Classify the sentiment of the following sentence as positive or negative.
# Input sentence: holden caulfield did it better .
# """

# Run inference using the teacher model
model_prob = run_inference(input_sentence, tokenizer, model, device)
print("Model Probabilities (on input sentence):", model_prob)
model_prob = run_inference(prompt, tokenizer, model, device)
print("Model Probabilities (on prompt):", model_prob)

Model Probabilities (on input sentence): tensor([[0.6234, 0.3766]], device='cuda:0')
Model Probabilities (on prompt): tensor([[0.9276, 0.0724]], device='cuda:0')


The output is not too bad. However, the dataset OPT is pre-trained is not for a classification task. Therefore, we are going to fine-tune `opt1.3b` on [a dataset called `SST2` containing pairs of sentences and their labels either being Positive or Negative.

In [10]:
dataset_id="glue"
dataset_config="sst2"

In [11]:
dataset = load_dataset(dataset_id, dataset_config)
# dataset = load_dataset('stanfordnlp/sst2')
dataset

Downloading readme:   0%|          | 0.00/35.3k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/3.11M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/72.8k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/148k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/67349 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/872 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1821 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 67349
    })
    validation: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 872
    })
    test: Dataset({
        features: ['sentence', 'label', 'idx'],
        num_rows: 1821
    })
})

In [12]:
def tokenize_function(examples):
    # max_length=None => use the model max length (it's actually the default)
    outputs = tokenizer(examples["sentence"], truncation=True, max_length=512)
    return outputs

In [13]:
tokenized_datasets = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=["idx"],
)

tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

Map:   0%|          | 0/67349 [00:00<?, ? examples/s]

Map:   0%|          | 0/872 [00:00<?, ? examples/s]

Map:   0%|          | 0/1821 [00:00<?, ? examples/s]

In [14]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="longest")

In [15]:
# data = dataset.map(lambda samples: tokenizer(samples["sentence"], padding=True, truncation=True, max_length=512), batched=True)
# data.set_format('torch')
# data["train"] = data["train"].rename_column("label", "labels")
# train_sample = data["train"].select(range(500))
# display(train_sample)

In [None]:
# data["validation"].set_format(type='torch', columns=['sentence', 'label', 'idx', 'input_ids', 'attention_mask'])

# dataloader = torch.utils.data.DataLoader(data["validation"], batch_size=32)
# next(iter(dataloader))

In [None]:
# valid_sample = data["validation"].select(range(100))
# valid_sample = valid_sample.rename_column("label", "labels")
# display(valid_sample)

Onto fine-tuning: define PEFT configurations for random initialization

Recall that prompt tuning allows both random and initialization of soft prompts or also known as virtual tokens. We will compare the model outputs from both initialization methods later. For now, we will start with random initialization, where all we provide is the length of the virtual prompt.
API docs:
* [PromptTuningConfig](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig)
* [PEFT model](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig)

In [16]:
from peft import  get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit, PromptEncoderConfig

In [17]:
# peft_config = PromptTuningConfig(
#     task_type=TaskType.CAUSAL_LM|SEQ_CLS,
#     prompt_tuning_init=PromptTuningInit.RANDOM,
#     num_virtual_tokens=4,
#     tokenizer_name_or_path=model_name
# )
# peft_model = get_peft_model(foundation_model, peft_config)
# print(peft_model.print_trainable_parameters())

In [18]:
peft_config = PromptEncoderConfig(task_type="SEQ_CLS", num_virtual_tokens=4, encoder_hidden_size=128)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()

trainable params: 1,093,888 || all params: 6,659,576,064 || trainable%: 0.0164


That's the beauty of PEFT! It allows us to drastically reduce the number of trainable parameters. Now, we can proceed with using [HuggingFace's `Trainer` class](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) and its [`TrainingArguments` to define our fine-tuning configurations](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments).

The `Trainer` class provides user-friendly abstraction to leverage PyTorch under the hood to conduct training.

In [8]:
import numpy as np
from datasets import load_metric

# define metrics and metrics function
accuracy_metric = load_metric( "accuracy")

def compute_metrics(eval_pred):

    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    acc = accuracy_metric.compute(predictions=predictions, references=labels)
    return {
        "accuracy": acc["accuracy"] if "accuracy" in acc else 0,
    }

  accuracy_metric = load_metric( "accuracy")


Downloading builder script:   0%|          | 0.00/1.65k [00:00<?, ?B/s]

The repository for accuracy contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/accuracy.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


COMMAND ----------

We will also use `Data Collator` to help us form batches of inputs to pass in to the model for training. Go [here](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#data-collator) for documentation.

Specifically, we will be using `DataCollatorforLanguageModeling` which will additionally pad the inputs to the maximum length of a batch since the inputs can have variable lengths. Refer to [API docs here](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling).

Note: This cell might take ~10 mins to train. **Decrease `num_train_epochs` above to speed up the training process.** On another hand, you might notice that this cells triggers a whole new MLflow run. [MLflow](https://mlflow.org/docs/latest/index.html) is an open source tool that helps to manage end-to-end machine learning lifecycle, including experiment tracking, ML code packaging, and model deployment. You can read more about [LLM tracking here](https://mlflow.org/docs/latest/llm-tracking.html).

In [20]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="jmokdad3/opt-large-peft-p-tuning",
    learning_rate=1e-3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)



In [21]:
train_dataset = tokenized_datasets["train"].select(range(500))
eval_dataset = tokenized_datasets["validation"].select(range(100))

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

In [22]:
# import os
# os.environ["CUDA_LAUNCH_BLOCKING"]="1"

# Train the model
trainer.train()

OPTForSequenceClassification will not detect padding tokens in `inputs_embeds`. Results may be unexpected if using padding tokens in conjunction with `inputs_embeds.`


Epoch,Training Loss,Validation Loss,Accuracy
1,No log,,0.48
2,No log,,0.48
3,No log,,0.48
4,No log,,0.48
5,No log,,0.48
6,No log,,0.48
7,No log,,0.48
8,0.350800,,0.48
9,0.350800,,0.48
10,0.350800,,0.48


TypeError: 'method' object is not subscriptable

In [19]:
from torch.utils.data import DataLoader
import torch
import numpy as np

# Ensure the model is in evaluation mode
peft_model.eval()

data_collator = DataCollatorWithPadding(tokenizer)

# Prepare DataLoader for the validation set
valid_dataloader = DataLoader(valid_sample, batch_size=16, collate_fn=data_collator)

# Initialize variables to track accuracy
correct_predictions = 0
total_predictions = 0

# Loop over the validation data
for batch in valid_dataloader:
    inputs = {k: v.to(device) for k, v in batch.items()}
    labels = inputs.pop("labels")
    labels = inputs.pop("idx")

    with torch.no_grad():
        outputs = peft_model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=-1)

    # Update the number of correct predictions
    correct_predictions += (predictions == labels).sum().item()
    total_predictions += labels.size(0)

# Calculate accuracy
accuracy = correct_predictions / total_predictions
print(f"Validation Accuracy: {accuracy:.4f}")

Validation Accuracy: 0.0000


In [None]:
input_sentence = "It was a bad movie."

examples = [
        "Example: This is the worst movie I have ever seen. I would never recommend it to anyone.\nClassification: Negative",
        "Example: I hate this product. It broke after one use. Classification: Negative",
        "Example: I love this product! It works exactly as advertised. Classification: Positive",
        "Example: This is the best movie I have ever seen. I will watch it again. Classification: Positive"
    ]

prompt = "\n".join(examples) + f"\nInput sentence: {input_sentence}\nClassification:"

# Classify the sentiment of the following sentence as positive or negative.
# Input sentence: holden caulfield did it better .
# """

# Run inference using the model
model_prob = run_inference(input_sentence, tokenizer, model, device)
print("Model Probabilities (sentence):", model_prob)
model_prob = run_inference(prompt, tokenizer, model, device)
print("Model Probabilities (prompt):", model_prob)

## Save model

In [29]:
import time
import os

In [31]:
time_now = time.time()
peft_model_path = os.path.join("jmokdad3/opt-large-peft-p-tuning", f"peft_model_{time_now}")
trainer.model.save_pretrained(peft_model_path)

## Inference

You can load the model from the path that you have saved to before, and ask the model to generate text based on our input before!

In [32]:
from peft import PeftModel

In [34]:
loaded_model = PeftModel.from_pretrained(model,
                                         peft_model_path,
                                         is_trainable=False)

In [None]:
loaded_model_outputs = loaded_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(loaded_model_outputs, skip_special_tokens=True))

## Text initialization

Our fine-tuned, randomly initialized model did pretty well on the classification. Let's now compare it with the text initialization method.

Notice that all we are changing is the `prompt_tuning_init` setting and we are also providing a concise text prompt.

API docs
* [prompt_tuning_init_text](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig.prompt_tuning_init_text)

In [None]:
text_peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Generate inspirational quotes", # this provides a starter for the model to start searching for the best embeddings
    num_virtual_tokens=3, # this doesn't have to match the length of the text above
    tokenizer_name_or_path=model_name
)
text_peft_model = get_peft_model(foundation_model, text_peft_config)
print(text_peft_model.print_trainable_parameters())

In [None]:
text_trainer = Trainer(
    model=text_peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

In [None]:
text_trainer.train()

In [None]:
# Save the model
time_now = time.time()
text_peft_model_path = os.path.join(output_directory, f"text_peft_model_{time_now}")
text_trainer.model.save_pretrained(text_peft_model_path)

In [None]:
# Load model
loaded_text_model = PeftModel.from_pretrained(
    foundation_model.to("cpu"),
    text_peft_model_path,
    is_trainable=False
)

In [None]:
# Generate output
text_outputs = loaded_text_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
)

In [None]:
print(tokenizer.batch_decode(text_outputs, skip_special_tokens=True))

You can see that text initialization doesn't necessarily perform better than random initialization.

In [42]:
from huggingface_hub import notebook_login

In [43]:
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [44]:
# TODO
hf_username = "JadMokdad"
peft_model_id = f"{hf_username}/bloom_prompt_tuning_{time_now}"
trainer.model.push_to_hub(peft_model_id, use_auth_token=True)



adapter_model.safetensors:   0%|          | 0.00/82.1k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/JadMokdad/bloom_prompt_tuning_1722245590.441907/commit/f05318417f97808826225108881eb80e4f308f56', commit_message='Upload model', commit_description='', oid='f05318417f97808826225108881eb80e4f308f56', pr_url=None, pr_revision=None, pr_num=None)

In [45]:
from peft import PeftModel, PeftConfig

In [46]:
config = PeftConfig.from_pretrained(peft_model_id)
foundation_model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path)
peft_random_model = PeftModel.from_pretrained(foundation_model, peft_model_id)

adapter_config.json:   0%|          | 0.00/434 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/82.1k [00:00<?, ?B/s]

COMMAND ----------

In [None]:
online_model_outputs = peft_random_model.generate(
    input_ids=input1["input_ids"],
    attention_mask=input1["attention_mask"],
    max_new_tokens=7,
    eos_token_id=tokenizer.eos_token_id
    )

In [None]:
print(tokenizer.batch_decode(online_model_outputs, skip_special_tokens=True))