
# Prompt Tuning
This lesson introduces how to apply prompt tuning to your model of choice using the [Parameter-Efficient Fine-Tuning (PEFT) library developed by HuggingFace](https://huggingface.co/docs/peft/index). PEFT supports multiple methods that reduce the number of trainable parameters during fine-tuning, including prompt tuning and LoRA. For a complete list of methods, refer to the [documentation](https://huggingface.co/docs/peft/main/en/index#supported-methods). For the time being, PEFT supports only a subset of models and tasks, including GPT-2 and LLaMA; for the supported pairs of models and tasks, refer to this [page](https://huggingface.co/docs/peft/main/en/index#supported-models).


### ![Dolly](https://files.training.databricks.com/images/llm/dolly_small.png) Learning Objectives
1. Apply prompt tuning to your model of choice
1. Fine-tune on your provided dataset
1. Save and share your model to HuggingFace hub
1. Conduct inference using the fine-tuned model
1. Compare outputs from the randomly initialized and text-initialized fine-tuned models against the foundation model

In [None]:
%pip install peft==0.4.0

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.


In [None]:
%run ../Includes/Classroom-Setup

Resetting the learning environment:
| enumerating serving endpoints...found 0...(0 seconds)
| No action taken

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/llm-foundation-models/v01-raw"

Validating the locally installed datasets:
| listing local files...(3 seconds)
| validation completed...(3 seconds total)


Importing lab testing framework.



Using the "default" schema.

Predefined paths variables:
| DA.paths.working_dir: /dbfs/mnt/dbacademy-users/labuser4687840@vocareum.com/llm-foundation-models
| DA.paths.user_db:     dbfs:/mnt/dbacademy-users/labuser4687840@vocareum.com/llm-foundation-models/database.db
| DA.paths.datasets:    /dbfs/mnt/dbacademy-datasets/llm-foundation-models/v01-raw

Setup completed (6 seconds)

The models developed or used in this course are for demonstration and learning purposes only.
Models may occasionally output offensive, inaccurate, biased information, or harmful instructions.



[Auto Classes](https://huggingface.co/docs/transformers/main/en/model_doc/auto#auto-classes) help you automatically retrieve the relevant model and tokenizer for the pre-trained model you are interested in using.

Causal language modeling refers to a decoding process in which the model predicts the next token based only on the tokens to its left. The model cannot see future tokens, unlike masked language models, which have bidirectional access to tokens. A canonical example of a causal language model is GPT-2. Causal language models are also described as autoregressive.
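
To make the left-to-right constraint concrete, here is a minimal pure-Python sketch of the two visibility patterns. The helper names `causal_mask` and `bidirectional_mask` are hypothetical and only illustrate the idea; they are not part of `transformers`.

```python
def causal_mask(n):
    """Row i marks which positions token i may look at (1 = visible)."""
    # Token i sees itself and everything to its left, never the future.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Masked language models let every token see every other token."""
    return [[1] * n for _ in range(n)]


tokens = ["Two", "things", "are", "infinite"]
for token, row in zip(tokens, causal_mask(len(tokens))):
    print(f"{token:>10}", row)
```

Each printed row gains one more visible position than the row above it, which is the lower-triangular pattern that keeps future tokens hidden.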

API docs:
* [AutoTokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoTokenizer)
* [AutoModelForCausalLM](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForCausalLM)

In this demo, we will use `bigscience/bloomz-560m` as our **foundation** causal LM to generate text. You can read more about the [`bloomz` model here](https://huggingface.co/bigscience/bloomz). It was trained on a [multilingual dataset](https://huggingface.co/datasets/bigscience/xP3) spanning 46 languages and 13 programming languages. The dataset covers a wide range of NLP tasks, including Q/A, title generation, and text classification.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
foundation_model = AutoModelForCausalLM.from_pretrained(model_name)

Before doing any fine-tuning, we will ask the model to continue the following input sentence.

In [None]:
input1 = tokenizer("Two things are infinite: ", return_tensors="pt")

foundation_outputs = foundation_model.generate(
    input_ids=input1["input_ids"], 
    attention_mask=input1["attention_mask"], 
    max_new_tokens=7, 
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(foundation_outputs, skip_special_tokens=True))

['Two things are infinite:  the number of people and the number']
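
The call above bounds generation with `max_new_tokens` and `eos_token_id`. The toy loop below sketches that stopping rule, assuming greedy decoding (always taking the single most likely next token). The lookup table `NEXT_TOKEN` is a made-up stand-in for a real model, and unlike the real `generate`, this sketch drops the end-of-sequence token instead of appending it.

```python
EOS = "<eos>"

# Hypothetical "model": maps the last token to its most likely successor.
NEXT_TOKEN = {"a": "b", "b": "c", "c": "d", "d": EOS}


def greedy_generate(prompt_tokens, max_new_tokens, eos_token=EOS):
    """Append one token at a time until EOS or the token budget is reached."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], eos_token)
        if next_token == eos_token:
            break  # the model signalled the end of the sequence
        tokens.append(next_token)
    return tokens


print(greedy_generate(["a"], max_new_tokens=2))   # stops on the budget
print(greedy_generate(["a"], max_new_tokens=10))  # stops on EOS
```

With a small budget the output is cut off mid-sequence, which is why the model output above ends abruptly after seven new tokens.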


The output is not too bad. However, the dataset BLOOMZ was trained on doesn't cover anything about inspirational English quotes. Therefore, we are going to fine-tune `bloomz-560m` on [a dataset called `Abirate/english_quotes`](https://huggingface.co/datasets/Abirate/english_quotes), which contains exclusively inspirational English quotes, with the hope of using the fine-tuned version to generate more quotes later!

In [None]:
from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")

data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_sample = data["train"].select(range(50))
display(train_sample) 






Dataset({
    features: ['quote', 'author', 'tags', 'input_ids', 'attention_mask'],
    num_rows: 50
})

## Onto fine-tuning: define PEFT configurations for random initialization

Recall that prompt tuning supports both random and text-based initialization of soft prompts, also known as virtual tokens. We will compare the model outputs from both initialization methods later. For now, we will start with random initialization, where all we provide is the length of the virtual prompt.

API docs:
* [PromptTuningConfig](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig)
* [PEFT model](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig)
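
As a rough mental model of random initialization, the sketch below builds a soft prompt of random vectors and places it in front of the (frozen) token embeddings. This is a toy illustration in plain Python: the function names, the tiny `hidden_size`, and the uniform sampling range are all assumptions for demonstration, not how PEFT is implemented.

```python
import random


def random_soft_prompt(num_virtual_tokens, hidden_size, seed=0):
    """One trainable vector per virtual token, filled with random values."""
    rng = random.Random(seed)
    return [
        [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        for _ in range(num_virtual_tokens)
    ]


def prepend_soft_prompt(soft_prompt, token_embeddings):
    """The model sees the virtual tokens first, then the real input tokens."""
    return soft_prompt + token_embeddings


hidden_size = 8  # toy value; real models use a much larger hidden size
soft_prompt = random_soft_prompt(num_virtual_tokens=4, hidden_size=hidden_size)
token_embeddings = [[0.0] * hidden_size for _ in range(5)]  # 5 input tokens

model_input = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(model_input), "positions,", len(soft_prompt) * hidden_size, "trainable values")
```

Only the values in `soft_prompt` would be updated during training; the token embeddings and the rest of the model stay frozen.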

In [None]:
from peft import get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,
    num_virtual_tokens=4,
    tokenizer_name_or_path=model_name
)
peft_model = get_peft_model(foundation_model, peft_config)
peft_model.print_trainable_parameters()  # prints the summary itself and returns None, so no print() wrapper is needed

trainable params: 4,096 || all params: 559,218,688 || trainable%: 0.0007324504863471229
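
The numbers in this summary can be checked by hand. The sketch below uses only the printed values and `num_virtual_tokens=4`; the inferred hidden size rests on the assumption that each virtual token is a single embedding vector.

```python
trainable_params = 4_096      # from the summary above
all_params = 559_218_688      # from the summary above
num_virtual_tokens = 4        # from the PromptTuningConfig

# Assumption: each virtual token is one vector of length hidden_size.
hidden_size = trainable_params // num_virtual_tokens
frozen_params = all_params - trainable_params
trainable_pct = 100 * trainable_params / all_params

print("values per virtual token:", hidden_size)   # 1024
print("frozen parameters:", frozen_params)        # 559214592
print("trainable %:", trainable_pct)
```

In other words, fewer than one parameter in 100,000 is updated during prompt tuning here.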


That's the beauty of PEFT! It allows us to drastically reduce the number of trainable parameters. Now, we can proceed with using [HuggingFace's `Trainer` class](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#trainer) and its [`TrainingArguments` to define our fine-tuning configurations](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments).

The `Trainer` class provides a user-friendly abstraction that uses PyTorch under the hood to conduct training.

In [None]:
from transformers import TrainingArguments
import os

output_directory = os.path.join(DA.paths.working_dir, "peft_outputs") # can give some other path here

# Create the output directory, along with DA.paths.working_dir if it does not exist yet
os.makedirs(output_directory, exist_ok=True)

training_args = TrainingArguments(
    output_dir=output_directory, # Where the model predictions and checkpoints will be written
    no_cuda=True, # This is necessary for CPU clusters. 
    auto_find_batch_size=True, # Find a suitable batch size that will fit into memory automatically 
    learning_rate=3e-2, # Higher learning rate than full fine-tuning
    num_train_epochs=5 # Number of passes to go through the entire fine-tuning dataset 
)

## Train

We will also use a data collator to help us form batches of inputs to pass to the model for training. Go [here](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#data-collator) for documentation.

Specifically, we will be using `DataCollatorForLanguageModeling`, which additionally pads the inputs to the maximum length in a batch, since the inputs can have variable lengths. Refer to the [API docs here](https://huggingface.co/docs/transformers/main/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling).
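
To illustrate what such a collator produces when masked language modeling is turned off, here is a simplified pure-Python sketch. The function `collate_for_causal_lm` is hypothetical; it assumes right-side padding and uses `-100` as the label value that the loss ignores, so padded positions do not contribute to training.

```python
def collate_for_causal_lm(batch_input_ids, pad_token_id, ignore_index=-100):
    """Pad every sequence to the longest one in the batch and build labels."""
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_input_ids:
        num_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * num_pad)
        attention_mask.append([1] * len(ids) + [0] * num_pad)
        # For causal LM the labels are the inputs themselves;
        # padded positions are excluded from the loss.
        labels.append(ids + [ignore_index] * num_pad)
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
    }


batch = collate_for_causal_lm([[5, 6, 7], [8]], pad_token_id=0)
for name, rows in batch.items():
    print(name, rows)
```

The real collator returns tensors rather than lists, and the padding side depends on the tokenizer's settings.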

Note: This cell might take ~10 mins to train. **Decrease `num_train_epochs` above to speed up the training process.** You might also notice that this cell triggers a whole new MLflow run. [MLflow](https://mlflow.org/docs/latest/index.html) is an open source tool that helps manage the end-to-end machine learning lifecycle, including experiment tracking, ML code packaging, and model deployment. You can read more about [LLM tracking here](https://mlflow.org/docs/latest/llm-tracking.html).

In [None]:
from transformers import Trainer, DataCollatorForLanguageModeling

trainer = Trainer(
    model=peft_model, # We pass in the PEFT version of the foundation model, bloomz-560M
    args=training_args,
    train_dataset=train_sample,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
)

trainer.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.




TrainOutput(global_step=35, training_loss=3.268965148925781, metrics={'train_runtime': 663.9015, 'train_samples_per_second': 0.377, 'train_steps_per_second': 0.053, 'total_flos': 58327152033792.0, 'train_loss': 3.268965148925781, 'epoch': 5.0})
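
The `TrainOutput` figures can be reproduced with a little arithmetic. The sketch below assumes the batch size stayed at 8 on a single device (the `Trainer` default, which `auto_find_batch_size` only lowers when memory runs out); that value is an assumption, while the other inputs come from this notebook.

```python
import math

num_samples = 50          # rows in train_sample
num_epochs = 5            # num_train_epochs
batch_size = 8            # assumption: default per-device batch size, one device
train_runtime = 663.9015  # seconds, from the TrainOutput above

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
steps_per_second = total_steps / train_runtime
samples_per_second = num_samples * num_epochs / train_runtime

print(total_steps)                   # 35
print(round(steps_per_second, 3))    # 0.053
print(round(samples_per_second, 3))  # 0.377
```

The results line up with `global_step`, `train_steps_per_second`, and `train_samples_per_second` in the output, which supports the batch-size assumption.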

## Save model

In [None]:
import time

time_now = time.time()
peft_model_path = os.path.join(output_directory, f"peft_model_{time_now}")
trainer.model.save_pretrained(peft_model_path)

## Inference

You can load the model from the path you saved it to, and ask it to generate text from the same input as before!

In [None]:
from peft import PeftModel

loaded_model = PeftModel.from_pretrained(foundation_model, 
                                         peft_model_path, 
                                         is_trainable=False)

In [None]:
loaded_model_outputs = loaded_model.generate(
    input_ids=input1["input_ids"], 
    attention_mask=input1["attention_mask"], 
    max_new_tokens=7, 
    eos_token_id=tokenizer.eos_token_id
    )
print(tokenizer.batch_decode(loaded_model_outputs, skip_special_tokens=True))

['Two things are infinite:  time and space. Time is the']


Well, it seems like our fine-tuned model is indeed getting closer to generating inspirational quotes. 


In fact, the input above is taken from the training dataset. 
<br>
<br>

<img src="https://files.training.databricks.com/images/llm/english_quote_example.png" width=500>

## Text initialization

Our fine-tuned, randomly initialized model did pretty well on the quote above. Let's now compare it with the text initialization method. 

Notice that all we are changing is the `prompt_tuning_init` setting and we are also providing a concise text prompt. 

API docs:
* [prompt_tuning_init_text](https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig.prompt_tuning_init_text)
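
Since the number of virtual tokens does not have to match the length of the initialization text, the token IDs of that text need to be fitted to `num_virtual_tokens` somehow. The sketch below shows one plausible rule, truncating when the text is too long and repeating it when it is too short. Treat this as an assumption about the behavior rather than a statement of PEFT's implementation; the function name and token IDs are made up.

```python
import math


def fit_init_ids(text_token_ids, num_virtual_tokens):
    """Truncate or repeat the init text's token IDs to the requested length."""
    ids = list(text_token_ids)
    if not ids:
        raise ValueError("the initialization text must contain at least one token")
    if len(ids) < num_virtual_tokens:
        # Repeat the text until it covers all virtual tokens.
        ids = ids * math.ceil(num_virtual_tokens / len(ids))
    return ids[:num_virtual_tokens]


hypothetical_ids = [101, 102, 103, 104]  # made-up IDs for a four-token text
print(fit_init_ids(hypothetical_ids, 3))
print(fit_init_ids(hypothetical_ids, 6))
```

Each selected token ID would then be looked up in the model's embedding table to give the virtual token its starting vector.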

In [None]:
text_peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Generate inspirational quotes", # this provides a starter for the model to start searching for the best embeddings
    num_virtual_tokens=3, # this doesn't have to match the length of the text above
    tokenizer_name_or_path=model_name
)
text_peft_model = get_peft_model(foundation_model, text_peft_config)
text_peft_model.print_trainable_parameters()  # prints the summary itself and returns None

trainable params: 3,072 || all params: 559,217,664 || trainable%: 0.0005493388706691496


In [None]:
text_trainer = Trainer(
    model=text_peft_model,
    args=training_args,
    train_dataset=train_sample,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

text_trainer.train()





TrainOutput(global_step=35, training_loss=2.946445792061942, metrics={'train_runtime': 623.6887, 'train_samples_per_second': 0.401, 'train_steps_per_second': 0.056, 'total_flos': 58327152033792.0, 'train_loss': 2.946445792061942, 'epoch': 5.0})

In [None]:
# Save the model
time_now = time.time()
text_peft_model_path = os.path.join(output_directory, f"text_peft_model_{time_now}")
text_trainer.model.save_pretrained(text_peft_model_path)

# Load model 
loaded_text_model = PeftModel.from_pretrained(
    foundation_model, 
    text_peft_model_path, 
    is_trainable=False
)

# Generate output with the model we just reloaded from disk
text_outputs = loaded_text_model.generate(
    input_ids=input1["input_ids"], 
    attention_mask=input1["attention_mask"], 
    max_new_tokens=7, 
    eos_token_id=tokenizer.eos_token_id
)
    
print(tokenizer.batch_decode(text_outputs, skip_special_tokens=True))

['Two things are infinite:  the number of people you can count']


Judging from this single example, text initialization doesn't necessarily perform better than random initialization.
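
For a slightly more quantitative view, we can convert the two reported training losses into perplexities with `exp(loss)`. This is only a rough sketch: the values are mean training losses over a 50-quote sample, not a held-out evaluation, so they say little about generalization.

```python
import math

# Mean training losses copied from the two TrainOutput results above.
training_losses = {
    "random init": 3.268965148925781,
    "text init": 2.946445792061942,
}

perplexities = {name: math.exp(loss) for name, loss in training_losses.items()}

for name, perplexity in perplexities.items():
    print(f"{name}: loss={training_losses[name]:.3f}, perplexity={perplexity:.1f}")
```

The text-initialized run reaches a lower training loss even though its sample continuation is not obviously better, which is a reminder that one generated sentence is weak evidence either way.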

## Share model to HuggingFace hub (optional)

If you have a model that you would like to share with the rest of the HuggingFace community, you can choose to push your model to the HuggingFace hub! 

1. You need to first create a free HuggingFace account! The signup process is simple. Go to the [home page](https://huggingface.co/) and click "Sign Up" on the top right corner.

<img src="https://files.training.databricks.com/images/llm/hf_homepage_signup.png" width=700>

2. Once you have signed up and confirmed your email address, click on your user icon on the top right and click the `Settings` button. 

3. Navigate to the `Access Token` tab and copy your token. 

<img src="https://files.training.databricks.com/images/llm/hf_token_page.png" width=500>



API docs:
* [push_to_hub](https://huggingface.co/docs/transformers/main/en/model_sharing#share-a-model)

Alternatively, you can use HuggingFace's helper login method. The login cell below will prompt you to enter your token.

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
# TODO
hf_username = "Ayush-1722"
peft_model_id = f"{hf_username}/bloom_prompt_tuning_{time_now}"
trainer.model.push_to_hub(peft_model_id, use_auth_token=True)

adapter_model.bin:   0%|          | 0.00/17.1k [00:00<?, ?B/s]

Upload 1 LFS files:   0%|          | 0/1 [00:00<?, ?it/s]

CommitInfo(commit_url='https://huggingface.co/Ayush-1722/bloom_prompt_tuning_1701169281.6695077/commit/b369cec40a999e3eeb1e49dc15b52b8cea5c8a64', commit_message='Upload model', commit_description='', oid='b369cec40a999e3eeb1e49dc15b52b8cea5c8a64', pr_url=None, pr_revision=None, pr_num=None)
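
The upload is tiny because only the soft prompt is saved, not the foundation model. A quick sanity check, assuming the 4,096 trainable values are stored as 32-bit floats and that "17.1k" in the upload log means roughly 17,100 bytes:

```python
trainable_params = 4_096   # from the randomly initialized PEFT model
bytes_per_param = 4        # assumption: float32 storage
reported_bytes = 17.1e3    # "17.1k" in the upload log, assuming k = 1000 bytes

raw_bytes = trainable_params * bytes_per_param
overhead_bytes = reported_bytes - raw_bytes

print("raw weights:", raw_bytes, "bytes")
print("approximate file overhead:", round(overhead_bytes), "bytes")
```

The raw weights account for almost all of the reported file size; the remainder is plausibly serialization overhead.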

### Inference from model in HuggingFace hub

In [None]:
from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained(peft_model_id)
# Read the base model name from the config just downloaded from the hub
foundation_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
peft_random_model = PeftModel.from_pretrained(foundation_model, peft_model_id)

Downloading adapter_config.json:   0%|          | 0.00/442 [00:00<?, ?B/s]

Downloading adapter_model.bin:   0%|          | 0.00/17.1k [00:00<?, ?B/s]

In [None]:
online_model_outputs = peft_random_model.generate(
    input_ids=input1["input_ids"], 
    attention_mask=input1["attention_mask"], 
    max_new_tokens=7, 
    eos_token_id=tokenizer.eos_token_id
    )
    
print(tokenizer.batch_decode(online_model_outputs, skip_special_tokens=True))

['Two things are infinite:  time and space. Time is the']
