# Lab | Introduction to Prompt Tuning using PEFT from Hugging Face

<!-- ### Fine-tune a Foundational Model effortless -->

**Note:** This is more or less the same notebook you saw in the previous lesson, and that is fine. This is an LLM fine-tuning lab. In class we used one set of datasets and models; in the labs you are required to change the LLMs and the datasets, including the pre-processing pipelines.

# Prompt Tuning

## Brief introduction to Prompt Tuning.
It’s an Additive Fine-Tuning technique for models. This means that we WILL NOT MODIFY ANY WEIGHTS OF THE ORIGINAL MODEL. You might be wondering, how are we going to perform fine-tuning then? Well, we will train additional layers that are added to the model. That’s why it’s called an Additive technique.

Considering it’s an Additive technique and its name is Prompt-Tuning, it seems clear that the layers we’re going to add and train are related to the prompt.

![My Image](https://github.com/peremartra/Large-Language-Model-Notebooks-Course/blob/main/img/Martra_Figure_5_Prompt_Tuning.jpg?raw=true)

We are creating a type of superprompt by enabling a model to enhance a portion of the prompt with its acquired knowledge. However, that particular section of the prompt cannot be translated into natural language. **It's as if we've mastered expressing ourselves in embeddings and generating highly effective prompts.**

In each training cycle, the only weights that can be modified to minimize the loss function are those integrated into the prompt.
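To make this concrete, here is a minimal sketch of the idea using plain Python lists instead of tensors: a small block of trainable "virtual token" embeddings is placed in front of the frozen embeddings of the real input tokens. The function name, the tiny embedding size, and the numbers are illustrative assumptions, not PEFT's actual code.

```python
# Minimal sketch of the prompt-tuning input, using lists instead of tensors.
# All names and sizes here are illustrative assumptions.

def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Return the sequence the frozen model would see: virtual tokens first."""
    return [list(v) for v in soft_prompt] + [list(e) for e in token_embeddings]

# Two trainable virtual tokens with embedding size 3 (kept tiny for readability).
soft_prompt = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Frozen embeddings for three real input tokens.
token_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

model_input = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(model_input))  # 5 positions: 2 virtual + 3 real
```

During training, only the values in `soft_prompt` would receive gradient updates; the model weights and the real token embeddings stay fixed.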

The primary consequence of this technique is that the number of parameters to train is genuinely small. However, we encounter a second, perhaps more significant consequence, namely that, **since we do not modify the weights of the pretrained model, it does not alter its behavior or forget any information it has previously learned.**

Training is therefore faster and more cost-effective. Moreover, we can train several models and, at inference time, load only one foundational model along with the small trained prompts, because the weights of the original model have not been altered.
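The toy class below illustrates that last point: one shared base object and several small task-specific prompts that can be swapped. The class, its attributes, and the adapter names are hypothetical; the method names only imitate the idea of loading and selecting adapters and are not the PEFT implementation.

```python
# Toy illustration: one shared base object, several small task-specific prompts.
# The class and attribute names are hypothetical and only mimic the idea.

class SharedBaseWithAdapters:
    def __init__(self, base_weights):
        self.base_weights = base_weights      # loaded once, never modified
        self.adapters = {}                    # adapter name -> soft prompt
        self.active = None

    def load_adapter(self, name, soft_prompt):
        self.adapters[name] = soft_prompt

    def set_adapter(self, name):
        if name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def active_prompt(self):
        return self.adapters[self.active]


base = [0.5] * 1000                           # stand-in for the frozen model
model = SharedBaseWithAdapters(base)
model.load_adapter("prompt_generator", [0.1] * 4)
model.load_adapter("classifier", [0.9] * 4)

model.set_adapter("classifier")
print(model.active, len(model.adapters), model.base_weights is base)
```

Switching tasks only changes which small prompt is active; the large base object is never copied.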

## What are we going to do in the notebook?
We are going to train two different models on two datasets, both built on a single pre-trained model from the Bloom family. One will be trained to generate prompts and the other to detect hate speech in sentences.

Additionally, we'll explore how to load both models with only one copy of the foundational model in memory.


## Loading the Peft Library
This library contains the Hugging Face implementation of various fine-tuning techniques, including Prompt Tuning.

In [1]:
!pip install -q transformers==4.41.2
!pip install -q peft==0.10.0
!pip install -q datasets==2.20.0
!pip install -q accelerate==0.30.1

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
gcsfs 2025.3.0 requires fsspec==2025.3.0, but you have fsspec 2024.5.0 which is incompatible.

From the transformers library, we import the necessary classes to instantiate the model and the tokenizer.

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoModelForSeq2SeqLM

## Loading the model and the tokenizers.

The Bloom family includes some of the smallest capable models that can be trained with the PEFT library using Prompt Tuning.

I'm opting for the smallest one to minimize training time and avoid memory issues in Colab. Feel free to try a bigger one if you have access to a good GPU.

In [3]:
model_name = "bigscience/bloomz-560m"
NUM_VIRTUAL_TOKENS = 20
# If you just want to test the solution, you can reduce the number of epochs.
NUM_EPOCHS_PROMPT = 5
NUM_EPOCHS_CLASSIFIER = 5
device = "cuda" # Replace with "mps" for Apple Silicon chips.

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_name)
foundational_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    device_map = device
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



## Inference with the pre-trained Bloom model



In [5]:
#this function returns the outputs from the model received, and inputs.
def get_outputs(model, inputs, max_new_tokens=100): #PLAY WITH THIS FUNCTION AS YOU SEE FIT
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        #temperature=0.2,
        #top_p=0.95,
        #do_sample=True,
        repetition_penalty=1.5, # Penalize tokens that already appeared, to avoid repetition.
        early_stopping=True, # Only affects beam search; generation already stops at the EOS token.
        eos_token_id=tokenizer.eos_token_id
    )
    return outputs
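As a rough illustration of what `repetition_penalty=1.5` does, the sketch below applies the commonly described rule to a list of next-token scores: positive scores of already-seen tokens are divided by the penalty and negative ones are multiplied by it. The function name, the scores, and the four-token vocabulary are assumptions for illustration, not the library's code.

```python
# Sketch of a repetition penalty applied to next-token scores (logits).
# Tokens that already appeared are made less likely; others are untouched.

def apply_repetition_penalty(logits, seen_token_ids, penalty=1.5):
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one both lower it.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

logits = [3.0, -1.0, 0.5, 2.0]      # hypothetical scores for a 4-token vocabulary
seen = [0, 1]                       # token ids already present in the sequence
print(apply_repetition_penalty(logits, seen))  # [2.0, -1.5, 0.5, 2.0]
```

A penalty of 1.0 leaves the scores unchanged; larger values discourage repeats more strongly.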

To compare the pre-trained model with the same model after the prompt-tuning process, I will run the same sentence on both models.

Since I'm creating a model that can generate prompts, I'll instruct it to provide a prompt that makes it act like a fitness trainer.

In [6]:
input_prompt = tokenizer("Act as a fitness Trainer. Prompt:", return_tensors="pt")
foundational_outputs_prompt = get_outputs(foundational_model,
                                          input_prompt.to(device),
                                          max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))



['Act as a fitness Trainer. Prompt:']


The model doesn't know what its mission is. In this run it returns the input without adding any new text, which is not what we're looking for.

# Prompt Creator
## Preparing Datasets
The Dataset used, for this first example, is:
* https://huggingface.co/datasets/fka/awesome-chatgpt-prompts



In [7]:
import os
from datasets import load_dataset

In [8]:
dataset_prompt = "fka/awesome-chatgpt-prompts"

In [9]:
def concatenate_columns_prompt(dataset):
    def concatenate(example):
        example['prompt'] = "Act as a {}. Prompt: {}".format(example['act'], example['prompt'])
        return example

    dataset = dataset.map(concatenate)
    return dataset

In [10]:
#Create the Dataset to create prompts.
data_prompt = load_dataset(dataset_prompt)
data_prompt['train'] = concatenate_columns_prompt(data_prompt['train'])

data_prompt = data_prompt.map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample_prompt = data_prompt["train"].remove_columns('act')


In [11]:
print(train_sample_prompt)

Dataset({
    features: ['prompt', 'input_ids', 'attention_mask'],
    num_rows: 203
})


In [12]:
print(train_sample_prompt[:2])

{'prompt': ['Act as a An Ethereum Developer. Prompt: Imagine you are an experienced Ethereum developer tasked with creating a smart contract for a blockchain messenger. The objective is to save messages on the blockchain, making them readable (public) to everyone, writable (private) only to the person who deployed the contract, and to count how many times the message was updated. Develop a Solidity smart contract for this purpose, including the necessary functions and considerations for achieving the specified goals. Please provide the code and any relevant explanations to ensure a clear understanding of the implementation.', "Act as a SEO Prompt. Prompt: Using WebPilot, create an outline for an article that will be 2,000 words on the keyword 'Best SEO prompts' based on the top 10 results from Google. Include every relevant heading possible. Keep the keyword density of the headings high. For each section of the outline, include the word count. Include FAQs section in the outline too, b

## Prompt-tuning configuration

API docs:
https://huggingface.co/docs/peft/main/en/package_reference/tuners#peft.PromptTuningConfig


In [13]:
from peft import  get_peft_model, PromptTuningConfig, TaskType, PromptTuningInit

generation_config_prompt = PromptTuningConfig( #PLAY WITH THIS CONFIG IF YOU LIKE
    task_type=TaskType.CAUSAL_LM, #This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.RANDOM,  #The added virtual tokens are initialized with random numbers
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, #Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name #The pre-trained model.
)


We will create two prompt-tuning models using the same pre-trained model and a very similar config, but with a different dataset.

In [14]:
peft_model_prompt = get_peft_model(foundational_model, generation_config_prompt)
peft_model_prompt.print_trainable_parameters()  # Prints the summary itself and returns None.

trainable params: 20,480 || all params: 559,235,072 || trainable%: 0.0036621451381361144


**That's amazing: did you see the reduction in trainable parameters? We are going to train about 0.0037% of the parameters available.**
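The reported figure can be checked by hand. The sketch below assumes that the trainable parameters are exactly one embedding vector per virtual token and that the embedding width of bloomz-560m is 1024; the width is an assumption, although it is consistent with 20,480 / 20 in the output above.

```python
# Reproduce the numbers printed by print_trainable_parameters().
num_virtual_tokens = 20            # NUM_VIRTUAL_TOKENS in this notebook
hidden_size = 1024                 # assumed embedding width of bloomz-560m
all_params = 559_235_072           # "all params" from the output above

trainable_params = num_virtual_tokens * hidden_size
trainable_percent = 100 * trainable_params / all_params

print(trainable_params)                 # 20480
print(round(trainable_percent, 4))      # 0.0037
```

Doubling `num_virtual_tokens` would double the trainable parameters, which would still be a tiny fraction of the full model.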

Now we are going to create the training arguments, and we will use the same configuration in both trainings.

In [15]:
from transformers import TrainingArguments
def create_training_arguments(path, learning_rate=0.0035, epochs=6, autobatch=True):
    training_args = TrainingArguments(
        output_dir=path, # Where the model predictions and checkpoints will be written
        #use_cpu=True, # This is necessary for CPU clusters.
        auto_find_batch_size=autobatch, # Find a suitable batch size that will fit into memory automatically
        learning_rate= learning_rate, # Higher learning rate than full fine-tuning
        #per_device_train_batch_size=4,
        num_train_epochs=epochs,
        report_to="none"  # Disable wandb and other logging integrations
    )
    return training_args

In [16]:

import os

working_dir = "./"

# It is best to store the models in separate folders.
# Create the names of the directories where the models will be stored.
output_directory_prompt =  os.path.join(working_dir, "peft_outputs_prompt")
output_directory_classifier =  os.path.join(working_dir, "peft_outputs_classifier")

# Create the directories if they do not exist.
if not os.path.exists(working_dir):
    os.mkdir(working_dir)
if not os.path.exists(output_directory_prompt):
    os.mkdir(output_directory_prompt)


We need to indicate the directory containing the model when creating the TrainingArguments.

## Training first model

We will create a Trainer object, one for each model to train.

In [17]:
training_args_prompt = create_training_arguments(output_directory_prompt,
                                                 3e-2,
                                                 NUM_EPOCHS_PROMPT)

In [18]:
from transformers import Trainer, DataCollatorForLanguageModeling
def create_trainer(model, training_args, train_dataset):
    trainer = Trainer(
        model=model, # We pass in the PEFT version of the foundation model, bloomz-560M
        args=training_args, #The args for the training.
        train_dataset=train_dataset, #The dataset used to train the model.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False) # mlm=False indicates not to use masked language modeling
    )
    return trainer
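To show what `mlm=False` means for the batches the trainer receives, here is a simplified sketch of a causal language modeling collator in plain Python: sequences are padded to the same length, and the labels are a copy of the inputs with padded positions set to -100 so they are ignored by the loss. The function name, the right-side padding, and the pad id are assumptions for illustration; the real collator works on tensors and differs in details.

```python
# Sketch of what a causal-LM collator produces for a batch (mlm=False).
# Padding side and the pad id are assumptions for illustration.

def collate_causal_lm(batch_input_ids, pad_token_id, ignore_index=-100):
    max_len = max(len(ids) for ids in batch_input_ids)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch_input_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; padded positions are excluded from the loss.
        labels.append(ids + [ignore_index] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = collate_causal_lm([[5, 6, 7], [8, 9]], pad_token_id=3)
print(batch["labels"])  # [[5, 6, 7], [8, 9, -100]]
```

The shift between inputs and targets (predicting token t+1 from tokens up to t) happens inside the model, which is why the labels can be a plain copy here.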


In [19]:
#Training first model.
trainer_prompt = create_trainer(peft_model_prompt,
                                training_args_prompt,
                                train_sample_prompt)
trainer_prompt.train()

TrainOutput(global_step=255, training_loss=2.8081964231004903, metrics={'train_runtime': 142.83, 'train_samples_per_second': 7.106, 'train_steps_per_second': 1.785, 'total_flos': 324586719363072.0, 'train_loss': 2.8081964231004903, 'epoch': 5.0})

Release GPU memory.

In [20]:
import torch
import gc
torch.cuda.empty_cache()
gc.collect()

0

## Save model
We are going to save the model. These models are ready to be used, as long as we have the pre-trained model from which they were created in memory.

In [21]:
trainer_prompt.model.save_pretrained(output_directory_prompt)



## Inference first tuned model

You can load the model from the path where you saved it and ask it to generate text from the same input as before.

In [22]:
from peft import PeftModel

loaded_model_peft = PeftModel.from_pretrained(foundational_model,
                                         output_directory_prompt,
                                         #device_map=device,
                                         is_trainable=False)

In [23]:
loaded_model_prompt_outputs = get_outputs(loaded_model_peft,
                                          input_prompt,
                                          max_new_tokens=50)
print(tokenizer.batch_decode(loaded_model_prompt_outputs, skip_special_tokens=True))



['Act as a fitness Trainer. Prompt: I want you to act like an expert in your field of expertise and provide me with the best advice for my clients on how they can improve their performance, increase confidence or even lose weight.  My first request is "I need help improving personal health" ']


Let's compare the result of the model before and after being fine-tuned with prompt-tuning.

**Input for the model**
```
Act as a fitness Trainer. Prompt:
```

**Original model**
```
Act as a fitness Trainer. Prompt:  Follow up with your trainer
```
**Trained for prompt generation with Prompt-tuning**, 50 epochs:
```
Act as a fitness Trainer. Prompt: ＋ Acts like an expert in the field of sports and health, but does not provide detailed information about his work or products to help you understand them better.  + I want my first client referred me through this website for their gym membership program which is based on physical activity training exercises that are easy enough (eight minutes) per week with no need any special equipment required.   - First Question : What would be your role?
```

The result is clearly different. It's not exactly what we're looking for, but it's much closer.

It's possible that we're at the limit of what Bloom's smallest model can offer. Try another model; the one with 1B parameters will likely give a better result.

# Hate Classifier
## Loading the Dataset

* https://huggingface.co/datasets/SetFit/ethos_binary

In [24]:
input_classifier = tokenizer("Sentence : I don't like short people, no idea why they exist. Label :", return_tensors="pt")
foundational_outputs_prompt = get_outputs(foundational_model,
                                          input_classifier.to(device),
                                          max_new_tokens=50)

print(tokenizer.batch_decode(foundational_outputs_prompt, skip_special_tokens=True))

["Sentence : I don't like short people, no idea why they exist. Label : No"]


The model has no idea what its purpose is, so it completes the sentence as best as it can.

In [25]:
dataset_classifier = "SetFit/ethos_binary"

def concatenate_columns_classifier(dataset):
    def concatenate(example):
        example['text'] = "Sentence : {} Label : {}".format(example['text'], example['label_text'])
        return example

    dataset = dataset.map(concatenate)
    return dataset

In [26]:
data_classifier = load_dataset(dataset_classifier)
data_classifier['train'] = concatenate_columns_classifier(data_classifier['train'])

data_classifier = data_classifier.map(lambda samples: tokenizer(samples["text"]), batched=True)
train_sample_classifier = data_classifier["train"].remove_columns(['label', 'label_text', 'text'])


In [27]:
data_classifier

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'label_text', 'input_ids', 'attention_mask'],
        num_rows: 598
    })
    test: Dataset({
        features: ['text', 'label', 'label_text', 'input_ids', 'attention_mask'],
        num_rows: 400
    })
})

In [28]:
train_sample_classifier

Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 598
})

I have removed all the columns from the dataset that are not strictly necessary for training.

In [29]:
print(train_sample_classifier[1:2])

{'input_ids': [[62121, 1671, 915, 473, 760, 10190, 513, 16154, 60, 19821, 138929, 20812, 426, 18833, 18816, 75536, 45617, 39469, 19368, 17956, 57274, 3758, 18065, 38, 44140, 17956, 72870, 8309, 9492, 15, 614, 156801, 85061, 48283, 44419, 426, 16472, 96789, 602, 45227, 43111, 181485, 435, 19821, 60, 48283, 44419, 426, 16472, 96789, 614, 156801, 77658, 915, 74549, 40423]], 'attention_mask': [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]}


## Prompt-tuning configuration

In [30]:
generation_config_classifier = PromptTuningConfig( #PLAY WITH THIS AS YOU SEE FIT
    task_type=TaskType.CAUSAL_LM, #This type indicates the model will generate text.
    prompt_tuning_init=PromptTuningInit.TEXT,  #The virtual tokens are initialized from the embeddings of the text below.
    prompt_tuning_init_text="Indicates whether the sentence contains hate speech or not",
    num_virtual_tokens=NUM_VIRTUAL_TOKENS, #Number of virtual tokens to be added and trained.
    tokenizer_name_or_path=model_name #The pre-trained model.
)
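With `PromptTuningInit.TEXT`, the init text rarely tokenizes to exactly `NUM_VIRTUAL_TOKENS` tokens. The sketch below shows one plausible way to fit the token ids to the required length, by repeating a short sequence and truncating a long one. This reflects my understanding of how PEFT handles the mismatch and should be treated as an assumption; the function name and the token ids are hypothetical.

```python
import math

# Sketch: fit the token ids of the init text to exactly num_virtual_tokens.
def fit_init_token_ids(init_token_ids, num_virtual_tokens):
    if not init_token_ids:
        raise ValueError("init text produced no tokens")
    repeats = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repeats)[:num_virtual_tokens]

# Hypothetical ids; the real ones come from the tokenizer.
print(fit_init_token_ids([11, 22, 33], 8))        # [11, 22, 33, 11, 22, 33, 11, 22]
print(fit_init_token_ids(list(range(25)), 4))     # [0, 1, 2, 3]
```

The embeddings of the resulting ids would then serve as the starting values of the virtual tokens, which training is free to move away from.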

In [31]:
peft_model_classifier = get_peft_model(foundational_model, generation_config_classifier)
peft_model_classifier.print_trainable_parameters()  # Prints the summary itself and returns None.

trainable params: 20,480 || all params: 559,235,072 || trainable%: 0.0036621451381361144


In [32]:
if not os.path.exists(output_directory_classifier):
    os.mkdir(output_directory_classifier)

In [33]:
training_args_classifier = create_training_arguments(output_directory_classifier,
                                                    3e-2,
                                                    NUM_EPOCHS_CLASSIFIER)

## Training Second Model

In [34]:
trainer_classifier = create_trainer(peft_model_classifier,
                                   training_args_classifier,
                                   train_sample_classifier)
trainer_classifier.train()

Step,Training Loss
500,3.2673
1000,3.1937
1500,3.1803
2000,3.1593
2500,3.1264




TrainOutput(global_step=2990, training_loss=3.1733291472878347, metrics={'train_runtime': 240.5633, 'train_samples_per_second': 12.429, 'train_steps_per_second': 12.429, 'total_flos': 360078925602816.0, 'train_loss': 3.1733291472878347, 'epoch': 5.0})
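Because `auto_find_batch_size` was enabled, the batch size actually used is not shown directly, but it can be inferred from the reported `global_step`. The sketch below assumes a single device, no gradient accumulation, a kept final partial batch, and a search over the batch sizes 8, 4, 2, 1; the function name is hypothetical.

```python
import math

def consistent_batch_sizes(num_samples, epochs, global_step, candidates=(8, 4, 2, 1)):
    """Batch sizes whose step count matches the reported global_step."""
    return [b for b in candidates
            if epochs * math.ceil(num_samples / b) == global_step]

print(consistent_batch_sizes(203, 5, 255))    # [4]  prompt generator run
print(consistent_batch_sizes(598, 5, 2990))   # [1]  classifier run
```

Under these assumptions the first training ran with batches of 4 and the second with batches of 1, which also matches the classifier run reporting the same value for samples per second and steps per second.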

In [35]:
trainer_classifier.model.save_pretrained(output_directory_classifier)



## Inference second Model

In [36]:
loaded_model_peft.load_adapter(output_directory_classifier, adapter_name="classifier")
loaded_model_peft.set_adapter("classifier")

In [37]:
loaded_model_sentences_outputs = get_outputs(loaded_model_peft,
                                             input_classifier, max_new_tokens=3)
print(tokenizer.batch_decode(loaded_model_sentences_outputs, skip_special_tokens=True))



["Sentence : I don't like short people, no idea why they exist. Label : hate speech"]


Let's check how the model's response has changed with training:

**Input for the model**
```
Sentence : Head is the shape of a light bulb. Label :
Sentence : I don't like short people, no idea why they exist. Label :
```

**Original model**
```
Sentence : Head is the shape of a light bulb. Label :  head
Sentence : I don't like short people, no idea why they exist. Label :  No
```
**Trained for classification with Prompt-tuning**
```
Sentence : Head is the shape of a light bulb. Label :  no hate speech
Sentence : I don't like short people, no idea why they exist. Label :  hate speech
```

It's clear that the training has fulfilled its purpose. The original model doesn't know what its mission is and tries to complete the sentences as best as it can. On the other hand, the updated model with prompt-tuning does know what its mission is and is able to classify the sentences correctly and in the indicated format.


# Exercise
- Complete the prompts similar to what we did in class.
    - Try at least 3 versions.
    - Be creative.
- Write a one-page report summarizing your findings.
    - Were there variations that didn't work well? i.e., where the model either hallucinated or was wrong.
- What did you learn?

## Testing the Prompt Generator Model
Let's test the trained prompt generator with different role variations.

In [38]:
# Load the prompt generator adapter
loaded_model_peft.set_adapter("default")  # Switch back to prompt generator

### Test 1: Nutrition Expert

In [39]:
test_input_1 = tokenizer("Act as a nutrition expert. Prompt:", return_tensors="pt")
output_1 = get_outputs(loaded_model_peft, test_input_1.to(device), max_new_tokens=50)
print("Test 1 - Nutrition Expert:")
print(tokenizer.batch_decode(output_1, skip_special_tokens=True))

Test 1 - Nutrition Expert:
['Act as a nutrition expert. Prompt: I want you to act like an experienced dietitian and explain how your body functions in order for me, who is not very knowledgeable about food or exercise but would be interested if someone could help with my research on the topic of weight loss. ']


### Test 2: Motivational Speaker

In [40]:
test_input_2 = tokenizer("Act as a motivational speaker. Prompt:", return_tensors="pt")
output_2 = get_outputs(loaded_model_peft, test_input_2.to(device), max_new_tokens=50)
print("Test 2 - Motivational Speaker:")
print(tokenizer.batch_decode(output_2, skip_special_tokens=True))

Test 2 - Motivational Speaker:
['Act as a motivational speaker. Prompt: I want you to act like an expert in your field and explain the basics of that topic, then give examples from real life situations or anecdotes about it so people can learn more than just what they know already! My first request is "I need some']


### Test 3: Career Counselor

In [41]:
test_input_3 = tokenizer("Act as a career counselor. Prompt:", return_tensors="pt")
output_3 = get_outputs(loaded_model_peft, test_input_3.to(device), max_new_tokens=50)
print("Test 3 - Career Counselor:")
print(tokenizer.batch_decode(output_3, skip_special_tokens=True))

Test 3 - Career Counselor:
['Act as a career counselor. Prompt: I want you to act like an expert in your field and provide advice on how best approach the challenges of life, including financial decisions for yourself or others.  My first request is "I need help with my finances"  The second one is: "Need some']


## Testing the Hate Speech Classifier
Let's test the hate speech classifier with different sentences.

In [42]:
# Switch to classifier adapter
loaded_model_peft.set_adapter("classifier")

### Test 4: Positive Statement (No Hate)

In [43]:
test_classifier_1 = tokenizer("Sentence : Everyone deserves respect and kindness. Label :", return_tensors="pt")
output_c1 = get_outputs(loaded_model_peft, test_classifier_1.to(device), max_new_tokens=5)
print("Test 4 - Positive Statement:")
print(tokenizer.batch_decode(output_c1, skip_special_tokens=True))

Test 4 - Positive Statement:
['Sentence : Everyone deserves respect and kindness. Label : no hate speech']


### Test 5: Borderline Negative Statement

In [44]:
test_classifier_2 = tokenizer("Sentence : That movie was terrible and a waste of time. Label :", return_tensors="pt")
output_c2 = get_outputs(loaded_model_peft, test_classifier_2.to(device), max_new_tokens=5)
print("Test 5 - Borderline Negative:")
print(tokenizer.batch_decode(output_c2, skip_special_tokens=True))

Test 5 - Borderline Negative:
['Sentence : That movie was terrible and a waste of time. Label : hate speech no']


### Test 6: Clear Hate Speech

In [45]:
test_classifier_3 = tokenizer("Sentence : People from that country are all criminals and should be banned. Label :", return_tensors="pt")
output_c3 = get_outputs(loaded_model_peft, test_classifier_3.to(device), max_new_tokens=5)
print("Test 6 - Clear Hate Speech:")
print(tokenizer.batch_decode(output_c3, skip_special_tokens=True))

Test 6 - Clear Hate Speech:
['Sentence : People from that country are all criminals and should be banned. Label : hate speech no']


# Lab Report

## Test Results Summary

I tested the models with 6 different variations:

**Prompt Generator (3 tests):**
- Nutrition Expert: Generated relevant prompts about diet and weight loss
- Motivational Speaker: Created prompts about expertise and real-life examples
- Career Counselor: Produced a prompt about life challenges and financial advice rather than careers

**Hate Speech Classifier (3 tests):**
- Positive statement → Correctly identified as "no hate speech" ✅
- Movie criticism → Incorrectly flagged, output was "hate speech no" ❌
- Actual hate speech → Misclassified, output was "hate speech no" ❌

## Variations That Didn't Work Well

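The hate speech classifier struggled with Tests 5 and 6, producing the confusing output "hate speech no" instead of a clear label. Both the movie criticism and the actual hate speech received the same output, so the model did not distinguish between negative sentiment and actual hate speech. These tests used `max_new_tokens=5` rather than the 3 used earlier, so the trailing "no" may be extra tokens generated after the label. Five training epochs were probably not enough for this task; the logged training loss only fell from 3.27 to 3.13.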
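One way to make such outputs easier to score is to separate the leading label from whatever the model generates after it. The helper below is a hypothetical sketch that assumes the two label strings seen in the outputs above, "hate speech" and "no hate speech"; whether the leftover text should invalidate the prediction remains a judgment call.

```python
LABELS = ("no hate speech", "hate speech")   # longer label first, so it is tried before the shorter one

def split_label(decoded, marker="Label :"):
    """Return (leading_label_or_None, leftover_text) from a decoded generation."""
    completion = decoded.split(marker, 1)[-1].strip()
    for label in LABELS:
        if completion.startswith(label):
            return label, completion[len(label):].strip()
    return None, completion

print(split_label("Sentence : Everyone deserves respect and kindness. Label : no hate speech"))
print(split_label("Sentence : That movie was terrible and a waste of time. Label : hate speech no"))
```

For the second sentence the helper reports the leading label together with the leftover "no", which makes the ambiguity explicit instead of hiding it.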

## What I Learned

PEFT is highly efficient: prompt tuning trained only about 0.0037% of the parameters. Prompt generation worked well, but hate speech classification struggled with the small model and limited training. Complex tasks may need more epochs and larger models. Multiple adapters can be loaded on one base model. Prompt tuning worked well for the simpler task here but showed limitations with nuanced classification.