# Soft Prompting & Prompt Tuning Demo
In this notebook you will:
1. Understand the theory behind **soft prompts** (embedding‑based learned prompts).
2. Experiment with **Prompt Tuning** using the PEFT library (the related **Prefix‑Tuning** and **P‑Tuning v2** methods are only mentioned in the take‑aways).
3. Visualize how learned prompt embeddings steer generation compared to hand‑written prompts.
---
**Pedagogical goal:** illustrate how *only a handful of virtual tokens* can adapt a frozen model to a new task.
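
To make the idea concrete, the sketch below shows what "virtual tokens" mean at the embedding level: M learned vectors are placed in front of the embeddings of the real tokens, and only those M vectors would be updated during training. It uses plain Python lists and a toy hidden size of 4 instead of real tensors. `prepend_soft_prompt` and all values are illustrative assumptions, not part of PEFT.

```python
# Minimal sketch of soft prompting at the embedding level (toy sizes, plain lists).
# `prepend_soft_prompt` is a hypothetical helper for illustration, not a PEFT API.

def prepend_soft_prompt(soft_prompt, token_embeddings, attention_mask):
    """Return (embeddings, mask) with the M virtual-token vectors placed first."""
    hidden = len(soft_prompt[0])
    if any(len(vec) != hidden for vec in soft_prompt + token_embeddings):
        raise ValueError("all vectors must share the same hidden size")
    extended_embeddings = soft_prompt + token_embeddings
    # Virtual tokens are always attended to, so their mask entries are 1.
    extended_mask = [1] * len(soft_prompt) + attention_mask
    return extended_embeddings, extended_mask

# M = 2 trainable virtual tokens, hidden size 4 (GPT-2 small uses 768).
soft_prompt = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.5, 0.1]]
# T = 3 frozen embeddings looked up for the real input tokens.
token_embeddings = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
attention_mask = [1, 1, 1]

embeddings, mask = prepend_soft_prompt(soft_prompt, token_embeddings, attention_mask)
print(len(embeddings), mask)  # sequence length grows from T to M + T
```

The frozen model then processes the extended sequence as if the virtual tokens were ordinary input, which is why no model weights need to change.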

## 1. Setup
Colab may already have many of these packages. We install/upgrade the minimum set.

In [None]:
!pip -q install --upgrade transformers peft datasets accelerate bitsandbytes sentencepiece
# Restart the runtime after the first run if PEFT was newly installed.

## 2. Load a base model and small dataset
We will fine‑tune *only* the prompt embeddings on **SST‑2** (binary sentiment).

In [None]:
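import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = 'gpt2'
# Namespaced Hub id of SST-2; columns are `idx`, `sentence`, and `label`.
dataset = load_dataset('stanfordnlp/sst2', split='train[:200]')
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Full precision is enough for GPT-2 small; 8-bit loading would additionally
# require bitsandbytes and a supported GPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)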


### Formatting helper
Convert each SST‑2 example into the typical *instruction → answer* pattern.

In [None]:
def format_example(ex):
    # Store the label word under a new key so the integer `label` column stays untouched.
    target = 'positive' if ex['label'] == 1 else 'negative'
    return {'text': f"Review: {ex['sentence']}\nSentiment:", 'target': target}

dataset = dataset.map(format_example)
print(dataset[0])

## 3. Configure PEFT Prompt Tuning
We freeze the model and train **M=20** virtual prompt tokens.
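
As a back-of-the-envelope check on how small the trainable part is, the sketch below counts the soft-prompt parameters. It assumes the small `gpt2` checkpoint with hidden size 768 and roughly 124M parameters in total; `soft_prompt_param_count` is a hypothetical helper, and the result should only be close to, not identical with, what PEFT reports.

```python
# Rough count of trainable parameters for prompt tuning.
# Assumed values: hidden size 768 and ~124M total parameters for the small `gpt2` checkpoint.

def soft_prompt_param_count(num_virtual_tokens, hidden_size):
    """Each virtual token is one trainable vector of length `hidden_size`."""
    return num_virtual_tokens * hidden_size

GPT2_HIDDEN_SIZE = 768
GPT2_TOTAL_PARAMS = 124_000_000  # approximate size of the frozen base model

trainable = soft_prompt_param_count(20, GPT2_HIDDEN_SIZE)
fraction = trainable / (GPT2_TOTAL_PARAMS + trainable)
print(f"trainable: {trainable:,} ({100 * fraction:.4f}% of all parameters)")
```

Doubling the number of virtual tokens doubles the trainable count, but it also lengthens every input sequence by the same number of positions.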

In [None]:
from peft import PromptTuningConfig, get_peft_model
peft_config = PromptTuningConfig(task_type='CAUSAL_LM', num_virtual_tokens=20)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

### Training loop (tiny – just to illustrate)
We batch the examples with a PyTorch `DataLoader`. With 200 examples and a batch size of 4, one epoch is 50 optimizer steps, which suffices for a demonstration.
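
When training a causal LM on prompt/answer pairs, a common approach is to concatenate both into one sequence and compute the loss on the answer tokens only. The sketch below illustrates that label construction with plain lists of token ids. `build_labels` and the toy ids are assumptions for illustration; `-100` is the value that PyTorch's cross-entropy loss ignores by default.

```python
# Sketch of label construction for causal-LM fine-tuning, using plain lists of token ids.
# `build_labels` is a hypothetical helper; the ids below are made up.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def build_labels(prompt_ids, answer_ids, max_len, pad_id):
    """Concatenate prompt and answer, right-pad to max_len, and mask non-answer positions."""
    input_ids = prompt_ids + answer_ids
    if len(input_ids) > max_len:
        raise ValueError("sequence longer than max_len")
    n_pad = max_len - len(input_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + answer_ids + [IGNORE_INDEX] * n_pad
    attention_mask = [1] * len(input_ids) + [0] * n_pad
    input_ids = input_ids + [pad_id] * n_pad
    return input_ids, attention_mask, labels

ids, mask, labels = build_labels([11, 12, 13], [42], max_len=6, pad_id=0)
print(ids)     # [11, 12, 13, 42, 0, 0]
print(mask)    # [1, 1, 1, 1, 0, 0]
print(labels)  # [-100, -100, -100, 42, -100, -100]
```

Labels and input ids have the same length here because Hugging Face causal-LM heads shift the labels internally before computing the loss.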

In [None]:
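import torch
from torch.utils.data import DataLoader

BATCH = 4
LABEL_WORDS = {0: 'negative', 1: 'positive'}
# GPT-2 has no padding token by default; reuse EOS so that padding=True works.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def collate(batch):
    prompts = [b['text'] for b in batch]
    # Use a string target if the example has one, otherwise map the integer class id.
    words = [b.get('target') or LABEL_WORDS.get(b['label'], b['label']) for b in batch]
    # A causal LM needs prompt and answer in one sequence (right-padded by default).
    enc = tokenizer([f'{p} {w}' for p, w in zip(prompts, words)], return_tensors='pt', padding=True)
    labels = enc['input_ids'].clone()
    labels[enc['attention_mask'] == 0] = -100  # no loss on padding
    for i, p in enumerate(prompts):
        labels[i, :len(tokenizer(p)['input_ids'])] = -100  # no loss on the prompt
    enc['labels'] = labels
    return enc

loader = DataLoader(dataset, batch_size=BATCH, shuffle=True, collate_fn=collate)
# Only the virtual-token embeddings require gradients; the base model stays frozen.
optim = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=5e-4)
model.train()
for step, batch in enumerate(loader):  # 200 examples / batch size 4 = 50 steps
    optim.zero_grad()
    batch = {k: v.to(model.device) for k, v in batch.items()}
    loss = model(**batch).loss
    loss.backward()
    optim.step()
    if step % 10 == 0:
        print(f'Step {step}, loss {loss.item():.3f}')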

## 4. Qualitative comparison
Let's compare zero‑shot vs prompt‑tuned generation.
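
To go from eyeballing generations to a rough number, one option is to parse the generated continuation into a label and score it against the gold labels. The sketch below assumes the `Review: ...\nSentiment:` prompt format used in this notebook; `extract_sentiment` and `accuracy` are hypothetical helpers, and the example strings are made up rather than real model output.

```python
# Sketch: turn decoded generations into labels and score them.
# `extract_sentiment` and `accuracy` are hypothetical helpers, and the example
# strings are invented for illustration, not actual GPT-2 output.

def extract_sentiment(decoded, marker="Sentiment:"):
    """Return 'positive', 'negative', or None based on the text after the last marker."""
    completion = decoded.rsplit(marker, 1)[-1].strip().lower()
    for label in ("positive", "negative"):
        if completion.startswith(label):
            return label
    return None

def accuracy(predictions, gold):
    """Fraction of predictions matching the gold labels; unparsable outputs count as wrong."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

decoded_outputs = [
    "Review: An awesome movie\nSentiment: positive",
    "Review: A dull mess\nSentiment: the film is",
]
preds = [extract_sentiment(d) for d in decoded_outputs]
print(preds, accuracy(preds, ["positive", "negative"]))
```

Counting unparsable continuations as errors keeps the comparison honest, since a zero-shot model will often ramble instead of emitting a label word.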

In [None]:
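def generate(review, use_soft_prompt=True):
    prompt = f'Review: {review}\nSentiment:'
    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
    # get_base_model() returns the frozen GPT-2 without virtual tokens (zero-shot baseline).
    lm = model if use_soft_prompt else model.get_base_model()
    with torch.no_grad():
        out = lm.generate(**inputs, max_new_tokens=3, pad_token_id=tokenizer.eos_token_id)
    tag = 'prompt-tuned' if use_soft_prompt else 'zero-shot'
    print(f'[{tag}] ' + tokenizer.decode(out[0], skip_special_tokens=True))

model.eval()
for review in ['An awesome, heart-warming movie', 'A dull, predictable mess']:
    generate(review, use_soft_prompt=False)
    generate(review, use_soft_prompt=True)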

## 5. Visualise learned prompt embeddings (t‑SNE)
We project the 20 virtual token embeddings into 2‑D.
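
A t-SNE layout of only 20 points can change noticeably between random seeds, so a direct similarity check may be a useful complement to the plot. The sketch below finds the most similar pair of vectors by cosine similarity using plain Python. `cosine_similarity`, `most_similar_pair`, and the toy vectors are illustrative assumptions; in the notebook the rows of the learned prompt matrix would take their place.

```python
import math

# Sketch: pairwise cosine similarity as a seed-independent complement to t-SNE.
# The helpers and toy vectors are hypothetical stand-ins for the learned prompt rows.

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def most_similar_pair(vectors):
    """Return (i, j, similarity) for the two distinct rows with the highest cosine similarity."""
    best = None
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine_similarity(vectors[i], vectors[j])
            if best is None or sim > best[2]:
                best = (i, j, sim)
    return best

toy_prompt = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
first, second, sim = most_similar_pair(toy_prompt)
print(first, second, round(sim, 3))
```

Virtual tokens that end up nearly parallel suggest redundancy, which can be a hint that fewer virtual tokens would do.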

In [None]:
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

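# The learned soft prompt is stored in PEFT's prompt encoder ('default' is the adapter name
# PEFT assigns when none is given); the model's input embedding matrix only holds vocabulary tokens.
embeds = model.prompt_encoder['default'].embedding.weight.detach().cpu().numpy()
coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(embeds)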
plt.scatter(coords[:,0], coords[:,1])
for i,(x,y) in enumerate(coords):
    plt.text(x, y, str(i))
plt.title('Virtual prompt token embedding space')
plt.show()

### Take‑aways
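- **Soft prompts** are learned continuous vectors prepended to the input embeddings; they steer the frozen LM without changing any of its weights.
- We adapted GPT‑2 to binary sentiment by training about **0.01%** of the parameters (20 × 768 = 15,360 prompt weights versus roughly 124M frozen ones).
- Related parameter‑efficient methods: **Prefix‑Tuning**, **P‑Tuning v2**, **LoRA**.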

*Play with dataset size, number of virtual tokens, and model family!*