In this notebook, we'll explore few-shot learning with [Flan-T5-Small](https://huggingface.co/docs/transformers/model_doc/flan-t5), a [T5 language model](https://jmlr.org/papers/volume21/20-074/20-074.pdf) that has been instruction-finetuned on a variety of tasks. The model is openly available and small enough to fit within the memory and processing constraints of a laptop. Can you create a new classification task and design prompts that differentiate between its classes? You can run this notebook on your laptop or on Colab.

In [None]:
!pip install transformers

In [None]:
import torch
from torch.nn import functional as F

In [None]:
from transformers import pipeline

In [None]:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

## Flan-T5


In [None]:
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

In [None]:
def classify_with_prompt_t5(user_prompt, labels):
    prompt=f"""You're a helpful assistant for text classification. You will pick one of the labels:
{labels}
Output only the label; nothing more. Here are some examples:
{user_prompt}
"""
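    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # T5 starts decoding from the pad token, so a single pad token is the
    # decoder input for predicting the first output token.
    decoder_input_ids = torch.full(
        size=(input_ids.size(0), 1),
        fill_value=tokenizer.pad_token_id,
        dtype=torch.long,
    )
    with torch.no_grad():
        outputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
        # Logits over the vocabulary for the first generated token.
        completion_layer = outputs.logits[0, -1, :]

    # Convert logits to probabilities and find the most likely first token.
    probabilities = F.softmax(completion_layer, dim=-1)
    pred_idx = torch.argmax(probabilities).item()
    pred_token = tokenizer.decode(pred_idx)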

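    # Only the first generated token is scored, so each label must tokenize
    # to a single token followed by the end-of-sequence token.
    label_ids = []
    for label in labels:
        token_ids = tokenizer(label).input_ids
        assert len(token_ids) == 2, f"label {label!r} is not a single token"
        label_ids.append(token_ids[0])

    # Print the labels ranked by the probability of their token.
    sorted_args = torch.argsort(probabilities[label_ids], descending=True).tolist()
    for arg in sorted_args:
        print("%.6f\t%s" % (probabilities[label_ids[arg]], labels[arg]))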

    print("\nCompletion with highest probability:\n")
    print("Prompt: \n###\n%s\n###\n" % user_prompt)
    print("Output: \n###\n%s\n###" % pred_token)
    #print(user_prompt + ' ' + pred_token)
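
The scoring step in `classify_with_prompt_t5` can be hard to follow inside the tensor code, so here is a minimal sketch of the same idea in plain Python: apply a softmax over the logits for the whole vocabulary, then read off and rank the probabilities at the label token ids. The helper names `softmax` and `rank_labels`, the five-token toy vocabulary, and the label ids 1 and 3 are all made up for illustration; they are not part of the notebook or of `transformers`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(logits, label_ids, labels):
    """Return (label, probability) pairs sorted by descending probability."""
    probs = softmax(logits)
    scored = [(label, probs[i]) for label, i in zip(labels, label_ids)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vocabulary of 5 tokens; ids 1 and 3 stand in for the
# single-token labels "positive" and "negative".
toy_logits = [0.0, 2.0, 1.0, 0.5, -1.0]
ranking = rank_labels(toy_logits, label_ids=[1, 3], labels=["positive", "negative"])
for label, p in ranking:
    print(f"{p:.6f}\t{label}")
```

Note that the label probabilities need not sum to 1, because the softmax is taken over the whole vocabulary and other tokens keep some of the probability mass. This is also why the highest-probability completion printed by the function can be a token that is not one of your labels.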

In [None]:
prompt = """X: I love this movie
Y: positive

X: I hate the movie
Y: negative

X: I kind of like the movie
Y: positive

X: This is one of the best movies I've ever seen
Y:"""

classify_with_prompt_t5(prompt, ["positive", "negative"])

In [None]:
prompt = """X: Vampires take over the planet during an eclipse
Y: Horror

X: Two friends switch bodies and live each other's lives
Y: Comedy

X: John turns into a werewolf during a full moon
Y: Horror

X: John is a werewolf who plays basketball
Y: Comedy

X: A court sentences George to be Jerry's butler
Y: Comedy

X: A virus outbreak turns everyone into zombies
Y:"""

classify_with_prompt_t5(prompt, ["Horror", "Comedy"])

In [None]:
prompt = """Q: This is a text
A: English

Q: Nel mezzo del cammin' di nostra vita
A: Italian

Q: Je ne sais pas
A:"""

classify_with_prompt_t5(prompt, ["English", "Italian", "French", "Spanish", "Japanese"])

**Q1**. Your job is to create a new classification task using prompt design (as in the examples above). You are free to consider binary or multiclass classification; keep in mind that you have ~1000 tokens to use as a prompt for Flan-T5, so be sure to provide enough answered prompts for each class. (Note that the model is not required to perform *well*, since we want to assess what is and isn't learnable, but give it every opportunity to do so.) Create 5 test examples to assess whether Flan-T5 is able to recognize the class given your fixed prompt. To take the language ID task above, one test example corresponds to one prediction you make for the same set of answered prompts; the following constitutes two test examples for that task:

1.)

```
prompt = """Q: This is a text
A: English

Q: Nel mezzo del cammin' di nostra vita
A: Italian

Q: Je ne sais pas
A:"""
```

2.)

```
prompt = """Q: This is a text
A: English

Q: Nel mezzo del cammin' di nostra vita
A: Italian

Q: Non lo so
A:"""
```
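
Since every test example reuses the same answered prompts and changes only the final query, one way to avoid copy-paste errors is to assemble the prompts programmatically. The sketch below is only a suggestion: `build_prompt` and its `q_tag`/`a_tag` parameters are hypothetical names, not part of the notebook.

```python
def build_prompt(answered, query, q_tag="Q", a_tag="A"):
    """Join fixed answered prompts with one unanswered query.

    answered: list of (text, label) pairs shared by every test example.
    query:    the text whose label the model should predict.
    """
    blocks = [f"{q_tag}: {text}\n{a_tag}: {label}" for text, label in answered]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"{q_tag}: {query}\n{a_tag}:")
    return "\n\n".join(blocks)


# The fixed set of answered prompts from the language ID task.
answered = [
    ("This is a text", "English"),
    ("Nel mezzo del cammin' di nostra vita", "Italian"),
]

# Each query yields one test example.
test_queries = ["Je ne sais pas", "Non lo so"]
prompts = [build_prompt(answered, q) for q in test_queries]

for p in prompts:
    print(p)
    print("-----")
```

Each string in `prompts` can then be passed to `classify_with_prompt_t5` together with your list of labels. For the sentiment and genre examples, which use `X:` and `Y:`, pass `q_tag="X"` and `a_tag="Y"`.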