Welcome to day 5 of the *Cognitive Science Summer School 2022*! 


We had a full week! From masked language modeling to left-to-right models, we've covered today's state-of-the-art NLP models. Today, we want to show you a new way to interact with these large language models and have them perform tasks they were never explicitly trained for! This paradigm is known as:

> **Zero-shot learning**: the prompt contains no examples of how to perform the task.

and 

> **Few-shot learning**: the prompt contains a few examples of how to perform the task.

## Agenda


1. GPT-3
2. T0
3. [Optional] Tk-Instruct

# [GPT3] Zero- and few-shot learning

## Setup

In [None]:
%%time
%%capture
!pip install openai
!pip install rich

import openai
openai.api_key = ... # TODO: Set OPEN AI's API key with read access

# Visuals
from rich.console import Console
from rich.text import Text

# Console for printing with nice colors :)
console = Console(width=80)

CPU times: user 178 ms, sys: 41.5 ms, total: 220 ms
Wall time: 28.4 s


In [None]:
def _gpt3_request(prompt:str, model_name: str, max_tokens: int=100, temperature: float=0.7, **kwargs) -> str:
  # ---------------------------------------------
  # Arguments Validation
  # ---------------------------------------------
  err_msg = "The number of tokens is above 1000. Try shortening the input or reducing the 'max_tokens'."
  assert len(prompt.split()) + max_tokens < 1000, err_msg
  assert 0 <= temperature <= 1, "ValueError: Try setting temperature in [0, 1]." 

  # ---------------------------------------------
  # Send request to OpenAI
  # ---------------------------------------------
  response = openai.Completion.create(
    model=model_name,
    prompt=prompt,
    temperature=temperature,
    max_tokens=max_tokens,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    **kwargs
  )
  response = response["choices"][0].text
  return response


def gpt3_generate(prompt: str, max_tokens: int=100, model_name: str="babbage", **kwargs):
  """Gets a GPT-3 generation for the prompt you provided."""
  if model_name in ("ada", "babbage", "curie"):
    fullname = f"text-{model_name}-001"
  elif model_name == "davinci":
    fullname = f"text-{model_name}-002"
  else:
    raise ValueError("Invalid model name for GPT-3")

  text = _gpt3_request(prompt, fullname, max_tokens=max_tokens, **kwargs)
  console.print(f"[magenta]({model_name}) [bright_black]{prompt}[cyan]{text}")

## Numerical Reasoning

Alright! An interesting set of tasks to start with is numerical reasoning. These include addition, multiplication, and unit conversion tasks. Examples of these tasks are presented in the table below: 

| Task   | Prompt Template |
| ------ | --------------- |
| Addition | `Q: What is x1 plus x2? A: y` | 
| Multiplication | `Q: What is x1 times x2? A: y` |
| Min -> Sec | `Q: What is x1 minutes in seconds? A: y` |
| Hour -> Min | `Q: What is x1 hours in minutes? A: y` |
| Month -> Weeks | `Q: What is x1 months in weeks? A: y` |

where `x1` and `x2` are integers you specify and `y` is the answer. In zero-shot learning, `y` is left for the model to generate; in few-shot learning, you also provide the expected `y` for each example in the prompt.
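To make the templates concrete, here is a minimal sketch of a prompt builder. The names `ARITHMETIC_TASKS` and `build_prompt` are hypothetical (they are not part of the notebook), and the months-to-weeks template is left out because its answer is not an exact integer.

```python
# Hypothetical helper, not part of the original notebook.
# Each entry pairs a question template with a function computing the answer.
ARITHMETIC_TASKS = {
    "addition": ("What is {} plus {}?", lambda a, b: a + b),
    "multiplication": ("What is {} times {}?", lambda a, b: a * b),
    "min2sec": ("What is {} minutes in seconds?", lambda m: m * 60),
    "hour2min": ("What is {} hours in minutes?", lambda h: h * 60),
}


def build_prompt(task, query, examples=()):
    """Builds a zero-shot prompt (no examples) or a few-shot prompt (with examples).

    `query` and every item in `examples` are tuples of integers, e.g. (13, 18).
    """
    question, solve = ARITHMETIC_TASKS[task]
    # Solved examples end with a period so that `stop="."` can cut generation there.
    lines = [f"Q: {question.format(*args)} A: {solve(*args)}." for args in examples]
    # The final question is left open for the model to complete.
    lines.append(f"Q: {question.format(*query)} A:")
    return "\n".join(lines)


zero_shot = build_prompt("addition", (13, 18))
few_shot = build_prompt("addition", (13, 18), examples=[(24, 10), (2, 18)])
print(zero_shot)
print(few_shot)
```

Computing the example answers in code, rather than typing them by hand, avoids accidentally feeding the model a wrong demonstration.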


In [None]:
prompt = """Q: What is 5 plus 5? A:"""
gpt3_generate(prompt, stop=".")
# Note: We add the `stop` argument because we're only interested in the output
# up until the very first period. No need to generate beyond that as we would
# be wasting resources!

In [None]:
prompt = """Q: What is 5 times 5? A: """
gpt3_generate(prompt, stop=".")
# Note: We add the `stop` argument because we're only interested in the output
# up until the very first period. No need to generate beyond that as we would
# be wasting resources!

The model seems to fare pretty badly. Let us try a slightly larger model called `curie`.

In [None]:
prompt = """Q: What is 5 plus 5? A:"""
gpt3_generate(prompt, model_name="curie", stop=".")

prompt = """Q: What is 5 times 5? A:"""
gpt3_generate(prompt, model_name="curie", stop=".")

Alright, it seems `curie` is better at Math than `babbage` (the default GPT-3 model variant we were using).

In [None]:
prompt = """Q: What is 13 plus 18? A:"""
gpt3_generate(prompt, model_name="curie", stop=".")

Well, if we complicate things a bit, `curie` starts... FAILING... 😕 We have (at least) three alternatives:
1. Use a larger model (in this case, we could use `davinci`);
2. Use **in-context learning** (also known as few-shot learning); 
3. Improve the prompt (also called **prompt engineering**).


Since the first is just about setting `model_name="davinci"`, we will instead focus on the latter two options. Let us start with 4-shot learning! We will add 4 examples of numerical reasoning:

In [None]:
prompt = """Q: What is 24 plus 10? A: 34.
Q: What is 2 plus 18? A: 20.
Q: What is 3 plus 5? A: 8.
Q: What is 10 plus 22? A: 32.
Q: What is 13 plus 18? A:"""
gpt3_generate(prompt, model_name="curie", stop=".")

That's a bit better! At least the output is consistent 😅! Before giving up on `curie` we will try some *positive reinforcement*. Let us **tweak the prompt**! We will add the following:

> I'm great at math and addition. The following addition operations are easy.

Feel free to try your own prompt ideas! *Getting the right prompt is half of the success, did you know that?* 

**Note**: Even if the result is not correct, we invite you to play with different prompts and see how that affects the "*confidence*" of the model by running it multiple times and getting a sense of its consistency.
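One rough way to put a number on that consistency is to run the same prompt several times and check how often the most common answer appears. The sketch below assumes a callable that *returns* the generated text; `gpt3_generate` only prints it, so you would need to wrap `_gpt3_request` yourself. The `consistency` helper and the canned outputs are hypothetical and only stand in for real model calls.

```python
from collections import Counter
import itertools


def consistency(generate, prompt, n_runs=5):
    """Calls `generate(prompt)` n_runs times.

    Returns the most frequent answer and the fraction of runs that produced it.
    """
    answers = [generate(prompt).strip() for _ in range(n_runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_runs


# Stand-in for a real model call: cycles through made-up outputs.
fake_outputs = itertools.cycle([" 31", " 31", " 30", " 31", " 21"])


def fake_generate(prompt):
    return next(fake_outputs)


answer, share = consistency(fake_generate, "Q: What is 13 plus 18? A:")
print(f"Most common answer: {answer!r} ({share:.0%} of runs)")
```

A share close to 100% suggests the model is "confident" in its answer, whether or not that answer is right.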


In [None]:
prompt = """I'm great at math and addition. The following addition operations are easy.
Q: What is 24 plus 10? A: 34.
Q: What is 2 plus 18? A: 20.
Q: What is 3 plus 5? A: 8.
Q: What is 10 plus 22? A: 32.
Q: What is 13 plus 18? A:"""
gpt3_generate(prompt, model_name="curie")

*Hum...* This isn't looking good! 😕 

Alright, time to add more examples! Let us try **8-shot learning** and we will try the `babbage` model again! Maybe we can get it to work :) 

In [None]:
# Now leveraging 8-shots, i.e., 8 examples in context
prompt = """Q: What is 24 plus 10? A: 34.
Q: What is 2 plus 18? A: 20.
Q: What is 4 plus 9? A: 13.
Q: What is 10 plus 22? A: 32.
Q: What is 3 plus 3? A: 6.
Q: What is 6 plus 5? A: 11.
Q: What is 12 plus 16? A: 28.
Q: What is 13 plus 22? A: 35.
Q: What is 13 plus 18? A:"""
gpt3_generate(prompt, stop=".")

### Exercise

Let us try something different, now! Can you think of any prompt? Try first defining it as a 0-shot task and see how the model performs. Then, define a few examples and create a prompt including those examples. Does it improve?

(**Suggestion**: On the same line of thought, we can experiment with the pattern `Q: What is x minutes in seconds? A:` or `Q: What is x weeks in days? A:`)

In [None]:
# prompt = ... # TODO: Define 0-shot prompt
# gpt3_generate(prompt)

In [None]:
# prompt = ... # TODO: Define few-shot prompt
# gpt3_generate(prompt)

### Extra

If you're interested in knowing more about why large language models seem so good at some temporal and numerical reasoning examples and so bad at others, take a look at [this paper](https://arxiv.org/pdf/2202.07206.pdf), which studies the impact of pretraining term frequencies on few-shot reasoning in GPT-based language models.

## Coreference resolution

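Coreference (or co-reference) resolution is the task of identifying which person or thing different mentions refer to. For example, in 

> Sameer told Catarina that he would bike to work. 

what entity does *he* refer to? We expect a good system to identify *Sameer* as the correct entity, but do current state-of-the-art NLP models do it?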



In [None]:
def get_coref_prompt(text: str) -> str:
  import re
  text = re.sub(r"\s+", " ", text)
  return f"""{text}\nIn the previous sentence, decide who 'his' is referring to."""

In [None]:
prompt = """The lunch my boyfriend and I had with our lawyer got cancelled since his meeting went on too long."""
prompt = get_coref_prompt(prompt)
gpt3_generate(prompt, model_name='davinci')

### Exercise

In [None]:
prompt = ... # TODO: Try out your own sentences
prompt = get_coref_prompt(prompt)
gpt3_generate(prompt)

## Linguistic Acceptability 

The task of determining the linguistic acceptability (or grammaticality) of a sentence aims to test the linguistic competence of the models. A well-known corpus for this task is the *Corpus of Linguistic Acceptability* (CoLA), which comprises 10.6k sentences. In the [original paper](https://arxiv.org/pdf/1805.12471.pdf), the authors state that all models they tested (in 2018) perform far below human level on several grammatical constructions! 
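If you want to score a model on a handful of labeled sentences, CoLA results are usually reported with the Matthews correlation coefficient (MCC), which is more informative than accuracy when the two classes are unbalanced. Below is a minimal sketch in plain Python. The `parse_yes_no` helper is hypothetical and assumes the model's answer starts with "yes" or "no", which will not always hold for free-form generations.

```python
import math


def parse_yes_no(answer):
    """Maps a free-text answer to 1 (acceptable) or 0 (unacceptable).

    Assumption: anything that does not start with "yes" counts as 0.
    """
    return 1 if answer.strip().lower().startswith("yes") else 0


def matthews_corrcoef(gold, pred):
    """MCC for binary labels; returns 0.0 when the denominator is zero."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    tn = sum(g == 0 and p == 0 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up gold labels and model answers, for illustration only.
gold = [1, 1, 0, 0]
answers = ["Yes, it is.", "No.", "no", " No, it is not correct."]
pred = [parse_yes_no(a) for a in answers]
mcc = matthews_corrcoef(gold, pred)
print(f"MCC = {mcc:.3f}")
```

An MCC of 1 means perfect agreement with the gold labels, 0 is what you would expect from guessing, and negative values indicate systematic disagreement.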

Let us try GPT-3!!

In [None]:
def get_cola_prompt(text: str, template: int=1) -> str:
  templates = {
      1: lambda s: f"""Is this correct according to English grammar?\n\n{s}""",
      2: lambda s: f"""Is this sentence grammatically correct or not? {s}"""
  }
  import re
  text = re.sub(r"\s+", " ", text)
  return templates[template](text)

In [None]:
sentence = "The book devoured the pencil." 
prompt = get_cola_prompt(sentence)
gpt3_generate(prompt)

In [None]:
sentence = "They can sing."
prompt = get_cola_prompt(sentence)
gpt3_generate(prompt)

In [None]:
sentence = "The dog barked its way out of the room."
prompt = get_cola_prompt(sentence)
gpt3_generate(prompt, temperature=1)
print("\n\n")
gpt3_generate(prompt, model_name="curie")






In [None]:
sentence = "They caused him to become angry by making him."
prompt = get_cola_prompt(sentence)
gpt3_generate(prompt)
print("\n\n")
gpt3_generate(prompt, model_name="curie")






### Exercise

Try it for yourself 😁
Feel free to add more prompt templates to the function `get_cola_prompt` and see how they impact the model's performance!

In [None]:
sentence = ... ## TODO: type down a grammatically correct/incorrect sentence
prompt = get_cola_prompt(sentence, template=1)
gpt3_generate(prompt)

### [Extra] Syntax trees

In this small section, we'd like to point out a syntax tree example that one of the students shared with us yesterday :) 

Note that the model was not trained to perform this kind of task, and its ability to infer the task from a natural language description is amazing!

In [None]:
prompt = """draw a syntactic tree for the following sentence: \n\nI like food."""
gpt3_generate(prompt)

Try to add a few examples in context to try to formalize the answer's format :) or try to specify it in the prompt, for example, by adding something like: 


In [None]:
sentence = "I like food"
prompt = f"""A two-column spreadsheet of a syntactic tree for each word in this sentence: \n\n{sentence}."""
gpt3_generate(prompt)

## [Generative] Creating analogies

In this part, we will use GPT-3's generation capabilities to generate analogies for any sentence we specify. Try running the same prompt multiple times. Because generation is random, the model might not produce interesting results at first but end up with good generations after a few tries :) 


If this does not work, you can always provide a few examples (use **few-shot learning**).
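The randomness comes from sampling the next token, and the `temperature` argument of `_gpt3_request` (0.7 by default) controls how adventurous that sampling is. The toy sketch below shows the usual temperature-scaled softmax on made-up scores for three candidate tokens. It illustrates the general idea only and is not OpenAI's actual decoding code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turns raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
for t in (1.0, 0.7, 0.2):
    probs = softmax_with_temperature(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

With a low temperature almost all probability mass goes to the top-scoring token, so repeated runs look alike; with a higher temperature the alternatives get sampled more often, which is what you want for creative tasks like analogies.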

In [None]:
metaphor = "Questions are arrows"
prompt = f"""Create an analogy for this phrase:\n\n{metaphor} in that:"""
gpt3_generate(prompt, model_name="curie")

In [None]:
metaphor = "You are an early bird"
prompt = f"""Create an analogy for this phrase:\n\n{metaphor} in that:"""
gpt3_generate(prompt, model_name="davinci")

### Exercise

Try it yourself :) 

In [None]:
metaphor = ... # TODO: type down the metaphor
prompt = f"""Create an analogy for this phrase:\n\n{metaphor} in that:"""

gpt3_generate(prompt, model_name="curie")

## [Generative] Text2Question

Let us use GPT-3 to generate questions automatically for different prompts. Here are a few examples 😜

In [None]:
sentence = "I hide beneath the bed when I think of E.T."
prompt = f"""Write a question for this sentence:\n\n{sentence}"""
gpt3_generate(prompt, stop="?")
print("\n\n")
sentence = "When I am angry with the world, I hit people with a teddy bear."
prompt = f"""Write a question for this sentence:\n\n{sentence}"""
gpt3_generate(prompt, stop="?")






### Exercise 


In [None]:
prompt = """Create a list of 6 questions for my interview with a toaster:\n\n"""
gpt3_generate(prompt)

## [Generative] Other tasks

In this section, we leave examples of prompts that you can play with. The models were, most likely, never explicitly trained on these tasks, and it's amazing that they are able to produce such results.

### Movie2Emoji

In [None]:
prompt = """Convert movie titles into emoji.

Back to the Future: 👨👴🚗🕒 
Batman: 🤵🦇 
Transformers: 🚗🤖 
Jurassic Park:
"""
gpt3_generate(prompt, model_name="davinci")

### Content creator

Create lists, spreadsheets in the format you desire.

In [None]:
prompt = """A two-column spreadsheet of top science fiction movies and the year of release:

Title |  Year of release"""
gpt3_generate(prompt)

In [None]:
prompt = """Make a list of the most important philosophers in history.

1."""
gpt3_generate(prompt, model_name="curie")

In [None]:
prompt = """Make a list of the most important philosophers in history.
Early Greek philosophers (Name, estimated time):\n"""
gpt3_generate(prompt, model_name="curie")

### Study guide 

You'll never be alone with GPT-3. He will always have something to help you with during your studies *ehehe*

In [None]:
prompt = """What are 5 key points I should know when studying contemporary art?"""
gpt3_generate(prompt)

### Content Rating

In [None]:
prompt = """Provide an ESRB rating for the following text:

"i'm going to blow your brains out with my ray gun then stomp on your guts."

ESRB rating:"""
gpt3_generate(prompt)

# [T0] Zero-shot and few-shot learning

We had a lot of fun with GPT-3, but let us explore the capabilities of other models that came out after it. In this section, we will explore **T0**, a model first released in October 2021 that is not only 16x smaller than GPT-3 but also outperforms it on many tasks. 

It is known for its impressive **zero-shot task generalization** on English natural language prompts! 

<img src="https://raw.githubusercontent.com/yongzx/bigscience-workshop.github.io/gh-pages/en/pages/uploads/images/Octopus.png" alt="High level view scheme of T0 training scheme" style="height: 100px;"/>



## Setup

In the [original paper](https://arxiv.org/pdf/2110.08207.pdf), the authors analyse several versions of T0 and find that the variant **T0++** yields better results on average. We will use Hugging Face and load the corresponding model, which is named `t0pp`! 

The model contains 11B parameters overall, which is pretty big and won't load in the non-premium notebook. Hence, we set up an API for you to access these models!
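For reference, a request to a hosted model is just a JSON body sent over HTTP. The sketch below builds such a body without sending anything. The `parameters` and `options` fields (including `wait_for_model`) follow our understanding of the Hugging Face hosted Inference API at the time of writing, so treat them as assumptions and check the current documentation; `build_payload` itself is a hypothetical helper.

```python
import json


def build_payload(prompt, wait_for_model=True, **parameters):
    """Builds the JSON body for a text-generation request (nothing is sent)."""
    payload = {"inputs": prompt}
    if parameters:
        # Assumption: generation settings are nested under "parameters".
        payload["parameters"] = parameters
    # Assumption: "wait_for_model" asks the service to wait while a large
    # model is still loading instead of returning an error right away.
    payload["options"] = {"wait_for_model": wait_for_model}
    return json.dumps(payload)


body = build_payload("Q: What is 5 plus 5? A:", temperature=0.7)
print(body)
```

Large models such as `t0pp` can take a while to load on the server side, which is why a "model is loading" response is worth handling in practice.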

In [None]:
%%time
%%capture
!pip install rich

# Visuals
from rich.console import Console
from rich.text import Text

# Console for printing with nice colors :)
console = Console(width=80)

CPU times: user 88.9 ms, sys: 17.5 ms, total: 106 ms
Wall time: 6.7 s


In [None]:
import json
import requests

HUGGING_FACE_API_KEY = ... # TODO: Set your Hugging Face API token with read access
HEADERS = {"Authorization": f"Bearer {HUGGING_FACE_API_KEY}"}

MODEL2URL = {
    "t0": "https://api-inference.huggingface.co/models/bigscience/T0",
    "t0-3b": "https://api-inference.huggingface.co/models/bigscience/T0_3B",
    "t0p": "https://api-inference.huggingface.co/models/bigscience/T0p",
    "t0pp": "https://api-inference.huggingface.co/models/bigscience/T0pp",
    "tk": "https://api-inference.huggingface.co/models/allenai/tk-instruct-11b-def",
    "tk-3b": "https://api-inference.huggingface.co/models/allenai/tk-instruct-3b-def",
}

def t_generate(prompt, model_name: str="t0pp", debug: bool=False, **kwargs):
  # Validate the model name before looking up its URL
  assert isinstance(model_name, str) and model_name in MODEL2URL, f"Unknown model: {model_name}"
  api_url = MODEL2URL[model_name]

  payload = {"inputs": prompt}
  payload.update(kwargs)
  if debug:
    print(payload)
  data = json.dumps(payload)
  response = requests.request("POST", api_url, headers=HEADERS, data=data)
  response = json.loads(response.content.decode("utf-8"))
  
  if isinstance(response, list):
    response = response[0]["generated_text"]
    console.print(f"[green]({model_name}) [bright_black]{prompt}[cyan] {response}")
  else:
    console.print(response)

## Numerical Reasoning


We will refrain from copying the task introduction. Please head to the GPT-3 Numerical Reasoning section for an introduction to the task. In this case, we will assess T0's ability to perform this task.

In [None]:
# Note: We add the `stop` argument because we're only interested in the output
# up until the very first period. No need to generate beyond that as we would
# be wasting resources!
prompt = """Q: What is 5 plus 5? A:"""
t_generate(prompt, stop=".")
prompt = """Q: What is 5 plus 4? A:"""
t_generate(prompt, stop=".")

In [None]:
prompt = """Q: What is 13 times 2? A:"""
t_generate(prompt, stop=".")
prompt = """Q: What is 2 times 5? A:"""
t_generate(prompt, stop=".")

T0 seems to get some of the math right, some of it wrong. Let's try **few-shot learning**! 

In [None]:
prompt = """Question: What is 24 plus 10? 34
Question: What is 2 plus 18? 20
Question: What is 3 plus 5? 8
Question: What is 10 plus 22? 32
Question: What is 13 plus 18?"""
t_generate(prompt, model_name="t0")

Let us try to boost the model's performance by adding a bit of **positive encouragement** in the prompt. (Feel free to tweak the prompt!)

In [None]:
prompt = """I'm great at math and addition. The following addition operations are easy.
Q: What is 24 plus 10? A: 34.
Q: What is 2 plus 18? A: 20.
Q: What is 3 plus 5? A: 8.
Q: What is 10 plus 22? A: 32.
Q: What is 13 plus 18? A:"""
t_generate(prompt)

In [None]:
# Now leveraging 8-shots, i.e., 8 examples in context
prompt = """Q: What is 24 plus 10? A: 34.
Q: What is 2 plus 18? A: 20.
Q: What is 4 plus 9? A: 13.
Q: What is 10 plus 22? A: 32.
Q: What is 3 plus 3? A: 6.
Q: What is 6 plus 5? A: 11.
Q: What is 12 plus 16? A: 28.
Q: What is 13 plus 22? A: 35.
Q: What is 13 plus 18? A:"""
t_generate(prompt)

### Exercise

Let us try something different, now! Can you think of any prompt? Try first defining it as a 0-shot task and see how the model performs. Then, define a few examples and create a prompt including those examples. Does it improve?

(**Suggestion**: On the same line of thought, we can experiment with the pattern `Q: What is x minutes in seconds? A:` or `Q: What is x weeks in days? A:`)

In [None]:
prompt = ... # TODO: Define 0-shot prompt
t_generate(prompt)

In [None]:
prompt = ... # TODO: Define few-shot prompt
t_generate(prompt)

## Coreference resolution

Coreference (or co-reference) resolution is the task of identifying which person or thing different mentions refer to. For example, in 

> Sameer told Catarina that he would bike to work. 

what entity does *he* refer to? We expect a good system to identify *Sameer* as the correct entity, but do current state-of-the-art NLP models do it?
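The prompt helper in this notebook hardcodes the pronoun it asks about, so it only fits sentences that contain that pronoun. A minimal sketch of a parametrized variant is shown below; `get_coref_prompt_for` is a hypothetical name and not part of the notebook.

```python
import re


def get_coref_prompt_for(text: str, pronoun: str) -> str:
    """Builds a coreference prompt asking about a specific pronoun."""
    # Collapse newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Fail early if the pronoun does not occur as a whole word in the text.
    if not re.search(rf"\b{re.escape(pronoun)}\b", text, flags=re.IGNORECASE):
        raise ValueError(f"The pronoun '{pronoun}' does not appear in the text.")
    return f"{text}\nIn the previous sentence, decide who '{pronoun}' is referring to."


coref_prompt = get_coref_prompt_for("Sameer told Catarina that he would bike to work.", "he")
print(coref_prompt)
```

The whole-word check keeps a pronoun like *he* from matching inside words such as *the*.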



In [None]:
def get_coref_prompt(text: str) -> str:
  import re
  text = re.sub(r"\s+", " ", text)
  return f"""{text}\nIn the previous sentence, decide who 'her' is referring to."""

In [None]:
prompt = """Barack Obama nominated Hilary Clinton as his secretary of state on Monday. 
He chose her because she had foreign affairs experience as a former First Lady."""
prompt = get_coref_prompt(prompt)
t_generate(prompt)

### Exercise

In [None]:
prompt = ... # TODO: Try out your own sentences
prompt = get_coref_prompt(prompt)
t_generate(prompt)

## Linguistic Acceptability 

As before, let us try to use T0 for determining whether the sentences are grammatically correct.

In [None]:
def get_cola_prompt(text: str, template: int=1) -> str:
  templates = {
      1: lambda s: f"Is this correct according to English Grammar?\n\n\"{s}\" Yes or no?",
      2: lambda s: f"Is this sentence grammatically correct or not? {s}",
      3: lambda s: f"Decide whether this sentence is grammatically correct:\n\n{s}",
      4: lambda s: f"""I want to know whether the following sentence is correct.\n{s}\nIs it?"""
  }



  import re
  text = re.sub(r"\s+", " ", text)
  return templates[template](text)

In [None]:
sentence = "The more books I ask to whom he will give, the more he reads." 
prompt = get_cola_prompt(sentence, template=1)
t_generate(prompt)

In [None]:
sentence = "They can sing."
prompt = get_cola_prompt(sentence, template=1)
t_generate(prompt)

In [None]:
prompt = "They can sing. Correct the previous sentence."
t_generate(prompt)

In [None]:
sentence = "They can play the oboe."
prompt = get_cola_prompt(sentence)
t_generate(prompt)

It seems that when the sentence *____ can sing.* starts with a pronoun or an object, T0 is not able to correctly judge whether the sentence is correct. 

In [None]:
sentence = "The dog barked its way out of the room."
prompt = get_cola_prompt(sentence)
t_generate(prompt)
print()
prompt = get_cola_prompt(sentence, template=2)
t_generate(prompt)
print()
prompt = get_cola_prompt(sentence, template=3)
t_generate(prompt)
print()
prompt = get_cola_prompt(sentence, template=4)
t_generate(prompt)







In [None]:
# Let us see if we can get an idea of why the model thinks that the sentence is correct
sentence = "The dog barked its way out of the room."
prompt = f"{sentence} Correct the previous sentence."

t_generate(prompt)

prompt = f"Correct this: {sentence}"
t_generate(prompt)

Weird... 🤔 The model seems to replicate the exact same sentence and, apparently, changing the prompt is not causing it to change its prediction. Let us try **few-shot learning**.

In [None]:
prompt = """Is the grammar on the following sentences correct?
They caused him to become angry by making him. no
You can sing. yes
singh and name sameer years is my am I 27 old. no
he told me no. yes
They can sing.
"""

t_generate(prompt)

In [None]:
prompt = "why is the sentence \"They can sing.\" incorrect?"
t_generate(prompt)

prompt = "why is the sentence \"They can sing.\" unacceptable?"
t_generate(prompt)

Okay, T0... You know it better.  We also tried tweaking the pronouns in the sentence! We find something interesting...  It seems like neither **you** nor **they** know how to sing according to T0 😅.

In [None]:
prompt = "Is the sentence \"I can sing.\" correct?"
t_generate(prompt)
prompt = "Is the sentence \"You can sing.\" correct?"
t_generate(prompt)
prompt = "Is the sentence \"She can sing.\" correct?"
t_generate(prompt)
prompt = "Is the sentence \"He can sing.\" correct?"
t_generate(prompt)
prompt = "Is the sentence \"We can sing.\" correct?"
t_generate(prompt)

In [None]:
prompt = """Is the grammar on the following sentences correct?
They caused him to become angry by making him. no
I can sing. yes
He can sing. yes
She no can sing. no
sing You cannot. no
She they sing. no
singh and name sameer years is my am I 27 old. no
he told me no. yes
They can sing.
"""

t_generate(prompt)

Ok... This is my last try... I've looked into `example 14` at the [model's Web Hosted Inference API](https://huggingface.co/bigscience/T0pp?text=The+word+%27binne%27+means+any+animal+that+is+furry+and+has+four+legs%2C+and+the+word+%27bam%27+means+a+simple+sort+of+dwelling.%0A%0A+Which+of+the+following+best+characterizes+binne+bams%3F%0A+-+Sentence+1%3A+Binne+bams+are+for+pets.%0A+-+Sentence+2%3A+Binne+bams+are+typically+furnished+with+sofas+and+televisions.%0A+-+Sentence+3%3A+Binne+bams+are+luxurious+apartments.%0A+-+Sentence+4%3A+Binne+bams+are+places+where+people+live.) and applied the same styling to our example! 🤯 🥳

In [None]:
prompt = """A correct SV sentence has one subject and one verb.

 Which of the following best characterizes a correct SV sentence?
 - Sentence 1: They sing to the birds.
 - Sentence 2: They sing.
 - Sentence 3: I drive my car every morning.
 - Sentence 4: Wellness centers are places where people live."""
t_generate(prompt)

### Exercise 

We challenge you to try your own sentences and prompts! Can you find a way that the model correctly classifies `They can sing.` as a correct sentence? 

In [None]:
sentence = ... # Try it for yourself
# Try few-shot, or different prompts :)
prompt = get_cola_prompt(sentence)  # build the prompt from your sentence
t_generate(prompt)

### [Extra] Syntax trees

In this small section, we'd like to point out a syntax tree example that one of the students shared with us yesterday :) 

Note that the model was not trained to perform this kind of task, and its ability to infer the task from a natural language description is amazing!

In [None]:
prompt = """Create a syntactic tree for this sentence: \n\nI like food."""
t_generate(prompt)

In [None]:
sentence = "I like food"
prompt = f"""A two-column spreadsheet of a syntactic tree for each word in this sentence: \n\n{sentence}."""
t_generate(prompt)

In [None]:
prompt = """Input: I like food.
Syntax: I (subject) like (verb) food (object).
Input: Food was eaten by me.
Syntax: me (subject) eaten (verb) Food (object).
Input: She really hates cats.
Syntax:"""
t_generate(prompt)

## [Generative] Creating analogies

In this part, we will use T0's generation capabilities to generate analogies for any sentence we specify. Try running the same prompt multiple times. Because generation is random, the model might not produce interesting results at first but end up with good generations after a few tries :) 


If this does not work, you can always provide a few examples (use **few-shot learning**).

In [None]:
metaphor = "Questions are arrows"
prompt = f"""Create an analogy for this phrase:\n\n{metaphor}, this means that"""
t_generate(prompt)

In [None]:
metaphor = "You are an early bird"
prompt = f"""Create an analogy for this phrase:\n\n{metaphor}, this means that"""
t_generate(prompt)

### Exercise

Try it yourself :) 

## [Generative] Text2Question

Let us use T0 to generate questions automatically for different prompts. Here are a few examples 😜

In [None]:
sentence = "I hide beneath the bed when I think of E.T."
prompt = f"""Write a question for this sentence:\n\n{sentence}"""
t_generate(prompt)

In [None]:
sentence = "When I am angry with the world, I hit people with a teddy bear."
prompt = f"""Write a question for this sentence:\n\n{sentence}"""
t_generate(prompt)

Ok! It looks like T0 has an easier time creating questions than doing numerical reasoning, judging grammatical correctness, or drawing syntax trees.

### Exercise

Try your own examples!

In [None]:
prompt = """Create a list of 6 questions for my interview with NASA:\n1."""
t_generate(prompt)

In [None]:
prompt = ... #TODO
t_generate(prompt)

## Other generative (and discriminative) tasks 

So far, it seems like T0 is not as strong as GPT-based models at generating creative continuations of our inputs. You've probably noticed that its outputs are less imaginative and also less random. 

In part, this can be explained by the training strategy. T0 was trained on a variety of NLP tasks. These are typically well structured, and the authors converted them into natural language formats using several templates that you can check in their paper. In some sense, this may explain its weaker ability to generalize to open-ended generation.
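To get a feeling for what "converting a task with several templates" means, here is a minimal sketch that turns one labeled acceptability example into several prompted input/target pairs. The templates and answer wordings below are made up for illustration; they are not the ones used by the T0 authors.

```python
# Hypothetical templates for illustration; they are not taken from the T0 paper.
# Each entry pairs an input template with the wording of the two possible answers.
ACCEPTABILITY_TEMPLATES = [
    ("Is this sentence grammatically correct or not? {sentence}",
     {1: "correct", 0: "not correct"}),
    ("{sentence}\nIs the previous sentence acceptable English? Yes or no?",
     {1: "yes", 0: "no"}),
    ("Decide whether this sentence is grammatically correct:\n\n{sentence}",
     {1: "It is correct.", 0: "It is incorrect."}),
]


def to_prompted_instances(sentence, label):
    """Turns one (sentence, label) pair into one (input, target) pair per template."""
    return [
        (template.format(sentence=sentence), answers[label])
        for template, answers in ACCEPTABILITY_TEMPLATES
    ]


instances = to_prompted_instances("They can sing.", 1)
for source, target in instances:
    print(repr(source), "->", repr(target))
```

Training on many such rewordings of the same example is meant to make the model less sensitive to how a task is phrased, although, as we saw above, the phrasing can still matter a lot in practice.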


For this reason, we will focus this next section on other tasks that these models are able to do, somewhat successfully. However, feel free to try out the same tasks we've seen in the GPT-3 section.

### Multiple choice answer

In [None]:
prompt = """The word 'binne' means any animal that is furry and has four legs, and the word 'bam' means a simple sort of dwelling.

 Which of the following best characterizes binne bams?
 - Sentence 1: Binne bams are for pets.
 - Sentence 2: Binne bams are typically furnished with sofas and televisions.
 - Sentence 3: Binne bams are luxurious apartments.
 - Sentence 4: Binne bams are places where people live."""
t_generate(prompt)

### Re-ordering words

In [None]:
prompt = """Reorder the words in this sentence: justin and name bieber years is my am I 27 old."""
t_generate(prompt)

### Copying sentences with different intents

In [None]:
prompt = """Task: copy but say the opposite.
 PSG won its match against Barca."""
t_generate(prompt)

### Concept2Question and Concept2Text

In [None]:
prompt = """Convert the concepts to a question: knife, hungry"""
t_generate(prompt) 

In [None]:
prompt = """Humans can easily string together abstract concepts to form a coherent
sentence. An example of a sentence with the concepts "food", "avocado" can be"""
t_generate(prompt, "t0")

In [None]:
prompt = """Humans can easily string together abstract concepts to form a coherent
sentence. An example of a sentence with the concepts "bones", "avocado", "puppy" can be"""
t_generate(prompt, "t0")

### Generating *stories*

In [None]:
prompt = """Generate a story about a one-eyed dog called Rufus."""
t_generate(prompt, "t0")

### [Extra] Testing T0 with GPT-3 generative tasks

We tried the title-to-emoji task, but T0 does not support emojis. Instead, we defined a similar task where we create a summary of movies based on the title. 

Another interesting idea to try out could be to give it a list of concepts and ask it to generate or determine the movie title.

In [None]:
prompt = """Task: Write a 5 paragraph summary on the novel "Of Rice and Lies":"""

gpt3_generate(prompt, model_name="davinci", max_tokens=900)

In [None]:
prompt = """A two-column spreadsheet of top science fiction movies and the year of release:
Title |  Year of release"""
t_generate(prompt)

In [None]:
prompt = """Make a list of the most important philosophers in history.\n1."""
t_generate(prompt)

In [None]:
prompt = """Make a list of the most important philosophers in history.
Early Greek philosophers (Name, estimated time):\n"""
t_generate(prompt)

# [Tk-Instruct] Zero- and few-shot learning

The other model we will be looking at, published in April 2022, is called [Tk-Instruct](https://arxiv.org/abs/2204.07705). Its architecture resembles T5, but the models were trained on language tasks of 70+ distinct types with expert-written instructions.

![Tk-instruct was trained on the Natural Instructions benchmark 2](https://raw.githubusercontent.com/PastelBelem8/langsci-summer-school2022/main/imgs/training_datasets.png)

Let us see how it fares!


**NOTE**: Make sure you run the setup for T0.


## Defining tasks with Tk-instruct 


Each task consists of:
- **definition**: the natural language definition of the task that you'd like your model to perform. This involves a "complete definition of how an input text is expected to be mapped to an output text".
- **positive examples**: sample inputs with correct outputs, along with a short explanation for each.
- **negative examples**: sample inputs with incorrect/invalid outputs, along with a short explanation for each.
- **instances**: a large collection of input-output pairs for each task. Each instance consists of a textual input and a list of acceptable textual outputs.
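A minimal sketch of how these pieces can be assembled into a single prompt is shown below. The layout mirrors the numerical reasoning prompt used later in this notebook; whether it matches the exact format Tk-Instruct saw during training is an assumption, and `format_tk_prompt` is a hypothetical helper. Negative examples are left out to keep the sketch short.

```python
def format_tk_prompt(definition, positive_examples, query):
    """Assembles a definition, positive examples, and one open instance.

    `positive_examples` is a list of (input, output, explanation) tuples.
    """
    lines = [f"Definition: {definition}"]
    for i, (inp, out, explanation) in enumerate(positive_examples, start=1):
        lines += [
            f"Positive Example {i}-",
            f"  input: {inp}",
            f"  output: {out}",
            f"  explanation: {explanation}",
        ]
    # The final instance is left open for the model to complete.
    lines += ["Now complete the following example-", f"  input: {query}", "  output:"]
    return "\n".join(lines)


tk_prompt = format_tk_prompt(
    definition="Given a review, classify it as Positive or Negative.",
    positive_examples=[
        ("I loved every minute of it.", "Positive", "The reviewer enjoyed it."),
        ("The soup was cold and bland.", "Negative", "The reviewer complains."),
    ],
    query="The pumpkin was one of the worst that I've had in my life.",
)
print(tk_prompt)
```

Keeping the assembly in one function makes it easy to test how the number of examples affects the output.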

Consider the following schematic: 

![Schema of an input example for Tk-instruct](https://raw.githubusercontent.com/PastelBelem8/langsci-summer-school2022/main/imgs/Screenshot%20from%202022-08-05%2013-30-07.png)

Consider the following sentiment classification examples. Possible prompts for `tk-instruct` could be: 

In [None]:
prompt = """Given a review, classify it into one of 4 categories: Positive, Negative, Neutral, or Mixed. 
The pumpkin was one of the worst that I've had in my life."""

t_generate(prompt, "tk-3b")

In [None]:
prompt = """Definition: return the currency of the given country. Now complete the following example - Input: India. Output:"""
t_generate(prompt, "tk-3b")

In [None]:
prompt = """Definition: negate the following sentence. Input: John went to school. Output:"""
t_generate(prompt, "tk-3b")

## Numerical reasoning

In [None]:
prompt = """Definition: add the following numbers. Input: 13 plus 18. Output:"""
t_generate(prompt, "tk-3b")

In [None]:
prompt = """Definition: Given a textual description of an addition operation, expressed as "X plus Y" determine the result of adding those numbers.
Positive Example 1-
  input: 3 plus 8
  output: 11
  explanation: 3+8=11
Positive Example 2-
  input: 15 plus 7
  output: 22
  explanation: 15+7=22
Positive Example 3-
  input: 13 plus 2
  output: 15
  explanation: 13+2=15
Positive Example 4-
  input: 15 plus 18
  output: 33
  explanation: 15+18=33
Now complete the following example-
  input: 13 plus 18
  output:"""

t_generate(prompt, "tk-3b")

## Coreference Resolution

In [None]:
prompt = """Given two sentences decide who the pronoun 'her' refers to. 
Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady."""

t_generate(prompt, "tk-3b")

## Linguistic Acceptability

In [None]:
prompt = """Given an English sentence, classify it according to its grammar into one of 2 categories: correct, or incorrect.
They can sing."""

t_generate(prompt, "tk-3b")

In [None]:
prompt = """Given an English sentence, classify it according to its grammar into one of 2 categories: incorrect or correct.
I love you."""

t_generate(prompt, "tk-3b")

### Exercise

Try it yourself. 

In [None]:
prompt = ... # TODO:
t_generate(prompt, "tk-3b")

## Miscellaneous

In [None]:
prompt = """Generate a story about a one-eyed dog called Rufus."""
t_generate(prompt, "tk-3b")