<a href="https://colab.research.google.com/github/TurkuNLP/intro-to-nlp/blob/master/intro_2023_exercise_13_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example solution to exercise task 13

This notebook contains example solutions to exercise task 13, demonstrating zero-shot, one-shot, and two-shot prediction for a number of tasks.

---

## Setup

In [1]:
!pip install --quiet transformers

Import the `AutoTokenizer`, `AutoModelForCausalLM`, and `pipeline` classes. The first two support loading tokenizers and generative models from the [Hugging Face repository](https://huggingface.co/models), and the last wraps a tokenizer and a model for convenience.

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

---

## English solution

We'll provide an example solution for English. A Finnish solution would proceed similarly using a Finnish model, setting e.g.

```MODEL_NAME = 'TurkuNLP/gpt3-finnish-large'```

and using Finnish templates and examples.

Load a generative model and its tokenizer.

In [3]:
MODEL_NAME = 'gpt2-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

Instantiate a text generation pipeline using the tokenizer and model.

In [4]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

Define a simple convenience function that calls the pipeline with a prompt and returns one generated output:

In [5]:
def generate(prompt, max_new_tokens=5):
    output = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=False,    # make repeatable
        pad_token_id=tokenizer.eos_token_id    # suppresses an unnecessary warning
    )
    return output[0]['generated_text']

Test generation

In [6]:
print(generate("Hi, how's it going?", max_new_tokens=10))



Hi, how's it going?

I'm doing great.

I


### Binary sentiment classification

We'll use the following test cases:

In [7]:
sentences = [
    "This movie is absolutely wonderful, I love it!",
    "That is certainly the worst laptop computer ever made.",
    "I'm not sure what to think about it, but I guess on balance it's OK",
    "This is certainly not particularly good.",
    "I'm feeling quite upbeat today!"
]

**Zero-shot**

We'll use the following template with a natural language instruction for our inputs.

In [8]:
template = """Do the following texts express a positive or negative sentiment?

Text: {}
Answer:"""

We can fill the template like so:

In [9]:
print(template.format(sentences[0]))

Do the following texts express a positive or negative sentiment?

Text: This movie is absolutely wonderful, I love it!
Answer:


We can now generate for all of the sentences:

In [10]:
for s in sentences:
    print(generate(template.format(s)))
    print('---')

Do the following texts express a positive or negative sentiment?

Text: This movie is absolutely wonderful, I love it!
Answer: I love it!

---
Do the following texts express a positive or negative sentiment?

Text: That is certainly the worst laptop computer ever made.
Answer: I am not sure.
---
Do the following texts express a positive or negative sentiment?

Text: I'm not sure what to think about it, but I guess on balance it's OK
Answer: I'm not sure what
---
Do the following texts express a positive or negative sentiment?

Text: This is certainly not particularly good.
Answer: This is not particularly good
---
Do the following texts express a positive or negative sentiment?

Text: I'm feeling quite upbeat today!
Answer: I'm feeling quite upbeat
---


None of these show the desired output ("positive" or "negative"). Results like this are quite common for zero-shot settings with "pure" language models. While this could be improved on by working on the prompt, let's move on to few-shot settings.

**One-shot**

Let's add one example into our template:

In [11]:
template = """Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: {}
Answer:"""

And now just run the same generation:

In [12]:
for s in sentences:
    print(generate(template.format(s)))
    print('---')

Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This movie is absolutely wonderful, I love it!
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: That is certainly the worst laptop computer ever made.
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: I'm not sure what to think about it, but I guess on balance it's OK
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is certainly not particularly good.
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: I'm feeling quite upbeat today!
Answer: positive

Text:
---


Some of these are now correct. Note that the model generates extra text after the answer, predicting that the pattern of "Text:" and "Answer:" will continue: there is nothing telling the model it should stop. We could get rid of this by reducing the value of `max_new_tokens` when calling the pipeline. 
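As an alternative to lowering `max_new_tokens`, the answer can be cut out of the generated text after the fact. The following is a minimal sketch, assuming the generated text starts with the prompt (as it does in the outputs above). The helper name `extract_answer` is hypothetical and not part of the original solution.

```python
def extract_answer(generated_text, prompt):
    """Return the first non-empty line generated after the prompt."""
    # Assumption: the pipeline output repeats the prompt, so strip it off first.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    # Keep only the first non-empty line; anything after it is the model
    # continuing the "Text:" / "Answer:" pattern.
    for line in continuation.splitlines():
        if line.strip():
            return line.strip()
    return ''


# Illustration with a string shaped like the one-shot output above
prompt = (
    "Text: This is great!\nAnswer: positive\n\n"
    "Text: I'm feeling quite upbeat today!\nAnswer:"
)
generated = prompt + " positive\n\nText:"
print(extract_answer(generated, prompt))
```

With the notebook's own helper this could be used as `extract_answer(generate(p), p)` for a filled-in prompt `p`.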

**Two-shot**

Again, the only change is to add another example to our template:

In [13]:
template = """Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is terrible!
Answer: negative

Text: {}
Answer:"""

Rerun:

In [14]:
for s in sentences:
    print(generate(template.format(s)))
    print('---')

Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is terrible!
Answer: negative

Text: This movie is absolutely wonderful, I love it!
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is terrible!
Answer: negative

Text: That is certainly the worst laptop computer ever made.
Answer: positive

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is terrible!
Answer: negative

Text: I'm not sure what to think about it, but I guess on balance it's OK
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?

Text: This is great!
Answer: positive

Text: This is terrible!
Answer: negative

Text: This is certainly not particularly good.
Answer: negative

Text:
---
Do the following texts express a positive or negative sentiment?


Curiously, the results appear to have gotten worse, although the model does maintain the correct pattern of responses. We could continue with 3-shot, 4-shot, etc. prompts and would expect some improvement, but let's move on to the next task.
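Writing out longer few-shot templates by hand gets repetitive. The following is a small sketch of how a k-shot template could be assembled from a list of (text, answer) pairs. The function `build_template` and its parameters are hypothetical names introduced here for illustration; they are not part of the original solution.

```python
def _escape(s):
    # Double any braces so that str.format() leaves the example text untouched
    return s.replace('{', '{{').replace('}', '}}')


def build_template(instruction, examples, input_label='Text', output_label='Answer'):
    """Build a k-shot template with a single {} slot for the new input."""
    parts = [_escape(instruction)]
    for text, answer in examples:
        parts.append(f'{input_label}: {_escape(text)}\n{output_label}: {_escape(answer)}')
    # The final block leaves the answer open for the model to complete
    parts.append(f'{input_label}: {{}}\n{output_label}:')
    return '\n\n'.join(parts)


instruction = 'Do the following texts express a positive or negative sentiment?'
shots = [
    ('This is great!', 'positive'),
    ('This is terrible!', 'negative'),
]
template = build_template(instruction, shots)
print(template.format('This movie is absolutely wonderful, I love it!'))
```

Passing an empty list gives the zero-shot template, and appending further pairs to `shots` gives 3-shot, 4-shot, and so on.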

---

### Person name recognition

We'll follow the same pattern as above, providing a number of example sentences and separate templates for zero-, one-, and two-shot prediction.

In [15]:
sentences = [
    "John went to the store.",
    "Her name is Jane.",
    "The president is Joe Biden.",
    "Is she called Mary?",
    "John and Jane went home.",
]

**Zero-shot**

In [16]:
template = """List the person names occurring in the following texts.

Text: {}
Names:"""

for s in sentences:
    print(generate(template.format(s)))
    print('---')

List the person names occurring in the following texts.

Text: John went to the store.
Names: John, John, John
---
List the person names occurring in the following texts.

Text: Her name is Jane.
Names: Jane, Jane, Jane
---
List the person names occurring in the following texts.

Text: The president is Joe Biden.
Names: Joe Biden, Joe Biden
---
List the person names occurring in the following texts.

Text: Is she called Mary?
Names: Mary, Mary, Mary
---
List the person names occurring in the following texts.

Text: John and Jane went home.
Names: John and Jane went home
---


This has some indications of the right answers, but none are fully correct.

**One-shot**

In [17]:
template = """List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: {}
Names:"""

for s in sentences:
    print(generate(template.format(s)))
    print('---')

List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: John went to the store.
Names: John

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Her name is Jane.
Names: Jane

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: The president is Joe Biden.
Names: Joe

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Is she called Mary?
Names: Mary

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: John and Jane went home.
Names: Jane and John


---


This has some correct answers, but it fails when a name consists of more than one word ("Joe Biden") and when the text contains more than one name ("John and Jane"). Let's give an example that covers both cases.

**Two-shot**

In [18]:
template = """List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Alice is a fan of the vice president, Kamala Harris.
Names: Alice, Kamala Harris

Text: {}
Names:"""

for s in sentences:
    print(generate(template.format(s)))
    print('---')

List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Alice is a fan of the vice president, Kamala Harris.
Names: Alice, Kamala Harris

Text: John went to the store.
Names: John

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Alice is a fan of the vice president, Kamala Harris.
Names: Alice, Kamala Harris

Text: Her name is Jane.
Names: Jane

Text:
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Alice is a fan of the vice president, Kamala Harris.
Names: Alice, Kamala Harris

Text: The president is Joe Biden.
Names: Joe Biden

Text
---
List the person names occurring in the following texts.

Text: Bob is buying groceries.
Names: Bob

Text: Alice is a fan of the vice president, Kamala Harris.
Names: Alice, Kamala Harris

Text: Is she called Mary?
Names: Mary

Text:
---
List the person names occurring in the

All but one are correct. Again, we could continue providing more examples or refine the prompt to try to improve on this.
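To compare outputs like these against expected names automatically, the comma-separated list on the `Names:` line can be parsed into a Python list. This is a minimal sketch, assuming the generated text starts with the prompt and that the model follows the comma-separated format shown in the examples. The helper name `parse_names` is hypothetical.

```python
def parse_names(generated_text, prompt):
    """Parse the comma-separated names generated right after the prompt."""
    # Assumption: the pipeline output repeats the prompt, so strip it off first.
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    lines = continuation.strip().splitlines()
    first_line = lines[0] if lines else ''
    # Split on commas and drop empty pieces and surrounding whitespace
    return [name.strip() for name in first_line.split(',') if name.strip()]


# Illustration with strings shaped like the outputs above
prompt = "Text: The president is Joe Biden.\nNames:"
print(parse_names(prompt + " Joe Biden\n\nText", prompt))

prompt = "Text: John and Jane went home.\nNames:"
print(parse_names(prompt + " Jane and John\n\n", prompt))
```

An answer such as "Jane and John" is kept as a single item, so a comparison against the expected list `['John', 'Jane']` would count it as incorrect, in line with the manual reading above.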

### Two-digit addition

We'll follow the same approach as above, using a simple pattern `X + Y = ?`

In [19]:
examples = [
    "10 + 10",
    "15 + 25",
    "31 + 42",
    "78 + 47",
    "99 + 98",
]

**Zero-shot**

In [20]:
template = """This is a first grade math exam.

{} ="""

for e in examples:
    print(generate(template.format(e)))
    print('---')

This is a first grade math exam.

10 + 10 = 20

20 +
---
This is a first grade math exam.

15 + 25 = 50

15 +
---
This is a first grade math exam.

31 + 42 = 63

The answer
---
This is a first grade math exam.

78 + 47 = 100

The answer
---
This is a first grade math exam.

99 + 98 = 100

The answer
---


It manages 10 + 10, but not the others. Let's see if an example helps.

**One-shot**

In [21]:
template = """This is a first grade math exam.

11 + 12 = 23

{} ="""

for e in examples:
    print(generate(template.format(e)))
    print('---')

This is a first grade math exam.

11 + 12 = 23

10 + 10 = 23

9 +
---
This is a first grade math exam.

11 + 12 = 23

15 + 25 = 33

15 +
---
This is a first grade math exam.

11 + 12 = 23

31 + 42 = 63

31 +
---
This is a first grade math exam.

11 + 12 = 23

78 + 47 = 97

The answer
---
This is a first grade math exam.

11 + 12 = 23

99 + 98 = 99

99 +
---


Nope, definitely not; now it even got 10 + 10 wrong, copying the first answer instead. How about two shots?

**Two-shot**

In [22]:
template = """This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

{} ="""

for e in examples:
    print(generate(template.format(e)))
    print('---')

This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

10 + 10 = 15

10 +
---
This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

15 + 25 = 33

15 +
---
This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

31 + 42 = 63

31 +
---
This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

78 + 47 = 100

The answer
---
This is a first grade math exam.

11 + 12 = 23

42 + 42 = 84

99 + 98 = 144

So,
---


This one seems quite hopeless. It may be surprising that a model showing some emergent ability on natural language tasks such as sentiment classification and name recognition can't do a basic computational task like two-digit addition, but this is actually a common result: even very large language models make arithmetic mistakes, although not in cases this simple.
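Checking the generated sums by eye is error-prone, so the comparison can be automated. The following is a small sketch, assuming the generated text contains the expression followed by `=` and a number, as in the outputs above. The function name `check_addition` is hypothetical and not part of the original solution.

```python
import re


def check_addition(generated_text, expression):
    """Return (predicted, expected, is_correct) for an 'X + Y' expression."""
    x, y = (int(n) for n in expression.split('+'))
    expected = x + y
    # Find the number written right after 'X + Y ='. The lookbehind avoids
    # matching inside a longer number (e.g. '1 + 12' inside '11 + 12').
    pattern = r'(?<!\d)' + re.escape(expression) + r'\s*=\s*(-?\d+)'
    match = re.search(pattern, generated_text)
    predicted = int(match.group(1)) if match else None
    return predicted, expected, predicted == expected


# Strings copied from the zero-shot and two-shot outputs above
zero_shot = "This is a first grade math exam.\n\n10 + 10 = 20\n\n20 +"
two_shot = (
    "This is a first grade math exam.\n\n11 + 12 = 23\n\n"
    "42 + 42 = 84\n\n99 + 98 = 144\n\nSo,"
)
print(check_addition(zero_shot, "10 + 10"))
print(check_addition(two_shot, "99 + 98"))
```

In the notebook, this could be applied to each output as `check_addition(generate(template.format(e)), e)` for `e` in `examples`.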