# Introduction to Data Science 2025

# Week 4

In this week's exercise, we look at prompting and zero- and few-shot task settings. Below is a text generation example from https://github.com/TurkuNLP/intro-to-nlp/blob/master/text_generation_pipeline_example.ipynb demonstrating how to load a text generation pipeline with a pre-trained model and generate text with a given prompt. Your task is to load a similar pre-trained generative model and assess whether the model succeeds at a set of tasks in zero-shot, one-shot, and two-shot settings.

**Note: Downloading and running the pre-trained model locally may take some time. Alternatively, you can open and run this notebook on [Google Colab](https://colab.research.google.com/), as assumed in the following example.**

## Text generation example

This is a brief example of how to run text generation with a causal language model and `pipeline`.

Install the [transformers](https://huggingface.co/docs/transformers/index) Python package. It is used to load the model and tokenizer and to run generation.

In [1]:
!pip install --quiet transformers
!pip install torch torchvision torchaudio



Import the `AutoTokenizer` and `AutoModelForCausalLM` classes and the `pipeline` function. The first two support loading tokenizers and generative models from the [Hugging Face repository](https://huggingface.co/models), and the last wraps a tokenizer and a model for convenience.

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline



Load a generative model and its tokenizer. You can substitute any other generative model name here (e.g. [other TurkuNLP GPT-3 models](https://huggingface.co/models?sort=downloads&search=turkunlp%2Fgpt3)), but note that Colab may have issues running larger models. 

In [4]:
MODEL_NAME = 'TurkuNLP/gpt3-finnish-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)


Instantiate a text generation pipeline using the tokenizer and model.

In [22]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

We can now call the pipeline with a text prompt; it will take care of tokenizing, encoding, generation, and decoding:

In [23]:
output = pipe('Terve, miten menee?', max_new_tokens=25)

print(output)

[{'generated_text': 'Terve, miten menee?”\n”Hyvin, kiitos.”\n”Kiva kuulla.”\n”Kuule, minulla on sinulle asiaa.”\n'}]


Print only the generated text:

In [24]:
print(output[0]['generated_text'])

Terve, miten menee?”
”Hyvin, kiitos.”
”Kiva kuulla.”
”Kuule, minulla on sinulle asiaa.”



We can also call the pipeline with any arguments that the model's `generate` method supports. For details on text generation using `transformers`, see e.g. [this tutorial](https://huggingface.co/blog/how-to-generate).

Example with sampling and a high `temperature` parameter to generate more chaotic output:

In [26]:
output = pipe(
    'Terve, miten menee?',
    do_sample=True,
    temperature=10.0,
    max_new_tokens=25
)

print(output[0]['generated_text'])

Terve, miten menee? kysyi Heikki yhtäkkiä astuessaan taloon sisään kantaen kahvipöytä-tarvikkeita käsissään sisään tultuaan.
(Ryökäle luuli varmasti hänen tulevan meille istumaan)
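
To illustrate why a high `temperature` tends to produce more chaotic text, here is a minimal sketch of temperature scaling on a toy distribution. It is not the internal `transformers` implementation; the function names `softmax_with_temperature` and `entropy` and the four example scores are assumptions made up for illustration. Dividing the scores by a larger temperature flattens the probabilities, so unlikely tokens are sampled more often.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter, less predictable distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical scores for four candidate next tokens (not taken from a real model).
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: probs={[round(p, 3) for p in probs]}, "
          f"entropy={entropy(probs):.3f}")
```

At a low temperature nearly all probability mass sits on the top-scoring token, while at `temperature=10.0` the four tokens become almost equally likely.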




## Exercise 1

Your task is to assess whether a generative model succeeds in the following tasks in zero-shot, one-shot, and two-shot settings:

- binary sentiment classification (positive / negative)

- person name recognition

- two-digit addition (e.g. 11 + 22 = 33)

For example, for assessing whether a generative model can name capital cities, we could use the following prompts:

- zero-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- one-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- two-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Denmark?\
	>Answer: Copenhagen
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
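
The three prompts above share one pattern: an instruction, zero or more solved examples, and a final unanswered question. A minimal sketch of a helper that assembles such prompts is shown below; the name `build_prompt` and its parameters are hypothetical, not part of any library.

```python
def build_prompt(instruction, examples, query, q_label="Question", a_label="Answer"):
    """Assemble a zero- or few-shot prompt.

    examples is a list of (question, answer) pairs; an empty list gives a zero-shot prompt.
    """
    blocks = [instruction]
    for question, answer in examples:
        blocks.append(f"{q_label}: {question}\n{a_label}: {answer}")
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"{q_label}: {query}\n{a_label}:")
    return "\n\n".join(blocks)


instruction = "Identify the capital cities of countries."
shots = [
    ("What is the capital of Sweden?", "Stockholm"),
    ("What is the capital of Denmark?", "Copenhagen"),
]
query = "What is the capital of Finland?"

# Use the first k solved examples to get the zero-, one-, and two-shot prompts.
for k in range(3):
    print(f"--- {k}-shot ---")
    print(build_prompt(instruction, shots[:k], query))
    print()
```

Building the prompts from one function keeps the formatting identical across settings, so differences in model output can be attributed to the number of examples rather than to stray whitespace.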

You can do the tasks either in English or Finnish and use a generative model of your choice from the Hugging Face models repository, for example the following models:

- English: `gpt2-large`
- Finnish: `TurkuNLP/gpt3-finnish-large`

You can either come up with your own instructions for the tasks or use the following:

- English:
	- binary sentiment classification: "Do the following texts express a positive or negative sentiment?"
	- person name recognition: "List the person names occurring in the following texts."
	- two-digit addition: "This is a first grade math exam."
- Finnish:
	- binary sentiment classification: "Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?"
	- person name recognition: "Listaa seuraavissa teksteissä mainitut henkilönnimet."
	- two-digit addition: "Tämä on ensimmäisen luokan matematiikan koe."

Come up with at least two test cases for each of the three tasks, and come up with your own one- and two-shot examples.

In [None]:
prompts = {
    "sentiment": {
        "zero": [
            'Do the following texts express a positive or negative sentiment?\n\nText: "I love this movie!"\nAnswer:',
            'Do the following texts express a positive or negative sentiment?\n\nText: "He makes me mad."\nAnswer:'
        ],
        "one": [
            'Do the following texts express a positive or negative sentiment?\n\nText: "I hate rock music."\nAnswer: Negative\n\nText: "I love this movie!"\nAnswer:',
            'Do the following texts express a positive or negative sentiment?\n\nText: "The food was terrible."\nAnswer: Negative\n\nText: "I enjoyed my stay!"\nAnswer:'
        ],
        "two": [
            'Do the following texts express a positive or negative sentiment?\n\nText: "I hate traffic jams."\nAnswer: Negative\n\nText: "The food was amazing."\nAnswer: Positive\n\nText: "I love this movie!"\nAnswer:',
            'Do the following texts express a positive or negative sentiment?\n\nText: "The food was bland."\nAnswer: Negative\n\nText: "The staff were very friendly."\nAnswer: Positive\n\nText: "I enjoyed the concert a lot!"\nAnswer:'
        ]
    },
    "names": {
        "zero": [
            'List the person names occurring in the following text:\n\nText: "Bob and Patrick are best friends."\nAnswer:',
            'List the person names occurring in the following text:\n\nText: "Emmy and Sarah were here."\nAnswer:'
        ],
        "one": [
            'List the person names occurring in the following text:\n\nText: "John and Mary are classmates."\nAnswer: John, Mary\n\nText: "Alice and Bob went to the park."\nAnswer:',
            'List the person names occurring in the following text:\n\nText: "Emma, Olivia, and Liam are siblings."\nAnswer: Emma, Olivia, Liam\n\nText: "Michael and Sarah attended the meeting."\nAnswer:'
        ],
        "two": [
            'List the person names occurring in the following text:\n\nText: "Bob and Patrick are best friends."\nAnswer: Bob, Patrick\n\nText: "Tom and Jerry are friends."\nAnswer: Tom, Jerry\n\nText: "Alice and Bob went to the park."\nAnswer:',
            'List the person names occurring in the following text:\n\nText: "Eetos and Uljas are siblings."\nAnswer: Eetos, Uljas\n\nText: "Sara and Sera went to school."\nAnswer: Sara, Seraw\n\nText: "Michael and Sarah attended the meeting."\nAnswer:'
        ]
    },
    "addition": {
        "zero": [
            'This is a first grade math exam.\n\nWhat is 9 + 10?\nAnswer:',
            'This is a first grade math exam.\n\nWhat is 420 + 69?\nAnswer:'
        ],
        "one": [
            'This is a first grade math exam.\n\nWhat is 666 + 69?\nAnswer: 735\n\nWhat is 11 + 22?\nAnswer:',
            'This is a first grade math exam.\n\nWhat is 11 + 22?\nAnswer: 33\n\nWhat is 17 + 28?\nAnswer:'
        ],
        "two": [
            'This is a first grade math exam.\n\nWhat is 41 + 22?\nAnswer: 63\n\nWhat is 23 + 45?\nAnswer: 68\n\nWhat is 11 + 22?\nAnswer:',
            'This is a first grade math exam.\n\nWhat is 14 + 14?\nAnswer: 28\n\nWhat is 21 + 33?\nAnswer: 54\n\nWhat is 17 + 28?\nAnswer:'
        ]
    }
}

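# Run every prompt through the pipeline, grouped by task and by number of shots.
for task, settings in prompts.items():
    print(task.upper())
    for shot, examples in settings.items():
        print(shot)
        for i, prompt in enumerate(examples, 1):
            # generated_text contains the prompt followed by up to 20 new tokens.
            output = pipe(prompt, max_new_tokens=20)
            print(f"Test{i} output: {output[0]['generated_text']}")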

"""

SENTIMENT
zero



Test1 output: Do the following texts express a positive or negative sentiment?

Text: "I love this movie!"
Answer: "Yes! I love this movie!"

Text: "I was shocked by this movie!"



Test2 output: Do the following texts express a positive or negative sentiment?

Text: "He makes me mad."
Answer: "He makes me mad."

Text: "He makes me mad."

Answer:
one


Test1 output: Do the following texts express a positive or negative sentiment?

Text: "I hate rock music."
Answer: Negative

Text: "I love this movie!"
Answer: Positive

Text: "I love this book!"

Answer: Positive

Text:

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Test2 output: Do the following texts express a positive or negative sentiment?

Text: "The food was terrible."
Answer: Negative

Text: "I enjoyed my stay!"
Answer: Positive

Text: "I will miss this place. Thank you for all the fun we had
two

Test1 output: Do the following texts express a positive or negative sentiment?

Text: "I hate traffic jams."
Answer: Negative

Text: "The food was amazing."
Answer: Positive

Text: "I love this movie!"
Answer: Positive

Text: "I hate traffic jams."

Answer: Negative

Text:

Test2 output: Do the following texts express a positive or negative sentiment?

Text: "The food was bland."
Answer: Negative

Text: "The staff were very friendly."
Answer: Positive

Text: "I enjoyed the concert a lot!"
Answer: Negative

Text: "I wish I had the opportunity to go again!"

Answer:
NAMES
zero

Test1 output: List the person names occurring in the following text:

Text: "Bob and Patrick are best friends."
Answer: "Bob and Patrick are best friends."

Text: "Bob and Patty are best friends."

Test2 output: List the person names occurring in the following text:

Text: "Emmy and Sarah were here."
Answer: "Emmy and Sarah were here."

The text above is the same as this:

one

Test1 output: List the person names occurring in the following text:

Text: "John and Mary are classmates."
Answer: John, Mary

Text: "Alice and Bob went to the park."
Answer: Alice and Bob

Text: "They went to the park to have a picnic." Answer:

Test2 output: List the person names occurring in the following text:

Text: "Emma, Olivia, and Liam are siblings."
Answer: Emma, Olivia, Liam

Text: "Michael and Sarah attended the meeting."
Answer: Michael and Sarah

Text: "Emma and Liam attended the meeting."

Answer:
two

Test1 output: List the person names occurring in the following text:

Text: "Bob and Patrick are best friends."
Answer: Bob, Patrick

Text: "Tom and Jerry are friends."
Answer: Tom, Jerry

Text: "Alice and Bob went to the park."
Answer: Alice

Text: "Alice and Bob went to the park."

Answer: Alice

Test2 output: List the person names occurring in the following text:

Text: "Eetos and Uljas are siblings."
Answer: Eetos, Uljas

Text: "Sara and Sera went to school."
Answer: Sara, Seraw

Text: "Michael and Sarah attended the meeting."
Answer: Michael, Sarah,

ADDITION
zero

Test1 output: This is a first grade math exam.

What is 9 + 10?
Answer:

9

10

10

10

10

10


Test2 output: This is a first grade math exam.

What is 420 + 69?
Answer: 420 + 69 = 420.

What is the difference between 420 and 69?

Answer
one

Test1 output: This is a first grade math exam.

What is 666 + 69?
Answer: 735

What is 11 + 22?
Answer: 2

What is 666 + 7?

Answer: 7

What is 2 +

Test2 output: This is a first grade math exam.

What is 11 + 22?
Answer: 33

What is 17 + 28?
Answer: 40

What is 12 + 14?

Answer: 17

What is 6 +
two


Test1 output: This is a first grade math exam.

What is 41 + 22?
Answer: 63

What is 23 + 45?
Answer: 68

What is 11 + 22?
Answer: 33

What is 19 + 5?

Answer: 19

What is 4 +
Test2 output: This is a first grade math exam.

What is 14 + 14?
Answer: 28

What is 21 + 33?
Answer: 54

What is 17 + 28?
Answer: 27

What is 8 + 5?

Answer: 19

What is 10 +
"""

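
Because the generated text repeats the prompt and usually runs on past the answer, judging success by eye is error-prone. The sketch below shows one possible way to isolate the model's first answer and compare it with the expected one. The helpers `first_answer` and `is_correct` are hypothetical, exact string matching is an assumption that suits the addition task better than the free-form ones, and the sample strings are copied from the recorded addition outputs above rather than produced by calling the model here.

```python
def first_answer(generated_text, prompt):
    """Return the first non-empty line the model produced after the prompt."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    for line in continuation.splitlines():
        if line.strip():
            return line.strip()
    return ""


def is_correct(answer, expected):
    """Case-insensitive exact match, ignoring surrounding spaces, quotes, and periods."""
    return answer.strip().strip('."').lower() == expected.lower()


# One-shot addition prompt and the recorded model output for it.
prompt = (
    "This is a first grade math exam.\n\n"
    "What is 11 + 22?\nAnswer: 33\n\n"
    "What is 17 + 28?\nAnswer:"
)
generated = prompt + " 40\n\nWhat is 12 + 14?\n\nAnswer: 17\n\nWhat is 6 +"

answer = first_answer(generated, prompt)
print(f"model answer: {answer!r}, correct: {is_correct(answer, '45')}")
```

For the name recognition task a looser comparison would be needed, since an output such as "Alice and Bob" lists the right names in a different format from "Alice, Bob". The text generation pipeline also accepts `return_full_text=False`, which should return only the continuation and make the prefix stripping unnecessary.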

**Submit this exercise by submitting your code and your answers to the above questions as comments on the MOOC platform. You can return this Jupyter notebook (.ipynb) or a .py, .R, etc. file, depending on your programming preferences.**