# Introduction to Data Science 2025

# Week 4

In this week's exercise, we look at prompting in zero-shot and few-shot task settings. Below is a text generation example from https://github.com/TurkuNLP/intro-to-nlp/blob/master/text_generation_pipeline_example.ipynb that demonstrates how to load a text generation pipeline with a pre-trained model and generate text from a given prompt. Your task is to load a similar pre-trained generative model and assess whether it succeeds at a set of tasks in zero-shot, one-shot, and two-shot settings.

**Note: Downloading and running the pre-trained model locally may take some time. Alternatively, you can open and run this notebook on [Google Colab](https://colab.research.google.com/), as assumed in the following example.**

## Text generation example

This is a brief example of how to run text generation with a causal language model and `pipeline`.

Install the [transformers](https://huggingface.co/docs/transformers/index) Python package. It is used to load the model and tokenizer and to run generation.

In [1]:
!pip install --quiet transformers

Import the `AutoTokenizer` and `AutoModelForCausalLM` classes and the `pipeline` function. The first two support loading tokenizers and generative models from the [Hugging Face repository](https://huggingface.co/models), and the last wraps a tokenizer and a model for convenience.

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch


Load a generative model and its tokenizer. You can substitute any other generative model name here (e.g. [other TurkuNLP GPT-3 models](https://huggingface.co/models?sort=downloads&search=turkunlp%2Fgpt3)), but note that Colab may have issues running larger models. 

In [3]:
MODEL_NAME = 'TurkuNLP/gpt3-finnish-large'

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`


Instantiate a text generation pipeline using the tokenizer and model.

In [4]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    device=model.device
)

Device set to use cpu


We can now call the pipeline with a text prompt; it will take care of tokenizing, encoding, generation, and decoding:

In [7]:
output = pipe('Hey,how are you?', max_new_tokens=25)

print(output)

[{'generated_text': 'Hey,how are you? I´m so happy for you,you had so much help in my life.I love you.Greetings from'}]


To print only the generated text, index into the returned list:

In [8]:
print(output[0]['generated_text'])

Hey,how are you? I´m so happy for you,you had so much help in my life.I love you.Greetings from
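
The `generated_text` field contains the prompt followed by the model's continuation. When assessing how a model performs on a task, usually only the continuation matters. Below is a minimal sketch of separating the two with plain string handling. `extract_continuation` is a hypothetical helper name, not part of `transformers`, and the strings are copied from the output above.

```python
def extract_continuation(prompt: str, generated_text: str) -> str:
    """Return the part of generated_text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    # Fall back to the full text if the prompt is not a prefix.
    return generated_text


prompt = 'Hey,how are you?'
generated = (
    'Hey,how are you? I´m so happy for you,you had so much help '
    'in my life.I love you.Greetings from'
)

# Print only what the model added after the prompt.
print(extract_continuation(prompt, generated).strip())
```

The text generation pipeline also accepts `return_full_text=False`, which should return only the continuation without any manual slicing.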


We can also call the pipeline with any arguments that the model's `generate` method supports. For details on text generation using `transformers`, see e.g. [this tutorial](https://huggingface.co/blog/how-to-generate).

Example with sampling and a high `temperature` parameter to generate more chaotic output:

In [9]:
output = pipe(
    'Terve, miten menee?',
    do_sample=True,
    temperature=10.0,
    max_new_tokens=25
)

print(output[0]['generated_text'])

Terve, miten menee?" En voinut vastustaa tilannetta - sitä hetkeä vain, mitä tahansa tapahtumaa (johon myös minä liittymätön sivustaseissyt ulkopuolinen satuin astumaan
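
To see why a high temperature gives more chaotic output: during sampling, the model's scores (logits) for the next token are divided by the temperature before softmax turns them into probabilities. The sketch below illustrates this with made-up logits for three hypothetical candidate tokens. The numbers are illustrative only and are not taken from the model above.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 2.0, 1.0]  # made-up scores for three candidate tokens

for t in (1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 1.0 most of the probability mass sits on the highest-scoring token, while at 10.0 the three probabilities are much closer to each other, so unlikely tokens are sampled far more often.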


## Exercise 1

Your task is to assess whether a generative model succeeds in the following tasks in zero-shot, one-shot, and two-shot settings:

- binary sentiment classification (positive / negative)

- person name recognition

- two-digit addition (e.g. 11 + 22 = 33)

For example, for assessing whether a generative model can name capital cities, we could use the following prompts:

- zero-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- one-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
- two-shot:
	>"""\
	>Identify the capital cities of countries.
	>
	>Question: What is the capital of Sweden?\
	>Answer: Stockholm
	>
	>Question: What is the capital of Denmark?\
	>Answer: Copenhagen
	>
	>Question: What is the capital of Finland?\
	>Answer:\
	>"""
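
The three prompts above follow the same pattern: an instruction, zero or more solved examples, and a final unanswered question. A minimal sketch of generating them programmatically is shown below. `build_prompt` is a hypothetical helper introduced here for illustration, and the `Question:` / `Answer:` layout simply mirrors the capital city example.

```python
def build_prompt(instruction, examples, query):
    """Build a zero-, one-, or few-shot prompt.

    examples is a list of (question, answer) pairs; an empty list
    gives a zero-shot prompt.
    """
    parts = [instruction]
    for question, answer in examples:
        parts.append(f"Question: {question}\nAnswer: {answer}")
    # The prompt ends right after "Answer:" so that the model's
    # continuation is read as the answer to the final question.
    parts.append(f"Question: {query}\nAnswer:")
    return "\n\n".join(parts)


instruction = "Identify the capital cities of countries."
shots = [
    ("What is the capital of Sweden?", "Stockholm"),
    ("What is the capital of Denmark?", "Copenhagen"),
]
query = "What is the capital of Finland?"

zero_shot = build_prompt(instruction, [], query)
one_shot = build_prompt(instruction, shots[:1], query)
two_shot = build_prompt(instruction, shots, query)
print(two_shot)
```

Building all three settings from one list of examples keeps the instruction and the final question identical across settings, so any difference in output can be attributed to the number of shots.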

You can do the tasks either in English or Finnish and use a generative model of your choice from the Hugging Face models repository, for example the following models:

- English: `gpt2-large`
- Finnish: `TurkuNLP/gpt3-finnish-large`

You can either come up with your own instructions for the tasks or use the following:

- English:
	- binary sentiment classification: "Do the following texts express a positive or negative sentiment?"
	- person name recognition: "List the person names occurring in the following texts."
	- two-digit addition: "This is a first grade math exam."
- Finnish:
	- binary sentiment classification: "Ilmaisevatko seuraavat tekstit positiivista vai negatiivista tunnetta?"
	- person name recognition: "Listaa seuraavissa teksteissä mainitut henkilönnimet."
	- two-digit addition: "Tämä on ensimmäisen luokan matematiikan koe."

Come up with at least two test cases for each of the three tasks, and come up with your own one- and two-shot examples.
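
One way to organise the test cases is as a list of (prompt, expected answer) pairs scored by a small function. The sketch below assumes that `generate_fn` is any callable mapping a prompt to the model's continuation. `fake_generate` is a stand-in used only so that the example runs without downloading a model, and all names here are hypothetical.

```python
def first_line(text):
    """Return the first non-empty line of text, stripped of whitespace."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def evaluate(generate_fn, cases):
    """Return the fraction of cases whose first output line matches."""
    correct = 0
    for prompt, expected in cases:
        prediction = first_line(generate_fn(prompt))
        if prediction.lower() == expected.lower():
            correct += 1
    return correct / len(cases)


def fake_generate(prompt):
    """Stand-in for a model: always continues with ' 33'."""
    return " 33\n\nQuestion: What is 12 + 13?"


cases = [
    ("Question: What is 11 + 22?\nAnswer:", "33"),
    ("Question: What is 14 + 13?\nAnswer:", "27"),
]
print(evaluate(fake_generate, cases))  # prints 0.5
```

With a real pipeline, `generate_fn` could wrap the pipeline call and return only the continuation. Exact matching on the first line is deliberately strict; for person name recognition, comparing sets of names may be a better fit.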

In [1]:
# Use this cell for your code
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2-large")


Device set to use cpu


In [5]:
# Binary sentiment classification prompts: zero-, one-, and two-shot.
promptposornegzero = """
Do the following texts express a positive or negative sentiment?

Text: "I love this movie!"
Answer:
"""
promptposornegone = """
Do the following texts express a positive or negative sentiment?

Text: "This food is terrible."
Answer: negative

Text: "The service was slow and disappointing."
Answer:

"""
promptposornegtwo = """
Do the following texts express a positive or negative sentiment?

Text: "This food is terrible."
Answer: negative

Text: "What a wonderful day!"
Answer: positive

Text: "The service was slow and disappointing."
Answer:
"""
# Person name recognition prompts: zero-, one-, and two-shot.
promptnamezero = """
List the person names occurring in the following texts.

Text: "David called Emma yesterday."
Answer:
"""
promptnameone = """
List the person names occurring in the following texts.

Text: "John met Mary at the station."
Answer: John, Mary

Text: "David called Emma yesterday."
Answer:

"""
promptnametwo = """
List the person names occurring in the following texts.

Text: "John met Mary at the station."
Answer: John, Mary

Text: "Peter and Susan are siblings."
Answer: Peter, Susan

Text: "David called Emma yesterday."
Answer:

"""
# Two-digit addition prompts: zero-, one-, and two-shot.
promptaddzero = """
This is a first grade math exam.

Question: What is 11 + 22?
Answer:

"""
promptaddone = """
This is a first grade math exam.

Question: What is 14 + 13?
Answer: 27

Question: What is 11 + 22?
Answer:

"""
promptaddtwo = """
This is a first grade math exam.

Question: What is 14 + 13?
Answer: 27

Question: What is 20 + 15?
Answer: 35

Question: What is 11 + 22?
Answer:

"""

# Run every prompt through the generator and print the full generated text.
prompts = [
    promptposornegzero, promptposornegone, promptposornegtwo,
    promptnamezero, promptnameone, promptnametwo,
    promptaddzero, promptaddone, promptaddtwo,
]
for prompt in prompts:
    output = generator(prompt, max_new_tokens=30)
    print(output[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Do the following texts express a positive or negative sentiment?

Text: "I love this movie!"
Answer:

Text: "I hate this movie!"

Answer:

Text: "I hate this movie!"

Answer:

Text

Do the following texts express a positive or negative sentiment?

Text: "This food is terrible."
Answer: negative

Text: "The service was slow and disappointing."
Answer:


Note: The words "very good" and "good" are not included in the responses. This is because the text "This food is terrible




Do the following texts express a positive or negative sentiment?

Text: "This food is terrible."
Answer: negative

Text: "What a wonderful day!"
Answer: positive

Text: "The service was slow and disappointing."
Answer:

Text: "Very nice place."

Answer:

Text: "Very nice place."

Answer:

Text: "




List the person names occurring in the following texts.

Text: "David called Emma yesterday."
Answer:

"David called Emma yesterday."

Text: "David spoke to Emma about his plans."

Answer:

"David spoke to




List the person names occurring in the following texts.

Text: "John met Mary at the station."
Answer: John, Mary

Text: "David called Emma yesterday."
Answer:


Text: "John went to the bank."

Answer:


Text: "John went to the market this morning."

Answer




List the person names occurring in the following texts.

Text: "John met Mary at the station."
Answer: John, Mary

Text: "Peter and Susan are siblings."
Answer: Peter, Susan

Text: "David called Emma yesterday."
Answer:


Text: "David came to see me yesterday."

Answer: David

Text: "Mary is married."

Answer: Mary




This is a first grade math exam.

Question: What is 11 + 22?
Answer:


This is a first grade math exam.

Question: What is a square root of 4?

Answer:


This is a




This is a first grade math exam.

Question: What is 14 + 13?
Answer: 27

Question: What is 11 + 22?
Answer:


I think I can't go any higher.

Answer:


I can't believe you just did that, because you're a mor

This is a first grade math exam.

Question: What is 14 + 13?
Answer: 27

Question: What is 20 + 15?
Answer: 35

Question: What is 11 + 22?
Answer:


Question: What is 12 + 13?

Answer: 28

Question: What is 17 + 12?

Answer: 30
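
In several of the outputs above, the model continues with new `Text:` or `Question:` blocks instead of filling in the answer. One possible contributing factor (an assumption, not something verified here) is that the triple-quoted prompts end with one or more newlines after `Answer:`, so the continuation starts on a new line. The sketch below shows a hypothetical helper, `tidy_prompt`, that trims the surrounding whitespace so that the prompt ends exactly at `Answer:`.

```python
def tidy_prompt(prompt):
    """Strip leading and trailing whitespace, including blank lines."""
    return prompt.strip()


# Same text as the zero-shot addition prompt used above.
promptaddzero = """
This is a first grade math exam.

Question: What is 11 + 22?
Answer:

"""

tidied = tidy_prompt(promptaddzero)
# repr() makes the remaining newlines visible.
print(repr(tidied))
```

Passing `tidy_prompt(promptaddzero)` to `generator` in place of the raw string is one way to check whether the trailing newlines affect the results.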



**Submit your code and your answers to the above questions as comments on the MOOC platform. You can submit this Jupyter notebook (.ipynb) or a .py, .R, etc. file, depending on your programming preferences.**