<a href="https://colab.research.google.com/github/TurkuNLP/intro-to-nlp/blob/master/intro_2023_exercise_14_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example solution to exercise task 14

In this exercise, we'll experiment with a few pipelines from the Hugging Face `transformers` library. To keep things simple, we'll just use the default model for each pipeline.

---

## Setup

In [39]:
!pip install --quiet transformers

In [40]:
from transformers import pipeline

---

## Text classification

Following the [example](https://huggingface.co/docs/transformers/task_summary#text-classification) pointed to in the exercise, we'll instantiate a sentiment analysis pipeline.

In [41]:
pipe = pipeline(task='sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
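
The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so, reusing the model name and revision printed in the warning. The helper name `make_pinned_pipeline` is made up for this example, and the import is deferred so that nothing is loaded or downloaded until the helper is called:

```python
# Model name and revision copied from the warning message above.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
REVISION = "af0f99b"


def make_pinned_pipeline():
    """Build a sentiment-analysis pipeline with an explicit model and revision."""
    # Imported here so that defining the helper does not load the library.
    from transformers import pipeline

    return pipeline(task="sentiment-analysis", model=MODEL_NAME, revision=REVISION)


# Usage (downloads the model on the first call):
# pipe = make_pinned_pipeline()
```

Pinning both values should keep results reproducible even if the library's default model changes later.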


Some simple test sentences:

In [42]:
sentences = [
    "This movie is absolutely wonderful, I love it!",
    "That is certainly the worst laptop computer ever made.",
    "I'm not sure what to think about it, but I guess on balance it's OK",
    "This is certainly not particularly good.",
    "I'm feeling quite upbeat today!"
]

We can call the pipeline like so:

In [43]:
pipe(sentences[0])

[{'label': 'POSITIVE', 'score': 0.9998733997344971}]

Let's run that for all of the sentences:

In [44]:
for s in sentences:
    print(s, pipe(s))

This movie is absolutely wonderful, I love it! [{'label': 'POSITIVE', 'score': 0.9998733997344971}]
That is certainly the worst laptop computer ever made. [{'label': 'NEGATIVE', 'score': 0.9997958540916443}]
I'm not sure what to think about it, but I guess on balance it's OK [{'label': 'POSITIVE', 'score': 0.9997145533561707}]
This is certainly not particularly good. [{'label': 'NEGATIVE', 'score': 0.9997963309288025}]
I'm feeling quite upbeat today! [{'label': 'POSITIVE', 'score': 0.9998337030410767}]


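These are all correct. Note, though, that the model only chooses between `POSITIVE` and `NEGATIVE`, and it is about as confident on the lukewarm third sentence (0.9997) as on the clearly positive or negative ones.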
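
A check like this can also be written down rather than done by eye. The sketch below compares the predictions printed above (scores rounded to four decimals) with labels assigned by hand; the `expected` list is an assumption of this example, not part of the exercise:

```python
# Predictions copied from the output above, scores rounded to 4 decimals.
predictions = [
    {"label": "POSITIVE", "score": 0.9999},
    {"label": "NEGATIVE", "score": 0.9998},
    {"label": "POSITIVE", "score": 0.9997},
    {"label": "NEGATIVE", "score": 0.9998},
    {"label": "POSITIVE", "score": 0.9998},
]

# Hand-assigned labels for the five test sentences (an assumption of this sketch).
expected = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]

# Count matching labels and compute the accuracy.
correct = sum(p["label"] == e for p, e in zip(predictions, expected))
accuracy = correct / len(expected)

# The lowest score shows how little the confidence varies across sentences.
min_score = min(p["score"] for p in predictions)

print(f"{correct}/{len(expected)} correct, accuracy {accuracy:.2f}, lowest score {min_score}")
```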

---

## Sequence labeling (NER)

We'll follow the same process as above.

In [45]:
pipe = pipeline(task='ner')

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [46]:
sentences = [
    "John went to the store.",
    "Her name is Jane.",
    "The president is Joe Biden.",
    "Is she called Mary?",
    "John and Jane went home.",
]

for s in sentences:
    print(s)
    for e in pipe(s):
        print('   ', e)

John went to the store.
    {'entity': 'I-PER', 'score': 0.9865002, 'index': 1, 'word': 'John', 'start': 0, 'end': 4}
Her name is Jane.
    {'entity': 'I-PER', 'score': 0.9979619, 'index': 4, 'word': 'Jane', 'start': 12, 'end': 16}
The president is Joe Biden.
    {'entity': 'I-PER', 'score': 0.9988011, 'index': 4, 'word': 'Joe', 'start': 17, 'end': 20}
    {'entity': 'I-PER', 'score': 0.9982734, 'index': 5, 'word': 'B', 'start': 21, 'end': 22}
    {'entity': 'I-PER', 'score': 0.99794954, 'index': 6, 'word': '##iden', 'start': 22, 'end': 26}
Is she called Mary?
    {'entity': 'I-PER', 'score': 0.99653107, 'index': 4, 'word': 'Mary', 'start': 14, 'end': 18}
John and Jane went home.
    {'entity': 'I-PER', 'score': 0.9944279, 'index': 1, 'word': 'John', 'start': 0, 'end': 4}
    {'entity': 'I-PER', 'score': 0.98774844, 'index': 3, 'word': 'Jane', 'start': 9, 'end': 13}


These are again all correct, as one would expect from such simple examples. (Note that `Biden` is split into two tokens, `B` and `##iden`, where `##` denotes continuation. This is a tokenization detail; the prediction is correct.)
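
To get whole names rather than word pieces, the token-level predictions can be merged using their character offsets. The following is a minimal sketch of that idea, not the library's own method: it joins tokens that have consecutive `index` values and the same entity type, and the helper name `merge_entities` is made up for this example. (The pipeline also takes an `aggregation_strategy` argument that can do this kind of grouping for you.)

```python
def merge_entities(text, tokens):
    """Merge adjacent token predictions of the same type into (type, text) pairs."""
    spans = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")  # 'I-PER' -> ('I', '-', 'PER')
        last = spans[-1] if spans else None
        continues = (
            last is not None
            and prefix != "B"  # a B- tag always starts a new entity
            and last["label"] == label
            and tok["index"] == last["last_index"] + 1
        )
        if continues:
            # Extend the current span to cover this token.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
        else:
            spans.append({"label": label, "start": tok["start"],
                          "end": tok["end"], "last_index": tok["index"]})
    # Recover the surface string from the character offsets.
    return [(s["label"], text[s["start"]:s["end"]]) for s in spans]


# Token predictions copied from the output above (scores omitted).
biden_text = "The president is Joe Biden."
biden_tokens = [
    {"entity": "I-PER", "index": 4, "word": "Joe", "start": 17, "end": 20},
    {"entity": "I-PER", "index": 5, "word": "B", "start": 21, "end": 22},
    {"entity": "I-PER", "index": 6, "word": "##iden", "start": 22, "end": 26},
]
pair_text = "John and Jane went home."
pair_tokens = [
    {"entity": "I-PER", "index": 1, "word": "John", "start": 0, "end": 4},
    {"entity": "I-PER", "index": 3, "word": "Jane", "start": 9, "end": 13},
]

print(merge_entities(biden_text, biden_tokens))
print(merge_entities(pair_text, pair_tokens))
```

The first call yields a single `PER` span for "Joe Biden", while "John" and "Jane" stay separate because the token "and" lies between them.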

---

## Question answering

We'll proceed as above, asking questions about a fixed document: the first paragraph of the [Wikipedia page for Turku](https://en.wikipedia.org/wiki/Turku).

In [47]:
pipe = pipeline(task="question-answering")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [48]:
document = """Turku is a city and former capital on the southwest coast of Finland at the mouth of the Aura River, in the region of Finland Proper (Varsinais-Suomi) and the former Turku and Pori Province (Turun ja Porin lääni; 1634–1997).
The region was originally called Suomi (Finland), which later became the name for the whole country.
As of 31 March 2021, the population of Turku was 194,244 making it the sixth largest city in Finland after Helsinki, Espoo, Tampere, Vantaa and Oulu.
There were 281,108 inhabitants living in the Turku Central Locality, ranking it as the third largest urban area in Finland after the Capital Region area and Tampere Central Locality.
The city is officially bilingual as 5.2 percent of its population identify Swedish as a mother-tongue."""

questions = [
    "What region of Finland is Turku located in?",
    "What province is Turku located in?",
    "What is the population of Turku?",
    "What was the original name of the Finland Proper region?",
    "What cities in Finland are larger than Turku?",
]

for q in questions:
    print(q, pipe(question=q, context=document))

What region of Finland is Turku located in? {'score': 0.9126155376434326, 'start': 118, 'end': 132, 'answer': 'Finland Proper'}
What province is Turku located in? {'score': 0.6044908165931702, 'start': 176, 'end': 189, 'answer': 'Pori Province'}
What is the population of Turku? {'score': 0.9827180504798889, 'start': 375, 'end': 382, 'answer': '194,244'}
What was the original name of the Finland Proper region? {'score': 0.7893495559692383, 'start': 258, 'end': 263, 'answer': 'Suomi'}
What cities in Finland are larger than Turku? {'score': 0.0007836163858883083, 'start': 433, 'end': 441, 'answer': 'Helsinki'}


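3/5 correct: the province answer drops "Turku and" from "Turku and Pori Province", and the last question gets only "Helsinki" rather than all five larger cities (with a score below 0.001). Not a fantastic showing for such comparatively simple questions.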
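
The `score`, `start` and `end` fields can be put to use when inspecting such output. The sketch below uses the results printed above (scores rounded) to flag low-confidence answers and to confirm that `start` and `end` are character offsets into the context. The cutoff `THRESHOLD` is an arbitrary value chosen for this example, and `context` holds only the beginning of the document, so only answers that fall inside it are checked:

```python
# Results copied from the output above, scores rounded to 4 decimals.
results = [
    ("What region of Finland is Turku located in?",
     {"score": 0.9126, "start": 118, "end": 132, "answer": "Finland Proper"}),
    ("What province is Turku located in?",
     {"score": 0.6045, "start": 176, "end": 189, "answer": "Pori Province"}),
    ("What is the population of Turku?",
     {"score": 0.9827, "start": 375, "end": 382, "answer": "194,244"}),
    ("What was the original name of the Finland Proper region?",
     {"score": 0.7893, "start": 258, "end": 263, "answer": "Suomi"}),
    ("What cities in Finland are larger than Turku?",
     {"score": 0.0008, "start": 433, "end": 441, "answer": "Helsinki"}),
]

THRESHOLD = 0.5  # arbitrary cutoff, an assumption of this sketch

# Questions whose best answer scores below the cutoff.
low_confidence = [q for q, r in results if r["score"] < THRESHOLD]
print("Low confidence:", low_confidence)

# Beginning of the document, enough to cover the first two answers.
context = ("Turku is a city and former capital on the southwest coast of Finland "
           "at the mouth of the Aura River, in the region of Finland Proper "
           "(Varsinais-Suomi) and the former Turku and Pori Province")

# start/end are character offsets: slicing the context recovers the answer.
offsets_match = [
    context[r["start"]:r["end"]] == r["answer"]
    for _, r in results
    if r["end"] <= len(context)
]
print("Offsets match:", offsets_match)
```

With this cutoff only the last question is flagged. The partial "Pori Province" answer (score about 0.60) still passes, so a score threshold is a coarse filter at best.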

---

## Summarization

We'll follow the pattern above, again with some paragraphs from Wikipedia pages as inputs.

In [49]:
pipe = pipeline(task="summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [50]:
documents = [
    "Turku is a city and former capital on the southwest coast of Finland at the mouth of the Aura River, in the region of Finland Proper (Varsinais-Suomi) and the former Turku and Pori Province (Turun ja Porin lääni; 1634–1997). The region was originally called Suomi (Finland), which later became the name for the whole country. As of 31 March 2021, the population of Turku was 194,244 making it the sixth largest city in Finland after Helsinki, Espoo, Tampere, Vantaa and Oulu. There were 281,108 inhabitants living in the Turku Central Locality, ranking it as the third largest urban area in Finland after the Capital Region area and Tampere Central Locality. The city is officially bilingual as 5.2 percent of its population identify Swedish as a mother-tongue.",
    "Finland was first inhabited around 9000 BC after the Last Glacial Period. The Stone Age introduced several different ceramic styles and cultures. The Bronze Age and Iron Age were characterized by contacts with other cultures in Fennoscandia and the Baltic region. From the late 13th century, Finland became a part of Sweden as a consequence of the Northern Crusades. In 1809, as a result of the Finnish War, Finland became part of the Russian Empire as the autonomous Grand Duchy of Finland, during which Finnish art flourished and the idea of independence began to take hold. In 1906, Finland became the first European state to grant universal suffrage, and the first in the world to give all adult citizens the right to run for public office. After the 1917 Russian Revolution, Finland declared independence from Russia. In 1918, the fledgling state was divided by the Finnish Civil War. During World War II, Finland fought the Soviet Union in the Winter War and the Continuation War, and Nazi Germany in the Lapland War. It subsequently lost parts of its territory, but maintained its independence.",
    "The Age of Enlightenment, the French Revolution and the Napoleonic Wars shaped the continent culturally, politically and economically from the end of the 17th century until the first half of the 19th century. The Industrial Revolution, which began in Great Britain at the end of the 18th century, gave rise to radical economic, cultural and social change in Western Europe and eventually the wider world. Both world wars began and were fought to a great extent in Europe, contributing to a decline in Western European dominance in world affairs by the mid-20th century as the Soviet Union and the United States took prominence. During the Cold War, Europe was divided along the Iron Curtain between NATO in the West and the Warsaw Pact in the East, until the Revolutions of 1989, Fall of the Berlin Wall and the Dissolution of the Soviet Union. The European Union (EU) and the Council of Europe are two important international organizations aiming to represent the European continent on a political level. The Council of Europe was founded in 1948 with the idea of unifying Europe to achieve common goals and prevent future wars. Further European integration by some states led to the formation of the European Union, a separate political entity that lies between a confederation and a federation. The EU originated in Western Europe but has been expanding eastward since the fall of the Soviet Union in 1991. A majority of its members have adopted a common currency, the euro, and a large bloc of countries, the Schengen Area, have abolished internal border and immigration controls.",
]

for d in documents:
    s = pipe(d)[0]['summary_text']
    print(f'Document ({len(d)} chars): {d}')
    print(f'Summary  ({len(s)} chars): {s}')
    print('---')

Document (761 chars): Turku is a city and former capital on the southwest coast of Finland at the mouth of the Aura River, in the region of Finland Proper (Varsinais-Suomi) and the former Turku and Pori Province (Turun ja Porin lääni; 1634–1997). The region was originally called Suomi (Finland), which later became the name for the whole country. As of 31 March 2021, the population of Turku was 194,244 making it the sixth largest city in Finland after Helsinki, Espoo, Tampere, Vantaa and Oulu. There were 281,108 inhabitants living in the Turku Central Locality, ranking it as the third largest urban area in Finland after the Capital Region area and Tampere Central Locality. The city is officially bilingual as 5.2 percent of its population identify Swedish as a mother-tongue.
Summary  (309 chars):  As of 31 March 2021, the population of Turku was 194,244 making it the sixth largest city in Finland . The city is officially bilingual as 5.2 percent of its population identify Swedish as a mo
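
The character counts printed by the loop give a rough sense of how much the model compresses its input. A small sketch using the two lengths reported above for the first document; the helper name `compression_ratio` is made up for this example:

```python
def compression_ratio(document_chars: int, summary_chars: int) -> float:
    """Return the summary length as a fraction of the document length."""
    if document_chars <= 0:
        raise ValueError("document_chars must be positive")
    return summary_chars / document_chars


# Lengths reported in the output above for the first document.
ratio = compression_ratio(761, 309)
print(f"Summary is {ratio:.1%} of the document length")
```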

---

## Translation

We'll follow the [example](https://huggingface.co/docs/transformers/task_summary#translation) in attempting translation using a multilingual model.

In [51]:
pipe = pipeline(task="translation", model="t5-small")



In [52]:
sentences = [
    "Turku is a city and former capital on the southwest coast of Finland at the mouth of the Aura River.",
    "The city is officially bilingual as 5.2 percent of its population identify Swedish as a mother-tongue.",
    "Finland was first inhabited around 9000 BC after the Last Glacial Period.",
]

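# T5 picks its task from a text prefix. The outputs below are German, so
# the English-to-German prefix is assumed here.
prefix = "translate English to German: "

for s in sentences: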
    t = pipe(prefix + s)[0]['translation_text']
    print(s)
    print(t)
    print('---')

Turku is a city and former capital on the southwest coast of Finland at the mouth of the Aura River.
Turku ist eine Stadt und ehemalige Hauptstadt an der Südwestküste Finnlands an der Mündung des Flusses Aura.
---
The city is officially bilingual as 5.2 percent of its population identify Swedish as a mother-tongue.
Die Stadt ist offiziell bilingue, da 5,2 % ihrer Bevölkerung Schwedisch als Muttersprache identifizieren.
---
Finland was first inhabited around 9000 BC after the Last Glacial Period.
Finnland wurde zuerst um 9000 v. Chr. nach der letzten Glacial Periode bewohnt.
---
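
The loop above prepends a `prefix` to each sentence, because T5 models select their task from a plain-text instruction at the start of the input. A minimal sketch of building such a prefix, assuming the three English-to-X translation directions the original T5 checkpoints were trained on; the helper name `t5_translation_prefix` is made up for this example:

```python
# Target languages the original T5 checkpoints were trained to translate into
# from English (stated here as an assumption of this sketch).
T5_TARGET_LANGUAGES = ("German", "French", "Romanian")


def t5_translation_prefix(target: str, source: str = "English") -> str:
    """Return the task prefix that asks T5 to translate from source to target."""
    if target not in T5_TARGET_LANGUAGES:
        raise ValueError(f"unsupported target language for this sketch: {target!r}")
    return f"translate {source} to {target}: "


prefix = t5_translation_prefix("German")
print(repr(prefix))
```

Note that Finnish is not among these targets, so this model cannot be expected to translate the Turku sentences into Finnish.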
