<a href="https://colab.research.google.com/github/TurkuNLP/Deep_Learning_in_LangTech_course/blob/master/exercise_cross_lingual_transfer_qa.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Question Answering with zero-shot cross-lingual transfer

In [None]:
!pip install -q transformers sentencepiece datasets evaluate

If `sentencepiece` is not installed, loading the tokenizer fails with the following error:


```
ValueError: Couldn't instantiate the backend tokenizer from one of:
  (1) a `tokenizers` library serialization file,
  (2) a slow tokenizer instance to convert or
  (3) an equivalent slow tokenizer class to instantiate and convert.

You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

```
[SentencePiece](https://github.com/google/sentencepiece) is another library for subword tokenization, i.e. for splitting words into subwords.
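To make "splitting words into subwords" concrete, here is a toy sketch. It is not SentencePiece's actual algorithm (which learns its vocabulary from data); it only does a greedy longest-match lookup against a tiny, hypothetical vocabulary invented for this illustration. The one real convention it borrows is the `▁` marker that SentencePiece puts at the start of a word. The function name `toy_subword_split` is likewise made up.

```python
# Hypothetical toy vocabulary, chosen by hand for this illustration only.
TOY_VOCAB = {"▁lin", "nu", "t", "▁sii", "vet"}


def toy_subword_split(word, vocab):
    """Greedily split one word into the longest pieces found in vocab."""
    text = "▁" + word  # SentencePiece-style marker for the start of a word
    pieces = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until something matches.
        for end in range(len(text), pos, -1):
            if text[pos:end] in vocab:
                pieces.append(text[pos:end])
                pos = end
                break
        else:
            # Nothing in the vocabulary matches: emit the single character.
            pieces.append(text[pos])
            pos += 1
    return pieces


linnut_pieces = toy_subword_split("linnut", TOY_VOCAB)
siivet_pieces = toy_subword_split("siivet", TOY_VOCAB)
print(linnut_pieces)
print(siivet_pieces)
```

Joining the pieces back together always reproduces the marked word, which is what lets a subword tokenizer cover words it has never seen as a whole.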


## Create a pipeline for QA

In [None]:
from transformers import pipeline


# a) Get predictions
pipe = pipeline('question-answering', model="deepset/xlm-roberta-large-squad2")
example_input = {
    'question': 'Why is model conversion important?',
    'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'
}
res = pipe(example_input, handle_impossible_answer=True)
print(res)

Some weights of the model checkpoint at deepset/xlm-roberta-large-squad2 were not used when initializing XLMRobertaForQuestionAnswering: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


{'score': 0.3094019293785095, 'start': 58, 'end': 133, 'answer': ' gives freedom to the user and let people easily switch between frameworks.'}
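The `start` and `end` values in the result are character offsets into the context string, so the answer can be recovered by slicing. The sketch below needs no model; it reuses the context and the result dictionary printed above and checks that the slice matches the reported answer. Note that the span includes a leading space, which `strip()` removes.

```python
context = ('The option to convert models between FARM and transformers gives freedom '
           'to the user and let people easily switch between frameworks.')

# Result copied from the output above.
res = {'score': 0.3094019293785095, 'start': 58, 'end': 133,
       'answer': ' gives freedom to the user and let people easily switch between frameworks.'}

# Slice the context with the reported character offsets.
span = context[res['start']:res['end']]
print(span == res['answer'])

# Remove the leading whitespace before showing the answer to a user.
clean_answer = span.strip()
print(clean_answer)
```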


## Evaluate on one Finnish example

(An English translation of the same example is also provided for comparison.)

In [None]:
finnish_input = {
    "question": "Minkä takia linnut pysyvät ilmassa?",
    "context": """Linnut (Aves) ovat tasalämpöisiä, munivia ja höyhenpeitteisiä selkärankaisia,
    joiden siivet antavat useimmille niistä lentokyvyn. Lintuja tunnetaan yli 10 900 lajia.
    Linnut kehittyivät jurakaudella dinosauruksista, ja ne luokitellaankin teropodien alaryhmäksi.
    Lintulajit muistuttavat paljon toisiaan, ja kaikkien lajien perusrakenne on sama. Linnut ovatkin
    muista eläinryhmistä suhteellisen selvästi erottuva ryhmä. Linnuilla on nokka, mutta ei hampaita.
    Eturaajat ovat kehittyneet siiviksi, ja luut ovat onttoja ja kevyitä."""
}
english_input = {
    "question": "Why do birds stay up in the air?",
    "context": """Birds (Aves) are even-tempered, egg-laying and feathered vertebrates,
    whose wings give most of them the ability to fly. There are more than 10 900 known species of birds.
    Birds evolved from dinosaurs in the Jurassic period and are classified as a sub-group of teropods.
    Bird species are very similar to each other, and the basic structure of all species is the same.
    Birds are therefore a relatively distinct group from other animal groups. Birds have beaks but no teeth.
    The forelimbs have evolved into wings and the bones are hollow and light."""
}
print(pipe(finnish_input, handle_impossible_answer=True))
print(pipe(english_input, handle_impossible_answer=True))
print(pipe(english_input))



{'score': 0.5494571328163147, 'start': 88, 'end': 95, 'answer': ' siivet'}
{'score': 0.3622419536113739, 'start': 0, 'end': 0, 'answer': ''}
{'score': 0.07016430050134659, 'start': 79, 'end': 85, 'answer': ' wings'}
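In the three results above, an empty `answer` with `start` and `end` both 0 is how the pipeline reports "no answer". With `handle_impossible_answer=True` the English example returns that empty answer; without it, the pipeline has to return a span and picks ` wings` with a low score. A minimal sketch for reading such results, assuming they have exactly the dictionary shape printed above (the helper name `describe_prediction` is hypothetical):

```python
def describe_prediction(result):
    """Turn one question-answering result dict into a readable summary."""
    answer = result["answer"].strip()
    if not answer:
        # An empty answer string is the "no answer" prediction.
        return f"no answer (score {result['score']:.2f})"
    return f"{answer!r} (score {result['score']:.2f})"


# Results copied from the output above.
results = [
    {'score': 0.5494571328163147, 'start': 88, 'end': 95, 'answer': ' siivet'},
    {'score': 0.3622419536113739, 'start': 0, 'end': 0, 'answer': ''},
    {'score': 0.07016430050134659, 'start': 79, 'end': 85, 'answer': ' wings'},
]

summaries = [describe_prediction(r) for r in results]
for line in summaries:
    print(line)
```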


## Evaluate using Finnish SQuAD

* DeepL-based machine translation of the English SQuAD v2.0

In [None]:
import datasets
squad_fi = datasets.load_dataset("TurkuNLP/squad_v2_fi", split="validation")

In [None]:
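print(squad_fi)
# Use only the first 1000 validation examples; select() avoids iterating over the whole dataset
sample = [{"question": e["question"], "context": e["context"]} for e in squad_fi.select(range(1000))]
print(len(sample))
# handle_impossible_answer=True lets the pipeline return an empty answer for unanswerable questions
predictions = pipe(sample, handle_impossible_answer=True)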

Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 11873
})
1000


In [None]:
print(predictions[0])
for i in range(10):
  print("Q:", sample[i]["question"])
  print("Pred A:", predictions[i]["answer"])
  print("Corr A:", squad_fi[i]["answers"]["text"])
  print()

{'score': 0.5167831182479858, 'start': 149, 'end': 160, 'answer': ' Ranskassa.'}
Q: Missä maassa Normandia sijaitsee?
Pred A:  Ranskassa.
Corr A: ['Ranskassa', 'Ranskassa', 'Ranskassa', 'Ranskassa']

Q: Milloin normannit olivat Normandiassa?
Pred A:  10. ja 11. vuosisadalla
Corr A: ['10. ja 11. vuosisadalla', '10. ja 11. vuosisadalla', '10. ja 11. vuosisadalla', '10. ja 11. vuosisadalla']

Q: Mistä maista norjalaiset olivat peräisin?
Pred A:  Tanskasta, Islannista ja Norjasta
Corr A: ['Tanskasta, Islannista ja Norjasta', 'Tanskasta, Islannista ja Norjasta', 'Tanskasta, Islannista ja Norjasta', 'Tanskasta, Islannista ja Norjasta']

Q: Kuka oli norjalainen johtaja?
Pred A:  Rollon
Corr A: ['Rollon', 'Rollon', 'Rollon', 'Rollon']

Q: Millä vuosisadalla normannit saivat ensimmäisen kerran oman identiteettinsä?
Pred A:  10.
Corr A: ['10. vuosisadan', '10. vuosisadan alkupuoliskolla', '10', '10']

Q: Kuka antoi nimensä Normandialle 1000- ja 1100-luvuilla?
Pred A: 
Corr A: []

Q: Mikä on Rans

In [None]:
# Use the official SQuAD metric https://huggingface.co/spaces/evaluate-metric/squad_v2
from evaluate import load

squad_metric = load("squad_v2")

gold = []
preds = []

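# Reformat the data into the structure the squad_v2 metric expects
for i, prediction in enumerate(predictions):
  example = squad_fi[i]  # look up the row once instead of once per field
  g = {'answers': {'answer_start': example["answers"]["answer_start"], 'text': example["answers"]["text"]}, 'id': example["id"]}
  # no_answer_probability is fixed at 0.0, so an empty prediction_text is what marks "no answer"
  p = {'prediction_text': prediction["answer"], 'id': example["id"], 'no_answer_probability': 0.0}
  gold.append(g)
  preds.append(p)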

results = squad_metric.compute(predictions=preds, references=gold)
results

{'exact': 65.8,
 'f1': 71.62373064843649,
 'total': 1000,
 'HasAns_exact': 53.52697095435685,
 'HasAns_f1': 65.60939968555299,
 'HasAns_total': 482,
 'NoAns_exact': 77.22007722007721,
 'NoAns_f1': 77.22007722007721,
 'NoAns_total': 518,
 'best_exact': 65.9,
 'best_exact_thresh': 0.0,
 'best_f1': 71.72373064843634,
 'best_f1_thresh': 0.0}
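The `exact` and `f1` scores are computed on normalized text, which is why a prediction such as ` Ranskassa.` still counts as an exact match for `Ranskassa`. The sketch below is a simplified re-implementation of SQuAD-style scoring written for illustration, not the code the `squad_v2` metric actually runs; the function names are assumptions. It lowercases, strips punctuation and English articles, and then compares either the whole string (exact match) or the overlapping tokens (F1), taking the best score over all gold answers.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Two empty answers agree (the "no answer" case); otherwise it is a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(score_fn, prediction, golds):
    """Score against every gold answer; unanswerable questions use ''."""
    return max(score_fn(prediction, g) for g in (golds or [""]))


# Predictions and gold answers taken from the sample output earlier in the notebook.
em_ranska = best_over_golds(exact_match, " Ranskassa.", ["Ranskassa"])
em_century = best_over_golds(
    exact_match, " 10.",
    ["10. vuosisadan", "10. vuosisadan alkupuoliskolla", "10", "10"])
f1_partial = token_f1(" 10.", "10. vuosisadan")
em_no_answer = best_over_golds(exact_match, "", [])
print(em_ranska, em_century, round(f1_partial, 3), em_no_answer)
```

Because the article list is English-only, this normalization does little beyond lowercasing and punctuation removal on Finnish text, so differences in Finnish case endings (for example `10. vuosisadan` versus `10. vuosisadalla`) still lower the score.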