# Contextual question answering

The aim of this exercise is building a neural model able to answer contextual questions in the legal domain.

## Tasks

## Objectives (8 points)

1. Get acquainted with the [Simple legal questions dataset](https://github.com/apohllo/simple-legal-questions-pl) (you need to send your github login to gain access to the dataset).
2. **Bonus +5 points** Select one of the open issues in the dataset, provide the answers for the questions in the
   package and open a pull request with the answers.
3. The legal questions dataset is your **test dataset**.
4. [PoQuAD](https://huggingface.co/datasets/clarin-pl/poquad) is your **train and validation dataset** (use the splits from the repo).
5. **Warning** PoQuAD has a python API compatible with the `datasets` library, but it only provides the **extractive answers**, even
   though the abstractive answers are available in the JSON files. So you have to read the JSON files directly.
6. **Bonus +5 points** If you write a pull request with the changes to the API of the dataset that will expose the abstractive answers
   and the impossible questions, and the PR will be accepted, you will gain additionl 5 points.
7. Train a neural model able to answer the legal questions. Make sure you are using a machine
   with a GPU, since training the model on CPU will be very long. 
   The training should include at least 3 epochs (depending on the size of the training set you are using). 
   As the pre-trained models you can use (or any other model that is able to perform abstractive Question Answering):
   * [plT5-base](https://huggingface.co/allegro/plt5-base)
   * [plT5-large](https://huggingface.co/allegro/plt5-large)
8. If you have problems training the model, you can use [apohllo/plt5-base-poquad](https://huggingface.co/apohllo/plt5-base-poquad) which was trained on PoQuAD. **This will result in  subtraction of 2 points**. 

In [1]:
!pip install transformers datasets pandas jsonlines evaluate rouge_score

Collecting jsonlines
  Downloading jsonlines-4.0.0-py3-none-any.whl.metadata (1.6 kB)
Collecting evaluate
  Downloading evaluate-0.4.3-py3-none-any.whl.metadata (9.2 kB)
Collecting rouge_score
  Downloading rouge_score-0.1.2.tar.gz (17 kB)
  Preparing metadata (setup.py) ... [?25ldone
Downloading jsonlines-4.0.0-py3-none-any.whl (8.7 kB)
Downloading evaluate-0.4.3-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.0/84.0 kB[0m [31m3.1 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: rouge_score
  Building wheel for rouge_score (setup.py) ... [?25ldone
[?25h  Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24934 sha256=941cba9cd59a5371c4f05b3529e6a746f26f994cab0f2d83499404e4a7c97254
  Stored in directory: /root/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4
Successfully built rouge_score
Installing collected packages: jsonlines, rouge_score, evaluate
Succ

In [2]:
import json
import jsonlines
import os
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq
from datasets import Dataset
import numpy as np
import evaluate
import torch
import random
import uuid

In [3]:
data_path = "/kaggle/input/nlp-pouqad"

file_list = ["answers.jl", "questions.jl", "passages.jl", "relevant.jl", "poquad-train.json", "poquad-dev.json"]

for file_name in file_list:
    file_path = os.path.join(data_path, file_name)
    print(f"\n=== Analiza pliku: {file_name} ===")

    if file_name.endswith(".jl"):
        try:
            with jsonlines.open(file_path, mode='r') as reader:
                data = [obj for obj in reader]
                print(f"Liczba rekordów: {len(data)}")
                print("Przykładowy rekord:", data[0])
        except Exception as e:
            print(f"Błąd przy odczycie pliku {file_name}: {e}")

    elif file_name.endswith(".json"):
        try:
            with open(file_path, "r") as f:
                data = json.load(f)
                print(f"Typ danych: {type(data)}")
                if isinstance(data, dict):
                    print("Klucze:", list(data.keys()))
                    if "data" in data:
                        print(f"Liczba rekordów w kluczu 'data': {len(data['data'])}")
                        print("Przykładowy rekord:", data['data'][0])
        except Exception as e:
            print(f"Błąd przy odczycie pliku {file_name}: {e}")


=== Analiza pliku: answers.jl ===
Liczba rekordów: 680
Przykładowy rekord: {'score': '1', 'question-id': '1', 'answer': 'Tak, podlega karze aresztu wojskowego albo pozbawienia wolności do lat 3.'}

=== Analiza pliku: questions.jl ===
Liczba rekordów: 1436
Przykładowy rekord: {'_id': '1', 'text': 'Czy żołnierz, który dopuszcza się czynnej napaści na przełożonego podlega karze pozbawienia wolności?'}

=== Analiza pliku: passages.jl ===
Liczba rekordów: 26287
Przykładowy rekord: {'_id': '2004_2387_1', 'title': 'Ustawa z dnia 27 sierpnia 2004 r. o ratyfikacji Konwencji o pozbawianiu uprawnień do kierowania pojazdami, sporządzonej w Luksemburgu dnia 17 czerwca 1998 r.', 'text': 'Art. 1. Wyraża się zgodę na dokonanie przez Prezydenta Rzeczypospolitej Polskiej ratyfikacji Konwencji o pozbawianiu uprawnień do kierowania pojazdami, sporządzonej w Luksemburgu dnia 17 czerwca 1998 r.'}

=== Analiza pliku: relevant.jl ===
Liczba rekordów: 1436
Przykładowy rekord: {'question-id': '1', 'passage-id'

In [4]:
def convert_paragraphs_to_qa(data):
    formatted_data = []

    for record in data["data"]:
        for paragraph in record.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                question = qa["question"]
                is_impossible = qa.get("is_impossible", False)

                if not is_impossible:
                    answers = qa["answers"]
                    if answers:
                        answer = answers[0]["text"]
                    else:
                        answer = "N/A"
                else:
                    answer = "N/A"

                formatted_data.append({
                    "context": context,
                    "question": question,
                    "answer": answer
                })

    return formatted_data


with open("/kaggle/input/nlp-pouqad/poquad-dev.json", "r") as f:
    poquad_dev = json.load(f)

poquad_dev_formatted = convert_paragraphs_to_qa(poquad_dev)

dev_output_path = "poquad-dev-formatted.jsonl"
with open(dev_output_path, "w") as f:
    for item in poquad_dev_formatted:
        f.write(json.dumps(item) + "\n")
print(f"Przetworzone dane zapisane do {dev_output_path}")


with open("/kaggle/input/nlp-pouqad/poquad-train.json", "r") as f:
    poquad_train = json.load(f)

poquad_train_formatted = convert_paragraphs_to_qa(poquad_train)

train_output_path = "poquad-train-formatted.jsonl"
with open(train_output_path, "w") as f:
    for item in poquad_train_formatted:
        f.write(json.dumps(item) + "\n")
print(f"Przetworzone dane zapisane do {train_output_path}")

Przetworzone dane zapisane do poquad-dev-formatted.jsonl
Przetworzone dane zapisane do poquad-train-formatted.jsonl


In [5]:
os.environ["WANDB_DISABLED"] = "true"

model_name = "allegro/plt5-base"  
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def print_examples(trainer, dataset, num_examples=3):
    print("\n=== Przykładowe odpowiedzi modelu ===")
    device = trainer.model.device
    for i in range(num_examples):
        inputs = tokenizer(f"Odpowiedz na pytanie. Pytanie: {dataset[i]['question']} Kontekst: {dataset[i]['context']}", 
                         return_tensors="pt", truncation=True)
        inputs = {k: v.to(device) for k, v in inputs.items()}
        outputs = trainer.model.generate(
            **inputs,
            max_new_tokens=128,
            num_beams=4,
            length_penalty=1.0,
            early_stopping=True,
            no_repeat_ngram_size=2,
        )
        predicted = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"\nPrzykład {i+1}:")
        print(f"Pytanie: {dataset[i]['question']}")
        print(f"Prawidłowa odpowiedź: {dataset[i]['answer']}")
        print(f"Przewidziana odpowiedź: {predicted}")
        print("-" * 80)

def load_qa_dataset(file_path, sample_ratio=0.1):
    data = []
    with open(file_path, 'r') as f:
        for line in f:
            item = json.loads(line)
            example_id = str(uuid.uuid4())
            data.append({
                'id': example_id,
                'context': item['context'],
                'question': item['question'],
                'answer': item['answer'],
                'answers': {
                    'text': [item['answer']],
                    'answer_start': [0]
                }
            })
    
    sample_size = int(len(data) * sample_ratio)
    sampled_data = random.sample(data, sample_size)
    print(f"Original dataset size: {len(data)}")
    print(f"Sampled dataset size: {len(sampled_data)}")
    
    return Dataset.from_list(sampled_data)

random.seed(42)

train_data = load_qa_dataset('poquad-train-formatted.jsonl', sample_ratio=0.15)
val_data = load_qa_dataset('poquad-dev-formatted.jsonl', sample_ratio=0.15)

def preprocess_function(examples):
    inputs = [f"Odpowiedz krótko na pytanie na podstawie podanego kontekstu. Kontekst: {c} Pytanie: {q} Odpowiedź:" 
             for q, c in zip(examples['question'], examples['context'])]
    targets = examples['answer']
    
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding='max_length')
    labels = tokenizer(targets, max_length=128, truncation=True, padding='max_length')
    
    model_inputs['labels'] = labels['input_ids']
    return model_inputs

train_dataset = train_data.map(preprocess_function, batched=True)
val_dataset = val_data.map(preprocess_function, batched=True)

rouge_metric = evaluate.load("rouge")
squad_metric = evaluate.load("squad")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    
    decoded_preds = [pred.strip().replace('規', '').strip() for pred in decoded_preds]
    
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    decoded_labels = [label.strip() for label in decoded_labels]
    
    print("\n=== Przykłady podczas ewaluacji ===")
    for i in range(min(3, len(decoded_preds))):
        print(f"\nPredykcja {i+1}: {decoded_preds[i]}")
        print(f"Oczekiwana: {decoded_labels[i]}")
        print("-" * 50)
    
    rouge_result = rouge_metric.compute(predictions=decoded_preds, references=decoded_labels)
    
    formatted_predictions = [{"id": str(i), "prediction_text": pred} for i, pred in enumerate(decoded_preds)]
    formatted_references = [{"id": str(i), "answers": {"text": [ref], "answer_start": [0]}} 
                          for i, ref in enumerate(decoded_labels)]
    
    squad_result = squad_metric.compute(predictions=formatted_predictions, references=formatted_references)
    
    return {
        'rouge1': rouge_result['rouge1'],
        'rouge2': rouge_result['rouge2'],
        'rougeL': rouge_result['rougeL'],
        'exact_match': squad_result['exact_match'],
        'f1': squad_result['f1']
    }

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

tokenizer_config.json:   0%|          | 0.00/141 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/658 [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/1.12M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/65.0 [00:00<?, ?B/s]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


pytorch_model.bin:   0%|          | 0.00/1.10G [00:00<?, ?B/s]

Original dataset size: 56618
Sampled dataset size: 8492
Original dataset size: 7060
Sampled dataset size: 1059


Map:   0%|          | 0/8492 [00:00<?, ? examples/s]

Map:   0%|          | 0/1059 [00:00<?, ? examples/s]

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/4.53k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.32k [00:00<?, ?B/s]

In [6]:
training_args = Seq2SeqTrainingArguments(
    output_dir="./plt5-qa-model",
    evaluation_strategy="steps",
    eval_steps=100,
    logging_steps=50,
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    save_total_limit=3,
    predict_with_generate=True,
    report_to=["none"],
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",  
    greater_is_better=False  
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

print("\nPrzykładowe odpowiedzi przed treningiem:")
print_examples(trainer, val_dataset, num_examples=3)

trainer.train()

trainer.save_model("./plt5-qa-model-best")

print("\nPrzykładowe odpowiedzi po treningu:")
print_examples(trainer, val_dataset, num_examples=3)

final_metrics = trainer.evaluate()
print("\nFinal Evaluation Metrics:")
print(f"Exact Match: {final_metrics['eval_exact_match']:.2f}")
print(f"F1 Score: {final_metrics['eval_f1']:.2f}")
print(f"ROUGE-1: {final_metrics['eval_rouge1']:.2f}")
print(f"ROUGE-2: {final_metrics['eval_rouge2']:.2f}")
print(f"ROUGE-L: {final_metrics['eval_rougeL']:.2f}")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.



Przykładowe odpowiedzi przed treningiem:

=== Przykładowe odpowiedzi modelu ===

Przykład 1:
Pytanie: Czy Romie udało się zwyciężyć mecz rozegrany w ramach pierwszych derb Rzymu z Lazio?
Prawidłowa odpowiedź: Roma wygrała ten mecz 1:0
Przewidziana odpowiedź: 規. Kontekst: Stadion Testaccio. Odpowiedz na pytanie.
--------------------------------------------------------------------------------

Przykład 2:
Pytanie: Z jakich osób składa się parlament tego kraju?
Prawidłowa odpowiedź: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
Przewidziana odpowiedź: 規. Kontekst: Głową państwa jest prezydent wybrany przez parlament.
--------------------------------------------------------------------------------

Przykład 3:
Pytanie: Czy każdy system klasyfikacji gatunków rozróżnia podrodziny?
Prawidłowa odpowiedź: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
Przewidzia

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Step,Training Loss,Validation Loss,Rouge1,Rouge2,Rougel,Exact Match,F1
100,94.1095,45.625626,0.06077,0.017837,0.058493,0.0,5.364217
200,11.1668,6.087194,0.059701,0.01321,0.057535,0.188857,5.167633
300,4.4752,2.037203,0.061542,0.01272,0.06064,0.377715,5.335829
400,2.1655,1.165577,0.056813,0.00885,0.055337,0.283286,4.637736
500,1.3912,0.892836,0.047865,0.009362,0.047314,0.188857,3.817652
600,0.966,0.756987,0.05632,0.011884,0.055032,0.283286,4.944643
700,0.8807,0.665434,0.121967,0.0,0.122542,0.0,0.0
800,0.8301,0.60676,0.121328,0.0,0.121935,0.0,0.0
900,1.2613,0.342446,0.121546,0.0,0.122133,0.0,0.0
1000,0.4581,0.139543,0.181398,0.1609,0.181556,15.29745,17.522873


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: w tym miejscu. Pytanie: Na czym polegał?
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: w Erytrei. Pytanie: Ile osób
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: . Odpowiedź: Nie. Odpowiedź: Nie. Odpowiedź:
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: w tym, że Rom
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: w Erytrei.
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: wyjątek.hil
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: w tym,
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: w Erytrei.
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: wyszczerzoną w
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: , że w泼 na poz
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: wybierz się na,
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: wyjątek w,,
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: w tym,,
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: wybraną,
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: wyrostków i,
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: , że w
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: wybraną,
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: wyizolowanych,
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A Cremonese
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/idgwayi
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A rigwayi
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight'].



Przykładowe odpowiedzi po treningu:

=== Przykładowe odpowiedzi modelu ===

Przykład 1:
Pytanie: Czy Romie udało się zwyciężyć mecz rozegrany w ramach pierwszych derb Rzymu z Lazio?
Prawidłowa odpowiedź: Roma wygrała ten mecz 1:0
Przewidziana odpowiedź: N/A
--------------------------------------------------------------------------------

Przykład 2:
Pytanie: Z jakich osób składa się parlament tego kraju?
Prawidłowa odpowiedź: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
Przewidziana odpowiedź: N/A
--------------------------------------------------------------------------------

Przykład 3:
Pytanie: Czy każdy system klasyfikacji gatunków rozróżnia podrodziny?
Prawidłowa odpowiedź: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
Przewidziana odpowiedź: N
--------------------------------------------------------------------------------


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Roma wygrała ten mecz 1:0
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: 75 członków pochodzi z Centralnej Rady Ludowego Frontu na rzecz Demokracji i Sprawiedliwości, pozostali są deputowanymi wybrani przez komitety regionalne Ludowego Frontu
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Nie każdy jednak system klasyfikacji wyróżnia podrodziny
--------------------------------------------------

Final Evaluation Metrics:
Exact Match: 18.22
F1 Score: 19.70
ROUGE-1: 0.20
ROUGE-2: 0.18
ROUGE-L: 0.20


The model was trained on 15% of the PoQuAD dataset due to hardware and computational limitations. During training, the model demonstrated a steady reduction in training and validation loss, indicating effective learning from the available data. Early stages of training produced incoherent and irrelevant outputs, but as the training progressed, the model started generating partially relevant answers, as reflected in the gradual improvement of metrics like F1 and Rouge scores.

Attempts to train the model on a larger portion of the dataset did not yield better results. Despite the additional data, the model’s performance plateaued, with evaluation metrics remaining nearly identical even as training continued. This suggests that there may be underlying issues in the training configuration or the dataset's compatibility with the model, limiting its ability to improve further.

On the test dataset, the model struggled to generalize, often defaulting to "N/A" for its predictions. Final evaluation metrics (Exact Match: 18.22, F1 Score: 19.70) indicate that while the model captured some patterns during training, it failed to produce accurate and detailed responses in unseen scenarios. 

9. Report the obtained performance of the model (in the form of a table). The report should include *exact match* and *F1 score* 
   for the tokens appearing both in the reference and the predicted answer.
10. Report the best results obtained on the validation dataset and the corresponding results on your test dataset. The results on the 
   test set have to be obtained for the model that yield the best result on the validation dataset.
11. Generate, report and analyze the answers for at least 10 questions provided by the best model on you test dataset.

In [10]:
data_path = "/kaggle/input/nlp-pouqad"

questions_file = os.path.join(data_path, "questions.jl")
answers_file = os.path.join(data_path, "answers.jl")
passages_file = os.path.join(data_path, "passages.jl")
relevant_file = os.path.join(data_path, "relevant.jl")

def load_test_data(questions_file, answers_file, passages_file, relevant_file):
    questions = {}
    answers = {}
    passages = {}
    relevant_pairs = {}
    
    with jsonlines.open(questions_file) as reader:
        for obj in reader:
            questions[obj['_id']] = obj['text']
            
    with jsonlines.open(answers_file) as reader:
        for obj in reader:
            answers[obj['question-id']] = obj['answer']
            
    with jsonlines.open(passages_file) as reader:
        for obj in reader:
            passages[obj['_id']] = obj['text']
            
    with jsonlines.open(relevant_file) as reader:
        for obj in reader:
            relevant_pairs[obj['question-id']] = obj['passage-id']
    
    test_examples = []
    for q_id in questions:
        if q_id in relevant_pairs and q_id in answers:
            test_examples.append({
                'context': passages[relevant_pairs[q_id]],
                'question': questions[q_id],
                'answer': answers[q_id]
            })
    
    return Dataset.from_list(test_examples)

test_dataset = load_test_data(questions_file, answers_file, passages_file, relevant_file)

test_dataset = test_dataset.map(preprocess_function, batched=True)

test_metrics = trainer.evaluate(eval_dataset=test_dataset)

Map:   0%|          | 0/680 [00:00<?, ? examples/s]

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
Trainer.tokenizer is now deprecated. You should use Tr


=== Przykłady podczas ewaluacji ===

Predykcja 1: N/A
Oczekiwana: Tak, podlega karze aresztu wojskowego albo pozbawienia wolności do lat 3.
--------------------------------------------------

Predykcja 2: N/A
Oczekiwana: Komisja przetargowa składa się z co najmniej trzech osób.
--------------------------------------------------

Predykcja 3: N/A
Oczekiwana: Komandytariusz odpowiada za zobowiązania spółki wobec jej wierzycieli tylko do wysokości sumy komandytowej.
--------------------------------------------------


In [8]:
def print_results_table(validation_metrics, test_metrics):
    print("\n=== Wyniki modelu ===")
    print("| Metryka | Walidacja | Test |")
    print("|---------|------------|------|")
    print(f"| Exact Match | {validation_metrics['eval_exact_match']:.2f} | {test_metrics['eval_exact_match']:.2f} |")
    print(f"| F1 Score | {validation_metrics['eval_f1']:.2f} | {test_metrics['eval_f1']:.2f} |")
    print(f"| ROUGE-1 | {validation_metrics['eval_rouge1']:.2f} | {test_metrics['eval_rouge1']:.2f} |")
    print(f"| ROUGE-2 | {validation_metrics['eval_rouge2']:.2f} | {test_metrics['eval_rouge2']:.2f} |")
    print(f"| ROUGE-L | {validation_metrics['eval_rougeL']:.2f} | {test_metrics['eval_rougeL']:.2f} |")

print_results_table(final_metrics, test_metrics)


=== Wyniki modelu ===
| Metryka | Walidacja | Test |
|---------|------------|------|
| Exact Match | 18.22 | 0.00 |
| F1 Score | 19.70 | 1.33 |
| ROUGE-1 | 0.20 | 0.01 |
| ROUGE-2 | 0.18 | 0.00 |
| ROUGE-L | 0.20 | 0.01 |


The evaluation results indicate that the model performs moderately on the validation dataset but fails to generalize to the test dataset. On the validation set, the Exact Match score is 18.22, and the F1 score is 19.70, suggesting that the model can partially generate relevant and accurate answers for questions it has seen during training. The Rouge metrics (ROUGE-1: 0.20, ROUGE-2: 0.18, ROUGE-L: 0.20) further support this, reflecting some alignment between the generated and expected answers.

However, the performance on the test set, which uses Simple Legal Questions data, is significantly worse. The Exact Match score is 0.00, and the F1 score is only 1.33, with Rouge metrics near zero. The model frequently outputs "N/A" for predictions, indicating that it struggles to provide meaningful responses. This disparity suggests a lack of generalization to the out-of-domain test data, possibly due to differences between the PoQuAD and Simple Legal Questions datasets or insufficient fine-tuning on the test dataset's domain.

In [9]:
print("\n=== Analiza przykładowych odpowiedzi ze zbioru testowego ===")
print_examples(trainer, test_dataset, num_examples=10)


=== Analiza przykładowych odpowiedzi ze zbioru testowego ===

=== Przykładowe odpowiedzi modelu ===

Przykład 1:
Pytanie: Czy żołnierz, który dopuszcza się czynnej napaści na przełożonego podlega karze pozbawienia wolności?
Prawidłowa odpowiedź: Tak, podlega karze aresztu wojskowego albo pozbawienia wolności do lat 3.
Przewidziana odpowiedź: N/A
--------------------------------------------------------------------------------

Przykład 2:
Pytanie: Z ilu osób składa się komisja przetargowa?
Prawidłowa odpowiedź: Komisja przetargowa składa się z co najmniej trzech osób.
Przewidziana odpowiedź: N/A
--------------------------------------------------------------------------------

Przykład 3:
Pytanie: Do jakiej wysokości za zobowiązania spółki odpowiada komandytariusz?
Prawidłowa odpowiedź: Komandytariusz odpowiada za zobowiązania spółki wobec jej wierzycieli tylko do wysokości sumy komandytowej.
Przewidziana odpowiedź: N/A
-------------------------------------------------------------------

The analysis of the model's predictions on the test dataset highlights significant limitations in its ability to generate accurate and meaningful responses. In most cases, the predicted answers were "N/A," indicating the model failed to provide any response, despite the availability of well-defined and contextually appropriate answers in the dataset.

Out of the 10 examples, only one prediction (Example 10) partially resembled the expected answer, though it was incomplete and lacked key details. This suggests the model struggles to generalize its learning from the training data (PoQuAD) to the test data (Simple Legal Questions), likely due to differences in question structure, context phrasing, or domain-specific vocabulary.

Furthermore, in some cases, such as Example 6 and Example 8, the predictions included fragments of unrelated or verbose text, indicating that the model might have difficulty aligning the context with the question to extract relevant information. This inconsistency could result from insufficient training data, limited generalization capabilities, or suboptimal configuration during training.

## Questions (2 points)

**1. Does the performance on the validation dataset reflects the performance on your test set?**

No, the performance on the validation dataset does not reflect the performance on the test set. While the model achieved moderate scores on the validation set (Exact Match: 18.22, F1: 19.70), it performed poorly on the test set (Exact Match: 0.00, F1: 1.33). This suggests that the model struggles to generalize to the test set, likely due to differences between the PoQuAD training data and the Simple Legal Questions test data.

**2. What are the outcomes of the model on your test questions? Are they satisfying? If not, what might be the reason
   for that?**

The model's outcomes on the test questions are unsatisfactory. Most predictions are "N/A," indicating that the model could not generate relevant answers. The reasons for this include:
- Differences in structure, phrasing, and domain-specific vocabulary between the training and test datasets.
- Limited training data (15% of PoQuAD) reducing the model’s ability to generalize.
- Lack of fine-tuning on the test dataset's domain.
  
**3. Why extractive question answering is not well suited for inflectional languages?**

Extractive question answering struggles with inflectional languages because words can change form based on grammar, like case, gender, or number. This makes it harder to match the exact text span in the context to the expected answer, as the model may not recognize different forms of the same word as equivalent.

## Hints
1. Contextual question answering can be resolved by at lest two approaches:
   * extractive QA (EQA) - the model has to select a consecutive sequence of tokens from the context which form the question.
   * abstractive QA (AQA) - the model has to generate a sequence of tokens, based on the question and the provided context.
2. Decoder-only models, like BERT, are not able to answer questions in the AQA paradigm, however they are very well suited for EQA.
3. To resolve AQA you need a generative model, such as (m)T%, BART or GPT. These model (generally) are called sequence-to-sequence
   or text-to-text models, since they take text as the input and produce text as the output.
4. Text-to-text model generate the text autoregresively, i.e. they produce one token at a given step and then feed the generated token 
   (and all tokens generated so far) as the input to the model when generating the next token.
   As a result the generation process is pretty slow.
6. Many NLP tasks base on the neural networks can be solved with
   [ready-made scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch) available in the Transformers library.
8. A model able to answer questions in the AQA paradigm may be trained with the
   [run_seq2seq_qa.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering)
   script available in Transfomers.
   If using such a script make sure you are acquianted with the available training options - some of the are defined in the
   [script itself](https://github.com/huggingface/transformers/blob/main/examples/pytorch/question-answering/run_seq2seq_qa.py#L56), 
   but most of them are inherited from the general [trainer](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments)
   or [seq2seq trainer](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Seq2SeqTrainingArguments).