# Homework 10: Cross-Lingual Transfer & Transfer Learning
#### Introduction to Natural Language Processing

* Hyerin, Seo. (hyseo@students.uni-mainz.de)
* Yeonwoo, Nam. (yeonam@students.uni-mainz.de)
* Yevin, Kim. (kyevin@students.uni-mainz.de)

You can reach 20 points on this homework.

In this homework, we will try to improve our model by using transfer learning and cross-lingual transfer!

If you have questions, you can reach out via mail: minhducbui@uni-mainz.de

# Evaluation

*Task 1:* Explain what cross-lingual transfer is (e.g. how it works, why it works). Explain how you would apply crosslingual transfer for our task (given the above datasets)? XX/3

*Task 2: Train on the english dataset!* -> Evaluation: XX/2

*Task 3: Explain your results! (2P)* -> Evaluation: XX/2

*Task 4:* Explain what transfer learning and multi-task learning is (e.g. how it works, why it works). -> Evaluation: XX/3

*Task 5:* Propose one task. -> Evaluation: XX/2

*Task 6:* Explain whether this task could be beneficial for the main task or not! XX/2

*Task 7:* Create the dataset for the above task! -> Evaluation: XX/3

*TASK 8: Now train the model! (3P)* -> Evaluation: XX/3


**Total: XX/20**

# Prerequisites

We are going to use the inflection data again:

In [1]:
import os

data_dir = "morphological"

# Define the file paths
train_file = os.path.join(data_dir, "english-train-medium.txt")
dev_file = os.path.join(data_dir, "english-dev.txt")
test_file = os.path.join(data_dir, "english-uncovered-test.txt")

def read_conll_file(file_path):
    data = []
    with open(file_path, 'r', encoding='utf-8') as file:
        current_sentence = []
        for line in file:
            line = line.strip()
            if not line:  # Empty line indicates the end of a sentence
                if current_sentence:
                    data.append(current_sentence)
                    current_sentence = []
            else:
                columns = line.split('\t')
                current_sentence.append(columns)
        data += current_sentence
    return data

# Read data
train_data_raw = read_conll_file(train_file)
dev_data = read_conll_file(dev_file)
# We are going to reduce the amount of dev data
dev_data = dev_data[:50]
test_data = read_conll_file(test_file)

Read the following two code cells carefully and understand what they are doing!

You are given the following datasets:

In [2]:
# Small training dataset amount for your main task
train_data = train_data_raw[:100]

# Transfer Learning dataset
transferlearning_train_data = train_data_raw[100:]

# Cross-Lingual Dataset
train_file = os.path.join(data_dir, "german-train-medium.txt")
crosslingual_train_data = read_conll_file(train_file)

Load the following scripts (from Homework 09):

Training Script on the inflection task and calculating the test accuracy.

In [23]:
from transformers import AutoTokenizer, T5ForConditionalGeneration
import torch
from torch.utils.data import Dataset, DataLoader
from datasets import load_metric
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, DataCollatorForSeq2Seq

def compute_metrics(preds):
    output, labels = preds
    logits = output[0]
    logits = torch.tensor(logits)
    labels = torch.tensor(labels)
    predictions = logits.argmax(-1)

    # Replace all occurrences of -100 with 0 in labels
    predictions = torch.where(labels != -100, predictions, torch.tensor(0))
    labels = torch.where(labels != -100, labels, torch.tensor(0))
    
    correct_sequences = torch.all(predictions == labels, dim=1)

    accuracy = torch.mean(correct_sequences.float())

    return {"accuracy": accuracy.item()}

    
# Custom PyTorch dataset with pre-tokenized data
class CustomDataset(Dataset):
    def __init__(self, tokenized_data):
        self.tokenized_data = tokenized_data

    def __len__(self):
        return len(self.tokenized_data["input_ids"])

    def __getitem__(self, idx):
        return {
            "input_ids": self.tokenized_data["input_ids"][idx],
            "labels": self.tokenized_data["label"][idx]
        }

def transform_to_token_ids(data, tokenizer):
    # Initialize the PyTorch dictionary
    tokenized = {
        "input_ids": [],
        "label": []
    }
    # Loop through raw data to tokenize and create training data
    for example in data:
        # Concatenate the first and third elements in the example list with a space
        input_text = example[0] + " " + example[2]
        
        # Tokenize the input text
        tokens = tokenizer(input_text, return_tensors='pt')
        
        input_ids = tokens['input_ids'].squeeze()#.tolist()
        
        # Get the label (using example[1])
        label = tokenizer(example[1], return_tensors='pt')['input_ids'].squeeze()#.tolist()
        
        # Append to the PyTorch dictionary
        tokenized['input_ids'].append(input_ids)
        tokenized['label'].append(label)
    return tokenized

def training_inflection(model, train_data, dev_data=dev_data, learning_rate=5e-3):
    # Create the PyTorch dataset
    train_tokenized = transform_to_token_ids(train_data, tokenizer)
    train_dataset = CustomDataset(train_tokenized)
    
    dev_tokenized = transform_to_token_ids(dev_data, tokenizer)
    dev_dataset = CustomDataset(dev_tokenized)

    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

    training_args = Seq2SeqTrainingArguments(
        output_dir="./output",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=1,
        num_train_epochs=3,
        logging_steps=60,
        evaluation_strategy="steps",
        eval_steps=60,  # Number of steps before evaluation
        learning_rate=learning_rate, 
    )
    
    
    trainer = Seq2SeqTrainer(
        model=model,
        #data_collator=data_collator,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=dev_dataset,
        compute_metrics=compute_metrics,
        data_collator=data_collator,
    
    )
    
    trainer.train()

def test_accuracy(model, test_data):
    model.eval()
    total_correct = 0
    total_samples = 0
    batch_size = 1
    log_interval = 10
    test_tokenized = transform_to_token_ids(test_data, tokenizer=tokenizer)
    test_dataset = CustomDataset(test_tokenized)
    test_dataloader = DataLoader(test_dataset, batch_size=1, shuffle=False)
    with torch.no_grad():
        for idx, batch in enumerate(test_dataloader):
            # Tokenize and prepare inputs
            inputs = batch["input_ids"]
            labels = batch["labels"]
            
            # Forward pass
            outputs = model.generate(inputs)
            labels = torch.cat((torch.tensor([[0]]), labels), dim=1)
            # Calculate accuracy
            if outputs.shape == labels.shape:
                correct_predictions = torch.equal(outputs, labels)
            else:            
                correct_predictions = False
            total_correct += correct_predictions
            total_samples += batch_size
            # Log progress every log_interval steps
            if (idx + 1) % log_interval == 0 or (idx + 1) == len(test_dataloader):
                print(f"Processed {idx+1}/{len(test_dataloader)} examples - Correct: {total_correct}")
    
    print("Final Test Accuracy: {}%".format(round(total_correct/total_samples * 100), 2))

Your base model trained on the small training set of 100 english samples, reaches approximately **22% accuracy** (see below). Important: You do not need to execute the code again - This could run for a while.

Let's see if we can improve on the task with cross-lingual transfer and transfer learning!

In [4]:
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Train on our main task and test the model.
training_inflection(model, train_data)
test_accuracy(model, test_data)

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss




Processed 10/1000 examples - Correct: 3
Processed 20/1000 examples - Correct: 7
Processed 30/1000 examples - Correct: 11
Processed 40/1000 examples - Correct: 12
Processed 50/1000 examples - Correct: 16
Processed 60/1000 examples - Correct: 18
Processed 70/1000 examples - Correct: 24
Processed 80/1000 examples - Correct: 30
Processed 90/1000 examples - Correct: 35
Processed 100/1000 examples - Correct: 38
Processed 110/1000 examples - Correct: 43
Processed 120/1000 examples - Correct: 47
Processed 130/1000 examples - Correct: 49
Processed 140/1000 examples - Correct: 51
Processed 150/1000 examples - Correct: 54
Processed 160/1000 examples - Correct: 57
Processed 170/1000 examples - Correct: 60
Processed 180/1000 examples - Correct: 65
Processed 190/1000 examples - Correct: 69
Processed 200/1000 examples - Correct: 73
Processed 210/1000 examples - Correct: 78
Processed 220/1000 examples - Correct: 81
Processed 230/1000 examples - Correct: 85
Processed 240/1000 examples - Correct: 90
Pro

# Cross-Lingual Transfer

**Task 1:** Explain what cross-lingual transfer is (e.g. how it works, why it works). Explain how you would apply crosslingual transfer for our task (given the above datasets)? In which sitation do you think cross-lingual transfer will not work? (3P)

: Cross-lingual transfer involves using knowledge acquired from a source language to upgrade the performance of a model on a different target language. The process entails pre-training the model on a sizable dataset in the source language to grasp general linguistic patterns. Then the pre-trained model undergoes fine-tuning on a smaller dataset in the target language for a specific task, adapting its parameters to the language's characteristics.


train_file = os.path.join(data_dir, "german-train-medium.txt")
crosslingual_train_data = read_conll_file(train_file)
The code introduces German data from "german-train-medium.txt" into the main task's dataset (`crosslingual_train_data`). The pre-training model acquires general language characteristics from English data and adjusts to the German language during the fine-tuning process.The model is then trained on this combined dataset, constituting transfer learning with German language incorporation.


Source-target language disparities, especially in structures, word orders, or grammatical rules, may impede effective pre-trained knowledge transfer. If the source and target languages are from different domains and the pre-training data doesn't adequately cover the target domain, effective transfer may be hindered; for instance, transferring knowledge from legal to medical text across vastly different domains may not be beneficial. Insufficient data quality or quantity in the target language, such as a small or noisy dataset, may hinder effective fine-tuning, impacting the cross-lingual transfer's effectiveness in capturing language-specific nuances.If the tasks between the source and target languages differ fundamentally, the success of cross-lingual transfer may be compromised, as it is most effective when tasks exhibit similrities.


**Task 2:** Cross-Lingual Transfer: First train on the german dataset and then on the english dataset. Test the resulting model on the english test set. (2P)

In [5]:
from transformers import AutoTokenizer, T5ForConditionalGeneration
import torch

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

"""
Your Code Here.

Apply Cross-Lingual transfer with the given german dataset: 
Train the model on german dataset and then on the english one.

Hint: Look at training_inflection() and test_accuracy()


"""
# Load the German dataset
german_train_file = os.path.join(data_dir, "german-train-medium.txt")
german_train_data = read_conll_file(german_train_file)

# Train the model on the German dataset
training_inflection(model, german_train_data)

# Fine-tune on the English dataset
training_inflection(model, train_data)

# Test the resulting model on the English test set
test_accuracy(model, test_data)


You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss,Accuracy
60,1.5639,2.185899,0.04
120,0.8254,2.205527,0.04
180,0.4188,2.272441,0.1


Step,Training Loss,Validation Loss


Processed 10/1000 examples - Correct: 4
Processed 20/1000 examples - Correct: 10
Processed 30/1000 examples - Correct: 15
Processed 40/1000 examples - Correct: 16
Processed 50/1000 examples - Correct: 22
Processed 60/1000 examples - Correct: 25
Processed 70/1000 examples - Correct: 31
Processed 80/1000 examples - Correct: 37
Processed 90/1000 examples - Correct: 44
Processed 100/1000 examples - Correct: 49
Processed 110/1000 examples - Correct: 52
Processed 120/1000 examples - Correct: 58
Processed 130/1000 examples - Correct: 61
Processed 140/1000 examples - Correct: 65
Processed 150/1000 examples - Correct: 70
Processed 160/1000 examples - Correct: 73
Processed 170/1000 examples - Correct: 76
Processed 180/1000 examples - Correct: 80
Processed 190/1000 examples - Correct: 85
Processed 200/1000 examples - Correct: 91
Processed 210/1000 examples - Correct: 95
Processed 220/1000 examples - Correct: 98
Processed 230/1000 examples - Correct: 103
Processed 240/1000 examples - Correct: 109


**Task 3**: Explain your results! Argue why the performance \[increased/decreased\]. (2P)

: The performance increased, and this can be attributed to the variance in the initial training data and fine-tuning data of the model.In the first model, it was trained directly on English training data and tested on English tasks. However, the second model was initially trained on German data. Subsequently, fine-tuning was performed using English data, illustrating an example of cross-lingual transfer.

The effective application of cross-lingual transfer is evident in the increased accuracy. While the first model relied solely on English training, the second model leveraged the knowledge gained from German data during its initial training, demonstrating the ability to transfer learning across language



# Transfer Learning

In this section, we will try out Transfer Learning!

**Task 4:** Explain what Transfer Learning and multi-task learning is (e.g. how it works, why it works). What is the difference? (3P)

: Tansfer learning entails improving a model's performance on a related task by utilizing knowledge gained from training on a different task. Specifically in natural language processing, this involves pre-training on a language modeling task, followed by fine-tuning for a specific downstream task, leveraging general linguistic features and proving advantageous when labeled data for the target task is scarce.Multi-task learning concurrently trains a model on various tasks, aiming to leverage shared knowledge for improved overall performance. In NLP, this can encompass training on diverse language-related tasks, such as part-of-speech tagging and sentiment analysis. Transfer learning improves a model's performance on a related task by leveraging knowledge from training on a different task, while multi-task learning simultaneously trains a model on various tasks to enhance overall performance, particularly in NLP with diverse language-related tasks.

Transfer Learning: First, we train on a "utility" task (source task) and then fine-tune on our main task (target task), which is the inflection task.

Now, imagine I give you a dataset with only the base word (lemma) and its changed form (inflected form), but without the morphological features, e.g. "Reflektion" and "Reflektionen".

**Task 5**: What kind of source task could you create to help with our target task (excluding my proposed task underneath)? Explain your decision! (2P)

: Source task thatinvolves training the model in "Syntactic Dependency Parsing" or "Semantic Role Labeling." could help with our target task. These tasks focus on understanding grammatical relationships and semantic roles within sentences, providing valuable linguistic insights. Syntactic Dependency Parsing teaches the model about sentence structures, while Semantic Role Labeling enhances its understanding of the semantic roles each word plays. Incorporating these tasks can deepen the model's comprehension of language features not explicitly covered in the existing code, ultimately improving its performance in the target task of inflection generation.


Let's make it interesting by mimicking T5's pre-training (so you also learn how T5 pre-trains).

We'll randomly mask some characters where the input will be the base word and its inflected form:, for example, turning _"Reflektion Reflektionen" -> "Refle\<masked1\>tion Refletion\<masked2\>"_. The task is to predict the missing characters!

I'll show you how T5 expects its input format. You can also check it out here: https://huggingface.co/docs/transformers/model_doc/t5#training

In [6]:
# From Huggingface: 
# In this setup, spans of the input sequence are masked by so-called sentinel tokens (a.k.a unique mask tokens) 
# and the output sequence is formed as a concatenation of the same sentinel tokens and the real masked tokens. 
# Each sentinel token represents a unique mask token for this sentence and should start with 
# <extra_id_0>, <extra_id_1>, … up to <extra_id_99>.

input_ids = tokenizer("Refl<extra_id_0>tio<extra_id_1>", return_tensors="pt").input_ids
print(input_ids)

label_ids = tokenizer("<extra_id_0>ek<extra_id_1>n<extra_id_2>", return_tensors="pt").input_ids
print(label_ids)

tensor([[  419,    89,    40, 32099,     3,    17,    23,    32, 32098,     1]])
tensor([[32099,     3,    15,   157, 32098,     3,    29, 32097,     1]])


T5 uses up to 100 "sentinal tokens", i.e. masking tokens.

In [7]:
num_extra_tokens = 100
special_tokens = [f"<extra_id_{i}>" for i in range(num_extra_tokens)]

print(special_tokens[:5])


['<extra_id_0>', '<extra_id_1>', '<extra_id_2>', '<extra_id_3>', '<extra_id_4>']


**Task 6:** Explain whether this task could be beneficial for the main task or not! (2P)

_Feel free to present arguments against this task. I was exploring tasks that might be relevant and found this one interesting :)_

I think this task might be less beneficial than a cross-lingual transfer. The utility task and the inflection task are significantly different. If the source task doesn't provide relevant information or features for the target task, transfer learning may not be beneficial. The original approach involves fine-tuning the model on German data and then further fine-tuning it on English data. However, in this case, we randomly mask some characters in tokenized input. This random masking can make it difficult for the model to predict specific positions accurately, especially when there is a lack of context. Therefore, it can be said that this approach is not beneficial, particularly due to the challenges posed by insufficient context for accurate predictions.

**Task 7:** Create the dataset for the above task! (3P)

In [18]:
import random

# Set a specific seed value
seed_value = 42
random.seed(seed_value)

def dropout_and_replace(word, dropout_prob=0.3, sentinel_tokens=special_tokens):
    """
    Dropout characters in a word based on the given probability and replace them with sentinel tokens.

    Parameters:
    - word (str): The input word.
    - dropout_prob (float): The probability of dropping out each character.
    - sentinel_tokens (list): List of sentinel tokens to replace dropped characters.

    Returns:
    - str: The modified word with dropout and replacement.
    """
    masked_word = ""
    label = ""

    """
    Your Code Here.
    
    """
    for char in word:
        # Randomly decide whether to drop out the character
        if random.uniform(0, 1) < dropout_prob:
            # Drop out the character and replace with a sentinel token
            masked_word += random.choice(sentinel_tokens)
            label += char
        else:
            masked_word += char
            label += f"<extra_id_{random.randint(0, num_extra_tokens - 1)}>"

    return masked_word, label

def transferlearning_token_ids(data, tokenizer, dropout_prob=0.3, special_tokens=special_tokens):
    # Use this dictionary to collect your data
    tokenized = {
        "input_ids": [],
        "label": []
    }

    # Loop through raw data to tokenize and create training data
    for example in data:
        """
        Your Code Here

        Remember that the input should be lemma + " " + inflected_form.
        """
        # Concatenate the lemma and inflected_form with a space
        input_text = example[0] + " " + example[1]

        # Apply dropout_and_replace to create the masked input and labels
        masked_input, label = dropout_and_replace(input_text, dropout_prob, special_tokens)

        # Tokenize the masked input and labels
        tokens = tokenizer(masked_input, return_tensors='pt')
        label_tokens = tokenizer(label, return_tensors='pt')

        input_ids = tokens['input_ids'].squeeze()
        label_ids = label_tokens['input_ids'].squeeze()

        # Append to the PyTorch dictionary
        tokenized['input_ids'].append(input_ids)
        tokenized['label'].append(label_ids)

    # This code shuffles the data!
    combined_lists = list(zip(tokenized['input_ids'], tokenized['label']))
    # Shuffle the combined pairs
    random.shuffle(combined_lists)
    tokenized['input_ids'], tokenized['label'] = zip(*combined_lists)

    return tokenized


In [17]:
# This assert will only work if you initialize dropout_and_replace() without calling it first!
# Depending on your function implementation this might not work anyways
# But this should give you a good hint of the dropout_and_replace() function

# Testing with one word
# Example usage:
word = "Reflektion"
dropout_prob = 0.3
input_word, label_word = dropout_and_replace(word, dropout_prob, special_tokens)

assert input_word == 'R<extra_id_0>ekt<extra_id_1>o<extra_id_2>'
assert label_word == '<extra_id_0>efl<extra_id_1>i<extra_id_2>n<extra_id_3>'

AssertionError: 

In [19]:

class TransferLearningDataset(Dataset):
    def __init__(self, tokenized_data):
        self.tokenized_data = tokenized_data

    def __len__(self):
        return len(self.tokenized_data["input_ids"])

    def __getitem__(self, idx):
        return {
            "input_ids": self.tokenized_data["input_ids"][idx],
            "labels": self.tokenized_data["label"][idx]
        }

# Create the PyTorch dataset
train_tokenized = transferlearning_token_ids(transferlearning_train_data, tokenizer)
train_dataset = TransferLearningDataset(train_tokenized)

dev_tokenized = transferlearning_token_ids(dev_data, tokenizer)
dev_dataset = TransferLearningDataset(dev_tokenized)

In [20]:
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

**Task 8:** Now train the model on our source task and then on our target task. Test the resulting model! Did the performance increase? Why or why not? (3P)

_Hint: Use Seq2SeqTrainingArguments, Seq2SeqTrainer with a learning rate of 5e-4_

In [21]:
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, AutoTokenizer, T5ForConditionalGeneration
from torch.utils.data import Dataset, DataLoader
import torch
from transformers import DataCollatorForSeq2Seq

# Settings
learning_rate, per_device_train_batch_size, num_train_epochs = 5e-4, 16, 5

# Function to compute metrics
def compute_metrics(preds):
    output, labels = preds
    predictions = torch.argmax(torch.tensor(output[0]), dim=-1)
    predictions = torch.where(labels != -100, predictions, torch.tensor(0))
    labels = torch.where(labels != -100, labels, torch.tensor(0))
    accuracy = torch.mean(torch.all(predictions == labels, dim=1).float())
    return {"accuracy": accuracy.item()}

# Custom PyTorch dataset
class CustomDataset(Dataset):
    def __init__(self, tokenized_data):
        self.tokenized_data = tokenized_data
    def __len__(self):
        return len(self.tokenized_data["input_ids"])
    def __getitem__(self, idx):
        return {"input_ids": self.tokenized_data["input_ids"][idx], "labels": self.tokenized_data["label"][idx]}

# Tokenizer and Model Initialization
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Function to transform data to token IDs
def transform_to_token_ids(data):
    return {"input_ids": [tokenizer(x[0] + " " + x[2], return_tensors='pt')['input_ids'].squeeze() for x in data],
            "label": [tokenizer(x[1], return_tensors='pt')['input_ids'].squeeze() for x in data]}

# Training function
def training_inflection(model, train_data, dev_data=None, learning_rate=5e-4):
    train_dataset = CustomDataset(transform_to_token_ids(train_data))
    dev_dataset = CustomDataset(transform_to_token_ids(dev_data)) if dev_data else None
    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

    training_args = Seq2SeqTrainingArguments(
        output_dir="./output",
        per_device_train_batch_size=16,
        per_device_eval_batch_size=1,
        num_train_epochs=3,
        logging_steps=60,
        evaluation_strategy="steps",
        eval_steps=60,
        learning_rate=learning_rate,
    )
    
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=dev_dataset,
        compute_metrics=compute_metrics,
        data_collator=data_collator,
    )
    
    trainer.train()

# Training on the source task
training_inflection(model, train_data, dev_data=dev_data, learning_rate=5e-4)


You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss,Validation Loss


In [24]:
# Then train on our main task and test the model.

training_inflection(model, train_data)
test_accuracy(model, test_data)

Step,Training Loss,Validation Loss


Processed 10/1000 examples - Correct: 3
Processed 20/1000 examples - Correct: 7
Processed 30/1000 examples - Correct: 11
Processed 40/1000 examples - Correct: 12
Processed 50/1000 examples - Correct: 17
Processed 60/1000 examples - Correct: 19
Processed 70/1000 examples - Correct: 25
Processed 80/1000 examples - Correct: 29
Processed 90/1000 examples - Correct: 32
Processed 100/1000 examples - Correct: 35
Processed 110/1000 examples - Correct: 36
Processed 120/1000 examples - Correct: 40
Processed 130/1000 examples - Correct: 42
Processed 140/1000 examples - Correct: 45
Processed 150/1000 examples - Correct: 48
Processed 160/1000 examples - Correct: 50
Processed 170/1000 examples - Correct: 53
Processed 180/1000 examples - Correct: 55
Processed 190/1000 examples - Correct: 58
Processed 200/1000 examples - Correct: 62
Processed 210/1000 examples - Correct: 67
Processed 220/1000 examples - Correct: 69
Processed 230/1000 examples - Correct: 73
Processed 240/1000 examples - Correct: 77
Pro

: As discussed in task 6, this approach did not seem to work well for the target inflection task. It was less accurate than cross-lingual transfer.