# Assignment 2

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: Human Value Detection, Multi-label classification, Transformers, BERT


# Contact

For any doubt, question, issue or help, you can always contact us at the following email addresses:

Teaching Assistants:

* Federico Ruggeri -> federico.ruggeri6@unibo.it
* Eleonora Mancini -> e.mancini@unibo.it

Professor:

* Paolo Torroni -> p.torroni@unibo.it

# Introduction

You are tasked to address the [Human Value Detection challenge](https://aclanthology.org/2022.acl-long.306/).

## Problem definition

Arguments are paired with their conveyed human values.

Arguments are in the form of **premise** $\rightarrow$ **conclusion**.

### Example:

**Premise**: *``fast food should be banned because it is really bad for your health and is costly''*

**Conclusion**: *``We should ban fast food''*

**Stance**: *in favour of*

<center>
    <img src="images/human_values.png" alt="human values" />
</center>

# [Task 1 - 0.5 points] Corpus

Check the official page of the challenge [here](https://touche.webis.de/semeval23/touche23-web/).

The challenge offers several corpora for evaluation and testing.

You are going to work with the standard training, validation, and test splits.

#### Arguments
* arguments-training.tsv
* arguments-validation.tsv
* arguments-test.tsv

#### Human values
* labels-training.tsv
* labels-validation.tsv
* labels-test.tsv

### Example

#### arguments-*.tsv
```

Argument ID    A01005

Conclusion     We should ban fast food

Stance         in favor of

Premise        fast food should be banned because it is really bad for your health and is costly.
```

#### labels-*.tsv

```
Argument ID                A01005

Self-direction: thought    0
Self-direction: action     0
...
Universalism: objectivity: 0
```

### Splits

The standard splits contain

   * **Train**: 5393 arguments
   * **Validation**: 1896 arguments
   * **Test**: 1576 arguments

### Annotations

In this assignment, you are tasked to address a multi-label classification problem.

You are going to consider **level 3** categories:

* Openness to change
* Self-enhancement
* Conversation
* Self-transcendence

**How to do that?**

You have to merge (**logical OR**) annotations of level 2 categories belonging to the same level 3 category.

**Pay attention to shared level 2 categories** (e.g., Hedonism). $\rightarrow$ [see Table 1 in the original paper.](https://aclanthology.org/2022.acl-long.306/)

#### Example

```
Self-direction: thought:    0
Self-direction: action:     1
Stimulation:                0
Hedonism:                   1

Openess to change           1
```

### Instructions

* **Download** the specificed training, validation, and test files.
* **Encode** split files into a pandas.DataFrame object.
* For each split, **merge** the arguments and labels dataframes into a single dataframe.
* **Merge** level 2 annotations to level 3 categories.

In [2]:
import numpy as np
import pandas as pd
import torch 
import torch.nn as nn
from torch.utils.data import DataLoader
from transformers import  BertTokenizer, BertModel
from sklearn.metrics import accuracy_score
from random import randint
from typing import Tuple
from sys import stdout

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
f = open('data/arguments-training.tsv')
argument_training = f.read()
f.close()
f = open('data/arguments-validation.tsv')
argument_validation = f.read()
f.close()
f = open('data/arguments-test.tsv')
argument_test = f.read()
f.close()
f = open('data/labels-training.tsv')
label_training = f.read()
f.close()
f = open('data/labels-validation.tsv')
label_validation = f.read()
f.close()
f = open('data/labels-test.tsv')
label_test = f.read()
f.close()

In [4]:
def parse_label(label_elements):
    label = {'Openness to change': 0, 'Self-enhancement': 0, 'Conservation': 0, 'Self-transcendence': 0}
    for i in range(len(label_elements)):
        label_elements[i] = int(label_elements[i])
    label['Openness to change'] = min(1,sum(label_elements[:4]))
    label['Self-enhancement'] = min(1,sum(label_elements[3:8]))
    label['Conservation'] = min(1,sum(label_elements[7:14]))
    label['Self-transcendence'] = min(1,sum(label_elements[13:]))
    return label


def parse_set(set_training_str, set_labels_str):
    set_data = {}
    for line in set_training_str.split('\n')[1:]:
        if line != '':
            split_line = line.split('\t')
            set_data[split_line[0]] = {'Conclusion': split_line[1], 'Stance': split_line[2], 'Premise': split_line[3]}

    for line in set_labels_str.split('\n')[1:]:
        if line != '':
            split_line = line.split('\t') 
            elem = set_data[split_line[0]]
            elem.update(parse_label(split_line[1:]))


    data_list = []

    for key in set_data.keys():
        elem = set_data[key].copy()
        elem['Id'] = key
        data_list.append(elem)

    return pd.DataFrame(data_list)

In [5]:
training_dataframe = parse_set(argument_training, label_training)
validation_dataframe = parse_set(argument_validation, label_validation)
test_dataframe = parse_set(argument_test, label_test)

In [6]:
training_dataframe.head()

Unnamed: 0,Conclusion,Stance,Premise,Openness to change,Self-enhancement,Conservation,Self-transcendence,Id
0,We should ban human cloning,in favor of,we should ban human cloning as it will only ca...,0,0,1,0,A01002
1,We should ban fast food,in favor of,fast food should be banned because it is reall...,0,0,1,0,A01005
2,We should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...,0,1,1,0,A01006
3,We should abolish capital punishment,against,capital punishment is sometimes the only optio...,0,0,1,1,A01007
4,We should ban factory farming,against,factory farming allows for the production of c...,0,0,1,1,A01008


In [7]:
class Dataset(torch.utils.data.Dataset):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

In [8]:
train_y = torch.Tensor([
    [
    training_dataframe.iloc[i]['Openness to change'], 
    training_dataframe.iloc[i]['Self-enhancement'], 
    training_dataframe.iloc[i]['Conservation'], 
    training_dataframe.iloc[i]['Self-transcendence']
    ] for i in range(len(training_dataframe))])

validation_y = torch.Tensor([
    [
    training_dataframe.iloc[i]['Openness to change'], 
    training_dataframe.iloc[i]['Self-enhancement'], 
    training_dataframe.iloc[i]['Conservation'], 
    training_dataframe.iloc[i]['Self-transcendence']
    ] for i in range(len(validation_dataframe))])
train_y

tensor([[0., 0., 1., 0.],
        [0., 0., 1., 0.],
        [0., 1., 1., 0.],
        ...,
        [0., 0., 0., 1.],
        [0., 0., 1., 1.],
        [1., 1., 1., 1.]])

# [Task 2 - 2.0 points] Model definition

You are tasked to define several neural models for multi-label classification.

<center>
    <img src="images/model_schema.png" alt="model_schema" />
</center>

### Instructions

* **Baseline**: implement a random uniform classifier (an individual classifier per category).
* **Baseline**: implement a majority classifier (an individual classifier per category).

<br/>

* **BERT w/ C**: define a BERT-based classifier that receives an argument **conclusion** as input.
* **BERT w/ CP**: add argument **premise** as an additional input.
* **BERT w/ CPS**: add argument premise-to-conclusion **stance** as an additional input.

## Random classifier

In [9]:
def get_random_classifier(_input_shape:Tuple, _output_shape:Tuple):
    input_len = len(_input_shape)
    def random_classifier(input:'np.array|torch.Tensor'):
        batch = 1
        if len(input.shape) == input_len + 1:
            batch = input.shape[input_len] 
        output_shape = _output_shape + (batch, )
        print(output_shape)
        output = np.zeros(shape=output_shape)
        for i in range(batch):
            idxs = [randint(0, max_idx -1) for max_idx in _output_shape] + [i]
            output[tuple(idxs)] = 1
        return output
    return random_classifier


In [10]:
classifier = get_random_classifier((3,5), (4,))
input = np.ones((3,5,7))
print(classifier(input))

(4, 7)
[[0. 0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 0. 0. 1.]]


# Majority classifier

In [11]:
def get_majority_classifier(_input_shape:Tuple, _output_shape:Tuple, majority:Tuple):
    input_len = len(_input_shape)
    def majority_classifier(input:'np.array|torch.Tensor'):
        batch = 1
        if len(input.shape) == input_len + 1:
            batch = input.shape[input_len] 
        output_shape = _output_shape + (batch, )
        print(output_shape)
        output = np.zeros(shape=output_shape)
        for i in range(batch):
            idxs = majority + (i, )
            output[tuple(idxs)] = 1
        return output
    return majority_classifier

In [12]:
classifier = get_majority_classifier((3,5), (4,), (2, ))
input = np.ones((3,5,7))
print(classifier(input))

(4, 7)
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [1. 1. 1. 1. 1. 1. 1.]
 [0. 0. 0. 0. 0. 0. 0.]]


# BERT models

In [13]:
class NeuralNetwork(nn.Module):
  """
  This class implements a simple interface to get a working neural network using pytorch.
  """
  def __init__(self) -> None:
     super().__init__()
  def train_network(self,
            train_loader:'torch.utils.data.DataLoader',
            validation_loader:'torch.utils.data.DataLoader',
            optimizer = torch.optim.Adam, 
            loss_function = nn.CrossEntropyLoss(),
            learning_rate:'float'=.1,
            epochs:'int'=10,
            device:'str'='cpu',
            metrics:'dict[str,callable]' = {}) -> Tuple[dict[str,list[float]],dict[str,list[float]]]:
    """
      A simple training loop for the neural network. It returns the epochs loss and accuracy history both on the training and the validation set. The tuple will be formatted as:
      train loss, train accuracy, val loss, val accuracy
      Parameters
      ----------
      train_loader: torch.utils.data.DataLoader
        A dataloader containing the dataset that will be used for training the network
      validation_loader: torch.utils.data.DataLoader
        A dataloader containing the dataset that will be used for validate the network at the end of each epoch
      optimizer:
        The optimizer to use while training, default to Adam.
      loss_function:
        The loss function to use while training, default to crossentropy
      learning_rate: float
        The learning rate that will be used in the optimizer to train the network. Default to .1
      epochs:
        The number of training epochs, default to 10.
      device:
        The device to use for the computation
      metrics:
        The metrics to use to evaluate the network
    """
    net = self.to(device)
    optimizer = optimizer(net.parameters(), learning_rate)

    train_loss_history = []
    val_loss_history = []

    total_batch = int(len(train_loader.dataset) / train_loader.batch_size)
    train_metrics_scores = {}
    val_metrics_scores = {}
    for key in metrics:
        train_metrics_scores[key] = []
        val_metrics_scores[key] = []

    for epoch in range(epochs):
        net.train()
        for batch_idx, data in enumerate(train_loader):
            labels = data[1].to(device)
            if isinstance(data[0], dict):
              inputs = {}
              for key in data[0].keys():
                 inputs[key] = data[0][key].to(device)
            else:
              inputs = data[0].to(device)
            outputs = net(inputs)
            n_batch, n_classes = outputs.shape
            loss_outputs = torch.reshape(outputs, (n_batch, n_classes))
            loss_labels = torch.reshape(labels, (n_batch, n_classes))
            loss = loss_function(loss_outputs, loss_labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            predicted_classes = torch.round(outputs).detach().cpu()
            normal_labels = torch.round(labels).cpu()

            stdout.write(f"\rbatch {batch_idx + 1}/{total_batch} ----- loss: {loss.cpu()} ----- {'-----'.join([f'{key}: {metrics[key](predicted_classes, normal_labels)}' for key in metrics.keys()])}")
            stdout.flush()

        val_metrics, val_loss = self.__validate(validation_loader, metrics, loss_function, device)
        for key in metrics:
          val_metrics_scores[key].append(val_metrics[key])
        val_loss_history.append(val_loss)
        train_metrics, train_loss = self.__validate(train_loader, metrics, loss_function, device)
        for key in metrics:
          train_metrics_scores[key].append(train_metrics[key])
        train_loss_history.append(train_loss)
        out_str = "======================================================================================================================================\n" + \
        f"EPOCH {epoch + 1} training loss: {train_loss_history[-1]} - validation loss: {val_loss_history[-1]}\n" + \
        '\n'.join([f"EPOCH {epoch + 1} training {metric}: {train_metrics_scores[metric][-1]} - validation {metric}: {val_metrics_scores[metric][-1]}" for metric in metrics.keys()]) + \
        """
======================================================================================================================================
        """
        stdout.write("\r" + " " * len(out_str) + "\r")
        stdout.flush()
        stdout.write(out_str)
        stdout.flush()
        print()
    train_metrics_scores['loss'] = train_loss_history
    val_metrics_scores['loss'] = val_loss_history
    return train_metrics_scores, val_metrics_scores

  def __validate(self, loader, metrics, loss_function, device):
    losses = []
    metrics_scores = {}
    for key in metrics.keys():
      metrics_scores[key] = []
    net = self.to(device)
    net.eval()
    with torch.no_grad():
        for batch_idx, data in enumerate(loader):
            labels = data[1].to(device)
            if isinstance(data[0], dict):
              inputs = {}
              for key in data[0].keys():
                 inputs[key] = data[0][key].to(device)
            else:
              inputs = data[0].to(device)

            outputs = net(inputs)
            n_batch, n_classes = outputs.shape
            loss_outputs = torch.reshape(outputs, (n_batch, n_classes))
            loss_labels = torch.reshape(labels, (n_batch, n_classes))
            loss = loss_function(loss_outputs, loss_labels)
            losses.append(loss)
            predicted_classes = torch.round(outputs)
            for key in metrics.keys():
              metrics_scores[key].append(metrics[key](predicted_classes.cpu(), torch.round(labels).cpu()))

    average_loss = sum(losses)/(batch_idx+1)
    mean_metrics_scores = {}
    for key in metrics.keys():
      mean_metrics_scores[key] = sum(metrics_scores[key])/len(loader)
    return mean_metrics_scores, average_loss

In [45]:
class Bert_c(NeuralNetwork):
    def __init__(self, base_model_name:str, num_classes:int) -> None:
        super().__init__()
        self.base_model = BertModel.from_pretrained(base_model_name)
        self.lin_layer = nn.Linear(self.base_model.config.hidden_size, num_classes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        features = self.base_model(**inputs).pooler_output
        outs = self.lin_layer(features)
        return self.sigmoid(outs) 
    
    def freeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = False
    def unfreeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = True
    
class Bert_pc(NeuralNetwork):
    def __init__(self, base_model_name:str, num_classes:int) -> None:
        super().__init__()
        self.base_model = BertModel.from_pretrained(base_model_name)
        self.lin_layer = torch.nn.Linear(self.base_model.config.hidden_size, num_classes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        premises, conclusions = inputs
        encoded_premises = self.base_model(**premises).pooler_output
        encoded_conclusions = self.base_model(**conclusions).pooler_output
        features = torch.cat((encoded_premises, encoded_conclusions),1)
        outs = self.lin_layer(features)
        return self.sigmoid(outs) 

    
    def freeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = False
    def unfreeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = True

class Bert_pcs(NeuralNetwork):
    def __init__(self, base_model_name:str, num_classes:int) -> None:
        super().__init__()
        self.base_model = BertModel.from_pretrained(base_model_name)
        self.lin_layer = torch.nn.Linear(self.base_model.config.hidden_size, num_classes)
        self.sigmoid = nn.Sigmoid()

    def forward(self, inputs):
        premises, conclusions, stance = inputs
        encoded_premises = self.base_model(**premises).pooler_output
        encoded_conclusions = self.base_model(**conclusions).pooler_output
        encoded_stance = self.base_model(**stance).pooler_output
        features = torch.cat((encoded_premises, encoded_conclusions, encoded_stance),1)
        outs = self.lin_layer(features)
        return self.sigmoid(outs) 

    
    def freeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = False
    def unfreeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = True
    
class Bert_pcs_numerical(NeuralNetwork):
    def __init__(self, base_model_name:str, num_classes:int) -> None:
        super().__init__()
        self.base_model = BertModel.from_pretrained(base_model_name)
        self.lin_layer = torch.nn.Linear(self.base_model.config.hidden_size, num_classes)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, inputs):
        premises, conclusions, stance = inputs
        encoded_premises = self.base_model(**premises).pooler_output
        encoded_conclusions = self.base_model(**conclusions).pooler_output
        features = torch.cat((encoded_premises, encoded_conclusions, stance),1)
        outs = self.lin_layer(features)
        return self.sigmoid(outs) 

    
    def freeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = False
    def unfreeze_bert(self):
        for param in self.base_model.parameters():
                param.requires_grad = True

In [15]:
def tokenize_dataset(dataset:pd.DataFrame, tokenizer:BertTokenizer, columns:list[str]) -> pd.DataFrame:
    return {column:
        tokenizer(dataset[column].tolist(), padding=True, truncation=True, return_tensors='pt')
        for column in columns}

def dict_lists_to_list_of_dicts(input_dict):
    # Get keys and the length of lists in the dictionary
    keys = input_dict.keys()
    list_lengths = [len(input_dict[key]) for key in keys]

    # Ensure that all lists have the same length
    if len(set(list_lengths)) > 1:
        raise ValueError("All lists in the input dictionary must have the same length.")

    # Create a list of dictionaries
    list_of_dicts = [{key: input_dict[key][i] for key in keys} for i in range(list_lengths[0])]

    return list_of_dicts

In [47]:
bert_c_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
train_conclusion = dict_lists_to_list_of_dicts(tokenize_dataset(training_dataframe, bert_c_tokenizer, ['Conclusion'])['Conclusion'])
validation_conclusion = dict_lists_to_list_of_dicts(tokenize_dataset(validation_dataframe, bert_c_tokenizer, ['Conclusion'])['Conclusion'])
bert_c = Bert_c('bert-base-uncased', 4)

In [40]:
bert_pc_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_pc = Bert_pc('bert-base-uncased', 4)

In [41]:
bert_pcs_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_pcs = Bert_pcs('bert-base-uncased', 4)

In [42]:
bert_pcs_numerical_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert_pcs_numerical = Bert_pcs_numerical('bert-base-uncased', 4)

In [43]:
train_c_dataset = Dataset(train_conclusion, train_y)
validation_c_dataset = Dataset(validation_conclusion, validation_y)

In [44]:
train_loader  = DataLoader(train_c_dataset, batch_size=50, shuffle=True)
validation_loader = DataLoader(validation_c_dataset, batch_size=50)

In [19]:
def multilabel_accuracy_one_hot(true_labels, predicted_probs):
    """
    Calculate accuracy for a multilabel classification task with one-hot encoded values.

    Parameters:
    - true_labels: List of binary arrays representing the true labels for each instance.
    - predicted_probs: List of binary arrays representing the predicted probabilities for each instance.
    - threshold: Threshold for converting predicted probabilities to binary values.

    Returns:
    - Accuracy score.
    """
    # Apply threshold to convert predicted probabilities to binary values
    predicted_labels = np.array(predicted_probs)
    # Flatten the arrays of labels
    flat_true_labels = np.array(true_labels).flatten()
    flat_predicted_labels = predicted_labels.flatten()

    # Calculate accuracy
    accuracy = accuracy_score(flat_true_labels, flat_predicted_labels)

    return accuracy

In [49]:
bert_c.freeze_bert()
bert_c.train_network(train_loader, 
                    validation_loader,
                    device='cuda', 
                    loss_function=nn.BCELoss(), 
                    metrics={'accuracy':multilabel_accuracy_one_hot},
                    learning_rate=.001,
                    epochs=5)

EPOCH 1 training loss: 0.5807784199714661 - validation loss: 0.6260959506034851
EPOCH 1 training accuracy: 0.7066451335055985 - validation accuracy: 0.659645308924485
        
EPOCH 2 training loss: 0.5829989314079285 - validation loss: 0.6308569312095642
EPOCH 2 training accuracy: 0.7037898363479758 - validation accuracy: 0.6632208237986269
        
EPOCH 3 training loss: 0.5794072151184082 - validation loss: 0.6253248453140259
EPOCH 3 training accuracy: 0.7093916881998281 - validation accuracy: 0.6572997711670481
        
EPOCH 4 training loss: 0.5785292983055115 - validation loss: 0.6218100190162659
EPOCH 4 training accuracy: 0.7124375538329022 - validation accuracy: 0.6618821510297482
        
EPOCH 5 training loss: 0.5812191367149353 - validation loss: 0.6291512846946716
EPOCH 5 training accuracy: 0.7052174849267874 - validation accuracy: 0.6631979405034326
        


({'accuracy': [0.7066451335055985,
   0.7037898363479758,
   0.7093916881998281,
   0.7124375538329022,
   0.7052174849267874],
  'loss': [tensor(0.5808, device='cuda:0'),
   tensor(0.5830, device='cuda:0'),
   tensor(0.5794, device='cuda:0'),
   tensor(0.5785, device='cuda:0'),
   tensor(0.5812, device='cuda:0')]},
 {'accuracy': [0.659645308924485,
   0.6632208237986269,
   0.6572997711670481,
   0.6618821510297482,
   0.6631979405034326],
  'loss': [tensor(0.6261, device='cuda:0'),
   tensor(0.6309, device='cuda:0'),
   tensor(0.6253, device='cuda:0'),
   tensor(0.6218, device='cuda:0'),
   tensor(0.6292, device='cuda:0')]})

### Notes

**Do not mix models**. Each model has its own instructions.

You are **free** to select the BERT-based model card from huggingface.

#### Examples

```
bert-base-uncased
prajjwal1/bert-tiny
distilbert-base-uncased
roberta-base
```

### BERT w/ C

<center>
    <img src="images/bert_c.png" alt="BERT w/ C" />
</center>

### BERT w/ CP

<center>
    <img src="images/bert_cp.png" alt="BERT w/ CP" />
</center>

### BERT w/ CPS

<center>
    <img src="images/bert_cps.png" alt="BERT w/ CPS" />
</center>

### Input concatenation

<center>
    <img src="images/input_merging.png" alt="Input merging" />
</center>

### Notes

The **stance** input has to be encoded into a numerical format.

You **should** use the same model instance to encode **premise** and **conclusion** inputs.

# [Task 3 - 0.5 points] Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using per-category binary F1-score.
* Compute the average binary F1-score over all categories (macro F1-score).

### Example

You start with individual predictions ($\rightarrow$ samples).

```
Openess to change:    0 0 1 0 1 1 0 ...
Self-enhancement:     1 0 0 0 1 0 1 ...
Conversation:         0 0 0 1 1 0 1 ...
Self-transcendence:   1 1 0 1 0 1 0 ...
```

You compute per-category binary F1-score.

```
Openess to change F1:    0.35
Self-enhancement F1:     0.55
Conversation F1:         0.80
Self-transcendence F1:   0.21
```

You then average per-category scores.
```
Average F1: ~0.48
```

# [Task 4 - 1.0 points] Training and Evaluation

You are now tasked to train and evaluate **all** defined models.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Pick **at least** three seeds for robust estimation.
* Compute metrics on the validation set.
* Report **per-category** and **macro** F1-score for comparison.

# [Task 5 - 1.0 points] Error Analysis

You are tasked to discuss your results.

### Instructions

* **Compare** classification performance of BERT-based models with respect to baselines.
* Discuss **difference in prediction** between the best performing BERT-based model and its variants.

### Notes

You can check the [original paper](https://aclanthology.org/2022.acl-long.306/) for suggestions on how to perform comparisons (e.g., plots, tables, etc...).

# [Task 6 - 1.0 points] Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.

### Recommendations

The report is not a copy-paste of graphs, tables, and command outputs.

* Summarize classification performance in Table format.
* **Do not** report command outputs or screenshots.
* Report learning curves in Figure format.
* The error analysis section should summarize your findings.

# Submission

* **Submit** your report in PDF format.
* **Submit** your python notebook.
* Make sure your notebook is **well organized**, with no temporary code, commented sections, tests, etc...
* You can upload **model weights** in a cloud repository and report the link in the report.

# FAQ

Please check this frequently asked questions before contacting us

### Model card

You are **free** to choose the BERT-base model card you like from huggingface.

### Model architecture

You **should not** change the architecture of a model (i.e., its layers).

However, you are **free** to play with their hyper-parameters.

### Model Training

You are **free** to choose training hyper-parameters for BERT-based models (e.g., number of epochs, etc...).

### Neural Libraries

You are **free** to use any library of your choice to address the assignment (e.g., Keras, Tensorflow, PyTorch, JAX, etc...)

### Error Analysis

Some topics for discussion include:
   * Model performance on most/less frequent classes.
   * Precision/Recall curves.
   * Confusion matrices.
   * Specific misclassified samples.

# The End