# Assignment 2

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: Human Value Detection, Multi-label classification, Transformers, BERT

# Imports and libraries needed

In [4]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import seaborn as sns
import sklearn
import random
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Task 1: Corpus

Check the official page of the challenge [here](https://touche.webis.de/semeval23/touche23-web/).

The challenge offers several corpora for evaluation and testing.

You are going to work with the standard training, validation, and test splits.

#### Arguments
* arguments-training.tsv
* arguments-validation.tsv
* arguments-test.tsv

#### Human values
* labels-training.tsv
* labels-validation.tsv
* labels-test.tsv

### Instructions

* **Download** the specified training, validation, and test files.
* **Encode** each split file into a pandas.DataFrame object.
* For each split, **merge** the arguments and labels dataframes into a single dataframe.
* **Merge** level 2 annotations to level 3 categories.
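
The last two steps can be sketched on toy data as follows. This is a minimal illustration, not the reference solution: the frames below are small stand-ins for one split, `merge_split` is a hypothetical helper, and `LEVEL3_MAPPING` is a deliberately partial mapping that must be replaced with the full level 2 to level 3 assignment from the challenge's value taxonomy. A level 3 category is assumed to be active whenever at least one of its level 2 values is active.

```python
import pandas as pd

# Toy stand-ins for one split; the real frames would be loaded from the TSV files.
arguments = pd.DataFrame({
    "Argument ID": ["A01", "A02", "A03"],
    "Conclusion": ["c1", "c2", "c3"],
    "Stance": ["in favor of", "against", "in favor of"],
    "Premise": ["p1", "p2", "p3"],
})
labels = pd.DataFrame({
    "Argument ID": ["A01", "A02", "A03"],
    "Stimulation": [1, 0, 0],
    "Hedonism": [0, 1, 0],
    "Achievement": [0, 0, 0],
    "Tradition": [0, 0, 1],
})

# ASSUMPTION: partial mapping for illustration only. Check the full assignment
# of level 2 values to level 3 categories against the challenge documentation.
LEVEL3_MAPPING = {
    "Openness to change": ["Stimulation", "Hedonism"],
    "Self-enhancement": ["Hedonism", "Achievement"],
    "Conservation": ["Tradition"],
}


def merge_split(arguments, labels, mapping):
    """Join arguments with labels, then collapse level 2 columns into level 3 ones."""
    merged = arguments.merge(
        labels, on="Argument ID", how="inner", validate="one_to_one"
    )
    for category, level2_columns in mapping.items():
        # A level 3 category is active if any of its level 2 values is active.
        merged[category] = merged[level2_columns].max(axis=1)
    level2 = sorted({col for cols in mapping.values() for col in cols})
    return merged.drop(columns=level2)


df_example = merge_split(arguments, labels, LEVEL3_MAPPING)
print(df_example)
```

Passing `validate="one_to_one"` makes the merge fail loudly if an Argument ID appears more than once in either file, which is a cheap sanity check on the downloaded data.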

### Train set

In [3]:
# Load the training arguments into a dataframe.
# The encoding is set explicitly: with the platform default codec, open() failed with
# "UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 5827".
# The first row of the file is read as the header
# (expected columns: Argument ID, Conclusion, Stance, Premise).
df_train = pd.read_csv('arguments/arguments-training.tsv', sep='\t', encoding='utf-8')

print(df_train.head())
print(df_train.shape)

In [None]:
# Load the training labels into a dataframe.
# The header row of the file provides the column names
# (Argument ID followed by one binary column per value category).
df_train_label = pd.read_csv('arguments/labels-training.tsv', sep='\t', encoding='utf-8')

print(df_train_label.head())
print(df_train_label.shape)

### Validation set

In [None]:
# Load the validation arguments into a dataframe.
# The first row of the file is read as the header
# (expected columns: Argument ID, Conclusion, Stance, Premise).
df_val = pd.read_csv('arguments/arguments-validation.tsv', sep='\t', encoding='utf-8')

print(df_val.head())
print(df_val.shape)

In [None]:
# Load the validation labels into a dataframe.
# The header row of the file provides the column names
# (Argument ID followed by one binary column per value category).
df_val_label = pd.read_csv('arguments/labels-validation.tsv', sep='\t', encoding='utf-8')

print(df_val_label.head())
print(df_val_label.shape)

### Test set

In [None]:
# Load the test arguments into a dataframe.
# The first row of the file is read as the header
# (expected columns: Argument ID, Conclusion, Stance, Premise).
df_test = pd.read_csv('arguments/arguments-test.tsv', sep='\t', encoding='utf-8')

print(df_test.head())
print(df_test.shape)

In [None]:
# Load the test labels into a dataframe.
# The header row of the file provides the column names
# (Argument ID followed by one binary column per value category).
df_test_label = pd.read_csv('arguments/labels-test.tsv', sep='\t', encoding='utf-8')

print(df_test_label.head())
print(df_test_label.shape)

# Task 2: Model definition

You are tasked to define several neural models for multi-label classification.

### Instructions

* **Baseline**: implement a random uniform classifier (an individual classifier per category).
* **Baseline**: implement a majority classifier (an individual classifier per category).

<br/>

* **BERT w/ C**: define a BERT-based classifier that receives an argument **conclusion** as input.
* **BERT w/ CP**: add argument **premise** as an additional input.
* **BERT w/ CPS**: add argument premise-to-conclusion **stance** as an additional input.
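
One possible way to feed the three BERT variants is to keep a single model architecture and change only the text that is passed in. The sketch below is an assumption about input construction, not a prescribed design: `build_model_input` is a hypothetical helper, and prepending the stance to the premise is just one of several ways to encode it.

```python
def build_model_input(conclusion, premise=None, stance=None):
    """Return the (first, second) text segments for one argument.

    - conclusion only             -> BERT w/ C
    - conclusion + premise        -> BERT w/ CP
    - conclusion, premise, stance -> BERT w/ CPS
    """
    if premise is None:
        return conclusion, None
    if stance is None:
        return conclusion, premise
    # ASSUMPTION: the stance is encoded as a textual prefix of the premise.
    return conclusion, f"{stance}: {premise}"


print(build_model_input("We should ban X"))
print(build_model_input("We should ban X", "X is harmful"))
print(build_model_input("We should ban X", "X is harmful", "in favor of"))
```

The two returned segments can then be passed to a sentence-pair tokenizer, for instance as the `text` and `text_pair` arguments of a Hugging Face tokenizer call, so that the model sees them as separate segments.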

### Implement a random uniform classifier

In [None]:
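# Random uniform classifier: one independent classifier per category, each
# predicting 0 or 1 with equal probability regardless of the input.
# Assumes y_train is a binary array of shape (n_samples, n_categories).
from sklearn.dummy import DummyClassifier

random_classifiers = [
    DummyClassifier(strategy='uniform', random_state=i).fit(X_train, y_train[:, i])
    for i in range(y_train.shape[1])
]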

### Implement a majority classifier

In [None]:
# Majority classifier: one independent classifier per category, each always
# predicting the most frequent training label of its category.
# Assumes y_train is a binary array of shape (n_samples, n_categories).
from sklearn.dummy import DummyClassifier

majority_classifiers = [
    DummyClassifier(strategy='most_frequent').fit(X_train, y_train[:, i])
    for i in range(y_train.shape[1])
]

# Task 3: Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using per-category binary F1-score.
* Compute the average binary F1-score over all categories (macro F1-score).
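
The two metrics can be sketched with NumPy as below, assuming labels and predictions are binary arrays of shape (n_samples, n_categories). The function names are illustrative, and a category with no true and no predicted positives is assigned an F1 of 0 here, which is a convention rather than a requirement of the task.

```python
import numpy as np


def per_category_f1(y_true, y_pred):
    """Binary F1 per column for arrays of shape (n_samples, n_categories)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = (y_true & y_pred).sum(axis=0)
    fp = (~y_true & y_pred).sum(axis=0)
    fn = (y_true & ~y_pred).sum(axis=0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); set to 0 where the denominator is 0.
    return np.divide(
        2 * tp, denom, out=np.zeros(denom.shape, dtype=float), where=denom > 0
    )


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-category binary F1 scores."""
    return float(per_category_f1(y_true, y_pred).mean())


# Toy example with 4 samples and 2 categories.
y_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y_pred = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
scores = per_category_f1(y_true, y_pred)
print(scores)                    # [0.5 1. ]
print(macro_f1(y_true, y_pred))  # 0.75
```

`sklearn.metrics.f1_score` with `average=None` and `average='macro'` should give the same numbers on multi-label indicator arrays and can be used instead of the hand-written version.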

# Task 4: Training and Evaluation

You are now tasked to train and evaluate **all** defined models.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Pick **at least** three seeds for robust estimation.
* Compute metrics on the validation set.
* Report **per-category** and **macro** F1-score for comparison.
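
A multi-seed evaluation loop could be organized as sketched below. Everything here is an assumption for illustration: `set_seed` and `summarize_over_seeds` are hypothetical helpers, and `placeholder_run` stands in for a real function that trains a model with the given seed and returns its per-category F1 on the validation set. The numbers it produces are random placeholders, not results.

```python
import random

import numpy as np


def set_seed(seed):
    """Seed the Python and NumPy generators (extend for the DL framework in use)."""
    random.seed(seed)
    np.random.seed(seed)


def summarize_over_seeds(run_fn, seeds):
    """Call run_fn(seed) per seed and aggregate per-category and macro scores."""
    scores = np.array([run_fn(seed) for seed in seeds])  # (n_seeds, n_categories)
    macro_per_seed = scores.mean(axis=1)
    return {
        "per_category_mean": scores.mean(axis=0),
        "per_category_std": scores.std(axis=0),
        "macro_mean": float(macro_per_seed.mean()),
        "macro_std": float(macro_per_seed.std()),
    }


def placeholder_run(seed):
    """PLACEHOLDER: replace with training + validation of a real model."""
    set_seed(seed)
    return np.random.uniform(0.4, 0.6, size=4)  # fake per-category scores


summary = summarize_over_seeds(placeholder_run, seeds=[0, 1, 2])
print(summary)
```

Reporting the standard deviation next to the mean makes it easier to judge whether a gap between two models is larger than the seed-to-seed variation. When training with TensorFlow, `tf.keras.utils.set_random_seed(seed)` can be called inside `set_seed` to seed the framework as well.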

# Task 5: Error Analysis

You are tasked to discuss your results.

### Instructions

* **Compare** classification performance of BERT-based models with respect to baselines.
* Discuss the **differences in predictions** between the best-performing BERT-based model and its variants.
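
As a starting point for the comparison between two models, the validation examples on which they disagree can be collected per category. The sketch below assumes binary prediction arrays of shape (n_samples, n_categories); `disagreement_report` is a hypothetical helper and the arrays are toy data.

```python
import numpy as np


def disagreement_report(y_true, pred_a, pred_b, categories):
    """Per category: where models A and B differ, and which one is right there."""
    report = {}
    for j, name in enumerate(categories):
        differ = pred_a[:, j] != pred_b[:, j]
        report[name] = {
            "n_disagreements": int(differ.sum()),
            "a_correct": int((differ & (pred_a[:, j] == y_true[:, j])).sum()),
            "b_correct": int((differ & (pred_b[:, j] == y_true[:, j])).sum()),
            "indices": np.flatnonzero(differ).tolist(),
        }
    return report


# Toy data: 3 samples, 2 categories (names are placeholders).
y_true = np.array([[1, 0], [0, 1], [1, 1]])
pred_a = np.array([[1, 0], [1, 1], [0, 1]])
pred_b = np.array([[1, 1], [0, 1], [0, 1]])
report = disagreement_report(y_true, pred_a, pred_b, ["cat_0", "cat_1"])
print(report)
```

The `indices` lists point back to rows of the validation dataframe, so the corresponding conclusions, premises, and stances can be read directly when discussing typical errors.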

# Task 6: Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.