# Assignment 2

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: Human Value Detection, Multi-label classification, Transformers, BERT

# Imports and libraries needed

In [2]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import seaborn as sns
import sklearn
import random
import os
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Task 1: Corpus

We address a multi-label classification problem and consider only the four level 3 categories:
- Openness to change
- Self-enhancement
- Conservation
- Self-transcendence

We merge the annotations of level 2 categories that belong to the same level 3 category. For example, the annotations of the level 2 categories "Stimulation" and "Hedonism" are merged into the level 3 category "Openness to change".
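As a minimal sketch of this merge on toy data (the three-row dataframe is made up for illustration, and only the two level 2 categories named above are shown): a level 3 label is set to 1 when at least one of its level 2 labels is 1.

```python
import pandas as pd

# Toy level 2 annotations (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    "Stimulation": [0, 1, 1],
    "Hedonism":    [0, 0, 1],
})

# Level 3 label: 1 if any of the merged level 2 labels is 1
toy["Openness to change"] = toy[["Stimulation", "Hedonism"]].max(axis=1)
print(toy)
```

Taking the row-wise maximum rather than the sum keeps the merged label binary even when several level 2 labels are active for the same argument.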

We load each split into a pandas dataframe, starting with the arguments:

In [3]:
arg_train = pd.read_csv('arguments/arguments-training.tsv', sep='\t')
print(arg_train.head(1))
print(f"Shape of the training data: {arg_train.shape}")
arg_val = pd.read_csv('arguments/arguments-validation.tsv', sep='\t')
print(f"Shape of the validation data: {arg_val.shape}")
arg_test = pd.read_csv('arguments/arguments-test.tsv', sep='\t')
print(f"Shape of the test data: {arg_test.shape}")

  Argument ID                   Conclusion       Stance  \
0      A01002  We should ban human cloning  in favor of   

                                             Premise  
0  we should ban human cloning as it will only ca...  
Shape of the training data: (5393, 4)
Shape of the validation data: (1896, 4)
Shape of the test data: (1576, 4)


In [4]:
lab_train = pd.read_csv('arguments/labels-training.tsv', sep='\t')
print(lab_train.head(1))
print(f"Shape of the training data: {lab_train.shape}")
lab_val = pd.read_csv('arguments/labels-validation.tsv', sep='\t')
print(f"Shape of the validation data: {lab_val.shape}")
lab_test = pd.read_csv('arguments/labels-test.tsv', sep='\t')
print(f"Shape of the test data: {lab_test.shape}")

  Argument ID  Self-direction: thought  Self-direction: action  Stimulation  \
0      A01002                        0                       0            0   

   Hedonism  Achievement  Power: dominance  Power: resources  Face  \
0         0            0                 0                 0     0   

   Security: personal  ...  Tradition  Conformity: rules  \
0                   0  ...          0                  0   

   Conformity: interpersonal  Humility  Benevolence: caring  \
0                          0         0                    0   

   Benevolence: dependability  Universalism: concern  Universalism: nature  \
0                           0                      0                     0   

   Universalism: tolerance  Universalism: objectivity  
0                        0                          0  

[1 rows x 21 columns]
Shape of the training data: (5393, 21)
Shape of the validation data: (1896, 21)
Shape of the test data: (1576, 21)


Now for each split we merge arguments and labels into a single dataframe.

In [5]:
df_train = arg_train.merge(lab_train, on='Argument ID')
df_val = arg_val.merge(lab_val, on='Argument ID')
df_test = arg_test.merge(lab_test, on='Argument ID')
print(f"Shape of the training data: {df_train.shape}")
print(f"Shape of the validation data: {df_val.shape}")
print(f"Shape of the test data: {df_test.shape}")
print(df_train.head(1))

Shape of the training data: (5393, 24)
Shape of the validation data: (1896, 24)
Shape of the test data: (1576, 24)
  Argument ID                   Conclusion       Stance  \
0      A01002  We should ban human cloning  in favor of   

                                             Premise  Self-direction: thought  \
0  we should ban human cloning as it will only ca...                        0   

   Self-direction: action  Stimulation  Hedonism  Achievement  \
0                       0            0         0            0   

   Power: dominance  ...  Tradition  Conformity: rules  \
0                 0  ...          0                  0   

   Conformity: interpersonal  Humility  Benevolence: caring  \
0                          0         0                    0   

   Benevolence: dependability  Universalism: concern  Universalism: nature  \
0                           0                      0                     0   

   Universalism: tolerance  Universalism: objectivity  
0              

We merge level 2 categories into level 3 categories.

In [6]:
# Merge the level 2 categories into level 3 categories.
# The 20 level 2 label columns start at index 4 (after Argument ID, Conclusion, Stance, Premise):
# Openness to change: 4 columns (4-7, Self-direction: thought ... Hedonism)
# Self-enhancement: 4 columns (8-11, Achievement ... Face)
# Conservation: 6 columns (12-17, Security: personal ... Humility)
# Self-transcendence: 6 columns (18-23, Benevolence: caring ... Universalism: objectivity)

def merge_categories(df):
    df = df.copy()  # leave the caller's dataframe untouched
    # A level 3 label is 1 if at least one of its level 2 labels is 1
    df['Openness to change'] = df[df.columns[4:8]].max(axis=1)
    df['Conservation'] = df[df.columns[12:18]].max(axis=1)
    df['Self-enhancement'] = df[df.columns[8:12]].max(axis=1)
    df['Self-transcendence'] = df[df.columns[18:24]].max(axis=1)
    # Drop the 20 original level 2 columns
    return df.drop(df.columns[4:24], axis=1)

# Apply the merge to every split
df_train = merge_categories(df_train)
df_val = merge_categories(df_val)
df_test = merge_categories(df_test)
print(df_train.head(1))
print(f"Shape of the training data: {df_train.shape}")
print(f"Shape of the validation data: {df_val.shape}")
print(f"Shape of the test data: {df_test.shape}")

  Argument ID                   Conclusion       Stance  \
0      A01002  We should ban human cloning  in favor of   

                                             Premise  Openness to change  \
0  we should ban human cloning as it will only ca...                   0   

   Conservation  Self-enhancement  Self-transcendence  
0             1                 0                   0  
Shape of the training data: (5393, 8)
Shape of the validation data: (1896, 8)
Shape of the test data: (1576, 8)


# Task 2: Model definition

You are tasked to define several neural models for multi-label classification.

### Instructions

* **Baseline**: implement a random uniform classifier (an individual classifier per category).
* **Baseline**: implement a majority classifier (an individual classifier per category).

<br/>

* **BERT w/ C**: define a BERT-based classifier that receives an argument **conclusion** as input.
* **BERT w/ CP**: add argument **premise** as an additional input.
* **BERT w/ CPS**: add argument premise-to-conclusion **stance** as an additional input.
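
One possible way to feed the three variants to BERT is to build a pair of text segments per argument. The sketch below is an assumption, not the assignment's prescribed encoding: `build_bert_inputs` is a hypothetical helper, and prepending the stance to the premise as plain text is only one option (the stance could instead be encoded as a separate numeric feature).

```python
def build_bert_inputs(conclusion, premise=None, stance=None, variant="C"):
    """Return the (first, second) text segments for one argument.

    Hypothetical helper for illustration:
      - "C":   conclusion only
      - "CP":  conclusion paired with the premise
      - "CPS": conclusion paired with "<stance> <premise>" (an assumed encoding)
    """
    if variant == "C":
        return conclusion, None
    if variant == "CP":
        return conclusion, premise
    if variant == "CPS":
        return conclusion, f"{stance} {premise}"
    raise ValueError(f"Unknown variant: {variant!r}")


# Example with the first training argument (premise shortened by hand)
segments = build_bert_inputs(
    "We should ban human cloning",
    premise="we should ban human cloning as it will only cause issues",
    stance="in favor of",
    variant="CPS",
)
print(segments)
```

The returned pair could then be passed as the first and second text arguments of a Hugging Face tokenizer, which for BERT models separates the two segments with a `[SEP]` token.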

### Implement a random uniform classifier

In [None]:
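# Random uniform classifier: each category gets its own independent prediction,
# drawn as 0 or 1 with equal probability. The input text is ignored.
label_cols = ['Openness to change', 'Conservation', 'Self-enhancement', 'Self-transcendence']

def random_uniform_predict(n_samples, seed=42):
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_samples, len(label_cols)))

y_val_pred_random = random_uniform_predict(len(df_val))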

### Implement a majority classifier

In [None]:
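# Majority classifier: each category gets its own classifier that always
# predicts the most frequent label (0 or 1) of that category in the training set.
label_cols = ['Openness to change', 'Conservation', 'Self-enhancement', 'Self-transcendence']
y_train = (df_train[label_cols] > 0).astype(int)  # binarize in case labels are counts

# Per-category majority label: 1 only if more than half of the training rows are positive
majority_labels = (y_train.mean(axis=0) > 0.5).astype(int).to_numpy()

def majority_predict(n_samples):
    # Repeat the per-category majority label for every sample
    return np.tile(majority_labels, (n_samples, 1))

y_val_pred_majority = majority_predict(len(df_val))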

# Task 3: Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using per-category binary F1-score.
* Compute the average binary F1-score over all categories (macro F1-score).
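
A minimal sketch of these two metrics, assuming predictions and ground truth are binary arrays with one column per category. The helper name `compute_f1`, the column order in `LABELS`, and the toy arrays are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score

LABELS = ["Openness to change", "Conservation", "Self-enhancement", "Self-transcendence"]

def compute_f1(y_true, y_pred, labels=LABELS):
    """Return per-category binary F1 scores and their unweighted mean (macro F1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_category = {
        # Binary F1 of the positive class (label 1) for column i
        name: f1_score(y_true[:, i], y_pred[:, i], average="binary", zero_division=0)
        for i, name in enumerate(labels)
    }
    macro = float(np.mean(list(per_category.values())))
    return per_category, macro


# Toy example (made-up labels, three arguments)
y_true = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1]])
per_category, macro = compute_f1(y_true, y_pred)
print(per_category)
print(f"Macro F1: {macro:.3f}")
```

Setting `zero_division=0` makes a category with no positive predictions and no positive labels count as F1 = 0 instead of raising a warning.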

# Task 4: Training and Evaluation

You are now tasked to train and evaluate **all** defined models.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Pick **at least** three seeds for robust estimation.
* Compute metrics on the validation set.
* Report **per-category** and **macro** F1-score for comparison.
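
The multi-seed protocol above can be sketched as follows for the random baseline. This is an illustration under assumptions: `evaluate_over_seeds` and `random_uniform` are hypothetical helpers, the seeds are arbitrary, and the validation labels are randomly generated stand-ins for the real ones. A BERT model would plug in through the same `predict_fn` interface after being trained once per seed.

```python
import numpy as np
from sklearn.metrics import f1_score

def evaluate_over_seeds(predict_fn, y_val, seeds=(0, 1, 2)):
    """Call predict_fn(seed, n_samples, n_labels) once per seed and aggregate F1."""
    y_val = np.asarray(y_val)
    scores = []
    for seed in seeds:
        y_pred = predict_fn(seed, *y_val.shape)
        # average=None returns one binary F1 score per category (column)
        scores.append(f1_score(y_val, y_pred, average=None, zero_division=0))
    scores = np.vstack(scores)          # shape: (n_seeds, n_labels)
    macro_per_seed = scores.mean(axis=1)
    return {
        "per_category_mean": scores.mean(axis=0),
        "per_category_std": scores.std(axis=0),
        "macro_mean": float(macro_per_seed.mean()),
        "macro_std": float(macro_per_seed.std()),
    }

def random_uniform(seed, n_samples, n_labels):
    """Random baseline: independent 0/1 draws for every sample and category."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_samples, n_labels))


# Stand-in validation labels (randomly generated, not the real dataset)
y_val = np.random.default_rng(123).integers(0, 2, size=(200, 4))
report = evaluate_over_seeds(random_uniform, y_val)
print(report)
```

Reporting the standard deviation next to the mean shows how much of the difference between two models could be explained by seed variance alone.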

# Task 5: Error Analysis

You are tasked to discuss your results.

### Instructions

* **Compare** classification performance of BERT-based models with respect to baselines.
* Discuss **difference in prediction** between the best performing BERT-based model and its variants.
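
One way to start the comparison between two models is to count, per category, where their predictions differ and which model is right in those cases. The sketch below assumes binary prediction arrays with one column per category; `prediction_differences` is a hypothetical helper and the toy arrays are made up.

```python
import numpy as np
import pandas as pd

def prediction_differences(y_true, pred_a, pred_b, labels):
    """Per category: number of disagreements and how often each model is the correct one."""
    y_true, pred_a, pred_b = (np.asarray(x) for x in (y_true, pred_a, pred_b))
    disagree = pred_a != pred_b
    return pd.DataFrame(
        {
            "disagreements": disagree.sum(axis=0),
            "only_a_correct": (disagree & (pred_a == y_true)).sum(axis=0),
            "only_b_correct": (disagree & (pred_b == y_true)).sum(axis=0),
        },
        index=labels,
    )


# Toy example with two categories and three arguments
labels = ["Openness to change", "Conservation"]
y_true = np.array([[1, 0], [0, 1], [1, 1]])
pred_a = np.array([[1, 0], [0, 0], [0, 1]])
pred_b = np.array([[0, 0], [0, 1], [1, 1]])
diff = prediction_differences(y_true, pred_a, pred_b, labels)
print(diff)
```

The rows where the models disagree are good candidates for manual inspection, for example by printing the corresponding conclusions and premises.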

# Task 6: Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.