<a href="https://colab.research.google.com/github/AfshinKhd/bt-feedbackBot/blob/master/Copy_of_Assignment2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 2

**Credits**: Federico Ruggeri, Eleonora Mancini, Paolo Torroni

**Keywords**: Human Value Detection, Multi-label classification, Transformers, BERT


# Contact

For any doubts, questions, or issues, contact us at the following email addresses:

Teaching Assistants:

* Federico Ruggeri -> federico.ruggeri6@unibo.it
* Eleonora Mancini -> e.mancini@unibo.it

Professor:

* Paolo Torroni -> p.torroni@unibo.it

# Introduction

You are tasked to address the [Human Value Detection challenge](https://aclanthology.org/2022.acl-long.306/).

## Problem definition

Arguments are paired with their conveyed human values.

Arguments are in the form of **premise** $\rightarrow$ **conclusion**.

### Example:

**Premise**: *"fast food should be banned because it is really bad for your health and is costly"*

**Conclusion**: *"We should ban fast food"*

**Stance**: *in favor of*

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/human_values.png?raw=1" alt="human values" />
</center>

# [Task 1 - 0.5 points] Corpus

Check the official page of the challenge [here](https://touche.webis.de/semeval23/touche23-web/).

The challenge offers several corpora for evaluation and testing.

You are going to work with the standard training, validation, and test splits.

#### Arguments
* arguments-training.tsv
* arguments-validation.tsv
* arguments-test.tsv

#### Human values
* labels-training.tsv
* labels-validation.tsv
* labels-test.tsv

### Example

#### arguments-*.tsv
```

Argument ID    A01005

Conclusion     We should ban fast food

Stance         in favor of

Premise        fast food should be banned because it is really bad for your health and is costly.
```

#### labels-*.tsv

```
Argument ID                A01005

Self-direction: thought    0
Self-direction: action     0
...
Universalism: objectivity  0
```

### Splits

The standard splits contain

   * **Train**: 5393 arguments
   * **Validation**: 1896 arguments
   * **Test**: 1576 arguments

### Annotations

In this assignment, you are tasked to address a multi-label classification problem.

You are going to consider **level 3** categories:

* Openness to change
* Self-enhancement
* Conservation
* Self-transcendence

**How to do that?**

You have to merge (**logical OR**) annotations of level 2 categories belonging to the same level 3 category.

**Pay attention to shared level 2 categories** (e.g., Hedonism). $\rightarrow$ [see Table 1 in the original paper.](https://aclanthology.org/2022.acl-long.306/)

#### Example

```
Self-direction: thought:    0
Self-direction: action:     1
Stimulation:                0
Hedonism:                   1

Openness to change          1
```
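
The merge rule can be checked on the example above with a minimal sketch. The level 2 to level 3 mapping mirrors the `value_categories` dictionary defined later in this notebook; the names `LEVEL3_TO_LEVEL2` and `merge_to_level3` are hypothetical and used only for illustration.

```python
# Level 3 category -> level 2 values that belong to it.
# Hedonism, Face and Humility each appear under two level 3 categories.
LEVEL3_TO_LEVEL2 = {
    "Openness to change": ["Self-direction: thought", "Self-direction: action",
                           "Stimulation", "Hedonism"],
    "Self-enhancement": ["Hedonism", "Achievement", "Power: dominance",
                         "Power: resources", "Face"],
    "Conservation": ["Face", "Security: personal", "Security: societal", "Tradition",
                     "Conformity: rules", "Conformity: interpersonal", "Humility"],
    "Self-transcendence": ["Humility", "Benevolence: caring",
                           "Benevolence: dependability", "Universalism: concern",
                           "Universalism: nature", "Universalism: tolerance",
                           "Universalism: objectivity"],
}


def merge_to_level3(level2_labels):
    """Logical OR over the level 2 labels of each level 3 category.

    Level 2 labels missing from the input are treated as 0.
    """
    return {
        category: int(any(level2_labels.get(value, 0) for value in values))
        for category, values in LEVEL3_TO_LEVEL2.items()
    }


# The level 2 annotations from the example above.
example = {"Self-direction: thought": 0, "Self-direction: action": 1,
           "Stimulation": 0, "Hedonism": 1}
level3 = merge_to_level3(example)
print(level3)
```

Because Hedonism is shared, the same argument is also positive for Self-enhancement, not only for Openness to change.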

### Instructions

* **Download** the specified training, validation, and test files.
* **Load** each split file into a `pandas.DataFrame` object.
* For each split, **merge** the arguments and labels dataframes into a single dataframe.
* **Merge** level 2 annotations to level 3 categories.

In [None]:
! pip install transformers



In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
drive_path = "/content/drive/MyDrive/NLP_2023/Assignment2/"

In [None]:
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score, classification_report
import joblib
import pandas as pd
from pathlib import Path
import warnings
warnings.simplefilter('ignore')
import numpy as np
from tqdm import tqdm
from sklearn import metrics
import transformers
import torch
from torch.utils.data import Dataset, DataLoader, RandomSampler, SequentialSampler
from transformers import DistilBertTokenizer, DistilBertModel
import logging
logging.basicConfig(level=logging.ERROR)

In [None]:
dataset_names = ['arguments-training','arguments-validation','arguments-test','labels-training','labels-validation','labels-test','annotations-level1']
dataframes = {}
for name in dataset_names:
    dataframes[name] = pd.read_table(drive_path+'data/'+name+'.tsv', delimiter='\t')

In [None]:
dataframes['arguments-training'].head(5)

Unnamed: 0,Argument ID,Conclusion,Stance,Premise
0,A01002,We should ban human cloning,in favor of,we should ban human cloning as it will only ca...
1,A01005,We should ban fast food,in favor of,fast food should be banned because it is reall...
2,A01006,We should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...
3,A01007,We should abolish capital punishment,against,capital punishment is sometimes the only optio...
4,A01008,We should ban factory farming,against,factory farming allows for the production of c...


In [None]:
df = dataframes['arguments-training']
df[df['Argument ID'] == 'A01006']

Unnamed: 0,Argument ID,Conclusion,Stance,Premise
2,A01006,We should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...


In [None]:
dataframes['arguments-training'].shape

(5393, 4)

In [None]:
dataframes['labels-training'].columns

Index(['Argument ID', 'Self-direction: thought', 'Self-direction: action',
       'Stimulation', 'Hedonism', 'Achievement', 'Power: dominance',
       'Power: resources', 'Face', 'Security: personal', 'Security: societal',
       'Tradition', 'Conformity: rules', 'Conformity: interpersonal',
       'Humility', 'Benevolence: caring', 'Benevolence: dependability',
       'Universalism: concern', 'Universalism: nature',
       'Universalism: tolerance', 'Universalism: objectivity'],
      dtype='object')

In [None]:
dataframes['labels-training'].shape

(5393, 21)

In [None]:
dataframes['labels-training'].head(5)

Unnamed: 0,Argument ID,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,Power: resources,Face,Security: personal,...,Tradition,Conformity: rules,Conformity: interpersonal,Humility,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity
0,A01002,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,A01005,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
2,A01006,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,A01007,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,1,0,0,0
4,A01008,0,0,0,0,0,0,0,0,1,...,0,0,0,0,1,0,1,0,0,0


In [None]:
folder = Path.cwd().joinpath("dataframes")
if not folder.exists():
    folder.mkdir(parents=True)

for name in dataset_names:
    df_path = Path.joinpath(folder, name+'.pkl')
    dataframes[name].to_pickle(df_path)

### Merge Arguments and Labels into a Single DataFrame

In [None]:
df_training = pd.merge(dataframes['arguments-training'], dataframes['labels-training'], on='Argument ID')
df_validation = pd.merge(dataframes['arguments-validation'], dataframes['labels-validation'], on='Argument ID')
df_test = pd.merge(dataframes['arguments-test'], dataframes['labels-test'], on='Argument ID')
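
A merge on `Argument ID` silently drops or duplicates rows if the two files disagree. The following sketch, on two toy frames rather than the real splits, shows one way to guard against that with the `how` and `validate` arguments of `pd.merge`. The toy frames and variable names are assumptions for illustration.

```python
import pandas as pd

# Two toy frames standing in for an arguments file and a labels file.
arguments = pd.DataFrame({
    "Argument ID": ["A01002", "A01005"],
    "Conclusion": ["We should ban human cloning", "We should ban fast food"],
})
labels = pd.DataFrame({
    "Argument ID": ["A01002", "A01005"],
    "Hedonism": [0, 0],
})

# validate="one_to_one" raises pandas.errors.MergeError if an ID is duplicated
# on either side; how="inner" keeps only IDs present in both frames.
merged = pd.merge(arguments, labels, on="Argument ID", how="inner",
                  validate="one_to_one")

# If every argument found its labels, the row count is unchanged.
rows_preserved = len(merged) == len(arguments)
print(merged.shape, rows_preserved)
```

The same two checks can be applied to each of the three splits before any further processing.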

### Add Value Categories to DataFrame

In [None]:
value_categories = {
    'self-transcendence':['Humility','Benevolence: caring', 'Benevolence: dependability',  'Universalism: concern', 'Universalism: nature','Universalism: tolerance', 'Universalism: objectivity'],
    'openness_to_change':['Self-direction: thought','Self-direction: action','Stimulation','Hedonism'],
    'self-enhancement':['Hedonism', 'Achievement', 'Power: dominance', 'Power: resources', 'Face'],
    'conservation':['Face','Security: personal', 'Security: societal','Tradition', 'Conformity: rules', 'Conformity: interpersonal', 'Humility']
}

for ins_df in [df_training, df_validation, df_test]:
    for category, values in value_categories.items():
       ins_df[category] = ins_df[values].any(axis=1)


In [None]:
df_training.head(5)

Unnamed: 0,Argument ID,Conclusion,Stance,Premise,Self-direction: thought,Self-direction: action,Stimulation,Hedonism,Achievement,Power: dominance,...,Benevolence: caring,Benevolence: dependability,Universalism: concern,Universalism: nature,Universalism: tolerance,Universalism: objectivity,self-transcendence,openness_to_change,self-enhancement,conservation
0,A01002,We should ban human cloning,in favor of,we should ban human cloning as it will only ca...,0,0,0,0,0,0,...,0,0,0,0,0,0,False,False,False,True
1,A01005,We should ban fast food,in favor of,fast food should be banned because it is reall...,0,0,0,0,0,0,...,0,0,0,0,0,0,False,False,False,True
2,A01006,We should end the use of economic sanctions,against,sometimes economic sanctions are the only thin...,0,0,0,0,0,1,...,0,0,0,0,0,0,False,False,True,True
3,A01007,We should abolish capital punishment,against,capital punishment is sometimes the only optio...,0,0,0,0,0,0,...,0,0,1,0,0,0,True,False,False,True
4,A01008,We should ban factory farming,against,factory farming allows for the production of c...,0,0,0,0,0,0,...,1,0,1,0,0,0,True,False,False,True


# [Task 2 - 2.0 points] Model definition

You are tasked to define several neural models for multi-label classification.

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/model_schema.png?raw=1" alt="model_schema" />
</center>

### Instructions

* **Baseline**: implement a random uniform classifier (an individual classifier per category).
* **Baseline**: implement a majority classifier (an individual classifier per category).

<br/>

* **BERT w/ C**: define a BERT-based classifier that receives an argument **conclusion** as input.
* **BERT w/ CP**: add argument **premise** as an additional input.
* **BERT w/ CPS**: add argument premise-to-conclusion **stance** as an additional input.

### Notes

**Do not mix models**. Each model has its own instructions.

You are **free** to select the BERT-based model card from huggingface.

#### Examples

```
bert-base-uncased
prajjwal1/bert-tiny
distilbert-base-uncased
roberta-base
```

### BERT w/ C

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/bert_c.png?raw=1" alt="BERT w/ C" />
</center>

### BERT w/ CP

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/bert_cp.png?raw=1" alt="BERT w/ CP" />
</center>

### BERT w/ CPS

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/bert_cps.png?raw=1" alt="BERT w/ CPS" />
</center>

### Input concatenation

<center>
    <img src="https://github.com/AfshinKhd/nlp-multilabeltextclassification/blob/main/images/input_merging.png?raw=1" alt="Input merging" />
</center>

### Notes

The **stance** input has to be encoded into a numerical format.

You **should** use the same model instance to encode **premise** and **conclusion** inputs.
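
The two notes above can be sketched as follows. This is only an illustration under stated assumptions: `ToyEncoder` is a hypothetical stand-in for a BERT encoder so the example runs without downloading weights, the stance mapping `STANCE_TO_ID` is one possible numerical encoding, and concatenation is assumed as the merge operation.

```python
import torch
from torch import nn

# Assumed numeric encoding of the two stance strings found in the dataset.
STANCE_TO_ID = {"against": 0.0, "in favor of": 1.0}


class ToyEncoder(nn.Module):
    """Stand-in for a BERT encoder: token ids -> one vector per sequence."""

    def __init__(self, vocab_size=100, hidden_size=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        # Mean over the token dimension gives a fixed-size sequence vector.
        return self.embedding(input_ids).mean(dim=1)


class SharedEncoderClassifier(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels=4):
        super().__init__()
        self.encoder = encoder  # a single instance, reused for both text inputs
        # Input size: conclusion vector + premise vector + 1 stance feature.
        self.classifier = nn.Linear(2 * hidden_size + 1, num_labels)

    def forward(self, conclusion_ids, premise_ids, stance):
        conclusion_vec = self.encoder(conclusion_ids)
        premise_vec = self.encoder(premise_ids)
        features = torch.cat(
            [conclusion_vec, premise_vec, stance.unsqueeze(-1)], dim=-1
        )
        return self.classifier(features)  # raw logits, one per level 3 category


model_sketch = SharedEncoderClassifier(ToyEncoder(), hidden_size=8)
conclusion_ids = torch.randint(0, 100, (2, 5))   # batch of 2, 5 tokens each
premise_ids = torch.randint(0, 100, (2, 12))     # batch of 2, 12 tokens each
stance = torch.tensor([STANCE_TO_ID["in favor of"], STANCE_TO_ID["against"]])
logits = model_sketch(conclusion_ids, premise_ids, stance)
print(logits.shape)
```

Reusing one encoder instance means both inputs update the same weights during fine-tuning, and the classifier head is the only part whose input size changes between the C, CP, and CPS variants.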

### Baseline Models

In [None]:
random_state = 42
baseline_trained_models = {}
baseline_metrics = {}
df_attributes = ['Argument ID', 'Conclusion', 'Stance',
       'Self-direction: thought', 'Self-direction: action', 'Stimulation',
       'Hedonism', 'Achievement', 'Power: dominance', 'Power: resources',
       'Face', 'Security: personal', 'Security: societal', 'Tradition',
       'Conformity: rules', 'Conformity: interpersonal', 'Humility',
       'Benevolence: caring', 'Benevolence: dependability',
       'Universalism: concern', 'Universalism: nature',
       'Universalism: tolerance', 'Universalism: objectivity']

In [None]:


for category in value_categories.keys():
    print(f"\nCategory: {category}")

    X_train = df_training.drop(columns=df_attributes + list(value_categories.keys()))
    y_train = df_training[category]

    X_test = df_test.drop(columns=df_attributes + list(value_categories.keys()))
    y_test = df_test[category]

    # Random Uniform Classifier
    random_uniform_clf = DummyClassifier(strategy='uniform', random_state=random_state)
    random_uniform_clf.fit(X_train, y_train)
    baseline_trained_models[f'random_uniform_{category}_clf'] = random_uniform_clf


    # Related to Task 3
    random_uniform_pred = random_uniform_clf.predict(X_test)
    random_uniform_f1_scores = f1_score(y_test, random_uniform_pred, average=None)
    random_uniform_average_f1_score = f1_score(y_test, random_uniform_pred, average='macro')
    random_uniform_report = classification_report(y_test, random_uniform_pred)
    baseline_metrics[f'random_uniform_{category}_f1_score'] = random_uniform_f1_scores
    baseline_metrics[f'random_uniform_{category}_macro'] = random_uniform_average_f1_score
    baseline_metrics[f'random_uniform_{category}_report'] = random_uniform_report

    # Majority Classifier
    majority_clf = DummyClassifier(strategy='most_frequent')
    majority_clf.fit(X_train, y_train)
    baseline_trained_models[f'majority_{category}_clf'] = majority_clf


    # Related to Task 3
    majority_pred = majority_clf.predict(X_test)
    majority_f1_scores = f1_score(y_test, majority_pred, average=None)
    majority_average_f1_score = f1_score(y_test, majority_pred, average='macro')
    majority_report = classification_report(y_test, majority_pred)
    baseline_metrics[f'majority_{category}_f1_score'] = majority_f1_scores
    baseline_metrics[f'majority_{category}_macro'] = majority_average_f1_score
    baseline_metrics[f'majority_{category}_report'] = majority_report



Category: self-transcendence

Category: openness_to_change

Category: self-enhancement

Category: conservation


In [None]:
folder = Path.cwd().joinpath("trained_models/baseline")
if not folder.exists():
    folder.mkdir(parents=True)

for model_name, model in baseline_trained_models.items():
    model_filename = f"trained_models/baseline/{model_name}.joblib"
    joblib.dump(model, model_filename)
    print(f"Saved {model_name} to {model_filename}")

Saved random_uniform_self-transcendence_clf to trained_models/baseline/random_uniform_self-transcendence_clf.joblib
Saved majority_self-transcendence_clf to trained_models/baseline/majority_self-transcendence_clf.joblib
Saved random_uniform_openness_to_change_clf to trained_models/baseline/random_uniform_openness_to_change_clf.joblib
Saved majority_openness_to_change_clf to trained_models/baseline/majority_openness_to_change_clf.joblib
Saved random_uniform_self-enhancement_clf to trained_models/baseline/random_uniform_self-enhancement_clf.joblib
Saved majority_self-enhancement_clf to trained_models/baseline/majority_self-enhancement_clf.joblib
Saved random_uniform_conservation_clf to trained_models/baseline/random_uniform_conservation_clf.joblib
Saved majority_conservation_clf to trained_models/baseline/majority_conservation_clf.joblib


### BERT Models

In [None]:
from torch import cuda
device = 'cuda' if cuda.is_available() else 'cpu'
print(f"the device is: {device}")

the device is: cuda


In [None]:
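# Select the level 3 columns by name so the cell can be re-run safely
# (positional slicing would pick up the new 'labels' column on a second run).
label_columns = list(value_categories.keys())
df_training['labels'] = df_training[label_columns].astype(int).values.tolist()
df_validation['labels'] = df_validation[label_columns].astype(int).values.tolist()
df_test['labels'] = df_test[label_columns].astype(int).values.tolist()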

In [None]:
MAX_LEN = 512  # maximum sequence length supported by distilbert-base-uncased
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 4
EPOCHS = 3
LEARNING_RATE = 1e-05
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased', truncation=True, do_lower_case=True)
pretrained_model = DistilBertModel.from_pretrained('distilbert-base-uncased')

In [None]:
premise_encodings = tokenizer(df_training['Premise'].tolist(), padding=True, truncation=True, return_tensors='pt')
conclusion_encodings = tokenizer(df_training['Conclusion'].tolist(), padding=True, truncation=True, return_tensors='pt')


In [None]:
# padding='max_length' replaces the deprecated pad_to_max_length=True
t = tokenizer.encode_plus(df_training['Premise'][1], padding='max_length', truncation=True, max_length=512, return_tensors='pt')
t2 = tokenizer.encode_plus(df_training['Conclusion'][1], padding='max_length', truncation=True, max_length=512, return_tensors='pt')
t3 = tokenizer.encode_plus(text = df_training['Stance'][1], text_pair = None, add_special_tokens=True, truncation=True, max_length=512, padding='max_length', return_token_type_ids=True)

In [None]:
t['input_ids'].shape

torch.Size([1, 512])

In [None]:
t3

{'input_ids': [101, 1999, 5684, 1997, 102, 0, 0, 0, ... (output truncated)

In [None]:
premise_outputs = pretrained_model(**t)
conclusion_output = pretrained_model(**t2)

In [None]:
premise_outputs

BaseModelOutput(last_hidden_state=tensor([[[ 0.0885, -0.0154, -0.2526,  ..., -0.2206,  0.0867,  0.3204],
         [ 0.1098, -0.5050, -0.2092,  ...,  0.0160,  0.5195, -0.2064],
         [ 0.3857, -0.0391, -0.2536,  ..., -0.1056, -0.2069, -0.3072],
         ...,
         [ 0.6670,  0.0197,  0.0391,  ..., -0.2082, -0.1712, -0.0287],
         [ 0.6506,  0.2512, -0.5266,  ..., -0.0068, -0.3886, -0.3310],
         [ 0.4642,  0.2663, -0.0877,  ..., -0.0735, -0.4252, -0.0887]]],
       grad_fn=<NativeLayerNormBackward0>), hidden_states=None, attentions=None)

In [None]:
combined_embeddings = conclusion_output.last_hidden_state + premise_outputs.last_hidden_state

In [None]:
combined_embeddings

tensor([[[ 0.1902, -0.0190, -0.4627,  ..., -0.3381,  0.4250,  0.6001],
         [ 0.5586, -0.6338, -0.6133,  ..., -0.2325,  1.2600, -0.5817],
         [ 0.5724, -0.3763, -0.7557,  ..., -0.3867,  0.0383, -0.4017],
         ...,
         [ 1.1742,  0.0572, -0.1299,  ...,  0.2879, -0.2840, -0.5801],
         [ 1.0857,  0.0216, -0.1040,  ...,  0.2442, -0.2521, -0.4110],
         [ 1.0568, -0.1387, -0.1139,  ...,  0.1906, -0.3092, -0.6281]]],
       grad_fn=<AddBackward0>)

In [None]:
combined_embeddings2 = torch.cat((conclusion_output.last_hidden_state, premise_outputs.last_hidden_state), dim=1)

In [None]:
combined_embeddings2

tensor([[[ 0.1018, -0.0036, -0.2101,  ..., -0.1174,  0.3383,  0.2798],
         [ 0.4488, -0.1288, -0.4041,  ..., -0.2485,  0.7405, -0.3753],
         [ 0.1867, -0.3372, -0.5021,  ..., -0.2810,  0.2452, -0.0945],
         ...,
         [ 0.5442,  0.1008, -0.0650,  ...,  0.1058, -0.2602, -0.2323],
         [ 0.5157,  0.0301, -0.0233,  ...,  0.0961, -0.2903, -0.0812],
         [ 0.4344, -0.0562, -0.0499,  ..., -0.0190, -0.2231, -0.1775]]],
       grad_fn=<CatBackward0>)

In [None]:
class MultiLabelDataset(Dataset):
    """Tokenizes arguments for the three input variants: 'w_c', 'w_cp', 'w_cps'."""

    def __init__(self, dataframe, tokenizer, max_len, bert_type = "w_c"):
        if bert_type not in ("w_c", "w_cp", "w_cps"):
            raise ValueError(f"Bert type is not recognized: {bert_type}")
        self.tokenizer = tokenizer
        self.data = dataframe.reset_index(drop=True)
        self.conclusion = self.data.Conclusion
        self.premise = self.data.Premise
        self.stance = self.data.Stance
        self.targets = self.data.labels
        self.max_len = max_len
        self.bert_type = bert_type

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Normalize whitespace in both text fields.
        conclusion = " ".join(str(self.conclusion[index]).split())
        premise = " ".join(str(self.premise[index]).split())

        # 'w_c' encodes the conclusion alone; 'w_cp' and 'w_cps' encode the
        # (conclusion, premise) pair as a single sequence.
        inputs = self.tokenizer.encode_plus(
            text = conclusion,
            text_pair = None if self.bert_type == "w_c" else premise,
            add_special_tokens=True,
            max_length=self.max_len,
            padding='max_length',
            truncation=True,
            return_token_type_ids=True
        )

        # Stance as a number: 1 for "in favor of", 0 for "against".
        # It is returned for every variant; only a 'w_cps' model should use it,
        # and the classifier has to be extended to consume it.
        stance = 1.0 if str(self.stance[index]).strip().lower() == "in favor of" else 0.0

        return {
            'ids': torch.tensor(inputs['input_ids'], dtype=torch.long),
            'mask': torch.tensor(inputs['attention_mask'], dtype=torch.long),
            'token_type_ids': torch.tensor(inputs['token_type_ids'], dtype=torch.long),
            'stance': torch.tensor(stance, dtype=torch.float),
            'targets': torch.tensor(self.targets[index], dtype=torch.float)
        }

In [None]:
bert_datasets = {}

for bert_type in ['w_c', 'w_cp', 'w_cps']:
    bert_datasets[f'training_set_{bert_type}'] =  MultiLabelDataset(df_training, tokenizer, MAX_LEN, bert_type)
    bert_datasets[f'validation_set_{bert_type}'] =  MultiLabelDataset(df_validation, tokenizer, MAX_LEN, bert_type)
    bert_datasets[f'testing_set_{bert_type}'] =  MultiLabelDataset(df_test, tokenizer, MAX_LEN, bert_type)


In [None]:
train_params = {'batch_size': TRAIN_BATCH_SIZE,
                'shuffle': True,
                'num_workers': 0
                }

test_params = {'batch_size': VALID_BATCH_SIZE,
                'shuffle': False,  # evaluation does not need shuffling
                'num_workers': 0
                }

bert_loader = {}

for bert_type in ['w_c', 'w_cp', 'w_cps']:
    bert_loader[f'training_loader_{bert_type}'] = DataLoader(bert_datasets[f'training_set_{bert_type}'], **train_params)
    bert_loader[f'validation_loader_{bert_type}'] = DataLoader(bert_datasets[f'validation_set_{bert_type}'], **test_params)
    bert_loader[f'test_loader_{bert_type}'] = DataLoader(bert_datasets[f'testing_set_{bert_type}'], **test_params)

In [None]:
class DistilBERTClass(torch.nn.Module):
    def __init__(self):
        super(DistilBERTClass, self).__init__()
        self.l1 = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.pre_classifier = torch.nn.Linear(768, 768)
        self.dropout = torch.nn.Dropout(0.1)
        self.classifier = torch.nn.Linear(768, 4)

    def forward(self, input_ids, attention_mask, token_type_ids):
        output_1 = self.l1(input_ids=input_ids, attention_mask=attention_mask)
        hidden_state = output_1[0]
        pooler = hidden_state[:, 0]
        pooler = self.pre_classifier(pooler)
        pooler = torch.nn.Tanh()(pooler)
        pooler = self.dropout(pooler)
        output = self.classifier(pooler)
        return output

model = DistilBERTClass()
model.to(device)

DistilBERTClass(
  (l1): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in

In [None]:
def loss_fn(outputs, targets):
    return torch.nn.BCEWithLogitsLoss()(outputs, targets)

In [None]:
optimizer = torch.optim.Adam(params =  model.parameters(), lr=LEARNING_RATE)

In [None]:
output_dir = Path('./models')
output_dir.mkdir(parents=True, exist_ok=True)  # torch.save fails if the folder does not exist
output_model_file = output_dir / 'pytorch_distilbert_news.bin'
output_vocab_file = output_dir / 'vocab_distilbert_news.bin'

torch.save(model, output_model_file)
tokenizer.save_vocabulary(str(output_vocab_file))

print('Saved')

# [Task 3 - 0.5 points] Metrics

Before training the models, you are tasked to define the evaluation metrics for comparison.

### Instructions

* Evaluate your models using per-category binary F1-score.
* Compute the average binary F1-score over all categories (macro F1-score).

### Example

You start with individual predictions ($\rightarrow$ samples).

```
Openness to change:   0 0 1 0 1 1 0 ...
Self-enhancement:     1 0 0 0 1 0 1 ...
Conservation:         0 0 0 1 1 0 1 ...
Self-transcendence:   1 1 0 1 0 1 0 ...
```

You compute per-category binary F1-score.

```
Openness to change F1:   0.35
Self-enhancement F1:     0.55
Conservation F1:         0.80
Self-transcendence F1:   0.21
```

You then average per-category scores.
```
Average F1: ~0.48
```
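
The two steps above can be sketched with scikit-learn's `f1_score` on multi-label indicator arrays. The arrays below are toy values for illustration only, not results from any model.

```python
import numpy as np
from sklearn.metrics import f1_score

categories = ["Openness to change", "Self-enhancement",
              "Conservation", "Self-transcendence"]

# Rows are arguments, columns follow `categories`. Toy values only.
y_true = np.array([[1, 0, 1, 1],
                   [0, 0, 1, 1],
                   [1, 1, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 1, 1]])
y_pred = np.array([[1, 0, 1, 1],
                   [0, 1, 1, 1],
                   [0, 1, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]])

# With multi-label indicator arrays, average=None yields one binary F1 per column.
per_category_f1 = f1_score(y_true, y_pred, average=None)
# average="macro" is the unweighted mean of those per-column scores.
macro_f1 = f1_score(y_true, y_pred, average="macro")

for name, score in zip(categories, per_category_f1):
    print(f"{name} F1: {score:.2f}")
print(f"Average F1: {macro_f1:.2f}")
```

Note that calling `f1_score(..., average="macro")` on the 1-D labels of a single category computes something different: the mean of the F1 of the positive class and the F1 of the negative class. The metric requested here is the binary F1 of the positive class per category, averaged over the four categories.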

# [Task 4 - 1.0 points] Training and Evaluation

You are now tasked to train and evaluate **all** defined models.

### Instructions

* Train **all** models on the train set.
* Evaluate **all** models on the validation set.
* Pick **at least** three seeds for robust estimation.
* Compute metrics on the validation set.
* Report **per-category** and **macro** F1-score for comparison.
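
One way to handle the multi-seed requirement is sketched below: seed every random number generator before each run, then report the mean and standard deviation of each metric across seeds. The helper names `set_seed` and `summarize` are hypothetical, and the scores are placeholders, not results from this notebook.

```python
import random
import statistics

import numpy as np
import torch


def set_seed(seed):
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable


def summarize(scores):
    """Mean and sample standard deviation of one metric across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


# Same seed, same random draw.
set_seed(42)
first = torch.rand(3)
set_seed(42)
second = torch.rand(3)
print(torch.equal(first, second))

# Placeholder macro F1 values, one per seed; not results from this notebook.
placeholder_macro_f1 = [0.70, 0.72, 0.71]
mean, std = summarize(placeholder_macro_f1)
print(f"macro F1: {mean:.3f} +/- {std:.3f}")
```

Seeding alone does not guarantee bit-identical runs on a GPU, since some CUDA operations are non-deterministic, which is one more reason to report a spread rather than a single number.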

### Train Models

In [None]:
bert_trained_models = {}
bert_metrics = {}

In [None]:
def train(model, training_loader, epochs:int):
    # Note: relies on the module-level `optimizer` and `loss_fn`.
    model.train()

    for epoch in range(epochs):
        total_loss = 0  # reset so the reported average covers this epoch only
        for _,data in tqdm(enumerate(training_loader, 0), total=len(training_loader), desc=f"Epoch {epoch + 1}/{epochs}"):
            ids = data['ids'].to(device, dtype = torch.long)
            mask = data['mask'].to(device, dtype = torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
            targets = data['targets'].to(device, dtype = torch.float)

            outputs = model(ids, mask, token_type_ids)

            optimizer.zero_grad()
            loss = loss_fn(outputs, targets)
            total_loss += loss.item()

            loss.backward()
            optimizer.step()

        average_loss = total_loss / len(training_loader)
        print(f"Epoch {epoch + 1}/{epochs}, Average Loss: {average_loss}")

In [None]:
def validation(model, testing_loader):
    model.eval()
    fin_targets=[]
    fin_outputs=[]

    with torch.no_grad():
        for _, data in tqdm(enumerate(testing_loader, 0)):
            ids = data['ids'].to(device, dtype = torch.long)
            mask = data['mask'].to(device, dtype = torch.long)
            token_type_ids = data['token_type_ids'].to(device, dtype = torch.long)
            targets = data['targets'].to(device, dtype = torch.float)
            outputs = model(ids, mask, token_type_ids)
            fin_targets.extend(targets.cpu().detach().numpy().tolist())
            fin_outputs.extend(torch.sigmoid(outputs).cpu().detach().numpy().tolist())

    return fin_outputs, fin_targets

In [None]:
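def fine_tunning(model, training_loader, val_loader, epochs:int, bert_type):
    print(f'Bert {bert_type} ...')

    train(model, training_loader, epochs)

    outputs, targets = validation(model, val_loader)
    # Multi-label task: threshold each sigmoid output independently.
    # argmax would force exactly one category per argument.
    final_outputs = (np.array(outputs) >= 0.5).astype(int)
    targets = np.array(targets).astype(int)

    # Per-category binary F1 and their mean (macro F1), as defined in Task 3.
    per_category_f1 = f1_score(targets, final_outputs, average=None)
    macro_f1 = f1_score(targets, final_outputs, average='macro')
    return per_category_f1, macro_f1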
In [None]:
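for bert_type in ['w_c', 'w_cp', 'w_cps']:
    model = DistilBERTClass().to(device)  # fresh weights per variant, so models are not mixed
    optimizer = torch.optim.Adam(params=model.parameters(), lr=LEARNING_RATE)
    bert_metrics[bert_type] = fine_tunning(model, bert_loader[f'training_loader_{bert_type}'], bert_loader[f'validation_loader_{bert_type}'], EPOCHS, bert_type)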

# [Task 5 - 1.0 points] Error Analysis

You are tasked to discuss your results.

### Instructions

* **Compare** classification performance of BERT-based models with respect to baselines.
* Discuss **difference in prediction** between the best performing BERT-based model and its variants.

### Notes

You can check the [original paper](https://aclanthology.org/2022.acl-long.306/) for suggestions on how to perform comparisons (e.g., plots, tables, etc...).

# [Task 6 - 1.0 points] Report

Wrap up your experiment in a short report (up to 2 pages).

### Instructions

* Use the NLP course report template.
* Summarize each task in the report following the provided template.

### Recommendations

The report is not a copy-paste of graphs, tables, and command outputs.

* Summarize classification performance in Table format.
* **Do not** report command outputs or screenshots.
* Report learning curves in Figure format.
* The error analysis section should summarize your findings.

# Submission

* **Submit** your report in PDF format.
* **Submit** your python notebook.
* Make sure your notebook is **well organized**, with no temporary code, commented sections, tests, etc...
* You can upload **model weights** in a cloud repository and report the link in the report.

# FAQ

Please check these frequently asked questions before contacting us.

### Model card

You are **free** to choose the BERT-based model card you like from huggingface.

### Model architecture

You **should not** change the architecture of a model (i.e., its layers).

However, you are **free** to play with their hyper-parameters.

### Model Training

You are **free** to choose training hyper-parameters for BERT-based models (e.g., number of epochs, etc...).

### Neural Libraries

You are **free** to use any library of your choice to address the assignment (e.g., Keras, Tensorflow, PyTorch, JAX, etc...)

### Error Analysis

Some topics for discussion include:
   * Model performance on the most and least frequent classes.
   * Precision/Recall curves.
   * Confusion matrices.
   * Specific misclassified samples.
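
Two of the topics above, confusion matrices and misclassified samples, can be sketched with scikit-learn's `multilabel_confusion_matrix`. The arrays are toy values for two categories and are not model outputs.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

categories = ["Openness to change", "Self-enhancement"]

# Rows are arguments, columns follow `categories`. Toy values only.
y_true = np.array([[1, 0], [0, 1], [1, 1]])
y_pred = np.array([[1, 0], [1, 1], [0, 1]])

# One 2x2 matrix per category, laid out as [[TN, FP], [FN, TP]].
matrices = multilabel_confusion_matrix(y_true, y_pred)
for name, matrix in zip(categories, matrices):
    tn, fp, fn, tp = matrix.ravel()
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")

# Row indices of arguments with at least one wrong category.
misclassified = np.where((y_true != y_pred).any(axis=1))[0]
print(misclassified.tolist())
```

The row indices can then be used to look up the corresponding premise and conclusion in the validation DataFrame and inspect them by hand.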

# The End