### Baselines:
- BERT-based classifier trained on the data
- Some form of Siamese network

### Ideas:
- ...

In [4]:
import json
import pandas as pd
import numpy as np
import sys

pd.set_option('display.max_colwidth', None)
sys.path.append('./src-py')

In [10]:
import sbert_training
from utils import *

In [7]:
from datasets import load_dataset, load_metric, Dataset, Split
from transformers import AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq, Seq2SeqTrainingArguments, Seq2SeqTrainer, AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline, DebertaForSequenceClassification
from transformers import TrainingArguments, Trainer
import wandb
import torch
from tqdm import tqdm

In [8]:
output_path = "../../data-ceph/arguana/argmining22-sharedtask/models/"

In [5]:
taska_training_df = pd.read_csv('../data/TaskA_train.csv')
taska_valid_df = pd.read_csv('../data/TaskA_dev.csv')

In [6]:
taska_training_df = taska_training_df[taska_training_df.Novelty != 0]
taska_valid_df = taska_valid_df[taska_valid_df.Novelty != 0]

In [7]:
taska_training_df['input_txt'] = taska_training_df.apply(lambda x: '<s> {}:{} </s></s> {} </s>'.format(x['topic'], x['Premise'], x['Conclusion']), axis=1)
taska_valid_df['input_txt'] = taska_valid_df.apply(lambda x: '<s> {}:{} </s></s> {} </s>'.format(x['topic'], x['Premise'], x['Conclusion']), axis=1)
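The same template is written twice above, once per DataFrame. Factoring it into a helper keeps the two splits from drifting apart. This is a minimal sketch, and `build_input_txt` is a hypothetical name. Note that `<s>` and `</s>` are RoBERTa's special tokens written out by hand, and the tokenizer calls below pass `add_special_tokens=False`. DeBERTa's tokenizer uses `[CLS]`/`[SEP]` instead, so for the NLI model these markers are most likely encoded as ordinary text.

```python
def build_input_txt(topic: str, premise: str, conclusion: str) -> str:
    """Format one (topic, premise, conclusion) triple with hand-written RoBERTa-style markers.

    Hypothetical helper mirroring the lambda applied to both DataFrames.
    """
    return "<s> {}:{} </s></s> {} </s>".format(topic, premise, conclusion)


print(build_input_txt("topic", "premise", "conclusion"))  # <s> topic:premise </s></s> conclusion </s>
```

With pandas, this would be `df.apply(lambda x: build_input_txt(x['topic'], x['Premise'], x['Conclusion']), axis=1)`.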

In [13]:
val_nov_metric(np.array([0.5, 0.5]), np.array([1,0]), np.array([0,0]), np.array([0,0]))

{'f1_validity': 0.6666666666666666,
 'f1_novelty': 0.0,
 'f1_valid_novel': 0.0,
 'f1_valid_nonnovel': 0.6666666666666666,
 'f1_nonvalid_novel': 0.0,
 'f1_nonvalid_nonnovel': 0.0,
 'f1_macro': 0.16666666666666666}
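In the output above, `f1_macro` is the mean of the four joint `f1_*` scores: (0.667 + 0 + 0 + 0) / 4 ≈ 0.167. Below is a minimal stdlib sketch of a macro F1 over the four (validity, novelty) combinations, assuming both labels take values in {1, -1}. It is an illustration consistent with the printed numbers, not the source of `utils.val_nov_metric`, which is not shown here.

```python
from typing import Sequence


def binary_f1(y_true: Sequence, y_pred: Sequence, positive) -> float:
    """F1 for one class treated as positive; 0.0 when the class never occurs in truth or prediction."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def joint_macro_f1(val_true, nov_true, val_pred, nov_pred) -> float:
    """Macro-average of per-class F1 over the four (validity, novelty) combinations."""
    true_pairs = list(zip(val_true, nov_true))
    pred_pairs = list(zip(val_pred, nov_pred))
    combos = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
    scores = [binary_f1(true_pairs, pred_pairs, c) for c in combos]
    return sum(scores) / len(scores)


# One class with F1 2/3 and three classes at 0 gives the same 1/6 pattern as above
print(joint_macro_f1([1, 1], [1, -1], [1, 1], [1, 1]))  # 0.16666666666666666
```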

## Fine-tune the NLI model on the training data:

In [8]:
nli_tokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-base-mnli')
nli_model     = AutoModelForSequenceClassification.from_pretrained('microsoft/deberta-base-mnli')
arg_stance_pipeline = TextClassificationPipeline(model=nli_model, tokenizer=nli_tokenizer, framework='pt', task='validity_classifier', device=0)

Some weights of the model checkpoint at microsoft/deberta-base-mnli were not used when initializing DebertaForSequenceClassification: ['config']
- This IS expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [9]:
train_dataset = Dataset.from_pandas(taska_training_df)
eval_dataset = Dataset.from_pandas(taska_valid_df)

In [10]:
nli_model.config.id2label

{0: 'CONTRADICTION', 1: 'NEUTRAL', 2: 'ENTAILMENT'}

In [11]:
novelty_map = dict([
    (1, 1),  # novel     -> NEUTRAL (1)
    (-1, 2)  # not novel -> ENTAILMENT (2)
])

In [12]:
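inverse_novelty_map = dict([
    (2, -1),  # ENTAILMENT    -> not novel
    (1, 1),   # NEUTRAL       -> novel
    (0, 1)    # CONTRADICTION -> novel (never a training target; folded into "novel")
])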
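ENTAILMENT is the only class mapped back to "not novel", so CONTRADICTION predictions are folded into "novel" at decoding time. Below is a sketch of decoding raw 3-way NLI logits with the inverse map. `decode_novelty` is a hypothetical helper, and the logits are toy values.

```python
from typing import Dict, List, Sequence

# Copied from the cell above: NLI class index -> novelty label
INVERSE_NOVELTY_MAP: Dict[int, int] = {2: -1, 1: 1, 0: 1}


def decode_novelty(logits: Sequence[Sequence[float]]) -> List[int]:
    """Take the argmax NLI class of each row and map it back to a novelty label (1 or -1)."""
    preds = []
    for row in logits:
        best = max(range(len(row)), key=lambda i: row[i])
        preds.append(INVERSE_NOVELTY_MAP[best])
    return preds


print(decode_novelty([[0.1, 0.2, 3.0], [2.0, 0.5, 0.1], [0.0, 1.5, 0.2]]))  # [-1, 1, 1]
```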

In [13]:
def preprocess(example):
    # with padding=True, max_length only takes effect when truncation=True is also set
    inputs = nli_tokenizer(example["input_txt"], add_special_tokens=False, padding=True, truncation=True, max_length=512)
    inputs['labels'] = list(map(novelty_map.get, example['Novelty']))
    return inputs
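`novelty_map.get` returns `None` for any label missing from the map, and that problem would only surface later as a collation or loss error. It cannot happen here because rows with `Novelty == 0` were dropped earlier, but a stricter mapping costs little. The sketch below uses a hypothetical helper that is not part of the original `utils`:

```python
from typing import Dict, List, Sequence

NOVELTY_MAP: Dict[int, int] = {1: 1, -1: 2}  # novel -> NEUTRAL, not novel -> ENTAILMENT


def map_labels(labels: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Map dataset labels to model class indices, failing loudly on unknown labels.

    Unlike list(map(mapping.get, labels)), which yields None for an unmapped
    label such as 0, this raises a KeyError naming the offending values.
    """
    missing = sorted({lab for lab in labels if lab not in mapping})
    if missing:
        raise KeyError(f"unmapped labels: {missing}")
    return [mapping[lab] for lab in labels]


print(map_labels([1, -1, 1], NOVELTY_MAP))  # [1, 2, 1]
```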

In [14]:
train_dataset = train_dataset.map(preprocess, batched=True)
eval_dataset = eval_dataset.map(preprocess, batched=True)




In [15]:
training_args = TrainingArguments(
    output_dir=output_path + "/task-A/novelty/classification/nli-model",
    report_to="wandb",
    overwrite_output_dir=True,
    metric_for_best_model='f1',             # the 'eval_' prefix is added automatically
    evaluation_strategy='steps',            # evaluate every eval_steps steps (not once per epoch)
    learning_rate=5e-5,
    max_steps=200,                          # ~4.4 epochs: 718 examples / total batch 16 = 45 steps per epoch
    logging_steps=50,                       # log every 50 steps (roughly one epoch)
    eval_steps=50,                          # evaluate every 50 steps
    save_steps=50,                          # checkpoint at every evaluation so the best one can be reloaded
    load_best_model_at_end=True,
)
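The step and epoch numbers in the comments above can be checked against the training logs: 718 examples at total batch size 16 for the NLI run, and 1,190 oversampled examples for the RoBERTa run. The sketch below assumes the Trainer counts an epoch as ceil(examples / total batch) optimizer steps, with no gradient accumulation. `step_schedule` is a hypothetical helper.

```python
import math


def step_schedule(num_examples: int, total_batch: int, max_steps: int):
    """Relate max_steps to epochs, assuming no gradient accumulation and a kept last partial batch."""
    steps_per_epoch = math.ceil(num_examples / total_batch)
    epochs_looped = math.ceil(max_steps / steps_per_epoch)  # the "Num Epochs" the Trainer reports
    fractional_epochs = max_steps / steps_per_epoch         # the "epoch" value in the final metrics
    return steps_per_epoch, epochs_looped, round(fractional_epochs, 2)


print(step_schedule(718, 16, 200))   # (45, 5, 4.44)
print(step_schedule(1190, 16, 600))  # (75, 8, 8.0)
```

Both results match the logs ("Num Epochs = 5" with `'epoch': 4.44`, and "Num Epochs = 8" with `'epoch': 8.0`).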

In [16]:
trainer = Trainer(
    model=nli_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_nli_metrics
)

max_steps is given, it will override any value given in num_train_epochs


In [17]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `DebertaForSequenceClassification.forward` and have been ignored: Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion. If Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion are not expected by `DebertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 718
  Num Epochs = 5
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 200
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
[34m[1mwandb[0m: Currently logged in as: [33mmiladalsh[0m. Use [1m`wandb login --relogin`[0m to force relogin




| Step | Training Loss | Validation Loss | Recall | Precision | F1 |
|---|---|---|---|---|---|
| 50 | 0.5045 | 0.597112 | 0.983051 | 0.651685 | 0.783784 |
| 100 | 0.3176 | 0.727261 | 0.90678 | 0.681529 | 0.778182 |
| 150 | 0.1715 | 1.678869 | 1.0 | 0.62766 | 0.771242 |
| 200 | 0.0727 | 1.351955 | 0.957627 | 0.684848 | 0.798587 |


The following columns in the evaluation set  don't have a corresponding argument in `DebertaForSequenceClassification.forward` and have been ignored: Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion. If Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion are not expected by `DebertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 200
  Batch size = 8
Saving model checkpoint to ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/nli-model/checkpoint-50
Configuration saved in ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/nli-model/checkpoint-50/config.json
Model weights saved in ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/nli-model/checkpoint-50/pytorch_model.bin
The followin

TrainOutput(global_step=200, training_loss=0.26656726002693176, metrics={'train_runtime': 37.5894, 'train_samples_per_second': 85.13, 'train_steps_per_second': 5.321, 'total_flos': 237020932814400.0, 'train_loss': 0.26656726002693176, 'epoch': 4.44})

In [18]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `DebertaForSequenceClassification.forward` and have been ignored: Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion. If Premise, topic, Validity-Confidence, Novelty-Confidence, input_txt, Novelty, __index_level_0__, Validity, Conclusion are not expected by `DebertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 200
  Batch size = 8


{'eval_loss': 1.351954698562622,
 'eval_recall': 0.9576271186440678,
 'eval_precision': 0.6848484848484848,
 'eval_f1': 0.7985865724381626,
 'eval_runtime': 0.5363,
 'eval_samples_per_second': 372.94,
 'eval_steps_per_second': 24.241,
 'epoch': 4.44}

## Fine-tune a simple BERT-style model (RoBERTa) on the training data:

In [8]:
bert_tokenizer = AutoTokenizer.from_pretrained('roberta-base')

In [9]:
from torch import nn
from transformers import Trainer


# Trainer with a class-weighted cross-entropy loss.
# Note: the training cell below uses the stock Trainer, so this class is currently unused.
class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get("logits")
        # weighted cross-entropy over the two novelty classes (equal weights = the unweighted loss);
        # the weights are created on the logits' device instead of hard-coding .cuda()
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 1.0], device=logits.device))
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
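The hook above exists to pass non-uniform class weights to `nn.CrossEntropyLoss`, but with `[1.0, 1.0]` it reduces to the default loss. The cells below rely on oversampling instead. If weights were wanted, one common choice is inverse class frequency, which is the formula behind scikit-learn's `compute_class_weight("balanced")`. The sketch below is a hypothetical stdlib helper whose output list could replace `[1.0, 1.0]`:

```python
from collections import Counter
from typing import List, Sequence


def balanced_class_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Inverse-frequency weights n_samples / (num_classes * count_c), indexed by class id.

    Assumes labels are already class indices 0..num_classes-1 and every class occurs.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) for c in range(num_classes)]


print(balanced_class_weights([0, 1, 1, 1], num_classes=2))  # [2.0, 0.6666666666666666]
```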

In [10]:
taska_training_df['input_txt'] = taska_training_df.apply(lambda x: '<s> {}:{} </s></s> {} </s>'.format(x['topic'], x['Premise'], x['Conclusion']), axis=1)
taska_valid_df['input_txt'] = taska_valid_df.apply(lambda x: '<s> {}:{} </s></s> {} </s>'.format(x['topic'], x['Premise'], x['Conclusion']), axis=1)

In [11]:
taska_training_df.columns

Index(['topic', 'Premise', 'Conclusion', 'Validity', 'Validity-Confidence',
       'Novelty', 'Novelty-Confidence', 'input_txt'],
      dtype='object')

In [12]:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=123)

# Balance the novelty classes by resampling minority-class rows with replacement (training set only)
taska_training_df, y = ros.fit_resample(taska_training_df, taska_training_df['Novelty'])
taska_training_df['Novelty'] = y
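`RandomOverSampler` balances the classes by drawing minority-class rows with replacement until each class matches the majority count. Here that gives 595 rows per class, or 1,190 in total. The validation set is left untouched. The plain-Python sketch below illustrates the idea and is not imbalanced-learn's implementation. Unlike the library, it groups its output by class.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(items: Sequence[T], labels: Sequence[Hashable],
                      seed: int = 123) -> Tuple[List[T], List[Hashable]]:
    """Duplicate randomly chosen items of each smaller class until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    target = max(len(members) for members in by_class.values())
    out_items, out_labels = [], []
    for label, members in by_class.items():
        # sample with replacement from this class to fill the gap to the majority count
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for item in members + extra:
            out_items.append(item)
            out_labels.append(label)
    return out_items, out_labels


xs, ys = random_oversample(["a", "b", "c", "d"], [1, 1, 1, -1])
print(sorted(ys))  # [-1, -1, -1, 1, 1, 1]
```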

In [13]:
taska_training_df.Novelty.value_counts()

 1    595
-1    595
Name: Novelty, dtype: int64

In [14]:
train_dataset = Dataset.from_pandas(taska_training_df)
eval_dataset = Dataset.from_pandas(taska_valid_df)

In [15]:
novelty_map = dict([  # shift to 0/1: CrossEntropyLoss expects class indices in [0, num_labels)
    (1, 1),   # novel
    (-1, 0)   # not novel
])

In [16]:
def preprocess(example):
    inputs = bert_tokenizer(example["input_txt"], add_special_tokens=False, padding=True, truncation=True, max_length=512)
    inputs['labels'] = list(map(novelty_map.get, example['Novelty']))
    return inputs

In [17]:
train_dataset = train_dataset.map(preprocess, batched=True)
eval_dataset = eval_dataset.map(preprocess, batched=True)




In [20]:
bert_model = AutoModelForSequenceClassification.from_pretrained('roberta-base', num_labels=2)

training_args = TrainingArguments(
    output_dir=output_path + "/task-A/novelty/classification/roberta",
    #report_to="wandb",                     # left unset, transformers 4.x reports to all installed integrations, so W&B still logs
    logging_dir='/var/argmining-sharedtask/roberta-baseline-novelty',
    overwrite_output_dir=True,
    metric_for_best_model='f1',             # the 'eval_' prefix is added automatically
    evaluation_strategy='steps',            # evaluate every eval_steps steps (not once per epoch)
    learning_rate=5e-6,
    max_steps=600,                          # 8 epochs: 1190 oversampled examples / total batch 16 = 75 steps per epoch
    logging_steps=50,                       # log every 50 steps
    eval_steps=50,                          # evaluate every 50 steps
    save_steps=50,                          # checkpoint at every evaluation so the best one can be reloaded
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=bert_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=lambda x: compute_metrics(x, average='macro')
)

loading configuration file https://huggingface.co/roberta-base/resolve/main/config.json from cache at /mnt/ceph/storage/data-tmp/current//sile2804/.cache/huggingface/transformers/733bade19e5f0ce98e6531021dd5180994bb2f7b8bd7e80c7968805834ba351e.35205c6cfc956461d8515139f0f8dd5d207a2f336c0c3a83b4bc8dca3518e37b
Model config RobertaConfig {
  "_name_or_path": "roberta-base",
  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "transformers_version": "4.18.0",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}

loading weights fil
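`compute_metrics` comes from `utils` and its source is not shown here. The sketch below assumes it returns macro-averaged precision, recall and F1 over argmax predictions. Under that assumption, the first evaluation row in the log of the next cell (recall 0.5, precision 0.205, F1 0.291) is exactly what a model that predicts a single class covering 41% of the validation set would score. The `_warn_prf` line in that log is most likely the tail of scikit-learn's warning that precision is ill-defined for the class that is never predicted.

```python
from typing import Dict, Sequence


def macro_prf(logits: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[str, float]:
    """Macro-averaged precision, recall and F1 from raw logits (argmax decoding).

    Classes are taken as range(num_logits); an undefined precision or recall counts as 0.0.
    """
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    num_classes = len(logits[0])
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    k = num_classes
    return {"precision": sum(precisions) / k, "recall": sum(recalls) / k, "f1": sum(f1s) / k}


# Every example predicted as class 1, which is 41% of the labels
scores = macro_prf([[0.0, 1.0]] * 100, [1] * 41 + [0] * 59)
print({k: round(v, 5) for k, v in scores.items()})  # {'precision': 0.205, 'recall': 0.5, 'f1': 0.29078}
```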

In [21]:
trainer.train()

The following columns in the training set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, Novelty-Confidence. If Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, Novelty-Confidence are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 1190
  Num Epochs = 8
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 600
Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"


| Step | Training Loss | Validation Loss | Recall | Precision | F1 |
|---|---|---|---|---|---|
| 50 | 0.6993 | 0.699563 | 0.5 | 0.205 | 0.29078 |
| 100 | 0.6881 | 0.69316 | 0.478814 | 0.289744 | 0.361022 |
| 150 | 0.6682 | 0.782242 | 0.365957 | 0.36644 | 0.354984 |
| 200 | 0.5546 | 0.881463 | 0.43272 | 0.434103 | 0.424295 |
| 250 | 0.5155 | 0.90299 | 0.489045 | 0.488014 | 0.485199 |
| 300 | 0.4582 | 0.890255 | 0.554155 | 0.588753 | 0.531194 |
| 350 | 0.3987 | 0.934621 | 0.525837 | 0.533333 | 0.513795 |
| 400 | 0.3884 | 0.974252 | 0.550951 | 0.605545 | 0.514487 |
| 450 | 0.344 | 1.033039 | 0.555705 | 0.643389 | 0.50966 |
| 500 | 0.3462 | 0.983595 | 0.532968 | 0.548934 | 0.512782 |


The following columns in the evaluation set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, __index_level_0__, Novelty-Confidence. If Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, __index_level_0__, Novelty-Confidence are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 200
  Batch size = 8
  _warn_prf(average, modifier, msg_start, len(result))
Saving model checkpoint to ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/roberta/checkpoint-50
Configuration saved in ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/roberta/checkpoint-50/config.json
Model weights saved in ../../data-ceph/arguana/argmining22-sharedtask/models//task-A/novelty/classification/rob

TrainOutput(global_step=600, training_loss=0.4719393348693848, metrics={'train_runtime': 70.5678, 'train_samples_per_second': 136.039, 'train_steps_per_second': 8.502, 'total_flos': 611527648200000.0, 'train_loss': 0.4719393348693848, 'epoch': 8.0})

In [22]:
trainer.evaluate()

The following columns in the evaluation set  don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, __index_level_0__, Novelty-Confidence. If Premise, Novelty, Conclusion, Validity, input_txt, Validity-Confidence, topic, __index_level_0__, Novelty-Confidence are not expected by `RobertaForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 200
  Batch size = 8


{'eval_loss': 0.890255331993103,
 'eval_recall': 0.5541546093427037,
 'eval_precision': 0.5887533875338753,
 'eval_f1': 0.5311936530833034,
 'eval_runtime': 0.3981,
 'eval_samples_per_second': 502.351,
 'eval_steps_per_second': 32.653,
 'epoch': 8.0}
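With `load_best_model_at_end=True` and `metric_for_best_model='f1'`, the Trainer reloads the checkpoint with the highest evaluation F1. That is why the final `trainer.evaluate()` reproduces the step-300 row (F1 ≈ 0.5312, validation loss 0.890255) rather than the last step. The same holds for the NLI run, whose final evaluation matches its best row at step 200. Below is a sketch of the selection rule, applied to the rows copied from the RoBERTa log. The log is cut off after step 500, so the later evaluations are not included.

```python
from typing import List, Tuple

# (step, eval F1) pairs copied from the RoBERTa training log above
roberta_log: List[Tuple[int, float]] = [
    (50, 0.29078), (100, 0.361022), (150, 0.354984), (200, 0.424295), (250, 0.485199),
    (300, 0.531194), (350, 0.513795), (400, 0.514487), (450, 0.50966), (500, 0.512782),
]


def best_checkpoint(log: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, score) with the highest score, i.e. greater-is-better selection."""
    return max(log, key=lambda row: row[1])


print(best_checkpoint(roberta_log))  # (300, 0.531194)
```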