In [21]:
import os
import pandas as pd
from get_dataset import Get_Dataset
from eval_metrics import eval_run
path = os.path.dirname(os.path.abspath('demo_notebook.ipynb'))
pd.set_option('display.max_colwidth', None)

## Dataset structure

The ClaimBuster dataset consists of over 23,000 sentences, each labelled as NFS (non-factual sentence), UFS (unimportant factual sentence) or CFS (check-worthy factual sentence). Here, we give a brief preview of how the data is structured and how it was processed. The dataset is composed of three parts: all_sentences (every sentence from every debate from 1960 to 2016, unlabelled), crowdsourced (all crowdsourced labelled examples) and groundtruth (all examples labelled by the dataset authors).

Below we show some examples of the data. The "Verdict" column gives the classification of the claim, with -1 = NFS, 0 = UFS, 1 = CFS.

In [22]:
# Build the dataset paths portably instead of concatenating strings
datasets_dir = os.path.join(path, 'ClaimBuster_Datasets', 'datasets')
df_groundtruth = pd.read_csv(os.path.join(datasets_dir, 'groundtruth.csv'))
df_all_sentences = pd.read_csv(os.path.join(datasets_dir, 'all_sentences.csv'))
df_crowdsourced = pd.read_csv(os.path.join(datasets_dir, 'crowdsourced.csv'))

df_groundtruth.head()


Unnamed: 0,Sentence_id,Text,Speaker,Speaker_title,Speaker_party,File_id,Length,Line_number,Sentiment,Verdict
0,26,"You know, I saw a movie - ""Crocodile Dundee.""",George Bush,Vice President,REPUBLICAN,1988-09-25.txt,9,26,0.0,0
1,80,We're consuming 50 percent of the world's cocaine.,Michael Dukakis,Governor,DEMOCRAT,1988-09-25.txt,8,80,-0.740979,1
2,129,That answer was about as clear as Boston harbor.,George Bush,Vice President,REPUBLICAN,1988-09-25.txt,9,129,0.0,-1
3,131,Let me help the governor.,George Bush,Vice President,REPUBLICAN,1988-09-25.txt,5,131,0.212987,-1
4,172,We've run up more debt in the last eight years than under all the presidents from George Washington to Jimmy Carter combined.,Michael Dukakis,Governor,DEMOCRAT,1988-09-25.txt,22,172,-0.268506,1


## Data preprocessing

We concatenate the groundtruth and crowdsourced CSVs into a single labelled dataset. We then tokenise each sentence in the dataset using a bert-base-cased tokeniser. Finally, the examples are organised into triples of sentences ($S_P$, $S_T$, $S_F$): the preceding, target and following sentences. Each triple carries the label of $S_T$, the target sentence we aim to classify. We reindex the labels such that 0 = NFS, 1 = UFS, 2 = CFS.
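The reindexing step amounts to shifting each "Verdict" value up by one. The snippet below is a minimal sketch of that idea in plain Python; the names `VERDICT_TO_LABEL` and `reindex_verdicts` are illustrative and are not taken from "Preprocessing.py", and the crowdsourced values are made up for the example.

```python
# Hypothetical helper: maps the raw Verdict values (-1, 0, 1)
# to the reindexed labels (0 = NFS, 1 = UFS, 2 = CFS).
VERDICT_TO_LABEL = {-1: 0, 0: 1, 1: 2}
LABEL_NAMES = ["NFS", "UFS", "CFS"]


def reindex_verdicts(verdicts):
    """Return the reindexed label for each raw Verdict value."""
    return [VERDICT_TO_LABEL[v] for v in verdicts]


# Verdict values of the five groundtruth rows previewed earlier
groundtruth_verdicts = [0, 1, -1, -1, 1]
# Made-up crowdsourced verdicts, for illustration only
crowdsourced_verdicts = [-1, 1]

# Concatenate the two labelled sources, then reindex
labels = reindex_verdicts(groundtruth_verdicts + crowdsourced_verdicts)
label_names = [LABEL_NAMES[label] for label in labels]
```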

For each $S_T$, the preceding and following sentences are drawn from the "all_sentences" dataset, so that every labelled example has context sentences even if those context sentences have no labels of their own. If the target sentence $S_T$ is the first or last line of its debate, the missing preceding or following context sentence is left blank, as we assume independence between debates.
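The context lookup described above can be sketched as follows. This is an illustration rather than the code in "Preprocessing.py": it assumes that "Line_number" values are consecutive within each "File_id", and the function name `build_triples` and the "Label" key are hypothetical.

```python
def build_triples(all_sentences, labelled):
    """Pair each labelled sentence with its neighbours from the same debate.

    all_sentences: dicts with "File_id", "Line_number" and "Text".
    labelled: dicts with "File_id", "Line_number" and "Label".
    Returns (S_P, S_T, S_F, label) tuples; missing context is "".
    """
    # Index every sentence by (debate file, line number)
    text_at = {(s["File_id"], s["Line_number"]): s["Text"] for s in all_sentences}
    triples = []
    for ex in labelled:
        file_id, line = ex["File_id"], ex["Line_number"]
        # Looking up by file id keeps context from leaking across debates
        preceding = text_at.get((file_id, line - 1), "")
        following = text_at.get((file_id, line + 1), "")
        triples.append((preceding, text_at[(file_id, line)], following, ex["Label"]))
    return triples


# Toy data (not from the ClaimBuster files)
all_sentences = [
    {"File_id": "a.txt", "Line_number": 1, "Text": "First."},
    {"File_id": "a.txt", "Line_number": 2, "Text": "Second."},
    {"File_id": "b.txt", "Line_number": 1, "Text": "Other debate."},
]
labelled = [
    {"File_id": "a.txt", "Line_number": 1, "Label": 2},
    {"File_id": "a.txt", "Line_number": 2, "Label": 0},
]
triples = build_triples(all_sentences, labelled)
```

In this toy example the last line of "a.txt" gets a blank following sentence rather than the first line of "b.txt".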

The "Preprocessing.py" file splits the data 80:20 into training and test sets and saves them in the "Data" folder. The same training and test sets were used throughout model development, so that the performance of our models could be compared fairly.
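One way to obtain a split that stays constant across runs is to shuffle the example indices with a fixed seed. The sketch below shows this under assumptions of our own: the source does not state whether "Preprocessing.py" shuffles or which seed it uses, and `train_test_split_indices` is a hypothetical name.

```python
import random


def train_test_split_indices(n_examples, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) for a reproducible split."""
    indices = list(range(n_examples))
    # A dedicated Random instance with a fixed seed gives the same order every run
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_examples * test_fraction))
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(10)
```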

Shown below are example inputs that are ready to be fed into the model, having been tokenised and concatenated. Each entry in the "Sentences" column holds three padded token-id sequences, one per sentence in the triple. The full preprocessing code, which processed and saved the dataset, can be found in the "Preprocessing.py" file. "Get_Dataset()" loads the saved data and prepares it to be fed into the model for training or evaluation.

In [24]:
trainset = Get_Dataset(train=True)
processed_examples = pd.DataFrame({'Sentences': trainset.sentences.tolist(), 'Labels': trainset.labels.tolist()})
processed_examples.head()


Unnamed: 0,Sentences,Labels
0,"[[101, 1109, 4223, 2088, 117, 1103, 7228, 2088, 1104, 1103, 1244, 1311, 1138, 1309, 1151, 1167, 2407, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 4081, 1423, 1141, 1104, 1103, 7885, 12359, 1209, 22366, 1106, 1103, 1864, 1115, 25922, 1110, 1107, 1126, 3432, 1344, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1262, 25527, 117, 1107, 2538, 1104, 1103, 5910, 1104, 1103, 3331, 4813, 117, 1103, 2978, 4013, 2757, 117, 1177, 4268, 1494, 1366, 1114, 1115, 117, 1150, 2195, 109, 3102, 1550, 1121, 1103, 3331, 4813, 1149, 1104, 1103, 9455, 15906, 3098, 1113, 9468, 19878, 24262, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]]",0
1,"[[101, 1124, 1169, 1294, 1251, 9107, 1119, 3349, 117, 1133, 1103, 9193, 1132, 1115, 1195, 112, 1231, 7914, 1103, 1295, 1104, 8362, 4935, 10105, 6556, 1104, 1412, 1416, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1262, 1112, 1103, 6556, 1104, 1103, 1416, 1110, 4138, 9658, 117, 5006, 1103, 1155, 24097, 1115, 1195, 1274, 112, 189, 1920, 1105, 1195, 112, 1231, 1280, 1106, 1660, 1948, 1111, 1142, 2199, 1137, 1115, 2199, 1105, 1136, 1111, 1482, 1107, 1103, 1426, 1104, 2245, 1110, 5733, 21321, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 2421, 1143, 1198, 1587, 1128, 1150, 1103, 7141, 1110, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]]",2
2,"[[101, 1135, 112, 188, 1126, 2486, 146, 1221, 170, 1974, 1164, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 146, 1108, 170, 1353, 2949, 1825, 1111, 170, 1229, 1107, 1745, 2245, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1188, 1110, 1126, 3469, 1115, 112, 188, 1125, 1185, 2197, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]]",1
3,"[[101, 1262, 1103, 1314, 1645, 146, 112, 173, 1176, 1106, 1474, 1110, 1142, 131, 1188, 9478, 2239, 1114, 1103, 2461, 1913, 1107, 112, 5117, 1108, 6434, 117, 1105, 1828, 119, 4100, 1189, 1146, 1111, 1122, 1114, 1210, 9712, 6824, 2758, 1279, 117, 1141, 1222, 1412, 1319, 11989, 1107, 1999, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1337, 112, 188, 1136, 1103, 1236, 1106, 1576, 1412, 2880, 2818, 117, 1259, 1835, 2597, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 2958, 117, 146, 112, 173, 1176, 1106, 3368, 1146, 1113, 1115, 1553, 117, 2140, 117, 1105, 1113, 1240, 5767, 1111, 170, 3407, 4929, 1104, 1237, 7891, 1863, 1107, 2880, 5707, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]]",0
4,"[[101, 1327, 4418, 1547, 1103, 1397, 2084, 1202, 14863, 118, 1107, 1103, 1856, 1104, 1126, 14863, 118, 1126, 15299, 1216, 1112, 2743, 2977, 1137, 1103, 5953, 118, 4073, 3465, 118, 22233, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1622, 1103, 2484, 7587, 1104, 25827, 117, 148, 11680, 22680, 2137, 3663, 131, 2119, 1519, 1143, 1474, 1115, 146, 1341, 1115, 1103, 2084, 5049, 1107, 170, 1295, 1104, 1472, 1877, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], [101, 1752, 117, 1112, 170, 7663, 2301, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]]",0
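In the output above, each sequence starts with 101 (the `[CLS]` token), ends its real content with 102 (`[SEP]`) and is then filled with 0 (`[PAD]`) up to a fixed length. The sketch below illustrates that padding scheme; the function name, the choice of `max_len` and the attention mask are assumptions for illustration, since the maximum length and the exact outputs of "Preprocessing.py" are not shown here.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # special-token ids visible in the output above


def pad_to_length(token_ids, max_len):
    """Pad (or truncate) a token-id list to max_len and build its attention mask."""
    if len(token_ids) > max_len:
        # Truncate but keep a closing [SEP] token
        token_ids = token_ids[: max_len - 1] + [SEP_ID]
    n_pad = max_len - len(token_ids)
    input_ids = token_ids + [PAD_ID] * n_pad
    # 1 marks real tokens, 0 marks padding
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return input_ids, attention_mask


# Token ids of the last sentence in row 4 above, without padding
sentence = [101, 1752, 117, 1112, 170, 7663, 2301, 119, 102]
input_ids, attention_mask = pad_to_length(sentence, max_len=12)
```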


## Model architectures

Detailed below are the model architectures whose performance we investigate. Full code for each model can be found in the "network.py" file.

#### Network 0 (baseline):

Inputs: [$S_T$]

Outputs: $y \in \{0, 1\}$

Network with a single BERT embedding layer, followed by an MLP (as in the ClaimBuster Adversarial Transformer paper)
* Embeds the target sentence using a BERT transformer.
* Passes the pooled output (CLS token) through a two-layer MLP.
* Outputs a binary classification where 0 = NFS or UFS, 1 = CFS.
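Because the baseline is binary, the three-class labels (0 = NFS, 1 = UFS, 2 = CFS) have to be collapsed before they can be compared with its output. A minimal sketch of that mapping, with a hypothetical function name:

```python
def to_binary_label(label):
    """Collapse a three-class label into the baseline's binary label.

    NFS (0) and UFS (1) both become 0; CFS (2) becomes 1.
    """
    return 1 if label == 2 else 0


three_class = [0, 1, 2, 2, 0]
binary = [to_binary_label(label) for label in three_class]
```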

#### Network 1:

Inputs: [$S_P$, $S_T$, $S_F$]

Outputs: $y \in \{0, 1, 2\}$

Network with 3 parallel BERT embedding layers, followed by an LSTM layer and an MLP layer
* Embeds three sentences using BERT transformers.
* Passes the pooled outputs (CLS tokens) of each sentence as a sequence to a Bi-LSTM.
* Takes the middle hidden state of the Bi-LSTM and passes it through a two-layer MLP.
* Outputs a multiclass classification where 0 = NFS, 1 = UFS, 2 = CFS.
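The steps above can be written compactly as follows. The symbols $h$ and $o$ are notation introduced here for illustration, and the final softmax is an assumption, as is reading the "middle hidden state" as the Bi-LSTM output at the $S_T$ position; see "network.py" for the actual implementation.

$$
\begin{aligned}
h_P &= \mathrm{BERT}(S_P), \quad h_T = \mathrm{BERT}(S_T), \quad h_F = \mathrm{BERT}(S_F) \\
(o_P, o_T, o_F) &= \mathrm{BiLSTM}(h_P, h_T, h_F) \\
\hat{y} &= \arg\max_{c \in \{0, 1, 2\}} \mathrm{softmax}\big(\mathrm{MLP}(o_T)\big)_c
\end{aligned}
$$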

#### Network 2:

Inputs: [$S_P$, $S_T$, $S_F$]

Outputs: $y \in \{0, 1, 2\}$

Network with 3 parallel BERT embedding layers, followed by an attention layer and an MLP layer
* Embeds three sentences using BERT transformers.
* Passes the pooled outputs (CLS tokens) into two attention heads, computing attention between ($S_T$, $S_P$) and ($S_T$, $S_F$).
* Concatenates the outputs of the attention heads and passes them through a two-layer MLP.
* Outputs a multiclass classification where 0 = NFS, 1 = UFS, 2 = CFS.

#### Network 3:

Inputs: [$S_P$, $S_T$, $S_F$]

Outputs: $y \in \{0, 1, 2\}$

Network with 3 parallel BERT embedding layers, followed by an attention layer and an MLP layer that also receives the $S_T$ embedding directly
* Embeds three sentences using BERT transformers.
* Passes the pooled outputs (CLS tokens) into two attention heads, computing attention between ($S_T$, $S_P$) and ($S_T$, $S_F$).
* Concatenates the outputs of the attention heads and the $S_T$ output of the BERT layer, and passes the result through a two-layer MLP.
* Outputs a multiclass classification where 0 = NFS, 1 = UFS, 2 = CFS.
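Networks 2 and 3 differ only in what is fed to the MLP, which the following notation makes explicit. The symbols are introduced here for illustration; treating $h_T$ as the query of each attention head and applying a final softmax are assumptions, not details confirmed by the description above.

$$
\begin{aligned}
h_P &= \mathrm{BERT}(S_P), \quad h_T = \mathrm{BERT}(S_T), \quad h_F = \mathrm{BERT}(S_F) \\
a_P &= \mathrm{Attn}(h_T, h_P), \quad a_F = \mathrm{Attn}(h_T, h_F) \\
z &= \begin{cases} [\,a_P ; a_F\,] & \text{(Network 2)} \\ [\,a_P ; a_F ; h_T\,] & \text{(Network 3)} \end{cases} \\
\hat{y} &= \arg\max_{c \in \{0, 1, 2\}} \mathrm{softmax}\big(\mathrm{MLP}(z)\big)_c
\end{aligned}
$$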
    

## Training

The models were trained using the "train.py" file, for 15 epochs with a batch size of 4. We used a basic SGD optimiser with a learning rate of 0.001, as this was found to converge better than other optimisers in early experiments.
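For reference, a plain SGD step on a mini-batch $B$ updates the parameters $\theta$ as shown below, here with $\eta = 0.001$ and $|B| = 4$. We assume no momentum or weight decay and leave the loss $\mathcal{L}$ unspecified, since neither is stated above; cross-entropy would be the usual choice for this kind of classifier.

$$
\theta \leftarrow \theta - \eta \, \frac{1}{|B|} \sum_{(x, y) \in B} \nabla_\theta \, \mathcal{L}\big(f_\theta(x), y\big)
$$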

## Evaluation

For each model, we evaluate by computing the precision, recall and F1 scores for each class on the test set, as well as the overall weighted-average precision, recall and F1. We also output the confusion matrix for each model. Tests for each model can be run in the cells below, with metrics printed in the output.

In [None]:
"""
This cell calls the eval_run function from the eval_metrics file
Precision, Recall, F1 arrays are ordered by the respective classes in the format "array([0,1,2])".

If you would like to view the outputs in a csv file, pass the argument "save=True" into the function call below.
"""
eval_run()
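The metrics described in this section can be computed from a confusion matrix as sketched below. This is a plain-Python illustration, not the code in the eval_metrics file: the function names are hypothetical, and weighting the averages by each class's number of true examples is our assumption about what "weighted average" means here.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """cm[t][p] counts examples with true class t predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_metrics(cm):
    """Return (precision, recall, f1, support) for each class."""
    n = len(cm)
    metrics = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        support = sum(cm[c])                         # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1, support))
    return metrics


def weighted_average(metrics):
    """Average precision, recall and F1 weighted by class support."""
    total = sum(m[3] for m in metrics)
    return tuple(sum(m[i] * m[3] for m in metrics) / total for i in range(3))


# Toy labels (0 = NFS, 1 = UFS, 2 = CFS), not real model output
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred)
metrics = per_class_metrics(cm)
avg_precision, avg_recall, avg_f1 = weighted_average(metrics)
```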