# Argument Component Detection

This notebook aims to implement a classification model for predicting argumentative components within a medical abstracts. Specifically, the model has been trained to identify the following components:

* **Claim**: A statement expressing an opinion or judgment on a topic.

* **Premise or Evidence**: A proposition providing a reason or support for a claim.

This task was approached as a *sequence tagging classification problem* using the BIO schema, thus token in the text was labeled with one of the following tags:

* *B-Claim*: Beginning of a claim
* *I-Claim*: Inside a claim
* *B-Premise*: Beginning of a premise
* *I-Premise*: Inside a premise
* *O*: Token that does not belong to any argumentative component


In [1]:
train_data_dir =  "data/train/neoplasm_train"
val_data_dir =  "data/dev/neoplasm_dev"
test_data_dir =  "data/test/neoplasm_test"
neoplasm_test_data_dir = "data/test/neoplasm_test"
glaucoma_test_data_dir = "data/test/glaucoma_test"
mixed_test_data_dir = "data/test/mixed_test"
custom_train_data_dir = "data/custom_datasets/train"
custom_val_data_dir = "data/custom_datasets/val"
custom_test_data_dir = "data/custom_datasets/test"

In [2]:
import torch # type: ignore
from torch.utils.data import DataLoader 
from transformers import AutoTokenizer 

from utils.utils_notebooks import *
from utils.utils_argument_comp_classifier import *
from models.ac_detector import BertGRUCRF, ArgumentationDataset
from utils.train import train, evaluate, predict

## 1. Corpus
To utilize our argument component detection model, we need to create a custom dataset based on the .ann and .txt in the AbstRCT dataset. Specifically, to assign each token a tag following the BIO schema, the text is tokenized (using an appropriate tokenizer), and based on the start and end information of each component in the .ann files, the tag *B-{Component Type}* is assigned to the first token, while *I-{Component Type}* is assigned to the subsequent tokens. Tokens that do not belong to any component are assigned the tag O.

Regarding the data used, we adhered to the original paper's approach, maintaining the division into train, dev, and test sets, with the test set further divided into neoplasm, glaucoma, and mixed. To improve performance and , we constructed a custom dataset by merging all the original set into a single one, which was then split into train, dev, and test sets. This process increased the dataset size from 350 to 428 training examples, 50 to 107 validation examples, and reduced the test set from 300 to 107 examples.

In [3]:
device = "cuda" if torch.cuda.is_available() else "cpu"

MODEL_CARD_BERT = "bert-base-uncased"
MODEL_CARD_BioBERT = "dmis-lab/biobert-v1.1"
MODEL_CARD_SciBERT = "allenai/scibert_scivocab_uncased"

# Load BERT, BioBERT, and SciBERT tokenizers
bert_tokenizer = AutoTokenizer.from_pretrained(MODEL_CARD_BERT)
scibert_tokenizer = AutoTokenizer.from_pretrained(MODEL_CARD_SciBERT)
bio_bert_tokenizer = AutoTokenizer.from_pretrained(MODEL_CARD_BioBERT)



In [35]:
_ ,special_tokens_bert = display_special_tokens(MODEL_CARD_BERT)
_ ,special_tokens_bio_bert = display_special_tokens(MODEL_CARD_BioBERT)
_ ,special_tokens_scibert = display_special_tokens(MODEL_CARD_SciBERT)




Special Tokens and their IDs:
unk_token: [UNK] (ID: 100)
sep_token: [SEP] (ID: 102)
pad_token: [PAD] (ID: 0)
cls_token: [CLS] (ID: 101)
mask_token: [MASK] (ID: 103)

Special Tokens and their IDs:
unk_token: [UNK] (ID: 100)
sep_token: [SEP] (ID: 102)
pad_token: [PAD] (ID: 0)
cls_token: [CLS] (ID: 101)
mask_token: [MASK] (ID: 103)

Special Tokens and their IDs:
unk_token: [UNK] (ID: 101)
sep_token: [SEP] (ID: 103)
pad_token: [PAD] (ID: 0)
cls_token: [CLS] (ID: 102)
mask_token: [MASK] (ID: 104)


In [6]:
TAGS = ['B-Claim', 'B-Premise', 'I-Claim', 'I-Premise', 'O']

tag_to_idx, idx_to_tag = setup_mappings(TAGS)  
NUM_LABELS = len(TAGS)

print(f"Tags: {TAGS}")
print(f"Number of labels: {NUM_LABELS}")
print(f"Tag to index: {tag_to_idx}")
print(f"Index to tag: {idx_to_tag}")

Tags: ['B-Claim', 'B-Premise', 'I-Claim', 'I-Premise', 'O']
Number of labels: 5
Tag to index: {'B-Claim': 0, 'B-Premise': 1, 'I-Claim': 2, 'I-Premise': 3, 'O': 4}
Index to tag: {0: 'B-Claim', 1: 'B-Premise', 2: 'I-Claim', 3: 'I-Premise', 4: 'O'}


### 1.1 BERT base - Corpus

In [9]:
# Original Corpus
MAX_LENGTH = 512

#create the dataset
train_df = create_dataframe_from_directory(train_data_dir,tag_to_idx,bert_tokenizer, "component_detection", MAX_LENGTH)
val_df = create_dataframe_from_directory(val_data_dir,tag_to_idx, bert_tokenizer,"component_detection",MAX_LENGTH)
neoplasm_test_df = create_dataframe_from_directory(neoplasm_test_data_dir,tag_to_idx,bert_tokenizer, "component_detection",MAX_LENGTH)
glaucoma_test_df = create_dataframe_from_directory(glaucoma_test_data_dir,tag_to_idx,bert_tokenizer, "component_detection",MAX_LENGTH)
mixed_test_df = create_dataframe_from_directory(mixed_test_data_dir,tag_to_idx,bert_tokenizer, "component_detection",MAX_LENGTH)


print(f"\nShape Training dataframe: {train_df.shape}")
print(f"Shape Validation dataframe : {val_df.shape}")
print(f"Shape Neoplasm Test dataframe: {neoplasm_test_df.shape}")
print(f"Shape Glaucoma Test dataframe: {glaucoma_test_df.shape}")
print(f"Shape Mixed Test dataframe: {mixed_test_df.shape}")


train_df.head()



Shape Training dataframe: (350, 8)
Shape Validation dataframe : (50, 8)
Shape Neoplasm Test dataframe: (100, 8)
Shape Glaucoma Test dataframe: (100, 8)
Shape Mixed Test dataframe: (100, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,a combination of mitoxantrone plus prednisone...,"[[CLS], a, combination, of, mit, ##ox, ##ant, ...","[101, 1037, 5257, 1997, 10210, 11636, 4630, 20...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 38), 'T2': (227, 240), 'T3': (242, ...",10561201.ann
1,in endocrine therapy trials in advanced breas...,"[[CLS], in, end, ##oc, ##rine, therapy, trials...","[101, 1999, 2203, 10085, 11467, 7242, 7012, 19...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 72), 'T3': (322, 348), 'T4': (349, ...",10561203.ann
2,treatment with cisplatin-based chemotherapy p...,"[[CLS], treatment, with, cis, ##pl, ##atin, -,...","[101, 3949, 2007, 20199, 24759, 20363, 1011, 2...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 32), 'T2': (233, 304), 'T3': (305, ...",10653877.ann
3,extracellular adenosine 5'-triphosphate (atp)...,"[[CLS], extra, ##cellular, aden, ##osi, ##ne, ...","[101, 4469, 16882, 16298, 20049, 2638, 1019, 1...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (50, 98), 'T2': (221, 289), 'T3': (290,...",10675381.ann
4,"this phase iii, double-blind, randomized, mul...","[[CLS], this, phase, iii, ,, double, -, blind,...","[101, 2023, 4403, 3523, 1010, 3313, 1011, 6397...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (147, 179), 'T2': (180, 205), 'T3': (20...",10735887.ann


In [36]:
special_tokens_bert = [special_tokens_bert['cls_token'], special_tokens_bert['sep_token'], special_tokens_bert['pad_token']]


CLASS_WEIGHTS_NORMALIZED = compute_norm_class_weights(train_df, label_column='tags', special_tokens=special_tokens_bert, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED)

{'B-Claim': 0.6688466110545623, 'I-Claim': 0.026869789574398697, 'O': 0.00553470591967781, 'B-Premise': 0.2907444411081118, 'I-Premise': 0.008004452343249327}


In [11]:
print_annotated_example(train_df, show_special_tokens=True,sample_idx=0)


Example of an annotated sentence:

Raw text:
['a', 'combination', 'of', 'mitoxantrone', 'plus', 'prednisone', 'is', 'preferable', 'to', 'prednisone', 'alone', 'for', 'reduction', 'of', 'pain', 'in', 'men', 'with', 'metastatic,', 'hormone-resistant,', 'prostate', 'cancer.', 'the', 'purpose', 'of', 'this', 'study', 'was', 'to', 'assess', 'the', 'effects', 'of', 'these', 'treatments', 'on', 'health-related', 'quality', 'of', 'life', '(hql).', 'men', 'with', 'metastatic', 'prostate', 'cancer', '(n', '=', '161)', 'were', 'randomized', 'to', 'receive', 'either', 'daily', 'prednisone', 'alone', 'or', 'mitoxantrone', '(every', '3', 'weeks)', 'plus', 'prednisone.', 'those', 'who', 'received', 'prednisone', 'alone', 'could', 'have', 'mitoxantrone', 'added', 'after', '6', 'weeks', 'if', 'there', 'was', 'no', 'improvement', 'in', 'pain.', 'hql', 'was', 'assessed', 'before', 'treatment', 'initiation', 'and', 'then', 'every', '3', 'weeks', 'using', 'the', 'european', 'organization', 'for', 'researc

In [12]:
# custom corpus
custom_train_df = create_dataframe_from_directory(custom_train_data_dir, tag_to_idx,bert_tokenizer, "component_detection", MAX_LENGTH)
custom_val_df = create_dataframe_from_directory(custom_val_data_dir,tag_to_idx,bert_tokenizer,"component_detection", MAX_LENGTH)
custom_test_df = create_dataframe_from_directory(custom_test_data_dir,tag_to_idx,bert_tokenizer,"component_detection", MAX_LENGTH)


print(f"\nShape Training dataframe: {custom_train_df.shape}")
print(f"Shape Validation dataframe : {custom_val_df.shape}")
print(f"Shape Test dataframe: {custom_test_df.shape}")


custom_train_df.head()


Shape Training dataframe: (428, 8)
Shape Validation dataframe : (107, 8)
Shape Test dataframe: (134, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,\n\nto compare the efficacy and side effects a...,"[[CLS], to, compare, the, efficacy, and, side,...","[101, 2000, 12826, 1996, 21150, 1998, 2217, 38...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (329, 371), 'T2': (372, 433), 'T3': (43...",10080213.ann
1,data from experimental studies suggest that o...,"[[CLS], data, from, experimental, studies, sug...","[101, 2951, 2013, 6388, 2913, 6592, 2008, 1332...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, B-Claim, I-Claim, I-Clai...","[4, 4, 4, 4, 4, 4, 4, 0, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (7, 29), 'T2': (135, 188), 'T3': (189, ...",10210927.ann
2,"in a prospective randomized study, 287 patien...","[[CLS], in, a, prospective, random, ##ized, st...","[101, 1999, 1037, 17464, 6721, 3550, 2817, 101...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (400, 422), 'T2': (250, 266), 'T3': (26...",10403690.ann
3,to evaluate the efficacy and safety of a slow...,"[[CLS], to, evaluate, the, efficacy, and, safe...","[101, 2000, 16157, 1996, 21150, 1998, 3808, 19...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (179, 212), 'T2': (213, 249), 'T3': (25...",10506606.ann
4,"for several decades, both preoperative intra-...","[[CLS], for, several, decades, ,, both, pre, #...","[101, 2005, 2195, 5109, 1010, 2119, 3653, 2591...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 33), 'T3': (338, 369), 'T4': (372, ...",10526263.ann


In [37]:
CLASS_WEIGHTS_NORMALIZED_CUSTOM = compute_norm_class_weights(custom_train_df,label_column='tags', special_tokens=special_tokens_bert, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED_CUSTOM)

{'O': 0.005413833186845046, 'B-Premise': 0.2924687881118337, 'I-Premise': 0.0077116132587323576, 'B-Claim': 0.6675346665761751, 'I-Claim': 0.026871098866413826}


In [14]:
print_annotated_example(custom_train_df, show_special_tokens=True,sample_idx=0)


Example of an annotated sentence:

Raw text:
['to', 'compare', 'the', 'efficacy', 'and', 'side', 'effects', 'and', 'the', 'effect', 'on', 'aqueous', 'humor', 'dynamics', 'of', '0.005%', 'latanoprost', 'applied', 'topically', 'once', 'daily', 'with', '0.5%', 'timolol', 'given', 'twice', 'daily', 'for', '12', 'months', 'to', 'patients', 'with', 'pigmentary', 'glaucoma.', 'prospective,', 'randomized,', 'double-masked,', 'clinical', 'study.', 'thirty-six', 'patients', 'affected', 'with', 'bilateral', 'pigmentary', 'glaucoma', 'controlled', 'with', 'no', 'more', 'than', 'a', 'single', 'hypotensive', 'medication', 'were', 'enrolled', 'in', 'the', 'study.', 'the', 'sample', 'population', 'was', 'randomly', 'divided', 'into', '2', 'age-', 'and', 'gender-matched', 'groups', 'each', 'of', '18', 'patients.', 'group', '1', 'received', '0.005%', 'latanoprost', 'eyedrops', 'once', 'daily', 'and', 'the', 'vehicle', '(placebo)', 'once', 'daily;', 'group', '2', 'was', 'assigned', 'to', 'timolol', '0.5

#### 1.2 BioBERT base - Corpus

In [15]:
MAX_LENGTH = 512

#original corpus
train_df_biobert = create_dataframe_from_directory(train_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection",MAX_LENGTH)
val_df_biobert = create_dataframe_from_directory(val_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection",MAX_LENGTH)
neoplasm_test_df_biobert = create_dataframe_from_directory(neoplasm_test_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection",MAX_LENGTH)
glaucoma_test_df_biobert = create_dataframe_from_directory(glaucoma_test_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection",MAX_LENGTH)
mixed_test_df_biobert = create_dataframe_from_directory(mixed_test_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection",MAX_LENGTH)


print(f"\nShape Training dataframe: {train_df_biobert.shape}")
print(f"Shape Validation dataframe : {val_df_biobert.shape}")
print(f"Shape Neoplasm Test dataframe: {neoplasm_test_df_biobert.shape}")
print(f"Shape Glaucoma Test dataframe: {glaucoma_test_df_biobert.shape}")
print(f"Shape Mixed Test dataframe: {mixed_test_df_biobert.shape}")


train_df_biobert.head()


Shape Training dataframe: (350, 8)
Shape Validation dataframe : (50, 8)
Shape Neoplasm Test dataframe: (100, 8)
Shape Glaucoma Test dataframe: (100, 8)
Shape Mixed Test dataframe: (100, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,a combination of mitoxantrone plus prednisone...,"[[CLS], a, combination, of, mit, ##ox, ##ant, ...","[101, 170, 4612, 1104, 26410, 10649, 2861, 185...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 39), 'T2': (234, 248), 'T3': (250, ...",10561201.ann
1,in endocrine therapy trials in advanced breas...,"[[CLS], in, end, ##oc, ##rine, therapy, trials...","[101, 1107, 1322, 13335, 8643, 7606, 7356, 110...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 78), 'T3': (345, 371), 'T4': (372, ...",10561203.ann
2,treatment with cisplatin-based chemotherapy p...,"[[CLS], treatment, with, c, ##is, ##p, ##lat, ...","[101, 3252, 1114, 172, 1548, 1643, 16236, 1394...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 38), 'T2': (262, 336), 'T3': (337, ...",10653877.ann
3,extracellular adenosine 5'-triphosphate (atp)...,"[[CLS], extra, ##cellular, ad, ##eno, ##sin, #...","[101, 3908, 18091, 8050, 26601, 10606, 1162, 1...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (53, 108), 'T2': (240, 311), 'T3': (312...",10675381.ann
4,"this phase iii, double-blind, randomized, mul...","[[CLS], this, phase, ii, ##i, ,, double, -, bl...","[101, 1142, 4065, 25550, 1182, 117, 2702, 118,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (152, 184), 'T2': (185, 211), 'T3': (21...",10735887.ann


In [16]:
CLASS_WEIGHTS_NORMALIZED_BIOBERT = compute_norm_class_weights(train_df_biobert, label_column='tags', special_tokens=special_tokens_bert, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED_BIOBERT)

{'B-Claim': 0.6760395112532146, 'I-Claim': 0.025783987264824865, 'O': 0.005075986351357486, 'B-Premise': 0.2855207745074885, 'I-Premise': 0.007579740623114633}


In [17]:
print_annotated_example(train_df_biobert, show_special_tokens=True, sample_idx=0)


Example of an annotated sentence:

Raw text:
['a', 'combination', 'of', 'mitoxantrone', 'plus', 'prednisone', 'is', 'preferable', 'to', 'prednisone', 'alone', 'for', 'reduction', 'of', 'pain', 'in', 'men', 'with', 'metastatic,', 'hormone-resistant,', 'prostate', 'cancer.', 'the', 'purpose', 'of', 'this', 'study', 'was', 'to', 'assess', 'the', 'effects', 'of', 'these', 'treatments', 'on', 'health-related', 'quality', 'of', 'life', '(hql).', 'men', 'with', 'metastatic', 'prostate', 'cancer', '(n', '=', '161)', 'were', 'randomized', 'to', 'receive', 'either', 'daily', 'prednisone', 'alone', 'or', 'mitoxantrone', '(every', '3', 'weeks)', 'plus', 'prednisone.', 'those', 'who', 'received', 'prednisone', 'alone', 'could', 'have', 'mitoxantrone', 'added', 'after', '6', 'weeks', 'if', 'there', 'was', 'no', 'improvement', 'in', 'pain.', 'hql', 'was', 'assessed', 'before', 'treatment', 'initiation', 'and', 'then', 'every', '3', 'weeks', 'using', 'the', 'european', 'organization', 'for', 'researc

In [None]:
#create the dataset
custom_train_df_biobert = create_dataframe_from_directory(custom_train_data_dir,tag_to_idx, bio_bert_tokenizer,"component_detection",MAX_LENGTH)
custom_val_df_biobert = create_dataframe_from_directory(custom_val_data_dir,tag_to_idx,bio_bert_tokenizer,"component_detection", MAX_LENGTH)
custom_test_df_biobert = create_dataframe_from_directory(custom_test_data_dir,tag_to_idx, bio_bert_tokenizer,"component_detection",MAX_LENGTH)


print(f"\nShape Training dataframe: {custom_train_df_biobert.shape}")
print(f"Shape Validation dataframe : {custom_val_df_biobert.shape}")
print(f"Shape Test dataframe: {custom_test_df_biobert.shape}")


custom_train_df_biobert.head()


Shape Training dataframe: (428, 8)
Shape Validation dataframe : (107, 8)
Shape Test dataframe: (134, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,\n\nto compare the efficacy and side effects a...,"[[CLS], to, compare, the, efficacy, and, side,...","[101, 1106, 14133, 1103, 23891, 1105, 1334, 31...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (334, 379), 'T2': (380, 442), 'T3': (44...",10080213.ann
1,to evaluate the efficacy and safety of a slow...,"[[CLS], to, evaluate, the, efficacy, and, safe...","[101, 1106, 17459, 1103, 23891, 1105, 3429, 11...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (195, 229), 'T2': (230, 267), 'T3': (26...",10506606.ann
2,"in a prospective randomized study, 287 patien...","[[CLS], in, a, prospective, random, ##ized, st...","[101, 1107, 170, 19916, 7091, 2200, 2025, 117,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (449, 476), 'T2': (279, 297), 'T3': (29...",10403690.ann
3,data from experimental studies suggest that o...,"[[CLS], data, from, experimental, studies, sug...","[101, 2233, 1121, 6700, 2527, 5996, 1115, 184,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, B-Claim, I-Claim, I-Clai...","[4, 4, 4, 4, 4, 4, 4, 0, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (7, 33), 'T2': (148, 205), 'T3': (206, ...",10210927.ann
4,"for several decades, both preoperative intra-...","[[CLS], for, several, decades, ,, both, pre, #...","[101, 1111, 1317, 4397, 117, 1241, 3073, 19807...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 37), 'T3': (356, 389), 'T4': (392, ...",10526263.ann


In [None]:
CLASS_WEIGHTS_NORMALIZED_CUSTOM_BIOBERT = compute_norm_class_weights(custom_train_df_biobert,label_column='tags', special_tokens=special_tokens, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED_CUSTOM_BIOBERT)

{'O': 0.0049495532302118676, 'B-Premise': 0.28616515909891416, 'I-Premise': 0.007268584229717112, 'B-Claim': 0.6759685109525704, 'I-Claim': 0.025648192488586474}


In [None]:
print_annotated_example(custom_train_df_biobert, show_special_tokens=True,sample_idx=0)


Example of an annotated sentence:

Raw text:
['to', 'compare', 'the', 'efficacy', 'and', 'side', 'effects', 'and', 'the', 'effect', 'on', 'aqueous', 'humor', 'dynamics', 'of', '0.005%', 'latanoprost', 'applied', 'topically', 'once', 'daily', 'with', '0.5%', 'timolol', 'given', 'twice', 'daily', 'for', '12', 'months', 'to', 'patients', 'with', 'pigmentary', 'glaucoma.', 'prospective,', 'randomized,', 'double-masked,', 'clinical', 'study.', 'thirty-six', 'patients', 'affected', 'with', 'bilateral', 'pigmentary', 'glaucoma', 'controlled', 'with', 'no', 'more', 'than', 'a', 'single', 'hypotensive', 'medication', 'were', 'enrolled', 'in', 'the', 'study.', 'the', 'sample', 'population', 'was', 'randomly', 'divided', 'into', '2', 'age-', 'and', 'gender-matched', 'groups', 'each', 'of', '18', 'patients.', 'group', '1', 'received', '0.005%', 'latanoprost', 'eyedrops', 'once', 'daily', 'and', 'the', 'vehicle', '(placebo)', 'once', 'daily;', 'group', '2', 'was', 'assigned', 'to', 'timolol', '0.5

#### 1.3 SciBERT - Corpus

In [18]:
MAX_LENGTH = 512

#original corpus
train_df_scibert = create_dataframe_from_directory(train_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)
val_df_scibert = create_dataframe_from_directory(val_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)
neoplasm_test_df_scibert = create_dataframe_from_directory(neoplasm_test_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)
glaucoma_test_df_scibert = create_dataframe_from_directory(glaucoma_test_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)
mixed_test_df_scibert = create_dataframe_from_directory(mixed_test_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)


print(f"\nShape Training dataframe: {train_df_scibert.shape}")
print(f"Shape Validation dataframe : {val_df_scibert.shape}")
print(f"Shape Neoplasm Test dataframe: {neoplasm_test_df_scibert.shape}")
print(f"Shape Glaucoma Test dataframe: {glaucoma_test_df_scibert.shape}")
print(f"Shape Mixed Test dataframe: {mixed_test_df_scibert.shape}")


train_df_scibert.head()


Shape Training dataframe: (350, 8)
Shape Validation dataframe : (50, 8)
Shape Neoplasm Test dataframe: (100, 8)
Shape Glaucoma Test dataframe: (100, 8)
Shape Mixed Test dataframe: (100, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,a combination of mitoxantrone plus prednisone...,"[[CLS], a, combination, of, mit, ##ox, ##ant, ...","[102, 106, 2702, 131, 1805, 786, 268, 1809, 30...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 33), 'T2': (213, 226), 'T3': (228, ...",10561201.ann
1,in endocrine therapy trials in advanced breas...,"[[CLS], in, endocrine, therapy, trials, in, ad...","[102, 121, 13489, 2223, 3270, 121, 4378, 3479,...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 70), 'T3': (296, 321), 'T4': (322, ...",10561203.ann
2,treatment with cisplatin-based chemotherapy p...,"[[CLS], treatment, with, cisplatin, -, based, ...","[102, 922, 190, 14120, 579, 791, 6127, 2315, 1...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 28), 'T2': (195, 259), 'T3': (260, ...",10653877.ann
3,extracellular adenosine 5'-triphosphate (atp)...,"[[CLS], extracellular, adenosine, 5, ', -, tri...","[102, 6636, 14584, 305, 2505, 579, 15268, 2369...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (42, 85), 'T2': (200, 268), 'T3': (269,...",10675381.ann
4,"this phase iii, double-blind, randomized, mul...","[[CLS], this, phase, iii, ,, double, -, blind,...","[102, 238, 1481, 2786, 422, 3917, 579, 8478, 4...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (128, 160), 'T2': (161, 182), 'T3': (18...",10735887.ann


In [38]:
special_tokens_scibert = special_tokens_scibert = [special_tokens_scibert['cls_token'], special_tokens_scibert['sep_token'], special_tokens_scibert['pad_token']]


CLASS_WEIGHTS_NORMALIZED_SCIBERT = compute_norm_class_weights(train_df_scibert, label_column='tags', special_tokens=special_tokens_scibert, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED_SCIBERT)

{'B-Claim': 0.6587091711632913, 'I-Claim': 0.0297329452881422, 'O': 0.0062897817630954075, 'B-Premise': 0.2966805195517205, 'I-Premise': 0.008587582233750696}


In [20]:
print_annotated_example(train_df_scibert,show_special_tokens=True,sample_idx=0)


Example of an annotated sentence:

Raw text:
['a', 'combination', 'of', 'mitoxantrone', 'plus', 'prednisone', 'is', 'preferable', 'to', 'prednisone', 'alone', 'for', 'reduction', 'of', 'pain', 'in', 'men', 'with', 'metastatic,', 'hormone-resistant,', 'prostate', 'cancer.', 'the', 'purpose', 'of', 'this', 'study', 'was', 'to', 'assess', 'the', 'effects', 'of', 'these', 'treatments', 'on', 'health-related', 'quality', 'of', 'life', '(hql).', 'men', 'with', 'metastatic', 'prostate', 'cancer', '(n', '=', '161)', 'were', 'randomized', 'to', 'receive', 'either', 'daily', 'prednisone', 'alone', 'or', 'mitoxantrone', '(every', '3', 'weeks)', 'plus', 'prednisone.', 'those', 'who', 'received', 'prednisone', 'alone', 'could', 'have', 'mitoxantrone', 'added', 'after', '6', 'weeks', 'if', 'there', 'was', 'no', 'improvement', 'in', 'pain.', 'hql', 'was', 'assessed', 'before', 'treatment', 'initiation', 'and', 'then', 'every', '3', 'weeks', 'using', 'the', 'european', 'organization', 'for', 'researc

In [21]:
# custom corpus
custom_train_df_scibert = create_dataframe_from_directory(custom_train_data_dir,tag_to_idx, scibert_tokenizer,"component_detection",MAX_LENGTH)
custom_val_df_scibert = create_dataframe_from_directory(custom_val_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)
custom_test_df_scibert = create_dataframe_from_directory(custom_test_data_dir,tag_to_idx,scibert_tokenizer,"component_detection",MAX_LENGTH)

print(f"\nShape Training dataframe: {custom_train_df_scibert.shape}")
print(f"Shape Validation dataframe : {custom_val_df_scibert.shape}")
print(f"Shape Test dataframe: {custom_test_df_scibert.shape}")

custom_train_df_scibert.head()


Shape Training dataframe: (428, 8)
Shape Validation dataframe : (107, 8)
Shape Test dataframe: (134, 8)


Unnamed: 0,raw_text,tokenized_text,input_ids,attention_mask,tags,encoded_tags,new_start_end_positions,file_name
0,\n\nto compare the efficacy and side effects a...,"[[CLS], to, compare, the, efficacy, and, side,...","[102, 147, 3745, 111, 4684, 137, 2480, 1056, 1...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (302, 341), 'T2': (342, 401), 'T3': (40...",10080213.ann
1,data from experimental studies suggest that o...,"[[CLS], data, from, experimental, studies, sug...","[102, 453, 263, 1798, 826, 1739, 198, 6595, 10...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, B-Claim, I-Claim, I-Clai...","[4, 4, 4, 4, 4, 4, 4, 0, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (7, 26), 'T2': (124, 173), 'T3': (174, ...",10210927.ann
2,"in a prospective randomized study, 287 patien...","[[CLS], in, a, prospective, randomized, study,...","[102, 121, 106, 6880, 5460, 527, 422, 26713, 5...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (398, 417), 'T2': (244, 261), 'T3': (26...",10403690.ann
3,to evaluate the efficacy and safety of a slow...,"[[CLS], to, evaluate, the, efficacy, and, safe...","[102, 147, 3138, 111, 4684, 137, 4104, 131, 10...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, O, O, O, O, O, O, O, O, O, O, O, O, O, O, ...","[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...","{'T1': (155, 186), 'T2': (187, 220), 'T3': (22...",10506606.ann
4,"for several decades, both preoperative intra-...","[[CLS], for, several, decades, ,, both, preope...","[102, 168, 1323, 8148, 422, 655, 10014, 4743, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...","[O, B-Claim, I-Claim, I-Claim, I-Claim, I-Clai...","[4, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...","{'T1': (1, 27), 'T3': (295, 322), 'T4': (325, ...",10526263.ann


In [22]:
CLASS_WEIGHTS_NORMALIZED_CUSTOM_SCIBERT = compute_norm_class_weights(custom_train_df_scibert, label_column='tags', special_tokens=special_tokens_scibert, task_name='seqtag')
print(CLASS_WEIGHTS_NORMALIZED_CUSTOM_SCIBERT)

{'O': 0.006138235983269374, 'B-Premise': 0.29795623535639276, 'I-Premise': 0.008266760717044224, 'B-Claim': 0.657910708179844, 'I-Claim': 0.029728059763449614}


In [23]:
print_annotated_example(custom_test_df_scibert, show_special_tokens=False, sample_idx=3)


Example of an annotated sentence:

Raw text:
['in', 'the', 'context', 'of', 'chronic', 'physical', 'illness,', 'such', 'as', 'breast', 'cancer,', 'depression', 'is', 'associated', 'with', 'increased', 'morbidity,', 'longer', 'periods', 'of', 'hospitalization,', 'and', 'greater', 'overall', 'disability.', 'prompt', 'diagnosis', 'and', 'effective', 'treatment', 'is,', 'therefore,', 'essential.', 'several', 'small', 'studies', 'have', 'established', 'the', 'efficacy', 'of', 'tricyclic', 'antidepressants', '(tcas)', 'in', 'this', 'setting,', 'and', 'the', 'selective', 'serotonin', 'reuptake', 'inhibitors', '(ssris)', 'would', 'appear', 'to', 'be', 'an', 'alternative', 'therapeutic', 'option', 'because', 'of', 'their', 'established', 'efficacy', 'and', 'better', 'tolerability', 'profile.', 'this', 'was', 'a', 'multicenter.', 'double-blind,', 'parallel-group', 'study', 'in', 'which', '179', 'women', 'with', 'breast', 'cancer', 'were', 'randomized', 'to', 'treatment', 'with', 'either', 'the'

### 2. Dataset and Dataloader Creation

For each dataframe we can now create a custom dataset and dataloader, which will be used to train the models. The chosen batch size is 8 while the maximum sequence length is 512. Sentence with more than 512 tokens are truncated, while shorter sentences are padded with the token [PAD].

#### 2.1 BERT base - Dataset and Dataloader

In [None]:
BATCH_SIZE = 8

#original dataset BERT
train_dataset = ArgumentationDataset(train_df, bert_tokenizer)
val_dataset = ArgumentationDataset(val_df, bert_tokenizer)
neoplasm_test_dataset = ArgumentationDataset(neoplasm_test_df, bert_tokenizer)
glaucoma_test_dataset = ArgumentationDataset(glaucoma_test_df, bert_tokenizer)
mixed_test_dataset = ArgumentationDataset(mixed_test_df, bert_tokenizer)

# original dataloader BERT
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)
neoplasm_test_dataloader = DataLoader(neoplasm_test_dataset, batch_size=BATCH_SIZE, shuffle=False)
glaucoma_test_dataloader = DataLoader(glaucoma_test_dataset, batch_size=BATCH_SIZE, shuffle=False)
mixed_test_dataloader = DataLoader(mixed_test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
#custom dataset BERT
custom_train_dataset = ArgumentationDataset(custom_train_df, bert_tokenizer)
custom_val_dataset = ArgumentationDataset(custom_val_df, bert_tokenizer)
custom_test_dataset = ArgumentationDataset(custom_test_df, bert_tokenizer)

#custom dataloader BERT
custom_train_dataloader = DataLoader(custom_train_dataset, batch_size=BATCH_SIZE, shuffle=True)
custom_val_dataloader = DataLoader(custom_val_dataset, batch_size=BATCH_SIZE, shuffle=False)
custom_test_dataloader = DataLoader(custom_test_dataset, batch_size=BATCH_SIZE, shuffle=False)

#### 2.2 BioBERT base - Dataset and Dataloader

In [None]:
#original dataset BioBERT
train_dataset_bio = ArgumentationDataset(train_df_biobert, bio_bert_tokenizer)
val_dataset_bio = ArgumentationDataset(val_df_biobert, bio_bert_tokenizer)
neoplasm_test_dataset_bio = ArgumentationDataset(neoplasm_test_df_biobert, bio_bert_tokenizer)
glaucoma_test_dataset_bio = ArgumentationDataset(glaucoma_test_df_biobert, bio_bert_tokenizer)
mixed_test_dataset_bio = ArgumentationDataset(mixed_test_df_biobert, bio_bert_tokenizer)


#original dataloader BioBERT
train_dataloader_bio = DataLoader(train_dataset_bio, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader_bio = DataLoader(val_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)
neoplasm_test_dataloader_bio = DataLoader(neoplasm_test_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)
glaucoma_test_dataloader_bio = DataLoader(glaucoma_test_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)
mixed_test_dataloader_bio = DataLoader(mixed_test_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
#custom dataset BioBERT
custom_train_dataset_bio = ArgumentationDataset(custom_train_df_biobert, bio_bert_tokenizer)
custom_val_dataset_bio = ArgumentationDataset(custom_val_df_biobert, bio_bert_tokenizer)
custom_test_dataset_bio = ArgumentationDataset(custom_test_df_biobert, bio_bert_tokenizer)

#custom dataloader BioBERT
custom_train_dataloader_bio = DataLoader(custom_train_dataset_bio, batch_size=BATCH_SIZE, shuffle=True)
custom_val_dataloader_bio = DataLoader(custom_val_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)
custom_test_dataloader_bio = DataLoader(custom_test_dataset_bio, batch_size=BATCH_SIZE, shuffle=False)

#### 2.3 SciBERT - Dataset and Dataloader

In [None]:
#original dataset SciBERT
train_dataset_sci = ArgumentationDataset(train_df_scibert, scibert_tokenizer)
val_dataset_sci = ArgumentationDataset(val_df_scibert, scibert_tokenizer)
neoplasm_test_dataset_sci = ArgumentationDataset(neoplasm_test_df_scibert, scibert_tokenizer)
glaucoma_test_dataset_sci = ArgumentationDataset(glaucoma_test_df_scibert, scibert_tokenizer)
mixed_test_dataset_sci = ArgumentationDataset(mixed_test_df_scibert, scibert_tokenizer)

#original dataloader SciBERT
train_dataloader_sci = DataLoader(train_dataset_sci, batch_size=BATCH_SIZE, shuffle=True)
val_dataloader_sci = DataLoader(val_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)
neoplasm_test_dataloader_sci = DataLoader(neoplasm_test_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)
glaucoma_test_dataloader_sci = DataLoader(glaucoma_test_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)
mixed_test_dataloader_sci = DataLoader(mixed_test_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)

In [None]:
#custom dataset SciBERT
custom_train_dataset_sci = ArgumentationDataset(custom_train_df_scibert, scibert_tokenizer)
custom_val_dataset_sci = ArgumentationDataset(custom_val_df_scibert, scibert_tokenizer)
custom_test_dataset_sci = ArgumentationDataset(custom_test_df_scibert, scibert_tokenizer)

#custom dataloader SciBERT
custom_train_dataloader_sci = DataLoader(custom_train_dataset_sci, batch_size=BATCH_SIZE, shuffle=True)
custom_val_dataloader_sci = DataLoader(custom_val_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)
custom_test_dataloader_sci = DataLoader(custom_test_dataset_sci, batch_size=BATCH_SIZE, shuffle=False)

### 3. Model Definition
Building on the approaches presented in [1](https://pubmed.ncbi.nlm.nih.gov/34412851/) and [2](https://hal.science/hal-02879293/document), this experiment implements the best-performing models from these studies, namely:
* BERT + BiLSTM + CRF
* BioBERT + BiLSTM + CRF
* SciBERT + BiLSTM + CRF

We decided to adopt and adapt their architecture for our task, fine-tuning the model end-to-end and performing hyperparameter tuning to improve performance further.

In [None]:
import warnings
from sklearn.exceptions import UndefinedMetricWarning

# Disabilita i warning per metriche non definite
warnings.filterwarnings("ignore", category=UndefinedMetricWarning)

In [69]:
seeds = [42]
GRU_HIDDEN_SIZE = 128
DROPOUT = 0.43
LR = 1e-4
EPOCHS = 5

#BERT
bert_model_original = BertGRUCRF(MODEL_CARD_BERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)
bert_model_custom = BertGRUCRF(MODEL_CARD_BERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)

#BioBERT
bio_bert_model_original = BertGRUCRF(MODEL_CARD_BioBERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)
bio_bert_model_custom = BertGRUCRF(MODEL_CARD_BioBERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)

#SciBERT
scibert_model_original = BertGRUCRF(MODEL_CARD_SciBERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)
scibert_model_custom = BertGRUCRF(MODEL_CARD_SciBERT, NUM_LABELS, GRU_HIDDEN_SIZE, dropout_prob=DROPOUT).to(device)

#### BERT - Training

In [None]:
#BERT training original dataset
output_folder = "bert_models"

bert_model_original, results_original = train(bert_model_original,  "bert-base_original" , train_dataloader, val_dataloader, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED,scheduler=False,
                                seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 795.0075
	F1-Score Micro: 0.4989 | F1-Score Macro: 0.2611 | Weighted F1-Score: 0.530504 | Precision: 0.5737 | Recall: 0.4989 | Accuracy: 0.4989


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1847.6604
	F1-Score Micro: 0.7861 | F1-Score Macro: 0.3284 | Weighted F1-Score: 0.7445 | Precision: 0.7086 | Recall: 0.7861 | Accuracy: 0.7861
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 719.7863
	F1-Score Micro: 0.5592 | F1-Score Macro: 0.2972 | Weighted F1-Score: 0.591569 | Precision: 0.6404 | Recall: 0.5592 | Accuracy: 0.5592


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1937.5831
	F1-Score Micro: 0.7808 | F1-Score Macro: 0.3285 | Weighted F1-Score: 0.7424 | Precision: 0.7253 | Recall: 0.7808 | Accuracy: 0.7808
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 695.7893
	F1-Score Micro: 0.5839 | F1-Score Macro: 0.3189 | Weighted F1-Score: 0.614859 | Precision: 0.6625 | Recall: 0.5839 | Accuracy: 0.5839


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1782.2304
	F1-Score Micro: 0.8062 | F1-Score Macro: 0.4229 | Weighted F1-Score: 0.7898 | Precision: 0.7971 | Recall: 0.8062 | Accuracy: 0.8062
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 657.1298
	F1-Score Micro: 0.6203 | F1-Score Macro: 0.3563 | Weighted F1-Score: 0.648119 | Precision: 0.6985 | Recall: 0.6203 | Accuracy: 0.6203


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1765.5448
	F1-Score Micro: 0.7750 | F1-Score Macro: 0.4044 | Weighted F1-Score: 0.7555 | Precision: 0.7743 | Recall: 0.7750 | Accuracy: 0.7750
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 649.3673
	F1-Score Micro: 0.6215 | F1-Score Macro: 0.3498 | Weighted F1-Score: 0.648364 | Precision: 0.6898 | Recall: 0.6215 | Accuracy: 0.6215


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1835.8728
	F1-Score Micro: 0.7904 | F1-Score Macro: 0.3907 | Weighted F1-Score: 0.7707 | Precision: 0.8008 | Recall: 0.7904 | Accuracy: 0.7904
****************************************************************************************************

Best Val F1-Score: 0.4229 at epoch 3
Saving model...
Saved!

Training completed.


In [66]:
evaluate(bert_model_original, neoplasm_test_dataloader, device, verbose=True);
evaluate(bert_model_original, glaucoma_test_dataloader, device, verbose=True);
evaluate(bert_model_original, mixed_test_dataloader, device, verbose=True);

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))



Test Results
	Average loss: 1137.9671
	F1-Score Micro: 0.8830
	F1-Score Macro: 0.5360
	Weighted F1-Score: 0.8759
	Precision: 0.8800
	Recall: 0.8830
	Accuracy: 0.8830
	F1 Claim: 0.7405
	F1 Evidence 0.8785
	Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.06      0.74      0.11        35
           2       0.69      0.84      0.76      4496
           3       0.96      0.83      0.89     17047
           4       0.90      0.94      0.92     21295

    accuracy                           0.88     42873
   macro avg       0.52      0.67      0.54     42873
weighted avg       0.90      0.88      0.89     42873



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))



Test Results
	Average loss: 1215.0380
	F1-Score Micro: 0.8859
	F1-Score Macro: 0.5356
	Weighted F1-Score: 0.8820
	Precision: 0.8940
	Recall: 0.8859
	Accuracy: 0.8859
	F1 Claim: 0.7510
	F1 Evidence 0.8680
	Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.06      0.73      0.11        30
           2       0.71      0.83      0.77      3567
           3       0.97      0.80      0.88     20076
           4       0.87      0.99      0.93     19689

    accuracy                           0.89     43362
   macro avg       0.52      0.67      0.54     43362
weighted avg       0.91      0.89      0.89     43362



  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))



Test Results
	Average loss: 1114.0633
	F1-Score Micro: 0.8890
	F1-Score Macro: 0.5428
	Weighted F1-Score: 0.8836
	Precision: 0.8878
	Recall: 0.8890
	Accuracy: 0.8890
	F1 Claim: 0.7241
	F1 Evidence 0.8944
	Classification Report:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         0
           1       0.08      0.89      0.15        35
           2       0.71      0.77      0.74      4349
           3       0.96      0.85      0.90     18525
           4       0.89      0.95      0.92     20733

    accuracy                           0.89     43642
   macro avg       0.53      0.69      0.54     43642
weighted avg       0.90      0.89      0.89     43642



In [67]:
#BERT training custom dataset

bert_model_custom, results_custom = train(bert_model_custom,  "bert-base_custom" , train_dataloader, val_dataloader, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED_CUSTOM,
                                seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 671.9845
	F1-Score Micro: 0.5617 | F1-Score Macro: 0.3233 | Weighted F1-Score: 0.604411 | Precision: 0.6606 | Recall: 0.5617 | Accuracy: 0.5617


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 1027.4487
	F1-Score Micro: 0.8865 | F1-Score Macro: 0.5025 | Weighted F1-Score: 0.8821 | Precision: 0.8787 | Recall: 0.8865 | Accuracy: 0.8865
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 501.7376
	F1-Score Micro: 0.6151 | F1-Score Macro: 0.3785 | Weighted F1-Score: 0.666763 | Precision: 0.7412 | Recall: 0.6151 | Accuracy: 0.6151


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 710.0954
	F1-Score Micro: 0.9117 | F1-Score Macro: 0.6562 | Weighted F1-Score: 0.9115 | Precision: 0.9129 | Recall: 0.9117 | Accuracy: 0.9117
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 440.3242
	F1-Score Micro: 0.6352 | F1-Score Macro: 0.4084 | Weighted F1-Score: 0.686707 | Precision: 0.7663 | Recall: 0.6352 | Accuracy: 0.6352




Validation
	val_loss: 678.1822
	F1-Score Micro: 0.9251 | F1-Score Macro: 0.8010 | Weighted F1-Score: 0.9260 | Precision: 0.9284 | Recall: 0.9251 | Accuracy: 0.9251
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 393.3184
	F1-Score Micro: 0.6488 | F1-Score Macro: 0.4264 | Weighted F1-Score: 0.695508 | Precision: 0.7698 | Recall: 0.6488 | Accuracy: 0.6488




Validation
	val_loss: 774.5019
	F1-Score Micro: 0.9100 | F1-Score Macro: 0.7876 | Weighted F1-Score: 0.9112 | Precision: 0.9164 | Recall: 0.9100 | Accuracy: 0.9100
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 370.4218
	F1-Score Micro: 0.6791 | F1-Score Macro: 0.4496 | Weighted F1-Score: 0.721130 | Precision: 0.7878 | Recall: 0.6791 | Accuracy: 0.6791




Validation
	val_loss: 742.6063
	F1-Score Micro: 0.9037 | F1-Score Macro: 0.7971 | Weighted F1-Score: 0.9072 | Precision: 0.9146 | Recall: 0.9037 | Accuracy: 0.9037
****************************************************************************************************

Best Val F1-Score: 0.8010 at epoch 3
Saving model...
Saved!

Training completed.


In [68]:
evaluate(bert_model_custom, custom_test_dataloader, device, verbose=True);




Test Results
	Average loss: 570.0512
	F1-Score Micro: 0.9414
	F1-Score Macro: 0.8717
	Weighted F1-Score: 0.9419
	Precision: 0.9428
	Recall: 0.9414
	Accuracy: 0.9414
	F1 Claim: 0.8566
	F1 Evidence 0.9447
	Classification Report:
              precision    recall  f1-score   support

           0       0.73      0.77      0.75       248
           1       0.84      0.84      0.84       539
           2       0.90      0.82      0.86      7214
           3       0.94      0.95      0.95     20796
           4       0.95      0.96      0.96     29386

    accuracy                           0.94     58183
   macro avg       0.87      0.87      0.87     58183
weighted avg       0.94      0.94      0.94     58183



#### BioBERT - Training

In [50]:
#BioBERT training original dataset
output_folder = "bio_bert_models"

bio_bert_model_original, results_bio_custom = train(bio_bert_model_original,  "bio-bert_original" , train_dataloader_bio, val_dataloader_bio, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED_BIOBERT,
                                seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 582.8458
	F1-Score Micro: 0.6989 | F1-Score Macro: 0.4083 | Weighted F1-Score: 0.725396 | Precision: 0.7837 | Recall: 0.6989 | Accuracy: 0.6989


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 733.4613
	F1-Score Micro: 0.9238 | F1-Score Macro: 0.5344 | Weighted F1-Score: 0.9182 | Precision: 0.9133 | Recall: 0.9238 | Accuracy: 0.9238
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 454.5709
	F1-Score Micro: 0.7491 | F1-Score Macro: 0.4720 | Weighted F1-Score: 0.778903 | Precision: 0.8417 | Recall: 0.7491 | Accuracy: 0.7491




Validation
	val_loss: 709.7893
	F1-Score Micro: 0.9221 | F1-Score Macro: 0.8218 | Weighted F1-Score: 0.9218 | Precision: 0.9218 | Recall: 0.9221 | Accuracy: 0.9221
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 406.4441
	F1-Score Micro: 0.7689 | F1-Score Macro: 0.5549 | Weighted F1-Score: 0.799124 | Precision: 0.8628 | Recall: 0.7689 | Accuracy: 0.7689




Validation
	val_loss: 782.5351
	F1-Score Micro: 0.9055 | F1-Score Macro: 0.8193 | Weighted F1-Score: 0.9088 | Precision: 0.9217 | Recall: 0.9055 | Accuracy: 0.9055
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 362.7870
	F1-Score Micro: 0.8048 | F1-Score Macro: 0.5855 | Weighted F1-Score: 0.838963 | Precision: 0.8986 | Recall: 0.8048 | Accuracy: 0.8048




Validation
	val_loss: 587.3463
	F1-Score Micro: 0.9309 | F1-Score Macro: 0.8524 | Weighted F1-Score: 0.9315 | Precision: 0.9334 | Recall: 0.9309 | Accuracy: 0.9309
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 329.3842
	F1-Score Micro: 0.8150 | F1-Score Macro: 0.5944 | Weighted F1-Score: 0.849952 | Precision: 0.9096 | Recall: 0.8150 | Accuracy: 0.8150




Validation
	val_loss: 733.3507
	F1-Score Micro: 0.9154 | F1-Score Macro: 0.8276 | Weighted F1-Score: 0.9158 | Precision: 0.9188 | Recall: 0.9154 | Accuracy: 0.9154
****************************************************************************************************

Best Val F1-Score: 0.8524 at epoch 4
Saving model...
Saved!

Training completed.


In [51]:
evaluate(bio_bert_model_original, neoplasm_test_dataloader_bio, device, verbose=True, name='Neoplasm');
evaluate(bio_bert_model_original, glaucoma_test_dataloader_bio, device, verbose=True, name='Glaucoma');
evaluate(bio_bert_model_original, mixed_test_dataloader_bio, device, verbose=True, name='Mixed');




Neoplasm Test Results
	Average loss: 1027.3477
	F1-Score Micro: 0.9135
	F1-Score Macro: 0.8313
	Weighted F1-Score: 0.9124
	Precision: 0.9155
	Recall: 0.9135
	Accuracy: 0.9135
	F1 Claim: 0.7797
	F1 Evidence 0.9191
	Classification Report:
              precision    recall  f1-score   support

           0       0.62      0.76      0.68       178
           1       0.90      0.77      0.83       499
           2       0.72      0.85      0.78      4713
           3       0.98      0.87      0.92     17039
           4       0.92      0.96      0.94     22082

    accuracy                           0.91     44511
   macro avg       0.83      0.84      0.83     44511
weighted avg       0.92      0.91      0.91     44511






Glaucoma Test Results
	Average loss: 924.9071
	F1-Score Micro: 0.9279
	F1-Score Macro: 0.8651
	Weighted F1-Score: 0.9283
	Precision: 0.9330
	Recall: 0.9279
	Accuracy: 0.9279
	F1 Claim: 0.8506
	F1 Evidence 0.9210
	Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.81      0.78       139
           1       0.92      0.74      0.82       461
           2       0.82      0.89      0.85      3871
           3       0.98      0.87      0.92     18912
           4       0.91      0.99      0.95     21421

    accuracy                           0.93     44804
   macro avg       0.88      0.86      0.87     44804
weighted avg       0.93      0.93      0.93     44804






Mixed Test Results
	Average loss: 965.9122
	F1-Score Micro: 0.9156
	F1-Score Macro: 0.8378
	Weighted F1-Score: 0.9155
	Precision: 0.9173
	Recall: 0.9156
	Accuracy: 0.9156
	F1 Claim: 0.7847
	F1 Evidence 0.9290
	Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.74      0.72       172
           1       0.89      0.76      0.82       435
           2       0.78      0.80      0.79      4706
           3       0.97      0.90      0.93     18127
           4       0.91      0.96      0.93     21980

    accuracy                           0.92     45420
   macro avg       0.85      0.83      0.84     45420
weighted avg       0.92      0.92      0.92     45420



In [52]:
#BioBERT training custom dataset

bio_bert_model_custom, results_custom_bio = train(bio_bert_model_custom,  "bio-bert_custom" , custom_train_dataloader_bio, custom_val_dataloader_bio, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED_CUSTOM_BIOBERT,seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 593.0507
	F1-Score Micro: 0.6951 | F1-Score Macro: 0.4124 | Weighted F1-Score: 0.732026 | Precision: 0.8004 | Recall: 0.6951 | Accuracy: 0.6951


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


Validation
	val_loss: 997.2362
	F1-Score Micro: 0.8867 | F1-Score Macro: 0.5156 | Weighted F1-Score: 0.8786 | Precision: 0.8762 | Recall: 0.8867 | Accuracy: 0.8867
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 473.1755
	F1-Score Micro: 0.7529 | F1-Score Macro: 0.4778 | Weighted F1-Score: 0.793638 | Precision: 0.8641 | Recall: 0.7529 | Accuracy: 0.7529




Validation
	val_loss: 818.9150
	F1-Score Micro: 0.9104 | F1-Score Macro: 0.7785 | Weighted F1-Score: 0.9115 | Precision: 0.9211 | Recall: 0.9104 | Accuracy: 0.9104
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 418.7474
	F1-Score Micro: 0.7732 | F1-Score Macro: 0.5198 | Weighted F1-Score: 0.815452 | Precision: 0.8890 | Recall: 0.7732 | Accuracy: 0.7732




Validation
	val_loss: 668.7815
	F1-Score Micro: 0.9280 | F1-Score Macro: 0.8570 | Weighted F1-Score: 0.9276 | Precision: 0.9289 | Recall: 0.9280 | Accuracy: 0.9280
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 374.4778
	F1-Score Micro: 0.7938 | F1-Score Macro: 0.5396 | Weighted F1-Score: 0.836513 | Precision: 0.9098 | Recall: 0.7938 | Accuracy: 0.7938




Validation
	val_loss: 803.3311
	F1-Score Micro: 0.9174 | F1-Score Macro: 0.8428 | Weighted F1-Score: 0.9181 | Precision: 0.9210 | Recall: 0.9174 | Accuracy: 0.9174
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 348.8136
	F1-Score Micro: 0.8337 | F1-Score Macro: 0.6234 | Weighted F1-Score: 0.870096 | Precision: 0.9263 | Recall: 0.8337 | Accuracy: 0.8337




Validation
	val_loss: 982.2916
	F1-Score Micro: 0.9136 | F1-Score Macro: 0.8336 | Weighted F1-Score: 0.9146 | Precision: 0.9202 | Recall: 0.9136 | Accuracy: 0.9136
****************************************************************************************************

Best Val F1-Score: 0.8570 at epoch 3
Saving model...
Saved!

Training completed.


In [53]:
evaluate(bio_bert_model_custom, custom_test_dataloader_bio, device, verbose=True);




Test Results
	Average loss: 960.0096
	F1-Score Micro: 0.9127
	F1-Score Macro: 0.8283
	Weighted F1-Score: 0.9124
	Precision: 0.9137
	Recall: 0.9127
	Accuracy: 0.9127
	F1 Claim: 0.7285
	F1 Evidence 0.9339
	Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.71      0.70       238
           1       0.88      0.79      0.84       586
           2       0.72      0.74      0.73      6451
           3       0.97      0.90      0.94     23253
           4       0.91      0.96      0.94     29724

    accuracy                           0.91     60252
   macro avg       0.84      0.82      0.83     60252
weighted avg       0.91      0.91      0.91     60252



#### SciBERT - Training

In [54]:
#SciBERT training original dataset
output_folder = "/content/drive/MyDrive/data/sci_bert_models"

sci_bert_model_original, results = train(scibert_model_original,  "sci-bert_original" , train_dataloader_sci, val_dataloader_sci, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED_SCIBERT,
                                seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 489.4452
	F1-Score Micro: 0.7250 | F1-Score Macro: 0.4169 | Weighted F1-Score: 0.746850 | Precision: 0.7854 | Recall: 0.7250 | Accuracy: 0.7250




Validation
	val_loss: 628.7805
	F1-Score Micro: 0.9212 | F1-Score Macro: 0.6688 | Weighted F1-Score: 0.9190 | Precision: 0.9212 | Recall: 0.9212 | Accuracy: 0.9212
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 380.9554
	F1-Score Micro: 0.7865 | F1-Score Macro: 0.5338 | Weighted F1-Score: 0.808755 | Precision: 0.8567 | Recall: 0.7865 | Accuracy: 0.7865




Validation
	val_loss: 578.4485
	F1-Score Micro: 0.9193 | F1-Score Macro: 0.8136 | Weighted F1-Score: 0.9195 | Precision: 0.9201 | Recall: 0.9193 | Accuracy: 0.9193
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 336.1370
	F1-Score Micro: 0.8294 | F1-Score Macro: 0.5985 | Weighted F1-Score: 0.848030 | Precision: 0.8918 | Recall: 0.8294 | Accuracy: 0.8294




Validation
	val_loss: 516.8951
	F1-Score Micro: 0.9192 | F1-Score Macro: 0.8304 | Weighted F1-Score: 0.9206 | Precision: 0.9260 | Recall: 0.9192 | Accuracy: 0.9192
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 319.5597
	F1-Score Micro: 0.8598 | F1-Score Macro: 0.6561 | Weighted F1-Score: 0.872633 | Precision: 0.9077 | Recall: 0.8598 | Accuracy: 0.8598




Validation
	val_loss: 557.6377
	F1-Score Micro: 0.9206 | F1-Score Macro: 0.8439 | Weighted F1-Score: 0.9215 | Precision: 0.9245 | Recall: 0.9206 | Accuracy: 0.9206
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 298.1745
	F1-Score Micro: 0.8746 | F1-Score Macro: 0.6827 | Weighted F1-Score: 0.885154 | Precision: 0.9173 | Recall: 0.8746 | Accuracy: 0.8746




Validation
	val_loss: 662.4119
	F1-Score Micro: 0.9157 | F1-Score Macro: 0.8387 | Weighted F1-Score: 0.9167 | Precision: 0.9198 | Recall: 0.9157 | Accuracy: 0.9157
****************************************************************************************************

Best Val F1-Score: 0.8439 at epoch 4
Saving model...
Saved!

Training completed.


In [55]:
evaluate(sci_bert_model_original, neoplasm_test_dataloader_sci, device, verbose=True, name='Neoplasm');
evaluate(sci_bert_model_original, glaucoma_test_dataloader_sci, device, verbose=True, name='Glaucoma');
evaluate(sci_bert_model_original, mixed_test_dataloader_sci, device, verbose=True, name='Mixed');




Neoplasm Test Results
	Average loss: 742.2750
	F1-Score Micro: 0.9101
	F1-Score Macro: 0.8374
	Weighted F1-Score: 0.9101
	Precision: 0.9109
	Recall: 0.9101
	Accuracy: 0.9101
	F1 Claim: 0.7946
	F1 Evidence 0.9156
	Classification Report:
              precision    recall  f1-score   support

           0       0.70      0.71      0.70       229
           1       0.87      0.79      0.83       473
           2       0.79      0.81      0.80      4890
           3       0.94      0.90      0.92     14970
           4       0.92      0.95      0.94     19138

    accuracy                           0.91     39700
   macro avg       0.84      0.83      0.84     39700
weighted avg       0.91      0.91      0.91     39700






Glaucoma Test Results
	Average loss: 736.6398
	F1-Score Micro: 0.9215
	F1-Score Macro: 0.8591
	Weighted F1-Score: 0.9222
	Precision: 0.9274
	Recall: 0.9215
	Accuracy: 0.9215
	F1 Claim: 0.8638
	F1 Evidence 0.9167
	Classification Report:
              precision    recall  f1-score   support

           0       0.80      0.75      0.77       182
           1       0.91      0.71      0.80       500
           2       0.88      0.85      0.87      4026
           3       0.97      0.88      0.92     17255
           4       0.89      0.99      0.94     17855

    accuracy                           0.92     39818
   macro avg       0.89      0.84      0.86     39818
weighted avg       0.92      0.92      0.92     39818






Mixed Test Results
	Average loss: 756.5417
	F1-Score Micro: 0.9068
	F1-Score Macro: 0.8323
	Weighted F1-Score: 0.9080
	Precision: 0.9108
	Recall: 0.9068
	Accuracy: 0.9068
	F1 Claim: 0.7799
	F1 Evidence 0.9213
	Classification Report:
              precision    recall  f1-score   support

           0       0.75      0.68      0.72       226
           1       0.86      0.77      0.81       429
           2       0.83      0.74      0.78      5099
           3       0.94      0.91      0.92     15770
           4       0.90      0.95      0.93     18522

    accuracy                           0.91     40046
   macro avg       0.86      0.81      0.83     40046
weighted avg       0.91      0.91      0.91     40046



In [56]:
#SciBERT training custom dataset

scibert_model_custom, results_custom = train(scibert_model_custom,  "sci-bert_custom" , custom_train_dataloader_sci, custom_val_dataloader_sci, learning_rate=LR, num_epochs=EPOCHS,class_weights = CLASS_WEIGHTS_NORMALIZED_CUSTOM_SCIBERT,seeds=seeds,save_model=True,models_folder=output_folder)


Training with seed 42...

Epoch 1/5




Training
	train_loss: 518.1846
	F1-Score Micro: 0.5868 | F1-Score Macro: 0.3825 | Weighted F1-Score: 0.659046 | Precision: 0.7594 | Recall: 0.5868 | Accuracy: 0.5868




Validation
	val_loss: 848.0863
	F1-Score Micro: 0.8779 | F1-Score Macro: 0.6415 | Weighted F1-Score: 0.8738 | Precision: 0.8805 | Recall: 0.8779 | Accuracy: 0.8779
****************************************************************************************************

Epoch 2/5




Training
	train_loss: 409.4875
	F1-Score Micro: 0.6363 | F1-Score Macro: 0.4440 | Weighted F1-Score: 0.713850 | Precision: 0.8328 | Recall: 0.6363 | Accuracy: 0.6363




Validation
	val_loss: 662.9297
	F1-Score Micro: 0.9085 | F1-Score Macro: 0.8069 | Weighted F1-Score: 0.9059 | Precision: 0.9105 | Recall: 0.9085 | Accuracy: 0.9085
****************************************************************************************************

Epoch 3/5




Training
	train_loss: 371.9703
	F1-Score Micro: 0.6562 | F1-Score Macro: 0.4680 | Weighted F1-Score: 0.735557 | Precision: 0.8597 | Recall: 0.6562 | Accuracy: 0.6562




Validation
	val_loss: 586.5672
	F1-Score Micro: 0.9169 | F1-Score Macro: 0.8284 | Weighted F1-Score: 0.9156 | Precision: 0.9181 | Recall: 0.9169 | Accuracy: 0.9169
****************************************************************************************************

Epoch 4/5




Training
	train_loss: 333.8136
	F1-Score Micro: 0.6926 | F1-Score Macro: 0.5099 | Weighted F1-Score: 0.770630 | Precision: 0.8923 | Recall: 0.6926 | Accuracy: 0.6926




Validation
	val_loss: 723.5990
	F1-Score Micro: 0.8933 | F1-Score Macro: 0.8194 | Weighted F1-Score: 0.8945 | Precision: 0.8965 | Recall: 0.8933 | Accuracy: 0.8933
****************************************************************************************************

Epoch 5/5




Training
	train_loss: 321.8989
	F1-Score Micro: 0.7016 | F1-Score Macro: 0.5219 | Weighted F1-Score: 0.780164 | Precision: 0.9042 | Recall: 0.7016 | Accuracy: 0.7016




Validation
	val_loss: 710.5429
	F1-Score Micro: 0.9113 | F1-Score Macro: 0.8420 | Weighted F1-Score: 0.9127 | Precision: 0.9199 | Recall: 0.9113 | Accuracy: 0.9113
****************************************************************************************************

Best Val F1-Score: 0.8420 at epoch 5
Saving model...
Saved!

Training completed.


In [57]:
evaluate(scibert_model_custom, custom_test_dataloader_sci, device, verbose=True);




Test Results
	Average loss: 692.0840
	F1-Score Micro: 0.9155
	F1-Score Macro: 0.8416
	Weighted F1-Score: 0.9165
	Precision: 0.9201
	Recall: 0.9155
	Accuracy: 0.9155
	F1 Claim: 0.7819
	F1 Evidence 0.9341
	Classification Report:
              precision    recall  f1-score   support

           0       0.76      0.69      0.72       307
           1       0.88      0.78      0.83       617
           2       0.82      0.75      0.78      6945
           3       0.97      0.91      0.94     21374
           4       0.90      0.97      0.94     24854

    accuracy                           0.92     54097
   macro avg       0.87      0.82      0.84     54097
weighted avg       0.92      0.92      0.91     54097



## Error Analysis

In [76]:
kgward = {'ac_detector':{
    'model_card': MODEL_CARD_SciBERT,
    'num_labels': NUM_LABELS,
    'gru_hidden_size': 128,
    'gru_num_layers': 1,
    'dropout_prob': 0.43}
}

sample_idx = 30
predicted_labels, predicted_tags = predict(scibert_model_custom, custom_test_df_scibert, custom_test_dataset_sci,sample_idx,idx_to_tag)


Example of an annotated sentence:

Raw text:
['proinflammatory', 'cytokines,', 'especially', 'tumour', 'necrosis', 'factor', 'alpha', '(tnf-alpha),', 'play', 'a', 'prominent', 'role', 'in', 'the', 'pathogenesis', 'of', 'cancer', 'cachexia.', 'thalidomide,', 'which', 'is', 'an', 'inhibitor', 'of', 'tnf-alpha', 'synthesis,', 'may', 'represent', 'a', 'novel', 'and', 'rational', 'approach', 'to', 'the', 'treatment', 'of', 'cancer', 'cachexia.', 'to', 'assess', 'the', 'safety', 'and', 'efficacy', 'of', 'thalidomide', 'in', 'attenuating', 'weight', 'loss', 'in', 'patients', 'with', 'cachexia', 'secondary', 'to', 'advanced', 'pancreatic', 'cancer.', 'fifty', 'patients', 'with', 'advanced', 'pancreatic', 'cancer', 'who', 'had', 'lost', 'at', 'least', '10%', 'of', 'their', 'body', 'weight', 'were', 'randomised', 'to', 'receive', 'thalidomide', '200', 'mg', 'daily', 'or', 'placebo', 'for', '24', 'weeks', 'in', 'a', 'single', 'centre,', 'double', 'blind,', 'randomised', 'controlled', 'trial.', '


Tokenized text with predicted tags:


The best performing model, SciBERT, was selected for the error analysis. As we can see the detection of the argumentative components is relatively well done, with the model being able to identify the majority of the claims and premises. However, there are still some errors, with the model sometimes failing to identify the correct component or assigning the wrong tag to a token. The most common errors are related to the detection of claims, this is due also to the less frequency in the abstracts. The detection of premises is generally more accurate, with the model being able to correctly identify the majority of the premises.