# BERT Meets Cranfield - Multilabel Training Approach
Each query-document judgment in the Cranfield dataset carries a graded label indicating how relevant the document is. The Cranfield description defines the grades as follows:
1.  References which are a complete answer to the question.

2.  References of a high degree of relevance, the lack of which either would have made the research impracticable or would have resulted in a considerable amount of extra work.

3.  References which were useful, either as general background to the work or as suggesting methods of tackling certain aspects of the work.

4.  References of minimum interest, for example, those that have been included from an historical viewpoint.

5.  References of no interest.

This notebook implements functions that investigate whether, and with which method, this graded relevance labeling can be put to beneficial use.

*NOTE: Not all changes are in the notebook; a number of them can be found in `utils.py`.*
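As a rough illustration of how the five grades above could be turned into training targets, the sketch below maps a grade to a one-hot vector. The helper name `grade_to_one_hot` and the grade-to-index mapping are assumptions made for this example; the encoding actually used in this project lives in `utils.get_binary_labels` and may differ.

```python
NUM_GRADES = 5  # the five relevance levels listed above


def grade_to_one_hot(grade, num_grades=NUM_GRADES):
    """Return a one-hot list for a grade in 1..num_grades (hypothetical helper)."""
    if not 1 <= grade <= num_grades:
        raise ValueError(f"grade must be in 1..{num_grades}, got {grade}")
    target = [0] * num_grades
    target[grade - 1] = 1  # assumed mapping: grade 1 -> index 0
    return target


print(grade_to_one_hot(1))  # [1, 0, 0, 0, 0]
print(grade_to_one_hot(3))  # [0, 0, 1, 0, 0]
```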

In [1]:
%cd /content/drive/MyDrive/COMPUTING SCIENCE/THESIS_PROJECT/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code
# %cd /home/jupyter/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code

/content/drive/MyDrive/COMPUTING SCIENCE/THESIS_PROJECT/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
!pip3 install -r ../requirements.txt



## Import

In [4]:
import utils
import data_utils
from operator import itemgetter
import os
import numpy as np

import torch
import importlib
# from transformers import BertForSequenceClassification, BertTokenizer, BertForMaskedLM, BertForNextSentencePrediction
from transformers import BertForSequenceClassification

import timeit

### Import Refresh
When a supporting Python file (such as `utils.py`) is changed, this cell reloads that module without restarting the entire notebook.

In [5]:
# call after making any changes in utils.py
importlib.reload(utils) 
importlib.reload(data_utils)

<module 'data_utils' from '/content/drive/My Drive/COMPUTING SCIENCE/THESIS_PROJECT/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code/data_utils.py'>
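The effect of `importlib.reload` can be shown in isolation with a throwaway module. This is a minimal sketch; the module name `demo_helper` and its `VALUE` attribute are made up for the demonstration and are not part of this project.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of cached bytecode

with tempfile.TemporaryDirectory() as tmp_dir:
    module_path = os.path.join(tmp_dir, "demo_helper.py")  # hypothetical module
    with open(module_path, "w") as handle:
        handle.write("VALUE = 1\n")

    sys.path.insert(0, tmp_dir)
    try:
        importlib.invalidate_caches()
        import demo_helper
        before = demo_helper.VALUE

        # simulate editing the file on disk, then reload it in place
        with open(module_path, "w") as handle:
            handle.write("VALUE = 22\n")
        importlib.reload(demo_helper)
        after = demo_helper.VALUE
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("demo_helper", None)

print(before, after)  # 1 22
```

Note that names bound earlier with `from module import name` keep pointing at the old objects after a reload, which is one reason to call helpers through the module, as in `utils.training(...)`.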

## Set hyper-parameters and test settings

In [6]:
# ========================================
#               Hyper-Parameters
# ========================================
SEED = 76
MODE = 'Re-ranker'
MODEL_TYPE = 'bert-base-uncased'
LEARNING_RATE = 2e-5
MAX_LENGTH = 128
BATCH_SIZE = 32
EPOCHS = 1
TOP_BM25 = 100
MAP_CUT = 100
NDCG_CUT = 20
if MODE == 'Full-ranker':
    TEST_BATCH_SIZE = 1400
else:
    TEST_BATCH_SIZE = 100

# Set the seed value all over the place to make this reproducible.
utils.initialize_random_generators(SEED)

BM25_ENRICH = 'default' # or 'add' or 'swap' (default=no enrichment of BM25 results)
MULTI_LABEL = True
PER_LABEL_TESTING = False # if True, ARG_MAX_SORTING must be False
'''
  PER_LABEL_TESTING calculates the performance per predicted label. Contrary to
  the binary case, each of the relevance levels 0-5 has a prediction. The method can be
  seen as an alternative to the arg-max method. This flag was implemented later,
  so it disrupts the bookkeeping for the calculation of the final NDCG.
  To work around this, the NDCG found for each fold and each label has to be averaged.
'''
ARG_MAX_SORTING = True # if True, PER_LABEL_TESTING must be False
CUSTOM_MODEL = 'weighted-BCEWIthLogitsLoss' # default is None, utils.py explains

LOAD_CUSTOM_TRAINED_MODEL = False # if True, load the encoder stored at custom_model_path (used in transfer learning)
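To make the arg-max option more concrete, here is one plausible reading of arg-max sorting: each document gets the label with the highest score, documents are ordered by that label, and ties are broken by the score of the winning label. The function `rank_by_argmax`, the toy scores, and the convention that label index 0 is the most relevant grade are all assumptions; the behaviour actually used is defined in `utils.testing`.

```python
def rank_by_argmax(doc_scores):
    """Rank doc ids by predicted label, then by confidence (illustrative only).

    doc_scores maps a doc id to a list of per-label scores, where index 0 is
    assumed to be the most relevant grade.
    """
    def sort_key(item):
        _, scores = item
        best_label = max(range(len(scores)), key=scores.__getitem__)
        # lower label index first, then higher score for that label first
        return (best_label, -scores[best_label])

    return [doc_id for doc_id, _ in sorted(doc_scores.items(), key=sort_key)]


toy_scores = {
    "d1": [0.1, 0.7, 0.2],
    "d2": [0.8, 0.1, 0.1],
    "d3": [0.2, 0.9, 0.1],
}
ranking = rank_by_argmax(toy_scores)
print(ranking)  # ['d2', 'd3', 'd1']
```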

In [7]:
models_dir = "/home/jupyter/BERT-BM25-Thesis-Project/Models/" #@param {type:"string"}
custom_model_name = "BERT_Cranfield_MLM_model-128-16-5e-05-2.bin" #@param {type:"string"}

custom_model_path = models_dir + custom_model_name 

### Enriching function for BM25 results

In [8]:
def get_bm25_plus_other_rel(bm25_top_n, labels, queries):
      bm25_top_n_rel_padded = [0]*len(queries) # a bm25_top_n list padded with the remaining relevant documents
      bm25_top_n_swap = [0]*len(queries) # a bm25_top_n list with the lowest-ranked entries swapped for missing relevant documents
    
      for qi in range(len(queries)):
        # get the list of relevant documents
        lbi = np.where(labels[qi] == 1)
        # note this numbering is only compatible with the labels list


        # get the list of bm25_top_n
        np_bm25_qi_docs = np.array(bm25_top_n[qi]) 

        # evaluate what relevant documents should be added
        pad_rel = np.setdiff1d(lbi, np_bm25_qi_docs)
        # if len(pad_rel) > 0:
        pad_rel = tuple(pad_rel)
        bm25_top_n_rel_padded[qi] = bm25_top_n[qi] + pad_rel
        # create a list with least relevant items swapped for unfound relevant
        for i in range(len(pad_rel)):
          # CHECK
          # are we to swap a relevant document?
          current_doc = np_bm25_qi_docs[-(i+1)] 
          
          if np.count_nonzero(current_doc == lbi) > 0:
            print('Relevant doc overwritten!')
          # CONTINUE  
          np_bm25_qi_docs[-(i+1)] = pad_rel[i]
          
        bm25_top_n_swap[qi] = np_bm25_qi_docs
      return bm25_top_n_rel_padded, bm25_top_n_swap
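The difference between the `'add'` and `'swap'` enrichment modes is easiest to see on a toy query. The sketch below mirrors the logic of the function above in plain Python for a single query; the helper name `enrich_single_query` and the document ids are invented for illustration.

```python
def enrich_single_query(top_n, relevant):
    """Return the (added, swapped) variants of a BM25 top-n list for one query.

    top_n: tuple of doc ids ranked by BM25 (best first).
    relevant: set of doc ids judged relevant for the query.
    """
    # relevant documents that BM25 did not retrieve, in sorted order
    missing = tuple(sorted(set(relevant) - set(top_n)))

    # 'add': append the missing relevant documents to the list
    added = tuple(top_n) + missing

    # 'swap': overwrite the lowest-ranked entries, starting from the end
    swapped = list(top_n)
    for i, doc_id in enumerate(missing):
        swapped[-(i + 1)] = doc_id
    return added, tuple(swapped)


added, swapped = enrich_single_query((10, 11, 12, 13), {11, 20, 21})
print(added)    # (10, 11, 12, 13, 20, 21)
print(swapped)  # (10, 11, 21, 20)
```

In `'swap'` mode the list length stays fixed, so a relevant document that BM25 ranked near the bottom can be overwritten; the original function prints a warning when that happens.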

### Function for loading custom model
Strictly speaking, this loads an encoder that was trained under a specific configuration.

In [9]:
def load_specific_encoder(model_path, do_freezing=False):
  '''
    Load saved encoder parameters into a fresh classification model.

    Use this function to start every fold with a fresh model.
    Set do_freezing=True to freeze the encoder parameters.
  '''
  model = BertForSequenceClassification.from_pretrained(
        MODEL_TYPE,
        num_labels=2,
        output_attentions=False,
        output_hidden_states=False,
    )
  model.cuda()  # move the model to the GPU
  print('LOAD : ', model_path )

  # =======================
  # NOTE WHAT MODEL IS USED
  # strict=False: keys that do not match (e.g. the classification head) are skipped
  model.load_state_dict(torch.load(model_path), strict=False)
  # now you get a warning that extra training is required

  if do_freezing:
    print('FREEZING: set requires_grad to False')
    # freeze the encoder parameters (credits thomwolf of Huggingface)
    # for param in model.bert.encoder.parameters():
    #   param.requires_grad = False

    # other method
    model.bert.encoder.requires_grad_(False)
  return model

## Train and Test

In [10]:
# if __name__ == "__main__":
def train_test():
    print("# ========================================")
    print("#               Hyper-Parameters")
    print(MODE)
    print(MODEL_TYPE)
    print(LEARNING_RATE)
    print(MAX_LENGTH)
    print(BATCH_SIZE)
    print(EPOCHS)
    print("# ========================================")
    print("#               Experiment-Settings")
    print('BM25_ENRICHMENT:   ', BM25_ENRICH)
    print('MULTI_LABEL:       ', MULTI_LABEL)
    print('ARGMAX-SORTING:    ', ARG_MAX_SORTING)
    print('PER_LABEL_TESTING: ', PER_LABEL_TESTING)
    print('CUSTOM_MODEL:      ', CUSTOM_MODEL)


    print("# ========================================")
    print("#               Other")
    print(torch.cuda.get_device_name())
    print("# ========================================")
    
    start = timeit.default_timer()
    
    device = utils.get_gpu_device()
    if not os.path.exists('../Output_Folder'):
        os.makedirs('../Output_Folder')

    queries = data_utils.get_queries('../Data/cran/cran.qry')
    corpus = data_utils.get_corpus('../Data/cran/cran.all.1400')
    rel_fed = data_utils.get_judgments('../Data/cran/cranqrel')

    labels = utils.get_binary_labels(rel_fed, multilabel=MULTI_LABEL)
    tokenized_corpus = [doc.split(" ") for doc in corpus]
    tokenized_queries = [query.split(" ") for query in queries]

    bm25, bm25_top_n = utils.get_bm25_top_results(tokenized_corpus, tokenized_queries, TOP_BM25)

    # no matter what BM25_ENRICH is, this line is needed to get `temp_feedback` for the test set
    padded_all, attention_mask_all, token_type_ids_all, temp_feedback = utils.bert_tokenizer(MODE, bm25_top_n, corpus,
                                                                                             labels, queries,
                                                                                             MAX_LENGTH, MODEL_TYPE)
    if BM25_ENRICH == 'swap':
        bm25_top_n_ext, bm25_top_n_swap = get_bm25_plus_other_rel(bm25_top_n, labels, queries)
        padded_all_swap, attention_mask_all_swap, token_type_ids_all_swap, temp_feedback_swap = utils.bert_tokenizer(MODE, bm25_top_n_swap, corpus,
                                                                                                                     labels, queries,
                                                                                                                     MAX_LENGTH, MODEL_TYPE)
    elif BM25_ENRICH == 'add':
        bm25_top_n_add, bm25_top_n_swap = get_bm25_plus_other_rel(bm25_top_n, labels, queries)
        padded_all_add, attention_mask_all_add, token_type_ids_all_add, temp_feedback_add = utils.bert_tokenizer(MODE, bm25_top_n_add, corpus,
                                                                                                                 labels, queries,
                                                                                                                 MAX_LENGTH, MODEL_TYPE)

    # ========================================
    #               Folds
    # ========================================
    mrr_bm25_list, map_bm25_list, ndcg_bm25_list = [], [], []
    mrr_bert_list, map_bert_list, ndcg_bert_list = [], [], []
    mrr_bm25, map_bm25, ndcg_bm25 = 0, 0, 0
    mrr_bert, map_bert, ndcg_bert = 0, 0, 0

    for fold_number in range(1, 6):
        print('======== Fold {:} / {:} ========'.format(fold_number, 5))
        train_index, test_index = data_utils.load_fold(fold_number)

        padded, attention_mask, token_type_ids = [], [], []
        if MODE == 'Re-ranker':
            # no matter BM25_ENRICH-mode, next line required for test set construction
            padded, attention_mask, token_type_ids = padded_all, attention_mask_all, token_type_ids_all
            if BM25_ENRICH == 'swap':
                padded_swap, attention_mask_swap, token_type_ids_swap = padded_all_swap, attention_mask_all_swap, token_type_ids_all_swap
            elif BM25_ENRICH == 'add':
                padded_add, attention_mask_add, token_type_ids_add = padded_all_add, attention_mask_all_add, token_type_ids_all_add
            
        else:
            temp_feedback = []
            for query_num in range(0, len(bm25_top_n)):
                if query_num in test_index:
                    doc_nums = range(0, 1400)
                else:
                    doc_nums = bm25_top_n[query_num]
                padded.append(list(itemgetter(*doc_nums)(padded_all[query_num])))
                attention_mask.append(list(itemgetter(*doc_nums)(attention_mask_all[query_num])))
                token_type_ids.append(list(itemgetter(*doc_nums)(token_type_ids_all[query_num])))
                temp_feedback.append(list(itemgetter(*doc_nums)(labels[query_num])))

        # Enrich the training set (or keep default)
        if BM25_ENRICH == 'default':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded, attention_mask, token_type_ids,
                                                          temp_feedback)
        elif BM25_ENRICH == 'swap':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded_swap, attention_mask_swap, token_type_ids_swap,
                                                    temp_feedback_swap)
        elif BM25_ENRICH == 'add':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded_add, attention_mask_add, token_type_ids_add,
                                                    temp_feedback_add)

        test_dataset = data_utils.get_tensor_dataset(test_index, padded, attention_mask, token_type_ids, temp_feedback)

        mrr_bm25, map_bm25, ndcg_bm25, mrr_bm25_list, map_bm25_list, ndcg_bm25_list = utils.get_bm25_results(
            mrr_bm25_list, map_bm25_list, ndcg_bm25_list, test_index, tokenized_queries, bm25, mrr_bm25, map_bm25,
            ndcg_bm25, rel_fed, fold_number, MAP_CUT, NDCG_CUT)

          
        # Option to load a custom trained model (used in transfer learning)
        if LOAD_CUSTOM_TRAINED_MODEL:
          model = load_specific_encoder(custom_model_path)
        else:
          model = None
          # with None the model_preparation loads the 'MODEL_TYPE' model
        if MULTI_LABEL:
          num_labels = 5
        else:
          num_labels = 2
        train_dataloader, test_dataloader, model, optimizer, scheduler = utils.model_preparation(MODEL_TYPE, train_dataset,
                                                                                                 test_dataset,
                                                                                                 BATCH_SIZE, TEST_BATCH_SIZE,
                                                                                                 LEARNING_RATE, EPOCHS, model=model,
                                                                                                 num_labels=num_labels,
                                                                                                 custom_model=CUSTOM_MODEL)


        # ========================================
        #               Training Loop
        # ========================================
        epochs_train_loss, epochs_val_loss = [], []
        for epoch_i in range(0, EPOCHS):
            # ========================================
            #               Training
            # ========================================
            print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, EPOCHS))
            print('Training...')
            model, optimizer, scheduler = utils.training(model, train_dataloader, device, optimizer, scheduler)
        # ========================================
        #               Testing
        # ========================================
        print('Testing...')
        mrr_bert, map_bert, ndcg_bert, mrr_bert_list, map_bert_list, ndcg_bert_list = utils.testing(MODE, model,
                                                                                                    test_dataloader,
                                                                                                    device, test_index,
                                                                                                    bm25_top_n,
                                                                                                    mrr_bert_list,
                                                                                                    map_bert_list,
                                                                                                    ndcg_bert_list,
                                                                                                    mrr_bert, map_bert,
                                                                                                    ndcg_bert, rel_fed,
                                                                                                    fold_number,
                                                                                                    MAP_CUT, NDCG_CUT,
                                                                                                    multilabel=MULTI_LABEL,
                                                                                                    argmax_sorting=ARG_MAX_SORTING,
                                                                                                    per_label_testing=PER_LABEL_TESTING)
    print("  BM25 MRR:  " + "{:.4f}".format(mrr_bm25 / 5))
    print("  BM25 MAP:  " + "{:.4f}".format(map_bm25 / 5))
    print("  BM25 NDCG: " + "{:.4f}".format(ndcg_bm25 / 5))

    print("  BERT MRR:  " + "{:.4f}".format(mrr_bert / 5))
    print("  BERT MAP:  " + "{:.4f}".format(map_bert / 5))
    print("  BERT NDCG: " + "{:.4f}".format(ndcg_bert / 5))

    utils.t_test(mrr_bm25_list, mrr_bert_list, 'MRR')
    utils.t_test(map_bm25_list, map_bert_list, 'MAP')
    utils.t_test(ndcg_bm25_list, ndcg_bert_list, 'NDCG')
    
    stop = timeit.default_timer()
    wall_time = (stop - start) / 60 

    print('Time: ', wall_time, ' min') 

    # utils.results_to_csv('./mrr_bm25_list.csv', mrr_bm25_list)
    # utils.results_to_csv('./mrr_bert_list.csv', mrr_bert_list)
    # utils.results_to_csv('./map_bm25_list.csv', map_bm25_list)
    # utils.results_to_csv('./map_bert_list.csv', map_bert_list)
    # utils.results_to_csv('./ndcg_bm25_list.csv', ndcg_bm25_list)
    # utils.results_to_csv('./ndcg_bert_list.csv', ndcg_bert_list)
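For readers unfamiliar with the reported metrics, the following is a minimal sketch of NDCG at a cutoff (compare `NDCG_CUT`) and of the reciprocal rank that MRR averages over queries. The function names are made up, and the ideal ranking is computed only from the gains in the ranked list, which is a simplification; the formulas actually used are in `utils.py` and may differ.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k gains (rank 1 has discount 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG normalised by the DCG of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def reciprocal_rank(gains):
    """1 / rank of the first document with a positive gain, or 0.0 if none."""
    for rank, gain in enumerate(gains, start=1):
        if gain > 0:
            return 1.0 / rank
    return 0.0


# gains of the documents in ranked order for one toy query
toy_gains = [0, 3, 1]
print(ndcg_at_k(toy_gains, k=3))
print(reciprocal_rank(toy_gains))  # 0.5
```

The final numbers printed by `train_test` are the per-fold values summed and divided by the five folds.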

In [11]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     True
PER_LABEL_TESTING:  False
CUSTOM_MODEL:       weighted-BCEWIthLogitsLoss
#               Other
Tesla K80
GPU Type: Tesla K80


  "Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 "


MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultilabelSequenceClassification: ['cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultilabelSequenceClassification were not 

Training...
custom model, with loss type:  weighted-BCEWIthLogitsLoss

KeyboardInterrupt: ignored