# BERT Meets Cranfield - Multilabel Training Approach
The Cranfield relevance judgments assign each query-document pair one of several labels that indicate how relevant the document is. The Cranfield collection description defines them as follows:
1.  References which are a complete answer to the question.

2.  References of a high degree of relevance, the lack of which either would have made the research impracticable or would have resulted in a considerable amount of extra work.

3.  References which were useful, either as general background to the work or as suggesting methods of tackling certain aspects of the work.

4.  References of minimum interest, for example, those that have been included from an historical viewpoint.

5.  References of no interest.

This notebook implements functions that investigate whether, and which, methods can make beneficial use of these graded relevance labels.
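With `MULTI_LABEL = True`, `train_test` (further down) builds a classifier with five outputs (`num_labels = 5`), one per relevance level. The sketch below shows one way a level could be turned into a target vector for such a head. It assumes that level `k` maps to output index `k - 1`; `utils.get_binary_labels` is not shown in this notebook, so its actual encoding may differ, and the helper `relevance_to_target` is hypothetical.

```python
import numpy as np

NUM_LEVELS = 5  # the five Cranfield relevance levels listed above

def relevance_to_target(level):
    """Hypothetical helper: map a relevance level (1-5) to a one-hot target vector.

    Assumes level k corresponds to output index k - 1; the encoding used in
    utils.get_binary_labels may differ.
    """
    if not 1 <= level <= NUM_LEVELS:
        raise ValueError(f"relevance level must be in 1..{NUM_LEVELS}, got {level}")
    return np.eye(NUM_LEVELS, dtype=np.float32)[level - 1]

# Targets for three judged documents of one query: levels 2, 5 and 1
targets = np.stack([relevance_to_target(lv) for lv in (2, 5, 1)])
print(targets)
```

Float targets with one column per output are the shape that `torch.nn.BCEWithLogitsLoss` expects, which fits the BCE-based `CUSTOM_MODEL` setting used below.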

*NOTE: Not all changes are in this notebook; several of them are in `utils.py`.*

In [1]:
# %cd /content/drive/MyDrive/COMPUTING SCIENCE/THESIS_PROJECT/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code
%cd /home/jupyter/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code

/home/jupyter/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code


In [2]:
# from google.colab import drive
# drive.mount('/content/drive')

In [3]:
!pip3 install -r ../requirements.txt



## Import

In [4]:
import utils
import data_utils
from operator import itemgetter
import os
import numpy as np

import torch
import importlib
# from transformers import BertForSequenceClassification, BertTokenizer, BertForMaskedLM, BertForNextSentencePrediction
from transformers import BertForSequenceClassification

import timeit

### Import Refresh
When a supporting module (such as `utils.py`) changes, this cell reloads it without restarting the entire notebook.

In [5]:
# call after making any changes in utils.py
importlib.reload(utils) 
importlib.reload(data_utils)

<module 'data_utils' from '/home/jupyter/BERT-BM25-Thesis-Project/bert-meets-cranfield-multilabel/Code/data_utils.py'>

## Set hyper-parameters and test settings

In [6]:
# ========================================
#               Hyper-Parameters
# ========================================
SEED = 76
MODE = 'Re-ranker'
MODEL_TYPE = 'bert-base-uncased'
LEARNING_RATE = 2e-5
MAX_LENGTH = 128
BATCH_SIZE = 32
EPOCHS = 1
TOP_BM25 = 100
MAP_CUT = 100
NDCG_CUT = 20
if MODE == 'Full-ranker':
    TEST_BATCH_SIZE = 1400
else:
    TEST_BATCH_SIZE = 100

# Set the seed value all over the place to make this reproducible.
utils.initialize_random_generators(SEED)

BM25_ENRICH = 'default' # or 'add' or 'swap' (default=no enrichment of BM25 results)
MULTI_LABEL = True
PER_LABEL_TESTING = False # if True, ARG_MAX_SORTING must be False
'''
  PER_LABEL_TESTING calculates the performance per predicted label. Contrary to
  the binary case, each relevance level has its own prediction. This method can be
  seen as an alternative to the arg-max method. The flag was added later, so it
  interferes with the bookkeeping used to calculate the final NDCG.
  To circumvent this, the NDCG found for each fold and each label has to be averaged.
'''
ARG_MAX_SORTING = True # if True, PER_LABEL_TESTING must be False
CUSTOM_MODEL = 'weighted-BCEWIthLogitsLoss' # default is None; see utils.py for the options

LOAD_CUSTOM_TRAINED_MODEL = False # if True, every fold starts from the encoder at custom_model_path
DO_FREEZING = False # used by load_specific_encoder: if True, the loaded encoder is frozen
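`CUSTOM_MODEL = 'weighted-BCEWIthLogitsLoss'` selects a weighted binary cross-entropy loss whose details live in `utils.py`, which is not shown here. One common way to weight `torch.nn.BCEWithLogitsLoss` is its `pos_weight` argument, set per output to the ratio of negative to positive examples so that rare relevance levels contribute more to the loss. The sketch below assumes that scheme; the weighting actually used in `utils.py` may differ, and `compute_pos_weight` is a hypothetical helper.

```python
import torch

def compute_pos_weight(targets):
    """Hypothetical helper: per-label weight = (#negatives / #positives).

    Labels without any positive example get a count of 1 to avoid division by zero.
    """
    positives = targets.sum(dim=0)
    negatives = targets.shape[0] - positives
    return negatives / positives.clamp(min=1.0)

# Four training pairs, five relevance levels, one-hot targets
targets = torch.tensor([
    [1., 0., 0., 0., 0.],
    [0., 1., 0., 0., 0.],
    [1., 0., 0., 0., 0.],
    [0., 0., 0., 0., 1.],
])
pos_weight = compute_pos_weight(targets)
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.zeros_like(targets)  # stand-in for the model outputs
loss = loss_fn(logits, targets)
print(pos_weight, loss.item())
```

The clamp keeps the weights finite for levels that do not occur in a fold's training set; whether such levels should instead be dropped is a design choice the sketch does not settle.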

In [7]:
models_dir = "/home/jupyter/BERT-BM25-Thesis-Project/Models/" #@param {type:"string"}
custom_model_name = "BERT_Cranfield_MLM_model-128-16-5e-05-2.bin" #@param {type:"string"}

custom_model_path = models_dir + custom_model_name 

### Enriching function for BM25 results

In [8]:
def get_bm25_plus_other_rel(bm25_top_n, labels, queries):
    '''
      Returns two enriched versions of the BM25 top-n lists:
        - bm25_top_n_rel_padded: top-n padded with the relevant documents BM25 missed ('add')
        - bm25_top_n_swap: the lowest-ranked documents swapped for the missed relevant ones ('swap')
    '''
    bm25_top_n_rel_padded = [0] * len(queries)
    bm25_top_n_swap = [0] * len(queries)

    for qi in range(len(queries)):
        # indices of the relevant documents for query qi
        # note: this numbering is only compatible with the labels list
        lbi = np.where(labels[qi] == 1)[0]

        # the BM25 top-n documents for query qi
        np_bm25_qi_docs = np.array(bm25_top_n[qi])

        # relevant documents that BM25 did not retrieve
        pad_rel = tuple(np.setdiff1d(lbi, np_bm25_qi_docs))
        bm25_top_n_rel_padded[qi] = bm25_top_n[qi] + pad_rel

        # replace the lowest-ranked documents with the missed relevant ones
        for i in range(len(pad_rel)):
            current_doc = np_bm25_qi_docs[-(i + 1)]
            # warn if a relevant document is about to be overwritten
            if np.isin(current_doc, lbi):
                print('Relevant doc overwritten!')
            np_bm25_qi_docs[-(i + 1)] = pad_rel[i]

        bm25_top_n_swap[qi] = np_bm25_qi_docs
    return bm25_top_n_rel_padded, bm25_top_n_swap
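The two enrichment modes can be illustrated on a toy query. The sketch below re-implements the per-query logic of `get_bm25_plus_other_rel` with plain NumPy arrays instead of the `bm25_top_n` and `labels` structures; the helper name `enrich_top_n` and the document ids are hypothetical. 'add' appends the relevant documents BM25 missed, while 'swap' overwrites the lowest-ranked documents with them and keeps the list length fixed.

```python
import numpy as np

def enrich_top_n(top_n, relevant):
    """Hypothetical helper mirroring get_bm25_plus_other_rel for a single query.

    Returns (added, swapped):
      added   - the top-n list with the missed relevant documents appended ('add')
      swapped - the top-n list with its last entries replaced by them ('swap')
    """
    top_n = np.asarray(top_n)
    missed = np.setdiff1d(relevant, top_n)  # relevant docs BM25 did not retrieve (sorted)
    added = np.concatenate([top_n, missed])
    swapped = top_n.copy()
    k = len(missed)
    if k > 0:
        # same order as the loop above: missed[0] goes into the last slot
        swapped[-k:] = missed[::-1]
    return added, swapped

added, swapped = enrich_top_n(top_n=[4, 7, 1, 9, 3], relevant=[1, 2, 8])
print(added)    # [4 7 1 9 3 2 8]
print(swapped)  # [4 7 1 8 2]
```

'swap' can push out a relevant document that BM25 did retrieve when it sits among the last `k` positions, which is why the notebook function prints a warning in that case.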

### Function for loading custom model
Loads an encoder whose weights were trained separately, with a specific setup (the file at `custom_model_path`).

In [9]:
def load_specific_encoder(model_path):
    '''
      Load saved encoder parameters into a fresh BertForSequenceClassification.

      Use this function to start every fold with a fresh model.
    '''
    model = BertForSequenceClassification.from_pretrained(
        MODEL_TYPE,
        num_labels=2,
        output_attentions=False,
        output_hidden_states=False,
    )
    model.cuda()  # move the model to the GPU
    print('LOAD : ', model_path)

    # =======================
    # NOTE WHAT MODEL IS USED
    # map_location='cpu' allows a checkpoint saved on a GPU to be loaded on any machine;
    # strict=False ignores keys that do not match (e.g. the classification head)
    model.load_state_dict(torch.load(model_path, map_location='cpu'), strict=False)
    # the newly initialized classification head still requires extra training

    if DO_FREEZING:
        print('FREEZING: set requires_grad to False')
        # freeze the encoder parameters (credits thomwolf of Huggingface)
        # for param in model.bert.encoder.parameters():
        #     param.requires_grad = False

        # equivalent, shorter method
        model.bert.encoder.requires_grad_(False)
    return model
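`requires_grad_(False)` is a standard `torch.nn.Module` method that switches off gradients for every parameter of a submodule. The sketch below shows the effect on a small stand-in model (the `TinyClassifier` class is hypothetical, not BERT) by counting trainable parameters before and after freezing its encoder. In a Hugging Face `BertForSequenceClassification`, freezing only `model.bert.encoder` leaves the embeddings, the pooler and the classification head trainable.

```python
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Hypothetical stand-in with an 'encoder' and a classification head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)     # 64 weights + 8 biases
        self.classifier = nn.Linear(8, 2)  # 16 weights + 2 biases

    def forward(self, x):
        return self.classifier(self.encoder(x).relu())

def count_trainable(module):
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

model = TinyClassifier()
before = count_trainable(model)
model.encoder.requires_grad_(False)  # same call as in load_specific_encoder
after = count_trainable(model)
print(before, after)  # 90 18
```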

## Train and Test

In [10]:
# if __name__ == "__main__":
def train_test():
    print("# ========================================")
    print("#               Hyper-Parameters")
    print(MODE)
    print(MODEL_TYPE)
    print(LEARNING_RATE)
    print(MAX_LENGTH)
    print(BATCH_SIZE)
    print(EPOCHS)
    print("# ========================================")
    print("#               Experiment-Settings")
    print('BM25_ENRICHMENT:   ', BM25_ENRICH)
    print('MULTI_LABEL:       ', MULTI_LABEL)
    print('ARGMAX-SORTING:    ', ARG_MAX_SORTING)
    print('PER_LABEL_TESTING: ', PER_LABEL_TESTING)
    print('CUSTOM_MODEL:      ', CUSTOM_MODEL)


    print("# ========================================")
    print("#               Other")
    print(torch.cuda.get_device_name())
    print("# ========================================")
    
    start = timeit.default_timer()
    
    device = utils.get_gpu_device()
    if not os.path.exists('../Output_Folder'):
        os.makedirs('../Output_Folder')

    queries = data_utils.get_queries('../Data/cran/cran.qry')
    corpus = data_utils.get_corpus('../Data/cran/cran.all.1400')
    rel_fed = data_utils.get_judgments('../Data/cran/cranqrel')

    labels = utils.get_binary_labels(rel_fed, multilabel=MULTI_LABEL)
    tokenized_corpus = [doc.split(" ") for doc in corpus]
    tokenized_queries = [query.split(" ") for query in queries]

    bm25, bm25_top_n = utils.get_bm25_top_results(tokenized_corpus, tokenized_queries, TOP_BM25)

    # no matter what BM25_ENRICH is, this line is needed to get `temp_feedback` for the test set
    padded_all, attention_mask_all, token_type_ids_all, temp_feedback = utils.bert_tokenizer(MODE, bm25_top_n, corpus,
                                                                                             labels, queries,
                                                                                             MAX_LENGTH, MODEL_TYPE)
    if BM25_ENRICH == 'swap':
        bm25_top_n_ext, bm25_top_n_swap = get_bm25_plus_other_rel(bm25_top_n, labels, queries)
        padded_all_swap, attention_mask_all_swap, token_type_ids_all_swap, temp_feedback_swap = utils.bert_tokenizer(MODE, bm25_top_n_swap, corpus,
                                                                                                                     labels, queries,
                                                                                                                     MAX_LENGTH, MODEL_TYPE)
    elif BM25_ENRICH == 'add':
        bm25_top_n_add, bm25_top_n_swap = get_bm25_plus_other_rel(bm25_top_n, labels, queries)
        padded_all_add, attention_mask_all_add, token_type_ids_all_add, temp_feedback_add = utils.bert_tokenizer(MODE, bm25_top_n_add, corpus,
                                                                                                                 labels, queries,
                                                                                                                 MAX_LENGTH, MODEL_TYPE)

    # ========================================
    #               Folds
    # ========================================
    mrr_bm25_list, map_bm25_list, ndcg_bm25_list = [], [], []
    mrr_bert_list, map_bert_list, ndcg_bert_list = [], [], []
    mrr_bm25, map_bm25, ndcg_bm25 = 0, 0, 0
    mrr_bert, map_bert, ndcg_bert = 0, 0, 0

    for fold_number in range(1, 6):
        print('======== Fold {:} / {:} ========'.format(fold_number, 5))
        train_index, test_index = data_utils.load_fold(fold_number)

        padded, attention_mask, token_type_ids = [], [], []
        if MODE == 'Re-ranker':
            # regardless of BM25_ENRICH, the next line is required to construct the test set
            padded, attention_mask, token_type_ids = padded_all, attention_mask_all, token_type_ids_all
            if BM25_ENRICH == 'swap':
                padded_swap, attention_mask_swap, token_type_ids_swap = padded_all_swap, attention_mask_all_swap, token_type_ids_all_swap
            elif BM25_ENRICH == 'add':
                padded_add, attention_mask_add, token_type_ids_add = padded_all_add, attention_mask_all_add, token_type_ids_all_add
            
        else:
            temp_feedback = []
            for query_num in range(0, len(bm25_top_n)):
                if query_num in test_index:
                    doc_nums = range(0, 1400)
                else:
                    doc_nums = bm25_top_n[query_num]
                padded.append(list(itemgetter(*doc_nums)(padded_all[query_num])))
                attention_mask.append(list(itemgetter(*doc_nums)(attention_mask_all[query_num])))
                token_type_ids.append(list(itemgetter(*doc_nums)(token_type_ids_all[query_num])))
                temp_feedback.append(list(itemgetter(*doc_nums)(labels[query_num])))

        # Enrich the training set (or keep the default)
        if BM25_ENRICH == 'default':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded, attention_mask, token_type_ids,
                                                          temp_feedback)
        elif BM25_ENRICH == 'swap':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded_swap, attention_mask_swap, token_type_ids_swap,
                                                    temp_feedback_swap)
        elif BM25_ENRICH == 'add':
            train_dataset = data_utils.get_tensor_dataset(train_index, padded_add, attention_mask_add, token_type_ids_add,
                                                    temp_feedback_add)

        test_dataset = data_utils.get_tensor_dataset(test_index, padded, attention_mask, token_type_ids, temp_feedback)

        mrr_bm25, map_bm25, ndcg_bm25, mrr_bm25_list, map_bm25_list, ndcg_bm25_list = utils.get_bm25_results(
            mrr_bm25_list, map_bm25_list, ndcg_bm25_list, test_index, tokenized_queries, bm25, mrr_bm25, map_bm25,
            ndcg_bm25, rel_fed, fold_number, MAP_CUT, NDCG_CUT)

          
        # Option to load a custom trained model (used in transfer learning)
        if LOAD_CUSTOM_TRAINED_MODEL:
            model = load_specific_encoder(custom_model_path)
        else:
            # with None, utils.model_preparation loads the MODEL_TYPE model
            model = None
        num_labels = 5 if MULTI_LABEL else 2
        train_dataloader, test_dataloader, model, optimizer, scheduler = utils.model_preparation(MODEL_TYPE, train_dataset,
                                                                                                 test_dataset,
                                                                                                 BATCH_SIZE, TEST_BATCH_SIZE,
                                                                                                 LEARNING_RATE, EPOCHS, model=model,
                                                                                                 num_labels=num_labels,
                                                                                                 custom_model=CUSTOM_MODEL)


        # ========================================
        #               Training Loop
        # ========================================
        epochs_train_loss, epochs_val_loss = [], []
        for epoch_i in range(0, EPOCHS):
            # ========================================
            #               Training
            # ========================================
            print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, EPOCHS))
            print('Training...')
            model, optimizer, scheduler = utils.training(model, train_dataloader, device, optimizer, scheduler)
        # ========================================
        #               Testing
        # ========================================
        print('Testing...')
        mrr_bert, map_bert, ndcg_bert, mrr_bert_list, map_bert_list, ndcg_bert_list = utils.testing(MODE, model,
                                                                                                    test_dataloader,
                                                                                                    device, test_index,
                                                                                                    bm25_top_n,
                                                                                                    mrr_bert_list,
                                                                                                    map_bert_list,
                                                                                                    ndcg_bert_list,
                                                                                                    mrr_bert, map_bert,
                                                                                                    ndcg_bert, rel_fed,
                                                                                                    fold_number,
                                                                                                    MAP_CUT, NDCG_CUT,
                                                                                                    multilabel=MULTI_LABEL,
                                                                                                    argmax_sorting=ARG_MAX_SORTING,
                                                                                                    per_label_testing=PER_LABEL_TESTING)
    print("  BM25 MRR:  " + "{:.4f}".format(mrr_bm25 / 5))
    print("  BM25 MAP:  " + "{:.4f}".format(map_bm25 / 5))
    print("  BM25 NDCG: " + "{:.4f}".format(ndcg_bm25 / 5))

    print("  BERT MRR:  " + "{:.4f}".format(mrr_bert / 5))
    print("  BERT MAP:  " + "{:.4f}".format(map_bert / 5))
    print("  BERT NDCG: " + "{:.4f}".format(ndcg_bert / 5))

    utils.t_test(mrr_bm25_list, mrr_bert_list, 'MRR')
    utils.t_test(map_bm25_list, map_bert_list, 'MAP')
    utils.t_test(ndcg_bm25_list, ndcg_bert_list, 'NDCG')
    
    stop = timeit.default_timer()
    wall_time = (stop - start) / 60 

    print('Time: ', wall_time, ' min') 

    # utils.results_to_csv('./mrr_bm25_list.csv', mrr_bm25_list)
    # utils.results_to_csv('./mrr_bert_list.csv', mrr_bert_list)
    # utils.results_to_csv('./map_bm25_list.csv', map_bm25_list)
    # utils.results_to_csv('./map_bert_list.csv', map_bert_list)
    # utils.results_to_csv('./ndcg_bm25_list.csv', ndcg_bm25_list)
    # utils.results_to_csv('./ndcg_bert_list.csv', ndcg_bert_list)
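`utils.testing` (not shown) turns the five-way predictions into a ranking; with `ARG_MAX_SORTING = True` that ranking is based on the arg-max label. The sketch below shows one plausible version under stated assumptions: output index 0 is the most relevant level, documents are sorted by predicted level first, and ties are broken by the winning logit. Both the tie-breaking rule and the helper `argmax_ranking` are hypothetical.

```python
import numpy as np

def argmax_ranking(logits):
    """Hypothetical helper: rank documents by arg-max relevance level.

    logits: array of shape (n_docs, n_levels); index 0 is assumed to be the
    most relevant level. Ties within a level are broken by the winning logit,
    highest first. Returns document indices, best first.
    """
    logits = np.asarray(logits)
    predicted_level = logits.argmax(axis=1)
    confidence = logits.max(axis=1)
    # np.lexsort uses the LAST key as the primary sort key
    return np.lexsort((-confidence, predicted_level))

logits = np.array([
    [0.1, 2.0, 0.0, 0.0, 0.0],  # doc 0 -> level index 1
    [3.0, 0.0, 0.0, 0.0, 0.0],  # doc 1 -> level index 0, strong
    [0.0, 0.0, 0.0, 0.0, 1.0],  # doc 2 -> level index 4 (no interest)
    [1.5, 0.0, 0.0, 0.0, 0.0],  # doc 3 -> level index 0, weaker
])
print(argmax_ranking(logits))  # [1 3 0 2]
```

Because the sigmoid is monotonic, ranking by the winning logit or by its sigmoid probability gives the same order.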

In [11]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     True
PER_LABEL_TESTING:  False
CUSTOM_MODEL:       weighted-BCEWIthLogitsLoss
#               Other
Tesla T4
GPU Type: Tesla T4




MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultilabelSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultilabelSequenceClassification were not 

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6670
Testing...
  Test MRR:  0.8104
  Test MAP:  0.3929
  Test NDCG: 0.5392
45
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6961
Testing...
  Test MRR:  0.7123
  Test MAP:  0.3329
  Test NDCG: 0.4730
90
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6808
Testing...
  Test MRR:  0.8705
  Test MAP:  0.4382
  Test NDCG: 0.5750
135
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6466
Testing...
  Test MRR:  0.7517
  Test MAP:  0.3747
  Test NDCG: 0.4772
180
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6770
Testing...
  Test MRR:  0.8098
  Test MAP:  0.3929
  Test NDCG: 0.5296
225
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  0.7909
  BERT MAP:  0.3863
  BERT NDCG: 0.5188
p-value MRR: 0.1016
p-value MAP: 0.0153
p-value NDCG: 0.0579
Time:  40.43083466898334  min
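Each fold reports MRR, MAP (cut at `MAP_CUT = 100`) and NDCG (cut at `NDCG_CUT = 20`), and the final lines divide the per-fold sums by 5. `utils.get_bm25_results` and `utils.testing` compute these metrics and are not shown, so the sketch below uses textbook definitions for a single query: binary relevance for reciprocal rank and average precision, graded gains with a log2 discount for NDCG. The function names and gain values are hypothetical, and the notebook's implementation may differ in details such as how AP is normalised.

```python
import math

def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranking, relevant, cut):
    """Sum of precision@k at each rank k <= cut that holds a relevant document,
    divided by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking[:cut], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranking, gains, cut):
    """NDCG@cut with gain / log2(rank + 1); gains maps doc -> graded gain."""
    dcg = sum(gains.get(doc, 0.0) / math.log2(rank + 1)
              for rank, doc in enumerate(ranking[:cut], start=1))
    ideal = sorted(gains.values(), reverse=True)[:cut]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

ranking = [12, 5, 30, 7]   # hypothetical system ranking for one query
gains = {5: 3.0, 7: 1.0}   # hypothetical graded gains of the relevant docs
relevant = set(gains)
print(reciprocal_rank(ranking, relevant))         # 0.5
print(average_precision(ranking, relevant, 100))  # 0.5
print(round(ndcg(ranking, gains, 20), 4))
```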


In [12]:
LEARNING_RATE = 2e-5
EPOCHS = 2

In [13]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
2
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     True
PER_LABEL_TESTING:  False
CUSTOM_MODEL:       weighted-BCEWIthLogitsLoss
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6934
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5324
Testing...
  Test MRR:  0.8409
  Test MAP:  0.3895
  Test NDCG: 0.5443
45
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6940
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5274
Testing...
  Test MRR:  0.7074
  Test MAP:  0.3334
  Test NDCG: 0.4765
90
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6735
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5430
Testing...
  Test MRR:  0.8769
  Test MAP:  0.4291
  Test NDCG: 0.5601
135
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6910
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5558
Testing...
  Test MRR:  0.7145
  Test MAP:  0.3569
  Test NDCG: 0.4460
180
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6540
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.4981
Testing...
  Test MRR:  0.8068
  Test MAP:  0.4019
  Test NDCG: 0.5344
225
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  0.7893
  BERT MAP:  0.3821
  BERT NDCG: 0.5123
p-value MRR: 0.1108
p-value MAP: 0.0254
p-value NDCG: 0.1035
Time:  75.04221240451668  min
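The p-values compare BM25 and BERT through `utils.t_test`, which is not shown. Assuming it runs a paired t-test over per-query scores (a natural choice, since both systems are evaluated on the same queries), the test statistic can be sketched with the standard library as below; the p-value would then follow from the t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`. The function name and the scores are hypothetical.

```python
import math
import statistics

def paired_t_statistic(baseline, system):
    """t = mean(d) / (stdev(d) / sqrt(n)) for paired differences d = system - baseline."""
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need two equally long score lists with at least 2 entries")
    diffs = [s - b for b, s in zip(baseline, system)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

bm25_scores = [0.50, 1.00, 0.25, 0.33, 1.00]  # hypothetical per-query scores
bert_scores = [1.00, 1.00, 0.50, 0.50, 1.00]
print(round(paired_t_statistic(bm25_scores, bert_scores), 3))
```

A positive statistic means the second system scores higher on average across the paired queries; swapping the arguments flips its sign.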


In [14]:
LEARNING_RATE = 3e-5
EPOCHS = 1

In [None]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
3e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     True
PER_LABEL_TESTING:  False
CUSTOM_MODEL:       weighted-BCEWIthLogitsLoss
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7609
Testing...
  Test MRR:  0.7656
  Test MAP:  0.3720
  Test NDCG: 0.5168
45
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7137
Testing...
  Test MRR:  0.7196
  Test MAP:  0.3115
  Test NDCG: 0.4553
90
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7166
Testing...
  Test MRR:  0.7690
  Test MAP:  0.3997
  Test NDCG: 0.5294
135
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6895
Testing...
  Test MRR:  0.7023
  Test MAP:  0.3741
  Test NDCG: 0.4700
180
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.


In [None]:
LEARNING_RATE = 3e-5
EPOCHS = 2

In [17]:
train_test()

  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7268
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5427
Testing...
  Test MRR:  0.8014
  Test MAP:  0.3756
  Test NDCG: 0.5232
45
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultilabelSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultilabelSequenceClassification were not 

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7072
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5555
Testing...
  Test MRR:  0.7233
  Test MAP:  0.3456
  Test NDCG: 0.4819
90
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7296
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5503
Testing...
  Test MRR:  0.8311
  Test MAP:  0.4200
  Test NDCG: 0.5592
135
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.7008
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5530
Testing...
  Test MRR:  0.7247
  Test MAP:  0.3737
  Test NDCG: 0.4733
180
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6835
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.5050
Testing...
  Test MRR:  0.8827
  Test MAP:  0.4254
  Test NDCG: 0.5707
225
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  0.7926
  BERT MAP:  0.3880
  BERT NDCG: 0.5217
p-value MRR: 0.0908
p-value MAP: 0.0121
p-value NDCG: 0.0450
Time:  75.05398435261665  min


In [11]:
CUSTOM_MODEL = 'HingeLoss'
LEARNING_RATE = 2e-5
EPOCHS = 1
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     True
PER_LABEL_TESTING:  False
CUSTOM_MODEL:       HingeLoss
#               Other
Tesla T4
GPU Type: Tesla T4




MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMultilabelSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMultilabelSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForMultilabelSequenceClassification were not 

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6756
Testing...
  Test MRR:  0.1868
  Test MAP:  0.0699
  Test NDCG: 0.1010
45
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6518
Testing...
  Test MRR:  0.3496
  Test MAP:  0.1192
  Test NDCG: 0.1805
90
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6644
Testing...
  Test MRR:  0.2574
  Test MAP:  0.0904
  Test NDCG: 0.1283
135
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6575
Testing...
  Test MRR:  0.1783
  Test MAP:  0.0851
  Test NDCG: 0.1237
180
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.6672
Testing...
  Test MRR:  0.1665
  Test MAP:  0.0722
  Test NDCG: 0.1025
225
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  0.2277
  BERT MAP:  0.0874
  BERT NDCG: 0.1272
p-value MRR: 0.0000
p-value MAP: 0.0000
p-value NDCG: 0.0000
Time:  40.0054526384333  min
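The `HingeLoss` variant above ends far below BM25 (BERT MRR 0.2277 versus 0.7340). The notebook does not show which hinge formulation `utils.py` uses; purely as a hypothetical reference point, a one-vs-rest multi-label hinge loss looks like this. Unlike the BCE losses, it becomes zero once every label score is on the correct side of the margin, and the scores it trains are margins rather than probabilities.

```python
def multilabel_hinge_loss(scores, targets, margin=1.0):
    """One-vs-rest hinge loss: mean over labels of max(0, margin - t * s).

    scores are raw (unbounded) model outputs, one per label; targets are 0/1
    indicators that are mapped to t = -1 / +1.
    """
    if len(scores) != len(targets) or not scores:
        raise ValueError("scores and targets must be non-empty and equally long")
    total = 0.0
    for s, y in zip(scores, targets):
        t = 1.0 if y else -1.0  # map {0, 1} targets to {-1, +1}
        total += max(0.0, margin - t * s)
    return total / len(scores)
```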


In [None]:
LEARNING_RATE = 2e-5
EPOCHS = 2
train_test()

In [None]:
LEARNING_RATE = 3e-5
EPOCHS = 1
train_test()

In [None]:
LEARNING_RATE = 3e-5
EPOCHS = 2
train_test()

In [12]:
# ========================================
#               Hyper-Parameters
# ========================================
SEED = 76
MODE = 'Re-ranker'
MODEL_TYPE = 'bert-base-uncased'
LEARNING_RATE = 2e-5
MAX_LENGTH = 128
BATCH_SIZE = 32
EPOCHS = 1
TOP_BM25 = 100
MAP_CUT = 100
NDCG_CUT = 20
if MODE == 'Full-ranker':
    TEST_BATCH_SIZE = 1400
else:
    TEST_BATCH_SIZE = 100

# Set the seed value all over the place to make this reproducible.
utils.initialize_random_generators(SEED)

BM25_ENRICH = 'default' # or 'add' or 'swap' (default=no enrichment of BM25 results)
MULTI_LABEL = True
PER_LABEL_TESTING = True # if True, ARG_MAX_SORTING must be False
'''
  PER_LABEL_TESTING calculates the performance per predicted label. Contrary to
  the binary case, each relevance level (labels 0-4) has its own prediction. The
  method can be seen as an alternative to the arg-max method. Because this flag
  was added later, it breaks the bookkeeping used to compute the final MRR, MAP
  and NDCG. To circumvent this, the scores found for each fold and each label
  have to be averaged.
'''
ARG_MAX_SORTING = False # if True, PER_LABEL_TESTING must be False
CUSTOM_MODEL = None # default is None; see utils.py for the available options

LOAD_CUSTOM_TRAINED_MODEL = False #
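The settings above make ARG_MAX_SORTING and PER_LABEL_TESTING mutually exclusive. How `utils.py` turns the per-label outputs into a ranking is not shown in this notebook, so the sketch below only illustrates the two strategies under stated assumptions: `probs` holds one probability per relevance label for each candidate document, label index 0 is treated as the most relevant level, and all function names are hypothetical.

```python
def check_settings(arg_max_sorting, per_label_testing):
    """Reject the combination the notebook comments forbid."""
    if arg_max_sorting and per_label_testing:
        raise ValueError("ARG_MAX_SORTING and PER_LABEL_TESTING are mutually exclusive")


def rank_by_argmax(doc_ids, probs):
    """Rank by predicted label (index 0 assumed most relevant), then by its probability."""
    def sort_key(i):
        row = probs[i]
        label = max(range(len(row)), key=row.__getitem__)  # arg-max label
        return (label, -row[label])  # lower label first, higher confidence first

    order = sorted(range(len(doc_ids)), key=sort_key)
    return [doc_ids[i] for i in order]


def rank_by_label(doc_ids, probs, label):
    """PER_LABEL_TESTING-style ranking: sort by the probability of one fixed label."""
    order = sorted(range(len(doc_ids)), key=lambda i: -probs[i][label])
    return [doc_ids[i] for i in order]
```

Under these assumptions, one `PER_LABEL_TESTING: LABEL k` pass in the output would correspond to `rank_by_label(doc_ids, probs, k)` over the BM25 candidates of each query.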

In [13]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     False
PER_LABEL_TESTING:  True
CUSTOM_MODEL:       None
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2654
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8102
  Test MAP:  0.4091
  Test NDCG: 0.5512
45
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8088
  Test MAP:  0.4071
  Test NDCG: 0.5489
90
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7702
  Test MAP:  0.3822
  Test NDCG: 0.5147
135
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8046
  Test MAP:  0.4076
  Test NDCG: 0.5461
180
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.8066
  Test MAP:  0.3925
  Test NDCG: 0.5367
225
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2421
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7311
  Test MAP:  0.3564
  Test NDCG: 0.5029
270
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7178
  Test MAP:  0.3357
  Test NDCG: 0.4777
315
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7278
  Test MAP:  0.3418
  Test NDCG: 0.4825
360
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7372
  Test MAP:  0.3584
  Test NDCG: 0.5064
405
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.2122
  Test MAP:  0.1247
  Test NDCG: 0.2050
450
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2536
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8955
  Test MAP:  0.4429
  Test NDCG: 0.5834
495
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8850
  Test MAP:  0.4407
  Test NDCG: 0.5826
540
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8843
  Test MAP:  0.4400
  Test NDCG: 0.5852
585
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.5340
  Test MAP:  0.2752
  Test NDCG: 0.3909
630
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7939
  Test MAP:  0.3755
  Test NDCG: 0.5049
675
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2444
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7433
  Test MAP:  0.3942
  Test NDCG: 0.5077
720
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7562
  Test MAP:  0.3999
  Test NDCG: 0.5103
765
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7596
  Test MAP:  0.3990
  Test NDCG: 0.5089
810
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7453
  Test MAP:  0.3975
  Test NDCG: 0.5059
855
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7421
  Test MAP:  0.3950
  Test NDCG: 0.5039
900
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2527
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8375
  Test MAP:  0.4281
  Test NDCG: 0.5749
945
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8388
  Test MAP:  0.4238
  Test NDCG: 0.5776
990
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7697
  Test MAP:  0.3298
  Test NDCG: 0.4591
1035
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8432
  Test MAP:  0.4147
  Test NDCG: 0.5628
1080
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.8369
  Test MAP:  0.4231
  Test NDCG: 0.5748
1125
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  3.7983
  BERT MAP:  1.8990
  BERT NDCG: 2.5610
p-value MRR: 0.3486
p-value MAP: 0.0065
p-value NDCG: 0.0398
Time:  40.21411223545001  min
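The BERT figures in this summary (MRR 3.7983, MAP 1.8990, NDCG 2.5610) exceed 1, which MRR, MAP and NDCG cannot do for a single ranking. This matches the bookkeeping problem described in the PER_LABEL_TESTING docstring: the count printed after each test step climbs to 1125 (5 labels × 225 queries) instead of 225. Using only the per-fold values logged in this run, the check below shows that the printed BERT figures agree, to within rounding, with summing over the five labels and averaging over the five folds. It then divides by the number of labels, as the docstring prescribes. The variable and function names are illustrative.

```python
# Per-fold test scores logged above for the PER_LABEL_TESTING run
# (learning rate 2e-5, 1 epoch); rows are folds 1-5, columns are labels 0-4.
LOGGED = {
    "MRR": [
        [0.8102, 0.8088, 0.7702, 0.8046, 0.8066],
        [0.7311, 0.7178, 0.7278, 0.7372, 0.2122],
        [0.8955, 0.8850, 0.8843, 0.5340, 0.7939],
        [0.7433, 0.7562, 0.7596, 0.7453, 0.7421],
        [0.8375, 0.8388, 0.7697, 0.8432, 0.8369],
    ],
    "MAP": [
        [0.4091, 0.4071, 0.3822, 0.4076, 0.3925],
        [0.3564, 0.3357, 0.3418, 0.3584, 0.1247],
        [0.4429, 0.4407, 0.4400, 0.2752, 0.3755],
        [0.3942, 0.3999, 0.3990, 0.3975, 0.3950],
        [0.4281, 0.4238, 0.3298, 0.4147, 0.4231],
    ],
    "NDCG": [
        [0.5512, 0.5489, 0.5147, 0.5461, 0.5367],
        [0.5029, 0.4777, 0.4825, 0.5064, 0.2050],
        [0.5834, 0.5826, 0.5852, 0.3909, 0.5049],
        [0.5077, 0.5103, 0.5089, 0.5059, 0.5039],
        [0.5749, 0.5776, 0.4591, 0.5628, 0.5748],
    ],
}


def summed_over_labels(table):
    """Mean over folds of the per-fold sum over labels (matches the printed totals)."""
    return sum(sum(row) for row in table) / len(table)


def averaged_over_labels(table):
    """The same total divided by the number of labels, as the docstring prescribes."""
    return summed_over_labels(table) / len(table[0])


def per_label_means(table):
    """Mean over folds for each label separately."""
    n_folds = len(table)
    return [sum(row[k] for row in table) / n_folds for k in range(len(table[0]))]


for metric, table in LOGGED.items():
    print(f"{metric}: summed over labels {summed_over_labels(table):.4f}, "
          f"averaged over labels {averaged_over_labels(table):.4f}")
```

Averaged over labels, this run scores roughly 0.760 MRR, 0.380 MAP and 0.512 NDCG, compared with 0.7340, 0.3274 and 0.4714 for BM25. Note that this averages five label-specific rankings; it does not select a single best label.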


In [14]:
EPOCHS = 2

In [15]:
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
2e-05
128
32
2
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     False
PER_LABEL_TESTING:  True
CUSTOM_MODEL:       None
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.1800
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8626
  Test MAP:  0.4228
  Test NDCG: 0.5671
45
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8520
  Test MAP:  0.4136
  Test NDCG: 0.5623
90
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8622
  Test MAP:  0.4187
  Test NDCG: 0.5647
135
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7171
  Test MAP:  0.3750
  Test NDCG: 0.5110
180
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7744
  Test MAP:  0.3670
  Test NDCG: 0.5116
225
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2497
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.1797
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7288
  Test MAP:  0.3600
  Test NDCG: 0.5072
270
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7293
  Test MAP:  0.3617
  Test NDCG: 0.5065
315
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7289
  Test MAP:  0.3574
  Test NDCG: 0.5039
360
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7273
  Test MAP:  0.3614
  Test NDCG: 0.5081
405
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7232
  Test MAP:  0.3390
  Test NDCG: 0.4842
450
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2516
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.1806
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8731
  Test MAP:  0.4681
  Test NDCG: 0.5949
495
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8750
  Test MAP:  0.4682
  Test NDCG: 0.5965
540
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8794
  Test MAP:  0.4682
  Test NDCG: 0.5968
585
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8806
  Test MAP:  0.4699
  Test NDCG: 0.6020
630
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.8780
  Test MAP:  0.4719
  Test NDCG: 0.6075
675
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2687
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.1883
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7219
  Test MAP:  0.3957
  Test NDCG: 0.4821
720
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7215
  Test MAP:  0.3941
  Test NDCG: 0.4812
765
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7320
  Test MAP:  0.3946
  Test NDCG: 0.4790
810
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.5839
  Test MAP:  0.3337
  Test NDCG: 0.4139
855
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7332
  Test MAP:  0.3942
  Test NDCG: 0.4895
900
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2386
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.1726
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8218
  Test MAP:  0.4296
  Test NDCG: 0.5705
945
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8114
  Test MAP:  0.4300
  Test NDCG: 0.5720
990
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8362
  Test MAP:  0.4338
  Test NDCG: 0.5713
1035
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7335
  Test MAP:  0.4002
  Test NDCG: 0.5356
1080
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.6689
  Test MAP:  0.3238
  Test NDCG: 0.4524
1125
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  3.8912
  BERT MAP:  2.0105
  BERT NDCG: 2.6544
p-value MRR: 0.0936
p-value MAP: 0.0001
p-value NDCG: 0.00

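The p-values printed above compare BM25 and the BERT re-ranker on the same set of queries. This excerpt does not show how `utils.py` computes them. One common choice in IR evaluation is a paired randomization (permutation) test on per-query metric scores, sketched below. This is a swapped-in technique that may differ from the notebook's own test, and the helper name `paired_randomization_test` is hypothetical.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-query metric scores.

    Hypothetical helper, not taken from utils.py. Each trial randomly swaps
    the two systems' scores for every query, which amounts to flipping the
    sign of that query's difference. The p-value is the fraction of trials
    whose absolute mean difference is at least as large as the observed one.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    at_least_as_extreme = 0
    for _ in range(trials):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        # Small tolerance so float noise does not hide exact ties.
        if abs(permuted) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one smoothing so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

For example, `paired_randomization_test(bm25_ap_per_query, bert_ap_per_query)` would return a p-value for MAP, assuming both lists hold per-query average precision values in the same query order. The test makes no normality assumption, which is one reason it is often preferred over a paired t-test for IR metrics.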
In [16]:
LEARNING_RATE = 3e-5
EPOCHS = 1
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
3e-05
128
32
1
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     False
PER_LABEL_TESTING:  True
CUSTOM_MODEL:       None
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2500
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8368
  Test MAP:  0.4052
  Test NDCG: 0.5412
45
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8045
  Test MAP:  0.3612
  Test NDCG: 0.5037
90
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7740
  Test MAP:  0.3779
  Test NDCG: 0.5088
135
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.6755
  Test MAP:  0.3233
  Test NDCG: 0.4567
180
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.8352
  Test MAP:  0.3983
  Test NDCG: 0.5374
225
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2438
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7492
  Test MAP:  0.3542
  Test NDCG: 0.5130
270
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7524
  Test MAP:  0.3511
  Test NDCG: 0.5091
315
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7482
  Test MAP:  0.3448
  Test NDCG: 0.4977
360
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7533
  Test MAP:  0.3555
  Test NDCG: 0.5166
405
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7472
  Test MAP:  0.3420
  Test NDCG: 0.4959
450
MRR:  0.7611
MAP:  0.3341
NDCG: 0.4826
135


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2417
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8815
  Test MAP:  0.4440
  Test NDCG: 0.5837
495
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8835
  Test MAP:  0.4262
  Test NDCG: 0.5654
540
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8532
  Test MAP:  0.4128
  Test NDCG: 0.5498
585
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8291
  Test MAP:  0.3995
  Test NDCG: 0.5399
630
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7370
  Test MAP:  0.2915
  Test NDCG: 0.4290
675
MRR:  0.6859
MAP:  0.3317
NDCG: 0.4408
180


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2361
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.7178
  Test MAP:  0.3751
  Test NDCG: 0.4820
720
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.7120
  Test MAP:  0.3778
  Test NDCG: 0.4832
765
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.6595
  Test MAP:  0.3459
  Test NDCG: 0.4511
810
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.7130
  Test MAP:  0.3719
  Test NDCG: 0.4843
855
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.0588
  Test MAP:  0.0396
  Test NDCG: 0.0261
900
MRR:  0.7796
MAP:  0.3182
NDCG: 0.4780
225


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2598
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8401
  Test MAP:  0.4120
  Test NDCG: 0.5660
945
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8198
  Test MAP:  0.4033
  Test NDCG: 0.5570
990
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.7792
  Test MAP:  0.3868
  Test NDCG: 0.5378
1035
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8330
  Test MAP:  0.4211
  Test NDCG: 0.5682
1080
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.8084
  Test MAP:  0.4094
  Test NDCG: 0.5621
1125
  BM25 MRR:  0.7340
  BM25 MAP:  0.3274
  BM25 NDCG: 0.4714
  BERT MRR:  3.7604
  BERT MAP:  1.8261
  BERT NDCG: 2.4932
p-value MRR: 0.5146
p-value MAP: 0.0486
p-value NDCG: 0.1771
Time:  40.30960054991665  min

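Each fold reports MRR, MAP and NDCG. The definitions below are a minimal per-query sketch, assuming that a document counts as relevant when its gain is positive and that NDCG uses linear gains with a log2 discount. The notebook's actual rank cutoff and its mapping from the five Cranfield relevance grades to gains are not shown here, so the `gains` dictionary and the optional `k` cutoff are illustrative assumptions. MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all test queries.

```python
import math


def reciprocal_rank(ranking, gains):
    """1 / rank of the first document with positive gain, or 0 if there is none."""
    for rank, doc in enumerate(ranking, start=1):
        if gains.get(doc, 0) > 0:
            return 1.0 / rank
    return 0.0


def average_precision(ranking, gains):
    """Mean of precision@k over the ranks k at which relevant documents appear."""
    num_relevant = sum(1 for g in gains.values() if g > 0)
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if gains.get(doc, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents missing from the ranking contribute zero precision.
    return precision_sum / num_relevant


def ndcg(ranking, gains, k=None):
    """NDCG with linear gains and a log2(rank + 1) discount, optionally cut at k."""
    def dcg(values):
        return sum(g / math.log2(rank + 1) for rank, g in enumerate(values, start=1))

    ranked_gains = [gains.get(doc, 0) for doc in ranking][:k]
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0
```

With `ranking = ["d3", "d1", "d2"]` and `gains = {"d1": 2, "d2": 1}`, the reciprocal rank is 0.5 because the first relevant document sits at rank 2.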

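The experiment settings print `MULTI_LABEL: True` and `ARGMAX-SORTING: False`. What the flag does internally lives in `utils.py` and is not shown here. The sketch below illustrates two plausible ways to turn five-class logits into a ranking score: sorting by the predicted class (argmax) or by the expected relevance under the softmax distribution. Both the assumption that class 0 is the most relevant label and the gain values `(4, 3, 2, 1, 0)` are hypothetical, as are all function names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax_score(logits):
    """Hypothetical: score by predicted class, assuming class 0 = most relevant."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return -best  # negate so class 0 ranks first when sorting descending


def expected_relevance_score(logits, gains=(4, 3, 2, 1, 0)):
    """Hypothetical: probability-weighted gain over the five relevance classes."""
    return sum(p * g for p, g in zip(softmax(logits), gains))


def rerank(doc_logits, use_argmax=False):
    """Sort document ids (dict keys) by descending score.

    Python's sort is stable, so documents with equal scores keep their
    input order (for example, a BM25 order if the dict was built from it).
    """
    score = argmax_score if use_argmax else expected_relevance_score
    return sorted(doc_logits, key=lambda d: score(doc_logits[d]), reverse=True)
```

The design difference shows up with ties: argmax collapses every document predicted as the same class onto one score, so their order falls back to the input order, whereas the expected-relevance score still separates them by how confident the model is.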
In [None]:
EPOCHS = 2
train_test()

#               Hyper-Parameters
Re-ranker
bert-base-uncased
3e-05
128
32
2
#               Experiment-Settings
BM25_ENRICHMENT:    default
MULTI_LABEL:        True
ARGMAX-SORTING:     False
PER_LABEL_TESTING:  True
CUSTOM_MODEL:       None
#               Other
Tesla T4
GPU Type: Tesla T4
MRR:  0.7837
MAP:  0.3493
NDCG: 0.5011
45


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2662
Training...
  Batch   100  of    563.
  Batch   200  of    563.
  Batch   300  of    563.
  Batch   400  of    563.
  Batch   500  of    563.
  Average training loss: 0.2015
Testing...
PER_LABEL_TESTING: LABEL  0
  Test MRR:  0.8157
  Test MAP:  0.4062
  Test NDCG: 0.5519
45
PER_LABEL_TESTING: LABEL  1
  Test MRR:  0.8008
  Test MAP:  0.3990
  Test NDCG: 0.5411
90
PER_LABEL_TESTING: LABEL  2
  Test MRR:  0.8127
  Test MAP:  0.3945
  Test NDCG: 0.5337
135
PER_LABEL_TESTING: LABEL  3
  Test MRR:  0.8085
  Test MAP:  0.4035
  Test NDCG: 0.5353
180
PER_LABEL_TESTING: LABEL  4
  Test MRR:  0.7979
  Test MAP:  0.3925
  Test NDCG: 0.5320
225
MRR:  0.6596
MAP:  0.3036
NDCG: 0.4546
90


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

Training...
  Batch   100  of    563.
