## This notebook showcases the experiments run with the Fast-Forward Indexes (FFI) framework, using the TCT-ColBERT encoder fine-tuned on MS MARCO (tct_colbert_msmarco), on the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact
* irds:msmarco-passage/trec-dl-2019

General setup for all datasets (first 4 cells)

In [2]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [3]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]
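
To make the three measures concrete, here is a minimal pure-Python sketch of RR@k, nDCG@k and AP@k on a toy ranking. The helper names and the toy data are illustrative assumptions and are not part of `pyterrier.measures`; the sketch assumes linear gains with a log2 discount for nDCG and normalises AP by the total number of relevant documents. The experiments themselves rely on the library implementations.

```python
import math


def reciprocal_rank(ranked_ids, relevant, k=10):
    """Return 1/rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg(ranked_ids, gains, k=10):
    """nDCG@k, assuming linear gains and a log2(rank + 1) discount."""

    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0) for doc_id in ranked_ids[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def average_precision(ranked_ids, relevant, k=100):
    """AP@k: mean of precision values at the ranks of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Toy example (hypothetical data): d1 is highly relevant, d2 is relevant.
toy_ranking = ["d3", "d1", "d7", "d2"]
toy_gains = {"d1": 2, "d2": 1}

print("RR@10  ", reciprocal_rank(toy_ranking, set(toy_gains)))
print("nDCG@10", ndcg(toy_ranking, toy_gains))
print("AP@100 ", average_precision(toy_ranking, set(toy_gains)))
```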

Create the query encoder, which will run on the CPU. The same encoder is used for all datasets and queries.

In [4]:
from fast_forward.encoder import TCTColBERTQueryEncoder

model_name = "tct_colbert_msmarco"
q_encoder = TCTColBERTQueryEncoder("castorini/tct_colbert-msmarco")


Some weights of the model checkpoint at castorini/tct_colbert-msmarco were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Define the path to the root directory

In [5]:
path_to_root = "../../"

In the following cells, we run FFI on each dataset, using BM25 for first-stage retrieval and TCT-ColBERT (tct_colbert_msmarco) for semantic re-ranking:
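
The re-ranking step can be sketched as a score interpolation. This is a minimal illustration, assuming the convex combination `alpha * sparse + (1 - alpha) * dense` used by Fast-Forward indexes; the function name `interpolate` and the toy scores are hypothetical, and the actual pipeline delegates this step to `FFInterpolate`.

```python
def interpolate(sparse_scores, dense_scores, alpha):
    """Re-rank first-stage results by mixing sparse and dense scores.

    Only documents returned by the first stage (the keys of
    sparse_scores) are scored, mirroring a retrieve-and-re-rank setup.
    """
    combined = {
        doc_id: alpha * sparse + (1.0 - alpha) * dense_scores[doc_id]
        for doc_id, sparse in sparse_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents retrieved by BM25.
bm25_scores = {"d1": 12.0, "d2": 9.5, "d3": 7.0}
dense_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.6}

# A small alpha lets the dense scores dominate the final ranking.
for doc_id, score in interpolate(bm25_scores, dense_scores, alpha=0.02):
    print(doc_id, round(score, 3))
```

With `alpha=1.0` the ranking falls back to the pure BM25 order, and with `alpha=0.0` only the dense scores matter.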

#### BM25 >> tct_colbert irds:beir/nfcorpus

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1698195.30it/s]
GridScan: 100%|██████████| 4/4 [00:50<00:00, 12.70s/it]


Best map is 0.114519
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fa1130eabf0> alpha=0.02']
Experiment took 14.851 seconds to execute.


| | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | nfcorpus: BM25 >> tct_colbert_msmarco | 0.522044 | 0.318184 | 0.144755 |
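
The "Best setting" line in the output above comes from a grid scan over alpha on the dev set. The idea can be sketched as follows, under loud assumptions: the candidate alpha values, the toy dev queries and the helper names are all hypothetical, and mean reciprocal rank stands in for MAP to keep the example short.

```python
def first_relevant_rr(ranked_ids, relevant):
    """Reciprocal rank of the first relevant document, else 0."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def rank_with_alpha(sparse, dense, alpha):
    """Order first-stage documents by the interpolated score."""
    return sorted(
        sparse,
        key=lambda d: alpha * sparse[d] + (1.0 - alpha) * dense[d],
        reverse=True,
    )


def tune_alpha(dev_queries, alphas):
    """Return the alpha with the best mean score on the dev queries."""
    best_alpha, best_score = None, -1.0
    for alpha in alphas:
        scores = [
            first_relevant_rr(
                rank_with_alpha(q["sparse"], q["dense"], alpha), q["relevant"]
            )
            for q in dev_queries
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # ties keep the earlier candidate
            best_alpha, best_score = alpha, mean_score
    return best_alpha, best_score


# Hypothetical dev set: two queries with sparse and dense scores.
dev_queries = [
    {"sparse": {"a": 10.0, "b": 8.0}, "dense": {"a": 0.1, "b": 0.9}, "relevant": {"b"}},
    {"sparse": {"c": 6.0, "d": 5.0}, "dense": {"c": 0.8, "d": 0.3}, "relevant": {"c"}},
]

best_alpha, best_score = tune_alpha(dev_queries, alphas=[0.02, 0.5, 1.0])
print("best alpha:", best_alpha, "score:", best_score)
```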


#### BM25 >> tct_colbert irds:beir/fiqa

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

#### BM25 >> tct_colbert irds:beir/scidocs

In [11]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


#### BM25 >> tct_colbert irds:beir/cqadupstack/english

In [12]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


#### BM25 >> tct_colbert irds:beir/arguana

#### Local runtime : x minutes

In [13]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root,
                           timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1027258.67it/s]


Experiment took 214.066 seconds to execute.


| | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | arguana: BM25 >> tct_colbert_msmarco | 0.187759 | 0.288036 | 0.202261 |


#### BM25 >> tct_colbert irds:beir/scifact

In [16]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1437484.47it/s]
GridScan: 100%|██████████| 4/4 [02:34<00:00, 38.55s/it]


Best map is 0.621427
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fa074c89150> alpha=0.02']
Experiment took 18.839 seconds to execute.


| | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scifact: BM25 >> tct_colbert_msmarco | 0.592106 | 0.623456 | 0.583593 |


#### BM25 >> tct_colbert irds:msmarco-passage/trec-dl-2019

In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
dev_set_name = "irds:beir/msmarco/dev"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True, in_memory_sparse=False,
                           in_memory_dense=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

[INFO] [starting] opening zip file
[INFO] If you have a local copy of https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip, you can symlink it here to avoid downloading it again: /home/anistor/.ir_datasets/downloads/444067daf65d982533ea17ebd59501e4
[INFO] [starting] https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip
[INFO] [finished] https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip: [00:38] [1.08GB] [27.8MB/s]
[INFO] [finished] opening zip file [39.18s]                                                                
[INFO] [starting] opening zip file
[INFO] [finished] opening zip file [0ms]
100%|██████████| 8841823/8841823 [00:11<00:00, 793875.48it/s] 
GridScan:   0%|          | 0/7 [00:00<?, ?it/s]