## In this notebook we will showcase the experiments realised using the FFI framework with the tct_colbert finetuned on the msmarco dataset encoder for the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact

General Setup for all the datasets( first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]

Create the query encoder that will run on CPU. Encoder used for embedding all the datasets/queries

In [3]:
from fast_forward.encoder import TCTColBERTQueryEncoder

model_name = "tct_colbert_msmarco"
q_encoder = TCTColBERTQueryEncoder("castorini/tct_colbert-msmarco")




Defined the path to root

In [4]:
path_to_root = "../../"

In the following cells, for each dataset we will run FFI using BM25 for the first stage retrieval and bge-base-en-v1.5 for the semantic re-ranking:

#### BM25 >> tct_colbert irds:beir/nfcorpus

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1566235.63it/s]
GridScan: 100%|██████████| 7/7 [01:20<00:00, 11.53s/it]


Best map is 0.118544
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fe54e3caf80> alpha=0.2']
Experiment took 18.696 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25 >> tct_colbert_msmarco,0.551017,0.33851,0.151468


#### BM25 >> tct_colbert irds:beir/fiqa

In [6]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 1503512.60it/s]
GridScan: 100%|██████████| 7/7 [02:43<00:00, 23.29s/it]


Best map is 0.282656
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fe54603b850> alpha=0.1']
Experiment took 93.072 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25 >> tct_colbert_msmarco,0.384861,0.315708,0.26389


#### BM25 >> tct_colbert irds:beir/scidocs

In [11]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


#### BM25 >> tct_colbert irds:beir/cscqadupstack/english

In [12]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset("irds:beir/cqadupstack/english")
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


#### BM25 >> tct_colbert irds:beir/arguana

#### Local runtime : x minutes

In [13]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root,
                           timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1027258.67it/s]


Experiment took 214.066 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,arguana: BM25 >> tct_colbert_msmarco,0.187759,0.288036,0.202261


#### BM25 >> tct_colbert irds:beir/scifact

In [7]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1589462.43it/s]
GridScan: 100%|██████████| 7/7 [04:18<00:00, 36.93s/it]


Best map is 0.676926
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fe4b6a27790> alpha=0.2']
Experiment took 17.215 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> tct_colbert_msmarco,0.661993,0.698182,0.65603


#### BM25 >> tct_colbert irds:msmarco-passage/trec-dl-2019

In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
dev_set_name = "irds:beir/msmarco/dev"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True, in_memory_sparse=False,
                           in_memory_dense=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

[INFO] [starting] opening zip file
[INFO] If you have a local copy of https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip, you can symlink it here to avoid downloading it again: /home/anistor/.ir_datasets/downloads/444067daf65d982533ea17ebd59501e4
[INFO] [starting] https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip
[INFO] [finished] https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/msmarco.zip: [00:38] [1.08GB] [27.8MB/s]
[INFO] [finished] opening zip file [39.18s]                                                                
[INFO] [starting] opening zip file
[INFO] [finished] opening zip file [0ms]
100%|██████████| 8841823/8841823 [00:11<00:00, 793875.48it/s] 
GridScan:   0%|          | 0/7 [00:00<?, ?it/s]

## Find out sensitivity of alpha value

In [8]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0,9]

for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print(result)

100%|██████████| 5183/5183 [00:00<00:00, 661003.33it/s]


Experiment took 88.443 seconds to execute.
                                   name     RR@10   nDCG@10   AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.555393  0.586517  0.54873


100%|██████████| 5183/5183 [00:00<00:00, 1415857.60it/s]


Experiment took 70.431 seconds to execute.
                                   name     RR@10  nDCG@10  AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.575409  0.60577  0.5678


100%|██████████| 5183/5183 [00:00<00:00, 1334340.64it/s]


Experiment took 49.942 seconds to execute.
                                   name    RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.59718  0.627204  0.588833


100%|██████████| 5183/5183 [00:00<00:00, 1528660.27it/s]


Experiment took 48.299 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.622516  0.652288  0.614314


100%|██████████| 5183/5183 [00:00<00:00, 1479553.37it/s]


Experiment took 50.169 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.661816  0.694801  0.655253


100%|██████████| 5183/5183 [00:00<00:00, 1275692.60it/s]


Experiment took 69.577 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.681321  0.715563  0.676635


100%|██████████| 5183/5183 [00:00<00:00, 888215.63it/s]


Experiment took 184.317 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.681982  0.714012  0.678134


100%|██████████| 5183/5183 [00:00<00:00, 351412.46it/s]


Experiment took 103.536 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.661731  0.695691  0.657452


100%|██████████| 5183/5183 [00:00<00:00, 667231.75it/s]


Experiment took 108.069 seconds to execute.
                                   name     RR@10   nDCG@10   AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.652321  0.687148  0.64768


100%|██████████| 5183/5183 [00:00<00:00, 1057142.46it/s]


Experiment took 89.397 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.549832  0.581373  0.543131


100%|██████████| 5183/5183 [00:00<00:00, 880623.74it/s]


Experiment took 121.884 seconds to execute.
                                   name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> tct_colbert_msmarco  0.623265  0.654808  0.618561


In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print( result["nDCG@10"].iloc[0])
    if  result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha

print(max_alpha)

100%|██████████| 5183/5183 [00:00<00:00, 225147.09it/s]


Experiment took 83.504 seconds to execute.
0.6349634005693594


100%|██████████| 5183/5183 [00:00<00:00, 758013.80it/s]


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dataset_test_name = "irds:beir/fiqa/dev"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name,
                                        pipeline_name,
                                        path_to_root, alpha=alpha)

    if result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha
print("RESULTS")
print(maxi)
print(max_alpha)

100%|██████████| 57638/57638 [00:00<00:00, 421723.47it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1021413.85it/s]
100%|██████████| 57638/57638 [00:00<00:00, 239432.39it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1266515.93it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1369261.28it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1254182.98it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1132425.34it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1474677.73it/s]
100%|██████████| 57638/57638 [00:00<00:00, 1297443.20it/s]


RESULTS
0.33523034524498424
0.5


In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dataset_test_name = "irds:beir/nfcorpus/dev"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name,
                                        pipeline_name,
                                        path_to_root, alpha=alpha)

    if result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha
print("RESULTS")
print(maxi)
print(max_alpha)