## In this notebook we will showcase the experiments realised using the FFI framework with the bge-base-en-v1.5 encoder for the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact

General Setup for all the datasets( first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]

Create the query encoder that will run on CPU. Encoder used for embedding all the datasets/queries

In [3]:
from encoders.bge_base_en import BgeEncoder

package = "BAAI/"
model_name = "bge-base-en-v1.5"
q_encoder = BgeEncoder(package + model_name)


Defined the path to root

In [4]:
path_to_root = "../../"

## In the following cells, for each dataset we will run FFI using BM25 for the first stage retrieval and bge-base-en-v1.5 for the semantic re-ranking:

#### BM25 >> bge-base-en-v1.5 irds:beir/nfcorpus

In [6]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1036451.26it/s]
GridScan: 100%|██████████| 4/4 [00:29<00:00,  7.40s/it]


Best map is 0.128253
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f67849514b0> alpha=0.005']
Experiment took 10.244 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25 >> bge-base-en-v1.5,0.578157,0.366086,0.16912


#### BM25 >> bge-base-en-v1.5 irds:beir/fiqa

In [7]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 330160.75it/s]
GridScan: 100%|██████████| 4/4 [01:53<00:00, 28.37s/it]


Best map is 0.336038
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f67af5b3cd0> alpha=0.005']
Experiment took 84.289 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25 >> bge-base-en-v1.5,0.482286,0.403686,0.341608


#### BM25 >> bge-base-en-v1.5 irds:beir/cqadupstack/english

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1099154.50it/s]
GridScan: 100%|██████████| 4/4 [02:53<00:00, 43.32s/it]


Best map is 0.742524
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7faea371a350> alpha=0.001']
Experiment took 22.253 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> bge-base-en-v1.5,0.740409,0.768561,0.735928


#### BM25 >> bge-base-en-v1.5 irds:msmarco-passage/trec-dl-2019

In [11]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, in_memory_sparse=False,
                           in_memory_dense=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

100%|██████████| 8841823/8841823 [00:11<00:00, 738279.69it/s] 


Experiment took 270.859 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,passage_trec_dl_2019: BM25 >> bge-base-en-v1.5,0.982558,0.686668,0.422299


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0,9]

for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print(result)

100%|██████████| 5183/5183 [00:00<00:00, 696162.86it/s]


Experiment took 166.165 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.751103  0.779025  0.743871


100%|██████████| 5183/5183 [00:00<00:00, 1281860.82it/s]


Experiment took 84.036 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.743868  0.775129  0.738358


100%|██████████| 5183/5183 [00:00<00:00, 563188.54it/s]


Experiment took 85.967 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.724432  0.756248  0.719584


100%|██████████| 5183/5183 [00:00<00:00, 706295.77it/s]


Experiment took 112.330 seconds to execute.
                                name     RR@10   nDCG@10   AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.698478  0.731764  0.69494


100%|██████████| 5183/5183 [00:00<00:00, 796653.39it/s]


Experiment took 109.917 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.667512  0.703482  0.662442


100%|██████████| 5183/5183 [00:00<00:00, 1473136.66it/s]


Experiment took 81.408 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.654386  0.689373  0.649299


100%|██████████| 5183/5183 [00:00<00:00, 1440914.54it/s]


Experiment took 59.518 seconds to execute.
                                name    RR@10  nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.64924  0.68315  0.644648


100%|██████████| 5183/5183 [00:00<00:00, 1342913.12it/s]


Experiment took 59.877 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.643904  0.678685  0.639312


100%|██████████| 5183/5183 [00:00<00:00, 1682330.73it/s]


Experiment took 57.279 seconds to execute.
                                name     RR@10  nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.643066  0.67796  0.638344


100%|██████████| 5183/5183 [00:00<00:00, 1663153.36it/s]


Experiment took 53.429 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.734953  0.764916  0.728001


100%|██████████| 5183/5183 [00:00<00:00, 1462630.53it/s]


Experiment took 52.626 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> bge-base-en-v1.5  0.641886  0.676291  0.637212


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print( result["nDCG@10"].iloc[0])
    if  result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha

print(max_alpha)

100%|██████████| 5183/5183 [00:00<00:00, 917996.61it/s]


Experiment took 82.593 seconds to execute.
0.7751159573621502


100%|██████████| 5183/5183 [00:00<00:00, 1211833.30it/s]


Experiment took 64.023 seconds to execute.
0.7829849465580678


100%|██████████| 5183/5183 [00:00<00:00, 1214677.19it/s]


Experiment took 70.606 seconds to execute.
0.7842919580106625


100%|██████████| 5183/5183 [00:00<00:00, 1487449.72it/s]


Experiment took 76.708 seconds to execute.
0.7791299086345621


100%|██████████| 5183/5183 [00:00<00:00, 1134725.84it/s]


Experiment took 64.776 seconds to execute.
0.775140919446663


100%|██████████| 5183/5183 [00:00<00:00, 1105020.97it/s]


Experiment took 61.039 seconds to execute.
0.7635949055610315


100%|██████████| 5183/5183 [00:00<00:00, 1230769.27it/s]


Experiment took 62.995 seconds to execute.
0.7635949055610315


100%|██████████| 5183/5183 [00:00<00:00, 1321926.28it/s]


Experiment took 58.427 seconds to execute.
0.7311620504981617


100%|██████████| 5183/5183 [00:00<00:00, 876858.57it/s]


Experiment took 59.705 seconds to execute.
0.7064335688967621
0.3
