## This notebook showcases experiments run with the Fast-Forward Indexes (FFI) framework and the bge-base-en-v1.5 encoder on the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact
* irds:msmarco-passage/trec-dl-2019

General setup for all the datasets (first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]
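
To make the three metrics concrete, here is a minimal pure-Python sketch that computes them for one toy query. It is an illustration only, not how PyTerrier evaluates internally: the function names, the toy ranking, and the relevance labels are assumptions, and it uses the common conventions of binary relevance for RR and AP and linear gains with a log2 discount for nDCG.

```python
import math


def reciprocal_rank(ranking, relevant, k=10):
    """Return 1/rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranking, relevant, k=100):
    """AP@k with binary relevance, normalised by the number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def ndcg(ranking, gains, k=10):
    """nDCG@k with linear gains and a log2(rank + 1) discount."""
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranking[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data (hypothetical): one ranked list and graded relevance labels.
ranking = ["d3", "d1", "d7", "d2"]
gains = {"d1": 2, "d2": 1, "d9": 1}
relevant = {doc_id for doc_id, gain in gains.items() if gain > 0}

rr = reciprocal_rank(ranking, relevant)
ap = average_precision(ranking, relevant)
score = ndcg(ranking, gains)
print(f"RR@10={rr:.4f} nDCG@10={score:.4f} AP@100={ap:.4f}")
```

The values reported in the result tables below are these per-query scores averaged over all test queries.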

Create the query encoder, which runs on the CPU. The same encoder model is used to embed all the datasets and queries.

In [3]:
from encoders.bge_base_en import BgeEncoder

package = "BAAI/"
model_name = "bge-base-en-v1.5"
q_encoder = BgeEncoder(package + model_name)


Define the path to the project root

In [4]:
path_to_root = "../../"

## In the following cells, for each dataset we will run 2 tests using:

* BM25 sparse retrieval
* FFI, using BM25 for first-stage retrieval and the bge-base-en-v1.5 model for second-stage retrieval
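
The second stage combines the two scores per document by linear interpolation, `alpha * sparse + (1 - alpha) * dense`, and re-ranks by the result. The sketch below illustrates the idea on toy data; the function name and all scores are made up for illustration, and it is not the code behind `FFInterpolate`.

```python
def interpolate(sparse_scores, dense_scores, alpha):
    """Combine scores as alpha * sparse + (1 - alpha) * dense and sort descending."""
    combined = {
        doc_id: alpha * sparse + (1.0 - alpha) * dense_scores[doc_id]
        for doc_id, sparse in sparse_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for the candidates that BM25 returned for one query.
bm25 = {"d1": 12.0, "d2": 9.5, "d3": 7.0}
dense = {"d1": 0.62, "d2": 0.81, "d3": 0.74}

reranked = interpolate(bm25, dense, alpha=0.005)
sparse_only = interpolate(bm25, dense, alpha=1.0)

print([doc_id for doc_id, _ in reranked])
print([doc_id for doc_id, _ in sparse_only])
```

Because BM25 scores are typically much larger than the similarity scores of a dense encoder, a small alpha can still give the sparse score a noticeable weight. With `alpha=1.0` the ranking falls back to the BM25 order.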


#### BM25 irds:beir/nfcorpus

In [5]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/nfcorpus"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 5.415 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25,0.534378,0.322219,0.143582


#### BM25 >> bge-base-en-v1.5 irds:beir/nfcorpus

In [6]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1036451.26it/s]
GridScan: 100%|██████████| 4/4 [00:29<00:00,  7.40s/it]


Best map is 0.128253
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f67849514b0> alpha=0.005']
Experiment took 10.244 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25 >> bge-base-en-v1.5,0.578157,0.366086,0.16912
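
The GridScan step above picks the alpha with the best MAP on the dev set before the test run. A minimal sketch of that selection loop follows. The helper names, the candidate values, and the toy dev queries are all hypothetical; the notebook itself delegates this step to the helper functions in `experiment_utils`.

```python
def average_precision(ranking, relevant):
    """AP with binary relevance, normalised by the number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def rank_with_alpha(sparse, dense, alpha):
    """Rank documents by alpha * sparse + (1 - alpha) * dense, best first."""
    combined = {d: alpha * sparse[d] + (1.0 - alpha) * dense[d] for d in sparse}
    return sorted(combined, key=combined.get, reverse=True)


def tune_alpha(dev_queries, candidates):
    """Return (best_alpha, best_map) over the candidate values on the dev queries."""
    best_alpha, best_map = None, -1.0
    for alpha in candidates:
        aps = [
            average_precision(rank_with_alpha(q["sparse"], q["dense"], alpha), q["relevant"])
            for q in dev_queries
        ]
        mean_ap = sum(aps) / len(aps)
        if mean_ap > best_map:  # keep the first candidate that reaches the best MAP
            best_alpha, best_map = alpha, mean_ap
    return best_alpha, best_map


# Toy dev set (hypothetical scores and labels).
dev_queries = [
    {
        "sparse": {"d1": 12.0, "d2": 9.5, "d3": 7.0},
        "dense": {"d1": 0.62, "d2": 0.81, "d3": 0.74},
        "relevant": {"d2"},
    },
    {
        "sparse": {"d4": 11.0, "d5": 10.0},
        "dense": {"d4": 0.70, "d5": 0.72},
        "relevant": {"d4"},
    },
]
candidates = [0.001, 0.05, 0.5, 1.0]  # illustrative values, not the notebook's grid

best_alpha, best_map = tune_alpha(dev_queries, candidates)
print(f"Best map is {best_map:.6f}, best alpha is {best_alpha}")
```

The selected alpha is then fixed and applied to the test queries, so the test set plays no part in the tuning.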


#### BM25 irds:beir/fiqa

In [6]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 13.647 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25,0.310271,0.252589,0.20864


#### BM25 >> bge-base-en-v1.5 irds:beir/fiqa

In [7]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 330160.75it/s]
GridScan: 100%|██████████| 4/4 [01:53<00:00, 28.37s/it]


Best map is 0.336038
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f67af5b3cd0> alpha=0.005']
Experiment took 84.289 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25 >> bge-base-en-v1.5,0.482286,0.403686,0.341608


### For the Scidocs dataset and the other datasets below, no dev sets were available, so alpha was set to a default value of 0.05

#### BM25 irds:beir/scidocs

In [7]:
from experiment_utils.experiments_helper import test_first_stage_retrieval

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25"

test_first_stage_retrieval(dataset_name, test_topics, test_qrels, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 21.843 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scidocs: BM25,0.253743,0.147069,0.097278


#### BM25 >> bge-base-en-v1.5 irds:beir/scidocs

In [8]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


100%|██████████| 25657/25657 [00:00<00:00, 950445.65it/s]


Experiment took 135.130 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scidocs: BM25 >> bge-base-en-v1.5,0.280411,0.162646,0.110501


#### BM25 irds:beir/cqadupstack/english

In [6]:
from experiment_utils.experiments_helper import test_first_stage_retrieval

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25"

test_first_stage_retrieval(dataset_name, test_topics, test_qrels, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 30.998 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,cqadupstack_english: BM25,0.284848,0.279502,0.252386


#### BM25 >> bge-base-en-v1.5 irds:beir/cqadupstack/english

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


100%|██████████| 40221/40221 [00:00<00:00, 1046682.81it/s]


Experiment took 325.548 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,cqadupstack_english: BM25 >> bge-base-en-v1.5,0.343584,0.334487,0.30262


#### BM25 irds:beir/arguana

In [8]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 37.611 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,arguana: BM25,0.225617,0.342442,0.236988


#### BM25 >> bge-base-en-v1.5 irds:beir/arguana

#### Local runtime : x minutes

In [8]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root,
                           timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1217094.64it/s]


Experiment took 982.696 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,arguana: BM25 >> bge-base-en-v1.5,0.245394,0.368852,0.255896


#### BM25 irds:beir/scifact

In [13]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 5.821 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25,0.632427,0.672167,0.626749


#### BM25 >> bge-base-en-v1.5 irds:beir/scifact

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1281331.94it/s]


Experiment took 33.471 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> bge-base-en-v1.5,0.657052,0.695476,0.65038


### Is it correct to use the train set for alpha tuning, given that there is no dev set?

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 810071.46it/s]
GridScan: 100%|██████████| 4/4 [02:41<00:00, 40.33s/it]


Best map is 0.742524
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f3f6b0c3a90> alpha=0.001']
Experiment took 18.762 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> bge-base-en-v1.5,0.740409,0.768561,0.735928


#### BM25 irds:msmarco-passage/trec-dl-2019

In [7]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_test_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root,
                                timed=True,
                                in_memory_sparse=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)


Experiment took 15.203 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,passage_trec_dl_2019: BM25,0.79438,0.47954,0.290692


#### BM25 >> bge-base-en-v1.5 irds:msmarco-passage/trec-dl-2019

In [7]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, in_memory_sparse=False,
                           in_memory_dense=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

100%|██████████| 8841823/8841823 [00:11<00:00, 749690.13it/s] 


Experiment took 290.133 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,passage_trec_dl_2019: BM25 >> bge-base-en-v1.5,0.884496,0.553799,0.342877
