## In this notebook we showcase the experiments run with the FFI framework and the gte-base-en-v1.5 encoder on the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact

General setup for all the datasets (first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]
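
To make the three metrics concrete, the sketch below computes them by hand for a single toy query. The helper names (`reciprocal_rank`, `average_precision`, `ndcg`) and the toy ranking are illustrative assumptions; the experiments themselves rely on the `pyterrier.measures` objects above, whose implementation may differ in details such as tie handling.

```python
import math


def reciprocal_rank(ranked_ids, relevant, k=10):
    """RR@k: 1 / rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids, relevant, k=100):
    """AP@k: precision at each relevant hit in the top k, averaged over all relevant docs."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg(ranked_ids, gains, k=10):
    """nDCG@k with linear gains (doc_id -> relevance grade) and a log2 discount."""
    dcg = sum(gains.get(doc_id, 0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(ranked_ids[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1) for rank, gain in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data (made up): a ranking of four documents and graded relevance labels.
ranking = ["d3", "d1", "d7", "d2"]
gains = {"d1": 2, "d2": 1}
relevant = {doc_id for doc_id, grade in gains.items() if grade > 0}

print("RR@10  ", reciprocal_rank(ranking, relevant))
print("AP@100 ", average_precision(ranking, relevant))
print("nDCG@10", round(ndcg(ranking, gains), 4))
```

The values reported in the tables of this notebook are these per-query scores averaged over all test queries.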

Create the query encoder, which will run on the CPU. This encoder is used to embed all the datasets and queries.

In [3]:
from encoders.gte_base_en_encoder import GTEBaseDocumentEncoder

package = "Alibaba-NLP/"
model_name = "gte-base-en-v1.5"
q_encoder = GTEBaseDocumentEncoder(package + model_name)


Define the path to the root directory

In [4]:
path_to_root = "../../"

## In the following cells, for each dataset we will run 2 tests using:

* BM25 sparse retrieval
* FFI using BM25 for first-stage retrieval and the gte-base-en-v1.5 model for second-stage retrieval
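
The second stage can be pictured as a linear interpolation of the sparse (BM25) and dense (encoder) scores, weighted by alpha. The sketch below is a minimal illustration under that assumption: `interpolate` and `rerank` are hypothetical helpers and the scores are made up; in the experiments this step is handled by the FFI library (the `FFInterpolate` object that appears in the outputs below).

```python
def interpolate(sparse_scores, dense_scores, alpha):
    """Return doc_id -> alpha * sparse + (1 - alpha) * dense for every first-stage candidate."""
    return {
        doc_id: alpha * sparse + (1 - alpha) * dense_scores[doc_id]
        for doc_id, sparse in sparse_scores.items()
    }


def rerank(sparse_scores, dense_scores, alpha):
    """Re-rank the first-stage candidates by their interpolated score (highest first)."""
    combined = interpolate(sparse_scores, dense_scores, alpha)
    return sorted(combined, key=combined.get, reverse=True)


# Toy scores (made up) for three candidates returned by the first stage.
bm25_scores = {"d1": 12.0, "d2": 9.5, "d3": 7.0}
dense_scores = {"d1": 0.30, "d2": 0.80, "d3": 0.55}

for alpha in (1.0, 0.05, 0.0):
    print(alpha, rerank(bm25_scores, dense_scores, alpha))
```

With alpha = 1 the ranking is the pure BM25 order, with alpha = 0 it is the pure dense order, and small values such as 0.05 let the dense scores dominate because BM25 scores live on a much larger scale.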


#### BM25 irds:beir/nfcorpus

In [5]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/nfcorpus"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 3.586 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | nfcorpus: BM25 | 0.534378 | 0.322219 | 0.143582 |


#### BM25 >> gte-base-en-v1.5 irds:beir/nfcorpus

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1669541.63it/s]
GridScan: 100%|██████████| 4/4 [00:24<00:00,  6.16s/it]


Best map is 0.126401
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fd97302ebf0> alpha=0.01']
Experiment took 7.976 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | nfcorpus: BM25 >> gte-base-en-v1.5 | 0.582751 | 0.364177 | 0.166036 |
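
The `GridScan` and `Best map` lines above suggest that alpha is chosen by trying a handful of candidate values on the dev set and keeping the one with the highest MAP. The sketch below illustrates that idea on toy data. The candidate values, the query data, and the helper names are assumptions made for illustration only; they are not the grid or the code used by `default_test_pipeline_name`.

```python
def average_precision(ranked_ids, relevant):
    """AP: precision at each relevant hit, averaged over all relevant docs."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_ap(dev_queries, alpha):
    """MAP over the dev queries after interpolating sparse and dense scores with alpha."""
    ap_values = []
    for query in dev_queries:
        combined = {
            doc_id: alpha * sparse + (1 - alpha) * query["dense"][doc_id]
            for doc_id, sparse in query["sparse"].items()
        }
        ranking = sorted(combined, key=combined.get, reverse=True)
        ap_values.append(average_precision(ranking, query["relevant"]))
    return sum(ap_values) / len(ap_values)


# Hypothetical candidate grid and toy dev queries (all values made up).
candidate_alphas = (0.001, 0.01, 0.1, 1.0)
dev_queries = [
    {"sparse": {"a": 10.0, "b": 8.0, "c": 6.0},
     "dense": {"a": 0.2, "b": 0.9, "c": 0.4},
     "relevant": {"b"}},
    {"sparse": {"x": 12.0, "y": 5.0},
     "dense": {"x": 0.5, "y": 0.6},
     "relevant": {"x"}},
]

scores = {alpha: mean_ap(dev_queries, alpha) for alpha in candidate_alphas}
best_alpha = max(scores, key=scores.get)
print("Best map is", scores[best_alpha])
print("Best alpha is", best_alpha)
```

The selected alpha is then fixed and used once on the test queries, so the test set never influences the choice.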


#### BM25 irds:beir/fiqa

In [6]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 13.647 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | fiqa: BM25 | 0.310271 | 0.252589 | 0.20864 |


#### BM25 >> gte-base-en-v1.5 irds:beir/fiqa

### FiQA provides a dev set, so alpha is tuned on it here. For the Scidocs dataset and the other datasets below, dev sets were not available and alpha was set to a default of 0.05

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 1485159.51it/s]
GridScan: 100%|██████████| 4/4 [01:33<00:00, 23.34s/it]


Best map is 0.369902
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fd972d16fb0> alpha=0.001']
Experiment took 56.353 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | fiqa: BM25 >> gte-base-en-v1.5 | 0.562507 | 0.474643 | 0.41231 |


#### BM25 irds:beir/scidocs

In [7]:
from experiment_utils.experiments_helper import test_first_stage_retrieval

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25"

test_first_stage_retrieval(dataset_name, test_topics, test_qrels, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 21.843 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scidocs: BM25 | 0.253743 | 0.147069 | 0.097278 |


#### BM25 >> gte-base-en-v1.5 irds:beir/scidocs

In [11]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


100%|██████████| 25657/25657 [00:00<00:00, 1068005.06it/s]


Experiment took 150.884 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scidocs: BM25 >> gte-base-en-v1.5 | 0.290996 | 0.168889 | 0.114767 |


#### BM25 irds:beir/cqadupstack/english

In [6]:
from experiment_utils.experiments_helper import test_first_stage_retrieval

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25"

test_first_stage_retrieval(dataset_name, test_topics, test_qrels, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 30.998 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | cqadupstack_english: BM25 | 0.284848 | 0.279502 | 0.252386 |


#### BM25 >> gte-base-en-v1.5 irds:beir/cqadupstack/english

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset("irds:beir/cqadupstack/english")
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True, alpha=0.05)


100%|██████████| 40221/40221 [00:00<00:00, 250622.62it/s]


Experiment took 269.557 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | cqadupstack_english: BM25 >> gte-base-en-v1.5 | 0.360818 | 0.351231 | 0.320657 |


#### BM25 irds:beir/arguana

In [8]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 37.611 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | arguana: BM25 | 0.225617 | 0.342442 | 0.236988 |


#### BM25 >> gte-base-en-v1.5 irds:beir/arguana

#### DelftBlue runtime: 15 minutes. Local runtime: 20 minutes

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root,
                           timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1156874.62it/s]


Experiment took 1077.244 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | arguana: BM25 >> gte-base-en-v1.5 | 0.250022 | 0.376302 | 0.260735 |


#### BM25 irds:beir/scifact

In [13]:
from experiment_utils.experiments_helper import test_first_stage_retrieval_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
pipeline_name = "BM25"

test_first_stage_retrieval_name(dataset_name, dataset_test_name, eval_metrics, pipeline_name, path_to_root, timed=True)

Experiment took 5.821 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scifact: BM25 | 0.632427 | 0.672167 | 0.626749 |


#### BM25 >> gte-base-en-v1.5 irds:beir/scifact

In [14]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1567571.22it/s]


Experiment took 22.114 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scifact: BM25 >> gte-base-en-v1.5 | 0.669475 | 0.708775 | 0.664073 |
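
To compare the two pipelines across the six BEIR datasets at a glance, the sketch below computes the relative nDCG@10 gain of the FFI pipeline over BM25. The numbers are copied from the result tables above; `relative_gain` is a hypothetical helper written for this summary and is not part of `experiment_utils`.

```python
# nDCG@10 copied from the tables above: (BM25, BM25 >> gte-base-en-v1.5).
# The scifact entry uses the run without alpha tuning.
ndcg_at_10 = {
    "nfcorpus": (0.322219, 0.364177),
    "fiqa": (0.252589, 0.474643),
    "scidocs": (0.147069, 0.168889),
    "cqadupstack_english": (0.279502, 0.351231),
    "arguana": (0.342442, 0.376302),
    "scifact": (0.672167, 0.708775),
}


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


gains = {name: relative_gain(bm25, ffi) for name, (bm25, ffi) in ndcg_at_10.items()}

# Print the datasets from largest to smallest relative gain.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:22s} {gain:6.2f}%")
```

The same dictionary can be filled with the RR@10 or AP@100 columns to repeat the comparison for the other metrics.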


# Is it correct to use the train set for alpha tuning given that we do not have a dev set?

In [15]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1188057.58it/s]
GridScan: 100%|██████████| 4/4 [02:47<00:00, 41.93s/it]


Best map is 0.723258
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f52eac5c3a0> alpha=0.005']
Experiment took 20.988 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | scifact: BM25 >> gte-base-en-v1.5 | 0.719434 | 0.759086 | 0.71864 |


#### BM25 >> gte-base-en-v1.5 irds:msmarco-passage/trec-dl-2019

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, in_memory_sparse=False, in_memory_dense=False,
                           index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

100%|██████████| 8841823/8841823 [00:54<00:00, 163074.65it/s]


Experiment took 728.144 seconds to execute.


|  | name | RR@10 | nDCG@10 | AP@100 |
|---|---|---|---|---|
| 0 | passage_trec_dl_2019: BM25 >> gte-base-en-v1.5 | 0.930233 | 0.606804 | 0.366907 |
