## In this notebook we will showcase the experiments realised using the FFI framework with the gte-base-en-1.5 encoder for the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact

General Setup for all the datasets( first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]

Create the query encoder that will run on CPU. Encoder used for embedding all the datasets/queries

In [3]:
from encoders.gte_base_en_encoder import GTEBaseEncoder

package = "Alibaba-NLP/"
model_name = "gte-base-en-v1.5"
q_encoder = GTEBaseEncoder("Alibaba-NLP/gte-base-en-v1.5")


Defined the path to root

In [4]:
path_to_root = "../../"

#### BM25 >> gte-base-en-v1.5 irds:beir/nfcorpus

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 1669541.63it/s]
GridScan: 100%|██████████| 4/4 [00:24<00:00,  6.16s/it]


Best map is 0.126401
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fd97302ebf0> alpha=0.01']
Experiment took 7.976 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25 >> gte-base-en-v1.5,0.582751,0.364177,0.166036


#### BM25 >> gte-base-en-v1.5 irds:beir/fiqa

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 1485159.51it/s]
GridScan: 100%|██████████| 4/4 [01:33<00:00, 23.34s/it]


Best map is 0.369902
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7fd972d16fb0> alpha=0.001']
Experiment took 56.353 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25 >> gte-base-en-v1.5,0.562507,0.474643,0.41231


#### BM25 >> gte-base-en-v1.5 irds:beir/scidocs

In [6]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True)


100%|██████████| 25657/25657 [00:00<00:00, 817910.16it/s]


Experiment took 87.282 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scidocs: BM25 >> gte-base-en-v1.5,0.359105,0.212301,0.147581


#### BM25 >> gte-base-en-v1.5 irds:beir/cqadupstack/english

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset("irds:beir/cqadupstack/english")
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True)


100%|██████████| 40221/40221 [00:00<00:00, 1638046.19it/s]


Experiment took 179.060 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,cqadupstack_english: BM25 >> gte-base-en-v1.5,0.4057,0.394129,0.359377


#### BM25 >> gte-base-en-v1.5 irds:beir/arguana

#### DelftBlue runtime : 15 minutes. Local runtime : 20 minutes

In [7]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root,
                           timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1273679.91it/s]


Experiment took 1095.980 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,arguana: BM25 >> gte-base-en-v1.5,0.304558,0.444803,0.310985


#### BM25 >> gte-base-en-v1.5 irds:beir/scifact

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1273973.14it/s]
GridScan: 100%|██████████| 4/4 [03:06<00:00, 46.62s/it]


Best map is 0.723258
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f8f50bb83d0> alpha=0.005']
Experiment took 21.415 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> gte-base-en-v1.5,0.719434,0.759086,0.71864


#### BM25 >> gte-base-en-v1.5 irds:msmarco-passage/trec-dl-2019

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name,pipeline_name,path_to_root, timed=True, in_memory_sparse=False,
in_memory_dense=False, index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

100%|██████████| 8841823/8841823 [00:12<00:00, 720781.43it/s] 


Experiment took 290.126 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,passage_trec_dl_2019: BM25 >> gte-base-en-v1.5,0.984496,0.718658,0.422681


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0,9]

for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print(result)

100%|██████████| 5183/5183 [00:00<00:00, 1110099.46it/s]


Experiment took 87.996 seconds to execute.
                                name    RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.70743  0.739848  0.704782


100%|██████████| 5183/5183 [00:00<00:00, 1042691.62it/s]


Experiment took 84.112 seconds to execute.
                                name     RR@10  nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.728565  0.76019  0.727808


100%|██████████| 5183/5183 [00:00<00:00, 1026299.58it/s]


Experiment took 74.502 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.720289  0.753616  0.718874


100%|██████████| 5183/5183 [00:00<00:00, 1291149.11it/s]


Experiment took 74.330 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.705533  0.742014  0.702155


100%|██████████| 5183/5183 [00:00<00:00, 495850.50it/s]


Experiment took 71.971 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.680539  0.715878  0.676222


100%|██████████| 5183/5183 [00:00<00:00, 1415949.82it/s]


Experiment took 75.200 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.661715  0.696382  0.657038


100%|██████████| 5183/5183 [00:00<00:00, 586433.17it/s]


Experiment took 74.257 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.653517  0.687996  0.648555


100%|██████████| 5183/5183 [00:00<00:00, 1117116.01it/s]


Experiment took 73.727 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.646598  0.681191  0.642089


100%|██████████| 5183/5183 [00:00<00:00, 1271068.10it/s]


Experiment took 70.629 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.643701  0.678644  0.639047


100%|██████████| 5183/5183 [00:00<00:00, 1169333.42it/s]


Experiment took 69.514 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.689084  0.722969  0.685904


100%|██████████| 5183/5183 [00:00<00:00, 307461.67it/s]


Experiment took 70.500 seconds to execute.
                                name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> gte-base-en-v1.5  0.641956  0.675832  0.637287


In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print( result["nDCG@10"].iloc[0])
    if  result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha
print("Results")
print(maxi)
print(max_alpha)

100%|██████████| 5183/5183 [00:00<00:00, 499301.26it/s]


In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dataset_test_name = "irds:beir/fiqa/dev"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name,
                                        pipeline_name,
                                        path_to_root, alpha=alpha)

    if result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha
print("RESULTS")
print(maxi)
print(max_alpha)

In [None]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dataset_test_name = "irds:beir/nfcorpus/dev"  #choosen as dev is not available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name,
                                        pipeline_name,
                                        path_to_root, alpha=alpha)

    if result["nDCG@10"].iloc[0] > maxi:
        maxi = result["nDCG@10"].iloc[0]
        max_alpha = alpha
print("RESULTS")
print(maxi)
print(max_alpha)