## This notebook showcases the experiments run with the FFI framework (the `fast_forward` package) and the snowflake-arctic-embed-m encoder on the following datasets:

* irds:beir/nfcorpus
* irds:beir/fiqa
* irds:beir/scidocs
* irds:beir/cqadupstack/english
* irds:beir/arguana
* irds:beir/scifact
* irds:msmarco-passage/trec-dl-2019

General setup for all the datasets (first 4 cells)

In [1]:
import pyterrier as pt

if not pt.started():
    pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.9 (built by craigm on 2024-05-02 17:40) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


Evaluation metrics used for all the datasets

In [2]:
from pyterrier.measures import RR, nDCG, MAP

eval_metrics = [RR @ 10, nDCG @ 10, MAP @ 100]
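
To make the metrics above more concrete, here is a minimal pure-Python sketch of how RR@10 and nDCG@10 can be computed for a single query. The helper names `reciprocal_rank` and `ndcg` are hypothetical and are not part of PyTerrier; the nDCG variant shown (linear gains, log2 discount) is one common formulation and is assumed here, so values may differ slightly from what `pyterrier.measures` reports.

```python
import math


def reciprocal_rank(ranked_doc_ids, relevant_ids, k=10):
    """Return 1/rank of the first relevant document within the top k, else 0.0."""
    for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg(ranked_doc_ids, gains, k=10):
    """nDCG@k with linear gains and a log2 rank discount.

    `gains` maps doc_id -> graded relevance; unjudged documents count as 0.
    """
    dcg = sum(
        gains.get(doc_id, 0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_doc_ids[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0


# Toy ranking (made-up data): the only relevant document sits at rank 2.
print(reciprocal_rank(["d3", "d1", "d2"], {"d1"}))  # 0.5
print(ndcg(["d3", "d1", "d2"], {"d1": 1}))
```

The numbers reported in the result tables are these per-query values averaged over all test queries.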

Create the query encoder, which runs on the CPU. The same encoder is used to embed the queries of all the datasets.

In [3]:
from encoders.snowflake_arctic_embed_m import SnowFlakeQueryEncoder

package = "Snowflake/"
model_name = "snowflake-arctic-embed-m"
q_encoder = SnowFlakeQueryEncoder(package + model_name)


Define the path to the project root

In [4]:
path_to_root = "../../"

#### BM25 >> snowflake-arctic-embed-m irds:beir/nfcorpus

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/nfcorpus"
dev_set_name = "irds:beir/nfcorpus/dev"
dataset_test_name = "irds:beir/nfcorpus/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 3633/3633 [00:00<00:00, 352868.18it/s]
GridScan: 100%|██████████| 4/4 [00:30<00:00,  7.63s/it]


Best map is 0.130506
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f9a12fcbf70> alpha=0.005']
Experiment took 10.047 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,nfcorpus: BM25 >> snowflake-arctic-embed-m,0.588965,0.36285,0.165687
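
The `alpha` in the "Best setting" line above is the interpolation weight tuned on the dev set. As a rough illustration of what such an interpolation does, the sketch below combines a sparse and a dense score per document as `alpha * sparse + (1 - alpha) * dense`. This formula is assumed from the Fast-Forward indexes formulation and is not copied from the `FFInterpolate` source; the function names and all scores are made up for the example.

```python
def interpolate(sparse_scores, dense_scores, alpha):
    """Combine per-document scores as alpha * sparse + (1 - alpha) * dense."""
    return {
        doc_id: alpha * sparse_scores[doc_id] + (1 - alpha) * dense_scores[doc_id]
        for doc_id in sparse_scores
    }


def rerank(sparse_scores, dense_scores, alpha):
    """Return doc ids sorted by interpolated score, best first."""
    scores = interpolate(sparse_scores, dense_scores, alpha)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up scores: BM25 prefers d1, the dense encoder prefers d2.
bm25 = {"d1": 12.0, "d2": 9.0}
dense = {"d1": 0.30, "d2": 0.80}

print(rerank(bm25, dense, alpha=0.005))  # ['d2', 'd1']
print(rerank(bm25, dense, alpha=0.5))    # ['d1', 'd2']
```

In this toy example the BM25 scores are on a much larger scale than the dense similarities, so only a small `alpha` lets the dense signal influence the ranking. That may be one reason the grid search selects values such as 0.005 here, although the notebook itself does not confirm this.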


#### BM25 >> snowflake-arctic-embed-m irds:beir/fiqa

In [9]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/fiqa"
dev_set_name = "irds:beir/fiqa/dev"
dataset_test_name = "irds:beir/fiqa/test"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 57638/57638 [00:00<00:00, 1540386.22it/s]
GridScan: 100%|██████████| 4/4 [01:28<00:00, 22.01s/it]


Best map is 0.352237
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f68e3d1a530> alpha=0.001']
Experiment took 47.453 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,fiqa: BM25 >> snowflake-arctic-embed-m,0.50829,0.423863,0.364366


#### BM25 >> snowflake-arctic-embed-m irds:beir/scidocs

In [5]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/scidocs"
dataset = pt.get_dataset(dataset_name)
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name,
                      path_to_root, timed=True)


100%|██████████| 25657/25657 [00:00<00:00, 614623.83it/s]


Experiment took 84.253 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scidocs: BM25 >> snowflake-arctic-embed-m,0.335683,0.198038,0.136153


#### BM25 >> snowflake-arctic-embed-m irds:beir/cqadupstack/english

In [6]:
from experiment_utils.experiments_helper import default_test_pipeline

dataset_name = "irds:beir/cqadupstack/english"
dataset = pt.get_dataset("irds:beir/cqadupstack/english")
test_topics = dataset.get_topics('text')
test_qrels = dataset.get_qrels()
pipeline_name = "BM25 >> " + model_name

default_test_pipeline(dataset_name, test_topics, test_qrels, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root, timed=True)


100%|██████████| 40221/40221 [00:00<00:00, 904805.10it/s]


Experiment took 192.824 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,cqadupstack_english: BM25 >> snowflake-arctic-...,0.426943,0.414886,0.381229


#### BM25 >> snowflake-arctic-embed-m irds:beir/arguana

#### Local runtime: 13 minutes

In [8]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/arguana"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root, timed=True)

100%|██████████| 8674/8674 [00:00<00:00, 1051090.43it/s]


Experiment took 790.915 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,arguana: BM25 >> snowflake-arctic-embed-m,0.257686,0.384756,0.268826


#### BM25 >> snowflake-arctic-embed-m irds:beir/scifact

In [10]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/test"
dev_set_name = "irds:beir/scifact/train"
pipeline_name = "BM25 >> " + model_name

default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name, path_to_root, dev_set_name=dev_set_name, timed=True)

100%|██████████| 5183/5183 [00:00<00:00, 1442731.46it/s]
GridScan: 100%|██████████| 4/4 [02:48<00:00, 42.04s/it]


Best map is 0.715182
Best setting is ['<fast_forward.util.pyterrier.FFInterpolate object at 0x7f9977f9e230> alpha=0.005']
Experiment took 20.137 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,scifact: BM25 >> snowflake-arctic-embed-m,0.706074,0.748149,0.69906


#### BM25 >> snowflake-arctic-embed-m irds:msmarco-passage/trec-dl-2019

In [11]:
from experiment_utils.experiments_helper import default_test_pipeline_name
from config.sparse_pyterrier_index_global_paths import MSMARCO_PASSAGE_TERRIER_STEMMED_PATH

dataset_test_name = "irds:msmarco-passage/trec-dl-2019"
pipeline_name = "BM25 >> " + model_name
default_test_pipeline_name(dataset_test_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, in_memory_sparse=False, in_memory_dense=False,
                           index_path=MSMARCO_PASSAGE_TERRIER_STEMMED_PATH)

100%|██████████| 8841823/8841823 [00:11<00:00, 778834.60it/s] 


Experiment took 378.238 seconds to execute.


Unnamed: 0,name,RR@10,nDCG@10,AP@100
0,passage_trec_dl_2019: BM25 >> snowflake-arctic...,0.907752,0.691548,0.426755


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  # chosen because no dev split is available
pipeline_name = "BM25 >> " + model_name
alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7]

for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    print(result)

100%|██████████| 5183/5183 [00:00<00:00, 1263973.35it/s]


Experiment took 165.219 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.725594  0.750758  0.718244


100%|██████████| 5183/5183 [00:00<00:00, 1082731.23it/s]


Experiment took 68.061 seconds to execute.
                                        name     RR@10  nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.726893   0.7563  0.719885


100%|██████████| 5183/5183 [00:00<00:00, 1241949.13it/s]


Experiment took 56.256 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.720714  0.752778  0.716091


100%|██████████| 5183/5183 [00:00<00:00, 1186566.11it/s]


Experiment took 58.148 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.699535  0.734031  0.696262


100%|██████████| 5183/5183 [00:00<00:00, 1583442.18it/s]


Experiment took 53.250 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.672977  0.708186  0.668209


100%|██████████| 5183/5183 [00:00<00:00, 1397472.21it/s]


Experiment took 54.070 seconds to execute.
                                        name    RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.65313  0.689284  0.648537


100%|██████████| 5183/5183 [00:00<00:00, 1400713.76it/s]


Experiment took 56.283 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.647739  0.683547  0.643187


100%|██████████| 5183/5183 [00:00<00:00, 1673910.65it/s]


Experiment took 53.878 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.644956  0.679609  0.640381


100%|██████████| 5183/5183 [00:00<00:00, 1661120.01it/s]


Experiment took 53.181 seconds to execute.
                                        name    RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.64323  0.678266  0.638551


100%|██████████| 5183/5183 [00:00<00:00, 1636609.02it/s]


Experiment took 53.883 seconds to execute.
                                        name    RR@10  nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.71706  0.74384  0.709463


100%|██████████| 5183/5183 [00:00<00:00, 1411628.42it/s]


Experiment took 55.038 seconds to execute.
                                        name     RR@10   nDCG@10    AP@100
0  scifact: BM25 >> snowflake-arctic-embed-m  0.642105  0.676105  0.637102


In [5]:
from experiment_utils.experiments_helper import default_test_pipeline_name

dataset_name = "irds:beir/scifact"
dataset_test_name = "irds:beir/scifact/train"  # chosen because no dev split is available
pipeline_name = "BM25 >> " + model_name
# alpha_vals = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 0.7, 0.9]
# NOTE: 0.6 is listed twice and 0.7 is missing, so this sweep evaluates alpha=0.6 twice.
alpha_vals = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.6, 0.8, 0.9]
maxi = 0
max_alpha = 0
for alpha in alpha_vals:
    result = default_test_pipeline_name(dataset_name, dataset_test_name, q_encoder, eval_metrics, model_name, pipeline_name,
                           path_to_root, timed=True, alpha=alpha)
    ndcg_at_10 = result["nDCG@10"].iloc[0]
    print(ndcg_at_10)
    # Keep the alpha with the highest nDCG@10 seen so far
    if ndcg_at_10 > maxi:
        maxi = ndcg_at_10
        max_alpha = alpha

print(max_alpha)

100%|██████████| 5183/5183 [00:00<00:00, 328498.99it/s]


Experiment took 199.191 seconds to execute.
0.7497777473513524


100%|██████████| 5183/5183 [00:00<00:00, 1415673.20it/s]


Experiment took 232.174 seconds to execute.
0.7510823123153572


100%|██████████| 5183/5183 [00:00<00:00, 1268398.25it/s]


Experiment took 115.838 seconds to execute.
0.7542933732232102


100%|██████████| 5183/5183 [00:00<00:00, 1432558.66it/s]


Experiment took 75.016 seconds to execute.
0.7559668749099717


100%|██████████| 5183/5183 [00:00<00:00, 945876.41it/s]


Experiment took 111.058 seconds to execute.
0.7559835898788692


100%|██████████| 5183/5183 [00:00<00:00, 1240814.93it/s]


Experiment took 67.512 seconds to execute.
0.752606838240911


100%|██████████| 5183/5183 [00:00<00:00, 861875.18it/s]


Experiment took 68.227 seconds to execute.
0.752606838240911


100%|██████████| 5183/5183 [00:00<00:00, 1630227.04it/s]


Experiment took 57.719 seconds to execute.
0.7242708293674179


100%|██████████| 5183/5183 [00:00<00:00, 1353616.29it/s]


Experiment took 56.455 seconds to execute.
0.7040731421619785
0.5
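
The manual loop above is a one-dimensional grid search over `alpha`. A compact way to express the same selection is sketched below. The helper `select_alpha` is hypothetical, and `evaluate` stands in for running the full pipeline. To keep the example self-contained, it looks up the nDCG@10 values printed above (rounded to six decimals), assuming they appear in the same order as `alpha_vals`.

```python
def select_alpha(evaluate, alphas):
    """Return (best_alpha, best_score); the first alpha wins on ties."""
    best_alpha, best_score = None, float("-inf")
    for alpha in alphas:
        score = evaluate(alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


# nDCG@10 per alpha, copied (rounded) from the sweep output above.
recorded_ndcg = {
    0.1: 0.749778,
    0.2: 0.751082,
    0.3: 0.754293,
    0.4: 0.755967,
    0.5: 0.755984,
    0.6: 0.752607,
    0.8: 0.724271,
    0.9: 0.704073,
}

best_alpha, best_score = select_alpha(recorded_ndcg.get, recorded_ndcg)
print(best_alpha, best_score)  # 0.5 0.755984
```

The gap between `alpha=0.4` and `alpha=0.5` is tiny in these numbers, so the choice between them should probably not be read as significant.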
