#### **BEIR BM25**

This notebook uses the open-source Elasticsearch BM25 implementation to establish a baseline on the BEIR benchmark.
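For orientation, the textbook Okapi BM25 scoring function can be sketched in a few lines. This is an illustrative toy, not what Elasticsearch runs internally: the function name `bm25_scores`, the pre-tokenized inputs, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example. Lucene's implementation differs in details such as text analysis and how document lengths are stored.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every tokenized document against a tokenized query (textbook BM25)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Made-up toy corpus, already tokenized.
toy_docs = [
    ["covid", "vaccine", "trial"],
    ["stock", "market", "news"],
    ["covid", "covid", "spread"],
]
toy_scores = bm25_scores(["covid"], toy_docs)
```

The document that repeats the query term scores highest, and the document without it scores zero. Term-frequency saturation (controlled by `k1`) keeps the repeated term from counting twice as much.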

##### Download Elasticsearch

In [None]:
!wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.0-linux-x86_64.tar.gz -q
!tar -xzf elasticsearch-7.0.0-linux-x86_64.tar.gz
!chown -R daemon:daemon elasticsearch-7.0.0
# start server
import os
from subprocess import Popen, PIPE, STDOUT
es_server = Popen(['elasticsearch-7.0.0/bin/elasticsearch'], 
                  stdout=PIPE, stderr=STDOUT,
                  preexec_fn=lambda: os.setuid(1)  # as daemon
                 )

In [None]:
import time
time.sleep(20)
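A fixed 20-second sleep can be too short on a slow machine and wasteful on a fast one. One alternative is to poll the HTTP endpoint until it answers. The sketch below uses only the standard library; `is_up` and `wait_until` are hypothetical helper names, and the timeout values are assumptions.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if an HTTP GET on `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet".
        return False


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call `probe()` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Possible use in this notebook (not executed here):
# ready = wait_until(lambda: is_up("http://localhost:9200/"), timeout=60)
```

`wait_until` takes the probe as a callable, so the same loop can wait on any readiness condition.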

In [None]:
# Wait a bit, then test; the response should include "cluster_name" : "elasticsearch" ...
!curl -X GET "localhost:9200/"

{
  "name" : "fc060de88c38",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "6xWl5Pn4RtGiI9rl-Vo2FA",
  "version" : {
    "number" : "7.0.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "b7e28a7",
    "build_date" : "2019-04-05T22:55:32.697037Z",
    "build_snapshot" : false,
    "lucene_version" : "8.0.0",
    "minimum_wire_compatibility_version" : "6.7.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}


##### Run on BEIR - Sample

In [None]:
!pip install -q beir

In [None]:
from beir import util, LoggingHandler
import logging
# Code to print debug information to stdout
logging.basicConfig(format='%(asctime)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                    level=logging.INFO,
                    handlers=[LoggingHandler()])



In [None]:
dataset = "trec-covid"

In [None]:
from beir.datasets.data_loader import GenericDataLoader

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
out_dir = os.path.join(os.getcwd(), "datasets")
data_path = util.download_and_unzip(url, out_dir)
print("Dataset downloaded here: {}".format(data_path))

# Load the dataset into BEIR
data_path = f"datasets/{dataset}"
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")

2021-11-23 03:57:01 - Downloading trec-covid.zip ...


/content/datasets/trec-covid.zip:   0%|          | 0.00/70.5M [00:00<?, ?iB/s]

2021-11-23 03:57:16 - Unzipping trec-covid.zip ...
Dataset downloaded here: /content/datasets/trec-covid
2021-11-23 03:57:18 - Loading Corpus...


  0%|          | 0/171332 [00:00<?, ?it/s]

2021-11-23 03:57:20 - Loaded 171332 TEST Documents.
2021-11-23 03:57:20 - Doc Example: {'text': 'OBJECTIVE: This retrospective chart review describes the epidemiology and clinical features of 40 patients with culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia. METHODS: Patients with positive M. pneumoniae cultures from respiratory specimens from January 1997 through December 1998 were identified through the Microbiology records. Charts of patients were reviewed. RESULTS: 40 patients were identified, 33 (82.5%) of whom required admission. Most infections (92.5%) were community-acquired. The infection affected all age groups but was most common in infants (32.5%) and pre-school children (22.5%). It occurred year-round but was most common in the fall (35%) and spring (30%). More than three-quarters of patients (77.5%) had comorbidities. Twenty-four isolates (60%) were associated with pneumonia, 14 (35%) with upper respiratory tract 
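The loader returns three plain dictionaries. The sketch below shows their shapes with made-up toy values: `corpus` maps a document id to a dict with `title` and `text`, `queries` maps a query id to its text, and `qrels` maps a query id to a dict of document id and relevance label. The `toy_*` names and the `check_qrels` helper are hypothetical and are not part of BEIR.

```python
# Toy data in the same shape as the loaded corpus, queries, and qrels.
toy_corpus = {
    "d1": {"title": "Mycoplasma pneumoniae in children", "text": "A chart review ..."},
    "d2": {"title": "", "text": "An unrelated passage ..."},
}
toy_queries = {"q1": "what causes pneumonia in infants"}
toy_qrels = {"q1": {"d1": 2, "d2": 0}}


def check_qrels(corpus, queries, qrels):
    """Return (query_id, doc_id) pairs in qrels that reference unknown ids."""
    dangling = []
    for query_id, judged in qrels.items():
        for doc_id in judged:
            if query_id not in queries or doc_id not in corpus:
                dangling.append((query_id, doc_id))
    return dangling


dangling_pairs = check_qrels(toy_corpus, toy_queries, toy_qrels)
```

A check like this is cheap to run before indexing and catches mismatched ids early.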

In [None]:
from beir.retrieval.search.lexical import BM25Search as BM25
from beir.retrieval.evaluation import EvaluateRetrieval

#### Provide parameters for Elasticsearch
hostname = "localhost"
index_name = dataset
initialize = True # True, will delete existing index with same name and reindex all documents

model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
retriever = EvaluateRetrieval(model)

#### Retrieve BM25 (lexical) results (format of results is identical to qrels)
results = retriever.retrieve(corpus, queries)

2021-11-23 03:57:21 - Loading faiss with AVX2 support.
2021-11-23 03:57:21 - Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2021-11-23 03:57:21 - Loading faiss.
2021-11-23 03:57:21 - Successfully loaded faiss.
2021-11-23 03:57:21 - Activating Elasticsearch....
2021-11-23 03:57:21 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'trec-covid', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-23 03:57:21 - Deleting previous Elasticsearch-Index named - trec-covid
2021-11-23 03:57:21 - Creating fresh Elasticsearch-Index named - trec-covid


  0%|          | 0/171332 [00:00<?, ?docs/s]
que: 100%|██████████| 1/1 [00:05<00:00,  5.94s/it]


In [None]:
import json
# Save the scores of the top retrieved docs (up to 1000) for each query
with open(f"./results_{dataset}.json", 'w') as fp:
    json.dump(results, fp)
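Because the results share the nested shape of the qrels (query id, then document id, then score), they survive a JSON round trip unchanged. The sketch below demonstrates this with made-up scores and a temporary file; `toy_results` and `top_k` are hypothetical names used only for illustration.

```python
import json
import os
import tempfile

# Made-up scores in the same nested shape as the retrieval results.
toy_results = {
    "q1": {"d3": 12.7, "d1": 9.4, "d2": 0.8},
    "q2": {"d2": 5.1},
}


def top_k(results, query_id, k):
    """Return the k highest-scoring (doc_id, score) pairs for one query."""
    ranked = sorted(results[query_id].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results_toy.json")
    with open(path, "w") as fp:
        json.dump(toy_results, fp)
    with open(path) as fp:
        toy_results_loaded = json.load(fp)

best_two = top_k(toy_results_loaded, "q1", 2)
```

JSON object keys are always strings, so this round trip is only lossless when the ids are already strings, as they are in the toy data.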

In [None]:
#### Evaluate your retrieval using NDCG@k, MAP@K ...
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - NDCG@1: 0.8200
2021-11-23 03:58:12 - NDCG@3: 0.7623
2021-11-23 03:58:12 - NDCG@5: 0.7249
2021-11-23 03:58:12 - NDCG@10: 0.6880
2021-11-23 03:58:12 - NDCG@100: 0.4845
2021-11-23 03:58:12 - NDCG@1000: 0.4330
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - MAP@1: 0.0023
2021-11-23 03:58:12 - MAP@3: 0.0062
2021-11-23 03:58:12 - MAP@5: 0.0096
2021-11-23 03:58:12 - MAP@10: 0.0170
2021-11-23 03:58:12 - MAP@100: 0.0852
2021-11-23 03:58:12 - MAP@1000: 0.2084
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - Recall@1: 0.0023
2021-11-23 03:58:12 - Recall@3: 0.0064
2021-11-23 03:58:12 - Recall@5: 0.0103
2021-11-23 03:58:12 - Recall@10: 0.0191
2021-11-23 03:58:12 - Recall@100: 0.1173
2021-11-23 03:58:12 - Recall@1000: 0.4128
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - P@1: 0.8800
2021-11-23 03:58:12 - P@3: 0.8000
2021-11-23 03:58:12 - P@5: 0.7680
2021-11-23 03:58:12 - P@10: 0.7340
2021-11-23 03:58:12 - P@100: 0.5084
2021-11-23 03:58:12 - P@1000: 0.2019
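The gap between P@1 (0.88) and Recall@1 (0.0023) is expected when each query has many relevant documents: one hit at rank 1 is a perfect precision score for that query but covers only a tiny fraction of what is relevant. The sketch below computes the three cut-off metrics for a single query in plain Python. The function names and toy judgments are assumptions, and the NDCG variant shown (linear gain, log2 discount) is the common textbook form, not verified here against the evaluator's exact implementation.

```python
import math


def precision_at_k(ranking, judged, k):
    """Fraction of the top-k slots filled with relevant documents."""
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) > 0)
    return hits / k


def recall_at_k(ranking, judged, k):
    """Fraction of all relevant documents that appear in the top k."""
    n_relevant = sum(1 for rel in judged.values() if rel > 0)
    if n_relevant == 0:
        return 0.0
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) > 0)
    return hits / n_relevant


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(gain / math.log2(position + 2) for position, gain in enumerate(gains))


def ndcg_at_k(ranking, judged, k):
    """DCG of the top k divided by the DCG of an ideal ordering of the judgments."""
    gains = [judged.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal_gains = sorted(judged.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Made-up judgments and ranking for one query; "d5" is unjudged.
toy_judged = {"d1": 2, "d2": 1, "d3": 0, "d4": 1}
toy_ranking = ["d1", "d3", "d2", "d5"]

p_at_2 = precision_at_k(toy_ranking, toy_judged, 2)
r_at_2 = recall_at_k(toy_ranking, toy_judged, 2)
ndcg_at_3 = ndcg_at_k(toy_ranking, toy_judged, 3)
```

The reported numbers are these per-query values averaged over all queries in the split.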


In [None]:
with open(f"./results_{dataset}.json", 'r') as fp:
    results_loaded = json.load(fp)

In [None]:
# Evaluate only, using the results loaded back from disk
ndcg, _map, recall, precision = EvaluateRetrieval().evaluate(qrels, results_loaded, retriever.k_values)

2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - NDCG@1: 0.8200
2021-11-23 03:58:12 - NDCG@3: 0.7623
2021-11-23 03:58:12 - NDCG@5: 0.7249
2021-11-23 03:58:12 - NDCG@10: 0.6880
2021-11-23 03:58:12 - NDCG@100: 0.4845
2021-11-23 03:58:12 - NDCG@1000: 0.4330
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - MAP@1: 0.0023
2021-11-23 03:58:12 - MAP@3: 0.0062
2021-11-23 03:58:12 - MAP@5: 0.0096
2021-11-23 03:58:12 - MAP@10: 0.0170
2021-11-23 03:58:12 - MAP@100: 0.0852
2021-11-23 03:58:12 - MAP@1000: 0.2084
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - Recall@1: 0.0023
2021-11-23 03:58:12 - Recall@3: 0.0064
2021-11-23 03:58:12 - Recall@5: 0.0103
2021-11-23 03:58:12 - Recall@10: 0.0191
2021-11-23 03:58:12 - Recall@100: 0.1173
2021-11-23 03:58:12 - Recall@1000: 0.4128
2021-11-23 03:58:12 - 

2021-11-23 03:58:12 - P@1: 0.8800
2021-11-23 03:58:12 - P@3: 0.8000
2021-11-23 03:58:12 - P@5: 0.7680
2021-11-23 03:58:12 - P@10: 0.7340
2021-11-23 03:58:12 - P@100: 0.5084
2021-11-23 03:58:12 - P@1000: 0.2019


##### Run on BEIR - Base datasets

In [None]:
import json

from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.search.lexical import BM25Search as BM25
from beir.retrieval.evaluation import EvaluateRetrieval

datasets_all = ["msmarco", "msmarco-v2", "trec-covid", "nfcorpus", "bioasq", "nq", "hotpotqa", "fiqa", 
                "signal1m", "trec-news", "arguana", "webis-touche2020", "cqadupstack", "quora", "dbpedia-entity",
                "scidocs", "fever", "climate-fever", "scifact", "robust04"]

# Keep only the datasets that are available for download
datasets = ["msmarco", "trec-covid", "nfcorpus", "nq", "hotpotqa", "fiqa", 
           "arguana", "webis-touche2020", "cqadupstack", "quora", "dbpedia-entity",
           "scidocs", "fever", "climate-fever", "scifact"]

#### Provide parameters for Elasticsearch
hostname = "localhost"
initialize = True # True, will delete existing index with same name and reindex all documents

ndcgs = {}

for i, dataset in enumerate(datasets):

    logging.info(f"\n{'-' * 10} Running {dataset} {'-' * 10}\n")

    url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
    out_dir = os.path.join(os.getcwd(), "datasets")
    data_path = util.download_and_unzip(url, out_dir)
    print("Dataset downloaded here: {}".format(data_path))
    # Load the dataset into BEIR
    data_path = f"datasets/{dataset}"
    # Per the BEIR paper, MS MARCO is evaluated on its dev split
    split = "dev" if dataset == "msmarco" else "test"

    # cqadupstack - Contains several sub datasets
    if dataset == "cqadupstack":
        cqa_ndcgs, cqa_maps, cqa_recalls, cqa_precisions = [], [], [], []
        for sub_dataset in os.listdir(data_path):
            sub_data_path = f"datasets/{dataset}/{sub_dataset}"
            corpus, queries, qrels = GenericDataLoader(sub_data_path).load(split=split)
            model = BM25(index_name=sub_dataset, hostname=hostname, initialize=initialize)
            retriever = EvaluateRetrieval(model)
            #### Retrieve BM25 (lexical) results (format of results is identical to qrels)
            results = retriever.retrieve(corpus, queries)
            # Save the scores of the top retrieved docs (up to 1000) for each query
            with open(f"./results_{dataset}_{sub_dataset}.json", 'w') as fp:
                json.dump(results, fp)

            ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

            cqa_ndcgs.append(ndcg)
            cqa_maps.append(_map)
            cqa_recalls.append(recall)
            cqa_precisions.append(precision)

        for (metric, group) in [(ndcg, cqa_ndcgs), (_map, cqa_maps), (recall, cqa_recalls), (precision, cqa_precisions)]:
            for k in metric.keys():
                metric[k] = sum([score[k] for score in group]) / len(group)

        logging.info("CQA Final")
        logging.info(f"{ndcg}")
        logging.info(f"{_map}")
        logging.info(f"{recall}")
        logging.info(f"{precision}")

    else:
        corpus, queries, qrels = GenericDataLoader(data_path).load(split=split)
        model = BM25(index_name=dataset, hostname=hostname, initialize=initialize)
        retriever = EvaluateRetrieval(model)
        #### Retrieve BM25 (lexical) results (format of results is identical to qrels)
        results = retriever.retrieve(corpus, queries)
        # Save the scores of the top retrieved docs (up to 1000) for each query
        with open(f"./results_{dataset}.json", 'w') as fp:
            json.dump(results, fp)
        ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)

    ndcgs[dataset] = ndcg
    # Clean-up each time to avoid running out of space
    !rm -r datasets

with open("./beir_bm25_ndcgs.json", 'w') as fp:
    json.dump(ndcgs, fp)
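The CQADupStack branch above reports the unweighted mean of each metric over the sub-datasets. It does so by overwriting the metric dictionaries of the last sub-dataset in place. The same averaging can be sketched as a standalone helper that returns a new dictionary; `average_metrics` is a hypothetical name and the scores below are made up.

```python
def average_metrics(per_subset):
    """Unweighted mean of each metric key across a list of metric dicts."""
    keys = per_subset[0].keys()
    return {key: sum(scores[key] for scores in per_subset) / len(per_subset) for key in keys}


# Made-up NDCG values for two sub-datasets.
toy_sub_ndcgs = [
    {"NDCG@1": 0.2, "NDCG@10": 0.4},
    {"NDCG@1": 0.4, "NDCG@10": 0.6},
]
toy_mean_ndcg = average_metrics(toy_sub_ndcgs)
```

Returning a fresh dictionary leaves the per-subset scores untouched, which helps if they are also saved or logged individually.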

2021-11-10 06:13:58 - Loading faiss with AVX2 support.
2021-11-10 06:13:58 - Could not load library with AVX2 support due to:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
2021-11-10 06:13:58 - Loading faiss.
2021-11-10 06:13:58 - Successfully loaded faiss.
2021-11-10 06:13:58 - 
---------- Running msmarco ----------

2021-11-10 06:13:58 - Downloading msmarco.zip ...


/content/datasets/msmarco.zip:   0%|          | 0.00/1.01G [00:00<?, ?iB/s]

2021-11-10 06:14:20 - Unzipping msmarco.zip ...
Dataset downloaded here: /content/datasets/msmarco
2021-11-10 06:14:50 - Loading Corpus...


  0%|          | 0/8841823 [00:00<?, ?it/s]

2021-11-10 06:15:51 - Loaded 8841823 DEV Documents.
2021-11-10 06:15:51 - Doc Example: {'text': 'The presence of communication amid scientific minds was equally important to the success of the Manhattan Project as scientific intellect was. The only cloud hanging over the impressive achievement of the atomic researchers and engineers is what their success truly meant; hundreds of thousands of innocent lives obliterated.', 'title': ''}
2021-11-10 06:15:51 - Loading Queries...
2021-11-10 06:15:53 - Loaded 6980 DEV Queries.
2021-11-10 06:15:53 - Query Example: how many years did william bradford serve as governor of plymouth colony?
2021-11-10 06:15:53 - Activating Elasticsearch....
2021-11-10 06:15:53 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'msmarco', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 06:15:53 - Deleting previous Elasticsearch-Ind

  0%|          | 0/8841823 [00:00<?, ?docs/s]
que: 100%|██████████| 55/55 [04:33<00:00,  4.96s/it]


2021-11-10 06:39:07 - 

2021-11-10 06:39:07 - NDCG@1: 0.1012
2021-11-10 06:39:07 - NDCG@3: 0.1678
2021-11-10 06:39:07 - NDCG@5: 0.1968
2021-11-10 06:39:07 - NDCG@10: 0.2275
2021-11-10 06:39:07 - NDCG@100: 0.2865
2021-11-10 06:39:07 - NDCG@1000: 0.3109
2021-11-10 06:39:07 - 

2021-11-10 06:39:07 - MAP@1: 0.0979
2021-11-10 06:39:07 - MAP@3: 0.1493
2021-11-10 06:39:07 - MAP@5: 0.1654
2021-11-10 06:39:07 - MAP@10: 0.1782
2021-11-10 06:39:07 - MAP@100: 0.1897
2021-11-10 06:39:07 - MAP@1000: 0.1905
2021-11-10 06:39:07 - 

2021-11-10 06:39:07 - Recall@1: 0.0979
2021-11-10 06:39:07 - Recall@3: 0.2178
2021-11-10 06:39:07 - Recall@5: 0.2879
2021-11-10 06:39:07 - Recall@10: 0.3816
2021-11-10 06:39:07 - Recall@100: 0.6629
2021-11-10 06:39:07 - Recall@1000: 0.8548
2021-11-10 06:39:07 - 

2021-11-10 06:39:07 - P@1: 0.1012
2021-11-10 06:39:07 - P@3: 0.0753
2021-11-10 06:39:07 - P@5: 0.0598
2021-11-10 06:39:07 - P@10: 0.0397
2021-11-10 06:39:07 - P@100: 0.0070
2021-11-10 06:39:07 - P@1000: 0.0009
2021

/content/datasets/trec-covid.zip:   0%|          | 0.00/70.5M [00:00<?, ?iB/s]

2021-11-10 06:39:10 - Unzipping trec-covid.zip ...
Dataset downloaded here: /content/datasets/trec-covid
2021-11-10 06:39:12 - Loading Corpus...


  0%|          | 0/171332 [00:00<?, ?it/s]

2021-11-10 06:39:14 - Loaded 171332 TEST Documents.
2021-11-10 06:39:14 - Doc Example: {'text': 'OBJECTIVE: This retrospective chart review describes the epidemiology and clinical features of 40 patients with culture-proven Mycoplasma pneumoniae infections at King Abdulaziz University Hospital, Jeddah, Saudi Arabia. METHODS: Patients with positive M. pneumoniae cultures from respiratory specimens from January 1997 through December 1998 were identified through the Microbiology records. Charts of patients were reviewed. RESULTS: 40 patients were identified, 33 (82.5%) of whom required admission. Most infections (92.5%) were community-acquired. The infection affected all age groups but was most common in infants (32.5%) and pre-school children (22.5%). It occurred year-round but was most common in the fall (35%) and spring (30%). More than three-quarters of patients (77.5%) had comorbidities. Twenty-four isolates (60%) were associated with pneumonia, 14 (35%) with upper respiratory tract 

  0%|          | 0/171332 [00:00<?, ?docs/s]
que: 100%|██████████| 1/1 [00:03<00:00,  3.75s/it]


2021-11-10 06:40:00 - 

2021-11-10 06:40:00 - NDCG@1: 0.8200
2021-11-10 06:40:00 - NDCG@3: 0.7623
2021-11-10 06:40:00 - NDCG@5: 0.7249
2021-11-10 06:40:00 - NDCG@10: 0.6880
2021-11-10 06:40:00 - NDCG@100: 0.4845
2021-11-10 06:40:00 - NDCG@1000: 0.4330
2021-11-10 06:40:00 - 

2021-11-10 06:40:00 - MAP@1: 0.0023
2021-11-10 06:40:00 - MAP@3: 0.0062
2021-11-10 06:40:00 - MAP@5: 0.0096
2021-11-10 06:40:00 - MAP@10: 0.0170
2021-11-10 06:40:00 - MAP@100: 0.0852
2021-11-10 06:40:00 - MAP@1000: 0.2084
2021-11-10 06:40:00 - 

2021-11-10 06:40:00 - Recall@1: 0.0023
2021-11-10 06:40:00 - Recall@3: 0.0064
2021-11-10 06:40:00 - Recall@5: 0.0103
2021-11-10 06:40:00 - Recall@10: 0.0191
2021-11-10 06:40:00 - Recall@100: 0.1173
2021-11-10 06:40:00 - Recall@1000: 0.4128
2021-11-10 06:40:00 - 

2021-11-10 06:40:00 - P@1: 0.8800
2021-11-10 06:40:00 - P@3: 0.8000
2021-11-10 06:40:00 - P@5: 0.7680
2021-11-10 06:40:00 - P@10: 0.7340
2021-11-10 06:40:00 - P@100: 0.5084
2021-11-10 06:40:00 - P@1000: 0.2019
2021

/content/datasets/nfcorpus.zip:   0%|          | 0.00/2.34M [00:00<?, ?iB/s]

2021-11-10 06:40:00 - Unzipping nfcorpus.zip ...
Dataset downloaded here: /content/datasets/nfcorpus
2021-11-10 06:40:00 - Loading Corpus...


  0%|          | 0/3633 [00:00<?, ?it/s]

2021-11-10 06:40:01 - Loaded 3633 TEST Documents.
2021-11-10 06:40:01 - Doc Example: {'text': 'Recent studies have suggested that statins, an established drug group in the prevention of cardiovascular mortality, could delay or prevent breast cancer recurrence but the effect on disease-specific mortality remains unclear. We evaluated risk of breast cancer death among statin users in a population-based cohort of breast cancer patients. The study cohort included all newly diagnosed breast cancer patients in Finland during 1995–2003 (31,236 cases), identified from the Finnish Cancer Registry. Information on statin use before and after the diagnosis was obtained from a national prescription database. We used the Cox proportional hazards regression method to estimate mortality among statin users with statin use as time-dependent variable. A total of 4,151 participants had used statins. During the median follow-up of 3.25 years after the diagnosis (range 0.08–9.0 years) 6,011 participants die

  0%|          | 0/3633 [00:00<?, ?docs/s]
que: 100%|██████████| 3/3 [00:03<00:00,  1.10s/it]


2021-11-10 06:40:06 - 

2021-11-10 06:40:06 - NDCG@1: 0.4139
2021-11-10 06:40:06 - NDCG@3: 0.3710
2021-11-10 06:40:06 - NDCG@5: 0.3428
2021-11-10 06:40:06 - NDCG@10: 0.3063
2021-11-10 06:40:06 - NDCG@100: 0.2503
2021-11-10 06:40:06 - NDCG@1000: 0.2743
2021-11-10 06:40:06 - 

2021-11-10 06:40:06 - MAP@1: 0.0454
2021-11-10 06:40:06 - MAP@3: 0.0805
2021-11-10 06:40:06 - MAP@5: 0.0921
2021-11-10 06:40:06 - MAP@10: 0.1054
2021-11-10 06:40:06 - MAP@100: 0.1267
2021-11-10 06:40:06 - MAP@1000: 0.1316
2021-11-10 06:40:06 - 

2021-11-10 06:40:06 - Recall@1: 0.0454
2021-11-10 06:40:06 - Recall@3: 0.0907
2021-11-10 06:40:06 - Recall@5: 0.1090
2021-11-10 06:40:06 - Recall@10: 0.1350
2021-11-10 06:40:06 - Recall@100: 0.2167
2021-11-10 06:40:06 - Recall@1000: 0.3222
2021-11-10 06:40:06 - 

2021-11-10 06:40:06 - P@1: 0.4338
2021-11-10 06:40:06 - P@3: 0.3554
2021-11-10 06:40:06 - P@5: 0.2993
2021-11-10 06:40:06 - P@10: 0.2262
2021-11-10 06:40:06 - P@100: 0.0616
2021-11-10 06:40:06 - P@1000: 0.0103
2021

/content/datasets/nq.zip:   0%|          | 0.00/475M [00:00<?, ?iB/s]

2021-11-10 06:40:16 - Unzipping nq.zip ...
Dataset downloaded here: /content/datasets/nq
2021-11-10 06:40:29 - Loading Corpus...


  0%|          | 0/2681468 [00:00<?, ?it/s]

2021-11-10 06:40:51 - Loaded 2681468 TEST Documents.
2021-11-10 06:40:51 - Doc Example: {'text': "In accounting, minority interest (or non-controlling interest) is the portion of a subsidiary corporation's stock that is not owned by the parent corporation. The magnitude of the minority interest in the subsidiary company is generally less than 50% of outstanding shares, or the corporation would generally cease to be a subsidiary of the parent.[1]", 'title': 'Minority interest'}
2021-11-10 06:40:51 - Loading Queries...
2021-11-10 06:40:51 - Loaded 3452 TEST Queries.
2021-11-10 06:40:51 - Query Example: what is non controlling interest on balance sheet
2021-11-10 06:40:51 - Activating Elasticsearch....
2021-11-10 06:40:51 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'nq', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 06:40:51 - Deleting previous E

  0%|          | 0/2681468 [00:00<?, ?docs/s]
que: 100%|██████████| 27/27 [03:20<00:00,  7.41s/it]


2021-11-10 06:50:53 - 

2021-11-10 06:50:53 - NDCG@1: 0.1793
2021-11-10 06:50:53 - NDCG@3: 0.2503
2021-11-10 06:50:53 - NDCG@5: 0.2832
2021-11-10 06:50:53 - NDCG@10: 0.3261
2021-11-10 06:50:53 - NDCG@100: 0.3873
2021-11-10 06:50:53 - NDCG@1000: 0.4044
2021-11-10 06:50:53 - 

2021-11-10 06:50:53 - MAP@1: 0.1600
2021-11-10 06:50:53 - MAP@3: 0.2240
2021-11-10 06:50:53 - MAP@5: 0.2432
2021-11-10 06:50:53 - MAP@10: 0.2619
2021-11-10 06:50:53 - MAP@100: 0.2750
2021-11-10 06:50:53 - MAP@1000: 0.2757
2021-11-10 06:50:53 - 

2021-11-10 06:50:53 - Recall@1: 0.1600
2021-11-10 06:50:53 - Recall@3: 0.3031
2021-11-10 06:50:53 - Recall@5: 0.3800
2021-11-10 06:50:53 - Recall@10: 0.5066
2021-11-10 06:50:53 - Recall@100: 0.7829
2021-11-10 06:50:53 - Recall@1000: 0.9107
2021-11-10 06:50:53 - 

2021-11-10 06:50:53 - P@1: 0.1793
2021-11-10 06:50:53 - P@3: 0.1162
2021-11-10 06:50:53 - P@5: 0.0883
2021-11-10 06:50:53 - P@10: 0.0594
2021-11-10 06:50:53 - P@100: 0.0094
2021-11-10 06:50:53 - P@1000: 0.0011
2021

/content/datasets/hotpotqa.zip:   0%|          | 0.00/624M [00:00<?, ?iB/s]

2021-11-10 06:51:09 - Unzipping hotpotqa.zip ...
Dataset downloaded here: /content/datasets/hotpotqa
2021-11-10 06:51:26 - Loading Corpus...


  0%|          | 0/5233329 [00:00<?, ?it/s]

2021-11-10 06:52:07 - Loaded 5233329 TEST Documents.
2021-11-10 06:52:08 - Doc Example: {'text': 'Anarchism is a political philosophy that advocates self-governed societies based on voluntary institutions. These are often described as stateless societies, although several authors have defined them more specifically as institutions based on non-hierarchical free associations. Anarchism holds the state to be undesirable, unnecessary and harmful.', 'title': 'Anarchism'}
2021-11-10 06:52:08 - Loading Queries...
2021-11-10 06:52:08 - Loaded 7405 TEST Queries.
2021-11-10 06:52:08 - Query Example: Were Scott Derrickson and Ed Wood of the same nationality?
2021-11-10 06:52:08 - Activating Elasticsearch....
2021-11-10 06:52:08 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'hotpotqa', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 06:52:08 - Deleting previ

  0%|          | 0/5233329 [00:00<?, ?docs/s]
que: 100%|██████████| 58/58 [13:02<00:00, 13.49s/it]


2021-11-10 07:16:13 - 

2021-11-10 07:16:13 - NDCG@1: 0.6980
2021-11-10 07:16:13 - NDCG@3: 0.5536
2021-11-10 07:16:13 - NDCG@5: 0.5808
2021-11-10 07:16:13 - NDCG@10: 0.6022
2021-11-10 07:16:13 - NDCG@100: 0.6363
2021-11-10 07:16:13 - NDCG@1000: 0.6506
2021-11-10 07:16:13 - 

2021-11-10 07:16:13 - MAP@1: 0.3490
2021-11-10 07:16:13 - MAP@3: 0.4776
2021-11-10 07:16:13 - MAP@5: 0.4977
2021-11-10 07:16:13 - MAP@10: 0.5098
2021-11-10 07:16:13 - MAP@100: 0.5191
2021-11-10 07:16:13 - MAP@1000: 0.5198
2021-11-10 07:16:13 - 

2021-11-10 07:16:13 - Recall@1: 0.3490
2021-11-10 07:16:13 - Recall@3: 0.5220
2021-11-10 07:16:13 - Recall@5: 0.5759
2021-11-10 07:16:13 - Recall@10: 0.6296
2021-11-10 07:16:13 - Recall@100: 0.7631
2021-11-10 07:16:13 - Recall@1000: 0.8577
2021-11-10 07:16:13 - 

2021-11-10 07:16:13 - P@1: 0.6980
2021-11-10 07:16:13 - P@3: 0.3480
2021-11-10 07:16:13 - P@5: 0.2304
2021-11-10 07:16:13 - P@10: 0.1259
2021-11-10 07:16:13 - P@100: 0.0153
2021-11-10 07:16:13 - P@1000: 0.0017
2021

/content/datasets/fiqa.zip:   0%|          | 0.00/17.1M [00:00<?, ?iB/s]

2021-11-10 07:16:15 - Unzipping fiqa.zip ...
Dataset downloaded here: /content/datasets/fiqa
2021-11-10 07:16:15 - Loading Corpus...


  0%|          | 0/57638 [00:00<?, ?it/s]

2021-11-10 07:16:16 - Loaded 57638 TEST Documents.
2021-11-10 07:16:16 - Doc Example: {'text': "I'm not saying I don't like the idea of on-the-job training too, but you can't expect the company to do that. Training workers is not their job - they're building software. Perhaps educational systems in the U.S. (or their students) should worry a little about getting marketable skills in exchange for their massive investment in education, rather than getting out with thousands in student debt and then complaining that they aren't qualified to do anything.", 'title': ''}
2021-11-10 07:16:16 - Loading Queries...
2021-11-10 07:16:16 - Loaded 648 TEST Queries.
2021-11-10 07:16:16 - Query Example: How to deposit a cheque issued to an associate in my business into my business account?
2021-11-10 07:16:16 - Activating Elasticsearch....
2021-11-10 07:16:16 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'fiqa', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_

  0%|          | 0/57638 [00:00<?, ?docs/s]
que: 100%|██████████| 6/6 [00:17<00:00,  2.83s/it]


2021-11-10 07:16:47 - 

2021-11-10 07:16:47 - NDCG@1: 0.2330
2021-11-10 07:16:47 - NDCG@3: 0.2172
2021-11-10 07:16:47 - NDCG@5: 0.2272
2021-11-10 07:16:47 - NDCG@10: 0.2541
2021-11-10 07:16:47 - NDCG@100: 0.3127
2021-11-10 07:16:47 - NDCG@1000: 0.3472
2021-11-10 07:16:47 - 

2021-11-10 07:16:47 - MAP@1: 0.1168
2021-11-10 07:16:47 - MAP@3: 0.1636
2021-11-10 07:16:47 - MAP@5: 0.1772
2021-11-10 07:16:47 - MAP@10: 0.1915
2021-11-10 07:16:47 - MAP@100: 0.2046
2021-11-10 07:16:47 - MAP@1000: 0.2063
2021-11-10 07:16:47 - 

2021-11-10 07:16:47 - Recall@1: 0.1168
2021-11-10 07:16:47 - Recall@3: 0.2013
2021-11-10 07:16:47 - Recall@5: 0.2433
2021-11-10 07:16:47 - Recall@10: 0.3244
2021-11-10 07:16:47 - Recall@100: 0.5465
2021-11-10 07:16:47 - Recall@1000: 0.7565
2021-11-10 07:16:47 - 

2021-11-10 07:16:47 - P@1: 0.2330
2021-11-10 07:16:47 - P@3: 0.1446
2021-11-10 07:16:47 - P@5: 0.1068
2021-11-10 07:16:47 - P@10: 0.0702
2021-11-10 07:16:47 - P@100: 0.0130
2021-11-10 07:16:47 - P@1000: 0.0019
2021

/content/datasets/arguana.zip:   0%|          | 0.00/3.60M [00:00<?, ?iB/s]

2021-11-10 07:16:48 - Unzipping arguana.zip ...
Dataset downloaded here: /content/datasets/arguana
2021-11-10 07:16:48 - Loading Corpus...


  0%|          | 0/8674 [00:00<?, ?it/s]

2021-11-10 07:16:48 - Loaded 8674 TEST Documents.
2021-11-10 07:16:48 - Doc Example: {'text': "You don’t have to be vegetarian to be green. Many special environments have been created by livestock farming – for example chalk down land in England and mountain pastures in many countries. Ending livestock farming would see these areas go back to woodland with a loss of many unique plants and animals. Growing crops can also be very bad for the planet, with fertilisers and pesticides polluting rivers, lakes and seas. Most tropical forests are now cut down for timber, or to allow oil palm trees to be grown in plantations, not to create space for meat production.  British farmer and former editor Simon Farrell also states: “Many vegans and vegetarians rely on one source from the U.N. calculation that livestock generates 18% of global carbon emissions, but this figure contains basic mistakes. It attributes all deforestation from ranching to cattle, rather than logging or development. It also m

  0%|          | 0/8674 [00:00<?, ?docs/s]
que: 100%|██████████| 11/11 [00:41<00:00,  3.73s/it]


2021-11-10 07:17:36 - 

2021-11-10 07:17:36 - NDCG@1: 0.2311
2021-11-10 07:17:36 - NDCG@3: 0.3678
2021-11-10 07:17:36 - NDCG@5: 0.4196
2021-11-10 07:17:36 - NDCG@10: 0.4717
2021-11-10 07:17:36 - NDCG@100: 0.5168
2021-11-10 07:17:36 - NDCG@1000: 0.5220
2021-11-10 07:17:36 - 

2021-11-10 07:17:36 - MAP@1: 0.2311
2021-11-10 07:17:36 - MAP@3: 0.3339
2021-11-10 07:17:36 - MAP@5: 0.3625
2021-11-10 07:17:36 - MAP@10: 0.3841
2021-11-10 07:17:36 - MAP@100: 0.3946
2021-11-10 07:17:36 - MAP@1000: 0.3948
2021-11-10 07:17:36 - 

2021-11-10 07:17:36 - Recall@1: 0.2311
2021-11-10 07:17:36 - Recall@3: 0.4659
2021-11-10 07:17:36 - Recall@5: 0.5925
2021-11-10 07:17:36 - Recall@10: 0.7532
2021-11-10 07:17:36 - Recall@100: 0.9516
2021-11-10 07:17:36 - Recall@1000: 0.9915
2021-11-10 07:17:36 - 

2021-11-10 07:17:36 - P@1: 0.2311
2021-11-10 07:17:36 - P@3: 0.1553
2021-11-10 07:17:36 - P@5: 0.1185
2021-11-10 07:17:36 - P@10: 0.0753
2021-11-10 07:17:36 - P@100: 0.0095
2021-11-10 07:17:36 - P@1000: 0.0010
2021

/content/datasets/webis-touche2020.zip:   0%|          | 0.00/217M [00:00<?, ?iB/s]

2021-11-10 07:17:41 - Unzipping webis-touche2020.zip ...
Dataset downloaded here: /content/datasets/webis-touche2020
2021-11-10 07:17:47 - Loading Corpus...


  0%|          | 0/382545 [00:00<?, ?it/s]

2021-11-10 07:17:53 - Loaded 382545 TEST Documents.
2021-11-10 07:17:53 - Doc Example: {'text': 'My opponent forfeited every round. None of my arguments were answered. I don’t like the idea of winning by default, but here we are.Tule: it’s good for students to get involved and address big issues like teen pregnancy. You need to be able to answer arguments like mine and not simply prepare for an abstinence-only type of response. You should also be aware that, in the U.S., condoms may be sold to minors in ANY state. A retailer who says it is illegal to sell you them is, frankly, wrong.', 'title': 'Contraceptive Forms for High School Students'}
2021-11-10 07:17:53 - Loading Queries...
2021-11-10 07:17:53 - Loaded 49 TEST Queries.
2021-11-10 07:17:53 - Query Example: Should teachers get tenure?
2021-11-10 07:17:53 - Activating Elasticsearch....
2021-11-10 07:17:53 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'webis-touche2020', 'keys': {'title': 'title', 'body': 't

  0%|          | 0/382545 [00:00<?, ?docs/s]
que: 100%|██████████| 1/1 [00:04<00:00,  4.60s/it]


2021-11-10 07:19:58 - 

2021-11-10 07:19:58 - NDCG@1: 0.3775
2021-11-10 07:19:58 - NDCG@3: 0.3633
2021-11-10 07:19:58 - NDCG@5: 0.3589
2021-11-10 07:19:58 - NDCG@10: 0.3471
2021-11-10 07:19:58 - NDCG@100: 0.4548
2021-11-10 07:19:58 - NDCG@1000: 0.5580
2021-11-10 07:19:58 - 

2021-11-10 07:19:58 - MAP@1: 0.0254
2021-11-10 07:19:58 - MAP@3: 0.0654
2021-11-10 07:19:58 - MAP@5: 0.0985
2021-11-10 07:19:58 - MAP@10: 0.1387
2021-11-10 07:19:58 - MAP@100: 0.2211
2021-11-10 07:19:58 - MAP@1000: 0.2385
2021-11-10 07:19:58 - 

2021-11-10 07:19:58 - Recall@1: 0.0254
2021-11-10 07:19:58 - Recall@3: 0.0800
2021-11-10 07:19:58 - Recall@5: 0.1296
2021-11-10 07:19:58 - Recall@10: 0.2122
2021-11-10 07:19:58 - Recall@100: 0.5609
2021-11-10 07:19:58 - Recall@1000: 0.8820
2021-11-10 07:19:58 - 

2021-11-10 07:19:58 - P@1: 0.3878
2021-11-10 07:19:58 - P@3: 0.3878
2021-11-10 07:19:58 - P@5: 0.3796
2021-11-10 07:19:58 - P@10: 0.3306
2021-11-10 07:19:58 - P@100: 0.0953
2021-11-10 07:19:58 - P@1000: 0.0163
2021

/content/datasets/cqadupstack.zip:   0%|          | 0.00/4.98G [00:00<?, ?iB/s]

2021-11-10 07:22:00 - Unzipping cqadupstack.zip ...
Dataset downloaded here: /content/datasets/cqadupstack
2021-11-10 07:24:41 - Loading Corpus...


  0%|          | 0/37637 [00:00<?, ?it/s]

2021-11-10 07:24:41 - Loaded 37637 TEST Documents.
2021-11-10 07:24:41 - Doc Example: {'text': "There is a satellite image it's size is 10 GB and I need to display this image using GeoServer and OpenLayers. When user select the Satellite image in the layer switcher need to display image within 10 seconds. I tried geopdf but the image quality loss isn't acceptable to customer. I want to achieve 10 seconds response time using 32 GB satellite image. Please advice me how to achieve this? Thanks in advance.", 'title': 'Satellite image display with the help of GeoServer and OpenLayers'}
2021-11-10 07:24:41 - Loading Queries...
2021-11-10 07:24:45 - Loaded 885 TEST Queries.
2021-11-10 07:24:45 - Query Example: Calculating mean upslope aspect from each cell in DEM using Python?
2021-11-10 07:24:45 - Activating Elasticsearch....
2021-11-10 07:24:45 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'gis', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_ti

  0%|          | 0/37637 [00:00<?, ?docs/s]
que: 100%|██████████| 7/7 [00:20<00:00,  2.98s/it]


2021-11-10 07:25:16 - 

2021-11-10 07:25:16 - NDCG@1: 0.2294
2021-11-10 07:25:16 - NDCG@3: 0.2767
2021-11-10 07:25:16 - NDCG@5: 0.2938
2021-11-10 07:25:16 - NDCG@10: 0.3113
2021-11-10 07:25:16 - NDCG@100: 0.3656
2021-11-10 07:25:16 - NDCG@1000: 0.3892
2021-11-10 07:25:16 - 

2021-11-10 07:25:16 - MAP@1: 0.2143
2021-11-10 07:25:16 - MAP@3: 0.2585
2021-11-10 07:25:16 - MAP@5: 0.2685
2021-11-10 07:25:16 - MAP@10: 0.2762
2021-11-10 07:25:16 - MAP@100: 0.2867
2021-11-10 07:25:16 - MAP@1000: 0.2877
2021-11-10 07:25:16 - 

2021-11-10 07:25:16 - Recall@1: 0.2143
2021-11-10 07:25:16 - Recall@3: 0.3105
2021-11-10 07:25:16 - Recall@5: 0.3520
2021-11-10 07:25:16 - Recall@10: 0.4039
2021-11-10 07:25:16 - Recall@100: 0.6571
2021-11-10 07:25:16 - Recall@1000: 0.8324
2021-11-10 07:25:16 - 

2021-11-10 07:25:16 - P@1: 0.2294
2021-11-10 07:25:16 - P@3: 0.1160
2021-11-10 07:25:16 - P@5: 0.0793
2021-11-10 07:25:16 - P@10: 0.0460
2021-11-10 07:25:16 - P@100: 0.0077
2021-11-10 07:25:16 - P@1000: 0.0010

  0%|          | 0/45301 [00:00<?, ?it/s]

2021-11-10 07:25:17 - Loaded 45301 TEST Documents.
2021-11-10 07:25:17 - Doc Example: {'text': 'What\'s your Supreme Commander 2 build order. I don\'t just want "6 mass extractors, 2 power and a factory". List of building and units out to the second or third factory, please.', 'title': 'Supreme Commander 2 - Build Orders'}
2021-11-10 07:25:17 - Loading Queries...
2021-11-10 07:25:29 - Loaded 1595 TEST Queries.
2021-11-10 07:25:29 - Query Example: Can the trophy system protect me against bullets?
2021-11-10 07:25:29 - Activating Elasticsearch....
2021-11-10 07:25:29 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'gaming', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:25:29 - Deleting previous Elasticsearch-Index named - gaming
2021-11-10 07:25:29 - Creating fresh Elasticsearch-Index named - gaming


  0%|          | 0/45301 [00:00<?, ?docs/s]
que: 100%|██████████| 13/13 [00:40<00:00,  3.15s/it]


2021-11-10 07:26:21 - 

2021-11-10 07:26:21 - NDCG@1: 0.3586
2021-11-10 07:26:21 - NDCG@3: 0.4187
2021-11-10 07:26:21 - NDCG@5: 0.4412
2021-11-10 07:26:21 - NDCG@10: 0.4645
2021-11-10 07:26:21 - NDCG@100: 0.5070
2021-11-10 07:26:21 - NDCG@1000: 0.5234
2021-11-10 07:26:21 - 

2021-11-10 07:26:21 - MAP@1: 0.3107
2021-11-10 07:26:21 - MAP@3: 0.3864
2021-11-10 07:26:21 - MAP@5: 0.4014
2021-11-10 07:26:21 - MAP@10: 0.4126
2021-11-10 07:26:21 - MAP@100: 0.4220
2021-11-10 07:26:21 - MAP@1000: 0.4227
2021-11-10 07:26:21 - 

2021-11-10 07:26:21 - Recall@1: 0.3107
2021-11-10 07:26:21 - Recall@3: 0.4620
2021-11-10 07:26:21 - Recall@5: 0.5181
2021-11-10 07:26:21 - Recall@10: 0.5864
2021-11-10 07:26:21 - Recall@100: 0.7769
2021-11-10 07:26:21 - Recall@1000: 0.8960
2021-11-10 07:26:21 - 

2021-11-10 07:26:21 - P@1: 0.3586
2021-11-10 07:26:21 - P@3: 0.1879
2021-11-10 07:26:21 - P@5: 0.1290
2021-11-10 07:26:21 - P@10: 0.0750
2021-11-10 07:26:21 - P@100: 0.0105
2021-11-10 07:26:21 - P@1000: 0.0012

  0%|          | 0/42269 [00:00<?, ?it/s]

2021-11-10 07:26:21 - Loaded 42269 TEST Documents.
2021-11-10 07:26:21 - Doc Example: {'text': "I'm a beginner in statistics and R, sorry if this question may seem trivial. I've collected data measuring several different parameters in 40 subjects at two time-points (t1 and t2). There are 3 main parameters in which I'm interested, let's call them ParA, ParB, ParC. ParA is a score of disability. It is on an arbitrary scale (so it is an ordinal scale measure, if my understanding is correct) and values range from 0.0 to 10.0. Note that the increments in this scale are by 0.5 unit, so values like, e.g. 1.5 are possible. I have two measures, at t1 and t2, so I can describe at least three variables from ParA: ParA at t1, ParA at t2, and whether a subject progressed or not (0 or 1). Being a ratio scale measure, I think it would not make much sense to compute a difference (eg. ParA at t2 - ParA at t1), but I'm willing to accept suggestions on this matter. ParB and ParC are meausurements of two 

  0%|          | 0/42269 [00:00<?, ?docs/s]
que: 100%|██████████| 6/6 [00:16<00:00,  2.71s/it]


2021-11-10 07:26:50 - 

2021-11-10 07:26:50 - NDCG@1: 0.2546
2021-11-10 07:26:50 - NDCG@3: 0.2789
2021-11-10 07:26:50 - NDCG@5: 0.2883
2021-11-10 07:26:50 - NDCG@10: 0.3083
2021-11-10 07:26:50 - NDCG@100: 0.3524
2021-11-10 07:26:50 - NDCG@1000: 0.3771
2021-11-10 07:26:50 - 

2021-11-10 07:26:50 - MAP@1: 0.2286
2021-11-10 07:26:50 - MAP@3: 0.2611
2021-11-10 07:26:50 - MAP@5: 0.2672
2021-11-10 07:26:50 - MAP@10: 0.2757
2021-11-10 07:26:50 - MAP@100: 0.2839
2021-11-10 07:26:50 - MAP@1000: 0.2849
2021-11-10 07:26:50 - 

2021-11-10 07:26:50 - Recall@1: 0.2286
2021-11-10 07:26:50 - Recall@3: 0.2970
2021-11-10 07:26:50 - Recall@5: 0.3232
2021-11-10 07:26:50 - Recall@10: 0.3849
2021-11-10 07:26:50 - Recall@100: 0.5921
2021-11-10 07:26:50 - Recall@1000: 0.7726
2021-11-10 07:26:50 - 

2021-11-10 07:26:50 - P@1: 0.2546
2021-11-10 07:26:50 - P@3: 0.1150
2021-11-10 07:26:50 - P@5: 0.0758
2021-11-10 07:26:50 - P@10: 0.0456
2021-11-10 07:26:50 - P@100: 0.0073
2021-11-10 07:26:50 - P@1000: 0.0010

  0%|          | 0/22998 [00:00<?, ?it/s]

2021-11-10 07:26:50 - Loaded 22998 TEST Documents.
2021-11-10 07:26:50 - Doc Example: {'text': "I want to send files to android tablet with a application from PC. - I can send files directly to tablet (2.3 android OS) PC see it as a external usb drive. - But i can't send files to tablet (4.2 android OS), because PC see it as a portable media player.(MTP) - How can i fix this problem ? - How can show my device as a external drive? my application that sent files written via Delphi.", 'title': 'How can show android tablet as a external storage to PC?'}
2021-11-10 07:26:50 - Loading Queries...
2021-11-10 07:26:52 - Loaded 699 TEST Queries.
2021-11-10 07:26:52 - Query Example: Android chroot ubuntu - is it possible to get ubuntu to recognise usb devices
2021-11-10 07:26:52 - Activating Elasticsearch....
2021-11-10 07:26:52 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'android', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'max

  0%|          | 0/22998 [00:00<?, ?docs/s]
que: 100%|██████████| 6/6 [00:16<00:00,  2.83s/it]


2021-11-10 07:27:14 - 

2021-11-10 07:27:14 - NDCG@1: 0.3247
2021-11-10 07:27:14 - NDCG@3: 0.3636
2021-11-10 07:27:14 - NDCG@5: 0.3842
2021-11-10 07:27:14 - NDCG@10: 0.4020
2021-11-10 07:27:14 - NDCG@100: 0.4479
2021-11-10 07:27:14 - NDCG@1000: 0.4744
2021-11-10 07:27:14 - 

2021-11-10 07:27:14 - MAP@1: 0.2617
2021-11-10 07:27:14 - MAP@3: 0.3230
2021-11-10 07:27:14 - MAP@5: 0.3395
2021-11-10 07:27:14 - MAP@10: 0.3503
2021-11-10 07:27:14 - MAP@100: 0.3615
2021-11-10 07:27:14 - MAP@1000: 0.3629
2021-11-10 07:27:14 - 

2021-11-10 07:27:14 - Recall@1: 0.2617
2021-11-10 07:27:14 - Recall@3: 0.3825
2021-11-10 07:27:14 - Recall@5: 0.4419
2021-11-10 07:27:14 - Recall@10: 0.4989
2021-11-10 07:27:14 - Recall@100: 0.6986
2021-11-10 07:27:14 - Recall@1000: 0.8739
2021-11-10 07:27:14 - 

2021-11-10 07:27:14 - P@1: 0.3247
2021-11-10 07:27:14 - P@3: 0.1741
2021-11-10 07:27:14 - P@5: 0.1262
2021-11-10 07:27:14 - P@10: 0.0754
2021-11-10 07:27:14 - P@100: 0.0119
2021-11-10 07:27:14 - P@1000: 0.0017

  0%|          | 0/38316 [00:00<?, ?it/s]

2021-11-10 07:27:15 - Loaded 38316 TEST Documents.
2021-11-10 07:27:15 - Doc Example: {'text': "Let's discuss about $SU(3)$. I understand that the most important representations (relevant to physics) are the defining and the adjoint. In the defining representation of $SU(3)$; namely $\\mathbf{3}$, the Gell-Mann matrices are used to represent the generators $$ \\left[T^{A}\\right]_{ij} = \\dfrac{1}{2}\\lambda^{A}, $$ where $T^A$ are the generators and $\\lambda^A$ the Gell-Mann matrices. In adjoint representation, on the other hand, an $\\mathbf{8}$, the generators are represented by matrices according to $$ \\left[ T_{i} \\right]_{jk} = -if_{ijk}, $$ where $f_{ijk}$ are the structure constants. My question is this, how can one represent the generators in the $\\mathbf{10}$ of $SU(3)$, which corresponds to a symmetric tensor with 3 upper or lower indices (or for that matter how to represent the $\\mathbf{6}$ with two symmetric indices). What is the general procedure to represent the gen

  0%|          | 0/38316 [00:00<?, ?docs/s]
que: 100%|██████████| 9/9 [00:24<00:00,  2.75s/it]


2021-11-10 07:27:54 - 

2021-11-10 07:27:54 - NDCG@1: 0.2791
2021-11-10 07:27:54 - NDCG@3: 0.3175
2021-11-10 07:27:54 - NDCG@5: 0.3339
2021-11-10 07:27:54 - NDCG@10: 0.3527
2021-11-10 07:27:54 - NDCG@100: 0.4027
2021-11-10 07:27:54 - NDCG@1000: 0.4317
2021-11-10 07:27:54 - 

2021-11-10 07:27:54 - MAP@1: 0.2296
2021-11-10 07:27:54 - MAP@3: 0.2849
2021-11-10 07:27:54 - MAP@5: 0.2965
2021-11-10 07:27:54 - MAP@10: 0.3059
2021-11-10 07:27:54 - MAP@100: 0.3165
2021-11-10 07:27:54 - MAP@1000: 0.3178
2021-11-10 07:27:54 - 

2021-11-10 07:27:54 - Recall@1: 0.2296
2021-11-10 07:27:54 - Recall@3: 0.3455
2021-11-10 07:27:54 - Recall@5: 0.3883
2021-11-10 07:27:54 - Recall@10: 0.4457
2021-11-10 07:27:54 - Recall@100: 0.6622
2021-11-10 07:27:54 - Recall@1000: 0.8617
2021-11-10 07:27:54 - 

2021-11-10 07:27:54 - P@1: 0.2791
2021-11-10 07:27:54 - P@3: 0.1460
2021-11-10 07:27:54 - P@5: 0.1018
2021-11-10 07:27:54 - P@10: 0.0610
2021-11-10 07:27:54 - P@100: 0.0101
2021-11-10 07:27:54 - P@1000: 0.0014

  0%|          | 0/68184 [00:00<?, ?it/s]

2021-11-10 07:27:56 - Loaded 68184 TEST Documents.
2021-11-10 07:27:56 - Doc Example: {'text': "I am using a pgfplots stacked bar to display the aggregated energy demand of a houshold and the associated price. When the energy demand exceeds a certain threshold, than a higher price has to be paid. This is visualized by the color red and blue of the bars. The threshold is displayed by the thick red horizontal line. My problem is, that I want this red line to exceed the width of the bar, so that it's width is circa 120 percent of the width of the bar. Is there any possibility to achieve this? Thanks ![enter image description here](http://i.stack.imgur.com/3qeEi.jpg)               \\documentclass[tikz]{standalone}     \\usepackage{pgfplots}     \\pgfplotsset{compat=1.10}     \\begin{document}     \\begin{tikzpicture}     \\begin{axis}[       ymin=0,ymax=4,       samples=3,       enlarge x limits={abs=0.5},       bar width=0.6,       ybar stacked,       legend pos=south east,         every 

  0%|          | 0/68184 [00:00<?, ?docs/s]
que: 100%|██████████| 23/23 [01:16<00:00,  3.33s/it]


2021-11-10 07:31:01 - 

2021-11-10 07:31:01 - NDCG@1: 0.1893
2021-11-10 07:31:01 - NDCG@3: 0.2148
2021-11-10 07:31:01 - NDCG@5: 0.2282
2021-11-10 07:31:01 - NDCG@10: 0.2444
2021-11-10 07:31:01 - NDCG@100: 0.2886
2021-11-10 07:31:01 - NDCG@1000: 0.3179
2021-11-10 07:31:01 - 

2021-11-10 07:31:01 - MAP@1: 0.1565
2021-11-10 07:31:01 - MAP@3: 0.1934
2021-11-10 07:31:01 - MAP@5: 0.2027
2021-11-10 07:31:01 - MAP@10: 0.2099
2021-11-10 07:31:01 - MAP@100: 0.2184
2021-11-10 07:31:01 - MAP@1000: 0.2195
2021-11-10 07:31:01 - 

2021-11-10 07:31:01 - Recall@1: 0.1565
2021-11-10 07:31:01 - Recall@3: 0.2337
2021-11-10 07:31:01 - Recall@5: 0.2686
2021-11-10 07:31:01 - Recall@10: 0.3167
2021-11-10 07:31:01 - Recall@100: 0.5208
2021-11-10 07:31:01 - Recall@1000: 0.7333
2021-11-10 07:31:01 - 

2021-11-10 07:31:01 - P@1: 0.1893
2021-11-10 07:31:01 - P@3: 0.0978
2021-11-10 07:31:01 - P@5: 0.0685
2021-11-10 07:31:01 - P@10: 0.0416
2021-11-10 07:31:01 - P@100: 0.0074
2021-11-10 07:31:01 - P@1000: 0.0011

  0%|          | 0/32176 [00:00<?, ?it/s]

2021-11-10 07:31:01 - Loaded 32176 TEST Documents.
2021-11-10 07:31:01 - Doc Example: {'text': "I am in the midst of writing a web application for work. Everything is from scratch. I have been a PHP programmer for about 13 years, Node.js programmer for the past 2 years, and have no shortage of experience with JavaScript. I love Node.js, and recently rebuilt the company's API in it... So, in planning this web application, the approach I'm considering is, have the Node.js API for getting data from the server, but render everything in the browser. Use AJAX for retrieving data, History API for loading pages, and a MVC-like pattern for the different components. I have read articles detailing twitters rebuild a few years ago. It was more or less a client-side JavaScript app, but a couple years after launching it, they started moving a lot of processing/rendering back to the server, claiming the app improved dramatically in terms of speed. So, my question is as the title asks, is a client-sid

  0%|          | 0/32176 [00:00<?, ?docs/s]
que: 100%|██████████| 7/7 [00:22<00:00,  3.26s/it]


2021-11-10 07:31:41 - 

2021-11-10 07:31:41 - NDCG@1: 0.2432
2021-11-10 07:31:41 - NDCG@3: 0.2787
2021-11-10 07:31:41 - NDCG@5: 0.2957
2021-11-10 07:31:41 - NDCG@10: 0.3161
2021-11-10 07:31:41 - NDCG@100: 0.3694
2021-11-10 07:31:41 - NDCG@1000: 0.3945
2021-11-10 07:31:41 - 

2021-11-10 07:31:41 - MAP@1: 0.1974
2021-11-10 07:31:41 - MAP@3: 0.2500
2021-11-10 07:31:41 - MAP@5: 0.2620
2021-11-10 07:31:41 - MAP@10: 0.2716
2021-11-10 07:31:41 - MAP@100: 0.2829
2021-11-10 07:31:41 - MAP@1000: 0.2839
2021-11-10 07:31:41 - 

2021-11-10 07:31:41 - Recall@1: 0.1974
2021-11-10 07:31:41 - Recall@3: 0.3019
2021-11-10 07:31:41 - Recall@5: 0.3465
2021-11-10 07:31:41 - Recall@10: 0.4059
2021-11-10 07:31:41 - Recall@100: 0.6389
2021-11-10 07:31:41 - Recall@1000: 0.8162
2021-11-10 07:31:41 - 

2021-11-10 07:31:41 - P@1: 0.2432
2021-11-10 07:31:41 - P@3: 0.1339
2021-11-10 07:31:41 - P@5: 0.0938
2021-11-10 07:31:41 - P@10: 0.0567
2021-11-10 07:31:41 - P@100: 0.0097
2021-11-10 07:31:41 - P@1000: 0.0013

  0%|          | 0/40221 [00:00<?, ?it/s]

2021-11-10 07:31:41 - Loaded 40221 TEST Documents.
2021-11-10 07:31:41 - Doc Example: {'text': 'An eponym is one way to eternal (if posthumous) fame. But is there a word meaning an eponym someone would sooner not have? (One would presume that Captain Charles _Boycott_ , Mr Justice _Lynch_ , and Patrick _Hooligan_ would not appreciate their undying notoriety.)', 'title': 'Is there a word meaning "an unwanted eponym"?'}
2021-11-10 07:31:41 - Loading Queries...
2021-11-10 07:31:53 - Loaded 1570 TEST Queries.
2021-11-10 07:31:53 - Query Example: Is "a wide range of features" singular or plural?
2021-11-10 07:31:53 - Activating Elasticsearch....
2021-11-10 07:31:53 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'english', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:31:53 - Deleting previous Elasticsearch-Index named - english
2021-11-10 07:31:53 

  0%|          | 0/40221 [00:00<?, ?docs/s]
que: 100%|██████████| 13/13 [00:37<00:00,  2.88s/it]


2021-11-10 07:32:41 - 

2021-11-10 07:32:41 - NDCG@1: 0.3045
2021-11-10 07:32:41 - NDCG@3: 0.3258
2021-11-10 07:32:41 - NDCG@5: 0.3386
2021-11-10 07:32:41 - NDCG@10: 0.3579
2021-11-10 07:32:41 - NDCG@100: 0.3980
2021-11-10 07:32:41 - NDCG@1000: 0.4211
2021-11-10 07:32:41 - 

2021-11-10 07:32:41 - MAP@1: 0.2498
2021-11-10 07:32:41 - MAP@3: 0.2949
2021-11-10 07:32:41 - MAP@5: 0.3053
2021-11-10 07:32:41 - MAP@10: 0.3157
2021-11-10 07:32:41 - MAP@100: 0.3254
2021-11-10 07:32:41 - MAP@1000: 0.3265
2021-11-10 07:32:41 - 

2021-11-10 07:32:41 - Recall@1: 0.2498
2021-11-10 07:32:41 - Recall@3: 0.3372
2021-11-10 07:32:41 - Recall@5: 0.3737
2021-11-10 07:32:41 - Recall@10: 0.4299
2021-11-10 07:32:41 - Recall@100: 0.6055
2021-11-10 07:32:41 - Recall@1000: 0.7607
2021-11-10 07:32:41 - 

2021-11-10 07:32:41 - P@1: 0.3045
2021-11-10 07:32:41 - P@3: 0.1512
2021-11-10 07:32:41 - P@5: 0.1042
2021-11-10 07:32:41 - P@10: 0.0643
2021-11-10 07:32:41 - P@100: 0.0106
2021-11-10 07:32:41 - P@1000: 0.0015

  0%|          | 0/47382 [00:00<?, ?it/s]

2021-11-10 07:32:42 - Loaded 47382 TEST Documents.
2021-11-10 07:32:42 - Loading Queries...
2021-11-10 07:32:49 - Loaded 1072 TEST Queries.
2021-11-10 07:32:49 - Query Example: Yanked USB Key During Move
2021-11-10 07:32:49 - Activating Elasticsearch....
2021-11-10 07:32:49 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'unix', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:32:49 - Deleting previous Elasticsearch-Index named - unix
2021-11-10 07:32:49 - Creating fresh Elasticsearch-Index named - unix


  0%|          | 0/47382 [00:00<?, ?docs/s]
que: 100%|██████████| 9/9 [00:25<00:00,  2.85s/it]


2021-11-10 07:33:27 - 

2021-11-10 07:33:27 - NDCG@1: 0.2472
2021-11-10 07:33:27 - NDCG@3: 0.2761
2021-11-10 07:33:27 - NDCG@5: 0.2908
2021-11-10 07:33:27 - NDCG@10: 0.3077
2021-11-10 07:33:27 - NDCG@100: 0.3515
2021-11-10 07:33:27 - NDCG@1000: 0.3807
2021-11-10 07:33:27 - 

2021-11-10 07:33:27 - MAP@1: 0.2168
2021-11-10 07:33:27 - MAP@3: 0.2561
2021-11-10 07:33:27 - MAP@5: 0.2648
2021-11-10 07:33:27 - MAP@10: 0.2721
2021-11-10 07:33:27 - MAP@100: 0.2808
2021-11-10 07:33:27 - MAP@1000: 0.2820
2021-11-10 07:33:27 - 

2021-11-10 07:33:27 - Recall@1: 0.2168
2021-11-10 07:33:27 - Recall@3: 0.2986
2021-11-10 07:33:27 - Recall@5: 0.3353
2021-11-10 07:33:27 - Recall@10: 0.3841
2021-11-10 07:33:27 - Recall@100: 0.5819
2021-11-10 07:33:27 - Recall@1000: 0.7931
2021-11-10 07:33:27 - 

2021-11-10 07:33:27 - P@1: 0.2472
2021-11-10 07:33:27 - P@3: 0.1200
2021-11-10 07:33:27 - P@5: 0.0815
2021-11-10 07:33:27 - P@10: 0.0477
2021-11-10 07:33:27 - P@100: 0.0076
2021-11-10 07:33:27 - P@1000: 0.0011

  0%|          | 0/48605 [00:00<?, ?it/s]

2021-11-10 07:33:29 - Loaded 48605 TEST Documents.
2021-11-10 07:33:29 - Doc Example: {'text': "In a shortcode context, is there any difference here?               array(             'slideshow' => '',         ),       and               array(             'slideshow' => NULL,         ),       Is there a best practice for that?", 'title': 'What is the difference between Null vs Empty (Zero Length) string?'}
2021-11-10 07:33:29 - Loading Queries...
2021-11-10 07:33:31 - Loaded 541 TEST Queries.
2021-11-10 07:33:31 - Query Example: How to enqueue script or style in a theme's template file?
2021-11-10 07:33:31 - Activating Elasticsearch....
2021-11-10 07:33:31 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'wordpress', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:33:31 - Deleting previous Elasticsearch-Index named - wordpress
2021-11-10 07:33:31 

  0%|          | 0/48605 [00:00<?, ?docs/s]
que: 100%|██████████| 5/5 [00:13<00:00,  2.72s/it]


2021-11-10 07:33:56 - 

2021-11-10 07:33:56 - NDCG@1: 0.2107
2021-11-10 07:33:56 - NDCG@3: 0.2577
2021-11-10 07:33:56 - NDCG@5: 0.2769
2021-11-10 07:33:56 - NDCG@10: 0.2921
2021-11-10 07:33:56 - NDCG@100: 0.3324
2021-11-10 07:33:56 - NDCG@1000: 0.3630
2021-11-10 07:33:56 - 

2021-11-10 07:33:56 - MAP@1: 0.1945
2021-11-10 07:33:56 - MAP@3: 0.2394
2021-11-10 07:33:56 - MAP@5: 0.2503
2021-11-10 07:33:56 - MAP@10: 0.2569
2021-11-10 07:33:56 - MAP@100: 0.2647
2021-11-10 07:33:56 - MAP@1000: 0.2657
2021-11-10 07:33:56 - 

2021-11-10 07:33:56 - Recall@1: 0.1945
2021-11-10 07:33:56 - Recall@3: 0.2934
2021-11-10 07:33:56 - Recall@5: 0.3384
2021-11-10 07:33:56 - Recall@10: 0.3816
2021-11-10 07:33:56 - Recall@100: 0.5706
2021-11-10 07:33:56 - Recall@1000: 0.8097
2021-11-10 07:33:56 - 

2021-11-10 07:33:56 - P@1: 0.2107
2021-11-10 07:33:56 - P@3: 0.1078
2021-11-10 07:33:56 - P@5: 0.0762
2021-11-10 07:33:56 - P@10: 0.0442
2021-11-10 07:33:56 - P@100: 0.0068
2021-11-10 07:33:56 - P@1000: 0.0010

  0%|          | 0/16705 [00:00<?, ?it/s]

2021-11-10 07:33:56 - Loaded 16705 TEST Documents.
2021-11-10 07:33:56 - Doc Example: {'text': "I'm trying to use `Get` to load some pretty substantial packages from a custom menu in the _Mathematica_ toolbar (added via MenuSetup.tr).   The problem is, the standard 5-second evaluation timeout seems to apply to commands executed with `KernelExecute`, so only a fraction of my `Get` is evaluated before the command times out. I'm wondering whether there's an option that can be passed to `KernelExecute` (or to `Item` / `MenuItem`) that will remove that time constraint so that my command can be executed completely.", 'title': 'Time constraints on KernelExecute commands or MenuItems?'}
2021-11-10 07:33:56 - Loading Queries...
2021-11-10 07:34:02 - Loaded 804 TEST Queries.
2021-11-10 07:34:02 - Query Example: How to use Automorphisms[] on a graph?
2021-11-10 07:34:02 - Activating Elasticsearch....
2021-11-10 07:34:02 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'mathem

  0%|          | 0/16705 [00:00<?, ?docs/s]
que: 100%|██████████| 7/7 [00:17<00:00,  2.54s/it]


2021-11-10 07:34:26 - 

2021-11-10 07:34:26 - NDCG@1: 0.1530
2021-11-10 07:34:26 - NDCG@3: 0.1758
2021-11-10 07:34:26 - NDCG@5: 0.2004
2021-11-10 07:34:26 - NDCG@10: 0.2212
2021-11-10 07:34:26 - NDCG@100: 0.2722
2021-11-10 07:34:26 - NDCG@1000: 0.3018
2021-11-10 07:34:26 - 

2021-11-10 07:34:26 - MAP@1: 0.1235
2021-11-10 07:34:26 - MAP@3: 0.1566
2021-11-10 07:34:26 - MAP@5: 0.1713
2021-11-10 07:34:26 - MAP@10: 0.1802
2021-11-10 07:34:26 - MAP@100: 0.1904
2021-11-10 07:34:26 - MAP@1000: 0.1915
2021-11-10 07:34:26 - 

2021-11-10 07:34:26 - Recall@1: 0.1235
2021-11-10 07:34:26 - Recall@3: 0.1931
2021-11-10 07:34:26 - Recall@5: 0.2528
2021-11-10 07:34:26 - Recall@10: 0.3160
2021-11-10 07:34:26 - Recall@100: 0.5407
2021-11-10 07:34:26 - Recall@1000: 0.7550
2021-11-10 07:34:26 - 

2021-11-10 07:34:26 - P@1: 0.1530
2021-11-10 07:34:26 - P@3: 0.0825
2021-11-10 07:34:26 - P@5: 0.0654
2021-11-10 07:34:26 - P@10: 0.0412
2021-11-10 07:34:26 - P@100: 0.0078
2021-11-10 07:34:26 - P@1000: 0.0011

  0%|          | 0/17405 [00:00<?, ?it/s]

2021-11-10 07:34:26 - Loaded 17405 TEST Documents.
2021-11-10 07:34:26 - Doc Example: {'text': 'I\'m making a website for a small hotel in php. The hotel owners want a reservation system that uses paypal. They want people to see a calendar and choose a date to make a reservation. If the day has vacancy, they want the user to request booking a room. This would then require the hotel owner to accept the purchase. I have not worked on a project that has this "request to purchase" method of buying with paypal. Is this possible? Does anyone know of an open php system that handles this?', 'title': 'Hotel Reservation Request Booking Paypal PHP'}
2021-11-10 07:34:26 - Loading Queries...
2021-11-10 07:34:28 - Loaded 506 TEST Queries.
2021-11-10 07:34:28 - Query Example: Someone else is using our Google Analytics Tracking code number. What do we do?
2021-11-10 07:34:28 - Activating Elasticsearch....
2021-11-10 07:34:28 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'webmas

  0%|          | 0/17405 [00:00<?, ?docs/s]
que: 100%|██████████| 4/4 [00:11<00:00,  2.95s/it]


2021-11-10 07:34:44 - 

2021-11-10 07:34:44 - NDCG@1: 0.2648
2021-11-10 07:34:44 - NDCG@3: 0.3029
2021-11-10 07:34:44 - NDCG@5: 0.3169
2021-11-10 07:34:44 - NDCG@10: 0.3313
2021-11-10 07:34:44 - NDCG@100: 0.3773
2021-11-10 07:34:44 - NDCG@1000: 0.4096
2021-11-10 07:34:44 - 

2021-11-10 07:34:44 - MAP@1: 0.2217
2021-11-10 07:34:44 - MAP@3: 0.2709
2021-11-10 07:34:44 - MAP@5: 0.2817
2021-11-10 07:34:44 - MAP@10: 0.2897
2021-11-10 07:34:44 - MAP@100: 0.3015
2021-11-10 07:34:44 - MAP@1000: 0.3034
2021-11-10 07:34:44 - 

2021-11-10 07:34:44 - Recall@1: 0.2217
2021-11-10 07:34:44 - Recall@3: 0.3196
2021-11-10 07:34:44 - Recall@5: 0.3597
2021-11-10 07:34:44 - Recall@10: 0.4084
2021-11-10 07:34:44 - Recall@100: 0.6136
2021-11-10 07:34:44 - Recall@1000: 0.8241
2021-11-10 07:34:44 - 

2021-11-10 07:34:44 - P@1: 0.2648
2021-11-10 07:34:44 - P@3: 0.1397
2021-11-10 07:34:44 - P@5: 0.0984
2021-11-10 07:34:44 - P@10: 0.0583
2021-11-10 07:34:44 - P@100: 0.0112
2021-11-10 07:34:44 - P@1000: 0.0020

/content/datasets/quora.zip:   0%|          | 0.00/15.1M [00:00<?, ?iB/s]

2021-11-10 07:34:46 - Unzipping quora.zip ...
Dataset downloaded here: /content/datasets/quora
2021-11-10 07:34:47 - Loading Corpus...


  0%|          | 0/522931 [00:00<?, ?it/s]

2021-11-10 07:34:50 - Loaded 522931 TEST Documents.
2021-11-10 07:34:50 - Doc Example: {'text': 'What is the step by step guide to invest in share market in india?', 'title': ''}
2021-11-10 07:34:50 - Loading Queries...
2021-11-10 07:34:50 - Loaded 10000 TEST Queries.
2021-11-10 07:34:50 - Query Example: Which question should I ask on Quora?
2021-11-10 07:34:50 - Activating Elasticsearch....
2021-11-10 07:34:50 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'quora', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:34:50 - Deleting previous Elasticsearch-Index named - quora
2021-11-10 07:34:50 - Creating fresh Elasticsearch-Index named - quora


  0%|          | 0/522931 [00:00<?, ?docs/s]
que: 100%|██████████| 79/79 [03:26<00:00,  2.61s/it]


2021-11-10 07:39:23 - 

2021-11-10 07:39:23 - NDCG@1: 0.7230
2021-11-10 07:39:23 - NDCG@3: 0.7701
2021-11-10 07:39:23 - NDCG@5: 0.7895
2021-11-10 07:39:23 - NDCG@10: 0.8077
2021-11-10 07:39:23 - NDCG@100: 0.8277
2021-11-10 07:39:23 - NDCG@1000: 0.8312
2021-11-10 07:39:23 - 

2021-11-10 07:39:23 - MAP@1: 0.6310
2021-11-10 07:39:23 - MAP@3: 0.7294
2021-11-10 07:39:23 - MAP@5: 0.7476
2021-11-10 07:39:23 - MAP@10: 0.7596
2021-11-10 07:39:23 - MAP@100: 0.7669
2021-11-10 07:39:23 - MAP@1000: 0.7672
2021-11-10 07:39:23 - 

2021-11-10 07:39:23 - Recall@1: 0.6310
2021-11-10 07:39:23 - Recall@3: 0.7969
2021-11-10 07:39:23 - Recall@5: 0.8495
2021-11-10 07:39:23 - Recall@10: 0.9022
2021-11-10 07:39:23 - Recall@100: 0.9770
2021-11-10 07:39:23 - Recall@1000: 0.9957
2021-11-10 07:39:23 - 

2021-11-10 07:39:23 - P@1: 0.7230
2021-11-10 07:39:23 - P@3: 0.3324
2021-11-10 07:39:23 - P@5: 0.2202
2021-11-10 07:39:23 - P@10: 0.1217
2021-11-10 07:39:23 - P@100: 0.0145
2021-11-10 07:39:23 - P@1000: 0.0015

/content/datasets/dbpedia-entity.zip:   0%|          | 0.00/610M [00:00<?, ?iB/s]

2021-11-10 07:39:37 - Unzipping dbpedia-entity.zip ...
Dataset downloaded here: /content/datasets/dbpedia-entity
2021-11-10 07:39:54 - Loading Corpus...


  0%|          | 0/4635922 [00:00<?, ?it/s]

2021-11-10 07:40:33 - Loaded 4635922 TEST Documents.
2021-11-10 07:40:33 - Doc Example: {'text': "Animalia is an illustrated children's book by Graeme Base. It was originally published in 1986, followed by a tenth anniversary edition in 1996, and a 25th anniversary edition in 2012. Over three million copies have been sold.   A special numbered and signed anniversary edition was also published in 1996, with an embossed gold jacket.", 'title': 'Animalia (book)'}
2021-11-10 07:40:33 - Loading Queries...
2021-11-10 07:40:34 - Loaded 400 TEST Queries.
2021-11-10 07:40:34 - Query Example: Szechwan dish food cuisine
2021-11-10 07:40:34 - Activating Elasticsearch....
2021-11-10 07:40:34 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'dbpedia-entity', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:40:34 - Deleting previous Elasticsearch-Index named - db

  0%|          | 0/4635922 [00:00<?, ?docs/s]
que: 100%|██████████| 4/4 [00:21<00:00,  5.27s/it]


2021-11-10 07:53:29 - 

2021-11-10 07:53:29 - NDCG@1: 0.3937
2021-11-10 07:53:29 - NDCG@3: 0.3461
2021-11-10 07:53:29 - NDCG@5: 0.3299
2021-11-10 07:53:29 - NDCG@10: 0.3202
2021-11-10 07:53:29 - NDCG@100: 0.3620
2021-11-10 07:53:29 - NDCG@1000: 0.4220
2021-11-10 07:53:29 - 

2021-11-10 07:53:29 - MAP@1: 0.0648
2021-11-10 07:53:29 - MAP@3: 0.1068
2021-11-10 07:53:30 - MAP@5: 0.1245
2021-11-10 07:53:30 - MAP@10: 0.1520
2021-11-10 07:53:30 - MAP@100: 0.2171
2021-11-10 07:53:30 - MAP@1000: 0.2276
2021-11-10 07:53:30 - 

2021-11-10 07:53:30 - Recall@1: 0.0648
2021-11-10 07:53:30 - Recall@3: 0.1227
2021-11-10 07:53:30 - Recall@5: 0.1512
2021-11-10 07:53:30 - Recall@10: 0.2088
2021-11-10 07:53:30 - Recall@100: 0.4347
2021-11-10 07:53:30 - Recall@1000: 0.6387
2021-11-10 07:53:30 - 

2021-11-10 07:53:30 - P@1: 0.5075
2021-11-10 07:53:30 - P@3: 0.3983
2021-11-10 07:53:30 - P@5: 0.3445
2021-11-10 07:53:30 - P@10: 0.2742
2021-11-10 07:53:30 - P@100: 0.0841
2021-11-10 07:53:30 - P@1000: 0.0144

/content/datasets/scidocs.zip:   0%|          | 0.00/136M [00:00<?, ?iB/s]

2021-11-10 07:53:33 - Unzipping scidocs.zip ...
Dataset downloaded here: /content/datasets/scidocs
2021-11-10 07:53:36 - Loading Corpus...


  0%|          | 0/25657 [00:00<?, ?it/s]

2021-11-10 07:53:38 - Loaded 25657 TEST Documents.
2021-11-10 07:53:38 - Doc Example: {'text': 'An evolutionary recurrent network which automates the design of recurrent neural/fuzzy networks using a new evolutionary learning algorithm is proposed in this paper. This new evolutionary learning algorithm is based on a hybrid of genetic algorithm (GA) and particle swarm optimization (PSO), and is thus called HGAPSO. In HGAPSO, individuals in a new generation are created, not only by crossover and mutation operation as in GA, but also by PSO. The concept of elite strategy is adopted in HGAPSO, where the upper-half of the best-performing individuals in a population are regarded as elites. However, instead of being reproduced directly to the next generation, these elites are first enhanced. The group constituted by the elites is regarded as a swarm, and each elite corresponds to a particle within it. In this regard, the elites are enhanced by PSO, an operation which mimics the maturing pheno

  0%|          | 0/25657 [00:00<?, ?docs/s]
que: 100%|██████████| 8/8 [00:24<00:00,  3.01s/it]


2021-11-10 07:54:14 - 

2021-11-10 07:54:14 - NDCG@1: 0.1900
2021-11-10 07:54:14 - NDCG@3: 0.1566
2021-11-10 07:54:14 - NDCG@5: 0.1351
2021-11-10 07:54:14 - NDCG@10: 0.1647
2021-11-10 07:54:14 - NDCG@100: 0.2306
2021-11-10 07:54:14 - NDCG@1000: 0.2739
2021-11-10 07:54:14 - 

2021-11-10 07:54:14 - MAP@1: 0.0387
2021-11-10 07:54:14 - MAP@3: 0.0703
2021-11-10 07:54:14 - MAP@5: 0.0827
2021-11-10 07:54:14 - MAP@10: 0.0964
2021-11-10 07:54:14 - MAP@100: 0.1120
2021-11-10 07:54:14 - MAP@1000: 0.1143
2021-11-10 07:54:14 - 

2021-11-10 07:54:14 - Recall@1: 0.0387
2021-11-10 07:54:14 - Recall@3: 0.0891
2021-11-10 07:54:14 - Recall@5: 0.1190
2021-11-10 07:54:14 - Recall@10: 0.1737
2021-11-10 07:54:14 - Recall@100: 0.3678
2021-11-10 07:54:14 - Recall@1000: 0.5785
2021-11-10 07:54:14 - 

2021-11-10 07:54:14 - P@1: 0.1900
2021-11-10 07:54:14 - P@3: 0.1463
2021-11-10 07:54:14 - P@5: 0.1174
2021-11-10 07:54:14 - P@10: 0.0857
2021-11-10 07:54:14 - P@100: 0.0181
2021-11-10 07:54:14 - P@1000: 0.0029

/content/datasets/fever.zip:   0%|          | 0.00/1.15G [00:00<?, ?iB/s]

2021-11-10 07:54:40 - Unzipping fever.zip ...
Dataset downloaded here: /content/datasets/fever
2021-11-10 07:55:11 - Loading Corpus...


  0%|          | 0/5416568 [00:00<?, ?it/s]

2021-11-10 07:55:57 - Loaded 5416568 TEST Documents.
2021-11-10 07:55:57 - Doc Example: {'text': 'The following are the football ( soccer ) events of the year 1928 throughout the world .', 'title': '1928 in association football'}
2021-11-10 07:55:57 - Loading Queries...
2021-11-10 07:55:58 - Loaded 6666 TEST Queries.
2021-11-10 07:55:58 - Query Example: Ukrainian Soviet Socialist Republic was a founding participant of the UN.
2021-11-10 07:55:58 - Activating Elasticsearch....
2021-11-10 07:55:58 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'fever', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 07:55:58 - Deleting previous Elasticsearch-Index named - fever
2021-11-10 07:55:58 - Creating fresh Elasticsearch-Index named - fever


  0%|          | 0/5416568 [00:00<?, ?docs/s]
que: 100%|██████████| 53/53 [08:32<00:00,  9.67s/it]


2021-11-10 08:20:49 - 

2021-11-10 08:20:49 - NDCG@1: 0.4862
2021-11-10 08:20:49 - NDCG@3: 0.5998
2021-11-10 08:20:49 - NDCG@5: 0.6264
2021-11-10 08:20:49 - NDCG@10: 0.6494
2021-11-10 08:20:49 - NDCG@100: 0.6739
2021-11-10 08:20:49 - NDCG@1000: 0.6794
2021-11-10 08:20:49 - 

2021-11-10 08:20:49 - MAP@1: 0.4580
2021-11-10 08:20:49 - MAP@3: 0.5600
2021-11-10 08:20:49 - MAP@5: 0.5754
2021-11-10 08:20:49 - MAP@10: 0.5854
2021-11-10 08:20:49 - MAP@100: 0.5909
2021-11-10 08:20:49 - MAP@1000: 0.5911
2021-11-10 08:20:49 - 

2021-11-10 08:20:49 - Recall@1: 0.4580
2021-11-10 08:20:49 - Recall@3: 0.6815
2021-11-10 08:20:49 - Recall@5: 0.7458
2021-11-10 08:20:49 - Recall@10: 0.8141
2021-11-10 08:20:49 - Recall@100: 0.9215
2021-11-10 08:20:49 - Recall@1000: 0.9598
2021-11-10 08:20:49 - 

2021-11-10 08:20:49 - P@1: 0.4862
2021-11-10 08:20:49 - P@3: 0.2449
2021-11-10 08:20:49 - P@5: 0.1617
2021-11-10 08:20:49 - P@10: 0.0888
2021-11-10 08:20:49 - P@100: 0.0103
2021-11-10 08:20:49 - P@1000: 0.0011

/content/datasets/climate-fever.zip:   0%|          | 0.00/1.14G [00:00<?, ?iB/s]

2021-11-10 08:21:17 - Unzipping climate-fever.zip ...
Dataset downloaded here: /content/datasets/climate-fever
2021-11-10 08:21:49 - Loading Corpus...


  0%|          | 0/5416593 [00:00<?, ?it/s]

2021-11-10 08:22:36 - Loaded 5416593 TEST Documents.
2021-11-10 08:22:36 - Doc Example: {'text': 'The following are the football ( soccer ) events of the year 1928 throughout the world .', 'title': '1928 in association football'}
2021-11-10 08:22:36 - Loading Queries...
2021-11-10 08:22:36 - Loaded 1535 TEST Queries.
2021-11-10 08:22:36 - Query Example: Global warming is driving polar bears toward extinction
2021-11-10 08:22:36 - Activating Elasticsearch....
2021-11-10 08:22:36 - Elastic Search Credentials: {'hostname': 'localhost', 'index_name': 'climate-fever', 'keys': {'title': 'title', 'body': 'txt'}, 'timeout': 100, 'retry_on_timeout': True, 'maxsize': 24, 'number_of_shards': 'default', 'language': 'english'}
2021-11-10 08:22:36 - Deleting previous Elasticsearch-Index named - climate-fever
2021-11-10 08:22:36 - Creating fresh Elasticsearch-Index named - climate-fever


  0%|          | 0/5416593 [00:00<?, ?docs/s]
que: 100%|██████████| 12/12 [03:35<00:00, 17.98s/it]


2021-11-10 08:41:45 - 

2021-11-10 08:41:45 - NDCG@1: 0.1674
2021-11-10 08:41:45 - NDCG@3: 0.1499
2021-11-10 08:41:45 - NDCG@5: 0.1615
2021-11-10 08:41:45 - NDCG@10: 0.1862
2021-11-10 08:41:45 - NDCG@100: 0.2368
2021-11-10 08:41:45 - NDCG@1000: 0.2697
2021-11-10 08:41:45 - 

2021-11-10 08:41:45 - MAP@1: 0.0756
2021-11-10 08:41:45 - MAP@3: 0.1081
2021-11-10 08:41:45 - MAP@5: 0.1178
2021-11-10 08:41:45 - MAP@10: 0.1281
2021-11-10 08:41:45 - MAP@100: 0.1388
2021-11-10 08:41:45 - MAP@1000: 0.1403
2021-11-10 08:41:45 - 

2021-11-10 08:41:45 - Recall@1: 0.0756
2021-11-10 08:41:45 - Recall@3: 0.1418
2021-11-10 08:41:45 - Recall@5: 0.1749
2021-11-10 08:41:45 - Recall@10: 0.2318
2021-11-10 08:41:45 - Recall@100: 0.4093
2021-11-10 08:41:45 - Recall@1000: 0.5977
2021-11-10 08:41:45 - 

2021-11-10 08:41:45 - P@1: 0.1674
2021-11-10 08:41:45 - P@3: 0.1090
2021-11-10 08:41:45 - P@5: 0.0837
2021-11-10 08:41:45 - P@10: 0.0575
2021-11-10 08:41:45 - P@100: 0.0111
2021-11-10 08:41:45 - P@1000: 0.0017

/content/datasets/scifact.zip:   0%|          | 0.00/2.69M [00:00<?, ?iB/s]

2021-11-10 08:41:46 - Unzipping scifact.zip ...
Dataset downloaded here: /content/datasets/scifact
2021-11-10 08:41:46 - Loading Corpus...


  0%|          | 0/5183 [00:00<?, ?it/s]

2021-11-10 08:41:47 - Loaded 5183 TEST Documents.
2021-11-10 08:41:47 - Doc Example: {'text': 'Alterations of the architecture of cerebral white matter in the developing human brain can affect cortical development and result in functional disabilities. A line scan diffusion-weighted magnetic resonance imaging (MRI) sequence with diffusion tensor analysis was applied to measure the apparent diffusion coefficient, to calculate relative anisotropy, and to delineate three-dimensional fiber architecture in cerebral white matter in preterm (n = 17) and full-term infants (n = 7). To assess effects of prematurity on cerebral white matter development, early gestation preterm infants (n = 10) were studied a second time at term. In the central white matter the mean apparent diffusion coefficient at 28 wk was high, 1.8 microm2/ms, and decreased toward term to 1.2 microm2/ms. In the posterior limb of the internal capsule, the mean apparent diffusion coefficients at both times were similar (1.2 vers

  0%|          | 0/5183 [00:00<?, ?docs/s]
que: 100%|██████████| 3/3 [00:06<00:00,  2.18s/it]


2021-11-10 08:42:00 - 

2021-11-10 08:42:00 - NDCG@1: 0.5100
2021-11-10 08:42:00 - NDCG@3: 0.5607
2021-11-10 08:42:00 - NDCG@5: 0.5895
2021-11-10 08:42:00 - NDCG@10: 0.6110
2021-11-10 08:42:00 - NDCG@100: 0.6329
2021-11-10 08:42:00 - NDCG@1000: 0.6397
2021-11-10 08:42:00 - 

2021-11-10 08:42:00 - MAP@1: 0.4961
2021-11-10 08:42:00 - MAP@3: 0.5428
2021-11-10 08:42:00 - MAP@5: 0.5593
2021-11-10 08:42:00 - MAP@10: 0.5700
2021-11-10 08:42:00 - MAP@100: 0.5749
2021-11-10 08:42:00 - MAP@1000: 0.5751
2021-11-10 08:42:00 - 

2021-11-10 08:42:00 - Recall@1: 0.4961
2021-11-10 08:42:00 - Recall@3: 0.5961
2021-11-10 08:42:00 - Recall@5: 0.6639
2021-11-10 08:42:00 - Recall@10: 0.7249
2021-11-10 08:42:00 - Recall@100: 0.8232
2021-11-10 08:42:00 - Recall@1000: 0.8767
2021-11-10 08:42:00 - 

2021-11-10 08:42:00 - P@1: 0.5100
2021-11-10 08:42:00 - P@3: 0.2100
2021-11-10 08:42:00 - P@5: 0.1427
2021-11-10 08:42:00 - P@10: 0.0797
2021-11-10 08:42:00 - P@100: 0.0092
2021-11-10 08:42:00 - P@1000: 0.0010
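
The metric lines logged above can be collected into a dictionary for comparison across datasets. The following is a minimal sketch, not part of the original notebook: the helper name `parse_metrics` and the regular expression are assumptions based only on the log format shown here (`<timestamp> - <metric>@<k>: <value>`).

```python
import re

# Matches lines such as "2021-11-10 08:42:00 - NDCG@10: 0.6110".
# The pattern is an assumption derived from the log output shown above.
METRIC_LINE = re.compile(r"-\s+(?P<name>[A-Za-z]+@\d+):\s+(?P<value>\d+\.\d+)\s*$")


def parse_metrics(log_text):
    """Return {metric_name: value} for every metric line found in log_text."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.search(line)
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


# A few lines copied from the scifact output above.
sample_log = """\
2021-11-10 08:42:00 - NDCG@1: 0.5100
2021-11-10 08:42:00 - NDCG@10: 0.6110
2021-11-10 08:42:00 -
2021-11-10 08:42:00 - Recall@100: 0.8232
2021-11-10 08:42:00 - P@1000: 0.0010
"""
scifact_metrics = parse_metrics(sample_log)
print(scifact_metrics["NDCG@10"])
```

Lines that carry only a timestamp are skipped, so the captured cell output can be passed in unmodified.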


In [None]:
!pip install -U -q kaggle==1.5.8
!mkdir -p ~/.kaggle
# Upload kaggle.json before running this cell
!mv /content/kaggle.json ~/.kaggle/
# Move all result JSON files into the Kaggle dataset folder
!mkdir -p bm25_results
!mv *.json bm25_results/
!kaggle datasets init -p bm25_results
# Next: edit bm25_results/dataset-metadata.json (title, id) before creating the dataset

[?25l[K     |█████▌                          | 10 kB 27.4 MB/s eta 0:00:01[K     |███████████                     | 20 kB 32.8 MB/s eta 0:00:01[K     |████████████████▋               | 30 kB 36.4 MB/s eta 0:00:01[K     |██████████████████████▏         | 40 kB 30.5 MB/s eta 0:00:01[K     |███████████████████████████▊    | 51 kB 18.6 MB/s eta 0:00:01[K     |████████████████████████████████| 59 kB 4.9 MB/s 
[?25h  Building wheel for kaggle (setup.py) ... [?25l[?25hdone
  Building wheel for slugify (setup.py) ... [?25l[?25hdone
Data package template written to: bm25_results/dataset-metadata.json


In [None]:
# Upload to kaggle
!kaggle datasets create -p bm25_results/ --dir-mode zip

Starting upload for file results_cqadupstack_english.json
100% 29.2M/29.2M [00:02<00:00, 10.6MB/s]
Upload successful: results_cqadupstack_english.json (29MB)
Starting upload for file results_webis-touche2020.json
100% 2.51M/2.51M [00:02<00:00, 1.11MB/s]
Upload successful: results_webis-touche2020.json (3MB)
Starting upload for file results_cqadupstack_physics.json
100% 19.3M/19.3M [00:05<00:00, 3.82MB/s]
Upload successful: results_cqadupstack_physics.json (19MB)
Starting upload for file results_cqadupstack_unix.json
100% 20.3M/20.3M [00:02<00:00, 7.93MB/s]
Upload successful: results_cqadupstack_unix.json (20MB)
Starting upload for file results_cqadupstack_tex.json
100% 55.6M/55.6M [00:03<00:00, 17.7MB/s]
Upload successful: results_cqadupstack_tex.json (56MB)
Starting upload for file results_arguana.json
100% 66.3M/66.3M [00:02<00:00, 24.0MB/s]
Upload successful: results_arguana.json (66MB)
Starting upload for file results_fiqa.json
100% 12.6M/12.6M [00:01<00:00, 7.12MB/s]
Upload succes

##### Run on BEIR - Extra datasets (for BioASQ, see the separate notebook)

In [None]:
!mkdir -p /root/.kaggle
!cp /content/kaggle.json /root/.kaggle/

In [None]:
!kaggle datasets download -d signal1m
!unzip ./signal1m.zip

Downloading signal1m.zip to /content
 89% 137M/154M [00:12<00:01, 13.6MB/s]
100% 154M/154M [00:12<00:00, 12.6MB/s]
Archive:  ./signal1m.zip
  inflating: signal1m/corpus.jsonl   
  inflating: signal1m/qrels/test.tsv  
  inflating: signal1m/queries.jsonl  


In [None]:
!kaggle datasets download -d trecnews
!unzip ./trecnews.zip

Downloading trecnews.zip to /content
 99% 809M/820M [00:29<00:00, 17.8MB/s]
100% 820M/820M [00:29<00:00, 29.2MB/s]
Archive:  ./trecnews.zip
  inflating: trec-news/corpus.jsonl  
  inflating: trec-news/qrels/test.tsv  
  inflating: trec-news/queries.jsonl  


In [None]:
!kaggle datasets download -d robust
!unzip ./robust.zip

In [None]:
import json

from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval.search.lexical import BM25Search as BM25
from beir.retrieval.evaluation import EvaluateRetrieval


def run_bm25(dataset, data_path, initialize = True, hostname = "localhost"):
    corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
    model = BM25(index_name=dataset, hostname=hostname, initialize=initialize)
    retriever = EvaluateRetrieval(model)
    # Retrieve BM25 (lexical) results; the format of results is identical to qrels
    results = retriever.retrieve(corpus, queries)
    # Save scores for top 1000 docs for each query, i.e. 1000 * queries lines
    with open(f"./results_{dataset}.json", 'w') as fp:
        json.dump(results, fp)
    ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
    return ndcg


##### Signal 1M

#run_bm25("signal1m", "signal1m")

##### TREC-News

#run_bm25("trec-news", "trec-news")

##### TREC-Robust

#run_bm25("robust04", "robust04")

  from tqdm.autonotebook import tqdm


  0%|          | 0/14913938 [00:00<?, ?it/s]
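
Each `results_<dataset>.json` file written by `run_bm25` holds a nested dictionary mapping query IDs to `{doc_id: score}`, the same layout as qrels. The following is a minimal sketch of inspecting such a file. The helper `top_k` and the toy data are hypothetical and only illustrate the layout.

```python
import json


def top_k(results, query_id, k=10):
    """Return the k highest-scoring (doc_id, score) pairs for one query."""
    ranked = sorted(results[query_id].items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical toy results in the nested-dict layout that run_bm25 dumps to JSON.
toy_results = {"q1": {"d1": 3.2, "d2": 7.5, "d3": 0.4}, "q2": {"d2": 1.1}}

# Round-trip through JSON to mimic saving and reloading a results file.
reloaded = json.loads(json.dumps(toy_results))
print(top_k(reloaded, "q1", k=2))
```

For a real run, `reloaded` would instead come from `json.load` on one of the saved result files.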

In [None]:
!mv /content/TREC-Disk-4 /root/.ir_datasets/trec-robust04/trec45/
!mv /content/TREC-Disk-5 /root/.ir_datasets/trec-robust04/trec45/

In [None]:
!pip install -q ir_datasets
!pip install -q jsonlines

In [None]:
import ir_datasets
dataset = ir_datasets.load("trec-robust04")
for query in dataset.queries_iter():
    print(query)
    break

[INFO] Please confirm you agree to the TREC Robust 2004 data usage agreement found at <https://trec.nist.gov/data/cd45/index.html>
[INFO] [starting] https://trec.nist.gov/data/robust/04.testset.gz
[INFO] [finished] https://trec.nist.gov/data/robust/04.testset.gz: [00:00] [34.3kB] [195kB/s]
                                                                           

TrecQuery(query_id='301', title='International Organized Crime', description='Identify organizations that participate in international criminal\nactivity, the activity, and, if possible, collaborating organizations\nand the countries involved.', narrative='A relevant document must as a minimum identify the organization and the\ntype of illegal activity (e.g., Columbian cartel exporting cocaine).\nVague references to international drug trade without identification of\nthe organization(s) involved would not be relevant.')




In [None]:
### Robust04 preprocessing ###

# Also see https://ir-datasets.com/trec-robust04.html#trec-robust04

import jsonlines
import ir_datasets

dataset = ir_datasets.load("trec-robust04")

with jsonlines.open('queries.jsonl', 'w') as jsonl_out:
    for query in dataset.queries_iter():
        line = {"_id": query.query_id, "text": query.description}
        jsonl_out.write(line)

with jsonlines.open('corpus.jsonl', 'w') as jsonl_out:
    for doc in dataset.docs_iter():
        line = {"_id": doc.doc_id, "text": doc.text}
        jsonl_out.write(line)

with open("test.tsv", "w") as record_file:
    record_file.write("query-id\tcorpus-id\tscore\n")
    for qrel in dataset.qrels_iter():
        record_file.write(f"{qrel.query_id}\t{qrel.doc_id}\t{qrel.relevance}\n")


!mkdir robust04
!mkdir robust04/qrels

!mv /content/queries.jsonl robust04/
!mv /content/corpus.jsonl robust04/
!mv /content/test.tsv robust04/qrels/test.tsv
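
Before indexing, it can help to confirm that the converted folder matches the layout used in this notebook: `corpus.jsonl`, `queries.jsonl`, and `qrels/test.tsv` with a `query-id`, `corpus-id`, `score` header. The following is a minimal sketch using only the standard library. The function `check_beir_layout` is a hypothetical helper, not part of BEIR, and the demo uses toy files in a temporary directory rather than the real Robust04 data.

```python
import json
import os
import tempfile


def check_beir_layout(data_path):
    """Return a list of problems found in a BEIR-style dataset folder."""
    problems = []
    # Both JSONL files are expected to carry "_id" and "text" on every line.
    for name in ("corpus.jsonl", "queries.jsonl"):
        path = os.path.join(data_path, name)
        if not os.path.isfile(path):
            problems.append(f"missing {name}")
            continue
        with open(path, encoding="utf-8") as fp:
            for line_no, raw in enumerate(fp, start=1):
                record = json.loads(raw)
                absent = [key for key in ("_id", "text") if key not in record]
                if absent:
                    problems.append(f"{name}:{line_no} lacks {absent}")
    # The qrels file must start with the tab-separated header written above.
    qrels_path = os.path.join(data_path, "qrels", "test.tsv")
    if not os.path.isfile(qrels_path):
        problems.append("missing qrels/test.tsv")
    else:
        with open(qrels_path, encoding="utf-8") as fp:
            header = fp.readline().rstrip("\n").split("\t")
        if header != ["query-id", "corpus-id", "score"]:
            problems.append(f"unexpected qrels header {header}")
    return problems


# Toy dataset in a temporary directory (illustrative values only).
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "qrels"))
    with open(os.path.join(tmp, "corpus.jsonl"), "w", encoding="utf-8") as fp:
        fp.write(json.dumps({"_id": "doc1", "text": "toy document"}) + "\n")
    with open(os.path.join(tmp, "queries.jsonl"), "w", encoding="utf-8") as fp:
        fp.write(json.dumps({"_id": "301", "text": "toy query"}) + "\n")
    with open(os.path.join(tmp, "qrels", "test.tsv"), "w", encoding="utf-8") as fp:
        fp.write("query-id\tcorpus-id\tscore\n301\tdoc1\t1\n")
    toy_problems = check_beir_layout(tmp)
print(toy_problems)
```

Calling `check_beir_layout("robust04")` after the cell above would apply the same checks to the converted dataset. Note that it reads the whole corpus once.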

In [None]:
def compute_result(dataset, data_path, initialize = True, hostname = "localhost", k_values=(1, 10, 100, 1000)):
    # Re-evaluate saved BM25 results without re-running retrieval.
    # `initialize` and `hostname` are unused here; they mirror run_bm25's signature.
    corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
    with open(f"./results_{dataset}.json", 'r') as fp:
        results = json.load(fp)
    ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, results, list(k_values))
    return ndcg

print(compute_result("signal1m", "signal1m"))
print(compute_result("trec-news", "trec-news"))

2021-11-23 04:20:03 - Loading Corpus...


  0%|          | 0/2866316 [00:00<?, ?it/s]

2021-11-23 04:20:22 - Loaded 2866316 TEST Documents.
2021-11-23 04:20:22 - Doc Example: {'text': 'This Boston college professor who lives in #NH is on leave after being arrested for child pornography, endangerment:', 'title': ''}
2021-11-23 04:20:22 - Loading Queries...
2021-11-23 04:20:22 - Loaded 97 TEST Queries.
2021-11-23 04:20:22 - Query Example: VIDEO:Good Samaritans Stop Alleged Hit-and-Run Driver in Miami
2021-11-23 04:20:22 - 

2021-11-23 04:20:22 - NDCG@1: 0.4021
2021-11-23 04:20:22 - NDCG@10: 0.3302
2021-11-23 04:20:22 - NDCG@100: 0.3313
2021-11-23 04:20:22 - NDCG@1000: 0.3890
2021-11-23 04:20:22 - 

2021-11-23 04:20:22 - MAP@1: 0.0293
2021-11-23 04:20:22 - MAP@10: 0.1249
2021-11-23 04:20:22 - MAP@100: 0.1765
2021-11-23 04:20:22 - MAP@1000: 0.1839
2021-11-23 04:20:22 - 

2021-11-23 04:20:22 - Recall@1: 0.0293
2021-11-23 04:20:22 - Recall@10: 0.1628
2021-11-23 04:20:22 - Recall@100: 0.3737
2021-11-23 04:20:22 - Recall@1000: 0.5608
2021-11-23 04:20:22 - 

2021-11-23 04:20:22 -

  0%|          | 0/594977 [00:00<?, ?it/s]

2021-11-23 04:20:45 - Loaded 594977 TEST Documents.
2021-11-23 04:20:45 - Doc Example: {'text': 'NEW ORLEANS — Whenever a Virginia Tech offensive coach is asked how the most prolific receiving duo in school history came to be, inevitably the first road game in 2008 against North Carolina comes up. Midway through the first quarter, Virginia Tech had to call two timeouts in a row because then-freshmen Jarrett Boykin and Danny Coale couldn’t seem to line up right, and “they had those big eyes out there looking around,” Kevin Sherman, their position coach, said recently. Now that Boykin and Coale have only Tuesday’s Sugar Bowl remaining before leaving Virginia Tech with every major school record for a wide receiver, they’ve taken a different stance. “I still don’t think that was on us. Macho [Harris] was in the game and he lined up wrong,” said Boykin, as Coale sat next to him nodding in agreement. Just add that to the list of slights these seniors have had to overcome. Boykin has been the

In [None]:
from google.colab import files
files.download('/content/results_signal1m.json')


##### Evaluate BEIR

##### References

- https://stackoverflow.com/questions/58418844/google-colaboratory-can-i-use-it-to-connect-to-a-local-server
