In [1]:
import json
import re

from time import sleep

## Downloading the Dataset

To download the dataset, first initialize the environment variables by running `setup_env.sh` under `gamechangerml` with the `DEV` argument. Then run `dl_data_cli.py` under `gamechangerml/src/search/evaluation/`. It prompts you for the name of the dataset and the location to download it to. This example uses the `msmarco_1k` dataset.

![](./assets/dl_dataset.png)

In [2]:
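def load_json(fpath):
    # JSON files are read as UTF-8 explicitly so the result does not depend on the platform default
    with open(fpath, "r", encoding="utf-8") as fp:
        data = json.load(fp)
    return data

def save_json(data, fpath):
    # Write with the same explicit encoding used for reading
    with open(fpath, "w", encoding="utf-8") as fp:
        json.dump(data, fp)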

In [3]:
documents = load_json("./msmarco_1k/collection.json")
queries = load_json("./msmarco_1k/queries.json")

## Modifying the Documents or Inputs

At this point, you can modify the queries and documents to suit your model architecture, for example by appending tokens or words or by preprocessing the text. For this example, we'll simply replace every run of non-word characters (punctuation and symbols) with a single space.

In [4]:
def clean_text(text):
    # Replace each run of non-word characters with a single space
    return re.sub(r"\W+", " ", text)

In [5]:
clean_text(documents['1333116'])

'Leptin from Greek Î ÎµÏ Ï Ï Ï leptos thin â the hormone of energy expenditureâ is a hormone predominantly made by adipose cells that helps to regulate energy balance by inhibiting hunger Leptin is opposed by the actions of the hormone ghrelin the hunger hormone '

Any transformation is fine as long as each text stays mapped to its original document or query id.

In [6]:
new_docs = {key:clean_text(value) for key, value in documents.items()}
new_queries = {key:clean_text(value) for key, value in queries.items()}
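
As another illustration of the kind of modification described in this section, here is a minimal sketch that appends a marker token to every text while leaving the id keys untouched. The helper `append_token`, the `[SEP]` token, and the toy data are hypothetical and chosen only for the example; use whatever your model expects.

```python
def append_token(texts, token="[SEP]"):
    """Return a new dict with `token` appended to each text; the keys (ids) are unchanged."""
    return {key: f"{value} {token}" for key, value in texts.items()}

# Toy data standing in for `documents`; these ids and texts are made up.
toy_docs = {"d1": "first passage", "d2": "second passage"}
tagged_docs = append_token(toy_docs)

# The id mapping is preserved: same keys, modified values.
assert tagged_docs.keys() == toy_docs.keys()
```

Because the function builds a new dictionary, the original texts remain available for comparison.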

## Elasticsearch Setup

For this example, we'll set up an Elasticsearch container to use as a test search engine. It uses the `elasticsearch:7.10.1` image; if the image is not available locally, Docker should pull it automatically. In a separate terminal, run the command below.

`docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.10.1`

Logs should start appearing in the terminal. Let the container run for around 30 seconds to warm up before continuing.

In [7]:
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

In [8]:
es = Elasticsearch()

In [9]:
# Setting the mapping for the Elasticsearch index
mapping = {
    "mappings": {
        "properties": {
            "doc_id": {"type": "text"},
            "body": {"type": "text"}
        }
    },
}

# Delete the index if it already exists
es.indices.delete(index = "documents", ignore = [400, 404])

# Creating the index
es.indices.create(index = "documents", ignore = 400, body = mapping)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'documents'}

In [10]:
"""
This failing assertion is an intentional break so that running all cells stops here. Elasticsearch did not
accept the documents when the cells ran straight through, even after a 60-second wait, but stopping here
and then running the cells below by hand works.
"""
assert 1 == 2

AssertionError: 
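
An alternative to stopping the notebook by hand is to poll until the server responds before indexing. The sketch below is a generic retry helper; `wait_until` is a hypothetical name, and whether polling resolves the indexing problem described above has not been verified here. It assumes the `ping()` method of the `elasticsearch` client, which reports whether the server is reachable.

```python
import time

def wait_until(check, timeout=60.0, interval=1.0):
    """Call `check()` until it returns a truthy value or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            # Treat errors (for example, a refused connection) as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Intended use with the client created earlier (requires a running server):
# ready = wait_until(es.ping, timeout=60)
```

The helper takes any zero-argument callable, so it can be tried without a server by passing a stand-in function.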

In [11]:
elastic_documents = []

for doc_id, document in new_docs.items():
    es_doc = {
        "_index": "documents",
        "doc_id": doc_id,
        "body": document
    }
    elastic_documents.append(es_doc)
    
bulk(es, elastic_documents)

(1000, [])

In [12]:
def search(text, n_return = 100):
    search_body = {
        "query": {
            "match": {
                "body": text
            }
        },
        "size": n_return
    }
    
    answers = es.search(index = "documents", body = search_body)
    answers = answers["hits"]["hits"]
    return answers

In [13]:
search(new_queries["1048579"], n_return = 2)

[{'_index': 'documents',
  '_type': '_doc',
  '_id': 'CFaXd3YBLF4VyGNYgJJu',
  '_score': 10.982681,
  '_source': {'doc_id': '7187227',
   'body': 'PCNT stands for 1 8 PCNT Pericentrin Medical 2 5 similar PCNT Panama Canal Net Tonnage Business Tanker Cargo shipping 3 1 PCNT Panama Canal Nett Tonnage 4 3 PCNT Public Carrier Networks Technology Technology Telecom Telecommunications 5 1 PCNT Paideia Commentaries on the New Testament '}},
 {'_index': 'documents',
  '_type': '_doc',
  '_id': 'LVaXd3YBLF4VyGNYgJNv',
  '_score': 4.495153,
  '_source': {'doc_id': '1304031',
   'body': ' and called us according to His own purpose 2 Timothy 1 9 KJV T o know one s purpose is to know who they themselves are and what they are doing They also have the determination to accomplish what it is that they are purposed to do '}}]

## Creating the Answer File

The EvalTool compares two JSON files. The first contains the model's predictions and the second is the ground truth. The ground truth file comes with the evaluation dataset downloaded earlier and is called `relations.json`. The answer file should follow the JSON format shown below.

In [14]:
# Sample format
sample_answer = {
    "query_id_1": {
        "document_id_1": 1,
        "document_id_2": 2,
        "document_id_3": 3
    },
    "query_id_2": {
        "document_id_4": 1,
        "document_id_5": 2,
        "document_id_6": 3
    }
}

Each top-level key is a query id. The dictionary mapped to that key is the ranked set of documents for the query, where each key is a document id and each value is the rank the model assigned to that document.
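
Before saving an answer file, it can help to check that it has this shape. The sketch below uses a hypothetical helper, `validate_answers`, and assumes that ranks for each query should run from 1 to n with no gaps or repeats. That matches the sample format, but it is not confirmed as a requirement of the EvalTool.

```python
def validate_answers(answers):
    """Check that each query maps document ids to ranks 1..n with no gaps or repeats."""
    for query_id, ranked in answers.items():
        ranks = sorted(ranked.values())
        expected = list(range(1, len(ranked) + 1))
        if ranks != expected:
            raise ValueError(
                f"query {query_id!r}: ranks {ranks} are not 1..{len(ranked)}"
            )
    return True

# Toy answer in the sample format; the ids are placeholders.
example = {
    "query_id_1": {"document_id_1": 1, "document_id_2": 2, "document_id_3": 3},
}
validate_answers(example)
```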

In [15]:
model_answers = {}
for query_id, query in new_queries.items():
    answer = search(query, n_return = 25)
    query_answers = {}
    for idx, doc in enumerate(answer):
        query_answers[doc["_source"]["doc_id"]] = idx + 1
    model_answers[query_id] = query_answers

In [16]:
save_json(model_answers, "./eval_folder/answers.json")

## Running the Evaluation

To evaluate, run `evaltool.py` under `gamechangerml/src/search/evaluation/` and pass it the answers and ground truth JSON files, as in the example below:

`python gamechangerml/src/search/evaluation/evaltool.py -p gamechangerml/experimental/notebooks/evaluation/eval_folder/answers.json -g gamechangerml/experimental/notebooks/evaluation/msmarco_1k/relations.json -m gamechangerml/experimental/notebooks/evaluation/eval_folder/`

The script then compares the predictions with the ground truth and generates a `metrics.json` file that contains the scores at varying values of `k`. It also generates graphs of these metrics at different values of `k`. The metrics are explained in the [Explaining Evaluation](./Explaining_Evaluation.ipynb) notebook.
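
To give a feel for what a score "at `k`" means, here is a minimal sketch of one common ranking metric, precision at `k`, computed from a ranked answer in the format used above. This is an illustration only: `precision_at_k` is a hypothetical helper, the relevant set is made up, and the metrics the EvalTool actually reports and how it computes them are not shown in this notebook.

```python
def precision_at_k(ranked_docs, relevant_ids, k):
    """Fraction of the top-k ranked documents that appear in `relevant_ids`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Keep only documents whose assigned rank is within the top k.
    top_k = [doc_id for doc_id, rank in ranked_docs.items() if rank <= k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Toy ranking in the answer-file format and a made-up set of relevant documents.
ranked = {"a": 1, "b": 2, "c": 3, "d": 4}
relevant = {"a", "c"}

for k in (1, 2, 4):
    print(k, precision_at_k(ranked, relevant, k))
```

Larger values of `k` look further down the ranking, which is why scores are reported at several values of `k` rather than at one.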