In this notebook we explore the `evaluate` function offered by `ranx`.

First of all, we need to install [ranx](https://github.com/AmenRa/ranx).

Note that the first call to any ranx function may take a while, because the function must be compiled first.

In [None]:
!pip install -U ranx

Download the data we need.

In [None]:
import os
import requests

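os.makedirs("notebooks/data", exist_ok=True)

for file in ["qrels", "results"]:
    url = f"https://raw.githubusercontent.com/AmenRa/ranx/master/notebooks/data/{file}.test"
    response = requests.get(url)
    response.raise_for_status()  # Fail early instead of saving an error page

    # Keep the ".test" extension so the path matches the one loaded later
    with open(f"notebooks/data/{file}.test", "w") as f:
        f.write(response.text)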

In [None]:
from ranx import Qrels, Run, evaluate

In [None]:
qrels = Qrels.from_file("notebooks/data/qrels.test", kind="trec")
run = Run.from_file("notebooks/data/results.test", kind="trec")
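
To make `kind="trec"` more concrete, here is a minimal sketch of the whitespace-separated TREC layout: qrels lines are `<query_id> <iteration> <doc_id> <relevance>` and run lines are `<query_id> Q0 <doc_id> <rank> <score> <tag>`. The helpers `parse_trec_qrels` and `parse_trec_run`, the document ids, and the scores are made up for illustration; this is not how ranx parses files internally.

```python
from collections import defaultdict


def parse_trec_qrels(lines):
    """Hypothetical helper: <query_id> <iteration> <doc_id> <relevance>."""
    qrels = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines
        q_id, _, doc_id, relevance = line.split()
        qrels[q_id][doc_id] = int(relevance)
    return dict(qrels)


def parse_trec_run(lines):
    """Hypothetical helper: <query_id> Q0 <doc_id> <rank> <score> <tag>."""
    run = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # Skip blank lines
        q_id, _, doc_id, _, score, _ = line.split()
        run[q_id][doc_id] = float(score)
    return dict(run)


# Toy lines with made-up document ids and scores
toy_qrels = parse_trec_qrels(["301 0 doc_a 1", "301 0 doc_b 0"])
toy_run = parse_trec_run(["301 Q0 doc_b 1 0.9 demo", "301 Q0 doc_a 2 0.7 demo"])

print(toy_qrels)
print(toy_run)
```

The resulting nested dictionaries map each query id to its documents, with relevance judgments for the qrels and retrieval scores for the run.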

Evaluate

For a full list of the available metrics see [here](https://amenra.github.io/ranx/metrics/).

In [None]:
# Single metric
print(evaluate(qrels, run, "hits"))
print(evaluate(qrels, run, "hit_rate"))
print(evaluate(qrels, run, "precision"))
print(evaluate(qrels, run, "recall"))
print(evaluate(qrels, run, "f1"))
print(evaluate(qrels, run, "r-precision"))
print(evaluate(qrels, run, "mrr"))
print(evaluate(qrels, run, "map"))
print(evaluate(qrels, run, "ndcg"))

In [None]:
# Single metric with cutoff
evaluate(qrels, run, "ndcg@10")

In [None]:
# Multiple metrics
evaluate(qrels, run, ["map", "mrr", "ndcg"])

In [None]:
# Multiple metrics with cutoffs (you can use different cutoffs for each metric)
evaluate(qrels, run, ["map@100", "mrr@10", "ndcg@10"])
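
To illustrate what a cutoff such as `@10` means, the sketch below splits a metric string into a name and a cutoff, then computes a hand-rolled nDCG over only the top-k ranked documents of a single query. It assumes the common formulation where each gain is divided by `log2(rank + 1)`; `parse_metric`, `dcg`, `toy_ndcg`, and the toy data are hypothetical and do not reproduce ranx's internals.

```python
import math


def parse_metric(metric):
    """Hypothetical helper: split "ndcg@10" into ("ndcg", 10)."""
    name, _, cutoff = metric.partition("@")
    return name, int(cutoff) if cutoff else None


def dcg(gains):
    # Discounted cumulative gain: each gain is divided by log2(rank + 1)
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def toy_ndcg(qrels_q, run_q, k=None):
    # Rank documents by descending score and keep only the top k
    # (k=None keeps the whole ranking)
    ranked = sorted(run_q, key=run_q.get, reverse=True)[:k]
    gains = [qrels_q.get(doc_id, 0) for doc_id in ranked]
    # The ideal ranking places the most relevant documents first
    ideal = sorted(qrels_q.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Made-up judgments and scores for a single query
toy_qrels_q = {"doc_a": 1, "doc_b": 0, "doc_c": 1}
toy_run_q = {"doc_b": 0.9, "doc_a": 0.8, "doc_c": 0.1}

for metric in ["ndcg", "ndcg@2"]:
    name, k = parse_metric(metric)
    print(metric, toy_ndcg(toy_qrels_q, toy_run_q, k))
```

With the cutoff at 2, the relevant document ranked third no longer contributes, so the score is lower than the one computed on the full ranking.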

In [None]:
# By default, scores are saved in the evaluated Run
# You can disable this behaviour by passing `save_results_in_run=False`
# when calling `evaluate`
run.mean_scores

In [None]:
import json  # Just for pretty printing

print(json.dumps(run.scores, indent=4))

# 301, 302, and 303 are the query ids
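
The relation between per-query scores and mean scores can be sketched in plain Python. The snippet assumes a `metric -> query id -> score` layout and that each mean is the arithmetic mean over queries; the numbers in `toy_scores` are made up and are not the results of the evaluation above.

```python
from statistics import mean

# Made-up per-query scores, keyed by metric and then by query id
toy_scores = {
    "map@100": {"301": 0.5, "302": 0.25, "303": 0.75},
    "mrr@10": {"301": 1.0, "302": 0.5, "303": 0.0},
}

# Average each metric over its queries
toy_mean_scores = {
    metric: mean(list(per_query.values()))
    for metric, per_query in toy_scores.items()
}

print(toy_mean_scores)
```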

In [None]:
# Alternatively, per-query scores can be extracted as NumPy arrays by passing
# `return_mean=False` to `evaluate`
evaluate(qrels, run, ["map@100", "mrr@10", "ndcg@10"], return_mean=False)

In [None]:
# Finally, you can set the number of threads used to compute the metric
# scores by passing `threads=n` to `evaluate`
# By default `threads=0`, which means all the available threads are used
# Note that if the number of queries is small, `ranx` automatically sets
# `threads=1` to prevent performance degradation
evaluate(qrels, run, ["map@100", "mrr@10", "ndcg@10"], threads=1)