Rankings are a common system response in a variety of learning tasks like search, recommendation, and NLP. Developing evaluation metrics for rankings often involves meta-evaluation data consisting of rankings from a variety of systems for a shared set of system inputs. This repository simplifies gathering meta-evaluation data across several retrieval and recommendation domains.
Given a set of systems, for each request in the domain, we have:
- an incomplete set of item labels, and
- a set of top-k rankings, one per system.
All data are in TREC format, with relevance judgments (qrels) in the standard four-column trec_eval format,
`<query id> <subtopic id> <document id> <relevance grade>`
where the subtopic id is a 1-based identifier ("0" or "Q0" when there are no subtopic labels; outside of the TREC context, just use "0" for the second column). The relevance grade is ordinal, with values <= 0 indicating non-relevance. If a query has no documents with a relevance grade > 0, it is removed from the evaluation.
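For illustration, here is a minimal sketch of reading a qrels file in this format with plain Python; the file name is hypothetical and the whitespace-splitting assumption matches standard trec_eval qrels.

```python
from collections import defaultdict

def load_qrels(path):
    """Read a four-column qrels file into {query id: {document id: relevance grade}}."""
    qrels = defaultdict(dict)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            query_id, _subtopic, doc_id, grade = line.split()
            qrels[query_id][doc_id] = int(grade)
    # Queries with no document graded above 0 are dropped, as described above.
    return {q: docs for q, docs in qrels.items() if any(g > 0 for g in docs.values())}

# Illustrative file name only; not the repository's prescribed layout.
qrels = load_qrels("qrels/deep-pass-2019.txt")
```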
System runs are assumed to be in the standard six-column trec_eval format, with one system per file,
`<query id> <iteration> <document id> <rank> <score> [<run id>]`
where the iteration, rank, and run id are often ignored, with the rank re-computed from the score.
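A companion sketch for run files, again with a hypothetical file name: it keeps only the query id, document id, and score, then re-ranks each query's documents by descending score as described above.

```python
from collections import defaultdict

def load_run(path):
    """Read a trec_eval run file into {query id: [document ids, best score first]}."""
    scores = defaultdict(dict)
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            # The trailing run id, if present, is ignored.
            query_id, _iteration, doc_id, _rank, score = line.split()[:5]
            scores[query_id][doc_id] = float(score)
    # Ignore the stated rank and re-rank by descending score.
    return {q: sorted(docs, key=docs.get, reverse=True) for q, docs in scores.items()}

# Illustrative file name only; not the repository's prescribed layout.
run = load_run("runs/deep-pass-2019/system-a.txt")
```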
IMPORTANT: You must have a license to access NIST's TREC data. For more information see here.
Set the environment variables `TREC_RESULTS_USER` and `TREC_RESULTS_PASSWORD`, then run `make`. Data will be placed in the `qrels` and `runs` directories.
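Once the data are downloaded, a typical meta-evaluation pass scores every system's run against the domain's qrels. The sketch below reuses `load_qrels` and `load_run` from the snippets above and assumes an illustrative layout of one qrels file per domain and one run file per system (the actual file names under `qrels` and `runs` may differ); it reports mean precision at depth 10 per system.

```python
from pathlib import Path

K = 10  # evaluation depth

# Assumes load_qrels and load_run from the sketches above; paths are illustrative.
qrels = load_qrels("qrels/deep-pass-2019.txt")
for run_path in sorted(Path("runs/deep-pass-2019").glob("*.txt")):
    run = load_run(run_path)
    per_query = []
    for query_id, ranking in run.items():
        if query_id not in qrels:
            continue  # queries without relevant documents were removed
        judged = qrels[query_id]
        hits = sum(1 for doc_id in ranking[:K] if judged.get(doc_id, 0) > 0)
        per_query.append(hits / K)
    if per_query:
        print(f"{run_path.stem}\tP@{K} = {sum(per_query) / len(per_query):.4f}")
```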
| domain tag | requests | systems | rel/request | items/request | reference |
|---|---|---|---|---|---|
| legal/2006 | 39 | 34 | 110.85 | 4835.07 | paper, www |
| legal/2007 | 43 | 68 | 101.023 | 22240.30 | paper, www |
| core/2017 | 50 | 75 | 180.04 | 8853.11 | paper, www |
| core/2018 | 50 | 72 | 78.96 | 7102.61 | www |
| deep-docs/2019 | 43 | 38 | 153.42 | 623.77 | paper, www |
| deep-docs/2020 | 45 | 64 | 39.27 | 99.55 | paper, www |
| deep-docs/2021 | 57 | 66 | 189.63 | 98.83 | paper, www |
| deep-docs/2022 | 76 | 42 | 1245.62 | 98.86 | paper, www |
| deep-docs/2023 | 82 | 5 | 75.10 | 100 | paper, www |
| deep-pass/2019 | 43 | 37 | 95.40 | 892.51 | paper, www |
| deep-pass/2020 | 54 | 59 | 66.78 | 978.01 | paper, www |
| deep-pass/2021 | 53 | 63 | 191.96 | 99.95 | paper, www |
| deep-pass/2022 | 76 | 100 | 628.145 | 97.5 | paper, www |
| deep-pass/2023 | 82 | 35 | 49.87 | 99.90 | paper, www |
| web/2009 | 50 | 48 | 129.98 | 925.31 | paper, www |
| web/2010 | 48 | 32 | 187.63 | 7013.21 | paper, www |
| web/2011 | 50 | 61 | 167.56 | 8325.07 | paper, www |
| web/2012 | 50 | 48 | 187.36 | 6719.53 | paper, www |
| web/2013 | 50 | 61 | 182.42 | 7174.38 | paper, www |
| web/2014 | 50 | 30 | 212.58 | 6313.98 | paper, www |
| robust/2004 | 249 | 110 | 69.93 | 913.82 | paper, www |
| domain tag | requests | systems | rel/request | items/request | reference |
|---|---|---|---|---|---|
| movielens/2018 | 6005 | 21 | 18.87 | 100.00 | paper, www |
| libraryThing/2018 | 7227 | 21 | 13.15 | 100.00 | paper, www |
| beerAdvocate/2018 | 17564 | 21 | 13.66 | 99.39 | paper, www |