
What We Need To Provide


In order to execute an evaluation process, RRE requires three different inputs. This section details what they are and what role they play within the execution.

Corpus

We already described what a corpus is in the Domain Model section: it is the dataset/test collection consisting of representative data belonging to a given domain. One of the most important concerns about each corpus we configure in RRE is the representativeness of the dataset itself. This usually has a direct impact on its size, which should be neither too small nor too large.

In some functional scenarios, where the system manages different entity kinds (e.g. cars and smartphones), RRE allows you to provide more than one dataset.

Although a dataset is always provided as JSON files, the actual content depends on the target search platform: Apache Solr datasets use a plain JSON format (no JSON Update Commands!), while Elasticsearch uses the pseudo-JSON bulk format.
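
As a rough illustration, a Solr corpus could be a plain JSON array of documents, while an Elasticsearch corpus uses bulk-style action/document line pairs. The document fields below ("id", "title") are invented for the example, and the exact action line accepted by RRE may differ:

Solr (plain JSON):

[
  { "id": "1", "title": "Apple iPhone X" },
  { "id": "2", "title": "Samsung Galaxy S9" }
]

Elasticsearch (bulk format):

{ "index": { "_id": "1" } }
{ "title": "Apple iPhone X" }
{ "index": { "_id": "2" } }
{ "title": "Samsung Galaxy S9" }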

The file must be a .json file, or a compressed (.zip) file which contains the .json file with the actual data. This is useful when you have huge dataset files.

If RRE detects a zip file, it first uncompresses it into a temporary folder (the folder indicated by the "java.io.tmpdir" system property) and then proceeds with the evaluation.

(image: corpus)

Configuration Sets

RRE encourages a configuration-immutability approach: even for internal iterations, each time we make a relevant change to the current configuration it’s better to clone it and move forward with a new version.
In this way we’ll end up with the historical progression of our system, and RRE will be able to make comparisons between versions.
The actual content of the configuration sets depends on the target search platform. For Apache Solr, each version folder (see the image below) contains one or more Solr cores; for Elasticsearch, each version folder instead contains a JSON file (the "index-shape.json" in the picture below) holding the index settings & mappings.

(image: configuration_sets)
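
As an illustration of the Elasticsearch case, an "index-shape.json" typically holds the index settings and mappings in a single object. The fields and values below are invented, and the exact shape may vary with the Elasticsearch version in use:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "title": { "type": "text" }
    }
  }
}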

Ratings

The ratings files, which are provided in JSON format, are the core input of RRE. Each ratings file is a structured set of judgements (i.e. relevant documents for a given query). It’s not a plain list (e.g. q1=d1,d3,d9,d24) because it is structured on top of the composite RRE domain model.

In the ratings file we can define all the entities that compose the RRE domain model: corpus, topics, query groups, and queries. At the query group level we can list all documents which are relevant to the queries belonging to that group and, for each relevant document, we can express a judgement indicating how relevant that document is. If a document is in this list, it is relevant for the current query.

The only mandatory element is "queries": topics and query groups are optional, and if they are omitted RRE will create "unnamed" nodes which act as logical parents. This is useful when your evaluation model is simpler than the full nested model, for example when you want to declare only a set of query groups without dividing them by topic (a minimal sketch of this scenario is shown further below).

The current implementation uses a three-level judgement scale, although this will most probably be generalised in future versions:

  • 1 => marginally relevant
  • 2 => relevant
  • 3 => very relevant

Within the "relevant_documents" node, you can provide the judgements in one of the following (alternative) ways:

"relevant_documents": {
   "docid1": {
       "gain": 2
   }, 
   "docid2": {
       "gain": 2
   }, 
   "docid3": {
       "gain": 3
   }, 
   "docid5": {
       "gain": 2
   }, 
   "docid99": {
       "gain": 3
   }
}
"relevant_documents": {
   "2": ["docid1", "docid2", "docid5"],
   "3": ["docid3", "docid99"]   
}

A couple of things to note:

  • "gain" can also be replaced with "rating"
  • only a few metrics (e.g. AP or NDCG) use that "gain" value, because most of the currently available metrics (e.g. Precision, Recall, P@k) are only interested in a binary judgement (relevant or not).
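
For example, the first form above could be written with "rating" instead of "gain":

"relevant_documents": {
   "docid1": {
       "rating": 2
   },
   "docid3": {
       "rating": 3
   }
}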

(image: ratings)
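
As mentioned earlier, topics and query groups can be omitted. A minimal sketch of a ratings file declaring only query groups (all names, identifiers and judgements are invented, and "only_q.json" is a hypothetical template file) could look like this:

{
  "index": "products",
  "id_field": "id",
  "query_groups": [
    {
      "name": "smartphone searches",
      "queries": [
        {
          "template": "only_q.json",
          "placeholders": {
            "$query": "iphone"
          }
        }
      ],
      "relevant_documents": {
        "docid1": { "gain": 3 },
        "docid2": { "gain": 2 }
      }
    }
  ]
}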

A more complete example of a ratings file:

{
  "index": "<string>",
  "corpora_field": "<string>",
  "id_field": "<string>",
  "topics": [
    {
      "description": "<string>",
      "query_groups": [
        {
          "name": "<string>",
          "queries": [
            {
              "template": "<string>",
              "placeholders": {
                "$key": "<value>",
              }
            }
          ],
          "relevant_documents": [
            {
              "document_id": {
                "gain": "<number>"
              }
            }
          ]
        }
      ]
    }
  ],
  "query_groups": [
    {
      "name": "<string>",
      "queries": [
        {
          "template": "<string>",
          "placeholders": {
            "$key": "<value>",
          }
        }
      ],
      "relevant_documents": [
        {
          "document_id": {
            "gain": "<number>"
          }
        }
      ]
    }
  ],
  "queries": [
    {
      "template": "<string>",
      "placeholders": {
        "$key": "<value>",
      }
    }
  ]
}
  • index: the index name in Elasticsearch / the collection name in Solr
  • corpora_field: corresponds to the corpus file name; not required when targeting an external Solr/Elasticsearch instance
  • id_field: the field in the schema used as the document identifier
  • topics: optional list of topics and/or query groups
    • description: title of a topic, used in the reporting output
    • query_groups: list of queries grouped under the same name and topic
      • name: query group name used in the reporting
      • queries: list of queries to execute for the topic
        • template: name of the template this query uses
        • placeholders: object of key-value pairs to substitute in the template
      • relevant_documents: list of objects mapping documents to their gain (relevance) values
  • query_groups: optional list of objects with related queries; can also exist at the top level, outside of topics
  • queries: list of objects with the template and placeholder substitutions to evaluate; required if neither topics nor query_groups are present
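
As a complement to the minimal sketch shown earlier, a ratings file that also uses the topic level might look like the following; every value is invented for illustration, and relevant_documents uses the map form shown earlier:

{
  "index": "products",
  "corpora_field": "products.json",
  "id_field": "id",
  "topics": [
    {
      "description": "Smartphones",
      "query_groups": [
        {
          "name": "brand searches",
          "queries": [
            {
              "template": "only_q.json",
              "placeholders": {
                "$query": "samsung galaxy"
              }
            }
          ],
          "relevant_documents": {
            "docid7": { "gain": 3 },
            "docid9": { "gain": 2 }
          }
        }
      ]
    }
  ]
}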

Query Templates

For each query (or for each query group) it’s possible to define a query template, which is a kind of query shape containing one or more placeholders. In the ratings file you can then reference one of the defined templates and provide a value for each placeholder.
Templates have been introduced in order to:

  • allow common query management across search platforms
  • define complex queries
  • define runtime parameters that cannot be statically determined (e.g. filters)

In the picture below you can see three examples: the first two are Solr queries, while the third uses Elasticsearch.

(image: query_templates)
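
In textual form, a Solr template might be a map of query parameters containing a placeholder, while an Elasticsearch template is a query DSL body. The file names, fields and placeholder keys below are illustrative, and the exact template syntax expected by RRE may differ:

only_q.json (Solr, hypothetical):

{
  "q": "$query",
  "rows": "10"
}

match_title.json (Elasticsearch, hypothetical):

{
  "query": {
    "match": {
      "title": "$query"
    }
  }
}

A query in the ratings file then references the template by name and supplies a value for "$query" through its "placeholders" object.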

You may also create multiple version folders (v1.0, v1.1, v1.2, etc.) and keep different query template versions in those folders. This will run evaluations for each query template version and allows you to easily compare across query template changes.
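
A possible layout, assuming a hypothetical template file name and whatever templates folder your project is configured to use, could be:

templates/
  v1.0/
    only_q.json
  v1.1/
    only_q.json
  v1.2/
    only_q.json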