# Evaluation of a Pipeline and its Components

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/deepset-ai/haystack-tutorials/blob/main/tutorials/05_Evaluation.ipynb)

To make a statement about the quality of the results that a question-answering pipeline, or any other pipeline in Haystack, produces, you need to evaluate it. Evaluation also helps you determine which components of the pipeline can be improved.
The results of the evaluation can be saved as CSV files, which contain all the information needed to calculate additional metrics later on or to inspect individual predictions.

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')

To start, install the latest version of Haystack from its GitHub repository with `pip`:

In [2]:
%%bash
pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pip
  Downloading pip-22.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
Successfully installed pip-22.3
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting farm-haystack[colab]
  Cloning https://github.com/deepset-ai/haystack.git to /tmp/pip-install-yxknuqek/farm-haystack_7e9dca76c9f04eec9e50176c9a3b2c74
  Resolved https://github.com/deepset-ai/haystack.git to commit 3860bb9966d814a00612ec34890ad88126062d61
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started

  Running command git clone --filter=blob:none --quiet https://github.com/deepset-ai/haystack.git /tmp/pip-install-yxknuqek/farm-haystack_7e9dca76c9f04eec9e50176c9a3b2c74


## Logging

Before importing Haystack, we configure how log messages are displayed and which log level is used.
Example log message:
`INFO - haystack.utils.preprocessing -  Converting data/tutorial1/218_Olenna_Tyrell.txt`
The default log level in `basicConfig` is WARNING, so passing it explicitly is not necessary, but doing so makes it easy to change:

In [3]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)
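
To see what this format string produces without running Haystack, the sketch below sends a record through the same format into an in-memory buffer. The logger name `haystack.demo` and the `StringIO` handler are assumptions made for illustration only; they are not part of Haystack.

```python
import io
import logging

# Same format string as in the cell above.
LOG_FORMAT = "%(levelname)s - %(name)s -  %(message)s"

# Write to an in-memory buffer instead of stderr so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

# "haystack.demo" is a hypothetical child logger; it inherits its level from "haystack".
demo_logger = logging.getLogger("haystack.demo")
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo records out of the root handlers
logging.getLogger("haystack").setLevel(logging.INFO)

demo_logger.info("Converting data/tutorial1/218_Olenna_Tyrell.txt")
demo_logger.debug("This DEBUG record is filtered out at INFO level")

formatted = buffer.getvalue()
print(formatted, end="")
```

Only the INFO record reaches the buffer, because the `haystack` logger level set above applies to all of its child loggers.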

## Start an Elasticsearch server

You can start Elasticsearch on your local machine using Docker:

In [4]:
# Recommended: Start Elasticsearch using Docker via the Haystack utility function
from haystack.utils import launch_es

launch_es()



If Docker is not readily available in your environment (e.g., in Colab notebooks), you can manually download the Elasticsearch release archive and run it:

In [5]:
%%bash

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
chown -R daemon:daemon elasticsearch-7.9.2

In [6]:
%%bash --bg

sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch



Wait 30 seconds to make sure Elasticsearch is ready before continuing:

In [7]:
import time

time.sleep(30)
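
A fixed sleep can be too short on a slow machine and wasteful on a fast one. As an alternative sketch, the helper below polls an HTTP endpoint until it answers. `wait_for_http` is a hypothetical helper (not part of Haystack), and its parameters and defaults are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=60.0, interval_s=2.0, probe=None):
    """Return True once `url` answers, or False if `timeout_s` elapses first."""

    def default_probe(target):
        # A plain GET; any connection problem counts as "not ready yet".
        try:
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status == 200
        except (urllib.error.URLError, OSError):
            return False

    check = probe or default_probe
    deadline = time.monotonic() + timeout_s
    while True:
        if check(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Assuming Elasticsearch listens on its default port on the local machine, calling `wait_for_http("http://localhost:9200")` could take the place of `time.sleep(30)`. The optional `probe` argument exists only so the loop can be exercised without a running server.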

## Preprocess the Evaluation Dataset (Dev and Test)



In [11]:
# Paths to the evaluation data (JSON in SQuAD format); the files are expected to exist locally
# NOTE: both variables point to the same dev file here; set doc_test_dir to your test split before evaluating on test data
doc_dev_dir = "data/dev_ViQuAD.json"
doc_test_dir = "data/dev_ViQuAD.json"

In [13]:
import os

from haystack.document_stores import ElasticsearchDocumentStore


# Make sure these indices do not collide with existing ones; the indices will be wiped clean before data is inserted
doc_index = "orqa_docs"
label_index = "orqa_labels"

# Get the host where Elasticsearch is running, default to localhost
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")

# Connect to Elasticsearch
document_store = ElasticsearchDocumentStore(
    host=host,
    username="",
    password="",
    index=doc_index,
    label_index=label_index,
    embedding_field="emb",
    embedding_dim=768,
    excluded_meta_data=["emb"],
)

## Evaluate on the Dev Dataset

To evaluate on the test dataset instead, pass `doc_test_dir` as `filename` to `document_store.add_eval_data()`.
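
The PreProcessor settings used in the next cell (`split_by="word"`, `split_length=200`, `split_overlap=0`) cut each document into consecutive passages of at most 200 words. The sketch below imitates that idea in plain Python. `split_into_passages` is a hypothetical helper and whitespace tokenization is an assumption, so this illustrates the concept rather than Haystack's actual implementation.

```python
def split_into_passages(text, split_length=200, split_overlap=0):
    """Split `text` into passages of at most `split_length` whitespace-separated words."""
    if split_overlap >= split_length:
        raise ValueError("split_overlap must be smaller than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    return [" ".join(words[i : i + split_length]) for i in range(0, len(words), step)]


# A made-up 450-word document: two full passages plus a shorter remainder.
sample = " ".join(f"w{i}" for i in range(450))
passages = split_into_passages(sample, split_length=200, split_overlap=0)
print([len(p.split()) for p in passages])  # [200, 200, 50]
```

Shorter passages keep each retrieved unit focused, at the cost of occasionally splitting an answer's context across two passages.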

In [14]:
from haystack.nodes import PreProcessor

# Add evaluation data to Elasticsearch Document Store
# We first delete the custom tutorial indices to not have duplicate elements
# and also split our documents into shorter passages using the PreProcessor
preprocessor = PreProcessor(
    split_by="word",
    split_length=200,
    split_overlap=0,
    split_respect_sentence_boundary=False,
    clean_empty_lines=False,
    clean_whitespace=False,
)
document_store.delete_documents(index=doc_index)
document_store.delete_documents(index=label_index)

# The add_eval_data() method converts the given dataset in JSON format into Haystack document and label objects.
# Those objects are then indexed in their respective document and label index in the document store.
# The method can be used with any dataset in SQuAD format.
document_store.add_eval_data(
    filename=doc_dev_dir,
    doc_index=doc_index,
    label_index=label_index,
    preprocessor=preprocessor,
)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]
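
Since `add_eval_data()` expects a file in SQuAD format, it can help to see the shape of such a file. The snippet below builds a tiny, made-up example with the usual SQuAD fields (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`) and checks that each answer offset matches its context. The content and the file name `tiny_squad.json` are invented for illustration and are not taken from the dataset used in this notebook.

```python
import json
import os
import tempfile

context = "Haystack is an open source framework for building search systems."
answer_text = "an open source framework"

# Hypothetical one-question dataset following the SQuAD layout.
squad_like = {
    "version": "illustrative",
    "data": [
        {
            "title": "Haystack",
            "paragraphs": [
                {
                    "context": context,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is Haystack?",
                            "is_impossible": False,
                            "answers": [
                                {"text": answer_text, "answer_start": context.index(answer_text)}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

# Sanity check: every answer_start must point at the answer text inside its context.
for article in squad_like["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            for answer in qa["answers"]:
                start = answer["answer_start"]
                assert paragraph["context"][start : start + len(answer["text"])] == answer["text"]

# Write the file to a temporary directory so nothing in data/ is overwritten.
path = os.path.join(tempfile.mkdtemp(), "tiny_squad.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(squad_like, f, ensure_ascii=False, indent=2)
```

Running the same offset check on your own file before indexing is a cheap way to catch misaligned labels.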

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

Preprocessing:   0%|          | 0/1 [00:00<?, ?docs/s]

There were conversion errors for question ids: ['uit_01__00256_6_3', 'uit_01__00256_42_3', 'uit_01__00256_76_1', 'uit_01__00256_77_3', 'uit_01__00084_16_4', 'uit_01__00084_22_1', 'uit_01__00084_25_4', 'uit_01__00084_29_1', 'uit_01__00084_32_1', 'uit_01__05450_35_3', 'uit_01__01275_21_3', 'uit_01__05956_8_1', 'uit_01__05956_20_3', 'uit_01__05956_27_4', 'uit_01__05956_28_5', 'uit_01__05956_30_5', 'uit_01__05956_32_5', 'uit_01__03245_20_5', 'uit_01__03245_32_1', 'uit_01__03245_61_3', 'uit_01__03245_65_4']


## Initialize the Two Components of an ExtractiveQAPipeline: Retriever and Reader

In [15]:
# Initialize Retriever
from haystack.nodes import BM25Retriever, TfidfRetriever

retriever = TfidfRetriever(document_store=document_store)
# retriever = BM25Retriever(document_store=document_store)

INFO:haystack.nodes.retriever.sparse:Found 627 candidate paragraphs from 627 docs in DB


In [16]:
# Initialize Reader
from haystack.nodes import FARMReader, TransformersReader

reader = FARMReader(model_name_or_path="daotc2/xlmr-base-qa", top_k=5, return_no_answer=True, batch_size=8)

# Define a pipeline consisting of the initialized retriever and reader
from haystack.pipelines import ExtractiveQAPipeline

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


Downloading config.json:   0%|          | 0.00/709 [00:00<?, ?B/s]

INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'daotc2/xlmr-base-qa' (Roberta)


Downloading pytorch_model.bin:   0%|          | 0.00/1.03G [00:00<?, ?B/s]

INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'daotc2/xlmr-base-qa' (Roberta model) from model hub.


Downloading tokenizer_config.json:   0%|          | 0.00/357 [00:00<?, ?B/s]

Downloading tokenizer.json:   0%|          | 0.00/16.3M [00:00<?, ?B/s]

Downloading special_tokens_map.json:   0%|          | 0.00/280 [00:00<?, ?B/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


## Evaluation of an ExtractiveQAPipeline
Here we evaluate the retriever and the reader in an open-domain fashion on the full corpus of documents, i.e., a document is considered
correctly retrieved if it contains the gold answer string. The reader is evaluated purely on the
predicted answer string, regardless of which document it came from and the position of the extracted span.

The generation of predictions is separated from the calculation of metrics. This allows you to run the computation-heavy model predictions only once and then iterate flexibly on the metrics or reports you want to generate.
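
To make the open-domain criterion described above concrete, here is a minimal sketch in plain Python that does not call Haystack. The helper name `contains_gold_answer` and the case-insensitive substring comparison are assumptions made for illustration; Haystack's own matching may normalize text differently.

```python
from typing import List


def contains_gold_answer(document_text: str, gold_answers: List[str]) -> bool:
    """Return True if any gold answer string occurs in the document text.

    Hypothetical helper: case-insensitive substring matching is an
    assumption of this sketch, not necessarily Haystack's behavior.
    """
    text = document_text.lower()
    return any(answer.lower() in text for answer in gold_answers if answer)


# Toy data, invented for this example.
retrieved = [
    "Berlin is the capital of Germany.",
    "Hanoi is the capital of Vietnam.",
]
gold = ["Hanoi"]

# A retrieved document counts as a hit if it contains the gold answer string,
# no matter which document the label was originally annotated on.
hits = [contains_gold_answer(doc, gold) for doc in retrieved]
print(hits)
```

Under this criterion, the retriever is credited for any document that contains the answer text, which is why the evaluation can run over the full corpus rather than only over the annotated documents.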


In [None]:
from haystack.schema import EvaluationResult, MultiLabel


eval_labels = document_store.get_all_labels_aggregated(drop_negative_labels=True, drop_no_answers=True)
top_k = 1
eval_result = pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": top_k}})

# eval_results = [pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": k}}) for k in [1, 5, 10, 15, 20, 25, 30]]

Inferencing Samples: 100%|██████████| 1/1 [00:02<00:00,  2.93s/ Batches]

In [None]:
# The EvaluationResult contains a pandas dataframe for each pipeline node.
# That's why there are two dataframes in the EvaluationResult of an ExtractiveQAPipeline.

# retriever_result = eval_result["Retriever"]
# retriever_result.head()

In [None]:
# reader_result = eval_result["Reader"]
# reader_result.head()

In [None]:
# Save the evaluation result so that we can reload it later and calculate evaluation metrics without running the pipeline again.
# eval_result.save("../")

## Calculating Evaluation Metrics
Load an EvaluationResult to quickly calculate standard evaluation metrics for all predictions,
such as the F1 score of each individual prediction of the Reader node or the recall of the Retriever.
To learn more about the metrics, see [Evaluation Metrics](https://haystack.deepset.ai/guides/evaluation#metrics-retrieval).
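
As a rough illustration of what the two Reader metrics measure, the sketch below computes exact match and token-level F1 for a single prediction in plain Python. The function names and the simplified normalization (lowercasing and whitespace splitting) are assumptions of this example; the values reported by `calculate_metrics()` may rely on a different normalization.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the strings are equal after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between prediction and gold."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction contains the gold answer plus three extra tokens.
print(exact_match("the city of Hanoi", "Hanoi"))
print(token_f1("the city of Hanoi", "Hanoi"))
```

In the toy example, exact match is 0 because the strings differ, while F1 gives partial credit: precision is 1/4, recall is 1/1, so F1 is 0.4.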

In [None]:
# saved_eval_result = EvaluationResult.load("../")
def print_result(eval_result):
    """Print the Reader metrics computed from an EvaluationResult."""
    metrics = eval_result.calculate_metrics()
    # Uncomment the following lines to also print the Retriever metrics.
    # print(f'Retriever - Recall (single relevant document): {metrics["Retriever"]["recall_single_hit"]}')
    # print(f'Retriever - Recall (multiple relevant documents): {metrics["Retriever"]["recall_multi_hit"]}')
    # print(f'Retriever - Mean Reciprocal Rank: {metrics["Retriever"]["mrr"]}')
    # print(f'Retriever - Precision: {metrics["Retriever"]["precision"]}')
    # print(f'Retriever - Mean Average Precision: {metrics["Retriever"]["map"]}')
    print(f'Reader - F1-Score: {metrics["Reader"]["f1"]}')
    print(f'Reader - Exact Match: {metrics["Reader"]["exact_match"]}')
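
The commented-out Retriever lines above mention `recall_single_hit` and `mrr`. As a hedged sketch of what these two names conventionally stand for, the plain-Python example below computes them from per-query relevance flags. The function names and the toy input are hypothetical, and this is not Haystack's implementation.

```python
from typing import List


def recall_single_hit(relevance_per_query: List[List[bool]]) -> float:
    """Fraction of queries for which at least one retrieved document is relevant."""
    if not relevance_per_query:
        return 0.0
    hits = sum(any(flags) for flags in relevance_per_query)
    return hits / len(relevance_per_query)


def mean_reciprocal_rank(relevance_per_query: List[List[bool]]) -> float:
    """Average of 1/rank of the first relevant document (0 if none is retrieved)."""
    if not relevance_per_query:
        return 0.0
    total = 0.0
    for flags in relevance_per_query:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance_per_query)


# Toy input: one list per query, one flag per retrieved document in rank order.
ranked = [
    [False, True, False],   # first relevant document at rank 2
    [True, False, False],   # first relevant document at rank 1
    [False, False, False],  # no relevant document retrieved
]
print(recall_single_hit(ranked))
print(mean_reciprocal_rank(ranked))
```

For the toy input, two of the three queries have a hit, so recall is 2/3, and the reciprocal ranks 1/2, 1, and 0 average to 0.5.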

In [None]:
print_result(eval_result)

In [None]:
top_ks = [5, 10, 15, 20]

# Re-run the evaluation once per Retriever top_k and print the Reader metrics for each run.
eval_results = [pipeline.eval(labels=eval_labels, params={"Retriever": {"top_k": k}}) for k in top_ks]
for k, eval_res in zip(top_ks, eval_results):
    print(f"Retriever top {k}")
    print_result(eval_res)

## Generating an Evaluation Report
A summary of the evaluation results can be printed to get a quick overview. It includes some aggregated metrics and also shows a few wrongly predicted examples.

In [None]:
# pipeline.print_eval_report(saved_eval_result)


## About us

This [Haystack](https://github.com/deepset-ai/haystack/) notebook was made with love by [deepset](https://deepset.ai/) in Berlin, Germany.

We bring NLP to the industry via open source!  
Our focus: Industry-specific language models & large-scale QA systems.  
  
Some of our other work: 
- [German BERT](https://deepset.ai/german-bert)
- [GermanQuAD and GermanDPR](https://deepset.ai/germanquad)
- [FARM](https://github.com/deepset-ai/FARM)

Get in touch:
[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)

By the way: [we're hiring!](https://www.deepset.ai/jobs)