# Deepset Haystack Pipeline Tests
https://haystack.deepset.ai/tutorials/11_pipelines

In [2]:
%%bash

nvidia-smi

Mon Mar 13 14:44:34 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   55C    P0    25W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
%%bash

pip install --upgrade pip
pip install git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab]

apt install libgraphviz-dev
pip install pygraphviz

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pip
  Downloading pip-23.0.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 74.9 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.4
    Uninstalling pip-22.0.4:
      Successfully uninstalled pip-22.0.4
Successfully installed pip-23.0.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting farm-haystack[colab]
  Cloning https://github.com/deepset-ai/haystack.git to /tmp/pip-install-srxr0olo/farm-haystack_2a4f69465f024913bf88837e1378a585
  Resolved https://github.com/deepset-ai/haystack.git to commit fd3f3143d4010de4c2da84e12bfc8613c698be87
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to buil

DEPRECATION: git+https://github.com/deepset-ai/haystack.git#egg=farm-haystack[colab] contains an egg fragment with a non-PEP 508 name pip 25.0 will enforce this behaviour change. A possible replacement is to use the req @ url syntax, and remove the egg fragment. Discussion can be found at https://github.com/pypa/pip/issues/11617
  Running command git clone --filter=blob:none --quiet https://github.com/deepset-ai/haystack.git /tmp/pip-install-srxr0olo/farm-haystack_2a4f69465f024913bf88837e1378a585




In [4]:
%pip install farm-haystack

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/

# Logging

In [5]:
import logging

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.WARNING)
logging.getLogger("haystack").setLevel(logging.INFO)

## Start an Elasticsearch server

In [6]:
# Recommended: Start Elasticsearch using Docker via the Haystack utility function
from haystack.utils import launch_es

launch_es()

INFO:haystack.telemetry_2:Haystack sends anonymous usage data to understand the actual usage and steer dev efforts towards features that are most meaningful to users. You can opt-out at anytime by manually setting the environment variable HAYSTACK_TELEMETRY_ENABLED as described for different operating systems in the [documentation page](https://docs.haystack.deepset.ai/docs/telemetry#how-can-i-opt-out). More information at [Telemetry](https://docs.haystack.deepset.ai/docs/telemetry).


## Start an Elasticsearch server in Colab
Since Docker is not available in Colab, manually download the Elasticsearch release archive and run it from the extracted directory.

In [7]:
%%bash

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.9.2-linux-x86_64.tar.gz -q
tar -xzf elasticsearch-7.9.2-linux-x86_64.tar.gz
chown -R daemon:daemon elasticsearch-7.9.2

In [8]:
%%bash --bg

sudo -u daemon -- elasticsearch-7.9.2/bin/elasticsearch
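
Because the server is started in the background, nothing guarantees that it is accepting connections when the next cell runs. One option is to poll its HTTP endpoint rather than pause for a fixed time. The following is a minimal sketch using only the standard library; the helper name `wait_for_http`, its parameters, and the address `http://localhost:9200` (Elasticsearch's default HTTP port) are assumptions for illustration and are not part of Haystack.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout_s=60.0, interval_s=2.0, probe=None):
    """Poll `url` until it answers or `timeout_s` elapses; return True on success."""
    if probe is None:
        def probe(target):
            # Any HTTP 200 response is treated as "server is up" in this sketch.
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status == 200

    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe(url):
                return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, retry after a pause
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In this notebook, a call such as `wait_for_http("http://localhost:9200")` could stand in for a fixed `time.sleep`, returning as soon as the server responds or `False` if it never does.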

# Initialization
Let's fetch some data (in this case, pages from the Game of Thrones wiki) and prepare it so that it can be indexed into our DocumentStore.

In [9]:
from haystack.utils import fetch_archive_from_http, convert_files_to_docs, clean_wiki_text

# Download and prepare data - 517 Wikipedia articles for Game of Thrones
doc_dir = "data/tutorial11"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt11.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert the files to Document objects that can be indexed into our DocumentStore
got_docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)


INFO:haystack.utils.import_utils:Fetching from https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt11.zip to 'data/tutorial11'
INFO:haystack.utils.preprocessing:Converting data/tutorial11/331_Bran_Stark.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/20_Light_of_the_Seven.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/446_A_Golden_Crown.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/511_After_the_Thrones.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/371_Cersei_Lannister.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/504_List_of_A_Song_of_Ice_and_Fire_video_games.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/210_The_Mountain_and_the_Viper.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/57_The_Laws_of_Gods_and_Men.txt
INFO:haystack.utils.preprocessing:Converting data/tutorial11/408_The_Last_of_the_Starks.txt
INFO:hayst

Here we initialize the core components that we will be gluing together using the Pipeline class. We have a DocumentStore, a BM25Retriever, an EmbeddingRetriever and a FARMReader. These can be combined to create a classic Retriever-Reader pipeline that is designed to perform Open Domain Question Answering.

In [10]:
import os
import time

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import BM25Retriever, EmbeddingRetriever, FARMReader

# Wait 30 seconds only to be sure Elasticsearch is ready before continuing
time.sleep(30)

# Get the host where Elasticsearch is running, default to localhost
host = os.environ.get("ELASTICSEARCH_HOST", "localhost")

# Initialize DocumentStore and index documents
document_store = ElasticsearchDocumentStore(host=host)
document_store.delete_documents()
document_store.write_documents(got_docs)

# Initialize Sparse retriever
bm25_retriever = BM25Retriever(document_store=document_store)

# Initialize dense retriever
embedding_retriever = EmbeddingRetriever(
    document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
)
document_store.update_embeddings(embedding_retriever, update_existing_embeddings=False)

# Initialize reader
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")


INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


Downloading (…)ce_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

INFO:haystack.nodes.retriever.dense:Init retriever using embeddings of model sentence-transformers/multi-qa-mpnet-base-dot-v1

INFO:haystack.document_stores.search_engine:Updating embeddings for all 2357 docs without embeddings...


Updating embeddings:   0%|          | 0/2357 [00:00<?, ? Docs/s]

Batches:   0%|          | 0/74 [00:00<?, ?it/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


Downloading (…)lve/main/config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

INFO:haystack.modeling.model.language_model: * LOADING MODEL: 'deepset/roberta-base-squad2' (Roberta)


Downloading pytorch_model.bin:   0%|          | 0.00/496M [00:00<?, ?B/s]

INFO:haystack.modeling.model.language_model:Auto-detected model language: english
INFO:haystack.modeling.model.language_model:Loaded 'deepset/roberta-base-squad2' (Roberta model) from model hub.


Downloading (…)okenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1
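
The sparse retriever initialized above ranks documents with BM25, a term-frequency-based scoring function. As a rough illustration of what that scoring does, here is a self-contained sketch of Okapi BM25 over whitespace-tokenized text. It is not the Elasticsearch or Haystack implementation; the function name `bm25_scores`, the toy documents, and the parameter values `k1=1.2` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, made up for illustration.
docs = [
    "arya stark is the daughter of eddard stark",
    "the stark army reaches the twins",
    "joffrey baratheon is the crown prince",
]
scores = bm25_scores("arya stark father", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

Note that the query word "father" matches no document at all, which is the kind of vocabulary gap a dense retriever is meant to bridge.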


## Prebuilt Pipelines
Haystack features many prebuilt pipelines that cover common tasks. Here we have an ExtractiveQAPipeline (the successor to the now deprecated Finder class).

In [11]:
from haystack.pipelines import ExtractiveQAPipeline
from haystack.utils import print_answers

# Prebuilt pipeline
p_extractive_premade = ExtractiveQAPipeline(reader=reader, retriever=bm25_retriever)
res = p_extractive_premade.run(
    query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)
print_answers(res, details="minimum")


Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

'Query: Who is the father of Arya Stark?'
'Answers:'
[   {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. She travels '
                   "with her father, Eddard, to King's Landing when he is made "
                   'Hand of the King. Before she leaves,'},
    {   'answer': 'Ned',
        'context': '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's "
                   'half-brother Jon Snow gifts A'},
    {   'answer': 'Lord Eddard Stark',
        'context': 'ark daughters.\n'
                   'During the Tourney of the Hand to honour her father Lord '
                   'Eddard Stark, Sansa Stark is enchanted by the knights '
                   'performing in the event.'},
    {   'answer': 'Joffrey',
        'context': 'laying with one of his wooden toys.\n'
                   "After Eddard discovers t

If we only want to do the retrieval step, we can use a DocumentSearchPipeline.

In [12]:
from haystack.pipelines import DocumentSearchPipeline
from haystack.utils import print_documents

p_retrieval = DocumentSearchPipeline(bm25_retriever)
res = p_retrieval.run(query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}})
print_documents(res, max_text_len=200)


Query: Who is the father of Arya Stark?

{   'content': '\n'
               '===In the Riverlands===\n'
               'The Stark army reaches the Twins, a bridge stronghold '
               'controlled by Walder Frey, who agrees to allow the army to '
               'cross the river and to commit his troops in return for Robb '
               'an...',
    'name': '450_Baelor.txt'}

{   'content': '\n'
               '===On the Kingsroad===\n'
               'City Watchmen search the caravan for Gendry but are turned '
               'away by Yoren. Gendry tells Arya Stark that he knows she is a '
               'girl, and she reveals she is actually Arya Stark after ...',
    'name': '224_The_Night_Lands.txt'}

{   'content': '\n'
               "===''A Game of Thrones''===\n"
               'Sansa Stark begins the novel by being betrothed to Crown '
               'Prince Joffrey Baratheon, believing Joffrey to be a gallant '
               'prince. While Joffrey and Sansa are walki

Or if we want to use a Generator instead of a Reader, we can initialize a GenerativeQAPipeline like this:

In [13]:
from haystack.pipelines import GenerativeQAPipeline
from haystack.nodes import RAGenerator

# We set this to True so that the document store returns document embeddings with each document
# This is needed by the Generator
document_store.return_embedding = True

# Initialize generator
rag_generator = RAGenerator()

# Generative QA
p_generator = GenerativeQAPipeline(generator=rag_generator, retriever=embedding_retriever)
res = p_generator.run(query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}})
print_answers(res, details="minimum")

# We are setting this to False so that in later pipelines,
# we get a cleaner printout
document_store.return_embedding = False

INFO:haystack.modeling.utils:Using devices: CUDA:0 - Number of GPUs: 1


Downloading (…)lve/main/config.json:   0%|          | 0.00/4.60k [00:00<?, ?B/s]



Downloading (…)okenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

Downloading (…)_tokenizer/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'DPRQuestionEncoderTokenizerFast'.


Downloading (…)okenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

Downloading (…)tokenizer/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)tokenizer/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizer'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'RagTokenizer'. 
The class this function is called from is 'BartTokenizerFast'.


Downloading pytorch_model.bin:   0%|          | 0.00/2.06G [00:00<?, ?B/s]

Some weights of the model checkpoint at facebook/rag-token-nq were not used when initializing RagTokenForGeneration: ['rag.question_encoder.question_encoder.bert_model.pooler.dense.bias', 'rag.question_encoder.question_encoder.bert_model.pooler.dense.weight']
- This IS expected if you are initializing RagTokenForGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RagTokenForGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RagTokenForGeneration were not initialized from the model checkpoint at facebook/rag-token-nq and are newly initialized: ['rag.generator.lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictio

Batches:   0%|          | 0/1 [00:00<?, ?it/s]



'Query: Who is the father of Arya Stark?'
'Answers:'
[{'answer': ' robb stark'}, {'answer': ' rhaegar targaryen'}]


Haystack features prebuilt pipelines to do:

- just document search (DocumentSearchPipeline),
- document search with summarization (SearchSummarizationPipeline),
- generative QA (GenerativeQAPipeline),
- FAQ style QA (FAQPipeline),
- translated search (TranslationWrapperPipeline).

To find out more about these pipelines, have a look at our documentation.

With any Pipeline, whether prebuilt or custom constructed, you can save a diagram showing how all the components are connected.



In [14]:
p_extractive_premade.draw("pipeline_extractive_premade.png")
p_retrieval.draw("pipeline_retrieval.png")
p_generator.draw("pipeline_generator.png")

# Custom Pipelines
Now we are going to rebuild the ExtractiveQAPipeline using the generic Pipeline class. We do this by adding the building blocks that we initialized as nodes in the graph.

In [16]:
from haystack.pipelines import Pipeline


# Custom built extractive QA pipeline
p_extractive = Pipeline()
p_extractive.add_node(component=bm25_retriever, name="Retriever", inputs=["Query"])
p_extractive.add_node(component=reader, name="Reader", inputs=["Retriever"])

# Now we can run it
res = p_extractive.run(
    query="Who is the father of Arya Stark?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}}
)
print_answers(res, details="minimum")

p_extractive.draw("pipeline_extractive.png")

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

'Query: Who is the father of Arya Stark?'
'Answers:'
[   {   'answer': 'Eddard',
        'context': 's Nymeria after a legendary warrior queen. She travels '
                   "with her father, Eddard, to King's Landing when he is made "
                   'Hand of the King. Before she leaves,'},
    {   'answer': 'Ned',
        'context': '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's "
                   'half-brother Jon Snow gifts A'},
    {   'answer': 'Lord Eddard Stark',
        'context': 'ark daughters.\n'
                   'During the Tourney of the Hand to honour her father Lord '
                   'Eddard Stark, Sansa Stark is enchanted by the knights '
                   'performing in the event.'},
    {   'answer': 'Joffrey',
        'context': 'laying with one of his wooden toys.\n'
                   "After Eddard discovers t

Here we use a JoinDocuments node so that the predictions from each retriever can be merged together.

In [17]:
from haystack.nodes import JoinDocuments

# Create ensembled pipeline
p_ensemble = Pipeline()
p_ensemble.add_node(component=bm25_retriever, name="ESRetriever", inputs=["Query"])
p_ensemble.add_node(component=embedding_retriever, name="EmbeddingRetriever", inputs=["Query"])
p_ensemble.add_node(
    component=JoinDocuments(join_mode="concatenate"), name="JoinResults", inputs=["ESRetriever", "EmbeddingRetriever"]
)
p_ensemble.add_node(component=reader, name="Reader", inputs=["JoinResults"])

# Generate the pipeline image
p_ensemble.draw("pipeline_ensemble.png")

# Run pipeline
res = p_ensemble.run(
    query="Who is the father of Arya Stark?", params={"EmbeddingRetriever": {"top_k": 5}, "ESRetriever": {"top_k": 5}}
)
print_answers(res, details="minimum")

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

'Query: Who is the father of Arya Stark?'
'Answers:'
[   {   'answer': 'Ned',
        'context': '\n'
                   '====Season 1====\n'
                   'Arya accompanies her father Ned and her sister Sansa to '
                   "King's Landing. Before their departure, Arya's "
                   'half-brother Jon Snow gifts A'},
    {   'answer': 'Eddard and Catelyn Stark',
        'context': 'tark ===\n'
                   'Arya Stark is the third child and younger daughter of '
                   'Eddard and Catelyn Stark. She serves as a POV character '
                   "for 33 chapters throughout ''A "},
    {   'answer': 'Lord Eddard Stark',
        'context': 'ark daughters.\n'
                   'During the Tourney of the Hand to honour her father Lord '
                   'Eddard Stark, Sansa Stark is enchanted by the knights '
                   'performing in the event.'},
    {   'answer': 'Lord Eddard Stark',
        'context': "Game of Thrones'', Arya is the t
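
To make the merging step more concrete, here is a small sketch of what joining two retrievers' results by concatenation can look like. It is a plain-Python stand-in, not the JoinDocuments source: the function name `concatenate_results`, the dictionary layout, and the choices to keep the highest-scoring copy of a duplicate and to sort by score are assumptions for illustration. Scores from a sparse and a dense retriever are generally not on the same scale, so the final ordering here should be read as illustrative only.

```python
def concatenate_results(*result_lists):
    """Merge ranked lists of {'id', 'score'} dicts, keeping one entry per id."""
    best_by_id = {}
    for results in result_lists:
        for doc in results:
            current = best_by_id.get(doc["id"])
            # Keep the highest-scoring copy when several retrievers return the same document.
            if current is None or doc["score"] > current["score"]:
                best_by_id[doc["id"]] = doc
    return sorted(best_by_id.values(), key=lambda doc: doc["score"], reverse=True)


# Made-up hits from two hypothetical retrievers; document "b" is returned by both.
sparse_hits = [{"id": "a", "score": 0.9}, {"id": "b", "score": 0.4}]
dense_hits = [{"id": "b", "score": 0.7}, {"id": "c", "score": 0.6}]
merged = concatenate_results(sparse_hits, dense_hits)
print([doc["id"] for doc in merged])  # ['a', 'b', 'c']
```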

# Custom Nodes

In [19]:
from haystack import BaseComponent
from typing import Optional, List


class CustomNode(BaseComponent):
    # Number of outgoing edges; a value >1 is only needed for a decision node
    outgoing_edges = 1

    def run(self, query: str, my_optional_param: Optional[int] = None):
        # process the inputs
        output = {"my_output": ...}
        return output, "output_1"

    def run_batch(self, queries: List[str], my_optional_param: Optional[int] = None):
        # process the inputs
        output = {"my_output": ...}
        return output, "output_1"

# Decision Node

In [20]:
class CustomQueryClassifier(BaseComponent):
    outgoing_edges = 2

    def run(self, query: str):
        if "?" in query:
            return {}, "output_2"
        else:
            return {}, "output_1"

    def run_batch(self, queries: List[str]):
        split = {"output_1": {"queries": []}, "output_2": {"queries": []}}
        for query in queries:
            if "?" in query:
                split["output_2"]["queries"].append(query)
            else:
                split["output_1"]["queries"].append(query)

        return split, "split"


# Here we build the pipeline
p_classifier = Pipeline()
p_classifier.add_node(component=CustomQueryClassifier(), name="QueryClassifier", inputs=["Query"])
p_classifier.add_node(component=bm25_retriever, name="ESRetriever", inputs=["QueryClassifier.output_1"])
p_classifier.add_node(component=embedding_retriever, name="EmbeddingRetriever", inputs=["QueryClassifier.output_2"])
p_classifier.add_node(component=reader, name="QAReader", inputs=["ESRetriever", "EmbeddingRetriever"])
# Generate the pipeline image
p_classifier.draw("pipeline_classifier.png")

# Run only the dense retriever on the full sentence query
res_1 = p_classifier.run(query="Who is the father of Arya Stark?")
print("Embedding Retriever Results" + "\n" + "=" * 15)
print_answers(res_1)

# Run only the sparse retriever on a keyword based query
res_2 = p_classifier.run(query="Arya Stark father")
print("ES Results" + "\n" + "=" * 15)
print_answers(res_2)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Embedding Retriever Results
'Query: Who is the father of Arya Stark?'
'Answers:'
[   <Answer {'answer': 'Eddard and Catelyn Stark', 'type': 'extractive', 'score': 0.9058835506439209, 'context': "tark ===\nArya Stark is the third child and younger daughter of Eddard and Catelyn Stark. She serves as a POV character for 33 chapters throughout ''A ", 'offsets_in_document': [{'start': 74, 'end': 98}], 'offsets_in_context': [{'start': 63, 'end': 87}], 'document_ids': ['965789c741c68963042e85b6e7b89757'], 'meta': {'name': '30_List_of_A_Song_of_Ice_and_Fire_characters.txt'}}>,
    <Answer {'answer': 'Lord Eddard Stark', 'type': 'extractive', 'score': 0.8793494701385498, 'context': "Game of Thrones'', Arya is the third child and younger daughter of Lord Eddard Stark and his wife Lady Catelyn Stark.  She is tomboyish, headstrong, f", 'offsets_in_document': [{'start': 419, 'end': 436}], 'offsets_in_context': [{'start': 67, 'end': 84}], 'document_ids': ['2ee56bdd46dfd30b23f91bcc046456a4'], 'meta':

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

ES Results
'Query: Arya Stark father'
'Answers:'
[   <Answer {'answer': 'Eddard', 'type': 'extractive', 'score': 0.9085894823074341, 'context': "s Nymeria after a legendary warrior queen. She travels with her father, Eddard, to King's Landing when he is made Hand of the King. Before she leaves,", 'offsets_in_document': [{'start': 147, 'end': 153}], 'offsets_in_context': [{'start': 72, 'end': 78}], 'document_ids': ['ba2a8e87ddd95e380bec55983ee7d55f'], 'meta': {'name': '43_Arya_Stark.txt'}}>,
    <Answer {'answer': 'Ned', 'type': 'extractive', 'score': 0.7877876162528992, 'context': "\n====Season 1====\nArya accompanies her father Ned and her sister Sansa to King's Landing. Before their departure, Arya's half-brother Jon Snow gifts A", 'offsets_in_document': [{'start': 46, 'end': 49}], 'offsets_in_context': [{'start': 46, 'end': 49}], 'document_ids': ['180c2a6b36369712b361a80842e79356'], 'meta': {'name': '43_Arya_Stark.txt'}}>,
    <Answer {'answer': 'Lord Eddard Stark', 'type': 'extract
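
The routing behaviour seen in the two runs above can be sketched without Haystack. The toy class below is a hypothetical stand-in (`MiniPipeline` and the four component functions are invented for this example) and is not how Haystack's Pipeline is implemented. It assumes nodes are added in dependency order and that each component is a callable returning an `(output, edge_name)` pair, mirroring the `run` contract used by the custom nodes.

```python
class MiniPipeline:
    """Toy node graph with named outgoing edges; not Haystack's Pipeline."""

    def __init__(self):
        self.nodes = {}  # name -> (component, input specs), kept in insertion order

    def add_node(self, component, name, inputs):
        self.nodes[name] = (component, inputs)

    def run(self, query):
        # Edges that have fired so far, mapped to the data they carry.
        fired = {"Query": {"query": query}}
        visited = []
        for name, (component, inputs) in self.nodes.items():
            payloads = [fired[spec] for spec in inputs if spec in fired]
            if not payloads:
                continue  # no incoming edge fired, so this branch is skipped
            data = {}
            for payload in payloads:
                data.update(payload)
            output, edge = component(data)
            merged = {**data, **output}
            fired[name] = merged              # plain reference, e.g. inputs=["ESRetriever"]
            fired[f"{name}.{edge}"] = merged  # explicit edge, e.g. "QueryClassifier.output_2"
            visited.append(name)
        return visited, fired[visited[-1]]


def classify(data):
    # Questions go to output_2, keyword queries to output_1.
    return {}, "output_2" if "?" in data["query"] else "output_1"


def keyword_retriever(data):
    return {"retrieved_by": "keyword"}, "output_1"


def dense_retriever(data):
    return {"retrieved_by": "dense"}, "output_1"


def reader(data):
    return {"answer": f"answer via {data['retrieved_by']} retrieval"}, "output_1"


pipeline = MiniPipeline()
pipeline.add_node(classify, "QueryClassifier", ["Query"])
pipeline.add_node(keyword_retriever, "ESRetriever", ["QueryClassifier.output_1"])
pipeline.add_node(dense_retriever, "EmbeddingRetriever", ["QueryClassifier.output_2"])
pipeline.add_node(reader, "QAReader", ["ESRetriever", "EmbeddingRetriever"])

visited, result = pipeline.run("Who is the father of Arya Stark?")
print(visited)  # ['QueryClassifier', 'EmbeddingRetriever', 'QAReader']
```

Only one of the two retrievers runs per query, and the reader accepts input from whichever branch fired.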

## Debug Nodes

In [21]:
# 1) You can set the `debug` attribute of a given node.
bm25_retriever.debug = True

# 2) You can provide `debug` as a parameter when running your pipeline
result = p_classifier.run(query="Who is the father of Arya Stark?", params={"ESRetriever": {"debug": True}})

# 3) You can provide the `debug` parameter to all nodes in your pipeline
result = p_classifier.run(query="Who is the father of Arya Stark?", params={"debug": True})

result["_debug"]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]

{'Query': {'input': {'debug': True}, 'output': {}, 'exec_time_ms': 0.1},
 'QueryClassifier': {'input': {'query': 'Who is the father of Arya Stark?',
   'debug': True},
  'output': {},
  'exec_time_ms': 0.1},
 'EmbeddingRetriever': {'input': {'root_node': 'Query',
   'query': 'Who is the father of Arya Stark?',
   'debug': True},
  'output': {'documents': [<Document: {'content': '\n=== Background ===\nArya is the third child and younger daughter of Eddard and Catelyn Stark and is nine years old at the beginning of the book series.  She has five siblings: an older brother Robb, an older sister Sansa, two younger brothers Bran and Rickon, and an older illegitimate half-brother, Jon Snow.', 'content_type': 'text', 'score': 0.5609863333292572, 'meta': {'name': '43_Arya_Stark.txt'}, 'id_hash_keys': ['content'], 'embedding': None, 'id': 'd7a98cb66f592540fa7de20bf46a5e64'}>,
    <Document: {'content': "\n===Arya Stark===\n'''Arya Stark''' portrayed by Maisie Williams. Arya Stark of House Stark
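
The `_debug` dictionary is keyed by node name, and the entries shown above carry an `exec_time_ms` field. A small helper can turn that into a quick timing overview. This is a sketch under assumptions: the helper name `slowest_nodes` is invented, it assumes every node entry has the same shape as the ones printed above, and the sample timings for the retriever and reader are made-up illustration values rather than measurements.

```python
def slowest_nodes(debug_info):
    """Return (node_name, exec_time_ms) pairs, slowest first."""
    timings = [
        (name, entry["exec_time_ms"])
        for name, entry in debug_info.items()
        if "exec_time_ms" in entry
    ]
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


# Hand-written sample shaped like the printout above; the larger timings
# are invented for illustration and are not measured results.
sample_debug = {
    "Query": {"input": {"debug": True}, "output": {}, "exec_time_ms": 0.1},
    "QueryClassifier": {"input": {"debug": True}, "output": {}, "exec_time_ms": 0.1},
    "EmbeddingRetriever": {"input": {"debug": True}, "output": {}, "exec_time_ms": 42.0},
    "QAReader": {"input": {"debug": True}, "output": {}, "exec_time_ms": 310.5},
}
for name, ms in slowest_nodes(sample_debug):
    print(f"{name}: {ms} ms")
```

With a real pipeline result, the call would be `slowest_nodes(result["_debug"])`.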

## YAML Configs
A full Pipeline can be defined in a YAML file and simply loaded. Having your pipeline available in a YAML is particularly useful when you move between experimentation and production environments. Just export the YAML from your notebook / IDE and import it into your production environment. It also helps with version control of pipelines, allows you to share your pipeline easily with colleagues, and simplifies the configuration of pipeline parameters in production.

The file consists of two main sections: you define all objects (e.g. a reader) in the components section and then connect them into a pipeline in the pipelines section. One component can also serve as several nodes of a pipeline, or as a node in several pipelines. It is loaded into memory only once, so reusing it does not consume additional resources.

The contents of a YAML file should look something like this:

version: '0.7'
components:    # define all the building-blocks for Pipeline
- name: MyReader       # custom-name for the component; helpful for visualization & debugging
  type: FARMReader    # Haystack Class name for the component
  params:
    no_ans_boost: -10
    model_name_or_path: deepset/roberta-base-squad2
- name: MyESRetriever
  type: BM25Retriever
  params:
    document_store: MyDocumentStore    # params can reference other components defined in the YAML
    custom_query: null
- name: MyDocumentStore
  type: ElasticsearchDocumentStore
  params:
    index: haystack_test
pipelines:    # multiple Pipelines can be defined using the components from above
- name: my_query_pipeline    # a simple extractive-qa Pipeline
  nodes:
  - name: MyESRetriever
    inputs: [Query]
  - name: MyReader
    inputs: [MyESRetriever]

To load, simply call:

from pathlib import Path

yaml_pipeline = Pipeline.load_from_yaml(Path("sample.yaml"))
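
Before loading a config, it can help to check that every node in the pipelines section refers to a defined component and that each input names either the root or an earlier node. The sketch below runs on a plain dictionary that mirrors the YAML structure above, so it needs no YAML parser. The function name `check_pipeline_config` is hypothetical, and the assumption that `Query` is the only root node applies to query pipelines like the one shown; Haystack performs its own validation when loading.

```python
def check_pipeline_config(config):
    """Return a list of problems found in a pipeline config dict (empty if none)."""
    component_names = {component["name"] for component in config.get("components", [])}
    problems = []
    for pipeline in config.get("pipelines", []):
        seen = {"Query"}  # root node assumed for query pipelines
        for node in pipeline["nodes"]:
            if node["name"] not in component_names:
                problems.append(
                    f"{pipeline['name']}: node '{node['name']}' is not a defined component"
                )
            for source in node["inputs"]:
                # Strip an optional edge suffix such as ".output_1".
                if source.split(".")[0] not in seen:
                    problems.append(
                        f"{pipeline['name']}: input '{source}' of '{node['name']}' is not an earlier node"
                    )
            seen.add(node["name"])
    return problems


# Dictionary equivalent of the YAML example above.
config = {
    "components": [
        {"name": "MyReader", "type": "FARMReader"},
        {"name": "MyESRetriever", "type": "BM25Retriever"},
        {"name": "MyDocumentStore", "type": "ElasticsearchDocumentStore"},
    ],
    "pipelines": [
        {
            "name": "my_query_pipeline",
            "nodes": [
                {"name": "MyESRetriever", "inputs": ["Query"]},
                {"name": "MyReader", "inputs": ["MyESRetriever"]},
            ],
        }
    ],
}
print(check_pipeline_config(config))  # []
```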