Return intermediate nodes output in pipelines #1558

ZanSara · 2021-10-05T08:01:26Z

Related to #1193

Proposed changes:
These changes let nodes record debug information during execution. This is done by managing one extra key in the output dictionary, called _debug.

By default, the collected data includes the input, the output and the logs produced by the nodes (all nodes or only some of them, depending on the configuration). In addition, a node can add its own debug information under _debug, and that information is preserved. The content of each node's _debug entry appears in the final response, grouped by the node that produced it (see the example output below).

Note that the content of _debug is generally passed from node to node. To avoid infinite recursion, however, it is removed from the output that is stored in the _debug key itself (see the example output below).
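A minimal sketch of how such a merge could work, not the code from this PR. The helper name attach_debug is hypothetical, and so is the assumption that a node returns its own debug information as a flat dict under "_debug" while the entries from upstream nodes arrive through its input:

```python
from typing import Any, Dict


def attach_debug(node_name: str, node_input: Dict[str, Any], node_output: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: record one node's input/output under _debug[node_name]."""
    # Assumption: a node's own debug info arrives as a flat dict under "_debug" in its output.
    entry = dict(node_output.get("_debug", {}))
    # Store copies without "_debug", so the debug tree never contains itself.
    entry["input"] = {k: v for k, v in node_input.items() if k != "_debug"}
    entry["output"] = {k: v for k, v in node_output.items() if k != "_debug"}
    # Carry forward the entries collected from upstream nodes, then add this node's entry.
    all_debug = dict(node_input.get("_debug", {}))
    all_debug[node_name] = entry
    return {**node_output, "_debug": all_debug}


retriever_out = attach_debug("ESRetriever", {"query": "Who lives in Berlin?", "top_k": 10}, {"documents": []})
reader_out = attach_debug("Reader", {**retriever_out, "top_k": 3}, {"answers": []})
print(sorted(reader_out["_debug"]))  # ['ESRetriever', 'Reader']
```

Because every stored input and output is filtered, the final dictionary stays a tree and can be serialized without hitting a circular reference.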

Details:

Modifies BaseComponent.run() so that every node accepts debug and debug_logs as parameters and, if they are present, saves them as attributes in the instance state. This lets users set these values through pipeline.run(params={'node_name': {'debug': True}}).
Modifies pipeline.run() to accept debug and debug_logs as arguments and to apply them to each node's parameters, overriding whatever was set in params (see the example below).
Modifies BaseComponent._dispatch_run() to handle the content of the _debug key properly.
For log collection, introduces an implicit decorator on BaseComponent.run(): if the current object has the attribute debug set to True, the decorator records the debug logs produced while that node executes and pushes them into the node's _debug entry. If debug_logs is also set to True, these logs are printed to the console as well.
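The log collection described in the last point can be sketched with the standard logging module alone. This is an illustration under stated assumptions, not the PR's implementation (which uses its own InMemoryLogger): the names ListHandler, record_debug_logs and EchoNode are hypothetical, and temporarily attaching a handler to the root logger is an assumed design choice.

```python
import functools
import logging
from typing import Any, Callable, Dict, List


class ListHandler(logging.Handler):
    """Hypothetical in-memory handler that keeps every formatted record in a list."""

    def __init__(self) -> None:
        super().__init__(level=logging.DEBUG)
        self.records: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(self.format(record))


def record_debug_logs(run: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Capture the logs emitted while run() executes, if the node has debug=True."""

    @functools.wraps(run)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Dict[str, Any]:
        if not getattr(self, "debug", False):
            return run(self, *args, **kwargs)
        root = logging.getLogger()
        handler = ListHandler()
        previous_level = root.level
        root.addHandler(handler)
        root.setLevel(logging.DEBUG)  # let DEBUG records reach the handler during run()
        try:
            output = run(self, *args, **kwargs)
        finally:
            # Always restore the logging setup, even if the node raises.
            root.removeHandler(handler)
            root.setLevel(previous_level)
        # Push the captured lines into the node's own debug info.
        output.setdefault("_debug", {})["logs"] = handler.records
        if getattr(self, "debug_logs", False):
            for line in handler.records:
                print(line)  # echo to the console only when debug_logs is also True
        return output

    return wrapper


class EchoNode:
    """Hypothetical node used only to exercise the decorator."""

    debug = True
    debug_logs = False

    @record_debug_logs
    def run(self, query: str) -> Dict[str, Any]:
        logging.getLogger("echo_node").debug("Received query: %s", query)
        return {"query": query}


print(EchoNode().run(query="Who lives in Berlin?")["_debug"]["logs"])
# ['Received query: Who lives in Berlin?']
```

Lowering the root level also lets any other handlers already attached to the root logger see DEBUG records while the node runs; a real implementation may prefer a narrower scope.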

Example code:

from haystack.document_store import ElasticsearchDocumentStore
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.reader import FARMReader
from haystack.pipeline import Pipeline

def main():

    document_store_with_docs = ElasticsearchDocumentStore()
    es_retriever = ElasticsearchRetriever(document_store=document_store_with_docs)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

    pipeline = Pipeline()
    pipeline.add_node(component=es_retriever, name="ESRetriever", inputs=["Query"])
    pipeline.add_node(component=reader, name="Reader", inputs=["ESRetriever"])

    prediction = pipeline.run(
        query="Who lives in Berlin?", 
        params={
            # New API: `debug` and `debug_logs` can be passed to single nodes as parameters
            # Note that subclasses of `BaseComponent` don't need to explicitly support them for this to work
            "ESRetriever": {"top_k": 10, "debug": True, "debug_logs": True},
            "Reader": {"top_k": 3}
        },
        # New API: the debug parameters can also be passed to `run()` directly
        # They will override any node-specific setting
        debug=True,
        debug_logs=True,
    )

    # Note: json.dumps raises an error on circular references, so printing as JSON helps detect them (`pprint` would silently handle them)
    print("############### DEBUG LOGS #####################")
    import json
    print(json.dumps(prediction, default=str, indent=4), "\n")
    print("################################################")

if __name__ == "__main__":
    main()
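The comments in the example say that the debug arguments of run() override any node-specific setting. A minimal sketch of that merge, assuming params is a dict of per-node dicts; apply_global_debug is a hypothetical helper, not the PR's code:

```python
from typing import Any, Dict, Iterable, Optional


def apply_global_debug(
    params: Dict[str, Dict[str, Any]],
    node_names: Iterable[str],
    debug: Optional[bool] = None,
    debug_logs: Optional[bool] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: copy run()-level debug flags into every node's params."""
    merged = {}
    for name in node_names:
        node_params = dict(params.get(name, {}))  # never mutate the caller's dict
        # None means "not given": only an explicit True or False overrides the node setting.
        if debug is not None:
            node_params["debug"] = debug
        if debug_logs is not None:
            node_params["debug_logs"] = debug_logs
        merged[name] = node_params
    return merged


params = {"ESRetriever": {"top_k": 10, "debug": True}, "Reader": {"top_k": 3}}
print(apply_global_debug(params, ["ESRetriever", "Reader"], debug=False))
# {'ESRetriever': {'top_k': 10, 'debug': False}, 'Reader': {'top_k': 3, 'debug': False}}
print(apply_global_debug(params, ["ESRetriever", "Reader"]))
# {'ESRetriever': {'top_k': 10, 'debug': True}, 'Reader': {'top_k': 3}}
```

Defaulting the global arguments to None rather than False is what lets an omitted argument leave node settings untouched while an explicit debug=False still overrides them.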

Example output:

{
    "answers": [],
    "_debug": {
        "ESRetriever": {
            "logs": [
                "Retriever query: {'size': '10', 'query': {'bool...",
                "POST http://localhost:9200/document/_se...",
                "> {\"size\":\"10\",\"query\":{\"bool\":{\"shou...",
                "< {\"took\":3,\"timed_out\":false,\"_shards\"...",
                "Retrieved documents with IDs: [67341323..."
            ],
            "input": {
                "root_node": "Query",
                "query": "Who lives in Berlin?",
                "top_k": 10,
                "debug": true
            },
            "output": {
                "documents": []
            }
        },
        "Reader": {
            "logs": [],
            "input": {
                "documents": [],
                "query": "Who lives in Berlin?",
                "top_k": 3,
                "debug": true
            },
            "output": {
                "answers": []
            }
        }
    },
    "documents": [],
    "root_node": "Query",
    "params": {
        "ESRetriever": {
            "top_k": 10
        },
        "Reader": {
            "top_k": 3
        },
        "Query": {
            "debug": true
        }
    },
    "query": "Who lives in Berlin?",
    "node_id": "Reader"
}
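To consume this structure programmatically, one can walk the _debug dictionary node by node. A minimal sketch; summarize_debug is a hypothetical helper, and the sample dict is a trimmed copy of the example output above with the log lines shortened:

```python
from typing import Any, Dict


def summarize_debug(prediction: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: condense each node's _debug entry into a short summary."""
    summary = {}
    for node_name, info in prediction.get("_debug", {}).items():
        summary[node_name] = {
            "n_logs": len(info.get("logs", [])),
            "input_keys": sorted(info.get("input", {})),
            "output_keys": sorted(info.get("output", {})),
        }
    return summary


# A trimmed copy of the example output above (log lines shortened).
prediction = {
    "answers": [],
    "_debug": {
        "ESRetriever": {
            "logs": ["Retriever query: ...", "Retrieved documents with IDs: ..."],
            "input": {"root_node": "Query", "query": "Who lives in Berlin?", "top_k": 10, "debug": True},
            "output": {"documents": []},
        },
        "Reader": {
            "logs": [],
            "input": {"documents": [], "query": "Who lives in Berlin?", "top_k": 3, "debug": True},
            "output": {"answers": []},
        },
    },
}
for node, info in summarize_debug(prediction).items():
    print(node, info)
```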

Status (please check what you already did):

First draft (up for discussions & feedback)
Final code
Added tests
Updated documentation


tholor

Already looking very good. Left three minor comments

haystack/pipeline.py

haystack/schema.py

haystack/__init__.py


tholor

LGTM


ZanSara · 2021-10-07T17:13:43Z

Hey, I'll put InMemoryLogger back in schema.py for now, because moving it to utils.py causes a circular import issue :/


ZanSara added 13 commits September 30, 2021 15:31

Add rest api endpoint to delete documents by filter.

db38a3e

Remove parametrization of rest api test to see if they solve the CI issue (they now run locally)

5398e60

Make the paths in rest_api/config.py absolute

e67e5c7

Fix path to pipelines.yaml

e1903ce

Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)

bbd05b2

Convert DELETE /documents into POST /documents/delete_by_filters

681a803

Merge branch 'master' into debug_pipelines

09bfef0

First rough implementation

a6729c1

Merge branch 'master' into debug_pipelines

1c3b9cc

Add a flag to dump the debug logs to the console as well

b940693

Add type to the debug dictionary

4468a45

Typing run() and _dispatch_run() to please mypy

a714ad4

Mypy requires more types

42d6724

ZanSara linked an issue Oct 5, 2021 that may be closed by this pull request

Return Intermediate Node Output #1193

Closed

ZanSara self-assigned this Oct 5, 2021

ZanSara requested a review from brandenchan October 5, 2021 08:48

ZanSara added 4 commits October 5, 2021 11:08

Clarify docstrings a bit

75f80f5

Allow enable_debug and console_debug to be passed as arguments of run()

736d3fd

Avoid overwriting _debug, later we might want to store other objects in it

be670b9

Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it

36f6b55

brandenchan reviewed Oct 5, 2021

haystack/pipeline.py (outdated)

ZanSara added 7 commits October 6, 2021 12:25

Introduce global arguments for pipeline.run() that get applied to every node when defined

ce1aaf4

Change default values of debug variables to None, otherwise their default would override the params values

08092b3

Remove unused import

da9d586

more typing for mypy

0c77b29

Remove a potential infinite recursion on the overridden __getattr__

6f244e6

Add a simple test for the debug attributes

24ceba8

Add test and fix small issue on global debug=False not overriding node specific settings

53a582e
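The commit "Remove a potential infinite recursion on the overridden __getattr__" does not show the code involved. As a generic illustration only, here is how such recursion typically arises in Python and one common guard; the classes NaiveNode and GuardedNode and their params dict are hypothetical and not taken from the PR:

```python
from typing import Any, Dict


class NaiveNode:
    """Hypothetical: forwards unknown attributes to a params dict, without a guard."""

    def __init__(self) -> None:
        self.params: Dict[str, Any] = {}

    def __getattr__(self, name: str) -> Any:
        # If "params" was never set, self.params calls __getattr__("params") again, forever.
        return self.params.get(name)


class GuardedNode:
    """Hypothetical: same idea, but reads params straight from the instance __dict__."""

    def __init__(self, **params: Any) -> None:
        self.params: Dict[str, Any] = params

    def __getattr__(self, name: str) -> Any:
        params = self.__dict__.get("params")  # plain dict lookup, cannot re-enter __getattr__
        if params is not None and name in params:
            return params[name]
        raise AttributeError(name)  # keeps getattr(obj, name, default) and hasattr() working


print(GuardedNode(debug=True).debug)  # True
print(getattr(GuardedNode.__new__(GuardedNode), "debug", None))  # None
try:
    NaiveNode.__new__(NaiveNode).debug  # instance created without running __init__
except RecursionError:
    print("NaiveNode recursed")
```

__getattr__ only runs when normal lookup fails, so any attribute it reads through self must be guaranteed to exist, or be fetched in a way that cannot fail back into it.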

ZanSara changed the title ~~WIP Return intermediate nodes output in pipelines~~ Return intermediate nodes output in pipelines Oct 6, 2021

Do not append the output of the last node in the _debug key, it causes infinite recursion

97fe6b9

ZanSara added 5 commits October 6, 2021 16:37

Fix tests

bb6b63d

Removed recursion between _debug and output and fixed tests

8c61e8c

Apparently node_input can be None :/

96c1418

Move the input/output collection into _dispatch_run to gather only relevant info

a03eee7

Minor cleanup

bfcea46

tholor requested changes Oct 7, 2021

haystack/pipeline.py

haystack/schema.py

haystack/__init__.py

ZanSara and others added 6 commits October 7, 2021 18:19

Add partial Pipeline.run() docstring

4b0a28c

Move InMemoryLogger into utils.py

e6503a9

Add latest docstring and tutorial changes

443146b

Add io import to utils.py

9308faa

Merge branch 'debug_pipelines' of github.com:deepset-ai/haystack into debug_pipelines

8f17063

Update docstring

a5ca35b

tholor approved these changes Oct 7, 2021

github-actions bot and others added 2 commits October 7, 2021 16:50

Add latest docstring and tutorial changes

c1c7d6b

Revert "Move InMemoryLogger into utils.py"

d893cf7

This reverts commit e6503a9.

ZanSara and others added 2 commits October 7, 2021 19:14

Merge branch 'debug_pipelines' of github.com:deepset-ai/haystack into debug_pipelines

d18a630

Add latest docstring and tutorial changes

66c771b

ZanSara merged commit 54947cb into master Oct 7, 2021

ZanSara deleted the debug_pipelines branch October 7, 2021 20:13

tholor mentioned this pull request Oct 12, 2021

Add debug and debug_logs params to standard pipelines #1586

Merged

ZanSara mentioned this pull request Oct 13, 2021

Same kind of answers are not generating from similar kind of documents #1308

Closed

tholor added the topic:pipeline label Dec 3, 2021

julian-risch mentioned this pull request Mar 10, 2022

Reintroduce debug as a valid global key for Pipeline's params #2298

Merged
