Refactor communication between Pipeline Components #1321

oryx1729 · 2021-08-05T15:11:13Z (edited by tholor)

Passing data between Components

Handling of `kwargs`

The nodes' run() methods no longer need to deal with kwargs.

The run() signature should declare the input data primitives (query, documents, answers, ...) and params (top_k, filters, ...) that the node needs to work with.
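The signature rule above can be illustrated with a toy node whose run() declares its inputs explicitly instead of accepting `**kwargs`. This is a minimal sketch under assumptions: the class name `ToyRetriever`, its in-memory corpus and the word-overlap scoring are hypothetical and not Haystack's API. Only the shape of run() (explicit primitives and params, returning an output dict plus an edge name such as "output_1") follows the description.

```python
from typing import Dict, List, Optional, Tuple


class ToyRetriever:
    """Hypothetical node with an explicit run() signature and no **kwargs."""

    outgoing_edges = 1

    def __init__(self, corpus: List[str], top_k: int = 3):
        self.corpus = corpus
        self.top_k = top_k

    def run(
        self,
        query: str,
        top_k: Optional[int] = None,
        filters: Optional[Dict[str, str]] = None,
    ) -> Tuple[Dict, str]:
        # `filters` is declared only to show a typical param; this toy ignores it.
        k = top_k if top_k is not None else self.top_k
        words = set(query.lower().split())
        # Rank documents by how many query words they share (toy scoring, stable sort).
        ranked = sorted(
            self.corpus,
            key=lambda doc: len(words & set(doc.lower().split())),
            reverse=True,
        )
        output = {"query": query, "documents": ranked[:k]}
        return output, "output_1"
```

Because run() has no `**kwargs`, a misspelled argument such as `top_p=1` fails immediately with a `TypeError` instead of being silently ignored.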

Passing Parameters

Parameters for pipeline execution must be passed as a params dict to the pipeline.run() method. By default, Pipeline attempts to match all params to each node.

To target a specific node, use the node name as the key in params. For instance, pipeline.run(query="Why?", params={"Reader": {"top_k": 5}})
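One way this routing could work, sketched under assumptions: keys of `params` that equal a node name are treated as targeted params for that node, and every other key is a global param offered to each node whose run() signature accepts it. The helper `resolve_params` is hypothetical and is not Haystack's implementation.

```python
import inspect
from typing import Any, Callable, Dict


def resolve_params(
    params: Dict[str, Any], nodes: Dict[str, Callable]
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: compute the keyword args each node's run() receives.

    `nodes` maps node names to their run() callables.
    """
    resolved: Dict[str, Dict[str, Any]] = {}
    for name, run in nodes.items():
        accepted = set(inspect.signature(run).parameters)
        # Global params: keys that are not node names, kept only if run() declares them.
        node_params = {
            key: value
            for key, value in params.items()
            if key not in nodes and key in accepted
        }
        # Targeted params, e.g. params={"Reader": {"top_k": 5}}, override global ones.
        node_params.update(params.get(name, {}))
        resolved[name] = node_params
    return resolved
```

In this sketch targeted params are copied as-is; checking them against the node's signature is a separate validation step.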

Validating Parameters

All targeted parameters (as described above) are now validated. For instance, supplying top_p instead of top_k now results in an error.
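A minimal sketch of how targeted params might be validated, assuming each node exposes a run() callable: any key that run() does not declare is rejected, so a typo like top_p instead of top_k fails before the pipeline runs. `validate_targeted_params` is a hypothetical name, and raising `ValueError` is an assumption about the error type.

```python
import inspect
from typing import Any, Callable, Dict


def validate_targeted_params(
    params: Dict[str, Any], nodes: Dict[str, Callable]
) -> None:
    """Hypothetical check: raise if a targeted param is not in the node's run() signature."""
    for name, node_params in params.items():
        if name not in nodes:
            continue  # not a node name, so this is a global param; skip it
        accepted = set(inspect.signature(nodes[name]).parameters)
        unknown = set(node_params) - accepted
        if unknown:
            raise ValueError(
                f"Invalid param(s) {sorted(unknown)} for node '{name}'; "
                f"run() accepts {sorted(accepted)}"
            )
```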

Adding "debug" information

Nodes can include a _debug key in their output; it is added to the final output of the request. For example,

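```python
class MyComponent:
    def run(self, query: str):  # illustrative signature; declare the inputs the node needs
        # The node's regular output keys are omitted here; `_debug` can also be a dict.
        output = {"_debug": "my debug information ..."}
        return output, "output_1"
```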

is transformed into the following final response:

```
{"answers": ..., "_debug": {"MyComponent": "my debug information ..."}}
```
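The collection of `_debug` values can be sketched as follows, assuming nodes run in a simple linear sequence where each node receives the previous node's output as keyword arguments. The `_debug` value is removed from each output and stored under the node's name in the final result. `run_linear_pipeline` and the toy nodes in its usage are hypothetical; the real Pipeline routes data along a graph.

```python
from typing import Any, Callable, Dict, List, Tuple

Node = Callable[..., Tuple[Dict[str, Any], str]]


def run_linear_pipeline(nodes: List[Tuple[str, Node]], **inputs: Any) -> Dict[str, Any]:
    """Hypothetical: run nodes in order and gather each node's `_debug` value by name."""
    data: Dict[str, Any] = dict(inputs)
    debug: Dict[str, Any] = {}
    for name, run in nodes:
        # Each node receives the previous node's output (minus `_debug`) as keyword args.
        output, _edge = run(**data)
        data = dict(output)  # copy so the node's own dict is not mutated
        if "_debug" in data:
            debug[name] = data.pop("_debug")
    if debug:
        data["_debug"] = debug
    return data
```

With a retriever node followed by `MyComponent`, the result has the node's answers at the top level and both debug entries grouped under `"_debug"`, keyed by node name.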

⚠️ Breaking Changes

- Component params like top_k and no_ans_boost must be passed to Pipeline.run() in a params dict.
- Component-specific top_ks like top_k_reader and top_k_retriever are replaced with top_k. To disambiguate, params can be "targeted" to a specific node. For instance, pipeline.run(query="Why?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})
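For migrating existing calls, a hypothetical helper (not part of Haystack) can map the removed component-specific arguments onto the new targeted params dict. It assumes the nodes are named "Retriever" and "Reader", as in the example above; the mapping would need adjusting for other node names.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from removed kwargs to (node name, new param name).
LEGACY_PARAM_MAP: Dict[str, Tuple[str, str]] = {
    "top_k_retriever": ("Retriever", "top_k"),
    "top_k_reader": ("Reader", "top_k"),
}


def migrate_legacy_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Convert old-style component kwargs into a new-style params dict."""
    params: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key in LEGACY_PARAM_MAP:
            node, new_key = LEGACY_PARAM_MAP[key]
            params.setdefault(node, {})[new_key] = value
        else:
            # Any other component param (e.g. no_ans_boost) stays a global param.
            params[key] = value
    return params
```

Usage would look like `pipeline.run(query="Why?", params=migrate_legacy_kwargs({"top_k_retriever": 10, "top_k_reader": 5}))`; the query itself stays a regular argument.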

tholor

Nice. Good first step :)

Left a few comments

The "graph validation" step before calling pipe.run() is still missing.

haystack/schema.py

haystack/reader/base.py

review-notebook-app · 2021-08-20T10:20:38Z


tholor

Looking great!

Left a few minor comments, mostly around documentation.

I think it would also make sense to add a few more test cases (e.g. testing the _debug field).
Did you test the tutorials, especially the ones on pipelines (11) and the query classifier (14)? I only reviewed those superficially.
We need to update the documentation (especially the usage page in the docs and the pipeline tutorial). I am fine with doing that in a separate PR, but we should do it right after merging this one, so that others can understand how to use it while the content is still fresh in our memory.

haystack/classifier/farm.py

haystack/eval.py

haystack/pipeline.py

tholor · 2021-08-31T13:47:53Z

haystack/pipeline.py

```diff
         # Convert to answer format to allow "drop-in replacement" for other QA pipelines
-        if return_in_answer_format:
+        if self.return_in_answer_format:
             results: Dict = {"query": query, "answers": []}
```

We could probably use the new Doc2Answer node here instead, but I think that's outside the scope of this PR.

haystack/schema.py

tholor

LGTM!

tholor reviewed Aug 5, 2021


haystack/schema.py

haystack/schema.py

haystack/reader/base.py

oryx1729 force-pushed the pipeline-args branch 2 times, most recently from f9d3720 to c63031a on August 19, 2021 14:14

tholor reviewed Aug 31, 2021


tholor mentioned this pull request Aug 31, 2021

Update documentation for new Pipeline design #1386

Closed

tholor changed the title ~~WIP: Refactor communication between Pipeline Components~~ Refactor communication between Pipeline Components Sep 2, 2021

oryx1729 added 23 commits September 2, 2021 11:37

- Add POC for extractive-qa pipeline (324e615)
- Remove kwargs from run (09eabbc)
- Remove kwargs from reader run (b431e3d)
- Add handling of debug information from nodes (73739e4)
- Fix type hints (97d6ab1)
- Fix standard pipelines (4de5f13)
- Handle null params (0b1e8e1)
- Refactor run() for all components (8d0af9f)
- Fix type hint (9e078eb)
- Fix EvalDocuments (9406cda)
- Fix typing (525486e)
- Fix EvalAnswers (4466c54)
- Fix Summarizer test (7108c5f)
- Fix EvalAnswers (79cbb2e)
- Fix test (a4ab9b4)
- Fix Ray test (de1cd30)
- Fix QueryClassifier (50f90be)
- Fix TransformersQueryClassifier (76d82ed)
- Fix SklearnQueryClassifier (bbb7886)
- Add support for more types as Pipeline inputs (d6ab5c1)
- Fix eval test (e2b750a)
- Fix RayPipeline (2d7ddbf)
- Fix QuestionGenerator (94cdd59)

oryx1729 added 10 commits September 2, 2021 11:40

- Revert dict cast for primitives (032b977)
- Update tests for rest_api (28ae659)
- Fix pipeline test (9498fea)
- Fix Eval (9fa4252)
- Add tests for invalid input to Pipelines (a960bf7)
- Add docstring for _dispatch_run() (d673516)
- Adapt UI query endpoint (266ba37)
- Update tutorials (6eeb4ac)
- Fix filters dict access in query API (fb82bb3)
- Update tutorial (b383000)

oryx1729 force-pushed the pipeline-args branch from b5264c0 to b383000 on September 2, 2021 09:50

oryx1729 added 12 commits September 2, 2021 11:52

- Add type hints for run() in eval.py (45c1d55)
- Fix docstring (cfbcc05)
- Add explicit args for run() in BaseComponent (7bd549c)
- Update docstrings for standard pipelines (8cf409a)
- Add missing import (1c28554)
- Add test for _debug (ef46880)
- Update example in README (f97acb4)
- Update Pipelines README (c8ff595)
- Update Pipeline Tutorial (d2c9755)
- Remove kwargs from crawler run() (ca1c214)
- Remove kwargs from FileTypeClassifier run() (49f5d8a)
- Fix QueryClassifier in tutorial (32b6aae)

oryx1729 requested a review from tholor September 10, 2021 07:53

tholor approved these changes Sep 10, 2021


oryx1729 merged commit 9dd7c74 into master Sep 10, 2021

oryx1729 deleted the pipeline-args branch September 10, 2021 09:41

oryx1729 mentioned this pull request Sep 13, 2021

Refactor how args are passed between pipeline nodes (First design) #1233

Closed

lalitpagaria mentioned this pull request Sep 14, 2021

Need Help Running Basic QA Notebook on Windows 10 #1445

Closed

tholor mentioned this pull request Sep 29, 2021

Return Intermediate Node Output #1193

Closed
