Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor communication between Pipeline Components #1321

Merged
merged 55 commits into from
Sep 10, 2021
Merged

Conversation

oryx1729
Copy link
Contributor

@oryx1729 oryx1729 commented Aug 5, 2021

Passing data between Components

Handling of kwargs

The nodes' run() methods no longer need to deal with kwargs.

The signature of the run() should include input data primitives(query, documents, answers, ...) and params (top_k, filters, ...) that it needs to work with.

Passing Parameters

Parameters for the pipeline execution must be passed as a param dict in the pipeline.run() method. By default, Pipeline attempts to match all params to each node.

To target a specific node, it must be explicitly specified by the node name. For instance, pipeline.run(query="Why?")

Validating Parameters

All targeted parameters(as described above) are now validated. For instance, supplying top_p instead of top_k will now return an error.

Adding "debug" information

Nodes can return a _debug key in the output that gets appended in the final output of a request. For example,

class MyComponent:
    def run(....):
        output = {..., "_debug": "my debug information ..."}  # the _debug key can also be a dict
        return output, "output_1"

will get transformed in final response like,

{"answers": ..., "_debug": {"MyComponent": "my debug information ..."}}

⚠️ Breaking Changes

  • component params like top_k, no_ans_boost for Pipeline.run() must be passed in a params dict
  • component specific top_ks like top_k_reader, top_k_retriever are now replaced with top_k. To disambiguate, the params can be "targeted" to a specific node. For instance, pipeline.run(query="Why?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5})

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Good first step :)

Left a few comments

  • the "graph validation" part before calling pipe.run() is still missing

haystack/schema.py Show resolved Hide resolved
haystack/schema.py Show resolved Hide resolved
haystack/reader/base.py Show resolved Hide resolved
@oryx1729 oryx1729 force-pushed the pipeline-args branch 2 times, most recently from f9d3720 to c63031a Compare August 19, 2021 14:14
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

Left a few minor comments. Mostly around documentation.

  • I think it would also make sense to add a few more test cases (e.g. testing the _debug field).
  • Did you test the tutorials? especially the ones on pipeline (11) and query classifier (14)? I only had a superficial review there...
  • We need to update documentation (especially the usage page in the docs + the pipeline tutorial). I am fine with doing that in a separate PR, but we should do it right away after merging this one so that others can understand how to use it + the content is still very fresh in our memory.

haystack/classifier/farm.py Show resolved Hide resolved
haystack/eval.py Show resolved Hide resolved
haystack/pipeline.py Show resolved Hide resolved

# Convert to answer format to allow "drop-in replacement" for other QA pipelines
if return_in_answer_format:
if self.return_in_answer_format:
results: Dict = {"query": query, "answers": []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably also use the new Doc2Answer node instead here. But I think it's not really the scope of this PR

haystack/schema.py Outdated Show resolved Hide resolved
@tholor tholor changed the title WIP: Refactor communication between Pipeline Components Refactor communication between Pipeline Components Sep 2, 2021
Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants