Refactor communication between Pipeline Components #1321
Conversation
Nice. Good first step :)
Left a few comments
- the "graph validation" part before calling pipe.run() is still missing
Looking great!
Left a few minor comments. Mostly around documentation.
- I think it would also make sense to add a few more test cases (e.g. testing the _debug field).
- Did you test the tutorials, especially the ones on pipelines (11) and query classifier (14)? I only had a superficial review there...
- We need to update documentation (especially the usage page in the docs + the pipeline tutorial). I am fine with doing that in a separate PR, but we should do it right away after merging this one so that others can understand how to use it + the content is still very fresh in our memory.
```diff
  # Convert to answer format to allow "drop-in replacement" for other QA pipelines
- if return_in_answer_format:
+ if self.return_in_answer_format:
      results: Dict = {"query": query, "answers": []}
```
We could probably also use the new Doc2Answer node here instead. But I think that's not really in the scope of this PR.
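For context, a Doc2Answer-style conversion just wraps each retrieved document in the answer schema. A hypothetical sketch (the function name and field names are assumptions for illustration, not the actual Haystack node):

```python
# Hypothetical sketch of a Doc2Answer-style conversion (names are
# illustrative, not the real Haystack node): wrap retrieved documents
# in the answer schema so the output is a drop-in replacement for
# other QA pipelines.
def docs_to_answers(query, documents):
    answers = [
        {"answer": doc["text"], "context": doc["text"], "meta": doc.get("meta", {})}
        for doc in documents
    ]
    return {"query": query, "answers": answers}

out = docs_to_answers("Why?", [{"text": "Because it was refactored."}])
print(out["answers"][0]["answer"])  # Because it was refactored.
```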
LGTM!
Passing data between Components

Handling of kwargs
The nodes' run() methods no longer need to deal with kwargs. The signature of run() should include the input data primitives (query, documents, answers, ...) and the params (top_k, filters, ...) that the node needs to work with.

Passing Parameters
Parameters for the pipeline execution must be passed as a params dict in the pipeline.run() method, e.g. pipeline.run(query="Why?"). By default, Pipeline attempts to match all params to each node. To target a specific node, the params must be explicitly keyed by the node name.
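The two points above can be sketched together. This is a self-contained mock, not the actual Haystack implementation: the class names, node names, and param-matching logic are illustrative assumptions.

```python
# Illustrative mock (NOT the real Haystack classes): a node with an
# explicit run() signature and a pipeline that matches a `params` dict
# to each node, supporting per-node "targeting" by node name.

class MockRetriever:
    # Explicit signature: the primitives (query) and params (top_k)
    # the node works with -- no **kwargs catch-all.
    def run(self, query, top_k=10):
        documents = [f"doc-{i} for {query!r}" for i in range(top_k)]
        return {"query": query, "documents": documents}

class MockPipeline:
    def __init__(self):
        self.nodes = {}

    def add_node(self, name, node):
        self.nodes[name] = node

    def run(self, query, params=None):
        params = params or {}
        output = {"query": query}
        for name, node in self.nodes.items():
            # Global params (non-dict values) apply to every node;
            # params nested under a node name target only that node.
            node_params = {k: v for k, v in params.items()
                           if not isinstance(v, dict)}
            node_params.update(params.get(name, {}))
            output.update(node.run(query=output["query"], **node_params))
        return output

pipe = MockPipeline()
pipe.add_node("Retriever", MockRetriever())

# Untargeted call: the node's defaults are used.
print(len(pipe.run(query="Why?")["documents"]))  # 10
# Targeted call: top_k applies only to the node named "Retriever".
result = pipe.run(query="Why?", params={"Retriever": {"top_k": 3}})
print(len(result["documents"]))  # 3
```

A side effect of the explicit signatures is that an unknown param (e.g. top_p) fails fast with a TypeError instead of being silently swallowed by **kwargs.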
Validating Parameters
All targeted parameters (as described above) are now validated. For instance, supplying top_p instead of top_k now returns an error.

Adding "debug" information
Nodes can return a _debug key in their output; its contents are appended to the final output of the request.
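A minimal sketch of the _debug mechanism, under assumed names (the PR's actual merging logic may differ): a node returns a _debug entry alongside its regular output, and the caller collects it into the final response keyed by node name.

```python
# Illustrative sketch (not the PR's actual implementation): a node adds
# a _debug entry, and the request-level output collects per-node debug
# info under a single "_debug" key.

class MockReader:
    def run(self, query, top_k=5):
        answers = [{"answer": "Because.", "score": 0.9}]
        # Node-level debug info, returned alongside the regular output.
        return {"answers": answers[:top_k],
                "_debug": {"runtime_ms": 12, "top_k": top_k}}

def run_node(name, node, final_output, **kwargs):
    result = node.run(**kwargs)
    # Strip _debug from the node output and append it to the
    # request-level debug dict, keyed by node name.
    node_debug = result.pop("_debug", None)
    if node_debug is not None:
        final_output.setdefault("_debug", {})[name] = node_debug
    final_output.update(result)
    return final_output

response = run_node("Reader", MockReader(), {"query": "Why?"},
                    query="Why?", top_k=1)
print(response["_debug"]["Reader"]["top_k"])  # 1
```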
Breaking changes:
- top_k, no_ans_boost for Pipeline.run() must be passed in a params dict
- Node-specific top_ks like top_k_reader and top_k_retriever are now replaced with top_k. To disambiguate, the params can be "targeted" to a specific node. For instance, pipeline.run(query="Why?", params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}})