-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pipelines tutorial #991
Pipelines tutorial #991
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Note that the Colab notebook has not been fully tested yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Added a few comments + requests for minor changes.
One bigger question: Any particular reason to not use a pipeline for indexing the data in the beginning? I think it would make the picture complete.
haystack/utils.py
Outdated
@@ -29,6 +29,39 @@ def launch_es(): | |||
time.sleep(15) | |||
|
|||
|
|||
def launch_milvus(): | |||
# Start a Milvus server | |||
# You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update comment here for milvus / remove the "executable part"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly by the "executable part"?
haystack/utils.py
Outdated
pp.pprint(results) | ||
|
||
|
||
def print_answers_gen(results: dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is for generated answers? Can't we adjust print_answers()
or adjust the result format from the generator so that we need a special method here? I think we should try to "standardize" the formats more in the next weeks. The more special helpers we add now on top, the more we'll have to refactor again soon...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have merged this fn with print_answers()
in a rather hacky way. Anticipating that Reader and Generator outputs will be standardized and that we can come up with a proper solution once we have decided on that format.
I don't have so much familiarity with it yet and I couldn't find a good example. Also with the current tutorial setup, it might over complicate what is otherwise very simple indexing. For the sake of just getting this tutorial out there, I am going to move ahead with merging this. Adding indexing pipeline is in the backlog (#992) and we can think of a good way to incorporate indexing pipelines when we revisit it. |
This PR adds a notebook and .py tutorial which shows what can be done with Haystack pipelines