Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipelines tutorial #991

Merged
merged 10 commits into from
Apr 29, 2021
Merged

Pipelines tutorial #991

merged 10 commits into from
Apr 29, 2021

Conversation

brandenchan
Copy link
Contributor

This PR adds a notebook and .py tutorial which shows what can be done with Haystack pipelines

@brandenchan brandenchan self-assigned this Apr 22, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@brandenchan brandenchan requested a review from tholor April 22, 2021 10:50
@brandenchan
Copy link
Contributor Author

Note that the Colab notebook has not been fully tested yet

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Added a few comments + requests for minor changes.

One bigger question: Any particular reason to not use a pipeline for indexing the data in the beginning? I think it would make the picture complete.

@@ -29,6 +29,39 @@ def launch_es():
time.sleep(15)


def launch_milvus():
# Start a Milvus server
# You can start Elasticsearch on your local machine instance using Docker. If Docker is not readily available in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment here for milvus / remove the "executable part"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean exactly by the "executable part"?

haystack/utils.py Outdated Show resolved Hide resolved
haystack/utils.py Show resolved Hide resolved
pp.pprint(results)


def print_answers_gen(results: dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is for generated answers? Can't we adjust print_answers() or adjust the result format from the generator so that we need a special method here? I think we should try to "standardize" the formats more in the next weeks. The more special helpers we add now on top, the more we'll have to refactor again soon...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged this fn with print_answers() in a rather hacky way. Anticipating that Reader and Generator outputs will be standardized and that we can come up with a proper solution once we have decided on that format.

tutorials/Tutorial11_Pipelines.py Outdated Show resolved Hide resolved
tutorials/Tutorial11_Pipelines.py Outdated Show resolved Hide resolved
tutorials/Tutorial11_Pipelines.py Show resolved Hide resolved
@brandenchan
Copy link
Contributor Author

Looking good. Added a few comments + requests for minor changes.

One bigger question: Any particular reason to not use a pipeline for indexing the data in the beginning? I think it would make the picture complete.

I don't have so much familiarity with it yet and I couldn't find a good example. Also with the current tutorial setup, it might over complicate what is otherwise very simple indexing.

For the sake of just getting this tutorial out there, I am going to move ahead with merging this. Adding indexing pipeline is in the backlog (#992) and we can think of a good way to incorporate indexing pipelines when we revisit it.

@brandenchan brandenchan merged commit 9827b36 into master Apr 29, 2021
@brandenchan brandenchan deleted the pipelines_tutorial branch April 29, 2021 15:31
@brandenchan brandenchan mentioned this pull request May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants