Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pipeline] Allow for batch indexing when using Pipelines fix #1168 #1231

Merged
merged 9 commits into from
Jun 30, 2021
Merged

[pipeline] Allow for batch indexing when using Pipelines fix #1168 #1231

merged 9 commits into from
Jun 30, 2021

Conversation

akkefa
Copy link
Contributor

@akkefa akkefa commented Jun 26, 2021

Proposed changes:

Status (please check what you already did):

  • First draft (up for discussions & feedback)
  • Final code
  • Added tests
  • Updated documentation

@akkefa
Copy link
Contributor Author

akkefa commented Jun 26, 2021

@tholor @brandenchan This the first draft of how we can introduce batch processing when using the pipeline. Please review the design changes and Let me know that I am on the right track 😎

After your approval, I will add the remaining changes to the Pull Request.

Copy link
Contributor

@brandenchan brandenchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main changes look good. I'd just like to 1) make sure the meta argument is more flexible, 2) suggest using Path.suffix and 3) ask whether its possible to have the REST API also support batches of documents

haystack/file_converter/base.py Outdated Show resolved Hide resolved
rest_api/controller/file_upload.py Outdated Show resolved Hide resolved
haystack/file_converter/base.py Show resolved Hide resolved
@akkefa akkefa marked this pull request as ready for review June 30, 2021 09:56
@akkefa akkefa requested a review from brandenchan June 30, 2021 09:56
@brandenchan
Copy link
Contributor

Great thanks for making those changes! Just one last thing came to mind - I am wondering if these REST API changes might break the file_upload functionality in the UI. Could you test this out @akkefa ?

@akkefa
Copy link
Contributor Author

akkefa commented Jun 30, 2021

Great thanks for making those changes! Just one last thing came to mind - I am wondering if these REST API changes might break the file_upload functionality in the UI. Could you test this out @akkefa ?

Thank you for your feedback. Which UI you are talking about?
I tested the multiple file upload changes on Swagger API Documentation.

@brandenchan
Copy link
Contributor

I'm thinking about the Streamlit ui that's packaged with haystack.

@akkefa
Copy link
Contributor Author

akkefa commented Jun 30, 2021

I'm thinking about the Streamlit ui that's packaged with haystack.

@brandenchan
Ahh. Let me verify the changes.

@akkefa
Copy link
Contributor Author

akkefa commented Jun 30, 2021

@brandenchan I update the upload file parameter to files in streamlit. Rest are the changes working fine.

Copy link
Contributor

@brandenchan brandenchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Thanks for your work on this. This gets my thumbs up.

@brandenchan brandenchan merged commit 29e1401 into deepset-ai:master Jun 30, 2021
@akkefa akkefa deleted the batch_indexing branch June 30, 2021 13:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for batch indexing when using Pipelines
2 participants