feat: Add PyPDFToDocument component (2.0) #5850
Great! 🚀
@Timoeller @ZanSara Inspired by the #4467 thread, perhaps we can adjust the default |
Related issues
#5670
Why:
This PR introduces the `PyPDFToDocument` component to Haystack 2.0. This component is designed to convert PDF files into a list of `Document` objects, which can then be seamlessly integrated into the Haystack 2.0 pipeline.
What:
A new `PyPDFToDocument` component has been added to the `file_converters` package. This package serves as a collection of various file conversion utilities within Haystack.
How can it be used:
Here's a code snippet demonstrating how to use the new `PyPDFToDocument` component:
How did you test it:
Unit tests have been added to cover the new component. Additionally, manual tests were conducted, as described in the "How can it be used" section above.
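To make the usage and testing described above more concrete, here is a minimal sketch of the conversion pattern: file paths go in, a list of document objects comes out. It is not the component added in this PR. The `Document` dataclass, the `PdfToDocumentSketch` class, its `run(paths=...)` signature, and the `file_path` metadata key are all hypothetical stand-ins chosen for illustration. The only real library calls are pypdf's `PdfReader`, `.pages`, and `page.extract_text()`, and they are imported lazily so the sketch runs without pypdf when a stub extractor is supplied, which is also how a unit test might exercise it.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Union


@dataclass
class Document:
    """Hypothetical stand-in for Haystack's Document: text plus metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def extract_with_pypdf(path: Path) -> str:
    """Extract the text of every page of a PDF using pypdf."""
    # Third-party dependency, imported lazily: only needed for real PDFs.
    from pypdf import PdfReader

    reader = PdfReader(str(path))
    # extract_text() may yield nothing for image-only pages, hence the fallback.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


class PdfToDocumentSketch:
    """Illustrative converter (assumed shape, not the PR's implementation)."""

    def __init__(self, extractor: Callable[[Path], str] = extract_with_pypdf):
        # The extractor is injectable so tests can avoid real PDF files.
        self.extractor = extractor

    def run(self, paths: List[Union[str, Path]]) -> Dict[str, List[Document]]:
        documents = []
        for raw_path in paths:
            path = Path(raw_path)
            text = self.extractor(path)
            documents.append(Document(text=text, metadata={"file_path": str(path)}))
        return {"documents": documents}


# Usage with a stub extractor, so no PDF file or pypdf install is required.
converter = PdfToDocumentSketch(extractor=lambda path: f"text of {path.name}")
result = converter.run(paths=["report.pdf", "notes.pdf"])
print([doc.text for doc in result["documents"]])
# prints ['text of report.pdf', 'text of notes.pdf']
```

Injecting the extractor keeps PDF parsing separate from the path-to-document bookkeeping, so the latter can be checked in a unit test without fixture files. With real PDFs, constructing `PdfToDocumentSketch()` with no arguments falls back to the pypdf-based extractor.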
Notes For Reviewer:
Please ensure that the new component and its unit tests are correctly implemented. Double-check to make sure everything aligns with the project's standards and that there are no unintended side effects.