Make PreProcessor.process() work on lists of documents #1163
That's a great quality-of-life feature :)
Ready to merge after comments are addressed.
"""
Perform document cleaning and splitting. Takes a single document as input and returns a list of documents.
Perform document cleaning and splitting. Can takes a single document or a list of documents as input and returns a list of documents.
Typo: "Can takes" should be "Can take".
documents=list(documents),
**kwargs
)
Let's raise an error if the input is neither a list nor a dict. Right now we just return an empty list.
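To make the suggestion concrete, here is a minimal sketch of type dispatch that raises instead of silently returning an empty list. It is not the PR's actual code: `process` here is a standalone function rather than the PreProcessor method, and `_process_single` is a hypothetical stand-in for the real cleaning and splitting logic.

```python
from typing import Any, Dict, List, Union

Document = Dict[str, Any]


def _process_single(document: Document) -> List[Document]:
    # Hypothetical stand-in for the real cleaning/splitting step:
    # split the text on blank lines and drop empty parts.
    return [{"text": part} for part in document["text"].split("\n\n") if part]


def process(documents: Union[Document, List[Document]]) -> List[Document]:
    # A single dict is processed directly.
    if isinstance(documents, dict):
        return _process_single(documents)
    # A list is processed item by item and the results are flattened.
    if isinstance(documents, list):
        results: List[Document] = []
        for doc in documents:
            results.extend(_process_single(doc))
        return results
    # Anything else is a caller error, so fail loudly.
    raise TypeError(
        f"documents must be a dict or a list of dicts, got {type(documents).__name__}"
    )
```

With this shape, passing a string or None fails immediately with a TypeError, so a wrong argument type cannot be mistaken for a document that produced no splits.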
Previously, applying the PreProcessor to multiple docs required extra lines of code to iterate over documents and then unnest results. This PR allows the PreProcessor.process() method to handle both single dict inputs and lists of dicts.
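For context, the caller-side pattern described as the previous workaround might have looked roughly like the sketch below. `process_one` is a hypothetical placeholder for a function that accepts one document dict and returns a list of dicts; it is not the library's API.

```python
from itertools import chain
from typing import Any, Dict, List

Document = Dict[str, Any]


def process_one(document: Document) -> List[Document]:
    # Hypothetical single-document processor: one output dict per line of text.
    return [{"text": line} for line in document["text"].splitlines() if line]


def process_many_manually(documents: List[Document]) -> List[Document]:
    # Step 1: iterate over the documents, producing one list per input.
    nested = [process_one(doc) for doc in documents]
    # Step 2: unnest the list of lists into a single flat list.
    return list(chain.from_iterable(nested))
```

Accepting a list directly moves these two steps out of every caller and into the method.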