Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PreProcessor.process() work on lists of documents #1163

Merged
merged 10 commits into from
Jun 23, 2021

Conversation

brandenchan
Copy link
Contributor

@brandenchan brandenchan commented Jun 9, 2021

Previously, applying the PreProcessor to multiple docs required extra lines of code to iterate over documents and then unnest results. This PR allows the PreProcessor.process() method to handle both single dict inputs and also lists of dicts.

@brandenchan brandenchan requested a review from tholor June 9, 2021 09:55
@brandenchan brandenchan self-assigned this Jun 9, 2021
@brandenchan brandenchan changed the title Add process_batch method to PreProcessor Make PreProcessor.process() work on lists of documents Jun 9, 2021
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@tholor tholor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great quality-of-life feature :)
Ready to merge after comments are adressed

"""
Perform document cleaning and splitting. Takes a single document as input and returns a list of documents.
Perform document cleaning and splitting. Can takes a single document or a list of documents as input and returns a list of documents.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "takes"

documents=list(documents),
**kwargs
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's raise an error if it's neither list nor dict. Right now we just return an empty list

@brandenchan brandenchan merged commit efc03f7 into master Jun 23, 2021
@brandenchan brandenchan deleted the simplify_process_batch branch June 23, 2021 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants