Add max_multiprocessing_chunksize as a param for DataSilo #168

Merged
tanaysoni merged 2 commits into master from fix_lm_finetuning_chunksize
Dec 12, 2019

Conversation

tanaysoni
Contributor

The default multiprocessing chunksize computed by calc_chunksize() for the lm_finetuning task is rather large, which leads to memory issues.

This PR introduces a max_multiprocessing_chunksize param for the DataSilo, which sets an upper limit on the chunksize.

The default mp chunksize value for the lm_finetuning task is rather large,
leading to memory issues. Adding max chunksize as a param makes it possible
to set an upper limit for the chunksize calculations.
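
To illustrate the idea of capping a computed chunksize, here is a minimal sketch. It is not the DataSilo implementation: the helper `capped_chunksize`, its parameters, and the heuristic of a few chunks per worker are assumptions made only for this example.

```python
import math


def capped_chunksize(num_items, num_workers, max_chunksize=2000, chunks_per_worker=4):
    """Hypothetical helper: derive a chunksize from the dataset size and
    worker count, then clamp it to an upper limit so that a single chunk
    cannot grow large enough to exhaust memory."""
    if num_items <= 0 or num_workers <= 0 or max_chunksize <= 0:
        raise ValueError("num_items, num_workers and max_chunksize must be positive")

    # Uncapped heuristic (an assumption): spread the items so that each
    # worker receives roughly `chunks_per_worker` chunks.
    uncapped = math.ceil(num_items / (num_workers * chunks_per_worker))

    # Apply the upper limit, and never return less than 1.
    return max(1, min(uncapped, max_chunksize))


if __name__ == "__main__":
    # A large corpus would otherwise produce a very large chunk per task.
    print(capped_chunksize(num_items=1_000_000, num_workers=4, max_chunksize=2000))
    # A small dataset stays below the limit and is unaffected by the cap.
    print(capped_chunksize(num_items=100, num_workers=4, max_chunksize=2000))
```

With a cap in place, large datasets are split into more, smaller chunks, so each worker holds less data in memory at a time at the cost of somewhat more scheduling overhead.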
Member

tholor left a comment

Looks good!

tanaysoni merged commit 7b60e4e into master on Dec 12, 2019
tholor deleted the fix_lm_finetuning_chunksize branch on April 28, 2020 07:30

2 participants