
Multiprocessing causes data preprocessing to crash #110

Closed
brandenchan opened this issue Oct 9, 2019 · 1 comment
Labels
bug Something isn't working

Comments

@brandenchan
Contributor

brandenchan commented Oct 9, 2019

Data preprocessing crashes during language model fine-tuning. The problem is related to dataset size: training ran smoothly on a dataset of 10k samples, but the error in the screenshot below was thrown when processing a dataset of 5 million samples.

[Screenshot of the error, taken 2019-10-08 at 10:11:29]

@brandenchan brandenchan added the bug Something isn't working label Oct 9, 2019
@brandenchan
Contributor Author

Fixed by setting the DataSilo's multiprocessing_chunk_size argument to a larger value (in this case, multiprocessing_chunk_size=2000).
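The thread does not say exactly why a larger chunk size avoids the crash. As a rough illustration only, here is a minimal sketch of what a chunk-size parameter typically controls in a chunked multiprocessing pipeline: the dataset is split into consecutive chunks and each chunk becomes one task for a worker process. This is not FARM's DataSilo code; `chunk`, `preprocess_chunk` and `preprocess_dataset` are hypothetical names, and the "preprocessing" is a placeholder token count. With 5 million samples and a chunk size of 2000, the pool receives 2500 tasks.

```python
from multiprocessing import Pool
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk(samples: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `samples`, each at most `chunk_size` long."""
    for start in range(0, len(samples), chunk_size):
        yield list(samples[start:start + chunk_size])


def preprocess_chunk(batch: List[str]) -> List[int]:
    # Hypothetical stand-in for real per-sample preprocessing (e.g. tokenization):
    # here we only count whitespace-separated tokens in each sample.
    return [len(text.split()) for text in batch]


def preprocess_dataset(samples: Sequence[str], chunk_size: int, processes: int = 4) -> List[int]:
    results: List[int] = []
    with Pool(processes=processes) as pool:
        # Each chunk is sent to a worker as a single task; a larger chunk_size
        # means fewer, larger tasks. imap returns results in input order.
        for chunk_result in pool.imap(preprocess_chunk, chunk(samples, chunk_size)):
            results.extend(chunk_result)
    return results


if __name__ == "__main__":
    data = ["a short sample", "another one"] * 10
    print(preprocess_dataset(data, chunk_size=2000))
```

The `__main__` guard matters here: on platforms that start workers with "spawn", each worker re-imports the main module, and the guard keeps it from launching another pool.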
