Ram Issue in Tutorial 9 (DPR Training) Colab #735
Comments
How about the tutorial just uses small files so people can quickly go through the code and its execution, and we include the links to the large data files as comments for interested users?
Actually, the following line reads the whole file into memory and also decompresses it in memory.
Python 3 supports buffered reading automatically, so changing it as follows should improve memory utilization:
I have not tested the above snippet; I will do so tonight. If it does not work, there is another solution: reading the file in chunks. But I don't think that would be necessary, as gzip already supports buffered IO. One more point: we can decompress the file directly from the URL instead of downloading it to a temp file and decompressing it there. Python's compression libraries have streaming support. Refer to #709 (comment).
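For readers following along, here is a minimal sketch of the chunked, streaming approach described above. It is not the snippet from this comment or the code merged into the repo; the function names, the file names, and the 1 MiB chunk size are assumptions chosen for illustration. It relies only on the standard library's `gzip.GzipFile` and `shutil.copyfileobj`, which copy data in fixed-size pieces so that neither the compressed nor the decompressed file has to fit in RAM.

```python
import gzip
import shutil

# Hypothetical chunk size (1 MiB); any moderate value keeps memory use bounded.
CHUNK_SIZE = 1024 * 1024


def decompress_stream(compressed_stream, out_path, chunk_size=CHUNK_SIZE):
    """Decompress a gzip byte stream to out_path, one chunk at a time.

    compressed_stream is any binary file-like object with a read() method,
    for example an open local file or an HTTP response body.
    """
    with gzip.GzipFile(fileobj=compressed_stream) as gz, open(out_path, "wb") as out:
        # copyfileobj reads at most chunk_size decompressed bytes per iteration.
        shutil.copyfileobj(gz, out, chunk_size)


def decompress_file(gz_path, out_path, chunk_size=CHUNK_SIZE):
    """Decompress a local .gz file to out_path without loading it fully."""
    with open(gz_path, "rb") as src:
        decompress_stream(src, out_path, chunk_size)
```

The same `decompress_stream` function should also cover the "directly from the URL" idea, assuming the response object is passed in as the stream, e.g. `with urllib.request.urlopen(url) as response: decompress_stream(response, "train.json")`, which skips the temp file entirely.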
Hey @lalitpagaria, thanks so much for the suggestion! I actually tested it and it solved the problem. I haven't removed the tempfile code but I did integrate your snippet in #737.
@brandenchan glad to know it. Thanks for testing it. 🙂
@tholor This can be closed now.
Have you updated the repo? When I run the tutorial, I still face the same issue :((
When running the DPR Training tutorial in Colab, the download of the training dataset seems to run fine, but the program crashes before starting to download the dev file due to running out of RAM.
We need to find some way to reduce this RAM consumption. It likely has something to do with the way the files are being decompressed.