
Explicitly delete device-converted tensors. #19

Closed

Conversation

tomcur (Contributor) commented Apr 17, 2020

Fixes #4.

dotchen (Owner) commented Oct 4, 2020

I don't think this is a fix, as the Python garbage collector would automatically handle this.

dotchen closed this Oct 4, 2020
tomcur (Contributor, Author) commented Oct 5, 2020

The garbage collector does not handle this: the objects live at least until the variables fall out of scope or are reassigned, and a for-loop does not create a new scope. That means that in the current code, the memory held by the input tensors is doubled.
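The scoping behavior described above can be demonstrated in pure Python, independent of PyTorch. Note this is a minimal sketch: `FakeTensor` is a hypothetical stand-in for a device tensor (the real code moves PyTorch tensors to the GPU with `.to(device)`), and CPython's reference counting stands in for GPU memory release.

```python
import weakref

class FakeTensor:
    """Hypothetical stand-in for a device tensor; in the real training
    loop this would be a PyTorch tensor moved to the GPU."""
    pass

refs = []
for _ in range(3):
    t = FakeTensor()            # real loop: t = batch.to(device)
    refs.append(weakref.ref(t))
    # ... forward/backward pass would run here ...

# A for-loop creates no new scope: after the loop ends, the last
# iteration's object is still referenced by `t`, so its (GPU) memory
# would stay allocated until `t` is reassigned or falls out of scope.
assert refs[0]() is None        # earlier objects freed on reassignment
assert refs[-1]() is not None   # last object still alive after the loop
del t                           # explicit delete drops the reference
assert refs[-1]() is None       # freed immediately (CPython refcounting)
```

In the actual training loop the same reasoning applies twice per iteration: the CPU-side batch and its device-converted copy are both kept alive by their loop variables, which is where the doubled memory comes from.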

More information here: #4 (comment)

dotchen (Owner) commented Oct 5, 2020

This would not be an issue unless your GPU does not have the memory to hold 2x the data, no? A standard deep-learning GPU (1080 Ti/Titan) would be able to handle the default batch size.

tomcur (Contributor, Author) commented Oct 5, 2020

That's right; it's likely only an issue when running the code on modest development machines. Do note that train_birdview.py, at the code's default batch size of 256, "wastes" 3 GB of GPU memory. It's good practice to del tensors once you no longer need them.

Merging this pull request may close the linked issue: Training: CUDA: Out of Memory Optimizations