Getting out of memory error at inference time but very little memory usage #310
Interesting. You trained the model without memory errors and ran out of memory when testing the model? That doesn't sound right. Are you using the same version of Caffe now as you were when you were training? I wouldn't expect nvidia-smi to be very useful here. All of the memory allocations should happen very quickly and then release very quickly as soon as the error occurs. So you'd have to run nvidia-smi at just the right time to catch it.
Yes, it's the same version of Caffe/DIGITS. I just tried using the test-image web UI button and got that error. I was watching nvidia-smi with the -l option, and the memory that's used doesn't appear to be released (~90 MB).
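As an aside, polling nvidia-smi by hand is easy to mistime. A minimal sketch of scripting the same check in Python, assuming nvidia-smi is on the PATH and supports the standard `--query-gpu`/`--format=csv` flags; the helper names are hypothetical, not part of DIGITS:

```python
import subprocess

def parse_memory_used(csv_output):
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
    into a list of MiB values, one entry per GPU."""
    return [int(line) for line in csv_output.splitlines() if line.strip()]

def gpu_memory_used_mib():
    """Poll nvidia-smi once (hypothetical helper; requires an NVIDIA driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(out)
```

Calling `gpu_memory_used_mib()` in a tight loop around the inference request gives a better chance of catching a short-lived allocation than watching `nvidia-smi -l` by eye.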
Alright, can you give me a little more information?
Running on an Amazon g2.8xlarge. Both AlexNet and GoogLeNet experienced the same problem.
I was able to verify the same issue with the v2.0 web installer as well, which makes this a pretty serious bug. Unfortunately, I don't have time to fight with compilation on AWS right now. I've refiled this bug at NVIDIA/caffe#34.
@ajsander can you try using the v3.0 RC3 deb packages to see if the issue persists? https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md
I'm going to close this. A lot of code has changed in cuDNN, Caffe, and DIGITS since then. This has likely been fixed. Please reply to this thread if you still see this issue with DIGITS >= 3.0.
Hi guys, I'm experiencing the same error and I was wondering if you found a way to fix this. |
I've trained a couple of models (AlexNet and GoogLeNet) using DIGITS successfully, with statistics shown for test and validation accuracy, but when I try to classify a single image using the web interface I get the following error:
When I check nvidia-smi, the amount of memory used appears to increase by around 100 MB, but that's still nowhere near the card's full 3 GB capacity.
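The ~100 MB figure is consistent with a single-image forward pass being far smaller than the card. A back-of-the-envelope sketch in Python; the parameter count and input shape are illustrative AlexNet-style numbers, not exact Caffe/cuDNN allocations, which add workspace and activation overhead on top:

```python
# Rough estimate of GPU memory for a single-image forward pass.
# Numbers are illustrative (AlexNet-style 3x227x227 input, ~61M weights);
# real Caffe/cuDNN usage adds activation blobs and cuDNN workspace.
BYTES_PER_FLOAT = 4

def input_blob_mib(batch, channels, height, width):
    """Size of the input blob in MiB for float32 data."""
    return batch * channels * height * width * BYTES_PER_FLOAT / 2**20

def params_mib(num_params):
    """Size of the weights in MiB for float32 parameters."""
    return num_params * BYTES_PER_FLOAT / 2**20

print(f"input blob: {input_blob_mib(1, 3, 227, 227):.2f} MiB")
print(f"weights:    {params_mib(61_000_000):.0f} MiB")
```

Even with the weights loaded, this is a few hundred MiB at most, which is why an out-of-memory failure on a 3 GB card at inference time points to a bug (such as leaked or duplicated allocations) rather than a genuinely oversized model.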