Getting out of memory error at inference time but very little memory usage #310

Closed
ajsander opened this issue Sep 15, 2015 · 8 comments

@ajsander

I've trained a couple of models (AlexNet and GoogLeNet) in DIGITS successfully, with test and validation accuracy statistics shown, but when I try to classify a single image using the web interface I get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0915 14:10:45.809661 98789 common.cpp:266] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***

When I check nvidia-smi, it appears that memory usage increases by around 100 MB, but that's still nowhere near the card's full capacity of 3 GB.
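
For reference, the "(2 vs. 0)" in that log is a glog CHECK comparing a CUDA error code against cudaSuccess; error code 2 is cudaErrorMemoryAllocation. A minimal standalone sketch (not Caffe code; the oversized allocation is just a way to force the failure) that reproduces the same pattern:

```c
/* Minimal sketch, not Caffe code: reproduces the "(2 vs. 0) out of memory"
 * pattern. cudaErrorMemoryAllocation == 2, cudaSuccess == 0.
 * Build (illustrative): nvcc oom_demo.cu -o oom_demo */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    void *ptr = NULL;
    size_t huge = (size_t)1 << 40;   /* 1 TiB: guaranteed to fail on a 3 GB card */
    cudaError_t err = cudaMalloc(&ptr, huge);
    if (err != cudaSuccess) {
        /* Caffe's CUDA_CHECK macro aborts here with "error == cudaSuccess (2 vs. 0)" */
        fprintf(stderr, "Check failed: error == cudaSuccess (%d vs. %d) %s\n",
                (int)err, (int)cudaSuccess, cudaGetErrorString(err));
        return 1;
    }
    cudaFree(ptr);
    return 0;
}
```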

@lukeyeager
Member

Interesting. You trained the model without memory errors and ran out of memory when testing the model? That doesn't sound right. Are you using the same version of Caffe now as you were when you were training?

I wouldn't expect nvidia-smi to be very useful here. All of the memory allocations should happen very quickly and then release very quickly as soon as the error occurs. So you'd have to run nvidia-smi at just the right time to catch it.
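
One way around that timing problem is to query memory from inside the process rather than sampling it externally. A minimal sketch using the CUDA runtime's cudaMemGetInfo; where exactly the logging calls go relative to inference is a placeholder:

```c
/* Sketch: log free/total device memory from inside the process with
 * cudaMemGetInfo, catching the state at the moment of interest instead of
 * hoping nvidia-smi samples at the right time. */
#include <cuda_runtime.h>
#include <stdio.h>

static void log_gpu_memory(const char *tag) {
    size_t free_b = 0, total_b = 0;
    if (cudaMemGetInfo(&free_b, &total_b) == cudaSuccess)
        fprintf(stderr, "[%s] %zu MiB free / %zu MiB total\n",
                tag, free_b >> 20, total_b >> 20);
}

int main(void) {
    log_gpu_memory("before inference");
    /* ... allocate buffers / run the network here ... */
    log_gpu_memory("after inference");
    return 0;
}
```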

@ajsander
Author

Yes, it's the same versions of Caffe and DIGITS. I just tried the test-image button in the web UI and got that error. I was watching nvidia-smi with the -l option and the memory that's used (~90 MB) doesn't appear to be released.
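
As an alternative to polling with `nvidia-smi -l`, the same counters can be read programmatically through NVML, the library nvidia-smi is built on. A hedged sketch; device index 0 and the one-second, ten-sample loop are arbitrary choices:

```c
/* Sketch: poll GPU memory usage via NVML, similar to `nvidia-smi -l 1`
 * but from code. Build (illustrative): gcc poll_mem.c -lnvidia-ml -o poll_mem */
#include <nvml.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    nvmlDevice_t dev;
    nvmlMemory_t mem;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        for (int i = 0; i < 10; ++i) {          /* ten samples, one per second */
            if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS)
                printf("used: %llu MiB\n", (unsigned long long)(mem.used >> 20));
            sleep(1);
        }
    }
    nvmlShutdown();
    return 0;
}
```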

@lukeyeager
Member

Alright, can you give me a little more information?

  • GPU[s]
  • Driver version
  • CUDA version
  • cuDNN version
  • Caffe version
  • DIGITS version
  • Network architecture (AlexNet, GoogLeNet, both?)

@ajsander
Author

Running on an Amazon g2.8xlarge
GPU[s]: 4x GRID K520
CUDA 7.0
cuDNN 7.0
Caffe version 0.12 NVIDIA fork
DIGITS 2.1

Both AlexNet and GoogLeNet experienced the same problem.

@lukeyeager
Member

I was able to verify the same issue with the v2.0 web installer as well, which makes this a pretty serious bug. Unfortunately, I don't have time to fight with compilation on AWS right now. I've refiled this bug at NVIDIA/caffe#34.

@lukeyeager
Member

@ajsander can you try using the v3.0 RC3 deb packages to see if the issue persists?

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md

@lukeyeager
Member

I'm going to close this.

A lot of code has changed in cuDNN, Caffe and DIGITS since then. This has likely been fixed. Please reply to this thread if you still see this issue with DIGITS >= 3.0.

@apolo74

apolo74 commented Nov 29, 2017

Hi guys, I'm experiencing the same error and I was wondering if you found a way to fix this.
My info:
Laptop: Intel Core i7-6500U CPU @ 2.50 GHz × 4, 12 GB RAM (Ubuntu 16.04, 64-bit)
GPU: NVIDIA GeForce 940M, 2 GB
Working with nvidia-docker
Caffe: 0.15.14 (NVIDIA fork)
DIGITS: 6.0.0
