Getting out of memory error at inference time but very little memory usage #310

Closed
ajsander opened this issue Sep 15, 2015 · 8 comments

@ajsander

I've trained a couple of models (AlexNet and GoogLeNet) in DIGITS successfully, with test and validation accuracy statistics shown, but when I try to classify a single image using the web interface I get the following error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0915 14:10:45.809661 98789 common.cpp:266] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***

When I check nvidia-smi, it appears that memory usage increases by around 100 MB, but that's still nowhere near the card's full capacity of 3 GB.
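
For reference, the "(2 vs. 0)" in that log is a glog CHECK comparing a CUDA error code against cudaSuccess; error code 2 is cudaErrorMemoryAllocation. A minimal standalone sketch (not Caffe code; the oversized allocation is just a way to force the failure) that reproduces the same pattern:

```c
/* Minimal sketch, not Caffe code: reproduces the "(2 vs. 0) out of memory"
 * pattern. cudaErrorMemoryAllocation == 2, cudaSuccess == 0.
 * Build (illustrative): nvcc oom_demo.cu -o oom_demo */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    void *ptr = NULL;
    size_t huge = (size_t)1 << 40;   /* 1 TiB: guaranteed to fail on a 3 GB card */
    cudaError_t err = cudaMalloc(&ptr, huge);
    if (err != cudaSuccess) {
        /* Caffe's CUDA_CHECK macro aborts here with "error == cudaSuccess (2 vs. 0)" */
        fprintf(stderr, "Check failed: error == cudaSuccess (%d vs. %d) %s\n",
                (int)err, (int)cudaSuccess, cudaGetErrorString(err));
        return 1;
    }
    cudaFree(ptr);
    return 0;
}
```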

@lukeyeager
Member

Interesting. You trained the model without memory errors and ran out of memory when testing the model? That doesn't sound right. Are you using the same version of Caffe now as you were when you were training?

I wouldn't expect nvidia-smi to be very useful here. All of the memory allocations should happen very quickly and then release very quickly as soon as the error occurs. So you'd have to run nvidia-smi at just the right time to catch it.
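
One way around that timing problem is to query memory from inside the process rather than sampling it externally. A minimal sketch using the CUDA runtime's cudaMemGetInfo; where exactly the logging calls go relative to inference is a placeholder:

```c
/* Sketch: log free/total device memory from inside the process with
 * cudaMemGetInfo, catching the state at the moment of interest instead of
 * hoping nvidia-smi samples at the right time. */
#include <cuda_runtime.h>
#include <stdio.h>

static void log_gpu_memory(const char *tag) {
    size_t free_b = 0, total_b = 0;
    if (cudaMemGetInfo(&free_b, &total_b) == cudaSuccess)
        fprintf(stderr, "[%s] %zu MiB free / %zu MiB total\n",
                tag, free_b >> 20, total_b >> 20);
}

int main(void) {
    log_gpu_memory("before inference");
    /* ... allocate buffers / run the network here ... */
    log_gpu_memory("after inference");
    return 0;
}
```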

@ajsander
Author

Yes, it's the same versions of Caffe and DIGITS. I just tried the test-image button in the web UI and got that error. I was watching nvidia-smi with the -l option and the memory that's used (~90 MB) doesn't appear to be released.
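
As an alternative to polling with `nvidia-smi -l`, the same counters can be read programmatically through NVML, the library nvidia-smi is built on. A hedged sketch; device index 0 and the one-second, ten-sample loop are arbitrary choices:

```c
/* Sketch: poll GPU memory usage via NVML, similar to `nvidia-smi -l 1`
 * but from code. Build (illustrative): gcc poll_mem.c -lnvidia-ml -o poll_mem */
#include <nvml.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    nvmlDevice_t dev;
    nvmlMemory_t mem;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS) {
        for (int i = 0; i < 10; ++i) {          /* ten samples, one per second */
            if (nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS)
                printf("used: %llu MiB\n", (unsigned long long)(mem.used >> 20));
            sleep(1);
        }
    }
    nvmlShutdown();
    return 0;
}
```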

@lukeyeager
Member

Alright, can you give me a little more information?

  • GPU[s]
  • Driver version
  • CUDA version
  • cuDNN version
  • Caffe version
  • DIGITS version
  • Network architecture (AlexNet, GoogLeNet, both?)

@ajsander
Author

Running on an Amazon g2.8xlarge
GPU[s]: 4x GRID K520
CUDA 7.0
cuDNN 7.0
Caffe version 0.12 NVIDIA fork
DIGITS 2.1

Both AlexNet and GoogLeNet experienced the same problem.

@lukeyeager
Member

I was able to verify the same issue with the v2.0 web installer as well, which makes this a pretty serious bug. Unfortunately, I don't have time to fight with compilation on AWS right now. I've refiled this bug at NVIDIA/caffe#34.

@lukeyeager
Member

@ajsander can you try using the v3.0 RC3 deb packages to see if the issue persists?

https://github.com/NVIDIA/DIGITS/blob/digits-3.0/docs/UbuntuInstall.md

@lukeyeager
Member

I'm going to close this.

A lot of code has changed in cuDNN, Caffe and DIGITS since then. This has likely been fixed. Please reply to this thread if you still see this issue with DIGITS >= 3.0.

@apolo74

apolo74 commented Nov 29, 2017

Hi guys, I'm experiencing the same error and I was wondering if you found a way to fix this.
My info:
Laptop: Intel Core i7-6500U CPU @ 2.50 GHz × 4, 12 GB RAM (Ubuntu 16.04, 64-bit)
GPU: NVIDIA GeForce 940M, 2 GB
Working with nvidia-docker
Caffe: 0.15.14 (NVIDIA fork)
DIGITS: 6.0.0
