GPU Memory Usage for Multiple GPUs #1399

Closed

liuxianming opened this issue Nov 4, 2014 · 4 comments
@liuxianming

[screenshot: nvidia-smi output showing a Caffe process holding memory on GPU 0 while training runs on GPU 1]
Hi guys,

I'm wondering whether anyone has posted this issue before. I'm working on a machine with two Tesla K40 GPUs. When I train a network on GPU 1, a thread on GPU 0 takes a portion of GPU memory. I used to see this problem in an early version of Caffe, where the process on GPU 0 held a duplicate memory block but its GPU utilization was 0. Now it only takes a portion of the total memory usage.

I guess some initialization in the code is fixed to use GPU 0, but where is it?
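A minimal sketch (not from the issue) of the suspected mechanism: with the CUDA runtime, the current device is a per-thread setting that defaults to device 0, so any thread that makes a CUDA call without first calling cudaSetDevice() gets a context, and therefore memory, on GPU 0 even when the main thread selected GPU 1.

```cpp
// Illustrative sketch only: a worker thread that never calls cudaSetDevice()
// initializes its CUDA context on device 0, even though the main thread
// selected device 1 (as caffe::Caffe::SetDevice does).
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

int main() {
  cudaSetDevice(1);  // main thread works on GPU 1

  std::thread worker([] {
    // No cudaSetDevice() here, so the per-thread default (device 0) applies.
    cudaEvent_t ev;
    cudaEventCreate(&ev);  // first CUDA call in this thread: context on GPU 0
    int dev = -1;
    cudaGetDevice(&dev);
    std::printf("worker thread is bound to device %d\n", dev);  // prints 0
    cudaEventDestroy(ev);
  });
  worker.join();
  return 0;
}
```

While the worker's context exists, nvidia-smi shows an allocation on GPU 0, which matches the symptom described above.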

@longjon
Contributor

longjon commented Nov 17, 2014

Should have been fixed by #507, unless something has caused a regression. Are you having this problem with current dev? In the medium term, this should become totally impossible with per-net device settings.

@shelhamer
Member

Echoing that this should be fixed by #507 and will be totally resolved in #1500. Closing, but comment if this is still seen in current master / dev.

@xavigibert

This issue is still unresolved. The DataLayer prefetch thread, "DataLayer::InternalThreadEntry" in data_layer.cpp, indirectly calls "cudaEventCreate" in benchmark.cpp. The full chain is that "DataLayer::InternalThreadEntry()" instantiates an object of class "CPUTimer". "CPUTimer" is derived from "Timer", and the constructor of "Timer" calls "Timer::Init", which calls "cudaEventCreate". Since only the main thread calls caffe::Caffe::SetDevice, this other thread defaults to GPU 0. On my machine, this results in a memory allocation of about 38 MiB on GPU 0 for each instance of Caffe (see my example below).

[screenshot: nvidia-smi output showing each Caffe instance allocating ~38 MiB on GPU 0]

In my application, I cannot run two instances of Caffe on a GTX 690 because it exceeds the 2048 MB per-GPU limit, so I have had to fix this problem by modifying class CPUTimer so that it no longer inherits from Timer. If you find my fix acceptable, please let me know and I will create and submit a pull request.
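A rough sketch of the kind of fix described above: a host-only timer that never touches the CUDA runtime, so constructing it in the prefetch thread cannot create a context on GPU 0. The class name is hypothetical and std::chrono is used only to keep the sketch self-contained; this is not Caffe's actual code.

```cpp
#include <chrono>

// Hypothetical stand-in for a CPUTimer that no longer inherits from the
// CUDA-backed Timer: no cudaEventCreate, so no context on GPU 0.
class HostOnlyTimer {
  using Clock = std::chrono::steady_clock;

 public:
  void Start() { start_ = Clock::now(); running_ = true; }
  void Stop()  { stop_ = Clock::now(); running_ = false; }

  // Elapsed wall-clock time in milliseconds; no CUDA events involved.
  double MilliSeconds() const {
    const Clock::time_point end = running_ ? Clock::now() : stop_;
    return std::chrono::duration<double, std::milli>(end - start_).count();
  }

 private:
  Clock::time_point start_{}, stop_{};
  bool running_ = false;
};
```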

xavigibert pushed a commit to xavigibert/caffe that referenced this issue Aug 14, 2015
…as causing the allocation of 38MiB of memory on GPU 0 when we instructed caffe to run on another GPU.
@yeyun111

yeyun111 commented Apr 27, 2016

I ran into the same issue, and finally traced it to the use of OpenCV in preprocessing, particularly cv::merge, cv::subtract, and cv::split, which are common operations in Caffe.
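One hedged workaround, not mentioned in this thread, if OpenCV (or anything else in the process) was built with CUDA support: hide GPU 0 from the process via CUDA_VISIBLE_DEVICES before any CUDA call is made, either from the shell (CUDA_VISIBLE_DEVICES=1 ./caffe train ...) or at the very top of main(). After remapping, the remaining physical GPU is enumerated as device 0.

```cpp
// Illustrative sketch: restrict the whole process to physical GPU 1 so that
// no library (Caffe, OpenCV, cuDNN, ...) can allocate anything on GPU 0.
#include <cstdlib>

int main(int argc, char** argv) {
  // Must run before the first CUDA runtime call anywhere in the process;
  // otherwise the variable is ignored by the already-initialized driver.
  setenv("CUDA_VISIBLE_DEVICES", "1", /*overwrite=*/1);

  // ... initialize Caffe / OpenCV preprocessing as usual; the only visible
  // GPU is now enumerated as device 0 ...
  return 0;
}
```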
