GPU Memory Usage for Multiple GPUs #1399

Closed

liuxianming opened this issue Nov 4, 2014 · 4 comments
@liuxianming

[screenshot: nvidia-smi output showing a Caffe process holding memory on GPU 0 while training runs on GPU 1]
Hi guys,

I'm wondering whether anyone has posted this issue before. I'm working on a machine with two Tesla K40 GPUs. When I train a network on GPU 1, a thread on GPU 0 takes a portion of GPU memory. I used to see this problem in an early version of Caffe, where the process on GPU 0 held a duplicate memory block but its GPU utilization was 0. Now it only takes a portion of the total memory usage.

I guess some initialization in the code is fixed to use GPU 0, but where is it?
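A minimal sketch (not from the issue) of the suspected mechanism: with the CUDA runtime, the current device is a per-thread setting that defaults to device 0, so any thread that makes a CUDA call without first calling cudaSetDevice() gets a context, and therefore memory, on GPU 0 even when the main thread selected GPU 1.

```cpp
// Illustrative sketch only: a worker thread that never calls cudaSetDevice()
// initializes its CUDA context on device 0, even though the main thread
// selected device 1 (as caffe::Caffe::SetDevice does).
#include <cuda_runtime.h>
#include <cstdio>
#include <thread>

int main() {
  cudaSetDevice(1);  // main thread works on GPU 1

  std::thread worker([] {
    // No cudaSetDevice() here, so the per-thread default (device 0) applies.
    cudaEvent_t ev;
    cudaEventCreate(&ev);  // first CUDA call in this thread: context on GPU 0
    int dev = -1;
    cudaGetDevice(&dev);
    std::printf("worker thread is bound to device %d\n", dev);  // prints 0
    cudaEventDestroy(ev);
  });
  worker.join();
  return 0;
}
```

While the worker's context exists, nvidia-smi shows an allocation on GPU 0, which matches the symptom described above.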

@longjon
Contributor

longjon commented Nov 17, 2014

Should have been fixed by #507, unless something has caused a regression. Are you having this problem with current dev? In the medium term, this should become totally impossible with per-net device settings.

@shelhamer
Member

Echoing that this should be fixed by #507 and will be totally resolved in #1500. Closing, but comment if this is still seen in current master / dev.

@xavigibert

This issue is still unresolved. The DataLayer prefetch thread, "DataLayer::InternalThreadEntry" in data_layer.cpp, indirectly calls "cudaEventCreate" in benchmark.cpp. The full chain is that "DataLayer::InternalThreadEntry()" instantiates an object of class "CPUTimer". "CPUTimer" is derived from "Timer", and the constructor of "Timer" calls "Timer::Init", which calls "cudaEventCreate". Since only the main thread calls caffe::Caffe::SetDevice, this other thread defaults to GPU 0. On my machine, this results in a memory allocation of about 38 MiB on GPU 0 for each instance of Caffe (see my example below).

[screenshot: nvidia-smi output showing each Caffe instance allocating ~38 MiB on GPU 0]

In my application, I cannot run two instances of Caffe on a GTX 690 because it exceeds the 2048 MB per-GPU limit, so I have had to fix this problem by modifying class CPUTimer so that it no longer inherits from Timer. If you find my fix acceptable, please let me know and I will create and submit a pull request.
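A rough sketch of the kind of fix described above: a host-only timer that never touches the CUDA runtime, so constructing it in the prefetch thread cannot create a context on GPU 0. The class name is hypothetical and std::chrono is used only to keep the sketch self-contained; this is not Caffe's actual code.

```cpp
#include <chrono>

// Hypothetical stand-in for a CPUTimer that no longer inherits from the
// CUDA-backed Timer: no cudaEventCreate, so no context on GPU 0.
class HostOnlyTimer {
  using Clock = std::chrono::steady_clock;

 public:
  void Start() { start_ = Clock::now(); running_ = true; }
  void Stop()  { stop_ = Clock::now(); running_ = false; }

  // Elapsed wall-clock time in milliseconds; no CUDA events involved.
  double MilliSeconds() const {
    const Clock::time_point end = running_ ? Clock::now() : stop_;
    return std::chrono::duration<double, std::milli>(end - start_).count();
  }

 private:
  Clock::time_point start_{}, stop_{};
  bool running_ = false;
};
```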

xavigibert pushed a commit to xavigibert/caffe that referenced this issue Aug 14, 2015
…as causing the allocation of 38MiB of memory on GPU 0 when we instructed caffe to run on another GPU.
@yeyun111

yeyun111 commented Apr 27, 2016

I ran into the same issue, and finally traced it to the use of OpenCV in preprocessing, particularly cv::merge, cv::subtract, and cv::split, which are common operations in Caffe.
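One hedged workaround, not mentioned in this thread, if OpenCV (or anything else in the process) was built with CUDA support: hide GPU 0 from the process via CUDA_VISIBLE_DEVICES before any CUDA call is made, either from the shell (CUDA_VISIBLE_DEVICES=1 ./caffe train ...) or at the very top of main(). After remapping, the remaining physical GPU is enumerated as device 0.

```cpp
// Illustrative sketch: restrict the whole process to physical GPU 1 so that
// no library (Caffe, OpenCV, cuDNN, ...) can allocate anything on GPU 0.
#include <cstdlib>

int main(int argc, char** argv) {
  // Must run before the first CUDA runtime call anywhere in the process;
  // otherwise the variable is ignored by the already-initialized driver.
  setenv("CUDA_VISIBLE_DEVICES", "1", /*overwrite=*/1);

  // ... initialize Caffe / OpenCV preprocessing as usual; the only visible
  // GPU is now enumerated as device 0 ...
  return 0;
}
```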
