Can't Train with Multiple GPUs (Invalid Device Ordinal) #3376
Comments
This is strange, since it can't even set the device (see lines 32 to 37 in 378d49e).
Problem resolved by checking `nvcc --version` against `nvidia-smi`.
@AlexeyAB I now have a new question. Why is Darknet using GPU 0, even when I explicitly specify only other GPUs?
How much GPU utilization and GPU memory are consumed on GPU-0? Darknet itself should only set the devices shown in lines 42 to 44 in 378d49e; the cuDNN library can use GPU-0 internally, for reasons unknown to me.
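To answer the question about GPU-0's consumption, per-GPU utilization and memory can be watched while training runs. A minimal command-line fragment (assumes `nvidia-smi` is on PATH; the query fields come from its `--help-query-gpu` list):

```shell
# Poll every 5 seconds: device index, GPU utilization, and memory in use.
# A few hundred MB on GPU-0 with ~0% utilization would point at an internal
# cuDNN/runtime context rather than actual training work.
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5
```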
I see, so I guess there's nothing Darknet can do to avoid using GPU-0. That's unfortunate. Thanks anyway!
Solved my problem, great!
Using Tesla V100s and CUDA 9.2. I can compile fine with the following flags.
Also, CUDA_VISIBLE_DEVICES=6,7. However, the following command crashes:
./darknet detector train ../od-yolo/training/top_20_1000.data ../od-yolo/training/top_20.cfg build/darknet/x64/darknet53.conv.74 -gpus 6,7
Both devices are live, and each GPU works fine when used individually. What gives?