GPU memory not released after interrupting the training script #15

LearnerInGithub · 2017-06-26T16:01:43Z

Hi @bharatsingh430, I ran into a problem where GPU memory is not released after I interrupt the training script. In detail: I was training on 2 GPUs ([0,1]). After pressing Ctrl+C to stop the training script, I ran nvidia-smi to check GPU usage and found that only GPU 1 had released its memory, while GPU 0 still held the allocated memory. Even after waiting a long time, the problem persisted. What could cause this, and how can I fix it? PS: I tried killing the Python process, but that did not work. Waiting for your help! Thank you very much!

bharatsingh430 · 2017-06-26T16:07:12Z

You can run ps aux | grep caffe to get the PIDs, then use kill -9 pid on each of them. That should work.
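The same find-and-kill step can be scripted. The following is only a minimal sketch, assuming a Linux machine with a POSIX `ps`; the helper names `matching_pids` and `kill_matching` are hypothetical and not part of this repository. `signal.SIGKILL` is the signal that `kill -9` sends.

```python
import os
import signal
import subprocess


def matching_pids(ps_output, needle, exclude_pid=None):
    """Return PIDs whose command line contains `needle`.

    `ps_output` is expected to look like the output of
    `ps -e -o pid= -o args=`: one "PID COMMAND..." row per line.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed rows
        pid, cmd = int(parts[0]), parts[1]
        if needle in cmd and pid != exclude_pid:
            pids.append(pid)
    return pids


def kill_matching(needle):
    """Send SIGKILL (the `kill -9` signal) to every process matching `needle`."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    ).stdout
    # Exclude this script itself, in case `needle` appears in its own arguments.
    for pid in matching_pids(out, needle, exclude_pid=os.getpid()):
        try:
            os.kill(pid, signal.SIGKILL)
        except (ProcessLookupError, PermissionError):
            pass  # process already gone, or owned by another user


# Example (not executed here): kill_matching("caffe")
```

Substring matching is as broad as `grep caffe`, so it is worth printing the result of `matching_pids` before killing anything.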

LearnerInGithub · 2017-06-28T08:35:01Z

@bharatsingh430 It doesn't always work. This time the processes were killed by following your instructions, but the GPU memory is still occupied and GPU utilization shows 100%. How can I clear the occupied GPU memory? Thank you!
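One way to check whether some process is still holding the memory is to ask nvidia-smi which compute processes it sees. This is a hedged sketch, assuming an nvidia-smi version that supports the `--query-compute-apps` option; the helper names `parse_compute_apps` and `gpu_processes` are hypothetical. It only lists the PIDs and does not free any memory itself.

```python
import subprocess


def parse_compute_apps(csv_text):
    """Parse "pid, used_memory" CSV rows into (pid, MiB-or-None) tuples."""
    rows = []
    for line in csv_text.splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or unexpected rows
        # The memory column may be non-numeric when the driver cannot report it.
        mem = int(fields[1]) if fields[1].isdigit() else None
        rows.append((int(fields[0]), mem))
    return rows


def gpu_processes():
    """Return the compute processes nvidia-smi currently reports."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


# Example (requires an NVIDIA driver): print(gpu_processes())
```

If a PID shows up here that did not match the earlier `grep`, that process is a candidate for what is still holding the memory on GPU 0.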

smuelpeng · 2017-08-24T10:12:49Z

I encountered the same problem: the process became a zombie process whose parent process is "init", so I had to reboot my machine.
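To confirm what state a leftover process is in, the state and parent PID can be read from `/proc` on Linux. The sketch below assumes a Linux `/proc` filesystem; the helper names `parse_state_and_parent` and `process_state` are hypothetical. A state of `Z` means zombie and `D` means uninterruptible sleep; in neither state does `kill -9` take effect right away.

```python
def parse_state_and_parent(status_text):
    """Extract (state letter, parent PID) from /proc/<pid>/status text."""
    state, ppid = None, None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "State":
            tokens = value.split()
            state = tokens[0] if tokens else None  # e.g. "Z", "D", "S", "R"
        elif key == "PPid":
            ppid = int(value)
    return state, ppid


def process_state(pid):
    """Read the state and parent PID of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_state_and_parent(fh.read())


# Example (Linux only): print(process_state(1))
```

A parent PID of 1 matches the "init" parent described above.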
