training is getting killed #886

Closed
vdudhmal opened this issue Aug 8, 2014 · 4 comments

@vdudhmal

vdudhmal commented Aug 8, 2014

Hi there,

I am training Caffe on 6 object classes, following the general approach on the ImageNet Training Recipe page. I have 100 images per class. The batch size is 100 and there are 6 test iterations. However, the training gets killed in the first iteration:
I0808 13:37:18.157456 2527 net.cpp:174] Network initialization done.
I0808 13:37:18.157467 2527 net.cpp:175] Memory required for Data 209716808
I0808 13:37:18.157564 2527 solver.cpp:49] Solver scaffolding done.
I0808 13:37:18.157584 2527 solver.cpp:61] Solving CaffeNet
I0808 13:37:18.157608 2527 solver.cpp:106] Iteration 0, Testing net
I0808 13:46:45.056787 2527 solver.cpp:142] Test score #0: 0.17
I0808 13:46:45.472862 2527 solver.cpp:142] Test score #1: 2.00962
Killed
Done.

Any help regarding training Caffe and debugging is greatly appreciated!

@shelhamer
Member

I've never seen this. Perhaps there's some kind of processing limit on your account?

Please ask on the caffe-users mailing list. As of the latest release we have decided to reserve GitHub issues for development discussion. Thanks!
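One way to check the "processing limit" hypothesis is to print the resource limits that apply to the current account. The following is a minimal sketch, assuming a Unix-like system where Python's standard `resource` module is available; the helper names `describe` and `current_limits` and the choice of which limits to list are illustrative, not something from this thread.

```python
import resource

# Limits whose exhaustion can end or starve a process.
# The constants come from Python's standard library; the dict keys are made up.
LIMITS = {
    "cpu_seconds": resource.RLIMIT_CPU,
    "address_space_bytes": resource.RLIMIT_AS,
    "data_segment_bytes": resource.RLIMIT_DATA,
}


def describe(value):
    """Render RLIM_INFINITY as the word 'unlimited', pass other values through."""
    return "unlimited" if value == resource.RLIM_INFINITY else value


def current_limits():
    """Return {name: (soft, hard)} for the calling process."""
    out = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        out[name] = (describe(soft), describe(hard))
    return out


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

If every value prints as unlimited, an account-level limit of this kind is unlikely to be the cause, and memory exhaustion becomes the more plausible explanation.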

@wangpichao

I ran into a similar problem when using the ImageNet recipe to train and test on my own data. Training runs fine, but during testing the process is killed after about half of my test data (13382 test images) has been processed. What is going wrong? Have you fixed it?

@Luonic

Luonic commented Sep 24, 2016

@wangpichao You probably ran out of RAM or GPU memory. Try reducing the batch size.
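To see why batch size matters, here is a rough back-of-the-envelope sketch of the memory taken by the input blob alone. It assumes 3-channel 227x227 crops stored as 4-byte floats, which matches the reference CaffeNet setup but is not stated anywhere in this thread; `input_blob_bytes` is a hypothetical helper, not a Caffe function.

```python
def input_blob_bytes(batch_size, channels=3, height=227, width=227,
                     bytes_per_value=4):
    """Bytes needed to hold one batch of input images as float32 values.

    The default shape is an assumption (reference CaffeNet crop size).
    """
    return batch_size * channels * height * width * bytes_per_value


if __name__ == "__main__":
    for n in (100, 50, 10):
        mib = input_blob_bytes(n) / 2**20
        print(f"batch_size={n}: {mib:.1f} MiB for the input blob")
```

Every intermediate activation scales with the batch size in the same linear way, so halving the batch roughly halves the activation memory, while the memory for the network weights stays fixed.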

@lwzeng

lwzeng commented Jul 14, 2018

If neither the user nor a sysadmin killed the program, the kernel may have. The kernel only kills a process under exceptional circumstances such as extreme resource starvation (think mem+swap exhaustion). You will find the answer you want at https://stackoverflow.com/questions/726690/what-killed-my-process-and-why
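On Linux, a kill by the kernel's out-of-memory handler normally leaves a trace in the kernel log. The sketch below filters kernel log text for such lines. The function name `find_oom_lines`, the pattern, and the sample log are illustrative assumptions: the sample lines are made up to resemble typical kernel messages and are not taken from this issue, and exact wording varies between kernel versions.

```python
import re

# Phrases that commonly appear in kernel log lines written by the OOM killer.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process",
                         re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the lines of log_text that look like OOM-killer messages."""
    return [line for line in log_text.splitlines()
            if OOM_PATTERN.search(line)]


# Made-up sample for illustration only; not output from the reporter's machine.
SAMPLE_LOG = """\
[1000.000001] usb 1-1: new high-speed USB device number 2
[1000.000002] caffe invoked oom-killer: gfp_mask=0x201da, order=0
[1000.000003] Out of memory: Kill process 1234 (caffe) score 900 or sacrifice child
[1000.000004] Killed process 1234 (caffe) total-vm:100kB, anon-rss:50kB, file-rss:0kB
"""

if __name__ == "__main__":
    for line in find_oom_lines(SAMPLE_LOG):
        print(line)
```

In practice the text would come from the kernel log itself, for example the output of `dmesg`, which may need elevated privileges on some systems.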
