syncedmem.cpp:57] Check failed: error == cudaSuccess (11 vs. 0) #1600

jwessnit · 2014-12-19T15:35:30Z

Hi,

Would anyone know what the problem might be here? cudaMalloc is failing at iteration 0 when I try to train my own network. make runtest passed except for 12 unit tests (see below).

I have seen CUDA error 11 in a couple of other submitted issues, but none of them were raised from syncedmem.cpp.

I1219 14:44:31.277175 4727 net.cpp:208] This network produces output accuracy
I1219 14:44:31.277303 4727 net.cpp:208] This network produces output loss
I1219 14:44:31.277470 4727 net.cpp:467] Collecting Learning Rate and Weight Decay.
I1219 14:44:31.277608 4727 net.cpp:219] Network initialization done.
I1219 14:44:31.277739 4727 net.cpp:220] Memory required for data: 5493236
I1219 14:44:31.278095 4727 solver.cpp:41] Solver scaffolding done.
I1219 14:44:31.278239 4727 solver.cpp:160] Solving testnet
I1219 14:44:31.278445 4727 solver.cpp:247] Iteration 0, Testing net (#0)
F1219 14:44:31.733762 4727 syncedmem.cpp:57] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
@ 0xb2619060 (unknown)
@ 0xb2618f5c (unknown)
@ 0xb2618b78 (unknown)
@ 0xb261af98 (unknown)
Aborted
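
For anyone comparing this failure with the other "error 11" reports, here is a minimal sketch of pulling the source file, line number, numeric codes, and message out of a glog check-failure line like the one above. The function name `parse_cuda_check_failure` and the regular expression are illustrative assumptions, not part of Caffe or glog. The text "invalid argument" is the string the CUDA runtime reports for cudaErrorInvalidValue; the numeric code printed next to it can differ between toolkit versions, so matching on the message is likely the safer comparison.

```python
import re

# Hypothetical pattern for a glog fatal line produced by a failed CUDA check,
# matching the shape of the lines quoted in this thread.
CHECK_RE = re.compile(
    r"(?P<file>[\w.]+):(?P<line>\d+)\] Check failed: error == cudaSuccess "
    r"\((?P<got>\d+) vs\. (?P<expected>\d+)\)\s*(?P<message>.*)"
)


def parse_cuda_check_failure(log_line):
    """Return location, numeric codes, and message as a dict, or None if no match."""
    match = CHECK_RE.search(log_line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "got": int(match.group("got")),
        "expected": int(match.group("expected")),
        "message": match.group("message").strip(),
    }


# The fatal line from the log above.
fatal_line = (
    "F1219 14:44:31.733762 4727 syncedmem.cpp:57] "
    "Check failed: error == cudaSuccess (11 vs. 0) invalid argument"
)
info = parse_cuda_check_failure(fatal_line)
print(info)
```

Running the same function over the failure lines from several reports makes it easy to see whether they share a message but come from different source locations.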

make runtest errors:

[----------] Global test environment tear-down
[==========] 838 tests from 169 test cases ran. (911468 ms total)
[ PASSED ] 826 tests.
[ FAILED ] 12 tests, listed below:
[ FAILED ] NetTest/0.TestParamPropagateDown, where TypeParam = caffe::FloatCPU
[ FAILED ] NetTest/1.TestParamPropagateDown, where TypeParam = caffe::DoubleCPU
[ FAILED ] NetTest/2.TestParamPropagateDown, where TypeParam = caffe::FloatGPU
[ FAILED ] NetTest/3.TestParamPropagateDown, where TypeParam = caffe::DoubleGPU
[ FAILED ] MathFunctionsTest/0.TestSgnbitCPU, where TypeParam = float
[ FAILED ] MathFunctionsTest/0.TestSignCPU, where TypeParam = float
[ FAILED ] MathFunctionsTest/1.TestSignCPU, where TypeParam = double
[ FAILED ] MathFunctionsTest/1.TestSgnbitCPU, where TypeParam = double
[ FAILED ] HingeLossLayerTest/0.TestGradientL1, where TypeParam = caffe::FloatCPU
[ FAILED ] HingeLossLayerTest/1.TestGradientL1, where TypeParam = caffe::DoubleCPU
[ FAILED ] HingeLossLayerTest/2.TestGradientL1, where TypeParam = caffe::FloatGPU
[ FAILED ] HingeLossLayerTest/3.TestGradientL1, where TypeParam = caffe::DoubleGPU

12 FAILED TESTS
YOU HAVE 2 DISABLED TESTS
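
The 12 failures above are easier to read when grouped by test rather than by type parameter. The sketch below does that grouping; the helper name `group_failures` and the regular expression are assumptions made for illustration, not a gtest or Caffe utility.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "[ FAILED ] NetTest/0.TestParamPropagateDown, where TypeParam = caffe::FloatCPU"
FAILED_RE = re.compile(
    r"\[\s*FAILED\s*\]\s+(\w+)/(\d+)\.(\w+), where TypeParam = (.+)"
)


def group_failures(report):
    """Map (test case, test name) to the list of type parameters that failed."""
    groups = defaultdict(list)
    for line in report.splitlines():
        match = FAILED_RE.search(line)
        if match:
            suite, _index, test, type_param = match.groups()
            groups[(suite, test)].append(type_param.strip())
    return dict(groups)


# The failure list copied from the make runtest output above.
report = """\
[ FAILED ] 12 tests, listed below:
[ FAILED ] NetTest/0.TestParamPropagateDown, where TypeParam = caffe::FloatCPU
[ FAILED ] NetTest/1.TestParamPropagateDown, where TypeParam = caffe::DoubleCPU
[ FAILED ] NetTest/2.TestParamPropagateDown, where TypeParam = caffe::FloatGPU
[ FAILED ] NetTest/3.TestParamPropagateDown, where TypeParam = caffe::DoubleGPU
[ FAILED ] MathFunctionsTest/0.TestSgnbitCPU, where TypeParam = float
[ FAILED ] MathFunctionsTest/0.TestSignCPU, where TypeParam = float
[ FAILED ] MathFunctionsTest/1.TestSignCPU, where TypeParam = double
[ FAILED ] MathFunctionsTest/1.TestSgnbitCPU, where TypeParam = double
[ FAILED ] HingeLossLayerTest/0.TestGradientL1, where TypeParam = caffe::FloatCPU
[ FAILED ] HingeLossLayerTest/1.TestGradientL1, where TypeParam = caffe::DoubleCPU
[ FAILED ] HingeLossLayerTest/2.TestGradientL1, where TypeParam = caffe::FloatGPU
[ FAILED ] HingeLossLayerTest/3.TestGradientL1, where TypeParam = caffe::DoubleGPU
"""

groups = group_failures(report)
for (suite, test), params in sorted(groups.items()):
    print(f"{suite}.{test}: {len(params)} variants ({', '.join(params)})")
```

Grouped this way, the 12 entries reduce to four distinct tests, and several of the failing variants are CPU-only.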

dreadlord1984 · 2015-01-19T01:45:00Z

Have you solved this problem, or do you know what causes it?
I hit the same error when fine-tuning the trained model as described here: http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html

I am using Caffe version 0.99. Other training tasks run successfully; only fine-tuning the trained model with different output classes fails.
My error:

math_functions.cpp:90] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
@ 0x7fca72b8edaa (unknown)
@ 0x7fca72b8ece4 (unknown)
@ 0x7fca72b8e6e6 (unknown)
@ 0x7fca72b91687 (unknown)
@ 0x4b6dec caffe::caffe_copy<>()
@ 0x49ff4f caffe::SGDSolver<>::ComputeUpdateValue()
@ 0x49b6d3 caffe::Solver<>::Solve()
@ 0x41046e caffe::Solver<>::Solve()
@ 0x40de4d train()
@ 0x40f508 main
@ 0x7fca6ff3bec5 (unknown)
@ 0x40d6e9 (unknown)
Aborted (core dumped)

mingtop · 2015-01-22T02:36:52Z

I got this problem when using a GT9500 GPU, and fixed it by switching to a GTX750.
Caffe supports compute capability >= 2.0, so you should change your GPU.
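
As a rough way to check the compute-capability explanation against the hardware named in this thread, the sketch below compares each GPU with the 2.0 threshold quoted in the comment above. The capability values are taken from NVIDIA's published GPU tables; the 2.0 minimum is only the figure stated in this thread and has not been verified against Caffe's build configuration. The names `COMPUTE_CAPABILITY`, `MIN_SUPPORTED`, and `meets_minimum` are hypothetical.

```python
# Compute capability (major, minor) for the GPUs mentioned in this thread,
# as listed in NVIDIA's published tables.
COMPUTE_CAPABILITY = {
    "GeForce 9500 GT": (1, 1),
    "GeForce GTX 750": (5, 0),
    "Jetson TK1": (3, 2),
}

# Assumption: the minimum quoted in the comment above, not checked against Caffe itself.
MIN_SUPPORTED = (2, 0)


def meets_minimum(gpu_name, minimum=MIN_SUPPORTED):
    """Return True if the named GPU's compute capability is at least `minimum`."""
    return COMPUTE_CAPABILITY[gpu_name] >= minimum


for name in COMPUTE_CAPABILITY:
    status = "meets" if meets_minimum(name) else "is below"
    print(f"{name} {status} compute capability {MIN_SUPPORTED[0]}.{MIN_SUPPORTED[1]}")
```

By this check the 9500 GT falls below the threshold while the GTX 750 and the Jetson TK1 both clear it, so the threshold alone would not account for a failure on a Jetson TK1.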

jwessnit · 2015-01-22T15:14:30Z

I have not yet solved this problem. Unfortunately, I am using a Jetson TK1, so I can't change GPU.

shelhamer · 2015-02-20T07:05:11Z

The latest release rc2 included the Jetson compatibility changes so it should work fine. Check out this blog post for details: http://jetsonhacks.com/2015/01/17/nvidia-jetson-tk1-caffe-deep-learning-framework/.

Please ask about hardware on the caffe-users group.

leejiajun · 2016-07-29T04:39:51Z

@jwessnit Have you solved this problem?

shelhamer closed this as completed Feb 20, 2015
