make runtest : error == cudaSuccess (77 vs. 0) an illegal memory access was encountered #5058

Closed
mikael10j opened this issue Dec 3, 2016 · 5 comments

Comments

@mikael10j

Hello,

Issue summary

make runtest fails with the following error:
F1203 10:28:14.179427 29002 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
Caffe was built from the latest master.
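
For anyone triaging a similar log, here is a minimal sketch of pulling the numeric CUDA error code out of a glog "Check failed" line. The function name `cuda_error_code` is hypothetical and exists only for illustration. In the CUDA 8.0 toolkit used here, code 77 corresponds to `cudaErrorIllegalAddress`; the numbering differs in newer toolkits, so check the headers of the installed version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print the numeric
# CUDA error code from any "error == cudaSuccess (N vs. 0)" check failure.
cuda_error_code() {
  sed -n 's/.*error == cudaSuccess (\([0-9][0-9]*\) vs\. 0).*/\1/p'
}

# The failing line from this report.
line='F1203 10:28:14.179427 29002 math_functions.cu:79] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered'

code=$(printf '%s\n' "$line" | cuda_error_code)
echo "$code"   # prints 77
```

The same function can be fed a whole log file, for example `cuda_error_code < runtest.log`, where `runtest.log` is an assumed file name.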

Steps to reproduce

cmake ..
make all --jobs=4 -d
make install
make runtest

Your system configuration

Operating system: Ubuntu 16.04
Compiler: c++
CUDA version (if applicable): 8.0
CUDNN version (if applicable): 5.1.5
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7.12
GPU: GTX 1070

Trace

*** Check failure stack trace: ***
@ 0x7f9aa8b835cd google::LogMessage::Fail()
@ 0x7f9aa8b85433 google::LogMessage::SendToLog()
@ 0x7f9aa8b8315b google::LogMessage::Flush()
@ 0x7f9aa8b85e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9aa9978d9a caffe::caffe_gpu_memcpy()
@ 0x7f9aa98e36d0 caffe::SyncedMemory::cpu_data()
@ 0x7f9aa9775503 caffe::Blob<>::cpu_diff()
@ 0xcb9eb9 caffe::GradientChecker<>::CheckGradientSingle()
@ 0xcbb1fc caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0xcc4728 caffe::SPPLayerTest_TestGradient_Test<>::TestBody()
@ 0xde59e3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0xdde91a testing::Test::Run()
@ 0xddea68 testing::TestInfo::Run()
@ 0xddeb75 testing::TestCase::Run()
@ 0xde070f testing::internal::UnitTestImpl::RunAllTests()
@ 0xde0a33 testing::UnitTest::Run()
@ 0x894f7d main
@ 0x7f9aa288f830 __libc_start_main
@ 0x8976e9 _start
@ (nil) (unknown)
Aborted (core dumped)
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed
make: *** [runtest] Error 2

Thanks for your help

@williford
Contributor

Can you add your build configuration file?

@mikael10j
Author

Yes, of course. I followed this tutorial: https://github.com/BVLC/caffe/wiki/Ubuntu-16.04-or-15.10-Installation-Guide
Makefile.config.txt
Yesterday I reinstalled everything from scratch to make sure nouveau was correctly disabled and that I hadn't missed anything. The first runtest passed, but the second gave me CURAND_STATUS_ERROR (201) and the third gave the illegal memory access error. By the way, I noticed that although I had installed version 375.20 of the NVIDIA driver, version 367.57 was installed along with CUDA 8 and is the one actually in use.
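
A driver mismatch like this can be confirmed with a quick check of what the kernel module and `nvidia-smi` each report. The sketch below assumes a Linux system with the proprietary NVIDIA driver; the function name `report_nvidia_driver` is made up for illustration, and the fallback messages are only there so the script still runs on a machine without a GPU.

```shell
#!/bin/sh
# Hypothetical helper: print the driver version seen by the kernel module
# and the one seen by nvidia-smi, so the two can be compared.
report_nvidia_driver() {
  if [ -r /proc/driver/nvidia/version ]; then
    # This file is provided by the loaded NVIDIA kernel module;
    # its first line contains the module version.
    head -n 1 /proc/driver/nvidia/version
  else
    echo "/proc/driver/nvidia/version not readable; the NVIDIA kernel module may not be loaded"
  fi

  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version --format=csv,noheader \
      || echo "nvidia-smi is installed but could not query the driver"
  else
    echo "nvidia-smi not found on PATH"
  fi
}

report_nvidia_driver
```

If the two versions disagree, or differ from the version that was installed by hand, the CUDA installer most likely brought its own driver package.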

@mikael10j
Author

mikael10j commented Dec 10, 2016

Over the last few days I ran some experiments. I gave up on Ubuntu 16.04.
I installed Ubuntu 14.04 (with kernel 3.13) and CUDA 8.0.44 (with driver 367.48).
Then I installed Caffe, first from the latest master and then NVIDIA's version, with DIGITS. Both times:
runtest passed
the CUDA nbody example (with 256000 bodies) passed
training AlexNet failed after several minutes (~1 epoch) with error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
after that, CUDA nbody failed for more than 5320 bodies with the same error

rebooting, even after unplugging the computer for a moment, doesn't change anything
reinstalling the NVIDIA driver worked only once
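
The nbody runs described above can be scripted to find the smallest problem size at which the card starts failing. This is only a sketch: `first_failing` and `run_nbody` are hypothetical names, the path to the `nbody` binary is an assumption, and the sizes to try are arbitrary. The `-benchmark` and `-numbodies=` options belong to the CUDA nbody sample.

```shell
#!/bin/sh
# first_failing CMD SIZE...
# Runs "CMD SIZE" for each size in the order given and prints the first
# size whose command exits non-zero. Prints nothing if every run succeeds.
first_failing() {
  cmd=$1
  shift
  for n in "$@"; do
    if ! "$cmd" "$n" >/dev/null 2>&1; then
      echo "$n"
      return 0
    fi
  done
  return 0
}

# Hypothetical wrapper around the CUDA nbody sample.
# Assumes the compiled sample sits in the current directory.
run_nbody() {
  ./nbody -benchmark -numbodies="$1"
}

# Example invocation (not executed here):
#   first_failing run_nbody 1024 2048 4096 8192 16384
```

A threshold that appears only after a long training run and then persists across reboots, as reported here, points more toward the hardware than toward the software stack.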

Does anyone have an idea? Could it be a hardware issue?

Thanks for your help

@mikael10j
Author

This is not a Caffe issue. The problem was my GPU card. I replaced it and have had no problems so far.

@zhonhel

zhonhel commented Apr 22, 2018

Having the same issue with an NVIDIA 1070.
