ubuntu16.04 runtest failed #4235

Closed
auroua opened this issue May 30, 2016 · 9 comments

Comments

@auroua

auroua commented May 30, 2016

I ran make all and make test successfully, but make runtest fails. The error trace:

[----------] 9 tests from AdaGradSolverTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] AdaGradSolverTest/2.TestAdaGradLeastSquaresUpdateWithWeightDecay
*** Aborted at 1464576931 (unix time) try "date -d @1464576931" if you are using GNU date ***
PC: @     0x7f99aa977143 (unknown)
*** SIGSEGV (@0x706d742f) received by PID 12752 (TID 0x7f99b1f20740) from PID 1886221359; stack trace: ***
    @     0x7f99a8de43d0 (unknown)
    @     0x7f99aa977143 (unknown)
    @           0x480463 caffe::MakeTempDir()
    @           0x49444a caffe::GradientBasedSolverTest<>::RunLeastSquaresSolver()
    @           0x4a533d caffe::GradientBasedSolverTest<>::TestLeastSquaresUpdate()
    @           0x893d13 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0x88a9f7 testing::Test::Run()
    @           0x88aa9e testing::TestInfo::Run()
    @           0x88aba5 testing::TestCase::Run()
    @           0x88dee8 testing::internal::UnitTestImpl::RunAllTests()
    @           0x88e177 testing::UnitTest::Run()
    @           0x46215f main
    @     0x7f99a8a2a830 (unknown)
    @           0x469649 _start
Makefile:523: recipe for target 'runtest' failed
make: *** [runtest] Segmentation fault (core dumped)

I have only one GPU (an NVS 5400M) and one CPU.
How can I solve this problem?

@seanbell

It looks like the crash happened in MakeTempDir:

inline void MakeTempDir(string* temp_dirname) {

You could try running in debug mode (see the end of the Makefile) to get a better stack trace.
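
One detail in the trace above may help narrow this down. The SIGSEGV line reports the fault address 0x706d742f, and read as four bytes in little-endian order (as on x86) that value spells "/tmp". That would be consistent with path text being used where a pointer was expected somewhere around MakeTempDir, though it is only an observation about the log, not a confirmed diagnosis. A minimal Python sketch of the decoding (decode_fault_address is a hypothetical helper, not part of Caffe):

```python
import struct


def decode_fault_address(addr: int) -> bytes:
    """Return the low 32 bits of addr as bytes in little-endian memory order."""
    return struct.pack("<I", addr & 0xFFFFFFFF)


if __name__ == "__main__":
    # Fault address copied from the SIGSEGV line in the report.
    print(decode_fault_address(0x706D742F))  # b'/tmp'
```

The same number also shows up in decimal in the log, as the bogus "from PID 1886221359" value.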

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

When reporting a bug, it's most helpful to provide the following information, where applicable:

  • What steps reproduce the bug?
  • Can you reproduce the bug using the latest master, compiled with the DEBUG make option?
  • What hardware and operating system/distribution are you running?
  • If the bug is a crash, provide the backtrace (usually printed by Caffe; always obtainable with gdb).

@jaredstarkey

I am also having some issues in 16.04 that are related to HDF5. I'm not sure if they're related. The comments I'm seeing are related to HDF5 threadsafe builds. Since I'm just using the package from the repo, I'm a little in the dark.

Caffe was built with defaults using make all and make test. However, I did have to modify the LIBRARIES line in the Makefile to get it to build:

LIBRARIES += glog gflags protobuf boost_system boost_filesystem m hdf5_serial_hl hdf5_serial

user@system:~/proj/caffe/build/test$ sudo apt-get install libhdf5-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libhdf5-dev is already the newest version (1.8.16+docs-4ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.

user@system:~/proj/caffe$ sudo apt-get install libhdf5-serial-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libhdf5-serial-dev is already the newest version (1.8.16+docs-4ubuntu1).
0 upgraded, 0 newly installed, 0 to remove and 21 not upgraded.

user@system:~/proj/caffe/build/test$ ./test_hdf5data_layer.testbin
Cuda number of devices: 1
Current device id: 0
Current device name: GeForce GTX 980
[==========] Running 4 tests from 4 test cases.
[----------] Global test environment set-up.
[----------] 1 test from HDF5DataLayerTest/0, where TypeParam = caffe::CPUDevice
[ RUN ] HDF5DataLayerTest/0.TestRead
F0607 09:15:14.416803 3585 hdf5_data_layer.cpp:88] Failed to open source file: src/caffe/test/test_data/sample_data_list.txt
*** Check failure stack trace: ***
@ 0x7fa9d0ae45cd google::LogMessage::Fail()
@ 0x7fa9d0ae6433 google::LogMessage::SendToLog()
@ 0x7fa9d0ae415b google::LogMessage::Flush()
@ 0x7fa9d0ae6e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7fa9cf9c9c05 caffe::HDF5DataLayer<>::LayerSetUp()
@ 0x41080b caffe::HDF5DataLayerTest_TestRead_Test<>::TestBody()
@ 0x431413 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x42a52a testing::Test::Run()
@ 0x42a678 testing::TestInfo::Run()
@ 0x42a755 testing::TestCase::Run()
@ 0x42ba2f testing::internal::UnitTestImpl::RunAllTests()
@ 0x42bd53 testing::UnitTest::Run()
@ 0x408f6d main
@ 0x7fa9cef9a830 __libc_start_main
@ 0x409359 _start
@ (nil) (unknown)
Aborted (core dumped)

I'm also having some errors with GPU extensions, but am on nvidia-367 from the driver ppa to support a GTX 1080. I have a GTX 980 that I'm using to rule out potential issues with the card, CUDA api dev issues, and the upstream drivers.

Please let me know if I can provide any additional details to help with debugging or troubleshooting.

@jaredstarkey

Also relevant:

user@system:~/proj/caffe/src/caffe/test/test_data$ ls -al
total 60
drwxrwxr-x 2 user user 4096 Jun 6 15:31 .
drwxrwxr-x 3 user user 4096 Jun 2 14:57 ..
-rw-rw-r-- 1 user user 2104 Jun 2 14:57 generate_sample_data.py
-rw-rw-r-- 1 user user 15446 Jun 2 14:57 sample_data_2_gzip.h5
-rw-rw-r-- 1 user user 11824 Jun 2 14:57 sample_data.h5
-rw-rw-r-- 1 user user 87 Jun 2 14:57 sample_data_list.txt
-rw-rw-r-- 1 user user 11776 Jun 2 14:57 solver_data.h5
-rw-rw-r-- 1 user user 40 Jun 2 14:57 solver_data_list.txt
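
Read together with the earlier log, this listing points to a possible cause that is independent of HDF5 itself: sample_data_list.txt exists under ~/proj/caffe/src/caffe/test/test_data, but the failing test was launched from ~/proj/caffe/build/test, and the path in the error message (src/caffe/test/test_data/sample_data_list.txt) is relative. A relative path is resolved against the current working directory, so the same test binary may find the file when started from the repository root and miss it when started from build/test. The Python sketch below only illustrates that resolution; resolve_from is a hypothetical helper, and whether this accounts for the failure here is an assumption.

```python
import posixpath


def resolve_from(cwd: str, relative_path: str) -> str:
    """Return the location a relative path refers to when opened from cwd."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))


# Relative path taken from the "Failed to open source file" message.
SOURCE = "src/caffe/test/test_data/sample_data_list.txt"

if __name__ == "__main__":
    # Directory the test binary was launched from in the log.
    print(resolve_from("~/proj/caffe/build/test", SOURCE))
    # Repository root, which is where the listing shows the file living.
    print(resolve_from("~/proj/caffe", SOURCE))
```

If that is what happened, starting the binary as build/test/test_hdf5data_layer.testbin from ~/proj/caffe would be the first thing to try.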

@jayinai

jayinai commented Jun 25, 2016

@jaredstarkey hey I am also running Ubuntu 16.04 with GTX 1080. Did you solve the problem? I am having the exact same problem as yours :(

@jaredstarkey

jaredstarkey commented Jun 28, 2016 via email

@jayinai

jayinai commented Jun 28, 2016

@jaredstarkey hey I solved the problem by uninstalling driver v.361

sudo apt-get remove --purge nvidia-361

Now runtest passes. I have to rebuild after every reboot, though. There must be a way to avoid that, but I rarely shut down the server, so I'm not motivated to find the solution :p

@Solomon1588

Solomon1588 commented Jul 22, 2016

I ran into this problem and found the cause:
The libraries in the Ubuntu 15.10 and 16.04 repositories are compiled with GCC 5.2. If you compile CUDA and Caffe with a lower version (e.g. GCC 4.9), the linked libraries are incompatible. The result is that Caffe compiles successfully but fails with a runtime error.
You can recompile CUDA and Caffe with GCC 5.x, or build protobuf, and possibly all other system packages that export C++ functions containing std::string, std::vector, etc. (e.g. glog, gflags, boost), with GCC 4.9.
See my post for more information.
Hope this helps :D
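
For background on why the GCC versions matter: GCC 5.1 introduced a new libstdc++ ABI for std::string and std::list (controlled by the _GLIBCXX_USE_CXX11_ABI macro), so objects built with an older compiler can disagree with GCC 5 system libraries about those types. A rough Python sketch of the version comparison, assuming upstream GCC defaults (distributions may configure this differently) and using hypothetical helper names:

```python
def uses_new_abi_by_default(gcc_version: str) -> bool:
    """Assume upstream defaults: the new libstdc++ ABI is on from GCC 5 onward."""
    major = int(gcc_version.split(".")[0])
    return major >= 5


def abi_mismatch(system_gcc: str, build_gcc: str) -> bool:
    """True when the two compilers fall on different sides of the ABI change."""
    return uses_new_abi_by_default(system_gcc) != uses_new_abi_by_default(build_gcc)


if __name__ == "__main__":
    # Versions mentioned in this comment: system libraries vs. a local build.
    print(abi_mismatch("5.2", "4.9"))  # True
    print(abi_mismatch("5.2", "5.4"))  # False
```

This only compares version strings; it does not inspect the installed libraries, so treat a True result as a hint to check how each dependency was built.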

@pythonanonuser

I'm not using CUDA on Ubuntu 16.04 (CPU_ONLY mode), but I'm still having the same issue. Any ideas?

@shelhamer
Member

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.
