Convolution error #15

Closed
RutenburgIG opened this issue Oct 19, 2017 · 2 comments

Comments

@RutenburgIG

RutenburgIG commented Oct 19, 2017

Caffe compiled with:

cmake .. -DPROTOBUF_INCLUDE_DIR="/beegfs/120x/home/ilia/protobuf/include/" -DUSE_NCCL=True -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN="30 35 50 52 60 61 62 70" -DCUDA_ARCH_PTX="30 35 50 52 60 61 62 70" -DCUDA_NVCC_FLAGS=--Wno-deprecated-gpu-targets -Wno-dev

-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   system
--   thread
--   filesystem
-- Found gflags  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libgflags.so)
-- Found glog    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libglog.so)
-- Found PROTOBUF Compiler: /beegfs/120x/home/ilia/protobuf/bin/protoc
-- Found lmdb    (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/liblmdb.so)
-- Found LevelDB (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libleveldb.so)
-- Found Snappy  (include: /usr/include, library: /usr/lib/libsnappy.so)
-- Found JPEGTurbo: /usr/include
-- CUDA detected: 9.0
-- Found CUDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so (found version "7.0")
-- Added CUDA NVCC flags for: sm_30 sm_35 sm_50 sm_52 sm_60 sm_61 sm_62 sm_70 compute_30 compute_35 compute_50 compute_52 compute_60 compute_61 compute_62 compute_70
-- Found OpenCV 2.x: /usr/share/OpenCV
-- Found Atlas: /usr/include
-- Found Atlas (include: /usr/include, library: /usr/lib/libatlas.so)
-- Found PythonInterp: /beegfs/120x/home/ilia/nvcaffe_comp/bin/python2.7 (found suitable version "2.7.6", minimum required is "2.7")
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython2.7.so (found suitable version "2.7.6", minimum required is "2.7")
-- Found NumPy: /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include (found suitable version "1.13.1", minimum required is "1.7.1")
-- NumPy ver. 1.13.1 found (include: /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include)
-- Boost version: 1.54.0
-- Found the following Boost libraries:
--   python
-- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE)
-- Found NCCL: /usr/include
-- Found NCCL (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libnccl.so)
-- Found NVML: /usr/include
-- Found NVML (include: /usr/include, library: /usr/lib/nvidia-384/libnvidia-ml.so)
-- Found Git: /usr/bin/git (found version "1.9.1")
--
-- ******************* Caffe Configuration Summary *******************
-- General:
--   Version           :   0.16.4
--   Git               :   v0.16.1-404-g860701c
--   System            :   Linux
--   C++ compiler      :   /usr/bin/c++
--   Release CXX flags :   -O3 -DNDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
--   Debug CXX flags   :   -g -DDEBUG -fPIC -Wall -std=c++11 -Wno-sign-compare -Wno-uninitialized
--   Build type        :   Release
--
--   BUILD_SHARED_LIBS :   ON
--   BUILD_python      :   ON
--   BUILD_matlab      :   OFF
--   BUILD_docs        :   ON
--   CPU_ONLY          :   OFF
--   USE_LEVELDB       :   ON
--   USE_LMDB          :   ON
--   ALLOW_LMDB_NOLOCK :   OFF
--   TEST_FP16         :   OFF
--
-- Dependencies:
--   BLAS              :   Yes (Atlas)
--   Boost             :   Yes (ver. 1.54)
--   glog              :   Yes
--   gflags            :   Yes
--   protobuf          :   Yes (ver. 3.4.0)
--   lmdb              :   Yes (ver. 0.9.10)
--   LevelDB           :   Yes (ver. 1.15)
--   Snappy            :   Yes (ver. 1.1.0)
--   OpenCV            :   Yes (ver. 2.4.8)
--   JPEGTurbo         :   No
--   CUDA              :   Yes (ver. 9.0)
--
-- NVIDIA CUDA:
--   Target GPU(s)     :   Manual
--   GPU arch(s)       :   sm_30 sm_35 sm_50 sm_52 sm_60 sm_61 sm_62 sm_70 compute_30 compute_35 compute_50 compute_52 compute_60 compute_61 compute_62 compute_70
--   cuDNN             :   Yes (ver. 7.0)
--   NCCL              :   Yes (ver. 2.0.5)
--   NVML              :   /usr/lib/nvidia-384/libnvidia-ml.so
--
-- Python:
--   Interpreter       :   /beegfs/120x/home/ilia/nvcaffe_comp/bin/python2.7 (ver. 2.7.6)
--   Libraries         :   /usr/lib/x86_64-linux-gnu/libpython2.7.so (ver 2.7.6)
--   NumPy             :   /beegfs/120x/home/ilia/nvcaffe_comp/local/lib/python2.7/site-packages/numpy/core/include (ver 1.13.1)
--
-- Documentaion:
--   Doxygen           :   No
--   config_file       :
--
-- Install:
--   Install path      :   /beegfs/120x/home/ilia/caffe_builds/nvc/build/install
--
-- Configuring done
-- Generating done
-- Build files have been written to: /beegfs/120x/home/ilia/caffe_builds/nvc/build

When I tried to run the training process with:
./build/tools/caffe train -solver='solver.prototxt'

I got the following error:

I1019 15:17:12.441572 108568 solver.cpp:315] Iteration 0 (0.371277 s), loss = 1383.36
I1019 15:17:12.441620 108568 solver.cpp:332]     Train net output #0: loss_bbox = 8.39254e-06 (* 100 = 0.000839254 loss)
I1019 15:17:12.441634 108568 solver.cpp:332]     Train net output #1: loss_cls = 2.47564 (* 500 = 1237.82 loss)
I1019 15:17:12.441706 108568 solver.cpp:332]     Train net output #2: rpn_cls_loss = 0.693479 (* 100 = 69.3479 loss)
I1019 15:17:12.441738 108568 solver.cpp:332]     Train net output #3: rpn_loss_bbox = 0.93409 (* 100 = 93.409 loss)
I1019 15:17:12.441750 108568 sgd_solver.cpp:136] Iteration 0, lr = 5e-05, m = 0.5

*** Aborted at 1508415432 (unix time) try "date -d @1508415432" if you are using GNU date ***
PC: @     0x7f8922466b8d caffe::CuDNNConvolutionLayer<>::FindExConvAlgo()
*** SIGSEGV (@0x0) received by PID 108568 (TID 0x7f89242d4900) from PID 0; stack trace: ***
    @     0x7f8920263cb0 (unknown)
    @     0x7f8922466b8d caffe::CuDNNConvolutionLayer<>::FindExConvAlgo()
    @     0x7f892248b9f0 caffe::CuDNNConvolutionLayer<>::Reshape()
    @     0x7f89223a7d0b caffe::Layer<>::Forward()
    @     0x7f892261da3b caffe::Net::ForwardFromTo()
    @     0x7f892261db97 caffe::Net::Forward()
    @     0x7f8922620325 caffe::Net::ForwardBackward()
    @     0x7f8922630652 caffe::Solver::Step()
    @     0x7f8922631395 caffe::Solver::Solve()
    @           0x40d9e8 train()
    @           0x40ae18 main
    @     0x7f892024ef45 (unknown)
    @           0x40b6fb (unknown)
    @                0x0 (unknown)
@drnikolaev
Owner

Hi @RutenburgIG, thanks.
May I have your prototxt files, please?

@drnikolaev
Owner

Fixed in the upcoming release.
