This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Seg Fault on MNIST example - C++ #9018

Closed
Austin-Bolesta opened this issue Dec 10, 2017 · 2 comments
Labels
C++ Related to C++ Example

Comments

Austin-Bolesta commented Dec 10, 2017

Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in what you believe is the best form.

For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io

Description

I'm trying to run the MNIST example from the cpp-package. It compiles fine, but it segfaults at run time on this line:

train_iter.Reset();

I followed the debugger to where the issue occurs. It happens in c_api.cc on line 581:

int MXDataIterBeforeFirst(DataIterHandle handle) {  // debugger says handle = 0 here
  API_BEGIN();
  static_cast<IIterator<DataBatch>* >(handle)->BeforeFirst();  // SEGFAULT HERE
  API_END();
}

The debugger shows that handle is 0 (a null pointer) at this point, so BeforeFirst() is called through a null pointer. I've tried tracking down the problem but haven't been able to find out why the handle is null.
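
A null handle at this point suggests that the iterator was never successfully created before Reset() was called, although the thread does not confirm the root cause. The sketch below only illustrates how a C-style entry point could reject a null handle with an error code instead of dereferencing it. The names `FakeIterator`, `GuardedBeforeFirst`, and `g_last_error` are hypothetical stand-ins invented for this example and are not part of MXNet.

```cpp
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT MXNet's types.
typedef void* DataIterHandle;  // opaque handle, as used by C-style APIs

// Plays the role of IIterator<DataBatch> in the snippet above.
struct FakeIterator {
  int resets = 0;
  void BeforeFirst() { ++resets; }
};

// Hypothetical slot holding the most recent error message.
static std::string g_last_error;

// Returns 0 on success and -1 on failure.
// A null handle is reported as an error rather than dereferenced.
int GuardedBeforeFirst(DataIterHandle handle) {
  if (handle == nullptr) {
    g_last_error = "DataIterHandle is null: the iterator was never created";
    return -1;
  }
  static_cast<FakeIterator*>(handle)->BeforeFirst();
  return 0;
}
```

With this guard, passing a pointer to a live `FakeIterator` returns 0 and increments its `resets` counter, while passing `nullptr` returns -1 and leaves a message in `g_last_error`. The same check can be made on the caller's side, before calling Reset(), to find out whether iterator creation failed earlier.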

Environment info (Required)

Ubuntu 16.04
cuda 8.0 and cudnn 6 on GTX 1080

What to do:
1. Download the diagnosis script from https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py
2. Run the script using `python diagnose.py` and paste its output here.

----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Nov 20 2017 18:23:56'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '9.0.1')
('Directory    :', '/home/oem/.local/lib/python2.7/site-packages/pip')
----------MXNet Info-----------
('Version      :', '0.11.0')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet')
Traceback (most recent call last):
  File "/home/oem/programming/diagnose.py", line 86, in check_mxnet
    with open(commit_hash, 'r') as f:
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/COMMIT_HASH'

----------System Info----------
('Platform     :', 'Linux-4.10.0-40-generic-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('node         :', 'Austin-System')
('release      :', '4.10.0-40-generic')
('version      :', '#44~16.04.1-Ubuntu SMP Thu Nov 9 15:37:44 UTC 2017')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            23
Model:                 1
Model name:            AMD Ryzen 7 1800X Eight-Core Processor
Stepping:              1
CPU MHz:               2200.000
CPU max MHz:           3600.0000
CPU min MHz:           2200.0000
BogoMIPS:              7186.06
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0131 sec, LOAD: 0.7510 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0154 sec, LOAD: 0.1057 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0251 sec, LOAD: 0.3693 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0158 sec, LOAD: 0.2202 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0509 sec, LOAD: 0.2075 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0933 sec, LOAD: 0.4626 sec.
[Finished in 3.0s]

Package used (Python/R/Scala/Julia):
cpp-package

For Scala user, please provide:
1. Java version: (`java -version`)
2. Maven version: (`mvn -version`)
3. Scala runtime if applicable: (`scala -version`)

For R user, please provide R `sessionInfo()`:

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): g++ 5.4

MXNet commit hash:
(Paste the output of `git rev-parse HEAD` here.)

Build config:

#-------------------------------------------------------------------------------
#  Template configuration for compiling mxnet
#
#  If you want to change the configuration, please use the following
#  steps. Assume you are on the root directory of mxnet. First copy the this
#  file so that any local changes will be ignored by git
#
#  $ cp make/config.mk .
#
#  Next modify the according entries, and then compile by
#
#  $ make
#
#  or build in parallel with 8 threads
#
#  $ make -j8
#-------------------------------------------------------------------------------

#---------------------
# choice of compiler
#--------------------

export CC = gcc
export CXX = g++
export NVCC = nvcc

# whether compile with options for MXNet developer
DEV = 0

# whether compile with debug
DEBUG = 1

# whether compiler with profiler
USE_PROFILER =

# the additional link flags you want to add
ADD_LDFLAGS =

# the additional compile flags you want to add
ADD_CFLAGS =

#---------------------------------------------
# matrix computation libraries for CPU/GPU
#---------------------------------------------

# whether use CUDA during compile
USE_CUDA = 1

# add the path to CUDA library to link and compile flag
# if you have already add them to environment variable, leave it as NONE
# USE_CUDA_PATH = /usr/local/cuda
USE_CUDA_PATH = /usr/local/cuda-8.0

# whether use CuDNN R3 library
USE_CUDNN = 1

# whether use cuda runtime compiling for writing kernels in native language (i.e. Python)
USE_NVRTC = 0

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
USE_OPENCV = 1

# use openmp for parallelization
USE_OPENMP = 1

# MKL ML Library for Intel CPU/Xeon Phi
# Please refer to MKL_README.md for details

# MKL ML Library folder, need to be root for /usr/local
# Change to User Home directory for standard user
# For USE_BLAS!=mkl only
MKLML_ROOT=/usr/local

# whether use MKL2017 library
USE_MKL2017 = 0

# whether use MKL2017 experimental feature for high performance
# Prerequisite USE_MKL2017=1
USE_MKL2017_EXPERIMENTAL = 0

# whether use NNPACK library
USE_NNPACK = 0

# choose the version of blas you want to use
# can be: mkl, blas, atlas, openblas
# in default use atlas for linux while apple for osx
UNAME_S := $(shell uname -s)
ifeq ($(UNAME_S), Darwin)
USE_BLAS = apple
else
USE_BLAS = atlas
endif

# whether use lapack during compilation
# only effective when compiled with blas versions openblas/apple/atlas/mkl
USE_LAPACK = 1

# path to lapack library in case of a non-standard installation
USE_LAPACK_PATH =

# add path to intel library, you may need it for MKL, if you did not add the path
# to environment variable
USE_INTEL_PATH = NONE

# If use MKL only for BLAS, choose static link automatically to allow python wrapper
ifeq ($(USE_MKL2017), 0)
ifeq ($(USE_BLAS), mkl)
USE_STATIC_MKL = 1
endif
else
USE_STATIC_MKL = NONE
endif

#----------------------------
# Settings for power and arm arch
#----------------------------
ARCH := $(shell uname -a)
ifneq (,$(filter $(ARCH), armv6l armv7l powerpc64le ppc64le aarch64))
	USE_SSE=0
else
	USE_SSE=1
endif

#----------------------------
# distributed computing
#----------------------------

# whether or not to enable multi-machine supporting
USE_DIST_KVSTORE = 0

# whether or not allow to read and write HDFS directly. If yes, then hadoop is
# required
USE_HDFS = 0

# path to libjvm.so. required if USE_HDFS=1
LIBJVM=$(JAVA_HOME)/jre/lib/amd64/server

# whether or not allow to read and write AWS S3 directly. If yes, then
# libcurl4-openssl-dev is required, it can be installed on Ubuntu by
# sudo apt-get install -y libcurl4-openssl-dev
USE_S3 = 0

#----------------------------
# additional operators
#----------------------------

# path to folders containing projects specific operators that you don't want to put in src/operators
EXTRA_OPERATORS =

#----------------------------
# other features
#----------------------------

# Create C++ interface package
USE_CPP_PACKAGE = 1

#----------------------------
# plugins
#----------------------------

# whether to use caffe integration. This requires installing caffe.
# You also need to add CAFFE_PATH/build/lib to your LD_LIBRARY_PATH
# CAFFE_PATH = $(HOME)/caffe
# MXNET_PLUGINS += plugin/caffe/caffe.mk

# whether to use torch integration. This requires installing torch.
# You also need to add TORCH_PATH/install/lib to your LD_LIBRARY_PATH
# TORCH_PATH = $(HOME)/torch
# MXNET_PLUGINS += plugin/torch/torch.mk

# WARPCTC_PATH = $(HOME)/warp-ctc
# MXNET_PLUGINS += plugin/warpctc/warpctc.mk

# whether to use sframe integration. This requires build sframe
# git@github.com:dato-code/SFrame.git
# SFRAME_PATH = $(HOME)/SFrame
# MXNET_PLUGINS += plugin/sframe/plugin.mk


## Error Message:
SIGSEGV

1 MXDataIterBeforeFirst               c_api.cc 581 0x7fffcf3f8757 
2 mxnet::cpp::MXDataIter::BeforeFirst io.hpp   46  0x42ab6a       
3 mxnet::cpp::DataIter::Reset         io.h     61  0x42a3d3       
4 main                                main.cpp 93  0x41b58b       


## Minimum reproducible example
(If you are using your own code, please provide a short script that reproduces the error. Otherwise, please provide link to the existing example.)

## Steps to reproduce
(Paste the commands you ran that produced the error.)

1.
2.

## What have you tried to solve it?

1.
2.
@anirudh2290 (Member)

@marcoabreu

marcoabreu added the "C++" (Related to C++ Example) label and removed the "CPP package" label on Jul 17, 2018
ThomasDelteil (Contributor) commented Jul 30, 2018

@Austin-Bolesta the CPP examples were in bad shape and have all been fixed and are now continuously tested.

~/incubator-mxnet$ make -j $(nproc) USE_OPENCV=1 USE_BLAS=openblas USE_CPP_PACKAGE=1
~/incubator-mxnet$ cd cpp-package/example
~/incubator-mxnet/cpp-package/example$ make all
~/incubator-mxnet/cpp-package/example$ cp ../../lib/libmxnet.so .

~/incubator-mxnet/cpp-package/example$ ./mlp_cpu
[21:45:19] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(100,784)
[21:45:19] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(100,784)
[21:45:20] mlp_cpu.cpp:134: Epoch: 0 121704 samples/sec Accuracy: 0.1135
[21:45:21] mlp_cpu.cpp:134: Epoch: 1 125000 samples/sec Accuracy: 0.536
[21:45:21] mlp_cpu.cpp:134: Epoch: 2 133929 samples/sec Accuracy: 0.8278
[21:45:22] mlp_cpu.cpp:134: Epoch: 3 134228 samples/sec Accuracy: 0.8726
[21:45:22] mlp_cpu.cpp:134: Epoch: 4 131579 samples/sec Accuracy: 0.9041
[21:45:23] mlp_cpu.cpp:134: Epoch: 5 131291 samples/sec Accuracy: 0.9163
[21:45:23] mlp_cpu.cpp:134: Epoch: 6 131291 samples/sec Accuracy: 0.9226
[21:45:24] mlp_cpu.cpp:134: Epoch: 7 131004 samples/sec Accuracy: 0.9291
[21:45:24] mlp_cpu.cpp:134: Epoch: 8 131004 samples/sec Accuracy: 0.9334
[21:45:25] mlp_cpu.cpp:134: Epoch: 9 131579 samples/sec Accuracy: 0.937

@marcoabreu @indhub @sandeep-krishnamurthy can you please close?
