
Example training on CIFAR10 fails to save .caffemodel file due to hdf5 warning #6916

Open
davidlee321 opened this issue Feb 27, 2020 · 2 comments


davidlee321 commented Feb 27, 2020

Issue

I received a warning while running Caffe's example for training on CIFAR10 (https://caffe.berkeleyvision.org/gathered/examples/cifar10.html). An HDF5 library version-mismatch warning occurs, the process aborts, and the final model cifar10_quick_iter_5000.caffemodel.h5 is not saved.

Advice would be appreciated.

$ cd /path/to/caffe
$ ./examples/cifar10/train_quick.sh
...
...  training is OK ...
...
I0227 18:02:51.012318 20654 solver.cpp:258]     Train net output #0: loss = 0.418514 (* 1 = 0.418514 loss)
I0227 18:02:51.012327 20654 sgd_solver.cpp:112] Iteration 4800, lr = 0.0001
I0227 18:03:47.208993 20654 solver.cpp:239] Iteration 4900 (1.77949 iter/s, 56.196s/100 iters), loss = 0.474742
I0227 18:03:47.209089 20654 solver.cpp:258]     Train net output #0: loss = 0.474742 (* 1 = 0.474742 loss)
I0227 18:03:47.209096 20654 sgd_solver.cpp:112] Iteration 4900, lr = 0.0001
I0227 18:04:40.889462 20656 data_layer.cpp:73] Restarting data prefetching from start.
I0227 18:04:43.167384 20654 solver.cpp:474] Snapshotting to HDF5 file examples/cifar10/cifar10_quick_iter_5000.caffemodel.h5
Warning! ***HDF5 library version mismatched error***
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.2, library is 1.8.16
	    SUMMARY OF THE HDF5 CONFIGURATION
	    =================================

General Information:
-------------------
		   HDF5 Version: 1.8.16
		  Configured on: Tue Aug 28 18:26:31 UTC 2018
		  Configured by: buildd@lgw01-amd64-024
		 Configure mode: production
		    Host system: x86_64-pc-linux-gnu
	      Uname information: Linux lgw01-amd64-024 4.4.0-128-generic #154-Ubuntu SMP Fri May 25 14:15:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
		       Byte sex: little-endian
		      Libraries: static, shared
	     Installation point: /usr
		    Flavor name: serial

Compiling Options:
------------------
               Compilation Mode: production
                     C Compiler: /usr/bin/cc
                         CFLAGS: -g -O2 -fstack-protector-strong -Wformat -Werror=format-security
                      H5_CFLAGS: -std=c99 -pedantic -Wall -Wextra -Wundef -Wshadow -Wpointer-arith -Wbad-function-cast -Wcast-qual -Wcast-align -Wwrite-strings -Wconversion -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls -Wnested-externs -Winline -Wfloat-equal -Wmissing-format-attribute -Wmissing-noreturn -Wpacked -Wdisabled-optimization -Wformat=2 -Wunreachable-code -Wendif-labels -Wdeclaration-after-statement -Wold-style-definition -Winvalid-pch -Wvariadic-macros -Winit-self -Wmissing-include-dirs -Wswitch-default -Wswitch-enum -Wunused-macros -Wunsafe-loop-optimizations -Wc++-compat -Wstrict-overflow -Wlogical-op -Wlarger-than=2048 -Wvla -Wsync-nand -Wframe-larger-than=16384 -Wpacked-bitfield-compat -Wstrict-overflow=5 -Wjump-misses-init -Wunsuffixed-float-constants -Wdouble-promotion -Wsuggest-attribute=const -Wtrampolines -Wstack-usage=8192 -Wvector-operation-performance -Wsuggest-attribute=pure -Wsuggest-attribute=noreturn -Wsuggest-attribute=format -Wdate-time -Wopenmp-simd -Warray-bounds=2 -Wc99-c11-compat -O3 -fstdarg-opt
                      AM_CFLAGS: 
                       CPPFLAGS: -Wdate-time -D_FORTIFY_SOURCE=2
                    H5_CPPFLAGS: -D_GNU_SOURCE -D_POSIX_C_SOURCE=200112L   -DNDEBUG -UH5_DEBUG_API
                    AM_CPPFLAGS: -D_LARGEFILE64_SOURCE -D_LARGEFILE_SOURCE 
               Shared C Library: yes
               Static C Library: yes
  Statically Linked Executables: no
                        LDFLAGS: -Wl,-Bsymbolic-functions -Wl,-z,relro
                     H5_LDFLAGS: -Wl,--version-script,$(top_srcdir)/debian/map_serial.ver
                     AM_LDFLAGS: 
 	 	Extra libraries: -lpthread -lsz -lz -ldl -lm 
 		       Archiver: ar
 		 	 Ranlib: x86_64-linux-gnu-ranlib
 	      Debugged Packages: 
		    API Tracing: no

Languages:
----------
                        Fortran: yes
               Fortran Compiler: /usr/bin/gfortran
          Fortran 2003 Compiler: yes
                  Fortran Flags: -g -O2 -fstack-protector-strong
               H5 Fortran Flags:  
               AM Fortran Flags: 
         Shared Fortran Library: yes
         Static Fortran Library: yes

                            C++: yes
                   C++ Compiler: /usr/bin/c++
                      C++ Flags: -g -O2 -fstack-protector-strong -Wformat -Werror=format-security
                   H5 C++ Flags:  
                   AM C++ Flags: 
             Shared C++ Library: yes
             Static C++ Library: yes

Features:
---------
                  Parallel HDF5: no
             High Level library: yes
                   Threadsafety: yes
            Default API Mapping: v18
 With Deprecated Public Symbols: yes
         I/O filters (external): deflate(zlib),szip(encoder)
                            MPE: no
                     Direct VFD: no
                        dmalloc: no
Clear file buffers before write: yes
           Using memory checker: no
         Function Stack Tracing: no
      Strict File Format Checks: no
   Optimization Instrumentation: no
Bye...
*** Aborted at 1582797883 (unix time) try "date -d @1582797883" if you are using GNU date ***
PC: @     0x7f41fffee428 gsignal
*** SIGABRT (@0x3e9000050ae) received by PID 20654 (TID 0x7f42019dfac0) from PID 20654; stack trace: ***
    @     0x7f41fffee4b0 (unknown)
    @     0x7f41fffee428 gsignal
    @     0x7f41ffff002a abort
    @     0x7f41ff2a1290 H5check_version
    @          0x110e850 (unknown)
Aborted (core dumped)
$ dpkg -l | grep hdf5

ii  hdf5-helpers                                  1.8.16+docs-4ubuntu1.1                                   amd64        Hierarchical Data Format 5 (HDF5) - Helper tools
ii  libhdf5-10:amd64                              1.8.16+docs-4ubuntu1.1                                   amd64        Hierarchical Data Format 5 (HDF5) - runtime files - serial version
ii  libhdf5-cpp-11:amd64                          1.8.16+docs-4ubuntu1.1                                   amd64        Hierarchical Data Format 5 (HDF5) - C++ libraries
ii  libhdf5-dev                                   1.8.16+docs-4ubuntu1.1                                   amd64        Hierarchical Data Format 5 (HDF5) - development files - serial version
ii  libhdf5-serial-dev                            1.8.16+docs-4ubuntu1.1                                   all          transitional dummy package
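A note on narrowing this down: the `dpkg` listing above shows only HDF5 1.8.16 packages, while the warning reports 1.10.2 headers, so the headers Caffe was compiled against apparently came from some other HDF5 installation on the system. A minimal sketch, assuming a POSIX shell with `awk`: the hypothetical helper `hdf5_header_version` (not part of Caffe or HDF5) prints the version defined by the `H5_VERS_MAJOR`, `H5_VERS_MINOR` and `H5_VERS_RELEASE` macros in a given `H5public.h`, so each candidate header can be compared with the library the binary loads.

```shell
#!/bin/sh
# Hypothetical helper (not part of Caffe or HDF5): print MAJOR.MINOR.RELEASE
# as defined by the H5_VERS_* macros in an HDF5 H5public.h header.
hdf5_header_version() {
    awk '
        $1 == "#define" && $2 == "H5_VERS_MAJOR"   { maj = $3 }
        $1 == "#define" && $2 == "H5_VERS_MINOR"   { min = $3 }
        $1 == "#define" && $2 == "H5_VERS_RELEASE" { rel = $3 }
        END { if (maj != "") printf "%s.%s.%s\n", maj, min, rel }
    ' "$1"
}

# Report the version of every header path passed on the command line.
for header in "$@"; do
    printf '%s: %s\n' "$header" "$(hdf5_header_version "$header")"
done
```

For example, `sh hdf5_header_versions.sh $(find / -name H5public.h 2>/dev/null)` (the script name is hypothetical) lists every HDF5 header on the machine, and `ldd ./build/tools/caffe | grep -i hdf5` shows which `libhdf5` shared library the Caffe binary actually resolves at run time. On Ubuntu 16.04 the packaged serial headers normally live under `/usr/include/hdf5/serial`.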

System configuration

  • Operating system: Ubuntu 16.04
  • Compiler: not relevant here, I think
  • CUDA version (if applicable): not using
  • CUDNN version (if applicable): not using
  • BLAS: not using
  • Python version (if using pycaffe): 2.7
  • MATLAB version (if using matcaffe): n/a

Issue checklist

  • [ X ] read the guidelines and removed the first paragraph
  • [ X ] written a short summary and detailed steps to reproduce
  • [ X ] explained how solutions to related problems failed (tick if found none)
  • [ X ] filled system configuration
  • [ X ] attached relevant logs/config files (tick if not applicable)
@liuhang20011

I have the same problem on macOS and am now looking for a solution.


liuhang20011 commented Jun 28, 2020

It seems the problem is solved on my Mac:

I0628 17:37:28.845957 2692641664 solver.cpp:474] Snapshotting to HDF5 file examples/cifar10/cifar10_quick_iter_500.caffemodel.h5
Warning! HDF5 library version mismatched error
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
'HDF5_DISABLE_VERSION_CHECK' environment variable is set to 1, application will
continue at your own risk.
Headers are 1.10.1, library is 1.10.4
SUMMARY OF THE HDF5 CONFIGURATION
=================================

General Information:

               HDF5 Version: 1.10.4
              Configured on: Wed Dec 19 12:34:42 CST 2018
              Configured by: root@bm-osx1010-01.corp.continuum.io
                Host system: x86_64-apple-darwin13.4.0
          Uname information: Darwin bm-osx1010-01.corp.continuum.io 14.5.0 Darwin Kernel Version 14.5.0: Sun Jun  4 21:40:08 PDT 2017; root:xnu-2782.70.3~1/RELEASE_X86_64 x86_64
                   Byte sex: little-endian
         Installation point: /Users/liuhang/opt/anaconda3

Compiling Options:

                 Build Mode: production
          Debugging Symbols: no
                    Asserts: no
                  Profiling: no
         Optimization Level: high

Linking Options:

                  Libraries: static, shared

Statically Linked Executables:
LDFLAGS: -Wl,-pie -Wl,-headerpad_max_install_names -Wl,-dead_strip_dylibs -Wl,-rpath,/Users/liuhang/opt/anaconda3/lib -L/Users/liuhang/opt/anaconda3/lib
H5_LDFLAGS: -Wl,-commons,use_dylibs
AM_LDFLAGS: -L/Users/liuhang/opt/anaconda3/lib
Extra libraries: -lpthread -lz -ldl -lm
Archiver: /opt/concourse/worker/volumes/live/536c0667-3227-4b1e-5aff-a4dbd3f89d2d/volume/hdf5_1545244225635/_build_env/bin/x86_64-apple-darwin13.4.0-ar
AR_FLAGS: cr
Ranlib: /opt/concourse/worker/volumes/live/536c0667-3227-4b1e-5aff-a4dbd3f89d2d/volume/hdf5_1545244225635/_build_env/bin/x86_64-apple-darwin13.4.0-ranlib

Languages:

                          C: yes
                 C Compiler: /opt/concourse/worker/volumes/live/536c0667-3227-4b1e-5aff-a4dbd3f89d2d/volume/hdf5_1545244225635/_build_env/bin/x86_64-apple-darwin13.4.0-clang
                   CPPFLAGS: -D_FORTIFY_SOURCE=2 -mmacosx-version-min=10.9
                H5_CPPFLAGS:   -DNDEBUG -UH5_DEBUG_API
                AM_CPPFLAGS:  -I/Users/liuhang/opt/anaconda3/include
                    C Flags: -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -I/Users/liuhang/opt/anaconda3/include -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix
                 H5 C Flags:
                 AM C Flags:
           Shared C Library: yes
           Static C Library: yes


                    Fortran: yes
           Fortran Compiler: /opt/concourse/worker/volumes/live/536c0667-3227-4b1e-5aff-a4dbd3f89d2d/volume/hdf5_1545244225635/_build_env/bin/x86_64-apple-darwin13.4.0-gfortran ( GNU Fortran (GCC) 4.8.5)
              Fortran Flags:
           H5 Fortran Flags:  -pedantic -Wall -Wextra -Wunderflow -Wimplicit-interface -Wsurprising -Wno-c-binding-type  -s -O2
           AM Fortran Flags:
     Shared Fortran Library: no
     Static Fortran Library: yes

                        C++: yes
               C++ Compiler: /opt/concourse/worker/volumes/live/536c0667-3227-4b1e-5aff-a4dbd3f89d2d/volume/hdf5_1545244225635/_build_env/bin/x86_64-apple-darwin13.4.0-clang++
                  C++ Flags: -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -std=c++14 -fmessage-length=0 -I/Users/liuhang/opt/anaconda3/include -fdebug-prefix-map=${SRC_DIR}=/usr/local/src/conda/${PKG_NAME}-${PKG_VERSION} -fdebug-prefix-map=${PREFIX}=/usr/local/src/conda-prefix
               H5 C++ Flags:
               AM C++ Flags:
         Shared C++ Library: yes
         Static C++ Library: yes

                       Java: no

Features:

               Parallel HDF5: no

Parallel Filtered Dataset Writes: no
Large Parallel I/O: no
High-level library: yes
Threadsafety: yes
Default API mapping: v110
With deprecated public symbols: yes
I/O filters (external): deflate(zlib)
MPE: no
Direct VFD: no
dmalloc: no
Packages w/ extra debug output: none
API tracing: no
Using memory checker: yes
Memory allocation sanity checks: no
Metadata trace file: no
Function stack tracing: no
Strict file format checks: no
Optimization instrumentation: no
I0628 17:37:28.849007 2692641664 sgd_solver.cpp:296] Snapshotting solver state to HDF5 file examples/cifar10/cifar10_quick_iter_500.solverstate.h5
I0628 17:37:28.949931 2692641664 solver.cpp:327] Iteration 500, loss = 1.23881
I0628 17:37:28.949965 2692641664 solver.cpp:347] Iteration 500, Testing net (#0)
I0628 17:37:38.768250 73977856 data_layer.cpp:73] Restarting data prefetching from start.
I0628 17:37:39.181810 2692641664 solver.cpp:414] Test net output #0: accuracy = 0.5684
I0628 17:37:39.181847 2692641664 solver.cpp:414] Test net output #1: loss = 1.22117 (* 1 = 1.22117 loss)
I0628 17:37:39.181857 2692641664 solver.cpp:332] Optimization Done.
I0628 17:37:39.181864 2692641664 caffe.cpp:250] Optimization Done.

There are multiple versions of HDF5 on my Mac, from Homebrew, MacPorts and Anaconda. So I removed the MacPorts HDF5 and updated the Homebrew HDF5 (the old version on my Mac was 1.8). As you can see, the header version is 1.10.1, which comes from Homebrew, and the library is 1.10.4, which comes from Anaconda. Although the release numbers are not the same, it still ran to the end, since HDF5_DISABLE_VERSION_CHECK was set to 1 (see the log above), so HDF5 only warned instead of aborting.
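Why did this run continue while the original one aborted? The warning text in both logs describes the behaviour: by default a mismatch is fatal (the first report aborts inside `H5check_version`), setting `HDF5_DISABLE_VERSION_CHECK` to 1 prints the warning and continues "at your own risk", and 2 or higher suppresses the warning entirely. The sketch below is a hypothetical model of those rules, not HDF5 code; the function name `hdf5_check_outcome` is made up, and it assumes a POSIX shell and that all three version components must match, which is consistent with 1.10.1 vs 1.10.4 still triggering the warning above.

```shell
#!/bin/sh
# Hypothetical model (not HDF5 source) of the header/library version check,
# based on the rules printed in the warnings quoted in this thread.
# Usage: hdf5_check_outcome HEADER_VERSION LIBRARY_VERSION
hdf5_check_outcome() {
    # MAJOR.MINOR.RELEASE must all match; 1.10.1 vs 1.10.4 is still a mismatch.
    if [ "$1" = "$2" ]; then
        echo "ok"
        return 0
    fi
    level=${HDF5_DISABLE_VERSION_CHECK:-0}
    case $level in
        ''|*[!0-9]*) level=0 ;;  # simplification: treat non-numeric values as 0
    esac
    if [ "$level" -eq 0 ]; then
        echo "abort"             # warning, then abort(): the SIGABRT in the first report
    elif [ "$level" -eq 1 ]; then
        echo "warn-continue"     # warning, then continue at your own risk
    else
        echo "silent-continue"   # 2 or higher: warning suppressed
    fi
}
```

With the variable unset, `hdf5_check_outcome 1.10.2 1.8.16` prints `abort`, matching the first report; with `HDF5_DISABLE_VERSION_CHECK=1`, `hdf5_check_outcome 1.10.1 1.10.4` prints `warn-continue`, matching this one. Setting the variable only hides the mismatch; the warning itself recommends recompiling against the same HDF5 version the application links to.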
