convert_imageset.exe crash when processing 17K+ images #5348

Closed
GaryKT opened this issue Mar 2, 2017 · 27 comments

@GaryKT

GaryKT commented Mar 2, 2017

Issue summary

convert_imageset crashes on my workstation when converting more than about 17K images.
There seems to be no error handling. The system has 64 GB RAM and an 8 GB GPU.
Is this related to memory management? Is there a way to improve memory management so that larger image datasets can be processed with convert_imageset?

convert_imageset.exe --resize_height=200 --resize_width=200 --shuffle D:\PROGRAMMING\caffe\datasets\train2.txt jim_lmbd_full2
I0302 16:35:15.431205 39444 convert_imageset.cpp:86] Shuffling data
I0302 16:35:15.934109 39444 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.
I0302 16:35:15.934109 39444 common.cpp:36] System entropy source not available, using fallback algorithm to generate seed instead.
I0302 16:35:15.938120 39444 convert_imageset.cpp:89] A total of 82081 images.
I0302 16:35:15.940126 39444 db_lmdb.cpp:40] Opened lmdb jim_lmbd_full2
I0302 16:35:45.373826 39444 convert_imageset.cpp:147] Processed 1000 files.
I0302 16:36:13.004973 39444 convert_imageset.cpp:147] Processed 2000 files.
I0302 16:36:40.344454 39444 convert_imageset.cpp:147] Processed 3000 files.
I0302 16:37:08.040336 39444 convert_imageset.cpp:147] Processed 4000 files.
I0302 16:37:34.534848 39444 convert_imageset.cpp:147] Processed 5000 files.
I0302 16:38:03.082845 39444 convert_imageset.cpp:147] Processed 6000 files.
I0302 16:38:30.661777 39444 convert_imageset.cpp:147] Processed 7000 files.
I0302 16:38:58.234722 39444 convert_imageset.cpp:147] Processed 8000 files.
I0302 16:39:28.175652 39444 convert_imageset.cpp:147] Processed 9000 files.
I0302 16:39:55.643424 39444 convert_imageset.cpp:147] Processed 10000 files.
I0302 16:40:22.398650 39444 convert_imageset.cpp:147] Processed 11000 files.
I0302 16:40:51.848107 39444 convert_imageset.cpp:147] Processed 12000 files.
I0302 16:41:21.773030 39444 convert_imageset.cpp:147] Processed 13000 files.
I0302 16:41:50.492295 39444 convert_imageset.cpp:147] Processed 14000 files.
I0302 16:42:15.996433 39444 convert_imageset.cpp:147] Processed 15000 files.
I0302 16:42:42.570592 39444 convert_imageset.cpp:147] Processed 16000 files.
I0302 16:43:09.036027 39444 convert_imageset.cpp:147] Processed 17000 files.
F0302 16:43:32.926897 39444 db_lmdb.hpp:15] Check failed: mdb_status == 0 (87 vs. 0) The parameter is incorrect.
*** Check failure stack trace: ***

Steps to reproduce

convert_imageset.exe --resize_height=200 --resize_width=200 --shuffle D:\PROGRAMMING\caffe\datasets\train2.txt jim_lmbd_full2
with more than 17000 or so images.

Your system configuration

Operating system: Windows 10
Compiler: MS Visual Studio 2015
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
BLAS: ?
Python or MATLAB version (for pycaffe and matcaffe respectively): Python 3.5.2

@caseyanya

I have the same issue: my convert_imageset.exe crashes once the LMDB reaches 2 GB.
My configuration is the same:
Operating system: Windows 10
Compiler: MS Visual Studio 2015
CUDA version (if applicable): 8
CUDNN version (if applicable): 5.1
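
The 2 GB figure lines up with the crash point in the original report. As a rough sketch only: assuming each resized image is stored as raw 8-bit data with 3 color channels (no --gray, no encoding) and ignoring labels and LMDB's per-record overhead, the number of 200x200 images that fit under a 2 GiB limit can be estimated like this. The function names are made up for illustration.

```python
def bytes_per_image(height, width, channels=3):
    """Raw size of one uint8 image; ignores label and record overhead."""
    return height * width * channels


def images_until_limit(height, width, limit_bytes, channels=3):
    """How many raw images fit below limit_bytes (integer division)."""
    return limit_bytes // bytes_per_image(height, width, channels)


if __name__ == "__main__":
    two_gib = 2 ** 31
    per_image = bytes_per_image(200, 200)
    print(per_image)                               # 120000 bytes per image
    print(images_until_limit(200, 200, two_gib))   # 17895 images
    # Approximate size of the full 82081-image set, in GiB.
    print(82081 * per_image / 2 ** 30)
```

Under these assumptions the limit is reached between 17,000 and 18,000 images, which is where the log in the first comment stops, and the full set would need on the order of 9 GiB.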

@mblokker

mblokker commented Mar 3, 2017

Same here, compiled on Windows 10, MS Visual Studio 2015, CPU only

@willyd
Contributor

willyd commented Mar 3, 2017

Yes, this is an LMDB issue. I fixed it on the latest caffe-builder master but haven't made a new release yet. You can build the dependencies from source using caffe-builder master: either build all libraries (the default), or build just lmdb and replace the lib, cmake file and header in the downloaded libraries.

To build only LMDB use -DCB_BUILD_ALL=OFF and -DBUILD_LMDB=ON.

Alternatively use LevelDB.
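
For the LevelDB route, convert_imageset chooses the database type through its --backend flag (lmdb is the default). Below is a minimal sketch of assembling such a call from Python, reusing the resize and shuffle flags from the original report. The helper name, the image root and the output name are placeholders, not values from this thread.

```python
import subprocess


def build_convert_command(root_folder, list_file, db_name,
                          backend="leveldb", height=200, width=200,
                          exe="convert_imageset.exe"):
    """Return the argument list for convert_imageset (nothing is executed)."""
    return [
        exe,
        "--backend={}".format(backend),
        "--resize_height={}".format(height),
        "--resize_width={}".format(width),
        "--shuffle",
        root_folder,   # ROOTFOLDER/ from the tool's usage text
        list_file,     # LISTFILE
        db_name,       # DB_NAME (output database)
    ]


if __name__ == "__main__":
    # Placeholder paths; replace with real ones before running.
    cmd = build_convert_command("D:\\images\\", "train2.txt", "train_leveldb")
    print(" ".join(cmd))
    # Uncomment to actually run the tool once the paths are real:
    # subprocess.check_call(cmd)
```

Passing the arguments as a list avoids quoting problems with backslashes and spaces in Windows paths.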

@caseyanya

@willyd
Thanks for your fast reply.
I'm already using caffe-builder for my dependencies.
I first built caffe-builder using build_v140_x64.cmd, then built caffe-windows with a modified
CMakeLists.txt and build_win.cmd:
cmake -G"!CMAKE_GENERATOR!" ^
-C %caffe-builder-root\build_v140_x64\libraries\caffe-build-config.cmake ^

Is there anything that I'm missing?
Sorry for bothering you so much.

@GaryKT
Author

GaryKT commented Mar 3, 2017

@willyd
Thanks for your reply.
Could you briefly explain where to get your fix? This was a new build that I pulled a few days ago with Visual Studio 15.
Or do we need to get files from a separate repository?

If yes, could you paste the URL? Thanks so much and sorry for not fully understanding your instructions.

@caseyanya

@GaryKT
I think what he mentioned is this.
https://github.com/willyd/caffe-builder
But I'm still getting the same error; maybe I'm not using it the right way.

@caseyanya

@willyd
Thanks a lot. After a few checks, I successfully built caffe with caffe-builder.
I hadn't been using "caffe-builder-config.cmake" correctly.

@caseyanya

@GaryKT
I'm still not completely sure what the correct procedure is, but here are my steps:

cd path_to_caffe-builder-root
git clone https://github.com/willyd/caffe-builder.git caffe-builder
build_v140_x64.cmd
cd path_to_caffe_root
git clone https://github.com/BVLC/caffe.git
cd caffe
git checkout windows
Edit files build_win.cmd & CMakeLists.txt
scripts\build_win.cmd
Then you can test convert_imageset.exe

Edit information:
build_win.cmd
if NOT DEFINED MSVC_VERSION set MSVC_VERSION=14
if NOT DEFINED WITH_NINJA set WITH_NINJA=1
if NOT DEFINED CPU_ONLY set CPU_ONLY=0
if NOT DEFINED CMAKE_CONFIG set CMAKE_CONFIG=Release
if NOT DEFINED USE_NCCL set USE_NCCL=0
if NOT DEFINED CMAKE_BUILD_SHARED_LIBS set CMAKE_BUILD_SHARED_LIBS=1
if NOT DEFINED PYTHON_VERSION set PYTHON_VERSION=2
if NOT DEFINED BUILD_PYTHON set BUILD_PYTHON=1
if NOT DEFINED BUILD_PYTHON_LAYER set BUILD_PYTHON_LAYER=1
if NOT DEFINED BUILD_MATLAB set BUILD_MATLAB=1
if NOT DEFINED PYTHON_EXE set PYTHON_EXE=python
if NOT DEFINED RUN_TESTS set RUN_TESTS=0
if NOT DEFINED RUN_LINT set RUN_LINT=1
if NOT DEFINED RUN_INSTALL set RUN_INSTALL=1

set CONDA_ROOT=your_conda_root

cmake -G"!CMAKE_GENERATOR!" ^
-DCUDNN_ROOT=your_cudnn_root ^
-C "%caffe-builder-root%\build_v140_x64\libraries\caffe-builder-config.cmake" ^

While running build_win.cmd, make sure it loads the caffe-builder-config.cmake file; this is what I missed at the beginning.

Good luck !!!

@willyd
Contributor

willyd commented Mar 3, 2017

I am referring to this commit: willyd/caffe-builder@1129bc6

Or more specifically to this part of the patch: https://github.com/willyd/caffe-builder/blob/master/packages/lmdb/lmdb_45a88275d2a410e683bae4ef44881e0f55fa3c4d.patch#L305-L308

If you are already building your dependencies from source, just delete the build/packages/lmdb folder in caffe-builder and build again with ninja; it should download the appropriate sources and apply the patch.

The correct (undocumented) usage of caffe-builder dependencies is to set USE_PREBUILT_DEPENDENCIES=OFF and use the config.cmake file as cache init, as suggested by @caseyanya. I would gladly accept a PR that documents this!

@GaryKT
Author

GaryKT commented Mar 3, 2017

@caseyanya
Ok, we need Ninja for this right? No visual studio?

build_v140_x64.cmd
CMake Error: CMake was unable to find a build program corresponding to "Ninja". CMAKE_MAKE_PROGRAM is not set. You probably need to select a different build tool.
CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
-- Configuring incomplete, errors occurred!
The parameter is incorrect
CMake Error: Generator: execution of make failed. Make command was: ""

@caseyanya

@GaryKT
Ninja is definitely required for caffe-builder.
For caffe-windows I don't have an answer right now, but I successfully built it with Ninja half an hour ago.

@GaryKT
Author

GaryKT commented Mar 3, 2017

@caseyanya
Thx. Is this the Ninja you are talking about: https://github.com/ninja-build/ninja/releases
Just download the binary? Any special instructions?
Thanks in advance.

@caseyanya

@GaryKT
I'm not familiar with Ninja.
The one I use is v1.7.2 ninja-win.zip.
Just place ninja.exe wherever you like and add that folder to PATH (Environment Variables).
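
Before rerunning build_v140_x64.cmd it can help to confirm that the current prompt actually sees ninja.exe, since the "unable to find a build program corresponding to Ninja" error reported earlier in the thread is what CMake typically prints when it does not. A small sketch using Python's standard shutil.which; the helper name is arbitrary.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("ninja", "cmake"):
        location = find_tool(tool)
        print("{}: {}".format(tool, location or "NOT FOUND on PATH"))
```

Run it from the same command prompt that will start the build, because PATH changes made in the Environment Variables dialog only reach prompts opened afterwards.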

@willyd
Contributor

willyd commented Mar 3, 2017

@GaryKT Correct. I only support ninja since VS is prohibitively slow compared to ninja.

@caseyanya The caffe build supports both the VS and Ninja generators. WITH_NINJA in build_win.cmd controls whether Ninja or VS is used. The VS generator has a limitation: it can't build a shared library with CUDA support.

@caseyanya

@willyd
Thanks for all your answers and contributions.

@GaryKT
Author

GaryKT commented Mar 3, 2017

@willyd
Thanks for your help.

Changed the settings in build_win.cmd to default to Ninja.
But it generates several errors (see below). Any idea what causes this?

D:\PROGRAMMING\caffe>scripts\build_win.cmd
The system cannot find the drive specified.
INFO: ============================================================
INFO: Summary:
INFO: ============================================================
INFO: MSVC_VERSION = 14
INFO: WITH_NINJA = 1
INFO: CMAKE_GENERATOR = "Ninja"
INFO: CPU_ONLY = 0
INFO: CMAKE_CONFIG = Release
INFO: USE_NCCL = 1
INFO: CMAKE_BUILD_SHARED_LIBS = 0
INFO: PYTHON_VERSION = 2
INFO: BUILD_PYTHON = 1
INFO: BUILD_PYTHON_LAYER = 1
INFO: BUILD_MATLAB = 1
INFO: PYTHON_EXE = "python"
INFO: RUN_TESTS = 0
INFO: RUN_LINT = 0
INFO: RUN_INSTALL = 0
INFO: ============================================================
The input line is too long.
The syntax of the command is incorrect.

@GaryKT
Author

GaryKT commented Mar 3, 2017

With Ninja in the same folder:

D:\PROGRAMMING\caffe_builder\caffe-builder>build_v140_x64.cmd
The input line is too long.
The syntax of the command is incorrect.

@willyd
Contributor

willyd commented Mar 3, 2017

Looks like the generated command line in the Ninja build files is too long. Try:

  1. Move the source and build folder to d:\short_folder_name, or
  2. Download this prebuilt package https://ci.appveyor.com/api/buildjobs/154drsjfjl7ukjid/artifacts/build%2Flibraries.zip and replace the lmdb libraries, includes and cmake files in your current build.

@GaryKT
Author

GaryKT commented Mar 3, 2017

  1. Shorter path test:
    D:\caffe-builder>build_v140_x64.cmd
    The input line is too long.
    The syntax of the command is incorrect.

    Strange.

  2. OK, trying the prebuilt libraries. Which path should these files go in exactly?
    D:\PROGRAMMING\caffe_builder\caffe-builder\packages\lmdb ?

@willyd
Contributor

willyd commented Mar 3, 2017

  1. Start a fresh command prompt; your PATH variable has probably become too long. Add @setlocal and @endlocal to build_v140_x64.cmd and submit a PR if that solves it ;).

  2. Let caffe download the prebuilt dependencies. Stop the build and replace the lmdb stuff in build\libraries.

@GaryKT
Author

GaryKT commented Mar 3, 2017

Thanks. New terminal worked. Could it be that the scripts keep appending to the path each time they are run? (duplicating the paths that already were set)

It's compiling now with Ninja I believe. Will let you know if it works.

@willyd
Contributor

willyd commented Mar 3, 2017

> Thanks. New terminal worked. Could it be that the scripts keep appending to the path each time they are run? (duplicating the paths that already were set)

Exactly, that's why I suggested this:

> Add a @setlocal, @endlocal to build_v140_x64.cmd and submit a PR, if that solves it ;).
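
To check whether repeated script runs really have duplicated entries in the current prompt's PATH, something like the following sketch can be used. It assumes that "The input line is too long" is related to cmd.exe's documented 8191-character command-line limit; the helper name and the normalization rules (case-insensitive, trailing slashes ignored) are simplifications chosen for illustration.

```python
import os
from collections import Counter

CMD_LINE_LIMIT = 8191  # documented maximum command-line length for cmd.exe


def duplicated_entries(path_value, sep=os.pathsep):
    """Return {entry: count} for PATH entries that occur more than once.

    Entries are compared case-insensitively with trailing slashes removed,
    which is a simplification of how Windows compares directory names.
    """
    entries = [e.strip() for e in path_value.split(sep) if e.strip()]
    normalized = [e.rstrip("\\/").lower() for e in entries]
    counts = Counter(normalized)
    return {entry: n for entry, n in counts.items() if n > 1}


if __name__ == "__main__":
    path_value = os.environ.get("PATH", "")
    print("PATH length: {} characters (cmd.exe limit: {})".format(
        len(path_value), CMD_LINE_LIMIT))
    for entry, n in sorted(duplicated_entries(path_value).items()):
        print("{}x  {}".format(n, entry))
```

If the same build-tool directories show up several times after a few runs of the script, that supports the explanation above, and wrapping the script body in setlocal/endlocal should keep the changes from leaking into the prompt.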

@GaryKT
Author

GaryKT commented Mar 3, 2017

OK, build_v140_x64.cmd got a lot further this time, but there is an error at the end.
No convert_imageset.exe has been generated yet; should there be one?
Or do I now need to copy certain DLLs/libs from this build to the other caffe branch?

Is the error below a concern? Thanks in advance.

-- Downloading... done
-- extracting...
src='D:/PROGRAMMING/caffe_builder/caffe-builder/build_v140_x64/download/boost_1_61_0.7z'
dst='D:/PROGRAMMING/caffe_builder/caffe-builder/build_v140_x64/packages/boost/boost_download-prefix/src/boost_download'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
ninja: build stopped: subcommand failed.

willyd added a commit to willyd/caffe that referenced this issue Mar 4, 2017
@willyd
Contributor

willyd commented Mar 4, 2017

Can you try this PR #5355 and let me know if the problem is gone?

willyd added a commit to willyd/caffe that referenced this issue Mar 4, 2017
@GaryKT
Author

GaryKT commented Mar 5, 2017

Thanks. Just to clarify: the updated file is for the main Caffe Windows branch, not the caffe-builder I was trying before, correct?
Would it not work with Visual Studio, or should I try this with Ninja again?

@GaryKT
Author

GaryKT commented Mar 5, 2017

Trying with Caffe, Ninja and the fix to the Windows dependencies. Is this the right command? It seems to ignore the parameters from before.

D:\PROGRAMMING\caffee_lmbd_fix\caffe\build\tools>convert_imageset.exe --resize_height=200 --resize_width=200 --shuffle D:\PROGRAMMING\caffe\dataset\train2.txt jim_test_lmbd
convert_imageset.exe: Convert a set of images to the leveldb/lmdb
format used as input for Caffe.
Usage:
convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME
The ImageNet dataset for the training demo is at
http://www.image-net.org/download-images

No modules matched: use -help

@GaryKT
Author

GaryKT commented Mar 6, 2017

Correct parameters: D:\PROGRAMMING\caffee_lmbd_fix\caffe\build\tools>convert_imageset.exe --resize_height=200 --resize_width=200 --shuffle D:\PROGRAMMING\caffe\dataset\ train2.txt jim_test_lmbd

The fix worked. It runs past 17K images now. (Got the command parameters / path wrong at first).
I0306 15:46:12.473387 43992 convert_imageset.cpp:147] Processed 82000 files.

Great, thanks @willyd
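
The usage text printed above expects three positional arguments (ROOTFOLDER/, LISTFILE, DB_NAME), while the first attempt supplied only two, which is why the tool printed its help instead of running. A hedged sketch of a pre-flight check for that mistake follows. The function names are hypothetical, and the flag detection only handles the --name=value and --name forms used in this thread.

```python
def split_args(argv):
    """Separate --flags from positional arguments.

    Only handles flags written as --name or --name=value; a flag whose
    value is passed as a separate word would be miscounted.
    """
    flags = [a for a in argv if a.startswith("-")]
    positionals = [a for a in argv if not a.startswith("-")]
    return flags, positionals


def check_convert_args(argv):
    """Return None if argv has ROOTFOLDER/ LISTFILE DB_NAME, else a message."""
    _, positionals = split_args(argv)
    if len(positionals) != 3:
        return "expected 3 positional arguments, got {}: {}".format(
            len(positionals), positionals)
    return None


if __name__ == "__main__":
    flags = ["--resize_height=200", "--resize_width=200", "--shuffle"]
    first_try = flags + ["D:\\PROGRAMMING\\caffe\\dataset\\train2.txt",
                         "jim_test_lmbd"]
    second_try = flags + ["D:\\PROGRAMMING\\caffe\\dataset\\",
                          "train2.txt", "jim_test_lmbd"]
    print(check_convert_args(first_try))    # a message: only 2 positionals
    print(check_convert_args(second_try))   # None: argument count is fine
```

The check only counts arguments; it does not verify that the root folder or list file exist.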

@GaryKT GaryKT closed this as completed Mar 6, 2017
willyd added a commit that referenced this issue Mar 12, 2017
Updated prebuilt dependencies. Fixes #5348.
stingshen pushed a commit to stingshen/caffe-faster-rcnn that referenced this issue Jun 7, 2017