Caffe cannot handle HDF5 files as large as 20GB? #2953
Comments
You need to compile Caffe in debug mode, run it under gdb, and send the stack trace.
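For anyone else needing to do this, one way with Caffe's Makefile build is sketched below (the `-j` value and the solver path are placeholders):

```
# Enable the debug flag in Makefile.config (it ships commented out):
#   DEBUG := 1
make clean && make all -j8

# Run training under gdb and grab the stack trace after the crash:
gdb --args ./build/tools/caffe train --solver=solver.prototxt
(gdb) run
(gdb) bt
```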
So I compiled Caffe in debug mode. This is the output:
Unfortunately, I don't know how to use gdb to help me in that case.
Never mind. It is enough.
There is an intrinsic limit on the blob shape size: `CHECK_LE(shape[i], INT_MAX / count_)`.
So the blob has a limit of 2 GB minus 1 byte. You are over this limit.
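For context, that check comes from `Blob::Reshape` in src/caffe/blob.cpp. A rough sketch of the logic (paraphrased, not the verbatim Caffe source):

```cpp
#include <climits>
#include <vector>
#include <glog/logging.h>

// Paraphrase of the overflow guard in caffe::Blob::Reshape: before each
// dimension is multiplied in, it is checked against INT_MAX / count, so
// the running product (the total element count) can never exceed INT_MAX.
void CheckBlobShape(const std::vector<int>& shape) {
  int count = 1;
  for (size_t i = 0; i < shape.size(); ++i) {
    CHECK_GE(shape[i], 0);
    if (count != 0) {
      CHECK_LE(shape[i], INT_MAX / count);  // fails for blobs over 2^31 - 1 elements
    }
    count *= shape[i];
  }
}
```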
OK. But what should I do then? I guess the number of training samples should not matter; I'm sure there are people with more than 2 GB of training data. I could cut my training data into chunks of < 2 GB, train on the first chunk, save the caffemodel file, then load the next chunk and finetune the caffemodel on that chunk, and so on... Or is there a more elegant way? Thanks for your help so far.
This is not a bug. You need to close this ticket and continue the discussion on the caffe-users mailing list.
@mgarbade I believe you can have multiple HDF5 files, each with less than 2 GB of data, where the combination of all of them is above 2 GB. You specify all the files in a list, and the data layer will then cycle through the list of files. You can also get it to shuffle the list of files itself. See: https://github.com/BVLC/caffe/blob/master/src/caffe/layers/hdf5_data_layer.cpp#L138
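A minimal sketch of that setup (the file names, the `data`/`label` dataset names, and the batch size here are illustrative; they must match what is actually stored in your .h5 files):

```
# train_h5_list.txt -- one HDF5 file per line, each under the blob limit:
#   /path/to/train_part1.h5
#   /path/to/train_part2.h5

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"     # HDF5 dataset named "data" in each file
  top: "label"    # HDF5 dataset named "label" in each file
  include { phase: TRAIN }
  hdf5_data_param {
    source: "train_h5_list.txt"  # text file listing the HDF5 files
    batch_size: 100
    shuffle: true                # shuffle the order of the files
  }
}
```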
I just ran into this issue as well. My batch size is 100, so my blob shape should be (100, 3, 256, 256).
The whole dataset, however, is far larger than that.
Is the limit supposed to apply to the batch blob or to the whole dataset?
I just verified that a batch size of 10,923 fails (10923 * 3 * 256 * 256 = 2,147,549,184) and a batch size of 10,922 doesn't (10922 * 3 * 256 * 256 = 2,147,352,576). That is true regardless of how large the HDF5 dataset on disk is.
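Those two thresholds bracket INT_MAX = 2,147,483,647 exactly, matching the element-count cap quoted above; a quick standalone check of the arithmetic:

```cpp
#include <climits>
#include <cstdio>

int main() {
  const long long per_sample = 3LL * 256 * 256;  // 196,608 elements per sample
  // Largest batch whose total element count still fits in a signed 32-bit int:
  std::printf("max batch = %lld\n", INT_MAX / per_sample);                     // 10922
  std::printf("10922 * %lld = %lld (<= INT_MAX)\n", per_sample, 10922 * per_sample);
  std::printf("10923 * %lld = %lld (>  INT_MAX)\n", per_sample, 10923 * per_sample);
  return 0;
}
```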
@lukeyeager For (1): I always talked about a blob limit of 2 GB minus 1 byte, i.e. 2147483647.
For (2), see #1470.
(1) Yeah, but if it's 4 bytes per number (for float data), then 2,147,483,647 elements is a lot more than 2 GB. (2, 3) Aha, so the HDF5Data layer doesn't prefetch? That's vexing. I still don't see a need for the `int` count.
Hi, like @lukeyeager, I can't see the need for using a signed int for the count variable in blobs. Is there a particular reason for this, instead of using uint? I am having issues with big 3D data.
Closing as duplicate of #1470. |
I have a training database stored in the HDF5 format. However, Caffe immediately crashes when it tries to train on it. Error message:
When I split my training database into a smaller chunk (~13 GB), everything works fine (all other parameters remained unchanged).
So I guess Caffe has a problem with large HDF5 files?