CIFAR-10 tutorial train_quick issue - solver state filename #2907

dymczykm · 2015-08-11T14:02:48Z

Hi,

I'm new to Caffe, so I might be doing something very wrong, but I believe I followed the tutorial exactly as written here:
http://caffe.berkeleyvision.org/gathered/examples/cifar10.html
I have a clean installation of Caffe master on OS X 10.10.4.

The problem is that train_quick.sh crashes partway through: when it resumes training after 4,000 iterations, it expects the solver state file to have an .h5 extension (this seems related to the recent commit c9b333e):
https://github.com/BVLC/caffe/blob/master/examples/cifar10/train_quick.sh#L11

Backtrace:

I0811 15:05:17.383101 2041144064 solver.cpp:241] Restoring previous solver status from examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 990 in H5F_open(): unable to open file: time = Tue Aug 11 15:05:17 2015
, name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 992 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 343 in H5FD_sec2_open(): unable to open file: name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibilty
    minor: Unable to open file
F0811 15:05:17.399014 2041144064 solver.cpp:702] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open solver state file examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
*** Check failure stack trace: ***
    @        0x1054e0664  google::LogMessage::SendToLog()
    @        0x1054e0c97  google::LogMessage::Flush()
    @        0x1054e46af  google::LogMessageFatal::~LogMessageFatal()
    @        0x1054e1489  google::LogMessageFatal::~LogMessageFatal()
    @        0x10523b624  caffe::SGDSolver<>::RestoreSolverStateFromHDF5()
    @        0x1052333c1  caffe::Solver<>::Restore()
    @        0x105233166  caffe::Solver<>::Solve()
    @        0x10516ceba  train()
    @        0x10516f22f  main
    @     0x7fff8c0f25c9  start
./examples/cifar10/train_quick.sh: line 11: 41703 Abort trap: 6           $TOOLS/caffe train --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt --snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5

However, the training procedure apparently snapshots the state to a file without the .h5 extension. Editing train_quick.sh to expect just examples/cifar10/cifar10_quick_iter_4000.solverstate resolved the issue.
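One way to make the resume step tolerant of either naming scheme is to check which snapshot file actually exists before passing it to `--snapshot`. The following is a minimal sketch, not part of the Caffe examples: the helper name `pick_solverstate` is hypothetical, and it assumes the snapshot is written as either `<prefix>.solverstate` or `<prefix>.solverstate.h5`, the two names that appear in this report.

```shell
# Hypothetical helper (not part of Caffe): print the solver state file that
# actually exists for a given snapshot prefix.
# Assumption: the file is named <prefix>.solverstate or <prefix>.solverstate.h5.
pick_solverstate() {
    prefix=$1
    for candidate in "${prefix}.solverstate" "${prefix}.solverstate.h5"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Neither name exists: report on stderr and signal failure to the caller.
    echo "no solver state found for prefix: $prefix" >&2
    return 1
}
```

The second training stage could then be started with `--snapshot="$(pick_solverstate examples/cifar10/cifar10_quick_iter_4000)"`, so the script resumes from whichever file the first stage produced.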

Correct me if I'm wrong, but it looks like there's an incompatibility between how the training procedure writes snapshots and how the script loads them.

Cheers,
Marcin


xiaohaoChen · 2015-08-17T04:47:54Z

I'm also new to Caffe and I've run into the same problem. I don't know whether your diagnosis is correct. If you find a solution, please let me know. Thanks!

FishermanZzhang · 2015-08-17T09:06:53Z

I ran into this problem too. I edited the script train_quick.sh, changing the final line from
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
to
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate
That fixed the error, since that is what the older version of the script used. However, I would like to know what causes the error and why the author added the .h5 suffix.
@xiaohaoChen
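The edit described in this comment can also be scripted. This is a sketch only: the function name `strip_h5_suffix` is hypothetical, and it prints the patched script to standard output instead of editing in place, so the original train_quick.sh stays untouched until the result has been reviewed.

```shell
# Hypothetical helper: print a copy of a training script in which every
# ".solverstate.h5" snapshot argument is rewritten to ".solverstate".
strip_h5_suffix() {
    sed 's/\.solverstate\.h5/.solverstate/g' "$1"
}
```

For example, `strip_h5_suffix examples/cifar10/train_quick.sh > train_quick_patched.sh` writes the modified copy to a new file (the output name is arbitrary).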

serhan-gul · 2015-09-02T09:22:30Z

I experienced the same problem and had to fix train_quick.sh. Can someone explain what the extension .h5 means and why it was in the script originally?

Elpidam · 2015-09-02T09:26:25Z

.h5 stands for HDF5, which is a data model, library, and file format for storing and managing data. You can find more information at https://www.hdfgroup.org/
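Since the file name alone does not prove what is inside, it can help to check whether a snapshot really is an HDF5 file. The sketch below relies on the HDF5 format signature, the 8 bytes `\211HDF\r\n\032\n` at the start of the file. The function name `is_hdf5` is hypothetical, and it only inspects offset 0, so a file that carries a user block before the signature would be reported as not HDF5.

```shell
# Hypothetical helper: succeed if the file starts with the 8-byte HDF5
# format signature (hex 89 48 44 46 0d 0a 1a 0a).
# Simplification: only offset 0 is checked; HDF5 also allows the signature
# at later offsets when the file has a user block.
is_hdf5() {
    signature=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \t\n')
    [ "$signature" = "894844460d0a1a0a" ]
}
```

Running `is_hdf5 examples/cifar10/cifar10_quick_iter_4000.solverstate && echo HDF5 || echo "not HDF5"` then indicates which loader the file is likely meant for.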

zhudelong · 2015-12-11T13:46:49Z

Hi, I have the same problem, but removing .h5 doesn't fix it for me. Do you have any ideas? Thanks.

Resuming from examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140382562596736:
#000: ../../../src/H5F.c line 1586 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#001: ../../../src/H5F.c line 1275 in H5F_open(): unable to open file: time = Fri Dec 11 21:43:59 2015
, name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#002: ../../../src/H5FD.c line 987 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#003: ../../../src/H5FDsec2.c line 343 in H5FD_sec2_open(): unable to open file: name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
major: File accessibilty
minor: Unable to open file
F1211 21:43:59.571705 7933 sgd_solver.cpp:323] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open solver state file examples/cifar10/cifar10_quick_iter_4000.solverstate.h5

ruggeria · 2016-04-08T12:22:54Z

I'm getting the same problem as everyone else. I've noticed that even when I build the simple demo code from the HDF5 website, I get the same error. I think the problem is on their end.

shelhamer · 2017-04-14T02:15:58Z

Switched back to proto serialization in 8bc82c6.

shelhamer closed this as completed Apr 14, 2017
