CIFAR-10 tutorial train_quick issue - solver state filename #2907

Closed
dymczykm opened this issue Aug 11, 2015 · 7 comments

Comments

@dymczykm

Hi,

I'm new to Caffe, so I may be doing something wrong, but I believe I followed the tutorial here precisely:
http://caffe.berkeleyvision.org/gathered/examples/cifar10.html
I have a clean Caffe installation built from master on OS X 10.10.4.

The problem is that train_quick.sh crashes partway through training: after 4,000 iterations it expects a solver state file with a .h5 extension (this seems related to the recent commit c9b333e):
https://github.com/BVLC/caffe/blob/master/examples/cifar10/train_quick.sh#L11

Backtrace:

I0811 15:05:17.383101 2041144064 solver.cpp:241] Restoring previous solver status from examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
HDF5-DIAG: Error detected in HDF5 (1.8.14) thread 0:
  #000: H5F.c line 604 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 990 in H5F_open(): unable to open file: time = Tue Aug 11 15:05:17 2015
, name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', tent_flags = 0
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 992 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 343 in H5FD_sec2_open(): unable to open file: name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
    major: File accessibilty
    minor: Unable to open file
F0811 15:05:17.399014 2041144064 solver.cpp:702] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open solver state file examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
*** Check failure stack trace: ***
    @        0x1054e0664  google::LogMessage::SendToLog()
    @        0x1054e0c97  google::LogMessage::Flush()
    @        0x1054e46af  google::LogMessageFatal::~LogMessageFatal()
    @        0x1054e1489  google::LogMessageFatal::~LogMessageFatal()
    @        0x10523b624  caffe::SGDSolver<>::RestoreSolverStateFromHDF5()
    @        0x1052333c1  caffe::Solver<>::Restore()
    @        0x105233166  caffe::Solver<>::Solve()
    @        0x10516ceba  train()
    @        0x10516f22f  main
    @     0x7fff8c0f25c9  start
./examples/cifar10/train_quick.sh: line 11: 41703 Abort trap: 6           $TOOLS/caffe train --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt --snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5

However, the training procedure apparently writes its snapshot to a file without the .h5 extension. Editing train_quick.sh to expect just examples/cifar10/cifar10_quick_iter_4000.solverstate resolved the issue.
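
A minimal sketch of the workaround described above, assuming the snapshot prefix used by train_quick.sh. The helper name `resolve_solverstate` is hypothetical and not part of Caffe; it prints whichever of the two file names exists, so the script does not depend on which serialization format wrote the snapshot.

```shell
# Print the first existing solver-state file for a snapshot prefix.
# Tries the plain .solverstate name first, then the .solverstate.h5 name.
# `resolve_solverstate` is a hypothetical helper, not part of Caffe.
resolve_solverstate() {
    prefix=$1
    for candidate in "$prefix.solverstate" "$prefix.solverstate.h5"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no solver state found for $prefix" >&2
    return 1
}

# Possible use inside train_quick.sh (shown as comments, not executed here):
#   STATE=$(resolve_solverstate examples/cifar10/cifar10_quick_iter_4000) || exit 1
#   $TOOLS/caffe train \
#       --solver=examples/cifar10/cifar10_quick_solver_lr1.prototxt \
#       --snapshot="$STATE"
```

The function only checks that a file exists; it does not verify that the file contents match the extension.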

Correct me if I'm wrong, but it looks like the training procedure and the snapshot loading are incompatible.

Cheers,
Marcin

@xiaohaoChen

I'm also new to Caffe and ran into the same problem. I don't know whether your diagnosis is correct. If you find a solution, please let me know. Thanks!

@FishermanZzhang

I ran into the same problem. I edited the script train_quick.sh, changing the final line from
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
to
--snapshot=examples/cifar10/cifar10_quick_iter_4000.solverstate
That resolved the error, because that is what the older version of the script used. However, I would like to know why the error occurs and why the author added the .h5 suffix.
@xiaohaoChen

@serhan-gul

I experienced the same problem and had to fix train_quick.sh. Can someone explain what the .h5 extension means and why it was in the script originally?

@Elpidam

Elpidam commented Sep 2, 2015

h5 stands for HDF5, which is a data model, library, and file format for storing and managing data. You can find more information here: https://www.hdfgroup.org/
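
For readers wondering how to tell the two kinds of solver state apart on disk: HDF5 files begin with a fixed 8-byte signature, so a quick check is possible from the shell. This is a sketch only; `is_hdf5` is a hypothetical helper, and it looks only at the first 8 bytes of the file.

```shell
# Return success if the file starts with the 8-byte HDF5 signature
# 89 48 44 46 0d 0a 1a 0a (the bytes "\x89HDF\r\n\x1a\n").
# `is_hdf5` is a hypothetical helper; it inspects offset 0 only.
is_hdf5() {
    # Dump the first 8 bytes as hex and strip all spacing.
    sig=$(head -c 8 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "894844460d0a1a0a" ]
}

# Possible use (shown as comments, not executed here):
#   if is_hdf5 examples/cifar10/cifar10_quick_iter_4000.solverstate; then
#       echo "HDF5 solver state"
#   else
#       echo "not HDF5 (or file missing)"
#   fi
```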

@zhudelong

Hi, I have the same problem, but removing .h5 doesn't fix it for me. Do you have any ideas? Thanks.

Resuming from examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
HDF5-DIAG: Error detected in HDF5 (1.8.11) thread 140382562596736:
#000: ../../../src/H5F.c line 1586 in H5Fopen(): unable to open file
major: File accessibilty
minor: Unable to open file
#1: ../../../src/H5F.c line 1275 in H5F_open(): unable to open file: time = Fri Dec 11 21:43:59 2015
, name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', tent_flags = 0
major: File accessibilty
minor: Unable to open file
#2: ../../../src/H5FD.c line 987 in H5FD_open(): open failed
major: Virtual File Layer
minor: Unable to initialize object
#3: ../../../src/H5FDsec2.c line 343 in H5FD_sec2_open(): unable to open file: name = 'examples/cifar10/cifar10_quick_iter_4000.solverstate.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0
major: File accessibilty
minor: Unable to open file
F1211 21:43:59.571705 7933 sgd_solver.cpp:323] Check failed: file_hid >= 0 (-1 vs. 0) Couldn't open solver state file examples/cifar10/cifar10_quick_iter_4000.solverstate.h5
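
The log above still names the .h5 path, so one possible explanation (an assumption, not confirmed in this thread) is that the script being run still passes the .h5 name. A small diagnostic sketch, using the hypothetical helper `check_snapshot_args`, lists the --snapshot arguments found in a script and reports whether each file exists relative to the current directory:

```shell
# List every --snapshot=PATH argument in a script and report whether the
# file exists. `check_snapshot_args` is a hypothetical helper; it handles
# literal, unquoted paths only (no shell variables or quoting).
check_snapshot_args() {
    grep -o -- '--snapshot=[^[:space:]]*' "$1" | while IFS= read -r arg; do
        path=${arg#--snapshot=}
        if [ -f "$path" ]; then
            echo "found:   $path"
        else
            echo "missing: $path"
        fi
    done
}

# Possible use from the Caffe root directory (not executed here):
#   check_snapshot_args examples/cifar10/train_quick.sh
```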

@ruggeria

ruggeria commented Apr 8, 2016

I'm getting the same problem as everyone else. I've noticed that even when I try to build the simple demo code from the HDF5 website, I get the same error. I think the problem is on their end.

@shelhamer
Member

Switched back to proto serialization in 8bc82c6.
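
To see which solver-state file name a given solver would produce, a rough sketch follows. It assumes that Caffe's solver prototxt accepts a `snapshot_format` field, and that HDF5 snapshots get the .solverstate.h5 suffix while proto snapshots get plain .solverstate; verify this against your Caffe version. `solverstate_suffix` is a hypothetical helper, not a Caffe tool.

```shell
# Print the solver-state suffix implied by a solver prototxt.
# Assumption: `snapshot_format: HDF5` yields ".solverstate.h5"; anything
# else, including an absent field, yields ".solverstate".
# `solverstate_suffix` is a hypothetical helper, not a Caffe tool.
solverstate_suffix() {
    if grep -Eq '^[[:space:]]*snapshot_format:[[:space:]]*HDF5' "$1"; then
        echo ".solverstate.h5"
    else
        echo ".solverstate"
    fi
}

# Possible use from the Caffe root directory (not executed here):
#   solverstate_suffix examples/cifar10/cifar10_quick_solver.prototxt
```

Comparing this suffix with the --snapshot argument in the training script shows whether the two agree before a long training run starts.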
