Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Solver to allow interactive stepping #1228

Merged
merged 2 commits into from Jan 7, 2015

Conversation

longjon
Copy link
Contributor

@longjon longjon commented Oct 7, 2014

In this PR, the main loop of Solver::Solve is factored out into Solver::Step, which is exposed to Python. This allows interactive stepping of the solver.

Warning: I've only actually tested a parallel version of this on another branch.

@rodrigob
Copy link
Contributor

rodrigob commented Oct 9, 2014

@longjon I am currently playing on top of this branch.
How do you handle things that happen at the end of Solver:Solve on python side ?
E.g. things like handling snapshot_after_train().

Right now I am simply planning to expose these to python, so as to have a python re-implementation of Solver::Solve, and then add there callbacks to python plotting magic (see #481).

@longjon
Copy link
Contributor Author

longjon commented Oct 9, 2014

@rodrigob Cool, let me know (or PR this branch) if you find any errors.

In my use, I don't worry about things happening at the end of Solve, because there is no end; I am always running things until I've decided they've converged, diverged, or otherwise failed.

But yes, I essentially use a Python implementation of Solve, which is just a loop around solver.step, with any extras that I want thrown in, and preceded by a restore (not exposed yet) or a copy_from (#1195) if desired. Exposing Snapshot as well seems right.

@jeffdonahue
Copy link
Contributor

LGTM

@muupan
Copy link

muupan commented Nov 12, 2014

I'm using this PR for implementing online reinforcement learning. I think it's very useful but needs a small fix to work safely:

Solver's member variable initialized_ should be initialized with false in the constructor, otherwise calling Step() will cause undefined behavior because it checks the value of the uninitialized variable initialized_ at https://github.com/longjon/caffe/blob/solver-step/src/caffe/solver.cpp#L191

This is how I fixed it: muupan@d098036

@longjon
Copy link
Contributor Author

longjon commented Dec 31, 2014

@muupan you're right, of course. I've updated this with a rebase of my most recent working version.

@longjon longjon force-pushed the solver-step branch 2 times, most recently from 5f43178 to 033bafe Compare December 31, 2014 23:43
@longjon longjon mentioned this pull request Dec 31, 2014
longjon added a commit to longjon/caffe that referenced this pull request Jan 1, 2015
Refactor Solver to allow interactive stepping
longjon added a commit to longjon/caffe that referenced this pull request Jan 2, 2015
Refactor Solver to allow interactive stepping

Conflicts:
	src/caffe/solver.cpp
@longjon
Copy link
Contributor Author

longjon commented Jan 2, 2015

Updated: the snapshot logic has been changed to properly snapshot after every _k_th iteration, instead of snapshotting before the iteration following every _k_th. (In other words, snapshotting has been moved to the end of the solver loop.)

This simplifies the logic (except the "snapshot after train" logic), and ensures that Step(k) will end up with a snapshot on disk, instead of waiting for the next Step call, which can be important if something goes wrong between those two calls.

}

template <typename Dtype>
void Solver<Dtype>::PreSolve() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO since this method is common setup specific to the Solver class itself, it should just be given a new name like Initialize() (and then called separately before PreSolve() @ line 236), so that anyone subclassing Solver doesn't need to remember to call it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that forcing subclassers to call PreSolve is awkward. Is there any reason we don't just call PreSolve from Init? That would simplify the logic and get rid of initialized_, as there would be no such thing as an uninitialized solver, and then it makes sense to do the base class initialization right in Init. I think I'll try it and update the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we can't do that, because virtual functions can't be called from constructors. Here's a slightly more radical idea, which I've implemented now: get rid of Solver::PreSolve and just use the constructors, since constructors now provide the same functionality. SGDSolver::PreSolve, which was the only actual PreSolve, remains as it was and gets called from the SGDSolver constructor.

@longjon longjon force-pushed the solver-step branch 2 times, most recently from 147555a to 9478a48 Compare January 3, 2015 00:19
// Remember the initial iter_ value; will be non-zero if we loaded from a
// resume_file above.
void Solver<Dtype>::Step(int iters) {
vector<Blob<Dtype>*> bottom_vec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be removed since you changed the call to ForwardPrefilled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed call is in Solve, this is in Step, where the call is to ForwardBackward. An alternative is to remove bottom_vec but call ForwardPrefilled and Backward separately; that's what I've now implemented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think I'll revert that, because it makes #1663 awkward. We should probably just clean up the Net interface to avoid these dummy vectors at some later point.

@longjon
Copy link
Contributor Author

longjon commented Jan 6, 2015

@jeffdonahue I think I'm satisfied with this now; merge when ready.

jeffdonahue added a commit that referenced this pull request Jan 7, 2015
Refactor Solver to allow interactive stepping
@jeffdonahue jeffdonahue merged commit c6a88bf into BVLC:dev Jan 7, 2015
@jeffdonahue
Copy link
Contributor

Cool, thanks Jon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants