New lr policies, MultiStep and StepEarly #190
Conversation
Nice policies Sergio. Thanks for the examples. Could you also include tests? Learning rate policies and termination criteria (#76) are both scheduled parts of the solver, and the conversation kind of stalled on the best way to add these. The options were observer/notify classes, coding right into the solver, or making learning rate and termination factories like the layer factory. I think refactoring to a LearningRateFactory could be nice and orderly, and then the solver would call the LearningRate for any updates. What do you think? Re: naming, StepPlateau or StepFlat might be more descriptive than StepEarly. Or, as you suggested elsewhere, EarlyStep has a nice relationship to early stopping.
@shelhamer, I agree with you, since this PR increases the number of learning rates past @Yangqing's refactoring threshold. I will use AdaptiveLearningRateFactory and AdaptiveLearningRate when I get the time to solve #30. AdaptiveLearningRate cannot be mixed with LearningRate because of their different APIs:

```cpp
template <typename Dtype>
class LearningRate {
 public:
  // Returns the global learning rate for the given iteration.
  Dtype schedule(const int iteration);
};

template <typename Dtype>
class AdaptiveLearningRate {
 public:
  // Returns a parameter-wise learning rate.
  shared_ptr<Blob<Dtype> > schedule(const int iteration,
                                    const shared_ptr<Blob<Dtype> > gradient);
};
```
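For context, a minimal sketch of how a solver loop might consume the non-adaptive API (the function and variable names here are illustrative, not part of the PR):

```cpp
// Hypothetical use of the proposed API: the solver queries the schedule
// once per iteration and scales its parameter update by the result.
template <typename Dtype>
void ApplyUpdate(LearningRate<Dtype>& lr, const int iteration) {
  const Dtype rate = lr.schedule(iteration);
  // ... scale the computed gradients by `rate` and update the weights ...
}
```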
Having a multistep decrease is definitely useful. The only thing I'd like to add is that having an unlimited number of steps makes parametrizing Caffe more difficult. (The reason I bring this up is that I am running hyperparameter optimization on Caffe.) So maybe instead of having to set each step, the stepsize could, just like the learning rate, follow a parametric function, e.g. decay exponentially or linearly. Let me know what you think.
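(One concrete form of this suggestion: let the i-th stepsize be s0 * alpha^i, with 0 < alpha < 1 for exponential decay, so the step positions are the cumulative sums of these sizes and the whole schedule needs only two hyperparameters instead of an open-ended list. This is just an illustration of the idea, not anything implemented in this PR.)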
@tdomhan I will first fix the current new_lr_decay policies and will add more later.
@sguada it'd be great to include these policies, and multistep would simplify the cifar-10 example.
Hi Caffe! Just a heads up to say I did try the merge on my own repo and ran the tests just as Travis did, and I am not getting any errors. FYI, Travis reports this after all tests are successful:
Approximately when will this commit be available?
@Mezn it is available, let me know if you have any problems.
@sguada my understanding is that stepearly is not part of the commit. Also, the *.prototxt files for mnist are in examples/lenet instead of examples/mnist. My 2 cents :) and thanks for this.
@sguada thanks for the explanations. I'm into stochastic optimization, so I'd be interested in looking at the old stepearly code. FYI, I am experimenting with a 'stagnation' policy relying on the median losses and/or tests in order to speed up the overall training time.
Let's remove `examples/lenet/lenet_stepearly_solver.prototxt`.
This `examples/lenet/lenet_stepearly_solver.prototxt` was introduced in BVLC#190 by mistake, since stepearly was never actually merged.
`lenet_multistep_solver.prototxt`
Allows multiple steps to be defined in the solver.prototxt by setting `lr_policy: "multistep"` and by defining a `stepvalue` for each point at which the learning rate should be decreased. This allows steps that are not evenly distributed. One should define the sequence of `stepvalue` entries in increasing order, as in the sketch below.
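A minimal sketch of such a solver definition (the values here are illustrative, not taken from the actual `lenet_multistep_solver.prototxt`):

```
# Drop the learning rate by a factor of gamma at each listed stepvalue.
base_lr: 0.01
lr_policy: "multistep"
gamma: 0.1
stepvalue: 5000   # stepvalue entries must appear in increasing order
stepvalue: 7000
stepvalue: 8000
max_iter: 10000
```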
`lenet_stepearly_solver.prototxt`
Allows the `lr_rate` to be decreased dynamically based on the behaviour of the test accuracy. The learning rate will be decreased when the maximum accuracy has not increased for a number of tests defined by `stepearly`.
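A hypothetical sketch of such a solver (stepearly was never actually merged, so the field names below are assumptions based on the description above, not real Caffe syntax):

```
# Hypothetical stepearly solver: decrease the rate by gamma whenever
# stepearly consecutive tests pass without a new maximum accuracy.
base_lr: 0.01
lr_policy: "stepearly"
gamma: 0.1
stepearly: 5      # assumed field: tests without improvement before decreasing
test_interval: 500
```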