AdaDelta Solver (v3) #2782
matthiasplappert
and 3 others
commented on an outdated diff
Jul 18, 2015
@@ -434,7 +434,8 @@ Dtype SGDSolver<Dtype>::GetLearningRate() {
(Dtype(1.) + exp(-this->param_.gamma() * (Dtype(this->iter_) -
Dtype(this->param_.stepsize())))));
} else {
- LOG(FATAL) << "Unknown learning rate policy: " << lr_policy;
+ rate = Dtype(0.);
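The hunk above replaces the fatal error for an unknown `lr_policy` with a rate of 0, presumably so that a solver that ignores the learning rate (as this version of AdaDelta does) can still run. Below is a minimal sketch of the alternative trade-off. It uses a hypothetical standalone helper `LearningRate` and struct `LrParams` (not Caffe's API) that cover three of Caffe's policies, including the sigmoid formula visible in the context lines. An unknown policy is reported as `std::nullopt` rather than as a silent zero:

```cpp
#include <cmath>
#include <optional>
#include <string>

// Hypothetical parameter bundle; the field names echo SolverParameter.
struct LrParams {
  double base_lr;
  double gamma;
  int stepsize;
};

// Hypothetical helper, not Caffe's API. Returns std::nullopt for an
// unrecognized policy so the caller decides whether to abort or fall back.
std::optional<double> LearningRate(const std::string& policy,
                                   const LrParams& p, int iter) {
  if (policy == "fixed") {
    return p.base_lr;
  } else if (policy == "step") {
    // base_lr * gamma ^ floor(iter / stepsize)  (integer division)
    return p.base_lr * std::pow(p.gamma, iter / p.stepsize);
  } else if (policy == "sigmoid") {
    // base_lr / (1 + exp(-gamma * (iter - stepsize)))
    return p.base_lr / (1.0 + std::exp(-p.gamma * (iter - p.stepsize)));
  }
  return std::nullopt;
}
```

With this shape, a typo such as `"sigmod"` still surfaces at the call site. Only a caller that deliberately ignores the rate needs to opt into a fallback, e.g. `LearningRate(policy, p, iter).value_or(0.0)`.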
matthiasplappert
Contributor
matthiasplappert
changed the title from
AdaDelta v3 to AdaDelta Solver (attempt number 3)
Jul 18, 2015
matthiasplappert
changed the title from
AdaDelta Solver (attempt number 3) to AdaDelta Solver (v3)
Jul 18, 2015
Travis failed because of a lint error: the commented-out LOG line triggers it, and it will go away before this is merged anyway (see the comment above).
shelhamer
added the
focus
label
Aug 4, 2015
shelhamer
referenced
this pull request
Aug 4, 2015
Closed
Adaptive Solvers: AdaDelta, RMSprop, and ADAM #2860
ronghanghu
added the
RH
label
Aug 5, 2015
@matthiasplappert Thanks for making the update, but take another look at #2518 and see how the regularization and logging code was pulled out into
ronghanghu
and 1 other
commented on an outdated diff
Aug 6, 2015
@@ -129,6 +129,27 @@ class AdaGradSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); constructor_sanity_check(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); constructor_sanity_check(); }
ronghanghu
Member
ronghanghu
commented on an outdated diff
Aug 6, 2015
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); constructor_sanity_check(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); constructor_sanity_check(); }
+
+ protected:
+ virtual void PreSolve();
+ virtual void ComputeUpdateValue(int param_id, Dtype rate);
+ void constructor_sanity_check() {
+ CHECK_EQ(0, this->param_.base_lr())
+ << "Learning rate cannot be used with AdaDelta.";
+ CHECK_EQ("", this->param_.lr_policy())
+ << "Learning rate policy cannot be applied to AdaDelta.";
+ }
ronghanghu
commented on an outdated diff
Aug 6, 2015
@@ -775,9 +776,192 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
}
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
ronghanghu
Member
ronghanghu
and 1 other
commented on an outdated diff
Aug 6, 2015
+ caffe_axpy(net_params[param_id]->count(),
+ local_decay,
+ net_params[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+ } else if (regularization_type == "L1") {
+ caffe_cpu_sign(net_params[param_id]->count(),
+ net_params[param_id]->cpu_data(),
+ this->temp_[param_id]->mutable_cpu_data());
+ caffe_axpy(net_params[param_id]->count(),
+ local_decay,
+ this->temp_[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
ronghanghu
Member
ronghanghu
and 1 other
commented on an outdated diff
Aug 6, 2015
ronghanghu
commented on the diff
Aug 6, 2015
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
+TYPED_TEST(AdaDeltaSolverTest, TestAdaDeltaLeastSquaresUpdateWithEverything) {
+ typedef typename TypeParam::Dtype Dtype;
+ const Dtype kLearningRate = 0.0;
+ const Dtype kWeightDecay = 0.1;
+ const Dtype kMomentum = 0.95;
+ const int kNumIters = 4;
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
ronghanghu
Member
ronghanghu
and 1 other
commented on an outdated diff
Aug 6, 2015
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ for (int i = 0; i < net_params.size(); ++i) {
+ const vector<int>& shape = net_params[i]->shape();
+ this->history_.push_back(
+ shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ const vector<float>& net_params_weight_decay =
+ this->net_->params_weight_decay();
+ Dtype delta = this->param_.delta();
+ Dtype momentum = this->param_.momentum();
+ Dtype weight_decay = this->param_.weight_decay();
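The `PreSolve` above appends one extra history blob per parameter after the ones allocated by `SGDSolver::PreSolve`, so `history_` ends up with 2N entries for N parameters. The later hunks index the second half as `history_[update_history_offset + param_id]`, which suggests that `update_history_offset` equals N. Below is a sketch of that layout. The `HistoryLayout` struct is hypothetical, and flat `std::vector<float>` buffers stand in for `Blob`:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for a Blob: a flat buffer of `count` zero-initialized values.
using Buffer = std::vector<float>;

// Hypothetical sketch of the 2N-entry history layout.
struct HistoryLayout {
  std::vector<std::shared_ptr<Buffer>> history;
  std::size_t num_params = 0;

  explicit HistoryLayout(const std::vector<std::size_t>& param_counts)
      : num_params(param_counts.size()) {
    // Entries [0, N): what the base solver's PreSolve would allocate.
    for (std::size_t c : param_counts)
      history.push_back(std::make_shared<Buffer>(c, 0.0f));
    // Entries [N, 2N): the extra entries appended for AdaDelta.
    for (std::size_t c : param_counts)
      history.push_back(std::make_shared<Buffer>(c, 0.0f));
  }

  // Running average of squared gradients for parameter `id`.
  Buffer& grad_sq(std::size_t id) { return *history[id]; }
  // Running average of squared updates, offset by N.
  Buffer& update_sq(std::size_t id) { return *history[num_params + id]; }
};
```

The constructors in the header hunk call `PreSolve()` themselves. This is presumably because, inside `SGDSolver`'s constructor, a virtual call still resolves to the base-class version, so the subclass has to append its entries afterwards.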
ronghanghu
commented on the diff
Aug 7, 2015
+
+ // compute the update and copy to net_diff
+ caffe_gpu_mul(net_params[param_id]->count(),
+ net_params[param_id]->gpu_diff(),
+ this->update_[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+
+ // compute square of update
+ caffe_gpu_powx(net_params[param_id]->count(),
+ net_params[param_id]->gpu_diff(), Dtype(2),
+ this->update_[param_id]->mutable_gpu_data());
+
+ // update history of updates
+ caffe_gpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->gpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_gpu_data());
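For reference, the recurrences behind this hunk can be written as below. This sketch follows the AdaDelta method as published by Zeiler (2012). Caffe's `momentum` is read as the decay rate ρ and `delta` as the smoothing constant δ. The value u_t is what ends up in the parameter's diff, which the solver then subtracts from the weights; any learning-rate scaling is omitted. Only the final multiply and the update-history step (the `caffe_gpu_axpby` call) appear in the hunk; the other lines are assumed from the method.

```latex
\begin{aligned}
E[g^2]_t &= \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2 \\
u_t &= \sqrt{\frac{E[u^2]_{t-1} + \delta}{E[g^2]_t + \delta}}\; g_t \\
E[u^2]_t &= \rho\, E[u^2]_{t-1} + (1-\rho)\, u_t^2 \\
\theta_{t+1} &= \theta_t - u_t
\end{aligned}
```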
ronghanghu
Member
ronghanghu
commented on the diff
Aug 7, 2015
+
+ // compute the update
+ caffe_mul(net_params[param_id]->count(),
+ net_params[param_id]->cpu_diff(),
+ this->update_[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+
+ // compute square of update
+ caffe_powx(net_params[param_id]->count(),
+ net_params[param_id]->cpu_diff(), Dtype(2),
+ this->update_[param_id]->mutable_cpu_data());
+
+ // update history of updates
+ caffe_cpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->cpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_cpu_data());
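Below is a minimal CPU sketch of one complete AdaDelta step on plain `std::vector<double>` buffers. It follows the order of the hunk above (compute the update, square it, fold it into the update history) and adds the gradient-history step that the hunk does not show. The names `AdaDeltaState` and `AdaDeltaStep` are hypothetical. `momentum` is assumed to act as the decay rate and `delta` as the smoothing constant, and `lr` is an assumed final scale factor (1.0 gives the unscaled update). This is not Caffe's implementation:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-blob state: two running averages, both starting at zero.
struct AdaDeltaState {
  std::vector<double> grad_sq;    // running average of g^2
  std::vector<double> update_sq;  // running average of u^2
  explicit AdaDeltaState(std::size_t n) : grad_sq(n, 0.0), update_sq(n, 0.0) {}
};

// One step. Assumes weights, grad and the state buffers share one length.
void AdaDeltaStep(std::vector<double>& weights, const std::vector<double>& grad,
                  AdaDeltaState& s, double momentum, double delta, double lr) {
  for (std::size_t i = 0; i < weights.size(); ++i) {
    const double g = grad[i];
    // update history of gradients
    s.grad_sq[i] = momentum * s.grad_sq[i] + (1.0 - momentum) * g * g;
    // compute the update: g scaled by RMS(previous updates) / RMS(gradients)
    const double u =
        g * std::sqrt((s.update_sq[i] + delta) / (s.grad_sq[i] + delta));
    // update history of updates, using the unscaled update
    s.update_sq[i] = momentum * s.update_sq[i] + (1.0 - momentum) * u * u;
    // apply the step; the solver subtracts the diff from the weights
    weights[i] -= lr * u;
  }
}
```

Because both histories start at zero, the first step for a weight with a sizeable gradient is roughly `sqrt(delta / (1 - momentum))` in magnitude (about 0.0045 for `delta = 1e-6`, `momentum = 0.95`), so `delta` effectively sets the initial step size.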
ronghanghu
Member
ronghanghu
and 1 other
commented on an outdated diff
Aug 7, 2015
+ caffe_gpu_axpy(net_params[param_id]->count(),
+ local_decay,
+ net_params[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+ } else if (regularization_type == "L1") {
+ caffe_gpu_sign(net_params[param_id]->count(),
+ net_params[param_id]->gpu_data(),
+ this->temp_[param_id]->mutable_gpu_data());
+ caffe_gpu_axpy(net_params[param_id]->count(),
+ local_decay,
+ this->temp_[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
@matthiasplappert Thanks for your great PR introducing the AdaDelta solver into Caffe! The remaining work includes:
Please modify and update according to the reviews.
@ronghanghu I'll try to find some time over the weekend to get all of this done. We should also thank @kevinbache and especially @mohomran (who wrote the original code), since I just carried on with their work.
I'll resolve the conflict later today and (hopefully) address the remaining issues as well.
Update on this: This branch is now up to date with master and all feedback has been addressed. The tests pass locally and I expect them to also pass on the CI. Please review my changes and let me know if anything else is required on my end, e.g. cleaning up the commit history (not sure how you usually handle this). I've also pointed out the relevant commits in each feedback discussion to hopefully help with reviewing the changes. Finally, I have one suggestion to make: having all solvers in one relatively big file (
@matthiasplappert Thanks a lot for the update. I will review the changes today.
Yes, this is quite a problem. I expect to send a solver refactor PR to split solver.cpp and extract common code for these adaptive gradient solvers, after merging AdaDelta and Adam (#2856).
ronghanghu
commented on an outdated diff
Aug 9, 2015
@@ -159,6 +158,22 @@ class RMSPropSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); }
+
+ protected:
+ void PreSolve();
ronghanghu
Member
ronghanghu
commented on an outdated diff
Aug 9, 2015
@@ -159,6 +158,22 @@ class RMSPropSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); }
+
+ protected:
+ void PreSolve();
+ virtual void Regularize(int param_id);
ronghanghu
Member
ronghanghu
commented on an outdated diff
Aug 9, 2015
@@ -860,6 +860,214 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ for (int i = 0; i < net_params.size(); ++i) {
+ const vector<int>& shape = net_params[i]->shape();
+ this->history_.push_back(
+ shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::Regularize(int param_id) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
ronghanghu
Member
ronghanghu
commented on the diff
Aug 9, 2015
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
+TYPED_TEST(AdaDeltaSolverTest, TestLeastSquaresUpdateWithEverythingAccum) {
+ typedef typename TypeParam::Dtype Dtype;
+ const Dtype kLearningRate = 1.0;
+ const Dtype kWeightDecay = 0.1;
+ const Dtype kMomentum = 0.95;
+ const int kNumIters = 4;
+ const int kIterSize = 2;
+ this->CheckAccumulation(kLearningRate, kWeightDecay, kMomentum, kNumIters,
+ kIterSize);
+}
+
ronghanghu
Member
ronghanghu
commented on an outdated diff
Aug 9, 2015
@@ -860,6 +860,214 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
ronghanghu
Member
ronghanghu
commented on an outdated diff
Aug 9, 2015
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
+#else
+ NO_GPU;
+#endif
+ break;
+ }
+ default:
+ LOG(FATAL) << "Unknown caffe mode: " << Caffe::mode();
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
ronghanghu
Member
@matthiasplappert I just made a few comments above. Let's get the following work done and I think this PR will be ready:
@matthiasplappert A note about history: instead of squashing to a single commit, please squash the commits by each author into a single commit. This will leave three commits, one each by @mohomran, @kevinbache, and yourself. In future work please rebase instead of merging, as our policy is to only have merge commits for PRs. Thanks.
Absolutely, and this was noted in #2860 but deserves its own issue, so I've transplanted it to #2890.
mohomran
and others
added some commits
Sep 20, 2014
@ronghanghu Thanks for the thorough review! I'm still very new to Caffe, so your feedback is very much appreciated. I've addressed the remaining feedback and cleaned up the commit history (also: no more merges). All tests pass locally (not sure whether Travis will pick this up, since the branch was force-pushed to overwrite the history). Let me know if anything else needs to be done before we can land this in master.
@matthiasplappert Thanks for the update! I'll do a final review, and I expect to merge it tomorrow. @jeffdonahue could you also take a look?
ronghanghu
added the
ready for review
label
Aug 10, 2015
This was referenced Aug 10, 2015
ronghanghu
added a commit
that referenced
this pull request
Aug 11, 2015
ronghanghu
ebc3e3b
ronghanghu
merged commit ebc3e3b
into
BVLC:master
Aug 11, 2015
1 check passed
Finished final review. Thanks to @mohomran, @kevinbache, and @matthiasplappert for this excellent AdaDelta solver.
ctrevino
added a commit
to Robotertechnik/caffe
that referenced
this pull request
Aug 11, 2015
ctrevino
ab3842a
ctrevino
added a commit
to Robotertechnik/caffe
that referenced
this pull request
Aug 11, 2015
ctrevino
8c83b88
matthiasplappert commented Jul 18, 2015
Picked up @kevinbache's branch (#2204), merged it with master, resolved merge conflicts and fixed a couple of issues due to API changes. All tests pass.
However, I need input on one change; please see my comment directly in the diff.