AdaDelta Solver (v3) #2782

Merged
merged 3 commits into BVLC:master from matthiasplappert/adadelta on Aug 11, 2015

7 participants
Contributor

matthiasplappert commented Jul 18, 2015

Picked up @kevinbache's branch (#2204), merged it with master, resolved merge conflicts and fixed a couple of issues due to API changes. All tests pass.

However, I need input on one change, please see comment directly in the diff.

@matthiasplappert matthiasplappert and 3 others commented on an outdated diff Jul 18, 2015

src/caffe/solver.cpp
@@ -434,7 +434,8 @@ Dtype SGDSolver<Dtype>::GetLearningRate() {
(Dtype(1.) + exp(-this->param_.gamma() * (Dtype(this->iter_) -
Dtype(this->param_.stepsize())))));
} else {
- LOG(FATAL) << "Unknown learning rate policy: " << lr_policy;
+ rate = Dtype(0.);
@matthiasplappert

matthiasplappert Jul 18, 2015

Contributor

I'm unsure what the best way to solve this is. The problem is that the AdaDelta solver does not support a learning rate. However, since AdaDelta inherits from SGD, and SGD calls ApplyUpdate which, in turn, calls this method, we trigger the default case and therefore the fatal log (which is currently commented out). Returning a rate of 0.0 works fine, but is likely to cause errors in other areas of the code base where a valid learning rate is expected. Any input on this is greatly appreciated!

@seanbell

seanbell Jul 20, 2015

Contributor

One possible idea: keep the learning rate schedule, and treat it as a multiplier on the AdaDelta update step size. The only ugly part of this solution is that it would require the user to specify base_lr: 1 and lr_policy: 'fixed' in order to get the default behavior.

@matthiasplappert

matthiasplappert Jul 25, 2015

Contributor

That would be a possible solution. Before going any further with this, is adding AdaDelta even of interest for Caffe? I don't want to invest time in this if it's not likely to land in master eventually.

@PatWie

PatWie Aug 4, 2015

Contributor

I would strongly argue for shipping AdaDelta within the Caffe framework. I was surprised that it isn't already in the master branch.

@ronghanghu

ronghanghu Aug 6, 2015

Member

I am also strongly in favor of having AdaDelta in Caffe. I'll go over and review this PR today.

@ronghanghu

ronghanghu Aug 6, 2015

Member

For the learning rate issue, I suggest using base_lr: 1 and lr_policy: 'fixed'.

I suppose a learning rate specification is still sometimes needed, even if you use AdaDelta. Take fine-tuning as an example: you may still want a smaller learning rate on pre-trained layers than on randomly initialized layers, even if you use AdaDelta.

For clarity, let's change line 7 of Algorithm 1 in the AdaDelta paper from:

x(t+1) = x(t) + delta_x(t)

to

x(t+1) = x(t) + local_rate * delta_x(t)

where local_rate = base_lr * lr_mult is the local learning rate for each parameter blob.
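
To make the proposal concrete, here is a minimal standalone C++ sketch of one AdaDelta step with the extra local_rate factor (not Caffe code; function and variable names are illustrative, and the squared update is accumulated before scaling, matching the review comments below):

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch of the modified update rule (not Caffe code).
// grad_hist accumulates E[g^2], update_hist accumulates E[dx^2]
// (Algorithm 1 in the AdaDelta paper); momentum plays the role of rho
// and delta the role of the conditioning constant epsilon.
void AdaDeltaStep(std::vector<double>& x, const std::vector<double>& grad,
                  std::vector<double>& grad_hist,
                  std::vector<double>& update_hist,
                  double momentum, double delta, double local_rate) {
  for (std::size_t i = 0; i < x.size(); ++i) {
    // E[g^2](t) = momentum * E[g^2](t-1) + (1 - momentum) * g(t)^2
    grad_hist[i] = momentum * grad_hist[i]
                 + (1.0 - momentum) * grad[i] * grad[i];
    // delta_x(t) = -(RMS[delta_x](t-1) / RMS[g](t)) * g(t)
    double update = -std::sqrt((update_hist[i] + delta)
                             / (grad_hist[i] + delta)) * grad[i];
    // Accumulate E[dx^2] from the *unscaled* update.
    update_hist[i] = momentum * update_hist[i]
                   + (1.0 - momentum) * update * update;
    // x(t+1) = x(t) + local_rate * delta_x(t)
    x[i] += local_rate * update;
  }
}

int main() {
  std::vector<double> x(1, 0.0), g(1, 0.5), eg2(1, 0.0), edx2(1, 0.0);
  AdaDeltaStep(x, g, eg2, edx2, 0.95, 1e-6, 1.0);  // local_rate = base_lr * lr_mult
}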

matthiasplappert changed the title from AdaDelta v3 to AdaDelta Solver (attempt number 3) Jul 18, 2015

matthiasplappert changed the title from AdaDelta Solver (attempt number 3) to AdaDelta Solver (v3) Jul 18, 2015

Contributor

matthiasplappert commented Jul 18, 2015

Travis failed because of a lint error (the commented-out LOG is causing it; it will go away before merging anyway, see comment above).

shelhamer referenced this pull request Aug 4, 2015

Closed

AdaDelta v2 #2204

shelhamer added the focus label Aug 4, 2015

shelhamer referenced this pull request Aug 4, 2015

Closed

Adaptive Solvers: AdaDelta, RMSprop, and ADAM #2860

3 of 3 tasks complete

ronghanghu added the RH label Aug 5, 2015

Owner

shelhamer commented Aug 6, 2015

@matthiasplappert thanks for making the update, but take another look at #2518 and see how the regularization and logging code was pulled out into SGDSolver.

@ronghanghu ronghanghu and 1 other commented on an outdated diff Aug 6, 2015

include/caffe/solver.hpp
@@ -129,6 +129,27 @@ class AdaGradSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); constructor_sanity_check(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); constructor_sanity_check(); }
@ronghanghu

ronghanghu Aug 6, 2015

Member

I suppose you have something wrong here. Now you are calling PreSolve() in the constructors of both AdaDeltaSolver and SGDSolver, and since you turned it into a virtual method, you are now calling AdaDeltaSolver::PreSolve() twice when constructing an AdaDeltaSolver instance. Is that the desired behavior?
Sorry, I was wrong here. Before the derived class constructor is called, the dynamic type of the object under construction is the base class, not the derived class. For this reason, you are still calling AdaDeltaSolver::PreSolve() in AdaDeltaSolver::AdaDeltaSolver after calling SGDSolver::PreSolve() in SGDSolver::SGDSolver. However, I still don't see a reason to make PreSolve a virtual function, and in general it is not good to call a virtual function inside a constructor in C++.

Also see the comment below in AdaDeltaSolver<Dtype>::PreSolve().
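
For illustration, a minimal standalone C++ example of the constructor rule described above (class names are hypothetical, not Caffe's):

#include <iostream>

// While Base's constructor runs, the dynamic type of the object is Base,
// so the call below resolves to Base::Init even for a Derived object.
struct Base {
  Base() { Init(); }
  virtual void Init() { std::cout << "Base::Init\n"; }
};

struct Derived : Base {
  Derived() { Init(); }  // here the dynamic type is Derived
  void Init() override { std::cout << "Derived::Init\n"; }
};

int main() {
  Derived d;  // prints "Base::Init", then "Derived::Init"
}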

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

virtual issue addressed in aedff90

@ronghanghu ronghanghu commented on an outdated diff Aug 6, 2015

include/caffe/solver.hpp
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); constructor_sanity_check(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); constructor_sanity_check(); }
+
+ protected:
+ virtual void PreSolve();
+ virtual void ComputeUpdateValue(int param_id, Dtype rate);
+ void constructor_sanity_check() {
+ CHECK_EQ(0, this->param_.base_lr())
+ << "Learning rate cannot be used with AdaDelta.";
+ CHECK_EQ("", this->param_.lr_policy())
+ << "Learning rate policy cannot be applied to AdaDelta.";
+ }
@ronghanghu

ronghanghu Aug 6, 2015

Member

Let's still keep base_lr and lr_policy as discussed.

@ronghanghu ronghanghu commented on an outdated diff Aug 6, 2015

src/caffe/solver.cpp
@@ -775,9 +776,192 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
}
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
@ronghanghu

ronghanghu Aug 6, 2015

Member

Are you trying to call AdaDeltaSolver<Dtype>::PreSolve() after calling SGDSolver<Dtype>::PreSolve()? Then I don't see a reason for making PreSolve() a virtual function.

@ronghanghu ronghanghu and 1 other commented on an outdated diff Aug 6, 2015

src/caffe/solver.cpp
+ caffe_axpy(net_params[param_id]->count(),
+ local_decay,
+ net_params[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+ } else if (regularization_type == "L1") {
+ caffe_cpu_sign(net_params[param_id]->count(),
+ net_params[param_id]->cpu_data(),
+ this->temp_[param_id]->mutable_cpu_data());
+ caffe_axpy(net_params[param_id]->count(),
+ local_decay,
+ this->temp_[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
@ronghanghu

ronghanghu Aug 6, 2015

Member

Remove the regularization code here. Regularization should be handled in SGDSolver<Dtype>::Regularize. Take AdaGradSolver as an example.

@ronghanghu ronghanghu and 1 other commented on an outdated diff Aug 6, 2015

src/caffe/test/test_gradient_based_solver.cpp
@@ -61,8 +62,6 @@ class GradientBasedSolverTest : public MultiDeviceTest<TypeParam> {
ostringstream proto;
proto <<
"max_iter: " << num_iters << " "
- "base_lr: " << learning_rate << " "
- "lr_policy: 'fixed' "
"iter_size: " << iter_size << " "
@ronghanghu

ronghanghu Aug 6, 2015

Member

Let's keep the base_lr and lr_policy as discussed above.

@ronghanghu ronghanghu commented on the diff Aug 6, 2015

src/caffe/test/test_gradient_based_solver.cpp
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
+TYPED_TEST(AdaDeltaSolverTest, TestAdaDeltaLeastSquaresUpdateWithEverything) {
+ typedef typename TypeParam::Dtype Dtype;
+ const Dtype kLearningRate = 0.0;
+ const Dtype kWeightDecay = 0.1;
+ const Dtype kMomentum = 0.95;
+ const int kNumIters = 4;
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
@ronghanghu

ronghanghu Aug 6, 2015

Member

Please add one more test case:

  • TestLeastSquaresUpdateWithEverythingAccum (where you may set kNumIters = 4 and kIterSize = 2 and use CheckAccumulation)

You may take a look at the AdaGradSolverTest cases for details.

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

Addressed in e4eb50b

@ronghanghu ronghanghu and 1 other commented on an outdated diff Aug 6, 2015

src/caffe/solver.cpp
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ for (int i = 0; i < net_params.size(); ++i) {
+ const vector<int>& shape = net_params[i]->shape();
+ this->history_.push_back(
+ shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ const vector<float>& net_params_weight_decay =
+ this->net_->params_weight_decay();
+ Dtype delta = this->param_.delta();
+ Dtype momentum = this->param_.momentum();
+ Dtype weight_decay = this->param_.weight_decay();
@ronghanghu

ronghanghu Aug 6, 2015

Member

Let's add Dtype local_rate = rate * net_params_lr[param_id]; here.

@ronghanghu ronghanghu commented on the diff Aug 7, 2015

src/caffe/solver.cpp
+
+ // compute the update and copy to net_diff
+ caffe_gpu_mul(net_params[param_id]->count(),
+ net_params[param_id]->gpu_diff(),
+ this->update_[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+
+ // compute square of update
+ caffe_gpu_powx(net_params[param_id]->count(),
+ net_params[param_id]->gpu_diff(), Dtype(2),
+ this->update_[param_id]->mutable_gpu_data());
+
+ // update history of updates
+ caffe_gpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->gpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_gpu_data());
@ronghanghu

ronghanghu Aug 7, 2015

Member

Let's add the local_rate multiplication after this line, once the square of the update has been computed (don't scale the update by local_rate before computing its square).

caffe_gpu_scale(net_params[param_id]->count(), local_rate,
    net_params[param_id]->gpu_diff(),
    net_params[param_id]->mutable_gpu_diff());

@ronghanghu ronghanghu commented on the diff Aug 7, 2015

src/caffe/solver.cpp
+
+ // compute the update
+ caffe_mul(net_params[param_id]->count(),
+ net_params[param_id]->cpu_diff(),
+ this->update_[param_id]->cpu_data(),
+ net_params[param_id]->mutable_cpu_diff());
+
+ // compute square of update
+ caffe_powx(net_params[param_id]->count(),
+ net_params[param_id]->cpu_diff(), Dtype(2),
+ this->update_[param_id]->mutable_cpu_data());
+
+ // update history of updates
+ caffe_cpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->cpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_cpu_data());
@ronghanghu

ronghanghu Aug 7, 2015

Member

Let's add the local_rate multiplication after this line, once the square of the update has been computed (don't scale the update by local_rate before computing its square).

caffe_cpu_scale(net_params[param_id]->count(), local_rate,
    net_params[param_id]->cpu_diff(),
    net_params[param_id]->mutable_cpu_diff());

@ronghanghu ronghanghu and 1 other commented on an outdated diff Aug 7, 2015

src/caffe/solver.cpp
+ caffe_gpu_axpy(net_params[param_id]->count(),
+ local_decay,
+ net_params[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+ } else if (regularization_type == "L1") {
+ caffe_gpu_sign(net_params[param_id]->count(),
+ net_params[param_id]->gpu_data(),
+ this->temp_[param_id]->mutable_gpu_data());
+ caffe_gpu_axpy(net_params[param_id]->count(),
+ local_decay,
+ this->temp_[param_id]->gpu_data(),
+ net_params[param_id]->mutable_gpu_diff());
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
@ronghanghu

ronghanghu Aug 7, 2015

Member

Remove the regularization code here.

Member

ronghanghu commented Aug 7, 2015

@matthiasplappert Thanks for your great PR to introduce the AdaDelta solver into Caffe!

The remaining work includes:

  • Add learning rate.
  • Remove regularization.
  • Add more test cases.
  • Change PreSolve() back to be non-virtual.

Please modify and update according to the reviews.

Contributor

matthiasplappert commented Aug 7, 2015

@ronghanghu I'll try to find some time over the weekend to get all of this done. We should also thank @kevinbache and especially @mohomran (who wrote the original code), since I just carried on with their work.

Member

ronghanghu commented Aug 9, 2015

#2836 and #2866 introduced new conflicts to be resolved.

Contributor

matthiasplappert commented Aug 9, 2015

I'll resolve the conflicts later today and (hopefully) address the remaining issues as well.

  • Add learning rate.
  • Remove regularization.
  • Add more test cases.
  • Change PreSolve() back to be non-virtual.

Contributor

matthiasplappert commented Aug 9, 2015

Update on this: This branch is now up-to-date with master and all feedback has been addressed. The tests pass locally and I expect them to also pass on the CI.

Please review my changes and let me know if anything else is required on my end, e.g. cleaning up the commit history (I'm not sure how you usually handle this). I've also pointed out the relevant commits in each feedback discussion to hopefully help with reviewing the changes.

Finally, I have one suggestion: having all solvers in one relatively big file (solver.cpp) proved to be a real pain while resolving the merge conflicts. The problem is that RMSProp and AdaDelta were completely interleaved, since they share a lot of similar code. I would propose eventually splitting the individual solvers into separate files to avoid this in the future. Should I open an issue for that?

Member

ronghanghu commented Aug 9, 2015

@matthiasplappert Thanks a lot for the update. I will review the changes today.

Finally, I have one suggestion: having all solvers in one relatively big file (solver.cpp) proved to be a real pain while resolving the merge conflicts. The problem is that RMSProp and AdaDelta were completely interleaved, since they share a lot of similar code. I would propose eventually splitting the individual solvers into separate files to avoid this in the future. Should I open an issue for that?

Yes, this is quite a problem. I expect to send a solver refactor PR to split solver.cpp and extract common code for these adaptive gradient solvers, after merging AdaDelta and Adam (#2856).

@ronghanghu ronghanghu commented on an outdated diff Aug 9, 2015

include/caffe/solver.hpp
@@ -159,6 +158,22 @@ class RMSPropSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); }
+
+ protected:
+ void PreSolve();
@ronghanghu

ronghanghu Aug 9, 2015

Member

I think it is better to rename AdaDeltaSolver::PreSolve() to AdaDeltaSolver::AdaDeltaPreSolve(). Since you are going to call AdaDeltaSolver's presolve function after SGDSolver's, it is better to avoid a name conflict with SGDSolver::PreSolve(), regardless of whether it is a virtual function.
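
A standalone sketch of the suggested shape (stand-in class names, not the actual Caffe declarations):

#include <iostream>

// The derived class's extra setup gets its own name, so it neither hides
// nor virtually dispatches through the base class's non-virtual PreSolve().
struct SGDSolverSketch {
  SGDSolverSketch() { PreSolve(); }
  void PreSolve() { std::cout << "SGD history allocated\n"; }
};

struct AdaDeltaSolverSketch : SGDSolverSketch {
  // The base constructor has already run PreSolve() by this point.
  AdaDeltaSolverSketch() { AdaDeltaPreSolve(); }
  void AdaDeltaPreSolve() { std::cout << "extra AdaDelta history allocated\n"; }
};

int main() {
  AdaDeltaSolverSketch s;  // each setup step runs exactly once, in order
}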

@ronghanghu ronghanghu commented on an outdated diff Aug 9, 2015

include/caffe/solver.hpp
@@ -159,6 +158,22 @@ class RMSPropSolver : public SGDSolver<Dtype> {
};
template <typename Dtype>
+class AdaDeltaSolver : public SGDSolver<Dtype> {
+ public:
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); }
+
+ protected:
+ void PreSolve();
+ virtual void Regularize(int param_id);
@ronghanghu

ronghanghu Aug 9, 2015

Member

I didn't find a significant difference between your AdaDeltaSolver::Regularize and the original SGDSolver::Regularize (the former is almost duplicated from the latter).

We can just use SGDSolver::Regularize and avoid introducing a new regularization function in AdaDeltaSolver.

@ronghanghu ronghanghu commented on an outdated diff Aug 9, 2015

src/caffe/solver.cpp
@@ -860,6 +860,214 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
+ for (int i = 0; i < net_params.size(); ++i) {
+ const vector<int>& shape = net_params[i]->shape();
+ this->history_.push_back(
+ shared_ptr<Blob<Dtype> >(new Blob<Dtype>(shape)));
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::Regularize(int param_id) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
@ronghanghu

ronghanghu Aug 9, 2015

Member

Remove the entire AdaDeltaSolver::Regularize function.

The only difference between your AdaDeltaSolver::Regularize and the original SGDSolver::Regularize seems to be that you use const vector<shared_ptr<Blob<Dtype> > >& net_params rather than const vector<Blob<Dtype>*>& net_params. The rest is all the same.

Note that after #2866, one should use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); to be consistent.

So, I believe we don't need an AdaDeltaSolver::Regularize here. Let's just use SGDSolver::Regularize instead.

@ronghanghu ronghanghu commented on the diff Aug 9, 2015

src/caffe/test/test_gradient_based_solver.cpp
+ for (int i = 0; i <= kNumIters; ++i) {
+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+
+TYPED_TEST(AdaDeltaSolverTest, TestLeastSquaresUpdateWithEverythingAccum) {
+ typedef typename TypeParam::Dtype Dtype;
+ const Dtype kLearningRate = 1.0;
+ const Dtype kWeightDecay = 0.1;
+ const Dtype kMomentum = 0.95;
+ const int kNumIters = 4;
+ const int kIterSize = 2;
+ this->CheckAccumulation(kLearningRate, kWeightDecay, kMomentum, kNumIters,
+ kIterSize);
+}
+
@ronghanghu

ronghanghu Aug 9, 2015

Member

#2836 and #2866 modify the solver and introduce new shared-parameter and snapshot tests. To be consistent, let's also add 4 new test cases:

TestLeastSquaresUpdateWithEverythingShare
TestLeastSquaresUpdateWithEverythingAccumShare
TestSnapshot
TestSnapshotShare

For TestSnapshot, you can take a look at TestSnapshot in SGDSolverTest as an example. For the other 3 shared cases, you just need to add this->share_ = true; to the corresponding test cases, as in the sketch below.
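
For example, a sketch of one of the shared cases, assuming the share_ flag and fixture from test_gradient_based_solver.cpp and following the pattern of the existing AdaDelta tests above:

TYPED_TEST(AdaDeltaSolverTest, TestLeastSquaresUpdateWithEverythingShare) {
  typedef typename TypeParam::Dtype Dtype;
  const Dtype kLearningRate = 1.0;
  const Dtype kWeightDecay = 0.1;
  const Dtype kMomentum = 0.95;
  const int kNumIters = 4;
  this->share_ = true;  // exercise the shared-parameter code path
  for (int i = 0; i <= kNumIters; ++i) {
    this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
  }
}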

@ronghanghu ronghanghu commented on an outdated diff Aug 9, 2015

src/caffe/solver.cpp
@@ -860,6 +860,214 @@ void AdaGradSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
}
template <typename Dtype>
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
@ronghanghu

ronghanghu Aug 9, 2015

Member

To be consistent with #2866, use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); instead.

@ronghanghu ronghanghu commented on an outdated diff Aug 9, 2015

src/caffe/solver.cpp
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }
+#else
+ NO_GPU;
+#endif
+ break;
+ }
+ default:
+ LOG(FATAL) << "Unknown caffe mode: " << Caffe::mode();
+ }
+}
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();
@ronghanghu

ronghanghu Aug 9, 2015

Member

To be consistent with #2866, use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); instead.

Member

ronghanghu commented Aug 9, 2015

@matthiasplappert I just made a few comments above. Let's get the following work done and I think this PR will be ready:

  • Rename AdaDelta::PreSolve to AdaDelta::AdaDeltaPreSolve.
  • Remove the AdaDelta::Regularize() function entirely.
  • Replace const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params(); with const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); to be consistent with #2866.
  • Add 4 more test cases to be consistent with #2836 and #2866.
  • After that, squash the commits by each author into a single commit, and do a further rebase against bvlc/master.

Owner

shelhamer commented Aug 9, 2015

@matthiasplappert a note about history: instead of squashing everything to a single commit, please squash the commits by each author into one commit apiece. This will leave three commits, by @mohomran, @kevinbache, and yourself. In future work, please use rebase instead of merge, as our policy is to have merge commits only for PR merges. Thanks.

having all solvers in one relatively big file (solver.cpp) proved to be a really big pain while resolving the merge conflicts. [...] I would propose to eventually split out the individual solvers into separate files to avoid this in the future.

Absolutely, and this was noted in #2860 but deserves another issue so I've transplanted it to #2890.

Contributor

matthiasplappert commented Aug 10, 2015

@ronghanghu Thanks for the thorough review! I'm still very new to Caffe, so your feedback is very much appreciated.

I've addressed the remaining feedback and cleaned up the commit history (also: no more merge commits). All tests pass locally (I'm not sure if Travis will pick this up, since the branch was force-pushed to rewrite the history). Let me know if anything else needs to be done before we can land this in master.

Member

ronghanghu commented Aug 10, 2015

@matthiasplappert Thanks for the update! I'll take a final review, and I expect to merge it tomorrow. @jeffdonahue could you also take a look?

@ronghanghu ronghanghu added a commit that referenced this pull request Aug 11, 2015

@ronghanghu ronghanghu Merge pull request #2782 from matthiasplappert/adadelta
AdaDelta Solver (v3)
ebc3e3b

@ronghanghu ronghanghu merged commit ebc3e3b into BVLC:master Aug 11, 2015

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed
Member

ronghanghu commented Aug 11, 2015

Finished the final review. Thanks to @mohomran, @kevinbache, and @matthiasplappert for this excellent AdaDelta solver.

@ctrevino ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Aug 11, 2015

@ctrevino ctrevino Merge remote-tracking branch 'upstream/master'
Merge pull request #2782 from matthiasplappert/adadelta
ab3842a

@ctrevino ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Aug 11, 2015

@ctrevino ctrevino Merge pull request #2782 from matthiasplappert/adadelta 8c83b88