AdaDelta Solver (v3) #2782

Merged
merged 3 commits into from Aug 11, 2015

Conversation

7 participants
@matthiasplappert
Contributor

matthiasplappert commented Jul 18, 2015

Picked up @kevinbache's branch (#2204), merged it with master, resolved merge conflicts and fixed a couple of issues due to API changes. All tests pass.

However, I need input on one change, please see comment directly in the diff.

src/caffe/solver.cpp
@@ -434,7 +434,8 @@ Dtype SGDSolver<Dtype>::GetLearningRate() {
(Dtype(1.) + exp(-this->param_.gamma() * (Dtype(this->iter_) -
Dtype(this->param_.stepsize())))));
} else {
- LOG(FATAL) << "Unknown learning rate policy: " << lr_policy;
+ // LOG(FATAL) << "Unknown learning rate policy: " << lr_policy;
+ rate = Dtype(0.);

@matthiasplappert

matthiasplappert Jul 18, 2015

Contributor

I'm unsure what the best way to solve this is. The problem is that the AdaDelta solver does not support a learning rate. However, since AdaDelta inherits from SGD, and SGD's ApplyUpdate() in turn calls this method, we trigger the default case and therefore the fatal log (which is currently commented out). Returning a rate of 0.0 works fine, but it is likely to cause errors in other areas of the code base where a valid learning rate is expected. Any input on this is greatly appreciated!
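
For context, a rough sketch of the call path involved (simplified from the SGDSolver code of that time; not a verbatim excerpt):

template <typename Dtype>
void SGDSolver<Dtype>::ApplyUpdate() {
  // Every update starts by asking for a global rate, even for solvers like
  // AdaDelta that do not use one; with no lr_policy set, control falls into
  // the else branch shown in the diff above.
  Dtype rate = GetLearningRate();
  for (int param_id = 0; param_id < this->net_->params().size(); ++param_id) {
    Regularize(param_id);
    ComputeUpdateValue(param_id, rate);
  }
  this->net_->Update();
}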

@seanbell

seanbell Jul 20, 2015

Contributor

One possible idea: keep the learning rate schedule, and treat it as a multiplier on the AdaDelta update step size. The only ugly part of this solution is that it would require the user to specify base_lr: 1 and lr_policy: 'fixed' in order to get the default behavior.

@matthiasplappert

matthiasplappert Jul 25, 2015

Contributor

That would be a possible solution. Before going any further with this, is adding AdaDelta even of interest for Caffe? I don't want to invest time into this if it's not likely to land in master eventually.

@PatWie

PatWie Aug 4, 2015

Contributor

I would strongly argue for shipping AdaDelta with the Caffe framework. I was surprised that it isn't already in the master branch.

@ronghanghu

ronghanghu Aug 6, 2015

Member

I am also strongly in favor of having AdaDelta in Caffe. I'll go over and review this PR today.

@ronghanghu

ronghanghu Aug 6, 2015

Member

For the learning rate issue, I suggest using base_lr: 1 and lr_policy: 'fixed'.

A learning rate specification is still sometimes needed, even with AdaDelta. Take fine-tuning as an example: you may still want a smaller learning rate on pre-trained layers than on randomly initialized layers.

For clarity, let's change line 7 of Algorithm 1 in the AdaDelta paper from:

x(t+1) = x(t) + delta_x(t)

to

x(t+1) = x(t) + local_rate * delta_x(t)

where local_rate = base_lr * lr_mult is the local learning rate for each parameter blob.
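
For reference, here is a restatement of the surrounding Algorithm 1 steps with that change applied (rho is the decay, eps the conditioning constant; only the last line differs from the paper):

E[g^2](t)       = rho * E[g^2](t-1) + (1 - rho) * g(t)^2
delta_x(t)      = -(RMS[delta_x](t-1) / RMS[g](t)) * g(t),  where RMS[z](t) = sqrt(E[z^2](t) + eps)
E[delta_x^2](t) = rho * E[delta_x^2](t-1) + (1 - rho) * delta_x(t)^2
x(t+1)          = x(t) + local_rate * delta_x(t)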

@matthiasplappert matthiasplappert changed the title from AdaDelta v3 to AdaDelta Solver (attempt number 3) Jul 18, 2015

@matthiasplappert matthiasplappert changed the title from AdaDelta Solver (attempt number 3) to AdaDelta Solver (v3) Jul 18, 2015

@matthiasplappert

matthiasplappert Jul 18, 2015

Contributor

Travis failed because of a lint error (the commented-out LOG is causing it; that line will go away before merging anyway, see the comment above).

@shelhamer shelhamer referenced this pull request Aug 4, 2015

Closed

AdaDelta v2 #2204

@shelhamer shelhamer added the focus label Aug 4, 2015

@shelhamer shelhamer referenced this pull request Aug 4, 2015

Closed

Adaptive Solvers: AdaDelta, RMSprop, and ADAM #2860

3 of 3 tasks complete

@ronghanghu ronghanghu added the RH label Aug 5, 2015

@shelhamer

shelhamer Aug 6, 2015

Member

@matthiasplappert thanks for making the update, but take another look at #2518 and see how the regularization and logging code was pulled out into SGDSolver.

include/caffe/solver.hpp
+ explicit AdaDeltaSolver(const SolverParameter& param)
+ : SGDSolver<Dtype>(param) { PreSolve(); constructor_sanity_check(); }
+ explicit AdaDeltaSolver(const string& param_file)
+ : SGDSolver<Dtype>(param_file) { PreSolve(); constructor_sanity_check(); }

@ronghanghu

ronghanghu Aug 6, 2015

Member

I suppose you have something wrong here. Now you are calling PreSolve() in the constructors of both AdaDeltaSolver and SGDSolver, and since you turned it into a virtual method, you are now calling AdaDeltaSolver::PreSolve() twice when constructing an AdaDeltaSolver instance. Is that the desired behavior?
Sorry, I was wrong here. Before the derived class constructor runs, the dynamic type of the object under construction is the base class, not the derived class. For this reason, you are still calling AdaDeltaSolver::PreSolve() in AdaDeltaSolver::AdaDeltaSolver after calling SGDSolver::PreSolve() in SGDSolver::SGDSolver. However, I still don't see a reason to make PreSolve a virtual function, and in general it is not good practice to call a virtual function inside a constructor in C++.

Also see the comment below in AdaDeltaSolver<Dtype>::PreSolve().
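
As a minimal standalone illustration of that last point (hypothetical classes, not Caffe code): during the base-class constructor, a virtual call never reaches the derived override.

#include <iostream>

struct Base {
  Base() { Init(); }  // dispatches to Base::Init, not Derived::Init
  virtual void Init() { std::cout << "Base::Init\n"; }
  virtual ~Base() {}
};

struct Derived : public Base {
  Derived() { Init(); }  // here the dynamic type is Derived, so Derived::Init runs
  virtual void Init() { std::cout << "Derived::Init\n"; }
};

int main() {
  Derived d;  // prints "Base::Init" then "Derived::Init"
  return 0;
}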

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

virtual issue addressed in aedff90

include/caffe/solver.hpp
+ << "Learning rate cannot be used with AdaDelta.";
+ CHECK_EQ("", this->param_.lr_policy())
+ << "Learning rate policy cannot be applied to AdaDelta.";
+ }

@ronghanghu

ronghanghu Aug 6, 2015

Member

Let's still keep the base_lr and lr_policy as discussed.

src/caffe/solver.cpp
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();

@ronghanghu

ronghanghu Aug 6, 2015

Member

Are you trying to call AdaDeltaSolver<Dtype>::PreSolve() after calling SGDSolver<Dtype>::PreSolve()? Then I don't see a reason to make PreSolve() a virtual function.

src/caffe/solver.cpp
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }

@ronghanghu

ronghanghu Aug 6, 2015

Member

Remove the regularization code here. Regularization should be handled in SGDSolver<Dtype>::Regularize. Take AdaGradSolver as an example.
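
For reference, a rough sketch of the AdaGradSolver pattern in include/caffe/solver.hpp (simplified; the exact declarations may differ slightly): the derived solver only overrides ComputeUpdateValue and inherits SGDSolver<Dtype>::Regularize unchanged.

template <typename Dtype>
class AdaGradSolver : public SGDSolver<Dtype> {
 public:
  explicit AdaGradSolver(const SolverParameter& param)
      : SGDSolver<Dtype>(param) { constructor_sanity_check(); }
  explicit AdaGradSolver(const string& param_file)
      : SGDSolver<Dtype>(param_file) { constructor_sanity_check(); }

 protected:
  virtual void ComputeUpdateValue(int param_id, Dtype rate);
  void constructor_sanity_check() {
    CHECK_EQ(0, this->param_.momentum())
        << "Momentum cannot be used with AdaGrad.";
  }
};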

@@ -61,8 +62,6 @@ class GradientBasedSolverTest : public MultiDeviceTest<TypeParam> {
ostringstream proto;
proto <<
"max_iter: " << num_iters << " "
- "base_lr: " << learning_rate << " "
- "lr_policy: 'fixed' "
"iter_size: " << iter_size << " "

@ronghanghu

ronghanghu Aug 6, 2015

Member

Let's keep the base_lr and lr_policy as discussed above.

+ this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
+ }
+}
+

@ronghanghu

ronghanghu Aug 6, 2015

Member

Please add one more test case:

  • TestLeastSquaresUpdateWithEverythingAccum (where you may set kNumIters = 4 and kIterSize = 2 and use CheckAccumulation)

You may take a look at the AdaGradSolverTest cases for details.
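
A rough sketch of what that case could look like, modeled on the AdaGrad accumulation test (the constants here are illustrative, not prescribed):

TYPED_TEST(AdaDeltaSolverTest, TestLeastSquaresUpdateWithEverythingAccum) {
  typedef typename TypeParam::Dtype Dtype;
  // Illustrative values; kNumIters = 4 and kIterSize = 2 as suggested above.
  const Dtype kLearningRate = 0.1;
  const Dtype kWeightDecay = 0.1;
  const Dtype kMomentum = 0.95;
  const int kNumIters = 4;
  const int kIterSize = 2;
  this->CheckAccumulation(kLearningRate, kWeightDecay, kMomentum, kNumIters,
      kIterSize);
}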

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

Addressed in e4eb50b

src/caffe/solver.cpp
+ this->net_->params_weight_decay();
+ Dtype delta = this->param_.delta();
+ Dtype momentum = this->param_.momentum();
+ Dtype weight_decay = this->param_.weight_decay();

@ronghanghu

ronghanghu Aug 6, 2015

Member

let's add Dtype local_rate = rate * net_params_lr[param_id]; here

+ // update history of updates
+ caffe_gpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->gpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_gpu_data());

@ronghanghu

ronghanghu Aug 7, 2015

Member

Let's add the local_rate multiplication after this line, where you have already computed the square of the update (don't scale the update by local_rate before computing its square).

caffe_gpu_scale(net_params[param_id]->count(), local_rate,
    net_params[param_id]->gpu_diff(),
    net_params[param_id]->mutable_gpu_diff());

+ // update history of updates
+ caffe_cpu_axpby(net_params[param_id]->count(), Dtype(1) - momentum,
+ this->update_[param_id]->cpu_data(), momentum,
+ this->history_[update_history_offset + param_id]->mutable_cpu_data());

@ronghanghu

ronghanghu Aug 7, 2015

Member

Let's add the local_rate multiplication after this line, where you have already computed the square of the update (don't scale the update by local_rate before computing its square).

caffe_cpu_scale(net_params[param_id]->count(), local_rate,
    net_params[param_id]->cpu_diff(),
    net_params[param_id]->mutable_cpu_diff());

src/caffe/solver.cpp
+ } else {
+ LOG(FATAL) << "Unknown regularization type: " << regularization_type;
+ }
+ }

@ronghanghu

ronghanghu Aug 7, 2015

Member

Remove the regularization code here.

@ronghanghu

ronghanghu Aug 7, 2015

Member

@matthiasplappert Thanks for your great PR to introduce AdaDelta solver into Caffe!

The remaining work includes:

  • Add learning rate.
  • Remove regularization.
  • Add more test cases.
  • Change PreSolve() back to being non-virtual.

Please modify and update according to the reviews.

@matthiasplappert

matthiasplappert Aug 7, 2015

Contributor

@ronghanghu I'll try to find some time over the weekend to get all of this done. We should also thank @kevinbache and especially @mohomran (who wrote the original code), since I just carried on with their work.

@ronghanghu

ronghanghu Aug 9, 2015

Member

#2836 and #2866 introduced new conflicts to be resolved.

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

I'll resolve the conflict later today and (hopefully) address the remaining issues as well.

  • Add learning rate.
  • Remove regularization.
  • Add more test cases.
  • Change back PreSolve() to be non-virtual.

@matthiasplappert

matthiasplappert Aug 9, 2015

Contributor

Update on this: This branch is now up-to-date with master and all feedback has been addressed. The tests pass locally and I expect them to also pass on the CI.

Please review my changes and let me know if anything else is required on my end, e.g. cleaning up the commit history (not sure how you usually handle this). I've also pointed out the relevant commits in each feedback discussion to hopefully help with reviewing the changes.

Finally, I have one suggestion to make: having all solvers in one relatively big file (solver.cpp) proved to be a real pain while resolving the merge conflicts. The problem there was that RMSProp and AdaDelta were completely mixed up since they share a lot of similar code. I would propose to eventually split out the individual solvers into separate files to avoid this in the future. Should I open an issue for that?

@ronghanghu

ronghanghu Aug 9, 2015

Member

@matthiasplappert Thanks a lot for the update. I will review the changes today.

Finally, I have one suggestion to make: having all solvers in one relatively big file (solver.cpp) proved to be a really big pain while resolving the merge conflicts. The problem there was that RMSProb and AdaDelta were completely mixed up since they share a lot of similar code. I would propose to eventually split out the individual solvers into separate files to avoid this in the future. Should I open an issue for that?

Yes, this is quite a problem. I expect to send a solver refactor PR to split solver.cpp and extract common code for these adaptive gradient solvers, after merging AdaDelta and Adam (#2856).

include/caffe/solver.hpp
+ : SGDSolver<Dtype>(param_file) { PreSolve(); }
+
+ protected:
+ void PreSolve();

@ronghanghu

ronghanghu Aug 9, 2015

Member

I think it is better to rename AdaDeltaSolver::PreSolve() to AdaDeltaSolver::AdaDeltaPreSolve(). Since you are going to call AdaDeltaSolver's presolve function after SGDSolver's presolve function, it is better to avoid a name clash with SGDSolver::PreSolve(), regardless of whether it is a virtual function.
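
A sketch of how the constructors would then look (based on the pattern already in this PR, not the final merged code):

explicit AdaDeltaSolver(const SolverParameter& param)
    : SGDSolver<Dtype>(param) { AdaDeltaPreSolve(); }
explicit AdaDeltaSolver(const string& param_file)
    : SGDSolver<Dtype>(param_file) { AdaDeltaPreSolve(); }
// AdaDeltaPreSolve() is then an ordinary (non-virtual) member that runs after
// SGDSolver<Dtype>::PreSolve() has already been called in the base constructor.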

include/caffe/solver.hpp
+
+ protected:
+ void PreSolve();
+ virtual void Regularize(int param_id);

@ronghanghu

ronghanghu Aug 9, 2015

Member

I didn't find a significant difference between your AdaDeltaSolver::Regularize and the original SGDSolver::Regularize (the former is almost duplicated from the latter).

We can just use SGDSolver::Regularize and avoid introducing a new regularization function in AdaDeltaSolver.

src/caffe/solver.cpp
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::Regularize(int param_id) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();

@ronghanghu

ronghanghu Aug 9, 2015

Member

Remove the entire AdaDeltaSolver::Regularize function.

The only difference between your AdaDeltaSolver::Regularize and the original SGDSolver::Regularize seems to be that you use const vector<shared_ptr<Blob<Dtype> > >& net_params rather than const vector<Blob<Dtype>*>& net_params. The rest is the same.

Note that after #2866, one should use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); to be consistent.

So, I believe we don't need a AdaDeltaSolver::Regularize here. Let's just use SGDSolver::Regularize instead.

+ this->CheckAccumulation(kLearningRate, kWeightDecay, kMomentum, kNumIters,
+ kIterSize);
+}
+

@ronghanghu

ronghanghu Aug 9, 2015

Member

#2836 and #2866 modify the solver and introduce new shared-parameter and snapshot tests. To be consistent, let's also add 4 new test cases:

TestLeastSquaresUpdateWithEverythingShare
TestLeastSquaresUpdateWithEverythingAccumShare
TestSnapshot
TestSnapshotShare

For TestSnapshot, you can take a look at TestSnapshot in SGDSolverTest as an example. For the other 3 shared cases, you just need to add this->share_ = true; to the corresponding test cases.
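
For illustration, a sketch of one of the shared cases, following the existing TestLeastSquaresUpdateWithEverything pattern (constants illustrative):

TYPED_TEST(AdaDeltaSolverTest, TestLeastSquaresUpdateWithEverythingShare) {
  typedef typename TypeParam::Dtype Dtype;
  const Dtype kLearningRate = 0.1;
  const Dtype kWeightDecay = 0.1;
  const Dtype kMomentum = 0.95;
  const int kNumIters = 4;
  // Same as the non-shared test, but with parameter sharing enabled.
  this->share_ = true;
  for (int i = 0; i <= kNumIters; ++i) {
    this->TestLeastSquaresUpdate(kLearningRate, kWeightDecay, kMomentum, i);
  }
}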

src/caffe/solver.cpp
+void AdaDeltaSolver<Dtype>::PreSolve() {
+ // Add the extra history entries for AdaDelta after those from
+ // SGDSolver::PreSolve
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();

@ronghanghu

ronghanghu Aug 9, 2015

Member

To be consistent with #2866, use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); instead.

src/caffe/solver.cpp
+
+template <typename Dtype>
+void AdaDeltaSolver<Dtype>::ComputeUpdateValue(int param_id, Dtype rate) {
+ const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params();

@ronghanghu

ronghanghu Aug 9, 2015

Member

To be consistent with #2866, use const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); instead.

@ronghanghu

ronghanghu Aug 9, 2015

Member

@matthiasplappert I just made a few comments above. Let's get the following work done and I think this PR will be ready:

  • Rename AdaDelta::PreSolve into AdaDelta::AdaDeltaPreSolve.
  • Remove the AdaDelta::Regularize() function entirely.
  • Replace const vector<shared_ptr<Blob<Dtype> > >& net_params = this->net_->params(); with const vector<Blob<Dtype>*>& net_params = this->net_->learnable_params(); to be consistent with #2866
  • Add 4 more test cases to be consistent with #2836 and #2866.
  • After that, squash the commits by each author into a single commit and rebase against bvlc/master.

@shelhamer

shelhamer Aug 9, 2015

Member

@matthiasplappert a note about history: instead of squashing to a single commit, please squash the commits by each author into a single commit. This will leave three commits by @mohomran @kevinbache and yourself. In future work please make use of rebase instead of merge, as our policy is to only have merge commits for PRs. Thanks.

having all solvers in one relatively big file (solver.cpp) proved to be a really big pain while resolving the merge conflicts. [...] I would propose to eventually split out the individual solvers into separate files to avoid this in the future.

Absolutely, and this was noted in #2860 but deserves another issue so I've transplanted it to #2890.

@matthiasplappert

matthiasplappert Aug 10, 2015

Contributor

@ronghanghu Thanks for the thorough review! I'm still very new to caffe, so your feedback is very much appreciated.

I've addressed the remaining feedback and cleaned up the commit history (also: no more merges). All tests pass locally (not sure if Travis will pick this up since the branch was force-pushed to override the history). Let me know if anything else needs to be done before we can land this in master.

@ronghanghu

ronghanghu Aug 10, 2015

Member

@matthiasplappert Thanks for the update! I'll take a final review, and I expect to merge it tomorrow. @jeffdonahue could you also take a look?

@lukeyeager lukeyeager referenced this pull request in NVIDIA/DIGITS Aug 10, 2015

Merged

Windows Compatibility #199

@ronghanghu ronghanghu referenced this pull request Aug 10, 2015

Closed

Adam solver #2856

ronghanghu added a commit that referenced this pull request Aug 11, 2015

@ronghanghu ronghanghu merged commit ebc3e3b into BVLC:master Aug 11, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
@ronghanghu

ronghanghu Aug 11, 2015

Member

Finished the final review. Thanks to @mohomran, @kevinbache, and @matthiasplappert for this excellent AdaDelta solver.

ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Aug 11, 2015

Merge remote-tracking branch 'upstream/master'
Merge pull request #2782 from matthiasplappert/adadelta

ctrevino added a commit to Robotertechnik/caffe that referenced this pull request Aug 11, 2015
