
Simplify training interface by removing weight decay and scaling #695

Merged
neubig merged 5 commits into master from simplify-training on Jul 14, 2017

Conversation


@neubig neubig commented Jul 14, 2017

There is some confusion regarding the interface of the "Trainer" class:
#641
#684

I agree that it's difficult to understand. This commit removes the rate decay and gradient scaling functionality, both of which implicitly change the learning rate in non-transparent ways. Here is an example of the before/after behavior:

Rate Decay Before

// At the beginning of training
Trainer trainer(initial_learning_rate, rate_decay);
// After every epoch
trainer.update_epoch();

Rate Decay After

// At the beginning of training
Trainer trainer(initial_learning_rate);
// After every epoch
trainer.learning_rate /= (1 - rate_decay);

Gradient Scaling Before:

cg.backward(loss);
trainer.update(scaling_factor);

Gradient Scaling After:

cg.backward(loss * scaling_factor);
trainer.update();
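
For reference, here is a minimal end-to-end sketch of an epoch loop under the new interface. It is written against roughly the DyNet C++ API of this period and is not part of the PR itself: the toy parameter, loss, data, and decay value are placeholders, and headers/namespaces may differ slightly between versions.

#include <vector>
#include "dynet/dynet.h"
#include "dynet/expr.h"
#include "dynet/training.h"

using namespace dynet;

int main(int argc, char** argv) {
  initialize(argc, argv);

  Model model;                                   // parameter collection
  SimpleSGDTrainer trainer(model, /*learning_rate=*/0.1);
  Parameter p_w = model.add_parameters({1});     // toy parameter (placeholder)
  float rate_decay = 0.05f;                      // hypothetical decay value
  std::vector<float> data = {1.f, 2.f, 3.f};     // placeholder training data

  for (unsigned epoch = 0; epoch < 10; ++epoch) {
    for (float x : data) {
      ComputationGraph cg;
      Expression w = parameter(cg, p_w);
      Expression loss = squared_distance(w, input(cg, x));  // toy loss
      cg.forward(loss);
      cg.backward(loss);
      trainer.update();                          // no scaling factor argument any more
    }
    // Replaces the old trainer.update_epoch() call
    trainer.learning_rate /= (1 - rate_decay);
  }
  return 0;
}

The only lines that reflect this PR are the trainer construction, the argument-free trainer.update(), and the explicit learning_rate adjustment; everything else is scaffolding.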
@pmichel31415 pmichel31415 left a comment

LGTM!

@neubig neubig merged commit 348b502 into master Jul 14, 2017
2 checks passed

@liesun1994 liesun1994 commented Aug 17, 2017

I am wondering whether lamtram needs to be modified now that this pull request has been merged.


@liesun1994 liesun1994 commented Aug 17, 2017

When I am running lamtram, I get the warning: "Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see #695 for details."

@neubig neubig deleted the simplify-training branch Sep 1, 2017
tetsuok added a commit to tetsuok/dynet that referenced this issue Sep 2, 2017
tetsuok added a commit to tetsuok/dynet that referenced this issue Sep 3, 2017
neubig added a commit that referenced this issue Sep 4, 2017
* Remove deprecated Trainer::update_epoch

It was deprecated in #695.

* Remove first variable from examples

@yuvalpinter yuvalpinter commented Oct 28, 2017

I can't seem to run the trainer.learning_rate /= (1 - rate_decay) code. It gives me the following error: TypeError: 'property' object is not callable.
This happens for both MomentumSGDTrainer and AdamTrainer.


@shuheik shuheik commented Jan 7, 2018

Shouldn't it be
trainer.learning_rate *= (1 - rate_decay) or
trainer.learning_rate /= (1 + rate_decay)
instead of
trainer.learning_rate /= (1 - rate_decay)?

I assume rate_decay is some small positive value, and dividing by (1 - rate_decay) would make learning_rate grow larger.
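
A quick numeric check with hypothetical values (learning_rate = 0.1, rate_decay = 0.1), independent of DyNet itself, illustrates the arithmetic:

#include <cstdio>

int main() {
  float learning_rate = 0.1f;
  float rate_decay = 0.1f;  // hypothetical values, purely to illustrate the point
  std::printf("/= (1 - d): %.4f (grows)\n",   learning_rate / (1 - rate_decay));  // 0.1111
  std::printf("*= (1 - d): %.4f (shrinks)\n", learning_rate * (1 - rate_decay));  // 0.0900
  std::printf("/= (1 + d): %.4f (shrinks)\n", learning_rate / (1 + rate_decay));  // 0.0909
  return 0;
}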

yuvalpinter added a commit to yuvalpinter/dynet that referenced this issue Jan 10, 2018
kashif added a commit to kashif/dynet that referenced this issue Jan 19, 2018