Simplify training interface by removing weight decay and scaling #695

Merged
neubig merged 5 commits into master from simplify-training on Jul 14, 2017

Conversation

@neubig
Contributor

neubig commented Jul 14, 2017

There is some confusion regarding the interface of the "Trainer" class:
#641
#684

I agree that it's difficult to understand. This commit removes the rate decay and gradient scaling functionality that implicitly changes the learning rate in non-transparent ways. Here is an example of the before/after behavior:

Rate Decay Before:

// At beginning of training
Trainer trainer(initial_learning_rate, rate_decay);
// After every epoch
trainer.update_epoch();

Rate Decay After:

// At beginning of training
Trainer trainer(initial_learning_rate);
// After every epoch
trainer.learning_rate /= (1 - rate_decay);

Gradient Scaling Before:

cg.backward(loss);
trainer.update(scaling_factor);

Gradient Scaling After:

cg.backward(loss * scaling_factor);
trainer.update();
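
For concreteness, here is a minimal sketch of a full training loop under the new interface. It assumes a recent DyNet C++ API; SimpleSGDTrainer, the num_examples/num_epochs arguments, and the build_loss callback are illustrative choices, not something this PR adds.

#include <cstddef>
#include <functional>

#include "dynet/dynet.h"
#include "dynet/expr.h"
#include "dynet/training.h"

// Sketch of a training loop without update_epoch(). build_loss stands in for
// whatever code builds the per-example loss expression.
void train(dynet::ParameterCollection& model,
           std::size_t num_examples, unsigned num_epochs,
           float rate_decay, float scaling_factor,
           const std::function<dynet::Expression(dynet::ComputationGraph&,
                                                 std::size_t)>& build_loss) {
  dynet::SimpleSGDTrainer trainer(model, /*learning_rate=*/0.1);
  for (unsigned epoch = 0; epoch < num_epochs; ++epoch) {
    for (std::size_t i = 0; i < num_examples; ++i) {
      dynet::ComputationGraph cg;
      dynet::Expression loss = build_loss(cg, i);
      // Gradient scaling is now expressed by scaling the loss itself
      cg.backward(loss * scaling_factor);
      trainer.update();
    }
    // Explicit learning-rate adjustment replaces update_epoch();
    // the sign of the decay is discussed further down in this thread
    trainer.learning_rate /= (1 - rate_decay);
  }
}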

@neubig requested review from yoavg, armatthews, and pmichel31415 on Jul 14, 2017

@pmichel31415
Collaborator

pmichel31415 left a comment

LGTM!

@neubig merged commit 348b502 into master on Jul 14, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

@neubig referenced this pull request on Aug 1, 2017

params in MomentumSGDTrainer #1 (Closed)

@liesun1994


liesun1994 commented Aug 17, 2017

I am wondering whether lamtram needs to be modified in light of this change.

@liesun1994


liesun1994 commented Aug 17, 2017

When I run lamtram, it prints the warning: Trainer::update_epoch has been deprecated and doesn't do anything. Please remove it from your code, and control the learning rate of the trainer directly, for example by: 'trainer.learning_rate /= (1 - rate_decay)', see #695 for details.

@neubig deleted the simplify-training branch on Sep 1, 2017

tetsuok added a commit to tetsuok/dynet that referenced this pull request Sep 2, 2017

tetsuok added a commit to tetsuok/dynet that referenced this pull request Sep 3, 2017

neubig added a commit that referenced this pull request Sep 4, 2017

Remove deprecated Trainer::update_epoch (#852)
* Remove deprecated Trainer::update_epoch

It was deprecated in #695.

* Remove first variable from examples
@yuvalpinter

Contributor

yuvalpinter commented Oct 28, 2017

I can't seem to run the trainer.learning_rate /= (1 - rate_decay) code. It gives me the following error: TypeError: 'property' object is not callable.
This happens for both MomentumSGDTrainer and AdamTrainer.

@shuheik

Contributor

shuheik commented Jan 7, 2018

Shouldn't it be
trainer.learning_rate *= (1 - rate_decay) or
trainer.learning_rate /= (1 + rate_decay)
instead of
trainer.learning_rate /= (1 - rate_decay)?

I assume rate_decay is some small positive value, and dividing by (1 - rate_decay) would make learning_rate grow larger.
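
For reference, a small worked example of the arithmetic, assuming learning_rate = 0.1 and rate_decay = 0.1:

#include <cstdio>

int main() {
  float learning_rate = 0.1f, rate_decay = 0.1f;           // assumed illustrative values
  std::printf("%f\n", learning_rate / (1 - rate_decay));   // ~0.1111: the rate grows
  std::printf("%f\n", learning_rate * (1 - rate_decay));   //  0.0900: the rate shrinks
  std::printf("%f\n", learning_rate / (1 + rate_decay));   // ~0.0909: the rate shrinks
  return 0;
}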

yuvalpinter added a commit to yuvalpinter/dynet that referenced this pull request Jan 10, 2018

neubig added a commit that referenced this pull request Jan 17, 2018

kashif added a commit to kashif/dynet that referenced this pull request Jan 19, 2018
