[feature request] Model averaging #514

Closed
vince62s opened this issue Jan 14, 2018 · 2 comments

@vince62s
Member

To do model averaging, we need to keep several checkpoints.
There are two approaches:
Time-based: the TF approach, which saves a checkpoint file every 10 minutes and keeps up to N files (20 by default).
Step-based: the same as in Lua-onmt, whose save_every option saves a checkpoint every X iterations.

Step-based is more appropriate, since a system's speed can vary a lot.
It would also be good to have a "keep last N" flag to minimize disk usage.

Then an external tool can average the weights and output the averaged model.
This should give at least 1 BLEU improvement.
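
As a rough illustration of what such an external tool would do, here is a minimal sketch of element-wise weight averaging. It assumes each checkpoint is a plain dict mapping parameter names to lists of floats; in a real framework the values would be tensors loaded from the saved files. The function name `average_checkpoints` is hypothetical and not part of any existing tool.

```python
def average_checkpoints(checkpoints):
    """Return the element-wise mean of several parameter dicts.

    Assumption: every checkpoint holds the same parameter names,
    each with a flat list of floats of the same length.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged = {}
    for name in checkpoints[0]:
        vectors = [ckpt[name] for ckpt in checkpoints]
        # zip(*vectors) groups the i-th value of every checkpoint together.
        averaged[name] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return averaged


# Two toy "checkpoints" standing in for saved models.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
]
avg = average_checkpoints(ckpts)
print(avg)
```

Averaging only makes sense across checkpoints of the same run and architecture, which is why keeping the last few checkpoints of one training is the natural input.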

@vince62s
Member Author

Thanks @pltrdy
The last thing we need is two extra training options:
-keep_ckpt N
-save_ckpt_every M

N means we keep the last N checkpoints (e.g. 5, 10, 20), which is good for averaging.
M is a number of iterations: we save a checkpoint every M iterations.
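
To make the intended behavior of the two options concrete, here is a small sketch of step-based saving combined with "keep last N" rotation. The class `CheckpointKeeper`, the method `maybe_save`, and the file naming scheme are all assumptions made for illustration; they are not the project's actual implementation, and the payload is a plain string standing in for real model weights.

```python
import os
import tempfile
from collections import deque


class CheckpointKeeper:
    """Hypothetical helper: save every M steps, keep only the last N files."""

    def __init__(self, directory, keep_ckpt, save_ckpt_every):
        self.directory = directory
        self.keep_ckpt = keep_ckpt              # N: checkpoints to retain
        self.save_ckpt_every = save_ckpt_every  # M: steps between saves
        self.saved = deque()                    # oldest path on the left

    def maybe_save(self, step, payload):
        # Only save on multiples of M.
        if step % self.save_ckpt_every != 0:
            return None
        path = os.path.join(self.directory, f"model_step_{step}.ckpt")
        with open(path, "w") as f:
            f.write(payload)
        self.saved.append(path)
        # Delete the oldest files once more than N are on disk.
        while len(self.saved) > self.keep_ckpt:
            os.remove(self.saved.popleft())
        return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        keeper = CheckpointKeeper(tmp, keep_ckpt=2, save_ckpt_every=100)
        for step in range(1, 501):
            keeper.maybe_save(step, f"weights at step {step}")
        return sorted(os.listdir(tmp))


remaining = demo()
print(remaining)
```

With N=2 and M=100 over 500 steps, only the checkpoints from steps 400 and 500 remain on disk, so disk usage stays bounded regardless of training length.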

@srush @helson73, tell me if you're OK with this.

@helson73
Contributor

That’s great!

@vince62s vince62s closed this as completed Aug 2, 2018