OpenNMT brnn model parity #1031

Closed
eltorre opened this Issue Nov 2, 2018 · 3 comments

eltorre commented Nov 2, 2018

Hi all,

I am training some baseline brnn systems for English->Chinese and English->Spanish using OpenNMT and OpenNMT-py. The training set has 1m sentences randomly extracted from the corpora available at http://opus.nlpl.eu/, and the test and development sets each have 10k sentences extracted the same way. All parameters have their default values, except for the encoder type.

I assumed that both OpenNMT and OpenNMT-py would produce similar results, so I was surprised to see that the results were completely different. OpenNMT trained for 13 epochs (roughly 110k steps), which took around 20h, while OpenNMT-py took only 5h for 100k steps; OpenNMT obtains an average of 5 BLEU points more than OpenNMT-py on the test set. Doubling the number of steps for OpenNMT-py generated a system with the same BLEU up to the 3rd digit.

I also added POS tags to the English side as word features; while this increased the performance of the OpenNMT system by around 2 BLEU points, OpenNMT-py saw an increase of only 0.02 BLEU points.

Then, I started digging a bit more and found that there are several differences between OpenNMT and OpenNMT-py.

In particular, the biggest difference seems to be the learning rate decay function: by default, OpenNMT only starts decreasing the learning rate after 9 epochs (roughly 70% of the training time), and only multiplies it by 0.7 when the score does not improve, whereas OpenNMT-py has a much more aggressive decay function, halving the learning rate after 50k steps and then after every further 10k steps, regardless of the score (which partly explains why the system trained for 200k steps had basically the same performance). Also, the default feature embedding size is different: in OpenNMT it is 20, while in OpenNMT-py it is N^0.7 (N^0.7 = 8 in this experiment), which can partially explain the lack of improvement when adding features.
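To make the gap concrete, here is a minimal sketch of the two decay schedules as described above (assumed default values; not the actual library code):

```python
def py_lr(step, lr=1.0, decay=0.5, start_decay_steps=50_000, decay_steps=10_000):
    """OpenNMT-py style: halve the LR at 50k steps, then again every 10k steps,
    regardless of the validation score."""
    if step < start_decay_steps:
        return lr
    n_decays = 1 + (step - start_decay_steps) // decay_steps
    return lr * decay ** n_decays

def lua_lr(lr, improved, decay=0.7):
    """OpenNMT (Lua) style: multiply by 0.7 only when the dev score stops
    improving (and, by default, only after epoch 9)."""
    return lr if improved else lr * decay

# By step 100k, the -py schedule has already halved the LR six times:
print(py_lr(100_000))  # 0.5**6 = 0.015625
```

With the learning rate down to ~1.6% of its initial value by 100k steps, the extra 100k steps of the doubled run contribute almost nothing, consistent with the unchanged BLEU.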

Has anyone experimented with OpenNMT-py parameters in order to obtain a similar performance to OpenNMT?

Contributor

vince62s commented Nov 2, 2018

Thanks for the detailed report.
You can adjust the decay scheme to make it closer to the Lua version using these:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/opts.py#L401-L414
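For example, something along these lines should soften the decay (flag names taken from that section of onmt/opts.py; the exact names and values may differ between versions, so treat this as a sketch rather than a tested command):

```shell
# Decay by 0.7 instead of 0.5, start later, and decay less often,
# approximating the Lua defaults. The remaining training flags are
# whatever you already use and are elided here.
python train.py \
  -learning_rate_decay 0.7 \
  -start_decay_steps 70000 \
  -decay_steps 10000
```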

To be honest, we have not tuned the RNN parameters because most recent work has been done on the transformer.

For the features thing, I am surprised by the +2 BLEU points. Was that for Chinese?

I'll check what we currently do in -py for this.

Contributor

vince62s commented Nov 2, 2018

Also, can you copy-paste this to the forum? That is a better place to keep track of this.
thanks

@vince62s vince62s closed this Nov 2, 2018

Contributor

vince62s commented Nov 2, 2018

more info here: #147
and you can change the feat_vec_size here:
https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/opts.py#L40-L47
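The sizing rule discussed in the report can be sketched as follows (a simplified illustration of the exponent-based default; `feat_embedding_dim` is a hypothetical helper, not an OpenNMT-py function):

```python
def feat_embedding_dim(n_values, feat_vec_size=-1, feat_vec_exponent=0.7):
    """Feature embedding size: a fixed size if given, otherwise
    N**0.7 where N is the number of distinct feature values."""
    if feat_vec_size > 0:
        return feat_vec_size
    return int(n_values ** feat_vec_exponent)

# With roughly 20 POS tag values: 20**0.7 ~= 8.1, truncated to 8,
# versus a fixed embedding size of 20 in the Lua version.
print(feat_embedding_dim(20))       # 8
print(feat_embedding_dim(20, 20))   # 20
```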
