
OpenNMT benchmarking versus other pytorch libraries. #139

Closed

PetrochukM opened this issue Jul 21, 2017 · 11 comments

Comments
@PetrochukM
Contributor

There is certainly a lot of great work going into OpenNMT. But with every new feature added from some paper, do we have any sense of whether those features, taken together, actually help OpenNMT?

This gist compares OpenNMT-py vs. pytorch-seq2seq on newstest2013.

pytorch-seq2seq has a simple LSTM + Attention model (a generic sketch of this shape of model follows below) and it achieves:

Finished epoch 50, Dev Perplexity: 1695.8370

While OpenNMT, a much more mature library, does worse:

Epoch 50,    40/   47; acc:   7.64; ppl: 1500.51; 12384 src tok/s; 13280 tgt tok/s;    246 s elapsed
Train perplexity: 1546
Train accuracy: 6.97936
Validation perplexity: 24962.4
Validation accuracy: 4.8146
Decaying learning rate to 1.77636e-15
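
For reference, both logs report perplexity, which is just the exponential of the average per-token cross-entropy, so the two numbers are comparable as long as both libraries average over tokens (an assumption here, not something either log states). The vanishing learning rate in the last line is also consistent with repeated halving from an initial rate of 1.0. A minimal sketch:

import math

# Perplexity is exp(mean per-token negative log-likelihood), so the
# ppl figures above are comparable if both libraries average per token.
def perplexity(total_nll, num_tokens):
    return math.exp(total_nll / num_tokens)

# "Decaying learning rate to 1.77636e-15" matches halving an initial
# rate of 1.0 once per epoch for 49 epochs:
print(0.5 ** 49)  # 1.7763568394002505e-15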

What are the goals of this library?
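
For readers unfamiliar with the model family being compared: "simple LSTM + Attention" means roughly the following shape of model. This is a generic PyTorch sketch with dot-product (Luong-style) attention, not the actual pytorch-seq2seq implementation; the dimensions and the attention variant are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqAttn(nn.Module):
    """Encoder-decoder LSTM with dot-product (Luong-style) attention."""
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_emb(src))      # (B, S, D)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)   # (B, T, D)
        # Dot-product attention: score each decoder step against all
        # encoder states, then mix the encoder states by those weights.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))      # (B, T, S)
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)   # (B, T, D)
        return self.out(torch.cat([dec_out, context], dim=-1))    # (B, T, V)

# Smoke test with toy vocabulary and batch sizes.
model = Seq2SeqAttn(src_vocab=100, tgt_vocab=100)
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])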

@yoonkim

yoonkim commented Jul 21, 2017

With such a small dataset the comparison is not really meaningful.

I would say you would need a dataset of at least 100K sentences to get any indication of performance (this is for translation--for other tasks you may be able to get away with less).

@srush
Contributor

srush commented Jul 22, 2017

Thank you for the comparison though. We will set this up as a benchmark to make sure it does at least this well.

We have been using this data for comparison. It is about 200k sentences.

https://github.com/OpenNMT/IntegrationTesting/tree/master/data

Would you be willing to run the pytorch-seq2seq model for comparison's sake and post the ppl?

@srush srush changed the title [Dicussion] OpenNMT Benchmarking. Does OpenNMT do better than simple LSTM + Attention? OpenNMT benchmarking versus other pytorch libraries. Jul 22, 2017
@PetrochukM
Contributor Author

@kylegao91

@kylegao91

@Deepblue129 I agree that newstest2013 is too small for benchmarking effectiveness. I used it mainly for evaluating efficiency.
@srush I will run experiments on a larger dataset once the speed issue is resolved.

@srush
Contributor

srush commented Aug 29, 2017

Any update on this? Would love to post these benchmarks now that our code is more stable.

@kylegao91

I haven't run our experiment on the 200k-sentence data mentioned above, but we have improved our speed. I will have a look at your integration test data and get back to you soon.

@dalegebit

dalegebit commented Sep 12, 2017

I have similar problems reproducing results with PyOpenNMT. I trained my English-to-German NMT model on Europarl v7 and copied almost exactly the parameter settings and corpus mentioned at http://opennmt.net/Models (but I didn't use preprocess.lua or the aggressive tokenizer, which are also mentioned there). Even after many tries, I cannot get the perplexity below 15 on the validation set, and I wonder why. Have you tested PyOpenNMT on Europarl v7 and achieved 7.19 PPL on newstest2013.deen? Would you mind releasing the benchmarks so that we can refer to them?

@srush
Contributor

srush commented Sep 12, 2017 via email

marcotcr pushed a commit to marcotcr/OpenNMT-py that referenced this issue Sep 20, 2017
@dalegebit

Sorry, I missed your message. Sure. Here are my logs, an Adam version and an SGD version. The Adam version has higher ppl on the validation set but better accuracy, while the SGD version has lower ppl but worse accuracy. Neither of them can reach <15 ppl on the validation set. The logs only show the results of 13 epochs; however, even when I run many more epochs, there is no sign that they could reach that goal. I only used the default parameters and default learning rate scheduling, except for changing the initial learning rate and adding -input 1 and -extra_shuffle.
https://github.com/dalegebit/OpenNMT-py/blob/multi-gpu/log/benchmarks_opitm-adam_lr-0.001.txt
https://github.com/dalegebit/OpenNMT-py/blob/multi-gpu/log/benchmarks_optim-sgd_lr-0.01.txt
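
On "default learning rate scheduling": as I understand OpenNMT-py's optimizer wrapper at the time, it decayed the SGD learning rate once per epoch whenever validation perplexity stopped improving, or unconditionally from a fixed start epoch onward. The rule below is a hedged sketch of that behaviour; the 0.5 factor and epoch-8 threshold are assumed defaults, not verified against the code:

def update_learning_rate(lr, epoch, val_ppl, last_val_ppl,
                         lr_decay=0.5, start_decay_at=8):
    # Decay the rate once validation perplexity stops improving, or
    # unconditionally from start_decay_at onward (assumed defaults).
    if val_ppl > last_val_ppl or epoch >= start_decay_at:
        lr *= lr_decay
    return lr

# Applied every epoch, this repeated halving is what drives the rate
# toward values like the 1.78e-15 seen in the log earlier in the thread.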

@kylegao91

Here is the log of the latest (0.1.4) pytorch-seq2seq running on your integration test data with its default settings:
https://gist.github.com/kylegao91/91534e220cf745dd976ad24bb4bac4c0

Looks like pytorch-seq2seq still has a lot of room to improve in both speed and accuracy. I'm posting the log here for discussion, but for PR reasons, please do not include it if you are going to publish a benchmark.

@vince62s
Copy link
Member

vince62s commented Aug 2, 2018

Closing this; it's quite old.
We now reproduce SOTA results with the Transformer (comparable to the original paper).

@vince62s vince62s closed this as completed Aug 2, 2018