
OpenNMT benchmarking versus other pytorch libraries. #139

Closed

PetrochukM opened this issue Jul 21, 2017 · 11 comments

Comments
@PetrochukM
Contributor

There is certainly a lot of great work going into OpenNMT. But with every new feature added from some paper, do we have any sense of whether those features, taken together, actually help OpenNMT?

This gist compares OpenNMT-py vs. pytorch-seq2seq on newstest2013.

pytorch-seq2seq has a simple LSTM + Attention model (a generic sketch of this shape of model follows below) and it achieves:

Finished epoch 50, Dev Perplexity: 1695.8370

While OpenNMT, a much more mature library, does worse:

Epoch 50,    40/   47; acc:   7.64; ppl: 1500.51; 12384 src tok/s; 13280 tgt tok/s;    246 s elapsed
Train perplexity: 1546
Train accuracy: 6.97936
Validation perplexity: 24962.4
Validation accuracy: 4.8146
Decaying learning rate to 1.77636e-15
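
For reference, both logs report perplexity, which is just the exponential of the average per-token cross-entropy, so the two numbers are comparable as long as both libraries average over tokens (an assumption here, not something either log states). The vanishing learning rate in the last line is also consistent with repeated halving from an initial rate of 1.0. A minimal sketch:

import math

# Perplexity is exp(mean per-token negative log-likelihood), so the
# ppl figures above are comparable if both libraries average per token.
def perplexity(total_nll, num_tokens):
    return math.exp(total_nll / num_tokens)

# "Decaying learning rate to 1.77636e-15" matches halving an initial
# rate of 1.0 once per epoch for 49 epochs:
print(0.5 ** 49)  # 1.7763568394002505e-15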

What are the goals of this library?
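
For readers unfamiliar with the model family being compared: "simple LSTM + Attention" means roughly the following shape of model. This is a generic PyTorch sketch with dot-product (Luong-style) attention, not the actual pytorch-seq2seq implementation; the dimensions and the attention variant are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqAttn(nn.Module):
    """Encoder-decoder LSTM with dot-product (Luong-style) attention."""
    def __init__(self, src_vocab, tgt_vocab, dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(2 * dim, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_emb(src))      # (B, S, D)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)   # (B, T, D)
        # Dot-product attention: score each decoder step against all
        # encoder states, then mix the encoder states by those weights.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))      # (B, T, S)
        context = torch.bmm(F.softmax(scores, dim=-1), enc_out)   # (B, T, D)
        return self.out(torch.cat([dec_out, context], dim=-1))    # (B, T, V)

# Smoke test with toy vocabulary and batch sizes.
model = Seq2SeqAttn(src_vocab=100, tgt_vocab=100)
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])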

@yoonkim

yoonkim commented Jul 21, 2017

With such a small dataset the comparison is not really meaningful.

I would say you would need a dataset of at least 100K sentences to get any indication of performance (this is for translation--for other tasks you may be able to get away with less).

@srush
Contributor

srush commented Jul 22, 2017

Thank you for the comparison though. We will set this up as a benchmark to make sure it does at least this well.

We have been using this data for comparison. It is about 200k sentences.

https://github.com/OpenNMT/IntegrationTesting/tree/master/data

Would you be willing to run the pytorch-seq2seq model for comparison's sake and post the ppl?

@srush srush changed the title [Dicussion] OpenNMT Benchmarking. Does OpenNMT do better than simple LSTM + Attention? OpenNMT benchmarking versus other pytorch libraries. Jul 22, 2017
@PetrochukM
Contributor Author

@kylegao91

@kylegao91

@Deepblue129 I agree that newstest2013 is too small for benchmarking effectiveness. I used it mainly for evaluating efficiency.
@srush I will run experiments on a larger dataset once the speed issue is resolved.

@srush
Contributor

srush commented Aug 29, 2017

Any update on this? Would love to post these benchmarks now that our code is more stable.

@kylegao91

I haven't run our experiment on the 200k-sentence data mentioned above, but we have improved our speed. I will have a look at your integration test data and get back to you soon.

@dalegebit

dalegebit commented Sep 12, 2017

I have similar problems reproducing results with PyOpenNMT. I trained my English-to-German NMT model on Europarl v7 and copied almost exactly the parameter settings and corpus mentioned at http://opennmt.net/Models (but I didn't use preprocess.lua or the aggressive tokenizer, which are also mentioned there). Even after many tries, I cannot get the perplexity below 15 on the validation set, and I wonder why. Have you tested PyOpenNMT on Europarl v7 and achieved 7.19 PPL on newstest2013.deen? Would you mind releasing the benchmarks so that we can refer to them?

@srush
Contributor

srush commented Sep 12, 2017 via email

marcotcr pushed a commit to marcotcr/OpenNMT-py that referenced this issue Sep 20, 2017
@dalegebit

Sorry, I missed your message. Sure. Here are my logs, an Adam version and an SGD version. The Adam version has higher ppl on the validation set but better accuracy, while the SGD version has lower ppl but worse accuracy. Neither of them can reach <15 ppl on the validation set. The logs only show the results of 13 epochs; however, even when I run many more epochs, there is no sign that they could reach that goal. I only used the default parameters and default learning rate scheduling, except for changing the initial learning rate and adding -input 1 and -extra_shuffle.
https://github.com/dalegebit/OpenNMT-py/blob/multi-gpu/log/benchmarks_opitm-adam_lr-0.001.txt
https://github.com/dalegebit/OpenNMT-py/blob/multi-gpu/log/benchmarks_optim-sgd_lr-0.01.txt
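
On "default learning rate scheduling": as I understand OpenNMT-py's optimizer wrapper at the time, it decayed the SGD learning rate once per epoch whenever validation perplexity stopped improving, or unconditionally from a fixed start epoch onward. The rule below is a hedged sketch of that behaviour; the 0.5 factor and epoch-8 threshold are assumed defaults, not verified against the code:

def update_learning_rate(lr, epoch, val_ppl, last_val_ppl,
                         lr_decay=0.5, start_decay_at=8):
    # Decay the rate once validation perplexity stops improving, or
    # unconditionally from start_decay_at onward (assumed defaults).
    if val_ppl > last_val_ppl or epoch >= start_decay_at:
        lr *= lr_decay
    return lr

# Applied every epoch, this repeated halving is what drives the rate
# toward values like the 1.78e-15 seen in the log earlier in the thread.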

@kylegao91

Here is the log of the latest (0.1.4) pytorch-seq2seq running on your integration test data with its default settings:
https://gist.github.com/kylegao91/91534e220cf745dd976ad24bb4bac4c0

Looks like pytorch-seq2seq still has a lot of room to improve in both speed and accuracy. I'm posting the log here for discussion, but for PR reasons, please do not include it if you are going to publish a benchmark.

@vince62s
Copy link
Member

vince62s commented Aug 2, 2018

Closing this; it's quite old.
We now reproduce SOTA results with the Transformer (comparable to the original paper).

@vince62s vince62s closed this as completed Aug 2, 2018