
[Educational purpose] Why OpenNMT-py is fast? #552

Closed
howardyclo opened this issue Feb 4, 2018 · 6 comments
howardyclo commented Feb 4, 2018

Hello, recently I implemented seq2seq for practice and educational purposes.
Here is my code.

I also compared the performance to OpenNMT-py, and found that this library is more
GPU-memory efficient and its training iterations are a lot faster. When running the following model:

  • word_vec_size=300
  • hidden_size=512
  • rnn_type=LSTM
  • batch_size=32

trained on my grammatical error correction corpus (2,443,191 sentence pairs), OpenNMT-py only takes ~1 hour to complete an epoch (~76,000 iterations), while my code takes ~6 hours per epoch.

I am wondering what important optimizations I should make compared to the OpenNMT-py codebase. When I tried OpenNMT-py I didn't even specify shard_size, so I couldn't figure out why it is so fast. Which key scripts should I look at?

Appreciated.

@howardyclo howardyclo changed the title [Educational purpose] Why OpenNMT-py is fast and memory efficient? [Educational purpose] Why OpenNMT-py is fast? Feb 4, 2018

playma commented Feb 6, 2018

Compared with other frameworks on GitHub, OpenNMT-py is fast, GPU-memory efficient, and performs well.
I have not studied the OpenNMT-py source code in depth.

Maybe only the engineers who built it can explain the tricks and the reasons.

Paging @srush


srush commented Feb 6, 2018

I was called :D

So our main aim is simplicity, not speed. That being said, there are a couple of optimizations that matter:

  • Use CuDNN when possible (always on the encoder; on the decoder when input_feed=0).
  • Always avoid indexing/loops and use torch primitives.
  • When possible, batch softmax operations across time (this is the second most complicated part of the code).
  • Batch inference and beam search for translation (this is the most complicated part of the code).
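The "batch across time" bullet can be illustrated with a minimal pure-Python sketch. The helper names and toy shapes below are my own, not OpenNMT-py's: the idea is that instead of projecting hidden states to vocabulary logits once per time step inside the decoder loop, you flatten the time and batch dimensions and invoke the projection primitive a single time, so the expensive matmul/softmax kernel launches once instead of T times.

```python
# Illustrative sketch (not OpenNMT-py code): batching the output
# projection across time steps. In torch this would be one matmul on a
# (T*B, H) tensor instead of T matmuls on (B, H) tensors.

def project(hidden, weight):
    """Project one (hidden_size,) vector with a (vocab, hidden_size) matrix."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

def loop_over_time(states, weight):
    # Naive: one projection per time step per sentence.
    return [[project(h, weight) for h in step] for step in states]

def batched_over_time(states, weight):
    # Flatten (T, B, H) -> (T*B, H), project once, then reshape back.
    T, B = len(states), len(states[0])
    flat = [h for step in states for h in step]
    logits = [project(h, weight) for h in flat]  # one "batched" call
    return [logits[t * B:(t + 1) * B] for t in range(T)]

# Toy data: T=2 time steps, B=2 sentences, hidden_size=3, vocab=2.
states = [[[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]],
          [[0.0, 1.0, 1.0], [2.0, 2.0, 2.0]]]
weight = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]

# Same results either way; only the number of primitive calls differs.
assert loop_over_time(states, weight) == batched_over_time(states, weight)
```

With real tensors the batched version wins because each GPU kernel launch has fixed overhead and larger matmuls use the hardware better.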

Awesome to hear you are working on GEC, it's a neat problem.

Cheers!


howardyclo commented Feb 6, 2018

@srush Thanks for the reply! It's helpful!
I think I should study the OpenNMT-py codebase in order to write more optimized code, because training speed and memory usage really matter a lot. Recently I've come up with an idea for a different way to train the GEC task, which requires crafting a new model. The new model is basically a seq2seq, but with a dynamic-memory spirit in it. Before that, though, I really need an efficient codebase first :-(


playma commented Feb 6, 2018

@howardyclo Can I ask what GEC is?


srush commented Feb 6, 2018

Grammatical Error Correction.

Feel free to just use our code. It is pretty modular, and it can be more fun to develop with others.

We will likely add some GEC-specific features as well. One of our students works on that.


howardyclo commented Feb 10, 2018

@srush
After I dug into the onmt codebase, I found the key to speeding up training is "bucketing". When I trained on my own GEC corpus (one epoch = 2 million sentence pairs), it cost me 7 hours to complete an epoch. When I replaced my own dataloader with onmt's data iterator, the training time dropped to 1.5 hours! That's quite mind-blowing! :-) Besides, the performance improved too.
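For anyone else landing here, a minimal pure-Python sketch of why bucketing helps (toy numbers and helper names are mine; onmt's actual iterator, built on torchtext at the time, is more elaborate): sorting sentences by length before batching means every sequence in a minibatch is padded to a similar length, so far fewer wasted pad tokens get processed.

```python
# Illustrative sketch (not onmt code): length-bucketing reduces the
# total number of (real + padding) tokens processed per epoch.

def make_batches(lengths, batch_size):
    """Chop a list of sentence lengths into consecutive batches."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

def padded_tokens(batches):
    """Tokens processed when each batch is padded to its longest sentence."""
    return sum(len(batch) * max(batch) for batch in batches)

lengths = [3, 50, 4, 48, 5, 47, 6, 49]   # toy corpus: sentence lengths

naive = make_batches(lengths, batch_size=2)            # mixed lengths per batch
bucketed = make_batches(sorted(lengths), batch_size=2)  # similar lengths per batch

# Same sentences, but bucketing pads far less:
# naive:    [3,50] [4,48] [5,47] [6,49] -> 388 tokens
# bucketed: [3,4] [5,6] [47,48] [49,50] -> 216 tokens
assert padded_tokens(bucketed) < padded_tokens(naive)
```

Real iterators also shuffle within and across buckets so batches are not presented in strict length order, which would hurt SGD.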

@srush srush closed this as completed Feb 18, 2018