pytorch-seq2seq slower than OpenNMT-py #27

Closed
kylegao91 opened this issue Jul 18, 2017 · 7 comments

Comments

@kylegao91
Contributor

Benchmarked the two implementations on WMT's newstest2013, German to English. See the training logs in the gist. Setting aside the accuracy differences, pytorch-seq2seq is roughly 10 times slower than OpenNMT-py.

PetrochukM commented Jul 19, 2017

@kylegao91 We modified pytorch-seq2seq in a private implementation and were able to match OpenNMT-py's speed.

Things that made a big difference:

  • Replacing fixed-length batching with variable-length batching gave a 3-4x speedup. Pooling similar-sized examples together reduces the amount of padding per batch. We implemented this with torchtext (see the sketch after this list).
  • A faster loss function, similar to OpenNMT's memory-efficient loss. Instead of looping row by row and evaluating the loss batch-size times, we flattened the targets from 2D to 1D and the outputs from 3D to 2D, then evaluated the loss once for the entire batch:
        # (seq len, batch size, dictionary size) -> (seq len * batch size, dictionary size)
        outputs = outputs.view(-1, outputs.size(2))
        # (seq len, batch size) -> (seq len * batch size)
        targets = targets.view(-1)
        # one loss evaluation over the whole flattened batch
        self.criterion(outputs, targets)
  • Removing the Python loop in DecoderRNN that updated sequence lengths, replacing it with tensor operations:
            # indices of sequences in the batch that emitted EOS at this step
            eos_batches = symbols.view(-1).data.eq(self.eos_idx).nonzero()
            if eos_batches.dim() > 0:
                # (n, 1) => (n)
                eos_batches = eos_batches.view(-1)
                lengths[eos_batches] = len(sequence_symbols)
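A minimal sketch of the variable-sized batching idea using torchtext's BucketIterator; the field names, file path, and keyword arguments here are illustrative assumptions, not the exact private implementation:

    from torchtext import data

    # Tokenized source/target fields; include_lengths lets the encoder pack sequences.
    SRC = data.Field(include_lengths=True)
    TGT = data.Field()

    # Assumed TSV file with one "source<TAB>target" pair per line.
    train = data.TabularDataset(path='train.de-en.tsv', format='tsv',
                                fields=[('src', SRC), ('tgt', TGT)])
    SRC.build_vocab(train, max_size=50000)
    TGT.build_vocab(train, max_size=50000)

    # BucketIterator pools examples of similar source length into the same
    # batch, which minimizes the padding added per batch.
    train_iter = data.BucketIterator(
        train, batch_size=64,
        sort_key=lambda ex: len(ex.src))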

@kylegao91
Contributor Author

@Deepblue129 Thanks a lot! I will try out these ideas.

@kylegao91
Contributor Author

@Deepblue129 Regarding the third point, your code ignores the condition di < lengths[b_idx], so the values stored in lengths can be longer than they should be. See #32 for a modified version.
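For reference, a sketch of the corrected update (along the lines of #32, not necessarily verbatim): only update a length if the sequence is still open, i.e. it has not already emitted EOS at an earlier step. Here `step` (the current decoding step) and `lengths` (a numpy array initialized to the maximum length) are assumptions:

    eos_batches = symbols.data.eq(self.eos_idx)
    if eos_batches.dim() > 0:
        eos_batches = eos_batches.cpu().view(-1).numpy()
        # only touch sequences whose EOS has not been seen yet (lengths > step)
        update_idx = ((lengths > step) & eos_batches) != 0
        lengths[update_idx] = len(sequence_symbols)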

cclauss commented Jul 20, 2017

If speed is essential, why not step up to Python 3.6 or PyPy? Both are faster than Python 2.7.

@kylegao91
Contributor Author

@cclauss A faster interpreter would be like speeding up an O(N^2) sorting algorithm: it runs faster, but it is still O(N^2) rather than O(N log N). The slowdown here is algorithmic, not interpreter overhead.
We definitely need to support Python 3 though, and we would appreciate any contribution to that end.

PetrochukM commented Jul 21, 2017
@kylegao91 kylegao91 modified the milestones: Sprint 2, Sprint 1 Jul 31, 2017
@kylegao91 kylegao91 modified the milestones: Sprint 3, Sprint 2 Aug 31, 2017
@kylegao91
Contributor Author

Now with #32, #55, and #73, this issue is resolved.
