Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

About

SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy on the many tasks tested.


Average processing time of LSTM, conv2d and SRU, tested on GTX 1070

For example, the figure above presents the processing time of a single mini-batch of 32 samples. SRU achieves a 10 to 16 times speed-up over LSTM, and runs as fast as (or faster than) word-level convolution using conv2d.

The paper has multiple versions; please check the latest one.

Reference:

Simple Recurrent Units for Highly Parallelizable Recurrence

@inproceedings{lei2018sru,
  title={Simple Recurrent Units for Highly Parallelizable Recurrence},
  author={Tao Lei and Yu Zhang and Sida I. Wang and Hui Dai and Yoav Artzi},
  booktitle={Empirical Methods in Natural Language Processing (EMNLP)},
  year={2018}
}

Requirements

Install requirements via pip install -r requirements.txt.


Installation

From source:

SRU can be installed as a regular package via python setup.py install or pip install . (run from the repository root).

From PyPI:

pip install sru

Directly use the source without installation:

Make sure this repo and the CUDA library can be found by the system, e.g.

export PYTHONPATH=path_to_repo/sru
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
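
To confirm that the two variables above took effect, a quick check along these lines may help. This is a minimal sketch using only the Python standard library; check_sru_environment is a hypothetical helper written for illustration, not part of the sru package, and the test for "cuda" in a directory name is only a heuristic assumption about how the CUDA path is named.

```python
import importlib.util
import os


def check_sru_environment(module_name="sru", env=None):
    """Report whether `module_name` can be located and list CUDA-looking library dirs.

    Hypothetical helper for illustration; not part of the sru package.
    """
    env = os.environ if env is None else env
    # find_spec returns None when a top-level module cannot be found on sys.path
    # (PYTHONPATH entries are added to sys.path at interpreter start-up).
    importable = importlib.util.find_spec(module_name) is not None
    # Split LD_LIBRARY_PATH into its directories, dropping empty entries.
    lib_dirs = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    # Heuristic (assumption): a CUDA library directory has "cuda" in its path.
    cuda_dirs = [p for p in lib_dirs if "cuda" in p.lower()]
    return {"importable": importable, "cuda_dirs": cuda_dirs}


if __name__ == "__main__":
    report = check_sru_environment()
    print("sru importable:", report["importable"])
    print("CUDA dirs on LD_LIBRARY_PATH:", report["cuda_dirs"] or "none found")
```

If the module is reported as not importable, the PYTHONPATH entry is the first thing to re-check; an empty CUDA list only means no directory on LD_LIBRARY_PATH matched the heuristic.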

Examples

The usage of SRU is similar to nn.LSTM. SRU likely requires more stacked layers than LSTM. We recommend starting with 2 layers and using more if necessary (see our report for more experimental details).

import torch
from sru import SRU, SRUCell

# input has length 20, batch size 32 and dimension 128 (filled with random values)
x = torch.randn(20, 32, 128).cuda()

input_size, hidden_size = 128, 128

rnn = SRU(input_size, hidden_size,
    num_layers = 2,          # number of stacking RNN layers
    dropout = 0.0,           # dropout applied between RNN layers
    bidirectional = False,   # bidirectional RNN
    layer_norm = False,      # apply layer normalization on the output of each layer
    highway_bias = 0,        # initial bias of highway gate (<= 0)
    rescale = True,          # whether to use scaling correction
)
rnn.cuda()

output_states, c_states = rnn(x)      # forward pass

# output_states is (length, batch size, number of directions * hidden size)
# c_states is (layers, batch size, number of directions * hidden size)
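
The shape conventions in the two comments above can be written out as a small calculation. The sketch below is plain Python and does not call the sru package; expected_sru_shapes is a hypothetical helper introduced here for illustration, and it assumes the number of directions is 2 when bidirectional is True and 1 otherwise.

```python
def expected_sru_shapes(length, batch_size, hidden_size, num_layers=2, bidirectional=False):
    """Return (output_states shape, c_states shape) following the README comments.

    Hypothetical helper for illustration; not part of the sru package.
    """
    # Assumption: a bidirectional SRU concatenates two directions.
    num_directions = 2 if bidirectional else 1
    output_shape = (length, batch_size, num_directions * hidden_size)
    c_shape = (num_layers, batch_size, num_directions * hidden_size)
    return output_shape, c_shape


# The configuration from the example: length 20, batch 32, hidden size 128, 2 layers.
out_shape, c_shape = expected_sru_shapes(20, 32, 128, num_layers=2, bidirectional=False)
print(out_shape)  # (20, 32, 128)
print(c_shape)    # (2, 32, 128)
```

With sru installed, one could compare these tuples against tuple(output_states.shape) and tuple(c_states.shape) as a sanity check after changing num_layers or bidirectional.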

Contributing

Please read and follow the guidelines in CONTRIBUTING.md.

Other Implementations

@musyoku has a very nice SRU implementation in Chainer.

@adrianbg implemented the first CPU version.

