
lrn

Source code for "A Lightweight Recurrent Network for Sequence Modeling"

Model Architecture

In our new paper, we propose the lightweight recurrent network (LRN), which combines the strengths of ATR and SRU:

  • ATR reduces model parameters and avoids additional free parameters for gate computation through its twin-gate mechanism.
  • SRU follows QRNN in moving all matrix computations outside the recurrence.

Based on the above units, we propose LRN:
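A sketch of the recurrence in LaTeX (σ denotes the logistic sigmoid; the twin-gate sign assignment follows ATR's addition-subtraction scheme and is an assumption of this sketch):

```latex
\begin{aligned}
q_t,\, k_t,\, v_t &= W_q x_t,\ W_k x_t,\ W_v x_t \\
i_t &= \sigma(k_t + h_{t-1}) && \text{(input gate)} \\
f_t &= \sigma(q_t - h_{t-1}) && \text{(forget gate)} \\
h_t &= g\big(i_t \odot v_t + f_t \odot h_{t-1}\big)
\end{aligned}
```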

where g(·) is an activation function (tanh or identity), and Wq, Wk and Wv are model parameters. The matrix computations (as well as the optional layer normalization) can be shifted outside the recurrence, so the whole model runs fast.

When the twin-gate mechanism is applied, the output value of ht may suffer from an explosion issue and grow towards infinity. This is why we add the activation function. An alternative solution is layer normalization, which keeps activation values stable.
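A minimal NumPy sketch of one unidirectional LRN layer, following the recurrence sketched above; function and variable names here are illustrative and do not mirror the repository's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lrn_layer(x, Wq, Wk, Wv, g=np.tanh):
    """Run one unidirectional LRN layer over a sequence.

    x          : (seq_len, d_in) input token representations
    Wq, Wk, Wv : (d_in, d_hid) parameter matrices
    g          : activation (np.tanh or identity) keeping h_t bounded
    """
    # All matrix multiplications are done outside the recurrence,
    # so the sequential loop only performs cheap element-wise ops.
    q, k, v = x @ Wq, x @ Wk, x @ Wv

    h = np.zeros(Wq.shape[1])
    states = []
    for t in range(x.shape[0]):
        i_t = sigmoid(k[t] + h)   # input gate  (twin gate: addition)
        f_t = sigmoid(q[t] - h)   # forget gate (twin gate: subtraction)
        h = g(i_t * v[t] + f_t * h)
        states.append(h)
    return np.stack(states)
```

With g set to the identity function, this yields the linear LRN used in the analysis below.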

Structure Analysis

One way to understand the model is to unfold the LRN structure along input tokens:
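Assuming the identity activation (linear LRN) and the recurrence sketched above, the unfolding takes the form:

```latex
h_t = \sum_{k=1}^{t} \Big( \prod_{j=k+1}^{t} f_j \Big) \odot i_k \odot v_k
```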

The above structure, also observed by Zhang et al., Lee et al., and others, endows the RNN model with multiple interpretations. We provide two below:

  • Relation with Self-Attention Networks

Informally, LRN assembles the forget gates from step t down to step k+1 to query the key (the input gate). The resulting weight is assigned to the corresponding value representation and contributes to the final hidden representation.

Do the learned weights make sense? We ran a classification task on AmaPolar with a unidirectional linear-LRN, feeding the final hidden state into the classifier. In one example of the learned weights, the term "great" gains a large weight, which decays slowly and contributes to the final positive decision (see the weight-extraction sketch after this list).

  • Long-term and Short-term Memory

Another view of the unfolded structure is that the different gates form different memory mechanisms. The input gate acts as a short-term memory and indicates how much information from the current token can be activated. The forget gates form a forget chain that controls how meaningless past information is erased.
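A small sketch of how such attention-style weights could be extracted from a linear (identity-activation) LRN; the gate computation mirrors the hypothetical sketch above, and the helper name is illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lrn_token_weights(x, Wq, Wk, Wv):
    """Return w[t, k]: the (hidden-unit-averaged) weight that token k
    receives in hidden state h_t of a linear LRN, i.e.
    mean( prod_{j=k+1..t} f_j * i_k )."""
    q, kk, v = x @ Wq, x @ Wk, x @ Wv
    T, d = v.shape
    h = np.zeros(d)
    i_gates, f_gates = [], []
    for t in range(T):
        i_t = sigmoid(kk[t] + h)
        f_t = sigmoid(q[t] - h)
        h = i_t * v[t] + f_t * h      # identity activation
        i_gates.append(i_t)
        f_gates.append(f_t)

    w = np.zeros((T, T))
    for t in range(T):
        acc = np.ones(d)              # running product of forget gates
        for k in range(t, -1, -1):
            w[t, k] = np.mean(acc * i_gates[k])
            acc = acc * f_gates[k]
    return w
```

Plotting one row of w against the input tokens gives the kind of per-token weight pattern described above.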

Experiments

We conducted experiments on six different tasks.

Citation

Please cite the following paper:

Biao Zhang; Rico Sennrich (2019). A Lightweight Recurrent Network for Sequence Modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy.

@inproceedings{zhang-sennrich:2019:ACL,
  address = "Florence, Italy",
  author = "Zhang, Biao and Sennrich, Rico",
  booktitle = "{Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics}",
  publisher = "Association for Computational Linguistics",
  title = "{A Lightweight Recurrent Network for Sequence Modeling}",
  year = "2019"
}

Contact

For any further comments or questions about LRN, please email Biao Zhang.
