
# RWKV-howto

Possibly useful materials and tutorials for learning RWKV.

RWKV: Parallelizable RNN with Transformer-level LLM Performance.
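
Since RWKV's defining property is that the same model can be trained in parallel like a transformer and run token by token like an RNN, here is a minimal NumPy sketch of the per-channel WKV mixing in RNN mode. It follows the RWKV-4 formulation described in the paper and in Johan Wind's posts listed under Resources; the decay `w`, bonus `u`, and keys/values `k`, `v` are assumed to come from a trained model, and the numerical-stability tricks of the official implementation are omitted.

```python
import numpy as np

def wkv_rnn(w, u, k, v):
    """RNN-mode evaluation of the RWKV-4 WKV operator.

    w : per-channel decay (positive), shape (C,)
    u : per-channel bonus for the current token, shape (C,)
    k, v : keys and values over T time steps, shape (T, C)
    Returns the (T, C) array of WKV outputs.
    """
    T, C = k.shape
    out = np.zeros((T, C))
    num = np.zeros(C)  # running sum of exp(k_i) * v_i, decayed by exp(-w) each step
    den = np.zeros(C)  # running sum of exp(k_i), decayed the same way
    for t in range(T):
        e_k = np.exp(k[t])
        bonus = np.exp(u) * e_k          # the current token gets an extra exp(u) weight
        out[t] = (num + bonus * v[t]) / (den + bonus)
        num = np.exp(-w) * num + e_k * v[t]
        den = np.exp(-w) * den + e_k
    return out

# Toy usage: random arrays stand in for a trained model's projections.
T, C = 8, 4
rng = np.random.default_rng(0)
print(wkv_rnn(w=np.ones(C) * 0.5, u=np.zeros(C),
              k=rng.normal(size=(T, C)), v=rng.normal(size=(T, C))).shape)
```

During training the same operator can be evaluated in parallel over the time dimension; the sequential form above is what gives constant-time, constant-memory generation per token.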

## Relevant Papers

- 🌟(2023-05) RWKV: Reinventing RNNs for the Transformer Era (arXiv)

- (2023-03) Resurrecting Recurrent Neural Networks for Long Sequences (arXiv)

- (2023-02) SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks (arXiv)

- (2022-08) Simplified State Space Layers for Sequence Modeling (ICLR 2023)

- 🌟(2021-05) An Attention Free Transformer (arXiv)

- (2021-10) Efficiently Modeling Long Sequences with Structured State Spaces (ICLR 2022)

- (2020-08) Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (ICML 2020)

- (2018) Parallelizing Linear Recurrent Neural Nets Over Sequence Length (ICLR 2018)

- (2017-09) Simple Recurrent Units for Highly Parallelizable Recurrence (EMNLP 2018)

- (2017-11) MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks (NeurIPS 2017)

- (2017-06) Attention Is All You Need (NeurIPS 2017)

- (2016-11) Quasi-Recurrent Neural Networks (ICLR 2017)

## Resources

- Introducing RWKV - An RNN with the advantages of a transformer (Hugging Face)

- Now that we have the Transformer framework, can RNNs be discarded entirely? (知乎)

- What is the simplest effective form of an RNN? (知乎)

- 🌟The RNN/CNN duality of RWKV (知乎)

- Do the hidden layers of an RNN need nonlinearity? (知乎)

- Google's new work tries to "revive" the RNN: can RNNs shine again? (苏剑林)

- 🌟How the RWKV language model works (Johan Sokrates Wind)

- 🌟The RWKV language model: An RNN with the advantages of a transformer (Johan Sokrates Wind)

- The Unreasonable Effectiveness of Recurrent Neural Networks (Andrej Karpathy blog)

## Code