RWKV

RWKV (Receptance Weighted Key Value) is an RNN with Transformer-level performance that avoids the quadratic attention mechanism: only the hidden state at the current position is needed to calculate the state at the next position.

RWKV is designed to perform inference efficiently, even on CPUs, so it is well-suited to running LLMs (Large Language Models) on ordinary consumer hardware at decent speed.

This implementation is written in Go and utilizes the Spago machine learning framework.

How it works

Currently, there are no research papers that describe this neural architecture. The majority of the information can be found in the original codebase of RWKV's author, PENG Bo (BlinkDL on GitHub).

Roughly speaking,

  • it uses a method similar to an "exponential moving average" to gather contextual information by alternating time-mix and channel-mix layers. The layers decay at different rates, which helps the network remember important information for longer periods as it processes the input sequence (see the sketch after this list).
  • the time-mix is inspired by Apple's AFT (Attention Free Transformer). The channel-mix is inspired by GeGLU.
  • it uses careful parameter initialization to achieve fast convergence (orthogonal matrices with proper scaling, and special time curves).
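
To make the time-mix recurrence concrete, here is a minimal Go sketch of the numerically stabilized per-channel state update used for autoregressive inference, in the spirit of the author's reference code. All names here (wkvState, step, timeFirst, timeDecay) are illustrative assumptions, not this library's API.

package main

import (
	"fmt"
	"math"
)

// wkvState holds one channel's recurrent state: aa/bb are the numerator
// and denominator of the exponential moving average over past (key, value)
// pairs, and pp tracks the running maximum exponent so exp() never overflows.
type wkvState struct {
	aa, bb, pp float64
}

// step folds one (k, v) pair into the state and returns the wkv output for
// the current position. timeFirst ("u") gives the current token a one-off
// bonus; timeDecay ("w", negative) makes older contributions fade
// exponentially, like an exponential moving average.
func (s *wkvState) step(k, v, timeFirst, timeDecay float64) float64 {
	// Output: blend the accumulated history with the current token.
	ww := timeFirst + k
	qq := math.Max(s.pp, ww)
	e1 := math.Exp(s.pp - qq)
	e2 := math.Exp(ww - qq)
	wkv := (e1*s.aa + e2*v) / (e1*s.bb + e2)

	// State update: decay the history by one step, then add the current token.
	ww = s.pp + timeDecay
	qq = math.Max(ww, k)
	e1 = math.Exp(ww - qq)
	e2 = math.Exp(k - qq)
	s.aa = e1*s.aa + e2*v
	s.bb = e1*s.bb + e2
	s.pp = qq
	return wkv
}

func main() {
	s := &wkvState{pp: math.Inf(-1)} // empty history
	keys := []float64{0.5, -0.2, 0.1}
	vals := []float64{1.0, 2.0, 3.0}
	for i := range keys {
		fmt.Printf("wkv[%d] = %f\n", i, s.step(keys[i], vals[i], 0.3, -0.9))
	}
}

Note how only aa, bb and pp must be kept between positions: this is the constant-size state that replaces quadratic attention over all previous tokens.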

Installation

Requirements:

  • Go

Clone this repo or get the library:

go get -u github.com/nlpodyssey/rwkv

The library is optimized to run on x86-64 CPUs. If you want to run it on a different architecture, you can build with the GOARCH=amd64 environment variable.
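
For example, to force an x86-64 build from any host (the resulting binary still needs native or emulated amd64 support to run):

GOARCH=amd64 go build ./...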

Roadmap

  • Parameter initialization (essential)
  • Unit tests
  • Documentation
  • Gob serialization for large models
  • Model optimization

Credits

RWKV and its original implementation are by PENG Bo (BlinkDL on GitHub).

References

@software{peng_bo_2021_5196578,
  author       = {PENG Bo},
  title        = {BlinkDL/RWKV-LM: 0.01},
  month        = aug,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.01},
  doi          = {10.5281/zenodo.5196577},
  url          = {https://doi.org/10.5281/zenodo.5196577}
}
