RWKV

RWKV (Receptance Weighted Key Value) is an RNN with Transformer-level performance that avoids the quadratic attention mechanism: only the hidden state at the current position is needed to calculate the state at the next position.

RWKV is designed to perform inference efficiently, even on CPUs, so it is well-suited to running LLMs (Large Language Models) on ordinary consumer hardware at decent speed.

This implementation is written in Go and utilizes the Spago machine learning framework.

How it works

Currently, there are no research papers that describe this neural architecture. The majority of the information can be found in the original codebase of RWKV's author, PENG Bo (BlinkDL on GitHub).

Roughly speaking,

  • it uses a method similar to an "exponential moving average" to gather contextual information by alternating time-mix and channel-mix layers. The layers decay at different rates, which helps the network remember important information for longer periods as it processes the input sequence (see the sketch after this list).
  • the time-mix is inspired by Apple's AFT (Attention Free Transformer). The channel-mix is inspired by GeGLU.
  • it uses careful parameter initialization to achieve fast convergence (orthogonal matrices with proper scaling, and special time curves).
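
To make the time-mix recurrence concrete, here is a minimal Go sketch of the numerically stabilized per-channel state update used for autoregressive inference, in the spirit of the author's reference code. All names here (wkvState, step, timeFirst, timeDecay) are illustrative assumptions, not this library's API.

package main

import (
	"fmt"
	"math"
)

// wkvState holds one channel's recurrent state: aa/bb are the numerator
// and denominator of the exponential moving average over past (key, value)
// pairs, and pp tracks the running maximum exponent so exp() never overflows.
type wkvState struct {
	aa, bb, pp float64
}

// step folds one (k, v) pair into the state and returns the wkv output for
// the current position. timeFirst ("u") gives the current token a one-off
// bonus; timeDecay ("w", negative) makes older contributions fade
// exponentially, like an exponential moving average.
func (s *wkvState) step(k, v, timeFirst, timeDecay float64) float64 {
	// Output: blend the accumulated history with the current token.
	ww := timeFirst + k
	qq := math.Max(s.pp, ww)
	e1 := math.Exp(s.pp - qq)
	e2 := math.Exp(ww - qq)
	wkv := (e1*s.aa + e2*v) / (e1*s.bb + e2)

	// State update: decay the history by one step, then add the current token.
	ww = s.pp + timeDecay
	qq = math.Max(ww, k)
	e1 = math.Exp(ww - qq)
	e2 = math.Exp(k - qq)
	s.aa = e1*s.aa + e2*v
	s.bb = e1*s.bb + e2
	s.pp = qq
	return wkv
}

func main() {
	s := &wkvState{pp: math.Inf(-1)} // empty history
	keys := []float64{0.5, -0.2, 0.1}
	vals := []float64{1.0, 2.0, 3.0}
	for i := range keys {
		fmt.Printf("wkv[%d] = %f\n", i, s.step(keys[i], vals[i], 0.3, -0.9))
	}
}

Note how only aa, bb and pp must be kept between positions: this is the constant-size state that replaces quadratic attention over all previous tokens.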

Installation

Requirements:

  • Go

Clone this repo or get the library:

go get -u github.com/nlpodyssey/rwkv

The library is optimized to run on x86-64 CPUs. If you want to run it on a different architecture, you can build with the GOARCH=amd64 environment variable.
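
For example, to force an x86-64 build from any host (the resulting binary still needs native or emulated amd64 support to run):

GOARCH=amd64 go build ./...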

Roadmap

  • Parameter initialization (essential)
  • Unit tests
  • Documentation
  • Gob serialization for large models
  • Model optimization

Credits

RWKV and its original implementation are by PENG Bo (BlinkDL on GitHub).

References

@software{peng_bo_2021_5196578,
  author       = {PENG Bo},
  title        = {BlinkDL/RWKV-LM: 0.01},
  month        = aug,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.01},
  doi          = {10.5281/zenodo.5196577},
  url          = {https://doi.org/10.5281/zenodo.5196577}
}
