Question about RWKV formula #64

Open
luowyang opened this issue Apr 1, 2023 · 2 comments

Comments

@luowyang

luowyang commented Apr 1, 2023

In the first formula in the README, RWKV is rewritten in recurrent form by letting $W_n=(n-1)w$. Is there a particular reason for using $n-1$ instead of $n$? The latter seems more natural, and the recurrent formula in "From GPT to RWKV (the formulas)" also implies the latter. So I assume you have probably tried it and found it suboptimal for some reason.

@BlinkDL
Owner

BlinkDL commented Apr 1, 2023

The $n-1$ form is more expressive. I have only tried the current formula, because I believe it is better.

Note that I am treating $W_0$ differently.

@luowyang
Author

luowyang commented Apr 1, 2023

That sounds reasonable because the $n-1$ form makes it possible for $\exp(K_i)V_i$ to appear in the expression of $O_{i+1}$. However, it would be better if someone could provide some empirical evidence. Thus I think it's better to leave this issue open for some time :-)
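
To make the point above concrete, here is a minimal sketch comparing the two choices of $W_n$. It is not the repository's code. It assumes a hypothetical sign convention in which a past token at distance $n \ge 1$ is scaled by $\exp(-W_n)$ with $w > 0$, and it leaves out the separately treated $W_0$ term entirely. The function names `decay_offsets` and `past_weights` are made up for illustration.

```python
import math

def decay_offsets(num_steps, w, shifted):
    """Return W_n for distances n = 1..num_steps.

    shifted=True  -> W_n = (n - 1) * w  (the form asked about in this issue)
    shifted=False -> W_n = n * w        (the alternative proposed in the question)
    """
    return [((n - 1) if shifted else n) * w for n in range(1, num_steps + 1)]

def past_weights(num_steps, w, shifted):
    # Assumed convention (not from the README): weight at distance n is exp(-W_n).
    return [math.exp(-W) for W in decay_offsets(num_steps, w, shifted)]

# Weights applied to exp(K_i) * V_i for the three most recent past tokens.
shifted_weights = past_weights(3, w=0.5, shifted=True)
plain_weights = past_weights(3, w=0.5, shifted=False)

print("W_n = (n-1)w:", shifted_weights)
print("W_n = n*w:   ", plain_weights)
```

Under these assumptions, the $n-1$ form gives the immediately preceding token a weight of exactly 1, so $\exp(K_i)V_i$ enters $O_{i+1}$ undecayed, whereas the $n$ form already multiplies it by $\exp(-w)$. The rest of the sequence is the same geometric decay shifted by one step.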
