In the first formula in the README, RWKV is rewritten into recurrent form by letting $W_n=(n-1)w$. Is there a particular reason for using $n-1$ instead of $n$? The latter seems more natural, and the recurrent formula of RWKV in "From GPT to RWKV (the formulas)" also implies the latter. So I assume you have probably tried it and found it suboptimal for some reason.
That sounds reasonable because the $n-1$ form makes it possible for $\exp(K_i)V_i$ to appear in the expression of $O_{i+1}$. However, it would be better if someone could provide some empirical evidence. Thus I think it's better to leave this issue open for some time :-)
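To make the point about $O_{i+1}$ concrete, here is a minimal sketch, not the repository's code. It assumes a simplified form in which the past token $i$ contributes to the output at step $t$ with weight $\exp(-W_{t-i})$; the sign convention, the helper name `decay_weights`, and the `offset` parameter are all assumptions made for illustration.

```python
import math


def decay_weights(t, w, offset):
    """Weights exp(-W_n) on past tokens i = 1..t-1 at step t, where n = t - i.

    Hypothetical helper for illustration only:
    offset=1 corresponds to W_n = (n - 1) * w,
    offset=0 corresponds to W_n = n * w.
    """
    return [math.exp(-(t - i - offset) * w) for i in range(1, t)]


w = 0.5
t = 4

with_n_minus_1 = decay_weights(t, w, offset=1)
with_n = decay_weights(t, w, offset=0)

# Under the n-1 form, the most recent past token (i = t-1) has W_1 = 0,
# so exp(K_i) V_i enters the output at step i+1 without any decay.
print(with_n_minus_1[-1])

# Under the n form, the same token is already decayed by exp(-w).
print(with_n[-1])
```

Under this simplified assumption, the two conventions differ by the same factor $\exp(w)$ on every past token, so whether the choice matters in practice depends on the terms that are not scaled by that factor. That is the part that would need the empirical evidence asked for above.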