Hello, author! I'm very interested in this work, since I'm currently using a Transformer-based model for a classification task. For classification, Transformers and RNNs usually take the last element of each channel from the final block as the output and map it to the class labels through a fully connected layer. Do you think RWKV works on a similar principle? Is it still safe to extract the last element as the output? I'd appreciate any advice you can give!
Hello. You can try the traditional approach, but there is another option: RWKV's hidden state is very small (see .xx, .aa, and .bb in https://github.com/BlinkDL/RWKV-v2-RNN-Pile/blob/main/src/model.py), so you can try adding a linear layer directly on top of it. Try using .xx and .aa / .bb as the input to the linear layer.
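A minimal sketch of that suggestion, using NumPy stand-ins rather than the actual RWKV state (the shapes, the small-init scale, and the tiny offset on `bb` are assumptions for illustration, not taken from model.py):

```python
import numpy as np

# Hypothetical sizes: assume each state vector (.xx, .aa, .bb) has n_embd channels.
n_embd, n_classes = 8, 3
rng = np.random.default_rng(0)

# Stand-ins for the RNN state after feeding the whole sequence through RWKV.
xx = rng.standard_normal(n_embd)
aa = rng.standard_normal(n_embd)
bb = rng.standard_normal(n_embd) + 1e-3  # .bb acts as a denominator; assumed nonzero here

# Suggested features: .xx concatenated with the ratio .aa / .bb.
features = np.concatenate([xx, aa / bb])   # shape (2 * n_embd,)

# A single linear classification head on top of the state.
W = rng.standard_normal((2 * n_embd, n_classes)) * 0.01
b = np.zeros(n_classes)
logits = features @ W + b                  # shape (n_classes,)
pred = int(np.argmax(logits))
```

In training, `W` and `b` would be learned (e.g. as an `nn.Linear(2 * n_embd, n_classes)` in PyTorch) with a cross-entropy loss, while the features come from the final-step state instead of random vectors.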
Great, thank you very much! I'll try it right away.