About MHA

Hello, and thank you for sharing your code. While comparing the paper with the code, I ran into a few points I don't understand:
![image](https://github.com/daxin007/Client/assets/14818170/46ac3a76-ca9b-43c8-9458-83f6e0bff63d)
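The paper says the input to MHA has shape B x C x L. In the code, `AttentionLayer` splits this into heads. So the head split is actually performed along the sequence-length dimension L?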
![image](https://github.com/daxin007/Client/assets/14818170/2e0788ef-641e-47f9-bc1e-de3c1255e821)
In that case, inside `inner_attention`, i.e. `FullAttention`, L becomes H x E. The scale you use, `1. / sqrt(E)`, then does not match the `1. / sqrt(C)` given in the paper?
![image](https://github.com/daxin007/Client/assets/14818170/ffc3bd98-4c8d-4f8f-91a9-03cd6d2eeff9)
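
To make the question concrete, here is a minimal sketch of the shape bookkeeping as I understand it. It is not the repository's code: the helper names and the sizes (B=32, C=7, L=96, H=8) are made up for illustration, and it assumes the projections keep the last dimension equal to L, so that L = H x E.

```python
import math


def head_split_shape(batch, channels, length, n_heads):
    """Shape after splitting the last axis (length L) into H heads of size E.

    Hypothetical helper: assumes the input is B x C x L and that the
    projection keeps the last dimension equal to L, so L = H * E.
    """
    if length % n_heads != 0:
        raise ValueError("length must be divisible by n_heads")
    head_dim = length // n_heads  # E
    return (batch, channels, n_heads, head_dim)


def attention_scales(channels, length, n_heads):
    """Return the two candidate scaling factors being compared."""
    head_dim = length // n_heads  # E
    return {
        "code": 1.0 / math.sqrt(head_dim),   # 1. / sqrt(E), as in the code
        "paper": 1.0 / math.sqrt(channels),  # 1. / sqrt(C), as in the paper
    }


# Illustrative sizes only, not taken from the paper or the repository.
B, C, L, H = 32, 7, 96, 8
print(head_split_shape(B, C, L, H))
print(attention_scales(C, L, H))
```

With these sizes E = L / H = 12, so the two factors are 1/sqrt(12) and 1/sqrt(7). They would coincide only if E happened to equal C.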

