[ARM] Multiheadattention #4463
Conversation
EdVince commented Jan 10, 2023 (edited)
- The original fp32 implementation is changed to match the x86 version: both now call gemm
- Added an fp16 implementation: only softmax runs in fp32, everything else uses fp16 (a rough sketch of this flow follows the list)
- bf16 support is not planned
- A whitelist entry was added to the tests; for the mha test, relaxing the tolerance 2x is already enough to pass, but 5x is allowed here
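A minimal sketch of the mixed-precision flow described above, assuming illustrative shapes and function names rather than ncnn's actual kernels; it builds on AArch64 compilers that provide __fp16:

```cpp
// Sketch only: the Q*K^T and attn*V gemms work on fp16 data,
// while the softmax in between is computed entirely in fp32.
#include <algorithm>
#include <cmath>
#include <vector>

// naive fp16 gemm: C[m x n] = A[m x k] * B[k x n], fp32 accumulator per dot
static void gemm_fp16(const __fp16* A, const __fp16* B, __fp16* C,
                      int m, int n, int k)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
        {
            float acc = 0.f;
            for (int p = 0; p < k; p++)
                acc += (float)A[i * k + p] * (float)B[p * n + j];
            C[i * n + j] = (__fp16)acc;
        }
}

// softmax over each row of x[m x n], done entirely in fp32
static void softmax_rows_fp32(__fp16* x, int m, int n)
{
    for (int i = 0; i < m; i++)
    {
        __fp16* row = x + i * n;
        float maxv = -1e30f;
        for (int j = 0; j < n; j++)
            maxv = std::max(maxv, (float)row[j]);
        float sum = 0.f;
        std::vector<float> e(n);
        for (int j = 0; j < n; j++)
        {
            e[j] = std::exp((float)row[j] - maxv); // subtract max for stability
            sum += e[j];
        }
        for (int j = 0; j < n; j++)
            row[j] = (__fp16)(e[j] / sum);
    }
}

// one attention head: out = softmax(Q * K^T / sqrt(d)) * V
static void attention_fp16(const __fp16* Q, const __fp16* KT, const __fp16* V,
                           __fp16* out, int seqlen, int d)
{
    std::vector<__fp16> attn(seqlen * seqlen);
    gemm_fp16(Q, KT, attn.data(), seqlen, seqlen, d);  // fp16 gemm
    float scale = 1.f / std::sqrt((float)d);
    for (int i = 0; i < seqlen * seqlen; i++)
        attn[i] = (__fp16)((float)attn[i] * scale);
    softmax_rows_fp32(attn.data(), seqlen, seqlen);    // fp32 softmax
    gemm_fp16(attn.data(), V, out, seqlen, d, seqlen); // fp16 gemm
}
```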
Codecov Report
@@            Coverage Diff             @@
##           master    #4463      +/-   ##
==========================================
- Coverage   94.72%   94.63%    -0.09%
==========================================
  Files         726      726
  Lines      194540   191451     -3089
==========================================
- Hits       184275   181184     -3091
- Misses      10265    10267        +2
A question: can softmax only be implemented in fp32? Is fp16 not precise enough? And if the inputs are int8/int16, do they also have to be converted to floating point first?
Softmax is inherently a precision-sensitive computation. If the softmax inside multiheadattention were also done in fp16, accuracy would drop badly, especially when the input dimension is large. pr3940 is 白座's int8 multiheadattention implementation, and you can see that it also uses float at the softmax step. In the hardware softmax designs I have seen on some chips, they all choose int8 input with float output. There are also softmax implementations done entirely in int8, but I am not familiar with those. Softmax is indeed a clear performance bottleneck, but unfortunately it is also a precision-sensitive operator, so it is best to use fp32 whenever possible.
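To make the sensitivity concrete, here is a toy demonstration (hypothetical, not from the PR) of how an fp16 accumulator distorts the softmax denominator once the sequence gets long; it compiles on AArch64 toolchains that provide __fp16:

```cpp
// With n uniform logits every exp() term is 1, so the softmax denominator
// should be exactly n. fp16 represents integers exactly only up to 2048,
// so the fp16 sum stalls there while the fp32 sum stays correct.
#include <cstdio>

int main()
{
    const int n = 4096; // longer sequences make the gap worse
    __fp16 sum16 = (__fp16)0.f;
    float sum32 = 0.f;
    for (int i = 0; i < n; i++)
    {
        sum16 = (__fp16)((float)sum16 + 1.f); // rounds back to 2048 past 2048
        sum32 += 1.f;
    }
    // true per-element probability is 1/4096 ~= 0.000244
    printf("fp16 denominator %.0f -> prob %.6f\n", (float)sum16, 1.f / (float)sum16);
    printf("fp32 denominator %.0f -> prob %.6f\n", sum32, 1.f / sum32);
    return 0;
}
```

The fp16 path reports roughly double the true probability here, which matches the observation above that the error grows with the input dimension.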
Thanks for the explanation! I also work on algorithms for AI inference chips. I wonder whether anyone has tried simply replacing softmax with a different normalization function that is friendlier to fixed-point arithmetic.
Just go all the way and switch to a chip with more compute.
haha, anything money can solve isn't a problem
The arm82/arm86 CI didn't pass.
I see the Android one passed; does the Linux one need some special setup?
The Android CI only compiles, it doesn't run the tests; the Linux CI runs the tests.
Thanks for your contribution!