Background: I am exploring how to deploy Paraformer on edge devices, and I want to run inference on the NPU through the RK (Rockchip) framework. The RK framework only supports FP16-precision models for inference. FP16 has a representable range of [-65504, 65504], while FP32 covers [-3.4×10^38, 3.4×10^38], so converting an FP32 model directly to an RK model causes overflow (NaN) during inference.
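The overflow described above is easy to reproduce. A minimal numpy sketch (illustrative only, not tied to the RK toolchain): any intermediate activation above 65504 saturates to `inf` in FP16, and a subsequent `inf - inf` (common in attention-logit normalization) produces NaN, even though the same arithmetic is finite in FP32.

```python
import numpy as np

# FP16 saturates at +/-65504; larger magnitudes become inf.
x32 = np.array([1e5, 3e4], dtype=np.float32)
x16 = x32.astype(np.float16)
print(x16)  # first element has overflowed to inf

# A difference that is exactly 0.0 in FP32 becomes NaN in FP16,
# because inf - inf is undefined.
diff32 = np.float32(1e5) - np.float32(1e5)   # 0.0
diff16 = np.float16(1e5) - np.float16(1e5)   # nan
print(diff32, diff16)
```

This is why a plain FP32-to-FP16 cast of the exported model fails at runtime: the weights may fit, but intermediate activations do not.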
I followed the FunASR tutorial at https://github.com/alibaba-damo-academy/FunASR/blob/v0.8.8/funasr/export/README.md to apply INT8 quantization. However, that approach is dynamic quantization: the weights are dequantized back to FP32 for the actual computation.

Question: Is there a true FP16 model, or a finetuning recipe that produces one?
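To illustrate why dynamic INT8 quantization does not solve the FP16 deployment problem, here is a minimal per-tensor symmetric-quantization sketch in numpy (an assumption for illustration, not FunASR's actual export code): the weights are stored as INT8, but at compute time they are dequantized and the matmul still runs in FP32, so the FP32 dynamic range is still exercised at runtime.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantized_matmul(x: np.ndarray, q: np.ndarray, scale: float):
    # Dynamic quantization: weights are de-quantized back to FP32
    # before the matmul, so the computation is still FP32 math.
    w_fp32 = q.astype(np.float32) * scale
    return x @ w_fp32

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
x = rng.normal(size=(2, 4)).astype(np.float32)

q, s = quantize_int8(w)
y = dequantized_matmul(x, q, s)
# Storage is INT8, but the output dtype and arithmetic are FP32:
print(q.dtype, y.dtype)
```

On an FP16-only NPU this buys nothing: the dequantized FP32 intermediates can still overflow once the backend casts them to FP16.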
Maybe you could refer to this code: 33f2d46
Could you explain how exactly to use those two lines of code? What should the value of `scale` be?
May I ask what solution you ended up using for on-device deployment?
The 2-pass approach.