Inference support mixed-precision model [3] #44057
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
LGTM
- Regarding the TensorRT enable options: could whether the model precision is fp16 be detected automatically?
- Since the TensorRT and op converter changes are fairly extensive, please have 王哲 review this as well.
We could add that logic, but there is still a small precision difference between fp16 and fp32, so it seems better to leave this under the user's control.
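To illustrate that small difference, here is a minimal standalone Python sketch (using the standard library's half-precision `struct` format `'e'`, not Paddle itself) showing how a value shifts when round-tripped through fp16:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

# 0.1 is not exactly representable in fp16, so the round-trip
# lands on the nearest half-precision value instead.
print(to_fp16(0.1))  # → 0.0999755859375
# Values exactly representable in fp16 survive unchanged.
print(to_fp16(0.5))  # → 0.5
```

Small per-value shifts like this can accumulate across a large model, which is why whether fp16 is acceptable is left as a user decision rather than inferred automatically.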
PR types
Others
PR changes
Others
Describe
base pr: #43814
Effect of this PR: Paddle-TRT is compatible with native mixed precision (fp16).
Simulated large-model test (8.6 GB of fp32 parameters)
Test environment: T4, batch_size=1, ir_optim=false (all native optimizations disabled; TRT needs optimization enabled), CUDA 11.4, cuDNN 8.2, TRT 8.4, warmup=100, repeats=1000
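As a back-of-envelope check (illustrative only, not taken from the PR's measurements), weight memory should roughly halve under fp16, since each parameter takes 2 bytes instead of 4:

```python
# Bytes per parameter for each precision.
FP32_BYTES, FP16_BYTES = 4, 2

fp32_weight_gb = 8.6  # fp32 weight size reported in the test above
num_params = fp32_weight_gb * 1024**3 / FP32_BYTES
fp16_weight_gb = num_params * FP16_BYTES / 1024**3
print(f"{fp16_weight_gb:.1f} GB")  # → 4.3 GB
```

Actual GPU memory also includes activations, workspace, and TRT engine overhead, so the measured curves will sit above this weight-only estimate.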
Conclusions:
- GPU memory curve when loading fp32 weights with TRT enabled