Inference support mixed-precision model [3] #44057

Merged

merged 21 commits into PaddlePaddle:develop from convert_model on Jul 8, 2022

Conversation

@jiweibo (Contributor) commented Jul 4, 2022

PR types

Others

PR changes

Others

Describe

Base PR: #43814

What this PR does: make Paddle-TRT compatible with native mixed-precision (fp16) models.

Simulated large-model test (8.6 GB of fp32 parameters):

import paddle
import paddle.nn as nn
from paddle.jit import to_static
from paddle.static import InputSpec


class LinearNet(nn.Layer):
    """A stack of large Linear layers used to simulate a big model (~8.6 GB of fp32 weights)."""

    def __init__(self):
        super(LinearNet, self).__init__()
        self.in_linear = nn.Linear(150528, 4096)   # 3 * 224 * 224 = 150528 input features
        self.out_linear = nn.Linear(4096, 1000)
        self.linears = nn.LayerList([nn.Linear(4096, 4096) for _ in range(100)])

    def forward(self, x):
        x = paddle.flatten(x, 1)
        x = self.in_linear(x)
        for layer in self.linears:
            x = layer(x)
        x = self.out_linear(x)
        return x


model = LinearNet()
# Convert to a static graph with a dynamic batch dimension and export it for inference.
net = to_static(model, input_spec=[InputSpec(shape=[None, 3, 224, 224], name='x')])
paddle.jit.save(net, './linear/inference')
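
For reference, the saved model can then be loaded with the Paddle Inference Python API. The sketch below shows how a trt-fp16 run (as in the table further down) might be configured; it is not the benchmark script from this PR, and values such as the GPU memory pool and TRT workspace size are illustrative.

import numpy as np
import paddle.inference as paddle_infer

# Load the exported program and weights produced by paddle.jit.save above.
config = paddle_infer.Config('./linear/inference.pdmodel',
                             './linear/inference.pdiparams')
config.enable_use_gpu(1000, 0)   # initial GPU memory pool in MB, device id
config.switch_ir_optim(True)     # Paddle-TRT requires IR optimization to be enabled
config.enable_tensorrt_engine(
    workspace_size=1 << 30,      # illustrative 1 GB TRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Half,  # run TRT subgraphs in fp16
    use_static=False,
    use_calib_mode=False)
predictor = paddle_infer.create_predictor(config)

# Feed one batch and fetch the result.
x = np.random.rand(1, 3, 224, 224).astype('float32')
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.reshape(x.shape)
input_handle.copy_from_cpu(x)
predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()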

Test environment: T4 GPU, batch_size=1, ir_optim=false for the native runs (all graph optimizations disabled; Paddle-TRT requires IR optimization to be enabled), CUDA 11.4, cuDNN 8.2, TRT 8.4, warmup=100, repeats=1000.

|                         | Host memory | GPU memory | Run time | Load time (predictor init) |
|-------------------------|-------------|------------|----------|----------------------------|
| Native fp32             | 8201.7 MB   | 9398 MB    | 38.65 ms | 18.23 s                    |
| Native fp16             | 4960.5 MB   | 5010 MB    | 20.09 ms | 9.88 s                     |
| trt-fp16 (fp32 weights) | 19.72 GB    | 5518 MB    | 18.28 ms | 405.9 s                    |
| trt-fp16 (fp16 weights) | 13.47 GB    | 5518 MB    | 18.4 ms  | 396.3 s                    |
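
The run-time column could be collected with a simple warmup/repeat loop like the one below (a sketch of the warmup=100, repeats=1000 protocol described above; the actual benchmark harness is not part of this PR description):

import time

def benchmark(run_once, warmup=100, repeats=1000):
    # Warm up first so that TRT engine build and CUDA initialization are excluded.
    for _ in range(warmup):
        run_once()
    start = time.time()
    for _ in range(repeats):
        run_once()
    return (time.time() - start) * 1000.0 / repeats  # average latency in ms

# Reuses the predictor created in the earlier sketch.
print('avg latency: %.2f ms' % benchmark(lambda: predictor.run()))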

Conclusions:

  • Native inference: compared with fp32, mixed precision roughly halves host/GPU memory usage and predictor initialization time; operator run time is also roughly halved when tensor shapes meet Tensor Core requirements.
  • Paddle-TRT inference: loading low-precision (fp16) weights reduces host memory usage compared with fp32 weights; the other metrics are unaffected.
  • GPU memory curve with TRT enabled, loading fp16 weights:

[figure: GPU memory usage over time, TRT enabled, fp16 weight loading]

  • GPU memory curve with TRT enabled, loading fp32 weights:

[figure: GPU memory usage over time, TRT enabled, fp32 weight loading]

@paddle-bot-old bot commented Jul 4, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@jiweibo jiweibo closed this Jul 7, 2022
@jiweibo jiweibo reopened this Jul 7, 2022
@zhoutianzi666 (Contributor) left a comment


LGTM

@jiweibo jiweibo requested a review from shangzhizhou July 7, 2022 12:23
@shangzhizhou (Member) left a comment


  1. For the TensorRT configuration option, can it automatically detect whether the model precision is fp16?
  2. Since this PR touches a lot of TensorRT and op converter code, please also have 王哲 review it.

@jiweibo (Contributor, Author) commented Jul 8, 2022

  • For the TensorRT configuration option, can it automatically detect whether the model precision is fp16?

We could add that logic, but fp16 and fp32 still produce slightly different numerical results, so it seems better to leave the choice to the user.
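
To make that explicit: in the sketch earlier, the TRT precision is whatever the user passes to the config, independent of whether the saved weights are fp32 or fp16 (again an illustration reusing `paddle_infer` and `config` from that sketch, not code from this PR):

# The user opts into fp16 explicitly ...
config.enable_tensorrt_engine(precision_mode=paddle_infer.PrecisionType.Half)
# ... or keeps fp32; the precision is not inferred from the weight dtype.
# config.enable_tensorrt_engine(precision_mode=paddle_infer.PrecisionType.Float32)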

@jiweibo jiweibo merged commit 7f95872 into PaddlePaddle:develop Jul 8, 2022
@jiweibo jiweibo deleted the convert_model branch July 8, 2022 12:34