Inference support mixed-precision model [3] #44057

Merged

merged 21 commits into PaddlePaddle:develop from convert_model on Jul 8, 2022

Conversation

@jiweibo (Contributor) commented Jul 4, 2022

PR types

Others

PR changes

Others

Describe

Base PR: #43814

What this PR does: make Paddle-TRT compatible with native mixed-precision (fp16) models.

Simulated large-model test (8.6 GB of fp32 parameters):

import paddle
import paddle.nn as nn
from paddle.jit import to_static
from paddle.static import InputSpec


class LinearNet(nn.Layer):
    """A stack of large Linear layers used to simulate a big model (~8.6 GB of fp32 weights)."""

    def __init__(self):
        super(LinearNet, self).__init__()
        self.in_linear = nn.Linear(150528, 4096)   # 3 * 224 * 224 = 150528 input features
        self.out_linear = nn.Linear(4096, 1000)
        self.linears = nn.LayerList([nn.Linear(4096, 4096) for _ in range(100)])

    def forward(self, x):
        x = paddle.flatten(x, 1)
        x = self.in_linear(x)
        for layer in self.linears:
            x = layer(x)
        x = self.out_linear(x)
        return x


model = LinearNet()
# Convert to a static graph with a dynamic batch dimension and export it for inference.
net = to_static(model, input_spec=[InputSpec(shape=[None, 3, 224, 224], name='x')])
paddle.jit.save(net, './linear/inference')
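
For reference, the saved model can then be loaded with the Paddle Inference Python API. The sketch below shows how a trt-fp16 run (as in the table further down) might be configured; it is not the benchmark script from this PR, and values such as the GPU memory pool and TRT workspace size are illustrative.

import numpy as np
import paddle.inference as paddle_infer

# Load the exported program and weights produced by paddle.jit.save above.
config = paddle_infer.Config('./linear/inference.pdmodel',
                             './linear/inference.pdiparams')
config.enable_use_gpu(1000, 0)   # initial GPU memory pool in MB, device id
config.switch_ir_optim(True)     # Paddle-TRT requires IR optimization to be enabled
config.enable_tensorrt_engine(
    workspace_size=1 << 30,      # illustrative 1 GB TRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=paddle_infer.PrecisionType.Half,  # run TRT subgraphs in fp16
    use_static=False,
    use_calib_mode=False)
predictor = paddle_infer.create_predictor(config)

# Feed one batch and fetch the result.
x = np.random.rand(1, 3, 224, 224).astype('float32')
input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.reshape(x.shape)
input_handle.copy_from_cpu(x)
predictor.run()
output = predictor.get_output_handle(predictor.get_output_names()[0]).copy_to_cpu()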

Test environment: T4 GPU, batch_size=1, ir_optim=false for the native runs (all graph optimizations disabled; Paddle-TRT requires IR optimization to be enabled), CUDA 11.4, cuDNN 8.2, TRT 8.4, warmup=100, repeats=1000.

|                         | Host memory | GPU memory | Run time | Load time (predictor init) |
|-------------------------|-------------|------------|----------|----------------------------|
| Native fp32             | 8201.7 MB   | 9398 MB    | 38.65 ms | 18.23 s                    |
| Native fp16             | 4960.5 MB   | 5010 MB    | 20.09 ms | 9.88 s                     |
| trt-fp16 (fp32 weights) | 19.72 GB    | 5518 MB    | 18.28 ms | 405.9 s                    |
| trt-fp16 (fp16 weights) | 13.47 GB    | 5518 MB    | 18.4 ms  | 396.3 s                    |
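
The run-time column could be collected with a simple warmup/repeat loop like the one below (a sketch of the warmup=100, repeats=1000 protocol described above; the actual benchmark harness is not part of this PR description):

import time

def benchmark(run_once, warmup=100, repeats=1000):
    # Warm up first so that TRT engine build and CUDA initialization are excluded.
    for _ in range(warmup):
        run_once()
    start = time.time()
    for _ in range(repeats):
        run_once()
    return (time.time() - start) * 1000.0 / repeats  # average latency in ms

# Reuses the predictor created in the earlier sketch.
print('avg latency: %.2f ms' % benchmark(lambda: predictor.run()))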

Conclusions:

  • Native inference: compared with fp32, mixed precision roughly halves host/GPU memory usage and predictor initialization time; operator run time is also roughly halved when tensor shapes meet Tensor Core requirements.
  • Paddle-TRT inference: loading low-precision (fp16) weights reduces host memory usage compared with fp32 weights; the other metrics are unaffected.
  • GPU memory curve with TRT enabled, loading fp16 weights:

[figure: GPU memory usage over time, TRT enabled, fp16 weight loading]

  • GPU memory curve with TRT enabled, loading fp32 weights:

[figure: GPU memory usage over time, TRT enabled, fp32 weight loading]

@paddle-bot-old bot commented Jul 4, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@jiweibo jiweibo closed this Jul 7, 2022
@jiweibo jiweibo reopened this Jul 7, 2022
@zhoutianzi666 (Contributor) left a comment


LGTM

@jiweibo jiweibo requested a review from shangzhizhou July 7, 2022 12:23
@shangzhizhou (Member) left a comment


  1. For the TensorRT configuration option, can it automatically detect whether the model precision is fp16?
  2. Since this PR touches a lot of TensorRT and op converter code, please also have 王哲 review it.

@jiweibo (Contributor, Author) commented Jul 8, 2022

  • For the TensorRT configuration option, can it automatically detect whether the model precision is fp16?

We could add that logic, but fp16 and fp32 still produce slightly different numerical results, so it seems better to leave the choice to the user.
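
To make that explicit: in the sketch earlier, the TRT precision is whatever the user passes to the config, independent of whether the saved weights are fp32 or fp16 (again an illustration reusing `paddle_infer` and `config` from that sketch, not code from this PR):

# The user opts into fp16 explicitly ...
config.enable_tensorrt_engine(precision_mode=paddle_infer.PrecisionType.Half)
# ... or keeps fp32; the precision is not inferred from the weight dtype.
# config.enable_tensorrt_engine(precision_mode=paddle_infer.PrecisionType.Float32)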

@jiweibo jiweibo merged commit 7f95872 into PaddlePaddle:develop Jul 8, 2022
@jiweibo jiweibo deleted the convert_model branch July 8, 2022 12:34