
inference engine related design #10198

Closed

Conversation

@Superjomn (Contributor) commented Apr 25, 2018:

fixes: #10028

@Xreki added the "预测" label ("Inference"; covers C-API inference issues, etc.) on Apr 25, 2018
@panyx0718 (Contributor) commented:
Have we verified the performance of using tensorrt as a sub-graph?

@Superjomn (Contributor, Author) commented Apr 25, 2018:

We will get a benchmark next week. @panyx0718

The inference phase needs to support some special hardware for acceleration,
such as GPU, FPGA, and ARM.
Special software powers some of this hardware while hiding its inner states. For example, TensorRT is released by NVIDIA to improve inference performance on GPUs; it takes a computation graph as input,
optimizes and executes it, but users can't directly modify its internal logic.
Review comment (Contributor):

Special softwares power some of these hardwares and the inner states are hidden. For example, TensorRT is released by NVIDIA to improve the inference performance on GPUs. It takes a computation graph as input, optimizes and executes it, while users can't directly modify its internal logic.


## Use Engines to Execute Sub-blocks

Compared to Paddle Fluid, the engines cover a limited number of operators and can only power several kinds of models. In other words, the engines can only support a part of Fluid.
Review comment (Contributor):

Motivation of the sub-blocks method

line 13, plus some information from tensorflow/models#4028, to explain why we use the sub-blocks method instead of using TensorRT directly.

Use Engines to Execute Sub-blocks

line 14
...
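To make the point above concrete, here is a minimal, hypothetical sketch of why sub-blocks are needed: only runs of engine-supported operators can be handed to an engine such as TensorRT, and the rest must stay with the Fluid executor. The supported-op set and the function below are illustrative assumptions, not Fluid code.

```python
# Hypothetical: an engine supports only a subset of Fluid's operators, so a
# program splits into engine-powered and Fluid-powered segments.
TENSORRT_SUPPORTED = {'conv2d', 'relu', 'pool2d', 'fc'}  # assumed subset

def split_into_segments(op_types):
    """Group consecutive op types into (engine_supported, ops) segments."""
    segments = []
    for op in op_types:
        supported = op in TENSORRT_SUPPORTED
        if segments and segments[-1][0] == supported:
            segments[-1][1].append(op)
        else:
            segments.append((supported, [op]))
    return segments

# split_into_segments(['conv2d', 'relu', 'lookup_table', 'fc']) ->
# [(True, ['conv2d', 'relu']), (False, ['lookup_table']), (True, ['fc'])]
```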



It is easy to parallelize the computation by scheduling several engines on different devices; for example, the CPU and GPU engines can be dispatched at the same time.
Review comment (Contributor):

Add a period after "meantime".

We use a `with-statement` to mark the sub-block as follows.

```python
with infer.power_by_engine('tensorrt'):
    ...
```
Review comment (Contributor):

What's the type of `infer`, a ProgramDesc? The following is the current transpiler interface, whose parameter is a ProgramDesc.

```python
t = fluid.InferenceTranspiler()
t.transpile(inference_transpiler_program, place)
```

In my mind, the interface for automatic detection mode is:

```python
t = fluid.InferenceTranspiler()
t.transpile(inference_transpiler_program, place, engine='tensorrt')

def transpile(inference_transpiler_program, place, engine):
    if engine == "tensorrt":
        power_by_tensorrt_engine(inference_transpiler_program)
    else:
        ...
```

@Superjomn (Contributor, Author) commented Apr 26, 2018:

`infer` is a module:

```python
import paddle.inference as infer
```

```python
with infer.power_by_engine('tensorrt'):
    o = some_op()
    o = some_op()
```
Review comment (Contributor):

What's the meaning of `o = some_op()`?

@Superjomn (Contributor, Author) replied:

No practical meaning, just shows that there are several operators there.
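For illustration, here is a minimal sketch of how a marking context manager like `power_by_engine` could be implemented; the `_engine_stack` mechanism is an assumption for this sketch, not the actual `paddle.inference` implementation.

```python
import contextlib

_engine_stack = []  # which engine, if any, should power ops created now

@contextlib.contextmanager
def power_by_engine(engine_name):
    # Every op created inside the `with` block is marked as belonging to
    # `engine_name`, so a transpiler can later group them into a sub-block.
    _engine_stack.append(engine_name)
    try:
        yield
    finally:
        _engine_stack.pop()

def current_engine():
    # Op constructors would consult this to record their engine marking.
    return _engine_stack[-1] if _engine_stack else None
```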


### EngineOp

`EngineOp` is just a normal Fluid operator, which has an attribute called `subblock` that holds the Fluid description of a sub-block.
Review comment (Contributor):

subblock->sub_block
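As a simplified, hypothetical model of the design (plain Python stand-ins, not the real Fluid `OpDesc`/`BlockDesc` classes): an `EngineOp` is an ordinary operator whose attribute points at the sub-block description the engine should execute.

```python
from dataclasses import dataclass, field

@dataclass
class OpDesc:
    type: str
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    attrs: dict = field(default_factory=dict)

@dataclass
class BlockDesc:
    ops: list = field(default_factory=list)

# The sub-block holds the operators the engine will execute.
sub_block = BlockDesc(ops=[OpDesc('conv2d'), OpDesc('relu')])

# EngineOp is a normal operator; its attribute carries the sub-block.
engine_op = OpDesc(
    type='tensorrt_engine',            # assumed op type name
    inputs={'Xs': ['x']},
    outputs={'Ys': ['y']},
    attrs={'sub_block': sub_block})
```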

```c++
enum class DeviceType {
  CPU = 0,
  GPU
};
```
Review comment (Contributor):

GPU=1?

@Superjomn (Contributor, Author) replied:

The enum syntax only requires setting the first element; the following elements increment automatically.
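For illustration, the same auto-increment behavior sketched in Python (the C++ enum above gives `GPU` the implicit value 1):

```python
from enum import IntEnum, auto

class DeviceType(IntEnum):
    CPU = 0
    GPU = auto()  # increments from the previous member, so GPU == 1

assert int(DeviceType.GPU) == 1
```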


The `EngineOutputConvertOp` is similar.

### Optimizer for sub-block
Review comment (Contributor):

Optimizer->Transpiler

@Superjomn (Contributor, Author) replied:

An optimizer is not a Transpiler; it corresponds to the optimization passes in a compiler.

### Optimizer for sub-block

```c++
// The InferenceOptimizers input a program desc and output a block desc.
```
Review comment (Contributor):

It inputs a program desc, but the output may be a series of sub-block descs.

@Superjomn (Contributor, Author) replied:

It inputs a program desc and outputs a program desc with several newly inserted EngineOps, whose attributes are set with the sub-blocks.

```c++
// Different implementations will rewrite the original program desc with different logics.
// There might be many different optimizers, such as
// - CleanUselessOptimizer
// - PruneOpOptimizer
```
@luotao1 (Contributor) commented Apr 26, 2018:

What are CleanUselessOptimizer and PruneOpOptimizer?
We already have a prune method for inference; see paddle/fluid/framework/prune.cc

@Superjomn (Contributor, Author) replied:

Yes, I think a factory pattern of Operators is a better interface; maybe we'd better refactor those codes.
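For illustration, a minimal sketch of the optimizer interface being discussed, with a pipeline that applies each pass in order. The class and method names here are assumptions, not existing Fluid code.

```python
class InferenceOptimizer:
    """Base class: each optimizer rewrites a program desc and returns it."""
    def run(self, program_desc):
        raise NotImplementedError

class CleanUselessOptimizer(InferenceOptimizer):
    def run(self, program_desc):
        # e.g. drop ops whose outputs are never consumed
        return program_desc

class PruneOpOptimizer(InferenceOptimizer):
    def run(self, program_desc):
        # e.g. prune training-only ops before extracting engine sub-blocks
        return program_desc

def optimize(program_desc, optimizers):
    # Apply each optimization pass in order, like compiler passes.
    for opt in optimizers:
        program_desc = opt.run(program_desc)
    return program_desc
```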

@luotao1 (Contributor) commented Apr 26, 2018:

Are all of the above implemented and run on the C++ end?
The following is the current C++ inference logic:

```c++
inference_program = paddle::inference::Load(&executor, scope, dirname);
executor.Run(*inference_program, scope, ...);
```

Then, how would we use the inference engine?

```c++
inference_program = paddle::inference::Load(&executor, scope, dirname);
inference_engine_program = paddle::inference::transpiler(inference_program, engine="tensorrt");
executor.Run(*inference_engine_program, scope, ...);
```

@Superjomn (Contributor, Author) replied:

The inference might have its own executor implementation, so there might be some more considerations about the SDK.

The Anakin and MDL teams will join together to design the inference SDK, and there might be some further designs about these issues. @luotao1

@wangkuiyi (Collaborator) left a review comment:

Sorry, from this design PR and the other code PR, I haven't been able to grasp the intent of this design. Let's have a video call.

@@ -0,0 +1,254 @@
# Utilize Engines to Accelerate Inference
Review comment (Collaborator):

What do the "engines" here refer to?

Review comment (Collaborator):

It looks like they refer to TensorRT? I see a base class proposed later, and I also saw this base class in the other code PR. Is this so that classes corresponding to "engines" other than TensorRT can be derived in the future?

@Superjomn (Contributor, Author) replied:

TensorRT, Anakin, or other similar libraries that come with their own complete optimization.

@@ -0,0 +1,254 @@
# Utilize Engines to Accelerate Inference

The inference phase needs to support some special hardware for acceleration,
Review comment (Collaborator):

The inference phase need to support some special hardware

=>

We want to utilize DL chips to accelerate the inference of Fluid models.

@Xreki added this to Integrate TensorRT in Inference Framework on May 21, 2018
@luotao1 (Contributor) commented Feb 1, 2019:

Thanks for contributing to PaddlePaddle! Since documents have been moved to the FluidDoc repo, we are closing this PR. Welcome to contribute to the FluidDoc repo.

@luotao1 closed this on Feb 1, 2019
Labels
预测 ("Inference"; covers C-API inference issues, etc.)
Projects
Inference Framework: Integrate TensorRT
Development
Successfully merging this pull request may close these issues:
any engine for inference subgraph acceleration naive design
5 participants