
[FAQ] Custom PASS + custom OP for inference #564

Closed
ronny1996 opened this issue May 17, 2023 · 1 comment

ronny1996 commented May 17, 2023

Custom PASS:
PaddlePaddle/Paddle#35602
PaddlePaddle/Paddle#36095

Custom OP:
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/index_cn.html#zidingyisuanzi

my_add_n.cc // place it in the plugin and compile everything into a single .so file

// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <iostream>
#include <vector>

#include "paddle/extension.h"

std::vector<paddle::Tensor> MyAddNOp(const paddle::Tensor& x,
                                     const paddle::Tensor& y,
                                     const paddle::Tensor& z) {
  // Sum all three inputs, matching the fused pattern add(add(x, y), z).
  return {paddle::add(paddle::add(x, y), z)};
}

std::vector<std::vector<int64_t>> MyAddNOpInferShape(
    const std::vector<int64_t>& x_shape,
    const std::vector<int64_t>& y_shape,
    const std::vector<int64_t>& z_shape) {
  return {x_shape};
}

PD_BUILD_OP(my_add_n)
    .Inputs({"X", "Y", "Z"})
    .Outputs({"Out"})
    .SetKernelFn(PD_KERNEL(MyAddNOp))
    .SetInferShapeFn(PD_INFER_SHAPE(
        MyAddNOpInferShape));  // necessary if the op has multiple inputs
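As an alternative to compiling my_add_n.cc into the custom-device plugin's .so, the op can be built on its own with Paddle's C++ extension tooling. The following setup.py is a build-config sketch, assuming PaddlePaddle is installed; the package name `my_add_n_op` is an illustrative assumption, not something from this issue:

```python
# setup.py -- build-config sketch for a standalone custom-op build.
# Assumes PaddlePaddle is installed; "my_add_n_op" is a hypothetical name.
from paddle.utils.cpp_extension import CppExtension, setup

setup(
    name='my_add_n_op',
    ext_modules=CppExtension(sources=['my_add_n.cc']),
)
```

Running `python setup.py install` would then produce an importable module; the run.py in this issue instead loads the plugin .so directly via `load_op_meta_info_and_register_op`.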

run.py

import paddle
import numpy as np

paddle.utils.cpp_extension.extension_utils.load_op_meta_info_and_register_op('/opt/py37env/lib/python3.7/site-packages/paddle_custom_device/libpaddle-custom-npu.so')

@paddle.incubate.passes.ir.RegisterPass
def generate_add_n():
    def pattern(x, y, z):
        return paddle.add(paddle.add(x, y), z)

    def replace(x, y, z):
        return paddle.incubate.passes.ir.PassDesc.OP.my_add_n(X=x, Y=y, Z=z)

    return pattern, replace
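Conceptually, `RegisterPass` describes a subgraph rewrite: wherever the program contains the pattern `add(add(x, y), z)`, the pass substitutes the single fused op `my_add_n(X=x, Y=y, Z=z)`. The toy script below illustrates that rewrite on a plain expression tree; it is a conceptual sketch in pure Python, not Paddle's actual pass machinery, and the `Node`/`rewrite_add_n` names are made up for illustration:

```python
# Toy illustration of a pattern/replace pass: bottom-up, rewrite every
# occurrence of add(add(x, y), z) into a single fused node my_add_n(x, y, z).
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # "add", "my_add_n", or "var"
        self.inputs = list(inputs)

    def __repr__(self):
        if self.op == "var":
            return self.inputs[0]
        return f"{self.op}({', '.join(map(repr, self.inputs))})"

def var(name):
    return Node("var", name)

def rewrite_add_n(node):
    """Rewrite add(add(x, y), z) -> my_add_n(x, y, z), children first."""
    if node.op == "var":
        return node
    node.inputs = [rewrite_add_n(i) for i in node.inputs]
    if node.op == "add" and node.inputs[0].op == "add":
        inner = node.inputs[0]
        return Node("my_add_n", inner.inputs[0], inner.inputs[1], node.inputs[1])
    return node

graph = Node("add", Node("add", var("x"), var("y")), var("z"))
fused = rewrite_add_n(graph)
print(fused)  # my_add_n(x, y, z)
```

A single `add(x, y)` does not match the pattern and is left untouched, which mirrors how the real pass only fires on the full two-add chain.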

@paddle.jit.to_static(input_spec=[
    paddle.static.InputSpec([None, 32], 'float32', 'x'),
    paddle.static.InputSpec([None, 32], 'float32', 'y'),
    paddle.static.InputSpec([None, 32], 'float32', 'z'),
])
def func(x, y, z):
    return x + y + z

model_file = './saved_models/func'
paddle.jit.save(func, model_file)

# inference
config = paddle.inference.Config()
config.set_prog_file(model_file + '.pdmodel')
config.enable_memory_optim()
pass_builder = config.pass_builder()
pass_builder.append_pass('generate_add_n')
print(pass_builder.all_passes())
predictor = paddle.inference.create_predictor(config)

input_names = predictor.get_input_names()
for i, name in enumerate(input_names):
    input_tensor = predictor.get_input_handle(name)
    input_tensor.copy_from_cpu(np.random.randn(2, 32).astype('float32'))

predictor.run()
results = []
output_names = predictor.get_output_names()
for i, name in enumerate(output_names):
    output_tensor = predictor.get_output_handle(name)
    output_data = output_tensor.copy_to_cpu()
    results.append(output_data)
print(results)

GLOG_v=10 python run.py

I0517 18:23:54.884903 94348 operator.cc:750] Place(cpu) Op(my_add_n), inputs:{X[x:float[2, 32]({})(Place(cpu))], Y[y:float[2, 32]({})(Place(cpu))], Z[z:float[2, 32]({})(Place(cpu))]}, outputs:{Out[tmp_1:[0]({})()]}.
I0517 18:23:54.884943 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.884971 94348 operator.cc:2130] op type:my_add_n, expected_kernel_key:{data_type[RAW(runtime decided type)]; data_layout[Undefined(AnyLayout)]; place[Place(cpu)]; library_type[PLAIN]}
I0517 18:23:54.884994 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.885025 94348 custom_operator.cc:424] Custom Operator: InferShape - get input ddim.
I0517 18:23:54.885046 94348 custom_operator.cc:505] Custom Operator: InferShape - calc output ddim.
I0517 18:23:54.885062 94348 custom_operator.cc:530] Custom Operator: InferShape - set output ddim: inplace_map.size() = 0, output_shapes.size() = 1
I0517 18:23:54.885083 94348 custom_operator.cc:1160] Custom Operator: run custom kernel func in lambda.
I0517 18:23:54.885099 94348 custom_operator.cc:64] Custom Operator: Start run KernelFunc.
I0517 18:23:54.885111 94348 custom_operator.cc:68] Custom Operator: input name - X
I0517 18:23:54.885135 94348 custom_operator.cc:68] Custom Operator: input name - Y
I0517 18:23:54.885149 94348 custom_operator.cc:68] Custom Operator: input name - Z
I0517 18:23:54.885154 94348 custom_operator.cc:185] Custom Operator: push outputs into CustomOpKernelContext.
I0517 18:23:54.885172 94348 custom_operator.cc:268] Custom Operator: Run ComputeFunc.
I0517 18:23:54.885187 94348 op_meta_info.cc:202] Custom opertor ConstructInplaceIndex no need to recompute.
I0517 18:23:54.885202 94348 op_meta_info.cc:245] Custom opertor update plain outputs map successfully.
I0517 18:23:54.885227 94348 api.cc:24106] add API kernel key: [CPU, NCHW, float32]
I0517 18:23:54.885249 94348 custom_device_op_list.cc:46] Custom Device Black List: 
I0517 18:23:54.885263 94348 api.cc:24113] add kernel: {"input":["CPU, NCHW, float32","CPU, NCHW, float32"],"output":["CPU, NCHW, float32"],"attribute":[]}
I0517 18:23:54.885291 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.885329 94348 dense_tensor.cc:139] Allocate data with bytes: 256
I0517 18:23:54.885344 94348 stats.h:84] Update peak_value, after update, peak_value = 1024 , current value = 1024
I0517 18:23:54.885383 94348 operator.cc:797] Place(cpu) Op(my_add_n), inputs:{X[x:float[2, 32]({})(Place(cpu))], Y[y:float[2, 32]({})(Place(cpu))], Z[z:float[2, 32]({})(Place(cpu))]}, outputs:{Out[tmp_1:float[2, 32]({})(Place(cpu))]}.
I0517 18:23:54.885411 94348 helper.h:464] after run : [cpu current allocated memory: 0.000976562MB], [cpu current reserved memory: 0MB], [cpu peak allocated memory: 0.000976562MB], [cpu peak reserved memory: 0MB]
I0517 18:23:54.885437 94348 reset_tensor_array.cc:45] Collect 0 arrays
[array([[ 0.58247435,  0.826475  ,  0.6871278 ,  0.4126696 , -0.2559116 ,
         0.65742874,  2.1384077 ,  0.24653143, -0.29847062, -2.2460418 ,
        -1.1594441 , -1.5321505 ,  3.0779753 ,  1.3047652 ,  5.319272  ,
        -3.2988782 ,  2.2765095 ,  0.8565507 , -3.34338   , -1.906771  ,
        -1.3918409 , -0.9324397 , -0.14787453, -0.4925239 , -0.24697244,
        -0.29773337, -2.2361014 , -2.4385114 ,  1.9175045 , -1.7525816 ,
        -2.0501115 ,  2.8168874 ],
       [-0.42592376, -1.5766194 ,  3.0644276 , -1.9179165 ,  2.8835368 ,
         0.28963447,  0.4251368 ,  1.146347  , -0.45447612, -0.9540442 ,
         1.8834621 ,  0.5726208 , -1.1495211 ,  2.1192973 , -0.1619632 ,
         1.1780676 , -3.423511  ,  0.31345803,  2.212157  ,  2.284046  ,
        -1.8597114 , -0.988636  ,  2.5586586 ,  0.6752815 , -0.8432386 ,
        -1.5520113 , -0.93274736,  0.7499885 , -2.2453508 ,  1.2411486 ,
         0.89078593,  0.02444351]], dtype=float32)]
I0517 18:23:54.887071 94348 imperative.cc:2204] Tracer(0x3b7d92b0) set expected place Place(npu:0)
I0517 18:23:54.887138 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0
I0517 18:23:54.889010 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0
I0517 18:23:55.128073 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0

NPU DEMO: #578

qili93 commented May 30, 2023

Closing, as #578 has been merged.

qili93 closed this as completed May 30, 2023