
[FAQ] Custom PASS + custom OP for inference #564

Closed
ronny1996 opened this issue May 17, 2023 · 1 comment

ronny1996 commented May 17, 2023

Custom PASS:
PaddlePaddle/Paddle#35602
PaddlePaddle/Paddle#36095

Custom OP:
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/custom_op/index_cn.html#zidingyisuanzi

my_add_n.cc // place it in the plugin and compile everything into a single .so file

// Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include <iostream>
#include <vector>

#include "paddle/extension.h"

std::vector<paddle::Tensor> MyAddNOp(const paddle::Tensor& x,
                                     const paddle::Tensor& y,
                                     const paddle::Tensor& z) {
  // Sum all three inputs, matching the fused pattern add(add(x, y), z).
  return {paddle::add(paddle::add(x, y), z)};
}

std::vector<std::vector<int64_t>> MyAddNOpInferShape(
    const std::vector<int64_t>& x_shape,
    const std::vector<int64_t>& y_shape,
    const std::vector<int64_t>& z_shape) {
  return {x_shape};
}

PD_BUILD_OP(my_add_n)
    .Inputs({"X", "Y", "Z"})
    .Outputs({"Out"})
    .SetKernelFn(PD_KERNEL(MyAddNOp))
    .SetInferShapeFn(PD_INFER_SHAPE(
        MyAddNOpInferShape));  // necessary if the op has multiple inputs
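As an alternative to compiling my_add_n.cc into the custom-device plugin's .so, the op can be built on its own with Paddle's C++ extension tooling. The following setup.py is a build-config sketch, assuming PaddlePaddle is installed; the package name `my_add_n_op` is an illustrative assumption, not something from this issue:

```python
# setup.py -- build-config sketch for a standalone custom-op build.
# Assumes PaddlePaddle is installed; "my_add_n_op" is a hypothetical name.
from paddle.utils.cpp_extension import CppExtension, setup

setup(
    name='my_add_n_op',
    ext_modules=CppExtension(sources=['my_add_n.cc']),
)
```

Running `python setup.py install` would then produce an importable module; the run.py in this issue instead loads the plugin .so directly via `load_op_meta_info_and_register_op`.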

run.py

import paddle
import numpy as np

paddle.utils.cpp_extension.extension_utils.load_op_meta_info_and_register_op('/opt/py37env/lib/python3.7/site-packages/paddle_custom_device/libpaddle-custom-npu.so')

@paddle.incubate.passes.ir.RegisterPass
def generate_add_n():
    def pattern(x, y, z):
        return paddle.add(paddle.add(x, y), z)

    def replace(x, y, z):
        return paddle.incubate.passes.ir.PassDesc.OP.my_add_n(X=x, Y=y, Z=z)

    return pattern, replace
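Conceptually, `RegisterPass` describes a subgraph rewrite: wherever the program contains the pattern `add(add(x, y), z)`, the pass substitutes the single fused op `my_add_n(X=x, Y=y, Z=z)`. The toy script below illustrates that rewrite on a plain expression tree; it is a conceptual sketch in pure Python, not Paddle's actual pass machinery, and the `Node`/`rewrite_add_n` names are made up for illustration:

```python
# Toy illustration of a pattern/replace pass: bottom-up, rewrite every
# occurrence of add(add(x, y), z) into a single fused node my_add_n(x, y, z).
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # "add", "my_add_n", or "var"
        self.inputs = list(inputs)

    def __repr__(self):
        if self.op == "var":
            return self.inputs[0]
        return f"{self.op}({', '.join(map(repr, self.inputs))})"

def var(name):
    return Node("var", name)

def rewrite_add_n(node):
    """Rewrite add(add(x, y), z) -> my_add_n(x, y, z), children first."""
    if node.op == "var":
        return node
    node.inputs = [rewrite_add_n(i) for i in node.inputs]
    if node.op == "add" and node.inputs[0].op == "add":
        inner = node.inputs[0]
        return Node("my_add_n", inner.inputs[0], inner.inputs[1], node.inputs[1])
    return node

graph = Node("add", Node("add", var("x"), var("y")), var("z"))
fused = rewrite_add_n(graph)
print(fused)  # my_add_n(x, y, z)
```

A single `add(x, y)` does not match the pattern and is left untouched, which mirrors how the real pass only fires on the full two-add chain.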

@paddle.jit.to_static(input_spec=[
    paddle.static.InputSpec([None, 32], 'float32', 'x'),
    paddle.static.InputSpec([None, 32], 'float32', 'y'),
    paddle.static.InputSpec([None, 32], 'float32', 'z'),
])
def func(x, y, z):
    return x + y + z

model_file = './saved_models/func'
paddle.jit.save(func, model_file)

# inference
config = paddle.inference.Config()
config.set_prog_file(model_file + '.pdmodel')
config.enable_memory_optim()
pass_builder = config.pass_builder()
pass_builder.append_pass('generate_add_n')
print(pass_builder.all_passes())
predictor = paddle.inference.create_predictor(config)

input_names = predictor.get_input_names()
for i, name in enumerate(input_names):
    input_tensor = predictor.get_input_handle(name)
    input_tensor.copy_from_cpu(np.random.randn(2, 32).astype('float32'))

predictor.run()
results = []
output_names = predictor.get_output_names()
for i, name in enumerate(output_names):
    output_tensor = predictor.get_output_handle(name)
    output_data = output_tensor.copy_to_cpu()
    results.append(output_data)
print(results)

GLOG_v=10 python run.py

I0517 18:23:54.884903 94348 operator.cc:750] Place(cpu) Op(my_add_n), inputs:{X[x:float[2, 32]({})(Place(cpu))], Y[y:float[2, 32]({})(Place(cpu))], Z[z:float[2, 32]({})(Place(cpu))]}, outputs:{Out[tmp_1:[0]({})()]}.
I0517 18:23:54.884943 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.884971 94348 operator.cc:2130] op type:my_add_n, expected_kernel_key:{data_type[RAW(runtime decided type)]; data_layout[Undefined(AnyLayout)]; place[Place(cpu)]; library_type[PLAIN]}
I0517 18:23:54.884994 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.885025 94348 custom_operator.cc:424] Custom Operator: InferShape - get input ddim.
I0517 18:23:54.885046 94348 custom_operator.cc:505] Custom Operator: InferShape - calc output ddim.
I0517 18:23:54.885062 94348 custom_operator.cc:530] Custom Operator: InferShape - set output ddim: inplace_map.size() = 0, output_shapes.size() = 1
I0517 18:23:54.885083 94348 custom_operator.cc:1160] Custom Operator: run custom kernel func in lambda.
I0517 18:23:54.885099 94348 custom_operator.cc:64] Custom Operator: Start run KernelFunc.
I0517 18:23:54.885111 94348 custom_operator.cc:68] Custom Operator: input name - X
I0517 18:23:54.885135 94348 custom_operator.cc:68] Custom Operator: input name - Y
I0517 18:23:54.885149 94348 custom_operator.cc:68] Custom Operator: input name - Z
I0517 18:23:54.885154 94348 custom_operator.cc:185] Custom Operator: push outputs into CustomOpKernelContext.
I0517 18:23:54.885172 94348 custom_operator.cc:268] Custom Operator: Run ComputeFunc.
I0517 18:23:54.885187 94348 op_meta_info.cc:202] Custom opertor ConstructInplaceIndex no need to recompute.
I0517 18:23:54.885202 94348 op_meta_info.cc:245] Custom opertor update plain outputs map successfully.
I0517 18:23:54.885227 94348 api.cc:24106] add API kernel key: [CPU, NCHW, float32]
I0517 18:23:54.885249 94348 custom_device_op_list.cc:46] Custom Device Black List: 
I0517 18:23:54.885263 94348 api.cc:24113] add kernel: {"input":["CPU, NCHW, float32","CPU, NCHW, float32"],"output":["CPU, NCHW, float32"],"attribute":[]}
I0517 18:23:54.885291 94348 context_pool.cc:62] DeviceContextPool Get: Place(cpu)
I0517 18:23:54.885329 94348 dense_tensor.cc:139] Allocate data with bytes: 256
I0517 18:23:54.885344 94348 stats.h:84] Update peak_value, after update, peak_value = 1024 , current value = 1024
I0517 18:23:54.885383 94348 operator.cc:797] Place(cpu) Op(my_add_n), inputs:{X[x:float[2, 32]({})(Place(cpu))], Y[y:float[2, 32]({})(Place(cpu))], Z[z:float[2, 32]({})(Place(cpu))]}, outputs:{Out[tmp_1:float[2, 32]({})(Place(cpu))]}.
I0517 18:23:54.885411 94348 helper.h:464] after run : [cpu current allocated memory: 0.000976562MB], [cpu current reserved memory: 0MB], [cpu peak allocated memory: 0.000976562MB], [cpu peak reserved memory: 0MB]
I0517 18:23:54.885437 94348 reset_tensor_array.cc:45] Collect 0 arrays
[array([[ 0.58247435,  0.826475  ,  0.6871278 ,  0.4126696 , -0.2559116 ,
         0.65742874,  2.1384077 ,  0.24653143, -0.29847062, -2.2460418 ,
        -1.1594441 , -1.5321505 ,  3.0779753 ,  1.3047652 ,  5.319272  ,
        -3.2988782 ,  2.2765095 ,  0.8565507 , -3.34338   , -1.906771  ,
        -1.3918409 , -0.9324397 , -0.14787453, -0.4925239 , -0.24697244,
        -0.29773337, -2.2361014 , -2.4385114 ,  1.9175045 , -1.7525816 ,
        -2.0501115 ,  2.8168874 ],
       [-0.42592376, -1.5766194 ,  3.0644276 , -1.9179165 ,  2.8835368 ,
         0.28963447,  0.4251368 ,  1.146347  , -0.45447612, -0.9540442 ,
         1.8834621 ,  0.5726208 , -1.1495211 ,  2.1192973 , -0.1619632 ,
         1.1780676 , -3.423511  ,  0.31345803,  2.212157  ,  2.284046  ,
        -1.8597114 , -0.988636  ,  2.5586586 ,  0.6752815 , -0.8432386 ,
        -1.5520113 , -0.93274736,  0.7499885 , -2.2453508 ,  1.2411486 ,
         0.89078593,  0.02444351]], dtype=float32)]
I0517 18:23:54.887071 94348 imperative.cc:2204] Tracer(0x3b7d92b0) set expected place Place(npu:0)
I0517 18:23:54.887138 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0
I0517 18:23:54.889010 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0
I0517 18:23:55.128073 94348 mmap_allocator.cc:321] PID: 94348, MemoryMapFdSet: set size - 0

NPU DEMO: #578

qili93 commented May 30, 2023

Closing, as #578 has been merged.

qili93 closed this as completed May 30, 2023