fix KthvalueInferMeta #62801

Merged 1 commit into develop on Mar 20, 2024

Conversation

xingmingyyj (Contributor) commented Mar 17, 2024

PR types

Others

PR changes

Others

Description

Problem description

When the following dynamic-to-static (dy2st) code is executed:

#!/bin/env python
# -*- coding: utf-8 -*-
# encoding=utf-8 vi:ts=4:sw=4:expandtab:ft=python
"""

"""

import numpy as np
import paddle

paddle.seed(33)
np.random.seed(33)

def compare(result, expect, delta=1e-10, rtol=1e-10):
    """
    Comparison helper.
    :param result: actual value
    :param expect: expected value
    :param delta: absolute tolerance
    :param rtol: relative tolerance
    :return:
    """
    if isinstance(expect, paddle.Tensor) or isinstance(expect, np.ndarray):
        if isinstance(result, paddle.Tensor):
            result = result.numpy()
        if isinstance(expect, paddle.Tensor):
            expect = expect.numpy()
        res = np.allclose(result, expect, atol=delta, rtol=rtol, equal_nan=True)
        # on mismatch, print the offending data
        # if res is False:
        #     diff = abs(result - expect)
        #     logging.error("expect is: {}".format(expect))
        #     logging.error("result is: {}".format(result))
        #     logging.error("Output has diff! max diff: {}".format(np.amax(diff)))
        # if result.dtype != expect.dtype:
        #     logging.error(
        #         "Different output data types! res type is: {}, and expect type is: {}".format(
        #             result.dtype, expect.dtype
        #         )
        #     )
        assert res
        assert result.shape == expect.shape
        assert result.dtype == expect.dtype
    elif isinstance(expect, list) or isinstance(expect, tuple):
        for i, element in enumerate(expect):
            if isinstance(result, (np.generic, np.ndarray)) or isinstance(result, paddle.Tensor):
                if i > 0:
                    break
                compare(result, expect[i], delta, rtol)

            else:
                compare(result[i], expect[i], delta, rtol)
    elif isinstance(expect, (bool, int, float)):
        assert expect == result
    else:
        raise Exception("expect is unknown data structure in compare_tool!!!")

def randtool(dtype, low, high, shape):
    """
    np random tools
    """
    if dtype == "int":
        return np.random.randint(low, high, shape)

    elif dtype == "float":
        return low + (high - low) * np.random.random(shape)

def naive_func(a, in_params, func):
    """Function to be converted via dynamic-to-static."""
    layer = eval(func)(**a, **in_params)
    return layer

func = "paddle.kthvalue"

in_tensor = {
    "x": paddle.to_tensor(randtool("float", -1, 1, shape=[5, 3, 4, 4]), dtype="float32"),
}

in_params = {
    "k": 3,
    "axis": 0,
}

paddle.seed(33)
obj = naive_func
dy_out = obj(in_tensor, in_params, func)

paddle.seed(33)
jit_obj = paddle.jit.to_static(obj)
st_out = jit_obj(in_tensor, in_params, func)
print("dy_out is: ", dy_out)
print("st_out is: ", st_out)

paddle.jit.save(jit_obj, path="kthvalue")
print("jit.save succeeded!")

paddle.seed(33)
jit = paddle.jit.load("kthvalue")
print("jit.load succeeded!")

paddle.seed(33)
inputs_key = sorted(in_tensor.keys())
inputs_value = []
for k in inputs_key:
    inputs_value.append(in_tensor[k])
# print('inputs_value is: ', inputs_value)
res = jit(*inputs_value)
print('jit.load res: ', res)

compare(dy_out, res, delta=1e-5, rtol=1e-6)

the indices output comes back as all zeros:

jit.load res:  [Tensor(shape=[3, 4, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[-0.07591926, -0.26354831, -0.17811839, -0.47940060],
         [ 0.60465443, -0.14411892, -0.11287902,  0.21788037],
         [ 0.56884265, -0.02682375,  0.06054568, -0.08810953],
         [ 0.46526086, -0.21340129, -0.29790798, -0.20623052]],

        [[-0.36299756, -0.06327001,  0.66013783,  0.06157992],
         [ 0.56318831,  0.28023970,  0.13557813,  0.00581930],
         [ 0.37604561, -0.56533492, -0.14192767,  0.02036260],
         [ 0.19907297,  0.44599527, -0.13094325,  0.02038879]],

        [[ 0.05510214, -0.33704653, -0.09510893, -0.41952190],
         [ 0.37856624, -0.42806000, -0.37419304, -0.17315947],
         [-0.27487597,  0.12684217,  0.42080349, -0.70518380],
         [-0.05522427,  0.13258798,  0.21601824,  0.56292760]]]), Tensor(shape=[3, 4, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[0.00000000, 0.        , 0.00000000, 0.        ],
         [0.        , 0.        , 0.        , 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]],

        [[0.00000000, 0.        , 0.        , 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]],

        [[0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]]])]

The resulting static graph is as follows:

{
    (%0) = "data(phi_kernel)" () {dtype:(pd_op.DataType)bool,is_persistable:[false],kernel_key:<backend:GPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"data",name:"_jst.0.a.0",op_name:"pd_op.data",place:(pd_op.Place)Place(gpu:0),shape:(pd_op.IntArray)[],stop_gradient:[false]} : () -> gpu_tensor<5x3x4x4xf32>
    (%1, %2) = "kthvalue(phi_kernel)" (%0) {axis:(Int32)0,is_persistable:[false,false],k:(Int32)3,keepdim:false,kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"kthvalue",op_name:"pd_op.kthvalue",stop_gradient:[false,false]} : (gpu_tensor<5x3x4x4xf32>) -> gpu_tensor<3x4x4xf32>, gpu_tensor<3x4x4xf32>
    (%3) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%4) = "scale_(phi_kernel)" (%1, %3) {bias:(Float)0,bias_after_scale:true,is_inplace:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"scale",op_name:"pd_op.scale_",stop_gradient:[false]} : (gpu_tensor<3x4x4xf32>, cpu_tensor<1xf32>) -> gpu_tensor<3x4x4xf32>
    (%5) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%6) = "scale_(phi_kernel)" (%2, %5) {bias:(Float)0,bias_after_scale:true,is_inplace:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"scale",op_name:"pd_op.scale_",stop_gradient:[false]} : (gpu_tensor<3x4x4xf32>, cpu_tensor<1xf32>) -> gpu_tensor<3x4x4xf32>
    () = "builtin.shadow_output" (%4) {output_name:"translated_layer/scale_0.tmp_0"} : (gpu_tensor<3x4x4xf32>) -> 
    () = "builtin.shadow_output" (%6) {output_name:"translated_layer/scale_1.tmp_0"} : (gpu_tensor<3x4x4xf32>) -> 
}

Printing the same Value (%2) from inside the kthvalue op and from inside the scale_ op gives, respectively:

  • kthvalue
3 2 0 0 1 4 2 2 2 0 4 4 1 1 3 1 2 1 3 4 4 4 3 4 4 3 1 2 0 0 1 1 0 0 1 4 1 1 4 3 0 3 0 3 0 1 2 3
  • scale_
4.2039e-45 0 2.8026e-45 0 0 0 0 0 1.4013e-45 0 5.60519e-45 0 2.8026e-45 0 2.8026e-45 0 2.8026e-45 0 0 0 5.60519e-45 0 5.60519e-45 0 1.4013e-45 0 1.4013e-45 0 4.2039e-45 0 1.4013e-45 0 2.8026e-45 0 1.4013e-45 0 4.2039e-45 0 5.60519e-45 0 5.60519e-45 0 5.60519e-45 0 4.2039e-45 0 5.60519e-45 0

Presumably this is caused by a data-type error. In paddle/phi/kernels/gpu/kthvalue_kernel.cu, the indices buffer is allocated as int64_t:

  int64_t* indices_data = dev_ctx.template Alloc<int64_t>(indices);

The problem was traced to InferMeta, which sets the dtype of indices from x (float32) instead of int64:

  indices->set_dims(dims);
  indices->share_lod(x);
  indices->set_dtype(x.dtype());
}
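The symptom above can be reproduced outside Paddle. The sketch below (illustrative NumPy, not Paddle code) assumes the kernel writes genuine int64 indices while downstream ops, trusting the wrong InferMeta dtype, reinterpret the same bytes as float32:

```python
import numpy as np

# Hypothetical reproduction of the symptom: kthvalue writes int64 indices,
# but because KthvalueInferMeta recorded float32, the scale_ kernel reads
# the same memory as float32.
indices_int64 = np.array([3, 2, 0, 0, 1, 4], dtype=np.int64)

# Viewing the int64 buffer as float32 splits each 8-byte index into two
# 4-byte lanes: the low word holds the index bits (a tiny subnormal when
# read as float32), the high word is zero -- matching the alternating
# "4.2039e-45 0 2.8026e-45 0 ..." pattern in the scale_ printout above.
as_float32 = indices_int64.view(np.float32)
print(as_float32)
```

Each small integer maps to a subnormal float32 (1 → 1.4013e-45, the smallest positive subnormal), which is why the values look like "all zeros" when rounded for display.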


paddle-bot bot commented Mar 17, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 17, 2024

paddle-bot bot commented Mar 17, 2024

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

kangguangli (Contributor) commented Mar 18, 2024

There is indeed an InferMeta problem here, but a few more questions:

  1. Why does the same variable (%2) print as the correct int type inside kthvalue but as float inside scale? How did you add these prints? Could the printing method itself be the cause?
  2. The problem appeared when switching from the old IR to the new IR. Why is there no problem under the old IR?
  3. Note that we also compare the dynamic and static graphs before save. Is there any problem before save? Why does the problem only show up after load?

xingmingyyj (Contributor, Author) commented Mar 20, 2024

> There is indeed an InferMeta problem here, but a few more questions:
>
>   1. Why does the same variable (%2) print as the correct int type inside kthvalue but as float inside scale? How did you add these prints? Could the printing method itself be the cause?
>   2. The problem appeared when switching from the old IR to the new IR. Why is there no problem under the old IR?
>   3. Note that we also compare the dynamic and static graphs before save. Is there any problem before save? Why does the problem only show up after load?
  1. The prints were added as follows. Inside the kthvalue kernel, I added this logic:
    auto output_size = indices->numel();
    auto u_size = sizeof(int64_t);
    int64_t* out_data = new int64_t[output_size];
    cudaMemcpy(out_data, indices->data(), u_size * output_size, cudaMemcpyDeviceToHost);
    std::cout << "kthvalue" << std::endl;
    for (int i = 0; i < output_size; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;

since, as can be seen, the kernel allocates this buffer as int64_t:

int64_t* indices_data = dev_ctx.template Alloc<int64_t>(indices);

Inside the scale kernel, I added the following logic:

template <typename T, typename Context>
void ScaleKernel(const Context& dev_ctx,
                 const DenseTensor& x,
                 const Scalar& scale,
                 float bias,
                 bool bias_after_scale,
                 DenseTensor* out) {
    auto output_size = x.numel();
    auto u_size = sizeof(T);
    T* out_data = new T[output_size];
    cudaMemcpy(out_data, x.data(), u_size * output_size, cudaMemcpyDeviceToHost);
    std::cout << "x" << std::endl;
    for (int i = 0; i < output_size; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;

The printed results are exactly those shown in the PR description above, which shows that T here is float32.
2. The reason there is no problem under the old IR may be similar to the mistake described in #58379. Under the old IR, the kernel is chosen according to the dtype of the data actually stored in memory. Partial logs under the old IR:

I0320 01:49:43.968546 215409 interpreter_util.cc:647] Build OpFuncNode from : scale
I0320 01:49:43.968559 215409 interpreter_util.cc:724] scale : [execution_stream, stream_priority, scheduling_priority] = [DefaultStream, 0, 0]
I0320 01:49:43.968565 215409 interpreter_util.cc:732] Start run Place(gpu:0) Op(scale), inputs:{ScaleTensor[], X[kthvalue_0.tmp_1:int64_t[3, 4, 4]({})(Place(gpu:0))]}, outputs:{Out[translated_layer/scale_1.tmp_0:[]({})()]}.
I0320 01:49:43.968577 215409 interpreter_util.cc:749] OP is not null
I0320 01:49:43.968580 215409 interpreter_util.cc:752] get op_with_kernel
I0320 01:49:43.968582 215409 interpreter_util.cc:757] get RuntimeContext
I0320 01:49:43.968585 215409 context_pool.cc:62] DeviceContextPool Get: Place(gpu:0)
I0320 01:49:43.968595 215409 interpreter_util.cc:786] expected_kernel_key : {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0320 01:49:43.968607 215409 operator.cc:2293] Kernel Signature - name: scale; inputs: X; attributes: scale, bias, bias_after_scale; outputs: Out
I0320 01:49:43.968618 215409 operator.cc:2279] op type:scale, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0320 01:49:43.968626 215409 operator.cc:2304] Static graph mode ChoosePhiKernel - kernel name: scale | kernel key: (GPU, Undefined(AnyLayout), int64) | kernel: {"input":["GPU, NCHW, int64"],"output":["GPU, NCHW, int64"],"attribute":["Scalar","float","bool"]}

It can be seen that the kernel choice there is correct.
Partial logs under the new IR:

I0320 02:33:27.534058 218651 pd_op_to_kernel_pass.cc:2702] op name pd_op.scale
I0320 02:33:27.534071 218651 pd_op_to_kernel_pass.cc:971] Begin to infer kernel key from op_info_parser(defined by yaml info)
I0320 02:33:27.534077 218651 pd_op_to_kernel_pass.cc:975] Infer kernel data_type: [float32] from yaml info
I0320 02:33:27.534080 218651 pd_op_to_kernel_pass.cc:979] Infer kernel backend: [Undefined] from yaml info
I0320 02:33:27.534118 218651 pd_op_to_kernel_pass.cc:1005] Begin to infer kernel key from op operands

From scale's OpRunTimeInfo it can be seen that the kernel is chosen directly from x's recorded dtype.
3. Both runs before save execute as dynamic graphs.
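The contrast between the two selection strategies can be sketched as follows (illustrative Python only, not Paddle's actual dispatch code; `pick_kernel_old_ir` and `pick_kernel_new_ir` are hypothetical names):

```python
# Illustrative sketch: why a wrong InferMeta dtype is harmless under the
# old IR but fatal under the new IR.

class Tensor:
    def __init__(self, runtime_dtype, meta_dtype):
        self.runtime_dtype = runtime_dtype  # dtype of the bytes in memory
        self.meta_dtype = meta_dtype        # dtype recorded by InferMeta

def pick_kernel_old_ir(x):
    # Old IR: kernel key comes from the data actually held in memory.
    return f"scale<{x.runtime_dtype}>"

def pick_kernel_new_ir(x):
    # New IR: kernel key is inferred statically from recorded metadata.
    return f"scale<{x.meta_dtype}>"

# kthvalue's indices: really int64, but InferMeta wrongly recorded float32.
indices = Tensor(runtime_dtype="int64", meta_dtype="float32")
print(pick_kernel_old_ir(indices))  # scale<int64>   -- correct
print(pick_kernel_new_ir(indices))  # scale<float32> -- reinterprets the bytes
```

Once the metadata says int64, the new IR's static inference picks the right kernel too, which is what fixing KthvalueInferMeta achieves.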

zoooo0820 (Contributor) left a comment

LGTM

@kangguangli kangguangli merged commit cb649c0 into PaddlePaddle:develop Mar 20, 2024
30 checks passed
@xingmingyyj xingmingyyj deleted the fix_kthvalue branch March 20, 2024 09:10