fix KthvalueInferMeta #62801

Merged 1 commit into develop on Mar 20, 2024

Conversation

xingmingyyj (Contributor) commented Mar 17, 2024

PR types

Others

PR changes

Others

Description

Problem description

When the following dynamic-to-static (dy2st) code is executed:

#!/bin/env python
# -*- coding: utf-8 -*-
# encoding=utf-8 vi:ts=4:sw=4:expandtab:ft=python
"""

"""

import numpy as np
import paddle

paddle.seed(33)
np.random.seed(33)

def compare(result, expect, delta=1e-10, rtol=1e-10):
    """
    Comparison helper.
    :param result: actual value
    :param expect: expected value
    :param delta: absolute tolerance
    :param rtol: relative tolerance
    :return:
    """
    if isinstance(expect, paddle.Tensor) or isinstance(expect, np.ndarray):
        if isinstance(result, paddle.Tensor):
            result = result.numpy()
        if isinstance(expect, paddle.Tensor):
            expect = expect.numpy()
        res = np.allclose(result, expect, atol=delta, rtol=rtol, equal_nan=True)
        # on mismatch, print the offending data
        # if res is False:
        #     diff = abs(result - expect)
        #     logging.error("expect is: {}".format(expect))
        #     logging.error("result is: {}".format(result))
        #     logging.error("Output has diff! max diff: {}".format(np.amax(diff)))
        # if result.dtype != expect.dtype:
        #     logging.error(
        #         "Different output data types! res type is: {}, and expect type is: {}".format(
        #             result.dtype, expect.dtype
        #         )
        #     )
        assert res
        assert result.shape == expect.shape
        assert result.dtype == expect.dtype
    elif isinstance(expect, list) or isinstance(expect, tuple):
        for i, element in enumerate(expect):
            if isinstance(result, (np.generic, np.ndarray)) or isinstance(result, paddle.Tensor):
                if i > 0:
                    break
                compare(result, expect[i], delta, rtol)

            else:
                compare(result[i], expect[i], delta, rtol)
    elif isinstance(expect, (bool, int, float)):
        assert expect == result
    else:
        raise Exception("expect is unknown data structure in compare_tool!!!")

def randtool(dtype, low, high, shape):
    """
    np random tools
    """
    if dtype == "int":
        return np.random.randint(low, high, shape)

    elif dtype == "float":
        return low + (high - low) * np.random.random(shape)

def naive_func(a, in_params, func):
    """Function to be converted via dynamic-to-static."""
    layer = eval(func)(**a, **in_params)
    return layer

func = "paddle.kthvalue"

in_tensor = {
    "x": paddle.to_tensor(randtool("float", -1, 1, shape=[5, 3, 4, 4]), dtype="float32"),
}

in_params = {
    "k": 3,
    "axis": 0,
}

paddle.seed(33)
obj = naive_func
dy_out = obj(in_tensor, in_params, func)

paddle.seed(33)
jit_obj = paddle.jit.to_static(obj)
st_out = jit_obj(in_tensor, in_params, func)
print("dy_out is: ", dy_out)
print("st_out is: ", st_out)

paddle.jit.save(jit_obj, path="kthvalue")
print("jit.save succeeded!")

paddle.seed(33)
jit = paddle.jit.load("kthvalue")
print("jit.load succeeded!")

paddle.seed(33)
inputs_key = sorted(in_tensor.keys())
inputs_value = []
for k in inputs_key:
    inputs_value.append(in_tensor[k])
# print('inputs_value is: ', inputs_value)
res = jit(*inputs_value)
print('jit.load res: ', res)

compare(dy_out, res, delta=1e-5, rtol=1e-6)

the indices output comes back as all zeros:

jit.load res:  [Tensor(shape=[3, 4, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[-0.07591926, -0.26354831, -0.17811839, -0.47940060],
         [ 0.60465443, -0.14411892, -0.11287902,  0.21788037],
         [ 0.56884265, -0.02682375,  0.06054568, -0.08810953],
         [ 0.46526086, -0.21340129, -0.29790798, -0.20623052]],

        [[-0.36299756, -0.06327001,  0.66013783,  0.06157992],
         [ 0.56318831,  0.28023970,  0.13557813,  0.00581930],
         [ 0.37604561, -0.56533492, -0.14192767,  0.02036260],
         [ 0.19907297,  0.44599527, -0.13094325,  0.02038879]],

        [[ 0.05510214, -0.33704653, -0.09510893, -0.41952190],
         [ 0.37856624, -0.42806000, -0.37419304, -0.17315947],
         [-0.27487597,  0.12684217,  0.42080349, -0.70518380],
         [-0.05522427,  0.13258798,  0.21601824,  0.56292760]]]), Tensor(shape=[3, 4, 4], dtype=float32, place=Place(gpu:0), stop_gradient=True,
       [[[0.00000000, 0.        , 0.00000000, 0.        ],
         [0.        , 0.        , 0.        , 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]],

        [[0.00000000, 0.        , 0.        , 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]],

        [[0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ],
         [0.00000000, 0.        , 0.00000000, 0.        ]]])]

The resulting static graph is as follows:

{
    (%0) = "data(phi_kernel)" () {dtype:(pd_op.DataType)bool,is_persistable:[false],kernel_key:<backend:GPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"data",name:"_jst.0.a.0",op_name:"pd_op.data",place:(pd_op.Place)Place(gpu:0),shape:(pd_op.IntArray)[],stop_gradient:[false]} : () -> gpu_tensor<5x3x4x4xf32>
    (%1, %2) = "kthvalue(phi_kernel)" (%0) {axis:(Int32)0,is_persistable:[false,false],k:(Int32)3,keepdim:false,kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"kthvalue",op_name:"pd_op.kthvalue",stop_gradient:[false,false]} : (gpu_tensor<5x3x4x4xf32>) -> gpu_tensor<3x4x4xf32>, gpu_tensor<3x4x4xf32>
    (%3) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%4) = "scale_(phi_kernel)" (%1, %3) {bias:(Float)0,bias_after_scale:true,is_inplace:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"scale",op_name:"pd_op.scale_",stop_gradient:[false]} : (gpu_tensor<3x4x4xf32>, cpu_tensor<1xf32>) -> gpu_tensor<3x4x4xf32>
    (%5) = "full(phi_kernel)" () {dtype:(pd_op.DataType)float32,kernel_key:<backend:CPU|layout:Undefined(AnyLayout)|dtype:float32>,kernel_name:"full",op_name:"pd_op.full",place:(pd_op.Place)Place(cpu),shape:(pd_op.IntArray)[1],stop_gradient:[true],value:(Float)1} : () -> cpu_tensor<1xf32>
    (%6) = "scale_(phi_kernel)" (%2, %5) {bias:(Float)0,bias_after_scale:true,is_inplace:true,is_persistable:[false],kernel_key:<backend:GPU|layout:NCHW|dtype:float32>,kernel_name:"scale",op_name:"pd_op.scale_",stop_gradient:[false]} : (gpu_tensor<3x4x4xf32>, cpu_tensor<1xf32>) -> gpu_tensor<3x4x4xf32>
    () = "builtin.shadow_output" (%4) {output_name:"translated_layer/scale_0.tmp_0"} : (gpu_tensor<3x4x4xf32>) -> 
    () = "builtin.shadow_output" (%6) {output_name:"translated_layer/scale_1.tmp_0"} : (gpu_tensor<3x4x4xf32>) -> 
}

Printing the same Value (%2) from inside the kthvalue op and from inside the scale_ op gives, respectively:

  • kthvalue
3 2 0 0 1 4 2 2 2 0 4 4 1 1 3 1 2 1 3 4 4 4 3 4 4 3 1 2 0 0 1 1 0 0 1 4 1 1 4 3 0 3 0 3 0 1 2 3
  • scale_
4.2039e-45 0 2.8026e-45 0 0 0 0 0 1.4013e-45 0 5.60519e-45 0 2.8026e-45 0 2.8026e-45 0 2.8026e-45 0 0 0 5.60519e-45 0 5.60519e-45 0 1.4013e-45 0 1.4013e-45 0 4.2039e-45 0 1.4013e-45 0 2.8026e-45 0 1.4013e-45 0 4.2039e-45 0 5.60519e-45 0 5.60519e-45 0 5.60519e-45 0 4.2039e-45 0 5.60519e-45 0

Presumably this is caused by a data-type error. In paddle/phi/kernels/gpu/kthvalue_kernel.cu, the indices buffer is allocated as int64_t:

  int64_t* indices_data = dev_ctx.template Alloc<int64_t>(indices);

The problem was traced to InferMeta, which sets the dtype of indices from x (float32) instead of int64:

  indices->set_dims(dims);
  indices->share_lod(x);
  indices->set_dtype(x.dtype());
}
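The symptom above can be reproduced outside Paddle. The sketch below (illustrative NumPy, not Paddle code) assumes the kernel writes genuine int64 indices while downstream ops, trusting the wrong InferMeta dtype, reinterpret the same bytes as float32:

```python
import numpy as np

# Hypothetical reproduction of the symptom: kthvalue writes int64 indices,
# but because KthvalueInferMeta recorded float32, the scale_ kernel reads
# the same memory as float32.
indices_int64 = np.array([3, 2, 0, 0, 1, 4], dtype=np.int64)

# Viewing the int64 buffer as float32 splits each 8-byte index into two
# 4-byte lanes: the low word holds the index bits (a tiny subnormal when
# read as float32), the high word is zero -- matching the alternating
# "4.2039e-45 0 2.8026e-45 0 ..." pattern in the scale_ printout above.
as_float32 = indices_int64.view(np.float32)
print(as_float32)
```

Each small integer maps to a subnormal float32 (1 → 1.4013e-45, the smallest positive subnormal), which is why the values look like "all zeros" when rounded for display.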


paddle-bot bot commented Mar 17, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 17, 2024

paddle-bot bot commented Mar 17, 2024

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

kangguangli (Contributor) commented Mar 18, 2024

There is indeed an InferMeta problem here, but a few more questions:

  1. Why does the same variable (%2) print as the correct int type inside kthvalue but as float inside scale? How did you add these prints? Could the printing method itself be the cause?
  2. The problem appeared when switching from the old IR to the new IR. Why is there no problem under the old IR?
  3. Note that we also compare the dynamic and static graphs before save. Is there any problem before save? Why does the problem only show up after load?

xingmingyyj (Contributor, Author) commented Mar 20, 2024

> There is indeed an InferMeta problem here, but a few more questions:
>
>   1. Why does the same variable (%2) print as the correct int type inside kthvalue but as float inside scale? How did you add these prints? Could the printing method itself be the cause?
>   2. The problem appeared when switching from the old IR to the new IR. Why is there no problem under the old IR?
>   3. Note that we also compare the dynamic and static graphs before save. Is there any problem before save? Why does the problem only show up after load?
  1. The prints were added as follows. Inside the kthvalue kernel, I added this logic:
    auto output_size = indices->numel();
    auto u_size = sizeof(int64_t);
    int64_t* out_data = new int64_t[output_size];
    cudaMemcpy(out_data, indices->data(), u_size * output_size, cudaMemcpyDeviceToHost);
    std::cout << "kthvalue" << std::endl;
    for (int i = 0; i < output_size; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;

since, as can be seen, the kernel allocates this buffer as int64_t:

int64_t* indices_data = dev_ctx.template Alloc<int64_t>(indices);

Inside the scale kernel, I added the following logic:

template <typename T, typename Context>
void ScaleKernel(const Context& dev_ctx,
                 const DenseTensor& x,
                 const Scalar& scale,
                 float bias,
                 bool bias_after_scale,
                 DenseTensor* out) {
    auto output_size = x.numel();
    auto u_size = sizeof(T);
    T* out_data = new T[output_size];
    cudaMemcpy(out_data, x.data(), u_size * output_size, cudaMemcpyDeviceToHost);
    std::cout << "x" << std::endl;
    for (int i = 0; i < output_size; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;

The printed results are exactly those shown in the PR description above, which shows that T here is float32.
2. The reason there is no problem under the old IR may be similar to the mistake described in #58379. Under the old IR, the kernel is chosen according to the dtype of the data actually stored in memory. Partial logs under the old IR:

I0320 01:49:43.968546 215409 interpreter_util.cc:647] Build OpFuncNode from : scale
I0320 01:49:43.968559 215409 interpreter_util.cc:724] scale : [execution_stream, stream_priority, scheduling_priority] = [DefaultStream, 0, 0]
I0320 01:49:43.968565 215409 interpreter_util.cc:732] Start run Place(gpu:0) Op(scale), inputs:{ScaleTensor[], X[kthvalue_0.tmp_1:int64_t[3, 4, 4]({})(Place(gpu:0))]}, outputs:{Out[translated_layer/scale_1.tmp_0:[]({})()]}.
I0320 01:49:43.968577 215409 interpreter_util.cc:749] OP is not null
I0320 01:49:43.968580 215409 interpreter_util.cc:752] get op_with_kernel
I0320 01:49:43.968582 215409 interpreter_util.cc:757] get RuntimeContext
I0320 01:49:43.968585 215409 context_pool.cc:62] DeviceContextPool Get: Place(gpu:0)
I0320 01:49:43.968595 215409 interpreter_util.cc:786] expected_kernel_key : {data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0320 01:49:43.968607 215409 operator.cc:2293] Kernel Signature - name: scale; inputs: X; attributes: scale, bias, bias_after_scale; outputs: Out
I0320 01:49:43.968618 215409 operator.cc:2279] op type:scale, expected_kernel_key:{data_type[int64_t]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[PLAIN]}
I0320 01:49:43.968626 215409 operator.cc:2304] Static graph mode ChoosePhiKernel - kernel name: scale | kernel key: (GPU, Undefined(AnyLayout), int64) | kernel: {"input":["GPU, NCHW, int64"],"output":["GPU, NCHW, int64"],"attribute":["Scalar","float","bool"]}

It can be seen that the kernel choice there is correct.
Partial logs under the new IR:

I0320 02:33:27.534058 218651 pd_op_to_kernel_pass.cc:2702] op name pd_op.scale
I0320 02:33:27.534071 218651 pd_op_to_kernel_pass.cc:971] Begin to infer kernel key from op_info_parser(defined by yaml info)
I0320 02:33:27.534077 218651 pd_op_to_kernel_pass.cc:975] Infer kernel data_type: [float32] from yaml info
I0320 02:33:27.534080 218651 pd_op_to_kernel_pass.cc:979] Infer kernel backend: [Undefined] from yaml info
I0320 02:33:27.534118 218651 pd_op_to_kernel_pass.cc:1005] Begin to infer kernel key from op operands

From scale's OpRunTimeInfo it can be seen that the kernel is chosen directly from x's recorded dtype.
3. Both runs before save execute as dynamic graphs.
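The contrast between the two selection strategies can be sketched as follows (illustrative Python only, not Paddle's actual dispatch code; `pick_kernel_old_ir` and `pick_kernel_new_ir` are hypothetical names):

```python
# Illustrative sketch: why a wrong InferMeta dtype is harmless under the
# old IR but fatal under the new IR.

class Tensor:
    def __init__(self, runtime_dtype, meta_dtype):
        self.runtime_dtype = runtime_dtype  # dtype of the bytes in memory
        self.meta_dtype = meta_dtype        # dtype recorded by InferMeta

def pick_kernel_old_ir(x):
    # Old IR: kernel key comes from the data actually held in memory.
    return f"scale<{x.runtime_dtype}>"

def pick_kernel_new_ir(x):
    # New IR: kernel key is inferred statically from recorded metadata.
    return f"scale<{x.meta_dtype}>"

# kthvalue's indices: really int64, but InferMeta wrongly recorded float32.
indices = Tensor(runtime_dtype="int64", meta_dtype="float32")
print(pick_kernel_old_ir(indices))  # scale<int64>   -- correct
print(pick_kernel_new_ir(indices))  # scale<float32> -- reinterprets the bytes
```

Once the metadata says int64, the new IR's static inference picks the right kernel too, which is what fixing KthvalueInferMeta achieves.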

zoooo0820 (Contributor) left a comment

LGTM

@kangguangli kangguangli merged commit cb649c0 into PaddlePaddle:develop Mar 20, 2024
30 checks passed
@xingmingyyj xingmingyyj deleted the fix_kthvalue branch March 20, 2024 09:10