
[PIR API adaptor No.143, 144] Migrate margin_cross_entropy, masked_multihead_attention #58762

Merged
merged 4 commits into PaddlePaddle:develop on Dec 26, 2023

Conversation

Tomoko-hjf
Contributor

PR types

Others

PR changes

Others

Description

Full rollout of the PIR API upgrade:
paddle.nn.functional.margin_cross_entropy migrated to PIR, with its unit tests updated; test coverage: 6/6
paddle.incubate.nn.functional.masked_multihead_attention migrated to PIR, with its unit tests updated; test coverage: 2/2

@paddle-bot added the contributor (External developers) label Nov 7, 2023
@luotao1 added the HappyOpenSource (Happy Open Source activity issues & PRs) label Nov 8, 2023

paddle-ci-bot commented Nov 15, 2023

Sorry to inform you that b0fc204's CIs passed more than 7 days ago. To prevent PR conflicts, you need to re-run all CIs manually.

@MarioLulab
Contributor

Sorry, since this PR was not linked in issue #58067, I kept forgetting to review it 😭 I'll review it today. Jianfei, do you still have time to push this PR toward merging?

Contributor

@MarioLulab left a comment


LGTM
Please merge the latest branch to resolve the conflicts~

@Tomoko-hjf
Contributor Author

LGTM. Please merge the latest branch to resolve the conflicts~

Updated~

Contributor

@MarioLulab left a comment


nice work~

Review threads (outdated, resolved) on:
test/legacy_test/test_masked_multihead_attention_op.py (1 thread)
test/legacy_test/test_margin_cross_entropy_op.py (4 threads)
@MarioLulab
Contributor

MarioLulab commented Dec 21, 2023

@0x45f @YuanRisheng a problem has come up, could you help take a look?

Running the unit tests locally fails:

The failing test is TestLayerNormStaticInt8Op in test/legacy_test/test_masked_multihead_attention_op.py

--- Running PIR pass [inplace_pass]
I1221 07:52:11.319885 676665 pass.cc:38] --- detected [0] subgraphs!
I1221 07:52:13.244624 676665 pir_interpreter.cc:1264] New Executor is Running ...
W1221 07:52:13.245051 676665 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1221 07:52:13.279644 676665 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
1   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, std::vector<phi::DenseTensor, std::allocator<phi::DenseTensor> > const&, bool, bool)
2   paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, std::vector<phi::DenseTensor, std::allocator<phi::DenseTensor> > const&, bool, bool)
3   paddle::framework::PirInterpreter::BuildInstruction()
4   paddle::framework::PhiKernelInstruction::PhiKernelInstruction(unsigned long, phi::Place const&, pir::Operation*, paddle::framework::ValueExecutionInfo const*)
5   void paddle::framework::BuildPhiContext<phi::InferMetaContext, phi::MetaTensor, phi::MetaTensor, paddle::small_vector<phi::MetaTensor, 15u>, paddle::small_vector<phi::MetaTensor, 15u>, false>(pir::Operation*, paddle::framework::ValueExecutionInfo const&, paddle::dialect::OpYamlInfoParser const&, phi::InferMetaContext*)
6   phi::DenseTensor const& paddle::framework::Variable::Get<phi::DenseTensor>() const

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1703145133 (unix time) try "date -d @1703145133" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 676665 (TID 0x7f332e345740) from PID 0 ***]

Reproduction environment

CUDA Version: 12.0
python==3.8

Reproduction code

# bugReproduce.py
import numpy as np

import paddle
from paddle.incubate.nn.functional import masked_multihead_attention

np.random.seed(0)
bsz = 2
cache_bsz = 2
num_head = 32
dim_head = 128
beam_size = 1
max_seq_len = 33
sequence_length = 32

x = np.random.uniform(
    -0.05, 0.05, [bsz, 3, num_head, dim_head]
)
bias = np.random.uniform(
    -0.05, 0.05, [3, num_head, dim_head]
)
src_mask = np.zeros([bsz, 1, 1, sequence_length + 1])

cum_offsets = None
sequence_lengths = None
rotary_tensor = None
beam_cache_offset = None

cache_kv_out = np.random.uniform(
    -0.05,
    0.05,
    [
        2,
        cache_bsz,
        num_head,
        sequence_length,
        dim_head,
    ],
)
numpy_ones = np.zeros(
    [2, cache_bsz, num_head, 1, dim_head]
)
cache_kv_mmha_out = np.concatenate(
    (cache_kv_out, numpy_ones), axis=3
)

qkv_out_scale = None
out_shift = None
out_smooth = None

seq_len = 1
rotary_emb_dims = 0
use_neox_rotary_style = False

out_scale = -1
quant_round_type = 1
quant_max_bound = 127
quant_min_bound = -127
place = paddle.CUDAPlace(0)

def check_main(
    x,
    bias,
    src_mask,
    cache_kv_out,
    cache_kv_mmha_out,
    qkv_out_scale,
    out_scale,
    dtype,
):
    paddle.enable_static()
    with paddle.static.program_guard(paddle.static.Program()):
        x_static = paddle.static.data(
            name="x_static",
            shape=[bsz, 3 * num_head * dim_head],
            dtype=dtype,
        )
        bias_static = paddle.static.data(
            name="bias_static",
            shape=[3, num_head, dim_head],
            dtype=dtype,
        )
        src_mask_static = paddle.static.data(
            name="src_mask_static",
            shape=[bsz, 1, 1, sequence_length + 1],
            dtype=dtype,
        )
        cache_kv_mmha_out_static = paddle.static.data(
            name="cache_kv_mmha_out_static",
            shape=[
                2,
                cache_bsz,
                num_head,
                sequence_length + 1,
                dim_head,
            ],
            dtype=dtype,
        )

        outs = masked_multihead_attention(
            x_static,
            cache_kv_mmha_out_static,
            bias_static,
            src_mask_static,
            None,  # cum_offsets
            None,  # sequence_lengths
            None,  # rotary_tensor
            None,  # beam_cache_offset -- None here is what triggers the crash
            None,  # qkv_out_scale
            None,  # out_shift
            None,  # out_smooth
            32,  # seq_len
            0,  # rotary_emb_dims
            False,  # use_neox_rotary_style
            "fp16",
            -1,  # out_scale
            1,  # quant_round_type
            127.0,  # quant_max_bound
            -127.0,  # quant_min_bound
        )
        exe = paddle.static.Executor(place)
        exe.run(
            feed={
                "x_static": x.reshape(bsz, -1).astype(dtype),
                "cache_kv_mmha_out_static": cache_kv_mmha_out.astype(dtype),
                "bias_static": bias.astype(dtype),
                "src_mask_static": src_mask.astype(dtype),
            },
        )


with paddle.pir_utils.IrGuard():
    check_main(
        x,
        bias,
        src_mask,
        cache_kv_out,
        cache_kv_mmha_out,
        qkv_out_scale,
        out_scale,
        'float16',
    )

Investigation

Running

GLOG_v=8 python3 bugReproduce.py

produces the error:

  File "bugReproduce.py", line 123, in check_main
    exe.run(
  File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 1763, in run
    res = self._run_pir_impl(
  File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 2113, in _run_pir_impl
    ret = new_exe.run(list(feed.keys()), return_numpy)
  File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 828, in run
    tensors = self._new_exe.run(
RuntimeError: (PreconditionNotMet) var() should exist in var_name_2_id_ (at /home/aistudio/Paddle/paddle/fluid/framework/new_executor/pir_interpreter.cc:819)

This indicates that the PIR-mode executor is missing a runtime Variable.
The same script runs without error in the old IR static graph mode. My guess is that masked_multihead_attention dispatches to the in-place masked_multihead_attention_ op, which updates its beam_cache_offset input in place, but in this unit test beam_cache_offset is None.
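The failure mode can be illustrated with a toy model of the executor's variable registry. This is purely illustrative Python, not Paddle's actual C++ logic (which lives around paddle/fluid/framework/new_executor/pir_interpreter.cc): an in-place op dereferences every declared input, so an optional input that was passed as None and never registered has no entry in the name-to-variable map.

```python
# Toy model of the PIR interpreter's var_name_2_id_ lookup (illustrative only;
# the real implementation is C++ inside Paddle's new executor).

class ToyScope:
    def __init__(self):
        self.var_name_2_id = {}  # variable name -> value, filled from feeds

    def feed(self, name, value):
        self.var_name_2_id[name] = value

    def get(self, name):
        if name not in self.var_name_2_id:
            # Mirrors "(PreconditionNotMet) var() should exist in var_name_2_id_"
            raise RuntimeError(f"var {name} should exist in var_name_2_id_")
        return self.var_name_2_id[name]

def run_inplace_op(scope, input_names):
    # An in-place op looks up every declared input, including optional ones;
    # if one was None and never registered, the lookup throws.
    return [scope.get(n) for n in input_names]

scope = ToyScope()
scope.feed("x", 1.0)
scope.feed("cache_kv", 2.0)
# "beam_cache_offset" is optional and was passed as None, so it is absent:
try:
    run_inplace_op(scope, ["x", "cache_kv", "beam_cache_offset"])
except RuntimeError as e:
    print(e)  # var beam_cache_offset should exist in var_name_2_id_
```

Under this reading, the fix is for the executor (or the op's optional-input handling) to skip the lookup when an optional in-place input is absent, rather than asserting it exists.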

@0x45f
Contributor

0x45f commented Dec 22, 2023

I've opened a PR to fix the problem mentioned above; we'll move forward with merging this PR once #60269 is merged~

@0x45f
Contributor

0x45f commented Dec 25, 2023

I've opened a PR to fix the problem mentioned above; we'll move forward with merging this PR once #60269 is merged~

#60269 has been merged~

@MarioLulab
Contributor

Please merge the latest branch~ 🥳

Contributor

@MarioLulab left a comment


LGTM

@0x45f merged commit b4ff023 into PaddlePaddle:develop Dec 26, 2023
29 checks passed
Wanglongzhi2001 pushed a commit to Wanglongzhi2001/Paddle that referenced this pull request Jan 7, 2024