【PIR API adaptor No.143、144】 Migrate margin_cross_entropy、masked_multihead_attention #58762
Conversation
Sorry to inform you that b0fc204's CIs have passed more than 7 days ago. To prevent PR conflicts, please re-run all CIs manually.
Sorry, since this PR was not linked in issue #58067, I kept forgetting to review it 😭 I'll review it today. 健飞, do you still have time to push this PR toward merging?~
LGTM
Please merge the latest branch to resolve the conflicts~
Updated~
nice work~
@0x45f @YuanRisheng I ran into a problem, could you help take a look? Running the unit tests locally fails; the failing case is `TestLayerNormStaticInt8Op` in `test/legacy_test/test_masked_multihead_attention_op.py`:

```
--- Running PIR pass [inplace_pass]
I1221 07:52:11.319885 676665 pass.cc:38] --- detected [0] subgraphs!
I1221 07:52:13.244624 676665 pir_interpreter.cc:1264] New Executor is Running ...
W1221 07:52:13.245051 676665 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W1221 07:52:13.279644 676665 gpu_resources.cc:164] device: 0, cuDNN Version: 8.6.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
1   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, std::vector<phi::DenseTensor, std::allocator<phi::DenseTensor> > const&, bool, bool)
2   paddle::framework::PirInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, std::vector<phi::DenseTensor, std::allocator<phi::DenseTensor> > const&, bool, bool)
3   paddle::framework::PirInterpreter::BuildInstruction()
4   paddle::framework::PhiKernelInstruction::PhiKernelInstruction(unsigned long, phi::Place const&, pir::Operation*, paddle::framework::ValueExecutionInfo const*)
5   void paddle::framework::BuildPhiContext<phi::InferMetaContext, phi::MetaTensor, phi::MetaTensor, paddle::small_vector<phi::MetaTensor, 15u>, paddle::small_vector<phi::MetaTensor, 15u>, false>(pir::Operation*, paddle::framework::ValueExecutionInfo const&, paddle::dialect::OpYamlInfoParser const&, phi::InferMetaContext*)
6   phi::DenseTensor const& paddle::framework::Variable::Get<phi::DenseTensor>() const

----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1703145133 (unix time) try "date -d @1703145133" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 676665 (TID 0x7f332e345740) from PID 0 ***]
```

Reproduction environment: CUDA Version: 12.0

Reproduction code: `bugReproduce.py`
```python
# bugReproduce.py
import numpy as np
import paddle
from paddle.framework import core
from paddle.incubate.nn.functional import masked_multihead_attention
from paddle.pir_utils import test_with_pir_api

np.random.seed(0)
bsz = 2
cache_bsz = 2
num_head = 32
dim_head = 128
beam_size = 1
max_seq_len = 33
sequence_length = 32
x = np.random.uniform(-0.05, 0.05, [bsz, 3, num_head, dim_head])
bias = np.random.uniform(-0.05, 0.05, [3, num_head, dim_head])
src_mask = np.zeros([bsz, 1, 1, sequence_length + 1])
cum_offsets = None
sequence_lengths = None
rotary_tensor = None
beam_cache_offset = None
cache_kv_out = np.random.uniform(
    -0.05,
    0.05,
    [2, cache_bsz, num_head, sequence_length, dim_head],
)
numpy_ones = np.zeros([2, cache_bsz, num_head, 1, dim_head])
cache_kv_mmha_out = np.concatenate((cache_kv_out, numpy_ones), axis=3)
qkv_out_scale = None
out_shift = None
out_smooth = None
seq_len = 1
rotary_emb_dims = 0
use_neox_rotary_style = False
out_scale = -1
quant_round_type = 1
quant_max_bound = 127
quant_min_bound = -127
place = paddle.CUDAPlace(0)


def check_main(
    x,
    bias,
    src_mask,
    cache_kv_out,
    cache_kv_mmha_out,
    qkv_out_scale,
    out_scale,
    dtype,
):
    paddle.enable_static()
    with paddle.static.program_guard(paddle.static.Program()):
        x_static = paddle.static.data(
            name="x_static",
            shape=[bsz, 3 * num_head * dim_head],
            dtype=dtype,
        )
        bias_static = paddle.static.data(
            name="bias_static",
            shape=[3, num_head, dim_head],
            dtype=dtype,
        )
        src_mask_static = paddle.static.data(
            name="src_mask_static",
            shape=[bsz, 1, 1, sequence_length + 1],
            dtype=dtype,
        )
        cache_kv_mmha_out_static = paddle.static.data(
            name="cache_kv_mmha_out_static",
            shape=[2, cache_bsz, num_head, sequence_length + 1, dim_head],
            dtype=dtype,
        )
        outs = masked_multihead_attention(
            x_static,
            cache_kv_mmha_out_static,
            bias_static,
            src_mask_static,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            32,
            0,
            False,
            "fp16",
            -1,
            1,
            127.0,
            -127.0,
        )
        exe = paddle.static.Executor(place)
        exe.run(
            feed={
                "x_static": x.reshape(bsz, -1).astype(dtype),
                "cache_kv_mmha_out_static": cache_kv_mmha_out.astype(dtype),
                "bias_static": bias.astype(dtype),
                "src_mask_static": src_mask.astype(dtype),
            },
        )


with paddle.pir_utils.IrGuard():
    check_main(
        x,
        bias,
        src_mask,
        cache_kv_out,
        cache_kv_mmha_out,
        qkv_out_scale,
        out_scale,
        'float16',
    )
```

Troubleshooting: running `GLOG_v=8 python3 bugReproduce.py` produces the following error:

```
File "bugReproduce.py", line 123, in check_main
  exe.run(
File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 1763, in run
  res = self._run_pir_impl(
File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 2113, in _run_pir_impl
  ret = new_exe.run(list(feed.keys()), return_numpy)
File "/home/aistudio/Paddle/build-gpu/python/paddle/base/executor.py", line 828, in run
  tensors = self._new_exe.run(
RuntimeError: (PreconditionNotMet) var() should exist in var_name_2_id_ (at /home/aistudio/Paddle/paddle/fluid/framework/new_executor/pir_interpreter.cc:819)
```

This indicates that the PIR-mode executor is missing a runtime Variable.
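As a side note on the error mechanism: the `PreconditionNotMet` message is raised by a name-to-id lookup inside the PIR interpreter, which expects every variable referenced at `Run()` time to already be registered. A minimal sketch of that failure mode in plain Python (hypothetical names; the real logic is C++ in `pir_interpreter.cc`):

```python
# Rough illustration (plain Python, hypothetical names) of the precondition
# behind the error above: every variable name looked up at Run() time must
# already be registered in a name -> id map (var_name_2_id_ in the C++ code).
class VarNameToId:
    def __init__(self):
        self._name_to_id = {}

    def register(self, name):
        # Assign the next free id if the name is not yet registered.
        self._name_to_id.setdefault(name, len(self._name_to_id))

    def lookup(self, name):
        if name not in self._name_to_id:
            # Mirrors: (PreconditionNotMet) var() should exist in var_name_2_id_
            raise RuntimeError(
                f"(PreconditionNotMet) {name} should exist in var_name_2_id_"
            )
        return self._name_to_id[name]


registry = VarNameToId()
registry.register("x_static")
print(registry.lookup("x_static"))  # a feed var that was registered resolves fine
```

A feed variable that was never registered (as happened here in PIR mode) trips the lookup and surfaces as the `RuntimeError` shown in the traceback.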
The issue mentioned above has been fixed in a separate PR; once #60269 is merged, we will move forward with merging this PR~
Please merge the latest branch~ 🥳
LGTM
PR types
Others
PR changes
Others
Description
PIR API full-coverage upgrade:
- Migrated `paddle.nn.functional.margin_cross_entropy` to PIR and updated its unit tests (coverage: 6/6).
- Migrated `paddle.incubate.nn.functional.masked_multihead_attention` to PIR and updated its unit tests (coverage: 2/2).