【PIR API adaptor No.4、9】Migrate affine_grid,argsort into pir #59661

Merged
merged 31 commits into from
Dec 22, 2023

Conversation

enkilee
Contributor

@enkilee enkilee commented Dec 4, 2023

PR types

Others

PR changes

APIs

Description

Migrate paddle.nn.functional.affine_grid to PIR and update its unit tests.
Migrate paddle.argsort to PIR and update its unit tests.

For affine_grid, the PIR path now matches the Python-side gpudnn selection logic:
[image]


paddle-bot bot commented Dec 4, 2023

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Dec 4, 2023
@luotao1 luotao1 added the HappyOpenSource 快乐开源活动issue与PR label Dec 5, 2023
python/paddle/nn/functional/vision.py (outdated, resolved)
python/paddle/nn/functional/vision.py (outdated, resolved)
Contributor Author

enkilee commented Dec 13, 2023

@MarioLulab The static-graph unit test for argsort is failing and I could not get it working. Could you take a look when you have time?

2023-12-13 14:06:40 Traceback (most recent call last):
2023-12-13 14:06:40   File "/workspace/Paddle/build/python/paddle/pir_utils.py", line 113, in impl
2023-12-13 14:06:40     func(*args, **kwargs)
2023-12-13 14:06:40   File "/workspace/Paddle/build/test/legacy_test/test_argsort_op.py", line 404, in test_api_static1
2023-12-13 14:06:40     result = exe.run(
2023-12-13 14:06:40   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 1772, in run
2023-12-13 14:06:40     res = self._run_impl(
2023-12-13 14:06:40   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 1978, in _run_impl
2023-12-13 14:06:40     ret = new_exe.run(
2023-12-13 14:06:40   File "/workspace/Paddle/build/python/paddle/base/executor.py", line 827, in run
2023-12-13 14:06:40     tensors = self._new_exe.run(
2023-12-13 14:06:40 RuntimeError: 
2023-12-13 14:06:40 
2023-12-13 14:06:40 --------------------------------------
2023-12-13 14:06:40 C++ Traceback (most recent call last):
2023-12-13 14:06:40 --------------------------------------
2023-12-13 14:06:40 0   paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
2023-12-13 14:06:40 1   paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool)
2023-12-13 14:06:40 2   paddle::framework::ProgramInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool)
2023-12-13 14:06:40 3   paddle::framework::ProgramInterpreter::Build(std::vector<std::string, std::allocator<std::string > > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*)
2023-12-13 14:06:40 4   paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less<std::string >, std::allocator<std::string > > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*, paddle::framework::VariableScope*, paddle::framework::interpreter::ExecutionConfig const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, bool, bool)
2023-12-13 14:06:40 5   paddle::framework::interpreter::BuildVariableMap(std::map<std::string, std::vector<std::string, std::allocator<std::string > >, std::less<std::string >, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string > > > > > const&, paddle::framework::VariableScope*, paddle::framework::Scope*, bool, bool)
2023-12-13 14:06:40 6   paddle::framework::VariableScope::VarId(std::string const&) const
2023-12-13 14:06:40 7   phi::enforce::EnforceNotMet::EnforceNotMet(common::ErrorSummary const&, char const*, int)
2023-12-13 14:06:40 8   phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
2023-12-13 14:06:40 
2023-12-13 14:06:40 ----------------------
2023-12-13 14:06:40 Error Message Summary:
2023-12-13 14:06:40 ----------------------
2023-12-13 14:06:40 NotFoundError: argsort_0.tmp_1 not in VariableScope.
2023-12-13 14:06:40   [Hint: Expected HasVar(name) == true, but received HasVar(name):0 != true:1.] (at ../paddle/fluid/framework/new_executor/new_executor_defs.cc:148)

Contributor

@MarioLulab MarioLulab left a comment

nice work ~

  1. argsort also has TestArgsortOpCPU and its derived test classes in test/legacy_test/test_affine_grid_op.py. They involve backward computation, so you can note in the PR description that PIR is not supported for them yet, and update the unit test coverage.
  2. argsort also has the TestSundryAPIStatic.test_argsort unit test in test/legacy_test/test_zero_dim_tensor.py. Please adapt it to PIR.
  3. affine_grid also has static-graph unit tests in AffineGridTestCase in test/legacy_test/test_affine_grid_function.py. Please adapt those to PIR as well.
  4. TestBook.test_affine_grid in test/legacy_test/test_layers.py also needs to be adapted to PIR.

test/legacy_test/test_argsort_op.py (outdated, resolved)
test/legacy_test/test_argsort_op.py (outdated, resolved)
test/legacy_test/test_argsort_op.py (outdated, resolved)
test/legacy_test/test_argsort_op.py (outdated, resolved)
test/legacy_test/test_argsort_op.py (outdated, resolved)
test/legacy_test/test_argsort_op.py (resolved)
Contributor

@MarioLulab MarioLulab left a comment

nice work~
There are still a few minor issues, though.

test/legacy_test/test_affine_grid_function.py (outdated, resolved)
test/legacy_test/test_affine_grid_function.py (outdated, resolved)
test/legacy_test/test_affine_grid_function.py (resolved)
@MarioLulab
Contributor

In the TestBook.test_affine_grid unit test, affine_grid in PIR mode does not support an out_shape with a dynamic shape. It fails with:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::pybind::ThrowExceptionToPython(std::__exception_ptr::exception_ptr)

----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1702621654 (unix time) try "date -d @1702621654" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x7f3) received by PID 2035 (TID 0x7f1ebc14b740) from PID 2035 ***]

Traced to AffineGridOp::Build: when common::product is used to compute output_shape_size, the first dimension of dims is -1, which produces an abnormal return value: I1215 06:27:34.333279 2035 pd_op.cc:4148] output_shape_size 18446744073709551614.

Still investigating 🚀
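
For reference, the logged value is what a product of -2 looks like when it is read as an unsigned 64-bit integer. The following is a minimal sketch of that arithmetic only, not the code in AffineGridOp::Build. It assumes dims of [-1, 2] (hypothetical; the actual dims in the failing test are not shown in this thread), and the helper name product_as_uint64 is invented for illustration.

```python
import math

# 2**64: the number of distinct values an unsigned 64-bit integer can hold.
UINT64_MODULUS = 1 << 64


def product_as_uint64(dims):
    """Multiply dims as signed integers, then reinterpret as unsigned 64-bit.

    Hypothetical helper: it mimics what happens when a negative signed
    product is stored in an unsigned 64-bit size variable.
    """
    signed_product = math.prod(dims)
    return signed_product % UINT64_MODULUS


# Assumed dims: a dynamic first dimension (-1) and a static second one (2).
print(product_as_uint64([-1, 2]))  # 18446744073709551614
```

A size of that magnitude being passed on to an allocation would plausibly account for the std::bad_alloc above, although the thread does not confirm the exact path.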

Contributor

MarioLulab commented Dec 18, 2023

Please resolve the merge conflicts.
The issue above should already be fixed by #60059.

Contributor

0x45f commented Dec 19, 2023

I opened a PR to fix the incorrect gpudnn kernel selection for affine_grid. Once #60153 is merged, we can try to move this PR forward again.

Contributor

0x45f commented Dec 20, 2023

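I opened a PR to fix the incorrect gpudnn kernel selection for affine_grid. Once #60153 is merged, we can try to move this PR forward again.

Please apply the changes from #60153 in this PR instead. #60153 cannot pass the coverage CI, so it cannot be merged.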

Contributor Author

enkilee commented Dec 21, 2023

Got it.

Contributor

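CI shows the failure is in TestArgsortOpCPU.setUp, and the error it reports is that static-graph mode was not enabled. I cannot reproduce it locally 😢 Try adding paddle.enable_static() at the very start of TestArgsortOpCPU.setUp and see whether CI still fails.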

Contributor Author

Got it.

Contributor

@MarioLulab MarioLulab left a comment

LGTM

@0x45f 0x45f merged commit df4c1f1 into PaddlePaddle:develop Dec 22, 2023
29 checks passed
Labels
contributor External developers HappyOpenSource 快乐开源活动issue与PR
6 participants