【PIR API adaptor No.4、9】Migrate affine_grid,argsort into pir #59661
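Your PR was submitted successfully. Thank you for contributing to this open-source project!
@MarioLulab, the static-graph unit test for argsort is failing and I have not managed to fix it. Could you take a look when you have time?
2023-12-13 14:06:40 Traceback (most recent call last):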
2023-12-13 14:06:40 File "/workspace/Paddle/build/python/paddle/pir_utils.py", line 113, in impl
2023-12-13 14:06:40 func(*args, **kwargs)
2023-12-13 14:06:40 File "/workspace/Paddle/build/test/legacy_test/test_argsort_op.py", line 404, in test_api_static1
2023-12-13 14:06:40 result = exe.run(
2023-12-13 14:06:40 File "/workspace/Paddle/build/python/paddle/base/executor.py", line 1772, in run
2023-12-13 14:06:40 res = self._run_impl(
2023-12-13 14:06:40 File "/workspace/Paddle/build/python/paddle/base/executor.py", line 1978, in _run_impl
2023-12-13 14:06:40 ret = new_exe.run(
2023-12-13 14:06:40 File "/workspace/Paddle/build/python/paddle/base/executor.py", line 827, in run
2023-12-13 14:06:40 tensors = self._new_exe.run(
2023-12-13 14:06:40 RuntimeError:
2023-12-13 14:06:40
2023-12-13 14:06:40 --------------------------------------
2023-12-13 14:06:40 C++ Traceback (most recent call last):
2023-12-13 14:06:40 --------------------------------------
2023-12-13 14:06:40 0 paddle::framework::StandaloneExecutor::Run(std::vector<std::string, std::allocator<std::string > > const&, bool)
2023-12-13 14:06:40 1 paddle::framework::InterpreterCore::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool)
2023-12-13 14:06:40 2 paddle::framework::ProgramInterpreter::Run(std::vector<std::string, std::allocator<std::string > > const&, bool, bool, bool)
2023-12-13 14:06:40 3 paddle::framework::ProgramInterpreter::Build(std::vector<std::string, std::allocator<std::string > > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*)
2023-12-13 14:06:40 4 paddle::framework::interpreter::BuildOpFuncList(phi::Place const&, paddle::framework::BlockDesc const&, std::set<std::string, std::less<std::string >, std::allocator<std::string > > const&, std::vector<paddle::framework::OpFuncNode, std::allocator<paddle::framework::OpFuncNode> >*, paddle::framework::VariableScope*, paddle::framework::interpreter::ExecutionConfig const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, std::vector<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)>, std::allocator<std::function<void (paddle::framework::OperatorBase*, paddle::framework::Scope*)> > > const&, bool, bool)
2023-12-13 14:06:40 5 paddle::framework::interpreter::BuildVariableMap(std::map<std::string, std::vector<std::string, std::allocator<std::string > >, std::less<std::string >, std::allocator<std::pair<std::string const, std::vector<std::string, std::allocator<std::string > > > > > const&, paddle::framework::VariableScope*, paddle::framework::Scope*, bool, bool)
2023-12-13 14:06:40 6 paddle::framework::VariableScope::VarId(std::string const&) const
2023-12-13 14:06:40 7 phi::enforce::EnforceNotMet::EnforceNotMet(common::ErrorSummary const&, char const*, int)
2023-12-13 14:06:40 8 phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)
2023-12-13 14:06:40
2023-12-13 14:06:40 ----------------------
2023-12-13 14:06:40 Error Message Summary:
2023-12-13 14:06:40 ----------------------
2023-12-13 14:06:40 NotFoundError: argsort_0.tmp_1 not in VariableScope.
2023-12-13 14:06:40 [Hint: Expected HasVar(name) == true, but received HasVar(name):0 != true:1.] (at ../paddle/fluid/framework/new_executor/new_executor_defs.cc:148)
nice work ~
- argsort also has TestArgsortOpCPU and its derived test classes in test/legacy_test/test_argsort_op.py. They involve backward computation, so you can note in the PR description that they do not support PIR yet, and update the unit test coverage.
- argsort also has the TestSundryAPIStatic.test_argsort unit test in test/legacy_test/test_zero_dim_tensor.py. Please adapt it to PIR.
- affine_grid also has static-graph unit tests in AffineGridTestCase in test/legacy_test/test_affine_grid_function.py. Please adapt them to PIR as well.
- TestBook.test_affine_grid in test/legacy_test/test_layers.py also needs to be adapted to PIR.
nice work~
But there are still a few small issues.
In the TestBook.test_affine_grid unit test, affine_grid in PIR mode does not support a dynamic shape for out_shape. It fails with:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::pybind::ThrowExceptionToPython(std::__exception_ptr::exception_ptr)
----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
[TimeInfo: *** Aborted at 1702621654 (unix time) try "date -d @1702621654" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x7f3) received by PID 2035 (TID 0x7f1ebc14b740) from PID 2035 ***]
I traced this to AffineGridOp::Build: when common::product is used to compute output_shape_size, the first dimension of dims is -1, which produces an abnormal return value:
I1215 06:27:34.333279 2035 pd_op.cc:4148] output_shape_size 18446744073709551614
The issue is still under investigation 🚀
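The logged value 18446744073709551614 equals 2^64 - 2, which is what a signed product of -2 looks like when it is read as an unsigned 64-bit integer. The following is a minimal sketch of that arithmetic in plain Python. The dims list is hypothetical (only the leading -1 comes from the comment above), and the unsigned reinterpretation is an inference from the number, not something confirmed from the Paddle source.

```python
import math

# Hypothetical dims: only the leading -1 (dynamic dimension) is taken from
# the comment; the remaining entry is chosen so that the product is -2.
dims = [-1, 2]

# Signed product of all dimensions, as a plain Python integer.
signed_product = math.prod(dims)

# Reinterpret the signed value as an unsigned 64-bit integer
# (two's-complement wrap-around), assumed here to mirror a size_t conversion.
as_uint64 = signed_product % 2**64

print(signed_product)  # -2
print(as_uint64)       # 18446744073709551614
```

A size that large would plausibly explain the std::bad_alloc, since an allocation of that many elements cannot succeed.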
Please resolve the merge conflicts~
I opened a PR that fixes the incorrect gpudnn kernel selection for the affine_grid kernel. Once #60153 is merged, we will try to move this PR forward again~
Got it.
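CI shows a failure in TestArgsortOpCPU.setUp; the error reported by CI is that static-graph mode had not been enabled. I cannot reproduce it locally 😢 Try adding paddle.enable_static() at the very beginning of TestArgsortOpCPU.setUp and see whether CI still reports the error.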
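A minimal sketch of that suggestion, assuming paddle is installed (the test is skipped otherwise). The class name ArgsortSetUpSketch is a hypothetical stand-in for TestArgsortOpCPU, whose real setUp body is not shown here, and the tearDown is an added precaution rather than something the review asks for.

```python
import unittest

try:
    import paddle  # assumed available; the sketch skips itself if it is not
except ImportError:
    paddle = None


@unittest.skipIf(paddle is None, "paddle is not installed")
class ArgsortSetUpSketch(unittest.TestCase):  # hypothetical stand-in class
    def setUp(self):
        # Enable static-graph mode before any other setup work runs.
        paddle.enable_static()
        # ... the rest of the original setUp would follow here ...

    def tearDown(self):
        # Restore dynamic mode so that later tests are not affected.
        paddle.disable_static()

    def test_static_mode_is_active(self):
        # After setUp, the process should no longer be in dynamic mode.
        self.assertFalse(paddle.in_dynamic_mode())
```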
Got it.
LGTM
PR types
Others
PR changes
APIs
Description
No.4 paddle.nn.functional.affine_grid
No.9 paddle.argsort
Migrate paddle.nn.functional.affine_grid to PIR and update its unit tests.
Migrate paddle.argsort to PIR and update its unit tests.
For affine_grid, the gpudnn selection logic is aligned with the Python side under PIR: