[Hackathon 4th No.27] Add the paddle.sparse.concat sparse API to Paddle #53872

lijingkai2023 · 2023-05-16T22:29:33Z

PR types

New features

PR changes

APIs

Description

Completes development task No. 27 of Hackathon phase 4: https://github.com/PaddlePaddle/community/blob/master/hackthon_4th/%E3%80%90PaddlePaddle%20Hackathon%204%E3%80%91%20%E6%A0%B8%E5%BF%83%E6%A1%86%E6%9E%B6%E5%BC%80%E6%BA%90%E8%B4%A1%E7%8C%AE%20API%20%E5%BC%80%E5%8F%91%E4%BB%BB%E5%8A%A1%E5%90%88%E9%9B%86.md#task27

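1. Support a list of sparse tensors as an argument, with automatic generation of the dynamic graph code and registration logic (the first argument of concat is a list of sparse tensors, which the Paddle framework does not currently support).
2. Add the paddle.sparse.concat sparse API.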
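To make the intended behaviour concrete, here is a minimal pure-Python sketch of concatenating COO sparse tensors along one axis. The `(indices, values, shape)` triple and the name `coo_concat` are assumptions made for illustration; they are not Paddle's internal representation or the implementation in this PR.

```python
from itertools import accumulate


def coo_concat(tensors, axis):
    """Concatenate COO tensors given as (indices, values, shape) triples.

    indices is a list of coordinate tuples, values holds the matching
    non-zero entries, and shape is the dense shape. This layout is a
    hypothetical stand-in for a real sparse tensor type.
    """
    ndim = len(tensors[0][2])
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of range for {ndim}-D input")
    axis %= ndim  # normalize negative axes

    # Offset of each input along the concat axis: 0, n0, n0 + n1, ...
    sizes = [shape[axis] for _, _, shape in tensors]
    offsets = [0] + list(accumulate(sizes))[:-1]

    out_indices, out_values = [], []
    for (indices, values, _), offset in zip(tensors, offsets):
        for coord, value in zip(indices, values):
            shifted = list(coord)
            shifted[axis] += offset  # only the concat axis moves
            out_indices.append(tuple(shifted))
            out_values.append(value)

    out_shape = list(tensors[0][2])
    out_shape[axis] = sum(sizes)
    return out_indices, out_values, tuple(out_shape)


a = ([(0, 1), (1, 0)], [1.0, 2.0], (2, 3))
b = ([(0, 2)], [3.0], (2, 3))
result = coo_concat([a, b], axis=1)
```

The non-zero values are copied unchanged; only the coordinate on the concat axis is shifted by the accumulated size of the preceding inputs.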

RFC design document: PaddlePaddle/community#504
Chinese API documentation: PaddlePaddle/docs#5886

[used AI Studio] Completed: C++ operator that takes a list of sparse tensors as its argument, with registration logic; GPU build and test.

paddle-bot · 2023-05-16T22:29:37Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

paddle-bot · 2023-05-16T22:29:40Z

❌ The PR is not created using PR's template. You can refer to this Demo.
Please use PR's template, it helps save our maintainers' time so that more developers get helped.

paddle/phi/api/lib/api_gen_utils.cc

paddle/phi/kernels/sparse/cpu/concat_grad_kernel.cc

paddle/phi/kernels/sparse/cpu/concat_kernel.cc

paddle/phi/kernels/sparse/gpu/concat_kernel.cu

lijingkai2023 · 2023-05-22T08:22:00Z

Calling sparse_concat in static graph mode raises the following error:
File "C:\Users\desig\AppData\Local\Programs\Python\Python310\lib\site-packages\paddle\fluid\framework.py", line 2793, in init
for frame in traceback.extract_stack():

InvalidArgumentError: Operator sparse_concat's input x should contain only one variable.
  [Hint: Expected it->second.size() <= 1UL, but received it->second.size():2 > 1UL:1.] (at ..\paddle\fluid\framework\operator.cc:1129)
  [operator < sparse_concat > error]

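Judging from the log output, the error occurs in paddle\fluid\framework\operator.cc, in the function GetExpectedPhiKernelArgs, at the statement return (*arg_map_fn_)(arg_mapping_ctx);.
*arg_map_fn_ is the function SparseConcatOpArgumentMapping in paddle\phi\ops\compat\generated_sparse_sig.cc, but the error is raised before that function appears to be called.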

Where is the failing function (InputVar in paddle\fluid\framework\operator.cc) called from?
Is there a good way to approach this problem?
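The condition in the Hint (`it->second.size() <= 1UL`, received 2) can be pictured with a toy model: a single-variable accessor rejects an input slot that holds a list. The names `input_var` and `multi_input_var` and the dictionary layout below are illustrative assumptions, not Paddle's code.

```python
def input_var(inputs, name):
    """Return the single variable bound to an input slot, or None.

    Toy model of the check quoted in the Hint; hypothetical helper.
    """
    variables = inputs.get(name, [])
    if len(variables) > 1:
        raise ValueError(
            f"input {name} should contain only one variable, "
            f"but received {len(variables)}"
        )
    return variables[0] if variables else None


def multi_input_var(inputs, name):
    """Return every variable bound to an input slot (list-valued input)."""
    return list(inputs.get(name, []))


# sparse_concat takes a list of tensors, so its slot "x" holds two variables.
op_inputs = {"x": ["sparse_a", "sparse_b"], "y": ["dense_c"]}

single = input_var(op_inputs, "y")      # fine: one variable in the slot
many = multi_input_var(op_inputs, "x")  # fine: list-aware accessor

try:
    input_var(op_inputs, "x")           # single-variable accessor on a list
    failed = False
except ValueError:
    failed = True
```

Under this reading, any code path that treats `x` as a single tensor would trip the check as soon as the list has more than one element.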

zyfncg · 2023-05-24T03:00:29Z

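To pin this down, add some LOG output to locate exactly where InputVar is called. It is possible that execution has already entered SparseConcatOpArgumentMapping and called a function such as IsSparseCooTensorInput.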

lijingkai2023 · 2023-05-25T01:27:37Z

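The error location described above was found precisely by adding LOG output.
I added LOG statements at the start of both SparseConcatOpArgumentMapping and IsSparseCooTensorInput, but nothing was printed, so I concluded that the error is raised before SparseConcatOpArgumentMapping is entered.
I have no idea how to investigate further from here.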

paddle-ci-bot · 2023-05-29T03:16:52Z

Sorry to inform you that 63bbbc7's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

zhwesky2010 · 2023-06-01T14:48:51Z

@lijingkai2023 Hello. Since the CPU kernel already runs, the GPU failure should be a problem inside the kernel itself. What exactly is the error?

luotao1 · 2023-06-02T00:37:56Z

@zhwesky2010 The exact errors are as follows:

GPU error:
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   sparse::concat_ad_func(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, paddle::experimental::ScalarBase<paddle::Tensor>)
1   paddle::experimental::sparse::concat(std::vector<paddle::Tensor, std::allocator<paddle::Tensor> > const&, paddle::experimental::ScalarBase<paddle::Tensor> const&)
2   phi::KernelImpl<void (*)(phi::GPUContext const&, std::vector<phi::SparseCooTensor const*, std::allocator<phi::SparseCooTensor const*> > const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, phi::SparseCooTensor*), &(void phi::sparse::ConcatCooKernel<float, phi::GPUContext>(phi::GPUContext const&, std::vector<phi::SparseCooTensor const*, std::allocator<phi::SparseCooTensor const*> > const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, phi::SparseCooTensor*))>::Compute(phi::KernelContext*)
3   void phi::sparse::ConcatCooKernel<float, phi::GPUContext>(phi::GPUContext const&, std::vector<phi::SparseCooTensor const*, std::allocator<phi::SparseCooTensor const*> > const&, paddle::experimental::ScalarBase<phi::DenseTensor> const&, phi::SparseCooTensor*)
4   phi::DDim::CopyFrom(phi::DDim const&)
 
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1684316520 (unix time) try "date -d @1684316520" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x48) received by PID 7618 (TID 0x7f8f715d8700) from PID 72 ***]
 
 
Static graph call error:
File "C:\Users\desig\AppData\Local\Programs\Python\Python310\lib\site-packages\paddle\fluid\framework.py", line 2793, in init
for frame in traceback.extract_stack():

InvalidArgumentError: Operator sparse_concat's input x should contain only one variable.
  [Hint: Expected it->second.size() <= 1UL, but received it->second.size():2 > 1UL:1.] (at ..\paddle\fluid\framework\operator.cc:1129)
  [operator < sparse_concat > error]
For a detailed description, see the question I raised in this PR.

lijingkai2023 · 2023-06-02T03:07:35Z

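Yes, execution does enter the function; the added LOG output is printed.
I tried several times before without getting any log output, so I probably made a mistake somewhere.
Thanks!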

zhwesky2010 · 2023-06-02T03:22:25Z

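@lijingkai2023 Hello. The static graph error is caused by some remaining issues in the current code generation mechanism. For the sparse operator tasks it is currently acceptable to support only dynamic graph mode, because the operators are shared between dynamic and static graphs; the static graph unit tests can also be skipped for now.

The dynamic graph yaml generation looks finished, since the CPU path already runs. The GPU failure is purely a kernel problem and is unrelated to the static graph. Specifically, the DDim CopyFrom function most likely triggered an out-of-bounds access that caused the segmentation fault, so ConcatCooKernel still needs to be fixed, which is within the scope of the task.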
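As a rough illustration of the kind of guard that keeps shape handling in bounds, the sketch below validates every input before computing the concat output dims. It is a hedged Python model only: `concat_out_dims` is a hypothetical helper, and the actual cause inside ConcatCooKernel is not established in this thread.

```python
def concat_out_dims(shapes, axis):
    """Compute the dense output shape of a concat, checking each input.

    Hypothetical helper for illustration; not Paddle's DDim logic.
    """
    if not shapes:
        raise ValueError("concat needs at least one input")
    rank = len(shapes[0])
    if rank == 0:
        raise ValueError("cannot concat 0-D inputs")
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    axis %= rank  # normalize negative axes before indexing

    out = list(shapes[0])
    for shape in shapes[1:]:
        if len(shape) != rank:
            raise ValueError(f"rank mismatch: {shape} vs rank {rank}")
        for d in range(rank):
            if d == axis:
                out[d] += shape[d]  # sizes add up along the concat axis
            elif shape[d] != out[d]:
                raise ValueError(f"dimension {d} differs: {shape} vs {out}")
    return tuple(out)


dims = concat_out_dims([(2, 3, 4), (2, 3, 4), (2, 3, 4)], axis=1)
```

Rejecting an invalid axis or a rank mismatch up front means no later step indexes a shape past its end.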

lijingkai2023 · 2023-06-02T03:24:26Z

OK.
I am fixing the GPU kernel now.

zhwesky2010 · 2023-06-02T03:25:25Z

python/paddle/fluid/tests/unittests/test_sparse_concat_op.py

+                self.check_result(i, [2, 3, 4, 2, 3, 4, 2, 3, 4], j + 1, 'coo')
+
+
+# class TestSparseConcatStatic(unittest.TestCase):


The static graph unit test can be left aside for now.

zhwesky2010 · 2023-06-05T02:57:43Z

paddle/phi/kernels/sparse/gpu/concat_kernel.cu

+}
+
+template <typename T, typename Context>
+void ConcatCooKernel(const Context &dev_ctx,


Specifically, something in this function triggers the out-of-bounds access error.

paddle-bot · 2023-06-05T03:53:16Z

Sorry to inform you that, after our discussion, your PR does not yet meet the merging standard (Reference: Paddle Custom Operator Design Doc). We are closing this PR for now; you can submit a new one. Thank you for your contribution.

paddle-bot bot added contributor External developers status: proposed labels May 16, 2023

luotao1 assigned luotao1, zkh2016 and Ligoml May 17, 2023

luotao1 added the PaddlePaddle Hackathon label May 17, 2023

paddle-bot bot removed the status: proposed label May 17, 2023

zkh2016 reviewed May 17, 2023

lijingkai2023 mentioned this pull request May 18, 2023

[PaddlePaddle Hackathon Phase 4] Task Overview #51281

Closed

lijingkai2023 force-pushed the develop branch from f91804c to 06da3e7 Compare May 19, 2023 00:21

luotao1 added the API label May 23, 2023

zhwesky2010 reviewed Jun 2, 2023

zhwesky2010 reviewed Jun 5, 2023

lijingkai2023 closed this Jun 5, 2023

lijingkai2023 force-pushed the develop branch 2 times, most recently from 1a99070 to 0b1086b Compare June 5, 2023 03:53

paddle-bot bot added the status: not progressed label Jun 5, 2023
