[PaddlePaddle Hackathon 2] 3. Add corrcoef (Pearson product-moment correlation coefficient) API to Paddle #40690
Conversation
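Hi, there are a few details to take care of, @liqitong-a.
The PR format check passed. Your PR will be reviewed by Paddle experts and the open-source community; please keep an eye on PR updates.
OK, fixed.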
python/paddle/tensor/linalg.py
Outdated
def corrcoef(x, rowvar=True, ddof=False, name=None):
    """
    Return Pearson product-moment correlation coefficients.
Please don't copy the documentation verbatim from numpy; try writing it yourself and see whether it can be made clearer.
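For reference, the quantity being documented can be stated directly. This is a minimal statement of the standard definition, assuming C denotes the covariance matrix of the variables in x (as returned by cov):

```latex
R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} \, C_{jj}}}
```

Each entry R_ij lies in [-1, 1], and the diagonal entries are 1 whenever the corresponding variance is nonzero.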
python/paddle/tensor/linalg.py
Outdated
x(Tensor): A N-D(N<=2) Tensor containing multiple variables and observations. By default, each row of x represents a variable. Also see rowvar below.
rowvar(Bool, optional): If rowvar is True (default), then each row represents a variable, with observations in the columns. Default: True
ddof(Bool, optional): Has no effect, do not use.
If it is already deprecated, why keep it at all?
python/paddle/tensor/linalg.py
Outdated
    """

    if ddof is not False:
Same as above.
python/paddle/tensor/linalg.py
Outdated
        warnings.warn('ddof have no effect and are deprecated',
                      DeprecationWarning)
    c = cov(x, rowvar)
    try:
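It is better to handle the different cases explicitly in the code instead of using try; otherwise any other error would also be swallowed and be hard to notice.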
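One way to read this suggestion, sketched with NumPy rather than Paddle. It assumes the try block guarded the case where cov returns a scalar (a single variable), as NumPy's own corrcoef does; the function name `corrcoef_from_cov` is hypothetical and not part of the PR.

```python
import numpy as np


def corrcoef_from_cov(c):
    """Normalize a covariance result into correlation coefficients.

    Hypothetical helper: each input case is checked explicitly, so an
    unexpected shape raises a clear error instead of being swallowed
    by a broad try/except.
    """
    c = np.asarray(c, dtype=np.float64)

    if c.ndim == 0:
        # Single variable: the covariance is a scalar variance, and the
        # correlation of a variable with itself is c / c (nan if c == 0).
        return c / c

    if c.ndim != 2 or c.shape[0] != c.shape[1]:
        raise ValueError(
            f"expected a scalar or a square matrix, got shape {c.shape}")

    # General case: divide row i and column j by the standard deviations.
    stddev = np.sqrt(np.diag(c))
    c = c / stddev[:, None]
    c = c / stddev[None, :]
    # Rounding can push values slightly outside [-1, 1].
    return np.clip(c, -1.0, 1.0)
```

With explicit branches, a malformed input fails with a message naming the offending shape, which is the kind of error a bare try would have hidden.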
        self.shape = [20, 10]

    def test_tensor_corr_default(self):
        typelist = ['float64']
If complex numbers are supported, please add tests for complex types.
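A question: numpy's cov supports complex numbers but paddle's cov does not, and corrcoef is built on top of cov. Does paddle's corrcoef need to support complex numbers? If so, does cov need to be changed to support them as well?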
I took a look: complex support in the basic operations is still incomplete, so the complex tests can be left out for now.
Should fp32 be added to typelist?
Your PR has new feedback; please update it promptly.
@@ -0,0 +1,116 @@
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
2019->2022
        self.assertRaises(ValueError, test_err)


class Corr_Test4(unittest.TestCase):
Please add a comment to each test class, and add test cases for unsupported data types.
I've made the changes; please take a look.
tensor = paddle.to_tensor(np_arr, place=p)
corr = paddle.linalg.corrcoef(tensor)
np_corr = numpy_corr(np_arr, rowvar=True, dtype=dtype)
self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2))
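I suggest writing separate unit tests for float64 and float32. Their precision differs considerably, so it is better to set the thresholds separately: leave rtol unchanged for float64, and for float32 set rtol=1e-4 or 1e-5 and see whether it passes CI.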
I tried it: locally, float32 passes with 1e-5, but it fails when tested on submission.
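In principle that shouldn't happen. Your local machine is probably just one particular environment, whereas CI tests many combinations of mkl/openblas/GPU libraries on Linux/mac/windows, and one of those cases may fail. Which test exactly is failing?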
The failing one is self.assertTrue(np.allclose(np_corr, corr.numpy(), atol=1.e-2)), when float32 is used.
I meant: which CI pipeline failed?
If 1e-5 doesn't work, 1e-4 is fine too; please resubmit.
The failing pipeline is PR-CI-Static-Check.
I'll try again.
        self.shape = [20, 10]

    def test_tensor_corr_default(self):
        typelist = ['float64', 'float32']
It still doesn't pass. float32 precision is indeed worse, so just test float64 precision; please find the smallest atol that works.
OK, I'll adjust it.
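Hello, please don't set atol for float64, set 1e-5 for float32, test the two separately, and submit again.
I have tested float64 and it passes, but float32 fails with anything below 1e-3.
Just submit according to this standard; we will handle it afterwards.
Do you mean testing float32 and float64 in two separate files?
No, just two separate assertTrue calls.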
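A minimal sketch of what "two separate assertTrue calls" could look like, written against NumPy only so it runs without Paddle. The helper `corrcoef_in_dtype` and the class name are hypothetical stand-ins for the API under test; the tolerances follow the reviewer's suggestion (defaults for float64, atol=1e-5 for float32). Whether the same thresholds hold for Paddle on every CI configuration is exactly what this thread is discussing.

```python
import unittest

import numpy as np


def corrcoef_in_dtype(x, dtype):
    """Hypothetical stand-in: compute correlation entirely in `dtype`."""
    x = np.asarray(x, dtype=dtype)
    centered = x - x.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (x.shape[1] - 1)
    std = np.sqrt(np.diag(cov))
    return cov / std[:, None] / std[None, :]


class CorrcoefPrecisionTest(unittest.TestCase):
    """Compare against a float64 NumPy reference, one tolerance per dtype."""

    def setUp(self):
        rng = np.random.default_rng(0)
        self.x = rng.standard_normal((4, 50))
        self.expected = np.corrcoef(self.x)  # float64 reference

    def test_float64(self):
        actual = corrcoef_in_dtype(self.x, np.float64)
        # float64: keep np.allclose's default rtol/atol.
        self.assertTrue(np.allclose(self.expected, actual))

    def test_float32(self):
        actual = corrcoef_in_dtype(self.x, np.float32)
        # float32: looser absolute tolerance, set independently.
        self.assertTrue(np.allclose(self.expected, actual, atol=1e-5))
```

Keeping the two assertions apart means a float32 failure no longer forces the float64 check to be loosened as well.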
@liqitong-a The code style check did not pass.
Is this OK?
LGTM for docs
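Judging from the PR-CI-Coverage history at https://xly.bce.baidu.com/paddlepaddle/paddle/newipipe/builds/6512?module=github/PaddlePaddle/Paddle&pipeline=PR-CI-Coverage&branch=pull/40690(develop) the unit test sometimes times out; consider reducing the input data dimensions to avoid this.
I've changed it, and it works now.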
@@ -3181,3 +3182,72 @@ def lstsq(x, y, rcond=None, driver=None, name=None):
        singular_values = paddle.static.data(name='singular_values', shape=[0])

    return solution, residuals, rank, singular_values


def corrcoef(x, rowvar=True, name=None):
Compared with numpy, the interface lacks the y parameter. Why is it missing, and will y be added later?
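This is because computing corrcoef first requires computing cov, and paddle's cov was likewise written without the y parameter that numpy has. Whether y is added later depends on whether cov gains a y parameter.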
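For context, a small NumPy check of what the y parameter does there: with rowvar=True, NumPy treats the rows of y as additional variables appended to x. The sketch below only demonstrates that NumPy behavior. That a caller could get the same effect in Paddle by stacking the inputs first (for example with paddle.concat) is an assumption, not something tested in this PR.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 20))  # 3 variables, 20 observations
y = rng.standard_normal((2, 20))  # 2 more variables, same observations

# NumPy's two-argument form.
with_y = np.corrcoef(x, y)

# Equivalent single-argument form: stack the variables along axis 0.
stacked = np.corrcoef(np.concatenate([x, y], axis=0))

print(with_y.shape)
print(np.allclose(with_y, stacked))
```

Both calls produce one correlation matrix over all five variables, so omitting y loses convenience rather than functionality.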
LG API
PR types
Others
PR changes
APIs
Describe
Issue link: corrcoef
RFC PR link: PaddlePaddle/community#46
Chinese documentation link: PaddlePaddle/docs#4316