[Hackathon 4th No.13] Add a Bernoulli API to Paddle #52244

megemini · 2023-03-28T11:24:00Z

PR types

New features

PR changes

APIs

Describe

[used AI Studio]

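PaddlePaddle Hackathon 4th, No.13: add a Bernoulli API to Paddle.
This PR touches the following five files:
- __init__.py: exports Bernoulli from the distribution package
- bernoulli.py: the implementation of the Bernoulli API
- kl.py: adds the kl_divergence method for Bernoulli
- test_distribution_bernoulli.py: dynamic-graph unit tests, covering kl.py
- test_distribution_bernoulli_static.py: static-graph unit tests, covering kl.py

Additional notes:

The corresponding design document has been merged ([PR link] #453).
The code above passes pre-commit.
The code above has been tested on AI Studio.
Self-measured code coverage is above 90%.

Thanks for reviewing!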

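p.s. 2023-04-09 CI status update

Changes made in response to the review comments:

The constructor argument type is now float|Tensor, and the initialization logic has been streamlined.
Added unit tests for the exceptions raised under the new argument types.
EPS now comes from paddle.finfo, and the entropy method of BernoulliNumpy in the unit tests has been changed accordingly.
0-D constructor arguments are now supported, with matching unit tests; temperature is created with paddle.full(shape=(), fill_value=...).
Added a backward unit test for rsample.
Updated the docstring examples to include the 0-D case.
Adjusted the blank lines in the docstrings.

Please review, thanks!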

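p.s. 2023-04-06 CI status update

After resubmitting, only two CI checks remain, and both require manual approval. Changes in this round:

Changed how BernoulliNumpy is initialized in the unit tests, to work around the scipy version problem.
It is now always initialized in float64, and each returned value (mean, variance, and so on) is cast back to the original dtype with astype (float32 or float64). This avoids the scipy version problem without breaking the existing test cases.
Streamlined the test flow of test_distribution_bernoulli_static, which reduces the test time.
The multiple test parameters previously passed to parameterize_cls are merged into one, and every test_xxx method becomes a private _test_xxx method called from a single test_all. This skips the per-test initialization without dropping any test case; the static-graph tests now take about half as long as the dynamic-graph ones.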
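
The restructuring described above can be sketched roughly as follows. This is a minimal illustration with plain `unittest`, not the PR's actual test file: `ExpensiveFixture`, `BernoulliChecks`, `_test_mean`, `_test_variance`, and `run_checks` are hypothetical names, and the build counter only stands in for the cost of setting up a static-graph program.

```python
import unittest


class ExpensiveFixture:
    """Hypothetical stand-in for a costly per-test setup."""

    builds = 0

    def __init__(self, probs):
        type(self).builds += 1
        self.probs = probs


class BernoulliChecks(unittest.TestCase):
    def setUp(self):
        # unittest calls setUp once per collected test_* method, so a single
        # public test method means a single fixture build.
        self.fixture = ExpensiveFixture(probs=0.3)

    # Names starting with an underscore are not collected by the loader.
    def _test_mean(self):
        self.assertAlmostEqual(self.fixture.probs, 0.3)

    def _test_variance(self):
        p = self.fixture.probs
        self.assertAlmostEqual(p * (1 - p), 0.21)

    def test_all(self):
        # One entry point runs every check against the same fixture.
        self._test_mean()
        self._test_variance()


def run_checks():
    """Run the suite and report how many times the fixture was built."""
    ExpensiveFixture.builds = 0
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BernoulliChecks)
    result = unittest.TestResult()
    suite.run(result)
    return result, ExpensiveFixture.builds
```

The trade-off of this layout is that the first failing helper stops the remaining checks in `test_all`, so a single run reports at most one failure per fixture.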

Please review, thanks!

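p.s. 2023-04-03 CI status update

After changing the scipy version in unittest_py/requirements.txt, the dynamic-graph unit tests in PR-CI-Coverage now pass, but the static-graph unit tests in PR-CI-Coverage fail with a 15s timeout.

On AI Studio the static-graph unit tests take about 12s; the CI environment is slower.

Could you advise whether the unit-test timeout can be extended by editing CMakeLists.txt, and which part should be changed? 60s should be enough.

Many thanks! :)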

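p.s. 2023-03-29 CI status update

Checking the CI runs today, I found that the unit tests fail. On closer inspection, the cause appears to be the scipy version.

The CI environment has scipy==1.7.3, which mishandles float32 input: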

  import numpy as np
  import scipy.stats
  p = np.array(0.3, dtype='float32')
  rv = scipy.stats.bernoulli(p)
  print(rv.mean())    
  # 0.0

Other versions, such as scipy==1.7.1 or the latest 1.10.x, give the correct result:

  import numpy as np
  import scipy.stats
  p = np.array(0.3, dtype='float32')
  rv = scipy.stats.bernoulli(p)
  print(rv.mean())    
  # 0.30000001192092896

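As a result, np.testing.assert_allclose in the unit tests fails, and CI cannot pass.

Similar reports already exist in the scipy issue tracker:
BUG: scipy.stats.beta and bernoulli fails with float32 inputs #15961
BUG: Overflow when using stats.beta.mean #16478

Also, 1.7.3 is no longer in the version list of the official scipy documentation. Could the CI environment be updated and the tests rerun?

Thanks for your attention!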


paddle-bot · 2023-03-28T11:24:05Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

cxxly · 2023-04-03T03:52:55Z

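Upgrading the base scipy version may be troublesome. Suggestion: 1) try changing the scipy version in Paddle/python/unittest_py/requirements.txt; if that causes widespread CI failures, then 2) implement a Bernoulli in NumPy as the test baseline.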
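
Suggestion 2) could look roughly like the sketch below. It is a hypothetical pure-NumPy baseline, not the `BernoulliNumpy` class from this PR's tests: the class body, method set, and the choice to compute in float64 and cast back on return are assumptions made for illustration.

```python
import numpy as np


class BernoulliNumpy:
    """Hypothetical NumPy reference for a Bernoulli(p) distribution."""

    def __init__(self, probs):
        probs = np.asarray(probs)
        # Remember the caller's dtype, but compute in float64 throughout.
        self.dtype = probs.dtype
        self.probs = probs.astype("float64")

    def mean(self):
        return self.probs.astype(self.dtype)

    def variance(self):
        return (self.probs * (1.0 - self.probs)).astype(self.dtype)

    def prob(self, value):
        value = np.asarray(value, dtype="float64")
        # p**x * (1 - p)**(1 - x) for x in {0, 1}
        out = self.probs**value * (1.0 - self.probs) ** (1.0 - value)
        return out.astype(self.dtype)
```

Because no scipy call is involved, the result does not depend on which scipy version the CI image installs.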

megemini · 2023-04-03T03:57:19Z

Got it! I'll try the first approach first. Thanks! :)


luotao1 · 2023-04-03T06:32:55Z

> 1) try changing the scipy version in Paddle/python/unittest_py/requirements.txt

If CI has no problems with it, I suggest opening a separate PR for the scipy version upgrade.

luotao1 · 2023-04-03T06:40:01Z

python/unittest_py/requirements.txt

@@ -8,7 +8,7 @@ hypothesis
 opencv-python<=4.2.0.32
 visualdl
 paddle2onnx>=0.9.6
-scipy>=1.6
+scipy>=1.6, !=1.7.2, !=1.7.3


  2023-04-03 14:06:08 Collecting scipy!=1.7.2,!=1.7.3,>=1.6
  2023-04-03 14:06:09 Using cached scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)

Could this simply be written as 1.10.1?

> Could this simply be written as 1.10.1?

Yes. AI Studio runs 1.10.1, and the tests pass there.

But that would tie the requirement to a Python version, and it would need editing again whenever scipy is updated, so I wrote it the way I did.

megemini · 2023-04-03T06:44:48Z

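> If CI has no problems with it, I suggest opening a separate PR for the scipy version upgrade.

With this file changed, PR-CI-Py3 now passes, but the PR-CI-Coverage environment is slower and the static-graph tests time out there. Could the timeout be changed?
Do you mean a separate PR that changes unittest_py/requirements.txt? That is, first open a PR for the scipy version, and after it is merged, pull the new version and submit the PR for this API, so that this API PR no longer has to change the scipy version?

Thanks!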

luotao1 · 2023-04-03T06:50:45Z

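> But that would tie the requirement to a Python version, and it would need editing again whenever scipy is updated, so I wrote it the way I did.

You can write >=1.10.1.

> Do you mean a separate PR that changes unittest_py/requirements.txt? That is, first open a PR for the scipy version, and after it is merged, pull the new version and submit the PR for this API, so that this API PR no longer has to change the scipy version?

Yes: first open a separate PR that changes the scipy version in unittest_py/requirements.txt (a change to the CI environment needs other reviewers to look at it), and update this API PR after it is merged. In the meantime you can keep the scipy version change in this PR as well, so that @cxxly can continue reviewing; otherwise this PR's CI cannot pass.

> but the PR-CI-Coverage environment is slower and the static-graph tests time out there. Could the timeout be changed?

@cxxly, please check whether the timeout can be changed, or whether the way the unit tests are written can be improved.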

megemini · 2023-04-03T07:04:15Z

Got it. I'll open a PR to change the scipy version first.

megemini · 2023-04-03T10:41:02Z

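The PR has been submitted: #52476

I did not write scipy>=1.10.1 there, because scipy requires Python 3.8 from version 1.8.0 onward and some CI environments still run Python 3.7. To stay compatible, I kept scipy>=1.6, !=1.7.2, !=1.7.3.
PR-CI-Mac-Python3 did not pass, with the same problem as before. The log does not show which scipy version was installed; is this CI job unaffected by unittest_py/requirements.txt?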
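
To make the effect of the specifier `scipy>=1.6, !=1.7.2, !=1.7.3` concrete, here is a hand-rolled sketch of which releases it admits. It is a deliberate simplification that handles plain numeric release strings only and does not implement full PEP 440 matching; `parse_release` and `scipy_version_allowed` are hypothetical helper names.

```python
def parse_release(version):
    """Turn '1.7.3' into (1, 7, 3). Simplified: numeric releases only."""
    return tuple(int(part) for part in version.split("."))


def scipy_version_allowed(version):
    """Hand-rolled check mirroring `scipy>=1.6, !=1.7.2, !=1.7.3`."""
    release = parse_release(version)
    excluded = {(1, 7, 2), (1, 7, 3)}
    return release >= (1, 6) and release not in excluded


# The versions mentioned in this thread.
for candidate in ("1.7.1", "1.7.3", "1.10.1"):
    print(candidate, scipy_version_allowed(candidate))
```

Under this specifier, pip remains free to pick an older allowed release on an interpreter that cannot install the newest scipy, which is the compatibility argument made above.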

luotao1 · 2023-04-04T03:39:06Z

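> PR-CI-Mac-Python3 did not pass, with the same problem as before. The log does not show which scipy version was installed; is this CI job unaffected by unittest_py/requirements.txt?

For Mac, unittest_py/requirements.txt is set in the CI configuration, so just rerun the job once the PR is merged.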


megemini · 2023-04-06T03:22:24Z

Only two CI checks remain now, and both require manual approval.

Please review, thanks!

cxxly · 2023-04-07T02:49:05Z

python/paddle/distribution/bernoulli.py

+    'float32': 1e-03,
+    'float64': 1e-05,
+}
+


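Would paddle.finfo(paddle.float32).eps and paddle.finfo(paddle.float64).eps meet the requirement? If not, the eps values here need a comment explaining why.

That is how it should be written, but there are a few problems:

In paddle I can currently only find iinfo, not finfo.

If finfo exists and matches numpy's eps, the unit tests will not pass, because np.testing.assert_allclose in the unit tests takes its tolerance from config:

  RTOL = {
      'float32': 1e-03,
      'complex64': 1e-3,
      'float64': 1e-5,
      'complex128': 1e-5,
  }

https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/tests/unittests/distribution/config.py#L25
and numpy's values differ from these:

  print(np.finfo('float32').eps, np.finfo('float64').eps)
  # 1.1920929e-07 2.220446049250313e-16

How should this be handled?

1) paddle develop includes finfo.
2) Isn't eps here only used for clipping? I don't quite see how it relates to the allclose tolerance.
3) paddle.finfo and numpy finfo give the same values.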
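
The clipping role of eps mentioned in point 2) can be sketched as follows. The example uses `np.finfo` as a stand-in for `paddle.finfo` (point 3 says the two agree); `clip_probs` and `probs_to_logits` are illustrative helpers that assume a floating-point input, not the PR's actual functions.

```python
import numpy as np


def clip_probs(probs):
    """Clip probabilities from [0, 1] into [eps, 1 - eps] for their own dtype."""
    probs = np.asarray(probs)
    # Machine epsilon of the input dtype (raises for non-float dtypes).
    eps = np.finfo(probs.dtype).eps
    return np.clip(probs, eps, 1 - eps)


def probs_to_logits(probs):
    """Binary logit: log(p) - log(1 - p)."""
    return np.log(probs) - np.log1p(-probs)


clipped = clip_probs(np.array([0.0, 0.5, 1.0], dtype="float32"))
# Without clipping, the endpoints 0 and 1 would map to -inf and +inf.
print(probs_to_logits(clipped))
```

The eps used here bounds the probabilities away from 0 and 1; it is independent of the rtol that `assert_allclose` uses when comparing results.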

1) I've found finfo in develop, thanks!
2) I looked again at the test case that failed because of eps. The cause appears to be a slight difference between scipy's entropy method and paddle's implementation. Suppose probs=1.0:

  import numpy as np
  import paddle

  # --------------------------
  # scipy
  def entr(x):
      return -x*np.log(x)

  def _entropy_scipy(p):
      return entr(p) + entr(1-p)

  # --------------------------
  # paddle
  def _probs_to_logits(probs, is_binary=False):
      return ((paddle.log(probs) - paddle.log1p(-probs))
              if is_binary else paddle.log(probs))

  def _entropy_paddle(logits, probs):
      return paddle.max(logits, 0)-logits*probs+paddle.log(1+paddle.exp(-paddle.abs(logits)))

  # --------------------------
  # compare scipy with paddle
  probs = (1.0-np.finfo('float32').eps).astype('float32')
  logits = _probs_to_logits(paddle.to_tensor(probs, dtype='float32'), True)

  entropy_np = _entropy_scipy(probs)
  entropy_paddle = _entropy_paddle(logits, probs)

  print(entropy_np)
  print(entropy_paddle.numpy())
  # 2.0196896973703793e-06
  # [2.026558e-06]

Asserting with allclose at this point fails:

  np.testing.assert_allclose(
      entropy_np,
      entropy_paddle,
      rtol=1e-03,
      atol=0,
  )
  # AssertionError:
  # Not equal to tolerance rtol=0.001, atol=0
  # Mismatched elements: 1 / 1 (100%)
  # Max absolute difference: 6.8682766e-09
  # Max relative difference: 0.00338913
  #  x: array(2.01969e-06)
  #  y: array([2.026558e-06], dtype=float32)

So I'll change the scipy-based entropy to use the same algorithm as paddle; that should resolve it.

Thanks! :)
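
The two entropy formulas compared above are algebraically the same quantity; only their rounding behaviour differs. The NumPy sketch below restates both and checks that they agree in float64 at moderate probabilities. `entropy_direct` and `entropy_from_logits` are illustrative names, and NumPy stands in for both scipy and paddle here.

```python
import numpy as np


def entropy_direct(p):
    """-p*log(p) - (1-p)*log(1-p), the textbook form."""
    return -p * np.log(p) - (1 - p) * np.log(1 - p)


def entropy_from_logits(p):
    """The same quantity written in a cross-entropy-with-logits form."""
    logits = np.log(p) - np.log1p(-p)
    return (
        np.maximum(logits, 0)
        - logits * p
        + np.log1p(np.exp(-np.abs(logits)))
    )


p = np.array([0.1, 0.3, 0.5, 0.9])  # float64
print(np.allclose(entropy_direct(p), entropy_from_logits(p)))
```

Rerunning the comparison in float32 with p within one eps of 1 is the regime where the thread observed a relative difference of about 3.4e-3, which is why the reference was switched to the same formula as the implementation under test.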

cxxly · 2023-04-07T02:51:54Z

python/paddle/distribution/bernoulli.py

+
+    Args:
+        probs (float|list|tuple|numpy.ndarray|Tensor): The ``probs`` input of Bernoulli distribution. The data type is float32 or float64. The range must be in [0, 1].
+        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.


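There is no need to support list/tuple/numpy.ndarray; supporting scalar/Tensor is enough.

I'll change it to float|Tensor, consistent with _to_tensor; including complex numbers wouldn't be meaningful anyway.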

cxxly · 2023-04-07T02:52:29Z

python/paddle/distribution/bernoulli.py

+        # Clip probs from [0, 1] to (0, 1) with smallest representable number `eps`.
+        self.probs = _clip_probs(self.probs)
+        self.logits = self._probs_to_logits(self.probs, is_binary=True)
+


Same as above: supporting scalar/Tensor is enough, which keeps the code simpler.

cxxly · 2023-04-07T02:53:40Z

python/paddle/distribution/bernoulli.py

+            Tensor: Sampled data with shape `sample_shape` + `batch_shape` + `event_shape`.
+            The shape of the sampled data have ndim lager than 1.
+
+        Examples:


A blank line is probably needed here; please confirm.

cxxly · 2023-04-07T02:54:42Z

python/paddle/distribution/bernoulli.py

+                'shape',
+                (np.ndarray, tensor.Variable, list, tuple),
+                name,
+            )


cxxly · 2023-04-07T02:55:04Z

python/paddle/distribution/bernoulli.py

+            Tensor: Sampled data with shape `sample_shape` + `batch_shape` + `event_shape`.
+            The shape of the sampled data have ndim lager than 1.
+
+        Examples:


cxxly · 2023-04-07T02:55:19Z

python/paddle/distribution/bernoulli.py

+                'temperature',
+                (float,),
+                name,
+            )


cxxly · 2023-04-07T02:57:27Z

python/paddle/distribution/bernoulli.py

+        shape = shape if isinstance(shape, tuple) else tuple(shape)
+        shape = self._extend_shape(shape)
+
+        temperature = paddle.full(shape=[1], fill_value=temperature)


Paddle already supports 0-D Tensors: paddle.full(shape=(), fill_value=...)

0-D support doesn't seem solid yet. If this is changed to 0-D, the paddle.divide that follows raises an error, for example:

  paddle.divide(paddle.to_tensor([0.3]), paddle.full(shape=[1], fill_value=0.1))
  # Tensor(shape=[1], dtype=float32, place=Place(gpu:0), stop_gradient=True, [3.])

  paddle.divide(paddle.to_tensor([0.3]), paddle.full(shape=(), fill_value=0.1))
  # ---------------------------------------------------------------------------
  # ValueError                                Traceback (most recent call last)
  # /tmp/ipykernel_1677/1739005628.py in <module>
  # ----> 1 paddle.divide(paddle.to_tensor([0.3]), paddle.full(shape=(), fill_value=0.1))
  # /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/math.py in divide(x, y, name)
  #     716             act = None
  #     717             if in_dygraph_mode():
  # --> 718                 return _C_ops.divide( x, y)
  #     719             else:
  #     720                 if _in_legacy_dygraph():
  # ValueError: (InvalidArgument) Axis should be less than 1, but received axis is 1.
  #   [Hint: Expected axis < max_dim, but received axis:1 >= max_dim:1.] (at /paddle/paddle/phi/kernels/funcs/common_shape.h:53)

Since to_tensor produces a 1-D tensor, could it also be replaced with full, or reshaped to 0-D?

I just retried on the develop build, and many of the earlier 0-D operation problems are gone. I'll rework the overall code logic. Thanks! :)
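
For readers less familiar with 0-D shapes, the following sketch shows the broadcasting behaviour being asked for, using NumPy as an analogue. It only illustrates shape semantics and makes no claim about any particular Paddle build; the variable names are illustrative.

```python
import numpy as np

one_d = np.full(shape=[1], fill_value=0.1)  # shape (1,)
zero_d = np.full(shape=(), fill_value=0.1)  # shape (), a 0-D array

numerator = np.array([0.3])

# A 0-D operand broadcasts against any shape; the result keeps the other shape.
by_one_d = np.divide(numerator, one_d)
by_zero_d = np.divide(numerator, zero_d)

# Two 0-D operands give a 0-D result.
both_zero_d = np.divide(np.full((), 0.3), zero_d)

print(zero_d.ndim, by_zero_d.shape, np.shape(both_zero_d))
```

With this behaviour, a 0-D temperature divides a sample of any shape without first being reshaped to [1].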

cxxly · 2023-04-07T03:02:12Z

python/paddle/fluid/tests/unittests/distribution/test_distribution_bernoulli.py

+                        temperature,
+                    )
+                )
+


The backward pass of rsample needs to be tested.
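
A backward test for a reparameterized sampler checks that gradients flow from the sample back to the distribution parameters. The sketch below uses NumPy and finite differences in place of Paddle autograd. The sample form, (logits + logistic noise) / temperature, is an assumption chosen for illustration and is not taken from this PR's code; `relaxed_sample` and `finite_difference_grad` are hypothetical names.

```python
import numpy as np


def relaxed_sample(logits, uniforms, temperature):
    """Assumed form: (logits + logistic noise) / temperature."""
    noise = np.log(uniforms) - np.log1p(-uniforms)
    return (logits + noise) / temperature


def finite_difference_grad(logits, uniforms, temperature, step=1e-6):
    """Central-difference estimate of d(sum of samples) / d(logits)."""
    up = relaxed_sample(logits + step, uniforms, temperature).sum()
    down = relaxed_sample(logits - step, uniforms, temperature).sum()
    return (up - down) / (2 * step)


rng = np.random.default_rng(0)
uniforms = rng.uniform(0.01, 0.99, size=5)  # noise is held fixed
grad = finite_difference_grad(0.2, uniforms, temperature=0.5)

# With the noise fixed, each sample is linear in logits with slope
# 1 / temperature, so the summed gradient is size / temperature.
print(grad, uniforms.size / 0.5)
```

In a real backward test, the finite-difference value would be replaced by the gradient reported by the framework after calling backward on the sample.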

cxxly · 2023-04-07T03:04:04Z

> @cxxly, please check whether the timeout can be changed, or whether the way the unit tests are written can be improved.

Yes, the timeout can be changed, but the cause needs to be pinned down and explained.

megemini · 2023-04-07T04:17:04Z

> Yes, the timeout can be changed, but the cause needs to be pinned down and explained.

No change is needed after all. I've reworked the static-graph test flow, so it no longer times out. Thanks! :)


luotao1 · 2023-04-10T08:05:34Z

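@megemini If all comments from the previous round have been addressed, please leave a comment to let us know, so we can start the next round of review.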

megemini · 2023-04-10T09:51:09Z

All done, please review!

Thanks!

cxxly

LGTM

jeff41404

LGTM

luotao1 · 2023-04-11T08:51:50Z

@megemini Please attach the link to the Chinese documentation.

sunzhongkai588

LGTM

megemini · 2023-04-11T12:21:43Z

PR link: #5794

Please review, thanks! :)

megemini added 3 commits March 28, 2023 10:46

[Hackathon 4th No.13] Add a Bernoulli API to Paddle

db397dd

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

bb4d4d4

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

d8fc566

paddle-bot bot added contributor External developers status: proposed labels Mar 28, 2023

megemini mentioned this pull request Mar 28, 2023

[PaddlePaddle Hackathon 4th] Task overview #51281

Closed

luotao1 assigned luotao1, cxxly and Ligoml Mar 29, 2023

luotao1 added the API label Mar 29, 2023

paddle-bot bot removed the status: proposed label Mar 29, 2023

megemini added 2 commits April 3, 2023 05:30

[Change]change unittest_py scipy version

92d96e3

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

459853f

luotao1 reviewed Apr 3, 2023

megemini mentioned this pull request Apr 3, 2023

[Change] Update the scipy version in python unittest_py #52476

Merged

megemini added 2 commits April 5, 2023 13:18

[Change] Change the dtype parameter of BernoulliNumpy; streamline the static-graph test flow

9854c3a

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

fca01c8

cxxly reviewed Apr 7, 2023

python/paddle/distribution/bernoulli.py

+                'temperature',
+                (float,),
+                name,
+            )

cxxly · Apr 7, 2023

Same as above.

megemini reacted with thumbs up emoji

cxxly reviewed Apr 7, 2023

megemini added 3 commits April 8, 2023 03:10

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

7db16a1

[Change] Improve class initialization and logic; add 0-D test cases

1f72f79

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into develop

9068827

cxxly approved these changes Apr 10, 2023

jeff41404 approved these changes Apr 10, 2023

sunzhongkai588 approved these changes Apr 11, 2023

megemini mentioned this pull request Apr 11, 2023

[Hackathon 4th No.13] Add a Bernoulli API to Paddle -- Bernoulli doc PaddlePaddle/docs#5794

Merged

luotao1 merged commit f05c870 into PaddlePaddle:develop Apr 12, 2023

luotao1 mentioned this pull request Apr 12, 2023

Add bernoulli.py to paddle.distribution #50687

Closed

luotao1 mentioned this pull request Apr 26, 2023

How to implement the Bernoulli distribution in paddle.distribution? #50649

Closed
