Add RFC for batchnorm1d optim #135

Merged
merged 8 commits into from
Jul 13, 2022

EsdeathYZH
Contributor

No description provided.

@paddle-bot-old

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please check that its format and content are complete. For details, refer to the Template and Demo.

@CLAassistant

CLAassistant commented May 22, 2022

CLA assistant check
All committers have signed the CLA.

@paddle-bot-old

The format inspection passed. Your PR will be reviewed by Paddle experts and developers from the open-source community. Please keep an eye on the PR for updates.

Contributor

@JamesLim-sy JamesLim-sy left a comment

Good work!

```
batch_norm_out = batch_norm(x)
```

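A simple end-to-end experiment shows that when the data size is small, both torch and paddle use cuDNN for the computation. Once the data size exceeds a certain threshold, torch switches to its own operator, which gives it lower latency than paddle. Profiling further with Nsight Compute: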
Contributor

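A question about this part: it says that when the data size is small, both torch and paddle use cuDNN. Is the performance of Paddle and torch the same in that case? And how do they compare with oneflow, which is mentioned later in the document? I would like to confirm which is faster for torch: calling cuDNN or using its own hand-written kernel.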

Contributor Author

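The shape I tested here was [200000, 1, 3]. With this shape, both paddle and torch use cuDNN and their performance is similar. After I changed the shape to [136000, 16], torch called its own kernel even at this small data size, and it performed better than cuDNN. I have not tested oneflow yet; I will measure it and add the results to this document.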
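To make the two test shapes easier to compare, here is a minimal sketch of how a 1-D batch norm would interpret them. It assumes the usual `[N, C]` or `[N, C, L]` layout with statistics taken per channel over all other axes; `bn1d_reduction_shape` is a hypothetical helper written for this illustration, not part of paddle or torch.

```python
from math import prod


def bn1d_reduction_shape(shape):
    """Return (channels, values reduced per channel) for a 1-D batch norm input.

    Assumption: the input is laid out as [N, C] or [N, C, L], and the mean and
    variance are computed per channel over every other axis.
    """
    if len(shape) not in (2, 3):
        raise ValueError("expected a 2-D or 3-D input shape")
    channels = shape[1]
    return channels, prod(shape) // channels


# The two shapes mentioned in the comment above.
for shape in ([200000, 1, 3], [136000, 16]):
    channels, per_channel = bn1d_reduction_shape(shape)
    print(shape, "->", channels, "channel(s),", per_channel, "values per channel")
```

Under this assumption the first shape has a single channel with 600000 values to reduce, while the second has 16 channels with 136000 values each, so the two cases put quite different workloads on a kernel.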


## Naming and parameter design
Reference: [Paddle API design and naming guidelines](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/api_design_guidelines_standard_cn.html)
## Underlying OP design

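Since bn is used in many models, modifying the original operator directly would require validating accuracy across a large number of models, so I suggest:

  1. First fix the current bn so that it supports large batch_size.
  2. Add a new batch_norm under the paddle/phi/sparse directory that supports the bn1d case, with performance that matches or exceeds competing frameworks where possible.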
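A separately added kernel still needs some reference to check its output against. The following is a minimal pure-Python sketch of training-mode batch normalization over an `[N, C]` input, without the learned scale and shift. The function names and the `epsilon=1e-5` default are assumptions made for illustration, and this is not the code proposed in the RFC.

```python
def batch_norm_1d_reference(x, epsilon=1e-5):
    """Normalize each column of a [N, C] list of rows to zero mean, unit variance.

    Assumptions: training mode, biased variance (divide by N), and no learned
    scale or shift. The epsilon default is an assumed value.
    """
    n = len(x)
    c = len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        column = [row[j] for row in x]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        inv_std = (var + epsilon) ** -0.5
        for i in range(n):
            out[i][j] = (column[i] - mean) * inv_std
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two [N, C] results."""
    return max(abs(u - v) for ra, rb in zip(a, b) for u, v in zip(ra, rb))


reference = batch_norm_1d_reference([[1.0, 10.0], [3.0, 30.0]])
print(reference)
```

The output of a candidate kernel on the same input could then be passed to `max_abs_diff` together with `reference` and compared against a chosen tolerance.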

@zkh2016 zkh2016 merged commit fe897a1 into PaddlePaddle:master Jul 13, 2022