refine benchmark bert ips stat #8361

Merged
merged 3 commits into from
May 8, 2024

wanghuancoder
Contributor

@wanghuancoder wanghuancoder commented May 6, 2024

PR types

Bug fixes

PR changes

Others

Description

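The benchmark computes the average per-step IPS incorrectly:
The BERT model only blocks the CPU at `loss.numpy()`, which waits for the GPU to finish all queued computation. The existing timing logic therefore misses the GPU time spent between `train_run_cost = time.time() - batch_start` and `logger.info(print loss)`.
As a result, the affected step only counts the forward pass in its batch_cost and leaves out the backward pass and the Optimizer.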
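
The timing gap described above can be reproduced without a GPU. The following is a minimal sketch, not the benchmark's actual code: a single background thread stands in for the GPU stream, and `launch_step`, `time_step`, and the 0.2 s workload are hypothetical names and numbers chosen for illustration. Waiting on the future plays the role that `loss.numpy()` plays in the description.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a GPU stream: work is queued and runs in the background.
_device = ThreadPoolExecutor(max_workers=1)


def launch_step(compute_seconds):
    """Queue one simulated training step and return immediately,
    the way an asynchronous kernel launch returns before the work is done."""
    return _device.submit(time.sleep, compute_seconds)


def time_step(compute_seconds, sync_before_stop):
    """Measure one step, optionally waiting for the device before stopping the timer."""
    start = time.time()
    pending = launch_step(compute_seconds)
    if sync_before_stop:
        # Plays the role of loss.numpy(): block until the device has finished.
        pending.result()
    elapsed = time.time() - start
    # Drain the queue so the next measurement starts from an idle device.
    pending.result()
    return elapsed


unsynced = time_step(0.2, sync_before_stop=False)
synced = time_step(0.2, sync_before_stop=True)
print(f"timer stopped before sync: {unsynced:.3f}s")
print(f"timer stopped after sync:  {synced:.3f}s")
```

In this sketch the unsynchronized measurement covers only the launch cost, while the synchronized one covers the full simulated compute time, which is the same kind of undercount the description attributes to the affected steps.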

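In this situation, if log_step==1, the measurement for every step is wrong. If log_step==10, the first step in every 10 is wrong and the other 9 are correct. This makes our reported IPS too high.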
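
To get a feel for how much log_step changes the overestimate, here is a rough back-of-the-envelope model. It assumes that exactly one step per logging window is timed with only its forward pass and that the remaining steps are timed in full. The function names, the batch size, and the step times are hypothetical and are not taken from the benchmark.

```python
def reported_ips(batch_size, full_step_s, forward_only_s, log_step):
    """IPS over one logging window in which a single step is undercounted.

    Simplified model: (log_step - 1) steps are timed in full and one step
    is timed as forward-only, so the window looks shorter than it was.
    """
    window_time = (log_step - 1) * full_step_s + forward_only_s
    return batch_size * log_step / window_time


def true_ips(batch_size, full_step_s):
    """IPS when every step is timed in full."""
    return batch_size / full_step_s


# Hypothetical numbers: 32 samples per step, 100 ms per full step,
# of which 30 ms is the forward pass.
batch, full, fwd = 32, 0.100, 0.030
for n in (1, 10, 100):
    ratio = reported_ips(batch, full, fwd, n) / true_ips(batch, full)
    print(f"log_step={n:>3}: reported/true IPS = {ratio:.3f}")
```

Under this model the inflation is largest at log_step==1, where every step is undercounted, and shrinks as log_step grows, because the single undercounted step is averaged over more correctly timed ones.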

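However, Torch's statistics have this problem as well. Since IPS is mostly used to compare performance against competing products, the issue is left unmodified, and a Note is added to flag it.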

paddle-bot bot commented May 6, 2024

Thanks for your contribution!

codecov bot commented May 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.36%. Comparing base (2273ee7) to head (e083057).
Report is 58 commits behind head on develop.

❗ Current head e083057 differs from pull request most recent head 6962b99. Consider uploading reports for the commit 6962b99 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8361      +/-   ##
===========================================
+ Coverage    55.15%   55.36%   +0.21%     
===========================================
  Files          601      614      +13     
  Lines        91764    96016    +4252     
===========================================
+ Hits         50611    53164    +2553     
- Misses       41153    42852    +1699     

Collaborator

@wawltor wawltor left a comment

LGTM

@wawltor wawltor merged commit 18e5cee into PaddlePaddle:develop May 8, 2024
6 of 9 checks passed
2 participants