PR types
Bug fixes
PR changes
Others
Description
The benchmark computes the average IPS per step incorrectly:
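The BERT model only blocks the CPU at `loss.numpy()`, where the CPU waits for the GPU to finish all queued computation. Because of this, the original timing logic misses the GPU time spent between `train_run_cost = time.time() - batch_start` and `logger.info` (which prints the loss). As a result, the affected step's `batch_cost` covers only the forward pass, not the backward pass or the optimizer update.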
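The effect can be reproduced without a GPU. The sketch below is a minimal illustration, assuming a hypothetical `FakeAsyncDevice` class (not a Paddle API) that stands in for a GPU stream: `submit()` queues work and returns immediately, like an asynchronous kernel launch, and `synchronize()` blocks until the queue drains, playing the role of `loss.numpy()`. All step durations are made-up numbers. A clock read before the blocking call misses the work still in the queue, just as `train_run_cost` does.

```python
import queue
import threading
import time


class FakeAsyncDevice:
    """Hypothetical stand-in for a GPU stream (not a Paddle API).

    submit() enqueues work and returns at once, like an asynchronous
    kernel launch; a background thread "executes" it by sleeping.
    """

    def __init__(self):
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def _run(self):
        while True:
            seconds = self._pending.get()
            time.sleep(seconds)  # pretend this is device compute
            self._pending.task_done()

    def submit(self, seconds):
        # Non-blocking: the CPU moves on immediately.
        self._pending.put(seconds)

    def synchronize(self):
        # Blocks the CPU until all queued work has finished,
        # the role loss.numpy() plays in the BERT benchmark.
        self._pending.join()


def train_step(device):
    # Made-up durations, in seconds.
    device.submit(0.02)  # forward
    device.submit(0.04)  # backward
    device.submit(0.01)  # optimizer update


device = FakeAsyncDevice()
batch_start = time.time()
train_step(device)
unsynced_cost = time.time() - batch_start  # read before any blocking call
device.synchronize()                       # e.g. loss.numpy() before logging
synced_cost = time.time() - batch_start    # includes all queued work

print(f"unsynced: {unsynced_cost:.3f}s, synced: {synced_cost:.3f}s")
```

On a typical machine the unsynchronized reading is close to zero, while the synchronized one is roughly 0.07 s or more. A per-step timing only covers all of the step's work when the clock is read after a blocking call or an explicit device synchronization.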
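In this situation, if `log_step == 1`, every step is mis-measured. If `log_step == 10`, the first step in every 10 is mis-measured and the remaining 9 are measured correctly. This inflates our reported IPS.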
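A small worked example shows why the bias is upward. It assumes (an assumption, not stated above) that the reported IPS is `batch_size` divided by the mean measured step time, and that a mis-measured step records only its forward time. The helper `reported_ips` and all numbers are hypothetical.

```python
def reported_ips(batch_size, full_step_s, forward_only_s, log_step):
    """Hypothetical helper: IPS over one logging window of `log_step` steps
    in which the first step is timed with its forward pass only and the
    other steps are timed correctly."""
    if log_step < 1:
        raise ValueError("log_step must be >= 1")
    measured_total = forward_only_s + (log_step - 1) * full_step_s
    mean_step_s = measured_total / log_step
    return batch_size / mean_step_s


batch_size = 32        # made-up
full_step_s = 0.10     # forward + backward + optimizer, made-up
forward_only_s = 0.03  # forward only, made-up

true_ips = batch_size / full_step_s  # 320.0
print(round(reported_ips(batch_size, full_step_s, forward_only_s, 1), 2))   # 1066.67
print(round(reported_ips(batch_size, full_step_s, forward_only_s, 10), 2))  # 344.09
```

With `log_step == 1` every step is undercounted, so the reported figure lands far above the true 320 samples/s; with `log_step == 10` only one step in ten is affected, so the overestimate is smaller but still present.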
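However, the Torch benchmark's timing has the same problem, and IPS is mainly used to compare performance against competing frameworks. This PR therefore leaves the timing logic unchanged and only adds a Note documenting the issue.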