[Docs] Add zb vpp implementation design document #194
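Once memory usage is known, we can schedule tasks based on it. During scheduling, we insert tasks wherever memory usage allows, which lowers the bubble rate. Inserting a task involves the following steps: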
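1. Before inserting a task, check whether the insertion would exceed the memory limit: does the card's current memory usage + the task's maximum memory usage exceed the limit?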
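The card's current memory usage can be read in real time with Paddle's memory API, which reduces the impact of an inaccurate memory estimation tool on the final schedule. https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/device/cuda/memory_allocated_cn.html#memory-allocated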
The memory usage here is estimated from the program, so this API probably cannot be used during the pipeline scheduling stage.
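A minimal sketch of the step 1 check discussed in this thread, assuming memory is tracked in bytes. `Task`, `peak_memory`, and `can_insert` are hypothetical names used only for illustration, not identifiers from the design document. Whether `current_usage` comes from a program-based estimate or from a runtime query is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical schedulable unit, e.g. one forward or backward chunk."""
    name: str
    peak_memory: int  # estimated maximum bytes the task needs while it runs


def can_insert(current_usage: int, task: Task, memory_limit: int) -> bool:
    """Return True if inserting `task` keeps the card within `memory_limit`.

    Implements the rule: current usage + task's maximum usage must not
    exceed the limit (reaching the limit exactly is still allowed).
    """
    return current_usage + task.peak_memory <= memory_limit
```

For example, with a limit of 10 units and 6 already in use, a task that peaks at 3 units fits, while the same task does not fit once 8 units are in use.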
Specifically, we can add the following class, whose main job is to estimate memory usage from the program.
```python
class ZBVPPMemoryEstimator:
    ...
```
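Memory estimation can be implemented on its own. Once it is done, run a real model and compare the estimated values against the actual ones to see how large the error is.
@From00 I rewrote the design of the memory estimation class. Could you check whether it is feasible? If so, the next step is to implement this class first and evaluate it on a real model.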
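One way the estimated-versus-actual comparison could be quantified is a relative error. This is only a sketch: `relative_error` is a hypothetical helper, and how the measured value is obtained during a real run is left open.

```python
def relative_error(estimated_bytes: float, measured_bytes: float) -> float:
    """Relative error of an estimate against a measured peak, as a fraction.

    `measured_bytes` must be positive; it is the reference value.
    """
    if measured_bytes <= 0:
        raise ValueError("measured_bytes must be positive")
    return abs(estimated_bytes - measured_bytes) / measured_bytes
```

An estimate of 110 units against a measured 100 units gives a relative error of 0.1, that is, 10%.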
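1. Every card repeats the task scheduling and picks out its own tasks according to pp_stage
2. Card 0 runs the task scheduling once and broadcasts the result to the other cards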
The drawback of the first option is that every card has to run the task scheduling, which wastes compute.
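Static graph mode schedules once and then executes for many rounds, and the overall complexity of the scheduling algorithm is low, so computing the schedule on every card will not affect performance much.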
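A minimal sketch of the first option, assuming the scheduler is deterministic so that every card computes an identical result without communication. `build_schedule` is a hypothetical stand-in that simply orders all forward tasks before all backward tasks. It is not the zb-vpp algorithm; it only illustrates how a card selects its own tasks by `pp_stage`.

```python
from typing import Dict, List


def build_schedule(num_stages: int, num_microbatches: int) -> Dict[int, List[str]]:
    """Placeholder scheduler: same inputs always yield the same schedule."""
    return {
        stage: [f"F{m}" for m in range(num_microbatches)]
        + [f"B{m}" for m in range(num_microbatches)]
        for stage in range(num_stages)
    }


def local_tasks(pp_stage: int, num_stages: int, num_microbatches: int) -> List[str]:
    """Each card runs the full scheduling, then keeps only its own stage."""
    schedule = build_schedule(num_stages, num_microbatches)
    return schedule[pp_stage]
```

Because the schedule is computed once and then reused for every execution round, the repeated computation is paid only at setup time.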
```python
def try_v_schedule(fill_f, fill_b):
    ...
```
I'd suggest writing a separate blog post to introduce this scheduling scheme, with some intuitive diagrams.
I wrote a blog post walking through the zb-vpp source code back when I was reading it. I'll add some diagrams to it later.
https://space.keter.top/docs/high_performance/%E5%88%86%E5%B8%83%E5%BC%8F/zero-bubble-vpp