implement flow.utils.checkpoint #9053
Signed-off-by: daquexian <daquexian566@gmail.com>
python/oneflow/utils/checkpoint.py
Outdated
    return None

def inner_unpack(packed):
    raise RuntimeError("You are calling backwards on a tensor that is never exposed. Please open an issue.")
Does this imply that second-order derivatives cannot be handled?
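No. Only inner_pack is ever actually called here, but the API requires both a pack hook and an unpack hook, so I wrote a do-nothing function as a placeholder.
As far as I can tell, this API still works under second-order differentiation, but it really is not suited to it: a second-order derivative requires differentiating twice, and recomputation is already triggered during the first differentiation and occupies GPU memory, so it cannot reduce memory usage during the second one. In addition, the second derivative is taken over the backward graph of the first derivative, and that part of the computation graph is not exposed to the user, so in principle this checkpointing API has no way to control it. Only a recomputation scheme in the style of DTR can.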
OK.
Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.
View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9053/
Follow-up to #8995.
Ports PyTorch's checkpointing implementation for use in eager mode.