graph activation checkpointing #6192

Merged
merged 20 commits into from
Sep 10, 2021

Conversation

tingkuanpei
Contributor

@tingkuanpei tingkuanpei commented Sep 7, 2021

In nn.Graph:

  • Add unit tests for activation checkpointing
  • Automatically insert identity ops

@strint strint changed the title from "Add test_graph_activation_checkpoint.py" to "graph activation checkpointing" Sep 9, 2021
strint and others added 5 commits September 9, 2021 19:29
* Primitive (#6183)

* Add Primitive

* #ifdef WITH_CUDA

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Disable implicit boxing when parallel num eq one (#6188)

* mv_boxing_folder_to_core

* minor fix

* disable_implicit_boxing_when_parallel_num_eq_one

* Update eager_consistent_op_interpreter.cpp

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Lazy support Scalar (#6181)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Fix LayerNorm check bug (#6196)

* fix(Layernorm): fix check bug

* fix judge whether cpu or not

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* add glu op (#6065)

* add glu op

* del glu_op export,align with torch

* mod glu_op

* mov op logic to C++

* Solve problems

* solve conflict

* delete gradient functor

* add ndim check

* add GLU test

* delete blank line

* format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>

* Primitive based copy task node (#6195)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* KernelState (#6198)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* container_util: fix VectorAt, remove useless MutMapAt (#6172)

* fcontainer_util: fix VectorAt, remove useless MutMapAt

* fcontainer_util: format

* MapAt: add default value version

* format

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Refine StreamContext (#6191)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cpu symetric s to s (#6153)

* mv_boxing_folder_to_core

* minor fix

* cpu_symetric_s_to_s

* add test case

* auto format by CI

* minor fix

* refine

* Update eager_nccl_kernels.cpp

* minor fix

* fix bug

* minor fix

* Update oneflow/user/kernels/eager_nccl_kernels.cpp

Co-authored-by: daquexian <daquexian566@gmail.com>

* Update eager_nccl_kernels.cpp

* Update eager_nccl_kernels.cpp

* minor fix

* Update eager_nccl_kernels.cpp

Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>

* fix bug (#6197)

Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* fix consistent tensor zeros (#6202)

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* [Feat.] nn.Graph support grad acc with input/output tensor (#6155)

* nn.Graph support grad acc with input/output tensor

* dirty pass grad acc

* revert tensor.backward hack

* fix indent

* default S0 -> B

* Pack op/kernel support scalar input

* nn.Graph output pack support loss scalar

* add test script

* pass test

* Lazy build output eager tensors after job complete

* non scalar output test

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Dev eliminate gcc warnings (#6199)

* fix gcc warning

* refine

* fix comment

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* StreamContextAdapter (#6205)

* StreamContextAdapter

* refine

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Autotest generate input tensor (#6206)

* Add tensor yaml, support export tensor functional api.

* refine

* Remove packed functor signature

* remove unused file

* Refine

* refine

* add activation op import

* reinit oneflow init.py

* add oneflow abs and exp

* add oneflow abs and exp

* add acos

* add arccosh

* add more op

* add more ops

* add more op

* add more ops

* add log1p

* add more smaples

* add more ops

* add more ops

* add more ops

* add more ops

* Complete tensor functional apis.

* Fix pybind call

* add more ops

* add ops done

* Add target of_functional_tensor_obj

* Disable throw visibility warnings

* fix target link

* fix

* fix incorrect use of flow.Tensor.

* Fix error merge

* fix

* fix add unittest

* refine

* refine

* fix

* fix

* add tensor doc

* auto format by CI

* refine

* Fix

* Add doc for python function

* refine

* add tensor method docstring

* fix some bug

* fix docs bug

* Fix

* auto format by CI

* Tensor->tensor

* Tensor->tensor

* refine Tensor->tensor

* fix

* fix

* fix

* fix conflict

* fix bug

* fix ci bug

* fix

* delete diag op

* fix conflict

* Fix segment

* fix

* merge

* autotest framework generate input tensor

* autotest framework generate input tensor

* fix bug

* fix impl bug

* refine

* refine

* refine

* fix

* fix

* fix comments

* delete useless

* fix ci error

* fix ci error

Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Cleanup KernelUtil (#6212)

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>

* Rename flow to oneflow in user hint (#6190)

* style(*): rename flow to oneflow in user hint

* fix(*): fix doctest

* auto format by CI

* remove ddp speed test

Signed-off-by: daquexian <daquexian566@gmail.com>

Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>

* merg and refactor

* refact code

* add io identity for activation checkpointing

Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: binbinHan <han_binbin@163.com>
Co-authored-by: cheng cheng <472491134@qq.com>
Co-authored-by: Yinggang Wang <wyg19970408@gmail.com>
Co-authored-by: QiangX-man <87475073+QiangX-man@users.noreply.github.com>
Co-authored-by: Zhenhua <huangzhenhua@zhejianglab.com>
Co-authored-by: Twice <i@twice.moe>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: daquexian <daquexian566@gmail.com>
Co-authored-by: ZZK <42901638+MARD1NO@users.noreply.github.com>
Co-authored-by: Luyang <flowingsun007@163.com>
Co-authored-by: Xiaoyu Zhang <35585791+BBuf@users.noreply.github.com>
Co-authored-by: hjchen2 <chenhoujiangcug@gmail.com>
        with self.scope_context():
            result = self._origin.__class__.forward(self, *args)
        result = self._post_forward_mapping_out_scope(result)
        result = seq_to_func_return(result)

Map the inputs and outputs of forward outside the block scope.

break_with_identity,
"break_activation_checkpointing_with_identity",
*args,
)

Insert identity.

else:
    if self._debug:
        print(
            f"{repr_str} is not a Tensor, {func_desc} transformation will be ignored."

The message the mapping prints in debug mode when an argument is not a Tensor.


Our tensor mapping only supports an arg that is itself a tensor, or tensors inside a list, and no other kind of nesting, right? For example, a dict or a list of lists.
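
The supported argument shapes discussed in this thread can be sketched in plain Python. `FakeTensor`, `map_tensors`, and `add_one` are hypothetical stand-ins, not OneFlow APIs. The sketch assumes only the behavior described above: an arg that is a tensor, or a tensor inside a flat list, is transformed, and everything else (dict, list of lists, scalars) is passed through unchanged, with a message in debug mode.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; not a OneFlow class."""

    value: int


def map_tensors(fn, func_desc, *args, debug=False):
    """Apply fn to tensor args and to tensors inside flat lists only."""

    def map_one(item, repr_str):
        if isinstance(item, FakeTensor):
            return fn(item)
        if debug:
            print(
                f"{repr_str} is not a Tensor, {func_desc} transformation will be ignored."
            )
        return item

    mapped = []
    for idx, arg in enumerate(args):
        if isinstance(arg, list):
            # Only one level of list nesting is handled (assumption from the thread).
            mapped.append([map_one(x, f"arg {idx}[{i}]") for i, x in enumerate(arg)])
        else:
            mapped.append(map_one(arg, f"arg {idx}"))
    return tuple(mapped)


def add_one(t):
    """Toy transformation used only for this illustration."""
    return FakeTensor(t.value + 1)


out = map_tensors(
    add_one, "add_one", FakeTensor(1), [FakeTensor(2), "x"], None, 1, {"k": FakeTensor(3)}
)
```

Here the first argument and the tensor inside the list are transformed, while the string, `None`, the integer, and the dict (including the tensor nested in it) come back untouched.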

-        return list_to_func_return(self._eager_outputs_buffer[0])
+        return seq_to_func_return(self._eager_outputs_buffer[0])

def _rebuild_outputs(self, out2name=None):

Output rebuilding is consolidated into a single function.

op.name,
re.I,
)
is not None

Verify that the recomputation ops were inserted.

        if re.search("identity_.*_grad", str(name), re.I) is not None:
            find_ctrl = True
            print(name)
    test_case.assertTrue(find_ctrl)

Verify that the identity op was inserted, and that the first op of the recomputation segment has the identity grad as a control edge.
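
The name-based check described here can be tried in isolation. This is a minimal sketch that reuses the regular expression from the test fragment above; the op names in the list and the helper `has_identity_grad_ctrl` are made up for illustration and are not taken from a real compiled graph.

```python
import re

# Hypothetical control-in op names, invented for this illustration.
ctrl_in_op_names = ["model.m-matmul-0", "Identity_3_grad", "model.m-relu-1"]


def has_identity_grad_ctrl(names):
    """Return True if any name matches the identity-grad pattern, ignoring case."""
    return any(
        re.search("identity_.*_grad", str(name), re.I) is not None for name in names
    )


found = has_identity_grad_ctrl(ctrl_in_op_names)
```

Because of `re.I`, the capitalized `Identity_3_grad` matches; a list without any identity grad name yields `False`.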

@@ -39,12 +39,14 @@ class CustomGraphIOCheck(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = CustomModuleIOCheck()
        self.m.config.activation_checkpointing = True

    def build(self, t, lt, n):
        rt, rlt, n, ri, rs = self.m(t, lt, n, 1, "2")
        return t, lt, n

Verify that the block's mapping supports tensor / list(tensor) and can ignore other types.

@oneflow-ci-bot oneflow-ci-bot removed their request for review September 9, 2021 17:28
See the License for the specific language governing permissions and
limitations under the License.
"""
import re

This needs import os.


Added.

@strint strint requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 10, 2021 02:47
@strint strint requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 10, 2021 03:00
        self._is_executing_forward = False
        return result

    def _pre_forward_mapping_out_scope(self, *args):
        # Deal with activation checkpointing identity.
        if self.config.activation_checkpointing:

Let's change this to an "or": if either a stage id or checkpointing is configured, insert the identity.


OK, but the test case for that only exists in the next PR, so I'll add the change there.
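
The suggested condition might look roughly like the sketch below. `BlockConfig`, its `stage_id` field, and `needs_identity` are hypothetical names chosen for illustration; only `activation_checkpointing` appears in the code under review, and the actual change was deferred to a later PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockConfig:
    """Hypothetical config holder; not the real OneFlow block config."""

    activation_checkpointing: bool = False
    stage_id: Optional[int] = None  # assumed field name for the pipeline stage id


def needs_identity(config: BlockConfig) -> bool:
    """Insert identity when either option is configured (the suggested "or")."""
    # Compare against None so that a stage id of 0 still counts as configured.
    return config.activation_checkpointing or config.stage_id is not None


checkpoint_only = needs_identity(BlockConfig(activation_checkpointing=True))
stage_only = needs_identity(BlockConfig(stage_id=0))
neither = needs_identity(BlockConfig())
```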

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 128.5ms (= 6423.8ms / 50, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.7ms (= 7087.1ms / 50, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.7ms / 128.5ms)

OneFlow resnet50 time: 74.6ms (= 3727.8ms / 50, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 84.2ms (= 4212.2ms / 50, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.13 (= 84.2ms / 74.6ms)

OneFlow resnet50 time: 48.3ms (= 2415.8ms / 50, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.4ms (= 2969.6ms / 50, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 59.4ms / 48.3ms)

OneFlow resnet50 time: 46.4ms (= 2319.4ms / 50, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 46.7ms (= 2336.7ms / 50, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.01 (= 46.7ms / 46.4ms)

OneFlow resnet50 time: 42.9ms (= 2146.1ms / 50, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 43.4ms (= 2170.5ms / 50, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.01 (= 43.4ms / 42.9ms)

OneFlow resnet50 time: 152.9ms (= 7644.5ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 157.9ms (= 7897.0ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.03 (= 157.9ms / 152.9ms)

OneFlow resnet50 time: 97.8ms (= 4889.4ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 110.5ms (= 5525.5ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 110.5ms / 97.8ms)

OneFlow resnet50 time: 76.2ms (= 3808.0ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.2ms (= 4108.6ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.08 (= 82.2ms / 76.2ms)

OneFlow resnet50 time: 79.7ms (= 3987.3ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.6ms (= 3731.0ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.94 (= 74.6ms / 79.7ms)

OneFlow resnet50 time: 75.5ms (= 3776.1ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 58.0ms (= 2900.5ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.77 (= 58.0ms / 75.5ms)
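
The arithmetic behind each entry can be reproduced from the totals in parentheses. This is a small sketch that assumes the ratio is taken from the unrounded per-iteration times; the bot's actual script is not shown in this thread, and `relative_speed` is a name made up here.

```python
def relative_speed(oneflow_total_ms, pytorch_total_ms, iterations=50):
    """Return per-iteration times and the PyTorch/OneFlow time ratio."""
    oneflow_ms = oneflow_total_ms / iterations
    pytorch_ms = pytorch_total_ms / iterations
    # A ratio above 1 means OneFlow took less time per iteration than PyTorch.
    return oneflow_ms, pytorch_ms, pytorch_ms / oneflow_ms


# Totals from the first entry (input_shape=[16, 3, 224, 224]).
of_ms, pt_ms, ratio = relative_speed(6423.8, 7087.1)
summary = f"OneFlow {of_ms:.1f}ms, PyTorch {pt_ms:.1f}ms, relative speed {ratio:.2f}"
print(summary)  # OneFlow 128.5ms, PyTorch 141.7ms, relative speed 1.10
```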

@oneflow-ci-bot oneflow-ci-bot removed their request for review September 10, 2021 04:43
@oneflow-ci-bot oneflow-ci-bot removed their request for review September 10, 2021 07:15
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 10, 2021 07:15
@strint strint mentioned this pull request Sep 10, 2021
3 tasks
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 10, 2021 08:57
@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 128.4ms (= 6418.6ms / 50, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 141.2ms (= 7061.9ms / 50, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.10 (= 141.2ms / 128.4ms)

OneFlow resnet50 time: 74.7ms (= 3734.7ms / 50, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.8ms (= 4141.0ms / 50, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.11 (= 82.8ms / 74.7ms)

OneFlow resnet50 time: 48.4ms (= 2420.4ms / 50, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.2ms (= 2908.2ms / 50, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.20 (= 58.2ms / 48.4ms)

OneFlow resnet50 time: 46.5ms (= 2322.7ms / 50, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.9ms (= 2247.4ms / 50, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 0.97 (= 44.9ms / 46.5ms)

OneFlow resnet50 time: 43.7ms (= 2183.3ms / 50, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 44.0ms (= 2198.4ms / 50, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.01 (= 44.0ms / 43.7ms)

OneFlow resnet50 time: 154.4ms (= 7718.0ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 158.5ms (= 7926.3ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.03 (= 158.5ms / 154.4ms)

OneFlow resnet50 time: 99.8ms (= 4991.0ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 99.2ms (= 4961.2ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.99 (= 99.2ms / 99.8ms)

OneFlow resnet50 time: 76.7ms (= 3833.7ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.1ms (= 3854.6ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.01 (= 77.1ms / 76.7ms)

OneFlow resnet50 time: 73.0ms (= 3647.6ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 63.5ms (= 3174.7ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.87 (= 63.5ms / 73.0ms)

OneFlow resnet50 time: 66.8ms (= 3341.9ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 58.0ms (= 2901.3ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 0.87 (= 58.0ms / 66.8ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 1f354f1 into master Sep 10, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the tkpei/checkpoint branch September 10, 2021 10:02

4 participants