graph/block io check #5803

Merged
merged 20 commits into from
Aug 10, 2021

@strint commented Aug 9, 2021

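  • Graph accepts inputs and outputs that are one of Tensor, None, TensorTuple, or List[Tensor]; any other type reports an ERROR
  • Module Block accepts inputs and outputs that are Tensor or any other type; for a non-Tensor type, a WARNING is reported when graph debug is on
  • nn.Graph reports ERRORs in three stages: build_forward_graph, compile_and_init_graph_runtime, and run
  • Check for .numpy() calls during graph build and report an ERROR
  • Add an Alexnet eval to the tests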
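
The two checking policies in the list above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: `Tensor` and `TensorTuple` are stand-in classes, and `is_graph_io_supported`, `check_graph_io`, and `check_block_io` are hypothetical names chosen for the example.

```python
import warnings


class Tensor:
    """Hypothetical stand-in for oneflow.Tensor."""


class TensorTuple(tuple):
    """Hypothetical stand-in for oneflow's TensorTuple."""


def is_graph_io_supported(value):
    """Return True for Tensor, None, TensorTuple, or a list of Tensors."""
    if value is None or isinstance(value, (Tensor, TensorTuple)):
        return True
    if isinstance(value, list):
        # Assumption: every element of the list must be a Tensor.
        return all(isinstance(item, Tensor) for item in value)
    return False


def check_graph_io(io_kind, value):
    """Graph-level policy: an unsupported type is an error."""
    if not is_graph_io_supported(value):
        raise NotImplementedError(
            f"graph {io_kind} of type {type(value).__name__} is not supported"
        )
    return value


def check_block_io(io_kind, value, debug=False):
    """Block-level policy: any type passes; non-Tensors only warn in debug mode."""
    if debug and not isinstance(value, Tensor):
        warnings.warn(f"block {io_kind} is a {type(value).__name__}, not a Tensor")
    return value
```

With these stand-ins, `check_graph_io("input", 3)` raises `NotImplementedError`, while `check_block_io("input", 3, debug=True)` only emits a warning and returns the value.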

When graph.debug() is enabled, the following information is printed. It shows the input types that graph and module block accept; see the test file for details.

(GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) start building forward graph.
(INPUT:_CustomGraphIOCheck_0-input_0:tensor(flow.Size([10, 10]), dtype=oneflow.float32))
(INPUT:_CustomGraphIOCheck_0-input_1_0:tensor(flow.Size([10, 10]), dtype=oneflow.float32))  # a TensorTuple passed to the graph interface is expanded
(INPUT:_CustomGraphIOCheck_0-input_1_1:tensor(flow.Size([10, 10]), dtype=oneflow.float32))
(INPUT:_CustomGraphIOCheck_0-input_2_0:tensor(flow.Size([10, 10]), dtype=oneflow.float32)) # a list[Tensor] passed to the graph interface is expanded
(INPUT:_CustomGraphIOCheck_0-input_2_1:tensor(flow.Size([10, 10]), dtype=oneflow.float32))
[WARNING](INPUT:_CustomGraphIOCheck_0-input_3:<class 'NoneType'>) # the graph interface supports None
(MODULE:m:CustomModuleIOCheck())
(INPUT:_m-input_0:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
[WARNING](INPUT:_m-input_1:<class 'oneflow._oneflow_internal.TensorTuple'>) # the module block interface does not restrict types; it only warns in debug mode
[WARNING](INPUT:_m-input_2:<class 'list'>)  # the module block interface does not restrict types; it only warns in debug mode
[WARNING](INPUT:_m-input_3:<class 'NoneType'>)
[WARNING](INPUT:_m-input_4:<class 'int'>)
[WARNING](INPUT:_m-input_5:<class 'str'>)
(OUTPUT:_m-output_0:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
[WARNING](OUTPUT:_m-output_1:<class 'oneflow._oneflow_internal.TensorTuple'>)
[WARNING](OUTPUT:_m-output_2:<class 'list'>)
[WARNING](OUTPUT:_m-output_3:<class 'NoneType'>)
[WARNING](OUTPUT:_m-output_4:<class 'int'>)
[WARNING](OUTPUT:_m-output_5:<class 'str'>)
(OUTPUT:_CustomGraphIOCheck_0-output_0:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
(OUTPUT:_CustomGraphIOCheck_0-output_1_0:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
(OUTPUT:_CustomGraphIOCheck_0-output_1_1:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
(OUTPUT:_CustomGraphIOCheck_0-output_2_0:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
(OUTPUT:_CustomGraphIOCheck_0-output_2_1:tensor(flow.Size([10, 10]), is_lazy ='True', dtype=oneflow.float32))
[WARNING](OUTPUT:_CustomGraphIOCheck_0-output_3:<class 'NoneType'>)
(GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) end building forward graph.
(GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) start compiling and init graph runtime.
(GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) end compiling and init graph rumtime.
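
The INPUT names in the log above suggest that graph-level arguments are numbered by position and that TensorTuple and list arguments are expanded one level with a second index. The sketch below reproduces that naming scheme as inferred from the log only; `flatten_io_names` is a hypothetical helper, and plain strings stand in for tensors.

```python
def flatten_io_names(owner, io_kind, args):
    """Return (name, value) pairs, expanding tuples and lists one level."""
    named = []
    for i, arg in enumerate(args):
        base = f"_{owner}-{io_kind}_{i}"
        if isinstance(arg, (tuple, list)):
            # One name per element: <base>_0, <base>_1, ...
            named.extend((f"{base}_{j}", item) for j, item in enumerate(arg))
        else:
            named.append((base, arg))
    return named


# Strings stand in for tensors; the tuple plays the role of a TensorTuple.
pairs = flatten_io_names(
    "CustomGraphIOCheck_0", "input", ["t0", ("t1", "t2"), ["t3", "t4"], None]
)
for name, _ in pairs:
    print(name)
```

The printed names are `_CustomGraphIOCheck_0-input_0`, `input_1_0`, `input_1_1`, `input_2_0`, `input_2_1`, and `input_3` (each with the same prefix), matching the graph-level INPUT lines. The module block lines in the log are not expanded this way: `_m-input_1` is reported as a whole TensorTuple.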

For illegal graph input or output types, an error is printed, followed by the error stack (too long to show here).

[ERROR](INPUT:_CustomGraphIOCheck_0-input_3:<class 'int'>)
[ERROR](GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) build forward graph got error:  <class 'NotImplementedError'> nn.Graph.build()'s input argument has not support types other then Tensor/TensorTuple/list(Tensor)/None yet. .
[ERROR](OUTPUT:_CustomGraphIOCheck_0-output_3:<class 'str'>)
[ERROR](GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) build forward graph got error:  <class 'NotImplementedError'> nn.Graph.build()'s output argument has not support types other then Tensor/TensorTuple/list(Tensor)/None yet. .
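
The error lines above share one pattern: the graph identifier, the stage that failed, the exception class, and the message. A minimal sketch of a wrapper that reports a failing stage this way and then re-raises is shown below. `run_stage` is a hypothetical helper, and only the "build forward graph" stage label is taken from the log; the labels for the other two stages are assumptions.

```python
def run_stage(graph_repr, stage, fn):
    """Run one stage; print an [ERROR] line and re-raise if it fails."""
    try:
        return fn()
    except Exception as err:
        # type(err) renders as e.g. <class 'NotImplementedError'>
        print(f"[ERROR]({graph_repr}) {stage} got error: {type(err)} {err}")
        raise


def run_graph(graph_repr, build, compile_and_init, run):
    """Run the three stages in order, reporting which one failed."""
    run_stage(graph_repr, "build forward graph", build)
    # The next two stage labels are assumed, not copied from the log.
    run_stage(graph_repr, "compile and init graph runtime", compile_and_init)
    return run_stage(graph_repr, "run", run)
```

Re-raising after printing keeps the original stack trace available, which fits the description that the error stack is printed in addition to the error line.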

When tensor.numpy() is called during graph build, graph prints an error, followed by the error stack (too long to show here).

[ERROR](GRAPH:CustomGraphIOCheck_0:CustomGraphIOCheck) build forward graph got error:  <class 'AssertionError'> tensor.numpy() is not allowed to called in nn.Graph.build(*args) or called by lazy tensor. .

@strint strint closed this Aug 9, 2021
@strint strint reopened this Aug 9, 2021
@strint strint requested a review from oneflow-ci-bot August 9, 2021 20:41
@strint strint added this to the v0.5.0 milestone Aug 9, 2021
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.5ms (= 6972.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.1ms (= 6305.4ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 139.5ms / 126.1ms)

PyTorch resnet50 time: 81.0ms (= 4052.0ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 73.0ms (= 3650.4ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 81.0ms / 73.0ms)

PyTorch resnet50 time: 58.2ms (= 2908.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.2ms (= 2407.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 58.2ms / 48.2ms)

PyTorch resnet50 time: 43.5ms (= 2174.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 45.5ms (= 2274.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 0.96 (= 43.5ms / 45.5ms)

PyTorch resnet50 time: 42.7ms (= 2134.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.1ms (= 2005.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.06 (= 42.7ms / 40.1ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 10, 2021 00:08
@chengtbf

Please resolve the conflicts~

@strint

strint commented Aug 10, 2021

> Please resolve the conflicts~

done

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.8ms (= 6989.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 127.7ms (= 6385.4ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 139.8ms / 127.7ms)

PyTorch resnet50 time: 84.4ms (= 4218.8ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3720.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.4ms / 74.4ms)

PyTorch resnet50 time: 57.8ms (= 2888.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 49.8ms (= 2489.5ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 57.8ms / 49.8ms)

PyTorch resnet50 time: 47.3ms (= 2363.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.3ms (= 2114.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 47.3ms / 42.3ms)

PyTorch resnet50 time: 43.1ms (= 2152.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.7ms (= 2132.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.01 (= 43.1ms / 42.7ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 10, 2021 07:32
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 10, 2021 09:36
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 10, 2021 10:31
@strint strint requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 10, 2021 12:58
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 10, 2021 13:19
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 10, 2021 16:30
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 10, 2021 17:44
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 137.0ms (= 6847.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.0ms (= 6299.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 137.0ms / 126.0ms)

PyTorch resnet50 time: 83.5ms (= 4174.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 72.9ms (= 3644.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 83.5ms / 72.9ms)

PyTorch resnet50 time: 58.5ms (= 2926.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.4ms (= 2419.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 58.5ms / 48.4ms)

PyTorch resnet50 time: 48.1ms (= 2405.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.8ms (= 2089.9ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 48.1ms / 41.8ms)

PyTorch resnet50 time: 38.0ms (= 1899.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.3ms (= 2065.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.92 (= 38.0ms / 41.3ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 63d5996 into master Aug 10, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the fix/nn_graph/io_allow_none_tensor branch August 10, 2021 19:10