Add unimplemented return information #5952

Merged (29 commits) on Aug 23, 2021

@luqiang-guo luqiang-guo commented Aug 18, 2021

The improved error messages look like this.

For example, running:

import oneflow as flow

shape = (2, 3)
placement = flow.placement("cuda", {0: 0})
sbp = flow.sbp.split(0)
a = flow.Tensor(*shape)
b = a.to_consistent(placement=placement, sbp=sbp)

b.device()

Before this change, the error message was:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    b.device()
oneflow._oneflow_internal.exception.UnimplementedException: 
  File "/home/guoluqiang/oneflow_tmp/oneflow-1/oneflow/core/framework/tensor.h", line 439, in device

After this change, the error message is:

Traceback (most recent call last):
  File "test.py", line 9, in <module>
    b.device()
oneflow._oneflow_internal.exception.RuntimeException: 
  File "/home/guoluqiang/oneflow_tmp/oneflow/oneflow/core/framework/tensor.h", line 492, in device
    RuntimeError : ConsistentTensor has no device property
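The difference between the two tracebacks can be reproduced without OneFlow. The sketch below is a plain-Python stand-in, not OneFlow's implementation: `FakeConsistentTensor`, `device_before`, `device_after`, and `describe` are hypothetical names, and only the message text is taken from the traceback above.

```python
class FakeConsistentTensor:
    """Hypothetical stand-in for a consistent tensor; not a OneFlow class."""

    def device_before(self):
        # Old behaviour (sketch): the exception carries no explanation.
        raise NotImplementedError()

    def device_after(self):
        # New behaviour (sketch): the exception states what is missing.
        raise RuntimeError("ConsistentTensor has no device property")


def describe(call):
    """Run `call` and return (exception type name, message)."""
    try:
        call()
    except Exception as exc:
        return type(exc).__name__, str(exc)
    return None, None


t = FakeConsistentTensor()
before = describe(t.device_before)
after = describe(t.device_after)
print(before)
print(after)
```

In this sketch, the first call yields an empty message, while the second gives the caller something to act on.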

@wyg1997 wyg1997 requested a review from poohRui August 22, 2021 15:24
@wyg1997 wyg1997 requested review from oneflow-ci-bot and removed request for poohRui August 22, 2021 15:24
@luqiang-guo luqiang-guo requested a review from poohRui August 22, 2021 15:24
@luqiang-guo luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 22, 2021 15:28
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.6ms (= 6978.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.0ms (= 6398.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 139.6ms / 128.0ms)

PyTorch resnet50 time: 84.3ms (= 4214.8ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.8ms (= 3740.0ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.3ms / 74.8ms)

PyTorch resnet50 time: 58.3ms (= 2912.5ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.0ms (= 2397.5ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 58.3ms / 48.0ms)

PyTorch resnet50 time: 47.7ms (= 2386.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 43.3ms (= 2166.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 47.7ms / 43.3ms)

PyTorch resnet50 time: 39.1ms (= 1956.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.5ms (= 2075.4ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.94 (= 39.1ms / 41.5ms)
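The arithmetic behind each entry can be checked by hand. The sketch below recomputes the first entry (input_shape=[16, 3, 224, 224]) from the totals reported above; `relative_speed` is a hypothetical helper, not part of the CI script, and it assumes the ratio is PyTorch time divided by OneFlow time over 50 iterations, as the printed formulas suggest.

```python
def relative_speed(pytorch_total_ms, oneflow_total_ms, iterations=50):
    """Return per-iteration times and the PyTorch/OneFlow time ratio."""
    pytorch_ms = pytorch_total_ms / iterations
    oneflow_ms = oneflow_total_ms / iterations
    return pytorch_ms, oneflow_ms, pytorch_ms / oneflow_ms


# Totals taken from the batch-size-16 entry above.
pt, of, ratio = relative_speed(6978.6, 6398.0)
print(f"PyTorch {pt:.1f}ms, OneFlow {of:.1f}ms, relative speed {ratio:.2f}")
```

Because the ratio is PyTorch time over OneFlow time, a value above 1 means OneFlow took less time per iteration in that run.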

@luqiang-guo luqiang-guo requested review from oneflow-ci-bot and removed request for Ancientshi and poohRui August 23, 2021 00:42
@oneflow-ci-bot oneflow-ci-bot removed their request for review August 23, 2021 01:34

daquexian commented Aug 23, 2021

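I think the current error messages still have room for improvement: they should be written for users, not for internal developers. When writing an error message, consider the scenarios in which a user would trigger the error, then give the user as much useful information as possible.

Take the tensor's parallel_desc method. The current error message is "MirroredTensor has no parallel_desc property", but "MirroredTensor" and "parallel_desc" are both unfamiliar to users (neither is exposed to the Python layer), and the message does not tell the user what to do. The parallel_desc method corresponds to the placement method in the Python API, so a better error message would be "Only consistent tensors have 'placement', please use '.device()' for local tensors." (the quotes around placement and device help readability a lot).

Another example is the AsConsistentTensor method. "MirroredTensor has no AsConsistentTensor property" also feels remote to users; changing it to "An error occurred when converting a local tensor to consistent tensor. Check if you are calling consistent methods on local tensors" would be better.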
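One way to apply both suggestions is to keep internal names out of the message and state the next step. The following is a plain-Python sketch under stated assumptions: `LocalTensorSketch`, its `placement` property, and `as_consistent` are hypothetical illustrations rather than OneFlow APIs, and the message strings follow the wording proposed above.

```python
PLACEMENT_MSG = (
    "Only consistent tensors have 'placement', "
    "please use '.device()' for local tensors."
)
CONVERT_MSG = (
    "An error occurred when converting a local tensor to consistent tensor. "
    "Check if you are calling consistent methods on local tensors"
)


class LocalTensorSketch:
    """Hypothetical local tensor; only the error wording matters here."""

    def __init__(self, device="cpu"):
        self._device = device

    def device(self):
        # The supported accessor that the error message points users to.
        return self._device

    @property
    def placement(self):
        # Use user-facing names and say what to do instead.
        raise RuntimeError(PLACEMENT_MSG)

    def as_consistent(self):
        # Describe the failed operation in terms the user already knows.
        raise RuntimeError(CONVERT_MSG)


tensor = LocalTensorSketch()
try:
    tensor.placement
except RuntimeError as exc:
    placement_error = str(exc)
    print(placement_error)
```

Keeping the strings in named constants is only a convenience for this sketch; it makes the wording easy to review and to assert on in tests.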

I have removed the automerge label for now. If you decide to fix these issues in a follow-up PR, feel free to add it back.

@luqiang-guo luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 23, 2021 06:42
@luqiang-guo luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 23, 2021 06:44
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 23, 2021 08:06
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 23, 2021 10:01
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 141.4ms (= 7067.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.5ms (= 6425.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 141.4ms / 128.5ms)

PyTorch resnet50 time: 85.1ms (= 4252.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.8ms (= 3738.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 85.1ms / 74.8ms)

PyTorch resnet50 time: 55.3ms (= 2764.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 50.1ms (= 2502.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 55.3ms / 50.1ms)

PyTorch resnet50 time: 44.9ms (= 2242.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.9ms (= 2147.1ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.04 (= 44.9ms / 42.9ms)

PyTorch resnet50 time: 43.2ms (= 2159.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.4ms (= 1919.8ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 43.2ms / 38.4ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 72dd1e9 into master Aug 23, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the add_unimplemented_return_information branch August 23, 2021 10:50
4 participants