Update autotest framework #5520

Merged: 32 commits merged into master from update_autotest on Jul 21, 2021
Conversation

@daquexian (Contributor) commented on Jul 16, 2021:

  1. Refactored the automated testing framework so that generators can be composed. The old behavior of omitting an argument that has a default value with probability 1/3 now has to be triggered explicitly, e.g. by setting the generator to random_or_nothing, to oneof(x, nothing(), possibility=2/3), or to x | nothing() (a short sketch of these equivalent forms follows the example below). Generators such as random and constant no longer take the default values of the API's parameters into account.

    To skip generating a particular parameter altogether (for example the device and dtype arguments of the newer PyTorch BatchNorm), simply set its generator to nothing().

  2. Added torch_flow_dual_object.py. After from automated_test_util import *, an automated test against PyTorch can be written as

        @autotest(auto_backward=False)
        def test_against_pytorch(test_case):
            m = torch.nn.Flatten(start_dim=random(1, 6) | nothing(),
                    end_dim=random(1, 6) | nothing())
            m.train(random())
            device = random_device()
            m.to(device)
            x = random_pytorch_tensor().to(device)
            y = m(x)
            return y

    i.e. the test is written as plain PyTorch code and checked against OneFlow automatically.
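For reference, a minimal sketch of the equivalent forms mentioned in point 1; the generator names come from the description above, while the bounds and the probability value are only illustrative:

    # Equivalent ways of saying "generate a random value, or omit the argument":
    start_dim = random(1, 6) | nothing()                           # operator form
    start_dim = oneof(random(1, 6), nothing(), possibility=2 / 3)  # explicit oneof
    start_dim = random_or_nothing(1, 6)                            # shorthand

    # Skipping a parameter entirely (e.g. the device and dtype arguments of the
    # newer PyTorch BatchNorm) is done by handing it nothing():
    device = nothing()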

Signed-off-by: daquexian <daquexian566@gmail.com>
    return flow.tensor(torch_tensor.cpu().numpy())


def convert_torch_object_to_flow(x):
@daquexian (author):
While I was at it, I also changed the old behavior where every generator produced one OneFlow value and one PyTorch value. Now each generator only generates PyTorch data, which is then converted to OneFlow data via convert_torch_object_to_flow.
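A rough sketch of what such a converter could look like; the tensor branch mirrors the flow.tensor(torch_tensor.cpu().numpy()) line above, while the handling of containers and plain values is an assumption, not the actual implementation:

    import torch
    import oneflow as flow  # assuming `flow` is the oneflow module used in the file

    def convert_torch_object_to_flow(x):
        # Tensors go through numpy, as in the fragment above.
        if isinstance(x, torch.Tensor):
            return flow.tensor(x.cpu().numpy())
        # Assumption: containers are converted element-wise.
        if isinstance(x, (list, tuple)):
            return type(x)(convert_torch_object_to_flow(e) for e in x)
        # Plain Python values (ints, floats, bools, strings, ...) are shared as-is.
        return x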

def value(self):
    if self._value is None:
        self._value = self._calc_value()
    return self._value
@daquexian (author):
Each generator object caches the value it generated in the current round. This covers cases like x0 = random(1, 6); x1 = x0 + 1; x2 = x0 + 1: when x1 and x2 are computed, x0 is only evaluated once, so x1 and x2 end up equal.
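A minimal sketch of that caching logic in a generator base class; _calc_value is whatever each concrete generator implements, and the children bookkeeping and reset step are assumptions about how values get refreshed between rounds:

    class generator:
        def __init__(self, children=None):
            self._value = None
            self.children = children or []  # assumed: sub-generators this one depends on

        def _calc_value(self):
            # Overridden by concrete generators such as random, constant, oneof, ...
            raise NotImplementedError

        def value(self):
            # Cache the value for the current round: if x1 and x2 are both derived
            # from x0, evaluating them triggers x0._calc_value() only once, so
            # x1 == x2 as described above.
            if self._value is None:
                self._value = self._calc_value()
            return self._value

        def reset(self):
            # Hypothetical helper: clear the caches before the next round so that
            # a fresh set of random values is drawn.
            self._value = None
            for child in self.children:
                child.reset()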

    return self._value

def size(self):
    return 1
@daquexian (author):
This size() is used to handle chained | expressions like a | b | c; the expected behavior is that a, b and c each get probability 1/3.
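A sketch of how size() can make the chained | uniform; the weighting details are assumptions, but they realize the expected 1/3-each behavior (possibility, when given, is assumed to be the weight of the first alternative):

    import random as py_random

    class oneof(generator):
        def __init__(self, *args, possibility=None):
            super().__init__(children=list(args))
            self.args = args
            self.possibility = possibility  # keyword-only, see the note further down

        def size(self):
            # a | b | c is parsed as (a | b) | c, i.e. oneof(oneof(a, b), c).
            # Summing the leaf counts lets the outer oneof weight its two branches
            # 2:1, so each of a, b and c ends up with probability 1/3.
            return sum(arg.size() for arg in self.args)

        def _calc_value(self):
            if self.possibility is not None:
                # Assumption: `possibility` is the probability of the first argument,
                # used with exactly two alternatives as in oneof(x, nothing(), ...).
                weights = [self.possibility, 1 - self.possibility]
            else:
                weights = [arg.size() for arg in self.args]
            chosen = py_random.choices(self.args, weights=weights)[0]
            return chosen.value()

    # Presumably the | operator on the generator base class just builds an oneof:
    # def __or__(self, other):
    #     return oneof(self, other)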

def random(low, high):
def generator(annotation):
class oneof(generator):
    def __init__(self, *args, possibility=None):
@daquexian (author):
Note that this possibility argument can only be passed as a keyword argument.

t = [generator(x) for x in annotation.__args__]
return zip(*t)
return self._generate(x)
if annotation.__origin__ is Tuple or annotation.__origin__ is py_tuple:
@daquexian (author):
tuple has been hijacked as an operation of the generator DSL (it is not exposed outside this file for now), so the original Python tuple is now referred to as py_tuple.
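A small illustration of what this means in practice; the helper name below is hypothetical, but the comparison against py_tuple matches the fragment above:

    from typing import Tuple

    # `tuple` is reused as an operation of the generator DSL inside this file, so
    # the real Python builtin is kept under a different name:
    py_tuple = tuple

    def is_tuple_annotation(annotation):  # hypothetical helper, for illustration only
        # Both typing.Tuple[...] and the plain builtin tuple should dispatch to the
        # tuple-generating branch, hence the comparison against py_tuple.
        origin = getattr(annotation, "__origin__", None)
        return origin is Tuple or origin is py_tuple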

continue
flow_data, torch_data = generate(name)

generator_tuple = tuple(
@daquexian (author):
Here a new generator is created that holds all the generators, so that in cases like x0 = random(1, 6); x1 = x0 + 1; x2 = x0 + 1, x0 is only computed once when x1 and x2 are evaluated.
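A sketch of the idea: wrapping every per-argument generator in one composite generator means the whole argument set comes from a single value() call, so a sub-generator shared by several arguments (like x0 above) is evaluated exactly once per round. The class name and the usage lines are illustrative, not the real DSL tuple operation:

    class tuple_generator(generator):  # hypothetical stand-in for the DSL tuple op
        def __init__(self, *gens):
            super().__init__(children=list(gens))
            self.gens = gens

        def _calc_value(self):
            # One value() call per child; thanks to the per-round caching sketched
            # earlier, x0 is computed once even if both x1 and x2 depend on it.
            return tuple(g.value() for g in self.gens)  # py_tuple in the real file

    # Rough usage: build one composite generator over all argument generators, then
    # draw the whole argument tuple at once for each test iteration.
    # generator_tuple = tuple_generator(*all_argument_generators)
    # args = generator_tuple.value()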

oneflow/python/test/modules/test_flatten.py (outdated review thread, resolved)
@jackalcooper (Collaborator) commented on Jul 16, 2021:

Could these explanations be written into a README somewhere, and could the error message point people to that README when something goes wrong?

Signed-off-by: daquexian <daquexian566@gmail.com>

counter = 0

def GetDualObject(name, pytorch, oneflow):
Contributor:
I don't quite follow the DualObject handling in this part. Could you briefly explain it?

Contributor:
I think it's to prevent the args of a previous method call from being modified through shared references, which would make a later call receive incorrect arg values; that's why a method is generated dynamically in the enclosing context, so each call gets its own references to the args.
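To make that concrete, a heavily simplified sketch of the dual-object idea. The real GetDualObject in torch_flow_dual_object.py generates its methods dynamically and covers many more cases; get_args here stands for the helper visible in the fragments further down and is assumed to return independent PyTorch and OneFlow copies of the generated arguments:

    def GetDualObject(name, pytorch, oneflow):
        # Heavily simplified sketch, not the actual implementation.
        class DualObject:
            def __getattr__(self, attr):
                pytorch_attr = getattr(pytorch, attr)
                oneflow_attr = getattr(oneflow, attr)
                if not callable(pytorch_attr):
                    return GetDualObject(attr, pytorch_attr, oneflow_attr)

                def dual_method(*args, **kwargs):
                    # get_args splits every generated value into a PyTorch copy and
                    # an OneFlow copy, so one call cannot corrupt the args of a
                    # later call through shared references.
                    (
                        pytorch_args,
                        pytorch_kwargs,
                        oneflow_args,
                        oneflow_kwargs,
                    ) = get_args(pytorch_attr, *args, **kwargs)
                    pytorch_res = pytorch_attr(*pytorch_args, **pytorch_kwargs)
                    oneflow_res = oneflow_attr(*oneflow_args, **oneflow_kwargs)
                    return GetDualObject(attr, pytorch_res, oneflow_res)

                return dual_method

        return DualObject()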

    return np.allclose(torch_tensor.detach().cpu().numpy(), flow_tensor.numpy())


def autotest(n=20, auto_backward=True):
Contributor:
rtol and atol still need to be added.

@daquexian (author):
Added.

def check_tensor_equality(torch_tensor, flow_tensor):
    # TODO: check dtype
    if torch_tensor.grad is not None:
        assert flow_tensor.grad is not None, "OneFlow tensor doesn't have grad while PyTorch tensor has one"
Contributor:

Could a more explicit error message be added here, for example printing the inputs and attrs of the failing random test case? That would make it much quicker to locate errors when CI fails.

@daquexian (author):

This is a bit more complicated than it looks; let's add it later.

Signed-off-by: daquexian <daquexian566@gmail.com>
@BBuf BBuf added the automerge label Jul 21, 2021
@BBuf BBuf requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 21, 2021 01:30
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 21, 2021 02:49
Signed-off-by: daquexian <daquexian566@gmail.com>
@BBuf BBuf requested a review from oneflow-ci-bot July 21, 2021 09:15
@github-actions:

CI failed, removing label automerge

@oneflow-ci-bot oneflow-ci-bot removed their request for review July 21, 2021 10:16
Comment on lines +49 to +53
def register_flow_to_flow_converter(func):
    annotation2torch_to_flow_converter[annotation] = func
    return func

return register_flow_to_flow_converter


The name register_flow_to_flow_converter here is wrong.

Contributor:

I don't quite get it. Is it a spelling issue?


It should probably be torch_to_flow.

@BBuf BBuf requested a review from oneflow-ci-bot July 21, 2021 12:26
@BBuf BBuf added the automerge label Jul 21, 2021
Comment on lines +31 to +32
def torch_tensor_to_flow(x):
    return flow.tensor(x.cpu().numpy())


The tensor_converter defined in generators.py isn't being used here.

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 21, 2021 12:45
    oneflow_args,
    oneflow_kwargs,
) = get_args(pytorch_method, *args, **kwargs)
pytorch_res = pytorch_method(*pytorch_args, **pytorch_kwargs)


This line is redundant.

Contributor:

OK, deleted.

@github-actions:

CI failed, removing label automerge

@BBuf BBuf added the automerge label Jul 21, 2021
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 21, 2021 15:12
@oneflow-ci-bot oneflow-ci-bot self-requested a review July 21, 2021 15:12
@github-actions:

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.1ms (= 4173.0ms / 30, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 125.9ms (= 3776.0ms / 30, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 139.1ms / 125.9ms)

PyTorch resnet50 time: 84.9ms (= 2546.3ms / 30, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 73.6ms (= 2206.6ms / 30, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 84.9ms / 73.6ms)

PyTorch resnet50 time: 59.5ms (= 1783.9ms / 30, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 49.0ms (= 1469.1ms / 30, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 59.5ms / 49.0ms)

PyTorch resnet50 time: 48.6ms (= 1459.3ms / 30, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 46.4ms (= 1390.8ms / 30, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.05 (= 48.6ms / 46.4ms)

PyTorch resnet50 time: 49.8ms (= 1494.2ms / 30, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 45.3ms (= 1359.8ms / 30, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 49.8ms / 45.3ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 865ae1a into master Jul 21, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the update_autotest branch July 21, 2021 16:20
@oneflow-ci-bot oneflow-ci-bot removed their request for review July 21, 2021 16:20
    oneflow_res = torch_tensor_to_flow(pytorch_res)
else:
    oneflow_res = oneflow(*oneflow_args, **oneflow_kwargs)
return GetDualObject("unused", pytorch_res, oneflow_res)


Shouldn't this use the name passed into GetDualObject rather than "unused"?


@data_generator(torch.Tensor)
class random_tensor(generator):
    def __init__(self, ndim=None, dim0=1, dim1=None, dim2=None, dim3=None, dim4=None):


Is dim0=1 here a mistake?

@zzk0 zzk0 mentioned this pull request Oct 18, 2021