Import oneflow as torch #6076

Merged: 29 commits merged on Aug 29, 2021


@BBuf (Contributor) commented Aug 27, 2021

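This PR adds common TorchVision models to the CI tests to make sure the same model code runs under both import oneflow as torch and import torch as oneflow (compatibility). Tests currently cover the following models; the evaluation metric is the similarity of the losses over 100 iterations: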

  • resnet50
  • alexnet
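The PR does not show how "loss similarity" is scored. As a minimal sketch, assuming the per-iteration losses from both frameworks have already been collected into Python lists, a check could compare the two curves element-wise under a tolerance. The helper name `losses_match` and the `rtol`/`atol` values are hypothetical, not the thresholds used by the OneFlow CI.

```python
import math
from typing import Sequence


def losses_match(
    flow_losses: Sequence[float],
    torch_losses: Sequence[float],
    rtol: float = 1e-2,  # hypothetical relative tolerance
    atol: float = 1e-3,  # hypothetical absolute tolerance
) -> bool:
    """Return True if two per-iteration loss curves agree within tolerance.

    Hypothetical helper; the metric actually used by this PR's test is not shown.
    """
    if len(flow_losses) != len(torch_losses):
        raise ValueError("loss curves must cover the same number of iterations")
    return all(
        math.isclose(f, t, rel_tol=rtol, abs_tol=atol)
        for f, t in zip(flow_losses, torch_losses)
    )


# Usage: 100 iterations of a synthetic, slowly decreasing loss.
flow_curve = [2.0 - 0.01 * i for i in range(100)]
torch_curve = [x * 1.001 for x in flow_curve]  # 0.1% drift, well inside rtol
print(losses_match(flow_curve, torch_curve))  # True
```

Comparing every iteration (rather than only the final loss) catches divergence that happens early and later recovers.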

@BBuf requested a review from jackalcooper on August 27, 2021 03:12
@jackalcooper (Collaborator):

A PNG image was committed.

@@ -0,0 +1,66 @@
import oneflow as flow
import oneflow.nn as nn
Contributor:

https://github.com/Oneflow-Inc/models/blob/main/scripts/compare_speed_with_pytorch.py#L67-L87 implements overwriting import torch in a PyTorch model file with import oneflow as torch, so there is no need to maintain two sets of model files.
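The linked script is not reproduced here. As an illustrative sketch only, a regex-based rewrite of a model file's source text could look like the following. The function name `rewrite_torch_imports` is hypothetical, and it handles just three common import forms (for example, a bare `import torch.nn` without an alias is left untouched).

```python
import re

# Hypothetical rewrite rules for three common import forms.
_REWRITES = [
    # `import torch` -> `import oneflow as torch`, so `torch.xxx` resolves to `oneflow.xxx`
    (re.compile(r"^([ \t]*)import torch[ \t]*$", re.MULTILINE),
     r"\1import oneflow as torch"),
    # `import torch.nn as nn` -> `import oneflow.nn as nn`
    (re.compile(r"^([ \t]*)import torch\.([\w.]+) as (\w+)", re.MULTILINE),
     r"\1import oneflow.\2 as \3"),
    # `from torch import X` / `from torch.nn import Y` -> `from oneflow ...`
    (re.compile(r"^([ \t]*)from torch([.\s])", re.MULTILINE),
     r"\1from oneflow\2"),
]


def rewrite_torch_imports(source: str) -> str:
    """Return `source` with PyTorch imports redirected to OneFlow."""
    for pattern, replacement in _REWRITES:
        source = pattern.sub(replacement, source)
    return source


model_src = (
    "import torch\n"
    "import torch.nn as nn\n"
    "from torch import Tensor\n"
    "x = torch.ones(2)\n"
)
print(rewrite_torch_imports(model_src))
```

Only import lines change; call sites such as `torch.ones(2)` are left as-is because the alias makes them resolve to OneFlow.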

Contributor (author):

It's not only the imports: TorchVision calls some ops directly as torch.xxx, and those would need to become oneflow.xxx. Can this approach handle that?

Contributor:

Since import oneflow as torch has already been done, torch.xxx is oneflow.xxx.
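The reason this works is that `import X as torch` simply binds the local name `torch` to the module object `X`, so every later `torch.xxx` attribute lookup goes to `X`. A self-contained illustration, using the standard-library `math` module as a stand-in for `oneflow` (which may not be installed):

```python
import math
import math as torch  # stand-in for `import oneflow as torch`

# The alias is just another name for the same module object, so attribute
# lookups through `torch` resolve to the aliased module's functions.
assert torch is math
assert torch.sqrt is math.sqrt
print(torch.sqrt(16.0))  # 4.0
```

The alias only affects the module in which the `import ... as torch` statement appears; other modules that do their own `import torch` still get the real PyTorch.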

Contributor (author):

Oh, right. OK, I'll change it.

@@ -0,0 +1,450 @@
"""
Collaborator:

Is this file in the wrong directory?

Contributor (author):

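> A PNG image was committed.

That image no longer exists in my local copy, and the delete option in the PR is greyed out for me, which is strange.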

Contributor (author):

It's not misplaced: I created a models directory to hold all the model files.

@@ -0,0 +1,228 @@
"""
Collaborator:

The file is too long, so my previous comment landed in the wrong place. It's this file that seems to be misplaced: shouldn't test util avoid things like unittest.main()?

Contributor (author):

So where should this file go? Under modules?

Collaborator:

> So where should this file go? Under modules?

Yes, otherwise it won't get tested.

Collaborator:

My understanding is that test util provides common functions used by the test cases and should not itself be a test that runs directly.

Contributor (author):

OK.

Contributor (author):

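Following autotest, I adjusted the directory structure: the common functions needed by oneflow_pytorch_compatiblity_test are imported as a package, and the tests are run under modules.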
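The split the reviewers describe keeps shared helpers in an importable package (with no `unittest.main()`) and puts the actual test cases where the test runner discovers them. A single-file sketch of that separation, with hypothetical names (`compare_loss_curves`, `TestResNet50Compatibility`) standing in for the real helper package and the test under modules:

```python
import unittest


# --- would live in the shared helper package: plain functions, no test runner ---
def compare_loss_curves(a, b, tol=1e-2):
    """Hypothetical shared helper: curves have equal length and match within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


# --- would live under modules/, where the test runner discovers it ---
class TestResNet50Compatibility(unittest.TestCase):
    def test_loss_curves_match(self):
        # Placeholder curves; a real test would train the model in both frameworks.
        flow_losses = [1.0, 0.9, 0.8]
        torch_losses = [1.0, 0.905, 0.8]
        self.assertTrue(compare_loss_curves(flow_losses, torch_losses))
```

Because the helper module never calls `unittest.main()`, importing it has no side effects; discovery (for example `python -m unittest discover`) only picks up the `TestCase` classes in the test directory.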

@@ -0,0 +1,30 @@
"""
Collaborator:

note: Once oneflow export is removed, this kind of hack file won't be needed anymore; just import oneflow.test_utils directly.

Contributor (author):

Removed.

new_parameters[k] = flow.tensor(w[k].detach().numpy())

try:
    shutil.rmtree("/dataset/imagenet/compatiblity_models")
Collaborator:

note: Don't write directly to the /dataset directory; use Python's tempfile module to create a temporary directory instead.

Contributor (author):

OK, resolved.
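The tempfile approach suggested above could look roughly like this: a minimal sketch, assuming the test only needs a scratch directory for the duration of one run. The directory prefix and the file name `resnet50_weights.txt` are hypothetical; the real test's save format is not shown in the PR.

```python
import os
import tempfile

# Instead of a fixed path like /dataset/imagenet/compatiblity_models plus a
# manual shutil.rmtree, let tempfile create a unique directory that is
# deleted automatically when the `with` block exits, even if an error occurs.
with tempfile.TemporaryDirectory(prefix="compat_models_") as model_dir:
    ckpt_path = os.path.join(model_dir, "resnet50_weights.txt")  # hypothetical file
    with open(ckpt_path, "w") as f:
        f.write("placeholder for saved weights\n")
    print(os.path.exists(ckpt_path))  # True while inside the block

print(os.path.exists(model_dir))  # False: the directory has been removed
```

This also avoids collisions when several CI jobs run on the same machine, since each run gets its own directory.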

with open("/tmp/tmp_model.py", "w") as new_f:
    new_f.write(buf)

python_module = import_file("/tmp/tmp_model.py")
Collaborator:

note: import_file should be changed to accept the source code directly and handle creating and cleaning up the temporary file internally.

Contributor (author):

OK.



def import_file(path):
    spec = importlib.util.spec_from_file_location("mod", path)
Collaborator:

note: Remember to add a flush.

Contributor (author):

Added.
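Combining the two review notes (accept source code directly, manage the temporary file internally, flush before importing), a helper might look like the sketch below. The name `import_source` is hypothetical and this is not the code that was merged. It writes into a `TemporaryDirectory` so that both the `.py` file and any `__pycache__` the loader creates are cleaned up.

```python
import importlib.util
import os
import tempfile
from types import ModuleType


def import_source(source: str, module_name: str = "mod") -> ModuleType:
    """Import Python source text as a module (hypothetical helper).

    The temporary file and its directory are always removed afterwards;
    the returned module object keeps working because its code is already loaded.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, f"{module_name}.py")
        with open(path, "w") as f:
            f.write(source)
            # The file is still open, so without this flush the text could sit
            # in Python's write buffer and the loader would read a truncated file.
            f.flush()
            spec = importlib.util.spec_from_file_location(module_name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
    return module


mod = import_source("def square(x):\n    return x * x\n")
print(mod.square(7))  # 49
```

Callers pass the rewritten model source (for example, after replacing `import torch`) and never deal with file paths themselves.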

@github-actions (Contributor):

CI failed, removing label automerge

@github-actions (Contributor):

CI failed, removing label automerge

@BBuf added the automerge label on Aug 29, 2021
@github-actions (Contributor):

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 128.2ms (= 6407.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 141.2ms (= 7059.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.10 (= 141.2ms / 128.2ms)

OneFlow resnet50 time: 74.4ms (= 3722.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 85.1ms (= 4255.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.14 (= 85.1ms / 74.4ms)

OneFlow resnet50 time: 47.6ms (= 2381.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 60.0ms (= 3001.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.26 (= 60.0ms / 47.6ms)

OneFlow resnet50 time: 41.9ms (= 2096.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 49.8ms (= 2491.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.19 (= 49.8ms / 41.9ms)

OneFlow resnet50 time: 37.4ms (= 1871.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 42.6ms (= 2132.3ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.14 (= 42.6ms / 37.4ms)

OneFlow resnet50 time: 141.7ms (= 7085.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 146.8ms (= 7340.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.04 (= 146.8ms / 141.7ms)

OneFlow resnet50 time: 90.2ms (= 4507.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 90.0ms (= 4502.0ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 1.00 (= 90.0ms / 90.2ms)

OneFlow resnet50 time: 68.8ms (= 3439.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 65.8ms (= 3291.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 0.96 (= 65.8ms / 68.8ms)

OneFlow resnet50 time: 64.0ms (= 3199.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 55.1ms (= 2753.0ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 0.86 (= 55.1ms / 64.0ms)

OneFlow resnet50 time: 58.9ms (= 2947.1ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow GPU used (rank 0): 0 MiB
PyTorch resnet50 time: 48.0ms (= 2400.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
PyTorch GPU used (rank 0, estimated): 0 MiB
Relative speed: 0.81 (= 48.0ms / 58.9ms)

@oneflow-ci-bot merged commit 01c55a0 into master on Aug 29, 2021
@oneflow-ci-bot deleted the import_oneflow_as_torch branch on August 29, 2021 06:29