Fix select_first_grad bug #6142

Merged
merged 7 commits into from
Sep 2, 2021

wyg1997
Contributor

@wyg1997 wyg1997 commented Sep 2, 2021

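Fixes the insight-face DDP problem reported in oneflow-inc/oneteam#578. Some of the Parameters passed to the select_first op used in DDP may have requires_grad set to False, but a gradient was still computed for them, which caused a check in AutogradEngine to fail with an error.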
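A minimal sketch of the behavior described above, in plain Python rather than OneFlow's actual code. `FakeTensor`, `select_first_forward`, and `select_first_backward` are hypothetical names used only for illustration. The sketch also assumes that the first input receives the output gradient and the remaining inputs receive zeros. The point it illustrates is the gating: an input whose requires_grad is False gets no gradient at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only the fields this sketch needs."""
    name: str
    requires_grad: bool


def select_first_forward(inputs: Sequence[FakeTensor]) -> FakeTensor:
    # The op returns its first input; the other inputs are attached only so
    # that they become part of the autograd graph.
    return inputs[0]


def select_first_backward(
    inputs: Sequence[FakeTensor], out_grad: float
) -> List[Optional[float]]:
    in_grads: List[Optional[float]] = []
    for index, tensor in enumerate(inputs):
        if not tensor.requires_grad:
            # The gating described in this PR: produce no gradient for
            # inputs that do not require one.
            in_grads.append(None)
        elif index == 0:
            # Assumption: the first input receives the output gradient.
            in_grads.append(out_grad)
        else:
            # Assumption: the remaining inputs receive a zero gradient.
            in_grads.append(0.0)
    return in_grads


params = [
    FakeTensor("weight", requires_grad=True),
    FakeTensor("frozen", requires_grad=False),
    FakeTensor("bias", requires_grad=True),
]
grads = select_first_backward(params, out_grad=1.0)
```

Here `grads` holds no entry for the frozen parameter, so an engine-side check that rejects gradients for non-differentiable inputs would not be triggered.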

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 2, 2021 11:18
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 2, 2021 12:32
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot September 2, 2021 13:45
@github-actions
Contributor

github-actions bot commented Sep 2, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

OneFlow resnet50 time: 125.6ms (= 6280.1ms / 50, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 136.8ms (= 6837.6ms / 50, input_shape=[16, 3, 224, 224])
Relative speed: 1.09 (= 136.8ms / 125.6ms)

OneFlow resnet50 time: 73.2ms (= 3657.6ms / 50, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 82.1ms (= 4106.8ms / 50, input_shape=[8, 3, 224, 224])
Relative speed: 1.12 (= 82.1ms / 73.2ms)

OneFlow resnet50 time: 47.5ms (= 2373.9ms / 50, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.9ms (= 2897.0ms / 50, input_shape=[4, 3, 224, 224])
Relative speed: 1.22 (= 57.9ms / 47.5ms)

OneFlow resnet50 time: 46.2ms (= 2308.2ms / 50, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 49.5ms (= 2474.4ms / 50, input_shape=[2, 3, 224, 224])
Relative speed: 1.07 (= 49.5ms / 46.2ms)

OneFlow resnet50 time: 41.5ms (= 2076.7ms / 50, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.2ms (= 2009.2ms / 50, input_shape=[1, 3, 224, 224])
Relative speed: 0.97 (= 40.2ms / 41.5ms)

OneFlow resnet50 time: 141.4ms (= 7072.1ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.9ms (= 8095.4ms / 50, input_shape=[16, 3, 224, 224], ddp, world size=2)
Relative speed: 1.14 (= 161.9ms / 141.4ms)

OneFlow resnet50 time: 94.2ms (= 4711.9ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.5ms (= 5223.3ms / 50, input_shape=[8, 3, 224, 224], ddp, world size=2)
Relative speed: 1.11 (= 104.5ms / 94.2ms)

OneFlow resnet50 time: 71.2ms (= 3558.9ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 76.9ms (= 3843.5ms / 50, input_shape=[4, 3, 224, 224], ddp, world size=2)
Relative speed: 1.08 (= 76.9ms / 71.2ms)

OneFlow resnet50 time: 62.4ms (= 3119.1ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.7ms (= 3535.6ms / 50, input_shape=[2, 3, 224, 224], ddp, world size=2)
Relative speed: 1.13 (= 70.7ms / 62.4ms)

OneFlow resnet50 time: 61.0ms (= 3050.8ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.1ms (= 3205.8ms / 50, input_shape=[1, 3, 224, 224], ddp, world size=2)
Relative speed: 1.05 (= 64.1ms / 61.0ms)
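The figures in the report above can be reproduced with a few lines of arithmetic. This is a sketch, not the CI bot's script: `relative_speed` is a hypothetical helper, and it assumes the per-iteration times are rounded to one decimal place before the ratio is taken, as the printed formulas suggest.

```python
def relative_speed(
    oneflow_total_ms: float, pytorch_total_ms: float, iters: int = 50
) -> float:
    # Per-iteration time = total time / number of iterations, rounded to
    # one decimal place as in the report (assumption).
    oneflow_ms = round(oneflow_total_ms / iters, 1)
    pytorch_ms = round(pytorch_total_ms / iters, 1)
    # A value above 1 means OneFlow was faster than PyTorch.
    return round(pytorch_ms / oneflow_ms, 2)


# Totals taken from the single-GPU rows for batch sizes 16 and 1.
batch16 = relative_speed(6280.1, 6837.6)
batch1 = relative_speed(2076.7, 2009.2)
```

With these inputs, `batch16` matches the reported 1.09 and `batch1` matches the reported 0.97, the only configuration in which PyTorch was faster.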

@oneflow-ci-bot oneflow-ci-bot merged commit 183bcfc into master Sep 2, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the fix-ddp_select_first_grad branch September 2, 2021 14:53
@oneflow-ci-bot oneflow-ci-bot removed their request for review September 2, 2021 14:53
3 participants