
Fix bug of multi-GPU train nn.Graph extra mem cost in rank 0 #5930

Merged: 5 commits merged into master from dev_cc_fix_mem_rank0 on Aug 18, 2021


chengtbf

Fixes a bug where, when training nn.Graph on multiple GPUs (multiple processes), rank 0 uses a large amount of extra GPU memory.

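The cause was that we did not specify a device ID when calling cudaMallocHost, so every process defaulted to device 0. Even though the allocation is host pinned memory, it still creates a CUDA context for device 0 in each of those processes. A CUDA context is a per-process concept and is not shared between ranks, and each context takes at least 300 MiB to 500 MiB of GPU memory.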
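A back-of-the-envelope sketch of why the overhead piles up on rank 0, assuming one process per GPU where rank r is meant to use device r: rank 0's own context on device 0 is legitimate, but every other rank adds one stray context on device 0. The function name is hypothetical; the 300 MiB to 500 MiB per-context range is the figure quoted above.

```python
def extra_context_mib_on_device0(world_size: int, per_context_mib: int) -> int:
    """Estimate stray CUDA-context memory on device 0.

    Assumes one process per GPU: ranks 1..world_size-1 each create one
    unintended context on device 0 (because cudaMallocHost ran with the
    default current device), while rank 0's context there is expected.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    return (world_size - 1) * per_context_mib


if __name__ == "__main__":
    # Lower and upper bounds using the 300 MiB and 500 MiB per-context figures.
    for n in (2, 4, 8):
        low = extra_context_mib_on_device0(n, 300)
        high = extra_context_mib_on_device0(n, 500)
        print(f"{n} ranks: {low}-{high} MiB extra on device 0")
```

With 8 ranks this gives 2100 to 3500 MiB of extra memory on device 0, which is the cost that specifying the device ID before allocating pinned memory avoids.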

before: [image]

after: [image]

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.0ms (= 7001.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.1ms (= 6403.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 140.0ms / 128.1ms)

PyTorch resnet50 time: 82.7ms (= 4135.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.7ms (= 3732.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 82.7ms / 74.7ms)

PyTorch resnet50 time: 55.4ms (= 2770.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.5ms (= 2425.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 55.4ms / 48.5ms)

PyTorch resnet50 time: 48.7ms (= 2434.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.2ms (= 2058.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.18 (= 48.7ms / 41.2ms)

PyTorch resnet50 time: 44.3ms (= 2217.1ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.2ms (= 2008.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 44.3ms / 40.2ms)
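In each timing line, the reported time is the total over 50 iterations divided by 50, and "Relative speed" is the PyTorch time divided by the OneFlow time, so values above 1.0 mean OneFlow was faster. A minimal Python sketch that recomputes both from lines in this format; the regex and helper names are hypothetical and are written only against the text shown here, not taken from OneFlow's CI scripts.

```python
import re

# Matches one timing line as printed by the CI bot, e.g.
# "PyTorch resnet50 time: 140.0ms (= 7001.0ms / 50, input_shape=[16, 3, 224, 224], ...)"
LINE_RE = re.compile(
    r"(?P<framework>\w+) resnet50 time: (?P<avg>[\d.]+)ms "
    r"\(= (?P<total>[\d.]+)ms / (?P<iters>\d+), input_shape=\[(?P<shape>[^\]]*)\]"
)


def parse_timing(line: str) -> dict:
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a timing line: {line!r}")
    total_ms = float(m["total"])
    iters = int(m["iters"])
    return {
        "framework": m["framework"],
        "reported_avg_ms": float(m["avg"]),  # rounded value as printed
        "avg_ms": total_ms / iters,          # per-iteration time recomputed from the total
        "batch_size": int(m["shape"].split(",")[0]),
    }


def relative_speed(pytorch_line: str, oneflow_line: str) -> float:
    # PyTorch time / OneFlow time: > 1.0 means OneFlow is faster.
    return parse_timing(pytorch_line)["avg_ms"] / parse_timing(oneflow_line)["avg_ms"]


if __name__ == "__main__":
    pt_line = "PyTorch resnet50 time: 140.0ms (= 7001.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)"
    of_line = "OneFlow resnet50 time: 128.1ms (= 6403.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)"
    print(round(relative_speed(pt_line, of_line), 2))  # prints 1.09
```

Recomputing from the unrounded totals agrees with the ratio of the rounded per-iteration times printed by the bot.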

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.1ms (= 7003.8ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.2ms (= 6408.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 140.1ms / 128.2ms)

PyTorch resnet50 time: 84.1ms (= 4205.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.5ms (= 3725.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.1ms / 74.5ms)

PyTorch resnet50 time: 57.3ms (= 2864.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.5ms (= 2372.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 57.3ms / 47.5ms)

PyTorch resnet50 time: 50.0ms (= 2500.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 43.2ms (= 2159.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 50.0ms / 43.2ms)

PyTorch resnet50 time: 43.6ms (= 2179.8ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 37.2ms (= 1859.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.17 (= 43.6ms / 37.2ms)

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.4ms (= 7019.7ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 127.8ms (= 6392.2ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.4ms / 127.8ms)

PyTorch resnet50 time: 83.2ms (= 4161.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.3ms (= 3713.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 83.2ms / 74.3ms)

PyTorch resnet50 time: 57.1ms (= 2855.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.7ms (= 2387.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.20 (= 57.1ms / 47.7ms)

PyTorch resnet50 time: 47.3ms (= 2362.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.7ms (= 1986.0ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.19 (= 47.3ms / 39.7ms)

PyTorch resnet50 time: 44.1ms (= 2204.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.2ms (= 2058.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.07 (= 44.1ms / 41.2ms)

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.5ms (= 7024.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.1ms (= 6403.4ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.5ms / 128.1ms)

PyTorch resnet50 time: 84.2ms (= 4208.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.3ms (= 3716.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.2ms / 74.3ms)

PyTorch resnet50 time: 60.3ms (= 3013.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.3ms (= 2414.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.25 (= 60.3ms / 48.3ms)

PyTorch resnet50 time: 46.7ms (= 2335.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.1ms (= 2006.1ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 46.7ms / 40.1ms)

PyTorch resnet50 time: 43.1ms (= 2154.3ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 37.6ms (= 1882.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 43.1ms / 37.6ms)

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.5ms (= 6974.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.2ms (= 6411.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 139.5ms / 128.2ms)

PyTorch resnet50 time: 84.5ms (= 4224.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3722.4ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.5ms / 74.4ms)

PyTorch resnet50 time: 58.9ms (= 2945.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.2ms (= 2409.9ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 58.9ms / 48.2ms)

PyTorch resnet50 time: 49.8ms (= 2488.6ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.6ms (= 2127.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.17 (= 49.8ms / 42.6ms)

PyTorch resnet50 time: 44.0ms (= 2200.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.8ms (= 2090.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.05 (= 44.0ms / 41.8ms)

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.3ms (= 7013.2ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6414.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.09 (= 140.3ms / 128.3ms)

PyTorch resnet50 time: 84.7ms (= 4237.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.3ms (= 3715.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 84.7ms / 74.3ms)

PyTorch resnet50 time: 57.0ms (= 2850.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.4ms (= 2371.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.20 (= 57.0ms / 47.4ms)

PyTorch resnet50 time: 48.3ms (= 2414.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.8ms (= 1991.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 48.3ms / 39.8ms)

PyTorch resnet50 time: 42.8ms (= 2138.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.9ms (= 2094.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.02 (= 42.8ms / 41.9ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 17, 2021 20:21
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 18, 2021 01:48
@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 18, 2021 02:54
@chengtbf chengtbf removed the request for review from oneflow-ci-bot August 18, 2021 03:24
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 141.3ms (= 7066.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.4ms (= 6420.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 141.3ms / 128.4ms)

PyTorch resnet50 time: 84.2ms (= 4207.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3721.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.2ms / 74.4ms)

PyTorch resnet50 time: 57.2ms (= 2861.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.3ms (= 2363.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.21 (= 57.2ms / 47.3ms)

PyTorch resnet50 time: 49.8ms (= 2491.5ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.1ms (= 1902.9ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.31 (= 49.8ms / 38.1ms)

PyTorch resnet50 time: 37.8ms (= 1888.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 36.3ms (= 1816.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.04 (= 37.8ms / 36.3ms)

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 18, 2021 07:54
@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.9ms (= 7046.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.1ms (= 6403.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.9ms / 128.1ms)

PyTorch resnet50 time: 83.0ms (= 4148.1ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3718.4ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 83.0ms / 74.4ms)

PyTorch resnet50 time: 57.7ms (= 2882.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.9ms (= 2396.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.20 (= 57.7ms / 47.9ms)

PyTorch resnet50 time: 47.4ms (= 2372.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 44.7ms (= 2237.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.06 (= 47.4ms / 44.7ms)

PyTorch resnet50 time: 41.7ms (= 2085.4ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 37.0ms (= 1851.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 41.7ms / 37.0ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 357f71a into master Aug 18, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the dev_cc_fix_mem_rank0 branch August 18, 2021 08:58