Refactor RuntimeCtx for multi-runtime #5664

Merged
merged 7 commits into from
Jul 30, 2021

Conversation

chengtbf
Contributor

To support multiple Runtimes existing at the same time, this PR makes a series of necessary refactors.

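  • Remove the obsolete constructor parameters of the RuntimeCtx and Runtime abstractions: total_piece_num and is_experiment_phase
  • Remove the Actor's use of Global<Profile> and the trial-run entry code such as collect_act_event (the ActEventLog logic itself is not removed, but it is no longer called). For performance debugging there is now an easier-to-use interface: OF_PROFILER_RANGE_PUSH and the like
  • Refactor how Runtime uses Counter so that multiple Runtimes are supported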

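Supporting multiple Runtimes requires mutually independent Actor counts: each Runtime must count its own Actors, while Threads are shared across Runtimes (multiple jobs), so each Actor has to record which Runtime it belongs to. To stay compatible with how both Single-Client and Multi-Client use Runtime, I chose job_id as the distinguishing key and record the owning job_id in the Actor.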
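The per-job counting idea can be illustrated with a minimal sketch. This is not the code from this PR: `ActorCountRegistry`, `Init`, `Decrease`, `Remaining`, and the `Actor` struct below are hypothetical names chosen for illustration, and the sketch only assumes that each actor carries a `job_id` and that threads shared across jobs report completion through that id.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical registry: one independent running-actor counter per job_id,
// so several Runtimes (jobs) can share threads without sharing a counter.
class ActorCountRegistry {
 public:
  // Register how many actors belong to the given job.
  void Init(int64_t job_id, int64_t actor_num) {
    std::lock_guard<std::mutex> lock(mutex_);
    remaining_[job_id] = actor_num;
  }

  // Called when one actor of the job finishes.
  // Returns true only when this was the last running actor of that job.
  bool Decrease(int64_t job_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = remaining_.find(job_id);
    if (it == remaining_.end()) { return false; }
    if (--it->second > 0) { return false; }
    remaining_.erase(it);
    return true;
  }

  // Number of actors still running for the job (0 if the job is unknown).
  int64_t Remaining(int64_t job_id) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = remaining_.find(job_id);
    return it == remaining_.end() ? 0 : it->second;
  }

 private:
  mutable std::mutex mutex_;
  std::unordered_map<int64_t, int64_t> remaining_;
};

// Hypothetical actor: it records the job_id it belongs to, so a thread that
// serves several jobs knows which counter to update when the actor ends.
struct Actor {
  int64_t job_id;
};
```

In this sketch, a thread that hosts actors from several jobs would call `Decrease(actor.job_id)` when an actor ends, so finishing one job never touches another job's count.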

@github-actions
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 141.8ms (= 7091.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.8ms (= 6339.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 141.8ms / 126.8ms)

PyTorch resnet50 time: 83.6ms (= 4181.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.1ms (= 3706.8ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 83.6ms / 74.1ms)

PyTorch resnet50 time: 58.5ms (= 2924.8ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 49.1ms (= 2454.6ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.19 (= 58.5ms / 49.1ms)

PyTorch resnet50 time: 47.8ms (= 2391.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.9ms (= 2095.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.14 (= 47.8ms / 41.9ms)

PyTorch resnet50 time: 41.8ms (= 2090.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 49.6ms (= 2479.4ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.84 (= 41.8ms / 49.6ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 458bc06 into master Jul 30, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the dev_cc_refactor_runtime_context branch July 30, 2021 13:12