
flow.tmp_compute_stream #8866

Merged
merged 121 commits into from Aug 30, 2022

Conversation

lixinqi commented Aug 8, 2022

Expose the VM's worker threads to user code through the new API oneflow.asyncs.thread.
Usage example:

loss = model()
with flow.asyncs.thread(thread_global_id=2):
    write_metrics(loss)

After discussing this with Jianhao, Houjiang, and Xiaoyu, we decided not to expose concepts such as stream or StreamSet to the Python layer for now: they are too costly to explain and very easy for users to misunderstand, and simply exposing the thread concept already covers the vast majority of business needs.


lixinqi commented Aug 29, 2022

experimental

In torch, experimental utility classes usually go under utils.xxx and are promoted to a second-level or top-level namespace only once they have stabilized.

The earlier experimental namespace for experimental features has also already been removed.

I have never found a torch.experimental.xxx API. At most there are functorch.experimental.xxx and torchtext.experimental.xxx, and those look like libraries built on top of the pytorch framework rather than part of torch itself. Alternatively, experimental can still be found under some of torch's sub-namespaces:
https://pytorch.org/docs/stable/search.html?q=torch.experimental&check_keywords=yes&area=default#

Comment on lines 56 to 58
for tensor in tensors:
    test_case.assertEqual(tensor[0], 1)
    test_case.assertEqual(tensor[int(tensor.shape[0] / 2)], 1)
Contributor commented:

What exactly is being tested here? If it is communication correctness, that does not seem to be verified: even if the communication between ranks were misaligned, the tensor values would still be all ones.

lixinqi (author) replied:

The original intent here was to test for deadlock.
Perhaps I should go further and test the values as well.
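
A minimal sketch of such a value check, assuming each rank fills its buffer with a rank-dependent value so that a misrouted exchange cannot still look like success; do_exchange and expected_peer below are hypothetical placeholders for the communication op under test and its peer mapping, not code from this PR:

import oneflow as flow

rank = flow.env.get_rank()

# Send a value unique to this rank instead of all ones.
sent = flow.full((8,), float(rank + 1))

# do_exchange stands in for the collective under test; expected_peer maps
# this rank to the rank whose data it should receive.
received = do_exchange(sent)
expected = float(expected_peer(rank) + 1)

# Spot-check a few positions, mirroring the original assertions.
for i in (0, received.shape[0] // 2, received.shape[0] - 1):
    test_case.assertEqual(received[i], expected)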


Handled: 1dfa972

@github-actions

CI failed when running job: Build cpu. PR label automerge has been removed


lixinqi commented Aug 30, 2022

Alternative proposals and their justifications

Xinqi and Jianhao proposed flow.StreamSet / flow.stream_set

Xiaoyu: it should not go directly into the top-level oneflow namespace.
Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xinqi and Jianhao proposed flow.experimental.StreamSet / flow.experimental.stream_set

Xiaoyu: torch usually puts its not-yet-mature APIs under the torch.utils namespace.
Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xinqi proposed flow.vm.StreamSet / flow.vm.stream_set

Xinqi: I would rather not expose the vm concept; users fundamentally should not have to care about it.
Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xinqi proposed flow.worker.StreamSet / flow.worker.stream_set

Xinqi: users are unlikely to understand the worker semantics.
Xiaoyu: a sub-namespace should stand for a specific piece of functionality; worker is a poor fit.
Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xinqi proposed flow.worker_thread.StreamSet / flow.worker_thread.stream_set

Xinqi: worker_thread is too far removed from the user's business logic.
Xiaoyu: a sub-namespace should stand for a specific piece of functionality; worker_thread is a poor fit.
Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xiaoyu proposed flow.stream.StreamSet / flow.stream.stream_set

Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xinqi and Xiaoyu proposed flow.utils.StreamSet / flow.utils.stream_set

Houjiang: the word "Stream" is a poor fit: ordinary C++ developers will read it as an I/O stream, and CUDA developers will read it as a CUDA stream, but it is really neither.

Xiaoyu, Houjiang, and Jianhao proposed the new namespace flow.async

No objections. Xinqi: it looks like a replacement for flow.vm, but one that users can understand more easily.
We brainstormed which APIs could live under this namespace: flow.async.local_sync, flow.async.global_sync, …

Houjiang proposed flow.async.run

with flow.async.run(thread_global_id):
    pass

Xiaoyu: "run" is not quite the right word; the name should be a noun.

Xinqi proposed flow.async.run(flow.async.Fiber(thread_global_id))

Houjiang: Fiber is better than Stream, but it still evokes the familiar coroutine concept, which does not quite match what happens here.

Xinqi and Houjiang proposed flow.async.pipeline

Xinqi, Xiaoyu: pytorch has a similar concept, the Pipe module, but that is more of a composer that combines modules placed on two devices, similar to the Sequential module. flow.async.pipeline here is meant to express something closer to a micro-thread.

Houjiang proposed flow.asyncs.thread(thread_global_id)

Houjiang: don't expose the fiber concept for now; it is hard for users to understand. StreamSet can continue to exist at the C++ level, with the Python layer exporting only flow.asyncs.thread. Any problem users would solve with different fibers can always be solved with different thread_global_ids, and more reliably so, since distinct thread_global_ids guarantee that no two fibers ever contend for the same thread resource.

default_thread_id = 0          # main compute thread
decoder_async_thread_id = 1    # worker thread for data loading
loss_async_thread_id = 2       # worker thread for metrics

for i in epoch_iters:

    # Data loading is dispatched on worker thread 1 and overlaps with training.
    with flow.asyncs.thread(decoder_async_thread_id):
        data = get_data()

    loss = train(model, data)  # runs on the default worker thread

    # The metrics write is dispatched on worker thread 2 and does not block training.
    with flow.asyncs.thread(loss_async_thread_id):
        write_metric(loss)


lixinqi commented Aug 30, 2022

Decision principles

  1. Whether the API clearly exposes the core features of the eager runtime.
  2. Whether users can understand how to use the API as quickly as possible.


@github-actions

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/8866/

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.3ms (= 12926.8ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 142.9ms (= 14289.9ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.11 (= 142.9ms / 129.3ms)

OneFlow resnet50 time: 74.5ms (= 7451.9ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.4ms (= 8540.7ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.15 (= 85.4ms / 74.5ms)

OneFlow resnet50 time: 47.1ms (= 9410.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.9ms (= 11573.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.23 (= 57.9ms / 47.1ms)

OneFlow resnet50 time: 34.3ms (= 6859.3ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 42.9ms (= 8582.9ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.25 (= 42.9ms / 34.3ms)

OneFlow resnet50 time: 28.2ms (= 5635.2ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.1ms (= 7626.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.35 (= 38.1ms / 28.2ms)

OneFlow swin dataloader time: 0.268s (= 53.662s / 200, num_workers=1)
PyTorch swin dataloader time: 0.150s (= 30.005s / 200, num_workers=1)
Relative speed: 0.559 (= 0.150s / 0.268s)

OneFlow swin dataloader time: 0.077s (= 15.423s / 200, num_workers=4)
PyTorch swin dataloader time: 0.040s (= 8.082s / 200, num_workers=4)
Relative speed: 0.524 (= 0.040s / 0.077s)

OneFlow swin dataloader time: 0.040s (= 7.950s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.350s / 200, num_workers=8)
Relative speed: 0.547 (= 0.022s / 0.040s)

❌ OneFlow resnet50 time: 138.3ms (= 13829.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.0ms (= 16496.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 165.0ms / 138.3ms)

OneFlow resnet50 time: 84.2ms (= 8417.6ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 101.7ms (= 10172.8ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 101.7ms / 84.2ms)

OneFlow resnet50 time: 57.2ms (= 11435.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.7ms (= 15533.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 77.7ms / 57.2ms)

OneFlow resnet50 time: 44.3ms (= 8866.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.8ms (= 13754.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 68.8ms / 44.3ms)

OneFlow resnet50 time: 38.6ms (= 7725.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.0ms (= 15407.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.99 (= 77.0ms / 38.6ms)


@github-actions

Speed stats:
GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 129.2ms (= 12916.2ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 150.0ms (= 15004.8ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.16 (= 150.0ms / 129.2ms)

OneFlow resnet50 time: 74.5ms (= 7451.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 83.8ms (= 8382.0ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.12 (= 83.8ms / 74.5ms)

OneFlow resnet50 time: 46.7ms (= 9344.2ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 58.6ms (= 11715.7ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.25 (= 58.6ms / 46.7ms)

OneFlow resnet50 time: 34.2ms (= 6838.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.2ms (= 8835.2ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.29 (= 44.2ms / 34.2ms)

OneFlow resnet50 time: 28.2ms (= 5638.8ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 39.7ms (= 7933.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.41 (= 39.7ms / 28.2ms)

OneFlow swin dataloader time: 0.262s (= 52.388s / 200, num_workers=1)
PyTorch swin dataloader time: 0.154s (= 30.826s / 200, num_workers=1)
Relative speed: 0.588 (= 0.154s / 0.262s)

OneFlow swin dataloader time: 0.074s (= 14.738s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.399s / 200, num_workers=4)
Relative speed: 0.570 (= 0.042s / 0.074s)

OneFlow swin dataloader time: 0.039s (= 7.822s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.331s / 200, num_workers=8)
Relative speed: 0.554 (= 0.022s / 0.039s)

❌ OneFlow resnet50 time: 138.1ms (= 13805.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 168.1ms (= 16807.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.22 (= 168.1ms / 138.1ms)

OneFlow resnet50 time: 84.2ms (= 8421.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 107.3ms (= 10730.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 107.3ms / 84.2ms)

OneFlow resnet50 time: 57.3ms (= 11463.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.0ms (= 15598.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.0ms / 57.3ms)

OneFlow resnet50 time: 44.5ms (= 8899.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.5ms (= 13909.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.56 (= 69.5ms / 44.5ms)

OneFlow resnet50 time: 38.7ms (= 7745.8ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.5ms (= 14894.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.92 (= 74.5ms / 38.7ms)

lixinqi merged commit 4fefb3e into master Aug 30, 2022
lixinqi deleted the tmp_compute_stream_type_guard branch Aug 30, 2022 09:40