Support tensor.to()/to_local() #5271

Merged
merged 117 commits into from Aug 3, 2021

clackhan
Contributor

@clackhan clackhan commented Jun 22, 2021

Adds support for local_tensor.to(...), and for the GPU version of local_tensor.to_consistent(...) in symmetric mode. Example code follows.
Demo on process 0:

>>> import oneflow.experimental as flow
>>> import numpy as np
>>> ndarr = np.asarray([[7, 9, 5], [3,6,9]])
>>> x = flow.Tensor(ndarr)
>>> y = x.to("cuda")
>>> y
tensor([[7., 9., 5.],
        [3., 6., 9.]], device='cuda:0', dtype=oneflow.float32)
>>> p = flow.placement("cuda", {0:range(2)})
>>> z = y.to_consistent([flow.sbp.broadcast], p)
>>> z.to_local()
tensor([[7., 9., 5.],
        [3., 6., 9.]], device='cuda:0', dtype=oneflow.float32)
>>> m = y.to_consistent([flow.sbp.split(0)], p)
>>> m.shape
flow.Size([4, 3])
>>> m.to_local()
tensor([[7., 9., 5.],
        [3., 6., 9.]], device='cuda:0', dtype=oneflow.float32)
>>> n = y.to_consistent([flow.sbp.partial_sum], p)
>>> n.to_local()
tensor([[7., 9., 5.],
        [3., 6., 9.]], device='cuda:0', dtype=oneflow.float32)
>>>

Demo on process 1:

>>> import oneflow.experimental as flow
>>> import numpy as np
>>> ndarr = np.asarray([[1,-4, 5], [2,-3,7]])
>>> x = flow.Tensor(ndarr)
>>> y = x.to("cuda")
>>> y
tensor([[ 1., -4.,  5.],
        [ 2., -3.,  7.]], device='cuda:1', dtype=oneflow.float32)
>>> p = flow.placement("cuda", {0:range(2)})
>>> z = y.to_consistent([flow.sbp.broadcast], p)
>>> z.to_local()
tensor([[7., 9., 5.],
        [3., 6., 9.]], device='cuda:1', dtype=oneflow.float32)
>>> m = y.to_consistent([flow.sbp.split(0)], p)
>>> m.shape
flow.Size([4, 3])
>>> m.to_local()
tensor([[ 1., -4.,  5.],
        [ 2., -3.,  7.]], device='cuda:1', dtype=oneflow.float32)
>>> n = y.to_consistent([flow.sbp.partial_sum], p)
>>> n.to_local()
tensor([[0., 0., 0.],
        [0., 0., 0.]], device='cuda:1', dtype=oneflow.float32)
>>>
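
The two transcripts above can be summarized with a small NumPy simulation. This is only a sketch that reproduces the printed outputs; it does not call OneFlow and is not how `to_consistent` is implemented. The helper names `simulated_to_local` and `simulated_logical_shape` are hypothetical, and treating rank 0 as the source for `broadcast` and `partial_sum` is an assumption read off the transcripts.

```python
import numpy as np

# Local tensors held by each process in the transcripts above.
LOCAL_BY_RANK = {
    0: np.array([[7, 9, 5], [3, 6, 9]], dtype=np.float32),
    1: np.array([[1, -4, 5], [2, -3, 7]], dtype=np.float32),
}


def simulated_to_local(sbp, rank, local_by_rank, root=0):
    """Hypothetical helper (not a OneFlow API): the value that rank's
    to_local() prints in the transcripts for a given sbp."""
    x = local_by_rank[rank]
    if sbp == "broadcast":
        # Every rank ends up with the root rank's data (assumed root=0).
        return local_by_rank[root].copy()
    if sbp == "split(0)":
        # Each rank keeps its own rows as one shard along axis 0.
        return x.copy()
    if sbp == "partial_sum":
        # Root keeps its data and the other ranks hold zeros, so the
        # element-wise sum over ranks equals the root's tensor.
        return x.copy() if rank == root else np.zeros_like(x)
    raise ValueError(f"unknown sbp: {sbp}")


def simulated_logical_shape(sbp, local_by_rank):
    """Hypothetical helper: shape of the consistent tensor, as in m.shape."""
    shapes = [a.shape for a in local_by_rank.values()]
    if sbp == "split(0)":
        # Shards are concatenated along axis 0.
        return (sum(s[0] for s in shapes),) + shapes[0][1:]
    return shapes[0]


for rank in (0, 1):
    for sbp in ("broadcast", "split(0)", "partial_sum"):
        print(rank, sbp, simulated_to_local(sbp, rank, LOCAL_BY_RANK).tolist())
print(simulated_logical_shape("split(0)", LOCAL_BY_RANK))
```

In this reading, `split(0)` is the only case where the logical shape differs from the local shape, which matches the `flow.Size([4, 3])` printed on both processes.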

…support_tensor_to/to_local

Conflicts:
	oneflow/core/framework/op_expr.cpp
	oneflow/core/framework/tensor.h
@clackhan clackhan requested review from oneflow-ci-bot and removed request for oneflow-ci-bot July 5, 2021 02:11
@github-actions
Contributor

github-actions bot commented Aug 2, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 139.3ms (= 6966.1ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.8ms (= 6340.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 139.3ms / 126.8ms)

PyTorch resnet50 time: 85.4ms (= 4271.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.1ms (= 3707.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 85.4ms / 74.1ms)

PyTorch resnet50 time: 57.8ms (= 2888.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.2ms (= 2362.0ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 57.8ms / 47.2ms)

PyTorch resnet50 time: 47.7ms (= 2386.4ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.6ms (= 2078.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 47.7ms / 41.6ms)

PyTorch resnet50 time: 43.5ms (= 2174.4ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 45.6ms (= 2280.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.95 (= 43.5ms / 45.6ms)
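
For readers checking the numbers, the arithmetic in these speed stats can be reproduced as follows. This is a minimal sketch, not the CI bot's actual script; the function name `relative_speed` is hypothetical, and the 50-iteration divisor is taken from the report lines above.

```python
def relative_speed(pytorch_total_ms, oneflow_total_ms, iterations=50):
    """Return (PyTorch ms/iter, OneFlow ms/iter, PyTorch / OneFlow).

    A ratio above 1 means OneFlow took less time per iteration.
    """
    pytorch_ms = pytorch_total_ms / iterations
    oneflow_ms = oneflow_total_ms / iterations
    return pytorch_ms, oneflow_ms, pytorch_ms / oneflow_ms


# Totals copied from the batch-size 16 and batch-size 1 rows above.
for pt_total, of_total in [(6966.1, 6340.3), (2174.4, 2280.0)]:
    pt, of, ratio = relative_speed(pt_total, of_total)
    print(f"PyTorch {pt:.1f}ms, OneFlow {of:.1f}ms, relative speed {ratio:.2f}")
```

A relative speed below 1, as in the batch-size 1 row, means OneFlow was slower than PyTorch for that input shape.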

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 2, 2021 11:37
int64_t cur_parallel_id =
CHECK_JUST(parallel_desc->ParallelId4MachineDeviceId(cur_machine_id, cur_machine_id));
if (cur_parallel_id != root) {
Memset<DeviceType::kGPU>(ctx->device_ctx(), out->mut_dptr(), 0,
Contributor

What is the reason for this? The op is named reduce, but a reduce does not itself include this zeroing step.

Contributor Author

This logic should be removed; the earlier idea behind it was flawed.
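
The distinction raised in this thread can be illustrated with a NumPy simulation. It is a rough sketch under assumptions, not the kernel from this PR: both function names are hypothetical, ranks are modeled as list indices, and a sum is assumed as the reduction.

```python
import numpy as np


def reduce_sum_to_root(inputs, outputs, root=0):
    """Plain reduce: only the root's output buffer is written."""
    outputs[root][...] = np.sum(inputs, axis=0)


def reduce_sum_then_zero_non_root(inputs, outputs, root=0):
    """Reduce plus the extra step questioned in the review: non-root
    output buffers are explicitly filled with zeros."""
    reduce_sum_to_root(inputs, outputs, root)
    for rank, out in enumerate(outputs):
        if rank != root:
            out.fill(0)


inputs = [np.full((2, 3), 1.0), np.full((2, 3), 2.0)]

# A sentinel of -1 makes it visible which buffers were written.
plain = [np.full((2, 3), -1.0) for _ in inputs]
reduce_sum_to_root(inputs, plain)

zeroed = [np.full((2, 3), -1.0) for _ in inputs]
reduce_sum_then_zero_non_root(inputs, zeroed)

print(plain[1][0, 0], zeroed[1][0, 0])
```

In the simulation, the plain reduce leaves the non-root buffer at its sentinel value, while the second variant overwrites it with zeros, which is the side effect the reviewer points out does not belong to a reduce.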

@oneflow-ci-bot oneflow-ci-bot self-requested a review August 2, 2021 16:11
@github-actions
Contributor

github-actions bot commented Aug 2, 2021

CI failed, removing label automerge

@github-actions github-actions bot removed the automerge label Aug 2, 2021
@oneflow-ci-bot oneflow-ci-bot removed their request for review August 2, 2021 16:47
@github-actions
Contributor

github-actions bot commented Aug 3, 2021

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 146.5ms (= 7322.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 126.7ms (= 6337.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.16 (= 146.5ms / 126.7ms)

PyTorch resnet50 time: 84.0ms (= 4199.7ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.1ms (= 3704.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.13 (= 84.0ms / 74.1ms)

PyTorch resnet50 time: 58.6ms (= 2931.0ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.7ms (= 2384.0ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.23 (= 58.6ms / 47.7ms)

PyTorch resnet50 time: 49.8ms (= 2489.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 42.6ms (= 2132.2ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.17 (= 49.8ms / 42.6ms)

PyTorch resnet50 time: 43.1ms (= 2154.1ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 43.5ms (= 2176.6ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 0.99 (= 43.1ms / 43.5ms)

@oneflow-ci-bot oneflow-ci-bot merged commit a72c21d into master Aug 3, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the support_tensor_to/to_local branch August 3, 2021 01:32

7 participants