[hybird optim] reduce rend/recv times for recompute #34248

Merged (1 commit) on Jul 20, 2021

@FeixLiu FeixLiu commented Jul 19, 2021

PR types

Performance optimization

PR changes

Others

Describe

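We observed that a recompute var is sent in both the forward pass and the recompute pass, which produces a stage0 Backward --send--> stage1 Forward(recompute) dependency. After this optimization, the var is sent only once, during the forward pass; in the backward pass, the value of the forward var is simply assigned to the recompute var.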
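The change described above can be pictured with a small planning sketch. This is not Paddle's implementation: `Op`, `plan_naive`, `plan_optimized`, and the var names `h` and `h.recompute` are hypothetical and only illustrate replacing the second cross-stage transfer with a local assign.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    kind: str  # "send", "recv", or "assign"
    src: str   # name of the var being read
    dst: str   # name of the var being written
    role: str  # "forward" or "backward"


def plan_naive(var: str, recompute_var: str) -> List[Op]:
    """Baseline: the value crosses the stage boundary in both passes."""
    return [
        Op("send", var, var, "forward"),
        Op("recv", var, var, "forward"),
        Op("send", recompute_var, recompute_var, "backward"),
        Op("recv", recompute_var, recompute_var, "backward"),
    ]


def plan_optimized(var: str, recompute_var: str) -> List[Op]:
    """Optimized: one transfer in the forward pass, then a local copy."""
    return [
        Op("send", var, var, "forward"),
        Op("recv", var, var, "forward"),
        # The receiving stage already holds `var`, so the recompute var
        # is filled by a local assign instead of a second send/recv.
        Op("assign", var, recompute_var, "backward"),
    ]


def count_sends(plan: List[Op]) -> int:
    """Number of cross-stage sends in a plan."""
    return sum(op.kind == "send" for op in plan)


naive = plan_naive("h", "h.recompute")
optimized = plan_optimized("h", "h.recompute")
print(count_sends(naive), count_sends(optimized))
```

The sketch assumes the receiving stage keeps the forward var alive until the backward pass, which is what makes the local assign possible.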
| baseline | baseline throughput | optim throughput | gain |
| --- | --- | --- | --- |
| all send | 47765 | 49067 | +2.7% |
| partial send | 48851 | 49067 | +0.4% |

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

'out_shape': var_shape,
'dtype': var.dtype,
self._op_device_key: cur_dev,
self._op_role_key: op_role,

Please add an assertion afterwards that op_role is backward in the recompute case.
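One possible shape for the requested check is sketched below. `OpRole` and `check_recompute_role` are hypothetical stand-ins, not Paddle's actual enum or the code that was eventually merged.

```python
from enum import Enum


class OpRole(Enum):
    # Hypothetical stand-in for the framework's op-role values.
    FORWARD = 0
    BACKWARD = 1


def check_recompute_role(is_recompute_var: bool, op_role: OpRole) -> None:
    """Fail fast if a recompute var is handled outside the backward pass."""
    if is_recompute_var:
        assert op_role is OpRole.BACKWARD, (
            f"recompute var must be handled with backward op_role, got {op_role}"
        )


check_recompute_role(True, OpRole.BACKWARD)   # passes
check_recompute_role(False, OpRole.FORWARD)   # not a recompute var, no check
```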

@@ -4867,15 +4900,13 @@ def _insert_send_recv(cur_id, prev_id):
})
extra_index_info['index'] += 1
insert_index = None

Why were these blank lines removed?

@@ -17,6 +17,7 @@ list(APPEND DIST_TEST_OPS test_parallel_dygraph_sparse_embedding)
list(APPEND DIST_TEST_OPS test_parallel_dygraph_sparse_embedding_over_height)
list(APPEND DIST_TEST_OPS test_parallel_dygraph_transformer)
list(APPEND DIST_TEST_OPS test_fleet_pipeline_meta_optimizer)
list(APPEND DIST_TEST_OPS test_fleet_pipeline_meta_optimizer_with_recompute)

Would anything break if the unit test were added to the existing test_fleet_pipeline_meta_optimizer instead?

@wangxicoding wangxicoding left a comment

LGTM

@wangxicoding wangxicoding merged commit 3a5f1f2 into PaddlePaddle:develop Jul 20, 2021
@FeixLiu FeixLiu deleted the recomput_optim branch July 20, 2021 02:20
@FeixLiu FeixLiu changed the title from "[hybird optim] reduce rend/recv times for recompute, test=develop" to "[hybird optim] reduce rend/recv times for recompute" on Oct 11, 2021