Local dep object pool #5953

Merged: 9 commits into master from local_dep_object_pool, Aug 19, 2021

@lixinqi (Contributor) commented Aug 18, 2021

Fixes the excessive memory overhead caused by cuda_h2d.

CHECK_OR_RETURN(!local_dep_object_pool->empty());
size_t pool_size = local_dep_object_pool->size();
static thread_local int64_t index = 0;
return local_dep_object_pool->at(index++ % pool_size).Mutable();
Contributor Author commented:

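Reusing LocalDepObject instances causes no problems and even has a benefit. For example, the cuda_h2d device prepares only 2 LocalDepObjects, so the whole compute stream runs in double-buffer mode.
You can think of LocalDepObject as the flow-control mechanism.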
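
To make the reuse argument concrete, here is a minimal sketch of a fixed-size pool whose slots are handed out in round-robin order. `DepObject` and `RoundRobinPool` are hypothetical names invented for illustration and are not OneFlow's classes; the sketch is single-threaded and keeps its counter inside the pool, whereas the reviewed snippet uses a `thread_local` counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dependency object; not OneFlow's LocalDepObject.
struct DepObject {
  int id;
};

// Hypothetical fixed-size pool that hands out its slots in round-robin order.
class RoundRobinPool {
 public:
  explicit RoundRobinPool(std::size_t size) {
    assert(size > 0);  // an empty pool cannot serve any request
    for (std::size_t i = 0; i < size; ++i) {
      slots_.push_back(DepObject{static_cast<int>(i)});
    }
  }

  // Returns the next slot; after size() calls the same objects come around again.
  DepObject* Next() {
    DepObject* slot = &slots_[next_ % slots_.size()];
    ++next_;
    return slot;
  }

  std::size_t size() const { return slots_.size(); }

 private:
  std::vector<DepObject> slots_;  // never resized after construction, so pointers stay valid
  std::size_t next_ = 0;          // number of requests served so far
};
```

With a pool of size 2, the third request returns the same object as the first. If each object serializes the work attached to it, as the comment above suggests, at most two requests can be in flight at once, which is the double-buffer behaviour being described.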

Maybe<size_t> Device::instr_local_dep_object_pool_size() const {
static const size_t kDoubleBufferPoolSize = 2;
static const HashMap<std::string, size_t> type2pool_size{
{"cpu", GetInstructionHighWaterMark()}, {"cuda", GetInstructionHighWaterMark()},
Contributor commented:

The pool size here is set to GetInstructionHighWaterMark(). Doesn't that mean execution is effectively not serialized?

Contributor Author commented:

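Of course not. You can think of the pool size here as having been infinite before, and execution was still serialized in that case. Serialization is controlled by the device's LocalDepObject member.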

Contributor commented:

Oh, I see. I had it wrong.

@oneflow-ci-bot oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot August 19, 2021 07:11
@oneflow-ci-bot oneflow-ci-bot self-requested a review August 19, 2021 08:09
@github-actions (Contributor) commented:

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.8ms (= 7041.6ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.0ms (= 6400.7ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.8ms / 128.0ms)

PyTorch resnet50 time: 85.8ms (= 4289.6ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.4ms (= 3718.9ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 85.8ms / 74.4ms)

PyTorch resnet50 time: 57.8ms (= 2889.4ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.1ms (= 2353.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.23 (= 57.8ms / 47.1ms)

PyTorch resnet50 time: 48.1ms (= 2404.0ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 41.6ms (= 2082.1ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 48.1ms / 41.6ms)

PyTorch resnet50 time: 41.7ms (= 2084.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.1ms (= 1957.2ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.07 (= 41.7ms / 39.1ms)

@oneflow-ci-bot oneflow-ci-bot merged commit 533018a into master Aug 19, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the local_dep_object_pool branch August 19, 2021 09:28