[Performance] Redirect AllocWorkspace to PyTorch's allocator if available #4199

Merged: 9 commits merged into dmlc:master on Jul 7, 2022

Conversation

@yaox12 (Collaborator) commented Jul 1, 2022

Description

Related issues: #3933 #3957.

Redirect AllocWorkspace/FreeWorkspace to PyTorch's allocator via raw_alloc and raw_delete.
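
For reference, here is a minimal sketch of the redirection (assumptions: a libtorch build with CUDA; the wrapper names are illustrative, not DGL's actual symbols):

```cpp
// Sketch only: back workspace allocations with PyTorch's caching allocator.
#include <cstddef>

#include <c10/cuda/CUDACachingAllocator.h>

void* CudaAllocWorkspace(std::size_t nbytes) {
  // The block comes out of PyTorch's cache, so it is counted by
  // torch.cuda.memory_allocated()/memory_reserved() on the Python side.
  return c10::cuda::CUDACachingAllocator::raw_alloc(nbytes);
}

void CudaFreeWorkspace(void* ptr) {
  // Memory from raw_alloc has no automatic lifetime; it must be returned
  // through raw_delete.
  c10::cuda::CUDACachingAllocator::raw_delete(ptr);
}
```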

I ran examples/pytorch/graphsage/node_classification.py and got the following GPU memory footprints:

|                          | nvidia-smi | max_allocated | allocated | max_reserved | reserved |
|--------------------------|-----------:|--------------:|----------:|-------------:|---------:|
| new allocator + pure_gpu | 10629      | 9944          | 5542      | 9971         | 9971     |
| old allocator + pure_gpu | 10645      | 7930          | 5243      | 7958         | 7958     |
| new allocator + uva      | 2531       | 550           | 266       | 1480         | 1480     |
| old allocator + uva      | 2591       | 550           | 265       | 1241         | 1241     |

*The four columns on the right are reported by torch.cuda.max_memory_allocated / memory_allocated / max_memory_reserved / memory_reserved.

The total GPU memory footprints are close between the two allocators. The advantages of the new one are:

  1. Users can release the reserved GPU memory via PyTorch's APIs if they'd like to (see the sketch after this list).
  2. When PyTorch reports an OOM, users won't see a big discrepancy between the memory used by PyTorch and the GPU's capacity.
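
As a concrete illustration of point 1: since the workspace blocks now live inside PyTorch's caching allocator, emptying that cache also returns unused workspace memory to the driver. A hedged C++ sketch (the Python equivalent is torch.cuda.empty_cache()):

```cpp
// Sketch: release cached (reserved but currently unused) GPU memory
// back to the driver via PyTorch's caching allocator.
#include <c10/cuda/CUDACachingAllocator.h>

void ReleaseCachedGpuMemory() {
  // torch.cuda.empty_cache() reaches the same entry point from Python.
  c10::cuda::CUDACachingAllocator::emptyCache();
}
```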

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change
  • Related issue is referred to in this PR

@dgl-bot (Collaborator) commented Jul 1, 2022

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

(Three @dgl-bot comments were marked as outdated.)

@jermainewang added the "Release Candidate" (Candidate PRs for the upcoming release) label on Jul 4, 2022
@dgl-bot (Collaborator) commented Jul 5, 2022

Commit ID: c07819c

Build ID: 4

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@yaox12 requested a review from nv-dlasalle on July 5, 2022 at 09:32
@yaox12 (Collaborator, Author) commented Jul 6, 2022

More background on PyTorch's CUDA allocator:

  1. Call stack of the relevant objects: CudaCachingAllocator device_allocator → THCCachingAllocator caching_allocator → std::vector<std::unique_ptr<DeviceCachingAllocator>> device_allocator.

  2. There are mainly two ways in PyTorch to allocate CUDA memory (see the sketch after this list):

    1. c10::cuda::CUDACachingAllocator::get()->allocate() [code]. It uses CudaCachingAllocator device_allocator and provides life-cycle management (the memory is freed automatically when the returned DataPtr goes out of scope). Internally, it calls THCCachingAllocator caching_allocator for the actual allocation. [code]
    2. c10::cuda::CUDACachingAllocator::raw_alloc()/raw_delete() [code]. It uses THCCachingAllocator caching_allocator directly, and the memory must be freed manually via raw_delete.
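
A small sketch contrasting the two paths (assuming a libtorch build with CUDA; the size is arbitrary):

```cpp
// Sketch of the two allocation paths described above.
#include <c10/core/Allocator.h>

#include <c10/cuda/CUDACachingAllocator.h>

void TwoWaysToAllocate() {
  // (1) allocate(): returns a c10::DataPtr whose deleter frees the block
  //     automatically when it goes out of scope.
  c10::DataPtr managed = c10::cuda::CUDACachingAllocator::get()->allocate(1024);

  // (2) raw_alloc()/raw_delete(): the caller owns the raw pointer and must
  //     free it explicitly; this is the path this PR uses for workspaces.
  void* raw = c10::cuda::CUDACachingAllocator::raw_alloc(1024);
  c10::cuda::CUDACachingAllocator::raw_delete(raw);
}  // `managed` is released here by its deleter.
```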

cc @nv-dlasalle

@dgl-bot (Collaborator) commented Jul 7, 2022

Commit ID: d210032

Build ID: 5

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@dgl-bot (Collaborator) commented Jul 7, 2022

Commit ID: e2ec146

Build ID: 6

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@yaox12 merged commit 9ee7ced into dmlc:master on Jul 7, 2022
BarclayII pushed a commit to BarclayII/dgl that referenced this pull request Aug 10, 2022
@frozenbugs removed the "Release Candidate" (Candidate PRs for the upcoming release) label on Jan 11, 2023
@shintarok111 commented

I encountered the same issue as in #3933. Has the problem been resolved?
