
[Performance][Optimizer] Enable using UVA and FP16 with SparseAdamOptimizer #3885

Merged
merged 27 commits into dmlc:master from uva_embedding
Jun 24, 2022

Conversation

nv-dlasalle
Collaborator

@nv-dlasalle nv-dlasalle commented Mar 25, 2022

Description

This PR enables transferring optimizer states via UVA (by default), as well as storing them in FP16 (requires opt-in). While there are many factors, combining both of these optimizations improves performance in the backward pass by about 2x and significantly cuts down on memory usage (both from FP16, and from not needing to allocate buffer tensors to copy to and from the GPU).

This depends on #3997 to ensure the UVA arrays get properly freed when the optimizer is destroyed.
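
To illustrate the FP16 opt-in, here is a minimal NumPy sketch (names and shapes are illustrative, not DGL's actual implementation) of a sparse Adam step that stores the moment tensors in FP16 but computes the update in FP32, which is what halves the state memory without degrading the update arithmetic:

```python
import numpy as np

def sparse_adam_step(params, m, v, idx, grad, step, lr=1e-3,
                     betas=(0.9, 0.999), eps=1e-8):
    """Update only the rows in `idx`; moments m/v may be stored in fp16."""
    b1, b2 = betas
    # Read the touched state rows, computing in fp32 regardless of storage dtype.
    m_rows = m[idx].astype(np.float32) * b1 + (1 - b1) * grad
    v_rows = v[idx].astype(np.float32) * b2 + (1 - b2) * grad ** 2
    m[idx] = m_rows.astype(m.dtype)  # write back in the storage dtype
    v[idx] = v_rows.astype(v.dtype)
    # Bias-corrected estimates, then the parameter update, all in fp32.
    m_hat = m_rows / (1 - b1 ** step)
    v_hat = v_rows / (1 - b2 ** step)
    params[idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)

rows, dim = 1000, 16
params = np.zeros((rows, dim), dtype=np.float32)
# Opt-in fp16 state: half the bytes of the equivalent fp32 moments.
m = np.zeros((rows, dim), dtype=np.float16)
v = np.zeros((rows, dim), dtype=np.float16)
idx = np.array([3, 7])
grad = np.ones((2, dim), dtype=np.float32)
sparse_adam_step(params, m, v, idx, grad, step=1)
```

Only the rows touched by `idx` are read, updated, and written back, which is what makes the optimizer "sparse"; the fp16 storage simply changes the dtype of the write-back.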

Checklist

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [$CATEGORY] (such as [NN], [Model], [Doc], [Feature])
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented
  • To the best of my knowledge, examples are either not affected by this change,
    or have been fixed to be compatible with this change

Changes

  • Adds a "Scatter" series of UVA functions for scattering data from the GPU into pinned CPU memory.
  • Adds parameters use_uva and dtype to the SparseAdamOptimizer.
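
The new "Scatter" kernels are the inverse of the existing IndexSelect (gather) path: rows computed on the GPU are written directly into a pinned host buffer at given indices. A minimal NumPy sketch of the indexing semantics (the names here are illustrative, not DGL's API; in the PR, `dest` is pinned CPU memory and `src` lives on the GPU, so the writes cross the bus via UVA instead of going through a staging buffer):

```python
import numpy as np

def scatter_rows(dest, idx, src):
    """Write src's rows into dest at positions idx, i.e. dest[idx[k]] = src[k]."""
    assert len(idx) == len(src)
    dest[idx] = src

state = np.zeros((8, 4), dtype=np.float32)               # stands in for pinned host state
updates = np.arange(8, dtype=np.float32).reshape(2, 4)   # stands in for GPU-side results
scatter_rows(state, np.array([5, 2]), updates)
```

Because the destination rows are addressed directly, no intermediate copy of the full state tensor is needed, which is where the memory savings in the description come from.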

@dgl-bot
Collaborator

dgl-bot commented Mar 25, 2022

To trigger regression tests:

  • @dgl-bot run [instance-type] [which tests] [compare-with-branch];
    For example: @dgl-bot run g4dn.4xlarge all dmlc/master or @dgl-bot run c5.9xlarge kernel,api dmlc/master

Review threads (all resolved) on: python/dgl/optim/pytorch/sparse_optim.py, src/array/cuda/uvm/array_index_select_uvm.cu, tests/pytorch/test_optim.py
@classicsong classicsong self-requested a review May 5, 2022 01:28
@classicsong
Contributor

LGTM


@jermainewang
Member

The test_unified_tensor UT failed:

tests/pytorch/test_unified_tensor.py::test_unified_tensor FAILED         [ 99%]
tests/pytorch/test_unified_tensor.py::test_multi_gpu_unified_tensor[1] PASSED [ 99%]
tests/pytorch/test_unified_tensor.py::test_multi_gpu_unified_tensor[2] SKIPPED [100%]

=================================== FAILURES ===================================
_____________________________ test_unified_tensor ______________________________

    @unittest.skipIf(os.name == 'nt', reason='Do not support windows yet')
    @unittest.skipIf(F.ctx().type == 'cpu', reason='gpu only test')
    def test_unified_tensor():
        test_row_size = 65536
        test_col_size = 128
    
        rand_test_size = 8192
    
        input = th.rand((test_row_size, test_col_size))
        input_unified = dgl.contrib.UnifiedTensor(input, device=th.device('cuda'))
    
        seq_idx = th.arange(0, test_row_size)
        assert th.all(th.eq(input[seq_idx], input_unified[seq_idx]))
    
        seq_idx = seq_idx.to(th.device('cuda'))
        assert th.all(th.eq(input[seq_idx].to(th.device('cuda')), input_unified[seq_idx]))
    
        rand_idx = th.randint(0, test_row_size, (rand_test_size,))
        assert th.all(th.eq(input[rand_idx], input_unified[rand_idx]))
    
>       rand_idx = rand_idx.to(th.device('cuda'))
E       RuntimeError: CUDA error: invalid argument
E       CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
E       For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

tests/pytorch/test_unified_tensor.py:36: RuntimeError

I saw you've changed the IndexSelect operator, so this is probably related.

Review thread (resolved) on: python/dgl/optim/pytorch/sparse_optim.py

@yaox12
Collaborator

yaox12 commented Jun 23, 2022

@nv-dlasalle If you don't have anything to add, I think it's OK to merge this PR.


@yaox12 yaox12 force-pushed the uva_embedding branch 2 times, most recently from 44b4c98 to b16bb40 Compare June 23, 2022 14:24

@dgl-bot
Collaborator

dgl-bot commented Jun 23, 2022

Commit ID: b16bb40

Build ID: 22

Status: ✅ CI test succeeded

Report path: link

Full logs path: link

@yaox12 yaox12 merged commit 020f024 into dmlc:master Jun 24, 2022