
Add GDS-compatible allocator with 4k alignment. #3754

Merged 1 commit into NVIDIA:main on Mar 24, 2022

Conversation

@mzient (Contributor) commented Mar 23, 2022

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Category:

New feature (non-breaking change which adds functionality)

Description:

GDS is more efficient when mapped memory is aligned to a 4k boundary, and it does not work with memory allocated with cuMemCreate.
This PR adds a dedicated GDS memory pool, which is kept separate and used only for that purpose.
As a preparatory step, pool_resource was extended so that it can be informed about the maximum alignment supported by the upstream resource and work around that limitation when allocating upstream blocks.
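The workaround for a limited upstream alignment can be sketched in isolation: when the requested alignment exceeds what the upstream can guarantee, over-allocate and align the pointer inside the larger block. This is a hypothetical illustration, not DALI's actual pool_resource code; the function name and the upstream callback are assumptions, and a real pool would also record the original pointer so the block can be deallocated later.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Sketch: satisfy an overaligned request from an upstream that only
// guarantees max_upstream_alignment (both alignments assumed powers of two).
void *alloc_overaligned(size_t bytes, size_t alignment,
                        size_t max_upstream_alignment,
                        void *(*upstream_alloc)(size_t)) {
  if (alignment <= max_upstream_alignment)
    return upstream_alloc(bytes);  // upstream alignment already suffices
  // Pad the request so that some address inside the block is aligned
  // to `alignment`, given the upstream's weaker guarantee.
  size_t padded = bytes + alignment - max_upstream_alignment;
  char *raw = static_cast<char *>(upstream_alloc(padded));
  uintptr_t p = reinterpret_cast<uintptr_t>(raw);
  uintptr_t aligned = (p + alignment - 1) & ~(uintptr_t(alignment) - 1);
  // A real pool resource would remember `raw` for deallocation.
  return reinterpret_cast<void *>(aligned);
}
```

The padding term `alignment - max_upstream_alignment` is the worst-case distance between the upstream pointer and the next suitably aligned address, which is why the pool needs to know the upstream's alignment limit explicitly.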

Additional information:

Affected modules and functionalities:

  • Pool resource
  • NumPy reader (GPU)

Key points relevant for the review:

N/A

Checklist

Tests

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2667

@dali-automaton (Collaborator): CI MESSAGE: [4214574]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [4214574]: BUILD PASSED

@@ -32,6 +32,7 @@ void TestPoolResource(int num_iter) {
test_host_resource upstream;
{
auto opt = default_host_pool_opts();
opt.max_upstream_alignment = 32; // force the use of overaligned upstream allocations
Contributor:

Why do we need that change?

@mzient (Author):

The distribution of alignments in this test is 1-256. We need something smaller to test the code path with overalignment. I want this to be explicit, in case the default value (which is 256) is changed.

Contributor:

OK, this is just for the test; never mind.

GDSAllocator::GDSAllocator() {
// Currently, GPUDirect Storage can work only with memory allocated with cudaMalloc and
// cuMemAlloc. Since DALI is transitioning to CUDA Virtual Memory Management for memory
// allocation, we need a special allocator that's compatible with GDS.
Contributor:

How is that achieved? Is it sufficient to just use coalescing_free_tree (as I understand, it still uses CUDA Virtual Memory Management)?

@mzient (Author):

See the next line:

static auto upstream = std::make_shared<mm::cuda_malloc_memory_resource>();

cuda_malloc_resource uses plain cudaMalloc.

@jantonguirao jantonguirao assigned jantonguirao and unassigned klecki Mar 24, 2022
char *block_end = block_start + blk_size;
assert(tail <= block_end);

if (blk_size != bytes) {
Contributor:
Suggested change
if (blk_size != bytes) {
if (blk_size > bytes) {

According to what the comment says?

@mzient (Author), Mar 24, 2022:

Well, it can't be less :)
If anything, there could be an assert(blk_size > bytes); inside, but wouldn't that be overkill?

@mzient (Author):

Actually, it's already indirectly tested by assert(tail <= block_end);

lock_guard guard(lock_);
free_list_.put(static_cast<char *>(new_block) + bytes, blk_size - bytes);
return new_block;
if (ret != block_start)
Contributor:
Suggested change
if (ret != block_start)
if (ret > block_start)

if (ret != block_start)
free_list_.put(block_start, ret - block_start);

if (tail != block_end)
Contributor:
Suggested change
if (tail != block_end)
if (tail < block_end)
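The suggestions above all touch the same block-splitting path: an aligned sub-range is carved out of an upstream block, and any leading and trailing remainders go back to the free list, which makes strict inequalities the natural form of the checks. A self-contained sketch of that bookkeeping, with a hypothetical FreeList standing in for DALI's actual free_list_ and a made-up carve() helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical free list: just records (pointer, size) ranges.
struct FreeList {
  std::vector<std::pair<char *, size_t>> ranges;
  void put(char *ptr, size_t size) { ranges.emplace_back(ptr, size); }
};

// Carve a `bytes`-long, `alignment`-aligned range out of
// [block_start, block_start + blk_size) and return the unused leading
// and trailing parts to the free list (alignment assumed a power of two).
char *carve(char *block_start, size_t blk_size, size_t bytes,
            size_t alignment, FreeList &fl) {
  uintptr_t p = reinterpret_cast<uintptr_t>(block_start);
  uintptr_t a = (p + alignment - 1) & ~(uintptr_t(alignment) - 1);
  char *ret = reinterpret_cast<char *>(a);
  char *tail = ret + bytes;
  char *block_end = block_start + blk_size;
  assert(tail <= block_end);       // the block must fit the aligned range
  if (ret != block_start)          // leading slack (ret can only be >=)
    fl.put(block_start, ret - block_start);
  if (tail != block_end)           // trailing slack (tail can only be <=)
    fl.put(tail, block_end - tail);
  return ret;
}
```

Because ret >= block_start and tail <= block_end always hold here, `!=` and the suggested `>` / `<` forms are equivalent; the discussion is purely about which reads more clearly.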

@mzient mzient merged commit 1ac7e7d into NVIDIA:main Mar 24, 2022
@JanuszL JanuszL mentioned this pull request Mar 30, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022
cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022