
[BUG] If a host memory buffer is spilled, it cannot be unspilled #10004

Closed
jbrennan333 opened this issue Dec 8, 2023 · 2 comments · Fixed by #10065
Labels: bug (Something isn't working), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

@jbrennan333 (Collaborator)

Describe the bug
While testing the host memory retry code for InternalRowToColumnarBatchIterator, I found that some NDS queries were producing incorrect results.

=== Unmatch Queries: ['query10', 'query69', 'query37', 'query35', 'query82', 'query23_part1', 'query23_part2', 'query84', 'query79'] ===

After some debugging and discussion with @abellina, I found the source of the problem.
The retry code in question allocates two buffers, makes them spillable, fills them inside a withHostBufferWriteLock block, and then uses them inside a withHostBufferReadOnly block. The last part looks like this:

        int[] used = sdb.withHostBufferWriteLock( (dataBuffer) -> {
          return sob.withHostBufferWriteLock( (offsetsBuffer) -> {
            return fillBatch(dataBuffer, offsetsBuffer);
          });
        });
        batchAndRange = sdb.withHostBufferReadOnly( (dataBuffer) -> {
          return sob.withHostBufferReadOnly( (offsetsBuffer) -> {
            // do stuff
          });
        });

The problem occurs when one or both of the buffers are spilled before we enter the withHostBufferWriteLock block.
When the spill happens, the RapidsBuffer associated with the RapidsBufferHandle (in the SpillableHostBuffers) switches from a RapidsHostMemoryBuffer to a RapidsDiskBuffer. So when we enter the write block, the RapidsDiskBuffer allocates a new HostMemoryBuffer and copies into it from disk (all zeros, because we spilled before filling it). This host buffer is owned by the RapidsDiskBuffer. We then fill this buffer with useful data.

When we exit the write block, this host buffer is closed. So when we enter the read block, we get the same RapidsDiskBuffer, and because its host buffer is closed, it reloads from disk, which is still all zeros.
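To make the failure mode concrete, here is a toy model of the sequence (illustrative class and method names only, not the actual plugin API):

    import java.util.function.Function;

    /** Toy model of the bug: the disk tier hands out a transient host copy,
     *  and writes made through that copy never reach the spill file. */
    class ToyDiskBuffer {
      private final byte[] spillFile; // stands in for the on-disk spill file

      ToyDiskBuffer(byte[] spilledContents) { this.spillFile = spilledContents; }

      <T> T withHostBuffer(Function<byte[], T> body) {
        // "Unspill": copy the disk contents into a fresh, transient host buffer.
        byte[] hostCopy = spillFile.clone();
        // The caller may mutate hostCopy, but the copy is discarded ("closed")
        // on return; mutations are never written back to the spill file.
        return body.apply(hostCopy);
      }

      public static void main(String[] args) {
        // The buffer was spilled before it was filled, so the spill file is zeros.
        ToyDiskBuffer buf = new ToyDiskBuffer(new byte[4]);
        buf.withHostBuffer(b -> { b[0] = 42; return null; }); // "write lock" block fills data
        byte[] seen = buf.withHostBuffer(b -> b.clone());     // "read only" block reloads from disk
        System.out.println(seen[0]); // prints 0, not 42: the fill was lost
      }
    }

In the real code, the write block and the read block each trigger a separate load from disk and nothing ever writes the filled data back, which matches the debug output in the comment below.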

Steps/Code to reproduce bug
I was able to reproduce this by running NDS query10 at scale factor 100 on my desktop with the following configs:

 --conf spark.rapids.sql.exec.ShuffleExchangeExec=false \
 --conf spark.rapids.memory.host.offHeapLimit.enabled=true \
 --conf spark.rapids.memory.host.offHeapLimit.size=32G \

I was running with 16 executor cores.

Expected behavior
We need to figure out where to actually unspill HostMemoryBuffers that have been spilled.
Instead of using a local host buffer inside the RapidsDiskBuffer, we might need to allocate a new RapidsHostMemoryBuffer and unspill into it, perhaps similar to what RapidsBufferStore.getDeviceMemoryBuffer does.
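A minimal sketch of that direction, continuing the toy model from the description (hypothetical names; the actual fix landed in #10065):

    /** Sketch of unspilling: instead of a private transient copy owned by the
     *  disk tier, promote the buffer back to a tracked host tier, roughly
     *  analogous to what RapidsBufferStore.getDeviceMemoryBuffer does for
     *  device memory. Illustrative only. */
    class ToyTieredBuffer {
      private byte[] hostTier; // non-null => buffer currently lives in host memory
      private byte[] diskTier; // non-null => buffer currently lives on disk

      ToyTieredBuffer(byte[] spilledContents) { this.diskTier = spilledContents; }

      byte[] getHostBuffer() {
        if (hostTier == null) {
          hostTier = diskTier.clone(); // unspill into a new host-tier buffer
          diskTier = null;             // host tier is now the buffer of record, so
        }                              // later reads (and re-spills) see new writes
        return hostTier;
      }
    }

With this shape, the write block and the read block both see the same host-tier buffer, so the filled data is not lost.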

Environment details (please complete the following information)
I ran this on my local RTX4000 desktop.

Additional context
We can work around this issue in InternalRowToColumnarBatchIterator by just using a single write block, as sketched below. I will probably do this as part of adding split handling.
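A sketch of that workaround, reusing the names from the snippet in the description (doStuff is a hypothetical stand-in for the logic that was in the read-only block):

    // Do the fill and the subsequent use under a single write lock, so the
    // unspilled host copy is never closed (and re-read from disk) in between.
    batchAndRange = sdb.withHostBufferWriteLock((dataBuffer) -> {
      return sob.withHostBufferWriteLock((offsetsBuffer) -> {
        int[] used = fillBatch(dataBuffer, offsetsBuffer);
        return doStuff(dataBuffer, offsetsBuffer, used); // was "// do stuff"
      });
    });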

@jbrennan333 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Dec 8, 2023
@jbrennan333 (Collaborator, Author)

Here is the debug output that shows the problem (I added some logs):

23/12/08 20:54:13 DEBUG RapidsHostMemoryStore: Adding host buffer for: [id=TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), size=1073741824, uncompressed=1073741824, meta_id=0, meta_size=1073741824]
23/12/08 20:54:13 DEBUG RapidsHostMemoryStore: Buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) is spillable. total=32212254776 spillable=9663676416
23/12/08 20:54:14 DEBUG RapidsHostMemoryStore: Spilling buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88). size=1073741824 total=32212254776, new spillable=12348030988
23/12/08 20:54:14 DEBUG RapidsDiskStore: Spilled to /tmp/blockmgr-439383f8-22de-48b1-963f-6df7ddd22cb5/39/temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88 0:5242901
23/12/08 20:54:14 DEBUG RapidsBufferCatalog: Spilled TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from tier host memory. Removing. Registering TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) Some(local disk buffer size=1073741824)
23/12/08 20:54:21 INFO SpillableHostBuffer: withHostBufferWriteLock: Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) at tier local disk for com.nvidia.spark.rapids.RapidsBufferCatalog$RapidsBufferHandleImpl@39d5a5a7
23/12/08 20:54:21 INFO RapidsDiskStore$RapidsDiskBuffer: getMemoryBuffer: Loading Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from disk
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: Close() for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), hostbuffer refcount: 1
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: close(): Closing hostbuffer for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)
23/12/08 20:54:23 INFO SpillableHostBuffer: withHostBufferReadonly: Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) at tier local disk for com.nvidia.spark.rapids.RapidsBufferCatalog$RapidsBufferHandleImpl@39d5a5a7
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: getMemoryBuffer: Loading Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from disk
23/12/08 20:54:24 INFO RapidsDiskStore$RapidsDiskBuffer: Close() for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), hostbuffer refcount: 1
23/12/08 20:54:24 INFO RapidsDiskStore$RapidsDiskBuffer: close(): Closing hostbuffer for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)
23/12/08 20:54:24 DEBUG RapidsBufferCatalog: Removing buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)

@jbrennan333 (Collaborator, Author)

A related issue is that the host buffer inside the RapidsDiskBuffer is allocated outside the HostAlloc framework, so it increases how much memory we are using, but it is not counted against our limits.
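To illustrate the accounting gap with a toy example (hypothetical allocator; the plugin's real HostAlloc API differs):

    /** Toy tracked allocator: allocations that go through it count against a
     *  limit; allocations that bypass it (like the disk tier's transient host
     *  buffer in this analogy) consume real memory the limit never sees. */
    class ToyTrackedAllocator {
      private final long limit;
      private long used;

      ToyTrackedAllocator(long limit) { this.limit = limit; }

      synchronized byte[] allocate(int size) {
        if (used + size > limit) {
          throw new IllegalStateException("host memory limit exceeded");
        }
        used += size;
        return new byte[size];
      }
      // The untracked path in this analogy is a bare `new byte[size]`:
      // memory use grows, but `used` (and thus the limit) never reflects it.
    }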

@mattahrens added the reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) label and removed the ? - Needs Triage (Need team to review and classify) label on Dec 12, 2023