
[BUG] If a host memory buffer is spilled, it cannot be unspilled #10004

Closed
jbrennan333 opened this issue Dec 8, 2023 · 2 comments · Fixed by #10065
Labels: bug (Something isn't working), reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin)

@jbrennan333 (Collaborator)

Describe the bug
While testing the host memory retry code for InternalRowToColumnarBatchIterator, I found that some NDS queries were producing incorrect results.

=== Unmatch Queries: ['query10', 'query69', 'query37', 'query35', 'query82', 'query23_part1', 'query23_part2', 'query84', 'query79'] ===

After some debugging and discussion with @abellina, I found the source of the problem.
The retry code in question allocates two buffers, makes them spillable, fills them inside a withHostBufferWriteLock block, and then uses them inside a withHostBufferReadOnly block. The last part looks like this:

        int[] used = sdb.withHostBufferWriteLock( (dataBuffer) -> {
          return sob.withHostBufferWriteLock( (offsetsBuffer) -> {
            return fillBatch(dataBuffer, offsetsBuffer);
          });
        });
        batchAndRange = sdb.withHostBufferReadOnly( (dataBuffer) -> {
          return sob.withHostBufferReadOnly( (offsetsBuffer) -> {
            // do stuff
          });
        });

The problem occurs when one or both of the buffers are spilled before we enter the withHostBufferWriteLock block.
When the spill happens, the RapidsBuffer associated with the RapidsBufferHandle (in the SpillableHostBuffers) switches from a RapidsHostMemoryBuffer to a RapidsDiskBuffer. So when we enter the write block, the RapidsDiskBuffer allocates a new HostMemoryBuffer and copies into it from disk (all zeros, because we spilled before filling it). This host buffer is owned by the RapidsDiskBuffer. We then fill this buffer with useful data.

When we exit the write block, this host buffer is closed. So when we enter the read block, we get the same RapidsDiskBuffer, and because its host buffer is closed, it reloads from disk, which is still all zeros.
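To make the failure mode concrete, here is a toy model of the sequence (illustrative class and method names only, not the actual plugin API):

    import java.util.function.Function;

    /** Toy model of the bug: the disk tier hands out a transient host copy,
     *  and writes made through that copy never reach the spill file. */
    class ToyDiskBuffer {
      private final byte[] spillFile; // stands in for the on-disk spill file

      ToyDiskBuffer(byte[] spilledContents) { this.spillFile = spilledContents; }

      <T> T withHostBuffer(Function<byte[], T> body) {
        // "Unspill": copy the disk contents into a fresh, transient host buffer.
        byte[] hostCopy = spillFile.clone();
        // The caller may mutate hostCopy, but the copy is discarded ("closed")
        // on return; mutations are never written back to the spill file.
        return body.apply(hostCopy);
      }

      public static void main(String[] args) {
        // The buffer was spilled before it was filled, so the spill file is zeros.
        ToyDiskBuffer buf = new ToyDiskBuffer(new byte[4]);
        buf.withHostBuffer(b -> { b[0] = 42; return null; }); // "write lock" block fills data
        byte[] seen = buf.withHostBuffer(b -> b.clone());     // "read only" block reloads from disk
        System.out.println(seen[0]); // prints 0, not 42: the fill was lost
      }
    }

In the real code, the write block and the read block each trigger a separate load from disk and nothing ever writes the filled data back, which matches the debug output in the comment below.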

Steps/Code to reproduce bug
I was able to reproduce this by running NDS query10 at scale factor 100 on my desktop with the following configs:

 --conf spark.rapids.sql.exec.ShuffleExchangeExec=false \
 --conf spark.rapids.memory.host.offHeapLimit.enabled=true \
 --conf spark.rapids.memory.host.offHeapLimit.size=32G \

I was running with 16 executor cores.

Expected behavior
We need to figure out where to actually unspill HostMemoryBuffers that have been spilled.
Instead of using a local host buffer inside the RapidsDiskBuffer, we might need to allocate a new RapidsHostMemoryBuffer and unspill into it, perhaps similar to what RapidsBufferStore.getDeviceMemoryBuffer does.
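A minimal sketch of that direction, continuing the toy model from the description (hypothetical names; the actual fix landed in #10065):

    /** Sketch of unspilling: instead of a private transient copy owned by the
     *  disk tier, promote the buffer back to a tracked host tier, roughly
     *  analogous to what RapidsBufferStore.getDeviceMemoryBuffer does for
     *  device memory. Illustrative only. */
    class ToyTieredBuffer {
      private byte[] hostTier; // non-null => buffer currently lives in host memory
      private byte[] diskTier; // non-null => buffer currently lives on disk

      ToyTieredBuffer(byte[] spilledContents) { this.diskTier = spilledContents; }

      byte[] getHostBuffer() {
        if (hostTier == null) {
          hostTier = diskTier.clone(); // unspill into a new host-tier buffer
          diskTier = null;             // host tier is now the buffer of record, so
        }                              // later reads (and re-spills) see new writes
        return hostTier;
      }
    }

With this shape, the write block and the read block both see the same host-tier buffer, so the filled data is not lost.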

Environment details (please complete the following information)
I ran this on my local RTX4000 desktop.

Additional context
We can work around this issue in InternalRowToColumnarBatchIterator by just using a single write block, as sketched below. I will probably do this as part of adding split handling.
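A sketch of that workaround, reusing the names from the snippet in the description (doStuff is a hypothetical stand-in for the logic that was in the read-only block):

    // Do the fill and the subsequent use under a single write lock, so the
    // unspilled host copy is never closed (and re-read from disk) in between.
    batchAndRange = sdb.withHostBufferWriteLock((dataBuffer) -> {
      return sob.withHostBufferWriteLock((offsetsBuffer) -> {
        int[] used = fillBatch(dataBuffer, offsetsBuffer);
        return doStuff(dataBuffer, offsetsBuffer, used); // was "// do stuff"
      });
    });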

@jbrennan333 added the bug (Something isn't working) and ? - Needs Triage (Need team to review and classify) labels on Dec 8, 2023
@jbrennan333 (Collaborator, Author)

Here is the debug output that shows the problem (I added some logs):

23/12/08 20:54:13 DEBUG RapidsHostMemoryStore: Adding host buffer for: [id=TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), size=1073741824, uncompressed=1073741824, meta_id=0, meta_size=1073741824]
23/12/08 20:54:13 DEBUG RapidsHostMemoryStore: Buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) is spillable. total=32212254776 spillable=9663676416
23/12/08 20:54:14 DEBUG RapidsHostMemoryStore: Spilling buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88). size=1073741824 total=32212254776, new spillable=12348030988
23/12/08 20:54:14 DEBUG RapidsDiskStore: Spilled to /tmp/blockmgr-439383f8-22de-48b1-963f-6df7ddd22cb5/39/temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88 0:5242901
23/12/08 20:54:14 DEBUG RapidsBufferCatalog: Spilled TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from tier host memory. Removing. Registering TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) Some(local disk buffer size=1073741824)
23/12/08 20:54:21 INFO SpillableHostBuffer: withHostBufferWriteLock: Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) at tier local disk for com.nvidia.spark.rapids.RapidsBufferCatalog$RapidsBufferHandleImpl@39d5a5a7
23/12/08 20:54:21 INFO RapidsDiskStore$RapidsDiskBuffer: getMemoryBuffer: Loading Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from disk
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: Close() for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), hostbuffer refcount: 1
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: close(): Closing hostbuffer for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)
23/12/08 20:54:23 INFO SpillableHostBuffer: withHostBufferReadonly: Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) at tier local disk for com.nvidia.spark.rapids.RapidsBufferCatalog$RapidsBufferHandleImpl@39d5a5a7
23/12/08 20:54:23 INFO RapidsDiskStore$RapidsDiskBuffer: getMemoryBuffer: Loading Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88) from disk
23/12/08 20:54:24 INFO RapidsDiskStore$RapidsDiskBuffer: Close() for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88), hostbuffer refcount: 1
23/12/08 20:54:24 INFO RapidsDiskStore$RapidsDiskBuffer: close(): Closing hostbuffer for Rapids Buffer ID TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)
23/12/08 20:54:24 DEBUG RapidsBufferCatalog: Removing buffer TempSpillBufferId(561,temp_local_11f3357e-1678-40a5-a75f-d2e941a71d88)

@jbrennan333 (Collaborator, Author)

A related issue is that the host buffer inside the RapidsDiskBuffer is allocated outside the HostAlloc framework, so it increases how much memory we are using, but it is not counted against our limits.
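To illustrate the accounting gap with a toy example (hypothetical allocator; the plugin's real HostAlloc API differs):

    /** Toy tracked allocator: allocations that go through it count against a
     *  limit; allocations that bypass it (like the disk tier's transient host
     *  buffer in this analogy) consume real memory the limit never sees. */
    class ToyTrackedAllocator {
      private final long limit;
      private long used;

      ToyTrackedAllocator(long limit) { this.limit = limit; }

      synchronized byte[] allocate(int size) {
        if (used + size > limit) {
          throw new IllegalStateException("host memory limit exceeded");
        }
        used += size;
        return new byte[size];
      }
      // The untracked path in this analogy is a bare `new byte[size]`:
      // memory use grows, but `used` (and thus the limit) never reflects it.
    }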

@mattahrens added the reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) label and removed the ? - Needs Triage (Need team to review and classify) label on Dec 12, 2023