Improve internal row to columnar host memory by using a combined spillable buffer #10450

jbrennan333 · 2024-02-20T23:03:57Z

This improves the host memory oom handling for InternalRowToColumnarBatchIterator by using a single spillable host buffer for both the data and offsets, and then slicing it to pull out the individual host buffers whenever we need to operate on them.

This simplifies the code and makes the host memory handling work better. Previously we were allocating one host buffer and making it spillable, and then allocating another buffer and making it spillable, all inside a single withRetry block. By doing this, we were sometimes able to allocate the second only because we spilled the first, and we really need both or nothing. When we later tried to get both host buffers from the two separate spillable buffers, we more frequently hit OOM with no retries.

Prior to this change, I needed about 32GB host memory limit run q75 and q10 with ShuffleExchangeExec operator disabled on my desktop at scale 100 with 16 executor cores. With this change, I can run them with a 12GB host memory limit.

I've run unit tests and integration tests for this and tested q10 and q75 locally.
I am in the process of running the full nds benchmark with host memory restricted.

…lable buffer Signed-off-by: Jim Brennan <jimb@nvidia.com>

jbrennan333 · 2024-02-20T23:10:06Z

build

jbrennan333 · 2024-02-21T15:31:08Z

I have run power run at scale 100 on my desktop with ShuffleExchangeExec disabled (to force more row to columnar conversions) and with host memory limited to 24g. I am not seeing any ooms in InternalRowToColumnarBatchIterator, but I do see some in other places (ColumnarToRowIterator.loadNextBatch and AcceleratedColumnarToRowIterator.setupBatchAndClose). I don't see these failures if I don't disable ShuffleExchangeExec.

jbrennan333 · 2024-02-21T17:06:17Z

I was able to run the full power-run at scale 100 with host memory limited to 8g with no ooms (this was without disabling ShuffleExchangeExec, so it doesn't test this change much - just following up on my previous comment.

sameerz · 2024-02-24T01:36:34Z

Was there any performance impact seen when running the NDS tests?

jbrennan333 · 2024-02-26T15:34:42Z

Was there any performance impact seen when running the NDS tests?

This code is not normally exercised with nds, unless I disable the ShuffleExchangeExec operator. I did not do a performance comparison for this. I think it is unlikely that there would be any noticeable change except in restricted host memory situations, where this will do better because it reduces thrashing when tasks get only one of the buffers they need.

Improve internal row to columnar host memory by using a combined spil…

51fad40

…lable buffer Signed-off-by: Jim Brennan <jimb@nvidia.com>

jbrennan333 added improve reliability Features to improve reliability or bugs that severly impact the reliability of the plugin labels Feb 20, 2024

jbrennan333 self-assigned this Feb 20, 2024

jbrennan333 linked an issue Feb 20, 2024 that may be closed by this pull request

[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887

Open

jbrennan333 removed a link to an issue Feb 20, 2024

[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887

Open

jbrennan333 linked an issue Feb 20, 2024 that may be closed by this pull request

[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887

Open

jbrennan333 mentioned this pull request Feb 21, 2024

[FEA] Add Host Memory Retry for Row to Columnar Conversion #8887

Open

revans2 approved these changes Feb 21, 2024

View reviewed changes

jbrennan333 merged commit d2c9ed4 into NVIDIA:branch-24.04 Feb 21, 2024
40 of 41 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve internal row to columnar host memory by using a combined spillable buffer #10450

Improve internal row to columnar host memory by using a combined spillable buffer #10450

jbrennan333 commented Feb 20, 2024

jbrennan333 commented Feb 20, 2024

jbrennan333 commented Feb 21, 2024

jbrennan333 commented Feb 21, 2024

sameerz commented Feb 24, 2024

jbrennan333 commented Feb 26, 2024

Improve internal row to columnar host memory by using a combined spillable buffer #10450

Improve internal row to columnar host memory by using a combined spillable buffer #10450

Conversation

jbrennan333 commented Feb 20, 2024

jbrennan333 commented Feb 20, 2024

jbrennan333 commented Feb 21, 2024

jbrennan333 commented Feb 21, 2024

sameerz commented Feb 24, 2024

jbrennan333 commented Feb 26, 2024