Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

With a single row GpuExplode tries to split the generator array #10131

Merged
merged 9 commits into from
Jan 11, 2024

Conversation

abellina
Copy link
Collaborator

Closes #10088

The following PR adds special handling in GpuGenerateExec where a single row (cols + list to split) could be split in case of OOM.

The approach is very simple and leads to several retries, we simply split the column we are exploding (this would be a single list), in two halves. These are then attempted again, and following split and retry semantics, they can be split again until we are either successful, or the exploding column has 1 entry (we cannot split a single row with 1 array element).

@abellina
Copy link
Collaborator Author

build

@abellina abellina marked this pull request as ready for review December 29, 2023 21:35
@abellina
Copy link
Collaborator Author

build

@sameerz sameerz added the bug Something isn't working label Dec 29, 2023
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with @revans2 and he mentioned uses like posexplode might cause issues since the positions would need fixup after this mid-row split logic.

Comment on lines 886 to 888
withResource(tbl.getColumn(col).getScalarElement(0)) { sc =>
ColumnVector.fromScalar(sc, 1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we're getting the first (and only) row of a column and then building a new column from that. Why isn't this just incref of the original table column without copying anything?

Comment on lines 881 to 883
colToReconstitute.reduce(
ReductionAggregation.collectList(), DType.LIST)) { sc =>
ColumnVector.fromScalar(sc, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a roundabout way to build a list column. We already have the child column, we just need a simple offset vector of two rows, 0 and child column size, and then build a list column from that.

@abellina
Copy link
Collaborator Author

Thanks for the review @jlowe and @revans2. I confirmed this approach does not work for posexplode => the position column is added early. Good news is that we'll have a test to cover it.

@abellina abellina marked this pull request as draft December 29, 2023 23:16
@kuhushukla
Copy link
Collaborator

One other consideration would be how this change performs wrt to the CPU and what(if) we need to change any recommendations to get close to or better than the CPU perf.

@abellina
Copy link
Collaborator Author

build

@abellina
Copy link
Collaborator Author

build

Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general it looks good.

@@ -367,7 +371,7 @@ abstract class GpuExplodeBase extends GpuUnevaluableUnaryExpression with GpuGene
val inputRows = inputBatch.numRows()

// if the number of input rows is 1 or less, cannot split
if (inputRows <= 1) return Array()
//if (inputRows <= 1) return Array()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see us push inputSplitIndices into the individual generate expressions. But that can be follow on as it really is just tech debt.

Also why is the row commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch, that should not be commented out.

I'll look into the inputSplitIndices refactor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed this for the refactor of inputSplitIndices and more... #10187. I don't want to block this issue further.

inputTable.explodeOuterPosition(genOffset)
} else {
inputTable.explodePosition(genOffset)
}
// the position column is at genOffset and the exploded column at genOffset + 1
val posColToFix = toFixUp.getColumn(genOffset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit toFixUp needs to either be under withResource, or at least a closeOnException.

@abellina
Copy link
Collaborator Author

build

@abellina
Copy link
Collaborator Author

Note that the integration tests don't exercise the split path because it isn't setup correctly. I am going to punt on split integration tests. I do have unit tests in this PR that exercise the path.

Filed this as follow on work: #10185

@abellina abellina marked this pull request as ready for review January 11, 2024 02:13
@abellina
Copy link
Collaborator Author

@revans2 @jlowe this is ready for another look.

@kuhushukla the approach taken here is purely on retry and we are recovering from the cuDF column size limit exception. In order to make this perform better, we can change our size calculation to account for the cuDF column size limit up front. Lets add this as part of a refactor of the generator split code. That code only knows how to split by rows right now and that was a bigger change, so we decided to do the retry way instead. If we can split the input not just by rows, but within rows (before we retry), we can generalize some of this code and avoid several retry attempts.

jlowe
jlowe previously approved these changes Jan 11, 2024
Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor code cleanup suggestion, otherwise lgtm.

@abellina
Copy link
Collaborator Author

build

@abellina abellina merged commit 97bc92a into NVIDIA:branch-24.02 Jan 11, 2024
40 checks passed
@abellina abellina deleted the split_single_row_in_generate branch January 11, 2024 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] GpuExplode single row split to fit cuDF limits
5 participants