Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support "WindowGroupLimit" optimization on GPU [databricks] #10500

Merged
merged 19 commits into from
Feb 29, 2024

Conversation

mythrocks
Copy link
Collaborator

Fixes #8208.

This commit adds support for WindowGroupLimitExec to run on GPU. This optimization was added in Apache Spark 3.5, to reduce the number of rows that participate in shuffles, for queries that contain filters on the result of ranking functions. For example:

SELECT foo, bar FROM (
  SELECT foo, bar, 
         RANK() OVER (PARTITION BY foo ORDER BY bar) AS rnk
  FROM mytable )
WHERE rnk < 10

Such a query would require a shuffle to bring all rows in a window-group to be made available in the same task.
In Spark 3.5, an optimization was added in SPARK-37099 to take advantage of the rnk < 10 predicate to reduce shuffle load.
Specifically, since only 9 (i.e. 10-1) ranks participate in the window function, only those many rows need be shuffled into the task, per input batch. By pre-filtering rows that can't possibly satisfy the condition, the number of shuffled records can be reduced.

The GPU implementation (i.e. GpuWindowGroupLimitExec) differs slightly from the CPU implementation, because it needs to execute on the entire input column batch. As a result, GpuWindowGroupLimitExec runs the rank scan on each input batch, and then filters out ranks that exceed the limit specified in the predicate (rnk < 10). After the shuffle, the RANK() is calculated again by GpuRunningWindowExec, to produce the final result.

The current implementation addresses RANK() and DENSE_RANK window functions. Other ranking functions (like ROW_NUMBER()) can be added at a later date.

1. Rough idea of algo.
2. Not bothering with the 3.5 shim now.
3. NPE when running BasicWindowCalc.
Without this, the runtime assumes that the ranking function is a CPU function,
even though the GpuWindowGroupLimitExec is on the GPU.

Signed-off-by: MithunR <mythrocks@gmail.com>
@mythrocks mythrocks added the performance A performance related task/issue label Feb 26, 2024
@mythrocks mythrocks self-assigned this Feb 26, 2024
@mythrocks mythrocks marked this pull request as draft February 26, 2024 06:43
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks
Copy link
Collaborator Author

Build

@mythrocks mythrocks marked this pull request as ready for review February 28, 2024 05:19
@revans2
Copy link
Collaborator

revans2 commented Feb 28, 2024

Could we test on databricks? Just because they tend to pull things back and this feels like one of the things that they would pull back.

@mythrocks mythrocks changed the title Support "WindowGroupLimit" optimization on GPU Support "WindowGroupLimit" optimization on GPU [databricks] Feb 28, 2024
@mythrocks
Copy link
Collaborator Author

Build

@mythrocks
Copy link
Collaborator Author

I just confirmed that this change works on Databricks as well.

@mythrocks mythrocks merged commit f85d5ef into NVIDIA:branch-24.04 Feb 29, 2024
38 of 40 checks passed
@mythrocks
Copy link
Collaborator Author

I've merged this change. Thank you for the reviews and advice, @revans2.

mythrocks added a commit that referenced this pull request Mar 11, 2024
* WindowGroupLimit support for [databricks].

Fixes #10531.

This is a followup to #10500, which added support to push down window-group-limit filters before the shuffle phase.

#10500 inadvertently neglected to ensure that the optimization works on Databricks. (It turns out that window-group-limit was cherry-picked into Databricks 13.3, despite the nominal Spark version being `3.4.1`.)

This change ensures that the same optimization is available on Databricks 13.3 (and beyond).

---------

Signed-off-by: MithunR <mythrocks@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
2 participants