Skip to content

[GLUTEN-10933][VL] Introduce GPU ShuffleWriterType kGpuHashShuffle#10984

Merged
jinchengchenghh merged 8 commits intoapache:mainfrom
jinchengchenghh:gpu_refactor
Nov 3, 2025
Merged

[GLUTEN-10933][VL] Introduce GPU ShuffleWriterType kGpuHashShuffle#10984
jinchengchenghh merged 8 commits intoapache:mainfrom
jinchengchenghh:gpu_refactor

Conversation

@jinchengchenghh
Copy link
Copy Markdown
Contributor

@jinchengchenghh jinchengchenghh commented Oct 30, 2025

Related issue: #10933

@github-actions github-actions Bot added CORE works for Gluten Core VELOX CLICKHOUSE labels Oct 30, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

Copy link
Copy Markdown
Contributor

@marin-ma marin-ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just one more question: how do we currently test gpu_hash shuffle?

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Now it is tested on my local environment, because it relies on GPU, I test TPCDS Q95 for previous PR, this is just a refactor, I don't test it. I will add cpp test for GPU shuffle reader.

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zhztheplayer
Copy link
Copy Markdown
Member

BTW We have the same parameter enableCudf for Velox plan execution (See NativePlanEvaluator.java). I was hoping we could finally pass these parameters from Java to C++ using configuration options. #9785

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

Yes, use pair parameter name enableCudf would help understand the logic. What do you think about that? @marin-ma
After resolve #9785, the logic could be simple.

@marin-ma
Copy link
Copy Markdown
Contributor

@jinchengchenghh I think for shuffle, we still can stick with adding different writer type. Again the shuffle writer type is a match for different shuffle reader deserialisation, and using different writer type it's easy to distinguish them from avoiding introducing params like "isRssShuffle", "isGpuShuffle", etc.

Moreover, ShuffleWriterType.name is the display string on Spark UI. We can easily tell whether the shuffle is gpu shuffle with this change.

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

I find I also need to update shuffle writer, but shuffleWriterType does not send to native, do you have any suggestion? @marin-ma

          nativeShuffleWriter = if (isSort) {
            shuffleWriterJniWrapper.createSortShuffleWriter(
              numPartitions,
              dep.nativePartitioning.getShortName,
              GlutenShuffleUtils.getStartPartitionId(
                dep.nativePartitioning,
                taskContext.partitionId),
              conf.get(SHUFFLE_DISK_WRITE_BUFFER_SIZE).toInt,
              conf.get(SHUFFLE_SORT_INIT_BUFFER_SIZE).toInt,
              conf.get(SHUFFLE_SORT_USE_RADIXSORT),
              partitionWriterHandle
            )
          } else {
            shuffleWriterJniWrapper.createHashShuffleWriter(
              numPartitions,
              dep.nativePartitioning.getShortName,
              GlutenShuffleUtils.getStartPartitionId(
                dep.nativePartitioning,
                taskContext.partitionId),
              nativeBufferSize,
              reallocThreshold,
              partitionWriterHandle
            )
          }

@marin-ma
Copy link
Copy Markdown
Contributor

@jinchengchenghh Do you have the code for the changes in shuffle writer? I don't see any modifications to the existing shuffle writer.

@jinchengchenghh
Copy link
Copy Markdown
Contributor Author

In the followup PR, Velox bool is bit, but cudf is BOOL8, it is byte

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@marin-ma
Copy link
Copy Markdown
Contributor

@jinchengchenghh Let's discuss and address that in your next pr. Based on the discussion offline seems like there are many discrepancies between cudf and the current hash shuffle writer implementation.

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Nov 3, 2025

Run Gluten Clickhouse CI on x86

@jinchengchenghh jinchengchenghh merged commit 40a8988 into apache:main Nov 3, 2025
98 of 103 checks passed
jinchengchenghh added a commit to jinchengchenghh/gluten that referenced this pull request Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLICKHOUSE CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants