[VL] Support GPU async native shuffle read by marin-ma · Pull Request #12370 · apache/gluten

marin-ma · 2026-06-25T17:02:01Z

The parallelism of GPU stages is limited by GPU concurrency, but the shuffle read process, which includes data fetching, decompression, and deserialization, still runs on the CPU. We can therefore parallelize these steps and produce the output data asynchronously.

This PR adopts a producer–consumer design. Producer threads asynchronously perform shuffle reads, including data fetching, decompression, and deserialization, and produce decoded data. The consumer (the main thread) retrieves the prepared data as it becomes available and creates the corresponding device buffers.
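The producer–consumer design described above can be illustrated with a small byte-bounded queue. This is only a sketch under assumptions, not the PR's implementation: `DecodedBatch`, `BoundedBatchQueue`, `close()`, and `runPipeline` are hypothetical names, and the "decode" and "device buffer" steps are stand-ins.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded shuffle batch held in host memory.
struct DecodedBatch {
  std::string payload;
  std::size_t numBytes() const { return payload.size(); }
};

// Byte-bounded queue between several producers and one consumer (illustrative only).
class BoundedBatchQueue {
 public:
  explicit BoundedBatchQueue(std::size_t capacityBytes) : capacity_(capacityBytes) {
    assert(capacityBytes > 0);
  }

  // Blocks while the queue is full. Returns false if the queue was already closed.
  bool put(DecodedBatch batch) {
    const std::size_t size = batch.numBytes();
    std::unique_lock<std::mutex> lock(mutex_);
    // An oversized batch is admitted when the queue is empty so it cannot block forever.
    notFull_.wait(lock, [&] { return closed_ || queue_.empty() || totalSize_ + size <= capacity_; });
    if (closed_) return false;
    totalSize_ += size;
    queue_.push_back(std::move(batch));
    notEmpty_.notify_one();
    return true;
  }

  // Blocks until a batch is available. Returns false once the queue is closed and drained.
  bool get(DecodedBatch& out) {
    std::unique_lock<std::mutex> lock(mutex_);
    notEmpty_.wait(lock, [&] { return closed_ || !queue_.empty(); });
    if (queue_.empty()) return false;
    out = std::move(queue_.front());
    queue_.pop_front();
    totalSize_ -= out.numBytes();
    notFull_.notify_all();
    return true;
  }

  // Signals that no more batches will be produced and wakes all waiters.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    notFull_.notify_all();
    notEmpty_.notify_all();
  }

 private:
  const std::size_t capacity_;
  std::mutex mutex_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
  std::deque<DecodedBatch> queue_;
  std::size_t totalSize_ = 0;
  bool closed_ = false;
};

// Producers decode on CPU threads; the calling thread consumes. Returns total bytes consumed.
std::size_t runPipeline(int numProducers, int batchesPerProducer, std::size_t capacityBytes) {
  BoundedBatchQueue queue(capacityBytes);
  std::vector<std::thread> producers;
  for (int p = 0; p < numProducers; ++p) {
    producers.emplace_back([&queue, batchesPerProducer] {
      for (int i = 0; i < batchesPerProducer; ++i) {
        queue.put(DecodedBatch{std::string(8, 'x')});  // stand-in for fetch + decompress + deserialize
      }
    });
  }
  // Close the queue only after every producer has finished.
  std::thread closer([&] {
    for (auto& t : producers) t.join();
    queue.close();
  });
  std::size_t consumed = 0;
  DecodedBatch batch;
  while (queue.get(batch)) {
    consumed += batch.numBytes();  // stand-in for creating the device buffer
  }
  closer.join();
  return consumed;
}
```

Bounding the queue by bytes rather than by batch count keeps host memory use predictable when batch sizes vary, at the cost of producers stalling whenever the consumer falls behind.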

github-actions · 2026-06-25T17:02:33Z

Run Gluten Clickhouse CI on x86

Copilot

Pull request overview

Copilot reviewed 46 out of 46 changed files in this pull request and generated 5 comments.

+
+  private def assertPositiveBlockSize(blockId: BlockId, blockSize: Long): Unit = {
+    if (blockSize < 0) {
+      throw BlockException(blockId, "Negative block size " + blockSize)


+    LOG(INFO) << "Trying to get from cached buffer queue. Queue length: " << queue_.size()
+              << ", total size in queue: " << totalSize_ << ", current batch size: " << batch->numBytes() << std::endl;


+
+    notFull_.wait(lock, [&]() { return noMoreBatches_ || totalSize_ + batchSize <= capacity_; });
+    if (noMoreBatches_) {
+      LOG(WARNING) << "Discard batch due to calling put() after noMoreBatches().";


+      // Stop reading more streams. Blocked by the native reader threads.
+      jniWrapper.stop(shuffleReaderHandle)
+      onComplete.foreach(_())
      // Would remove the resource object from registry to lower GC pressure.
      TaskResources.releaseResource(resourceId)


Copilot

Pull request overview

Copilot reviewed 46 out of 46 changed files in this pull request and generated 6 comments.

+
+  private def assertPositiveBlockSize(blockId: BlockId, blockSize: Long): Unit = {
+    if (blockSize < 0) {
+      throw BlockException(blockId, "Negative block size " + blockSize)


+ReaderThreadPool* VeloxBackend::getReaderThreadPool() {
+  static std::once_flag readerThreadPoolInit;
+  std::call_once(readerThreadPoolInit, [this] {
+    const auto configuredThreads =
+        backendConf_->get<int32_t>(kShuffleReaderThreads, static_cast<int32_t>(std::thread::hardware_concurrency()));
+    // std::thread::hardware_concurrency() can return 0;
+    const auto numThreads = configuredThreads > 0 ? configuredThreads : 1;
+    readerThreadPool_ = std::make_unique<ReaderThreadPool>(numThreads);
+  });
+  return readerThreadPool_.get();
+}


+  for (auto& task : tasks) {
+    tasks_.push({std::move(task), priority});
+  }
+}


+        auto& prioritizedTask = tasks_.top();
+        LOG(INFO) << "Worker thread " << std::this_thread::get_id() << " is executing a task with priority "
+                  << prioritizedTask.priority;
+        task = std::move(prioritizedTask.task);
+        tasks_.pop();
+      }
+
+      if (task) {
+        task();
+      }
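For readers unfamiliar with the pattern in the two hunks above, here is a minimal sketch of a priority-ordered worker pool. It is an illustration under assumptions, not the PR's `ReaderThreadPool`: `PriorityThreadPool`, `Entry`, and `runInPriorityOrder` are hypothetical names. One detail worth noting is that `std::priority_queue::top()` returns a const reference, so the sketch marks the stored task `mutable` to let `std::move` actually move it instead of copying.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <limits>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative pool: workers always pick the highest-priority pending task.
class PriorityThreadPool {
 public:
  explicit PriorityThreadPool(int numThreads) {
    assert(numThreads > 0);
    for (int i = 0; i < numThreads; ++i) {
      workers_.emplace_back([this] { workerLoop(); });
    }
  }

  // Drains the remaining tasks, then joins all workers.
  ~PriorityThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  // A larger `priority` runs first; ties run in submission order.
  void submit(std::function<void()> task, int priority) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(Entry{priority, nextSeq_++, std::move(task)});
    }
    cv_.notify_one();
  }

 private:
  struct Entry {
    int priority;
    unsigned long long seq;
    mutable std::function<void()> task;  // mutable: top() is const, but we want to move out of it
    bool operator<(const Entry& other) const {
      return priority != other.priority ? priority < other.priority : seq > other.seq;
    }
  };

  void workerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and nothing left to run
        task = std::move(tasks_.top().task);
        tasks_.pop();
      }
      task();  // run outside the lock so other workers can dequeue
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::priority_queue<Entry> tasks_;
  unsigned long long nextSeq_ = 0;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};

// Holds a single worker busy while tasks are queued, then returns the order in which they ran.
std::vector<int> runInPriorityOrder(const std::vector<int>& priorities) {
  std::vector<int> order;
  std::promise<void> gate;
  std::shared_future<void> opened = gate.get_future().share();
  {
    PriorityThreadPool pool(1);
    pool.submit([opened] { opened.wait(); }, std::numeric_limits<int>::max());
    for (int p : priorities) {
      pool.submit([&order, p] { order.push_back(p); }, p);
    }
    gate.set_value();
  }  // the destructor drains the queue and joins the worker
  return order;
}
```

With a single worker the execution order is deterministic, which makes the priority rule easy to check; with several workers only the dequeue order is prioritized, not the completion order.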


+void VeloxGpuHashShuffleReaderDeserializer::read() {
+  std::shared_ptr<arrow::io::InputStream> inputStream = nullptr;



+  // Close input stream if it's still open.
+  if (inputStream != nullptr) {
+    GLUTEN_THROW_NOT_OK(inputStream->Close());
+  }
+
+  // Decrement active reader count.
+  if (activeReaders_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
+    batchQueue_->noMoreBatches();
+    completionCV_.notify_all();
+  }
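The decrement in the hunk above relies on `fetch_sub` returning the value held before the subtraction, so exactly one reader observes `1` and knows it was the last to finish. A minimal sketch of that idiom, assuming a hypothetical helper `countLastReaderSignals` and a counter standing in for the `noMoreBatches()` and notify calls:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs `numReaders` threads and counts how many of them saw themselves as the last reader.
int countLastReaderSignals(int numReaders) {
  assert(numReaders > 0);
  std::atomic<int> activeReaders(numReaders);
  std::atomic<int> signals(0);
  std::vector<std::thread> readers;
  for (int i = 0; i < numReaders; ++i) {
    readers.emplace_back([&activeReaders, &signals] {
      // ... stand-in for reading and decoding one input stream ...
      // fetch_sub returns the previous value, so 1 means this thread finished last.
      if (activeReaders.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        signals.fetch_add(1, std::memory_order_relaxed);  // stand-in for the completion signal
      }
    });
  }
  for (auto& t : readers) t.join();
  return signals.load();
}
```

Regardless of how the threads interleave, the completion signal fires once, which is what lets the consumer treat it as an end-of-stream marker.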


github-actions Bot added CORE works for Gluten Core VELOX labels Jun 25, 2026

marin-ma force-pushed the gpu-async-native-shuffle-read branch from e5cc3aa to 0713a26 Compare June 25, 2026 17:05

marin-ma force-pushed the gpu-async-native-shuffle-read branch from 0713a26 to bebb887 Compare June 25, 2026 17:11

marin-ma force-pushed the gpu-async-native-shuffle-read branch from bebb887 to 5ecc27c Compare June 25, 2026 17:13

marin-ma force-pushed the gpu-async-native-shuffle-read branch from f840d7a to c1d2691 Compare June 25, 2026 18:09

marin-ma force-pushed the gpu-async-native-shuffle-read branch from b716f71 to 7c4a796 Compare June 26, 2026 08:34

marin-ma marked this pull request as ready for review June 30, 2026 16:04

Copilot AI review requested due to automatic review settings June 30, 2026 16:04

Copilot started reviewing on behalf of marin-ma June 30, 2026 16:05 View session

marin-ma force-pushed the gpu-async-native-shuffle-read branch from ad69c7d to 096215f Compare June 30, 2026 16:09

Copilot AI review requested due to automatic review settings June 30, 2026 16:11

Copilot started reviewing on behalf of marin-ma June 30, 2026 16:12 View session

Copilot AI review requested due to automatic review settings July 1, 2026 10:32

Copilot started reviewing on behalf of marin-ma July 1, 2026 10:32 View session

marin-ma added 7 commits July 2, 2026 13:42

support async shuffle read

5e3ee2f

cpp

7357dba

fix spark3.4

bcdd49b

add shims

68e0a8f

address comments

22640ce

update

7af40ed

add conf

33e5b75

Copilot AI review requested due to automatic review settings July 2, 2026 13:09

marin-ma force-pushed the gpu-async-native-shuffle-read branch from 1c29a29 to 33e5b75 Compare July 2, 2026 13:09

Copilot started reviewing on behalf of marin-ma July 2, 2026 13:18 View session

github-actions Bot added the DOCS label Jul 2, 2026

Copilot AI reviewed Jul 2, 2026

address comments

9618cdf

update

1acac8c

Copilot AI review requested due to automatic review settings July 2, 2026 15:54

marin-ma changed the title ~~[WIP][VL] Support GPU async native shuffle read~~ [VL] Support GPU async native shuffle read Jul 2, 2026

Copilot started reviewing on behalf of marin-ma July 2, 2026 15:54 View session

Copilot AI reviewed Jul 2, 2026
