[VL] Support GPU async native shuffle read#12370

Open: marin-ma wants to merge 9 commits into apache:main from marin-ma:gpu-async-native-shuffle-read

@marin-ma marin-ma commented Jun 25, 2026

Contributor

The parallelism of GPU stages is limited by GPU concurrency, but the shuffle read process, which includes data fetching, decompression, and deserialization, still runs on the CPU. These steps can therefore be parallelized so that the output data is produced asynchronously.

This PR adopts a producer–consumer design. Producer threads asynchronously perform shuffle reads, including data fetching, decompression, and deserialization, and produce decoded data. The consumer (the main thread) retrieves the prepared data as it becomes available and creates the corresponding device buffers.
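The producer–consumer flow described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `DecodedBatch`, `BatchQueue`, and `runPipeline` are hypothetical names, the queue is bounded by total bytes as an assumption, and the fetch, decode, and device-copy steps are placeholders.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded shuffle batch held in host memory.
struct DecodedBatch {
  std::string payload;
  std::size_t numBytes() const { return payload.size(); }
};

// Byte-bounded blocking queue shared by the producers and the consumer.
class BatchQueue {
 public:
  explicit BatchQueue(std::size_t capacityBytes) : capacity_(capacityBytes) {}

  // Blocks while the queue is over budget. Returns false if the queue is closed.
  bool put(DecodedBatch batch) {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t size = batch.numBytes();
    // A batch larger than the budget is admitted once the queue is empty,
    // otherwise it could never be enqueued.
    notFull_.wait(lock, [&] { return closed_ || queue_.empty() || totalBytes_ + size <= capacity_; });
    if (closed_) {
      return false;
    }
    totalBytes_ += size;
    queue_.push_back(std::move(batch));
    notEmpty_.notify_one();
    return true;
  }

  // Blocks until a batch is available. Returns nullopt once closed and drained.
  std::optional<DecodedBatch> get() {
    std::unique_lock<std::mutex> lock(mutex_);
    notEmpty_.wait(lock, [&] { return closed_ || !queue_.empty(); });
    if (queue_.empty()) {
      return std::nullopt;
    }
    DecodedBatch batch = std::move(queue_.front());
    queue_.pop_front();
    totalBytes_ -= batch.numBytes();
    notFull_.notify_all();
    return batch;
  }

  // Signals that no more batches will arrive and wakes all waiters.
  void close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
    notFull_.notify_all();
    notEmpty_.notify_all();
  }

 private:
  const std::size_t capacity_;
  std::size_t totalBytes_ = 0;
  bool closed_ = false;
  std::deque<DecodedBatch> queue_;
  std::mutex mutex_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
};

// Starts the producers, consumes on the calling thread, and returns the number of batches consumed.
std::size_t runPipeline(int numProducers, int batchesPerProducer, std::size_t capacityBytes) {
  BatchQueue queue(capacityBytes);
  std::vector<std::thread> producers;
  for (int p = 0; p < numProducers; ++p) {
    producers.emplace_back([&queue, p, batchesPerProducer] {
      for (int i = 0; i < batchesPerProducer; ++i) {
        // Placeholder for fetch + decompress + deserialize.
        queue.put(DecodedBatch{"stream-" + std::to_string(p) + "-batch-" + std::to_string(i)});
      }
    });
  }
  // Close the queue only after every producer has finished.
  std::thread closer([&] {
    for (auto& producer : producers) {
      producer.join();
    }
    queue.close();
  });
  std::size_t consumed = 0;
  while (auto batch = queue.get()) {
    // Placeholder for creating the device buffer on the main thread.
    ++consumed;
  }
  closer.join();
  return consumed;
}
```

Calling `runPipeline(3, 5, 64)` runs three producers against a 64-byte budget and returns once all fifteen batches have been drained. Bounding the queue by bytes rather than by item count keeps host memory use predictable when batch sizes vary.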

@github-actions added the CORE and VELOX labels Jun 25, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from e5cc3aa to 0713a26 Compare June 25, 2026 17:05
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from 0713a26 to bebb887 Compare June 25, 2026 17:11
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from bebb887 to 5ecc27c Compare June 25, 2026 17:13
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from f840d7a to c1d2691 Compare June 25, 2026 18:09
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from b716f71 to 7c4a796 Compare June 26, 2026 08:34
@marin-ma marin-ma marked this pull request as ready for review June 30, 2026 16:04
Copilot AI review requested due to automatic review settings June 30, 2026 16:04
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from ad69c7d to 096215f Compare June 30, 2026 16:09
Copilot AI review requested due to automatic review settings June 30, 2026 16:11

Copilot AI review requested due to automatic review settings July 1, 2026 10:32
Copilot AI review requested due to automatic review settings July 2, 2026 13:09
@marin-ma marin-ma force-pushed the gpu-async-native-shuffle-read branch from 1c29a29 to 33e5b75 Compare July 2, 2026 13:09
@github-actions github-actions Bot added the DOCS label Jul 2, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 46 changed files in this pull request and generated 5 comments.


private def assertPositiveBlockSize(blockId: BlockId, blockSize: Long): Unit = {
  if (blockSize < 0) {
    throw BlockException(blockId, "Negative block size " + size)
Comment thread cpp/velox/shuffle/VeloxGpuShuffleReader.cc
Comment thread cpp/velox/utils/CachedBatchQueue.h Outdated
Comment on lines +61 to +62
LOG(INFO) << "Trying to get from cached buffer queue. Queue length: " << queue_.size()
          << ", total size in queue: " << totalSize_ << ", current batch size: " << batch->numBytes() << std::endl;
Comment thread cpp/velox/utils/CachedBatchQueue.h Outdated

notFull_.wait(lock, [&]() { return noMoreBatches_ || totalSize_ + batchSize <= capacity_; });
if (noMoreBatches_) {
  LOG(WARNING) << "Discard batch due to calling put() after noMorBatches().";
Comment on lines 225 to 229
// Stop reading more streams. Blocked by the native reader threads.
jniWrapper.stop(shuffleReaderHandle)
onComplete.foreach(_())
// Would remove the resource object from registry to lower GC pressure.
TaskResources.releaseResource(resourceId)
Copilot AI review requested due to automatic review settings July 2, 2026 15:54
@marin-ma changed the title from "[WIP][VL] Support GPU async native shuffle read" to "[VL] Support GPU async native shuffle read" Jul 2, 2026
Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 46 changed files in this pull request and generated 6 comments.


private def assertPositiveBlockSize(blockId: BlockId, blockSize: Long): Unit = {
  if (blockSize < 0) {
    throw BlockException(blockId, "Negative block size " + size)
Comment on lines +304 to +314
ReaderThreadPool* VeloxBackend::getReaderThreadPool() {
  static std::once_flag readerThreadPoolInit;
  std::call_once(readerThreadPoolInit, [this] {
    const auto configuredThreads =
        backendConf_->get<int32_t>(kShuffleReaderThreads, static_cast<int32_t>(std::thread::hardware_concurrency()));
    // std::thread::hardware_concurrency() can return 0;
    const auto numThreads = configuredThreads > 0 ? configuredThreads : 1;
    readerThreadPool_ = std::make_unique<ReaderThreadPool>(numThreads);
  });
  return readerThreadPool_.get();
}
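The lazy initialization in the snippet above can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not Gluten code: `resolveReaderThreads`, `SketchPool`, and `SketchBackend` are hypothetical names, and the configuration lookup is replaced by a `std::optional`. It keeps the same fallback (a non-positive count becomes one thread) but stores the `std::once_flag` as a member rather than a function-local static.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>

// Picks the worker count: the configured value if present, otherwise the
// hardware concurrency. Non-positive results collapse to a single thread,
// since std::thread::hardware_concurrency() may return 0.
inline int32_t resolveReaderThreads(std::optional<int32_t> configured) {
  const int32_t requested =
      configured.value_or(static_cast<int32_t>(std::thread::hardware_concurrency()));
  return requested > 0 ? requested : 1;
}

// Hypothetical pool type that only records its size.
struct SketchPool {
  explicit SketchPool(int32_t n) : numThreads(n) {}
  int32_t numThreads;
};

// Hypothetical backend that creates its pool on first use.
class SketchBackend {
 public:
  explicit SketchBackend(std::optional<int32_t> configured) : configured_(configured) {}

  SketchPool* getReaderThreadPool() {
    // The flag is a member, so each backend instance initializes its own pool.
    std::call_once(initFlag_, [this] {
      pool_ = std::make_unique<SketchPool>(resolveReaderThreads(configured_));
    });
    return pool_.get();
  }

 private:
  std::optional<int32_t> configured_;
  std::once_flag initFlag_;
  std::unique_ptr<SketchPool> pool_;
};
```

A function-local `static std::once_flag` is shared by every instance of the class, so a second instance would skip initialization and return a null pool. If the backend is a process-wide singleton, the two forms behave the same.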
Comment on lines +40 to +43
  for (auto& task : tasks) {
    tasks_.push({std::move(task), priority});
  }
}
Comment on lines +85 to +94
    auto& prioritizedTask = tasks_.top();
    LOG(INFO) << "Worker thread " << std::this_thread::get_id() << " is executing a task with priority "
              << prioritizedTask.priority;
    task = std::move(prioritizedTask.task);
    tasks_.pop();
  }

  if (task) {
    task();
  }
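For context, a prioritized worker loop of the kind quoted above can be sketched as follows. `PriorityPool` and `Entry` are hypothetical names, and two details are assumptions rather than facts about the PR: larger priority values run first, and tasks of equal priority run in submission order. One point worth noting is that `std::priority_queue::top()` returns a const reference, so `std::move` on a member of the top element falls back to a copy unless that member is declared `mutable`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class PriorityPool {
 public:
  using Task = std::function<void()>;

  explicit PriorityPool(int numThreads) {
    for (int i = 0; i < numThreads; ++i) {
      workers_.emplace_back([this] { run(); });
    }
  }

  // Drains the remaining tasks, then joins all workers.
  ~PriorityPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& worker : workers_) {
      worker.join();
    }
  }

  // Enqueues a group of tasks that share one priority.
  void submit(std::vector<Task> tasks, int priority) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (auto& task : tasks) {
        queue_.push(Entry{priority, nextSeq_++, std::move(task)});
      }
    }
    cv_.notify_all();
  }

 private:
  struct Entry {
    int priority;
    std::uint64_t seq;
    mutable Task task;  // mutable so it can be moved out of the const top()
    bool operator<(const Entry& other) const {
      if (priority != other.priority) {
        return priority < other.priority;  // larger priority is served first
      }
      return seq > other.seq;  // earlier submission wins within a priority
    }
  };

  void run() {
    for (;;) {
      Task task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;  // stopping and nothing left to run
        }
        task = std::move(queue_.top().task);
        queue_.pop();
      }
      if (task) {
        task();  // run outside the lock so other workers can dequeue
      }
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::priority_queue<Entry> queue_;
  std::uint64_t nextSeq_ = 0;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

Running the task outside the critical section matters here: a shuffle read task may block on I/O for a long time, and holding the mutex during that time would serialize the whole pool.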
Comment on lines +132 to 134
void VeloxGpuHashShuffleReaderDeserializer::read() {
std::shared_ptr<arrow::io::InputStream> inputStream = nullptr;

Comment on lines +191 to +200
// Close input stream if it's still open.
if (inputStream != nullptr) {
  GLUTEN_THROW_NOT_OK(inputStream->Close());
}

// Decrement active reader count.
if (activeReaders_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
  batchQueue_->noMoreBatches();
  completionCV_.notify_all();
}
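The "last reader signals completion" pattern in the snippet above depends on every reader reaching the decrement. As a hedged sketch of one way to guarantee that even when an earlier step throws (for example, a failing stream close), the decrement can be moved into a scope guard. `ReaderCompletionGuard`, `readStream`, and `runReaders` are hypothetical names, and the callback stands in for `noMoreBatches()` plus the notification.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>
#include <vector>

// Decrements the active-reader count on scope exit, including during stack
// unwinding. The reader that brings the count to zero invokes onLastReader.
// The callback must not throw because it runs inside a destructor.
class ReaderCompletionGuard {
 public:
  ReaderCompletionGuard(std::atomic<int>& activeReaders, std::function<void()> onLastReader)
      : activeReaders_(activeReaders), onLastReader_(std::move(onLastReader)) {}

  ReaderCompletionGuard(const ReaderCompletionGuard&) = delete;
  ReaderCompletionGuard& operator=(const ReaderCompletionGuard&) = delete;

  ~ReaderCompletionGuard() {
    // fetch_sub returns the previous value, so 1 means this was the last reader.
    if (activeReaders_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      onLastReader_();
    }
  }

 private:
  std::atomic<int>& activeReaders_;
  std::function<void()> onLastReader_;
};

// Hypothetical reader body; failOnClose simulates a close() that throws.
void readStream(std::atomic<int>& activeReaders, const std::function<void()>& onLastReader, bool failOnClose) {
  ReaderCompletionGuard guard(activeReaders, onLastReader);
  // Placeholder for fetch + decompress + deserialize.
  if (failOnClose) {
    throw std::runtime_error("close failed");
  }
}

// Runs numReaders readers, one of which may fail, and returns how many times
// the completion callback fired.
int runReaders(int numReaders, int failingReader) {
  std::atomic<int> activeReaders{numReaders};
  std::atomic<int> completionCalls{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < numReaders; ++i) {
    threads.emplace_back([&, i] {
      try {
        readStream(activeReaders, [&] { completionCalls.fetch_add(1); }, i == failingReader);
      } catch (const std::exception&) {
        // A real reader would record the error for the consumer to rethrow.
      }
    });
  }
  for (auto& thread : threads) {
    thread.join();
  }
  return completionCalls.load();
}
```

With the guard in place, `runReaders(4, 2)` still fires the completion callback exactly once even though one reader throws, so a consumer waiting for the end of the stream is not left blocked.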