
Support bounded parallel chunk transfers #29341

Open
tyler-french wants to merge 1 commit into bazelbuild:master from tyler-french:tfrench/cdc-concurrent


@tyler-french
Contributor

@tyler-french tyler-french commented Apr 19, 2026

Description

This builds on --experimental_remote_cache_chunking, which was implemented in #28437.

This PR enables parallel uploads and downloads of the chunks within a chunked file to improve performance. Overall concurrency is already bounded globally by the total number of gRPC connections, so we add a separate per-file bound to prevent excessive fan-out from any single file. The bound is 32, which gives good parallelism without being too high.

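To avoid the problems of fixed-size batches, where each batch waits for its slowest chunk, we use simple sliding-window transfer managers: as soon as one chunk finishes, the next one starts.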
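A minimal sketch of the sliding-window idea, assuming each chunk transfer is exposed as a CompletableFuture. SlidingWindowTransfer, PER_BLOB_WINDOW, and the transfer function are hypothetical names for illustration, not the classes in this PR. Unlike fixed batches, a freed slot is refilled immediately, so one slow chunk occupies a single slot instead of stalling a whole batch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/**
 * Hypothetical sliding-window transfer manager (not Bazel's actual class).
 * Keeps at most `window` chunk transfers in flight for one blob and starts the
 * next queued chunk as soon as any in-flight chunk completes.
 */
final class SlidingWindowTransfer<T> {
  /** Illustrative per-blob bound, matching the value described in this PR. */
  static final int PER_BLOB_WINDOW = 32;

  private final Deque<T> pending;
  private final Function<T, CompletableFuture<Void>> transfer;
  private final CompletableFuture<Void> done = new CompletableFuture<>();
  private int inFlight = 0;
  private int maxInFlight = 0;

  private SlidingWindowTransfer(List<T> chunks, Function<T, CompletableFuture<Void>> transfer) {
    this.pending = new ArrayDeque<>(chunks);
    this.transfer = transfer;
  }

  /** Starts transferring all chunks; done() completes when every chunk succeeds or one fails. */
  static <T> SlidingWindowTransfer<T> start(
      List<T> chunks, int window, Function<T, CompletableFuture<Void>> transfer) {
    SlidingWindowTransfer<T> w = new SlidingWindowTransfer<>(chunks, transfer);
    synchronized (w) {
      if (w.pending.isEmpty()) {
        w.done.complete(null);
      }
      // Fill the initial window.
      for (int i = 0; i < window; i++) {
        w.startNextLocked();
      }
    }
    return w;
  }

  CompletableFuture<Void> done() {
    return done;
  }

  synchronized int maxInFlight() {
    return maxInFlight;
  }

  // Caller must hold the lock. Starts one queued chunk if any remain and nothing has failed.
  private void startNextLocked() {
    if (done.isDone() || pending.isEmpty()) {
      return;
    }
    T chunk = pending.poll();
    inFlight++;
    maxInFlight = Math.max(maxInFlight, inFlight);
    CompletableFuture<Void> f;
    try {
      f = transfer.apply(chunk);
    } catch (RuntimeException e) {
      f = CompletableFuture.failedFuture(e);
    }
    f.whenComplete((unused, err) -> onChunkDone(err));
  }

  private synchronized void onChunkDone(Throwable err) {
    inFlight--;
    if (err != null) {
      // Fail fast: stop starting new chunks. A real implementation would also cancel in-flight ones.
      done.completeExceptionally(err);
    } else if (pending.isEmpty()) {
      if (inFlight == 0) {
        done.complete(null);
      }
    } else {
      // Refill the slot that just freed up.
      startNextLocked();
    }
  }
}
```

In this sketch the per-blob window only caps how many transfers one blob starts; the actual RPC concurrency is still governed by whatever channel or executor the transfer function uses.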

RELNOTES: CDC chunk uploads and downloads can now happen in parallel within a large blob.

Benchmarking:

In our synthetic benchmark, which simulates network delay and jitter, parallelism gives roughly a 20x improvement (for example, downloading 32 chunks drops from about 811 ms to 35 ms). Of course, synthetic results don't always match realistic situations.

After Change:

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt   Score   Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   1  avgt    3  34.958 ± 0.092  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   2  avgt    3  34.971 ± 0.085  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   4  avgt    3  34.983 ± 0.127  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   8  avgt    3  34.974 ± 0.213  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   1  avgt    3  35.006 ± 1.170  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   2  avgt    3  35.028 ± 1.280  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   4  avgt    3  35.071 ± 1.534  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   8  avgt    3  35.056 ± 1.407  ms/op

Before Change:

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt    Score     Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   2  avgt    3  811.458 ± 466.502  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   4  avgt    3  811.918 ± 453.385  ms/op
ChunkedTransferBenchmark.downloadChunked                  N/A            32              1024             25              N/A              10                   8  avgt    3  811.849 ± 453.511  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   1  avgt    3  741.295 ± 392.466  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   2  avgt    3  741.600 ± 404.457  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   4  avgt    3  742.135 ± 401.637  ms/op
ChunkedTransferBenchmark.uploadChunked                   1024           N/A               N/A             25            32768              10                   8  avgt    3  742.024 ± 398.510  ms/op

Big File:

CURRENT BRANCH (512 MiB)

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt    Score      Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A           512           1048576             25              N/A              10                   8  avgt    3  416.743 ±    8.043  ms/op
ChunkedTransferBenchmark.uploadChunked                1048576           N/A               N/A             25        536870912              10                   8  avgt    3  806.346 ± 1386.743  ms/op
MASTER BASELINE (512 MiB)

Benchmark                                 (avgChunkSizeBytes)  (chunkCount)  (chunkSizeBytes)  (delayMillis)  (fileSizeBytes)  (jitterMillis)  (schedulerThreads)  Mode  Cnt      Score      Error  Units
ChunkedTransferBenchmark.downloadChunked                  N/A           512           1048576             25              N/A              10                   8  avgt    3  12783.277 ± 1555.102  ms/op
ChunkedTransferBenchmark.uploadChunked                1048576           N/A               N/A             25        536870912              10                   8  avgt    3  11758.738 ± 2207.502  ms/op
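As a rough sanity check (a back-of-the-envelope model of our own, not part of the benchmark), assume each chunk RPC costs about delayMillis = 25 ms on average, with jitter of roughly ±10 ms, and a per-blob window of W = 32. Sequential transfer then scales with the chunk count n, while windowed transfer scales with n/W. The download rows, which use a fixed chunk count, line up with this; the upload rows are left out because their chunk count depends on CDC boundaries. With n = W = 32 all chunks run at once, so the time is set by the slowest chunk, about delay plus jitter.

```latex
\begin{aligned}
T_{\text{seq}} &\approx n\,\bar d, &
T_{\text{win}} &\approx \left\lceil \tfrac{n}{W} \right\rceil \bar d \\
n = 32:\ T_{\text{seq}} &\approx 32 \times 25 = 800\ \text{ms}\ (\text{measured} \approx 811), &
T_{\text{win}} &\approx 25 + 10 = 35\ \text{ms}\ (\text{measured} \approx 35) \\
n = 512:\ T_{\text{seq}} &\approx 512 \times 25 = 12800\ \text{ms}\ (\text{measured} \approx 12783), &
T_{\text{win}} &\approx 16 \times 25 = 400\ \text{ms}\ (\text{measured} \approx 417)
\end{aligned}
```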

@tyler-french tyler-french requested a review from a team as a code owner April 19, 2026 18:23
@github-actions github-actions Bot added the team-Remote-Exec and awaiting-review labels Apr 19, 2026
@tyler-french
Contributor Author

FYI @tjgq, I think this was a follow-up to the original PR.

@tyler-french
Contributor Author

@bazel-io fork 9.2.0

Contributor

@sluongng sluongng left a comment

I'm a bit skeptical of the current approach, so I will skip reading the window-filling implementation for now (though Codex does suggest there is a problem there).

Each invocation may have multiple actions/spawns running in parallel, each creates multiple blob uploads/downloads, and some of those blobs are chunked blobs. Adding parallelism on the blob level feels like a local optimization, and the new flag does not offer strong control over the total parallelism of the invocation.

I wonder if we need something higher-level that lets us effectively enforce a global parallelism for uploads and downloads 🤔

Comment thread src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobUploader.java Outdated
@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 8969051 to 33689c3 Compare April 22, 2026 15:44
Copilot AI review requested due to automatic review settings April 22, 2026 15:44
Copilot AI left a comment

Pull request overview

Enables bounded parallelism for content-defined chunk (CDC) blob uploads/downloads, improving throughput while avoiding unbounded per-blob fanout.

Changes:

  • Implement sliding-window style, per-blob bounded concurrency for chunk uploads in ChunkedBlobUploader.
  • Implement sliding-window style, per-blob bounded concurrency for chunk downloads (including in-flight dedup) in ChunkedBlobDownloader.
  • Add/expand unit tests for window refill, cancellation, and failure propagation; add a JMH benchmark binary target.
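The in-flight dedup mentioned for ChunkedBlobDownloader can be sketched as follows, assuming the same chunk digest can appear more than once in a blob and that each fetch returns a CompletableFuture. InFlightDedup and fetch are hypothetical names for illustration, not the PR's code: concurrent requests for the same key share one fetch, and the entry is dropped once that fetch finishes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical in-flight deduplication helper (not Bazel's actual class).
 * Concurrent requests for the same chunk key share a single fetch; the map
 * only tracks fetches that are still running.
 */
final class InFlightDedup<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
  private final Function<K, CompletableFuture<V>> fetch;

  InFlightDedup(Function<K, CompletableFuture<V>> fetch) {
    this.fetch = fetch;
  }

  CompletableFuture<V> get(K key) {
    CompletableFuture<V> mine = new CompletableFuture<>();
    CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
    if (existing != null) {
      return existing; // Another caller is already fetching this chunk; share its result.
    }
    CompletableFuture<V> source;
    try {
      source = fetch.apply(key);
    } catch (RuntimeException e) {
      source = CompletableFuture.failedFuture(e);
    }
    source.whenComplete((value, err) -> {
      if (err != null) {
        mine.completeExceptionally(err);
      } else {
        mine.complete(value);
      }
      inFlight.remove(key, mine); // Remove only our own entry, never a newer one.
    });
    return mine;
  }
}
```

Because chunks complete out of order, a downloader built this way still has to place each chunk at its own position in the blob to reassemble it.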

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

  • src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobUploader.java: Adds bounded in-flight chunk upload window and cancellation on failure.
  • src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java: Adds bounded in-flight chunk download window with reassembly and in-flight dedup.
  • src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobUploaderTest.java: Adds tests for window refill, cancellation, and failure handling for parallel uploads.
  • src/test/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloaderTest.java: Updates tests for new download API and adds parallel-window behavior tests.
  • src/test/java/com/google/devtools/build/lib/remote/ChunkedTransferBenchmark.java: Introduces a JMH benchmark for chunked upload/download with latency + jitter.
  • src/test/java/com/google/devtools/build/lib/remote/BUILD: Adds a java_opt_binary target to run the new benchmark.


Comment thread src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java Outdated
Comment thread src/test/java/com/google/devtools/build/lib/remote/ChunkedTransferBenchmark.java Outdated
Comment thread src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobUploader.java Outdated
@tyler-french
Contributor Author

> I'm a bit skeptical of the current approach, so I will skip reading the window-filling implementation for now (though Codex does suggest there is a problem there).
>
> Each invocation may have multiple actions/spawns running in parallel, each creates multiple blob uploads/downloads, and some of those blobs are chunked blobs. Adding parallelism on the blob level feels like a local optimization, and the new flag does not offer strong control over the total parallelism of the invocation.
>
> I wonder if we need something higher-level that lets us effectively enforce a global parallelism for uploads and downloads 🤔

Updated this in the follow-up direction you suggested: I removed the flag and its plumbing, and kept only a small hardcoded per-blob window of 32 as a guard against huge single-blob fan-out. The actual global limit on active RPCs is still the shared gRPC channel pool (--remote_max_connections / --remote_max_concurrency_per_connection), so this isn't intended to be a new invocation-level concurrency control. As far as I can tell, the combined cache doesn't have such a restriction.
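To make the interplay of the two bounds concrete: assuming the shared channel pool admits roughly --remote_max_connections times --remote_max_concurrency_per_connection concurrent RPCs, the number of chunk RPCs that can be active at once is the smaller of that capacity and the sum of the per-blob windows. ChunkConcurrency.upperBound is a hypothetical helper for illustration, not a Bazel API.

```java
/**
 * Hypothetical back-of-the-envelope helper (not a Bazel API). Assumes each blob
 * is capped by a per-blob window and that all blobs share one channel pool whose
 * capacity is maxConnections * maxConcurrencyPerConnection concurrent RPCs.
 */
final class ChunkConcurrency {
  static int upperBound(
      int[] chunksPerBlob, int perBlobWindow, int maxConnections, int maxConcurrencyPerConnection) {
    // Each blob can have at most min(its chunk count, window) chunks in flight.
    long requested = 0;
    for (int chunks : chunksPerBlob) {
      requested += Math.min(chunks, perBlobWindow);
    }
    // The shared pool caps the total, regardless of how many blobs are active.
    long poolCapacity = (long) maxConnections * maxConcurrencyPerConnection;
    return (int) Math.min(requested, poolCapacity);
  }
}
```

With a few large blobs, the per-blob window is the binding limit (for example, two 512-chunk blobs plus a 10-chunk blob give 32 + 32 + 10 = 74). Only when many chunked blobs are active at once does the shared pool become the cap, which is the invocation-level concern raised in review.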

@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 33689c3 to 8363b37 Compare April 22, 2026 17:15
Comment thread src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java Outdated
Comment thread src/main/java/com/google/devtools/build/lib/remote/ChunkedBlobDownloader.java Outdated
@tyler-french tyler-french force-pushed the tfrench/cdc-concurrent branch from 8363b37 to 976a25f Compare April 27, 2026 14:46

Labels

awaiting-review PR is awaiting review from an assigned reviewer team-Remote-Exec Issues and PRs for the Execution (Remote) team


4 participants