[CELEBORN-2331] Parallelize batch open stream client creation #3692
sunchao wants to merge 4 commits into [...]
Overall: Clean refactoring that parallelizes data-client creation across workers. The separation into group → parallel create → build request is well-structured, and the test coverage is thorough (parallel execution, failure isolation, retry, interruption).

1. Redundant retries with identical connection parameters
    while (!clientCreated && locationsIterator.hasNext) {
      val location = locationsIterator.next()
      try {
        clientsByHostPort.put(hostPort, createClient(location)) // same host:port every time
        clientCreated = true
      } catch { ... }
    }

For a reducer reading 1000 partitions from a failing worker, this means up to 1000 identical connection attempts (each paying the full connection timeout). Since [...] Consider either: [...]
2. +1 to SteNicholas's suggestion for a config switch. A [...]

3. Minor: In the original code, [...]

Reviewed with Claude Code
@RexXiong Thanks for reviewing. The requested rollback switch is added in d44fa47 as [...]
Thanks. Merged to main (v0.7.0).
Why are the changes needed?
`CelebornShuffleReader` batches stream-open requests by worker, but it previously created the data client for each worker serially before sending those already-parallel batch requests. When a reducer reads from multiple workers, connection setup for a slow or unavailable worker can delay useful work against the remaining healthy workers.

Parallelizing this setup removes the worker-by-worker wait from the normal path. Because this changes task-side connection scheduling, the optimization also needs an operational fallback that restores the prior behavior without requiring a code rollback.
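To make the worker-by-worker wait concrete, here is a rough cost model in Scala. It is only an illustration: the object name, the function names, and the timings are hypothetical, and it assumes the pool has at least one free thread per worker so that every connection attempt starts immediately.

```scala
object ClientSetupCost {
  // Serial setup connects to one worker after another,
  // so the reducer waits for the sum of all connection times.
  def serialWaitMs(connectMs: Seq[Long]): Long = connectMs.sum

  // Parallel setup (assuming one free pool thread per worker)
  // is bounded by the slowest single connection attempt.
  def parallelWaitMs(connectMs: Seq[Long]): Long =
    if (connectMs.isEmpty) 0L else connectMs.max
}
```

With two unavailable workers that each run into an assumed 10000 ms connection timeout and one healthy worker that connects in 20 ms, the serial model waits 20020 ms before any batch request is sent, while the parallel model waits 10000 ms.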
What changes were proposed in this PR?
The reader now first gathers pending stream-open locations by worker address, then creates one data client per distinct worker concurrently using the existing stream-creator pool. Once client setup completes, it sends the existing `BATCH_OPEN_STREAM` requests only for workers with an available client, allowing healthy workers to proceed even if another worker fails during setup.

The client-creation phase preserves the prior retry behavior for later locations on the same worker when an earlier client attempt fails. It also handles task cancellation explicitly: if the waiting Spark task is interrupted, it restores the interrupt status and cancels unfinished setup work; worker-side interruption is propagated rather than treated as an ordinary retryable failure.
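The flow described above (group by worker, create clients in parallel, keep only the workers that got a client) can be sketched roughly as follows. This is not the PR's code: `Location`, `Client`, `createForWorker`, and `createClients` are hypothetical stand-ins, and the pool is a plain `java.util.concurrent.ExecutorService` rather than Celeborn's stream-creator pool.

```scala
import java.util.concurrent.{Callable, ExecutionException, ExecutorService, Future}

object ParallelClientSketch {
  // Hypothetical stand-ins for illustration; these are not Celeborn's real types.
  final case class Location(host: String, port: Int, partitionId: Int) {
    def hostPort: String = s"$host:$port"
  }
  final case class Client(hostPort: String)

  // Try the worker's locations in order until one attempt yields a client.
  private def createForWorker(
      locations: Seq[Location],
      create: Location => Client): Option[Client] = {
    val it = locations.iterator
    var client: Option[Client] = None
    while (client.isEmpty && it.hasNext) {
      val location = it.next()
      try {
        client = Some(create(location))
      } catch {
        case e: InterruptedException => throw e // interruption is propagated, never retried
        case _: Exception => () // ordinary failure: fall through to the next location
      }
    }
    client
  }

  def createClients(
      pending: Seq[Location],
      pool: ExecutorService,
      create: Location => Client): Map[String, Client] = {
    // Step 1: group pending locations by worker address.
    val byWorker: Map[String, Seq[Location]] = pending.groupBy(_.hostPort)

    // Step 2: submit one creation task per distinct worker before waiting on any of them.
    val futures: Map[String, Future[Option[Client]]] =
      byWorker.map { case (hostPort, locations) =>
        val task = new Callable[Option[Client]] {
          override def call(): Option[Client] = createForWorker(locations, create)
        }
        hostPort -> pool.submit(task)
      }

    def await(future: Future[Option[Client]]): Option[Client] =
      try future.get()
      catch {
        case e: InterruptedException =>
          // The waiting thread was interrupted: cancel unfinished setup work,
          // restore the interrupt status, and rethrow.
          futures.values.foreach(_.cancel(true))
          Thread.currentThread().interrupt()
          throw e
        case e: ExecutionException =>
          e.getCause match {
            case interrupted: InterruptedException => throw interrupted // worker-side interruption
            case _ => None // this worker failed; the others still proceed
          }
      }

    // Step 3: keep only the workers for which a client is available.
    futures.toSeq
      .map { case (hostPort, future) => hostPort -> await(future) }
      .collect { case (hostPort, Some(client)) => hostPort -> client }
      .toMap
  }
}
```

A caller would then build batch requests only for the keys of the returned map. How the real reader reacts to a worker-side interruption beyond propagating it is not stated here, so the sketch simply rethrows it.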
This optimization is controlled by `celeborn.client.spark.batch.openStream.parallelClientCreation.enabled`, which defaults to `true`. Setting it to `false` selects the original serial client-creation and request-building flow, giving deployments a targeted rollback switch if parallel connection setup causes unexpected operational behavior.

How was this PR tested?