[SPARK-56512][CORE] Avoid redundant BlockId parsing in ShuffleBlockFetcherIterator by LuciferYang · Pull Request #55374 · apache/spark

LuciferYang · 2026-04-16T17:45:47Z

What changes were proposed in this pull request?

This PR caches the result of BlockId.apply(blockId) into a local variable in ShuffleBlockFetcherIterator.onBlockFetchSuccess, avoiding parsing the same block ID string twice.

Why are the changes needed?

BlockId.apply() sequentially matches the input string against 20 patterns. In the current code, it is called twice on the same blockId string within onBlockFetchSuccess:

updateMergedReqsDuration(BlockId(blockId).isShuffleChunk)
results.put(SuccessFetchResult(BlockId(blockId), ...))

This callback is on the hot path of shuffle data fetching: it is invoked once per block for every shuffle read. Parsing once and reusing the result eliminates the redundant pattern matching.
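As a rough illustration of the parse-once idea, here is a minimal, self-contained sketch. It does not use Spark's real classes: `SketchBlockId`, its two subclasses, `parseCount`, `handleTwice`, and `handleOnce` are hypothetical stand-ins, and the parser tries only two patterns instead of the full set that `BlockId.apply()` checks.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
sealed trait SketchBlockId {
  def name: String
  def isShuffleChunk: Boolean = false
}

final case class SketchShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
    extends SketchBlockId {
  def name: String = s"shuffle_${shuffleId}_${mapId}_${reduceId}"
}

final case class SketchShuffleChunkId(shuffleId: Int, mergeId: Int, reduceId: Int, chunkId: Int)
    extends SketchBlockId {
  def name: String = s"shuffleChunk_${shuffleId}_${mergeId}_${reduceId}_${chunkId}"
  override def isShuffleChunk: Boolean = true
}

object SketchBlockId {
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r
  private val Chunk = "shuffleChunk_([0-9]+)_([0-9]+)_([0-9]+)_([0-9]+)".r

  // Counts how many times a string was parsed, so the saving is observable.
  var parseCount: Int = 0

  // Tries each pattern in order, mimicking a sequential regex match.
  def apply(name: String): SketchBlockId = {
    parseCount += 1
    name match {
      case Shuffle(s, m, r) => SketchShuffleBlockId(s.toInt, m.toLong, r.toInt)
      case Chunk(s, m, r, c) => SketchShuffleChunkId(s.toInt, m.toInt, r.toInt, c.toInt)
      case other => throw new IllegalArgumentException(s"Unrecognized block id: $other")
    }
  }
}

object ParseOnceSketch {
  // Shape of the code before the change: the same string is parsed twice.
  def handleTwice(blockId: String): (Boolean, SketchBlockId) = {
    val isChunk = SketchBlockId(blockId).isShuffleChunk
    (isChunk, SketchBlockId(blockId))
  }

  // Shape of the code after the change: parse once, reuse the local value.
  def handleOnce(blockId: String): (Boolean, SketchBlockId) = {
    val blkId = SketchBlockId(blockId)
    (blkId.isShuffleChunk, blkId)
  }
}
```

Both helpers return the same result for a given string; the only difference is how many times the parser runs. In this sketch `handleTwice` increments `parseCount` by two per call and `handleOnce` by one.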

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GitHub Actions.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun

+1 for this PR.

Just a question. Are these all instances in this pattern?

lakechd · 2026-04-16T20:20:06Z

-            updateMergedReqsDuration(BlockId(blockId).isShuffleChunk)
-            results.put(SuccessFetchResult(BlockId(blockId), infoMap(blockId)._2,
+            val blkId = BlockId(blockId)
+            updateMergedReqsDuration(blkId.isShuffleChunk)


The implementation seems different from the PR description.

LuciferYang · 2026-04-17

Thanks for taking a look! Could you point out specifically which part of the description seems different from the implementation? I'd be happy to clarify or update.

LuciferYang · 2026-04-17T00:29:34Z

> +1 for this PR.
>
> Just a question. Are these all instances in this pattern?

Yes, this was the only instance. I searched the entire codebase for BlockId(stringVar) calls. Other call sites in NettyBlockRpcServer, JsonProtocol, BlockManagerMessages, and DiskBlockManager each parse the string only once.

LuciferYang · 2026-04-17T13:44:20Z

Merged into master. Thanks @dongjoon-hyun and @lakechd

Commit c12c472: init

dongjoon-hyun approved these changes Apr 16, 2026

lakechd reviewed Apr 16, 2026

LuciferYang closed this in 132de08 Apr 17, 2026

LuciferYang wants to merge 1 commit into apache:master from LuciferYang:SPARK-56512
