Skip to content

[SPARK-56512][CORE] Avoid redundant BlockId parsing in ShuffleBlockFetcherIterator#55374

Closed
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-56512
Closed

[SPARK-56512][CORE] Avoid redundant BlockId parsing in ShuffleBlockFetcherIterator#55374
LuciferYang wants to merge 1 commit into
apache:masterfrom
LuciferYang:SPARK-56512

Conversation

@LuciferYang
Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang commented Apr 16, 2026

What changes were proposed in this pull request?

This PR caches the result of BlockId.apply(blockId) into a local variable in ShuffleBlockFetcherIterator.onBlockFetchSuccess, avoiding parsing the same block ID string twice.

Why are the changes needed?

BlockId.apply() sequentially matches the input string against 20 patterns. In the current code, it is called twice on the same blockId string within onBlockFetchSuccess:

updateMergedReqsDuration(BlockId(blockId).isShuffleChunk)
results.put(SuccessFetchResult(BlockId(blockId), ...))

This callback is on the hot path of shuffle data fetching — invoked once per block for every shuffle read. Parsing once and reusing the result eliminates redundant case matching.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Pass Github Actions

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this PR.

Just a question. Are these all instances in this pattern?

updateMergedReqsDuration(BlockId(blockId).isShuffleChunk)
results.put(SuccessFetchResult(BlockId(blockId), infoMap(blockId)._2,
val blkId = BlockId(blockId)
updateMergedReqsDuration(blkId.isShuffleChunk)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implementation seems different from pr memo description

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look! Could you point out specifically which part of the description seems different from the implementation? I'd be happy to clarify or update.

@LuciferYang
Copy link
Copy Markdown
Contributor Author

+1 for this PR.

Just a question. Are these all instances in this pattern?

Yes — this was the only instance. I searched the entire codebase for BlockId(stringVar) calls. Other call sites in NettyBlockRpcServer, JsonProtocol, BlockManagerMessages, and DiskBlockManager each parse the string only once.

@LuciferYang
Copy link
Copy Markdown
Contributor Author

Merged into master. Thanks @dongjoon-hyun and @lakechd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants