[SPARK-56512][CORE] Avoid redundant BlockId parsing in ShuffleBlockFetcherIterator#55374
LuciferYang wants to merge 1 commit into
Conversation
dongjoon-hyun
left a comment
+1 for this PR.
Just a question. Are these all the instances of this pattern?
updateMergedReqsDuration(BlockId(blockId).isShuffleChunk)
results.put(SuccessFetchResult(BlockId(blockId), infoMap(blockId)._2,
val blkId = BlockId(blockId)
updateMergedReqsDuration(blkId.isShuffleChunk)
The implementation seems different from the PR description.
Thanks for taking a look! Could you point out specifically which part of the description seems different from the implementation? I'd be happy to clarify or update.
Yes, this was the only instance. I searched the entire codebase for BlockId(stringVar) calls. Other call sites in
Merged into master. Thanks @dongjoon-hyun and @lakechd
What changes were proposed in this pull request?
This PR caches the result of BlockId.apply(blockId) in a local variable in ShuffleBlockFetcherIterator.onBlockFetchSuccess, so the same block ID string is not parsed twice.
Why are the changes needed?
BlockId.apply() sequentially matches the input string against 20 patterns. In the current code, it is called twice on the same blockId string within onBlockFetchSuccess: once in updateMergedReqsDuration(BlockId(blockId).isShuffleChunk) and once when constructing the SuccessFetchResult passed to results.put.
This callback is on the hot path of shuffle data fetching, invoked once per block for every shuffle read. Parsing once and reusing the result eliminates redundant case matching.
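The parse-once idea can be illustrated with a minimal, self-contained sketch. It is not Spark's code: SketchBlockId, its two regex patterns, the parseCount counter, and the handleTwice/handleOnce helpers are all hypothetical stand-ins, and the block ID string formats are assumed for illustration only.

```scala
// Hypothetical, simplified stand-in for a block ID hierarchy (not Spark's BlockId).
sealed trait SketchBlockId {
  def isShuffleChunk: Boolean = false
}

final case class SketchShuffleBlockId(shuffleId: Int, mapId: Long, reduceId: Int)
  extends SketchBlockId

final case class SketchShuffleChunkId(shuffleId: Int, mergeId: Int, reduceId: Int, chunkId: Int)
  extends SketchBlockId {
  override def isShuffleChunk: Boolean = true
}

object SketchBlockId {
  // Assumed string formats; the real parser checks many more patterns in sequence.
  private val Shuffle = "shuffle_([0-9]+)_([0-9]+)_([0-9]+)".r
  private val ShuffleChunk = "shuffleChunk_([0-9]+)_([0-9]+)_([0-9]+)_([0-9]+)".r

  // Counts how often a string is parsed, so the saving is observable.
  var parseCount: Int = 0

  def apply(name: String): SketchBlockId = {
    parseCount += 1
    name match {
      case Shuffle(s, m, r) =>
        SketchShuffleBlockId(s.toInt, m.toLong, r.toInt)
      case ShuffleChunk(s, m, r, c) =>
        SketchShuffleChunkId(s.toInt, m.toInt, r.toInt, c.toInt)
      case _ =>
        throw new IllegalArgumentException(s"Unrecognized block id: $name")
    }
  }
}

object ParseOnceSketch {
  // Before: the same string is parsed twice within one callback.
  def handleTwice(blockId: String): (Boolean, SketchBlockId) = {
    val isChunk = SketchBlockId(blockId).isShuffleChunk
    (isChunk, SketchBlockId(blockId))
  }

  // After: parse once into a local value and reuse it.
  def handleOnce(blockId: String): (Boolean, SketchBlockId) = {
    val blkId = SketchBlockId(blockId)
    (blkId.isShuffleChunk, blkId)
  }
}
```

Both helpers return the same result for a given string; the only difference is that handleOnce increments parseCount once per call instead of twice.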
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No