Skip to content

[#2740] improvement(server) Just check block count while checking commit result#2742

Merged
xianjingfeng merged 1 commit intoapache:masterfrom
xianjingfeng:issue_2740
Mar 25, 2026
Merged

[#2740] improvement(server) Just check block count while checking commit result#2742
xianjingfeng merged 1 commit intoapache:masterfrom
xianjingfeng:issue_2740

Conversation

@xianjingfeng
Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Replace Roaring64NavigableMap with AtomicLong for checking commit result

Why are the changes needed?

Fix: #2740
To save memory.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI

@xianjingfeng xianjingfeng requested review from roryqi and zuston March 19, 2026 10:16
@github-actions
Copy link
Copy Markdown

Test Results

 3 162 files  + 7   3 162 suites  +7   6h 50m 7s ⏱️ -34s
 1 246 tests ± 0   1 244 ✅ ± 0   1 💤 ±0  1 ❌ ±0 
15 643 runs   - 70  15 627 ✅  - 70  15 💤 ±0  1 ❌ ±0 

For more details on these failures, see this check.

Results for commit 34a8fd9. ± Comparison against base commit b98b488.

@roryqi
Copy link
Copy Markdown
Contributor

roryqi commented Mar 19, 2026

Count may not be correct if we have the duplicated data or the duplicated commit operation.

shuffleTaskInfos.computeIfAbsent(appId, x -> new ShuffleTaskInfo(appId));
long size = 0L;
// With memory storage type should never need cachedBlockIds,
int blockCount = spbs.length - shufflePartitionedData.getDuplicateBlockCount();
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated blocks will not be countd. @roryqi
Furthermore, based on the current code, the same block will not be committed twice.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @zuston WDYT?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the counter may be not accurate. We have too many branches. It is hard to prove it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need commit the data? If we use memory mode, we can avoid committing the blocks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need commit the data? If we use memory mode, we can avoid committing the blocks.

+1

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need commit the data? If we use memory mode, we can avoid committing the blocks.

But we have an option not to use memory, or we should disable this option in non-unit test scenarios.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think twice. It may not influence our correctness. Because we don't use committed bitmap to judge correctness.

@xianjingfeng xianjingfeng merged commit b80940d into apache:master Mar 25, 2026
39 of 41 checks passed
@xianjingfeng
Copy link
Copy Markdown
Member Author

@roryqi @zuston Thanks for your review.

@xianjingfeng xianjingfeng deleted the issue_2740 branch March 25, 2026 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improvement] Too many blocks generated by a single task for a partition can cause memory overflow

3 participants