[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single ShuffleBlockInfo#2524
s0nskar wants to merge 10 commits into apache:main from s0nskar/CELEBORN-1410
Conversation
@waitinfuture Do these changes look ok? Also, in the ticket you mentioned:
> Why do we want to limit the size to half of shuffleChunkSize and not use the full shuffleChunkSize?
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main    #2524      +/-   ##
==========================================
+ Coverage   40.17%   40.81%   +0.64%
==========================================
  Files         218      220       +2
  Lines       13742    14104     +362
  Branches     1214     1258      +44
==========================================
+ Hits         5520     5755     +235
- Misses       7905     8020     +115
- Partials      317      329      +12
waitinfuture left a comment:
Hi @s0nskar , thanks for the PR! I left a comment, PTAL :)
sortedBlock.offset = fileIndex;
sortedBlock.length = length;
sortedShuffleBlocks.add(sortedBlock);
fileIndex += transferBlock(offset, length);
I think we should put `fileIndex += transferBlock(offset, length);` below the if-else block, because we need to make sure the offset of the first sortedBlock equals 0.
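The ordering being suggested can be sketched as follows. This is a minimal, self-contained illustration with hypothetical stand-ins (`ShuffleBlock`, `transferBlock`), not the actual Celeborn code: assigning the running `fileIndex` to a block before advancing it guarantees the first sorted block starts at offset 0.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedBlockOffsets {
    static class ShuffleBlock {
        long offset;
        long length;
    }

    // hypothetical stand-in for the real transferBlock(): returns bytes written
    static long transferBlock(long offset, long length) {
        return length;
    }

    static List<ShuffleBlock> sortBlocks(long[][] blocks) {
        List<ShuffleBlock> sorted = new ArrayList<>();
        long fileIndex = 0;
        for (long[] b : blocks) {                   // b = {offset, length}
            ShuffleBlock sb = new ShuffleBlock();
            sb.offset = fileIndex;                  // first iteration: 0
            sb.length = b[1];
            sorted.add(sb);
            fileIndex += transferBlock(b[0], b[1]); // advance only afterwards
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<ShuffleBlock> out =
            sortBlocks(new long[][] {{100, 10}, {200, 20}, {300, 30}});
        System.out.println("first offset = " + out.get(0).offset); // 0
    }
}
```

If `fileIndex` were advanced inside the if-else branch before the block is recorded, the first sorted block's offset would already be non-zero.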
Nice catch, Fixed it!
I'm trying to set up the UTs locally; this should've been caught by them. I'm facing the below error while running the tests, have you seen this error before?

scala: bad option: -P:silencer:globalFilters=.*deprecated.*
I don't see such an error. I usually run local UTs through IntelliJ or using sbt (https://celeborn.apache.org/docs/latest/developers/sbt/#testing-with-sbt); you can try out this document.
Thanks! This is working for me ✌️
The reason is to avoid the case where the next batch exceeds half of shuffleChunkSize, so the whole length exceeds shuffleChunkSize. But I think this is trivial; it's ok to use the full shuffleChunkSize.
// combine multiple `ShuffleBlockInfo` into a single `ShuffleBlockInfo` of size
// less than `shuffleChunkSize`
if (!sortedShuffleBlocks.isEmpty() &&
    sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1).length + length <= shuffleChunkSize) {
@waitinfuture Just wanted to double-check whether the `<=` condition would be fine here, or whether shuffleChunkSize is an exclusive upper bound.
`<=` is fine here; shuffleChunkSize is not an exclusive upper bound. In fact, it's not a strict bound at all.
Thanks for the review @waitinfuture
// size of compacted `ShuffleBlockInfo` does not exceed `shuffleChunkSize`
if (!sortedShuffleBlocks.isEmpty()
    && sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1).length + length
        <= shuffleChunkSize) {
The `<= shuffleChunkSize` condition may cause the returned chunk's size to approach 2 * chunkSize. This won't cause trouble, but it changes the default behavior.
Yes. Say shuffleChunkSize is 8m: the generated blocks can be two 7.9m blocks, and when fetching data the two can be returned together as one chunk, which is 15.8m. That's the reason I mentioned in the Jira to use half of shuffleChunkSize, in which case the worst case is 3.9m * 3 = 11.7m.
Maybe narrowing the threshold to some fraction * shuffleChunkSize is safer, say 0.25. WDYT?

sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1).length + length
    <= 0.25 * shuffleChunkSize
Yeah, that sounds good. I can make this threshold configurable with a default value of 0.25 to start with.
Although I'm not super clear about the issue in the discussion: why would fetching data return two or three chunks together, making it 7.9 * 2 = 15.8m or 3.9 * 3 = 11.7m as mentioned above? Do clients cache the initial read chunk while reading the next one? If that is the case, can someone point me to this piece of code?
Hi @s0nskar , when creating ReduceFileMeta it compacts contiguous ShuffleBlockInfos into a chunk: it first adds the current ShuffleBlockInfo, then checks whether the size exceeds fetchChunkSize.
When the Worker handles a fetch, it sends one chunk at a time, and chunk boundaries are stored in ReduceFileMeta#chunkOffsets.
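The add-then-check behavior described above can be simulated with a short sketch (hypothetical names, not the actual ReduceFileMeta code). Because each block is appended to the current chunk before the boundary check, a chunk can grow to roughly fetchChunkSize plus the largest block size, which is where the 3.9m * 3 = 11.7m worst case comes from.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCompaction {
    // Simulated compaction: add the block first, then close the chunk if it
    // reached fetchChunkSize. Not the real implementation, just the idea.
    static List<Long> chunkSizes(long[] blockLengths, long fetchChunkSize) {
        List<Long> chunks = new ArrayList<>();
        long current = 0;
        for (long len : blockLengths) {
            current += len;                  // add the current block first...
            if (current >= fetchChunkSize) { // ...then check the boundary
                chunks.add(current);
                current = 0;
            }
        }
        if (current > 0) chunks.add(current);
        return chunks;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // Three ~3.9 MB blocks against an 8 MB chunk size:
        // 3.9 + 3.9 = 7.8 < 8, so the third block pushes the chunk to ~11.7 MB.
        List<Long> chunks = chunkSizes(
            new long[] {(long) (3.9 * mb), (long) (3.9 * mb), (long) (3.9 * mb)},
            8 * mb);
        System.out.printf("chunk size = %.1f MB%n", chunks.get(0) / (double) mb);
    }
}
```

The same simulation with block sizes near the full 8 MB shows chunks approaching 16 MB, matching the 7.9m * 2 = 15.8m example above.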
Got it, thanks for the context.
As per the documentation, fetchChunkSize is the "max chunk size", so shouldn't we change the above condition to respect it? As you mentioned, if the ShuffleBlocks are generated close to 8mb, then a chunk can be close to 16mb.
We can change this condition to generate offsets closer to fetchChunkSize with some error factor of 10 or 20%. WDYT?
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala (review comment resolved)
// combine multiple small `ShuffleBlockInfo`s for the same mapId such that
// size of compacted `ShuffleBlockInfo` does not exceed `shuffleChunkSize`
if (!sortedShuffleBlocks.isEmpty()
    && sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1).length + length
        <= shuffleChunkSize) {
Can we introduce a variable for `sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1)`?
Refactored the code a little to simplify it.
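One way that refactor could look is the following minimal sketch (hypothetical names, not the exact Celeborn code): the repeated last-element lookup is hoisted into a local variable before testing the compaction condition.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactCondition {
    static class ShuffleBlockInfo {
        long offset;
        long length;
        ShuffleBlockInfo(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Combine the incoming block into the last one when the merged size stays
    // within shuffleChunkSize; otherwise start a new block. (The real code also
    // only merges blocks belonging to the same mapId; that check is omitted here.)
    static void addBlock(List<ShuffleBlockInfo> sortedShuffleBlocks,
                         long offset, long length, long shuffleChunkSize) {
        if (!sortedShuffleBlocks.isEmpty()) {
            ShuffleBlockInfo lastBlock =
                sortedShuffleBlocks.get(sortedShuffleBlocks.size() - 1);
            if (lastBlock.length + length <= shuffleChunkSize) {
                lastBlock.length += length;  // compact into the last block
                return;
            }
        }
        sortedShuffleBlocks.add(new ShuffleBlockInfo(offset, length));
    }

    public static void main(String[] args) {
        List<ShuffleBlockInfo> blocks = new ArrayList<>();
        addBlock(blocks, 0, 3, 8);
        addBlock(blocks, 3, 4, 8);  // 3 + 4 <= 8, merged into the first block
        addBlock(blocks, 7, 6, 8);  // 7 + 6 > 8, starts a new block
        System.out.println("blocks = " + blocks.size()); // 2
    }
}
```

Pulling the lookup into `lastBlock` also makes the merge itself read naturally, since the same variable is mutated in place.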
Hi @s0nskar , please run
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala (review comment resolved)
waitinfuture left a comment:
LGTM, thanks! Merging to main (v0.5.0) / branch-0.4 (v0.4.2)
[CELEBORN-1410] Combine multiple ShuffleBlockInfo into a single ShuffleBlockInfo

Merging smaller `ShuffleBlockInfo`s belonging to the same mapID, such that the size of each block does not exceed `celeborn.shuffle.chunk.size`. As sorted ShuffleBlocks are contiguous, we can compact multiple `ShuffleBlockInfo`s into one as long as the size of the compacted one does not exceed half of `celeborn.shuffle.chunk.size`. This way we can decrease the number of ShuffleBlock objects.

No user-facing change. Tested with existing UTs.

Closes #2524 from s0nskar/CELEBORN-1410.

Lead-authored-by: Sanskar Modi <sanskarmodi97@gmail.com>
Co-authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: zky.zhoukeyong <zky.zhoukeyong@alibaba-inc.com>
What changes were proposed in this pull request?
Merging smaller `ShuffleBlockInfo`s belonging to the same mapID, such that the size of each block does not exceed `celeborn.shuffle.chunk.size`.

Why are the changes needed?
As sorted ShuffleBlocks are contiguous, we can compact multiple `ShuffleBlockInfo`s into one as long as the size of the compacted one does not exceed half of `celeborn.shuffle.chunk.size`. This way we can decrease the number of ShuffleBlock objects.

Does this PR introduce any user-facing change?
No

How was this patch tested?
Existing UTs