[#378] feat: introduce storage manager selector #621

zuston · 2023-02-17T08:21:59Z

What changes were proposed in this pull request?

Introduce a storage manager selector so that MultiStorageManager can support more selection strategies.
Introduce the conf rss.server.multistorage.cold.storage.preferred.factor to support different flush strategies. For example, when single buffer flush is enabled, I want huge partitions to be flushed directly to HDFS while normal partitions can still be flushed to local disk.

Why are the changes needed?

Solving the problem mentioned in #378 (comment).

In the current codebase, when a huge partition is encountered and single buffer flush is enabled, normal partition data is also flushed to HDFS. I don't want that, because the local disk is free and flushes faster than HDFS. But if single buffer flush is disabled, the events of a huge partition produced before it is marked as huge may be big, which slows down flushing and then causes requests for buffer allocation to fail.

Based on the above problems, this PR lets a single event carrying 100 MB be flushed to HDFS or to a local file, depending on the conf rss.server.multistorage.cold.storage.preferred.factor.
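
To make the two strategies concrete, here is a minimal sketch of a pluggable selector. It is not the code in this PR: `SelectorSketch`, `Tier`, `FlushEvent`, `StorageSelector`, `EventSizeSelector` and `HugePartitionSelector` are hypothetical names chosen for illustration, and the 64 MB threshold only mirrors the existing default of the event-size conf.

```java
// Hypothetical sketch: every type below is illustrative, not the Uniffle API.
final class SelectorSketch {

    // WARM stands for local disk, COLD for HDFS.
    enum Tier { WARM, COLD }

    // Stand-in for a flush event: payload size plus the huge-partition mark.
    static final class FlushEvent {
        final long sizeBytes;
        final boolean ownedByHugePartition;

        FlushEvent(long sizeBytes, boolean ownedByHugePartition) {
            this.sizeBytes = sizeBytes;
            this.ownedByHugePartition = ownedByHugePartition;
        }
    }

    // The pluggable decision point: given an event, pick a storage tier.
    interface StorageSelector {
        Tier select(FlushEvent event);
    }

    // Strategy A: any event larger than the threshold goes to cold storage.
    static final class EventSizeSelector implements StorageSelector {
        private final long thresholdBytes;

        EventSizeSelector(long thresholdBytes) {
            this.thresholdBytes = thresholdBytes;
        }

        @Override
        public Tier select(FlushEvent event) {
            return event.sizeBytes > thresholdBytes ? Tier.COLD : Tier.WARM;
        }
    }

    // Strategy B: only events owned by a huge partition go to cold storage,
    // so a big event of a normal partition still lands on local disk.
    static final class HugePartitionSelector implements StorageSelector {
        @Override
        public Tier select(FlushEvent event) {
            return event.ownedByHugePartition ? Tier.COLD : Tier.WARM;
        }
    }

    public static void main(String[] args) {
        long threshold = 64L * 1024L * 1024L;
        FlushEvent normal = new FlushEvent(100L * 1024L * 1024L, false);
        FlushEvent huge = new FlushEvent(100L * 1024L * 1024L, true);

        System.out.println(new EventSizeSelector(threshold).select(normal));
        System.out.println(new HugePartitionSelector().select(normal));
        System.out.println(new HugePartitionSelector().select(huge));
    }
}
```

With the same 100 MB event from a normal partition, the two strategies disagree, which is exactly the choice the new conf is meant to expose.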

Does this PR introduce any user-facing change?

Yes. Doc will be updated later.

How was this patch tested?

UTs

codecov-commenter · 2023-02-17T08:28:56Z

Codecov Report

Merging #621 (b3e217f) into master (e20fb62) will increase coverage by 2.11%.
The diff coverage is 98.07%.

@@             Coverage Diff              @@
##             master     #621      +/-   ##
============================================
+ Coverage     60.90%   63.02%   +2.11%     
- Complexity     1799     1810      +11     
============================================
  Files           214      204      -10     
  Lines         12381    10528    -1853     
  Branches       1042     1056      +14     
============================================
- Hits           7541     6635     -906     
+ Misses         4437     3547     -890     
+ Partials        403      346      -57

Impacted Files	Coverage Δ
...r/storage/multi/DefaultStorageManagerSelector.java	`93.75% <93.75%> (ø)`
...g/apache/uniffle/server/ShuffleDataFlushEvent.java	`84.90% <100.00%> (+1.23%)`	⬆️
...a/org/apache/uniffle/server/ShuffleServerConf.java	`99.35% <100.00%> (+0.03%)`	⬆️
.../org/apache/uniffle/server/ShuffleTaskManager.java	`76.76% <100.00%> (+0.13%)`	⬆️
...he/uniffle/server/buffer/ShuffleBufferManager.java	`83.50% <100.00%> (+0.40%)`	⬆️
...he/uniffle/server/storage/MultiStorageManager.java	`59.61% <100.00%> (-1.10%)`	⬇️
...e/server/storage/multi/StorageManagerSelector.java	`100.00% <100.00%> (ø)`
...a/org/apache/uniffle/server/RegisterHeartBeat.java	`43.85% <0.00%> (-43.86%)`	⬇️
.../java/org/apache/uniffle/server/ShuffleServer.java	`62.06% <0.00%> (-2.32%)`	⬇️
... and 24 more

zuston · 2023-02-17T09:38:15Z

PTAL @jerqi @xianjingfeng . This is a remaining improvement for #378

server/src/main/java/org/apache/uniffle/server/buffer/ShuffleBufferManager.java

server/src/test/java/org/apache/uniffle/server/storage/MultiStorageManagerTest.java

server/src/main/java/org/apache/uniffle/server/storage/multi/DefaultStorageManagerSelector.java

advancedxy · 2023-02-17T13:22:57Z

server/src/main/java/org/apache/uniffle/server/ShuffleTaskManager.java

@@ -671,7 +671,7 @@ public ShuffleTaskInfo getShuffleTaskInfo(String appId) {

  private void triggerFlush() {
    synchronized (this.shuffleBufferManager) {
-      this.shuffleBufferManager.flushIfNecessary();
+      this.shuffleBufferManager.flushIfNecessary(this::getPartitionDataSize);


I get your intention. But I believe passing the getPartitionDataSize method reference around requires a lot of interface changes and adds maintenance overhead.

How about making ShuffleTaskManager a private field of ShuffleBufferManager? Then all these interface changes become unnecessary.

The shuffleTaskManager is initialized after the ShuffleBufferManager instance is created, so we'd better pass the ShuffleServer into ShuffleBufferManager. But that still looks unclear, especially regarding the dependencies between the different managers. I have no good idea for this.

For implementation purposes:
ShuffleBufferManager could expose a setTaskManager method, and the constructor of ShuffleTaskManager could call shuffleBufferManager.setTaskManager(this).

But I do think the dependencies among BufferManager, TaskManager and FlushManager are unclear and should be refactored in later PRs.
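
The setter-based wiring suggested above can be sketched as follows. This is only an illustration under assumptions: `WiringSketch`, `BufferManager`, `TaskManager`, `partitionSizeForFlush` and `recordPartitionDataSize` are hypothetical stand-ins, and the three-argument signature of `getPartitionDataSize` is assumed, not taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of setter injection between two managers.
final class WiringSketch {

    static final class BufferManager {
        // Not final: the task manager does not exist yet when this object is built.
        private TaskManager taskManager;

        void setTaskManager(TaskManager taskManager) {
            this.taskManager = taskManager;
        }

        // Looks up the partition size through the injected task manager.
        long partitionSizeForFlush(String appId, int shuffleId, int partitionId) {
            if (taskManager == null) {
                throw new IllegalStateException("task manager has not been wired yet");
            }
            return taskManager.getPartitionDataSize(appId, shuffleId, partitionId);
        }
    }

    static final class TaskManager {
        private final Map<String, Long> partitionSizes = new HashMap<>();

        // The task manager registers itself with the buffer manager it is given.
        TaskManager(BufferManager bufferManager) {
            bufferManager.setTaskManager(this);
        }

        void recordPartitionDataSize(String appId, int shuffleId, int partitionId, long bytes) {
            partitionSizes.merge(key(appId, shuffleId, partitionId), bytes, Long::sum);
        }

        // Assumed signature; returns 0 for partitions that were never recorded.
        long getPartitionDataSize(String appId, int shuffleId, int partitionId) {
            return partitionSizes.getOrDefault(key(appId, shuffleId, partitionId), 0L);
        }

        private static String key(String appId, int shuffleId, int partitionId) {
            return appId + "/" + shuffleId + "/" + partitionId;
        }
    }

    public static void main(String[] args) {
        BufferManager bufferManager = new BufferManager();
        TaskManager taskManager = new TaskManager(bufferManager);
        taskManager.recordPartitionDataSize("app-1", 0, 3, 2048L);
        System.out.println(bufferManager.partitionSizeForFlush("app-1", 0, 3));
    }
}
```

The trade-off is that `this` escapes from the `TaskManager` constructor and the field in `BufferManager` cannot be final, which is part of why the dependency between the managers looks unclear.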

advancedxy · 2023-02-21T02:38:55Z

Hi @zuston, could you rebase or merge your branch with the latest master branch?

I'd like to check whether the CI workflow for the operator works as expected.

zuston · 2023-02-21T02:56:11Z

Done

advancedxy · 2023-02-21T03:00:07Z

Thanks. The CI workflow for the operator is skipped as expected.

advancedxy · 2023-02-22T14:47:09Z

server/src/main/java/org/apache/uniffle/server/buffer/ShuffleBufferManager.java

@@ -76,7 +78,12 @@ public class ShuffleBufferManager {
  // appId -> shuffleId -> shuffle size in buffer
  protected Map<String, Map<Integer, AtomicLong>> shuffleSizeMap = Maps.newConcurrentMap();

-  public ShuffleBufferManager(ShuffleServerConf conf, ShuffleFlushManager shuffleFlushManager) {
+  public ShuffleBufferManager(ShuffleServerConf conf, ShuffleServer shuffleServer) {
+    this(conf, shuffleServer.getShuffleFlushManager(), shuffleServer.getShuffleTaskManager());


I believe shuffleTaskManager should be accessed lazily. It's not initialized yet at https://github.com/apache/incubator-uniffle/pull/621/files#diff-1a1a029f5533bd7f4722cefdc586b5658ca9521bfcc8e817138aa7695c4a4e75R197
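
One way to read "accessed lazily" is to store a supplier instead of the resolved object, so the lookup happens at call time rather than in the constructor. The sketch below assumes that reading. `LazyAccessSketch`, `Server`, `TaskManager`, `BufferManager`, `initTaskManager` and `currentPartitionSize` are hypothetical names, and the constant 42 is a placeholder value.

```java
import java.util.function.Supplier;

// Hypothetical sketch of lazy access to a dependency that is initialized later.
final class LazyAccessSketch {

    static final class TaskManager {
        // Placeholder value; a real manager would look the size up.
        long getPartitionDataSize() {
            return 42L;
        }
    }

    static final class Server {
        private TaskManager taskManager; // null until initTaskManager() runs

        TaskManager getTaskManager() {
            return taskManager;
        }

        void initTaskManager() {
            this.taskManager = new TaskManager();
        }
    }

    static final class BufferManager {
        // Holds the lookup itself, not its result.
        private final Supplier<TaskManager> taskManagerSupplier;

        BufferManager(Server server) {
            // Calling server.getTaskManager() here would capture null;
            // the method reference defers the call until get() is invoked.
            this.taskManagerSupplier = server::getTaskManager;
        }

        long currentPartitionSize() {
            TaskManager taskManager = taskManagerSupplier.get();
            if (taskManager == null) {
                throw new IllegalStateException("task manager is not initialized yet");
            }
            return taskManager.getPartitionDataSize();
        }
    }

    public static void main(String[] args) {
        Server server = new Server();
        BufferManager bufferManager = new BufferManager(server); // built first
        server.initTaskManager();                                // initialized later
        System.out.println(bufferManager.currentPartitionSize());
    }
}
```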

advancedxy

LGTM, except for one minor comment.

advancedxy · 2023-02-25T04:58:48Z

server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java

+      .enumType(StorageManagerSelector.ColdStoragePreferredFactor.class)
+      .defaultValue(StorageManagerSelector.ColdStoragePreferredFactor.HUGE_EVENT)
+      .withDescription("The cold storage preferred factor for multiple storage manager. Only the value is "
+          + StorageManagerSelector.ColdStoragePreferredFactor.HUGE_EVENT + ", the conf of "


This description is a bit hard to read. Would you mind rewording it?

I believe we should also update the doc for this new configuration.

jerqi · 2023-02-25T10:25:44Z

server/src/main/java/org/apache/uniffle/server/ShuffleServerConf.java

@@ -263,6 +264,15 @@ public class ShuffleServerConf extends RssBaseConf {
      .defaultValue(64L * 1024L * 1024L)
      .withDescription("For multistorage, the event size exceed this value, flush data  to cold storage");

+  public static final ConfigOption<StorageManagerSelector.ColdStoragePreferredFactor>
+      MULTISTORAGE_SELECTOR_COLD_STORAGE_PREFER = ConfigOptions
+      .key("rss.server.multistorage.cold.storage.preferred.factor")


Could you give this config option a better name? It is more like a strategy in our system.

rss.server.multistorage.selector.strategy

This has been discussed in the conversation above. #621 (comment). If you prefer, I will refactor it.

It's better to refactor ... it's more clear.

jerqi

LGTM

…ategy (apache#621)

### What changes were proposed in this pull request?

1. Introduce a storage manager selector so that `MultiStorageManager` can support more selection strategies.
2. Introduce the conf `rss.server.multistorage.manager.selector.class` to support different flush strategies. For example, when single buffer flush is enabled, I want huge partitions to be flushed directly to HDFS while normal partitions can still be flushed to local disk.

### Why are the changes needed?

Solving the problem mentioned in apache#378 (comment).

In the current codebase, when a huge partition is encountered and single buffer flush is enabled, normal partition data is also flushed to HDFS. I don't want that, because the local disk is free and flushes faster than HDFS. But if single buffer flush is disabled, the events of a huge partition produced before it is marked as huge may be big, which slows down flushing and then causes requests for buffer allocation to fail.

Based on the above problems, this PR lets a single event carrying 100 MB be flushed to HDFS or to a local file, depending on the conf `rss.server.multistorage.manager.selector.class`.

### Does this PR introduce _any_ user-facing change?

Yes. Doc will be updated later.

### How was this patch tested?

1. UTs

zuston requested review from xianjingfeng and jerqi February 17, 2023 09:32

zuston changed the title ~~[#378] feat: introduce storage manager selector to support more selector strategy~~ [#378] feat: introduce storage manager selector Feb 17, 2023

xianjingfeng reviewed Feb 17, 2023

server/src/main/java/org/apache/uniffle/server/buffer/ShuffleBufferManager.java (outdated)

xianjingfeng reviewed Feb 17, 2023

server/src/test/java/org/apache/uniffle/server/storage/MultiStorageManagerTest.java (outdated)

advancedxy reviewed Feb 17, 2023

zuston force-pushed the fallbackOp1 branch from 4b3d389 to 2156b81 Compare February 21, 2023 02:55

zuston requested a review from advancedxy February 22, 2023 12:04

advancedxy reviewed Feb 22, 2023

zuston requested a review from advancedxy February 25, 2023 01:02

advancedxy reviewed Feb 25, 2023

jerqi reviewed Feb 25, 2023

feat: introduce storage manager selector to support more selector str…

c3c9641

…ategy

zuston force-pushed the fallbackOp1 branch from 1f5d81c to c3c9641 Compare March 1, 2023 09:39

zuston requested a review from jerqi March 1, 2023 09:39

jerqi approved these changes Mar 1, 2023

zuston merged commit 1fbdfe5 into apache:master Mar 1, 2023
