Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ISSUE-448][Feature] shuffle server report storage info #449

Merged
merged 6 commits into from
Jan 11, 2023

Conversation

advancedxy
Copy link
Contributor

@advancedxy advancedxy commented Dec 27, 2022

What changes were proposed in this pull request?

ShuffleServer reports local storage info about itself.
This PR also defines a general message definition to extend remote distributed info.

Why are the changes needed?

To do better shuffle assignments and get more insight of shuffle server
This addresses #448

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added UTs.

@advancedxy advancedxy marked this pull request as ready for review December 28, 2022 10:45
@advancedxy advancedxy changed the title WIP: shuffle server report storage capability [FEATURE] shuffle server report storage capability Dec 28, 2022
@advancedxy advancedxy changed the title [FEATURE] shuffle server report storage capability [Feature] shuffle server report storage capability Dec 28, 2022
@codecov-commenter
Copy link

codecov-commenter commented Dec 28, 2022

Codecov Report

Merging #449 (f7fb582) into master (6ddf8a7) will increase coverage by 0.07%.
The diff coverage is 60.25%.

@@             Coverage Diff              @@
##             master     #449      +/-   ##
============================================
+ Coverage     58.71%   58.78%   +0.07%     
- Complexity     1651     1704      +53     
============================================
  Files           199      206       +7     
  Lines         11214    11471     +257     
  Branches        998     1024      +26     
============================================
+ Hits           6584     6743     +159     
- Misses         4237     4317      +80     
- Partials        393      411      +18     
Impacted Files Coverage Δ
...he/uniffle/coordinator/CoordinatorGrpcService.java 2.08% <0.00%> (-0.02%) ⬇️
...che/uniffle/server/storage/HdfsStorageManager.java 93.75% <0.00%> (-1.49%) ⬇️
...he/uniffle/server/storage/MultiStorageManager.java 60.71% <0.00%> (-3.44%) ⬇️
...e/uniffle/storage/common/StorageMediaProvider.java 0.00% <0.00%> (ø)
...org/apache/uniffle/common/storage/StorageInfo.java 35.18% <35.18%> (ø)
...g/apache/uniffle/common/storage/StorageStatus.java 42.85% <42.85%> (ø)
...rg/apache/uniffle/common/storage/StorageMedia.java 43.47% <43.47%> (ø)
...le/storage/common/DefaultStorageMediaProvider.java 53.12% <53.12%> (ø)
...he/uniffle/server/storage/LocalStorageManager.java 85.87% <74.19%> (-2.49%) ⬇️
...le/server/storage/StorageMediaFromEnvProvider.java 89.65% <89.65%> (ø)
... and 17 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -336,6 +336,12 @@ public class ShuffleServerConf extends RssBaseConf {
.withDescription("Threshold when flushing shuffle data to persistent storage, recommend value would be 256K, "
+ "512K, or even 1M");

public static final ConfigOption<String> STORAGE_TYPE_PROVIDER_ENV_KEY = ConfigOptions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an internal config entry? If yes, could you mark it, rename from STORAGE_TYPE_PROVIDER_ENV_KEY to __INTERNAL_STORAGE_TYPE_PROVIDER_ENV_KEY ? WDYT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

admin might change it. I'm not sure it falls into the internal config catalog?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have other suggestions?

proto/src/main/proto/Rss.proto Outdated Show resolved Hide resolved
@advancedxy advancedxy changed the title [Feature] shuffle server report storage capability [Feature] shuffle server report storage info Jan 6, 2023
@jerqi jerqi changed the title [Feature] shuffle server report storage info [ISSUE-448][Feature] shuffle server report storage info Jan 6, 2023
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an initial PR to support better assignment, it's better to list all subtasks, and indicate the progress in the PR title, like [1/n] xxxx, [2/n] xxxxxx.

proto/src/main/proto/Rss.proto Outdated Show resolved Hide resolved
@@ -94,19 +101,26 @@ public class LocalStorageManager extends SingleStorageManager {
// We must make sure the order of `storageBasePaths` and `localStorages` is same, or some unit test may be fail
CountDownLatch countDownLatch = new CountDownLatch(storageBasePaths.size());
AtomicInteger successCount = new AtomicInteger();
ServiceLoader<StorageTypeProvider> loader = ServiceLoader.load(StorageTypeProvider.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need multiple storage type providers? I think one is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one is not enough, especially when you need to provide your own provider logic.

Although, the current logic may not handle provider evaluation order properly. But we can modify that later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one is not enough, especially when you need to provide your own provider logic.

Could you help provide some examples about needing multiple providers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help provide some examples about needing multiple providers?

Just like the DefaultStorageTypeProvider and StorageTypeFromEnvProvider. Some base dir might be indeed is a SSD disk but the default could only return HDD>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't agree this, we could use single provider to mark one storage media for one basePath, this is enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the DefaultStorageMediaProvider cannot provide accurate result?

The whole point of SPI is to provide an extend way to add plugins. So, the service loader must be a list?

@advancedxy
Copy link
Contributor Author

I think this is an initial PR to support better assignment, it's better to list all subtasks, and indicate the progress in the PR title, like [1/n] xxxx, [2/n] xxxxxx.

The initial intention for this PR is to support shuffle server with LOCAL HDD disks better. You can see the parent and sub-issues from #448.

However this pr extends its scope a bit. I may open another issue to track this and related issues for better shuffle server assignments.

@advancedxy advancedxy marked this pull request as draft January 9, 2023 02:59
@advancedxy
Copy link
Contributor Author

PLAT @zuston .

@advancedxy advancedxy marked this pull request as ready for review January 9, 2023 08:51
@jerqi jerqi linked an issue Jan 9, 2023 that may be closed by this pull request
3 tasks
@jerqi
Copy link
Contributor

jerqi commented Jan 10, 2023

If we introduce extra configuration option, we should add the document about this feature to explain how to use this feature.

@advancedxy
Copy link
Contributor Author

If we introduce extra configuration option, we should add the document about this feature to explain how to use this feature.

Done. Please take another look when you have time.

Copy link
Contributor

@jerqi jerqi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let @zuston take a look

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@jerqi jerqi merged commit 19a8bac into apache:master Jan 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Shuffle server shall report its local storage capability
5 participants