-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#414] feat(client): support specifying per-partition's max concurrency to write in client side #815
Conversation
Codecov Report
@@ Coverage Diff @@
## master #815 +/- ##
============================================
+ Coverage 57.13% 59.42% +2.29%
- Complexity 2155 2164 +9
============================================
Files 322 303 -19
Lines 15676 13362 -2314
Branches 1246 1247 +1
============================================
- Hits 8956 7940 -1016
+ Misses 6213 4987 -1226
+ Partials 507 435 -72
... and 22 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
Show resolved
Hide resolved
b57ac1c
to
89b2ed9
Compare
|
||
import org.apache.uniffle.common.ShuffleDataDistributionType; | ||
|
||
public class ShuffleSpecification { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use the builder pattern? Because we will add more fields in the future. This class will vary frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
return specification.getDistributionType(); | ||
} | ||
|
||
public void setSpecification(ShuffleDataDistributionType type, int maxConcurrency) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use setSpecification(ShuffleSpecification spec)
here? We don't need synchronized, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, we also should use the AtomicReference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Let others take a look.
common/src/main/java/org/apache/uniffle/common/config/RssClientConf.java
Show resolved
Hide resolved
docs/server_guide.md
Outdated
@@ -89,6 +89,7 @@ This document will introduce how to deploy Uniffle shuffle servers. | |||
| rss.server.multistorage.fallback.strategy.class | - | The fallback strategy for `MEMORY_LOCALFILE_HDFS`. Support `org.apache.uniffle.server.storage.RotateStorageManagerFallbackStrategy`,`org.apache.uniffle.server.storage.LocalStorageManagerFallbackStrategy` and `org.apache.uniffle.server.storage.HdfsStorageManagerFallbackStrategy`. If not set, `org.apache.uniffle.server.storage.HdfsStorageManagerFallbackStrategy` will be used. | | |||
| rss.server.leak.shuffledata.check.interval | 3600000 | The interval of leak shuffle data check (ms) | | |||
| rss.server.max.concurrency.of.single.partition.writer | 1 | The max concurrency of single partition writer, the data partition file number is equal to this value. Default value is 1. This config could improve the writing speed, especially for huge partition. | | |||
| rss.server.client.max.concurrency.limit.per-partition.write | - | The limit for max concurrency per-partition write specified by client, this won't be enabled by default. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem right.
It's quite weird to mix rss.server.client
here.
Better call it, rss.server.max.concurrency.of.per-partition.write
?
description:
Server side limit for max concurrency of per partition writer, which could be specified by client side. If client specifies `rss.client.max.concurrency.per-partition.write`, the `rss.server.max.concurrency.of.single.partition.writer` could be overridden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem right.
It's quite weird to mix
rss.server.client
here. Better call it,rss.server.max.concurrency.of.per-partition.write
?description:
Server side limit for max concurrency of per partition writer, which could be specified by client side. If client specifies `rss.client.max.concurrency.per-partition.write`, the `rss.server.max.concurrency.of.single.partition.writer` could be overridden.
+1 . The name is a little weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have rss.server.max.concurrency.of.single.partition.writer
. How about rss.server.max.concurrency.limit.of.per-partition.write
?
And I hope rename rss.server.max.concurrency.of.single.partition.writer
to rss.server.max.concurrency.of.per-partition.write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will cause some compatible problems if we rename config option. If you insist on it, you should add migration guide document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have
rss.server.max.concurrency.of.single.partition.writer
. How aboutrss.server.max.concurrency.limit.of.per-partition.write
?And I hope rename
rss.server.max.concurrency.of.single.partition.writer
torss.server.max.concurrency.of.per-partition.write
That sounds good to me. If you are going to rename. You should deprecate that first in v0.8, new configuration should fall back to that old configuration and maybe remove it from v0.9 or v1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any more update for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I will do but something takes up my time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the renaming of rss.server.max.concurrency.of.single.partition.writer
to rss.server.max.concurrency.of.per-partition.write
will be started after #841 is finished.
We should add some migration documents. |
I don't do any incompatible change. The rename config will delay until #841 finish |
Ping @jerqi |
#841 has merged. You can rename the config option and add the migration guide document. |
I will submit a new PR to do this. This is not related with this PR. |
You would better include this change in this pr. It's not big change. |
OK. I will do. But I still think any modifications that are not related will need to be done in the different pr. |
It's the related changes, isn't it? The feature and its configuration option is related. |
No. This PR only introduces the conf of But the renaming is for the key of |
8c155da
to
a04bb71
Compare
Done @jerqi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @zuston
What changes were proposed in this pull request?
Why are the changes needed?
The PR of #396 has introduced the concurrent HDFS writing for one partition, but the concurrency is determined by the server client. In order to increase flexibility, this PR supports specifying per-partition's max concurrency to write in client side
Does this PR introduce any user-facing change?
Yes. The client conf of
<client_type>.rss.client.max.concurrency.per-partition.write
andrss.server.client.max.concurrency.limit.per-partition.write
are introduced.How was this patch tested?