[Improvement][AQE] Support getting memory data skip by upstream task ids #358
Conversation
I have proposed a draft implementation. @jerqi @xianjingfeng, if you have time, please take a look.
client-spark/spark3/src/main/java/org/apache/spark/shuffle/reader/RssShuffleReader.java
Should we give this PR a performance test?
Codecov Report
@@ Coverage Diff @@
## master #358 +/- ##
============================================
+ Coverage 58.01% 58.56% +0.55%
- Complexity 1361 1586 +225
============================================
Files 171 193 +22
Lines 9006 10881 +1875
Branches 787 953 +166
============================================
+ Hits 5225 6373 +1148
- Misses 3449 4132 +683
- Partials 332 376 +44
storage/src/main/java/org/apache/uniffle/storage/request/CreateShuffleReadHandlerRequest.java
A performance test has been attached in the description. It works well.
server/src/main/java/org/apache/uniffle/server/buffer/ShuffleBufferManager.java
@@ -119,6 +120,9 @@ public RssShuffleReader(
    this.partitionToShuffleServers = rssShuffleHandle.getPartitionToServers();
    this.rssConf = rssConf;
    this.dataDistributionType = dataDistributionType;
    // This mechanism of expectedTaskIdsBitmap filter is to filter out the most of data,
    // especially for AQE skew optimization
    this.expectedTaskIdsBitmapFilterEnable = mapEndIndex == Integer.MAX_VALUE ? false : true;
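The review thread below converges on checking both ends of the map-index range rather than only `mapEndIndex`. A minimal sketch of that revised predicate follows; the class and method names are illustrative, not the actual `RssShuffleReader` code:

```java
// Sketch of the map-index-range check discussed in review; names are
// hypothetical, not the real Uniffle fields.
public class TaskIdFilterCheck {
    // AQE skew optimization splits a partition read into a sub-range of map
    // task indexes. A full range [0, Integer.MAX_VALUE) means no split was
    // applied, so the expectedTaskIds bitmap filter can be skipped.
    static boolean expectedTaskIdsBitmapFilterEnable(int startMapIndex, int mapEndIndex) {
        boolean fullRange = startMapIndex == 0 && mapEndIndex == Integer.MAX_VALUE;
        return !fullRange;
    }

    public static void main(String[] args) {
        // Full range: filter disabled.
        if (expectedTaskIdsBitmapFilterEnable(0, Integer.MAX_VALUE)) throw new AssertionError();
        // Skew-split sub-range: filter enabled.
        if (!expectedTaskIdsBitmapFilterEnable(3, 7)) throw new AssertionError();
        // The last-reduce-partition case raised in review: [n, Integer.MAX_VALUE).
        if (!expectedTaskIdsBitmapFilterEnable(5, Integer.MAX_VALUE)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Checking only `mapEndIndex` would wrongly disable the filter for a skew-split range ending at `Integer.MAX_VALUE`, which is the case the reviewer points out next.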
If this is the last reduce partition, may the range of mapId be [n, Integer.MAX_VALUE]?
So do we need to use startMapIndex == 0 and mapEndIndex == Integer.MAX_VALUE to judge?
Yes, I think it's more accurate.
Updated.
@xianjingfeng Do you have another suggestion?
LGTM
LGTM, thanks @zuston @xianjingfeng. I will add @xianjingfeng as this PR's co-author, because this PR is based on PR #294.
cc @bin41215 You may be interested in this PR.
What changes were proposed in this pull request?
Support skipping in-memory shuffle data by upstream task ids when reading.
Why are the changes needed?
In the current codebase, when shuffle-server memory is large and the
job is optimized by the AQE skew rule, multiple readers of the same
partition will fetch shuffle data from the same shuffle server.
To avoid reading unused localfile/HDFS data, PR #137
introduced the LOCAL_ORDER mechanism to filter out most of the data.
But MEMORY storage still suffers from this, so this PR avoids
reading unused data for each reader by filtering with an
expectedTaskIds bitmap.
This optimization is only enabled when the AQE skew optimization is applied.
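The filtering idea described above can be sketched as follows. This is an illustrative example only: the ShuffleBlock record, readMemoryData method, and the use of java.util.BitSet are stand-ins for Uniffle's actual block metadata and bitmap type, chosen to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: when a reader serves only a sub-range of upstream map
// tasks (an AQE skew split), blocks buffered in server memory whose producing
// task id is outside the expected set are skipped instead of shipped.
public class MemoryDataSkip {
    record ShuffleBlock(long taskAttemptId, byte[] data) {}

    static List<ShuffleBlock> readMemoryData(List<ShuffleBlock> buffered, BitSet expectedTaskIds) {
        List<ShuffleBlock> result = new ArrayList<>();
        for (ShuffleBlock block : buffered) {
            // Skip data produced by map tasks this reader is not responsible for.
            if (expectedTaskIds.get((int) block.taskAttemptId())) {
                result.add(block);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ShuffleBlock> buffered = List.of(
            new ShuffleBlock(0, new byte[]{1}),
            new ShuffleBlock(1, new byte[]{2}),
            new ShuffleBlock(2, new byte[]{3}));
        BitSet expected = new BitSet();
        expected.set(0);
        expected.set(2); // this reader covers map tasks 0 and 2 only
        List<ShuffleBlock> got = readMemoryData(buffered, expected);
        if (got.size() != 2) throw new AssertionError();
        System.out.println("filtered " + (buffered.size() - got.size())
            + " of " + buffered.size() + " blocks");
    }
}
```

The point of the design is that the filter runs on the server side, so skipped blocks never cross the network, which is where the benchmark savings below come from.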
Does this PR introduce any user-facing change?
No
How was this patch tested?
Benchmark
Table
Table1: 100g, dtypes: Array[(String, String)] = Array((v1,StringType), (k1,IntegerType)).
All rows have the same k1 value (value = 10).
Table2: 10 records, dtypes: Array[(String, String)] = Array((k2,IntegerType), (v2,StringType)).
It has only one record with k2 = 10.
Env
Spark Resource Profile: 10 executors (1 core, 2g each)
Shuffle-server Environment: 10 shuffle servers, 10g for buffer read and write.
Spark Shuffle Client Config: storage type: MEMORY_LOCALFILE with LOCAL_ORDER
SQL: spark.sql("select * from Table1,Table2 where k1 = k2").write.mode("overwrite").parquet("xxxxxx")
Result
ESS: cost 3min
Uniffle without patch: cost 11.6min (2.1 + 9.5)
Uniffle with patch: cost 3.5min (2.1 + 1.4)