
[#615] improvement: Reduce task binary by removing 'partitionToServers' from RssShuffleHandle #637

Merged
merged 11 commits into apache:master on Mar 14, 2023

Conversation

jiafuzha
Contributor

@jiafuzha jiafuzha commented Feb 21, 2023

What changes were proposed in this pull request?

Move the partition -> shuffle servers mapping from a direct field of RssShuffleHandle to a broadcast variable, reducing the task binary size.

Why are the changes needed?

To reduce task delay and task serialization/deserialization time by reducing the task binary size.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  1. Tested with a 10,000-partition shuffle. Task binary size dropped from more than 670 KB to less than 6 KB.
  2. Tested with multiple shuffle stages in the same job to verify the ShuffleHandleInfo cache logic.
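The size claim above can be illustrated with a plain-JDK sketch (not Uniffle code; the map contents, 10,000 partitions with two hypothetical servers each, are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

public class TaskBinarySizeSketch {
    // Serialize an object with plain JDK serialization and return its byte count.
    static int serializedSize(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Build a stand-in partition -> servers map and measure its serialized size.
    static int measure(int partitions) {
        HashMap<Integer, ArrayList<String>> partitionToServers = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            partitionToServers.put(
                p, new ArrayList<>(Arrays.asList("server-1:19999", "server-2:19999")));
        }
        return serializedSize(partitionToServers);
    }

    public static void main(String[] args) {
        // Embedded in RssShuffleHandle, this payload rides along in every serialized
        // task binary; moved into a broadcast variable, it is shipped to each
        // executor only once.
        System.out.println("10000-partition map serializes to " + measure(10000) + " bytes");
    }
}
```

Even with JDK serialization deduplicating the repeated server strings, the map alone serializes to hundreds of kilobytes in this sketch, which is the per-task cost the PR moves into a one-time broadcast.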

…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
@codecov-commenter

codecov-commenter commented Feb 21, 2023

Codecov Report

Merging #637 (45e62a5) into master (5882e10) will increase coverage by 2.14%.
The diff coverage is 12.50%.

@@             Coverage Diff              @@
##             master     #637      +/-   ##
============================================
+ Coverage     60.63%   62.78%   +2.14%     
  Complexity     1845     1845              
============================================
  Files           228      215      -13     
  Lines         12702    10753    -1949     
  Branches       1062     1062              
============================================
- Hits           7702     6751     -951     
+ Misses         4592     3655     -937     
+ Partials        408      347      -61     
Impacted Files Coverage Δ
...ava/org/apache/spark/shuffle/RssShuffleHandle.java 0.00% <0.00%> (ø)
...va/org/apache/spark/shuffle/ShuffleHandleInfo.java 0.00% <0.00%> (ø)
...org/apache/spark/shuffle/RssSparkShuffleUtils.java 68.26% <50.00%> (-1.12%) ⬇️

... and 15 files with indirect coverage changes


…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
@advancedxy
Contributor

PTAL @xianjingfeng first. I will review it later tonight.

* Class for holding partition ID -> shuffle servers mapping.
* It's to be broadcast to executors and referenced by shuffle tasks.
*/
public class PartitionShuffleServerMap {
Member

This class may store other information, such as RemoteStorageInfo. I think we should use a better name. ShuffleHandleInfo? I hope you have a better idea.

Contributor Author

Sounds good.

public int getShuffleId() {
return shuffleId();
}

public Set<ShuffleServerInfo> getShuffleServersForData() {
Member

ditto

@xianjingfeng
Member

The basic logic looks OK to me. You can deal with code style and CI first. @zjf2012

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import scala.reflect.ClassTag;
Contributor

The import group order doesn't seem right.

Let's try the Spark way:

import java

import scala

import third_party

import uniffle
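Applied to the snippet above, that grouping would look roughly like this (a sketch; only the slf4j and `scala.reflect.ClassTag` lines appear in the quoted diff, and the other package paths shown are assumptions):

```java
// java
import java.util.Map;

// scala
import scala.reflect.ClassTag;

// third_party
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// uniffle
import org.apache.uniffle.common.ShuffleServerInfo;
```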

Contributor Author

Let me adjust it accordingly.

* @param partitionToServers
* @return Broadcast variable registered for auto cleanup
*/
public static Broadcast<PartitionShuffleServerMap> createPartShuffleServerMap(
Contributor

I'd prefer to pass SparkContext as a parameter here. It's much clearer, and we should pass SparkContext anyway if we are going to add a unit test for this method.

Contributor Author

I agree.

@advancedxy
Contributor

The code LGTM overall. Some minor questions:

How was this patch tested?
tested with 10000 partitions shuffle

Do you have some data about the serialized task size before and after? Is there any slowdown when using broadcast, and how many shuffle servers are in your environment?

And just to be safe, how much space does the broadcast occupy in your 10,000-partition test? Just to make sure it doesn't bring too much memory pressure to the driver.

@jiafuzha
Contributor Author

The code LGTM overall. Some minor questions:

How was this patch tested?
tested with 10000 partitions shuffle

Do you have some data about the serialized task size before and after? Is there any slowdown when using broadcast, and how many shuffle servers are in your environment?

And just to be safe, how much space does the broadcast occupy in your 10,000-partition test? Just to make sure it doesn't bring too much memory pressure to the driver.

For now, I don't have more servers, so I only used two shuffle servers. Before my optimization, both map tasks and reduce tasks had a binary size of more than 670 KB. After the optimization, it drops to less than 6 KB. It's a dramatic reduction.

Broadcast uses a BitTorrent-like mechanism to distribute the variable to each executor once: executors can fetch chunks of the broadcast variable from other executors instead of fetching everything from the driver. Task serialization/deserialization time also drops a lot. So, in theory, it should not slow the job down.

The size of the broadcast should be less than 670 KB, deduced from the statement above. I'll try to capture it today.

@jerqi
Contributor

jerqi commented Feb 22, 2023

Will the broadcast be cleaned when the shuffle is removed?

@jiafuzha
Contributor Author

Will the broadcast be cleaned when the shuffle is removed?

Yes. It's registered for cleanup when SparkContext creates a new broadcast variable. However, the cleaner may not clean it immediately, similar to the unregisterShuffle method below: the cleaner only gets a chance to collect these objects (shuffles, broadcasts, ...) after a GC has happened. This behavior is fine for this broadcast variable since it's small anyway. But will it cause a problem in the code snippet below? The shuffle servers may not get cleaned up quickly if there is no timely GC in the driver.

```java
@Override
public boolean unregisterShuffle(int shuffleId) {
  try {
    if (SparkEnv.get().executorId().equals("driver")) {
      shuffleWriteClient.unregisterShuffle(appId, shuffleId);
    }
  } catch (Exception e) {
    LOG.warn("Errors on unregister to remote shuffle-servers", e);
  }
  return true;
}
```
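The GC-dependent timing described here (Spark's cleaner only notices dead references after the driver actually runs a GC) can be sketched with plain JDK weak references (a conceptual sketch, not Spark's ContextCleaner):

```java
import java.lang.ref.WeakReference;

public class GcDrivenCleanupSketch {
    // Returns true once the weakly referenced object has been collected,
    // i.e. once a GC has actually run; before that, no cleanup can happen.
    static boolean waitForCollection() {
        Object broadcastHandle = new Object(); // stands in for a Broadcast variable
        WeakReference<Object> ref = new WeakReference<>(broadcastHandle);

        broadcastHandle = null; // last strong reference dropped, like an unreferenced handle
        try {
            for (int i = 0; i < 50 && ref.get() != null; i++) {
                System.gc();      // advisory: cleanup waits until a GC really happens
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("collected after GC: " + waitForCollection());
    }
}
```

This is why unregisterShuffle in the driver may lag: the reference can be dead long before any GC runs to surface it to the cleaner.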

@jerqi
Contributor

jerqi commented Feb 22, 2023

Will the broadcast be cleaned when the shuffle is removed?

Yes. It's registered for cleanup when SparkContext creates a new broadcast variable. However, the cleaner may not clean it immediately, similar to the unregisterShuffle method below: the cleaner only gets a chance to collect these objects (shuffles, broadcasts, ...) after a GC has happened. This behavior is fine for this broadcast variable since it's small anyway. But will it cause a problem in the code snippet below? The shuffle servers may not get cleaned up quickly if there is no timely GC in the driver.

```java
@Override
public boolean unregisterShuffle(int shuffleId) {
  try {
    if (SparkEnv.get().executorId().equals("driver")) {
      shuffleWriteClient.unregisterShuffle(appId, shuffleId);
    }
  } catch (Exception e) {
    LOG.warn("Errors on unregister to remote shuffle-servers", e);
  }
  return true;
}
```

Ok

@jiafuzha
Contributor Author

Identified two problems:

  • The size of the newly created broadcast variable is larger than the original task binary.
  • The broadcast ShuffleHandleInfo is not registered with the Kryo serializer, which means all jobs will fail if someone sets "spark.kryo.registrationRequired" to true.

I need to serialize ShuffleHandleInfo to binary with Spark's closure serializer before Kryo-serializing and broadcasting it. Working on it.
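The plan described here (pre-serialize the object so that only a byte[], which Kryo always accepts, crosses the broadcast) can be sketched with JDK serialization standing in for Spark's closure serializer; the class and strings below are illustrative only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class BroadcastBytesSketch {
    // Driver side: pre-serialize the object so only a byte[] needs broadcasting.
    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(obj);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Executor side: deserialize the broadcast bytes back into the object once, then cache it.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A String stands in for ShuffleHandleInfo; the broadcast layer only ever
        // sees byte[], which is safe even under spark.kryo.registrationRequired=true.
        String handleInfo = "stand-in for ShuffleHandleInfo";
        byte[] wire = toBytes(handleInfo);
        System.out.println(fromBytes(wire).equals(handleInfo));
    }
}
```

The trade-off debated later in this thread is exactly this extra layer: the byte[] indirection avoids the Kryo registration failure but adds deserialize-and-cache logic on the executor side.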

…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
@jiafuzha
Contributor Author

And just to be safe, how much space does the broadcast occupy in your 10,000-partition test? Just to make sure it doesn't bring too much memory pressure to the driver.

Before my patch, the "task + partition -> shuffle servers" size in binary is about 83.6 KiB. The deserialized size is about 659.8 KiB.

2023-02-22 13:36:13,581 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 83.6 KiB, free 23.4 GiB)
2023-02-22 13:36:13,582 INFO broadcast.TorrentBroadcast: Reading broadcast variable 2 took 4 ms
2023-02-22 13:36:13,583 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 659.8 KiB, free 23.4 GiB)

With my patch, the broadcast variable's size in binary is about 80.6 KiB (< 83.6 KiB). The deserialized size is about 655.0 KiB (< 659.8 KiB). Both are as expected.

2023-02-23 12:43:20,015 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 80.6 KiB, free 23.4 GiB)
2023-02-23 12:43:20,015 INFO broadcast.TorrentBroadcast: Reading broadcast variable 1 took 4 ms
2023-02-23 12:43:20,017 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 655.0 KiB, free 23.4 GiB)

@jiafuzha jiafuzha requested review from advancedxy and xianjingfeng and removed request for advancedxy and xianjingfeng February 23, 2023 08:23
@jiafuzha
Contributor Author

Please see how the task deserialization time vanishes with the patch in https://docs.google.com/document/d/1TZ-3Mgwj9j7n1mMyCrS3sskFv_uOtUtXl1oF9Y_oMOw/edit?usp=sharing.

@jerqi
Contributor

jerqi commented Feb 24, 2023

…m RssShuffleHandle

What changes were proposed in this pull request?

Move the partition -> shuffle servers mapping from a direct field of RssShuffleHandle to a broadcast variable, reducing the task binary size.

Why are the changes needed?

To reduce task delay and task serialization/deserialization time by reducing the task binary size.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  1. Tested with a 10,000-partition shuffle. Task binary size dropped from more than 670 KB to less than 6 KB.
  2. Tested with multiple shuffle stages in the same job to verify the ShuffleHandleInfo cache logic.

Could you modify the title and description? It seems that the title is too long; some words of the title were appended to the description.

…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
@jerqi
Contributor

jerqi commented Mar 4, 2023

@advancedxy @xianjingfeng Do you have another suggestion?

xianjingfeng
xianjingfeng previously approved these changes Mar 5, 2023
Member

@xianjingfeng xianjingfeng left a comment

LGTM

@advancedxy
Contributor

@advancedxy @xianjingfeng Do you have another suggestion?

I'm not sure; this PR introduces some quite complex logic to broadcast the shuffle handle info.

If I were implementing this feature, I would just use Kryo by default, and have RssShuffleManager instruct users to either turn off spark.kryo.registrationRequired (when it is explicitly set by the user) or manually register RssShuffleHandle.
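For reference, the two Spark settings involved in this alternative are ordinary configs; the fully qualified class name below is inferred from the coverage report above and should be treated as an assumption:

```
spark.kryo.registrationRequired=true
spark.kryo.classesToRegister=org.apache.spark.shuffle.RssShuffleHandle
```

With the class registered, the handle object could be broadcast directly even under strict registration, at the cost of shipping the registration requirement to users.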

@jiafuzha
Contributor Author

jiafuzha commented Mar 6, 2023

@advancedxy @xianjingfeng Do you have another suggestion?

I'm not sure; this PR introduces some quite complex logic to broadcast the shuffle handle info.

If I were implementing this feature, I would just use Kryo by default, and have RssShuffleManager instruct users to either turn off spark.kryo.registrationRequired (when it is explicitly set by the user) or manually register RssShuffleHandle.

Just to make sure we are on the same page: only registering RssShuffleHandle with the Kryo serializer doesn't resolve this issue. Each task will still carry more than 670 KB of binary for a 10,000-partition job, and each task will spend quite noticeable time repeatedly deserializing the partition -> shuffle server mappings, as shown in https://docs.google.com/document/d/1TZ-3Mgwj9j7n1mMyCrS3sskFv_uOtUtXl1oF9Y_oMOw/edit?usp=sharing.

RssShuffleHandle is a field of ShuffleDependency, which is serialized into the task binary by the code below in DAGScheduler. Without broadcasting ShuffleHandleInfo, it's hard to pull it out and avoid repeating ShuffleHandleInfo.

```scala
RDDCheckpointData.synchronized {
  taskBinaryBytes = stage match {
    case stage: ShuffleMapStage =>
      JavaUtils.bufferToArray(
        closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef))
    case stage: ResultStage =>
      JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, stage.func): AnyRef))
  }

  partitions = stage.rdd.partitions
}
```

I think broadcast itself is quite simple and efficient. I don't see any better alternative.

@advancedxy
Contributor

Just to make sure we are on the same page: only registering RssShuffleHandle with the Kryo serializer doesn't resolve this issue. [...]

I think broadcast itself is quite simple and efficient. I don't see any better alternative.

Sorry, I wasn't clear enough. I agree broadcast is the best way to go. I'm just not sure whether we should broadcast the serialized binary rather than simply broadcast the shuffle handle info object directly. The only problem with broadcasting the shuffle handle info directly is that once spark.kryo.registrationRequired (which is false by default) is set to true, the job will fail. To avoid that problem, we introduce quite a lot of complex logic: deserialization, caching, weak references, etc. I'm wondering whether that's worth it.

@jiafuzha
Contributor Author

jiafuzha commented Mar 6, 2023

Sorry, I wasn't clear enough. I agree broadcast is the best way to go. I'm just not sure whether we should broadcast the serialized binary rather than simply broadcast the shuffle handle info object directly. The only problem with broadcasting the shuffle handle info directly is that once spark.kryo.registrationRequired (which is false by default) is set to true, the job will fail. To avoid that problem, we introduce quite a lot of complex logic: deserialization, caching, weak references, etc. I'm wondering whether that's worth it.

No worries. You guys can decide which option is better. I can revert it if you like.

@advancedxy
Contributor

No worries. You guys can decide which option is better. I can revert it if you like.

Thanks for your understanding. Let's take a quick vote here. cc @zuston @jerqi @xianjingfeng @kaijchen

  1. If you are going to vote for the current implementation, please react with 🚀.
  2. If you are going to vote for a simpler implementation, a.k.a. broadcasting the shuffle handle info directly, please react with 🎉.

// shuffle ID to ShuffleIdRef
// ShuffleIdRef acts as strong reference to prevent cached ShuffleHandleInfo being GCed during shuffle
// ShuffleIdRef will be removed when unregisterShuffle()
private static Map<Integer, ShuffleIdRef> _globalShuffleIdRefMap = new ConcurrentHashMap<>();
Contributor

It seems that the leading underscore does not conform to Java naming conventions. Right?

Contributor Author

Yes, it's just a convention to denote special variables. If it violates the Uniffle style, I can change it.

Contributor Author

done

// ShuffleIdRef will be removed when unregisterShuffle()
private static Map<Integer, ShuffleIdRef> _globalShuffleIdRefMap = new ConcurrentHashMap<>();
// each shuffle has unique ID even for multiple concurrent running shuffles and jobs per application
private static ThreadLocal<HandleInfoLocalCache> _localHandleInfoCache =
Contributor

ditto.

Contributor Author

done

…m RssShuffleHandle

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
@advancedxy
Contributor

Hi @zjf2012, it seems that the majority of us think it's better to simply broadcast the RssShuffleHandle object here, which requires spark.kryo.registrationRequired=false. Would you mind changing it back?

@jiafuzha
Contributor Author

jiafuzha commented Mar 9, 2023

Hi @zjf2012, it seems that the majority of us think it's better to simply broadcast the RssShuffleHandle object here, which requires spark.kryo.registrationRequired=false. Would you mind changing it back?

Sure, I'll change it back tomorrow or next week.

…m RssShuffleHandle

revert to ShuffleHandleInfo without ser/deser

Signed-off-by: jiafu zhang <jiafu.zhang@intel.com>
Contributor

@smallzhongfeng smallzhongfeng left a comment

LGTM! Thanks @zjf2012

Contributor

@advancedxy advancedxy left a comment

LGTM, except one minor comment.

@advancedxy
Contributor

Thanks @zjf2012. I noticed there are no added UTs here. Do you think it's possible to add new UTs, or are the existing UTs sufficient?

@jiafuzha
Contributor Author

Thanks @zjf2012. I noticed there are no added UTs here. Do you think it's possible to add new UTs, or are the existing UTs sufficient?

You are welcome. I think the current UTs are good enough since my changes only wrap up some existing pieces; no complex logic is involved. If I added UTs, they would mostly be mockups for the broadcast, with little actual business logic.

Member

@xianjingfeng xianjingfeng left a comment

LGTM

@xianjingfeng xianjingfeng merged commit 0a2cec9 into apache:master Mar 14, 2023
@xianjingfeng
Member

Thanks @zjf2012

advancedxy pushed a commit to advancedxy/incubator-uniffle that referenced this pull request Mar 21, 2023
…Servers' from RssShuffleHandle (apache#637)

### What changes were proposed in this pull request?
Move the partition -> shuffle servers mapping from a direct field of RssShuffleHandle to a broadcast variable, reducing the task binary size.

### Why are the changes needed?
To reduce task delay and task serialization/deserialization time by reducing the task binary size.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Tested with a 10,000-partition shuffle. Task binary size dropped from more than 670 KB to less than 6 KB.
Tested with multiple shuffle stages in the same job to verify the ShuffleHandleInfo cache logic.
xianjingfeng pushed a commit to xianjingfeng/incubator-uniffle that referenced this pull request Apr 5, 2023
…Servers' from RssShuffleHandle (apache#637)

@zuston
Member

zuston commented Aug 15, 2023

Will the broadcast be cleaned when the shuffle is removed?

Yes. It's registered for cleanup when SparkContext creates a new broadcast variable. However, the cleaner may not clean it immediately, similar to the unregisterShuffle method below: the cleaner only gets a chance to collect these objects (shuffles, broadcasts, ...) after a GC has happened. This behavior is fine for this broadcast variable since it's small anyway. But will it cause a problem in the code snippet below? The shuffle servers may not get cleaned up quickly if there is no timely GC in the driver.

```java
@Override
public boolean unregisterShuffle(int shuffleId) {
  try {
    if (SparkEnv.get().executorId().equals("driver")) {
      shuffleWriteClient.unregisterShuffle(appId, shuffleId);
    }
  } catch (Exception e) {
    LOG.warn("Errors on unregister to remote shuffle-servers", e);
  }
  return true;
}
```

I don't see any logic that destroys the broadcast variable explicitly. Do we need to destroy it in the unregisterShuffle method?


Successfully merging this pull request may close these issues.

[Improvement] Reduce task binary by removing 'partitionToServers' from RssShuffleHandle
7 participants