Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#1086] [Doc] Simplify the Gluten code and add the doc #1322

Merged
merged 5 commits into from
Nov 27, 2023

Conversation

summaryzb
Copy link
Contributor

@summaryzb summaryzb commented Nov 17, 2023

What changes were proposed in this pull request?

  1. Simplify the code when integrate with gluten
  2. Add gluten uniffle doc

Why are the changes needed?

Integrate with gluten before gluten and uniffle release

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test since both projects are not released, after release, I'll add some tests in both projects

@summaryzb
Copy link
Contributor Author

@leixm @yl09099 @jerqi
CommitID 0da1dd1164437a4cd in branch-0.8 seems like incomplete, it has make the unit test fail

@summaryzb
Copy link
Contributor Author

Maybe we should import all the related commits in Dynamic shuffle server assignment , or just remove 0da1dd1164437a4cd

@summaryzb
Copy link
Contributor Author

@advancedxy @xianjingfeng PTAL

@jerqi
Copy link
Contributor

jerqi commented Nov 21, 2023

branch 0.8 has reverted 0da1dd1. Could you rebase this branch?

@@ -162,9 +162,10 @@ public WriteBufferManager(
}

/** add serialized columnar data directly when integrate with gluten */
public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData) {
public List<ShuffleBlockInfo> addPartitionData(
int partitionId, byte[] serializedData, int serializedDataLength) {
Copy link
Contributor Author

@summaryzb summaryzb Nov 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently gluten reuse this serializedData byte array, serializedDataLength may be shorter than length of serializedData, so here change the signature of method addPartitionData

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already another method

public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength, long start) {

I don't think we should change this signature. Caller could simply call the longer one, or we should add a new method:

public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, remove this and change the title

@codecov-commenter
Copy link

codecov-commenter commented Nov 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cf25897) 53.86% compared to head (f4d9c14) 54.89%.

Additional details and impacted files
@@               Coverage Diff                @@
##             branch-0.8    #1322      +/-   ##
================================================
+ Coverage         53.86%   54.89%   +1.02%     
  Complexity         2601     2601              
================================================
  Files               391      371      -20     
  Lines             22445    20084    -2361     
  Branches           1879     1879              
================================================
- Hits              12090    11025    -1065     
+ Misses             9645     8420    -1225     
+ Partials            710      639      -71     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@summaryzb summaryzb changed the title [#940] [FollowUp] Simplify the code when integrate with gluten [#940] [FollowUp] Gluten reuse partition byte array Nov 21, 2023
Copy link
Contributor

@advancedxy advancedxy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR will be cherry-picked into master, right?

@@ -86,7 +86,7 @@ public class RssShuffleWriter<K, V, C> extends ShuffleWriter<K, V> {
private final Map<Integer, List<ShuffleServerInfo>> partitionToServers;
private final Set<ShuffleServerInfo> shuffleServersForData;
private final long[] partitionLengths;
private final boolean isMemoryShuffleEnabled;
protected final boolean isMemoryShuffleEnabled;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than make this field protected, I would like to expose a method as follows:

protected boolean isMemoryShuffleEnabled() {
  return isMemoryShuffleEnabled;
}

And access it anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protected field and method can be accessed in the same scope, and the final field can not be changed except in the constructor.

@@ -162,9 +162,10 @@ public WriteBufferManager(
}

/** add serialized columnar data directly when integrate with gluten */
public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData) {
public List<ShuffleBlockInfo> addPartitionData(
int partitionId, byte[] serializedData, int serializedDataLength) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already another method

public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength, long start) {

I don't think we should change this signature. Caller could simply call the longer one, or we should add a new method:

public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength)

assertEquals(1, spyManager.getBuffers().size());
assertEquals(0, shuffleBlockInfos.size());
shuffleBlockInfos = spyManager.addPartitionData(0, new byte[64]);
shuffleBlockInfos = spyManager.addPartitionData(0, new byte[64], 64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also add a case when serializedDataLength is less than byteArray.size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

@xianjingfeng
Copy link
Member

xianjingfeng commented Nov 22, 2023

I am publishing 0.8.0-rc3. Is this pr necessary. @summaryzb

@summaryzb
Copy link
Contributor Author

I am publishing 0.8.0-rc3. Is this pr necessary. @summaryzb

@xianjingfeng Yes, thanks

@summaryzb
Copy link
Contributor Author

partitionId, byte[] serializedData, int serializedDataLength) {

@summaryzb summaryzb closed this Nov 22, 2023
@summaryzb summaryzb reopened this Nov 22, 2023
@summaryzb
Copy link
Contributor Author

This PR will be cherry-picked into master, right?
Yes

@summaryzb summaryzb changed the title [#940] [FollowUp] Gluten reuse partition byte array [#1086] [Doc] Add gluten uniffle doc Nov 22, 2023
@summaryzb
Copy link
Contributor Author

@advancedxy PTAL

Copy link
Member

@xianjingfeng xianjingfeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jerqi jerqi changed the title [#1086] [Doc] Add gluten uniffle doc [#1086] [Doc] Simplify the Gluten code and add the doc Nov 25, 2023
@xianjingfeng xianjingfeng merged commit aa25cfa into apache:branch-0.8 Nov 27, 2023
32 checks passed
@xianjingfeng
Copy link
Member

@summaryzb Could you cherry-pick this pr into master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants