[#1086] [Doc] Simplify the Gluten code and add the doc #1322
Maybe we should import all the related commits in Dynamic shuffle server assignment, or just remove
@advancedxy @xianjingfeng PTAL
branch 0.8 has reverted 0da1dd1. Could you rebase this branch?
3b9d145 to 526d8e8
@@ -162,9 +162,10 @@ public WriteBufferManager(
   }

   /** add serialized columnar data directly when integrate with gluten */
-  public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData) {
+  public List<ShuffleBlockInfo> addPartitionData(
+      int partitionId, byte[] serializedData, int serializedDataLength) {
Gluten currently reuses the serializedData byte array, so serializedDataLength may be shorter than the length of serializedData. That is why this changes the signature of the addPartitionData method.
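To make the reuse problem concrete, here is a minimal sketch, assuming a caller that fills only a prefix of a long-lived buffer. The class ReusedBufferSketch and the method copyValidBytes are hypothetical names for illustration and are not part of Uniffle or Gluten.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Uniffle's or Gluten's actual code.
final class ReusedBufferSketch {

    /** Copies only the first validLength bytes of a possibly larger, reused buffer. */
    static byte[] copyValidBytes(byte[] reusedBuffer, int validLength) {
        if (validLength < 0 || validLength > reusedBuffer.length) {
            throw new IllegalArgumentException("validLength out of range: " + validLength);
        }
        // Bytes past validLength may be stale data from an earlier batch, so drop them.
        return Arrays.copyOf(reusedBuffer, validLength);
    }

    public static void main(String[] args) {
        byte[] reused = new byte[64]; // allocated once, refilled for every batch
        reused[0] = 1;
        reused[1] = 2;
        reused[2] = 3;
        byte[] block = copyValidBytes(reused, 3);
        System.out.println("array length = " + reused.length + ", valid bytes = " + block.length);
    }
}
```

If the callee relied on the array's own length instead, it would treat the stale tail of the buffer as part of the block.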
There's already another method
public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength, long start) {
I don't think we should change this signature. The caller could simply call the longer one, or we should add a new method:
public List<ShuffleBlockInfo> addPartitionData(int partitionId, byte[] serializedData, int serializedDataLength)
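The overload suggestion can be sketched as a delegation chain in which the existing two-argument form stays untouched. This is only an illustration: OverloadSketch is a hypothetical class, List<Integer> stands in for List<ShuffleBlockInfo>, and treating start as a timestamp taken at call time is an assumption, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the buffer manager. Block metadata is reduced to
// the number of bytes accepted so that the example stays self-contained.
final class OverloadSketch {

    /** Existing two-argument form, left unchanged: the whole array is valid. */
    List<Integer> addPartitionData(int partitionId, byte[] serializedData) {
        return addPartitionData(partitionId, serializedData, serializedData.length);
    }

    /** New overload for callers that reuse a buffer larger than the payload. */
    List<Integer> addPartitionData(
            int partitionId, byte[] serializedData, int serializedDataLength) {
        // Assumption: `start` is a timestamp captured when the call begins.
        return addPartitionData(
                partitionId, serializedData, serializedDataLength, System.currentTimeMillis());
    }

    /** Longest form; the only one that does any work in this sketch. */
    List<Integer> addPartitionData(
            int partitionId, byte[] serializedData, int serializedDataLength, long start) {
        if (serializedDataLength < 0 || serializedDataLength > serializedData.length) {
            throw new IllegalArgumentException("length out of range: " + serializedDataLength);
        }
        List<Integer> accepted = new ArrayList<>();
        accepted.add(serializedDataLength);
        return accepted;
    }
}
```

With this shape, existing callers of the two-argument method compile unchanged, and a caller that reuses its buffer picks the three-argument form.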
OK, I'll remove this and change the title.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@             Coverage Diff              @@
##           branch-0.8    #1322      +/-   ##
================================================
+ Coverage       53.86%   54.89%   +1.02%
  Complexity       2601     2601
================================================
  Files             391      371      -20
  Lines           22445    20084    -2361
  Branches         1879     1879
================================================
- Hits            12090    11025    -1065
+ Misses           9645     8420    -1225
+ Partials          710      639      -71
☔ View full report in Codecov by Sentry.
2b01113 to d4751b0
This PR will be cherry-picked into master, right?
@@ -86,7 +86,7 @@ public class RssShuffleWriter<K, V, C> extends ShuffleWriter<K, V> {
   private final Map<Integer, List<ShuffleServerInfo>> partitionToServers;
   private final Set<ShuffleServerInfo> shuffleServersForData;
   private final long[] partitionLengths;
-  private final boolean isMemoryShuffleEnabled;
+  protected final boolean isMemoryShuffleEnabled;
Rather than making this field protected, I would prefer to expose a method as follows:
protected boolean isMemoryShuffleEnabled() {
  return isMemoryShuffleEnabled;
}
Subclasses can then call it wherever they need the value.
A protected field and a protected method are accessible from the same scope, and a final field cannot be changed outside the constructor.
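For comparison, the accessor variant discussed above might look like the sketch below. BaseWriterSketch, ColumnarWriterSketch and storageHint are hypothetical names, not classes from Uniffle or Gluten. Both variants give a subclass read-only access to a final value; the accessor simply keeps the field itself private.

```java
// Hypothetical base class standing in for the shuffle writer.
class BaseWriterSketch {
    // The field stays private and final; it is assigned once in the constructor.
    private final boolean isMemoryShuffleEnabled;

    BaseWriterSketch(boolean isMemoryShuffleEnabled) {
        this.isMemoryShuffleEnabled = isMemoryShuffleEnabled;
    }

    // Subclasses read the flag through this accessor instead of the field.
    protected boolean isMemoryShuffleEnabled() {
        return isMemoryShuffleEnabled;
    }
}

// Hypothetical subclass, such as a columnar writer living in another project.
class ColumnarWriterSketch extends BaseWriterSketch {
    ColumnarWriterSketch(boolean isMemoryShuffleEnabled) {
        super(isMemoryShuffleEnabled);
    }

    String storageHint() {
        return isMemoryShuffleEnabled() ? "memory" : "no-memory";
    }
}
```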
   assertEquals(1, spyManager.getBuffers().size());
   assertEquals(0, shuffleBlockInfos.size());
-  shuffleBlockInfos = spyManager.addPartitionData(0, new byte[64]);
+  shuffleBlockInfos = spyManager.addPartitionData(0, new byte[64], 64);
Could you also add a case where serializedDataLength is less than the length of the byte array?
Ok
I am publishing 0.8.0-rc3. Is this PR necessary? @summaryzb
@xianjingfeng Yes, thanks
@advancedxy PTAL
LGTM
@summaryzb Could you cherry-pick this PR into master?
What changes were proposed in this pull request?
Why are the changes needed?
To integrate with Gluten before Gluten and Uniffle are released.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Manually tested, since neither project has been released yet. After the releases, I'll add tests to both projects.