[#908] feat(tez): Write byte array shuffle data to MapOutput #909

Merged on May 31, 2023 (6 commits)

Collaborator

@lifeSo lifeSo commented May 28, 2023

What changes were proposed in this pull request?

Write byte array shuffle data to MapOutput

Why are the changes needed?

Fix: #908

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

}

@VisibleForTesting
protected static byte[] calcChecksum(final byte[] buffer) throws IOException {
Contributor

@jerqi jerqi May 28, 2023

Could we reuse org.apache.uniffle.common.util.ChecksumUtils#getCrc32?

Collaborator Author

OK, we were not aware of this method. It works for us, and the code now uses it.
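For readers following the thread, here is a minimal sketch of what a shared CRC32 helper of this kind might look like. It is a stand-in built on the JDK's `java.util.zip.CRC32`; the class name `Crc32Sketch` is hypothetical, and the thread does not show the actual body of `ChecksumUtils#getCrc32`.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-in for a shared checksum helper; not the Uniffle source.
final class Crc32Sketch {
  private Crc32Sketch() {}

  // Returns the CRC32 of the whole buffer as an unsigned 32-bit value held in a long.
  static long getCrc32(byte[] buffer) {
    CRC32 crc = new CRC32();
    crc.update(buffer, 0, buffer.length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    // "123456789" is the standard CRC-32 check input; prints crc32 = 0xCBF43926
    System.out.printf("crc32 = 0x%08X%n", getCrc32(data));
  }
}
```

Centralizing the computation in one helper means every caller hashes the same byte range in the same way, which is the point of the reuse suggestion above.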


@VisibleForTesting
protected static byte[] calcChecksum(final byte[] buffer) throws IOException {
DataChecksum sum = DataChecksum.newDataChecksum(DataChecksum.Type.CRC32, Integer.MAX_VALUE);
Contributor

@jerqi jerqi May 28, 2023

Why does this method need protected access? If the goal is to make it testable, we can put the test code and the source code in the same package and use default (package-private) access instead.

Collaborator Author

OK, we removed this method and now use org.apache.uniffle.common.util.ChecksumUtils#getCrc32.
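As a side note on the access-modifier question in this thread, the sketch below shows how a method with default (package-private) access can be exercised by a test class in the same package. Both class names and the `sum` method are hypothetical and only illustrate the Java visibility rule; they are not taken from the PR.

```java
// Hypothetical classes for illustration; both are assumed to live in the same package.
final class ChecksumHolder {
  private ChecksumHolder() {}

  // No access modifier: package-private, so only code in the same package can call it.
  static int sum(byte[] buffer) {
    int total = 0;
    for (byte b : buffer) {
      total += b & 0xFF; // treat each byte as unsigned
    }
    return total;
  }
}

final class ChecksumHolderTest {
  public static void main(String[] args) {
    // 1 + 2 + 255 = 258
    int actual = ChecksumHolder.sum(new byte[] {1, 2, (byte) 0xFF});
    if (actual != 258) {
      throw new AssertionError("unexpected sum: " + actual);
    }
    System.out.println("ok");
  }
}
```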

}


public static void write(final FetchedInput mapOutput, byte[] buffer) throws IOException {
Contributor

Could we add a test for this method?

private static final Logger LOG = LoggerFactory.getLogger(RssTezBypassWriter.class);
private static final byte[] HEADER = new byte[] { (byte) 'T', (byte) 'I', (byte) 'F', (byte) 0};

public static void write(MapOutput mapOutput, byte[] buffer) {
Contributor

Could we add a test for this method?

Collaborator Author

We added some tests. Are they enough?

Contributor

OK.

} else if (mapOutput.getType() == MapOutput.Type.DISK) {
// RSS leverages its own compression, it is incompatible with hadoop's disk file compression.
// So we should disable this situation.
throw new IllegalStateException("RSS does not support OnDiskMapOutput as shuffle output,"
Contributor

Could we throw RssException?

Collaborator Author

OK, we modified it.

Contributor

It seems this exception has not been changed yet.

Collaborator Author

It seems we missed it. It is modified now.
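The change discussed in this thread can be sketched roughly as follows. `SketchRssException`, `OutputType`, and `checkSupported` are hypothetical stand-ins so the example compiles on its own; the real `RssException` and `MapOutput.Type` come from the Uniffle and Hadoop code bases and are not reproduced here.

```java
// Stand-in for illustration only; assumed to behave like an unchecked, project-specific exception.
class SketchRssException extends RuntimeException {
  SketchRssException(String message) {
    super(message);
  }
}

// Hypothetical mirror of the output kinds mentioned in the diff.
enum OutputType { MEMORY, DISK }

final class BypassWriterSketch {
  private BypassWriterSketch() {}

  // Rejects on-disk output with the project-specific exception instead of IllegalStateException.
  static void checkSupported(OutputType type) {
    if (type == OutputType.DISK) {
      throw new SketchRssException("RSS does not support on-disk map output as shuffle output");
    }
  }
}
```

Using one project-specific exception type lets callers catch RSS failures in a single place rather than matching on generic JDK exceptions.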

@jerqi jerqi changed the title from "[#908] feat(tez): Write byte array shuffle data to MapoutPut" to "[#908] feat(tez): Write byte array shuffle data to MapOutput" on May 29, 2023
OutputStream output = ((DiskFetchedInput) mapOutput).getOutputStream();
output.write(HEADER);
output.write(buffer);
output.write(Ints.toByteArray((int)ChecksumUtils.getCrc32(buffer)));
Contributor

Hmm, this seems different from the original implementation. Why do we write an int instead of a long?

Collaborator Author

Tez reads only 4 bytes for the checksum. If we write 8 bytes, checksum verification fails on read.

Contributor

OK.
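The int-versus-long point in this thread can be illustrated with a small round trip. This is a sketch under assumptions: it uses `java.nio.ByteBuffer` in place of Guava's `Ints.toByteArray` (both produce 4 big-endian bytes), the `FramingSketch` class and its `verify` reader are hypothetical, and the reader only models the "4 trailing checksum bytes" behavior described above, not Tez's actual reading code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

final class FramingSketch {
  // Same 4-byte header as in the quoted diff.
  static final byte[] HEADER = {(byte) 'T', (byte) 'I', (byte) 'F', (byte) 0};

  private FramingSketch() {}

  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  // Lays out HEADER, payload, then the low 32 bits of the CRC as 4 big-endian bytes.
  static byte[] frame(byte[] payload) {
    return ByteBuffer.allocate(HEADER.length + payload.length + 4)
        .put(HEADER)
        .put(payload)
        .putInt((int) crc32(payload))
        .array();
  }

  // Hypothetical reader that treats exactly the last 4 bytes as the checksum.
  static boolean verify(byte[] framed) {
    int payloadEnd = framed.length - 4;
    byte[] payload = Arrays.copyOfRange(framed, HEADER.length, payloadEnd);
    int stored = ByteBuffer.wrap(framed, payloadEnd, 4).getInt();
    return stored == (int) crc32(payload);
  }

  public static void main(String[] args) {
    byte[] payload = "shuffle-bytes".getBytes(StandardCharsets.UTF_8);
    System.out.println(verify(frame(payload))); // prints true
  }
}
```

Casting the `long` CRC to `int` loses nothing here, because a CRC32 value fits in 32 bits; the cast only fixes the on-wire width at 4 bytes.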

Contributor

jerqi commented May 30, 2023

Could you rebase or merge master branch?

Contributor

jerqi commented May 31, 2023

cc @zhengchenyu

Contributor

jerqi commented May 31, 2023

@lifeSo This comment isn't addressed. #909 (comment)

Contributor

@jerqi jerqi left a comment

LGTM, thanks @lifeSo. Waiting for CI.

@codecov-commenter

Codecov Report

Merging #909 (26e173f) into master (1189580) will increase coverage by 0.95%.
The diff coverage is 20.00%.

@@             Coverage Diff              @@
##             master     #909      +/-   ##
============================================
+ Coverage     55.23%   56.19%   +0.95%     
+ Complexity     2200     2069     -131     
============================================
  Files           333      297      -36     
  Lines         16451    13154    -3297     
  Branches       1308     1232      -76     
============================================
- Hits           9087     7392    -1695     
+ Misses         6851     5352    -1499     
+ Partials        513      410     -103     
Impacted Files Coverage Δ
...mon/shuffle/orderedgrouped/RssTezBypassWriter.java 20.00% <20.00%> (ø)

... and 38 files with indirect coverage changes

@jerqi jerqi linked an issue May 31, 2023 that may be closed by this pull request
3 tasks
@jerqi jerqi merged commit b4e109e into apache:master May 31, 2023
27 checks passed
Development

Successfully merging this pull request may close these issues.

[Subtask] Write byte array shuffle data to MapOutput
3 participants