Skip to content

Comments

[SPARK-44993][CORE] Add ShuffleChecksumUtils.compareChecksums by reusing ShuffleChecksumTestHelp.compareChecksums#42707

Closed
dongjoon-hyun wants to merge 1 commit intoapache:masterfrom
dongjoon-hyun:SPARK-44993
Closed

[SPARK-44993][CORE] Add ShuffleChecksumUtils.compareChecksums by reusing ShuffleChecksumTestHelp.compareChecksums#42707
dongjoon-hyun wants to merge 1 commit intoapache:masterfrom
dongjoon-hyun:SPARK-44993

Conversation

@dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Aug 28, 2023

What changes were proposed in this pull request?

This PR aims to add ShuffleChecksumUtils.compareChecksums by reusing the existing test code ShuffleChecksumTestHelp.compareChecksums in order to reuse the functionality in the main code.

Why are the changes needed?

This is very useful in the test code. We can take advantage of this verification logic in core module.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with the existing test codes because this is a kind of refactoring.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Aug 28, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-44993][CORE] Move compareChecksums from ShuffleChecksumTestHelpe to ShuffleChecksumUtils [SPARK-44993][CORE] Move compareChecksums from ShuffleChecksumTestHelp to ShuffleChecksumUtils Aug 28, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-44993][CORE] Move compareChecksums from ShuffleChecksumTestHelp to ShuffleChecksumUtils [SPARK-44993][CORE] Add ShuffleChecksumUtils.compareChecksums by reusing ShuffleChecksumTestHelp..compareChecksums Aug 28, 2023
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-44993][CORE] Add ShuffleChecksumUtils.compareChecksums by reusing ShuffleChecksumTestHelp..compareChecksums [SPARK-44993][CORE] Add ShuffleChecksumUtils.compareChecksums by reusing ShuffleChecksumTestHelp.compareChecksums Aug 28, 2023
@dongjoon-hyun
Copy link
Member Author

core test pipeline passed. Could you review this PR when you have some time, @viirya ?

Screenshot 2023-08-28 at 4 28 40 PM

/**
* Ensure that the checksum values are consistent with index file and data file.
*/
def compareChecksums(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this into ShuffleChecksumHelper?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShuffleChecksumHelper is test code helper under test/scala/.... So we can not reuse it in the main code.

core/src/test/scala/org/apache/spark/shuffle/ShuffleChecksumTestHelper.scala

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I meant ShuffleChecksumHelper not ShuffleChecksumTestHelper.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread that.

BTW, I also considered that but I decided not to do because it's Java class which means we need to re-implement the whole logic. I guess you also suggested a moving from Scala to Scala code, not Scala to Java. Or, did you suggest to reimplement with Java?

common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/checksum/ShuffleChecksumHelper.java

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. No, I didn't realized that ShuffleChecksumHelper is Java code. Thanks for reply.

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Just wondering if we should put it into ShuffleChecksumHelper directly.

@dongjoon-hyun
Copy link
Member Author

Thank you for review. I replied #42707 (comment) .

@dongjoon-hyun
Copy link
Member Author

Merged to master for Apache Spark 4.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-44993 branch August 29, 2023 00:29
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
…using `ShuffleChecksumTestHelp.compareChecksums`

### What changes were proposed in this pull request?

This PR aims to add `ShuffleChecksumUtils.compareChecksums` by reusing the existing test code `ShuffleChecksumTestHelp.compareChecksums` in order to reuse the functionality in the main code.

### Why are the changes needed?

This is very useful in the test code. We can take advantage of this verification logic in `core` module.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the existing test codes because this is a kind of refactoring.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#42707 from dongjoon-hyun/SPARK-44993.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 5db58f9)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants