[SPARK-29124][CORE] Use MurmurHash3 `bytesHash(data, seed)` instead of `bytesHash(data)` #25821
See #25404 (comment) -- is that the equivalent change? But I agree that we can and should test such a change against the current build to see if it changes behavior.
This is the opposite approach. This PR makes Apache Spark independent of that Scala patch instead of matching it. I also suggested this approach in that PR.
Oh, OK. But wouldn't that probably change the behavior of Spark? I guess we'll see here in the tests.
To me, the PR itself looks solid and safe, and I believe Jenkins will confirm that for us soon. 😄
Test build #110825 has finished for PR 25821 at commit
I'm not quite sure why it works, but the tests pass, so it seems OK to me if you see a reason this will work for Scala 2.12.10.
Oh, it looks simpler than I thought in the bed :) Thank you.
Merged to master.
Thank you all!
See the comment on the other PR. Yep, I get the idea now, makes sense.
…f `bytesHash(data)`

This PR replaces the `bytesHash(data)` API invocation with the underlying `bytesHash(data, arraySeed)` invocation.

```scala
def bytesHash(data: Array[Byte]): Int = bytesHash(data, arraySeed)
```

The original API changed between Scala versions in the following commit. From Scala 2.12.9, the semantics of the function are different. If we use the underlying form, we are safe during Scala version migration.

- scala/scala@846ee2b#diff-ac889f851e109fc4387cd738d52ce177

No.

This is a kind of refactoring. It should pass Jenkins with the existing tests.

Closes #25821 from dongjoon-hyun/SPARK-SCALA-HASH.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit 3ece8ee)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Hi, All.
Agree.
What changes were proposed in this pull request?
This PR replaces the `bytesHash(data)` API invocation with the underlying `bytesHash(data, arraySeed)` invocation.
Why are the changes needed?
The original API changed between Scala versions in the following commit. From Scala 2.12.9, the semantics of the function are different. If we use the underlying form, we are safe during Scala version migration.
- scala/scala@846ee2b#diff-ac889f851e109fc4387cd738d52ce177
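As a rough illustration of the idea (not the actual Spark change), the sketch below passes the seed explicitly, so the result does not depend on how the one-argument overload is defined in a given Scala release. The object `StableBytesHash` and the helper `stableBytesHash` are hypothetical names introduced only for this example; `MurmurHash3.bytesHash(data, seed)` and `MurmurHash3.arraySeed` come from the Scala standard library.

```scala
import scala.util.hashing.MurmurHash3

object StableBytesHash {
  // Hypothetical helper (not part of Spark or Scala): always passes the seed
  // explicitly instead of relying on the one-argument overload.
  def stableBytesHash(data: Array[Byte]): Int =
    MurmurHash3.bytesHash(data, MurmurHash3.arraySeed)

  def main(args: Array[String]): Unit = {
    val data = "spark".getBytes("UTF-8")

    // Explicit-seed form, as proposed in this PR.
    val explicitSeed = stableBytesHash(data)

    // One-argument overload; its definition is what differs between Scala versions.
    val oneArg = MurmurHash3.bytesHash(data)

    println(s"explicit seed:    $explicitSeed")
    println(s"one-arg overload: $oneArg")
    // Whether the two agree depends on the Scala version used to run this.
    println(s"same on this Scala version: ${explicitSeed == oneArg}")
  }
}
```

Running this against different Scala versions is one way to see whether the one-argument overload still matches the explicit-seed form on that version.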
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This is a kind of refactoring.
It should pass Jenkins with the existing tests.