New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23599][SQL] Add a UUID generator from Pseudo-Random Numbers #20817
Conversation
cc @hvanhovell |
Test build #88222 has finished for PR 20817 at commit
|
retest this please |
Test build #88224 has finished for PR 20817 at commit
|
retest this please. |
val mostSigBits = (random.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L | ||
val leastSigBits = (random.nextLong() | 0x8000000000000000L) & 0xBFFFFFFFFFFFFFFFL | ||
|
||
new UUID(mostSigBits, leastSigBits) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to use a different RNG. java.util.Random
only has 48 bits of state, which is less than the 122 bits we need for UUID generation. Something like PCG or a Mersenne twister would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Mersenne Twister is used in the update.
Test build #88231 has finished for PR 20817 at commit
|
@viirya umm, it may eat more memory sometime ... |
@kiszk is it taking more memory because of the test? If it does can we make the test case smaller? |
I think this test takes more memory. Unfortunately, when I reduced the size of test, the problem cannot be reproduced in my environment. |
Test build #88251 has finished for PR 20817 at commit
|
Test build #88259 has finished for PR 20817 at commit
|
case class RandomUUIDGenerator(randomSeed: Long) { | ||
private val random = new MersenneTwister(randomSeed) | ||
|
||
def getNextUUID(): UUID = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should also create a version that creates a UTF8String directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I've added it.
Test build #88290 has finished for PR 20817 at commit
|
retest this please. |
Test build #88301 has finished for PR 20817 at commit
|
ping @hvanhovell Is there any more comments? Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - merging to master. Thanks!
@hvanhovell Thanks for merging this! I will continue to work on make use of this UUID generator in UUID expression. |
## What changes were proposed in this pull request? This patch adds a UUID generator from Pseudo-Random Numbers. We can use it later to have deterministic `UUID()` expression. ## How was this patch tested? Added unit tests. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #20817 from viirya/SPARK-23599. (cherry picked from commit 4de638c) Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
## What changes were proposed in this pull request? This patch adds a UUID generator from Pseudo-Random Numbers. We can use it later to have deterministic `UUID()` expression. ## How was this patch tested? Added unit tests. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#20817 from viirya/SPARK-23599. (cherry picked from commit 4de638c) Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
What changes were proposed in this pull request?
This patch adds a UUID generator from Pseudo-Random Numbers. We can use it later to have deterministic
UUID()
expression.How was this patch tested?
Added unit tests.