[SPARK-23599][SQL] Add a UUID generator from Pseudo-Random Numbers #20817

Closed
wants to merge 3 commits into from


viirya
Member

@viirya viirya commented Mar 14, 2018

What changes were proposed in this pull request?

This patch adds a UUID generator based on pseudo-random numbers. We can use it later to make the `UUID()` expression deterministic.

How was this patch tested?

Added unit tests.

@viirya
Member Author

viirya commented Mar 14, 2018

cc @hvanhovell

@SparkQA

SparkQA commented Mar 14, 2018

Test build #88222 has finished for PR 20817 at commit b7bce25.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RandomUUIDGenerator(random: Random)

@kiszk
Member

kiszk commented Mar 14, 2018

retest this please

@SparkQA

SparkQA commented Mar 14, 2018

Test build #88224 has finished for PR 20817 at commit b7bce25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RandomUUIDGenerator(random: Random)

@viirya
Member Author

viirya commented Mar 14, 2018

retest this please.

@viirya
Member Author

viirya commented Mar 14, 2018

@kiszk The failed test seems to come from #20779. Is it still flaky?

```scala
val mostSigBits = (random.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L
val leastSigBits = (random.nextLong() | 0x8000000000000000L) & 0xBFFFFFFFFFFFFFFFL

new UUID(mostSigBits, leastSigBits)
```
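The two masks in the snippet above fix the RFC 4122 version and variant fields and leave the remaining 122 bits random: `& 0xFFFFFFFFFFFF0FFFL` clears the version nibble (bits 12 to 15 of the most significant half) and `| 0x4000L` sets it to 4, while `| 0x8000000000000000L` and `& 0xBFFFFFFFFFFFFFFFL` force the top two bits of the least significant half to `10`, the IETF variant. The minimal Scala sketch below is not the PR's code; `UuidMasks.toVersion4` is a hypothetical helper that applies the same masks to arbitrary 64-bit words so the effect can be checked with `java.util.UUID`'s own `version()` and `variant()` accessors.

```scala
import java.util.UUID

object UuidMasks {
  // Hypothetical helper: turn two arbitrary 64-bit words into a version-4 UUID
  // using the same masks as the reviewed snippet.
  def toVersion4(hi: Long, lo: Long): UUID = {
    // Clear bits 12-15 (the version nibble), then set them to 0100 = version 4.
    val mostSigBits = (hi & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L
    // Set bit 63 and clear bit 62, so the top two bits are 10 (IETF variant).
    val leastSigBits = (lo | 0x8000000000000000L) & 0xBFFFFFFFFFFFFFFFL
    new UUID(mostSigBits, leastSigBits)
  }
}

// Extreme inputs: only the version and variant bits differ from the input.
val allOnes = UuidMasks.toVersion4(-1L, -1L)
// allOnes.toString == "ffffffff-ffff-4fff-bfff-ffffffffffff"
val allZeros = UuidMasks.toVersion4(0L, 0L)
// allZeros.toString == "00000000-0000-4000-8000-000000000000"
```

Whatever the input words are, `version()` returns 4 and `variant()` returns 2; only the other 122 bits depend on the random input.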
Contributor

I think we need to use a different RNG. java.util.Random only has 48 bits of state, which is less than the 122 bits we need for UUID generation. Something like PCG or a Mersenne twister would work.
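To make the 48-bit point concrete: `java.util.Random` is documented as a linear congruential generator whose entire state is a single 48-bit seed, and its `nextLong()` is built from two consecutive 32-bit outputs. The Scala sketch below re-implements that documented recurrence (the class `Lcg48` and the helper `matchesJdk` are hypothetical names) and compares it with the JDK class. Since every value it returns, and therefore every UUID built from its values, is a deterministic function of those 48 bits of state, such a generator can emit at most 2^48 distinct UUIDs in total, far fewer than the 2^122 possible random version-4 UUIDs.

```scala
import java.util.Random

// Hypothetical re-implementation of java.util.Random's documented LCG,
// shown only to illustrate that its entire state is 48 bits.
final class Lcg48(initialSeed: Long) {
  private val Multiplier = 0x5DEECE66DL
  private val Addend = 0xBL
  private val Mask = (1L << 48) - 1

  // The constructor scrambles the seed the same way java.util.Random does.
  private var state: Long = (initialSeed ^ Multiplier) & Mask

  // Advance the 48-bit state and return its top `bits` bits.
  private def next(bits: Int): Int = {
    state = (state * Multiplier + Addend) & Mask
    (state >>> (48 - bits)).toInt
  }

  // Same composition as java.util.Random.nextLong(): two 32-bit draws.
  def nextLong(): Long = (next(32).toLong << 32) + next(32)
}

// Compare the first n outputs with the JDK class for a given seed.
def matchesJdk(seed: Long, n: Int): Boolean = {
  val ours = new Lcg48(seed)
  val jdk = new Random(seed)
  (1 to n).forall(_ => ours.nextLong() == jdk.nextLong())
}
```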

Member Author

Ok. Mersenne Twister is used in the update.
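For contrast, the standard 32-bit Mersenne Twister (MT19937) carries 624 words of state (19,937 effective bits), so its state size is no longer the limiting factor for 122 random bits. The PR presumably uses an existing library `MersenneTwister` class, and this thread does not say which one. The self-contained Scala sketch below is a textbook MT19937 written for illustration; it is not that class, and its output is not claimed to match it. The names `MT19937` and `SeededUuids`, the 32-bit seed (the PR's `RandomUUIDGenerator` takes a `Long`), and the high-word-first `nextLong()` are all assumptions. The sketch also shows the property the PR is after: the same seed always produces the same UUID sequence.

```scala
import java.util.UUID

// Sketch of the standard 32-bit MT19937 generator (assumption: not the PR's class).
final class MT19937(seed: Int) {
  private val N = 624
  private val M = 397
  private val MatrixA = 0x9908B0DF
  private val UpperMask = 0x80000000
  private val LowerMask = 0x7FFFFFFF

  private val mt = new Array[Int](N)
  private var index = N // forces a twist before the first output

  // Standard initialization from a single 32-bit seed (Int overflow = mod 2^32).
  mt(0) = seed
  for (i <- 1 until N) {
    mt(i) = 1812433253 * (mt(i - 1) ^ (mt(i - 1) >>> 30)) + i
  }

  // Regenerate all N words of state in place.
  private def twist(): Unit = {
    for (i <- 0 until N) {
      val y = (mt(i) & UpperMask) | (mt((i + 1) % N) & LowerMask)
      var updated = mt((i + M) % N) ^ (y >>> 1)
      if ((y & 1) != 0) updated ^= MatrixA
      mt(i) = updated
    }
    index = 0
  }

  // Next 32 random bits, after the standard tempering step.
  def nextInt(): Int = {
    if (index >= N) twist()
    var y = mt(index)
    index += 1
    y ^= y >>> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^ (y >>> 18)
  }

  // Hypothetical 64-bit draw: high word first, then low word.
  def nextLong(): Long = (nextInt().toLong << 32) | (nextInt().toLong & 0xFFFFFFFFL)
}

// Hypothetical seeded generator of version-4 UUIDs, using the PR's masks.
final class SeededUuids(seed: Int) {
  private val rng = new MT19937(seed)

  def next(): UUID = {
    val msb = (rng.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L
    val lsb = (rng.nextLong() | 0x8000000000000000L) & 0xBFFFFFFFFFFFFFFFL
    new UUID(msb, lsb)
  }
}
```

Two `SeededUuids(42)` instances yield identical sequences. Note that seeding from one 32-bit integer still limits this sketch to 2^32 distinct starting points; the large state removes the per-UUID bottleneck, but the seed width bounds how many different sequences exist.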

@SparkQA

SparkQA commented Mar 14, 2018

Test build #88231 has finished for PR 20817 at commit b7bce25.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RandomUUIDGenerator(random: Random)

@kiszk
Member

kiszk commented Mar 14, 2018

@viirya Umm, it may sometimes use more memory...

@hvanhovell
Contributor

@kiszk Is it taking more memory because of the test? If so, can we make the test case smaller?

@kiszk
Member

kiszk commented Mar 14, 2018

I think this test takes more memory. Unfortunately, when I reduced the size of the test, I could not reproduce the problem in my environment.

@SparkQA

SparkQA commented Mar 15, 2018

Test build #88251 has finished for PR 20817 at commit b2062c7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RandomUUIDGenerator(randomSeed: Long)

@SparkQA

SparkQA commented Mar 15, 2018

Test build #88259 has finished for PR 20817 at commit cd73b4c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class RandomUUIDGenerator(randomSeed: Long)

```scala
case class RandomUUIDGenerator(randomSeed: Long) {
  private val random = new MersenneTwister(randomSeed)

  def getNextUUID(): UUID = {
```
Contributor

Perhaps we should also create a version that creates a UTF8String directly.

Member Author

Sounds good. I've added it.
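One way to read "creates a UTF8String directly" is to render the UUID's hex digits straight into a byte buffer instead of going through `UUID.toString()`. Spark's `UTF8String` is not reproduced here, and the thread does not show the method that was actually added. As a hedged stand-in, the hypothetical `UuidBytes.toUtf8Bytes` below writes the canonical 36-character 8-4-4-4-12 form into an `Array[Byte]`; UUID text is pure ASCII and therefore valid UTF-8, and Spark's `UTF8String.fromBytes(byte[])` could then wrap such an array.

```scala
import java.nio.charset.StandardCharsets
import java.util.UUID

object UuidBytes {
  private val HexDigits = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII)
  private val Dash: Byte = '-'.toByte

  // Hypothetical helper: canonical lowercase UUID text as UTF-8 bytes,
  // produced without building an intermediate java.lang.String.
  def toUtf8Bytes(uuid: UUID): Array[Byte] = {
    val out = new Array[Byte](36)

    // Write the lowest `digits` hex digits of `value`, ending just before `end`.
    def writeHex(value: Long, digits: Int, end: Int): Unit = {
      var v = value
      var pos = end - 1
      for (_ <- 0 until digits) {
        out(pos) = HexDigits((v & 0xF).toInt)
        v >>>= 4
        pos -= 1
      }
    }

    val msb = uuid.getMostSignificantBits
    val lsb = uuid.getLeastSignificantBits
    writeHex(msb >>> 32, 8, 8)  // time_low
    out(8) = Dash
    writeHex(msb >>> 16, 4, 13) // time_mid
    out(13) = Dash
    writeHex(msb, 4, 18)        // time_hi_and_version
    out(18) = Dash
    writeHex(lsb >>> 48, 4, 23) // clock_seq
    out(23) = Dash
    writeHex(lsb, 12, 36)       // node
    out
  }
}
```

The bytes should decode to exactly `uuid.toString`; whether skipping the intermediate `String` is worth the extra code would need to be measured.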

@SparkQA

SparkQA commented Mar 16, 2018

Test build #88290 has finished for PR 20817 at commit 75b80a2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 16, 2018

retest this please.

@SparkQA

SparkQA commented Mar 16, 2018

Test build #88301 has finished for PR 20817 at commit 75b80a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Mar 19, 2018

ping @hvanhovell Are there any more comments? Thanks.

Contributor

@hvanhovell hvanhovell left a comment

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in 4de638c Mar 19, 2018
@viirya
Member Author

viirya commented Mar 19, 2018

@hvanhovell Thanks for merging this! I will continue working on using this UUID generator in the UUID expression.

asfgit pushed a commit that referenced this pull request Mar 25, 2018
## What changes were proposed in this pull request?

This patch adds a UUID generator based on pseudo-random numbers. We can use it later to make the `UUID()` expression deterministic.

## How was this patch tested?

Added unit tests.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #20817 from viirya/SPARK-23599.

(cherry picked from commit 4de638c)
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
peter-toth pushed a commit to peter-toth/spark that referenced this pull request Oct 6, 2018
@viirya viirya deleted the SPARK-23599 branch December 27, 2023 18:21
4 participants