-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only 32 bits of randomness, right? #10
Comments
Hey norpan! I am not very deeply versed in pseudo random number generators, but when the Random.PCG library is used correctly (by using generators instead of re-seeding for every UUID) then AFAIK what is primarily important for UUID collisions is not the bit count of the initial base seed but the period of the PRNG. IIRC this is significantly higher than 2^32 for PCG implementations (it seems the default period for PCG is 2^63 but I might have misread that). Additionally, because of the way the UUID implementation generates random numbers, it is unlikely that the same 31 hex digit sequence would be generated after a single period. (please consult http://www.pcg-random.org/rng-basics.html on how PCG basically works). So I am relatively confident that the threshold where we cross above a 25% chance of regenerating UUIDs is significantly higher that the 50 000 you estimate (a quick lookup leads me to believe it should happen at about a billion generated UUIDs). But as I said, I am certainly no expert and would be very interested in your findings if you research this in more depth!. @mgold, the author of the elm Random.PCG library might be a good person to talk to about these things. |
Yes, so the period may be higher, which is good for the single-session case. The worst-case scenario is that these 50k uuids are generated in by different sessions so then there is still only 32 bit (about 4 billion different uuids) generated, since the PRNG is reseeded on each session. So, my use case is around a thousand users of users over multiple sessions, and generating duplicate uuids is not good. I just wanted to check if this was something you are aware of. It's possible to handle duplicates, but I'd rather not. |
At one point Pcg did use the 64-bit variation that would not have this problem. It was changed to a 32-bit version for performance reason (JS does not have 64-bit ints). I think your best bet at this point might be to generate 10k uuids a few times and see if there's a collision. |
I thought a bit more about this and this is a pretty nasty gotcha for UUIDs I think. @mgold, is there a way to get back a 64 bit seed for your pcg library? Or would mean in effect duplicating the library? |
Look at |
You might be interested in my just published Pcg-Extended library. The purpose of this library is to make the most of your random seed. For this library, this means you would have to seed it with at least 4 32-bit ints to get the necessary 128-bit of randomness for the UUID |
@danyx23 Would you be open for a PR that adds support for my Pcg-Extended library? uuidGenerator : Random.Pcg.Extended.Generator Uuid Then a user can get enough randomness if needed. I can also create a fork if you'd prefer not having both options in one package. |
For sure, although it's not my package :) |
oh right, I meant @danyx23 , sorry to ping you instead 😄 |
Hey @Zinggi yes definitely. I'm even thinking about replacing Pcg-Extended. The reason I originally went with Pcg-Extended back in Elm 0.16 or so was that it provided more randomness than the default implementation. I think when you use a UUID library you should always get enough bits of randomness to get somewhere close to the potential 123 bits of randomness in a V4 UUID. What do you think? @mgold what are your thoughts? |
Well just using my Pcg-Extended version, doesn't automatically give you enough bits of randomness. But in any case, I think I've got time to create a PR in a few days. (Probably the day after tomorrow). |
The official elm-lang PRNG will be using the PCG algorithm in 0.19. I think that third-party PRNGs will be something to be avoided at that point. |
So, the issue is not so much the period or the internal state size as the initialization. If you have a 32 bit seed you can have as long a period as you like, but you still only have 2^32 different random series. So if you have a few hundred thousand users generating uuids independently, a collision is still possible. |
Yeah it's a bit of a tricky situation now. So with the upcoming changes to 0.19, what do you think about this: we make this version use the then default PCG PRNG with no further external dependencies, and we put a big disclaimer in this version saying spelling out the details which AFAICS boil down to: collisions are unlikely even for relatively high numbers of UUIDs from a single client, but because the seed is in effect a 32bit int collisions are a real possibility if you start generating UUIDs from more than a few thousand clients. @norpan you could then publish your fork as an official alternative that I would link from this library. It would pull in your RPE library and could specifically focus on explaining how to get a good enough seed etc.. What do you think? |
@danyx23 you probably meant me ;) I'm gonna publish my fork, probably some time next week. |
Sounds good.
Provided your a managing the seeds correctly this should be true. (And if you don't manage the seeds correctly, any PRNG will give collisions.) @Zinggi Before you publish your fork, please consider: if someone needs really, really random uuids, would they be better off passing them in from a port and having the option of secure randomness? There is a cost to polluting the package ecosystem with similar packages. |
Cool, thank you all for the good discussion. I will make the changes to this repo once 0.19 is out (but can link a fork before that). |
We have actually used ports and a javascript uuid generator that uses good random numbers already so it's not an issue for us anymore. However, I still think that a uuid generator should by default have as good a random source as possible. And 32 bits initial seed is a bit low. If that's what you get in Elm, so be it, perhaps it would make sense to work on making the built-in random generator have a larger seed instead. |
I'm in a position where I need random and unique UUIDs across many users for an application I'm working on. For my application passing a generated UUID via port would be an inferior solution to the one I currently use, which is passing in a random seed from
I agree, but if there is just a single user with similar needs to me, I think it's worthy to be published.
It makes sense that the default random generator uses a small seed and internal state, as that's most efficient. For most applications where you need randomness, the normal PCG offers a great trade off. After 0.19 is out, an official way to access |
I published it now: http://package.elm-lang.org/packages/Zinggi/elm-uuid/latest |
For anyone else who reaches here and is disheartened to see that @Zinggi's fork is not compatible with Elm 0.19.1 (and won't be), check out NoRedInk/elm-uuid =) Probably best to update the fork link in the readme here |
I looked at your code and it just uses 32 bits of randomness (because the seed of Random.PCG is a 32 bit int), if I'm not completely mistaken. This means that according to https://en.wikipedia.org/wiki/Birthday_attack#Mathematics if you generate 50,000 uuids you will have a 25% chance of a collision.
Would be great if there was an easy way to access
window.crypto.getRandomValues()
.The text was updated successfully, but these errors were encountered: