Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only 32 bits of randomness, right? #10

Open
norpan opened this issue Apr 18, 2017 · 21 comments
Open

Only 32 bits of randomness, right? #10

norpan opened this issue Apr 18, 2017 · 21 comments

Comments

@norpan
Copy link
Contributor

norpan commented Apr 18, 2017

I looked at your code and it just uses 32 bits of randomness (because the seed of Random.PCG is a 32 bit int), if I'm not completely mistaken. This means that according to https://en.wikipedia.org/wiki/Birthday_attack#Mathematics if you generate 50,000 uuids you will have a 25% chance of a collision.

Would be great if there was an easy way to access window.crypto.getRandomValues().

@danyx23
Copy link
Owner

danyx23 commented Apr 18, 2017

Hey norpan! I am not very deeply versed in pseudo random number generators, but when the Random.PCG library is used correctly (by using generators instead of re-seeding for every UUID) then AFAIK what is primarily important for UUID collisions is not the bit count of the initial base seed but the period of the PRNG. IIRC this is significantly higher than 2^32 for PCG implementations (it seems the default period for PCG is 2^63 but I might have misread that). Additionally, because of the way the UUID implementation generates random numbers, it is unlikely that the same 31 hex digit sequence would be generated after a single period. (please consult http://www.pcg-random.org/rng-basics.html on how PCG basically works).

So I am relatively confident that the threshold where we cross above a 25% chance of regenerating UUIDs is significantly higher that the 50 000 you estimate (a quick lookup leads me to believe it should happen at about a billion generated UUIDs). But as I said, I am certainly no expert and would be very interested in your findings if you research this in more depth!. @mgold, the author of the elm Random.PCG library might be a good person to talk to about these things.

@norpan
Copy link
Contributor Author

norpan commented Apr 18, 2017

Yes, so the period may be higher, which is good for the single-session case. The worst-case scenario is that these 50k uuids are generated in by different sessions so then there is still only 32 bit (about 4 billion different uuids) generated, since the PRNG is reseeded on each session.

So, my use case is around a thousand users of users over multiple sessions, and generating duplicate uuids is not good. I just wanted to check if this was something you are aware of. It's possible to handle duplicates, but I'd rather not.

@mgold
Copy link

mgold commented Apr 19, 2017

At one point Pcg did use the 64-bit variation that would not have this problem. It was changed to a 32-bit version for performance reason (JS does not have 64-bit ints).

I think your best bet at this point might be to generate 10k uuids a few times and see if there's a collision.

@danyx23
Copy link
Owner

danyx23 commented Jun 25, 2017

I thought a bit more about this and this is a pretty nasty gotcha for UUIDs I think. @mgold, is there a way to get back a 64 bit seed for your pcg library? Or would mean in effect duplicating the library?

@mgold
Copy link

mgold commented Jun 26, 2017

Look at independentSeed. It's not quite a 2^64 period generator. Rather it lets you pick between 2^32 generators with a 2^32 period each. Does that help? Otherwise you'd have to duplicate the library.

@Zinggi
Copy link

Zinggi commented Dec 20, 2017

You might be interested in my just published Pcg-Extended library.

The purpose of this library is to make the most of your random seed.
E.g. if you seed pcg-extended with 8 32-bit ints from window.crypto.getRandomValues(), you get to use all 2^(32*8)-bits of randomness and an equally large period.

For this library, this means you would have to seed it with at least 4 32-bit ints to get the necessary 128-bit of randomness for the UUID

@Zinggi
Copy link

Zinggi commented Mar 16, 2018

@danyx23 Would you be open for a PR that adds support for my Pcg-Extended library?
I'd add this function

uuidGenerator : Random.Pcg.Extended.Generator Uuid

Then a user can get enough randomness if needed.

I can also create a fork if you'd prefer not having both options in one package.

@norpan
Copy link
Contributor Author

norpan commented Mar 16, 2018

For sure, although it's not my package :)

@Zinggi
Copy link

Zinggi commented Mar 16, 2018

oh right, I meant @danyx23 , sorry to ping you instead 😄

@danyx23
Copy link
Owner

danyx23 commented Mar 19, 2018

Hey @Zinggi yes definitely. I'm even thinking about replacing Pcg-Extended.

The reason I originally went with Pcg-Extended back in Elm 0.16 or so was that it provided more randomness than the default implementation. I think when you use a UUID library you should always get enough bits of randomness to get somewhere close to the potential 123 bits of randomness in a V4 UUID.

What do you think? @mgold what are your thoughts?

@Zinggi
Copy link

Zinggi commented Mar 19, 2018

Well just using my Pcg-Extended version, doesn't automatically give you enough bits of randomness.
You still have to properly seed it via a flag with enough values from window.crypto.getRandomValues(..).
But at least it has the potential to offer enough randomness if used properly. And even if used wrong, it is just as good as the currently used normal Pcg version.

But in any case, I think I've got time to create a PR in a few days. (Probably the day after tomorrow).
I think I'll just add it as an additional value (e.g. a MINOR) change, and then you can decide if you want to completely remove the other generator in a MAJOR change later.

@mgold
Copy link

mgold commented Mar 20, 2018

The official elm-lang PRNG will be using the PCG algorithm in 0.19. I think that third-party PRNGs will be something to be avoided at that point. I think you grab multiple ints in 0..2^32-1, you should get a an amount of randomness that is acceptable for anyone using a PRNG. [EDIT: This is incorrect, see next comment.] This can be wrapped up in a Generator Uuid. Otherwise, if someone wants a cryptographically secure uuid, provide a function to create one from 32-bit integers and let the caller worry about passing them in from a port.

@norpan
Copy link
Contributor Author

norpan commented Mar 30, 2018

So, the issue is not so much the period or the internal state size as the initialization. If you have a 32 bit seed you can have as long a period as you like, but you still only have 2^32 different random series. So if you have a few hundred thousand users generating uuids independently, a collision is still possible.

@danyx23
Copy link
Owner

danyx23 commented Apr 2, 2018

Yeah it's a bit of a tricky situation now. So with the upcoming changes to 0.19, what do you think about this: we make this version use the then default PCG PRNG with no further external dependencies, and we put a big disclaimer in this version saying spelling out the details which AFAICS boil down to: collisions are unlikely even for relatively high numbers of UUIDs from a single client, but because the seed is in effect a 32bit int collisions are a real possibility if you start generating UUIDs from more than a few thousand clients.

@norpan you could then publish your fork as an official alternative that I would link from this library. It would pull in your RPE library and could specifically focus on explaining how to get a good enough seed etc..

What do you think?

@Zinggi
Copy link

Zinggi commented Apr 2, 2018

@danyx23 you probably meant me ;)
The plan seems good to me.

I'm gonna publish my fork, probably some time next week.

@mgold
Copy link

mgold commented Apr 3, 2018

we make this version use the then default PCG PRNG with no further external dependencies, and we put a big disclaimer in this version saying spelling out the details

Sounds good.

collisions are unlikely even for relatively high numbers of UUIDs from a single client, but because the seed is in effect a 32bit int collisions are a real possibility if you start generating UUIDs from more than a few thousand clients.

Provided your a managing the seeds correctly this should be true. (And if you don't manage the seeds correctly, any PRNG will give collisions.)

@Zinggi Before you publish your fork, please consider: if someone needs really, really random uuids, would they be better off passing them in from a port and having the option of secure randomness? There is a cost to polluting the package ecosystem with similar packages.

@danyx23
Copy link
Owner

danyx23 commented Apr 3, 2018

Cool, thank you all for the good discussion. I will make the changes to this repo once 0.19 is out (but can link a fork before that).

@norpan
Copy link
Contributor Author

norpan commented Apr 3, 2018

We have actually used ports and a javascript uuid generator that uses good random numbers already so it's not an issue for us anymore.

However, I still think that a uuid generator should by default have as good a random source as possible. And 32 bits initial seed is a bit low. If that's what you get in Elm, so be it, perhaps it would make sense to work on making the built-in random generator have a larger seed instead.

@Zinggi
Copy link

Zinggi commented Apr 3, 2018

Before you publish your fork, please consider: if someone needs really, really random uuids, would they be better off passing them in from a port and having the option of secure randomness?

I'm in a position where I need random and unique UUIDs across many users for an application I'm working on. For my application passing a generated UUID via port would be an inferior solution to the one I currently use, which is passing in a random seed from crypto.getRandomValues(...) and then use the PCG-extended generator to generate a UUID.
This is better because I need other good random values that also make use of the PCG.extended generator. If I would only need a single UUID and nothing else from this generator, then the solution with ports would be just as good. (Except for the fact that I think the more written in Elm, the better)

There is a cost to polluting the package ecosystem with similar packages.

I agree, but if there is just a single user with similar needs to me, I think it's worthy to be published.

However, I still think that a uuid generator should by default have as good a random source as possible. And 32 bits initial seed is a bit low. If that's what you get in Elm, so be it, perhaps it would make sense to work on making the built-in random generator have a larger seed instead.

It makes sense that the default random generator uses a small seed and internal state, as that's most efficient. For most applications where you need randomness, the normal PCG offers a great trade off.

After 0.19 is out, an official way to access crypto.getRandomValues(...) will probably be worked out, as that's part of the web platform. When this is done, we can use that to provide a UUID generator that cannot be used wrong.
But until then, my fork should be good enough.

@Zinggi
Copy link

Zinggi commented Apr 3, 2018

I published it now: http://package.elm-lang.org/packages/Zinggi/elm-uuid/latest

@ZimbiX
Copy link

ZimbiX commented May 5, 2020

For anyone else who reaches here and is disheartened to see that @Zinggi's fork is not compatible with Elm 0.19.1 (and won't be), check out NoRedInk/elm-uuid =)

Probably best to update the fork link in the readme here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants