New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework the Random module #3402

Merged
merged 5 commits into from Oct 15, 2016

Conversation

Projects
None yet
5 participants
@oprypin
Contributor

oprypin commented Oct 9, 2016

The Random module currently has several problems:

  • The upper bound for the desired random number in rand(max : Int) can be any integer type, but it is actually limited to UInt32::MAX by the random number generator itself, moreover a cast to Int32 happens, sometimes causing negative numbers in the output (fix #3350)
  • Because a simple modulo operation is used to produce a random number in rand(max : Int), the output is not uniformly distributed. With a large enough range, some numbers may appear even twice as often as others.
  • It works only with 32-bit random number generators, while many modern PRNGs produce 64-bit numbers, and there are also byte-based generators (like urandom).
  • rand(range : Range(Int, Int)) is especially fragile because it depends on rand(max : Int) and adds more possibilities for overflows with large numbers.
  • rand(max : Float) uses the same 32-bit resolution, which means that quite a few floating point values can never be reached.

That's why I decided to apply my previous experience with developing an RNG library to rework the Random module.

These changes make it possible to:

  • Generate uniformly distributed random integers of any type and any range. The result will always be the same type as the supplied argument.

    Even things like rand(Int64::MIN..Int64::MAX) work

  • Provide the same nice interface based on a random number generator that produces integers of any size, not just 32-bit integers

    Now it is easy to make an RNG class based on urandom's bytes, like in Python, but that's not included in this PR

Now each RNG operation can consume multiple numbers from the underlying random number generator if needed, but of course everything is still deterministic. For implementation details see the comments in the code.

The interface stays almost the same (though I would like to suggest some changes in the future), but classes based on Random now need to provide a next_u method instead of next_u32.

Disclaimer:
I have put a lot of thought into this topic over the years, but I'm still an amateur. The algorithms here are based on widely accepted ideas, but they don't closely follow a particular widely-used implementation.

Show outdated Hide outdated spec/std/random_spec.cr
@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 12, 2016

Contributor

@BlaXpirit I think we could use this PR as an example of what a PR should look like. The docs and comments in the code are extremely clear and useful, and the changes/addition are more than welcome. Even though you say you are an amateur in this subject you probably know more than most people, including me and I'd dare to say every other core committer.

I'd merge this right away, but I want to either double-check with @waj, or if someone else could review this and give a 👍 we can merge this.

Contributor

asterite commented Oct 12, 2016

@BlaXpirit I think we could use this PR as an example of what a PR should look like. The docs and comments in the code are extremely clear and useful, and the changes/addition are more than welcome. Even though you say you are an amateur in this subject you probably know more than most people, including me and I'd dare to say every other core committer.

I'd merge this right away, but I want to either double-check with @waj, or if someone else could review this and give a 👍 we can merge this.

@lbguilherme

This comment has been minimized.

Show comment
Hide comment
@lbguilherme

lbguilherme Oct 12, 2016

Contributor

How about naming the method next instead of next_u?

Contributor

lbguilherme commented Oct 12, 2016

How about naming the method next instead of next_u?

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 13, 2016

Contributor

I think next_u is fine, it must return an unsigned integer. next is too generic.

Contributor

asterite commented Oct 13, 2016

I think next_u is fine, it must return an unsigned integer. next is too generic.

@kirbyfan64

This comment has been minimized.

Show comment
Hide comment
@kirbyfan64

kirbyfan64 Oct 13, 2016

Contributor

Nice! Random numbers can be a pain point in a lot of statically typed languages (cough C++ cough).

Contributor

kirbyfan64 commented Oct 13, 2016

Nice! Random numbers can be a pain point in a lot of statically typed languages (cough C++ cough).

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Oct 13, 2016

Member

I think next_u looks weird, maybe next_unsigned? I don't think the longer name is an issue.

Member

RX14 commented Oct 13, 2016

I think next_u looks weird, maybe next_unsigned? I don't think the longer name is an issue.

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 13, 2016

Contributor

@RX14 It's consistent with to_i, to_u, to_f, etc.

Contributor

asterite commented Oct 13, 2016

@RX14 It's consistent with to_i, to_u, to_f, etc.

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Oct 15, 2016

Member

While we're at it, why not add .uuid, .base64, .random_bytes and .hex methods like SecureRandom. Implement a Random implementation that uses SecureRandom's random source, and remove the unnecessary distinction between Random and SecureRandom's interfaces.

Maybe people want less-secure UUIDs from a fast source, or maybe they want to use a secure random source with rand().

Member

RX14 commented Oct 15, 2016

While we're at it, why not add .uuid, .base64, .random_bytes and .hex methods like SecureRandom. Implement a Random implementation that uses SecureRandom's random source, and remove the unnecessary distinction between Random and SecureRandom's interfaces.

Maybe people want less-secure UUIDs from a fast source, or maybe they want to use a secure random source with rand().

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Oct 15, 2016

Member

And probably merge #2716 or something like it too.

Member

RX14 commented Oct 15, 2016

And probably merge #2716 or something like it too.

@oprypin

This comment has been minimized.

Show comment
Hide comment
@oprypin

oprypin Oct 15, 2016

Contributor

Implement a Random implementation that uses SecureRandom's random

@RX14 I had something like that planned (going step by step):

Now it is easy to make an RNG class based on urandom's bytes, like in Python, but that's not included in this PR

Contributor

oprypin commented Oct 15, 2016

Implement a Random implementation that uses SecureRandom's random

@RX14 I had something like that planned (going step by step):

Now it is easy to make an RNG class based on urandom's bytes, like in Python, but that's not included in this PR

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Oct 15, 2016

Member

@BlaXpirit adding helper methods for generating base64, hex, and at least a slice of random bytes probably should be in here though.

Member

RX14 commented Oct 15, 2016

@BlaXpirit adding helper methods for generating base64, hex, and at least a slice of random bytes probably should be in here though.

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 15, 2016

Contributor

@RX14 We can do small steps and make sure each of those steps is good. Then adding the next step becomes easier :-)

I'll merge this, I didn't get to review it with someone else but it looks really good to me, and it will give room to more improvements \o/

Big thank you, @BlaXpirit ❤️

Contributor

asterite commented Oct 15, 2016

@RX14 We can do small steps and make sure each of those steps is good. Then adding the next step becomes easier :-)

I'll merge this, I didn't get to review it with someone else but it looks really good to me, and it will give room to more improvements \o/

Big thank you, @BlaXpirit ❤️

@asterite asterite merged commit dc2e6d7 into crystal-lang:master Oct 15, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@asterite asterite added this to the 0.20.0 milestone Oct 15, 2016

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 15, 2016

Contributor

@RX14 The problem with SecureRandom is that if you want several bytes then it's a single urandom call, but here we'll have to do one urandom call for each byte. I don't know what a solution for that is, but in the meantime both classes are usable like that so it's not a big issue.

Contributor

asterite commented Oct 15, 2016

@RX14 The problem with SecureRandom is that if you want several bytes then it's a single urandom call, but here we'll have to do one urandom call for each byte. I don't know what a solution for that is, but in the meantime both classes are usable like that so it's not a big issue.

@oprypin

This comment has been minimized.

Show comment
Hide comment
@oprypin

oprypin Oct 16, 2016

Contributor

Aww, this was squash-merged :(

Contributor

oprypin commented Oct 16, 2016

Aww, this was squash-merged :(

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 16, 2016

Contributor

@BlaXpirit I did it because individual commits didn't compile. The first commit used rand_type which wasn't defined there, I think it got introduced in the third commit.

In any case, I don't consider commit history that important, and the whole PR makes sense as a whole. And sorry, maybe I should have asked or pointed that out.

Contributor

asterite commented Oct 16, 2016

@BlaXpirit I did it because individual commits didn't compile. The first commit used rand_type which wasn't defined there, I think it got introduced in the third commit.

In any case, I don't consider commit history that important, and the whole PR makes sense as a whole. And sorry, maybe I should have asked or pointed that out.

@oprypin

This comment has been minimized.

Show comment
Hide comment
@oprypin

oprypin Oct 16, 2016

Contributor

@asterite, oh, sorry. That's what I get for trying to make the commit history make sense after writing everything. Good that you noticed this.

Contributor

oprypin commented Oct 16, 2016

@asterite, oh, sorry. That's what I get for trying to make the commit history make sense after writing everything. Good that you noticed this.

@RX14

This comment has been minimized.

Show comment
Hide comment
@RX14

RX14 Oct 16, 2016

Member

I think commit history is very important, especially that all the commits compile. They're the only metadata to changes you get and so should have detailed explanations of anything that isn't clear in the commit.

Member

RX14 commented Oct 16, 2016

I think commit history is very important, especially that all the commits compile. They're the only metadata to changes you get and so should have detailed explanations of anything that isn't clear in the commit.

@asterite

This comment has been minimized.

Show comment
Hide comment
@asterite

asterite Oct 16, 2016

Contributor

@BlaXpirit I noticed the missing method and thought it was easier to squash than to ask you to rewrite history (I'm lazy to do that when someone asks me to, maybe because I'm really bad at git), but next time I'll ask before merging :-)

Contributor

asterite commented Oct 16, 2016

@BlaXpirit I noticed the missing method and thought it was easier to squash than to ask you to rewrite history (I'm lazy to do that when someone asks me to, maybe because I'm really bad at git), but next time I'll ask before merging :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment