Partially Fix Issue 18596: use arc4random when available for unpredictableSeed #6267

Merged
merged 1 commit into from Mar 22, 2018

Conversation

n8sh
Member

@n8sh n8sh commented Mar 12, 2018

EDIT: reduced this PR just to using arc4random when available. Other cases left for another PR.

Use arc4random when available, otherwise replace MinstdRand0 with xorshift64*/32.

This pull request does not seek to make std.random.unpredictableSeed cryptographically secure, but just correct some basic deficiencies mentioned in https://issues.dlang.org/show_bug.cgi?id=18596:

Currently std.random.unpredictableSeed returns the result of a thread-local MinstdRand0 instance xor'd against the clock. MinstdRand0 is slow (due to integer division) and somewhat outdated. A particular weakness of using MinstdRand0 is that it is very likely that consecutive calls to unpredictableSeed will return numbers that are identical in the high bit, since MinstdRand0 only produces results in the range 1 .. 2 ^^ 31 - 1.
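The high-bit point can be checked directly. A minimal sketch, assuming only `std.random.MinstdRand0` as shipped in Phobos; the helper name `highBitEverSet` and the seed value are arbitrary choices for illustration and are not part of this PR:

```d
import std.random : MinstdRand0;

/// Returns true if any of the first `count` outputs has bit 31 set.
/// (Hypothetical helper, not part of Phobos or of this PR.)
bool highBitEverSet(uint seed, size_t count)
{
    auto rng = MinstdRand0(seed);
    foreach (_; 0 .. count)
    {
        if (rng.front >>> 31)
            return true;
        rng.popFront();
    }
    return false;
}

unittest
{
    // MinstdRand0 works modulo 2^^31 - 1, so every output fits in 31 bits.
    assert(MinstdRand0.max < (1u << 31));
    assert(!highBitEverSet(42, 10_000));
}
```

Compiled with `dmd -unittest -main`, this only confirms that the generator itself never sets the top bit, so any variation in that bit of a seed has to come from the clock value it is xor'd with.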

Proposed remedy:

There are modern PRNG algorithms that have comparable state size to MinstdRand0 (64 bits or 32 bits) but are faster than MinstdRand0 and have output that scores better on randomness tests like BigCrush.

XorShift64*/32 is one example. (Results of randomness tests and speed tests appear in charts in this paper.) It has the virtue that 0 is an illegal state, which means that we don't need a separate flag to indicate whether it is initialized.
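To make the "0 is an illegal state" remark concrete, here is a rough sketch of an xorshift64*/32 step. The type name `XorShift64Star32` is hypothetical, and the shift triple (12, 25, 27) and the multiplier are the commonly published xorshift64* constants, used here as an assumption; they are not copied from this PR's diff:

```d
/// Sketch of xorshift64*/32: a 64-bit xorshift state whose output is the
/// high 32 bits of the state multiplied by an odd constant.
/// Hypothetical type for illustration; not the code from this PR.
struct XorShift64Star32
{
    ulong state; // 0 is the illegal state, so it can double as "not yet seeded"

    bool seeded() const @safe pure nothrow @nogc { return state != 0; }

    uint next() @safe pure nothrow @nogc
    {
        assert(seeded, "state must be nonzero");
        state ^= state >>> 12;
        state ^= state << 25;
        state ^= state >>> 27;
        return cast(uint) ((state * 0x2545_F491_4F6C_DD1DUL) >>> 32);
    }
}

unittest
{
    XorShift64Star32 gen;
    assert(!gen.seeded);      // default-initialized state is 0
    gen.state = 1;            // any nonzero value is a valid seed
    foreach (_; 0 .. 1000)
        gen.next();
    assert(gen.seeded);       // the xorshift step never maps a nonzero state to 0
}
```

Because the three xor-shift steps form an invertible linear map, a nonzero state can never become zero, which is why no separate "initialized" flag is needed.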

And:

On some platforms we can use functions like arc4random which incorporate system entropy and remove the need to roll our own entropy-gathering function to set an initial state for a PRNG.

@n8sh n8sh requested a review from wilzbach as a code owner March 12, 2018 10:14
@dlang-bot
Contributor

dlang-bot commented Mar 12, 2018

Thanks for your pull request and interest in making D better, @n8sh! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
18596 enhancement std.random.unpredictableSeed could use something better than MinstdRand0

@n8sh
Member Author

n8sh commented Mar 12, 2018

More ambitious work was previously explored by @yshui in pull #5230.

@wilzbach re: #6021 (review)

FWIW we should start porting the good stuff from mir-random to Phobos, for example, we could begin with unpredictableSeed.

This is a start, but there is still more to do before the parts you wrote for Linux and Windows are in. I wanted to start small.

Member

@wilzbach wilzbach left a comment

A few initial comments and questions

std/random.d Outdated
@property uint unpredictableSeed() @trusted nothrow @nogc
alias unpredictableSeed = unpredictableSeedOf!uint;
/// ditto
@property UIntType unpredictableSeedOf(UIntType)() @nogc nothrow @trusted
Member

As this is a new public symbol it would require a changelog entry + @andralex approval.
Maybe it's easier to set it to private for now?

Member Author

Since you have more experience here, if you think splitting it will speed along the review process I'll do that and amend the title.

Member

If you think splitting it will speed along the review process

Yeah I do think it will help. Approval of a new public symbol usually takes more than a month (once it's added it can't be removed or modified anymore), so I would recommend setting it to private and, once this PR is merged, opening a PR that just removes private. With this approach, you aren't blocked on the approval for new symbols.

Member Author

AFAIK a public alias of a private member doesn't work so I instead removed the unpredictableSeedOf!UIntType template.

std/random.d Outdated
if (!seeded)
version (AnyARC4Random)
{
// On macOS if we just need 32 bits it is faster to use
Member

AFAICT this is no longer only macOS, so "/On macOS/d"?

Member Author

I wrote "on macOS" because I've timed it and this way is faster on macOS, but I haven't timed it on other platforms. I suspect it's faster on OpenBSD etc. but I don't know.

Member

Ah, my point was just that the static if below is (UIntType.sizeof <= uint.sizeof), so the code doesn't single out macOS, while the comment mentions it as the motivation, which is a bit confusing.

I suspect it's faster on OpenBSD etc. but I don't know.

I think so too. It's just returning one register value and no looping should be necessary.

std/random.d Outdated
// generators, scrambled" (Vigna 2016).
x ^= x >>> 12;
x ^= x << 25;
x ^= x >>> 37;
Member

Can't we use the existing Xorshift range?

Member Author

I would like to but I can't. It is broken for anything but 32-bit xorshift.

Member Author

@n8sh n8sh Mar 12, 2018

I mean xorshift with 32-bit words, although it supports several sizes of that.

EDIT: earlier reported at https://issues.dlang.org/show_bug.cgi?id=18327 "std.random.XorshiftEngine is parameterized by UIntType but only works with uint"

Member

@wilzbach wilzbach Mar 12, 2018

Hmm, that sucks. You are referring to https://issues.dlang.org/show_bug.cgi?id=18327, right?

(edit: I didn't see your response before)

std/random.d Outdated
ulong tid = cast(ulong) &_seeder; // Distinct for each thread.
tid *= m;
tid = (tid ^ (tid >>> 47)) * m;
result = (result ^ tid) * m;
Member

This is repeated three times; couldn't it be put in a shift function?

Member Author

It was tiny enough that I didn't think it was worth the bother, but it might be clearer this way. Will do.

@wilzbach
Member

This is a start, but there is still more to do before the parts you wrote for Linux and Windows are in. I wanted to start small.

Fair enough. Thanks a lot for picking this up! I haven't done much with random numbers lately, so this unfortunately was lost in my TODO queue.

but there is still more to do before the parts

Do you plan to replace bootstrapSeed with getRandomX then and just use it as fallback?

FWIW we should start porting the good stuff from mir-random to Phobos

Thinking more about it, except for unpredictableSeed it is probably quite hard to do so, and it's more or less a lost cause because we can't make breaking changes. At some point we should probably just deprecate std.random and adapt mir-random to e.g. std.math.random.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch from 74ef55f to 7ba2c23 Compare March 12, 2018 10:52
@n8sh n8sh changed the title from "Fix Issue 18595 & Issue 18596: add unpredictableSeedOf!UIntType and unpredictableSeed could use something better than MinstdRand0" to "Fix 18596: unpredictableSeed could use something better than MinstdRand0" Mar 12, 2018
std/random.d Outdated
{
ulong result = void;
enum ulong m = 0xc6a4_a793_5bd1_e995UL; // MurmurHash2_64A constant.
void update_result(ulong x)
Member

Sorry for the nitpicking, but the DStyle requires camelCase not snake_case.

Member Author

Fixed.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch 2 times, most recently from 15e5ec5 to bfaf31c Compare March 12, 2018 11:04
@wilzbach
Member

AFAIK a public alias of a private member doesn't work so I instead removed the unpredictableSeedOf!UIntType template.

Yeah you are correct: https://run.dlang.io/is/Fk9UQy (sorry)

@n8sh
Member Author

n8sh commented Mar 12, 2018

Do you plan to replace bootstrapSeed with getRandomX then and just use it as fallback?

Yeah, either that or internally use getRandomX to implement unpredictableSeed, either way. Might vary by platform depending on benchmarks. On Windows I would probably use CryptGenRandom to produce the result like in mir.random.unpredictableSeed since it seems fast enough.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch from bfaf31c to 84287ed Compare March 12, 2018 11:28
@JackStouffer
Member

This needs two notes added to the docs:

  1. How secure/insecure this is for cryptographic purposes
  2. I know it's common for users to re-seed their RNGs often with something like this function. As I understand it, that actually makes for lower entropy. If so, please make a note explaining this.

@n8sh
Member Author

n8sh commented Mar 12, 2018

  1. How secure/insecure this is for cryptographic purposes

There is no change to the existing level of security (none). This is currently documented at the top of the file:

$(RED Disclaimer:) The _random number generators and API provided in this
module are not designed to be cryptographically secure, and are therefore
unsuitable for cryptographic or security-related purposes such as generating
authentication tokens or network sequence numbers. For such needs, please use a
reputable cryptographic library instead.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch from 84287ed to 5eab123 Compare March 12, 2018 13:53
@n8sh
Member Author

n8sh commented Mar 12, 2018

  1. I know it's common for users to re-seed their RNGs often with something like this function. As I understand it, that actually makes for lower entropy. If so, please make a note explaining this.

Added this:

Note:
In general periodically 'reseeding' a PRNG does not improve its quality
and in some cases may harm it. For an extreme example the Mersenne
Twister has `2 ^^ 19937 - 1` distinct states but after `seed(uint)` is
called it can only be in one of `2 ^^ 32` distinct states regardless of
how excellent the source of entropy is.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch 5 times, most recently from a3ce0b0 to 9eea216 Compare March 12, 2018 14:58
@JackStouffer
Member

Pinging people who are qualified to review this @wilzbach @quickfur @andralex

Member

@andralex andralex left a comment

I'm not an expert but before we'd use the pid, the tid, and the time as sources of "surprise". Now we're only using the time. Isn't that a step backwards? E.g. several threads call the function simultaneously.

@n8sh
Member Author

n8sh commented Mar 19, 2018

I'm not an expert but before we'd use the pid, the tid, and the time as sources of "surprise". Now we're only using the time. Isn't that a step backwards?

Before, once per thread, we used the pid, tid, and time to initialize a thread-local PRNG, and we used the output of that PRNG mixed with the time to produce seeds. We're still doing that. EDIT: The once-per-thread initialization using pid+tid+time is in the private bootstrapSeed function.
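For readers following along, the kind of pid+tid+time mixing discussed here can be sketched as below. This is only an illustration under assumptions: `mixIn`, `perThreadMarker` and `bootstrapSketch` are hypothetical names, the mixing step follows the MurmurHash2_64A pattern visible in the diff excerpts in this thread, and the actual private bootstrapSeed function may differ:

```d
import core.time : MonoTime;
import std.process : thisProcessID;

/// One MurmurHash2_64A-style mixing step: folds `k` into the running hash `h`.
ulong mixIn(ulong h, ulong k) @safe pure nothrow @nogc
{
    enum ulong m = 0xc6a4_a793_5bd1_e995UL; // MurmurHash2_64A constant
    k *= m;
    k = (k ^ (k >>> 47)) * m;
    return (h ^ k) * m;
}

private int perThreadMarker; // module-level variables are thread-local in D

/// Hypothetical once-per-thread bootstrap value built from time, pid and tid.
ulong bootstrapSketch() @trusted
{
    ulong h = cast(ulong) MonoTime.currTime.ticks;
    h = mixIn(h, cast(ulong) thisProcessID);
    h = mixIn(h, cast(ulong) &perThreadMarker); // address is distinct per thread
    return h;
}

unittest
{
    assert(mixIn(1, 2) == mixIn(1, 2)); // deterministic
    assert(mixIn(1, 2) != mixIn(1, 3)); // each step is a bijection in k
}
```

The value would be computed once per thread to initialize the thread-local generator; later seeds then come from that generator's output mixed with the time.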

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch from 9eea216 to 2e12e1e Compare March 19, 2018 04:53
@joseph-wakeling-sociomantic

otherwise replace MinstdRand0 with xorshift64*/32.

I'm not keen on that particular change because those are generators a user might reasonably want to use from within their own app, and hence it's probably not a good idea to have the unpredictable seed make use of those same algorithms.

Before making any change it's probably worth asking what the state of the art is in other languages and libraries.

std/random.d Outdated
}
else
{
private ulong _seeder; // 0 indicates uninitialized.

Minor note: this can be made a static variable inside the bootstrapSeed function, no?

Member Author

It could be inside unpredictableSeed but I put it outside so it could be used in several different functions (for example, if in the future there are variants of unpredictableSeed with outputs of different sizes).

@joseph-wakeling-sociomantic

Before making any change it's probably worth asking what the state of the art is in other languages and libraries.

To expand on that: the principal problem I see here is that the fallback (non-arc4) solution doesn't meaningfully improve on the mechanism used. It just switches out the pseudo-random algorithm that is used to transform the pid+tid+time-derived seed.

It would be better to look at improved mechanisms for that, in general, than to think that switching the pseudo-random component particularly gives us any benefit.

@n8sh
Member Author

n8sh commented Mar 21, 2018

What I would suggest is that we separate out concerns, i.e. we don't couple the decision to prefer arc4 where available, with the decision to rewrite the existing unpredictableSeed mechanism. Does that sound reasonable?

Sure.

@n8sh n8sh force-pushed the unpredictableSeedOf-arc4random branch from 2e12e1e to f39686c Compare March 22, 2018 07:59
@n8sh n8sh changed the title from "Fix 18596: unpredictableSeed could use something better than MinstdRand0" to "Partially Fix Issue 18596: use arc4random when available for unpredictableSeed" Mar 22, 2018
@n8sh
Member Author

n8sh commented Mar 22, 2018

Reduced this PR just to using arc4random when available. Other cases left for another PR.

@joseph-wakeling-sociomantic

Now, specifically w.r.t. whether we should prefer (a)RC4 where it is available: how does this proposal fit with the fact that RC4 is being phased out in other contexts?
https://tools.ietf.org/html/rfc7465
https://www.computerworld.com/article/2489395/encryption/microsoft-continues-rc4-encryption-phase-out-plan-with--net-security-updates.html

@JackStouffer I'm not sure whether we should approve this without discussing RC4/arc4random itself. On the contrary I think it would be a good idea to consider, in detail, the range of options for improving unpredictableSeed.

In particular, w.r.t. this existing doc:

$(RED Disclaimer:) The _random number generators and API provided in this
module are not designed to be cryptographically secure, and are therefore
unsuitable for cryptographic or security-related purposes such as generating
authentication tokens or network sequence numbers. For such needs, please use a
reputable cryptographic library instead.

.... this disclaimer is not a good reason to accept solutions that are considered inadequate or deprecated in other contexts.

@JackStouffer
Member

.... this disclaimer is not a good reason to accept solutions that are considered inadequate or deprecated in other contexts.

On the contrary, the disclaimer is the very reason we can accept it where in other contexts, others cannot. If arc4random produces more randomized results (and is faster) than the current function when available, it should be of no consequence whether it's not suitable for securing data, as that is expressly not the purpose of std.random. As far as I can tell, arc4random is better than the current solution.

@n8sh
Member Author

n8sh commented Mar 22, 2018

Now, specifically w.r.t. whether we should prefer (a)RC4 where it is available: how does this proposal fit with the fact that RC4 is being phased out in other contexts?

FYI, on a number of platforms arc4random doesn't actually use RC4 but some modern cipher. See the code comments in this PR for details.

@joseph-wakeling-sociomantic

On the contrary, the disclaimer is the very reason we can accept it where in other contexts, others cannot.

If other contexts are abandoning RC4, might it not be a good idea to actually examine what they are preferring?

After all, it's not a great look to be newly adopting something that others are actively abandoning. It might even have consequences for adoption (I wouldn't want to assume that, for example, there might not be new standards banning use of any library that uses RC4). We should at least consider whether there are alternative, straightforwardly better options that can be implemented just as readily.

I am strongly opposed to that disclaimer being used as an excuse to accept changes that deliver less than the best possible solution, given the current state of knowledge.

FYI, on a number of platforms arc4random doesn't actually use RC4 but some modern cipher. See the code comments in this PR for details.

That matches my recollection. FWIW I would personally prefer that, if we do this, we only use arc4random on those platforms that are known to have improved implementations.

@joseph-wakeling-sociomantic

BTW, @n8sh, just for clarity: I am really pleased that you are working on unpredictableSeed and trying to improve it. I just think that, given the currently not-great situation of it, we might as well put the work in to try to understand all the possible improvements we could make.

It might well be a good effort/result tradeoff to switch to arc4random on selected platforms, but it's worth being clear what the state of the art is and how much trouble it would be to use/implement.

So, I would encourage you not to constrain yourself by concerns like minimal change, or the disclaimer about non-crypto. I'd rather suggest considering the question: if you were designing unpredictableSeed from scratch, to be as good as it can be, how should it be implemented?

@wilzbach
Member

if you were designing unpredictableSeed from scratch, to be as good as it can be, how should it be implemented?

Like mir-random (that applies for almost everything in std.random).
Imho std.random is an (almost) lost cause.

@joseph-wakeling-sociomantic

So what does mir-random do for its unpredictable seed?

@joseph-wakeling-sociomantic

BTW folks, I can't help but feel that, when one reviewer is raising concerns over a PR, it's rather uncool to just click "Approve" without offering some sort of response to those concerns?

@wilzbach
Member

Reduced this PR just to using arc4random when available. Other cases left for another PR.

Ugh. I am sorry @n8sh. I wanted to leave this PR open for a few days and didn't expect this reaction.
FTR

  • I liked the other bits of your PR and hope you aren't too demotivated by this discussion.
  • I agree with the others that unpredictableSeed doesn't need to be cryptographically secure, because nothing in std.random is.

@wilzbach
Member

I can't help but feel that, when one reviewer is raising concerns over a PR, it's rather uncool to just click "Approve" without offering some sort of response to those concerns?

Your concerns have already been addressed by other people. I disagree with your assessment (and I did comment stating this, though that had a bit of a delay from my phone).
Hence, I show my support for these changes by approving them.

So what does mir-random do for its unpredictable seed?

This is/was a preparation for porting unpredictableSeed from mir-random to Phobos. Well, before it was cut down to the current state.
For example, it uses Linux's "new" built-in kernel API:
https://github.com/libmir/mir-random/blob/master/source/mir/random/engine/package.d

@wilzbach
Member

Oh and for the record LLVM's STL uses arc4random for its seeding too: https://github.com/llvm-mirror/libcxx/blob/master/src/random.cpp
(see also the other links and sources posted here: libmir/mir-random#13)

@JackStouffer
Member

@joseph-wakeling-sociomantic If there exist better functions than arc4random for this purpose, then I agree that it would make sense to use those if possible. My issue was that we would be stalling this PR for concerns about security when that's not the purpose of std.random.

Can't this also be addressed in a follow-up PR? This is ready to go and is an improvement over the status quo.

@wilzbach
Member

It might well be a good effort/result tradeoff to switch to arc4random on selected platforms, but it's worth being clear what the state of the art is and how much trouble it would be to use/implement.

For the record, arc4random is still state of the art on BSD-like platforms (as mentioned, C++'s STL uses it too) and is a lot better than combining the Thread ID + current timestamp. IIRC only FreeBSD/DragonFlyBSD don't use ChaCha20, but it's an OS API like /dev/urandom and /dev/random, and as with /dev/random you essentially have to trust the OS.

There are many articles online like
http://nshipster.com/random/#why-should-i-use-arc4random(3)-instead-of-rand(3)-or-random(3)? or https://security.stackexchange.com/questions/85601/is-arc4random-secure-enough. The main point is that neither /dev/(u)random nor arc4random is perfect, but calling arc4random is faster than going through the file-based /dev/(u)random API.
This PR is about (step-by-step) moving away from the home-brewed ThreadID + Timestamp seeding approach, which arc4random obviously improves on.

@wilzbach
Member

FYI, on a number of platforms arc4random doesn't actually use RC4 but some modern cipher. See the code comments in this PR for details.

(for the record copied over from this excellent SO post):

Also as pointed out on SO

@joseph-wakeling-sociomantic

My issue was that we would be stalling this PR for concerns about security when that's not the purpose of std.random.

We are having a discussion about what the options are and why we might pick one or another, which leaves a detailed, documented record of the reasons why an important change like this gets made.

It's important to have that discussion -- and that record -- not only because it means that people can then look back later and understand the reasons for the change, but also because it means that we make sure that the change is really the right one for the project.

For example, arc4random is being preferred over /dev/urandom because it's faster ... but does it actually matter to be as fast as possible to generate an unpredictableSeed? How many unpredictableSeed calls do you want to make in any given application, and is it ever likely to be a bottleneck?

An alternative example: @n8sh remarked earlier that the choice of fallback solution was a compromise to ensure minimal behavioural change compared to existing unpredictableSeed. I'm interested in encouraging him to not feel constrained in that way, but to feel free to offer what he thinks is the best solution.

Can't this also be addressed in a follow-up PR? This is ready to go and is an improvement over the status quo.

Well, it would be nice to avoid code churn. It's arguably preferable to identify what we really think is the best option and make that happen ... than to have a series of contradictory changes scattered across different releases. For example, it doesn't look like the mir-random approach would be that hard to adapt for phobos. If that's viewed as preferable, why go for a halfway house?

BTW please note at no point have I said "This PR should not be merged." What I've asked for is clarity on the pros, cons, and alternatives.

I liked the other bits of your PR and hope you aren't too demotivated by this discussion.

For the record, @n8sh, if you're ever feeling demotivated by anything I say, or that I'm being unfair or creating hassle for you ... just raise the point with me. I don't bite, I do care that you feel positive about contributing, and my intention with any review is to ensure the most effective possible changes for phobos. My impression was that you were enjoying the discussion and welcoming the opportunity to clarify details of the impact of these changes.

@wilzbach note that as a result of this discussion, we now have a much better body of material explaining the context and meaning of this proposed change. That is considerably more helpful (for all of us, and for anyone wanting to understand this code) than just an "agree" or "disagree" based on reasons that only exist in the heads of the people (dis)agreeing.

Contributor

@WebDrake WebDrake left a comment

Switching to my home account to write up a slightly more detailed set of feedback and thoughts.

While I recognize the general point about std.random not guaranteeing cryptographic security, I don't think that's a valid argument for introducing a dependency on a standard that is deprecated by the body that proposed it. But a possible compromise might be as follows:

  • use arc4random in unpredictableSeed only for version (SecureARC4Random) (i.e. where ChaCha20 or AES is the underlying implementation);

  • expose arc4random, arc4random_buf and arc4random_uniform publicly for any platform that defines them, so that any user can use them (note, we might want to do this in some other module than std.random given that it's platform-dependent);

  • ensure the different ARC4Random versions are documented, so the user knows how to check the state of things on their platform.

In this way, the user has a free choice to use arc4random directly regardless of the security level, but unpredictableSeed only uses it where the underlying implementation is not a deprecated/legacy version.
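As a rough illustration of the first bullet, the gating could look something like the sketch below. The version identifier `SecureARC4Random` comes from this discussion, but the platform list is an assumption made for the example (only OpenBSD is listed), and `seedSketch` is a hypothetical name, not a Phobos symbol:

```d
// Hypothetical platform classification, for illustration only.
version (OpenBSD) version = SecureARC4Random;

version (SecureARC4Random)
{
    // C library function; declared here because this sketch is standalone.
    extern(C) private uint arc4random() @nogc nothrow @safe;

    /// Use the OS generator where its underlying cipher is considered current.
    uint seedSketch() @nogc nothrow @safe { return arc4random(); }
}
else
{
    import std.random : unpredictableSeed;

    /// Everywhere else, keep whatever Phobos already does.
    uint seedSketch() @safe { return unpredictableSeed; }
}

unittest
{
    auto s = seedSketch();
    static assert(is(typeof(s) == uint));
}
```

The point of the split is that platforms with only a legacy arc4random fall through to the `else` branch and so share whatever longer-term fallback is eventually adopted.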

The notes below should give some more detail on some of these points.

{
extern(C) private @nogc nothrow
{
uint arc4random() @safe;
Contributor

Is there any reason why arc4random should not be exposed publicly on platforms that offer it, together with arc4random_buf and arc4random_uniform ... ? Perhaps given that these are system-dependent it might be better not to expose them via phobos, but I don't see any reason per se why they should not be made available.

Assuming it can be made public, I would suggest documenting the SecureARC4Random, LegacyARC4Random and AnyARC4Random versions in its documentation, so that the user knows how to check for themselves what quality standards it meets for their platform.

Member Author

Is there any reason why arc4random should not be exposed publicly on platforms that offer it, together with arc4random_buf and arc4random_uniform ... ?

This PR has been edited not to introduce any new public symbols because new public symbols require @andralex's approval. See discussion: #6267 (comment)

Contributor

(If we wanted to be more D-esque, the arc4random_buf implementation might be provided as arc4random_buf(void[] buf) and invoke the C API version underneath the hood; or that could be provided as an additional overload.)

Contributor

This PR has been edited not to introduce any new public symbols because new public symbols require @andralex's approval. See discussion: #6267 (comment)

Fair enough, but note that this does not mean we cannot pursue the compromise outlined above: we'd just have to split between the changes to unpredictableSeed and the introduction of public arc4random functions.

In the event of serious hassle over the latter, we could always revisit the question of which platforms get the under-the-hood arc4random unpredictableSeed.

Member Author

mir-random does something along those lines, but abstracts away the difference between platforms.

size_t genRandomNonBlocking()(scope ubyte[] buffer) @nogc nothrow @trusted
{
    pragma(inline, true);
    return mir_random_genRandomNonBlocking(buffer.ptr, buffer.length);
}

mir_random_genRandomNonBlocking (the awkward name is for C linkage) could be calling arc4random_buf, or it could be using Linux's getrandom syscall, etc. I think that's the right kind of interface.

Member

@wilzbach wilzbach Mar 22, 2018

Yep, introducing new public symbols is a tricky business.
And even if we get an approval, it should probably go to std.experimental.random as experience shows that it's really hard to get all use-cases right.

If we wanted to be more D-esque,

There are a lot of things that could be done, e.g. ubyte[] (and of course camelCase), but I doubt that a user is ever going to care about the underlying details. What they care about is:

  • cross-platform
  • blocking/non-blocking
  • @nogc vs. GC
  • entropy quality
  • speed

Sometimes these concerns can be combined, but they don't want to special-case their code just because their code might be used on OSX too.

Is there any reason why arc4random should not be exposed publicly on platforms that offer it,

See reasons above + it's really easy to do so yourself as a user if you really want to.

Member

introduction of public arc4random functions.

BTW declaration of extern(C) functions is done in DRuntime. Phobos is supposed to contain the cross-platform high-level abstractions.

Contributor

That's what I assumed, hence the remarks about not exposing them via phobos.

and in some cases may harm it. For an extreme example the Mersenne
Twister has `2 ^^ 19937 - 1` distinct states but after `seed(uint)` is
called it can only be in one of `2 ^^ 32` distinct states regardless of
how excellent the source of entropy is.
Contributor

This is a great piece of documentation, which I would suggest extending with discussion on how to more effectively seed generators (for example, MersenneTwisterEngine exposes a method to seed the generator using an InputRange of random words).

However, I would suggest not coupling it with these changes (it's independent of them) and submitting it in a separate PR.
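For reference, a minimal sketch of what such range-based seeding might look like, assuming the `seed` overload of `MersenneTwisterEngine` that accepts an input range of `uint` words. The helper name `makeWellSeededMt` is hypothetical. Every word of the generator's state is drawn from a separate `unpredictableSeed` call, so the state is no longer restricted to `2 ^^ 32` possibilities by construction; how much real entropy ends up in it still depends on `unpredictableSeed` itself.

```d
import std.algorithm.iteration : map;
import std.random : Mt19937, unpredictableSeed;
import std.range : repeat;

/// Hypothetical helper: seeds the full Mersenne Twister state from an
/// infinite range of words instead of from a single `uint`.
Mt19937 makeWellSeededMt()
{
    Mt19937 gen;
    // `repeat(0)` is only a driver; each element is replaced by a fresh seed word.
    gen.seed(map!((a) => unpredictableSeed)(repeat(0)));
    return gen;
}

unittest
{
    auto gen = makeWellSeededMt();
    immutable first = gen.front; // a uint drawn from the seeded generator
    gen.popFront();
    assert(!gen.empty);          // the engine is an infinite range
}
```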

Member

While separating PRs of course has advantages, it also comes with the additional overhead of creating and reviewing them.
So it's really OK to include doc updates for a function that the PR already touches. No need for the extra work.

static bool seeded;
static MinstdRand0 rand;
if (!seeded)
version (AnyARC4Random)
Contributor

As a conservative first option, I would suggest only using arc4random for version (SecureARC4Random).

The consideration here is that in any case we still need to do something to improve unpredictableSeed for platforms that don't define arc4random. Platforms that have only a legacy arc4random may well benefit more from sharing that longer-term solution than from the legacy arc4random. If we just switch now, the risk is that when the better solution is there, they don't get it, because everyone forgets that the legacy arc4random question needs revisiting.

I think it's worth keeping up the pressure to do better ourselves, rather than just outsourcing to a platform implementation that uses a deprecated standard.
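
To make the two options concrete, here is a hedged sketch of how the gating could differ. `SecureARC4Random` and `AnyARC4Random` are the version identifiers discussed in this PR; `LegacyARC4Random`, the `SeedSource` enum, both helper functions, and the two example platform assignments are assumptions for illustration only, not the PR's actual list.

```d
// Illustrative only; the platform assignments below are assumptions.
version (OpenBSD) version = SecureARC4Random; // arc4random backed by ChaCha20
version (FreeBSD) version = LegacyARC4Random; // arc4random backed by ARC4 at the time of this PR
version (SecureARC4Random) version = AnyARC4Random;
version (LegacyARC4Random) version = AnyARC4Random;

enum SeedSource { arc4random, fallback }

/// Conservative option: only trust arc4random where it is a modern implementation.
SeedSource conservativeChoice() @safe pure nothrow @nogc
{
    version (SecureARC4Random)
        return SeedSource.arc4random;
    else
        return SeedSource.fallback;
}

/// Permissive option: use arc4random wherever it exists, legacy or not.
SeedSource permissiveChoice() @safe pure nothrow @nogc
{
    version (AnyARC4Random)
        return SeedSource.arc4random;
    else
        return SeedSource.fallback;
}

unittest
{
    // Anything the conservative gate accepts, the permissive gate accepts too.
    if (conservativeChoice() == SeedSource.arc4random)
        assert(permissiveChoice() == SeedSource.arc4random);
}
```

Under the conservative gate, the legacy platforms would keep sharing whatever fallback the remaining platforms use.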

Member Author

> The consideration here is that in any case we still need to do something to improve unpredictableSeed for platforms that don't define arc4random.

These other solutions will also be platform-specific. There is nothing on the immediate horizon better than arc4random that will be usable on FreeBSD.

Contributor

> These other solutions will also be platform-specific. There is nothing on the immediate horizon better than arc4random that will be usable on FreeBSD.

OK, fair enough.

Would you be OK with the idea of breaking out the dangers-of-unwise-seeding doc into a separate patch (just to separate out the concerns)? Otherwise, given the case you've made, I'm fine to move forward with this.

Member

> Would you be OK with the idea of breaking out the dangers-of-unwise-seeding doc into a separate patch (just to separate out the concerns)?

Imho there's no need for this. PRs can have doc updates too, this is already a tiny PR and it's related to this PR. This has never been an official criterion for a PR; it's only good practice to keep the focus of a PR small and limited to a single module.

Contributor

> Imho there's no need for this. PRs can have doc updates too, this is already a tiny PR and it's related to this PR.

You'll note I said separate patch in the comment you are responding to. (I did make an earlier remark about separate PR, but that was before responses that made clear that this PR should go forward essentially as-is.)

// cryptographically secure sources of randomness are needed.

// Performance note: ChaCha20 is about 70% faster than ARC4, contrary to
// what one might assume from it being more secure.
Contributor

Minor note/query: I believe at least some /dev/urandom implementations use ChaCha20. How does this impact the question of speed of /dev/urandom versus arc4random ... ? Is it worth considering as a factor?

// of randomness, and also so other people reading this source code (as
// Phobos is often looked to as an example of good D programming practices)
// do not mistakenly use insecure versions of arc4random in contexts where
// cryptographically secure sources of randomness are needed.
Contributor

Oh, and one note that my review missed: this discussion comment is great, but I would suggest moving it inside the unpredictableSeed function, where it is most relevant. Assuming that arc4random might be made public, some of the discussion points could be covered in its documentation instead.

@n8sh
Member Author

n8sh commented Mar 22, 2018

> Well, it would be nice to avoid code churn. It's arguably preferable to identify what we really think is the best option and make that happen ... than to have a series of contradictory changes scattered across different releases.

The roadmap forward would be adding more version blocks, so each subsequent PR only adds new cases rather than deleting or reversing previous PRs. Not removing MinstdRand0 in this PR means that there will have to be a little churn when it is replaced because those lines were touched by this PR as an indentation change, but besides that the structure of the code still lends itself to incremental upgrades.

> Can't this also be addressed in a follow-up PR? This is ready to go and is an improvement over the status quo.

I second this sentiment.

@wilzbach
Member

FYI: I now put this PR on the merge-queue. Thanks again @n8sh for the hard work!
@joseph-wakeling-sociomantic @WebDrake sorry for being a bit harsh. You are right that what is obvious to one person (because it's "in the head") often isn't obvious to others, so thanks a lot for all your comments and help!
I did put the PR on the merge queue because we have never really been picky about documentation concerns of a PR, as once the code has been added everyone (and not only the PR author) can improve the documentation.

> This is a great piece of documentation, which I would suggest extending with discussion on how to more effectively seed generators

This doesn't block this PR - documentation updates can always follow up after this has been merged ;-)

@WebDrake
Contributor

> I did put the PR on the merge queue because we have never really been picky about documentation concerns of a PR, as once the code has been added everyone (and not only the PR author) can improve the documentation.

No worries, your call ;-) I tend to encourage people to take every opportunity to separate out the concerns of a changeset, just to be clear what really goes together; but of course there are bigger fish to fry.

Thanks @n8sh for a nice piece of work, and for all your patience and effort responding to questions.
