New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partially Fix Issue 18596: use arc4random when available for unpredictableSeed #6267
Conversation
Thanks for your pull request and interest in making D better, @n8sh! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
|
More ambitious work was previously explored by @yshui in pull #5230.
This is a start, but there is still more to do before the parts you wrote for Linux and Windows are in. I wanted to start small. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comments and questions
std/random.d
Outdated
@property uint unpredictableSeed() @trusted nothrow @nogc | ||
alias unpredictableSeed = unpredictableSeedOf!uint; | ||
/// ditto | ||
@property UIntType unpredictableSeedOf(UIntType)() @nogc nothrow @trusted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is a new public symbol it would require a changelog entry + @andralex approval.
Maybe it's easier to set it to private
for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have more experience here, if you think splitting it will speed along the review process I'll do that and amend the title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f you think splitting it will speed along the review process
Yeah I do think it will help. Approval of a new public symbol takes usually > one month (as once its added it can't be removed or modified anymore), so I would recommend setting it to private
and once this PR is merged, opening a PR that just removes private
. With this approach, you aren't blocked on the approval for new symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK a public alias of a private member doesn't work so I instead removed the unpredictableSeedOf!UIntType
template.
std/random.d
Outdated
if (!seeded) | ||
version (AnyARC4Random) | ||
{ | ||
// On macOS if we just need 32 bits it is faster to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT is this no longer only macOS, so "/On macOS/d"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote "on macOS" because I've timed this way is faster on macOS but I haven't timed it on other platforms. I suspect it's faster on OpenBSD etc. but I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my point was just that the static if
below is (UIntType.sizeof <= uint.sizeof)
, so you don't distinguish between macOS, but mention it in your motivation which is a bit confusing.
I suspect it's faster on OpenBSD etc. but I don't know.
I think so too. It's just returning one register value and no looping should be necessary.
std/random.d
Outdated
// generators, scrambled" (Vigna 2016). | ||
x ^= x >>> 12; | ||
x ^= x << 25; | ||
x ^= x >>> 37; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use the existing Xorshift
range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to but I can't. It is broken for anything but 32-bit xorshift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean xorshift with 32-bit words, although it supports several sizes of that.
EDIT: earlier reported at https://issues.dlang.org/show_bug.cgi?id=18327 "std.random.XorshiftEngine is parameterized by UIntType but only works with uint"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that sucks. You are referring to https://issues.dlang.org/show_bug.cgi?id=18327, right?
(edit: I didn't see your response before)
std/random.d
Outdated
ulong tid = cast(ulong) &_seeder; // Distinct for each thread. | ||
tid *= m; | ||
tid = (tid ^ (tid >>> 47)) * m; | ||
result = (result ^ tid) * m; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is repeated for three times, couldn't it be but in a shift
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was tiny enough that I didn't think it was worth the bother, but it might be clearer this way. Will do.
Fair enough. Thanks a lot for picking this up! I haven't done much with random numbers lately, so this unfortunately was lost in my TODO queue.
Do you plan to replace
Thinking more about it, except for |
74ef55f
to
7ba2c23
Compare
std/random.d
Outdated
{ | ||
ulong result = void; | ||
enum ulong m = 0xc6a4_a793_5bd1_e995UL; // MurmurHash2_64A constant. | ||
void update_result(ulong x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the nitpicking, but the DStyle requires camelCase not snake_case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
15e5ec5
to
bfaf31c
Compare
Yeah you are correct: https://run.dlang.io/is/Fk9UQy (sorry) |
Yeah, either that or internally use |
bfaf31c
to
84287ed
Compare
This needs two notes added to the docs:
|
There is no change to the existing level of security (none). This is currently documented at the top of the file:
|
84287ed
to
5eab123
Compare
Added this:
|
a3ce0b0
to
9eea216
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert but before we'd use the pid, the tid, and the time as sources of "surprise". Now we're only using the time. Isn't that a step backwards? E.g. several threads call the function simultaneously.
Before once for each thread we used the pid and tid and time to initialize a thread-local PRNG, and used the output of that PRNG mixed with the time to produce seeds. We're still doing that. EDIT: The once-per-thread initialization using pid+tid+time is in the private |
9eea216
to
2e12e1e
Compare
I'm not keen on that particular change because those are generators a user might reasonably want to use from within their own app, and hence it's probably not a good idea to have the unpredictable seed make use of those same algorithms. Before making any change it's probably worth asking what the state of the art is in other languages and libraries. |
std/random.d
Outdated
} | ||
else | ||
{ | ||
private ulong _seeder; // 0 indicates uninitialized. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note: this can be made a static
variable inside the bootstrapSeed
function, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be inside unpredictableSeed
but I put it outside so it could be used in several different functions (for example, if in the future there are variants of unpredictableSeed
with outputs of different sizes).
To expand on that: the principal problem I see here is that the fallback (non-arc4) solution doesn't meaningfully improve on the mechanism used. It just switches out the pseudo-random algorithm that is used to transform the pid+tid+time-derived seed. It would be better to look at improved mechanisms for that, in general, than to think that switching the pseudo-random component particularly gives us any benefit. |
Sure. |
2e12e1e
to
f39686c
Compare
Reduced this PR just to using arc4random when available. Other cases left for another PR. |
Now, specifically w.r.t. whether we should prefer @JackStouffer I'm not sure whether we should approve this without discussing RC4/ In particular, w.r.t. this existing doc:
.... this disclaimer is not a good reason to accept solutions that are considered inadequate or deprecated in other contexts. |
On the contrary, the disclaimer is the very reason we can accept it where in other contexts, others cannot. If |
FYI, on a number of platforms |
If other contexts are abandoning RC4, might it not be a good idea to actually examine what they are preferring? After all, it's not a great look to be newly adopting something that others are actively abandoning. It might even have consequences for adoption (I wouldn't want to assume that, for example, there might not be new standards banning use of any library that uses RC4). We should at least consider whether there are alternative, straightforwardly better options that can be implemented just as readily. I am strongly opposed to that disclaimer being used as an excuse to accept changes that deliver less than the best possible solution, given the current state of knowledge.
That matches my recollection. FWIW I would personally prefer that if we do we only use those implementations that are known to have improved implementations. |
BTW, @n8sh, just for clarity: I am really pleased that you are working on It might well be a good effort/result tradeoff to switch to So, I would encourage you not to constrain yourself by concerns like minimal change, or the disclaimer about non-crypto. I'd rather suggest considering the question: if you were designing |
Like mir-random (that applies for almost everything in std.random). |
So what does |
BTW folks, I can't help but feel that, when one reviewer is raising concerns over a PR, it's rather uncool to just click "Approve" without offering some sort of response to those concerns? |
Ugh. I am sorry @n8sh. I wanted to leave this PR open for a few days and didn't expect this reaction.
|
Your concerns have already been addressed by other people. I disagree on your assessment (and I did comment stating this though that had a bit of a delay from my phone).
This is/was a preparation for porting unpredictableSeed from mir-random to Phobos. Well, before it was cut down to the current state. |
Oh and for the record LLVM's STL uses |
@joseph-wakeling-sociomantic If there exists better functions than Can't this also be addressed in a follow-up PR? This is ready to go and is an improvement over the status quo. |
For the record, There are many articles online like |
(for the record copied over from this excellent SO post):
Also as pointed out on SO
|
We are having a discussion about what the options are and why we might pick one or another, which leaves a detailed, documented record of the reasons why an important change like this gets made. It's important to have that discussion -- and that record -- not only because it means that people can then look back later and understand the reasons for the change, but also because it means that we make sure that the change is really the right one for the project. For example, An alternative example: @n8sh remarked earlier that the choice of fallback solution was a compromise to ensure minimal behavioural change compared to existing
Well, it would be nice to avoid code churn. It's arguably preferable to identify what we really think is the best option and make that happen ... than to have a series of contradictory changes scattered across different releases. For example, it doesn't look like the mir-random approach would be that hard to adapt for phobos. If that's viewed as preferable, why go for a halfway house? BTW please note at no point have I said "This PR should not be merged." What I've asked for is clarity on the pros, cons, and alternatives.
For the record, @n8sh, if you're ever feeling demotivated by anything I say, or that I'm being unfair or creating hassle for you ... just raise the point with me. I don't bite, I do care that you feel positive about contributing, and my intention with any review is to ensure the most effective possible changes for phobos. My impression was that you were enjoying the discussion and welcoming the opportunity to clarify details of the impact of these changes. @wilzbach note that as a result of this discussion, we now have a much better body of material explaining the context and meaning of this proposed change. That is considerably more helpful (for all of us, and for anyone wanting to understand this code) than just an "agree" or "disagree" based on reasons that only exist in the heads of the people (dis)agreeing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching to my home account to write up a slightly more detailed set of feedback and thoughts.
While I recognize the general point about std.random
not guaranteeing cryptographic security, I don't think that's a valid argument for introducing a dependency on a standard that is deprecated by the body that proposed it. But a possible compromise might be as follows:
-
use
arc4random
inunpredictableSeed
only forversion (SecureARC4Random)
(i.e. whereChaCha20
orAES
is the underlying implementation); -
expose
arc4random
,arc4random_buf
andarc4random_uniform
publicly for any platform that defines them, so that any user can use them (note, we might want to do this in some other module thanstd.random
given that it's platform-dependent); -
ensure the different
ARC4Random
versions are documented, so the user knows how to check the state of things on their platform.
In this way, the user has a free choice to use arc4random
directly regardless of the security level, but unpredictableSeed
only uses it where the underlying implementation is not a deprecated/legacy version.
The notes below should give some more detail on some of these points.
{ | ||
extern(C) private @nogc nothrow | ||
{ | ||
uint arc4random() @safe; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why arc4random
should not be exposed publicly on platforms that offer it, together with arc4random_buf
and arc4random_uniform
... ? Perhaps given that these are system-dependent it might be better not to expose them via phobos, but I don't see any reason per se why they should not be made available.
Assuming it can be made public, I would suggest documenting the SecureARC4Random
, LegacyARC4Random
and AnyARC4Random
versions in its documentation, so that the user knows how to check for themselves what quality standards it meets for their platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why
arc4random
should not be exposed publicly on platforms that offer it, together witharc4random_buf
andarc4random_uniform
... ?
This PR has been edited not to introduce any new public symbols because new public symbols require @andralex's approval. See discussion: #6267 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If we wanted to be more D-esque, the arc4random_buf
implementation might be provided as arc4random_buf(void[] buf)
and invoke the C API version underneath the hood; or that could be provided as an additional overload.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has been edited not to introduce any new public symbols because new public symbols require @andralex's approval. See discussion: #6267 (comment)
Fair enough, but note that this does not mean we cannot pursue the compromise outlined above: we'd just have to split between the changes to unpredictableSeed
and the introduction of public arc4random
functions.
In the event of serious hassle over the latter, we could always revisit the question of which platforms get the under-the-hood arc4random
unpredictableSeed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mir-random does something along those lines, but abstracts away the difference between platforms.
size_t genRandomNonBlocking()(scope ubyte[] buffer) @nogc nothrow @trusted
{
pragma(inline, true);
return mir_random_genRandomNonBlocking(buffer.ptr, buffer.length);
}
mir_random_genRandomNonBlocking
(the awkward name is for C linkage) could be calling arc4random_buf
, or it could be using Linux's getrandom
syscall, etc. I think that's the right kind of interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, introducing new public symbols is a tricky business.
And even if we get an approval, it should probably go to std.experimental.random
as experience shows that it's really hard to get all use-cases right.
we wanted to be more D-esque,
There are a lot of things that could be done, e.g. ubyte[]
(and of course camelCase), but I doubt that a user is ever going to care about the underlying details. What they care about is:
- cross-platform
- blocking/non-blocking
@nogc
vs . GC- entropy quality
- speed
Sometimes these concerns can be combined, but they don't want to special-case their code just because their code might be used on OSX too.
Is there any reason why arc4random should not be exposed publicly on platforms that offer it,
See reasons above + it's really easy to do so yourself as a user if you really want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
introduction of public arc4random functions.
BTW declaration of extern(C)
functions is done in DRuntime. Phobos is supposed to contain the cross-platform high-level abstractions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I assumed, hence the remarks about not exposing them via phobos.
and in some cases may harm it. For an extreme example the Mersenne | ||
Twister has `2 ^^ 19937 - 1` distinct states but after `seed(uint)` is | ||
called it can only be in one of `2 ^^ 32` distinct states regardless of | ||
how excellent the source of entropy is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great piece of documentation, which I would suggest extending with discussion on how to more effectively seed generators (for example, MersenneTwisterEngine
exposes a method to seed the generator using an InputRange
of random words).
However, I would suggest not coupling it with these changes (it's independent of them) and submitting it in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While of course, separating PR has advantages, they also come with an additional overhead of creating and reviewing them.
So it's really ok to combine doc updates of the function currently touched in a PR. No need for the extra work.
static bool seeded; | ||
static MinstdRand0 rand; | ||
if (!seeded) | ||
version (AnyARC4Random) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a conservative first option, I would suggest only using arc4random
for version (SecureARC4Random)
.
The consideration here is that in any case we still need to do something to improve unpredictableSeed
for platforms that don't define arc4random
. Platforms that have only a legacy arc4random
may well benefit more from sharing that longer-term solution, than from the legacy arc4random
. If we just switch now, the risk is that when the better solution is there, they don't get it, because everyone forgets that the legacy arc4random
question needs revisiting.
I think it's worth keeping the pressure to do better ourselves, than just outsourcing to a platform implementation that uses a deprecated standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consideration here is that in any case we still need to do something to improve
unpredictableSeed
for platforms that don't definearc4random
.
These other solutions will also be platform-specific. There is nothing on the immediate horizon better than arc4random
that will be usable on FreeBSD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These other solutions will also be platform-specific. There is nothing on the immediate horizon better than
arc4random
that will be usable on FreeBSD.
OK, fair enough.
Would you be OK with the idea of breaking out the dangers-of-unwise-seeding doc into a separate patch (just to separate out the concerns)? Otherwise, given the case you've made, I'm fine to move forward with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be OK with the idea of breaking out the dangers-of-unwise-seeding doc into a separate patch (just to separate out the concerns)?
Imho there's no need for this. PRs can have doc updates too, this is already a tiny PR and its related to this PR. This has never been an official criteria for a PR - it's only good practice to keep the focus of a PR small and to a single module only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho there's no need for this. PRs can have doc updates too, this is already a tiny PR and its related to this PR.
You'll note I said separate patch in the comment you are responding to. (I did make an earlier remark about separate PR, but that was before responses that made clear that this PR should go forward essentially as-is.)
// cryptographically secure sources of randomness are needed. | ||
|
||
// Performance note: ChaCha20 is about 70% faster than ARC4, contrary to | ||
// what one might assume from it being more secure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note/query: I believe at least some /dev/urandom
implementations use ChaCha20
. How does this impact the question of speed of /dev/urandom
versus arc4random
... ? Is it worth considering as a factor?
// of randomness, and also so other people reading this source code (as | ||
// Phobos is often looked to as an example of good D programming practices) | ||
// do not mistakenly use insecure versions of arc4random in contexts where | ||
// cryptographically secure sources of randomness are needed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, and one note that my review missed: this discussion comment is great, but I would suggest moving it inside the unpredictableSeed
function, where it is most relevant. Assuming that arc4random
might be made public, some of the discussion points could be covered in its documentation instead.
The roadmap forward would be adding more version blocks, so each subsequent PR only adds new cases rather than deleting or reversing previous PRs. Not removing MinstdRand0 in this PR means that there will have to be a little churn when it is replaced because those lines were touched by this PR as an indentation change, but besides that the structure of the code still lends itself to incremental upgrades.
I second this sentiment. |
FYI: I now put this PR on the merge-queue. Thanks again @n8sh for the hard work!
This doesn't block this PR - documentation updates can always follow-up after this has been merged ;-) |
No worries, your call ;-) I tend to encourage people to take every opportunity to separate out the concerns of a changeset, just to be clear what really goes together; but of course there are bigger fish to fry. Thanks @n8sh for a nice piece of work, and for all your patience and effort responding to questions. |
EDIT: reduced this PR just to using arc4random when available. Other cases left for another PR.
Use arc4random when available,
otherwise replace MinstdRand0 with xorshift64*/32.This pull request does not seek to make
std.random.unpredictableSeed
cryptographically secure, but just correct some basic deficiencies mentioned in https://issues.dlang.org/show_bug.cgi?id=18596:Proposed remedy:
XorShift64*/32 is one example. (Results of randomness tests and speed tests appear in charts in this paper.) It has the virtue that 0 is an illegal state, which means that we don't need a separate flag to indicate whether it is initialized.
And: