Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support generating random 64-bit values. #26741

Closed
tannergooding opened this issue Jul 10, 2018 · 30 comments · Fixed by #47085
Closed

Support generating random 64-bit values. #26741

tannergooding opened this issue Jul 10, 2018 · 30 comments · Fixed by #47085
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@tannergooding
Copy link
Member

Rationale

It is sometimes desirable to generate random numbers as inputs to tests. This works nicely for the majority of the primitive types, but not for long/ulong.

As such, I propose a method be exposed that allows the generation of random 64-bit integer values.

Proposed API

public class Random
{
    public virtual long NextInt64();
    public virtual long NextInt64(long maxValue);
    public virtual long NextInt64(long minValue, long maxValue);
}
@Tornhoof
Copy link
Contributor

Shouldn't be the return value be long then?

@tannergooding
Copy link
Member Author

Shouldn't be the return value be long then?

Yep. Fixed

@stephentoub
Copy link
Member

In general I think such an addition makes sense, but some questions:

  • Would all three overloads actually be virtual?
  • How do you see the base implementations being implemented? In particular, if someone's derived from Random and overridden other virtuals, presumably those virtuals should be used (otherwise the new methods could utilize the wrong algorithm)... which methods would these delegate to?

@tannergooding
Copy link
Member Author

@stephentoub, the short answer to your questions is that they are virtual because the existing Random.Next() methods are virtual and they should behave/be implemented following the same rules as the existing algorithms (which is, overloading one does not impact the other two).

This follows the "existing" practice for the System.Random class, which AFAIK is preferred.

@stephentoub
Copy link
Member

stephentoub commented Jul 10, 2018

What about my second question? Would they be implemented entirely in terms of calls to Sample()?

@tannergooding
Copy link
Member Author

I don't think they should/can reliably delegate. The overall range of the existing methods isn't sufficient to provide good distribution for long/ulong (unless we default the range to the same as int).

If a user has already extended System.Random and overridden Next, they would likely need to reimplement NextInt64 as well. Thinking about it, they could also have already exposed a NextInt64 method, in which case this could be a "breaking" change for recompilation where warn as error is enabled.

An alternative would be to extend System.Random ourselves and not make these methods virtual at all.

@stephentoub
Copy link
Member

I don't think they should/can reliably delegate.

That's problematic then (it's why I was asking). Let's say I've created my own SecureRandom : Random type that uses RandomNumberGenerator under the covers, and I've overridden all of the virtuals to behave the way I need them to. Now we add this new base virtual method, but it's not implemented in terms of any of the existing virtuals. Someone starts using my type before I've had a chance to provide a new override, and they call the new methods, and they get functionality that behaves incorrectly.

Thinking about it, they could also have already exposed a NextInt64 method, in which case this could be a "breaking" change for recompilation where warn as error is enabled.

While true, that's a problem we face any time we add anything new... a new type could conflict with a type the customer named, a new method on anything that's not sealed could conflict with a method the customer added on a derived type, a new method could start to be targeted silently instead of an extension method of the same name the customer defined, etc.

@tannergooding
Copy link
Member Author

That's problematic then (it's why I was asking).

Right. But I don't think we can provide new overloads, short of implementing our own sub-class/providing a new Random type (or providing some kind of compat-switch), otherwise. But maybe that is a direction worth taking...

System.Random is already fairly "frozen-in-time". AFAIK, we can't readily change the algorithm (as it is a massive breaking change to anyone relying on the deterministic output from providing a fixed-seed) and because it isn't sealed it is hard to add new overloads/methods.

@danmoseley
Copy link
Member

Can't the default implementations just call the existing int implementations twice (and shift one), to create an appropriate long?

@tannergooding
Copy link
Member Author

Can't the default implementations just call the existing int implementations twice (and shift one), to create an appropriate long?

Yes, but that doesn't necessarily provide good distribution.

@tannergooding
Copy link
Member Author

It would also make the algorithm more complex in the scenarios where the user specifies a custom range.

@stephentoub
Copy link
Member

But maybe that is a direction worth taking...

For just this method? Meh, I don't think so. There must be ways of getting the desired distribution from one of the existing sources of randomness, whether Sample(), Next(...), NextBytes, NextDouble, etc., or some combination of them, even if it makes the NextInt64 method more expensive than if it wasn't delegating.

@danmoseley
Copy link
Member

Yes, but that doesn't necessarily provide good distribution.

It does if the existing API is producing a good distribution (mixing random bits gives random bits)?

It would also make the algorithm more complex in the scenarios where the user specifies a custom range.

Yes, but seems doable.

@tannergooding
Copy link
Member Author

It does if the existing API is producing a good distribution (mixing random bits gives random bits)?

Right, but because these are pseudo-random number generation algorithms they don't have perfect distribution and that can impact the overall set of numbers generated.

Finding out the distribution characteristics of the existing algorithm (when applied to generating Int64) would be something that needs to be done as part of the implementation process.

@morganbr
Copy link
Contributor

@tannergooding, are you saying that there's more of a relationship between two consecutive outputs from System.Random than there would be from the same algorithm extended to cover 64-bit numbers? I'd be very surprised if that was the case. One could make a case that the actual algorithm in System.Random is suboptimal in general, but I don't think the size of the output is important in that.

@jnm2
Copy link
Contributor

jnm2 commented Jul 11, 2018

I'm just curious how you would compose this. Let's say you wanted to generate a random number between 0 and 0x1_0000_0000, inclusive. If you generate the high 32 bits separately from the low 32 bits, and you use random.Next(0, 2) to do so, there is a 50% chance that the final 64 bit number will be 0x1_0000_0000 and a 50% chance that it will be any of 4 billion other numbers.

@tannergooding
Copy link
Member Author

are you saying that there's more of a relationship between two consecutive outputs from System.Random than there would be from the same algorithm extended to cover 64-bit numbers

I am saying that any inefficiencies or limitations of the current algorithm will be at risk of becoming more apparent.

Since the algorithm was designed for generating integers in the range of 0 to 2.14b, numbers outside that range are no longer guaranteed to be uniform (occur once in a cycle) and the cycle length will be halved (as we require at least two samples per generated number -- we already require 2 samples for negative 32-bit integers, and some 64-bit numbers will require 3 samples).

Additionally, our algorithm deviates from the "Subtractive Number Generator" given by Donald Knuth (https://github.com/dotnet/corefx/issues/23298), so we don't have the same guarantees of uniformity, cycle length, etc that we could potentially rely on to generate these numbers.

Our own documentation touches briefly on how to use System.Random to generate 64-bit values, and some of the downsides of doing so: https://docs.microsoft.com/en-us/dotnet/api/system.random?view=netcore-2.1#generate-random-64-bit-integers

@colgreen
Copy link
Contributor

colgreen commented Dec 3, 2018

Bear in mind (as per @tannergooding's comments) that System.Random has a number of issues (see #23198 for details) and thus probably should not form the basis of any work in the core product requiring a source of pseudo randomness.

For a PRNG implementation that addresses all of the known issues, see:

https://github.com/colgreen/Redzen/blob/master/Redzen/Random/Xoshiro256PlusRandom.cs
https://github.com/colgreen/Redzen/blob/master/Redzen/Random/RandomSourceBase.cs

or (same algorithm with more PRNG state):

https://github.com/colgreen/Redzen/blob/master/Redzen/Random/Xoshiro512StarStarRandom.cs

I understand the point about not wanting to change System.Random as there will be code that relies on its specific (and often quirky) behaviour, but it does have issues and ideally these would be addressed, perhaps as an alternative PRNG class.

Regarding the xoshiro PRNG versus the one in System.Random; there may be aspects of xoshiro that you find undesirable, however, some of the issues In System.Random are in how the random bits are utilised not the PRNG itself, and for those issues I believe the linked code above corrects all of them. (see #23198 and also the comments in the xoshiro source code).

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jul 6, 2020
@joperezr joperezr modified the milestones: 5.0.0, Future Jul 6, 2020
@joperezr
Copy link
Member

joperezr commented Jul 6, 2020

@tannergooding I believe this looks ready to review, so feel free to flag it as such, unless @stephentoub's concern above makes this not viable.

@Symbai
Copy link

Symbai commented Jul 25, 2020

Any update on this? I was about to request the same APIs (edit: and an API for ulong as well) before I saw this old discussion.

@colgreen
Copy link
Contributor

Any update on this? I was about to request the same APIs (edit: and an API for ulong as well) before I saw this old discussion.

Feel free to use https://www.nuget.org/packages/Redzen/ in the meantime :)

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Jul 27, 2020
@danmoseley
Copy link
Member

danmoseley commented Jul 31, 2020

Building on @colgreen comments, perhaps we could use this as an opportunity to offer a better PRNG just for the 64-bit API. It would probably be odd to have two generators in the same type, but perhaps it could be a derived class ie

public class Random64 : Random
{
    public virtual long NextInt64();
    public virtual long NextInt64(long maxValue);
    public virtual long NextInt64(long minValue, long maxValue);
}

We could use the opportunity to override Next, NextBytes, etc and implement them with the same improved PRNG.

@Symbai
Copy link

Symbai commented Jul 31, 2020

Is there a reason why it HAS to be signed value exclusively? Why not having unsigned support as well for ulong and uint? Especially while we already discussing about improving the random class...

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 7, 2020
@terrajobst
Copy link
Member

terrajobst commented Aug 7, 2020

Video

  • Makes sense
  • Let's also add NextSingle()
namespace System
{
    public class Random
    {
        public virtual long NextInt64();
        public virtual long NextInt64(long maxValue);
        public virtual long NextInt64(long minValue, long maxValue);
        public virtual float NextSingle();
    }
}

@tannergooding tannergooding added the help wanted [up-for-grabs] Good issue for external contributors label Aug 7, 2020
@danmoseley
Copy link
Member

@tannergooding @GrabYourPitchforks did we resolve the question of whether the 64 bit values could use a "better" PRNG? There is no back compat burden on these new API's, and they could use the same seed value as determined today (which is not perfect as after using hardware entropy, if unspecified it will use regular r.NextInt -- but that could be changed without a break, as well). The only drawback would be two codepaths in Random - which is a small burden on us, only.

vorotynsky added a commit to vorotynsky/runtime that referenced this issue Aug 26, 2020
Created method in System.Random for generating longs and floats.

Fix dotnet#26741
vorotynsky added a commit to vorotynsky/runtime that referenced this issue Aug 26, 2020
It should make the algorithm more stable.

Fix dotnet#26741
vorotynsky added a commit to vorotynsky/runtime that referenced this issue Aug 26, 2020
Fix dotnet#26741

Signed-off-by: Vorotynsky Maxim <vorotynsky.maxim@gmail.com>
@stephentoub
Copy link
Member

@tannergooding, @GrabYourPitchforks, any progress on how this should be implemented? f56fe49 is attempting to add a custom 64-bit generator, based on changing the 32-bit one to 64-bit, and that doesn't seem like a good path.

@colgreen
Copy link
Contributor

colgreen commented Jan 15, 2021

I will describe the approach I took in Redzen (my own little side project and nuget package) - just as food for thought really.

All code is here: https://github.com/colgreen/Redzen/tree/master/Redzen/Random

I defined an interface IRandomSource that more or less mirrors the API of System.Random, but with updates/additions, so e.g. there exist methods Next(), Next(int), NextBytes(Span<byte>), etc.

There are multiple implementations of that interface, each named after a specific PRNG algorithm/method, e.g. Xoshiro256StarStarRandom.cs is one of the current PRNGs published by David Blackman and Sebastiano Vigna. There is also a legacy PRNG in there for backwards compatibility reasons - as that was originally the only PRNG in the redzen library.

The next layer is a static class RandomDefaults which has these static methods:

public static IRandomSource CreateRandomSource()
public static IRandomSource CreateRandomSource(ulong seed)

Hence, as owner of the library, I have chosen what I think is a good default PRNG right now, and will construct instances of it behind those general purpose static methods. A user of the lib can choose to use those methods (i.e., trust my choice of PRNG), or construct a specific PRNG class directly.

In future I may add new PRNGs and change the default PRNG behind those static methods, e.g. there's a reasonable chance I will switch to wyrand/wyhash at some point (see colgreen/Redzen#10)

The point being that non-crypto PRNGs are an area of ongoing research, so it's good to acknowledge this in the design of the library and API. I.e. whatever PRNG we choose today may not be desirable in the future.

With regards to 64bit random generation, the redzen PRNGs are all fundamentally 64 bit under the hood, and any methods that require less bits just take the high bits they need (for many PRNGs the high bits are higher quality randomness). And for e.g. wyrand, I think it can use CPU intrinsics in some way to greatly improve performance for 'chunks' of 64 bits. If instead a fundamentally 32 bit PRNG was used, and two blocks of 32bits concatenated, then that may result in new weaknesses (as measured by the various PRNG statistical quality test suites).

I guess a similar pattern to what I'm describing above is the API around creating Encryption classes, e.g. AsymmetricAlgorithm.Create

@tannergooding
Copy link
Member Author

any progress on how this should be implemented? f56fe49 is attempting to add a custom 64-bit generator, based on changing the 32-bit one to 64-bit, and that doesn't seem like a good path.

Assuming we are confident that InternalSample() is producing 32 random bits, then I think we are fine with just getting two int and combining them for long NextInt64().

For long NextInt64(long maxValue), I think we need to create a new GetSampleForLargeRange(). This is because while double is 64-bits, the returned sample is effectively between 0.0 and 1.0 which is only 0x3FF0000000000000 unique values (and with those values impacting scaling in various interesting ways given they themselves aren't evenly distributed.

@stephentoub
Copy link
Member

Assuming we are confident that InternalSample() is producing 32 random bits, then I think we are fine with just getting two int and combining them for long NextInt64().

If I'm understanding the current implementation correctly, I think it's producing 31 bits... ish, in that it's producing a value in the range [0, int.MaxValue).

@stephentoub stephentoub self-assigned this Jan 17, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 17, 2021
@GrabYourPitchforks
Copy link
Member

IMO, @stephentoub's latest PR hits the right notes. Provide the option to do a better thing (with no concrete implementation details promised) in the common case of the parameterless ctor being called. And if a seed is provided, the results are still repeatable, even for calls to the new NextInt64 API.

Since it also seems that our stance is that calling the parameterless ctor is not guaranteed to result in any particular implementation and is not guaranteed to be repeatable, this also puts to rest once and for all that we're not going to mark this type [Serializable].

@stephentoub stephentoub removed the help wanted [up-for-grabs] Good issue for external contributors label Jan 20, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 21, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Feb 20, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.