discussion on uniform float distributions #85
Comments
I don't think number 2 captures how the existing conversion functions work: there is no rounding, no infinitely long stream, and only one number from the RNG is used.
Note that we decided against keeping `next_f32` and `next_f64`.
I forgot to say: thank you for studying this 😄. There was already an issue open about how and what to merge back. Back when I added #1 there surprisingly seemed to be no performance difference between the normal methods and the ones with extra precision. But there was a problem with the benchmarks, and I had a lot to learn (better now, but still true 😄). Turns out they are slower, as expected. Now I think the fixed precision variants are a better default, so I agree.

For the higher precision variants it is another matter, because they need different strategies for `f32` and `f64`.
(Assuming
Method 2 is more general: ignoring rounding, it is functionally equivalent except for the last case, for which it is a generalization. Note that the results would be different because of the implementation, but the outcomes would be equally probable except for the last case. Basically, for method 2, ignoring rounding, you get this:
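A sketch of that breakdown, under my reading of method 2 (ignoring rounding and subnormals): the position of the first one-bit in the stream picks a binade, and the remaining bits fill the mantissa uniformly. For `f32`:

$$
\Pr\bigl(x \in [2^{-k},\, 2^{-k+1})\bigr) = 2^{-k}, \qquad k = 1, 2, 3, \ldots
$$

with each of the $2^{23}$ representable values inside a binade equally likely.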
For what it's worth, I have some code doing this, and I just cleaned it up a bit. It looks like this:

```rust
use std::mem;
use rand::Rng;

struct BitBuffer {
bits: u32,
len: u32,
}
impl BitBuffer {
fn new() -> BitBuffer {
BitBuffer { bits: 0, len: 0 }
}
fn fill<R: Rng + ?Sized>(&mut self, rng: &mut R) {
self.bits = Rng::next_u32(rng);
self.len = 32;
}
// gets n random bits
fn get_bits<R: Rng + ?Sized>(&mut self, mut n: u32, rng: &mut R) -> u32 {
assert!(n <= 32);
if n == 0 {
return 0;
}
if self.len == 0 {
self.fill(rng);
}
let mut ans = 0;
if n > self.len {
n -= self.len;
ans |= self.bits << n;
self.fill(rng);
};
self.len -= n;
ans |= self.bits >> self.len;
self.bits &= !(!0 << self.len);
ans
}
// finds the first significant bit, if available in first max_bits bits
fn first_one<R: Rng + ?Sized>(&mut self, max_bits: u32, rng: &mut R) -> Option<u32> {
let mut prev_leading = 0;
loop {
// count only leading zeros in available bits
let add_to_leading = self.bits.leading_zeros() + self.len - 32;
let leading = prev_leading + add_to_leading;
// check if we have no ones in initial max_bits
if leading >= max_bits {
// skip consumed zeros
self.len -= max_bits - prev_leading;
return None;
}
// check if we have some significant bit
if add_to_leading < self.len {
// skip not just zeros, but also first significant bit
self.len -= add_to_leading + 1;
self.bits &= !(!0 << self.len);
return Some(leading);
}
// we do not have enough bits yet
prev_leading = leading;
self.fill(rng);
}
}
}
// The random number generated is 0.*, where * is an infinitely long
// stream of random bits.
//
// If * starts with (BIAS - 1) zeros, the number is subnormal. For a
// subnormal number, we set the biased exponent to zero and the
// mantissa bits to the random bits after the leading (BIAS - 1)
// zeros.
//
// For a normal number, we set the biased exponent to (BIAS - 1 -
// leading zeros), then we skip the one bit that stopped the stream of
// leading zeros which will be encoded implicitly. Then we set the
// mantissa bits to the random bits after that first significant bit.
//
// For both subnormal and normal numbers, we have to check rounding.
// We are rounding to the nearest. A tie is not possible since * is an
// infinitely long stream of random bits, so we just use one extra
// random bit to decide whether to round down or up.
pub fn uniform01_f32<R: Rng + ?Sized>(rng: &mut R) -> f32 {
const BIAS: u32 = 127;
const MANT_BITS: u32 = 23;
let mut bits = BitBuffer::new();
let mut u = match bits.first_one(BIAS - 1, rng) {
None => 0,
Some(first) => (BIAS - 1 - first) << MANT_BITS,
};
u |= bits.get_bits(MANT_BITS, rng);
// Handle rounding. For rounding up, just add 1 to u, which always
// moves to the next floating-point number above.
if bits.get_bits(1, rng) == 1 {
u += 1;
}
unsafe { mem::transmute(u) }
}
```
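For illustration, a minimal usage sketch (my addition; it assumes the rand crate's `thread_rng` and that the snippet above is in scope):

```rust
fn main() {
    let mut rng = rand::thread_rng();
    let x: f32 = uniform01_f32(&mut rng);
    // Both 0.0 and 1.0 are reachable outcomes because of the final rounding step.
    assert!(0.0 <= x && x <= 1.0);
}
```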
Good explanation. That is mostly how the source of the idea was described in https://readings.owlfolio.org/2007/generating-pseudorandom-floating-point-values/ (without a practical way to implement it though). If we are only thinking in terms of
Requiring 16
Found something interesting: suppose you want to generate a random `f64` where every representable value in the range can occur. The chance to get a certain value depends on how close it is to zero, with the smallest values having a chance of 1 in 2^1076. To be able to generate such a number the RNG must have a period of at least 2^1076.
Didn't see your post before commenting. For myself I work with the figure that today's hardware, on a single processor, can generate about 2^30 values per second. That means ~2^55 per year, and 2^64 takes about 500 CPU-years. Any conversion method that just takes a single
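As a rough check on those figures (my arithmetic, taking a year as about 2^25 seconds):

$$
2^{30}\ \text{values/s} \times 2^{25}\ \text{s/year} \approx 2^{55}\ \text{values/year},
\qquad
2^{64} / 2^{55} = 2^{9} = 512 \approx 500\ \text{CPU-years}.
$$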
Yeah, it is pretty unlikely to require two `u32`s.
So, to summarise, we have:
Separate to that, we have five questions:
Are these details likely to matter?
For many cases, the exact details are not important. In some ways, allowing sampling of 1 but never 0 makes more sense as a default than the status quo, but I'm uncomfortable both making this breaking change and bucking the norm.

Do we need it?
Not really sure at this point, but benchmarks of various optimised implementations would be nice.

Edit: added questions 4 and 5 and a few details.
Of the variant "Generate values uniformly distributed between 0 and 1-ε with step ε (optionally offset)", the ranges are all handled similarly: step 1 is always to make a float in the range [1, 2).
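A minimal sketch of that kind of fixed-step conversion for `f32` (my illustration; the function name is made up):

```rust
// u32 -> f32 in [0, 1) with step ε = 2^-23: fix the exponent so the value lies
// in [1, 2), fill the mantissa with 23 random bits, then subtract 1.0.
fn fixed_step_01(bits: u32) -> f32 {
    let fraction = bits >> 9; // keep 23 random bits
    let one_to_two = f32::from_bits((127u32 << 23) | fraction); // exponent 0 => [1, 2)
    one_to_two - 1.0
}
```

The other ranges of this family then differ only in the offset that is applied afterwards.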
Which implies it would be nice to move that functionality.

Note: I had a glance at dSFMT and it looks like it's not a "native FP generator" anyway, but works similarly to some of the methods mentioned here, except that it only generates 52 random bits.
👍 from me. A benchmark (warning: tampered with 😄; I changed the names):
Xorshift is included for reference; the time the conversion takes is the difference from the Xorshift row. The three higher-precision variants are roughly 3× slower than the fixed-precision variants. The large difference in performance is (was) a bit surprising to me.
Hmm. Turns out the extra branch and jump are inserted because counting the number of trailing zeros of a value that might be zero needs an extra check. Manually setting a bit, so the value can never be zero, makes LLVM optimize out the branch:

```rust
#[inline(always)]
fn closed_open01(self) -> $ty {
const MIN_EXPONENT: i32 = $FRACTION_BITS - $FLOAT_SIZE;
#[allow(non_snake_case)]
let ADJUST: $ty = (0 as $uty).binor_exp(MIN_EXPONENT);
// 2^MIN_EXPONENT
// Use the last 23 bits to fill the fraction
let fraction = self & $FRACTION_MASK;
let remaining = self >> $FRACTION_BITS;
// Use the remaining (first) 9 bits for the exponent
let exp = - ((remaining | (1 << $FRACTION_BITS))
.trailing_zeros() as i32);
if exp >= MIN_EXPONENT {
fraction.binor_exp(exp)
} else {
fraction.binor_exp(MIN_EXPONENT) - ADJUST
}
}
```

Checking if the value is not zero is faster, but LLVM still inserts an unnecessary extra conditional jump:

```rust
#[inline(always)]
fn closed_open01(self) -> $ty {
let fraction = self & $FRACTION_MASK;
let remaining: $uty = self >> $FRACTION_BITS;
if remaining != 0 {
let exp = remaining.trailing_zeros() as i32;
fraction.binor_exp(-exp)
} else {
const MIN_EXPONENT: i32 = $FRACTION_BITS - $FLOAT_SIZE;
let adjust: $ty = (0 as $uty).binor_exp(MIN_EXPONENT);
// 2^MIN_EXPONENT
fraction.binor_exp(MIN_EXPONENT) - adjust
}
}
```

Anyway, a little optimization is still possible; the limit is probably something like
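For readers following along, here is a concrete, non-macro sketch of the same idea for `f32` (my own illustration, not the code from the branch): the low 23 bits fill the mantissa and the top 9 bits pick a binade by counting trailing zeros.

```rust
fn high_precision_01(bits: u32) -> f32 {
    let fraction = bits & 0x7F_FFFF; // low 23 bits -> mantissa
    let remaining = bits >> 23;      // top 9 bits -> binade choice
    if remaining != 0 {
        let t = remaining.trailing_zeros();  // t = 0..=8, P(t) = 2^-(t+1)
        let exponent = 126 - t;              // binade [2^-(t+1), 2^-t)
        f32::from_bits((exponent << 23) | fraction)
    } else {
        // Probability 2^-9: fall back to a fixed step in [0, 2^-9).
        fraction as f32 / (1u64 << 32) as f32
    }
}
```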
So would a good strategy be to generate fixed-precision
If you're both happy with the rough plan, I guess we start by merging rust-random#237 (or my reshuffle of it), then open PRs for each of the other changes. @tspiteri @pitdicker?
Initially I was arguing that the random in I like the And I also like that the In short, I like @dhardy's proposal.
I am against including this range, but do see some interesting uses. Two reasons:
To be honest I don't see the use. Well, I can see some use, but I think there are more 'strange' ranges that could be interesting. And where do you stop?
It is not the only 'special' or 'obvious' range.
It is not a useful range for users; it only reveals an implementation detail.
We can do something similar (and better) in the
If we write a custom conversion in the
I would pick this one,
I think the benefit is too small to make it the default, considering the performance is lower than with fixed precision.
I don't think any more precision buys us much. The cost is that the number of rounds used from the RNG becomes variable, instead of this being a simple conversion function. But I don't think it will be much slower.
Aren't these two basically the same, as far as the end user is concerned? So in the end I would only pick two ranges:
Of all the possible other ranges I do think the
Just to keep this somewhere... I tried to find an alternative to counting zeros. A nice trick is to convert the bits that are not used for the fraction to a float, which encodes the position of their most significant bit in the exponent:

```rust
fn closed_open01(self) -> $ty {
let fraction = self & $FRACTION_MASK;
let remaining = self >> $FRACTION_BITS;
if remaining != 0 {
unsafe {
let mut exp: $uty = mem::transmute(remaining as $ty);
exp -= ($FLOAT_SIZE - $FRACTION_BITS) << $FRACTION_BITS;
mem::transmute(fraction ^ exp)
}
} else {
const MIN_EXPONENT: i32 = $FRACTION_BITS - $FLOAT_SIZE;
let adjust: $ty = (0 as $uty).binor_exp(MIN_EXPONENT);
// 2^MIN_EXPONENT
fraction.binor_exp(MIN_EXPONENT) - adjust
}
}
```

But in the end the performance is slightly worse than the zero-counting method.
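A quick illustration of the fact this trick relies on (my own snippet): converting a small integer to `f32` puts the position of its highest set bit into the exponent field.

```rust
fn main() {
    let remaining: u32 = 0b1_0110; // 22: highest set bit is bit 4
    let bits = (remaining as f32).to_bits();
    let exponent = (bits >> 23) as i32 - 127; // un-bias the exponent field
    assert_eq!(exponent, 4);
}
```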
@pitdicker if we can get fast conversion to specified ranges via
0-1 with 32/64 bits of precision may be a useful compromise sometimes. Not sure. Having a maximal precision 0-1 range would be useful sometimes I think — though since it is not quite trivial to convert to a high-precision

Maybe in the end we should have two or three range types:
IIRC, dSFMT provides some
I am more and more leaning towards not caring much for closed/open ranges with floating point ranges, because after many hours of trying I still can't get all the rounding errors out of the floating point range code (first try back in September? last try in January). For some ranges it is even impossible to represent the size of the range, while the start and end can be exactly represented. In those cases there is no way to get the bounds 'right'. So for primitives like the

For integer ranges I once had the thought that both the current range code and a less uniform one make sense. Now we lose some performance to checking the

Then for both integer ranges and floating point ranges we would have a choice between high accuracy and best performance.

I am getting a little more time again, and see your distribution PR has landed 🎉. Shall I try to work on this?
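A hypothetical example of the 'size not representable' problem (my illustration, not from the discussion):

```rust
fn main() {
    // Both bounds are exactly representable in f32...
    let low: f32 = 1.0;
    let high: f32 = 33_554_432.0; // 2^25
    // ...but the exact size of the range, 2^25 - 1, needs 25 significant bits,
    // so in f32 it rounds to 2^25.
    assert_eq!(high - low, 33_554_432.0);
}
```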
Did some tests: both 32/64-bit precision and maximum precision have the same performance. Let's just go with maximum then. Definitely better than having three choices.
Yes, I decided to merge that. I also completed a merge back into my master branch — phew, that was hard! So now is probably a good time to bring over (or redo) FP sampling / range code if you're interested.
I think perhaps we should keep the
What is left to do here? I don't think we need
Yes, probably nothing. But let's leave this open at least until we have some high-precision range integrated.
I've had a look at `FloatConversions` and `Uniform01`, `Closed01` and `Open01`. I'm not sure all the features should be there. Personally I would only provide the following two features, one from a generator perspective, and the other more from a distribution perspective:

1. A way to generate a random number similar to the `next_f32` or `next_f64` in rand 0.4, and which is now in `FloatConversions::open_closed01_fixed`. I would not provide any of the other five functions in `FloatConversions` and think that if this is not sufficient, probably what is required is something from the uniform distribution. I prefer the situation with rand 0.4 here, where `next_f32` and `next_f64` are part of the generator.

2. For the uniform distribution inside the `distributions` module, I would provide `Uniform01` only and remove `Closed01` and `Open01`, and for the implementation I would specify that it is equivalent to generating a floating-point number with arbitrarily large precision with a continuous uniform distribution in the range [0,1], and rounding.

Number 1 is I think the most common way to simply generate a random float, and it always produces m bits of randomness, where m is the number of mantissa bits. The output is a multiple of ɛ that is < 1, and every possible outcome has a probability of occurrence = ɛ. This is what I would expect from a generator. It is very similar to generating integers, except that the randomness is in the mantissa only.
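For concreteness, a tiny sketch of such a conversion for `f32` (my illustration; 23 random bits, every outcome a multiple of ɛ = 2^-23 and < 1):

```rust
fn fixed_precision_f32(bits: u32) -> f32 {
    // Use 23 of the 32 random bits; the result is k * ε for k in 0..2^23.
    (bits >> 9) as f32 / 8_388_608.0 // 8_388_608 = 2^23
}
```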
Number 2 is from the distribution point of view. With rounding to the nearest, both 0.0 and 1.0 are possible outcomes. This is functionally equivalent to producing 0.* where * is an infinitely long random bit stream, and then rounding. If * is an infinite stream of zeros, then the number is exactly zero, and if * is an infinite stream of ones, then the number is exactly one.
For this to be done correctly, more than one `u32` may need to be generated for a random `f32` if initially we get lots of zeros. But this is improbable, so it should not be considered a performance issue. For example for `f32`, if the first generated `u32` does not start with eight zeros, it is sufficient: let's say the generated number is seven zeros followed by 25 ones. Then the output should be 0.0000_000 followed by 25 ones. This would produce an `f32` equal to (2−ɛ)×2^−8, but 2−ɛ only requires 24 of our ones. The 25th significant bit is used to see if we should round down or up, so if that bit is zero we round down returning (2−ɛ)×2^−8, otherwise we round up returning 2^−7. Note that we do not need to care about ties-to-even here: since * is infinitely long, a tie has zero probability.
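A quick sanity check of that arithmetic (my own snippet, not part of the original discussion):

```rust
fn main() {
    // Seven zero bits, then 24 ones in the mantissa: (2 − ε) × 2^−8.
    let down = f32::from_bits(((127 - 8) << 23) | 0x7F_FFFF);
    assert_eq!(down, (2.0 - f32::EPSILON) * 2f32.powi(-8));
    // Rounding up by one ulp gives 2^−7.
    assert_eq!(f32::from_bits(down.to_bits() + 1), 2f32.powi(-7));
}
```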
For three or more `u32` numbers to be required, the first `u32` has to be exactly zero and the second `u32` must start with eight zeros. So the probability that one `u32` is required is 1−2^−8, the probability that two `u32` values are required is 2^−8−2^−40, and the probability that three or more are required is 2^−40.