Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The random selection doesn't appear to be very random #119

Closed
sgrif opened this issue Jan 15, 2016 · 17 comments
Closed

The random selection doesn't appear to be very random #119

sgrif opened this issue Jan 15, 2016 · 17 comments

Comments

@sgrif
Copy link
Contributor

sgrif commented Jan 15, 2016

I have a type which wraps an i32, and wrote an Arbitrary impl like so:

trait Arbitrary<PgDate> {
    fn arbitrary<G: Gen>() -> Self {
          PgDate(i32::arbitrary())
    }
}

When I actually output what it's running with, the values are only ranging from 100 to -100, instead of 100 completely random numbers as I'd expect. Additionally, shrink appears to be flawed on these types. My understanding was that quickcheck would test for known problematic values for a given type. For i32 I'd expect that to at minimum be 1, 0, -1, i32::MAX and i32::MIN, but when I add the shrink like so:

fn shrink(&self) -> Box<Iterator<Item=Self>> {
    Box::new(self.0.shrink().map(PgDate))

I just see the same range of -100 to 100. This caused a critical bug that occurs for any values less than -2451545 to go unnoticed.

@BurntSushi
Copy link
Owner

@sgrif I think the original idea here is that the size method on quickcheck::Gen would control the magnitude of the integers generated, but perhaps that is a misuse. With #116 merged, I think there is now good precedent for biasing towards known problematic values.

@vi
Copy link
Contributor

vi commented Jan 31, 2016

It is somewhat reversed #99: numbers being biased too much towards -100...100.

I'm going to prepare a reformed number generation analogous to char (but simpler) which should cover entire range, yet preferring some particular numbers.

@bluss
Copy link
Contributor

bluss commented Mar 15, 2016

There needs to be a new api for requesting a quickcheck-controlled size value. Something that's appropriate for selecting the size of a datastructure to generate. I think I mostly use usize's Arbitrary impl for that at the moment.

@mulkieran
Copy link

Experienced a problem with some tests: stratis-storage/stratisd@0f7838d, all u64 values were less than minimum required, so all testt were discarded.

@bluss
Copy link
Contributor

bluss commented Jan 31, 2017

With current behaviour that is expected @mulkieran and you simply need a wrapper type that defines the distribution of values that you need.

@mulkieran
Copy link

@bluss Can you give me an example?

@mulkieran
Copy link

if anyone who understand this problem can give me some help, I'ld appreciate it.

@bluss
Copy link
Contributor

bluss commented Feb 2, 2017

Something like this to make a wrapper type that puts a value inside using the regular Rand impl (not Arbitrary)

extern crate quickcheck;
use quickcheck::Arbitrary;
use quickcheck::Gen;

extern crate rand;
use rand::{Rand, Rng};

#[derive(Copy, Clone, Debug)]
pub struct Random<T>(pub T);

impl<T> Arbitrary for Random<T>
    where T: Rand + Clone + Send + 'static
{
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        Random(g.gen())
    }
}

@vi
Copy link
Contributor

vi commented Feb 3, 2017

Maybe it's time to make a change like #99, but for numbers?

  1. With 1/4 probability, pick a number from [-100,100]
  2. With 3/4-1/25 probability, pick random bit pattern interpreted as a number
  3. With 3/100 probability, pick a power of two + [-3,3]
  4. With 1/100 probability, pick a number from some pre-selected set of tricky numbers that have more than one bug about and not already included in "1." or "3.". Some typical constants for CRC or prime numbers may appear there.

That could make "sane default" for multiple varied uses. Obviously one could override the distribution to task-specific.

@mulkieran
Copy link

I would always ask myself what Hypothesis (https://github.com/HypothesisWorks/hypothesis-python) does. But it might not be such a useful question for this problem in this situation.

@quark-zju
Copy link

The Haskell implementation seems to first decide maximum bits the generated number has. Then generate a number within the range (-2 ^ max_bits, 2 ^ max_bits). Doing something similar is possible but doing it without regressing performance would be tricky.

Current workaround would be setting environment variable QUICKCHECK_GENERATOR_SIZE=18446744073709551615 on 64-bit platform. Not sure how feasible it is to change the default gen.size() from 100 to usize::MAX. It seems it does not hurt performance at least.

@maxbla
Copy link
Contributor

maxbla commented Aug 12, 2019

@BurntSushi I'd like to hear your thoughts about following propositions addressing this issue.

  1. Ensure that every valid value* for each integer type has some positive probability of being returned by Arbitrary
  2. Completely ignore the size of Gen (for integer types)
  3. Have a greater chance to pick problematic values
    • The most problematic values for any type are probably 0, MAX_VALUE and MIN_VALUE, but I could see -1 and 1 being included in this list along with all numbers at "bit transitions" i.e. 3,4,7,8,15,16
  4. Use a uniform distribution (besides problematic values)
    • Another candidate distribution is the Geometric/Exponential distribution

*valid values for an integer type T means every value in T::min_value()..=T::max_value()

@BurntSushi
Copy link
Owner

@maxbla Thanks for writing that out! I think that all seems pretty reasonable to me.

@romanb
Copy link

romanb commented Aug 12, 2019

@maxbla Have you seen #221 ? I closed it after ~6 months because there was apparently no interest in merging it, but maybe there is still something useful in there for you, if you want to revive the topic with a new PR, since it implements something very similar to what you propose (the size parameter is not ignored but acts as a weighted bias towards a preferred range).

@BurntSushi
Copy link
Owner

I closed it after ~6 months because there was apparently no interest in merging it

There is interest. But I maintain dozens of projects, so it can take me a very long time to merge PRs and/or it's very easy for me to lose track of PRs. Editing comments to include status updates probably doesn't help, as was done in #221. I don't get emails when edits are made.

@romanb
Copy link

romanb commented Aug 12, 2019

There is interest. But I maintain dozens of projects, so it can take me a very long time to merge PRs and/or it's very easy for me to lose track of PRs. Editing comments to include status updates probably doesn't help, as was done in #221. I don't get emails when edits are made.

No worries and I did not mean to criticize, of course. It's just a matter of my personal PR hygiene that I close unmerged PRs on my own after a while when I've also moved on to other things and no longer intend to keep them updated / mergeable.

@maxbla
Copy link
Contributor

maxbla commented Aug 19, 2019

@romanb yep, I saw #221, and thanks, it was helpful in creating #240.

BurntSushi pushed a commit that referenced this issue Dec 27, 2020
This commit tweaks the Arbitrary impls of number types (integers,
floats) to use the full range with a small bias toward "problem" values.
This is a change from prior behavior that would use the `size` parameter
to control the range of integers.

In retrospect, using the `size` parameter this way was probably
misguided. Instead, it should only be used to control the sizes of data
structures instead of also constraining numeric ranges. By constraining
numeric ranges, we leave out a huge space of values that are never
tested.

Fixes #27, Fixes #119, Fixes #190, Fixes #233, Closes #240
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants