Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: Idiomatic way to specify the length of an arbitrary vector #297

Open
rrybarczyk opened this issue Aug 12, 2021 · 7 comments
Open

Q: Idiomatic way to specify the length of an arbitrary vector #297

rrybarczyk opened this issue Aug 12, 2021 · 7 comments

Comments

@rrybarczyk
Copy link

Is there documentation on the idiomatic way to specify the length of an arbitrary vector?

My specific example is this:

I have a struct with fields containing various data types (like max 64-bytes, max 255-bytes, exactly 32-bytes, etc.) that I need to implement Arbitrary for. As an example, I will use one field in the Foo struct that represents a vector of 255 bytes.

struct Foo {
  bar: B0255
}

Where

struct B0255<'b>(Inner<'b>);

enum Inner<'a> {
   Ref(&'a [u8]),
   Owned(Vec<u8>),
}

To test this struct, I have created a new struct and implemented Arbitrary, which calls a function from_random which is implemented for Foo. from_random is needed because Arbitrary needs to be implemented in another crate.

struct RandomFoo(Foo);
impl Arbitrary from RandomFoo {
  fn arbitrary(g: &mut Gen) -> Self {
    RandomFoo(Foo::from_random(g))
  }
}

Currently, I am passing in the generator to the from_random function and generating the 255 byte vectors as:

impl Foo {
  fn from_random(g: &mut Gen) -> Self {
    let mut bar = Vec<u8>::arbitrary(g);
    bar.truncate(255);
    let bar: B0255 = bar.try_into().unwrap();
    Foo { bar }
  }
}

Here I am truncating the vector to avoid a panic caused by a vector that is too big. Was thinking that the best approach may be having the try_into return the right error when called with a value that is too big, but I am not sure if this is possible as Arbitrary needs to return Self?

Additionally, when from_random is implemented for fixed size primitives (eg U256) should resize(255, 0) be used instead of truncate?

Alternatively, I can implement from_random as:

impl Foo {
  fn from_random(g: &mut Gen) -> Self {
    let mut bar: B0255 = Gen::new(255);
    bar: B0255 = Vec::<u8>::arbitrary(&mut bar).try_into().unwrap();
    Foo { bar }
  }
}

Which, if either, from_random implementation is the idiomatic way to generate a vector of a specific size?

@neithernut
Copy link

neithernut commented Aug 13, 2021

In the past, I used something like the following in such situations:

let mut g = Gen::new(g.size() / n);
let v: Vec<_> = std::iter::from_fn(|| Some(Arbitrary::arbitrary(&mut g))).take(n).collect()

Obviously not the most pretty thing to write. And you must take care not to forget wrapping Arbitrary::arbitrary(g), otherwise you'll end up with fewer elements than you expected. On the plus side, you definitely don't generate any values you'll truncate away anyway.

If/when #292 gets merged, you can also use the following pattern:

let mut v = vec![Default::default(); N]; // or use `Vec::new` and `Vec::resize`
Gen::new(g.size() / N).fill_slice(v.as_mut());

One day, #282 may also provide an option if your target size is known at compile time:

let buf: [_; N] = Arbitrary::arbitrary(g);
let v: Vec<_> = buf.into();

@neithernut
Copy link

neithernut commented Aug 13, 2021

Stupid me just realized that you can just do:

let mut v = Vec::new();
let mut g = Gen::new(g.size() / n);
v.resize_with(n, || Arbitrary::arbitrary(&mut g));

@Fi3
Copy link

Fi3 commented Aug 13, 2021

@neithernut ty!

So you are suggesting to create a new generator in the from_random function and do not use the one available in Arbitrary::arbitrary?
Can it give any problem when implementing shrink?
Asking cause right now we are using the generic shrink but we plan in the future to implement a correct shrink so we would come out with a solution that play well with it.

@neithernut
Copy link

It depends. With recursive structures, you usually want to either create a new generator with a reduced size in order to ensure the recursion is bounded. In such cases, you also want to specialcase the generator having a size of 0. But if you want to generate a Vec<u8> or a Vec of some other trivial type there's no need to create a new generator. Creating a new generator with the current interfaces also comes with the caveat that you loose reproducibility.

Can it give any problem when implementing shrink?

Whether or not you create a new generator doesn't affect shrinking at all. Shrinking in itself is both independent from the generator and deterministic (unless you do very weird things in your shrinker).

@Fi3
Copy link

Fi3 commented Aug 19, 2021

Ok. Is not very clear why should I shrink if I lose reproducibility? Is not goal of shrinking to have a minimal failing case? If I do not have reproducibility maybe the shrinked case is not failing anymore. Just asking for future reference, right now I will implement from random following your above example, and use the default shrink function.

Ty again

@neithernut
Copy link

Ok. Is not very clear why should I shrink if I lose reproducibility?

You loose reproducibility in the generation when creating a new Gen instance for sub-structures. Which is, btw., only relevant if you control or extract the seed used to create the initial Gen in the first place. But still, any test should be reproducible for a given value. It's only the generation itself that will lack reproducibility.

Shrinking should always be deterministic and only operate on the value itself. For example, you should never use Arbitrary::arbitrary, Gen or any other source of randomness in your shrinker.

Sorry if I wasn't clear enough on that front.

If I do not have reproducibility maybe the shrinked case is not failing anymore.

quickcheck will apply the test on every value yielded by the shrinker, select a value for which the test (still) fails and repeat (by shrinking that value) until the failure can't be reproduced any more. The whole process works under the assumption that a test is reproducible for given inputs as explained above. So, if quickcheck reports a shrunk value, that value will reproduce the failure (if said assumptions hold).

... and use the default shrink function.

Note that the default shrink function supplied by the Arbitrary trait will not yield any shrunk values, i.e. quichcheck will always report the initial reproducer for which a test failed. This may be a valid choice if your value is simple and/or can't be shrunk realistically.

The initial post by @rrybarczyk talks about a Vec<u8> with a fixed lengh. As the Vec's length should (probably) be constant under shrinking, the only reasonable strategy appears to shrink each individual value in that Vec. The shrinker would then return Vecs for which a single position differs from the original (we'd let the aforementioned repetition take care of the rest). Depending on what exactly you test for, such a shrinker may not be worth the effort.

@rrybarczyk
Copy link
Author

Thank you so much @neithernut!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants