Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add auto-derivators for structs and enums #98

Closed
vi opened this issue Sep 28, 2015 · 18 comments
Closed

Feature request: Add auto-derivators for structs and enums #98

vi opened this issue Sep 28, 2015 · 18 comments

Comments

@vi
Copy link
Contributor

vi commented Sep 28, 2015

I want quickcheck to have a compiler plugin that automatically generates Arbitrary impl for structs and enums composed of Arbitrary types, like with serialisation.

That should shrink

extern crate quickcheck;

use quickcheck::Arbitrary;

#[derive(Clone,Debug)]
struct Mumu {
    a : u32,
    b : String,
}

#[derive(Clone,Debug)]
enum Qqq {
    Lol,
    Ror,
    Kokoko(Mumu),
    Totot(u32),
}

impl Arbitrary for Mumu {
    fn arbitrary<G : quickcheck::Gen>(g:&mut G) -> Mumu {
        Mumu {a : Arbitrary::arbitrary(g), b : Arbitrary::arbitrary(g) }
    }
    fn shrink(&self) -> Box<Iterator<Item=Self>> {
        Box::new ( (self.a.clone(), self.b.clone()).shrink().map(|(aa,bb)| Mumu { a: aa, b:bb}))
    }
}

impl Arbitrary for Qqq {
    fn arbitrary<G : quickcheck::Gen>(g:&mut G) -> Qqq {
        let y = g.next_u32() % 4;
        match y {
            0 => Qqq::Lol,
            1 => Qqq::Ror,
            2 => Qqq::Kokoko(Arbitrary::arbitrary(g)),
            3 => Qqq::Totot(Arbitrary::arbitrary(g)),
            _ => panic!(),
        }
    }
    fn shrink(&self) -> Box<Iterator<Item=Self>> {
        match self {
            &Qqq::Totot(ref x) => Box::new(x.shrink().map(|s| Qqq::Totot(s))),
            &Qqq::Kokoko(ref x) => Box::new(x.shrink().map(|s| Qqq::Kokoko(s))),
            _ => quickcheck::empty_shrinker(),
        }
    }
}



#[test]
fn it_works() {
    fn qqq(x : Qqq) -> bool {
        if let Qqq::Kokoko(Mumu { a : v, b: _ }) = x {
            return v != 4
        }
        true
    }

    quickcheck::QuickCheck::new().tests(1000_000_000).quickcheck(qqq as fn(Qqq) -> bool);
}

to

extern crate quickcheck;

use quickcheck::Arbitrary;

#[derive(Clone,Debug,Arbitrary)]
struct Mumu {
    a : u32,
    b : String,
}

#[derive(Clone,Debug,Arbitrary)]
enum Qqq {
    Lol,
    Ror,
    Kokoko(Mumu),
    Totot(u32),
}

#[test]
fn it_works() {
    fn qqq(x : Qqq) -> bool {
        if let Qqq::Kokoko(Mumu { a : v, b: _ }) = x {
            return v != 4
        }
        true
    }

    quickcheck::QuickCheck::new().tests(1000_000_000).quickcheck(qqq as fn(Qqq) -> bool);
}
@BurntSushi
Copy link
Owner

I've often thought of having something like this, but in practice, almost every Arbitrary impl I write needs to encode some key invariants around the values being generated or shrunk. This would defeat the use of a plugin to do any automatic derivation.

Popping up a level, I'm a little down on plugins since their stabilization future is probably still a far way off. As such, I personally probably won't work on something like this any time soon.

@aldanor
Copy link

aldanor commented Jan 29, 2017

Bumping this since proc-macros are getting stabilised in a week or two, and a derive-macro is very easy to implement.

This would be potentially very useful for generating Arbitrary implementations for simple structs without complex invariants.

@burdges
Copy link

burdges commented Jan 30, 2017

If automatic derivation is problematic, then maybe a generator that wrote a proposed Arbitrary implementation, which the developer could paste into their code and customize?

@emk
Copy link

emk commented Mar 12, 2017

Macros 1.1 now allows custom derive in stable (as of 1.15). This would involve a separate library named something like quickcheck_derive. The main quickcheck library could remain compatible with an older version like 1.10, and you'd only need 1.15 if you wanted to use #[derive(Arbitrary)].

I think the key puzzle is how to handle additional constraints, such as the ones I described in #163. Maybe we should try brainstorming what these constraints look like in typical cases. Here's my example:

#[derive(Arbitrary, Clone)]
pub struct Rect {
    left: usize,
    top: usize,
    width: usize,
    height: usize,
}

Constraints: left + width and top + bottom both fit in a usize.

We can actually add attributes like #[arbitrary = "..."] or #[constraint = "...code..."] if we wish, and use them to control the code we generate (much like serde does). But I think it would work better to collect more examples of constraints before trying to design the actual syntax.

@emk
Copy link

emk commented Mar 12, 2017

Big research dump ahead!

This discussion has links to some prior art and research. There's also an interesting discussion of the tradeoffs involved in default instance distribution, although this mostly seems to affect the size distribution of recursive data structures such as lists (that are considerably rarer in Rust than Haskell). The two papers they mention can be found here and here. While these papers look interesting, I'm not sure if they're directly relevant.

So there are some reasons why #[derive(Arbitrary)] seems slightly sketchy in Haskell, but these reasons don't seem to be especially relevant to Rust.

Another question is the relationship between #[derive(Arbitrary)] and instance invariants. For example, my Rect type should never have left + width large enough to overflow usize, regardless of whether it was created by Arbitrary or by a new method—or by updating it with a &mut self method. This could be represented by a trait Invariant with a method that checks whether the invariant holds. (There are sketches of related ideas here (design by contract) and here (a wrapper type which enforces restrictions on a wrapped type), but neither is mature enough for what we need here.)

So there are a bunch of ways to tackle this, but it's hard to proceed without more examples. So let's dig a bit:

Here's a code snippet from nalgebra demonstrating reject:

#[cfg(feature="arbitrary")]
impl<N, S> Arbitrary for PerspectiveBase<N, S>
    where N: Real + Arbitrary,
          S: OwnedStorage<N, U4, U4> + Send,
          S::Alloc: OwnedAllocator<N, U4, U4, S> {
    fn arbitrary<G: Gen>(g: &mut G) -> Self {
        let znear  = Arbitrary::arbitrary(g);
        let zfar   = helper::reject(g, |&x: &N| !(x - znear).is_zero());
        let aspect = helper::reject(g, |&x: &N| !x.is_zero());

        Self::new(aspect, Arbitrary::arbitrary(g), znear, zfar)
    }
}

So custom Arbitrary impls seem to fall into several categories:

  1. A biased distribution of an underlying type, typically implemented as a wrapper.
  2. Trivial impls that can be derived with no customization.
  3. Implementations that use per-field filtering using reject to meet various constraints.

I think it would be both desirable and possible to automate (1) somehow, but that's a separate discussion. I'm interested in (2) and (3). I could imagine a syntax like:

#[derive(Arbitrary, Clone)]
pub struct Rect {
    left: usize,
    top: usize,
    #[arbitrary(constraint = "left.checked_add(width).is_some()")]
    width: usize,
    #[arbitrary(constraint = "top.checked_add(height).is_some()")]
    height: usize,
}

A naive implementation might write:

let left = g.gen();
let width = g.gen_constrained(|width| left.checked_add(width).is_some());
let top = g.gen();
let height = g.gen_constrained(|height| top.checked_add(height).is_some());
Rect { left: left, width: width, top: top, height: height }

...but this would tend to fail if left is picked too close to 2^64-1, unless width generates 0 values fairly often. So we might want (not actually valid code; it needs to continue somewhere):

for _ in 0..MAX_TRIES {
    let left = g.gen();
    let width = g.gen_constrained(|width| left.checked_add(width).is_some());
    let top = g.gen();
    let height = g.gen_constrained(|height| top.checked_add(height).is_some());
    return Rect { left: left, width: width, top: top, height: height };
}
panic!("could not generate instance of Rect; check your constraints");

Anyway, this is just a very rough sketch. But it would work for the examples of cases (2) and (3) that I found, and it wouldn't be especially hard to implement on stable Rust. What are some possible ways we could tweak this to make it better?

emk added a commit to emk/subtitles-rs-old that referenced this issue Mar 12, 2017
Some of this code could theoretically be generated using Macros 1.1; see
BurntSushi/quickcheck#98 for further discussion.
@remexre
Copy link
Contributor

remexre commented May 18, 2017

I'm gonna take a swing at this without constraints; I'll post here if I get it working.

@remexre
Copy link
Contributor

remexre commented May 18, 2017

https://github.com/remexre/quickcheck_derive seems to be working? If people want to try it (and rate my first proc-macro), feel free. I'll probably move it into this repo's workspace and PR once I've implemented constraints and shrinking.

edit: Use https://github.com/remexre/quickcheck/tree/derive instead, I'm planning to PR soon

@remexre
Copy link
Contributor

remexre commented May 19, 2017

What should the behavior be for trying to derive Arbitrary on an empty enum? It doesn't really make sense to do, so maybe it should be disallowed? But I might want to have e.g. an Arbitrary Either<A, Void> or another generic enum that contains a Void.

Right now, it results in a run-time panic, which is probably the least optimal solution.

@remexre
Copy link
Contributor

remexre commented May 19, 2017

Also, I've done a bit of work on constraints, and it's not been great. I'd actually like to see Shrinkable as a separate trait, and valid(&Self) -> bool (with a default impl of always-true) on the Arbitrary trait; I don't think there's a reasonable/sane way to allow the user to specify one function of a derive for shrinking, and the checking for constraints is essentially orthogonal to generating the values, and the logic gets thorny when I try to put them in one function.

I'm going to push what I've got (as https://github.com/remexre/quickcheck_derive/tree/constraints-dead-end), then work on a the above approach.

If anyone has better ideas, certainly share them.

@vi
Copy link
Contributor Author

vi commented May 19, 2017

Maybe to handle empty enums properly, Arbitrary API

fn arbitrary<G: Gen>(g: &mut G) -> Self;

should be changed to return Option<Self> or Vec<Self>?

@BurntSushi
Copy link
Owner

We're not changing the API of the Arbitrary trait to accommodate a niche corner case like void types. Panicing seems fine to me, since if that panic occurs, there's a bug somewhere, no?

@remexre Not sure what to do about Shrinkable because I don't really understand the problems you're facing. I don't have time to dive into your code, so if you could describe the problem here I might be able to give feedback. :-)

@remexre
Copy link
Contributor

remexre commented May 19, 2017

I don't see trying to test an enum containing a Void (possibly because of generic parameters) as a bug in the user's code per se, but I agree that it probably doesn't merit an API change. The problem is that the code gets a lot more complex when I'm trying to figure out whether an enum branch is possible or not as I'm doing codegen. I'm leaning towards it being a compile-time error for now; the panic is from rand, not quickcheck, so it's a bit bad UX.

The problem with Shrinkable is that I can forsee users wanting custom shrink() implementations, but still wanting to derive arbitrary(). For example, num::Complex and num::rational::Ratio probably want different shrink() implementations, despite having essentially the same internal layout (a struct of two T's). Working around this with an attribute #[arbitrary(shrink_with = "my_custom_function")] probably would be possible, but a bit ugly, I suppose.

Splitting Shrinkable into a different trait, though, would let users impl Shrinkable separately from the derive, and a separate derive for Shrinkable could be added for those who want the default behavior.

@BurntSushi
Copy link
Owner

BurntSushi commented May 19, 2017

@remexre RE shrinkable: That's interesting. I grant that the use case is valid. I wonder how often that's required though, and when it is, I imagine writing out the Arbitrary impl wouldn't be that onerous. Generating values tends to be much easier than shrinking in my experience. That is, as a user, I probably want to derive Arbitrary to get automatic shrinking rather than automatic generation.

But, I am open to splitting the trait, but I'm kind of leaning towards "there's not enough justification" presently...

@remexre
Copy link
Contributor

remexre commented May 22, 2017

Okay, for now I'll write it so I could add #[arbitrary(shrink_with = "my_custom_function")] later if need be. I also totally forgot that you could have nested functions, so I can use that for valid(&self) -> bool. I should have a basic PR ready by next Sunday, although that depends how my work schedule ends up.

@remexre
Copy link
Contributor

remexre commented May 29, 2017

This can probably be closed when #175 gets merged.

@8573
Copy link

8573 commented Aug 25, 2018

I note that there now is a third-party quickcheck_derive crate, by @panicbit and @llogiq.

@panicbit
Copy link

@8573 The implementation in #175 looks more feature rich

@BurntSushi
Copy link
Owner

#222 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants