New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement a generator based on Arbitrary for compatibility #108
Conversation
bolero-generator/src/arbitrary.rs
Outdated
let size = T::size_hint(0); | ||
let mut data = match T::size_hint(0) { | ||
(min, Some(max)) if max < ABUSIVE_SIZE => gen_with::<Vec<u8>>() | ||
.len(min..=max) | ||
.generate(&mut *driver)?, | ||
(min, _) => gen_with::<Vec<u8>>().len(min).generate(&mut *driver)?, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I appreciate not having to change anything, I'm thinking it might be more efficient to move this logic into the driver itself, since a lot of them are going to by byte-based anyway and we can potentially eliminate a copy. I'm thinking something like:
trait Driver {
...
fn gen_from_bytes<F: FnOnce(&[u8]) -> Option<R>, R>(&mut self, len: Range<usize>, f: F) -> Option<R>;
}
I'm also wanting to support existing proptest and quickcheck generators so moving this into a shared location would make that easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be completely honest, I don't understand the DriverMode
abstraction yet. I do understand that it's so that fuzzers can use Direct
and cargo test can use Forced
, but… why is it kept alongside the input for shrunk inputs, which indicates that byteslice drivers in forced mode do happen in certain circumstances? Why is it exposed from FillBytes
and not from Driver
? Why is there both DirectRng
and ForcedRng
despite the fact that a rng should have an infinite amount of data available anyway? (Ok for this one I actually know, it's because mode is also used for eg. ranged-int-generation) And, maybe even more than that, why is it even a property related to a Driver
?
I'm thinking that semantically generating a value would require a ValueGenerator
that shows how to generate a value, a Driver
that gives a stream of data, and a Mode
that says how to interpret invalid data. And right now, the Mode
is merged inside the Driver
.
But we could have instead:
ValueGenerator
as currentlyDriver
generates a possibly-finite stream of data.ExtendWithZeros(driver)
extends a driver with zeroes to emulate currentForced
mode for byteslice and rng.enum GenerationMode { FailIfInvalid, RepairIfInvalid }
is the currentDriverMode
, and is not a property of the driver nor is it a property of the generator.
Then, when generating a value, it just needs additionally setting the driver mode. cargo test
could still default to RepairIfInvalid & ExtendWithZeros
and cargo bolero test
to FailIfInvalid
.
Does that make sense?
Anyway, I've implemented what you suggested in this PR, but it ended up requiring quite a few other refactorings. WDYT? (I've also significantly reduced ABUSIVE_SIZE
, so that tests pass quickly without generating megabytes worth of data for a Vec
. Anyway, arbitrary
and similar were designed to work well with fuzzers, that rarely generate more than 4-8k worth of data)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there may be some conflation of the drivers and modes that could be simplified. But I don't want to conflate those changes with this PR. I think it looks great as is!
bolero-generator/src/arbitrary.rs
Outdated
let size = T::size_hint(0); | ||
let mut data = match T::size_hint(0) { | ||
(min, Some(max)) if max < ABUSIVE_SIZE => gen_with::<Vec<u8>>() | ||
.len(min..=max) | ||
.generate(&mut *driver)?, | ||
(min, _) => gen_with::<Vec<u8>>().len(min).generate(&mut *driver)?, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there may be some conflation of the drivers and modes that could be simplified. But I don't want to conflate those changes with this PR. I think it looks great as is!
bolero-kani/src/lib.rs
Outdated
fn gen_from_bytes<Gen, T>( | ||
&mut self, | ||
_len: std::ops::RangeInclusive<usize>, | ||
mut gen: Gen, | ||
) -> Option<T> | ||
where | ||
Gen: FnMut(&[u8]) -> Option<(usize, T)>, | ||
{ | ||
let value = gen(kani::any()).map(|v| v.1); | ||
kani::assume(value.is_some()); | ||
value | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this will work with kani
, since it doesn't implement any
for &[u8]
(I'm actually surprised it compiled - I will need to look into that separately).
Let's do this instead:
fn gen_from_bytes<Gen, T>( | |
&mut self, | |
_len: std::ops::RangeInclusive<usize>, | |
mut gen: Gen, | |
) -> Option<T> | |
where | |
Gen: FnMut(&[u8]) -> Option<(usize, T)>, | |
{ | |
let value = gen(kani::any()).map(|v| v.1); | |
kani::assume(value.is_some()); | |
value | |
} | |
fn gen_from_bytes<Gen, T>( | |
&mut self, | |
len: std::ops::RangeInclusive<usize>, | |
mut gen: Gen, | |
) -> Option<T> | |
where | |
Gen: FnMut(&[u8]) -> Option<(usize, T)>, | |
{ | |
let bytes = kani::vec::any_vec::<u8, 256>(); | |
kani::assume(len.contains(&bytes.len())); | |
let value = gen(&bytes).map(|v| v.1); | |
kani::assume(value.is_some()); | |
value | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably due to the #[cfg(not(kani))]
stubs at the top of the file. TBH I have literally no idea how to use kani, so I wrote something that seemed to work and tests passed 😅 I guess the CI also runs without kani enabled?
Anyway, I've updated the code to your version and just added a stub above so that tests pass :)
cargo doesn't think of taking old arbitrary versions and otherwise fails with error: package `arbitrary v1.2.3` cannot be built because it requires rustc 1.63.0 or newer, while the currently active rustc version is 1.57.0
Turns out cargo fails to build arbitrary on earlier than 1.63: it can't think of taking older versions than the last, IIRC that's a bug that's already been reported to cargo. I think bumping the MSRV make sense: there's no reason why bolero should be more stable than arbitrary I think, seeing how arbitrary is the current go-to standard 😅 and the next release must be a major one anyway given it changed the api of Driver. That said if keeping the current MSRV at 1.57 is important to you, I can try to wiggle things around so that tests run with 1.57 for non-arbitrary and 1.63 for arbitrary :) |
The failing clippy checks appear to be unrelated to this PR again. FWIW, this is probably due to just taking in the latest clippy rather than pinning a specific version and updating it from time to time. I personally use nix to handle this in my projects, and can submit a PR using it if you want, but lots of people don't like nix so I won't by default :)
|
0dd6d83
to
e5c1f2e
Compare
Oh or maybe the clippy lints are due to the MSRV bump? Anyway, I've just fixed hopefully all the test errors, hopefully this is good to go if you agree with #108 (comment) :) |
.github/workflows/main.yml
Outdated
@@ -44,7 +44,7 @@ jobs: | |||
strategy: | |||
fail-fast: false | |||
matrix: | |||
rust: [1.57.0, stable, beta, nightly] | |||
rust: [1.63.0, stable, beta, nightly] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a compelling reason to bump MSRV here, especially considering arbitrary is an optional dependency. My preference would be to just not enable it for the MSRV tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bolero is a bit different than cargo-fuzz/arbitrary in this area since we allow writing basic unit tests mixed in with your application code. Cargo-fuzz, OTOH, forces you to put them in completely separate crates that don't have the same compatibility requirements as your main product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While what you're saying is definitely right for cargo-fuzz vs. bolero, I think bolero-generator and arbitrary have basically the same goal: have all crates expose implementation of them, which means they need to be as stable as possible.
Anyway, I'll look into figuring out how to get cargo test --features arbitrary
run with 1.63 and stay on 1.57 for the other tests when I look into it next, I'll just need to revert the changes made for clippy in addition to that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have all crates expose implementation of them, which means they need to be as stable as possible.
Yeah that's another great point to consider here!
Sorry for the extra work with the clippy stuff! But I think other than reverting it, we should be good to merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it looks like miniz_oxide just broke compatibility with rust versions up to 1.59.0... And we indirectly depend on that through the backtrace
crate. I think we can just bump MSRV to 1.60.0 and call it good (the formatting feature should be available there).
Edit: although i guess we still have the problem of arbitrary's MSRV being 1.63...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I personally don't believe in MSRV 😅 If rust-lang/cargo#9930 were implemented then I'd believe in it, but until it's done I don't think it makes sense to actually worry about it (and even once done I still don't think it'd actually be useful, but… I mean sure people want stability for the user, but the ones actually doing the program building can stomach keeping their toolchains up-to-date, and I say that as maintainer for the nixos distribution 😅 )
IMO having the MSRV being older than whatever is in debian stable is pointless, and debian stable currently has 1.63 — I guess that's also the reason why arbitrary is also living with a 1.63 MSRV.
Anyway, I've reverted the clippy changes and instead fixed the CI to handle two different MSRVs depending on the feature set, so this should be ready to go! But let me know if you change your mind, I still have the tip of the branch around! :)
The clippy issues should be addressed by setting our MSRV in the clippy config https://doc.rust-lang.org/clippy/configuration.html#specifying-the-minimum-supported-rust-version |
@camshaft I think I handled all your review comments, WDYT about the current state of this PR? :) |
Fixes #106
Fixes #31