Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement a generator based on Arbitrary for compatibility #108

Merged
merged 22 commits into from Feb 20, 2023

Conversation

Ekleog
Copy link
Contributor

@Ekleog Ekleog commented Jan 7, 2023

Fixes #106
Fixes #31

bolero-generator/Cargo.toml Outdated Show resolved Hide resolved
bolero-generator/src/arbitrary.rs Outdated Show resolved Hide resolved
let size = T::size_hint(0);
let mut data = match T::size_hint(0) {
(min, Some(max)) if max < ABUSIVE_SIZE => gen_with::<Vec<u8>>()
.len(min..=max)
.generate(&mut *driver)?,
(min, _) => gen_with::<Vec<u8>>().len(min).generate(&mut *driver)?,
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I appreciate not having to change anything, I'm thinking it might be more efficient to move this logic into the driver itself, since a lot of them are going to by byte-based anyway and we can potentially eliminate a copy. I'm thinking something like:

trait Driver {
    ...
    fn gen_from_bytes<F: FnOnce(&[u8]) -> Option<R>, R>(&mut self, len: Range<usize>, f: F) -> Option<R>;
}

I'm also wanting to support existing proptest and quickcheck generators so moving this into a shared location would make that easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be completely honest, I don't understand the DriverMode abstraction yet. I do understand that it's so that fuzzers can use Direct and cargo test can use Forced, but… why is it kept alongside the input for shrunk inputs, which indicates that byteslice drivers in forced mode do happen in certain circumstances? Why is it exposed from FillBytes and not from Driver? Why is there both DirectRng and ForcedRng despite the fact that a rng should have an infinite amount of data available anyway? (Ok for this one I actually know, it's because mode is also used for eg. ranged-int-generation) And, maybe even more than that, why is it even a property related to a Driver?

I'm thinking that semantically generating a value would require a ValueGenerator that shows how to generate a value, a Driver that gives a stream of data, and a Mode that says how to interpret invalid data. And right now, the Mode is merged inside the Driver.

But we could have instead:

  • ValueGenerator as currently
  • Driver generates a possibly-finite stream of data. ExtendWithZeros(driver) extends a driver with zeroes to emulate current Forced mode for byteslice and rng.
  • enum GenerationMode { FailIfInvalid, RepairIfInvalid } is the current DriverMode, and is not a property of the driver nor is it a property of the generator.

Then, when generating a value, it just needs additionally setting the driver mode. cargo test could still default to RepairIfInvalid & ExtendWithZeros and cargo bolero test to FailIfInvalid.

Does that make sense?

Anyway, I've implemented what you suggested in this PR, but it ended up requiring quite a few other refactorings. WDYT? (I've also significantly reduced ABUSIVE_SIZE, so that tests pass quickly without generating megabytes worth of data for a Vec. Anyway, arbitrary and similar were designed to work well with fuzzers, that rarely generate more than 4-8k worth of data)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there may be some conflation of the drivers and modes that could be simplified. But I don't want to conflate those changes with this PR. I think it looks great as is!

bolero-generator/src/arbitrary.rs Outdated Show resolved Hide resolved
@Ekleog
Copy link
Contributor Author

Ekleog commented Jan 22, 2023

I've also just implemented the API suggested in #31, so that this PR could solve it alongside #106 :)

bolero/src/lib.rs Outdated Show resolved Hide resolved
let size = T::size_hint(0);
let mut data = match T::size_hint(0) {
(min, Some(max)) if max < ABUSIVE_SIZE => gen_with::<Vec<u8>>()
.len(min..=max)
.generate(&mut *driver)?,
(min, _) => gen_with::<Vec<u8>>().len(min).generate(&mut *driver)?,
};
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there may be some conflation of the drivers and modes that could be simplified. But I don't want to conflate those changes with this PR. I think it looks great as is!

Comment on lines 147 to 158
fn gen_from_bytes<Gen, T>(
&mut self,
_len: std::ops::RangeInclusive<usize>,
mut gen: Gen,
) -> Option<T>
where
Gen: FnMut(&[u8]) -> Option<(usize, T)>,
{
let value = gen(kani::any()).map(|v| v.1);
kani::assume(value.is_some());
value
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work with kani, since it doesn't implement any for &[u8] (I'm actually surprised it compiled - I will need to look into that separately).

Let's do this instead:

Suggested change
fn gen_from_bytes<Gen, T>(
&mut self,
_len: std::ops::RangeInclusive<usize>,
mut gen: Gen,
) -> Option<T>
where
Gen: FnMut(&[u8]) -> Option<(usize, T)>,
{
let value = gen(kani::any()).map(|v| v.1);
kani::assume(value.is_some());
value
}
fn gen_from_bytes<Gen, T>(
&mut self,
len: std::ops::RangeInclusive<usize>,
mut gen: Gen,
) -> Option<T>
where
Gen: FnMut(&[u8]) -> Option<(usize, T)>,
{
let bytes = kani::vec::any_vec::<u8, 256>();
kani::assume(len.contains(&bytes.len()));
let value = gen(&bytes).map(|v| v.1);
kani::assume(value.is_some());
value
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably due to the #[cfg(not(kani))] stubs at the top of the file. TBH I have literally no idea how to use kani, so I wrote something that seemed to work and tests passed 😅 I guess the CI also runs without kani enabled?

Anyway, I've updated the code to your version and just added a stub above so that tests pass :)

rust-toolchain Outdated Show resolved Hide resolved
cargo doesn't think of taking old arbitrary versions and otherwise fails with
error: package `arbitrary v1.2.3` cannot be built because it requires rustc 1.63.0 or newer, while the currently active rustc version is 1.57.0
@Ekleog
Copy link
Contributor Author

Ekleog commented Jan 28, 2023

Turns out cargo fails to build arbitrary on earlier than 1.63: it can't think of taking older versions than the last, IIRC that's a bug that's already been reported to cargo.

I think bumping the MSRV make sense: there's no reason why bolero should be more stable than arbitrary I think, seeing how arbitrary is the current go-to standard 😅 and the next release must be a major one anyway given it changed the api of Driver.

That said if keeping the current MSRV at 1.57 is important to you, I can try to wiggle things around so that tests run with 1.57 for non-arbitrary and 1.63 for arbitrary :)

@Ekleog
Copy link
Contributor Author

Ekleog commented Jan 28, 2023

The failing clippy checks appear to be unrelated to this PR again. FWIW, this is probably due to just taking in the latest clippy rather than pinning a specific version and updating it from time to time. I personally use nix to handle this in my projects, and can submit a PR using it if you want, but lots of people don't like nix so I won't by default :)

As for the unit tests failing, they're due to the fact that I changed the behavior to always request from the arbitrary-requested range rather than limiting to ABUSIVE_SIZE when using DriverMode::Direct; which in turn completely breaks DirectRng-based arbitrary generation. I think the current code makes much more sense because it's actually in line with what DriverMode::Direct means. But I'm not yet sure just removing DirectRng is a possible option? So depending on the results of discussion at #117 I guess I may have to roll back these changes and limit the size given to arbitrary on direct-mode too.

@Ekleog Ekleog force-pushed the gen-arbitrary branch 3 times, most recently from 0dd6d83 to e5c1f2e Compare January 28, 2023 22:51
@Ekleog
Copy link
Contributor Author

Ekleog commented Jan 28, 2023

Oh or maybe the clippy lints are due to the MSRV bump? Anyway, I've just fixed hopefully all the test errors, hopefully this is good to go if you agree with #108 (comment) :)

@@ -44,7 +44,7 @@ jobs:
strategy:
fail-fast: false
matrix:
rust: [1.57.0, stable, beta, nightly]
rust: [1.63.0, stable, beta, nightly]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a compelling reason to bump MSRV here, especially considering arbitrary is an optional dependency. My preference would be to just not enable it for the MSRV tests.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bolero is a bit different than cargo-fuzz/arbitrary in this area since we allow writing basic unit tests mixed in with your application code. Cargo-fuzz, OTOH, forces you to put them in completely separate crates that don't have the same compatibility requirements as your main product.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While what you're saying is definitely right for cargo-fuzz vs. bolero, I think bolero-generator and arbitrary have basically the same goal: have all crates expose implementation of them, which means they need to be as stable as possible.

Anyway, I'll look into figuring out how to get cargo test --features arbitrary run with 1.63 and stay on 1.57 for the other tests when I look into it next, I'll just need to revert the changes made for clippy in addition to that :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have all crates expose implementation of them, which means they need to be as stable as possible.

Yeah that's another great point to consider here!

Sorry for the extra work with the clippy stuff! But I think other than reverting it, we should be good to merge!

Copy link
Owner

@camshaft camshaft Feb 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it looks like miniz_oxide just broke compatibility with rust versions up to 1.59.0... And we indirectly depend on that through the backtrace crate. I think we can just bump MSRV to 1.60.0 and call it good (the formatting feature should be available there).

Edit: although i guess we still have the problem of arbitrary's MSRV being 1.63... ☹️

Copy link
Contributor Author

@Ekleog Ekleog Feb 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I personally don't believe in MSRV 😅 If rust-lang/cargo#9930 were implemented then I'd believe in it, but until it's done I don't think it makes sense to actually worry about it (and even once done I still don't think it'd actually be useful, but… I mean sure people want stability for the user, but the ones actually doing the program building can stomach keeping their toolchains up-to-date, and I say that as maintainer for the nixos distribution 😅 )

IMO having the MSRV being older than whatever is in debian stable is pointless, and debian stable currently has 1.63 — I guess that's also the reason why arbitrary is also living with a 1.63 MSRV.

Anyway, I've reverted the clippy changes and instead fixed the CI to handle two different MSRVs depending on the feature set, so this should be ready to go! But let me know if you change your mind, I still have the tip of the branch around! :)

@camshaft
Copy link
Owner

camshaft commented Feb 1, 2023

The clippy issues should be addressed by setting our MSRV in the clippy config

https://doc.rust-lang.org/clippy/configuration.html#specifying-the-minimum-supported-rust-version

@Ekleog
Copy link
Contributor Author

Ekleog commented Feb 19, 2023

@camshaft I think I handled all your review comments, WDYT about the current state of this PR? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

impl TypeGenerator for chrono::DateTime, chrono_tz::Tz, etc? Add support for Arbitrary generators
2 participants