Use fuzzing to drive property testing #7

blt · 2018-08-25T22:33:27Z

This series of commits introduces the use of arbitrary and honggfuzz-rs in driving property tests, as discussed briefly by @Shnatsel and myself in #2. The code is now split between the library bits -- the Op, Arbitrary definition etc -- and the interpreter loop, now in the fuzz target called hash_map. If you have honggfuzz-rs installed you can run the fuzz target like:

> cargo hfuzz run hash_map

Installation instructions for honggfuzz can be found at the above project link.

In terms of finding bugs, this was effective at finding the panic crash resulting from @llogiq's introduction of Op::Reserve. The problem, as discussed in 9596448, is that we can make very large capacity requests of the HashMap and it will be unable to allocate memory to meet that request, causing a panic. HashMap does not publish its overhead per pair so we cannot determine if a reservation will fail ahead of time. The check we have in place mostly guards against this but fuzzing managed to turn up truly large reservation failures. It was neat.

The downside to this approach is there's no tooling support for it: no shrinking, failure reports are given as the byte inputs. Dropping into a debugger works well enough but it's not the best.

There's some problems with the available QC tools, at least for the purposes of this project. Namely: * the Rust testing framework limits the amount of concurrency available, starving some tests of runtime * there is no distribution available * test-case generation can be slow Considering we're randomly searching through a state space the name of the game is speed. To that end, I have in mind a notion of QuickChecking which operates first in fuzzer mode--quickly throwing random nonsense into your program to find crashes, track branch conditions in the program--and, once a crash has been found, switches into QC mode to shrink the found case. This commit does not have that mode operation notion, only shuffles code around to run a property test under AFL. There is no shrinking. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

This commit adjust the fuzzer from AFL to honggfuzz. Honggfuzz has the advantage of being multi-threaded by default but, honestly, the initial big draw was easy hook-in to lldb. The AFL fuzzer from the previous commit 'detected' HashMap panicing when we requested too much capacity but I found that an easy call to "cargo hfuzz run-debug hash_map" made the failure much more clear. The main change here is around the Reserve operation, limited now to a smallish reservation bump. This avoids the case where HashMap will panic on call to its `reserve`, being unable to signal that its allocation failed. The model is also careful about requested too much capacity, as noted. If HashMap published its overhead per pair we could calculate when the allocations would fail but it, reasonably, does not publish such information. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

I neglected to remove quickcheck from `lib.rs`, though it is no longer a project dependency. This caused test build of the project to fail. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

Shnatsel · 2018-08-25T23:45:54Z

Oooh, this is exciting! Makes me want to hook up my brand new hack into this and see what happens.

AFL also supports testcase minimization (and performs it as it goes along even on valid inputs), which I find essential, although it's unclear how well it would work for such a test setup. Honggfuzz does not support testcase minimization at all. AFL also has a crash exploration mode that lets you find more crashing inputs given an existing one, which lets you easily gauge the exploitability of a bug. Plus their fuzzing strategies complement each other nicely. So I guess it'd make sense to support both.

Swapping honggfuzz for AFL should be trivial - they share the fuzz! macro syntax, all you have to do is swap the extern crate and perhaps very minor additions otherwise. I'll see if I can hook up a harness to support both, a-la https://github.com/rust-fuzz/targets

AFL integration is not very Rust-y, so its users would have to consult the upstream README, but it's designed to be foolpoof. I'm already familiar with it, so I'll jump in to write the glue and quickstart guide as soon as I'm done with my current round of fuzzing using the aforementioned hack (with which I already got a zero-day).

Shnatsel · 2018-08-25T23:51:30Z

The downside to this approach is there's no tooling support for it: no shrinking...

I'll see if AFL's minimization is applicable

failure reports are given as the byte inputs.

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

blt · 2018-08-26T01:01:05Z

Oooh, this is exciting! Makes me want to hook up my brand new hack into this and see what happens.

I'm curious too!

AFL also supports testcase minimization (and performs it as it goes along even on valid inputs), which I find essential, although it's unclear how well it would work for such a test setup.

That's tricky, I think, since the fuzzer's view of the input is a byte blob and the checker view of the input is a stream of Arbitrary instances derived from the byte blob. have a vague notion that to support shrinking we'll need a two-phased approach: fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs. This is... hard. In this PR I've moved the test from generating an ops: Vec<Op<u16, u16>> to producing a stream of Op<u16, u16> instead. This has benefits for search performance--the test spends less time building its inputs upfront--but means there's not really anything to call shrink on per se. What I think I'll experiment with is cooking up an Arbitrary producing stream which takes some size. When it comes time to shrink, we call shrink on the stream's size and shrink on every resulting element out of that stream. I guess this implies introducing a fuzz_qc! macro to introduce the shrinking step.

Anyway, somewhat thinking off the top of my head here.

Honggfuzz does not support testcase minimization at all. AFL also has a crash exploration mode that lets you find more crashing inputs given an existing one, which lets you easily gauge the exploitability of a bug. Plus their fuzzing strategies complement each other nicely. So I guess it'd make sense to support both.

Swapping honggfuzz for AFL should be trivial - they share the fuzz! macro syntax, all you have to do is swap the extern crate and perhaps very minor additions otherwise. I'll see if I can hook up a harness to support both, a-la https://github.com/rust-fuzz/targets

I'm game to support both! In the commit history you can see I started with AFL and switched to honggfuzz. In fact, it's fair to say that I'm seeing the fuzzer as a smarter source of randomness than hardware RNG for a QuickCheck system.

AFL integration is not very Rust-y, so its users would have to consult the upstream README, but it's designed to be foolpoof. I'm already familiar with it, so I'll jump in to write the glue and quickstart guide as soon as I'm done with my current round of fuzzing using the aforementioned hack (with which I already got a zero-day).

Cool! and congrats on the zero-day. I'm very curious to see what could be turned up in std with the hack in place.

The downside to this approach is there's no tooling support for it: no shrinking...

I'll see if AFL's minimization is applicable

Neat. It's not something I'm knowledgeable about so I'll be very curious to know what you turn up.

failure reports are given as the byte inputs.

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

Probably! I've got a couple of vague ideas tickling my brain right now but I need to let them sit there. Might write a few more fuzz targets with this current approach.

Speaking of, I'm inclined to merge this up to master after adjusting the README and move forward with the marriage of fuzzing and property testing based off this one target. QuickCheck didn't turn up the reserve crash while both AFL and honggfuzz did within seconds. The developer ergonomics are a bit weird right now, but that's okay.

Shnatsel · 2018-08-26T01:08:14Z

fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs.

Once a crash is discovered, AFL's crash exploration mode can find shorter streams that still produce a crash completely automatically. All you need is your regular fuzz target, you don't have to write a single line of extra code.

blt · 2018-08-26T01:20:46Z

fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs.

Once a crash is discovered, AFL's crash exploration mode can find shorter streams that still produce a crash completely automatically. All you need is your regular fuzz target, you don't have to write a single line of extra code.

Interesting, okay. This is something I need to read up on.

I guess my thinking was, AFL will find a shorter byte blob to push into the program but this doesn't necessarily map to a shorter test input. For example, if a 512 byte blob were found to cause a crash in this target and a 256 byte blob were derived from that but the new, shorter blob remapped the total_ops bytes to some larger value then the test input will grow even if the byte blob itself is smaller.

Shnatsel · 2018-08-26T01:31:26Z

Ah, total_ops as an explicitly set variable does indeed interfere with fuzzing. How about ditching it and simply reading from the buffer as long as random bytes are available in it? So instead of ring buffer you'd use a vector or some such and stop the moment the iterator is exhausted. This would let AFL testcase minimization do its job.

blt · 2018-08-26T01:40:03Z

You know, I don't see why that shouldn't work. The arbitrary crate will need a little work, but nothing too nuts.

llogiq · 2018-08-26T08:44:05Z

Just to be completely fair, quickcheck also found the capacity overflow until I added aa check to avoid it.

blt · 2018-08-27T19:17:54Z

@llogiq hmm, except I don't think the capacity overflow check functions? Or, it does avoid the problem but doesn't resolves it. The reservation will panic if "the new allocation size overflows usize" but we don't have insight into how big of an allocation HashMap is going to make. Checking that the capacity doesn't overflow usize is independent and there's still a crash to be found if you can search the state space sufficiently.

Here's a program that exhaustively searches for a reservation panic on a HashMap with a very large key:

use std::collections::HashMap;

type Key = (u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64);
type Value = u16;

fn main() {
    let	mut capacity: usize = 0;
    let mut map: HashMap<Key, Value> = HashMap::new();

    for r in 0..u32::max_value() {
        let res = 2_usize.pow(r);
	capacity += res;
        println!("[PLAIN] cap: {:016} | max {}", capacity, usize::max_value());
        map.reserve(res);
    }
}

This panics having just output [PLAIN] cap: 0000017179869183 | max 18446744073709551615. Here's a version with the checked add:

use std::collections::HashMap;

type Key = (u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64);
type Value = u16;

fn main() {
    let mut capacity: usize = 0;
    let mut map: HashMap<Key, Value> = HashMap::with_capacity(capacity);

    for r in 0..u32::max_value() {
        let res = 2_usize.pow(r);
        if capacity.checked_add(res).is_some() {
            capacity += res;
            println!(
                "[PROTECTED] cap: {:016} | max {}",
                capacity,
                usize::max_value()
            );
            map.reserve(res);
	}
    }
}

This panics having output [PROTECTED] cap: 0000017179869183 | max 18446744073709551615. Both programs die at the same reservation step and the capacity is well below usize max.

This commit introduces FiniteBuffer into the hash map fuzz. FiniteBuffer, unlike RingBuffer, does not produce an infinite amount of data but will limit production to the total number of bytes available in `data`. This change means we have to guard each Arbitrary and it's possible that if the fuzzer is not correctly configured we'll only check small total operations. But, it is quite a bit faster and fuzzers that do minimization -- like AFL -- are able to eat into the shrink step needed for proper quickchecking. This is promising, I think. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

Clippy has changed its lint names to scoped form. This plays with CI pipelines using stable release until 'tool_lint' feature lands. Seems like this is targeted for the 2018 edition so, uh, this change will surely be reverted in the near term. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

blt · 2018-09-09T15:50:58Z

I'm going to proceed with using a fuzzer as the search stage of a QC implementation. Depending on fuzzer, there will be some minimization applied before a shrinking stage as a QC implementation would carry it out. There's no bugs turned up here yet so, uh, I'll punt on smoothing out the interface for fuzzing+QC until later.

jakubadamw · 2019-08-18T15:31:07Z

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

@Shnatsel, @blt, if you're interested, I am experimenting in that direction in https://github.com/jakubadamw/arbitrary-model-tests. It's based on honggfuzz. When a crash is found, you cargo hfuzz run-debug the test, which gets you straight into lldb, and there all you do is inspect an _op_trace variable, which contains the quasi-equivalent code for all operations leading to a crash. Here's an example of a dump I get for an issue in HashMap that I am investigating today. From lldb I do:

(lldb) frame s 5
(lldb) write op_trace.dump p _op_trace

(where write is an lldb script from https://github.com/4iar/lldb-write)

and I get a file with:

(…)
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(56389, 27242);
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(27243, 27242);
v.insert(29517, 57394);
v.values();
v.insert(6040, 2394);
v.insert(32582, 1480);
v.get(&13621);
v.get(&13113);
v.get(&25652);
v.remove(&29517);
v.keys();
v.insert(63016, 47943);
v.contains_key(&49944);
v.values();
v.remove(&38828);
v.values_mut();
v.values();
v.remove(&32481);
v.iter_mut();
v.clear();

which then, of course, has to be manually adapted for a running test case but is otherwise close enough.

Shnatsel · 2019-08-27T18:57:43Z

That's very clever! I'm also experimenting with automatically generating fuzzing harnesses in https://github.com/Eh2406/auto-fuzz-test

blt added 3 commits August 25, 2018 13:16

Resolve test build failure

ad45847

I neglected to remove quickcheck from `lib.rs`, though it is no longer a project dependency. This caused test build of the project to fail. Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

8573 mentioned this pull request Aug 26, 2018

Explore fuzz testing 8573/irc-bot.rs#37

Open

blt force-pushed the fuzzer_as_qc branch from 60b2a23 to 1092e7f Compare September 9, 2018 15:39

blt force-pushed the fuzzer_as_qc branch from 1092e7f to 1e8544d Compare September 9, 2018 15:43

blt merged commit 2b6450d into master Sep 9, 2018

blt deleted the fuzzer_as_qc branch September 9, 2018 15:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use fuzzing to drive property testing #7

Use fuzzing to drive property testing #7

blt commented Aug 25, 2018

Shnatsel commented Aug 25, 2018

Shnatsel commented Aug 25, 2018

blt commented Aug 26, 2018

Shnatsel commented Aug 26, 2018

blt commented Aug 26, 2018

Shnatsel commented Aug 26, 2018

blt commented Aug 26, 2018

llogiq commented Aug 26, 2018

blt commented Aug 27, 2018

blt commented Sep 9, 2018

jakubadamw commented Aug 18, 2019 •

edited

Shnatsel commented Aug 27, 2019

Use fuzzing to drive property testing #7

Use fuzzing to drive property testing #7

Conversation

blt commented Aug 25, 2018

Shnatsel commented Aug 25, 2018

Shnatsel commented Aug 25, 2018

blt commented Aug 26, 2018

Shnatsel commented Aug 26, 2018

blt commented Aug 26, 2018

Shnatsel commented Aug 26, 2018

blt commented Aug 26, 2018

llogiq commented Aug 26, 2018

blt commented Aug 27, 2018

blt commented Sep 9, 2018

jakubadamw commented Aug 18, 2019 • edited

Shnatsel commented Aug 27, 2019

jakubadamw commented Aug 18, 2019 •

edited