Skip to content
This repository has been archived by the owner on Jan 12, 2020. It is now read-only.

Use fuzzing to drive property testing #7

Merged
merged 5 commits into from Sep 9, 2018
Merged

Use fuzzing to drive property testing #7

merged 5 commits into from Sep 9, 2018

Conversation

blt
Copy link
Owner

@blt blt commented Aug 25, 2018

This series of commits introduces the use of arbitrary and honggfuzz-rs in driving property tests, as discussed briefly by @Shnatsel and myself in #2. The code is now split between the library bits -- the Op, Arbitrary definition etc -- and the interpreter loop, now in the fuzz target called hash_map. If you have honggfuzz-rs installed you can run the fuzz target like:

> cargo hfuzz run hash_map

Installation instructions for honggfuzz can be found at the above project link.

In terms of finding bugs, this was effective at finding the panic crash resulting from @llogiq's introduction of Op::Reserve. The problem, as discussed in 9596448, is that we can make very large capacity requests of the HashMap and it will be unable to allocate memory to meet that request, causing a panic. HashMap does not publish its overhead per pair so we cannot determine if a reservation will fail ahead of time. The check we have in place mostly guards against this but fuzzing managed to turn up truly large reservation failures. It was neat.

The downside to this approach is there's no tooling support for it: no shrinking, failure reports are given as the byte inputs. Dropping into a debugger works well enough but it's not the best.

There's some problems with the available QC tools, at least for the
purposes of this project. Namely:

 * the Rust testing framework limits the amount of concurrency
   available, starving some tests of runtime
 * there is no distribution available
 * test-case generation can be slow

Considering we're randomly searching through a state space the name of
the game is speed. To that end, I have in mind a notion of
QuickChecking which operates first in fuzzer mode--quickly throwing
random nonsense into your program to find crashes, track branch
conditions in the program--and, once a crash has been found, switches
into QC mode to shrink the found case. This commit does not have that
mode operation notion, only shuffles code around to run a property
test under AFL. There is no shrinking.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
This commit adjust the fuzzer from AFL to honggfuzz. Honggfuzz has the
advantage of being multi-threaded by default but, honestly, the
initial big draw was easy hook-in to lldb. The AFL fuzzer from the
previous commit 'detected' HashMap panicing when we requested too much
capacity but I found that an easy call to "cargo hfuzz run-debug
hash_map" made the failure much more clear.

The main change here is around the Reserve operation, limited now to a
smallish reservation bump. This avoids the case where HashMap will
panic on call to its `reserve`, being unable to signal that its
allocation failed. The model is also careful about requested too much
capacity, as noted. If HashMap published its overhead per pair we
could calculate when the allocations would fail but it, reasonably,
does not publish such information.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
I neglected to remove quickcheck from `lib.rs`, though it is no longer
a project dependency. This caused test build of the project to fail.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@Shnatsel
Copy link

Oooh, this is exciting! Makes me want to hook up my brand new hack into this and see what happens.

AFL also supports testcase minimization (and performs it as it goes along even on valid inputs), which I find essential, although it's unclear how well it would work for such a test setup. Honggfuzz does not support testcase minimization at all. AFL also has a crash exploration mode that lets you find more crashing inputs given an existing one, which lets you easily gauge the exploitability of a bug. Plus their fuzzing strategies complement each other nicely. So I guess it'd make sense to support both.

Swapping honggfuzz for AFL should be trivial - they share the fuzz! macro syntax, all you have to do is swap the extern crate and perhaps very minor additions otherwise. I'll see if I can hook up a harness to support both, a-la https://github.com/rust-fuzz/targets

AFL integration is not very Rust-y, so its users would have to consult the upstream README, but it's designed to be foolpoof. I'm already familiar with it, so I'll jump in to write the glue and quickstart guide as soon as I'm done with my current round of fuzzing using the aforementioned hack (with which I already got a zero-day).

@Shnatsel
Copy link

The downside to this approach is there's no tooling support for it: no shrinking...

I'll see if AFL's minimization is applicable

failure reports are given as the byte inputs.

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

@blt
Copy link
Owner Author

blt commented Aug 26, 2018

Oooh, this is exciting! Makes me want to hook up my brand new hack into this and see what happens.

I'm curious too!

AFL also supports testcase minimization (and performs it as it goes along even on valid inputs), which I find essential, although it's unclear how well it would work for such a test setup.

That's tricky, I think, since the fuzzer's view of the input is a byte blob and the checker view of the input is a stream of Arbitrary instances derived from the byte blob. have a vague notion that to support shrinking we'll need a two-phased approach: fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs. This is... hard. In this PR I've moved the test from generating an ops: Vec<Op<u16, u16>> to producing a stream of Op<u16, u16> instead. This has benefits for search performance--the test spends less time building its inputs upfront--but means there's not really anything to call shrink on per se. What I think I'll experiment with is cooking up an Arbitrary producing stream which takes some size. When it comes time to shrink, we call shrink on the stream's size and shrink on every resulting element out of that stream. I guess this implies introducing a fuzz_qc! macro to introduce the shrinking step.

Anyway, somewhat thinking off the top of my head here.

Honggfuzz does not support testcase minimization at all. AFL also has a crash exploration mode that lets you find more crashing inputs given an existing one, which lets you easily gauge the exploitability of a bug. Plus their fuzzing strategies complement each other nicely. So I guess it'd make sense to support both.

Swapping honggfuzz for AFL should be trivial - they share the fuzz! macro syntax, all you have to do is swap the extern crate and perhaps very minor additions otherwise. I'll see if I can hook up a harness to support both, a-la https://github.com/rust-fuzz/targets

I'm game to support both! In the commit history you can see I started with AFL and switched to honggfuzz. In fact, it's fair to say that I'm seeing the fuzzer as a smarter source of randomness than hardware RNG for a QuickCheck system.

AFL integration is not very Rust-y, so its users would have to consult the upstream README, but it's designed to be foolpoof. I'm already familiar with it, so I'll jump in to write the glue and quickstart guide as soon as I'm done with my current round of fuzzing using the aforementioned hack (with which I already got a zero-day).

Cool! and congrats on the zero-day. I'm very curious to see what could be turned up in std with the hack in place.

The downside to this approach is there's no tooling support for it: no shrinking...

I'll see if AFL's minimization is applicable

Neat. It's not something I'm knowledgeable about so I'll be very curious to know what you turn up.

failure reports are given as the byte inputs.

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

Probably! I've got a couple of vague ideas tickling my brain right now but I need to let them sit there. Might write a few more fuzz targets with this current approach.

Speaking of, I'm inclined to merge this up to master after adjusting the README and move forward with the marriage of fuzzing and property testing based off this one target. QuickCheck didn't turn up the reserve crash while both AFL and honggfuzz did within seconds. The developer ergonomics are a bit weird right now, but that's okay.

@Shnatsel
Copy link

fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs.

Once a crash is discovered, AFL's crash exploration mode can find shorter streams that still produce a crash completely automatically. All you need is your regular fuzz target, you don't have to write a single line of extra code.

@blt
Copy link
Owner Author

blt commented Aug 26, 2018

fuzz to find a crash, pass to a shrinking step to find a 'smaller' example, 'smaller' here being defined as calling shrink on the test inputs.

Once a crash is discovered, AFL's crash exploration mode can find shorter streams that still produce a crash completely automatically. All you need is your regular fuzz target, you don't have to write a single line of extra code.

Interesting, okay. This is something I need to read up on.

I guess my thinking was, AFL will find a shorter byte blob to push into the program but this doesn't necessarily map to a shorter test input. For example, if a 512 byte blob were found to cause a crash in this target and a 256 byte blob were derived from that but the new, shorter blob remapped the total_ops bytes to some larger value then the test input will grow even if the byte blob itself is smaller.

@Shnatsel
Copy link

Ah, total_ops as an explicitly set variable does indeed interfere with fuzzing. How about ditching it and simply reading from the buffer as long as random bytes are available in it? So instead of ring buffer you'd use a vector or some such and stop the moment the iterator is exhausted. This would let AFL testcase minimization do its job.

@blt
Copy link
Owner Author

blt commented Aug 26, 2018

You know, I don't see why that shouldn't work. The arbitrary crate will need a little work, but nothing too nuts.

@llogiq
Copy link
Contributor

llogiq commented Aug 26, 2018

Just to be completely fair, quickcheck also found the capacity overflow until I added aa check to avoid it.

@blt
Copy link
Owner Author

blt commented Aug 27, 2018

@llogiq hmm, except I don't think the capacity overflow check functions? Or, it does avoid the problem but doesn't resolves it. The reservation will panic if "the new allocation size overflows usize" but we don't have insight into how big of an allocation HashMap is going to make. Checking that the capacity doesn't overflow usize is independent and there's still a crash to be found if you can search the state space sufficiently.

Here's a program that exhaustively searches for a reservation panic on a HashMap with a very large key:

use std::collections::HashMap;

type Key = (u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64);
type Value = u16;

fn main() {
    let	mut capacity: usize = 0;
    let mut map: HashMap<Key, Value> = HashMap::new();

    for r in 0..u32::max_value() {
        let res = 2_usize.pow(r);
	capacity += res;
        println!("[PLAIN] cap: {:016} | max {}", capacity, usize::max_value());
        map.reserve(res);
    }
}

This panics having just output [PLAIN] cap: 0000017179869183 | max 18446744073709551615. Here's a version with the checked add:

use std::collections::HashMap;

type Key = (u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64, u64);
type Value = u16;

fn main() {
    let mut capacity: usize = 0;
    let mut map: HashMap<Key, Value> = HashMap::with_capacity(capacity);

    for r in 0..u32::max_value() {
        let res = 2_usize.pow(r);
        if capacity.checked_add(res).is_some() {
            capacity += res;
            println!(
                "[PROTECTED] cap: {:016} | max {}",
                capacity,
                usize::max_value()
            );
            map.reserve(res);
	}
    }
}

This panics having output [PROTECTED] cap: 0000017179869183 | max 18446744073709551615. Both programs die at the same reservation step and the capacity is well below usize max.

This commit introduces FiniteBuffer into the hash map
fuzz. FiniteBuffer, unlike RingBuffer, does not produce an infinite
amount of data but will limit production to the total number of bytes
available in `data`. This change means we have to guard each Arbitrary
and it's possible that if the fuzzer is not correctly configured we'll
only check small total operations. But, it is quite a bit faster and
fuzzers that do minimization -- like AFL -- are able to eat into the
shrink step needed for proper quickchecking.

This is promising, I think.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Clippy has changed its lint names to scoped form. This plays with
CI pipelines using stable release until 'tool_lint' feature lands.
Seems like this is targeted for the 2018 edition so, uh, this
change will surely be reverted in the near term.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt
Copy link
Owner Author

blt commented Sep 9, 2018

I'm going to proceed with using a fuzzer as the search stage of a QC implementation. Depending on fuzzer, there will be some minimization applied before a shrinking stage as a QC implementation would carry it out. There's no bugs turned up here yet so, uh, I'll punt on smoothing out the interface for fuzzing+QC until later.

@blt blt merged commit 2b6450d into master Sep 9, 2018
@blt blt deleted the fuzzer_as_qc branch September 9, 2018 15:53
@jakubadamw
Copy link

jakubadamw commented Aug 18, 2019

I imagine this can be solved with a println! in the interpreter, along the lines of "Calling function F with parameters X, Y, Z". That's a lot of boilerplate, though; perhaps a procedural macro would help?

@Shnatsel, @blt, if you're interested, I am experimenting in that direction in https://github.com/jakubadamw/arbitrary-model-tests. It's based on honggfuzz. When a crash is found, you cargo hfuzz run-debug the test, which gets you straight into lldb, and there all you do is inspect an _op_trace variable, which contains the quasi-equivalent code for all operations leading to a crash. Here's an example of a dump I get for an issue in HashMap that I am investigating today. From lldb I do:

(lldb) frame s 5
(lldb) write op_trace.dump p _op_trace

(where write is an lldb script from https://github.com/4iar/lldb-write)

and I get a file with:

()
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(56389, 27242);
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(27242, 27242);
v.insert(27243, 27242);
v.insert(29517, 57394);
v.values();
v.insert(6040, 2394);
v.insert(32582, 1480);
v.get(&13621);
v.get(&13113);
v.get(&25652);
v.remove(&29517);
v.keys();
v.insert(63016, 47943);
v.contains_key(&49944);
v.values();
v.remove(&38828);
v.values_mut();
v.values();
v.remove(&32481);
v.iter_mut();
v.clear();

which then, of course, has to be manually adapted for a running test case but is otherwise close enough.

@Shnatsel
Copy link

That's very clever! I'm also experimenting with automatically generating fuzzing harnesses in https://github.com/Eh2406/auto-fuzz-test

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants