Roadmap for 0.3.0 #261

bheisler commented Jan 25, 2019

I'm planning to take some time off working on Criterion.rs shortly, but before I do I'd like to describe my plans for 0.3.0 and request some comments from the community. 0.3.0 is the first breaking-change release since Criterion.rs became stable-compatible, and I have a handful of things I'd like to change.

Major Changes

Removing Benchmark/ParameterizedBenchmark

I initially created these as a generalization of the various Criterion::bench_* functions, and they mostly work, but there are a few flaws in the design. There isn't an obvious way to handle cases like #225, where the user wants to construct a large input once. The design also imposes strange limitations on the parameter types (why must they implement Debug?) and strange contortions to deal with lifetimes (#260). Finally, sometimes the user doesn't want the debug representation of the parameter to be used as the parameter portion of the benchmark ID - for example, if I'm testing a parser, I don't want my benchmark to be called "my_group/my_function/Some really really long file full of text that my parser parses but isn't really relevant to the benchmark...". After thinking about this for a while, I started to wonder whether we need these structures at all. After all, they basically just define a nested for loop:

for function in self.functions {
    for parameter in self.parameters {
        // Run a benchmark of this function with this parameter
    }
}

I'm sure users are capable of writing their own for loops, so why not let them write that loop themselves? Here's a sketch of what I'm thinking:

let criterion: &mut Criterion = ...;

// For simple benchmarks, the existing bench_function would still work.
criterion.bench_function("my_function", move |b| {
    b.iter(|| my_function())
});

// Benchmark and ParameterizedBenchmark, as well as bench_functions and 
// bench_function_over_inputs would be replaced with a new BenchmarkGroup struct
let mut benchmark_group = criterion.benchmark_group("Hashes");

// The user can compare different functions by just calling bench_function multiple times
// This would produce benchmarks called "Hashes/SHA1" and "Hashes/MD5".
benchmark_group.bench_function("SHA1", |b| b.iter(|| sha1sum("some input string"));
benchmark_group.bench_function("MD5", |b| b.iter(|| md5sum("some input string"));

// `finish` would consume the benchmark group structure and generate the summary reports.
// The benchmark group could be marked `must_use` to warn people if they forget?
benchmark_group.finish();

// Handling multiple parameters is straightforward
let mut benchmark_group = criterion.benchmark_group("SizedHashes");
for input_size in vec![10, 100, 1000, 10000, 100000] {
    // Now the user can generate large inputs directly without needing support from Criterion.rs
    let input = generate_hash_input(input_size);

    // The input size can be automatically converted to String with Debug (or should it be Display?)
    // or the user can provide their own string.
    // I'm imagining some trait-magic so this could accept either a string or a BenchmarkId, so
    // that the user doesn't have to provide a fake parameter string (see the sketch after this block).
    benchmark_group.bench_function(
        BenchmarkId::new("SHA1", input_size).with_throughput(Throughput::Bytes(input_size)),
        |b| b.iter(|| sha1sum(&input)),
    );
    // Not totally sold on the with_throughput thing here, will need to work on that a bit more.
    benchmark_group.bench_function(
        BenchmarkId::new("MD5", input_size).with_throughput(Throughput::Bytes(input_size)),
        |b| b.iter(|| md5sum(&input)),
    );
}
benchmark_group.finish();
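
As for the "trait magic" mentioned in the comments above, one rough shape would be to accept anything that converts into a BenchmarkId. This is only a sketch - the BenchmarkId fields and the Into-based bound are assumptions, not a settled design:

// Sketch only: bench_function would take `id: impl Into<BenchmarkId>`, so both a
// bare string and an explicit BenchmarkId would compile at the call site.
struct BenchmarkId {
    function_name: String,
    parameter: Option<String>,
}

impl BenchmarkId {
    fn new(function_name: &str, parameter: impl std::fmt::Display) -> BenchmarkId {
        BenchmarkId {
            function_name: function_name.to_owned(),
            parameter: Some(parameter.to_string()),
        }
    }
}

impl From<&str> for BenchmarkId {
    // A bare string names the function and leaves the parameter component empty.
    fn from(function_name: &str) -> BenchmarkId {
        BenchmarkId { function_name: function_name.to_owned(), parameter: None }
    }
}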

Now, this is obviously much more flexible than the current design. It would be trivial to extend this to benchmarking over multi-dimensional input spaces (just nest more for-loops) or to only benchmark certain functions for certain inputs, etc. It does put more burden on the user not to do anything silly, but I think that's reasonable. It would also require the report-generation code to be flexible enough to handle the oddball cases in a sensible way, but I don't think that will be too hard (alas, unless HTML gains the ability to describe multi-dimensional tables, I'll probably have to flatten the inputs down to one dimension for display).
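
For instance, a two-dimensional input space could look something like this. It still uses the provisional API from the sketch above, and sha1sum_chunked and generate_hash_input are hypothetical helpers:

// Sketch of a two-dimensional input space using the proposed BenchmarkGroup API.
// Flattening both parameters into one ID string is just one way to present this in the reports.
let mut benchmark_group = criterion.benchmark_group("ChunkedHashes");
for input_size in vec![1_000, 100_000] {
    for chunk_size in vec![64, 4096] {
        let input = generate_hash_input(input_size);
        let id = BenchmarkId::new("SHA1", format!("{}_bytes/{}_chunk", input_size, chunk_size));
        benchmark_group.bench_function(id, |b| b.iter(|| sha1sum_chunked(&input, chunk_size)));
    }
}
benchmark_group.finish();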

This would be a pretty major breaking-change, though; nearly everyone would have to update. If I go ahead with this, I might consider deprecating the existing APIs and hiding them from the docs, which would allow existing users to continue using them.

Preliminary Support for Custom Measurements

Criterion.rs has always measured wall-clock time, but users have been requesting other measurements (ranging from memory usage to CPU or even GPU performance counters) since before it was even released. I've been putting it off until I thought of a decent design, but I think I have one now. I'm thinking of something like this:

trait Measurement {
    type Intermediate;
    type Value;
    type Analysis: Analysis<Self::Value>;
    type Report: Report<<Self::Analysis as Analysis<Self::Value>>::Output>;

    fn start() -> Self::Intermediate;
    fn end(i: Self::Intermediate) -> Self::Value;
    // Probably some more functions to perform analysis and generate the report(s)
}

struct WallTime;
impl Measurement for WallTime {
    type Intermediate = Instant;
    type Value = Duration;
    type Analysis = TimeAnalysis;
    type Report = StatisticalReport;

    fn start() -> Instant { Instant::now() }
    fn end(i: Instant) -> Duration { i.elapsed() }
}

// For example...
struct MemoryUsage;
impl Measurement for MemoryUsage {
    type Intermediate = Bytes;
    type Value = BytesDifference;
    type Analysis = MemoryAnalysis;
    type Report = SingleValueReport;

    fn start() -> Bytes { get_bytes_allocated() }
    fn end(b: Bytes) -> BytesDifference { get_bytes_allocated() - b  }
}

The Criterion and Bencher structs would then gain a new type parameter M: Measurement, defaulting to WallTime. Some way to obtain a Criterion<MemoryUsage> would be needed as well; I'm not totally sure what that would look like.
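
One purely hypothetical shape for that, just to make the idea concrete (it builds on the Measurement sketch above, and the with_measurement name and builder style are assumptions, not a decision):

// Hypothetical sketch only: the field layout and `with_measurement` are assumptions.
// The point is just that swapping the measurement changes the type parameter.
struct Criterion<M: Measurement = WallTime> {
    measurement: M,
}

impl Default for Criterion<WallTime> {
    fn default() -> Criterion<WallTime> {
        Criterion { measurement: WallTime }
    }
}

impl<M: Measurement> Criterion<M> {
    fn with_measurement<M2: Measurement>(self, measurement: M2) -> Criterion<M2> {
        Criterion { measurement }
    }
}

// Usage:
// let criterion: Criterion<MemoryUsage> = Criterion::default().with_measurement(MemoryUsage);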

I like this traits-and-types approach for a few reasons. It means that the measurements are statically known and can be inlined to reduce measurement overhead. It means that measurements can be defined in third-party crates, or customized by the user. It also means that I can gate functions in the Bencher on different measurements - the Bencher::iter_* family doesn't really make sense for mostly-constant measurements like memory allocated in an iteration of a function, so maybe those would only be defined for some measurement types.
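
To make the inlining and gating points concrete, a Bencher parameterized on the measurement might look roughly like this (a sketch building on the Measurement trait above; the field names and iter() details are assumptions):

// Rough sketch, building on the Measurement trait above. Field names and the exact
// iter() logic are assumptions; the real Bencher does more bookkeeping than this.
struct Bencher<M: Measurement = WallTime> {
    iters: u64,
    value: Option<M::Value>,
}

impl<M: Measurement> Bencher<M> {
    fn iter<O, R: FnMut() -> O>(&mut self, mut routine: R) {
        // M is a concrete type at each call site, so M::start()/M::end() can be
        // inlined instead of dispatched through a trait object.
        let intermediate = M::start();
        for _ in 0..self.iters {
            routine(); // the real implementation also keeps the result from being optimized away
        }
        self.value = Some(M::end(intermediate));
    }
}

// Measurement-specific methods can then be gated on the measurement type:
impl Bencher<WallTime> {
    // e.g. the iter_* variants that only make sense for timing could live here.
}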

To start with, the analysis and report traits would probably be hidden and sealed to give me some more time to pin down their definitions, but this would allow various other timing measurements to be defined and analyzed/reported with the same code that handles the wall-clock measurements.

Custom "Test" Framework

I'm not sure if I'll do this for the 0.3.0 launch or at some point after, but now that custom test framework proc-macros are available on nightly, Criterion.rs should add support for them. If ergonomic usage of a #[criterion] macro requires any breaking changes, I should figure out what they are and make them in 0.3.0.
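
For reference, usage might end up looking something like this. This is entirely speculative - the attribute name, the runner path, and the feature wiring are assumptions about how a custom test framework integration could look:

// Entirely speculative sketch of #[criterion] usage with the nightly
// custom_test_frameworks feature; none of these names are decided.
#![feature(custom_test_frameworks)]
#![test_runner(criterion::runner)]

use criterion::Criterion;

fn my_function() -> u64 {
    // Toy function under test.
    2 + 2
}

#[criterion]
fn bench_my_function(c: &mut Criterion) {
    c.bench_function("my_function", |b| b.iter(|| my_function()));
}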

Misc.

There are a variety of other, smaller breakages listed under the Breaking Changes issue label. Mostly they involve removing deprecated things and/or disallowing legal-but-probably-wrong arguments to various functions.

Notably, external-program benchmarks will be removed, but they've been deprecated for several versions now and nobody has complained. I'm guessing nobody actually uses those.


Alright, that's all I've got. Comments or ideas would be welcome.
