Roadmap for 0.3.0 #261
bheisler commented Jan 25, 2019
I'm planning to take some time off working on Criterion.rs shortly, but before I do I'd like to describe my plans for 0.3.0 and request some comments from the community. 0.3.0 is the first breaking-change release since Criterion.rs became stable-compatible, and I have a handful of things I'd like to change.
Major Changes
Removing Benchmark/ParameterizedBenchmark
I initially created these as a generalization of the various Criterion::bench_* functions, and they mostly work, but there are a few flaws in this design. For instance, there isn't an obvious way to handle cases like #225, where the user wants to construct a large input once. It also imposes strange limitations on the parameter types (why must they implement Debug?) and strange contortions to deal with lifetimes (#260). Additionally, sometimes the user doesn't want the debug representation of the parameter to be used as the parameter in the benchmark ID - for example, if I'm testing a parser, I don't want my benchmark to be called "my_group/my_function/Some really really long file full of text that my parser parses but isn't really relevant to the benchmark...".

After thinking about this for a while, I wondered why we even need these structures at all. They basically just define a nested for-loop: for each function, for each input, run a benchmark. I'm sure the user is capable of writing their own for-loops, so why not let the user provide that? Here's a sketch of what I'm thinking:
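(The names here - benchmark_group, BenchmarkId, and so on - are placeholders rather than a settled API.)

```rust
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};

fn bench_parsing(c: &mut Criterion) {
    // Expensive setup can happen once, outside the loop (cf. #225).
    let inputs = vec!["small input", "medium input", "large input"];

    // The user writes the loop; Criterion only needs a unique ID per benchmark.
    let mut group = c.benchmark_group("parsing");
    for (i, input) in inputs.iter().enumerate() {
        // The ID is whatever the user chooses - no Debug bound on the input,
        // and no giant input string embedded in the benchmark name.
        group.bench_with_input(BenchmarkId::new("my_parser", i), input, |b, input| {
            b.iter(|| input.len()) // stand-in for the real parsing routine
        });
    }
    group.finish();
}

criterion_group!(benches, bench_parsing);
criterion_main!(benches);
```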
Now, this is obviously much more flexible than the current design. It would be trivial to extend this to benchmarking over multi-dimensional input spaces (just nest more for-loops) or to only benchmark certain functions for certain inputs, etc. It does put more burden on the user not to do anything silly, but I think that's reasonable. It would also require the report-generation code to be flexible enough to handle the oddball cases in a sensible way, but I don't think that will be too hard (alas, unless HTML gains the ability to describe multi-dimensional tables, I'll probably have to flatten the inputs down to one dimension for display).
This would be a pretty major breaking change, though; nearly everyone would have to update. If I go ahead with this, I might consider deprecating the existing APIs and hiding them from the docs, which would allow existing users to continue using them.
Preliminary Support for Custom Measurements
Criterion.rs has always measured wall-clock time, but users have been requesting other measurements (ranging from memory usage to CPU or even GPU performance counters) since before it was even released. I've been putting it off until I thought of a decent design, but I think I have one now. I'm thinking of something like this:
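Roughly the following - a minimal sketch, with the exact methods still up for debate:

```rust
use std::time::{Duration, Instant};

/// A way of measuring one execution of a benchmark routine.
pub trait Measurement {
    /// Captured when measurement begins (e.g. a starting timestamp).
    type Intermediate;
    /// The final measured value (e.g. an elapsed duration).
    type Value;

    /// Begin a measurement.
    fn start(&self) -> Self::Intermediate;
    /// Complete a measurement, producing the final value.
    fn end(&self, i: Self::Intermediate) -> Self::Value;
}

/// Wall-clock time - the default, matching Criterion.rs' current behavior.
pub struct WallTime;

impl Measurement for WallTime {
    type Intermediate = Instant;
    type Value = Duration;

    fn start(&self) -> Instant {
        Instant::now()
    }

    fn end(&self, start: Instant) -> Duration {
        start.elapsed()
    }
}
```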
The Criterion and Bencher structs would then gain a new type parameter M: Measurement, defaulting to WallTime. Some way to obtain a Criterion<MemoryUsage> would be needed as well; I'm not totally sure what that would look like.

I like this traits-and-types approach for a few reasons. It means that the measurements are statically known and can be inlined to reduce measurement overhead. It means that measurements can be defined in third-party crates, or customized by the user. It also means that I can gate functions on the Bencher by measurement type - the Bencher::iter_* family doesn't really make sense for mostly-constant measurements like memory allocated in an iteration of a function, so maybe those would only be defined for some measurement types (see the sketch below).

To start with, the analysis and report traits would probably be hidden and sealed to give me some more time to pin down their definitions, but this would allow various other timing measurements to be defined and analyzed/reported with the same code that handles the wall-clock measurements.
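Gating might look something like this, reusing the Measurement and WallTime sketches from above (again, hypothetical names):

```rust
use std::time::{Duration, Instant};

// Measurement and WallTime as in the earlier sketch.
pub trait Measurement {
    type Intermediate;
    type Value;
    fn start(&self) -> Self::Intermediate;
    fn end(&self, i: Self::Intermediate) -> Self::Value;
}

pub struct WallTime;

impl Measurement for WallTime {
    type Intermediate = Instant;
    type Value = Duration;
    fn start(&self) -> Instant { Instant::now() }
    fn end(&self, start: Instant) -> Duration { start.elapsed() }
}

pub struct Bencher<M: Measurement> {
    measurement: M,
}

// Methods that make sense for any measurement type.
impl<M: Measurement> Bencher<M> {
    pub fn iter<O>(&mut self, mut routine: impl FnMut() -> O) {
        let start = self.measurement.start();
        let _output = routine();
        let _value = self.measurement.end(start);
        // ...record _value for analysis...
    }
}

// Timing-specific helpers, defined only when M = WallTime; a per-iteration
// timing loop makes little sense for, say, peak memory usage.
impl Bencher<WallTime> {
    pub fn iter_with_large_drop<O>(&mut self, mut routine: impl FnMut() -> O) {
        let start = self.measurement.start();
        let output = routine();
        let _elapsed = self.measurement.end(start);
        // Drop the output outside the measured region so Drop doesn't
        // pollute the timing.
        drop(output);
    }
}
```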
Custom "Test" Framework
I'm not sure if I'll do this for 0.3.0 at launch or at some point after, but now that custom-test-framework proc-macros are available on nightly, Criterion.rs should add support for them. If we need any breaking changes for ergonomic usage of a #[criterion] macro, I should figure out what they are and make them in 0.3.0.
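Usage might end up looking something like this - purely hypothetical at this point, including the criterion::runner and criterion_macro names:

```rust
#![feature(custom_test_frameworks)]
#![test_runner(criterion::runner)] // hypothetical runner entry point

use criterion::{black_box, Criterion};
use criterion_macro::criterion; // hypothetical proc-macro crate

#[criterion]
fn bench_fibonacci(c: &mut Criterion) {
    c.bench_function("fibonacci", |b| b.iter(|| fibonacci(black_box(20))));
}

fn fibonacci(n: u64) -> u64 {
    match n {
        0 | 1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}
```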
Misc.

There's a variety of other, smaller breakages listed under the Breaking Changes issue label. Mostly they're for removing deprecated things and/or disallowing legal-but-probably-wrong values passed to various functions.
Notably, external-program benchmarks will be removed, but they've been deprecated for several versions now and nobody has complained. I'm guessing nobody actually uses those.
Alright, that's all I've got. Comments or ideas would be welcome.