Csv Report #147
Comments
Hey, thanks for the suggestion, and sorry for the slow response. There have been requests for a stable machine-readable output format (eg. for lolbench). The current JSON storage format is an internal implementation detail. I was leaning more towards defining a stable JSON interchange format that could be different from the storage format, rather than CSV, but I could be convinced otherwise. Why did you choose to go with CSV over JSON?

That said, I don't agree that a single large file is the way to go, for a lot of reasons. The most important one is that it's actually really difficult to have a complete list of the benchmarks for a given repository - some benchmarks might have been filtered out, some might be run with a different instance of the Criterion struct, some might be in a totally different binary. The HTML report works around this to generate the index by walking the directory tree, which is kind of an ugly hack and often involves overwriting the file multiple times. I'd prefer not to repeat that hack if possible. I don't see the need for combining all of the output into a single file anyway; anyone who is writing a script to parse these files can surely handle reading multiple files.

Also, if Criterion.rs does add something like this, the output format should include the number of iterations in each sample. That will be necessary for the consumer to calculate summary statistics correctly (see the sketch below).

It's not a bad idea, but I think it needs some more work.
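To make that last point concrete, here is a minimal sketch (not Criterion's actual code; the `(iterations, total_time_ns)` representation and the numbers are assumptions for illustration) of why a consumer needs the per-sample iteration count:

```rust
// Each sample measures the *total* time for a batch of iterations, and the
// batch size grows from sample to sample. These values are made up.
fn main() {
    let samples: &[(u64, f64)] = &[(10, 251_000.0), (20, 498_000.0), (30, 752_000.0)];

    // Naive: averaging the raw sample totals mixes batches of different sizes
    // and says nothing useful about the time per iteration.
    let naive = samples.iter().map(|&(_, t)| t).sum::<f64>() / samples.len() as f64;

    // With the iteration counts, each sample can be normalized first.
    let per_iter: Vec<f64> = samples.iter().map(|&(n, t)| t / n as f64).collect();
    let mean = per_iter.iter().sum::<f64>() / per_iter.len() as f64;

    println!("average of sample totals: {naive:.0} ns (not meaningful)");
    println!("mean time per iteration:  {mean:.0} ns");
}
```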
I didn't mean to pump and dump an issue and PR (the PR was to communicate my thinking in case the issue wasn't clear).

A CSV seems to be the lowest bar for data analysis (including BI tools). However, I can be convinced too. My use case involves graphing benchmark data using R/Python, so CSV or JSON doesn't make a difference to me. I had done the CSV PR as it is easier to append to a CSV after each measurement.

Only include benchmarks that finished. It's how Google's benchmark, BenchmarkDotNet, and JMH do it, and it seems like a reasonable philosophy to follow: if a benchmark didn't run, don't include it in the output.

While switching up the sample size could conceivably affect the number of records in the CSV/JSON, I don't see how it invalidates writing to a single file. If you're worried about reporting differences in warmup, confidence level, etc., you can always note these values as CSV columns or JSON entries.

I haven't done criterion binary benchmarking (though I think it is a cool feature!). Does it not fall under the same philosophy of warmup, samples, and confidence? If only a subset of those apply, you can always mark those values as null or missing. Er -- or is this in reference to having benchmarks split across two binaries?
Right, it's not a question of technical ability. It's a question of convenience.
TLDR: I'm lazy. If criterion's goal is to give Rust programmers the tools to easily interpret their benchmarks, then it also has to recognize that the CLI and HTML output can be complemented with a single file (or a file per suite) that can be slurped up into any viz tool or programming language - especially since the canonical benchmarking frameworks for C++, Java, and C# have shown that such reports are useful.
I'm sorry, I wasn't clear earlier. What you're proposing is much harder to accomplish than you realize, but it's not immediately obvious why that's the case, so I'll try to explain.

Where do you propose to store the results before writing them to the CSV file, and how do you decide when to overwrite (as opposed to appending to) that file? If we store them in the Criterion struct, we run into the problem that the Criterion struct gets discarded after every benchmark group. You could write to disk at the end of the benchmark (as your current implementation does), but then each benchmark will overwrite the last. You can get around that by storing the results in some sort of global static storage. However, Cargo makes it easy to define multiple benchmark executables - what you refer to as suites. As far as I'm aware, Cargo doesn't provide any way to know which executable we're in, what other executables there are, or what benchmarks might be in them, so you can't even write to different files; I don't think you can even reliably inspect the executable path to tell them apart. You could append to the file instead of overwriting it, but now you have a different problem - you can't tell whether the existing results file was written by a previous executable in the same `cargo bench` run (in which case you should append) or left over from an earlier run (in which case you should overwrite it).

The problem here is that we don't have any good way to determine a complete list of which benchmarks were run in a given `cargo bench` run. As I said before, the way I work around this to generate the HTML report index is to walk the file tree to scan for stored benchmark files, but this is unsatisfactory too. It's expensive - a full directory traversal. It's wasteful - all benchmark executables will perform this traversal and write a file, only to have it overwritten by the next executable. And it's not even all that accurate - if I delete a benchmark from my code and run `cargo bench` again, the old files are still on disk, so the deleted benchmark still shows up.

It's probably not possible to correctly handle all of these corner cases without some kind of support from Cargo. Although I suppose you could get close, it would involve a lot of guesswork and complexity on Criterion's part, and I don't want that maintenance burden. If you're feeling particularly ambitious, you could try to convince the developers of Cargo to set environment variables when running the benchmark executables. For example, you could tell Criterion 'this is executable 1 of 3' and then Criterion could clear the CSV file (a rough sketch of that idea follows below). But even there you run into corner cases - what if some of the benchmark executables don't use Criterion? If the first executable uses Bencher or something instead, then you'll never clear the CSV file.

Or you could just write a separate CSV file for every benchmarked function. That's easy to do, and there's no problem with different benchmarks or different executables interfering with each other. True, the user now has to read multiple files, but anyone writing a script to parse them can surely handle that.

I hope this rant explains some of my thinking on this issue. I've already made all of these mistakes while trying to implement the HTML report index, you see, and it was not so much fun that I want to do it again.
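Purely to illustrate that hypothetical, here is a rough sketch. The `BENCH_EXECUTABLE_INDEX` variable and the output path are invented for this example; Cargo sets nothing like them today:

```rust
use std::env;
use std::fs::OpenOptions;
use std::io::Write;

// Hypothetical: assume Cargo exported BENCH_EXECUTABLE_INDEX for each
// benchmark executable it runs. The first executable in a `cargo bench` run
// would truncate the shared CSV, later ones would append to it.
fn append_result_line(line: &str) -> std::io::Result<()> {
    let first = env::var("BENCH_EXECUTABLE_INDEX").as_deref() == Ok("1");

    let mut file = OpenOptions::new()
        .create(true)
        .write(true)
        .append(!first)   // later executables append
        .truncate(first)  // the first one starts the file fresh
        .open("target/criterion/all-benchmarks.csv")?;

    writeln!(file, "{line}")
}

fn main() -> std::io::Result<()> {
    append_result_line("my_bench,10,251000.0")
}
```

The corner case described above still applies even in this sketch: if the first executable in the run doesn't use Criterion, nothing ever truncates the file.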
Well, that JSON isn't really intended to be a user-facing format. In fact, it could change without warning (and already has at least once), so it's probably not a good idea to rely on it too heavily.

I'd also like to think some more about what should be in this hypothetical CSV file. Is it more useful to have the raw samples, or would you want to read the summary statistics instead? It's tempting to say that Criterion should save both, but that means more code to maintain. I'm kind of inclined to only expose the raw sample data to keep the API surface as minimal as possible, but then users have to compute their own summary statistics. I don't want to inflict work on users without good reason, but I also don't want to have to support something that nobody uses.
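For a sense of what "compute their own summary statistics" would mean for users in practice, here's a small sketch with made-up per-iteration times (total sample time already divided by the iteration count):

```rust
// Sketch: the summary statistics a consumer would derive themselves if only
// raw samples were exposed. Values are hypothetical nanoseconds per iteration.
fn main() {
    let mut times = vec![24.1_f64, 25.3, 23.8, 26.0, 24.7];

    let n = times.len() as f64;
    let mean = times.iter().sum::<f64>() / n;

    // Sample standard deviation (n - 1 in the denominator).
    let var = times.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let std_dev = var.sqrt();

    // Median of the sorted samples (odd-length case for brevity).
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let median = times[times.len() / 2];

    println!("mean = {mean:.2} ns, median = {median:.2} ns, std dev = {std_dev:.2} ns");
}
```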
Ah, thanks for the patience! It appears that we're blocked on Cargo before we can even begin to talk specifics. I'm feeling strangely motivated to make this possible, so I'll look to open a pre-RFC thread on internals in the next couple of days and see what others think. There may be a solution that custom test harnesses could use too.
That would be fine with me.
True, but if/when that is too much work, people will open an issue with suggestions, so you'll know what's important to others and that others are using the feature.
So I've updated the PR so that criterion takes advantage of the approach discussed above and writes a separate file per benchmark. Was there a reason the files couldn't simply be named after the benchmark function?
Well, the issue there is that function names can contain colons (think module paths), which aren't valid in file names on every platform.
Yes, colons can be replaced with a permissible character (like a dash), or we could take everything after the last colon.
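To make the two options concrete, here's a tiny sketch (not code from the PR), assuming the names in question are Rust-style paths containing `::`:

```rust
// Option 1: replace every colon with a dash.
fn replace_colons(id: &str) -> String {
    id.replace(':', "-")
}

// Option 2: keep only what follows the last colon.
fn last_segment(id: &str) -> &str {
    id.rsplit(':').next().unwrap_or(id)
}

fn main() {
    let id = "my_crate::benches::parse";
    assert_eq!(replace_colons(id), "my_crate--benches--parse");
    assert_eq!(last_segment(id), "parse");
}
```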
Original issue: nickbabcock commented May 23, 2018
While the criterion HTML report is decent for most situations, there are occasions when a different graph allows for an easier interpretation. Instead of making the HTML report endlessly configurable, I propose exposing the data in a CSV format. This way, anyone can easily create their graph in R, Python, etc. While the JSON in target/criterion/$BENCHMARK/$FUNCTION is a good start -- pulling everything together into a (preferably) single file would greatly lower the bar.

For instance, a $cargo_bench_name-raw.csv could hold the raw sampling data, with a header row naming each column (one possible layout is sketched below); group id and parameter can be left blank for the benchmarks that don't apply. I'm torn on whether throughput should be included. I'm leaning towards no, as I believe it's calculable. Users of the raw data can calculate additional aggregations they are interested in (like percentiles).
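Purely as an illustration (the column names below are a guess, not a spec), the raw file might look something like:

```
group_id,function,parameter,iteration_count,sample_time_ns
parsing,parse_small,,10,251000.0
parsing,parse_small,,20,498000.0
,fibonacci,20,10,175000.0
```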
Alternatively (or in addition), the aggregated values could be written to $cargo_bench_name.csv. More than just the mean statistics would be included (a possible layout is sketched below).
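Again only as an illustration of what "more than just the mean" could cover (these column names are hypothetical):

```
group_id,function,parameter,mean_ns,median_ns,std_dev_ns,mad_ns
parsing,parse_small,,24.78,24.70,0.86,0.74
```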
As a comparison, the C# BenchmarkDotNet outputs a CSV report that can be slurped up to create nice-looking graphs.

Maybe this could help people transition to presenting Rust benchmark results in graphical form. I know that my Rust projects don't get enough love in the dataviz department 😄