Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When we make PRs like @jaylmiller's #5292 or #3463, we often want to know "does this make existing benchmarks faster / slower?". To answer this question we would like to:
- Run benchmarks on main
- Run benchmarks on the PR
- Compare the results
This workflow is well supported for the criterion-based microbenchmarks in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/benches (by using criterion directly or by using https://github.com/BurntSushi/critcmp).
However, for the "end to end" benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks there is no easy way I know of to do two runs and compare results.
Describe the solution you'd like
There is a "machine readable" output format generated with the -o parameter (as shown below)
- I would like a script that that compares the output of two benchmark runs. Ideally written either in bash or python.
- Instructions on how to run the script added to https://github.com/apache/arrow-datafusion/tree/main/benchmarks
So the workflow would be:
Step 1: Create two or more output files using -o:
alamb@aal-dev:~/arrow-datafusion2/benchmarks$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 --format parquet -o main
This produces files like the ones in benchmarks.zip. Here is an example:
{
  "context": {
    "benchmark_version": "19.0.0",
    "datafusion_version": "19.0.0",
    "num_cpus": 8,
    "start_time": 1678622986,
    "arguments": [
      "benchmark",
      "datafusion",
      "--iterations",
      "5",
      "--path",
      "/home/alamb/tpch_data/parquet_data_SF1",
      "--format",
      "parquet",
      "-o",
      "main"
    ]
  },
  "queries": [
    {
      "query": 1,
      "iterations": [
        {
          "elapsed": 1555.030709,
          "row_count": 4
        },
        {
          "elapsed": 1533.61753,
          "row_count": 4
        },
        {
          "elapsed": 1551.0951309999998,
          "row_count": 4
        },
        {
          "elapsed": 1539.953467,
          "row_count": 4
        },
        {
          "elapsed": 1541.992357,
          "row_count": 4
        }
      ],
      "start_time": 1678622986
    },
...
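For reference, a minimal sketch (Python is an assumption; the issue asks for bash or python) of loading one of these files and computing the average elapsed time per query, assuming the JSON layout shown above with elapsed apparently reported in milliseconds:

```python
import json


def avg_times(path):
    """Return {query number: average elapsed time} for one benchmark result file.

    Assumes the layout shown above: a top-level "queries" list whose entries
    each carry a "query" number and an "iterations" list of
    {"elapsed": ..., "row_count": ...} objects.
    """
    with open(path) as f:
        data = json.load(f)
    return {
        q["query"]: sum(it["elapsed"] for it in q["iterations"]) / len(q["iterations"])
        for q in data["queries"]
    }


# For the file shown above this would give roughly {1: 1544.34, ...}
```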
Step 2: Compare the two files and prepare a report
benchmarks/compare_results branch.json main.json

This would produce an output report of some type. Here is an example of such an output (from @korowa on #5490 (comment)). Maybe they have a script they could share.
Query branch main
----------------------------------------------
Query 1 avg time: 1047.93 ms 1135.36 ms
Query 2 avg time: 280.91 ms 286.69 ms
Query 3 avg time: 323.87 ms 351.31 ms
Query 4 avg time: 146.87 ms 146.58 ms
Query 5 avg time: 482.85 ms 463.07 ms
Query 6 avg time: 274.73 ms 342.29 ms
Query 7 avg time: 750.73 ms 762.43 ms
Query 8 avg time: 443.34 ms 426.89 ms
Query 9 avg time: 821.48 ms 775.03 ms
Query 10 avg time: 585.21 ms 584.16 ms
Query 11 avg time: 247.56 ms 232.90 ms
Query 12 avg time: 258.51 ms 231.19 ms
Query 13 avg time: 899.16 ms 885.56 ms
Query 14 avg time: 300.63 ms 282.56 ms
Query 15 avg time: 346.36 ms 318.97 ms
Query 16 avg time: 198.33 ms 184.26 ms
Query 17 avg time: 4197.54 ms 4101.92 ms
Query 18 avg time: 2726.41 ms 2548.96 ms
Query 19 avg time: 566.67 ms 535.74 ms
Query 20 avg time: 1193.82 ms 1319.49 ms
Query 21 avg time: 1027.00 ms 1050.08 ms
Query 22 avg time: 120.03 ms 111.32 ms
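A hedged sketch of what such a compare script could look like in Python (the file name compare_results.py, the argument order, the extra "change" column, and the report layout are assumptions for illustration, not an existing tool in the repository):

```python
#!/usr/bin/env python3
"""Compare two benchmark result files produced with -o.

Hypothetical usage: python compare_results.py branch.json main.json
"""
import json
import sys


def load_avgs(path):
    # Same averaging logic as in the sketch above: {query: mean elapsed}.
    with open(path) as f:
        data = json.load(f)
    return {
        q["query"]: sum(it["elapsed"] for it in q["iterations"]) / len(q["iterations"])
        for q in data["queries"]
    }


def main():
    branch_path, main_path = sys.argv[1], sys.argv[2]
    branch_avgs = load_avgs(branch_path)
    main_avgs = load_avgs(main_path)

    print(f"{'Query':<22}{'branch':>14}{'main':>14}{'change':>10}")
    print("-" * 60)
    for query in sorted(branch_avgs):
        b = branch_avgs[query]
        m = main_avgs.get(query)
        if m is None:
            # Query present on the branch but missing from the main run.
            print(f"Query {query:<16}{b:>11.2f} ms{'n/a':>14}")
            continue
        change = (b - m) / m * 100.0  # positive means the branch is slower
        print(f"Query {query:<16}{b:>11.2f} ms{m:>11.2f} ms{change:>+9.1f}%")


if __name__ == "__main__":
    main()
```

The script only needs the standard library, so it could live directly under benchmarks/ next to the instructions requested above.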
Describe alternatives you've considered
Another possibility would be to move the specialized benchmark binaries into criterion (so they look like "microbenches"), but I think this is non-ideal because of the number of parameters supported by the benchmarks.
Additional context