Report and compare benchmark runs against two branches #5561

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When we make PRs like @jaylmiller's #5292 or #3463, we often want to know "does this make existing benchmarks faster / slower?" To answer this question we would like to:

  1. Run benchmarks on main
  2. Run benchmarks on the PR
  3. Compare the results

This workflow is well supported for the criterion-based microbenchmarks in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/benches (by using criterion directly or by using https://github.com/BurntSushi/critcmp), as sketched below.
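
For example, a typical criterion + critcmp comparison looks roughly like this (the bench target and branch names are placeholders):

cargo bench --bench <bench_name> -- --save-baseline main      # on main
git checkout <pr-branch>
cargo bench --bench <bench_name> -- --save-baseline branch    # on the PR branch
critcmp main branch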

However, for the "end to end" benchmarks in https://github.com/apache/arrow-datafusion/tree/main/benchmarks, there is no easy way I know of to do two runs and compare the results.

Describe the solution you'd like
There is a "machine readable" output format generated with the -o parameter (as shown below)

  1. I would like a script that compares the output of two benchmark runs, ideally written in either bash or python (a sketch is included after the example report below).
  2. Instructions on how to run the script added to https://github.com/apache/arrow-datafusion/tree/main/benchmarks

So the workflow would be:

Step 1: Create two or more output files using -o:

alamb@aal-dev:~/arrow-datafusion2/benchmarks$ cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 --format parquet -o main
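
A corresponding run on the PR branch (the branch name and the output name "branch" below are just illustrative) would look like:

git checkout <pr-branch>
cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path ~/tpch_data/parquet_data_SF1 --format parquet -o branch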

This produces files like those in benchmarks.zip. Here is an example:

{
  "context": {
    "benchmark_version": "19.0.0",
    "datafusion_version": "19.0.0",
    "num_cpus": 8,
    "start_time": 1678622986,
    "arguments": [
      "benchmark",
      "datafusion",
      "--iterations",
      "5",
      "--path",
      "/home/alamb/tpch_data/parquet_data_SF1",
      "--format",
      "parquet",
      "-o",
      "main"
    ]
  },
  "queries": [
    {
      "query": 1,
      "iterations": [
        {
          "elapsed": 1555.030709,
          "row_count": 4
        },
        {
          "elapsed": 1533.61753,
          "row_count": 4
        },
        {
          "elapsed": 1551.0951309999998,
          "row_count": 4
        },
        {
          "elapsed": 1539.953467,
          "row_count": 4
        },
        {
          "elapsed": 1541.992357,
          "row_count": 4
        }
      ],
      "start_time": 1678622986
    },
    ...

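Each query entry records the per-iteration elapsed times (in milliseconds, judging by the numbers above), so a per-query average is easy to compute. A minimal Python sketch, assuming the layout shown above (the file name "main.json" is illustrative):

import json

with open("main.json") as f:          # whichever file the -o flag wrote
    data = json.load(f)
for q in data["queries"]:
    avg = sum(i["elapsed"] for i in q["iterations"]) / len(q["iterations"])
    print(f"Query {q['query']} avg time: {avg:.2f} ms")
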
Step 2: Compare the two files and prepare a report

benchmarks/compare_results branch.json main.json

This would produce an output report of some type. Here is an example of such a report (from @korowa on #5490 (comment)); maybe they have a script they could share.

Query               branch         main
----------------------------------------------
Query 1 avg time:   1047.93 ms     1135.36 ms
Query 2 avg time:   280.91 ms      286.69 ms
Query 3 avg time:   323.87 ms      351.31 ms
Query 4 avg time:   146.87 ms      146.58 ms
Query 5 avg time:   482.85 ms      463.07 ms
Query 6 avg time:   274.73 ms      342.29 ms
Query 7 avg time:   750.73 ms      762.43 ms
Query 8 avg time:   443.34 ms      426.89 ms
Query 9 avg time:   821.48 ms      775.03 ms
Query 10 avg time:  585.21 ms      584.16 ms
Query 11 avg time:  247.56 ms      232.90 ms
Query 12 avg time:  258.51 ms      231.19 ms
Query 13 avg time:  899.16 ms      885.56 ms
Query 14 avg time:  300.63 ms      282.56 ms
Query 15 avg time:  346.36 ms      318.97 ms
Query 16 avg time:  198.33 ms      184.26 ms
Query 17 avg time:  4197.54 ms     4101.92 ms
Query 18 avg time:  2726.41 ms     2548.96 ms
Query 19 avg time:  566.67 ms      535.74 ms
Query 20 avg time:  1193.82 ms     1319.49 ms
Query 21 avg time:  1027.00 ms     1050.08 ms
Query 22 avg time:  120.03 ms      111.32 ms
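
A minimal sketch of what such a compare script could look like in Python (the script name compare_results.py is just the proposed name, and the assumption that "elapsed" is in milliseconds comes from the example output above):

#!/usr/bin/env python3
# Hypothetical compare_results.py: compare two benchmark output files written with -o.
# Usage: python3 compare_results.py branch.json main.json
import json
import sys


def avg_times(path):
    """Map query number -> average elapsed time (ms) for one benchmark output file."""
    with open(path) as f:
        data = json.load(f)
    return {
        q["query"]: sum(i["elapsed"] for i in q["iterations"]) / len(q["iterations"])
        for q in data["queries"]
    }


def main():
    branch = avg_times(sys.argv[1])
    baseline = avg_times(sys.argv[2])
    print(f"{'Query':<20}{'branch':<15}{'main':<15}")
    print("-" * 46)
    for query in sorted(branch):
        label = f"Query {query} avg time:"
        branch_ms = f"{branch[query]:.2f} ms"
        main_ms = f"{baseline[query]:.2f} ms" if query in baseline else "n/a"
        print(f"{label:<20}{branch_ms:<15}{main_ms:<15}")


if __name__ == "__main__":
    main()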

Describe alternatives you've considered
Another possibility might be to move the specialized benchmark binaries into criterion (so they look like "microbench"es), but I think this is not ideal because of the number of parameters supported by the benchmarks.

Additional context
