# MQT Benchmarking Workflow

MQT Core provides a benchmark suite to evaluate its performance. This workflow is especially helpful if you are developing MQT Core and want to know how the changes in a certain branch or commit affect performance.

## First Step
Run the script `mqt-core/eval/eval_dd_package.cpp`. The script defines convenience variables that hold the current branch and commit hash. On line 154, change the key `CURRENT_BRANCH` to the name you want to use to identify your results.
A typical `results.json` file is shown below.

![results.json](eval/results.json.png)

## JSON Files
After you run the script, you will by default find two files: `results.json` and `results_reduced.json`. The `results.json` file contains all the data collected during the benchmarking process. The `results_reduced.json` file is generated from `results.json`: fields that are irrelevant for the benchmark comparison are removed, and the data is aggregated so that every metric is a dictionary with your branch/commit name as the key and the measured value as the value.

A `results_reduced.json` file should typically resemble the following image. The reduced JSON file is the one used to visualize the benchmarking results.

![results_reduced.json](eval/results_reduced.json.png)
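
The reduction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code that `eval_dd_package.cpp` actually runs: the input layout (`{label: {benchmark: {metric: value}}}`), the field names in `RELEVANT_FIELDS`, the benchmark name `GHZ_10`, and the helper `reduce_results` are all hypothetical.

```python
# Hypothetical set of fields worth comparing; the real script decides this itself.
RELEVANT_FIELDS = {"runtime", "peak_nodes"}


def reduce_results(results_by_label: dict) -> dict:
    """Regroup {label: {benchmark: {metric: value}}} into
    {benchmark: {metric: {label: value}}}, dropping irrelevant fields."""
    reduced: dict = {}
    for label, benchmarks in results_by_label.items():
        for benchmark, metrics in benchmarks.items():
            for metric, value in metrics.items():
                if metric not in RELEVANT_FIELDS:
                    continue  # not needed for the benchmark comparison
                reduced.setdefault(benchmark, {}).setdefault(metric, {})[label] = value
    return reduced


# Made-up numbers for two labels (branch/commit names), for illustration only.
example = {
    "main": {"GHZ_10": {"runtime": 1.0, "peak_nodes": 40, "hostname": "ci"}},
    "my-branch": {"GHZ_10": {"runtime": 0.8, "peak_nodes": 40, "hostname": "ci"}},
}
reduced_example = reduce_results(example)
```

After this step, every metric maps branch/commit names to values, which is the shape the comparison works on.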

## Running the Visualization
The function `compare` in the Python module `mqt.core.evaluation` takes the file name of the reduced JSON file and prints a detailed comparison.
Before calling it, change the variables on lines 17-18 of `evaluation.py` to match the keys of the benchmark data in your JSON file.
You should also change the variables on lines 23-26 to customize the header of the output.
An example of the CLI output of `compare` is shown below.

![compare-default](eval/compare_default.png)

Note that the image above only shows a comparison with the default parameters. Several parameters are available to customize the comparison:
- `factor`: How much the new metric has to differ from the baseline to be considered a significant change. Default is 0.1.
- `only_changed`: If False, a table with the benchmarks that have not changed significantly is shown as well. Default is True.
- `sort`: The sort order of the output tables. Default is 'ratio'. The other option is 'experiment'.
- `no_split`: If True, the output tables (improved and worsened, plus the unchanged results if `only_changed` is False) are merged into one table. Default is False.
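
To make the interplay of these parameters concrete, here is a minimal sketch of how such a comparison could be organized. It is not the implementation in `evaluation.py`. It assumes that `factor` is a relative threshold on the ratio `feature / baseline`, that lower values are better, and that baselines are positive. The helpers `classify` and `build_tables` are hypothetical names.

```python
def classify(baseline: float, feature: float, factor: float = 0.1) -> str:
    """Label a metric, assuming lower is better and baseline > 0."""
    ratio = feature / baseline
    if ratio < 1 - factor:
        return "improved"
    if ratio > 1 + factor:
        return "worsened"
    return "same"


def build_tables(rows, factor=0.1, only_changed=True, sort="ratio", no_split=False):
    """Group (experiment, baseline, feature) rows into tables of (experiment, ratio)."""
    tables = {"improved": [], "worsened": [], "same": []}
    for experiment, baseline, feature in rows:
        label = classify(baseline, feature, factor)
        tables[label].append((experiment, feature / baseline))
    if only_changed:
        del tables["same"]  # hide results without a significant change
    if no_split:
        # Merge all remaining tables into a single one.
        tables = {"all": [row for table in tables.values() for row in table]}
    # Sort by ratio (default) or by experiment name.
    key = (lambda row: row[1]) if sort == "ratio" else (lambda row: row[0])
    return {name: sorted(table, key=key) for name, table in tables.items()}


# Made-up measurements: (experiment, baseline value, feature value).
rows = [("b", 1.0, 0.8), ("a", 2.0, 3.0), ("c", 1.0, 1.05)]
default_tables = build_tables(rows)
merged_tables = build_tables(rows, only_changed=False, sort="experiment", no_split=True)
```

With the defaults, experiment `c` changes by only 5% and is therefore left out. With `only_changed=False` and `no_split=True`, all three rows end up in one table.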

An example output with `only_changed` set to False is shown below.

![compare-with-same](eval/compare_with_same.png)

An example output with `sort` set to 'experiment' is shown below.

![compare-sort-exp](eval/compare_sort_exp.png)

An example output with `no_split` set to True is shown below.

![compare-no-split](eval/compare_no_split.png)

<div class="alert alert-info">
Note

If you want to compare multiple branches/commits, set a different name in `eval_dd_package.cpp` for each of them. The `results_reduced.json` file will then contain dictionaries with multiple key-value pairs, where the keys are the branch/commit names you set. When calling `compare`, set the variables in `evaluation.py` to specify which two you want to compare. The function ignores all keys that are not specified, so the result is the same as comparing only those two branches/commits.
</div>
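
The selection of two branches/commits described in the note can be pictured as a simple filter. The sketch below is illustrative only: it assumes a reduced layout of `{benchmark: {metric: {label: value}}}`, and the helper `select_pair` is a hypothetical name, not part of `mqt.core.evaluation`.

```python
def select_pair(reduced: dict, baseline_label: str, feature_label: str) -> dict:
    """Keep only the two requested labels for every metric.

    Metrics that lack either label are skipped, since they cannot be compared.
    """
    selected: dict = {}
    for benchmark, metrics in reduced.items():
        for metric, by_label in metrics.items():
            if baseline_label in by_label and feature_label in by_label:
                selected.setdefault(benchmark, {})[metric] = {
                    baseline_label: by_label[baseline_label],
                    feature_label: by_label[feature_label],
                }
    return selected


# Made-up reduced data with three labels; only two of them are compared.
reduced = {
    "GHZ_10": {
        "runtime": {"main": 1.0, "feature-a": 0.8, "feature-b": 1.3},
        "peak_nodes": {"main": 40, "feature-b": 42},
    }
}
pair = select_pair(reduced, "main", "feature-a")
```

Here `feature-b` is ignored entirely, and `peak_nodes` is dropped because `feature-a` has no value for it.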