
Run history difference reporting / exporting #973

Open
lahma opened this issue Nov 25, 2018 · 17 comments
@lahma
Contributor

lahma commented Nov 25, 2018

I'm filing an issue just to check whether this would be a valid feature and reasonable to implement on BenchmarkDotNet's side. I've built a small and ugly helper that produces the difference between two BenchmarkDotNet runs using CSV reports: https://github.com/lahma/BenchmarkDotNet.ResultDiff .

The parsing is ugly and brittle, but a feature that clearly states the difference between runs in percentages/absolute values seems beneficial to me. I used it when optimizing the Jint library and I feel it's a great way to easily communicate the difference. I see people posting two sets of results (before/after) when creating optimization PRs, and if there are more than 3 rows to mentally diff, it gets burdensome (or maybe it's just me).

So what I would suggest is some form of exporter that keeps track of every run in an efficient raw data format, say normal_file_name_yyyy-MM-dd-HH-mm-ss.data, and then diffs the oldest and newest run (by default), producing output similar to the tool I linked, which shows the actual difference in work done.
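
Something along these lines (a rough sketch only, not the actual tool; the Method/Mean column names and the naive number parsing are assumptions about the CSV layout):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Sketch: diff the Mean column of two BenchmarkDotNet CSV reports keyed by Method.
static class ResultDiff
{
    // "1.617 ns" -> 1.617 (unit handling is deliberately naive here)
    static double ParseMean(string cell) =>
        double.Parse(cell.Split(' ')[0], CultureInfo.InvariantCulture);

    static Dictionary<string, double> Load(string path)
    {
        var lines = File.ReadAllLines(path);
        var headers = lines[0].Split(',');
        int method = Array.IndexOf(headers, "Method");
        int mean = Array.IndexOf(headers, "Mean");
        return lines.Skip(1)
                    .Select(l => l.Split(','))
                    .ToDictionary(c => c[method], c => ParseMean(c[mean]));
    }

    static void Main(string[] args)
    {
        var before = Load(args[0]);
        var after = Load(args[1]);
        foreach (var (name, oldMean) in before)
            if (after.TryGetValue(name, out var newMean))
                Console.WriteLine(
                    $"{name}: {oldMean:F3} -> {newMean:F3} ({(newMean - oldMean) / oldMean:+0.0%;-0.0%})");
    }
}
```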

@adamsitnik
Member

Hi @lahma

I like this idea and I have even implemented a similar tool in the dotnet/performance repo https://github.com/dotnet/performance/tree/master/src/tools/ResultsComparer

Maybe we should add it as a new global tool, similar to #1006?

@AndreyAkinshin what do you think about adding such command line tools to the BDN repo?

@CodeTherapist
Contributor

CodeTherapist commented Jan 12, 2019

@adamsitnik
I would like to implement that as well.

Design suggestion

Implement it in the existing BenchmarkDotNet.Tool project using the sub-command concept.

The subcommand run would execute benchmarks:
dotnet benchmarkdotnet run [arguments] [options]

The subcommand diff would get the difference between a baseline report and another report:
dotnet benchmarkdotnet diff [arguments] [options]

This design is similar to other well-known dotnet tools (e.g. the tooling for EF Core).
In the following example, database is a sub-command of the ef command:
dotnet ef database [arguments]

Advantages

  • One single tool to install
  • All (sub)-commands are compatible with a specific BenchmarkDotNet version
  • Similarity with other dotnet tools

How would you like that?

@AndreyAkinshin
Member

  1. I think it's a good idea to reuse BenchmarkDotNet.Tool for all kinds of commands. We don't need many different NuGet packages for different commands.
  2. If we want another kind of summary table, we should have a corresponding exporter which produces this new "diff" table for the current benchmarking session.
  3. Currently, we don't have a proper serialization format for benchmark results (see Summary Save/Load/Combine #305). I have some design notes for this format, but I still haven't come up with a good specification for it. I think the best way is to finish that specification and introduce commands for post-processing. In that case, we will be able to apply any kind of exporter to benchmark sessions which have already finished. We could also introduce a special kind of exporter which consumes two different reports (we could call it a Differ; a rough interface sketch follows this list).
  4. The current CSV export format is a temporary hack which I needed for the RPlotExporter. It wasn't designed for other kinds of post-processing (e.g., it doesn't include the environment data, so we can't form a proper header for the diff summary table automatically).
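
A rough sketch of what such a Differ could look like (this interface does not exist in BenchmarkDotNet today; the names are only illustrative):

```csharp
using System.IO;
using BenchmarkDotNet.Reports;

// Hypothetical "Differ" exporter: consumes two finished sessions and writes a diff table.
// Nothing here is part of the current BenchmarkDotNet API.
public interface IDiffer
{
    void ExportDiff(Summary baseline, Summary current, TextWriter output);
}
```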

@lahma
Contributor Author

lahma commented Jan 13, 2019

I love all the possibilities listed here. I'd also like to point out the case which is important to me: I usually have a baseline run that tests the system (multiple methods, sub-systems) and I try to see how the results have been affected by a change. So in my case it's more of an "overall rps change after I fine-tuned data-structure allocation patterns" than "which of the two methods is faster". I usually run multiple benchmarks stressing the library from the top and check that no regressions are introduced when I tweak some particular case.

So in short, I'll run the same benchmarks and I usually want to see allocation and duration changes for the same benchmark over time.

@adamsitnik
Member

> I think it's a good idea to reuse BenchmarkDotNet.Tool for all kinds of commands. We don't need many different NuGet packages for different commands.

@AndreyAkinshin personally I would prefer a dedicated tool for every command. It would give us a better overview of what our users are using (NuGet stats) and cleaner, more Unix-like commands.

Commands with a single tool:

dotnet benchmark run abc.dll
dotnet benchmark compare x.json y.json

With dedicated tools:

dotnet benchmark abc.dll
dotnet compare x.json y.json

Also, in the future I would like to move some of our code to stand-alone tools, for example the disassembler (which could be reused by others) and the profilers (which could also be reused):

dotnet disassembler --processId 1234 --method My.Program.Main --depth 3 
dotnet profiler start --type ETW
dotnet profiler stop --type ETW

@AndreyAkinshin what do you think about this idea in general?

Speaking of the files: as of today every run overwrites the previous result. I think we should change that (maybe include a timestamp in the file name or something like that?). Also, I'd prefer JSON over CSV. It's more "type safe" to me ;p
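
Something like this for the file naming (just a sketch, not current BenchmarkDotNet behaviour; the directory and benchmark name parameters are placeholders):

```csharp
using System;
using System.IO;

// Sketch: a sortable timestamp suffix so successive runs never overwrite the previous report.
static class TimestampedReports
{
    public static string ReportPath(string artifactsDir, string benchmarkName) =>
        Path.Combine(artifactsDir,
            $"{benchmarkName}-{DateTime.Now:yyyy-MM-dd-HH-mm-ss}-report.json");
}
```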

@AndreyAkinshin
Member

I think that the option "install one package and get all of the command-line tools out of the box" is better than forcing users to install a separate NuGet package for each command.

Also, I don't like this command line:

dotnet compare x.json y.json

It's OK for us to reserve the dotnet benchmark keyword because BenchmarkDotNet is the most popular benchmarking library. However, I don't want to reserve dotnet compare for comparing BenchmarkDotNet-specific files (the same goes for `disassembler` and `profiler`). Maybe we can resolve it via arguments like this:

dotnet benchmark --info
dotnet benchmark --version
dotnet benchmark --compare x.json y.json
dotnet benchmark abc.dll

@adamsitnik
Member

@AndreyAkinshin You are right.

BTW, if we switch to System.CommandLine (#1016) it should be easier to write a single global tool that handles everything we want (it was designed for global tools, including support for auto-completion of argument names!).
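
Just to illustrate, a sketch of a single global tool with run/compare sub-commands on top of the System.CommandLine 2.0 beta API (the sub-command names and handlers are placeholders, not a committed design, and the beta API may still change):

```csharp
using System;
using System.CommandLine;
using System.IO;

// Sketch only: one global tool, two sub-commands.
var assembly = new Argument<FileInfo>("assembly", "Assembly containing the benchmarks");
var run = new Command("run", "Run benchmarks") { assembly };
run.SetHandler((FileInfo file) => Console.WriteLine($"running {file.Name}..."), assembly);

var baseline = new Argument<FileInfo>("baseline", "Baseline results file");
var diff = new Argument<FileInfo>("diff", "Results file to compare against the baseline");
var compare = new Command("compare", "Compare two result files") { baseline, diff };
compare.SetHandler((FileInfo b, FileInfo d) =>
    Console.WriteLine($"comparing {b.Name} vs {d.Name}"), baseline, diff);

var root = new RootCommand("BenchmarkDotNet global tool") { run, compare };
return root.Invoke(args);
```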

@paule96

paule96 commented Jan 16, 2019

This would be super nice. Currently I'm trying to compare the results in Azure DevOps but can't find a good way to do it.

@Wildenhaus

Wildenhaus commented Jan 25, 2019

It would be nice to have additional metrics reported with each summary, such as the net increase or decrease in each benchmark's mean execution time compared to the previous run.

For example, something like:

Method |     Mean |     Error |    StdDev | Delta Mean |
------ |---------:|----------:|----------:|-----------:|
 TestA | 1.617 ns | 0.0583 ns | 0.0924 ns |  ▼ 0.143ns |  Increase in performance
 TestB | 1.383 ns | 0.0218 ns | 0.0204 ns |  ▲ 0.522ns |  Decrease in performance
 TestC | 4.288 ns | 0.1344 ns | 0.0819 ns |        n/a |  No records to compare to

Adding on to that idea, being able to plot changes in performance over time (even if it meant opening a file in a third-party program) would also be awesome and help greatly with development.

@MarcoRossignoli
Member

@adamsitnik asked me to follow up here after dotnet/performance#314 (review).

I don't know if the new comparer tool will be the same as BenchmarkDotNet.Tool, but as I said it could be useful:

@dominikjeske

This feature looks promising - is there any progress on this?

@svengeance

svengeance commented Sep 25, 2020

I was also looking for something like this. It would be wonderful if I could easily run BDN with a comparison CLI argument as part of a PR cycle, and have PR submissions report the net increase (or decrease) in performance.

@rymeskar

@adamsitnik and @AndreyAkinshin is there any news around 'run history difference reporting / exporting'?

@gioce90

gioce90 commented Jun 1, 2022

> This feature looks promising - is there any progress on this?

Same question

@lloydjatkinson

One of the biggest pain points I'm finding with BDN so far is that there's no convenient way of comparing a before and after. So far, what I'm having to do is write a MyClass and a MyClass2 which are identical except for the performance change I'm trying out. This is a pretty grim methodology. It would be really good if it could compare across git commits.
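
For the single-class workaround, the existing baseline feature at least gives a Ratio column within one summary (it doesn't help with comparing across commits); a minimal example, with placeholder method bodies:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Both variants live in one class and the summary gets a Ratio column.
public class BeforeAfter
{
    [Benchmark(Baseline = true)]
    public int Before() => 0;   // old implementation goes here

    [Benchmark]
    public int After() => 0;    // tweaked implementation goes here
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<BeforeAfter>();
}
```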

@Tarun047

+1 for this.
Please let me know if anyone is taking this up; if not, I can take it up, as this would be of huge benefit to many people trying to use this in CI/CD pipelines.

@AndreyAkinshin
Member

@Tarun047 I'm working on it right now. A huge refactoring is coming with a new serialization format + a lot of new features including various reports.
