Improper axis limits in histograms #253

Open
mkborregaard opened this issue Sep 13, 2021 · 6 comments

Comments

@mkborregaard

The new Benchmark histograms have the x-axis limits set by the edges of the data, but that is not the best choice for interpretation. Waiting times for a function to run follow a distribution that is bounded below by 0, and 0 as a value has a real interpretation. Importantly, the magnitude of variation in the benchmark histogram is best interpreted relative to the size of the mean value.
As such, histograms should be bounded on the left by the value 0. On the right, it would greatly ease interpretation if they were bounded by a whole power of 10 seconds, e.g. 100 microseconds.
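
For illustration, here is a minimal Python sketch of the limits proposed above: left edge fixed at 0, right edge at the smallest whole power of 10 seconds that covers the slowest sample. The function name `proposed_xlimits` and the assumption that timings arrive as a list of seconds are hypothetical and not taken from any library.

```python
import math

def proposed_xlimits(times_s):
    """Return (left, right) x-axis limits for benchmark timings given in seconds.

    Hypothetical helper: left is always 0, right is the smallest whole
    power of 10 (in seconds) that is >= the slowest sample.
    """
    slowest = max(times_s)
    if slowest <= 0:
        raise ValueError("timings must be positive")
    exponent = math.ceil(math.log10(slowest))
    return 0.0, 10.0 ** exponent

# Samples between 70 and 80 microseconds give a right edge of 100 microseconds.
left, right = proposed_xlimits([7e-5, 7.4e-5, 8e-5])
```

A sample that already sits exactly on a power of 10 keeps that value as the right edge; whether the axis should then step up to the next power is a design choice this sketch leaves open.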

@gdalle
Collaborator

gdalle commented Jun 13, 2023

Not sure I agree. Imagine you have a benchmark which takes between 7 and 8 ms. Then you only use 10% of your available display width.
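
The 10% figure follows from the data spanning 1 ms on an axis running from 0 to 10 ms. A small Python sketch of that arithmetic, with the hypothetical helper name `axis_fraction_used`:

```python
def axis_fraction_used(data_min, data_max, axis_left, axis_right):
    """Fraction of the axis width covered by the data range (hypothetical helper)."""
    return (data_max - data_min) / (axis_right - axis_left)

# Timings between 7 and 8 ms on an axis from 0 to 10 ms.
fraction = axis_fraction_used(7.0, 8.0, 0.0, 10.0)  # about 0.1, i.e. 10% of the width
```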

@mkborregaard
Author

That is true. But is it a criterion to use as much of the display width as possible? I would rather say that the fact that they clump closely together is the most useful information you can extract from the histogram.

@gdalle
Collaborator

gdalle commented Jun 13, 2023

But then again, that information could be retrieved from the min and max values alone. If we plot the histogram, it's because we want the fine-grained details inside, the ones that need the display width to come to light.

@mkborregaard
Author

Hmm, I guess I'm not entirely certain what those details are. Could you give an example?

@gdalle
Collaborator

gdalle commented Jun 13, 2023

Tbh I never use the histograms either 🤣 But I can only assume they're here for a reason, and that they are useful beyond what extremal values can provide 🤷

@mkborregaard
Author

OK, so either the distribution is normal, in which case I think the most important information is the size of the std relative to the mean, which is most easily seen in a histogram with 0 at the left. Or there are outliers (usually due to garbage collection, I guess); I'm not really sure what meaning they hold, but their relative distance from 0 is still the most useful piece of information.

Honestly, I think what has happened here is that the design phase perhaps did not clearly state what information the histograms are supposed to provide. I am perfectly fine with removing them (as you hint at above), but if they are to be kept, I have yet to see a compelling argument against doing them the way best practice usually suggests: on arithmetic scales with 0 on the left.
