Couple of thoughts on benchmark results in PRs #23085
This can be done, but it requires something a bit more than the current approach, which just formats the asv output as a diff:

(sympy/.github/workflows/comment-on-pr.yml, lines 88 to 106 at 131caae)
Ideally everyone would be paying attention to these slowdowns and doing something about them before the release comes. For previous releases I took the approach you describe, but then had to spend a lot of time investigating and trying to address possible slowdowns before the release, which added a huge amount of work to the process of actually making the release. I do not want to commit to doing that as part of the release process: keeping on top of performance should be a continuous process, with the responsibility distributed among all contributors.

One simple thing that we could do to reduce the output is adjust the thresholds so that less significant changes don't show up; e.g. the output you show does represent real changes in some sense, but most of them are not necessarily that significant.
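The threshold idea could be prototyped by filtering the comparison output before posting it. A minimal sketch, assuming a simplified row format of the kind `asv compare` prints (the sample rows, the column layout, and the 1.5x cutoff are all made up for illustration, not the actual workflow code):

```python
import re

# Hypothetical rows in the shape "<flag> <before> <after> <ratio> <benchmark>".
SAMPLE = """\
+     1.20ms     2.52ms     2.10  solve.TimeSolve.time_det
+     3.00ms     3.15ms     1.05  core.TimeBasic.time_subs
-     8.00ms     4.00ms     0.50  matrices.TimeMatrix.time_rank
"""

def significant_rows(text, factor=1.5):
    """Keep only rows whose before/after ratio lies outside [1/factor, factor]."""
    kept = []
    for line in text.splitlines():
        m = re.match(r"[+-]\s+\S+\s+\S+\s+([0-9.]+)\s+(\S+)", line)
        if m and not (1 / factor < float(m.group(1)) < factor):
            kept.append((m.group(2), float(m.group(1))))
    return kept

print(significant_rows(SAMPLE))
# → [('solve.TimeSolve.time_det', 2.1), ('matrices.TimeMatrix.time_rank', 0.5)]
```

With a cutoff like this, the 5% change would be dropped and only the 2.1x slowdown and the 0.5x speedup would be reported.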
Ok, I'm not sure how to fix #1 off hand with the diff, but maybe there is some way to do so. I'll think about it. For the latter, I think having people pay attention if their PR slows things down wrt the tip of master is sufficient. Collectively, we should then end up with little to fix at the release. Showing the slowdowns wrt the prior release can't be fixed in each PR, because the cause is code merged since that release, right? We aren't expecting new PRs to fix these slowdowns wrt the prior release, are we?
I'm not sure how often people check the benchmark results before merging. The PR comment auto-updates but doesn't give any notification that it has done so. For example, there was clearly a PR at some point that led to some slowdowns for certain operations with matrices. That's what the result is showing you. I think that whoever wrote that PR and merged it probably did not check the benchmark results at the time. We can notice the effect of it, though, because it shows up on other PRs as well. If we only showed the change for each individual PR wrt master, then that change would go unnoticed until the release. Also, there is a threshold for reporting any slowdown. The change since the previous release shows the cumulative effect of all PRs.
A bigger problem is that the benchmark output is not really representative of an actual slowdown, because it's reporting the time taken with the cache enabled. For example:

```python
from sympy import *
import time

x = symbols('x')
A = Matrix([
    [ x, 8, 10, 5],
    [10, 9,  3, 7],
    [ 5, 9,  x, 0],
    [ 1, 8,  0, 7]])

t = time.time()
A.det()
print(time.time() - t)
```

For that I don't see a significant difference between 1.9 and master, but it looks like a difference when you use timeit e.g.
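The cache effect described above can be reproduced outside SymPy with plain stdlib code. A minimal sketch using `functools.lru_cache` as a stand-in for SymPy's cache (the function body and sizes are made up for illustration):

```python
import time
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # Stand-in for an expensive symbolic computation like A.det().
    total = 0
    for i in range(200_000):
        total += i * n
    return total

# timeit-style measurement: after the first call, repeated calls hit the
# cache, so the reported per-call time reflects cache lookups, not real work.
cached_per_call = timeit.timeit(lambda: expensive(3), number=1000) / 1000

# Single cold call, like the time.time() measurement above.
expensive.cache_clear()
t = time.time()
expensive(3)
cold_call = time.time() - t

print(f"per-call via timeit (cached): {cached_per_call:.2e}s")
print(f"single cold call:             {cold_call:.2e}s")
```

The two numbers differ by orders of magnitude, which is why a timeit-style benchmark run with the cache enabled can report a "change" that a single cold call never shows, or vice versa.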
I was about to open a new issue for this, but I found this one. I think the benchmark comments could be improved a lot. Right now they dump a lot of information, which means they are likely to be ignored. The green/red thing has actually been "fixed", because asv now puts a vertical line to the left of the results, so it no longer looks like a diff. However, I think we can go much further:
(example markdown and its rendered output omitted)
We could additionally do some post-processing of this to make it easier to read (example omitted). With this kind of change, we should also be able to cut down on the boilerplate text as well. I would cut it down to no more than a sentence. If there's more to say about benchmarks, it should be put in the docs and just linked to.
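One possible post-processing step is to wrap the full table in a collapsible block with a one-line summary on top. A hypothetical sketch (the function name, summary text, and table stub are all invented, not the current workflow code):

```python
def collapse_report(summary_line, full_table):
    """Wrap the full benchmark table in a <details> block so the PR
    comment shows only a one-line summary until the reader expands it.
    (Hypothetical post-processing step, not the existing workflow.)"""
    return (
        f"{summary_line}\n\n"
        "<details>\n"
        "<summary>Full benchmark results</summary>\n\n"
        f"{full_table}\n\n"
        "</details>\n"
    )

comment = collapse_report(
    "2 significant performance changes vs master",
    "| Change | Before | After | Ratio | Benchmark |\n| - | - | - | - | - |",
)
print(comment)
```

GitHub renders `<details>` blocks collapsed by default, so the comment would occupy one line unless someone wants the details.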
I think that the main thing would be to make it so that no comment gets posted at all if there are no significant changes, or at least the comment could just say "no significant changes". Then most of the time it would not show up on a PR at all, or at least it would not be distracting. Perhaps then, if the PR does have significant changes, a new comment should be posted after each benchmark run (rather than editing the existing comment). Then if there are new changes that are relevant to the PR, they can be reported very loudly. Otherwise, though, I think that showing master vs the previous release is useful, because we just won't see this very often if it is only shown in CI.
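The posting policy suggested here could be expressed as a small decision function in the workflow. A sketch of the proposed behaviour (function and return values are hypothetical, not actual workflow code):

```python
def comment_action(has_significant_changes):
    """Decide what the benchmark workflow should do after a run,
    following the policy proposed above (a sketch, not real code)."""
    if not has_significant_changes:
        # Stay quiet: post nothing, or at most a single
        # "no significant changes" note.
        return "skip"
    # Significant changes: post a *new* comment each time so a
    # notification is actually sent, instead of silently editing
    # the existing comment.
    return "post-new-comment"

print(comment_action(False))  # → skip
print(comment_action(True))   # → post-new-comment
```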
Yeah, no comment would be nice, but at the same time it would then be hard to tell whether the benchmarks are actually working or not, especially given how delayed they tend to be. It doesn't help that the benchmark status check is not required.
Is it that important? I guess we could include it under a `<details>` block.
The problem at the moment is that a comment is added with likely irrelevant information, but then gets quietly updated after each push. That makes it quite likely that a subsequent push might cause a significant change without any noisy notification of that change. The master vs previous release comparison is useful because if there was a significant slowdown in any benchmarks, then it would definitely get noticed in a subsequent PR even if it was missed when merging the first PR that introduced the slowdown. Without the master vs previous release comparison, as soon as a measurable slowdown is merged it will be forgotten, regardless of how big the slowdown is. If we could make it so that any possible slowdowns are always noisy (not just a silent PR comment edit), then maybe it would be less necessary to have the master vs previous release comparison.
I agree, we should try to avoid regressions in the first place. Also, it's much less necessary to see the speedups than the slowdowns when comparing to the last release. Right now master has a lot of speedups, which is very nice, but it is not necessary to see that all the time.
This is what we now see: (screenshot omitted)
This is a great new feature, but there are a couple of things that confuse me or don't seem ideal.