
Revisit regression detection calculation #270

Closed
jrbourbeau opened this issue Aug 19, 2022 · 3 comments · Fixed by #283
Labels: enhancement (New feature or request), infrastructure (Work related to infrastructure)

Comments

@jrbourbeau (Member)

In #226 we added automatic regression detection. We've already identified a legitimate performance regression, which has been great to see. On the other hand, there have also been some false positives (xref #264 (comment), #247 (comment)). We should revisit our existing regression detection calculation to see how we might be able to reduce the rate of false positives.
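
For context, a minimal sketch of the kind of threshold check being discussed (the function name and threshold are illustrative, not the actual coiled-runtime code):

```python
import numpy as np

def looks_like_regression(history, latest, n_std=2.0):
    """Flag `latest` as a possible regression if it exceeds the
    historical mean by more than `n_std` standard deviations."""
    history = np.asarray(history, dtype=float)
    return latest > history.mean() + n_std * history.std()

# Example: four past runs around 10s, a new run at 12.5s
print(looks_like_regression([10.1, 9.8, 10.3, 10.0], 12.5))  # True
```

A fixed standard-deviation threshold like this is sensitive to noisy time series, which is one plausible source of the false positives mentioned above.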

@jrbourbeau added the enhancement and infrastructure labels on Aug 19, 2022
@ncclementi (Contributor)

I've been paying attention to the regression reports, and even though we have some false positives, I think we have been able to detect quite a few legitimate regressions.

There are some false positives, yes, but if the conditions were more relaxed, we would have missed the legitimate regressions reported above.

However, I think we need a better system for quickly discerning whether a regression is legitimate or a false positive.

With that kind of information at hand, I believe we would be able to discard false positives faster.

@ian-r-rose (Contributor)

One thing that would be nice to have: I think the criterion for regression detection should differ between PRs and the scheduled runs. For the scheduled runs, we look at the last three measurements to be sure there is a consistent signal (and not some weird network hiccup or EC2 wobble). But for PRs that's not really relevant, since we have just one measurement to compare to the time series. So I'd suggest that for PRs we instead look at just the most recent run (maybe with a two-standard-deviation threshold, maybe not) and compare that to the time series.
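
A rough sketch of how the two criteria could diverge (all names here are hypothetical, not the existing script's API):

```python
import numpy as np

def detect_regression(history, recent, is_pr, n_std=2.0):
    """Compare recent measurements against a historical time series.

    Scheduled runs: require the last three measurements to all exceed
    the threshold, filtering out one-off network hiccups or EC2 wobble.
    PR runs: there is only one measurement, so compare it directly.
    """
    history = np.asarray(history, dtype=float)
    threshold = history.mean() + n_std * history.std()
    if is_pr:
        # Single data point from the PR run vs. the time series.
        return recent[-1] > threshold
    # Require three consecutive elevated measurements before reporting.
    return all(m > threshold for m in recent[-3:])
```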

@ncclementi (Contributor)

@ian-r-rose This makes sense. I'll take a look at how to pass this to the script.
I'm thinking of creating an env variable on the action that indicates whether it's a PR or not and, based on that, taking different routes.
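
One possible way to branch, for what it's worth: GitHub Actions already sets GITHUB_EVENT_NAME on every run, so the script could read it directly rather than defining a custom variable (the branch bodies below are placeholders):

```python
import os

# GitHub Actions sets GITHUB_EVENT_NAME to "pull_request" for PR-triggered
# workflows and "schedule" for cron-triggered ones.
IS_PR = os.environ.get("GITHUB_EVENT_NAME") == "pull_request"

if IS_PR:
    # Single measurement: compare the PR run against the time series.
    ...
else:
    # Scheduled run: require several consecutive elevated measurements.
    ...
```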
