
[build] Add performance testing of PRs vs current #4023

Merged · 26 commits · Mar 29, 2024

Conversation

@kaby76 (Contributor) commented Mar 22, 2024

This PR, for #4013, helps to solve a significant problem: detecting regressions in grammar performance. Changes to a grammar that fix a syntax problem often cause a performance degradation that goes unnoticed until well after the PR is merged. To detect this, a workflow is added that performs a statistical analysis of a grammar's performance, comparing the grammar before and after a change. The workflow does not raise errors when the test indicates a problem; instead, it gives a "heads up" that reviewers can examine to decide whether the PR introduces a regression.

While it does not solve the larger problem of finding ambiguity and full-context parser fallbacks, those problems can be inferred from a statistical analysis of the performance.

Changes

  • A new workflow is added: "perf.yml".
  • The workflow executes _scripts/perf-changed.sh.
  • The workflow passes the list of files modified by the PR to the Bash script. For each grammar modified, the script checks the runtime of the grammar before and after the PR.
  • The sample size (i.e., the number of times all tests in examples/ are run) is the minimum of 40 and the number of iterations that fit in a 20-minute budget (10 minutes before + 10 minutes after, for each grammar changed). For example, if a test run normally takes 3 minutes, only floor(10 / 3) = 3 iterations are performed per side. The test driver always outputs a "Total Time" (in seconds), which is stored in files for the runs before and after the PR. A sketch of the sample-size rule follows this list.
  • A bar graph is produced showing the mean time and standard deviation for each grammar, before and after the PR. The bar graph is output as text (TTY) since artifacts are generally not available. Note that the result can be "statistically significant" even when the bar graph shows roughly equal means with overlapping error bars.
  • A Welch's t-test is computed to determine "statistical" significance.
  • We also check the ratio of the mean runtimes to determine "practical" significance. A cutoff of 5% is assumed.
  • If the difference in performance is both "statistically" and "practically" significant, the workflow outputs a message indicating a performance drop (a sketch of this check follows the list). See https://github.com/antlr/grammars-v4/actions/runs/8466843554/job/23196490098#step:15:359
  • This PR alters the sql/plsql grammar so that the statistical test can be performed. The grammar was reformatted according to the coding style rules for the repo.
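
Below is a minimal Python sketch of the sample-size rule described above. The actual logic lives in the Bash script `_scripts/perf-changed.sh`; the constants mirror the numbers in this description, and the function name is illustrative only.

```python
import math

# Sketch of the sample-size rule (the real logic lives in _scripts/perf-changed.sh).
# MAX_ITERATIONS and the 10-minute per-side budget come from the description above;
# the function name is illustrative.
MAX_ITERATIONS = 40
BUDGET_SECONDS = 10 * 60  # 10 minutes before + 10 minutes after = 20 minutes total


def sample_size(single_run_seconds: float) -> int:
    """Number of timed runs per side (before/after) for one grammar."""
    if single_run_seconds <= 0:
        return MAX_ITERATIONS
    return max(1, min(MAX_ITERATIONS, math.floor(BUDGET_SECONDS / single_run_seconds)))


# Example from the description: a 3-minute test run allows floor(600 / 180) = 3 iterations.
print(sample_size(180))  # -> 3
```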

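And a minimal sketch of the combined significance check, assuming SciPy is available for the Welch's t-test. The 5% practical cutoff comes from the description above, while the 0.05 alpha level is my assumption; again, the real check is implemented in `_scripts/perf-changed.sh`.

```python
# Sketch of the significance check: Welch's t-test for "statistical" significance
# plus a mean-ratio cutoff for "practical" significance.
from scipy import stats

ALPHA = 0.05             # assumed significance level for the Welch's t-test
PRACTICAL_CUTOFF = 0.05  # 5% change in mean runtime, per the description


def performance_drop(before: list[float], after: list[float]) -> bool:
    """Return True when the slowdown is both statistically and practically significant."""
    # Welch's t-test: two-sample t-test without assuming equal variances.
    _, p_value = stats.ttest_ind(before, after, equal_var=False)
    statistically_significant = p_value < ALPHA

    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    practically_significant = (mean_after - mean_before) / mean_before > PRACTICAL_CUTOFF

    return statistically_significant and practically_significant


# Example: "Total Time" values (seconds) collected before and after a PR.
print(performance_drop([180.2, 181.0, 179.8], [195.5, 196.1, 194.9]))  # -> True
```
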
@kaby76 kaby76 changed the title [build] Add performance testing of PR against current [build] Add performance testing of PRs vs current Mar 23, 2024
@kaby76 kaby76 marked this pull request as ready for review March 28, 2024 12:20
@kaby76 (Contributor, Author) commented Mar 28, 2024

All set.

@teverett (Member) commented:
@kaby76 thanks!

@teverett teverett merged commit c7883a5 into antlr:master Mar 29, 2024
16 checks passed