
[build] Add performance testing of PRs vs current #4023

Merged · 26 commits · Mar 29, 2024

Conversation

@kaby76 (Contributor) commented Mar 22, 2024

This PR, for #4013, helps to solve a significant problem: detecting regressions in grammar performance. Changes to a grammar that fix a syntax problem often cause a performance degradation that goes unnoticed until well after the PR is merged. To detect this, a workflow is added that performs a statistical analysis of a grammar's performance, comparing the grammar before and after a change. The workflow does not raise errors when the test indicates a problem; instead, it gives a "heads up" that reviewers can examine to decide whether the PR introduces a regression.

While it does not solve the larger problem of finding ambiguity and full-context parser fallbacks, those problems can be inferred from a statistical analysis of the performance.

Changes

  • A new workflow is added: "perf.yml".
  • The workflow executes _scripts/perf-changed.sh.
  • The workflow passes the list of files modified by the PR to the Bash script. For each grammar modified, the script checks the runtime of the grammar before and after the PR.
  • The sample size (i.e., the number of times all tests in examples/ are run) is the minimum of 40 and the number of iterations that fit in a 20-minute budget (10 minutes before + 10 minutes after, for each grammar changed). For example, if a test run normally takes 3 minutes, only floor(10 / 3) = 3 iterations are performed per side. The test driver always outputs a "Total Time" (in seconds), which is stored in files for the runs before and after the PR. A sketch of the sample-size rule follows this list.
  • A bar graph is produced showing the mean time and standard deviation for each grammar, before and after the PR. The bar graph is output as text (TTY) since artifacts are generally not available. Note that the result can be "statistically significant" even when the bar graph shows roughly equal means with overlapping error bars.
  • A Welch's t-test is computed to determine "statistical" significance.
  • We also check the ratio of the mean runtimes to determine "practical" significance. A cutoff of 5% is assumed.
  • If the difference in performance is both "statistically" and "practically" significant, the workflow outputs a message indicating a performance drop (a sketch of this check follows the list). See https://github.com/antlr/grammars-v4/actions/runs/8466843554/job/23196490098#step:15:359
  • This PR alters the sql/plsql grammar so that the statistical test can be performed. The grammar was reformatted according to the coding style rules for the repo.
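
Below is a minimal Python sketch of the sample-size rule described above. The actual logic lives in the Bash script `_scripts/perf-changed.sh`; the constants mirror the numbers in this description, and the function name is illustrative only.

```python
import math

# Sketch of the sample-size rule (the real logic lives in _scripts/perf-changed.sh).
# MAX_ITERATIONS and the 10-minute per-side budget come from the description above;
# the function name is illustrative.
MAX_ITERATIONS = 40
BUDGET_SECONDS = 10 * 60  # 10 minutes before + 10 minutes after = 20 minutes total


def sample_size(single_run_seconds: float) -> int:
    """Number of timed runs per side (before/after) for one grammar."""
    if single_run_seconds <= 0:
        return MAX_ITERATIONS
    return max(1, min(MAX_ITERATIONS, math.floor(BUDGET_SECONDS / single_run_seconds)))


# Example from the description: a 3-minute test run allows floor(600 / 180) = 3 iterations.
print(sample_size(180))  # -> 3
```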

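And a minimal sketch of the combined significance check, assuming SciPy is available for the Welch's t-test. The 5% practical cutoff comes from the description above, while the 0.05 alpha level is my assumption; again, the real check is implemented in `_scripts/perf-changed.sh`.

```python
# Sketch of the significance check: Welch's t-test for "statistical" significance
# plus a mean-ratio cutoff for "practical" significance.
from scipy import stats

ALPHA = 0.05             # assumed significance level for the Welch's t-test
PRACTICAL_CUTOFF = 0.05  # 5% change in mean runtime, per the description


def performance_drop(before: list[float], after: list[float]) -> bool:
    """Return True when the slowdown is both statistically and practically significant."""
    # Welch's t-test: two-sample t-test without assuming equal variances.
    _, p_value = stats.ttest_ind(before, after, equal_var=False)
    statistically_significant = p_value < ALPHA

    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    practically_significant = (mean_after - mean_before) / mean_before > PRACTICAL_CUTOFF

    return statistically_significant and practically_significant


# Example: "Total Time" values (seconds) collected before and after a PR.
print(performance_drop([180.2, 181.0, 179.8], [195.5, 196.1, 194.9]))  # -> True
```
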
@kaby76 kaby76 changed the title [build] Add performance testing of PR against current [build] Add performance testing of PRs vs current Mar 23, 2024
@kaby76 kaby76 marked this pull request as ready for review March 28, 2024 12:20
@kaby76 (Contributor, Author) commented Mar 28, 2024

All set.

@teverett (Member) commented:
@kaby76 thanks!

@teverett teverett merged commit c7883a5 into antlr:master Mar 29, 2024
16 checks passed