[sql/plsql] Degradation of performance with recent changes #4014
Comments
Updated stats with a sample size of 40. We now "see" the clear decrease in performance with #4003. As I look at #4003, rules were added to enforce the type of the expression through syntax. This is something that is never done; static semantics is enforced by a static analysis phase. Why was this added? Ideally, precedence of expression operators should be implemented using the ANTLR "ordered alt" method rather than the "chain of rules" method. Then expressions would be far more compact and parse times much improved. But that is another issue.
So a solution would be to move |
The problem is that you added ambiguity to expressions, because the AND and OR operators now appear in two expression rules. We now have two different ways to parse logical expressions from condition. Correcting "lax"-ness should be done via semantic analysis, not syntax.
Yes, I completely understand the problem now. I am quite new to the grammar area, so I did it the naive way.
- Replace antlr#4003
- Avoids performance impact (fixes antlr#4014)
- Avoids semantic rules that should be done in static analysis
@lbovet @KvanTTT
There have been changes to plsql that I suspect impact performance. I wrote a script to track the grammar's performance over the last 8 relevant PRs. The script performs a grouped parse of the 374 .sql input files for the grammar 10 times and computes the mean and standard deviation. I took care to make sure no new input files were tested in subsequent PRs, since that would lengthen the test and be misleading. The results are shown in the figure below.
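For anyone who wants to reproduce this kind of measurement, here is a minimal Python sketch of the timing loop. It is not the script used for the figure; the names `measure` and `summarize` are hypothetical, and the `run` callable stands in for whatever invokes the generated parser on the fixed set of input files.

```python
import statistics
import time


def measure(run, repeats=10):
    """Call run() `repeats` times and return the elapsed wall-clock seconds of each call.

    `run` is a placeholder (an assumption): any zero-argument callable that
    performs one grouped parse of the same, fixed list of input files.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) of the timing samples."""
    return statistics.mean(samples), statistics.stdev(samples)
```

Keeping the input file list identical across PRs matters here: if a later PR adds test inputs, the total time grows for reasons unrelated to the grammar change.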
I have not done a t-test, but eyeballing the data suggests that performance took a hit with #4003, which may be caused by a large "k" lookahead being introduced. I suggest investigating whether #4003 indeed caused a performance drop.
I wouldn't say it's a large drop in performance, but I have seen grammars with significant ambiguity or large k. It's sometimes hard to know whether a change causes a catastrophic performance drop, given that GitHub Actions only returns "pass" or "fail".
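As a rough illustration of what the t-test would involve, the sketch below computes Welch's t statistic and its approximate degrees of freedom for two sets of timing samples (for example, before and after a PR). The function name `welch_t` is hypothetical, and this is a sketch under the assumption that the runs are independent, not the analysis behind the figure.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on samples a and b.

    t is the difference of the means divided by its standard error; df is
    the Welch-Satterthwaite approximation of the degrees of freedom.
    """
    na, nb = len(a), len(b)
    # Per-sample variance of the mean (sample variance divided by sample size).
    va = statistics.variance(a) / na
    vb = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

Turning `t` and `df` into a p-value needs the t-distribution CDF, which the standard library does not provide; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and reports the p-value directly.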
I am planning to repeat the test with a larger repetition count in order to see if that affects the standard deviation. I also plan to perform an "individual" parse test (one test input file per one run of the test application) to see how the performance was affected.
I am planning to make a PR that adds a test for performance changes on each PR.