-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++] possible optimization of ParserATNSimulator #2366
Comments
You should definitely make a PR instead. Otherwise reviewing the code is too difficult. |
Yeah, you could check it out now #2471 |
@garzon I've compiled your patch in and to my surprise that code isn't even called once in my parser (MySQL, which is fairly complex and large). So I cannot validate the speed improvements you claimed. I see you significantly lowered the number of copies that are done for All ANTLR4 tests passed in your PR, but comparing the entire build times (which include all unit test execution times) with previous builds don't show improvements, e.g. Your optimization: The last successful build from my ANTLR4 fork That's 2% for the LEXER part (which actually does not use the ParserATNSimulator), while the PARSER group even performed 23% slower. That shouldn't be overvalued, since there can be other influences for the time results, but it's an indication. Can you provide a prove where we can see the improvement? What's also interesting is that |
Forgot to mention that, in fact in my use case MOST of the text being parsed contains syntax errors(or, not being parsed as the language it actually is) so it is really a big deal, at least in my case. I have such millions data and parsing them as MySQL and other programming languages (to detect the actual language of the text), and it shows 15%+ improvements. It seems that the times of the CI tests are quite unstable if you check out the times in other code-irrelevant PRs. And, you can find the actual time of execution(without initializing the CI test environment) by clicking in to check the "Job Log" and find the times are almost the same in current and patched version, which makes sense if the text being parsed are all valid, according to your description above. So I think a formal benchmarking is necessary in both situations(text with and without syntax errors) |
OK, I see. However, we need a prove that the change doesn't break functionality. Best would be you add a test case to the test suite dedicated to this problem. Unfortunately, adding a test case and running the test suite is a pretty lengthy process (even though @parrt improved it significantly). Writing the test will probably take significantly more time than what was required to write the patch. |
📝 This is not really a change that can be verified by testing. It will require a manual review to verify that the new algorithm is either equivalent to the original or acceptably correct. |
OK, then this issue can be closed. Everything else is handled in the PR. |
Closing and jumping over to PR. |
Implements the optimization identified by @garzon. See antlr#2366
I have found that
antlr4::atn::ATNConfigSet::add
takes 25.76% of whole CPU time in my parsing-intensive program. And I am wondering if I can modifyParserATNSimulator.cpp
(line 726) to the following, whichsignificantly reduces the time of
antlr4::atn::ATNConfigSet::add
to 8.76%. In this way the time-consuming memory allocation and hash calculating and tracking ofconfig
inATNConfigSet::add
could be prevented.If there is no problem with it, I could make a PR later.
The text was updated successfully, but these errors were encountered: