-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[many grammars] Useless parentheses. #3403
Comments
|
|
The latest version of the "find-useless" Bash/Trash script for finding useless parentheses is now in a repo here. |
Hi! I have the following thoughts:
|
BTW, interesting, the text within UPDATE: got it. @kaby76 please use double line break after opening
|
@KvanTTT It is a minor issue, but it adds to the readability of the grammar. I doubt--as well--that it would affect the performance in any way. But, I haven't compared the generated parser/lexer code to see if there is a difference. I would prefer to create a commit based on a functional basis ("one PR for fixing all useless parentheses in all grammars") rather than a per-grammar basis ("PR for abb, PR for abnf, ..., PR for z). It's a lot of busy work all around. And, the changes can easily be done in a few minutes with Trash, and verified via one build of a few hours. The alternative is to add the check to the build and output warnings. So, whenever the grammar is next changed, the check is run automatically.
First, this is not at all a string-based algorithm! If anything, you could say this is a parse tree-based approach. Grammars are parsed, a parse tree is produced, and XPath expressions are evaluated to find useless parentheses. The fact is that operating against the PT works. But, only if the patterns are correct. The patterns are declarative and at the level of the grammar. This is exactly the right way to write these kinds of algorithms. The main problem with the check is writing XPath expressions to find the useless parentheses. But, this is because the grammar is poorly written! Your observation that this should be as an AST is in the right direction, but unnecessary. (And, after studying compiler construction for 40+ years, I would boldly state that ASTs in compiler implementation have been one of the greatest failures and impediments toward progress. Another data representation that disassociates syntax and semantics adds more complexity, more errors, etc. At the very least, ASTs should always have an inverse mapping back to the PT.) The problem with the current grammar is that the precedence and associativity of regular expressions in implemented manually using a chain of rules rather than in "Antlr style". If it were implemented in "Antlr style", then the pattern would be far simpler: "If the precedence of the child is higher than the precedence of the parent of the 'block' containing parentheses, then remove the 'block' node of the tree". As an example of what I mean, consider the arithmetic expression "(1*2)+3". The tree for an "Antlr style" grammar implementation would be: We know that the parentheses are unnecessary because the precedence of "*" is greater than "+". That all said, in other systems like Rascal and ASF+SDF, patterns are strings that are converted to parse trees, which makes the pattern easier to write. For now, I've only implemented XPath expressions. |
There are useless parentheses spread across many grammars in the repo, which I've been trying to correct. Here is a list as produced by a Trash script (see the following comments).
This is the script used to find useless parentheses. I could not have come up with the various patterns without the valuable help from and keen eyes of @KvanTTT and @msagca.
There is probably a simple pattern based on operator precedence to know when it is safe to remove parentheses. For now, what I wrote will do.
(Updated 5/14/23 8AM EST.)
The text was updated successfully, but these errors were encountered: