New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transforming IIFE results in extra parentheses #327
Comments
This is exactly right. Since getting this right is tricky, Recast errs on the side of adding extra parens, instead of sometimes not parenthesizing when necessary. Detecting whether parens are already present is tricky because most parsers do not preserve information about parentheses in the AST (Babylon is the exception). If you have any good ideas for how to detect reliably whether a given node is wrapped in parentheses, please let me know! |
Hmm... Does this mean that there won't be this problem when running Recast with Babylon? I've been considering switching to Babylon for some time. |
Using Babylon won't help, because I haven't added any Babylon-specific logic to Recast. I believe Recast should work equally well with all parsers, rather than implicitly preferring the parser most willing to defy conventions. If everyone agreed that parentheses should be preserved in the AST, I would love to get |
Ah... Thanks for clarification. |
We just hit this issue too running some codemods across a few hundred repos. There's a lot of things that trigger the parenthesis issue. As @benjamn mentionned, the info is in the babylon ast but not in other ASTs. The part that I don't understand is why this affects unchanged code. If I have a no-op transform and print (not pretty print), the parenthesis gets messed up (valid, but change the original code). Since usually recast ignores unchanged nodes, I'm a little confused as to why this is happening (reading the linked issues seems to say that this confuses recast into thinking there IS a change when there is not?). recast seems to happily reprint anything else when I don't change the AST, no matter how wildly formatted (the ast doesn't contain every space I type either, but those aren't an issue). Why are these parenthesis special? |
Hmm, edit to above, pasting the arrow function in the example above doesn't actually change anything (thus it works correctly) unless I modify the ast node, so I think I'm having a different issue. |
Although there are definitely larger issues of over-parenthesization at play here, which I am currently working to solve, I do not think the behavior reported in this issue should be considered a bug, since (() => {}()) would not be valid syntax, and moving the The With that said, if you want to make sure code like this gets re-parenthesized, you can explicitly force pretty-printing the const ast = recast.parse('(function(){})()');
const callExpression = ast.program.body[0].expression;
callExpression.callee.type = 'ArrowFunctionExpression';
callExpression.original = null;
console.log(recast.print(ast).code); which will print (() => {})(); as desired. |
Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting. Most importantly, accurate token information could be used to detect whether a node was originally wrapped with parentheses, even if the parentheses are separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information: - #533 - #528 - #513 - #512 - #366 - #327 - #286 With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens array representing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that node.loc.tokens.slice( node.loc.start.token, node.loc.end.token ) returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token *after* the position where the node appeared. Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.
Normally the genericPrint function will attempt to wrap its result with parentheses if path.needsParens(), but that's not what we want if we're printing the path in order to patch it into a location that already has parentheses, so it's important to be able to disable this behavior in such special cases. Note that children printed recursively by the genericPrint[NoParens] functions will still be wrapped with parentheses if necessary, since this option applies to the printing of the root node only. Should help with #327.
Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting. Most importantly, accurate token information could be used to detect whether a node was originally wrapped with parentheses, even if the parentheses are separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information: - #533 - #528 - #513 - #512 - #366 - #327 - #286 With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens array representing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that node.loc.tokens.slice( node.loc.start.token, node.loc.end.token ) returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token *after* the position where the node appeared. Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.
Normally the genericPrint function will attempt to wrap its result with parentheses if path.needsParens(), but that's not what we want if we're printing the path in order to patch it into a location that already has parentheses, so it's important to be able to disable this behavior in such special cases. Note that children printed recursively by the genericPrint[NoParens] functions will still be wrapped with parentheses if necessary, since this option applies to the printing of the root node only. Should help with #327.
Thanks @benjamn, resetting I also agree that Recast behaves correctly here. |
When transforming normal function to arrow function:
the above outputs:
instead of the expected:
Recast seems to correctly detect that this IIFE needs to be wrapped in parens, but fails to detect that there are already parenthesis present. When I just let Recast print a plain AST, it gets the parenthesis right:
Seems to be similar to an old issue #81
The text was updated successfully, but these errors were encountered: