Transforming IIFE results in extra parentheses #327

nene · 2016-10-02T12:57:09Z

When transforming normal function to arrow function:

const recast = require('recast');

const ast = recast.parse('(function(){})()');
ast.program.body[0].expression.callee.type = 'ArrowFunctionExpression';

console.log(recast.print(ast).code);

the above outputs:

((() => {}))()

instead of the expected:

(() => {})()

Recast seems to correctly detect that this IIFE needs to be wrapped in parens, but fails to detect that there are already parenthesis present. When I just let Recast print a plain AST, it gets the parenthesis right:

console.log(recast.print({
    type: 'ExpressionStatement',
    expression: {
        type: 'CallExpression',
        callee: {
            type: 'ArrowFunctionExpression',
            params: [],
            body: {
                type: 'BlockStatement',
                body: [],
            }
        },
        arguments: [],
    }
}).code);

Seems to be similar to an old issue #81

The text was updated successfully, but these errors were encountered:

benjamn · 2016-10-05T15:11:00Z

Recast seems to correctly detect that this IIFE needs to be wrapped in parens, but fails to detect that there are already parenthesis present.

This is exactly right. Since getting this right is tricky, Recast errs on the side of adding extra parens, instead of sometimes not parenthesizing when necessary.

Detecting whether parens are already present is tricky because most parsers do not preserve information about parentheses in the AST (Babylon is the exception). If you have any good ideas for how to detect reliably whether a given node is wrapped in parentheses, please let me know!

nene · 2016-10-05T15:44:29Z

Hmm... Does this mean that there won't be this problem when running Recast with Babylon?

I've been considering switching to Babylon for some time.

benjamn · 2016-10-05T15:53:22Z

Using Babylon won't help, because I haven't added any Babylon-specific logic to Recast. I believe Recast should work equally well with all parsers, rather than implicitly preferring the parser most willing to defy conventions. If everyone agreed that parentheses should be preserved in the AST, I would love to get ParenthesizedExpression added to some kind of AST spec, and then Recast could rely on it here.

nene · 2016-10-05T15:57:45Z

Ah... Thanks for clarification.

Phoenixmatrix · 2018-02-08T23:18:44Z

We just hit this issue too running some codemods across a few hundred repos. There's a lot of things that trigger the parenthesis issue. As @benjamn mentionned, the info is in the babylon ast but not in other ASTs.

The part that I don't understand is why this affects unchanged code. If I have a no-op transform and print (not pretty print), the parenthesis gets messed up (valid, but change the original code). Since usually recast ignores unchanged nodes, I'm a little confused as to why this is happening (reading the linked issues seems to say that this confuses recast into thinking there IS a change when there is not?).

recast seems to happily reprint anything else when I don't change the AST, no matter how wildly formatted (the ast doesn't contain every space I type either, but those aren't an issue). Why are these parenthesis special?

Phoenixmatrix · 2018-02-08T23:23:47Z

Hmm, edit to above, pasting the arrow function in the example above doesn't actually change anything (thus it works correctly) unless I modify the ast node, so I think I'm having a different issue.

benjamn · 2018-09-10T16:36:14Z

Although there are definitely larger issues of over-parenthesization at play here, which I am currently working to solve, I do not think the behavior reported in this issue should be considered a bug, since

(() => {}())

would not be valid syntax, and moving the () outside the larger parenthesized expression is a matter of taste and human judgment where Recast cannot safely take a position.

The ArrowFunctionExpression needs parentheses because it's the callee of a CallExpression, plain and simple. Achieving the ideal output of (() => {})() would require an additional step of actively removing an outer layer of parentheses that was already present. I think that's going "too far," since it means changing the formatting of unmodified ancestor nodes (something Recast strives not to do, in general).

With that said, if you want to make sure code like this gets re-parenthesized, you can explicitly force pretty-printing the CallExpression by setting callExpression.original = null:

const ast = recast.parse('(function(){})()');
const callExpression = ast.program.body[0].expression;
callExpression.callee.type = 'ArrowFunctionExpression';
callExpression.original = null;
console.log(recast.print(ast).code);

which will print

(() => {})();

as desired.

Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting. Most importantly, accurate token information could be used to detect whether a node was originally wrapped with parentheses, even if the parentheses are separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information: - #533 - #528 - #513 - #512 - #366 - #327 - #286 With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens array representing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that node.loc.tokens.slice( node.loc.start.token, node.loc.end.token ) returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token *after* the position where the node appeared. Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.

Normally the genericPrint function will attempt to wrap its result with parentheses if path.needsParens(), but that's not what we want if we're printing the path in order to patch it into a location that already has parentheses, so it's important to be able to disable this behavior in such special cases. Note that children printed recursively by the genericPrint[NoParens] functions will still be wrapped with parentheses if necessary, since this option applies to the printing of the root node only. Should help with #327.

#327 (comment)

Recast has suffered for a long time because it did not have reliable access to the lexical analysis of source tokens during reprinting. Most importantly, accurate token information could be used to detect whether a node was originally wrapped with parentheses, even if the parentheses are separated from the node by comments or other incidental non-whitespace text, such as trailing commas. Here are just some of the issues that have resulted from the lack of reliable token information: - #533 - #528 - #513 - #512 - #366 - #327 - #286 With this change, every node in the AST returned by recast.parse will now have a node.loc.tokens array representing the entire sequence of original source tokens, as well as node.loc.{start,end}.token indexes into this array of tokens, such that node.loc.tokens.slice( node.loc.start.token, node.loc.end.token ) returns a complete list of all source tokens contained by the node. Note that some nodes (such as comments) may contain no source tokens, in which case node.loc.start.token === node.loc.end.token, which will be the index of the first token *after* the position where the node appeared. Most parsers can expose token information for free / very cheaply, as a byproduct of the parsing process. In case a custom parser is provided that does not expose token information, we fall back to Esprima's tokenizer. While there is considerable variation between different parsers in terms of AST format, there is much less variation in tokenization, so the Esprima tokenizer should be adequate in most cases (even for JS dialects like TypeScript). If it is not adequate, the caller should simply ensure that the custom parser exposes an ast.tokens array containing token objects with token.loc.{start,end}.{line,column} information.

Normally the genericPrint function will attempt to wrap its result with parentheses if path.needsParens(), but that's not what we want if we're printing the path in order to patch it into a location that already has parentheses, so it's important to be able to disable this behavior in such special cases. Note that children printed recursively by the genericPrint[NoParens] functions will still be wrapped with parentheses if necessary, since this option applies to the printing of the root node only. Should help with #327.

#327 (comment)

nene · 2018-09-12T15:15:40Z

Thanks @benjamn, resetting .original on parent node solves my issue.

I also agree that Recast behaves correctly here.

nene mentioned this issue Oct 2, 2016

Transforming IIFE results in extra parentheses lebab/lebab#187

Closed

benjamn mentioned this issue Nov 18, 2016

Doesn't preserve original source for a multiline expression with comments #342

Open

xixixao mentioned this issue Jun 4, 2017

Use Babylon instead of outdated babel-core parser facebookarchive/nuclide-format-js#11

Merged

Retsam mentioned this issue Aug 12, 2018

Transforming destructure reassignment results in extra parenthesis #533

Open

benjamn self-assigned this Sep 10, 2018

benjamn added a commit that referenced this issue Sep 10, 2018

Add a test that captures the intended behavior in issue #327.

64107e9

#327 (comment)

benjamn mentioned this issue Sep 10, 2018

Use node.loc.tokens to improve handling of parentheses. #537

Merged

8 tasks

benjamn added a commit that referenced this issue Sep 11, 2018

Add a test that captures the intended behavior in issue #327.

a50a5dd

#327 (comment)

NickHeiner mentioned this issue May 7, 2021

Over-parenthesization with ArrowFunctionExpression #914

Open

NickHeiner mentioned this issue Sep 8, 2021

Parens are dropped, changing program semantics / introducing syntax error #985

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transforming IIFE results in extra parentheses #327

Transforming IIFE results in extra parentheses #327

nene commented Oct 2, 2016

benjamn commented Oct 5, 2016

nene commented Oct 5, 2016

benjamn commented Oct 5, 2016

nene commented Oct 5, 2016

Phoenixmatrix commented Feb 8, 2018 •

edited

Phoenixmatrix commented Feb 8, 2018 •

edited

benjamn commented Sep 10, 2018 •

edited

nene commented Sep 12, 2018

Transforming IIFE results in extra parentheses #327

Transforming IIFE results in extra parentheses #327

Comments

nene commented Oct 2, 2016

benjamn commented Oct 5, 2016

nene commented Oct 5, 2016

benjamn commented Oct 5, 2016

nene commented Oct 5, 2016

Phoenixmatrix commented Feb 8, 2018 • edited

Phoenixmatrix commented Feb 8, 2018 • edited

benjamn commented Sep 10, 2018 • edited

nene commented Sep 12, 2018

Phoenixmatrix commented Feb 8, 2018 •

edited

Phoenixmatrix commented Feb 8, 2018 •

edited

benjamn commented Sep 10, 2018 •

edited