Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transforming IIFE results in extra parentheses #327

Open
nene opened this issue Oct 2, 2016 · 8 comments
Open

Transforming IIFE results in extra parentheses #327

nene opened this issue Oct 2, 2016 · 8 comments
Assignees

Comments

@nene
Copy link
Contributor

nene commented Oct 2, 2016

When transforming normal function to arrow function:

const recast = require('recast');

const ast = recast.parse('(function(){})()');
ast.program.body[0].expression.callee.type = 'ArrowFunctionExpression';

console.log(recast.print(ast).code);

the above outputs:

((() => {}))()

instead of the expected:

(() => {})()

Recast seems to correctly detect that this IIFE needs to be wrapped in parens, but fails to detect that there are already parenthesis present. When I just let Recast print a plain AST, it gets the parenthesis right:

console.log(recast.print({
    type: 'ExpressionStatement',
    expression: {
        type: 'CallExpression',
        callee: {
            type: 'ArrowFunctionExpression',
            params: [],
            body: {
                type: 'BlockStatement',
                body: [],
            }
        },
        arguments: [],
    }
}).code);

Seems to be similar to an old issue #81

@benjamn
Copy link
Owner

benjamn commented Oct 5, 2016

Recast seems to correctly detect that this IIFE needs to be wrapped in parens, but fails to detect that there are already parenthesis present.

This is exactly right. Since getting this right is tricky, Recast errs on the side of adding extra parens, instead of sometimes not parenthesizing when necessary.

Detecting whether parens are already present is tricky because most parsers do not preserve information about parentheses in the AST (Babylon is the exception). If you have any good ideas for how to detect reliably whether a given node is wrapped in parentheses, please let me know!

@nene
Copy link
Contributor Author

nene commented Oct 5, 2016

Hmm... Does this mean that there won't be this problem when running Recast with Babylon?

I've been considering switching to Babylon for some time.

@benjamn
Copy link
Owner

benjamn commented Oct 5, 2016

Using Babylon won't help, because I haven't added any Babylon-specific logic to Recast. I believe Recast should work equally well with all parsers, rather than implicitly preferring the parser most willing to defy conventions. If everyone agreed that parentheses should be preserved in the AST, I would love to get ParenthesizedExpression added to some kind of AST spec, and then Recast could rely on it here.

@nene
Copy link
Contributor Author

nene commented Oct 5, 2016

Ah... Thanks for clarification.

@Phoenixmatrix
Copy link
Contributor

Phoenixmatrix commented Feb 8, 2018

We just hit this issue too running some codemods across a few hundred repos. There's a lot of things that trigger the parenthesis issue. As @benjamn mentionned, the info is in the babylon ast but not in other ASTs.

The part that I don't understand is why this affects unchanged code. If I have a no-op transform and print (not pretty print), the parenthesis gets messed up (valid, but change the original code). Since usually recast ignores unchanged nodes, I'm a little confused as to why this is happening (reading the linked issues seems to say that this confuses recast into thinking there IS a change when there is not?).

recast seems to happily reprint anything else when I don't change the AST, no matter how wildly formatted (the ast doesn't contain every space I type either, but those aren't an issue). Why are these parenthesis special?

@Phoenixmatrix
Copy link
Contributor

Phoenixmatrix commented Feb 8, 2018

Hmm, edit to above, pasting the arrow function in the example above doesn't actually change anything (thus it works correctly) unless I modify the ast node, so I think I'm having a different issue.

@benjamn
Copy link
Owner

benjamn commented Sep 10, 2018

Although there are definitely larger issues of over-parenthesization at play here, which I am currently working to solve, I do not think the behavior reported in this issue should be considered a bug, since

(() => {}())

would not be valid syntax, and moving the () outside the larger parenthesized expression is a matter of taste and human judgment where Recast cannot safely take a position.

The ArrowFunctionExpression needs parentheses because it's the callee of a CallExpression, plain and simple. Achieving the ideal output of (() => {})() would require an additional step of actively removing an outer layer of parentheses that was already present. I think that's going "too far," since it means changing the formatting of unmodified ancestor nodes (something Recast strives not to do, in general).

With that said, if you want to make sure code like this gets re-parenthesized, you can explicitly force pretty-printing the CallExpression by setting callExpression.original = null:

const ast = recast.parse('(function(){})()');
const callExpression = ast.program.body[0].expression;
callExpression.callee.type = 'ArrowFunctionExpression';
callExpression.original = null;
console.log(recast.print(ast).code);

which will print

(() => {})();

as desired.

benjamn added a commit that referenced this issue Sep 10, 2018
Recast has suffered for a long time because it did not have reliable
access to the lexical analysis of source tokens during reprinting.

Most importantly, accurate token information could be used to detect
whether a node was originally wrapped with parentheses, even if the
parentheses are separated from the node by comments or other incidental
non-whitespace text, such as trailing commas. Here are just some of the
issues that have resulted from the lack of reliable token information:
- #533
- #528
- #513
- #512
- #366
- #327
- #286

With this change, every node in the AST returned by recast.parse will now
have a node.loc.tokens array representing the entire sequence of original
source tokens, as well as node.loc.{start,end}.token indexes into this
array of tokens, such that

  node.loc.tokens.slice(
    node.loc.start.token,
    node.loc.end.token
  )

returns a complete list of all source tokens contained by the node. Note
that some nodes (such as comments) may contain no source tokens, in which
case node.loc.start.token === node.loc.end.token, which will be the index
of the first token *after* the position where the node appeared.

Most parsers can expose token information for free / very cheaply, as a
byproduct of the parsing process. In case a custom parser is provided that
does not expose token information, we fall back to Esprima's tokenizer.
While there is considerable variation between different parsers in terms
of AST format, there is much less variation in tokenization, so the
Esprima tokenizer should be adequate in most cases (even for JS dialects
like TypeScript). If it is not adequate, the caller should simply ensure
that the custom parser exposes an ast.tokens array containing token
objects with token.loc.{start,end}.{line,column} information.
benjamn added a commit that referenced this issue Sep 10, 2018
Normally the genericPrint function will attempt to wrap its result with
parentheses if path.needsParens(), but that's not what we want if we're
printing the path in order to patch it into a location that already has
parentheses, so it's important to be able to disable this behavior in such
special cases.

Note that children printed recursively by the genericPrint[NoParens]
functions will still be wrapped with parentheses if necessary, since this
option applies to the printing of the root node only.

Should help with #327.
benjamn added a commit that referenced this issue Sep 11, 2018
Recast has suffered for a long time because it did not have reliable
access to the lexical analysis of source tokens during reprinting.

Most importantly, accurate token information could be used to detect
whether a node was originally wrapped with parentheses, even if the
parentheses are separated from the node by comments or other incidental
non-whitespace text, such as trailing commas. Here are just some of the
issues that have resulted from the lack of reliable token information:
- #533
- #528
- #513
- #512
- #366
- #327
- #286

With this change, every node in the AST returned by recast.parse will now
have a node.loc.tokens array representing the entire sequence of original
source tokens, as well as node.loc.{start,end}.token indexes into this
array of tokens, such that

  node.loc.tokens.slice(
    node.loc.start.token,
    node.loc.end.token
  )

returns a complete list of all source tokens contained by the node. Note
that some nodes (such as comments) may contain no source tokens, in which
case node.loc.start.token === node.loc.end.token, which will be the index
of the first token *after* the position where the node appeared.

Most parsers can expose token information for free / very cheaply, as a
byproduct of the parsing process. In case a custom parser is provided that
does not expose token information, we fall back to Esprima's tokenizer.
While there is considerable variation between different parsers in terms
of AST format, there is much less variation in tokenization, so the
Esprima tokenizer should be adequate in most cases (even for JS dialects
like TypeScript). If it is not adequate, the caller should simply ensure
that the custom parser exposes an ast.tokens array containing token
objects with token.loc.{start,end}.{line,column} information.
benjamn added a commit that referenced this issue Sep 11, 2018
Normally the genericPrint function will attempt to wrap its result with
parentheses if path.needsParens(), but that's not what we want if we're
printing the path in order to patch it into a location that already has
parentheses, so it's important to be able to disable this behavior in such
special cases.

Note that children printed recursively by the genericPrint[NoParens]
functions will still be wrapped with parentheses if necessary, since this
option applies to the printing of the root node only.

Should help with #327.
@nene
Copy link
Contributor Author

nene commented Sep 12, 2018

Thanks @benjamn, resetting .original on parent node solves my issue.

I also agree that Recast behaves correctly here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants