Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

start position discrepancy with esprima #136

Closed
msridhar opened this issue Oct 2, 2014 · 10 comments
Closed

start position discrepancy with esprima #136

msridhar opened this issue Oct 2, 2014 · 10 comments

Comments

@msridhar
Copy link

msridhar commented Oct 2, 2014

Consider the following code:

((((function () {
}()))));

According to esprima, the start column of both the ExpressionStatement and CallExpression node in the AST is 4. With acorn, the start column is 0. I don't know if there is a correct answer here, but 4 seems more right to me. Thanks for the great parser!

EDIT: see my next comment below for a more accurate characterization of Esprima's current behavior for this example.

@RReverser
Copy link
Member

AFAIK this deviation was always intentional (and I counted on that when adapting ES6 tests). There is no "correct answer" as parenthesized expression doesn't have it's representation in AST. However, it's IMO even more correct to have parenthesis counted as part of expression node which they belong to and not part of parent block.

In any case, positions have sense mostly for source maps, and both cases you'll have correct will be handled correctly by any debugger.

@marijnh
Copy link
Member

marijnh commented Oct 3, 2014

The statement definitely starts at position zero in this case. An argument could be made for the CallExpression starting at 4, but Acorn's current behavior is a response to issue #14, which interestingly enough points at Esprima for a precedent of the behavior you are now claiming Esprima does not show.

Or am I missing something?

@msridhar
Copy link
Author

msridhar commented Oct 3, 2014

Ok, so I re-tested with the latest code from esprima and acorn master. For my example above, both esprima and acorn actually agree that the start column of the ExpressionStatement is 0. esprima says the CallExpression starts at column 4, while acorn says column 0.

For the example from #14, both esprima and acorn agree that the BinaryExpression that is the test of the conditional starts at column 4. But for the Literal 1 that is parenthesized, acorn says it starts at column 4, but esprima says 5.

IMHO, for both these cases, the esprima logic for handling parentheses is more sensible. To give more context, my client is an IDE tool where I'd to be able to highlight the relevant expressions. Particularly for my original example, when highlighting the CallExpression, I'd rather not highlight the parentheses. However, I'm not sure if esprima's handling of parentheses would break @RReverser's use case.

@RReverser
Copy link
Member

It doesn't break anything for me if ask the tests will be fixed correspondingly. However I'm still not sure if in (1)+2 it's correct to make parenthesis part of BinaryExpression and not Literal that they semantically belong to.

@msridhar
Copy link
Author

msridhar commented Oct 3, 2014

It's not really clear to me where the location of parentheses should be accounted for in an "abstract" syntax tree, where presumably the function of the parentheses has been expressed in the tree's structure. Having some kind of ParenthesizedExpression would make things more explicit, but then the tree is less abstract. And that's not an option here anyway. FWIW, I can't think of a particular use case where a client interested in an expression e in the AST really needs the parentheses enclosing e to be part of its location. If someone knows of such a case, that would be interesting to hear about.

@getify
Copy link

getify commented Oct 3, 2014

I've suggested a heuristic before when discussing where to unambiguously associate (in the tree) various concrete syntax elements (whitespace, comments, unnecessary parens, etc), and I think it is worth considering:

These concrete syntax elements (that is, items which are not strictly required for program execution) should be associated as low (leaf-wise) in the tree as possible, and when applicable, left-associative (to the left, to the left...). That seems like the best and easiest location for code generation, and is probably a wash for any of the analytical use-cases.

So, to your situation, I'd say the ( ) around the 1 should associate to the 1 element, not to the statement. But, in 2 * (3 + 4), of course the ( ) has to associate to the 3 + 4 expression, not to the 3 and 4 elements.

@msridhar
Copy link
Author

msridhar commented Oct 3, 2014

@getify the issue here is that the parentheses are not explicitly associated with the 1 element; it's just that its location range includes the parentheses. If I instead write this program:

if ((    1    ) === 1) {
}

Now, acorn says the 1 literal has a start position of {"line":1,"column":4} and an end position of {"line":1,"column":15}. But, if I understand correctly, there is no other information in the AST to tell a code generator how many nested parentheses pairs to use to fill that space. Given the AST structure we have, to me it makes more sense to not include these characters in the the location range of the enclosed expression.

@getify
Copy link

getify commented Oct 3, 2014

I was talking about extending the tree structure with the additional concrete syntax elements metadata. I'm aware the standard tree doesn't keep this info.

Even if you didn't actually extend the tree to hold the data, I still think the heuristic works in terms of where to assume the data (and thus position counts) should be associated to.

@marijnh
Copy link
Member

marijnh commented Oct 8, 2014

I've pushed two patches that implement the following:

  • Parenthesized expressions now have a range that includes only the expression itself
  • Composite expressions and statements cover all of the text inside them, including leading and trailing parentheses that wrap inner expressions.

I think this should cover everybody's cases. It is still not consistent with Esprima (which will leave parentheses out of the statement nodes), but I think what I implemented is preferable. You can, for example, replace the range given for a statement with another statement in the program text, and the result will continue to make sense.

@msridhar
Copy link
Author

msridhar commented Oct 8, 2014

Actually, for my test cases, the behavior of the current master branch of acorn and esprima is now exactly the same (my original bug report was based on an older version of esprima). Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants