Overhaul comment attachment #13521

Merged
JLHwung merged 35 commits into babel:main from JLHwung:overhaul-comment-attachment on Jul 7, 2021

Conversation

@JLHwung
Contributor

@JLHwung JLHwung commented Jun 29, 2021

| Q | A |
| --- | --- |
| Fixed Issues? | Some comments after `}` / `+` are misattached as leading/trailing comments. Fixes #11576, closes #12560 |
| Patch: Bug Fix? | |
| Major: Breaking Change? | |
| Minor: New Feature? | |
| Tests Added + Pass? | Yes |
| Documentation PR Link | |
| Any Dependency Changes? | |
| License | MIT |

Abstract

This PR overhauls the current comment attachment algorithm. All of the updated test fixtures are bug fixes. It is faster than the current approach in the common cases.

Design docs

Benchmark Results

leading comments + trailing comments (common use case, O(n^2) -> O(n))
// c
a
// c
a
// c
...
a
baseline 128 leading comments + 127 trailing comments: 4_155 ops/sec ±2.05% (0.241ms)
baseline 256 leading comments + 255 trailing comments: 1_792 ops/sec ±2.07% (0.558ms)
baseline 512 leading comments + 511 trailing comments: 669 ops/sec ±2.2% (1.495ms)
baseline 1024 leading comments + 1023 trailing comments: 217 ops/sec ±2.08% (4.599ms)
current 128 leading comments + 127 trailing comments: 9_686 ops/sec ±56.64% (0.103ms)
current 256 leading comments + 255 trailing comments: 6_698 ops/sec ±0.52% (0.149ms)
current 512 leading comments + 511 trailing comments: 3_303 ops/sec ±0.38% (0.303ms)
current 1024 leading comments + 1023 trailing comments: 1_535 ops/sec ±1.84% (0.651ms)
leading comments (10%)
// c
// c
...
// c
{}
baseline 128 leading comments: 77_947 ops/sec ±6.78% (0.013ms)
baseline 256 leading comments: 43_160 ops/sec ±2.44% (0.023ms)
baseline 512 leading comments: 22_572 ops/sec ±2.22% (0.044ms)
baseline 1024 leading comments: 10_918 ops/sec ±2.47% (0.092ms)
current 128 leading comments: 83_604 ops/sec ±8.04% (0.012ms)
current 256 leading comments: 46_832 ops/sec ±2.47% (0.021ms)
current 512 leading comments: 26_373 ops/sec ±1.53% (0.038ms)
current 1024 leading comments: 12_952 ops/sec ±2.26% (0.077ms)
nested leading comments (O(n^2) -> O(n))
// c
{
// c
{
...
// c
{}} ... }
baseline 128 nested leading comments: 9_454 ops/sec ±35.7% (0.106ms)
baseline 256 nested leading comments: 4_724 ops/sec ±1.6% (0.212ms)
baseline 512 nested leading comments: 1_755 ops/sec ±2.16% (0.57ms)
baseline 1024 nested leading comments: 348 ops/sec ±2.33% (2.871ms)
current 128 nested leading comments: 11_226 ops/sec ±32.57% (0.089ms)
current 256 nested leading comments: 6_470 ops/sec ±1.47% (0.155ms)
current 512 nested leading comments: 2_834 ops/sec ±0.63% (0.353ms)
current 1024 nested leading comments: 1_380 ops/sec ±0.99% (0.725ms)
trailing comments (marginal improvements)
{{ ... {}// c
} // c
} // c
...
} // c
baseline 128 trailing comments: 83_635 ops/sec ±8.24% (0.012ms)
baseline 256 trailing comments: 40_747 ops/sec ±2.67% (0.025ms)
baseline 512 trailing comments: 24_274 ops/sec ±2.11% (0.041ms)
baseline 1024 trailing comments: 10_469 ops/sec ±3.81% (0.096ms)
current 128 trailing comments: 73_240 ops/sec ±11.22% (0.014ms)
current 256 trailing comments: 42_596 ops/sec ±3.48% (0.023ms)
current 512 trailing comments: 25_424 ops/sec ±2.47% (0.039ms)
current 1024 trailing comments: 13_662 ops/sec ±0.98% (0.073ms)
nested trailing comments (25% slower)
{{ ... {
} // c
} // c
...
} // c
baseline 128 nested trailing comments: 11_380 ops/sec ±52.74% (0.088ms)
baseline 256 nested trailing comments: 8_513 ops/sec ±0.63% (0.117ms)
baseline 512 nested trailing comments: 3_777 ops/sec ±1.83% (0.265ms)
baseline 1024 nested trailing comments: 1_806 ops/sec ±0.97% (0.554ms)
current 128 nested trailing comments: 10_530 ops/sec ±39.15% (0.095ms)
current 256 nested trailing comments: 6_344 ops/sec ±1.14% (0.158ms)
current 512 nested trailing comments: 3_064 ops/sec ±1.05% (0.326ms)
current 1024 nested trailing comments: 1_332 ops/sec ±1.03% (0.751ms)

I think this is acceptable since trailing-only comments are rare. Most comments are both leading and trailing comments of adjacent AST nodes.

inner comments (10%)
[ // c
// c
...
// c
]
baseline 128 inner comments: 74_560 ops/sec ±9.44% (0.013ms)
baseline 256 inner comments: 43_778 ops/sec ±1.93% (0.023ms)
baseline 512 inner comments: 23_776 ops/sec ±1.6% (0.042ms)
baseline 1024 inner comments: 11_676 ops/sec ±1.5% (0.086ms)
current 128 inner comments: 86_420 ops/sec ±9.67% (0.012ms)
current 256 inner comments: 48_437 ops/sec ±1.56% (0.021ms)
current 512 inner comments: 25_333 ops/sec ±1.42% (0.039ms)
current 1024 inner comments: 13_312 ops/sec ±2.39% (0.075ms)
nested inner comments (current main incorrectly produces leading comments)
[, // c
[, // c
...
[, // c
]] ... ]
baseline 128 nested inner comments: 6_698 ops/sec ±66.03% (0.149ms)
baseline 256 nested inner comments: 4_459 ops/sec ±1.21% (0.224ms)
baseline 512 nested inner comments: 1_565 ops/sec ±1.62% (0.639ms)
baseline 1024 nested inner comments: 519 ops/sec ±2% (1.927ms)
current 128 nested inner comments: 8_711 ops/sec ±63.38% (0.115ms)
current 256 nested inner comments: 6_451 ops/sec ±1.39% (0.155ms)
current 512 nested inner comments: 3_061 ops/sec ±0.85% (0.327ms)
current 1024 nested inner comments: 1_276 ops/sec ±1.18% (0.784ms)
many identifiers (without comments, 15%)
a;a; ... a;
baseline 64 length-1 identifiers: 18_321 ops/sec ±84.1% (0.055ms)
baseline 128 length-1 identifiers: 17_076 ops/sec ±1.72% (0.059ms)
baseline 256 length-1 identifiers: 8_546 ops/sec ±1.64% (0.117ms)
baseline 512 length-1 identifiers: 4_339 ops/sec ±2.54% (0.23ms)
baseline 1024 length-1 identifiers: 2_185 ops/sec ±1.39% (0.458ms)
current 64 length-1 identifiers: 23_844 ops/sec ±82.38% (0.042ms)
current 128 length-1 identifiers: 21_509 ops/sec ±1.75% (0.046ms)
current 256 length-1 identifiers: 9_722 ops/sec ±3.6% (0.103ms)
current 512 length-1 identifiers: 4_970 ops/sec ±1.77% (0.201ms)
current 1024 length-1 identifiers: 2_404 ops/sec ±1.6% (0.416ms)
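
To make the "O(n^2) -> O(n)" labels easier to check against the numbers, here is a small sketch that estimates the growth exponent from the "leading comments + trailing comments" timings reported above. The helper name `doublingExponent` is made up for this illustration, and the estimate is rough: the 128-comment runs have a large variance, so the larger sizes are more meaningful.

```javascript
// Timings (ms per parse) copied from the "leading comments + trailing comments"
// results above; keys are the number of leading comments.
const baselineMs = { 128: 0.241, 256: 0.558, 512: 1.495, 1024: 4.599 };
const currentMs = { 128: 0.103, 256: 0.149, 512: 0.303, 1024: 0.651 };

// Hypothetical helper: estimate k in time ~ n^k from two measurements where
// the input size doubles, using k = log2(t(2n) / t(n)).
// k near 1 suggests linear growth, k near 2 suggests quadratic growth.
function doublingExponent(timings, n) {
  return Math.log2(timings[2 * n] / timings[n]);
}

for (const n of [128, 256, 512]) {
  console.log(
    `${n} -> ${2 * n}:`,
    `baseline k=${doublingExponent(baselineMs, n).toFixed(2)},`,
    `current k=${doublingExponent(currentMs, n).toFixed(2)}`
  );
}
```

On these numbers the baseline exponent keeps climbing as the input grows, while the exponent for this branch stays close to 1 at the larger sizes, which is consistent with the claimed change in complexity.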
@babel-bot
Collaborator

@babel-bot babel-bot commented Jun 29, 2021

Build successful! You can test your changes in the REPL here: https://babeljs.io/repl/build/47263/

@codesandbox

@codesandbox codesandbox bot commented Jun 29, 2021

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 4f076f8:

Sandbox Source
babel-repl-custom-plugin Configuration
babel-plugin-multi-config Configuration

Member

@kaicataldo kaicataldo left a comment

Amazing work! 🎉

@@ -0,0 +1 @@
foo /* 1 */ (/* 2 */)


@JLHwung

JLHwung Jun 29, 2021
Author Contributor

The comment /* 2 */ is now an innerComment of CallExpression, which is not bad since 1) it is lost on current main and 2) we don't have AST structures for the parenthesis token.

* @property {Array<Comment>} comments - the containing comments
* @property {Node | null} leadingNode - the immediately preceding AST node of the whitespace token
* @property {Node | null} trailingNode - the immediately following AST node of the whitespace token
* @property {Node | null} containerNode - the outermost AST node containing the whitespace


@JLHwung

JLHwung Jun 29, 2021
Author Contributor

I think in the future we can change the behaviour so that innerComments are attached to the innermost AST node. So in the example foo (/* 2 */), /* 2 */ will be an innerComment of the call expression. The idea is to give the generator more information about where to insert innerComments.

Or should we change in this PR now?

Edit: I realize I have to implement such change as required by the trailing comma comment adjustments. I am surprised that changing outermost to innermost does not break any old tests. I added new tests to capture this behaviour change.
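
As a rough illustration of how the four documented properties could drive attachment, here is a minimal sketch. It is not the code from this PR: `pushComments` and `attachCommentWhitespace` are hypothetical names, the node objects are hand-written stand-ins for parser output, and treating `/* 1 */` as a trailing comment of `foo` is an assumption made for the example.

```javascript
// Hypothetical helper: append comments to a list property on a node.
function pushComments(node, key, comments) {
  node[key] = (node[key] || []).concat(comments);
}

// Attach one comment whitespace record using the properties documented above:
// - leadingNode precedes the whitespace, so it receives trailingComments
// - trailingNode follows the whitespace, so it receives leadingComments
// - with no adjacent node, the container receives innerComments
function attachCommentWhitespace(ws) {
  const { comments, leadingNode, trailingNode, containerNode } = ws;
  if (leadingNode !== null || trailingNode !== null) {
    if (leadingNode !== null) pushComments(leadingNode, "trailingComments", comments);
    if (trailingNode !== null) pushComments(trailingNode, "leadingComments", comments);
  } else if (containerNode !== null) {
    pushComments(containerNode, "innerComments", comments);
  }
}

// Hand-built stand-ins for `foo /* 1 */ (/* 2 */)`.
const foo = { type: "Identifier", name: "foo" };
const call = { type: "CallExpression", callee: foo, arguments: [] };
const c1 = { type: "CommentBlock", value: " 1 " };
const c2 = { type: "CommentBlock", value: " 2 " };

// Assumption: /* 1 */ directly follows `foo`, so `foo` is its leadingNode.
attachCommentWhitespace({ comments: [c1], leadingNode: foo, trailingNode: null, containerNode: call });
// /* 2 */ has no adjacent node inside the parentheses, so only the container applies.
attachCommentWhitespace({ comments: [c2], leadingNode: null, trailingNode: null, containerNode: call });

console.log(foo.trailingComments, call.innerComments);
```

Choosing the innermost container matters for the second record: with an outermost container the comment would end up on an enclosing statement rather than on the call expression.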

@JLHwung JLHwung marked this pull request as ready for review Jun 30, 2021
@kaicataldo
Member

@kaicataldo kaicataldo commented Jun 30, 2021

Sorry, didn't realize it wasn't ready for review 😅

@JLHwung JLHwung marked this pull request as draft Jun 30, 2021
@JLHwung
Contributor Author

@JLHwung JLHwung commented Jun 30, 2021

Converted back to draft as I just found new bugs: e.g. Babel does not attach comments after a DirectiveLabel:

"use strict"/* foo */;

@JLHwung JLHwung marked this pull request as ready for review Jun 30, 2021
@nicolo-ribaudo nicolo-ribaudo self-requested a review Jun 30, 2021
@KFlash
Copy link

@KFlash KFlash commented Jul 1, 2021

@JLHwung Kataw has a bunch of comment tests you can find here. In fact Kataw is the only one that gets all comments 100% correct, but I haven't attached all of them yet.

I still think you will have issues with class semicolons and with elisions in array literals and patterns. There are also some other edge cases.

This is basically because of how the Babel AST is designed, but you can work around this internally if you, e.g., use a dictionary lookup table, save all loc positions in it, and compare against the "real nodes" when you try to attach. That way you would get the location for cases like [,,,, /*babel */ ,,,,], where you are missing an AST node: the elision. See the ECMA specs.

The same goes for this case: class x { ;;/*1*/;;; }. A lookup table for comment positions should solve it. Babel sets null here instead of the real class element semicolon. Once again, see the ECMA specs.
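
One way to read the lookup-table suggestion is sketched below. Everything here is hypothetical: `slotIndexForComment` and the `commaOffsets` table are names invented for this example, and neither Babel nor Kataw is claimed to work this way. The comma scan is deliberately naive and is only valid because the sample contains no strings, nested lists, or commas inside comments.

```javascript
// Hypothetical lookup: given the offsets of the commas that separate array
// slots, return the index of the slot in which a comment starts.
function slotIndexForComment(commaOffsets, commentStart) {
  let slot = 0;
  for (const offset of commaOffsets) {
    if (offset < commentStart) slot++;
    else break;
  }
  return slot;
}

const source = "[,,,, /*babel */ ,,,,]";

// Build the table with a naive scan that skips over block comments.
const commaOffsets = [];
for (let i = 0; i < source.length; i++) {
  if (source.startsWith("/*", i)) {
    i = source.indexOf("*/", i) + 1; // jump to the closing "/" of the comment
    continue;
  }
  if (source[i] === ",") commaOffsets.push(i);
}

const commentStart = source.indexOf("/*");
const slot = slotIndexForComment(commaOffsets, commentStart);
console.log(`comment belongs to slot ${slot} of ${commaOffsets.length}`);
```

With the slot index known, an attachment step could record the comment against that hole even though the elements list only holds null there.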

@KFlash

@KFlash KFlash commented Jul 1, 2021

@JLHwung I forgot that you may run into trouble with trailing comments if Babel should in the future allow optional trailing commas, as in Prettier.
You may need to extend your algorithm to handle this: [a/*1*/] and [a,/*1*/]

TypeScript has a few edge cases, with and without comments, where a trailing comma is required. Not sure how Babel works around it.

Prettier suffers from bugs when it comes to comments and trailing commas and doesn't attach comments in 8 out of 10 cases, so getting it right may be a win for Babel.

For example, function* a(b, c, d/*1*/,/*2*/) { } is attached wrongly in Babel, so it looks like there are trailing comma issues.

Member

@nicolo-ribaudo nicolo-ribaudo left a comment

I only read the PR description so far. We should probably link this PR in the code, or add a src/parser/comments.md file, so that whoever has to modify the algorithm later can read the rationale behind it.

  1. (P2) should probably be

    w1.start ≤ w2.start ≤ w1.end
    
  2. What does (P3) mean for an input code which is just foo (without any spaces or newlines)?

unattachedCommentStack.splice(i, 1);
} else if (node.type !== "Program") {
// we have a node share the same length of containerNode, but its finishNode is invoked later
// than containerNode, so this node is the outer node. E.g. ExpressionStatement contains a VariableDeclaration


@nicolo-ribaudo

nicolo-ribaudo Jul 1, 2021
Member

Nit: ExpressionStatement cannot contain a VariableDeclaration, but maybe an AssignmentExpression?


@JLHwung

JLHwung Jul 2, 2021
Author Contributor

Yes! You are right. This branch is removed so we are all good here.

case "ObjectPattern":
this.adjustInnerComments(node, node.properties, comments);
break;
case "CallExpression":


@nicolo-ribaudo

nicolo-ribaudo Jul 1, 2021
Member

We also need to do this for OptionalCallExpression, and maybe for function definitions?


@JLHwung

JLHwung Jul 2, 2021
Author Contributor

Yes it is addressed in e5f09a5.

Just in case, are you reviewing an old diff?


@nicolo-ribaudo

nicolo-ribaudo Jul 2, 2021
Member

Uh I think I was accidentally reviewing a specific commit 😅
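
For readers following the `adjustInnerComments` call in the diff excerpt above, here is one plausible policy such an adjustment could implement. This is a sketch under assumptions, not the body from this PR: the name `adjustInnerCommentsSketch`, the `start` offsets, and the rule "a comment after the last non-null element becomes its trailing comment, otherwise it is an inner comment" are all chosen for illustration.

```javascript
// Sketch: decide where a comment whitespace inside a list-like container goes.
// `elements` may contain null entries (holes); `ws.start` is the comment offset.
function adjustInnerCommentsSketch(node, elements, ws) {
  // Find the last non-null element, if any.
  let last = null;
  for (let i = elements.length - 1; i >= 0 && last === null; i--) {
    last = elements[i];
  }
  if (last === null || last.start > ws.start) {
    // Nothing precedes the comment inside the container: keep it as inner.
    node.innerComments = (node.innerComments || []).concat(ws.comments);
  } else {
    // The comment follows the last element (e.g. after a trailing comma).
    last.trailingComments = (last.trailingComments || []).concat(ws.comments);
  }
}

// Stand-in for `fn(a, /*1*/)`: `a` starts at offset 3, the comment at offset 6.
const argA = { type: "Identifier", name: "a", start: 3, end: 4 };
const callWithArg = { type: "CallExpression", arguments: [argA] };
const afterComma = { type: "CommentBlock", value: "1" };
adjustInnerCommentsSketch(callWithArg, callWithArg.arguments, { start: 6, comments: [afterComma] });

// Stand-in for `fn(/*1*/)`: no arguments, so the comment stays inner.
const emptyCall = { type: "CallExpression", arguments: [] };
const onlyComment = { type: "CommentBlock", value: "1" };
adjustInnerCommentsSketch(emptyCall, emptyCall.arguments, { start: 3, comments: [onlyComment] });

console.log(argA.trailingComments, emptyCall.innerComments);
```

The same function shape would apply to other list-like containers mentioned in this thread, such as object patterns or optional call expressions, by passing the matching element list.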

@KFlash

@KFlash KFlash commented Jul 1, 2021

The main issue is the performance. I see the benchmarks, but what about real life? How many source code files per second does this generator perform? And how much memory is consumed?

Seems to be too much. Still using a class to handle comments, unnecessary iterations that could have been done in one go etc.

@KFlash

@KFlash KFlash commented Jul 2, 2021

Playing around with Babel REPL and found lots of comment issues

Here are a few

switch (c) { /*1*/ } /*2*/
switch (c) { /*1*/ } /*2*/
switch (c)/*1*/ { /*2*/ } /*3*/
x(/*1*/)
(a(/*1*/))
(a())(a(/*1*/))
[,,,/*1*/,,,,]

// should not break 
"string"; /*1*/

@JLHwung
Contributor Author

@JLHwung JLHwung commented Jul 2, 2021

@nicolo-ribaudo

What does (P3) mean for an input code which is just foo (without any spaces or newlines)?

Good question! In that case the list of comment whitespaces is empty, so P3 holds vacuously.

We should probably link this PR in the code, or add a src/parser/comments.md file

I can add it to the docs/ directory.

@KFlash

How many source code files per second does this generator perform?

I ran the performance test on https://github.com/babel/parser_performance. When parsing es5/angular, this branch is faster than 7.14.7. The repo has not been updated for a long time; I should have used seafox here.

| fixture | acorn | Babel 7.14.7 | This branch | meriyah |
| --- | --- | --- | --- | --- |
| es5/angular.js | 28.51 ops/sec (27ms) | 17.41 ops/sec (38ms) | 24.19 ops/sec (28ms) | 28.21 ops/sec (24ms) |

I have added a new identifier benchmark result (without any comments). Because the overhead of comment tracking is reduced, the new algorithm is 15% faster than the current one. The identifier benchmark predicts the performance improvement on es5/angular.

Playing around with Babel REPL and found lots of comment issues

Thanks! I checked the cases and the comments are now all attached to the AST, but @babel/generator does not print them. Before this PR innerComments did not work well, so I am not surprised that @babel/generator does not print them. The issue will be addressed in separate PRs.

// should not break 
"string"; /*1*/

Why? I think they are equivalent. Babel generator does not preserve the whitespace.

@KFlash

@KFlash KFlash commented Jul 2, 2021

@JLHwung This case is a trailing comment of the empty statement. After removing the semicolon ";" it should stay attached, as a trailing comment of the string literal. In the current REPL a line break is inserted and the comment is no longer a trailing comment; it is a detached comment.

// should not break 
"string"; /*1*/

Babel REPL

"use strict";
"string";
/*1*/

Why use Acorn and Meriyah in the benchmark? They don't have any printer / generator support. And isn't this PR about the Babel generator and internal code?

Why not add Kataw to the benchmark? The printer is located here.

@JLHwung
Contributor Author

@JLHwung JLHwung commented Jul 2, 2021

@KFlash This PR is focused on the parser: before this PR we failed to attach comments in edge cases, so the parser part should be prioritized. Generator support will be addressed later.

Babel parses ; as empty statement. So

; /*1*/

is an empty statement with trailing comments.

@KFlash

@KFlash KFlash commented Jul 2, 2021

The Kataw parser can be located here in case you want to add it ;)

Regarding the comment: it was printed wrong in the REPL but maybe parsed correctly. Compare against Prettier. They also handle it as a trailing comment, not a detached one.

@JLHwung
Contributor Author

@JLHwung JLHwung commented Jul 2, 2021

@KFlash I think the comment attachment can only be perfectly handled on a CST where every non-whitespace token has a node representation to which a comment can be attached. The ideal implementation should be language-agnostic as long as space and non-space are well defined.

If Babel had some sort of TrailingCommaWrapper AST node, we could get rid of the current adjustments for inner comments. This is the extra cost of addressing comment attachment on an AST.

@KFlash

@KFlash KFlash commented Jul 2, 2021

In my opinion there is a "design flaw" in all AST parsers when it comes to loc tracking. Loc tracking shouldn't count whitespace at all. That way you have more control, and the printer can use the start / end value of each AST / CST node to run a separate whitespace skip that starts at a given position and ends when it hits a token that is not whitespace.
Then there is no need to attach comments to the AST either, and you can very easily collect all comments that are between two tokens, except for "lists".

Lists are defined in the ECMA specs, and you would need an extra AST node to get the correct loc position after the list token (e.g. '(', '{') has been consumed.

This is out of scope for Babel, but it is the only way you can get comments 100% correct.

You can even do a "slice" between the end loc of the previous node and the start loc of the current node to collect the whitespace with comments if you want 1:1 printing.

This isn't possible with AST parsers.

With this kind of algorithm you will not experience any overhead either if you want to "collect" a specific comment. You can do that directly in the lexer. See here how I collect a single-line ignore comment. No slice or any string manipulation.
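
The "slice" idea can be sketched in a few lines. This is only an illustration under assumptions: `gapBetween` and `commentsInGap` are hypothetical helpers, the nodes are hand-written `{ start, end }` objects, and the regular expression is only safe because the gap between two adjacent nodes is assumed to contain nothing but whitespace and comments.

```javascript
// Return the raw source text between the end of one node and the start of the next.
function gapBetween(source, prev, next) {
  return source.slice(prev.end, next.start);
}

// Extract block and line comments from a gap that holds only whitespace and comments.
function commentsInGap(gap) {
  return gap.match(/\/\*[\s\S]*?\*\/|\/\/[^\n]*/g) || [];
}

const sampleSource = "a; /*1*/ // two\nb;";
// Hand-written positions for the two statements `a;` and `b;`.
const prevStatement = { start: 0, end: 2 };
const nextStart = sampleSource.indexOf("b;");
const nextStatement = { start: nextStart, end: nextStart + 2 };

const gap = gapBetween(sampleSource, prevStatement, nextStatement);
console.log(JSON.stringify(gap), commentsInGap(gap));
```

Because the gap is kept verbatim, a printer could re-emit it unchanged for 1:1 output, or pass it through `commentsInGap` when only the comments are needed.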

@KFlash

@KFlash KFlash commented Jul 3, 2021

@JLHwung Here are close to 200 comment tests. This PR should parse them all.

@nicolo-ribaudo Did you validate all possible comment attachment combinations before approving this PR? Check the tests I linked to. This PR still fails on 2 out of 5 of these tests, so there are still things to fix.

@JLHwung
Contributor Author

@JLHwung JLHwung commented Jul 3, 2021

@KFlash Can you provide a list of the failing tests? If a comment is not attached to any AST node, it should be addressed in this PR. Otherwise it is a generator bug and will be addressed later.

@KFlash

@KFlash KFlash commented Jul 3, 2021

@JLHwung Just test all the 200 tests I linked to. You can see whether the comments are attached or not. As said earlier, in 2 out of 5 cases the comments are not attached.
Why do you need a list? Just add all the tests to Babel and you can see which ones are failing.

You can find a sub-folder in the tests I linked to: "Babel issues". I haven't tested whether they are fixed with this PR.

@JLHwung JLHwung force-pushed the JLHwung:overhaul-comment-attachment branch from b256e51 to 4f076f8 Jul 7, 2021
@@ -0,0 +1,129 @@
# Comment attachment


@JLHwung JLHwung merged commit 79d3276 into babel:main Jul 7, 2021
24 of 26 checks passed
* @github-actions: Prepare Cache
* @github-actions: Validate Yarn dependencies and constraints
* @github-actions: Test on Node.js Latest
* @github-actions: Build Babel Artifacts
* @github-actions: Test Babel 8 breaking changes
* @github-actions: Publish to local Verdaccio registry
* @github-actions: Lint
* @github-actions: Test on Node.js (14)
* @github-actions: Test on Node.js (12)
* @github-actions: Test on Node.js (10)
* @github-actions: Test on Node.js (8)
* @github-actions: Test on Node.js (6)
* @github-actions: Test on Windows
* @github-actions: Third-party Parser Tests
* @github-actions: Test @babel/runtime integrations
* @github-actions: E2E (babel)
* @github-actions: E2E (babel-old-version)
* @github-actions: E2E (create-react-app)
* @github-actions: E2E (vue-cli)
* @github-actions: E2E (jest)
* @circleci-checks: e2e-breaking-pr (Workflow: e2e-breaking-pr)
* @circleci-checks: test262-pr (Workflow: test262-pr)
* @gitpod-io: Gitpod (Open an online workspace in Gitpod)
* @circleci-checks: build-standalone (Workflow: build-standalone)
* @codesandbox: ci/codesandbox (Building packages succeeded.)
* @codecov: codecov/project 92.04% (target 90.00%)
@JLHwung JLHwung deleted the JLHwung:overhaul-comment-attachment branch Jul 7, 2021
nicolo-ribaudo added a commit to nicolo-ribaudo/babel that referenced this pull request Jul 30, 2021
* refactor: inline pushComment

* chore: add benchmark cases

* perf: overhaul comment attachment

* cleanup

* update test fixtures

They are all bugfixes.

* fix: merge HTMLComment parsing to skipSpace

* perf: remove unattachedCommentStack

baseline 128 nested leading comments: 11_034 ops/sec ±50.64% (0.091ms)
baseline 256 nested leading comments: 6_037 ops/sec ±11.46% (0.166ms)
baseline 512 nested leading comments: 3_077 ops/sec ±2.31% (0.325ms)
baseline 1024 nested leading comments: 1_374 ops/sec ±3.22% (0.728ms)
current 128 nested leading comments: 11_027 ops/sec ±37.41% (0.091ms)
current 256 nested leading comments: 6_736 ops/sec ±1.39% (0.148ms)
current 512 nested leading comments: 3_306 ops/sec ±0.69% (0.302ms)
current 1024 nested leading comments: 1_579 ops/sec ±2.09% (0.633ms)

baseline 128 nested trailing comments: 10_073 ops/sec ±42.95% (0.099ms)
baseline 256 nested trailing comments: 6_294 ops/sec ±2.19% (0.159ms)
baseline 512 nested trailing comments: 3_041 ops/sec ±0.8% (0.329ms)
baseline 1024 nested trailing comments: 1_530 ops/sec ±1.18% (0.654ms)
current 128 nested trailing comments: 11_461 ops/sec ±44.89% (0.087ms)
current 256 nested trailing comments: 7_212 ops/sec ±1.6% (0.139ms)
current 512 nested trailing comments: 3_403 ops/sec ±1% (0.294ms)
current 1024 nested trailing comments: 1_539 ops/sec ±1.49% (0.65ms)

* fix: do not expose CommentWhitespace type

* add comments on CommentWhitespace

* add test case for babel#11576

* fix: mark containerNode be the innermost node containing commentWS

* fix: adjust trailing comma comments for Record/Tuple/OptionalCall

* fix: drain comment stacks in parseExpression

* docs: update comments

* add a new benchmark

* chore: containerNode => containingNode

* add more benchmark cases

* fix: avoid finishNodeAt in stmtToDirective

* finalize comment right after containerNode is set

* add testcase about directive

* fix: finish SequenceExpression at current pos and adjust later

* chore: rename test cases

* add new test case on switch statement

* fix: adjust comments after trailing comma of function params

* add comment attachment design doc

* misc fix

* fix: reset previous trailing comments when parsing async method/accessor

* chore: add more comment testcases

* fix flow errors

* fix: handle comments when parsing async arrow

* fix: handle comments when "static" is a class modifier

* fix flow errors

* fix: handle comments when parsing async function/do

* refactor: simplify resetPreviousNodeTrailingComments

* update test fixtures
5 participants