Pull checkstyle#14615: fix ordering of AST under STRING_TEMPLATE_BEGIN and remove empty STRING_TEMPLATE_CONTENT nodes #14615

Merged
merged 1 commit into from Mar 22, 2024

Conversation

nrmancuso
Member

@nrmancuso nrmancuso commented Mar 5, 2024

The current AST building was buggy and unmaintainable; manually setting line numbers was crazy. Now that we have the context cache, we can readily break the string template tokens into smaller tokens and do much less work (and do it reliably) when we build the AST.

  • As you can see in the diff, we were missing some tokens
  • Fixed token ordering
  • Improved AST building so that the visitor does less manual work
  • Breaking change: removed empty STRING_TEMPLATE_CONTENT nodes.

config/import-control.xml (outdated, resolved)
@nrmancuso nrmancuso force-pushed the rework-string-template branch 2 times, most recently from 2c1f74f to 1c73037 Compare March 5, 2024 20:53
nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 9, 2024
Member Author

@nrmancuso nrmancuso left a comment

Comments:

Comment on lines -1843 to -1848
final TerminalNode context = ctx.STRING_TEMPLATE_BEGIN();
final Token token = context.getSymbol();
final String tokenText = context.getText();
final int tokenStartIndex = token.getCharPositionInLine();
final int tokenLineNumber = token.getLine();
final int tokenTextLength = tokenText.length();
Member Author

This was crazy, glad to see this go.

@@ -254,11 +254,12 @@ LITERAL_FALSE: 'false';

CHAR_LITERAL: '\'' (EscapeSequence | ~['\\\r\n]) '\'';

-fragment StringFragment: (EscapeSequence | ~["\\\r\n])*;
+fragment StringFragment: (EscapeSequence | ~["\\\r\n]);
Member Author

I changed this to be a single fragment, which lets us specify the number of these fragments in the rule where the fragment is used.


-STRING_TEMPLATE_BEGIN: '"' StringFragment '\\' '{'
+STRING_TEMPLATE_BEGIN: '"'
+{ _input.LA(1) != '"' }?
Member Author

@nrmancuso nrmancuso Mar 9, 2024

Extra predicate here that will be helpful in the future when we start parsing text block templates (but is also good for performance now)
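
For readers less familiar with ANTLR predicates, here is a rough plain-Java sketch of what a one-character lookahead like `{ _input.LA(1) != '"' }?` appears to check: once the opening quote has been consumed, the template-begin alternative stays viable only if the next character is not another quote. The class and method names are hypothetical and this is not Checkstyle or ANTLR code; treating end of input as "not a quote" is an assumption of the sketch.

```java
/** Hypothetical illustration of a one-character lookahead; not Checkstyle code. */
final class QuoteLookahead {

    private QuoteLookahead() {
    }

    /**
     * Returns true if the quote at quoteIndex could open a string template,
     * i.e. the character right after it is not another quote.
     */
    static boolean canBeginTemplate(String source, int quoteIndex) {
        if (source.charAt(quoteIndex) != '"') {
            throw new IllegalArgumentException("expected a quote at index " + quoteIndex);
        }
        final int next = quoteIndex + 1;
        // Assumption: running off the end of the input counts as "not a quote".
        return next >= source.length() || source.charAt(next) != '"';
    }

    public static void main(String[] args) {
        // Opening quote followed by ordinary content.
        System.out.println(canBeginTemplate("\"x\\{y}\"", 0));
        // Empty string literal: the next character is a quote.
        System.out.println(canBeginTemplate("\"\"", 0));
        // Text block opener: the next character is a quote as well.
        System.out.println(canBeginTemplate("\"\"\"text", 0));
    }
}
```

Under these assumptions, the second and third inputs are rejected after a single character of lookahead, which is consistent with the note about text blocks and performance.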

@@ -469,10 +470,8 @@ mode TextBlock;

mode StringTemplate;

-STRING_TEMPLATE_MID: StringFragment '\\' '{'
--> pushMode(DEFAULT_MODE), type(STRING_TEMPLATE_MID);
+STRING_TEMPLATE_CONTENT: StringFragment+;
Member Author

Empty "content" is nonsense, producing this token for static analysis purposes is worthless.
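
As an analogy only, the difference between a zero-or-more and a one-or-more quantifier can be shown with `java.util.regex`: `*` happily matches the empty string, while `+` never does. The fragment pattern below is a simplified stand-in for `StringFragment` (the escape sequence is reduced to "backslash plus any character", which is an assumption), and the class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Regex analogy for StringFragment* versus StringFragment+; not the real grammar. */
final class FragmentQuantifier {

    // One fragment: a simplified escape sequence, or any character except
    // a double quote, a backslash, CR, or LF.
    private static final String FRAGMENT = "(?:\\\\.|[^\"\\\\\\r\\n])";

    /** Analogous to a rule built from StringFragment*. */
    static final Pattern ZERO_OR_MORE = Pattern.compile(FRAGMENT + "*");

    /** Analogous to a rule built from StringFragment+. */
    static final Pattern ONE_OR_MORE = Pattern.compile(FRAGMENT + "+");

    private FragmentQuantifier() {
    }

    /** Returns true if the whole text is matched by the pattern. */
    static boolean matchesWhole(Pattern pattern, String text) {
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(ZERO_OR_MORE, ""));   // empty input is accepted
        System.out.println(matchesWhole(ONE_OR_MORE, ""));    // empty input is rejected
        System.out.println(matchesWhole(ONE_OR_MORE, "abc")); // non-empty content is accepted
    }
}
```

With the one-or-more form there is simply nothing to emit between `"` and `\{` when the template starts directly with an embedded expression, so no empty content token can arise.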

Comment on lines +778 to 794
stringTemplateBegin
: STRING_TEMPLATE_BEGIN
STRING_TEMPLATE_CONTENT?
EMBEDDED_EXPRESSION_BEGIN
;

stringTemplateMid
: EMBEDDED_EXPRESSION_END
STRING_TEMPLATE_CONTENT?
EMBEDDED_EXPRESSION_BEGIN
;

stringTemplateEnd
: EMBEDDED_EXPRESSION_END
STRING_TEMPLATE_CONTENT?
STRING_TEMPLATE_END
;
Member Author

| | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [62:21]
| | |--EMBEDDED_EXPRESSION_END -> } [62:25]
| | |--STRING_TEMPLATE_CONTENT -> [62:26]
Member Author

These needed to go.

Member Author

Related to #14615 (comment)

Comment on lines -566 to +553
-| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33]
-| | | `--IDENT -> y [64:33]
 | | |--STRING_TEMPLATE_CONTENT -> . [64:29]
 | | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:30]
+| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33]
+| | | `--IDENT -> y [64:33]
 | | |--EMBEDDED_EXPRESSION_END -> } [64:35]
-| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40]
-| | | `--IDENT -> x [64:40]
 | | |--STRING_TEMPLATE_CONTENT -> . [64:36]
 | | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:37]
+| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40]
+| | | `--IDENT -> x [64:40]
Member Author

As you can see, the old implementation left us with a bunch of out-of-order elements in the AST.
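
One way to make "out of order" concrete is to check that sibling nodes appear in non-decreasing (line, column) order. The sketch below uses a tiny hypothetical `Pos` class rather than Checkstyle's `DetailAST`, and feeds it the positions from the snippet above; which ordering belongs to the old implementation is inferred from the comment, not stated in the diff itself.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that checks sibling order by source position. */
final class SiblingOrderCheck {

    /** Minimal stand-in for a node's type and position; not the DetailAST API. */
    static final class Pos {
        final String type;
        final int line;
        final int column;

        Pos(String type, int line, int column) {
            this.type = type;
            this.line = line;
            this.column = column;
        }
    }

    private SiblingOrderCheck() {
    }

    /** Returns true if every sibling starts at or after its predecessor. */
    static boolean inSourceOrder(List<Pos> siblings) {
        for (int i = 1; i < siblings.size(); i++) {
            final Pos prev = siblings.get(i - 1);
            final Pos cur = siblings.get(i);
            final boolean ordered = prev.line < cur.line
                    || prev.line == cur.line && prev.column <= cur.column;
            if (!ordered) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Ordering with the expression ahead of the tokens that precede it in source.
        final List<Pos> outOfOrder = Arrays.asList(
                new Pos("EMBEDDED_EXPRESSION", 64, 33),
                new Pos("STRING_TEMPLATE_CONTENT", 64, 29),
                new Pos("EMBEDDED_EXPRESSION_BEGIN", 64, 30),
                new Pos("EMBEDDED_EXPRESSION_END", 64, 35));
        // Ordering that follows the source text.
        final List<Pos> ordered = Arrays.asList(
                new Pos("STRING_TEMPLATE_CONTENT", 64, 29),
                new Pos("EMBEDDED_EXPRESSION_BEGIN", 64, 30),
                new Pos("EMBEDDED_EXPRESSION", 64, 33),
                new Pos("EMBEDDED_EXPRESSION_END", 64, 35));
        System.out.println(inSourceOrder(outOfOrder));
        System.out.println(inSourceOrder(ordered));
    }
}
```

A check like this only looks at direct siblings; it says nothing about whether each child sits under the right parent.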

@nrmancuso nrmancuso changed the title Rework string template Pull checkstyle#14615: rework AST building/parsing Mar 9, 2024
@nrmancuso nrmancuso requested a review from rnveach March 9, 2024 13:49
@nrmancuso
Member Author

nrmancuso commented Mar 9, 2024

@rnveach let's proceed with review on the final commit only so that we can get this merged and unblock #14390 ASAP

I will rebase this PR after we merge the first commit in #14568

Report generation: https://github.com/nrmancuso/checkstyle-diff-report-generator/actions/runs/8215006015

Edit: #14390 is merged

Projects file: projects-to-test-on.properties
Patch branch: rework-string-template

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/sevntu-check-regression_part_2/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part5/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part6/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part3/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part1/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part4/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/sevntu-check-regression_part_1/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part2/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/index.html

Reports look good; all differences are expected.

@nrmancuso nrmancuso marked this pull request as ready for review March 9, 2024 13:53
nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024
@nrmancuso nrmancuso requested a review from romani March 11, 2024 03:57
nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024
@rnveach
Member

rnveach commented Mar 11, 2024

@nrmancuso Regression is good; we ran it all. But have we done any time regression for this PR? :)

Also CI is failing.

@rnveach
Member

rnveach commented Mar 11, 2024

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
I am not seeing a removal of the SINGLE_LINE_COMMENT, only the addition. I also do not see it mentioned above.

Does this mean your fix is finding other issues we don't know about? I don't remember seeing a test case for something like this.

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A111
This is similarly a new item.

Also, what is your opinion of running ANTLR regression against a later OpenJDK project (is this 21 syntax?), which might have more of this type of code?

@nrmancuso
Member Author

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
I am not seeing a removal of the SINGLE_LINE_COMMENT, only the addition. I also do not see it mentioned above.

I will add a test like this to make sure we don't have any regressions.

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A111
This is similarly a new item.

This one is captured in the AST tests, so we are covered there.

Also, what is your opinion of running ANTLR regression against a later OpenJDK project (is this 21 syntax?), which might have more of this type of code?

This is difficult; sadly, we usually run out of memory (OOM) in GitHub Actions on OpenJDK. I have decommissioned my dedicated report generation box in favor of GitHub Actions. I would like to find some time to make this work in CI somehow.

@nrmancuso
Member Author

Assigning back to me to extend test cases and make CI happy

@nrmancuso nrmancuso assigned nrmancuso and unassigned rnveach Mar 11, 2024
@rnveach
Member

rnveach commented Mar 12, 2024

OpenJDK 21 ANTLR Regression: https://rveach.no-ip.org/checkstyle/regression/342/

@romani
Member

romani commented Mar 15, 2024

This link is not working

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 15, 2024
nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 15, 2024
@nrmancuso
Member Author

nrmancuso commented Mar 15, 2024

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
I am not seeing a removal of the SINGLE_LINE_COMMENT, only the addition. I also do not see it mentioned above.

@rnveach test is added

But have we done any time regression for this PR? :)

No, I do not have an easy, reliable way to do this yet, but the OpenJDK no-exception CI jobs are a good indicator of this, and I am keeping an eye on them. My plan is to get all the new language features for Java 21 in (with manual verification of performance), then focus on performance regression testing.

* many nested curly braces within an embedded template expression.
*
* @throws Exception upon failure
*/
@Test
public void testStringTemplateNested() throws Exception {
Member Author

I've extended this test case to get our nesting level of curly braces even deeper within one "context", and left a comment here about this test case.

Comment on lines 10 to +17
return STR."x\{ sp(() -> {
return STR."x\{ x }x" + "{" + "}}}";
return STR."x\{
sp(() -> {
return sp(() -> {
return sp(() -> {
return sp(() -> {
return sp(() -> "");});});});})
}x" + "{" + "}}}";
Member Author

The previous test case only went to a nesting level of 1; we needed to go deeper to kill a pitest mutation (and to be more thorough in our testing in general).
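
To illustrate why nesting depth matters here, the following sketch finds the `}` that closes an embedded expression by counting braces. It is a deliberately naive model with hypothetical names, not the Checkstyle lexer's logic: it ignores braces inside string literals, character literals, and comments, which a real lexer has to handle.

```java
/** Naive brace-depth model of an embedded expression; not the Checkstyle lexer. */
final class EmbeddedExpressionScanner {

    private EmbeddedExpressionScanner() {
    }

    /**
     * Returns the index of the closing brace of the embedded expression whose
     * body starts at start (the position right after the opening "\{"), or -1
     * if the braces never balance.
     */
    static int findEmbeddedExpressionEnd(String source, int start) {
        int depth = 0;
        for (int i = start; i < source.length(); i++) {
            final char ch = source.charAt(i);
            if (ch == '{') {
                // A nested block, for example a lambda body.
                depth++;
            }
            else if (ch == '}') {
                if (depth == 0) {
                    // No nested block is open, so this brace ends the expression.
                    return i;
                }
                depth--;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Template body with two levels of nested lambda blocks.
        final String body = "x\\{ a(() -> { b(() -> { }); }) }x";
        final int start = body.indexOf("\\{") + 2;
        System.out.println(findEmbeddedExpressionEnd(body, start) == body.lastIndexOf('}'));
    }
}
```

With only one level of nesting, a counter that is wrongly clamped at 1 would still pass, which is presumably the kind of mutation a deeper test is meant to catch.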

@nrmancuso nrmancuso assigned rnveach and unassigned nrmancuso Mar 15, 2024
@rnveach
Member

rnveach commented Mar 18, 2024

This link is not working

@nrmancuso IM me if you need it back. I took it down since I thought we were done. It is up for now.

Member

@romani romani left a comment

only one minor:

@@ -81,10 +81,8 @@ COMPILATION_UNIT -> COMPILATION_UNIT [2:0]
| | `--DOT -> . [16:26]
| | |--IDENT -> STR [16:23]
| | `--STRING_TEMPLATE_BEGIN -> " [16:27]
| | |--STRING_TEMPLATE_CONTENT -> [16:28]
Member

We need to update https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/api/TokenTypes.html#STRING_TEMPLATE_CONTENT
It has only a single-token example.
We need one more compact example to show that STRING_TEMPLATE_CONTENT might not be present.

We are very inconsistent about the presence and absence of nodes.
One example is MODIFIERS, but there are more:

 VARIABLE_DEF -> VARIABLE_DEF
  |--MODIFIERS -> MODIFIERS
  |--TYPE -> TYPE
  |   `--IDENT -> String
  |--IDENT -> s

It is always tricky to deal with this.
I am not sure what the right way to build the AST is: 1) always define the node, or 2) skip it if empty.
I am OK with skipping.
If it was not obvious to us, it is better to show it explicitly.

Member Author

My issue with it was that we call it "content", but it is an imaginary node that has no content, with the same line/column number as its next sibling. This is nonsense.
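
For check authors, the practical consequence is that code reading the content node should tolerate its absence. The sketch below models that with a small hypothetical `Node` class whose `findFirstToken` returns null when no child of the requested type exists, mirroring the behavior of `DetailAST.findFirstToken`; it is not the Checkstyle API itself, and the helper name `leadingContent` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of reading an optional STRING_TEMPLATE_CONTENT child. */
final class OptionalContentDemo {

    /** Minimal stand-in for an AST node; not Checkstyle's DetailAST. */
    static final class Node {
        private final String type;
        private final String text;
        private final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Returns the first direct child of the given type, or null if none. */
        Node findFirstToken(String wanted) {
            for (Node child : children) {
                if (child.type.equals(wanted)) {
                    return child;
                }
            }
            return null;
        }
    }

    private OptionalContentDemo() {
    }

    /** Returns the text of the first content child, or "" when it is absent. */
    static String leadingContent(Node templateBegin) {
        final Node content = templateBegin.findFirstToken("STRING_TEMPLATE_CONTENT");
        return content == null ? "" : content.text;
    }

    public static void main(String[] args) {
        // Template that starts directly with an embedded expression: no content node.
        final Node withoutContent = new Node("STRING_TEMPLATE_BEGIN", "\"")
                .add(new Node("EMBEDDED_EXPRESSION_BEGIN", "\\{"));
        // Template with "." before the embedded expression.
        final Node withContent = new Node("STRING_TEMPLATE_BEGIN", "\"")
                .add(new Node("STRING_TEMPLATE_CONTENT", "."))
                .add(new Node("EMBEDDED_EXPRESSION_BEGIN", "\\{"));
        System.out.println("[" + leadingContent(withoutContent) + "]");
        System.out.println("[" + leadingContent(withContent) + "]");
    }
}
```

Code written against the earlier tree shape, which always saw a content node, would need a null check of this kind after the change.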

Member Author

Example is updated

@romani romani assigned nrmancuso and unassigned romani Mar 21, 2024
@nrmancuso nrmancuso changed the title Pull checkstyle#14615: rework AST building/parsing Pull checkstyle#14615: fix String template AST ordering, drop empty content nodes Mar 22, 2024
@nrmancuso nrmancuso assigned romani and unassigned nrmancuso Mar 22, 2024
Member

@romani romani left a comment

ok to merge.

Please update the PR title to reference the token that will be skipped.
The PR/issue title is used in release notes; users need to easily catch, by keywords, what changed.

Users do not care about the fact of the refactoring.
They might be affected by the missing token beginning with this release.

@romani romani merged commit 7c50484 into checkstyle:master Mar 22, 2024
113 checks passed
@github-actions github-actions bot added this to the 10.15.0 milestone Mar 22, 2024
@nrmancuso nrmancuso deleted the rework-string-template branch March 22, 2024 13:10
@nrmancuso
Member Author

ok to merge.

Please update the PR title to reference the token that will be skipped. The PR/issue title is used in release notes; users need to easily catch, by keywords, what changed.

Users do not care about the fact of the refactoring. They might be affected by the missing token beginning with this release.

done

@nrmancuso nrmancuso changed the title Pull checkstyle#14615: fix String template AST ordering, drop empty content nodes Pull checkstyle#14615: fix String template AST ordering and remove empty STRING_TEMPLATE_CONTENT nodes Mar 22, 2024
@nrmancuso nrmancuso changed the title Pull checkstyle#14615: fix String template AST ordering and remove empty STRING_TEMPLATE_CONTENT nodes Pull checkstyle#14615: fix ordering of AST under STRING_TEMPLATE_BEGIN and remove empty STRING_TEMPLATE_CONTENT nodes Mar 22, 2024
3 participants