Pull checkstyle#14615: fix ordering of AST under STRING_TEMPLATE_BEGIN and remove empty STRING_TEMPLATE_CONTENT nodes #14615

nrmancuso · 2024-03-05T18:57:21Z

The current AST building was buggy and unmaintainable; manually setting line numbers was crazy. Now that we have the context cache, we can readily break the string template tokens into smaller tokens and do much less work (reliably) when we build the AST.

As you can see in the diff, we were missing some tokens.
Fixed token ordering.
Improved AST building so that the visitor does less manual work.
Breaking change: removed empty STRING_TEMPLATE_CONTENT nodes.

config/import-control.xml

nrmancuso

Comments:

nrmancuso · 2024-03-09T13:40:14Z

src/main/java/com/puppycrawl/tools/checkstyle/JavaAstVisitor.java

-        final TerminalNode context = ctx.STRING_TEMPLATE_BEGIN();
-        final Token token = context.getSymbol();
-        final String tokenText = context.getText();
-        final int tokenStartIndex = token.getCharPositionInLine();
-        final int tokenLineNumber = token.getLine();
-        final int tokenTextLength = tokenText.length();


This was crazy, glad to see this go.

nrmancuso · 2024-03-09T13:41:49Z

src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageLexer.g4

@@ -254,11 +254,12 @@ LITERAL_FALSE:           'false';

 CHAR_LITERAL:            '\'' (EscapeSequence | ~['\\\r\n]) '\'';

-fragment StringFragment: (EscapeSequence | ~["\\\r\n])*;
+fragment StringFragment: (EscapeSequence | ~["\\\r\n]);


I changed this fragment to match a single unit, which lets us specify how many of these fragments are allowed in each rule where the fragment is used.
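
A minimal sketch of the idea, modelled with java.util.regex rather than ANTLR: the unit pattern carries no repetition, and each caller picks its own quantifier. The class name, the pattern strings, and the simplified escape handling (a backslash plus any one character) are assumptions for illustration and are not the grammar's EscapeSequence.

```java
import java.util.regex.Pattern;

/** Models a single-unit fragment whose repetition is chosen by each caller. */
final class FragmentQuantifierSketch {

    // One "fragment": an escape (backslash + any char, simplified) or one
    // character that is not a quote, backslash, CR, or LF.
    static final String SINGLE_FRAGMENT = "(?:\\\\.|[^\"\\\\\\r\\n])";

    // Repetition baked in with '*': the empty string also matches.
    static final Pattern ZERO_OR_MORE = Pattern.compile(SINGLE_FRAGMENT + "*");

    // Repetition chosen at the use site with '+': at least one unit required.
    static final Pattern ONE_OR_MORE = Pattern.compile(SINGLE_FRAGMENT + "+");

    static boolean matchesZeroOrMore(String text) {
        return ZERO_OR_MORE.matcher(text).matches();
    }

    static boolean matchesOneOrMore(String text) {
        return ONE_OR_MORE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesZeroOrMore(""));
        System.out.println(matchesOneOrMore(""));
    }
}
```

With the quantifier at the use site, a rule that must never produce an empty match can ask for one or more units, while another rule can still allow zero.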

nrmancuso · 2024-03-09T13:43:01Z

src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageLexer.g4


-STRING_TEMPLATE_BEGIN:  '"' StringFragment '\\' '{'
+STRING_TEMPLATE_BEGIN:  '"'
+                        { _input.LA(1) != '"' }?


The extra predicate here will be helpful in the future when we start parsing text block templates, and it is also good for performance now.
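
For readers unfamiliar with lexer predicates, here is a rough Java model of just the one-character lookahead, `_input.LA(1) != '"'`, evaluated right after an opening quote. The helper name and class are hypothetical, and the rest of the rule is not shown in the diff excerpt, so this says nothing about what the lexer matches afterwards.

```java
/** Models a one-character lookahead similar in spirit to the lexer predicate. */
final class QuoteLookaheadSketch {

    /**
     * Returns true when the character at {@code quoteIndex} is a double quote
     * and the character after it is not another double quote.
     */
    static boolean quoteNotFollowedByQuote(String source, int quoteIndex) {
        if (quoteIndex < 0 || quoteIndex >= source.length()
                || source.charAt(quoteIndex) != '"') {
            return false;
        }
        final int next = quoteIndex + 1;
        // End of input counts as "not a quote", like LA(1) returning EOF.
        return next >= source.length() || source.charAt(next) != '"';
    }

    public static void main(String[] args) {
        System.out.println(quoteNotFollowedByQuote("\"x\\{y}\"", 0));
        System.out.println(quoteNotFollowedByQuote("\"\"\"", 0));
    }
}
```

Under this model, an opening quote that is immediately followed by a second quote (as in an empty string or the start of a text block) is rejected after looking at one character.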

nrmancuso · 2024-03-09T13:44:20Z

src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageLexer.g4

@@ -469,10 +470,8 @@ mode TextBlock;

 mode StringTemplate;

-    STRING_TEMPLATE_MID: StringFragment '\\' '{'
-                         -> pushMode(DEFAULT_MODE), type(STRING_TEMPLATE_MID);
+    STRING_TEMPLATE_CONTENT: StringFragment+;


Empty "content" is nonsense; producing this token is worthless for static analysis purposes.

nrmancuso · 2024-03-09T13:47:49Z

src/main/resources/com/puppycrawl/tools/checkstyle/grammar/java/JavaLanguageParser.g4

+stringTemplateBegin
+    : STRING_TEMPLATE_BEGIN
+      STRING_TEMPLATE_CONTENT?
+      EMBEDDED_EXPRESSION_BEGIN
+    ;
+
+stringTemplateMid
+    : EMBEDDED_EXPRESSION_END
+      STRING_TEMPLATE_CONTENT?
+      EMBEDDED_EXPRESSION_BEGIN
+    ;
+
+stringTemplateEnd
+    : EMBEDDED_EXPRESSION_END
+      STRING_TEMPLATE_CONTENT?
+      STRING_TEMPLATE_END
    ;


These parser rules are now more reflective of what's in the JLS: https://docs.oracle.com/javase/specs/jls/se21/preview/specs/string-templates-jls.html#:~:text=template%20(3.13).-,3.13%20Fragments,-A%20template%20(
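
As a sketch of the shape these three rules describe, the following Java check validates a flat token sequence. It assumes (this is not shown in the diff excerpt) that a whole template is one stringTemplateBegin, then an expression, then zero or more stringTemplateMid/expression pairs, then one stringTemplateEnd. The EMBEDDED_EXPRESSION entry stands in for the whole expression, and the single-letter codes are purely illustrative.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Checks a flat token list against the begin/mid/end fragment shape. */
final class TemplateShapeSketch {

    enum Tok {
        STRING_TEMPLATE_BEGIN('B'),
        STRING_TEMPLATE_CONTENT('C'),
        EMBEDDED_EXPRESSION_BEGIN('O'),
        EMBEDDED_EXPRESSION('X'),
        EMBEDDED_EXPRESSION_END('Z'),
        STRING_TEMPLATE_END('E');

        final char code;

        Tok(char code) {
            this.code = code;
        }
    }

    // begin: B C? O | mid: Z C? O | end: Z C? E, with X between fragments.
    private static final Pattern SHAPE =
            Pattern.compile("BC?OX(?:ZC?OX)*ZC?E");

    static boolean isWellFormed(List<Tok> tokens) {
        final StringBuilder codes = new StringBuilder();
        for (Tok tok : tokens) {
            codes.append(tok.code);
        }
        return SHAPE.matcher(codes).matches();
    }
}
```

Every fragment treats STRING_TEMPLATE_CONTENT as optional, so a template with no literal text on either side of its expression is still accepted.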

nrmancuso · 2024-03-09T13:48:21Z

...noncompilable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasic.txt

        |       |           |--EMBEDDED_EXPRESSION_BEGIN -> \{ [62:21]
        |       |           |--EMBEDDED_EXPRESSION_END -> } [62:25]
-        |       |           |--STRING_TEMPLATE_CONTENT ->  [62:26]


These needed to go.

Related to #14615 (comment)

nrmancuso · 2024-03-09T13:48:54Z

...noncompilable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasic.txt

-        |       |           |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33]
-        |       |           |   `--IDENT -> y [64:33]
        |       |           |--STRING_TEMPLATE_CONTENT -> . [64:29]
        |       |           |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:30]
+        |       |           |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33]
+        |       |           |   `--IDENT -> y [64:33]
        |       |           |--EMBEDDED_EXPRESSION_END -> } [64:35]
-        |       |           |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40]
-        |       |           |   `--IDENT -> x [64:40]
        |       |           |--STRING_TEMPLATE_CONTENT -> . [64:36]
        |       |           |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:37]
+        |       |           |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40]
+        |       |           |   `--IDENT -> x [64:40]


As you can see, the old implementation left a bunch of out-of-order elements in the AST.
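
To make "out of order" concrete, here is a small Java sketch that checks whether sibling positions ever move backwards. The line/column pairs are copied from the sibling nodes in the ExpectedStringTemplateBasic.txt diff above (child IDENT nodes are left out); the class and method names are hypothetical.

```java
/** Checks that sibling {line, column} positions never move backwards. */
final class SourceOrderSketch {

    // Sibling positions before the change, in the order the old AST listed them.
    static final int[][] OLD_ORDER = {
        {64, 33}, {64, 29}, {64, 30}, {64, 35}, {64, 40}, {64, 36}, {64, 37},
    };

    // The same siblings in the order the new AST lists them.
    static final int[][] NEW_ORDER = {
        {64, 29}, {64, 30}, {64, 33}, {64, 35}, {64, 36}, {64, 37}, {64, 40},
    };

    /** Each row is {line, column}; returns true when no row precedes its predecessor. */
    static boolean isInSourceOrder(int[][] positions) {
        for (int i = 1; i < positions.length; i++) {
            final int[] prev = positions[i - 1];
            final int[] curr = positions[i];
            final boolean movesBack = curr[0] < prev[0]
                    || curr[0] == prev[0] && curr[1] < prev[1];
            if (movesBack) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isInSourceOrder(OLD_ORDER));
        System.out.println(isInSourceOrder(NEW_ORDER));
    }
}
```

The old sequence fails the check at its second entry (column 29 after column 33), while the new sequence passes.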

nrmancuso · 2024-03-09T13:51:22Z

@rnveach let's proceed with review on the final commit only so that we can get this merged and unblock #14390 ASAP

I will rebase this PR after we merge the first commit in #14568

Report generation: https://github.com/nrmancuso/checkstyle-diff-report-generator/actions/runs/8215006015

Edit: #14390 is merged

Projects file: projects-to-test-on.properties
Patch branch: rework-string-template

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/sevntu-check-regression_part_2/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part5/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part6/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part3/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part1/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part4/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/sevntu-check-regression_part_1/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/part2/index.html

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/index.html

Reports look good; all differences are expected.

...lable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasicWithTabs.txt

rnveach · 2024-03-11T19:31:43Z

@nrmancuso Regression is good; we ran it all. But have we done any time regression for this PR? :)

Also CI is failing.

rnveach · 2024-03-11T19:34:34Z

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
I am seeing only the addition of the SINGLE_LINE_COMMENT, not a removal. I also do not see it mentioned above.

Does this mean your fix is finding other issues we don't know about? I don't remember seeing a test case for something like this.

https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A111
This is similarly a new item.

Also, what is your opinion of running the ANTLR regression against a later OpenJDK project (is this 21 syntax?), which might have more of this type of code and more?

nrmancuso · 2024-03-11T19:54:15Z

> https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
> I am seeing only the addition of the SINGLE_LINE_COMMENT, not a removal. I also do not see it mentioned above.

I will add a test like this to make sure we don't have any regressions.

> https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A111
> This is similarly a new item.

This one is captured in the AST tests, so we are covered there.

> Also, what is your opinion of running the ANTLR regression against a later OpenJDK project (is this 21 syntax?), which might have more of this type of code and more?

This is difficult; sadly, we usually run out of memory (OOM) in GitHub Actions on OpenJDK. I have decommissioned my dedicated report generation box in favor of GitHub Actions. I would like to find some time to make this work in CI somehow.

nrmancuso · 2024-03-11T19:54:48Z

Assigning back to me to extend test cases and make CI happy

rnveach · 2024-03-12T15:20:17Z

OpenJDK 21 ANTLR Regression: https://rveach.no-ip.org/checkstyle/regression/342/

romani · 2024-03-15T15:20:29Z

This link is not working

nrmancuso · 2024-03-15T17:03:01Z

> https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35
> I am seeing only the addition of the SINGLE_LINE_COMMENT, not a removal. I also do not see it mentioned above.

@rnveach test is added

> But have we done any time regression for this PR? :)

No, I do not have an easy, reliable way to do this yet, but the OpenJDK no-exception CI jobs are a good indicator of this, and I am keeping an eye on them. My plan is to get all the new language features for Java 21 in (with manual verification of performance), then focus on performance regression testing.

nrmancuso · 2024-03-15T17:04:29Z

src/test/java/com/puppycrawl/tools/checkstyle/grammar/java21/Java21AstRegressionTest.java

+     * many nested curly braces within an embedded template expression.
+     *
+     * @throws Exception upon failure
+     */
    @Test
    public void testStringTemplateNested() throws Exception {


I've extended this test case to get our nesting level of curly braces even deeper within one "context", and left a comment here about this test case.

nrmancuso · 2024-03-15T17:05:45Z

...-noncompilable/com/puppycrawl/tools/checkstyle/grammar/java21/InputStringTemplateNested.java

                return STR."x\{ sp(() -> {
-                    return STR."x\{ x }x" + "{" + "}}}";
+                    return STR."x\{
+                            sp(() -> {
+        return sp(() -> {
+            return sp(() -> {
+                return sp(() -> {
+                    return sp(() -> "");});});});})
+                            }x" + "{" + "}}}";


The previous test case only went to a nesting level of 1; we needed to go deeper to kill a pitest mutation (and to be more thorough in our testing in general).
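
As a rough illustration of what "nesting level" measures in this input, and without assuming anything about how the lexer itself tracks braces, the hypothetical helper below reports the deepest curly-brace nesting in a snippet of code. It skips ordinary double-quoted string literals and does not understand string templates or character literals, so it is only suitable for expressions like the lambdas in the test input.

```java
/** Reports the deepest curly-brace nesting in a snippet, ignoring string literals. */
final class BraceDepthSketch {

    /** Returns the maximum brace depth reached while scanning {@code code}. */
    static int maxBraceDepth(String code) {
        int depth = 0;
        int max = 0;
        boolean inString = false;
        for (int i = 0; i < code.length(); i++) {
            final char ch = code.charAt(i);
            if (inString) {
                if (ch == '\\') {
                    i++; // skip the escaped character
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                depth++;
                max = Math.max(max, depth);
            } else if (ch == '}') {
                depth--;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxBraceDepth("sp(() -> { return sp(() -> \"\"); })"));
    }
}
```

Braces that appear inside plain string literals, such as the "{" and "}}}" operands at the end of the test line, do not affect the count.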

rnveach · 2024-03-18T14:27:41Z

> This link is not working

IM me if you need it back, @nrmancuso. I took it down since I thought we were done. It is up for now.

...lable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasicWithTabs.txt

romani

only one minor:

romani · 2024-03-21T13:09:31Z

...noncompilable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasic.txt

@@ -81,10 +81,8 @@ COMPILATION_UNIT -> COMPILATION_UNIT [2:0]
        |   |       `--DOT -> . [16:26]
        |   |           |--IDENT -> STR [16:23]
        |   |           `--STRING_TEMPLATE_BEGIN -> " [16:27]
-        |   |               |--STRING_TEMPLATE_CONTENT ->  [16:28]


We need to update https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/api/TokenTypes.html#STRING_TEMPLATE_CONTENT
It has only a single token example.
We need one more compact example to show that STRING_TEMPLATE_CONTENT might not be present.

We are very inconsistent about whether empty nodes are present or absent.
MODIFIERS is one example, but there are more:

VARIABLE_DEF -> VARIABLE_DEF
|--MODIFIERS -> MODIFIERS
|--TYPE -> TYPE
|   `--IDENT -> String
|--IDENT -> s

It is always tricky to deal with.
I am not sure what the right way to build the AST is: 1) always define the node, or 2) skip it if empty.
I am OK with skipping.
If it was not obvious to us, it is better to show it explicitly.

My issue with it was that we call it "content", but it is an imaginary node that has no content and has the same line/column number as its next sibling. This is nonsense.

Example is updated
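
For code that walks the AST, the practical consequence is that STRING_TEMPLATE_CONTENT can no longer be assumed to be the first child of STRING_TEMPLATE_BEGIN. A minimal sketch of a defensive lookup follows; the Node class is a hypothetical stand-in written for this example, not Checkstyle's DetailAST, and the token names are plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows a lookup that tolerates a missing STRING_TEMPLATE_CONTENT child. */
final class OptionalContentSketch {

    /** Hypothetical stand-in for an AST node; not Checkstyle's DetailAST. */
    static final class Node {
        final String type;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String type, String text) {
            this.type = type;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Returns the literal text before the first expression, or "" if there is none. */
    static String leadingText(Node templateBegin) {
        if (templateBegin.children.isEmpty()) {
            return "";
        }
        final Node first = templateBegin.children.get(0);
        // The content node is simply absent when there is no leading text.
        return "STRING_TEMPLATE_CONTENT".equals(first.type) ? first.text : "";
    }
}
```

Treating the absent node as empty text keeps callers working for both shapes of the tree.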

romani

ok to merge.

Please update the PR title to reference the token that will be skipped.
The PR/issue title is used in release notes; users need to easily catch by keywords what changed.

Users do not care about the fact of refactoring.
They might be affected by the missing token beginning with this release.

nrmancuso · 2024-03-22T13:11:27Z

> ok to merge.
>
> Please update the PR title to reference the token that will be skipped. The PR/issue title is used in release notes; users need to easily catch by keywords what changed.
>
> Users do not care about the fact of refactoring. They might be affected by the missing token beginning with this release.

done

nrmancuso force-pushed the rework-string-template branch 2 times, most recently from 2c1f74f to 1c73037 on March 5, 2024 20:53

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 9, 2024

Pull checkstyle#14615: rework AST building/parsing

f242ce8

nrmancuso force-pushed the rework-string-template branch from 1c73037 to f242ce8 on March 9, 2024 13:37

nrmancuso changed the title from "Rework string template" to "Pull checkstyle#14615: rework AST building/parsing" on Mar 9, 2024

nrmancuso requested a review from rnveach March 9, 2024 13:49

nrmancuso assigned rnveach Mar 9, 2024

nrmancuso marked this pull request as ready for review March 9, 2024 13:53

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024

Pull checkstyle#14615: rework AST building/parsing

18b911d

nrmancuso force-pushed the rework-string-template branch from f242ce8 to 18b911d on March 11, 2024 01:51

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024

Pull checkstyle#14615: rework AST building/parsing

d1f35ab

nrmancuso force-pushed the rework-string-template branch from 18b911d to d1f35ab on March 11, 2024 01:53

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024

Pull checkstyle#14615: rework AST building/parsing

610a430

nrmancuso force-pushed the rework-string-template branch from d1f35ab to 610a430 on March 11, 2024 03:53

nrmancuso requested a review from romani March 11, 2024 03:57

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 11, 2024

Pull checkstyle#14615: rework AST building/parsing

be772fb

nrmancuso force-pushed the rework-string-template branch from 610a430 to be772fb on March 11, 2024 04:10

rnveach reviewed Mar 11, 2024

nrmancuso assigned nrmancuso and unassigned rnveach Mar 11, 2024

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 15, 2024

Pull checkstyle#14615: rework AST building/parsing

26887f8

nrmancuso added a commit to nrmancuso/checkstyle that referenced this pull request Mar 15, 2024

Pull checkstyle#14615: rework AST building/parsing

c188312

nrmancuso force-pushed the rework-string-template branch from 26887f8 to c188312 on March 15, 2024 17:01

nrmancuso assigned rnveach and unassigned nrmancuso Mar 15, 2024

rnveach requested changes Mar 18, 2024

rnveach mentioned this pull request Mar 18, 2024

update JavadocTokenTypes.java to new format of AST print #14631

Open

rnveach approved these changes Mar 18, 2024

rnveach assigned romani and unassigned rnveach Mar 18, 2024

romani added the breaking compatibility label Mar 21, 2024

romani requested changes Mar 21, 2024

romani assigned nrmancuso and unassigned romani Mar 21, 2024

Pull checkstyle#14615: rework AST building/parsing

eee0562

nrmancuso force-pushed the rework-string-template branch from c188312 to eee0562 on March 22, 2024 04:03

nrmancuso changed the title from "Pull checkstyle#14615: rework AST building/parsing" to "Pull checkstyle#14615: fix String template AST ordering, drop empty content nodes" on Mar 22, 2024

nrmancuso assigned romani and unassigned nrmancuso Mar 22, 2024

romani approved these changes Mar 22, 2024

romani merged commit 7c50484 into checkstyle:master Mar 22, 2024
113 checks passed

github-actions bot added this to the 10.15.0 milestone Mar 22, 2024

nrmancuso deleted the rework-string-template branch March 22, 2024 13:10

nrmancuso changed the title from "Pull checkstyle#14615: fix String template AST ordering, drop empty content nodes" to "Pull checkstyle#14615: fix String template AST ordering and remove empty STRING_TEMPLATE_CONTENT nodes" on Mar 22, 2024

nrmancuso changed the title from "Pull checkstyle#14615: fix String template AST ordering and remove empty STRING_TEMPLATE_CONTENT nodes" to "Pull checkstyle#14615: fix ordering of AST under STRING_TEMPLATE_BEGIN and remove empty STRING_TEMPLATE_CONTENT nodes" on Mar 22, 2024

romani added the approved label Mar 22, 2024
