New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pull checkstyle#14615: fix ordering of AST under STRING_TEMPLATE_BEGIN and remove empty STRING_TEMPLATE_CONTENT nodes #14615
Conversation
2c1f74f
to
1c73037
Compare
1c73037
to
f242ce8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments:
final TerminalNode context = ctx.STRING_TEMPLATE_BEGIN(); | ||
final Token token = context.getSymbol(); | ||
final String tokenText = context.getText(); | ||
final int tokenStartIndex = token.getCharPositionInLine(); | ||
final int tokenLineNumber = token.getLine(); | ||
final int tokenTextLength = tokenText.length(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was crazy, glad to see this go.
@@ -254,11 +254,12 @@ LITERAL_FALSE: 'false'; | |||
|
|||
CHAR_LITERAL: '\'' (EscapeSequence | ~['\\\r\n]) '\''; | |||
|
|||
fragment StringFragment: (EscapeSequence | ~["\\\r\n])*; | |||
fragment StringFragment: (EscapeSequence | ~["\\\r\n]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to be a singular fragment and allow us to define the number of these fragments in the rule where this fragment is used.
|
||
STRING_TEMPLATE_BEGIN: '"' StringFragment '\\' '{' | ||
STRING_TEMPLATE_BEGIN: '"' | ||
{ _input.LA(1) != '"' }? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra predicate here that will be helpful in the future when we start parsing text block templates (but is also good for performance now)
@@ -469,10 +470,8 @@ mode TextBlock; | |||
|
|||
mode StringTemplate; | |||
|
|||
STRING_TEMPLATE_MID: StringFragment '\\' '{' | |||
-> pushMode(DEFAULT_MODE), type(STRING_TEMPLATE_MID); | |||
STRING_TEMPLATE_CONTENT: StringFragment+; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty "content" is nonsense, producing this token for static analysis purposes is worthless.
stringTemplateBegin | ||
: STRING_TEMPLATE_BEGIN | ||
STRING_TEMPLATE_CONTENT? | ||
EMBEDDED_EXPRESSION_BEGIN | ||
; | ||
|
||
stringTemplateMid | ||
: EMBEDDED_EXPRESSION_END | ||
STRING_TEMPLATE_CONTENT? | ||
EMBEDDED_EXPRESSION_BEGIN | ||
; | ||
|
||
stringTemplateEnd | ||
: EMBEDDED_EXPRESSION_END | ||
STRING_TEMPLATE_CONTENT? | ||
STRING_TEMPLATE_END | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These parser rules are now more reflective of what's in the JLS: https://docs.oracle.com/javase/specs/jls/se21/preview/specs/string-templates-jls.html#:~:text=template%20(3.13).-,3.13%20Fragments,-A%20template%20(
| | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [62:21] | ||
| | |--EMBEDDED_EXPRESSION_END -> } [62:25] | ||
| | |--STRING_TEMPLATE_CONTENT -> [62:26] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These needed to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #14615 (comment)
| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33] | ||
| | | `--IDENT -> y [64:33] | ||
| | |--STRING_TEMPLATE_CONTENT -> . [64:29] | ||
| | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:30] | ||
| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:33] | ||
| | | `--IDENT -> y [64:33] | ||
| | |--EMBEDDED_EXPRESSION_END -> } [64:35] | ||
| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40] | ||
| | | `--IDENT -> x [64:40] | ||
| | |--STRING_TEMPLATE_CONTENT -> . [64:36] | ||
| | |--EMBEDDED_EXPRESSION_BEGIN -> \{ [64:37] | ||
| | |--EMBEDDED_EXPRESSION -> EMBEDDED_EXPRESSION [64:40] | ||
| | | `--IDENT -> x [64:40] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you can see, the old implementation caused us to have a bunch of out of order elements in the AST
@rnveach let's proceed with review on the final commit only so that we can get this merged and unblock #14390 ASAP I will rebase this PR after we merge the first commit in #14568 Report generation: https://github.com/nrmancuso/checkstyle-diff-report-generator/actions/runs/8215006015 Edit: #14390 is merged Projects file: projects-to-test-on.properties Reports looks good, all differences are expected |
f242ce8
to
18b911d
Compare
18b911d
to
d1f35ab
Compare
d1f35ab
to
610a430
Compare
610a430
to
be772fb
Compare
...lable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasicWithTabs.txt
Show resolved
Hide resolved
@nrmancuso Regression is good we ran it all. But have we done any time regression for this PR? :) Also CI is failing. |
https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A35 Does this mean your fix is finding other issues we don't know about? I don't remember seeing a test case for something like this. https://checkstyle-reports.s3.us-east-2.amazonaws.com/reports/rework-string-template/2024-03-09-T-16-30-11/antlr-report-checkstyle/checkstyle/index.html#A111 Also what is your opinion of running antlr regression against a later openjdk project (is this 21 syntax?) which might have more of these type of code and more? |
I will add a test like this to make sure we don't have any regressions.
This one is captured in the AST tests, so we are covered there.
This is difficult, we usually OOM in Github actions on OpenJDK sadly. I have decommissioned my dedicated report generation box in favor of Github actions. I would like to find some time to make this work in CI somehow. |
Assigning back to me to extend test cases and make CI happy |
OpenJDK 21 ANTLR Regression: https://rveach.no-ip.org/checkstyle/regression/342/ |
This link is not working |
26887f8
to
c188312
Compare
@rnveach test is added
No, I do not have an easy, reliable way to do this yet, but OpenJDK no-exception CI jobs are a good indicator of this, and I am keeping an eye on them. My plan is to get all the new language features for Java21 in (with manual verification of performance), then focus on performance regression testing. |
* many nested curly braces within an embedded template expression. | ||
* | ||
* @throws Exception upon failure | ||
*/ | ||
@Test | ||
public void testStringTemplateNested() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've extended this test case to get our nesting level of curly braces even deeper within one "context", and left a comment here about this test case.
return STR."x\{ sp(() -> { | ||
return STR."x\{ x }x" + "{" + "}}}"; | ||
return STR."x\{ | ||
sp(() -> { | ||
return sp(() -> { | ||
return sp(() -> { | ||
return sp(() -> { | ||
return sp(() -> "");});});});}) | ||
}x" + "{" + "}}}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous test case only went to a nesting level of 1, we needed to go higher to kill a pitest mutation (and be more thorough in our testing in general).
IM me if you need it back @nrmancuso . I took it down since I thought we were done. It is up for now. |
...lable/com/puppycrawl/tools/checkstyle/grammar/java21/ExpectedStringTemplateBasicWithTabs.txt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only one minor:
@@ -81,10 +81,8 @@ COMPILATION_UNIT -> COMPILATION_UNIT [2:0] | |||
| | `--DOT -> . [16:26] | |||
| | |--IDENT -> STR [16:23] | |||
| | `--STRING_TEMPLATE_BEGIN -> " [16:27] | |||
| | |--STRING_TEMPLATE_CONTENT -> [16:28] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to update https://checkstyle.sourceforge.io/apidocs/com/puppycrawl/tools/checkstyle/api/TokenTypes.html#STRING_TEMPLATE_CONTENT
only single token example.
we need one more compact example, to show that STRING_TEMPLATE_CONTENT might not be present.
We are very inconsistent on presence and not presence.
example is MODIFIERS
but there are more:
VARIABLE_DEF -> VARIABLE_DEF
|--MODIFIERS -> MODIFIERS
|--TYPE -> TYPE
| `--IDENT -> String
|--IDENT -> s
it is always tricky to deal with it.
I am not sure what is right way to make AST: 1) always define 2) skip if empty.
I am ok to skip.
If it was not obvious for us, better to explicitly show it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My issue with it was that we call it "content" but it is an imaginary node that has no content, with the same line/column number that it's next sibling does. This is nonsense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example is updated
c188312
to
eee0562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to merge.
please update PR title to reference token that will be skipped.
PR/issue titile is used in release notes, users need to easily catch by keywords what changed.
fact of refactoring, user doesnot care.
They might be affected by missing token begining from this release.
done |
The current AST building was buggy and unmaintainable, manually setting line numbers was crazy. Now that we have the context cache, we can readily break the string template tokens into smaller tokens to do much less work (reliably) when we build the AST.
STRING_TEMPLATE_CONTENT
nodes.