New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Javadoc comments containing unescaped Java code with generic types leads to enormous parsing times #4390

Closed
blerner opened this Issue May 30, 2017 · 8 comments

Comments

@blerner

blerner commented May 30, 2017

(Split off from #3332 as requested)

The following code leads to enormous parsing times. It is derived from an actual student-written homework, where the student had commented out a @Test method because it wasn't working...but used /** ... **/ instead of /* ... */. Since we use checkstyle as a component of our auto-grading setup, we need the checks to run automatically and quickly (and not DDoS other students by tying up the server resources).

class DontCare {
/**
  @Test
  public void someTestMethod() {
    List<Foo> x = makeAList();
    List<Foo> x = makeAList();
    ...add more copies of this line to make the runtime worse...
    feefiefoefum
    feefiefoefum
    ...and add more copies of this line, too.
  }
**/
}

The problem stems from the Javadoc parser seeing <Foo> and thinking it's an HTML tag, rather than Java code (because it's not marked off by {@code ... }), and therefore looking for the matching close tag. My hunch is there's something in the parser that's forcing a backtracking search, rather than a more efficient table-driven search, and triggering pathological ANTLR behavior.

As far as I can tell:

  • You need at least two List<Foo> lines before the parser performance chokes.
  • With ~20 lines of "feefiefoefum", each additional List<Foo> line adds ~2 seconds to the parse time.
  • With more lines of "feefiefoefum", the runtime grows accordingly.
  • With more List<Foo> lines, the runtime seems to grow superlinearly, though I couldn't tell exactly how badly.
  • If I replace every List<Foo> with List?Foo?, so that the text is identical in length but contains nothing that looks like unclosed HTML tags, the runtime is ~ 1sec (including the startup time of my surrounding application, so, basically instantly).
  • Interleaving List<Foo> lines with List?Foo? lines doesn't much seem to matter: it seems like runtime is growing as a function of the number of List<Foo> lines and the total length of the comment, not the position of the what-appear-to-be-unclosed-tags within the comment.

Per #3332 (comment), this might be related to #3727 -- but unlike that issue, which tried running Javadoc checkstyle rules on a purely .java file and so could be plausibly ruled out as user error, this one is asking to run Javadoc checkstyle rules on a Java file that happens to have a Javadoc comment with egregiously weird contents. (I'm happy to call this one student error, too...but it's student error that means I can't use checkstyle any more.)

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach May 30, 2017

Member

This issue should be looked at in as part of GSoC's "Javadoc style coverage and parser optimization".

Member

rnveach commented May 30, 2017

This issue should be looked at in as part of GSoC's "Javadoc style coverage and parser optimization".

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 1, 2017

Member

We need somehow skip parsing of javadoc comments that are located in side methods at ALL - they are not a goal of Javadoc Checks.

We should be able to detect this easily by AST analysys before running javadoc parser.

@ps-sp , this issue should be ease to make.

Member

romani commented Jun 1, 2017

We need somehow skip parsing of javadoc comments that are located in side methods at ALL - they are not a goal of Javadoc Checks.

We should be able to detect this easily by AST analysys before running javadoc parser.

@ps-sp , this issue should be ease to make.

@romani romani added the approved label Jun 1, 2017

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 1, 2017

Member

We need somehow skip parsing of javadoc comments that are located in side methods at ALL

@romani Example code is not javadoc inside method, it is inside class.
Also, I believe we ignore javadocs in methods via

private static boolean isCorrectJavadocPosition(DetailAST blockComment) {
return BlockCommentPosition.isOnClass(blockComment)
|| BlockCommentPosition.isOnInterface(blockComment)
|| BlockCommentPosition.isOnEnum(blockComment)
|| BlockCommentPosition.isOnMethod(blockComment)
|| BlockCommentPosition.isOnField(blockComment)
|| BlockCommentPosition.isOnConstructor(blockComment)
|| BlockCommentPosition.isOnEnumConstant(blockComment)
|| BlockCommentPosition.isOnAnnotationDef(blockComment);
}
. I cannot produce an exception with bad javadoc inside method.

$ cat TestClass.java
public class TestClass {
    void method() {
/** <htm */
    }
}

$ cat TestConfig.xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<module name="Checker">
    <property name="charset" value="UTF-8"/>

    <module name="TreeWalker">
    <module name="AtclauseOrder"/>
    <module name="JavadocParagraph"/>
    <module name="JavadocTagContinuationIndentation"/>
    <module name="NonEmptyAtclauseDescription"/>
    <module name="SingleLineJavadoc"/>
    <module name="SummaryJavadoc"/>
    </module>
</module>

$ java -jar checkstyle-7.8-all.jar -c TestConfig.xml TestClass.java
Starting audit...
Audit done.
Member

rnveach commented Jun 1, 2017

We need somehow skip parsing of javadoc comments that are located in side methods at ALL

@romani Example code is not javadoc inside method, it is inside class.
Also, I believe we ignore javadocs in methods via

private static boolean isCorrectJavadocPosition(DetailAST blockComment) {
return BlockCommentPosition.isOnClass(blockComment)
|| BlockCommentPosition.isOnInterface(blockComment)
|| BlockCommentPosition.isOnEnum(blockComment)
|| BlockCommentPosition.isOnMethod(blockComment)
|| BlockCommentPosition.isOnField(blockComment)
|| BlockCommentPosition.isOnConstructor(blockComment)
|| BlockCommentPosition.isOnEnumConstant(blockComment)
|| BlockCommentPosition.isOnAnnotationDef(blockComment);
}
. I cannot produce an exception with bad javadoc inside method.

$ cat TestClass.java
public class TestClass {
    void method() {
/** <htm */
    }
}

$ cat TestConfig.xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
          "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
          "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">

<module name="Checker">
    <property name="charset" value="UTF-8"/>

    <module name="TreeWalker">
    <module name="AtclauseOrder"/>
    <module name="JavadocParagraph"/>
    <module name="JavadocTagContinuationIndentation"/>
    <module name="NonEmptyAtclauseDescription"/>
    <module name="SingleLineJavadoc"/>
    <module name="SummaryJavadoc"/>
    </module>
</module>

$ java -jar checkstyle-7.8-all.jar -c TestConfig.xml TestClass.java
Starting audit...
Audit done.
@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 1, 2017

Member

yes, my bad.

Ok, this is just another performance issue for javadoc parser.
Not easy. Work is in progress in terms of GSoC.

Member

romani commented Jun 1, 2017

yes, my bad.

Ok, this is just another performance issue for javadoc parser.
Not easy. Work is in progress in terms of GSoC.

@ps-sp

This comment has been minimized.

Show comment
Hide comment
@ps-sp

ps-sp Jun 25, 2017

Collaborator

@rnveach @romani @Vladlis The issue lies with this alternative of the htmlTag rule. Taking that alternative out results in this error: [ERROR:36] Javadoc comment at column 1 has parse error. Details: no viable alternative at input '<EOF>' while parsing HTML_TAG but there is no lag while parsing. I suggest that we remove that alternative and in situations when there is a successful parse even when corresponding htmlElementClose is missing (though I don't think it is possible ; since once the parser matches htmlTag it must match the subrule htmlElementClose, though perhaps there can be potential mismatches, not sure) we can post process the parse tree and generate an error if for a given html tag a close must be present. And if it is not a must for the close to be present we can set the boolean hasUnclosedTag which relates to #3311 .

Collaborator

ps-sp commented Jun 25, 2017

@rnveach @romani @Vladlis The issue lies with this alternative of the htmlTag rule. Taking that alternative out results in this error: [ERROR:36] Javadoc comment at column 1 has parse error. Details: no viable alternative at input '<EOF>' while parsing HTML_TAG but there is no lag while parsing. I suggest that we remove that alternative and in situations when there is a successful parse even when corresponding htmlElementClose is missing (though I don't think it is possible ; since once the parser matches htmlTag it must match the subrule htmlElementClose, though perhaps there can be potential mismatches, not sure) we can post process the parse tree and generate an error if for a given html tag a close must be present. And if it is not a must for the close to be present we can set the boolean hasUnclosedTag which relates to #3311 .

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 25, 2017

Member

The issue lies with this alternative of the htmlTag rule

I agree that this issue lies with the | of htmlTag from little testing I did before.

and in situations when there is a successful parse even when corresponding htmlElementClose is missing ... we can post process the parse tree and generate an error if for a given html tag a close must be present

I'm not sure what you are fully suggesting, but Details: no viable alternative at input '<EOF>' while parsing HTML_TAG is not as informative as the original message Missed HTML close tag 'XXX'. Sometimes it means that close tag missed for one of previous tags.. So we must try to keep original message as best as we can.

Since the 2 or groups are so similar, can't we remove | at

and almost everything that comes after it, and instead wrap the notifyErrorListeners around an | on htmlElementClose?
Example: ...)* (htmlElementClose | notifyErrorListeners(...))

though I don't think it is possible
And if it is not a must for the close to be present we can set the boolean hasUnclosedTag

No one knows this ANTLR very well. You will have to test and see how it is working to understand if something is affected or not.
You can use antlr regression to verify any changes.

Member

rnveach commented Jun 25, 2017

The issue lies with this alternative of the htmlTag rule

I agree that this issue lies with the | of htmlTag from little testing I did before.

and in situations when there is a successful parse even when corresponding htmlElementClose is missing ... we can post process the parse tree and generate an error if for a given html tag a close must be present

I'm not sure what you are fully suggesting, but Details: no viable alternative at input '<EOF>' while parsing HTML_TAG is not as informative as the original message Missed HTML close tag 'XXX'. Sometimes it means that close tag missed for one of previous tags.. So we must try to keep original message as best as we can.

Since the 2 or groups are so similar, can't we remove | at

and almost everything that comes after it, and instead wrap the notifyErrorListeners around an | on htmlElementClose?
Example: ...)* (htmlElementClose | notifyErrorListeners(...))

though I don't think it is possible
And if it is not a must for the close to be present we can set the boolean hasUnclosedTag

No one knows this ANTLR very well. You will have to test and see how it is working to understand if something is affected or not.
You can use antlr regression to verify any changes.

@ps-sp

This comment has been minimized.

Show comment
Hide comment
@ps-sp

ps-sp Jun 26, 2017

Collaborator

@rnveach

I'm not sure what you are fully suggesting

Basically, I was saying that we stop generating our customized error and just use tha ANTLR error. Further, I was asserting that even with the problematic alternative deleted, for all cases, there should be a parse error where there was a parse error previously.

So we must try to keep original message as best as we can.

Ok. I will see what else can be done.

Since the 2 or groups are so similar, can't we remove | at...

(htmlElementClose | {notifyErrorListeners($htmlElementOpen.ctx.getToken(HTML_TAG_NAME, 0).getSymbol() , "javadoc.missed.html.close", null);}) doesn't help improve the time.

I haven't been able to figure out how adaptivePredict works but certain calls to that method are the cause of delays in parser.

Collaborator

ps-sp commented Jun 26, 2017

@rnveach

I'm not sure what you are fully suggesting

Basically, I was saying that we stop generating our customized error and just use tha ANTLR error. Further, I was asserting that even with the problematic alternative deleted, for all cases, there should be a parse error where there was a parse error previously.

So we must try to keep original message as best as we can.

Ok. I will see what else can be done.

Since the 2 or groups are so similar, can't we remove | at...

(htmlElementClose | {notifyErrorListeners($htmlElementOpen.ctx.getToken(HTML_TAG_NAME, 0).getSymbol() , "javadoc.missed.html.close", null);}) doesn't help improve the time.

I haven't been able to figure out how adaptivePredict works but certain calls to that method are the cause of delays in parser.

@Vladlis Vladlis moved this from To Do to In Progress in Javadoc style coverage and parser optimization Jun 27, 2017

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 4, 2017

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 6, 2017

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 7, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 7, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

@Vladlis Vladlis moved this from In Progress to Review in Javadoc style coverage and parser optimization Jul 9, 2017

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 9, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 9, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 10, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 12, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 12, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 12, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 12, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 13, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 13, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 19, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 19, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 20, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

@Vladlis Vladlis moved this from Blocked to In Progress in Javadoc style coverage and parser optimization Jul 28, 2017

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Jul 31, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Aug 5, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Aug 5, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Aug 5, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

ps-sp added a commit to ps-sp/checkstyle that referenced this issue Aug 6, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

@Vladlis Vladlis moved this from In Progress to Review in Javadoc style coverage and parser optimization Aug 8, 2017

@romani romani added the bug label Aug 21, 2017

romani added a commit that referenced this issue Aug 21, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc

@romani romani added this to the 8.2 milestone Aug 21, 2017

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Aug 21, 2017

Member

fix is merged after discussion with Vladlis.

Member

romani commented Aug 21, 2017

fix is merged after discussion with Vladlis.

@romani romani closed this Aug 21, 2017

ArneLimburg pushed a commit to ArneLimburg/checkstyle that referenced this issue Aug 23, 2017

Issue #4390: Modified error handling of errors that are to be generat…
…ed when missing HTML tags are encountered while parsing javadoc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment