New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reevaluate tokens in IllegalTokenText #3729

Closed
rnveach opened this Issue Jan 15, 2017 · 6 comments

Comments

Projects
None yet
3 participants
@rnveach
Member

rnveach commented Jan 15, 2017

Taken from PR #3723 ,
When reviewing the tokens of IllegalTokenText, it was noticed that some tokens have no text that users can change, they are constants. It doesn't make sense to leave these tokens for the purpose of this check. If user wants to disallow a token, they should IllegalToken instead.

We need to identify these constant value tokens and remove them from the check.
We should try to make a UT for this for when any new tokens are added.

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jan 28, 2017

Member

There doesn't seem to be a way to identify token's with constant text from dynamic text unless we parse the grammar or generated java file.

Grammar: #(#[ANNOTATION,"ANNOTATION"],
Generate Java: astFactory.create(ANNOTATIONS,"ANNOTATIONS")

Member

rnveach commented Jan 28, 2017

There doesn't seem to be a way to identify token's with constant text from dynamic text unless we parse the grammar or generated java file.

Grammar: #(#[ANNOTATION,"ANNOTATION"],
Generate Java: astFactory.create(ANNOTATIONS,"ANNOTATIONS")

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Feb 15, 2017

Member

@Vladlis , can we reuse checkstyle/target/generated-sources/antlr/com/puppycrawl/tools/checkstyle/grammars/GeneratedJavaTokenTypes.txt file for this validation ?
or reuse logic that generate this file and avoid dependencies to this file and do all in memory.

related files:
https://github.com/checkstyle/checkstyle/blob/master/src/main/java/com/puppycrawl/tools/checkstyle/doclets/TokenTypesDoclet.java
checkstyle/target/classes/com/puppycrawl/tools/checkstyle/api/tokentypes.properties

Member

romani commented Feb 15, 2017

@Vladlis , can we reuse checkstyle/target/generated-sources/antlr/com/puppycrawl/tools/checkstyle/grammars/GeneratedJavaTokenTypes.txt file for this validation ?
or reuse logic that generate this file and avoid dependencies to this file and do all in memory.

related files:
https://github.com/checkstyle/checkstyle/blob/master/src/main/java/com/puppycrawl/tools/checkstyle/doclets/TokenTypesDoclet.java
checkstyle/target/classes/com/puppycrawl/tools/checkstyle/api/tokentypes.properties

@Vladlis

This comment has been minimized.

Show comment
Hide comment
@Vladlis

Vladlis Feb 15, 2017

Member

@romani , I don't see how these files can be useful
@rnveach , parsing might be too hard, try to find out what is the text of CTOR_CALL by parsing for example

I suppose we should try another @romani's idea: make a java file with all the tokens inside and see there whether names match texts taking into account a hardcoded list of exclusions (like text ; for SEMI).

Member

Vladlis commented Feb 15, 2017

@romani , I don't see how these files can be useful
@rnveach , parsing might be too hard, try to find out what is the text of CTOR_CALL by parsing for example

I suppose we should try another @romani's idea: make a java file with all the tokens inside and see there whether names match texts taking into account a hardcoded list of exclusions (like text ; for SEMI).

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Feb 15, 2017

Member

make a java file with all the tokens inside

This would work, but would require this file to have all tokens incase we update the grammar in the future. Even with all tokens, I'm not sure if we would need different ways of creating the tree in-case we use the same token differently elsewhere. The grammar file and it's outputted java file would have this information, but it may not be future proof if we move to antlr4.

All the files checkstyle\src\test\resources\com\puppycrawl\tools\checkstyle\grammars\InputRegressionJava* were made for this purpose. You could just reuse them by either re-running the Java tree or reading their printed parse tree.

taking into account a hardcoded list of exclusions

Doesn't that defeat the purpose of trying to make a list dynamically instead of hardcoding said list?
If we find a token with a value that is not the same as the token name, save it to a list and never look at that token again.

Member

rnveach commented Feb 15, 2017

make a java file with all the tokens inside

This would work, but would require this file to have all tokens incase we update the grammar in the future. Even with all tokens, I'm not sure if we would need different ways of creating the tree in-case we use the same token differently elsewhere. The grammar file and it's outputted java file would have this information, but it may not be future proof if we move to antlr4.

All the files checkstyle\src\test\resources\com\puppycrawl\tools\checkstyle\grammars\InputRegressionJava* were made for this purpose. You could just reuse them by either re-running the Java tree or reading their printed parse tree.

taking into account a hardcoded list of exclusions

Doesn't that defeat the purpose of trying to make a list dynamically instead of hardcoding said list?
If we find a token with a value that is not the same as the token name, save it to a list and never look at that token again.

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Feb 15, 2017

Member

I did review of issue one more time.

I do not see good dynamic model of verification.
To resolve this issue I recommend to:

  • make special UT in this check to control amount of Tokens to fail each time we will add new token.
  • in UT we should make hardcoded list of tokens that will not have text from source/input file.
  • to minimize hardcoded list, we can skip all tokens that have "LITERAL" in name (GeneratedJavaTokenTypes.java could help)
Member

romani commented Feb 15, 2017

I did review of issue one more time.

I do not see good dynamic model of verification.
To resolve this issue I recommend to:

  • make special UT in this check to control amount of Tokens to fail each time we will add new token.
  • in UT we should make hardcoded list of tokens that will not have text from source/input file.
  • to minimize hardcoded list, we can skip all tokens that have "LITERAL" in name (GeneratedJavaTokenTypes.java could help)

Vladlis added a commit to Vladlis/checkstyle that referenced this issue Feb 16, 2017

Vladlis added a commit to Vladlis/checkstyle that referenced this issue Feb 17, 2017

Vladlis added a commit to Vladlis/checkstyle that referenced this issue Feb 17, 2017

Vladlis added a commit to Vladlis/checkstyle that referenced this issue Feb 17, 2017

rnveach added a commit that referenced this issue Feb 18, 2017

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Feb 18, 2017

Member

Fix is merged

Member

rnveach commented Feb 18, 2017

Fix is merged

@rnveach rnveach closed this Feb 18, 2017

@rnveach rnveach added this to the 7.6 milestone Feb 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment