New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design a new format of suppression dtd schema to support XPath queries #4364

Closed
MEZk opened this Issue May 20, 2017 · 19 comments

Comments

4 participants
@MEZk
Contributor

MEZk commented May 20, 2017

From my and @timurt point of view, XPath for queering AST nodes will be a good idea for the new suppression model, because:

  1. XPath can be used for queering AST tree nodes in a simple and transparent way;
  2. There are two ready to use open-source libraries which allow to work with XPath to query AST nodes easily (Jaxen and Saxon). Both libraries are used by PMD, so we can adopt their experience. No need to reinvent the wheel;
  3. XPath is a good solution for xml-format as other tools such as PMD and Sonarqube already support it, and it will be easy for the programmer to write suppression rules for the Checkstyle’s violations if the format of the suppressions will be similar.
  4. User who is already familiar with AST tokens we use in the checks, can have the base understanding of tree structure which will help him to write XPath expression.

Based on the arguments, mentioned above, we decided that we need to use XPath in the new model.

What should be done in the scope of this issue:

  1. The decision on whether to extend the existing dtd schema or create a new one specially for the new suppression model should be made;
  2. In case we decide that we need to extend the existing schema, the new format of the suppression element need to be proposed, otherwise the new format of dtd schema should be introduced.

My proposal:

  1. We need to extend the existing dtd schema to avoid breaking of backward compatibility. When we introduce the new suppression model nobody should notice the problem. The projects which use the old line-based approach should have an ability to continue suppressing Checkstyle violations by lines.
  2. In his GSoC 2017 proposal @timurt suggested adding a new attribute for the existing suppression element:
<suppress-xpath
  checks="FileLength"
  files="TokenTypes.java"
  query="xpath expression here”
/>

I think that this is a good idea, but from my point of view, "xpath" would be a better name for the attribute. Such approach will allow us to support two models in the same file. User will not need to rewrite the whole suppression file from scratch.

Other solution is to add a new attribute to the existing "suppress" element:

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>

In the second approach the influence on dtd schema will be minimal.

Warning
In this issue we do not discuss all details of the implementation. We just discuss the new format of the dtd schema which will be a user-friendly and will not break backward compatibility.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk May 20, 2017

Contributor

@romani, @rnveach, @Vladlis, @sabaka, @timurt, please, join the discussion.

Contributor

MEZk commented May 20, 2017

@romani, @rnveach, @Vladlis, @sabaka, @timurt, please, join the discussion.

@MEZk MEZk changed the title from Introduce new format of suppression dtd schema to support XPath queries to Design a new format of suppression dtd schema to support XPath queries May 20, 2017

@MEZk MEZk assigned MEZk and unassigned MEZk May 20, 2017

@MEZk MEZk moved this from To Do to In Progress in Flexible Suppression Model May 20, 2017

@MEZk MEZk added the question label May 22, 2017

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach May 26, 2017

Member

Just my opinion,

User who is already familiar with AST tokens we use in the checks, can have the base understanding of tree structure which will help him to write XPath expression.

These type of users will be rare outside our own development team. Some people probably have delved into writing their own checks, but I expect most don't.
Some of our AST tree doesn't make sense, even to us, and we have open issues to fix them.
Also, any changes to our grammar has the possibility to break xpath suppression especially if we move to Antlr4 for Java.

extend the existing dtd schema or create a new one
Other solution is to add a new attribute to the existing "suppress" element

I like this solution the best.

the new format of the suppression element
<suppression ... xpath="xpath expression here”

Again, I like this solution.

Such approach will allow us to support two models in the same file.

I don't see it as 2 separate models, just that we are adding a new feature onto the existing model.
For example, I see no reason you can't use xpath and files together. I assume only file's AST will be part of the xpath and not external things like file/folder name.

Member

rnveach commented May 26, 2017

Just my opinion,

User who is already familiar with AST tokens we use in the checks, can have the base understanding of tree structure which will help him to write XPath expression.

These type of users will be rare outside our own development team. Some people probably have delved into writing their own checks, but I expect most don't.
Some of our AST tree doesn't make sense, even to us, and we have open issues to fix them.
Also, any changes to our grammar has the possibility to break xpath suppression especially if we move to Antlr4 for Java.

extend the existing dtd schema or create a new one
Other solution is to add a new attribute to the existing "suppress" element

I like this solution the best.

the new format of the suppression element
<suppression ... xpath="xpath expression here”

Again, I like this solution.

Such approach will allow us to support two models in the same file.

I don't see it as 2 separate models, just that we are adding a new feature onto the existing model.
For example, I see no reason you can't use xpath and files together. I assume only file's AST will be part of the xpath and not external things like file/folder name.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk May 28, 2017

Contributor

@rnveach
Thanks for the answer.

Also, any changes to our grammar has the possibility to break xpath suppression especially if we move to Antlr4 for Java.

The original GSoC task says that

We need to invent new Suppression model that is flexible to code changes. This will allow user to have legacy code and new code in the same file and not suffer from updating suppression configuration on any code changes. Unlike of suppression that is based on lines' numbers in file, new model should be based on [Abstract Syntax Tree (AST) ](http://en.wikipedia.org /wiki/Abstract_syntax_tree) structure.

Thus, the new model should be based on the AST structure. I could not find any better ways to refer AST structure, except for XPath. We do not change token types names very often. Thus, from my point of view, we can rely on their names in the xpath expression. I thought about other solutions, for example, in the issue #2136 it is requested to add the new attribute "methods" to suppress methods with the specified names. This approach is not good as it will require to support many arguments to suppress violations for classes, methods, variables, and other language units. In the issue #2804 you suggest suppressing by message. However, it will require writing of regular expressions and sometimes the expression may become bigger and bigger, for example in import control configuration.

These type of users will be rare outside our own development team. Some people probably have delved into writing their own checks, but I expect most don't.

That is why it is required to create suppression file generator to ease suppression configuration creation for legacy code. I will increase the priority of this task.

Contributor

MEZk commented May 28, 2017

@rnveach
Thanks for the answer.

Also, any changes to our grammar has the possibility to break xpath suppression especially if we move to Antlr4 for Java.

The original GSoC task says that

We need to invent new Suppression model that is flexible to code changes. This will allow user to have legacy code and new code in the same file and not suffer from updating suppression configuration on any code changes. Unlike of suppression that is based on lines' numbers in file, new model should be based on [Abstract Syntax Tree (AST) ](http://en.wikipedia.org /wiki/Abstract_syntax_tree) structure.

Thus, the new model should be based on the AST structure. I could not find any better ways to refer AST structure, except for XPath. We do not change token types names very often. Thus, from my point of view, we can rely on their names in the xpath expression. I thought about other solutions, for example, in the issue #2136 it is requested to add the new attribute "methods" to suppress methods with the specified names. This approach is not good as it will require to support many arguments to suppress violations for classes, methods, variables, and other language units. In the issue #2804 you suggest suppressing by message. However, it will require writing of regular expressions and sometimes the expression may become bigger and bigger, for example in import control configuration.

These type of users will be rare outside our own development team. Some people probably have delved into writing their own checks, but I expect most don't.

That is why it is required to create suppression file generator to ease suppression configuration creation for legacy code. I will increase the priority of this task.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk May 28, 2017

Contributor

@romani
Are you OK with this format:

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>
Contributor

MEZk commented May 28, 2017

@romani
Are you OK with this format:

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>
@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach May 28, 2017

Member

We do not change token types names very often.

@MEZk I was referring to changing the AST structure, not names. As an example, we have 1 issue opened for variable declarations that are wrong which may be a common suppression area. #3151

I was not specifically saying we shouldn't do this xpath suppression with AST, I was just stating one problem that will arise with this and break compatibility between releases.

Member

rnveach commented May 28, 2017

We do not change token types names very often.

@MEZk I was referring to changing the AST structure, not names. As an example, we have 1 issue opened for variable declarations that are wrong which may be a common suppression area. #3151

I was not specifically saying we shouldn't do this xpath suppression with AST, I was just stating one problem that will arise with this and break compatibility between releases.

@timurt

This comment has been minimized.

Show comment
Hide comment
@timurt

timurt May 31, 2017

Collaborator

@MEZk @rnveach
I just wanted to clarify,

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>

Here lines and xpath attributes have not effect on each other? it is like two suppression models in one?

Collaborator

timurt commented May 31, 2017

@MEZk @rnveach
I just wanted to clarify,

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>

Here lines and xpath attributes have not effect on each other? it is like two suppression models in one?

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 1, 2017

Contributor

Here lines and xpath attributes have not effect on each other? it is like two suppression models in one?

No, as they should be handled by different filters. We will apply all filters which are in filter chain. If at least one filter denies audit event, than there is no need in filtering by others.

Contributor

MEZk commented Jun 1, 2017

Here lines and xpath attributes have not effect on each other? it is like two suppression models in one?

No, as they should be handled by different filters. We will apply all filters which are in filter chain. If at least one filter denies audit event, than there is no need in filtering by others.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 1, 2017

Contributor

@romani
ping

Contributor

MEZk commented Jun 1, 2017

@romani
ping

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 1, 2017

Member

@MEZk
if

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>

is handled the same as

<suppression
   xpath="xpath expression here”
/>
<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
/>

then the combined suppression seems misleading to me as I expected all attributes to be combined together, the same way files and lines does now.
This is the way I understood the proposed model when we specified they would be all together.

If you are saying that even though they are combined they are handled as separately, then I see no reason to keep them together, even though I would like that way better.
It hasn't been specified what are the limits of what we can query on for xpath. I assumed file name would not be possible in xpath since it is only based on the Java tree. So how would I be able to specify a specific xpath for a specific file to avoid conflicts over many files?

Member

rnveach commented Jun 1, 2017

@MEZk
if

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
   xpath="xpath expression here”
/>

is handled the same as

<suppression
   xpath="xpath expression here”
/>
<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
/>

then the combined suppression seems misleading to me as I expected all attributes to be combined together, the same way files and lines does now.
This is the way I understood the proposed model when we specified they would be all together.

If you are saying that even though they are combined they are handled as separately, then I see no reason to keep them together, even though I would like that way better.
It hasn't been specified what are the limits of what we can query on for xpath. I assumed file name would not be possible in xpath since it is only based on the Java tree. So how would I be able to specify a specific xpath for a specific file to avoid conflicts over many files?

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 1, 2017

Contributor

@rnveach

then the combined suppression seems misleading to me as I expected all attributes to be combined together, the same way files and lines does now.
This is the way I understood the proposed model when we specified they would be all together.

Lines and Xpath should never be combined. Xpath model is opposite to line-based. Maybe you are right and it is better to add a new element to the schema (suppression-xpath) to make it clear to the user that we do not support mixing of line-based suppressions with xpath. The schema should validate that suppression-xpath element does not have lines attribute.

It hasn't been specified what are the limits of what we can query on for xpath. I assumed file name would not be possible in xpath since it is only based on the Java tree. So how would I be able to specify a specific xpath for a specific file to avoid conflicts over many files?

There can be 2 or more source files with the same class name, for example. suppression-xpath without file name attribute will require to specify package name token to distinguish two classes with the same name. So, lets use files attribute. Some Checks can violate same AST nodes. In this case xpath queries will be the same. Thus, lets use checks attribute for suppression-xpath element.

If you agree, lets add a new element to the schema:

<suppression-xpath
  checks="MethodLength"
  files="MyClass.java, FooBar.java"
  query="//ClassDef//MethodDef[foo]”
/>

and save the old one

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
/>

@timurt
Please, share your opinion.

Contributor

MEZk commented Jun 1, 2017

@rnveach

then the combined suppression seems misleading to me as I expected all attributes to be combined together, the same way files and lines does now.
This is the way I understood the proposed model when we specified they would be all together.

Lines and Xpath should never be combined. Xpath model is opposite to line-based. Maybe you are right and it is better to add a new element to the schema (suppression-xpath) to make it clear to the user that we do not support mixing of line-based suppressions with xpath. The schema should validate that suppression-xpath element does not have lines attribute.

It hasn't been specified what are the limits of what we can query on for xpath. I assumed file name would not be possible in xpath since it is only based on the Java tree. So how would I be able to specify a specific xpath for a specific file to avoid conflicts over many files?

There can be 2 or more source files with the same class name, for example. suppression-xpath without file name attribute will require to specify package name token to distinguish two classes with the same name. So, lets use files attribute. Some Checks can violate same AST nodes. In this case xpath queries will be the same. Thus, lets use checks attribute for suppression-xpath element.

If you agree, lets add a new element to the schema:

<suppression-xpath
  checks="MethodLength"
  files="MyClass.java, FooBar.java"
  query="//ClassDef//MethodDef[foo]”
/>

and save the old one

<suppression
   checks="FileLength"
   lines="1,2,3"
   files="TokenTypes.java"
/>

@timurt
Please, share your opinion.

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 5, 2017

Member

So, lets use files attribute.
Thus, lets use checks attribute

I agree.

Lines and Xpath should never be combined.

While this is probably generally true, I don't see a downside to allowing it, but I don't know the full syntax we will support in XPath. lines allows ranges. Even, if XPath supports everything lines does, using lines attribute will still be faster as fail first before doing xpath parsing.

They will be processed separately by different filters.

As discussed in chat, CsvFilter and the like are actually, what I will call for now, sub-filters. So you will be creating a new sub-filter and not a new filter (like SuppressionFilter) for the xpaths.

it is better to add a new element to the schema (suppression-xpath)

You will still be using SuppressElement which has lines in it's declaration, so you still have to 'technically' support it no matter what. The only thing the new tag is for is to dis-allow putting lines and xpath together in suppression, which as I stated above, I don't see it as a bad thing.

Member

rnveach commented Jun 5, 2017

So, lets use files attribute.
Thus, lets use checks attribute

I agree.

Lines and Xpath should never be combined.

While this is probably generally true, I don't see a downside to allowing it, but I don't know the full syntax we will support in XPath. lines allows ranges. Even, if XPath supports everything lines does, using lines attribute will still be faster as fail first before doing xpath parsing.

They will be processed separately by different filters.

As discussed in chat, CsvFilter and the like are actually, what I will call for now, sub-filters. So you will be creating a new sub-filter and not a new filter (like SuppressionFilter) for the xpaths.

it is better to add a new element to the schema (suppression-xpath)

You will still be using SuppressElement which has lines in it's declaration, so you still have to 'technically' support it no matter what. The only thing the new tag is for is to dis-allow putting lines and xpath together in suppression, which as I stated above, I don't see it as a bad thing.

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 5, 2017

Contributor

@timurt

  1. Please add the new element to the suppression dtd schema;
  2. Bump schema version;
  3. Add schema to catalog file.
Contributor

MEZk commented Jun 5, 2017

@timurt

  1. Please add the new element to the suppression dtd schema;
  2. Bump schema version;
  3. Add schema to catalog file.
@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 7, 2017

Member

@timurt , @rnveach , @MEZk
Sorry for delay.

unfortunately dtd syntax does not allow "choice" for attributes - https://www.w3.org/TR/REC-xml/#attdecls
so we can restrict to use line or xpath by dtd. We cannot allow the both attributes to exists as they are on the same level of match - it will not be intuitive for user.
So new tag is ok to appear.

as new schema is about to appear please make its version as experimental suppressions_1_1_xpath_experimental.dtd, register it in code, but do not update other code to use it. It should not be public till we ready to share Xpath functionality with community.

<suppression-xpath
  checks="MethodLength"
  files="MyClass.java, FooBar.java"
  query="//ClassDef//MethodDef[foo]”
/>

I am agree of format. files has to be but as optional attribute, as it is a different level of match could be used for folder match, example <suppress checks="IllegalCatch" files="[\\/]internal[\\/]\w+Util\.java"/> from https://github.com/checkstyle/checkstyle/blob/master/config/suppressions.xml#L33
or <suppress checks="WriteTag" files=".*[\\/]src[\\/](test|it)[\\/]"/>

. I will increase the priority of this task.

@MEZk , no need. Lets make Xpath work first. If any changes happen to AST structure, user will need to relaunch suppression generator. We will think about how to ease users life when we come to this problem, we will have this problem ourself first :) (as we eat food that we produce).

Member

romani commented Jun 7, 2017

@timurt , @rnveach , @MEZk
Sorry for delay.

unfortunately dtd syntax does not allow "choice" for attributes - https://www.w3.org/TR/REC-xml/#attdecls
so we can restrict to use line or xpath by dtd. We cannot allow the both attributes to exists as they are on the same level of match - it will not be intuitive for user.
So new tag is ok to appear.

as new schema is about to appear please make its version as experimental suppressions_1_1_xpath_experimental.dtd, register it in code, but do not update other code to use it. It should not be public till we ready to share Xpath functionality with community.

<suppression-xpath
  checks="MethodLength"
  files="MyClass.java, FooBar.java"
  query="//ClassDef//MethodDef[foo]”
/>

I am agree of format. files has to be but as optional attribute, as it is a different level of match could be used for folder match, example <suppress checks="IllegalCatch" files="[\\/]internal[\\/]\w+Util\.java"/> from https://github.com/checkstyle/checkstyle/blob/master/config/suppressions.xml#L33
or <suppress checks="WriteTag" files=".*[\\/]src[\\/](test|it)[\\/]"/>

. I will increase the priority of this task.

@MEZk , no need. Lets make Xpath work first. If any changes happen to AST structure, user will need to relaunch suppression generator. We will think about how to ease users life when we come to this problem, we will have this problem ourself first :) (as we eat food that we produce).

@timurt

This comment has been minimized.

Show comment
Hide comment
@timurt

timurt Jun 7, 2017

Collaborator

@MEZk @romani
should I include new dtd inside catalog.xml?

Collaborator

timurt commented Jun 7, 2017

@MEZk @romani
should I include new dtd inside catalog.xml?

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 7, 2017

Member

@timurt , yes

Member

romani commented Jun 7, 2017

@timurt , yes

@MEZk

This comment has been minimized.

Show comment
Hide comment
@MEZk

MEZk Jun 7, 2017

Contributor

@romani

files has to be but as optional attribute, as it is a different level of match could be used for folder match

Yes, it will be the optional attribute. XPath will allow to refer to the package name.

Contributor

MEZk commented Jun 7, 2017

@romani

files has to be but as optional attribute, as it is a different level of match could be used for folder match

Yes, it will be the optional attribute. XPath will allow to refer to the package name.

timurt added a commit to timurt/checkstyle that referenced this issue Jun 8, 2017

romani added a commit that referenced this issue Jun 8, 2017

@MEZk MEZk moved this from In Progress to Done in Flexible Suppression Model Jun 9, 2017

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 9, 2017

Member

@MEZk if this issue is done, please close it.

Member

rnveach commented Jun 9, 2017

@MEZk if this issue is done, please close it.

@rnveach rnveach added this to the 7.8.2 milestone Jun 9, 2017

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Jun 9, 2017

Member

@romani Please confirm, is this just a miscellaneous change?

Member

rnveach commented Jun 9, 2017

@romani Please confirm, is this just a miscellaneous change?

@MEZk MEZk closed this Jun 9, 2017

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Jun 10, 2017

Member

yes, we do not want to make it public, so it is internal update , so "misc" label should be used to move it in releasenotes to "Notes" group

Member

romani commented Jun 10, 2017

yes, we do not want to make it public, so it is internal update , so "misc" label should be used to move it in releasenotes to "Notes" group

@romani romani removed this from the 7.8.2 milestone Jun 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment