New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indentation UTs should not use ROOT locale when they test violation/error message #4003

Closed
romani opened this Issue Mar 14, 2017 · 10 comments

Comments

Projects
None yet
3 participants
@romani
Member

romani commented Mar 14, 2017

taken from #3993 (comment)

The IndentationCheckTest is quite different from other tests. There are many verifyWarns in test methods. The actual assertion of verifyWarns is at https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1785-L1786, that to use an endsWith but not equals to judge the correctness.

The getExpectedMessage() is defined here, https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1818-L1827
And the error message in English could be one of the followings:

indentation.error.multi=''{0}'' have incorrect indentation level {1}, expected level should be one of the following: {2}.
indentation.child.error.multi=''{0}'' child have incorrect indentation level {1}, expected level should be one of the following: {2}.
indentation.error=''{0}'' have incorrect indentation level {1}, expected level should be {2}.
indentation.child.error=''{0}'' child have incorrect indentation level {1}, expected level should be {2}.

The endWith way could work when the message is in English or Chinese. But in many other languages, it will not work at all. The messages could have different postfixes due to the difference of singular-plural or even pronounce stress.
i.e. in Finnish

indentation.error.multi=''{0}'' on virheellinen sisennystason {1}, odottaa tasolla pitäisi olla yksi seuraavista: {2}.
indentation.child.error.multi=''{0}'' lapsi on väärä sisennystason {1}, odottaa tasolla pitäisi olla yksi seuraavista: {2}.
indentation.error={0} sisennyssyvyydellä {1} ei ole oikealla syvyydellä {2}.
indentation.child.error=''{0}'' lapsi on väärä sisennystason {1}, odottaa taso olisi {2}.

We can see the messages have totally different postfixes.

I have logged what information the IndentComment really has and found that it doesn't have the token type info at all. It only has the actual and expected intent.
So we can only know that the message is smth like ''{an unknown token type}'' have incorrect indentation level {actual intent}, expected level should be {expected intent}. I guess it is why the original contributor didn't use equals but endWith.

To sum up, I haven't found another way to do the assertion without endWith. And if we are using endWith, it is not able to do the assertion in many non-English languages. Therefore, I use the old/ugly way to enforce the locale in English.

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Mar 14, 2017

Member

So problem is test was written in no localization consideration.
https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1841

return "incorrect indentation level " + (indent + indentOffset)
+ ", expected level should be " + expectedWarning + ".";

so looks like whole problems is due to "getExpectedMessage" method that generate message in English base on some number and string value.
It looks like we can load all localized messages and do substitution of arguments ourself. So in result of this method will be localized message so "endWith" will pass.

@rnveach , am I miss something ? please share you ideas on this.

Member

romani commented Mar 14, 2017

So problem is test was written in no localization consideration.
https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1841

return "incorrect indentation level " + (indent + indentOffset)
+ ", expected level should be " + expectedWarning + ".";

so looks like whole problems is due to "getExpectedMessage" method that generate message in English base on some number and string value.
It looks like we can load all localized messages and do substitution of arguments ourself. So in result of this method will be localized message so "endWith" will pass.

@rnveach , am I miss something ? please share you ideas on this.

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Mar 14, 2017

Member

@romani sounds right. If we are just going to put the arguments into the message, it might be better to drop endWith and just do an equals on the entire localized string.

Member

rnveach commented Mar 14, 2017

@romani sounds right. If we are just going to put the arguments into the message, it might be better to drop endWith and just do an equals on the entire localized string.

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Mar 14, 2017

Member

@Luolc , please help us to resolve this issue.

Member

romani commented Mar 14, 2017

@Luolc , please help us to resolve this issue.

@Luolc

This comment has been minimized.

Show comment
Hide comment
@Luolc

Luolc Mar 15, 2017

Contributor

@romani

It looks like we can load all localized messages and do substitution of arguments ourself. So in result of this method will be localized message so "endWith" will pass.

Actually I have tried it in some early commits of #3993 , smth like:
https://github.com/Luolc/checkstyle/blob/e44df4b5803c6e713c8e8e80ff185fa4a07eccf5/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1820.
But I failed. As I mentioned in the previous PR, the problem is:

The messages could have different postfixes due to the difference of singular-plural or even pronounce stress.

The messages we need to concern about are indentation.error, indentation.error.multi, indentation.child.error.multi, indentation.child.error.
In getExpectedMessage() method we only split the messages into two cases: multi or not multi.

In English, indentation.error and indentation.child.error have the same postfix have incorrect indentation level {1}, expected level should be {2}. But in Finnish and some other languages they are different. So one thing must be done at first is to split the messages into four independent cases. I don't know how to do that or whether possible to do that now. I will investigate this.

@rnveach

If we are just going to put the arguments into the message, it might be better to drop endWith and just do an equals on the entire localized string.

The messages are smth like ''{0}'' have incorrect indentation level {1}, expected level should be {2}. which need three arguments, but we only have the last two.

I have logged what information the IndentComment really has and found that it doesn't have the token type info at all. It only has the actual and expected intent.
So we can only know that the message is smth like ''{an unknown token type}'' have incorrect indentation level {actual intent}, expected level should be {expected intent}. I guess it is why the original contributor didn't use equals but endWith.

https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1827-L1839
There are two arguments of the constructor of IndentComment, a Matcher instance and the line number. I logged the plain text of matcher, it is a specific line of input file, smth like

do System.getProperty("foo"); while (test); //indent:0 exp:8 warn

We need to find a way to parse the token type of it if we want to drop endWith. i.e. for the text above, we need to know the first argument is do...while. I will look over the codes and try to find a solution.

---
That's what I have got during my work on #3993 . There are quite a few things need to be investigated by now. I will update once I get anything helpful later 😄

Contributor

Luolc commented Mar 15, 2017

@romani

It looks like we can load all localized messages and do substitution of arguments ourself. So in result of this method will be localized message so "endWith" will pass.

Actually I have tried it in some early commits of #3993 , smth like:
https://github.com/Luolc/checkstyle/blob/e44df4b5803c6e713c8e8e80ff185fa4a07eccf5/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1820.
But I failed. As I mentioned in the previous PR, the problem is:

The messages could have different postfixes due to the difference of singular-plural or even pronounce stress.

The messages we need to concern about are indentation.error, indentation.error.multi, indentation.child.error.multi, indentation.child.error.
In getExpectedMessage() method we only split the messages into two cases: multi or not multi.

In English, indentation.error and indentation.child.error have the same postfix have incorrect indentation level {1}, expected level should be {2}. But in Finnish and some other languages they are different. So one thing must be done at first is to split the messages into four independent cases. I don't know how to do that or whether possible to do that now. I will investigate this.

@rnveach

If we are just going to put the arguments into the message, it might be better to drop endWith and just do an equals on the entire localized string.

The messages are smth like ''{0}'' have incorrect indentation level {1}, expected level should be {2}. which need three arguments, but we only have the last two.

I have logged what information the IndentComment really has and found that it doesn't have the token type info at all. It only has the actual and expected intent.
So we can only know that the message is smth like ''{an unknown token type}'' have incorrect indentation level {actual intent}, expected level should be {expected intent}. I guess it is why the original contributor didn't use equals but endWith.

https://github.com/checkstyle/checkstyle/blob/master/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1827-L1839
There are two arguments of the constructor of IndentComment, a Matcher instance and the line number. I logged the plain text of matcher, it is a specific line of input file, smth like

do System.getProperty("foo"); while (test); //indent:0 exp:8 warn

We need to find a way to parse the token type of it if we want to drop endWith. i.e. for the text above, we need to know the first argument is do...while. I will look over the codes and try to find a solution.

---
That's what I have got during my work on #3993 . There are quite a few things need to be investigated by now. I will update once I get anything helpful later 😄

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Mar 15, 2017

Member

@Luolc , Correct me if I am wrong ...

  1. Exact validation for message we check in UT method. Here we have a case when we do extra validation for commented content of input file. So we could be less strict in matching or match to 1 of two messages in input comments validation process.

  2. as we have "," in comment we should match to "indentation.error.multi" or "indentation.child.error.multi" (such messages are completely different between each other) it is OK. We just need to check that message match any of this 2. Exact message match is done in UT method where expected array is defined.
    It is additional validation, it could be a bit relaxed.

@rnveach , please step in , may be am wrong in some of the points.

Member

romani commented Mar 15, 2017

@Luolc , Correct me if I am wrong ...

  1. Exact validation for message we check in UT method. Here we have a case when we do extra validation for commented content of input file. So we could be less strict in matching or match to 1 of two messages in input comments validation process.

  2. as we have "," in comment we should match to "indentation.error.multi" or "indentation.child.error.multi" (such messages are completely different between each other) it is OK. We just need to check that message match any of this 2. Exact message match is done in UT method where expected array is defined.
    It is additional validation, it could be a bit relaxed.

@rnveach , please step in , may be am wrong in some of the points.

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Mar 15, 2017

Member

@Luolc Sorry, it has been a while since I had to look at this test since I made it.

@romani He is correct. We are missing the 3rd message property, the token type, to fully produce the violation message.

In English, indentation.error and indentation.child.error have the same postfix ... But in Finnish and some other languages they are different.

All messages have the indexes in order, but we shouldn't be relying on this as it could change in the future. comments.indentation message has them out of order (1,2,0) for fi.

So we could be less strict in matching or match to 1 of two messages in input comments validation process.

The problem isn't with we are dealing with too many messages.
We only retrieve the full message from the auditer, not the single out properties. We can't be sure where each message property is located for each specific locale message without some other extra testing.

Exact message match is done in UT method
We just need to check that message match any of this 2.

We are already doing this.
The problem is the location of the message properties.
English is {0} some text {1} some text {2}.
If another language does {1} some text {2} some text {0} or {1} some text {0} some text {2} (see numbers are out of order) then it makes it harder to match anything up.
This is why we are using endsWith right now as we are expecting the expected properties to be at the end. Right now we are missing property 0, which is the token type.

It is additional validation, it could be a bit relaxed.

The only way to relax this is to remove 3-way input validation completely.


Here are my thoughts on how to fix this:

  1. Add the missing token type to the indentation warn comment and then we will have the complete message.
  2. Indentation is special, so how about we split this into 2 test classes. First uses no locale and will test all messages in one test with no 3-way input verification. Localization testing only cares that the messages come out as expected. Second test class uses English locale and does current regression doing 3-way input verification on all tests.
  3. I have already created another issue at #3815 , where we should capture more information from the check to do more validation and find errors. If we make an injectable method call at IndentationCheck.indentationLog and expand it like I did in my example at #3815 , we can capture everything we need to do whatever we want.

Edit: updated for @romani's comments.

Member

rnveach commented Mar 15, 2017

@Luolc Sorry, it has been a while since I had to look at this test since I made it.

@romani He is correct. We are missing the 3rd message property, the token type, to fully produce the violation message.

In English, indentation.error and indentation.child.error have the same postfix ... But in Finnish and some other languages they are different.

All messages have the indexes in order, but we shouldn't be relying on this as it could change in the future. comments.indentation message has them out of order (1,2,0) for fi.

So we could be less strict in matching or match to 1 of two messages in input comments validation process.

The problem isn't with we are dealing with too many messages.
We only retrieve the full message from the auditer, not the single out properties. We can't be sure where each message property is located for each specific locale message without some other extra testing.

Exact message match is done in UT method
We just need to check that message match any of this 2.

We are already doing this.
The problem is the location of the message properties.
English is {0} some text {1} some text {2}.
If another language does {1} some text {2} some text {0} or {1} some text {0} some text {2} (see numbers are out of order) then it makes it harder to match anything up.
This is why we are using endsWith right now as we are expecting the expected properties to be at the end. Right now we are missing property 0, which is the token type.

It is additional validation, it could be a bit relaxed.

The only way to relax this is to remove 3-way input validation completely.


Here are my thoughts on how to fix this:

  1. Add the missing token type to the indentation warn comment and then we will have the complete message.
  2. Indentation is special, so how about we split this into 2 test classes. First uses no locale and will test all messages in one test with no 3-way input verification. Localization testing only cares that the messages come out as expected. Second test class uses English locale and does current regression doing 3-way input verification on all tests.
  3. I have already created another issue at #3815 , where we should capture more information from the check to do more validation and find errors. If we make an injectable method call at IndentationCheck.indentationLog and expand it like I did in my example at #3815 , we can capture everything we need to do whatever we want.

Edit: updated for @romani's comments.

@rnveach

This comment has been minimized.

Show comment
Hide comment
@rnveach

rnveach Mar 15, 2017

Member

@Luolc Me and @romani talked offline.

We will not think about the future of what messages will be. It will probably become a requirement for this check that message properties be in numerical order for this check for testing.
Messages are currently in numerical order, so we will rely on that. Since they are in order, we will continue to chop off property 0, and validate 1 and 2.


So this is what we should do with the issue.

  1. Let's make a test that verifies message is in property order. Add assertion that if it fails, message must be changed so properties are in order.

https://github.com/Luolc/checkstyle/blob/e44df4b5803c6e713c8e8e80ff185fa4a07eccf5/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1820
We can use this code, we just need to chop off property 0 and return the rest as the result.
We could put something unique into 0, like ###### and do an indexOf to find it and do a substring after it. If anyone knows a quicker, better way, please mention it.

@Luolc This sound like it will work?
@romani Please confirm if I missed anything.

Member

rnveach commented Mar 15, 2017

@Luolc Me and @romani talked offline.

We will not think about the future of what messages will be. It will probably become a requirement for this check that message properties be in numerical order for this check for testing.
Messages are currently in numerical order, so we will rely on that. Since they are in order, we will continue to chop off property 0, and validate 1 and 2.


So this is what we should do with the issue.

  1. Let's make a test that verifies message is in property order. Add assertion that if it fails, message must be changed so properties are in order.

https://github.com/Luolc/checkstyle/blob/e44df4b5803c6e713c8e8e80ff185fa4a07eccf5/src/test/java/com/puppycrawl/tools/checkstyle/checks/indentation/IndentationCheckTest.java#L1820
We can use this code, we just need to chop off property 0 and return the rest as the result.
We could put something unique into 0, like ###### and do an indexOf to find it and do a substring after it. If anyone knows a quicker, better way, please mention it.

@Luolc This sound like it will work?
@romani Please confirm if I missed anything.

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Mar 15, 2017

Member

Confirmed

Member

romani commented Mar 15, 2017

Confirmed

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 16, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 16, 2017

@Luolc

This comment has been minimized.

Show comment
Hide comment
@Luolc

Luolc Mar 16, 2017

Contributor

@rnveach Yes, sounds good. Thanks. 😄 I have a similar idea as yours.
I have made a commit following your suggestions. Since it is impossible to distinguish between indentation.error and indentation.child.error currently, I just check that message matches either of the valid two.

Contributor

Luolc commented Mar 16, 2017

@rnveach Yes, sounds good. Thanks. 😄 I have a similar idea as yours.
I have made a commit following your suggestions. Since it is impossible to distinguish between indentation.error and indentation.child.error currently, I just check that message matches either of the valid two.

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 16, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 16, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 16, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 17, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 17, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 19, 2017

Luolc added a commit to Luolc/checkstyle that referenced this issue Mar 19, 2017

romani added a commit that referenced this issue Mar 19, 2017

@romani romani added this to the 7.7 milestone Mar 19, 2017

@romani

This comment has been minimized.

Show comment
Hide comment
@romani

romani Mar 19, 2017

Member

fix is merged.

Member

romani commented Mar 19, 2017

fix is merged.

@romani romani closed this Mar 19, 2017

SergeyDzyuba added a commit to SergeyDzyuba/checkstyle that referenced this issue Mar 22, 2017

GitToasterhub added a commit to GitToasterhub/checkstyle that referenced this issue Mar 26, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment