Migrate calls to String::trim() to Guava's whitespace trimming #5105
Actually, that's a bad idea because Java's definition of "whitespace" doesn't match Unicode's.
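To make the mismatch concrete, here is a small self-contained sketch (not OpenRefine code). It compares Character.isWhitespace(), which is the definition String::strip() relies on, with the Unicode White_Space property, which the JDK regex engine exposes as \p{IsWhite_Space}. The class name WhitespaceDefinitions and the sample characters are illustrative choices.

```java
import java.util.regex.Pattern;

class WhitespaceDefinitions {
    // The JDK regex engine exposes the Unicode White_Space binary property as \p{IsWhite_Space}.
    private static final Pattern UNICODE_WHITESPACE = Pattern.compile("\\p{IsWhite_Space}");

    // True if the character has the Unicode White_Space property.
    static boolean isUnicodeWhitespace(char c) {
        return UNICODE_WHITESPACE.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // U+00A0 NO-BREAK SPACE, U+2007 FIGURE SPACE, U+202F NARROW NO-BREAK SPACE,
        // U+0085 NEXT LINE, U+001F UNIT SEPARATOR
        char[] samples = {'\u00A0', '\u2007', '\u202F', '\u0085', '\u001F'};
        for (char c : samples) {
            System.out.printf("U+%04X  Character.isWhitespace=%b  Unicode White_Space=%b%n",
                    (int) c, Character.isWhitespace(c), isUnicodeWhitespace(c));
        }
    }
}
```

On Java 11 this should report the three no-break spaces and U+0085 as Unicode whitespace but not Java whitespace, and U+001F the other way round.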
@elroykanye Are you working on this?
Hi @sritejap, I have not had enough time to look into this. If you are interested in the issue, I can assign it to you.
I could look into this if no one is working on it right now.
wrt this, I've found one usage of that class for whitespace stripping (link). Shall I leave this one as is?
So we have different options for stripping whitespace/space characters from a string:
Character::isWhitespace() specifically excludes non-breaking spaces ('\u00A0', '\u2007', '\u202F'), which is why the tests are failing when you change the method there. Since the Guava method was introduced explicitly so that non-breaking spaces are included in the GREL trim command (see f29f77e), it seems like we should preserve this behaviour. I would also flag that trimming of non-breaking spaces is explicitly requested in #5246, but in a place where we currently use String::strip(), so I wonder if we should instead move to the Guava method, at least in that scenario, for consistency and to meet this user need?
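A minimal sketch of the behaviour described above, written without a Guava dependency so it runs on a plain JDK. The hypothetical unicodeTrim() helper removes leading and trailing characters in the Unicode White_Space set (space, line and paragraph separators, U+0009 to U+000D, and U+0085), which is intended to approximate Guava's CharMatcher.whitespace().trimFrom(). Inside OpenRefine the Guava call itself would be the natural choice; the helper names here are illustrative only.

```java
class UnicodeTrim {
    // Hypothetical helper: true for the Unicode White_Space characters, i.e. space,
    // line and paragraph separators (which include the no-break spaces U+00A0,
    // U+2007 and U+202F), the controls U+0009 to U+000D, and U+0085 NEXT LINE.
    static boolean isUnicodeWhitespace(char c) {
        switch (Character.getType(c)) {
            case Character.SPACE_SEPARATOR:
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
                return true;
            default:
                return (c >= 0x09 && c <= 0x0D) || c == 0x85;
        }
    }

    // Hypothetical helper: removes leading and trailing Unicode whitespace,
    // approximating Guava's CharMatcher.whitespace().trimFrom(s).
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isUnicodeWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && isUnicodeWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String cell = "\u00A0 Paris\u202F";
        // strip() uses Character.isWhitespace(), so both no-break spaces survive
        System.out.println("[" + cell.strip() + "]");
        // unicodeTrim() removes them as well
        System.out.println("[" + unicodeTrim(cell) + "]"); // prints [Paris]
    }
}
```

Scanning from both ends and taking a single substring avoids building intermediate strings; since every White_Space character is in the Basic Multilingual Plane, checking individual char values is sufficient.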
Thanks @ostephens for investigating this. I have not looked much into the details, but I trust your judgment and have updated the issue to reflect that.
I agree with @ostephens's analysis. I had forgotten that Java's definition of whitespace doesn't match the Unicode Consortium's definition, making https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html#trim()
What is the expected result of running ./refine server_test with the latest build, i.e. are all of the test cases passing? Just checking, as the last time I did anything in Java was four years ago, so I may have screwed up the setup. PS: the build succeeds and I am able to run the app.
The expected status is success (it should write
Thanks, if that's the case then I've definitely messed up somewhere when setting up this repo in IntelliJ. For example, this error comes from the latest build.
This should have been fixed a few days ago by #5326, so make sure you have an up-to-date version of the master branch.
Java 11 introduced the String::strip() method, which strips whitespace around a string, including newer Unicode whitespace characters. We currently use the String::trim() method in a number of places. In many of those places, the string being trimmed can legitimately contain any Unicode character, so String::strip() should be used instead.

Since OpenRefine now requires Java 11+, we can migrate to String::strip() in most places. Note that String::trim() also removes the \u0000 character (it strips every leading and trailing character with a code point up to U+0020), while String::strip() does not, but I am not sure if that is relevant anywhere for us.

We should rather migrate to Guava's whitespace trimming; see @ostephens's comment in this thread.