Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate calls to String::trim() to Guava's whitespace trimming #5105

Closed
wetneb opened this issue Jul 24, 2022 · 12 comments · Fixed by #5336
Closed

Migrate calls to String::trim() to Guava's whitespace trimming #5105

wetneb opened this issue Jul 24, 2022 · 12 comments · Fixed by #5336
Assignees
Labels
Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. java version compatibility Making sure OpenRefine runs on as many Java versions as possible Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.

Comments

@wetneb
Copy link
Member

wetneb commented Jul 24, 2022

Java 11 introduced the String::strip() method, which strips whitespace around a string, including newer Unicode whitespace characters.
We currently use the String::trim() method in a number of places. In many of those places, the string being trimmed can legitimately contain any Unicode character, so String::strip() should be used instead.

Since OpenRefine now requires Java 11+, we can migrate to String::strip() in most places. Note that String::trim() apparently removes the \u0000 character and String::strip() does not, but I am not sure if that is relevant anywhere for us.

We should rather migrate to Guava's whitespace trimming, see @ostephens's post below.

@wetneb wetneb added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. java version compatibility Making sure OpenRefine runs on as many Java versions as possible labels Jul 24, 2022
@elroykanye elroykanye self-assigned this Aug 16, 2022
@tfmorris
Copy link
Member

tfmorris commented Sep 12, 2022

It's probably worth replacing the Guava com.google.common.base.CharMatcher calls used for whitespace stripping as well to minimize external dependencies.

Actually, that's a bad idea because Java's definition of "whitespace" doesn't match Unicode's

@sritejap
Copy link

@elroykanye Are you working on this?

@elroykanye
Copy link
Member

Hi @sritejap , I have not had enough time to look into this. If you are interested in the issue, I can assign this to you.

@Abbe98 Abbe98 added the Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. label Sep 24, 2022
@vijal-patel
Copy link
Contributor

I could look into this if no one is working on it right now

@wetneb wetneb assigned vijal-patel and unassigned elroykanye Sep 30, 2022
@vijal-patel
Copy link
Contributor

vijal-patel commented Oct 11, 2022

It's probably worth replacing the Guava com.google.common.base.CharMatcher calls used for whitespace stripping as well to minimize external dependencies.

wrt this, I've found 1 usage of that class for white space stripping (link)
but chaning this to use .strip() results in unit test failures for non-breaking space, figure space etc.

Shall I leave this one as is?

@ostephens
Copy link
Member

So we have different options for stripping whitespace/space characters from a string:

  • String::trim() -> strips any character whose codepoint is less than or equal to 'U+0020' (the space character).
  • String::strip() -> strips any characters for which Character::IsWhitspace() == TRUE
  • com.google.common.base.CharMatcher.whitespace().trimFrom() -> strips any characters defined by Unicode as whitespace

Character::IsWhitespace() specifically excludes non-breaking spaces ('\u00A0', '\u2007', '\u202F') which is why the tests are failing when you change the method there.

Since the use of the Guava method was introduced explicitly so that non-breaking spaces were included in the GREL trim command (see f29f77e) it seems like we should preserve this behaviour.

I would flag that trimming of non-breaking spaces is also being asked for explicitly by #5246 - but in a place we currently use String::strip() ... so I wonder if we should rather move to the use of the Guava method at least in that scenario for consistency and to meet this user need?

@wetneb
Copy link
Member Author

wetneb commented Oct 11, 2022

Thanks @ostephens for investigating this - I have not looked much in the details but I trust your judgment and update the issue to reflect that.

@wetneb wetneb changed the title Migrate calls to String::trim() to String::strip() Migrate calls to String::trim() to Guava's whitespace trimming Oct 11, 2022
@tfmorris
Copy link
Member

@vijal-patel
Copy link
Contributor

vijal-patel commented Oct 12, 2022

What is the expected result of running ./refine server_test with the latest build i.e. are all of the test cases passing? Just checking as the last time I did anything Java was 4 years ago so I may have screwed up the setup.

PS - The build succeeds and I am able to run the app

@wetneb
Copy link
Member Author

wetneb commented Oct 12, 2022

The expected status is success (it should write BUILD SUCCESS at the end).

@vijal-patel
Copy link
Contributor

vijal-patel commented Oct 12, 2022

thanks, if that's the case then I've definitely messed up somewhere when setting up this repo in IntelliJ. For example this error comes from the latest build.
[ERROR] ToStringTests.testToString:50 expected [100.0] but found [100,0]

@wetneb
Copy link
Member Author

wetneb commented Oct 13, 2022

This should have been fixed a few days ago by #5326, so make sure you have an up to date version of the master branch.
Can we move this discussion to a support forum (such as the openrefine-dev mailing list or the Gitter channel)? We try to keep GitHub issues focused on one particular topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Good First Issue Indicates issues suitable for newcomers to design or coding, providing a gentle introduction. java version compatibility Making sure OpenRefine runs on as many Java versions as possible Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
None yet
7 participants