Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LANG-1453: using toUpperCase instead of toLowerCase solve the problem #420

Closed
wants to merge 3 commits into from
Closed

LANG-1453: using toUpperCase instead of toLowerCase solve the problem #420

wants to merge 3 commits into from

Conversation

geratorres
Copy link
Contributor

I changed the using of toLowerCase to toUpperCase on replace function on StringUtils. more detailed info is in a comment on Jira

@coveralls
Copy link

coveralls commented May 8, 2019

Coverage Status

Coverage increased (+0.007%) to 95.396% when pulling e510d42 on geratorres:LANG-1453-IdxOutBndsEx into 6934228 on apache:master.

@cesarte789
Copy link

cesarte789 commented May 8, 2019

Please, review the conversation of the pull request #340
It seems that using String#toUpperCase instead of String#toLowerCase doesn't fix all the cases.

@@ -2687,6 +2687,9 @@ public void testRemoveIgnoreCase_String() {

// StringUtils.removeIgnoreCase("queued", "zZ") = "queued"
assertEquals("queued", StringUtils.removeIgnoreCase("queued", "zZ"));

// StringUtils.removeIgnoreCase("İa", "a") = "İ"
assertEquals("İ", StringUtils.removeIgnoreCase("İa", "a"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to minimally replace "İ" with "\u0130" for this to properly work. For example try running:

System.out.println("İ");
System.out.println("\u0130");

only the second will give your desired output. That said, your point is still quite valid.

@chtompki
Copy link
Member

chtompki commented May 8, 2019

I agree with @cesartxt - it seems that this resolves the unit tests as written for StringUtils, but the question becomes are there similar lower case letters that would yield a similar result when ran through toUpperCase.

@chtompki
Copy link
Member

chtompki commented May 8, 2019

An example of another character that has this problem from the lowercase alphabet is ʼn or \u0149. So the question becomes how do we accommodate for these oddities? I'm open to thoughts here

@garydgregory
Copy link
Member

@chtompki
Copy link
Member

chtompki commented May 8, 2019

Another non-deprecated Unicode character that has this IndexOutOfBoundsException as well is \uFB01 or after switching to use toUpperCase as opposed to toLowerCase.

and

yes @garydgregory there is a point here as I read an article that proported that there was at least one Turkish loss of life due to a cell phone mistranslating the specific character in question in the proposed tests for LANG-1453

@geratorres
Copy link
Contributor Author

Thanks! for the context with this new info I'll be trying to find a good solution

@chtompki
Copy link
Member

chtompki commented May 8, 2019

Yeah. I fiddled with it some today and didn’t see any clever solution that wasn’t brute force. I think we may have to check both situations.

@geratorres
Copy link
Contributor Author

Closing this pull request because doesn't resolve the problem and the issue is duplicated with LANG-1406

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants