LANG-1453: using toUpperCase instead of toLowerCase solve the problem #420

geratorres · 2019-05-08T00:28:00Z

I changed the replace function in StringUtils to use toUpperCase instead of toLowerCase. More detailed info is in a comment on Jira.

coveralls · 2019-05-08T00:37:56Z

Coverage increased (+0.007%) to 95.396% when pulling e510d42 on geratorres:LANG-1453-IdxOutBndsEx into 6934228 on apache:master.


cesarte789 · 2019-05-08T04:16:12Z

Please review the conversation in pull request #340.
It seems that using String#toUpperCase instead of String#toLowerCase doesn't fix all the cases.

chtompki · 2019-05-08T14:56:34Z

src/test/java/org/apache/commons/lang3/StringUtilsTest.java

@@ -2687,6 +2687,9 @@ public void testRemoveIgnoreCase_String() {

        // StringUtils.removeIgnoreCase("queued", "zZ") = "queued"
        assertEquals("queued", StringUtils.removeIgnoreCase("queued", "zZ"));
+
+        // StringUtils.removeIgnoreCase("İa", "a") = "İ"
+        assertEquals("İ", StringUtils.removeIgnoreCase("İa", "a"));


At a minimum, we need to replace "İ" with "\u0130" for this to work properly. For example, try running:

System.out.println("İ"); System.out.println("\u0130");

Only the second will give your desired output. That said, your point is still quite valid.

chtompki · 2019-05-08T14:57:35Z

I agree with @cesartxt. It seems that this resolves the unit tests as written for StringUtils, but the question becomes: are there lower case letters that would yield a similar result when run through toUpperCase?

chtompki · 2019-05-08T15:17:00Z

An example of another character with this problem, this time from the lowercase alphabet, is ŉ or \u0149. So the question becomes: how do we accommodate these oddities? I'm open to thoughts here.

garydgregory · 2019-05-08T15:28:04Z

Some background: https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/

chtompki · 2019-05-08T15:30:05Z

Another non-deprecated Unicode character that triggers this IndexOutOfBoundsException after switching from toLowerCase to toUpperCase is \uFB01 or ﬁ.

And yes, @garydgregory, there is a point here: I read an article that purported that there was at least one Turkish loss of life due to a cell phone mistranslating the specific character in question in the proposed tests for LANG-1453.
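The three characters raised in this thread (\u0130, \u0149 and \uFB01) share one property: a whole-string case conversion changes the string's length, so an index computed on the converted string no longer lines up with the original. The following is a minimal sketch to check that, not code from this pull request; the class name `CaseLengthDemo` and its helper methods are hypothetical, and `Locale.ROOT` is used on the assumption that a locale-independent result is wanted.

```java
import java.util.Locale;

// Hypothetical demo class; not part of Commons Lang.
class CaseLengthDemo {

    // Length (in UTF-16 chars) of the string after lower-casing.
    static int lowerLength(String s) {
        return s.toLowerCase(Locale.ROOT).length();
    }

    // Length (in UTF-16 chars) of the string after upper-casing.
    static int upperLength(String s) {
        return s.toUpperCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        // Each input is a single char; the comments name the conversion that grows it.
        System.out.println(lowerLength("\u0130")); // 2: lower-casing adds a combining dot
        System.out.println(upperLength("\u0149")); // 2: upper-casing splits off the apostrophe
        System.out.println(upperLength("\uFB01")); // 2: upper-casing expands the ligature to "FI"
    }
}
```

So toLowerCase breaks on the first character and toUpperCase breaks on the other two, which is why swapping one conversion for the other only moves the problem.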

geratorres · 2019-05-08T19:58:08Z

Thanks for the context! With this new info I'll try to find a good solution.

chtompki · 2019-05-08T20:11:35Z

Yeah. I fiddled with it some today and didn’t see any clever solution that wasn’t brute force. I think we may have to check both situations.
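One possible reading of "check both situations" is to compare characters one at a time in both cases instead of converting the whole string, which is what the JDK's `String.regionMatches(boolean ignoreCase, ...)` does. The sketch below works under that assumption; it is not the fix proposed in this pull request or the one adopted for LANG-1406, and the class `RegionMatchRemove` and its `removeIgnoreCase` method are hypothetical names used only for illustration.

```java
// Hypothetical helper; not the Commons Lang implementation.
class RegionMatchRemove {

    // Removes every case-insensitive occurrence of 'remove' from 'text'
    // without ever building an upper- or lower-cased copy of 'text'.
    static String removeIgnoreCase(String text, String remove) {
        if (text == null || text.isEmpty() || remove == null || remove.isEmpty()) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // regionMatches compares char by char, so indexes always refer
            // to the original string and cannot run past its end.
            if (text.regionMatches(true, i, remove, 0, remove.length())) {
                i += remove.length();
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeIgnoreCase("\u0130a", "a"));
        System.out.println(removeIgnoreCase("queued", "zZ"));
    }
}
```

The trade-off is that a char-by-char comparison never treats a multi-character mapping as a match, so "\uFB01" would not be removed by "FI"; whether that is acceptable is a design decision this sketch does not settle.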

geratorres · 2019-05-14T01:02:36Z

Closing this pull request because it doesn't resolve the problem and the issue is a duplicate of LANG-1406.

LANG-1453: using toUpperCase instead of toLowerCase solve the problem

49cf9d7

geratorres added 2 commits May 7, 2019 18:51

Change file encoding to be UTF-8 to add assertion with unsupported ch…

39d2a80

…aracter in iso-8859-1 encoding

Fix assertion comment

e510d42

chtompki reviewed May 8, 2019


geratorres closed this May 14, 2019

geratorres deleted the LANG-1453-IdxOutBndsEx branch May 14, 2019 01:02

chtompki mentioned this pull request May 14, 2019

LANG-1406: avoid StringIndexOutOfBounds exceptions for some cases of … #422

Closed
