[TEXT-189] Fix CaseUtils when the input string contains only delimiters#179

Closed
ZhuGongpu wants to merge 1 commit into apache:master from ZhuGongpu:fix-case-utils

@ZhuGongpu
Contributor

Given a str that only contains delimiters, the output of CaseUtils.toCamelCase should be "", instead of the original str.

@coveralls

Coverage Status

Coverage decreased (-0.0005%) to 98.671% when pulling c4253a6 on ZhuGongpu:fix-case-utils into 704f726 on apache:master.

@kinow
Member

kinow commented Oct 27, 2020

@ZhuGongpu great catch! Thanks for the pull request, and for updating the tests.

I wonder if we could improve the Javadocs too? Maybe one more <p> paragraph with something like "An empty String with only delimiter characters, removes the delimiters", or something like that. And also include an example from the tests in the <pre> showing what we mean by the text in the new paragraph.

We will also need a JIRA for the changelog in the next release. Could you create one in https://issues.apache.org/jira/projects/TEXT/issues and update the title of this PR to [TEXT-????] Fix CaseUtils, please?

@garydgregory, @chtompki, and others, I think this is a legit issue. Imagine you have two combo boxes in a web page, with name and surname, and for whatever reason you want to camel case the two.

In master, if the user selects name="Randy" and surname="Marsh", CaseUtils.toCamelCase(String.format("%s %s", name, surname), false) returns "randyMarsh".

But if the user does not select values, and there is no validation in the web page or on the server side, then the code called would have name="" and surname="", and the invocation would be CaseUtils.toCamelCase(String.format("%s %s", name, surname), false), which results in " " (a single space). It's worse if you have multiple spaces.

IMO, we should be consistent and always remove the delimiters (unless the given str is null, as per docs).

@garydgregory
Member

The existing method needs a review first IMO. For this patch though, I don't see why applying the function to a string should change its size, that's just weird to me.

One problem I see with the current implementation is the use of the default character set in the call to toLowerCase(). At the very least, we need a test to account for the "Turkish Surprise", see https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/ and see also the last comment about “Kedi”.toUpperCase() which I am not sure applies to title case.
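As a minimal sketch of the "Turkish Surprise" mentioned above: the no-argument toLowerCase() uses the JVM's default locale, and under a Turkish locale an uppercase "I" lowercases to a dotless "ı" (U+0131). The class and method names below are hypothetical and only illustrate the JDK behaviour; this is not code from CaseUtils.

```java
import java.util.Locale;

// Hypothetical demo class; it only exercises java.lang.String and java.util.Locale.
final class TurkishLowercaseSketch {

    // Lowercases with an explicit locale instead of relying on the JVM default.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale-neutral rules: 'I' becomes 'i'.
        System.out.println(lower("TITLE", Locale.ROOT));

        // Turkish rules: 'I' becomes dotless 'ı' (U+0131), so the result is not "title".
        System.out.println(lower("TITLE", turkish));

        // s.toLowerCase() with no argument behaves like s.toLowerCase(Locale.getDefault()),
        // so its output depends on where the code runs.
        System.out.println("TITLE".toLowerCase());
    }
}
```

A test for this could pin the locale explicitly (or temporarily set the default locale) and assert on both outcomes, so the result does not depend on the machine running the build.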

@kinow
Member

kinow commented Oct 27, 2020

> The existing method needs a review first IMO. For this patch though, I don't see why applying the function to a string should change its size, that's just weird to me.

I think that's just the way it was designed. You provide the delimiters for each part of a camel-cased String. Say you have a snake-cased String s="no-more-spaces", then CaseUtils.toCamelCase(s, false, '-') gives you noMoreSpaces. The function expects a delimiter that tells it where a new token to be camel-cased starts, and replaces this delimiter when camel-casing.

IMHO, it's more intuitive to be consistent with the removal of delimiters for every non-empty String. So s="a---" passing through CaseUtils.toCamelCase(s, false, '-') gives you [a], and s="---" gives you [], and not [___] (i.e. three empty spaces) as master does at the moment.

> One problem I see with the current implementation is the use of the default character set in the call to toLowerCase(). At the very least, we need a test to account for the "Turkish Surprise", see https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/ and see also the last comment about “Kedi”.toUpperCase() which I am not sure applies to title case.

+1 but I think this doesn't need to block this change, we can create a separate ticket about surrogate pairs in CaseUtils methods for camel-case (I suspect if we inspect more of [text], we would find more places where we could find similar issues). They can be fixed separately. I'd be fine with either way.
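The behaviour argued for above (delimiters are always dropped, so a delimiter-only input yields an empty string) can be sketched as follows. This is a simplified, hypothetical helper written for illustration, not the Commons Text implementation: it assumes that a space always counts as a delimiter (as the name/surname example suggests), works on chars rather than code points (so it ignores surrogate pairs), and lowercases with Locale.ROOT.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch class; names and details are assumptions, not the library's code.
final class CamelCaseSketch {

    static String toCamelCase(String str, boolean capitalizeFirstLetter, char... delimiters) {
        if (str == null || str.isEmpty()) {
            return str; // null and "" are returned unchanged
        }

        // Assumption: a space is always a delimiter, plus any caller-supplied ones.
        Set<Character> delimiterSet = new HashSet<>();
        delimiterSet.add(' ');
        if (delimiters != null) {
            for (char d : delimiters) {
                delimiterSet.add(d);
            }
        }

        String lower = str.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length());
        boolean capitalizeNext = capitalizeFirstLetter;

        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (delimiterSet.contains(c)) {
                // Delimiters are never copied to the output. Once some text has
                // been emitted, a delimiter marks the start of a new token.
                if (out.length() > 0) {
                    capitalizeNext = true;
                }
                continue;
            }
            out.append(capitalizeNext ? Character.toUpperCase(c) : c);
            capitalizeNext = false;
        }
        // Only delimiters in the input: nothing was appended, so this is "".
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("no-more-spaces", false, '-'));
        System.out.println("[" + toCamelCase("a---", false, '-') + "]");
        System.out.println("[" + toCamelCase("---", false, '-') + "]");
    }
}
```

With this sketch, "a---" and "---" are handled the same way: every delimiter is removed, and whatever remains (possibly nothing) is returned.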

@garydgregory
Member

> > The existing method needs a review first IMO. For this patch though, I don't see why applying the function to a string should change its size, that's just weird to me.

> I think that's just the way it was designed. You provide the delimiters for each part of a camel-cased String. Say you have a snake-cased String s="no-more-spaces", then CaseUtils.toCamelCase(s, false, '-') gives you noMoreSpaces. The function expects a delimiter that tells it where a new token to be camel-cased starts, and replaces this delimiter when camel-casing.

Gotcha, makes sense. I was thinking of the case of " " -> "", where a transformation took place and it's not any more camel-cased than before, which feels a bit misleading.

> IMHO, it's more intuitive to be consistent with the removal of delimiters for every non-empty String. So s="a---" passing through CaseUtils.toCamelCase(s, false, '-') gives you [a], and s="---" gives you [], and not [___] (i.e. three empty spaces) as master does at the moment.

> > One problem I see with the current implementation is the use of the default character set in the call to toLowerCase(). At the very least, we need a test to account for the "Turkish Surprise", see https://garygregory.wordpress.com/2015/11/03/java-lowercase-conversion-turkey/ and see also the last comment about “Kedi”.toUpperCase() which I am not sure applies to title case.

> +1 but I think this doesn't need to block this change, we can create a separate ticket about surrogate pairs in CaseUtils methods for camel-case (I suspect if we inspect more of [text], we would find more places where we could find similar issues). They can be fixed separately. I'd be fine with either way.

Agreed.

@ZhuGongpu changed the title from "Fix CaseUtils" to "[TEXT-189] Fix CaseUtils" on Oct 28, 2020
@ZhuGongpu
Contributor Author

> @ZhuGongpu great catch! Thanks for the pull request, and for updating the tests.
>
> I wonder if we could improve the Javadocs too? Maybe one more <p> paragraph with something like "An empty String with only delimiter characters, removes the delimiters", or something like that. And also include an example from the tests in the <pre> showing what we mean by the text in the new paragraph.
>
> We will also need a JIRA for the changelog in the next release. Could you create one in https://issues.apache.org/jira/projects/TEXT/issues and update the title of this PR to [TEXT-????] Fix CaseUtils, please?
>
> @garydgregory, @chtompki, and others, I think this is a legit issue. Imagine you have two combo boxes in a web page, with name and surname, and for whatever reason you want to camel case the two.
>
> In master, if the user selects name="Randy" and surname="Marsh", CaseUtils.toCamelCase(String.format("%s %s", name, surname), false) returns "randyMarsh".
>
> But if the user does not select values, and there is no validation in the web page or on the server side, then the code called would have name="" and surname="", and the invocation would be CaseUtils.toCamelCase(String.format("%s %s", name, surname), false), which results in " " (a single space). It's worse if you have multiple spaces.
>
> IMO, we should be consistent and always remove the delimiters (unless the given str is null, as per docs).

Done.

@kinow changed the title from "[TEXT-189] Fix CaseUtils" to "[TEXT-189] Fix CaseUtils when the input string contains only delimiters" on Oct 28, 2020
@kinow closed this in b35cb56 on Oct 28, 2020
@kinow
Member

kinow commented Oct 28, 2020

Merged in b35cb56, thanks @ZhuGongpu !
