Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(text): cases #4082

Merged
merged 21 commits into from
Jan 10, 2024
Merged

feat(text): cases #4082

merged 21 commits into from
Jan 10, 2024

Conversation

timreichen
Copy link
Contributor

@timreichen timreichen commented Jan 3, 2024

This is the successor of #3440. Big thanks to @Gustrb for the initial PR.

implements the following cases:

  • toCamelCase()
  • toKebabCase()
  • toPascalCase()
  • toSentenceCase()
  • toSnakeCase()
  • toTitleCase()

They are interoperable, so this works:

let input = "hello world";
input = input.toUpperCase();
input = toCamelCase(input);
input = toKebabCase(input);
input = toPascalCase(input);
input = toSnakeCase(input);
input = input.toLowerCase();
console.log(input) // "hello world"

@timreichen timreichen requested a review from kt3k as a code owner January 3, 2024 22:22
@github-actions github-actions bot added the text label Jan 3, 2024
Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Just a few nits.

text/_util.ts Outdated Show resolved Hide resolved
text/_util.ts Outdated Show resolved Hide resolved
text/case.ts Outdated Show resolved Hide resolved
text/case.ts Outdated Show resolved Hide resolved
timreichen and others added 6 commits January 5, 2024 10:11
Co-authored-by: Asher Gomez <ashersaupingomez@gmail.com>
Co-authored-by: Asher Gomez <ashersaupingomez@gmail.com>
@timreichen
Copy link
Contributor Author

timreichen commented Jan 5, 2024

I updated the PR so split takes singleDelimiter and/or removeSpecialCharacters options. It also no longers trims as toSentenceCase and toTitleCase want to keep whitespace.

Note there might be data loss between cases, but I think that makes sense for those cases.

text/_util.ts Outdated
Comment on lines 8 to 23
export function split(
input: string,
{ singleDelimiter = false, removeSpecialCharacters }: SplitOptions = {},
) {
if (removeSpecialCharacters) {
input = input.replaceAll(/[^a-zA-Z0-9\s-_]/g, "");
}
if (singleDelimiter) {
if (/\s+/.test(input)) return input.split(/\s+/).filter(Boolean);
if (/-+/.test(input)) return input.split(/-+/).filter(Boolean);
if (/_+/.test(input)) return input.split(/_+/).filter(Boolean);
} else {
if (/[\s-_]+/.test(input)) return input.split(/[\s-_]+/).filter(Boolean);
}
return input.split(/(?=[A-Z])+/).filter(Boolean);
}
Copy link
Contributor

@iuioiua iuioiua Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert this to the original version. It was good for the first-pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? I think it is an improvement that handles a bunch of edge cases concerning delimiters, whitespaces and special characters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for now. I know it won't be what's published upon release, but it allows us to explore ideas. Frankly, I'm not a fan of the SplitOptions argument here. My idea was to perhaps have a 2nd optional parameter, say separator: string | RegExp, that the user can define but has a reasonable default. Again, I'd want to play with the idea a little first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you mean. But then again, that complicates things. Imo instead of providing configurations and options for special cases to the user, special cases should be handled on the string manually, or via preprocess and then apply the case after.
split() and SplitOptions is just used internally, so this can be changed if needed. The return values of public to*Case is what most would expected now imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this function is that there's a seemingly arbitrary order of precedence in the characters that the word is split by. I.e. the word is split by spaces if spaces are what's first detected. Then dashes. Then underscores. Why in that order?

I think we should just split by all of these delimiters in one hit, as I previously suggested. I understand that it'll also split words with delimiters of mixed types. But that seems like the correct behaviour to me and is behaviour that's easier to understand. Also, that's what lodash does.

import camelCase from "npm:lodash.camelcase";
console.log(camelCase("why hello-there")); // "whyHelloThere"

text/case.ts Outdated Show resolved Hide resolved
text/_util.ts Outdated Show resolved Hide resolved
text/_util.ts Outdated Show resolved Hide resolved
text/_util.ts Outdated
Comment on lines 8 to 23
export function split(
input: string,
{ singleDelimiter = false, removeSpecialCharacters }: SplitOptions = {},
) {
if (removeSpecialCharacters) {
input = input.replaceAll(/[^a-zA-Z0-9\s-_]/g, "");
}
if (singleDelimiter) {
if (/\s+/.test(input)) return input.split(/\s+/).filter(Boolean);
if (/-+/.test(input)) return input.split(/-+/).filter(Boolean);
if (/_+/.test(input)) return input.split(/_+/).filter(Boolean);
} else {
if (/[\s-_]+/.test(input)) return input.split(/[\s-_]+/).filter(Boolean);
}
return input.split(/(?=[A-Z])+/).filter(Boolean);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this function is that there's a seemingly arbitrary order of precedence in the characters that the word is split by. I.e. the word is split by spaces if spaces are what's first detected. Then dashes. Then underscores. Why in that order?

I think we should just split by all of these delimiters in one hit, as I previously suggested. I understand that it'll also split words with delimiters of mixed types. But that seems like the correct behaviour to me and is behaviour that's easier to understand. Also, that's what lodash does.

import camelCase from "npm:lodash.camelcase";
console.log(camelCase("why hello-there")); // "whyHelloThere"

text/case_test.ts Outdated Show resolved Hide resolved
@timreichen
Copy link
Contributor Author

timreichen commented Jan 9, 2024

@iuioiua I think we could solve that if we create a separate splitter for toSentenceCase() and toTitleCase() these two can contain whitespace. The order of delimiters doesn't matter for all others. I'lll do some tests today.
But the problem @kt3k pointed out for title case about grammar structure is still valid and probably cannot be solved without a language and grammar analyzer.

@timreichen
Copy link
Contributor Author

Ok, I looked at some more implementations and would suggest the following:

  • drop toTitleCase() for now because it needs to understand grammar for good output. (ref: this implementation has a basic grammar analysis but only in english)
  • rewrite split() regexp to split without delimiter order preference as suggested.
  • handle toSentenceCase() split separately with input.split(/\s/).

Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL at some of the suggestions in my previous review.

@iuioiua
Copy link
Contributor

iuioiua commented Jan 9, 2024

  • drop toTitleCase() for now because it needs to understand grammar for good output. (ref: this implementation has a basic grammar analysis but only in English)

I'm happy to add toTitleCase() later. I think we can have further discussions on how it can be done.

  • rewrite split() regexp to split without delimiter order preference as suggested.

👍🏾

  • handle toSentenceCase() split separately with input.split(/\s/).

One question I have is why each function must split its words differently. IMO, each phrase should be split in the same way. It'd make understanding and working with all these functions much easier and still provide good DX. WDYT, @kt3k and @timreichen?

@timreichen
Copy link
Contributor Author

One question I have is why each function must split its words differently. IMO, each phrase should be split in the same way. It'd make understanding and working with all these functions much easier and still provide good DX. WDYT, @kt3k and @timreichen?

What I meant was that toSentenceCase() is different, because it keeps whitespaces and special chars while all other functions do not. We can use splitToWords() for all except toSentenceCase().

@iuioiua
Copy link
Contributor

iuioiua commented Jan 10, 2024

I had a chat with Yoshiya. He also thinks that splitToWords() is perhaps too complex. I'm still adamant that all input strings should be split in an identical manner, irrespective of the output case.

It'd be good to get this PR over the line. I suggest we do the following:

  1. Remove toSentenceCase() for now. This can be tackled later.
  2. Simplify splitToWords() and ensure all functions split words in the same way. I.e. remove options from it.

That should be good enough for the first pass. We can discuss and iterate after.

@timreichen
Copy link
Contributor Author

I had a chat with Yoshiya. He also thinks that splitToWords() is perhaps too complex. I'm still adamant that all input strings should be split in an identical manner, irrespective of the output case.

It'd be good to get this PR over the line. I suggest we do the following:

  1. Remove toSentenceCase() for now. This can be tackled later.
  2. Simplify splitToWords() and ensure all functions split words in the same way. I.e. remove options from it.

That should be good enough for the first pass. We can discuss and iterate after.

Ok, done.

Copy link
Contributor

@iuioiua iuioiua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! LGTM. Thank you again, Tim.

@iuioiua iuioiua requested a review from kt3k January 10, 2024 09:30
Copy link
Member

@kt3k kt3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iuioiua iuioiua merged commit 3601c64 into denoland:main Jan 10, 2024
12 checks passed
@timreichen timreichen deleted the text_cases branch January 12, 2024 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants