Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support unicode for subject-case rule #3574

Closed
1 of 4 tasks
amariq opened this issue Mar 29, 2023 · 14 comments
Closed
1 of 4 tasks

feat: support unicode for subject-case rule #3574

amariq opened this issue Mar 29, 2023 · 14 comments
Labels

Comments

@amariq
Copy link
Contributor

amariq commented Mar 29, 2023

Expected Behavior

Would it be okay to allow the subject-case rule handle not only latin but any other language alphabet?

Current Behavior

Currently, the subject-case rule skips validation if message doesn't start with a latin alphabet character (only a to z with case irrelevant, as of now).

It became like that 5 years ago with the following change made — @commitlint/rules/src/subject-case.ts#L14.

Affected packages

  • cli
  • core
  • prompt
  • config-angular

Possible Solution

The solution, as I see it, would only require to change the RegExp that is used to filter invalid subject messages.

So, change what is used now (which only includes latin alphabet):

subject.match(/^[a-z]/i)

to a pattern with unicode support:

subject.match(/^\p{L}/ui)

where \p{L} means any kind of letter from any language and the u flag enables it.

Context

Basically, if a commit subject is written in non-English language then subject-case rule is just skipped.

This feature would benefit someone who (for one reason or another) writes commit messages (or, rather, subject) using different language than English but still wants to follow the commit convention.

@amariq amariq added the feature label Mar 29, 2023
@escapedcat
Copy link
Member

Sure, sounds reasonable. Wanna create a PR and extend the test with a non-English language?

@amariq
Copy link
Contributor Author

amariq commented Mar 29, 2023

Oh yeah, gladly!

I'd do it right away, actually, but decided to make an issue first to ask for opinions in case someone would be against modifying the old behaviour 😇. Besides, if I understood it right, the rules can be added/overridden via plugins, so it could be an alternative way of doing it.

@amariq
Copy link
Contributor Author

amariq commented Mar 30, 2023

@escapedcat
Btw, I wonder what such change should really be — feat or fix?

It seems like a feature since it adds what never was there (support for unicode).

But it also seems to be a fix (and breaking at that, probably?) since it "patches" overly-strict RegExp, allowing to process more use-cases again as it was before that alphabetic symbol restriction was introduced (which purpose was mainly on skipping semver string patterns if I catched it right).

What do you think?

@escapedcat
Copy link
Member

Hm, yeah. Hard to decide for me as well. I feel like as long as all current tests are passing it's non-breaking, does that make sense? We could add it as feature I guess.

@amariq
Copy link
Contributor Author

amariq commented Mar 30, 2023

Yup, tests are passing. But those future changes might affect some scenarios that could be relying on fact that messages starting with some arbitrary symbols always considered valid. Or maybe I'm just over-thinking it... 😅

Can we ping someone who manages releases or something, maybe? To help sort this out.

I'll mark it as a feature in the meantime: commit is here.

PS. What I can think of:

  • English (Latin alphabet) letters: no changes
  • Numeric or other actually special symbols: no changes
  • non-English (non-Latin alphabet) letters: are now not preventing the rule from running, thus check results may vary

@escapedcat
Copy link
Member

True, I get your point.
Let's try to summon @knocte or @dangreen

@amariq
Copy link
Contributor Author

amariq commented Mar 31, 2023

I dug some more while we waiting and looked at the commit that added [a-z] check.

The reasoning was just to prevent case check fail on version-like strings (ie. 1.0.0), not to limit the character set it seems. But that blocked any strings which start with non-Latin letter from actually being processed anyway.

I think the strings like what would fail the check not because they don't start with a letter but because they only consist of symbols that do not have a case, so to speak (like numbers, punctuation symbols and even 'letters' in some languages, etc).

So, instead of messing with Unicode, we could probably simply replace RegExp with subject.toLowerCase() === subject.toUpperCase() to filter out 'case-less' strings that mess up the check result. But it would have even more impact on backward compatibility than Unicode-solution since then even strings like 1.0.0-beta.1 would've get checked and may lead to a different result... 🤔

@escapedcat
Copy link
Member

Also tagging @gilmoreorless maybe you have some opinion as well.

@gilmoreorless
Copy link
Contributor

Ha, that's what I get for having opinions on breaking changes; suddenly people think I have more opinions about breaking changes! 😆

So, keeping in mind that I'm not a maintainer of this project (and have only really looked at the code that relates to plugins, not the core rules themselves), here are my thoughts:

  1. First up, yay for proper Unicode support — nice work @amariq!
  2. I see the supporting of extended Unicode character ranges as an enhancement, so a feat scope. I get the concerns about possible breakages, but I think it's unlikely in reality. If someone's commit format does get broken by this change, it indicates that their config probably wasn't right in the first place, and it wasn't doing what they thought it was.
    1. Case in point: I've worked at several places that enforce an issue tracker ticket at the start of every commit, in the form of feat: [ABC-123] Actual message. This format completely bypasses the subject-case rule because the first character is a symbol [. So the config is kind of useless and varies across projects. If commitlint was ever updated to use only the text after the issue reference, some projects would break — but it would be because their config was already incorrect and no-one noticed. 😉
  3. Do any of the other *-case rules support extended Unicode characters? It could be confusing to have support only in subject-case but nothing else.

@amariq
Copy link
Contributor Author

amariq commented Apr 8, 2023

It could be confusing to have support only in subject-case but nothing else

Yep, I agree.
I checked other case-related rules and found that only header-case rule has the same restriction as the subject-case had... 🤔

This format completely bypasses the subject-case rule

Yes, I also been thinking about that.

And I think it could be fixed by moving always/never condition handling inside the ensure-case function.

That way, we would be able to check if a message can even be validated properly in the first place — that is, it does not consist of case-less symbols only nor contain one at the positions to be transformed — because comparing '123' with never upperFirst('123') otherwise will obviously never pass anyway. Or even try to strip/replace sequences of case-less symbols out of original message to only check the transformation of actual letters.

Plus, as a bonus, no more worrying about Unicode here and there. But since it would be better to refactor all case-related rules at once and me having missed header-case back then, I decided to start with 'fixing' the subject-case. 😇

@escapedcat
Copy link
Member

Ha, that's what I get for having opinions on breaking changes; suddenly people think I have more opinions about breaking changes! 😆

😆 It's because you do it so well! (btw can I quote this in a tweet?)

[...] I'm not a maintainer of this project [...]

Wanna talk?

Jokes aside, thanks for your feedback, apprecaited and helpful! ❤️

@amariq I guess then we could continue and merge your fetaure, right?

@gilmoreorless
Copy link
Contributor

It's because you do it so well! (btw can I quote this in a tweet?)

Haha go for it.

Wanna talk?

I've already fallen for that trap once; not again! 😆

@escapedcat
Copy link
Member

I've already fallen for that trap once; not again! 😆

image

@amariq
Copy link
Contributor Author

amariq commented Apr 10, 2023

I guess then we could continue and merge your fetaure, right?

If everything seems fine to you then yeah, I think its good to go 😀👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

3 participants