Implementation Of Case-Insensitive Mode #27
-
They are not mutually exclusive, and many RegEx libraries offer both solutions. The parameter sets the base case-sensitivity mode, whereas the inline modifier handles exceptions within complex RegExs. Usually the default mode is case-sensitive, so I'd suggest the parameter should be
I think that's the most popular modifier, so I'd stick with that. Besides switching the case-sensitivity context, this also lets you define groups with their own case-sensitivity, e.g.:
where the modifier only applies to the group within the parentheses. There's a further advantage in having both a parameter option and an inline modifier: the same RegEx definition can be used in different contexts (i.e. with different parameter settings), where the inline modifiers enforce a specific sensitivity only in some places, leaving the main context open to reuse according to the parameter.
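The interplay between a base-mode parameter and inline modifiers can be sketched as follows. This uses Python's `re` module purely as an illustration, because it supports both a flag (`re.IGNORECASE`) and group-scoped modifiers (`(?i:...)`, `(?-i:...)`); it is not the engine discussed in this thread, and the patterns and the `matches` helper are made up for the example.

```python
import re

# Hypothetical pattern: "hello" is always case-insensitive,
# while " World" follows whatever base mode the caller selects.
GREETING = r"(?i:hello) World"

# Hypothetical pattern: "ABC" is always case-sensitive,
# while the "id-" prefix follows the base mode.
TAGGED_ID = r"id-(?-i:ABC)"


def matches(pattern: str, text: str, ignore_case: bool = False) -> bool:
    """Return True if the whole text matches; ignore_case sets the base mode."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, text, flags) is not None


if __name__ == "__main__":
    # Same pattern, two base modes: only the part outside the group changes behaviour.
    print(matches(GREETING, "HELLO World"))                     # case-sensitive base
    print(matches(GREETING, "hello world"))                     # case-sensitive base
    print(matches(GREETING, "hello world", ignore_case=True))   # case-insensitive base
    print(matches(TAGGED_ID, "ID-abc", ignore_case=True))       # group stays case-sensitive
```

The same pattern string is reused unchanged in both modes, which is the reuse argument made above: the inline modifier pins down only the parts that must not vary.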
These symbols are used for named capturing groups in many RegEx engines, so I'd rather leave them free for future use. Also, I'd rather use the

Case Sensitive How-to

How are you going to implement case-sensitiveness? For the base Latin characters (i.e. those of the ASCII set, with no accents or diacritics) there's always the old bitwise trick
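One common form of such a bitwise trick relies on the fact that in ASCII an uppercase letter and its lowercase counterpart differ only in bit 0x20. The sketch below is an assumption about what is meant, written in Python for illustration; the function names are hypothetical. Note the range check: applying the bit blindly would also alter non-letters such as `[`.

```python
def ascii_fold(ch: str) -> str:
    """Lowercase a single character if it is an ASCII letter A-Z, else return it unchanged."""
    code = ord(ch)
    if 0x41 <= code <= 0x5A:        # 'A'..'Z'
        return chr(code | 0x20)     # setting bit 0x20 yields 'a'..'z'
    return ch


def ascii_equal_ignore_case(a: str, b: str) -> bool:
    """Compare two strings, ignoring case for ASCII letters only."""
    return len(a) == len(b) and all(
        ascii_fold(x) == ascii_fold(y) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    print(ascii_equal_ignore_case("RegEx", "REGEX"))
    # Accented letters are outside the A-Z range, so they are compared as-is.
    print(ascii_equal_ignore_case("É", "é"))
    # Without the range check the trick corrupts punctuation.
    print(chr(ord("[") | 0x20))
```

This is cheap and needs no tables, but as the example with `É` shows, it does nothing for letters outside the ASCII range.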
But if you need to take into account accented letters, special Latin letters with diacritics, etc., then things are not so simple. Not to mention the complexity of supporting case-sensitivity across every script in Unicode that distinguishes letter case. You might want to have a look at the Unicode/ICU documentation and libraries regarding the various algorithms for implementing case-sensitiveness:
In general, case operations in Unicode are rather complex, and there are various levels of support (and corresponding flags/options) for this feature. Even if you're not planning to add full Unicode-aware support for case-sensitive operations, it would still be good to have a clear understanding of how Unicode classifies these operations so that you may:
-
I have now looked at everything, made a decision and created an issue (#28) where the implementation details are described. Thanks, @tajmone.
-
I thought about it again, and I think I'll leave out the syntax
I also wonder if a parameter is really necessary because a RegEx can be reused this way:
-
When I took a closer look at Unicode's case-folding, I realized that case-folding makes the characters in the match not only independent of upper and lower case, but also of more character variants. Example:
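One way to observe this effect is with Python's built-in `str.casefold()`, which applies Unicode full case folding. This is only an illustration in a different language, not the code from this project, and the `caseless_equal` helper is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings after Unicode full case folding."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # The German sharp s folds to "ss", so it matches a two-letter sequence.
    print(caseless_equal("STRASSE", "straße"))
    # Plain lowercasing does not make these two equal.
    print("STRASSE".lower() == "straße".lower())
    # The long s (U+017F) folds to an ordinary "s".
    print(caseless_equal("\u017f", "S"))
    # The Kelvin sign (U+212A) folds to an ordinary "k".
    print(caseless_equal("\u212a", "k"))
    # The final sigma folds to the regular small sigma.
    print(caseless_equal("ς", "Σ"))
```

So a case-folded comparison treats several distinct code points as the same letter, and in the `ß` case even changes the length of the text being compared, which a matcher has to account for.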
The feature is now implemented.
-
@tajmone
I am currently thinking about how to implement enabling/disabling case-insensitive mode.
My first thought was to give the Create(regEx$) function an optional caseInsensitive parameter (#True/#False), but it probably makes more sense to change the mode within the RegEx.
https://www.regular-expressions.info/modifiers.html
According to the website, other RegEx engines use (?i) to enable and (?-i) to disable case-insensitive mode. But the other RegEx engines also support some other modes that justify such a syntax.
Maybe this syntax would be good: RegEx between < and > is case-insensitive.
What do you think?