Add support for canonically normalized Unicode form translations in matching/search sub-commands #8

chrissimpkins · 2020-07-14T17:48:26Z

Add optional support for normalizing the Unicode code points of text file contents to either NFC (composed) or NFD (decomposed) form in commands that match on the text contents of files. This will reduce text encoding variation by establishing a single code point sequence for characters that can be encoded either as one fully composed code point or as a decomposed sequence (e.g., a base character followed by its combining marks), where both encodings are canonically equivalent. It will make matches consistent across canonically equivalent forms by letting the user choose the normalization form applied to the file text, so that it agrees with the code points in the pattern that they enter on the command line.
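A minimal sketch of the idea, using Python's standard `unicodedata` module rather than this project's code. The function name `contains_normalized` and its `form` parameter are hypothetical, and a plain substring test stands in for whatever matching the sub-commands actually perform. Only the file text is normalized here; the pattern is left as typed.

```python
import unicodedata
from typing import Optional


def contains_normalized(text: str, pattern: str, form: Optional[str] = None) -> bool:
    """Return True if pattern occurs in text.

    If form is "NFC" or "NFD", the text is normalized to that form first.
    The pattern is deliberately left untouched (an assumption of this sketch).
    """
    if form is not None:
        if form not in ("NFC", "NFD"):
            raise ValueError("form must be 'NFC', 'NFD', or None")
        text = unicodedata.normalize(form, text)
    return pattern in text


# File text stored in decomposed form: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed_text = "cafe\u0301 au lait"
# Pattern typed in composed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed_pattern = "caf\u00e9"

# Without normalization the canonically equivalent strings do not match.
print(contains_normalized(decomposed_text, composed_pattern))          # False
# Normalizing the file text to NFC makes it agree with the composed pattern.
print(contains_normalized(decomposed_text, composed_pattern, "NFC"))   # True
# Normalizing to NFD keeps the text decomposed, so the composed pattern still misses.
print(contains_normalized(decomposed_text, composed_pattern, "NFD"))   # False
```

The last case shows why the form would need to be user-selectable: the match only succeeds when the form chosen for the file text agrees with the code points in the pattern.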

chrissimpkins · 2020-07-15T16:45:20Z

FAQ reference in Unicode documentation: https://www.unicode.org/faq/normalization.html

chrissimpkins added this to the v0.5.0 milestone Jul 15, 2020
