Add support for canonically normalized Unicode form translations in matching/search sub-commands #8

Open
chrissimpkins opened this issue Jul 14, 2020 · 1 comment
chrissimpkins commented Jul 14, 2020

Add optional support for translating the Unicode code points in text files to either NFC (composed) or NFD (decomposed) normalized form in commands that match on the text contents of files. Some characters can be encoded either as a single fully composed code point or as a decomposed sequence of separate code points (e.g., a base character followed by a combining mark), and both encodings are canonically equivalent. Normalizing to one form establishes a standard code point sequence for these characters and reduces encoding variation in the text data. This support will make text matches consistent across canonically equivalent forms of composed characters by letting the user choose the underlying code point form used for matching, separately from the pattern they use to define the match on the command line.
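A minimal Python sketch of the idea, assuming a regex-based matcher. It is not the project's implementation. `find_matches` and its `form` parameter are hypothetical names chosen for illustration. `unicodedata.normalize` is Python's standard-library normalization function. The example uses "é" written two ways: the precomposed code point U+00E9 and the decomposed sequence "e" + U+0301 (COMBINING ACUTE ACCENT).

```python
import re
import unicodedata
from typing import List, Optional


def find_matches(pattern: str, text: str, form: Optional[str] = None) -> List[str]:
    """Return all regex matches of `pattern` in `text` (hypothetical helper).

    When `form` is "NFC" or "NFD", both the pattern and the text are
    normalized to that form first, so canonically equivalent sequences
    (e.g. U+00E9 vs. "e" + U+0301) compare equal.
    """
    if form is not None:
        if form not in ("NFC", "NFD"):
            raise ValueError(f"unsupported normalization form: {form}")
        pattern = unicodedata.normalize(form, pattern)
        text = unicodedata.normalize(form, text)
    return re.findall(pattern, text)


if __name__ == "__main__":
    composed_pattern = "caf\u00e9"               # "café" with precomposed U+00E9
    decomposed_text = "Menu: cafe\u0301 au lait"  # "café" as "e" + U+0301

    # Without normalization the code point sequences differ, so nothing matches.
    print(find_matches(composed_pattern, decomposed_text))
    # Normalizing both sides to NFC (or NFD) makes the equivalent forms match.
    print(find_matches(composed_pattern, decomposed_text, form="NFC"))
    print(find_matches(composed_pattern, decomposed_text, form="NFD"))
```

Two design caveats for this sketch: any match offsets refer to the normalized text rather than the original file contents, and normalizing a regex pattern is safe for literal text but could change the meaning of character classes that contain combining marks.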

chrissimpkins (owner, issue author) commented:

For reference, see the Unicode normalization FAQ: https://www.unicode.org/faq/normalization.html

chrissimpkins added this to the v0.5.0 milestone Jul 15, 2020