Closed
Labels
Type: Feature Request, expression language, localization
Description
Although the fingerprint keyers currently do diacritic folding for alphabetic characters, they don't correctly handle all Unicode whitespace and punctuation characters.
Proposed solution
Both the FingerprintKeyer and NGramFingerprintKeyer should be extended to correctly handle all Unicode whitespace characters (e.g. em space, NBSP, ZWSP) and punctuation.
Additionally, the (almost) duplicate code in the N-gram keyer should be removed in favor of the common methods from the fingerprint keyer, which would make maintenance easier and less bug-prone.
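To make the proposal concrete, here is a minimal sketch of what Unicode-aware normalization shared by both keyers might look like. It is not the existing OpenRefine code: the class name `UnicodeKeyerSketch` and its methods are hypothetical, and diacritic folding is approximated with NFD decomposition plus mark stripping, which may differ from the folding the keyers currently use.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical sketch, not OpenRefine's actual keyer classes.
final class UnicodeKeyerSketch {
    // Combining marks left behind after canonical decomposition (NFD).
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");
    // Every Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    // Symbols such as '$' or '+' are category S and are NOT covered here.
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}+");
    // ASCII whitespace, Unicode separators (NBSP, em space, ...) and U+200B,
    // which is a format character (Cf) rather than a separator.
    private static final Pattern WHITESPACE = Pattern.compile("[\\s\\p{Z}\\u200B]+");

    private UnicodeKeyerSketch() {}

    // Shared step: fold diacritics, lowercase, drop punctuation.
    static String normalize(String s) {
        String t = Normalizer.normalize(s, Normalizer.Form.NFD);
        t = MARKS.matcher(t).replaceAll("");
        t = t.toLowerCase(Locale.ROOT);
        return PUNCTUATION.matcher(t).replaceAll("");
    }

    // Fingerprint: split on any whitespace, then sort and de-duplicate tokens.
    static String fingerprint(String s) {
        TreeSet<String> tokens = new TreeSet<>();
        for (String token : WHITESPACE.split(normalize(s))) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return String.join(" ", tokens);
    }

    // N-gram fingerprint: reuse normalize(), remove all whitespace, then
    // sort and de-duplicate the n-grams (indexed by UTF-16 code unit).
    static String ngramFingerprint(String s, int n) {
        String compact = WHITESPACE.matcher(normalize(s)).replaceAll("");
        TreeSet<String> grams = new TreeSet<>();
        for (int i = 0; i + n <= compact.length(); i++) {
            grams.add(compact.substring(i, i + n));
        }
        return String.join("", grams);
    }
}
```

Under these assumptions, `fingerprint("Caf\u00E9\u2003du\u00A0Monde!")` and `fingerprint("caf\u00E9 du monde")` would produce the same key, since the em space and NBSP are treated as ordinary token separators. Because both methods go through the same `normalize` helper, a fix to whitespace or punctuation handling only has to be made once.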