Tokenizer 1.37.0

guillaumekln released this 28 Feb 15:06

New features

Add tokenization option allow_isolated_marks to allow combining marks to appear isolated in the tokenization output under specific conditions
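
This sketch shows what an "isolated" combining mark looks like. A combining mark is a Unicode character in the general categories Mn, Mc or Me, for example U+0301 COMBINING ACUTE ACCENT. It normally attaches to the preceding base character. The helper below uses only the Python standard library and is hypothetical, not part of the Tokenizer API. It flags tokens made only of such marks, meaning marks not attached to any base character.

```python
import unicodedata

# Unicode general categories for combining marks:
# Mn (nonspacing), Mc (spacing combining), Me (enclosing).
COMBINING_CATEGORIES = {"Mn", "Mc", "Me"}


def is_combining_mark(char: str) -> bool:
    """Return True if the single character is a combining mark."""
    return unicodedata.category(char) in COMBINING_CATEGORIES


def is_isolated_mark_token(token: str) -> bool:
    """Hypothetical helper (not a Tokenizer API): True if a non-empty token
    consists only of combining marks, i.e. no base character carries them."""
    return len(token) > 0 and all(is_combining_mark(c) for c in token)


if __name__ == "__main__":
    # "e" followed by U+0301 renders as an accented "e"; the mark is attached.
    # The second token is the bare mark on its own, i.e. an isolated mark.
    for tok in ["e\u0301", "\u0301", "abc"]:
        print(repr(tok), is_isolated_mark_token(tok))
```

According to this release note, enabling allow_isolated_marks lets tokens like the second one appear in the output under certain conditions. The note does not describe those conditions.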

Fixes and improvements

Fix infinite loop when the text contains an invalid Unicode character
Fix segmentation fault when the BPELearner does not find any pairs of characters in the tokenized data
[Python] Update ICU to 72.1
