Tokenizer 1.37.0

@guillaumekln guillaumekln released this 28 Feb 15:06

New features

  • Add tokenization option allow_isolated_marks to allow combining marks to appear isolated in the tokenization output in specific conditions
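To make "isolated combining mark" concrete, here is a minimal sketch using only the Python standard library. It is not the Tokenizer's implementation, and the exact conditions under which allow_isolated_marks applies are not stated in these notes. The function name find_isolated_marks and the rule "a mark is isolated when it starts the text or follows whitespace" are assumptions made for illustration.

```python
import unicodedata


def find_isolated_marks(text):
    """Return the indices of combining marks that have no base character.

    Hypothetical helper: a mark counts as isolated here when it is the
    first character of the text or directly follows whitespace.
    """
    isolated = []
    previous = None
    for index, char in enumerate(text):
        # Unicode general categories Mn, Mc and Me are the combining marks.
        is_mark = unicodedata.category(char).startswith("M")
        if is_mark and (previous is None or previous.isspace()):
            isolated.append(index)
        previous = char
    return isolated


# U+0301 (combining acute accent) attached to "e" is not isolated.
attached = find_isolated_marks("e\u0301")
# The same mark at the start of the text has no base character.
leading = find_isolated_marks("\u0301 a")
```

Text containing such marks is the kind of input where a tokenizer has to decide whether a mark may stand alone as a token or must stay attached to a neighbouring character.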

Fixes and improvements

  • Fix infinite loop when the text contains an invalid Unicode character
  • Fix segmentation fault when the BPELearner does not find any pairs of characters in the tokenized data
  • [Python] Update ICU to 72.1