v0.31.0
Breaking Changes
- Updated tokenizers to v0.23 and tiktoken-rs to v0.12. Some Hugging Face tokenizers ship with truncation settings, and with tokenizers v0.23 such a tokenizer may report the truncated size rather than exposing overflow encodings. If chunk sizes should reflect the full input text, disable truncation by calling tokenizer.with_truncation(None) before constructing a splitter.
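
To illustrate why a truncating tokenizer can mislead a splitter, here is a minimal, standard-library-only sketch. ToyTokenizer, token_count, and fits are hypothetical stand-ins invented for this example; they are not the tokenizers or text-splitter API, and the whitespace "tokenization" is an assumption made purely to keep the sketch self-contained. Only the with_truncation(None) idea comes from the note above.

```rust
/// Hypothetical stand-in for a tokenizer that may carry a truncation limit.
/// This is NOT the `tokenizers` crate API; it only models the behavior
/// described in the release note.
struct ToyTokenizer {
    /// `Some(n)` caps every encoding at `n` tokens; `None` disables truncation.
    truncation: Option<usize>,
}

impl ToyTokenizer {
    /// Replaces the truncation setting, loosely mirroring the idea of
    /// calling `with_truncation(None)` on a real tokenizer.
    fn with_truncation(&mut self, limit: Option<usize>) -> &mut Self {
        self.truncation = limit;
        self
    }

    /// Counts whitespace-separated "tokens", capped at the truncation
    /// limit when one is set.
    fn token_count(&self, text: &str) -> usize {
        let full = text.split_whitespace().count();
        match self.truncation {
            Some(limit) => full.min(limit),
            None => full,
        }
    }
}

/// Hypothetical splitter-side check: does `text` fit in a chunk of
/// at most `max_tokens` tokens, according to the tokenizer?
fn fits(tokenizer: &ToyTokenizer, text: &str, max_tokens: usize) -> bool {
    tokenizer.token_count(text) <= max_tokens
}
```

With truncation set to Some(4), an eight-word text is reported as 4 tokens, so fits would wrongly accept it for a 5-token chunk. After with_truncation(None), the same text is counted as 8 tokens and is correctly rejected.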
Full Changelog: v0.30.1...v0.31.0