v0.10.1
What's New
.patterns() API now available in Python and JavaScript
The multi-byte pattern support from v0.10.0 is now exposed in all binding layers:
Python:
from chonkie_core import Chunker, chunk, chunk_offsets
# Composable with delimiters
chunks = list(Chunker(text, delimiters="\n.?!", patterns=["。", ",", "!"]))
# Convenience function
for c in chunk(text, delimiters=".", patterns=["。"]):
    print(bytes(c))
# Offsets
offsets = chunk_offsets(text, delimiters=".", patterns=["。"])
JavaScript:
import { chunk, chunk_offsets, Chunker } from '@chonkiejs/chunk';
// Generator
for (const c of chunk(text, { delimiters: ".", patterns: ["。", ","] })) { ... }
// Offsets
const offsets = chunk_offsets(text, { delimiters: ".", patterns: ["。"] });
// Class
const chunker = new Chunker(text, { delimiters: ".", patterns: ["。"] });
This release also fixes an existing bug in the JS wrapper where the consecutive and forwardFallback options weren't passed through in the non-pattern code path.
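For readers wondering why patterns are a separate option from delimiters: a delimiter is a single byte, while a character such as "。" takes three bytes in UTF-8, so it can only be matched as a byte sequence. The sketch below illustrates that distinction with a deliberately naive splitter. It is not chonkie_core's implementation and it ignores chunk sizing entirely; `split_after` is a hypothetical helper written only for this example.

```python
from collections.abc import Sequence


def split_after(
    data: bytes,
    delimiters: bytes = b"",
    patterns: Sequence[bytes] = (),
) -> list[bytes]:
    """Naive illustration (hypothetical helper, not the library's algorithm):
    cut `data` right after every single-byte delimiter or multi-byte pattern."""
    patterns = [p for p in patterns if p]  # drop empty patterns to avoid a stall
    chunks: list[bytes] = []
    start = 0
    i = 0
    while i < len(data):
        # Multi-byte patterns must be compared as whole byte sequences.
        matched = next((p for p in patterns if data.startswith(p, i)), None)
        if matched is not None:
            i += len(matched)
        elif data[i] in delimiters:
            # Single-byte delimiters need only a one-byte comparison.
            i += 1
        else:
            i += 1
            continue
        chunks.append(data[start:i])
        start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing text with no terminator
    return chunks


ideographic_stop = "。".encode("utf-8")
text = "Hello. 你好。World".encode("utf-8")
chunks = split_after(text, delimiters=b".", patterns=[ideographic_stop])

print(len(ideographic_stop))  # 3 bytes, so it cannot be a single-byte delimiter
for c in chunks:
    print(c.decode("utf-8"))
```

Concatenating the resulting chunks reproduces the input bytes, which is the property that makes offset-based output possible in this kind of splitter.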