v0.10.1

@chonknick chonknick released this 30 Mar 06:13
· 7 commits to main since this release

What's New

.patterns() API now available in Python and JavaScript

The multi-byte pattern support from v0.10.0 is now exposed in all binding layers:

Python:

from chonkie_core import Chunker, chunk, chunk_offsets

# Composable with delimiters
chunks = list(Chunker(text, delimiters="\n.?!", patterns=["。", "，", "！"]))

# Convenience function
for c in chunk(text, delimiters=".", patterns=["。"]):
    print(bytes(c))

# Offsets
offsets = chunk_offsets(text, delimiters=".", patterns=["。"])
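
For readers wondering why patterns are a separate argument from delimiters: a character such as "。" occupies three bytes in UTF-8, so it cannot be matched as a single delimiter byte. The sketch below is plain Python written for illustration only. It is not chonkie_core's implementation, the helper name `split_after` is hypothetical, and it ignores chunk-size targets and any other options the real library applies. It only shows the idea of checking multi-byte patterns alongside single-byte delimiters.

```python
# Illustrative sketch only: NOT chonkie_core's implementation.
# `split_after` is a hypothetical helper invented for this example.

def split_after(data: bytes, delimiters: bytes, patterns: list[bytes]) -> list[bytes]:
    """Split `data` right after each single-byte delimiter or multi-byte pattern."""
    chunks = []
    start = 0
    i = 0
    while i < len(data):
        match_len = 0
        # Try the multi-byte patterns first.
        for p in patterns:
            if data.startswith(p, i):
                match_len = len(p)
                break
        # Fall back to single-byte delimiters (data[i] is an int byte value).
        if match_len == 0 and data[i] in delimiters:
            match_len = 1
        if match_len:
            i += match_len
            chunks.append(data[start:i])  # keep the delimiter with its chunk
            start = i
        else:
            i += 1
    if start < len(data):
        chunks.append(data[start:])  # trailing text with no delimiter
    return chunks


pattern = "。".encode("utf-8")
print(len(pattern))  # the ideographic full stop is 3 bytes in UTF-8

text = "Hello. 你好。世界".encode("utf-8")
parts = split_after(text, b".", [pattern])
for part in parts:
    print(part.decode("utf-8"))
```

Because UTF-8 lead bytes never appear in the middle of another character, matching the encoded pattern at the byte level does not split a different character in half.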

JavaScript:

import { chunk, chunk_offsets, Chunker } from '@chonkiejs/chunk';

// Generator
for (const c of chunk(text, { delimiters: ".", patterns: ["。", "，"] })) { ... }

// Offsets
const offsets = chunk_offsets(text, { delimiters: ".", patterns: ["。"] });

// Class
const chunker = new Chunker(text, { delimiters: ".", patterns: ["。"] });

This release also fixes a bug in the JS wrapper: the consecutive and forwardFallback options were not being passed through in the non-pattern code path.