v0.8.0
What's New
Token-aware Merging for RecursiveChunker
- Added merge_splits function to Rust, Python, and JavaScript bindings
- Equivalent to Chonkie's Cython _merge_splits function
- Supports whitespace-aware merging (n-1 join tokens for n segments)
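The merging behaviour described above can be illustrated with a small pure-Python sketch. This is not the library's implementation: the name merge_splits_sketch, the MergeResult tuple, and the combine_whitespace parameter are hypothetical, and the greedy strategy is an assumption chosen because it reproduces the documented example output. It assumes that indices holds the exclusive end position of each merged group and that joining n segments costs n-1 extra tokens when whitespace is counted.

```python
from typing import List, NamedTuple, Sequence


class MergeResult(NamedTuple):
    # Hypothetical result type mirroring the fields shown in the release notes.
    indices: List[int]       # exclusive end index of each merged group
    token_counts: List[int]  # token count of each merged group


def merge_splits_sketch(
    token_counts: Sequence[int],
    chunk_size: int,
    combine_whitespace: bool = False,  # assumed name for the whitespace flag
) -> MergeResult:
    """Greedily merge adjacent segments without exceeding chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    indices: List[int] = []
    merged: List[int] = []
    current = 0      # tokens in the group being built
    group_len = 0    # number of segments in the group being built

    for i, count in enumerate(token_counts):
        # One join token between neighbours: n-1 joins for n segments.
        join = 1 if (combine_whitespace and group_len > 0) else 0
        if group_len > 0 and current + join + count > chunk_size:
            # Close the current group before segment i and start a new one.
            indices.append(i)
            merged.append(current)
            current = count
            group_len = 1
        else:
            current += join + count
            group_len += 1

    if group_len > 0:
        # Flush the final group.
        indices.append(len(token_counts))
        merged.append(current)

    return MergeResult(indices, merged)


result = merge_splits_sketch([1, 1, 1, 1, 1, 1, 1], chunk_size=3)
```

In this sketch a single segment larger than chunk_size is kept as its own group rather than split. Whether the shipped function behaves the same way in that edge case is not stated in these notes.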
Usage
Rust:
use chunk::merge_splits;
let token_counts = vec![1, 1, 1, 1, 1, 1, 1];
let result = merge_splits(&token_counts, 3, false);
// result.indices = [3, 6, 7]
// result.token_counts = [3, 3, 1]
Python:
from chonkie_core import merge_splits
result = merge_splits([1, 1, 1, 1, 1, 1, 1], chunk_size=3)
# result.indices = [3, 6, 7]
# result.token_counts = [3, 3, 1]
JavaScript:
import { init, merge_splits } from '@chonkiejs/chunk';
await init();
const result = merge_splits([1, 1, 1, 1, 1, 1, 1], 3);
// result.indices = [3, 6, 7]
// result.tokenCounts = [3, 3, 1]
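
The returned indices can be used to rebuild the merged chunks from the original segments. The sketch below assumes, based on the example output [3, 6, 7] for seven segments, that each index is the exclusive end position of a merged group. The helper group_by_end_indices and the sample segments are hypothetical and not part of the bindings.

```python
from typing import List, Sequence


def group_by_end_indices(segments: Sequence[str], indices: Sequence[int]) -> List[str]:
    """Concatenate segments into chunks, treating each index as an exclusive end."""
    chunks: List[str] = []
    start = 0
    for end in indices:
        chunks.append("".join(segments[start:end]))
        start = end
    return chunks


# Seven one-token segments, matching the shape of the example above.
segments = ["a ", "b ", "c ", "d ", "e ", "f ", "g"]
chunks = group_by_end_indices(segments, [3, 6, 7])
# chunks == ["a b c ", "d e f ", "g"]
```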