v0.2.0
A major release. The project was rebranded from `pinyin-sort` to `hanzi-sort`,
and the public library API was reshaped around a pluggable `Collator` trait.
Three new opt-in collators (Cantonese Jyutping, Mandarin Zhuyin, Kangxi
Radical) join Pinyin and Strokes, and the CLI gained Unix-friendly stdin /
`-r` / `-u` behavior plus shell completions. Tone3 data normalization now
treats neutral tone as 5, fixing a long-standing dictionary-order bug for
characters like 了. See the breakdown below.
Highlights
- 5 sort schemes: Pinyin, Strokes, Jyutping, Zhuyin, Radical (all but
  Pinyin/Strokes are opt-in cargo features).
- Stable sort guarantee: equal-key inputs preserve input order in every
  collator.
- Stdin-friendly CLI: `cat names.txt | hanzi-sort` works the way Unix
  users expect.
- Pluggable collator API: the trait is exposed; downstream Rust code can
  add its own collators without forking the crate.
Added
Library
- `Collator` trait, `Mapped<T>`, `CharToken<T>`, `SortKey<T>`, `sort_key_of`,
  and `sort_strings_with`: a pluggable per-character sort strategy
  abstraction.
- `AnyCollator` enum for runtime collator selection across all enabled
  schemes.
- `PinyinCollator` (renamed from `PinyinContext`), `StrokesCollator`,
  `JyutpingCollator`, `ZhuyinCollator`, `RadicalCollator`.
- `PinyinCollator::with_override` (and the analogous fallible builder for
  any future override-aware collators) for explicit override loading.
- `RuntimeConfig::with_unique` and `RuntimeConfig::with_reverse`
  builder-style setters.
- `InputSource::Stdin` variant on the public `InputSource` enum.
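To illustrate what "pluggable" means here, the following is a minimal sketch of a trait-based collator and a generic sort function. The signature of the real `Collator` trait is not given in these notes, so `SketchCollator`, `ByCharCount`, and `sort_with` are hypothetical stand-ins and not the crate's API.

```rust
/// Hypothetical stand-in for a pluggable collator: maps a string to a
/// comparable sort key. The real `Collator` trait in hanzi-sort may differ.
trait SketchCollator {
    type Key: Ord;
    fn key_of(&self, s: &str) -> Self::Key;
}

/// Example collator: orders strings by character count, then by code points.
struct ByCharCount;

impl SketchCollator for ByCharCount {
    type Key = (usize, Vec<char>);

    fn key_of(&self, s: &str) -> Self::Key {
        (s.chars().count(), s.chars().collect())
    }
}

/// Sorts with any collator. `sort_by_cached_key` is a stable sort, so inputs
/// with equal keys keep their original relative order.
fn sort_with<C: SketchCollator>(collator: &C, items: &mut [String]) {
    items.sort_by_cached_key(|s| collator.key_of(s));
}
```

Downstream code would add a scheme by implementing the trait for its own type and passing it to the generic sort function, with no change to the sorting code itself.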
CLI
- `-r` / `--reverse` flag.
- `-u` / `--unique` flag (adjacent dedup; `unique` is applied before
  `reverse`).
- `hanzi-sort completions <shell>` subcommand emitting a completion script
  for bash / zsh / fish / powershell / elvish (via `clap_complete`).
- `--help` examples block (after the options list).
- `--help` displays default values for every printable option.
- Stdin fallback when neither `--file` nor `--text` is provided and stdin
  is non-TTY; `-f -` accepted as a stdin alias.
- CLI rejects `--config` together with any non-pinyin `--sort-by` (override
  is pinyin-specific).
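The ordering of `-u` and `-r` can be sketched as follows. This assumes adjacent dedup runs on the already-sorted output (as with `sort -u`), and it uses plain `Ord` on `String` as a stand-in for a real collator; `sort_unique_reverse` is a hypothetical helper, not a function from the crate.

```rust
/// Sketch of the documented flag order: sort, then drop adjacent duplicates
/// (`-u`), then reverse (`-r`). Plain string ordering stands in for a
/// collator here.
fn sort_unique_reverse(mut lines: Vec<String>, unique: bool, reverse: bool) -> Vec<String> {
    lines.sort(); // stable sort
    if unique {
        lines.dedup(); // removes consecutive equal elements only
    }
    if reverse {
        lines.reverse();
    }
    lines
}
```

Because dedup only removes consecutive duplicates, applying it before the reversal and after the sort gives the same set of lines either way; the fixed order simply makes the composition predictable.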
New collators (opt-in)
- `--sort-by jyutping` / `--features collator-jyutping`: Cantonese Jyutping
  from Unihan `kCantonese` (~30k characters covered).
- `--sort-by zhuyin` / `--features collator-zhuyin`: Mandarin Zhuyin /
  Bopomofo derived from the bundled pinyin data.
- `--sort-by radical` / `--features collator-radical`: Kangxi radical
  index + residual stroke count from Unihan `kRSUnicode`.
Build / testing / docs
- `criterion` benchmark suite (`cargo bench`) covering every collator's
  sort path plus `pinyin_of` and `format_items` at 1k / 10k / 100k inputs.
- `proptest`-based property tests verifying: encoded sort key preserves
  byte-wise lex order, unchecked vs checked encoders agree on valid input,
  sort is idempotent, sort is a permutation, sort key induces a total order.
- `build.rs` validates exact column counts, codepoint↔char correspondence,
  primary syllable ASCII / ≤ 16 bytes / mandatory tone digit, on every
  generated map.
- `CONTRIBUTING.md` with a step-by-step recipe for adding a new collator
  and a worktree-parallel workflow note.
- CI now runs `cargo test --all-features` in addition to default features
  and verifies all benchmarks compile under all features.
- `PinyinOverride::validate` rejects empty phrase keys, empty syllables,
  non-ASCII syllables, and tone3 shapes outside `^[a-z]+[1-5]$`.
- Integration coverage for stdin behavior, `-r` / `-u` composition,
  completions output, override correctness, and per-collator CLI invocation.
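For readers updating override files, the tone3 shape `^[a-z]+[1-5]$` can be checked without a regex dependency. This is a minimal sketch of that shape check only; `is_tone3_syllable` is a hypothetical name, and the crate's actual validation may be implemented differently and cover more cases.

```rust
/// Returns true when `s` matches `^[a-z]+[1-5]$`: one or more ASCII
/// lowercase letters followed by exactly one tone digit from 1 to 5.
fn is_tone3_syllable(s: &str) -> bool {
    match s.as_bytes().split_last() {
        Some((tone, letters)) => {
            (b'1'..=b'5').contains(tone)
                && !letters.is_empty()
                && letters.iter().all(|b| b.is_ascii_lowercase())
        }
        // The empty string has no tone digit at all.
        None => false,
    }
}
```

Under this check `le5` passes, while `le` (no tone digit), `le6` (tone out of range), and `lü4` (non-ASCII) are rejected.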
Changed
Breaking (library)
- Renamed crate / binary from `pinyin-sort` to `hanzi-sort`. No
  compatibility alias.
- Renamed `PinyinContext` to `PinyinCollator`.
- Renamed `PinyinSortError` to `HanziSortError`.
- `RuntimeConfig::new` signature is now `(input, format, collator: AnyCollator)`
  instead of `(input, format, override_data, sort_mode)`. The old
  `RuntimeConfig::new` and `RuntimeConfig::with_sort_mode` are gone.
- `app::render` dispatches via `AnyCollator::sort` instead of `sort_strings_by`.
- `PinyinCollator::new()` is now infallible (no override); use
  `PinyinCollator::with_override(PinyinOverride)` for override-aware
  construction.
- `encode_primary_pinyin` returns a `Result` instead of panicking on
  invalid input.
Breaking (data semantics)
- Toneless primary syllables in `data/pinyin.csv` are now normalized to
  neutral tone `5` (e.g. `了 → le5` instead of `了 → le`). Override
  validation and build-time checks both enforce that every syllable ends
  in a tone digit `1`-`5`, so neutral-tone characters now sort after
  tone-4 variants instead of before tone-1 variants, matching
  conventional Chinese dictionary ordering. Override files using toneless
  syllables (e.g. `'了' = 'le'`) must update to `'le5'`.
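The ordering change follows from how tone3 strings compare byte-wise. The sketch below assumes keys compare like plain ASCII strings, which is a simplification of the crate's encoded sort keys; `sort_syllables` is a hypothetical helper used only for illustration.

```rust
/// Sorts tone3 syllables byte-wise, the way plain ASCII strings compare.
/// A toneless syllable is a strict prefix of its toned forms, so it sorts
/// first; a trailing `5` sorts after `1` through `4`.
fn sort_syllables(mut syllables: Vec<&str>) -> Vec<&str> {
    syllables.sort();
    syllables
}
```

With the old toneless form, `le` lands before `le1`; once normalized to `le5`, the same reading lands after `le4`.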
Behavior (non-breaking)
- `--file` / `--text` are mutually exclusive at parse time; `--file` reads
  one non-blank line per record and rejects directory inputs.
- `-o` / `--output` now writes to a file instead of stdout (in `0.1.1` the
  flag was silently ignored).
- Formatting width calculations use terminal display width
  (`unicode-width`), and `left` / `right` alignment semantics are
  corrected.
- `build.rs` reports row / codepoint context on failure and re-runs when
  the build script itself changes.
Removed
- public `SortMode` enum (replaced by `AnyCollator` variants).
- public `sort_strings` / `sort_strings_by` (use `sort_strings_with` or
  `AnyCollator::sort`).
- internal `EncodedSortToken` / `EncodedSortKey` / `compare_encoded_sort_key`
  (subsumed by the trait-based key infrastructure).
Fixed
- preserve unknown characters in sort keys instead of dropping them.
- return non-zero exits for invalid input, invalid override config, and
  write failures.
- include the first CSV record in the generated pinyin map so `〇`
  resolves correctly.
- preserve original input order for duplicate or equal-key entries via an
  index tiebreak in both pinyin and stroke sort.
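An index tiebreak of the kind described in the last fix can be sketched as below. The function name and the closure-based key are assumptions for illustration; the crate's own key types and sort entry points are not reproduced here.

```rust
/// Sorts by a caller-supplied key and breaks ties with the original index,
/// so equal-key items keep input order even under an unstable sort.
fn sort_with_index_tiebreak<K: Ord>(
    items: Vec<String>,
    key: impl Fn(&str) -> K,
) -> Vec<String> {
    // Pair every item with its key and its position in the input.
    let mut keyed: Vec<(K, usize, String)> = items
        .into_iter()
        .enumerate()
        .map(|(i, s)| (key(s.as_str()), i, s))
        .collect();
    // Compare keys first; fall back to the input index when keys are equal.
    keyed.sort_unstable_by(|a, b| a.0.cmp(&b.0).then(a.1.cmp(&b.1)));
    keyed.into_iter().map(|(_, _, s)| s).collect()
}
```

Since no two items share an index, the comparison is a total order with no ties, which is why the result does not depend on the stability of the underlying sort.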