Releases: Acture/hanzi-sort
v0.2.2
A docs-and-packaging release. The prebuilt release binaries now ship every
collator, the --help/rustdoc text no longer claims the opt-in collators are
unimplemented, and the README leads with direct binary usage.
Changed
- Prebuilt release binaries (GitHub Releases) are now built with
`--all-features`, so a downloaded binary includes all five sort schemes
(pinyin, strokes, jyutping, zhuyin, radical) with no recompile. Source
installs via `cargo install` still default to pinyin + strokes; add the
`collator-*` features (or `--all-features`) to opt in.
- README restructured to separate Usage from Development, leading
with the prebuilt-binary install path, a per-platform asset table, and a
full read→sort→format→write pipeline example. `README.crates.md` points at
the prebuilt binaries for an all-collators install.
Fixed
- `--sort-by` help text for `jyutping`, `zhuyin`, and `radical` described them
as "Phase 3.1 … placeholder until implemented" even though they have shipped
since 0.2.0. The possible values now carry accurate one-line descriptions,
and `pinyin` / `strokes` gained descriptions too. The same stale wording was
removed from the rustdoc on `AnyCollator::zhuyin` / `AnyCollator::radical`
and the `jyutping` / `zhuyin` / `radical` module docs.
- `--blank-every` help wording now says "every N rows" (was "every N lines"),
matching `--columns` and the README.
- Homebrew tap auto-sync never ran: the `Sync Homebrew Tap` workflow triggered
on `release: published`, but releases are created by github-actions[bot] with
`GITHUB_TOKEN`, which GitHub does not allow to trigger further workflows. It
now triggers on the tag push, and the generated formula builds with
`--all-features`. (The tap formula had been stuck at 0.1.1.)
v0.2.1
A small follow-up to 0.2.0 that finishes the deferred Jyutping API
parity work, hardens CI, and exposes a sort entry point that lets
library users verify the crate's stable-sort guarantee directly.
Added
- `JyutpingOverride` struct (gated by `collator-jyutping`) with the same
TOML schema as `PinyinOverride` but tone digits in `1..=6`. Loadable
via `JyutpingOverride::load_from_file`.
- `JyutpingCollator::with_override(JyutpingOverride) -> Result<Self>`
for building an override-aware collator.
- `JyutpingCollator::jyutping_of(&str)` returns readings honoring any
configured phrase or per-character overrides.
- `AnyCollator::jyutping_with_override` constructor.
- CLI accepts `--config <path>` together with `--sort-by jyutping`; the
file is parsed as a `JyutpingOverride` (tone 1-6) when `sort_by` is
Jyutping and as a `PinyinOverride` (tone 1-5) for the default mode.
- `sort_indices_with<C>(&[String], &C) -> Vec<usize>` exposes the sort
permutation, which makes the index-tiebreak stability guarantee
directly verifiable from outside the crate.
- Cargo.toml `rust-version = "1.85"` (MSRV declared, matching edition
2024 requirement).
- `.gitattributes` enforcing LF line endings on text files (CSVs,
Python scripts, generated PHF files), so Windows clones with default
autocrlf don't corrupt build-time inputs.
Changed
- CI now runs on a `ubuntu-latest` / `macos-latest` / `windows-latest`
matrix with `fail-fast: false`. Previously only Linux was tested,
so Windows path / line-ending / `IsTerminal` behavior had no
coverage.
- The error message for `--config` paired with an unsupported
`--sort-by` is now `"--config is not supported with --sort-by <scheme>"`
(was previously phrased as "only supported with
`--sort-by pinyin`", which is now incorrect since jyutping also
accepts overrides).
- Internal `validate_syllable` in `src/override.rs` is parameterized
over the valid tone range so `PinyinOverride` and `JyutpingOverride`
share the syllable-shape check.
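As an illustration of a syllable-shape check parameterized over the tone range, here is a minimal sketch. It is not the crate's `validate_syllable`: the signature, the `String` error type, and the messages are assumptions made for the example. It accepts one or more ASCII lowercase letters followed by a single tone digit inside the given range.

```rust
use std::ops::RangeInclusive;

/// Hypothetical stand-in for a shared syllable-shape check (not the crate's
/// actual function): one or more ASCII lowercase letters followed by exactly
/// one tone digit that falls inside `tones`.
fn validate_syllable(syllable: &str, tones: RangeInclusive<u8>) -> Result<(), String> {
    let bytes = syllable.as_bytes();
    // Split off the final byte, which must be the tone digit.
    let Some((&last, letters)) = bytes.split_last() else {
        return Err("empty syllable".to_string());
    };
    if letters.is_empty() || !letters.iter().all(|b| b.is_ascii_lowercase()) {
        return Err(format!(
            "{syllable:?}: expected ASCII lowercase letters before the tone digit"
        ));
    }
    // Short-circuit keeps the subtraction from running on non-digit bytes.
    if !last.is_ascii_digit() || !tones.contains(&(last - b'0')) {
        return Err(format!(
            "{syllable:?}: tone must be in {}..={}",
            tones.start(),
            tones.end()
        ));
    }
    Ok(())
}

fn main() {
    // The same check serves both ranges: 1..=5 for pinyin, 1..=6 for jyutping.
    println!("{:?}", validate_syllable("le5", 1..=5));
    println!("{:?}", validate_syllable("luk6", 1..=6));
    println!("{:?}", validate_syllable("luk6", 1..=5));
}
```

Passing the range as a parameter keeps a single code path for both override types, so the two schemas cannot drift apart on what counts as a well-formed syllable.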
Fixed
- Stability of equal-key inputs is now directly testable. Previous tests on
duplicate strings could not verify that the unstable backend was promoted
to stable behavior; the rubber-duck Phase 1 review flagged them as
smoke-level tests. With `sort_indices_with` exposed, proptest checks the
stronger property: for any input, items with equal sort keys keep their
relative input order.
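The index-tiebreak idea can be sketched in a few lines. The helper below is hypothetical (`sort_indices_by_key` is not the crate's `sort_indices_with`, and the key closure stands in for a real collator): it pairs each key with its input position, so even an unstable sort returns equal-key items in input order.

```rust
/// Hypothetical helper, not the crate's `sort_indices_with`: returns the
/// permutation that sorts `items` by `key`, breaking ties on input index.
fn sort_indices_by_key<K: Ord>(items: &[String], key: impl Fn(&str) -> K) -> Vec<usize> {
    // Pair each computed key with its original position.
    let mut keyed: Vec<(K, usize)> = items
        .iter()
        .enumerate()
        .map(|(i, s)| (key(s), i))
        .collect();
    // An unstable sort is safe here: the index makes every pair distinct,
    // so items with equal keys fall back to input order.
    keyed.sort_unstable();
    keyed.into_iter().map(|(_, i)| i).collect()
}

fn main() {
    let items: Vec<String> = ["b2", "a1", "b1", "a2"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Toy key: first character only, so "b2"/"b1" tie and "a1"/"a2" tie.
    let order = sort_indices_by_key(&items, |s| s.chars().next());
    println!("{:?}", order); // prints [1, 3, 0, 2]
}
```

Returning indices rather than sorted strings is what makes the property checkable: with duplicate strings in the input, the output values alone cannot show which copy came first, but the permutation can.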
v0.2.0
A major release. The project was rebranded from pinyin-sort to hanzi-sort, the
public library API was reshaped around a pluggable Collator trait, three new
opt-in collators (Cantonese Jyutping, Mandarin Zhuyin, Kangxi Radical) joined
Pinyin and Strokes, and the CLI grew Unix-friendly stdin / -r / -u
behavior plus shell completions. Tone3 data normalization now treats
neutral tone as 5, fixing a long-standing dictionary-order bug for
characters like 了. See the breakdown below.
Highlights
- 5 sort schemes: Pinyin, Strokes, Jyutping, Zhuyin, Radical (all but
Pinyin/Strokes are opt-in `cargo` features).
- Stable sort guarantee: equal-key inputs preserve input order in every
collator.
- Stdin-friendly CLI: `cat names.txt | hanzi-sort` works the way Unix
users expect.
- Pluggable collator API: the trait is exposed; downstream Rust code can
add its own collators without forking the crate.
Added
Library
- `Collator` trait, `Mapped<T>`, `CharToken<T>`, `SortKey<T>`, `sort_key_of`,
and `sort_strings_with`: a pluggable per-character sort strategy
abstraction.
- `AnyCollator` enum for runtime collator selection across all enabled
schemes.
- `PinyinCollator` (renamed from `PinyinContext`), `StrokesCollator`,
`JyutpingCollator`, `ZhuyinCollator`, `RadicalCollator`.
- `PinyinCollator::with_override` (and the analogous fallible builder for
any future override-aware collators) for explicit override loading.
- `RuntimeConfig::with_unique` and `RuntimeConfig::with_reverse`
builder-style setters.
- `InputSource::Stdin` variant on the public `InputSource` enum.
CLI
- `-r` / `--reverse` flag.
- `-u` / `--unique` flag (adjacent dedup; `unique` is applied before `reverse`).
- `hanzi-sort completions <shell>` subcommand emitting a completion script
for bash / zsh / fish / powershell / elvish (via `clap_complete`).
- `--help` examples block (after the options list).
- `--help` displays default values for every printable option.
- Stdin fallback when neither `--file` nor `--text` is provided and stdin
is non-TTY; `-f -` accepted as a stdin alias.
- CLI rejects `--config` together with any non-pinyin `--sort-by` (override
is pinyin-specific).
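The `-u` / `-r` composition described above (adjacent dedup first, then reversal) might look like the following sketch. The function name and its boolean parameters are hypothetical; this illustrates the stated ordering only and is not the crate's implementation.

```rust
/// Hypothetical post-processing step mirroring the documented ordering:
/// drop adjacent duplicates first, then reverse the result.
fn unique_then_reverse(mut sorted: Vec<String>, unique: bool, reverse: bool) -> Vec<String> {
    if unique {
        // `Vec::dedup` removes consecutive equal elements only, which is
        // enough once the input has already been sorted.
        sorted.dedup();
    }
    if reverse {
        sorted.reverse();
    }
    sorted
}

fn main() {
    // Assume this list has already been sorted by some collator.
    let sorted: Vec<String> = ["安", "安", "白", "陈", "陈"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let out = unique_then_reverse(sorted, true, true);
    println!("{:?}", out); // prints ["陈", "白", "安"]
}
```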
New collators (opt-in)
- `--sort-by jyutping` / `--features collator-jyutping`: Cantonese Jyutping
from Unihan `kCantonese` (~30k characters covered).
- `--sort-by zhuyin` / `--features collator-zhuyin`: Mandarin Zhuyin /
Bopomofo derived from the bundled pinyin data.
- `--sort-by radical` / `--features collator-radical`: Kangxi radical
index + residual stroke count from Unihan `kRSUnicode`.
Build / testing / docs
- `criterion` benchmark suite (`cargo bench`) covering every collator's
sort path plus `pinyin_of` and `format_items` at 1k / 10k / 100k inputs.
- `proptest`-based property tests verifying: encoded sort key preserves
byte-wise lex order, unchecked vs checked encoders agree on valid input,
sort is idempotent, sort is a permutation, sort key induces a total order.
- `build.rs` validates exact column counts, codepoint↔char correspondence,
primary syllable ASCII / ≤16 bytes / mandatory tone digit, on every
generated map.
- `CONTRIBUTING.md` with a step-by-step recipe for adding a new collator
and a worktree-parallel workflow note.
- CI now runs `cargo test --all-features` in addition to default features
and verifies all benchmarks compile under all features.
- `PinyinOverride::validate` rejects empty phrase keys, empty syllables,
non-ASCII syllables, and tone3 shapes outside `^[a-z]+[1-5]$`.
- Integration coverage for stdin behavior, `-r` / `-u` composition, completions
output, override correctness, and per-collator CLI invocation.
Changed
Breaking (library)
- Renamed crate / binary from `pinyin-sort` to `hanzi-sort`. No
compatibility alias.
- Renamed `PinyinContext` to `PinyinCollator`.
- Renamed `PinyinSortError` to `HanziSortError`.
- `RuntimeConfig::new` signature is now `(input, format, collator: AnyCollator)`
instead of `(input, format, override_data, sort_mode)`. The old
`RuntimeConfig::new` and `RuntimeConfig::with_sort_mode` are gone.
- `app::render` dispatches via `AnyCollator::sort` instead of `sort_strings_by`.
- `PinyinCollator::new()` is now infallible (no override); use
`PinyinCollator::with_override(PinyinOverride)` for override-aware
construction.
- `encode_primary_pinyin` returns a `Result` instead of panicking on
invalid input.
Breaking (data semantics)
- Toneless primary syllables in `data/pinyin.csv` are now normalized to
neutral tone `5` (e.g. `了 → le5` instead of `了 → le`). Override
validation and build-time checks both enforce that every syllable ends
in a tone digit `1-5`, so neutral-tone characters now sort after
tone-4 variants instead of before tone-1 variants, matching
conventional Chinese dictionary ordering. Override files using toneless
syllables (e.g. `'了' = 'le'`) must update to `'le5'`.
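To see why the normalization changes sort order, consider plain byte-wise comparison of tone3 strings. The sketch below uses a hypothetical `normalize_tone3` helper (an assumption for illustration, not the crate's build-time code) that appends `5` to any syllable lacking a trailing digit.

```rust
/// Hypothetical normalization for illustration: append neutral tone 5 when a
/// syllable has no trailing tone digit; leave toned syllables unchanged.
fn normalize_tone3(syllable: &str) -> String {
    match syllable.chars().last() {
        Some(c) if c.is_ascii_digit() => syllable.to_string(),
        _ => format!("{syllable}5"),
    }
}

fn main() {
    let raw = ["le", "le4", "le1"];

    // Without normalization, the toneless form is a prefix of the others,
    // so it sorts before every toned variant.
    let mut before: Vec<&str> = raw.to_vec();
    before.sort();
    println!("before: {:?}", before); // prints before: ["le", "le1", "le4"]

    // With neutral tone written as 5, it sorts after tone 4.
    let mut after: Vec<String> = raw.iter().map(|s| normalize_tone3(s)).collect();
    after.sort();
    println!("after:  {:?}", after); // prints after:  ["le1", "le4", "le5"]
}
```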
Behavior (non-breaking)
- `--file` / `--text` are mutually exclusive at parse time; `--file` reads
one non-blank line per record and rejects directory inputs.
- `-o` / `--output` now writes to a file instead of stdout (in `0.1.1` the
flag was silently ignored).
- Formatting width calculations use terminal display width
(`unicode-width`), and `left` / `right` alignment semantics are
corrected.
- `build.rs` reports row / codepoint context on failure and re-runs when
the build script itself changes.
Removed
- public `SortMode` enum (replaced by `AnyCollator` variants).
- public `sort_strings` / `sort_strings_by` (use `sort_strings_with` or
`AnyCollator::sort`).
- internal `EncodedSortToken` / `EncodedSortKey` / `compare_encoded_sort_key`
(subsumed by the trait-based key infrastructure).
Fixed
- preserve unknown characters in sort keys instead of dropping them.
- return non-zero exits for invalid input, invalid override config, and
write failures.
- include the first CSV record in the generated pinyin map so `〇`
resolves correctly.
- preserve original input order for duplicate or equal-key entries via an
index tiebreak in both pinyin and stroke sort.
v0.1.1
🚀 Features
- (project) Initialize pinyin-sort crate
- (sorting) Add pinyin-based sorting functionality
- (build) Integrate Nix flake and sparse checkout for dependencies
- (scripts) Add script to convert Pinyin data to CSV
- (build) Generate static Pinyin map with phf
- (build) Replace generated module with static Pinyin map
- (data) Add static Pinyin CSV file to `data` directory
- (build) Add build command to `justfile` and update dependencies
- (pinyin) Refactor Pinyin handling with `derive_builder`
- (pinyin) Add debug output for pinyin_of function results
- (sort) Enhance pinyin comparison and add unit tests
- (args) Add command-line argument parsing with Clap
- (args) Enhance argument parsing with alignment options and additional parameters
- (format) Add formatting utilities with alignment options and tests
- (format, args) Integrate dynamic formatting overrides and enhance argument structure
- (main, args) Enhance input handling and formatting flow
- (pinyin) Add override support and serialization for Pinyin handling
- (ci) Add GitHub Actions workflow for release automation
- (dependencies) Update dependencies and add project metadata
- (metadata) Update project metadata and introduce `README.md`
- (metadata) Update project metadata and introduce `README.md`
🐛 Bug Fixes
- (gitignore) Remove unused data directory from ignored files
- (gitignore) Add `/result` directory to ignored files
🚜 Refactor
- (flake.nix) Clean up formatting and remove commented code
- (pinyin, sort) Clean up unused variables and update dependencies
⚙️ Miscellaneous Tasks
- (release) Bump version to 0.1.1
- (ci) Update release workflow binary name to `pinyin-sort`