Released on June 15, 2026.
gukhanmun
- Collapse redundant parenthetical reading annotations by default. The new Builder::collapse_redundant_parens opt-out disables it. [#3, #4]
- Added the opendict feature and made the ko-kp preset include the bundled Open Korean Dictionary (우리말샘) North Korean (北韓語) category by default. Added Builder::no_bundled_dictionaries() to disable every preset-selected bundled dictionary, plus Builder::no_bundled_stdict() and Builder::no_bundled_opendict() for disabling only one bundled dictionary family. [#5, #6]
gukhanmun-core
- Fixed Arabic hanja numeral strategies so dictionary calendar entries no longer split numeric normalization. NumeralStrategy::PositionalArabic now renders dates such as 二〇二六年 六月 二〇日 as 2026년 6월 20일, while NumeralStrategy::HangulPhonetic, the default library preset strategy, still keeps lexicalized dictionary readings such as 六月 as 유월. NumeralStrategy::Smart also leaves standalone large place markers such as 京 and 萬, plus ambiguous small-marker words such as 百濟 and 十長生, as fallback readings instead of splitting them into numeric text.
- Fixed a bug where proper names and unknown multi-character hanja words were split into individual character annotations when the bundled dictionary contained single-character entries for some (but not all) of the characters. The segmenter now emits a TrivialDictionary segment variant for single-character dictionary matches that carry no special rendering marks, and the engine merges consecutive TrivialDictionary and Fallback segments into a single annotation without losing from_dictionary provenance so homophone marking still works. [#7, #8]
- Added RedundantParenCollapser, a streaming middleware that collapses an explicit parenthetical reading annotation into the hanja word it duplicates. 庫間(곳간) and 곳간(庫間) now render with both scripts in every mode instead of duplicating the reading, and a parenthetical that pins an alternative reading (such as “數字(수자)”) overrides the dictionary reading for that occurrence. A definition gloss such as “庫間(物件을 간직하여 두는 곳)” is left untouched. Regenerated the bundled Unihan reading data to also carry every kHangul reading per character (KHANGUL_ALL_READINGS), which the collapser uses to validate alternative readings. [#3, #4]
- Marked Annotation #[non_exhaustive] so its policy flags can grow without a breaking change (it gained a from_source_gloss flag here). Construct it from Annotation::default() and set the fields you need. [#3, #4]
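The segment-merging fix described above can be illustrated with a minimal, self-contained sketch. It does not use the real gukhanmun-core types: the Segment enum, the MergedAnnotation struct, and merge_segments are hypothetical names invented for this illustration, and combining provenance with a logical OR is an assumption about how from_dictionary might be preserved.

```rust
/// Hypothetical segment kinds, loosely modeled on the description above.
#[derive(Debug, Clone, PartialEq)]
enum Segment {
    /// A multi-character dictionary match, or one with special rendering marks.
    Dictionary { hanja: String, reading: String },
    /// A single-character dictionary match with no special rendering marks.
    TrivialDictionary { hanja: String, reading: String },
    /// A per-character fallback reading.
    Fallback { hanja: String, reading: String },
}

/// Hypothetical output record; `from_dictionary` tracks provenance.
#[derive(Debug, Clone, PartialEq)]
struct MergedAnnotation {
    hanja: String,
    reading: String,
    from_dictionary: bool,
}

/// Merges each run of consecutive TrivialDictionary/Fallback segments into
/// one annotation. Full Dictionary segments always stand alone.
fn merge_segments(segments: &[Segment]) -> Vec<MergedAnnotation> {
    let mut out: Vec<MergedAnnotation> = Vec::new();
    let mut last_mergeable = false;
    for seg in segments {
        let (hanja, reading, from_dict, mergeable) = match seg {
            Segment::Dictionary { hanja, reading } => (hanja, reading, true, false),
            Segment::TrivialDictionary { hanja, reading } => (hanja, reading, true, true),
            Segment::Fallback { hanja, reading } => (hanja, reading, false, true),
        };
        if mergeable && last_mergeable {
            // Extend the annotation that the current run already started.
            let last = out.last_mut().expect("a run always has a first annotation");
            last.hanja.push_str(hanja);
            last.reading.push_str(reading);
            // Assumption: provenance survives if any part came from a dictionary.
            last.from_dictionary |= from_dict;
        } else {
            out.push(MergedAnnotation {
                hanja: hanja.clone(),
                reading: reading.clone(),
                from_dictionary: from_dict,
            });
        }
        last_mergeable = mergeable;
    }
    out
}
```

With a name such as 朴趾源, where only some characters have single-character dictionary entries, the three per-character segments end up as one annotation rather than three.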
gukhanmun-dict-extract
- Added a shared extraction helper crate for dictionary dump key normalization, original-language parsing, and mixed-script key generation. gukhanmun-stdict and gukhanmun-opendict now use the same core extraction rules. [#5, #6]
gukhanmun-cli
- Collapse redundant parenthetical reading annotations by default across the plain-text, HTML, and Markdown pipelines. The new --no-collapse-parens flag disables it. [#3, #4]
- Changed the CLI default for --numerals to smart, so omitted numeral options render dates such as 二〇二六年 六月 二〇日 as 2026년 6월 20일. Pass --numerals hangul-phonetic to keep Seonbi-style phonetic calendar readings such as 六月 as 유월.
- The ko-kp preset now includes the bundled Open Korean Dictionary North Korean (北韓語) category by default. Added --no-bundled-dictionaries, which disables every preset-selected bundled dictionary. [#5, #6]
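To make the date example concrete, here is a rough sketch of positional digit rendering. It is not the library's numeral engine: hanja_digit and render_positional_date are hypothetical helpers, the sketch handles only positional digits (no place markers such as 十 or 百), and it blindly maps 年, 月, and 日 to their hangul readings, which real text would need context to do safely.

```rust
/// Maps one positional hanja digit to its Arabic digit, if it is one.
fn hanja_digit(c: char) -> Option<char> {
    match c {
        '〇' | '零' => Some('0'),
        '一' => Some('1'),
        '二' => Some('2'),
        '三' => Some('3'),
        '四' => Some('4'),
        '五' => Some('5'),
        '六' => Some('6'),
        '七' => Some('7'),
        '八' => Some('8'),
        '九' => Some('9'),
        _ => None,
    }
}

/// Renders a date written with positional hanja digits, character by
/// character. Anything that is neither a digit nor a date unit passes through.
fn render_positional_date(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '年' => '년',
            '月' => '월',
            '日' => '일',
            other => hanja_digit(other).unwrap_or(other),
        })
        .collect()
}
```

Calling render_positional_date("二〇二六年 六月 二〇日") yields "2026년 6월 20일". A phonetic strategy would instead consult the dictionary first, which is how 六月 can come out as 유월.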
gukhanmun-opendict
- Added a bundled Open Korean Dictionary (우리말샘) crate generated from the 2026-06-03 JSON dump. The crate exposes separate general(), north_korean(), dialect(), and archaic() FST dictionaries so callers can compose the categories explicitly with ChainDictionary. [#5, #6]
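The idea of composing categories explicitly can be sketched without the real crate. Everything below is hypothetical: the Lookup trait, MapDictionary, Chain, and map_dictionary are stand-ins invented for illustration, not the actual ChainDictionary API, and first-match-wins precedence is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical lookup interface: hanja word in, hangul reading out.
trait Lookup {
    fn lookup(&self, hanja: &str) -> Option<&str>;
}

/// In-memory stand-in for one bundled dictionary category.
struct MapDictionary(HashMap<String, String>);

impl Lookup for MapDictionary {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.0.get(hanja).map(String::as_str)
    }
}

/// Tries each dictionary in order; the first one with an entry wins.
struct Chain(Vec<Box<dyn Lookup>>);

impl Lookup for Chain {
    fn lookup(&self, hanja: &str) -> Option<&str> {
        self.0.iter().find_map(|d| d.lookup(hanja))
    }
}

/// Convenience constructor for the sketch.
fn map_dictionary(entries: &[(&str, &str)]) -> MapDictionary {
    MapDictionary(
        entries
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    )
}
```

Placing a North Korean category ahead of a general one in such a chain would let 勞動 resolve to 로동 while words found only in the general category still resolve normally.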
gukhanmun-stdict
- Reused the shared dictionary extraction helper and buffered direct JSON shard reads for large dump extraction. [#5, #6]
- Fixed “數字” converting to “수자” instead of the orthographically prescribed “숫자.” The six Standard Korean Orthography §30 (한글 맞춤法 第30項) saisiot (사이시옷) compounds (곳간, 셋방, 숫자, 찻간, 툇간, 횟수) now win over their saisiot-free homographs regardless of dump order. [#1, #2]
- Regenerated the bundled dictionary so single-hanja foreign-spelling head words (such as “元” → “위안” or “円” → “엔”) no longer shadow the Sino-Korean reading of those characters; the engine recovers their original sound from the bundled Unihan readings instead.
- Regenerated the bundled Standard Korean Language Dictionary data from the 2026-06-06 JSON dump (260,690 entries, was 260,688).
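The "regardless of dump order" behavior of the saisiot fix can be sketched as follows. This is not the extraction code from gukhanmun-stdict: build_readings and the flat (hanja, reading) dump format are hypothetical, and matching purely on the reading string is a simplifying assumption.

```rust
use std::collections::HashMap;

/// The six saisiot compounds listed in the entry above.
const SAISIOT_COMPOUNDS: [&str; 6] = ["곳간", "셋방", "숫자", "찻간", "툇간", "횟수"];

/// Builds a hanja-to-reading map from (hanja, reading) pairs. The first
/// reading seen for a key is kept, except that a saisiot compound always
/// replaces whatever was stored, so the result does not depend on order.
fn build_readings<'a>(dump: &[(&'a str, &'a str)]) -> HashMap<&'a str, &'a str> {
    let mut readings: HashMap<&'a str, &'a str> = HashMap::new();
    for &(hanja, reading) in dump {
        let incoming_is_saisiot = SAISIOT_COMPOUNDS.contains(&reading);
        let keep_existing = readings.contains_key(hanja) && !incoming_is_saisiot;
        if !keep_existing {
            readings.insert(hanja, reading);
        }
    }
    readings
}
```

Whether the dump lists 수자 before or after 숫자 for 數字, the map ends up holding 숫자.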
@gukhanmun/napi
- Collapse redundant parenthetical reading annotations by default; added the collapseRedundantParens option to disable it. [#3, #4]
- Documented that JavaScript presets still do not auto-load bundled dictionary data; use the new opendict packages explicitly when desired. [#5, #6]
@gukhanmun/wasm
- Collapse redundant parenthetical reading annotations by default; added the collapseRedundantParens option to disable it. [#3, #4]
- Documented that JavaScript presets still do not auto-load bundled dictionary data; use the new opendict packages explicitly when desired. [#5, #6]
@gukhanmun/opendict-cdb
- Added a package containing Open Korean Dictionary general (一般語), North Korean (北韓語), dialect (方言), and archaic (옛말) categories as CDB binaries, with category-specific byte loaders and FileDictionarySource helpers. The binaries ship gzip-compressed (as *.cdb.gz) to stay within the JSR per-file size limit, and the byte loaders inflate them transparently. [#5, #6]
@gukhanmun/opendict-fst
- Added a package containing Open Korean Dictionary general (一般語), North Korean (北韓語), dialect (方言), and archaic (옛말) categories as FST binaries, with category-specific byte loaders and FileDictionarySource helpers. [#5, #6]
@gukhanmun/stdict-fst
- Regenerated the bundled FST binary from the 2026-06-06 Standard Korean Language Dictionary JSON dump.
@gukhanmun/stdict-cdb
- Regenerated the bundled CDB binary from the 2026-06-06 Standard Korean Language Dictionary JSON dump.