Upgrade vendored darts-clone to v0.32h and harden OCD dictionary loading #1372

Merged: frankslin merged 5 commits into BYVoid:master from frankslin:upstream-master on Jul 1, 2026

frankslin (Collaborator) commented Jun 30, 2026

Summary

  • Vendor darts-clone v0.32h under deps/darts-clone-0.32h/ and adapt its
    id_type to size_t so the serialized OCD array layout stays compatible
    with the existing format.
  • Switch Bazel, CMake, Windows CLI Zig builds, and npm packaging to the new
    vendor directory; remove the old vendored header.
  • Harden DartsDict loading with three validation checks ported from
    google/sentencepiece@d685ef31: unit-size alignment of the serialized array,
    root/offset bounds via doubleArray->validate(), and lexicon value bounds.
    Malformed .ocd files now throw InvalidFormat instead of silently
    producing undefined behavior.
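
The three checks can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Unit` layout (top bit marks a leaf, remaining bits hold either a lexicon index or a child index) is a simplified assumption rather than darts-clone's real bit layout, and `InvalidFormat`, `loadUnits`, `toBytes`, and `rejects` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the InvalidFormat exception named in the PR.
struct InvalidFormat : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Simplified unit layout (an assumption, NOT darts-clone's real bit layout):
//   top bit set   -> leaf, low 31 bits index into the lexicon
//   top bit clear -> inner node, low 31 bits are the index of a child unit
using Unit = std::uint32_t;
constexpr Unit kLeafBit = 0x80000000u;

std::vector<Unit> loadUnits(const std::vector<unsigned char>& bytes,
                            std::size_t lexiconSize) {
  // Check 1: the byte length must be a non-zero multiple of the unit size.
  if (bytes.empty() || bytes.size() % sizeof(Unit) != 0) {
    throw InvalidFormat("array size is not a multiple of the unit size");
  }
  std::vector<Unit> units(bytes.size() / sizeof(Unit));
  std::memcpy(units.data(), bytes.data(), bytes.size());

  // Check 2: the root must be an inner node and every offset must be in bounds.
  if (units[0] & kLeafBit) {
    throw InvalidFormat("root unit is a leaf");
  }
  for (Unit unit : units) {
    const Unit payload = unit & ~kLeafBit;
    if (unit & kLeafBit) {
      // Check 3: a leaf value must index an existing lexicon entry.
      if (payload >= lexiconSize) {
        throw InvalidFormat("leaf value is outside the lexicon");
      }
    } else if (payload >= units.size()) {
      throw InvalidFormat("child offset is outside the array");
    }
  }
  return units;
}

// Serializes units to raw bytes in host byte order (test helper).
std::vector<unsigned char> toBytes(const std::vector<Unit>& units) {
  std::vector<unsigned char> bytes(units.size() * sizeof(Unit));
  if (!bytes.empty()) std::memcpy(bytes.data(), units.data(), bytes.size());
  return bytes;
}

// Returns true when loading the buffer throws InvalidFormat.
bool rejects(const std::vector<unsigned char>& bytes, std::size_t lexiconSize) {
  try {
    loadUnits(bytes, lexiconSize);
  } catch (const InvalidFormat&) {
    return true;
  }
  return false;
}

// A well-formed two-unit array: root points at unit 1, which is leaf 0.
inline void demo() { assert(!rejects(toBytes({1u, kLeafBit}), 1)); }
```

Validating everything once at load time means the lookup path can keep indexing the array without per-access bounds checks, which is presumably why the checks sit in the loader rather than in traversal.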

NOTE: The .ocd format has always been platform-dependent: id_type is size_t,
so a file generated on a 64-bit build cannot be loaded by a 32-bit build, and
vice versa. This limitation predates this PR; .ocd2 (marisa-trie, the default
format) is unaffected.
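
A small arithmetic sketch of why the unit width matters, assuming the serialized array is a flat run of size_t-wide units and that size_t is 8 bytes on the 64-bit build and 4 bytes on the 32-bit build (typical, but not guaranteed). The helper `unitsSeen` is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Number of whole units a reader with `unitSize`-byte units sees in a buffer
// of `byteLength` bytes, or 0 when the length is not a multiple of the unit
// size (the case an alignment check would reject).
std::size_t unitsSeen(std::size_t byteLength, std::size_t unitSize) {
  if (unitSize == 0 || byteLength % unitSize != 0) return 0;
  return byteLength / unitSize;
}

inline void demo() {
  // Three units written with 8-byte units occupy 24 bytes.
  assert(unitsSeen(24, 8) == 3);
  // A 4-byte reader sees six units in the same bytes: the length is aligned,
  // but every unit is reinterpreted, so the contents are meaningless.
  assert(unitsSeen(24, 4) == 6);
  // Three units written with 4-byte units occupy 12 bytes, which an 8-byte
  // reader cannot split into whole units.
  assert(unitsSeen(12, 8) == 0);
}
```

Under these assumptions an alignment check catches only some cross-platform mismatches; an aligned length does not show that the file was written with the same unit width.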

Test plan

  • Existing DartsDictTest and ConfigTest suites pass.
  • Three new DartsDictTest cases cover the added rejection paths:
    • RejectsMisalignedDartsSize — array size not a multiple of unit size
    • RejectsInvalidDartsRoot — corrupted root unit
    • RejectsInvalidDartsValue — out-of-bounds value in a leaf unit
  • Bazel build passes: bazel build //src:opencc_lib //src:opencc
  • CMake build passes with -DENABLE_DARTS=ON

Commits

  • Add the upstream v0.32h header and BSD license under a separate dependency
    directory without changing the active build configuration.
  • Use OpenCC's existing size_t id_type for the vendored v0.32h header so the
    serialized Darts array layout stays compatible with the current OCD format.
  • Point Bazel, CMake, Windows CLI Zig builds, and npm packaging at the
    compatible vendored v0.32h dependency. Add local Bazel module metadata so
    @darts-clone resolves from the repository copy.
  • Delete the unused darts-clone 0.32 header after all build and package paths
    were switched to the compatible v0.32h vendor directory.
  • Apply the darts.h validation hardening from google/sentencepiece@d685ef31
    and wire it into OpenCC's OCD loader. Validate serialized Darts arrays for
    unit alignment, root/offset bounds, and lexicon value bounds with
    regression coverage for malformed dictionaries.

BYVoid (Owner) commented Jul 1, 2026

frankslin merged commit eab7ef6 into BYVoid:master on Jul 1, 2026
34 checks passed
frankslin (Collaborator, Author)

bazelbuild/bazel-central-registry#9468

You can upstream this updated version immediately in bcr.1:
557118e
