Skip to content

fix(hashes): make SerdeHash tolerant of ContentDeserializer's HR-quirk#729

Merged
xdustinface merged 3 commits into v0.42-dev from fix/hashes-serde-content-deserializer
May 6, 2026


@shumkov
Collaborator

@shumkov shumkov commented May 5, 2026

Summary

Same kind of serde-tag incompatibility as #708, but in a different macro family. #708 fixed OutPoint's serde_struct_human_string_impl!. This PR fixes the hash-newtype family: SerdeHash::deserialize (in hashes/src/serde_macros.rs), used by every hash_newtype! / serde_impl!-generated type — Txid, BlockHash, ProTxHash, PubkeyHash, QuorumHash, all the sha256/sha256d/hash160/hash_x11 wrappers.

SerdeHash::deserialize used two separate visitors — a string-only HR visitor (HexVisitor) and a bytes-only non-HR visitor (BytesVisitor). That works fine in isolation but breaks the moment a hash-bearing struct is wrapped by an internally-tagged enum (#[serde(tag = "...")]), flatten, or an untagged enum: serde routes those through ContentDeserializer, a format-agnostic intermediate buffer that always reports is_human_readable() == true regardless of the upstream format. A value originally written by a non-HR encoder is therefore replayed into the HR branch as raw bytes, which the previous HexVisitor::visit_str saw as "32 chars" instead of "64-char hex" and rejected with `bad hex string length 32 (expected 64)`.
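A toy model of that failure, in plain Rust with no serde dependency so it runs standalone. `Replayed`, `old_hr_visit`, and `nibble` are hypothetical names rather than the crate's code. The model also assumes that the old path reinterprets replayed bytes as UTF-8 text; that assumption is chosen to reproduce the quoted error message:

```rust
/// What a buffered value can look like when it is replayed to a visitor.
/// Toy stand-in only; this is not serde's private `Content` type.
pub enum Replayed<'a> {
    Str(&'a str),
    Bytes(&'a [u8]),
}

/// Digest length of a `Txid`-sized hash.
pub const HASH_LEN: usize = 32;

/// Decode one ASCII hex digit.
fn nibble(c: u8) -> Result<u8, String> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(format!("invalid hex character {:?}", c as char)),
    }
}

/// Model of the pre-fix HR path: only hex is understood. Replayed bytes are
/// (by assumption) reinterpreted as UTF-8 text, so 32 raw bytes look like a
/// 32-character string and fail the length check.
pub fn old_hr_visit(input: Replayed<'_>) -> Result<[u8; HASH_LEN], String> {
    let text = match input {
        Replayed::Str(s) => s,
        Replayed::Bytes(b) => std::str::from_utf8(b).map_err(|e| e.to_string())?,
    };
    if text.len() != 2 * HASH_LEN {
        return Err(format!(
            "bad hex string length {} (expected {})",
            text.len(),
            2 * HASH_LEN
        ));
    }
    let mut out = [0u8; HASH_LEN];
    for (slot, pair) in out.iter_mut().zip(text.as_bytes().chunks_exact(2)) {
        *slot = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Ok(out)
}
```

Given 32 raw `0x11` bytes, the model returns `bad hex string length 32 (expected 64)`. The 64-character hex string `"11".repeat(32)` still decodes.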

This was hit downstream in dashpay/platform when Validator / ValidatorSet (which contain ProTxHash, PubkeyHash, QuorumHash) were configured for the dpp `tag = "$formatVersion"` versioning convention.

Fix

Rework SerdeHash::deserialize to use a single AnyShapeVisitor that accepts every shape a hash can arrive in:

  • visit_str / visit_borrowed_str — ASCII hex (canonical HR form).
  • visit_bytes / visit_borrowed_bytes — disambiguated by length: exactly N bytes → raw hash, exactly 2*N bytes → UTF-8 hex. Any other length errors.
  • visit_seq — length-prefixed u8 sequence (bincode and similar).
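Here is a minimal sketch of the any-shape rule. It is written against a hypothetical `Shape` enum rather than serde's `Visitor` trait so that it compiles without dependencies. `visit_any_shape`, `decode_hex`, and `nibble` are illustrative names. The sketch also makes an assumption about the `visit_seq` arm: a sequence of exactly `N` elements is treated as the raw digest, and a sequence of any other length is rejected:

```rust
/// Shapes a hash can arrive in. Hypothetical stand-in for the serde
/// `visit_*` callbacks, so the sketch runs without dependencies.
pub enum Shape<'a> {
    /// `visit_str` / `visit_borrowed_str`: ASCII hex.
    Str(&'a str),
    /// `visit_bytes` / `visit_borrowed_bytes`: raw digest or UTF-8 hex.
    Bytes(&'a [u8]),
    /// `visit_seq`: the elements of a `u8` sequence.
    Seq(&'a [u8]),
}

/// Decode one ASCII hex digit.
fn nibble(c: u8) -> Result<u8, String> {
    match c {
        b'0'..=b'9' => Ok(c - b'0'),
        b'a'..=b'f' => Ok(c - b'a' + 10),
        b'A'..=b'F' => Ok(c - b'A' + 10),
        _ => Err(format!("invalid hex character {:?}", c as char)),
    }
}

/// Decode exactly `2 * N` hex digits into an `N`-byte digest.
fn decode_hex<const N: usize>(hex: &[u8]) -> Result<[u8; N], String> {
    if hex.len() != 2 * N {
        return Err(format!("bad hex string length {} (expected {})", hex.len(), 2 * N));
    }
    let mut out = [0u8; N];
    for (slot, pair) in out.iter_mut().zip(hex.chunks_exact(2)) {
        *slot = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Ok(out)
}

/// Dispatch on the shape that actually arrived, validating strictly by length.
pub fn visit_any_shape<const N: usize>(shape: Shape<'_>) -> Result<[u8; N], String> {
    match shape {
        Shape::Str(s) => decode_hex::<N>(s.as_bytes()),
        // Exactly N bytes (or N sequence elements): the raw digest.
        Shape::Bytes(b) | Shape::Seq(b) if b.len() == N => {
            let mut out = [0u8; N];
            out.copy_from_slice(b);
            Ok(out)
        }
        // Exactly 2*N bytes: UTF-8 encoded hex.
        Shape::Bytes(b) if b.len() == 2 * N => decode_hex::<N>(b),
        // Any other length is rejected.
        Shape::Bytes(b) => Err(format!(
            "invalid byte length {} (expected {} or {})",
            b.len(),
            N,
            2 * N
        )),
        Shape::Seq(b) => Err(format!("invalid sequence length {} (expected {})", b.len(), N)),
    }
}
```

For a 20-byte (`PubkeyHash`-sized) digest, the following inputs all decode to the same value: 20 raw bytes, 40 hex bytes, a 40-character hex string, and a 20-element sequence. A 21-byte input is rejected.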

Use deserialize_any in the HR branch so the actual content shape — not the reported HR flag — drives dispatch. Keep deserialize_bytes in the non-HR branch since bincode is non-self-describing.

This is the well-established serde workaround documented in third-party crates that hit the same wall (e.g. BinaryData, Identifier, Bytes32 in rs-platform-value, and now OutPoint after #708).

Trade-off

Raw JSON now also accepts the byte form ("\x11..." UTF-8 bytes vs. "11..." hex string) because `deserialize_any` in serde_json's self-describing mode dispatches on the JSON token. We disambiguate strictly by length in `visit_bytes`, so anything that's neither `N` nor `2*N` bytes still errors. This is consistent with the OutPoint fix.

Implementation note: no_std / no alloc

dashcore_hashes does not enable serde/alloc directly; it is only enabled transitively through the serde-std feature. In builds without serde-std, Visitor::visit_byte_buf and visit_string (gated behind serde's alloc feature) are therefore unavailable. For that reason the `visit_seq` path uses a stack array sized to fit the largest hash (64 bytes, sha512) instead of a Vec, which keeps the crate's no-alloc posture.
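Here is a sketch of the no-alloc sequence path. It assumes the hash length is only available as a runtime value where the visitor is generated, which would explain using a fixed 64-byte buffer rather than `[u8; N]`. `MAX_HASH_BYTES` follows the name used in the follow-up commit. `collect_seq` is a hypothetical name, and the iterator stands in for repeated `SeqAccess::next_element::<u8>()` calls. Errors are `&'static str`, so nothing is allocated:

```rust
/// Large enough for the widest supported digest (sha512, 64 bytes).
pub const MAX_HASH_BYTES: usize = 64;

/// Collect a `u8` sequence into a fixed stack buffer (no `Vec`, no allocator).
/// `expected_len` is the hash's byte length as a runtime value.
/// On success the digest occupies `buf[..expected_len]`.
pub fn collect_seq<I>(expected_len: usize, items: I) -> Result<[u8; MAX_HASH_BYTES], &'static str>
where
    I: IntoIterator<Item = u8>,
{
    // A hash wider than the buffer is a programming error. It is flagged in
    // debug builds; in release, an out-of-range write hits the slice bounds
    // check and panics instead of writing past the buffer.
    debug_assert!(expected_len <= MAX_HASH_BYTES, "hash wider than MAX_HASH_BYTES");
    let mut buf = [0u8; MAX_HASH_BYTES];
    let mut len = 0;
    for byte in items {
        if len == expected_len {
            return Err("too many elements in sequence");
        }
        buf[len] = byte;
        len += 1;
    }
    if len != expected_len {
        return Err("sequence length does not match hash length");
    }
    Ok(buf)
}
```

A sequence that is one element short or one element long returns an error rather than a partially filled hash.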

Tests

Two regression tests in dash/src/hash_types.rs:

  • serde_round_trip_through_internally_tagged_enum — wraps a Txid in a #[serde(tag = "type")] enum, round-trips through serde_json::Value (which forces buffering through ContentDeserializer), and asserts identity. Also verifies the canonical hex-string form still deserializes and bincode round-trip still succeeds via the byte/seq path.
  • serde_round_trip_through_internally_tagged_enum_pubkey_hash — same shape with PubkeyHash (20-byte hash) to exercise the smaller-length disambiguation path.

bincode dev-dep updated to features = ["serde"] (same change as #708) so the bincode regression assertion compiles.

Local test results

  • dashcore_hashes: 7 passed, 0 failed.
  • dashcore --features serde: 551 passed, 0 failed (17 pre-existing ignores), including the two new tests.

Related

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • Added regression tests validating serialization and deserialization of core data types across multiple encoding formats (JSON, binary, and hex variants)
  • Chores

    • Updated development dependencies
    • Improved deserialization robustness to handle multiple input formats

This is the same kind of serde-tag incompatibility fixed for `OutPoint`
in #708, applied to the hash-newtype family (sha256, sha256d, hash160,
hash_x11, ripemd160, sha1, sha512 — affecting Txid, BlockHash, ProTxHash,
PubkeyHash, QuorumHash, and every other type generated by `hash_newtype!`
or `serde_impl!`).

`SerdeHash::deserialize` used two separate visitors — a string-only HR
visitor (`HexVisitor`) and a bytes-only non-HR visitor (`BytesVisitor`).
That works fine in isolation but breaks the moment a hash-bearing struct
is wrapped by an internally-tagged enum (`#[serde(tag = "...")]`),
`flatten`, or an untagged enum. Serde routes those through
`ContentDeserializer`, a format-agnostic intermediate buffer that always
reports `is_human_readable() == true` regardless of the upstream format.
A value originally written by a non-HR encoder is therefore replayed
into the HR branch as raw bytes, which the previous `HexVisitor::visit_str`
saw as "32 chars" instead of "64-char hex" and rejected with
`bad hex string length 32 (expected 64)`.

This was hit downstream in dashpay/platform when validators / validator
sets (which contain `ProTxHash`, `PubkeyHash`, `QuorumHash`) were
configured for the dpp `tag = "$formatVersion"` versioning convention.

## Fix

Rework `SerdeHash::deserialize` to use a single `AnyShapeVisitor` that
accepts every shape a hash can arrive in:

- `visit_str` / `visit_borrowed_str` — ASCII hex (canonical HR form).
- `visit_bytes` / `visit_borrowed_bytes` — disambiguated by length:
  exactly `N` bytes → raw hash, exactly `2*N` bytes → UTF-8 hex.
  Any other length is rejected.
- `visit_seq` — length-prefixed `u8` sequence (used by bincode and
  other non-self-describing formats).

Use `deserialize_any` in the HR branch so the actual content shape —
not the reported HR flag — drives dispatch. Keep `deserialize_bytes`
in the non-HR branch since bincode is non-self-describing and does not
support `deserialize_any`.
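A toy stream shows why `deserialize_any` is off the table for non-self-describing formats: values are stored back to back with no type tags. `Reader` is a hypothetical format with a one-byte length prefix, not bincode's actual wire encoding:

```rust
/// Toy non-self-describing stream: values are written back to back with no
/// type tags. Hypothetical format, not bincode's real encoding.
pub struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    pub fn new(buf: &'a [u8]) -> Self {
        Reader { buf }
    }

    /// Read a single `u8`.
    pub fn read_u8(&mut self) -> Result<u8, String> {
        let buf: &'a [u8] = self.buf;
        let (&byte, rest) = buf.split_first().ok_or("unexpected end of input")?;
        self.buf = rest;
        Ok(byte)
    }

    /// Read a byte string: a one-byte length, then that many bytes.
    pub fn read_bytes(&mut self) -> Result<&'a [u8], String> {
        let len = usize::from(self.read_u8()?);
        let buf: &'a [u8] = self.buf;
        if buf.len() < len {
            return Err(format!("need {} bytes, have {}", len, buf.len()));
        }
        let (head, rest) = buf.split_at(len);
        self.buf = rest;
        Ok(head)
    }
}
```

The same four bytes `[3, 1, 2, 3]` decode as the byte string `[1, 2, 3]` when the caller asks for bytes, or as the numbers `3`, then `1` when it asks for `u8`s. Only the caller's request, here the analogue of `deserialize_bytes`, resolves the ambiguity.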

## Trade-off

Raw JSON now also accepts the byte-form (`"\x11..."` UTF-8 bytes vs.
`"11..."` hex string) because `deserialize_any` in serde_json's
self-describing mode dispatches based on the JSON token. We disambiguate
strictly by length in `visit_bytes`, so anything that's neither N bytes
nor 2*N bytes still errors. This is consistent with the OutPoint fix
in #708 — accept any shape, validate by length.

## Implementation note: no_std / no alloc

`dashcore_hashes` does not enable `serde/alloc` directly (it is enabled
only transitively, through `serde-std`). In builds without `serde-std`,
`Visitor::visit_byte_buf` and `visit_string` (defined behind serde's
`alloc` feature) are therefore unavailable. The `visit_seq` path uses a
stack array sized to fit the largest hash (64 bytes, sha512) instead of
a `Vec`, keeping the crate's no-alloc posture.

## Tests

Two new regression tests in `dash/src/hash_types.rs`:

- `serde_round_trip_through_internally_tagged_enum` — wraps a `Txid`
  in a `#[serde(tag = "type")]` enum, round-trips through
  `serde_json::Value` (which forces buffering through
  `ContentDeserializer`), and asserts the round-trip is identity. Also
  verifies the canonical hex-string form still deserializes and that
  bincode round-trip still succeeds via the byte/seq path.
- `serde_round_trip_through_internally_tagged_enum_pubkey_hash` —
  same shape with `PubkeyHash` (20-byte hash) to exercise the
  smaller-length disambiguation path.

`bincode` dev-dep updated to `features = ["serde"]` (same change as #708)
so the bincode regression assertion compiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai Bot commented May 5, 2026

Warning

Rate limit exceeded

@shumkov has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 32 minutes and 14 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 77f04166-a6d0-4bb1-87b8-e226724836b7

📥 Commits

Reviewing files that changed from the base of the PR and between d648444 and 3e74bd8.

📒 Files selected for processing (2)
  • dash/src/hash_types.rs
  • hashes/src/serde_macros.rs
📝 Walkthrough

This PR refactors serde deserialization for hash types by introducing a unified AnyShapeVisitor to replace separate hex and bytes visitors, handles multiple input formats (hex strings, raw bytes, sequences), enables the bincode serde feature in dev-dependencies, and adds regression tests for the new deserialization logic.

Changes

Serde Deserialization Refactor

| Layer | File(s) | Summary |
|---|---|---|
| Core Deserialization Logic | hashes/src/serde_macros.rs | Replaces HexVisitor and BytesVisitor with a single AnyShapeVisitor that handles ASCII hex strings, raw byte slices, hex-encoded UTF-8 bytes, and length-prefixed u8 sequences via visit_seq. Updates SerdeHash::deserialize to route human-readable cases through deserialize_any and non-human-readable through deserialize_bytes. |
| Test Dependencies | dash/Cargo.toml | Enables bincode serde feature in dev-dependencies to support serde round-trip testing. |
| Regression Tests | dash/src/hash_types.rs | Adds gated #[cfg(all(test, feature = "serde"))] tests validating Txid and PubkeyHash round-trip through serde_json with internally tagged enums, including non-human-readable (byte) and human-readable (hex string) deserialization paths, plus bincode compatibility for Txid. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 The desert hashes now flow through a single path so true,
AnyShape catches what comes through—hex, bytes, sequences too!
Tests whisper serde songs in tags internally bound,
Where round-trips and bincode now safely abound. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(hashes): make SerdeHash tolerant of ContentDeserializer's HR-quirk' directly and specifically summarizes the main change—fixing a serde deserialization incompatibility in the SerdeHash trait when wrapped by internally-tagged enums. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


shumkov added a commit to dashpay/platform that referenced this pull request May 5, 2026
…e PR

dashcore PR #729 (dashpay/rust-dashcore#729) is
the companion to #708 — same `ContentDeserializer` HR-quirk root cause,
but for the separate `hashes::serde_macros::SerdeHash` macro family
that generates `Txid` / `BlockHash` / `ProTxHash` / `PubkeyHash` /
`QuorumHash` etc. (vs. #708 which fixed `OutPoint` via
`serde_struct_human_string_impl!`).

Update the two `#[ignore]` notes on `Validator::value_round_trip` and
`ValidatorSet::value_round_trip` to reference #729 instead of the vague
"follow-up PR" phrasing. When #729 lands and we bump dashcore, drop the
`#[ignore]`s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
dash/src/hash_types.rs (1)

435-453: ⚡ Quick win

The byte-shape regression path is documented but not actually tested.

This block defines raw_txid_bytes and then discards it, so the test still doesn’t assert the exact failure mode (bytes replayed through ContentDeserializer in a tagged context). Please convert this into an executable deserialization assertion to lock the bugfix in.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dash/src/hash_types.rs` around lines 435 - 453, The test currently defines
raw_txid_bytes then drops it; change it to perform an actual
serialization/deserialization round-trip that exercises the tagged-enum +
ContentDeserializer path and assert the resulting Txid/newtype equals the
original bytes. Concretely, construct the same tagged enum JSON/serde Value that
would produce Value::Bytes32 (e.g., wrap raw_txid_bytes as the
non-human-readable bytes form used by platform_value), feed it through the same
deserialization path used in this test (invoking ContentDeserializer /
serde_json round-trip or serde_test bincode-like raw bytes), deserialize into
the Txid/newtype type used in this file, and add an
assert_eq!(deserialized_txid.as_bytes(), &raw_txid_bytes). This ensures the
previous "bad hex string length 32 (expected 64)" regression is covered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d7b29f5-9a72-45ed-b977-1bd5458f31e2

📥 Commits

Reviewing files that changed from the base of the PR and between d67cc03 and d648444.

📒 Files selected for processing (3)
  • dash/Cargo.toml
  • dash/src/hash_types.rs
  • hashes/src/serde_macros.rs

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 5, 2026
@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 90.47619% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.00%. Comparing base (d67cc03) to head (3e74bd8).
⚠️ Report is 1 commit behind head on v0.42-dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| hashes/src/serde_macros.rs | 80.48% | 8 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##           v0.42-dev     #729      +/-   ##
=============================================
+ Coverage      70.96%   71.00%   +0.03%     
=============================================
  Files            319      319              
  Lines          68387    68457      +70     
=============================================
+ Hits           48531    48605      +74     
+ Misses         19856    19852       -4     
| Flag | Coverage Δ |
|---|---|
| core | 75.92% <90.47%> (+0.09%) ⬆️ |
| ffi | 45.49% <ø> (ø) |
| rpc | 20.00% <ø> (ø) |
| spv | 87.52% <ø> (-0.01%) ⬇️ |
| wallet | 69.61% <ø> (ø) |

| Files with missing lines | Coverage Δ |
|---|---|
| dash/src/hash_types.rs | 63.87% <100.00%> (+13.87%) ⬆️ |
| hashes/src/serde_macros.rs | 85.91% <80.48%> (+20.00%) ⬆️ |

... and 3 files with indirect coverage changes

…flow

Two in-scope fixes from review:

1. The Txid round-trip test had an abandoned `raw_txid_bytes` literal
   followed by `let _ = raw_txid_bytes; // documentation only` — leftover
   exploration that misled readers into thinking the bytes were used.
   Replace with a real assertion that constructs a `serde_json::Value::Array`
   of u8 numbers, wraps it in a `#[serde(tag = "type")]` enum, and
   round-trips through `serde_json::from_value`. This now actually
   exercises the new `visit_seq` path through `ContentDeserializer` —
   the security review noted that the prior test only hit `visit_str`,
   leaving `visit_bytes`/`visit_seq` regression coverage thin.

2. The `MAX_HASH_BYTES = 64` overflow check in `visit_seq` was returning
   a runtime error with a debug-prose string ("recompile with larger
   MAX") that leaked an internal type name to user error logs. Convert
   to `debug_assert!` — failure mode is now a test panic in debug builds
   (caught at CI time when adding a wider hash type), zero overhead in
   release. The condition is unreachable in any release build that
   compiled at all, since adding a wider digest would require updating
   `serde_impl!` invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@shumkov shumkov self-assigned this May 5, 2026
@xdustinface xdustinface merged commit 56fe09d into v0.42-dev May 6, 2026
57 of 58 checks passed
@xdustinface xdustinface deleted the fix/hashes-serde-content-deserializer branch May 6, 2026 08:35