fix(hashes): make SerdeHash tolerant of ContentDeserializer's HR-quirk #729
Conversation
This is the same kind of serde-tag incompatibility fixed for `OutPoint` in #708, applied to the hash-newtype family (sha256, sha256d, hash160, hash_x11, ripemd160, sha1, sha512 — affecting Txid, BlockHash, ProTxHash, PubkeyHash, QuorumHash, and every other type generated by `hash_newtype!` or `serde_impl!`).

`SerdeHash::deserialize` used two separate visitors — a string-only HR visitor (`HexVisitor`) and a bytes-only non-HR visitor (`BytesVisitor`). That works fine in isolation but breaks the moment a hash-bearing struct is wrapped by an internally tagged enum (`#[serde(tag = "...")]`), `flatten`, or an untagged enum. Serde routes those through `ContentDeserializer`, a format-agnostic intermediate buffer that always reports `is_human_readable() == true` regardless of the upstream format. A value originally written by a non-HR encoder is therefore replayed into the HR branch as raw bytes, which the previous `HexVisitor::visit_str` saw as "32 chars" instead of "64-char hex" and rejected with `bad hex string length 32 (expected 64)`.

This was hit downstream in dashpay/platform when validators / validator sets (which contain `ProTxHash`, `PubkeyHash`, `QuorumHash`) were configured for the dpp `tag = "$formatVersion"` versioning convention.

## Fix

Rework `SerdeHash::deserialize` to use a single `AnyShapeVisitor` that accepts every shape a hash can arrive in:

- `visit_str` / `visit_borrowed_str` — ASCII hex (canonical HR form).
- `visit_bytes` / `visit_borrowed_bytes` — disambiguated by length: exactly `N` bytes → raw hash, exactly `2*N` bytes → UTF-8 hex. Any other length is rejected.
- `visit_seq` — length-prefixed `u8` sequence (used by bincode and other non-self-describing formats).

Use `deserialize_any` in the HR branch so the actual content shape — not the reported HR flag — drives dispatch. Keep `deserialize_bytes` in the non-HR branch since bincode is non-self-describing and does not support `deserialize_any`.
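The length-based disambiguation can be sketched in isolation. This is a minimal stand-alone reduction, not the crate's actual code: the serde `Visitor` plumbing is omitted, and `bytes_to_hash` / `hex_val` are illustrative names for what the `visit_bytes` arm decides.

```rust
// Decode one hex digit; None for anything that isn't ASCII hex.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

// Disambiguate strictly by length: exactly N bytes is a raw hash
// (as replayed through ContentDeserializer), exactly 2*N bytes is
// treated as UTF-8 hex; any other length is rejected.
fn bytes_to_hash<const N: usize>(v: &[u8]) -> Result<[u8; N], String> {
    if v.len() == N {
        let mut out = [0u8; N];
        out.copy_from_slice(v);
        Ok(out)
    } else if v.len() == 2 * N {
        let mut out = [0u8; N];
        for (i, pair) in v.chunks_exact(2).enumerate() {
            let hi = hex_val(pair[0]).ok_or("bad hex digit")?;
            let lo = hex_val(pair[1]).ok_or("bad hex digit")?;
            out[i] = (hi << 4) | lo;
        }
        Ok(out)
    } else {
        Err(format!("bad length {} (expected {} or {})", v.len(), N, 2 * N))
    }
}

fn main() {
    // Raw form: N bytes pass through unchanged.
    assert_eq!(bytes_to_hash::<4>(&[0x11u8; 4]).unwrap(), [0x11u8; 4]);
    // Hex form: 2*N ASCII hex chars decode to the same N bytes.
    assert_eq!(bytes_to_hash::<4>(b"11111111").unwrap(), [0x11u8; 4]);
    // Anything else errors.
    assert!(bytes_to_hash::<4>(&[0u8; 5]).is_err());
    println!("ok");
}
```

In the real visitor, the big-endian/display-order conversion of the hash newtype happens after this step; the sketch only shows the shape dispatch.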
## Trade-off

Raw JSON now also accepts the byte form (`"\x11..."` UTF-8 bytes vs. `"11..."` hex string) because `deserialize_any` in serde_json's self-describing mode dispatches based on the JSON token. We disambiguate strictly by length in `visit_bytes`, so anything that's neither `N` bytes nor `2*N` bytes still errors. This is consistent with the OutPoint fix in #708 — accept any shape, validate by length.

## Implementation note: no_std / no alloc

`dashcore_hashes` does not enable `serde/alloc` (it has only `serde-std`, which transitively gates that), so `Visitor::visit_byte_buf` and `visit_string` (defined behind serde's `alloc` feature) are unavailable. The `visit_seq` path uses a stack array sized to fit the largest hash (64 bytes — sha512) instead of a `Vec`, keeping the crate's no-alloc posture.

## Tests

Two new regression tests in `dash/src/hash_types.rs`:

- `serde_round_trip_through_internally_tagged_enum` — wraps a `Txid` in a `#[serde(tag = "type")]` enum, round-trips through `serde_json::Value` (which forces buffering through `ContentDeserializer`), and asserts the round-trip is identity. Also verifies the canonical hex-string form still deserializes and that bincode round-trip still succeeds via the byte/seq path.
- `serde_round_trip_through_internally_tagged_enum_pubkey_hash` — same shape with `PubkeyHash` (20-byte hash) to exercise the smaller-length disambiguation path.

`bincode` dev-dep updated to `features = ["serde"]` (same change as #708) so the bincode regression assertion compiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e PR

dashcore PR #729 (dashpay/rust-dashcore#729) is the companion to #708 — same `ContentDeserializer` HR-quirk root cause, but for the separate `hashes::serde_macros::SerdeHash` macro family that generates `Txid` / `BlockHash` / `ProTxHash` / `PubkeyHash` / `QuorumHash` etc. (vs. #708, which fixed `OutPoint` via `serde_struct_human_string_impl!`). Update the two `#[ignore]` notes on `Validator::value_round_trip` and `ValidatorSet::value_round_trip` to reference #729 instead of the vague "follow-up PR" phrasing. When #729 lands and we bump dashcore, drop the `#[ignore]`s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
dash/src/hash_types.rs (1)
435-453: ⚡ Quick win — The byte-shape regression path is documented but not actually tested.

This block defines `raw_txid_bytes` and then discards it, so the test still doesn't assert the exact failure mode (bytes replayed through `ContentDeserializer` in a tagged context). Please convert this into an executable deserialization assertion to lock the bugfix in.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dash/src/hash_types.rs` around lines 435-453: the test currently defines raw_txid_bytes then drops it; change it to perform an actual serialization/deserialization round-trip that exercises the tagged-enum + ContentDeserializer path and asserts the resulting Txid/newtype equals the original bytes. Concretely, construct the same tagged-enum JSON/serde Value that would produce Value::Bytes32 (e.g., wrap raw_txid_bytes as the non-human-readable bytes form used by platform_value), feed it through the same deserialization path used in this test (invoking ContentDeserializer / serde_json round-trip or serde_test bincode-like raw bytes), deserialize into the Txid/newtype type used in this file, and add an assert_eq!(deserialized_txid.as_bytes(), &raw_txid_bytes). This ensures the previous "bad hex string length 32 (expected 64)" regression is covered.
📒 Files selected for processing (3)
dash/Cargo.toml, dash/src/hash_types.rs, hashes/src/serde_macros.rs
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##          v0.42-dev     #729    +/-  ##
=============================================
+ Coverage     70.96%   71.00%   +0.03%
=============================================
  Files           319      319
  Lines         68387    68457      +70
=============================================
+ Hits          48531    48605      +74
+ Misses        19856    19852       -4
```
…flow
Two in-scope fixes from review:
1. The Txid round-trip test had an abandoned `raw_txid_bytes` literal
followed by `let _ = raw_txid_bytes; // documentation only` — leftover
exploration that misled readers into thinking the bytes were used.
Replace with a real assertion that constructs a `serde_json::Value::Array`
of u8 numbers, wraps it in a `#[serde(tag = "type")]` enum, and
round-trips through `serde_json::from_value`. This now actually
exercises the new `visit_seq` path through `ContentDeserializer` —
the security review noted that the prior test only hit `visit_str`,
leaving `visit_bytes`/`visit_seq` regression coverage thin.
2. The `MAX_HASH_BYTES = 64` overflow check in `visit_seq` was returning
a runtime error with a debug-prose string ("recompile with larger
MAX") that leaked an internal type name to user error logs. Convert
to `debug_assert!` — failure mode is now a test panic in debug builds
(caught at CI time when adding a wider hash type), zero overhead in
release. The condition is unreachable in any release build that
compiled at all, since adding a wider digest would require updating
`serde_impl!` invocations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
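The no-alloc `visit_seq` shape described above can be sketched as a stand-alone reduction. This is illustrative only, with the serde `SeqAccess` plumbing replaced by a plain iterator; `MAX_HASH_BYTES` and `collect_seq` are stand-in names, not the crate's actual items.

```rust
// Widest supported digest (sha512); the buffer lives on the stack,
// so no allocator is needed even under no_std / no alloc.
const MAX_HASH_BYTES: usize = 64;

// Accumulate a u8 sequence of known expected length into a fixed
// stack buffer; the caller would then slice off the first `expected`
// bytes as the hash.
fn collect_seq<I: Iterator<Item = u8>>(
    elems: I,
    expected: usize,
) -> Result<[u8; MAX_HASH_BYTES], String> {
    // Unreachable in a release build that compiled at all: adding a
    // wider digest would require updating the macro invocations first,
    // so a debug_assert (caught at CI time) suffices.
    debug_assert!(expected <= MAX_HASH_BYTES, "hash wider than stack buffer");
    let mut buf = [0u8; MAX_HASH_BYTES];
    let mut len = 0;
    for b in elems {
        if len == expected {
            return Err(format!("sequence longer than {expected}"));
        }
        buf[len] = b;
        len += 1;
    }
    if len != expected {
        return Err(format!("bad sequence length {len} (expected {expected})"));
    }
    Ok(buf)
}

fn main() {
    // A 32-byte (Txid-sized) sequence fills the front of the buffer.
    let buf = collect_seq(0u8..32, 32).unwrap();
    assert_eq!(buf[0], 0);
    assert_eq!(buf[31], 31);
    // Too-short and too-long sequences are rejected.
    assert!(collect_seq(0u8..3, 32).is_err());
    assert!(collect_seq(0u8..64, 32).is_err());
    println!("ok");
}
```

Because failure of the size check is a programmer error rather than bad input, the `debug_assert!` conversion trades a user-facing error string for a zero-cost release build, as the commit message explains.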
Summary
Same kind of serde-tag incompatibility as #708, but in a different macro family. #708 fixed `OutPoint`'s `serde_struct_human_string_impl!`. This PR fixes the hash-newtype family: `SerdeHash::deserialize` (in `hashes/src/serde_macros.rs`), used by every `hash_newtype!` / `serde_impl!`-generated type — `Txid`, `BlockHash`, `ProTxHash`, `PubkeyHash`, `QuorumHash`, all the sha256/sha256d/hash160/hash_x11 wrappers.

`SerdeHash::deserialize` used two separate visitors — a string-only HR visitor (`HexVisitor`) and a bytes-only non-HR visitor (`BytesVisitor`). That works fine in isolation but breaks the moment a hash-bearing struct is wrapped by an internally tagged enum (`#[serde(tag = "...")]`), `flatten`, or an untagged enum: serde routes those through `ContentDeserializer`, a format-agnostic intermediate buffer that always reports `is_human_readable() == true` regardless of the upstream format. A value originally written by a non-HR encoder is therefore replayed into the HR branch as raw bytes, which the previous `HexVisitor::visit_str` saw as "32 chars" instead of "64-char hex" and rejected with `bad hex string length 32 (expected 64)`.

This was hit downstream in dashpay/platform when `Validator` / `ValidatorSet` (which contain `ProTxHash`, `PubkeyHash`, `QuorumHash`) were configured for the dpp `tag = "$formatVersion"` versioning convention.

## Fix

Rework `SerdeHash::deserialize` to use a single `AnyShapeVisitor` that accepts every shape a hash can arrive in:

- `visit_str` / `visit_borrowed_str` — ASCII hex (canonical HR form).
- `visit_bytes` / `visit_borrowed_bytes` — disambiguated by length: exactly `N` bytes → raw hash, exactly `2*N` bytes → UTF-8 hex. Any other length errors.
- `visit_seq` — length-prefixed `u8` sequence (bincode and similar).

Use `deserialize_any` in the HR branch so the actual content shape — not the reported HR flag — drives dispatch. Keep `deserialize_bytes` in the non-HR branch since bincode is non-self-describing.

This is the well-established serde workaround documented in third-party crates that hit the same wall (e.g. `BinaryData`, `Identifier`, `Bytes32` in `rs-platform-value`, and now `OutPoint` after #708).

## Trade-off

Raw JSON now also accepts the byte form (`"\x11..."` UTF-8 bytes vs. `"11..."` hex string) because `deserialize_any` in serde_json's self-describing mode dispatches on the JSON token. We disambiguate strictly by length in `visit_bytes`, so anything that's neither `N` nor `2*N` bytes still errors. This is consistent with the OutPoint fix.

## Implementation note: no_std / no alloc

`dashcore_hashes` does not enable `serde/alloc` (only `serde-std`, which transitively does), so `Visitor::visit_byte_buf` and `visit_string` (gated behind serde's `alloc` feature) are unavailable. The `visit_seq` path uses a stack array sized to fit the largest hash (64 bytes — sha512) instead of a `Vec`, keeping the crate's no-alloc posture.

## Tests

Two regression tests in `dash/src/hash_types.rs`:

- `serde_round_trip_through_internally_tagged_enum` — wraps a `Txid` in a `#[serde(tag = "type")]` enum, round-trips through `serde_json::Value` (which forces buffering through `ContentDeserializer`), and asserts identity. Also verifies the canonical hex-string form still deserializes and bincode round-trip still succeeds via the byte/seq path.
- `serde_round_trip_through_internally_tagged_enum_pubkey_hash` — same shape with `PubkeyHash` (20-byte hash) to exercise the smaller-length disambiguation path.

`bincode` dev-dep updated to `features = ["serde"]` (same change as #708) so the bincode regression assertion compiles.

## Local test results

- `dashcore_hashes`: 7 passed, 0 failed.
- `dashcore --features serde`: 551 passed, 0 failed (17 pre-existing ignores), including the two new tests.

## Related

- #708 (`OutPoint` / `serde_struct_human_string_impl!`).
- dashpay/platform tests for `Validator` / `ValidatorSet`, which were marked `#[ignore]` waiting for this fix.

🤖 Generated with Claude Code