fix(rs-sdk,drive-abci): SDK emits incompatible getDocuments wire against pre-v3.1 networks#3699
Closed
lklimek wants to merge 6 commits into
Closed
fix(rs-sdk,drive-abci): SDK emits incompatible getDocuments wire against pre-v3.1 networks#3699lklimek wants to merge 6 commits into
lklimek wants to merge 6 commits into
Conversation
…l-version builder Adds SdkBuilder::with_initial_version() for auto-detect SDKs that must talk to a network whose protocol version is older than the binary's PlatformVersion::latest() (e.g. v3.0 testnet from a v3.1+ SDK). Unlike with_version(), this leaves auto-detect active so the existing fetch_max ratchet can still pick up newer network versions. Adds V0/V1 dispatch to the DocumentQuery -> GetDocumentsRequest encoder, driven by the platform_version's drive_abci.query.document_query.default_current_version feature version. V0 ships the legacy CBOR-encoded where/order_by shape; v1-only fields (selects/group_by/having/count_star projections) are rejected with Error::Config at request build time rather than silently emitting a malformed V0 request the server would round-trip-and-reject. The SDK trampolines (Fetch::fetch_with_metadata_and_proof, FetchMany::fetch_many_with_metadata_and_proof) populate the new DocumentQuery.protocol_version_override field from sdk.protocol_version_number() before transport. Dispatch is a single TypeId comparison via std::any::Any; no-op for every non-document request type. Adds a 'static bound to the Fetch::Request / FetchMany::Request associated types (all existing proto-generated request types satisfy it). Fixes the misleading 'could not decode data contracts query' error text emitted by the documents-query decoder when the V1 oneof tag is absent (e.g. when a v3.1 SDK sends V1 to a v3.0 server that only knows V0). The data-contracts handler still uses its own correct string. Tests cover V0/V1 wire-shape parity, dispatch by SDK version, v1-only feature rejection on V0, and with_initial_version atomic seeding semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…sion_override Replaces the smuggling-via-DocumentQuery-field mechanism added in 8d5de89 with a direct &Sdk argument on Query::query(). The GetDocumentsRequest V0/V1 encoder now reads sdk.version() at the call site, eliminating: - DocumentQuery::protocol_version_override field - #[cfg_attr(feature = "mocks", serde(skip))] workaround - apply_sdk_protocol_version helper + + 'static trait bounds - TypeId::downcast_mut hack in Fetch / FetchMany trampolines Same observable behaviour; cleaner trait shape; PV is now a first-class concern in the Query trait for future versioned request types.
… + adopt TryFromPlatformVersioned PV_V1..V10 were wired to DRIVE_ABCI_QUERY_VERSIONS_V1, causing the SDK to emit V1 getDocuments wire when seeded with an older PV. Testnet v3.0 HPMNs (PV_11) reject this. Sibling fix to 2b8eae0 which re-pinned PV_11. Also collapses encode_get_documents_request freestanding helper into a TryFromPlatformVersioned impl, removing one indirection layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ocument-query-v0-v1
…transport Commit 34e0395 added TryFromPlatformVersioned<DocumentQuery> with V0/V1 dispatch, but DocumentQuery::execute_transport still went through the ambient TryFrom<DocumentQuery> trap that hardcoded PlatformVersion::latest() (= PV_12 = V1 wire). The runtime PV from sdk.version() never reached the encoder, so v3.0 testnet (PV_11, V0 wire) still received V1 bytes and rejected with the "could not decode" storm (165 occurrences in last funded e2e run). This commit: - Adds a DocumentQuery.wire_protocol_version pin and reads it in execute_transport via TryFromPlatformVersioned (falls back to PlatformVersion::latest() with a debug trace when unset, so a direct TransportRequest caller is loud not silent). - Sets the pin from the Query<T> for T blanket impl in platform/query.rs via a runtime Any-downcast on the cloned request (the blanket is the only Query<DocumentQuery> path the Fetch / FetchMany trampolines reach, since Fetch::Request for Document is DocumentQuery, not GetDocumentsRequest). Lower-blast-radius than removing the blanket and re-impling for ~50 proto types. - Deletes the ambient TryFrom<DocumentQuery> for GetDocumentsRequest impl (silent PlatformVersion::latest() default was the trap). - Adds tracing::debug! at the encoder dispatch site and reworks the PV ratchet info! to use a stable target/from/to shape (closes Marvin's QA-004 observability gap). - #[serde(skip)] on wire_protocol_version keeps existing mock vectors hash-stable: the pin is a transport-side dispatch input, not part of the query's identity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the architectural smell flagged after 8c0d614 and prevents recurrence for the 58 other tracked query operations that may grow versioned wire in the future. Removes the `Any::downcast_mut::<DocumentQuery>()` runtime type-erasure from the blanket `Query<T> for T` impl and the `wire_protocol_version: Option<u32>` field from `DocumentQuery`. The PV-aware encoder now runs inside `Query::query(&self, sdk)` where `&Sdk` is in hand — extending V0/V1 dispatch to any future versioned request type is a 5-line `impl Query<NewWireType> for NewRichType` away. # What changed * `Fetch::Query` (new associated type): the user-facing query that callers hand to the SDK and that `FromProof` binds to. For non-versioned operations `type Query = Self::Request` (one extra line per impl); for documents and the six aggregate views (`DocumentCount`/`Sum`/`Average`/`SplitCounts`/ `SplitSums`/`SplitAverages`) `type Query = DocumentQuery` (the rich form with data contract context) and `type Request = GetDocumentsRequest` (the wire). * `FromProof<Self::Query>` (was `FromProof<Self::Request>`): the proof verifier surface keeps binding to the rich form unchanged — zero changes to the eight `FromProof<DocumentQuery>` impls in `packages/rs-sdk/src/platform/documents/`. * `Sdk::parse_proof_with_metadata_and_proof` (renamed parameter source): takes `method_name: &'static str` as an explicit argument instead of reading it from `O::Request: TransportRequest`, since the rich query is no longer required to implement `TransportRequest`. Trampoline call sites pass `wire.method_name()` explicitly. * `DocumentQuery: TransportRequest` impl removed. Only `GetDocumentsRequest` implements `TransportRequest` now. Direct callers that constructed a `DocumentQuery` and pushed it through `rs-dapi-client` no longer compile — they should call `Document::fetch(...)` or `DocumentQuery::try_into_request_for_version(pv)` instead. * `'static` bound dropped from the `Query<T> for T` blanket (was only required by the deleted `Any::downcast`). * Mock infrastructure: `MockDashPlatformSdk::expect[_many]` now keys the `from_proof_expectations` cache on the rich `Self::Query` (preserving the protocol-version-agnostic mock key property) while keying the DAPI executor mock on the wire `Self::Request` (where the proto bytes actually flow). The internal `expect` / `remove` helpers take both args explicitly. # Mock vector regeneration Existing checked-in vectors under `packages/rs-sdk/tests/vectors/document_*/` were captured with filenames `msg_DocumentQuery_<hash>.json`. After this refactor the DAPI executor dumps the wire `GetDocumentsRequest` instead, so the filenames become `msg_GetDocumentsRequest_<hash>.json` with hashes computed from proto bytes (PV-coupled). Affected tests in `packages/rs-sdk/tests/fetch/document.rs` are gated with `#[ignore = "vectors require regeneration after Fetch::Query/Fetch::Request split (γ refactor); see commit body"]`: * `document_read` * `document_read_no_document` * `document_list_drive_query` * `document_list_document_query` * `document_list_bug_value_text_decode_base58_PLAN_653` To regenerate, run the live-testnet path that produces vectors (`yarn start && yarn test:sdk` or per-test `cargo test ... -- --ignored` with `DUMP_DIR` set per the sdk dump conventions). After regen, the old `msg_DocumentQuery_*.json` files can be deleted. The remaining document-related tests use `expect_fetch` programmatically and register expectations at test runtime — those continue to pass without any vector changes (`tests/fetch/document_count.rs`, `tests/fetch/mock_fetch.rs`, `tests/fetch/mock_fetch_many.rs::test_mock_document_fetch_many`). # Public API breaks (acceptable on v3.1-dev) * `<Document as Fetch>::Request` is now `GetDocumentsRequest`, not `DocumentQuery`. Code that named this explicitly breaks. * `DocumentQuery` no longer implements `TransportRequest`. Callers using `DocumentQuery` directly with `rs-dapi-client::DapiRequest` break. * `DocumentQuery::wire_protocol_version` public field is removed. * `Query::query` keeps the `(prove: bool, sdk: &Sdk)` signature for this commit (the `prove` parameter collapse is the planned second commit per the 3-commit Phase B plan). # Verification `cargo check --workspace --exclude wasm-sdk --exclude wasm-dpp --exclude rs-sdk-ffi` clean. `cargo check -p wasm-sdk` clean. `cargo test -p dash-sdk --features mocks,offline-testing --lib` → 138 passed, 0 failed, 6 ignored. `cargo test -p dash-sdk --features mocks,offline-testing --tests` → 127 passed, 0 failed, 8 ignored (the +4 ignores are the document tests gated above; one was pre-existing). `cargo test -p drive-abci --lib query` → 585 passed, 0 failed, 1 ignored. `cargo test -p platform-version` → 5 passed, 0 failed. `cargo test -p platform-wallet --no-run` clean. `cargo fmt --all` applied; `cargo clippy -p dash-sdk --features mocks,offline-testing --tests -- -D warnings` clean. `rg 'Any::downcast' packages/rs-sdk/src` returns nothing. `rg wire_protocol_version packages --include='*.rs'` returns nothing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6 tasks
Contributor
Author
|
replaced by #3711 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug
SDK clients running against any network older than Dash Platform v3.1 (e.g. v3.0 testnet, currently the shared testnet) failed every
getDocumentsrequest with a server-side decode error and a cascade of side-effects:gRPC Unknown { message: "decoding error: could not decode data contracts query" }. The text says "data contracts" because v3.0'srs-drive-abci/src/query/document_query/mod.rs:31has a copy-paste typo in its error string — the failing handler is in fact the documents query.rs-dapi-clientsaw an unfamiliarUnknownand ban-listed each masternode that returned one. With every node in the routing pool returning the same error on the first request, the pool drained toNoAvailableAddresseswithin seconds.Concretely, the funded e2e
dpns_001_register_and_resolve_namereproduced this with 165 decode errors in a single test run before the test panic'd onwait_for_dpns_name_visibletiming out.Root cause
PR #3633 (
feat(platform): getDocuments v1) added a V1 wire shape forGetDocumentsRequest(structuredWhereClause/OrderClause,optional uint32 limit,selects/group_by/having/offset) alongside the legacy V0 shape (CBORwhere/order_by, plainuint32 limit). To gate the new shape, it bumpedDriveAbciQueryDocumentVersions::document_query.{max_version, default_current_version}from0to1inside the existingDRIVE_ABCI_QUERY_VERSIONS_V1bundle.No
_V0bundle was forked. EveryPROTOCOL_VERSION_Nconstant fromPROTOCOL_VERSION_1throughPROTOCOL_VERSION_11continued to binddrive_abci.query = DRIVE_ABCI_QUERY_VERSIONS_V1, retroactively claiming those historical protocol versions reported V1 doc-query — even thoughv3.0.0(the live testnet release, PV_11) ships and only understands V0 wire on the server.The SDK side compounded this:
TryFrom<DocumentQuery> for GetDocumentsRequesthardcodedPlatformVersion::latest()(= PV_12 = V1) regardless of which network the SDK was talking to.Fetch::Request = DocumentQuerymeant the wire encoding had to happen insideDocumentQuery::execute_transport, where&Sdkis not in scope, so the encoder could not consultsdk.version()even if it wanted to.Net result: SDK unconditionally emits V1 wire bytes; v3.0 server can't decode them; bans cascade; storm.
Fix
Five coordinated pieces:
DRIVE_ABCI_QUERY_VERSIONS_V0bundle (packages/rs-platform-version/src/version/drive_abci_versions/drive_abci_query_versions/v0.rs). Verbatim fork of_V1exceptdocument_querypinned to{min:0, max:0, default_current:0}. Restores correct V0 semantics for historical PVs.Re-bind
PROTOCOL_VERSION_V1..V11to the new_V0bundle. PV_12 (Dash Platform v3.1-dev, the genuine V1 consumer) is untouched and keeps_V1. Future versioned-query promotions will follow the same fork-then-pin pattern — they no longer leak backward through history.PV-aware encoder for documents queries via a new
TryFromPlatformVersioned<DocumentQuery> for GetDocumentsRequestimpl inpackages/rs-sdk/src/platform/documents/document_query.rs. Dispatches onplatform_version.drive_abci.query.document_query.default_current_version:0→ V0 wire (CBOR),1→ V1 wire (structured), anything else →Error::Config. V1-only features (group_by,having, count/sum/avg projections) reject at request-build time rather than emitting malformed V0 that the server round-trips and rejects with an opaque error.SdkBuilder::with_initial_version(&PlatformVersion)(packages/rs-sdk/src/sdk.rs). Seeds the SDK's protocol-version atomic at boot without disabling auto-detect, so a client can start pinned to V0 wire and let the existingmaybe_update_protocol_versionratchet upward (fetch_max) the first time it sees a higher-PV response. Lets wallets boot against v3.0 testnet today and continue to work as testnet upgrades.Connect the encoder to the live transport path via a refactor of the
Fetchtrait shape.Fetch::Query(rich, user-facing — whatFromProofbinds to and what tests/mocks key on) is now distinct fromFetch::Request(the wire-encoded proto). For documents,type Query = DocumentQuery; type Request = GetDocumentsRequest;. Wire encoding moves fromDocumentQuery::execute_transport(where&Sdkwas unreachable) intoimpl Query<GetDocumentsRequest> for DocumentQuery::query(&self, sdk: &Sdk)(wheresdk.version()is in scope).Query::query()gains&Sdk; every other request type mapstype Query = type Request = Xwith no behavioural change. The previous "smuggle PV through a struct field plus anAny-downcast in the blanketQueryimpl" workaround is gone — wire-version awareness is now compiler-enforced per request type, and the same pattern naturally extends to the other 58 query operations tracked inDriveAbciQueryVersionsif any of them grow versioned wire shapes later.Plus one server-side housekeeping commit: the misleading
"could not decode data contracts query"error string inrs-drive-abci/src/query/document_query/mod.rs:31is corrected to"could not decode documents query". v3.0 testnet still reports the typo (the fix shipped post-v3.0), so the bug is also still observable from older nodes — but new releases will surface a less confusing error.Test plan
Unit and integration:
Live-network (against current shared v3.0 testnet via the sibling wallet PR #3549's e2e harness):
dpns_001_register_and_resolve_namesolo run withRUST_LOG=dash_sdk=trace:"could not decode data contracts query"occurrences (vs 165 pre-fix and 24 in a partial-fix iteration)feature_version=0 protocol_version=11from=10 to=118c0d6142ad): 32 / 38 pass; remaining 6 are pre-existing Found-021/022 RED-by-design pins and unrelated token/asset-lock issues, none in the V0/V1 dispatch path; 0 decode errors across the entire sweep at concurrent load.Four document-fetch offline tests are marked
#[ignore]pending vector regeneration (theFetch::Query/Fetch::Requestsplit changed the mock cache key for documents). Vectors will be regenerated against testnet in a follow-up; the runtime behaviour they were meant to capture is exercised by the live-network e2e suite above.Breaking changes
Target branch is
v3.1-dev. Out-of-tree code that builds against the SDK will see:trait Query<T>::query()— signature changed. Wasfn query(&self, prove: bool) -> Result<T, Error>; nowfn query(&self, prove: bool, sdk: &Sdk) -> Result<T, Error>. Any externalimpl Query<T> for MyTypeneeds to add the SDK parameter (typically_sdk: &crate::Sdkif unused).trait Fetch— newtype Queryassociated type. Every externalimpl Fetch for MyTypeneeds to addtype Query = Self::Request;unless the type wants a distinct rich/wire split.DocumentQueryno longer implementsTransportRequest. Code that was sending aDocumentQuerydirectly viars-dapi-client::DapiRequestwill not compile. UseDocument::fetch(sdk, query)(which now handles the PV-aware encoding internally) orGetDocumentsRequest::try_from_platform_versioned(query, sdk.version())?for the explicit transport request.SdkBuilder::with_initial_version(&PlatformVersion)is purely additive — it does not replacewith_version. The latter still pins the SDK to a single version with auto-detect disabled.with_initial_versionseeds the atomic and leaves the ratchet free to advance.Related
feat/rs-platform-wallet-e2e— carries the same fix on a wallet feature branch, plus the funded e2e harness that observably confirms the fix against testnet)Checklist
For repository code-owners and reviewers only
🤖 Generated with Claude Code