fix(rs-sdk,drive-abci): SDK emits incompatible getDocuments wire against pre-v3.1 networks #3699

Closed
lklimek wants to merge 6 commits into v3.1-dev from backport/rs-sdk-document-query-v0-v1

Conversation

@lklimek lklimek commented May 20, 2026

Bug

SDK clients running against any network older than Dash Platform v3.1 (e.g. v3.0 testnet, currently the shared testnet) failed every getDocuments request with a server-side decode error and a cascade of side-effects:

  • Drive-abci HPMNs returned gRPC Unknown { message: "decoding error: could not decode data contracts query" }. The text says "data contracts" because v3.0's rs-drive-abci/src/query/document_query/mod.rs:31 has a copy-paste typo in its error string — the failing handler is in fact the documents query.
  • rs-dapi-client saw an unfamiliar Unknown and ban-listed each masternode that returned one. With every node in the routing pool returning the same error on the first request, the pool drained to NoAvailableAddresses within seconds.
  • All Platform features depending on document fetches (DPNS resolve, DashPay contact discovery, identity-document lookup, any DPP data-contract query) became unusable.
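
The cascade in the second bullet can be sketched as a toy model. This is only an illustration under stated assumptions: `Pool`, `Code`, and `PoolError` are hypothetical stand-ins rather than rs-dapi-client types, and the real ban policy is likely more nuanced than "ban on any Unknown".

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a gRPC status code (not the real tonic type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Code {
    Ok,
    Unknown,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PoolError {
    NoAvailableAddresses,
}

/// Toy address pool: any node that answers with `Unknown` gets banned.
pub struct Pool {
    nodes: Vec<&'static str>,
    banned: HashSet<&'static str>,
}

impl Pool {
    pub fn new(nodes: Vec<&'static str>) -> Self {
        Self { nodes, banned: HashSet::new() }
    }

    /// Picks the first node that is not banned.
    fn pick(&self) -> Result<&'static str, PoolError> {
        self.nodes
            .iter()
            .copied()
            .find(|n| !self.banned.contains(n))
            .ok_or(PoolError::NoAvailableAddresses)
    }

    /// Sends one request, retrying on the next node after each ban.
    /// `respond` models what the server at a given address returns.
    pub fn request(
        &mut self,
        respond: impl Fn(&str) -> Code,
    ) -> Result<&'static str, PoolError> {
        loop {
            let node = self.pick()?;
            match respond(node) {
                Code::Ok => return Ok(node),
                Code::Unknown => {
                    self.banned.insert(node);
                }
            }
        }
    }
}
```

When every node rejects the same malformed request, a single call bans the whole pool, and later well-formed requests fail too because no address is left to try.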

Concretely, the funded e2e test dpns_001_register_and_resolve_name reproduced this, logging 165 decode errors in a single run before panicking when wait_for_dpns_name_visible timed out.

Root cause

PR #3633 (feat(platform): getDocuments v1) added a V1 wire shape for GetDocumentsRequest (structured WhereClause/OrderClause, optional uint32 limit, selects / group_by / having / offset) alongside the legacy V0 shape (CBOR where / order_by, plain uint32 limit). To gate the new shape, it bumped DriveAbciQueryDocumentVersions::document_query.{max_version, default_current_version} from 0 to 1 inside the existing DRIVE_ABCI_QUERY_VERSIONS_V1 bundle.

No _V0 bundle was forked. Every PROTOCOL_VERSION_N constant from PROTOCOL_VERSION_1 through PROTOCOL_VERSION_11 continued to bind drive_abci.query = DRIVE_ABCI_QUERY_VERSIONS_V1, retroactively claiming that those historical protocol versions supported the V1 document query, even though v3.0.0 (the live testnet release, PV_11) ships a server that only understands the V0 wire shape.
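
A minimal sketch of why bumping a shared bundle leaks backward, and how a forked bundle avoids it. The structs are heavily simplified stand-ins for the rs-platform-version types, and `bundle_before_fix` / `bundle_after_fix` are hypothetical helpers that model the per-protocol-version binding.

```rust
/// Simplified stand-in for the per-feature version bounds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FeatureVersionBounds {
    pub min_version: u16,
    pub max_version: u16,
    pub default_current_version: u16,
}

/// Simplified stand-in for a query-version bundle (one field shown).
#[derive(Debug, PartialEq, Eq)]
pub struct QueryVersions {
    pub document_query: FeatureVersionBounds,
}

/// The bundle after the V1 bump: anything bound to it advertises V1 wire.
pub static QUERY_VERSIONS_V1: QueryVersions = QueryVersions {
    document_query: FeatureVersionBounds {
        min_version: 0,
        max_version: 1,
        default_current_version: 1,
    },
};

/// The forked bundle: same shape, but `document_query` is pinned to 0.
pub static QUERY_VERSIONS_V0: QueryVersions = QueryVersions {
    document_query: FeatureVersionBounds {
        min_version: 0,
        max_version: 0,
        default_current_version: 0,
    },
};

/// Before the fix: every protocol version shares the one bumped bundle.
pub fn bundle_before_fix(_protocol_version: u32) -> &'static QueryVersions {
    &QUERY_VERSIONS_V1
}

/// After the fix: protocol versions up to 11 bind the fork, 12+ keep V1.
pub fn bundle_after_fix(protocol_version: u32) -> &'static QueryVersions {
    if protocol_version <= 11 {
        &QUERY_VERSIONS_V0
    } else {
        &QUERY_VERSIONS_V1
    }
}
```

With the shared binding, asking for protocol version 11 yields `default_current_version == 1`; with the fork it yields 0, which matches what a v3.0 server can decode.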

The SDK side compounded this:

  1. TryFrom<DocumentQuery> for GetDocumentsRequest hardcoded PlatformVersion::latest() (= PV_12 = V1) regardless of which network the SDK was talking to.
  2. There was no way to seed the SDK with an "I'm starting against an older network" hint, and the protocol-version auto-detection ratchet only fires after a successful response — which couldn't happen because the first request was already malformed.
  3. Fetch::Request = DocumentQuery meant the wire encoding had to happen inside DocumentQuery::execute_transport, where &Sdk is not in scope, so the encoder could not consult sdk.version() even if it wanted to.

Net result: the SDK unconditionally emits V1 wire bytes, the v3.0 server cannot decode them, and the resulting bans cascade until the address pool is empty.

Fix

Five coordinated pieces:

  1. DRIVE_ABCI_QUERY_VERSIONS_V0 bundle (packages/rs-platform-version/src/version/drive_abci_versions/drive_abci_query_versions/v0.rs). Verbatim fork of _V1 except document_query pinned to {min:0, max:0, default_current:0}. Restores correct V0 semantics for historical PVs.

  2. Re-bind PROTOCOL_VERSION_V1..V11 to the new _V0 bundle. PV_12 (Dash Platform v3.1-dev, the genuine V1 consumer) is untouched and keeps _V1. Future versioned-query promotions will follow the same fork-then-pin pattern — they no longer leak backward through history.

  3. PV-aware encoder for documents queries via a new TryFromPlatformVersioned<DocumentQuery> for GetDocumentsRequest impl in packages/rs-sdk/src/platform/documents/document_query.rs. Dispatches on platform_version.drive_abci.query.document_query.default_current_version: 0 → V0 wire (CBOR), 1 → V1 wire (structured), anything else → Error::Config. V1-only features (group_by, having, count/sum/avg projections) reject at request-build time rather than emitting malformed V0 that the server round-trips and rejects with an opaque error.

  4. SdkBuilder::with_initial_version(&PlatformVersion) (packages/rs-sdk/src/sdk.rs). Seeds the SDK's protocol-version atomic at boot without disabling auto-detect, so a client can start pinned to V0 wire and let the existing maybe_update_protocol_version ratchet upward (fetch_max) the first time it sees a higher-PV response. Lets wallets boot against v3.0 testnet today and continue to work as testnet upgrades.

  5. Connect the encoder to the live transport path via a refactor of the Fetch trait shape. Fetch::Query (rich, user-facing — what FromProof binds to and what tests/mocks key on) is now distinct from Fetch::Request (the wire-encoded proto). For documents, type Query = DocumentQuery; type Request = GetDocumentsRequest;. Wire encoding moves from DocumentQuery::execute_transport (where &Sdk was unreachable) into impl Query<GetDocumentsRequest> for DocumentQuery::query(&self, sdk: &Sdk) (where sdk.version() is in scope). Query::query() gains &Sdk; every other request type maps type Query = type Request = X with no behavioural change. The previous "smuggle PV through a struct field plus an Any-downcast in the blanket Query impl" workaround is gone — wire-version awareness is now compiler-enforced per request type, and the same pattern naturally extends to the other 58 query operations tracked in DriveAbciQueryVersions if any of them grow versioned wire shapes later.
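
The dispatch in item 3 can be sketched roughly as follows. This is an illustration only: the struct fields, the `encode` function, and the byte placeholder standing in for CBOR are all assumptions, not the actual `TryFromPlatformVersioned` impl.

```rust
/// Simplified rich query; field names are illustrative, not the SDK's.
#[derive(Debug, Clone, Default)]
pub struct DocumentQuery {
    pub where_clauses: Vec<String>,
    pub group_by: Vec<String>,
    pub limit: u32,
}

/// Simplified wire request with one variant per wire version.
#[derive(Debug, PartialEq, Eq)]
pub enum GetDocumentsRequest {
    V0 { where_bytes: Vec<u8>, limit: u32 },
    V1 { where_clauses: Vec<String>, group_by: Vec<String>, limit: Option<u32> },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Config(String),
}

/// Encodes `query` for the given `document_query` feature version.
pub fn encode(
    query: &DocumentQuery,
    document_query_version: u16,
) -> Result<GetDocumentsRequest, Error> {
    match document_query_version {
        0 => {
            // V1-only features fail here, before anything hits the wire.
            if !query.group_by.is_empty() {
                return Err(Error::Config(
                    "group_by requires getDocuments V1".to_string(),
                ));
            }
            // Placeholder for the CBOR encoding used by the real V0 shape.
            let where_bytes = query.where_clauses.join(";").into_bytes();
            Ok(GetDocumentsRequest::V0 { where_bytes, limit: query.limit })
        }
        1 => Ok(GetDocumentsRequest::V1 {
            where_clauses: query.where_clauses.clone(),
            group_by: query.group_by.clone(),
            limit: Some(query.limit),
        }),
        other => Err(Error::Config(format!(
            "unsupported document_query version {other}"
        ))),
    }
}
```

Rejecting at build time means the caller gets a local, descriptive error instead of a decode failure reported by a remote node.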

Plus one server-side housekeeping commit: the misleading "could not decode data contracts query" error string in rs-drive-abci/src/query/document_query/mod.rs:31 is corrected to "could not decode documents query". v3.0 testnet still reports the typo (the fix shipped post-v3.0), so the bug is also still observable from older nodes — but new releases will surface a less confusing error.

Test plan

Unit and integration:

Command                                                          Result (passed / failed / ignored)
cargo check --workspace                                          clean
cargo fmt --all                                                  no diff
cargo clippy --workspace -- -D warnings                          clean
cargo test -p dash-sdk --features mocks,offline-testing --lib    133 / 0 / 6
cargo test -p dash-sdk --features mocks,offline-testing --tests  127 / 0 / 8  (incl. V0/V1 wire-shape + PV dispatch)
cargo test -p drive-abci --lib query                             585 / 0 / 1
cargo test -p platform-version                                   5 / 0 / 0

Live-network (against current shared v3.0 testnet via the sibling wallet PR #3549's e2e harness):

  • dpns_001_register_and_resolve_name solo run with RUST_LOG=dash_sdk=trace:
    • test GREEN in 91s
    • 0 "could not decode data contracts query" occurrences (vs 165 pre-fix and 24 in a partial-fix iteration)
    • encoder-dispatch traces fire with feature_version=0 protocol_version=11
    • PV-ratchet trace fires once: from=10 to=11
  • Full 14-thread e2e sweep at the partial-fix tip (8c0d6142ad): 32 / 38 pass; remaining 6 are pre-existing Found-021/022 RED-by-design pins and unrelated token/asset-lock issues, none in the V0/V1 dispatch path; 0 decode errors across the entire sweep at concurrent load.

Four document-fetch offline tests are marked #[ignore] pending vector regeneration (the Fetch::Query/Fetch::Request split changed the mock cache key for documents). Vectors will be regenerated against testnet in a follow-up; the runtime behaviour they were meant to capture is exercised by the live-network e2e suite above.

Breaking changes

Target branch is v3.1-dev. Out-of-tree code that builds against the SDK will see:

  • trait Query<T>::query() — signature changed. Was fn query(&self, prove: bool) -> Result<T, Error>; now fn query(&self, prove: bool, sdk: &Sdk) -> Result<T, Error>. Any external impl Query<T> for MyType needs to add the SDK parameter (typically _sdk: &crate::Sdk if unused).
  • trait Fetch — new type Query associated type. Every external impl Fetch for MyType needs to add type Query = Self::Request; unless the type wants a distinct rich/wire split.
  • DocumentQuery no longer implements TransportRequest. Code that was sending a DocumentQuery directly via rs-dapi-client::DapiRequest will not compile. Use Document::fetch(sdk, query) (which now handles the PV-aware encoding internally) or GetDocumentsRequest::try_from_platform_versioned(query, sdk.version())? for the explicit transport request.
  • SdkBuilder::with_initial_version(&PlatformVersion) is purely additive — it does not replace with_version. The latter still pins the SDK to a single version with auto-detect disabled. with_initial_version seeds the atomic and leaves the ratchet free to advance.
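
The difference between pinning and seeding can be sketched with a plain atomic. `VersionState` and its methods are hypothetical names used only for this illustration; the SDK's internal layout is not shown in this PR, so treat this as a model of the described behaviour rather than the implementation.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Toy model of the SDK's protocol-version state.
pub struct VersionState {
    current: AtomicU32,
    auto_detect: bool,
}

impl VersionState {
    /// Models `with_version`: fixed version, auto-detect disabled.
    pub fn pinned(version: u32) -> Self {
        Self { current: AtomicU32::new(version), auto_detect: false }
    }

    /// Models `with_initial_version`: seeded version, auto-detect enabled.
    pub fn seeded(version: u32) -> Self {
        Self { current: AtomicU32::new(version), auto_detect: true }
    }

    /// Records the protocol version seen in a response and returns the
    /// version in effect afterwards.
    pub fn observe(&self, seen: u32) -> u32 {
        if self.auto_detect {
            // fetch_max only ever raises the stored value and returns the
            // previous one, so the ratchet cannot move downward.
            let previous = self.current.fetch_max(seen, Ordering::Relaxed);
            previous.max(seen)
        } else {
            self.current.load(Ordering::Relaxed)
        }
    }

    pub fn current(&self) -> u32 {
        self.current.load(Ordering::Relaxed)
    }
}
```

A state seeded at 10 moves to 11 after one response reporting 11 and stays there if a later response reports 9, while a pinned state ignores both.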

Related

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings

For repository code-owners and reviewers only

  • I have assigned this pull request to a milestone

🤖 Generated with Claude Code

…l-version builder

Adds SdkBuilder::with_initial_version() for auto-detect SDKs that must
talk to a network whose protocol version is older than the binary's
PlatformVersion::latest() (e.g. v3.0 testnet from a v3.1+ SDK). Unlike
with_version(), this leaves auto-detect active so the existing
fetch_max ratchet can still pick up newer network versions.

Adds V0/V1 dispatch to the DocumentQuery -> GetDocumentsRequest
encoder, driven by the platform_version's
drive_abci.query.document_query.default_current_version feature
version. V0 ships the legacy CBOR-encoded where/order_by shape;
V1-only fields (selects/group_by/having/count_star projections) are
rejected with Error::Config at request build time rather than
silently emitting a malformed V0 request the server would
round-trip-and-reject.

The SDK trampolines (Fetch::fetch_with_metadata_and_proof,
FetchMany::fetch_many_with_metadata_and_proof) populate the new
DocumentQuery.protocol_version_override field from
sdk.protocol_version_number() before transport. Dispatch is a single
TypeId comparison via std::any::Any; no-op for every non-document
request type. Adds a 'static bound to the Fetch::Request /
FetchMany::Request associated types (all existing proto-generated
request types satisfy it).
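
A rough sketch of the type-erased dispatch this paragraph describes, assuming simplified request types. `stamp_protocol_version` and `GetIdentityRequest` are hypothetical names for illustration, and the real trampolines may differ in detail.

```rust
use std::any::Any;

/// Simplified rich query carrying the override field.
#[derive(Debug, Default)]
pub struct DocumentQuery {
    pub protocol_version_override: Option<u32>,
}

/// Hypothetical non-document request used to show the no-op path.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct GetIdentityRequest {
    pub id: u64,
}

/// Stamps the protocol version onto `request` only when it is a
/// `DocumentQuery`; returns whether anything was changed.
/// The `Any` bound is what forces `'static` on the request type.
pub fn stamp_protocol_version<R: Any>(request: &mut R, protocol_version: u32) -> bool {
    let erased: &mut dyn Any = request;
    match erased.downcast_mut::<DocumentQuery>() {
        Some(query) => {
            query.protocol_version_override = Some(protocol_version);
            true
        }
        // Every other request type passes through untouched.
        None => false,
    }
}
```

The cost of this approach is that version awareness lives in a runtime type check rather than in the trait signature.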

Fixes the misleading 'could not decode data contracts query' error
text emitted by the documents-query decoder when the V1 oneof tag is
absent (e.g. when a v3.1 SDK sends V1 to a v3.0 server that only
knows V0). The data-contracts handler still uses its own correct
string.

Tests cover V0/V1 wire-shape parity, dispatch by SDK version,
V1-only feature rejection on V0, and with_initial_version atomic
seeding semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented May 20, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@github-actions github-actions Bot added this to the v3.1.0 milestone May 20, 2026
lklimek and others added 5 commits May 20, 2026 16:06
…sion_override

Replaces the smuggling-via-DocumentQuery-field mechanism added in
8d5de89 with a direct &Sdk argument on Query::query(). The
GetDocumentsRequest V0/V1 encoder now reads sdk.version() at the
call site, eliminating:

- DocumentQuery::protocol_version_override field
- #[cfg_attr(feature = "mocks", serde(skip))] workaround
- apply_sdk_protocol_version helper and the "+ 'static" trait bounds
- TypeId::downcast_mut hack in Fetch / FetchMany trampolines

Same observable behaviour; cleaner trait shape; PV is now a
first-class concern in the Query trait for future versioned
request types.
… + adopt TryFromPlatformVersioned

PV_V1..V10 were wired to DRIVE_ABCI_QUERY_VERSIONS_V1, causing the SDK
to emit V1 getDocuments wire when seeded with an older PV. Testnet v3.0
HPMNs (PV_11) reject this. Sibling fix to 2b8eae0 which re-pinned PV_11.

Also collapses encode_get_documents_request freestanding helper into a
TryFromPlatformVersioned impl, removing one indirection layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…transport

Commit 34e0395 added TryFromPlatformVersioned<DocumentQuery> with V0/V1
dispatch, but DocumentQuery::execute_transport still went through the
ambient TryFrom<DocumentQuery> trap that hardcoded PlatformVersion::latest()
(= PV_12 = V1 wire). The runtime PV from sdk.version() never reached the
encoder, so v3.0 testnet (PV_11, V0 wire) still received V1 bytes and
rejected with the "could not decode" storm (165 occurrences in last
funded e2e run).

This commit:
- Adds a DocumentQuery.wire_protocol_version pin and reads it in
  execute_transport via TryFromPlatformVersioned (falls back to
  PlatformVersion::latest() with a debug trace when unset, so a direct
  TransportRequest caller is loud, not silent).
- Sets the pin from the Query<T> for T blanket impl in
  platform/query.rs via a runtime Any-downcast on the cloned request
  (the blanket is the only Query<DocumentQuery> path the Fetch /
  FetchMany trampolines reach, since Fetch::Request for Document is
  DocumentQuery, not GetDocumentsRequest). Lower-blast-radius than
  removing the blanket and re-impling for ~50 proto types.
- Deletes the ambient TryFrom<DocumentQuery> for GetDocumentsRequest
  impl (silent PlatformVersion::latest() default was the trap).
- Adds tracing::debug! at the encoder dispatch site and reworks the
  PV ratchet info! to use a stable target/from/to shape (closes
  Marvin's QA-004 observability gap).
- #[serde(skip)] on wire_protocol_version keeps existing mock vectors
  hash-stable: the pin is a transport-side dispatch input, not part of
  the query's identity.
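
The hash-stability idea in the last bullet can be illustrated with the standard `Hash` trait in place of serde. This is an analogy only: `cache_key` is a hypothetical helper, and the real mock vectors are keyed on serialized request data, not on `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified rich query with a transport-side pin.
#[derive(Debug, Clone)]
pub struct DocumentQuery {
    pub document_type: String,
    pub limit: u32,
    /// Dispatch input only; deliberately left out of the identity hash.
    pub wire_protocol_version: Option<u32>,
}

impl Hash for DocumentQuery {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the fields that define the query's identity are hashed.
        self.document_type.hash(state);
        self.limit.hash(state);
    }
}

/// Hypothetical cache key derived from the query's identity fields.
pub fn cache_key(query: &DocumentQuery) -> u64 {
    let mut hasher = DefaultHasher::new();
    query.hash(&mut hasher);
    hasher.finish()
}
```

Two queries that differ only in the pin produce the same key, so setting the pin does not invalidate previously recorded expectations.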

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the architectural smell flagged after 8c0d614 and prevents recurrence
for the 58 other tracked query operations that may grow versioned wire in the
future. Removes the `Any::downcast_mut::<DocumentQuery>()` runtime type-erasure
from the blanket `Query<T> for T` impl and the `wire_protocol_version: Option<u32>`
field from `DocumentQuery`. The PV-aware encoder now runs inside
`Query::query(&self, sdk)` where `&Sdk` is in hand — extending V0/V1 dispatch to
any future versioned request type is a 5-line `impl Query<NewWireType> for
NewRichType` away.

# What changed

* `Fetch::Query` (new associated type): the user-facing query that callers hand
  to the SDK and that `FromProof` binds to. For non-versioned operations
  `type Query = Self::Request` (one extra line per impl); for documents and the
  six aggregate views (`DocumentCount`/`Sum`/`Average`/`SplitCounts`/
  `SplitSums`/`SplitAverages`) `type Query = DocumentQuery` (the rich form with
  data contract context) and `type Request = GetDocumentsRequest` (the wire).
* `FromProof<Self::Query>` (was `FromProof<Self::Request>`): the proof verifier
  surface keeps binding to the rich form unchanged — zero changes to the eight
  `FromProof<DocumentQuery>` impls in `packages/rs-sdk/src/platform/documents/`.
* `Sdk::parse_proof_with_metadata_and_proof` (renamed parameter source): takes
  `method_name: &'static str` as an explicit argument instead of reading it
  from `O::Request: TransportRequest`, since the rich query is no longer
  required to implement `TransportRequest`. Trampoline call sites pass
  `wire.method_name()` explicitly.
* `DocumentQuery: TransportRequest` impl removed. Only `GetDocumentsRequest`
  implements `TransportRequest` now. Direct callers that constructed a
  `DocumentQuery` and pushed it through `rs-dapi-client` no longer compile —
  they should call `Document::fetch(...)` or
  `DocumentQuery::try_into_request_for_version(pv)` instead.
* `'static` bound dropped from the `Query<T> for T` blanket (was only required
  by the deleted `Any::downcast`).
* Mock infrastructure: `MockDashPlatformSdk::expect[_many]` now keys the
  `from_proof_expectations` cache on the rich `Self::Query` (preserving the
  protocol-version-agnostic mock key property) while keying the DAPI executor
  mock on the wire `Self::Request` (where the proto bytes actually flow). The
  internal `expect` / `remove` helpers take both args explicitly.
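
A condensed sketch of the trait shape described in the list above, under simplifying assumptions: `Sdk` is reduced to a single field, the wire enum carries only a document type, and `build_request` is a hypothetical helper standing in for the trampolines.

```rust
/// Reduced stand-in for the SDK; only the feature version is modelled.
pub struct Sdk {
    pub document_query_version: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Config(String),
}

/// Turns a rich query into the wire request `T` with the SDK in scope.
pub trait Query<T> {
    fn query(&self, prove: bool, sdk: &Sdk) -> Result<T, Error>;
}

/// The rich query type is now distinct from the wire request type.
pub trait Fetch {
    type Query: Query<Self::Request>;
    type Request;
}

#[derive(Debug, Clone)]
pub struct DocumentQuery {
    pub document_type: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GetDocumentsRequest {
    V0 { document_type: String, prove: bool },
    V1 { document_type: String, prove: bool },
}

impl Query<GetDocumentsRequest> for DocumentQuery {
    fn query(&self, prove: bool, sdk: &Sdk) -> Result<GetDocumentsRequest, Error> {
        let document_type = self.document_type.clone();
        match sdk.document_query_version {
            0 => Ok(GetDocumentsRequest::V0 { document_type, prove }),
            1 => Ok(GetDocumentsRequest::V1 { document_type, prove }),
            other => Err(Error::Config(format!("unsupported version {other}"))),
        }
    }
}

pub struct Document;
impl Fetch for Document {
    type Query = DocumentQuery;
    type Request = GetDocumentsRequest;
}

/// Non-versioned operation: the rich and wire types coincide.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GetIdentityRequest {
    pub id: u64,
}

impl Query<GetIdentityRequest> for GetIdentityRequest {
    fn query(&self, _prove: bool, _sdk: &Sdk) -> Result<GetIdentityRequest, Error> {
        Ok(self.clone())
    }
}

pub struct Identity;
impl Fetch for Identity {
    type Request = GetIdentityRequest;
    type Query = Self::Request;
}

/// Hypothetical trampoline: builds the wire request for any `Fetch` type.
pub fn build_request<O: Fetch>(query: &O::Query, sdk: &Sdk) -> Result<O::Request, Error> {
    <O::Query as Query<O::Request>>::query(query, true, sdk)
}
```

Because the encoder receives `&Sdk` through the trait signature, forgetting to consult the network version becomes a compile-time concern per request type instead of a runtime downcast.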

# Mock vector regeneration

Existing checked-in vectors under `packages/rs-sdk/tests/vectors/document_*/`
were captured with filenames `msg_DocumentQuery_<hash>.json`. After this
refactor the DAPI executor dumps the wire `GetDocumentsRequest` instead, so the
filenames become `msg_GetDocumentsRequest_<hash>.json` with hashes computed
from proto bytes (PV-coupled). Affected tests in
`packages/rs-sdk/tests/fetch/document.rs` are gated with
`#[ignore = "vectors require regeneration after Fetch::Query/Fetch::Request
split (γ refactor); see commit body"]`:

* `document_read`
* `document_read_no_document`
* `document_list_drive_query`
* `document_list_document_query`
* `document_list_bug_value_text_decode_base58_PLAN_653`

To regenerate, run the live-testnet path that produces vectors
(`yarn start && yarn test:sdk` or per-test `cargo test ... --
--ignored` with `DUMP_DIR` set per the sdk dump conventions). After regen,
the old `msg_DocumentQuery_*.json` files can be deleted.

The remaining document-related tests use `expect_fetch` programmatically and
register expectations at test runtime — those continue to pass without any
vector changes (`tests/fetch/document_count.rs`, `tests/fetch/mock_fetch.rs`,
`tests/fetch/mock_fetch_many.rs::test_mock_document_fetch_many`).

# Public API breaks (acceptable on v3.1-dev)

* `<Document as Fetch>::Request` is now `GetDocumentsRequest`, not
  `DocumentQuery`. Code that named this explicitly breaks.
* `DocumentQuery` no longer implements `TransportRequest`. Callers using
  `DocumentQuery` directly with `rs-dapi-client::DapiRequest` break.
* `DocumentQuery::wire_protocol_version` public field is removed.
* `Query::query` keeps the `(prove: bool, sdk: &Sdk)` signature for this
  commit (the `prove` parameter collapse is the planned second commit per the
  3-commit Phase B plan).

# Verification

`cargo check --workspace --exclude wasm-sdk --exclude wasm-dpp --exclude rs-sdk-ffi` clean.
`cargo check -p wasm-sdk` clean.
`cargo test -p dash-sdk --features mocks,offline-testing --lib` → 138 passed,
0 failed, 6 ignored.
`cargo test -p dash-sdk --features mocks,offline-testing --tests` → 127 passed,
0 failed, 8 ignored (the +4 ignores are the document tests gated above; one was
pre-existing).
`cargo test -p drive-abci --lib query` → 585 passed, 0 failed, 1 ignored.
`cargo test -p platform-version` → 5 passed, 0 failed.
`cargo test -p platform-wallet --no-run` clean.
`cargo fmt --all` applied; `cargo clippy -p dash-sdk --features
mocks,offline-testing --tests -- -D warnings` clean.

`rg 'Any::downcast' packages/rs-sdk/src` returns nothing.
`rg wire_protocol_version packages --include='*.rs'` returns nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lklimek lklimek changed the title from "fix(rs-sdk,drive-abci): version-dispatch GetDocumentsRequest + initial-version builder" to "fix(rs-sdk,drive-abci): SDK emits incompatible getDocuments wire against pre-v3.1 networks" May 20, 2026
lklimek commented May 21, 2026

replaced by #3711

@lklimek lklimek closed this May 21, 2026