guard(abi): compile-time tripwire on the plug-in DP-vtable ABI + ADR-020#348
Merged
Conversation
The standalone-VK no-weave / xrCreateSession-crash bug was an unguarded ABI drift: runtime d665014 inserted is_self_submitting/on_pause/on_resume into struct xrt_display_processor (after v1.4.1) without bumping XRT_PLUGIN_API_VERSION_CURRENT, and the leia plug-in was built against the older layout — so the runtime's set_chroma_key call (slot 0x50) hit the plug-in's destroy (0x50 in the old layout) and self-destructed the DP. Nothing flagged it: the DP vtable is a fixed-offset ABI with no struct_size negotiation and was never covered by the version contract (which only mentioned the structs in xrt_plugin.h), and the loader only logs the negotiated version. This commit adds the immediate, behavior-neutral guard (ADR-020 rule 4) and the policy doc: - xrt_display_processor.h: a block of _Static_asserts pinning every DP-vtable slot index + struct size. Any layout change fails the build locally AND in CI (verified: corrupting a slot index -> error C2338), adjacent to a loud comment telling the author to bump XRT_PLUGIN_API_VERSION_CURRENT, update the asserts, re-pin every plug-in, and follow ADR-020. Offsets asserted as index * sizeof(void*) so the check holds on 32-bit (Android) too. - xrt_plugin.h: widen the API-version contract comment to state the DP vtables are part of this ABI (the documentation gap that let d665014 slip). - docs/adr/ADR-020: the full policy — struct_size + append-only on the DP vtable, major-version = compatibility boundary, loader rejects major mismatch, this tripwire, and plug-in CI pin self-consistency; and why a bundle-level lock-step pairing assert is explicitly NOT the mechanism (it fights independent plug-in/runtime updates). Behavior-neutral: no runtime logic changes. The coordinated struct_size + append-only + loader-enforcement work (ADR-020 rules 1-3, a major bump that rebuilds all first-party plug-ins) is a separate, validated follow-up and does NOT gate the immediate leia v1.0.5 fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dfattal
added a commit
that referenced
this pull request
May 28, 2026
…reject + #342 re-scan (ADR-020) Ships ADR-020 rules 1–3 as the coordinated runtime v1.6.0 major-version bump, preventing the class of bug that broke standalone-VK weaving (leia plugin pinned runtime headers v1.4.1 while runtime was v1.5.2 → DP-vtable offset skew → runtime's set_chroma_key call hit the plug-in's destroy). ABI v2 — struct_size + append-only + loader enforcement: * xrt_display_processor.h: add `uint32_t struct_size; uint32_t reserved_0;` as the 8-byte first-field header. Rewrites the #348 tripwire asserts to anchor at XRT_DP_BASE_OFF = offsetof(.., process_atlas) (asserted == 8 on both 64-bit and 32-bit/Android) plus i*sizeof(void*). Adds XRT_DP_HAS_SLOT(xdp, field): bounds the field against the plug-in's reported struct_size, so appending a new method at the END of the vtable is now backward- AND forward-compatible within a major. Gates all 12 optional inline wrappers on HAS_SLOT in addition to the per-pointer NULL check; mandatory process_atlas (slot 0) + destroy (last) stay un-gated. * xrt_display_processor_{d3d11,d3d12,gl,metal}.h: same 8-byte header + a per-API tripwire block (XRT_DP_{D3D11,D3D12,GL,METAL}_BASE_OFF) + HAS_SLOT gating on every optional wrapper. d3d12 has the extra set_output_format slot (index 1) the other APIs omit. The XRT_DP_ABI_ASSERT / XRT_DP_ABI_MSG / XRT_DP_HAS_SLOT macros are #ifndef-guarded in both the base and per-API headers so any include order is safe (no MSVC C4005 redefinitions). * xrt_plugin.h: XRT_PLUGIN_API_VERSION_2 = 2; XRT_PLUGIN_API_VERSION_CURRENT points at it. v1 → v2 is the one-time break introducing the struct_size header on the DP vtables and turning the loader's version log into an enforced reject. ABI-v1 plug-ins (≤ leia v1.0.5) are rejected and must rebuild against v2 headers. * target_plugin_loader.c: rule 3 enforcement — after a successful xrtPluginNegotiate but BEFORE iface->probe, reject any plug-in whose reported plugin_api_version != XRT_PLUGIN_API_VERSION_CURRENT in all three try_load_one variants (Windows registry, Android, POSIX/JSON). The DLL is unloaded and the loader falls back to the next plug-in (sim_display). * sim_display_processor{,_d3d11,_d3d12,_gl,_metal}.{c,cpp,m}: every factory sets base.struct_size = sizeof(struct xrt_display_processor[_<api>]) before assigning the vtable. calloc already zeroes reserved_0. * oxr_plugin_stub.c: static_assert pin moved from XRT_PLUGIN_API_VERSION_CURRENT == _1 to == _2. Folded #342 — durable DP re-scan at per-client compositor create: The bundle finalize service-restart (shipped in Track A / v0.5.0) covers the fresh-bundle-install ordering, but a service started mid-install outside the bundle path (or a standalone service) still bakes the already-discovered factories into xrt_system_compositor_info forever. This change picks up a vendor plug-in registered AFTER the service started on the next app launch, without requiring a service restart. Mechanism (callback-on-syscomp-info, layering-clean — confirmed with user): * xrt_compositor.h: add `void (*refresh_display_processors)(...)` function-pointer field to xrt_system_compositor_info. * target_plugin_loader.{c,h}: new public target_plugin_refresh_active(). Tracks the winning ProbeOrder of the active plug-in in a new static g_active_probe_order. discover_active_plugin now takes a uint32_t max_probe_order filter — the first-call path passes UINT32_MAX, the refresh path passes the active ProbeOrder so it only attempts STRICTLY-better candidates (never re-probes the active plug-in). Mutex-guarded via os_mutex (not C11 atomics — MinGW caveat per CLAUDE.md). The previous DLL is intentionally leaked on swap, consistent with the existing load-path leak. * target_instance.c: factor xsysc->info.dp_factory_* assignment block into fill_dp_factories_from_plugin() and install refresh_display_processors_cb (calls target_plugin_refresh_active + re-derives factories) on xsysc->info.refresh_display_processors. The in-process / handle path leaves the field NULL — fresh-instance-per- launch already covers it. * comp_multi_compositor.c (VK service path) and comp_d3d11_service.cpp (D3D11 service path) invoke the callback at the top of multi_compositor_create / system_create_native_compositor — i.e. once per IPC client at compositor create, before any DP-factory read. Cleanups bundled with the v1.6.0 bump: * comp_vk_native_compositor.c: the dp_vtable_looks_sane degrade-log now names "a plug-in built against a different runtime ABI major (ADR-020)" alongside the heap-collision possibility (the misleading old wording was flagged in project_plugin_abi_policy_and_release). * comp_d3d11_window.cpp (#345, dev-only): the dev-manifest auto-resolution at WM_WORKSPACE_LAUNCH_APP is now gated on `getenv("DISPLAYXR_DEV") != NULL && getenv("XR_RUNTIME_JSON") == NULL` — installs (with no build/Release/ sibling) no longer force a stale XR_RUNTIME_JSON onto child apps. Docs: * docs/adr/ADR-020: status Proposed → Accepted; Status/rollout section reflects rules 1–3 done in v1.6.0, the per-API DP structs that gained struct_size, and leia v1.0.6's rule-5 pin self-check. * docs/specs/runtime/plugin-discovery.md: §6 rewritten for v2 — describes XRT_DP_HAS_SLOT, the strict major-match reject (rule 3), and the XRT_PLUGIN_API_VERSION_3 reservation for the next break. * CMakeLists.txt: VERSION 1.5.2 → 1.6.0. Validation: scripts\build_windows.bat build → exit 0; all _Static_asserts (base + 4 per-API blocks) hold; DisplayXRClient.dll + displayxr-service.exe + DisplayXR-SimDisplay.dll all relink. Pre-existing C4090 + cmake_install \d warning unchanged. Hardware testing requires the matching leia v1.0.6 (Track B2). With v1.6.0 + the currently-shipped leia v1.0.5 (ABI v1), the loader will log "ABI major mismatch — plugin_api=1, runtime expects 2 ... skipping (ADR-020 rule 3)" and fall back to sim_display SBS — intentional. Memory: project_plugin_abi_policy_and_release. Plan: ~/.claude/plans/task-ship-the-displayxr-ticklish-quokka.md (Track B). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds the immediate, behavior-neutral guard against the class of bug that caused the standalone-VK no-weave /
xrCreateSessioncrash, plus the policy that explains the durable follow-up. No runtime logic changes.Why
That bug was an unguarded ABI drift: runtime
d665014abinsertedis_self_submitting/on_pause/on_resumeintostruct xrt_display_processor(afterv1.4.1) without bumpingXRT_PLUGIN_API_VERSION_CURRENT, and the leia plug-in shipped built against the older layout. The DP vtable is a fixed-offset ABI with nostruct_size, so the runtime'sset_chroma_keycall (slot0x50) landed on the plug-in'sdestroy(0x50in the old layout) and self-destructed the DP right after creation. Nothing flagged it — the version contract only ever mentioned the structs inxrt_plugin.h, the DP vtable was excluded, and the loader only logs the negotiated version.Changes
xrt_display_processor.h— a block of_Static_asserts pinning every DP-vtable slot index + struct size. Any layout change fails the build locally and in CI, right next to a loud comment instructing the author to bumpXRT_PLUGIN_API_VERSION_CURRENT, update the asserts, re-pin every plug-in, and follow ADR-020. Offsets are asserted asindex * sizeof(void *)so the check holds on 32-bit (Android) too.cl /std:c11, exit 0); corrupting a slot index →error C2338, build fails.xrt_plugin.h— widen the API-version contract comment to state the DP vtables are part of this ABI (the documentation gap that letd665014abslip).docs/adr/ADR-020— the full policy:struct_size+ append-only on the DP vtable, major-version = compatibility boundary, loader rejects major mismatch, this tripwire, and plug-in-CI pin self-consistency; and why a bundle-level lock-step pairing assert is explicitly not the mechanism (it fights independent plug-in/runtime updates).Scope
This is the safe-to-land-now slice. The coordinated
struct_size+ append-only + loader-enforcement work (ADR-020 rules 1–3 — a major bump that rebuilds all first-party plug-ins) is a separate, hardware-validated follow-up and does not gate the immediate leiav1.0.5fix.No auto-merge — for @dfattal review.
🤖 Generated with Claude Code