feat(abi): plug-in ABI major v2 — struct_size on DP vtables + loader reject + #342 re-scan (ADR-020 / v1.6.0)#351

Merged
dfattal merged 1 commit into main from feat/abi-v2-runtime-v1.6.0
May 28, 2026

@dfattal dfattal commented May 28, 2026

Track B1 — runtime v1.6.0, plug-in ABI major v2

Ships ADR-020 rules 1–3 as the coordinated plug-in ABI major bump (carried by runtime v1.6.0) that prevents the class of bug that broke standalone-VK weaving (leia v1.4.1 headers vs runtime v1.5.2 → DP vtable offset skew → runtime's set_chroma_key call hit the plug-in's destroy).

Track A (bundle v0.5.0 with the leia v1.0.5 pin fix + #346 version gate + #342 finalize restart) shipped 2026-05-27 and was validated end-to-end on the Leia box (fresh install + upgrade, no reboot). With Track A out, this PR is unblocked.

ABI v2 — struct_size + append-only + loader enforcement

  • xrt_display_processor.h: 8-byte struct_size/reserved_0 header at the top of the vtable. The asserts from #348 (guard(abi): compile-time tripwire on the plug-in DP-vtable ABI + ADR-020) are rewritten to anchor at XRT_DP_BASE_OFF (= offsetof(.., process_atlas), asserted == 8 on both 64- and 32-bit) plus i*sizeof(void*). The new XRT_DP_HAS_SLOT(xdp, field) macro bounds-checks the field against the plug-in's reported struct_size, so appending a new method at the END of the vtable is now backward AND forward compatible within a major. All 12 optional inline wrappers gate on HAS_SLOT in addition to the NULL check; mandatory process_atlas (slot 0) + destroy (last) stay un-gated.
  • xrt_display_processor_{d3d11,d3d12,gl,metal}.h: same 8-byte header + a per-API tripwire block (XRT_DP_{D3D11,D3D12,GL,METAL}_BASE_OFF) + HAS_SLOT gating on every optional wrapper. d3d12 has the extra set_output_format slot the other APIs omit. The XRT_DP_ABI_ASSERT / XRT_DP_ABI_MSG / XRT_DP_HAS_SLOT macros are #ifndef-guarded in both base and per-API headers so any include order is safe (no MSVC C4005).
  • xrt_plugin.h: XRT_PLUGIN_API_VERSION_2 = 2; XRT_PLUGIN_API_VERSION_CURRENT now points at it. The v1 → v2 break is the one-time introduction of struct_size on the DP vtables + the flip from loader-log to loader-reject. ABI-v1 plug-ins (≤ leia v1.0.5) are rejected.
  • target_plugin_loader.c: rule 3 enforcement — after a successful xrtPluginNegotiate but before iface->probe, reject any plug-in whose reported plugin_api_version != XRT_PLUGIN_API_VERSION_CURRENT in all three try_load_one variants (Windows registry, Android, POSIX/JSON). DLL unloaded; loader falls back to the next plug-in / sim_display.
  • sim_display_processor{,_d3d11,_d3d12,_gl,_metal}.{c,cpp,m}: every factory sets base.struct_size = sizeof(struct xrt_display_processor[_<api>]) before assigning the vtable.
  • oxr_plugin_stub.c: _Static_assert pin moved from XRT_PLUGIN_API_VERSION_CURRENT == _1 to == _2.
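
The struct_size gating described above can be sketched roughly as follows. This is a minimal illustration, not the runtime's actual header: `struct demo_dp`, `DEMO_DP_HAS_SLOT`, `set_new_thing` and the wrapper are hypothetical names, and the real XRT_DP_HAS_SLOT definition may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down vtable (the real xrt_display_processor has more slots). */
struct demo_dp {
	uint32_t struct_size; /* plug-in sets this to sizeof() of the struct in ITS headers */
	uint32_t reserved_0;
	void (*process_atlas)(struct demo_dp *dp);                /* mandatory, slot 0 */
	bool (*set_chroma_key)(struct demo_dp *dp, uint32_t key); /* optional */
	void (*destroy)(struct demo_dp *dp);                      /* mandatory */
	bool (*set_new_thing)(struct demo_dp *dp, int v);         /* optional, appended at the END later */
};

/* True only if the plug-in's struct is large enough to contain `field` in full. */
#define DEMO_DP_HAS_SLOT(dp, field) \
	((dp)->struct_size >= offsetof(struct demo_dp, field) + sizeof((dp)->field))

/* Optional wrapper: size gate first, then the usual NULL check. */
static inline bool
demo_dp_set_new_thing(struct demo_dp *dp, int v)
{
	if (!DEMO_DP_HAS_SLOT(dp, set_new_thing) || dp->set_new_thing == NULL) {
		return false; /* older plug-in, or method not implemented */
	}
	return dp->set_new_thing(dp, v);
}

/* Example implementation a newer plug-in might install. */
static bool
demo_impl_new_thing(struct demo_dp *dp, int v)
{
	(void)dp;
	return v > 0;
}
```

Because `||` short-circuits, a plug-in built before `set_new_thing` existed never has that slot read at all, which is what makes an append at the end safe in both directions within one major.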

Folded #342 — durable DP re-scan at per-client compositor create

Track A's bundle finalize service-restart deterministically covers the fresh-bundle-install ordering, but a service started mid-install outside the bundle path (or a standalone service) still bakes the already-discovered factories into xrt_system_compositor_info forever. This change picks up a vendor plug-in registered AFTER the service started on the next app launch, without requiring a service restart.

Layering-clean mechanism (callback-on-syscomp-info — confirmed with @dfattal):

  • xrt_compositor.h: add void (*refresh_display_processors)(...) fn-ptr to xrt_system_compositor_info.
  • target_plugin_loader.{c,h}: new public target_plugin_refresh_active(). Tracks the winning ProbeOrder in a new static g_active_probe_order; discover_active_plugin now takes a uint32_t max_probe_order filter — first-call path passes UINT32_MAX, refresh path passes the active ProbeOrder so it only attempts strictly-better candidates (never re-probes the active plug-in). os_mutex guarded (not C11 atomics — MinGW caveat). Previous DLL intentionally leaked on swap, consistent with the existing load-path leak.
  • target_instance.c: factor the xsysc->info.dp_factory_* assignment block into fill_dp_factories_from_plugin(); install refresh_display_processors_cb (calls target_plugin_refresh_active + re-derives factories) on xsysc->info. In-process / handle path leaves the field NULL — fresh-instance-per-launch already covers it.
  • comp_multi_compositor.c (VK service) and comp_d3d11_service.cpp (D3D11 service) invoke the callback at the top of multi_compositor_create / system_create_native_compositor — once per IPC client, before any DP-factory read.
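
A rough sketch of the max_probe_order filter described above, under two loudly stated assumptions: a lower ProbeOrder is the better one, and candidates are held in a plain array. All `demo_*` names are hypothetical, and the os_mutex guard that the real loader uses is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_candidate {
	const char *name;
	uint32_t probe_order; /* assumption: lower value = higher priority */
	bool probe_ok;        /* stand-in for the result of iface->probe */
};

/* Stand-ins for the loader's static state (mutex-guarded in the real code). */
static uint32_t g_demo_active_probe_order = UINT32_MAX;
static const struct demo_candidate *g_demo_active = NULL;

/* Only candidates STRICTLY better than max_probe_order are attempted. */
static bool
demo_discover(const struct demo_candidate *c, size_t n, uint32_t max_probe_order)
{
	const struct demo_candidate *best = NULL;
	for (size_t i = 0; i < n; i++) {
		if (c[i].probe_order >= max_probe_order) {
			continue; /* not strictly better; this also skips the active plug-in */
		}
		if (!c[i].probe_ok) {
			continue;
		}
		if (best == NULL || c[i].probe_order < best->probe_order) {
			best = &c[i];
		}
	}
	if (best == NULL) {
		return false; /* nothing better found; keep the current plug-in */
	}
	g_demo_active = best;
	g_demo_active_probe_order = best->probe_order;
	return true;
}

/* Refresh path: pass the active ProbeOrder; the first-call path passes UINT32_MAX. */
static bool
demo_refresh_active(const struct demo_candidate *c, size_t n)
{
	return demo_discover(c, n, g_demo_active_probe_order);
}
```

With this shape a refresh is a no-op unless a higher-priority plug-in has appeared since the last scan, so the active plug-in is never probed twice.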

Cleanups bundled with the bump

  • comp_vk_native_compositor.c: the dp_vtable_looks_sane degrade-log now names "a plug-in built against a different runtime ABI major (ADR-020)" alongside the heap-collision possibility.
  • comp_d3d11_window.cpp (#345 "Dev-manifest auto-resolution forces stale XR_RUNTIME_JSON onto child apps on dev boxes", dev-only): dev-manifest auto-resolution at WM_WORKSPACE_LAUNCH_APP is now gated on getenv("DISPLAYXR_DEV") != NULL && getenv("XR_RUNTIME_JSON") == NULL. End-user installs (no build/Release/ sibling) no longer have a stale XR_RUNTIME_JSON forced onto child apps.
  • docs/adr/ADR-020: status Proposed → Accepted; rules 1–3 marked Done (v1.6.0, ABI major 2); per-API DP structs called out.
  • docs/specs/runtime/plugin-discovery.md: §6 rewritten for v2.
  • CMakeLists.txt: VERSION 1.5.2 → 1.6.0.
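
For the #345 gate in the list above, the condition can be isolated into a small predicate. This is a sketch only: the helper names are hypothetical and the surrounding WM_WORKSPACE_LAUNCH_APP handling is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Pure predicate: auto-resolve the dev manifest only on an opted-in dev box
 * that has not already chosen a runtime manifest explicitly. */
static bool
demo_should_auto_resolve(const char *displayxr_dev, const char *xr_runtime_json)
{
	return displayxr_dev != NULL && xr_runtime_json == NULL;
}

/* Thin wrapper that reads the two variables named in the PR from the environment. */
static bool
demo_should_auto_resolve_from_env(void)
{
	return demo_should_auto_resolve(getenv("DISPLAYXR_DEV"), getenv("XR_RUNTIME_JSON"));
}
```

Splitting the predicate from the `getenv` calls keeps the rule testable without mutating the process environment.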

Validation

  • Local scripts\build_windows.bat build → exit 0.
  • All _Static_asserts hold (base + 4 per-API blocks) on MSVC x64.
  • DisplayXRClient.dll (1.81 MB), displayxr-service.exe (1.62 MB), DisplayXR-SimDisplay.dll (60 KB) all relink.
  • Only pre-existing warnings (oxr_session.c:2867 C4090 + the unrelated cmake_install.cmake:101 \d escape).

Sequencing — what ships next

  1. This PR → review → merge → /release v1.6.0 cuts the tag.
  2. B2 — leia v1.0.6 (separate PR in displayxr-leia-plugin): struct_size in 4 leia factories, re-pin DXR_RUNTIME_GIT_TAG "v1.5.2" → "v1.6.0", workflow ref: 'v1.5.2' → 'v1.6.0', rule-5 CI pin self-check, VERSION 1.0.5 → 1.0.6. Mechanical (~10 min) once the v1.6.0 tag exists.
  3. B4 — bundle v0.6.0: versions.json re-pin to runtime v1.6.0 + leia v1.0.6.

Expected behavior post-merge (before B2 ships)

On the Leia box with the currently-shipped leia v1.0.5 (ABI v1), the new loader will log:

plugin loader:   leia-sr: ABI major mismatch — plugin_api=1, runtime expects 2;
the plug-in must be rebuilt against this runtime's headers — skipping (ADR-020 rule 3).

…and fall back to sim_display SBS. That's the intentional Track-B cliff — leia v1.0.6 (B2) restores the weave.
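
The reject that produces this behavior can be sketched as below. It is an illustration under assumptions: `struct demo_plugin`, `demo_try_load_one` and the constant name are hypothetical, the log line is a paraphrase rather than the loader's exact output, and DLL unloading is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_PLUGIN_API_VERSION_CURRENT 2u /* stand-in for XRT_PLUGIN_API_VERSION_CURRENT */

struct demo_plugin {
	const char *name;
	uint32_t plugin_api_version; /* as reported by a successful negotiate step */
	bool (*probe)(void);
};

static int g_demo_probe_calls = 0;

static bool
demo_probe(void)
{
	g_demo_probe_calls++;
	return true;
}

/* Strict major match, checked after negotiation but BEFORE probe is ever called. */
static bool
demo_try_load_one(const struct demo_plugin *p)
{
	if (p->plugin_api_version != DEMO_PLUGIN_API_VERSION_CURRENT) {
		fprintf(stderr, "%s: ABI major mismatch: plugin_api=%u, runtime expects %u; skipping\n",
		        p->name, (unsigned)p->plugin_api_version, (unsigned)DEMO_PLUGIN_API_VERSION_CURRENT);
		/* the real loader unloads the DLL here and moves on to the next candidate */
		return false;
	}
	return p->probe();
}
```

Rejecting before `probe` matters because a v1 plug-in's vtable layout cannot be trusted, so no function pointer from it should be called.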

Plan: ~/.claude/plans/task-ship-the-displayxr-ticklish-quokka.md (Track B).
Memory: project_plugin_abi_policy_and_release.

🤖 Generated with Claude Code

…reject + #342 re-scan (ADR-020)

Ships ADR-020 rules 1–3 as the coordinated runtime v1.6.0 release with the
plug-in ABI major bump to v2, preventing the class of bug that broke
standalone-VK weaving (leia plug-in pinned runtime headers v1.4.1 while
runtime was v1.5.2 → DP-vtable offset skew → runtime's set_chroma_key call
hit the plug-in's destroy).

ABI v2 — struct_size + append-only + loader enforcement:

* xrt_display_processor.h: add `uint32_t struct_size; uint32_t reserved_0;`
  as the 8-byte first-field header. Rewrites the #348 tripwire asserts to
  anchor at XRT_DP_BASE_OFF = offsetof(.., process_atlas) (asserted == 8 on
  both 64-bit and 32-bit/Android) plus i*sizeof(void*). Adds
  XRT_DP_HAS_SLOT(xdp, field): bounds the field against the plug-in's
  reported struct_size, so appending a new method at the END of the vtable is
  now backward- AND forward-compatible within a major. Gates all 12 optional
  inline wrappers on HAS_SLOT in addition to the per-pointer NULL check;
  mandatory process_atlas (slot 0) + destroy (last) stay un-gated.

* xrt_display_processor_{d3d11,d3d12,gl,metal}.h: same 8-byte header + a
  per-API tripwire block (XRT_DP_{D3D11,D3D12,GL,METAL}_BASE_OFF) + HAS_SLOT
  gating on every optional wrapper. d3d12 has the extra set_output_format
  slot (index 1) the other APIs omit. The XRT_DP_ABI_ASSERT / XRT_DP_ABI_MSG /
  XRT_DP_HAS_SLOT macros are #ifndef-guarded in both the base and per-API
  headers so any include order is safe (no MSVC C4005 redefinitions).

* xrt_plugin.h: XRT_PLUGIN_API_VERSION_2 = 2;
  XRT_PLUGIN_API_VERSION_CURRENT points at it. v1 → v2 is the one-time
  break introducing the struct_size header on the DP vtables and turning
  the loader's version log into an enforced reject. ABI-v1 plug-ins
  (≤ leia v1.0.5) are rejected and must rebuild against v2 headers.

* target_plugin_loader.c: rule 3 enforcement — after a successful
  xrtPluginNegotiate but BEFORE iface->probe, reject any plug-in whose
  reported plugin_api_version != XRT_PLUGIN_API_VERSION_CURRENT in all
  three try_load_one variants (Windows registry, Android, POSIX/JSON).
  The DLL is unloaded and the loader falls back to the next plug-in
  (ultimately sim_display).

* sim_display_processor{,_d3d11,_d3d12,_gl,_metal}.{c,cpp,m}: every
  factory sets base.struct_size = sizeof(struct xrt_display_processor[_<api>])
  before assigning the vtable. calloc already zeroes reserved_0.

* oxr_plugin_stub.c: static_assert pin moved from
  XRT_PLUGIN_API_VERSION_CURRENT == _1 to == _2.
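
The tripwire layout described in the first bullet above (base offset 8, then one pointer-sized slot per method) might look roughly like this. The struct, macro names and slot names are hypothetical stand-ins, and the sketch assumes function pointers are the same size as `void *`, which holds on the mainstream targets named in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical four-slot vtable with the 8-byte v2 header. */
struct demo_dp_abi {
	uint32_t struct_size;
	uint32_t reserved_0;
	void (*process_atlas)(void); /* slot 0 */
	void (*optional_a)(void);    /* slot 1 */
	void (*optional_b)(void);    /* slot 2 */
	void (*destroy)(void);       /* slot 3 */
};

#define DEMO_DP_BASE_OFF offsetof(struct demo_dp_abi, process_atlas)

/* Compile-time tripwire: slot `idx` must sit at BASE_OFF + idx * sizeof(void *). */
#define DEMO_DP_ABI_ASSERT(field, idx) \
	static_assert(offsetof(struct demo_dp_abi, field) == DEMO_DP_BASE_OFF + (idx) * sizeof(void *), \
	              "DP vtable slot moved: " #field)

static_assert(DEMO_DP_BASE_OFF == 8, "header must be exactly 8 bytes on 32- and 64-bit");
DEMO_DP_ABI_ASSERT(process_atlas, 0);
DEMO_DP_ABI_ASSERT(optional_a, 1);
DEMO_DP_ABI_ASSERT(optional_b, 2);
DEMO_DP_ABI_ASSERT(destroy, 3);

/* Runtime mirror of the same formula, handy for logging or tests. */
static size_t
demo_dp_slot_offset(size_t idx)
{
	return DEMO_DP_BASE_OFF + idx * sizeof(void *);
}
```

Inserting or reordering a method in the middle of the struct then fails the build instead of silently shifting every later slot.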

Folded #342 — durable DP re-scan at per-client compositor create:

The bundle finalize service-restart (shipped in Track A / v0.5.0) covers
the fresh-bundle-install ordering, but a service started mid-install
outside the bundle path (or a standalone service) still bakes the
already-discovered factories into xrt_system_compositor_info forever.
This change picks up a vendor plug-in registered AFTER the service
started on the next app launch, without requiring a service restart.

Mechanism (callback-on-syscomp-info, layering-clean — confirmed with user):

* xrt_compositor.h: add `void (*refresh_display_processors)(...)`
  function-pointer field to xrt_system_compositor_info.

* target_plugin_loader.{c,h}: new public target_plugin_refresh_active().
  Tracks the winning ProbeOrder of the active plug-in in a new static
  g_active_probe_order. discover_active_plugin now takes a
  uint32_t max_probe_order filter — the first-call path passes
  UINT32_MAX, the refresh path passes the active ProbeOrder so it only
  attempts STRICTLY-better candidates (never re-probes the active
  plug-in). Mutex-guarded via os_mutex (not C11 atomics — MinGW caveat
  per CLAUDE.md). The previous DLL is intentionally leaked on swap,
  consistent with the existing load-path leak.

* target_instance.c: factor xsysc->info.dp_factory_* assignment block
  into fill_dp_factories_from_plugin() and install
  refresh_display_processors_cb (calls target_plugin_refresh_active +
  re-derives factories) on xsysc->info.refresh_display_processors. The
  in-process / handle path leaves the field NULL — fresh-instance-per-
  launch already covers it.

* comp_multi_compositor.c (VK service path) and comp_d3d11_service.cpp
  (D3D11 service path) invoke the callback at the top of
  multi_compositor_create / system_create_native_compositor — i.e. once
  per IPC client at compositor create, before any DP-factory read.
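
The callback-on-syscomp-info pattern above can be sketched as follows. The real `refresh_display_processors` signature is elided in this commit, so the single-argument signature here is an assumption, as are all `demo_*` names; `dp_factory_generation` is a stand-in for the dp_factory_* fields.

```c
#include <assert.h>
#include <stddef.h>

struct demo_syscomp_info {
	int dp_factory_generation; /* stand-in for the dp_factory_* fields */
	/* NULL on the in-process / handle path; set by the service target otherwise. */
	void (*refresh_display_processors)(struct demo_syscomp_info *info);
};

/* Service-side callback: re-scan plug-ins, then re-derive the factories. */
static void
demo_refresh_cb(struct demo_syscomp_info *info)
{
	info->dp_factory_generation++;
}

/* Per-client compositor create: refresh first, read the factories afterwards. */
static int
demo_client_compositor_create(struct demo_syscomp_info *info)
{
	if (info->refresh_display_processors != NULL) {
		info->refresh_display_processors(info);
	}
	return info->dp_factory_generation;
}
```

Keeping the hook as a nullable function pointer on the info struct lets the compositor trigger a re-scan without depending on the plug-in loader directly.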

Cleanups bundled with the v1.6.0 bump:

* comp_vk_native_compositor.c: the dp_vtable_looks_sane degrade-log now
  names "a plug-in built against a different runtime ABI major (ADR-020)"
  alongside the heap-collision possibility (the misleading old wording
  was flagged in project_plugin_abi_policy_and_release).

* comp_d3d11_window.cpp (#345, dev-only): the dev-manifest
  auto-resolution at WM_WORKSPACE_LAUNCH_APP is now gated on
  `getenv("DISPLAYXR_DEV") != NULL && getenv("XR_RUNTIME_JSON") == NULL`,
  so end-user installs (with no build/Release/ sibling) no longer have a
  stale XR_RUNTIME_JSON forced onto child apps.

Docs:

* docs/adr/ADR-020: status Proposed → Accepted; Status/rollout section
  reflects rules 1–3 done in v1.6.0, the per-API DP structs that gained
  struct_size, and leia v1.0.6's rule-5 pin self-check.

* docs/specs/runtime/plugin-discovery.md: §6 rewritten for v2 — describes
  XRT_DP_HAS_SLOT, the strict major-match reject (rule 3), and the
  XRT_PLUGIN_API_VERSION_3 reservation for the next break.

* CMakeLists.txt: VERSION 1.5.2 → 1.6.0.

Validation: scripts\build_windows.bat build → exit 0; all _Static_asserts
(base + 4 per-API blocks) hold; DisplayXRClient.dll + displayxr-service.exe
+ DisplayXR-SimDisplay.dll all relink. Pre-existing C4090 + cmake_install
\d warning unchanged.

Hardware testing requires the matching leia v1.0.6 (Track B2). With v1.6.0
+ the currently-shipped leia v1.0.5 (ABI v1), the loader will log
"ABI major mismatch — plugin_api=1, runtime expects 2 ... skipping
(ADR-020 rule 3)" and fall back to sim_display SBS — intentional.

Memory: project_plugin_abi_policy_and_release.
Plan: ~/.claude/plans/task-ship-the-displayxr-ticklish-quokka.md (Track B).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>