Apps/ModuleLoadTest: add boot-time module load regression test #1666

Merged: bghgary merged 16 commits into BabylonJS:master from
bghgary:bghgary/module-load-test on Apr 22, 2026

@bghgary bghgary commented Apr 20, 2026

[Created by Copilot on behalf of @bghgary]

Adds a new Apps/ModuleLoadTest harness that asserts BabylonNative does not
load unexpected native modules on boot. Motivating case: catching regressions
like dbghelp.dll being introduced (currently loaded by bx's
DbgHelpSymbolResolve static initializer).

How it works

  • Pre-static-init baseline. A platform-specific callback captures the
    loaded-module set before any C++ static initializer in this binary runs.
    A main()-entry baseline would miss dbghelp.dll, since bx's static
    initializer fires before main().
    • Windows: TLS callback in .CRT$XLB.
    • macOS: __attribute__((constructor(101))) function ordered before
      normal static initializers.
    • Linux: .init_array entry via the same constructor priority mechanism.
  • Post-boot snapshot. The harness drives BN to a steady boot state
    (graphics device up, all polyfills + plugins initialized, one frame
    rendered) and snapshots again.
  • Asymmetric assertion. We fail only on unexpected new modules.
    Missing-from-delta is environmental variance (GPU SKU, OS patch, launch
    environment, config) and is not a regression.
  • Debug / debugger SKIP. Debug config and debugger-attached runs print
    a SKIP and exit 0 — they load a materially different module set and
    would produce confusing FAILs otherwise.
  • Launch-env allow-list. GPU driver ICDs (NVIDIA/Intel/AMD on Windows,
    Mesa software-renderer versioned libs on Linux) and VS-injected DLLs
    (kernel.appcore.dll, microsoft.internal.warppal*) are filtered via
    IsAllowedOptionalModule so devs see the same verdict from a VS Ctrl-F5
    run as CI does from a plain cmd / terminal launch.
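
The asymmetric assertion and the allow-list filter can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `BootDelta`, `FindUnexpected`, and `IsAllowedOptional`, and all module names in the example, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using ModuleSet = std::set<std::string>;

// Hypothetical allow-list predicate: tolerate modules whose names start with
// a known launch-environment prefix (prefixes chosen for illustration only).
bool IsAllowedOptional(const std::string& name)
{
    static const std::vector<std::string> prefixes = {"libgallium-", "libllvm.so."};
    return std::any_of(prefixes.begin(), prefixes.end(), [&](const std::string& p) {
        return name.compare(0, p.size(), p) == 0;
    });
}

// Modules that appeared during boot: postBoot minus baseline.
ModuleSet BootDelta(const ModuleSet& baseline, const ModuleSet& postBoot)
{
    ModuleSet delta;
    std::set_difference(postBoot.begin(), postBoot.end(),
                        baseline.begin(), baseline.end(),
                        std::inserter(delta, delta.end()));
    return delta;
}

// Asymmetric check: report only modules in the delta that are neither
// expected nor allowed. Expected modules that never loaded are ignored.
ModuleSet FindUnexpected(const ModuleSet& delta, const ModuleSet& expected)
{
    ModuleSet unexpected;
    for (const auto& module : delta)
    {
        if (expected.count(module) == 0 && !IsAllowedOptional(module))
        {
            unexpected.insert(module);
        }
    }
    return unexpected;
}

// Usage example with made-up module names.
void ExampleUsage()
{
    const ModuleSet baseline = {"libc.so.6"};
    const ModuleSet postBoot = {"libc.so.6", "libx11.so.6", "libgallium-25.0.so", "libsurprise.so"};
    // libegl.so.1 is expected but never loads; that is not treated as a failure.
    const ModuleSet expected = {"libx11.so.6", "libegl.so.1"};
    const ModuleSet unexpected = FindUnexpected(BootDelta(baseline, postBoot), expected);
    assert(unexpected == ModuleSet{"libsurprise.so"});
}
```

Only `libsurprise.so` is reported in this example: the versioned Mesa library is absorbed by the prefix filter, and the missing `libegl.so.1` is treated as environmental variance.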

Platforms

  • Windows: primary implementation in App.Win32.cpp.
  • macOS: App.Apple.mm; golden list seeded with the ARM64 paravirt
    runner's delta (appleparavirtgpumetaliogpufamily, iogpu).
  • Linux: App.X11.cpp; runs under xvfb-run in CI. Golden list seeded
    with the stable Mesa/X11/DRI set (21 entries); versioned Mesa libs
    (libgallium-*.so, libllvm.so.*) are matched via prefix allow-list.

CI wires the test into build-win32.yml, build-macos.yml, and
build-linux.yml (the Linux step wraps with xvfb-run).

Local verification (Windows)

  • RelWithDebInfo: PASS.
  • Release: PASS (identical delta to RelWithDebInfo).
  • Debug: SKIP (by design).
  • VS Ctrl-F5 (no debugger): PASS (after launch-env filter).
  • VS F5 (debugger): SKIP (by design).

CI status

Fully green across all three platforms (Win32, macOS, Ubuntu).

The GetExpectedBootModules() lists are seeded from live CI runs; future
drift (e.g. new runner images, additional configs) will surface as the same
kind of fail-with-delta this test is designed to produce, at which point we
append to the golden list and re-push.

Current Windows list includes dbghelp.dll (pre-existing) so the test
passes — removing it is a separate follow-up.

bghgary and others added 5 commits April 20, 2026 14:13
Adds a dedicated test harness under Apps/ModuleLoadTest that snapshots
modules loaded before C++ static init (via a TLS callback) and after
BabylonNative reaches a stable boot state (graphics device up, all
polyfills + plugins initialized, one frame rendered). The delta is
compared against a golden list to catch new native dependencies being
pulled in on boot.

Motivating case: dbghelp.dll being introduced via bx's
DbgHelpSymbolResolve static initializer. A main()-entry baseline would
miss this because the static fires before main runs; the TLS callback
fires before any C++ static init in this binary.
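
To illustrate the ordering problem on GCC/Clang platforms, here is a minimal sketch of a prioritized constructor running before an ordinary C++ static initializer in the same binary. It models the macOS/Linux mechanism, not the Win32 TLS callback, and all names (`CaptureBaseline`, `EagerLoader`, `BaselineRanFirst`) are hypothetical stand-ins rather than code from this PR.

```cpp
#include <cassert>
#include <cstring>

// Constant-initialized storage, so it is usable before any dynamic
// initializer in this binary has run.
static constexpr int kMaxEvents = 4;
static const char* g_events[kMaxEvents] = {};
static int g_eventCount = 0;

static void Record(const char* what)
{
    assert(g_eventCount < kMaxEvents);
    g_events[g_eventCount++] = what;
}

// Hypothetical stand-in for the baseline capture. Priority 101 orders it
// ahead of default-priority C++ static initializers (GCC/Clang extension).
__attribute__((constructor(101))) static void CaptureBaseline()
{
    Record("baseline");
}

// Stand-in for a library static initializer that loads a module eagerly.
struct EagerLoader
{
    EagerLoader() { Record("static-init"); }
};
static EagerLoader g_eagerLoader;

// True when the baseline hook ran before the static initializer.
bool BaselineRanFirst()
{
    return g_eventCount == 2 &&
           std::strcmp(g_events[0], "baseline") == 0 &&
           std::strcmp(g_events[1], "static-init") == 0;
}
```

By the time main() starts, both events have already been recorded, which is why a baseline taken at main() entry cannot tell them apart.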

Design notes:
- Pre-static-init baseline captured in the .CRT$XLB TLS callback.
- Asymmetric assertion: fail only on unexpected new modules (missing
  entries are environmental variance, not regressions).
- Debug config and debugger-attached runs SKIP explicitly.
- Launch-env noise (VS Ctrl-F5 injections, GPU driver ICDs) is
  filtered via IsAllowedOptionalModule.
- Windows-only for this commit; macOS and Linux support will follow
  in this PR.

CI: invoked from build-win32.yml after UnitTests, RelWithDebInfo only,
non-sanitizers configs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Speeds up ModuleLoadTest golden-list iteration. Revert before ready-for-review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…p after UnitTests failure

- App.Win32.cpp: add bcryptprimitives, d3d10warp, d3d12/core/sdklayers, d3dscache,
  dxilconv, userenv, windows.storage from V8 + D3D12 CI runs
- ci.yml: disable all non-Win32_x64_D3D11 jobs for fast iteration (TEMP, revert)
- build-win32.yml: add always() so Module Load Test runs despite pre-existing
  light-projection UnitTests failure (TEMP, revert after BJS bump)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…verage

Different configs (D3D11/D3D12/V8/JSI) load different modules. Need the full
Win32 matrix to collect the complete union. Sanitizers stays off (step is gated
on !enable-sanitizers); PrecompiledShaderTest uses a different workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- build-win32.yml: remove always() gate on Module Load Test step
- ci.yml: restore full job matrix (non-Win32 jobs were gated off during iteration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

bghgary and others added 7 commits April 21, 2026 09:58
macos-latest (ARM64 paravirtualized GPU runner) surfaces two system modules on first boot: appleparavirtgpumetaliogpufamily and iogpu. Add them to GetExpectedBootModules so the test passes on that runner.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ilure

Linux CI aborted in bgfx at glcontext_egl.cpp:551 (Failed to create surface). Apps/UnitTests runs under the same xvfb-run wrapper without issue, so match its X11 initialization sequence exactly: explicit field-by-field zero of XSetWindowAttributes, a clear-to-black XChangeWindowAttributes, WM_DELETE_WINDOW protocol setup, and the XMapWindow -> XStoreName ordering. bgfx's GL/EGL path is sensitive to this sequencing under Xvfb.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ubuntu-latest with Mesa software renderer under xvfb-run loads a predictable set of X/GL/DRI userspace libs during bgfx init. Add the stable-named ones to GetExpectedBootModules and extend the IsAllowedOptionalModule prefix list with libgallium-* and libllvm.so.* to tolerate Mesa/LLVM version bumps in the runner image.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- ModuleSnapshot.Win32.cpp: use a single fixed-size (512) EnumProcessModules call. Avoids the documented race in the two-call sizing pattern (the module list can change between calls per MSDN). Fail loudly with an explicit error if the buffer is ever too small, rather than silently truncating and hiding regressions.

- App.Apple.mm: wrap MTL::CreateSystemDefaultDevice() in NS::SharedPtr via NS::TransferPtr so the +1 retained device is released on scope exit.

- App.cpp: in CompareAndReport, fail loudly if the pre-init baseline is empty. If the platform pre-static-init hook (TLS callback on Win32, __attribute__((constructor)) on Linux/macOS) fails to run, the baseline would be empty and the asymmetric assertion would silently report PASS despite providing no regression coverage.
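
A guard of this kind might look like the sketch below. The `Verdict` enum, the function name, and the messages are assumptions made for illustration; the real CompareAndReport signature is not shown in this conversation.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

// Hypothetical verdict type; the harness may report results differently.
enum class Verdict { Pass, Fail, Error };

// Sketch of a compare step that refuses to report PASS when the
// pre-static-init hook never produced a baseline.
Verdict CompareAndReportSketch(const std::set<std::string>& baseline,
                               const std::set<std::string>& unexpected)
{
    if (baseline.empty())
    {
        std::fprintf(stderr, "ERROR: pre-init baseline is empty; hook did not run\n");
        return Verdict::Error;
    }
    if (!unexpected.empty())
    {
        for (const auto& module : unexpected)
        {
            std::fprintf(stderr, "FAIL: unexpected module: %s\n", module.c_str());
        }
        return Verdict::Fail;
    }
    std::printf("PASS\n");
    return Verdict::Pass;
}

// Usage example with made-up module names.
void ExampleUsage()
{
    assert(CompareAndReportSketch({}, {}) == Verdict::Error);
    assert(CompareAndReportSketch({"ntdll.dll"}, {"surprise.dll"}) == Verdict::Fail);
    assert(CompareAndReportSketch({"ntdll.dll"}, {}) == Verdict::Pass);
}
```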

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@bghgary bghgary marked this pull request as ready for review April 21, 2026 18:42
Copilot AI review requested due to automatic review settings April 21, 2026 18:43

Copilot AI left a comment


Pull request overview

Adds a new Apps/ModuleLoadTest harness to regression-test which native modules are newly loaded during BabylonNative boot, with a pre-static-init baseline and platform-specific expected/optional allow-lists, and wires it into CI on Windows/macOS/Linux.

Changes:

  • Introduces a cross-platform module snapshot + pre-init baseline mechanism (TLS callback on Win32; constructor-priority hooks on macOS/Linux).
  • Adds a boot-driving harness that initializes Graphics + AppRuntime + polyfills/plugins, captures a post-boot snapshot, and reports unexpected new modules.
  • Integrates the new test app into CMake and runs it in GitHub Actions workflows.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.macOS.mm | macOS dyld-based module enumeration + constructor/destructor baseline capture. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.h | Defines ModuleSnapshot type and baseline/snapshot APIs. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.Win32.cpp | Win32 module enumeration + TLS callback baseline capture. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.Linux.cpp | Linux dl_iterate_phdr module enumeration + constructor/destructor baseline capture. |
| Apps/ModuleLoadTest/Source/App.h | Declares boot runner, diff helpers, expected/optional module hooks, and comparison/reporting API. |
| Apps/ModuleLoadTest/Source/App.cpp | Implements boot sequence, set-difference, printing, and pass/fail reporting logic. |
| Apps/ModuleLoadTest/Source/App.X11.cpp | Linux/X11 entrypoint, expected/optional allow-lists, and X11 window bootstrap for GL. |
| Apps/ModuleLoadTest/Source/App.Win32.cpp | Windows entrypoint, expected/optional allow-lists, and hidden HWND bootstrap. |
| Apps/ModuleLoadTest/Source/App.Apple.mm | macOS entrypoint, expected/optional allow-lists, and Metal device bootstrap. |
| Apps/ModuleLoadTest/CMakeLists.txt | Adds ModuleLoadTest target, links required BabylonNative components, and registers ctest entry. |
| Apps/CMakeLists.txt | Includes ModuleLoadTest subdirectory on supported desktop platforms. |
| .github/workflows/build-win32.yml | Runs ModuleLoadTest in Win32 CI (skipped when sanitizers enabled). |
| .github/workflows/build-macos.yml | Builds and runs ModuleLoadTest in macOS CI. |
| .github/workflows/build-linux.yml | Runs ModuleLoadTest under xvfb-run in Linux CI. |
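
For readers unfamiliar with the Linux enumeration path, a dl_iterate_phdr-based snapshot could look roughly like this. It is a sketch assuming glibc on Linux; the function names, the basename and lowercase normalization, and the `<main>` placeholder are illustrative choices, not necessarily what ModuleSnapshot.Linux.cpp does.

```cpp
#include <link.h> // dl_iterate_phdr, dl_phdr_info (Linux)

#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Callback invoked once per loaded object. Returning 0 continues iteration.
static int CollectModule(dl_phdr_info* info, size_t /*size*/, void* data)
{
    auto* modules = static_cast<std::set<std::string>*>(data);

    // The main program is reported with an empty name; label it explicitly.
    std::string path = (info->dlpi_name != nullptr) ? info->dlpi_name : "";
    if (path.empty())
    {
        path = "<main>";
    }

    // Keep only the file name and lowercase it so comparisons are stable.
    const auto slash = path.find_last_of('/');
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });

    modules->insert(name);
    return 0;
}

// Returns the set of objects currently mapped into this process.
std::set<std::string> CaptureLoadedModules()
{
    std::set<std::string> modules;
    dl_iterate_phdr(&CollectModule, &modules);
    return modules;
}

// Usage example: the main program is always present in the snapshot.
void ExampleUsage()
{
    const std::set<std::string> snapshot = CaptureLoadedModules();
    assert(snapshot.count("<main>") == 1);
}
```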

Comment thread .github/workflows/build-macos.yml
Comment thread Apps/ModuleLoadTest/Source/App.X11.cpp Outdated
Comment thread Apps/ModuleLoadTest/Source/ModuleSnapshot.Win32.cpp Outdated

- build-macos.yml / build-linux.yml: skip ModuleLoadTest when sanitizers are enabled. The ASan/UBSan runtime preloads extra dylibs/sos that would show up as unexpected new modules and cause spurious failures. Matches the existing guard on the Win32 workflow.

- App.X11.cpp: replace (char**)&const-pointer cast with a proper char[]/char*[] array for XInternAtoms. The cast is formally UB and unnecessary.

- ModuleSnapshot.Win32.cpp (WideToUtf8): WideCharToMultiByte includes the null terminator in its required-size result (when the input length is passed as -1), so allocating size bytes and then resizing to converted-1 is correct. The previous code allocated size-1 bytes and let the conversion write the terminator into the std::string's implicit null slot, which was borderline-UB. Also check the conversion return value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

bghgary added a commit that referenced this pull request Apr 21, 2026
## Summary

Bump Babylon.js from `9.0.0` to `9.3.4` across all `package.json` /
`package-lock.json` files under `Apps/`.

## Motivation

Recent CI runs (e.g. on #1666) show `Win32_x64_D3D11` intermittently
failing on the `Light Projection Texture` UnitTests case:

```
[Log] Running Light Projection Texture
[Log] First pixel off at 182856: Value: (51, 51, 53) - Expected: (46, 22, 16)
[Log] Pixel difference: 170840 pixels.
[Log] failed
##[error]Process completed with exit code -1.
```

Two upstream readiness bugs combine to produce this flake, and both need
to land to remove it:

- **BabylonJS/Babylon.js#18255** — "Fix material readiness to gate on
light texture readiness." Shipped in Babylon.js 9.3.2.
- **BabylonJS/Babylon.js#18355** — heightmap `CreateGroundFromHeightMap`
readiness (the async image load was not gated by
`addPendingData`/`removePendingData`, so `scene.isReady()` could return
true before the heightmap was uploaded). Shipped in Babylon.js 9.3.4.

Earlier 9.3.2 / 9.3.3 bumps on this branch picked up #18255 but the
flake persisted because the heightmap race (#18355) was still present.
Bumping to `^9.3.4` picks up both.

## Changes

- `Apps/package.json` + lockfile: bump `babylonjs`,
`babylonjs-gltf2interface`, `babylonjs-gui`, `babylonjs-loaders`,
`babylonjs-materials`, `babylonjs-serializers` to `^9.3.4`.
- `Apps/UnitTests/JavaScript/package.json`: bump `babylonjs`,
`babylonjs-materials`, `@babylonjs/core`, `@babylonjs/materials` to
`^9.3.4`.
- `Apps/PrecompiledShaderTest/JavaScript/package.json` + lockfile: bump
`@babylonjs/core` to `^9.3.4`.

No code changes — dependency bump only.

## Verification

CI on this PR is expected to go green on `Win32_x64_D3D11 Light
Projection Texture` where it was previously flaky. Validated out-of-band
on #1668 (monkey-patched 9.3.3 → 9.3.4 behavior) running LPT ×20 × 6
configs = 120/120 on Ubuntu_GCC_JSC.

---

[Created by Copilot on behalf of @bghgary]

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

bghgary and others added 2 commits April 21, 2026 13:55
Each platform's main() had identical boilerplate for the NDEBUG-keyed
Debug-config skip and the debugger-attached skip. Move both to
ModuleLoadTest::ShouldSkipEnvironment() in the shared App.cpp, backed by
per-platform IsBeingTraced() declared in App.h. Each platform now
implements IsBeingTraced() (Win32 wraps ::IsDebuggerPresent(); Linux reads
/proc/self/status TracerPid; macOS uses sysctl(KERN_PROC)).

Also remove a stale 'Empty initial seed' comment in App.X11.cpp -- the
Linux golden list has been populated from CI.
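
As an illustration of the Linux variant, the TracerPid check can be sketched as below. This is an assumption-labeled example: `ParseTracerPid` and `IsBeingTracedSketch` are hypothetical names, and the actual IsBeingTraced() in this PR may parse the file differently.

```cpp
#include <cassert>
#include <exception>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Extracts the "TracerPid:" value from /proc/<pid>/status-style text.
// Returns 0 (no tracer) when the field is absent or malformed.
long ParseTracerPid(std::istream& status)
{
    const std::string key = "TracerPid:";
    std::string line;
    while (std::getline(status, line))
    {
        if (line.compare(0, key.size(), key) == 0)
        {
            try
            {
                // std::stol skips the tab that separates key and value.
                return std::stol(line.substr(key.size()));
            }
            catch (const std::exception&)
            {
                return 0;
            }
        }
    }
    return 0;
}

// A nonzero TracerPid means a debugger or tracer is attached.
bool IsBeingTracedSketch()
{
    std::ifstream status("/proc/self/status");
    return status.is_open() && ParseTracerPid(status) != 0;
}

// Usage example with canned status content.
void ExampleUsage()
{
    std::istringstream traced("Name:\tapp\nTracerPid:\t1234\nUid:\t1000\n");
    assert(ParseTracerPid(traced) == 1234);

    std::istringstream untraced("Name:\tapp\nTracerPid:\t0\n");
    assert(ParseTracerPid(untraced) == 0);
}
```

Separating the parser from the file read keeps the logic testable without attaching a debugger.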

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nfig

Each platform's main() was essentially the same shape after the previous
preflight refactor: skip check, platform-specific setup to populate a
Graphics::Configuration, then RunBoot + CompareAndReport. Move the one
main() to App.cpp and have each platform expose a single
CreateGraphicsConfig() that returns an optional<Configuration>.

Platform-owned resources (HWND, Display*, Window, MTL::Device) are parked
in function-local static storage so they live for the duration of the
process. XCloseDisplay is dropped -- kernel reclaims the FD on exit,
which is the documented safe pattern for short-lived clients.
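
The shape described above, an optional-returning factory whose platform resource is parked in function-local static storage, can be sketched as follows. `NativeWindow`, `GraphicsConfig`, and the `simulateFailure` parameter are hypothetical stand-ins for the real platform types and failure paths.

```cpp
#include <cassert>
#include <memory>
#include <optional>

// Hypothetical stand-ins for a platform window and Graphics::Configuration.
struct NativeWindow
{
    int handle;
};

struct GraphicsConfig
{
    NativeWindow* window; // non-owning; the window outlives the config
    int width;
    int height;
};

// Returns a config on success, or std::nullopt if platform setup fails.
std::optional<GraphicsConfig> CreateGraphicsConfigSketch(bool simulateFailure = false)
{
    // Function-local static: created on first use and kept alive until
    // process exit, so the returned pointer never dangles.
    static std::unique_ptr<NativeWindow> s_window;

    if (simulateFailure)
    {
        return std::nullopt;
    }
    if (!s_window)
    {
        s_window = std::make_unique<NativeWindow>(NativeWindow{42});
    }
    return GraphicsConfig{s_window.get(), 640, 480};
}

// Usage example: a shared main() only needs to branch on has_value().
void ExampleUsage()
{
    const auto config = CreateGraphicsConfigSketch();
    assert(config.has_value());
    assert(config->window->handle == 42);
    assert(!CreateGraphicsConfigSketch(true).has_value());
}
```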

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Comment thread Apps/ModuleLoadTest/Source/App.Win32.cpp
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bghgary bghgary enabled auto-merge (squash) April 22, 2026 00:14
@bghgary bghgary merged commit fb416e3 into BabylonJS:master Apr 22, 2026
28 checks passed
@bghgary bghgary deleted the bghgary/module-load-test branch April 22, 2026 15:29
bghgary added a commit that referenced this pull request May 15, 2026
)

## Context

bx commit `3ea49f9` ("Lazy load debug help once it's needed to resolve
callstack", #383) moved the `dlopen("dbghelp.dll")` call out of the
file-scope static's constructor into a lazy `init()` invoked on the
first `writeCallstack` call.

That commit is in the bx submodule of BabylonJS/bgfx.cmake `e5f3f31`,
which is BabylonNative's current `GIT_TAG` pin (root `CMakeLists.txt`).
So a fresh BN build no longer pulls `dbghelp.dll` into the process on
startup.
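
The difference between loading in a static's constructor and loading lazily on first use can be modeled with the sketch below. It is not bx code: `LazySymbolResolver`, `LoadDebugHelp`, and the load counter are hypothetical, and the "load" is simulated rather than a real library load.

```cpp
#include <cassert>

// Counts simulated library loads so the timing can be observed.
static int g_loadCount = 0;

// Stand-in for the expensive load (a real resolver would open the library).
static bool LoadDebugHelp()
{
    ++g_loadCount;
    return true;
}

struct LazySymbolResolver
{
    // No work in the constructor, so nothing is loaded at static-init time.
    LazySymbolResolver() = default;

    // Loads on the first call only. Initialization of a function-local
    // static is thread-safe since C++11.
    bool EnsureLoaded()
    {
        static const bool loaded = LoadDebugHelp();
        return loaded;
    }

    // Returns true if symbol resolution is available for this call.
    bool WriteCallstack()
    {
        return EnsureLoaded();
    }
};

// File-scope static, as in the original eager pattern, but now inert.
static LazySymbolResolver g_resolver;

// Usage example: the load happens once, on first use.
void ExampleUsage()
{
    assert(g_loadCount == 0);
    g_resolver.WriteCallstack();
    g_resolver.WriteCallstack();
    assert(g_loadCount == 1);
}
```

With this shape, a boot that never writes a callstack never triggers the load, so the module stays out of the boot delta.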

## Change

Drop `dbghelp.dll` from `GetExpectedBootModules()` in
`Apps/ModuleLoadTest/Source/App.Win32.cpp`, plus the TODO comment that
flagged it as bgfx-blocked. Resolves @bkaradzic-microsoft's review
comment on #1666 (L70).

## Verification

Local RelWithDebInfo build + run (Win11 x64, D3D11 + Chakra):

- Reconfigured CMake (deleted stale `_deps/bgfx.cmake-src` to force
re-fetch at the pinned SHA).
- Built `ModuleLoadTest` RelWithDebInfo.
- Ran the test; verdict `PASS`. `dbghelp.dll` is NOT in the boot delta.
(`imagehlp.dll` still is -- different DLL, image loader.)

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
3 participants