Apps/ModuleLoadTest: add boot-time module load regression test #1666
Merged
Adds a dedicated test harness under Apps/ModuleLoadTest that snapshots modules loaded before C++ static init (via a TLS callback) and after BabylonNative reaches a stable boot state (graphics device up, all polyfills + plugins initialized, one frame rendered). The delta is compared against a golden list to catch new native dependencies being pulled in on boot.

Motivating case: dbghelp.dll being introduced via bx's DbgHelpSymbolResolve static initializer. A main()-entry baseline would miss this because the static fires before main runs; the TLS callback fires before any C++ static init in this binary.

Design notes:
- Pre-static-init baseline captured in the .CRT$XLB TLS callback.
- Asymmetric assertion: fail only on unexpected new modules (missing entries are environmental variance, not regressions).
- Debug config and debugger-attached runs SKIP explicitly.
- Launch-env noise (VS Ctrl-F5 injections, GPU driver ICDs) is filtered via IsAllowedOptionalModule.
- Windows-only for this commit; macOS and Linux support will follow in this PR.

CI: invoked from build-win32.yml after UnitTests, RelWithDebInfo only, non-sanitizers configs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
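The asymmetric assertion described in this commit can be illustrated with a small, portable sketch. This is not the code from App.cpp: `ModuleSet`, `Difference`, and `UnexpectedBootModules` are hypothetical names, and modules are assumed to be identified by normalized name strings.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical alias; the PR's ModuleSnapshot.h defines its own type.
using ModuleSet = std::set<std::string>;

// Returns the elements of `a` that are not present in `b`.
ModuleSet Difference(const ModuleSet& a, const ModuleSet& b)
{
    ModuleSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Modules that appeared during boot and are not covered by the golden list.
// Golden entries that never loaded are ignored on purpose: that direction is
// treated as environmental variance, not as a regression.
ModuleSet UnexpectedBootModules(const ModuleSet& preInit,
                                const ModuleSet& postBoot,
                                const ModuleSet& golden)
{
    const ModuleSet bootDelta = Difference(postBoot, preInit);
    return Difference(bootDelta, golden);
}
```

A non-empty result would correspond to a FAIL with the offending module names printed; an empty result passes even if some golden entries were never loaded.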
Speeds up ModuleLoadTest golden-list iteration. Revert before ready-for-review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p after UnitTests failure

- App.Win32.cpp: add bcryptprimitives, d3d10warp, d3d12/core/sdklayers, d3dscache, dxilconv, userenv, windows.storage from V8 + D3D12 CI runs
- ci.yml: disable all non-Win32_x64_D3D11 jobs for fast iteration (TEMP, revert)
- build-win32.yml: add always() so Module Load Test runs despite pre-existing light-projection UnitTests failure (TEMP, revert after BJS bump)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…verage Different configs (D3D11/D3D12/V8/JSI) load different modules. Need the full Win32 matrix to collect the complete union. Sanitizers stays off (step is gated on !enable-sanitizers); PrecompiledShaderTest uses a different workflow. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- build-win32.yml: remove always() gate on Module Load Test step
- ci.yml: restore full job matrix (non-Win32 jobs were gated off during iteration)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
macos-latest (ARM64 paravirtualized GPU runner) surfaces two system modules on first boot: appleparavirtgpumetaliogpufamily and iogpu. Add them to GetExpectedBootModules so the test passes on that runner. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ilure Linux CI aborted in bgfx at glcontext_egl.cpp:551 (Failed to create surface). Apps/UnitTests runs under the same xvfb-run wrapper without issue, so match its X11 initialization sequence exactly: explicit field-by-field zero of XSetWindowAttributes, a clear-to-black XChangeWindowAttributes, WM_DELETE_WINDOW protocol setup, and the XMapWindow -> XStoreName ordering. bgfx's GL/EGL path is sensitive to this sequencing under Xvfb. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ubuntu-latest with Mesa software renderer under xvfb-run loads a predictable set of X/GL/DRI userspace libs during bgfx init. Add the stable-named ones to GetExpectedBootModules and extend the IsAllowedOptionalModule prefix list with libgallium-* and libllvm.so.* to tolerate Mesa/LLVM version bumps in the runner image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
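A prefix allow-list of the kind this commit describes might look like the sketch below. `IsAllowedOptionalModuleSketch` is a hypothetical stand-in, not the PR's `IsAllowedOptionalModule`; the lower-casing step is an assumption inferred from the lower-case spelling `libllvm.so.*` in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-cases a module name so that matching is case-insensitive (assumption).
std::string ToLower(std::string name)
{
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical stand-in for IsAllowedOptionalModule: returns true when the
// module name starts with one of the version-tolerant prefixes.
bool IsAllowedOptionalModuleSketch(const std::string& moduleName)
{
    static const char* const kPrefixes[] = {
        "libgallium-", // Mesa gallium driver, version in the file name
        "libllvm.so.", // LLVM runtime pulled in by Mesa's software renderer
    };
    const std::string lower = ToLower(moduleName);
    for (const char* prefix : kPrefixes)
    {
        // rfind(prefix, 0) == 0 is true only when `lower` begins with `prefix`.
        if (lower.rfind(prefix, 0) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Matching on a prefix rather than a full name is what lets a runner-image bump change the version suffix without touching the golden list.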
- ModuleSnapshot.Win32.cpp: use a single fixed-size (512) EnumProcessModules call. Avoids the documented race in the two-call sizing pattern (the module list can change between calls per MSDN). Fail loudly with an explicit error if the buffer is ever too small, rather than silently truncating and hiding regressions.
- App.Apple.mm: wrap MTL::CreateSystemDefaultDevice() in NS::SharedPtr via NS::TransferPtr so the +1 retained device is released on scope exit.
- App.cpp: in CompareAndReport, fail loudly if the pre-init baseline is empty. If the platform pre-static-init hook (TLS callback on Win32, __attribute__((constructor)) on Linux/macOS) fails to run, the baseline would be empty and the asymmetric assertion would silently report PASS despite providing no regression coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
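The empty-baseline guard from the App.cpp item could be sketched as follows. `Verdict` and `CompareAndReportSketch` are hypothetical; the real CompareAndReport signature is not shown in this conversation, so the sketch simply takes the baseline and the already-computed set of unexpected modules.

```cpp
#include <cassert>
#include <cstdio>
#include <set>
#include <string>

enum class Verdict { Pass, Fail };

// Hypothetical reduction of CompareAndReport to its verdict logic.
Verdict CompareAndReportSketch(const std::set<std::string>& preInitBaseline,
                               const std::set<std::string>& unexpectedModules)
{
    // An empty baseline means the pre-static-init hook never ran, so the
    // comparison would give no regression coverage. Treat that as a failure
    // instead of letting the run pass.
    if (preInitBaseline.empty())
    {
        std::fprintf(stderr, "FAIL: pre-init baseline is empty (hook did not run)\n");
        return Verdict::Fail;
    }

    for (const std::string& name : unexpectedModules)
    {
        std::fprintf(stderr, "unexpected module: %s\n", name.c_str());
    }
    return unexpectedModules.empty() ? Verdict::Pass : Verdict::Fail;
}
```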
Pull request overview
Adds a new Apps/ModuleLoadTest harness to regression-test which native modules are newly loaded during BabylonNative boot, with a pre-static-init baseline and platform-specific expected/optional allow-lists, and wires it into CI on Windows/macOS/Linux.
Changes:
- Introduces a cross-platform module snapshot + pre-init baseline mechanism (TLS callback on Win32; constructor-priority hooks on macOS/Linux).
- Adds a boot-driving harness that initializes Graphics + AppRuntime + polyfills/plugins, captures a post-boot snapshot, and reports unexpected new modules.
- Integrates the new test app into CMake and runs it in GitHub Actions workflows.
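For the Linux side of the baseline mechanism listed above, a minimal sketch under stated assumptions is shown here. It combines `dl_iterate_phdr` with a priority-101 constructor, which is what the overview describes, but the names `Baseline`, `CollectModule`, `CaptureBaseline`, `SnapshotModules`, and `PreInitBaseline` are hypothetical and the code is glibc-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <link.h> // dl_iterate_phdr, dl_phdr_info (glibc)
#include <string>
#include <vector>

namespace
{
    // Function-local static: constructed on first use, so it is already valid
    // inside an early constructor. A namespace-scope std::vector could be
    // constructed later, during normal static init, and lose the capture.
    std::vector<std::string>& Baseline()
    {
        static std::vector<std::string> modules;
        return modules;
    }

    // Callback invoked once per loaded object.
    int CollectModule(dl_phdr_info* info, std::size_t /*size*/, void* data)
    {
        auto* out = static_cast<std::vector<std::string>*>(data);
        // The main executable is reported with an empty name.
        out->emplace_back(info->dlpi_name != nullptr ? info->dlpi_name : "");
        return 0; // zero keeps the iteration going
    }

    // Priority 101 runs before default-priority constructors and before
    // ordinary C++ static initializers in this binary.
    __attribute__((constructor(101)))
    void CaptureBaseline()
    {
        dl_iterate_phdr(&CollectModule, &Baseline());
    }
}

// Snapshot of the modules loaded right now.
std::vector<std::string> SnapshotModules()
{
    std::vector<std::string> modules;
    dl_iterate_phdr(&CollectModule, &modules);
    return modules;
}

// Modules that were loaded when the early constructor ran.
const std::vector<std::string>& PreInitBaseline()
{
    return Baseline();
}
```

Calling `SnapshotModules()` after boot and subtracting `PreInitBaseline()` would give the boot delta that the harness compares against its golden list.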
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| Apps/ModuleLoadTest/Source/ModuleSnapshot.macOS.mm | macOS dyld-based module enumeration + constructor/destructor baseline capture. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.h | Defines ModuleSnapshot type and baseline/snapshot APIs. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.Win32.cpp | Win32 module enumeration + TLS callback baseline capture. |
| Apps/ModuleLoadTest/Source/ModuleSnapshot.Linux.cpp | Linux dl_iterate_phdr module enumeration + constructor/destructor baseline capture. |
| Apps/ModuleLoadTest/Source/App.h | Declares boot runner, diff helpers, expected/optional module hooks, and comparison/reporting API. |
| Apps/ModuleLoadTest/Source/App.cpp | Implements boot sequence, set-difference, printing, and pass/fail reporting logic. |
| Apps/ModuleLoadTest/Source/App.X11.cpp | Linux/X11 entrypoint, expected/optional allow-lists, and X11 window bootstrap for GL. |
| Apps/ModuleLoadTest/Source/App.Win32.cpp | Windows entrypoint, expected/optional allow-lists, and hidden HWND bootstrap. |
| Apps/ModuleLoadTest/Source/App.Apple.mm | macOS entrypoint, expected/optional allow-lists, and Metal device bootstrap. |
| Apps/ModuleLoadTest/CMakeLists.txt | Adds ModuleLoadTest target, links required BabylonNative components, and registers ctest entry. |
| Apps/CMakeLists.txt | Includes ModuleLoadTest subdirectory on supported desktop platforms. |
| .github/workflows/build-win32.yml | Runs ModuleLoadTest in Win32 CI (skipped when sanitizers enabled). |
| .github/workflows/build-macos.yml | Builds and runs ModuleLoadTest in macOS CI. |
| .github/workflows/build-linux.yml | Runs ModuleLoadTest under xvfb-run in Linux CI. |
- build-macos.yml / build-linux.yml: skip ModuleLoadTest when sanitizers are enabled. The ASan/UBSan runtime preloads extra dylibs/sos that would show up as unexpected new modules and cause spurious failures. Matches the existing guard on the Win32 workflow.
- App.X11.cpp: replace (char**)&const-pointer cast with a proper char[]/char*[] array for XInternAtoms. The cast is formally UB and unnecessary.
- ModuleSnapshot.Win32.cpp (WideToUtf8): WideCharToMultiByte includes the null terminator in its required-size result, so allocating size bytes and then resizing to converted-1 is correct. The previous code allocated size-1 bytes and let the conversion write the terminator into the std::string's implicit null slot, which was borderline-UB. Also check the conversion return value.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary added a commit that referenced this pull request on Apr 21, 2026
## Summary

Bump Babylon.js from `9.0.0` to `9.3.4` across all `package.json` / `package-lock.json` files under `Apps/`.

## Motivation

Recent CI runs (e.g. on #1666) show `Win32_x64_D3D11` intermittently failing on the `Light Projection Texture` UnitTests case:

```
[Log] Running Light Projection Texture
[Log] First pixel off at 182856: Value: (51, 51, 53) - Expected: (46, 22, 16)
[Log] Pixel difference: 170840 pixels.
[Log] failed
##[error]Process completed with exit code -1.
```

Two upstream readiness bugs combine to produce this flake, and both need to land to remove it:

- **BabylonJS/Babylon.js#18255**: "Fix material readiness to gate on light texture readiness." Shipped in Babylon.js 9.3.2.
- **BabylonJS/Babylon.js#18355**: heightmap `CreateGroundFromHeightMap` readiness (the async image load was not gated by `addPendingData`/`removePendingData`, so `scene.isReady()` could return true before the heightmap was uploaded). Shipped in Babylon.js 9.3.4.

Earlier 9.3.2 / 9.3.3 bumps on this branch picked up #18255 but the flake persisted because the heightmap race (#18355) was still present. Bumping to `^9.3.4` picks up both.

## Changes

- `Apps/package.json` + lockfile: bump `babylonjs`, `babylonjs-gltf2interface`, `babylonjs-gui`, `babylonjs-loaders`, `babylonjs-materials`, `babylonjs-serializers` to `^9.3.4`.
- `Apps/UnitTests/JavaScript/package.json`: bump `babylonjs`, `babylonjs-materials`, `@babylonjs/core`, `@babylonjs/materials` to `^9.3.4`.
- `Apps/PrecompiledShaderTest/JavaScript/package.json` + lockfile: bump `@babylonjs/core` to `^9.3.4`.

No code changes; dependency bump only.

## Verification

CI on this PR is expected to go green on `Win32_x64_D3D11 Light Projection Texture` where it was previously flaky. Validated out-of-band on #1668 (monkey-patched 9.3.3 → 9.3.4 behavior) running LPT ×20 × 6 configs = 120/120 on Ubuntu_GCC_JSC.

---

[Created by Copilot on behalf of @bghgary]

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each platform's main() had identical boilerplate for the NDEBUG-skip and debugger-attached skip. Move both to ModuleLoadTest::ShouldSkipEnvironment() in the shared App.cpp, backed by per-platform IsBeingTraced() declared in App.h. Each platform now implements IsBeingTraced() (Win32 wraps ::IsDebuggerPresent(); Linux reads /proc/self/status TracerPid; macOS uses sysctl(KERN_PROC)). Also remove a stale 'Empty initial seed' comment in App.X11.cpp -- the Linux golden list has been populated from CI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
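The Linux `TracerPid` check mentioned in this commit can be sketched as below. `HasTracer` and `IsBeingTracedSketch` are hypothetical names, not the PR's functions; the parsing is split out so it can be exercised on an in-memory stream, and a missing field is assumed to mean "not traced".

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Scans a /proc/<pid>/status style stream for the "TracerPid:" field and
// returns true when it holds a nonzero PID (a tracer such as gdb is attached).
bool HasTracer(std::istream& status)
{
    const std::string key = "TracerPid:";
    std::string line;
    while (std::getline(status, line))
    {
        if (line.compare(0, key.size(), key) == 0)
        {
            std::istringstream value(line.substr(key.size()));
            long pid = 0;
            value >> pid; // skips the tab that follows the key
            return pid != 0;
        }
    }
    return false; // field not found: assume no tracer (assumption)
}

// Hypothetical Linux implementation of the per-platform hook.
bool IsBeingTracedSketch()
{
    std::ifstream status("/proc/self/status");
    return status && HasTracer(status);
}
```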
…nfig Each platform's main() was essentially the same shape after the previous preflight refactor: skip check, platform-specific setup to populate a Graphics::Configuration, then RunBoot + CompareAndReport. Move the one main() to App.cpp and have each platform expose a single CreateGraphicsConfig() that returns an optional<Configuration>. Platform-owned resources (HWND, Display*, Window, MTL::Device) are parked in function-local static storage so they live for the duration of the process. XCloseDisplay is dropped -- kernel reclaims the FD on exit, which is the documented safe pattern for short-lived clients. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bkaradzic-microsoft approved these changes on Apr 22, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary added a commit that referenced this pull request on May 15, 2026
)

## Context

bx commit `3ea49f9` ("Lazy load debug help once it's needed to resolve callstack", #383) moved the `dlopen("dbghelp.dll")` call out of the file-scope static's constructor into a lazy `init()` invoked on the first `writeCallstack` call. That commit is in the bx submodule of BabylonJS/bgfx.cmake `e5f3f31`, which is BabylonNative's current `GIT_TAG` pin (root `CMakeLists.txt`). So a fresh BN build no longer pulls `dbghelp.dll` into the process on startup.

## Change

Drop `dbghelp.dll` from `GetExpectedBootModules()` in `Apps/ModuleLoadTest/Source/App.Win32.cpp`, plus the TODO comment that flagged it as bgfx-blocked.

Resolves @bkaradzic-microsoft's review comment on #1666 (L70).

## Verification

Local RelWithDebInfo build + run (Win11 x64, D3D11 + Chakra):

- Reconfigured CMake (deleted stale `_deps/bgfx.cmake-src` to force re-fetch at the pinned SHA).
- Built `ModuleLoadTest` RelWithDebInfo.
- Ran the test; verdict `PASS`. `dbghelp.dll` is NOT in the boot delta. (`imagehlp.dll` still is -- different DLL, image loader.)

[Created by Copilot on behalf of @bghgary]

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
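The eager-versus-lazy distinction behind this change can be shown with a generic sketch. This is not bx's code: `LazySymbolResolver` and its injected `loadLibrary` callback are hypothetical, and the plain flag is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Defers an expensive or side-effecting load (for example opening a DLL)
// until the first call that needs it, instead of doing it in a constructor
// that runs during static initialization.
class LazySymbolResolver
{
public:
    explicit LazySymbolResolver(std::function<bool()> loadLibrary)
        : m_loadLibrary(std::move(loadLibrary))
    {
        // Intentionally empty: nothing is loaded at construction time.
    }

    // Returns whether the library is available, loading it on first use.
    bool WriteCallstack()
    {
        if (!m_initialized)
        {
            m_loaded = m_loadLibrary();
            m_initialized = true;
        }
        return m_loaded;
    }

private:
    std::function<bool()> m_loadLibrary;
    bool m_initialized = false;
    bool m_loaded = false;
};
```

With this shape, a file-scope instance no longer triggers the load at boot, which is why the module would drop out of a boot-time delta.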
[Created by Copilot on behalf of @bghgary]
Adds a new `Apps/ModuleLoadTest` harness that asserts BabylonNative does not load unexpected native modules on boot. Motivating case: catching regressions like `dbghelp.dll` being introduced (currently loaded by bx's `DbgHelpSymbolResolve` static initializer).

## How it works
- Pre-static-init baseline: captures the loaded-module set before any C++ static initializer in this binary runs. A `main()`-entry baseline would miss `dbghelp.dll`, since bx's static initializer fires before `main()`.
  - Windows: TLS callback in `.CRT$XLB`.
  - macOS: `__attribute__((constructor(101)))` function ordered before normal static initializers.
  - Linux: `.init_array` entry via the same constructor priority mechanism.
- Post-boot snapshot: boots BabylonNative to a stable state (graphics device up, all polyfills + plugins initialized, one frame rendered) and snapshots again.
- Asymmetric assertion: fails only on unexpected new modules. Missing-from-delta is environmental variance (GPU SKU, OS patch, launch environment, config) and is not a regression.
- Debug config and debugger-attached runs are a SKIP and exit 0; they load a materially different module set and would produce confusing FAILs otherwise.
- Launch-env noise: GPU driver ICDs (including Mesa software-renderer versioned libs on Linux) and VS-injected DLLs (`kernel.appcore.dll`, `microsoft.internal.warppal*`) are filtered via `IsAllowedOptionalModule` so devs see the same verdict from a VS Ctrl-F5 run as CI does from a plain `cmd` / terminal launch.

## Platforms
- Windows: `App.Win32.cpp`.
- macOS: `App.Apple.mm`; golden list seeded with the ARM64 paravirt runner's delta (`appleparavirtgpumetaliogpufamily`, `iogpu`).
- Linux: `App.X11.cpp`; runs under `xvfb-run` in CI. Golden list seeded with the stable Mesa/X11/DRI set (21 entries); versioned Mesa libs (`libgallium-*.so`, `libllvm.so.*`) are matched via prefix allow-list.

CI wires the test into `build-win32.yml`, `build-macos.yml`, and `build-linux.yml` (the Linux step wraps with `xvfb-run`).

## Local verification (Windows)
## CI status

Fully green across all three platforms (Win32, macOS, Ubuntu).

The `GetExpectedBootModules()` lists are seeded from live CI runs; future drift (e.g. new runner images, additional configs) will surface as the same kind of fail-with-delta this test is designed to produce, at which point we append to the golden list and re-push.

Current Windows list includes `dbghelp.dll` (pre-existing) so the test passes; removing it is a separate follow-up.