ci: fail screenshot suites on missing screenshots; fix DialogTheme Metal hang#5135
Merged
Conversation
The iOS Metal screenshot suite silently shrank from 122 captures to 107 because DialogTheme hangs the Metal renderer partway through the run: the app dies right after DialogTheme_dark, so that test and every one of the 14 after it are recorded as "missing_actual" - yet the job still passed. The screenshot pipelines only failed on pixel mismatches (and only on some platforms), never on a shrinking suite, so a hang/crash that drops the tail of the suite went completely unnoticed. Centralise both guards in scripts/lib/cn1ss.sh's cn1ss_process_and_report (the one place every platform's runner already calls): - mismatch guard (existing): return 15 on any "different"/"error" entry - missing-screenshot guard (new): cn1ss_count_missing counts "missing_actual" entries; return 17 when that exceeds CN1SS_ALLOWED_MISSING (default 0). A missing/empty comparison JSON counts as a large miss, so "no data" fails loudly instead of passing. A len(results) count check (what the iOS/Mac runners used to do) cannot catch this: the harness still lists all 122 test names, just with 15 flagged missing_actual. Counting the missing entries directly is what makes the regression visible. The old, duplicated count checks in the iOS and Mac runners are removed in favour of the shared logic. Both guards are gated behind CN1SS_FAIL_ON_MISMATCH=1, now enabled on every pipeline (iOS GL, iOS Metal, Mac native, JavaSE; Android and JavaScript already had it). Each runner surfaces exit 15/17 as its own exit status so the GitHub Actions step goes red. Per-pipeline tolerances reflect verified steady-state gaps from the latest green master runs: - iOS GL / iOS Metal: CN1SS_ALLOWED_MISSING=2 (OrientationLock and MutableImageReadback do not render on either iOS backend) - Mac native: CN1SS_ALLOWED_MISSING=2 (MutableImageReadback, MorphTransitionSnapshot, while the Catalyst port matures) - Android, JavaScript, JavaSE: 0 Verified cn1ss_count_missing against the real screenshot-compare.json from each pipeline's latest run: the Metal 107 run reports 15 missing (15 > 2 -> exit 17, job fails), while a healthy local Metal run reports 0, and a missing JSON yields the 999999 sentinel. Also refresh the stale JavaSE golden javase-single-network-monitor.png from the latest master run artifact (it drifted ~1.4% after the NetworkMonitor resize change and was the lone JavaSE mismatch). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DialogTheme is the only screenshot test with useTexturedBackdrop()=true. Its TextureBackdropPainter rendered a mutable image (Image.createImage().getGraphics() + a scanline fill loop) lazily from inside Form.paintBackground() - i.e. while the screen's render-command encoder was still open. On the iOS Metal port that nests a second mutable-image encoder on the same command buffer and races the global active encoder (CN1Metalcompat.m's activeEncoder). On CI it intermittently hung the renderer right after DialogTheme: that capture and every test after it came back "missing_actual", silently shrinking the Metal suite from 122 captures to 107. Move the mutable-image render off the paint pass: TextureBackdropPainter now exposes prepare(w,h), called on the EDT before form.show() at the display size (which equals the form's full-screen paintBackground rect). paint() only blits the already-finished bitmap. If a size it was not prepared for ever reaches paint() (e.g. an orientation change), it draws a plain solid base fill rather than rendering the texture inline - the capture path always prepares the right size first, so that branch is a safety net, never the screenshotted frame. The hang is timing-dependent and surfaces on CI's renderer, not on a fast local GPU (local Metal runs hit 122/122 both before and after this change even with MTL_DEBUG_LAYER_ERROR_MODE=assert forwarded), so the local rebuild proves no-regression and that the texture still renders: the suite reaches CN1SS:SUITE:FINISHED, DialogTheme starts and finishes cleanly, both DialogTheme_light and DialogTheme_dark decode, the run is 122/122 with 0 missing, and the decoded DialogTheme_light still shows the diagonal-stripe texture (42 distinct colours, i.e. the prepare() fast path, not the 1-colour plain-fill fallback). The new missing-screenshot CI guard guarantees a recurrence fails loudly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
Compared 11 screenshots: 11 matched. |
Contributor
✅ Continuous Quality ReportTest & Coverage
Static Analysis
Generated automatically by the PR CI workflow. |
Collaborator
Author
|
Compared 122 screenshots: 122 matched. Native Android coverage
✅ Native Android screenshot tests passed. Native Android coverage
Benchmark ResultsDetailed Performance Metrics
|
Collaborator
Author
|
Compared 122 screenshots: 122 matched. Benchmark Results
Detailed Performance Metrics
|
Collaborator
Author
|
Compared 122 screenshots: 122 matched. Benchmark Results
Build and Run Timing
Detailed Performance Metrics
|
Collaborator
Author
|
Compared 121 screenshots: 121 matched. Benchmark Results
Build and Run Timing
Detailed Performance Metrics
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The iOS Metal screenshot suite silently shrank from 122 captures to 107 and the job still passed. Two independent gaps combined to hide a real regression:
A renderer hang.
DialogThemeis the only screenshot test withuseTexturedBackdrop()=true. ItsTextureBackdropPainterrendered a mutable image (Image.createImage().getGraphics()+ a scanline fill loop) lazily from insideForm.paintBackground()— i.e. while the screen's Metal render-command encoder was still open. On the iOS Metal port that nests a second mutable-image encoder on the same command buffer and races the global active encoder (CN1Metalcompat.m'sactiveEncoder). On CI it intermittently hung the renderer right afterDialogTheme; that capture and every test after it came backmissing_actual.CI couldn't see it. The screenshot pipelines only failed on pixel mismatches (and only on some platforms), never on a shrinking suite. A
len(results)count check can't catch this either — the harness still lists all 122 test names, just with 15 flaggedmissing_actual.Fix
Catch it (CI). Centralise both guards in
scripts/lib/cn1ss.sh'scn1ss_process_and_report(the one place every platform's runner already calls):return 15on anydifferent/errorentry.cn1ss_count_missingcountsmissing_actualentries;return 17when that exceedsCN1SS_ALLOWED_MISSING(default0). A missing/empty comparison JSON yields a large sentinel so "no data" fails loudly instead of passing.Both are gated behind
CN1SS_FAIL_ON_MISMATCH=1, now enabled on every pipeline (iOS GL, iOS Metal, Mac native, JavaSE; Android and JavaScript already had it). Each runner surfaces exit 15/17 as its own status. The old, duplicatedlen(results)count checks in the iOS/Mac runners are removed in favour of the shared logic.Per-pipeline tolerances reflect verified steady-state gaps from the latest green master runs:
CN1SS_ALLOWED_MISSINGOrientationLock,MutableImageReadback(don't render on either iOS backend)MutableImageReadback,MorphTransitionSnapshot(while the Catalyst port matures)Verified
cn1ss_count_missingagainst the realscreenshot-compare.jsonfrom each pipeline's latest run: the Metal 107 run reports 15 missing (15 > 2 → exit 17, job fails), a healthy local Metal run reports 0, and a missing JSON yields the sentinel.Fix the hang (root cause).
TextureBackdropPainternow exposesprepare(w,h), called on the EDT beforeform.show()at the display size (which equals the form's full-screenpaintBackgroundrect).paint()only blits the already-finished bitmap; if a size it wasn't prepared for ever reachespaint()(e.g. an orientation change) it draws a plain solid base fill rather than rendering the texture inline. The mutable-image encoder is now opened entirely outside the screen render pass.Stale golden. Refresh
scripts/javase/screenshots/javase-single-network-monitor.pngfrom the latest master run artifact — it drifted ~1.4% after the NetworkMonitor resize change (#3701) and was the lone JavaSE mismatch.Local verification (iOS Metal simulator, Xcode 26.3)
Built the sample app with
codename1.arg.ios.metal=trueand ran the full screenshot suite withMTL_DEBUG_LAYER=1 MTL_DEBUG_LAYER_ERROR_MODE=assert. With the fix:CN1SS:SUITE:FINISHEDDialogThemeboth starts and finishes cleanly; bothDialogTheme_lightandDialogTheme_darkare capturedScope of the local proof: local headless Metal captures come back low-fidelity for every test (not just DialogTheme) —
Display.screenshotreads through the Metal display layer whose drawable lags the EDT on headless macOS, so all 122 local captures are near-blank. The local run therefore proves no-hang + no-regression (DialogTheme finishes, suite stays at 122), but it cannot visually confirm the stripe texture; that confirmation is the green Metal CI job. The hang is timing-dependent and reproduces on CI's renderer rather than a fast local GPU, which is also why a local run alone never tripped it. The new missing-screenshot guard guarantees that if it ever recurs, the suite fails loudly instead of passing at 107.🤖 Generated with Claude Code