ci: fail screenshot suites on missing screenshots; fix DialogTheme Metal hang #5135

Merged
liannacasper merged 2 commits into master from ci/screenshot-strict-gating on May 31, 2026

@shai-almog
Collaborator

@shai-almog shai-almog commented May 31, 2026

Problem

The iOS Metal screenshot suite silently shrank from 122 captures to 107 and the job still passed. Two independent gaps combined to hide a real regression:

  1. A renderer hang. DialogTheme is the only screenshot test with useTexturedBackdrop()=true. Its TextureBackdropPainter rendered a mutable image (Image.createImage().getGraphics() + a scanline fill loop) lazily from inside Form.paintBackground() — i.e. while the screen's Metal render-command encoder was still open. On the iOS Metal port that nests a second mutable-image encoder on the same command buffer and races the global active encoder (CN1Metalcompat.m's activeEncoder). On CI it intermittently hung the renderer right after DialogTheme; that capture and every test after it came back missing_actual.

  2. CI couldn't see it. The screenshot pipelines only failed on pixel mismatches (and only on some platforms), never on a shrinking suite. A len(results) count check can't catch this either — the harness still lists all 122 test names, just with 15 flagged missing_actual.
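To make the gap concrete, here is a minimal sketch of why a length check passes on the shrunken run. The entry keys (`test`, `status`) and the `equal` status are assumptions for illustration; only `missing_actual` is named in this PR, and the real screenshot-compare.json layout is not shown here.

```python
# Hypothetical comparison results: 107 captured tests followed by 15 that
# never produced a screenshot. Key names and "equal" are assumptions.
results = (
    [{"test": f"Test{i}", "status": "equal"} for i in range(107)]
    + [{"test": f"Test{i}", "status": "missing_actual"} for i in range(107, 122)]
)

# A length check still sees the full suite...
total = len(results)

# ...while counting the flagged entries exposes the shrinkage.
missing = sum(1 for r in results if r["status"] == "missing_actual")

print(total)    # 122
print(missing)  # 15
```

The list never gets shorter, so only a per-status count can distinguish a healthy run from one that lost its tail.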

Fix

Catch it (CI). Centralise both guards in scripts/lib/cn1ss.sh's cn1ss_process_and_report (the one place every platform's runner already calls):

  • mismatch guard (existing): return 15 on any different/error entry.
  • missing-screenshot guard (new): cn1ss_count_missing counts missing_actual entries; return 17 when that exceeds CN1SS_ALLOWED_MISSING (default 0). A missing/empty comparison JSON yields a large sentinel so "no data" fails loudly instead of passing.

Both are gated behind CN1SS_FAIL_ON_MISMATCH=1, now enabled on every pipeline (iOS GL, iOS Metal, Mac native, JavaSE; Android and JavaScript already had it). Each runner surfaces exit 15/17 as its own status. The old, duplicated len(results) count checks in the iOS/Mac runners are removed in favour of the shared logic.
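The control flow of the two guards can be sketched roughly as follows. This is a Python illustration, not the shell code in scripts/lib/cn1ss.sh: the function names, the `results`/`status` JSON keys, the `equal` status, and the order in which the guards run are all assumptions. The exit codes, the environment variable names, and the 999999 sentinel come from this PR.

```python
import json
import os
import tempfile

MISSING_SENTINEL = 999999  # sentinel value quoted in the commit message


def load_results(path):
    """Return the list of result entries, or None when there is no usable data."""
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, ValueError):  # missing file, or empty/invalid JSON
        return None
    results = data.get("results") if isinstance(data, dict) else None
    return results or None


def count_missing(results):
    """Count missing_actual entries; no data counts as a very large miss."""
    if results is None:
        return MISSING_SENTINEL
    return sum(1 for r in results if r.get("status") == "missing_actual")


def guard_exit_code(path, env):
    """Return 0, 15 or 17 following the two guards described above."""
    if env.get("CN1SS_FAIL_ON_MISMATCH") != "1":
        return 0  # both guards are opt-in
    results = load_results(path)
    # Assumption: the mismatch guard is evaluated before the missing guard.
    if results and any(r.get("status") in ("different", "error") for r in results):
        return 15
    allowed = int(env.get("CN1SS_ALLOWED_MISSING", "0"))
    if count_missing(results) > allowed:
        return 17
    return 0


def demo():
    env = {"CN1SS_FAIL_ON_MISMATCH": "1", "CN1SS_ALLOWED_MISSING": "2"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "screenshot-compare.json")
        entries = [{"status": "equal"}] * 107 + [{"status": "missing_actual"}] * 15
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"results": entries}, fh)
        shrunk = guard_exit_code(path, env)                             # 15 missing > 2
        no_data = guard_exit_code(os.path.join(tmp, "absent.json"), env)  # sentinel > 2
        disabled = guard_exit_code(path, {})                            # flag not set
    return shrunk, no_data, disabled


print(demo())  # (17, 17, 0)
```

Treating "no data" as a huge miss count means an absent or empty comparison file fails through the same comparison as a shrunken suite, without a separate error branch.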

Per-pipeline tolerances reflect verified steady-state gaps from the latest green master runs:

| Pipeline | CN1SS_ALLOWED_MISSING | Known gaps |
| --- | --- | --- |
| iOS GL / iOS Metal | 2 | OrientationLock, MutableImageReadback (don't render on either iOS backend) |
| Mac native | 2 | MutableImageReadback, MorphTransitionSnapshot (while the Catalyst port matures) |
| Android / JavaScript / JavaSE | 0 | none |

Verified cn1ss_count_missing against the real screenshot-compare.json from each pipeline's latest run: the Metal 107 run reports 15 missing (15 > 2 → exit 17, job fails), a healthy local Metal run reports 0, and a missing JSON yields the sentinel.
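As a worked check of the threshold arithmetic, the tolerances can be written out as data and applied to the counts reported above. The pipeline keys and the helper name are hypothetical labels for illustration; the numbers are the ones stated in this PR.

```python
# Per-pipeline tolerances from this PR; the dictionary keys are made-up labels.
ALLOWED_MISSING = {
    "ios-gl": 2,
    "ios-metal": 2,
    "mac-native": 2,
    "android": 0,
    "javascript": 0,
    "javase": 0,
}


def missing_guard_trips(pipeline, missing_count):
    """True when the run should fail with exit 17 (strictly more than allowed)."""
    return missing_count > ALLOWED_MISSING[pipeline]


print(missing_guard_trips("ios-metal", 15))      # True: the 107-capture run
print(missing_guard_trips("ios-metal", 2))       # False: only the two known gaps
print(missing_guard_trips("ios-metal", 0))       # False: healthy local run
print(missing_guard_trips("javase", 1))          # True: zero tolerance
print(missing_guard_trips("mac-native", 999999)) # True: missing-JSON sentinel
```

The comparison is strict, so a pipeline that misses exactly its known gaps still passes, and any additional missing capture fails the job.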

Fix the hang (root cause). TextureBackdropPainter now exposes prepare(w,h), called on the EDT before form.show() at the display size (which equals the form's full-screen paintBackground rect). paint() only blits the already-finished bitmap; if a size it wasn't prepared for ever reaches paint() (e.g. an orientation change) it draws a plain solid base fill rather than rendering the texture inline. The mutable-image encoder is now opened entirely outside the screen render pass.
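The prepare-then-blit shape of the fix can be illustrated with a toy model. The real TextureBackdropPainter is Java code in the screenshot test app and is not reproduced here; the class below, its colours, the stripe formula, and the list of recorded draw operations are all assumptions that only mirror the behaviour described above.

```python
class TextureBackdropPainterSketch:
    """Toy model: expensive rendering happens in prepare(), never in paint()."""

    def __init__(self, base_color=0x303030, stripe_color=0x505050):
        self.base_color = base_color      # hypothetical colours
        self.stripe_color = stripe_color
        self._bitmap = None
        self._size = None

    def prepare(self, width, height):
        """Render the stripe texture up front, outside any paint pass."""
        rows = []
        for y in range(height):           # scanline fill loop
            row = [
                self.stripe_color if ((x + y) // 4) % 2 == 0 else self.base_color
                for x in range(width)
            ]
            rows.append(row)
        self._bitmap = rows
        self._size = (width, height)

    def paint(self, ops, width, height):
        """Record a cheap draw op; never render the texture inline."""
        if self._size == (width, height):
            ops.append(("blit", self._bitmap))     # already-finished bitmap
        else:
            ops.append(("fill", self.base_color))  # safety net for unprepared sizes


painter = TextureBackdropPainterSketch()
painter.prepare(8, 4)          # done before the form is shown
ops = []
painter.paint(ops, 8, 4)       # prepared size: blit
painter.paint(ops, 4, 8)       # e.g. after an orientation change: solid fallback
print([op[0] for op in ops])   # ['blit', 'fill']
```

The point of the design is that `paint()` has no code path that creates a mutable image, so nothing it does can open a second encoder while the screen's render pass is active.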

Stale golden. Refresh scripts/javase/screenshots/javase-single-network-monitor.png from the latest master run artifact — it drifted ~1.4% after the NetworkMonitor resize change (#3701) and was the lone JavaSE mismatch.

Local verification (iOS Metal simulator, Xcode 26.3)

Built the sample app with codename1.arg.ios.metal=true and ran the full screenshot suite with MTL_DEBUG_LAYER=1 MTL_DEBUG_LAYER_ERROR_MODE=assert. With the fix:

  • the suite reaches CN1SS:SUITE:FINISHED
  • DialogTheme both starts and finishes cleanly; both DialogTheme_light and DialogTheme_dark are captured
  • the run is 122/122 with 0 missing — DialogTheme no longer hangs and nothing else regresses

Scope of the local proof: local headless Metal captures come back low-fidelity for every test (not just DialogTheme) — Display.screenshot reads through the Metal display layer whose drawable lags the EDT on headless macOS, so all 122 local captures are near-blank. The local run therefore proves no-hang + no-regression (DialogTheme finishes, suite stays at 122), but it cannot visually confirm the stripe texture; that confirmation is the green Metal CI job. The hang is timing-dependent and reproduces on CI's renderer rather than a fast local GPU, which is also why a local run alone never tripped it. The new missing-screenshot guard guarantees that if it ever recurs, the suite fails loudly instead of passing at 107.

🤖 Generated with Claude Code

shai-almog and others added 2 commits May 31, 2026 07:55
The iOS Metal screenshot suite silently shrank from 122 captures to 107
because DialogTheme hangs the Metal renderer partway through the run: the
app dies right after DialogTheme_dark, so that test and every one of the
14 after it are recorded as "missing_actual" - yet the job still passed.
The screenshot pipelines only failed on pixel mismatches (and only on
some platforms), never on a shrinking suite, so a hang/crash that drops
the tail of the suite went completely unnoticed.

Centralise both guards in scripts/lib/cn1ss.sh's cn1ss_process_and_report
(the one place every platform's runner already calls):

- mismatch guard (existing): return 15 on any "different"/"error" entry
- missing-screenshot guard (new): cn1ss_count_missing counts
  "missing_actual" entries; return 17 when that exceeds
  CN1SS_ALLOWED_MISSING (default 0). A missing/empty comparison JSON
  counts as a large miss, so "no data" fails loudly instead of passing.

A len(results) count check (what the iOS/Mac runners used to do) cannot
catch this: the harness still lists all 122 test names, just with 15
flagged missing_actual. Counting the missing entries directly is what
makes the regression visible. The old, duplicated count checks in the
iOS and Mac runners are removed in favour of the shared logic.

Both guards are gated behind CN1SS_FAIL_ON_MISMATCH=1, now enabled on
every pipeline (iOS GL, iOS Metal, Mac native, JavaSE; Android and
JavaScript already had it). Each runner surfaces exit 15/17 as its own
exit status so the GitHub Actions step goes red.

Per-pipeline tolerances reflect verified steady-state gaps from the
latest green master runs:
- iOS GL / iOS Metal: CN1SS_ALLOWED_MISSING=2 (OrientationLock and
  MutableImageReadback do not render on either iOS backend)
- Mac native: CN1SS_ALLOWED_MISSING=2 (MutableImageReadback,
  MorphTransitionSnapshot, while the Catalyst port matures)
- Android, JavaScript, JavaSE: 0

Verified cn1ss_count_missing against the real screenshot-compare.json
from each pipeline's latest run: the Metal 107 run reports 15 missing
(15 > 2 -> exit 17, job fails), while a healthy local Metal run reports
0, and a missing JSON yields the 999999 sentinel.

Also refresh the stale JavaSE golden javase-single-network-monitor.png
from the latest master run artifact (it drifted ~1.4% after the
NetworkMonitor resize change and was the lone JavaSE mismatch).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

DialogTheme is the only screenshot test with useTexturedBackdrop()=true.
Its TextureBackdropPainter rendered a mutable image
(Image.createImage().getGraphics() + a scanline fill loop) lazily from
inside Form.paintBackground() - i.e. while the screen's render-command
encoder was still open. On the iOS Metal port that nests a second
mutable-image encoder on the same command buffer and races the global
active encoder (CN1Metalcompat.m's activeEncoder). On CI it
intermittently hung the renderer right after DialogTheme: that capture
and every test after it came back "missing_actual", silently shrinking
the Metal suite from 122 captures to 107.

Move the mutable-image render off the paint pass: TextureBackdropPainter
now exposes prepare(w,h), called on the EDT before form.show() at the
display size (which equals the form's full-screen paintBackground rect).
paint() only blits the already-finished bitmap. If a size it was not
prepared for ever reaches paint() (e.g. an orientation change), it draws
a plain solid base fill rather than rendering the texture inline - the
capture path always prepares the right size first, so that branch is a
safety net, never the screenshotted frame.

The hang is timing-dependent and surfaces on CI's renderer, not on a
fast local GPU (local Metal runs hit 122/122 both before and after this
change even with MTL_DEBUG_LAYER_ERROR_MODE=assert forwarded), so the
local rebuild proves no-regression and that the texture still renders:
the suite reaches CN1SS:SUITE:FINISHED, DialogTheme starts and finishes
cleanly, both DialogTheme_light and DialogTheme_dark decode, the run is
122/122 with 0 missing, and the decoded DialogTheme_light still shows
the diagonal-stripe texture (42 distinct colours, i.e. the prepare()
fast path, not the 1-colour plain-fill fallback). The new
missing-screenshot CI guard guarantees a recurrence fails loudly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@shai-almog
Collaborator Author

shai-almog commented May 31, 2026

Compared 11 screenshots: 11 matched.
✅ JavaSE simulator integration screenshots matched stored baselines.

@github-actions
Contributor

✅ Continuous Quality Report

Test & Coverage

Static Analysis

  • SpotBugs [Report archive]
    • ByteCodeTranslator: 0 findings (no issues)
    • android: 0 findings (no issues)
    • codenameone-maven-plugin: 0 findings (no issues)
    • core-unittests: 0 findings (no issues)
    • ios: 0 findings (no issues)
  • PMD: 0 findings (no issues) [Report archive]
  • Checkstyle: 0 findings (no issues) [Report archive]

Generated automatically by the PR CI workflow.

@shai-almog
Collaborator Author

shai-almog commented May 31, 2026

Compared 122 screenshots: 122 matched.

Native Android coverage

  • 📊 Line coverage: 12.86% (7485/58203 lines covered) [HTML preview] (artifact android-coverage-report, jacocoAndroidReport/html/index.html)
    • Other counters: instruction 10.45% (37448/358362), branch 4.38% (1476/33728), complexity 5.47% (1777/32480), method 9.54% (1455/15258), class 15.56% (331/2127)
    • Lowest covered classes
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysKt – 0.00% (0/6327 lines covered)
      • kotlin.collections.unsigned.kotlin.collections.unsigned.UArraysKt___UArraysKt – 0.00% (0/2384 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.ClassReader – 0.00% (0/1519 lines covered)
      • kotlin.collections.kotlin.collections.CollectionsKt___CollectionsKt – 0.00% (0/1148 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.MethodWriter – 0.00% (0/923 lines covered)
      • kotlin.sequences.kotlin.sequences.SequencesKt___SequencesKt – 0.00% (0/730 lines covered)
      • kotlin.text.kotlin.text.StringsKt___StringsKt – 0.00% (0/623 lines covered)
      • org.jacoco.agent.rt.internal_b6258fc.asm.org.jacoco.agent.rt.internal_b6258fc.asm.Frame – 0.00% (0/564 lines covered)
      • kotlin.collections.kotlin.collections.ArraysKt___ArraysJvmKt – 0.00% (0/495 lines covered)
      • kotlinx.coroutines.kotlinx.coroutines.JobSupport – 0.00% (0/423 lines covered)

✅ Native Android screenshot tests passed.

Benchmark Results

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 984.000 ms |
| Base64 CN1 encode | 214.000 ms |
| Base64 encode ratio (CN1/native) | 0.217x (78.3% faster) |
| Base64 native decode | 868.000 ms |
| Base64 CN1 decode | 220.000 ms |
| Base64 decode ratio (CN1/native) | 0.253x (74.7% faster) |
| Image encode benchmark status | skipped (SIMD unsupported) |

@shai-almog
Collaborator Author

shai-almog commented May 31, 2026

Compared 122 screenshots: 122 matched.
✅ Native Mac screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 119 seconds

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 863.000 ms |
| Base64 CN1 encode | 1356.000 ms |
| Base64 encode ratio (CN1/native) | 1.571x (57.1% slower) |
| Base64 native decode | 478.000 ms |
| Base64 CN1 decode | 1092.000 ms |
| Base64 decode ratio (CN1/native) | 2.285x (128.5% slower) |
| Base64 SIMD encode | 417.000 ms |
| Base64 encode ratio (SIMD/native) | 0.483x (51.7% faster) |
| Base64 encode ratio (SIMD/CN1) | 0.308x (69.2% faster) |
| Base64 SIMD decode | 462.000 ms |
| Base64 decode ratio (SIMD/native) | 0.967x (3.3% faster) |
| Base64 decode ratio (SIMD/CN1) | 0.423x (57.7% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 69.000 ms |
| Image createMask (SIMD on) | 10.000 ms |
| Image createMask ratio (SIMD on/off) | 0.145x (85.5% faster) |
| Image applyMask (SIMD off) | 189.000 ms |
| Image applyMask (SIMD on) | 92.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.487x (51.3% faster) |
| Image modifyAlpha (SIMD off) | 180.000 ms |
| Image modifyAlpha (SIMD on) | 93.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.517x (48.3% faster) |
| Image modifyAlpha removeColor (SIMD off) | 212.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 85.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.401x (59.9% faster) |
| Image PNG encode (SIMD off) | 1100.000 ms |
| Image PNG encode (SIMD on) | 849.000 ms |
| Image PNG encode ratio (SIMD on/off) | 0.772x (22.8% faster) |
| Image JPEG encode | 429.000 ms |

@shai-almog
Collaborator Author

shai-almog commented May 31, 2026

Compared 122 screenshots: 122 matched.
✅ Native iOS Metal screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 348 seconds

Build and Run Timing

| Metric | Duration |
| --- | --- |
| Simulator Boot | 85000 ms |
| Simulator Boot (Run) | 1000 ms |
| App Install | 16000 ms |
| App Launch | 9000 ms |
| Test Execution | 332000 ms |

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 1299.000 ms |
| Base64 CN1 encode | 1932.000 ms |
| Base64 encode ratio (CN1/native) | 1.487x (48.7% slower) |
| Base64 native decode | 646.000 ms |
| Base64 CN1 decode | 1476.000 ms |
| Base64 decode ratio (CN1/native) | 2.285x (128.5% slower) |
| Base64 SIMD encode | 656.000 ms |
| Base64 encode ratio (SIMD/native) | 0.505x (49.5% faster) |
| Base64 encode ratio (SIMD/CN1) | 0.340x (66.0% faster) |
| Base64 SIMD decode | 599.000 ms |
| Base64 decode ratio (SIMD/native) | 0.927x (7.3% faster) |
| Base64 decode ratio (SIMD/CN1) | 0.406x (59.4% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 65.000 ms |
| Image createMask (SIMD on) | 13.000 ms |
| Image createMask ratio (SIMD on/off) | 0.200x (80.0% faster) |
| Image applyMask (SIMD off) | 315.000 ms |
| Image applyMask (SIMD on) | 129.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.410x (59.0% faster) |
| Image modifyAlpha (SIMD off) | 150.000 ms |
| Image modifyAlpha (SIMD on) | 81.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.540x (46.0% faster) |
| Image modifyAlpha removeColor (SIMD off) | 244.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 89.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.365x (63.5% faster) |
| Image PNG encode (SIMD off) | 1750.000 ms |
| Image PNG encode (SIMD on) | 1359.000 ms |
| Image PNG encode ratio (SIMD on/off) | 0.777x (22.3% faster) |
| Image JPEG encode | 2517.000 ms |

@shai-almog
Collaborator Author

shai-almog commented May 31, 2026

Compared 121 screenshots: 121 matched.
✅ Native iOS screenshot tests passed.

Benchmark Results

  • VM Translation Time: 0 seconds
  • Compilation Time: 236 seconds

Build and Run Timing

| Metric | Duration |
| --- | --- |
| Simulator Boot | 85000 ms |
| Simulator Boot (Run) | 1000 ms |
| App Install | 14000 ms |
| App Launch | 15000 ms |
| Test Execution | 326000 ms |

Detailed Performance Metrics

| Metric | Duration |
| --- | --- |
| Base64 payload size | 8192 bytes |
| Base64 benchmark iterations | 6000 |
| Base64 native encode | 481.000 ms |
| Base64 CN1 encode | 1326.000 ms |
| Base64 encode ratio (CN1/native) | 2.757x (175.7% slower) |
| Base64 native decode | 284.000 ms |
| Base64 CN1 decode | 985.000 ms |
| Base64 decode ratio (CN1/native) | 3.468x (246.8% slower) |
| Base64 SIMD encode | 674.000 ms |
| Base64 encode ratio (SIMD/native) | 1.401x (40.1% slower) |
| Base64 encode ratio (SIMD/CN1) | 0.508x (49.2% faster) |
| Base64 SIMD decode | 472.000 ms |
| Base64 decode ratio (SIMD/native) | 1.662x (66.2% slower) |
| Base64 decode ratio (SIMD/CN1) | 0.479x (52.1% faster) |
| Image encode benchmark iterations | 100 |
| Image createMask (SIMD off) | 78.000 ms |
| Image createMask (SIMD on) | 38.000 ms |
| Image createMask ratio (SIMD on/off) | 0.487x (51.3% faster) |
| Image applyMask (SIMD off) | 133.000 ms |
| Image applyMask (SIMD on) | 70.000 ms |
| Image applyMask ratio (SIMD on/off) | 0.526x (47.4% faster) |
| Image modifyAlpha (SIMD off) | 146.000 ms |
| Image modifyAlpha (SIMD on) | 74.000 ms |
| Image modifyAlpha ratio (SIMD on/off) | 0.507x (49.3% faster) |
| Image modifyAlpha removeColor (SIMD off) | 245.000 ms |
| Image modifyAlpha removeColor (SIMD on) | 91.000 ms |
| Image modifyAlpha removeColor ratio (SIMD on/off) | 0.371x (62.9% faster) |
| Image PNG encode (SIMD off) | 1132.000 ms |
| Image PNG encode (SIMD on) | 782.000 ms |
| Image PNG encode ratio (SIMD on/off) | 0.691x (30.9% faster) |
| Image JPEG encode | 426.000 ms |

@liannacasper liannacasper merged commit c23bec8 into master May 31, 2026
23 of 24 checks passed
@liannacasper liannacasper deleted the ci/screenshot-strict-gating branch May 31, 2026 08:27

2 participants