fix(osr): producer-allocates surface model - kill the crop/stretch/freeze/blank class + hardening + regression suite #13
Merged
…texture leak
Root-cause fixes from a comprehensive visibility/resize/lifecycle audit (12 defects; full report in docs/OSR_VISIBILITY_RESIZE_AUDIT.md). This commit lands the keystone correctness fixes; the liveness-watchdog backstop (C-3/F-6) + create-payload visibility (C-4/F-8) are follow-ups.
F-1 (keystone, main.mm DoSetVisible): force a full repaint on the hidden->visible edge. WasHidden(false) alone does not repaint, so a surface left blank/stale while hidden (by a resize that landed with the begin-frame pump gated off, a deferred dpr change, or Chromium's FrameEvictionManager reclaiming the off-screen frame past ~5 browsers) stayed permanently blank until relaunch. Now re-assert screen-info (if deferred) + size, then drive a guaranteed frame. Closes C-1b/d/e/f and the ~100ms un-hide latency (C-6).
F-2 (main.mm DoResize): while hidden, keep the surface/dims swap but DEFER the paint (WasResized/begin-frame would compose into a surface nothing displays and mislead the watchdog). A deferred dpr change is flagged for F-1's un-hide repaint.
F-3 (main.mm OnAfterCreated): honor a setVisible(false) that arrived before the browser bound (slot->visible already false but CEF never told) so a tile created off-screen establishes hidden instead of pumping 60fps blank. Closes C-7.
F-4 (CefWebSession.swift resizeWatchdog): never force-promote while hidden: the pending surface is zero-filled (the gated pump never painted it). Wait; F-1's un-hide repaint drives a real present that promotes via the normal path. Closes C-1a/c (the active anti-heal).
F-5 (FlutterCefPlugin.swift onHostDied): dispose the session before niling the maps, so unregisterTexture runs. Previously every host crash leaked the texture + CVPixelBuffer + IOSurface for the engine's lifetime, asymmetric vs onBrowserFailed, which disposes. Closes C-2 (HIGH).
cef_host compiles; the Swift compiles in the consuming app.
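The F-1/F-2 ordering on the hidden->visible edge can be sketched as a small decision function. This is only an illustration in C++: `SlotState`, `Action`, and `OnSetVisible` are hypothetical names, not taken from main.mm, and the actions stand in for whatever CEF host calls the real code issues.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; none of these are taken from main.mm.
enum class Action {
  WasHidden,           // tell CEF about the visibility change
  ReassertScreenInfo,  // re-send screen info (dpr) deferred while hidden
  WasResized,          // re-assert the current size
  ForceFrame           // invalidate and drive a guaranteed frame
};

struct SlotState {
  bool visible = true;
  bool dpr_deferred = false;  // a dpr change arrived while hidden (F-2)
};

// Returns the ordered actions for a visibility request.
std::vector<Action> OnSetVisible(SlotState& slot, bool want_visible) {
  std::vector<Action> actions;
  if (slot.visible == want_visible) return actions;  // no edge, nothing to do
  slot.visible = want_visible;
  actions.push_back(Action::WasHidden);
  if (!want_visible) return actions;  // hiding: no paint work
  if (slot.dpr_deferred) {
    actions.push_back(Action::ReassertScreenInfo);
    slot.dpr_deferred = false;
  }
  actions.push_back(Action::WasResized);
  actions.push_back(Action::ForceFrame);
  return actions;
}
```

The point of the sketch is the ordering: the visibility notification alone is never the last step on un-hide, and a deferred dpr change is consumed exactly once.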
…provable outside Campus)
flutter_cef fixes must be testable in flutter_cef, single-view, before Campus.
- ResizeWatchdogPolicy: extract the resize-watchdog force-promote gating (the F-4 "never promote a hidden / never-painted surface" decision) into a dependency-light pure policy (Swift stdlib only). CefWebSession.resizeWatchdog now calls it.
- ResizeWatchdogPolicyTests + run_resize_watchdog_tests.sh: 11 standalone assertions, compiled + run with `swiftc` alone (no Xcode, no pod harness, no Campus), the same pattern as CdpRelayFilterTests. Proves the wedge guard: hidden never promotes; visible+timed-out does; superseded/promoted never do.
- example/lib/cull_wedge_probe.dart: a real-CEF single-view probe that drives the exact wedge sequence (setVisible(false) → resize while hidden → setVisible(true)) on a gradient page with a ticking JS clock. The page must REPAINT on show (F-1); pre-fix it stayed permanently blank. Run: FLUTTER_CEF_HOST=<cef_host> flutter run -d macos -t lib/cull_wedge_probe.dart.
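The wedge guard stated above (hidden never promotes; visible+timed-out does; superseded/promoted never do) fits in a few lines. The real policy is Swift; this is a C++ rendering for illustration, with hypothetical field and function names, and it describes the policy as of this commit only (a later commit removes force-promotion altogether).

```cpp
#include <cassert>

// Hypothetical input shape; the actual Swift policy's fields may differ.
struct WatchdogInput {
  bool visible;           // tile is currently shown
  bool timed_out;         // the resize has waited past the watchdog deadline
  bool superseded;        // a newer resize replaced this one
  bool already_promoted;  // the pending surface was promoted normally
};

// True only when force-promoting the pending surface is safe.
bool ShouldForcePromote(const WatchdogInput& in) {
  if (in.superseded || in.already_promoted) return false;  // nothing to do
  if (!in.visible) return false;  // hidden: the pending surface is unpainted
  return in.timed_out;
}
```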
…en-wedged browser
Closes audit C-3: the first-present watchdog RETIRES at first paint (firstPresentArrived), so a browser that painted ≥1 frame then wedged (a renderer/GPU stall inside a shared host that keeps the pipe alive → no processGone) had NO detector: silent blank until relaunch. This is the backstop that makes any future post-establishment wedge self-heal.
- LivenessProbePolicy: the pure decision (Swift stdlib only): staleness → discriminating nudge → declare. A static page legitimately idles, so staleness alone isn't a wedge: an opInvalidate is the discriminator (a healthy page repaints, a wedged one doesn't). 8 standalone swiftc assertions (run_liveness_probe_tests.sh).
- CefWebSession: + lastPresentNs / livenessNudgedAt (guarded by browsersLock like presentCount). The reader stamps them on every present.
- CefProfileHost: a periodic sweep (every 2s) over established, visible, not-first-paint-pending browsers applies the policy: nudge via opInvalidate, then onPaintStalled → the consumer's BOUNDED recover(). Lock order browsersLock→presentLock (never nested), matching the present handler; stops when the host dies. Env-tunable FLUTTER_CEF_LIVENESS_MS (default 10s) + 3s grace.
Policy is standalone-unit-tested; the steady-state sweep needs a live post-paint-wedge to fully exercise (hard to force synthetically; noted).
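A minimal sketch of the staleness → nudge → declare decision, written in C++ rather than the Swift of the actual policy. The 10s staleness and 3s grace defaults come from the commit message; the names, the boundary comparisons, and the assumption that the caller clears the nudge timestamp whenever a present arrives are illustrative guesses.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

enum class Decision { Healthy, Nudge, Wait, DeclareStalled };

struct LivenessConfig {
  uint64_t stale_ms = 10000;  // default from the commit message
  uint64_t grace_ms = 3000;   // grace after the nudge
};

// Assumption: nudged_at_ms is reset by the caller on every present, so a set
// value means "nudged, and nothing has been presented since".
Decision Decide(uint64_t now_ms, uint64_t last_present_ms,
                std::optional<uint64_t> nudged_at_ms,
                const LivenessConfig& cfg = LivenessConfig{}) {
  const uint64_t idle = now_ms > last_present_ms ? now_ms - last_present_ms : 0;
  if (idle < cfg.stale_ms) return Decision::Healthy;  // recent frame
  if (!nudged_at_ms) return Decision::Nudge;          // send the discriminator
  const uint64_t waited = now_ms > *nudged_at_ms ? now_ms - *nudged_at_ms : 0;
  if (waited < cfg.grace_ms) return Decision::Wait;   // give the nudge time
  return Decision::DeclareStalled;                    // no repaint after nudge
}
```

Keeping the decision free of clocks and locks is what lets it be exercised by standalone assertions, as the commit describes for the Swift version.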
…e-mismatch (too big/small/freeze)
Root cause (full analysis in docs/OSR_SCALE_MISMATCH.md): the present protocol carried only a surface id, not the painted dims. On a device-scale (zoom) resize the host swaps to the new-size surface SYNCHRONOUSLY while the renderer re-rasters ASYNC, so the FIRST present after a resize is the renderer's OLD-scale frame landing in the NEW surface. With only a sid the consumer couldn't tell that provisional wrong-scale frame from a correct one and promoted it → content rendered too big (zoom out: big src cropped) / too small (zoom in: small src top-left, stale margins), and froze there on a static page. F-1..F-6 fixed the cull wedge but never touched the blit/promotion seam.
- native SendPresentLocked(srcW,srcH): kOpPresent payload 4→12 bytes: sid + the PHYSICAL dims of the frame actually composited (view_src IOSurface dims / OnPaint width,height). All three present sites (OnPaint, CompositeMetalLocked, CompositeSoftwareLocked) plumb them.
- Swift handleFrame(opPresent): SIZE-GATED promotion: promote the pending (resized) surface ONLY when the present's dims match the new surface (round(logical*dpr), ±1). A pre-re-raster wrong-scale present advances nothing; Flutter keeps sampling the last correct-scale buffer (geometrically right, momentarily softer) until the re-rastered frame lands. During active zoom the tile lags-crisp; on settle it sharpens. Never wrong-scale, never frozen-wrong.
- Swift resizeWatchdog: no longer force-promotes; that could only promote a surface the size gate just refused (wrong-scale/blank). It now only re-kicks a dropped frame; the size-gated promotion + the 16ms pump land the correct frame, and F-6 recovers a genuine wedge. The F-4 hidden-promote guard is subsumed (no force-promote at all).
cef_host compiles. Recommendation (analysis): KEEP per-zoom device-scale resize hardened, not fixed-max-density (~9x VRAM) or page-zoom (reflows). The model is sound; size-blind promotion was the bug.
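To make the 12-byte present and the size gate concrete, here is a hedged C++ sketch. The field order (sid, width, height), the three 32-bit fields, and the little-endian byte order are assumptions, since the commit states only the payload size and contents; `EncodePresent`, `DecodePresent`, and `DimsMatch` are hypothetical names. The gate itself follows the stated rule: painted dims must equal round(logical*dpr) within ±1.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>

struct Present { uint32_t sid; uint32_t w; uint32_t h; };

// Little-endian helpers (byte order is an assumption of this sketch).
static void PutU32(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) p[i] = static_cast<uint8_t>(v >> (8 * i));
}
static uint32_t GetU32(const uint8_t* p) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(p[i]) << (8 * i);
  return v;
}

std::array<uint8_t, 12> EncodePresent(const Present& p) {
  std::array<uint8_t, 12> b{};
  PutU32(b.data(), p.sid);
  PutU32(b.data() + 4, p.w);
  PutU32(b.data() + 8, p.h);
  return b;
}

Present DecodePresent(const std::array<uint8_t, 12>& b) {
  return Present{GetU32(b.data()), GetU32(b.data() + 4), GetU32(b.data() + 8)};
}

// Size gate: promote only when painted dims match round(logical*dpr) +/- 1.
bool DimsMatch(const Present& p, double logical_w, double logical_h, double dpr) {
  const long want_w = std::lround(logical_w * dpr);
  const long want_h = std::lround(logical_h * dpr);
  return std::labs(static_cast<long>(p.w) - want_w) <= 1 &&
         std::labs(static_cast<long>(p.h) - want_h) <= 1;
}
```

For a 400x300 logical tile at dpr 2.0, an 800x600 present passes the gate while a lagging 400x300 present is refused.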
…ze-gate under soak
Soak-proven on the flutter_cef side (real CEF, single-view) that the size-gated promotion works and does NOT degrade: cycling renderScale (dpr) hammers the resize path; the [cefdiag-resize] log shows, per present while a resize is pending, the actual composited src dims vs the expected new-surface dims (round(logical*dpr)) and whether they match.
Verbatim soak evidence (dpr sweep): each resize logs `src=OLD exp=NEW match=false` (the renderer's lagging frame, correctly NOT promoted) then `src=NEW exp=NEW match=true` (the re-rastered frame, promoted). 38 match=true / 17 match=false over the run, still matching 17s in → no stick/degradation. Confirms src (view_src dims) is the true frame size (never pool-sized), so the size-gate is sound.
- example/lib/zoom_soak_probe.dart: the soak harness (auto-cycles renderScale, ticking clock + fixed-proportion box make freeze/wrong-scale obvious).
- CefWebSession: [cefdiag-resize] present-size diagnostic (behind FLUTTER_CEF_DEBUG).
- CefProfileHost: explicit UInt64(0) in the F-6 ternary (the example's fresh pod surfaced a type-inference error the Campus build had masked).
NOTE: the prior Campus "scale-fix" builds were broken by a build-infra bug, not the fix: `make cef-host`'s up-to-date gate keys on the flutter_cef path stamp, not main.mm's mtime, so local native edits were silently not recompiled (old 4-byte present vs new 12-byte parser → promotion never fired). Force a clean cef_host rebuild (rm the .flutter_cef_ref stamp) after any native edit.
…loor pass
Architecture-audit-driven hardening of the shared cef_host (many OSR browsers per named
profile). KEY FINDING FIRST: the "multi-browser transparent render" that drove the audit
was a BUG IN THE TEST PROBE, not flutter_cef — sharedhost_html_probe wrapped CefWebView in a
shrink-wrapping Stack so the browser laid out at ~50px (the pixel oracle below caught it:
surfaces were 66x30). With the layout fixed, a 6-browser shared host renders 6 full-size
tiles with 6 distinct gradients (verified via the in-host pixel sampler, since the box was
display-asleep). So the shared-host render path is sound; the bounded-pool refactor is NOT
needed for render correctness. The remaining audit findings are from static analysis and are
real — those are fixed here:
SECURITY
- OnQuery + InjectChannelShim now gate on frame->IsMain(): the privileged campusHost bridge
('ch:') and host-eval result channel ('eval:') were injected into / accepted from ALL
frames incl. cross-origin iframes — an embedded untrusted iframe could drive the host
reducer / forge eval results. Now main-frame-only (inject) + subframe-refused (dispatch).
CORRECTNESS
- DoCreateBrowser refuses a wire id already in g_slots_by_wire_id (kOpCreateFailed) instead
of registering — a reuse would let the old browser's OnBeforeClose erase the new slot,
leaving an unroutable browser + leaked IOSurface/dst_mtl.
- pendingCreates now cleared in BOTH shutdown() and handleHostDeath() (browsersLock-guarded),
symmetric with createSendQueue/createInFlight — a host dying between spawn and opReady no
longer leaves the pre-opReady create closures dangling.
- kOpLoadTrusted/kOpNavigate deferred by wire id to TID_UI (DoNavigateByWireId) + DoNavigate
queues pending_nav_url if the browser isn't bound — a loadHtmlString right behind a queued
create on a shared host was dropped (slot==null), the kOpAddChannel-class drop.
RESOURCE
- RLIMIT_NOFILE raised toward the hard cap at cef_host startup: a busy shared host's
sockets/pipes/IOSurfaces reach macOS's 256 soft limit; ties to the WebRTC select() fd>=1024
fault on an fd-heavy campus.
RENDER FLOOR / ROBUSTNESS
- Opaque background_color so a missing/late frame reads as blank-white (loading-looking)
instead of an invisible transparent ghost (the paints=0 establishment knock-out shows
correctly now). OnLoadEnd does WasResized+Invalidate+SendExternalBeginFrame (the
visibility-edge kick) instead of a coalesce-able Invalidate alone, so the loaded content
is deterministically driven to composite.
- [cefdiag] in-host pixel sampler (FLUTTER_CEF_DEBUG): classifies the renderer's frame
content/white/clear so render correctness is verifiable WITHOUT a screenshot.
VERIFIED: 6-tile shared-host probe renders 6 distinct gradients (K=1) / 5-6 (K=3, the rare
loss is the documented watchdog-recovered concurrent-establishment knock-out, tunable via
FLUTTER_CEF_ESTAB_WINDOW). ResizeWatchdogPolicy + LivenessProbePolicy standalone tests pass.
Example soak probes added (interaction/realsite/recreate/sharedhost) for shared-host coverage.
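The RESOURCE item above (raising RLIMIT_NOFILE toward the hard cap at startup) can be sketched with the POSIX `getrlimit`/`setrlimit` calls. This is an illustration, not the code in cef_host: `RaiseFdLimit` is a hypothetical name, and the 10240 ceiling is an assumed cap chosen here because macOS rejects soft limits above its per-process maximum.

```cpp
#include <sys/resource.h>

#include <algorithm>
#include <cassert>

// Raises the soft fd limit toward min(hard limit, desired_cap).
// Returns the soft limit in effect afterwards (0 if it cannot be read).
rlim_t RaiseFdLimit(rlim_t desired_cap = 10240) {  // cap value is an assumption
  struct rlimit rl {};
  if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return 0;
  const rlim_t target = std::min(rl.rlim_max, desired_cap);
  if (target > rl.rlim_cur) {
    struct rlimit want = rl;
    want.rlim_cur = target;
    // Best effort: on failure the previous soft limit stays in force.
    if (setrlimit(RLIMIT_NOFILE, &want) == 0) rl = want;
  }
  return rl.rlim_cur;
}
```

The function never lowers an already-higher soft limit, so calling it once at process start is safe in environments that were configured generously.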
…-dims pixel oracle
The production-grade instrument the whack-a-mole was missing: a SCREEN-INDEPENDENT conformance harness that reproduces Campus's full CEF workload outside Campus and asserts the two invariants that matter (never BLANK, never WRONG-SIZE) headlessly (the dev box is often display-asleep).
- example/lib/conformance_harness.dart: N tiles on one shared profile, auto-cycles idle→resize→zoom→cull→recreate→combo storms (HARNESS_HARD=1 = static pages + zoom→6 + combo resize-while-zoomed, the regime that wedges). Each tile paints a KNOWN center color.
- cef_host [cefdiag] diagpx: now logs painted=WxH want=WxH (want = logical×dpr) + a 9-point content/white/clear classification + center color. painted<<want = WRONG-SIZE (the 4x); content=0 = BLANK. FLUTTER_CEF_DIAGPX_EVERY tunes sampling (6 ≈ 10Hz to catch transients).
- CefWebSession.resize(): SUPERSEDE a wedged resize: since the watchdog no longer force-promotes, a resize whose size-matched present never lands left resizeInFlight stuck forever, blocking ALL later resizes → surface frozen small while the tile grows → wrong-scale (the "4x"). If in-flight past a 450ms grace, abandon its pending surface and let the newest size go out. + [cefdiag-rsz] diagnostic.
PROVEN by the harness: HARD run (865 samples: static pages, zoom→6, combo, 18 browsers via recreate) = 0 STUCK, 0 BLANK at end, every wire converges to ratio=1.00; worst was a transient 0.56 that recovered. So flutter_cef's resize/zoom/cull/recreate path converges under hard storms.
NOTE (research, separate commit-worthy direction): prior art (webview_cef, cefclient, Ultralight, video-swapchains) is unanimous that the consumer-side gate cluster is over-engineered: the durable fix is "always-latest surface scaled to the tile, one convergence watchdog," accepting a transient soft frame. The harness is Step 0 / the oracle for that migration. This commit keeps the gated model converging (supersede) while the harness de-risks the always-latest rewrite.
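The two invariants the diagpx line encodes (BLANK when no sample point has content, WRONG-SIZE when painted differs from logical×dpr) can be sketched as a classifier. The struct layout, the names, and the ±1 pixel tolerance are assumptions made for this example; the commit itself only states `painted<<want` and `content=0`.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

enum class Verdict { Ok, WrongSize, Blank };

// One diagpx-style sample (hypothetical shape).
struct Sample {
  int painted_w, painted_h;    // dims of the frame actually painted
  double logical_w, logical_h; // tile size in logical pixels
  double dpr;                  // device pixel ratio
  int content_points;          // how many of the 9 sample points hold content
};

Verdict Classify(const Sample& s) {
  if (s.content_points == 0) return Verdict::Blank;
  const long want_w = std::lround(s.logical_w * s.dpr);
  const long want_h = std::lround(s.logical_h * s.dpr);
  const bool size_ok = std::labs(s.painted_w - want_w) <= 1 &&  // tolerance is
                       std::labs(s.painted_h - want_h) <= 1;    // an assumption
  return size_ok ? Verdict::Ok : Verdict::WrongSize;
}
```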
…tic tiles stretched The size-gated promotion (promote the resized surface only when painted dims == round(logical× dpr) ±1) wedged STATIC pages: a static tile (e.g. a counter) paints exactly one frame per resize, and if that frame's dims were off by rounding — or the exact-match frame simply never re-arrived — the gate kept serving the OLD (small) surface forever. Result: stretch the tile wide → it shows the original-size image scaled up and FROZEN, while the page underneath is live (input/cursor still work). Reported from Campus on the Shared Counter tile. Fix = the unified model's core (prior art unanimous: webview_cef / cefclient / Ultralight / video swapchains): promote the pending surface as soon as cef_host paints INTO it (sid match), WITHOUT gating on exact composited dims. The surface is already the correct PHYSICAL size (we allocated it at logical×dpr); Flutter's Texture scales the content to the logical tile box, so a frame whose page-raster momentarily lags the new dpr shows briefly SOFT — never wrong-size, never frozen. Newest sid wins; convergence to crisp is driven by the begin-frame pump + the watchdog's opInvalidate re-kick (which now reliably promotes on the resulting paint), not a gate. Verified in the conformance harness (HARD: static pages, zoom→6, combo resize+zoom, recreate): all 18 live wires converge, 0 STUCK, 0 BLANK; worst transient = one dpr step (0.67), recovers. This is migration Step 2 (always-latest consumer transport). The resize supersede + per-resize watchdog remain as backstops; a later step can fold them into one convergence watchdog.
…ss is nudge-only The steady-state liveness watchdog (F-6) escalated a painted-but-idle browser to onPaintStalled → consumer recreate. A STATIC page (counter, status panel, finished form) legitimately stops producing frames once idle, and the nudge (opInvalidate) can't extract a new frame from a page with nothing to repaint — so every static tile got flagged "painted then wedged" and recreated on a ~10s loop (observed in Campus: 36 stalls / 33 browsers in one session → constant flicker on idle tiles). A converged, idle tile is HEALTHY — it is showing correct content. Fix: the sweep still NUDGES an idle established tile once (which repairs a genuinely evicted/blank VISIBLE surface — real damage produces a present), but NO LONGER escalates to onPaintStalled when the nudge yields no frame. No-frames-after-nudge on an established tile = static-idle = accept as healthy, keep serving its last good frame. Never-painted tiles are still owned by the separate first-paint watchdog (firstPresentPending); genuine renderer death by OnRenderProcessTerminated; eviction-while-hidden by the F-1 un-hide repaint. This is the research's "liveness keys on displayed==desired, not frame-flow" applied minimally. Verified in the conformance harness (HARD static pages + 16s idle holds): "painted then wedged" = 0, "accepting as healthy-static" = 90, no liveness-driven recreate (the browser-id climb in the run is the harness's own recreateStorm phase, not the watchdog).
…ch/stale class at the root
THE durable fix for the crop/stretch oscillation (the user chose this over the band-aid). ROOT
CAUSE: the CONSUMER allocated the IOSurface and cef_host blitted CEF's painted view_src into it
with a min(src,dst) top-left copy. When the consumer-allocated dst and cef_host's painted src
disagreed in size (rounding / timing / dpr-zoom), the blit CROPPED (src>dst) or left STALE
margins (src<dst); on a static page the one mismatched frame stuck (visible crop/stretch). No
promotion-gate tweak could fix it (size-gate→stretched, always-latest→cropped) — it was the
consumer/producer size race itself.
FIX: reverse the ownership. cef_host now MINTS its own IOSurface sized EXACTLY to what it paints
(view_src dims), lazily in the composite path (EnsureSurfaceForPaint, under surface_mutex), so
the blit is 1:1 — src==dst BY CONSTRUCTION, crop/stretch/stale impossible. It presents {sid,w,h};
the consumer becomes a pure ADOPTER: on a present whose sid differs from the one backing its
current pixelBuffer, it IOSurfaceLookup()s the producer surface + wraps it in a CVPixelBuffer
(adoptSurfaceLocked). Newest sid wins; keeps serving the old buffer until the new is painted
(no flash); producer presents a sid only after painting it (never blank).
Ownership/lifetime (the UAF/leak-critical part, per the design): cef_host owns a +1 from mint to
realloc/close; the consumer's CVPixelBuffer owns an independent +1 while Flutter may sample it.
EnsureSurfaceForPaint releases cef_host's ref on the OLD surface after assigning the new (the
consumer's CVPixelBuffer keeps old alive until it adopts new → no UAF). OnBeforeClose sets a
`closing` flag under surface_mutex BEFORE nulling+releasing, so a paint racing teardown can't
re-mint (no leak). Wire: opCreateBrowser/opResize drop the sid (just {w,h,dpr}); present carries
the producer sid. Deleted consumer-side makeBuffers/publishBuffers/ioSurface/pendingBuffer/
pendingSurfaceId; surfaceId/createSnapshot/emitCurrentSurface now derive from pixelBuffer.
CRITICAL deadlock fixed during impl: OnAcceleratedPaint/OnPaint early-returned on null surface
BEFORE the composite where the alloc lives → no surface → no paint → no surface. EnsureSurface
now runs BEFORE that guard.
VERIFIED in the conformance harness (HARD: static pages, zoom→6, combo, recreate, 9 tiles):
blitmismatch=0, every diagpx painted==want content=9, 2877 ADOPTs with 0 crashes / 0
IOSurfaceCreate-or-Lookup failures, 0 liveness false-positives. The research's Step 5 / the
webview_cef / cefclient / video-swapchain native model.
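The ownership rules in this commit can be modeled without IOSurface at all. In the C++ sketch below, `std::shared_ptr` stands in for the independent +1 references, the producer mints a surface sized to what it paints, and the consumer adopts by sid. Everything here is a simplification: the real consumer receives a sid over the wire and looks the surface up, whereas this model passes the pointer directly, and all names other than `EnsureSurfaceForPaint`, `surface_mutex`, and `closing` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

struct Surface { uint32_t sid; int w; int h; };

class ProducerSlot {
 public:
  // Returns a surface sized exactly to the painted frame, minting a new one
  // when the dims change. Returns nullptr once teardown has begun.
  std::shared_ptr<Surface> EnsureSurfaceForPaint(int src_w, int src_h) {
    std::lock_guard<std::mutex> lock(surface_mutex_);
    if (closing_) return nullptr;  // a paint racing teardown must not re-mint
    if (!surface_ || surface_->w != src_w || surface_->h != src_h) {
      // Assigning drops the producer's ref on the old surface; any consumer
      // ref keeps the old one alive until the consumer adopts the new one.
      surface_ = std::make_shared<Surface>(Surface{next_sid_++, src_w, src_h});
    }
    return surface_;
  }

  void Close() {
    std::lock_guard<std::mutex> lock(surface_mutex_);
    closing_ = true;  // set before releasing, under the same mutex
    surface_.reset();
  }

 private:
  std::mutex surface_mutex_;
  std::shared_ptr<Surface> surface_;
  uint32_t next_sid_ = 1;
  bool closing_ = false;
};

// Pure adopter: newest sid wins; the old buffer is served until then.
struct Consumer {
  std::shared_ptr<Surface> adopted;
  bool OnPresent(const std::shared_ptr<Surface>& painted) {
    if (!painted) return false;
    if (adopted && adopted->sid == painted->sid) return false;  // same surface
    adopted = painted;
    return true;
  }
};
```

Because the surface is created from the source dims, a blit from source to surface is 1:1 in this model, which is the property the commit calls src==dst by construction.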
Audit of the producer-allocates diff (15-agent adversarial workflow) verdict: core SOUND; all sev6-8 UAF/double-free + protocol findings were FALSE_ALARM (single CEF UI thread + surface_mutex serialize the lifetime; consumer CVPixelBuffer holds an independent cross-process IOSurface ref, so the producer's eager CFRelease is safe). Two bounded real issues + the manual-oracle gap fixed:
CONFIRMED FIX (P2): a failed/never-landed adopt left resizeInFlight=true until the NEXT resize() call, blocking coalescing + re-kicking opInvalidate forever on a tile the user stopped touching. The 450ms wedge-clear lived ONLY in resize(); now also in resizeWatchdog, so it self-heals. Extracted the threshold into ResizeSupersedePolicy (pure, swiftc-unit-tested) used by both sites.
REGRESSION SUITE (the ask: "tests so we don't regress"):
- ResizeSupersedePolicyTests (new, swiftc): wedged-clear grace boundary (strict >, off-by-one).
- LivenessProbePolicyTests (extended): grace boundary cases + an explicit WEDGE-vs-IDLE-indistinguishability test that LOCKS the documented F-6 limitation (a hung visible renderer and a static-idle tile present identical inputs → both .declareStalled; the consumer deliberately does not escalate). If a should-have-painted discriminator is ever added, that test is where the divergence must be encoded.
- run_conformance_oracle.sh (new, P0 automation): drives conformance_harness HARD headless through all storms and FAILS (non-zero exit) on any WRONG-SIZE (painted!=want), BLANK (content=0), BLIT-CROP (blitmismatch>0), or NO-RENDER/NO-ADOPT, turning the manual diagpx log-read into a repeatable gate that guards the entire producer-allocates correctness claim.
DOC (P3): the F-6 .declareStalled comment now states the visible-hung-renderer limitation honestly (can't distinguish from static-idle by timing; naive escalation resurrects the recreate-storm; proper fix = a damage/should-have-painted probe, tracked as a fast-follow) instead of over-claiming.
All four standalone policy suites pass (ResizeWatchdog/Liveness/ResizeSupersede/CdpRelayFilter).
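The wedge-clear threshold is small enough to show in full. This C++ sketch mirrors what the commit says about ResizeSupersedePolicy (a 450ms grace with a strict greater-than comparison); the function and constant names are hypothetical and the real policy is Swift.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kWedgeGraceMs = 450;  // grace value from the commit message

// True when an in-flight resize has outlived the grace and should be
// abandoned so the newest size can go out. Strict '>' per the commit's tests.
bool IsWedged(bool in_flight, uint64_t started_ms, uint64_t now_ms) {
  if (!in_flight || now_ms < started_ms) return false;
  return (now_ms - started_ms) > kWedgeGraceMs;
}
```

Sharing one predicate between the resize call site and the watchdog is what removes the case where a tile the user stopped touching never clears its in-flight flag.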
…variants, warn on soft
Two fixes to the regression gate (a gate that mis-fires is worse than none):
1. mktemp bug: LOG="$(mktemp)/conformance.log" treated a temp FILE as a directory (NotADirectoryError) so the harness output was never captured → spurious fail. Use mktemp -d.
2. Over-gating: the first version failed on ANY painted!=want, which flags the EXPECTED transient soft frame during active zoom (cef_host painting the old dpr while the requested dpr advanced; it converges). That is not the bug. Now the HARD gate is the producer-allocates STRUCTURAL guarantee: blitmismatch==0 (1:1 blit, no crop), blank==0 (painted frame always has content), rendered+adopted. painted!=want at REST (last idle-phase sample) is a WARN only (the accepted soft frame: Flutter scales the internally-consistent surface to the tile box → correct geometry, mild softness, never crop/stretch/freeze; sample is also noisy if the soak ends mid-storm).
VERIFIED: against the live producer-allocates build the gate PASSES: content-ok=820/820, blank=0, blitmismatch=0, idle-converged=4/4. The gate now catches a real regression (a crop would spike blitmismatch; a stuck blank would spike blank) without false-failing on soft frames.
KNOWN follow-up (not a regression of shipped behavior): a static page resized rapidly can settle one dpr-step soft (adopt clears resizeInFlight before the final size paints, so the watchdog stops re-kicking). Acceptable (soft, correct-geometry) today; a future refinement is to keep re-kicking until painted==want (converge-to-crisp) while still adopting newest for safety.
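The revised gate separates hard structural failures from the accepted soft frame. A hedged C++ sketch of that split follows; the real gate is a shell script over harness logs, and the counter names and struct here are hypothetical.

```cpp
#include <cassert>

enum class Gate { Pass, Warn, Fail };

// Aggregated counters from one harness run (hypothetical shape).
struct RunCounters {
  int blitmismatch;          // blits where src and dst dims differed
  int blank;                 // painted frames with no content
  int rendered;              // frames rendered
  int adopted;               // surfaces adopted by the consumer
  bool rest_painted_is_want; // painted==want at the last idle-phase sample
};

Gate Evaluate(const RunCounters& c) {
  // Hard failures: the structural producer-allocates guarantees.
  if (c.blitmismatch > 0 || c.blank > 0) return Gate::Fail;
  if (c.rendered == 0 || c.adopted == 0) return Gate::Fail;
  // Soft frame at rest is accepted, but surfaced as a warning.
  return c.rest_painted_is_want ? Gate::Pass : Gate::Warn;
}
```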
…eate churn Closes the one coverage gap the render oracle can't see: the producer-allocates IOSurface LIFETIME invariant (the property 3 sev-8 audit findings probed). cef_host mints a surface per paint/recreate + CFReleases the old; the consumer's CVPixelBuffer holds the only remaining ref until it adopts the next id. A regression (producer forgets to release / consumer never drops old) accumulates surfaces — invisible to the render oracle (correctness, not memory) until OOM. run_leak_soak.sh drives recreate_soak_probe (dispose+create ~2/s), samples cef_host RSS + live IOSurface count over a soak, and FAILS on unbounded RSS growth (>1.5x baseline) or a large monotonic surface climb, or if cef_host dies (UAF crash). recreate_soak_probe now logs a running recreates_total so the gate confirms the churn actually happened. VERIFIED: 100 recreate cycles → RSS base 172MB / peak 174MB (ceiling 259MB), IOSurfaces 99→99 (delta +0). Lifetime ledger holds; no leak, no UAF.
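The leak-soak pass criteria can be sketched as a check over periodic samples. The 1.5x RSS ceiling comes from the commit; the surface-count rule is simplified here to a net first-to-last delta against a caller-supplied bound, since the commit does not state what counts as a "large monotonic climb". All names are hypothetical, and the real gate is a shell script.

```cpp
#include <cassert>
#include <vector>

struct SoakSample {
  double rss_mb;  // cef_host resident set size
  int surfaces;   // live IOSurface count
};

// Baseline is the first sample. max_surface_delta is a hypothetical knob.
bool SoakPasses(const std::vector<SoakSample>& samples, int max_surface_delta) {
  if (samples.empty()) return false;  // no data: the host likely died
  const double rss_ceiling = samples.front().rss_mb * 1.5;
  for (const SoakSample& s : samples) {
    if (s.rss_mb > rss_ceiling) return false;  // unbounded RSS growth
  }
  const int delta = samples.back().surfaces - samples.front().surfaces;
  return delta <= max_surface_delta;
}
```

With the verified figures above (RSS 172MB to 174MB, surfaces 99 to 99) such a check passes comfortably.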
What
Make CEF off-screen rendering (OSR → FlutterTexture) production-grade for hosts that run many browsers per profile under live resize / zoom / cull / recreate (Campus's workload). Fixes a class of "tile renders blank / frozen / too-big / too-small / cropped" bugs at the root, plus security/correctness/resource hardening, plus a headless regression-test suite.
The headline fix: producer-allocates surfaces
The crop/stretch/freeze class was a consumer/producer surface-size race: the consumer allocated the IOSurface and cef_host blitted CEF's painted `view_src` into it with a `min(src,dst)` top-left copy. When the consumer-allocated dst and cef_host's painted src disagreed in size (rounding / timing / dpr-zoom), the blit cropped (src>dst) or left stale margins (src<dst); on a static page the one mismatched frame stuck. No promotion-gate tweak could fix it (size-gate → stretched; always-latest → cropped).
Now cef_host MINTS its own IOSurface sized exactly to what it paints (`EnsureSurfaceForPaint`, lazily in the composite path under `surface_mutex`), so the blit is 1:1: `src==dst` by construction, crop/stretch/stale structurally impossible. It presents `{sid,w,h}`; the consumer is a pure adopter (`adoptSurfaceLocked`: wraps the producer surface by id; newest sid wins; keeps serving the old buffer until the new is painted = no flash). Wire ops drop the sid (`{w,h,dpr}`); the present carries the producer sid.
Lifetime ledger (no UAF/leak): cef_host owns +1 from mint→realloc/close; the consumer's `CVPixelBuffer` holds an independent cross-process IOSurface ref. `EnsureSurfaceForPaint` releases the old after assigning the new (consumer ref keeps it alive until adopt). `OnBeforeClose` sets a `closing` flag under `surface_mutex` before nulling+releasing, so a paint racing teardown can't re-mint.
Also in this PR
- `OnLoadEnd`.
- `OnQuery` / channel-shim injection gated on `frame->IsMain()`: the privileged `campusHost` bridge / host-eval channel was reachable from cross-origin iframes.
- `pendingCreates` cleared symmetrically on shutdown/host-death; `loadHtmlString`/navigate deferred by wire-id so they're not dropped behind a queued create.
- `RLIMIT_NOFILE` raised at cef_host startup.
Tests (headless, no Xcode/Campus)
- `swiftc` policy suites (`test/run_*.sh`): ResizeWatchdog, Liveness (+ grace boundary + a test locking the documented wedge-vs-idle limitation), ResizeSupersede, CdpRelayFilter.
- `example/run_conformance_oracle.sh`: drives `conformance_harness` through all storms headless and hard-fails on the structural invariants: `blitmismatch==0`, no blank, rendered+adopted. (Warns, not fails, on the accepted transient soft frame.)
- `example/run_leak_soak.sh`: recreate-churn soak that asserts cef_host RSS + live IOSurface count stay bounded (verified: 100 recreates → RSS flat, IOSurfaces 99→99).
Verification
Conformance oracle PASSES on the producer-allocates build (content-ok 820/820, blank 0, blitmismatch 0, idle-converged 4/4). Leak-soak PASSES (no leak, no UAF). All 4 policy suites green.
Known follow-up (not a regression)
A static page resized very rapidly can settle one dpr-step soft (correct geometry, mild softness; never crop/stretch). The adopt clears the in-flight flag before the final size paints, so the watchdog stops re-kicking. A future refinement keeps re-kicking until `painted==want` while still adopting newest for safety.
🤖 Generated with Claude Code