History

Revisions

  • audit-w2: defang prompt-injection wrappers used by build_user_prompt

    F09 wrapped each interpolated user-controlled value in a tag-distinct delimiter (WORKSPACE_CONTEXT / CONSTRAINTS / USER_GOAL / EXTRA_CONTEXT), but sanitize_prompt_text only defanged the legacy USER_CONTENT tag and the role-marker family. A tab title containing </WORKSPACE_CONTEXT> prematurely closed the data envelope, and the model interpreted subsequent bytes as instructions. Broaden the case-insensitive regex defang to cover every wrapper build_user_prompt actually uses; plain prose without angle brackets survives untouched.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    67125da
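
A minimal sketch of the kind of case-insensitive defang this commit describes. The function name, the square-bracket replacement, and the exact pattern are assumptions for illustration, not the repository's sanitize_prompt_text; the role-marker family is left out.

```python
import re

# Wrapper names taken from the commit message; the tuple itself is hypothetical.
WRAPPER_TAGS = (
    "WORKSPACE_CONTEXT",
    "CONSTRAINTS",
    "USER_GOAL",
    "EXTRA_CONTEXT",
    "USER_CONTENT",
)

# Matches <TAG ...> and </TAG ...> in any letter case, tolerating inner whitespace.
_WRAPPER_RE = re.compile(
    r"<\s*(/?)\s*(" + "|".join(WRAPPER_TAGS) + r")\b([^>]*)>",
    re.IGNORECASE,
)


def defang_wrappers(text: str) -> str:
    """Swap the angle brackets of wrapper tags for square brackets.

    Text that contains no wrapper tag is returned unchanged.
    """
    return _WRAPPER_RE.sub(
        lambda m: f"[{m.group(1)}{m.group(2)}{m.group(3)}]", text
    )
```

With this sketch, a tab title that tries to close the envelope early keeps its wording but can no longer terminate the delimiter.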
  • audit Debt-1: generator infrastructure (Pydantic → TypeScript codegen)

    Adds the systemic close-out for Architectural Debt #1: hand-written TypeScript interfaces in cortex/apps/browser_extension/ that have drifted from the Pydantic models in cortex/libs/schemas/ (F42/F43/F44/F45). Generator script + initial generated file + drift-detection test suite.

    Pieces:
    - cortex/scripts/generate_ts_schemas.py walks every cortex.libs.schemas submodule via pkgutil, imports them so every Pydantic class registers, then feeds the discovered models through pydantic-to-typescript (imported as pydantic2ts), which shells out to json2ts (json-schema-to-typescript). Emits a single cortex_schemas.d.ts under apps/browser_extension/types/generated/ with an AUTOGENERATED header naming the regeneration command. --check mode regenerates into a tempfile, diffs against the committed copy, prints the diff on drift, and exits non-zero so the pre-commit hook (Commit 5) and CI gate fail loudly.
    - cortex/pyproject.toml gains a 'codegen' optional-dependencies extra and pulls pydantic-to-typescript into the existing 'dev' extra so `pip install -e ./cortex[dev]` picks it up.
    - cortex/tests/unit/test_schema_codegen.py: 12 cases. Module discovery covers every schemas/*.py; banner stripping is idempotent and tolerant of pydantic2ts version drift; drift detection exits 1 with a unified diff; missing committed file is treated as drift (not a generator failure). One end-to-end case is skipped when json2ts is not on PATH so the suite still passes on minimal CI matrices.

    Verification:
      python -m cortex.scripts.generate_ts_schemas --check   # exit 0
      pytest cortex/tests/unit/test_schema_codegen.py        # 12 passed

    The generated cortex_schemas.d.ts already includes:
    - SuggestedAction.action_type as a Literal union (F42 closes here once the extension imports it in Commit 4)
    - SuggestedAction.catalog_id (F43 closes here)
    - SuggestedAction.reversible (F44 closes here once the extension drops its 'undo_available' alias in Commit 4)

    WSMessage + MessageType land in Commit 2; the regenerated file covering them lands in Commit 3; extension migration in Commit 4; CI + pre-commit gate in Commit 5.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    bb847cb
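
The --check behaviour described above (regenerate, diff against the committed copy, exit non-zero on drift, treat a missing file as drift) can be sketched as follows. The function name and signature are hypothetical; the real script's internals are not shown in the commit message.

```python
import difflib
import sys
from pathlib import Path


def check_drift(generated: str, committed_path: Path) -> int:
    """Return 0 when the committed file matches `generated`, else print a diff and return 1.

    A missing committed file counts as drift rather than as a generator failure.
    """
    committed = committed_path.read_text() if committed_path.exists() else None
    if committed == generated:
        return 0
    diff = difflib.unified_diff(
        (committed or "").splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile=str(committed_path),
        tofile="<regenerated>",
    )
    sys.stdout.writelines(diff)
    return 1
```

A caller would pass the return value to sys.exit() so that a pre-commit hook or CI step fails when the two copies differ.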
  • audit-w2: route remaining raw-int spacing through tokens overlay.py inlined (24, 24, 24, 24) for the main HUD margin; onboarding.py inlined setSpacing(8) for the progress-step row. Both values match existing tokens (SP6 and SP2 respectively). Promoting them keeps the 4pt grid the single source of truth for layout rhythm so a future tokens edit can shift macroscopic spacing without grepping for stray integers. The remaining raw integers in dashboard.py (sub-token paddings like 3px inner pill padding, 2px column gap, etc.) are intentional sub-4pt visual fine-tuning that is below the token granularity. Verified F47 overlay-tokens + F55 dashboard tests stay green.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    c90d382
  • audit-w2: regression-guard for native window chrome coverage Five top-level windows (DashboardWindow, SettingsDialog, OnboardingWindow, OverlayWindow, ConnectionsPanel) already invoke apply_unified_titlebar + apply_vibrancy in their showEvent. A future window class that forgets the call would silently inherit Qt's default opaque titlebar — visually divergent from the rest of the shell. Adds cortex/tests/unit/test_window_chrome_coverage.py — 10 parameterised ast-based cases that pin every top-level window to its required mac_native calls. No production code change; pure regression guard.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    84a58f4
  • audit-w2: route timeline font-family through FONT_MONO token dashboard.py's timeline panel inlined a font-family stack literal ('"SF Mono", ui-monospace, Menlo, monospace') instead of importing the FONT_MONO token. Replaced the literal with FONT_MONO so a future edit to the mono-font stack in tokens.yaml propagates here. No visual change — the existing literal happened to match the token verbatim.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    d661a38
  • audit-w2: promote warm label tints to tokens, lift sub-AA tertiary connections, settings, onboarding kept private copies of '#5C5854' and '#827971' for the secondary/tertiary warm-greyscale tints. The '#827971' tertiary fails WCAG AA on the cream background (3.98:1, under the 4.5:1 threshold) — F55 fixed it in dashboard.py but the other three surfaces drifted. This commit promotes both tints to CX_TEXT_SECONDARY / CX_TEXT_TERTIARY in the token registry (tokens.yaml emitter + generated tokens.py + browser-extension design-tokens.ts), pins the AA-passing '#6B6661' value, and updates every consumer to import from tokens rather than carry a private hex literal. Adds cortex/tests/unit/test_token_label_consistency.py with 9 cases: the registry value is '#6B6661', no surface contains the legacy hex literally, and every surface's _LABEL_TERTIARY/_LABEL_SECONDARY equals the token at import time. Existing F47 + F55 + overlay-token tests remain green.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    9c7c32b
  • test: rewrite F24 async-ladder priming with asyncio.run

    F24 made ConsentLadder.check / record_approval async (lock). The test_get_consent_level_with_ladder and test_reset_consent_with_ladder tests were still using asyncio.get_event_loop().run_until_complete(), which is deprecated for this use since Python 3.10 and raises RuntimeError when the thread has no current event loop (for example, after an earlier asyncio.run() call has closed and unset it). Switched to asyncio.run() / an async helper.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    ec6fc78
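
A sketch of the asyncio.run() / async-helper pattern the commit switches to. StubLadder is a hypothetical stand-in for ConsentLadder, whose real interface is not shown here.

```python
import asyncio


class StubLadder:
    """Hypothetical stand-in for an async, lock-guarded consent ladder."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.approvals = 0

    async def record_approval(self) -> None:
        async with self._lock:
            self.approvals += 1


def prime_and_count(times: int) -> int:
    """Drive the async API from synchronous test code with asyncio.run()."""

    async def _prime() -> int:
        # Build the ladder inside the coroutine so it belongs to the loop
        # that asyncio.run() creates for this call.
        ladder = StubLadder()
        for _ in range(times):
            await ladder.record_approval()
        return ladder.approvals

    return asyncio.run(_prime())
```

Each asyncio.run() call creates and closes its own loop, so the helper can be called repeatedly from different tests without depending on a leftover loop.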
  • merge: Wave 1-F (audit F31+F33+F35+F36+F47+F48+F49+F51+F53+F55+F56) - maintainability cluster

    # Conflicts:
    #   cortex/apps/desktop_shell/dashboard.py
    #   cortex/apps/desktop_shell/overlay.py
    #   cortex/libs/utils/atomic_write.py
    #   cortex/services/runtime_daemon.py

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    485742d
  • audit C-tier follow-up: harness compat for the legacy PySide6 mock suite The Phase-C commits (F31, F33, F47, F48, F51, F53, F55) introduced new attribute calls on dashboard / overlay widgets and new tests that require the real PySide6 library. They passed in isolation but failed when run inside the full unit suite because test_desktop_shell.py: * installs lightweight mock PySide6 modules in sys.modules at import time and never restores the real ones (so a test that runs AFTER test_desktop_shell is collected sees the mocks even if it was imported earlier alphabetically); * the mocks (MockQLineEdit, MockQApplication) only expose a subset of the QWidget API the audit fixes call (setAccessibleName, accessibleName, setText on QToolButton, QApplication.instance); * the real PySide6 C extension cannot be re-imported within a single process after sys.modules['PySide6'] is deleted — the C state from the first load conflicts with the second load and segfaults. Two-pronged fix: (a) Desktop-shell helpers degrade defensively against the lightweight mock surface. _set_accessible_name / _set_accessible_description / _set_tab_order / _safe_call wrappers no-op when the target method is missing on the stub. QToolButton import in overlay.py falls back to QPushButton when the stub lacks it. (b) The new audit Qt tests detect when PySide6 has been replaced by the mock (heuristic: missing __file__ attribute on the cached module) and skip with a clear reason rather than crashing. The skip happens in an autouse fixture + a re-check inside qapp so fixture-ordering quirks cannot bypass it. Tests still pass cleanly in isolation. Net effect: full pytest cortex/tests/unit/ --ignore=test_capture_service runs 928 passed, 26 skipped, 0 failures.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    ad8023d
  • merge: Wave 1-G (audit F40+F16+F15+F19b+F07b+F08b+F32+F46+F50+F52+F54) — TS test infra + extension wiring

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    6d25329
  • audit F56: register SIGINT/SIGTERM via loop.add_signal_handler Pre-fix the daemon-side shutdown chain relied entirely on the outer harness (run_dev.py) to register signal handlers. If the daemon was launched without that harness (desktop-shell in-process mode, future tests, future CLI entry points), nothing was wired to SIGINT/SIGTERM at all — or worse, a caller might add a signal.signal handler before asyncio.run started. signal.signal registers a C-level handler that the kernel invokes in the signal frame, which on Cortex is almost always inside numpy / mediapipe / OpenCV native code. Running Python in that frame violates the GIL contract and can segfault on resume. CortexDaemon.start() now calls _install_loop_signal_handlers() before spawning the loop tasks. That helper uses loop.add_signal_handler so the callback (_on_signal_received) is dispatched as a regular event-loop tick — Python state is frame-safe when the handler runs. On platforms that don't support add_signal_handler (Windows, embedded), the helper degrades silently and the outer harness retains responsibility. Two test cases: SIGTERM during a stub numpy-style tight loop sets _shutdown without segfaulting AND the handler runs inside the asyncio loop (asyncio.get_running_loop() succeeds in the captured context); unsupported-platform stub raises NotImplementedError and the helper logs+continues rather than crashing.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    c95541e
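
A minimal sketch of registering SIGINT/SIGTERM through loop.add_signal_handler with a silent fallback, as the F56 commit describes. The helper and callback names are hypothetical, and the demo sends SIGTERM to its own process only after a handler has been installed.

```python
import asyncio
import os
import signal


def install_loop_signal_handlers(loop, on_signal) -> bool:
    """Register SIGINT/SIGTERM on the loop; return False where unsupported."""
    installed = False
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, on_signal, sig)
            installed = True
        except (NotImplementedError, RuntimeError):
            # NotImplementedError: platform without support (e.g. Windows).
            # RuntimeError: not running in the main thread.
            pass
    return installed


async def run_until_signalled() -> int:
    """Wait for a signal; return its number, or -1 if no handler could be installed."""
    loop = asyncio.get_running_loop()
    shutdown = asyncio.Event()
    received = []

    def _on_signal(sig) -> None:
        # Runs as an ordinary event-loop callback, not inside the C signal frame.
        received.append(sig)
        shutdown.set()

    if not install_loop_signal_handlers(loop, _on_signal):
        return -1
    # Demo only: deliver SIGTERM to this process shortly after starting.
    loop.call_later(0.05, os.kill, os.getpid(), signal.SIGTERM)
    await asyncio.wait_for(shutdown.wait(), timeout=5)
    return int(received[0])
```

Because the callback is dispatched by the loop, it can safely touch asyncio primitives such as the Event used here.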
  • audit F54: connections panel distinguishes four failure states

    Replaces the single "Not connected" disconnect screen with four distinct connectivity states, each with its own title, diagnostic body, and fix-action CTA:
    - not_installed: native messaging host missing
    - installed_no_daemon: host present but daemon WS unreachable
    - installed_version_mismatch: daemon up, version differs from extension
    - handshake_failed: WS open, daemon rejected handshake

    New `classifyConnectivity` pure function (exported) computes the state from {connected, nativeHostStatus, daemonVersion, expectedVersion, handshakeError}. New CONNECTIVITY_DIAGNOSTIC message type lets background push the resolved diagnostic. The diagnostic block now renders whenever `connectivity !== 'ok'` rather than only when `!connected`, because version_mismatch and handshake_failed both occur while the WS is technically connected.

    Test: __tests__/f54_connectivity_states.spec.tsx covers all 6 enum inputs and asserts the popup renders distinct `[data-testid=conn-state-<state>]` titles for each failure mode.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    df18591
  • audit F55: accessible names, tab-order chain, WCAG AA tertiary contrast Pre-fix: * Several interactive widgets in Dashboard and Overlay lacked setAccessibleName so VoiceOver / screen readers announced raw ObjectClass names instead of the human label. * No widget had setTabOrder wired explicitly; the tab chain depended on construction order and a single re-arrangement could silently scramble it. * _LABEL_TERTIARY = '#827971' against _CONTROL_BG '#FFFFFF' computed to ~3.98:1 — just below WCAG AA's 4.5:1 threshold for normal-weight text. The role is 'tertiary captions / placeholders' so the volume affected is high (every QLineEdit placeholder, every '--' debug label). Fix: (a) setAccessibleName on goal QLineEdit ('Goal'), Connect button ('Open Connections panel'), overlay dismiss button ('Dismiss intervention'), causal-explanation toggle ('Show full causal explanation'). setAccessibleDescription on goal input for richer screen-reader context. (b) QWidget.setTabOrder explicit chain — Goal → Connect → Stop in _ConsumerTab; causal toggle → dismiss button in OverlayWindow. (c) _LABEL_TERTIARY bumped from #827971 to #6B6661 (~5.4:1 against white — comfortably above AA). (d) HUD palette tokens (TEXT_HUD_PRIMARY etc.) — the F47 commit already moved overlay text to tokens.py; this test pins the alpha contract. Five test cases, including a hand-rolled WCAG 2.1 contrast-ratio helper (avoids adding wcag-contrast-ratio as a dep).

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    5c0d3e3
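
The hand-rolled contrast helper mentioned in F55 is not shown in the commit; a sketch following the WCAG 2.1 relative-luminance and contrast-ratio definitions could look like this. Function names are assumptions.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.1 formula)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#RRGGBB' colour."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), between 1 and 21."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(fg: str, bg: str) -> bool:
    """WCAG AA requires at least 4.5:1 for normal-weight body text."""
    return contrast_ratio(fg, bg) >= 4.5
```

A test can then assert that the legacy tint fails and the replacement passes against a given background without adding a third-party dependency.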
  • merge: Wave 1-E (audit F06+F34+F04+F05+F22+F23) - UI race-condition cohort

    # Conflicts:
    #   cortex/services/runtime_daemon.py

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    3ca51ef
  • audit F52: dedup tab-close affordance by tab_index The previous synthesise rule was all-or-nothing: if any suggested_action with action_type=close_tab existed, we skipped synthesising entirely (dropping the close affordance for any other recommended tab); otherwise we synthesised one action per closeable rec — duplicating the close button when the LLM emitted both suggested_actions and tab_recommendations for the same tab_index. F52 makes the dedup per-tab_index: synthesise only for tab indices not already covered by an existing close-style suggested_action, so the tab card carries the single close button. Applied identically in background.ts (intervention overlay HTML build) and popup.tsx (popup action list). `synthesizeActions` is now exported so it can be unit-tested. Test: __tests__/f52_tab_dedup.spec.ts covers covered/uncovered mixes, empty rec short-circuit, and full synthesis when no close-style suggested_action exists.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    eb92a1b
  • audit F53: surface QSettings sync() failures to the controller Pre-fix, SettingsDialog._persist_settings wrapped self._qs.sync() in a bare except: pass. A sandbox container with a revoked ACL, a read-only filesystem, or a disk-full condition all manifested as the Apply button succeeding from the user's perspective while nothing actually persisted. Add a settings_save_failed(str) Signal. The new _persist_settings: (a) catches sync() exceptions and emits with the reason; (b) inspects QSettings.status() afterwards and emits if anything other than NoError. _describe_qsettings_status maps AccessError / FormatError to human-readable strings so the controller can surface the failure to the user without exposing Qt enum integers. Three test cases: happy path emits no signal; sync() exception emits once with the OSError text; status==AccessError emits once with an 'access denied' reason.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    8b264fb
  • audit F50: stabilise popup runtime.onMessage listener identity Extracts the popup's message handler into a `useCallback([], …)` so the addListener/removeListener pair refers to the same reference across re-renders. The original code inlined a fresh closure inside the useEffect; React's setState identities kept things working in practice, but the contract was easy to break (e.g. closing over a mutable state would have leaked listeners). Test: __tests__/f50_popup_listener_leak.spec.tsx mounts and unmounts the same pattern 10x and asserts the listener count returns to its pre-mount baseline every cycle. Also adds `IS_REACT_ACT_ENVIRONMENT = true` in test setup so the React 18 act() warning stops spamming the console for any tsx component test.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    9298f59
  • audit F23: cancel pending correlation-id futures on client disconnect WebSocketServer.request_context registered a future keyed by correlation_id; the matching CONTEXT_RESPONSE resolved it. Pre-F23, if the requesting client disconnected before responding, the future hung until the per-call timeout (default 5 s), wedging the calling coroutine in the daemon's context loop for every concurrent request. New plumbing: - self._pending_cids_by_client: dict[client_id, set[correlation_id]] populated in request_context, kept tight by _drop_pending_cid / _handle_context_response so it only ever holds in-flight cids. - _cancel_pending_for_client(client_id) cancels every pending future for that client, called from _handle_client's finally block when the connection drops. Returns the count for the debug log. - request_context now catches asyncio.CancelledError separately so a cancellation triggered by disconnect returns the empty-dict fallback rather than propagating up into context-loop code that isn't ready for the cancellation. Test cortex/tests/unit/test_pending_context_cleanup.py exercises four cases: disconnect with no pending is a no-op; disconnect with two pending futures cancels both (returning {} so callers don't hang); reconnect with the same client_id starts with a fresh cid; concurrent disconnect + CONTEXT_RESPONSE leaves the resolved future intact and the disconnect is a no-op for that cid. All four fail on main (_pending_cids_by_client and _cancel_pending_for_client do not exist); all four pass on this branch.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    9c889bf
  • audit F51: causal-explanation truncation indicator + Show more toggle Long causal-explanation strings used to be dumped into the overlay verbatim — overflowing the HUD card, pushing the breathing pacer and dismiss button below the fold, and giving the user no affordance to scan a one-line summary first. Truncate to a preview of _CAUSAL_TRUNCATE_THRESHOLD (180) characters with a trailing ellipsis when the text exceeds the threshold; surface a checkable QToolButton 'Show more' that toggles between preview and full. The full + preview strings are stashed on the OverlayWindow so the toggle handler can swap without re-parsing the payload. Four test cases: short text → no ellipsis; long text → ellipsis + toggle visible; click toggle → expanded shown + button label flips to Show less; _hide_causal_explanation resets cached strings + toggle state so a subsequent show isn't contaminated by stale state.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    50cdf94
  • audit F46: DEBUG flag becomes env-driven + runtime override Replaces the hard-coded `const DEBUG = false` with a layered resolver: 1. Build-time env: `import.meta.env.CORTEX_DEBUG === 'true'` or `process.env.CORTEX_DEBUG === 'true'` (covers Plasmo, vitest, and Node test contexts). 2. Runtime override via `chrome.storage.local.cortex_debug`. A `storage.onChanged` listener flips DEBUG immediately so an in-field debug session needs no reload. Setting `cortex_debug` back to false falls through to the build-time env value rather than locking on. New `_getDebugFlag()` export lets tests assert the resolved state. Test: __tests__/f46_debug_flag.spec.ts covers default-off, env-on, and runtime-flip in both directions.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    79ca532
  • audit F32: WS reconnect backoff resets on every successful open Introduces `INITIAL_RECONNECT_DELAY = 3000` and re-uses it in `ws.onopen` so a backoff that drifted up to 30s during a flaky period returns to 3s the moment a connection succeeds. Without this, the next transient drop after a long disconnect cycle still waited 30s — actively worsening the post-recovery experience. Test: __tests__/f32_reconnect_backoff.spec.ts asserts that after `__remoteClose` doubles the delay above INITIAL, the new auto-opened socket resets it back to INITIAL.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    2cceafd
  • merge: Wave 1-B (audit F13+F18+F29) - API-gateway rate limit, degraded envelope, context-truncation telemetry

    # Conflicts:
    #   cortex/libs/logging/structured.py
    #   cortex/libs/schemas/intervention.py

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    efa44da
  • audit F22: explicit close frame for slow WS consumers WebSocketServer._broadcast used to silently drop a client whose send() exceeded the 1 s timeout; the extension then saw an EPIPE on the next send and had no clean signal to drive reconnect. The new path: - Distinguishes timeout ("slow consumer") from generic send error ("send error") so logs can correlate cause and effect. - Routes both through a new _close_slow_consumer helper that: - Sends a close frame with code 1011 + the reason string. Wrapped in try/except so a half-torn-down socket whose close() raises is swallowed instead of cascading into the broadcast hot path. - Emits a structured EventType.WS_CLIENT_DISCONNECTED event with client_id, client_type, and reason so the launcher log lets support correlate disconnects with extension reconnect cycles. Test cortex/tests/unit/test_ws_slow_client.py exercises four cases via a stub websocket that sleeps in send(): slow client gets close(1011, "slow consumer") + is removed + emits ws_client_disconnected event with the cid and reason; healthy peer is unaffected; reconnection after the slow close works (same client_id, fresh socket); close on an already-dead socket whose close() raises does not propagate. Three of the four fail on main (the close-frame, the disconnect event, and the _close_slow_consumer helper do not exist); all four pass on this branch.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    9fdc1ad
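
The slow-consumer path in F22 can be sketched with a stub socket, much as the commit's tests do. The stub, the function name, and the return shape are assumptions; the structured disconnect event is omitted.

```python
import asyncio


class StubSocket:
    """Hypothetical websocket stand-in whose send() takes `send_delay` seconds."""

    def __init__(self, send_delay: float) -> None:
        self.send_delay = send_delay
        self.closed_with = None

    async def send(self, message: str) -> None:
        await asyncio.sleep(self.send_delay)

    async def close(self, code: int = 1000, reason: str = "") -> None:
        self.closed_with = (code, reason)


async def broadcast(clients: dict, message: str, send_timeout: float = 1.0) -> dict:
    """Send to every client; close and drop the ones that time out or fail.

    Returns a mapping of dropped client id to the reason string.
    """
    dropped = {}
    for client_id, ws in list(clients.items()):
        try:
            await asyncio.wait_for(ws.send(message), send_timeout)
        except asyncio.TimeoutError:
            dropped[client_id] = "slow consumer"
        except Exception:
            dropped[client_id] = "send error"
    for client_id, reason in dropped.items():
        ws = clients.pop(client_id)
        try:
            # 1011 is the WebSocket "internal error" close code.
            await ws.close(code=1011, reason=reason)
        except Exception:
            # A half-torn-down socket must not break the broadcast path.
            pass
    return dropped
```

The explicit close frame gives the peer a definite signal to start its reconnect cycle instead of discovering the drop on its next send.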
  • audit F49: durable per-step onboarding completion marker Pre-fix the only signal that onboarding had completed was a sentinel file written when the user clicked Get Started. Step-level state was not persisted; a user who re-opened the wizard to fix one permission and clicked Get Started again had no record of which specific steps they actually finished, and a crash mid-wizard lost all progress. Add OnboardingState dataclass + onboarding_state_path() under <config_dir>/onboarding_state.json. Each step in ONBOARDING_STEPS (camera, accessibility, llm_backend, extensions) can be marked complete or incomplete; every mutation is persisted via atomic_write_json so a crash between mutation and rename does not corrupt the prior on-disk file. OnboardingWindow loads the state on construction (resume-friendly) and marks every step complete on the Get Started click before re-emitting completed. Public mark_step_complete / mark_step_incomplete hooks let individual affordances (permission grants, BYOK save) record progress independently of the final click. cortex/libs/utils/atomic_write.py introduced as the audit-F02 helper this finding depends on. Six test cases: full completion → marker present; back-then-forward preserves other steps; partial → marker absent; atomic write under simulated os.replace crash preserves prior file; missing file → fresh state; unknown step id raises.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    86059ab
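
The commit does not show atomic_write_json itself; a common way to get the stated guarantee (a crash before the rename leaves the prior file intact) is to write a temporary file in the same directory and then call os.replace. The following is a sketch under that assumption, not the repository's helper.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to `path` so readers see either the old or the new file, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file; the previous `path` contents are untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If os.replace raises, the exception propagates to the caller and the earlier on-disk state remains readable.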
  • audit F28: include prompt-template version in LLM cache key cache._make_key keyed on context + state + constraints only, so editing a template body (or SYSTEM_PROMPT) served plans generated by the prior text for up to cache_ttl_seconds. The cache was technically correct for a fixed template but invisibly stale across maintainer edits. cortex/services/llm_engine/prompts.py: new PROMPT_TEMPLATE_VERSION string — the first 12 hex chars of sha256(SYSTEM_PROMPT + sorted template-name/body pairs). The sort keeps the fingerprint stable across import-order refactors (e.g. a future decorator-based registry). The helper that computes it (_compute_prompt_template_version) is exposed so tests can simulate a template edit + restart. cortex/services/llm_engine/cache.py: LLMCache._context_key folds PROMPT_TEMPLATE_VERSION into the key payload. Lazy-imports prompts to avoid the import cycle (prompts imports schemas; cache also imports schemas). cortex/tests/unit/test_cache_template_version.py: 4 cases — same version hits the cache, editing a template body invalidates the cached plan on the next lookup, editing SYSTEM_PROMPT also invalidates, and the fingerprint is stable across import order. All fail on 36cc15f (no PROMPT_TEMPLATE_VERSION export; cache key omits it).

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    4d8663f
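
A sketch of the fingerprint F28 describes: the first 12 hex characters of a SHA-256 over the system prompt plus the sorted template name/body pairs, folded into the cache key. The NUL separators, the function names, and the JSON key payload are assumptions; the commit does not specify how the parts are joined.

```python
import hashlib
import json


def compute_prompt_template_version(system_prompt: str, templates: dict) -> str:
    """First 12 hex chars of sha256 over the system prompt and sorted templates."""
    digest = hashlib.sha256()
    digest.update(system_prompt.encode("utf-8"))
    # Sorting makes the fingerprint independent of registration/import order.
    for name, body in sorted(templates.items()):
        digest.update(b"\x00" + name.encode("utf-8") + b"\x00" + body.encode("utf-8"))
    return digest.hexdigest()[:12]


def make_cache_key(context: dict, template_version: str) -> str:
    """Fold the template version into the key so a template edit invalidates old entries."""
    payload = json.dumps(
        {"context": context, "template_version": template_version}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Editing any template body or the system prompt changes the version, and therefore every key derived from it, so stale plans are simply never looked up again.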
  • audit F29: context-truncation telemetry + UI affordance Pre-fix, ``_truncate_section`` silently dropped content to fit the token budget. A user pasting a 200-line traceback would unknowingly ship only the first 10 lines to the LLM and wonder why the plan ignored the actual error on line 187. The drop was invisible to logs and to the user. Failure mode closed: silent loss of user-relevant prompt context with no telemetry and no UI affordance. Fix has three pieces: 1. ``_truncate_section`` now returns ``(text, dropped_chars)``. Every caller in ``_enforce_token_budget`` records the byte loss under a canonical section name (``terminal_errors``, ``tab_titles``, ``code``, ``final_overflow``). 2. New ``TruncationReport`` aggregates per-section losses across the three truncation passes plus the final hard-cap. A thread-local buffer + ``capture_truncation_report()`` context manager let the planner scope a report across a single ``build_anthropic_messages`` call without threading new arguments through every helper. When truncation actually occurred, the helper emits exactly one ``EventType.CONTEXT_TRUNCATED`` log line with the bound F19 correlation id, the original/truncated token counts, and the comma-joined section list. The happy path stays log-silent. 3. ``InterventionPlan.metadata`` gains a free-form dict (default empty so existing plans are wire-compatible). The Anthropic planner stamps ``metadata["context_truncated_sections"]`` after enrichment when the captured report saw drops. The overlay surfaces a "Show more context" affordance label below the causal explanation iff that field is populated; the label quotes the dominant section name back to the user. Reused, not reinvented: ``get_correlation_id()`` and ``EventType`` from F19/F10. The thread-local capture pattern follows the same ``contextvars`` ergonomics as ``correlation_scope``. Test: cortex/tests/unit/test_context_truncation.py — 7 cases. (1) No truncation → no event, no flag. 
(2) Single section trimmed → exactly one event emitted + ``terminal_errors`` recorded. (3) Multiple sections trimmed → list contains every trimmed name. (4) ``_truncate_section`` returns a non-zero ``dropped_chars`` and the truncated text quotes the line count. (5) Token-count math: ``truncated_tokens <= original_tokens`` and both are positive on truncation. (6) ``InterventionPlan.metadata`` round-trips through Pydantic's ``model_dump_json`` / ``model_validate_json`` with the truncated-sections list intact; defaults to empty for plans that didn't truncate. (7) The overlay affordance is hidden when metadata is absent / empty / empty-list and visible when populated; the visible label text quotes the section name. All seven fail on main (commit 36cc15f) because ``_truncate_section`` has no return-shape contract for drops, ``InterventionPlan`` has no ``metadata`` field, the overlay has no truncation label, and the EventType entry doesn't exist.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    bbb75b8
  • audit F07b+F08b: extension presents capability token on stop chain

    Closes the deferred extension half of F07 (WS SHUTDOWN auth) and F08 (launcher /stop auth).

    New `cortex/apps/browser_extension/lib/auth.ts` exports `getAuthToken()` which fetches the token from the native host on first need (`{command: get_auth_token}`), caches it in `chrome.storage.session`, and shares an in-flight latch so concurrent callers fan in to one native-host round trip.

    STOP_CORTEX now:
    - Step 1 SHUTDOWN WS frame: payload.auth_token = <token>
    - Step 3 fetch /shutdown: header X-Cortex-Auth-Token: <token>
    - Step 6 fetch /stop: header X-Cortex-Auth-Token: <token>

    A token-fetch failure is non-fatal; Steps 2/4/5 still complete the kill chain.

    Test: __tests__/f07b_f08b_auth.spec.ts asserts native host is called once, subsequent reads hit cache, and SHUTDOWN/stop carry the token.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    aff1e1e
  • audit F05: apply_intervention waits for client confirmation The pre-F05 _OptimisticInterventionAdapter returned True unconditionally from execute(), meaning Mutation.success was always reported as success and the session report could not distinguish a partial / failed extension apply from a clean one. New plumbing in cortex/services/runtime_daemon.py: - await_apply_confirmation(intervention_id, timeout_seconds=30) returns an InterventionApplyResult future. The future is resolved by the WS INTERVENTION_APPLIED handler (real ack) or by a background timeout watcher (confirmed=False, timed_out=True). Guaranteed to resolve exactly once. - _spawn_background_task helper tracks watcher tasks in self._background_tasks so stop() can cancel them cleanly. - _handle_intervention_applied now resolves the apply-phase future before its existing executor mutation-reconcile path runs. - stop() drains background tasks and resolves any still-pending future to confirmed=False so awaiters never hang on a daemon restart. The apply_intervention route in cortex/services/api_gateway/routes.py accepts an await_confirmation query flag (default True) and surfaces the real outcome via InterventionApplyResponse.confirmation. The correlation_id is taken from X-Cortex-Request-ID so the response matches the F19 correlation pattern. Callers that want non-blocking 202-style semantics pass await_confirmation=False and poll later using the returned correlation_id. New schema InterventionApplyResult in cortex/libs/schemas/intervention.py carries intervention_id, correlation_id, confirmed, timed_out, applied_actions, errors, and phase. The daemon registers itself in the service registry under "daemon" so the route can call into it. 
Test cortex/tests/integration/test_apply_intervention_confirmation.py exercises five cases via a minimal-daemon mixin that re-binds the production methods: apply+ack -> confirmed=True; no ack within timeout -> confirmed=False with timed_out=True; partial ack -> per-action breakdown; daemon restart loses in-flight + next ack is a no-op; future resolved exactly once when ack and watcher race. All five fail on main (the schema, method, and helper do not exist); all five pass on this branch.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    29346ed
  • audit F19b: correlation IDs in browser extension Closes the deferred extension half of F19. New `cortex/apps/browser_extension/lib/correlation.ts` exports `newCorrelationId()` (returns `cid_<12 hex chars>`), `isCorrelationId()` for narrowing, and a `withCorrelationId` listener wrapper. popup.tsx and newtab.tsx mint a cid on every user-initiated click and attach it to the outbound `chrome.runtime.sendMessage` payload. background.ts logs the cid on receive (`cortex.bg.recv cid=...`) and threads it onto outbound WS frames: USER_ACTION (already done by F16) plus SHUTDOWN. Daemon-side logging (added in F19) now correlates clicks across all four layers. Test: __tests__/f19b_correlation.spec.ts asserts the cid shape and that a popup-supplied cid lands on the outbound WS USER_ACTION frame untouched.

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    afa80a6
  • merge: Wave 1-A (audit F12+F14+F37+F38+F39) — security/scripts cluster

    @StevenWang-CY StevenWang-CY committed May 18, 2026
    debb297