audit-w2: defang prompt-injection wrappers used by build_user_prompt
F09 wrapped each interpolated user-controlled value in a tag-distinct
delimiter (WORKSPACE_CONTEXT / CONSTRAINTS / USER_GOAL / EXTRA_CONTEXT)
but sanitize_prompt_text only defanged the legacy USER_CONTENT tag and
the role-marker family. A tab title containing </WORKSPACE_CONTEXT>
prematurely closed the data envelope and the model interpreted
subsequent bytes as instructions. Broaden the case-insensitive regex
defang to cover every wrapper build_user_prompt actually uses; plain
prose without angle brackets survives untouched.
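A minimal sketch of the broadened defang, assuming a bracketed replacement form; the function name defang_wrappers and the replacement syntax are hypothetical, only the tag names come from this message:

```python
import re

# Wrapper names from the commit message; USER_CONTENT is the legacy tag.
WRAPPER_TAGS = (
    "WORKSPACE_CONTEXT", "CONSTRAINTS", "USER_GOAL", "EXTRA_CONTEXT", "USER_CONTENT",
)

# Opening or closing wrapper tag, case-insensitive, tolerating inner whitespace.
_WRAPPER_RE = re.compile(
    r"<\s*(/?)\s*(" + "|".join(WRAPPER_TAGS) + r")\s*>",
    re.IGNORECASE,
)


def defang_wrappers(text: str) -> str:
    """Rewrite wrapper tags into a bracketed form (hypothetical replacement syntax)."""
    return _WRAPPER_RE.sub(lambda m: f"[{m.group(1)}{m.group(2).upper()}]", text)
```

Text without a matching tag passes through unchanged, which is the property the last sentence above relies on.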
67125da
audit Debt-1: generator infrastructure (Pydantic → TypeScript codegen)
Adds the systemic close-out for Architectural Debt #1: hand-written
TypeScript interfaces in cortex/apps/browser_extension/ that have
drifted from the Pydantic models in cortex/libs/schemas/ (F42/F43/F44/
F45). Generator script + initial generated file + drift-detection
test suite.
Pieces:
- cortex/scripts/generate_ts_schemas.py walks every cortex.libs.schemas
submodule via pkgutil, imports them so every Pydantic class
registers, then feeds the discovered models through
pydantic-to-typescript (imported as pydantic2ts) which shells out
to json2ts (json-schema-to-typescript). Emits a single
cortex_schemas.d.ts under apps/browser_extension/types/generated/
with an AUTOGENERATED header naming the regeneration command.
--check mode regenerates into a tempfile, diffs against the
committed copy, prints the diff on drift, and exits non-zero so
the pre-commit hook (Commit 5) and CI gate fail loudly.
- cortex/pyproject.toml gains a 'codegen' optional-dependencies
extra and pulls pydantic-to-typescript into the existing 'dev'
extra so `pip install -e ./cortex[dev]` picks it up.
- cortex/tests/unit/test_schema_codegen.py — 12 cases. Module
discovery covers every schemas/*.py; banner stripping is
idempotent and tolerant of pydantic2ts version drift; drift
detection exits 1 with a unified diff; missing committed file is
treated as drift (not a generator failure). One end-to-end
case is skipped when json2ts is not on PATH so the suite still
passes on minimal CI matrices.
Verification:
python -m cortex.scripts.generate_ts_schemas --check # exit 0
pytest cortex/tests/unit/test_schema_codegen.py # 12 passed
The generated cortex_schemas.d.ts already includes:
- SuggestedAction.action_type as a Literal union (F42 closes here once
the extension imports it in Commit 4)
- SuggestedAction.catalog_id (F43 closes here)
- SuggestedAction.reversible (F44 closes here once the extension
drops its 'undo_available' alias in Commit 4)
WSMessage + MessageType land in Commit 2; the regenerated file
covering them lands in Commit 3; extension migration in Commit 4;
CI + pre-commit gate in Commit 5.
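The --check behaviour described above could look roughly like this. It is a sketch, not the script's actual code: check_drift and its parameters are hypothetical, and the real generator reads the committed file from disk rather than taking its text as an argument.

```python
import difflib
from typing import Optional


def check_drift(committed: Optional[str], regenerated: str,
                label: str = "cortex_schemas.d.ts") -> int:
    """Return 0 when the committed text matches, 1 on drift (printing a diff)."""
    # A missing committed file (None) counts as drift, not as a generator failure.
    current = committed if committed is not None else ""
    if committed is not None and current == regenerated:
        return 0
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"committed/{label}",
        tofile=f"regenerated/{label}",
    )
    print("".join(diff))
    return 1
```

A caller would pass the return value to sys.exit so the pre-commit hook and CI gate see the non-zero status.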
bb847cb
audit-w2: route remaining raw-int spacing through tokens
overlay.py inlined (24, 24, 24, 24) for the main HUD margin; onboarding.py
inlined setSpacing(8) for the progress-step row. Both values match
existing tokens (SP6 and SP2 respectively). Promoting them keeps the
4pt grid the single source of truth for layout rhythm so a future tokens
edit can shift macroscopic spacing without grepping for stray integers.
The remaining raw integers in dashboard.py (sub-token paddings like 3px
inner pill padding, 2px column gap, etc.) are intentional sub-4pt
fine-tuning below the token granularity. Verified the F47
overlay-tokens + F55 dashboard tests stay green.
c90d382
audit-w2: regression-guard for native window chrome coverage
Five top-level windows (DashboardWindow, SettingsDialog, OnboardingWindow,
OverlayWindow, ConnectionsPanel) already invoke apply_unified_titlebar +
apply_vibrancy in their showEvent. A future window class that forgets the
call would silently inherit Qt's default opaque titlebar — visually
divergent from the rest of the shell.
Adds cortex/tests/unit/test_window_chrome_coverage.py — 10 parameterised
ast-based cases that pin every top-level window to its required mac_native
calls. No production code change; pure regression guard.
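One way such an ast-based guard can be written, as a sketch only: the helper name and the sample classes are hypothetical, and the real test presumably parses the desktop-shell source files instead of an inline string.

```python
import ast

REQUIRED_CALLS = {"apply_unified_titlebar", "apply_vibrancy"}

SAMPLE = '''
class GoodWindow:
    def showEvent(self, event):
        apply_unified_titlebar(self)
        mac_native.apply_vibrancy(self)

class BadWindow:
    def showEvent(self, event):
        apply_unified_titlebar(self)
'''


def chrome_calls_in_show_event(source: str, class_name: str) -> set:
    """Return the required mac_native calls found inside class_name.showEvent."""
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.ClassDef) and node.name == class_name):
            continue
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "showEvent":
                found = set()
                for call in ast.walk(item):
                    if isinstance(call, ast.Call):
                        func = call.func
                        # Bare name (foo(...)) or attribute access (mod.foo(...)).
                        found.add(getattr(func, "id", None) or getattr(func, "attr", None))
                return found & REQUIRED_CALLS
    return set()
```

A parameterised test would then assert that the returned set equals REQUIRED_CALLS for every top-level window class.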
84a58f4
audit-w2: route timeline font-family through FONT_MONO token
dashboard.py's timeline panel inlined a font-family stack literal
('"SF Mono", ui-monospace, Menlo, monospace') instead of importing
the FONT_MONO token. Replaced the literal with FONT_MONO so a future
edit to the mono-font stack in tokens.yaml propagates here. No visual
change — the existing literal happened to match the token verbatim.
d661a38
audit-w2: promote warm label tints to tokens, lift sub-AA tertiary
connections, settings, onboarding kept private copies of '#5C5854' and
'#827971' for the secondary/tertiary warm-greyscale tints. The '#827971'
tertiary fails WCAG AA on the cream background (3.98:1, under the 4.5:1
threshold) — F55 fixed it in dashboard.py but the other three surfaces
drifted. This commit promotes both tints to CX_TEXT_SECONDARY /
CX_TEXT_TERTIARY in the token registry (tokens.yaml emitter +
generated tokens.py + browser-extension design-tokens.ts), pins the
AA-passing '#6B6661' value, and updates every consumer to import from
tokens rather than carry a private hex literal.
Adds cortex/tests/unit/test_token_label_consistency.py with 9 cases:
the registry value is '#6B6661', no surface contains the legacy hex
literally, and every surface's _LABEL_TERTIARY/_LABEL_SECONDARY equals
the token at import time. Existing F47 + F55 + overlay-token tests
remain green.
9c7c32b
test: rewrite F24 async-ladder priming with asyncio.run
F24 made ConsentLadder.check / record_approval async (they now take a
lock). The test_get_consent_level_with_ladder and
test_reset_consent_with_ladder tests were still using
asyncio.get_event_loop().run_until_complete(), which is deprecated
outside a running loop since Python 3.10 and raises RuntimeError once
no current event loop is set (for example after an earlier
asyncio.run() call in the same thread). Switched to asyncio.run() / an
async helper.
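The priming pattern, sketched against a stand-in class; StubLadder and prime_ladder are hypothetical and only mirror the async method names mentioned above:

```python
import asyncio


class StubLadder:
    """Hypothetical stand-in for ConsentLadder with async check / record_approval."""

    def __init__(self) -> None:
        self.approvals = 0

    async def record_approval(self) -> None:
        await asyncio.sleep(0)  # yield once, as a real lock acquisition might
        self.approvals += 1

    async def check(self) -> bool:
        await asyncio.sleep(0)
        return self.approvals > 0


def prime_ladder(ladder: StubLadder, times: int) -> bool:
    """Drive the async API from a synchronous test via asyncio.run()."""

    async def _prime() -> bool:
        for _ in range(times):
            await ladder.record_approval()
        return await ladder.check()

    # asyncio.run creates and closes its own loop, so no ambient loop is needed.
    return asyncio.run(_prime())
```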
ec6fc78
merge: Wave 1-F (audit F31+F33+F35+F36+F47+F48+F49+F51+F53+F55+F56) — maintainability cluster
# Conflicts:
# cortex/apps/desktop_shell/dashboard.py
# cortex/apps/desktop_shell/overlay.py
# cortex/libs/utils/atomic_write.py
# cortex/services/runtime_daemon.py
485742d
audit C-tier follow-up: harness compat for the legacy PySide6 mock suite
The Phase-C commits (F31, F33, F47, F48, F51, F53, F55) introduced new
attribute calls on dashboard / overlay widgets and new tests that
require the real PySide6 library. They passed in isolation but failed
when run inside the full unit suite, for three reasons:
* test_desktop_shell.py installs lightweight mock PySide6 modules in
sys.modules at import
time and never restores the real ones (so a test that runs AFTER
test_desktop_shell is collected sees the mocks even if it was
imported earlier alphabetically);
* the mocks (MockQLineEdit, MockQApplication) only expose a subset
of the QWidget API the audit fixes call (setAccessibleName,
accessibleName, setText on QToolButton, QApplication.instance);
* the real PySide6 C extension cannot be re-imported within a
single process after sys.modules['PySide6'] is deleted — the C
state from the first load conflicts with the second load and
segfaults.
Two-pronged fix:
(a) Desktop-shell helpers degrade defensively against the lightweight
mock surface. _set_accessible_name / _set_accessible_description /
_set_tab_order / _safe_call wrappers no-op when the target method
is missing on the stub. QToolButton import in overlay.py falls
back to QPushButton when the stub lacks it.
(b) The new audit Qt tests detect when PySide6 has been replaced by
the mock (heuristic: missing __file__ attribute on the cached
module) and skip with a clear reason rather than crashing. The
skip happens in an autouse fixture + a re-check inside qapp so
fixture-ordering quirks cannot bypass it. Tests still pass cleanly
in isolation.
Net effect: the full pytest cortex/tests/unit/ --ignore=test_capture_service
run reports 928 passed, 26 skipped, 0 failures.
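Both prongs can be sketched in a few lines. The helper names below are hypothetical (the commit's own wrappers are _safe_call and friends), and the __file__ check is the heuristic described in (b), not a guaranteed mock detector:

```python
import sys
import types


def looks_like_mock(module_name: str) -> bool:
    """Heuristic from (b): a cached module with no __file__ is treated as a stub."""
    cached = sys.modules.get(module_name)
    return cached is not None and getattr(cached, "__file__", None) is None


def safe_call(target, method_name: str, *args):
    """Prong (a): call target.method_name(*args) if present, otherwise no-op."""
    method = getattr(target, method_name, None)
    if callable(method):
        return method(*args)
    return None


# Demo stub registered under a made-up name so real imports are untouched.
sys.modules["pyside6_demo_stub"] = types.ModuleType("pyside6_demo_stub")
```

In a test module the first helper would typically feed pytest.skip inside an autouse fixture.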
ad8023d
merge: Wave 1-G (audit F40+F16+F15+F19b+F07b+F08b+F32+F46+F50+F52+F54) — TS test infra + extension wiring
6d25329
audit F56: register SIGINT/SIGTERM via loop.add_signal_handler
Pre-fix, the daemon-side shutdown chain relied entirely on the outer
harness (run_dev.py) to register signal handlers. If the daemon was
launched without that harness (desktop-shell in-process mode, future
tests, future CLI entry points), nothing was wired to SIGINT/SIGTERM
at all — or worse, a caller might add a signal.signal handler before
asyncio.run started. signal.signal registers a C-level handler that
the kernel invokes in the signal frame, which on Cortex is almost
always inside numpy / mediapipe / OpenCV native code. Running Python
in that frame violates the GIL contract and can segfault on resume.
CortexDaemon.start() now calls _install_loop_signal_handlers() before
spawning the loop tasks. That helper uses loop.add_signal_handler so
the callback (_on_signal_received) is dispatched as a regular
event-loop tick — Python state is frame-safe when the handler runs.
On platforms that don't support add_signal_handler (Windows,
embedded), the helper logs the failure and continues, and the outer
harness retains responsibility.
Two test cases: SIGTERM during a stub numpy-style tight loop sets
_shutdown without segfaulting AND the handler runs inside the
asyncio loop (asyncio.get_running_loop() succeeds in the captured
context); unsupported-platform stub raises NotImplementedError and
the helper logs+continues rather than crashing.
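A sketch of the helper's shape under stated assumptions: the function name, the boolean return value, and the logger name are hypothetical; loop.add_signal_handler and the NotImplementedError on unsupported platforms are standard asyncio behaviour.

```python
import logging
import signal

logger = logging.getLogger("cortex.daemon.demo")  # hypothetical logger name


def install_loop_signal_handlers(loop, on_signal) -> bool:
    """Register SIGINT/SIGTERM on the event loop; return False if unsupported."""
    try:
        for sig in (signal.SIGINT, signal.SIGTERM):
            # The callback runs as an ordinary event-loop callback, not in
            # the signal frame.
            loop.add_signal_handler(sig, on_signal, sig)
    except NotImplementedError:
        logger.warning("loop.add_signal_handler unsupported; relying on outer harness")
        return False
    return True
```

Inside the daemon this would be called with asyncio.get_running_loop() before the loop tasks are spawned.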
c95541e
audit F54: connections panel distinguishes four failure states
Replaces the single "Not connected" disconnect screen with four
distinct connectivity states, each with its own title, diagnostic
body, and fix-action CTA:
- not_installed: native messaging host missing
- installed_no_daemon: host present but daemon WS unreachable
- installed_version_mismatch: daemon up, version differs from extension
- handshake_failed: WS open, daemon rejected handshake
New `classifyConnectivity` pure function (exported) computes the
state from {connected, nativeHostStatus, daemonVersion,
expectedVersion, handshakeError}. New CONNECTIVITY_DIAGNOSTIC
message type lets background push the resolved diagnostic.
The diagnostic block now renders whenever `connectivity !== 'ok'`
rather than only when `!connected`, because installed_version_mismatch
and handshake_failed both occur while the WS is technically connected.
Test: __tests__/f54_connectivity_states.spec.tsx covers all 6 enum
inputs and asserts the popup renders distinct
`[data-testid=conn-state-<state>]` titles for each failure mode.
df18591
audit F55: accessible names, tab-order chain, WCAG AA tertiary contrast
Pre-fix:
* Several interactive widgets in Dashboard and Overlay lacked
setAccessibleName so VoiceOver / screen readers announced raw
ObjectClass names instead of the human label.
* No widget had setTabOrder wired explicitly; the tab chain depended
on construction order and a single re-arrangement could silently
scramble it.
* _LABEL_TERTIARY = '#827971' against _CONTROL_BG '#FFFFFF' computed
to ~3.98:1 — just below WCAG AA's 4.5:1 threshold for normal-weight
text. The role is 'tertiary captions / placeholders' so the volume
affected is high (every QLineEdit placeholder, every '--' debug
label).
Fix:
(a) setAccessibleName on goal QLineEdit ('Goal'), Connect button
('Open Connections panel'), overlay dismiss button
('Dismiss intervention'), causal-explanation toggle ('Show full
causal explanation'). setAccessibleDescription on goal input for
richer screen-reader context.
(b) QWidget.setTabOrder explicit chain — Goal → Connect → Stop in
_ConsumerTab; causal toggle → dismiss button in OverlayWindow.
(c) _LABEL_TERTIARY bumped from #827971 to #6B6661 (~5.4:1 against
white — comfortably above AA).
(d) HUD palette tokens (TEXT_HUD_PRIMARY etc.) — the F47 commit
already moved overlay text to tokens.py; this test pins the
alpha contract.
Five test cases, including a hand-rolled WCAG 2.1 contrast-ratio
helper (avoids adding wcag-contrast-ratio as a dep).
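A hand-rolled helper of this kind follows the WCAG 2.1 relative-luminance definition. The sketch below is illustrative; function names are hypothetical and it accepts only six-digit hex colours:

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG relative luminance of a '#RRGGBB' colour."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio (lighter + 0.05) / (darker + 0.05), always >= 1."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: str, background: str) -> bool:
    """AA threshold for normal-weight text is 4.5:1."""
    return contrast_ratio(foreground, background) >= 4.5
```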
5c0d3e3
merge: Wave 1-E (audit F06+F34+F04+F05+F22+F23) — UI race-condition cohort
# Conflicts:
# cortex/services/runtime_daemon.py
3ca51ef
audit F52: dedup tab-close affordance by tab_index
The previous synthesise rule was all-or-nothing: if any
suggested_action with action_type=close_tab existed, we skipped
synthesising entirely (dropping the close affordance for any other
recommended tab); otherwise we synthesised one action per closeable
rec — duplicating the close button when the LLM emitted both
suggested_actions and tab_recommendations for the same tab_index.
F52 makes the dedup per-tab_index: synthesise only for tab indices
not already covered by an existing close-style suggested_action,
so the tab card carries the single close button.
Applied identically in background.ts (intervention overlay HTML
build) and popup.tsx (popup action list). `synthesizeActions` is
now exported so it can be unit-tested.
Test: __tests__/f52_tab_dedup.spec.ts covers covered/uncovered
mixes, empty rec short-circuit, and full synthesis when no
close-style suggested_action exists.
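The per-tab_index rule, sketched in Python rather than the extension's TypeScript. The dict shapes and the closeable / synthesized field names are assumptions made for illustration:

```python
def synthesize_actions(suggested_actions, tab_recommendations):
    """Synthesise close actions only for tab indices not already covered."""
    covered = {
        action["tab_index"]
        for action in suggested_actions
        if action.get("action_type") == "close_tab" and "tab_index" in action
    }
    return [
        {"action_type": "close_tab", "tab_index": rec["tab_index"], "synthesized": True}
        for rec in tab_recommendations
        # 'closeable' is a hypothetical flag marking recs that allow closing.
        if rec.get("closeable") and rec["tab_index"] not in covered
    ]
```

With an existing close_tab action for tab 2 and closeable recommendations for tabs 2 and 5, only tab 5 gets a synthesised action.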
eb92a1b
audit F53: surface QSettings sync() failures to the controller
Pre-fix, SettingsDialog._persist_settings wrapped self._qs.sync() in a
bare except: pass. A sandbox container with a revoked ACL, a read-only
filesystem, or a disk-full condition all manifested as the Apply
button succeeding from the user's perspective while nothing actually
persisted.
Add a settings_save_failed(str) Signal. The new _persist_settings:
(a) catches sync() exceptions and emits with the reason; (b) inspects
QSettings.status() afterwards and emits if anything other than
NoError. _describe_qsettings_status maps AccessError / FormatError to
human-readable strings so the controller can surface the failure to
the user without exposing Qt enum integers.
Three test cases: happy path emits no signal; sync() exception emits
once with the OSError text; status==AccessError emits once with an
'access denied' reason.
8b264fb
audit F50: stabilise popup runtime.onMessage listener identity
Extracts the popup's message handler into a `useCallback(…, [])` so
the addListener/removeListener pair refers to the same reference
across re-renders. The original code inlined a fresh closure inside
the useEffect; React's stable setState identities kept things working
in practice, but the contract was easy to break (e.g. closing over
mutable state would have leaked listeners).
Test: __tests__/f50_popup_listener_leak.spec.tsx mounts and unmounts
the same pattern 10x and asserts the listener count returns to its
pre-mount baseline every cycle.
Also adds `IS_REACT_ACT_ENVIRONMENT = true` in test setup so the
React 18 act() warning stops spamming the console for any tsx
component test.
9298f59
audit F23: cancel pending correlation-id futures on client disconnect
WebSocketServer.request_context registered a future keyed by
correlation_id; the matching CONTEXT_RESPONSE resolved it. Pre-F23, if
the requesting client disconnected before responding, the future hung
until the per-call timeout (default 5 s), wedging the calling
coroutine in the daemon's context loop for every concurrent request.
New plumbing:
- self._pending_cids_by_client: dict[client_id, set[correlation_id]]
populated in request_context, kept tight by _drop_pending_cid /
_handle_context_response so it only ever holds in-flight cids.
- _cancel_pending_for_client(client_id) cancels every pending future
for that client, called from _handle_client's finally block when
the connection drops. Returns the count for the debug log.
- request_context now catches asyncio.CancelledError separately so a
cancellation triggered by disconnect returns the empty-dict fallback
rather than propagating up into context-loop code that isn't ready
for the cancellation.
Test cortex/tests/unit/test_pending_context_cleanup.py exercises four
cases: disconnect with no pending is a no-op; disconnect with two
pending futures cancels both (returning {} so callers don't hang);
reconnect with the same client_id starts with a fresh cid; concurrent
disconnect + CONTEXT_RESPONSE leaves the resolved future intact and
the disconnect is a no-op for that cid. All four fail on main
(_pending_cids_by_client and _cancel_pending_for_client do not exist);
all four pass on this branch.
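The bookkeeping can be sketched as follows. Class and method names are hypothetical stand-ins for the WebSocketServer internals, and the sketch deliberately swallows CancelledError the way the message describes, which also hides cancellation of the calling task:

```python
import asyncio


class PendingContextRequests:
    def __init__(self) -> None:
        self._futures = {}         # correlation_id -> Future
        self._cids_by_client = {}  # client_id -> set of in-flight cids

    def register(self, client_id, cid):
        fut = asyncio.get_running_loop().create_future()
        self._futures[cid] = fut
        self._cids_by_client.setdefault(client_id, set()).add(cid)
        return fut

    def drop(self, client_id, cid) -> None:
        self._futures.pop(cid, None)
        cids = self._cids_by_client.get(client_id)
        if cids is not None:
            cids.discard(cid)
            if not cids:
                del self._cids_by_client[client_id]

    def cancel_for_client(self, client_id) -> int:
        """Cancel every pending future for a disconnected client."""
        cancelled = 0
        for cid in self._cids_by_client.pop(client_id, set()):
            fut = self._futures.pop(cid, None)
            if fut is not None and not fut.done():
                fut.cancel()
                cancelled += 1
        return cancelled


async def request_context(pending, client_id, cid, timeout=5.0):
    fut = pending.register(client_id, cid)
    try:
        return await asyncio.wait_for(fut, timeout)
    except (asyncio.CancelledError, asyncio.TimeoutError):
        return {}  # empty-dict fallback on disconnect or timeout
    finally:
        pending.drop(client_id, cid)


async def _demo():
    pending = PendingContextRequests()
    first = asyncio.create_task(request_context(pending, "ext-1", "cid_a"))
    second = asyncio.create_task(request_context(pending, "ext-1", "cid_b"))
    await asyncio.sleep(0)  # let both requests register
    count = pending.cancel_for_client("ext-1")
    return count, await first, await second


DEMO_RESULT = asyncio.run(_demo())
```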
9c889bf
audit F51: causal-explanation truncation indicator + Show more toggle
Long causal-explanation strings used to be dumped into the overlay
verbatim — overflowing the HUD card, pushing the breathing pacer and
dismiss button below the fold, and giving the user no affordance to
scan a one-line summary first.
Truncate to a preview of _CAUSAL_TRUNCATE_THRESHOLD (180) characters
with a trailing ellipsis when the text exceeds the threshold; surface
a checkable QToolButton 'Show more' that toggles between preview and
full. The full + preview strings are stashed on the OverlayWindow so
the toggle handler can swap without re-parsing the payload.
Four test cases: short text → no ellipsis; long text → ellipsis +
toggle visible; click toggle → expanded shown + button label flips to
Show less; _hide_causal_explanation resets cached strings + toggle
state so a subsequent show isn't contaminated by stale state.
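The preview computation itself is small. A sketch, with a hypothetical function name and return shape; only the threshold value comes from this message:

```python
CAUSAL_TRUNCATE_THRESHOLD = 180


def causal_preview(text: str, threshold: int = CAUSAL_TRUNCATE_THRESHOLD):
    """Return (preview, truncated); the toggle is only needed when truncated."""
    if len(text) <= threshold:
        return text, False
    # Trim trailing whitespace so the ellipsis sits against the last word.
    return text[:threshold].rstrip() + "…", True
```

The overlay would keep both the full text and the preview so the toggle can swap them without re-parsing.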
50cdf94
audit F46: DEBUG flag becomes env-driven + runtime override
Replaces the hard-coded `const DEBUG = false` with a layered
resolver:
1. Build-time env: `import.meta.env.CORTEX_DEBUG === 'true'` or
`process.env.CORTEX_DEBUG === 'true'` (covers Plasmo, vitest,
and Node test contexts).
2. Runtime override via `chrome.storage.local.cortex_debug`. A
`storage.onChanged` listener flips DEBUG immediately so an
in-field debug session needs no reload.
Setting `cortex_debug` back to false falls through to the build-time
env value rather than locking on. New `_getDebugFlag()` export lets
tests assert the resolved state.
Test: __tests__/f46_debug_flag.spec.ts covers default-off, env-on,
and runtime-flip in both directions.
79ca532
audit F32: WS reconnect backoff resets on every successful open
Introduces `INITIAL_RECONNECT_DELAY = 3000` and re-uses it in
`ws.onopen` so a backoff that drifted up to 30s during a flaky
period returns to 3s the moment a connection succeeds. Without
this, the next transient drop after a long disconnect cycle still
waited 30s — actively worsening the post-recovery experience.
Test: __tests__/f32_reconnect_backoff.spec.ts asserts that after
`__remoteClose` doubles the delay above INITIAL, the new
auto-opened socket resets it back to INITIAL.
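The reset rule, sketched in Python instead of the extension's TypeScript. The class and the cap constant name are hypothetical; the 3 s initial delay, doubling, and 30 s ceiling are taken from the text above:

```python
INITIAL_RECONNECT_DELAY_MS = 3000
MAX_RECONNECT_DELAY_MS = 30000  # hypothetical name for the 30 s ceiling


class ReconnectBackoff:
    def __init__(self) -> None:
        self.delay_ms = INITIAL_RECONNECT_DELAY_MS

    def on_close(self) -> int:
        """Return the wait before the next attempt, then double up to the cap."""
        wait = self.delay_ms
        self.delay_ms = min(self.delay_ms * 2, MAX_RECONNECT_DELAY_MS)
        return wait

    def on_open(self) -> None:
        """A successful open always returns the delay to its initial value."""
        self.delay_ms = INITIAL_RECONNECT_DELAY_MS
```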
2cceafd
merge: Wave 1-B (audit F13+F18+F29) — API-gateway rate limit, degraded envelope, context-truncation telemetry
# Conflicts:
# cortex/libs/logging/structured.py
# cortex/libs/schemas/intervention.py
efa44da
audit F22: explicit close frame for slow WS consumers
WebSocketServer._broadcast used to silently drop a client whose send()
exceeded the 1 s timeout; the extension then saw an EPIPE on the next
send and had no clean signal to drive reconnect. The new path:
- Distinguishes timeout ("slow consumer") from generic send error
("send error") so logs can correlate cause and effect.
- Routes both through a new _close_slow_consumer helper that:
- Sends a close frame with code 1011 + the reason string. Wrapped in
try/except so a half-torn-down socket whose close() raises is
swallowed instead of cascading into the broadcast hot path.
- Emits a structured EventType.WS_CLIENT_DISCONNECTED event with
client_id, client_type, and reason so the launcher log lets
support correlate disconnects with extension reconnect cycles.
Test cortex/tests/unit/test_ws_slow_client.py exercises four cases via
a stub websocket that sleeps in send(): slow client gets close(1011,
"slow consumer") + is removed + emits ws_client_disconnected event
with the cid and reason; healthy peer is unaffected; reconnection
after the slow close works (same client_id, fresh socket); close on
an already-dead socket whose close() raises does not propagate. Three
of the four fail on main (the close-frame, the disconnect event, and
the _close_slow_consumer helper do not exist); all four pass on this
branch.
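A sketch of the send path with a stub socket. send_or_close and StubSocket are hypothetical names; the stub's close(code, reason) signature is an assumption, not a claim about any particular WebSocket library:

```python
import asyncio

SEND_TIMEOUT_SECONDS = 1.0
CLOSE_CODE = 1011


class StubSocket:
    """Hypothetical websocket stand-in that can be slow or fail on close."""

    def __init__(self, send_delay: float = 0.0, close_raises: bool = False) -> None:
        self.send_delay = send_delay
        self.close_raises = close_raises
        self.closed_with = None

    async def send(self, message) -> None:
        await asyncio.sleep(self.send_delay)

    async def close(self, code: int, reason: str) -> None:
        if self.close_raises:
            raise ConnectionError("socket already dead")
        self.closed_with = (code, reason)


async def send_or_close(ws, message, timeout: float = SEND_TIMEOUT_SECONDS):
    """Return None on success, otherwise the reason the client was closed."""
    try:
        await asyncio.wait_for(ws.send(message), timeout)
        return None
    except asyncio.TimeoutError:
        reason = "slow consumer"
    except Exception:
        reason = "send error"
    try:
        await ws.close(CLOSE_CODE, reason)
    except Exception:
        pass  # a half-torn-down socket must not break the broadcast loop
    return reason
```

The caller would remove the client and emit the disconnect event whenever a reason is returned.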
9fdc1ad
audit F49: durable per-step onboarding completion marker
Pre-fix, the only signal that onboarding had completed was a sentinel
file written when the user clicked Get Started. Step-level state was
not persisted; a user who re-opened the wizard to fix one permission
and clicked Get Started again had no record of which specific steps
they actually finished, and a crash mid-wizard lost all progress.
Add OnboardingState dataclass + onboarding_state_path() under
<config_dir>/onboarding_state.json. Each step in ONBOARDING_STEPS
(camera, accessibility, llm_backend, extensions) can be marked
complete or incomplete; every mutation is persisted via
atomic_write_json so a crash between mutation and rename does not
corrupt the prior on-disk file. OnboardingWindow loads the state on
construction (resume-friendly) and marks every step complete on
the Get Started click before re-emitting completed. Public
mark_step_complete / mark_step_incomplete hooks let individual
affordances (permission grants, BYOK save) record progress
independently of the final click.
cortex/libs/utils/atomic_write.py is introduced as the audit-F02 helper
this finding depends on. Six test cases: full completion → marker
present; back-then-forward preserves other steps; partial → marker
absent; atomic write under simulated os.replace crash preserves
prior file; missing file → fresh state; unknown step id raises.
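The write-then-rename technique behind such a helper, as a sketch. This is not the contents of atomic_write.py; the signature and the temp-file naming are assumptions:

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic on the same filesystem; until it succeeds the
        # prior file is untouched.
        os.replace(tmp_name, path)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before os.replace, the previous onboarding_state.json is still the one on disk.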
86059ab
audit F28: include prompt-template version in LLM cache key
cache._make_key keyed on context + state + constraints only, so editing
a template body (or SYSTEM_PROMPT) served plans generated by the prior
text for up to cache_ttl_seconds. The cache was technically correct
for a fixed template but invisibly stale across maintainer edits.
cortex/services/llm_engine/prompts.py: new PROMPT_TEMPLATE_VERSION
string — the first 12 hex chars of sha256(SYSTEM_PROMPT + sorted
template-name/body pairs). The sort keeps the fingerprint stable across
import-order refactors (e.g. a future decorator-based registry). The
helper that computes it (_compute_prompt_template_version) is exposed
so tests can simulate a template edit + restart.
cortex/services/llm_engine/cache.py: LLMCache._context_key folds
PROMPT_TEMPLATE_VERSION into the key payload. Lazy-imports prompts to
avoid the import cycle (prompts imports schemas; cache also imports
schemas).
cortex/tests/unit/test_cache_template_version.py: 4 cases — same
version hits the cache, editing a template body invalidates the cached
plan on the next lookup, editing SYSTEM_PROMPT also invalidates, and
the fingerprint is stable across import order. All fail on 36cc15f
(no PROMPT_TEMPLATE_VERSION export; cache key omits it).
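A fingerprint along these lines could be computed as below. This is a sketch: the NUL separators and the function signature are assumptions, and the real helper may serialise the pairs differently.

```python
import hashlib


def compute_prompt_template_version(system_prompt: str, templates: dict) -> str:
    """First 12 hex chars of sha256 over the system prompt and sorted templates."""
    digest = hashlib.sha256()
    digest.update(system_prompt.encode("utf-8"))
    # Sorting by name keeps the fingerprint independent of registration order.
    for name in sorted(templates):
        digest.update(b"\x00" + name.encode("utf-8"))
        digest.update(b"\x00" + templates[name].encode("utf-8"))
    return digest.hexdigest()[:12]
```

Folding the returned string into the cache-key payload means any template edit produces a different key after restart.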
4d8663f
audit F29: context-truncation telemetry + UI affordance
Pre-fix, ``_truncate_section`` silently dropped content to fit the
token budget. A user pasting a 200-line traceback would unknowingly
ship only the first 10 lines to the LLM and wonder why the plan
ignored the actual error on line 187. The drop was invisible to logs
and to the user.
Failure mode closed: silent loss of user-relevant prompt context with
no telemetry and no UI affordance.
Fix has three pieces:
1. ``_truncate_section`` now returns ``(text, dropped_chars)``. Every
caller in ``_enforce_token_budget`` records the character loss under
a canonical section name (``terminal_errors``, ``tab_titles``,
``code``, ``final_overflow``).
2. New ``TruncationReport`` aggregates per-section losses across the
three truncation passes plus the final hard-cap. A thread-local
buffer + ``capture_truncation_report()`` context manager let the
planner scope a report across a single ``build_anthropic_messages``
call without threading new arguments through every helper. When
truncation actually occurred, the helper emits exactly one
``EventType.CONTEXT_TRUNCATED`` log line with the bound F19
correlation id, the original/truncated token counts, and the
comma-joined section list. The happy path stays log-silent.
3. ``InterventionPlan.metadata`` gains a free-form dict (default empty
so existing plans are wire-compatible). The Anthropic planner
stamps ``metadata["context_truncated_sections"]`` after enrichment
when the captured report saw drops. The overlay surfaces a "Show
more context" affordance label below the causal explanation iff
that field is populated; the label quotes the dominant section
name back to the user.
Reused, not reinvented: ``get_correlation_id()`` and ``EventType``
from F19/F10. The thread-local capture pattern follows the same
``contextvars`` ergonomics as ``correlation_scope``.
Test: cortex/tests/unit/test_context_truncation.py — 7 cases. (1) No
truncation → no event, no flag. (2) Single section trimmed → exactly
one event emitted + ``terminal_errors`` recorded. (3) Multiple
sections trimmed → list contains every trimmed name. (4) ``_truncate_section``
returns a non-zero ``dropped_chars`` and the truncated text quotes the
line count. (5) Token-count math: ``truncated_tokens <= original_tokens``
and both are positive on truncation. (6) ``InterventionPlan.metadata``
round-trips through Pydantic's ``model_dump_json`` /
``model_validate_json`` with the truncated-sections list intact;
defaults to empty for plans that didn't truncate. (7) The overlay
affordance is hidden when metadata is absent / empty / empty-list and
visible when populated; the visible label text quotes the section
name. All seven fail on main (commit 36cc15f) because
``_truncate_section`` has no return-shape contract for drops,
``InterventionPlan`` has no ``metadata`` field, the overlay has no
truncation label, and the EventType entry doesn't exist.
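Pieces 1 and 2 can be sketched together. The code below uses a ContextVar in place of the thread-local buffer mentioned above, truncates by line count only, and omits the event emission; every signature here is an assumption for illustration:

```python
import contextvars
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TruncationReport:
    dropped: dict = field(default_factory=dict)  # section name -> dropped chars

    def record(self, section: str, dropped_chars: int) -> None:
        if dropped_chars > 0:
            self.dropped[section] = self.dropped.get(section, 0) + dropped_chars

    @property
    def sections(self) -> list:
        return sorted(self.dropped)


_current_report = contextvars.ContextVar("truncation_report", default=None)


@contextmanager
def capture_truncation_report():
    """Scope a report so helpers can record drops without extra arguments."""
    report = TruncationReport()
    token = _current_report.set(report)
    try:
        yield report
    finally:
        _current_report.reset(token)


def truncate_section(text: str, max_lines: int, section: str):
    """Return (text, dropped_chars); record the loss if a report is active."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text, 0
    kept = "\n".join(lines[:max_lines])
    dropped = len(text) - len(kept)
    report = _current_report.get()
    if report is not None:
        report.record(section, dropped)
    note = f"\n[... {len(lines) - max_lines} more lines truncated]"
    return kept + note, dropped
```

A planner would wrap one message-building call in capture_truncation_report() and stamp report.sections onto the plan metadata when it is non-empty.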
bbb75b8
audit F07b+F08b: extension presents capability token on stop chain
Closes the deferred extension half of F07 (WS SHUTDOWN auth) and F08
(launcher /stop auth). New `cortex/apps/browser_extension/lib/auth.ts`
exports `getAuthToken()` which fetches the token from the native host
on first need (`{command: get_auth_token}`), caches it in
`chrome.storage.session`, and shares an in-flight latch so concurrent
callers fan in to one native-host round trip.
STOP_CORTEX now:
- Step 1 SHUTDOWN WS frame: payload.auth_token = <token>
- Step 3 fetch /shutdown: header X-Cortex-Auth-Token: <token>
- Step 6 fetch /stop: header X-Cortex-Auth-Token: <token>
A token-fetch failure is non-fatal — Steps 2/4/5 still complete the
kill chain.
Test: __tests__/f07b_f08b_auth.spec.ts asserts native host is called
once, subsequent reads hit cache, and SHUTDOWN/stop carry the token.
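The in-flight latch is the interesting part of getAuthToken(). A Python asyncio sketch of the same idea (the extension itself is TypeScript; AuthTokenCache and the fetch callable are hypothetical):

```python
import asyncio


class AuthTokenCache:
    """Cache a token and share one in-flight fetch among concurrent callers."""

    def __init__(self, fetch) -> None:
        self._fetch = fetch    # async callable standing in for the native host
        self._token = None
        self._inflight = None  # shared task while a fetch is running

    async def get(self):
        if self._token is not None:
            return self._token
        if self._inflight is None:
            self._inflight = asyncio.ensure_future(self._fetch())
        try:
            self._token = await self._inflight
        finally:
            # Clear the latch so a failed fetch can be retried later.
            self._inflight = None
        return self._token


async def _demo():
    calls = 0

    async def fetch():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0)
        return "tok-demo"

    cache = AuthTokenCache(fetch)
    tokens = await asyncio.gather(cache.get(), cache.get(), cache.get())
    again = await cache.get()
    return calls, tokens, again


DEMO_RESULT = asyncio.run(_demo())
```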
aff1e1e
audit F05: apply_intervention waits for client confirmation
The pre-F05 _OptimisticInterventionAdapter returned True unconditionally
from execute(), meaning Mutation.success was always reported as success
and the session report could not distinguish a partial / failed
extension apply from a clean one.
New plumbing in cortex/services/runtime_daemon.py:
- await_apply_confirmation(intervention_id, timeout_seconds=30) returns
an InterventionApplyResult future. The future is resolved by the WS
INTERVENTION_APPLIED handler (real ack) or by a background timeout
watcher (confirmed=False, timed_out=True). Guaranteed to resolve
exactly once.
- _spawn_background_task helper tracks watcher tasks in
self._background_tasks so stop() can cancel them cleanly.
- _handle_intervention_applied now resolves the apply-phase future
before its existing executor mutation-reconcile path runs.
- stop() drains background tasks and resolves any still-pending future
to confirmed=False so awaiters never hang on a daemon restart.
The apply_intervention route in cortex/services/api_gateway/routes.py
accepts an await_confirmation query flag (default True) and surfaces
the real outcome via InterventionApplyResponse.confirmation. The
correlation_id is taken from X-Cortex-Request-ID so the response
matches the F19 correlation pattern. Callers that want non-blocking
202-style semantics pass await_confirmation=False and poll later
using the returned correlation_id.
New schema InterventionApplyResult in cortex/libs/schemas/intervention.py
carries intervention_id, correlation_id, confirmed, timed_out,
applied_actions, errors, and phase. The daemon registers itself in the
service registry under "daemon" so the route can call into it.
Test cortex/tests/integration/test_apply_intervention_confirmation.py
exercises five cases via a minimal-daemon mixin that re-binds the
production methods: apply+ack -> confirmed=True; no ack within timeout
-> confirmed=False with timed_out=True; partial ack -> per-action
breakdown; daemon restart loses in-flight + next ack is a no-op;
future resolved exactly once when ack and watcher race. All five fail
on main (the schema, method, and helper do not exist); all five pass
on this branch.
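The resolve-exactly-once guarantee can be sketched as below. This uses loop.call_later as the timeout watcher instead of the background task the daemon tracks, and returns plain dicts rather than InterventionApplyResult; names are otherwise hypothetical:

```python
import asyncio


class ApplyConfirmations:
    def __init__(self) -> None:
        self._pending = {}  # intervention_id -> (future, timer handle)

    def await_apply_confirmation(self, intervention_id, timeout_seconds=30.0):
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        handle = loop.call_later(
            timeout_seconds, self._resolve, intervention_id, False, True
        )
        self._pending[intervention_id] = (fut, handle)
        return fut

    def on_intervention_applied(self, intervention_id) -> None:
        """Called by the ack handler; a late or duplicate ack is a no-op."""
        self._resolve(intervention_id, True, False)

    def _resolve(self, intervention_id, confirmed: bool, timed_out: bool) -> None:
        entry = self._pending.pop(intervention_id, None)
        if entry is None:
            return  # already resolved by the other path
        fut, handle = entry
        handle.cancel()
        if not fut.done():
            fut.set_result({"confirmed": confirmed, "timed_out": timed_out})


async def _demo():
    confirmations = ApplyConfirmations()
    acked = confirmations.await_apply_confirmation("iv-1", timeout_seconds=0.05)
    confirmations.on_intervention_applied("iv-1")
    confirmations.on_intervention_applied("iv-1")  # duplicate ack
    silent = confirmations.await_apply_confirmation("iv-2", timeout_seconds=0.01)
    return await acked, await silent


DEMO_RESULT = asyncio.run(_demo())
```

Popping the entry before touching the future is what makes the ack and the watcher safe to race.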
29346ed
audit F19b: correlation IDs in browser extension
Closes the deferred extension half of F19. New
`cortex/apps/browser_extension/lib/correlation.ts` exports
`newCorrelationId()` (returns `cid_<12 hex chars>`),
`isCorrelationId()` for narrowing, and a `withCorrelationId`
listener wrapper.
popup.tsx and newtab.tsx mint a cid on every user-initiated click
and attach it to the outbound `chrome.runtime.sendMessage` payload.
background.ts logs the cid on receive (`cortex.bg.recv cid=...`)
and threads it onto outbound WS frames: USER_ACTION (already done
by F16) plus SHUTDOWN. Daemon-side logging (added in F19) now
correlates clicks across all four layers.
Test: __tests__/f19b_correlation.spec.ts asserts the cid shape and
that a popup-supplied cid lands on the outbound WS USER_ACTION
frame untouched.
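The cid shape is easy to pin down. A Python rendering of the two helpers (the originals are TypeScript; the snake_case names here are hypothetical):

```python
import re
import secrets

_CID_RE = re.compile(r"cid_[0-9a-f]{12}")


def new_correlation_id() -> str:
    """Mint 'cid_' followed by 12 lowercase hex characters."""
    return "cid_" + secrets.token_hex(6)  # 6 random bytes -> 12 hex chars


def is_correlation_id(value) -> bool:
    """True only for strings matching the full cid shape."""
    return isinstance(value, str) and _CID_RE.fullmatch(value) is not None
```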
afa80a6
merge: Wave 1-A (audit F12+F14+F37+F38+F39) — security/scripts cluster
debb297