-
Notifications
You must be signed in to change notification settings - Fork 0
state
Phase: 2 (remediation FULLY closed; 56 of 56 Ledger findings + 2 Architectural Debts + Phase I + Phase J shipped)
Next finding to address: none — Ledger fully closed.
Last finding closed: F41 (eval-harness CI regression gate) — commit 4fc42fd.
Last commit: audit F41: regression harness + committed baseline + CI gate (4fc42fd).
- Read
audit/findings.md— authoritative Ledger (Phase 1). - Read
audit/execution-log.md— full commit-by-commit log + "Phase 2 Session 2 — Close-out Report" at the bottom. - Read this file — pointer + outstanding list.
- If a deferred item is to be picked up, dispatch with the same per-finding atomic-commit conventions used in Session 2.
All three previously-deferred items shipped:
| ID | Summary | Closure |
|---|---|---|
| F17 | State-update sequence-number drop on receivers | Daemon stamps _seq on in-process callbacks (runtime_daemon.py); DaemonBridge + WebSocketBridge + background.ts each maintain per-channel/per-type last-applied counters and drop stale frames. Trackers cleared on (re)connect so a daemon restart's seq=1 wins. 12 Python + 7 TS tests. Commit 71b94c1. |
| F25 | Cooldown/dwell oscillation hysteresis | New InterventionConfig.max_interventions_per_hour (default 6) imposes a sliding-window hourly cap; new oscillation_max_flips + oscillation_dwell_multiplier lengthen the required dwell when the state has been entering HYPER more than N times in a 10-minute window. Drive-by fix to the now = timestamp or time.monotonic() 0.0-falsy bug across 4 call sites. 7 tests. Commit 16c8bd5. |
| F41 | Eval harness in CI with regression threshold | New cortex/services/eval/regression_harness.py replays four synthetic traces (oscillation, sustained-overwhelm, pure-FLOW, bandit) and compares against cortex/services/eval/baseline.json (committed). CLI exits 1 on any metric crossing its 3%-relative-+-abs-floor tolerance band. New eval-regression job in .github/workflows/ci.yml runs on PRs touching llm_engine/state_engine/eval/. 17 tests. Commit 4fc42fd. |
-
F07b — Native-host
get_auth_token(Wave 1-G, closed). -
F08b — Extension
X-Cortex-Auth-Tokenheader on launcher/daemon HTTP (Wave 1-G, closed). - F16-srv — Daemon refuses stale USER_ACTION cid (Wave 1-G, closed).
- F19b — Correlation IDs in browser extension (Wave 1-G, closed).
- F20 persistent dashboard banner (per-intervention hint sufficient; dashboard banner is a deepening).
- 9 catalogue-only LEETCODE_* WS types (default-arm log line is the visibility hatch).
-
SessionReportaggregate rollup ofintervention_apply_confirmationevents. - 3 Qt overlay tests fail under PySide6 mock pollution (pre-existing test-infra issue; pass in isolation).
- Pre-existing test pollution suite (
test_redis_store,test_helpfulness, etc.) — orthogonal to audit work. - 4 P2/P3 a11y items documented in
CHANGELOG.md"Known limitations".
-
93 commits landed on
mainthis session. - ~345 audit-specific tests added across Python (pytest) and TypeScript (vitest).
- 2 Architectural Debts closed structurally (Debt-1 codegen, Debt-2 systemic auth).
- 2 Non-Ledger phases shipped (Phase I performance, Phase J UX polish).
-
0 commits pushed — all work is on local
main; user should review before pushing to thecortexremote.
See "Verification commands (reproducible)" section in audit/execution-log.md for the full battery. TL;DR:
pytest cortex/tests/unit/ # 1275 pass (modulo pre-existing test-pollution suite)
QT_QPA_PLATFORM=offscreen pytest cortex/tests/unit/test_dashboard_toast.py \
cortex/tests/unit/test_dashboard_empty_state.py \
cortex/tests/unit/test_onboarding_hints.py \
cortex/tests/unit/test_overlay_animation.py
cd cortex/apps/browser_extension && pnpm test # 35 pass across 12 specs
CORTEX_JSON2TS_CMD=$(which json2ts) python -m cortex.scripts.generate_ts_schemas --check-
Trigger-policy hysteresis under real biometric jitter (F25-residual). Cost runaway bounded by F20's budget kill-switch ($20/day default). Quality bounded by F26 + F27. The next escalation is data-driven via F41's eval baseline. Monitor:
cortex_state_loop_interventions_per_hourshould stay <10 nominal, <30 with budget kill armed. -
Schema-codegen drift via Pydantic source bypass. Debt-1 closure depends on every TS-visible field originating in
cortex/libs/schemas/. CI gateschema-codegen-checkmust be marked Required on the GitHub repo to enforce. - Capability-token rotation collision with in-flight WS sessions. Debt-2 rotation kills existing connections; the extension's auto-reconnect handles it but logs AUTH_REJECTED during the transition window. Monitor: a sustained spike in AUTH_REJECTED beyond 30s = rotation went wrong.
F25 partial closure. Cost-runaway side is well-contained and regression-tested. The quality-of-experience side (intervention spam under jitter) is partially closed by F26/F27 but not directly tested with adversarial state sequences. The right next step is an /eval suite that replays a synthetic jittery-state trace and asserts intervention count stays within an envelope. That is F41's territory and was deferred.