Chuyue Wang edited this page Mar 19, 2026 · 21 revisions

Cortex

Cortex is a real-time biofeedback engine that watches you work through your webcam and input devices, detects cognitive overwhelm, and actively restructures your digital workspace so you can stay focused. Unlike timer-based productivity tools, Cortex uses your biology to decide when you need help, and uses LLMs to decide how to help.


Key Features

  • Bio-extraction at 30 FPS — heart rate, HRV, and respiratory rate via rPPG from your face (no video stored); blink rate, head pose, and posture via MediaPipe; mouse/keyboard patterns via pynput. Gracefully degrades to telemetry-only mode in poor lighting
  • Cognitive state classification — fuses the signals captured at 30 FPS into a state estimate every 500 ms, labeled FLOW, HYPER (overwhelmed), HYPO (disengaged), or RECOVERY, using rule-based scoring with EMA smoothing and hysteresis
  • LLM-powered interventions — workspace context (never biometrics) is sent to the LLM, which returns executable actions: close distraction tabs, group related tabs, surface error fixes, decompose tasks into micro-steps. Smart tab algorithm protects recently-visited tabs, AI assistants, and goal-relevant content from being closed
  • Activity tracking and resume — tracks learning progress across YouTube, Bilibili, Coursera, LeetCode, PDFs, Jupyter, and more. On return, shows a one-click resume card that seeks video, scrolls to position, or pastes saved code
  • LeetCode mode — DOM observer, stage inference (READ/PLAN/IMPLEMENT/DEBUG/REFLECT), amygdala hijack lockout, pattern ladder hints, submission discipline guard
  • Biology-driven breaks — cumulative HRV suppression integral replaces arbitrary Pomodoro timers; you can ride deep FLOW until your body says stop
  • Progressive consent — 5-level trust ladder (OBSERVE → SUGGEST → PREVIEW → REVERSIBLE_ACT → AUTONOMOUS_ACT) per action type; Cortex earns autonomy through repeated approvals
  • Learning loop — contextual bandit (LinUCB) selects intervention type; helpfulness tracker computes reward from user engagement and explicit ratings; per-tab relevance tracker learns individual tab preferences from Keep button feedback
  • Ambient somatic feedback — sub-threshold color vignettes, weather particles, and flow shield that fades distraction elements during sustained focus
  • Chrome + Edge — Plasmo/React Manifest V3 extension with popup dashboard, one-click daemon launch (via native messaging + Terminal.app for camera access), camera restart, intervention overlay, Pulse Room new tab, and focus sessions with distraction blocking
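
The EMA smoothing and hysteresis mentioned in the state classification feature can be illustrated with a minimal sketch. It is not Cortex's actual scoring code: it handles only the FLOW/HYPER pair, and the class name, the smoothing weight, and both thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmoothedClassifier:
    """EMA-smoothed score with a two-threshold (hysteresis) switch.

    All numeric defaults are illustrative assumptions, not Cortex's tuned values.
    """
    alpha: float = 0.3        # EMA weight given to the newest sample
    enter_hyper: float = 0.7  # smoothed score must rise above this to enter HYPER
    exit_hyper: float = 0.5   # and must fall below this to leave it
    score: float = 0.0
    state: str = "FLOW"

    def update(self, raw_score: float) -> str:
        # Exponential moving average damps single-sample spikes.
        self.score = self.alpha * raw_score + (1 - self.alpha) * self.score
        # Hysteresis: separate thresholds for entering and leaving a state,
        # so a score hovering near one cutoff does not cause flapping.
        if self.state == "FLOW" and self.score > self.enter_hyper:
            self.state = "HYPER"
        elif self.state == "HYPER" and self.score < self.exit_hyper:
            self.state = "FLOW"
        return self.state
```

With these defaults a single high reading leaves the state at FLOW, while several consecutive high readings push the smoothed score past the entry threshold. Once in HYPER, the state persists until the score drops below the lower exit threshold.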

How It Works

Webcam (30 FPS)
     │
     ▼
L1: Bio-Extraction ─── rPPG · Respiration · Blink · Pose · Telemetry
     │
     ▼  FeatureVector (500ms)
L2: State Engine ────── Fusion · Focus Graph · Scoring · Detectors
     │
     ▼  StateEstimate + stress_integral
L3: Context Engine ──── VS Code · Chrome · Terminal · Adapter Registry
     │
     ▼  TaskContext
L4: LLM Engine ──────── Azure OpenAI · Qwen-3 · Ollama · Bandit
     │
     ▼  InterventionPlan
L5: Intervention ────── Consent · Validate · Execute · Undo · Learn
     │
     ▼
Store (Redis / In-Memory)

All layers communicate via FastAPI (port 9472) and WebSocket (port 9473). The desktop shell, VS Code extension, and Chrome/Edge extension are all clients. An optional launcher agent on port 9471 provides HTTP-based daemon start/stop.
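
To check which of these services are up, a plain TCP probe is enough. The sketch below assumes the daemon binds to localhost, which the text above does not state. The dictionary keys and the helper name `is_listening` are labels invented for this example.

```python
import socket

# Ports as listed above; the dictionary keys are labels chosen for this sketch.
CORTEX_PORTS = {"launcher": 9471, "api": 9472, "websocket": 9473}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False


if __name__ == "__main__":
    for name, port in CORTEX_PORTS.items():
        status = "up" if is_listening(port) else "down"
        print(f"{name:<10} {port}  {status}")
```

A successful probe only shows that some process holds the port, not that the process is Cortex.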


What's Inside

| Directory | Description |
|---|---|
| cortex/ | Core engine: bio-extraction, state classification, LLM interventions, consent ladder, learning loop, v2.0 detectors, LeetCode mode, activity tracker, smart camera selection |
| cortex/apps/browser_extension/ | Chrome + Edge extension (Plasmo/React): one-click daemon launch/stop, intervention overlay, ambient feedback, focus sessions, LeetCode observer, activity tracker, resume cards, Pulse Room |
| cortex/apps/vscode_extension/ | VS Code extension: context provider, code folding, morning briefing, copilot throttle |
| cortex/apps/desktop_shell/ | PySide6 desktop app: system tray, dashboard, onboarding, settings |
| cortex/scripts/ | Daemon entry point, native messaging host, launcher agent, calibration, install scripts |

Tech Stack

| Layer | Technologies |
|---|---|
| Backend | Python 3.11+, FastAPI, MediaPipe, OpenCV, pynput, PySide6 |
| Browser Extension | TypeScript, React, Plasmo (Manifest V3), Chrome + Edge |
| VS Code Extension | TypeScript, VS Code Extension API |
| LLM | Azure OpenAI, Qwen-3-8B (remote via SSH tunnel), Ollama (local) |
| Storage | Redis 7+ with automatic in-memory fallback |
| Testing | pytest (45 test files), mypy (strict), ruff |
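
The "automatic in-memory fallback" for storage could follow a pattern like the one below. This is a sketch of the general technique, not Cortex's store implementation: `InMemoryStore` and `open_store` are hypothetical names, and only the `get`/`set` subset of the redis-py client is mirrored.

```python
from typing import Optional


class InMemoryStore:
    """Dictionary-backed stand-in exposing the get/set subset used below."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def open_store(host: str = "localhost", port: int = 6379):
    """Return a Redis client if one is reachable, else an InMemoryStore."""
    try:
        import redis  # redis-py; treated as an optional dependency in this sketch

        client = redis.Redis(host=host, port=port, decode_responses=True,
                             socket_connect_timeout=0.5)
        client.ping()  # raises if the server is unreachable
        return client
    except Exception:
        # Covers both a missing redis package and a failed connection.
        return InMemoryStore()
```

Callers use `store.get(...)` and `store.set(...)` without caring which backend they received. The trade-off is that in-memory data is lost when the process exits.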

Quick Start

# Backend
cd /path/to/Ralph
pip install -e "./cortex[dev]"
cp cortex/.env.example .env   # Edit with your Azure OpenAI config
python -m cortex.scripts.seed_config --root .

# Chrome extension
cd cortex/apps/browser_extension
pnpm install && npx plasmo build
# Load build/chrome-mv3-prod/ as unpacked extension in chrome://extensions

# Register native messaging host (one-time, enables click-to-start from extension)
python -m cortex.scripts.install_native_host --extension-id YOUR_EXTENSION_ID
# Restart Chrome after installing

Starting the daemon:

  • From browser: click Start Cortex in the extension popup. The daemon launches via Terminal.app (which has camera permissions). A Terminal window opens briefly while the daemon runs.
  • From terminal: cd /path/to/Ralph && .venv/bin/python -m cortex.scripts.run_dev

The first time you start the daemon, macOS will prompt for camera access — click Allow.

See cortex/README.md for full documentation — setup, architecture, all features, API reference, and development guide.


What To Expect

Cortex watches you through your webcam while you study — not to record you, but to read your pulse and breathing from subtle color changes in your face (a technique called remote photoplethysmography). It combines those biological signals with what's happening on your screen — which tabs are open, what errors your code is throwing, how fast you're switching between windows — to figure out whether you're in a productive flow, spiraling into overwhelm, or zoning out. When it detects you're struggling, it uses an AI model to figure out how to help: closing distraction tabs, surfacing the error fix you need, breaking your task into smaller steps, or just telling you to take a break because your body's stress accumulator says so. It also has a dedicated LeetCode mode that detects panic-coding patterns and tries to get you to slow down before you submit your fifth wrong answer in a row.
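
The core idea of remote photoplethysmography is that the dominant frequency in a face-region color signal corresponds to the pulse. The sketch below shows only that frequency-picking step, in pure Python, on a per-frame intensity series. It is not Cortex's rPPG pipeline, which would also need face tracking, detrending, and motion rejection. The function name and the 42 to 240 BPM search band are assumptions.

```python
import math


def estimate_bpm(samples: list[float], fps: float,
                 lo_bpm: int = 42, hi_bpm: int = 240) -> int:
    """Return the pulse rate (beats/min) whose frequency carries the most power.

    `samples` holds one value per frame, e.g. the mean green-channel intensity
    of a face region. The search band is an assumed plausible heart-rate range.
    """
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]  # remove the constant lighting offset

    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        omega = 2 * math.pi * (bpm / 60.0) / fps  # radians per frame
        # Correlate the signal with a cosine and a sine at this frequency.
        re = sum(v * math.cos(omega * n) for n, v in enumerate(centered))
        im = sum(v * math.sin(omega * n) for n, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

Fed a synthetic 1.2 Hz sine sampled at 30 FPS for 10 seconds, the function returns 72. Real webcam signals are far noisier, which is why the estimate degrades in dim lighting or under motion.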

What works well today: the state classification system is conservative and well-tuned — in testing, it correctly avoids false alarms for caffeinated studying, debugging sessions, and deep reading. The biological break timer (which replaces arbitrary Pomodoro intervals with actual HRV-based fatigue tracking) is a genuinely novel feature that works as designed. The LeetCode mode's multi-selector DOM strategy is resilient to LeetCode's frequent UI changes, and the intervention matrix covers real failure modes students hit. The context-aware fallback system means you still get useful help even when the AI model is slow or unavailable. The progressive consent system lets Cortex earn your trust gradually — it starts by just observing, and only takes actions after you've approved similar ones multiple times.
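
One way to picture the HRV-based break timer is as a running integral of how far HRV sits below a personal baseline. The sketch below is an assumption about the general shape of such an accumulator, not the formula Cortex uses. `BreakTimer`, the fractional suppression measure, and the default budget are all invented for illustration. Only the 500 ms sample interval comes from the feature description.

```python
from dataclasses import dataclass


@dataclass
class BreakTimer:
    """Accumulates how far HRV sits below a personal baseline over time.

    The formula and the default budget are assumptions for illustration.
    """
    baseline_hrv_ms: float        # resting HRV from a calibration session
    budget: float = 300.0         # suppression-seconds allowed before a break
    stress_integral: float = 0.0

    def update(self, hrv_ms: float, dt_s: float = 0.5) -> bool:
        """Add one sample; return True once a break is due."""
        # Fractional suppression: 0 at or above baseline, 1 if HRV drops to zero.
        suppression = max(0.0, (self.baseline_hrv_ms - hrv_ms) / self.baseline_hrv_ms)
        # Rectangle-rule integration over the sample interval.
        self.stress_integral += suppression * dt_s
        return self.stress_integral >= self.budget
```

Because samples at or above baseline contribute nothing, a long stretch of relaxed focus never triggers a break, whereas sustained suppression reaches the budget quickly.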

Cortex asks for your webcam (for pulse and posture — no video is saved or sent anywhere), broad browser permissions (to read tab titles and URLs for context — the AI model never sees your biometrics), and a 2-minute baseline calibration session where you sit still so it can learn your resting heart rate. It runs a local daemon on your machine that communicates with a Chrome extension and optionally a VS Code extension. The AI model (Azure OpenAI, Qwen, or a local Ollama instance) sees only workspace context: file paths, error messages, and tab titles. Your physiological data stays on your machine.
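
The separation between workspace context and physiological data can be enforced with an allow-list at the point where the LLM request is built. The sketch below illustrates that pattern only. The field names and `build_llm_context` are hypothetical and do not reflect Cortex's actual schema.

```python
# Hypothetical field names; only these keys may leave the machine.
ALLOWED_FIELDS = {"file_paths", "error_messages", "tab_titles"}


def build_llm_context(snapshot: dict) -> dict:
    """Keep only allow-listed workspace fields; drop everything else."""
    return {key: value for key, value in snapshot.items() if key in ALLOWED_FIELDS}


snapshot = {
    "tab_titles": ["Two Sum - LeetCode", "Stack Overflow"],
    "error_messages": ["IndexError: list index out of range"],
    "heart_rate_bpm": 96,   # biometric: must not be forwarded
    "hrv_ms": 31.5,         # biometric: must not be forwarded
}
payload = build_llm_context(snapshot)
```

An allow-list is safer than a block-list here: a newly added biometric field is excluded by default rather than leaked by omission.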

Cortex is not a study planner, a to-do app, or a replacement for actually understanding the material. The heart rate signal from a webcam is noisier than one from a chest strap: in dim lighting or if you move a lot, the biological signals degrade and the system falls back to behavioral-only detection. HRV measured at 30 FPS is at the edge of what's physiologically meaningful, because each frame spans about 33 ms and beat-to-beat variation is often only tens of milliseconds; treat it as a trend indicator over minutes, not a precise beat-by-beat measurement. The AI-generated interventions are sometimes generic or slightly off-target, especially early on before the learning system has calibrated to your preferences. And if you're the kind of student who studies past midnight, you'll want to adjust the wind-down hour from its default, which was set for an earlier bedtime than most college students keep.
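
The calibration period exists because the learning system is a contextual bandit (LinUCB), which starts with no knowledge and must explore. The following is a compact, dependency-free sketch of the textbook disjoint LinUCB rule, with the inverse matrix maintained by a Sherman-Morrison update. It is not Cortex's code. The arm names and the context features a caller passes in are hypothetical.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB, storing A^-1 directly."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity matrix, so A^-1 is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x: list[float]) -> list[float]:
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x: list[float], alpha: float) -> float:
        theta = self._a_inv_times(self.b)          # ridge-regression estimate
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width                # optimism under uncertainty

    def update(self, x: list[float], reward: float) -> None:
        # Sherman-Morrison rank-one update of A^-1 after A += x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select(arms: dict[str, LinUCBArm], x: list[float], alpha: float = 1.0) -> str:
    """Pick the arm with the highest upper confidence bound for context x."""
    return max(arms, key=lambda name: arms[name].ucb(x, alpha))
```

Early on, every arm has a wide confidence interval, so choices are driven mostly by uncertainty, which is consistent with interventions feeling generic at first. As rewards accumulate, the interval narrows and the learned estimate dominates.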


Privacy

  • No video is ever saved. Frames are processed in memory and immediately discarded.
  • No biometrics reach the LLM. The model sees only workspace context: file paths, error messages, tab titles.
  • Consent-gated autonomy. No action executes without earned trust. Users control the maximum autonomy level.
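
Consent-gated autonomy with a user-controlled ceiling can be sketched as a small per-action-type state machine. The five level names come from the trust ladder described under Key Features. Everything else is an assumption made for illustration: the class name, the promotion rule of a fixed number of consecutive approvals, and the demotion on rejection.

```python
from enum import IntEnum


class Trust(IntEnum):
    OBSERVE = 0
    SUGGEST = 1
    PREVIEW = 2
    REVERSIBLE_ACT = 3
    AUTONOMOUS_ACT = 4


class ConsentLadder:
    """Tracks a trust level per action type, capped by a user-set maximum."""

    def __init__(self, max_level: Trust = Trust.REVERSIBLE_ACT,
                 approvals_per_step: int = 3) -> None:
        self.max_level = max_level
        self.approvals_per_step = approvals_per_step  # assumed promotion rule
        self._level: dict[str, Trust] = {}
        self._streak: dict[str, int] = {}

    def level(self, action_type: str) -> Trust:
        # Unknown action types start at the bottom of the ladder.
        return self._level.get(action_type, Trust.OBSERVE)

    def record(self, action_type: str, approved: bool) -> Trust:
        """Record one user decision and return the resulting level."""
        if not approved:
            # Assumption: a rejection resets the streak and drops one level.
            self._streak[action_type] = 0
            self._level[action_type] = Trust(max(self.level(action_type) - 1, 0))
            return self.level(action_type)
        streak = self._streak.get(action_type, 0) + 1
        if streak >= self.approvals_per_step and self.level(action_type) < self.max_level:
            self._level[action_type] = Trust(self.level(action_type) + 1)
            streak = 0
        self._streak[action_type] = streak
        return self.level(action_type)
```

Keeping levels per action type means that trust earned for closing tabs does not carry over to, say, pasting code.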

License

MIT
