Your eyes. Your voice.
Lumen is a free, browser-based eye-gaze communication system for people who can no longer move their hands. It runs entirely on the device. No servers, no subscription, no installation. Open a URL, look at letters, speak.
It's built for the Gemma 4 Good Hackathon (Kaggle, May 2026).
Lumen has two input modes, selected on the landing screen:
- Gaze mode — grants camera access, runs a 9-point calibration, then types by dwell. Intended daily-use flow for AAC users.
- Pointer mode — no camera, no calibration. The cursor stands in for your gaze. Use this to review the keyboard flow, test the app for accessibility, or share a demo link with reviewers who can't grant camera access.
Word suggestions work immediately from a built-in AAC vocabulary (bigrams + prefix matching, no model needed). Optionally, load on-device Gemma for smarter suggestions: either point the URL field to a hosted .task file or pick one from disk. Gemma stays optional, not a gate.
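The fallback layer can be sketched as a frequency-ordered prefix match. This is a minimal illustration, not the real `llm/staticPredictor.ts`: the vocabulary here is a ten-word stand-in for the 130-word list, and the bigram step is omitted.

```typescript
// Tiny stand-in for the AAC frequency vocabulary (most frequent first).
const VOCAB = ["i", "you", "want", "need", "help", "water", "pain", "yes", "no", "please"];

// Return up to `max` suggestions: words starting with the current prefix,
// in frequency order; fall back to the top unigrams when nothing matches,
// so the suggestion strip is never blank.
function suggest(prefix: string, max = 3): string[] {
  const p = prefix.toLowerCase();
  const hits = p === "" ? VOCAB : VOCAB.filter((w) => w.startsWith(p));
  return (hits.length > 0 ? hits : VOCAB).slice(0, max);
}

console.log(suggest("wa")); // → ["want", "water"]
```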
```
npm install
npm run dev      # http://localhost:5173
npm test         # unit tests (vitest)
npm run build    # production bundle into dist/
npx wrangler login
npm run deploy   # builds + deploys dist/ to the "lumen" CF Pages project
```

On first run, wrangler will prompt to create the `lumen` project. After that, `npm run deploy` is one command. The repo includes `wrangler.toml` and `public/_headers` with the right MIME type for `.wasm` and long-cache rules for hashed assets.
About 30,000 Americans and roughly half a million people worldwide are living with ALS at any given time. Most will eventually lose the ability to move their hands. Tens of thousands more — survivors of brainstem stroke, severe cerebral palsy, late-stage MS, locked-in syndrome — face the same wall.
For these people, eye-gaze communication is not a convenience. It is the only way left to say I love you, I'm in pain, please don't.
The hardware that makes this possible exists. It is also priced as if it doesn't:
| Tool | Price | Reach |
|---|---|---|
| Tobii Dynavox I‑Series | $8,000 – $20,000 | Wealthy households, well-funded clinics |
| EyeTech VT3 | $4,000 – $7,000 | Specialized clinics |
| Apple's iPadOS 18 Eye Tracking | Free | iPad-only · cursor mover only · no AAC stack |
| Lumen | $0 | Any device with a browser and a webcam |
Globally, fewer than 2% of the people who need eye-gaze communication have access to it. The barrier isn't physics. It's pricing, software, and a market that has decided this user doesn't matter enough.
We disagree.
You sit in front of a laptop or tablet. The webcam watches your eyes. A face-mesh model running in your browser reads where you're looking. A nine-point calibration learns the mapping between your eyes and your screen. After that, Lumen is yours.
You compose by looking. Stare at a letter for ~0.8 seconds and it commits. Above the keyboard, a row of word suggestions — generated by Gemma 4 running on your device — predicts what you're trying to say from what you've typed and your conversation context, so you don't have to spell every word letter by letter. When the sentence is right, you look at Speak and the device says it aloud. Or, soon, you look at Send and it texts your daughter.
Nothing leaves your machine. Your face, your gaze, your unfinished sentences, your medical context — none of it touches a server.
Lumen is not a chatbot wrapped around an AAC keyboard. Gemma 4 does three jobs that the application would be impossible without:
- Multimodal input. The system is a camera reading your face. Gemma 4's multimodal architecture is what makes it plausible to run vision-grounded reasoning on the same device that reads the iris.
- Native function calling. For a user who cannot move their hand, every meaningful output is a tool call. "Text my wife." "Call the nurse." "Turn off the light." "Increase the bed height." Gemma 4's first-class function-calling support is the substrate for the agentic action layer that turns Lumen from a typewriter into a body.
- On-device, offline-capable. The judges asked for impact and deployability. A medical-context AI tool that requires a cloud round-trip is a tool that fails the moment the family's WiFi drops at 2 a.m. Lumen runs entirely in WebGPU on the local machine via `@mediapipe/tasks-genai`. The user supplies a Gemma `.task` model file once; from then on, no network is required.
If you stripped any one of these properties out, Lumen wouldn't exist. The Gemma 3 generation made the on-device piece possible. Gemma 4 makes the multimodal-plus-agentic loop possible.
┌──────────────────────────────────────────────────────────────────┐
│ Browser tab │
│ │
│ webcam ──► <video> ──► MediaPipe FaceLandmarker (WebGPU) │
│ │ │
│ ▼ │
│ 478 face landmarks + iris │
│ │ │
│ gazeEstimator │
│ (image-space iris displacement, │
│ averaged over both eyes, scaled) │
│ │ │
│ ▼ │
│ OneEuroFilter (1.2 Hz) │
│ │ │
│ ▼ │
│ gazeStore (pub/sub) │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ Calibration DwellDetector LandmarkOverlay │
│ (9-point (sticky 15% (debug) │
│ affine) margin) │
│ │ │ │
│ ▼ ▼ │
│ Affine model Keyboard / Suggestion strip │
│ │ │
│ ▼ │
│ wordPredictor │
│ ┌───────┴───────┐ │
│ ▼ ▼ │
│ Gemma 4 (WebGPU) Static fallback │
│ via @mediapipe/ (130-word AAC │
│ tasks-genai frequency dict) │
│ │
│ ▼ │
│ Web Speech API · TTS │
└──────────────────────────────────────────────────────────────────┘
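The `gazeStore` hub in the middle of the diagram can be sketched as a module-level pub/sub. This is an illustration with invented names, not the real module (which also publishes filtered and unfiltered gaze on separate channels):

```typescript
// A gaze sample in screen coordinates, timestamped per frame.
type Gaze = { x: number; y: number; t: number };
type Listener = (g: Gaze) => void;

const listeners = new Set<Listener>();
let latest: Gaze | null = null;

// Called from the detection loop once per frame.
function publishGaze(g: Gaze): void {
  latest = g;
  for (const fn of listeners) fn(g);
}

// Calibration, dwell detection, and overlays each subscribe independently.
// Returns an unsubscribe handle; replays the last sample to late subscribers.
function subscribeGaze(fn: Listener): () => void {
  listeners.add(fn);
  if (latest) fn(latest);
  return () => listeners.delete(fn);
}
```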
| Module | Purpose |
|---|---|
| `gaze/faceLandmarker.ts` | Singleton MediaPipe FaceLandmarker, GPU-delegated |
| `gaze/gazeEstimator.ts` | 478 landmarks → image-space iris displacement → user-perspective gaze. Includes a debug snapshot (eye box, eye center, iris center per eye) for the diagnostic overlay. |
| `gaze/oneEuroFilter.ts` | Casiez–Roussel–Vogel One-Euro filter — heavy smoothing during fixation, light smoothing during saccades, no perceptible lag. |
| `gaze/calibration.ts` | 9-point calibration with affine least-squares fit via 3×3 normal equations + Cramer's rule. |
| `gaze/dwellDetector.ts` | Enter → ramp → fire → re-arm dwell logic with a 15% sticky-margin hit test for the currently dwelled target (so a small wobble doesn't reset the timer). |
| `gaze/gazeStore.ts` | Module-level pub/sub. Publishes both filtered and unfiltered gaze so calibration can read clean samples while the keyboard reads smoothed gaze. |
| Module | Purpose |
|---|---|
| `llm/gemma.ts` | Lazy LlmInference singleton. Loads from file picker (or URL). Tracks status: idle → loading → ready / error. Wraps the Gemma chat template (`<start_of_turn>user` …). |
| `llm/wordPredictor.ts` | High-level facade. 220 ms debounced. Always emits a fast static result first; replaces it with Gemma when the model finishes. Cancels stale generations. |
| `llm/staticPredictor.ts` | 130-word AAC frequency vocabulary + prefix matching. Used when Gemma isn't loaded so the keyboard is never blank. |
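The facade's "static first, Gemma later, drop stale results" pattern can be sketched as a debounced dispatcher with a monotonic generation counter. This is an illustration under assumed names (`staticSuggest`, `gemmaSuggest`), not the real `llm/wordPredictor.ts`.

```typescript
type Emit = (words: string[], source: "static" | "gemma") => void;

function makePredictor(
  staticSuggest: (text: string) => string[],
  gemmaSuggest: (text: string) => Promise<string[]>,
  emit: Emit,
  debounceMs = 220,
) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let generation = 0; // monotonically increasing request id

  return (text: string) => {
    clearTimeout(timer); // debounce rapid dwell keypresses
    timer = setTimeout(async () => {
      const id = ++generation;
      emit(staticSuggest(text), "static"); // never leave the strip blank
      try {
        const words = await gemmaSuggest(text);
        if (id === generation) emit(words, "gemma"); // ignore stale generations
      } catch {
        /* model failed or not loaded: keep the static suggestions */
      }
    }, debounceMs);
  };
}
```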
| Component | Purpose |
|---|---|
| `App.tsx` | State machine: boot → needs-calibration → calibrating → ready |
| `GazeEngine.tsx` | Headless: webcam capture + rAF detection loop + push to store |
| `GazeDot.tsx` | Glow-tracking dot overlay |
| `LandmarkOverlay.tsx` | Diagnostic: draws cyan eye-boxes + iris dots so you can confirm MediaPipe is finding your eyes |
| `CalibrationScreen.tsx` | 9-point calibration flow with settle/capture phases |
| `Keyboard.tsx` | Suggestion strip + 6×5 letter grid + command row |
| `GemmaLoader.tsx` | Header status pill + file picker for the Gemma model |
| `DebugHud.tsx` | Live numerical readout: face / raw / calib / conf / fps |
Lumen is a working prototype. Some pieces are solid; others are honestly v0.
| Piece | Status |
|---|---|
| MediaPipe face mesh + iris detection | working |
| Gaze pipeline math (image-space, both-eye averaging, regression-tested) | working |
| One-Euro smoothing | working |
| 9-point affine calibration | working — math is unit-tested against synthetic transforms |
| Dwell selection with sticky-margin hysteresis | working |
| Suggestion strip with static fallback | working |
| Gemma word prediction via `@mediapipe/tasks-genai` | working once you supply a model file |
| Web Speech API TTS for "Speak" | working |
| Function calling (text family / call caregiver / smart home) | not yet — task #14 |
| Mobile / tablet UX | not yet — desktop browser only for now |
| Real ALS / locked-in user testing | not yet — needs a small co-design study |
The honest version: the calibration and dwell math are real and tested. The one piece that may need live tuning during the demo is the iris-to-screen scale factor — the affine fit normally absorbs it, but if the dot under-shoots the corners, recalibrate. There is a diagnostic mode (cyan eye boxes + green iris dots) on the pre-calibration screen that lets you visually verify MediaPipe is locking onto your eyes before you commit to a calibration.
```
$ npm test

 Test Files  5 passed (5)
      Tests  33 passed (33)
```
Includes:
- 6 affine-calibration tests (recovers known transforms, stable under noise, throws on degenerate samples)
- 9 dwell detector tests (enter, ramp, fire, re-arm, target switch, sticky margin in/out, null gaze)
- 4 One-Euro filter tests (dampens microsaccades, tracks fast ramps, reset clears state)
- 7 gaze estimator tests with synthetic landmarks — including a regression test ("REGRESSION: horizontal motion is NOT cancelled by averaging both eyes") that guards a sign bug we hit during development
- 6 static predictor tests (top unigrams, prefix matching, fallback on no match)
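The dwell lifecycle those tests exercise (enter, ramp, fire, re-arm, sticky margin) can be reimplemented in miniature. This is an illustrative sketch, not the repo's `gaze/dwellDetector.ts`; names and the frame-driven API are invented.

```typescript
type Rect = { x: number; y: number; w: number; h: number };

function makeDwell(dwellMs = 800, stickyMargin = 0.15) {
  let target: Rect | null = null;
  let enteredAt = 0;
  let fired = false;

  // Hit test against a rect expanded by `m` of its size on every side.
  const inside = (r: Rect, gx: number, gy: number, m: number) => {
    const mx = r.w * m, my = r.h * m;
    return gx >= r.x - mx && gx <= r.x + r.w + mx &&
           gy >= r.y - my && gy <= r.y + r.h + my;
  };

  // Call once per frame with the currently hovered key (or null) and the
  // gaze point; returns true exactly on the frame the dwell fires.
  return (hit: Rect | null, gx: number, gy: number, now: number): boolean => {
    // Sticky test: a small wobble just outside the target doesn't reset it.
    if (target && inside(target, gx, gy, stickyMargin)) {
      if (!fired && now - enteredAt >= dwellMs) {
        fired = true; // fire once, then stay quiet until re-armed
        return true;
      }
      return false;
    }
    // Left the sticky bounds: re-arm on the new target (or on nothing).
    target = hit;
    enteredAt = now;
    fired = false;
    return false;
  };
}
```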
```
git clone <this repo>
cd lumen
npm install
npm run dev
# → open http://localhost:5173/ in Chrome or Edge
```

You'll need a webcam. The MediaPipe face model (~3 MB) downloads from Google's CDN on first run and is then cached.
The keyboard works without Gemma — there's a static frequency-based fallback predictor — but Gemma's word prediction is much smarter. To enable it:
- Download a Gemma `.task` model file from the LiteRT community on Hugging Face. For a desktop browser, Gemma 3 1B IT (~1 GB) or Gemma 4 IT (when available) are good choices.
- In Lumen, click the Load Gemma pill in the top-right of the header.
- Pick the `.task` file. The pill walks through Loading… → Initialising WASM runtime… → Loading Gemma into the GPU… → Gemma ready · 1.2 GB.
- Type a few letters in the keyboard. The suggestion chips above the letters will switch from blue (static) to green (Gemma) as completions arrive.
```
npm run build
# → dist/index.html, dist/assets/index.js (~410 KB), dist/assets/index.css

npm test            # run once
npm run test:watch  # watch mode
```

Everything runs in your browser. The webcam stream never leaves the device. The Gemma model never makes a network request once loaded. Lumen itself never phones home, never collects telemetry, never logs your typed text. There is no account.
If you load Lumen from a remote host, your browser will fetch the static HTML, JS, and CSS bundle from that host (the same as visiting any web page). Beyond that, no data leaves your machine.
This is a prototype communication aid, not a medical device. Lumen has not been clinically validated. It is not a substitute for a professional AAC evaluation, an SLP-prescribed device, or medical-grade equipment for life-critical communication.
If you or someone you love depends on AAC for safety-critical communication (calling for help, expressing pain, signaling distress), Lumen should complement, not replace, a professional setup. The right framing for the prototype is "a free, no-install, no-subscription option for the millions of people who currently have nothing."
The next step beyond this prototype is a small co-design study with three to five ALS patients and their caregivers (remote, over video) to validate the latency, the dwell threshold, and the suggestion vocabulary. That work is planned, not done.
- MediaPipe — face mesh, iris landmarks, on-device LLM inference
- Casiez, Roussel & Vogel (2012) — the One-Euro filter that makes the gaze dot sit still
- LiteRT community — hosted Gemma `.task` model files for browser inference
- The AAC research community, especially the Beukelman/Mirenda Augmentative & Alternative Communication literature, and every clinician who has ever taken the time to teach a patient to type with their eyes
- Vite Vere Offline — proof from the Gemma 3n Impact Challenge that one developer can ship an accessibility tool that genuinely changes lives
MIT. Do whatever helps users.