Lumen

Your eyes. Your voice.

Lumen is a free, browser-based eye-gaze communication system for people who can no longer move their hands. It runs entirely on the device. No servers, no subscription, no installation. Open a URL, look at letters, speak.

It's built for the Gemma 4 Good Hackathon (Kaggle, May 2026).

Try it

Lumen has two input modes, selected on the landing screen:

  • Gaze mode — asks for camera access, runs a 9-point calibration, then types by dwell. This is the intended daily-use flow for AAC users.
  • Pointer mode — no camera, no calibration. The cursor stands in for your gaze. Use this to review the keyboard flow, test the app for accessibility, or share a demo link with reviewers who can't grant camera access.

Word suggestions work immediately from a built-in AAC vocabulary (bigrams + prefix matching, no model needed). Optionally, load on-device Gemma for smarter suggestions: either point the URL field to a hosted .task file or pick one from disk. Gemma stays optional, not a gate.

Local dev

npm install
npm run dev     # http://localhost:5173
npm test        # unit tests (vitest)
npm run build   # production bundle into dist/

Deploy to Cloudflare Pages

npx wrangler login
npm run deploy  # builds + deploys dist/ to the "lumen" CF Pages project

On first run, wrangler will prompt to create the lumen project. After that, npm run deploy is one command. The repo includes wrangler.toml and public/_headers with the right MIME type for .wasm and long-cache rules for hashed assets.
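
For reference, a minimal sketch of what public/_headers needs to cover — the exact rules in the repo may differ, but Cloudflare Pages header files follow this shape. The first rule serves WebAssembly with the correct MIME type; the second marks hashed bundle assets as long-cacheable:

```
/*.wasm
  Content-Type: application/wasm

/assets/*
  Cache-Control: public, max-age=31536000, immutable
```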


The problem

About 30,000 Americans and roughly half a million people worldwide are living with ALS at any given time. Most will eventually lose the ability to move their hands. Tens of thousands more — survivors of brainstem stroke, severe cerebral palsy, late-stage MS, locked-in syndrome — face the same wall.

For these people, eye-gaze communication is not a convenience. It is the only way left to say I love you, I'm in pain, please don't.

The hardware that makes this possible exists. It is also priced as if it doesn't:

| Tool | Price | Reach |
| --- | --- | --- |
| Tobii Dynavox I‑Series | $8,000 – $20,000 | Wealthy households, well-funded clinics |
| EyeTech VT3 | $4,000 – $7,000 | Specialized clinics |
| Apple's iPadOS 18 Eye Tracking | Free | iPad-only · cursor mover only · no AAC stack |
| Lumen | $0 | Any device with a browser and a webcam |

Globally, fewer than 2% of the people who need eye-gaze communication have access to it. The barrier isn't physics. It's pricing, software, and a market that has decided this user doesn't matter enough.

We disagree.


What Lumen does

You sit in front of a laptop or tablet. The webcam watches your eyes. A face-mesh model running in your browser reads where you're looking. A nine-point calibration learns the mapping between your eyes and your screen. After that, Lumen is yours.

You compose by looking. Stare at a letter for ~0.8 seconds and it commits. Above the keyboard, a row of word suggestions — generated by Gemma 4 running on your device — predicts what you're trying to say from what you've typed and your conversation context, so you don't have to spell every word letter by letter. When the sentence is right, you look at Speak and the device says it aloud. Or, soon, you look at Send and it texts your daughter.

Nothing leaves your machine. Your face, your gaze, your unfinished sentences, your medical context — none of it touches a server.


Why Gemma 4 is load-bearing here

Lumen is not a chatbot wrapped around an AAC keyboard. Gemma 4 does three jobs the application could not exist without:

  1. Multimodal input. The system is a camera reading your face. Gemma 4's multimodal architecture is what makes it plausible to run vision-grounded reasoning on the same device that reads the iris.

  2. Native function calling. For a user who cannot move their hand, every meaningful output is a tool call. "Text my wife." "Call the nurse." "Turn off the light." "Increase the bed height." Gemma 4's first-class function-calling support is the substrate for the agentic action layer that turns Lumen from a typewriter into a body.

  3. On-device, offline-capable. The judges asked for impact and deployability. A medical-context AI tool that requires a cloud round-trip is a tool that fails the moment the family's WiFi drops at 2 a.m. Lumen runs entirely in WebGPU on the local machine via @mediapipe/tasks-genai. The user supplies a Gemma .task model file once; from then on, no network is required.

If you stripped any one of these properties out, Lumen wouldn't exist. The Gemma 3 generation made the on-device piece possible. Gemma 4 makes the multimodal-plus-agentic loop possible.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Browser tab                              │
│                                                                  │
│   webcam ──► <video> ──► MediaPipe FaceLandmarker (WebGPU)       │
│                                  │                               │
│                                  ▼                               │
│                       478 face landmarks + iris                  │
│                                  │                               │
│                          gazeEstimator                           │
│                  (image-space iris displacement,                 │
│                  averaged over both eyes, scaled)                │
│                                  │                               │
│                                  ▼                               │
│                       OneEuroFilter (1.2 Hz)                     │
│                                  │                               │
│                                  ▼                               │
│                     gazeStore  (pub/sub)                         │
│                                  │                               │
│                  ┌───────────────┼───────────────┐               │
│                  ▼               ▼               ▼               │
│           Calibration     DwellDetector    LandmarkOverlay       │
│           (9-point        (sticky 15%      (debug)               │
│            affine)        margin)                                │
│                  │               │                               │
│                  ▼               ▼                               │
│           Affine model    Keyboard / Suggestion strip            │
│                                  │                               │
│                                  ▼                               │
│                          wordPredictor                           │
│                          ┌───────┴───────┐                       │
│                          ▼               ▼                       │
│                  Gemma 4 (WebGPU)   Static fallback              │
│                  via @mediapipe/    (130-word AAC                │
│                   tasks-genai       frequency dict)              │
│                                                                  │
│                                  ▼                               │
│                       Web Speech API · TTS                       │
└──────────────────────────────────────────────────────────────────┘

Gaze pipeline

| Module | Purpose |
| --- | --- |
| gaze/faceLandmarker.ts | Singleton MediaPipe FaceLandmarker, GPU-delegated |
| gaze/gazeEstimator.ts | 478 landmarks → image-space iris displacement → user-perspective gaze. Includes a debug snapshot (eye box, eye center, iris center per eye) for the diagnostic overlay. |
| gaze/oneEuroFilter.ts | Casiez-Roussel-Vogel One-Euro filter — heavy smoothing during fixation, light smoothing during saccades, no perceptible lag. See the first sketch below. |
| gaze/calibration.ts | 9-point calibration with an affine least-squares fit via 3×3 normal equations and Cramer's rule. See the second sketch below. |
| gaze/dwellDetector.ts | Enter → ramp → fire → re-arm dwell logic with a 15% sticky-margin hit test for the currently dwelled target, so a small wobble doesn't reset the timer. See the third sketch below. |
| gaze/gazeStore.ts | Module-level pub/sub. Publishes both filtered and unfiltered gaze so calibration can read clean samples while the keyboard reads smoothed gaze. |
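
For readers new to the One-Euro filter, here is a minimal single-axis sketch of the idea behind gaze/oneEuroFilter.ts. The class shape and parameter defaults are illustrative, not the repo's exact API; only the 1.2 Hz minimum cutoff comes from the diagram above:

```ts
// One-Euro filter (Casiez, Roussel & Vogel, 2012), single axis.
// The cutoff frequency rises with signal speed: slow drift during fixation is
// smoothed hard, fast saccades pass through with little lag.
class OneEuroFilter {
  private prevValue: number | null = null;
  private prevDeriv = 0;

  constructor(
    private minCutoff = 1.2, // Hz — the value quoted in the architecture diagram
    private beta = 0.01,     // speed coefficient (illustrative default)
    private dCutoff = 1.0,   // cutoff for the derivative estimate
  ) {}

  private static alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  /** Filter one sample; dt is the time since the previous sample, in seconds. */
  filter(value: number, dt: number): number {
    if (this.prevValue === null) {
      this.prevValue = value;
      return value;
    }
    // Smoothed derivative of the signal.
    const rawDeriv = (value - this.prevValue) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const deriv = aD * rawDeriv + (1 - aD) * this.prevDeriv;
    // Cutoff grows with speed: heavy smoothing on fixations, light on saccades.
    const cutoff = this.minCutoff + this.beta * Math.abs(deriv);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const filtered = a * value + (1 - a) * this.prevValue;
    this.prevValue = filtered;
    this.prevDeriv = deriv;
    return filtered;
  }

  reset(): void {
    this.prevValue = null;
    this.prevDeriv = 0;
  }
}
```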
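
The affine calibration is an ordinary least-squares fit; a hedged sketch of that math as described for gaze/calibration.ts (type and function names here are illustrative):

```ts
// One calibration sample: filtered gaze (gx, gy) captured while the user
// fixated a known screen target (sx, sy).
type Sample = { gx: number; gy: number; sx: number; sy: number };

// Solve a 3×3 linear system A·p = b with Cramer's rule.
function solve3x3(A: number[][], b: number[]): [number, number, number] {
  const det = (m: number[][]) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  const D = det(A);
  if (Math.abs(D) < 1e-9) throw new Error('degenerate calibration samples');
  const replaceCol = (i: number) =>
    A.map((row, r) => row.map((v, c) => (c === i ? b[r] : v)));
  return [det(replaceCol(0)) / D, det(replaceCol(1)) / D, det(replaceCol(2)) / D];
}

// Least-squares affine fit, one axis at a time: screen = a·gx + b·gy + c.
// Builds the 3×3 normal equations AᵀA·p = Aᵀy from the nine samples.
function fitAffine(samples: Sample[]) {
  const fitAxis = (target: (s: Sample) => number) => {
    const AtA = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
    const Aty = [0, 0, 0];
    for (const s of samples) {
      const row = [s.gx, s.gy, 1];
      for (let i = 0; i < 3; i++) {
        Aty[i] += row[i] * target(s);
        for (let j = 0; j < 3; j++) AtA[i][j] += row[i] * row[j];
      }
    }
    return solve3x3(AtA, Aty); // [a, b, c] for this axis
  };
  return { x: fitAxis((s) => s.sx), y: fitAxis((s) => s.sy) };
}
```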
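
And a toy version of the enter → ramp → fire → re-arm loop with the sticky margin. Names and structure are illustrative; only the ~0.8 s dwell time and the 15% margin come from the text:

```ts
type Rect = { x: number; y: number; w: number; h: number };

const DWELL_MS = 800;       // ~0.8 s of steady gaze commits a key
const STICKY_MARGIN = 0.15; // once dwelling, the hit box grows by 15%

// Hit test with an optional margin, expressed as a fraction of the rect's size.
function hits(rect: Rect, gx: number, gy: number, margin = 0): boolean {
  const mx = rect.w * margin;
  const my = rect.h * margin;
  return gx >= rect.x - mx && gx <= rect.x + rect.w + mx &&
         gy >= rect.y - my && gy <= rect.y + rect.h + my;
}

class DwellDetector {
  private target: string | null = null;
  private enteredAt = 0;
  private fired = false;

  /** Call once per frame; returns the id of a key that just fired, if any. */
  update(gaze: { x: number; y: number } | null, keys: Map<string, Rect>, now: number): string | null {
    if (!gaze) {
      this.target = null; // lost gaze: drop the current target
      return null;
    }
    // While dwelling, keep the current target alive inside its enlarged box,
    // so a small wobble doesn't reset the ramp.
    const current = this.target ? keys.get(this.target) : undefined;
    if (current && hits(current, gaze.x, gaze.y, STICKY_MARGIN)) {
      if (!this.fired && now - this.enteredAt >= DWELL_MS) {
        this.fired = true; // fire once; re-arms only after leaving the key
        return this.target;
      }
      return null;
    }
    // Otherwise look for a new target with the strict (un-enlarged) hit box.
    for (const [id, rect] of keys) {
      if (hits(rect, gaze.x, gaze.y)) {
        this.target = id;
        this.enteredAt = now; // enter: start the ramp
        this.fired = false;   // re-arm
        return null;
      }
    }
    this.target = null;
    return null;
  }
}
```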

LLM pipeline

| Module | Purpose |
| --- | --- |
| llm/gemma.ts | Lazy LlmInference singleton. Loads from a file picker or a URL. Tracks status: idle → loading → ready / error. Wraps the Gemma chat template (<start_of_turn>user…). See the loading sketch below. |
| llm/wordPredictor.ts | High-level facade, debounced at 220 ms. Always emits a fast static result first, then replaces it with Gemma's output when the model finishes. Cancels stale generations. |
| llm/staticPredictor.ts | 130-word AAC frequency vocabulary + prefix matching. Used when Gemma isn't loaded so the keyboard is never blank. See the fallback sketch below. |
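
The loading path in llm/gemma.ts uses the public @mediapipe/tasks-genai API; a trimmed sketch of what that looks like, with the real module's status tracking omitted and the prompt text invented for illustration:

```ts
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

let llm: LlmInference | null = null;

// Lazily create a single LlmInference instance from a user-picked .task file.
export async function loadGemma(modelFile: File): Promise<LlmInference> {
  if (llm) return llm;
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm',
  );
  llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetBuffer: new Uint8Array(await modelFile.arrayBuffer()) },
    maxTokens: 128,
    temperature: 0.6,
    topK: 40,
  });
  return llm;
}

// Wrap the typed text in the Gemma chat template and ask for completions.
export async function suggestWords(typed: string): Promise<string> {
  if (!llm) throw new Error('Gemma not loaded yet');
  const prompt =
    `<start_of_turn>user\nSuggest likely next words for: "${typed}"<end_of_turn>\n` +
    `<start_of_turn>model\n`;
  return llm.generateResponse(prompt);
}
```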
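
And the fallback predictor in miniature — the real vocabulary has 130 entries; the handful below is only for illustration:

```ts
// A tiny slice of an AAC frequency vocabulary, most frequent first.
const VOCAB = ['I', 'you', 'want', 'need', 'help', 'pain', 'water', 'yes', 'no', 'thank'];

// Return up to `limit` suggestions: prefix matches for the word being typed,
// or the top unigrams when there is no partial word to complete.
export function staticSuggest(text: string, limit = 3): string[] {
  const partial = text.split(/\s+/).pop()?.toLowerCase() ?? '';
  const matches = partial
    ? VOCAB.filter((w) => w.toLowerCase().startsWith(partial))
    : VOCAB;
  return (matches.length ? matches : VOCAB).slice(0, limit);
}
```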

UI

| Component | Purpose |
| --- | --- |
| App.tsx | State machine: boot → needs-calibration → calibrating → ready |
| GazeEngine.tsx | Headless: webcam capture + rAF detection loop + push to store |
| GazeDot.tsx | Glow-tracking dot overlay |
| LandmarkOverlay.tsx | Diagnostic: draws cyan eye boxes + iris dots so you can confirm MediaPipe is finding your eyes |
| CalibrationScreen.tsx | 9-point calibration flow with settle/capture phases |
| Keyboard.tsx | Suggestion strip + 6×5 letter grid + command row |
| GemmaLoader.tsx | Header status pill + file picker for the Gemma model |
| DebugHud.tsx | Live numerical readout: face / raw / calib / conf / fps |

Status

Lumen is a working prototype. Some pieces are solid; others are honestly v0.

| Piece | State |
| --- | --- |
| MediaPipe face mesh + iris detection | Working |
| Gaze pipeline math (image-space, both-eye averaging, regression-tested) | Working |
| One-Euro smoothing | Working |
| 9-point affine calibration | Working — the math is unit-tested against synthetic transforms |
| Dwell selection with sticky-margin hysteresis | Working |
| Suggestion strip with static fallback | Working |
| Gemma word prediction via @mediapipe/tasks-genai | Working once you supply a model file |
| Web Speech API TTS for "Speak" | Working |
| Function calling (text family / call caregiver / smart home) | Not yet — task #14 |
| Mobile / tablet UX | Not yet — desktop browser only for now |
| Real ALS / locked-in user testing | Not yet — needs a small co-design study |

The honest version: the calibration and dwell math are real and tested. The one piece that may need live tuning during the demo is the iris-to-screen scale factor — the affine fit normally absorbs it, but if the dot undershoots the corners, recalibrate. There is a diagnostic mode (cyan eye boxes + green iris dots) on the pre-calibration screen that lets you visually verify that MediaPipe is locking onto your eyes before you commit to a calibration.

Tests

$ npm test

 Test Files  5 passed (5)
      Tests  33 passed (33)

Includes:

  • 6 affine-calibration tests (recovers known transforms, stable under noise, throws on degenerate samples)
  • 9 dwell detector tests (enter, ramp, fire, re-arm, target switch, sticky margin in/out, null gaze)
  • 4 One-Euro filter tests (dampens microsaccades, tracks fast ramps, reset clears state)
  • 7 gaze estimator tests with synthetic landmarks — including a regression test ("REGRESSION: horizontal motion is NOT cancelled by averaging both eyes") that guards a sign bug we hit during development
  • 6 static predictor tests (top unigrams, prefix matching, fallback on no match)

Run it locally

git clone <this repo>
cd lumen
npm install
npm run dev
# → open http://localhost:5173/ in Chrome or Edge

You'll need a webcam. The MediaPipe face model (~3 MB) downloads from Google's CDN on first run and is then cached.

Loading Gemma (optional but recommended)

The keyboard works without Gemma — there's a static frequency-based fallback predictor — but Gemma's word prediction is much smarter. To enable it:

  1. Download a Gemma .task model file from the LiteRT community on Hugging Face. For a desktop browser, Gemma 3 1B IT (~1 GB) or Gemma 4 IT (when available) are good choices.
  2. In Lumen, click the Load Gemma pill in the top-right of the header.
  3. Pick the .task file. The pill walks through Loading… → Initialising WASM runtime… → Loading Gemma into the GPU… → Gemma ready · 1.2 GB.
  4. Type a few letters in the keyboard. The suggestion chips above the letters will switch from blue (static) to green (Gemma) as completions arrive.

Production build

npm run build
# → dist/index.html, dist/assets/index.js (~410 KB), dist/assets/index.css

Tests

npm test         # run once
npm run test:watch

Privacy

Everything runs in your browser. The webcam stream never leaves the device. The Gemma model never makes a network request once loaded. Lumen itself never phones home, never collects telemetry, never logs your typed text. There is no account.

If you load Lumen from a remote host, your browser will fetch the static HTML, JS, and CSS bundle from that host (the same as visiting any web page). Beyond that, no data leaves your machine.


Safety, scope, and what Lumen is not

This is a prototype communication aid, not a medical device. Lumen has not been clinically validated. It is not a substitute for a professional AAC evaluation, an SLP-prescribed device, or medical-grade equipment for life-critical communication.

If you or someone you love depends on AAC for safety-critical communication (calling for help, expressing pain, signaling distress), Lumen should complement, not replace, a professional setup. The right framing for the prototype is "a free, no-install, no-subscription option for the millions of people who currently have nothing."

The next step beyond this prototype is a small co-design study with three to five ALS patients and their caregivers (remote, over video) to validate the latency, the dwell threshold, and the suggestion vocabulary. That work is planned, not done.


Acknowledgments

  • MediaPipe — face mesh, iris landmarks, on-device LLM inference
  • Casiez, Roussel & Vogel (2012) — the One-Euro filter that makes the gaze dot sit still
  • LiteRT community — hosted Gemma .task model files for browser inference
  • The AAC research community, especially the Beukelman/Mirenda Augmentative & Alternative Communication literature, and every clinician who has ever taken the time to teach a patient to type with their eyes
  • Vite Vere Offline — proof from the Gemma 3n Impact Challenge that one developer can ship an accessibility tool that genuinely changes lives

License

MIT. Do whatever helps users.
