Lumen

Your eyes. Your voice.

Lumen is a free, browser-based eye-gaze communication system for people who can no longer move their hands. It runs entirely on the device. No servers, no subscription, no installation. Open a URL, look at letters, speak.

It's built for the Gemma 4 Good Hackathon (Kaggle, May 2026).

Try it

Lumen has two input modes, selected on the landing screen:

  • Gaze mode — asks for camera access, runs a 9-point calibration, then types by dwell. This is the intended daily-use flow for AAC users.
  • Pointer mode — no camera, no calibration. The cursor stands in for your gaze. Use this to review the keyboard flow, test the app for accessibility, or share a demo link with reviewers who can't grant camera access.

Word suggestions work immediately from a built-in AAC vocabulary (bigrams + prefix matching, no model needed). Optionally, load on-device Gemma for smarter suggestions: either point the URL field to a hosted .task file or pick one from disk. Gemma stays optional, not a gate.

Local dev

npm install
npm run dev     # http://localhost:5173
npm test        # unit tests (vitest)
npm run build   # production bundle into dist/

Deploy to Cloudflare Pages

npx wrangler login
npm run deploy  # builds + deploys dist/ to the "lumen" CF Pages project

On first run, wrangler will prompt to create the lumen project. After that, npm run deploy is one command. The repo includes wrangler.toml and public/_headers with the right MIME type for .wasm and long-cache rules for hashed assets.
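
For reference, a minimal sketch of what public/_headers needs to cover — the exact rules in the repo may differ, but Cloudflare Pages header files follow this shape. The first rule serves WebAssembly with the correct MIME type; the second marks hashed bundle assets as long-cacheable:

```
/*.wasm
  Content-Type: application/wasm

/assets/*
  Cache-Control: public, max-age=31536000, immutable
```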


The problem

About 30,000 Americans and roughly half a million people worldwide are living with ALS at any given time. Most will eventually lose the ability to move their hands. Tens of thousands more — survivors of brainstem stroke, severe cerebral palsy, late-stage MS, locked-in syndrome — face the same wall.

For these people, eye-gaze communication is not a convenience. It is the only way left to say I love you, I'm in pain, please don't.

The hardware that makes this possible exists. It is also priced as if it doesn't:

| Tool | Price | Reach |
| --- | --- | --- |
| Tobii Dynavox I‑Series | $8,000 – $20,000 | Wealthy households, well-funded clinics |
| EyeTech VT3 | $4,000 – $7,000 | Specialized clinics |
| Apple's iPadOS 18 Eye Tracking | Free | iPad-only · cursor mover only · no AAC stack |
| Lumen | $0 | Any device with a browser and a webcam |

Globally, fewer than 2% of the people who need eye-gaze communication have access to it. The barrier isn't physics. It's pricing, software, and a market that has decided this user doesn't matter enough.

We disagree.


What Lumen does

You sit in front of a laptop or tablet. The webcam watches your eyes. A face-mesh model running in your browser reads where you're looking. A nine-point calibration learns the mapping between your eyes and your screen. After that, Lumen is yours.

You compose by looking. Stare at a letter for ~0.8 seconds and it commits. Above the keyboard, a row of word suggestions — generated by Gemma 4 running on your device — predicts what you're trying to say from what you've typed and your conversation context, so you don't have to spell every word letter by letter. When the sentence is right, you look at Speak and the device says it aloud. Or, soon, you look at Send and it texts your daughter.

Nothing leaves your machine. Your face, your gaze, your unfinished sentences, your medical context — none of it touches a server.


Why Gemma 4 is load-bearing here

Lumen is not a chatbot wrapped around an AAC keyboard. Gemma 4 does three jobs the application could not exist without:

  1. Multimodal input. The system is a camera reading your face. Gemma 4's multimodal architecture is what makes it plausible to run vision-grounded reasoning on the same device that reads the iris.

  2. Native function calling. For a user who cannot move their hand, every meaningful output is a tool call. "Text my wife." "Call the nurse." "Turn off the light." "Increase the bed height." Gemma 4's first-class function-calling support is the substrate for the agentic action layer that turns Lumen from a typewriter into a body.

  3. On-device, offline-capable. The judges asked for impact and deployability. A medical-context AI tool that requires a cloud round-trip is a tool that fails the moment the family's WiFi drops at 2 a.m. Lumen runs entirely in WebGPU on the local machine via @mediapipe/tasks-genai. The user supplies a Gemma .task model file once; from then on, no network is required.

If you stripped any one of these properties out, Lumen wouldn't exist. The Gemma 3 generation made the on-device piece possible. Gemma 4 makes the multimodal-plus-agentic loop possible.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         Browser tab                              │
│                                                                  │
│   webcam ──► <video> ──► MediaPipe FaceLandmarker (WebGPU)       │
│                                  │                               │
│                                  ▼                               │
│                       478 face landmarks + iris                  │
│                                  │                               │
│                          gazeEstimator                           │
│                  (image-space iris displacement,                 │
│                  averaged over both eyes, scaled)                │
│                                  │                               │
│                                  ▼                               │
│                       OneEuroFilter (1.2 Hz)                     │
│                                  │                               │
│                                  ▼                               │
│                     gazeStore  (pub/sub)                         │
│                                  │                               │
│                  ┌───────────────┼───────────────┐               │
│                  ▼               ▼               ▼               │
│           Calibration     DwellDetector    LandmarkOverlay       │
│           (9-point        (sticky 15%      (debug)               │
│            affine)        margin)                                │
│                  │               │                               │
│                  ▼               ▼                               │
│           Affine model    Keyboard / Suggestion strip            │
│                                  │                               │
│                                  ▼                               │
│                          wordPredictor                           │
│                          ┌───────┴───────┐                       │
│                          ▼               ▼                       │
│                  Gemma 4 (WebGPU)   Static fallback              │
│                  via @mediapipe/    (130-word AAC                │
│                   tasks-genai       frequency dict)              │
│                                                                  │
│                                  ▼                               │
│                       Web Speech API · TTS                       │
└──────────────────────────────────────────────────────────────────┘

Gaze pipeline

| Module | Purpose |
| --- | --- |
| gaze/faceLandmarker.ts | Singleton MediaPipe FaceLandmarker, GPU-delegated |
| gaze/gazeEstimator.ts | 478 landmarks → image-space iris displacement → user-perspective gaze. Includes a debug snapshot (eye box, eye center, iris center per eye) for the diagnostic overlay. |
| gaze/oneEuroFilter.ts | Casiez-Roussel-Vogel One-Euro filter — heavy smoothing during fixation, light smoothing during saccades, no perceptible lag. See the first sketch below. |
| gaze/calibration.ts | 9-point calibration with an affine least-squares fit via 3×3 normal equations and Cramer's rule. See the second sketch below. |
| gaze/dwellDetector.ts | Enter → ramp → fire → re-arm dwell logic with a 15% sticky-margin hit test for the currently dwelled target, so a small wobble doesn't reset the timer. See the third sketch below. |
| gaze/gazeStore.ts | Module-level pub/sub. Publishes both filtered and unfiltered gaze so calibration can read clean samples while the keyboard reads smoothed gaze. |
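
For readers new to the One-Euro filter, here is a minimal single-axis sketch of the idea behind gaze/oneEuroFilter.ts. The class shape and parameter defaults are illustrative, not the repo's exact API; only the 1.2 Hz minimum cutoff comes from the diagram above:

```ts
// One-Euro filter (Casiez, Roussel & Vogel, 2012), single axis.
// The cutoff frequency rises with signal speed: slow drift during fixation is
// smoothed hard, fast saccades pass through with little lag.
class OneEuroFilter {
  private prevValue: number | null = null;
  private prevDeriv = 0;

  constructor(
    private minCutoff = 1.2, // Hz — the value quoted in the architecture diagram
    private beta = 0.01,     // speed coefficient (illustrative default)
    private dCutoff = 1.0,   // cutoff for the derivative estimate
  ) {}

  private static alpha(cutoff: number, dt: number): number {
    const tau = 1 / (2 * Math.PI * cutoff);
    return 1 / (1 + tau / dt);
  }

  /** Filter one sample; dt is the time since the previous sample, in seconds. */
  filter(value: number, dt: number): number {
    if (this.prevValue === null) {
      this.prevValue = value;
      return value;
    }
    // Smoothed derivative of the signal.
    const rawDeriv = (value - this.prevValue) / dt;
    const aD = OneEuroFilter.alpha(this.dCutoff, dt);
    const deriv = aD * rawDeriv + (1 - aD) * this.prevDeriv;
    // Cutoff grows with speed: heavy smoothing on fixations, light on saccades.
    const cutoff = this.minCutoff + this.beta * Math.abs(deriv);
    const a = OneEuroFilter.alpha(cutoff, dt);
    const filtered = a * value + (1 - a) * this.prevValue;
    this.prevValue = filtered;
    this.prevDeriv = deriv;
    return filtered;
  }

  reset(): void {
    this.prevValue = null;
    this.prevDeriv = 0;
  }
}
```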
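
The affine calibration is an ordinary least-squares fit; a hedged sketch of that math as described for gaze/calibration.ts (type and function names here are illustrative):

```ts
// One calibration sample: filtered gaze (gx, gy) captured while the user
// fixated a known screen target (sx, sy).
type Sample = { gx: number; gy: number; sx: number; sy: number };

// Solve a 3×3 linear system A·p = b with Cramer's rule.
function solve3x3(A: number[][], b: number[]): [number, number, number] {
  const det = (m: number[][]) =>
    m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
    m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
    m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
  const D = det(A);
  if (Math.abs(D) < 1e-9) throw new Error('degenerate calibration samples');
  const replaceCol = (i: number) =>
    A.map((row, r) => row.map((v, c) => (c === i ? b[r] : v)));
  return [det(replaceCol(0)) / D, det(replaceCol(1)) / D, det(replaceCol(2)) / D];
}

// Least-squares affine fit, one axis at a time: screen = a·gx + b·gy + c.
// Builds the 3×3 normal equations AᵀA·p = Aᵀy from the nine samples.
function fitAffine(samples: Sample[]) {
  const fitAxis = (target: (s: Sample) => number) => {
    const AtA = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
    const Aty = [0, 0, 0];
    for (const s of samples) {
      const row = [s.gx, s.gy, 1];
      for (let i = 0; i < 3; i++) {
        Aty[i] += row[i] * target(s);
        for (let j = 0; j < 3; j++) AtA[i][j] += row[i] * row[j];
      }
    }
    return solve3x3(AtA, Aty); // [a, b, c] for this axis
  };
  return { x: fitAxis((s) => s.sx), y: fitAxis((s) => s.sy) };
}
```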
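
And a toy version of the enter → ramp → fire → re-arm loop with the sticky margin. Names and structure are illustrative; only the ~0.8 s dwell time and the 15% margin come from the text:

```ts
type Rect = { x: number; y: number; w: number; h: number };

const DWELL_MS = 800;       // ~0.8 s of steady gaze commits a key
const STICKY_MARGIN = 0.15; // once dwelling, the hit box grows by 15%

// Hit test with an optional margin, expressed as a fraction of the rect's size.
function hits(rect: Rect, gx: number, gy: number, margin = 0): boolean {
  const mx = rect.w * margin;
  const my = rect.h * margin;
  return gx >= rect.x - mx && gx <= rect.x + rect.w + mx &&
         gy >= rect.y - my && gy <= rect.y + rect.h + my;
}

class DwellDetector {
  private target: string | null = null;
  private enteredAt = 0;
  private fired = false;

  /** Call once per frame; returns the id of a key that just fired, if any. */
  update(gaze: { x: number; y: number } | null, keys: Map<string, Rect>, now: number): string | null {
    if (!gaze) {
      this.target = null; // lost gaze: drop the current target
      return null;
    }
    // While dwelling, keep the current target alive inside its enlarged box,
    // so a small wobble doesn't reset the ramp.
    const current = this.target ? keys.get(this.target) : undefined;
    if (current && hits(current, gaze.x, gaze.y, STICKY_MARGIN)) {
      if (!this.fired && now - this.enteredAt >= DWELL_MS) {
        this.fired = true; // fire once; re-arms only after leaving the key
        return this.target;
      }
      return null;
    }
    // Otherwise look for a new target with the strict (un-enlarged) hit box.
    for (const [id, rect] of keys) {
      if (hits(rect, gaze.x, gaze.y)) {
        this.target = id;
        this.enteredAt = now; // enter: start the ramp
        this.fired = false;   // re-arm
        return null;
      }
    }
    this.target = null;
    return null;
  }
}
```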

LLM pipeline

| Module | Purpose |
| --- | --- |
| llm/gemma.ts | Lazy LlmInference singleton. Loads from a file picker or a URL. Tracks status: idle → loading → ready / error. Wraps the Gemma chat template (<start_of_turn>user…). See the loading sketch below. |
| llm/wordPredictor.ts | High-level facade, debounced at 220 ms. Always emits a fast static result first, then replaces it with Gemma's output when the model finishes. Cancels stale generations. |
| llm/staticPredictor.ts | 130-word AAC frequency vocabulary + prefix matching. Used when Gemma isn't loaded so the keyboard is never blank. See the fallback sketch below. |
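
The loading path in llm/gemma.ts uses the public @mediapipe/tasks-genai API; a trimmed sketch of what that looks like, with the real module's status tracking omitted and the prompt text invented for illustration:

```ts
import { FilesetResolver, LlmInference } from '@mediapipe/tasks-genai';

let llm: LlmInference | null = null;

// Lazily create a single LlmInference instance from a user-picked .task file.
export async function loadGemma(modelFile: File): Promise<LlmInference> {
  if (llm) return llm;
  const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm',
  );
  llm = await LlmInference.createFromOptions(genai, {
    baseOptions: { modelAssetBuffer: new Uint8Array(await modelFile.arrayBuffer()) },
    maxTokens: 128,
    temperature: 0.6,
    topK: 40,
  });
  return llm;
}

// Wrap the typed text in the Gemma chat template and ask for completions.
export async function suggestWords(typed: string): Promise<string> {
  if (!llm) throw new Error('Gemma not loaded yet');
  const prompt =
    `<start_of_turn>user\nSuggest likely next words for: "${typed}"<end_of_turn>\n` +
    `<start_of_turn>model\n`;
  return llm.generateResponse(prompt);
}
```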
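
And the fallback predictor in miniature — the real vocabulary has 130 entries; the handful below is only for illustration:

```ts
// A tiny slice of an AAC frequency vocabulary, most frequent first.
const VOCAB = ['I', 'you', 'want', 'need', 'help', 'pain', 'water', 'yes', 'no', 'thank'];

// Return up to `limit` suggestions: prefix matches for the word being typed,
// or the top unigrams when there is no partial word to complete.
export function staticSuggest(text: string, limit = 3): string[] {
  const partial = text.split(/\s+/).pop()?.toLowerCase() ?? '';
  const matches = partial
    ? VOCAB.filter((w) => w.toLowerCase().startsWith(partial))
    : VOCAB;
  return (matches.length ? matches : VOCAB).slice(0, limit);
}
```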

UI

| Component | Purpose |
| --- | --- |
| App.tsx | State machine: boot → needs-calibration → calibrating → ready |
| GazeEngine.tsx | Headless: webcam capture + rAF detection loop + push to store |
| GazeDot.tsx | Glow-tracking dot overlay |
| LandmarkOverlay.tsx | Diagnostic: draws cyan eye boxes + iris dots so you can confirm MediaPipe is finding your eyes |
| CalibrationScreen.tsx | 9-point calibration flow with settle/capture phases |
| Keyboard.tsx | Suggestion strip + 6×5 letter grid + command row |
| GemmaLoader.tsx | Header status pill + file picker for the Gemma model |
| DebugHud.tsx | Live numerical readout: face / raw / calib / conf / fps |

Status

Lumen is a working prototype. Some pieces are solid; others are honestly v0.

| Piece | State |
| --- | --- |
| MediaPipe face mesh + iris detection | Working |
| Gaze pipeline math (image-space, both-eye averaging, regression-tested) | Working |
| One-Euro smoothing | Working |
| 9-point affine calibration | Working — the math is unit-tested against synthetic transforms |
| Dwell selection with sticky-margin hysteresis | Working |
| Suggestion strip with static fallback | Working |
| Gemma word prediction via @mediapipe/tasks-genai | Working once you supply a model file |
| Web Speech API TTS for "Speak" | Working |
| Function calling (text family / call caregiver / smart home) | Not yet — task #14 |
| Mobile / tablet UX | Not yet — desktop browser only for now |
| Real ALS / locked-in user testing | Not yet — needs a small co-design study |

The honest version: the calibration and dwell math are real and tested. The one piece that may need live tuning during the demo is the iris-to-screen scale factor — the affine fit normally absorbs it, but if the dot undershoots the corners, recalibrate. There is a diagnostic mode (cyan eye boxes + green iris dots) on the pre-calibration screen that lets you visually verify that MediaPipe is locking onto your eyes before you commit to a calibration.

Tests

$ npm test

 Test Files  5 passed (5)
      Tests  33 passed (33)

Includes:

  • 6 affine-calibration tests (recovers known transforms, stable under noise, throws on degenerate samples)
  • 9 dwell detector tests (enter, ramp, fire, re-arm, target switch, sticky margin in/out, null gaze)
  • 4 One-Euro filter tests (dampens microsaccades, tracks fast ramps, reset clears state)
  • 7 gaze estimator tests with synthetic landmarks — including a regression test ("REGRESSION: horizontal motion is NOT cancelled by averaging both eyes") that guards a sign bug we hit during development
  • 6 static predictor tests (top unigrams, prefix matching, fallback on no match)

Run it locally

git clone <this repo>
cd lumen
npm install
npm run dev
# → open http://localhost:5173/ in Chrome or Edge

You'll need a webcam. The MediaPipe face model (~3 MB) downloads from Google's CDN on first run and is then cached.

Loading Gemma (optional but recommended)

The keyboard works without Gemma — there's a static frequency-based fallback predictor — but Gemma's word prediction is much smarter. To enable it:

  1. Download a Gemma .task model file from the LiteRT community on Hugging Face. For a desktop browser, Gemma 3 1B IT (~1 GB) or Gemma 4 IT (when available) are good choices.
  2. In Lumen, click the Load Gemma pill in the top-right of the header.
  3. Pick the .task file. The pill walks through Loading… → Initialising WASM runtime… → Loading Gemma into the GPU… → Gemma ready · 1.2 GB.
  4. Type a few letters in the keyboard. The suggestion chips above the letters will switch from blue (static) to green (Gemma) as completions arrive.

Production build

npm run build
# → dist/index.html, dist/assets/index.js (~410 KB), dist/assets/index.css

Tests

npm test         # run once
npm run test:watch

Privacy

Everything runs in your browser. The webcam stream never leaves the device. The Gemma model never makes a network request once loaded. Lumen itself never phones home, never collects telemetry, never logs your typed text. There is no account.

If you load Lumen from a remote host, your browser will fetch the static HTML, JS, and CSS bundle from that host (the same as visiting any web page). Beyond that, no data leaves your machine.


Safety, scope, and what Lumen is not

This is a prototype communication aid, not a medical device. Lumen has not been clinically validated. It is not a substitute for a professional AAC evaluation, an SLP-prescribed device, or medical-grade equipment for life-critical communication.

If you or someone you love depends on AAC for safety-critical communication (calling for help, expressing pain, signaling distress), Lumen should complement, not replace, a professional setup. The right framing for the prototype is "a free, no-install, no-subscription option for the millions of people who currently have nothing."

The next step beyond this prototype is a small co-design study with three to five ALS patients and their caregivers (remote, over video) to validate the latency, the dwell threshold, and the suggestion vocabulary. That work is planned, not done.


Acknowledgments

  • MediaPipe — face mesh, iris landmarks, on-device LLM inference
  • Casiez, Roussel & Vogel (2012) — the One-Euro filter that makes the gaze dot sit still
  • LiteRT community — hosted Gemma .task model files for browser inference
  • The AAC research community, especially the Beukelman/Mirenda Augmentative & Alternative Communication literature, and every clinician who has ever taken the time to teach a patient to type with their eyes
  • Vite Vere Offline — proof from the Gemma 3n Impact Challenge that one developer can ship an accessibility tool that genuinely changes lives

License

MIT. Do whatever helps users.
