SignFlow is a Chrome extension + API stack that listens to web-conference audio, streams it to an AI-driven backend, and overlays sign-language animations on top of any tab so Deaf or hard-of-hearing participants can follow the conversation in real time.
The project currently contains:
- Extension frontend (Manifest V3) – `manifest.json`, `popup.*`, `contentScript.js`, `overlay.css`, `service-worker.js`, and assets under `assets/`.
- Backend service – `backend/`, a Node 18+ app powered by Express, Gemini APIs, and Qdrant-compatible sign lookup logic.
- Demo assets – placeholder icon PNGs and WebM sign clips for 10 core glosses (HELLO, TODAY, MEETING, TEAM, PROJECT, QUESTION, HELP, THANK-YOU, GOOD, LATER) to visualize the experience before the AI pipeline is fully integrated. These clips are mirrored to Firebase Storage so the backend can return HTTPS URLs.
Use this README as the canonical reference for architecture, file locations, and remaining work for future contributors.
```
SignFlow Browser extension/
├── assets/
│   ├── icons/icon-{16,48,128}.png   # Extension action icons
│   └── signs/*.webm                 # Demo sign animations
├── backend/
│   ├── data/signGlosses.json        # Local fallback catalogue of gloss metadata
│   ├── src/
│   │   ├── config.js                # Env + port + external service config
│   │   ├── logger.js                # Pino with pretty logging in dev
│   │   ├── routes/signflowRoutes.js # /transcribe, /translate, /sign-sequence endpoints
│   │   └── services/                # Gemini + Qdrant clients and sequencing pipeline
│   ├── .env.example                 # Reference environment variables
│   └── README.md                    # Backend-specific setup docs
├── contentScript.js                 # Injected overlay + Web Audio capture
├── overlay.css                      # Styling for floating video box
├── manifest.json                    # MV3 definition
├── popup.html / popup.css / popup.js # Control UI for enabling/disabling SignFlow
└── service-worker.js                # Extension background logic + backend bridge
```
| Area | Details | Files |
|---|---|---|
| MV3 wiring | manifest, action icons, popup, content script, background worker, CSP-safe assets | manifest.json, assets/ |
| Popup UI | Toggle switch, status indicators (idle/pending/listening/streaming/error), chunk counter, last gloss list, overlay reset, toast messaging | popup.html, popup.css, popup.js |
| Audio capture | Web Audio microphone capture, MediaRecorder chunking, analyser-based level meter, permission handling | contentScript.js |
| Streaming | Each audio chunk is encoded to base64 and posted to /api/v1/sign-sequence (configurable API base), with automatic fallback to deterministic demo glosses if the backend errors | contentScript.js, service-worker.js |
| Overlay | Draggable floating video card with live captions, status header, mic level visual, and per-gloss video playback sourcing either bundled assets or backend URLs. Size presets (S/M/L) are applied instantly and synced via storage. | contentScript.js, overlay.css, assets/signs/ |
| Backend + overlay controls | Popup lets you set/test the backend base URL, adjust the overlay size preset, and surfaces the last error message whenever the API fails, so QA/devs can switch environments without rebuilding | popup.html, popup.js, service-worker.js |
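The size presets described above could be modeled as a simple lookup applied to the overlay element. This is an illustrative sketch only — the preset names map to the popup's S/M/L options, but the pixel values and function names are assumptions, not the extension's actual constants:

```javascript
// Hypothetical mapping of popup size presets to overlay dimensions.
// The real contentScript.js may use different values; these are illustrative.
const OVERLAY_PRESETS = {
  S: { width: 220, height: 165 },
  M: { width: 320, height: 240 },
  L: { width: 440, height: 330 },
};

// Resolve a preset name (falling back to Medium) into CSS-ready styles,
// which the content script could assign to the floating card element.
function overlayStyleForPreset(preset) {
  const size = OVERLAY_PRESETS[preset] ?? OVERLAY_PRESETS.M;
  return { width: `${size.width}px`, height: `${size.height}px` };
}
```

Keeping the preset as a single stored key (rather than raw dimensions) is what lets the popup and content script stay in sync through `chrome.storage` with one small value.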
- Popup toggle (`popup.js`) sends `popup:toggle` messages to `service-worker.js`.
- Service worker injects `contentScript.js` if absent, updates state in `chrome.storage`, and instructs the content script to start or stop capturing.
- Content script uses `MediaRecorder` to emit ~750 ms chunks → base64 → `content:audio-chunk` to the service worker.
- Service worker posts each chunk to `POST {backendEndpoint}/sign-sequence` (the endpoint defaults to `http://localhost:5055/api/v1` but can be edited/tested from the popup) and receives:

  ```json
  {
    "transcript": "...",
    "glossSequence": ["HELLO", "TODAY", "MEETING"],
    "videos": [ { "gloss": "HELLO", "videoUrl": "https://.../hello.webm" } ]
  }
  ```

- The content script receives `background:play-signs` with either `videos` (preferred) or `glosses` and plays the clips inside the overlay.
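The chunk-posting step with its demo fallback can be sketched roughly as follows. The function names, the fallback rotation rule, and the error handling here are illustrative, not the actual `service-worker.js` identifiers:

```javascript
// Illustrative sketch of the service worker's per-chunk request; the real
// service-worker.js logic and fallback rule may differ.
const DEMO_GLOSSES = [
  'HELLO', 'TODAY', 'MEETING', 'TEAM', 'PROJECT',
  'QUESTION', 'HELP', 'THANK-YOU', 'GOOD', 'LATER',
];

// Deterministic demo fallback: derive a short gloss sequence from the
// chunk index so the overlay always has something to play offline.
function demoGlossesFor(chunkIndex) {
  const start = chunkIndex % DEMO_GLOSSES.length;
  return [0, 1, 2].map((i) => DEMO_GLOSSES[(start + i) % DEMO_GLOSSES.length]);
}

// Post one base64 audio chunk; fall back to demo glosses on any error.
async function sendChunk(endpoint, audioBase64, mimeType, chunkIndex) {
  try {
    const res = await fetch(`${endpoint}/sign-sequence`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ audioBase64, mimeType }),
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return await res.json();
  } catch {
    return { glossSequence: demoGlossesFor(chunkIndex), videos: [] };
  }
}
```

The key property is that a backend outage degrades to canned glosses instead of a frozen overlay, which keeps demos resilient.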
- Production sign media – replace the placeholder WebM clips with the Grok-generated catalogue (40–50 clips) and refine compression/looping.
- Permissions polishing – consider microphone capture persistence (offscreen document) to survive tab refreshes without requiring the popup toggle each time.
- Advanced overlay personalization – add theme/opacity presets and optional captions for transcripts when backend returns them.
See backend/README.md for setup, but here is the quick summary.
- Express 5 for HTTP routing (`src/index.js`).
- Pino for structured logging, with `pino-pretty` in development.
- Google Generative AI client for:
  - `transcribeAudio` – speech-to-text with `gemini-2.0-flash` (configurable).
  - `simplifySentence` – prompts for ASL-friendly keywords/gloss sequences as JSON (default `gemini-2.0-flash`).
  - Both methods include deterministic fallbacks when `GEMINI_API_KEY` is not set.
- Qdrant (optional) for vector search over gloss embeddings. When absent, the `SignRepository` falls back to `data/signGlosses.json` and a keyword-overlap heuristic.
- Firebase Storage bucket (default `pak-drive.appspot.com`) hosts the sign animation videos/CDN; `scripts/uploadAssets.js` pushes local assets and makes them public.
- CDN abstraction via `SIGNFLOW_CDN_BASE_URL` so video URLs can be served from Firebase Storage, CloudFront, etc.
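The keyword-overlap fallback mentioned above could look roughly like the following. This is a sketch in the spirit of the local `SignRepository` mode — the function names and the scoring rule are assumptions, not the actual backend code:

```javascript
// Illustrative keyword-overlap heuristic over a local gloss catalogue
// (e.g., entries loaded from data/signGlosses.json). The real scoring
// in backend/src/services may differ.
function scoreGloss(entry, keywords) {
  const entryWords = new Set(
    [entry.gloss, ...(entry.keywords ?? [])].map((w) => w.toLowerCase())
  );
  return keywords.filter((k) => entryWords.has(k.toLowerCase())).length;
}

// Rank catalogue entries by shared keywords, dropping zero-score entries.
function matchGlosses(catalogue, keywords, limit = 5) {
  return catalogue
    .map((entry) => ({ entry, score: scoreGloss(entry, keywords) }))
    .filter((m) => m.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((m) => m.entry.gloss);
}
```

A heuristic like this keeps the pipeline usable with no Qdrant instance at all, at the cost of exact-match-only recall.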
| Endpoint | Body | Response | Purpose |
|---|---|---|---|
| `POST /api/v1/transcribe` | `{ audioBase64, mimeType, locale }` | `{ text, locale, confidence, provider }` | Direct transcription (used if another client wants STT only). |
| `POST /api/v1/translate` | `{ text }` | `{ normalizedText, keywords, glossSequence, provider }` | Text-only simplification/gloss extraction. |
| `POST /api/v1/sign-sequence` | `{ audioBase64?, transcript?, mimeType?, locale? }` | `{ transcript?, normalizedText, keywords, glossSequence, videos[], providers }` | Full pipeline – if `audioBase64` exists it calls STT first, otherwise uses the provided transcript. |
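For a quick smoke test of the full pipeline from Node 18+ (which ships a global `fetch`), a minimal client might look like this. The `fetchImpl` parameter is an illustrative addition for testability, not part of the API:

```javascript
// Minimal client for the sign-sequence endpoint, assuming the backend from
// `npm run dev` is listening on localhost:5055. fetchImpl is injectable so
// the function can be exercised without a live server.
async function requestSignSequence(
  transcript,
  base = 'http://localhost:5055/api/v1',
  fetchImpl = fetch
) {
  const res = await fetchImpl(`${base}/sign-sequence`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ transcript }),
  });
  if (!res.ok) throw new Error(`sign-sequence failed: HTTP ${res.status}`);
  return res.json(); // { transcript?, normalizedText, keywords, glossSequence, videos, providers }
}
```

Passing only `transcript` skips the STT step, which makes this handy for exercising the translate-and-map layers in isolation.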
```bash
cd backend
npm install
cp .env.example .env   # already populated in repo; adjust if deploying elsewhere
npm run sync:assets    # uploads assets/signs to Firebase Storage
npm run sync:qdrant    # creates & upserts into the Qdrant collection
npm run dev            # starts on http://localhost:5055
```

service-worker.js expects the base URL http://localhost:5055/api/v1; after deploying, set the popup “Backend endpoint” to your public URL.
- Real Gemini integration – update `.env` with valid keys. The code is production-ready but currently works through mock transcripts/glosses until keys are provided.
- Qdrant collection – create the `signflow_signs` collection (or update `QDRANT_COLLECTION`), ingest embeddings for the 40–50 AI-generated sign clips, and verify the `SignRepository` returns robust matches. Ingestion scripts are not yet included.
- Video CDN – populate `SIGNFLOW_CDN_BASE_URL` with the actual storage path for the MP4/WebM assets generated by the AI/ML phase. The frontend will automatically load those URLs when they are returned by the API.
- Security tightening – add API authentication (e.g., a bearer token) once the service is exposed beyond localhost.
- Latency tuning – consider batching audio chunks (WebSocket streaming) once backend infrastructure is ready; the current REST call per chunk is acceptable for the MVP but not optimal for production.
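One way to approach the batching idea above is a small accumulator that flushes every N chunks. This is a sketch only — the flush size, class name, and callback shape are assumptions about a future design, not existing code:

```javascript
// Illustrative chunk batcher: collect base64 audio chunks and flush them
// together, reducing per-chunk HTTP overhead. Flush size and the onFlush
// callback (e.g., one POST or one WebSocket frame per batch) are assumptions.
class ChunkBatcher {
  constructor(flushSize, onFlush) {
    this.flushSize = flushSize;
    this.onFlush = onFlush;
    this.pending = [];
  }

  // Queue a chunk; trigger a flush once the batch is full.
  add(chunkBase64) {
    this.pending.push(chunkBase64);
    if (this.pending.length >= this.flushSize) this.flush();
  }

  // Emit whatever is queued (also useful on stop/teardown).
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.onFlush(batch);
  }
}
```

An explicit `flush()` on capture stop matters so the tail of an utterance is not stranded in the queue.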
- Run `npm run lint` inside `backend/` to catch syntax errors.
- Load the extension as an unpacked Chrome extension: `chrome://extensions` → enable Developer Mode → “Load unpacked” → select this folder.
- Open the popup, toggle ON, and allow microphone access.
- Set the backend endpoint under “Backend endpoint” (defaults to `http://localhost:5055/api/v1`) and hit Test to confirm connectivity.
- Pick an overlay size preset that works with your call layout (Small/Medium/Large).
- With the backend running, observe requests to `/api/v1/sign-sequence` in DevTools → Network.
- Use the “Show overlay again” button if the draggable card is lost.
- Frontend – start from `manifest.json` to understand file wiring. UI code is vanilla JS/HTML/CSS to keep hackathon setup simple; swapping to a framework is possible but not planned for the MVP.
- Backend – `SignPipeline` orchestrates transcription → simplification → video mapping; improve each layer independently (e.g., plug in better translation prompts, add caching, etc.).
- AI/ML workstream – use `backend/data/signGlosses.json` as a schema reference for the Grok-generated dataset. Embeddings should flow into Qdrant, and the CDN paths should match the `videoFilename` values.
- Open issues:
  - Multi-language support: add locale detection and translation to/from ASL gloss.
  - Bi-directional signing: plan a webcam-to-text pipeline (see the proposal’s future scope).
  - Monitoring: add observability (trace IDs, metrics) once deployed.
Feel free to open an issue or create a PR when picking up any of the above tasks so the team can coordinate work across Frontend, Backend, and AI tracks.