Local Teleprompter

A voice-driven teleprompter that runs entirely on your laptop.

Open a script, hit start, and read out loud. The page scrolls under your voice, word by word, driven by NVIDIA's multilingual 600M-parameter speech model running locally. Read in English, Spanish, Japanese, or any of 35+ languages; the model detects the language automatically (or pin one with STT_LANGUAGE). Your voice never leaves the machine.

Quickstart

git clone <this repo>
cd local-teleprompter
./start.sh

That's it. Then open http://localhost:3000, pick a script, hit start, and read. Press Ctrl-C in the terminal to bring everything down.

On first run, start.sh will install dependencies — pulling torch and NVIDIA's NeMo toolkit takes a few minutes and several GB. Subsequent runs come up in seconds.

Why local?

Zero cloud bill. A teleprompter is a "20 minutes of mic" workload. At $0.01–0.05/min cloud STT pricing, that is $0.20–$1.00 per session, which adds up fast with daily use. Local is free.
No latency floor. End-of-utterance to final transcript is consistently <150ms on Apple Silicon, around 100ms on a real GPU. Word-level streaming starts within ~80ms of speech.
Privacy. Your voice and your script never leave the machine.
It's fun.

Requirements

macOS or Linux (Apple Silicon, Intel Mac, or NVIDIA Linux)
Python 3.10+
uv — install with curl -LsSf https://astral.sh/uv/install.sh | sh
Node 20+ and pnpm — npm i -g pnpm or brew install pnpm
Homebrew on macOS (used to install one binary if it isn't already on PATH)

A CUDA GPU is not required. On Apple Silicon the model runs on MPS automatically. CPU also works, just slower; a CUDA GPU gives the best performance.

Scripts

Scripts are stored in a local SQLite database at frontend/data/scripts.db. The DB is created on first launch and seeded with a sample script. You can:

Create, edit, and delete scripts in the library view.
Import any Markdown or plain text file (.md, .markdown, .txt). The first # heading becomes the title; the rest becomes the body.
Export any script as Markdown.

The DB is just a regular SQLite file — open it with any SQLite tool, back it up by copying it, or move it elsewhere with SCRIPTS_DB_PATH=/path/to/scripts.db.
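The import rule described above (first # heading becomes the title, the rest becomes the body) can be sketched roughly as follows. This is an illustration, not the project's actual importer: the parseImportedScript name, the ImportedScript shape, and the fallback title used when a file has no heading are all assumptions.

```typescript
// Hypothetical helper illustrating the import rule; not the project's real code.
interface ImportedScript {
  title: string;
  body: string;
}

function parseImportedScript(text: string, fallbackTitle = "Untitled"): ImportedScript {
  const lines = text.split(/\r?\n/);
  // A level-1 heading is a single '#' followed by whitespace, so "## Sub" is skipped.
  const idx = lines.findIndex((line) => /^#\s+\S/.test(line));
  if (idx === -1) {
    // Assumption: files without a heading keep all text as the body.
    return { title: fallbackTitle, body: text.trim() };
  }
  const title = lines[idx].replace(/^#\s+/, "").trim();
  // Everything except the heading line becomes the body.
  const body = [...lines.slice(0, idx), ...lines.slice(idx + 1)].join("\n").trim();
  return { title, body };
}
```

A plain .txt file would typically have no heading, so in this sketch the caller would pass something like the file name as fallbackTitle.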

How the cursor follows your voice

Speech recognition gives you words. Turning those into the right script position — even when you stumble, skip a sentence, or re-read a paragraph — is the interesting part. The matcher lives in frontend/lib/teleprompter/position-tracker.ts:

Forward-only under normal flow. Each new spoken word scans an 18-word lookahead from the current cursor.
Bigram confirmation for non-local jumps. A far match (3–17 words ahead) only commits if the previous spoken word also confirmed nearby — stops stopwords ("the", "is", "of") from yanking the cursor ten words ahead.
Tightly-scoped fuzzy matching. Levenshtein-1, but only for words ≥ 5 characters. Short words must match exactly, so the ↔ then doesn't sneak through.
Auto re-anchor. After 4 unmatched words in a row, scan a 6-word trailing window globally; commit a jump if ≥ 3 words align.
Double-click a word to manually snap the cursor there. Pause freezes it; resume picks up wherever you start reading again.
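The forward scan, bigram confirmation, and scoped fuzzy matching described above can be sketched as below. This is a simplified illustration under stated assumptions, not the code in position-tracker.ts: the function names and FUZZY_MIN_LENGTH are hypothetical, words are assumed to be lowercased already, the length rule is assumed to apply to both words, and "confirmed nearby" is interpreted as the previous spoken word matching one of the BIGRAM_WINDOW script words just before the candidate.

```typescript
// Illustrative sketch only; the real matcher may differ in detail.
const DEFAULT_LOOKAHEAD = 18;
const NEAR_JUMP = 2;
const BIGRAM_WINDOW = 3;
const FUZZY_MIN_LENGTH = 5; // hypothetical name for the ">= 5 characters" rule

/** True if a and b are within Levenshtein distance 1. */
function withinOneEdit(a: string, b: string): boolean {
  if (a === b) return true;
  if (Math.abs(a.length - b.length) > 1) return false;
  const [short, long] = a.length <= b.length ? [a, b] : [b, a];
  let i = 0;
  while (i < short.length && short[i] === long[i]) i++;
  // Same length: allow one substitution. Otherwise: one extra character in `long`.
  const restShort = short.length === long.length ? short.slice(i + 1) : short.slice(i);
  return restShort === long.slice(i + 1);
}

function wordsMatch(spoken: string, script: string): boolean {
  if (spoken === script) return true;
  // Short words must match exactly, so "the" never matches "then".
  if (spoken.length < FUZZY_MIN_LENGTH || script.length < FUZZY_MIN_LENGTH) return false;
  return withinOneEdit(spoken, script);
}

/** Returns the new cursor (index of the next expected script word). */
function advance(
  script: string[],
  cursor: number,
  spoken: string,
  prevSpoken: string | null,
): number {
  const end = Math.min(script.length, cursor + DEFAULT_LOOKAHEAD);
  for (let i = cursor; i < end; i++) {
    if (!wordsMatch(spoken, script[i])) continue;
    if (i - cursor <= NEAR_JUMP) return i + 1; // near match: commit directly
    // Far match: the previous spoken word must appear just before the candidate.
    const from = Math.max(0, i - BIGRAM_WINDOW);
    const confirmed =
      prevSpoken !== null && script.slice(from, i).some((w) => wordsMatch(prevSpoken, w));
    if (confirmed) return i + 1;
  }
  return cursor; // nothing acceptable: stay put (forward-only)
}
```

In this sketch, hearing "the" while the cursor sits on "quick" does not jump to a later "the" unless the word spoken just before it also lines up with the script near that position.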

If the matcher feels off for your style, the constants at the top of position-tracker.ts are the levers:

| Constant | Default | What it does |
| --- | --- | --- |
| `DEFAULT_LOOKAHEAD` | 18 | How many script words ahead a single spoken word can jump |
| `NEAR_JUMP` | 2 | Small jumps that don't need bigram confirmation |
| `BIGRAM_WINDOW` | 3 | How far back the prior-word confirmation looks |
| `REANCHOR_MISS_THRESHOLD` | 4 | Unmatched words in a row before re-anchor kicks in |
| `REANCHOR_WINDOW` | 6 | Trailing spoken-word window for re-anchoring |
| `REANCHOR_MIN_MATCHES` | 3 | Matches required to commit a re-anchor |

If the cursor stalls on mumbled words, bump NEAR_JUMP to 3. If the cursor jumps too aggressively, lower DEFAULT_LOOKAHEAD or raise REANCHOR_MIN_MATCHES.
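To see how the three REANCHOR_* constants interact, here is a rough sketch of a re-anchor pass. It is an assumption-laden illustration rather than the project's implementation: the reanchor function is hypothetical, words are compared exactly and position by position, and ties go to the earliest script offset.

```typescript
// Illustrative sketch only; position-tracker.ts may align words differently.
const REANCHOR_MISS_THRESHOLD = 4;
const REANCHOR_WINDOW = 6;
const REANCHOR_MIN_MATCHES = 3;

/**
 * recentSpoken: recently spoken words, oldest first.
 * misses: consecutive spoken words that failed to match.
 * Returns the new cursor, or null when no script position aligns well enough.
 */
function reanchor(script: string[], recentSpoken: string[], misses: number): number | null {
  if (misses < REANCHOR_MISS_THRESHOLD) return null;
  const tail = recentSpoken.slice(-REANCHOR_WINDOW);
  let best: { cursor: number; matches: number } | null = null;
  // Slide the trailing window over the whole script (a global scan).
  for (let start = 0; start + tail.length <= script.length; start++) {
    let matches = 0;
    for (let k = 0; k < tail.length; k++) {
      if (script[start + k] === tail[k]) matches++;
    }
    if (matches >= REANCHOR_MIN_MATCHES && (best === null || matches > best.matches)) {
      // Cursor lands just past the last word of the aligned window.
      best = { cursor: start + tail.length, matches };
    }
  }
  return best ? best.cursor : null;
}
```

Raising REANCHOR_MIN_MATCHES in a scheme like this makes a global jump require more aligned words, which trades recovery speed for fewer false jumps.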

Tinkering

Where the interesting parts live:

frontend/lib/teleprompter/position-tracker.ts — the matching algorithm.
frontend/components/teleprompter/ — library, editor, and prompter UI.
stt-server/server.py — the speech-to-text server.
agent/src/ — the glue that hands audio to the model and transcripts to the browser.

Credits

Speech model: NVIDIA's nemotron-3.5-asr-streaming-0.6b, a multilingual (35+ languages) cache-aware FastConformer-RNNT.
STT server scaffold: fastapi-nemotron-speech-streaming.
Frontend starter: livekit-examples/agent-starter-react.

License

MIT — do whatever you want with it.
