Beta — This project is under active development. Expect rough edges and breaking changes.
A macOS desktop app for real-time speech-to-text transcription using local Whisper or NVIDIA Parakeet models. Everything runs on-device — no data leaves your machine — with optional cloud API integration for AI-powered summaries.
Transcripto captures both microphone and system audio simultaneously, transcribes live recordings and video files, detects speech via a custom Voice Activity Detection (VAD) engine, and generates real-time AI summaries of your transcripts. The core transcription runs locally using whisper.node or NVIDIA Parakeet models, with optional speaker diarization powered by sherpa-onnx.
- Local transcription — runs Whisper (GGML) and NVIDIA Parakeet models entirely on-device
- Dual transcription engine — choose between Whisper or NVIDIA Parakeet-TDT-0.6B-v3 for transcription
- Dual audio capture — records microphone and system audio (Zoom, Teams, etc.) simultaneously
- File transcription — transcribe audio and video files (MP4, MKV, MOV, AVI, WebM, FLV, WMV, and more) with automatic audio extraction via ffmpeg
- Real-time VAD — custom voice activity detection with configurable sensitivity and silence thresholds
- Multi-language — supports all languages available in Whisper and Parakeet models
- Live AI summaries — real-time transcript summarization during recording (via OpenAI-compatible API)
- Summary refinement — add corrections and notes to improve live summaries in real-time
- Post-recording summaries — generate comprehensive summaries after transcription completes
- Customizable summary templates — control summary format with customizable prompt templates
- Speaker diarization — optional speaker identification and labeling via sherpa-onnx
- Inline transcript editing — edit and refine individual segments and speaker names on the fly
- Audio waveform visualization — live waveform display during recording for both mic and system audio
- Pause/resume — pause and resume recording without losing context
- Global keyboard shortcuts — configurable hotkeys for record, pause, and mute operations
- Session persistence — automatically save and restore your transcription sessions and settings
- Microphone muting — mute/unmute microphone without stopping the recording
- Markdown export — save transcripts as Markdown with customizable filename and body templates
- Template variables — interpolate timestamp, date, title, transcript, and summary into exports
- Token tracking — monitor token usage for AI-generated summaries
- Dark mode — toggle between light and dark themes (follow system preference or manual override)
- Settings panels — configure AI providers, export formats, VAD sensitivity, keyboard shortcuts, and more
- Split-panel view — view transcript and live summary side-by-side during recording
- Model management — download, switch, and delete transcription models from the UI with progress tracking
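The template-variable feature above can be sketched as a simple placeholder substitution. This is an illustrative example only; the placeholder syntax (`{{name}}`) and the function name are assumptions, not Transcripto's actual API.

```typescript
// Hypothetical sketch of export-template interpolation: replace {{name}}
// placeholders with the matching variable, leaving unknown ones untouched.
type TemplateVars = {
  timestamp: string;
  date: string;
  title: string;
  transcript: string;
  summary: string;
};

function renderTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    name in vars ? vars[name as keyof TemplateVars] : match,
  );
}

const body = renderTemplate("# {{title}} ({{date}})\n\n{{transcript}}", {
  timestamp: "2024-01-01T10:00:00Z",
  date: "2024-01-01",
  title: "Standup",
  transcript: "Alice: good morning.",
  summary: "Short standup.",
});
// body === "# Standup (2024-01-01)\n\nAlice: good morning."
```

Leaving unknown placeholders intact (rather than deleting them) makes typos in a user-edited template easy to spot in the exported file.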
- macOS (Apple Silicon)
- Node.js 20+
- pnpm 10+
```bash
# Install dependencies
pnpm install

# Start in development mode
pnpm dev
```

On first launch, select and download a Whisper model. The Large v3 Turbo model is recommended for the best balance of speed and accuracy.
Transcripto needs two permissions in System Settings > Privacy & Security:
- Microphone — prompted automatically on first use
- Screen Recording / System Audio Recording — required to capture audio from other apps. Add the app manually, then restart.
| Command | Description |
|---|---|
| `pnpm dev` | Start Vite dev server + Electron |
| `pnpm build` | TypeScript check + Vite production build |
| `pnpm start` | Build and launch Electron |
| `pnpm test` | Run tests (Vitest) |
| `pnpm test:watch` | Run tests in watch mode |
| `pnpm dist` | Package as macOS .dmg |
```
electron/        Main process (CommonJS) — Whisper, IPC, model management
  services/      Whisper, diarization, audio file, download services
  workers/       Diarization worker thread
shared/          Types shared across both processes
src/             Renderer (React + TypeScript + Vite)
  components/    React components (shadcn/ui)
  hooks/         Audio capture, transcription, VAD, export hooks
  lib/           VAD engine, audio utilities, export formatting
public/          AudioWorklet processor
```
Audio pipeline: AudioWorklet (48kHz) -> Resample (16kHz) -> VAD -> IPC -> Whisper -> Transcript
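The resampling step in the pipeline above can be sketched in standalone form. The real app performs this in or around an AudioWorklet; this illustrative version simply averages every 3 input samples (48000 / 16000 = 3), which also acts as a crude low-pass filter. The function name is an assumption, not Transcripto's actual API.

```typescript
// Hypothetical sketch of the 48 kHz -> 16 kHz downsampling step.
// Averages each group of 3 samples rather than naively dropping them,
// which slightly reduces aliasing.
function downsample48kTo16k(input: Float32Array): Float32Array {
  const factor = 3; // 48000 / 16000
  const output = new Float32Array(Math.floor(input.length / factor));
  for (let i = 0; i < output.length; i++) {
    const base = i * factor;
    output[i] = (input[base] + input[base + 1] + input[base + 2]) / factor;
  }
  return output;
}
```

A production pipeline would typically use a proper windowed-sinc or polyphase resampler, but plain averaging is often adequate for speech fed into a 16 kHz recognition model.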
- Electron — desktop shell
- React + TypeScript — UI
- Vite — bundler
- @fugood/whisper.node — local Whisper inference
- sherpa-onnx-node — speaker diarization
- shadcn/ui + Tailwind CSS v4 — components and styling
- Vitest — testing
Contributions are welcome! Here's how to get started:

- Fork the repository
- Create a feature branch (`git checkout -b my-feature`)
- Make your changes
- Run tests (`pnpm test`)
- Commit your changes (`git commit -m "Add my feature"`)
- Push to your branch (`git push origin my-feature`)
- Open a Pull Request
Please make sure your code builds cleanly (`pnpm build`) and all tests pass before submitting.
MIT