Ditto

Voice-to-text desktop app for Windows.

Press a shortcut, talk, press it again. The text appears wherever your cursor is.

License: MIT | Platform: Windows | Electron

Ditto lives in the system tray and shows a floating, draggable pill. Hit the global shortcut, speak, hit it again — Ditto transcribes locally with whisper.cpp and pastes the text into whatever app has focus.

  • Local-only. Audio never leaves your machine.
  • GPU-accelerated. Bundled CUDA build for NVIDIA GPUs (CPU fallback works too).
  • Out of the way. Frameless pill with click-through, always on top, tray icon for control.
  • Configurable. Custom shortcut, audio device, model, theme, and more.

Status

Early personal project. Windows-first. macOS and Linux are not targeted yet, but where possible the code avoids Windows-only APIs that have no cross-platform fallback.

Requirements

  • Windows 10 or 11 (x64)
  • Node.js 20+
  • For GPU acceleration: NVIDIA GPU with recent drivers (CUDA Toolkit not required — cuDNN/cuBLAS DLLs are bundled with whisper.cpp)

Quick start

git clone https://github.com/asantinos/ditto.git
cd ditto
npm install
npm run setup:whisper   # downloads whisper.cpp + base model
npm run dev

Default shortcut: Ctrl+Shift+Space — press to record, press again to stop and paste.
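
The press-to-start, press-to-stop behaviour can be pictured as a two-state toggle. The sketch below is only an illustration, not Ditto's actual code: `createRecordingToggle`, `RecordingState`, and the two callbacks are hypothetical names. In an Electron app the returned function would typically be passed as the callback to `globalShortcut.register`.

```typescript
// Hypothetical names throughout; this is a sketch, not code from the Ditto repository.
type RecordingState = "idle" | "recording";

// Returns a function to call once per shortcut press.
// The first press starts recording, the next press stops it, and so on.
function createRecordingToggle(
  onStart: () => void,
  onStop: () => void,
): () => RecordingState {
  let state: RecordingState = "idle";
  return () => {
    if (state === "idle") {
      state = "recording";
      onStart();
    } else {
      state = "idle";
      onStop();
    }
    return state;
  };
}

// Usage: each call simulates one press of the shortcut.
const press = createRecordingToggle(
  () => console.log("recording started"),
  () => console.log("recording stopped, transcribing"),
);
press();
press();
```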

Open settings from the tray icon: right-click and choose "Ajustes…" (Spanish for "Settings…"), or double-click the icon.

Stack

  • Electron 39 + React 19 + TypeScript (strict mode)
  • electron-vite for bundling, ESM throughout
  • whisper.cpp as an external binary, spawned from main (no native bindings)
  • @nut-tree-fork/nut-js for simulating Ctrl+V into the active window
  • electron-store for settings persistence
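
To make the "external binary, no native bindings" choice concrete, here is a rough sketch of how a main process might call whisper-cli and clean up its output. The helper names (`buildWhisperArgs`, `parseTranscript`, `transcribe`) are hypothetical and the real code may differ. The flags used (`-m`, `-f`, `-l`, `-nt`) are standard whisper.cpp CLI options for model path, input file, language, and no timestamps.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical helper: builds the whisper-cli argument list.
// -m model file, -f input WAV, -l spoken language, -nt suppress timestamps.
function buildWhisperArgs(modelPath: string, wavPath: string, language: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-l", language, "-nt"];
}

// Hypothetical helper: collapses whisper-cli stdout into a single line of text.
// Also strips "[hh:mm:ss.mmm --> hh:mm:ss.mmm]" prefixes in case -nt was not passed.
function parseTranscript(stdout: string): string {
  const timestamp = /^\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]\s*/;
  return stdout
    .split(/\r?\n/)
    .map((line) => line.replace(timestamp, "").trim())
    .filter((line) => line.length > 0)
    .join(" ");
}

// Runs the binary and resolves with the cleaned transcript.
// binaryPath would point at the whisper-cli.exe placed under resources/ by the setup script.
async function transcribe(
  binaryPath: string,
  modelPath: string,
  wavPath: string,
  language: string,
): Promise<string> {
  const { stdout } = await execFileAsync(
    binaryPath,
    buildWhisperArgs(modelPath, wavPath, language),
    { encoding: "utf8", windowsHide: true },
  );
  return parseTranscript(stdout);
}
```

Spawning a separate process keeps the Electron build free of native modules to compile, at the cost of a temporary file and a process launch per transcription.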

Scripts

Command                    What it does
npm run dev                Start in development mode
npm run typecheck          TypeScript on both main and renderer configs
npm run lint               ESLint with cache
npm run test:transcribe    One-shot transcription test on jfk.wav
npm run build:unpack       Build to dist/win-unpacked/ (no installer)
npm run build:win          Build the full NSIS installer to dist/

Architecture

src/
  main/         Electron main process (ESM): windows, tray, IPC, whisper, lifecycle
  preload/      contextBridge API exposed to renderers (compiled to .mjs)
  renderer/     Two React apps: pill (index.html) and settings (settings.html)
  shared/       IPC contracts and types shared between main and renderer
resources/      Tray icons + (after setup) whisper.cpp binary and models
scripts/        Setup and test scripts

The renderer captures audio with getUserMedia + MediaRecorder, decodes and resamples it to 16 kHz mono PCM with the Web Audio API, and sends it to main over a typed IPC channel as an ArrayBuffer. Main writes a temporary WAV, spawns whisper-cli.exe, parses stdout, copies the result to the clipboard, and simulates Ctrl+V with nut-js.
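
The "writes a temporary WAV" step can be sketched as follows. This is a minimal illustration, not the project's implementation: `encodeWav` is a hypothetical helper, and it assumes the samples arrive as 32-bit floats in the range [-1, 1], which the text above does not specify. It wraps them in a standard 44-byte RIFF header as 16-bit mono PCM, the input format whisper.cpp expects.

```typescript
// Hypothetical helper: wraps mono Float32 samples in a 16-bit PCM WAV container.
// Assumption: samples are floats in [-1, 1] at the given sample rate.
function encodeWav(samples: Float32Array, sampleRate: number = 16000): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk length
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each float and scale it to a signed 16-bit integer, little-endian.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return new Uint8Array(buffer);
}
```

In main, the returned bytes could be written to a temp path with `fs.writeFile` before spawning whisper-cli.exe.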

License

MIT © Alex Santos

Acknowledgements