Voice-to-text desktop app for Windows.
Press a shortcut, talk, press it again. The text appears wherever your cursor is.
Ditto lives in the system tray and shows a floating, draggable pill. Hit the global shortcut, speak, hit it again — Ditto transcribes locally with whisper.cpp and pastes the text into whatever app has focus.
- Local-only. Audio never leaves your machine.
- GPU-accelerated. Bundled CUDA build for NVIDIA GPUs (CPU fallback works too).
- Out of the way. Frameless pill with click-through, always on top, tray icon for control.
- Configurable. Custom shortcut, audio device, model, theme, and more.
Early personal project. Windows-first. macOS and Linux are not targeted yet, but where possible the code avoids Windows-only APIs that have no cross-platform fallback.
- Windows 10 or 11 (x64)
- Node.js 20+
- For GPU acceleration: NVIDIA GPU with recent drivers (CUDA Toolkit not required — cuDNN/cuBLAS DLLs are bundled with whisper.cpp)
```sh
git clone https://github.com/asantinos/ditto.git
cd ditto
npm install
npm run setup:whisper   # downloads whisper.cpp + base model
npm run dev
```

Default shortcut: Ctrl+Shift+Space (press to record, press again to stop and paste).
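The press-to-toggle behavior boils down to a tiny state machine. Below is an illustrative sketch, not Ditto's actual implementation: `startRecording` and `stopAndPaste` are hypothetical callback names, and in Electron the returned function would be wired to `globalShortcut.register`.

```typescript
// Sketch of the record/stop toggle driven by a single global shortcut.
// In Electron this would be registered via
// globalShortcut.register("CommandOrControl+Shift+Space", toggle).
// startRecording/stopAndPaste are hypothetical names.
type RecorderState = "idle" | "recording";

function makeToggle(
  startRecording: () => void,
  stopAndPaste: () => void
): () => RecorderState {
  let state: RecorderState = "idle";
  return () => {
    if (state === "idle") {
      startRecording(); // first press: begin capturing audio
      state = "recording";
    } else {
      stopAndPaste(); // second press: stop, transcribe, paste
      state = "idle";
    }
    return state;
  };
}
```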
Open settings from the tray icon (right-click → "Ajustes…" [Spanish for "Settings…"], or double-click).
- Electron 39 + React 19 + TypeScript (strict mode)
- electron-vite for bundling, ESM throughout
- whisper.cpp as an external binary, spawned from main (no native bindings)
- @nut-tree-fork/nut-js for simulating Ctrl+V into the active window
- electron-store for settings persistence
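The "typed IPC" approach in the stack above can be sketched as a shared contract module that both main and renderer import. This is an illustrative sketch under assumed names; the channel strings and payload shapes are guesses, not Ditto's real contract.

```typescript
// Sketch of a shared IPC contract: one source of truth for channel names
// and payload types, imported by both the main and renderer builds.
// All names here are hypothetical.
export const IpcChannels = {
  transcribeAudio: "audio:transcribe",
  transcriptReady: "transcript:ready",
} as const;

export interface TranscribeRequest {
  sampleRate: 16000;   // renderer resamples before sending
  pcm: ArrayBuffer;    // 16 kHz mono PCM samples
}

export interface TranscriptResult {
  text: string;        // whisper.cpp output, already trimmed
  durationMs: number;  // wall-clock transcription time
}

// The preload script would expose a narrow API built on these types, e.g.
// ipcRenderer.invoke(IpcChannels.transcribeAudio, request).
```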
| Command | What it does |
|---|---|
| `npm run dev` | Start in development mode |
| `npm run typecheck` | TypeScript on both main and renderer configs |
| `npm run lint` | ESLint with cache |
| `npm run test:transcribe` | One-shot transcription test on `jfk.wav` |
| `npm run build:unpack` | Build to `dist/win-unpacked/` (no installer) |
| `npm run build:win` | Build the full NSIS installer to `dist/` |
```
src/
  main/       Electron main process (ESM): windows, tray, IPC, whisper, lifecycle
  preload/    contextBridge API exposed to renderers (compiled to .mjs)
  renderer/   Two React apps: pill (index.html) and settings (settings.html)
  shared/     IPC contracts and types shared between main and renderer
resources/    Tray icons + (after setup) whisper.cpp binary and models
scripts/      Setup and test scripts
```
The renderer captures audio with getUserMedia + MediaRecorder, decodes and resamples it to 16 kHz mono PCM with the Web Audio API, and sends it to main over a typed IPC channel as an ArrayBuffer. Main writes a temporary WAV, spawns whisper-cli.exe, parses stdout, copies the result to the clipboard, and simulates Ctrl+V with nut-js.
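As a sketch of the two format conversions in that pipeline: Ditto resamples with the Web Audio API and main writes the WAV, but pure-function equivalents look roughly like this. Both function names are illustrative, and the resampler uses simple linear interpolation rather than whatever the Web Audio API does internally.

```typescript
// Illustrative sketches of the pipeline's format conversions (hypothetical names).

// Downsample arbitrary-rate mono audio to 16 kHz via linear interpolation.
function resampleTo16k(input: Float32Array, inputRate: number): Float32Array {
  const ratio = inputRate / 16000;
  const out = new Float32Array(Math.floor(input.length / ratio));
  for (let i = 0; i < out.length; i++) {
    const pos = i * ratio;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac; // linear interpolation
  }
  return out;
}

// Wrap float samples in a minimal 16-bit mono WAV container (44-byte RIFF header).
function toWav(samples: Float32Array, sampleRate = 16000): Uint8Array {
  const dataSize = samples.length * 2;
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, (s * 32767) | 0, true);
  }
  return new Uint8Array(buf);
}
```

The resulting buffer is what main would write to a temporary file before spawning `whisper-cli.exe` on it.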
MIT © Alex Santos
- whisper.cpp by Georgi Gerganov — the transcription engine
- OpenAI Whisper — the underlying model
- @nut-tree-fork/nut-js — keyboard simulation