Important
This project has moved.
The plugins have been merged into a single extension: Sonara.
👉 New repository: github.com/ArtisanWebLab/sonara
This repository is archived and will no longer receive updates, bug fixes, or new features. Please switch to Sonara for the latest version.
Local voice input and media transcription via Whisper. Dictate prompts and text into the Voice Log, or drop an audio/video file and get a timestamped transcript — all through a single local Whisper server.
This extension is a personal pet-project, built for myself and for fun.
- An experiment and a way to scratch my own itch for voice input
- Written on a "works for me, good enough" basis
- Never intended as a product
- Not supported: please don't open issues asking for help, I won't respond
- Not open to pull requests: I don't accept, review or merge them
- Not on a roadmap: there is no roadmap and there won't be one
- Not guaranteed to be compatible, stable or secure: no guarantees whatsoever
- Not published on the VS Code Marketplace: local `.vsix` only
- Download it, build it, install it for yourself
- Fork it and do whatever you want with it
- Use the code as an example or a starting point for your own extension
- Works for me — great
- Doesn't work for you — feel free to dig in yourself or fork it
- Want a feature — fork the project, don't ping me
Provided AS IS, with no obligations on my side.
- Open the Releases page and download the latest `pathtotalk-<version>.vsix`
- Install it into VS Code with `code --install-extension pathtotalk-<version>.vsix`, or in the VS Code UI: Extensions panel → `...` menu → Install from VSIX... → pick the file
- Reload VS Code
- On first run, the extension will run a setup wizard:
  - Creates a Python virtualenv under the extension's global storage
  - Installs `faster-whisper`, `torch` (CUDA or CPU build, auto-detected), and other deps
  - Downloads the selected Whisper model
Requires Docker + Docker Compose (no local Node.js / Python needed).

    git clone <your-repo-url> pathtotalk
    cd pathtotalk
    make install      # install npm deps (inside Docker)
    make build        # compile TypeScript + package .vsix
    make install-ext  # install the built .vsix into local VS Code

Other useful targets:

    make compile  # tsc only
    make watch    # tsc in watch mode
    make lint     # eslint
    make clean    # remove out/ and *.vsix

After `make install-ext`, reload VS Code. The first launch triggers the same setup wizard as Option A.
Requires: `gh` CLI authenticated, git remote `origin` configured.

    make release V=0.1.1

The `tools/release.sh` script will:

- Validate the working tree is clean and the tag doesn't exist yet
- Bump `package.json` + `package-lock.json` to the given version (via `npm version`)
- Run `make clean && make build` to produce `pathtotalk-<version>.vsix`
- Commit `chore: release vX.Y.Z`, create tag `vX.Y.Z`, push both to `origin`
- Create a GitHub Release with the `.vsix` attached and auto-generated notes from commits
Default keybindings:
| Shortcut | Action |
|---|---|
| `Ctrl+Shift+M` | Start / stop recording (toggle) |
| `Ctrl+Alt+M` | Cancel recording: discard audio, no transcription (while recording) |
| `Ctrl+Shift+L` | Open Voice Log |
The `Voice: Recording` status bar item is also clickable: left-click toggles recording on and off. When you press Stop, the recorder keeps capturing for an extra `pathtotalk.stopDelayMs` milliseconds (default 1000, i.e. 1 s) so the last words aren't cut off.
Available commands (Command Palette → Voice: ...):
Voice Log (short dictation records)

- `Voice: Start / Stop Recording` (toggle, bound to `Ctrl+Shift+M` and the status bar)
- `Voice: Cancel Recording (discard)`: stop without transcribing (bound to `Ctrl+Alt+M` while recording)
- `Voice: Start Recording` / `Voice: Stop Recording` (explicit, non-toggle variants)
- `Voice: Show Log`
- `Voice: Copy Last Transcription`
- `Voice: Search Log`
- `Voice: Export Log as Markdown`
- `Voice: Open Log File` / `Voice: Clear Project Log`
- `Voice: Edit Project Vocabulary`: open the per-project vocabulary file (custom terms biased into the Whisper prompt)
- `Voice: Change Streaming Mode (off / adaptive / on)`: pick how transcription is delivered (see Streaming modes)

Voice Transcripts (audio / video file transcription)

- `Voice: Transcribe File...`: pick an audio or video file, transcribe into a timestamped Markdown file
- `Voice: Show Transcripts`: open the Transcripts panel

Model / server

- `Voice: Change Model`: pick a Whisper model (`tiny` … `large-v3`)
- `Voice: Change Language`: pick transcription language or `auto`
- `Voice: Change Compute Device`: `auto` / `cuda:0` / `cuda:1` / `cpu`
- `Voice: Restart Server`, `Voice: Show Server Logs`, `Voice: Show Extension Logs`
- `Voice: Download Model`: pre-download a model without switching to it
- `Voice: Reset Extension`: wipe the Python venv and re-run the setup wizard

Storage

- `Voice: Open Project Storage Folder`: reveals the per-project storage directory
- `Voice: Open Global Storage Folder`: reveals the extension's global storage (Python venv, models, fallback logs)
Three modes for how the dictation transcript is delivered, chosen via Voice: Change Streaming Mode (off / adaptive / on) or the pathtotalk.streamingMode setting.
| Mode | When to use | How it works |
|---|---|---|
| `off` (classic) | Short messages, weak GPU / CPU only | Records → stops → transcribes the whole WAV in a single Whisper pass. Best accuracy on short clips, no GPU pressure during recording. |
| `on` (live) | Long dictation when you want to see text as you speak | Streams raw PCM over WebSocket from the first second; Whisper emits partial confirmed/pending text every few seconds. Requires a CUDA GPU; on CPU it lags behind speech on medium+ models. |
| `adaptive` (recommended for GPU users) | Mix of short and long messages | Buffers PCM in memory until `pathtotalk.adaptiveStreamingThresholdSec` (default 30s) is reached. Short recordings finish in classic mode (full accuracy). When the threshold is crossed, the buffered head is transcribed in one classic pass, then a live WebSocket session takes over for the remainder, so the final text is "classic head + streaming tail". |
While a streaming or adaptive recording is in progress, the Voice Log shows a pinned draft card with a pulsing dot, a ticking duration timer and the live label (Recording while buffering in adaptive, Live once Whisper is producing partial text).
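The adaptive hand-off described above can be sketched as a small state machine. This is only an illustration, not the extension's actual code: the `AdaptiveRecorder` class, the `events` list, and the 16 kHz / 16-bit mono PCM format are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveRecorder:
    """Toy model of adaptive mode: buffer first, go live after a threshold."""

    threshold_sec: float = 30.0   # mirrors pathtotalk.adaptiveStreamingThresholdSec
    sample_rate: int = 16_000     # assumption: 16 kHz mono
    bytes_per_sample: int = 2     # assumption: 16-bit PCM
    buffered: bytearray = field(default_factory=bytearray)
    live: bool = False
    events: list = field(default_factory=list)  # hypothetical trace of actions

    def buffered_seconds(self) -> float:
        return len(self.buffered) / (self.sample_rate * self.bytes_per_sample)

    def feed(self, chunk: bytes) -> None:
        """Accept one PCM chunk from the recorder."""
        if self.live:
            # Past the threshold: chunks go straight to the live session.
            self.events.append(("stream", len(chunk)))
            return
        self.buffered.extend(chunk)
        if self.buffered_seconds() >= self.threshold_sec:
            # Threshold crossed: the buffered head gets one classic pass,
            # then the live session takes over for the remainder.
            self.events.append(("classic_head", len(self.buffered)))
            self.buffered.clear()
            self.live = True

    def stop(self) -> str:
        """Finish the recording and report which path produced the text."""
        if self.live:
            return "classic head + streaming tail"
        # Never reached the threshold: the whole buffer is one classic pass.
        self.events.append(("classic_full", len(self.buffered)))
        return "classic"
```

A recording that stops before the threshold never opens a live session, which is why short clips keep classic-mode accuracy.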
Open VS Code settings (Ctrl+,) and search for pathtotalk. Key options:
- `pathtotalk.model`: Whisper model (default `large-v3`)
- `pathtotalk.device`: compute device (`auto` picks CUDA if available, otherwise CPU)
- `pathtotalk.language`: transcription language or `auto`
- `pathtotalk.computeType`: `auto` / `float16` / `int8_float16` / `int8` / `float32`
- `pathtotalk.vadFilter`: enable voice activity detection (default `true`)
- `pathtotalk.beamSize`: beam search size (1–10, default 5)
- `pathtotalk.stopDelayMs`: extra recording time after Stop (ms, default `1000`)
- `pathtotalk.streamingMode`: `off` / `on` / `adaptive` (see Streaming modes, default `off`)
- `pathtotalk.adaptiveStreamingThresholdSec`: when `streamingMode = adaptive`, switch from classic buffering to live transcription after this many seconds (default `30`, range 5–300)
- `pathtotalk.streamingIntervalSec`: how often (seconds) the live transcriber emits a partial result (default `2`)
- `pathtotalk.log.*`: Voice Log behavior (max records, grouping, notifications, gitignore handling)
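As a rough illustration of how these settings could map onto `faster-whisper`, the sketch below turns a settings dictionary into keyword arguments for `WhisperModel(...)` and `WhisperModel.transcribe(...)`. The function `build_whisper_kwargs` is hypothetical, and the mapping itself (splitting `cuda:1` into a device plus index, treating `auto` language as `None`, defaulting device, language and compute type to `auto`) is an assumption, not taken from the bundled server.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Keep a numeric setting inside its documented range."""
    return max(low, min(high, value))


def build_whisper_kwargs(settings: dict) -> tuple[dict, dict]:
    """Return (model_kwargs, transcribe_kwargs) from pathtotalk-style settings."""
    device = settings.get("device", "auto")  # assumed default
    model_kwargs = {
        "model_size_or_path": settings.get("model", "large-v3"),
        "compute_type": settings.get("computeType", "auto"),  # assumed default
    }
    if device.startswith("cuda:"):
        # "cuda:1" -> device="cuda", device_index=1
        model_kwargs["device"] = "cuda"
        model_kwargs["device_index"] = int(device.split(":", 1)[1])
    else:
        model_kwargs["device"] = device  # "auto" or "cpu"

    language = settings.get("language", "auto")  # assumed default
    transcribe_kwargs = {
        # None lets faster-whisper detect the language itself.
        "language": None if language == "auto" else language,
        "beam_size": clamp(int(settings.get("beamSize", 5)), 1, 10),
        "vad_filter": bool(settings.get("vadFilter", True)),
    }
    return model_kwargs, transcribe_kwargs
```

With `faster-whisper` installed, the two dictionaries would be used roughly as `WhisperModel(**model_kwargs).transcribe(audio_path, **transcribe_kwargs)`.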
- A bundled Python FastAPI server (`python/server.py`) runs `faster-whisper` for transcription
- The extension spawns the server on activation, picks a free port, and talks to it over HTTP for classic transcription and over WebSocket for live (streaming) transcription
- Two sidebar views share the same server:
  - Voice Log: records captured by the OS recorder (`parecord` / `arecord`) are sent as WAV (classic mode) or as a raw PCM stream (live / adaptive mode); results are stored line-by-line
  - Voice Transcripts: a selected media file is transcribed in a streaming HTTP request that reports progress and returns timestamped segments
- All persistent data lives under the extension's global storage (`globalStorageUri`), per project:
  - `<global-storage>/projects/<project-hash>/voice-log.jsonl`: dictation history for that project (newest-first, `copied` flag clears the "unread" highlight after you copy)
  - `<global-storage>/projects/<project-hash>/transcripts/<YYYY-MM-DD_HH-mm-ss>_<source-name>.md`: one file per transcribed recording, each with a JSON metadata header, summary (duration / language / model) and `[HH:MM:SS]` timestamps every 60s
  - `<global-storage>/projects/<project-hash>/vocabulary.md`: per-project vocabulary biased into the Whisper prompt (edit via `Voice: Edit Project Vocabulary`)
- Workspaces without a folder fall back to a shared `voice-logs-fallback` directory under the same global storage
- Use `Voice: Open Project Storage Folder` / `Voice: Open Global Storage Folder` to reveal the actual paths in your file manager
On first activation, a legacy .vscode/voice-log.jsonl and .vscode/pathtotalk/ directory are auto-migrated into the per-project global storage layout above.
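The storage layout above can be sketched in a few lines. The helper names are hypothetical, and two details are assumptions because the README does not specify them: how `<project-hash>` is derived (here, a truncated SHA-256 of the workspace folder path) and whether `<source-name>` keeps the file extension (here, it does not).

```python
import hashlib
from datetime import datetime
from pathlib import Path
from typing import Optional


def project_dir(global_storage: Path, workspace_folder: Optional[str]) -> Path:
    """Resolve the per-project storage directory."""
    if workspace_folder is None:
        # Workspaces without a folder share one fallback directory.
        return global_storage / "voice-logs-fallback"
    # Assumption: the real hashing scheme may differ from this one.
    digest = hashlib.sha256(workspace_folder.encode("utf-8")).hexdigest()[:16]
    return global_storage / "projects" / digest


def transcript_path(project: Path, source: str, when: datetime) -> Path:
    """Build <YYYY-MM-DD_HH-mm-ss>_<source-name>.md under transcripts/."""
    stamp = when.strftime("%Y-%m-%d_%H-%M-%S")
    # Assumption: <source-name> is the source file name without its extension.
    return project / "transcripts" / f"{stamp}_{Path(source).stem}.md"
```

Hashing the folder path gives every workspace a stable directory name without putting the path itself into the storage tree.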
Requirements on the host:
- Docker + Docker Compose (only for building from source)
- Python 3.10+ available as `python3` (the setup wizard uses it to create the venv)
- `ffmpeg` in `PATH`: required only for `Voice: Transcribe File...`, since `faster-whisper` shells out to it for video and non-WAV audio
- Optional: NVIDIA GPU with CUDA for faster transcription (`nvidia-smi` is used to detect it)
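A quick way to check these host requirements before installing is a small pre-flight script. This is a sketch, not part of the extension: `check_host` is a hypothetical helper, it only looks for the executables on `PATH`, and it does not verify the Python version or that CUDA actually works.

```python
import shutil


def check_host(which=shutil.which) -> dict:
    """Report which of the documented host tools are on PATH."""
    found = {
        name: which(name) is not None
        for name in ("python3", "ffmpeg", "nvidia-smi")
    }
    return {
        "can_setup": found["python3"],            # needed to create the venv
        "can_transcribe_files": found["ffmpeg"],  # needed for Transcribe File...
        # Presence of nvidia-smi is only a hint that a CUDA GPU is available.
        "device": "cuda" if found["nvidia-smi"] else "cpu",
    }


if __name__ == "__main__":
    for key, value in check_host().items():
        print(f"{key}: {value}")
```

The `which` parameter exists so the lookup can be swapped out, for example in a test that simulates a machine without a GPU.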