Local AI-powered dictation for macOS using WhisperKit
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β β
β π€ VoxScript - Local Dictation for macOS β
β β
β βββββββββββββββ ββββββββββββββββ βββββββββββββββ β
β β Record β -> β WhisperKit β -> β Insert at β β
β β Audio β β Transcribe β β Cursor β β
β βββββββββββββββ ββββββββββββββββ βββββββββββββββ β
β β β
β βΌ β
β ββββββββββββββββ β
β β Ollama β β
β β (Optional) β β
β ββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- 100% Local Processing - All transcription happens on-device using Apple Silicon
- Global Hotkeys - Press ββ§Space anywhere to start/stop recording
- Multiple Recording Modes - Toggle, Push-to-Talk, or Continuous with silence detection
- Optional Post-Processing - Clean up text with local Ollama LLM
- Menu Bar App - Runs quietly in the background
- Works Everywhere - Text insertion works in standard apps AND terminal emulators
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β VoxScript.app β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β VoxScriptApp β β
β β (App Entry Point) β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βββββββββββββββββΌββββββββββββββββ β
β βΌ βΌ βΌ β
β βββββββββββββββ ββββββββββββββββ βββββββββββββββββββ β
β β StatusBar β β FloatingPanelβ β HotkeyManager β β
β β Controller β β Controller β β β β
β βββββββββββββββ ββββββββββββββββ βββββββββββββββββββ β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Core Services β β
β ββββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββββββββββ€ β
β βAudioRecorder βTranscriptionE β PostProcessor β β
β β (AVAudioEng) β (WhisperKit) β (Ollama) β β
β ββββββββββββββββ΄ββββββββββββββββ΄βββββββββββββββββββββββ β
β β
β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Models β β
β βββββββββββββββ¬ββββββββββββββ¬ββββββββββββββββββββββββββ€ β
β β AppState β Settings β TranscriptionResult β β
β βββββββββββββββ΄ββββββββββββββ΄ββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- macOS 14.0+ (Sonoma or later)
- Apple Silicon (M1/M2/M3/M4)
- ~1-2GB disk space for Whisper model
Download the latest DMG from the Releases page.
# Clone the repository
git clone https://github.com/davidcv5/VoxScript.git
cd VoxScript
# Open in Xcode
open VoxScript.xcodeproj
# Build and Run (βR)Or build via command line:
xcodebuild -project VoxScript.xcodeproj -scheme VoxScript -configuration Release| Package | Version | Purpose |
|---|---|---|
| WhisperKit | 0.15.0+ | Speech-to-text engine |
| KeyboardShortcuts | 2.0.0+ | Global hotkey handling |
- Ollama - For post-processing text cleanup (install separately)
- Launch VoxScript - It appears in the menu bar
- Press ββ§Space to start recording
- Speak your text
- Press ββ§Space again to stop and transcribe
- Text is automatically inserted at cursor
On first launch, VoxScript will:
- Request Microphone permission
- Request Accessibility permission (for global shortcuts)
- Download the default Whisper model (~1GB)
| Action | Shortcut |
|---|---|
| Toggle Recording | ββ§Space |
| Cancel Recording | Escape |
| Open Settings | β, |
| Mode | Behavior |
|---|---|
| Toggle | Press to start, press again to stop |
| Push-to-Talk | Hold key to record, release to transcribe |
| Continuous | Auto-stops after detecting silence (2s) |
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| large-v3-turbo | ~950MB | Fast | Excellent |
| large-v3 | ~1.5GB | Slower | Best |
| small.en | ~460MB | Very fast | Good (English) |
| base | ~140MB | Fastest | Basic |
| tiny | ~75MB | Instant | Testing only |
ββββββββββββ βββββββββββββββββ ββββββββββββββββ
β Mic/User ββββββΆβ AudioRecorder ββββββΆβ Temp WAV Fileβ
ββββββββββββ βββββββββββββββββ ββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββββββββββββββββββ
β TranscriptionEngine β
β ββββββββββββββββββββββββββββββββ β
β β WhisperKit β β
β β ββββββββββ βββββββββββββ β β
β β β Model β + β CoreML/ANEβ β β
β β ββββββββββ βββββββββββββ β β
β ββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββ
β
βΌ
ββββββββββββββββββββββββ
β TranscriptionResult β
β { text, language } β
ββββββββββββββββββββββββ
β
ββββββββββββββββββββββΌβββββββββββββββββββββ
β β β
βΌ β βΌ
βββββββββββββββββββ β βββββββββββββββββββ
β Post-Processing ββββββββββββββ β ClipboardManagerβ
β (Optional) β β β
β βββββββββββββββ β β βββββββββββββ β
β β Ollama β β β β Paste/ β β
β β llama3.2 β βββββββββββββββββββββββΆβ β Insert β β
β βββββββββββββββ β β βββββββββββββ β
βββββββββββββββββββ βββββββββββββββββββ
β
βΌ
ββββββββββββββββ
β Target App β
β (at cursor) β
ββββββββββββββββ
Access via menu bar icon β Settings (β,)
- General: Launch at login, sounds, floating indicator
- Transcription: Model selection, language, post-processing
- Shortcuts: Customize keyboard shortcuts
- Advanced: Insert directly, trailing newline, silence detection
- All processing happens locally on your device
- No data is sent to any cloud service
- Audio is only saved temporarily during transcription
- No telemetry or usage tracking
VoxScript/
βββ Package.swift # Swift Package Manager dependencies
βββ VoxScript.xcodeproj/
βββ VoxScript/
β βββ VoxScriptApp.swift # Main app entry point
β βββ Info.plist # App configuration
β βββ VoxScript.entitlements # Audio, automation entitlements
β βββ Core/
β β βββ TranscriptionEngine.swift # WhisperKit wrapper (singleton)
β β βββ AudioRecorder.swift # AVAudioEngine recording
β β βββ HotkeyManager.swift # KeyboardShortcuts wrapper
β β βββ ClipboardManager.swift # Text insertion with terminal detection
β β βββ PostProcessor.swift # Ollama integration
β βββ UI/
β β βββ FloatingPanel/ # Recording indicator
β β βββ Settings/ # Settings tabs
β β βββ Onboarding/ # First-run setup
β β βββ MenuBar/ # Status bar controller
β βββ Models/
β β βββ AppState.swift # Observable app state
β β βββ Settings.swift # User preferences
β β βββ TranscriptionResult.swift # Result model
β βββ Utilities/
β βββ Permissions.swift # Permission helpers
β βββ SoundPlayer.swift # Audio feedback
βββ VoxScriptTests/ # Unit tests
VoxScript automatically detects terminal apps and uses a different insertion method. If it's still not working:
- Open Settings β Advanced
- Disable "Insert directly"
- Manually paste with βV after transcription
- Check your internet connection
- Try a smaller model first (base or tiny)
- Check available disk space
- Ensure Accessibility permission is granted
- Check System Settings β Privacy & Security β Accessibility
- Toggle VoxScript off and on in the list
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
MIT License - see LICENSE for details.
- WhisperKit by Argmax
- Whisper by OpenAI
- KeyboardShortcuts by Sindre Sorhus
- Ollama for local LLM inference
- VoxScript PRD - Full product requirements document with implementation notes