Your data. Your device. Your rules.
Features • Installation • Architecture • Roadmap • Contributing
Organizations handling confidential information—law firms, medical practices, government agencies, research institutions—face a critical challenge: cloud transcription services require sending sensitive data to third-party servers.
Verbatim Studio eliminates this risk entirely. All transcription and AI processing happens locally on your machine. Your files never leave your control.
- HIPAA-ready — Patient interviews and medical dictation stay on-premises
- Legal privilege — Attorney-client communications remain confidential
- Government security — Classified briefings never touch external networks
- Research ethics — IRB-protected interviews maintain participant privacy
Verbatim Studio works just as well for everyday use:
- Project managers documenting meetings and standups
- Content creators transcribing interviews and podcasts
- Students and academics processing lectures and research
- Anyone who wants accurate transcription without privacy trade-offs or subscription fees
| | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Disk | ~2 GB (base install) | ~8 GB (all AI models) |
| Platform | Requirement | Notes |
|---|---|---|
| macOS | Apple Silicon (M1/M2/M3/M4) | Optimized for Metal / MLX — Intel Macs not supported |
| Windows | x86-64 with 8 GB+ RAM | NVIDIA GPU optional — enables CUDA-accelerated transcription |
Minimum covers transcription and basic editing. Recommended includes the full AI suite (Max assistant, semantic search, OCR).
Each AI feature loads its own model. You only pay for what you use — deactivate models in Settings > AI to reclaim memory.
| Feature | Memory | Loaded when… |
|---|---|---|
| App (idle) | ~300 MB | Always |
| Transcription (Whisper base) | +200–300 MB | Transcribing audio/video |
| Speaker ID (pyannote) | +1 GB | Diarization enabled |
| Semantic search (nomic-embed) | +600 MB | Search index active |
| Max assistant (Granite 8B) | +5 GB | AI chat / summaries |
| OCR (Qwen2-VL 2B) | +5 GB | Processing images / scanned PDFs |
Tip: On a 16 GB machine you can comfortably run transcription + diarization + search + Max simultaneously. OCR is loaded on-demand and released when idle.
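As a rough illustration of how these footprints combine (figures taken from the table above; actual usage varies by model, platform, and workload), you can sanity-check a configuration with a few lines of arithmetic:

```python
# Approximate resident-memory footprints in MB, copied from the table above.
# These are illustrative estimates, not guarantees.
FOOTPRINTS_MB = {
    "app_idle": 300,
    "transcription_whisper_base": 300,   # upper end of the 200-300 MB range
    "speaker_id_pyannote": 1000,
    "semantic_search_nomic": 600,
    "max_assistant_granite_8b": 5000,
    "ocr_qwen2_vl_2b": 5000,
}

def total_mb(features):
    """Sum the estimated footprint of the idle app plus the given features."""
    return FOOTPRINTS_MB["app_idle"] + sum(FOOTPRINTS_MB[f] for f in features)

# Transcription + diarization + search + Max on a 16 GB machine:
usage = total_mb([
    "transcription_whisper_base",
    "speaker_id_pyannote",
    "semantic_search_nomic",
    "max_assistant_granite_8b",
])
print(f"{usage} MB of 16384 MB")  # prints: 7200 MB of 16384 MB
```

Roughly 7.2 GB of 16 GB, which is why the four features above fit comfortably while OCR is better loaded on demand.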
Out of the box, only transcription uses your NVIDIA GPU (via CTranslate2). To accelerate all AI features on the GPU, open Settings > AI and click Enable Full GPU Acceleration (~2.8 GB download).
| Feature | VRAM | Notes |
|---|---|---|
| Transcription (Whisper base) | ~200 MB | GPU by default (CTranslate2) |
| Transcription (Whisper large-v3) | ~3 GB | GPU by default (CTranslate2) |
| Speaker ID (pyannote) | ~1 GB | Requires GPU acceleration pack |
| Semantic search (nomic-embed) | ~600 MB | Requires GPU acceleration pack |
| AI Assistant (Granite 8B) | ~5 GB | Requires GPU acceleration pack |
| OCR (Qwen2-VL 2B) | ~5 GB | Requires GPU acceleration pack |
Minimum VRAM: 4 GB handles transcription + diarization + search. Recommended: 8 GB+ for the full AI suite including Granite and OCR.
No NVIDIA GPU? Everything works on CPU — just slower for transcription.
- OpenAI Whisper accuracy — State-of-the-art speech recognition running entirely on your device
- Multi-language support — Transcribe in 12+ languages with automatic detection
- Automatic speaker identification — Know who said what without manual tagging
- Live transcription — Real-time speech-to-text from your microphone
- Video support — Drop in MP4, MOV, WebM, or MKV files and get transcripts automatically
Record directly from your microphone with real-time speech-to-text. Choose your language, enable speaker diarization, and watch the transcript appear as you speak. Save sessions as recordings for later editing and export.
Max isn't just a chatbot—it's a research tool that actually understands your content:
- Query across your entire library — Ask questions that span multiple files and documents
- Persistent conversations — Pick up where you left off with saved chat history
- Document-aware — Upload PDFs, images, and notes for Max to reference
- OCR built-in — Extract text from scanned documents and images automatically
- Platform guidance — Not sure how to do something? Just ask Max
All powered by IBM Granite, running 100% locally. No API keys. No usage limits. No data leaving your machine.
Upload PDFs and images for automatic text extraction. The built-in OCR model (Qwen2-VL) reads printed text and handwriting alike — all processed locally on your machine.
- Semantic search — Find content by meaning, not just exact keywords
- Search everything — Files, transcripts, documents, notes, and chat history in one place
- Smart results — See context snippets with keyword highlighting and semantic match indicators
- Clickable timestamps — Jump to any moment instantly
- Highlights and bookmarks — Mark important segments for quick reference
- In-transcript search — Find exactly what you're looking for with highlighted navigation
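The "find by meaning" behavior of semantic search can be illustrated with a toy ranking. In the real app, nomic-embed-text produces the embedding vectors; here the vectors and snippets are made up so the ranking logic itself is visible:

```python
import math

# Hypothetical snippets with tiny hand-written "embeddings" (not real
# nomic-embed output) purely to demonstrate cosine-similarity ranking.
SNIPPETS = {
    "We agreed to push the launch to Q3.": [0.9, 0.1, 0.2],
    "The patient reported mild symptoms.": [0.1, 0.8, 0.3],
    "Budget approval is still pending.":   [0.7, 0.2, 0.6],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, top_k=2):
    """Return the snippets closest in meaning (highest cosine similarity)."""
    ranked = sorted(SNIPPETS, key=lambda s: cosine(query_vec, SNIPPETS[s]), reverse=True)
    return ranked[:top_k]

# A query embedding "near" the launch and budget snippets:
print(search([0.8, 0.1, 0.4]))
```

No exact keyword overlap is needed: snippets rank by how close their embedding is to the query's, which is what lets a search for "release schedule" surface a sentence about pushing the launch.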
- Keyboard-first workflow — Control playback without leaving your keyboard
- Inline annotations — Add notes directly to your documents and transcripts
- Real folders — Projects map to actual directories on your filesystem
- Bulk operations — Select multiple files and act on them at once
- Flexible storage — Keep files local, on network drives, or synced with Google Drive, OneDrive, and Dropbox
- Full exports — TXT, SRT, VTT, JSON, or complete backup archives
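To make the export formats concrete, here is a minimal sketch of what an SRT export involves. The segment structure (start/end in seconds plus text) is a hypothetical illustration, not Verbatim Studio's internal format:

```python
# Sketch of SRT generation from transcript segments. The `segments`
# shape here is assumed for illustration only.
def srt_timestamp(seconds):
    """Format seconds as the HH:MM:SS,mmm timestamp SRT requires."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(segments):
    """Render numbered SRT cues separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(blocks)

segments = [
    {"start": 0.0, "end": 2.5, "text": "Welcome to the standup."},
    {"start": 2.5, "end": 5.0, "text": "Let's review yesterday's work."},
]
print(to_srt(segments))
```

VTT differs mainly in its `WEBVTT` header and dot-separated milliseconds (`00:00:02.500`), while the JSON export can carry extra fields such as speaker labels.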
All AI runs on your machine. Download and manage models directly from the settings page — no API keys, no cloud dependencies. Deactivate models when you don't need them to reclaim memory.
Download for your platform:
| Platform | Download | Notes |
|---|---|---|
| macOS (Apple Silicon) | Download .dmg | M1/M2/M3/M4 optimized |
| Windows (x64) | Download .exe | NVIDIA GPU optional for faster transcription |
The app is self-contained—no Python, Node.js, or other dependencies required. Just download, install, and run.
On first launch, Verbatim Studio will guide you through downloading the AI models you need. Choose what fits your workflow—transcription only, or the full suite with Max and semantic search.
macOS: "App is damaged" or "unidentified developer" warning
The app is not yet code-signed. To open it:
- Right-click (or Control-click) the app and select Open
- Click Open in the dialog that appears
Or, if that doesn't work:
- Open System Settings → Privacy & Security
- Scroll down to find the blocked app message
- Click Open Anyway
Alternative: Remove quarantine via Terminal
After downloading, run this command to remove the quarantine attribute:
```sh
xattr -c ~/Downloads/Verbatim.Studio-<version>-arm64.dmg
```

Replace `<version>` with the version number you downloaded (e.g., 0.26.21).
Development Setup (Build from Source)
- Python 3.12+
- Node.js 20+
- pnpm 9+
- ffmpeg 7+
```sh
# Clone the repository
git clone https://github.com/JongoDB/verbatim-studio.git
cd verbatim-studio

# Install Node dependencies
pnpm install

# Set up Python environment
cd packages/backend
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cd ../..
```

```sh
# Run both frontend and backend
pnpm dev

# Or run separately
# Terminal 1 - Backend
cd packages/backend && source .venv/bin/activate
python -m uvicorn api.main:app --reload --port 8000

# Terminal 2 - Frontend
cd packages/frontend && pnpm dev
```

Open http://localhost:5173 in your browser.
```
┌─────────────────────────────────────────────────────────┐
│                    Frontend (React)                     │
│  Dashboard • Recordings • Projects • Documents • Search │
├─────────────────────────────────────────────────────────┤
│                    Backend (FastAPI)                    │
│                                                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │  Database   │  │Transcription│  │     AI      │      │
│  │   Adapter   │  │   Engine    │  │   Service   │      │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘      │
│         │                │                │             │
│      SQLite          WhisperX         llama.cpp         │
│                     MLX Whisper        Granite          │
└─────────────────────────────────────────────────────────┘
```
| Layer | Technology |
|---|---|
| Frontend | React 18, TypeScript, Vite, Tailwind CSS |
| Backend | FastAPI, SQLAlchemy, Pydantic |
| Transcription | WhisperX, MLX Whisper, pyannote.audio |
| AI/LLM | llama-cpp-python, sentence-transformers |
| OCR | Qwen2-VL (vision-language model) |
| Audio | WaveSurfer.js, ffmpeg |
| Storage | SQLite, Google Drive, OneDrive, Dropbox |
Core Platform
- Native macOS desktop app (Apple Silicon optimized)
- Native Windows desktop app (NVIDIA CUDA optimized)
- Local AI transcription with speaker identification
- Live transcription from microphone
- Video file support with automatic audio extraction
- Automatic update notifications with release notes
AI Assistant (Max)
- Multi-document conversations with chat history
- Semantic search across all content
- Platform guidance and help
- OCR for scanned documents and images
Editing & Organization
- Clickable timestamps and playback keyboard shortcuts
- Segment highlights and bookmarks
- In-transcript search with navigation
- Inline document annotations
- Project-based organization with real filesystem folders
- Bulk operations
Storage & Export
- Local, network, and cloud storage options
- Google Drive, OneDrive, Dropbox integration
- Export to TXT, SRT, VTT, JSON
- External LLM connections (Ollama, OpenAI, self-hosted)
- Multi-user with role-based access control
- Meeting bots for Teams, Google Meet, and Zoom
- PostgreSQL database support
- Administration dashboard
- Audit logging and compliance reports
- Secure mobile access to self-hosted servers
Most settings are available through the Settings page in the app.
On first use, Verbatim Studio downloads the AI models you select:
| Model | Size | Purpose |
|---|---|---|
| Whisper (base) | ~150 MB | Transcription (pre-bundled; configurable up to large-v3) |
| pyannote | ~200 MB | Speaker diarization |
| nomic-embed-text | ~550 MB | Semantic search (pre-bundled) |
| IBM Granite 3.3 | ~5 GB | AI assistant and transcript summarization |
| Qwen2 VL | ~4.4 GB | Image OCR and document parsing |
Models are cached locally and only download once.
Environment Variables (Developers)
Create a .env file in packages/backend/:
```sh
# Core settings
VERBATIM_MODE=basic
VERBATIM_DATA_DIR=~/.verbatim-studio

# Transcription
VERBATIM_WHISPERX_MODEL=base
VERBATIM_WHISPERX_DEVICE=auto

# OAuth (optional - for cloud storage)
VERBATIM_GOOGLE_CLIENT_ID=your-client-id
VERBATIM_GOOGLE_CLIENT_SECRET=your-secret
```

Contributions are welcome. See the Development Setup section to get started.
```sh
# Build for your current platform
pnpm build:electron

# Build for specific platform
pnpm build:electron:mac
pnpm build:electron:win
pnpm build:electron:linux
```

```sh
# Backend tests
cd packages/backend && pytest

# Frontend tests
cd packages/frontend && pnpm test

# Type checking
cd packages/frontend && pnpm typecheck
```

MIT License. See LICENSE for details.
Verbatim Studio — Transcription you can trust.