Privacy-first AI transcription platform. Transcribe audio/video, extract text from documents, and search your entire workspace instantly — all offline. No internet or subscriptions required.

Verbatim Studio Logo
Verbatim Studio

Your data. Your device. Your rules.

Features · Installation · Architecture · Roadmap · Contributing

Version License Platform

Verbatim Studio Dashboard


Why Verbatim Studio?

Organizations handling confidential information—law firms, medical practices, government agencies, research institutions—face a critical challenge: cloud transcription services require sending sensitive data to third-party servers.

Verbatim Studio eliminates this risk entirely. All transcription and AI processing happens locally on your machine. Your files never leave your control.

Built for Compliance

  • HIPAA-ready — Patient interviews and medical dictation stay on-premises
  • Legal privilege — Attorney-client communications remain confidential
  • Government security — Classified briefings never touch external networks
  • Research ethics — IRB-protected interviews maintain participant privacy

Built for Everyone

Verbatim Studio works just as well for everyday use:

  • Project managers documenting meetings and standups
  • Content creators transcribing interviews and podcasts
  • Students and academics processing lectures and research
  • Anyone who wants accurate transcription without privacy trade-offs or subscription fees

System Requirements

Hardware

        Minimum                 Recommended
RAM     8 GB                    16 GB+
Disk    ~2 GB (base install)    ~8 GB (all AI models)

Platform Support

Platform    Requirement                     Notes
macOS       Apple Silicon (M1/M2/M3/M4)     Optimized for Metal / MLX — Intel Macs not supported
Windows     x86-64 with 8 GB+ RAM           NVIDIA GPU optional — enables CUDA-accelerated transcription

Minimum covers transcription and basic editing. Recommended includes the full AI suite (Max assistant, semantic search, OCR).

Memory Usage by Feature

Each AI feature loads its own model. You only pay for what you use — deactivate models in Settings > AI to reclaim memory.

Feature                          Memory         Loaded when…
App (idle)                       ~300 MB        Always
Transcription (Whisper base)     +200–300 MB    Transcribing audio/video
Speaker ID (pyannote)            +1 GB          Diarization enabled
Semantic search (nomic-embed)    +600 MB        Search index active
Max assistant (Granite 8B)       +5 GB          AI chat / summaries
OCR (Qwen2-VL 2B)                +5 GB          Processing images / scanned PDFs

Tip: On a 16 GB machine you can comfortably run transcription + diarization + search + Max simultaneously. OCR is loaded on-demand and released when idle.
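
To sanity-check that claim, you can tally the approximate figures from the table above for whatever combination you keep active. A minimal Python sketch (the numbers are the rough estimates listed above, not measurements from the app):

# Approximate resident-memory figures from the table above, in GB.
FOOTPRINT_GB = {
    "app_idle": 0.3,
    "transcription_whisper_base": 0.3,
    "speaker_id_pyannote": 1.0,
    "semantic_search_nomic_embed": 0.6,
    "max_assistant_granite_8b": 5.0,
    "ocr_qwen2_vl_2b": 5.0,
}

def estimated_usage_gb(*features: str) -> float:
    """Idle baseline plus each active feature's footprint."""
    return FOOTPRINT_GB["app_idle"] + sum(FOOTPRINT_GB[f] for f in features)

# Transcription + diarization + search + Max:
print(estimated_usage_gb("transcription_whisper_base", "speaker_id_pyannote",
                         "semantic_search_nomic_embed", "max_assistant_granite_8b"))
# ~7.2 GB, which leaves headroom for the OS and other apps on a 16 GB machine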

Windows GPU / VRAM

Out of the box, only transcription uses your NVIDIA GPU (via CTranslate2). To accelerate all AI features on the GPU, open Settings > AI and click Enable Full GPU Acceleration (~2.8 GB download).

Feature                             VRAM       Notes
Transcription (Whisper base)        ~200 MB    GPU by default (CTranslate2)
Transcription (Whisper large-v3)    ~3 GB      GPU by default (CTranslate2)
Speaker ID (pyannote)               ~1 GB      Requires GPU acceleration pack
Semantic search (nomic-embed)       ~600 MB    Requires GPU acceleration pack
AI Assistant (Granite 8B)           ~5 GB      Requires GPU acceleration pack
OCR (Qwen2-VL 2B)                   ~5 GB      Requires GPU acceleration pack

Minimum VRAM: 4 GB handles transcription + diarization + search. Recommended: 8 GB+ for the full AI suite including Granite and OCR.

No NVIDIA GPU? Everything works on CPU — just slower for transcription.


Features

Transcription That Actually Works

  • OpenAI Whisper accuracy — State-of-the-art speech recognition running entirely on your device
  • Multi-language support — Transcribe in 12+ languages with automatic detection
  • Automatic speaker identification — Know who said what without manual tagging
  • Live transcription — Real-time speech-to-text from your microphone
  • Video support — Drop in MP4, MOV, WebM, or MKV files and get transcripts automatically
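
The app handles audio extraction for you when you drop in a video. Purely as an illustration of what that step involves (this is not the app's actual pipeline, and the file names are placeholders), a minimal Python sketch that shells out to ffmpeg looks like this:

import subprocess
from pathlib import Path

def extract_audio(video: Path, wav: Path) -> None:
    """Pull a 16 kHz mono WAV track out of a video file with ffmpeg."""
    subprocess.run(
        [
            "ffmpeg",
            "-y",              # overwrite the output if it already exists
            "-i", str(video),  # input video (MP4, MOV, WebM, MKV, ...)
            "-vn",             # drop the video stream
            "-ac", "1",        # downmix to mono
            "-ar", "16000",    # 16 kHz, the sample rate Whisper-family models expect
            str(wav),
        ],
        check=True,
    )

extract_audio(Path("interview.mp4"), Path("interview.wav"))  # placeholder paths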

Transcript Editor with Speaker Diarization and AI Analysis

Live Transcription

Record directly from your microphone with real-time speech-to-text. Choose your language, enable speaker diarization, and watch the transcript appear as you speak. Save sessions as recordings for later editing and export.

Live Transcription

Max: Your AI-Powered Verbatim Assistant

Max isn't just a chatbot—it's a research tool that actually understands your content:

  • Query across your entire library — Ask questions that span multiple files and documents
  • Persistent conversations — Pick up where you left off with saved chat history
  • Document-aware — Upload PDFs, images, and notes for Max to reference
  • OCR built-in — Extract text from scanned documents and images automatically
  • Platform guidance — Not sure how to do something? Just ask Max

All powered by IBM Granite, running 100% locally. No API keys. No usage limits. No data leaving your machine.
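
The app downloads and runs Granite for you; just to make "running 100% locally" concrete, here is a minimal llama-cpp-python sketch against a local GGUF file (the model path is a placeholder, and this is not the app's internal code):

from llama_cpp import Llama

# Load a local GGUF model; the path is hypothetical, not where Verbatim Studio stores Granite.
llm = Llama(model_path="models/granite-3.3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You summarize meeting transcripts."},
        {"role": "user", "content": "Summarize the key decisions in this transcript: ..."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])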

Max AI Assistant

Documents & OCR

Upload PDFs and images for automatic text extraction. The built-in OCR model (Qwen2-VL) reads printed text and handwriting alike — all processed locally on your machine.

Document OCR - Handwriting Recognition

Find Anything, Instantly

  • Semantic search — Find content by meaning, not just exact keywords
  • Search everything — Files, transcripts, documents, notes, and chat history in one place
  • Smart results — See context snippets with keyword highlighting and semantic match indicators
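
Under the hood, semantic search compares embedding vectors instead of literal words. The snippet below is only a generic sketch of that idea using sentence-transformers (the model name and data are illustrative; the app's own index uses nomic-embed-text):

from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model, not the one bundled with the app.
model = SentenceTransformer("all-MiniLM-L6-v2")

segments = [
    "We agreed to push the launch date to the second week of March.",
    "The patient reported mild dizziness after the new medication.",
    "Budget approval is still pending with the finance team.",
]

query = "When is the release happening?"
scores = util.cos_sim(model.encode(query), model.encode(segments))[0]

# The launch-date segment scores highest even though "release" never appears verbatim.
best = int(scores.argmax())
print(segments[best], float(scores[best]))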

Semantic and Keyword Search

Professional Editing Tools

  • Clickable timestamps — Jump to any moment instantly
  • Highlights and bookmarks — Mark important segments for quick reference
  • In-transcript search — Find exactly what you're looking for with highlighted navigation
  • Keyboard-first workflow — Control playback without leaving your keyboard
  • Inline annotations — Add notes directly to your documents and transcripts

Organize Your Way

  • Real folders — Projects map to actual directories on your filesystem
  • Bulk operations — Select multiple files and act on them at once
  • Flexible storage — Keep files local, on network drives, or synced with Google Drive, OneDrive, and Dropbox
  • Full exports — TXT, SRT, VTT, JSON, or complete backup archives
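
The export formats follow their usual conventions; an SRT file, for example, is just numbered cues with start/end timecodes. A rough Python sketch of that format (illustrative only; the app's exporter does this for you):

def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timecode SRT expects."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render segments with start, end, speaker, and text keys as an SRT document."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['speaker']}: {seg['text']}\n"
        )
    return "\n".join(cues)

print(to_srt([{"start": 0.0, "end": 2.5, "speaker": "Speaker 1", "text": "Let's get started."}]))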

Local AI Models

All AI runs on your machine. Download and manage models directly from the settings page — no API keys, no cloud dependencies. Deactivate models when you don't need them to reclaim memory.

AI Model Management


Installation

Desktop App

Download for your platform:

Platform                 Download         Notes
macOS (Apple Silicon)    Download .dmg    M1/M2/M3/M4 optimized
Windows (x64)            Download .exe    NVIDIA GPU optional for faster transcription

The app is self-contained—no Python, Node.js, or other dependencies required. Just download, install, and run.

First Launch

On first launch, Verbatim Studio will guide you through downloading the AI models you need. Choose what fits your workflow—transcription only, or the full suite with Max and semantic search.

macOS: "App is damaged" or "unidentified developer" warning

The app is not yet code-signed. To open it:

  1. Right-click (or Control-click) the app and select Open
  2. Click Open in the dialog that appears

Or, if that doesn't work:

  1. Open System Settings > Privacy & Security
  2. Scroll down to find the blocked app message
  3. Click Open Anyway

Alternative: Remove quarantine via Terminal

After downloading, run this command to remove the quarantine attribute:

xattr -c ~/Downloads/Verbatim.Studio-<version>-arm64.dmg

Replace <version> with the version number you downloaded (e.g., 0.26.21).

Development Setup (Build from Source)

Prerequisites

  • Python 3.12+
  • Node.js 20+
  • pnpm 9+
  • ffmpeg 7+

Clone and Install

# Clone the repository
git clone https://github.com/JongoDB/verbatim-studio.git
cd verbatim-studio

# Install Node dependencies
pnpm install

# Set up Python environment
cd packages/backend
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -e ".[dev]"
cd ../..

Run Development Servers

# Run both frontend and backend
pnpm dev

# Or run separately
# Terminal 1 - Backend
cd packages/backend && source .venv/bin/activate
python -m uvicorn api.main:app --reload --port 8000

# Terminal 2 - Frontend
cd packages/frontend && pnpm dev

Open http://localhost:5173 in your browser.


Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Frontend (React)                        │
│   Dashboard • Recordings • Projects • Documents • Search    │
├─────────────────────────────────────────────────────────────┤
│                     Backend (FastAPI)                       │
│                                                             │
│    ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│    │  Database   │  │Transcription│  │     AI      │       │
│    │   Adapter   │  │   Engine    │  │   Service   │       │
│    └──────┬──────┘  └──────┬──────┘  └──────┬──────┘       │
│           │                │                │               │
│       SQLite          WhisperX         llama.cpp           │
│                      MLX Whisper        Granite            │
└─────────────────────────────────────────────────────────────┘

Tech Stack

Layer            Technology
Frontend         React 18, TypeScript, Vite, Tailwind CSS
Backend          FastAPI, SQLAlchemy, Pydantic
Transcription    WhisperX, MLX Whisper, pyannote.audio
AI/LLM           llama-cpp-python, sentence-transformers
OCR              Qwen2-VL (vision-language model)
Audio            WaveSurfer.js, ffmpeg
Storage          SQLite, Google Drive, OneDrive, Dropbox

Roadmap

Current Release (v0.26.x)

Core Platform

  • Native macOS desktop app (Apple Silicon optimized)
  • Native Windows desktop app (NVIDIA CUDA optimized)
  • Local AI transcription with speaker identification
  • Live transcription from microphone
  • Video file support with automatic audio extraction
  • Automatic update notifications with release notes

AI Assistant (Max)

  • Multi-document conversations with chat history
  • Semantic search across all content
  • Platform guidance and help
  • OCR for scanned documents and images

Editing & Organization

  • Clickable timestamps and playback keyboard shortcuts
  • Segment highlights and bookmarks
  • In-transcript search with navigation
  • Inline document annotations
  • Project-based organization with real filesystem folders
  • Bulk operations

Storage & Export

  • Local, network, and cloud storage options
  • Google Drive, OneDrive, Dropbox integration
  • Export to TXT, SRT, VTT, JSON

Enterprise Tier (Planned)

  • External LLM connections (Ollama, OpenAI, self-hosted)
  • Multi-user with role-based access control
  • Meeting bots for Teams, Google Meet, and Zoom
  • PostgreSQL database support
  • Administration dashboard
  • Audit logging and compliance reports
  • Secure mobile access to self-hosted servers

Configuration

Most settings are available through the Settings page in the app.

AI Models

On first use, Verbatim Studio downloads the AI models you select:

Model               Size       Purpose
Whisper (base)      ~150 MB    Transcription (pre-bundled; configurable up to large-v3)
pyannote            ~200 MB    Speaker diarization
nomic-embed-text    ~550 MB    Semantic search (pre-bundled)
IBM Granite 3.3     ~5 GB      AI assistant and transcript summarization
Qwen2 VL            ~4.4 GB    Image OCR and document parsing

Models are cached locally and only download once.

Environment Variables (Developers)

Create a .env file in packages/backend/:

# Core settings
VERBATIM_MODE=basic
VERBATIM_DATA_DIR=~/.verbatim-studio

# Transcription
VERBATIM_WHISPERX_MODEL=base
VERBATIM_WHISPERX_DEVICE=auto

# OAuth (optional - for cloud storage)
VERBATIM_GOOGLE_CLIENT_ID=your-client-id
VERBATIM_GOOGLE_CLIENT_SECRET=your-secret
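
All variables share the VERBATIM_ prefix. As a rough sketch of how that prefix convention typically maps to typed settings with pydantic-settings (the backend uses Pydantic, but this is not the project's actual config module):

from pydantic_settings import BaseSettings, SettingsConfigDict

class VerbatimSettings(BaseSettings):
    """Hypothetical settings model; field names mirror the variables above minus the prefix."""
    model_config = SettingsConfigDict(env_prefix="VERBATIM_", env_file=".env")

    mode: str = "basic"
    data_dir: str = "~/.verbatim-studio"
    whisperx_model: str = "base"
    whisperx_device: str = "auto"

settings = VerbatimSettings()
print(settings.whisperx_model)  # "base" unless VERBATIM_WHISPERX_MODEL overrides it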

Contributing

Contributions are welcome. See the Development Setup section to get started.

Building the Desktop App

# Build for your current platform
pnpm build:electron

# Build for specific platform
pnpm build:electron:mac
pnpm build:electron:win
pnpm build:electron:linux

Running Tests

# Backend tests
cd packages/backend && pytest

# Frontend tests
cd packages/frontend && pnpm test

# Type checking
cd packages/frontend && pnpm typecheck

License

MIT License. See LICENSE for details.


Verbatim Studio — Transcription you can trust.

Report Issue · Discussions
