Overview
A minimalist Python CLI tool for real-time speech transcription that integrates seamlessly with the InterBrain Obsidian plugin. Uses whisper_streaming with LocalAgreement-2 policy to append timestamped, duplicate-free transcripts to markdown files for post-call summarization and semantic search integration.
Vision
Enable effortless voice-to-text capture during conversations, meetings, and ideation sessions. Transcripts appear seamlessly in markdown files with timestamps, ready for semantic search indexing and DreamNode integration. No UI complexity - just terminal commands orchestrated by Obsidian.
Architectural Decision: Deferred Marketplace Compatibility
Current Implementation Strategy: Build as self-contained vertical slice feature with Python scripts included in src/features/realtime-transcription/scripts/.
Rationale:
- Faster Development: No external repo coordination needed during initial development
- Easy Testing: Scripts co-located with TypeScript code for rapid iteration
- Future-Proof: Minimal refactoring cost to split later (~2-3 hours)
- Cross-Platform: Works on macOS, Windows, Linux with Python 3.8+
Path to Marketplace Compatibility (Optional Future):
If we later decide to pursue Obsidian marketplace listing:
- Extract scripts/ directory to separate interbrain-transcription-extension repository
- Update transcription-service.ts path resolution to check vault .interbrain/extensions/ first
- Add installation command for extension setup (git clone + pip install instructions)
- Submit core plugin (without Python scripts) to marketplace
What Stays The Same (No Rewrite):
- ✅ All TypeScript code (commands, services, store)
- ✅ Python script logic
- ✅ Process spawning (just path argument changes)
- ✅ Feature architecture (self-contained vertical slice)
What Changes (Minimal Refactoring):
- 📦 Move Python scripts to separate repo
- 🔧 Update path resolution logic (~50 lines)
- 📝 Add installation command (~100 lines)
Current Focus: Build robust, cross-platform transcription feature. Defer packaging decisions until functionality proven.
Technical Architecture
Core Components
1. Python CLI Script: interbrain-transcribe.py
- Uses whisper_streaming library (UFAL - IWSLT 2022 winner)
- LocalAgreement-2 policy prevents duplicates and handles retroactive corrections
- Captures microphone audio via sounddevice library
- Appends timestamped, finalized transcripts to markdown file
- Runs as background process managed by Obsidian plugin
2. Obsidian Plugin Integration
- Two command palette commands: Start/Stop Real-Time Transcription
- Uses Node.js child_process.spawn() for process management
- Real-time stdout/stderr monitoring via event listeners
- Process reference stored in Zustand for lifecycle management
- Automatic cleanup on plugin unload
Why This Approach?
Decision: Python CLI over Tauri
- Simplicity: ~150 lines Python vs ~2000+ lines Rust/Tauri/UI
- Development Speed: 2-3 days vs 2-3 weeks
- Proven Technology: whisper_streaming reports 3.3s latency in its published benchmark
- Zero UI Overhead: Terminal-based, managed by Obsidian
- Easy Maintenance: Pure Python, no build system
Decision: whisper_streaming over raw whisper.cpp
- LocalAgreement-2 Built-In: Prevents duplicates without manual implementation
- Seamless Retroactive Corrections: Like macOS native dictation
- Production-Ready: Used in live multilingual conference transcription
- Self-Adaptive Latency: Adjusts based on source complexity
Decision: whisper.cpp over Nvidia Parakeet
- Accuracy: Whisper remains gold standard (Parakeet trades accuracy for speed)
- Mature Ecosystem: Extensive documentation, large community
- Apple Silicon Support: Metal acceleration, 30x realtime speeds
- Streaming Support: whisper_streaming built specifically for it
Implementation Specification
Python CLI Script
Command-Line Interface:
# Start transcription
python3 interbrain-transcribe.py --output "/path/to/transcript.md"

# With optional parameters
python3 interbrain-transcribe.py \
  --output "/path/to/transcript.md" \
  --device "MacBook Pro Microphone" \
  --model "small.en"
Arguments:
--output (required): Path to output markdown file (absolute or relative)
--device (optional): Microphone device ID/name for selection
--model (optional): Whisper model size (tiny, base, small.en, medium, large)
--language (optional): Language code (default: auto-detect)
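The argument list above maps naturally onto Python's standard argparse module. The following is a minimal sketch, not the script's actual code: the helper name build_parser is hypothetical, and the default of small.en for --model is an assumption taken from the example invocations, since the spec does not state a default.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the four documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="interbrain-transcribe.py",
        description="Append real-time transcripts to a markdown file.",
    )
    # --output is the only required flag
    parser.add_argument("--output", required=True,
                        help="Path to output markdown file (absolute or relative)")
    # None means "use the system default input device"
    parser.add_argument("--device", default=None,
                        help="Microphone device ID/name")
    # Assumed default: the spec's examples all use small.en
    parser.add_argument("--model", default="small.en",
                        help="Whisper model size (tiny, base, small.en, medium, large)")
    # None stands for auto-detect
    parser.add_argument("--language", default=None,
                        help="Language code (default: auto-detect)")
    return parser
```

Calling build_parser().parse_args(["--output", "notes.md"]) yields a namespace with output set and the optional values left at their defaults.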
Output Format:
[2025-10-03 14:32:15] Hello my name is David I'm working on InterBrain transcription system.

[2025-10-03 14:32:47] The whisper streaming library uses local agreement to prevent duplicates.

[2025-10-03 14:33:12] This is a seamless experience just like macOS native dictation.
Design Decisions:
- Timestamped entries: ISO 8601 format [YYYY-MM-DD HH:MM:SS] for easy parsing
- Blank line separator: Natural reading flow, easy semantic segmentation
- Text extraction: Simple regex \[.*?\] (.*) to strip timestamps for search indexing
- No speaker diarization: Deferred to future enhancement (Phase 2)
- Always append: Never overwrite existing content
- Create if missing: Automatically create output file and parent directories
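The design decisions above (timestamp format, blank-line separator, append-only writes, auto-created parent directories, regex-based text extraction) can be sketched in a few lines of standard-library Python. The function names format_entry, append_entry, and strip_timestamp are hypothetical and chosen only for illustration; the regex is the one quoted in the list.

```python
import re
from datetime import datetime
from pathlib import Path

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"        # renders as YYYY-MM-DD HH:MM:SS
ENTRY_PATTERN = re.compile(r"\[.*?\] (.*)")   # regex from the design decisions


def format_entry(text: str, when: datetime) -> str:
    """Return one transcript line such as '[2025-10-03 14:32:15] Hello'."""
    return f"[{when.strftime(TIMESTAMP_FORMAT)}] {text.strip()}"


def append_entry(path: Path, text: str, when: datetime) -> None:
    """Append an entry, creating the file and parent directories if missing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" never truncates, so existing content is preserved
    with path.open("a", encoding="utf-8") as handle:
        # The trailing blank line separates this entry from the next one
        handle.write(format_entry(text, when) + "\n\n")


def strip_timestamp(line: str) -> str:
    """Return the text after the leading timestamp, or the line unchanged."""
    match = ENTRY_PATTERN.match(line)
    return match.group(1) if match else line
```

Passing the timestamp in as an argument, rather than calling datetime.now() inside the writer, keeps the formatting logic easy to unit test.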
Terminal Output (for Obsidian monitoring):
🎙️ Starting transcription to: /Users/david/transcript.md
📝 Model: small.en
🔴 Recording... (Ctrl+C to stop)
✅ Hello my name is David
✅ I'm working on InterBrain transcription system
⚠️ Audio buffer overrun (dropped 0.05s)
✅ This is a test of the transcription system
⏹️ Transcription stopped
Error Handling:
- Microphone access denied: ❌ Microphone permission denied. Grant access in System Settings → Privacy & Security → Microphone.
- Output file unwritable: ❌ Cannot write to [path]. Check file permissions.
- Model download needed: 📥 Downloading whisper model 'small.en' (first run, ~500MB)...
- Audio device not found: ❌ Device 'X' not found. Available devices:\n - MacBook Pro Microphone\n - External USB Mic
- whisper_streaming import error: ❌ whisper_streaming not installed. Run: pip install whisper-streaming
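As an illustration of the "audio device not found" case, here is a small sketch that resolves a --device value against a list of device names and produces the message format shown above. The names DeviceNotFoundError and resolve_device are hypothetical. In the real script the list of names would presumably come from sounddevice (for example via its query_devices() function); here it is passed in as a plain list so the sketch runs without audio hardware.

```python
from typing import Sequence


class DeviceNotFoundError(Exception):
    """Raised when the requested microphone cannot be matched (hypothetical)."""


def resolve_device(requested: str, available: Sequence[str]) -> int:
    """Return the index of the requested device, given by index or exact name."""
    # Accept a numeric ID such as "0" when it is within range
    if requested.isdigit() and int(requested) < len(available):
        return int(requested)
    # Otherwise look for an exact name match
    for index, name in enumerate(available):
        if name == requested:
            return index
    # No match: list the alternatives so the user can correct the flag
    listing = "\n".join(f" - {name}" for name in available)
    raise DeviceNotFoundError(
        f"❌ Device '{requested}' not found. Available devices:\n{listing}"
    )
```

Raising a dedicated exception lets the entry point print the message to stderr and exit non-zero, which is what the plugin's stderr and exit-code monitoring expects.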
Obsidian Plugin Integration
New Commands File: src/features/realtime-transcription/commands/transcription-commands.ts
Command 1: Start Real-Time Transcription
plugin.addCommand({
  id: 'start-realtime-transcription',
  name: 'Start Real-Time Transcription',
  hotkeys: [{ modifiers: ['Ctrl', 'Shift'], key: 't' }],
  callback: async () => {
    const store = useInterBrainStore.getState();

    // Check if already running
    if (store.transcriptionProcess) {
      uiService.showWarning('Transcription already running');
      return;
    }

    // Validate active file is markdown
    const activeFile = plugin.app.workspace.getActiveFile();
    if (!activeFile || !activeFile.path.endsWith('.md')) {
      uiService.showError('Please open a markdown file for transcript output');
      return;
    }

    // Get full file path
    const transcriptPath = vaultService.getFullPath(activeFile.path);

    // Resolve the script path. manifest.dir is vault-relative, so it is made
    // absolute first (assumes getFullPath also accepts directory paths).
    const { spawn } = require('child_process');
    const scriptPath = require('path').join(
      vaultService.getFullPath(plugin.manifest.dir),
      'features',
      'realtime-transcription',
      'scripts',
      'interbrain-transcribe.py'
    );

    // Spawn Python process (named `child` to avoid shadowing Node's global `process`)
    const child = spawn('python3', [
      scriptPath,
      '--output', transcriptPath,
      '--model', 'small.en'
    ]);

    // Spawn failures (e.g. python3 not on PATH) arrive as an 'error' event
    child.on('error', (err: Error) => {
      console.error(`[Transcription Error] ${err.message}`);
      store.setTranscriptionProcess(null);
      uiService.showError('Could not start python3 - see console');
    });

    // Monitor stdout for status updates
    child.stdout.on('data', (data: Buffer) => {
      const output = data.toString().trim();
      console.log(`[Transcription] ${output}`);
    });

    // Monitor stderr for errors
    child.stderr.on('data', (data: Buffer) => {
      const error = data.toString().trim();
      console.error(`[Transcription Error] ${error}`);
      if (error.includes('permission denied')) {
        uiService.showError('Microphone permission denied');
      } else {
        uiService.showError('Transcription error - see console');
      }
    });

    // Handle process exit. If the process was killed by a signal,
    // `code` is null and `signal` names the signal.
    child.on('close', (code: number | null, signal: string | null) => {
      console.log(`Transcription process exited with code ${code}, signal ${signal}`);
      store.setTranscriptionProcess(null);
      if (code === 0 || signal === 'SIGTERM') {
        uiService.showSuccess('Transcription stopped');
      } else {
        uiService.showError(`Transcription failed (exit code: ${code})`);
      }
    });

    // Store process reference
    store.setTranscriptionProcess(child);
    uiService.showSuccess('🎙️ Transcription started');
  }
});
Command 2: Stop Real-Time Transcription
plugin.addCommand({
  id: 'stop-realtime-transcription',
  name: 'Stop Real-Time Transcription',
  // No default hotkey: Obsidian binds a hotkey to a single command, so reusing
  // Ctrl+Shift+T here would conflict with the start command instead of toggling.
  callback: () => {
    const store = useInterBrainStore.getState();
    const child = store.transcriptionProcess;
    if (!child) {
      uiService.showWarning('No active transcription');
      return;
    }

    console.log('Stopping transcription process...');

    // Send SIGTERM for graceful shutdown
    child.kill('SIGTERM');

    // Force kill after 5 seconds if still running. A process that already died
    // from a signal has exitCode === null, so signalCode must be checked too.
    setTimeout(() => {
      if (child.exitCode === null && child.signalCode === null) {
        console.warn('Force killing transcription process');
        child.kill('SIGKILL');
      }
    }, 5000);

    uiService.showInfo('Stopping transcription...');
  }
});
Zustand Store Extension:
import type { ChildProcess } from 'child_process';

interface InterBrainStore {
  // ... existing state ...
  transcriptionProcess: ChildProcess | null;
  setTranscriptionProcess: (process: ChildProcess | null) => void;
}
Plugin Cleanup (main.ts):
onunload() {
  // Kill transcription process on plugin unload
  const store = useInterBrainStore.getState();
  if (store.transcriptionProcess) {
    console.log('Cleaning up transcription process...');
    store.transcriptionProcess.kill('SIGTERM');
  }
}
Dependencies
Python Dependencies (requirements.txt):
whisper-streaming>=0.1.0
sounddevice>=0.4.6
numpy>=1.24.0
faster-whisper>=0.10.0 # Backend for whisper_streaming
Installation:
pip install -r src/features/realtime-transcription/scripts/requirements.txt
First-Time Model Download:
The whisper model will auto-download on first run (~500MB for small.en). User sees progress:
📥 Downloading whisper model 'small.en' (first run)...
⬇️ Progress: 45% (225MB / 500MB)
✅ Model downloaded successfully
Implementation Plan
Phase 1: Python CLI Script (~1 day)
Tasks:
- Create src/features/realtime-transcription/scripts/interbrain-transcribe.py with basic structure
- Integrate whisper_streaming library with LocalAgreement-2
- Implement audio capture with sounddevice (16kHz mono)
- Add file writing with ISO 8601 timestamps
- Implement command-line argument parsing
- Add graceful error handling with user-friendly messages
- Test LocalAgreement-2 duplicate prevention manually
Acceptance Criteria:
Phase 2: Obsidian Plugin Integration (~1 day)
Tasks:
- Create src/features/realtime-transcription/commands/transcription-commands.ts
- Add two command palette commands (start/stop)
- Extend Zustand store with transcriptionProcess state
- Implement process spawning with stdout/stderr monitoring
- Add process cleanup in plugin onunload
- Register commands in main.ts
- Test full start/stop workflow
Acceptance Criteria:
Phase 3: Testing & Documentation (~1 day)
Tasks:
- Test on macOS Apple Silicon (primary target)
- Test error scenarios (no mic, permission denied, file locked)
- Validate transcription accuracy with real speech
- Test rapid start/stop cycles
- Verify no memory leaks or zombie processes
- Write README with installation/usage instructions
- Update CLAUDE.md with transcription workflow
Acceptance Criteria:
Phase 4: Polish & Future Enhancements (Optional)
Potential Future Improvements:
- Multiple language support: CLI flag for language selection
- Speaker diarization: Integrate pyannote.audio for multi-speaker transcription
- Custom model paths: Allow local whisper model file selection
- Pause/resume: Two-way stdin communication for pause control
- Menu bar indicator: Optional Python rumps app for visual status
- Auto-indexing: Trigger semantic search indexing after transcription
- Session metadata: Include session start/end markers in transcript
File Structure
InterBrain/
├── src/
│   └── features/
│       └── realtime-transcription/    # Self-contained feature
│           ├── README.md              # Feature documentation
│           ├── index.ts               # Feature exports
│           ├── commands/
│           │   └── transcription-commands.ts
│           ├── services/
│           │   └── transcription-service.ts
│           ├── store/
│           │   └── transcription-store.ts
│           ├── scripts/               # Python scripts (co-located)
│           │   ├── interbrain-transcribe.py
│           │   └── requirements.txt
│           ├── types/
│           │   └── transcription-types.ts
│           └── tests/
│               └── transcription-service.test.ts
└── CLAUDE.md                          # Updated with integration notes
Testing Strategy
Unit Tests (Python)
- File writing and timestamp formatting
- Path validation (absolute/relative)
- Error message generation
- Command-line argument parsing
Integration Tests (Python + Audio)
- whisper_streaming integration
- Audio capture from mock device
- LocalAgreement-2 behavior verification
- Process signal handling (SIGTERM, SIGINT)
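For the signal-handling item, one common pattern is to have SIGTERM and SIGINT set a shared flag that the capture loop polls, so the final transcript lines are flushed before exit. The sketch below assumes that pattern; make_stop_handler and install_stop_handlers are hypothetical names, not part of the spec. Note that on Windows, Node's kill('SIGTERM') terminates the child unconditionally, so a handler like this would only run on macOS and Linux.

```python
import signal
import threading


def make_stop_handler(stop_event: threading.Event):
    """Return a signal handler that requests a graceful stop (hypothetical)."""
    def handler(signum, frame):
        # Only set a flag here; the main loop does the actual cleanup
        stop_event.set()
    return handler


def install_stop_handlers(stop_event: threading.Event) -> None:
    """Register the handler for Ctrl+C (SIGINT) and the plugin's SIGTERM."""
    handler = make_stop_handler(stop_event)
    # signal.signal must be called from the main thread
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

A capture loop written as "while not stop_event.is_set(): ..." then exits cleanly, and the script can return exit code 0 so the plugin reports a normal stop.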
Manual Testing (End-to-End)
- Real dictation sessions with natural speech patterns
- Various microphone devices
- File permission edge cases
- Rapid start/stop cycles
- Plugin reload scenarios
Performance Testing
- CPU usage monitoring during transcription
- Memory leak detection (run for 30+ minutes)
- Latency measurement (speech to file write)
- Concurrent Obsidian usage (no UI lag)
Success Criteria
MVP Definition of Done
Quality Gates
- Zero lint errors/warnings
- TypeScript strict mode compliance
- All manual test scenarios pass
- No memory leaks after 30-minute session
- CPU usage <50% during active transcription
- Process cleanup leaves no zombies
Dependencies & Prerequisites
System Requirements
- macOS: 10.15+ (Apple Silicon preferred)
- Windows: 10+ (with Python 3.8+)
- Linux: Any modern distro (with Python 3.8+)
- Python: 3.8+ (with pip)
- Microphone: Any USB or built-in mic
- Disk Space: ~1GB (for whisper models)
Python Environment
# Install dependencies
pip install -r src/features/realtime-transcription/scripts/requirements.txt
# First run downloads model (~500MB)
python3 src/features/realtime-transcription/scripts/interbrain-transcribe.py --output test.md
Obsidian Plugin
- Node.js child_process module (built-in)
- Zustand store access
- Command palette registration
Future Integration Points
Semantic Search (Epic 5)
- Auto-index transcripts after session ends
- Extract text without timestamps for embedding
- Link transcript nodes to DreamNodes
Copilot Mode (Future Epic)
- Real-time transcription during DreamSong creation
- Voice-driven canvas node creation
- Spoken relationship mapping
DreamWeaving (Epic 6)
- Transcribe video calls for DreamSong content
- Auto-generate DreamTalk summaries from transcripts
- Voice annotation for canvas nodes
Technical References
Research Links
Related Issues
Notes
Why Not Tauri?
We originally considered a Tauri system tray app but found it added unnecessary complexity:
- No UI actually needed (Obsidian commands suffice)
- Terminal output via stdout is perfect for monitoring
- Python subprocess avoids Rust/IPC overhead
- Much faster development cycle
Why LocalAgreement-2?
Prevents the duplicate sentence problem that plagues naive streaming approaches:
- Commits text only once 2 consecutive transcription updates agree on the same prefix
- Handles retroactive corrections seamlessly (like macOS dictation)
- Production-proven at IWSLT 2022 conference
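The core idea of LocalAgreement-2 can be illustrated with a toy word-level sketch. This is a simplification under stated assumptions, not the whisper_streaming implementation: it assumes each update is a full hypothesis for the utterance so far, given as a list of words, and it ignores timestamps and audio buffer trimming. The class name LocalAgreement2 and the helper common_prefix are hypothetical.

```python
from typing import List


def common_prefix(first: List[str], second: List[str]) -> List[str]:
    """Return the longest shared prefix of two word lists."""
    prefix = []
    for left, right in zip(first, second):
        if left != right:
            break
        prefix.append(left)
    return prefix


class LocalAgreement2:
    """Toy model: commit words once two consecutive hypotheses agree on them."""

    def __init__(self) -> None:
        self.committed: List[str] = []   # words already written out
        self.previous: List[str] = []    # hypothesis from the last update

    def update(self, hypothesis: List[str]) -> List[str]:
        """Take the newest hypothesis and return only the newly committed words."""
        agreed = common_prefix(self.previous, hypothesis)
        # Skip what was committed earlier so nothing is emitted twice
        newly_committed = agreed[len(self.committed):]
        self.committed.extend(newly_committed)
        self.previous = hypothesis
        return newly_committed
```

Feeding it "hello my", then "hello my name", then "hello my name is" commits "hello my" on the second update and "name" on the third. A trailing word the model is still revising is never written until a later update confirms it, which is why the transcript file does not accumulate duplicates or half-corrected fragments.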
Performance Expectations
- Latency: 3.3 seconds (whisper_streaming benchmark)
- Accuracy: Whisper-grade (best in class)
- CPU: <50% on Apple Silicon M1
- Memory: ~150-200MB for Python process
- Disk I/O: Negligible (appending text to file)