
LocalTranscribe

Privacy-first audio transcription with speaker diarization. Entirely offline.

Transform recordings into detailed transcripts showing who said what and when - all on your Mac, with complete privacy.


Why LocalTranscribe?

| Feature | LocalTranscribe | Cloud Services |
|---|---|---|
| Privacy | 100% offline processing | Data uploaded to third-party servers |
| Cost | Free forever | $10-50/month subscription |
| Speaker Identification | Automatic speaker detection | Often extra cost or unavailable |
| Speed (Apple Silicon) | Real-time to 2x audio length | Depends on upload/download speed |
| Quality | OpenAI Whisper models | Varies by provider |
| Data Ownership | All files stay on your machine | Depends on provider terms |

Perfect for: Researchers, podcasters, journalists, legal professionals, content creators - anyone who needs accurate transcripts with speaker labels and complete data privacy.


Features

Core Features

  • 🔒 Complete Privacy - All processing happens locally on your machine
  • 🎯 Speaker Diarization - Automatic detection of who spoke when
  • 🏷️ Speaker Labeling - Replace speaker IDs with actual names
  • 🧙‍♂️ Guided Wizard - Dummy-proof setup for beginners
  • 📂 Interactive File Browser - Navigate folders and select files with arrow keys
  • 🔑 Smart Token Management - One-time HuggingFace token setup with validation
  • 📝 High Accuracy - Powered by OpenAI's Whisper models (defaults to medium)
  • ⚡️ Apple Silicon Optimized - Auto-detects and uses MLX on M1/M2/M3/M4 Macs
  • 🚀 Simple CLI - Zero commands needed - just run localtranscribe
  • 📦 Python SDK - Integrate transcription into your applications
  • 🔄 Batch Processing - Process multiple files simultaneously
  • 📊 Multiple Formats - Output as TXT, JSON, SRT, or Markdown

Quality Enhancements (v3.1.1)

  • 🎯 Intelligent Segment Processing - 50-70% reduction in false speaker switches
  • 🧠 Enhanced Speaker Mapping - 30-40% better speaker attribution accuracy
  • 🔊 Audio Quality Analysis - Pre-processing quality assessment with SNR calculation
  • ✅ Quality Gates System - Per-stage validation with actionable recommendations
  • 📚 Domain Dictionaries - 360+ specialized terms across 8 domains (military, technical, business, medical, legal, academic, common, entities)
  • 🔤 Acronym Expansion - 180+ definitions with intelligent context-aware disambiguation
  • 🧠 Context-Aware Matching - spaCy NER for intelligent acronym disambiguation (IP, PR, AI, OR, PI)
  • ⚡ High-Performance Matching - FlashText integration for 10-100x faster dictionary lookups
  • ✨ Typo Tolerance - RapidFuzz fuzzy matching for automatic typo correction
  • 🤖 Auto-Download Models - Automatic spaCy model management with user prompts
  • 📊 Real-Time Progress Tracking - Live progress bars and time estimates during transcription

Quick Start

Install from PyPI

Package: pypi.org/project/localtranscribe

pip install localtranscribe

Setup HuggingFace Token (One-Time)

Speaker diarization requires a free HuggingFace account. The wizard will guide you through setup:

  1. Create account & get token: https://huggingface.co/settings/tokens
  2. Accept the required pyannote model licenses on HuggingFace (click "Agree" on each)
  3. Enter token when prompted - The wizard will:
    • Validate your token format
    • Auto-save to .env file
    • Never ask again after successful setup

Manual setup (optional):

echo "HUGGINGFACE_TOKEN=hf_your_token_here" > .env

Transcribe Audio

🎯 The Simplest Way (Recommended for Everyone!):

# Option 1: Browse for files interactively
localtranscribe

# Option 2: Provide file path directly
localtranscribe your-audio.mp3

Both methods start the guided wizard that walks you through all options interactively. The interactive browser lets you navigate folders and select files with arrow keys. Perfect for beginners, fast for everyone!

⚡️ Direct Mode (For Power Users):

localtranscribe process your-audio.mp3

🎯 Advanced with All Features:

localtranscribe process your-audio.mp3 --labels speakers.json --proofread

Done! Results appear in ./output/ with speaker labels, timestamps, and full transcript.


Installation

Option 1: Install from PyPI (Recommended)

# Basic installation
pip install localtranscribe

# For Apple Silicon optimization (recommended for M1/M2/M3/M4)
pip install localtranscribe[mlx]

# For NVIDIA GPU support
pip install localtranscribe[faster]

# Install all optional dependencies
pip install localtranscribe[all]

Option 2: Install from Source

# Clone repository
git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install in development mode
pip install -e .

Verify Installation

localtranscribe doctor

This command checks your system configuration and reports any issues.


Usage Examples

Guided Wizard (Now the Default!)

# Option 1: Interactive file browser
localtranscribe

# Option 2: Provide file path directly
localtranscribe interview.mp3

The wizard will guide you through:

  • Interactive file selection (if no file provided)
  • HuggingFace token setup with validation and auto-save
  • Quality vs speed preferences (defaults to medium model)
  • Speaker detection options
  • Speaker labeling setup
  • Automatic proofreading
  • Output location

Note: The wizard runs automatically when you run localtranscribe or provide an audio file. Use localtranscribe process for direct mode.

Simple Mode

# Smart defaults with minimal prompts
localtranscribe process meeting.mp3 --simple

Simple mode:

  • Auto-detects speaker labels file if present
  • Prompts for speaker count if unknown
  • Asks about proofreading preferences
  • Shows detailed progress

Basic Transcription

# Transcribe with automatic settings
localtranscribe process meeting.mp3

# Specify number of speakers for better accuracy
localtranscribe process interview.wav --speakers 2

# Use larger model for higher quality
localtranscribe process podcast.m4a --model medium

# Save to custom location
localtranscribe process audio.mp3 --output ./results/

Speaker Labeling

# Create a speaker labels file (speakers.json):
{
  "SPEAKER_00": "John Smith",
  "SPEAKER_01": "Jane Doe"
}

# Apply labels during processing
localtranscribe process meeting.mp3 --labels speakers.json

# Save speaker IDs for later labeling
localtranscribe process meeting.mp3 --save-labels speakers.json
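Under the hood, label application is just a substitution from diarization IDs to display names. A minimal Python sketch of the idea (the segment structure here is illustrative, not LocalTranscribe's internal format):

```python
# speakers.json maps diarization IDs to real names; applying it is a
# simple substitution over the transcript's segments.
labels = {"SPEAKER_00": "John Smith", "SPEAKER_01": "Jane Doe"}

segments = [
    {"speaker": "SPEAKER_00", "text": "Hello, welcome to the show."},
    {"speaker": "SPEAKER_01", "text": "Thanks for having me."},
]

# Unknown IDs fall back to their original label
labeled = [
    {**seg, "speaker": labels.get(seg["speaker"], seg["speaker"])}
    for seg in segments
]
print(labeled[0]["speaker"])  # John Smith
```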

Automatic Proofreading

# Enable proofreading with default rules
localtranscribe process meeting.mp3 --proofread

# Use thorough proofreading
localtranscribe process meeting.mp3 --proofread --proofread-level thorough

# Custom proofreading rules
localtranscribe process meeting.mp3 --proofread --proofread-rules my-rules.json

# NEW in v3.1.1: Enable domain-specific dictionaries (360+ specialized terms)
localtranscribe process meeting.mp3 --proofread --domains technical business legal

# NEW in v3.1.1: Enable context-aware acronym expansion (180+ definitions)
localtranscribe process meeting.mp3 --proofread --expand-acronyms --context-aware

# Check NLP model status and download if needed
localtranscribe check-models
localtranscribe check-models --download en_core_web_sm

Proofreading fixes:

  • Technical terms (API, JavaScript, Python, AWS, Docker, etc.)
  • Business terms (CEO, KPI, B2B, ROI, etc.)
  • Military terms (Captain, Colonel, battalion, etc.)
  • Medical terms (procedures, medications, conditions)
  • Common homophones (your/you're, their/there)
  • Contractions and grammar
  • Excessive repetitions
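To give a feel for how rule-based fixes like these work, here is a toy sketch (not LocalTranscribe's actual rule engine) that normalizes the casing of known terms and collapses repeated words:

```python
import re

# Tiny sample rule table; the real ruleset has 100+ entries.
TERM_CASING = {"javascript": "JavaScript", "aws": "AWS", "ceo": "CEO"}

def proofread(text: str) -> str:
    # Normalize casing of known technical/business terms
    for wrong, right in TERM_CASING.items():
        text = re.sub(rf"\b{wrong}\b", right, text, flags=re.IGNORECASE)
    # Collapse immediate word repetitions ("the the" -> "the")
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)
    return text

print(proofread("the the ceo uses javascript"))  # the CEO uses JavaScript
```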

v3.1.1 Enhancements:

  • Domain Dictionaries: 360+ specialized terms across 8 domains (military, technical, business, medical, legal, academic, common, entities)
  • Acronym Expansion: 180+ definitions with intelligent context-aware disambiguation
  • Context-Aware Matching: spaCy NER for intelligent acronym resolution (IP, PR, AI, OR, PI)
  • High-Performance Matching: FlashText integration for 10-100x faster dictionary lookups
  • Typo Tolerance: RapidFuzz fuzzy matching for automatic typo correction
  • Auto-Download Models: Automatic spaCy model management with user prompts
  • Multiple Formats: Parenthetical API (Application Programming Interface), replacement, or footnote styles
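The FlashText/RapidFuzz pipeline boils down to exact keyword replacement plus a fuzzy fallback for near-misses. This stdlib-only sketch illustrates the fuzzy half, with difflib's similarity ratio standing in for RapidFuzz scoring; the dictionary is a made-up sample:

```python
from difflib import get_close_matches

# Stand-in dictionary; the real domain dictionaries hold 360+ terms.
DICTIONARY = ["Kubernetes", "JavaScript", "battalion", "stethoscope"]

def correct(word: str, cutoff: float = 0.85) -> str:
    # Return the closest dictionary term above the cutoff, else the
    # original word unchanged.
    match = get_close_matches(word, DICTIONARY, n=1, cutoff=cutoff)
    return match[0] if match else word

print(correct("Kubernates"))  # Kubernetes
print(correct("dog"))         # dog (no close match)
```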

Batch Processing

# Process entire folder
localtranscribe batch ./audio-files/

# Process with multiple workers
localtranscribe batch ./recordings/ --workers 4

# With custom settings
localtranscribe batch ./files/ --model small --output ./transcripts/

Single-Speaker Content

# Skip speaker detection for faster processing
localtranscribe process lecture.mp3 --skip-diarization

Advanced Options

localtranscribe process audio.mp3 \
  --model medium \
  --speakers 3 \
  --language en \
  --format txt json srt \
  --output ./results/ \
  --verbose

# --model     Model size: tiny|base|small|medium|large
# --speakers  Number of speakers (if known)
# --language  Force a specific language
# --format    Output formats to generate
# --output    Output directory
# --verbose   Show detailed progress

Using the Python SDK

Basic Usage:

from localtranscribe import LocalTranscribe

# Initialize with options
lt = LocalTranscribe(
    model_size="base",
    num_speakers=2,
    output_dir="./transcripts"
)

# Process single file
result = lt.process("meeting.mp3")

# Access results
print(f"Transcript: {result.transcript}")
print(f"Speakers: {result.num_speakers}")
print(f"Duration: {result.duration}s")

# Access detailed segments
for segment in result.segments:
    print(f"[{segment.speaker}] {segment.text}")

# Batch processing
results = lt.process_batch("./audio-files/", max_workers=4)
print(f"Completed: {results.successful}/{results.total}")

NEW in v3.1 - Advanced Pipeline with Quality Features:

from localtranscribe.pipeline import PipelineOrchestrator

# Enable all quality enhancements
pipeline = PipelineOrchestrator(
    audio_file="meeting.wav",
    output_dir="./output",
    # Phase 1: Segment Processing
    enable_segment_processing=True,
    use_speaker_regions=True,
    # Phase 2: Audio Analysis & Quality Gates
    enable_audio_analysis=True,
    enable_quality_gates=True,
    quality_report_path="./quality_report.txt",
    # Phase 2: Enhanced Proofreading
    enable_proofreading=True,
    proofreading_domains=["technical", "business"],
    enable_acronym_expansion=True,
    verbose=True
)

result = pipeline.run()

NEW in v3.1 - Standalone Quality Analysis:

# Audio Quality Analysis
from localtranscribe.audio import AudioAnalyzer

analyzer = AudioAnalyzer(verbose=True)
analysis = analyzer.analyze("audio.wav")
print(f"Quality: {analysis.quality_level.value}")
print(f"SNR: {analysis.snr_db:.1f} dB")
print(f"Recommended Model: {analysis.recommended_whisper_model}")

# Quality Gates Assessment
from localtranscribe.quality import QualityGate, QualityThresholds

gate = QualityGate(thresholds=QualityThresholds(), verbose=True)
assessment = gate.assess_diarization_quality(diarization_result)
print(f"Score: {assessment.overall_score:.2f}")
print(f"Passed: {assessment.passed}")

→ Full SDK Documentation


Output Formats

LocalTranscribe generates multiple output files for different use cases:

| Format | File | Description |
|---|---|---|
| Markdown | *_combined.md | Formatted transcript with speaker labels and timestamps |
| Plain Text | *_transcript.txt | Simple text output for analysis |
| JSON | *_transcript.json | Structured data for programming |
| SRT | *_transcript.srt | Subtitle format for video |
| Diarization | *_diarization.md | Speaker timeline and statistics |

Example Output:

# Combined Transcript

**Audio File:** interview.mp3
**Processing Date:** 2025-10-13 22:30:00

## SPEAKER_00
**Time:** [0.0s - 5.2s]

Hello, welcome to the show. Thanks for joining us today.

## SPEAKER_01
**Time:** [5.5s - 12.8s]

Thanks for having me. I'm excited to discuss our new project.
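If you post-process the JSON output yourself, the main chore is timestamp formatting. Here is a sketch of SRT-style conversion; the segment field names are assumptions about *_transcript.json, not a documented schema:

```python
# Convert seconds to the HH:MM:SS,mmm timestamps SRT uses.
def srt_time(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

segments = [{"start": 0.0, "end": 5.2, "speaker": "SPEAKER_00",
             "text": "Hello, welcome to the show."}]

for i, seg in enumerate(segments, 1):
    print(i)
    print(f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}")
    print(f"[{seg['speaker']}] {seg['text']}")
    print()
```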

Commands

Default Command (Easiest!)

localtranscribe                        # Interactive file browser + wizard
localtranscribe audio.mp3              # Automatically runs wizard - perfect for everyone!

All Commands

| Command | Description | Example |
|---|---|---|
| DEFAULT | 🎯 Interactive file browser (no args) or wizard (with file) | localtranscribe or localtranscribe audio.mp3 |
| wizard | 🧙‍♂️ Guided interactive setup (explicit) | localtranscribe wizard audio.mp3 |
| process | Direct transcription without wizard | localtranscribe process audio.mp3 |
| batch | Process multiple files | localtranscribe batch ./folder/ |
| doctor | Verify system setup | localtranscribe doctor |
| check-models | 🔍 Check NLP model status and download models | localtranscribe check-models |
| label | Replace speaker IDs with names | localtranscribe label output.md |
| version | Show version information | localtranscribe version |
| config | Manage configuration | localtranscribe config show |

💡 Pro Tip: Just run localtranscribe to browse and select files interactively, or localtranscribe audio.mp3 to transcribe directly!

Run localtranscribe --help or localtranscribe <command> --help for detailed options.

New in v3.1.1:

  • 🎯 Intelligent Segment Processing - Filters micro-segments, merges continuations (50-70% fewer false switches)
  • 🧠 Enhanced Speaker Mapping - Region-based context for better attribution (30-40% accuracy improvement)
  • 🔊 Audio Quality Analysis - Pre-processing SNR, quality assessment, parameter recommendations
  • ✅ Quality Gates System - Per-stage validation with actionable recommendations
  • 📚 Domain Dictionaries - 360+ specialized terms across 8 domains (military, technical, business, medical, legal, academic, common, entities)
  • 🔤 Acronym Expansion - 180+ definitions with intelligent context-aware disambiguation
  • 🧠 Context-Aware Matching - spaCy NER for intelligent acronym disambiguation (IP, PR, AI, OR, PI)
  • ⚡ High-Performance Matching - FlashText integration for 10-100x faster dictionary lookups
  • ✨ Typo Tolerance - RapidFuzz fuzzy matching for automatic typo correction
  • 🤖 Auto-Download Models - Automatic spaCy model management with interactive user prompts
  • 🔍 Model Status CLI - check-models command to verify and download NLP models
  • 📊 Real-Time Progress Tracking - Live progress bars (Faster-Whisper) and time estimates (MLX-Whisper) during transcription
  • ⚙️ 20+ New Configuration Options - Fine-tune quality thresholds, enable context-aware features
  • 📊 Quality Reports - Comprehensive quality assessment with severity indicators and recommendations
  • 🚀 ~4,500 Lines of Production Code - 12 new files, 15+ new dataclasses, 60+ new methods
  • ✨ 100% Backward Compatible - All features are opt-in and configurable

New in v3.0.0:

  • ✨ Wizard is now the default - just provide your audio file!
  • --simple mode for process command
  • --labels and --proofread flags
  • Automatic speaker labeling
  • Intelligent proofreading with 100+ rules

Model Selection Guide

Choose the right Whisper model for your needs:

| Model | Speed | Quality | RAM | Use Case |
|---|---|---|---|---|
| tiny | Fastest | Basic | 1GB | Quick drafts, testing |
| base | Fast | Good | 1GB | Quick transcription |
| small | Moderate | Better | 2GB | Longer recordings |
| medium | Moderate | Excellent | 5GB | Default - Best balance |
| large | Slow | Best | 10GB | Maximum accuracy |

Performance on M2 Mac with MLX (10-minute audio):

  • tiny: ~30 seconds
  • base: ~2 minutes
  • small: ~5 minutes
  • medium: ~7 minutes ← Default starting point
  • large: ~15 minutes
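For planning, you can extrapolate linearly from these benchmarks. A rough helper (treat the numbers as ballpark figures for an M2 with MLX; your hardware and audio will differ):

```python
# Benchmarks above: seconds to process 10 minutes of audio on an
# M2 with MLX. Scaling linearly gives a ballpark estimate only.
BENCH_10MIN_SECONDS = {"tiny": 30, "base": 120, "small": 300,
                       "medium": 420, "large": 900}

def estimate_seconds(model: str, audio_minutes: float) -> float:
    return BENCH_10MIN_SECONDS[model] * (audio_minutes / 10)

print(estimate_seconds("medium", 60))  # 2520.0 (~42 min for 1h of audio)
```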

Note: LocalTranscribe automatically uses MLX-Whisper on Apple Silicon Macs for optimal performance.


System Requirements

Recommended:

  • Mac with Apple Silicon (M1/M2/M3/M4)
  • 16GB RAM
  • 10GB free disk space
  • macOS 12.0 or later

Minimum:

  • Any Mac with Python 3.9+
  • 8GB RAM
  • 5GB free disk space
  • macOS 11.0 or later

Supported Audio Formats:

  • Audio: MP3, WAV, OGG, M4A, FLAC, AAC, WMA, OPUS
  • Video: MP4, MOV, AVI, MKV, WEBM (audio will be extracted)

How It Works

LocalTranscribe uses a three-stage pipeline:

1. Speaker Diarization (pyannote.audio)

  • Analyzes audio waveform patterns
  • Identifies distinct speakers
  • Creates precise speaker timeline
  • Optimized for 2-10 speakers

2. Speech-to-Text (Whisper)

  • Converts speech to text using OpenAI's Whisper
  • Automatically detects language
  • Handles accents and background noise
  • Creates timestamped segments
  • Real-time progress tracking:
    • MLX-Whisper: Shows audio duration and estimated completion time based on hardware benchmarks
    • Faster-Whisper: Live progress bar updating as segments are processed
    • Eliminates long silent waits during transcription
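The non-blocking progress updates described above can be pictured as a daemon thread that wakes every half second while transcription runs on the main thread. A simplified stdlib sketch (illustrative only, not LocalTranscribe's implementation):

```python
import threading
import time

# A daemon thread reports elapsed time every 0.5s while the main
# thread does the (here: simulated) transcription work.
def track(stop: threading.Event, started: float, interval: float = 0.5) -> None:
    while not stop.wait(interval):  # wake every `interval` seconds
        print(f"... {time.monotonic() - started:.1f}s elapsed")

stop = threading.Event()
tracker = threading.Thread(target=track, args=(stop, time.monotonic()), daemon=True)
tracker.start()

time.sleep(1.2)  # stand-in for the actual transcription call
stop.set()       # signal the tracker to exit
tracker.join()
print("done")
```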

3. Intelligent Combination

  • Aligns speaker labels with transcript
  • Matches timestamps accurately
  • Formats output for readability
  • Generates multiple export formats
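The combination stage can be sketched as assigning each transcript segment the speaker turn it overlaps most in time. An illustrative Python version (not the project's actual code):

```python
# Overlap in seconds between two [start, end] intervals.
def overlap(a_start, a_end, b_start, b_end):
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

# Pick, for each Whisper segment, the diarization turn with maximal
# overlap; max() simply keeps the first turn when nothing overlaps.
def assign_speakers(transcript, turns):
    return [
        {**seg, "speaker": max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t["start"], t["end"]),
        )["speaker"]}
        for seg in transcript
    ]

turns = [{"start": 0.0, "end": 5.3, "speaker": "SPEAKER_00"},
         {"start": 5.3, "end": 13.0, "speaker": "SPEAKER_01"}]
transcript = [{"start": 0.0, "end": 5.2, "text": "Hello, welcome to the show."},
              {"start": 5.5, "end": 12.8, "text": "Thanks for having me."}]

combined = assign_speakers(transcript, turns)
for seg in combined:
    print(seg["speaker"], seg["text"])
```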

Technologies: OpenAI Whisper, pyannote.audio, MLX (Apple Silicon), spaCy, FlashText, and RapidFuzz.


Documentation

📚 SDK Reference - Python API documentation
🐛 Troubleshooting Guide - Common issues and solutions
📝 Changelog - Version history and updates
🚀 Contributing Guide - How to contribute


Troubleshooting

Common Issues

Command not found after installation:

# Ensure package is installed
pip install --upgrade localtranscribe

# If using virtual environment, activate it first
source .venv/bin/activate

HuggingFace authentication error:

# Verify token is correctly set
cat .env

# Should show: HUGGINGFACE_TOKEN=hf_...
# Make sure you accepted both model licenses

Slow processing:

# Use a faster model
localtranscribe process audio.mp3 --model tiny

# Skip diarization for single speaker
localtranscribe process audio.mp3 --skip-diarization

Run system check:

localtranscribe doctor

This command diagnoses common setup issues and suggests fixes.

→ Full Troubleshooting Guide


What's New

v3.1.2 (Current) - Stability & Progress Tracking 🔧

  • 🐛 Critical Bug Fix - Fixed name 'Span' is not defined error in combination stage
  • 📊 Live Progress Tracking - Real-time progress bars for MLX-Whisper and Original Whisper during transcription
  • ⏱️ Time Estimates - Shows elapsed time and estimated remaining time during processing
  • ⚡ Background Progress Updates - Non-blocking progress tracker updating every 0.5s
  • ✅ Improved User Experience - No more silent waits during long transcriptions
  • 🔧 Type Hint Fixes - Proper deferred evaluation for optional dependencies

v3.1.1 - Context-Aware Intelligence 🧠

  • 🧠 Context-Aware Matching - spaCy NER for intelligent acronym disambiguation (IP, PR, AI, OR, PI)
  • ⚡ High-Performance Matching - FlashText integration for 10-100x faster dictionary lookups
  • ✨ Typo Tolerance - RapidFuzz fuzzy matching for automatic typo correction (85% threshold)
  • 🤖 Auto-Download Models - Automatic spaCy model management with interactive user prompts
  • 🔍 Model Status CLI - New check-models command to verify and download NLP models
  • 📚 Expanded Dictionaries - 360+ specialized terms across 8 domains (added Legal and Academic)
  • 🔤 Enhanced Acronyms - 180+ definitions with context-aware disambiguation
  • 📊 Frequency Tracking - Usage statistics for intelligent expansion decisions
  • ⚙️ 20+ Configuration Options - New context-aware and fuzzy matching parameters
  • ✨ 100% Backward Compatible - All features are opt-in and configurable

v3.1.0 - Quality Revolution 🎯

  • 🎯 Intelligent Segment Processing - 50-70% reduction in false speaker switches
  • 🧠 Enhanced Speaker Mapping - 30-40% better speaker attribution accuracy
  • 🔊 Audio Quality Analysis - Pre-processing SNR and quality assessment
  • ✅ Quality Gates System - Per-stage validation with actionable recommendations
  • 📚 Domain Dictionaries - 260+ specialized terms (technical, business, military, medical)
  • 🔤 Acronym Expansion - 80+ definitions with intelligent context-aware expansion
  • 📊 Real-Time Progress Tracking - Live progress bars and time estimates during transcription
  • ⚙️ 15+ New Configuration Options - Fine-tune quality thresholds and processing
  • 📊 Quality Reports - Comprehensive assessment with recommendations
  • ✨ 100% Backward Compatible - All features are opt-in and configurable

v3.0.0 - Major UX Overhaul 🎉

  • ✨ Interactive File Browser - Navigate folders and select files with arrow keys
  • ✨ Smart Token Management - Inline HuggingFace token entry with validation
  • ✨ Guided Wizard - Dummy-proof interactive setup (now the default!)
  • ✨ Auto-Proofreading - Fix 100+ common transcription errors
  • ✨ Speaker Labeling - Integrated speaker name replacement
  • 🔧 Default Model: Changed to medium for better quality
  • 🚀 Auto MLX Detection - Automatically uses MLX-Whisper on Apple Silicon

v2.0.2b1

  • ✅ Updated package description and metadata
  • ✅ Enhanced README with PyPI link
  • ✅ Professional documentation polish

v2.0.1-beta

  • ✅ Published to PyPI - Install with pip install localtranscribe
  • ✅ Fixed pyannote.audio 3.x API compatibility
  • ✅ Updated documentation for model licenses

v2.0.0-beta

  • ✅ Complete rewrite with modern CLI
  • ✅ Python SDK for programmatic use
  • ✅ Batch processing support
  • ✅ System health checks with doctor command
  • ✅ Modular architecture

→ Full Changelog


Contributing

We welcome contributions! Here's how to get started:

  1. Check existing issues at github.com/aporb/LocalTranscribe/issues
  2. Fork the repository and create your feature branch
  3. Make your changes following the existing code style
  4. Add tests if applicable
  5. Submit a pull request with a clear description

Development Setup:

git clone https://github.com/aporb/LocalTranscribe.git
cd LocalTranscribe
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

License

MIT License - Free for personal and commercial use.

See LICENSE for full details.


Support

Need help?

  1. Run localtranscribe doctor to check your setup
  2. Check the Troubleshooting Guide
  3. Search existing issues
  4. Open a new issue with:
    • Output from localtranscribe doctor
    • Error message or unexpected behavior
    • Your system info (OS, Python version)

Acknowledgments

LocalTranscribe builds on excellent open-source work:

  • OpenAI - Whisper speech recognition model
  • Apple - MLX framework for Metal acceleration
  • Pyannote team - Speaker diarization models
  • HuggingFace - Model hosting and distribution

⭐ Star on GitHub • 🐛 Report Bug • 💡 Request Feature

Made for privacy-conscious professionals who value data ownership.

Transform audio to text. Know who said what. Keep it private.
