Real-time speech recognition with system tray interface, dual-mode operation, and customizable streaming subtitles.
Last Updated: April 2026
Academic project - BCA Final Year (2025-2026)
- Features
- Performance
- Requirements
- Installation & Setup
- Usage
- Troubleshooting
- Documentation Guides
- License
- Speak into your microphone and SubZero injects native keystrokes into whichever window has focus.
- Sub-5ms grammar checking via Harper CLI.
- Immersive Audio Visualizer: A real-time waveform in the GUI that responds to your microphone energy.
- On-the-fly LLM Review Window: Rapid text formatting (Grammar, Email, Essay) through an AI processing pipeline (requires the optional Groq setup).
- Converts system audio into low-latency on-screen subtitles using PyAudio WASAPI loopback capture.
- Dynamic Sizing & Click-Through: The overlay sits on top of movies, videos, and games without intercepting mouse input.
- Streaming & VAD Tracking: Subtitles render continuously during speech, with a configurable polling interval of 1-10 s.
- Splash Menu: A PyQt6 startup menu for immediate, state-aware launch.
- System Tray Hub: Runs entirely in the background, with a right-click menu for navigation.
- Universal Settings GUI: Overlay styling, fonts, runtime intervals, and language parameters all hot-reload, with no restart required.
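The microphone-energy and polling-interval ideas in the list above can be illustrated with a small sketch. This is not SubZero's actual code: the function names, the 0.02 threshold, and the assumption of 16-bit signed mono PCM in native byte order are all hypothetical choices made for illustration.

```python
import array
import math

# Bounds taken from the 1-10 s polling range mentioned in the feature list.
MIN_INTERVAL_S = 1.0
MAX_INTERVAL_S = 10.0


def rms_level(pcm_bytes: bytes) -> float:
    """Return the RMS energy of 16-bit signed PCM, normalized to 0.0-1.0.

    Assumes native byte order and an even number of bytes (hypothetical input format).
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0


def is_speech(pcm_bytes: bytes, threshold: float = 0.02) -> bool:
    """Naive energy gate: treat a frame as speech if its RMS exceeds the threshold."""
    return rms_level(pcm_bytes) >= threshold


def clamp_interval(seconds: float) -> float:
    """Keep a user-supplied polling interval inside the supported range."""
    return max(MIN_INTERVAL_S, min(MAX_INTERVAL_S, float(seconds)))
```

A value like `rms_level(frame)` could drive a visualizer bar height, while `is_speech` is only a stand-in for a real voice activity detector, which would normally be more robust than a fixed energy threshold.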
20-40x performance improvement through batched inference and optimizations:
- Before: 1-3 seconds per 3-second audio chunk (real-time factor 0.33-1.0, i.e. 1-3x real-time speed)
- After: 0.05-0.15 seconds per 3-second audio chunk (20-60x real-time)
- Technique: BatchedInferencePipeline with batch_size=16, beam_size=1 (greedy decoding)
- Real-Time Factor: 2.5-3.5x faster than audio (large-v3), 20-40x (turbo model)
- Dictate Mode: Instant typing (<0.2s from silence detection to first keystroke)
- VRAM Usage: ~1.9 GB with INT8 quantization (vs 2.9 GB FP16)
- CPU Idle Usage: <1% (reduced from 5% via queue timeout optimization)
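The speed figures above mix two conventions, so a short worked calculation may help. The sketch below only restates the arithmetic from the chunk timings listed above; the function names are hypothetical. "Speed" is audio duration divided by processing time, and "real-time factor" is its inverse.

```python
def realtime_speed(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse convention: compute time needed per second of audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


CHUNK_S = 3.0  # chunk length used in the figures above

# Before: 1-3 s of processing per chunk.
before_speed = (realtime_speed(CHUNK_S, 3.0), realtime_speed(CHUNK_S, 1.0))
# After: 0.05-0.15 s of processing per chunk.
after_speed = (realtime_speed(CHUNK_S, 0.15), realtime_speed(CHUNK_S, 0.05))

if __name__ == "__main__":
    print("before:", [round(x, 2) for x in before_speed])
    print("after: ", [round(x, 2) for x in after_speed])
```

Under these numbers the "before" case runs at roughly 1-3x real-time speed and the "after" case at roughly 20-60x, which matches the per-chunk timings listed above.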
- OS: Windows 10/11 (relies on native Win32 hooks)
- Runtime: Python 3.11+
- Hardware: NVIDIA GPU with CUDA support (minimum 4 GB VRAM)
- Clone repository:
git clone https://github.com/Tech-Genkai/SubZero.git
cd SubZero
- Create virtual environment:
python -m venv venv
.\venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- (Optional) Download Harper CLI for grammar checking:
# Download from https://github.com/Automattic/harper/releases/tag/v1.4.1
# Extract harper-cli.exe to tools/harper/
Note: Windows users must have FFmpeg installed and added to their system PATH for audio processing.
The Review Window provides AI-powered text formatting during dictation. To enable it:
- Install Groq package:
pip install groq
- Get a free Groq API key:
- Visit https://console.groq.com/keys
- Create a free account and generate a new API key
- Create .env file:
Copy-Item .env.example .env
notepad .env
- Add your API key to .env:
GROQ_API_KEY=your_key_here
Start SubZero from a terminal:
python main.py
Emergency Panic Stop: Press Ctrl+Alt+S from anywhere to instantly force-stop active threads.
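How a panic stop might shut down background threads can be sketched with a shared stop flag. This is an illustration under assumptions, not SubZero's implementation: the `worker`, `panic_stop`, and `run_demo` names are hypothetical, the global hotkey registration is not shown, and the stop here is cooperative (each thread checks the flag) rather than a forced kill.

```python
import queue
import threading


def worker(jobs: queue.Queue, stop_event: threading.Event, results: list,
           poll_timeout: float = 0.1) -> None:
    """Process jobs until the stop flag is set.

    Blocking on get() with a timeout lets the thread sleep while idle
    instead of spinning, yet still notice the stop flag promptly.
    """
    while not stop_event.is_set():
        try:
            item = jobs.get(timeout=poll_timeout)
        except queue.Empty:
            continue
        results.append(item.upper())  # stand-in for real work
        jobs.task_done()


def panic_stop(stop_event: threading.Event, threads: list,
               join_timeout: float = 1.0) -> bool:
    """Signal every worker to stop and report whether all of them exited."""
    stop_event.set()
    for thread in threads:
        thread.join(timeout=join_timeout)
    return all(not thread.is_alive() for thread in threads)


def run_demo() -> tuple:
    """Start one worker, process one job, then trigger the panic stop."""
    jobs: queue.Queue = queue.Queue()
    stop_event = threading.Event()
    results: list = []
    thread = threading.Thread(target=worker, args=(jobs, stop_event, results),
                              daemon=True)
    thread.start()
    jobs.put("hello")
    jobs.join()  # wait until the job has been processed
    stopped = panic_stop(stop_event, [thread])
    return results, stopped
```

In a real application the hotkey handler would simply call something like `panic_stop` with the shared event and the list of running threads.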
- CTranslate2 CUDA errors: Ensure ctranslate2==4.5.0 is installed (not 4.7.0, which has a Windows CUDA bug).
- No typing: Focus a text editor before speaking.
- Audio not detected: Check Windows mic permissions.
- No subtitles: Ensure audio is playing through your default system output device.
- Wrong position: Right-click the SubZero tray icon, open the Settings GUI, and adjust position and width.
- "Review Window disabled" message:
- Check that GROQ_API_KEY is set in the .env file.
- Check
- Mute button seems unresponsive:
- Mute only applies while Dictate Mode is actively capturing audio. Outside Dictate Mode, the button has no effect.
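Several of the checks above can be scripted. The following is a hypothetical helper, not something shipped with SubZero: it looks for GROQ_API_KEY in a .env file, reports the installed ctranslate2 version, and checks whether ffmpeg is on PATH. The function names and the simple KEY=value parsing are assumptions made for illustration.

```python
import shutil
from importlib import metadata
from pathlib import Path


def read_env_key(env_path, key):
    """Return the value of `key` from a simple KEY=value file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    # utf-8-sig tolerates a byte order mark that some editors add when saving.
    for raw in path.read_text(encoding="utf-8-sig").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'") or None
    return None


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def run_checks(env_path=".env"):
    """Collect the troubleshooting facts into one dictionary."""
    key = read_env_key(env_path, "GROQ_API_KEY")
    return {
        # The placeholder from the setup steps does not count as a real key.
        "groq_key_set": key is not None and key != "your_key_here",
        "ctranslate2_version": installed_version("ctranslate2"),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, value in run_checks().items():
        print(f"{name}: {value}")
```

Run from the project root, a `ctranslate2_version` other than 4.5.0 or a false `groq_key_set` would point to the corresponding items above.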
Detailed technical and implementation notes are split across separate files:
- ROADMAP.md - Future releases, v2.0 hardware deployment plans, and pipeline iterations.
- TODO.md - Known constraints and task checklists.
- IMPLEMENTATION_STRATEGY.md - Core technical mechanics: threading architecture, structural boundaries, Python optimizations, and GPU deployment constraints.
This project is licensed under the MIT License.




