SubZero

Python 3.11+ License: MIT

Real-time speech recognition with system tray interface, dual-mode operation, and customizable streaming subtitles.

Last Updated: April 2026
Academic project - BCA Final Year (2025-2026)


📸 Interface Previews

Startup & Splash Menu

SubZero Splash Screen

Dictate Mode (LLM Review)

Dictate Mode

Watch Mode (Streaming Subtitles)

Streaming Subtitles

Settings & Application Tray

Settings Menu System Tray Menu

Table of Contents

  1. Features
  2. Performance
  3. Requirements
  4. Installation & Setup
  5. Usage
  6. Troubleshooting
  7. Documentation Guides
  8. License

Features

🎤 Dictate Mode (Offline Transcription)

  • Speak into your microphone and SubZero types the transcribed text as native keystrokes into whichever window has focus.
  • Sub-5ms grammar checking via Harper CLI.
  • Audio Visualizer: A real-time waveform in the GUI responds to your microphone's input level.
  • LLM Review Window: Quickly reformat dictated text (Grammar, Email, Essay) through an optional AI processing pipeline.
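
As a rough illustration of how a visualizer can follow microphone energy, the sketch below computes a normalized RMS level from a buffer of 16-bit PCM samples. This is an assumption about the general technique, not SubZero's actual implementation; the function name `rms_level` and the 16-bit mono little-endian format are hypothetical choices made for the example.

```python
import math
import struct


def rms_level(pcm: bytes) -> float:
    """Return the RMS of 16-bit little-endian mono PCM, scaled to 0.0-1.0."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    # Unpack the raw bytes into signed 16-bit integers.
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    # 32768 is the magnitude of the most negative int16 value.
    return math.sqrt(mean_square) / 32768.0


# Hypothetical buffers standing in for microphone frames.
silence = struct.pack("<4h", 0, 0, 0, 0)
loud = struct.pack("<4h", 16384, -16384, 16384, -16384)

print(rms_level(silence), rms_level(loud))
```

A GUI would typically call a function like this once per captured frame and map the result to bar height or wave amplitude.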

👁️ Watch Mode (Streaming Subtitles)

  • Converts system audio into low-latency on-screen subtitles, captured through PyAudio WASAPI loopback.
  • Dynamic Sizing & Click-Through: The overlay sits on top of movies, videos, and games and lets mouse clicks pass through to the window underneath.
  • Streaming & VAD Tracking: Subtitles update continuously during speech, at a configurable polling interval of 1-10 s.

🖥️ Native UI & Controls

  • Splash Menu: A PyQt6 startup screen that lets you launch a mode immediately and reflects the current application state.
  • System Tray Hub: Runs entirely in the background, with all controls available from the tray icon's right-click menu.
  • Universal Settings GUI: Overlay styling, fonts, runtime intervals, and language settings are all hot-reloaded, so changes apply without restarting the application.

Performance

Optimized Performance (v2.1 - February 2026)

Batched inference and related optimizations give a 20-40x speedup:

Transcription Speed

  • Before: 1-3 seconds per 3-second audio chunk (1-3x real-time)
  • After: 0.05-0.15 seconds per 3-second audio chunk (20-60x real-time)
  • Technique: BatchedInferencePipeline with batch_size=16, beam_size=1 (greedy decoding)
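
The "x real-time" figures follow from dividing the chunk length by the processing time. The short sketch below reproduces that arithmetic for the timings quoted above and also prints the reciprocal ratio (processing seconds per second of audio), which is the other common way to quote the same measurement. The function names are hypothetical and exist only for this example.

```python
def realtime_speed(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio handled per second of compute (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_ratio(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of compute per second of audio (lower is faster)."""
    return 1.0 / realtime_speed(audio_seconds, processing_seconds)


CHUNK_SECONDS = 3.0  # chunk length quoted in the list above

# (slowest, fastest) processing times quoted for each configuration.
before = [realtime_speed(CHUNK_SECONDS, t) for t in (3.0, 1.0)]
after = [realtime_speed(CHUNK_SECONDS, t) for t in (0.15, 0.05)]

print(f"before: {before[0]:.0f}-{before[1]:.0f}x real-time")
print(f"after:  {after[0]:.0f}-{after[1]:.0f}x real-time")
print(f"before ratio: {processing_ratio(CHUNK_SECONDS, 1.0):.2f}-"
      f"{processing_ratio(CHUNK_SECONDS, 3.0):.2f}")
```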

Real-Time Performance

  • Real-Time Factor: 2.5-3.5x faster than audio (large-v3), 20-40x (turbo model)
  • Dictate Mode: Instant typing (<0.2s from silence detection to first keystroke)

Resource Efficiency

  • VRAM Usage: ~1.9 GB with INT8 quantization (vs 2.9 GB FP16)
  • CPU Idle Usage: <1% (reduced from 5% via queue timeout optimization)
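
The README does not spell out the queue timeout optimization, so the following is only a sketch of the general pattern it most likely refers to: a worker that sleeps inside a blocking `Queue.get(timeout=...)` call instead of spinning on a non-blocking poll. The worker function, the 0.1 s timeout, and the upper-casing "work" are all hypothetical.

```python
import queue
import threading


def worker(jobs: queue.Queue, stop: threading.Event, results: list) -> None:
    """Consume items until `stop` is set, sleeping inside get() while idle."""
    while not stop.is_set():
        try:
            # Blocks for up to 0.1 s; the thread is asleep, not busy-waiting.
            item = jobs.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing arrived; loop back and re-check the stop flag
        results.append(item.upper())  # placeholder for real processing
        jobs.task_done()


jobs: queue.Queue = queue.Queue()
stop = threading.Event()
results: list = []

t = threading.Thread(target=worker, args=(jobs, stop, results), daemon=True)
t.start()
for chunk in ("hello", "world"):
    jobs.put(chunk)
jobs.join()          # wait until both items have been processed
stop.set()           # ask the worker to exit
t.join(timeout=1.0)  # the worker notices the flag within one timeout period
print(results)
```

The timeout is a trade-off: a longer value means fewer wake-ups while idle, a shorter one means the thread reacts to a stop request sooner.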

Requirements

  • OS: Windows 10/11 (relies on native Win32 hooks)
  • Runtime: Python 3.11+
  • Hardware: NVIDIA GPU with CUDA support (at least 4 GB VRAM)

Installation & Setup

Basic Setup

  1. Clone repository:
git clone https://github.com/Tech-Genkai/SubZero.git
cd SubZero
  2. Create virtual environment:
python -m venv venv
.\venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. (Optional) Download Harper CLI for grammar checking:
# Download from https://github.com/Automattic/harper/releases/tag/v1.4.1
# Extract harper-cli.exe to tools/harper/

Note: Windows users must have FFmpeg installed and added to their system PATH for audio processing.
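
One quick way to confirm the FFmpeg requirement is met is to ask Python whether the executable is visible on PATH. This is a small standalone check, not part of SubZero; the helper name `find_executable` is hypothetical.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_executable()
if path is None:
    print("ffmpeg not found on PATH; install it, then open a new terminal")
else:
    print(f"ffmpeg found at {path}")
```

Run it from the same terminal (and virtual environment) you use to start SubZero, since PATH changes only apply to terminals opened after the change.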

LLM Review Window Setup (Optional)

The Review Window provides AI-powered text formatting during dictation. To enable it:

  1. Install Groq package:
pip install groq
  2. Get a free Groq API key.
  3. Create .env file:
Copy-Item .env.example .env
notepad .env
  4. Add your API key to .env:
GROQ_API_KEY=your_key_here
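
To sanity-check the .env file before launching, a sketch like the one below can help. It uses only the standard library and a deliberately simple KEY=VALUE parser; SubZero itself may load the file differently (for example with a dotenv package), and the function names here are hypothetical.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_key(env: dict) -> bool:
    """True only when GROQ_API_KEY is present and not the template value."""
    key = env.get("GROQ_API_KEY", "")
    return bool(key) and key != "your_key_here"


env_file = Path(".env")
env = parse_env(env_file.read_text(encoding="utf-8")) if env_file.is_file() else {}
print("GROQ_API_KEY configured:", has_real_key(env))
```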

Usage

Start SubZero from a terminal:

python main.py

Emergency Panic Stop: Press Ctrl+Alt+S from any window to immediately force-stop all active threads.

Troubleshooting

Installation Issues

  • CTranslate2 CUDA errors: Ensure ctranslate2==4.5.0 is installed (not 4.7.0 which has a Windows CUDA bug)
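
To see which CTranslate2 version is actually installed in the active environment, the standard library's `importlib.metadata` can be queried directly. The sketch below is a standalone check; the `classify` helper and its labels are hypothetical, and the version numbers are the ones named in the note above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

RECOMMENDED = "4.5.0"  # version recommended in the note above
KNOWN_BAD = {"4.7.0"}  # version reported above to have a Windows CUDA bug


def installed_version(package: str = "ctranslate2") -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def classify(installed: Optional[str]) -> str:
    """Map a version string to a short status label."""
    if installed is None:
        return "missing"
    if installed in KNOWN_BAD:
        return "known-bad"
    if installed == RECOMMENDED:
        return "recommended"
    return "untested"


found = installed_version()
print(f"ctranslate2: {found} ({classify(found)})")
```

If the status is "known-bad", reinstalling with `pip install ctranslate2==4.5.0` inside the virtual environment pins the recommended version.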

Dictate / Watch Mode

  • No typing: Focus a text editor before speaking.
  • Audio not detected: Check Windows mic permissions.
  • No subtitles: Ensure audio is playing through your default system output device.
  • Wrong position: Right-click the SubZero tray icon, open the Settings GUI, and adjust the overlay position and width.

LLM Review Window Issues

  • "Review Window disabled" message:
    • Check that GROQ_API_KEY is set in the .env file.
  • Mute button seems unresponsive:
    • Mute only applies while Dictate Mode is actively capturing audio. Outside Dictate Mode, the button has no effect.

Documentation Guides

The detailed technical and implementation notes are split across separate files:

  • ROADMAP.md - Future releases, v2.0 hardware deployment plans, and pipeline iterations.
  • TODO.md - Outstanding constraints and task checklists.
  • IMPLEMENTATION_STRATEGY.md - Core technical mechanics, threading architecture, structural boundaries, Python optimizations, and GPU deployment constraints.

License

This project is licensed under the MIT License.
