SubZero

Real-time speech recognition with system tray interface, dual-mode operation, and customizable streaming subtitles.

Last Updated: April 2026
Academic project - BCA Final Year (2025-2026)

📸 Interface Previews

Startup & Splash Menu

Dictate Mode (LLM Review)

Watch Mode (Streaming Subtitles)

Settings & Application Tray

Features

🎤 Dictate Mode (Offline Transcription)

Speak into your microphone and SubZero directly injects native keystrokes into any focused window!
Sub-5ms grammar checking via Harper CLI.
Immersive Audio Visualizer: A responsive real-time audio wave actively reacts to your microphone energy directly across the GUI.
On-the-fly LLM Review Window allowing rapid text formatting (Grammar, Email, Essay) via seamless local AI processing pipelines.

👁️ Watch Mode (Streaming Subtitles)

Translates local system audio explicitly into low-latency on-screen subtitles using PyAudio native WASAPI loopbacks.
Dynamic Sizing & Click-Through: 100% overlay compatibility over movies, videos, and games without interfering with your mouse.
Streaming & VAD Tracking: Subtitles render fluidly during speech using configurable 1-10s polling intervals.

🖥️ Native UI & Controls

Splash Menu: Intuitive startup workflow built natively via PyQt6 allowing immediate, state-aware execution.
System Tray Hub: Complete background execution with right-click traversal.
Universal Settings GUI: 100% hot-reload configurations for overlay styling, font customizations, runtime intervals, and language parameters without requiring software restarts!

Performance

Optimized Performance (v2.1 - February 2026)

20-40x performance improvement through batched inference and optimizations:

Transcription Speed

Before: 1-3 seconds per 3-second audio chunk (0.33-1.0x real-time)
After: 0.05-0.15 seconds per 3-second audio chunk (20-60x real-time)
Technique: BatchedInferencePipeline with batch_size=16, beam_size=1 (greedy decoding)

Real-Time Performance

Real-Time Factor: 2.5-3.5x faster than audio (large-v3), 20-40x (turbo model)
Dictate Mode: Instant typing (<0.2s from silence detection to first keystroke)

Resource Efficiency

VRAM Usage: ~1.9 GB with INT8 quantization (vs 2.9 GB FP16)
CPU Idle Usage: <1% (reduced from 5% via queue timeout optimization)

Requirements

OS: Windows 10/11 (Strictly uses Win32 Native Hooks)
Runtime: Python 3.11+
Hardware: NVIDIA GPU with CUDA support (Minimum 4GB+ VRAM)

Installation & Setup

Basic Setup

Clone repository:

git clone https://github.com/Tech-Genkai/SubZero.git
cd SubZero

Create virtual environment:

python -m venv venv
.\venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

(Optional) Download Harper CLI for grammar checking:

# Download from https://github.com/Automattic/harper/releases/tag/v1.4.1
# Extract harper-cli.exe to tools/harper/

Note: Windows users must have FFmpeg installed and added to their system PATH for audio processing.

LLM Review Window Setup (Optional)

The Review Window provides AI-powered text formatting during dictation. To enable it:

Install Groq package:

pip install groq

Get free Groq API key:
- Visit https://console.groq.com/keys
- Create free account & Generate new API key
Create .env file:

Copy-Item .env.example .env
notepad .env

Add your API key to .env:

GROQ_API_KEY=your_key_here

Usage

Start SubZero globally via terminal execution:

python main.py

Emergency Panic Stop: Press Ctrl+Alt+S globally to instantly force-stop active threads.

Troubleshooting

Installation Issues

CTranslate2 CUDA errors: Ensure ctranslate2==4.5.0 is installed (not 4.7.0 which has a Windows CUDA bug)

Dictate / Watch Mode

No typing: Focus a text editor before speaking.
Audio not detected: Check Windows mic permissions.
No subtitles: Ensure audio is playing through your default system output device.
Wrong position: Right-click the SubZero tray icon, open Settings GUI, and adjust position/width entirely natively.

LLM Review Window Issues

"Review Window disabled" message:
- Check GROQ_API_KEY is set inside the .env file!
Mute button seems unresponsive:
- Mute only applies while Dictate Mode is actively capturing audio. If not in Dictate Mode, the button does not dynamically bind.

Documentation Guides

The massive technical and implementation logs were intentionally split across logical markers:

ROADMAP.md - Future releases, v2.0 hardware deployment horizons, and pipeline iterations.
TODO.md - Raw execution constraints and checklist matrices.
IMPLEMENTATION_STRATEGY.md - All core technical mechanics, complex threading architectures, structural boundaries, Python optimizations, and GPU deployment constraints natively.

License

This project is licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SubZero

📸 Interface Previews

Startup & Splash Menu

Dictate Mode (LLM Review)

Watch Mode (Streaming Subtitles)

Settings & Application Tray

Table of Contents

Features

🎤 Dictate Mode (Offline Transcription)

👁️ Watch Mode (Streaming Subtitles)

🖥️ Native UI & Controls

Performance

Optimized Performance (v2.1 - February 2026)

Transcription Speed

Real-Time Performance

Resource Efficiency

Requirements

Installation & Setup

Basic Setup

LLM Review Window Setup (Optional)

Usage

Troubleshooting

Installation Issues

Dictate / Watch Mode

LLM Review Window Issues

Documentation Guides

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
assets		assets
core		core
output		output
screenshots		screenshots
ui		ui
utils		utils
.env.example		.env.example
IMPLEMENTATION_STRATEGY.md		IMPLEMENTATION_STRATEGY.md
README.md		README.md
ROADMAP.md		ROADMAP.md
TODO.md		TODO.md
config.default.json		config.default.json
diagrams.html		diagrams.html
main.py		main.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

SubZero

📸 Interface Previews

Startup & Splash Menu

Dictate Mode (LLM Review)

Watch Mode (Streaming Subtitles)

Settings & Application Tray

Table of Contents

Features

🎤 Dictate Mode (Offline Transcription)

👁️ Watch Mode (Streaming Subtitles)

🖥️ Native UI & Controls

Performance

Optimized Performance (v2.1 - February 2026)

Transcription Speed

Real-Time Performance

Resource Efficiency

Requirements

Installation & Setup

Basic Setup

LLM Review Window Setup (Optional)

Usage

Troubleshooting

Installation Issues

Dictate / Watch Mode

LLM Review Window Issues

Documentation Guides

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages