Real-time speech recognition and translation overlay for macOS.
Captures system audio, transcribes speech using Apple's Speech framework, and displays translated subtitles in a floating overlay window. Works with any audio source — YouTube, podcasts, Zoom/Teams meetings, and more.
This project was entirely written by Claude (Anthropic's AI assistant). The code, build scripts, documentation, and CI/CD configuration were all generated through AI-assisted development. While functional, the code has not undergone formal human code review — use at your own discretion.
- Real-time system audio capture via ScreenCaptureKit (16kHz mono PCM)
- Speech-to-text using SFSpeechRecognizer (on-device or server-based)
- Live translation via Apple Translation framework — translates text as it's being recognized, not just after finalization
- Dual display modes:
  - Combined — single overlay with both recognized and translated text
  - Split — separate recognition and translation windows, independently positionable
- Floating overlay — resizable, movable, always-on-top window with customizable appearance
- Lock/Unlock — locked = click-through, unlocked = move/resize/scroll
- Scrollable subtitle history with auto-scroll
- Customizable appearance — separate font size/color for original and translated text, background color/opacity
- Automatic language detection (English, Korean, Japanese, Chinese)
- Smart text processing — sentence-based segmentation, pause detection, duplicate filtering, punctuation cleanup
- Session history recording with export
- Menu bar app — no Dock icon, minimal footprint
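The pause detection and sentence-based segmentation listed above can be pictured as two rules that close a subtitle segment: the text ends with a sentence terminator, or no new partial result arrives within the pause threshold. The sketch below is an illustration under stated assumptions, not OST's code. `SegmentFinalizer` is a hypothetical type, each partial update is assumed to carry only the current segment's text (real `SFSpeechRecognizer` partials are cumulative for the whole task), and the terminator set is illustrative.

```swift
import Foundation

/// Hypothetical sketch: finalize a segment on a sentence terminator or after a pause.
struct SegmentFinalizer {
    /// Seconds of silence after the last partial update that close a segment.
    var pauseThreshold: TimeInterval
    private var pendingText = ""
    private var lastUpdate: Date?

    init(pauseThreshold: TimeInterval) {
        self.pauseThreshold = pauseThreshold
    }

    /// Feed the latest partial text; returns a finished segment if it now ends a sentence.
    mutating func update(partial: String, at time: Date) -> String? {
        pendingText = partial
        lastUpdate = time
        if let last = partial.trimmingCharacters(in: .whitespaces).last,
           ".?!。？！".contains(last) {
            return flush()
        }
        return nil
    }

    /// Call periodically (e.g. from a timer); finalizes pending text once the pause elapses.
    mutating func tick(now: Date) -> String? {
        guard let last = lastUpdate, !pendingText.isEmpty,
              now.timeIntervalSince(last) >= pauseThreshold else { return nil }
        return flush()
    }

    /// Emits the pending text (if any) and resets state for the next segment.
    private mutating func flush() -> String? {
        let text = pendingText.trimmingCharacters(in: .whitespaces)
        pendingText = ""
        lastUpdate = nil
        return text.isEmpty ? nil : text
    }
}
```

A segment closes on whichever rule fires first, so a short threshold finalizes text sooner while a long one waits for a terminator.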
- macOS 15.0 (Sequoia) or later
- Apple Silicon (arm64)
- Download `OST.zip` from the latest release
- Unzip and move `OST.app` to your Applications folder
- If macOS blocks the app on first run:

```bash
xattr -dr com.apple.quarantine /Applications/OST.app
```
Requires Xcode Command Line Tools:

```bash
xcode-select --install
```

See the Build section below for full instructions.
On first launch, macOS will prompt for the following permissions. If not prompted, enable them manually:
| Permission | Purpose | How to Enable |
|---|---|---|
| Screen Recording | System audio capture via ScreenCaptureKit | System Settings > Privacy & Security > Screen Recording > Enable OST |
| Speech Recognition | SFSpeechRecognizer access | System Settings > Privacy & Security > Speech Recognition > Enable OST |
After granting permissions, you may need to restart OST for changes to take effect.
Speech recognition (especially server-based) requires Siri & Dictation to be enabled:
- Open System Settings > Siri & Spotlight
- Turn on Siri (or "Listen for...")
- If using on-device recognition only, Siri does not need to be active, but the speech model must be downloaded (see the on-device recognition steps below)
For faster, offline, and more reliable recognition:
- Open System Settings > Keyboard > Dictation
- Under Languages, download the speech model for your source language (e.g., English, Korean, Japanese)
- After download, enable "On-device recognition" in OST Settings > Languages tab
Without the on-device model, server-based recognition is used. This requires internet and may have higher latency.
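For reference, the on-device preference corresponds to real properties in Apple's Speech framework. The sketch below shows one way a streaming request could be configured; it is an assumption about the approach, not OST's actual code, and `makeRecognitionRequest(for:preferOnDevice:)` is a hypothetical helper.

```swift
import Speech

/// Hypothetical helper: a streaming request that reports partial results and
/// prefers on-device recognition when the recognizer supports it.
@available(macOS 10.15, *)
func makeRecognitionRequest(for recognizer: SFSpeechRecognizer,
                            preferOnDevice: Bool) -> SFSpeechAudioBufferRecognitionRequest {
    let request = SFSpeechAudioBufferRecognitionRequest()
    // Partial results let text appear (and be translated) before finalization.
    request.shouldReportPartialResults = true
    // Force on-device mode only if the recognizer reports it can run without
    // network access; otherwise the request may use Apple's servers.
    request.requiresOnDeviceRecognition = preferOnDevice && recognizer.supportsOnDeviceRecognition
    return request
}
```

Captured audio could then be fed in with `request.appendAudioSampleBuffer(_:)`, and the request passed to `recognizer.recognitionTask(with:resultHandler:)`.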
For offline translation using Apple Translation framework:
- Open System Settings > General > Language & Region > Translation Languages
- Download the language pair you need (e.g., English ↔ Korean)
Without the translation pack, translation will not work offline.
```bash
# Clone the repository
git clone https://github.com/9bow/OST.git
cd OST

# Full build → produces build/OST.app
./build.sh

# Type-check only (no binary)
./build.sh --typecheck

# Clean build
./build.sh --clean

# Run
open build/OST.app
```

No Xcode project is required. The build script compiles all Swift sources via `xcrun swiftc`.
If macOS blocks the app on first run, execute:

```bash
xattr -dr com.apple.quarantine build/OST.app
```
- Click the captions bubble icon in the menu bar
- Select source and target languages (or use "Auto" for automatic detection)
- Click Start to begin capturing system audio
- The overlay window(s) will appear with live transcription and translation
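As noted later in this README, "Auto" detection runs `NLLanguageRecognizer` once per session on roughly the first 15 characters. The run-once gate can be sketched separately from the detector itself. `OneShotLanguageDetector` is a hypothetical type, and `detect` is injected so the logic runs without Apple frameworks; on macOS it could be `{ NLLanguageRecognizer.dominantLanguage(for: $0)?.rawValue }` from the NaturalLanguage framework.

```swift
/// Hypothetical sketch: detect the source language once, after enough text arrives.
struct OneShotLanguageDetector {
    let minimumCharacters: Int
    let detect: (String) -> String?
    private(set) var detectedLanguage: String?
    private var hasRun = false

    init(minimumCharacters: Int = 15, detect: @escaping (String) -> String?) {
        self.minimumCharacters = minimumCharacters
        self.detect = detect
    }

    /// Returns the session language; detection runs at most once per session.
    mutating func observe(_ transcript: String) -> String? {
        if hasRun { return detectedLanguage }  // already decided, never re-run
        guard transcript.count >= minimumCharacters else { return nil }
        hasRun = true
        detectedLanguage = detect(String(transcript.prefix(minimumCharacters)))
        return detectedLanguage
    }
}
```

Because later text never changes the result, a wrong guess from a short or ambiguous opening persists for the whole session.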
| Action | How |
|---|---|
| Lock/Unlock | Menu bar > Lock Overlay, or Settings > Display > Overlay Window |
| Move | Unlock, then drag the overlay window |
| Resize | Unlock, then drag the window edges |
| Scroll | Unlock, then scroll through subtitle history |
| Reset position | Settings > Display > "Reset All Overlay Windows" |
- Locked mode: The overlay is click-through — interact with windows behind it normally
- Unlocked mode: Drag to move, drag the edges to resize, and scroll through subtitle history. Auto-scrolls to the latest text
Configure in Settings > Display > Mode:
- Combined: Single window showing both original and translated text
- Split: Two separate windows — recognition (original text) and translation. Each window can be independently positioned and resized. Lock/Unlock applies to both windows simultaneously
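Click-through windows on macOS are commonly built with `NSWindow.ignoresMouseEvents`. Assuming OST does something similar, the model below shows how a single lock flag can drive every overlay window in either mode. `DisplayMode`, `OverlayWindowState`, and `OverlayController` are hypothetical names, and real panels are replaced by plain structs so the logic runs anywhere.

```swift
/// Hypothetical model of the two display modes.
enum DisplayMode { case combined, split }

/// Stand-in for a real overlay panel; `ignoresMouseEvents` mirrors the AppKit property.
struct OverlayWindowState {
    let name: String
    var ignoresMouseEvents = true   // locked windows are click-through
}

struct OverlayController {
    private(set) var windows: [OverlayWindowState]
    private(set) var isLocked = true

    init(mode: DisplayMode) {
        switch mode {
        case .combined:
            windows = [OverlayWindowState(name: "combined")]
        case .split:
            windows = [OverlayWindowState(name: "recognition"),
                       OverlayWindowState(name: "translation")]
        }
    }

    /// Lock/Unlock applies to every overlay window at once.
    mutating func setLocked(_ locked: Bool) {
        isLocked = locked
        for i in windows.indices {
            windows[i].ignoresMouseEvents = locked
        }
    }
}
```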
- Speech Pause: Adjust in Settings > Display > "Speech Pause" slider. Shorter values finalize text faster; longer values wait for natural sentence endings
- Subtitle Expiry: Old subtitles automatically fade after the configured time (default 10s)
- Max Lines: Control how many subtitle entries are visible at once
- Session History: View past transcription sessions via menu bar > Session History. Sessions can be exported for reference
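Subtitle Expiry and Max Lines can be read as two pruning rules applied to the subtitle list. A minimal sketch, assuming hypothetical `SubtitleEntry` and `SubtitleBuffer` types: the 10-second expiry matches the documented default, while `maxLines = 3` is an arbitrary example value. Fading is a view-level animation and is left out.

```swift
import Foundation

/// Hypothetical subtitle record with its creation time.
struct SubtitleEntry {
    let text: String
    let createdAt: Date
}

struct SubtitleBuffer {
    var expiry: TimeInterval = 10   // seconds, the documented default
    var maxLines = 3                // example value only
    private(set) var entries: [SubtitleEntry] = []

    mutating func append(_ text: String, at now: Date = Date()) {
        entries.append(SubtitleEntry(text: text, createdAt: now))
        prune(now: now)
    }

    /// Drops entries older than `expiry`, then keeps only the newest `maxLines`.
    mutating func prune(now: Date = Date()) {
        entries.removeAll { now.timeIntervalSince($0.createdAt) > expiry }
        if entries.count > maxLines {
            entries.removeFirst(entries.count - maxLines)
        }
    }
}
```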
```text
ScreenCaptureKit (16kHz mono) → SpeechRecognizer → AppState → TranslationService    → Overlay Views
SystemAudioCapture              SFSpeech            entries    Translation.framework   NSPanel
```
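The first stage of the pipeline asks ScreenCaptureKit for 16 kHz mono audio. The properties below exist on `SCStreamConfiguration` (macOS 13+); the helper name and the tiny-video settings are assumptions for an audio-focused capture, not taken from OST's source.

```swift
import CoreMedia
import ScreenCaptureKit

/// Hypothetical helper: a stream configuration for 16 kHz mono system audio.
@available(macOS 13.0, *)
func makeAudioOnlyConfiguration() -> SCStreamConfiguration {
    let config = SCStreamConfiguration()
    config.capturesAudio = true                 // deliver system audio sample buffers
    config.sampleRate = 16_000                  // 16 kHz, as in the pipeline above
    config.channelCount = 1                     // mono
    config.excludesCurrentProcessAudio = true   // skip audio produced by this app
    // Assumption: video is not needed, so keep frames tiny and infrequent.
    config.width = 2
    config.height = 2
    config.minimumFrameInterval = CMTime(value: 1, timescale: 1)  // at most 1 frame/s
    return config
}
```

The configuration would then be passed to `SCStream(filter:configuration:delegate:)`, with an `SCStreamOutput` registered for the `.audio` type via `addStreamOutput(_:type:sampleHandlerQueue:)`. Capture still requires the Screen Recording permission.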
```text
OST/Sources/
├── App/            AppState, OSTApp, WindowManager, Logger, SessionRecorder
├── Audio/          SystemAudioCapture (ScreenCaptureKit)
├── Speech/         SpeechRecognizer, SupportedLanguages
├── Translation/    TranslationService, TranslationConfig
├── Settings/       UserSettings
├── UI/             SubtitleView, RecognitionOverlayView, TranslationOverlayView,
│                   OverlayWindow, MenuBarView, SettingsView, FontSettingsView, etc.
└── Accessibility/  AccessibilityManager
```
| Problem | Solution |
|---|---|
| No audio captured | Grant Screen Recording permission in System Settings, then restart OST |
| Speech recognition not working | Grant Speech Recognition permission; ensure Siri & Dictation is enabled |
| Translation not appearing | Download the translation language pack in System Settings > General > Language & Region > Translation Languages |
| Overlay invisible but blocking clicks | Use Settings > Display > "Reset All Overlay Windows" to restore default position |
| macOS blocks the app | Run `xattr -dr com.apple.quarantine /Applications/OST.app` (or `build/OST.app` for a local build) |
| On-device recognition produces no results | Download the speech model for your language in System Settings > Keyboard > Dictation |
- Endpoint detection (EPD) — Speech segmentation uses a pause timer combined with sentence boundary detection, not proper endpoint detection. Subtitle boundaries may sometimes split mid-sentence or merge unrelated phrases.
- Automatic language detection — Auto-detect uses NLLanguageRecognizer on the first ~15 characters, which may misidentify the language from short or ambiguous input. Detection only runs once per session.
- Translation consistency — Translation is triggered per speech segment. Short or fragmented segments may produce less coherent translations.
- Speech recognition restart gap — SFSpeechRecognizer's recognition task expires after ~60 seconds and auto-restarts. Overlap detection minimizes duplicate text, but a brief gap in recognition may still occur.
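One simple form of overlap detection compares the end of the text from the expiring task with the start of the new task's text and drops the repeated words. This is a sketch of that idea, not OST's implementation; `removeOverlap(previous:incoming:)` is a hypothetical helper that matches whole words case-insensitively, ignoring surrounding punctuation.

```swift
import Foundation

/// Normalizes a word for comparison: lowercase, surrounding punctuation removed.
private func normalize<S: StringProtocol>(_ word: S) -> String {
    word.lowercased().trimmingCharacters(in: .punctuationCharacters)
}

/// Drops the longest word prefix of `incoming` that repeats a suffix of `previous`.
func removeOverlap(previous: String, incoming: String) -> String {
    let prev = previous.split(separator: " ").map { normalize($0) }
    let next = incoming.split(separator: " ")
    var overlap = 0
    // Try the longest possible overlap first, shrinking until a match is found.
    for k in stride(from: min(prev.count, next.count), through: 1, by: -1) {
        let tail = Array(prev.suffix(k))
        let head = next.prefix(k).map { normalize($0) }
        if tail == head {
            overlap = k
            break
        }
    }
    return next.dropFirst(overlap).joined(separator: " ")
}
```

Splitting on spaces does not work for Japanese or Chinese, which are written without spaces between words; those would need a character-level comparison instead.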





