WPF desktop app that captures Windows audio output (any app's playback), streams it to OpenAI's realtime translation model (gpt-realtime-translate), and renders the translated audio + dual-language transcript live.
- .NET 8 + WPF (Windows Desktop)
- NAudio — `WasapiLoopbackCapture` for input, `WasapiOut`/`WaveOutEvent` for playback, `WdlResamplingSampleProvider` for high-quality 48 kHz → 24 kHz resampling
- `System.Net.WebSockets.ClientWebSocket` (built-in) for the realtime API
- `System.Text.Json` (built-in) for protocol serialization
```
[Any Windows app] ──► WasapiLoopbackCapture ──► downmix ──► WDL resample to 24 kHz ──► PCM16
                                           │
                                           ▼
                            ClientWebSocket → OpenAI Realtime
                                           │
                   ┌───────────────────────┴───────────────────────┐
                   ▼                                               ▼
       translated audio (PCM16)                       dual transcript deltas
                   │                                               │
                   ▼                                               ▼
           WasapiOut device                                  WPF TextBox
```
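The capture leg of this pipeline can be sketched with NAudio as follows. This is a minimal sketch, not the app's `LoopbackCapture.cs`: the chunk size and the commented-out send step are illustrative.

```csharp
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Loopback-capture the default render device, downmix to mono,
// resample the device's float format to 24 kHz, expose 16-bit PCM chunks.
var capture = new WasapiLoopbackCapture();                  // IEEE-float, device mix format
var buffer  = new BufferedWaveProvider(capture.WaveFormat) { DiscardOnBufferOverflow = true };

var mono      = buffer.ToSampleProvider().ToMono();         // average L/R channels
var resampled = new WdlResamplingSampleProvider(mono, 24000);
var pcm16     = resampled.ToWaveProvider16();               // float -> PCM16

capture.DataAvailable += (_, e) =>
{
    buffer.AddSamples(e.Buffer, 0, e.BytesRecorded);
    var chunk = new byte[4800];                             // ~100 ms of 24 kHz mono PCM16
    int read = pcm16.Read(chunk, 0, chunk.Length);
    // if (read > 0): base64-encode chunk[0..read] and send it over the WebSocket
};
capture.StartRecording();
```

Pulling through a `BufferedWaveProvider` keeps the `DataAvailable` handler cheap; the resampler only runs on data already buffered.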
- Windows 10 / 11
- .NET 8 SDK
- An OpenAI API key with access to `gpt-realtime-translate`
```shell
cd D:\temp\Babelive-net
dotnet restore
dotnet run
```

On first launch, click the API… button in the settings panel and paste your `sk-…` key. The key is stored locally at `%APPDATA%\Babelive\settings.json` (plain JSON, never transmitted anywhere except to the configured API endpoint).
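That path resolves via `Environment.SpecialFolder`; a minimal sketch of reading it back, assuming a flat `apiKey` property (the property name is illustrative, not necessarily the app's actual schema):

```csharp
using System;
using System.IO;
using System.Text.Json;

// %APPDATA%\Babelive\settings.json — same location the app writes to
var path = Path.Combine(
    Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
    "Babelive", "settings.json");

string? apiKey = null;
if (File.Exists(path))
{
    using var doc = JsonDocument.Parse(File.ReadAllText(path));
    // "apiKey" is an assumed property name for this sketch
    if (doc.RootElement.TryGetProperty("apiKey", out var k))
        apiKey = k.GetString();
}
```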
For a self-contained release build:
```shell
dotnet publish -c Release
```

Produces a single ~68 MB `Babelive.exe` at `bin\Release\net8.0-windows\win-x64\publish\` — bundles the .NET 8 runtime, all WPF native DLLs, and is compressed. Just ship that one file.
- Pick a target language.
- Pick a Capture device (any active render endpoint — its loopback feed is what gets captured). Defaults to the system default playback device.
- Pick a Playback device for the translated audio. Read the feedback warning below.
- Click Start, then play any video / call / song.
If translated audio plays through the same speakers you're capturing, the loopback picks up the translation and re-translates it forever — a feedback loop. Three fixes:
- Use headphones for playback (different physical device than the captured speakers).
- Install VB-CABLE — a free virtual audio cable. Send translated audio to `CABLE Input` and you can monitor it without it leaking into the loopback.
- Tick "Transcript only (no audio playback)" — only the spoken text appears, nothing replays.
```
Babelive/
├── Babelive.csproj
├── App.xaml / App.xaml.cs
├── MainWindow.xaml / MainWindow.xaml.cs   ← WPF UI + orchestration
├── LanguageCodes.cs                       ← dropdown options
├── Audio/
│   ├── LoopbackCapture.cs                 ← WASAPI loopback → 24 kHz/mono PCM16
│   └── AudioPlayer.cs                     ← plays translated PCM16 chunks
└── Translation/
    └── RealtimeTranslatorClient.cs        ← async ClientWebSocket
```
The realtime translation API is new. The exact event/field names in `RealtimeTranslatorClient.cs` are best-effort, based on https://developers.openai.com/api/docs/guides/realtime-translation plus the standard `/v1/realtime` event conventions. If your account sees errors:
- Endpoint: defaults to `wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate`. Tick "Use alt endpoint" in the UI to fall back to `wss://api.openai.com/v1/realtime?model=gpt-realtime-translate`.
- Session config: `RealtimeTranslatorClient.SendSessionUpdateAsync` sends `session.update` with `input_audio_format=pcm16`, `output_audio_format=pcm16`, and `translation.target_language=<code>`. Adjust if the official schema differs.
- Event names: `Dispatch` matches both the `output_*.delta` and `response.output_*.delta` shapes. If transcripts/audio don't arrive, log every incoming event and adjust.
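Under those assumptions, building and sending the `session.update` payload looks roughly like this. Field names follow the caveat above (best-effort, may differ from the official schema), and `"es"` is an illustrative language code:

```csharp
using System.Text;
using System.Text.Json;

// Best-effort session.update payload — field names are assumptions, per above.
string payload = JsonSerializer.Serialize(new
{
    type = "session.update",
    session = new
    {
        input_audio_format  = "pcm16",
        output_audio_format = "pcm16",
        translation = new { target_language = "es" }   // illustrative target code
    }
});
byte[] bytes = Encoding.UTF8.GetBytes(payload);
// await ws.SendAsync(bytes, WebSocketMessageType.Text, endOfMessage: true, ct);
```

Serializing an anonymous object keeps the payload shape visible at the call site, which makes it easy to adjust when the official schema lands.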
Open a YouTube video in any non-target language, hit Start, and the translation should start streaming into the bottom transcript panel within a second or two of the source audio playing.