Babelive (.NET)

A WPF desktop app that captures Windows audio output (playback from any app), streams it to OpenAI's realtime translation model (gpt-realtime-translate), and plays the translated audio while showing a dual-language transcript live.

Stack

.NET 8 + WPF (Windows Desktop)
NAudio — WasapiLoopbackCapture for input, WasapiOut/WaveOutEvent for playback, WdlResamplingSampleProvider for high-quality 48 kHz → 24 kHz resampling
System.Net.WebSockets.ClientWebSocket (built-in) for the realtime API
System.Text.Json (built-in) for protocol serialization

How it works

[Any Windows app] ──► WasapiLoopbackCapture ──► downmix ──► WDL resample to 24 kHz ──► PCM16
                                                                                          │
                                                                                          ▼
                                                              ClientWebSocket → OpenAI Realtime
                                                                                          │
                          ┌───────────────────────────────────────────────────────────────┴───┐
                          ▼                                                                   ▼
              translated audio (PCM16)                                       dual transcript deltas
                          │                                                                   │
                          ▼                                                                   ▼
                    WasapiOut device                                                   WPF TextBox
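
The downmix and PCM16 steps in the diagram can be sketched in plain C#. This is a minimal illustration, not the code in LoopbackCapture.cs: the class and method names (PcmSketch, DownmixToMono, ToPcm16) are hypothetical, the input is assumed to be interleaved 32-bit float samples, and the resampling step between the two (handled by NAudio's WdlResamplingSampleProvider in the app) is left out.

```csharp
using System;

// Hypothetical helper; names are illustrative and not taken from the repository.
public static class PcmSketch
{
    // Averages the channels of each interleaved frame into one mono sample.
    public static float[] DownmixToMono(ReadOnlySpan<float> interleaved, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        int frames = interleaved.Length / channels;
        var mono = new float[frames];
        for (int f = 0; f < frames; f++)
        {
            float sum = 0f;
            for (int c = 0; c < channels; c++)
                sum += interleaved[f * channels + c];
            mono[f] = sum / channels;
        }
        return mono;
    }

    // Converts float samples in [-1, 1] to 16-bit little-endian PCM bytes.
    public static byte[] ToPcm16(ReadOnlySpan<float> samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so out-of-range input cannot overflow the short.
            float clamped = Math.Clamp(samples[i], -1f, 1f);
            short value = (short)MathF.Round(clamped * short.MaxValue);
            bytes[2 * i] = (byte)(value & 0xFF);            // low byte
            bytes[2 * i + 1] = (byte)((value >> 8) & 0xFF); // high byte
        }
        return bytes;
    }
}
```

Averaging the channels keeps the mono signal within [-1, 1], so the clamp only matters when the captured samples are already out of range.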

Requirements

Windows 10 / 11
.NET 8 SDK
An OpenAI API key with access to gpt-realtime-translate

Setup & run

cd D:\temp\Babelive-net
dotnet restore
dotnet run

On first launch, click the API… button in the settings panel and paste your sk-… key. The key is stored locally at %APPDATA%\Babelive\settings.json (plain JSON, never transmitted anywhere except to the configured API endpoint).
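
For orientation, reading and writing a settings file at that location with System.Text.Json might look roughly like the sketch below. The SettingsSketch record and its ApiKey property are assumptions made for illustration; the actual fields in AppSettings.cs may differ.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical settings shape; the real AppSettings.cs may use other fields.
public sealed record SettingsSketch(string? ApiKey = null);

public static class SettingsStoreSketch
{
    // Resolves %APPDATA%\Babelive\settings.json, the location named above.
    public static string DefaultPath() => Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData),
        "Babelive", "settings.json");

    // Returns empty settings when the file does not exist yet (first launch).
    public static SettingsSketch Load(string path)
    {
        if (!File.Exists(path)) return new SettingsSketch();
        return JsonSerializer.Deserialize<SettingsSketch>(File.ReadAllText(path))
               ?? new SettingsSketch();
    }

    // Creates the parent folder if needed, then writes indented JSON.
    public static void Save(string path, SettingsSketch settings)
    {
        Directory.CreateDirectory(Path.GetDirectoryName(path)!);
        File.WriteAllText(path, JsonSerializer.Serialize(
            settings, new JsonSerializerOptions { WriteIndented = true }));
    }
}
```

A caller would use SettingsStoreSketch.Load(SettingsStoreSketch.DefaultPath()) at startup and Save after the key is entered. Because the file is plain JSON, anyone with access to the user profile can read the key.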

For a self-contained release build:

dotnet publish -c Release

This produces a single ~68 MB Babelive.exe in bin\Release\net8.0-windows\win-x64\publish\. The file bundles the .NET 8 runtime and all WPF native DLLs, and it is compressed, so you can ship that one file on its own.

Using it

Pick a target language.
Pick a Capture device (any active render endpoint — its loopback feed is what gets captured). Defaults to the system default playback device.
Pick a Playback device for the translated audio. Read the feedback warning below.
Click Start, then play any video / call / song.

⚠️ Feedback loop warning

If translated audio plays through the same speakers you're capturing, the loopback re-translates it forever. Three fixes:

Use headphones for playback (different physical device than the captured speakers).
Install VB-CABLE, a free virtual audio cable. Send the translated audio to CABLE Input and you can monitor it without it leaking into the loopback.
Tick "Transcript only (no audio playback)" — only spoken text appears, nothing replays.

File layout

Babelive/
├── Babelive.csproj
├── App.xaml / App.xaml.cs
├── MainWindow.xaml / MainWindow.xaml.cs   ← WPF UI + orchestration
├── LanguageCodes.cs                        ← dropdown options
├── Audio/
│   ├── LoopbackCapture.cs                  ← WASAPI loopback → 24 kHz/mono PCM16
│   └── AudioPlayer.cs                      ← plays translated PCM16 chunks
└── Translation/
    └── RealtimeTranslatorClient.cs         ← async ClientWebSocket

API quirks / things that may need tuning

The realtime translation API is new. The exact event and field names in RealtimeTranslatorClient.cs are a best-effort reading of https://developers.openai.com/api/docs/guides/realtime-translation plus the standard /v1/realtime event conventions. If you see errors, check the following:

Endpoint: defaults to wss://api.openai.com/v1/realtime/translations?model=gpt-realtime-translate. Tick "Use alt endpoint" in the UI to fall back to wss://api.openai.com/v1/realtime?model=gpt-realtime-translate.
Session config: RealtimeTranslatorClient.SendSessionUpdateAsync sends session.update with input_audio_format=pcm16, output_audio_format=pcm16, and translation.target_language=<code>. Adjust if the official schema differs.
Event names: Dispatch matches both the output_*.delta and response.output_*.delta shapes. If transcripts/audio don't arrive, log every incoming event and adjust.
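
As a starting point for that kind of tuning, the session message and the event matching described above can be sketched as follows. This is an illustration under assumptions, not the code in RealtimeTranslatorClient.cs: the ProtocolSketch class and its method names are hypothetical, and nesting the fields under a "session" object is a guess based on the usual /v1/realtime conventions.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; the real client may structure this differently.
public static class ProtocolSketch
{
    // Builds a session.update message. Nesting under "session" is an assumption.
    public static string BuildSessionUpdate(string targetLanguage)
    {
        var message = new
        {
            type = "session.update",
            session = new
            {
                input_audio_format = "pcm16",
                output_audio_format = "pcm16",
                translation = new { target_language = targetLanguage }
            }
        };
        return JsonSerializer.Serialize(message);
    }

    // Strips an optional "response." prefix so both event shapes compare equal.
    public static string NormalizeEventType(string type) =>
        type.StartsWith("response.", StringComparison.Ordinal)
            ? type.Substring("response.".Length)
            : type;

    // Reads "type" from an incoming event and reports whether it is an
    // output_*.delta event in either shape.
    public static bool IsOutputDelta(string json, out string normalizedType)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        string raw = root.ValueKind == JsonValueKind.Object
                     && root.TryGetProperty("type", out var t)
                     && t.ValueKind == JsonValueKind.String
            ? t.GetString()!
            : "";
        normalizedType = NormalizeEventType(raw);
        return normalizedType.StartsWith("output_", StringComparison.Ordinal)
            && normalizedType.EndsWith(".delta", StringComparison.Ordinal);
    }
}
```

When nothing arrives, logging the raw type of every incoming event before normalization makes it easy to see which shape the server actually sends.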

Quick sanity test

Open a YouTube video in any language other than the target language and hit Start. The translation should start streaming into the bottom panel within a second or two of the source audio playing.
