vpook is a local avatar overlay service for OBS-style scenes. It serves a small browser overlay over HTTP, publishes live talking state over WebSocket, and swaps between idle and talking avatar images based on detected audio activity.
- Serves the overlay UI at http://127.0.0.1:8000
- Broadcasts live voice state at ws://127.0.0.1:8765
- Detects voice activity with configurable threshold, attack, and release timing
- Supports three audio backends (fake, full-device loopback, per-app session metering)
- Serves avatar assets from apps/assets
- apps/overlay_service.py: entrypoint; parses args and starts the service
- apps/overlay_service_args.py: argument parsing and config building
- apps/overlay_service_logging.py: logging setup
- apps/assets/: user-owned avatar images served by the HTTP server
- src/vpook/app.py: main application loop
- src/vpook/config.py: runtime configuration defaults
- src/vpook/audio/: audio provider implementations
- src/vpook/state/voice_state.py: threshold, attack, and release logic
- src/vpook/transport/: HTTP and WebSocket servers
- src/vpook/overlay/: browser overlay files
- Windows (for live audio capture via WASAPI)
- Python 3.12
- just task runner
- OBS or any browser source consumer to display the overlay visually
1. Install Python 3.12
winget install Python.Python.3.12
Close and reopen PowerShell after this so python is on your PATH.
2. Install just
winget install Casey.Just
3. Clone the repo and create a virtual environment
git clone <repo-url>
cd vpook
python -m venv .venv
4. Activate the virtual environment
.\.venv\Scripts\Activate.ps1
If PowerShell blocks activation, run this once and then retry step 4:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
5. Install vpook
just install-windows
You should see (.venv) in your prompt before running any just commands. Re-run step 4 any time you open a new terminal.
The fake provider works cross-platform for development.
python3 -m venv .venv
source .venv/bin/activate
just install
From the repo root with the virtual environment active:
just run # fake audio (default, cross-platform)
just run-discord # per-app session metering targeting Discord
just run-lan # bind to all interfaces for LAN access (WASAPI)
Extra flags are passed through to the entrypoint:
just run --wasapi
just run --process --target-process chrome
Once the service is running:
- Open http://127.0.0.1:8000 in a browser to preview the overlay
- Add that URL as a Browser Source in OBS if you want to use it in a scene
just run-lan binds both servers to 0.0.0.0 so other devices on your network can use the overlay. The service auto-detects your LAN IP and uses it in config.json so remote browsers connect to the right WebSocket address.
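One common way to auto-detect a LAN address is to open a UDP socket toward a routable address and read back the local address the OS selected. The sketch below illustrates that technique only; it is not necessarily how vpook does it, and `detect_lan_ip`, `websocket_url`, and the probe address are hypothetical names and values chosen for illustration.

```python
import socket


def detect_lan_ip(probe_host: str = "192.0.2.1", probe_port: int = 80) -> str:
    """Return the local IP the OS would use for outbound traffic.

    Hypothetical helper: connect() on a UDP socket sends no packets. It only
    asks the OS to pick a route, which reveals the local interface address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def websocket_url(host: str, port: int = 8765) -> str:
    """Build the WebSocket address a remote browser would connect to."""
    return f"ws://{host}:{port}"
```

A remote browser needs the detected address rather than 0.0.0.0, because 0.0.0.0 is only meaningful as a bind address on the serving machine.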
On another device (or OBS on a separate PC):
- Add http://<your-ip>:8000 as a Browser Source
- Both ports 8000 (HTTP) and 8765 (WebSocket) must be reachable; Windows Defender will prompt to allow them on first run
To use a specific Discord audio session over LAN:
just run --host 0.0.0.0 --process --target-process discord
The argument parser (apps/overlay_service_args.py) accepts flags to choose a provider. Pass at most one provider flag; with none, the fake provider is used.
just run
# or explicitly:
just run --fake
Generates a deterministic idle/talking cycle. No audio hardware required. Good for verifying the overlay and WebSocket transport without needing Windows.
just run --wasapi
# target a specific device by name substring:
just run --wasapi --audio-device "Headphones"
Captures the full mix of whatever is playing through a Windows output device via WASAPI loopback. Picks up all apps at once: Discord, game audio, music, everything.
just run --process --target-process discord
# shortcut:
just run-discord
Uses the Windows Audio Session API (IAudioMeterInformation) to read the peak volume for a specific process only. Discord is the default target. Useful when streaming, because game audio won't bleed into voice detection. The service auto-recovers if the target app is restarted.
--fake                          Use fake sine-wave audio (default)
--wasapi                        Capture system audio via WASAPI loopback
--process                       Capture a specific app's audio via Windows Audio Session API
--target-process NAME           Process name substring to monitor (default: discord)
--audio-device NAME             Loopback device name substring (default: system output)
--threshold FLOAT               Volume threshold for VAD (default: 0.08)
--attack-ms MS                  Time above threshold before switching to talking (default: 120)
--release-ms MS                 Time below threshold before switching to idle (default: 300)
--talking-glow-color COLOR      CSS color for the talking glow
--talking-glow-intensity FLOAT  Multiplier for talking glow size and strength
--host HOST                     Bind address for both HTTP and WebSocket (overrides --http-host and --websocket-host)
--http-host HOST                HTTP bind address (default: 127.0.0.1)
--http-port PORT                HTTP port (default: 8000)
--websocket-host HOST           WebSocket bind address (default: 127.0.0.1)
--websocket-port PORT           WebSocket port (default: 8765)
--tick-ms MS                    Main loop interval (default: 50)
--log-level LEVEL               DEBUG, INFO, WARNING, or ERROR (default: INFO)
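For readers who want to see how flags like these fit together, here is a rough argparse sketch covering a subset of them. It assumes the provider flags sit in a mutually exclusive group and that the fake provider is the fallback; the real apps/overlay_service_args.py may be organized differently, and `build_parser` and `provider_name` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="overlay_service")
    # Assumption: provider flags are mutually exclusive, so at most one is accepted.
    provider = parser.add_mutually_exclusive_group()
    provider.add_argument("--fake", action="store_true")
    provider.add_argument("--wasapi", action="store_true")
    provider.add_argument("--process", action="store_true")
    parser.add_argument("--target-process", default="discord", metavar="NAME")
    parser.add_argument("--threshold", type=float, default=0.08)
    parser.add_argument("--attack-ms", type=int, default=120, metavar="MS")
    parser.add_argument("--release-ms", type=int, default=300, metavar="MS")
    parser.add_argument("--http-port", type=int, default=8000, metavar="PORT")
    parser.add_argument("--websocket-port", type=int, default=8765, metavar="PORT")
    return parser


def provider_name(args: argparse.Namespace) -> str:
    # With no provider flag, fall back to the fake provider (the documented default).
    if args.wasapi:
        return "wasapi"
    if args.process:
        return "process"
    return "fake"
```

With this layout, passing both `--fake` and `--wasapi` makes argparse exit with a usage error instead of silently picking one.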
Default avatar image paths are defined in src/vpook/config.py:
- Idle image: /assets/pookie/idle.png
- Talking image: /assets/pookie/talking.png
Those HTTP paths resolve to files under:
apps/assets/
If you replace those files, the overlay will serve your new images. The HTTP server rejects asset paths that escape the asset root.
- apps/overlay_service.py parses CLI flags, builds an AppConfig, and starts the app.
- src/vpook/app.py creates the audio provider, voice activity detector, WebSocket state server, and static HTTP server.
- The app loop samples audio every tick_ms.
- VoiceActivityDetector turns raw volume into a stable talking or idle state using threshold, attack, and release timing.
- The current OverlayState is broadcast to connected browser clients over WebSocket.
- The browser overlay swaps images and applies transforms based on the state payload.
src/vpook/audio/base.py
- Defines the provider interface used by the app loop.
src/vpook/audio/fake_provider.py
- Generates a deterministic idle/talking cycle for local development.
- Useful when you want to verify animation and transport behavior without real audio input.
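A deterministic cycle of this kind can be produced purely from elapsed time. The sketch below is illustrative only: the class name, the durations, and the sine-shaped burst are assumptions, not the contents of fake_provider.py.

```python
import math


class FakeCycleProvider:
    """Hypothetical stand-in: silent for idle_s seconds, then a sine-shaped burst for talking_s."""

    def __init__(self, idle_s: float = 2.0, talking_s: float = 3.0, peak: float = 0.5) -> None:
        self.idle_s = idle_s
        self.talking_s = talking_s
        self.peak = peak

    def volume_at(self, elapsed_s: float) -> float:
        """Return a volume in [0, peak] that depends only on elapsed time."""
        phase = elapsed_s % (self.idle_s + self.talking_s)
        if phase < self.idle_s:
            return 0.0
        # Half a sine period across the talking window: rises, peaks, falls.
        progress = (phase - self.idle_s) / self.talking_s
        return self.peak * math.sin(math.pi * progress)
```

Because the output depends only on the timestamp, the same run always produces the same idle and talking transitions, which makes overlay behavior easy to reproduce.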
src/vpook/audio/windows_wasapi_provider.py
- Captures Windows system output audio through WASAPI loopback.
- Computes RMS volume from audio buffers.
- Applies a small smoothing window to reduce visual strobing.
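RMS volume and a short moving average can be sketched in a few lines. The window size and the names `rms` and `SmoothedVolume` are assumptions for illustration; the actual provider may work on raw byte buffers and use a different smoothing scheme.

```python
import math
from collections import deque


def rms(samples: list[float]) -> float:
    """Root-mean-square of float samples in [-1.0, 1.0]; 0.0 for an empty buffer."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class SmoothedVolume:
    """Average the RMS of the last few buffers to damp frame-to-frame jitter."""

    def __init__(self, window: int = 3) -> None:
        self._recent: deque[float] = deque(maxlen=window)

    def update(self, samples: list[float]) -> float:
        self._recent.append(rms(samples))
        return sum(self._recent) / len(self._recent)
```

A larger window reduces strobing further but delays how quickly the reported volume follows a real change.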
src/vpook/audio/windows_audio_session_provider.py
- Uses IAudioMeterInformation (Windows Audio Session API via pycaw) to meter a specific process's audio output.
- Tracks all audio sessions matching the target process name and takes the peak across them.
- Re-enumerates sessions every 5 seconds so the provider recovers automatically if the app restarts.
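The refresh-and-take-the-peak pattern can be shown without any Windows APIs by injecting the enumeration step. In this sketch `SessionPeakMeter` and its parameters are hypothetical, and each session is modeled as a plain callable returning a peak value, which stands in for whatever the pycaw-based code actually holds.

```python
import time
from collections.abc import Callable, Iterable

PeakReader = Callable[[], float]


class SessionPeakMeter:
    """Hypothetical sketch: cache per-session peak readers and refresh them periodically."""

    def __init__(
        self,
        enumerate_sessions: Callable[[], Iterable[PeakReader]],
        refresh_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._enumerate = enumerate_sessions
        self._refresh_s = refresh_s
        self._clock = clock
        self._readers: list[PeakReader] = []
        self._last_refresh: float | None = None

    def read(self) -> float:
        now = self._clock()
        # Re-enumerate on first use and whenever the refresh interval has elapsed.
        if self._last_refresh is None or now - self._last_refresh >= self._refresh_s:
            self._readers = list(self._enumerate())
            self._last_refresh = now
        # Loudest matching session wins; no sessions means silence.
        return max((reader() for reader in self._readers), default=0.0)
```

Under these assumptions, a session that appears after the last enumeration is picked up at the next refresh, which matches the "wait up to 5 seconds" behavior described for restarted apps.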
src/vpook/state/voice_state.py
- Implements threshold-based state transitions.
- attack_ms avoids flipping to talking on tiny spikes.
- release_ms avoids flickering back to idle too aggressively.
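The attack and release behavior amounts to requiring that a reading disagree with the current state for long enough before switching. Here is a minimal sketch under that assumption; `SketchVoiceDetector` is a hypothetical class, not the VoiceActivityDetector in voice_state.py, and the real implementation may track time differently.

```python
class SketchVoiceDetector:
    """Illustrative threshold detector with attack and release hold times."""

    def __init__(self, threshold: float = 0.08, attack_ms: float = 120, release_ms: float = 300) -> None:
        self.threshold = threshold
        self.attack_ms = attack_ms
        self.release_ms = release_ms
        self.talking = False
        self._pending_ms = 0.0  # how long readings have disagreed with the current state

    def update(self, volume: float, dt_ms: float) -> bool:
        above = volume >= self.threshold
        if above == self.talking:
            # Reading agrees with the current state: cancel any pending switch.
            self._pending_ms = 0.0
        else:
            self._pending_ms += dt_ms
            needed = self.attack_ms if above else self.release_ms
            if self._pending_ms >= needed:
                self.talking = above
                self._pending_ms = 0.0
        return self.talking
```

With a 50 ms tick and the default timings, this sketch switches to talking on the third consecutive loud tick and back to idle on the sixth consecutive quiet tick; a single loud tick between quiet ones never triggers a switch.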
src/vpook/transport/websocket_server.py
- Accepts browser clients over WebSocket.
- Stores the latest overlay state.
- Broadcasts JSON messages of type state.
src/vpook/transport/static_server.py
- Serves index.html, app.js, styles.css, and config.json.
- Also serves avatar assets from apps/assets.
- Builds config.json dynamically so the frontend knows which WebSocket URL and image paths to use.
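Building config.json on each request could look roughly like the following. The key names (`websocketUrl`, `idleImage`, `talkingImage`) and the function name are assumptions made for illustration; check static_server.py and app.js for the actual schema.

```python
import json


def build_config_json(
    websocket_host: str,
    websocket_port: int,
    idle_image: str,
    talking_image: str,
) -> bytes:
    """Serialize the values the frontend needs into a JSON response body."""
    payload = {
        # Hypothetical key names; the real schema may differ.
        "websocketUrl": f"ws://{websocket_host}:{websocket_port}",
        "idleImage": idle_image,
        "talkingImage": talking_image,
    }
    return json.dumps(payload).encode("utf-8")
```

Generating the body from the running configuration, rather than shipping a static file, lets port and host overrides reach the browser without editing any frontend files.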
src/vpook/overlay/index.html
- Minimal document with a single avatar image element.
src/vpook/overlay/app.js
- Fetches /config.json
- Connects to the WebSocket server
- Applies avatar image swaps and transform effects based on incoming state
The browser client does not do voice detection itself. It only renders the state computed by the Python service.
- Replace files in apps/assets/pookie/
- Or update the avatar paths in AppConfig
Pass flags when running:
just run --threshold 0.05 --attack-ms 80 --release-ms 500
Or edit defaults in src/vpook/config.py:
- threshold
- attack_ms
- release_ms
just run --http-port 8080 --websocket-port 9000
Or edit these in AppConfig:
- http_host
- http_port
- websocket_host
- websocket_port
vpook uses a src/ layout, so the package must be installed into the active environment first. From the repo root, run:
just install
On Windows, use:
just install-windows
Then rerun the service.
If you see an address-in-use or bind failure:
- check whether another process is already using port 8000 or 8765
- pass different ports:
just run --http-port 8080 --websocket-port 9000
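To check whether a port is already taken before starting the service, a quick probe is to try binding it yourself. This is a generic standard-library sketch, not part of vpook; `port_is_free` is a hypothetical helper.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
        return True


if __name__ == "__main__":
    for port in (8000, 8765):
        state = "free" if port_is_free("127.0.0.1", port) else "in use"
        print(f"port {port}: {state}")
```

The result is only a snapshot: another process can claim the port between the probe and the service starting.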
Device loopback (--wasapi):
- confirm .[windows-audio] was installed (just install-windows)
- confirm the selected output device is active and playing audio
- pass --audio-device with a substring of the target device name if the default isn't picked up
Per-app session metering (--process):
- confirm .[windows-audio] was installed (just install-windows)
- make sure the target app is open and joined to a voice channel; Windows only creates an audio session once the app is actively outputting audio
- if the app was just launched, wait up to 5 seconds for the session to be discovered
- check logs for No active audio sessions found for process '...'
- open browser devtools and confirm /config.json loads
- confirm the WebSocket connection to ws://127.0.0.1:8765 succeeds
- verify the backend logs show voice state transitions
- if using real audio, lower the threshold:
just run --threshold 0.04
Formatting and linting are wired through ruff:
just lint
just lint-verbose
just format
just format-diff