podcommentators is a browser-based AI commentary companion for live audio and video. It can listen to your microphone, camera, screen share, or a local/remote stream URL, transcribe the audio in real time, and let a configurable cast of AI commentators react live in an overlay.
The app is built with Next.js, React, and TypeScript, and runs fully on the client. Your keys and commentator settings are stored in browser localStorage.
- Captures audio from:
  - microphone
  - camera
  - screen share
  - stream URL
- Shows video for:
  - camera
  - screen share
  - video stream URLs
- Transcribes live audio with:
  - ElevenLabs Scribe, if configured
  - Gemini audio transcription, if configured
  - Web Speech API fallback for basic mic-only browser support
- Streams live AI reactions from a configurable set of commentators
- Supports two viewing modes:
  - Regular: source + transcript only
  - Enhanced: source + transcript + commentator overlay
- Includes an in-app settings editor for commentator CRUD:
  - create
  - edit
  - duplicate
  - enable/disable
  - delete
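The transcription options in the feature list above imply a selection step at runtime. The following is a minimal sketch of how that choice might look, not the project's actual code. The names `pickTranscriber`, `ApiKeys`, and `SourceKind` are hypothetical, and the priority of ElevenLabs over Gemini when both keys are present is an assumption.

```typescript
// Hypothetical names throughout; the real project may structure this differently.
type SourceKind = "microphone" | "camera" | "screen" | "stream";
type Transcriber = "elevenlabs" | "gemini" | "web-speech" | "none";

interface ApiKeys {
  elevenLabsKey?: string;
  geminiKey?: string;
}

// A key counts as configured only if it is a non-blank string.
function hasKey(key: string | undefined): boolean {
  return typeof key === "string" && key.trim().length > 0;
}

function pickTranscriber(
  source: SourceKind,
  keys: ApiKeys,
  webSpeechAvailable: boolean
): Transcriber {
  // Assumption: ElevenLabs wins when both keys exist, since it is
  // described as the higher-quality option.
  if (hasKey(keys.elevenLabsKey)) return "elevenlabs";
  if (hasKey(keys.geminiKey)) return "gemini";
  // The Web Speech API fallback is described as mic-only.
  if (source === "microphone" && webSpeechAvailable) return "web-speech";
  return "none";
}
```

A result of `"none"` for a screen or stream source corresponds to the case where a Gemini or ElevenLabs key still needs to be added in Settings.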
podcommentators is ready for local use and repo publishing.
Implemented:
- local Next.js app
- browser-based settings persistence
- editable commentators
- camera, screen, mic, and stream URL input modes
- focused video view with a header stop button
- OBS guidance for camera and local HLS workflows
- saved YouTube ingest settings in the UI
Not yet implemented:
- direct YouTube live publishing from the browser app itself
Why not: YouTube live publishing uses RTMP/RTMPS ingest, which requires an encoder or relay layer that this app does not currently provide.
- Node.js 18+ recommended
- npm
- Chrome or Edge recommended for the best media capture support
Optional accounts and keys:
- Gemini API key for AI commentary and Gemini transcription
- ElevenLabs API key for higher-quality transcription
git clone https://github.com/YOUR_USERNAME/podcommentators.git
cd podcommentators
npm install
npm run dev
Then open the URL that the dev server prints in the terminal.
- Open the app in your browser.
- Click Settings.
- Add your API keys:
  - Gemini API key
  - ElevenLabs API key (optional)
- Save settings.
- Choose a source and start listening.
Use your microphone or a virtual audio device as the input source.
Best for:
- quick testing
- podcasts routed into a virtual input
- voice-only sessions
Uses getUserMedia() for live video plus audio capture.
Best for:
- webcam sessions
- OBS Virtual Camera workflows
Notes:
- the camera feed becomes the main video view
- when video is visible, the app switches to focused video mode
- click the Stop button in the header to return to the source/transcript screen
Uses getDisplayMedia() for screen video and optional system audio, then mixes mic audio in separately.
Best for:
- presenting a browser tab or app window
- reacting to livestreams or desktop content
- capturing screen video in the main preview
Notes:
- for tab or system audio, the browser share dialog must include the audio-sharing option
- Firefox support for system audio is limited
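When the audio-sharing option is left unticked, the stream returned by `getDisplayMedia()` carries no audio tracks. The sketch below shows one way to decide what to mix after capture. `AudioTrackSource` is a structural stand-in for the browser's `MediaStream`, and `planScreenAudio` is a hypothetical helper, not part of the app.

```typescript
// Structural stand-in for MediaStream so the sketch does not depend on DOM typings.
interface AudioTrackSource {
  getAudioTracks(): readonly unknown[];
}

type ScreenAudioPlan = "mix-screen-and-mic" | "screen-only" | "mic-only" | "no-audio";

// Decide which audio inputs are actually available after capture.
function planScreenAudio(
  display: AudioTrackSource,
  mic: AudioTrackSource | null
): ScreenAudioPlan {
  const hasScreenAudio = display.getAudioTracks().length > 0;
  const hasMic = mic !== null && mic.getAudioTracks().length > 0;
  if (hasScreenAudio && hasMic) return "mix-screen-and-mic";
  if (hasScreenAudio) return "screen-only";
  if (hasMic) return "mic-only";
  return "no-audio";
}
```

A `"mic-only"` result after a screen share is a useful signal to prompt the user to re-share with tab or system audio enabled.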
Loads a direct media URL and can show video if the URL points to a supported video stream or file.
Typical examples:
- .m3u8
- .mp4
- .webm
- .mov
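As an illustration of how a stream URL could be classified by the extensions listed above, here is a small sketch. `classifyStreamUrl` and `StreamKind` are hypothetical names, and checking only the file extension is an assumption; the app may inspect the URL differently.

```typescript
type StreamKind = "hls" | "video-file" | "unknown";

// Return the text before the first occurrence of marker, or the whole text.
function stripAfter(text: string, marker: string): string {
  const index = text.indexOf(marker);
  return index === -1 ? text : text.slice(0, index);
}

// Extract a lowercase extension (including the dot) from the last path segment.
function getExtension(url: string): string {
  const path = stripAfter(stripAfter(url, "#"), "?");
  const fileName = path.slice(path.lastIndexOf("/") + 1);
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot).toLowerCase();
}

function classifyStreamUrl(url: string): StreamKind {
  const ext = getExtension(url);
  if (ext === ".m3u8") return "hls";
  if (ext === ".mp4" || ext === ".webm" || ext === ".mov") return "video-file";
  return "unknown";
}
```

Query strings and fragments are ignored, so a tokenized playlist URL is still recognized as HLS.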
When video is active through:
- camera
- screen share
- video stream URL
podcommentators hides:
- the source panel
- the transcript panel
And shows:
- the video preview
- the commentator overlay in Enhanced mode
- a Stop button in the top header
Clicking Stop returns the app to the source/transcript screen.
Regular mode:
- source controls visible
- transcript visible
- commentators hidden
Enhanced mode:
- source controls visible when not in focused video mode
- transcript visible when not in focused video mode
- commentator overlay enabled
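The visibility rules above can be condensed into a single pure function. This is only a sketch: `computeLayout` and `LayoutState` are hypothetical names, and it assumes focused video mode hides the source and transcript panels in both viewing modes, which the text states explicitly only for Enhanced.

```typescript
type ViewMode = "regular" | "enhanced";

interface LayoutState {
  showSourcePanel: boolean;
  showTranscript: boolean;
  showVideoPreview: boolean;
  showCommentators: boolean;
  showStopButton: boolean;
}

// videoActive is true while camera, screen share, or a video stream URL is playing.
function computeLayout(mode: ViewMode, videoActive: boolean): LayoutState {
  return {
    // Focused video mode hides the two side panels.
    showSourcePanel: !videoActive,
    showTranscript: !videoActive,
    showVideoPreview: videoActive,
    // The commentator overlay only exists in Enhanced mode.
    showCommentators: mode === "enhanced",
    // The header Stop button is the way out of focused video mode.
    showStopButton: videoActive,
  };
}
```

Keeping this logic in one function makes it straightforward to check every mode and source combination without rendering the UI.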
Commentators are fully editable from Settings.
For each commentator you can change:
- name
- title
- icon
- accent color
- enabled/disabled state
- search grounding on/off
- cooldown
- temperature
- max tokens
- system prompt
- relevance prompt
Supported CRUD actions:
- add a new commentator
- edit an existing commentator
- duplicate a commentator
- delete a commentator
Settings are stored in localStorage per browser profile.
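To make the editable fields and CRUD actions concrete, here is a sketch of a commentator record with immutable list helpers. The field names, the seconds unit for cooldown, and the " (copy)" naming on duplicate are all assumptions made for illustration; the project's actual types may differ.

```typescript
// Field names are illustrative; they mirror the editable properties listed above.
interface Commentator {
  id: string;
  name: string;
  title: string;
  icon: string;
  accentColor: string;
  enabled: boolean;
  searchGrounding: boolean;
  cooldownSeconds: number; // assumed unit
  temperature: number;
  maxTokens: number;
  systemPrompt: string;
  relevancePrompt: string;
}

function addCommentator(list: readonly Commentator[], item: Commentator): Commentator[] {
  return [...list, item];
}

function editCommentator(
  list: readonly Commentator[],
  id: string,
  changes: Partial<Omit<Commentator, "id">>
): Commentator[] {
  return list.map((c) => (c.id === id ? { ...c, ...changes } : c));
}

// Insert the copy directly after the original so it is easy to find in the editor.
function duplicateCommentator(list: readonly Commentator[], id: string, newId: string): Commentator[] {
  const result: Commentator[] = [];
  for (const c of list) {
    result.push(c);
    if (c.id === id) result.push({ ...c, id: newId, name: c.name + " (copy)" });
  }
  return result;
}

function deleteCommentator(list: readonly Commentator[], id: string): Commentator[] {
  return list.filter((c) => c.id !== id);
}

// A commentator may react only if enabled and its cooldown has elapsed.
function canSpeak(c: Commentator, lastSpokeAtMs: number | undefined, nowMs: number): boolean {
  if (!c.enabled) return false;
  if (lastSpokeAtMs === undefined) return true;
  return nowMs - lastSpokeAtMs >= c.cooldownSeconds * 1000;
}
```

Each helper returns a new array, which suits a store that replaces its snapshot on every change rather than mutating it.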
Keys are stored locally in the browser via localStorage.
They are used for:
- Gemini requests directly from the client to Google
- ElevenLabs transcription requests directly from the client to ElevenLabs
There is no backend in this repo handling your secrets.
That is convenient for local use, but it also means:
- do not use this architecture for a public multi-user deployment without a backend
- browser users with access to the app can access their own stored keys in that browser profile
podcommentators includes an OBS setup guide in the UI. There are two main workflows.
Use this when you want OBS output to appear as a webcam inside podcommentators.
High-level flow:
- Start Virtual Camera in OBS.
- In podcommentators, choose Camera.
- Start the camera and select OBS Virtual Camera.
- Route audio separately with a virtual audio device if needed.
Use this when you want OBS to publish to a local RTMP server and podcommentators to load the resulting HLS URL.
High-level flow:
- Run MediaMTX locally.
- Point OBS to the local RTMP endpoint.
- Load the HLS URL in podcommentators as a stream URL.
This is often the best local setup for full video + audio capture without relying on virtual camera behavior.
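For the local HLS workflow, the two URLs involved can be derived from a single stream path. The sketch below assumes MediaMTX is running with its default ports (RTMP on 1935, HLS on 8888) and its default `index.m3u8` playlist name; adjust these if your configuration differs. `buildLocalRelayUrls` is a hypothetical helper, not part of the app.

```typescript
interface LocalRelayUrls {
  rtmpPublishUrl: string; // where OBS publishes
  hlsUrl: string; // what to paste into podcommentators as the stream URL
}

function buildLocalRelayUrls(streamPath: string, host: string = "localhost"): LocalRelayUrls {
  // Trim leading and trailing slashes so "/live/" and "live" behave the same.
  const path = streamPath.replace(/^\/+|\/+$/g, "");
  return {
    // Assumed MediaMTX default RTMP port.
    rtmpPublishUrl: `rtmp://${host}:1935/${path}`,
    // Assumed MediaMTX default HLS port and playlist name.
    hlsUrl: `http://${host}:8888/${path}/index.m3u8`,
  };
}
```

For example, `buildLocalRelayUrls("live")` yields an RTMP target for OBS's custom stream settings and a matching HLS URL for the stream URL source.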
The settings screen includes fields for:
- YouTube ingest URL
- YouTube stream key
This currently helps you store your destination details and jump into YouTube Live Control Room, but podcommentators does not yet push the stream to YouTube directly.
To support direct YouTube publishing, the project would need one of these:
- an OBS-based outbound workflow
- a local FFmpeg relay
- a backend or desktop helper that can encode and publish RTMP/RTMPS
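To give a sense of what the local FFmpeg relay option might involve, the sketch below builds an argument list that forwards an existing stream to an RTMP ingest endpoint. This is not something the project ships. `buildFfmpegRelayArgs` is a hypothetical helper, and the use of `-c copy` assumes the input is already encoded in codecs the ingest accepts (typically H.264 video and AAC audio); otherwise re-encoding flags would be needed.

```typescript
interface RelayOptions {
  inputUrl: string; // for example, a local HLS URL
  ingestUrl: string; // the saved YouTube ingest URL
  streamKey: string; // the saved YouTube stream key
}

// Build arguments for: ffmpeg -i <input> -c copy -f flv <ingest>/<key>
function buildFfmpegRelayArgs(options: RelayOptions): string[] {
  // Avoid a double slash between the ingest URL and the stream key.
  const base = options.ingestUrl.replace(/\/+$/, "");
  return [
    "-i", options.inputUrl,
    "-c", "copy", // pass streams through without re-encoding (assumption, see above)
    "-f", "flv", // RTMP carries an FLV container
    `${base}/${options.streamKey}`,
  ];
}
```

The resulting array could be handed to a process launcher in a desktop helper or backend. Because the final argument contains the stream key, it should be treated as a secret and kept out of logs.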
npm run dev
npm run build
npm run start
npm run lint
- Next.js 16
- React 19
- TypeScript
- @google/genai
- browser media APIs:
  - getUserMedia
  - getDisplayMedia
  - MediaRecorder
  - Web Speech API
  - AudioContext
src/
  app/
    layout.tsx
    page.tsx
    globals.css
    page.module.css
  components/
    AudioSourcePanel.tsx
    CommentatorRail.tsx
    OBSSetupGuide.tsx
    SettingsModal.tsx
    TranscriptPanel.tsx
    VideoDisplay.tsx
    WaveformCanvas.tsx
  context/
    SettingsContext.tsx
  hooks/
    usePersonaOrchestrator.ts
    useTranscript.ts
  lib/
    gemini.ts
    elevenlabs.ts
    personas.ts
    settings.ts
  prompts/
    ...
  types/
    index.ts
- app settings are stored in localStorage
- commentators are part of persisted settings
- settings updates are distributed through useSyncExternalStore
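The persistence approach described above can be sketched as a small external store that exposes the `subscribe` and `getSnapshot` pair that `useSyncExternalStore` expects. This is an illustration under assumptions, not the code in `src/lib/settings.ts`: the storage key, the `AppSettings` shape, and `createSettingsStore` are hypothetical, and storage is injected through a minimal interface so the sketch also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape and storage key.
interface AppSettings {
  geminiKey: string;
  elevenLabsKey: string;
}
const DEFAULT_SETTINGS: AppSettings = { geminiKey: "", elevenLabsKey: "" };
const STORAGE_KEY = "podcommentators:settings";

function createSettingsStore(storage: StorageLike) {
  let listeners: Array<() => void> = [];

  // Read persisted settings, falling back to defaults on missing or corrupt data.
  function load(): AppSettings {
    try {
      const raw = storage.getItem(STORAGE_KEY);
      return raw ? { ...DEFAULT_SETTINGS, ...JSON.parse(raw) } : DEFAULT_SETTINGS;
    } catch (_error) {
      return DEFAULT_SETTINGS;
    }
  }

  // The snapshot reference changes only in update(), which keeps it stable between renders.
  let snapshot: AppSettings = load();

  return {
    subscribe(listener: () => void): () => void {
      listeners = [...listeners, listener];
      return () => {
        listeners = listeners.filter((l) => l !== listener);
      };
    },
    getSnapshot(): AppSettings {
      return snapshot;
    },
    update(changes: Partial<AppSettings>): void {
      snapshot = { ...snapshot, ...changes };
      storage.setItem(STORAGE_KEY, JSON.stringify(snapshot));
      listeners.forEach((listener) => listener());
    },
  };
}
```

In a React component, a store like this could be read with `useSyncExternalStore(store.subscribe, store.getSnapshot)`. In a Next.js app, the optional third `getServerSnapshot` argument is typically supplied as well, since localStorage is unavailable during server rendering.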
Default commentator prompts are stored in:
- src/prompts/*/system.md
- src/prompts/*/relevance.md
Those defaults are copied into local settings and can then be edited in the UI.
This project currently assumes:
- local development
- a trusted single user
- no backend secret management
If you want to deploy it publicly, the first things to add are:
- a server-side API layer
- secret storage
- auth
- rate limiting
- media publishing infrastructure
- confirm the selected source is camera, screen, or a video stream URL
- confirm the browser permission prompt was approved
- for screen share, confirm you actually selected a screen/window/tab
- enable the browser checkbox for tab/system audio during sharing
- prefer Chrome for the best support
- confirm you added a Gemini or ElevenLabs key for screen/stream workflows
- confirm your browser has mic permission if the selected source needs mic input
- switch the top mode toggle to Enhanced
- confirm at least one commentator is enabled in Settings
- confirm you have a Gemini API key configured
- check that the URL is reachable from your browser
- verify the file extension or stream format is supported
- if using local HLS, confirm your local media server is actually running