Voice Agent

A browser-based AI voice assistant powered by the OpenAI Realtime API.

How It Works

The agent operates in two modes:

IDLE Mode — Agent listens passively and only activates when it hears its wake word (agent name)
DIALOGUE Mode — Agent actively participates in conversation, responding to all speech

Flow

User speaks → VAD detects speech → Whisper transcribes → 
Agent checks for wake word → If found, enters DIALOGUE mode →
Agent responds via voice → After 30s of silence or a stop word, returns to IDLE
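The two modes and the flow above amount to a small state machine. The sketch below is an illustration, not the code in app.js: the `DialogueController` class, its `handleTranscript` method, and the word-level matching are assumptions made for this example. Timestamps are passed in explicitly so the 30-second silence window can be exercised without real timers.

```javascript
// Minimal sketch of the IDLE/DIALOGUE logic; all names here are hypothetical.
const Mode = Object.freeze({ IDLE: "IDLE", DIALOGUE: "DIALOGUE" });

// Lowercase the transcript, turn punctuation into spaces, and split into words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s']/gu, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// True if every word of `phrase` appears consecutively in `words`.
function containsPhrase(words, phrase) {
  const target = tokenize(phrase);
  if (target.length === 0) return false;
  for (let i = 0; i + target.length <= words.length; i++) {
    if (target.every((w, j) => words[i + j] === w)) return true;
  }
  return false;
}

class DialogueController {
  constructor({
    agentName = "Alex",
    stopWords = ["thanks", "stop", "bye"],
    silenceTimeoutMs = 30_000,
  } = {}) {
    this.agentName = agentName;
    this.stopWords = stopWords;
    this.silenceTimeoutMs = silenceTimeoutMs;
    this.mode = Mode.IDLE;
    this.lastActivityMs = 0;
  }

  // Restart the silence window (an app might also call this when the agent finishes a reply).
  noteActivity(nowMs) {
    this.lastActivityMs = nowMs;
  }

  // Fall back to IDLE once the silence window has elapsed.
  checkTimeout(nowMs) {
    if (
      this.mode === Mode.DIALOGUE &&
      nowMs - this.lastActivityMs >= this.silenceTimeoutMs
    ) {
      this.mode = Mode.IDLE;
    }
  }

  // Decide what to do with a finished transcript: "respond" or "ignore".
  handleTranscript(text, nowMs) {
    this.checkTimeout(nowMs);
    const words = tokenize(text);
    if (this.mode === Mode.IDLE) {
      // In IDLE mode only the wake word (the agent's name) starts a dialogue.
      if (!containsPhrase(words, this.agentName)) return "ignore";
      this.mode = Mode.DIALOGUE;
    } else if (this.stopWords.some((s) => containsPhrase(words, s))) {
      // A stop word ends the dialogue and returns to passive listening.
      this.mode = Mode.IDLE;
      return "ignore";
    }
    this.noteActivity(nowMs);
    return "respond";
  }
}
```

Matching whole words rather than substrings means the wake word "Alex" does not fire on "Alexander"; whether the real app matches this way is not stated here. Feeding the example dialogue below through `handleTranscript` in order yields ignore, respond, respond, ignore, ignore.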

Key Features

Wake Word Activation — Say the agent's name (e.g., "Alex") to start a conversation
Stop Words — Say "thanks", "stop", "bye" to end the dialogue
Real-time Voice — Uses OpenAI Realtime API for low-latency voice responses
WebRTC Connection — Direct peer-to-peer audio streaming

Example Dialogue

You: "What's the weather like today?"
[Agent ignores - wake word not detected]

You: "Alex, what's the weather like today?"
Alex: "Let me search for that..."
      [Uses web search tool]
Alex: "It's 72°F and sunny today."

You: "What about tomorrow?"
Alex: "Tomorrow will be partly cloudy with a high of 68°F."
[Agent responds without wake word - already in DIALOGUE mode]

You: "Thanks!"
[Agent exits dialogue mode and returns to passive listening]

You: "Tell me a joke"
[Agent ignores - back in IDLE mode, wake word required]

Requirements

Modern browser with microphone access (Chrome, Firefox, Safari, Edge)
OpenAI API key with Realtime API access
(Optional) Tavily API key for web search

Quick Start

Option 1: Just Open the File

Simply double-click index.html to open it in your browser. No server required!

Option 2: Python

python -m http.server 8000

Open http://localhost:8000

Option 3: Node.js

npx serve

Open http://localhost:3000

Option 4: VS Code Live Server

Install "Live Server" extension
Right-click on index.html → "Open with Live Server"

Usage

Enter your OpenAI API key
Configure agent name (default: "Alex")
Select voice and model
Click "Connect"
Say the agent's name to start a dialogue

Tools

Web Search — Search the internet via Tavily API
Calculator — Mathematical calculations
Date/Time — Current date and time
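The calculator and date/time tools can be exposed to the model as function tools. The sketch below assumes the Realtime API's function-tool shape (`type`, `name`, `description`, JSON Schema `parameters`) and its `function_call_output` conversation item; the tool names `calculate` and `get_datetime` and all helper functions are hypothetical, not taken from app.js. Arithmetic is evaluated with a small recursive-descent parser instead of `eval()`, so spoken input cannot execute arbitrary code. Web search is left out because it depends on the Tavily API.

```javascript
// Hypothetical tool schemas in the Realtime API's function-tool shape.
const tools = [
  {
    type: "function",
    name: "calculate",
    description: "Evaluate an arithmetic expression with + - * / and parentheses",
    parameters: {
      type: "object",
      properties: { expression: { type: "string" } },
      required: ["expression"],
    },
  },
  {
    type: "function",
    name: "get_datetime",
    description: "Return the current date and time",
    parameters: { type: "object", properties: {} },
  },
];

// Recursive-descent evaluator: sum -> product -> primary.
function evaluate(expr) {
  const tokens = expr.match(/\d+(?:\.\d+)?|[-+*/()]/g) || [];
  // Reject anything the tokenizer skipped (letters, stray symbols).
  if (tokens.join("") !== expr.replace(/\s+/g, "")) throw new Error("Invalid expression");
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  function primary() {
    const t = next();
    if (t === "(") {
      const v = sum();
      if (next() !== ")") throw new Error("Missing )");
      return v;
    }
    if (t === "-") return -primary(); // unary minus
    if (t !== undefined && /^\d/.test(t)) return Number(t);
    throw new Error("Unexpected token " + t);
  }
  function product() {
    let v = primary();
    while (peek() === "*" || peek() === "/") v = next() === "*" ? v * primary() : v / primary();
    return v;
  }
  function sum() {
    let v = product();
    while (peek() === "+" || peek() === "-") v = next() === "+" ? v + product() : v - product();
    return v;
  }

  const result = sum();
  if (pos !== tokens.length) throw new Error("Trailing input");
  return result;
}

// Run a tool call by name; argsJson is the JSON string the model sends.
function runTool(name, argsJson) {
  const args = argsJson ? JSON.parse(argsJson) : {};
  switch (name) {
    case "calculate":
      return { result: evaluate(String(args.expression)) };
    case "get_datetime": {
      const now = new Date();
      return { iso: now.toISOString(), local: now.toString() };
    }
    default:
      throw new Error("Unknown tool: " + name);
  }
}

// Wrap a tool result as the client event that hands it back to the model.
function toolOutputEvent(callId, output) {
  return {
    type: "conversation.item.create",
    item: { type: "function_call_output", call_id: callId, output: JSON.stringify(output) },
  };
}
```

After sending the `conversation.item.create` event, a client typically follows it with `{ type: "response.create" }` so the model speaks the result.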

Configuration

Setting	Description
Agent Name	Wake word to activate the agent
Voice	OpenAI voice (alloy, ash, coral, echo, etc.)
Model	GPT-4o Mini Realtime or GPT-4o Realtime
Stop Words	Phrases that end the dialogue
System Prompt	Instructions for the agent's behavior

Files

index.html — Page structure
styles.css — Styling
app.js — Application logic

Debugging

Open the browser's developer console (F12, or Cmd+Option+I on macOS) to see detailed logs:

Speech detection events
Transcription results
Agent mode changes
API responses
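Most of these logs can be driven directly by the server events the Realtime API sends over the WebRTC data channel. This sketch assumes the beta-era event names (`input_audio_buffer.speech_started`, `input_audio_buffer.speech_stopped`, `conversation.item.input_audio_transcription.completed`, `response.done`, `error`); the `describeEvent` and `logEvent` helpers are hypothetical. Agent mode changes come from the app's own logic rather than the server, so they would be logged separately.

```javascript
// Map a Realtime server event to a [category, message] pair for logging.
function describeEvent(event) {
  switch (event.type) {
    case "input_audio_buffer.speech_started":
      return ["speech", "Speech started"];
    case "input_audio_buffer.speech_stopped":
      return ["speech", "Speech stopped"];
    case "conversation.item.input_audio_transcription.completed":
      return ["transcription", `Transcript: ${event.transcript}`];
    case "response.done":
      return ["api", `Response ${event.response?.status ?? "unknown"}`];
    case "error":
      return ["api", `Error: ${event.error?.message ?? "unknown"}`];
    default:
      // Anything else is logged under its raw event type.
      return ["api", event.type];
  }
}

// Print a timestamped, categorized line; `log` is injectable for testing.
function logEvent(event, log = console.log) {
  const [category, message] = describeEvent(event);
  log(`[${new Date().toISOString()}] [${category}] ${message}`);
  return category;
}

// In the browser (hypothetical `dataChannel` variable holding the RTCDataChannel):
// dataChannel.addEventListener("message", (e) => logEvent(JSON.parse(e.data)));
```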

Technical Details

Uses WebRTC for real-time audio streaming
Server VAD (voice activity detection) to detect when the user starts and stops speaking
Whisper for speech-to-text transcription
OpenAI Realtime API for generating the agent's spoken responses
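Server VAD, Whisper transcription, the voice, and the system prompt are configured per session with a `session.update` event. The helper below is a sketch using the beta-era field names (`input_audio_transcription`, `turn_detection`); later API versions restructured these fields, so check the current Realtime API reference. `buildSessionUpdate` is hypothetical, and `create_response: false` is an assumption about how an app could keep the model silent in IDLE mode. The model is not part of this payload; in the WebRTC flow it is typically chosen when the connection is created.

```javascript
// Hypothetical helper. Field names follow the beta-era Realtime API
// session.update event; newer API versions have restructured them.
function buildSessionUpdate({ instructions, voice = "alloy", tools = [] }) {
  return {
    type: "session.update",
    session: {
      // System prompt from the Configuration panel.
      instructions,
      // One of the OpenAI voices, e.g. "alloy", "ash", "coral", "echo".
      voice,
      // Request Whisper transcripts of the user's speech (used for wake-word checks).
      input_audio_transcription: { model: "whisper-1" },
      turn_detection: {
        // Server-side voice activity detection segments the user's speech.
        type: "server_vad",
        // Assumption: disable automatic replies so the app decides when to answer
        // (e.g. only in DIALOGUE mode) by sending { type: "response.create" } itself.
        create_response: false,
      },
      tools,
      tool_choice: "auto",
    },
  };
}

// In the browser (hypothetical `dataChannel` variable holding the RTCDataChannel):
// dataChannel.send(JSON.stringify(buildSessionUpdate({ instructions: "..." })));
```

Keeping the builder a pure function makes it easy to log the exact payload in the console before it is sent.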
