# Create your first AzureLiveVoiceAgent

This notebook walks through building real-time voice agents with Azure AI Agent Service and the Azure Voice Live API, using our class `…`.

## Architecture Overview

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Client App    │◄──►│  Azure Voice     │◄──►│ Azure AI Agent      │
│   (This Code)   │    │  Live API        │    │ Service             │
│                 │    │                  │    │                     │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────────┐ │
│ │ Microphone  │ │    │ │ Speech-to-   │ │    │ │ Agent Logic     │ │
│ │ Input       │ ├────┤ │ Text (STT)   │ ├────┤ │ & Instructions  │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────────┘ │
│                 │    │                  │    │                     │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────────┐ │
│ │ Speaker     │ │    │ │ Text-to-     │ │    │ │ Knowledge Base  │ │
│ │ Output      │ │◄───┤ │ Speech (TTS) │ │◄───┤ │ & Functions     │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────────┘ │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
```

## Key Components

1. **YAML Configuration**: Defines agent binding and voice settings
2. **WebSocket Connection**: Real-time bidirectional communication
3. **Audio Streaming**: Low-latency audio input/output processing
4. **Session Management**: Handles conversation state and events
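Components 2 and 4 both come down to exchanging JSON events that carry a `type` field. The sketch below builds the same `conversation.item.create` text message that the Step 4 connection test sends, and reads the `type` of an incoming event defensively. `make_text_item` and `parse_event_type` are hypothetical helpers for illustration, not part of the agent class.

```python
import json


def make_text_item(text: str) -> str:
    """Serialize a user text message as a conversation.item.create event."""
    event = {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }
    return json.dumps(event)


def parse_event_type(raw: str) -> str:
    """Return the `type` of an incoming JSON event, or 'unknown' if missing or malformed."""
    try:
        return json.loads(raw).get("type", "unknown")
    except (json.JSONDecodeError, AttributeError):
        # Not JSON at all, or JSON that is not an object (e.g. a list)
        return "unknown"


outgoing = make_text_item("Hello, this is a connection test.")
print(parse_event_type(outgoing))                      # conversation.item.create
print(parse_event_type('{"type": "session.created"}'))  # session.created
print(parse_event_type("not json"))                    # unknown
```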

## Prerequisites

- Azure AI Agent Service resource
- Azure Voice Live API access
- Python environment with required dependencies
- Audio input/output devices (microphone and speakers)

In [1]:
# 📂 Setup Working Directory for ARTAgent Framework Access
import logging
import os

# Configure logging to track directory changes
logging.basicConfig(level=logging.INFO)

# Navigate to the project root directory
# This ensures we can import ARTAgent framework modules properly
try:
    # Move up two directories from samples/hello_world/ to project root
    os.chdir("../../")
    
    # Allow override via environment variable for different setups
    target_directory = os.getenv(
        "TARGET_DIRECTORY", os.getcwd()
    )  # Use environment variable if available
    
    # Verify the target directory exists before changing
    if os.path.exists(target_directory):
        os.chdir(target_directory)
        print(f"✅ Changed directory to: {os.getcwd()}")
        logging.info(f"Successfully changed directory to: {os.getcwd()}")
    else:
        print(f"❌ Directory does not exist: {target_directory}")
        logging.error(f"Directory does not exist: {target_directory}")
        
except Exception as e:
    print(f"❌ Error changing directory: {e}")
    logging.exception(f"An error occurred while changing directory: {e}")

# Verify we're in the correct location
print(f"📁 Current working directory: {os.getcwd()}")
print(f"📋 Contents: {', '.join(os.listdir('.')[:10])}...")

INFO:root:Successfully changed directory to: c:\Users\pablosal\Desktop\gbb-ai-audio-agent


✅ Changed directory to: c:\Users\pablosal\Desktop\gbb-ai-audio-agent
📁 Current working directory: c:\Users\pablosal\Desktop\gbb-ai-audio-agent
📋 Contents: .azure, .devcontainer, .env, .env.aoai_pool, .env.sample, .files, .git, .github, .gitignore, .pre-commit-config.yaml...
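The cell above assumes the notebook sits exactly two levels below the repository root (`../../`). If the notebook is moved, a sketch like the following finds the root by walking up the directory tree until it sees a marker entry. `find_project_root` is a hypothetical helper, and using `.git` as the marker is an assumption based on the directory listing above.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Walk up from `start` until a directory containing `marker` is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")


# Demo on a throwaway tree: <tmp>/.git and <tmp>/samples/hello_world
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "samples" / "hello_world"
    nested.mkdir(parents=True)
    print(find_project_root(nested) == root)  # True

# In the notebook this would replace os.chdir("../../"):
# os.chdir(find_project_root(Path.cwd()))
```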


In [2]:
# Step 2: YAML Configuration Structure

print("=== YAML Configuration Guide ===")
print("""
The agent configuration uses YAML to define:

agent:
  name: "Your Agent Name"
  description: "Agent purpose and capabilities"

model:
  deployment_id: "gpt-4o"    # Voice Live compatible model

azure_ai_foundry_agent_connected:
  agent_id: "${AI_FOUNDRY_AGENT_ID}"          # From Azure portal
  project_name: "${AI_FOUNDRY_PROJECT_NAME}"  # AI Foundry project

session:
  voice:
    name: "en-US-Ava:DragonHDLatestNeural"   # Voice selection
    temperature: 0.8                          # Voice variation
  vad_threshold: 0.5                         # Voice activity detection
  vad_prefix_ms: 300                         # Voice detection timing
  vad_silence_ms: 1000                       # Silence detection
""")

# Validate required environment variables
required_vars = [
    "AZURE_VOICE_LIVE_ENDPOINT",
    "AI_FOUNDRY_AGENT_ID", 
    "AI_FOUNDRY_PROJECT_NAME"
]

print("Environment Variable Check:")
missing_vars = []
for var in required_vars:
    value = os.getenv(var)
    if value:
        print(f"  ✓ {var}")
    else:
        print(f"  ❌ {var}: MISSING")
        missing_vars.append(var)

if missing_vars:
    print(f"\n⚠️  Missing variables: {', '.join(missing_vars)}")
else:
    print("\n✅ All required environment variables are set")

=== YAML Configuration Guide ===

The agent configuration uses YAML to define:

agent:
  name: "Your Agent Name"
  description: "Agent purpose and capabilities"

model:
  deployment_id: "gpt-4o"    # Voice Live compatible model

azure_ai_foundry_agent_connected:
  agent_id: "${AI_FOUNDRY_AGENT_ID}"          # From Azure portal
  project_name: "${AI_FOUNDRY_PROJECT_NAME}"  # AI Foundry project

session:
  voice:
    name: "en-US-Ava:DragonHDLatestNeural"   # Voice selection
    temperature: 0.8                          # Voice variation
  vad_threshold: 0.5                         # Voice activity detection
  vad_prefix_ms: 300                         # Voice detection timing
  vad_silence_ms: 1000                       # Silence detection

Environment Variable Check:
  ✓ AZURE_VOICE_LIVE_ENDPOINT
  ✓ AI_FOUNDRY_AGENT_ID
  ✓ AI_FOUNDRY_PROJECT_NAME

✅ All required environment variables are set
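The YAML refers to values such as `${AI_FOUNDRY_AGENT_ID}`. This notebook does not show how `build_lva_from_yaml` resolves those placeholders; the sketch below assumes they are filled from environment variables and uses the standard library's `string.Template`, which fails loudly on a missing variable instead of leaving the placeholder in the config. `expand_placeholders` is a hypothetical helper.

```python
import os
from string import Template


def expand_placeholders(text: str, env=None) -> str:
    """Replace ${VAR} placeholders with values from `env` (defaults to os.environ).

    Raises KeyError naming the first missing variable.
    """
    env = os.environ if env is None else env
    return Template(text).substitute(env)


snippet = 'agent_id: "${AI_FOUNDRY_AGENT_ID}"\nproject_name: "${AI_FOUNDRY_PROJECT_NAME}"'
print(expand_placeholders(snippet, {
    "AI_FOUNDRY_AGENT_ID": "asst_123",
    "AI_FOUNDRY_PROJECT_NAME": "my-project",
}))
# agent_id: "asst_123"
# project_name: "my-project"
```

A literal `$` elsewhere in the YAML would need to be written as `$$` under this scheme.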


In [3]:
# Step 3: Agent Creation and Initialization

from apps.rtagent.backend.src.agents.Lvagent.factory import build_lva_from_yaml

print("=== Agent Creation Process ===")

# Load agent from YAML configuration (os.path.join builds a path that works on any OS)
yaml_path = os.path.join(
    "apps", "rtagent", "backend", "src", "agents", "Lvagent", "agent_store", "auth_agent.yaml"
)

try:
    agent = build_lva_from_yaml(yaml_path)
    
    print("✅ Agent created successfully:")
    print(f"   Authentication: {agent.auth_method}")
    print(f"   Agent ID: {agent._binding.agent_id}")
    print(f"   Project: {agent._binding.project_name}")
    print(f"   Voice: {agent._session.voice_name}")
    
except Exception as e:
    print(f"❌ Agent creation failed: {e}")
    print("Check your environment variables and YAML configuration")
    raise

=== Agent Creation Process ===



INFO:azure.identity._credentials.environment:No environment configuration found.
INFO:azure.identity._credentials.managed_identity:ManagedIdentityCredential will use IMDS
INFO:azure.core.pipeline.policies.http_logging_policy:Request URL: 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=REDACTED&resource=REDACTED'
Request method: 'GET'
Request headers:
    'User-Agent': 'azsdk-python-identity/1.19.0 Python/3.11.11 (Windows-10-10.0.26100-SP0)'
No body was attached to the request
INFO:azure.identity._credentials.chained:DefaultAzureCredential acquired a token 

✅ Agent created successfully:
   Authentication: token
   Agent ID: asst_Kp4exd80NINFuraHyWOftsuR
   Project: poc-ai-agents-voice
   Voice: en-US-Ava:DragonHDLatestNeural



In [4]:
# Step 4: Connection Testing and Validation

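import json
import re
import time

print("=== Connection Diagnostics ===")

# First, diagnose the connection configuration
print("🔍 Diagnosing Connection Configuration...")
# Mask the access token before printing: the URL carries a bearer token
safe_url = re.sub(r"(agent-access-token=)[^&]+", r"\1<redacted>", agent.url)
print(f"WebSocket URL: {safe_url}")
print(f"Authentication method: {agent.auth_method}")
print(f"Endpoint: {agent._endpoint}")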

# Check URL format
if "agent-access-token" not in agent.url:
    print("⚠️  WARNING: agent-access-token missing from URL")
else:
    print("✅ agent-access-token found in URL")

# Check headers
print(f"Headers: {list(agent._ws_headers.keys()) if hasattr(agent, '_ws_headers') else 'Not available'}")

def test_agent_connection():
    """Test the agent connection and session establishment with detailed error reporting."""
    try:
        print("🔌 Attempting WebSocket connection...")
        print(f"   URL: {agent.url[:100]}...")
        
        # Try to connect with detailed error handling
        agent.connect()
        
        # Wait for session events
        print("✅ Connection successful! Waiting for session events...")
        for i in range(10):
            msg = agent._ws.recv(timeout_s=0.5)
            if msg:
                try:
                    event = json.loads(msg)
                    event_type = event.get("type", "unknown")
                    print(f"   📨 Received: {event_type}")
                    
                    if event_type == "session.created":
                        session_id = event.get("session", {}).get("id", "unknown")
                        print(f"✅ Session created: {session_id}")
                        break
                    elif event_type == "error":
                        error = event.get("error", {})
                        print(f"❌ API Error: {error}")
                        return False
                except Exception:  # not valid JSON, or not a JSON object
                    print(f"   📄 Raw message: {msg[:100]}...")
        
        # Test message sending
        test_message = {
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user", 
                "content": [{"type": "input_text", "text": "Hello, this is a connection test."}]
            }
        }
        agent._ws.send_dict(test_message)
        print("✅ Test message sent successfully")
        time.sleep(1)
        
        agent.close()
        return True
        
    except Exception as e:
        print(f"❌ Connection failed: {e}")
        
        # Provide specific guidance for 400 BadRequest
        if "400 BadRequest" in str(e):
            print("\n🛠️  400 BadRequest Troubleshooting:")
            print("   1. Check if agent-access-token is in the WebSocket URL")
            print("   2. Verify AI_FOUNDRY_AGENT_ID is correct")
            print("   3. Ensure AI_FOUNDRY_PROJECT_NAME matches your Azure AI Foundry project")
            print("   4. Check if your Azure authentication is valid")
            print("   5. Verify the endpoint format, e.g. https://your-resource.services.ai.azure.com/ or https://your-resource.cognitiveservices.azure.com/")
            
            # Additional diagnostics
            print("\n🔬 Additional Diagnostics:")
            print(f"   Agent ID: {agent._binding.agent_id}")
            print(f"   Project: {agent._binding.project_name}")
            print(f"   URL contains token: {'agent-access-token' in agent.url}")
            
        try:
            agent.close()
        except Exception:
            pass
        return False

# Run connection test
success = test_agent_connection()

if success:
    print("\n✅ Agent is ready for voice streaming")
else:
    print("\n❌ Connection failed - please check the troubleshooting steps above")

=== Connection Diagnostics ===
🔍 Diagnosing Connection Configuration...
WebSocket URL: wss://poc-ai-agents-voice-resource.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview&agent-project-name=poc-ai-agents-voice&agent-id=asst_Kp4exd80NINFuraHyWOftsuR&agent-access-token=<redacted>

INFO:websocket:Websocket connected
[2025-09-03 15:38:14,709] INFO - apps.rtagent.backend.src.agents.Lvagent.transport: WebSocket opened.
[2025-09-03 15:38:14,728] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Connected to Azure Voice Live API
[2025-09-03 15:38:14,743] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Session configuration sent

✅ Connection successful! Waiting for session events...
   📨 Received: session.created
✅ Session created: sess_hepjvxDbyGhj0gzVwic9D
✅ Test message sent successfully



[2025-09-03 15:38:15,776] INFO - apps.rtagent.backend.src.agents.Lvagent.audio_io: SpeakerSink stopped.
[2025-09-03 15:38:16,812] INFO - apps.rtagent.backend.src.agents.Lvagent.transport: WebSocket closed: code=None, msg=None
[2025-09-03 15:38:16,822] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Azure Live Voice Agent connection closed
 


✅ Agent is ready for voice streaming
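The WebSocket URL carries the `agent-access-token` bearer token in its query string, so printing it verbatim leaks a credential into notebook output and logs. A minimal sketch of redacting one query parameter with the standard library's `urllib.parse`; `redact_query_param` is a hypothetical helper. Note that the other parameters are re-encoded by `urlencode`, which leaves the values seen in this notebook unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_query_param(url: str, param: str = "agent-access-token") -> str:
    """Return `url` with the value of `param` replaced by 'REDACTED'."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))


url = (
    "wss://example.cognitiveservices.azure.com/voice-live/realtime"
    "?api-version=2025-05-01-preview&agent-id=asst_x&agent-access-token=eyJ0eXAi"
)
print(redact_query_param(url))
# wss://example.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview&agent-id=asst_x&agent-access-token=REDACTED
```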



In [5]:
# 🛠️ Troubleshooting: Fix 400 BadRequest Error
# Run this cell if you're getting WebSocket handshake errors

import os

print("=== 400 BadRequest Error Troubleshooting ===")

def diagnose_and_fix():
    """Diagnose common causes of 400 BadRequest errors."""
    
    issues_found = []
    
    # Check 1: Environment variables
    print("1️⃣ Checking Environment Variables...")
    required_vars = {
        "AZURE_VOICE_LIVE_ENDPOINT": os.getenv("AZURE_VOICE_LIVE_ENDPOINT"),
        "AI_FOUNDRY_AGENT_ID": os.getenv("AI_FOUNDRY_AGENT_ID"),
        "AI_FOUNDRY_PROJECT_NAME": os.getenv("AI_FOUNDRY_PROJECT_NAME")
    }
    
    for var, value in required_vars.items():
        if not value:
            print(f"   ❌ {var}: MISSING")
            issues_found.append(f"Missing {var}")
        else:
            print(f"   ✅ {var}: {value[:30]}...")
    
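    # Check 2: Endpoint format
    # Note: this check is stricter than necessary. The connection test in Step 4
    # succeeded with a *.cognitiveservices.azure.com endpoint, so treat a warning
    # here as a hint rather than as the cause of a 400 error.
    print("\n2️⃣ Checking Endpoint Format...")
    endpoint = required_vars["AZURE_VOICE_LIVE_ENDPOINT"]
    if endpoint:
        if not endpoint.endswith(".services.ai.azure.com/"):
            print(f"   ⚠️  Endpoint should end with '.services.ai.azure.com/': {endpoint}")
            issues_found.append("Incorrect endpoint format")
        else:
            print("   ✅ Endpoint format looks correct")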
    
    # Check 3: Agent configuration
    print("\n3️⃣ Checking Agent Configuration...")
    if 'agent' in globals():
        print(f"   Agent ID: {agent._binding.agent_id}")
        print(f"   Project: {agent._binding.project_name}")
        print(f"   URL: {agent.url[:100]}...")
        
        # Check if URL has required parameters
        if "agent-access-token" not in agent.url:
            print("   ❌ agent-access-token missing from WebSocket URL")
            issues_found.append("Missing agent-access-token in URL")
        else:
            print("   ✅ agent-access-token found in URL")
    
    # Check 4: Common solutions
    print("\n4️⃣ Recommended Solutions:")
    if issues_found:
        print("   Issues found:")
        for issue in issues_found:
            print(f"   • {issue}")
        
        print("\n   🔧 Try these fixes:")
        print("   1. Verify your environment variables are set correctly")
        print("   2. Ensure your Azure AI Foundry agent ID is correct")
        print("   3. Check that your project name matches exactly")
        print("   4. Verify your Azure authentication is working")
        print("   5. Try recreating the agent object:")
        print("      agent = build_lva_from_yaml(yaml_path)")
    else:
        print("   ✅ Configuration looks correct")
        print("   💡 This might be a temporary Azure service issue")
        print("   💡 Try running the agent creation cell again")
    
    return len(issues_found) == 0

# Run diagnostics
is_healthy = diagnose_and_fix()

if not is_healthy:
    print("\n⚠️  Issues detected - please fix the above problems and recreate the agent")
else:
    print("\n✅ Configuration appears healthy - try running the connection test again")

=== 400 BadRequest Error Troubleshooting ===
1️⃣ Checking Environment Variables...
   ✅ AZURE_VOICE_LIVE_ENDPOINT: https://poc-ai-agents-voice-re...
   ✅ AI_FOUNDRY_AGENT_ID: asst_Kp4exd80NINFuraHyWOftsuR...
   ✅ AI_FOUNDRY_PROJECT_NAME: poc-ai-agents-voice...

2️⃣ Checking Endpoint Format...
   ⚠️  Endpoint should end with '.services.ai.azure.com/': https://poc-ai-agents-voice-resource.cognitiveservices.azure.com/

3️⃣ Checking Agent Configuration...
   Agent ID: asst_Kp4exd80NINFuraHyWOftsuR
   Project: poc-ai-agents-voice
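The endpoint warning above fires even though the Step 4 connection succeeded with a `*.cognitiveservices.azure.com` endpoint. A looser check, sketched below, parses the URL and compares only the scheme and host suffix, so a missing trailing slash or an extra path does not matter. The accepted suffixes are an assumption drawn from this notebook, not an exhaustive list of valid Azure hosts; `endpoint_host_ok` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Host suffixes seen in this notebook (assumption, not an exhaustive spec)
ACCEPTED_SUFFIXES = (".services.ai.azure.com", ".cognitiveservices.azure.com")


def endpoint_host_ok(endpoint: str) -> bool:
    """True if the endpoint is HTTPS and its host ends with an accepted suffix."""
    parts = urlsplit(endpoint)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host.endswith(ACCEPTED_SUFFIXES)


for ep in (
    "https://poc-ai-agents-voice-resource.cognitiveservices.azure.com/",
    "https://my-resource.services.ai.azure.com",
    "http://my-resource.services.ai.azure.com/",
    "https://example.com/",
):
    print(endpoint_host_ok(ep), ep)
```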

# Audio Processing Architecture

The live voice streaming system uses a multi-threaded architecture for real-time audio processing:

## Thread Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    Main Application Thread                      │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐ │
│  │ Audio Input     │  │ Audio Output    │  │ User Input      │ │
│  │ Thread          │  │ Thread          │  │ Thread          │ │
│  │                 │  │                 │  │                 │ │
│  │ Microphone      │  │ Speaker         │  │ Keyboard        │ │
│  │ ↓               │  │ ↑               │  │ Monitor         │ │
│  │ PCM Audio       │  │ PCM Audio       │  │ ('q' to quit)   │ │
│  │ ↓               │  │ ↑               │  │                 │ │
│  │ Base64 Encode   │  │ Base64 Decode   │  │                 │ │
│  │ ↓               │  │ ↑               │  │                 │ │
│  │ WebSocket Send  │  │ WebSocket Recv  │  │                 │ │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Azure Voice Live API                        │
│                                                                 │
│  Audio Input → STT → Agent Processing → TTS → Audio Output     │
│                                                                 │
│              └── Azure AI Agent Service ──┘                    │
└─────────────────────────────────────────────────────────────────┘
```
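The three worker threads share one `threading.Event` for shutdown: each loop checks it, and every blocking call uses a short timeout so the flag is noticed promptly. A stdlib-only sketch of that pattern, with a counter standing in for the microphone and a `queue.Queue` standing in for the WebSocket; all names here are hypothetical and deliberately differ from the notebook's globals.

```python
import queue
import threading
import time


def producer(out_q: queue.Queue, stop: threading.Event) -> None:
    """Stand-in for the audio input thread: emit a 'chunk' every 20 ms."""
    n = 0
    while not stop.is_set():
        out_q.put(n)
        n += 1
        time.sleep(0.02)


def consumer(in_q: queue.Queue, stop: threading.Event, received: list) -> None:
    """Stand-in for the output thread: wait with a timeout so it notices `stop`."""
    while not stop.is_set():
        try:
            received.append(in_q.get(timeout=0.1))
        except queue.Empty:
            continue


demo_stop = threading.Event()
demo_q: queue.Queue = queue.Queue()
demo_received: list = []
workers = [
    threading.Thread(target=producer, args=(demo_q, demo_stop), name="DemoInput"),
    threading.Thread(target=consumer, args=(demo_q, demo_stop, demo_received), name="DemoOutput"),
]
for t in workers:
    t.start()

time.sleep(0.2)   # stand-in for "user typed q"
demo_stop.set()   # a single event tells every worker to exit
for t in workers:
    t.join(timeout=3)

print(all(not t.is_alive() for t in workers), len(demo_received) > 0)  # True True
```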

## Audio Flow

1. **Input**: Microphone captures audio at 24kHz sample rate
2. **Chunking**: Audio read in 20ms chunks (480 samples)
3. **Encoding**: Audio converted to Base64 for WebSocket transmission
4. **Processing**: Azure Voice Live API performs STT, agent processing, and TTS
5. **Output**: Processed audio returned and played through speakers
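The chunk sizes in steps 1 to 3 follow from simple arithmetic, sketched below with the standard library's `array` module in place of NumPy. The constants mirror the values stated above (24 kHz, 20 ms, 16-bit mono); the chunk is silence purely for illustration.

```python
import array
import base64
import math

SAMPLE_RATE_HZ = 24_000
CHUNK_MS = 20
BYTES_PER_SAMPLE = 2  # int16 PCM, mono

samples_per_chunk = SAMPLE_RATE_HZ * CHUNK_MS // 1000      # 480
bytes_per_chunk = samples_per_chunk * BYTES_PER_SAMPLE      # 960
b64_chars_per_chunk = 4 * math.ceil(bytes_per_chunk / 3)    # 1280

# One chunk of silence as signed 16-bit samples, then the base64 text that
# goes into the "audio" field of an input_audio_buffer.append event.
chunk = array.array("h", [0] * samples_per_chunk)
payload = base64.b64encode(chunk.tobytes()).decode("ascii")

print(samples_per_chunk, bytes_per_chunk, len(payload))  # 480 960 1280
```

At 50 chunks per second this works out to 64,000 base64 characters per second before JSON overhead, about a third more than the raw 48,000 bytes of PCM.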

## Proven Audio Configuration (from working notebook 04)

**Input Stream:**
- **Sample Rate**: 24,000 Hz (Azure Voice Live API standard)
- **Channels**: 1 (Mono)
- **Data Type**: int16 (16-bit PCM)
- **Chunk Size**: 480 samples (20ms at 24kHz)
- **Read Strategy**: Check available samples before reading

**Output Stream:**
- **Sample Rate**: 24,000 Hz (matching input)
- **Channels**: 1 (Mono)  
- **Data Type**: int16 (16-bit PCM)
- **Block Size**: 2400 samples (~100ms buffer)
- **Queue Management**: `deque` guarded by a `threading.Lock` shared by `add_data` and the playback callback
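The output side has to hand the sound card exactly `blocksize` samples per callback, however the network delivered them. A minimal sketch of the fill-and-pad logic used by `AudioPlayerAsync.callback`, rewritten on plain lists so it runs without NumPy or an audio device; `fill_frames` is a hypothetical name, and the lock the real class uses is omitted here.

```python
from collections import deque
from typing import Deque, List


def fill_frames(pending: Deque[List[int]], frames: int) -> List[int]:
    """Take exactly `frames` samples from queued chunks, padding with silence (0).

    A partially consumed chunk is pushed back to the front of the queue.
    """
    out: List[int] = []
    while len(out) < frames and pending:
        chunk = pending.popleft()
        needed = frames - len(out)
        out.extend(chunk[:needed])
        if len(chunk) > needed:
            pending.appendleft(chunk[needed:])
    out.extend([0] * (frames - len(out)))
    return out


pending_demo: Deque[List[int]] = deque([[1, 2, 3], [4, 5]])
print(fill_frames(pending_demo, 4))  # [1, 2, 3, 4]
print(fill_frames(pending_demo, 4))  # [5, 0, 0, 0]
print(list(pending_demo))            # []
```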

## Key Technical Details

- **Audio Latency**: ~20ms input chunks + ~100ms output buffer ≈ 120ms of client-side buffering (network transfer and service-side STT, agent, and TTS time come on top)
- **Format**: PCM 16-bit audio data (no float conversion needed)
- **Threading**: Input, output, and keyboard monitoring run on separate threads; the input loop checks `read_available` so it never blocks waiting for a partial chunk
- **Buffer Strategy**: Auto-start playback when data available
- **Error Handling**: Graceful degradation with status monitoring
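The ~120ms figure is buffering arithmetic: one input chunk plus one output block, divided by the sample rate. The helper below is a hypothetical sketch that makes the trade-off explicit; it counts only client-side buffers and ignores network time, service processing, and any audio queued beyond a single block.

```python
def buffering_latency_ms(sample_rate_hz: int, input_chunk: int, output_block: int) -> float:
    """Client-side buffering delay in ms: one input chunk plus one output block."""
    return 1000 * (input_chunk + output_block) / sample_rate_hz


print(buffering_latency_ms(24_000, 480, 2400))  # 120.0 (settings above)
print(buffering_latency_ms(24_000, 480, 1200))  # 70.0  (halved output block)
```

Smaller output blocks lower this figure but leave the playback callback less slack when audio arrives late, which is the dropout risk the 100ms block is meant to absorb.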

## Performance Characteristics

- **Low CPU Usage**: Direct int16 processing without unnecessary conversions
- **Stable Playback**: The 100ms output block reduces the risk of dropouts when audio arrives in bursts
- **Real-time**: 20ms input chunks keep server-side voice activity detection responsive
- **Memory Efficient**: Queued chunks are released as soon as the playback callback consumes them

In [6]:
# Step 5: Audio Processing Implementation

import threading
import queue
import json
import base64
import numpy as np
import sounddevice as sd
import time
from collections import deque

print("=== Audio Processing Functions ===")

# Global configuration - Matching working notebook 04
stop_event = threading.Event()
AUDIO_SAMPLE_RATE = 24000  # Hz - Azure Voice Live API standard
READ_SIZE = int(AUDIO_SAMPLE_RATE * 0.02)  # 20ms chunks = 480 samples

class AudioPlayerAsync:
    """
    Asynchronous audio player for real-time Voice Live API responses.
    
    Based on the working implementation from notebook 04-exploring-live-api.ipynb
    """
    
    def __init__(self):
        self.queue = deque()
        self.lock = threading.Lock()
        self.stream = sd.OutputStream(
            callback=self.callback,
            samplerate=AUDIO_SAMPLE_RATE,
            channels=1,
            dtype=np.int16,
            blocksize=2400,  # ~100ms at 24kHz
        )
        self.playing = False

    def callback(self, outdata, frames, time_info, status):
        """Audio callback function called by sounddevice."""
        if status:
            print(f"⚠️  Audio status: {status}")
            
        with self.lock:
            data = np.empty(0, dtype=np.int16)
            
            # Fill the output buffer from our queue
            while len(data) < frames and len(self.queue) > 0:
                item = self.queue.popleft()
                frames_needed = frames - len(data)
                data = np.concatenate((data, item[:frames_needed]))
                
                # If we have leftover data, put it back
                if len(item) > frames_needed:
                    self.queue.appendleft(item[frames_needed:])
            
            # Pad with silence if we don't have enough data
            if len(data) < frames:
                data = np.concatenate((data, np.zeros(frames - len(data), dtype=np.int16)))
                
        outdata[:] = data.reshape(-1, 1)

    def add_data(self, data: bytes):
        """Add audio data to the playback queue."""
        with self.lock:
            np_data = np.frombuffer(data, dtype=np.int16)
            self.queue.append(np_data)
            
            # Auto-start playback if we have data
            if not self.playing and len(self.queue) > 0:
                self.start()

    def start(self):
        """Start audio playback."""
        if not self.playing:
            self.playing = True
            self.stream.start()

    def stop(self):
        """Stop audio playback and clear buffer."""
        with self.lock:
            self.queue.clear()
        self.playing = False
        self.stream.stop()

    def terminate(self):
        """Terminate the audio player and release resources."""
        with self.lock:
            self.queue.clear()
        self.stream.stop()
        self.stream.close()

def listen_and_send_audio(connection):
    """Capture audio from microphone and send to Voice Live API."""
    print("🎤 Audio input started")

    # Create audio input stream - EXACT settings from working notebook
    stream = sd.InputStream(
        channels=1, 
        samplerate=AUDIO_SAMPLE_RATE, 
        dtype="int16"
    )
    
    try:
        stream.start()
        
        while not stop_event.is_set():
            if stream.read_available >= READ_SIZE:
                # Read audio data
                data, _ = stream.read(READ_SIZE)
                
                # Encode as base64
                audio = base64.b64encode(data).decode("utf-8")
                
                # Create API message
                param = {
                    "type": "input_audio_buffer.append", 
                    "audio": audio, 
                    "event_id": ""
                }
                
                # Send to API
                connection.send_dict(param)
            else:
                time.sleep(0.001)  # Small sleep to prevent busy waiting
                
    except Exception as e:
        print(f"❌ Audio input error: {e}")
        stop_event.set()
    finally:
        stream.stop()
        stream.close()

def receive_audio_and_playback(connection):
    """Receive messages from Voice Live API and handle audio playback."""
    print("🔊 Audio output started")
    
    # Create audio player
    audio_player = AudioPlayerAsync()
    last_audio_item_id = None
    
    try:
        while not stop_event.is_set():
            try:
                raw_message = connection.recv(timeout_s=0.1)
                if raw_message:
                    event = json.loads(raw_message)
                    event_type = event.get("type", "")
                    
                    # Handle different event types
                    if event_type == "session.created":
                        session = event.get("session", {})
                        session_id = session.get("id", "unknown")
                        print(f"✅ Session: {session_id}")
                        
                    elif event_type == "conversation.item.input_audio_transcription.completed":
                        transcript = event.get("transcript", "")
                        if transcript:
                            print(f"User: {transcript}")
                            
                    elif event_type == "response.audio_transcript.done":
                        transcript = event.get("transcript", "")
                        if transcript:
                            print(f"Agent: {transcript}")
                            
                    elif event_type == "response.audio.delta":
                        # New audio data from AI response
                        item_id = event.get("item_id", "unknown")
                        
                        if item_id != last_audio_item_id:
                            last_audio_item_id = item_id

                        # Decode and play audio - EXACT method from working notebook
                        bytes_data = base64.b64decode(event.get("delta", ""))
                        if bytes_data:
                            audio_player.add_data(bytes_data)
                            
                    elif event_type == "error":
                        error = event.get("error", {})
                        print(f"❌ API Error: {error.get('message', 'Unknown error')}")
                        stop_event.set()
                        
            except Exception as e:
                if not stop_event.is_set():
                    print(f"❌ Audio processing error: {e}")
                    
    except Exception as e:
        print(f"❌ Audio output error: {e}")
        stop_event.set()
    finally:
        audio_player.terminate()

def monitor_user_input():
    """Monitor keyboard input for quit command."""
    print("⌨️  Type 'q' + Enter to quit")
    
    while not stop_event.is_set():
        try:
            user_input = input().strip().lower()
            if user_input == 'q':
                stop_event.set()
                break
        except (EOFError, KeyboardInterrupt):
            stop_event.set()
            break
        except Exception:
            # Any other input failure (e.g. no stdin available) also ends the session
            stop_event.set()
            break
    
print("✅ Audio processing functions loaded")
print(f"   Configuration: {AUDIO_SAMPLE_RATE}Hz, {READ_SIZE} samples/chunk (20ms)")

=== Audio Processing Functions ===
✅ Audio processing functions loaded
   Configuration: 24000Hz, 480 samples/chunk (20ms)
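`receive_audio_and_playback` routes events through an `if`/`elif` chain. As more event types are handled, a lookup table keyed by event `type` keeps each handler small and makes unhandled types easy to spot. A hypothetical sketch using two transcript events from the function above; the handlers append to a list instead of printing so the behavior is easy to check.

```python
import json
from typing import Callable, Dict, List

transcript_log: List[str] = []

# Hypothetical handler table: event type -> function(event) -> None
HANDLERS: Dict[str, Callable[[dict], None]] = {
    "conversation.item.input_audio_transcription.completed":
        lambda e: transcript_log.append(f"User: {e.get('transcript', '')}"),
    "response.audio_transcript.done":
        lambda e: transcript_log.append(f"Agent: {e.get('transcript', '')}"),
}


def dispatch(raw: str) -> bool:
    """Route one raw JSON message to its handler; return False for unhandled types."""
    event = json.loads(raw)
    handler = HANDLERS.get(event.get("type", ""))
    if handler is None:
        return False
    handler(event)
    return True


dispatch('{"type": "conversation.item.input_audio_transcription.completed", "transcript": "Hi"}')
dispatch('{"type": "response.audio_transcript.done", "transcript": "Hello!"}')
print(dispatch('{"type": "session.created"}'))  # False
print(transcript_log)                            # ['User: Hi', 'Agent: Hello!']
```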



In [None]:
# Step 6: Live Voice Agent Application

def run_live_voice_agent():
    """
    Main application orchestrating the live voice agent.
    
    Architecture:
    1. Establish WebSocket connection to Azure Voice Live API
    2. Send session configuration
    3. Start three concurrent threads:
       - Audio input (microphone → API)
       - Audio output (API → speakers)  
       - User input (keyboard monitoring)
    4. Coordinate graceful shutdown
    """
    global stop_event
    stop_event.clear()
    
    threads = []
    connection = None
    
    try:
        print("🚀 Starting Live Voice Agent...")
        
        # Establish connection
        agent.connect()
        connection = agent._ws
        
        # Configure session
        session_config = agent._session_update()
        connection.send_dict(session_config)
        
        # Wait for session establishment
        time.sleep(2)
        
        # Start processing threads
        thread_configs = [
            {"target": lambda: listen_and_send_audio(connection), "name": "AudioInput"},
            {"target": lambda: receive_audio_and_playback(connection), "name": "AudioOutput"},
            {"target": monitor_user_input, "name": "UserInput"}
        ]
        
        for config in thread_configs:
            thread = threading.Thread(target=config["target"], name=config["name"])
            thread.start()
            threads.append(thread)
        
        print("\n" + "="*40)
        print("🎙️  LIVE VOICE AGENT ACTIVE")
        print("   Speak into microphone")
        print("   Type 'q' + Enter to quit")
        print("="*40)
        
        # Wait for user termination
        threads[2].join()  # Wait for user input thread
        
    except Exception as e:
        print(f"❌ Application error: {e}")
        
    finally:
        print("\n🛑 Shutting down...")
        stop_event.set()
        
        # Stop threads with timeout
        for thread in threads:
            if thread.is_alive():
                thread.join(timeout=3)
        
        # Close connection
        if connection:
            try:
                agent.close()
            except Exception as e:
                pass
        
        print("✅ Shutdown complete")

print("✅ Live Voice Agent application ready")

✅ Live Voice Agent application ready
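`run_live_voice_agent` waits a fixed `time.sleep(2)` for the session to come up. A deadline-based wait returns as soon as `session.created` arrives and reports a timeout explicitly. The sketch assumes a zero-argument `recv` callable that returns a raw JSON string or `None` when nothing arrived, matching how Step 4 treats `agent._ws.recv(timeout_s=...)`; `wait_for_event` is a hypothetical helper.

```python
import json
import time
from typing import Callable, Optional


def wait_for_event(recv: Callable[[], Optional[str]], wanted: str,
                   timeout_s: float = 5.0) -> Optional[dict]:
    """Poll `recv` until an event of type `wanted` arrives or `timeout_s` elapses.

    Returns the event dict, or None on timeout. An "error" event raises RuntimeError.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        raw = recv()
        if not raw:
            continue  # a real recv(timeout_s=...) blocks briefly here
        event = json.loads(raw)
        if event.get("type") == "error":
            raise RuntimeError(event.get("error", {}))
        if event.get("type") == wanted:
            return event
    return None


# Fake transport for the demo: one unrelated message, then session.created.
fake_messages = iter([
    '{"type": "unrelated.event"}',
    '{"type": "session.created", "session": {"id": "sess_demo"}}',
])
demo_event = wait_for_event(lambda: next(fake_messages, None), "session.created", timeout_s=1.0)
print(demo_event["session"]["id"])  # sess_demo

# In the notebook (assumption): wait_for_event(lambda: agent._ws.recv(timeout_s=0.1), "session.created")
```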




In [8]:
# Step 7: Execute Live Voice Agent

# Clean up any existing connections first
import threading
import time

# Stop all running threads
if 'stop_event' in globals():
    stop_event.set()
    print("🛑 Stopping any running voice agents...")
    time.sleep(2)

# Close any existing agent connections
if 'agent' in globals():
    try:
        agent.close()
        print("🔌 Closed existing agent connection")
    except Exception:
        pass

# Check for audio threads left over from a previous run
active_threads = [t for t in threading.enumerate() if t.name in ['AudioInput', 'AudioOutput', 'UserInput']]
if active_threads:
    print(f"⚠️  Found {len(active_threads)} active audio threads - they should stop shortly")
else:
    print("✅ No active audio threads found")


print("=== Live Voice Agent Ready ===")
print(f"Agent: {agent._binding.agent_id}")
print(f"Voice: {agent._session.voice_name}")
print(f"Audio: {AUDIO_SAMPLE_RATE}Hz, 20ms chunks")

print("\n🎯 Starting voice conversation...")
print("   • Speak into your microphone")
print("   • Agent will respond with voice")
print("   • Type 'q' + Enter to quit")

# Execute the live voice agent (single instance only)
try:
    run_live_voice_agent()
except KeyboardInterrupt:
    print("\n⏹️  Interrupted by user")
except Exception as e:
    print(f"\n❌ Error: {e}")
finally:
    print("✅ Session ended")

🛑 Stopping any running voice agents...
[2025-09-03 10:44:11,367] INFO - apps.rtagent.backend.src.agents.Lvagent.audio_io: SpeakerSink stopped.
[2025-09-03 10:44:11,406] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Azure Live Voice Agent connection closed
🔌 Closed existing agent connection
✅ No active audio threads found
=== Live Voice Agent Ready ===
Agent: asst_Kp4exd80NINFuraHyWOftsuR
Voice: en-US-Ava:DragonHDLatestNeural
Audio: 24000Hz, 20ms chunks

🎯 Starting voice conversation...
   • Speak into your microphone
   • Agent will respond with voice
   • Type 'q' + Enter to quit
🚀 Starting Live Voice Agent...

INFO:websocket:Websocket connected
[2025-09-03 10:44:13,444] INFO - apps.rtagent.backend.src.agents.Lvagent.transport: WebSocket opened.
[2025-09-03 10:44:13,501] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Connected to Azure Voice Live API
[2025-09-03 10:44:13,540] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Session configuration sent

🎤 Audio input started
🔊 Audio output started
⌨️  Type 'q' + Enter to quit

🎙️  LIVE VOICE AGENT ACTIVE
   Speak into microphone
   Type 'q' + Enter to quit
✅ Session: sess_31yKBhkscHtifGUzkKXxVZ
User: Hi there. How are you doing?
Agent: Hello! I'm here and ready to help you with any questions or concerns you might have. How can I assist you today?
User: Doing great. Thank you so much. What can you do for me?
Agent: That’s great to hear! I can help you with a variety of things, including:

- Checking the status of your order (delivery updates, tracking, etc.)
- Providing information on returns, shipping, payments, warranties, and policies
- Assisting with issues like damaged products, incorrect billing, or account trouble
- Connecting you with a human support agent if needed

Let me know what you need, and I'll be happy to assist!
User: That's great to hear. I can help you with a variety of things, inclu

[2025-09-03 10:45:10,558] INFO - apps.rtagent.backend.src.agents.Lvagent.audio_io: SpeakerSink stopped.
[2025-09-03 10:45:11,635] INFO - apps.rtagent.backend.src.agents.Lvagent.transport: WebSocket closed: code=None, msg=None
[2025-09-03 10:45:11,651] INFO - apps.rtagent.backend.src.agents.Lvagent.base: Azure Live Voice Agent connection closed

✅ Shutdown complete
✅ Session ended
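
The Step 7 cell signals `stop_event`, sleeps for a fixed 2 seconds, and then only reports lingering threads. One alternative is to signal the event and then join the named threads against a deadline. The sketch below assumes the thread names used in `run_live_voice_agent`; the helper name `wait_for_threads_to_exit` is hypothetical. A thread blocked in a call such as `input()` won't notice the event until that call returns, so the helper reports survivors instead of trying to force them to stop.

```python
import threading
import time
from typing import Iterable, List

AUDIO_THREAD_NAMES = ("AudioInput", "AudioOutput", "UserInput")


def wait_for_threads_to_exit(
    stop: threading.Event,
    names: Iterable[str] = AUDIO_THREAD_NAMES,
    timeout_s: float = 3.0,
) -> List[str]:
    """Set `stop`, join matching threads until `timeout_s` elapses, return names still alive."""
    stop.set()
    wanted = set(names)
    deadline = time.monotonic() + timeout_s
    for t in threading.enumerate():
        # Never try to join the calling thread itself
        if t.name in wanted and t is not threading.current_thread():
            remaining = deadline - time.monotonic()
            if remaining > 0:
                t.join(timeout=remaining)  # total wait across all threads is capped by the deadline
    return [t.name for t in threading.enumerate() if t.name in wanted and t.is_alive()]
```

In the Step 7 cell, `leftover = wait_for_threads_to_exit(stop_event)` could take the place of `time.sleep(2)`. An empty list means every audio thread has exited.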


# Notes and Troubleshooting

## System Requirements

**Software Dependencies:**
- Python 3.11+
- sounddevice library for audio I/O
- websocket-client for WebSocket communication
- azure-identity for authentication
- numpy for audio processing

**Hardware Requirements:**
- Microphone for audio input
- Speakers/headphones for audio output
- Stable internet connection (minimum 1 Mbps)

**Azure Resources:**
- Azure AI Agent Service resource
- Azure Voice Live API access
- Proper RBAC permissions for agent access
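
Before you run the notebook, you can check the software requirements above with a short preflight script. This is a sketch, not part of the framework. The pip-name-to-module mapping is an assumption (for example, `websocket-client` imports as `websocket`). `find_spec` only locates a package without importing it, so `sounddevice` can still fail at import time if the PortAudio system library is missing.

```python
import importlib.util
import sys
from typing import Dict, List, Tuple

# pip package name -> importable module name (assumed mapping for the list above)
REQUIRED_MODULES: Dict[str, str] = {
    "sounddevice": "sounddevice",
    "websocket-client": "websocket",
    "azure-identity": "azure.identity",
    "numpy": "numpy",
}


def missing_dependencies(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return pip names whose module cannot be located."""
    missing = []
    for pip_name, module_name in modules.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent of a dotted name, which fails if the parent is absent
            found = False
        if not found:
            missing.append(pip_name)
    return missing


def check_environment(min_python: Tuple[int, int] = (3, 11)) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    problems += [f"missing package: {name}" for name in missing_dependencies()]
    return problems


if __name__ == "__main__":
    for problem in check_environment() or ["all checks passed"]:
        print(problem)
```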

## Configuration Files

**YAML Structure:**
```yaml
model:
  deployment_id: "gpt-4o"
  
azure_ai_foundry_agent_connected:
  agent_id: "${AI_FOUNDRY_AGENT_ID}"
  project_name: "${AI_FOUNDRY_PROJECT_NAME}"
  
session:
  voice:
    name: "en-US-Ava:DragonHDLatestNeural"
    temperature: 0.8
  vad_threshold: 0.5
  vad_prefix_ms: 300
  vad_silence_ms: 1000
```
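
This section doesn't show how the framework expands the `${...}` placeholders in the YAML. One possible approach is sketched below. It assumes the file has already been parsed into plain dicts and lists (for example with PyYAML's `yaml.safe_load`) and walks that structure, substituting environment variables. The function name is hypothetical. It raises an error on an unset variable rather than leaving the placeholder in place.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME} where NAME is a conventional environment-variable identifier
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_placeholders(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} in strings with values from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: resolve_placeholders(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(item, env) for item in value]
    if isinstance(value, str):
        def _substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_substitute, value)
    return value  # numbers, booleans, None pass through unchanged


if __name__ == "__main__":
    demo = {"azure_ai_foundry_agent_connected": {"agent_id": "${AI_FOUNDRY_AGENT_ID}"}}
    print(resolve_placeholders(demo, {"AI_FOUNDRY_AGENT_ID": "asst_example"}))
```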

**Environment Variables:**
```bash
AZURE_VOICE_LIVE_ENDPOINT=https://your-resource.services.ai.azure.com/
AI_FOUNDRY_AGENT_ID=asst_your_agent_id
AI_FOUNDRY_PROJECT_NAME=your-project-name
AZURE_VOICE_LIVE_API_KEY=optional_api_key
```
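
A common source of the connection errors listed below is a missing variable or an endpoint in the wrong format. The sketch below validates the variables above before any connection is attempted. The function name and the returned dict are hypothetical. The host check mirrors the `.services.ai.azure.com` format this notebook documents, so relax it if your resource uses a different host. The assumption that a missing API key means the framework falls back to `azure-identity` credentials is noted in a comment.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

REQUIRED_VARS = ("AZURE_VOICE_LIVE_ENDPOINT", "AI_FOUNDRY_AGENT_ID", "AI_FOUNDRY_PROJECT_NAME")


def load_voice_live_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Validate the notebook's environment variables and return them as a dict."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")

    endpoint = env["AZURE_VOICE_LIVE_ENDPOINT"]
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not (parsed.hostname or "").endswith(".services.ai.azure.com"):
        raise ValueError(f"unexpected endpoint format: {endpoint!r}")

    return {
        "endpoint": endpoint,
        "agent_id": env["AI_FOUNDRY_AGENT_ID"],
        "project_name": env["AI_FOUNDRY_PROJECT_NAME"],
        # Optional; None presumably means token-based auth via azure-identity (assumption)
        "api_key": env.get("AZURE_VOICE_LIVE_API_KEY") or None,
    }
```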

## Common Issues

**Connection Errors:**
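- Verify that the endpoint uses the `https://<your-resource>.services.ai.azure.com/` format
- Check that the agent ID and project name match your Azure AI Foundry project
- Ensure Azure authentication succeeds: either set `AZURE_VOICE_LIVE_API_KEY`, or make sure `azure-identity` can obtain a token for your signed-in identity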

**Audio Issues:**
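- Check that the operating system allows your Python process to use the microphone and speakers
- Verify that default input and output devices are available (`sounddevice.query_devices()` lists them)
- Confirm that the devices support the configured sample rate (24000 Hz in the run above)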
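
To check which sample rates your default devices accept, one option is `sounddevice`'s `check_input_settings` / `check_output_settings`, which raise an exception for unsupported settings. The sketch below assumes mono 16-bit audio and a hypothetical candidate list. Only `probe_default_devices` needs audio hardware; the selection helper works with any checker function.

```python
from typing import Callable, Iterable, Optional


def first_supported_rate(candidates: Iterable[int], check: Callable[[int], None]) -> Optional[int]:
    """Return the first sample rate for which `check` does not raise, else None."""
    for rate in candidates:
        try:
            check(rate)
            return rate
        except Exception:
            # sounddevice raises (e.g. PortAudioError or ValueError) for unsupported settings
            continue
    return None


def probe_default_devices(candidates=(24000, 48000, 16000)) -> dict:
    # Imported lazily so first_supported_rate can be used without PortAudio installed
    import sounddevice as sd

    # channels=1 and dtype="int16" are assumptions about the notebook's audio format
    return {
        "input": first_supported_rate(
            candidates, lambda r: sd.check_input_settings(samplerate=r, channels=1, dtype="int16")
        ),
        "output": first_supported_rate(
            candidates, lambda r: sd.check_output_settings(samplerate=r, channels=1, dtype="int16")
        ),
    }
```

If `probe_default_devices()` doesn't report 24000 for a device, try a different device or resample the audio. The sketch does neither.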

**Performance Issues:**
- Check that the audio input and output threads aren't blocking each other
- Check network latency
- Optimize buffer sizes for your environment
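
When tuning buffer sizes, the numbers from the run above are a useful starting point: 24000 Hz and 20 ms chunks. The arithmetic below assumes mono 16-bit PCM (2 bytes per sample). It gives the size of one chunk and the extra delay added by chunks already waiting in a playback queue.

```python
def chunk_frames(sample_rate_hz: int, chunk_ms: int) -> int:
    """Samples per channel in one chunk."""
    frames, remainder = divmod(sample_rate_hz * chunk_ms, 1000)
    if remainder:
        raise ValueError("chunk duration does not map to a whole number of samples")
    return frames


def chunk_bytes(sample_rate_hz: int, chunk_ms: int, channels: int = 1, sample_width: int = 2) -> int:
    # Defaults assume mono 16-bit PCM
    return chunk_frames(sample_rate_hz, chunk_ms) * channels * sample_width


def buffered_latency_ms(queued_chunks: int, chunk_ms: int) -> int:
    """Approximate delay before a newly queued chunk plays, behind the chunks already queued."""
    return queued_chunks * chunk_ms


print(chunk_frames(24000, 20))       # 480
print(chunk_bytes(24000, 20))        # 960
print(buffered_latency_ms(10, 20))   # 200
```

Smaller chunks reduce per-chunk latency but add more per-call overhead. A deep playback queue absorbs network jitter, but every queued chunk adds its full duration to the response delay.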