# Azure Voice Live API - Interactive Development Notebook

This notebook deconstructs the `voice-live-quickstart.py` script into manageable sections for educational purposes, testing, and development. 

## Overview

The Azure Voice Live API enables real-time voice conversations with AI models. This notebook demonstrates:

1. **Setup and Configuration** - Environment variables and authentication
2. **Core Classes** - Understanding the main components
3. **Audio Processing** - How audio input/output works
4. **WebSocket Communication** - Real-time message handling
5. **Threading Model** - Concurrent audio processing
6. **Complete Integration** - Putting it all together

## Prerequisites

- Azure Cognitive Services endpoint with Voice Live API access
- Python environment with required dependencies
- Audio devices (microphone and speakers) for testing

## 1. Setup and Dependencies

First, let's import all the required libraries and understand what each one does:

In [1]:
# Core Python libraries
import os
import uuid
import json
import time
import base64
import logging
import threading
import queue
import signal
import sys
from collections import deque
from datetime import datetime

# Audio processing
import numpy as np
import sounddevice as sd

# Azure SDK and authentication
from dotenv import load_dotenv
from azure.core.credentials import TokenCredential
from azure.identity import DefaultAzureCredential

# Type hints
from typing import Dict, Union, Literal, Set
from typing_extensions import Iterator, TypedDict, Required

# WebSocket communication
import websocket
from websocket import WebSocketApp

print("✅ All dependencies imported successfully!")

✅ All dependencies imported successfully!


## 2. Configuration and Global Variables

Let's set up the configuration and global variables used throughout the application:

In [2]:
# Load environment variables
load_dotenv("./.env", override=True)

# Global variables for thread coordination
stop_event = threading.Event()
connection_queue = queue.Queue()

# Audio configuration
AUDIO_SAMPLE_RATE = 24000

# Logger setup
logger = logging.getLogger(__name__)

# Configuration from environment variables
AZURE_VOICE_LIVE_ENDPOINT = os.environ.get("AZURE_VOICE_LIVE_ENDPOINT") or "https://aifoundry825233136833-resource.services.ai.azure.com"
AZURE_VOICE_LIVE_MODEL = os.environ.get("AZURE_VOICE_LIVE_MODEL") or "gpt-4o"
AZURE_VOICE_LIVE_API_VERSION = os.environ.get("AZURE_VOICE_LIVE_API_VERSION") or "2025-05-01-preview"
AZURE_VOICE_LIVE_API_KEY = os.environ.get("AZURE_VOICE_LIVE_API_KEY")

print("📋 Configuration loaded:")
print(f"  - Endpoint: {AZURE_VOICE_LIVE_ENDPOINT}")
print(f"  - Model: {AZURE_VOICE_LIVE_MODEL}")
print(f"  - API Version: {AZURE_VOICE_LIVE_API_VERSION}")
print(f"  - Audio Sample Rate: {AUDIO_SAMPLE_RATE} Hz")
print(f"  - API Key configured: {'✅' if AZURE_VOICE_LIVE_API_KEY else '❌'}")

📋 Configuration loaded:
  - Endpoint: https://aifoundry825233136833-resource.services.ai.azure.com
  - Model: gpt-4o
  - API Version: 2025-05-01-preview
  - Audio Sample Rate: 24000 Hz
  - API Key configured: ❌


## 3. WebSocket Connection Management
The WebSocket endpoint for the voice live API is `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview`. The endpoint is the same for all models. The only difference is the required model query parameter.

For example, an endpoint for a resource with a custom domain would be `wss://<your-ai-foundry-resource-name>.cognitiveservices.azure.com/voice-live/realtime?api-version=2025-05-01-preview&model=gpt-4o-mini-realtime-preview`

The `VoiceLiveConnection` class handles the WebSocket connection to the Azure Voice Live API:

In [3]:
class VoiceLiveConnection:
    """
    Manages WebSocket connection to Azure Voice Live API.
    
    Features:
    - Asynchronous message handling
    - Thread-safe message queue
    - Connection state management
    - Error handling and logging
    """
    
    def __init__(self, url: str, headers: dict) -> None:
        self._url = url
        self._headers = headers
        self._ws = None
        self._message_queue = queue.Queue()
        self._connected = False

    def connect(self) -> None:
        """Establish WebSocket connection with event handlers."""
        
        def on_message(ws, message):
            """Handle incoming messages by adding them to the queue."""
            self._message_queue.put(message)
        
        def on_error(ws, error):
            """Handle WebSocket errors."""
            logger.error(f"WebSocket error: {error}")
        
        def on_close(ws, close_status_code, close_msg):
            """Handle connection closure."""
            logger.info("WebSocket connection closed")
            self._connected = False
        
        def on_open(ws):
            """Handle successful connection."""
            logger.info("WebSocket connection opened")
            self._connected = True

        # Create WebSocket app with event handlers
        self._ws = websocket.WebSocketApp(
            self._url,
            header=self._headers,
            on_message=on_message,
            on_error=on_error,
            on_close=on_close,
            on_open=on_open
        )
        
        # Start WebSocket in a separate thread
        self._ws_thread = threading.Thread(target=self._ws.run_forever)
        self._ws_thread.daemon = True
        self._ws_thread.start()
        
        # Wait for connection to be established
        timeout = 10  # seconds
        start_time = time.time()
        while not self._connected and time.time() - start_time < timeout:
            time.sleep(0.1)
        
        if not self._connected:
            raise ConnectionError("Failed to establish WebSocket connection")

    def recv(self) -> str:
        """Receive a message from the queue (non-blocking with timeout)."""
        try:
            return self._message_queue.get(timeout=1)
        except queue.Empty:
            return None

    def send(self, message: str) -> None:
        """Send a message through the WebSocket."""
        if self._ws and self._connected:
            self._ws.send(message)

    def close(self) -> None:
        """Close the WebSocket connection."""
        if self._ws:
            self._ws.close()
            self._connected = False

print("✅ VoiceLiveConnection class defined")

✅ VoiceLiveConnection class defined


## 4. Azure Voice Live Client

The main client class that handles authentication and connection setup:

In [4]:
class AzureVoiceLive:
    """
    Main client for Azure Voice Live API.
    
    Handles:
    - Authentication (API key or token-based)
    - Connection management
    - URL construction for WebSocket endpoint
    """
    
    def __init__(
        self,
        *,
        azure_endpoint: str | None = None,
        api_version: str | None = None,
        token: str | None = None,
        api_key: str | None = None,
    ) -> None:
        self._azure_endpoint = azure_endpoint
        self._api_version = api_version
        self._token = token
        self._api_key = api_key
        self._connection = None

    def connect(self, model: str) -> VoiceLiveConnection:
        """
        Create a connection to the Voice Live API.
        
        Args:
            model: The AI model to use (e.g., 'gpt-4o')
            
        Returns:
            VoiceLiveConnection: Ready-to-use connection object
        """
        # if self._connection is not None:
        #     raise ValueError("Already connected to the Voice Live API.")
        if not model:
            raise ValueError("Model name is required.")

        # Convert HTTPS endpoint to WSS for WebSocket
        azure_ws_endpoint = self._azure_endpoint.rstrip('/').replace("https://", "wss://")

        # Construct WebSocket URL
        url = f"{azure_ws_endpoint}/voice-live/realtime?api-version={self._api_version}&model={model}"

        # Setup authentication headers
        auth_header = {"Authorization": f"Bearer {self._token}"} if self._token else {"api-key": self._api_key}
        request_id = uuid.uuid4()
        headers = {"x-ms-client-request-id": str(request_id), **auth_header}

        # Create and connect
        self._connection = VoiceLiveConnection(url, headers)
        self._connection.connect()
        return self._connection

print("✅ AzureVoiceLive client class defined")

✅ AzureVoiceLive client class defined


## 5. Audio Processing Components

The `AudioPlayerAsync` class handles real-time audio playback with buffering:

In [5]:
class AudioPlayerAsync:
    """
    Asynchronous audio player with buffering for real-time playback.
    
    Features:
    - Thread-safe audio queue
    - Real-time streaming playback
    - Automatic start/stop management
    - Low-latency audio processing
    """
    
    def __init__(self):
        self.queue = deque()
        self.lock = threading.Lock()
        self.stream = sd.OutputStream(
            callback=self.callback,
            samplerate=AUDIO_SAMPLE_RATE,
            channels=1,
            dtype=np.int16,
            blocksize=2400,  # ~100ms at 24kHz
        )
        self.playing = False

    def callback(self, outdata, frames, time, status):
        """
        Audio callback function called by sounddevice.
        
        This function is called in real-time by the audio system
        and must be very efficient to avoid dropouts.
        """
        if status:
            logger.warning(f"Stream status: {status}")
            
        with self.lock:
            data = np.empty(0, dtype=np.int16)
            
            # Fill the output buffer from our queue
            while len(data) < frames and len(self.queue) > 0:
                item = self.queue.popleft()
                frames_needed = frames - len(data)
                data = np.concatenate((data, item[:frames_needed]))
                
                # If we have leftover data, put it back
                if len(item) > frames_needed:
                    self.queue.appendleft(item[frames_needed:])
            
            # Pad with silence if we don't have enough data
            if len(data) < frames:
                data = np.concatenate((data, np.zeros(frames - len(data), dtype=np.int16)))
                
        outdata[:] = data.reshape(-1, 1)

    def add_data(self, data: bytes):
        """Add audio data to the playback queue."""
        with self.lock:
            np_data = np.frombuffer(data, dtype=np.int16)
            self.queue.append(np_data)
            
            # Auto-start playback if we have data
            if not self.playing and len(self.queue) > 0:
                self.start()

    def start(self):
        """Start audio playback."""
        if not self.playing:
            self.playing = True
            self.stream.start()

    def stop(self):
        """Stop audio playback and clear buffer."""
        with self.lock:
            self.queue.clear()
        self.playing = False
        self.stream.stop()

    def terminate(self):
        """Terminate the audio player and release resources."""
        with self.lock:
            self.queue.clear()
        self.stream.stop()
        self.stream.close()

print("✅ AudioPlayerAsync class defined")

✅ AudioPlayerAsync class defined


## 6. Audio Input Processing

Function to capture microphone input and send it to the API:

In [6]:
def listen_and_send_audio(connection: VoiceLiveConnection) -> None:
    """
    Capture audio from microphone and send to Voice Live API.
    
    This function runs in a separate thread and continuously:
    1. Reads audio from the microphone
    2. Encodes it as base64
    3. Sends it to the API via WebSocket
    
    Args:
        connection: Active VoiceLiveConnection instance
    """
    logger.info("Starting audio stream ...")

    # Create audio input stream
    stream = sd.InputStream(
        channels=1, 
        samplerate=AUDIO_SAMPLE_RATE, 
        dtype="int16"
    )
    
    try:
        stream.start()
        
        # Read audio in 20ms chunks (480 samples at 24kHz)
        read_size = int(AUDIO_SAMPLE_RATE * 0.02)
        
        while not stop_event.is_set():
            if stream.read_available >= read_size:
                # Read audio data
                data, _ = stream.read(read_size)
                
                # Encode as base64
                audio = base64.b64encode(data).decode("utf-8")
                
                # Create API message
                param = {
                    "type": "input_audio_buffer.append", 
                    "audio": audio, 
                    "event_id": ""
                }
                
                # Send to API
                data_json = json.dumps(param)
                connection.send(data_json)
            else:
                time.sleep(0.001)  # Small sleep to prevent busy waiting
                
    except Exception as e:
        logger.error(f"Audio stream interrupted. {e}")
    finally:
        stream.stop()
        stream.close()
        logger.info("Audio stream closed.")

print("✅ Audio input function defined")

✅ Audio input function defined


## 7. Audio Output Processing

Function to receive audio from the API and handle playback:

In [7]:
def receive_audio_and_playback(connection: VoiceLiveConnection) -> None:
    """
    Receive messages from Voice Live API and handle audio playback.
    
    This function runs in a separate thread and handles:
    1. Receiving WebSocket messages from the API
    2. Processing different event types
    3. Playing back audio responses
    4. Managing conversation state
    
    Args:
        connection: Active VoiceLiveConnection instance
    """
    last_audio_item_id = None
    audio_player = AudioPlayerAsync()

    logger.info("Starting audio playback ...")
    
    try:
        while not stop_event.is_set():
            # Receive message from API
            raw_event = connection.recv()
            if raw_event is None:
                continue
                
            try:
                event = json.loads(raw_event)
                event_type = event.get("type")
                print(f"Received event: {event_type}")

                # Handle different event types
                if event_type == "session.created":
                    session = event.get("session")
                    logger.info(f"Session created: {session.get('id')}")

                elif event_type == "response.audio.delta":
                    # New audio data from AI response
                    if event.get("item_id") != last_audio_item_id:
                        last_audio_item_id = event.get("item_id")

                    # Decode and play audio
                    bytes_data = base64.b64decode(event.get("delta", ""))
                    if bytes_data:
                        logger.debug(f"Received audio data of length: {len(bytes_data)}")   
                        audio_player.add_data(bytes_data)

                elif event_type == "input_audio_buffer.speech_started":
                    # User started speaking - stop AI playback
                    print("🎤 Speech started")
                    audio_player.stop()

                elif event_type == "error":
                    # Handle API errors
                    error_details = event.get("error", {})
                    error_type = error_details.get("type", "Unknown")
                    error_code = error_details.get("code", "Unknown")
                    error_message = error_details.get("message", "No message provided")
                    raise ValueError(f"Error received: Type={error_type}, Code={error_code}, Message={error_message}")
                    
            except json.JSONDecodeError as e:
                logger.error(f"Failed to parse JSON event: {e}")
                continue

    except Exception as e:
        logger.error(f"Error in audio playback: {e}")
    finally:
        audio_player.terminate()
        logger.info("Playback done.")

print("✅ Audio output function defined")

✅ Audio output function defined


## 8. User Input Management

Function to handle keyboard input for graceful shutdown:

In [8]:
def read_keyboard_and_quit() -> None:
    """
    Monitor keyboard input for quit command.
    
    This function runs in a separate thread and waits for the user
    to type 'q' to gracefully shutdown the application.
    """
    print("Press 'q' and Enter to quit the chat.")
    
    while not stop_event.is_set():
        try:
            user_input = input()
            if user_input.strip().lower() == 'q':
                print("Quitting the chat...")
                stop_event.set()
                break
        except EOFError:
            # Handle case where input is interrupted
            break

print("✅ User input function defined")

✅ User input function defined


## 9. Session Configuration

Now let's create the session configuration that defines how the AI should behave:

In [9]:
# Session configuration for the Voice Live API
session_config = {
    "type": "session.update",
    "session": {
        # Basic AI behavior
        "instructions": "You are a helpful AI assistant responding in natural, engaging language.",
        
        # Voice Activity Detection (VAD) settings
        "turn_detection": {
            "type": "azure_semantic_vad",
            "threshold": 0.3,  # Sensitivity for detecting speech
            "prefix_padding_ms": 200,  # Keep audio before speech starts
            "silence_duration_ms": 200,  # How long to wait for silence
            "remove_filler_words": False,  # Keep "um", "uh", etc.
            "end_of_utterance_detection": {
                "model": "semantic_detection_v1",
                "threshold": 0.01,  # When to consider speech finished
                "timeout": 2,  # Max time to wait for continuation
            },
        },
        
        # Audio preprocessing
        "input_audio_noise_reduction": {
            "type": "azure_deep_noise_suppression"  # Remove background noise
        },
        "input_audio_echo_cancellation": {
            "type": "server_echo_cancellation"  # Prevent feedback loops
        },
        
        # AI voice settings
        "voice": {
            "name": "en-US-Ava:DragonHDLatestNeural",  # High-quality neural voice
            "type": "azure-standard",
            "temperature": 0.8,  # Controls response creativity/randomness
        },
    },
    "event_id": ""
}

print("✅ Session configuration created")
print(f"   - Voice: {session_config['session']['voice']['name']}")
print(f"   - VAD Type: {session_config['session']['turn_detection']['type']}")
print(f"   - Noise Reduction: {session_config['session']['input_audio_noise_reduction']['type']}")
print(f"   - Echo Cancellation: {session_config['session']['input_audio_echo_cancellation']['type']}")

✅ Session configuration created
   - Voice: en-US-Ava:DragonHDLatestNeural
   - VAD Type: azure_semantic_vad
   - Noise Reduction: azure_deep_noise_suppression
   - Echo Cancellation: server_echo_cancellation


## 10. Client Setup and Authentication

Let's create the client and establish a connection:

In [10]:
# Setup logging for this session
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")

# Create logs directory if it doesn't exist
import os
if not os.path.exists('logs'):
    os.makedirs('logs')

# Configure logging
logging.basicConfig(
    filename=f'logs/{timestamp}_voicelive_notebook.log',
    filemode="w",
    level=logging.DEBUG,
    format='%(asctime)s:%(name)s:%(levelname)s:%(message)s'
)

# Setup authentication - prefer token-based auth
credential = DefaultAzureCredential()
scopes = "https://ai.azure.com/.default"

try:
    token = credential.get_token(scopes)
    auth_method = "token"
    print("✅ Using token-based authentication")
except Exception as e:
    print(f"⚠️  Token auth failed: {e}")
    print("🔑 Falling back to API key authentication")
    auth_method = "api_key"

# Create the client
client_kwargs = {
    "azure_endpoint": AZURE_VOICE_LIVE_ENDPOINT,
    "api_version": AZURE_VOICE_LIVE_API_VERSION,
}

if auth_method == "token":
    client_kwargs["token"] = token.token
else:
    client_kwargs["api_key"] = AZURE_VOICE_LIVE_API_KEY

client = AzureVoiceLive(**client_kwargs)

print(f"✅ Azure Voice Live client created successfully")
print(f"   - Authentication: {auth_method}")
print(f"   - Endpoint: {AZURE_VOICE_LIVE_ENDPOINT}")
print(f"   - Model: {AZURE_VOICE_LIVE_MODEL}")

✅ Using token-based authentication
✅ Azure Voice Live client created successfully
   - Authentication: token
   - Endpoint: https://aifoundry825233136833-resource.services.ai.azure.com
   - Model: gpt-4o


## 11. Test Connection (Optional)

Let's test the connection without starting the full voice chat:

In [25]:
# Test connection (optional - uncomment to run)
# This will establish a connection and immediately close it

try:
    print("🔌 Testing connection...")
    test_connection = client.connect(model=AZURE_VOICE_LIVE_MODEL)
    print("✅ Connection test successful!")
    test_connection.close()
    print("🔌 Test connection closed")
except Exception as e:
    print(f"❌ Connection test failed: {e}")

# print("ℹ️  Connection test code ready (uncomment to run)")

🔌 Testing connection...
✅ Connection test successful!
🔌 Test connection closed


## 12. Main Voice Chat Application

Now let's put it all together to create the complete voice chat experience:

In [12]:
def run_voice_chat():
    """
    Main function to run the complete voice chat application.
    
    This orchestrates:
    1. Connection establishment
    2. Session configuration
    3. Thread management for audio I/O
    4. Graceful shutdown
    """
    global stop_event
    
    # Reset the stop event for a fresh start
    stop_event.clear()
    
    try:
        print("🚀 Starting Voice Live chat application...")
        
        # 1. Connect to the API
        if hasattr(client, '_connection'):
            connection = client._connection
        else:
            connection = client.connect(model=AZURE_VOICE_LIVE_MODEL)
        print("✅ Connected to Voice Live API")
        
        # 2. Send session configuration
        connection.send(json.dumps(session_config))
        print("✅ Session configuration sent")
        
        # 3. Create and start threads
        print("🧵 Starting threads...")
        send_thread = threading.Thread(
            target=listen_and_send_audio, 
            args=(connection,),
            name="AudioInput"
        )
        receive_thread = threading.Thread(
            target=receive_audio_and_playback, 
            args=(connection,),
            name="AudioOutput"
        )
        keyboard_thread = threading.Thread(
            target=read_keyboard_and_quit,
            name="KeyboardInput"
        )

        # Start all threads
        send_thread.start()
        receive_thread.start()
        keyboard_thread.start()
        
        print("🎙️  Voice chat is now active!")
        print("💬 You can start speaking...")
        print("⌨️  Type 'q' and press Enter to quit")
        
        # 4. Wait for user to quit
        keyboard_thread.join()
        
        # 5. Graceful shutdown
        print("🛑 Shutting down...")
        stop_event.set()
        
        # Wait for threads to finish (with timeout)
        send_thread.join(timeout=2)
        receive_thread.join(timeout=2)
        
        # Close connection
        connection.close()
        print("✅ Voice chat ended successfully")
        
    except Exception as e:
        print(f"❌ Error in voice chat: {e}")
        stop_event.set()
        if 'connection' in locals():
            connection.close()

print("✅ Main voice chat function defined")

✅ Main voice chat function defined


## 13. Run the Voice Chat

**⚠️ Important:** Make sure you have:
1. A microphone and speakers/headphones connected
2. Azure Voice Live API credentials configured
3. A quiet environment for testing

Uncomment and run the cell below to start the voice chat:

In [13]:
# Run the voice chat application
# Uncomment the line below to start the chat
run_voice_chat()

print("🎯 Ready to start voice chat!")
print("📋 To run: uncomment the line above and execute this cell")
print("🎙️  Make sure your microphone and speakers are working")
print("📝 Check the logs directory for detailed logging")

🚀 Starting Voice Live chat application...
✅ Connected to Voice Live API
✅ Session configuration sent
🧵 Starting threads...
Press 'q' and Enter to quit the chat.
🎙️  Voice chat is now active!
💬 You can start speaking...
⌨️  Type 'q' and press Enter to quit


Received event: session.created
Quitting the chat...
🛑 Shutting down...
✅ Voice chat ended successfully
🎯 Ready to start voice chat!
📋 To run: uncomment the line above and execute this cell
🎙️  Make sure your microphone and speakers are working
📝 Check the logs directory for detailed logging


## 14. Development & Testing Utilities

Here are some utility functions for development and testing:

In [14]:
# Development and testing utilities

def check_audio_devices():
    """Check available audio input and output devices."""
    print("🎤 Available Audio Input Devices:")
    for i, device in enumerate(sd.query_devices()):
        if device['max_input_channels'] > 0:
            print(f"  {i}: {device['name']} (inputs: {device['max_input_channels']})")
    
    print("\n🔊 Available Audio Output Devices:")
    for i, device in enumerate(sd.query_devices()):
        if device['max_output_channels'] > 0:
            print(f"  {i}: {device['name']} (outputs: {device['max_output_channels']})")
    
    print(f"\n🎛️  Default Input Device: {sd.query_devices(sd.default.device[0])['name']}")
    print(f"🎛️  Default Output Device: {sd.query_devices(sd.default.device[1])['name']}")

def test_microphone(duration=3):
    """Test microphone input for a few seconds."""
    print(f"🎤 Testing microphone for {duration} seconds...")
    print("💬 Please speak into your microphone...")
    
    def callback(indata, frames, time, status):
        volume_norm = np.linalg.norm(indata) * 10
        print(f"📊 Volume level: {'█' * int(volume_norm)}")
    
    with sd.InputStream(callback=callback, channels=1, samplerate=AUDIO_SAMPLE_RATE):
        sd.sleep(duration * 1000)
    
    print("✅ Microphone test complete")

def create_custom_session_config(
    instructions="You are a helpful AI assistant.",
    voice_name="en-US-Ava:DragonHDLatestNeural",
    temperature=0.8,
    vad_threshold=0.3
):
    """Create a custom session configuration."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "turn_detection": {
                "type": "azure_semantic_vad",
                "threshold": vad_threshold,
                "prefix_padding_ms": 200,
                "silence_duration_ms": 200,
                "remove_filler_words": False,
                "end_of_utterance_detection": {
                    "model": "semantic_detection_v1",
                    "threshold": 0.01,
                    "timeout": 2,
                },
            },
            "input_audio_noise_reduction": {
                "type": "azure_deep_noise_suppression"
            },
            "input_audio_echo_cancellation": {
                "type": "server_echo_cancellation"
            },
            "voice": {
                "name": voice_name,
                "type": "azure-standard",
                "temperature": temperature,
            },
        },
        "event_id": ""
    }

def view_logs():
    """View the most recent log file."""
    import glob
    log_files = glob.glob("logs/*.log")
    if log_files:
        latest_log = max(log_files, key=os.path.getctime)
        print(f"📝 Latest log file: {latest_log}")
        with open(latest_log, 'r') as f:
            lines = f.readlines()
            print("📄 Last 20 lines:")
            for line in lines[-20:]:
                print(line.strip())
    else:
        print("📝 No log files found")

print("✅ Development utilities loaded:")
print("  - check_audio_devices(): Check available audio devices")
print("  - test_microphone(duration=3): Test microphone input")
print("  - create_custom_session_config(): Create custom configurations")
print("  - view_logs(): View recent log entries")

✅ Development utilities loaded:
  - check_audio_devices(): Check available audio devices
  - test_microphone(duration=3): Test microphone input
  - create_custom_session_config(): Create custom configurations
  - view_logs(): View recent log entries


## 15. Example Tests

Run these cells to test individual components:

In [15]:
# Test 1: Check audio devices
# Uncomment to run:
# check_audio_devices()

print("🔍 Test 1: Audio device check (uncomment to run)")

🔍 Test 1: Audio device check (uncomment to run)


In [16]:
# Test 2: Test microphone input
# Uncomment to run:
# test_microphone(duration=3)

print("🎤 Test 2: Microphone test (uncomment to run)")

🎤 Test 2: Microphone test (uncomment to run)


## 16. Troubleshooting & Tips

### Common Issues and Solutions:

1. **Authentication Errors**
   - Ensure your Azure credentials are configured
   - Check that your API key is valid
   - Verify the endpoint URL is correct

2. **Audio Issues**
   - Run `check_audio_devices()` to verify device availability
   - Test microphone with `test_microphone()`
   - Check system audio permissions for the application

3. **Connection Problems**
   - Verify internet connectivity
   - Check firewall settings for WebSocket connections
   - Ensure the endpoint supports Voice Live API

4. **Performance Optimization**
   - Use headphones to prevent echo/feedback
   - Minimize background noise
   - Adjust VAD threshold in session config

### Voice Live API Features:

- **Real-time Voice Interaction**: Low-latency conversation with AI
- **Advanced VAD**: Semantic voice activity detection
- **Noise Suppression**: Built-in noise reduction and echo cancellation
- **Neural Voices**: High-quality text-to-speech synthesis
- **Flexible Configuration**: Customizable session parameters

### Next Steps:

1. **Experiment with Configuration**: Try different voices, VAD settings, and instructions
2. **Add Custom Logic**: Implement conversation tracking, session management
3. **Integration**: Connect to your applications and workflows
4. **Monitoring**: Add application insights and performance metrics

## 17. Debugging & Diagnostics

Let's run some diagnostics to understand why the voice chat isn't working properly:

In [17]:
# First, let's check your audio devices
print("🔍 Checking audio device configuration...")
check_audio_devices()

print("\n" + "="*60)
print("🎤 Testing microphone for 3 seconds...")
print("Please speak into your microphone!")
test_microphone(duration=3)

🔍 Checking audio device configuration...
🎤 Available Audio Input Devices:
  1: Jin’s AirPods Pro (inputs: 1)
  4: HD Pro Webcam C920 (inputs: 2)
  6: Jabra Link 380 (inputs: 1)
  7: MacBook Pro Microphone (inputs: 1)
  9: Microsoft Teams Audio (inputs: 1)

🔊 Available Audio Output Devices:
  0: LG ULTRAWIDE (outputs: 2)
  2: Jin’s AirPods Pro (outputs: 2)
  3: C49RG9x (outputs: 2)
  5: Jabra Link 380 (outputs: 2)
  8: MacBook Pro Speakers (outputs: 2)
  9: Microsoft Teams Audio (outputs: 1)

🎛️  Default Input Device: Jin’s AirPods Pro
🎛️  Default Output Device: Jin’s AirPods Pro

🎤 Testing microphone for 3 seconds...
Please speak into your microphone!
🎤 Testing microphone for 3 seconds...
💬 Please speak into your microphone...
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: ███
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Volume level: 
📊 Vol

In [18]:
def run_voice_chat_debug():
    """
    Enhanced voice chat function with better debugging and error handling.
    """
    global stop_event
    
    # Reset the stop event for a fresh start
    stop_event.clear()
    
    try:
        print("🚀 Starting Voice Live chat application (DEBUG MODE)...")
        
        # 1. Connect to the API
        connection = client.connect(model=AZURE_VOICE_LIVE_MODEL)
        print("✅ Connected to Voice Live API")
        
        # 2. Send session configuration
        connection.send(json.dumps(session_config))
        print("✅ Session configuration sent")
        print(f"   Instructions: {session_config['session']['instructions'][:50]}...")
        print(f"   Voice: {session_config['session']['voice']['name']}")
        
        # 3. Wait a moment to receive session.created event
        print("⏳ Waiting for session creation...")
        time.sleep(2)
        
        # Check for any immediate messages
        for i in range(5):
            msg = connection.recv()
            if msg:
                try:
                    event = json.loads(msg)
                    print(f"📨 Received: {event.get('type', 'unknown')}")
                    if event.get('type') == 'session.created':
                        print(f"   Session ID: {event.get('session', {}).get('id', 'unknown')}")
                    elif event.get('type') == 'error':
                        print(f"❌ Error: {event.get('error', {})}")
                except:
                    print(f"📨 Raw message: {msg[:100]}...")
            else:
                break
        
        # 4. Create and start threads with better error handling
        print("🧵 Starting threads...")
        
        def safe_listen_and_send_audio(connection):
            try:
                listen_and_send_audio(connection)
            except Exception as e:
                print(f"❌ Audio input error: {e}")
                stop_event.set()
        
        def safe_receive_audio_and_playback(connection):
            try:
                receive_audio_and_playback(connection)
            except Exception as e:
                print(f"❌ Audio output error: {e}")
                stop_event.set()
        
        send_thread = threading.Thread(
            target=safe_listen_and_send_audio, 
            args=(connection,),
            name="AudioInput"
        )
        receive_thread = threading.Thread(
            target=safe_receive_audio_and_playback, 
            args=(connection,),
            name="AudioOutput"
        )
        
        # Start audio threads
        send_thread.start()
        receive_thread.start()
        
        print("🎙️  Voice chat is now active!")
        print("💬 You can start speaking...")
        print("📊 Monitoring audio threads...")
        
        # Monitor threads instead of waiting for keyboard input
        start_time = time.time()
        while not stop_event.is_set():
            # Check if threads are still alive
            if not send_thread.is_alive():
                print("⚠️  Audio input thread died")
                break
            if not receive_thread.is_alive():
                print("⚠️  Audio output thread died")
                break
            
            # Print status every 10 seconds
            if time.time() - start_time > 10:
                print("📊 Still running... Threads alive. Speak into your microphone!")
                start_time = time.time()
            
            time.sleep(1)
        
        # 5. Graceful shutdown
        print("🛑 Shutting down...")
        stop_event.set()
        
        # Wait for threads to finish (with timeout)
        send_thread.join(timeout=2)
        receive_thread.join(timeout=2)
        
        # Close connection
        connection.close()
        print("✅ Voice chat ended successfully")
        
    except Exception as e:
        print(f"❌ Error in voice chat: {e}")
        import traceback
        traceback.print_exc()
        stop_event.set()
        if 'connection' in locals():
            connection.close()

print("✅ Debug voice chat function created")

✅ Debug voice chat function created


## 18. Debug Version - Run This Instead

Let's try the debug version which provides more information about what's happening:

In [19]:
# Run the debug version of voice chat
# This will show more detailed information about what's happening

# First check if we can see logs
print("📝 Checking recent logs:")
view_logs()

print("\n" + "="*60)
print("🚀 Starting DEBUG voice chat...")
print("This version will run for a while and show more status information")
print("Watch for error messages and connection status")

# Uncomment the line below to run debug version:
# run_voice_chat_debug()

print("💡 Uncomment the line above to run the debug version")

📝 Checking recent logs:
📝 Latest log file: logs/2025-08-26_18-42-15_voicelive_notebook.log
📄 Last 20 lines:
2025-08-26 18:42:17,176:azure.identity._internal.decorators:DEBUG:VisualStudioCodeCredential.get_token failed: VisualStudioCodeCredential requires the 'azure-identity-broker' package to be installed. You must also ensure you have the Azure Resources extension installed and have signed in to Azure via Visual Studio Code.
Traceback (most recent call last):
File "/Users/jinle/miniconda3/envs/audioagent/lib/python3.11/site-packages/azure/identity/_internal/decorators.py", line 23, in wrapper
token = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/jinle/miniconda3/envs/audioagent/lib/python3.11/site-packages/azure/identity/_credentials/vscode.py", line 211, in get_token
raise CredentialUnavailableError(message=self._unavailable_message)
azure.identity._exceptions.CredentialUnavailableError: VisualStudioCodeCredential requires the 'azure-identity-broker' package to be installed. Y

## 19. Quick Fix Attempts

Based on the issue you're experiencing, here are the most likely causes and fixes:

In [20]:
# Quick diagnostic: Check if the issue is with the keyboard input thread
# The original code might be exiting because of input() blocking in Jupyter

def run_voice_chat_jupyter_fixed():
    """
    Voice chat function optimized for Jupyter notebook environment.
    Removes the keyboard input thread that might cause issues in notebooks.
    """
    global stop_event
    
    # Reset the stop event for a fresh start
    stop_event.clear()
    
    try:
        print("🚀 Starting Voice Live chat application (JUPYTER OPTIMIZED)...")
        
        # 1. Connect to the API
        if hasattr(client, '_connection'):
            connection = client._connection
        else:
            connection = client.connect(model=AZURE_VOICE_LIVE_MODEL)
        print("✅ Connected to Voice Live API")
        
        # 2. Send session configuration
        connection.send(json.dumps(session_config))
        print("✅ Session configuration sent")
        
        # 3. Create and start threads (NO KEYBOARD THREAD)
        print("🧵 Starting audio threads...")
        send_thread = threading.Thread(
            target=listen_and_send_audio, 
            args=(connection,),
            name="AudioInput"
        )
        receive_thread = threading.Thread(
            target=receive_audio_and_playback, 
            args=(connection,),
            name="AudioOutput"
        )

        # Start all threads
        send_thread.start()
        receive_thread.start()
        
        print("🎙️  Voice chat is now active!")
        print("💬 You can start speaking...")
        print("⏰ This will run for 30 seconds, then auto-stop")
        print("🛑 To stop early, interrupt the kernel")
        
        # Run for 30 seconds instead of waiting for keyboard
        start_time = time.time()
        while time.time() - start_time < 30 and not stop_event.is_set():
            # Check threads are alive
            if not send_thread.is_alive():
                print("⚠️  Audio input thread stopped")
                break
            if not receive_thread.is_alive():
                print("⚠️  Audio output thread stopped")
                break
            
            # Print periodic status
            elapsed = int(time.time() - start_time)
            if elapsed % 5 == 0:
                print(f"⏱️  Running... {elapsed}s elapsed. Speak into microphone!")
            
            time.sleep(1)
        
        # 5. Graceful shutdown
        print("🛑 Shutting down...")
        stop_event.set()
        
        # Wait for threads to finish (with timeout)
        send_thread.join(timeout=2)
        receive_thread.join(timeout=2)
        
        # Close connection
        connection.close()
        print("✅ Voice chat ended successfully")
        
    except Exception as e:
        print(f"❌ Error in voice chat: {e}")
        import traceback
        traceback.print_exc()
        stop_event.set()
        if 'connection' in locals():
            connection.close()

# Try this version optimized for Jupyter:
print("🔧 Jupyter-optimized voice chat function ready!")
print("💡 This version runs for 30 seconds without keyboard input")
print("🚀 Uncomment the line below to try it:")
print("# run_voice_chat_jupyter_fixed()")

🔧 Jupyter-optimized voice chat function ready!
💡 This version runs for 30 seconds without keyboard input
🚀 Uncomment the line below to try it:
# run_voice_chat_jupyter_fixed()


## 20. Enhanced Audio Input Debugging

Let's test the audio input step by step to identify the exact issue:

In [21]:
# Step 1: Test basic microphone access
print("🔍 Step 1: Testing basic microphone access...")

try:
    # Test if we can access the microphone at all
    import sounddevice as sd
    import numpy as np
    
    print("Available audio devices:")
    for i, device in enumerate(sd.query_devices()):
        if device['max_input_channels'] > 0:
            print(f"  Input Device {i}: {device['name']} (channels: {device['max_input_channels']}, sample rate: {device['default_samplerate']})")
    
    print(f"\nDefault input device: {sd.query_devices(sd.default.device[0])['name']}")
    
    # Test basic recording
    print("\n🎤 Testing 3-second recording...")
    print("Please speak now!")
    
    duration = 3  # seconds
    sample_rate = 24000
    
    recording = sd.rec(int(duration * sample_rate), samplerate=sample_rate, channels=1, dtype='int16')
    sd.wait()  # Wait until recording is finished
    
    # Check if we got any audio
    max_amplitude = np.max(np.abs(recording))
    print(f"✅ Recording complete! Max amplitude: {max_amplitude}")
    
    if max_amplitude > 100:  # Some reasonable threshold
        print("✅ Microphone is working - detected audio!")
    else:
        print("⚠️  Very low audio detected - check microphone settings")
        
except Exception as e:
    print(f"❌ Microphone test failed: {e}")
    import traceback
    traceback.print_exc()

🔍 Step 1: Testing basic microphone access...
Available audio devices:
  Input Device 1: Jin’s AirPods Pro (channels: 1, sample rate: 24000.0)
  Input Device 4: HD Pro Webcam C920 (channels: 2, sample rate: 16000.0)
  Input Device 6: Jabra Link 380 (channels: 1, sample rate: 16000.0)
  Input Device 7: MacBook Pro Microphone (channels: 1, sample rate: 48000.0)
  Input Device 9: Microsoft Teams Audio (channels: 1, sample rate: 48000.0)

Default input device: Jin’s AirPods Pro

🎤 Testing 3-second recording...
Please speak now!
✅ Recording complete! Max amplitude: 31
⚠️  Very low audio detected - check microphone settings


In [22]:
# Step 2: Create an enhanced audio input function with debugging
def listen_and_send_audio_debug(connection: VoiceLiveConnection) -> None:
    """
    Enhanced version of audio input function with detailed debugging.
    """
    print("🎤 [AUDIO INPUT] Starting enhanced audio stream with debugging...")

    # Create audio input stream
    stream = sd.InputStream(
        channels=1, 
        samplerate=AUDIO_SAMPLE_RATE, 
        dtype="int16"
    )
    
    try:
        stream.start()
        print(f"🎤 [AUDIO INPUT] Stream started successfully at {AUDIO_SAMPLE_RATE}Hz")
        
        # Read audio in 20ms chunks (480 samples at 24kHz)
        read_size = int(AUDIO_SAMPLE_RATE * 0.02)
        print(f"🎤 [AUDIO INPUT] Reading chunks of {read_size} samples ({20}ms)")
        
        chunk_count = 0
        audio_sent_count = 0
        last_activity_time = time.time()
        
        while not stop_event.is_set():
            if stream.read_available >= read_size:
                # Read audio data
                data, overflowed = stream.read(read_size)
                chunk_count += 1
                
                if overflowed:
                    print("⚠️  [AUDIO INPUT] Audio buffer overflow detected!")
                
                # Check audio level
                audio_level = np.max(np.abs(data))
                
                # Log activity every 5 seconds or when there's significant audio
                current_time = time.time()
                if audio_level > 500 or (current_time - last_activity_time) > 5:
                    print(f"🎤 [AUDIO INPUT] Chunk {chunk_count}: level={audio_level}, available={stream.read_available}")
                    last_activity_time = current_time
                
                # Encode as base64
                audio = base64.b64encode(data).decode("utf-8")
                
                # Create API message
                param = {
                    "type": "input_audio_buffer.append", 
                    "audio": audio, 
                    "event_id": ""
                }
                
                # Send to API
                try:
                    data_json = json.dumps(param)
                    connection.send(data_json)
                    audio_sent_count += 1
                    
                    # Log significant audio being sent
                    if audio_level > 500:
                        print(f"🔊 [AUDIO INPUT] Sent significant audio chunk (level={audio_level}) - total sent: {audio_sent_count}")
                        
                except Exception as send_error:
                    print(f"❌ [AUDIO INPUT] Failed to send audio: {send_error}")
                    
            else:
                time.sleep(0.001)  # Small sleep to prevent busy waiting
                
    except Exception as e:
        print(f"❌ [AUDIO INPUT] Audio stream error: {e}")
        import traceback
        traceback.print_exc()
    finally:
        stream.stop()
        stream.close()
        print(f"🎤 [AUDIO INPUT] Stream closed. Total chunks processed: {chunk_count}, sent: {audio_sent_count}")

print("✅ Enhanced audio input function created")

✅ Enhanced audio input function created


In [23]:
# Step 3: Enhanced voice chat with comprehensive debugging
def run_voice_chat_full_debug():
    """
    Fully instrumented voice chat for debugging audio input issues.
    """
    global stop_event
    
    # Reset the stop event for a fresh start
    stop_event.clear()
    
    try:
        print("🚀 Starting FULL DEBUG Voice Live chat application...")
        
        # 1. Connect to the API
        connection = client._connection
        print("✅ Connected to Voice Live API")
        
        # 2. Send session configuration
        connection.send(json.dumps(session_config))
        print("✅ Session configuration sent")
        
        # 3. Wait and check for session.created
        print("⏳ Waiting for session.created event...")
        time.sleep(2)
        
        initial_messages = []
        for i in range(5):
            msg = connection.recv()
            if msg:
                try:
                    event = json.loads(msg)
                    initial_messages.append(event)
                    print(f"📨 Initial message {i+1}: {event.get('type', 'unknown')}")
                except:
                    print(f"📨 Raw initial message {i+1}: {msg[:100]}...")
            else:
                break
        
        print(f"📨 Received {len(initial_messages)} initial messages")
        
        # 4. Enhanced audio output function
        def receive_audio_and_playback_debug(connection):
            """Enhanced audio output with debugging."""
            print("🔊 [AUDIO OUTPUT] Starting enhanced audio playback thread...")
            last_audio_item_id = None
            audio_player = AudioPlayerAsync()
            message_count = 0
            
            try:
                while not stop_event.is_set():
                    raw_event = connection.recv()
                    if raw_event is None:
                        continue
                    
                    message_count += 1
                    try:
                        event = json.loads(raw_event)
                        event_type = event.get("type")
                        print(f"🔊 [AUDIO OUTPUT] Message {message_count}: {event_type}")

                        if event_type == "session.created":
                            session = event.get("session")
                            print(f"🔊 [AUDIO OUTPUT] Session created: {session.get('id')}")

                        elif event_type == "response.audio.delta":
                            if event.get("item_id") != last_audio_item_id:
                                last_audio_item_id = event.get("item_id")
                                print(f"🔊 [AUDIO OUTPUT] New audio item: {last_audio_item_id}")

                            bytes_data = base64.b64decode(event.get("delta", ""))
                            if bytes_data:
                                print(f"🔊 [AUDIO OUTPUT] Playing {len(bytes_data)} bytes of audio")   
                                audio_player.add_data(bytes_data)

                        elif event_type == "input_audio_buffer.speech_started":
                            print("🎤 [AUDIO OUTPUT] Speech started detected - stopping playback")
                            audio_player.stop()
                            
                        elif event_type == "input_audio_buffer.speech_stopped":
                            print("🎤 [AUDIO OUTPUT] Speech stopped detected")

                        elif event_type == "error":
                            error_details = event.get("error", {})
                            print(f"❌ [AUDIO OUTPUT] API Error: {error_details}")
                            
                        # Log any other interesting events
                        elif event_type in ["response.created", "response.done", "conversation.item.created"]:
                            print(f"📝 [AUDIO OUTPUT] Event: {event_type}")
                            
                    except json.JSONDecodeError as e:
                        print(f"❌ [AUDIO OUTPUT] JSON decode error: {e}")
                        continue

            except Exception as e:
                print(f"❌ [AUDIO OUTPUT] Error: {e}")
                import traceback
                traceback.print_exc()
            finally:
                audio_player.terminate()
                print(f"🔊 [AUDIO OUTPUT] Thread ended. Processed {message_count} messages")
        
        # 5. Start threads with enhanced functions
        print("🧵 Starting enhanced audio threads...")
        
        send_thread = threading.Thread(
            target=listen_and_send_audio_debug, 
            args=(connection,),
            name="AudioInputDebug"
        )
        receive_thread = threading.Thread(
            target=receive_audio_and_playback_debug, 
            args=(connection,),
            name="AudioOutputDebug"
        )

        send_thread.start()
        receive_thread.start()
        
        print("🎙️  FULL DEBUG voice chat is active!")
        print("💬 Speak into your microphone - you should see detailed logging")
        print("⏰ Running for 20 seconds with enhanced monitoring...")
        
        # Run for 20 seconds with detailed monitoring
        start_time = time.time()
        while time.time() - start_time < 20 and not stop_event.is_set():
            elapsed = int(time.time() - start_time)
            
            # Check thread health every 2 seconds
            if elapsed % 2 == 0:
                send_alive = send_thread.is_alive()
                receive_alive = receive_thread.is_alive()
                print(f"⏱️  [{elapsed}s] Threads - Input: {'✅' if send_alive else '❌'}, Output: {'✅' if receive_alive else '❌'}")
                
                if not send_alive or not receive_alive:
                    print("⚠️  Thread died - stopping...")
                    break
            
            time.sleep(1)
        
        # 6. Graceful shutdown
        print("🛑 Shutting down full debug session...")
        stop_event.set()
        
        send_thread.join(timeout=3)
        receive_thread.join(timeout=3)
        
        connection.close()
        print("✅ Full debug voice chat ended")
        
    except Exception as e:
        print(f"❌ Full debug error: {e}")
        import traceback
        traceback.print_exc()
        stop_event.set()
        if 'connection' in locals():
            connection.close()

print("✅ Full debug voice chat function ready!")
print("🔍 This version will show exactly what's happening with audio input/output")
print("🚀 Run: run_voice_chat_full_debug()")

✅ Full debug voice chat function ready!
🔍 This version will show exactly what's happening with audio input/output
🚀 Run: run_voice_chat_full_debug()


In [24]:
## 📋 Section 22: Environment and Permission Analysis

def analyze_environment():
    """
    Comprehensive environment analysis for audio troubleshooting.
    """
    print("🔍 ENVIRONMENT ANALYSIS")
    print("=" * 50)
    
    # 1. Python environment
    import sys
    import platform
    print(f"🐍 Python: {sys.version}")
    print(f"💻 Platform: {platform.platform()}")
    print(f"🏠 Running in: {'Jupyter' if 'jupyter' in sys.modules else 'Unknown'}")
    
    # 2. Check audio libraries
    print("\n🔊 AUDIO LIBRARIES")
    print("-" * 30)
    
    try:
        import sounddevice as sd
        print(f"✅ sounddevice: {sd.__version__}")
        print(f"📚 sounddevice backend: {sd.get_audio_backends()}")
    except Exception as e:
        print(f"❌ sounddevice error: {e}")
    
    try:
        import numpy as np
        print(f"✅ numpy: {np.__version__}")
    except Exception as e:
        print(f"❌ numpy error: {e}")
    
    # 3. Audio device information
    print("\n🎙️ AUDIO DEVICES")
    print("-" * 30)
    
    try:
        devices = sd.query_devices()
        print(f"📱 Total devices: {len(devices)}")
        
        default_input = sd.default.device[0] if sd.default.device[0] is not None else "None"
        default_output = sd.default.device[1] if sd.default.device[1] is not None else "None"
        
        print(f"🎤 Default input device: {default_input}")
        print(f"🔊 Default output device: {default_output}")
        
        # Show input devices specifically
        input_devices = [d for d in devices if d['max_input_channels'] > 0]
        print(f"\n🎤 Input devices ({len(input_devices)}):")
        for i, device in enumerate(input_devices):
            star = "⭐" if i == default_input else "  "
            print(f"{star} {i}: {device['name']} (channels: {device['max_input_channels']})")
            
    except Exception as e:
        print(f"❌ Audio device query error: {e}")
    
    # 4. Check for potential permission issues
    print("\n🔐 PERMISSION CHECK")
    print("-" * 30)
    
    try:
        # Try to open default input briefly
        with sd.InputStream(samplerate=24000, channels=1, dtype='int16') as stream:
            data, _ = stream.read(1)  # Read 1 frame
            print("✅ Microphone access test: SUCCESS")
            print(f"📊 Sample data shape: {data.shape}")
    except Exception as e:
        print(f"❌ Microphone access test: FAILED")
        print(f"   Error: {e}")
        print("   This might indicate permission issues!")
    
    # 5. Memory and threading info
    print("\n🧵 THREADING INFO")
    print("-" * 30)
    
    import threading
    active_threads = threading.active_count()
    print(f"🧵 Active threads: {active_threads}")
    
    for thread in threading.enumerate():
        print(f"   • {thread.name} ({'alive' if thread.is_alive() else 'dead'})")
    
    print("\n✅ Environment analysis complete!")

# Run the analysis
analyze_environment()

🔍 ENVIRONMENT ANALYSIS
🐍 Python: 3.11.13 | packaged by conda-forge | (main, Jun  4 2025, 14:52:34) [Clang 18.1.8 ]
💻 Platform: macOS-15.6-arm64-arm-64bit
🏠 Running in: Unknown

🔊 AUDIO LIBRARIES
------------------------------
✅ sounddevice: 0.5.2
❌ sounddevice error: module 'sounddevice' has no attribute 'get_audio_backends'
✅ numpy: 2.3.2

🎙️ AUDIO DEVICES
------------------------------
📱 Total devices: 10
🎤 Default input device: 1
🔊 Default output device: 2

🎤 Input devices (5):
   0: Jin’s AirPods Pro (channels: 1)
⭐ 1: HD Pro Webcam C920 (channels: 2)
   2: Jabra Link 380 (channels: 1)
   3: MacBook Pro Microphone (channels: 1)
   4: Microsoft Teams Audio (channels: 1)

🔐 PERMISSION CHECK
------------------------------
✅ Microphone access test: SUCCESS
📊 Sample data shape: (1, 1)

🧵 THREADING INFO
------------------------------
🧵 Active threads: 8
   • MainThread (alive)
   • IOPub (alive)
   • Heartbeat (alive)
   • Thread-1 (_watch_pipe_fd) (alive)
   • Thread-2 (_watch_pipe_fd) 