## Optimization Guidelines

### 1. LLM Optimization
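- Set `temperature` to match the use case: lower values give more deterministic replies, higher values more varied ones
- Cap `max_tokens` near the expected response length, since generation time grows with the number of tokens produced
- Consider a smaller model (for example `gpt-4o-mini` instead of `gpt-4o`) when lower latency matters more than answer quality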
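
To make the `max_tokens` tradeoff concrete, the following is a rough sketch of an upper bound on generation time: the wait for the first token plus the time to stream the remaining tokens. The helper name `estimate_llm_time` and the numbers are illustrative assumptions, not measurements from this notebook.

```python
def estimate_llm_time(ttft_s: float, tokens_per_second: float, max_tokens: int) -> float:
    """Upper bound on generation time: first-token wait plus time to stream max_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_s + max_tokens / tokens_per_second


# Illustrative inputs only; substitute the TTFT and tokens/sec you actually observe.
full = estimate_llm_time(ttft_s=0.4, tokens_per_second=50.0, max_tokens=150)
short = estimate_llm_time(ttft_s=0.4, tokens_per_second=50.0, max_tokens=60)
print(f"150 tokens: {full:.2f}s, 60 tokens: {short:.2f}s")
```

When TTS streams, the user hears audio well before generation finishes, so this bound describes how long a reply keeps the pipeline busy rather than when speech starts.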

### 2. STT Optimization
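- Enable streaming so partial transcripts arrive while the user is still speaking
- Match the audio sample rate to what the STT model expects (16 kHz is common for speech models) to avoid needless resampling
- Handle transcription errors and timeouts so one failed request does not stall the conversation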

### 3. TTS Optimization
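- Tune the ElevenLabs `stability` and `similarity_boost` voice settings for the consistency and expressiveness you need
- Enable streaming so playback can begin before synthesis finishes
- Cache audio for frequently used responses, such as greetings and fixed prompts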
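
The caching tip can be sketched as a small in-memory cache keyed by voice and normalized text. Everything here is hypothetical scaffolding: `TTSCache` and `fake_synthesize` are illustrative names, and the backend is a stand-in for a real TTS request rather than a LiveKit or ElevenLabs API.

```python
import hashlib
from typing import Callable, Dict


class TTSCache:
    """In-memory cache mapping (voice, text) to synthesized audio bytes."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize  # hypothetical backend: (voice, text) -> audio bytes
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(voice: str, text: str) -> str:
        # Assumption: case and whitespace differences do not change the desired audio.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(f"{voice}\x00{normalized}".encode("utf-8")).hexdigest()

    def get(self, voice: str, text: str) -> bytes:
        key = self._key(voice, text)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        audio = self._synthesize(voice, text)
        self._store[key] = audio
        return audio


def fake_synthesize(voice: str, text: str) -> bytes:
    # Stand-in for a real TTS request.
    return f"{voice}:{text}".encode("utf-8")


cache = TTSCache(fake_synthesize)
first = cache.get("narrator", "Hello, how can I help?")
second = cache.get("narrator", "hello,   how can I help?")  # same phrase after normalization
print(cache.hits, cache.misses)
```

A production cache would also need a size limit and an eviction policy, and it only helps for replies that repeat verbatim.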

### 4. General Tips
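- Monitor and log performance metrics for every stage so that regressions are visible
- Handle errors in each stage and fall back gracefully
- Choose the smallest model in each stage that still meets your quality bar
- Weigh latency against quality explicitly: in a voice conversation, response delay is part of the perceived quality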
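
One way to act on the monitoring tip is to keep latency samples per metric and summarize them with percentiles, since a mean hides slow outliers. This is a minimal sketch using only the standard library; `LatencyLog` and the sample values are made up for illustration.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


class LatencyLog:
    """Collects latency samples per metric name and summarizes them."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, name: str, seconds: float) -> None:
        self._samples[name].append(seconds)

    def summary(self, name: str) -> Dict[str, float]:
        values = sorted(self._samples.get(name, []))
        if not values:
            raise KeyError(f"no samples recorded for {name!r}")
        # statistics.quantiles needs at least two data points.
        if len(values) > 1:
            p95 = statistics.quantiles(values, n=20, method="inclusive")[-1]
        else:
            p95 = values[0]
        return {
            "count": float(len(values)),
            "mean": statistics.fmean(values),
            "p50": statistics.median(values),
            "p95": p95,
            "max": values[-1],
        }


log = LatencyLog()
for sample in [0.30, 0.35, 0.40, 0.45, 1.20]:  # made-up TTFT samples in seconds
    log.record("llm_ttft", sample)

stats = log.summary("llm_ttft")
print({key: round(value, 3) for key, value in stats.items()})
```

In this notebook, `record` could be called from the `on_*_metrics_collected` handlers in place of, or alongside, the `print` statements.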

# Voice Agent Latency Optimization

This notebook shows how to measure and reduce latency in a voice agent built with LiveKit Agents. It collects metrics for each stage of the voice pipeline (speech-to-text, LLM, text-to-speech, and end-of-utterance detection) and prints them as the conversation runs.

## Performance Monitoring Overview

The system tracks several key performance indicators:

1. **LLM Performance**
   - Token processing speed
   - Time to First Token (TTFT)
   - Total token usage

2. **Speech-to-Text Performance**
   - Processing duration
   - Audio duration analysis
   - Streaming efficiency

3. **Text-to-Speech Performance**
   - Time to First Byte (TTFB)
   - Audio generation speed
   - Stream processing metrics

4. **End-of-Utterance Detection**
   - Detection latency
   - Transcription delay
   - Overall response time
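
The indicators above can be combined into a few derived numbers. The sketch below assumes that the overall response time of a turn can be approximated as end-of-utterance delay plus LLM TTFT plus TTS TTFB, and that STT efficiency can be expressed as a real-time factor (processing time divided by audio length). `TurnMetrics` is a stand-in dataclass whose fields mirror the attribute names used later in this notebook; it is not a LiveKit type.

```python
from dataclasses import dataclass


@dataclass
class TurnMetrics:
    """Stand-in for one conversational turn; not a LiveKit class."""
    end_of_utterance_delay: float  # seconds from end of speech to turn completion
    ttft: float                    # LLM time to first token, seconds
    ttfb: float                    # TTS time to first byte, seconds


def total_latency(turn: TurnMetrics) -> float:
    """Approximate delay between the user finishing and the agent starting to speak."""
    return turn.end_of_utterance_delay + turn.ttft + turn.ttfb


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Values below 1.0 mean audio is processed faster than it is spoken."""
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


turn = TurnMetrics(end_of_utterance_delay=0.5, ttft=0.4, ttfb=0.3)  # illustrative values
latency = total_latency(turn)
rtf = real_time_factor(processing_s=0.6, audio_s=3.0)
print(f"total latency: {latency:.2f}s, STT real-time factor: {rtf:.2f}")
```

Breaking the total into its three terms shows which stage to optimize first.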

## Step 1: Import LiveKit Agent Modules and Plugins

In [None]:
import logging

from dotenv import load_dotenv
_ = load_dotenv(override=True)

logger = logging.getLogger("dlai-agent")
logger.setLevel(logging.INFO)

from livekit import agents
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, jupyter
from livekit.plugins import (
    openai,
    elevenlabs,
    silero,
)

from livekit.agents.metrics import LLMMetrics, STTMetrics, TTSMetrics, EOUMetrics
import asyncio

## System Configuration

The imports above bring in the modules required for metrics collection and performance monitoring:

- LiveKit Agent core modules
- Performance metrics collectors
- Voice processing plugins

In [None]:
class MetricsAgent(Agent):
    def __init__(self) -> None:
        llm = openai.LLM(model="gpt-4o")
        #llm = openai.LLM(model="gpt-4o-mini")   # Example with lower latency
        stt = openai.STT(model="whisper-1")
        tts = elevenlabs.TTS()
        silero_vad = silero.VAD.load()
        
        super().__init__(
            instructions="You are a helpful assistant communicating via voice",
            stt=stt,
            llm=llm,
            tts=tts,
            vad=silero_vad,
        )

        def llm_metrics_wrapper(metrics: LLMMetrics):
            asyncio.create_task(self.on_llm_metrics_collected(metrics))
        llm.on("metrics_collected", llm_metrics_wrapper)

        def stt_metrics_wrapper(metrics: STTMetrics):
            asyncio.create_task(self.on_stt_metrics_collected(metrics))
        stt.on("metrics_collected", stt_metrics_wrapper)

        def eou_metrics_wrapper(metrics: EOUMetrics):
            asyncio.create_task(self.on_eou_metrics_collected(metrics))
        stt.on("eou_metrics_collected", eou_metrics_wrapper)

        def tts_metrics_wrapper(metrics: TTSMetrics):
            asyncio.create_task(self.on_tts_metrics_collected(metrics))
        tts.on("metrics_collected", tts_metrics_wrapper)

    async def on_llm_metrics_collected(self, metrics: LLMMetrics) -> None:
        print("\n--- LLM Metrics ---")
        print(f"Prompt Tokens: {metrics.prompt_tokens}")
        print(f"Completion Tokens: {metrics.completion_tokens}")
        print(f"Tokens per second: {metrics.tokens_per_second:.4f}")
        print(f"TTFT: {metrics.ttft:.4f}s")
        print("------------------\n")

    async def on_stt_metrics_collected(self, metrics: STTMetrics) -> None:
        print("\n--- STT Metrics ---")
        print(f"Duration: {metrics.duration:.4f}s")
        print(f"Audio Duration: {metrics.audio_duration:.4f}s")
        print(f"Streamed: {'Yes' if metrics.streamed else 'No'}")
        print("------------------\n")

    async def on_eou_metrics_collected(self, metrics: EOUMetrics) -> None:
        print("\n--- End of Utterance Metrics ---")
        print(f"End of Utterance Delay: {metrics.end_of_utterance_delay:.4f}s")
        print(f"Transcription Delay: {metrics.transcription_delay:.4f}s")
        print("--------------------------------\n")

    async def on_tts_metrics_collected(self, metrics: TTSMetrics) -> None:
        print("\n--- TTS Metrics ---")
        print(f"TTFB: {metrics.ttfb:.4f}s")
        print(f"Duration: {metrics.duration:.4f}s")
        print(f"Audio Duration: {metrics.audio_duration:.4f}s")
        print(f"Streamed: {'Yes' if metrics.streamed else 'No'}")
        print("------------------\n")




In [None]:
async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession()

    await session.start(
        agent=MetricsAgent(),
        room=ctx.room,
    )

class MetricsAgent(Agent):
    """Agent implementation with comprehensive performance monitoring.
    
    This agent extends the base voice agent with detailed metrics collection
    and analysis capabilities for all components of the voice interaction pipeline.
    """
    
    def __init__(self) -> None:
        # Initialize core components with performance monitoring
        llm = openai.LLM(
            model="gpt-4o",
            temperature=0.7,  # Lower values are more deterministic, higher values more varied
            max_tokens=150    # Cap reply length; fewer generated tokens means less generation time
        )
        stt = openai.STT(model="whisper-1")
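        tts = elevenlabs.TTS(
            # The plugin takes voice settings as a VoiceSettings object
            voice_settings=elevenlabs.VoiceSettings(
                stability=0.5,         # Lower is more expressive, higher is more consistent
                similarity_boost=0.5,  # How closely the output follows the original voice
            )
        )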
        silero_vad = silero.VAD.load(
            activation_threshold=0.5,  # Speech probability above which audio counts as speech
            sample_rate=16000,         # Silero VAD accepts 8000 or 16000 Hz
        )
        
        super().__init__(
            instructions="You are a helpful assistant communicating via voice",
            stt=stt,
            llm=llm,
            tts=tts,
            vad=silero_vad,
        )

        # Set up metrics collectors
        self._setup_metrics_collectors(llm, stt, tts)

    def _setup_metrics_collectors(self, llm, stt, tts):
        """Configure metrics collection for all components."""
        def llm_metrics_wrapper(metrics: LLMMetrics):
            asyncio.create_task(self.on_llm_metrics_collected(metrics))
        llm.on("metrics_collected", llm_metrics_wrapper)

        def stt_metrics_wrapper(metrics: STTMetrics):
            asyncio.create_task(self.on_stt_metrics_collected(metrics))
        stt.on("metrics_collected", stt_metrics_wrapper)

        def eou_metrics_wrapper(metrics: EOUMetrics):
            asyncio.create_task(self.on_eou_metrics_collected(metrics))
        stt.on("eou_metrics_collected", eou_metrics_wrapper)

        def tts_metrics_wrapper(metrics: TTSMetrics):
            asyncio.create_task(self.on_tts_metrics_collected(metrics))
        tts.on("metrics_collected", tts_metrics_wrapper)

    async def on_llm_metrics_collected(self, metrics: LLMMetrics) -> None:
        """Process and display LLM performance metrics."""
        print("\n=== LLM Performance Metrics ===")
        print(f"Prompt Tokens: {metrics.prompt_tokens}")
        print(f"Completion Tokens: {metrics.completion_tokens}")
        print(f"Processing Speed: {metrics.tokens_per_second:.2f} tokens/sec")
        print(f"Time to First Token: {metrics.ttft:.3f}s")
        print("============================\n")

    async def on_stt_metrics_collected(self, metrics: STTMetrics) -> None:
        """Process and display Speech-to-Text metrics."""
        print("\n=== STT Performance Metrics ====")
        print(f"Processing Time: {metrics.duration:.3f}s")
        print(f"Audio Length: {metrics.audio_duration:.3f}s")
        print(f"Streaming: {'Enabled' if metrics.streamed else 'Disabled'}")
        print("==============================\n")

    async def on_eou_metrics_collected(self, metrics: EOUMetrics) -> None:
        """Process and display End-of-Utterance detection metrics."""
        print("\n=== Utterance Detection Metrics ====")
        print(f"Detection Delay: {metrics.end_of_utterance_delay:.3f}s")
        print(f"Transcription Delay: {metrics.transcription_delay:.3f}s")
        print("=================================\n")

    async def on_tts_metrics_collected(self, metrics: TTSMetrics) -> None:
        """Process and display Text-to-Speech metrics."""
        print("\n=== TTS Performance Metrics ====")
        print(f"Time to First Byte: {metrics.ttfb:.3f}s")
        print(f"Generation Time: {metrics.duration:.3f}s")
        print(f"Audio Duration: {metrics.audio_duration:.3f}s")
        print(f"Streaming: {'Enabled' if metrics.streamed else 'Disabled'}")
        print("==============================\n")
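
The metrics wrappers in `MetricsAgent` call `asyncio.create_task` without keeping the returned task. The Python documentation recommends holding a reference to such tasks, because the event loop keeps only weak references and an unreferenced task can be garbage-collected before it finishes. A minimal sketch of that pattern follows; `spawn` and `report` are hypothetical helper names, not part of LiveKit.

```python
import asyncio
from typing import Any, Coroutine, List, Set, Tuple

# Strong references keep pending tasks alive until they complete.
_background_tasks: Set[asyncio.Task] = set()


def spawn(coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
    """Schedule a coroutine and hold a reference to its task until it is done."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def report(name: str, value: float, sink: List[Tuple[str, float]]) -> None:
    # Stand-in for a metrics handler such as on_llm_metrics_collected.
    sink.append((name, value))


async def main() -> List[Tuple[str, float]]:
    sink: List[Tuple[str, float]] = []
    spawn(report("ttft", 0.42, sink))
    spawn(report("ttfb", 0.31, sink))
    await asyncio.gather(*list(_background_tasks))
    return sink


results = asyncio.run(main())
print(results)
```

Inside a notebook, where an event loop is already running, use `await main()` instead of `asyncio.run(main())`. In the agent, each wrapper could call `spawn(self.on_llm_metrics_collected(metrics))` in place of `asyncio.create_task(...)`.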


- To speak to the agent, click the microphone icon on the left to unmute it. You can ignore the 'Start Audio' button.
- The agent will try to detect the language you are speaking. To help it, start by speaking a long phrase like "hello, how are you today" in the language of your choice.

In [None]:
jupyter.run_app(WorkerOptions(entrypoint_fnc=entrypoint), jupyter_url="https://jupyter-api-livekit.vercel.app/api/join-token")