# Voice Agent Components

This notebook builds a voice-enabled AI agent with LiveKit Agents. The agent chains speech recognition, a language model, and voice synthesis to hold natural, interactive spoken conversations.

## Setup and Configuration

Import the LiveKit Agents modules and plugins needed for voice interaction:
- OpenAI for LLM and Speech-to-Text
- ElevenLabs for Text-to-Speech
- Silero for Voice Activity Detection

In [None]:
import logging

from dotenv import load_dotenv
_ = load_dotenv(override=True)

logger = logging.getLogger("dlai-agent")
logger.setLevel(logging.INFO)

from livekit import agents
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions, jupyter
from livekit.plugins import (
    openai,
    elevenlabs,
    silero,
)
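Because `load_dotenv` fails silently when a key is absent, a quick check before starting the agent can save a confusing error later. The sketch below is one way to do that. The helper `missing_env_vars` is hypothetical, and the variable names `OPENAI_API_KEY` and `ELEVEN_API_KEY` are assumptions about what the OpenAI and ElevenLabs plugins read; confirm them against the plugin versions you have installed.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names (not stated in this notebook); adjust to your setup.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "ELEVEN_API_KEY")


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.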

## Step 2: Define Your Custom Agent

### Voice Assistant Implementation

The Assistant class implements a voice-enabled AI agent with the following components:

- **Speech-to-Text (STT)**: OpenAI Whisper for accurate transcription
- **Language Model (LLM)**: GPT-4o for natural language understanding
- **Text-to-Speech (TTS)**: ElevenLabs for high-quality voice synthesis
- **Voice Activity Detection**: Silero VAD for precise utterance detection

The agent can be customized with different voices and personalities through configuration.

In [None]:
from typing import Optional


class Assistant(Agent):
    def __init__(
        self,
        voice_id: Optional[str] = None,
        instructions: Optional[str] = None,
    ) -> None:
        """Initialize the voice assistant with configurable voice and instructions.
        
        Args:
            voice_id (str, optional): ElevenLabs voice ID for synthesis
            instructions (str, optional): Custom instructions for the agent
        """
        llm = openai.LLM(model="gpt-4o")
        stt = openai.STT(model="whisper-1")
        tts = elevenlabs.TTS(voice_id=voice_id) if voice_id else elevenlabs.TTS()
        silero_vad = silero.VAD.load()

        super().__init__(
            instructions=instructions or """
                You are a helpful and professional voice assistant.
                Communicate clearly and naturally, maintaining a friendly tone.
                Provide concise yet informative responses.
                When appropriate, ask clarifying questions.
            """,
            stt=stt,
            llm=llm,
            tts=tts,
            vad=silero_vad,
        )
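The constructor's two arguments, `voice_id` and `instructions`, are what make the agent configurable. One possible way to keep a voice and a personality together is a small config object, sketched below. `AssistantConfig` and the tutor persona are hypothetical illustrations, not part of LiveKit or this notebook; the voice ID is the "Sarah" entry from the voice table later in the notebook.

```python
from dataclasses import dataclass
from textwrap import dedent
from typing import Optional


@dataclass(frozen=True)
class AssistantConfig:
    """Hypothetical bundle of the two arguments accepted by Assistant."""

    voice_id: Optional[str] = None
    instructions: Optional[str] = None

    def as_kwargs(self) -> dict:
        """Return keyword arguments suitable for Assistant(**kwargs)."""
        cleaned = dedent(self.instructions).strip() if self.instructions else None
        return {"voice_id": self.voice_id, "instructions": cleaned}


# Illustrative persona; the wording is an example only.
concise_tutor = AssistantConfig(
    voice_id="EXAVITQu4vr4xnSDxMaL",  # "Sarah"
    instructions="""
        You are a patient tutor.
        Keep every answer under three sentences.
    """,
)

kwargs = concise_tutor.as_kwargs()
# In the notebook this would be passed on as: agent = Assistant(**kwargs)
```

Leaving either field as `None` falls back to the defaults defined in `Assistant.__init__`.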

## Step 3: Create the Entrypoint

In [None]:
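async def entrypoint(ctx: JobContext):
    # Connect to the LiveKit room for this job.
    await ctx.connect()

    # STT, LLM, TTS, and VAD are configured on the Assistant itself,
    # so the session is created without extra arguments.
    session = AgentSession()

    # Attach the agent to the room and start the conversation.
    await session.start(
        room=ctx.room,
        agent=Assistant(),
    )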

## Usage Instructions

1. **Starting the Voice Interaction**:
   - Click the microphone icon on the left to enable voice input
   - The 'Start Audio' button can be ignored

2. **Language Detection**:
   - The agent automatically detects the language you speak
   - For optimal detection, begin with a complete sentence
   - Example: "Hello, how are you today?"

3. **Voice Customization**:
   - Different voice personalities are available through ElevenLabs
   - See the voice options section below for available voices

4. **Best Practices**:
   - Speak clearly and at a natural pace
   - Wait for the agent to complete its response before speaking
   - Keep background noise to a minimum

In [None]:
jupyter.run_app(
    WorkerOptions(entrypoint_fnc=entrypoint), 
    jupyter_url="https://jupyter-api-livekit.vercel.app/api/join-token"
)

## Step 5: Try new voices
Update Step 2 with one of the voice IDs listed below. For example:  
`tts = elevenlabs.TTS(voice_id="CwhRBWXzGAHq8TQ4Fs17")`

In [None]:
# Available ElevenLabs Voices
VOICE_OPTIONS = {
    "Roger": "CwhRBWXzGAHq8TQ4Fs17",  # Professional male voice
    "Sarah": "EXAVITQu4vr4xnSDxMaL",  # Natural female voice
    "Laura": "FGY2WhTYpPnrIDTdsKH5",  # Warm female voice
    "George": "JBFqnCBsd6RMkjVDRZzb"   # Authoritative male voice
}
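Copying raw IDs by hand is error-prone, so it can help to look voices up by name. The sketch below repeats the table above so that it runs on its own; `resolve_voice_id` is a hypothetical helper, not an ElevenLabs or LiveKit function.

```python
from typing import Dict

# Same table as in the cell above, repeated so this sketch is self-contained.
VOICE_OPTIONS: Dict[str, str] = {
    "Roger": "CwhRBWXzGAHq8TQ4Fs17",
    "Sarah": "EXAVITQu4vr4xnSDxMaL",
    "Laura": "FGY2WhTYpPnrIDTdsKH5",
    "George": "JBFqnCBsd6RMkjVDRZzb",
}


def resolve_voice_id(name: str, voices: Dict[str, str] = VOICE_OPTIONS) -> str:
    """Look up a voice ID by name, ignoring case and surrounding whitespace."""
    lookup = {key.lower(): value for key, value in voices.items()}
    try:
        return lookup[name.strip().lower()]
    except KeyError:
        available = ", ".join(sorted(voices))
        raise ValueError(
            f"Unknown voice {name!r}; choose one of: {available}"
        ) from None


voice_id = resolve_voice_id("george")
# In the notebook this would be used as: agent = Assistant(voice_id=voice_id)
```

An unknown name raises a `ValueError` that lists the valid choices, which surfaces typos before the agent starts.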

## Experiment with ElevenLabs
For more information about using ElevenLabs in your voice projects, see their [website](https://elevenlabs.io/conversational-ai).

## Additional Resources

### Voice AI Integration
- [ElevenLabs Documentation](https://elevenlabs.io/docs)
- [OpenAI Whisper Guide](https://platform.openai.com/docs/guides/speech-to-text)
- [Silero VAD Documentation](https://github.com/snakers4/silero-vad)

### Best Practices
- For optimal voice interaction, refer to the [ElevenLabs Best Practices](https://elevenlabs.io/docs/best-practices)
- For production deployment considerations, see [LiveKit Integration Guide](https://docs.livekit.io/)

### Performance Optimization
- For latency optimization techniques, refer to the `optimizing_latency.ipynb` notebook
