# Voice AI Agents: Conversational AI Framework for the Enterprise
In this notebook, we walk through how to build and deploy a voice AI bot using Pipecat AI. We illustrate the basic Pipecat flow with the `meta/llama-3.1-70b-instruct` LLM and Riva for STT (Speech-To-Text) and TTS (Text-To-Speech). However, Pipecat is not opinionated: other models and STT/TTS services can be swapped in. See [Pipecat documentation](https://docs.pipecat.ai/server/services/supported-services#supported-services) for other supported services.

Pipecat AI is an open-source framework for building voice and multimodal conversational agents. It simplifies the complex voice-to-voice AI pipeline and lets developers build AI capabilities with open-source, commercial, and custom models. See [Pipecat's Core Concepts](https://docs.pipecat.ai/getting-started/core-concepts) for a deep dive into how it works.

The framework was developed by Daily, a company that has provided real-time video and audio communication infrastructure since 2016. It is fully vendor-neutral and is not tightly coupled to Daily's infrastructure.

> ## 🤖🎧 Use headphones for this demo! 🎧🤖

## Step 1 - Install dependencies
First, we set up our environment.

We install the Daily and local audio transports (this notebook uses the local one), OpenAI for context aggregation, Riva for STT & TTS, and Silero for VAD (Voice Activity Detection). If using different services, for example Cartesia for TTS, one would run `pip install "pipecat-ai[cartesia]"`.

> [Development note]: We're installing from the GitHub main branch here to ensure we have the latest improvements. By the time we address feedback, a new Pipecat release should be available, and we will install only the Pipecat extras we use (see the commented-out command below).

In [None]:
!pip install python-dotenv
%load_ext dotenv
%dotenv

!pip install "git+https://github.com/pipecat-ai/pipecat.git@main"
# !pip install "pipecat-ai[daily,local,openai,riva,silero]"
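
After installing, it can help to confirm that the packages are importable before moving on. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED` list are illustrative assumptions, not part of Pipecat.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed top-level module names for the packages installed above:
# `pipecat` (from pipecat-ai) and `dotenv` (from python-dotenv).
REQUIRED = ["pipecat", "dotenv"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("Missing modules:", missing or "none")
```

If anything is reported as missing, rerunning the install cell (and restarting the kernel if needed) is usually the first thing to try.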

## Step 2 - Configure local audio transport for WebRTC communication
- Enable audio input (microphone capture) and audio output (text-to-speech playback), and enable VAD

In [None]:
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.transports.base_transport import TransportParams
from pipecat.transports.local.audio import LocalAudioTransport

transport = LocalAudioTransport(
    TransportParams(
        audio_out_enabled=True,
        audio_in_enabled=True,
        vad_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),
        vad_audio_passthrough=True,
        audio_out_is_live=True,
    )
)
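
To give a feel for what voice activity detection does, here is a deliberately simplified sketch. It is not how Silero works (Silero VAD is a neural model); it swaps in a basic energy threshold purely for illustration, and the function names and the `threshold` value are assumptions.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(frames, threshold=0.1):
    """Mark each frame as active (True) or silent (False) by its RMS energy."""
    return [rms(frame) >= threshold for frame in frames]


# Synthetic input: 10 ms frames at 16 kHz (160 samples each).
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(160)]

flags = detect_speech([silence, tone, silence])
print(flags)
```

In the real pipeline, this kind of per-frame decision is what lets the bot tell when the user has started or stopped speaking.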

## Step 3 - Initialize LLM, STT, and TTS services
We can customize options, for example by passing a different LLM `model` or a different `voice_id` for FastPitch TTS.

In [None]:
import os
from pipecat.services.nim import NimLLMService
from pipecat.services.riva import FastPitchTTSService, ParakeetSTTService

stt = ParakeetSTTService(api_key=os.getenv("NVIDIA_API_KEY"))

llm = NimLLMService(
    api_key=os.getenv("NVIDIA_API_KEY"), model="meta/llama-3.1-70b-instruct"
)

tts = FastPitchTTSService(api_key=os.getenv("NVIDIA_API_KEY"))
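
Note that `os.getenv` returns `None` when a variable is unset, so a missing `NVIDIA_API_KEY` would be passed to the services silently. A small guard like the one below can surface the problem earlier; `require_env` is a hypothetical helper written for this notebook, not a Pipecat function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it "
            "before starting the notebook."
        )
    return value
```

With this in place, `api_key=require_env("NVIDIA_API_KEY")` could be used instead of `api_key=os.getenv("NVIDIA_API_KEY")` in the cell above.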

## Step 4 - Define prompt and initialize context aggregator
Edit the prompt as desired.

In [None]:
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

messages = [
    {
        "role": "system",
        "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be converted to audio so don't include special characters in your answers. Respond to what the user said in a creative and helpful way that makes a cat pun if it is possible.",
    },
]

context = OpenAILLMContext(messages)
context_aggregator = llm.create_context_aggregator(context)
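
Conceptually, the context aggregator keeps this message list up to date as the conversation proceeds: user speech is added as `user` messages and the bot's replies as `assistant` messages. The sketch below mimics that bookkeeping with plain Python dictionaries; it is an illustration of the message format only, not Pipecat's implementation, and `add_turn` is a hypothetical helper.

```python
import copy


def add_turn(history, user_text, assistant_text):
    """Return a new history with one user turn and one assistant turn appended."""
    updated = copy.deepcopy(history)
    updated.append({"role": "user", "content": user_text})
    updated.append({"role": "assistant", "content": assistant_text})
    return updated


# Start from a system prompt, then record one exchange.
history = [{"role": "system", "content": "You are a helpful LLM in a WebRTC call."}]
history = add_turn(history, "Hello!", "Hello there, I am feline great today.")

roles = [message["role"] for message in history]
print(roles)
```

Each new LLM request then sees the whole list, which is how the bot can refer back to earlier turns.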

## Step 5 - Create pipeline
Here we arrange the services into a pipeline that converts speech to text, sends the text to the LLM, and then converts the LLM's response text to speech.

In [None]:
from pipecat.pipeline.pipeline import Pipeline

pipeline = Pipeline(
    [
        transport.input(),  # Transport user input
        stt,  # STT
        context_aggregator.user(),  # User responses
        llm,  # LLM
        tts,  # TTS
        transport.output(),  # Transport bot output
        context_aggregator.assistant(),  # Assistant spoken responses
    ]
)
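
The order of the list matters: each element receives what the previous one produced. As a rough mental model only, the toy below chains three stand-in stages with `asyncio`. All function names here are hypothetical, and the real Pipecat pipeline streams frames between processors rather than passing a single value from stage to stage.

```python
import asyncio


async def fake_stt(audio: str) -> str:
    """Stand-in for speech-to-text: treat the input as an untrimmed transcript."""
    return audio.strip()


async def fake_llm(text: str) -> str:
    """Stand-in for the LLM: produce a canned reply from the transcript."""
    return f"You said: {text}"


async def fake_tts(text: str) -> str:
    """Stand-in for text-to-speech: wrap the reply in an audio marker."""
    return f"<audio:{text}>"


async def run_stages(stages, data):
    """Feed `data` through each stage in order, as the pipeline list does."""
    for stage in stages:
        data = await stage(data)
    return data


async def demo() -> str:
    return await run_stages([fake_stt, fake_llm, fake_tts], "  hello  ")

# In a notebook: `await demo()`; in a script: `asyncio.run(demo())`.
```

Reordering the stages (for example, placing `fake_tts` before `fake_llm`) would change the result, which is why the services above are listed in STT, LLM, TTS order.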

## Step 6 - Create PipelineTask

In [None]:
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(pipeline, PipelineParams(allow_interruptions=True))

## Step 7 - Create a pipeline runner
The runner executes the pipeline task and manages its lifecycle.

In [None]:
from pipecat.pipeline.runner import PipelineRunner

runner = PipelineRunner()

## Step 8 - Run the bot and say "hello"!

The first time you run the bot, it will load the weights for a voice activity detection (VAD) model into the local Python process. This takes about 10-15 seconds.  
The bot will wait for you to speak first.  

> ### 🎧 Remember to use headphones!

In [None]:
await runner.run(task)

## Step 9 - Stop the bot

In [None]:
await runner.cancel()
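
Because `await runner.run(task)` in Step 8 does not return until the pipeline finishes, a later cell normally cannot execute while the bot is still running. One possible pattern is to start the run as a background task and cancel it afterwards. The sketch below shows that pattern with a hypothetical `FakeRunner` stand-in so it can run without Pipecat; it only assumes the `run` and `cancel` coroutine names used above.

```python
import asyncio


class FakeRunner:
    """Hypothetical stand-in exposing `run` and `cancel` coroutines."""

    def __init__(self):
        self._stop = asyncio.Event()

    async def run(self) -> str:
        # Block until cancel() is called, like a long-running bot session.
        await self._stop.wait()
        return "stopped"

    async def cancel(self) -> None:
        self._stop.set()


async def demo() -> str:
    runner = FakeRunner()
    handle = asyncio.create_task(runner.run())  # start without blocking
    await asyncio.sleep(0.05)                   # the bot would be talking here
    await runner.cancel()                       # request shutdown
    return await handle                         # wait for the run to finish

# In a notebook: `await demo()`; in a script: `asyncio.run(demo())`.
```

Applied to this notebook, that would mean replacing the Step 8 call with something like `run_handle = asyncio.create_task(runner.run(task))`, which is an untested assumption about how the real runner behaves when scheduled this way.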