# Digital Human Pipelines with Pipecat and ACE Controller

This course is built around nvidia-pipecat, NVIDIA's open-source library that extends the Pipecat framework, and the ACE Controller microservice. It provides an end-to-end guide to building intelligent, interactive digital human pipelines.

### Learning Objectives
- Understand the core concepts behind the nvidia-pipecat framework and ACE Controller microservice.
- Explore the applications they support (voice assistants, avatars, and agents).

---

## 1: Introduction to the Pipecat Framework
### Overview
[Pipecat](https://github.com/pipecat-ai/pipecat) is an open-source Python framework for building real-time, multimodal AI applications. It streamlines the development of pipelines that orchestrate complex interactions across AI services, network transport, audio processing, and multimodal user interfaces.

### Core Terminology
| Term               | Definition                                                                 |
|--------------------|---------------------------------------------------------------------------|
| `Frame`            | Discrete unit of data (text/audio/image) or control signals.              |
| `FrameProcessor`   | Operates on frames (e.g., STT, TTS, LLM, filters).                         |
| `Pipeline`         | Sequence of linked FrameProcessors.                                       |
| `Transport`        | Input/output interface for audio and text streams.                        |
| `AI Services`      | External service processors (STT, TTS, LLM, etc.).                        |

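To make the terms in the table concrete, here is a minimal toy model written in plain Python. It is a sketch for intuition only and does not use Pipecat itself: the class names mirror the terminology, but the synchronous `process` and `run` methods and the `Uppercase` and `DropEmpty` processors are hypothetical simplifications (real Pipecat processors are asynchronous).

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Toy stand-in for a frame: a discrete unit of data or a control signal."""


@dataclass
class TextFrame(Frame):
    text: str = ""


@dataclass
class EndFrame(Frame):
    """Control signal marking the end of the stream."""


class FrameProcessor:
    """Receives one frame and returns zero or more frames for the next stage."""

    def process(self, frame: Frame) -> list:
        return [frame]  # default: pass the frame through unchanged


class DropEmpty(FrameProcessor):
    """Hypothetical filter that discards whitespace-only text frames."""

    def process(self, frame: Frame) -> list:
        if isinstance(frame, TextFrame) and not frame.text.strip():
            return []
        return [frame]


class Uppercase(FrameProcessor):
    """Hypothetical transform that upper-cases text frames."""

    def process(self, frame: Frame) -> list:
        if isinstance(frame, TextFrame):
            return [TextFrame(frame.text.upper())]
        return [frame]  # control frames pass through untouched


class Pipeline:
    """A sequence of linked processors; each stage feeds the next."""

    def __init__(self, processors):
        self.processors = processors

    def run(self, frames):
        for processor in self.processors:
            next_frames = []
            for frame in frames:
                next_frames.extend(processor.process(frame))
            frames = next_frames
        return frames


pipeline = Pipeline([DropEmpty(), Uppercase()])
result = pipeline.run([TextFrame("hello"), TextFrame("  "), EndFrame()])
print(result)  # [TextFrame(text='HELLO'), EndFrame()]
```

The point of the design is that every stage sees the same kind of object, so data frames and control frames travel through one ordered chain and any processor can transform, drop, or forward them.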

The [nvidia-pipecat](https://github.com/NVIDIA/ace-controller/tree/main) library builds on the Pipecat framework by adding a suite of NVIDIA-developed frame processors, multimodal data types, and services tailored for creating intelligent avatar-based interactions. This includes integration with powerful NVIDIA technologies such as Riva (for ASR and TTS), Audio2Face (for real-time facial animation), and Foundational RAG (for retrieval-augmented generation).

In addition to connecting these services, nvidia-pipecat improves the end-user experience through new processors that support speculative speech processing, which lets conversational agents respond more quickly by acting on stable interim speech results instead of waiting for the final transcript.

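The idea behind speculative speech processing can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the nvidia-pipecat implementation: the `Transcript` and `SpeculativeResponder` classes, the `stable` and `final` flags, and the `generate` callable are all hypothetical. The responder starts generating as soon as an interim transcript is marked stable, and reuses that response if the final transcript turns out to match.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    stable: bool = False  # interim result judged unlikely to change
    final: bool = False   # end-of-utterance result


class SpeculativeResponder:
    def __init__(self, generate):
        self.generate = generate  # hypothetical callable: text -> response
        self.speculated_text = None
        self.speculated_response = None
        self.calls_after_final = 0  # how often speculation did not help

    def on_transcript(self, t: Transcript):
        if t.final:
            if t.text == self.speculated_text:
                # Speculation paid off: the response is already available.
                response = self.speculated_response
            else:
                # The utterance changed after the stable interim; regenerate.
                self.calls_after_final += 1
                response = self.generate(t.text)
            self.speculated_text = self.speculated_response = None
            return response
        if t.stable and t.text != self.speculated_text:
            # Start work early, but hold the result until the final arrives.
            self.speculated_text = t.text
            self.speculated_response = self.generate(t.text)
        return None


responder = SpeculativeResponder(lambda text: f"echo: {text}")
events = [
    Transcript("what is"),
    Transcript("what is the weather", stable=True),
    Transcript("what is the weather", final=True),
]
replies = [responder.on_transcript(e) for e in events]
print(replies[-1], responder.calls_after_final)  # echo: what is the weather 0
```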
While Pipecat and nvidia-pipecat give you the building blocks for creating multimodal AI agents, ACE Controller is the orchestration layer that makes these pipelines production-ready.

[ACE Controller](https://docs.nvidia.com/ace/ace-controller-microservice/1.0/index.html#ace-controller-microservice) wraps the Pipecat ecosystem in a scalable FastAPI microservice. It lets developers deploy voice-enabled digital humans (and other agents) that can handle multiple users, support RTSP audio/video input, and connect to NVIDIA ACE microservices such as:
- Riva (speech recognition and synthesis)
- Audio2Face (real-time facial animation)
- Animation Graph, Video Storage Toolkit (VST), and more
    
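One way to picture multi-user handling is a registry that runs one pipeline task per connected stream. The sketch below uses only Python's standard `asyncio` module and assumes each user stream gets its own pipeline instance. `StreamRegistry`, `fake_pipeline`, and the stream IDs are hypothetical names for illustration and are not part of the ACE Controller API.

```python
import asyncio


class StreamRegistry:
    """Hypothetical helper: one pipeline task per connected stream."""

    def __init__(self, pipeline_factory):
        self._factory = pipeline_factory  # coroutine function: stream_id -> None
        self._tasks = {}

    def add_stream(self, stream_id: str) -> None:
        if stream_id in self._tasks:
            raise ValueError(f"stream {stream_id!r} is already active")
        self._tasks[stream_id] = asyncio.create_task(self._factory(stream_id))

    async def remove_stream(self, stream_id: str) -> None:
        task = self._tasks.pop(stream_id)
        task.cancel()
        try:
            await task  # wait for the pipeline to finish shutting down
        except asyncio.CancelledError:
            pass

    @property
    def active(self) -> list:
        return sorted(self._tasks)


async def fake_pipeline(stream_id: str) -> None:
    # Placeholder for a real per-user pipeline; it idles until cancelled.
    while True:
        await asyncio.sleep(0.01)


async def demo() -> list:
    registry = StreamRegistry(fake_pipeline)
    registry.add_stream("user-a")
    registry.add_stream("user-b")
    before = registry.active
    await registry.remove_stream("user-a")
    return [before, registry.active]


snapshots = asyncio.run(demo())
print(snapshots)  # [['user-a', 'user-b'], ['user-b']]
```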
---
### Example Pipeline Layout

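```python
from pipecat.pipeline.pipeline import Pipeline

# transport, stt, llm, and tts are assumed to be configured elsewhere
# (a transport plus STT, LLM, and TTS service processors).
pipeline = Pipeline([
    transport.input(),   # receive user audio from the transport
    stt,                 # speech-to-text
    llm,                 # generate a text response
    tts,                 # text-to-speech
    transport.output(),  # send synthesized audio back to the user
])
```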