Crashout Buddy

A real-time, emotionally aware voice agent demo built on Vision Agents and Stream Video. The agent watches your face on the video track, derives an emotion/gaze/engagement state with MediaPipe, and steers Inworld TTS v2 delivery to match — whispering when you look sad, getting animated when you're engaged.

The backend is a Python Vision Agents service (Inworld TTS + Gemini + Deepgram + MediaPipe + Anam avatar). The frontend is a Next.js call experience that joins the Stream call, renders the avatar, and shows captions and metrics.

Quick Links

Stream Video — Register — API key + secret for the call edge
Inworld AI — TTS v2 API key (basic-auth base64 token from the console)
Deepgram — STT API key
Google AI Studio — Gemini API key
Anam — avatar API key and avatar ID
Vision Agents — the underlying agent framework
backend/README.md — deep dive on the face processor and TTS steering

Architecture

Browser (Next.js)  ──►  Stream Edge  ◄──  Backend (Vision Agents, Python)
       ▲                                         │
       │                                         ├── Deepgram (STT)
       │                                         ├── Gemini (LLM)
       │                                         ├── Inworld (TTS v2)
       │                                         ├── MediaPipe (face state)
       └──────────  Anam (avatar video)  ◄───────┘

The frontend hits the backend's HTTP API to create and close agent sessions, then joins the same Stream call as the agent. The backend runs the STT → LLM → TTS pipeline and publishes the agent's audio + Anam avatar video into the call. A MediaPipeFaceProcessor consumes the user's video track at 8 fps and emits smoothed emotion/gaze/engagement state that gets prepended to each LLM turn so the model can pick appropriate Inworld steering tags.
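The smoothing and prompt-prepending step described above can be sketched in plain Python. This is an illustration only, not the project's actual MediaPipeFaceProcessor: the `FaceObservation` fields, the two-second window, the majority-vote smoothing, and the `[face_state ...]` header format are all assumptions made for the example.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaceObservation:
    """One per-frame reading (hypothetical shape, not the project's real type)."""
    emotion: str          # e.g. "sad", "neutral", "happy"
    gaze_on_screen: bool  # True if the user is looking at the screen
    engagement: float     # 0.0 (disengaged) to 1.0 (fully engaged)


class FaceStateSmoother:
    """Keeps a sliding window of recent frames and reports a smoothed state."""

    def __init__(self, fps: int = 8, window_seconds: float = 2.0) -> None:
        # At 8 fps, a 2-second window holds the 16 most recent observations.
        self._window = deque(maxlen=max(1, int(fps * window_seconds)))

    def add(self, obs: FaceObservation) -> None:
        self._window.append(obs)

    def summary(self) -> Optional[dict]:
        """Return the smoothed state, or None if no frames have arrived yet."""
        if not self._window:
            return None
        n = len(self._window)
        # Majority vote for the emotion label; simple mean for engagement.
        emotion = Counter(o.emotion for o in self._window).most_common(1)[0][0]
        on_screen = sum(o.gaze_on_screen for o in self._window)
        mean_engagement = sum(o.engagement for o in self._window) / n
        return {
            "emotion": emotion,
            "gaze": "on_screen" if on_screen * 2 >= n else "away",
            "engagement": round(mean_engagement, 2),
        }


def with_face_state(user_text: str, state: Optional[dict]) -> str:
    """Prepend the smoothed face state to the text sent to the LLM."""
    if state is None:
        return user_text
    header = "[face_state emotion={emotion} gaze={gaze} engagement={engagement}]"
    return header.format(**state) + "\n" + user_text
```

Smoothing over a window rather than forwarding every frame keeps a single blink or misdetection from flipping the state the model sees between turns.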

Repository Layout

backend/   Python Vision Agents service
frontend/  Next.js demo app

Getting Started

Backend

cd backend
cp .env.example .env
uv sync
uv run python scripts/download_face_model.py
uv run python main.py serve --host 127.0.0.1 --port 8000

Frontend

In another terminal:

cd frontend
cp .env.example .env.local
npm install
npm run dev

Fill both env files with the keys from the providers listed above (Stream credentials must match across the two files). Open http://localhost:3000. If you run the backend on a different host or port, set NEXT_PUBLIC_BASE_URL in frontend/.env.local.

For a backend-only smoke test that opens a Stream demo room directly:

cd backend
uv run python main.py run

Required Accounts

You need an account and API key from every provider below before the demo will run end-to-end.

| Provider | Used for | Sign up | Env vars |
| --- | --- | --- | --- |
| Stream Video | Call edge (WebRTC), session tokens | getstream.io/video | `STREAM_API_KEY`, `STREAM_API_SECRET`, `NEXT_PUBLIC_STREAM_API_KEY` |
| Inworld AI | TTS v2 with inline steering tags | inworld.ai | `INWORLD_API_KEY` |
| Deepgram | Speech-to-text | console.deepgram.com | `DEEPGRAM_API_KEY` |
| Google AI Studio | Gemini LLM | aistudio.google.com | `GOOGLE_API_KEY` |
| Anam | Lip-synced avatar video | anam.ai | `ANAM_API_KEY`, `ANAM_AVATAR_ID` |

The frontend additionally needs a user JWT (NEXT_PUBLIC_STREAM_TOKEN) and user ID (NEXT_PUBLIC_STREAM_USER_ID). See frontend/.env.example for the full list.
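Before starting either service, it can help to confirm that both env files are filled in and that the Stream key matches across them. The script below is a minimal sketch of such a check. The split of variables between backend/.env and frontend/.env.local is an assumption inferred from the `NEXT_PUBLIC_` prefix, and the parser handles only simple `KEY=value` lines; the two `.env.example` files remain the authoritative lists.

```python
from __future__ import annotations

from pathlib import Path

# Assumed split of variables between the two files (see the .env.example files).
BACKEND_KEYS = [
    "STREAM_API_KEY", "STREAM_API_SECRET", "INWORLD_API_KEY",
    "DEEPGRAM_API_KEY", "GOOGLE_API_KEY", "ANAM_API_KEY", "ANAM_AVATAR_ID",
]
FRONTEND_KEYS = [
    "NEXT_PUBLIC_STREAM_API_KEY", "NEXT_PUBLIC_STREAM_TOKEN",
    "NEXT_PUBLIC_STREAM_USER_ID",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip("'\"")
    return env


def missing(env: dict[str, str], keys: list[str]) -> list[str]:
    """Return the keys that are absent or set to an empty value."""
    return [k for k in keys if not env.get(k)]


def find_problems(backend_env: dict[str, str], frontend_env: dict[str, str]) -> list[str]:
    """List unset variables and report a Stream key mismatch, if any."""
    problems = [f"backend/.env: {k} is not set" for k in missing(backend_env, BACKEND_KEYS)]
    problems += [f"frontend/.env.local: {k} is not set" for k in missing(frontend_env, FRONTEND_KEYS)]
    backend_key = backend_env.get("STREAM_API_KEY")
    frontend_key = frontend_env.get("NEXT_PUBLIC_STREAM_API_KEY")
    if backend_key and frontend_key and backend_key != frontend_key:
        problems.append("Stream API key differs between backend/.env and frontend/.env.local")
    return problems


def read_env_file(path: Path) -> dict[str, str]:
    """Read an env file; a missing file is treated as empty."""
    return parse_env(path.read_text()) if path.exists() else {}


if __name__ == "__main__":
    # Run from the repository root.
    found = find_problems(
        read_env_file(Path("backend/.env")),
        read_env_file(Path("frontend/.env.local")),
    )
    print("\n".join(found) if found else "All expected variables are set.")
```

The check only confirms that values are present and that the two Stream keys agree; it does not verify that any key is valid with its provider.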

Free for Makers

Stream is free for most side and hobby projects. To qualify, your project or company must have fewer than 5 team members and less than $10k in monthly revenue. For complete pricing details, visit the Video Pricing Page.

License

MIT — see LICENSE.
