
GetStream/crashout-buddy


Crashout Buddy

A real-time, emotionally aware voice agent demo built on Vision Agents and Stream Video. The agent watches your face on the video track, derives an emotion/gaze/engagement state with MediaPipe, and steers Inworld TTS v2 delivery to match — whispering when you look sad, getting animated when you're engaged.
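MediaPipe's Face Landmarker can return ARKit-style blendshape scores (for example `mouthSmileLeft` or `browInnerUp`) when face blendshape output is enabled. This README doesn't describe how the project turns those signals into an emotion. The following is only a toy heuristic, with made-up weights and threshold, that shows how such scores could be collapsed into a coarse label of the kind that drives whisper-versus-animated delivery. `coarse_emotion` is a hypothetical name, not part of the backend.

```python
from typing import Mapping


def coarse_emotion(blendshapes: Mapping[str, float], threshold: float = 0.4) -> str:
    """Map blendshape scores in [0, 1] to 'happy', 'sad', or 'neutral'.

    Hypothetical heuristic: the weights and threshold are illustrative assumptions,
    not the crashout-buddy implementation.
    """
    # Average the left/right halves so a lopsided expression still registers.
    smile = (blendshapes.get("mouthSmileLeft", 0.0) + blendshapes.get("mouthSmileRight", 0.0)) / 2
    frown = (blendshapes.get("mouthFrownLeft", 0.0) + blendshapes.get("mouthFrownRight", 0.0)) / 2
    # Treat a frown combined with raised inner brows as a sadness cue.
    sadness = (frown + blendshapes.get("browInnerUp", 0.0)) / 2
    if smile >= threshold and smile > sadness:
        return "happy"
    if sadness >= threshold:
        return "sad"
    return "neutral"
```

For example, `coarse_emotion({"mouthSmileLeft": 0.8, "mouthSmileRight": 0.7})` returns `"happy"`, and an empty dict returns `"neutral"`.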

The backend is a Python Vision Agents service (Inworld TTS + Gemini + Deepgram + MediaPipe + Anam avatar). The frontend is a Next.js call experience that joins the Stream call, renders the avatar, and shows captions and metrics.


Architecture

Browser (Next.js)  ──►  Stream Edge  ◄──  Backend (Vision Agents, Python)
       ▲                                         │
       │                                         ├── Deepgram (STT)
       │                                         ├── Gemini (LLM)
       │                                         ├── Inworld (TTS v2)
       │                                         ├── MediaPipe (face state)
       └──────────  Anam (avatar video)  ◄───────┘

The frontend hits the backend's HTTP API to create and close agent sessions, then joins the same Stream call as the agent. The backend runs the STT → LLM → TTS pipeline and publishes the agent's audio + Anam avatar video into the call. A MediaPipeFaceProcessor consumes the user's video track at 8 fps and emits smoothed emotion/gaze/engagement state that gets prepended to each LLM turn so the model can pick appropriate Inworld steering tags.
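The exact smoothing method and prompt format used by `MediaPipeFaceProcessor` aren't documented here. The sketch below assumes an exponential moving average (EMA) and a bracketed text header, both hypothetical. It shows how per-frame readings arriving at 8 fps could be smoothed and then prepended to an LLM turn. `FaceStateSmoother` and `build_llm_turn` are illustrative names, not Vision Agents APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FaceStateSmoother:
    """Hypothetical helper: EMA-smooths per-frame face readings."""

    alpha: float = 0.25  # weight of the newest frame (assumed value)
    emotions: Dict[str, float] = field(default_factory=dict)
    engagement: Optional[float] = None
    gaze: str = "unknown"

    def _ema(self, prev: Optional[float], value: float) -> float:
        # The first reading seeds the average; later readings blend in.
        return value if prev is None else self.alpha * value + (1 - self.alpha) * prev

    def update(self, emotion_scores: Dict[str, float], gaze: str, engagement: float) -> None:
        # Assumes every frame reports the same set of emotion labels.
        for label, score in emotion_scores.items():
            self.emotions[label] = self._ema(self.emotions.get(label), score)
        self.engagement = self._ema(self.engagement, engagement)
        self.gaze = gaze  # categorical, so keep the latest reading instead of averaging

    def summary(self) -> str:
        emotion = max(self.emotions, key=self.emotions.__getitem__) if self.emotions else "neutral"
        engagement = 0.0 if self.engagement is None else self.engagement
        return f"[user_state emotion={emotion} gaze={self.gaze} engagement={engagement:.2f}]"


def build_llm_turn(transcript: str, smoother: FaceStateSmoother) -> str:
    """Prepend the smoothed state header to the user's transcribed utterance."""
    return f"{smoother.summary()}\n{transcript}"
```

With `alpha = 0.25`, a frame's influence halves after about ln 0.5 / ln 0.75 ≈ 2.4 frames, or roughly 0.3 s at 8 fps. A single blink or glance therefore barely moves the header, while a sustained expression shows up within a second.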

Repository Layout

backend/   Python Vision Agents service
frontend/  Next.js demo app

Getting Started

Backend

cd backend
cp .env.example .env
uv sync
uv run python scripts/download_face_model.py
uv run python main.py serve --host 127.0.0.1 --port 8000
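If you script the two-terminal startup, it helps to wait until the backend is listening before launching the frontend. Here is a minimal standard-library sketch; `wait_for_port` is a hypothetical helper. It only checks that something accepts TCP connections on the host and port passed to `serve` above. It does not check that the HTTP API is healthy.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 8000, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening; close it immediately.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.25)
```

Calling `wait_for_port()` with the defaults matches `--host 127.0.0.1 --port 8000`. If you changed either flag, pass the same values here.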

Frontend

In another terminal:

cd frontend
cp .env.example .env.local
npm install
npm run dev

Fill both env files with the keys from the providers listed under Required Accounts below (the Stream credentials must match across the two files), then open http://localhost:3000. If you run the backend on a different host or port, set NEXT_PUBLIC_BASE_URL in frontend/.env.local.
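A mismatched Stream key between the two env files is an easy mistake to make. The following is a rough sketch of a check that compares `STREAM_API_KEY` in `backend/.env` with `NEXT_PUBLIC_STREAM_API_KEY` in `frontend/.env.local`. Both helper names are hypothetical. The parser is deliberately simplified and handles plain `KEY=VALUE` lines, `#` comments, optional `export` prefixes, and surrounding quotes. It does not handle multi-line values or inline comments.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and matching quote characters.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def stream_keys_match(backend_env: Path, frontend_env: Path) -> bool:
    """True if the backend's Stream API key is set and equals the frontend's public key."""
    backend_key = read_env_file(backend_env).get("STREAM_API_KEY")
    frontend_key = read_env_file(frontend_env).get("NEXT_PUBLIC_STREAM_API_KEY")
    return bool(backend_key) and backend_key == frontend_key
```

Run from the repository root, `stream_keys_match(Path("backend/.env"), Path("frontend/.env.local"))` should return `True` once both files are filled in.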

For a backend-only smoke test that opens a Stream demo room directly:

cd backend
uv run python main.py run

Required Accounts

You need an account and API key from every provider below before the demo will run end-to-end.

| Provider | Used for | Sign up | Env vars |
|---|---|---|---|
| Stream Video | Call edge (WebRTC), session tokens | getstream.io/video | STREAM_API_KEY, STREAM_API_SECRET, NEXT_PUBLIC_STREAM_API_KEY |
| Inworld AI | TTS v2 with inline steering tags | inworld.ai | INWORLD_API_KEY |
| Deepgram | Speech-to-text | console.deepgram.com | DEEPGRAM_API_KEY |
| Google AI Studio | Gemini LLM | aistudio.google.com | GOOGLE_API_KEY |
| Anam | Lip-synced avatar video | anam.ai | ANAM_API_KEY, ANAM_AVATAR_ID |

The frontend additionally needs a user JWT (NEXT_PUBLIC_STREAM_TOKEN) and user ID (NEXT_PUBLIC_STREAM_USER_ID). See frontend/.env.example for the full list.
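Before starting either service, a quick preflight can list any variables that are still empty. This is a minimal sketch. The names come from the Required Accounts table and the note above. The backend/frontend split is an assumption based on the Next.js convention that `NEXT_PUBLIC_*` variables are exposed to the browser. The `.env.example` files remain the authoritative list. `missing_vars`, `BACKEND_VARS`, and `FRONTEND_VARS` are hypothetical names.

```python
from typing import Iterable, List, Mapping

# Assumed split: NEXT_PUBLIC_* names go in frontend/.env.local, the rest in backend/.env.
BACKEND_VARS = (
    "STREAM_API_KEY", "STREAM_API_SECRET", "INWORLD_API_KEY", "DEEPGRAM_API_KEY",
    "GOOGLE_API_KEY", "ANAM_API_KEY", "ANAM_AVATAR_ID",
)
FRONTEND_VARS = (
    "NEXT_PUBLIC_STREAM_API_KEY", "NEXT_PUBLIC_STREAM_TOKEN", "NEXT_PUBLIC_STREAM_USER_ID",
)


def missing_vars(env: Mapping[str, str], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent or blank in `env` (e.g. os.environ)."""
    return [name for name in required if not env.get(name, "").strip()]
```

With the backend's variables loaded into the process environment, `missing_vars(os.environ, BACKEND_VARS)` returns an empty list when nothing is missing.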

Free for Makers

Stream is free for most side and hobby projects. To qualify, your project/company needs to have < 5 team members and < $10k in monthly revenue. For complete pricing details, visit the Video Pricing Page.

License

MIT — see LICENSE.

About

Your friendly neighbourhood agent, always ready to chat regardless of how the day is going. Built by the Vision Agents team in collaboration with Inworld and Anam AI.
