Real-time AI speaking coach. Rehearse talks, interviews, demos, or pitches while PitchMirror listens, watches, and interrupts with measured feedback.
Category: Live Agents — Gemini Live Agent Challenge
Detailed view: docs/architecture_detailed.png
PitchMirror uses the Gemini Live API to simultaneously watch your webcam, listen to your microphone, and optionally analyze your shared screen while you speak. You can also upload a slide deck PDF (from PowerPoint/Canva/Keynote export) so the coach can critique actual slide visuals in real time. Before each run it captures a short intake:
- coach persona (Professional Coach, Brutal VC, Supportive Mentor)
- what you are preparing for
- delivery context (virtual, in_person, hybrid)
- primary goal (balanced, fillers, pacing, confidence, structure)
The coach interrupts when it detects one of these problems:
| Problem | Trigger | Example coaching response |
|---|---|---|
| Filler words | 3+ "um/uh/like" in 30s | "Three 'ums' in ten seconds. Pause instead." |
| Pace | wpm_20s > 180 (rolling 20s window) | "Slow down — you're rushing." |
| Eye contact | Prolonged disengaged gaze (context-aware threshold) | "Reconnect with the audience/camera." |
| Contradiction | Statement contradicts prior claim | "That contradicts your 'first to market' claim." |
| Clarity | Incomprehensible sentence | "That sentence lost everyone. Say it in one clause." |
| Slide clarity (optional screen share) | clutter/unreadable/weak hierarchy | "Slide is text-dense. Cut to three bullets." |
| Slide-speech mismatch (optional) | narration and slide conflict | "Narration and slide are misaligned." |
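The two speech triggers in the table (3+ fillers in 30s, wpm_20s > 180) can be sketched as rolling-window checks over timestamped words. This is a minimal illustration, not PitchMirror's actual implementation: the class name, method names, and the choice to count every "like" as a filler are assumptions.

```python
from collections import deque

# Thresholds taken from the trigger table above.
FILLERS = {"um", "uh", "like"}
FILLER_WINDOW_S = 30.0
FILLER_LIMIT = 3
PACE_WINDOW_S = 20.0
PACE_LIMIT_WPM = 180.0


class SpeechWindow:
    """Hypothetical helper: keeps recent (timestamp, word) pairs and
    evaluates the filler and pace triggers against them."""

    def __init__(self) -> None:
        self._words = deque()  # items are (timestamp_seconds, word)

    def add_word(self, t: float, word: str) -> None:
        self._words.append((t, word.lower().strip(".,!?")))
        # Drop words older than the longest window we evaluate.
        horizon = t - max(FILLER_WINDOW_S, PACE_WINDOW_S)
        while self._words and self._words[0][0] < horizon:
            self._words.popleft()

    def filler_count(self, now: float) -> int:
        return sum(
            1 for t, w in self._words
            if now - t <= FILLER_WINDOW_S and w in FILLERS
        )

    def wpm_20s(self, now: float) -> float:
        # Words in the last 20 seconds, scaled to words per minute.
        recent = sum(1 for t, _ in self._words if now - t <= PACE_WINDOW_S)
        return recent * 60.0 / PACE_WINDOW_S

    def triggers(self, now: float) -> list:
        fired = []
        if self.filler_count(now) >= FILLER_LIMIT:
            fired.append("filler_words")
        if self.wpm_20s(now) > PACE_LIMIT_WPM:
            fired.append("pace")
        return fired
```

During the first 20 seconds of a session this sketch underestimates pace, because it always divides by the full window length.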
During the live session, the coach can trigger real UI actions (navigate_practice_slides, jump_to_slide, mark_slide_issue) and generate an in-session visual hint (generate_live_visual_hint).
After your session: a scorecard with per-category scores, timeline events, AI report, citations, and up to 2 generated visual coaching cards.
Eye-contact handling is context-aware:
- virtual: camera-facing is prioritized.
- in_person: natural audience scanning is not penalized.
- hybrid: balanced room scanning plus periodic camera reconnection.
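One way to picture the context-aware rule is a check where the set of "engaged" gaze targets depends on the delivery context. The sketch below is illustrative only: the gaze target labels, both time limits, and the function name are assumptions, since this README does not state the real thresholds.

```python
from typing import Optional

# Which gaze targets count as engaged in each delivery context (assumed labels).
ENGAGED_TARGETS = {
    "virtual": {"camera"},
    "in_person": {"camera", "audience"},
    "hybrid": {"camera", "audience"},
}
# Hypothetical limits in seconds; the real values are not documented here.
DISENGAGED_LIMIT_S = 5.0
HYBRID_CAMERA_GAP_LIMIT_S = 20.0


def gaze_issue(context: str, samples: list) -> Optional[str]:
    """samples is a chronological list of (timestamp_seconds, target)."""
    engaged = ENGAGED_TARGETS[context]
    if not samples:
        return None
    now = samples[-1][0]

    # Find where the current uninterrupted disengaged run started, if any.
    run_start = None
    for t, target in reversed(samples):
        if target in engaged:
            break
        run_start = t
    if run_start is not None and now - run_start >= DISENGAGED_LIMIT_S:
        return "disengaged_gaze"

    # Hybrid additionally expects a periodic return to the camera.
    if context == "hybrid":
        camera_times = [t for t, target in samples if target == "camera"]
        last_camera = camera_times[-1] if camera_times else samples[0][0]
        if now - last_camera >= HYBRID_CAMERA_GAP_LIMIT_S:
            return "reconnect_with_camera"
    return None
```

With the same stream of audience-facing samples, this flags a virtual session, leaves an in-person session alone, and asks a hybrid speaker to reconnect with the camera only after a longer gap.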
One low-latency bidirectional stream, one orchestrating agent — intentionally lean for sub-second response latency. Deep specialization moves to the post-session pipeline.
Browser (HTML/JS)
├── getUserMedia() → webcam + mic
├── getDisplayMedia() (optional) → screen share
├── PCM 16kHz audio → WebSocket → Cloud Run
├── Webcam JPEG @1fps → WebSocket → Cloud Run
├── Screen JPEG @0.5fps → WebSocket → Cloud Run
└── Coach audio 24kHz ← WebSocket ← Cloud Run
│
┌───────▼──────────────────────────────────────┐
│ Cloud Run (FastAPI + Uvicorn) │
│ │
│ LiveCoachAgent │
│ (gemini-2.5-flash-native-audio-preview-12-2025) │
│ └── 10 ADK tools (closures over SessionState): │
│ flag_issue · check_eye_contact │
│ get_speech_metrics · draw_overlay │
│ navigate_practice_slides │
│ jump_to_slide · mark_slide_issue │
│ generate_live_visual_hint (Imagen) │
│ get_recent_transcript · adjust_focus │
└───────┬──────────────────────────────────────┘
│
┌───────▼────────────┐ ┌──────────────────────┐
│ Gemini Live API │ │ Firestore │
│ (native audio) │ │ Session scorecards │
└────────────────────┘ │ User skill profiles │
└──────────────────────┘
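The audio legs of the diagram carry raw PCM at 16 kHz upstream and 24 kHz downstream. As a rough sketch of what that means in bytes, assuming 16-bit mono little-endian samples, the helpers below convert float samples to PCM and size a chunk. The function names and the 100 ms chunk length in the usage note are illustrative, not taken from the project, and in PitchMirror the capture side runs in the browser rather than in Python.

```python
import struct

INPUT_RATE_HZ = 16_000   # microphone audio sent to the backend
OUTPUT_RATE_HZ = 24_000  # coach audio returned to the browser
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono


def floats_to_pcm16(samples: list) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_bytes(rate_hz: int, chunk_ms: int) -> int:
    """Payload size of one audio chunk of the given duration."""
    return rate_hz * BYTES_PER_SAMPLE * chunk_ms // 1000
```

Under these assumptions a 100 ms microphone chunk is 3,200 bytes and a 100 ms coach-audio chunk is 4,800 bytes.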
ADK supports multi-agent run_live() topologies via sub_agents and live transfers. PitchMirror keeps the live phase as a single orchestrating agent by design: the tool pipeline provides deterministic measurement grounding and Imagen image generation without the latency cost of mid-session agent handoffs.
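The "closures over SessionState" pattern from the diagram can be sketched as a factory that builds plain functions bound to one session's state, so tools never read from a global. The tool names flag_issue and get_speech_metrics come from the diagram; the SessionState fields, the function signatures, and the make_tools name are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SessionState:
    """Hypothetical per-session state; field names are illustrative."""
    filler_count: int = 0
    issues: list = field(default_factory=list)


def make_tools(state: SessionState) -> Dict[str, Callable[..., dict]]:
    """Build tool functions that close over a single session's state."""

    def flag_issue(category: str, note: str) -> dict:
        """Record a coaching issue against this session."""
        state.issues.append({"category": category, "note": note})
        return {"status": "recorded", "total_issues": len(state.issues)}

    def get_speech_metrics() -> dict:
        """Return measured values for the model to ground feedback on."""
        return {"filler_count": state.filler_count}

    return {"flag_issue": flag_issue, "get_speech_metrics": get_speech_metrics}
```

Because each call to make_tools captures its own state object, two concurrent sessions cannot see each other's issues or metrics.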
_run_post_session()
│
├─ PARALLEL_ANALYSTS (ADK ParallelAgent)
│ ├── DeliveryAnalyst (gemini-2.5-flash) → delivery_analysis
│ ├── ContentAnalyst (gemini-2.5-flash) → content_analysis
│ └── ResearchAgent (google_search) → research_tips
│
├─ SynthesisAgent (gemini-2.5-flash)
│ └── Combines all three → final report + SCORE_* lines + IMAGE_PROMPTS
│
└─ SessionSummaryAgent (gemini-2.5-flash)
└── Writes structured JSON skill profile → Firestore
(cross-session learning loop — feeds next session's context)
POST_SESSION_PIPELINE = SequentialAgent(
PARALLEL_ANALYSTS → SynthesisAgent → SessionSummaryAgent
)
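The control flow of the pipeline above (three analysts in parallel, then synthesis, then summary) can be illustrated without ADK using plain asyncio. This is a stand-in sketch, not the project's code: the coroutine bodies are placeholders, and only the state keys delivery_analysis, content_analysis, and research_tips come from the diagram.

```python
import asyncio


async def delivery_analyst(state: dict) -> None:
    state["delivery_analysis"] = f"delivery notes for {len(state['transcript'])} chars"


async def content_analyst(state: dict) -> None:
    state["content_analysis"] = "content notes"


async def research_agent(state: dict) -> None:
    state["research_tips"] = "research tips"


async def synthesis_agent(state: dict) -> None:
    # Runs only after all three analyst outputs exist.
    parts = (state["delivery_analysis"], state["content_analysis"], state["research_tips"])
    state["final_report"] = " | ".join(parts)


async def session_summary_agent(state: dict) -> None:
    # Placeholder for the structured skill profile written to Firestore.
    state["skill_profile"] = {"report_chars": len(state["final_report"])}


async def run_post_session(transcript: str) -> dict:
    state = {"transcript": transcript}
    # Parallel stage: the three analysts run concurrently on shared state.
    await asyncio.gather(
        delivery_analyst(state), content_analyst(state), research_agent(state)
    )
    # Sequential stages: each depends on the previous stage's output.
    await synthesis_agent(state)
    await session_summary_agent(state)
    return state
```

Each analyst writes to its own key, so the parallel stage needs no locking; the later stages only read what earlier ones produced.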
Google Cloud services used:
- Cloud Run (backend hosting)
- Firestore (session persistence)
- Secret Manager (API key management)
- Artifact Registry (container images)
- Python 3.12+
- A Gemini API key with access to gemini-2.5-flash-native-audio-preview-12-2025
- Docker (for deployment)
- gcloud CLI + terraform (for cloud deployment)
# 1. Clone and enter the repo
git clone https://github.com/SmartGridsML/GeminiLiveAgentHack.git
cd GeminiLiveAgentHack
# 2. Create virtual environment and install deps
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# 3. Set your API key
cp .env.example .env
# Edit .env and set GOOGLE_API_KEY=your_key_here
# 4. Run the smoke test (validates Live API access before building)
python scripts/smoke_test.py
# 5. Run tests
python -m pytest tests/ -v
# 6. Start the server
uvicorn backend.main:app --host 0.0.0.0 --port 8080 --reload
# Open: http://localhost:8080
Use this exact sequence to validate the project end-to-end in ~10 minutes.
# 1) Health check (must return {"status":"ok"...})
curl -s http://localhost:8080/health
# 2) Gemini connectivity smoke test (must print success, no traceback)
python scripts/smoke_test.py
# 3) Unit tests
python -m pytest tests/ -v
Expected result:
- /health returns JSON with "status": "ok".
- smoke_test.py completes without errors.
- pytest passes (current baseline: 25 tests).
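If you prefer to script the health check rather than read curl output, a small sketch using only the standard library could look like this. The function names are hypothetical; the only fact taken from this README is that /health returns JSON containing "status": "ok".

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """True when the /health payload reports status ok."""
    return payload.get("status") == "ok"


def check_health(base_url: str = "http://localhost:8080", timeout_s: float = 5.0) -> bool:
    """Fetch /health from a running server and validate the response."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
        return resp.status == 200 and is_healthy(json.load(resp))
```

Calling check_health() requires the server from step 6 to be running; is_healthy can be exercised on its own.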
- Open http://localhost:8080/app.
- Session intake:
  - Mode: presentation
  - Delivery context: virtual
  - Toggle Screen-aware coaching ON (optional)
  - Toggle Deterministic demo mode ON
- Click Start Session.
- Speak this planted test script:
So basically um, today I want to share our onboarding approach.
[LOOK DOWN at notes for 6+ seconds]
Our process is kind of sort of basically simple and literally works for everyone.
[SPEAK VERY FAST for ~10 seconds]
Expected result:
- Live interruptions appear in transcript and audio.
- ADK tool-call lines appear (for metrics and flags).
- Evidence chips show threshold/metric context on flag_issue.
- Click End Session.
- Wait for pipeline steps to complete.
Expected result:
- Scorecard appears with category scores.
- AI Coaching Report section renders.
- Evidence-Based Techniques section renders.
- Multimodal Visuals section renders up to 2 generated cards (or fallback cards if image generation is unavailable).
# Recent sessions summary
curl -s "http://localhost:8080/api/sessions?limit=3"
# Session aggregates
curl -s "http://localhost:8080/api/sessions/stats?limit=10"
If API_BEARER_TOKEN is set, include auth headers:
curl -s -H "Authorization: Bearer $API_BEARER_TOKEN" "http://localhost:8080/api/sessions?limit=3"
Expected result:
- Both endpoints return valid JSON.
- Latest session appears in /api/sessions.
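The same session queries can be issued from Python. The sketch below builds the request and attaches the bearer header only when a token is provided, mirroring the curl examples; the helper name and its defaults are illustrative assumptions.

```python
import urllib.parse
import urllib.request
from typing import Optional


def build_sessions_request(
    base_url: str,
    path: str = "/api/sessions",
    limit: int = 3,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET request for a sessions endpoint, with optional bearer auth."""
    query = urllib.parse.urlencode({"limit": limit})
    req = urllib.request.Request(f"{base_url}{path}?{query}")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

Pass the result to urllib.request.urlopen against a running server, for example with token=os.environ.get("API_BEARER_TOKEN") so the header is sent only when the variable is set.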
# Set your project
export PROJECT_ID=your-project-id
export REGION=us-central1
export REPO=pitchmirror
export GOOGLE_API_KEY=your_gemini_api_key
export API_BEARER_TOKEN=$(openssl rand -hex 24)
# Enable APIs (build + deploy + data + secrets)
gcloud services enable run.googleapis.com firestore.googleapis.com \
secretmanager.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com \
--project=$PROJECT_ID
# Create Artifact Registry repo (one-time)
gcloud artifacts repositories create $REPO \
--repository-format=docker --location=$REGION --project=$PROJECT_ID
# Build + push current commit image with immutable digest
export SHA=$(git rev-parse --short HEAD)
export IMAGE_TAG=$REGION-docker.pkg.dev/$PROJECT_ID/$REPO/app:$SHA
gcloud builds submit . --tag $IMAGE_TAG --project=$PROJECT_ID
export DIGEST=$(gcloud artifacts docker images describe $IMAGE_TAG --project=$PROJECT_ID --format='value(image_summary.digest)')
export IMAGE_IMMUTABLE=$REGION-docker.pkg.dev/$PROJECT_ID/$REPO/app@$DIGEST
# Store API key in Secret Manager (create secret once, then add a version)
if ! gcloud secrets describe pitchmirror-gemini-api-key --project=$PROJECT_ID >/dev/null 2>&1; then
gcloud secrets create pitchmirror-gemini-api-key \
--replication-policy=automatic --project=$PROJECT_ID
fi
printf '%s' "$GOOGLE_API_KEY" | gcloud secrets versions add pitchmirror-gemini-api-key \
--data-file=- --project=$PROJECT_ID
# Deploy to Cloud Run
gcloud run deploy pitchmirror \
--image=$IMAGE_IMMUTABLE \
--region=$REGION \
--set-env-vars=GOOGLE_CLOUD_PROJECT=$PROJECT_ID,CORS_ALLOWED_ORIGINS=https://your-frontend-domain,API_BEARER_TOKEN=$API_BEARER_TOKEN,ENABLE_SCREEN_SHARE=true,ENABLE_IMAGE_GENERATION=true,DEMO_MODE_DEFAULT=false \
--set-secrets=GOOGLE_API_KEY=pitchmirror-gemini-api-key:latest \
--session-affinity \
--min-instances=1 \
--timeout=900 \
--allow-unauthenticated \
--project=$PROJECT_ID
# ADC login for Terraform/provider auth
gcloud auth application-default login
# Optional explicit path (use $HOME, not "~")
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json"
# Avoid leaking values into shell history by using TF_VAR_*
export TF_VAR_project_id=$PROJECT_ID
# Prefer immutable digest from Option A build step (IMAGE_IMMUTABLE)
export TF_VAR_container_image=$IMAGE_IMMUTABLE
export TF_VAR_gemini_secret_id=pitchmirror-gemini-api-key
export TF_VAR_firestore_collection=pitchmirror_sessions
export TF_VAR_allowed_origins=https://your-frontend-domain
export TF_VAR_api_bearer_token=$(openssl rand -hex 24)
export TF_VAR_allow_unauthenticated=false
export TF_VAR_enable_screen_share=true
export TF_VAR_enable_image_generation=true
export TF_VAR_demo_mode_default=false
export TF_VAR_image_generation_timeout_s=24
export TF_VAR_image_generation_retries=1
export TF_VAR_image_model=imagen-4.0-fast-generate-001
# For public demo links, set true intentionally and keep API_BEARER_TOKEN enabled.
cd infra
terraform init
terraform apply
# Firestore composite index creation can take a few minutes on first deploy.
# Check output: firestore_user_history_index
Live deployed backend URL (include in project description):
https://pitchmirror-101569338664.us-central1.run.app
MIT
