
# MAITRI

**The AI That Exists in the Silence**

India is sending humans to space for the first time. Every 90 minutes, for 45 minutes, Gaganyaan passes beyond every relay satellite. No communication. Complete silence. 400km above everything the astronaut has ever known.

An astronaut in psychological distress during a blackout window has no one.

MAITRI changes that.



## Why This Only Exists Because of Gemini Live

Traditional REST APIs add 3–4 seconds of round-trip latency per exchange. In a psychological support context, 3 seconds of silence is rejection. It is the pause that confirms the astronaut is alone.

Gemini Live's sub-300ms bidirectional audio-video stream is the only technology on Earth that eliminates that pause. Not reduces it — eliminates it. The conversation feels like a human is present because the latency is below human perception of delay.

No other API makes this possible. This is not a product choice. It is a physical constraint that Gemini Live is uniquely built to satisfy.

> "When the blackout ends and the relay link reconnects, the first voice the astronaut hears is MAITRI's. Not 3 seconds later. Instantly."


## What MAITRI Does

```text
BEFORE MAITRI                          AFTER MAITRI
──────────────────────────────         ──────────────────────────────
Blackout begins.                       Blackout begins.
Astronaut is alone.                    MAITRI is already there.
Ground cannot help.                    Has always been there.
45 minutes of silence.                 Monitors voice tone in real time.
No one knows what happens.             Scores affect from facial expression.
Ground gets nothing.                   Detects distress before it becomes crisis.
                                       Ground gets a full report the moment
                                       the relay link reconnects.
```

MAITRI operates across three simultaneous intelligence layers, all powered by the Gemini ecosystem:

| Layer | Technology | What It Does |
|---|---|---|
| Conversation | Gemini Live API | Real-time bidirectional voice — listens, responds, supports |
| Affect Scoring | Gemini Flash on Vertex AI | Scores arousal + valence from video frames every 5 seconds |
| Protocol Intelligence | Google ADK | Orchestrates the MAITRI psychological support protocol |

Three Gemini products. One mission. Each doing what it does best.
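To make the affect-scoring layer concrete, here is a minimal sketch of how the Gemini Flash reply could be parsed, assuming the prompt asks the model to return JSON with `arousal` and `valence` fields. The prompt text, `AffectScore` type, and `parse_affect_response` helper are illustrative, not the repository's actual code:

```python
import json
from dataclasses import dataclass

@dataclass
class AffectScore:
    arousal: float   # 0.0 (calm) .. 1.0 (agitated)
    valence: float   # -1.0 (negative) .. 1.0 (positive)

# Hypothetical scoring prompt sent alongside each video frame.
AFFECT_PROMPT = (
    "Score the astronaut's affect from this single video frame. "
    'Reply with JSON only: {"arousal": <0..1>, "valence": <-1..1>}'
)

def parse_affect_response(raw: str) -> AffectScore:
    """Parse the model's JSON reply, clamping values into range so a
    slightly out-of-spec reply cannot corrupt the protocol state."""
    data = json.loads(raw)
    arousal = min(max(float(data["arousal"]), 0.0), 1.0)
    valence = min(max(float(data["valence"]), -1.0), 1.0)
    return AffectScore(arousal, valence)
```

Clamping at the parser keeps a single malformed model reply from triggering a spurious state transition downstream.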


## Architecture

### System Overview

```mermaid
graph TB
    subgraph Spacecraft["🚀 Spacecraft (400km above Earth)"]
        Android["📱 Android App<br/>Kotlin / Jetpack Compose<br/>CameraX + Microphone"]
    end

    subgraph LiveKit["☁️ LiveKit Cloud"]
        LK["WebRTC Transport<br/>Audio + Video Tracks<br/>DataChannel"]
    end

    subgraph GCP["☁️ Google Cloud Platform"]
        subgraph CloudRun["Cloud Run — maitri-backend"]
            FastAPI["FastAPI Server<br/>Single Process"]
            Worker["LiveKit Worker<br/>asyncio background task"]
            ADK["Google ADK<br/>Agent Orchestration"]
            SM["Protocol State Machine<br/>BASELINE → ANOMALY → ACTIVE"]
        end

        GeminiLive["🔮 Gemini Live API<br/>sub-300ms bidirectional<br/>audio-video stream"]
        GeminiFlash["⚡ Gemini Flash<br/>Vertex AI<br/>Affect Scoring"]
        Firestore["🔥 Firestore<br/>Protocol State<br/>Real-time Sync"]
        PubSub["📨 Cloud Pub/Sub<br/>Critical Alert Dispatch"]
        GCS["🪣 Cloud Storage<br/>Session Archives"]
        Monitoring["📊 Cloud Monitoring<br/>Affect Gauges"]
        BigQuery["📈 BigQuery<br/>Session Analytics"]
    end

    subgraph Ground["🌍 Ground Control"]
        Dashboard["Svelte 5 Dashboard<br/>Firebase Hosting<br/>SSE Real-time Stream"]
    end

    Android <-->|"WebRTC Audio+Video"| LK
    LK <-->|"LiveKit SDK"| Worker
    Worker <-->|"LiveRequestQueue"| ADK
    ADK <-->|"Gemini Live Protocol"| GeminiLive
    Worker -->|"Video frames 5s cadence"| GeminiFlash
    GeminiFlash -->|"AffectScore"| SM
    SM -->|"State transitions"| Firestore
    SM -->|"Critical alerts"| PubSub
    Worker -->|"Session JSON"| GCS
    Worker -->|"Arousal/Valence gauges"| Monitoring
    GCS -->|"export_to_bq.py"| BigQuery
    Firestore -->|"onSnapshot SSE"| Dashboard
    Worker -->|"DataChannel"| LK
    LK -->|"DataChannel"| Android
```

### Gemini Live — The Real-Time Intelligence Core

```mermaid
sequenceDiagram
    participant A as 📱 Android<br/>(Astronaut)
    participant LK as LiveKit<br/>WebRTC
    participant W as Python Worker<br/>Cloud Run
    participant GL as Gemini Live API
    participant ADK as Google ADK
    participant SM as State Machine

    A->>LK: PCM Audio (48kHz)
    A->>LK: Video Frames (CameraX)
    LK->>W: Audio Track subscribed
    LK->>W: Video Track subscribed

    Note over W,GL: Bidirectional sub-300ms stream<br/>Only possible with Gemini Live

    W->>GL: Resample 48kHz→16kHz<br/>Forward via LiveRequestQueue
    GL-->>W: Audio response stream (24kHz)
    W->>LK: Publish to Android speaker

    loop Every 5 seconds
        W->>W: Extract RGBA frame
        W->>+ADK: Score affect via Gemini Flash
        ADK-->>-W: AffectScore {arousal, valence}
        W->>SM: evaluate_score_and_transition()
    end

    Note over SM: BASELINE_MONITORING<br/>↓ threshold breach<br/>ANOMALY_FLAGGED<br/>↓ threshold breach<br/>ACTIVE_INTERVENTION

    SM->>GL: Inject [SYSTEM_CONTEXT] hint<br/>into live session
    SM->>LK: DataChannel state_change
    SM->>A: Android overlay updates
```
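The 48kHz→16kHz resample step above can be sketched with a crude decimate-by-3 averaging filter over mono 16-bit PCM. This is illustrative only — a production worker would use a proper polyphase resampler (e.g. soxr or scipy), and the function name is hypothetical:

```python
import array

def resample_48k_to_16k(pcm: array.array) -> array.array:
    """Downsample mono 16-bit PCM from 48 kHz to 16 kHz by averaging
    each group of 3 consecutive samples (a crude anti-aliasing step).
    Trailing samples that do not fill a group of 3 are dropped."""
    out = array.array("h")
    usable = len(pcm) - len(pcm) % 3
    for i in range(0, usable, 3):
        out.append((pcm[i] + pcm[i + 1] + pcm[i + 2]) // 3)
    return out
```

Averaging before decimating suppresses some of the aliasing that naive "take every third sample" decimation would fold into the 16 kHz band Gemini Live consumes.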

### Multimodal Affect Pipeline

```mermaid
flowchart LR
    subgraph Input["Simultaneous Multimodal Input"]
        V["🎤 Voice<br/>Tone, cadence,<br/>hesitation"]
        F["📷 Face<br/>Micro-expressions,<br/>skin tone change"]
    end

    subgraph Gemini["Gemini Ecosystem"]
        GL["Gemini Live<br/>Voice conversation<br/>Real-time response"]
        GF["Gemini Flash<br/>Vertex AI<br/>Affect scoring"]
    end

    subgraph Output["Protocol Response"]
        B["BASELINE<br/>🟢 Passive monitoring"]
        AN["ANOMALY FLAGGED<br/>🟡 Grounding mode<br/>30s cadence"]
        AC["ACTIVE INTERVENTION<br/>🔴 Ground alert<br/>Immediate dispatch"]
    end

    V --> GL
    V --> GF
    F --> GF
    GL -->|"Conversation context"| B
    GF -->|"arousal > 25% deviation"| AN
    GF -->|"arousal > 35% deviation"| AC

    style GL fill:#4285f4,color:#fff
    style GF fill:#34a853,color:#fff
    style AC fill:#ea4335,color:#fff
    style AN fill:#fbbc04,color:#000
    style B fill:#34a853,color:#fff
```
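The escalation logic in the diagram — arousal more than 25% above baseline flags an anomaly, more than 35% triggers intervention — can be sketched as a small transition function. State names follow the diagrams; the function signature and the assumption that a non-breaching reading simply holds the current state are mine, not the repository's:

```python
from enum import Enum

class ProtocolState(Enum):
    BASELINE_MONITORING = "BASELINE_MONITORING"
    ANOMALY_FLAGGED = "ANOMALY_FLAGGED"
    ACTIVE_INTERVENTION = "ACTIVE_INTERVENTION"

ANOMALY_DEVIATION = 0.25    # 25% above personal baseline
CRITICAL_DEVIATION = 0.35   # 35% above personal baseline

def evaluate_transition(state: ProtocolState,
                        baseline_arousal: float,
                        arousal: float) -> ProtocolState:
    """Return the next protocol state for one arousal reading.
    Assumes baseline_arousal > 0 (established during calibration)."""
    deviation = (arousal - baseline_arousal) / baseline_arousal
    if deviation > CRITICAL_DEVIATION:
        return ProtocolState.ACTIVE_INTERVENTION
    if deviation > ANOMALY_DEVIATION:
        return ProtocolState.ANOMALY_FLAGGED
    return state  # no threshold breached: hold the current state
```

Keeping the thresholds relative to a per-astronaut baseline, rather than absolute, is what lets the same logic serve crew members with very different resting arousal levels.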

### Automated Cloud Deployment

```mermaid
flowchart LR
    subgraph Dev["Developer"]
        Push["git push<br/>to main"]
    end

    subgraph CI["Automated Deployment"]
        CB["Cloud Build<br/>cloudbuild.yaml<br/>Triggered on push"]
        GH["GitHub Actions<br/>firebase-hosting-merge.yml"]
    end

    subgraph GCP2["Google Cloud"]
        Docker["Build Docker Image<br/>python:3.12-slim"]
        GCR["Container Registry<br/>gcr.io/project/maitri-backend"]
        CR["Cloud Run<br/>Deploy new revision<br/>min-instances=1<br/>Zero downtime"]
    end

    subgraph Firebase["Firebase"]
        Build["npm run build<br/>Svelte → static"]
        FH["Firebase Hosting<br/>Global CDN<br/>maitri-astronaut.web.app"]
    end

    Push -->|"backend/"| CB
    Push -->|"dashboard/"| GH
    CB --> Docker --> GCR --> CR
    GH --> Build --> FH

    style CB fill:#4285f4,color:#fff
    style CR fill:#4285f4,color:#fff
    style FH fill:#ff6d00,color:#fff
```

## Eight Google Cloud Services. One Mission.

| GCP Service | Role in MAITRI | Why It Matters |
|---|---|---|
| Gemini Live API | Real-time voice conversation | Sub-300ms — the only latency acceptable for psychological support |
| Vertex AI — Gemini Flash | Affect scoring from video frames | Detects emotional state changes invisible to voice alone |
| Google ADK | Agent orchestration + tool calling | Protocol intelligence that escalates from support to alert |
| Cloud Run | Backend runtime | Zero compute on spacecraft — full intelligence on Google Cloud |
| Firestore | Protocol state + real-time sync | `onSnapshot` drives the ground dashboard without polling |
| Cloud Pub/Sub | Critical alert dispatch | Ground control notified the instant intervention is triggered |
| Cloud Storage | Session archives | Full telemetry JSON written post-session for flight surgeons |
| Cloud Monitoring | Affect gauges | `affect_arousal` + `affect_valence` custom metrics every 5s |

> "Zero compute on the spacecraft. Full intelligence on Google Cloud. Infinitely upgradeable without a hardware mission."
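As a small illustration of the Cloud Monitoring row above, each 5-second affect sample could be shaped into `(metric_type, value)` pairs before being handed to the monitoring client. The `custom.googleapis.com/maitri/` prefix and helper name are assumptions for illustration; only the `affect_arousal` / `affect_valence` gauge names come from the table:

```python
def affect_gauge_points(arousal: float, valence: float) -> list:
    """Shape one 5-second affect sample as (metric_type, value) pairs,
    ready to be written as custom Cloud Monitoring time series."""
    prefix = "custom.googleapis.com/maitri/"  # assumed metric prefix
    return [
        (prefix + "affect_arousal", arousal),
        (prefix + "affect_valence", valence),
    ]
```

Separating the data shaping from the client call keeps this logic trivially unit-testable without any GCP credentials.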


## The Responsible AI Architecture

MAITRI is designed around a fundamental constraint: AI augments the psychologist. It never replaces them.

```mermaid
flowchart TD
    D["Distress Detected<br/>affect_score > threshold"]
    M["MAITRI responds<br/>Grounding technique<br/>Active listening"]
    A["ACTIVE_INTERVENTION<br/>triggered"]
    P["Ground Psychologist<br/>receives alert via<br/>Pub/Sub + Dashboard"]
    H["Human makes every<br/>clinical decision"]

    D --> M --> A --> P --> H

    style H fill:#4285f4,color:#fff
    style A fill:#ea4335,color:#fff
```

The system instruction contains a hard-coded constraint: when ACTIVE_INTERVENTION is triggered, MAITRI delivers one grounding sentence, then falls silent and waits. Ground control takes over. MAITRI is the first-responder signal. The human psychologist makes every clinical decision.
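The "one sentence, then silence" constraint can be enforced mechanically as well as in the system instruction. Here is a minimal sketch of such a gate; the class name and its API are illustrative, not the repository's actual implementation:

```python
class InterventionGate:
    """Enforces the hard constraint: at most one grounding sentence per
    ACTIVE_INTERVENTION episode, then silence until ground control
    resets the protocol."""

    def __init__(self) -> None:
        self._spoken = False

    def may_speak(self) -> bool:
        """True exactly once per episode; False afterwards."""
        if self._spoken:
            return False
        self._spoken = True
        return True

    def reset(self) -> None:
        """Called when ground control resets the session."""
        self._spoken = False
```

Encoding the constraint in code means a model that ignores its instruction still cannot keep talking over the human psychologist.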

This is not a limitation. This is intentional. This is how AI should work in high-stakes medical contexts.


## Automated Deployment

Every commit to main ships to production automatically.

```text
backend/cloudbuild.yaml          → Cloud Build triggers on push
backend/deploy.sh                → Manual deploy with single command
.github/workflows/
  firebase-hosting-merge.yml     → Dashboard deploys on merge to main
  firebase-hosting-pull-request.yml → Preview channel on every PR
```

No manual steps. No deployment risk on demo day. Ground dashboard and Cloud Run backend deploy independently, automatically, on every merge.

View cloudbuild.yaml — full IaC pipeline for Cloud Run deployment


## Live System

| Component | URL | Status |
|---|---|---|
| Ground Dashboard | maitri-astronaut.web.app | Live on Firebase Hosting |
| Backend API | Cloud Run — maitri-backend | Auto-deployed |
| Source Code | github.com/anil9973/maitri | Public |

## Reproducible Testing

Three testing paths — judges can verify MAITRI at any depth.


### Path 1 — Live Dashboard (30 seconds, no setup)

The ground control dashboard is live and connected to the production backend.

→ Open MAITRI Ground Control Dashboard

1. Open the dashboard
2. Use demo key when prompted: maitri-ground-2026
3. Dashboard shows live protocol state: BASELINE_MONITORING
4. SSE stream is active — affect chart updates in real time
   when an Android client is connected
5. Click "Reset Protocol" → enter PIN 2026 → confirm state resets

### Path 2 — Backend API (5 minutes, no Android required)

All REST endpoints are live on Cloud Run. No authentication is needed except for the ground dashboard routes (`X-Demo-Key` header).

```bash
BACKEND=https://maitri-196791383799.us-east4.run.app

# 1. Health check — confirms worker running + protocol state
curl $BACKEND/health

# 2. Generate LiveKit room token (unauthenticated — Android path)
curl -X POST $BACKEND/api/room-token \
  -H "Content-Type: application/json" \
  -d '{"identity": "judge-test-01"}'

# 3. Open SSE stream — observe real-time events
curl "$BACKEND/api/session-status?demo_key=maitri-ground-2026"
# Stream stays open — Ctrl+C to stop
# You will see: ": MAITRI SSE stream connected"
# Then heartbeat comments every 30 seconds

# 4. Reset protocol state (ground controller action)
curl -X POST $BACKEND/api/reset-session \
  -H "X-Demo-Key: maitri-ground-2026"
# Expected: {"status":"ok","previous_state":"BASELINE_MONITORING",...}
```
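For scripted verification rather than `curl`, the SSE stream's wire format (comment lines starting with `:`, `data:` lines terminated by a blank line) can be parsed with a few lines of Python. This is a generic SSE-parsing sketch over an iterable of decoded lines, not code from the repository:

```python
def parse_sse(lines):
    """Minimal Server-Sent Events parser. Yields ('comment', text) for
    heartbeat comments and ('data', payload) for complete data events
    (a data event ends at the first blank line)."""
    data_buf = []
    for line in lines:
        if line.startswith(":"):
            yield ("comment", line[1:].strip())
        elif line.startswith("data:"):
            data_buf.append(line[5:].strip())
        elif line == "" and data_buf:
            yield ("data", "\n".join(data_buf))
            data_buf = []
```

Feeding it the decoded lines of the `/api/session-status` response should surface the `: MAITRI SSE stream connected` comment first, followed by state-change events as they occur.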

### Path 3 — Full Stack Local (15 minutes, requires Android device)

#### Prerequisites

```bash
# Python 3.12+, uv package manager
pip install uv

# Google Cloud credentials
gcloud auth application-default login

# Environment file
cp backend/.env.example backend/.env
# Fill in: GOOGLE_API_KEY, LIVEKIT_URL, LIVEKIT_API_KEY,
#          LIVEKIT_API_SECRET, LIVEKIT_TOKEN
```

#### Backend

```bash
cd backend/
uv sync
uv run python scripts/seed_firestore.py   # seed initial Firestore state
uv run uvicorn api.main:app --host 0.0.0.0 --port 8080

# Confirm healthy:
curl localhost:8080/health
# → {"status":"ok","worker_state":"running","protocol_state":"BASELINE_MONITORING"}
```

#### Android App

1. Open project in Android Studio
2. In local.properties add:
   LIVEKIT_URL=ws://YOUR_LOCAL_IP:7880
   LIVEKIT_TOKEN=eyJ...  (generate with lk token create, see DEPLOY.md)
   BACKEND_URL=http://YOUR_LOCAL_IP:8080
3. Build and run on physical Android device (API 26+)
4. Grant microphone and camera permissions
5. Tap Connect — backend logs show "Participant joined"

#### Dashboard

```bash
cd dashboard/
echo "VITE_BACKEND_URL=http://localhost:8080" > .env.local
echo "VITE_DEMO_KEY=maitri-ground-2026" >> .env.local
npm install
npm run dev
# Open: http://localhost:5173
```

#### Trigger the happy path

1. Android connected → backend logs "Bridge active"
2. Speak into phone: "MAITRI, I'm feeling very anxious"
3. Watch dashboard: affect chart updates every 5 seconds
4. Express visible distress → state transitions ANOMALY_FLAGGED → ACTIVE_INTERVENTION
5. Dashboard shows critical alert popup
6. Reset: POST /api/reset-session or dashboard Reset button

### Verify Automated Deployment

```bash
# Trigger Cloud Build manually (requires GCP access)
cd backend/
gcloud builds submit --config cloudbuild.yaml

# Or run deploy script directly
bash deploy.sh

# Firebase Hosting deploys automatically on push to main
# Verify: github.com/anil9973/maitri/actions
```

### Expected Outputs at Each Step

| Test | Expected Output |
|---|---|
| `GET /health` | `{"status":"ok","worker_state":"running","protocol_state":"BASELINE_MONITORING"}` |
| `POST /api/room-token` | `{"token":"eyJ...","roomName":"maitri-mission-01","livekitUrl":"wss://..."}` |
| `GET /api/session-status` | SSE stream opens, `: MAITRI SSE stream connected` |
| Android connects | Backend log: `Participant joined`, `Bridge active` |
| Speak into mic | Backend log: `FIRST audio frame forwarded to ADK` |
| Dashboard open | Green BASELINE_MONITORING badge, affect chart visible |
| Reset endpoint | `{"status":"ok","previous_state":"...","new_state":"BASELINE_MONITORING"}` |
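The `/health` row above can be turned into an automated smoke check. Here is a sketch of a validator over the parsed JSON payload; the function is illustrative and only checks the fields the table documents:

```python
def check_health(payload: dict) -> list:
    """Return a list of problems with a /health response (empty = OK)."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status={payload.get('status')!r}, expected 'ok'")
    if payload.get("worker_state") != "running":
        problems.append(f"worker_state={payload.get('worker_state')!r}")
    if "protocol_state" not in payload:
        problems.append("missing protocol_state")
    return problems
```

Returning a list of problems rather than raising on the first one gives a judge the full picture of a broken deployment in a single run.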

Built for the Gemini Live Agent Challenge by Anil Kumar

Powered entirely by Google Cloud. Not a prototype. A mission-critical system — ready for the people who will take India to the stars.
