ARCHITECT — Live Spatial Intelligence

Point your camera at a room. Tell ARCHITECT how you want it to feel. Watch it redesign the space and find the furniture — in real-time.

Built for Google ADK + Gemini hackathon. Uses Gemini Live API for voice+vision, Imagen 3 for photorealistic redesigns, and ADK multi-agent architecture for room analysis and furniture shopping.

Architecture

                    ┌─────────────────────────────────────────┐
                    │           React Frontend                 │
                    │  Camera → WebSocket → Audio Playback     │
                    │  Transcript | Designs | Shopping tabs    │
                    └──────────────┬──────────────────────────┘
                                   │ WebSocket (PCM audio + JPEG frames + JSON)
                    ┌──────────────▼──────────────────────────┐
                    │         FastAPI Backend (Cloud Run)      │
                    │                                          │
                    │  ┌────────────────────────────────────┐  │
                    │  │         ADK Root Agent             │  │
                    │  │   Gemini 2.0 Flash Live API        │  │
                    │  │                                    │  │
                    │  │  Tools:                            │  │
                    │  │  • analyze_room      → Firestore   │  │
                    │  │  • generate_redesign → Imagen 3    │  │
                    │  │                        → GCS       │  │
                    │  │  • generate_color_palette          │  │
                    │  │  • search_furniture  → Gemini Flash│  │
                    │  │  • build_shopping_list             │  │
                    │  └────────────────────────────────────┘  │
                    └─────────────────────────────────────────┘
                              │              │
                    ┌─────────▼──┐    ┌──────▼──────┐
                    │ Firestore  │    │  Cloud      │
                    │ (sessions) │    │  Storage    │
                    └────────────┘    │  (images)   │
                                      └─────────────┘

How It Works

Connect — Click "Start Session." Camera and mic activate.
Scan — ARCHITECT sees your room through the camera and describes what it observes. It calls analyze_room to save the spatial data.
Design — Say a style: "Make it mid-century modern" or "Go Scandinavian." ARCHITECT calls generate_redesign (Imagen 3) and streams the photorealistic result to the Designs tab.
Shop — ARCHITECT calls search_furniture to find matching pieces with prices from real retailers. A complete shopping list appears in the Shopping tab.

Tech Stack

Layer	Technology
LLM Orchestration	Google ADK + Gemini 2.0 Flash Live API
Image Generation	Imagen 3 (`gemini-2.0-flash-exp-image-generation`)
Sub-agent LLM	Gemini 2.0 Flash
Backend	Python 3.12, FastAPI, uvicorn
Frontend	React 18, Vite, TailwindCSS, TypeScript
Audio	AudioWorklet API (PCM 16kHz capture → 24kHz playback)
Storage	Google Cloud Firestore + Cloud Storage
Deployment	Cloud Run + Cloud Build

Local Development

Prerequisites

Python 3.12+
Node.js 20+
Google API key with Gemini access

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env: add GOOGLE_API_KEY

python -m app.main
# → http://localhost:8080

Frontend

cd frontend
npm install
npm run dev
# → http://localhost:5173

Deploy to GCP

# One-time setup (creates buckets, Firestore, secrets)
./deploy/setup.sh

# Deploy
gcloud builds submit --config=deploy/cloudbuild.yaml .

Project Structure

architect/
├── backend/
│   ├── app/
│   │   ├── main.py          # FastAPI + WebSocket session
│   │   ├── config.py        # Env vars + model names
│   │   ├── models.py        # Pydantic data models
│   │   ├── agents/
│   │   │   └── root_agent.py    # ADK LlmAgent
│   │   ├── prompts/
│   │   │   └── root_system.py   # System prompt
│   │   ├── tools/
│   │   │   ├── spatial.py       # analyze_room
│   │   │   ├── design.py        # generate_redesign, color_palette
│   │   │   └── shopping.py      # search_furniture, shopping_list
│   │   └── services/
│   │       ├── firestore.py     # Session persistence
│   │       └── cloud_storage.py # Image uploads
│   ├── requirements.txt
│   └── Dockerfile
├── frontend/
│   ├── src/
│   │   ├── App.tsx              # Main app
│   │   ├── hooks/
│   │   │   ├── useWebSocket.ts
│   │   │   ├── useAudioCapture.ts
│   │   │   ├── useAudioPlayback.ts
│   │   │   └── useCameraCapture.ts
│   │   ├── components/
│   │   │   ├── CameraPreview.tsx
│   │   │   ├── TranscriptPanel.tsx
│   │   │   ├── DesignGallery.tsx
│   │   │   ├── ShoppingPanel.tsx
│   │   │   ├── StatusBar.tsx
│   │   │   └── RoomAnalysisCard.tsx
│   │   └── types/
│   │       └── events.ts        # TypeScript event types
│   └── public/
│       ├── pcm-capture-processor.js
│       └── pcm-playback-processor.js
└── deploy/
    ├── cloudbuild.yaml
    └── setup.sh

WebSocket Protocol

Client → Server:

Binary: [JSON header]\x00[PCM audio bytes] — 16kHz PCM audio chunks
JSON { type: "video_frame", data: "<base64 JPEG>" } — camera frames at 1fps
JSON { type: "text", text: "..." } — typed messages

Server → Client:

Binary: [JSON header]\x00[PCM audio bytes] — 24kHz PCM audio from agent
JSON events: connected, transcript, phase_change, room_analysis, design_generated, products_found, shopping_list, turn_complete, error

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
backend		backend
deploy		deploy
docs		docs
frontend		frontend
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ARCHITECT — Live Spatial Intelligence

Architecture

How It Works

Tech Stack

Local Development

Prerequisites

Backend

Frontend

Deploy to GCP

Project Structure

WebSocket Protocol

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ARCHITECT — Live Spatial Intelligence

Architecture

How It Works

Tech Stack

Local Development

Prerequisites

Backend

Frontend

Deploy to GCP

Project Structure

WebSocket Protocol

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages