An autonomous AI tour guide for cultural and heritage sites — point your phone at any landmark and get a real-time, persona-driven narration with conversational follow-ups.
A tourist photographs the Brandenburg Gate. The system extracts GPS from the photo's EXIF metadata, classifies the scene with CLIP, finds the nearest landmark in its knowledge base, retrieves contextual facts via RAG, and streams a narration through GPT-4o-mini — all in one API call. The user can then ask follow-up questions in natural language; the agent detects intent, queries live web results when needed, and responds in character.
Core loop:

```
Photo → EXIF GPS → CLIP scene → nearest landmark → RAG context → GPT narration (streamed)
                                                                        ↓
                                            follow-up chat with intent detection + web search
```
```
┌─────────────────────────────────────────────────────────────────┐
│                          Streamlit UI                            │
│    Upload photo ──► SSE stream narration ──► Chat follow-ups     │
└────────────────────────────┬────────────────────────────────────┘
                             │ HTTP
┌────────────────────────────▼────────────────────────────────────┐
│                       FastAPI (port 8001)                        │
│                                                                  │
│   POST /agent/stream            POST /agent/followup             │
│         │                            │                           │
│  ┌──────▼──────────────────┐   ┌─────▼──────────────────────┐   │
│  │ LangGraph Pipeline      │   │ Intent Detector            │   │
│  │ vision → context →      │   │ keyword + GPT-4o-mini      │   │
│  │ narration (streamed)    │   │ routing to tools           │   │
│  └──────┬──────────────────┘   └─────┬──────────────────────┘   │
│         │                            │                           │
│  ┌──────▼────────┐   ┌───────────────▼──────────────────────┐   │
│  │ CLIP ViT-B/32 │   │ Tools                                │   │
│  │ scene class.  │   │ • RAG retriever (pgvector)           │   │
│  └───────────────┘   │ • Foursquare reverse geocoder        │   │
│                      │ • DuckDuckGo web search (live)       │   │
│  ┌──────────────┐    │ • Nearby places (DB)                 │   │
│  │ PostgreSQL   │    └──────────────────────────────────────┘   │
│  │ landmarks    │                                                │
│  │ pgvector     │    Session memory (DB-backed)                  │
│  │ conv. turns  │    Location context injected per request       │
│  └──────────────┘                                                │
└─────────────────────────────────────────────────────────────────┘
```
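The vision → context → narration flow can be sketched as three functions over a shared state dict. This is a plain-Python illustration of the node ordering, not the project's actual `GraphState` (field names and stub values here are made up); in the real pipeline LangGraph wires the same three steps as `StateGraph` nodes.

```python
from typing import TypedDict


class State(TypedDict, total=False):
    image_b64: str
    scene: str
    landmark: str
    context: str
    narration: str


def vision_node(state: State) -> State:
    # Real node: CLIP zero-shot classification of the uploaded photo.
    state["scene"] = "monument"
    return state


def context_node(state: State) -> State:
    # Real node: haversine nearest-landmark lookup + pgvector RAG retrieval.
    state["landmark"] = "Brandenburg Gate"
    state["context"] = "Completed in 1791 ..."
    return state


def narration_node(state: State) -> State:
    # Real node: GPT-4o-mini streams tokens; here we just build a string.
    state["narration"] = f"Standing before the {state['landmark']}..."
    return state


def run_pipeline(state: State) -> State:
    # LangGraph connects these with edges in exactly this order.
    for node in (vision_node, context_node, narration_node):
        state = node(state)
    return state
```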
| Layer | Technology |
|---|---|
| Language | Python 3.11 |
| Vision | OpenAI CLIP (ViT-B/32) via PyTorch — zero-shot scene classification |
| Orchestration | LangGraph StateGraph — vision → context → narration nodes |
| LLM | GPT-4o-mini via LangChain (langchain-openai) |
| RAG | pgvector + sentence-transformers (all-MiniLM-L6-v2) |
| Geolocation | EXIF GPS extraction + Foursquare Places API v3 |
| Web search | DuckDuckGo (duckduckgo-search) — live results, no API key |
| API | FastAPI + Uvicorn, SSE via StreamingResponse |
| Database | PostgreSQL 16 + pgvector extension |
| ORM | SQLAlchemy 2.0 with Mapped / mapped_column |
| Session memory | DB-backed (UserSession + ConversationTurn models) |
| Containerisation | Docker multi-stage build + docker-compose |
| CI | GitHub Actions — ruff lint + 84 pytest tests on every push |
| Demo UI | Streamlit with live SSE token streaming |
| Testing | pytest, SQLite StaticPool, all CLIP calls mocked |
| Linting | Ruff |
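Zero-shot classification with CLIP ranks candidate scene labels by cosine similarity between the image embedding and each label's text embedding, scaled by CLIP's learned logit scale (around 100) and softmaxed into the confidence scores shown in the SSE stream. A stdlib sketch of that scoring step, with toy 3-d vectors standing in for CLIP's real embeddings:

```python
import math


def zero_shot_scores(image_emb, label_embs, logit_scale=100.0):
    """Softmax over logit-scaled cosine similarities, as CLIP does at inference."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    logits = [logit_scale * cos(image_emb, e) for e in label_embs.values()]
    peak = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(label_embs, exps)}


# Toy embeddings standing in for CLIP's 512-d vectors.
probs = zero_shot_scores(
    image_emb=[0.9, 0.1, 0.1],
    label_embs={
        "monument": [1.0, 0.0, 0.0],
        "museum": [0.0, 1.0, 0.0],
        "park": [0.0, 0.0, 1.0],
    },
)
```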
```bash
# 1. Clone
git clone https://github.com/Rithub14/spatial-context-agent.git
cd spatial-context-agent

# 2. Copy env and fill in your keys
cp .env.example .env
#    Required: OPENAI_API_KEY
#    Optional: FOURSQUARE_API_KEY (falls back to DB-only landmarks without it)

# 3. Start API + PostgreSQL (pgvector image)
docker-compose up --build -d

# 4. Seed 18 Berlin landmarks
docker-compose exec api python -m src.db.seed

# 5. (Optional) Build the RAG knowledge base from Wikipedia
docker-compose exec api python -m src.db.build_knowledge_base

# 6. Launch Streamlit
pip install streamlit httpx pandas Pillow
streamlit run streamlit_app/app.py
```

Or run the API locally without Docker:

```bash
uv venv && uv pip install -r requirements.txt
uvicorn src.api.main:app --port 8001 --reload
```

### POST /agent/stream

Runs the full LangGraph pipeline and streams narration token-by-token.
Request
```json
{
  "image": "<base64 JPEG/PNG>",
  "latitude": 52.5163,
  "longitude": 13.3777,
  "persona": "historian",
  "session_id": null
}
```

`latitude` / `longitude` are optional if the image contains GPS EXIF metadata.
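When coordinates are omitted, the server falls back to the photo's EXIF GPS tags, which store latitude and longitude as degrees/minutes/seconds rationals. The conversion to signed decimal degrees looks roughly like this (the helper name is ours, not the project's; Pillow's `IFDRational` values are accepted by `Fraction`, which is the "IFDRational-safe" part):

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert EXIF (degrees, minutes, seconds) rationals to signed decimal degrees."""
    deg, minutes, seconds = (
        float(Fraction(*v)) if isinstance(v, tuple) else float(v) for v in dms
    )
    decimal = deg + minutes / 60 + seconds / 3600
    # South and West hemispheres are negative.
    return -decimal if ref in ("S", "W") else decimal


# Brandenburg Gate: 52°30'58.7"N, 13°22'39.7"E
lat = dms_to_decimal(((52, 1), (30, 1), (587, 10)), "N")
lon = dms_to_decimal(((13, 1), (22, 1), (397, 10)), "E")
```

These round to the 52.5163 / 13.3777 shown in the request example above.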
persona options: historian · storyteller · local · child_friendly
SSE event stream
data: {"type": "step", "content": "👁️ Scene identified: monument (87% confidence)"}
data: {"type": "step", "content": "📍 Landmark found: Brandenburg Gate (42m away)"}
data: {"type": "step", "content": "📚 Retrieved knowledge context (1842 chars)"}
data: {"type": "token", "content": "Standing before"}
data: {"type": "token", "content": " the iconic"}
...
data: {"type": "done", "session_id": "...", "scene": {...}, "location": {...}, "metadata": {...}}
### POST /agent/followup

Request

```json
{
  "session_id": "abc123",
  "question": "any exhibitions happening nearby?",
  "persona": "historian"
}
```

Response
```json
{
  "session_id": "abc123",
  "answer": "As of March 2026, the Pergamon Museum is...",
  "intent": "current_events",
  "intent_confidence": 1.0
}
```

Detected intents:
| Intent | Triggered by | Action |
|---|---|---|
| `nearby_places` | "what else is nearby?" | DB proximity search |
| `historical_facts` | "when was it built?" | RAG knowledge retrieval |
| `current_events` | "any exhibitions?" | Live DuckDuckGo web search |
| `opening_hours` | "when does it open?" | LLM with advisory note |
| `directions` | "how do I get there?" | LLM using last known GPS |
| `translation` | "say that in German" | LLM |
| `photo_tip` | "best angle for a photo?" | LLM |
| `moved` | "I've moved" | Prompt to upload new photo |
| `general` | anything else | LLM |
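The keyword fallback behind the LLM classifier can be as simple as a first-match lookup over phrase lists. The phrases below are illustrative, not the project's actual keyword sets; since first match wins, ambiguous questions like "any exhibitions happening nearby?" are better left to the GPT-4o-mini path, which is why keywords are only the fallback:

```python
# Illustrative keyword sets; the real detector's lists differ.
KEYWORDS = {
    "nearby_places": ("nearby", "around here", "close by"),
    "historical_facts": ("history", "built", "who made"),
    "current_events": ("exhibition", "event", "happening"),
    "opening_hours": ("open", "hours"),
    "directions": ("how do i get", "directions", "how far"),
    "translation": ("translate", "in german"),
    "photo_tip": ("photo", "angle", "picture"),
    "moved": ("i've moved", "i moved", "new location"),
}


def detect_intent(question: str) -> str:
    """Return the first intent whose phrase list matches; 'general' otherwise."""
    q = question.lower()
    for intent, phrases in KEYWORDS.items():
        if any(p in q for p in phrases):
            return intent
    return "general"
```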
The user's GPS coordinates from their last uploaded photo are injected into every follow-up so distance questions ("how far is X?") are answered relative to their actual location.
Today's date is always injected so the LLM cannot hallucinate stale event information.
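Distance answers rest on the same haversine formula used for the nearest-landmark lookup; a stdlib version (the project's implementation may differ in detail):

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Brandenburg Gate → Reichstag building, roughly 275 m apart
d = haversine_m(52.5163, 13.3777, 52.5186, 13.3762)
```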
Same pipeline as /agent/stream but returns a single JSON response. Useful for testing or non-streaming clients.
Paginated list of all landmarks in the database.
Add a landmark (requires X-API-Key header when ENABLE_AUTH=true).
{"status": "ok", "model_loaded": true, "db_connected": true, "uptime_seconds": 142.7}spatial-context-agent/
├── src/
│ ├── api/
│ │ ├── main.py # FastAPI app, lifespan, middleware wiring
│ │ ├── routes/
│ │ │ ├── agent.py # All agent endpoints + intent routing helpers
│ │ │ └── health.py # GET /health
│ │ ├── middleware/
│ │ │ ├── auth.py # API key validation (toggleable)
│ │ │ └── rate_limiter.py # Per-IP sliding window (toggleable)
│ │ └── schemas/
│ │ ├── request.py # Pydantic request models
│ │ └── response.py # Pydantic response models
│ ├── agent/
│ │ ├── graph.py # LangGraph StateGraph (vision→context→narration)
│ │ ├── tools.py # LangChain @tool wrappers for pipeline components
│ │ └── memory.py # DB-backed session memory (UserSession, ConversationTurn)
│ ├── pipeline/
│ │ ├── clip_inference.py # CLIP model loading + logit-scaled inference
│ │ ├── scene_classifier.py # Zero-shot classification (12 categories)
│ │ ├── location_extractor.py # EXIF GPS extraction (IFDRational-safe)
│ │ ├── context_retriever.py # Haversine nearest-landmark lookup
│ │ ├── narration_engine.py # GPT-4o-mini narration + template fallback
│ │ ├── embedder.py # sentence-transformers singleton
│ │ ├── rag_retriever.py # pgvector cosine similarity retrieval
│ │ └── intent_detector.py # Intent classification (LLM + keyword fallback)
│ ├── db/
│ │ ├── models.py # Landmark, InferenceLog, LandmarkChunk,
│ │ │ # UserSession, ConversationTurn
│ │ ├── seed.py # 18 Berlin landmarks with GPS + narration templates
│ │ ├── build_knowledge_base.py # Embeds DB text + Wikipedia into pgvector
│ │ └── session.py # Engine, SessionLocal, get_db
│ └── config.py # pydantic-settings (all config from env vars)
├── tests/ # 84 tests — CLIP always mocked, SQLite StaticPool
├── streamlit_app/
│ └── app.py # Streamlit demo: SSE streaming + chat interface
├── scripts/
│ └── smoke_test.py # Manual health + analyze + locations check
├── docker/
│ └── init-pgvector.sql # CREATE EXTENSION vector (runs on DB init)
├── .github/workflows/
│ └── ci-cd.yml # CI: ruff + pytest on every push/PR
├── Dockerfile # Multi-stage: builder (gcc, git) + runtime
├── docker-compose.yml # pgvector/pgvector:pg16 + FastAPI
├── requirements.txt
└── .env.example
```bash
# Run all 84 tests
PYTHONPATH=. pytest tests/ -v

# With coverage
PYTHONPATH=. pytest --cov=src --cov-report=term-missing tests/

# Smoke test against a running instance
python scripts/smoke_test.py --url http://localhost:8001
```

CLIP is always mocked (it is slow to load). The test database uses SQLite in-memory with StaticPool so all connections share one instance; pgvector tests are skipped in SQLite mode.
```bash
# Database
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/spatial_agent

# Model
CLIP_MODEL_NAME=ViT-B/32
DEVICE=cpu

# API
API_HOST=0.0.0.0
API_PORT=8000

# Security (toggleable)
ENABLE_AUTH=false
ENABLE_RATE_LIMIT=false
API_KEY=dev-key-change-in-production
RATE_LIMIT_RPM=60

# Agentic system (required for LLM narration, intent detection, web search)
OPENAI_API_KEY=sk-...

# Foursquare (optional — enables worldwide reverse geocoding)
FOURSQUARE_API_KEY=...
```

MIT