Edge-computed FastAPI backend + Next.js frontend for Urban
Subsurface AI: it turns dense USGS NYC geological PDFs into a 3D
subsurface model (.glb + depth-to-bedrock scalar field) using a
Groq-hosted Llama LLM (or local Ollama) and scipy / gempy
modeling.
One-click cloud deploy: click the button above, then see
DEPLOYMENT.md— both the API (geo-nyc-api) and the Next.js frontend (geo-nyc-web) ship from the same Render Blueprint. The only secret you have to paste is yourGEO_NYC_GROQ_API_KEY.
This README is the operational guide for the backend repo (Part 2 of
the workstream split). The full project blueprint, frontend strategy,
and team coordination notes live in Project Blueprint
at the bottom.
Part 3 (GIS / Optimizer) runbook: see
PART3_MEMBER_C.mdfor the data-ingest, Kriging, and/api/optimizeimplementation owned by the Part 3 workstream.
- Architecture in 30 seconds
- Quick start
- Running the demo
- Deploying with the Vercel frontend
- API contracts
- Static asset URLs
- Field grid schema
- Environment variables
- Manual smoke checklist
- Tests, lint, and CI
- Repository layout
- Troubleshooting
- Project blueprint
- License
┌──────────────────────────────────────────┐
│ FastAPI (api/) │
PDF upload ─────────▶ │ /api/documents/* /api/run /api/runs │
│ /api/health /api/llm/health │
│ /static/exports/* /static/fields/* │
└────────┬───────────────────────┬─────────┘
│ │
┌───────────────▼─────────┐ ┌─────────▼──────────┐
│ geo_nyc.documents │ │ geo_nyc.runs │
│ • PyMuPDF text extract │ │ • RunService │
│ • content-hash IDs │ │ • Ranked chunks │
│ • per-page JSON store │ │ • LLM extraction │
└─────────────┬───────────┘ │ • DSL build │
│ │ • GemPy constraints │
▼ │ • Mesh + field │
┌─────────────────────────┐ │ • Manifest writer │
│ geo_nyc.ai (Ollama) │ └──────────┬──────────┘
│ • httpx, JSON mode │ │
│ • repair loop │ │
└─────────────────────────┘ │
▼
┌────────────────────────────────────────┐
│ geo_nyc.modeling │
│ • RBFRunner, GemPyRunner, Synthetic │
│ • field_builder (RBF / mesh resample) │
│ • mesh_export (.glb), field_export │
└────────────────────────────────────────┘
Every run writes a self-describing manifest.json plus mesh, field,
DSL, extraction, and validation artifacts under
data/runs/{run_id}/. The frontend fetches them by absolute URL,
stamped from GEO_NYC_PUBLIC_BASE_URL.
- macOS / Linux (M-series Mac is the reference platform).
- Python 3.12 (3.13 is not supported —
pyproject.tomlpins>=3.12,<3.13because of GemPy / scientific wheels). - Ollama running locally on
http://localhost:11434. - (Optional) GemPy for full implicit modeling. The default
RBFRunneris the always-available fallback, so you can ship a demo without GemPy. - (Optional, for live demo with Vercel) ngrok or cloudflared to expose the laptop API to the internet.
# Homebrew (recommended on macOS)
brew install python@3.12 ollamaStart the Ollama daemon (in its own shell or as a background service):
ollama servePull the demo model once:
ollama pull llama3.1:8bSanity-check Ollama is up:
curl -s http://localhost:11434/api/tags | jq .git clone https://github.com/7dracoder/geo-nyc.git
cd geo-nyc
python3.12 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e ".[dev]"To enable the real GemPy modeling path (optional), also install:
pip install -e ".[modeling]"RBFRunner is the default mesh engine and works without GemPy. The
service detects whichever runners are importable and records the
chosen one in the run manifest.
Copy the template and edit as needed:
cp .env.example .envThe defaults work for local development out of the box:
GEO_NYC_OLLAMA_BASE_URL=http://localhost:11434
GEO_NYC_OLLAMA_MODEL=llama3.1:8b
GEO_NYC_USE_FIXTURES=true
GEO_NYC_API_HOST=127.0.0.1
GEO_NYC_API_PORT=8000
GEO_NYC_PUBLIC_BASE_URL=http://localhost:8000See Environment variables for the full list, and Deploying with the Vercel frontend for the values to change when running behind a tunnel.
Two equivalent ways:
# Console script (installed by pyproject.toml)
geo-nyc
# Or, classically:
uvicorn api.main:app --reload --host 127.0.0.1 --port 8000OpenAPI / Swagger UI: http://localhost:8000/docs.
curl -s http://localhost:8000/api/health | jq .
curl -s http://localhost:8000/api/llm/health | jq .Both should return "status": "ok". If llm/health says "down",
Ollama is not reachable.
This is what /api/run does by default — no PDF, no LLM, no GemPy.
It exists so the demo always finishes, and so Part 1 (frontend) and
Part 3 (GIS/optimizer) can integrate against a stable artifact set.
curl -s -X POST http://localhost:8000/api/run \
-H "content-type: application/json" \
-d '{}' | jq '.run_id, .artifacts | length, .mesh_summary, .field_summary'You should see something like:
"r_20260426181022_a1b2c3d4"
6
{ "engine": "rbf", "fallback_from": [], "vertices": 4225, "faces": 8192 }
{ "engine": "rbf", "fallback_from": [], "resolution_m": 50, "stats": { ... } }Open the generated .glb directly in a browser (or in
gltf.report) using the mesh_url field.
Upload a PDF, extract its text, then run the full pipeline with the LLM enabled:
# 1. Upload (returns a content-hash document_id)
DOC_ID=$(curl -s -X POST http://localhost:8000/api/documents/upload \
-F "file=@./planning/sample-usgs-i2306.pdf" | jq -r .id)
# 2. Run text extraction (PyMuPDF)
curl -s -X POST http://localhost:8000/api/documents/$DOC_ID/extract | jq '.pages_with_text'
# 3. Trigger a full run with the LLM enabled
curl -s -X POST http://localhost:8000/api/run \
-H "content-type: application/json" \
-d "{\"document_id\":\"$DOC_ID\",\"use_llm\":true}" | jq '.run_id, .llm_summary, .dsl_summary'If the LLM produces invalid JSON or DSL, the service automatically
runs the repair loop up to GEO_NYC_LLM_MAX_REPAIR_ATTEMPTS times
before falling back to the fixture path. Either way the response
shape is identical.
When the LLM-derived DSL parses + validates, the service uses that
DSL for the rest of the pipeline (mesh + GemPy inputs); the fixture
is reduced to extent + horizon scaffolding. The run manifest records
this as mode: "document_llm_dsl". If the LLM step fails, the run
gracefully falls back to the fixture DSL and the manifest reads
mode: "document_chunks+fixture" (or "document_llm+fixture" if the
extraction succeeded but the DSL build did not).
Three reference PDFs drive Part 2 (§3 of the blueprint)
— USGS I-2306, the Walloomsac (Lower Manhattan), and the Inwood Marble
abstract. They are declared in
geonyc-data/genyc_data/source_pdfs/sources.json
with download URLs, but the binaries are intentionally not checked
in.
To download all three, ingest them, and create one PDF-driven /api/run
per PDF:
make seed-runs
# or
python -m geo_nyc.runs.bootstrap # all 3, with Ollama
python -m geo_nyc.runs.bootstrap --no-llm # all 3, chunks-only (no Ollama needed)
python -m geo_nyc.runs.bootstrap --only usgs_i2306The script is idempotent (PDFs are cached on disk) and prints a JSON summary of what landed:
{
"results": [
{ "id": "usgs_i2306", "document_id": "d_abc...", "run_id": "r_...", "mode": "document_llm_dsl", "status": "succeeded" },
{ "id": "cjm_2010", "document_id": "d_def...", "run_id": "r_...", "mode": "document_llm_dsl", "status": "succeeded" },
{ "id": "walloomsac_2010_primary", "document_id": "d_ghi...", "run_id": "r_...", "mode": "document_llm_dsl", "status": "succeeded" }
]
}Once one of these runs lands, the deployed Vercel frontend's 3D dock automatically prefers it over fixture-mode runs (see Repository layout § frontend GLB resolution).
The frontend is a separate Next.js app deployed at geo-nyc.vercel.app. Because the backend runs on your laptop (not in the cloud), you have to make two things true:
- Vercel can reach the laptop API → expose port 8000 via a tunnel.
- Run-manifest URLs are publicly fetchable → tell the backend what its public URL is.
Pick one. You only need one running.
| Tool | Command | Notes |
|---|---|---|
| ngrok | ngrok http 8000 |
https://<random>.ngrok-free.app or .ngrok-free.dev. URL rotates on restart on the free plan. |
| Cloudflare Tunnel | brew install cloudflared && cloudflared tunnel --url http://localhost:8000 |
Free, more stable URLs with a named tunnel. |
| Tailscale Funnel | tailscale funnel 8000 |
Stable *.ts.net URL if you already use Tailscale. |
Note the resulting https://... URL.
In geo-nyc/.env:
GEO_NYC_API_HOST=0.0.0.0
GEO_NYC_API_PORT=8000
GEO_NYC_PUBLIC_BASE_URL=https://<your-tunnel-host>GEO_NYC_PUBLIC_BASE_URL is what the run service stamps into every
mesh_url / field_url it writes into a manifest, so the frontend
can fetch them as absolute URLs from any origin.
Restart the backend after editing .env.
Settings.cors_origins ships with https://geo-nyc.vercel.app
included by default, and Settings.cors_origin_regex accepts every
Vercel preview URL of the form geo-nyc-*.vercel.app, so feature
branch deployments work without redeploying the backend.
To allow extra origins (e.g. another teammate's preview):
GEO_NYC_CORS_ORIGINS=http://localhost:3000,http://localhost:5173,https://geo-nyc.vercel.app,https://other.example.com
GEO_NYC_CORS_ORIGIN_REGEX=^https://geo-nyc(-[a-z0-9-]+)?\.vercel\.app$In the Vercel dashboard for geo-nyc → Settings → Environment
Variables, set:
NEXT_PUBLIC_API_BASE_URL = https://<your-tunnel-host>
then trigger a redeploy.
From any machine not on your laptop's wifi:
curl https://<your-tunnel-host>/api/health
curl -X POST https://<your-tunnel-host>/api/run -H "content-type: application/json" -d '{}'The second call returns a manifest whose mesh_url and field_url
are absolute and openable in a browser. If both work, the live demo
pipe is clean.
All endpoints are under /api/*. Static artifacts live under
/static/exports/* and /static/fields/*.
GET /api/health{ "status": "ok", "version": "0.1.0", "use_fixtures": true, "enable_gempy": false }GET /api/llm/health{
"status": "ok",
"provider": "ollama",
"base_url": "http://localhost:11434",
"model": "llama3.1:8b",
"model_pulled": true,
"available_models": ["llama3.1:8b"],
"detail": null
}| Method & path | Purpose | Notes |
|---|---|---|
POST /api/documents/upload (multipart file) |
Upload a PDF; returns DocumentRecord |
Idempotent on SHA256 of the bytes. |
POST /api/documents/{id}/extract |
Run PyMuPDF text extraction; returns ExtractionResult |
Cached per document id. |
GET /api/documents |
List DocumentSummary records |
?limit= (default 100). |
GET /api/documents/{id} |
Full DocumentRecord |
404 if unknown. |
GET /api/documents/{id}/extraction |
Cached ExtractionResult |
404 if not extracted yet. |
POST /api/run
Content-Type: application/json
{
"document_id": null, // null → fixture mode
"use_fixtures": null, // override env default for this run
"fixture_name": null, // default: "nyc_demo"
"use_llm": false, // requires document_id when true
"top_k_chunks": null // 1..32, optional
}Response: a full RunManifest (also written to
data/runs/{run_id}/manifest.json):
{
"run_id": "r_20260426181022_a1b2c3d4",
"status": "succeeded",
"created_at": "2026-04-26T18:10:22Z",
"updated_at": "2026-04-26T18:10:23Z",
"mode": "fixture",
"request": { "use_llm": false },
"artifacts": [
{
"kind": "mesh",
"filename": "model.glb",
"relative_path": "r_20260426181022_a1b2c3d4/model.glb",
"url": "http://localhost:8000/static/exports/r_20260426181022_a1b2c3d4/model.glb",
"bytes": 87340,
"media_type": "model/gltf-binary",
"metadata": { "engine": "rbf", "vertices": 4225, "faces": 8192 }
},
{
"kind": "field",
"filename": "depth_to_bedrock.npz",
"relative_path": "r_20260426181022_a1b2c3d4/depth_to_bedrock.npz",
"url": "http://localhost:8000/static/fields/r_20260426181022_a1b2c3d4/depth_to_bedrock.npz",
"bytes": 16512,
"metadata": { "engine": "rbf", "schema_version": 2 }
}
// ...field_meta, dsl, extraction, validation_report
],
"validation": { "is_valid": true, "error_count": 0, "warning_count": 0, "errors": [], "warnings": [] },
"extent": { "x_min": 0, "x_max": 1000, "y_min": 0, "y_max": 1000, "z_min": -200, "z_max": 0 },
"mesh_summary": { "engine": "rbf", "fallback_from": [], "duration_ms": 142 },
"field_summary": { "engine": "rbf", "fallback_from": [], "resolution_m": 50 }
}| Method & path | Purpose |
|---|---|
GET /api/run/{run_id} |
Re-fetch a manifest by id. |
GET /api/runs?limit=50 |
List recent manifests, newest first. |
Status codes:
201 Createdfor a successful new run.404 Not Foundifdocument_idorrun_idis unknown.422 Unprocessable Entityif the run pipeline fails validation (e.g. invalid DSL even after the repair loop).
The FastAPI app mounts two static dirs:
| Mount | Local dir | Used for |
|---|---|---|
/static/exports |
data/exports/ |
Per-run .glb mesh files (<run_id>/model.glb). |
/static/fields |
data/fields/ |
Per-run depth_to_bedrock.npz + .json sidecar (<run_id>/depth_to_bedrock.npz). |
Each Artifact.url in a manifest is built from
GEO_NYC_PUBLIC_BASE_URL, so:
- Local dev →
http://localhost:8000/static/exports/r_<timestamp>_<hex8>/model.glb - Vercel demo →
https://<your-tunnel>/static/exports/r_<timestamp>_<hex8>/model.glb
Run ids are sortable by creation time (r_YYYYMMDDhhmmss_<hex8>), so
sorting filenames alphabetically gives newest-last.
The frontend should always use Artifact.url directly rather than
joining paths itself.
Per the team-wide contract in §10.4 of the blueprint, every run
writes a depth_to_bedrock.npz plus a JSON sidecar.
depth_to_bedrock.npz (NumPy v1 archive):
| Key | Shape | dtype | Meaning |
|---|---|---|---|
grid |
(ny, nx) |
float32 |
Depth from ground to bedrock, meters below surface. |
x |
(nx,) |
float32 |
Easting coordinates of grid columns (projected CRS). |
y |
(ny,) |
float32 |
Northing coordinates of grid rows (projected CRS). |
mask |
(ny, nx) |
uint8 |
Optional — 1 for valid cells, 0 for outside-AOI / nodata. |
depth_to_bedrock.json (sidecar):
{
"schema_version": 2,
"name": "depth_to_bedrock_m",
"units": "meters_below_surface",
"source": "rbf",
"run_id": "r_20260426181022_a1b2c3d4",
"crs": "EPSG:32618",
"projected_crs": "EPSG:32618",
"geographic_crs": "EPSG:4326",
"bbox": [-74.05, 40.66, -73.92, 40.81],
"bbox_xy_m": { "x_min": 583000.0, "x_max": 595000.0, "y_min": 4505000.0, "y_max": 4521000.0 },
"extent": { "x_min": 583000.0, "x_max": 595000.0, "y_min": 4505000.0, "y_max": 4521000.0, "z_min": -250.0, "z_max": 50.0 },
"resolution_m": 50,
"nx": 64, "ny": 64,
"shape": [64, 64],
"dtype": "float32",
"has_mask": false,
"stats": { "min": 0.5, "max": 187.3, "mean": 42.8, "valid_cells": 4096 }
}source is always one of "gempy" | "rbf" | "synthetic" | "stub",
recording the actual fallback that produced the field. The 3-tier
chain in priority order is:
- RBF interpolation from
GemPyInputs.surface_points(preferred — the LLM's evidence drives the surface). - Mesh resample from the bedrock layer of the produced
.glb. - Stub — a deterministic smooth field. Always succeeds.
All variables are prefixed GEO_NYC_ and loaded from .env by
pydantic-settings.
| Variable | Default | Purpose |
|---|---|---|
GEO_NYC_LLM_PROVIDER |
ollama |
Only ollama is supported. Cloud providers are intentionally absent. |
GEO_NYC_OLLAMA_BASE_URL |
http://localhost:11434 |
Ollama HTTP base URL. |
GEO_NYC_OLLAMA_MODEL |
llama3.1:8b |
Model used for extraction. Pull with ollama pull. |
GEO_NYC_OLLAMA_FAST_MODEL |
unset | Optional override for relevance / repair calls. |
GEO_NYC_LLM_TEMPERATURE |
0.2 |
Sampling temperature. |
GEO_NYC_LLM_MAX_TOKENS |
4096 |
Max tokens per LLM call. |
GEO_NYC_LLM_TIMEOUT_SECONDS |
120 |
Per-call timeout. |
GEO_NYC_LLM_MAX_REPAIR_ATTEMPTS |
2 |
Repair loop iterations on invalid JSON / DSL. |
GEO_NYC_API_HOST |
127.0.0.1 |
Bind address. Use 0.0.0.0 when tunneling. |
GEO_NYC_API_PORT |
8000 |
API port. |
GEO_NYC_DEBUG |
false |
Reload + verbose logs. |
GEO_NYC_CORS_ORIGINS |
http://localhost:3000,http://localhost:5173,https://geo-nyc.vercel.app |
Comma-separated allowed origins. |
GEO_NYC_CORS_ORIGIN_REGEX |
^https://geo-nyc(-[a-z0-9-]+)?\.vercel\.app$ |
Regex matched against Origin for Vercel previews. |
GEO_NYC_PUBLIC_BASE_URL |
http://localhost:8000 |
Used to build absolute artifact URLs in run manifests. |
GEO_NYC_DATA_DIR |
./data |
Root for all on-disk artifacts. |
GEO_NYC_DOCUMENTS_RAW_DIR |
./data/documents/raw |
Uploaded PDFs. |
GEO_NYC_DOCUMENTS_EXTRACTED_DIR |
./data/documents/extracted |
Cached PyMuPDF extractions. |
GEO_NYC_RUNS_DIR |
./data/runs |
Per-run manifests + intermediate artifacts. |
GEO_NYC_EXPORTS_DIR |
./data/exports |
Per-run .glb mesh files (/static/exports/<run_id>/). |
GEO_NYC_FIELDS_DIR |
./data/fields |
Per-run depth_to_bedrock.npz + sidecar (/static/fields/<run_id>/). |
GEO_NYC_CACHE_DIR |
./data/cache |
Misc cache (relevance scores, etc.). |
GEO_NYC_USE_FIXTURES |
true |
When true, /api/run defaults to fixture-mode artifacts. |
GEO_NYC_ENABLE_GEMPY |
false |
When true, GemPyRunner is preferred over RBFRunner. |
GEO_NYC_LOG_LEVEL |
INFO |
One of DEBUG, INFO, WARNING, ERROR. |
Trailing slashes on OLLAMA_BASE_URL and PUBLIC_BASE_URL are
stripped automatically.
Run this before a live demo or after a clean pull. (Phase 12.3.)
-
ollama serveis running andollama listshowsllama3.1:8b. -
curl http://localhost:11434/api/tagsreturns the model list. -
.venvis active andpython -c "import geo_nyc; print(geo_nyc.__version__)"works. -
pytest -qpasses (currently 192 passed, 1 skipped). -
geo-nyc(oruvicorn api.main:app) starts cleanly. -
curl http://localhost:8000/api/health→"ok". -
curl http://localhost:8000/api/llm/health→"ok". -
POST /api/runwith{}returns a manifest in <2s. -
POST /api/documents/uploadwith a real USGS PDF returns a content-hash id. -
POST /api/documents/{id}/extractreturnspages_with_text > 0. -
POST /api/runwith{ "document_id": "...", "use_llm": true }returns a manifest withllm_summarynon-null. -
mesh_urlfrom the manifest opens in a browser /gltf.report. - When tunneling: same
mesh_urlopens from a phone on cellular.
# Full suite (unit + integration)
pytest -q
# Lint
ruff check geo_nyc api tests
# Type check (mypy is configured in pyproject.toml)
mypy geo_nyc apiTest guidelines:
- LLM tests use
app.dependency_overrides[get_run_service]to inject stubbed services, so they never hit the network. tests/test_gempy_runner.pyis auto-skipped whengempyis not installed.- Integration tests for the field fallback chain live in
tests/test_run_field_fallback.py.
geo-nyc/
├── api/ # FastAPI app + routers
│ ├── main.py # create_app(), CORS, static mounts, lifespan
│ ├── schemas.py # Public API Pydantic models
│ └── routers/ # /health, /documents, /run(s)
│
├── geo_nyc/ # Core package
│ ├── ai/ # Ollama HTTP client + repair-aware extractor
│ ├── config.py # Pydantic settings (env-driven)
│ ├── documents/ # PDF upload, PyMuPDF extraction, content-hash IDs
│ ├── extraction/ # Chunking + relevance ranking
│ ├── prompts/ # nyc_geology_extraction.md, repair_extraction.md
│ ├── parsers/ # Lark DSL grammar, parser, validator, builder
│ ├── modeling/ # Constraints, RBF/GemPy/Synthetic runners,
│ │ # mesh export, field builder + exporter
│ └── runs/ # RunService, RunManifest, fixtures
│
├── tests/ # pytest suite (192+ tests)
├── data/ # On-disk artifacts (gitignored at runtime)
│ ├── fixtures/ # Bundled demo fixtures
│ ├── documents/ # raw PDFs + extracted JSON
│ ├── runs/ # per-run manifests + artifacts
│ ├── exports/ # /static/exports/<run_id>/model.glb
│ └── fields/ # /static/fields/<run_id>/depth_to_bedrock.{npz,json}
│
├── geo-lm/ # (sibling) Reference repo, NOT imported
├── planning/ # Master blueprint + phase tasks
│
├── pyproject.toml # Python 3.12, deps, ruff, pytest config
├── .env.example # Copy to .env and edit
└── README.md # ← you are here
curl http://localhost:11434/api/tagsShould return JSON. If it fails:
- Run
ollama serve(in a separate terminal). - Confirm no firewall/VPN is blocking
localhost:11434. - If
/api/llm/healthsays"down", the FastAPI app'sGEO_NYC_OLLAMA_BASE_URLmay be wrong — restart after editing.env.
ollama list
ollama pull llama3.1:8bMake sure GEO_NYC_OLLAMA_MODEL matches the exact tag printed by
ollama list (including the :8b suffix).
- Confirm the venv is active (
which pythonshould be inside.venv/bin). - Check Python version:
python --versionmust be 3.12.x. - Reinstall:
pip install -e ".[dev]". - Don't mix Poetry and
.venv— pick one toolchain.
The repair loop handles most of these automatically. If it still fails:
- Lower
GEO_NYC_LLM_TEMPERATURE(try0.0). - Reduce
GEO_NYC_LLM_MAX_TOKENSif the model is rambling. - Check
data/runs/{run_id}/llm_attempts/— every prompt and raw response is persisted for offline debugging. - As a last resort, re-run with
use_llm=falseto fall back to the fixture path.
- Without
gempyinstalled,RBFRunneris the default and always works —mesh_summary.enginewill be"rbf". - If GemPy is installed but raises during import on Apple Silicon,
set
GEO_NYC_ENABLE_GEMPY=falseand rely on RBF for the demo. The manifest records the fallback chain so judges still see the intent.
- Open
mesh_urldirectly in a browser: it should download the.glbwithout auth. - Confirm
GEO_NYC_PUBLIC_BASE_URLmatches the public URL (it goes into every manifest, so old runs may have stale URLs — re-run after changing it). - Check the browser DevTools Console for the exact origin: it must
be present in
GEO_NYC_CORS_ORIGINSor matched byGEO_NYC_CORS_ORIGIN_REGEX. - For Vercel previews,
geo-nyc-*.vercel.appshould be matched by the default regex; if not, runpython -c "import re; print(re.fullmatch(r'^https://geo-nyc(-[a-z0-9-]+)?\.vercel\.app$', 'https://YOUR-PREVIEW-URL'))"and adjust the regex.
The field can fall back through three tiers (RBF → mesh resample →
stub). Check field_summary.engine and field_summary.fallback_from
in the manifest. If you're hitting "stub", the LLM didn't produce
enough surface-point evidence — increase top_k_chunks or feed a
denser PDF.
Free-tier ngrok issues a new hostname on each session, which means
both GEO_NYC_PUBLIC_BASE_URL and the Vercel NEXT_PUBLIC_API_BASE_URL
need to be updated. For a stable URL, use cloudflared named
tunnels or a paid ngrok reserved domain.
The sections below are the original team-wide blueprint (frontend strategy, GIS workstream, optimization, demo run-of-show). They are preserved here for new contributors.
We use williamjsdavis/geo-lm as a
reference repository (MIT license). It demonstrates the hard parts
we needed: PDF upload → document processing → geology DSL → GemPy-
oriented 3D workflow, with a FastAPI backend (api/), core
package (geo_lm/), and an upstream React + Vite demo UI
(web/) — reference only; our shipped UI is Next.js on Vercel.
What we keep from upstream:
- REST shape: document upload/extract, DSL parse/validate/create, workflow status endpoints.
- DSL pipeline: structured "geology DSL" as the contract between language understanding and modeling (Lark grammar, validation, retries).
- GemPy integration path: implicit 3D geological modeling after DSL is stable.
What we re-implemented for this hackathon:
- Local inference only: upstream defaults to cloud providers (Anthropic, OpenAI, Llama API keys in
.env). We replaced the LLM client layer with an Ollama adapter (geo_nyc.ai). - NYC corpus: generic upstream examples were swapped for the USGS NYC PDFs below; prompts encode NYC stratigraphy (formations, contacts, depths).
- Frontend strategy: FastAPI backend stays in
geo-nyc. The product UI is a separate Next.js app on Vercel — not a fork of upstreamweb/.
- Aesthetic: minimalistic, white background, no dark/cyberpunk themes.
- Geospatial Mapping Layers: judges can toggle:
- Base 2D street map.
- NYC borough outline (Manhattan, Bronx, Queens — see §6).
- NYC Open Data layers (flood, infrastructure context).
- Optional ML-derived rasters (smoothed depth-to-bedrock).
- 3D subsurface mesh (
.glbfrom this backend).
- Optimization UI: a small "What-if" panel driving §8's toy optimizer against precomputed scalar fields — never blocking the UI on a full GemPy solve.
- Master Stratigraphy: Bedrock and Engineering Geologic Maps of Bronx County and parts of New York and Queens Counties (USGS I-2306).
- Infrastructure Risk Data: Newly Mapped Walloomsac Formation in Lower Manhattan and New York Harbor and the Implications for Engineers.
- Depth Metrics: Stratigraphy, Structural Geology and Metamorphism of the Inwood Marble Formation, Northern Manhattan (NYC Water Tunnel Data).
| Phase | Repo path | Status |
|---|---|---|
| A. Local AI engine | geo_nyc/ai/ (Ollama HTTPX client) |
done |
| B. Document ingestion | geo_nyc/documents/, /api/documents/* |
done |
| B'. Chunking + relevance ranking | geo_nyc/extraction/ |
done |
| C. LLM → DSL | geo_nyc/prompts/, geo_nyc/parsers/dsl/ |
done |
| D. 3D modeling bridge | geo_nyc/modeling/{rbf,gempy,synthetic}_runner.py |
done (RBF default, GemPy optional) |
| D'. Scalar field export | geo_nyc/modeling/field_{builder,export}.py |
done |
| E. Optimization | Owned by Part 3 — /api/optimize |
out-of-scope for this repo |
| F. Deployment | cors_origins, cors_origin_regex, public_base_url |
done |
"City planners waste months manually extracting data from 400-page USGS PDFs to plan geothermal grids and subway expansions. We built Urban Subsurface AI. It uses a 100% local, edge-computed AI agent to read those unstructured reports and instantly generate interactive 3D infrastructure models. No cloud latency, absolute data privacy, and a seamless visual pipeline to build the city of the future safely."
Owned by Part 3 (separate workstream), not this repo. Key contracts:
- AOI: Manhattan + Bronx + Queens (BoroCode 1, 2, 4) clipped from Borough Boundaries.
- Outputs:
data/layers/*.geojson+data/layers/manifest.json, synced to the Next.js repo'spublic/layers/. - CRS: all browser GeoJSON in EPSG:4326; server-side analytic work in a projected CRS (suggested EPSG:32618 / UTM 18N).
This repo's contribution: per-run depth_to_bedrock.npz produced by
RBF interpolation from LLM-derived surface points. Schema in
Field grid schema.
Part 3 may consume this and produce GeoJSON contours in
data/layers/depth_contours.geojson.
POST /api/optimize is owned by Part 3 and is not implemented in
this backend. It reads depth_to_bedrock.npz and returns:
{ "optimal_d": 24.3, "objective": 1.27, "constraints_ok": true, "diagnostics": { ... } }- Part 1 (Frontend): Next.js on Vercel — MapLibre + R3F.
- Part 2 (this repo): FastAPI, Ollama, RBF/GemPy, mesh + field exports.
- Part 3 (GIS / Optimizer): Open Data ingest, Kriging,
/api/optimize— implementation runbook inPART3_MEMBER_C.md.
The full contract list (file layouts, API shapes, CRS, demo
run-of-show) lives in planning/part-2-design.md and
planning/part-2-tasks.md. The runtime API contract for Part 1 + 3
is the API contracts section above — that is
authoritative.
- geo-nyc code: MIT.
- geo-lm reference code: MIT — preserved attribution where re-used in spirit.
- NYC Open Data: per-dataset terms; provenance recorded in
manifest.json. - USGS reports: public domain; cited in the app About panel.