A server-rendered Next.js app that turns real New York City locations into immersive historical scenes you can explore on desktop, phone, or in Google Cardboard. Open the map, choose a landmark, and step into a 360° AI-aged panorama of that place in a specific year, with hotspots, voice clips, and editorial period notes anchored to the world around you.
> "See New York through time."
The current build ships with 12 scenes spanning Manhattan, Brooklyn, the Bronx, Queens, Staten Island, and New York Harbor, anchored on years between 1873 and 1917. Each panorama is a real modern Mapillary 360° photo of the actual location, edited by Google Gemini 2.5 Flash Image ("Nano Banana") to look period-correct using web-grounded historical research. Character voice clips are synthesised at runtime via a server-side ElevenLabs proxy with disk caching, so each unique line is billed only once.
```mermaid
flowchart TB
    USER["User<br/><i>Desktop / Phone / Cardboard</i>"]

    subgraph FE["Frontend – Next.js App Router"]
        LANDING["Landing Page<br/><code>/</code>"]
        MAP["Map Page<br/><code>/map</code>"]
        VIEWER["VR Viewer<br/><code>/viewer/[locId]/[eraId]</code>"]
        EDITOR["Hotspot Editor<br/><code>/editor/[locId]/[eraId]</code>"]
    end

    subgraph API["API Routes – Server-side"]
        TTS["TTS Proxy<br/><code>/api/tts/[sceneId]/[hotspotId]</code>"]
        TTS_INTRO["Intro TTS<br/><code>/api/tts/intro/[locId]/[eraId]</code>"]
        CACHE["Disk Cache<br/><code>cache/tts/{sha256}.mp3</code>"]
    end

    subgraph BUILD["Build-time Pipeline"]
        B1["scrape-panorama"]
        B2["gather-period-research"]
        B3["age-panorama"]
        B4["generate-hotspots-vision"]
        B5["generate-summaries"]
        B6["bake-infopanels"]
    end

    subgraph EXT["External APIs"]
        MAPILLARY["Mapillary API<br/><i>360° Street Imagery</i>"]
        GEMINI["Google Gemini 2.5 Flash<br/><i>Image Gen + Search + Vision</i>"]
        ELEVEN["ElevenLabs API<br/><i>Text-to-Speech</i>"]
    end

    %% User -> Frontend
    USER -->|"Browse"| LANDING
    USER -->|"Explore map"| MAP
    USER -->|"Enter scene"| VIEWER
    VIEWER -->|"Gaze hotspot"| TTS
    VIEWER -->|"Landmark intro"| TTS_INTRO

    %% API routes -> external / cache
    TTS -->|"Cache miss"| ELEVEN
    TTS_INTRO -->|"Cache miss"| ELEVEN
    TTS -->|"Cache hit"| CACHE
    TTS_INTRO -->|"Cache hit"| CACHE
    ELEVEN -->|"Write MP3"| CACHE

    %% Build pipeline -> External APIs
    B1 -->|"Download 360° photos"| MAPILLARY
    B2 -->|"Search-grounded research"| GEMINI
    B3 -->|"Image-to-image aging"| GEMINI
    B4 -->|"Vision + bbox detection"| GEMINI
    B5 -->|"Summarize"| GEMINI

    %% Build outputs feed the frontend
    B1 -.->|"public/panoramas-modern/"| B3
    B2 -.->|"data/period-research.json"| B3
    B3 -.->|"public/panoramas/"| VIEWER
    B4 -.->|"data/scenes/"| VIEWER
    B5 -.->|"data/location-summaries.json"| MAP
    B6 -.->|"public/infopanels/"| VIEWER

    %% Styling
    classDef user fill:#f4efe6,stroke:#8a2a1f,color:#333,stroke-width:2px
    classDef frontend fill:#fff3e0,stroke:#e65100,color:#333,stroke-width:2px
    classDef api fill:#e8f5e9,stroke:#2e7d32,color:#333,stroke-width:2px
    classDef build fill:#e3f2fd,stroke:#1565c0,color:#333,stroke-width:2px
    classDef external fill:#f3e5f5,stroke:#7b1fa2,color:#333,stroke-width:2px

    class USER user
    class LANDING,MAP,VIEWER,EDITOR frontend
    class TTS,TTS_INTRO,CACHE api
    class B1,B2,B3,B4,B5,B6 build
    class MAPILLARY,GEMINI,ELEVEN external
```
| # | Location | Borough | Year | Anchor |
|---|---|---|---|---|
| I | Times Square | Manhattan | 1904 | IRT subway opens; Longacre Square renamed Times Square |
| II | Brooklyn Bridge | Brooklyn | 1883 | May 24 opening day |
| III | Liberty Island | NY Harbor | 1907 | Peak immigration year across the bay |
| IV | Central Park | Manhattan | 1873 | Bethesda Terrace unveiled |
| V | Lower East Side | Manhattan | 1911 | Triangle Shirtwaist fire; Hester Street market peak |
| VI | Coney Island | Brooklyn | 1904 | Luna Park's million-bulb opening |
| VII | Prospect Park | Brooklyn | 1873 | Olmsted/Vaux's Brooklyn masterpiece |
| VIII | DUMBO | Brooklyn | 1900 | Empire Stores warehouses, working waterfront |
| IX | Grand Concourse | Bronx | 1909 | Risse's Champs-Élysées-modeled boulevard opens |
| X | South Bronx Hub | Bronx | 1909 | Third Ave El + commercial peak |
| XI | Jackson Heights | Queens | 1917 | First cooperative garden apartments rise |
| XII | St. George Ferry Terminal | Staten Island | 1908 | New terminal opens after the 1905 fire |
```text
build-time pipeline (run once)
──────────────────────────────
1. scrape-panorama             Mapillary 360            → public/panoramas-modern/
2. gather-period-research      Gemini Search            → ~200 words/scene
3. age-panorama                Nano Banana              → public/panoramas/
                               (modern + research)
4. generate-hotspots-vision    Gemini vision +
                               bbox detection           → data/scenes/
5. generate-summaries          Wikipedia +
                               Gemini Flash-Lite        → data/location-summaries.json
6. bake-infopanels             @napi-rs/canvas          → public/infopanels/
                               (PNG textures for Cardboard mode, where
                               HTML/CSS doesn't reach into the WebGL scene)

runtime (Next.js server)
────────────────────────
GET /                             landing
GET /map                          NYC map + accordion sidebar
GET /viewer/[loc]/[era]           A-Frame 360 + hotspots (gaze 1.5 s → InfoPanel)
GET /api/tts/[scene]/[hotspot]    ElevenLabs proxy + disk cache (only runtime API call)
```
The only runtime API call is `/api/tts/[sceneId]/[hotspotId]`: it proxies ElevenLabs, hashes the (text + voiceId) pair with SHA-256, and caches the resulting MP3 on disk under `cache/tts/{sha}.mp3`. Each unique line hits the API once for the lifetime of the cache.
| Category | Technology |
|---|---|
| Framework | Next.js 15 App Router + TypeScript, server-rendered (with API routes, not a static export) |
| Styling | Tailwind for layout primitives + hand-written editorial CSS in `app/globals.css` |
| Motion | `motion` library + CSS keyframes + IntersectionObserver for scroll reveals |
| Map | `react-leaflet` with CartoDB `light_nolabels` tiles, run through an SVG `feColorMatrix` + `feComponentTransfer` color-lookup filter that maps tile luminance through a 6-stop gradient (ink → 4 oxblood intermediates → paper), so the map reads in the site palette instead of fighting it |
| VR / 360 viewer | A-Frame (loaded via `next/script`), with built-in stereoscopic Cardboard mode and magic-window fallback |
| Build-time canvas | `@napi-rs/canvas` for InfoPanel PNG bakes |
| Fonts | Playfair Display + EB Garamond + Inter Tight + JetBrains Mono (self-hosted via `@fontsource`) |
Cream paper background (`#f4efe6`), oxblood accent (`#8a2a1f`), generous serif typography, hairline rules, animated rule-line draws, letter-by-letter title reveals, drop caps, kicker labels, paper grain overlay. Thematically the UI should feel like opening the morning paper to read about Times Square in 1904, not like a generic Tailwind site.

CSS tokens live in `app/globals.css`. Editorial primitives live in `components/ui/`.
- Node.js 18+
- A Mapillary access token (free): https://www.mapillary.com/dashboard/developers
- A Google Gemini API key with billing enabled (image generation and search grounding are paid features; total spend per full re-build is under $1): https://aistudio.google.com/app/apikey
- An ElevenLabs API key (free tier is fine; the runtime cache means each unique line spends its character budget only once): https://elevenlabs.io/app/settings/api-keys
```bash
# install dependencies
npm install

# create env file (do not commit)
cp .env.local.example .env.local
# then edit .env.local and fill in:
# MAPILLARY_ACCESS_TOKEN=...
# GEMINI_API_KEY=...
# ELEVENLABS_API_KEY=...

# run the build-time pipeline (in order)
npm run scrape-panorama          # Mapillary -> public/panoramas-modern/
npm run gather-period-research   # Gemini search -> data/period-research.json
npm run age-panorama             # Nano Banana -> public/panoramas/
npm run generate-hotspots-vision # Gemini vision -> data/scenes/
npm run generate-summaries       # Wikipedia + Gemini -> data/location-summaries.json
npm run bake-infopanels          # canvas -> public/infopanels/

# start the dev server
npm run dev
```

Open http://localhost:3000.
iOS Safari requires HTTPS for `DeviceOrientationEvent.requestPermission()`. Plain LAN HTTP from `npm run dev` won't trigger the gyroscope on iPhone. Use Cloudflare Tunnel:
```bash
# from the project directory, with the dev server running on port 3000
cloudflared tunnel --url http://localhost:3000 --protocol http2
```

The `--protocol http2` flag is important: without it, cloudflared defaults to QUIC over UDP/443, which many networks block. cloudflared prints a `https://*.trycloudflare.com` URL. Open that on your phone, navigate to a viewer route, tap A-Frame's VR icon, grant motion permission, and drop the phone into the headset.
All scripts are idempotent and safe to re-run. Most accept `--force` to regenerate from scratch, ignoring cached state.
| Script | What it does | Cost (full 12-scene run) |
|---|---|---|
| `npm run scrape-panorama` | Calls the Mapillary Graph API for each location, picks the highest-resolution 360° photo, and downloads it to `public/panoramas-modern/{locId}.jpg`. Records contributor attribution in `data/panorama-sources.json`. | Free |
| `npm run gather-period-research` | Calls Gemini 2.5 Flash with Google Search grounding to produce ~200 words of web-sourced visual context per (loc, era): buildings present that year, signage by name, vehicles, dress, surfaces, lighting. Cited URLs are preserved in the JSON. | ~$0.42 |
| `npm run age-panorama` | Sends each modern 360 + the locked aging prompt + research blob to Nano Banana (`gemini-2.5-flash-image`); writes `public/panoramas/{locId}__{eraId}.jpg`. Append `-- --force` to overwrite. | ~$0.47 |
| `npm run generate-hotspots-vision` | Sends each aged JPG to Gemini 2.5 Flash with vision; uses bbox detection (`box_2d` in `[ymin, xmin, ymax, xmax]` / `[0, 1000]`) to spatially place 3–5 period-grounded hotspots per scene. Computes box centers and converts them to (yaw, pitch) via the equirectangular projection. | ~$0.05 |
| `npm run generate-summaries` | Wikipedia REST + Gemini Flash-Lite produce ~50-word period blurbs shown in the sidebar accordion. | Negligible |
| `npm run bake-infopanels` | Renders each hotspot's editorial paper-card to a PNG via `@napi-rs/canvas`. The PNGs are loaded as `<a-image>` planes inside Cardboard mode, where HTML/CSS can't reach into the WebGL scene. | Free |
Total cost for a full re-build of all 12 scenes: under $1.

For the older Wikidata-based hotspot generator (kept as a fallback), run `npm run generate-hotspots`.
If a lat/lon search returns nothing (common in dense central NYC and outer-borough residential areas), the scrape script tries progressively wider bboxes (150 m → 300 m → 500 m → 800 m → 1200 m). If still empty, you have two escape hatches:
- Find a Mapillary 360 image manually at https://www.mapillary.com/, copy its image ID from the URL, set `MAPILLARY_FALLBACK_<LOCID_UPPER>=<id>` in `.env.local`, and re-run.
- Drop a hand-sourced equirectangular JPG into `public/panoramas-modern/{locId}.jpg`. The aging step picks it up regardless of source.
The current build uses fallback IDs for LIBERTY_ISLAND, DUMBO, SOUTH_BRONX_HUB, and JACKSON_HEIGHTS.
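The widening search described above can be sketched as a pure helper. This is a sketch, not the script's actual code: `bboxAround` is a hypothetical name, and 111,320 m per degree of latitude is the usual spherical approximation.

```typescript
// Search radii tried in order, matching the widening steps described above.
const RADII_M = [150, 300, 500, 800, 1200];

// Approximate a bounding box of the given radius (in metres) around a point.
// Longitude degrees shrink by cos(latitude), so the box stays roughly square
// in metres even away from the equator.
function bboxAround(lat: number, lon: number, radiusM: number) {
  const dLat = radiusM / 111_320;
  const dLon = radiusM / (111_320 * Math.cos((lat * Math.PI) / 180));
  return {
    west: lon - dLon,
    south: lat - dLat,
    east: lon + dLon,
    north: lat + dLat,
  };
}
```

The scrape script would walk `RADII_M`, query Mapillary with each box, and stop at the first non-empty result.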
- Add an entry to `data/locations.json` with `id`, `name`, `lat`, `lon`, `numeral`, `blurb`, and one or more `eras` (each with `id`, `year`, `label`, `labelLong`, and an `event` anchor sentence).
- If the lat/lon has no Mapillary 360 coverage, set the fallback env var (above).
- Add an entry to `ERA_NOTES` in `scripts/age-panorama.ts` for any new year (vehicles, dress, signage cues for the period).
- Run the pipeline scripts in order.
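The fields listed above imply roughly these shapes. This is a sketch of what each `locations.json` entry carries, not the repo's canonical types in `lib/`; the interface names and the example values are illustrative.

```typescript
// Shape of one era within a location entry.
interface Era {
  id: string;
  year: number;
  label: string;
  labelLong: string;
  event: string; // one-sentence anchor, e.g. "May 24 opening day"
}

// Shape of one entry in data/locations.json.
interface NYCLocation {
  id: string;
  name: string;
  lat: number;
  lon: number;
  numeral: string; // Roman numeral shown on the map
  blurb: string;
  eras: Era[]; // one or more
}

// Example entry (values drawn from the scene table; the id is illustrative).
const brooklynBridge: NYCLocation = {
  id: "brooklyn-bridge",
  name: "Brooklyn Bridge",
  lat: 40.7061,
  lon: -73.9969,
  numeral: "II",
  blurb: "Opening day of the great East River span.",
  eras: [
    {
      id: "1883",
      year: 1883,
      label: "1883",
      labelLong: "Opening Day, 1883",
      event: "May 24 opening day",
    },
  ],
};
```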
A hidden `/editor/[locId]/[eraId]` route exists for manual hotspot fine-tuning if you want pixel-perfect placement; it's gated behind `NODE_ENV === 'development'`.
Hotspots are placed by Gemini's vision bounding-box detection: the model returns `box_2d` in `[ymin, xmin, ymax, xmax]` / `[0, 1000]` (the format Gemini was trained on for grounding) and the script computes the box center, then maps it to (yaw, pitch) via:
```text
yaw   = (x - 0.5) × 360°   // x in [0, 1] from the equirect
pitch = -(y - 0.5) × 180°  // y in [0, 1]; positive pitch looks up
```
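As a runnable sketch of that conversion (the helper name is hypothetical; the `[ymin, xmin, ymax, xmax]` / `[0, 1000]` convention is the Gemini output format described above):

```typescript
// Convert a Gemini `box_2d` ([ymin, xmin, ymax, xmax] in [0, 1000]) into
// the (yaw, pitch) of its center on an equirectangular panorama.
function bboxToYawPitch(
  box: [number, number, number, number]
): { yaw: number; pitch: number } {
  const [ymin, xmin, ymax, xmax] = box;
  const x = (xmin + xmax) / 2 / 1000; // box center, normalised to [0, 1]
  const y = (ymin + ymax) / 2 / 1000;
  return {
    yaw: (x - 0.5) * 360,    // full horizontal wrap of the equirect
    pitch: -(y - 0.5) * 180, // positive pitch looks up
  };
}
```

A box centered mid-frame maps to (0°, 0°), i.e. straight ahead at the horizon.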
The viewer's visible gaze marker is a thin oxblood ring (cosmetic only, a ~0.04 m thick annulus). The actual raycaster hit target is an invisible 0.45 m circle at each hotspot, giving roughly 10× the catch area so the gaze reticle reliably triggers the hotspot when the user holds their gaze in the rough vicinity.
The browser plays MP3s via the `<audio>` element. The bytes come from `/api/tts/{sceneId}/{hotspotId}`, a Next.js server route that:
- Looks up the hotspot's `voice.text` + `voice.voiceId` in the scene JSON
- Hashes that pair (SHA-256) and checks `cache/tts/{hash}.mp3`
- On miss: calls ElevenLabs, writes the bytes to disk, returns them
- On hit: streams the cached file
Each unique line incurs its character cost once for the lifetime of the cache. Twelve scenes × ~120 chars/line ≈ 1,500 chars total, well under the 10k chars/month free tier.
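The cache-key scheme above can be sketched as follows. The helper name, separator, and concatenation order are assumptions for illustration, not the repo's actual code; only the SHA-256-over-(text, voiceId) idea is from the doc.

```typescript
import { createHash } from "node:crypto";

// Derive the on-disk cache path for a voice line by hashing the
// (text, voiceId) pair, so identical lines always hit the same MP3.
function ttsCachePath(text: string, voiceId: string): string {
  const sha = createHash("sha256").update(`${voiceId}\n${text}`).digest("hex");
  return `cache/tts/${sha}.mp3`;
}
```

A route handler would stat that path first, call ElevenLabs only on a miss, and write the returned bytes to disk before streaming them back.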
`npm run dev` on a laptop is the demo target. If you want a public share link:

```bash
vercel
```
> **Caveat:** Vercel functions are stateless, so the `cache/tts/` directory does not survive cold starts in production. Either commit the prewarmed MP3s into the repo (remove `cache/` from `.gitignore` and `git add cache/tts/*.mp3`), or move the cache to Vercel Blob storage (free tier).
```text
app/                          # Next.js routes (page.tsx, viewer/, editor/, api/tts/)
components/                   # UI components (LandingClient, MapPageClient, VRScene, InfoPanel, etc.)
components/ui/                # Editorial design primitives (Rule, SplitText, Kicker, Button, PaperGrain)
data/locations.json           # 12 NYC locations
data/scenes/                  # Per-scene hotspot JSON (generated)
data/period-research.json     # Web-grounded visual research per (loc, era) (generated)
data/location-summaries.json  # Sidebar period blurbs (generated)
data/panorama-sources.json    # Mapillary contributor attribution (generated)
scripts/                      # Build-time pipelines
public/panoramas-modern/      # Modern Mapillary 360s (input to aging step)
public/panoramas/             # Aged 360 JPGs (final output)
public/infopanels/            # Baked editorial paper-card PNGs (generated)
public/fonts/raw/             # TTF files for the canvas bake (auto-downloaded)
public/ambient/               # Looping ambient MP3s (drop in CC0 audio from freesound.org)
cache/tts/                    # Runtime ElevenLabs cache (gitignored)
lib/                          # Server-side helpers (scenes loader, ElevenLabs client, types)
```
- $0 to ship the demo locally beyond the one-time build pipeline (~$1 total). Runtime spend is bounded by the number of unique voice lines, not the number of users, because of the disk cache.
- A-Frame + Next.js SSR breaks on first import: all A-Frame mounts must use `dynamic({ ssr: false })`.
- Image-to-image aging is bounded by the modern Mapillary frame's geometry. Nano Banana can re-skin facades, swap signage, change dress and vehicles, and remove modern displays, but it cannot add buildings that weren't in the source photo, can't move the camera, and can't recreate literal historical geometry.
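A minimal sketch of that SSR-safe mount, assuming the scene component lives at `components/VRScene` (the import path and alias are illustrative):

```typescript
import dynamic from "next/dynamic";

// A-Frame touches `window` at import time, so the scene component must
// never be evaluated during the server render: `ssr: false` restricts
// the import (and therefore A-Frame itself) to the browser.
const VRScene = dynamic(() => import("@/components/VRScene"), {
  ssr: false,
});

export default VRScene;
```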
- 360° street-level imagery © Mapillary contributors, used per their open license.
- Period imagery aged via Google Gemini 2.5 Flash Image (Nano Banana).
- Voice clips synthesised with ElevenLabs.
- Map tiles © OpenStreetMap contributors, served via CARTO.
- Period research grounded in citations returned by Google Search via Gemini's grounding tool.