A literary geography platform that maps fiction to the physical world.
Canonical data layer status (Mar 17, 2026):
- Working scaffold: 11,073 entries
- Strict release: 9,057 entries
- Active versioned release: backend/data/releases/2026-03-17-strict-v2/literary_places.json
The name Akhand means "undivided" in Sanskrit. The platform treats South Asia's literary geography as a continuous space, ignoring political boundaries in favor of narrative ones.
This project builds on the work of Cities in Fiction, an archival project by Apoorva Saini and Divya Ravindranath that documents real-world places in Indian literature. Their curated entries (436 total across two sources) are integrated here with full attribution. Akhand extends this with NLP extraction, multi-source data ingestion, and WebGL visualization.
Frontend (Next.js 14, MapLibre GL, deck.gl)
|
| GET /api/places (fallback to static data.ts if backend is down)
v
Backend API (FastAPI, Pydantic)
|
|-- /api/places serves canonical release entries with search/filter
|-- /api/meta dataset version/source/count metadata
|-- /api/places.geojson full GeoJSON FeatureCollection export
|-- /api/export bulk CSV export
|-- /api/extract spaCy + GLiNER + Gemini NLP pipeline
|-- /api/wikidata/* SPARQL proxy for Wikidata P840
|
Data Ingestion (CLI scripts)
|
|-- ingest.py Open Library search-by-place, 54 cities, alias expansion
|-- cif_ingest.py CitiesInFiction.xlsx parser + Nominatim geocoder
|-- openlibrary.py async client with rate limiting
|-- wikidata.py P840 narrative location queries
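As an illustration of what the `/api/places.geojson` export involves, here is a minimal sketch of turning place records into a GeoJSON FeatureCollection. The record fields (`id`, `title`, `city`, `lat`, `lon`) and the function name are assumptions made for the example, not the project's actual schema. The one detail that is fixed by the GeoJSON specification is the coordinate order: longitude first, then latitude.

```python
import json


def to_feature_collection(places):
    """Convert place records into a GeoJSON FeatureCollection (hypothetical schema)."""
    features = []
    for place in places:
        lat, lon = place.get("lat"), place.get("lon")
        if lat is None or lon is None:
            continue  # skip entries that were never geocoded
        # Everything except the coordinates becomes a feature property.
        properties = {k: v for k, v in place.items() if k not in ("lat", "lon")}
        features.append(
            {
                "type": "Feature",
                # GeoJSON positions are [longitude, latitude], in that order.
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Made-up sample records, for illustration only.
sample = [
    {"id": "p1", "title": "Example Novel", "city": "Jodhpur", "lat": 26.2389, "lon": 73.0243},
    {"id": "p2", "title": "Ungeocoded Entry", "city": "Unknown", "lat": None, "lon": None},
]
collection = to_feature_collection(sample)
print(json.dumps(collection, indent=2))
```

Dropping ungeocoded rows is one possible policy; an export could equally emit them with a null geometry.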
Current corpus snapshot:
- Working scaffold (generated): 11,073
- Strict canonical release (v2026-03-17-strict-v2): 9,057
- Frontend static index/details synced to strict release: 9,057
- Enrichment run continues in background with checkpointed resume.
Four layers, designed so that a failure in any one layer degrades gracefully instead of crashing the pipeline:
Layer 1: spaCy NER (en_core_web_md, 50MB). Fast first pass extracting GPE, LOC, FAC entities. The md model includes word vectors that improve recognition of out-of-vocabulary place names in literary syntax.
Layer 2: GLiNER zero-shot NER (urchade/gliner_medium-v2.1). Runs domain-specific labels: City, Village, Region, Country, River, Mountain, Neighborhood, Landmark, Historical Place Name, Fictional Place, Route, Body of Water. When both models agree on an entity, confidence is boosted. Threshold set to 0.4 to reduce noise from metaphorical place usage in literary text.
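The agreement boost can be sketched roughly as follows. This is not the project's code: the function name, the `boost` amount, and the baseline confidence for spaCy-only entities are all assumptions (spaCy's default NER does not expose per-entity scores, so some baseline has to be chosen). Only the 0.4 threshold comes from the description above.

```python
def merge_entities(spacy_ents, gliner_ents, threshold=0.4, boost=0.15,
                   spacy_only_confidence=0.5):
    """Merge entities from two models, boosting those both models found.

    spacy_ents:  list of (text, label)
    gliner_ents: list of (text, label, score)
    boost and spacy_only_confidence are illustrative values, not tuned ones.
    """
    spacy_keys = {text.strip().lower() for text, _label in spacy_ents}
    merged = {}
    for text, label, score in gliner_ents:
        if score < threshold:
            continue  # drop low-confidence (often metaphorical) mentions
        key = text.strip().lower()
        agreed = key in spacy_keys
        confidence = round(min(1.0, score + boost), 3) if agreed else score
        current = merged.get(key)
        if current is None or confidence > current["confidence"]:
            merged[key] = {"text": text, "label": label,
                           "confidence": confidence, "agreed": agreed}
    # Entities only spaCy found are kept at a fixed baseline confidence.
    for text, label in spacy_ents:
        key = text.strip().lower()
        merged.setdefault(key, {"text": text, "label": label,
                                "confidence": spacy_only_confidence, "agreed": False})
    return sorted(merged.values(), key=lambda e: e["confidence"], reverse=True)


# Made-up model outputs for a single passage.
spacy_ents = [("Lahore", "GPE"), ("Ravi", "LOC")]
gliner_ents = [("Lahore", "City", 0.62), ("heart", "Landmark", 0.21),
               ("Anarkali", "Neighborhood", 0.55)]
entities = merge_entities(spacy_ents, gliner_ents)
for entity in entities:
    print(entity)
```

Matching on lowercased text is the simplest possible agreement test; matching on character offsets would be stricter but needs span information from both models.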
Layer 3: Geocoding (Nominatim via geopy). Converts entity text to coordinates. 80+ pre-populated coordinates avoid rate limiting.
Layer 4: Gemini 3 Flash structured extraction (gemini-3-flash-preview). Called only on passages that contain NER-detected entities, never on full texts. A 100,000-word novel yields roughly 20 such passages (about 6,000 characters in total) instead of about 500,000 characters of full text. At Gemini Flash pricing, that is about $0.0006 per book instead of $0.05, an 83x cost reduction. This layer extracts sentiment, themes, and place classification.
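The arithmetic behind the cost figures can be checked in a few lines. The per-character rate below is derived from the numbers quoted above ($0.05 for 500,000 characters); it is an implied rate for this estimate, not an official price.

```python
FULL_TEXT_CHARS = 500_000   # approximate size of a 100,000-word novel
PASSAGE_CHARS = 6_000       # roughly 20 NER-selected passages
FULL_TEXT_COST_USD = 0.05   # per-book figure quoted above

# Implied rate, derived from the quoted figures (an assumption, not a price list).
rate_per_char = FULL_TEXT_COST_USD / FULL_TEXT_CHARS
passage_cost = PASSAGE_CHARS * rate_per_char
reduction = FULL_TEXT_CHARS / PASSAGE_CHARS

print(f"implied rate: ${rate_per_char * 1_000_000:.2f} per million characters")
print(f"passage-only cost: ${passage_cost:.4f} per book")
print(f"reduction: {reduction:.0f}x")
```

Because cost is assumed to scale linearly with characters sent, the cost reduction equals the character reduction, 500,000 / 6,000, or about 83x.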
If Gemini fails, the pipeline falls back to rule-based sentiment. If GLiNER fails to load, spaCy runs alone. If the backend is down entirely, the frontend serves curated entries from a static file.
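A minimal sketch of the Gemini-to-rules fallback, assuming the LLM call is passed in as a plain function. The word lists, function names, and return shape are hypothetical; the real rule-based scorer is not described in this README.

```python
import logging

logger = logging.getLogger("akhand.sketch")

# Tiny illustrative lexicons; a real rule set would be much larger.
POSITIVE = {"warm", "beautiful", "serene", "beloved"}
NEGATIVE = {"bleak", "ruined", "hostile", "desolate"}


def rule_based_sentiment(text):
    """Count lexicon hits and map the balance to a coarse label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyze_sentiment(text, llm_call=None):
    """Try the LLM first; on any failure, fall back to the rule-based scorer."""
    if llm_call is not None:
        try:
            return {"sentiment": llm_call(text), "source": "llm"}
        except Exception as exc:  # degrade instead of crashing the pipeline
            logger.warning("LLM extraction failed, using rules: %s", exc)
    return {"sentiment": rule_based_sentiment(text), "source": "rules"}


def failing_llm(text):
    """Stand-in for a Gemini call during an outage."""
    raise TimeoutError("simulated outage")


result = analyze_sentiment("The old city felt warm and beautiful at dusk",
                           llm_call=failing_llm)
print(result)
```

Recording the `source` alongside the result makes it possible to re-enrich rule-scored rows later, once the LLM is available again.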
Three deck.gl layers on MapLibre GL (CARTO Dark Matter basemap, no API key):
- Scatter: sentiment-colored dots, radius scales with book density
- Heatmap: geographic clustering of literary places
- Arcs: author connection networks across cities
PMTiles protocol registered for future zero-cost self-hosted tile serving.
| Method | Path | Description |
|---|---|---|
| GET | /api/places | List places. Params: q, region, city, author, genre, year_min, year_max, limit, offset |
| GET | /api/places/{id} | Single place by ID |
| GET | /api/meta | Dataset version/source/count metadata |
| GET | /api/places.geojson | Full dataset as GeoJSON FeatureCollection |
| GET | /api/export?format=csv | Full dataset as CSV |
| POST | /api/places/refresh | Hot-reload data from disk after re-ingestion |
| POST | /api/extract | Run NLP pipeline on arbitrary text |
| POST | /api/extract/summary | Gemini structured extraction from book summary |
| GET | /api/wikidata/narrative-locations | Wikidata P840 query. Param: region=south_asia |
| GET | /health | Pipeline status |
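To show how the `/api/places` query parameters fit together, here is a small sketch that builds a request URL with the standard library. The base URL assumes the local uvicorn port used elsewhere in this README, and the parameter values are made up. The helper name `places_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumes the local dev server


def places_url(base_url=BASE_URL, **params):
    """Build a /api/places URL, skipping parameters left as None."""
    query = {key: value for key, value in params.items() if value is not None}
    if not query:
        return f"{base_url}/api/places"
    return f"{base_url}/api/places?{urlencode(query)}"


url = places_url(q="monsoon river", city="Kolkata", year_min=1950, limit=20, offset=0)
print(url)
```

Paging through results would then be a matter of increasing `offset` by `limit` on each request.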
Full-text search across titles, authors, cities, genres, themes, and passages. All query terms must match (AND logic).
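The AND semantics can be sketched as a simple in-memory filter. The field names and the substring matching below are assumptions for illustration; the backend's real matching rules (tokenization, stemming, ranking) are not specified here.

```python
# Hypothetical field names covering titles, authors, cities, genres, themes, passages.
SEARCH_FIELDS = ("title", "author", "city", "genres", "themes", "passage")


def haystack(place):
    """Flatten the searchable fields of one record into a lowercase string."""
    parts = []
    for field in SEARCH_FIELDS:
        value = place.get(field) or ""
        if isinstance(value, (list, tuple)):
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return " ".join(parts).lower()


def search(places, query):
    """Return records in which every query term appears (AND logic)."""
    terms = query.lower().split()
    if not terms:
        return list(places)
    results = []
    for place in places:
        text = haystack(place)
        if all(term in text for term in terms):
            results.append(place)
    return results


# Made-up records for illustration.
places = [
    {"title": "River of Stories", "author": "A. Writer", "city": "Kolkata",
     "genres": ["historical"], "themes": ["migration"], "passage": ""},
    {"title": "Hill Station Diary", "author": "B. Writer", "city": "Shimla",
     "genres": ["memoir"], "themes": ["river", "solitude"], "passage": ""},
]
print([p["title"] for p in search(places, "river kolkata")])
print([p["title"] for p in search(places, "river")])
```

With AND logic, adding a term can only narrow the result set: "river" matches both records, while "river kolkata" matches only the first.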
All commands run from the project root (akhand/), not from subdirectories.
# Frontend only (250 curated fallback entries, no backend needed)
cd frontend && npm install && npm run dev
# Backend
pip install -r backend/requirements.txt
python -m spacy download en_core_web_md
uvicorn backend.main:app --port 8000
# Frontend + backend together
# Terminal 1: uvicorn backend.main:app --port 8000
# Terminal 2: cd frontend && npm run dev
# Open http://localhost:3000/explore
# Re-ingest data
python -m backend.data.ingest # Open Library (54 cities)
python -m backend.data.cif_ingest --merge # merge CIF spreadsheet + archive
curl -X POST http://localhost:8000/api/places/refresh
# Cut a versioned release
python -m backend.scripts.quality_gate --input backend/data/releases/2026-03-17-strict-v2/literary_places.json --threshold 0.55 --reject --block-filler --filler-min-hits 2 --output backend/data/generated/literary_places_release_strict_next.json --output-report backend/data/generated/quality_report_strict_next.json
python -m backend.scripts.cut_release --input backend/data/generated/literary_places_release_strict_next.json --report backend/data/generated/quality_report_strict_next.json --version 2026-03-18-strict
# Docker (full stack)
docker compose up
Frontend: Next.js 14, React 18, MapLibre GL 4.7, deck.gl 9.1, Framer Motion, Tailwind CSS, PMTiles
Backend: FastAPI, spaCy 3.8 (en_core_web_md), GLiNER 0.2, Google GenAI (Gemini 3 Flash), geopy, httpx
Database (schema written, not yet wired): PostgreSQL 17, PostGIS, pgvector (HNSW), ltree, pg_trgm
- API write/extract/admin routes are key-protected and rate-limited. Public read routes remain open.
- CORS allows `localhost:3000` and `shahdev.me`. Additional origins require updating the middleware.
- Full enrichment is still in progress for the complete scaffold. The strict release intentionally excludes weaker rows until re-enriched.
- Neither source contains actual literary passages, only plot summaries (Open Library) and contributor descriptions (CIF). Sourcing real passages requires publisher APIs for copyrighted text, or Project Gutenberg for public-domain works (in the US as of 2026, those published before 1931).
- Geocoding approximates regions to centroids. "Marwar region in Western part of Rajasthan" maps to Jodhpur. State-level entries and fictional places are similarly approximate.
- Open Library sorts by relevance, not recency. Recently published books are underrepresented.
- Wikidata SPARQL endpoint rate-limits heavily (429 on every query during development). Code is correct but the live endpoint is unreliable for bulk queries.
- The `en_core_web_md` spaCy model, while better than `sm`, still misses literary place names in unusual syntactic positions. GLiNER compensates, but its 0.4 threshold needs manual benchmarking against annotated passages.
Set these environment variables in production:
- `AKHAND_ADMIN_API_KEY` (required for `/api/places/refresh`)
- `AKHAND_WRITE_API_KEY` (required for `/api/contribute`)
- `AKHAND_EXTRACT_API_KEY` (required for `/api/extract*` and `/api/analyze/passage`)
- `AKHAND_TRUSTED_HOSTS` (comma-separated host allowlist, e.g. `api.example.com`)
- `AKHAND_CORS_ORIGINS` (comma-separated explicit origins)
- `AKHAND_CORS_METHODS` (default `GET,POST,OPTIONS`)
- `AKHAND_CORS_HEADERS` (default `Content-Type,X-API-Key`)
- `AKHAND_ENABLE_SECURITY_HEADERS=1` (enables HSTS/XFO/nosniff/referrer-policy)
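As a rough sketch of how these variables might be consumed, the snippet below parses the comma-separated values and checks an API key in constant time. The function names and the settings dictionary are hypothetical, and treating `AKHAND_ENABLE_SECURITY_HEADERS` as off unless set to `1` is an assumption. The variable names and the defaults for methods and headers are taken from the list above.

```python
import hmac
import os


def split_csv(value):
    """Split a comma-separated string, trimming whitespace and empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(env=os.environ):
    """Read the AKHAND_* variables into a plain dict (illustrative shape)."""
    return {
        "admin_api_key": env.get("AKHAND_ADMIN_API_KEY", ""),
        "trusted_hosts": split_csv(env.get("AKHAND_TRUSTED_HOSTS", "")),
        "cors_origins": split_csv(env.get("AKHAND_CORS_ORIGINS", "")),
        "cors_methods": split_csv(env.get("AKHAND_CORS_METHODS", "GET,POST,OPTIONS")),
        "cors_headers": split_csv(env.get("AKHAND_CORS_HEADERS", "Content-Type,X-API-Key")),
        # Assumption: security headers stay off unless explicitly set to "1".
        "security_headers": env.get("AKHAND_ENABLE_SECURITY_HEADERS", "0") == "1",
    }


def key_is_valid(provided, expected):
    """Constant-time key comparison; an unset expected key rejects everything."""
    if not provided or not expected:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "AKHAND_ADMIN_API_KEY": "example-key",
    "AKHAND_CORS_ORIGINS": "https://example.com, http://localhost:3000",
})
print(settings["cors_origins"])
print(key_is_valid("example-key", settings["admin_api_key"]))
```

Rejecting all requests when the expected key is unset is a deliberately strict choice: a missing variable in production then fails closed rather than leaving a protected route open.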
Release hardening:
python -m backend.scripts.data_cleanup \
--input backend/data/releases/2026-03-19-research-v1/literary_places.json \
--output backend/data/generated/literary_places_cleaned_prod.json \
--manifest backend/data/generated/cleanup_manifest_prod.json
python -m backend.scripts.quality_gate \
--input backend/data/generated/literary_places_cleaned_prod.json \
--reject --threshold 0.6 --geo-threshold 0.65 \
--output backend/data/generated/literary_places_release_prod.json \
--output-report backend/data/generated/quality_report_prod.json
python -m backend.scripts.cut_release \
--input backend/data/generated/literary_places_release_prod.json \
--report backend/data/generated/quality_report_prod.json \
--version 2026-03-19-prod \
--min-passing-ratio 0.60
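One plausible reading of `--min-passing-ratio` is that the release is refused when too small a share of input rows survives the quality gate. The sketch below illustrates that reading only; the actual semantics live in `cut_release` and are not documented here. The counts are the scaffold and strict-release figures quoted earlier, used purely as sample inputs.

```python
def check_release(passed, total, min_passing_ratio=0.60):
    """Return (allowed, ratio) for a hypothetical passing-ratio gate."""
    if total <= 0:
        raise ValueError("total must be positive")
    ratio = passed / total
    return ratio >= min_passing_ratio, ratio


# Illustrative inputs: 9,057 strict-release rows out of 11,073 scaffold rows.
allowed, ratio = check_release(passed=9_057, total=11_073)
print(f"passing ratio {ratio:.3f}, release allowed: {allowed}")
```

Under this reading, a ratio near 0.82 would clear a 0.60 floor comfortably, while a gate run that rejected more than 40% of rows would block the release.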