# Querying the Inheritage API Suite with Python

This Kaggle-ready notebook demonstrates how to interact with the Inheritage public API (`https://inheritage.foundation/api/v1`) using `requests`. It follows the production contract documented in `docs/public-api-suite.md` and the SDK guidelines in `sdk/README.md`, including required attribution headers, pagination helpers, and cache-aware requests.

> ℹ️ All endpoints are open and do not require authentication, but every call **must** include `X-Inheritage-Attribution: visible`. The examples below show how to set headers once and reuse them across requests.



## 1. Environment Setup

Kaggle notebooks ship with `requests`, `pandas`, and mapping libraries preinstalled. Uncomment and run the cell below only if you need newer versions or are running outside Kaggle.


In [None]:
# Optional: upgrade libraries when running locally (skip on Kaggle for faster start)
# %pip install --quiet requests pandas folium


## 2. Shared Client Configuration

Set the base URL, required attribution headers, and helper utilities to respect the API contract (rate-limit headers, caching validators, and trace IDs).
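
As a minimal sketch of honoring rate limits, the helper below turns a `Retry-After` header into a number of seconds to pause. `retry_after_seconds` is a hypothetical name, not part of the API or the SDK; it assumes the header uses one of the two standard HTTP forms (an integer number of seconds or an HTTP-date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str],
    now: Optional[datetime] = None,
) -> float:
    """Return how long to pause, in seconds, based on a Retry-After header.

    Hypothetical helper: returns 0.0 when the header is missing or unparseable.
    """
    value = headers.get("Retry-After")
    if value is None:
        return 0.0
    value = value.strip()
    if value.isdigit():
        # Form 1: a non-negative integer number of seconds.
        return float(value)
    try:
        # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 0.0
    if target.tzinfo is None:
        # Treat a date without an offset as UTC.
        target = target.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (target - current).total_seconds())
```

A caller could pass `response.headers` from a `429` response and sleep for the returned duration before retrying.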


In [None]:
import os
import textwrap
from typing import Any, Dict, Optional

import pandas as pd
import requests

BASE_URL = os.getenv("INHERITAGE_API_BASE", "https://inheritage.foundation/api/v1")
DEFAULT_HEADERS = {
    "User-Agent": "kaggle-notebook/1.0",
    "Accept": "application/json",
    "X-Inheritage-Attribution": "visible",
}

print(f"Using API base: {BASE_URL}")
print("Default headers:")
for name, value in DEFAULT_HEADERS.items():
    print(f"  {name}: {value}")


In [None]:
def api_request(
    path: str,
    *,
    method: str = "GET",
    params: Optional[Dict[str, Any]] = None,
    json: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, str]] = None,
    timeout: float = 20.0,
) -> requests.Response:
    """Perform an HTTP request and raise informative errors."""
    url = f"{BASE_URL.rstrip('/')}/{path.lstrip('/')}"
    merged_headers = {**DEFAULT_HEADERS, **(headers or {})}
    response = requests.request(
        method,
        url,
        params=params,
        json=json,
        headers=merged_headers,
        timeout=timeout,
    )

    try:
        response.raise_for_status()
    except requests.HTTPError as exc:
        try:
            payload = response.json()
        except ValueError:
            payload = {"error": {"message": response.text}}
        error = payload.get("error", {})
        message = textwrap.dedent(
            f"""
            API request failed: {exc}
            code={error.get('code')}
            message={error.get('message')}
            hint={error.get('hint')}
            trace_id={error.get('trace_id')}
            """
        ).strip()
        raise RuntimeError(message) from exc

    return response


def show_rate_limit_headers(response: requests.Response) -> None:
    """Print the rate-limit, caching, and trace headers present on a response."""
    keys = [
        "X-RateLimit-Limit",
        "X-RateLimit-Remaining",
        "X-RateLimit-Reset",
        "Retry-After",
        "Cache-Control",
        "ETag",
        "Last-Modified",
        "X-Trace-Id",
    ]
    print("Relevant response headers:")
    for key in keys:
        if key in response.headers:
            print(f"  {key}: {response.headers[key]}")


## 3. Heritage Catalogue Examples

The heritage endpoints expose paginated records with filters, projections, and conditional requests. Below are a few starter queries mirroring the SDK usage patterns.


In [None]:
response = api_request(
    "/heritage",
    params={
        "state": "Karnataka",
        "sort": "-completion_score",
        "limit": 5,
        "fields": "id,slug,name,state,status,media",
    },
)
show_rate_limit_headers(response)

payload = response.json()
heritage_df = pd.json_normalize(payload["data"])
heritage_df[["slug", "name", "state", "status.completion_score"]]


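Pagination links are available in two places: the `Link` response header, which follows RFC 8288 (and which `requests` parses into `response.links`), and the response body under `payload["links"]`. To fetch the next page, request `payload["links"]["next"]`, or pass the `offset` query parameter explicitly.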
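
A minimal sketch of following `payload["links"]["next"]` until the pages run out. `iter_records` and `fetch_page` are hypothetical names; the sketch assumes each page carries its records under `data` and that `links.next` is absent or null on the last page, which should be checked against the API contract. The fetch function is injected so the example runs offline with stand-in pages.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_records(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    max_pages: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield records page by page, following payload["links"]["next"].

    fetch_page(None) must return the first page; fetch_page(url) returns the
    page behind a "next" link. max_pages caps the number of requests.
    """
    next_url: Optional[str] = None  # None means "request the first page"
    for _ in range(max_pages):
        payload = fetch_page(next_url)
        yield from payload.get("data", [])
        next_url = (payload.get("links") or {}).get("next")
        if not next_url:
            break  # no further pages advertised


# Offline stand-in for two API pages (made-up data, not real responses).
fake_pages = {
    None: {"data": [{"slug": "a"}, {"slug": "b"}], "links": {"next": "page-2"}},
    "page-2": {"data": [{"slug": "c"}], "links": {"next": None}},
}
slugs = [record["slug"] for record in iter_records(fake_pages.__getitem__)]
print(slugs)
```

In the notebook, `fetch_page` could call `api_request("/heritage", params=...)` when passed `None` and `requests.get(next_url, headers=DEFAULT_HEADERS, timeout=20).json()` otherwise, assuming `links.next` is an absolute URL.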


### Fetch a Specific Heritage Site


In [None]:
slug = heritage_df.iloc[0]["slug"]
print(f"Inspecting site: {slug}")

detail_response = api_request(f"/heritage/{slug}")
site = detail_response.json()

print(f"Name: {site['name']}")
print(f"Period: {site.get('period')}")
print(f"Primary image: {site.get('media', {}).get('primary_image')}")
citations = site.get("citations")
# Print the display string of the first citation when one is available.
if isinstance(citations, list) and citations and "required_display" in citations[0]:
    print(f"Citations: {citations[0]['required_display']}")
else:
    print("Citations: none")


Reuse the `ETag` and `Last-Modified` validators from an earlier response to make conditional requests. The snippet below shows a `304 Not Modified` workflow; a `304` response carries no body, so keep the payload you fetched earlier.


In [None]:
etag = detail_response.headers.get("ETag")
last_modified = detail_response.headers.get("Last-Modified")
conditional_headers = {}
if etag:
    conditional_headers["If-None-Match"] = etag
if last_modified:
    conditional_headers["If-Modified-Since"] = last_modified

if conditional_headers:
    cached = api_request(
        f"/heritage/{slug}",
        headers=conditional_headers,
    )
    if cached.status_code == 304:
        print("Resource unchanged; safe to reuse cached payload.")
    else:
        print("Resource updated; refresh local cache.")
else:
    print("Endpoint did not return validators; conditional fetch skipped.")


## 4. Geospatial Features and Nearby Search

The geospatial API returns GeoJSON FeatureCollections. Use the `Accept: application/geo+json` header for geo-aware tools, or stick with JSON for standard parsing.
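
GeoJSON stores positions as `[longitude, latitude]`, the reverse of what most mapping libraries expect. The sketch below pulls `(lat, lon)` pairs out of a FeatureCollection; `point_lat_lons` is a hypothetical helper and the sample features are made-up data, not an API response.

```python
from typing import Any, Dict, List, Tuple


def point_lat_lons(feature_collection: Dict[str, Any]) -> List[Tuple[float, float]]:
    """Return (lat, lon) pairs for every Point feature in a FeatureCollection."""
    pairs: List[Tuple[float, float]] = []
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # skip lines, polygons, and features without geometry
        lon, lat = geometry["coordinates"][:2]  # GeoJSON order is [lon, lat]
        pairs.append((lat, lon))
    return pairs


# Made-up sample: one Point feature and one feature with no geometry.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [77.5946, 12.9716]},
            "properties": {"name": "Example site"},
        },
        {"type": "Feature", "geometry": None, "properties": {}},
    ],
}
print(point_lat_lons(sample))
```

Filtering on the geometry type first keeps the mapping code from failing if the endpoint ever returns non-Point features.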


In [None]:
geo_response = api_request(
    "/geo/nearby",
    params={"lat": 12.9716, "lon": 77.5946, "radius_km": 50},
)
geo_payload = geo_response.json()

print(f"Returned {len(geo_payload['features'])} features")
features_df = pd.json_normalize(
    geo_payload["features"],
    sep=".",
)
features_df[["properties.slug", "properties.name", "properties.state"]].head()


In [None]:
try:
    import folium

    if not features_df.empty:
        lats = [coords[1] for coords in features_df["geometry.coordinates"]]
        lons = [coords[0] for coords in features_df["geometry.coordinates"]]
        centroid = (sum(lats) / len(lats), sum(lons) / len(lons))
    else:
        centroid = (12.9716, 77.5946)

    fmap = folium.Map(location=centroid, zoom_start=7)
    for _, row in features_df.iterrows():
        lat, lon = row["geometry.coordinates"][1], row["geometry.coordinates"][0]
        folium.Marker(
            location=(lat, lon),
            tooltip=row["properties.name"],
            popup=row["properties.citation"],
        ).add_to(fmap)
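    # A bare expression inside try/except is not auto-rendered by the notebook,
    # so call display() (provided by IPython/Jupyter) explicitly.
    display(fmap)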
except ImportError:
    print("folium is not installed. Run the pip cell above to enable mapping.")


## 5. Media Bundles and Watermark Headers

Media endpoints return grouped assets with explicit watermark requirements. Inspect the `X-Inheritage-Watermark` header to determine whether downstream rendering must display watermarks.


In [None]:
media_response = api_request(f"/media/{slug}")
media_payload = media_response.json()

print("Watermark policy:", media_response.headers.get("X-Inheritage-Watermark"))
print("Content-Disposition:", media_response.headers.get("Content-Disposition"))

media_df = pd.json_normalize(media_payload["items"])
media_df[["type", "url", "citation"]].head()


## 6. Citation & Attribution API

Use the citation service to fetch ready-to-render attribution snippets. Free-tier callers must continue sending `X-Inheritage-Attribution: visible` or requests will receive `403 FORBIDDEN`.


In [None]:
citation_response = api_request(f"/citation/{slug}")
citation_payload = citation_response.json()

print("HTML snippet:\n", citation_payload["citation_html"])
print("\nMarkdown snippet:\n", citation_payload["citation_markdown"])
print("\nPlain text:\n", citation_payload["citation_text"])
print("\nSource URL:", citation_payload.get("source_url"))


## 7. Dataset Manifest & Stats

The root dataset manifest and stats endpoint expose aggregates for dashboards and search-index ingest.


In [None]:
stats_response = api_request("/stats")
stats = stats_response.json()

print("Counts:", stats["counts"])
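print("\nTop states by published sites:")
# Sort by count so the list really is the top five, whatever order the API returns.
top_states = sorted(
    stats["breakdown"]["by_state"].items(),
    key=lambda item: item[1],
    reverse=True,
)
for state, total in top_states[:5]:
    print(f"  {state}: {total}")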


## 8. AI Context & Similarity

The AI endpoints return deterministic embeddings with the `X-Embedding-Model: inheritage-d1` header. The sample below shows how to request the canonical context paragraph and find similar sites.
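
To compare two returned embeddings locally, cosine similarity is a common choice. The sketch below is an assumption about how one might score vectors client-side; the scoring used by `/ai/similar` is not specified here, and `cosine_similarity` is a hypothetical helper that expects two equal-length sequences of floats.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (not real API output).
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors
```

Values near 1 indicate vectors pointing the same way and values near 0 indicate unrelated ones; for large batches, a vectorized NumPy version would be faster.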


In [None]:
context_response = api_request(f"/ai/context/{slug}")
context_payload = context_response.json()

print("Embedding length:", len(context_payload["embedding"]))
print("Model:", context_response.headers.get("X-Embedding-Model"))
print("Context excerpt:\n")
print("\n".join(textwrap.wrap(context_payload["context"], width=100)))


In [None]:
similar_response = api_request(
    "/ai/similar",
    method="POST",
    json={"slug": slug, "limit": 5},
)
similar_payload = similar_response.json()

print("Similarity results:")
for entry in similar_payload["data"]:
    score = entry["score"]
    site_info = entry["site"]
    print(f"  {score:.3f} → {site_info['slug']} ({site_info['state']})")


## 9. Next Steps

- Explore additional filters (`dynasty`, `style`, `material`) to build thematic galleries.
- Cache API responses locally using the `ETag` and `Last-Modified` headers demonstrated earlier.
- Join the Kaggle dataset version (CSV/GeoJSON exports) with live API data for hybrid analytics.
- Share your derivative notebooks with attribution: `Data © Inheritage Foundation (CC BY 4.0)`.
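
The local caching suggestion above can be sketched as a small in-memory cache that stores validators per URL and revalidates on each read. `ValidatorCache` and the `fetch` callable are hypothetical; the sketch assumes `fetch(url, conditional_headers)` returns a `(status_code, response_headers, body)` tuple, which lets it run offline with a fake fetcher.

```python
from typing import Callable, Dict, Mapping, Optional, Tuple

# fetch(url, conditional_headers) -> (status_code, response_headers, body)
Fetch = Callable[[str, Dict[str, str]], Tuple[int, Mapping[str, str], Optional[dict]]]


class ValidatorCache:
    """Keep the last body per URL and revalidate it with ETag / Last-Modified."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._entries: Dict[str, Tuple[Dict[str, str], Optional[dict]]] = {}

    def get(self, url: str) -> Optional[dict]:
        validators, cached_body = self._entries.get(url, ({}, None))
        status, headers, body = self._fetch(url, dict(validators))
        if status == 304 and cached_body is not None:
            return cached_body  # unchanged on the server; reuse the stored body
        # Fresh response: remember its validators for the next request.
        new_validators: Dict[str, str] = {}
        if headers.get("ETag"):
            new_validators["If-None-Match"] = headers["ETag"]
        if headers.get("Last-Modified"):
            new_validators["If-Modified-Since"] = headers["Last-Modified"]
        self._entries[url] = (new_validators, body)
        return body


# Fake fetcher standing in for the network (made-up ETag and payload).
calls = []


def fake_fetch(url: str, conditional_headers: Dict[str, str]):
    calls.append(dict(conditional_headers))
    if conditional_headers.get("If-None-Match") == '"v1"':
        return 304, {}, None
    return 200, {"ETag": '"v1"'}, {"name": "demo"}


cache = ValidatorCache(fake_fetch)
first = cache.get("/heritage/demo")
second = cache.get("/heritage/demo")  # served from the cache after a 304
print(first, second is first)
```

In the notebook, `fetch` could wrap `api_request(path, headers=conditional_headers)` and return `response.status_code`, `response.headers`, and `response.json()` for non-304 responses.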
