# Scalable visualization of large Delta Lake tables with GEOMETRY columns with DuckDB Spatial MVT and MaplibreGL JS

![Maplibre visualization of a Delta Lake table with a GEOMETRY column, via DuckDB Spatial](img/maplibre.png)

DuckDB Spatial has had [MVT (Mapbox Vector Tile) support](https://github.com/duckdb/duckdb-spatial/issues/241) since version 1.4. This means that we can feed MVTs to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson [here in an example Flask app](https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955).

(Another nice tool to consume DuckDB MVTs would be Martin; this is tracked in this [issue](https://github.com/maplibre/martin/issues/1693).)

But how do we efficiently generate MVTs from a Delta Lake table containing a GEOMETRY column, given the tile indices?

The key thing to consider is that Databricks [now](https://www.databricks.com/blog/introducing-spatial-sql-databricks-80-functions-high-performance-geospatial-analytics) has very efficient spatial join filtering via e.g. `ST_Intersects`, especially if what you are filtering for is a constant. As a result, the following query can run in under a second against e.g. a billion polygons such as the Overture Maps buildings (note that we no longer use any spatial grid or bounding box filters):


```sql
select
  geometry
from
  t
where
  st_intersects(
    geometry,
    st_geomfromtext(
      'POLYGON ((5.44921875 52.160454557747045, 5.44921875 52.2143386082582, 5.537109374999999 52.2143386082582, 5.537109374999999 52.160454557747045, 5.44921875 52.160454557747045))'
    )
  )
limit 30000
```
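The constant polygon in the query above is the lon/lat footprint of a single XYZ tile. As a rough sketch of where such a polygon comes from, assuming the standard slippy-map tiling scheme, the bounds can be derived in plain Python. The helper names `tile_bounds_lonlat` and `bounds_to_wkt` are hypothetical and only for illustration; the app further down uses DuckDB's `ST_TileEnvelope` plus `st_transform` for this step instead.

```python
import math


def tile_bounds_lonlat(z: int, x: int, y: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for an XYZ (slippy-map) tile."""
    n = 2**z  # number of tiles per axis at this zoom level

    def lon(col: int) -> float:
        return col / n * 360.0 - 180.0

    def lat(row: int) -> float:
        # Inverse Web Mercator; row 0 is the northern edge of the map
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def bounds_to_wkt(west: float, south: float, east: float, north: float) -> str:
    """Build a closed WKT polygon ring for a lon/lat bounding box."""
    ring = [(west, south), (west, north), (east, north), (east, south), (west, south)]
    return "POLYGON ((" + ", ".join(f"{px} {py}" for px, py in ring) + "))"


# Tile indices chosen for illustration (zoom 12, around Amersfoort, NL)
west, south, east, north = tile_bounds_lonlat(12, 2110, 1349)
print(bounds_to_wkt(west, south, east, north))
```

For these indices the printed polygon should match the one in the query above, up to floating-point rounding.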

::: {.callout-note}

For this to be fast, the column really has to be of type [GEOMETRY](https://docs.databricks.com/aws/en/sql/language-manual/data-types/geometry-type) (or probably GEOGRAPHY) rather than WKB. As of writing, the [CARTO-provided tables such as `carto_overture_maps_buildings.carto.building`](https://marketplace.databricks.com/details/ccacdfa3-b85d-4065-bd70-efa673c197e1/CARTO_Overture-Maps-Buildings) still store WKB.

:::

## Create a sample table

Here we create a sample table of buildings in the Netherlands. The same approach also worked for me with all 2.5B Overture Maps buildings worldwide, but if you try to persist that table, you will probably run into the daily usage limit of _Databricks Free Edition_, as I did.

```python
OVERTUREMAPS_RELEASE = "2025-10-22.0"
COUNTRY_CODE = "NL"

country_bbox = (
    spark.read.parquet(
        f"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area"
    )
    .where(f"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'")
    .select("bbox.*")
    .toPandas()
    .iloc[0]
)
```

```python
from pyspark.sql import functions as F

spark.read.parquet(
    f"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building"
).where(
    f"""bbox.xmin < {country_bbox["xmax"]}
        and bbox.xmax > {country_bbox["xmin"]}
        and bbox.ymin < {country_bbox["ymax"]}
        and bbox.ymax > {country_bbox["ymin"]}
        """
).withColumn("geometry", F.expr("st_geomfromwkb(geometry)")).write.mode(
    "overwrite"
).saveAsTable("workspace.default.building_geom")
```
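The `where` clause above is a plain bounding-box overlap test between each building and the country. A minimal sketch of the same predicate in Python, with a hypothetical helper `bboxes_overlap` and made-up coordinates (not taken from Overture Maps), may make the four comparisons easier to read:

```python
from typing import Mapping


def bboxes_overlap(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if two axis-aligned boxes (xmin/xmax/ymin/ymax) overlap."""
    return (
        a["xmin"] < b["xmax"]
        and a["xmax"] > b["xmin"]
        and a["ymin"] < b["ymax"]
        and a["ymax"] > b["ymin"]
    )


# Illustrative boxes only: a rough country box and two small building boxes
country = {"xmin": 3.3, "xmax": 7.3, "ymin": 50.7, "ymax": 53.6}
inside = {"xmin": 5.38, "xmax": 5.39, "ymin": 52.15, "ymax": 52.16}
outside = {"xmin": 2.34, "xmax": 2.35, "ymin": 48.85, "ymax": 48.86}

print(bboxes_overlap(inside, country))   # the building box lies within the country box
print(bboxes_overlap(outside, country))  # the building box lies far to the south-west
```

Because the test uses the country's bounding box rather than its outline, the resulting table presumably also contains some buildings just across the border, which is fine for a sample table.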

Now we can build on Maxxen's [gist](https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955), with the following adjustments:

- We keep DuckDB doing the MVT generation, including the preprocessing step of calculating the [ST_TileEnvelope](https://duckdb.org/docs/stable/core_extensions/spatial/functions#st_tileenvelope) for the tiles needed for the current viewport, but we need Databricks SQL to do the actual spatial filtering of our Delta table (DuckDB `delta_scan` currently [does not read](https://github.com/duckdb/duckdb-delta/issues/248) GEOMETRY data types).
   - An alternative approach could be to [wrap the used DuckDB functions into Spark UDFs](../stfunctions/duckdb_udf.ipynb), if we wanted to move some compute from the Flask app to DBSQL.
- For DBSQL we use the [Python `databricks-sql-connector`](https://docs.databricks.com/aws/en/dev-tools/python-sql-connector), authenticating with a [Personal Access Token](https://docs.databricks.com/aws/en/dev-tools/python-sql-connector#databricks-personal-access-token-authentication) -- for serious work, you'd want to use OAuth instead.
- **Graceful feature limit.** What to do if a tile has too many features? A common solution would be to define a minimum zoom level, but this would make it very cumbersome to move around the map, so we define a `MAX_FEATURES_PER_TILE` instead. If this limit is reached, we fail gracefully and only show the tile boundaries; the user only needs to zoom in further to reveal all the features within that viewport.
- MVT expects SRID 3857 (Web Mercator), while our table is probably in another SRID (here lon/lat, `OGC:CRS84`), so we use `st_transform` in both directions: the tile envelope goes from 3857 to the table's SRID for the filter, and the matching geometries go back to 3857 for the tile.
- **Tile throttling.** We also added JS code under `// ---- Throttle building tile loading ----` that removes the building layers when a zoom or move interaction starts and re-adds them 2 seconds after the interaction ends, in order to avoid overloading the warehouse with tile requests and therefore avoid tile queueing.
   - Note that in the current implementation this means that during zooming and moving the map, the feature layer is temporarily not visible -- this probably could be improved. For example, without tile throttling, the objects would remain visible during zoom/pan, but we would need to wait much longer for the results after a big move.
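To make the SRID point above more concrete, here is a minimal sketch of the spherical Web Mercator formulas that a transform between lon/lat and EPSG:3857 boils down to. The function names are hypothetical, and the code is only meant as an illustration of the math; the app itself relies on `st_transform` for this.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis)


def lonlat_to_webmercator(lon: float, lat: float) -> tuple[float, float]:
    """Project lon/lat in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def webmercator_to_lonlat(x: float, y: float) -> tuple[float, float]:
    """Inverse projection: Web Mercator metres back to lon/lat in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Round trip for the map center used in the app below
x, y = lonlat_to_webmercator(5.38327, 52.15660)
print(x, y)
print(webmercator_to_lonlat(x, y))
```

The round trip should return the original coordinates up to floating-point error, which is the "there and back" that the two `st_transform` calls in the tile endpoint perform.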

In the video below (showing a table with all 2.5B buildings worldwide, not just one country), you can see the tiling at work. Note how 1) the graceful feature limit means that "too busy" tiles are just shown as rectangles, and 2) zooming and panning pauses the feature layer, but after a short timeout the tiles are drawn, with sub-second latency per tile.
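The feature-limit decision itself is small. As a hedged sketch (the helper `tile_render_mode` is hypothetical and not part of the app), it amounts to checking whether the `LIMIT` of the warehouse query was hit:

```python
MAX_FEATURES_PER_TILE = 30_000  # same value as in the app below


def tile_render_mode(row_count: int, limit: int = MAX_FEATURES_PER_TILE) -> str:
    """Return 'outline' if the LIMIT was hit (result may be truncated), else 'features'."""
    return "outline" if row_count >= limit else "features"


print(tile_render_mode(1_234))   # a sparse tile: draw the buildings
print(tile_render_mode(30_000))  # the limit was hit: draw only the tile boundary
```

One consequence of this design is that a tile containing exactly `limit` features is also drawn as an outline. If that distinction mattered, one option (an assumption, not something the app does) would be to query `limit + 1` rows and compare against `limit`.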

Find the full code below the video, which you can run [locally](https://flask.palletsprojects.com/en/stable/quickstart/) as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).


[![Watch the video](img/maplibre_vid.png)](https://www.youtube.com/watch?v=6d87gGNTyRg)

(Click on the image above to play the video.)

::: {.callout-note}

What if you find this approach still too "slow" from the end user's standpoint? Then you can use [PMTiles](./PMTiles.ipynb). The difference is that the MVT approach reads the Delta Lake table directly, whereas PMTiles have to be generated first, which means extra compute and time.

:::

```python
# Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson

import os

import duckdb
import flask

from databricks import sql  # type: ignore

MAX_FEATURES_PER_TILE = 30_000

# Initialize Flask app
app = flask.Flask(__name__)

config = {"allow_unsigned_extensions": "true"}
duckdb_con = duckdb.connect(config=config)

duckdb_con.execute("INSTALL spatial")

duckdb_con.execute("load spatial")


dbx_con = sql.connect(
    server_hostname=os.getenv("DATABRICKS_SERVER_HOSTNAME"),
    http_path=os.getenv("DATABRICKS_HTTP_PATH"),
    access_token=os.getenv("DATABRICKS_TOKEN"),
)


# Tile endpoint to serve vector tiles
@app.route("/tiles/<int:z>/<int:x>/<int:y>.pbf")
def get_tile(z, x, y):
    # Compute the tile envelope with DuckDB and reproject it from `EPSG:3857`
    # (Web Mercator) to `OGC:CRS84` (lon/lat), the CRS our Delta table is assumed to use

    # Use con.cursor() to avoid threading issues with Flask
    with duckdb_con.cursor() as local_con:
        tileenv = local_con.execute(
            """
            select st_astext(st_transform(
            st_tileenvelope($1, $2, $3),
            'EPSG:3857',
            'OGC:CRS84'
            ))
            """,
            [z, x, y],
        ).fetchone()

    query = f"""
        select
         st_aswkb(geometry) as geometry
        from
        `workspace`.`default`.`building_geom`
            where st_intersects(geometry, st_geomfromtext('{tileenv[0]}'))
            limit {MAX_FEATURES_PER_TILE}"""

    with dbx_con.cursor() as cursor:
        cursor.execute(query)
        # `da` looks unused, but DuckDB reads it by name below (replacement scan)
        da = cursor.fetchall_arrow()  # noqa: F841

    # Use con.cursor() to avoid threading issues with Flask
    with duckdb_con.cursor() as local_con:
        tile_blob = None
        tile_count = local_con.execute(
            """
            select count(*) cnt from da
            """
        ).fetchone()[0]
        if tile_count == MAX_FEATURES_PER_TILE:
            # If we hit the limit, return only the tile outline instead of incomplete data
            tile_blob = local_con.execute(
                """
                select ST_AsMVT({
                    "geometry": ST_AsMVTGeom(
                        ST_TileEnvelope($1, $2, $3),
                        ST_Extent(ST_TileEnvelope($1, $2, $3))
                        )
                    }) 
                """,
                [z, x, y],
            ).fetchone()
        else:
            tile_blob = local_con.execute(
                """
                select ST_AsMVT({
                    "geometry": ST_AsMVTGeom(
                        st_transform(st_geomfromwkb(geometry), 'OGC:CRS84', 'EPSG:3857'),
                        ST_Extent(ST_TileEnvelope($1, $2, $3))
                        )
                    }) from da
                """,
                [z, x, y],
            ).fetchone()

        # Send the tile data as a response
        tile = tile_blob[0] if tile_blob and tile_blob[0] else b""
        return flask.Response(tile, mimetype="application/x-protobuf")


# HTML content for the index page
INDEX_HTML = """
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Vector Tile Viewer</title>
    <meta name="viewport" content="initial-scale=1,maximum-scale=1,user-scalable=no">
    <script src='https://unpkg.com/maplibre-gl@3.6.2/dist/maplibre-gl.js'></script>
    <link href='https://unpkg.com/maplibre-gl@3.6.2/dist/maplibre-gl.css' rel='stylesheet' />
    <style>
        body { margin: 0; padding: 0; }
        #map { position: absolute; top: 0; bottom: 0; width: 100%; }
    </style>
</head>
<body>
<div id="map"></div>
<script>
    const map = new maplibregl.Map({
        container: 'map',
        style: {
            version: 8,
            sources: {
                'buildings': {
                    type: 'vector',
                    tiles: [`${window.location.origin}/tiles/{z}/{x}/{y}.pbf`]
                },
                // Also use a public open source basemap
                'osm': {
                    type: 'raster',
                    tiles: [
                        'https://a.tile.openstreetmap.org/{z}/{x}/{y}.png',
                        'https://b.tile.openstreetmap.org/{z}/{x}/{y}.png',
                        'https://c.tile.openstreetmap.org/{z}/{x}/{y}.png'
                    ],
                    tileSize: 256
                }
            },
            layers: [
                {
                    id: 'background',
                    type: 'background',
                    paint: { 'background-color': '#a0c8f0' }
                },
                {
                    id: 'osm',
                    type: 'raster',
                    source: 'osm'
                },
                {
                    id: 'buildings-fill',
                    type: 'fill',
                    source: 'buildings',
                    'source-layer': 'layer',
                    paint: {
                        'fill-color': 'blue',
                        'fill-opacity': 0.6,
                        'fill-outline-color': '#ffffff'
                    }
                },
                {
                    id: 'buildings-stroke',
                    type: 'line',
                    source: 'buildings',
                    'source-layer': 'layer',
                    paint: {
                        'line-color': 'black',
                        'line-width': 0.5
                    }
                }
            ]
        },
        // Center the map on Amersfoort (NL)
        center: [5.38327, 52.15660],
        zoom: 12,
        prefetchZoomDelta: 0, // disables zoom-level prefetch
        refreshExpiredTiles: false, // don't re-request tiles that have expired

    });

    map.addControl(new maplibregl.NavigationControl());

    // Add click handler to show feature properties
    map.on('click', 'buildings-fill', (e) => {
        const coordinates = e.lngLat;
        const properties = e.features[0].properties;

        let popupContent = '<h3>Building Properties</h3>';
        for (const [key, value] of Object.entries(properties)) {
            popupContent += `<p><strong>${key}:</strong> ${value}</p>`;
        }

        new maplibregl.Popup()
            .setLngLat(coordinates)
            .setHTML(popupContent)
            .addTo(map);
    });

    // Change cursor on hover
    map.on('mouseenter', 'buildings-fill', () => {
        map.getCanvas().style.cursor = 'pointer';
    });

    map.on('mouseleave', 'buildings-fill', () => {
        map.getCanvas().style.cursor = '';
    });


// ---- Throttle building tile loading ----
let reloadTimeout;

function removeBuildingLayers() {
    if (map.getLayer('buildings-fill')) map.removeLayer('buildings-fill');
    if (map.getLayer('buildings-stroke')) map.removeLayer('buildings-stroke');
    if (map.getSource('buildings')) map.removeSource('buildings');
}

function addBuildingLayers() {
    if (map.getSource('buildings')) return;

    map.addSource('buildings', {
        type: 'vector',
        tiles: [`${window.location.origin}/tiles/{z}/{x}/{y}.pbf`]
    });

    map.addLayer({
        id: 'buildings-fill',
        type: 'fill',
        source: 'buildings',
        'source-layer': 'layer',
        paint: {
            'fill-color': 'blue',
            'fill-opacity': 0.6,
            'fill-outline-color': '#ffffff'
        }
    });

    map.addLayer({
        id: 'buildings-stroke',
        type: 'line',
        source: 'buildings',
        'source-layer': 'layer',
        paint: {
            'line-color': 'black',
            'line-width': 0.5
        }
    });
}

// When user starts moving or zooming
function onInteractionStart() {
    clearTimeout(reloadTimeout);
    removeBuildingLayers();
}

// When user stops moving or zooming
function onInteractionEnd() {
    clearTimeout(reloadTimeout);
    reloadTimeout = setTimeout(() => {
        addBuildingLayers();
    }, 2000);
}

// Bind to move & zoom events
map.on('movestart', onInteractionStart);
map.on('moveend', onInteractionEnd);
map.on('zoomstart', onInteractionStart);
map.on('zoomend', onInteractionEnd);

</script>
</body>
</html>
"""


# Serve the static HTML file for the index page
@app.route("/")
def index():
    return flask.Response(INDEX_HTML, mimetype="text/html")


if __name__ == "__main__":
    # Start on localhost
    app.run(debug=True)
```
