# Streaming Flatgeobuf

## Setup

In [None]:
%pip install duckdb --quiet

import os

import duckdb

In [None]:
CATALOG = "overturemaps"
SCHEMA = "buildings"
VOLUME = "default"
TABLENAME = "building_nl"

table_fullname = f"{CATALOG}.{SCHEMA}.{TABLENAME}"

In [None]:
duckdb.sql("install spatial; load spatial")

::: {.callout-note}

If `install spatial` fails (especially on classic compute, as opposed to the Free Edition or Serverless Compute), check whether HTTP is blocked on your (corporate) network. If so, work around it as described [here](../appendix/https_install_duckdbextension.ipynb).

:::

## Delta Lake to DuckDB via Temporary Table Credentials

The `delta` extension of DuckDB does not support GEOMETRY types yet (as of July 2025), so the approach below only makes sense if your geometry column is still stored as WKB (or WKT).
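
As a quick sanity check for the WKB case, the header of a single WKB value can be inspected in plain Python. The sketch below is illustrative only: `wkb_geometry_type` is a hypothetical helper (not part of DuckDB or the Databricks SDK), and it reads nothing beyond the byte-order flag and the geometry type code at the start of the value.

```python
import struct


def wkb_geometry_type(blob: bytes) -> int:
    """Return the geometry type code from a WKB header (hypothetical helper)."""
    if len(blob) < 5:
        raise ValueError("too short to be WKB")
    # First byte is the byte-order flag: 0 = big endian, 1 = little endian.
    byte_order = blob[0]
    if byte_order not in (0, 1):
        raise ValueError("invalid WKB byte-order flag")
    fmt = "<I" if byte_order == 1 else ">I"
    # The next four bytes hold the geometry type (1 = Point, 3 = Polygon, ...).
    (geom_type,) = struct.unpack(fmt, blob[1:5])
    return geom_type


# POINT (1 2) encoded as little-endian WKB.
point_wkb = bytes.fromhex("0101000000000000000000F03F0000000000000040")
print(wkb_geometry_type(point_wkb))
```

A value that raises here is probably not WKB, in which case `st_geomfromwkb` would not be the right conversion for that column.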

In [None]:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import TableOperation

w = WorkspaceClient()

In [None]:
# Look up the table once and reuse it for both the table ID and the storage location.
table_info = w.tables.get(table_fullname)

ttc = w.temporary_table_credentials.generate_temporary_table_credentials(
    operation=TableOperation.READ,
    table_id=table_info.table_id,
)

metastore_region = w.metastores.get(w.metastores.current().metastore_id).region

storage_location = table_info.storage_location

In [None]:
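# Expose the temporary credentials as environment variables so that
# DuckDB's credential_chain provider can pick them up.
os.environ["AWS_ACCESS_KEY_ID"] = ttc.aws_temp_credentials.access_key_id
os.environ["AWS_SECRET_ACCESS_KEY"] = ttc.aws_temp_credentials.secret_access_key
os.environ["AWS_SESSION_TOKEN"] = ttc.aws_temp_credentials.session_token
os.environ["AWS_DEFAULT_REGION"] = metastore_region

# Create an S3 secret that resolves credentials from the environment set above.
duckdb.sql("""
CREATE OR REPLACE SECRET (
    TYPE s3,
    PROVIDER credential_chain
)""")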

In [None]:
duckdb.sql(f"""
select * replace (st_geomfromwkb(geometry) as geometry)
from
delta_scan('{storage_location}')
""")

### Side story: Streaming Flatgeobuf

We can also use DuckDB to convert the same Delta table into a Flatgeobuf file, which can, for example, be streamed very efficiently to QGIS:

In [None]:
fgb_volume_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/fgb/{TABLENAME}.fgb"
!mkdir -p "$(dirname {fgb_volume_path})"

duckdb.sql(
    f"""COPY (
    select
        * exclude(geometry, bbox, sources, names),  -- structs are not supported in fgb
        st_geomfromwkb(geometry) as geometry,
        bbox.*,
        names.primary
    from delta_scan('{storage_location}')
    limit 100
) TO '{fgb_volume_path}' (
    FORMAT GDAL,
    DRIVER flatgeobuf,
    LAYER_CREATION_OPTIONS 'TEMPORARY_DIR=/tmp/'
)
"""
)
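
Before pointing QGIS at the result, it can be worth confirming that the output really is a Flatgeobuf file. The following is a minimal sketch, assuming the file starts with eight magic bytes laid out as `fgb`, a major version byte, `fgb` again, and a patch version byte. `looks_like_flatgeobuf` is a hypothetical helper, not a DuckDB or GDAL function.

```python
def looks_like_flatgeobuf(path: str) -> bool:
    """Check for the 'fgb' markers in the first 8 bytes of a file (hypothetical helper)."""
    with open(path, "rb") as f:
        magic = f.read(8)
    # Assumed layout: b"fgb", major version, b"fgb", patch version.
    # The version bytes are ignored so the check tolerates different releases.
    return len(magic) == 8 and magic[0:3] == b"fgb" and magic[4:7] == b"fgb"
```

In this notebook the call would be `looks_like_flatgeobuf(fgb_volume_path)`.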

You can download the Flatgeobuf file above and open it in QGIS. Even better, with a personal access token (PAT), you can stream it via the Files API. Copy the output of the cell below into the source field of a new vector layer in QGIS, replacing `<INSERT PAT>` with your actual PAT:

In [None]:
print(
    f"/vsicurl?header.Authorization=Bearer%20<INSERT PAT>&url=https://{spark.conf.get('spark.databricks.workspaceUrl')}/api/2.0/fs/files{fgb_volume_path}"
)
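
The same source string can also be assembled outside a notebook, where `spark` is not available. The sketch below is one way to do it, with the assumptions labeled: `build_vsicurl_source` is a hypothetical helper, and the host name and token are placeholders. It produces the same layout as the cell above and percent-encodes only the header value.

```python
from urllib.parse import quote


def build_vsicurl_source(workspace_host: str, volume_path: str, token: str) -> str:
    """Build a /vsicurl source string for a file in a volume (hypothetical helper)."""
    # Percent-encode the header value so the space in "Bearer <token>" becomes %20.
    auth = quote(f"Bearer {token}", safe="")
    # The Files API path is the volume path appended to /api/2.0/fs/files.
    url = f"https://{workspace_host}/api/2.0/fs/files{volume_path}"
    return f"/vsicurl?header.Authorization={auth}&url={url}"


# Placeholder values for illustration only; never hard-code a real PAT.
print(
    build_vsicurl_source(
        "example.cloud.databricks.com",
        "/Volumes/overturemaps/buildings/default/fgb/building_nl.fgb",
        "dapi-example-token",
    )
)
```

Keeping the token as a function argument makes it easier to read it from a secret store or an environment variable instead of pasting it into the notebook.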