# Streaming FlatGeobuf

## Setup

In [0]:
%pip install duckdb --quiet

import duckdb
import os

In [0]:
CATALOG = "overturemaps"
SCHEMA = "buildings"
VOLUME = "default"
TABLENAME = "building_nl"

table_fullname = f"{CATALOG}.{SCHEMA}.{TABLENAME}"

In [0]:
duckdb.sql("install spatial; load spatial")

### Side story: Streaming FlatGeobuf
We can write the table out as a Parquet copy and then use DuckDB to convert that copy into a FlatGeobuf file, which can, for example, be streamed very efficiently to QGIS:

(In theory we could read the Delta table directly with `from delta_scan()`, but that did not seem to work.)

In [0]:
parquet_volume_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/parquet/{TABLENAME}.parquet"

# Spark writes a directory of part files at this path, not a single file.
spark.table(table_fullname).write.mode('overwrite').parquet(parquet_volume_path)

In [0]:
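fgb_volume_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/fgb/{TABLENAME}.fgb"

# Make sure the target directory exists before GDAL tries to write into it.
os.makedirs(os.path.dirname(fgb_volume_path), exist_ok=True)

# Read the Spark part files, convert the WKB geometry column into a DuckDB
# GEOMETRY, and write the result with the GDAL FlatGeobuf driver.
duckdb.sql(
    f"""COPY (
    select * replace(st_geomfromwkb(geometry) as geometry)
    from read_parquet('{parquet_volume_path}/part-*.parquet')
    limit 100  -- only a 100-row sample; remove the limit to export everything
) TO '{fgb_volume_path}' (
    FORMAT GDAL,
    DRIVER flatgeobuf,
    LAYER_CREATION_OPTIONS 'TEMPORARY_DIR=/tmp/'
)
"""
)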
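
Before pointing QGIS at the file, it can be worth a quick sanity check that the export really produced a FlatGeobuf file. The sketch below is not part of the original notebook: `looks_like_flatgeobuf` is a hypothetical helper that only inspects the first eight bytes, on the assumption that a FlatGeobuf file begins with the signature `fgb`, a version byte, `fgb` again, and a patch byte. It does not validate the rest of the file.

```python
def looks_like_flatgeobuf(path: str) -> bool:
    """Return True if the file starts with what looks like a FlatGeobuf signature.

    Hypothetical helper for illustration; it checks the magic bytes only.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    # Bytes 0-2 and 4-6 are expected to spell "fgb"; bytes 3 and 7 carry
    # version numbers, which this sketch deliberately ignores.
    return len(header) == 8 and header[0:3] == b"fgb" and header[4:7] == b"fgb"
```

In the notebook this could be called as `looks_like_flatgeobuf(fgb_volume_path)` right after the `COPY` statement.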

You can download the FlatGeobuf file written above and open it in QGIS. Even better, with a personal access token (PAT) you can stream it via the Files API. Copy the result of the cell below into the source field of a new vector layer in QGIS, replacing `<INSERT PAT>` with your actual PAT:

In [0]:
f"/vsicurl?header.Authorization=Bearer%20<INSERT PAT>&url=https://{spark.conf.get('spark.databricks.workspaceUrl')}/api/2.0/fs/files{fgb_volume_path}"
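
The same source string can also be assembled by a small function, which makes the three parts (authorization header, workspace host, volume path) explicit and percent-encodes them. This is a minimal sketch rather than the notebook's own method: `build_vsicurl_source` and its parameter names are hypothetical, and the host and token in the usage note are made-up placeholders.

```python
from urllib.parse import quote


def build_vsicurl_source(workspace_host: str, volume_path: str, token: str) -> str:
    """Build a GDAL /vsicurl source string for a file served by the Files API.

    Hypothetical helper for illustration. `volume_path` is the absolute
    /Volumes/... path of the file; `token` is a personal access token.
    """
    # Percent-encode the header value so the space after "Bearer" becomes %20.
    auth_header = quote(f"Bearer {token}", safe="")
    # Keep "/" unescaped in the path, but escape anything else that is unsafe.
    file_url = f"https://{workspace_host}/api/2.0/fs/files{quote(volume_path)}"
    # The url option goes last, matching the string built in the cell above.
    return f"/vsicurl?header.Authorization={auth_header}&url={file_url}"
```

For example, `build_vsicurl_source(spark.conf.get('spark.databricks.workspaceUrl'), fgb_volume_path, token)` should yield the same shape of string as the cell above, with the token already filled in. Keep in mind that the result contains the PAT in clear text, so avoid leaving it in saved notebook output.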