# Read in a Delta Lake table with a geometry column in DuckDB

The best method depends on whether the table (or the sample you need) fits into (driver) memory or not.
- If it does, you can simply go through Arrow.
- If not, you can write out a copy of your data to plain Parquet file(s) in a Volume, which DuckDB can read.
- Alternatively, you can use the Delta extension of DuckDB, but this comes with some limitations.
- Finally, if your data set is so large that you want to avoid the copy, you can use Temporary Table Credentials, but this requires extra permissions on the Unity Catalog object and for the caller.
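As a rough illustration of the first decision (fits in memory or not), the sketch below compares a size estimate against driver memory. For a Delta table, `DESCRIBE DETAIL <table>` reports `sizeInBytes`, but that is the compressed on-disk size. The helper `choose_method`, its expansion factor and its safety fraction are hypothetical assumptions, not Databricks or DuckDB defaults, and it ignores the Delta extension and Temporary Table Credentials options.

```python
# Hypothetical helper: choose between the Arrow route and the Parquet-copy route
# from a rough size estimate. The 3x expansion factor and the 0.5 safety fraction
# are illustrative assumptions only.
def choose_method(on_disk_bytes: int, driver_memory_bytes: int,
                  expansion_factor: float = 3.0,
                  safety_fraction: float = 0.5) -> str:
    """Return "arrow" if the data plausibly fits in driver memory, else "parquet_copy"."""
    if on_disk_bytes < 0 or driver_memory_bytes <= 0:
        raise ValueError("sizes must be non-negative and memory must be positive")
    # Compressed Parquet/Delta files usually grow when materialised as Arrow in memory.
    estimated_in_memory = on_disk_bytes * expansion_factor
    # Leave headroom for Spark itself and for DuckDB's working memory.
    if estimated_in_memory <= driver_memory_bytes * safety_fraction:
        return "arrow"
    return "parquet_copy"


GiB = 1024 ** 3
print(choose_method(1 * GiB, 16 * GiB))   # arrow
print(choose_method(10 * GiB, 16 * GiB))  # parquet_copy
```

Tune both factors to your own data; a wide table with long strings can expand far more than a narrow numeric one.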

## Delta Lake to DuckDB via Arrow

If your (sample) data fits into driver memory, convert the Spark DataFrame to a PyArrow table with `toArrow()` and query that table directly from DuckDB:


In [0]:
dfa = spark.sql("select st_point(1, 2, 28992) as g").toArrow()
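Judging from the DuckDB query further down, which reads `g.srid` and `g.wkb`, the geometry column arrives in Arrow as a struct holding an SRID and a Well-Known Binary (WKB) payload. To make the `wkb` field less opaque, here is a stdlib-only sketch of how a 2D point such as `st_point(1, 2)` is encoded in little-endian WKB and decoded again. `point_to_wkb` and `wkb_to_point` are hypothetical helpers for illustration; the exact bytes Databricks emits (for example its byte order) are an assumption.

```python
import struct

# WKB layout of a 2D point:
#   1 byte   byte-order marker (1 = little-endian, 0 = big-endian)
#   4 bytes  geometry type as uint32 (1 = Point)
#   8 bytes  x as IEEE 754 double
#   8 bytes  y as IEEE 754 double
def point_to_wkb(x: float, y: float) -> bytes:
    # "<" = little-endian, no padding: 1 + 4 + 8 + 8 = 21 bytes.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb: bytes) -> tuple[float, float]:
    # The first byte selects the byte order of everything that follows.
    if wkb[0] == 1:
        fmt = "<"
    elif wkb[0] == 0:
        fmt = ">"
    else:
        raise ValueError(f"invalid WKB byte-order marker: {wkb[0]}")
    (geom_type,) = struct.unpack_from(fmt + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a 2D point, geometry type {geom_type}")
    x, y = struct.unpack_from(fmt + "dd", wkb, 5)
    return x, y


wkb = point_to_wkb(1, 2)
print(len(wkb), wkb.hex())  # 21 0101000000000000000000f03f0000000000000040
print(wkb_to_point(wkb))    # (1.0, 2.0)
```

Plain WKB has no slot for an SRID (that is what the EWKB variant adds), which is consistent with the SRID travelling as a separate struct field.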

In [0]:
%pip install duckdb --quiet

import duckdb 

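> [!NOTE]
> If the `install spatial` step below stalls, plain HTTP traffic is probably blocked on your cluster. The next cell works around this by downloading the `httpfs` extension over HTTPS, installing it from the local file, and switching the extension repository to HTTPS; see [TODO: link] for details.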

In [0]:
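# Set to True if outbound plain-HTTP traffic is blocked (HTTPS still allowed).
HTTP_BLOCKED = True

if HTTP_BLOCKED:
    import os
    from urllib.parse import urlparse
    import requests

    # DuckDB platform string of the driver; "linux_amd64" assumes an x86-64 machine.
    ARCHITECTURE = "linux_amd64"
    duckdb_version = duckdb.__version__
    # Fetch httpfs over HTTPS instead of DuckDB's default http:// extension repository.
    url = f"https://extensions.duckdb.org/v{duckdb_version}/{ARCHITECTURE}/httpfs.duckdb_extension.gz"

    output_file = os.path.basename(urlparse(url).path)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(output_file, "wb") as f:
        f.write(response.content)

    # Install httpfs from the downloaded file, then clean up.
    duckdb.install_extension(output_file)
    os.remove(output_file)

    # With httpfs available, install all further extensions (e.g. spatial) over HTTPS.
    duckdb.sql("SET custom_extension_repository='https://extensions.duckdb.org'")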
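The workaround cell hardcodes `linux_amd64`, so on an ARM-based driver it would download a binary for the wrong platform. A small, hypothetical helper makes the URL pattern explicit for any extension name and platform string. DuckDB can report the platform it was built for with `PRAGMA platform`, which is a natural value to pass in instead of a constant.

```python
# Hypothetical helper: build the download URL of a DuckDB extension, following the
# same URL pattern as the workaround cell above.
def extension_url(version: str, platform: str, name: str,
                  repository: str = "https://extensions.duckdb.org") -> str:
    # duckdb.__version__ has no leading "v", but the repository path expects one.
    if not version.startswith("v"):
        version = f"v{version}"
    return f"{repository.rstrip('/')}/{version}/{platform}/{name}.duckdb_extension.gz"


print(extension_url("1.1.3", "linux_amd64", "httpfs"))
# https://extensions.duckdb.org/v1.1.3/linux_amd64/httpfs.duckdb_extension.gz
```

In the notebook, the call would look like `extension_url(duckdb.__version__, platform, "httpfs")`, where `platform` is whatever your driver reports.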

In [0]:
duckdb.sql("install spatial; load spatial")

In [0]:
duckdb.sql("select g.srid, st_geomfromwkb(g.wkb) geometry from dfa")

## Delta Lake to DuckDB via a Parquet copy in Volumes

In [0]:
CATALOG = "mainworkspace_1863054340605750"
SCHEMA = "dsparing"
VOLUME = "default"
VOLUME_PATH = "/"
TABLENAME = "test"

In [0]:
volume_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}{VOLUME_PATH}{TABLENAME}.parquet"

spark.sql("select st_point(id, id) geometry from range(100)").write.mode('overwrite').parquet(volume_path)
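The f-string above only yields a valid path if `VOLUME_PATH` starts and ends with `/`; with `VOLUME_PATH = "exports"` it would produce `.../defaultexportstest.parquet`. A hypothetical helper built on `posixpath.join` avoids that. Note that Spark writes a directory of part files at this path, not a single file, despite the `.parquet` suffix.

```python
import posixpath

# Hypothetical helper: build /Volumes/<catalog>/<schema>/<volume>/<subdir>/<table>.parquet
# regardless of whether subdir carries leading or trailing slashes.
def volume_parquet_path(catalog: str, schema: str, volume: str,
                        subdir: str, table: str) -> str:
    for label, value in (("catalog", catalog), ("schema", schema),
                         ("volume", volume), ("table", table)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty name without '/': {value!r}")
    inner = subdir.strip("/")  # "/" means the volume root
    parts = ["/Volumes", catalog, schema, volume] + ([inner] if inner else [])
    return posixpath.join(*parts, f"{table}.parquet")


print(volume_parquet_path("main", "dsparing", "default", "exports", "test"))
# /Volumes/main/dsparing/default/exports/test.parquet
```

In the notebook you would call `volume_parquet_path(CATALOG, SCHEMA, VOLUME, VOLUME_PATH, TABLENAME)`.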

In [0]:
!ls {volume_path}/part-*.parquet
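`!ls` relies on the notebook's shell magic. From plain Python, `glob` gives the same list of part files while skipping marker files such as `_SUCCESS` that Spark writes next to them, and DuckDB's `read_parquet` also accepts an explicit list of files. The helpers `part_files` and `read_parquet_sql` are hypothetical; since `read_parquet` already expands `part-*.parquet` itself, this is mainly useful when you want to inspect or filter the files in Python first.

```python
import glob
import os

# Hypothetical helper: list the Parquet part files in a Spark output directory.
def part_files(directory: str) -> list[str]:
    return sorted(glob.glob(os.path.join(directory, "part-*.parquet")))


# Hypothetical helper: build a DuckDB read_parquet([...]) call over explicit paths.
def read_parquet_sql(paths: list[str]) -> str:
    if not paths:
        raise ValueError("no Parquet part files found")
    # SQL string literals escape a single quote by doubling it.
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in paths)
    return f"read_parquet([{quoted}])"
```

For example: `duckdb.sql(f"select st_geomfromwkb(geometry) geometry from {read_parquet_sql(part_files(volume_path))}")`.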

In [0]:
duckdb.sql(f"select st_geomfromwkb(geometry) geometry from read_parquet('{volume_path}/part-*.parquet')")

## Delta Lake to DuckDB via Temporary Table Credentials

