# DuckDB on Databricks

[DuckDB](https://duckdb.org/) is a formidable new single-machine analytics tool, tracing its origins to the same Dutch research institute as Python. Crucially for this guide, it comes with a remarkably good [Spatial extension](https://duckdb.org/docs/stable/core_extensions/spatial/overview.html).

While Databricks comes with its own set of geospatial features, such as [H3 functions](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-h3-geospatial-functions), nothing stops you from using DuckDB on the side as well.

(Keep in mind, though, that while much of Databricks's tooling, namely Apache Spark, is built for big data analysis on multi-node clusters, DuckDB runs on a single node only, just like Pandas would. So use single-node clusters.)


## Running DuckDB on Databricks

In [0]:
%pip install duckdb --quiet

import duckdb

con = duckdb.connect()

# Install the Spatial Extension:
con.sql("install spatial; load spatial")

This lets you use the DuckDB Spatial functions directly, for example:

In [0]:
con.sql("select st_distance(st_geomfromtext('POINT(0 0)'), st_geomfromtext('POINT(1 1)')) d")

### Spatial functions


### Visualize output

If your data is lon/lat, you can use the built-in point map visualization in Databricks Notebooks by converting the DuckDB result to a Spark DataFrame via Pandas. Once the result is shown, click the `+` icon to the right of the `Table` tab and add the "Map (Markers)" visualization, as shown in the image below.

In [0]:
query = con.sql(
    """
with t as (
    select st_geomfromtext('POINT(0 0)') g
    union all
    select st_geomfromtext('POINT(1 1)') g
)
select st_x(g) lon, st_y(g) lat from t
""")
query

In [0]:
spark.createDataFrame(query.df()).display()


![point_map](img/point_map.png)

Or visualize with lonboard, which also works for other geometry types such as linestrings and polygons:

In [0]:
%pip install lonboard shapely --quiet

from lonboard import viz

In [0]:
query = con.sql(
    """
select st_geomfromtext('POINT(0 0)') g
union all
select st_geomfromtext('POINT(1 1)') g
""")
query

In [0]:
viz(query, con=con).as_html()

### Read Delta Tables with DuckDB

We can read data from a Delta table into DuckDB via Arrow (or Pandas). (This assumes that the data volume is small enough to fit into the memory of a single machine.)

In [0]:
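# `tablename` must hold the name of an existing Delta table with a WKB column `wkb`.
dfa = spark.read.table(tablename).toArrow()

# DuckDB resolves `dfa` in the query to the Arrow table defined above.
# st_length returns the length in the units of the coordinate system,
# so `length_m` assumes a projection measured in meters.
query = con.sql("""
select
    st_length(st_geomfromwkb(wkb)) length_m
from
    dfa;
""")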
query

### Write Delta Tables from DuckDB

If you want to write a result back to a Delta table, you can use Pandas as an intermediary format:

In [0]:
spark.createDataFrame(query.df()).createOrReplaceTempView("t")
# spark.createDataFrame(query.df()).write.saveAsTable("t")

In [0]:
%sql
select * from t