<table class="ee-notebook-buttons" align="left">
    <td><a target="_blank"  href="https://github.com/ac-willeke/urban-climate"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" style="filter: invert(100%)"/> View source on GitHub</a></td>
    <td><a target="_blank"  href="https://colab.research.google.com/github/NINAnor/mapper-soilCondition/blob/main/src/data/GEE_display_data.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" /> Run in Google Colab</a></td>
</table>

# Calculate statistics per district using DuckDB

**Author**: Willeke A'Campo

**Description:** This notebook shows how to calculate Ecosystem Service statistics per district using DuckDB. The results are stored in a new table in the database and exported to GeoJSON.

**Documentation:** 

### Data conversion | GeoJSON to GeoParquet   

In [None]:
import leafmap 
import geopandas as gpd

# Read the geojson data 
municipality = "kristiansand"
raw_path = r"/workspaces/urban-climate/data/01_raw"
v_study_area = gpd.read_file(raw_path + "/study_area.geojson")
v_districts = gpd.read_file(raw_path + "/districts.geojson")
v_bldg = gpd.read_file(raw_path + "/buildings.geojson")
v_res_bldg = gpd.read_file(raw_path + "/residential_buildings.geojson")
v_open_space = gpd.read_file(raw_path + "/open_space.geojson")
v_public_os = gpd.read_file(raw_path + "/public_open_space.geojson")
v_private_os = gpd.read_file(raw_path + "/private_open_space.geojson")
v_crowns = gpd.read_file(raw_path + "/crowns.geojson")

# Add files to dictionary
gdf_vectors = {
    "study_area": v_study_area,
    "districts": v_districts,
    "buildings": v_bldg,
    "residential_buildings": v_res_bldg,
    "open_space": v_open_space,
    "public_open_space": v_public_os,
    "private_open_space": v_private_os,
    "crowns": v_crowns
}

In [None]:
# Convert GeoDataFrame to Parquet
for key, gdf in gdf_vectors.items():
    gdf.to_parquet(
        path = raw_path + "/" + key + ".parquet",
        index = None, 
        compression = "snappy"
    )
    
parquet_dict = {
    "study_area": raw_path + "/study_area.parquet",
    "districts": raw_path + "/districts.parquet",
    "buildings": raw_path + "/buildings.parquet",
    "residential_buildings": raw_path + "/residential_buildings.parquet",
    "open_space": raw_path + "/open_space.parquet",
    "public_open_space": raw_path + "/public_open_space.parquet",
    "private_open_space": raw_path + "/private_open_space.parquet",
    "crowns": raw_path + "/crowns.parquet"
}
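
The path dictionary above repeats every layer name twice. A minimal sketch of deriving it from a list of layer names with `pathlib` instead; the `layers` list and the helper name `parquet_paths` are illustrative and not part of the notebook:

```python
from pathlib import PurePosixPath

# Layer names as used for the GeoJSON files above
layers = [
    "study_area", "districts", "buildings", "residential_buildings",
    "open_space", "public_open_space", "private_open_space", "crowns",
]


def parquet_paths(raw_path, layer_names):
    """Map each layer name to '<raw_path>/<layer>.parquet' (hypothetical helper)."""
    root = PurePosixPath(raw_path)
    return {name: str(root / f"{name}.parquet") for name in layer_names}


parquet_dict = parquet_paths("/workspaces/urban-climate/data/01_raw", layers)
print(parquet_dict["crowns"])
```

Keeping the layer names in one list means a new layer only has to be added in a single place.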

In [None]:
# Create a map 
gdf = leafmap.read_parquet(parquet_dict["study_area"], return_type="gdf", src_crs="EPSG:25832")
leafmap.view_vector(gdf, get_fill_color=[0, 0, 255, 128])

### DuckDB Connection

You can use the following commands to load your data into a DuckDB database:
- `con.sql` runs a SQL query and returns the result as a lazily evaluated relation (`duckdb.DuckDBPyRelation`); nothing is stored unless the query itself creates a table
- `con.execute` runs a SQL statement and makes the result available through fetch methods such as `fetchall()` and `fetchdf()`
   - `con.execute("CREATE TABLE ...")` to create a new table
   - `con.execute("INSERT INTO ...")` to insert data into a table
   - `con.execute("COPY ... FROM ...")` to load data from a file into a table
- `gdf.to_sql` to write a GeoDataFrame to a table in the DuckDB database (this is the pandas method, which expects a SQLAlchemy engine or connection)

con.execute("""
COPY table_name 
FROM 'path_to_your_file.parquet' 
WITH (FORMAT 'PARQUET')
""")

In [None]:
import duckdb

con = duckdb.connect(database=":memory:", read_only=False)
con.install_extension("spatial")
con.load_extension("spatial")

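# Create one table per parquet file, named after its dictionary key.
# The WKB geometry column is converted to a DuckDB GEOMETRY column called geom.
for key, path in parquet_dict.items():
    con.execute(f"CREATE OR REPLACE TABLE {key} AS SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry) AS geom FROM '{path}'")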

# Example of how to load a GeoDataFrame into DuckDB
#gdf.to_sql('table_name', con, if_exists='replace', index=False)

**Create a new Table with Tree Crown Center Points**

In [None]:
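# Create table crowns_xy with the centroid point (geom) and its coordinates (x, y) for each crown
con.execute("CREATE TABLE crowns_xy AS SELECT ST_Centroid(geom) AS geom, ST_X(ST_Centroid(geom)) AS x, ST_Y(ST_Centroid(geom)) AS y FROM crowns")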
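
`ST_Centroid` returns the centre of mass of a geometry. As a rough illustration of what that means for a simple polygon without holes, the area-weighted centroid can be computed with the shoelace formula in plain Python. The function name `polygon_centroid` and the sample shapes are illustrative and not taken from the crown data:

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon.

    `ring` is a list of (x, y) vertices without a repeated closing vertex.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(polygon_centroid(square))
print(polygon_centroid(triangle))
```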

**Split open space, buildings and residential buildings by district**

In [None]:
# create new table split_open_space 
# with open space split by district boundaries
con.execute(
    """
    CREATE TABLE split_open_space AS 
    SELECT
        districts.grunnkretsnummer,
        ST_Intersection(districts.geom, open_space.geom) AS geom
    FROM 
        districts, open_space
    WHERE
        ST_Intersects(districts.geom, open_space.geom)
    """
    )

# create new table split_buildings
# with buildings split by district boundaries
con.execute(
    """
    CREATE TABLE split_buildings AS 
    SELECT
        districts.grunnkretsnummer,
        ST_Intersection(districts.geom, buildings.geom) AS geom
    FROM 
        districts, buildings
    WHERE
        ST_Intersects(districts.geom, buildings.geom)
    """
    )

# create new table split_res_buildings
# with residential buildings split by district boundaries
con.execute(
    """
    CREATE TABLE split_res_buildings AS 
    SELECT
        districts.grunnkretsnummer,
        ST_Intersection(districts.geom, residential_buildings.geom) AS geom
    FROM 
        districts, residential_buildings
    WHERE
        ST_Intersects(districts.geom, residential_buildings.geom)
    """
    )
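
The three statements above differ only in the source layer and the target table. A minimal sketch of generating them from one template, assuming every table exposes its geometry in a column named `geom`; the names `SPLIT_TEMPLATE`, `split_tables` and `split_queries` are illustrative:

```python
SPLIT_TEMPLATE = """
CREATE TABLE {target} AS
SELECT
    districts.grunnkretsnummer,
    ST_Intersection(districts.geom, {source}.geom) AS geom
FROM
    districts, {source}
WHERE
    ST_Intersects(districts.geom, {source}.geom)
"""

# Target table -> source layer, matching the three statements above
split_tables = {
    "split_open_space": "open_space",
    "split_buildings": "buildings",
    "split_res_buildings": "residential_buildings",
}

split_queries = {
    target: SPLIT_TEMPLATE.format(target=target, source=source)
    for target, source in split_tables.items()
}

# With an open DuckDB connection `con`, each statement could then be run as:
# for query in split_queries.values():
#     con.execute(query)
print(split_queries["split_buildings"])
```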

### Generate Columns with Count Statistics

| Name | Alias | Description | Type | Unit |
| --- | --- | --- | --- | --- |
| n_trees | Antall trær | Number of trees in the district | INT | - |
| n_bldg | Antall bygninger | Number of buildings in the district | INT | - |
| n_res_bldg | Antall boliger | Number of residential buildings in the district | INT | - |
| n_res_bldg_near_gs | Antall boliger nær grøntområde (300 m) | Number of residential buildings near green space (300 m) | INT | - |
| n_trees_near_bldg | Antall trær nær boliger (15 m) | Number of trees near residential buildings (15 m) | INT | - |
| n_viewshed | Antall viewshed piksler | Number of viewshed pixels that intersect with the building edge | INT | - |



In [None]:
# Add count columns to the districts table: n_trees, n_bldg, n_res_bldg,
# n_res_bldg_near_gs, n_trees_near_bldg.
# DuckDB accepts one ADD COLUMN clause per ALTER TABLE statement, so add them in a loop.
count_columns = ["n_trees", "n_bldg", "n_res_bldg", "n_res_bldg_near_gs", "n_trees_near_bldg"]
for column in count_columns:
    con.execute(f"ALTER TABLE districts ADD COLUMN {column} INTEGER")

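# Calculate the counts per district.
# The split tables already carry the district id, so they are matched on grunnkretsnummer.
# ST_DWithin compares two single geometries, so "near any feature" is expressed with EXISTS.
# Tree crown centroids (crowns_xy) are used as the green-space layer for n_res_bldg_near_gs.
con.execute(
    """
    UPDATE districts SET
    n_trees = (SELECT COUNT(*) FROM crowns_xy c WHERE ST_Within(c.geom, districts.geom)),
    n_bldg = (SELECT COUNT(*) FROM split_buildings b WHERE b.grunnkretsnummer = districts.grunnkretsnummer),
    n_res_bldg = (SELECT COUNT(*) FROM split_res_buildings r WHERE r.grunnkretsnummer = districts.grunnkretsnummer),
    n_res_bldg_near_gs = (
        SELECT COUNT(*) FROM split_res_buildings r
        WHERE r.grunnkretsnummer = districts.grunnkretsnummer
        AND EXISTS (SELECT 1 FROM crowns_xy c WHERE ST_DWithin(r.geom, c.geom, 300))),
    n_trees_near_bldg = (
        SELECT COUNT(*) FROM crowns_xy c
        WHERE ST_Within(c.geom, districts.geom)
        AND EXISTS (SELECT 1 FROM split_buildings b WHERE ST_DWithin(c.geom, b.geom, 15)))
    """
    )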
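
The two `*_near_*` columns ask whether at least one feature of another layer lies within a given distance. A plain-Python sketch of that "exists within distance" logic on point coordinates, assuming projected coordinates in metres; the function name `count_near` and the sample coordinates are made up for illustration:

```python
from math import dist


def count_near(sources, targets, max_distance):
    """Count the source points that have at least one target point within max_distance."""
    return sum(
        1
        for s in sources
        if any(dist(s, t) <= max_distance for t in targets)
    )


# Hypothetical tree and building locations (x, y) in metres
trees = [(0.0, 0.0), (5.0, 0.0), (100.0, 100.0)]
buildings = [(0.0, 12.0), (500.0, 500.0)]

print(count_near(trees, buildings, 15))
```

Each tree is counted once, no matter how many buildings are within range, which is why the SQL version needs an existence test rather than a join that could multiply rows.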







### Generate Columns with Area Statistics

| Name | Alias | Description | Type |  Unit |
| --- | --- | --- | --- | --- |
| a_district | Grunnkretsareal | Area of the district | FLOAT | m2 |
| a_open_space | Åpent område | Area of open space | FLOAT | m2 |
| a_private_space | Privat område | Area of private space | FLOAT | m2 |
| a_public_space | Offentlig område | Area of public space | FLOAT | m2 |
| a_green_space | Grøntområde | Area of green space | FLOAT | m2 |
| a_crown | Kroneareal | Crown coverage area within the district | FLOAT | m2 |
| a_crown_public | Kroneareal i offentlig område | Crown coverage area within public space | FLOAT | m2 |
| a_crown_private | Kroneareal i privat område | Crown coverage area within private space | FLOAT | m2 |


In [None]:


# Assuming the layers have already been loaded into DuckDB as tables with a geom column
# (the count columns are filled in the previous section)

# Add the area columns to the districts table, one ALTER TABLE statement per column
area_columns = ["a_district", "a_open_space", "a_private_space", "a_public_space",
                "a_green_space", "a_crown", "a_crown_public", "a_crown_private"]
for column in area_columns:
    con.execute(f"ALTER TABLE districts ADD COLUMN {column} FLOAT")

def area_in_district(layer):
    # Subquery: area of `layer` that falls inside the current district
    return f"""(SELECT SUM(ST_Area(ST_Intersection(l.geom, districts.geom)))
        FROM {layer} l WHERE ST_Intersects(l.geom, districts.geom))"""

def crown_area_in(space):
    # Subquery: crown area that falls inside `space` and inside the current district
    return f"""(SELECT SUM(ST_Area(ST_Intersection(ST_Intersection(c.geom, s.geom), districts.geom)))
        FROM crowns c, {space} s
        WHERE ST_Intersects(c.geom, s.geom) AND ST_Intersects(c.geom, districts.geom))"""

# Update the districts table
# a_green_space is left NULL because no green-space layer is loaded in this notebook
con.execute(f"""
    UPDATE districts SET
        a_district = ST_Area(geom),
        a_open_space = {area_in_district('open_space')},
        a_private_space = {area_in_district('private_open_space')},
        a_public_space = {area_in_district('public_open_space')},
        a_crown = {area_in_district('crowns')},
        a_crown_public = {crown_area_in('public_open_space')},
        a_crown_private = {crown_area_in('private_open_space')}
""")