# Development of highways over time
(pyiceberg -> **duckdb** -> lonboard map)
In this notebook we demonstrate how to analyze and visualize the development of highways in OSM over time.

These are the steps you see further down:

* Set the query params
* Set the connection params for the Iceberg REST catalog and the MinIO S3 storage
* Prepare the data in 3 steps
    * Do an iceberg table scan with a pre-filter
    * Fine-filter the data with DuckDB after download
    * Transform the columns into the format that we need for mapping the features with `lonboard`
* Create a **Map** and an **interactive Slider** to filter the map data, and display the data as a **mapping saturation chart**

In [16]:
import os
import datetime

import duckdb
import polars as pl
import pandas as pd
import geopandas as gpd

from pyiceberg.catalog.rest import RestCatalog

import ipywidgets as widgets
from lonboard.layer_extension import DataFilterExtension
from lonboard import Map, ScatterplotLayer, PolygonLayer, SolidPolygonLayer, basemap
from lonboard.colormap import apply_continuous_cmap
from palettable.matplotlib import Viridis_20
from ipywidgets import FloatRangeSlider, jsdlink, Layout
from IPython.display import display, HTML

## Prepare the Iceberg connection

In [17]:
s3_user = os.environ["S3_ACCESS_KEY_ID"]  # set this environment variable to your S3 access key
s3_password = os.environ["S3_SECRET_ACCESS_KEY"]  # set this environment variable to your S3 secret key
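A bare `os.environ[...]` lookup fails with a plain `KeyError` when a variable is missing. A minimal sketch of a helper that turns this into a readable message; `require_env` is a hypothetical name, not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a readable message."""
    value = os.environ.get(name)
    if not value:  # missing or empty
        raise RuntimeError(
            f"Environment variable {name} is not set; export it before starting the notebook."
        )
    return value


# Possible use with the variable names from the cell above:
# s3_user = require_env("S3_ACCESS_KEY_ID")
# s3_password = require_env("S3_SECRET_ACCESS_KEY")
```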

In [18]:
catalog = RestCatalog(
    name="default",
    **{
        "uri": "https://sotm2024.iceberg.ohsome.org",
        "s3.endpoint": "https://sotm2024.minio.heigit.org",
        "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",
        "s3.access-key-id": s3_user,
        "s3.secret-access-key": s3_password,
        "s3.region": "eu-central-1"
    }
)

## Prepare DuckDB

In [19]:
con = duckdb.connect(
    config={
        'threads': 32,  # adjust to the number of CPU cores of your machine
        'max_memory': '50GB'  # adjust to the RAM available on your machine
    }
)
con.install_extension("spatial")
con.load_extension("spatial")
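The DuckDB settings above are fixed values. On a smaller machine you may want to cap them; a minimal sketch, assuming you only want to cap the thread count at the number of local CPUs and keep the memory limit as a string (`duckdb_config` is a hypothetical helper name):

```python
import os


def duckdb_config(max_threads: int = 32, memory_limit: str = "50GB") -> dict:
    """Build a DuckDB config dict, capping threads at the number of local CPUs."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return {
        "threads": min(max_threads, cpus),
        "max_memory": memory_limit,  # DuckDB also accepts this setting as memory_limit
    }


# Possible use: con = duckdb.connect(config=duckdb_config())
```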

## Prepare the input params for your map

In [21]:
bboxes = {
    'heidelberg': (8.629761, 49.379556, 8.742371, 49.437890),  #   ~150,000 highways, mapped to  ~2 million for monthly snapshots
    'nairobi': (36.650938, -1.444471, 37.103887, -1.163522),
    'mannheim': (8.41416, 49.410362, 8.58999, 49.590489),      #   ~440,000 highways, mapped to  ~7 million for monthly snapshots
    'berlin': (13.088345, 52.338271, 13.761161, 52.675509)     # ~3,000,000 highways, mapped to ~36 million for monthly snapshots
}

# select your input params

# bbox
(xmin, ymin, xmax, ymax) = bboxes['heidelberg']

# date only, do not append a time part
(min_valid_from, max_valid_from) = ('2019-01-01', '2025-01-01')

# iceberg table
namespace = 'geo_sort'
tablename = 'contributions_germany'
icebergtable = catalog.load_table((namespace, tablename))
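The comments above relate the number of highway versions to the number of rows they produce as monthly snapshots. A sketch of one way such an expansion can work, assuming a version with validity `[valid_from, valid_to)` is counted at every month start it covers and that `valid_to = None` means "still valid" (both assumptions; the helper names are hypothetical). It also checks that the date params carry no time part, as the comment in the cell requires:

```python
from datetime import date, datetime
from typing import List, Optional


def parse_day(value: str) -> date:
    """Parse a 'YYYY-MM-DD' string and reject values that carry a time part."""
    if "T" in value or " " in value:
        raise ValueError(f"expected a date without time part, got {value!r}")
    return date.fromisoformat(value)


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def month_starts(start: date, end: date) -> List[date]:
    """First day of every month inside [start, end)."""
    current = date(start.year, start.month, 1)
    if current < start:  # start lies mid-month: begin with the following month
        current = next_month(current)
    result = []
    while current < end:
        result.append(current)
        current = next_month(current)
    return result


def snapshots_for(valid_from: datetime, valid_to: Optional[datetime], months: List[date]) -> List[date]:
    """Month starts at which a version valid in [valid_from, valid_to) is visible."""
    out = []
    for m in months:
        ts = datetime(m.year, m.month, m.day)  # midnight at the month start
        if valid_from <= ts and (valid_to is None or ts < valid_to):
            out.append(m)
    return out


months = month_starts(parse_day("2019-01-01"), parse_day("2019-04-01"))
```

With this reading, the number of snapshot rows is the sum of `len(snapshots_for(...))` over all versions, which is why it grows much faster than the number of versions.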

## Get the data

### 1st filter step: Define a pre-filter for the Iceberg table scan

Based on this pre-filter, PyIceberg uses the Iceberg table metadata (partition information and per-file column statistics such as lower and upper bounds) to minimize the number of Parquet files that have to be read.
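Conceptually, this pruning compares the filter with per-file lower and upper bounds of a column and skips a file when its bounds prove that no row can match. The sketch below imitates the idea for the `valid_from` window in plain Python; `FileStats`, `may_contain` and `prune` are hypothetical names, not PyIceberg API. Real Iceberg reads such bounds from its manifest files and applies the same reasoning to every column in the filter.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file bounds of the valid_from column."""
    path: str
    valid_from_lower: datetime  # smallest valid_from in the file
    valid_from_upper: datetime  # largest valid_from in the file


def may_contain(stats: FileStats, start: datetime, end: datetime) -> bool:
    """True if some row could satisfy start <= valid_from < end."""
    # If even the largest value is before start, or even the smallest value
    # is at or after end, no row in the file can match.
    return stats.valid_from_upper >= start and stats.valid_from_lower < end


def prune(files: List[FileStats], start: datetime, end: datetime) -> List[str]:
    """Paths of the files that cannot be ruled out from their bounds alone."""
    return [f.path for f in files if may_contain(f, start, end)]


files = [
    FileStats("old.parquet", datetime(2010, 1, 1), datetime(2018, 12, 31)),
    FileStats("mid.parquet", datetime(2018, 6, 1), datetime(2019, 6, 1)),
    FileStats("new.parquet", datetime(2025, 1, 1), datetime(2025, 6, 1)),
]
kept = prune(files, datetime(2019, 1, 1), datetime(2025, 1, 1))
```

Only `mid.parquet` survives here. Pruning is most effective when the rows in each file cover a narrow range of values, so that the bounds are tight.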

In [22]:
ice = icebergtable.scan(
    row_filter=f"""
        (status = 'latest' or status = 'history')
        and geometry_type = 'LineString'
        and valid_from < '{max_valid_from}T00:00:00'
        and valid_from >= '{min_valid_from}T00:00:00' 
        and (xmax >= {xmin} and xmin <= {xmax})
        and (ymax >= {ymin} and ymin <= {ymax})   
    """,
    selected_fields=(
        "osm_id",
        "valid_from",
        "valid_to",
        "road_length",
        "tags",
        "geometry"
    )
)
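The two bbox conditions in `row_filter` are the usual interval-overlap test, applied to each feature's own bounding box on both axes. A minimal sketch of the same test in plain Python (`bboxes_overlap` is a hypothetical name); it shows that the pre-filter keeps every feature whose bounding box touches the query box, including features that extend far beyond it:

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bboxes_overlap(feature_bbox: BBox, query_bbox: BBox) -> bool:
    """Same logic as the row_filter: the intervals overlap on the x and the y axis."""
    fxmin, fymin, fxmax, fymax = feature_bbox
    qxmin, qymin, qxmax, qymax = query_bbox
    return fxmax >= qxmin and fxmin <= qxmax and fymax >= qymin and fymin <= qymax


heidelberg = (8.629761, 49.379556, 8.742371, 49.437890)
inside = bboxes_overlap((8.68, 49.40, 8.70, 49.41), heidelberg)
spanning = bboxes_overlap((8.60, 49.40, 8.80, 49.41), heidelberg)
east_of_box = bboxes_overlap((8.80, 49.40, 8.90, 49.41), heidelberg)
```

Because only bounding boxes are compared, a diagonal road passing near a corner of the query box can pass the pre-filter although the line itself never enters the box; such rows would come out of the later `ST_Intersection` clip with an empty geometry.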

### 2nd filter step: Further filter the data

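Some properties, such as the presence of a `highway` key in the `tags` map, are not reflected in the Iceberg metadata and must be filtered after fetching the Parquet data. `to_duckdb` reads the pre-filtered scan into memory and registers it in DuckDB as `prefiltered_osm`; the query then keeps only highway features and clips each geometry to the bounding box.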

In [23]:
con = ice.to_duckdb('prefiltered_osm', connection=con)

query = f"""
SELECT
    osm_id,
    valid_from,
    valid_to,
    road_length,
    tags,
    -- clip each line to the query bbox and return the result as WKB
    ST_AsWKB(
        ST_Intersection(
            ST_GeomFromWKB(geometry),
            ST_MakeEnvelope({xmin}, {ymin}, {xmax}, {ymax})
        )
    ) AS geometry
FROM prefiltered_osm
WHERE 1=1
    AND tags['highway'][1] IS NOT NULL  -- keep only features with a highway tag
ORDER BY valid_from ASC
;
"""

duckdb_table = con.sql(query)
num_features = duckdb_table.count('*').fetchone()[0]  # number of rows after fine filtering
display(HTML(
    f"""<h2>Attention</h2><p style="font-size:large">You are going to download <strong>{num_features}</strong> features.</p>"""))
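The warning above only informs you about the size of the result. If you prefer the notebook to stop before materializing a very large result in later steps, a minimal guard could look like this; the helper name `check_download_size` and the default limit of 5,000,000 features are hypothetical choices, not values from the original notebook:

```python
def check_download_size(num_features: int, limit: int = 5_000_000) -> None:
    """Raise if the feature count exceeds a self-chosen limit."""
    if num_features > limit:
        raise RuntimeError(
            f"{num_features:,} features exceed the limit of {limit:,}; "
            "choose a smaller bbox or a shorter time range."
        )


# Possible use right after the count: check_download_size(num_features)
```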
