## Visualizing large geospatial datasets using deck.gl

See the documentation for the Python library (pydeck) here: https://pydeck.gl/

Learn more about deck.gl here: https://deck.gl/ 

In [None]:
# install libraries needed for this tutorial
! pip install pydeck[jupyter] pandas h3 gtfs-kit numpy

In [None]:
import pandas as pd
import numpy as np
import pydeck as pdk
import h3.api.basic_str as h3
import gtfs_kit

### Example 1: Visualizing transit routes using deck.gl

For this example, we will put all train routes in Hamburg, Germany on a map. You can replace this with a GTFS feed of your choice. 

<mark>WARNING!</mark> This notebook might need a lot of RAM.

Most transit agencies publish their timetables in GTFS format (https://gtfs.org/schedule/reference/). A GTFS feed contains the entire timetable as well as the shapes of the transit lines.
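
To make the shape data more concrete, here is a minimal sketch of how the points in a feed's `shapes.txt` table could be grouped into one path per `shape_id`. The column names come from the GTFS reference; the sample rows and the `shapes_to_paths` helper are made up for illustration, and this is not necessarily how gtfs_kit builds its geometries.

```python
import csv
import io
from collections import defaultdict

# Made-up sample of a GTFS shapes.txt file (column names follow the GTFS reference)
SAMPLE_SHAPES_TXT = """shape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence
A,53.55,9.99,2
A,53.54,9.98,1
B,53.60,10.00,1
A,53.56,10.01,3
B,53.61,10.02,2
"""

def shapes_to_paths(shapes_txt):
    """Group shape points by shape_id and return {shape_id: [[lon, lat], ...]}."""
    points = defaultdict(list)
    for row in csv.DictReader(io.StringIO(shapes_txt)):
        points[row["shape_id"]].append(
            (
                int(row["shape_pt_sequence"]),
                float(row["shape_pt_lon"]),
                float(row["shape_pt_lat"]),
            )
        )
    # Order each shape by its sequence number and keep [lon, lat] pairs,
    # because deck.gl expects the longitude first
    return {
        shape_id: [[lon, lat] for _, lon, lat in sorted(pts)]
        for shape_id, pts in points.items()
    }

paths = shapes_to_paths(SAMPLE_SHAPES_TXT)
print(paths["A"])
```

The points of a shape are not guaranteed to appear in order in the file, which is why the sketch sorts by `shape_pt_sequence` before building each path.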

In [None]:
# load a feed, in this case from Hamburg, Germany
feed = gtfs_kit.read_feed("https://transitfeeds.com/p/hamburger-verkehrsverbund-gmbh/1010/latest/download", dist_units="km")
LAT, LON = 53.55, 9.9

# alternative - all of Denmark (this feed is somewhat larger)
# feed = gtfs_kit.read_feed("https://github.com/potatoTVnet/transit/raw/master/resources/rejseplanen.zip", dist_units="km")
# LAT, LON = 55.5, 12.6

In [None]:
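# keep tram, metro and rail routes (route_type 0-2) plus S-train routes (route_type 109)
rail_route_ids = feed.routes.query("route_type < 3 or route_type == 109")['route_id'].tolist()
routes_with_geo = gtfs_kit.routes.geometrize_routes(feed, route_ids=rail_route_ids)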
print(routes_with_geo.shape)

routes_with_geo.head()

Pydeck works best with plain coordinate arrays, so we convert each geometry into lists of (longitude, latitude) pairs. It can also handle GeoDataFrames directly, but that might result in worse performance.
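
As a rough sketch of that conversion: a line geometry exposes its coordinates as two parallel sequences (x for longitude, y for latitude), while the `PathLayer` wants one list of coordinate pairs per path. The example below imitates this with plain Python lists. The coordinates, the `to_paths` helper and the route name `S1` are made up for illustration.

```python
# Stand-in for what a line geometry's `.xy` returns: parallel x (lon) and y (lat)
# sequences. The coordinates are made up for illustration.
line_parts = [
    ([9.90, 9.95, 10.00], [53.50, 53.55, 53.60]),  # first part of a route
    ([10.00, 10.05], [53.60, 53.62]),              # second part of the same route
]

def to_paths(parts):
    """Turn parallel (xs, ys) sequences into one list of [lon, lat] pairs per part."""
    return [[[x, y] for x, y in zip(xs, ys)] for xs, ys in parts]

paths = to_paths(line_parts)

# One record per path, similar in spirit to exploding a column of path lists
records = [{"route_short_name": "S1", "paths": path} for path in paths]
for record in records:
    print(record)
```

Keeping one record per path means a route made of several disconnected parts is drawn as several separate lines rather than one line with jumps between the parts.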

In [None]:
def get_color(route_type):
    if route_type == 109:
        # s train
        return [255, 10, 10]
    elif route_type == 1:
        # metro / subway
        return [0, 156, 211]
    elif route_type == 2:
        # rail
        return [230, 230, 230]
    else:
        # tram or other rail
        return [255, 204, 0]

def multi_line_to_array(geom):
    """Return the geometry as a list of paths, each a list of (lon, lat) tuples."""
    try:
        # it's a MultiLineString: collect the coordinates of every part
        parts = [line.xy for line in geom.geoms]
    except AttributeError:
        # it's a single LineString
        parts = [geom.xy]
    return [list(zip(x_line, y_line)) for x_line, y_line in parts]

routes_with_geo['paths'] = routes_with_geo['geometry'].apply(multi_line_to_array)
routes_with_geo['color'] = routes_with_geo['route_type'].apply(get_color)

In [None]:
# one row per path, so every part of a route becomes its own line on the map
plot_df = routes_with_geo.explode('paths')[['route_short_name', 'paths', 'color']]

In [None]:
plot_df.sample(5)

In [None]:
INITIAL_VIEW_STATE = pdk.ViewState(latitude=LAT, longitude=LON, zoom=9, max_zoom=16, pitch=30, bearing=0)

In [None]:
path = pdk.Layer(
    type="PathLayer",
    data=plot_df,
    pickable=True,
    get_color="color",
    opacity=0.7,
    width_scale=10,
    width_min_pixels=2,
    get_path="paths",
    get_width=3,
)

r = pdk.Deck(layers=[path], initial_view_state=INITIAL_VIEW_STATE, tooltip={"text": "{route_short_name}"})

r.show()

### Example 2: Visualizing transit accessibility in the Copenhagen area

The dataset contains a transit accessibility metric for every point that can be reached by walking, computed for every hour of every day of the week.

It was part of a geospatial data science project, which can be found here: https://potatotvnet.github.io/transit/ 

In [None]:
h3_data = pd.read_json("https://raw.githubusercontent.com/potatoTVnet/transit/master/docs/h3/841f059ffffffff.json")
h3_data.head()

In [None]:
# select Monday morning from the accessibility metric array; Monday at 10:00 has index 34
h3_plot_data = h3_data.copy()
h3_plot_data['freq'] = h3_plot_data['freq'].apply(lambda x: np.float32(x[34]))
h3_plot_data = h3_plot_data[['h3', 'freq']]
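
The index 34 is consistent with a flat array of 7 × 24 hourly values that starts on Sunday at 00:00, since 1 × 24 + 10 = 34. That layout is an assumption inferred from the comment in the cell above, not something the dataset documents, and `hour_of_week_index` is a hypothetical helper written only for this sketch:

```python
# Assumed order of the days in the array (Sunday first). This is a guess that
# happens to match index 34 for Monday at 10:00.
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

def hour_of_week_index(day, hour):
    """Return the position of (day, hour) in a flat array of 7 * 24 hourly values."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be between 0 and 23")
    return DAYS.index(day) * 24 + hour

print(hour_of_week_index("Monday", 10))  # 34
```

If the assumption holds, `x[hour_of_week_index("Friday", 17)]` would select Friday at 17:00 instead of Monday at 10:00.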

In [None]:
INITIAL_VIEW_STATE = pdk.ViewState(latitude=55.5, longitude=12.6, zoom=9, max_zoom=16, pitch=30, bearing=0)

In [None]:
# named h3_layer so that it does not shadow the imported h3 module
h3_layer = pdk.Layer(
    type="H3HexagonLayer",
    data=h3_plot_data,
    get_hexagon="h3",
    get_elevation="freq",
    get_fill_color=[230, 230, 230, 80],
    pickable=False,
    wireframe=False,
    filled=True,
    extruded=True,
    elevation_scale=40,
)

s = pdk.Deck(layers=[h3_layer], initial_view_state=INITIAL_VIEW_STATE)

s.show()