<h1>BikeDNA</h1>
<a href="https://github.com/anerv/BikeDNA">GitHub</a>

# Example reference data preprocessing: GeoDanmark

This notebook provides an example of how a spatial dataset with data on cycling infrastructure can be converted to the format required by BikeDNA. When using your own data, the preprocessing must be adapted to their content and format.

The data used in this notebook are from *GeoDanmark* and were downloaded from [dataforsyningen.dk](https://dataforsyningen.dk/) under the [GeoDanmark license](https://www.geodanmark.dk/wp-content/uploads/2022/08/Vilkaar-for-brug-af-frie-geografiske-data_GeoDanmark-grunddata-august-2022.pdf).

As stated in the data set requirements, the reference data should:

- only contain **cycling infrastructure** (i.e. not also the regular street network)
- have all geometries as **LineStrings** (not MultiLineString)
- have, for each row, a **straight** LineString geometry defined only by its start and end nodes
- have start/end nodes at **intersections**
- be in a **CRS** recognised by GeoPandas
- contain a column describing whether each feature is a physically **protected**/separated infrastructure or if it is **unprotected**
- contain a column describing whether each feature is **bidirectional** or not
- contain a column describing how features have been **digitized** ('geometry type')
- contain a column with a unique **ID** for each feature
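
Some of these requirements can be checked programmatically before moving on. The following is a minimal sketch and not part of BikeDNA itself: the function name `check_reference_requirements` and the toy column name `id` are hypothetical, and only the geometry type, the CRS, and the uniqueness of the ID column are covered.

```python
import geopandas as gpd
from shapely.geometry import LineString, MultiLineString


def check_reference_requirements(gdf, id_col):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    # All geometries should be LineStrings (no MultiLineStrings)
    geom_types = set(gdf.geom_type.unique())
    if geom_types != {"LineString"}:
        found = sorted(map(str, geom_types))
        problems.append(f"unexpected geometry types: {found}")
    # A CRS must be defined
    if gdf.crs is None:
        problems.append("no CRS defined")
    # Each feature needs a unique ID
    if id_col not in gdf.columns:
        problems.append(f"missing ID column: {id_col}")
    elif gdf[id_col].duplicated().any():
        problems.append(f"duplicate values in ID column: {id_col}")
    return problems


# Toy data: one valid row and one MultiLineString that reuses the same ID
toy = gpd.GeoDataFrame(
    {"id": [1, 1]},
    geometry=[
        LineString([(0, 0), (1, 0)]),
        MultiLineString([[(1, 0), (2, 0)], [(3, 0), (4, 0)]]),
    ],
)
problems = check_reference_requirements(toy, id_col="id")
print(problems)
```

An empty list means that none of the three checks failed; the remaining requirements (e.g. nodes at intersections) still need to be inspected separately.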

In [1]:
import contextily as cx
import folium
import geopandas as gpd
import matplotlib.pyplot as plt
import momepy
from shapely.ops import linemerge

from src import graph_functions as gf
from src import plotting_functions as pf

folium_layers = {
    "Google Satellite": folium.TileLayer(
        tiles="https://mt1.google.com/vt/lyrs=s&x={x}&y={y}&z={z}",
        attr="Google",
        name="Google Satellite",
        overlay=True,
        control=True,
        show=False,
    ),
    "whiteback": folium.TileLayer(
        tiles="https://api.mapbox.com/styles/v1/krktalilu/ckrdjkf0r2jt217qyoai4ndws/tiles/256/{z}/{x}/{y}@2x?access_token=pk.eyJ1Ijoia3JrdGFsaWx1IiwiYSI6ImNrcmRqMXdycTB3NG8yb3BlcGpiM2JkczUifQ.gEfOn5ttzfH5BQTjqXMs3w",
        name="Background: White",
        attr="Mapbox",
        control=True,
        overlay=True,
        show=False,
    ),
    "Stamen TonerLite": folium.TileLayer(
        tiles="https://stamen-tiles-{s}.a.ssl.fastly.net/toner-lite/{z}/{x}/{y}{r}.png",
        attr='Map tiles by <a href="http://stamen.com">Stamen Design</a>, <a href="http://creativecommons.org/licenses/by/3.0">CC BY 3.0</a> &mdash; Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
        name="Stamen TonerLite",
        control=True,
        overlay=True,
        show=False,
    ),
    "CyclOSM": folium.TileLayer(
        tiles="https://{s}.tile-cyclosm.openstreetmap.fr/cyclosm/{z}/{x}/{y}.png",
        attr='Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
        name="CyclOSM",
        control=True,
        overlay=True,
        show=False,
    ),
    "OSM": folium.TileLayer(
        tiles="openstreetmap",
        name="OpenStreetMap",
        attr='Map data &copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a> contributors',
        control=True,
        overlay=True,
    ),
}

In [3]:
geodk = gpd.GeoDataFrame.from_file("ex2_cph_geodk/raw/vejmidte_brudt_subset.gpkg")

geodk.sample(10)

| | FOT_ID | MOB_ID | FEAT_KODE | FEAT_TYPE | FEATSTATUS | GEOMSTATUS | STARTKNUDE | SLUTKNUDE | NIVEAU | OVERFLADE | ... | VEJKLASSE | VEJ_MYND | VEJ_TYPE | PLADS | FIKTIV | TIMEOF_CRE | TIMEOF_PUB | TIMEOF_REV | TIMEOF_EXP | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63248 | 1210513957 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Ubefæstet | ... | Hovedsti | Ikke tildelt | Sti | f | f | 2020-03-20 | 2020-03-20 | | 2021-12-01 | MULTILINESTRING Z ((588069.740 6134506.870 26.... |
| 10754 | 1095450504 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Ubefæstet | ... | Anden vej | Ikke tildelt | Vej | f | f | 2016-04-14 | 2016-04-14 | | 2021-12-01 | MULTILINESTRING Z ((581071.450 6141623.400 14.... |
| 58286 | 1067470412 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Ubefæstet | ... | Lokalvej-Tertiær | Ikke tildelt | Vej | f | f | 2014-07-17 | 2014-07-17 | | 2021-12-01 | MULTILINESTRING Z ((593834.890 6143340.300 3.6... |
| 42728 | 1067503761 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Sti, diverse | Ikke tildelt | Sti | f | f | 2014-07-18 | 2014-07-18 | | 2021-12-01 | MULTILINESTRING Z ((590363.140 6140474.580 11.... |
| 957 | 1095474928 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Cykelbane langs vej | Ikke tildelt | Sti | f | f | 2016-04-14 | 2016-04-14 | | 2021-12-01 | MULTILINESTRING Z ((589694.110 6138787.110 16.... |
| 11343 | 1095452559 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Hovedsti | Ikke tildelt | Sti | f | f | 2016-04-14 | 2016-04-14 | | 2021-12-01 | MULTILINESTRING Z ((587269.780 6139207.950 11.... |
| 63391 | 1210515838 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Hovedsti | Ikke tildelt | Sti | f | f | 2020-03-20 | 2020-03-20 | | 2021-12-01 | MULTILINESTRING Z ((588069.270 6134430.210 25.... |
| 72193 | 1113470757 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Lokalvej-Tertiær | Ikke tildelt | Vej | f | f | 2021-04-28 | 2021-04-28 | | 2021-12-01 | MULTILINESTRING Z ((587335.230 6140675.560 5.5... |
| 61866 | 1067639717 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Ubefæstet | ... | Lokalvej-Tertiær | Ikke tildelt | Vej | f | f | 2020-07-09 | 2020-07-09 | | 2021-12-01 | MULTILINESTRING Z ((599289.350 6138380.450 13.... |
| 60009 | 1067495089 | 0 | 9963 | Vejmidte_brudt | Taget i brug | Endelig | -999 | -999 | Ikke tildelt | Befæstet | ... | Cykelsti langs vej | Ikke tildelt | Sti | f | f | 2019-02-12 | 2019-02-12 | | 2021-12-01 | MULTILINESTRING Z ((591315.300 6138511.940 15.... |


Our dataset contains the entire road network, including bicycle tracks and lanes. We are only interested in the dedicated cycling infrastructure and thus need to select a subset of the data.
We also only want to include infrastructure that is completed or under construction.

Some of the data might be outside of the study area we are interested in, but the data processing in notebook 01 will clip all data to the desired extent.

In [None]:
# Create a subset with only cycling infrastructure that is in use or under construction

geodk_selection = geodk.loc[
    (geodk.VEJKLASSE.isin(["Cykelsti langs vej", "Cykelbane langs vej"]))
    & (geodk.FEATSTATUS.isin(["Taget i brug", "Under anlæg"]))
].copy()

geodk_selection.explore()

For all code to run without errors, our dataset can only contain LineString geometries. Let's check what we have:

In [None]:
geodk_selection.geom_type.unique()

In this dataset, we only have MultiLineStrings. To fix this, we first try to merge the MultiLineStrings.
If some of the MultiLineStrings are not connected (i.e. there are gaps in the lines), the above step will not be able to merge them. In that case, we can instead 'explode' them.

In [None]:
geodk_linestrings = geodk_selection.copy()
# Convert MultiLineStrings to LineString
geodk_linestrings["geometry"] = geodk_linestrings["geometry"].apply(
    lambda x: linemerge(x) if x.geom_type == "MultiLineString" else x
)

if (
    len(geodk_linestrings.geom_type.unique()) > 1
    or geodk_linestrings.geom_type.unique()[0] != "LineString"
):
    # Merging left some MultiLineStrings (parts with gaps): explode the original selection instead
    print("Exploding MultiLineStrings...")
    geodk_linestrings = geodk_selection.explode(ignore_index=True)

assert len(geodk_linestrings.geom_type.unique()) == 1
assert geodk_linestrings.geom_type.unique()[0] == "LineString"
geodk_linestrings.geom_type.unique()
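
The difference between merging and exploding can be seen on a toy example. This is a small sketch with made-up coordinates, using only Shapely: a MultiLineString whose parts share an endpoint can be merged into a single LineString, while one with a gap cannot and has to be split into its parts.

```python
from shapely.geometry import MultiLineString
from shapely.ops import linemerge

# Two parts that share an endpoint: linemerge joins them into one LineString
connected = MultiLineString([[(0, 0), (1, 0)], [(1, 0), (2, 0)]])
merged = linemerge(connected)
print(merged.geom_type)

# Two parts separated by a gap: linemerge cannot join them
gapped = MultiLineString([[(0, 0), (1, 0)], [(2, 0), (3, 0)]])
unmerged = linemerge(gapped)
print(unmerged.geom_type)

# "Exploding" splits the MultiLineString into its individual parts instead
parts = list(gapped.geoms)
print([part.geom_type for part in parts])
```

Note that after exploding a GeoDataFrame, all parts of the same original feature carry the same attribute values, so an ID column may no longer be unique and might need to be regenerated.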

For the code to work, the data need to be in a CRS recognized by GeoPandas, and to have that CRS defined. Let's check that we have a CRS defined:

In [None]:
geodk_linestrings.crs
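
If no CRS is defined, it has to be assigned before continuing. The sketch below uses a toy layer to illustrate the difference between `set_crs` (attaching a CRS to coordinates that are already in that system) and `to_crs` (transforming coordinates to another system). Using EPSG:25832 here is an assumption for illustration; the correct code depends on how your data were produced.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Toy layer created without any CRS information
gdf = gpd.GeoDataFrame(
    geometry=[LineString([(588000, 6134000), (588100, 6134100)])]
)
print(gdf.crs)  # None

# set_crs only attaches a CRS; the coordinates are left unchanged.
# EPSG:25832 is an assumption and must match the actual coordinates.
gdf = gdf.set_crs("EPSG:25832")

# to_crs transforms the coordinates into another CRS
gdf_wgs84 = gdf.to_crs("EPSG:4326")
print(gdf.crs.to_epsg(), gdf_wgs84.crs.to_epsg())
```

Assigning the wrong CRS with `set_crs` does not raise an error but places the data in the wrong location, so it is worth checking the result on a map.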

The analysis of data quality is based on the concept of a *network*. For the results to be accurate we need a dataset with nodes at intersections (i.e. where the lines defining the cycling infrastructure intersect).

Use the folium plot below to check that you do have nodes at intersections.
If not, this should be fixed. Otherwise, it will show up as a data quality issue in the later analysis.

Don't worry if there are more nodes than just those at intersections and start/end points - we will take care of that in the data loading notebook.

In [None]:
G = momepy.gdf_to_nx(
    geodk_linestrings.to_crs("EPSG:25832"), approach="primal", directed=True
)  # We reproject the network data to avoid warnings - final reprojection will happen later

nodes, edges = momepy.nx_to_gdf(G)

# Feature groups for the reference data
edges_folium = pf.make_edgefeaturegroup(
    gdf=edges, mycolor="black", myweight=2, nametag="edges", show_edges=True
)

nodes_folium = pf.make_nodefeaturegroup(
    gdf=nodes, mycolor="red", mysize=2, nametag="nodes", show_nodes=True
)

feature_groups = [edges_folium, nodes_folium]

m = pf.make_foliumplot(
    feature_groups=feature_groups,
    layers_dict=folium_layers,
    center_gdf=nodes,
    center_crs=nodes.crs,
)

display(m)
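
As a complement to the visual check, candidate locations with missing nodes can be listed programmatically. The following is a rough sketch on toy geometries, not a BikeDNA function: the helper `find_unnoded_crossings` is hypothetical and only flags lines that cross each other in their interiors. It does not detect T-junctions where one line ends on the interior of another, it will also flag legitimate crossings such as bridges, and its pairwise comparison is only suitable for small datasets.

```python
from itertools import combinations

from shapely.geometry import LineString


def find_unnoded_crossings(lines):
    """Return index pairs of lines whose interiors cross each other."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(lines), 2)
        if a.crosses(b)
    ]


lines = [
    LineString([(0, 0), (2, 0)]),  # horizontal segment
    LineString([(1, -1), (1, 1)]),  # crosses the first segment mid-way, no node there
    LineString([(2, 0), (3, 0)]),  # meets the first segment at its end node
]
crossings = find_unnoded_crossings(lines)
print(crossings)
```

Each returned pair points to two features that should probably be split at their crossing point, unless they are on different levels.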

We don't technically need to drop any unnecessary columns, but let's avoid loading unnecessary data later on.

In [None]:
geodk_linestrings.columns

In [None]:
# Drop unnecessary columns

geodk_linestrings = geodk_linestrings[["FOT_ID", "VEJKLASSE", "geometry"]]

For consistency, we convert all column names to lower case:

In [None]:
geodk_linestrings = gf.clean_col_names(geodk_linestrings)
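
If you are working outside the BikeDNA repository, a similar result can be obtained with plain pandas. This sketch only lower-cases the names and is not the actual implementation of `gf.clean_col_names`, which may apply further clean-up steps.

```python
import pandas as pd

df = pd.DataFrame({"FOT_ID": [1210513957], "VEJKLASSE": ["Cykelsti langs vej"]})

# Lower-case every column name; the values are left untouched
df = df.rename(columns=str.lower)
print(list(df.columns))
```

The same call works on a GeoDataFrame, where a geometry column that is already named `geometry` is unaffected.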

For this dataset, we assume that all features are 'true' geometry mappings and one-directional. We can therefore specify this in the config file and do not have to add it to the data.

The rest of the pre-processing, such as projecting to the chosen CRS, clipping the data to the study area etc. will happen in [notebook 2a](../REFERENCE/2a_initialize_reference.ipynb).

**Final dataset**

In [None]:
geodk_linestrings.sample(10)

**Export dataset**

In [None]:
geodk_linestrings.to_file(
    "../data/ex1_cph_municipality/cph_cycling_infra.gpkg", driver="GPKG"
)

*Contains data from GeoDanmark (retrieved spring 2022)*
*© SDFE (Styrelsen for Dataforsyning og Effektivisering og Danske kommuner)*

*License: [GeoDanmark](https://www.geodanmark.dk/wp-content/uploads/2022/08/Vilkaar-for-brug-af-frie-geografiske-data_GeoDanmark-grunddata-august-2022.pdf)*