# Creating Folkersma joint network layer

This notebook assumes that the network data we received from Niras and Folkersma are in the folders `data/NIRAS` and `data/FOLKERSMA`, respectively (downloaded from Dropbox: [here for Niras](https://www.dropbox.com/scl/fo/76ome1g3nqpr71akrifba/h?rlkey=3vijuusrgwil3wprd0gmfajhj&dl=0) and [here for Folkersma]()).

This script imports nodes and edges from both sources and merges them (one file for all nodes; one file for all edges).

Run from main repo folder!

In [42]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import contextily as cx
from collections import Counter

In [21]:
# read in Niras data
edges1 = gpd.read_file("data/NIRAS/straekninger.gpkg")
nodes1 = gpd.read_file("data/NIRAS/knudepunkter.gpkg")

# read in Folkersma data
edges2 = gpd.read_file("data/FOLKERSMA/stretch.shp")
nodes2 = gpd.read_file("data/FOLKERSMA/node.shp")

In [22]:
# check that a crs is defined for every data set and that it is the same for all of them
assert all([edges1.crs, edges2.crs, nodes1.crs, nodes2.crs])
assert edges1.crs == edges2.crs
assert nodes1.crs == nodes2.crs
assert edges1.crs == nodes1.crs

In [23]:
# make sure we have the same columns present in both data sets
assert sorted(list(edges1.columns)) == sorted(list(edges2.columns))
assert sorted(list(nodes1.columns)) == sorted(list(nodes2.columns))

In [24]:
# make sure data sets don't intersect
assert edges1.unary_union.intersection(edges2.unary_union).is_empty
assert nodes1.unary_union.intersection(nodes2.unary_union).is_empty

In [25]:
# merge the two data frames
nodes = pd.concat([nodes1, nodes2], join = "outer", ignore_index=True)
edges = pd.concat([edges1, edges2], join = "outer", ignore_index=True)
assert len(nodes.columns) == len(nodes1.columns)
assert len(edges.columns) == len(edges1.columns)

In [26]:
# explode geometries
nodes = nodes.explode(ignore_index=True)
edges = edges.explode(ignore_index=True)

# Correcting child/parent node relations

* Some nodes are flagged as main, but have a refmain node that is different from their node ID
    * if refmain exists: set ismain to 0
    * if refmain doesn't exist: ? (not the case here)
* Some nodes are flagged as children, but their parent node ID does not exist in the node ID list.
    * set refmain to NA
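
The rules above can be illustrated on a tiny, made-up node table before they are applied to the real data. This is only a sketch: the ids and values in `toy` are hypothetical, and only the column names `id`, `ismain` and `refmain` are taken from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy node table (ids and values are made up for illustration)
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "ismain": [1, 1, 0, 0],
        "refmain": [np.nan, 1, 1, 99],
    }
)

# True where refmain points to an id that exists in the table
known = toy["refmain"].isin(toy["id"])

# Rule 1: flagged as main, but refmain points to another existing node
# -> demote to child (node 2 in this toy table)
demote = (toy["ismain"] == 1) & known & (toy["refmain"] != toy["id"])
toy.loc[demote, "ismain"] = 0

# Rule 2: child whose parent id is missing from the table
# -> clear refmain (node 4 in this toy table)
orphan = (toy["ismain"] == 0) & toy["refmain"].notna() & ~known
toy.loc[orphan, "refmain"] = np.nan

print(toy)
```

Node 1 stays a main node without a parent, and node 3 keeps its valid reference to node 1.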


In [27]:
# "ismain" nodes where refmain and id do not coincide

# get index of nodes that shouldn't be main: flagged as main, but pointing
# via refmain to a different node that exists in the data set
set_ismain_to_0 = nodes[
    (nodes.ismain == 1)
    & nodes.refmain.isin(nodes.id)
    & (nodes.refmain != nodes.id)
].index

# check that their refmain node id exists
assert nodes.loc[set_ismain_to_0, "refmain"].isin(nodes.id).all()

# set their ismain to 0
nodes.loc[set_ismain_to_0, "ismain"] = 0

In [28]:
# children whose parents do not exist in our data set: set their refmain to NaN
# (use ~ rather than unary minus to negate a boolean mask)
set_refmain_to_na = nodes[(nodes.ismain == 0) & nodes.refmain.notna() & ~nodes.refmain.isin(nodes.id)].index
nodes.loc[set_refmain_to_na, "refmain"] = np.nan

In [29]:
# nodes that are flagged as main, but whose refmain (which shouldn't be there to begin with)
# doesn't exist in the node id list: set their refmain to NaN
set_refmain_to_na = nodes[(nodes.ismain == 1) & nodes.refmain.notna() & ~nodes.refmain.isin(nodes.id)].index
nodes.loc[set_refmain_to_na, "refmain"] = np.nan

In [30]:
# children whose refmains are faulty
# faulty refmains were found manually, e.g. child and parent nodes are too far away from each other
faulty_refmains = [1715, 1822, 4657]
nodes.loc[nodes.refmain.isin(faulty_refmains), "refmain"] = np.nan

In [31]:
# faulty_refmain = 1715
# fig, ax = plt.subplots(1,1, figsize = (20,20))
# nodes[nodes.refmain == faulty_refmain].plot(ax=ax)
# nodes[nodes.id == faulty_refmain].plot(ax=ax, color = "red")
# cx.add_basemap(ax=ax, crs = nodes.crs, source = cx.providers.CartoDB.Voyager)
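
The faulty refmains above were identified by hand. As a possible aid for finding candidates to inspect, the sketch below flags parent ids whose children lie further away than a threshold. It is not the procedure used in this notebook: the toy table, the plain `x`/`y` coordinate columns and the `MAX_DIST` value are assumptions made for illustration. With point geometries in a projected CRS, the coordinates could be taken from `nodes.geometry.x` and `nodes.geometry.y`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: coordinates in metres, ids made up for illustration
toy = pd.DataFrame(
    {
        "id": [10, 11, 12, 13],
        "refmain": [np.nan, 10, 10, np.nan],
        "x": [0.0, 30.0, 5000.0, 100.0],
        "y": [0.0, 40.0, 0.0, 100.0],
    }
)

MAX_DIST = 500.0  # assumed threshold in metres, not taken from the source

# pair every child with the coordinates of the node its refmain points to
children = toy[toy["refmain"].notna()]
parents = (
    toy[["id", "x", "y"]]
    .rename(columns={"id": "refmain", "x": "x_parent", "y": "y_parent"})
    .assign(refmain=lambda d: d["refmain"].astype(float))  # match the float key
)
pairs = children.merge(parents, on="refmain", how="inner")

# straight-line distance between each child and its parent
pairs["dist"] = np.hypot(
    pairs["x"] - pairs["x_parent"], pairs["y"] - pairs["y_parent"]
)

# parent ids with at least one suspiciously distant child
suspects = sorted(pairs.loc[pairs["dist"] > MAX_DIST, "refmain"].unique())
print(suspects)
```

Any id flagged this way would still need a visual check, for example with the plotting cell above, before its references are cleared.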


**save to file**

In [13]:
# save as gpkgs
nodes.to_file("data/raw/network/nodes.gpkg")
edges.to_file("data/raw/network/edges.gpkg")