# Graph Creation

This notebook builds a graph from pre-collected OSM data. The data must already have been retrieved and stored in a GeoJSON file in which each row contains a geometry of type LineString, MultiLineString, Polygon, or MultiPolygon.

The required data can be obtained using the notebooks notebook_get_data.ipynb and notebook_assign_risk.ipynb.
The resulting GeoJSON file from those steps serves as input for this notebook; you only need to specify the city name.
Make sure you also know which distribution center corresponds to the selected city, as this must be provided as well.
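
Before running the pipeline, it can help to confirm that the input file only contains the geometry types listed above. The following is a minimal sketch that works on a plain GeoJSON dictionary (for example the result of `json.load`). The function name `unexpected_geometry_types` and the in-memory example are hypothetical and not part of the project code. If the file also holds Point features, such as PostNL locations, they would be reported here as well, so `ALLOWED_TYPES` may need adjusting.

```python
from collections import Counter

# Geometry types named in the notebook description (assumption: no others are expected).
ALLOWED_TYPES = {"LineString", "MultiLineString", "Polygon", "MultiPolygon"}


def unexpected_geometry_types(feature_collection):
    """Count features whose geometry type is not in ALLOWED_TYPES."""
    counts = Counter()
    for feature in feature_collection.get("features", []):
        geometry = feature.get("geometry") or {}
        gtype = geometry.get("type")  # None when the geometry is missing
        if gtype not in ALLOWED_TYPES:
            counts[gtype] += 1
    return dict(counts)


# Hypothetical in-memory example; a real file would be loaded with json.load(f).
example = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [0, 0]}},
        {"type": "Feature", "properties": {}, "geometry": None},
    ],
}

unexpected = unexpected_geometry_types(example)
print(unexpected)
```

An empty dictionary means every feature has one of the expected geometry types.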

In [17]:
from model_get_data import read_geojson
from model_create_graph import create_graph_with_nodes
from model_add_linear_infrastructures import add_linesstrings_to_graph
from model_add_polygon_infrastructures import add_grid_polygons_to_graph
from model_connect_postnl_nodes import connect_to_nearest_edge
from model_delete_edges_in_no_fly import remove_no_fly_zones_from_graph
from model_check import fill_missing_edge_heights_from_csv, diagnose_graph

import pickle

## 0. Specify the Area
Set the city variable to indicate which area the data was collected for.
Set the depot variable to specify the corresponding distribution center.
If you're unsure which distribution center belongs to the selected city, you can look it up in the file:
model/postNL/output/postnl_distribution_cleaned.json.

In [6]:
city = 'breda' # use lower case
depot = ['Breda'] # use capitalized

## 0. Get Data
Uses the read_geojson function to load and prepare the data for the selected area, so that a graph can be constructed from the OSM data.

In [7]:
gdf, lines_gdf, polygons_gdf, post_nl_gdf, no_fly_zones_gdf = read_geojson(f'/Users/cmartens/Documents/thesis_cf_martens/3.no_fly_zones/output/data_for_graph_{city}.geojson', depot, all=True)

Getting separate dataframes for lines, polygons, postnl points, distribution points and no fly zones
Looking for distribution points in ['Breda']
Found 1 distribution points in ['Breda']
Found 15416 lines, 5838 polygons, 65 postnl points, 1 distribution point and 5 no-fly zones


In [8]:
no_fly_zones_gdf

Unnamed: 0,name,id,description,area_type,category,risk,Height,air_type,geometry
21457,Industriegebied met risicos op zware ongevallen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Industry,no_fly_zone,,High-Risk Areas,"POLYGON ((112630.908 402373.839, 112630.871 40..."
21458,Industriegebied met risicos op zware ongevallen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Industry,no_fly_zone,,High-Risk Areas,"POLYGON ((111140.677 390426.526, 111140.64 390..."
21459,Industriegebied met risicos op zware ongevallen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Industry,no_fly_zone,,High-Risk Areas,"POLYGON ((112382.091 402530.484, 112382.053 40..."
21460,Gebied met verbod vanuit beveiligingsoverwegingen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Security,no_fly_zone,,Restricted Zones,"POLYGON ((113524.959 396494.41, 113524.921 396..."
21461,Natura 2000 Area Ulvenhoutse Bos,,Natura 2000-besluit 2010,No-fly zone,Environment,no_fly_zone,,Protected Nature Reserves,"MULTIPOLYGON (((115010.48 396804.794, 115003.8..."


## 1. Create Initial Graph with Nodes
Creates an unconnected graph by adding all PostNL points and the distribution center as nodes.

In [9]:
G = create_graph_with_nodes(post_nl_gdf)

Graph creation summary:
 - Total rows: 65
 - Distributions added: 1
 - PostNL added: 64
 - Skipped (non-Point geometry): 0


## 2. Add LineStrings to Graph
Adds all LineString and MultiLineString geometries to the graph as edges. Ensures they are connected to each other and to the existing nodes by snapping nearby points together.

In [10]:
G_with_linestrings = add_linesstrings_to_graph(G, lines_gdf)

Exploding MultiLineString geometries into separate LineString geometries.
Finding intersections between lines.
Found 60104 intersection points.
Adding snapped points to the graph.





 Graph update summary:
 - Lines found: 15416
 - Intersections found: 15361
 - Line intersection nodes added: 18874
 - Edges added: 71282
 - New waypoint nodes added: 43908
 - Skipped short/invalid segments: 1360
 - Final graph has 63025 nodes and 71405 edges
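
The snapping idea mentioned in this step can be illustrated with a small sketch. It assigns each point to a tolerance-sized cell and lets the first point seen in a cell act as the shared coordinate. This is only an assumption about how snapping could work; `snap_points` and its `tolerance` parameter are hypothetical and do not describe the internals of `add_linesstrings_to_graph`.

```python
def snap_points(points, tolerance):
    """Snap points that fall in the same tolerance-sized cell to one coordinate.

    The first point seen in a cell becomes the representative for that cell.
    """
    representatives = {}
    snapped = []
    for x, y in points:
        # Integer cell index of the point on a grid with spacing `tolerance`.
        key = (round(x / tolerance), round(y / tolerance))
        rep = representatives.setdefault(key, (x, y))
        snapped.append(rep)
    return snapped


# Two nearly coincident line endpoints and one distant point (made-up coordinates).
points = [(100.0, 200.0), (100.2, 200.1), (150.0, 200.0)]
snapped = snap_points(points, tolerance=1.0)
print(snapped)
```

Cell-based snapping is simple and fast, but two points that lie close together on opposite sides of a cell boundary are not merged. A spatial index with a true distance check avoids that at the cost of more code.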


## 3. Add Polygons to Graph

Generates a grid for each polygon and connects the grid points to the polygon boundaries. Adds all polygons to the graph and ensures they are properly snapped to the existing graph structure, so that all elements are correctly connected.

In [11]:
G_with_grids_and_boundaries = add_grid_polygons_to_graph(G_with_linestrings, polygons_gdf, lines_gdf)

Excecuting add_grid_polygons_to_graph function
G before polygons: 63025 nodes, 71405 edges

 Step 1
Adding grid structure to 5838 polygons, with grid size 50 and max cells 10000.
G after grid: 123617 nodes, 151370 edges

 Step 2
Adding local polygon boundaries to the graph with snap tolerance 35 and boundary sample distance 50.

 Step 3
Finding intersections for 256185 edges.
Found 646059 intersections between polygon-polygon and polygon-line.

 Step 4
Number of unique snapt points: 222978

 Step 5
Mapping geometries to edges.

 Step 6
Adding intersection nodes.
Intersection nodes added: 65744

 Step 7
 Splitting edges at intersections.
 Edges split and added: 0

 Step 8
STRtree indexing over 51415 polygon_boundary points.
Connected nodes: 25902
Skipped 35769 line intersections/waypoints that could not be connected to polygon boundaries. Consider increasing the max_connection_distance.
G after connecting: 240776 nodes, 363611 edges
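
To make the grid step more concrete, the sketch below places points at a fixed spacing inside a single polygon ring, using a ray-casting test to keep only the points that fall inside. The spacing of 50 matches the grid size reported in the log. Everything else is an assumption: the helper names are hypothetical, holes and the maximum cell count are ignored, and the actual `add_grid_polygons_to_graph` implementation may differ.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; `ring` is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def grid_points_in_polygon(ring, grid_size):
    """Return cell-centre points of a regular grid that lie inside the ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    points = []
    x = min(xs) + grid_size / 2
    while x < max(xs):
        y = min(ys) + grid_size / 2
        while y < max(ys):
            if point_in_polygon(x, y, ring):
                points.append((x, y))
            y += grid_size
        x += grid_size
    return points


# A 200 m x 200 m square with a 50 m grid (made-up coordinates).
square = [(0, 0), (200, 0), (200, 200), (0, 200)]
grid = grid_points_in_polygon(square, grid_size=50)
print(len(grid))
```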


## 4. Connect PostNL and Distribution Points to Graph

All points are linked to the nearest location in the graph using a connector, provided the distance does not exceed max_connection_distance.

In [12]:
G_with_grids_and_boundaries_connected = connect_to_nearest_edge(G_with_grids_and_boundaries, max_connection_distance=10)

Total edges indexed: 363611
Start iterating over postnl and distribution nodes.
Node 0 connected to edge ((110417.924, 402818.717)-(110433.45, 402852.769)) via new node POINT (110430.90699739025 402847.1916249602) at d = 5.45 m
Node 1 connected to edge ((110529.515, 397712.008)-(110521.223, 397722.212)) via new node POINT (110526.3363129125 397715.9201147066) at d = 4.79 m
Node 2 connected to edge ((109618.085, 396234.978)-(109618.085, 396244.034)) via new node POINT (109618.08494436 396241.9102484932) at d = 0.82 m
Node 3 connected to edge ((112432.771, 398188.99)-(112436.357, 398194.829)) via new node POINT (112435.73005925784 398193.80845655006) at d = 0.83 m
Node 4 connected to edge ((109564.018, 399604.742)-(np.float64(109561.845), np.float64(399618.032))) via new node POINT (109563.13040467404 399610.17050523794) at d = 0.16 m
Node 5 connected to edge ((112768.725, 399871.756)-(112768.725, 399897.258)) via new node POINT (112768.72489341111 399891.23400346964) at d = 1.48 m
Node 
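
The log above reports, for each node, the edge it was attached to, the new node created on that edge, and the distance. A minimal sketch of that idea follows: project the point onto each segment, keep the closest projection, and reject it when it is farther away than `max_connection_distance`. The helper names are hypothetical, and the linear scan over all edges is for illustration only. The log suggests the real function indexes the edges first.

```python
import math


def project_point_on_segment(p, a, b):
    """Return (closest_point, distance) from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)


def nearest_edge(p, edges, max_connection_distance):
    """Return (edge, new_point, distance) for the closest edge, or None if too far."""
    best = None
    for a, b in edges:
        point, dist = project_point_on_segment(p, a, b)
        if dist <= max_connection_distance and (best is None or dist < best[2]):
            best = ((a, b), point, dist)
    return best


# Two horizontal edges and one point between them (made-up coordinates).
edges = [((0, 0), (10, 0)), ((0, 20), (10, 20))]
match = nearest_edge((4, 3), edges, max_connection_distance=10)
too_far = nearest_edge((4, 100), edges, max_connection_distance=10)
print(match, too_far)
```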

## 5. Delete Edges Inside No-Fly Zones

All edges located within no-fly zones are detected and removed from the graph.

In [13]:
G_final = remove_no_fly_zones_from_graph(G_with_grids_and_boundaries_connected, no_fly_zones_gdf)

Get all lines from the graph.
Get all lines that are inside no-fly zones.
Checking against 5 no-fly zones.
8975 edges marked for removal (within no-fly zones).
Number of edges inside no-fly zones: 8975
Removing filtered edges from the graph.
8975 edges removed from the graph.


## 6. Save graph

The graph is saved to the output folder as a pickle file.

In [15]:
with open(f"output/{city}_raw.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}_raw.pkl")

Graph saved to output/breda_raw.pkl
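
A saved graph can be read back with `pickle.load`. The sketch below shows the round trip on a small stand-in object in a temporary directory, so it does not touch the real output folder. The dictionary `graph_like` and the file name are placeholders and not the project's graph object. Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_and_reload(obj, path):
    """Pickle `obj` to `path`, then load it back and return the copy."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder for a graph; the notebook pickles its graph object the same way.
graph_like = {"nodes": [(0, 0), (1, 1)], "edges": [((0, 0), (1, 1))]}

with tempfile.TemporaryDirectory() as tmp:
    restored = save_and_reload(graph_like, Path(tmp) / "example_raw.pkl")

print(restored == graph_like)
```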


## 7. Plot graph

Plots the graph on a Folium map. The call is commented out in this run.

In [None]:
# plot_graph_on_folium(G_final, gdf_polygons=polygons_gdf, plot_waypoints=True)

## 8. Check graph

Use diagnose_graph to check whether all nodes and edges were added correctly. If some edge heights are missing, they can still be filled in with fill_missing_edge_heights_from_csv.

In [18]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 20896
Component size distribution: Counter({1: 20495, 4: 97, 6: 66, 3: 28, 8: 28, 7: 22, 5: 21, 9: 13, 10: 12, 2: 11, 14: 11, 18: 11, 12: 10, 13: 7, 11: 7, 15: 4, 19: 4, 17: 4, 22: 3, 29: 3, 20: 2, 71: 2, 16: 2, 35: 2, 26: 2, 21: 2, 25: 2, 28: 2, 31: 2, 215866: 1, 110: 1, 181: 1, 40: 1, 53: 1, 112: 1, 65: 1, 24: 1, 38: 1, 127: 1, 61: 1, 66: 1, 56: 1, 34: 1, 69: 1, 43: 1, 44: 1, 39: 1, 46: 1, 83: 1, 36: 1})

Node types (ntype):
 - postnl: 64
 - distribution: 1
 - line_intersection: 18874
 - waypoint: 44151
 - grid_point: 60592
 - polygon_boundary: 51415
 - poly_intersection: 65744

Edge types (etype):
 - postnl_connector: 64
 - Regional roads: 25001
 - Pedestrian and cycling paths: 34763
 - connector: 25418
 - Tracks and rural access roads: 24274
 - Residential areas: 46413
 - Railways: 1636
 - Industrial zones: 5865
 - Meadows and open grass: 94811
 - Cultural sites: 3662
 - Motorways and major roads: 1452
 - Hospitals: 569
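
The diagnostics report 20896 components, of which 20495 are single isolated nodes and one contains 215866 nodes. A component size distribution of this kind can be computed with a breadth-first search. The sketch below uses a plain adjacency dictionary as a stand-in for the graph. It is not the implementation of `diagnose_graph`, and the name `component_sizes` is hypothetical.

```python
from collections import Counter, deque


def component_sizes(adjacency):
    """Return the size of every connected component of an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    seen = set()
    sizes = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return sizes


# Toy graph: a chain of three nodes, a pair, and one isolated node.
adjacency = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b"},
    "d": {"e"}, "e": {"d"},
    "f": set(),
}
sizes = component_sizes(adjacency)
distribution = Counter(sizes)  # maps component size to number of components
print(len(sizes), distribution)
```

A large count under size 1 indicates nodes that were added but never connected by an edge.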
 

In [19]:
G_final = fill_missing_edge_heights_from_csv(G_final)


Height map loaded from CSV: {'Motorways and major roads': 30.0, 'Regional roads': 30.0, 'Tracks and rural access roads': 30.0, 'Living and residential streets': 30.0, 'Pedestrian and cycling paths': 30.0, 'Railways': 30.0, 'Power lines': 60.0, 'Power plants': 60.0, 'Communication towers': 60.0, 'High infrastructures': 60.0, 'Industrial zones': 60.0, 'Commercial zones': 60.0, 'Retail zones': 60.0, 'Residential areas': 60.0, 'Recreational zones': 30.0, 'Agricultural lands': 30.0, 'Forests and woodlands': 30.0, 'Meadows and open grass': 30.0, 'Rivers, canals and streams': 30.0, 'Lakes and ponds': 30.0, 'Water reservoirs': 30.0, 'Wetlands': 30.0, 'Schools and universities': 30.0, 'Hospitals': 60.0, 'Prisons': 60.0, 'Religious sites': 30.0, 'Cultural sites': 30.0, 'Cemeteries': 30.0, 'Parks': 30.0, 'connector': 30.0}
Patched edge ((110608.5, 402198.074) → POINT (110612.65480404728 402196.21864274645)) with height=30.0 from etype='Pedestrian and cycling paths'
Patched edge ((109193.771, 4021
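
The patching shown in the log can be pictured as a lookup from edge type to a default height. The sketch below fills in a height only where one is missing and leaves existing values untouched. The attribute names `height` and `etype` follow the log output, but storing edges as dictionaries and the function name `fill_missing_heights` are assumptions made for illustration.

```python
def fill_missing_heights(edges, height_map):
    """Set edge['height'] from height_map[edge['etype']] where it is missing.

    Returns the number of edges that were patched. Edges whose type is not
    in the map are left unchanged.
    """
    patched = 0
    for edge in edges:
        if edge.get("height") is None:
            height = height_map.get(edge.get("etype"))
            if height is not None:
                edge["height"] = height
                patched += 1
    return patched


# Two entries copied from the height map in the log; the edges are made up.
height_map = {"Railways": 30.0, "Hospitals": 60.0}
edges = [
    {"etype": "Railways", "height": None},   # missing, will be patched
    {"etype": "Hospitals", "height": 45.0},  # already set, kept as is
    {"etype": "unknown type"},               # no entry in the map, skipped
]
patched = fill_missing_heights(edges, height_map)
print(patched, edges)
```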

In [20]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 20896
Component size distribution: Counter({1: 20495, 4: 97, 6: 66, 3: 28, 8: 28, 7: 22, 5: 21, 9: 13, 10: 12, 2: 11, 14: 11, 18: 11, 12: 10, 13: 7, 11: 7, 15: 4, 19: 4, 17: 4, 22: 3, 29: 3, 20: 2, 71: 2, 16: 2, 35: 2, 26: 2, 21: 2, 25: 2, 28: 2, 31: 2, 215866: 1, 110: 1, 181: 1, 40: 1, 53: 1, 112: 1, 65: 1, 24: 1, 38: 1, 127: 1, 61: 1, 66: 1, 56: 1, 34: 1, 69: 1, 43: 1, 44: 1, 39: 1, 46: 1, 83: 1, 36: 1})

Node types (ntype):
 - postnl: 64
 - distribution: 1
 - line_intersection: 18874
 - waypoint: 44151
 - grid_point: 60592
 - polygon_boundary: 51415
 - poly_intersection: 65744

Edge types (etype):
 - postnl_connector: 64
 - Regional roads: 25001
 - Pedestrian and cycling paths: 34763
 - connector: 25418
 - Tracks and rural access roads: 24274
 - Residential areas: 46413
 - Railways: 1636
 - Industrial zones: 5865
 - Meadows and open grass: 94811
 - Cultural sites: 3662
 - Motorways and major roads: 1452
 - Hospitals: 569
 

In [21]:
with open(f"output/{city}.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}.pkl")

Graph saved to output/breda.pkl
