# Graph Creation

This notebook enables the generation of graphs based on pre-collected OSM data. It is important that the data has already been retrieved and stored in a GeoJSON file, with each row containing a geometry of type LineString, MultiLineString, Polygon, or MultiPolygon.

The required data can be obtained using the notebooks notebook_get_data.ipynb and notebook_assign_risk.ipynb.
The resulting GeoJSON file from those steps serves as input for this notebook; you only need to specify the city name.
Make sure you also know which distribution center corresponds to the selected city, as this must be provided as well.
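
Before running the rest of the notebook, it can help to see which geometry types the input file actually contains. The sketch below uses only the standard library and a small inline sample. The function name geometry_type_counts and the sample data are illustrative assumptions, not part of this project's modules.

```python
import json
from collections import Counter

def geometry_type_counts(geojson_text):
    """Count features per geometry type in a GeoJSON FeatureCollection string."""
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        counts[geometry.get("type")] += 1
    return counts

# Tiny hypothetical sample; a real file would be read from disk instead.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
    ],
})

counts = geometry_type_counts(sample)
print(counts)
```

Any type in the result other than the four listed above is worth checking before the graph is built.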

In [15]:
from model_get_data import read_geojson
from model_create_graph import create_graph_with_nodes
from model_add_linear_infrastructures import add_linesstrings_to_graph
from model_add_polygon_infrastructures import add_grid_polygons_to_graph
from model_connect_postnl_nodes import connect_to_nearest_edge
from model_delete_edges_in_no_fly import remove_no_fly_zones_from_graph
from model_check import fill_missing_edge_heights_from_csv, diagnose_graph

import pickle

### 0. Specifying the Area
Set the city variable to the area the data was collected for, and set the depot variable to the corresponding distribution center.
If you are unsure which distribution center belongs to the selected city, look it up in the file:
model/postNL/output/postnl_distribution_cleaned.json.

In [2]:
city = 'alphen-waddinxveen' # use lower case
depot = ['Goes'] # use capitalized
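
The comments above ask for a lower-case city and capitalized depot names. A small check like the following can catch a mismatch early. It is only a sketch: check_area_settings is a hypothetical helper, and the rules it enforces are inferred from the two comments, not from the project's code.

```python
def check_area_settings(city, depot):
    """Return a list of human-readable problems with the city/depot settings."""
    problems = []
    if city != city.lower():
        problems.append(f"city {city!r} should be lower case")
    for name in depot:
        # name[:1] is an empty string for empty names, so isupper() is False.
        if not name[:1].isupper():
            problems.append(f"depot {name!r} should start with a capital letter")
    return problems

ok = check_area_settings("alphen-waddinxveen", ["Goes"])
bad = check_area_settings("Alphen-Waddinxveen", ["goes"])
print(ok)
print(bad)
```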

### 0. Get Data
Uses the read_geojson function to load and prepare the data for the selected area, so that a graph can be constructed from the OSM data.

In [5]:
gdf, lines_gdf, polygons_gdf, post_nl_gdf, no_fly_zones_gdf = read_geojson(f'/Users/cmartens/Documents/thesis_cf_martens/3.no_fly_zones/output/data_for_graph_{city}.geojson', depot, all=True)

Getting separate dataframes for lines, polygons, postnl points, distribution points and no fly zones
Looking for distribution points in ['Goes']
Found 1 distribution points in ['Goes']
Found 22830 lines, 9108 polygons, 82 postnl points, 1 distribution point and 7 no-fly zones


In [6]:
no_fly_zones_gdf

Unnamed: 0,name,id,description,area_type,category,risk,Height,air_type,geometry
32346,Industriegebied met risicos op zware ongevallen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Industry,no_fly_zone,,High-Risk Areas,"POLYGON ((104560.374 461789.357, 104560.336 46..."
32347,Gebied met verbod vanuit beveiligingsoverwegingen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Security,no_fly_zone,,Restricted Zones,"POLYGON ((103921.671 460714.176, 103921.633 46..."
32348,Gebied met verbod vanuit beveiligingsoverwegingen,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Security,no_fly_zone,,Restricted Zones,"POLYGON ((103752.757 460932.286, 103752.719 46..."
32349,geografische zone van een beveiligd gebied of ...,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Security,no_fly_zone,,Restricted Zones,"POLYGON ((104950.064 461596.382, 104950.052 46..."
32350,geografische zone van een beveiligd gebied of ...,,OPEN cat. vluchten zijn hier verboden SPEC cat...,No-fly zone,Security,no_fly_zone,,Restricted Zones,"POLYGON ((100237.286 460193.56, 100237.274 460..."
32351,Natura 2000 Area De Wilck,,Natura 2000-besluit 7 mei 2013,No-fly zone,Environment,no_fly_zone,,Protected Nature Reserves,"MULTIPOLYGON (((97168.115 459219.521, 97153.53..."
32352,Natura 2000 Area Nieuwkoopse Plassen & De Haeck,,Natura 2000-besluit 06 januari 2014,No-fly zone,Environment,no_fly_zone,,Protected Nature Reserves,"MULTIPOLYGON (((116918.154 464732.231, 116914...."


## 1. Create Initial Graph with Nodes
Creates an unconnected graph by adding all PostNL points and the distribution center as nodes.

In [7]:
G = create_graph_with_nodes(post_nl_gdf)

Graph creation summary:
 - Total rows: 82
 - Distributions added: 1
 - PostNL added: 81
 - Skipped (non-Point geometry): 0


## 2. Add LineStrings to Graph
Adds all LineString and MultiLineString geometries to the graph as edges. Ensures they are connected to each other and to the existing nodes by snapping nearby points together.

In [8]:
G_with_linestrings = add_linesstrings_to_graph(G, lines_gdf)

Exploding MultiLineString geometries into separate LineString geometries.
Finding intersections between lines.
Found 82608 intersection points.
Adding snapped points to the graph.


  return lib.line_locate_point(line, other)



 Graph update summary:
 - Lines found: 22830
 - Intersections found: 22629
 - Line intersection nodes added: 25979
 - Edges added: 89330
 - New waypoint nodes added: 52838
 - Skipped short/invalid segments: 2300
 - Final graph has 79630 nodes and 89862 edges
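
The idea of snapping nearby points together can be pictured with a small standalone sketch. This is not the implementation in model_add_linear_infrastructures; it is a simple greedy variant, and the function name snap_points and the tolerance value are assumptions chosen for illustration.

```python
import math

def snap_points(points, tolerance):
    """Map each point to the first earlier point lying within `tolerance`.

    Greedy and O(n^2): fine for illustration, too slow for large datasets,
    where a spatial index would be the usual choice.
    """
    representatives = []
    snapped = []
    for x, y in points:
        for rx, ry in representatives:
            if math.hypot(x - rx, y - ry) <= tolerance:
                snapped.append((rx, ry))
                break
        else:
            # No representative is close enough, so this point becomes one.
            representatives.append((x, y))
            snapped.append((x, y))
    return snapped

points = [(0.0, 0.0), (0.3, 0.4), (10.0, 10.0)]
snapped = snap_points(points, tolerance=1.0)
print(snapped)
```

The second point lies 0.5 units from the first and is merged into it, while the third point is too far away and stays where it is.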


## 3. Add Polygons to Graph

Generates a grid for each polygon and connects the grid points to the polygon boundaries. Adds all polygons to the graph and ensures they are properly snapped to the existing graph structure, so that all elements are correctly connected.

In [9]:
G_with_grids_and_boundaries = add_grid_polygons_to_graph(G_with_linestrings, polygons_gdf, lines_gdf)

Excecuting add_grid_polygons_to_graph function
G before polygons: 79630 nodes, 89862 edges

 Step 1
Adding grid structure to 9108 polygons, with grid size 50 and max cells 10000.
G after grid: 183619 nodes, 220591 edges

 Step 2
Adding local polygon boundaries to the graph with snap tolerance 35 and boundary sample distance 50.

 Step 3
Finding intersections for 424698 edges.
Found 1005599 intersections between polygon-polygon and polygon-line.

 Step 4
Number of unique snapt points: 324821

 Step 5
Mapping geometries to edges.

 Step 6
Adding intersection nodes.
Intersection nodes added: 60074

 Step 7
 Splitting edges at intersections.
 Edges split and added: 0

 Step 8
STRtree indexing over 104230 polygon_boundary points.
Connected nodes: 30803
Skipped 47047 line intersections/waypoints that could not be connected to polygon boundaries. Consider increasing the max_connection_distance.
G after connecting: 347923 nodes, 538079 edges
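
Step 1 in the log above fills each polygon with a grid. A minimal sketch of that idea follows, assuming a regular grid of cell centers and a ray-casting inside test. The function names are hypothetical, the max cells limit from the log is not modelled, and the real add_grid_polygons_to_graph may work differently.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def grid_points_in_polygon(polygon, grid_size):
    """Return the cell centers of a regular grid that fall inside the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    y = min(ys) + grid_size / 2
    while y < max(ys):
        x = min(xs) + grid_size / 2
        while x < max(xs):
            if point_in_polygon(x, y, polygon):
                points.append((x, y))
            x += grid_size
        y += grid_size
    return points

square = [(0, 0), (100, 0), (100, 100), (0, 100)]
grid = grid_points_in_polygon(square, grid_size=50)
print(grid)
```

With the grid size of 50 reported in the log, a 100 by 100 square yields four grid points.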


## 4. Connect PostNL and distribution points to graph

All points are linked to the nearest location in the graph using a connector, provided the distance does not exceed max_connection_distance.

In [10]:
G_with_grids_and_boundaries_connected = connect_to_nearest_edge(G_with_grids_and_boundaries, max_connection_distance=10)

Total edges indexed: 538079
Start iterating over postnl and distribution nodes.
Node 0 connected to edge ((np.float64(104169.3), np.float64(450747.668))-(104180.479, 450762.63)) via new node POINT (104179.15837716208 450760.862475275) at d = 8.67 m
Node 1 connected to edge ((103460.764, 450132.613)-(103420.234, 450133.656)) via new node POINT (103453.4920450252 450132.8001366652) at d = 3.13 m
Node 2 connected to edge ((105262.931, 450842.367)-(105262.931, 450868.86)) via new node POINT (105262.93090006734 450862.52590566926) at d = 2.40 m
Node 3 connected to edge ((104662.931, 449392.367)-(104612.931, 449392.357)) via new node POINT (104620.06979521007 449392.35838599905) at d = 5.26 m
Node 4 connected to edge ((104328.235, 453174.248)-(104328.235, 453223.785)) via new node POINT (104328.23500360834 453212.0651888811) at d = 2.68 m
Node 5 connected to edge ((102417.22, 448079.461)-(102418.2, 448056.205)) via new node POINT (102417.82633599809 448065.07373607904) at d = 2.84 m
Node 6 c
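
The log lines above report, for each node, the edge it was attached to, the new node created on that edge, and the distance d. The geometry behind such a connection can be sketched as a projection of the point onto each edge segment. This is an illustrative stand-in with hypothetical function names, not the code of connect_to_nearest_edge, and it scans all edges instead of using a spatial index.

```python
import math

def project_onto_segment(point, seg_start, seg_end):
    """Return the closest location on the segment and its distance to `point`."""
    px, py = point
    ax, ay = seg_start
    bx, by = seg_end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))  # clamp so the result stays on the segment
    nearest = (ax + t * dx, ay + t * dy)
    return nearest, math.hypot(px - nearest[0], py - nearest[1])

def connect_point(point, edges, max_connection_distance):
    """Return (edge, new_node, distance) for the nearest edge, or None if too far."""
    best = None
    for start, end in edges:
        nearest, dist = project_onto_segment(point, start, end)
        if best is None or dist < best[2]:
            best = ((start, end), nearest, dist)
    if best is None or best[2] > max_connection_distance:
        return None
    return best

edges = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 20.0), (10.0, 20.0))]
result = connect_point((4.0, 3.0), edges, max_connection_distance=10)
too_far = connect_point((4.0, 100.0), edges, max_connection_distance=10)
print(result)
print(too_far)
```

A point that is farther than max_connection_distance from every edge is left unconnected, which is why the function returns None in the second call.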

## 5. Delete edges inside no-fly zones

All edges located within no-fly zones are detected and removed from the graph.

In [11]:
G_final = remove_no_fly_zones_from_graph(G_with_grids_and_boundaries_connected, no_fly_zones_gdf)

Get all lines from the graph.
Get all lines that are inside no-fly zones.
Checking against 7 no-fly zones.
11113 edges marked for removal (within no-fly zones).
Number of edges inside no-fly zones: 11113
Removing filtered edges from the graph.
11113 edges removed from the graph.
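
To make the filtering step concrete, here is a small sketch that drops every edge whose two endpoints both lie inside a zone. That endpoint rule is an assumption made for brevity: it can misjudge edges in non-convex zones, and remove_no_fly_zones_from_graph may use a different geometric predicate. All names in the block are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def split_edges_by_zones(edges, zones):
    """Return (kept, removed); an edge is removed if both ends are in one zone."""
    kept, removed = [], []
    for u, v in edges:
        in_zone = any(
            point_in_polygon(u, zone) and point_in_polygon(v, zone)
            for zone in zones
        )
        (removed if in_zone else kept).append((u, v))
    return kept, removed

zones = [[(0, 0), (10, 0), (10, 10), (0, 10)]]
edges = [((2, 2), (8, 8)), ((2, 2), (20, 20)), ((20, 20), (30, 30))]
kept, removed = split_edges_by_zones(edges, zones)
print(len(kept), "kept,", len(removed), "removed")
```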


## 6. Save graph

The graph is saved to the output folder as a pickle file.

In [12]:
with open(f"output/{city}_raw.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}_raw.pkl")

Graph saved to output/alphen-waddinxveen_raw.pkl
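
A saved graph can be loaded again with pickle.load. The round trip below uses a plain dictionary as a stand-in for the graph object and a temporary directory instead of the output folder, so both the data and the file name are illustrative assumptions.

```python
import os
import pickle
import tempfile

# Stand-in for G_final; the real object is whatever the graph functions return.
graph_stand_in = {"nodes": [(0.0, 0.0), (1.0, 1.0)],
                  "edges": [((0.0, 0.0), (1.0, 1.0))]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_raw.pkl")

    with open(path, "wb") as f:
        pickle.dump(graph_stand_in, f)

    # Reading uses binary mode as well.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == graph_stand_in)
```

Pickle files can execute arbitrary code when loaded, so only load files you created yourself or otherwise trust.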


## 7. Plot graph

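Plots the graph on a Folium map. The call below is commented out, and plot_graph_on_folium is not among the imports at the top of this notebook, so import it before uncommenting the line.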

In [13]:
# plot_graph_on_folium(G_final, gdf_polygons=polygons_gdf, plot_waypoints=True)

## 8. Check graph

Use diagnose_graph to check whether all nodes and edges were added correctly. If some edge heights are missing, they can still be filled in with fill_missing_edge_heights_from_csv.

In [16]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 20268
Component size distribution: Counter({1: 18590, 4: 451, 6: 186, 2: 129, 3: 61, 8: 59, 9: 56, 10: 44, 7: 40, 5: 39, 12: 35, 11: 33, 18: 22, 15: 21, 22: 20, 21: 19, 14: 16, 32: 15, 13: 15, 17: 15, 16: 14, 27: 14, 19: 12, 26: 12, 29: 12, 33: 12, 31: 11, 24: 11, 40: 11, 41: 11, 25: 11, 35: 10, 28: 10, 37: 9, 23: 9, 44: 9, 30: 9, 20: 8, 48: 8, 38: 8, 56: 7, 47: 7, 36: 7, 49: 7, 34: 7, 66: 7, 39: 6, 53: 6, 43: 6, 45: 6, 52: 5, 54: 4, 42: 4, 74: 4, 58: 4, 64: 4, 82: 4, 70: 4, 67: 4, 51: 3, 115: 3, 63: 3, 57: 3, 46: 3, 69: 3, 145: 2, 87: 2, 116: 2, 85: 2, 114: 2, 72: 2, 81: 2, 112: 2, 122: 2, 204: 2, 61: 2, 59: 2, 100: 2, 55: 2, 88: 2, 50: 2, 91: 2, 65: 2, 299387: 1, 78: 1, 325: 1, 147: 1, 267: 1, 167: 1, 203: 1, 271: 1, 97: 1, 385: 1, 129: 1, 132: 1, 186: 1, 93: 1, 124: 1, 108: 1, 83: 1, 79: 1, 134: 1, 338: 1, 76: 1, 159: 1, 68: 1, 89: 1, 105: 1, 95: 1, 80: 1, 101: 1, 155: 1, 150: 1, 202: 1, 149: 1, 92: 1, 62: 1})

Node types 
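
In the output above, each Counter key is a component size and each value is the number of components of that size. The entry 1: 18590 therefore means 18590 isolated nodes, and 299387: 1 means one large component of 299387 nodes. A minimal sketch of how such a distribution can be computed is shown below. It uses a breadth-first search over a plain adjacency dictionary; how diagnose_graph computes it internally is not shown in this notebook, and the function name is hypothetical.

```python
from collections import Counter, deque

def component_size_distribution(nodes, edges):
    """Return a Counter mapping component size -> number of such components."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        sizes.append(size)
    return Counter(sizes)

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c")]
distribution = component_size_distribution(nodes, edges)
print(distribution)
```

Here the nodes a, b, and c form one component of size 3, while d and e are isolated.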

In [17]:
G_final = fill_missing_edge_heights_from_csv(G_final)


Height map loaded from CSV: {'Motorways and major roads': 30.0, 'Regional roads': 30.0, 'Tracks and rural access roads': 30.0, 'Living and residential streets': 30.0, 'Pedestrian and cycling paths': 30.0, 'Railways': 30.0, 'Power lines': 60.0, 'Power plants': 60.0, 'Communication towers': 60.0, 'High infrastructures': 60.0, 'Industrial zones': 60.0, 'Commercial zones': 60.0, 'Retail zones': 60.0, 'Residential areas': 60.0, 'Recreational zones': 30.0, 'Agricultural lands': 30.0, 'Forests and woodlands': 30.0, 'Meadows and open grass': 30.0, 'Rivers, canals and streams': 30.0, 'Lakes and ponds': 30.0, 'Water reservoirs': 30.0, 'Wetlands': 30.0, 'Schools and universities': 30.0, 'Hospitals': 60.0, 'Prisons': 60.0, 'Religious sites': 30.0, 'Cultural sites': 30.0, 'Cemeteries': 30.0, 'Parks': 30.0, 'connector': 30.0}
Patched edge ((104743.957, 462726.729) → POINT (104740.18141350291 462727.1187671847)) with height=30.0 from etype='Pedestrian and cycling paths'
Patched edge ((104708.949, 462
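
The patching step shown in the log can be pictured as a lookup from edge type to a default height. The sketch below assumes that each edge is a dictionary with the keys etype and height. Those names mirror the log messages but are an assumption about the data layout, and only three entries of the height map are copied from the output above.

```python
def fill_missing_heights(edges, height_map):
    """Set `height` from `height_map` where it is missing; return the patch count."""
    patched = 0
    for edge in edges:
        if edge.get("height") is None and edge.get("etype") in height_map:
            edge["height"] = height_map[edge["etype"]]
            patched += 1
    return patched

# Subset of the height map printed in the log above.
height_map = {
    "Pedestrian and cycling paths": 30.0,
    "Power lines": 60.0,
    "connector": 30.0,
}

edges = [
    {"etype": "Pedestrian and cycling paths", "height": None},
    {"etype": "Power lines", "height": 45.0},   # already set, left untouched
    {"etype": "unknown type", "height": None},  # no default available
]

patched = fill_missing_heights(edges, height_map)
print(patched, edges)
```

Edges that already have a height keep it, and edges whose type is not in the map stay unpatched so that they can be inspected separately.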

In [18]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 20268
Component size distribution: Counter({1: 18590, 4: 451, 6: 186, 2: 129, 3: 61, 8: 59, 9: 56, 10: 44, 7: 40, 5: 39, 12: 35, 11: 33, 18: 22, 15: 21, 22: 20, 21: 19, 14: 16, 32: 15, 13: 15, 17: 15, 16: 14, 27: 14, 19: 12, 26: 12, 29: 12, 33: 12, 31: 11, 24: 11, 40: 11, 41: 11, 25: 11, 35: 10, 28: 10, 37: 9, 23: 9, 44: 9, 30: 9, 20: 8, 48: 8, 38: 8, 56: 7, 47: 7, 36: 7, 49: 7, 34: 7, 66: 7, 39: 6, 53: 6, 43: 6, 45: 6, 52: 5, 54: 4, 42: 4, 74: 4, 58: 4, 64: 4, 82: 4, 70: 4, 67: 4, 51: 3, 115: 3, 63: 3, 57: 3, 46: 3, 69: 3, 145: 2, 87: 2, 116: 2, 85: 2, 114: 2, 72: 2, 81: 2, 112: 2, 122: 2, 204: 2, 61: 2, 59: 2, 100: 2, 55: 2, 88: 2, 50: 2, 91: 2, 65: 2, 299387: 1, 78: 1, 325: 1, 147: 1, 267: 1, 167: 1, 203: 1, 271: 1, 97: 1, 385: 1, 129: 1, 132: 1, 186: 1, 93: 1, 124: 1, 108: 1, 83: 1, 79: 1, 134: 1, 338: 1, 76: 1, 159: 1, 68: 1, 89: 1, 105: 1, 95: 1, 80: 1, 101: 1, 155: 1, 150: 1, 202: 1, 149: 1, 92: 1, 62: 1})

Node types 

In [19]:
with open(f"output/{city}.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}.pkl")

Graph saved to output/alphen-waddinxveen.pkl
