# Graph Creation

This notebook enables the generation of graphs based on pre-collected OSM data. It is important that the data has already been retrieved and stored in a GeoJSON file, with each row containing a geometry of type LineString, MultiLineString, Polygon, or MultiPolygon.

The required data can be obtained using the notebooks notebook_get_data.ipynb and notebook_assign_risk.ipynb.
The resulting GeoJSON file from those steps serves as input for this notebook; you only need to specify the city name.
Make sure you also know which distribution center corresponds to the selected city, as this must be provided as well.

In [21]:
from model_get_data import read_geojson
from model_create_graph import create_graph_with_nodes
from model_add_linear_infrastructures import add_linesstrings_to_graph
from model_add_polygon_infrastructures import add_grid_polygons_to_graph
from model_connect_postnl_nodes import connect_to_nearest_edge
from model_delete_edges_in_no_fly import remove_no_fly_zones_from_graph
from model_check import fill_missing_edge_heights_from_csv, diagnose_graph

import pickle

## 0. Specifying the Area
Set the `city` variable to the area for which the data was collected.
Set the `depot` variable to the corresponding distribution center.
If you are unsure which distribution center belongs to the selected city, look it up in
`model/postNL/output/postnl_distribution_cleaned.json`.

In [5]:
city = 'borsele'  # lowercase; must match the name in data_for_graph_{city}.geojson
depot = ['Goes']  # list of capitalized distribution-center names

## 0. Get Data
Uses the `read_geojson` function to load the data for the selected area and split it into separate dataframes for lines, polygons, PostNL and distribution points, and no-fly zones, from which the graph is constructed.
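The internals of `read_geojson` are not shown in this notebook. As a rough illustration only, the sketch below groups the features of a GeoJSON FeatureCollection by geometry type using the standard library, and routes features whose `risk` property equals `no_fly_zone` (the value visible in the no-fly zone table further down) into their own layer. The function name `split_features`, the layer names, and this routing rule are assumptions, not the actual implementation.

```python
import json
from collections import defaultdict

# Assumed mapping from GeoJSON geometry type to notebook layer.
LAYER_BY_TYPE = {
    "LineString": "lines",
    "MultiLineString": "lines",
    "Polygon": "polygons",
    "MultiPolygon": "polygons",
    "Point": "points",
}


def split_features(geojson_text, no_fly_key="risk", no_fly_value="no_fly_zone"):
    """Group FeatureCollection features into layers by geometry type (hypothetical helper)."""
    collection = json.loads(geojson_text)
    layers = defaultdict(list)
    for feature in collection["features"]:
        props = feature.get("properties") or {}
        if props.get(no_fly_key) == no_fly_value:
            # No-fly zones are kept apart regardless of geometry type.
            layers["no_fly_zones"].append(feature)
        else:
            geom_type = feature["geometry"]["type"]
            layers[LAYER_BY_TYPE.get(geom_type, "other")].append(feature)
    return dict(layers)


# Tiny example collection: one line, one no-fly polygon, one point.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {},
         "geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]}},
        {"type": "Feature", "properties": {"risk": "no_fly_zone"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {},
         "geometry": {"type": "Point", "coordinates": [2, 2]}},
    ],
})
layers = split_features(sample)
print({name: len(items) for name, items in layers.items()})
```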

In [6]:
gdf, lines_gdf, polygons_gdf, post_nl_gdf, no_fly_zones_gdf = read_geojson(f'/Users/cmartens/Documents/thesis_cf_martens/no_fly_zones/output/data_for_graph_{city}.geojson', depot, all=True)

Getting separate dataframes for lines, polygons, postnl points, distribution points and no fly zones
Looking for distribution points in ['Goes']
Found 1 distribution points in ['Goes']
Found 5309 lines, 4938 polygons, 24 postnl points, 1 distribution point and 6 no-fly zones


In [7]:
no_fly_zones_gdf

| Unnamed: 0 | name | id | description | area_type | category | risk | Height | air_type | geometry |
|---|---|---|---|---|---|---|---|---|---|
| 10571 | TRAUMAHELIKOPTER landingsplaats GOES/ Adrz |  | Vluchten in de OPEN categorie zijn hier niet t... | No-fly zone | Special Operations | no_fly_zone |  | Air Ambulance Landing Sites | POLYGON ((53637.181 389447.667, 53637.091 3894... |
| 10572 | Sloegebied (Vlissingen-Oost) Havengebied met r... |  | OPEN cat. vluchten zijn hier verboden SPEC cat... | No-fly zone | Industry | no_fly_zone |  | High-Risk Areas | POLYGON ((40996.43 385892.928, 40981.46 385292... |
| 10573 | Zeesteiger Zeeland Refinery Havengebied met ri... |  | OPEN cat. vluchten zijn hier verboden SPEC cat... | No-fly zone | Industry | no_fly_zone |  | High-Risk Areas | POLYGON ((39745.895 381261.033, 39220.917 3816... |
| 10574 | Gebied met verbod vanuit beveiligingsoverwegingen |  | OPEN cat. vluchten zijn hier verboden SPEC cat... | No-fly zone | Security | no_fly_zone |  | Restricted Zones | POLYGON ((39393.122 383773.851, 39393.084 3837... |
| 10575 | Gebied met verbod vanuit beveiligingsoverwegingen |  | OPEN cat. vluchten zijn hier verboden SPEC cat... | No-fly zone | Security | no_fly_zone |  | Restricted Zones | POLYGON ((39145.428 384768.77, 39145.39 384762... |
| 10576 | Natura 2000 Area Westerschelde & Saeftinghe |  | Natura 2000-besluit 2010, wijzigingsbesluit 2012 | No-fly zone | Environment | no_fly_zone |  | Protected Nature Reserves | MULTIPOLYGON (((38488.379 384072.327, 38481.00... |


## 1. Create Initial Graph with Nodes
Creates an unconnected graph by adding all PostNL points and the distribution center as nodes.
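A minimal sketch of what this step amounts to, assuming plain dictionaries instead of a graph library: each Point feature becomes a node tagged with an `ntype` of `postnl` or `distribution` (the values reported by `diagnose_graph` later on), and anything that is not a Point is counted as skipped. No edges are created, which is why the graph is still unconnected. The `is_distribution` property and the function name are hypothetical.

```python
def create_graph_with_nodes_sketch(features):
    """Turn Point features into graph nodes; the graph has no edges yet."""
    nodes = {}
    summary = {"distribution": 0, "postnl": 0, "skipped": 0}
    for idx, feature in enumerate(features):
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            summary["skipped"] += 1  # cf. "Skipped (non-Point geometry)"
            continue
        x, y = geometry["coordinates"][:2]
        # Hypothetical flag marking the depot; the real data may mark it differently.
        is_depot = feature.get("properties", {}).get("is_distribution", False)
        ntype = "distribution" if is_depot else "postnl"
        nodes[idx] = {"ntype": ntype, "x": x, "y": y}
        summary[ntype] += 1
    return nodes, summary


features = [
    {"geometry": {"type": "Point", "coordinates": [10.0, 20.0]},
     "properties": {"is_distribution": True}},
    {"geometry": {"type": "Point", "coordinates": [30.0, 40.0]},
     "properties": {}},
    {"geometry": {"type": "LineString", "coordinates": [[0, 0], [1, 1]]},
     "properties": {}},
]
nodes, summary = create_graph_with_nodes_sketch(features)
print(summary)
```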

In [9]:
G = create_graph_with_nodes(post_nl_gdf)

Graph creation summary:
 - Total rows: 24
 - Distributions added: 1
 - PostNL added: 23
 - Skipped (non-Point geometry): 0


## 2. Add LineStrings to Graph
Adds all LineString and MultiLineString geometries to the graph as edges. Ensures they are connected to each other and to the existing nodes by snapping nearby points together.
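Snapping can be illustrated with a grid hash: if the grid cells are as large as the snap tolerance, any earlier point within the tolerance of a new point lies in the same cell or one of its eight neighbours, so only those cells need to be searched. The sketch below assumes lines are lists of `(x, y)` tuples in metres and uses a hypothetical `tolerance` of 1 m. It only snaps vertices; it does not compute line-line intersections, which `add_linesstrings_to_graph` also does according to its output. The class and function names are hypothetical.

```python
import math


class Snapper:
    """Merge points that lie within `tolerance` of an already registered point."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.cells = {}  # (i, j) grid cell -> list of registered points

    def _cell(self, x, y):
        return (math.floor(x / self.tolerance), math.floor(y / self.tolerance))

    def snap(self, x, y):
        ci, cj = self._cell(x, y)
        best, best_d = None, self.tolerance
        # Any point within `tolerance` lies in this cell or one of its 8 neighbours.
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for px, py in self.cells.get((ci + di, cj + dj), []):
                    d = math.hypot(px - x, py - y)
                    if d <= best_d:
                        best, best_d = (px, py), d
        if best is not None:
            return best  # reuse the nearby existing point
        self.cells.setdefault((ci, cj), []).append((x, y))
        return (x, y)


def add_linestrings_sketch(lines, tolerance=1.0):
    """Turn each LineString (list of (x, y)) into edges between snapped vertices."""
    snapper = Snapper(tolerance)
    edges = set()
    for coords in lines:
        snapped = [snapper.snap(x, y) for x, y in coords]
        for u, v in zip(snapped, snapped[1:]):
            if u != v:  # skip segments that collapse to a single point
                edges.add(frozenset((u, v)))
    return edges


# The second line starts 0.36 m from the end of the first, so they get joined.
lines = [[(0.0, 0.0), (10.0, 0.0)], [(10.3, 0.2), (10.3, 10.0)]]
edges = add_linestrings_sketch(lines, tolerance=1.0)
```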

In [11]:
G_with_linestrings = add_linesstrings_to_graph(G, lines_gdf)

Exploding MultiLineString geometries into separate LineString geometries.
Finding intersections between lines.
Found 18862 intersection points.
Adding snapped points to the graph.





 Graph update summary:
 - Lines found: 5309
 - Intersections found: 5288
 - Line intersection nodes added: 5957
 - Edges added: 29815
 - New waypoint nodes added: 22190
 - Skipped short/invalid segments: 213
 - Final graph has 28217 nodes and 29840 edges


## 3. Add Polygons to Graph

Generates a grid of points inside each polygon and connects the grid points to the polygon boundary. All polygons are then added to the graph and snapped to the existing graph structure, so that the polygon grids and boundaries are connected to the line edges and nodes added in the previous steps.
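To illustrate the grid step, the sketch below places points at the centres of `grid_size` by `grid_size` cells over a polygon's bounding box, keeps those inside the polygon (even-odd ray casting), and links horizontal and vertical neighbours. The defaults mirror the values printed in the output (grid size 50, max cells 10000), but skipping polygons that exceed `max_cells` is an assumption about how that limit is used, polygon holes are ignored, and the function names are hypothetical.

```python
import math


def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_grid_sketch(ring, grid_size=50.0, max_cells=10000):
    """Return (points, edges): grid points inside `ring` and 4-neighbour edges."""
    min_x, max_x = min(p[0] for p in ring), max(p[0] for p in ring)
    min_y, max_y = min(p[1] for p in ring), max(p[1] for p in ring)
    nx = max(1, math.ceil((max_x - min_x) / grid_size))
    ny = max(1, math.ceil((max_y - min_y) / grid_size))
    if nx * ny > max_cells:
        return {}, set()  # assumption: oversized polygons are skipped
    points = {}
    for i in range(nx):
        for j in range(ny):
            x = min_x + (i + 0.5) * grid_size  # cell centre
            y = min_y + (j + 0.5) * grid_size
            if point_in_polygon(x, y, ring):
                points[(i, j)] = (x, y)
    edges = set()
    for (i, j), p in points.items():
        for neighbour in ((i + 1, j), (i, j + 1)):
            if neighbour in points:
                edges.add(frozenset((p, points[neighbour])))
    return points, edges


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
points, grid_edges = polygon_grid_sketch(square)
```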

In [13]:
G_with_grids_and_boundaries = add_grid_polygons_to_graph(G_with_linestrings, polygons_gdf, lines_gdf)

Excecuting add_grid_polygons_to_graph function
G before polygons: 28217 nodes, 29840 edges

 Step 1
Adding grid structure to 4938 polygons, with grid size 50 and max cells 10000.
G after grid: 139185 nodes, 196362 edges

 Step 2
Adding local polygon boundaries to the graph with snap tolerance 35 and boundary sample distance 50.

 Step 3
Finding intersections for 347017 edges.
Found 955178 intersections between polygon-polygon and polygon-line.

 Step 4
Number of unique snapt points: 257145

 Step 5
Mapping geometries to edges.

 Step 6
Adding intersection nodes.
Intersection nodes added: 59271

 Step 7
 Splitting edges at intersections.
 Edges split and added: 0

 Step 8
STRtree indexing over 72029 polygon_boundary points.
Connected nodes: 15945
Skipped 11525 line intersections/waypoints that could not be connected to polygon boundaries. Consider increasing the max_connection_distance.
G after connecting: 270485 nodes, 437958 edges


## 4. Connect PostNL and distribution points to the graph

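Each PostNL point and the distribution center is linked to the nearest edge in the graph: a new node is placed on that edge at the closest point, and a connector edge joins the point to it. Points farther than `max_connection_distance` (10 m in this run) from every edge are not connected.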
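The connection step boils down to projecting a point onto each candidate segment and keeping the closest one. The sketch below does this by brute force over a list of `(a, b)` segments, whereas the real function indexes all edges first (`Total edges indexed` in the output). Clamping the projection parameter `t` to [0, 1] keeps the foot point on the segment. The function names are hypothetical.

```python
import math


def project_onto_segment(px, py, ax, ay, bx, by):
    """Return the closest point on segment AB to P and its distance."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0  # degenerate segment: both ends coincide
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return (cx, cy), math.hypot(px - cx, py - cy)


def connect_point_sketch(point, edges, max_connection_distance=10.0):
    """Return a new edge list with `point` joined to its nearest edge, or None."""
    best = None  # (distance, edge, foot point)
    for edge in edges:
        foot, d = project_onto_segment(*point, *edge[0], *edge[1])
        if best is None or d < best[0]:
            best = (d, edge, foot)
    if best is None or best[0] > max_connection_distance:
        return None  # too far away: leave the point unconnected
    _, (a, b), foot = best
    new_edges = [e for e in edges if e != (a, b)]
    # Split the chosen edge at the foot point (drop zero-length pieces).
    new_edges += [e for e in ((a, foot), (foot, b)) if e[0] != e[1]]
    # Connector edge from the PostNL/distribution point to the new node.
    new_edges.append((point, foot))
    return new_edges


segments = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 50.0), (10.0, 50.0))]
result = connect_point_sketch((4.0, 3.0), segments)
far = connect_point_sketch((4.0, 25.0), segments)  # 25 m from both edges
```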

In [16]:
G_with_grids_and_boundaries_connected = connect_to_nearest_edge(G_with_grids_and_boundaries, max_connection_distance=10)

Total edges indexed: 437958
Start iterating over postnl and distribution nodes.
Node 0 connected to edge ((46052.099, 388124.03)-(46052.682, 388142.246)) via new node POINT (46052.4162386423 388133.9429940057) at d = 2.02 m
Node 1 connected to edge ((45566.436, 388448.955)-(45624.995, 388401.499)) via new node POINT (45614.94534030524 388409.6432075595) at d = 7.39 m
Node 2 connected to edge ((np.float64(45039.54), np.float64(382496.538))-(44993.759, 382430.667)) via new node POINT (45005.653770497614 382447.7815328291) at d = 3.90 m
Node 3 connected to edge ((45630.308, 378936.329)-(45676.169, 378936.329)) via new node POINT (45669.09780062066 378936.3290701152) at d = 0.90 m
Node 4 connected to edge ((51617.191, 386508.132)-(51667.191, 386508.132)) via new node POINT (51639.79375365123 386508.1320369003) at d = 6.59 m
Node 5 connected to edge ((50027.255, 383812.92)-(50032.5, 383815.54)) via new node POINT (50032.038546567754 383815.30975489947) at d = 4.17 m
Node 6 connected to edge

## 5. Delete edges inside no-fly zones

All edges located within no-fly zones are detected and removed from the graph.
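One way to approximate "edge inside a no-fly zone" without a geometry library is to test a few sample points of each edge against the zone polygons. The sketch below removes an edge only when its two endpoints and its midpoint all lie inside the same zone ring. This is a simplification (a segment can leave and re-enter a non-convex zone), and the function names are hypothetical; with Shapely, `LineString.within(polygon)` would be the exact test.

```python
def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; `ring` is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def remove_no_fly_edges_sketch(edges, zones):
    """Return (kept_edges, n_removed); `zones` is a list of exterior rings."""

    def inside_some_zone(edge):
        (x1, y1), (x2, y2) = edge
        samples = [(x1, y1), ((x1 + x2) / 2, (y1 + y2) / 2), (x2, y2)]
        return any(
            all(point_in_polygon(x, y, ring) for x, y in samples)
            for ring in zones
        )

    kept = [edge for edge in edges if not inside_some_zone(edge)]
    return kept, len(edges) - len(kept)


zone = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
edges = [
    ((2.0, 2.0), (8.0, 8.0)),      # fully inside: removed
    ((2.0, 2.0), (20.0, 2.0)),     # leaves the zone: kept
    ((20.0, 20.0), (30.0, 30.0)),  # outside: kept
]
kept, removed = remove_no_fly_edges_sketch(edges, [zone])
```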

In [17]:
G_final = remove_no_fly_zones_from_graph(G_with_grids_and_boundaries_connected, no_fly_zones_gdf)

Get all lines from the graph.
Get all lines that are inside no-fly zones.
Checking against 6 no-fly zones.
99148 edges marked for removal (within no-fly zones).
Number of edges inside no-fly zones: 99148
Removing filtered edges from the graph.
99148 edges removed from the graph.


## 6. Save graph

The raw graph is pickled to `output/{city}_raw.pkl`. The version checked in step 8 is saved separately as `output/{city}.pkl`.
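The save cell in this step writes into `output/`, which must already exist. A small pair of helpers, sketched under the assumption that the graph is any picklable object, can create the folder when needed and load the graph back in a later notebook. The helper names are hypothetical; `pickle.load` should only be used on files you trust.

```python
import pickle
import tempfile
from pathlib import Path


def save_graph(graph, path):
    """Pickle `graph` to `path`, creating the parent folder if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(graph, f)


def load_graph(path):
    """Load a graph pickled by `save_graph` (only unpickle trusted files)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Round-trip check in a temporary folder that has no `output/` yet.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "borsele_raw.pkl"
    save_graph({"nodes": [0, 1]}, target)
    restored = load_graph(target)
```

In a follow-up notebook, the raw graph could then be reloaded with `load_graph(f"output/{city}_raw.pkl")`.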

In [19]:
with open(f"output/{city}_raw.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}_raw.pkl")

Graph saved to output/borsele_raw.pkl


## 7. Plot graph

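Plots the graph on a Folium map. The call below is commented out; `plot_graph_on_folium` is not among the imports at the top of the notebook, so it must be imported before the cell is enabled.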

In [11]:
# plot_graph_on_folium(G_final, gdf_polygons=polygons_gdf, plot_waypoints=True)

## 8. Check graph

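Use `diagnose_graph` to check whether all nodes and edges were added correctly. If some edges are missing a height, these can still be filled in with `fill_missing_edge_heights_from_csv`, which assigns a height per edge type (`etype`) from a height map loaded from CSV.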
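The component statistics reported by `diagnose_graph` can be reproduced with a plain depth-first search. The sketch below builds an adjacency map from an edge list and returns a `Counter` mapping component size to the number of components, the same shape as the `Component size distribution` line in the output; a size-1 component is a node with no edges. With networkx, `nx.connected_components(G)` yields the same components. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def component_size_distribution(nodes, edges):
    """Return Counter{component size: number of components} via iterative DFS."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    sizes = Counter()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        sizes[size] += 1
    return sizes


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]  # "f" stays isolated
sizes = component_size_distribution(nodes, edges)
```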

In [22]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 70847
Component size distribution: Counter({1: 70606, 4: 62, 6: 25, 8: 18, 2: 17, 9: 16, 5: 15, 10: 13, 7: 11, 3: 10, 11: 6, 14: 5, 12: 5, 18: 4, 16: 3, 13: 3, 19: 3, 15: 3, 23: 2, 39: 2, 197434: 1, 26: 1, 42: 1, 36: 1, 80: 1, 67: 1, 142: 1, 21: 1, 74: 1, 40: 1, 17: 1, 66: 1, 48: 1, 31: 1, 22: 1, 20: 1, 89: 1, 33: 1})

Node types (ntype):
 - postnl: 23
 - distribution: 1
 - line_intersection: 5957
 - waypoint: 22258
 - grid_point: 110968
 - polygon_boundary: 72029
 - poly_intersection: 59271

Edge types (etype):
 - postnl_connector: 22
 - Tracks and rural access roads: 13858
 - Regional roads: 7537
 - connector: 13437
 - Pedestrian and cycling paths: 7263
 - Motorways and major roads: 481
 - Rivers, canals and streams: 141
 - Meadows and open grass: 99165
 - Railways: 413
 - Residential areas: 17463
 - Industrial zones: 2354
 - Power lines: 393
 - Cultural sites: 25
 - Agricultural lands: 160488
 - Living and residential stre

In [23]:
G_final = fill_missing_edge_heights_from_csv(G_final)


Height map loaded from CSV: {'Motorways and major roads': 30.0, 'Regional roads': 30.0, 'Tracks and rural access roads': 30.0, 'Living and residential streets': 30.0, 'Pedestrian and cycling paths': 30.0, 'Railways': 30.0, 'Power lines': 60.0, 'Power plants': 60.0, 'Communication towers': 60.0, 'High infrastructures': 60.0, 'Industrial zones': 60.0, 'Commercial zones': 60.0, 'Retail zones': 60.0, 'Residential areas': 60.0, 'Recreational zones': 30.0, 'Agricultural lands': 30.0, 'Forests and woodlands': 30.0, 'Meadows and open grass': 30.0, 'Rivers, canals and streams': 30.0, 'Lakes and ponds': 30.0, 'Water reservoirs': 30.0, 'Wetlands': 30.0, 'Schools and universities': 30.0, 'Hospitals': 60.0, 'Prisons': 60.0, 'Religious sites': 30.0, 'Cultural sites': 30.0, 'Cemeteries': 30.0, 'Parks': 30.0, 'connector': 30.0}
Patched edge ((45566.436, 388448.955) → POINT (45614.94534030524 388409.6432075595)) with height=30.0 from etype='Regional roads'
Patched edge ((50336.978, 387943.268) → POINT 
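The patch log above suggests that heights are looked up per edge type (`etype`) in a table loaded from CSV. A sketch of that idea with the standard `csv` module follows. The column names `etype` and `height`, the edge-attribute names, and the function names are assumptions, not the actual CSV layout used by `fill_missing_edge_heights_from_csv`.

```python
import csv
import io


def load_height_map(csv_text):
    """Read `etype,height` rows into a dict (column names are assumptions)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["etype"]: float(row["height"]) for row in reader}


def fill_missing_heights_sketch(edges, height_map):
    """Set `height` on edges that lack one, using their `etype`; return count patched."""
    patched = 0
    for attrs in edges.values():
        if attrs.get("height") is None and attrs.get("etype") in height_map:
            attrs["height"] = height_map[attrs["etype"]]
            patched += 1
    return patched


HEIGHTS_CSV = "etype,height\nRegional roads,30\nPower lines,60\n"
height_map = load_height_map(HEIGHTS_CSV)
edges = {
    ("a", "b"): {"etype": "Regional roads"},                 # missing: patched
    ("b", "c"): {"etype": "Power lines", "height": 60.0},    # already set
    ("c", "d"): {"etype": "Unknown type"},                   # no entry: left alone
}
patched = fill_missing_heights_sketch(edges, height_map)
```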

In [24]:
diagnose_graph(G_final)

GRAPH DIAGNOSTICS

Graph is NOT fully connected. Components: 70847
Component size distribution: Counter({1: 70606, 4: 62, 6: 25, 8: 18, 2: 17, 9: 16, 5: 15, 10: 13, 7: 11, 3: 10, 11: 6, 14: 5, 12: 5, 18: 4, 16: 3, 13: 3, 19: 3, 15: 3, 23: 2, 39: 2, 197434: 1, 26: 1, 42: 1, 36: 1, 80: 1, 67: 1, 142: 1, 21: 1, 74: 1, 40: 1, 17: 1, 66: 1, 48: 1, 31: 1, 22: 1, 20: 1, 89: 1, 33: 1})

Node types (ntype):
 - postnl: 23
 - distribution: 1
 - line_intersection: 5957
 - waypoint: 22258
 - grid_point: 110968
 - polygon_boundary: 72029
 - poly_intersection: 59271

Edge types (etype):
 - postnl_connector: 22
 - Tracks and rural access roads: 13858
 - Regional roads: 7537
 - connector: 13437
 - Pedestrian and cycling paths: 7263
 - Motorways and major roads: 481
 - Rivers, canals and streams: 141
 - Meadows and open grass: 99165
 - Railways: 413
 - Residential areas: 17463
 - Industrial zones: 2354
 - Power lines: 393
 - Cultural sites: 25
 - Agricultural lands: 160488
 - Living and residential stre

In [25]:
with open(f"output/{city}.pkl", "wb") as f:
    pickle.dump(G_final, f)

print(f"Graph saved to output/{city}.pkl")

Graph saved to output/borsele.pkl
