This notebook demonstrates how to use the connect_poi function to connect points of interest (such as houses) to a road network, so that they can be used in more precise network distance calculations.

In [2]:
import pandas as pd
import networkx as nx
import geopandas as gpd
from shapely import wkt
import sys

# make the local toolbox_JR module (which provides connect_poi) importable
sys.path.insert(0, 'C:\\Users\\z3258367\\OneDrive - UNSW\\#PhD\\Walkability\\Other Cities\\Open-Walk-Index')

from toolbox_JR import connect_poi

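Get the POIs and building footprints, convert them to a projected CRS (EPSG:7856, in metres), and take the centroid of each polygonal POI (building footprints in this case), so that every building is represented by a single point.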

In [3]:
#%% # get POIs; these must not be multipoints for connect_poi to work
folder = "C:\\Users\\z3258367\\OneDrive - UNSW\\#PhD\\Walkability\\Other Cities\\Melbourne Data\\"
poi_gdf = gpd.read_file(folder + "VicMap Features of Interest\\FOI_POINT.shp")
buildings = gpd.read_file(folder + "melbourne_bf.shp")

In [4]:
poi_gdf = poi_gdf.to_crs("EPSG:7856")
buildings_ctrs = buildings.to_crs("EPSG:7856")
buildings_ctrs.geometry = buildings_ctrs.geometry.centroid
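GeoPandas computes these centroids directly. For intuition only, the sketch below shows the area-weighted centroid of a simple polygon using the shoelace formula, in plain Python; polygon_centroid is a hypothetical helper, not part of the notebook or of GeoPandas. It assumes planar coordinates (such as EPSG:7856 metres), which is why the footprints are reprojected before their centroids are taken.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as a list of (x, y) vertices.

    Hypothetical helper for illustration (the notebook uses GeoSeries.centroid).
    Uses the shoelace formula; the ring may be open or closed, and its
    coordinates are assumed to be planar (e.g. metres in a projected CRS).
    """
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the repeated closing vertex if present
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon (zero area)")
    # centroid = sum / (6 * A) with A = area2 / 2, i.e. sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)
```

For a concave footprint (for example an L-shaped building) the centroid can fall outside the polygon; GeoSeries.representative_point() returns a point guaranteed to lie within it, if that matters for snapping.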

The connect_poi function does not work with multipart/multipoint geometries, so explode them into single-part geometries just in case, then give each feature a new sequential ID (key for buildings, poi_id for POIs).

In [6]:
Greater_Melbourne = gpd.read_file(folder + "Greater_Melbourne_GCCSA_2016.shp").to_crs("EPSG:7856")

buildings_ex = buildings_ctrs.explode(column=None, ignore_index=True, index_parts=False)
buildings_ex.reset_index(drop=True, inplace=True)
buildings_ex['key'] = buildings_ex.index

poi_ex = gpd.clip(poi_gdf, Greater_Melbourne).explode(column=None, ignore_index=True, index_parts=False)
poi_ex.reset_index(drop=True, inplace=True)
poi_ex['poi_id'] = poi_ex.index

Load the OSM park vertices and process them in the same way. The clip uses the Greater_Melbourne boundary, so these cells must run after the one above.

In [219]:
parks_gdf = gpd.read_file("C:\\Users\\z3258367\\OneDrive - UNSW\\#PhD\\Walkability\\Other Cities"
                          "\\Shared Aus Data\\OSM parks vertices.gpkg")

parks_gdf = parks_gdf.to_crs("EPSG:7856")
parks_gdf = gpd.clip(parks_gdf, Greater_Melbourne)

In [220]:
parks_ex = parks_gdf.explode(column=None, ignore_index=True, index_parts=False)
parks_ex.reset_index(drop=True, inplace=True)
parks_ex['poi_id'] = parks_ex.index
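A plain-Python sketch of what the explode-and-reindex step does. Features are represented here as hypothetical dicts with a 'parts' list rather than GeoDataFrame rows: each part becomes its own row carrying its parent's attributes, and IDs are then reassigned so they run from 0 to n-1 without gaps.

```python
def explode_features(features):
    """Split multipart features into single-part rows and assign fresh sequential ids.

    Hypothetical stand-in for GeoDataFrame.explode(ignore_index=True) followed by
    df['poi_id'] = df.index; each feature is a dict whose 'parts' entry lists
    its single-part geometries and whose other entries are attributes.
    """
    rows = []
    for feature in features:
        attrs = {k: v for k, v in feature.items() if k != "parts"}
        for part in feature["parts"]:
            rows.append({**attrs, "geometry": part})  # copy attributes to every part
    for new_id, row in enumerate(rows):
        row["poi_id"] = new_id  # gap-free ids, like the reset index above
    return rows
```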

Read the network into a NetworkX graph, then convert its edges and nodes to GeoDataFrames.
For Melbourne I have used the OSM network with every fclass kept except motorway and motorway_link.
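networkx.read_shp builds a directed graph whose nodes are (x, y) coordinate tuples, which is why the node table below holds coordinate tuples in column 0. The sketch below imitates that behaviour in plain Python for illustration only (lines_to_edges is a hypothetical helper): lines that share an end coordinate share a node, and with simplify=False every vertex becomes a node and every segment an edge.

```python
def lines_to_edges(lines, simplify=False):
    """Turn polylines into a set of coordinate-keyed nodes and a directed edge list.

    Hypothetical helper loosely mirroring networkx.read_shp: nodes are (x, y)
    tuples and edges follow digitising order. With simplify=False each pair of
    consecutive vertices becomes an edge; with simplify=True only the first and
    last vertices of each line are kept.
    """
    nodes, edges = set(), []
    for coords in lines:
        pts = [coords[0], coords[-1]] if simplify else coords
        for a, b in zip(pts, pts[1:]):
            nodes.update((a, b))  # identical coordinates collapse into one node
            edges.append((a, b))
    return nodes, edges
```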

In [63]:
# nx.read_shp was deprecated in NetworkX 2.6 and removed in 3.0, so this cell needs
# networkx < 3.0 and the GDAL/OGR Python bindings (osgeo)
GS = nx.read_shp(folder + "Melbourne Ped Network clean 1.gpkg",
                 simplify=False, geom_attrs=True)

# edges: the 'Wkt' attribute added by geom_attrs=True holds each edge's geometry
edges_df = nx.to_pandas_edgelist(GS)
edges_gdf = gpd.GeoDataFrame(edges_df, geometry=gpd.GeoSeries.from_wkt(edges_df['Wkt']))
edges_gdf = edges_gdf.set_crs("EPSG:7856", allow_override=True)

# nodes: read_shp keys nodes by their (x, y) coordinate tuples, found in column 0
nodes_df = pd.DataFrame(GS.nodes(data=True))
xy = list(zip(*nodes_df[0]))  # [all x values, all y values]
nodes_gdf = gpd.GeoDataFrame(nodes_df, geometry=gpd.points_from_xy(xy[0], xy[1]))
nodes_gdf = nodes_gdf.set_crs("EPSG:7856", allow_override=True)

OSM road data does not necessarily include edge lengths, so compute them from the geometries (in metres, since EPSG:7856 is a projected CRS).

In [66]:
edges_gdf['length'] = edges_gdf.geometry.length
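GeoSeries.length is the planar length of each line, so it is only meaningful in a projected CRS. A plain-Python equivalent for a single polyline (polyline_length is an illustrative helper, not part of the notebook):

```python
import math


def polyline_length(coords):
    """Planar length of a polyline: the sum of its straight segment lengths.

    Illustrative helper; only meaningful for projected coordinates (metres in
    EPSG:7856). With longitude/latitude degrees the result would be meaningless.
    """
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))
```

For example, polyline_length([(0, 0), (3, 4), (3, 10)]) returns 11.0 (a 5 m segment plus a 6 m segment).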

Add a new column, connect_id, that gives each original network node an integer ID, then use it to create new 'from' and 'to' columns for the edges. The existing 'source' and 'target' columns identify nodes by the keys read_shp assigned them (their coordinate tuples). Mixing those keys with the fictional IDs needed for the new nodes created in the POI-joining process would be confusing, so connect_id provides a single, unambiguous ID for every node.

In [67]:
nodes_gdf['connect_id'] = nodes_gdf.index

# map each node's coordinate key (column 0) to its connect_id, then relabel the edge ends
node_Ids = pd.Series(nodes_gdf.connect_id.values, index=nodes_gdf[0]).to_dict()
edges_gdf['to'] = edges_gdf['target'].map(node_Ids)
edges_gdf['from'] = edges_gdf['source'].map(node_Ids)

# remove extraneous columns from nodes (0: coordinate key, 1: attribute dict) and edges
nodes_gdf = nodes_gdf.drop([0,1], axis = 1).copy()
edges_gdf = edges_gdf.drop(['source','target','Wkb','code','Json','Wkt'], axis = 1)
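The ID mapping above amounts to a dictionary lookup from each node's coordinate key to its position in the node table. A minimal sketch with plain lists (relabel_edges is a hypothetical helper); an edge end with no matching node comes back as None, just as Series.map leaves NaN for unmatched keys.

```python
def relabel_edges(node_keys, edges):
    """Replace coordinate-keyed edge ends with integer node ids.

    Hypothetical helper: node_keys lists node keys (here (x, y) tuples) in table
    order, so a key's position is its connect_id; edges are (source, target)
    key pairs. Returns the key->id dict and the (from_id, to_id) pairs, with
    None for any end whose key is not in node_keys.
    """
    key_to_id = {key: i for i, key in enumerate(node_keys)}
    relabelled = [(key_to_id.get(s), key_to_id.get(t)) for s, t in edges]
    return key_to_id, relabelled
```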

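Run connect_poi; this is a slow step. For each building centroid or POI, it finds the nearest network edge, splits that edge at the point closest to the POI (creating a new node there), and adds a new edge connecting the POI to that node. Each run takes the nodes and edges returned by the previous run, so buildings, POIs and parks all end up in the same network.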

In [None]:
#%% # run connect pois with property centroids
new_nodes, new_edges = connect_poi(buildings_ex, nodes_gdf, edges_gdf, key_col='key', path=None, meter_epsg=7856)
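The core geometric step can be sketched as follows: for each segment of a candidate edge, find the closest point by clamping the projection parameter to [0, 1], keep the nearest one, split the edge there, and add a connector edge from the POI. This is a plain-Python illustration of the idea described above, not toolbox_JR's implementation, which (per the "Building rtree..." log) also uses an R-tree to find candidate edges quickly; project_and_split is a hypothetical name.

```python
import math


def project_and_split(point, line):
    """Project `point` onto polyline `line` (a list of (x, y)) and split the line there.

    Hypothetical sketch, not connect_poi's code. Returns
    (projected_point, first_part, second_part, connector), where connector is
    the new edge from the POI to the projected node.
    """
    px, py = point
    best = None  # (distance, segment index, projected point)
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(line, line[1:])):
        dx, dy = x1 - x0, y1 - y0
        seg_len2 = dx * dx + dy * dy
        # position t of the closest point along the segment, clamped to [0, 1]
        if seg_len2 == 0:
            t = 0.0
        else:
            t = ((px - x0) * dx + (py - y0) * dy) / seg_len2
            t = max(0.0, min(1.0, t))
        proj = (x0 + t * dx, y0 + t * dy)
        d = math.dist(point, proj)
        if best is None or d < best[0]:
            best = (d, i, proj)
    _, i, proj = best
    # split at the projection, without repeating a vertex it coincides with
    first = line[: i + 1] + ([proj] if proj != line[i] else [])
    second = ([proj] if proj != line[i + 1] else []) + line[i + 1 :]
    return proj, first, second, [point, proj]
```

If the projection lands exactly on the edge's first or last vertex, one part degenerates to a single point; a full tool would reuse the existing node in that case instead of splitting.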

In [216]:
#%% # run connect pois with POIs
# , prefix=8990000000
new_nodes_2, new_edges_2 = connect_poi(poi_ex, new_nodes, new_edges, key_col='poi_id', path=None, meter_epsg=7856)

Building rtree...
Updating external nodes...
Projecting POIs to the network...
Updating internal nodes...
Updating internal edges...
Missing 'to' nodes: 28
Updating external links...
Missing 'to' nodes: 0
Remove faulty projections: 81/17171 (0.47%)
NOTE: duplication in node coordinates keys
Nodes count: 4966801
Node coordinates key count: 4638224
Missing 'from' nodes: 1881
Missing 'to' nodes: 1776


In [221]:
#%% # run connect pois with parks
new_nodes_3, new_edges_3 = connect_poi(parks_ex, new_nodes_2, new_edges_2, key_col='poi_id', path=None, meter_epsg=7856)

Building rtree...
Updating external nodes...
Projecting POIs to the network...
Updating internal nodes...
Updating internal edges...
Missing 'to' nodes: 342
Updating external links...
Missing 'to' nodes: 0
Remove faulty projections: 809/214154 (0.38%)
NOTE: duplication in node coordinates keys
Nodes count: 5395109
Node coordinates key count: 4974021
Missing 'from' nodes: 2175
Missing 'to' nodes: 2065


In [222]:
# rounding issues sometimes leave two copies of the same node, so round x and y to 5 decimal places and drop the duplicates
d_nodes = new_nodes_3.round({'x':5, 'y':5}).drop_duplicates(subset=['x', 'y'], ignore_index=True)
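A plain-Python sketch of this rounding-based de-duplication (dedupe_nodes is a hypothetical helper): nodes are keyed by their coordinates rounded to 5 decimal places (0.01 mm in metre coordinates), and the first node at each key is kept, as drop_duplicates does by default. Because some node IDs disappear, the edges then have to be re-matched to the surviving nodes.

```python
def dedupe_nodes(nodes, ndigits=5):
    """Keep one node per rounded (x, y) location.

    Hypothetical helper: nodes is a list of (node_id, x, y). Coordinates are
    rounded to `ndigits` decimals, the first node at each rounded location is
    kept, and the rounded coordinates are returned, like round() followed by
    drop_duplicates(keep='first').
    """
    seen = set()
    kept = []
    for node_id, x, y in nodes:
        key = (round(x, ndigits), round(y, ndigits))
        if key not in seen:
            seen.add(key)
            kept.append((node_id, *key))
    return kept
```

Rounding can still separate two almost identical points that straddle a rounding boundary (for example x = 1.000004999 and x = 1.000005001), so it removes most, but not necessarily all, near-duplicates.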

In [223]:
# I'm not sure why this is still necessary, as I tried to fix it in the toolbox, but for now it is.
#%% # re-match the edges to the nodes, as some edge end points have a different precision from the node coordinates
def rounded_coords(geom):
    # round-trip through WKT to round every coordinate to 4 decimal places
    return list(wkt.loads(wkt.dumps(geom, rounding_precision=4)).coords)

nodes_coord = d_nodes['geometry'].map(lambda g: rounded_coords(g)[0])
nodes_id_dict = dict(zip(nodes_coord, d_nodes['connect_id'].astype('int64')))
matched_edges = new_edges_3.copy()

# look up the node IDs of each edge's first and last vertex; unmatched ends become <NA>
matched_edges['from'] = matched_edges['geometry'].map(lambda g: nodes_id_dict.get(rounded_coords(g)[0])).astype('Int64')
matched_edges['to'] = matched_edges['geometry'].map(lambda g: nodes_id_dict.get(rounded_coords(g)[-1])).astype('Int64')

dropped_edges = matched_edges.dropna(subset=['from','to'])

dropped_edges = dropped_edges.drop_duplicates(subset=['from','to'])
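The re-matching logic in plain Python (rematch_edges is a hypothetical helper, and Python's round() stands in for the WKT round-trip used above): build a lookup from rounded node coordinates to node IDs, then look up each edge's first and last vertex. Edges with an unmatched end, and repeated (from, to) pairs, are dropped, mirroring the dropna and drop_duplicates calls above.

```python
def rematch_edges(nodes, edges, ndigits=4):
    """Re-derive each edge's (from, to) node ids from its end coordinates.

    Hypothetical helper: nodes is a list of (node_id, x, y); edges is a list of
    coordinate lists. Both sides are rounded to `ndigits` decimals so tiny
    precision differences still match. Returns (from_id, to_id, coords) triples,
    skipping edges with an unmatched end and repeated (from, to) pairs.
    """
    def key(x, y):
        return (round(x, ndigits), round(y, ndigits))

    lookup = {key(x, y): node_id for node_id, x, y in nodes}
    matched, seen = [], set()
    for coords in edges:
        u = lookup.get(key(*coords[0]))
        v = lookup.get(key(*coords[-1]))
        if u is None or v is None or (u, v) in seen:
            continue  # like dropna(subset=['from','to']) and drop_duplicates
        seen.add((u, v))
        matched.append((u, v, coords))
    return matched
```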

This next cell is necessary; otherwise pandana may eventually raise errors (see https://github.com/UDST/pandana/issues/88). Every edge must reference nodes that exist in the node table.

In [224]:
dropped_edges = dropped_edges[dropped_edges['to'].isin(d_nodes['connect_id']) & dropped_edges['from'].isin(d_nodes['connect_id'])]
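Before exporting, it can be useful to know how many edges this integrity filter removes. A plain-Python sketch (split_edges_by_validity is a hypothetical helper) that separates valid edges from dangling ones, so the dangling count can be reported:

```python
def split_edges_by_validity(node_ids, edges):
    """Separate edges whose two endpoints exist from edges with a missing endpoint.

    Hypothetical helper: node_ids is an iterable of existing node ids and edges
    is a list of (from_id, to_id) pairs. Returns (valid, dangling); only the
    valid edges should be handed on to pandana.
    """
    existing = set(node_ids)
    valid, dangling = [], []
    for u, v in edges:
        (valid if u in existing and v in existing else dangling).append((u, v))
    return valid, dangling
```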

### Export new nodes & edges

In [225]:
d_nodes.to_csv("melbourne_nodes.csv")
dropped_edges.to_csv("melbourne_edges.csv")