```{index} single: application; shortest path
```
```{index} networkx
```
```{index} network optimization
```
```{index} pandas dataframe
```

# Extra material: Shortest path in real life


In [1]:
import subprocess
import sys

class ColabInstaller():

    def __init__(self):
        reqs = subprocess.check_output([sys.executable, '-m', 'pip', 'freeze'])
        self.installed_packages = [r.decode().split('==')[0] for r in reqs.split()]

    def on_colab(self):
        return "google.colab" in sys.modules

    def install(self, package):
        if self.on_colab():
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])
            self.installed_packages.append(package)

    def upgrade(self, package):
        if self.on_colab():
            subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "--upgrade", package])

colab = ColabInstaller()
colab.upgrade('numpy')
colab.upgrade('matplotlib')
colab.install('geopandas')
colab.install('geopy')
colab.install('osmnx')
colab.install('osmnet')
colab.install('pandana')
colab.upgrade('geopy')

# Introduction

With [Google Maps](https://www.google.com/maps), Google brought the world to our screens, including accurate geocoding and routing for several travel modes. For the most part, [Maps](https://www.google.com/maps) is used interactively. As data and analytics professionals, we often need programmatic access to the services that [Maps](https://www.google.com/maps) offers, preferably for free. Google also provides a plethora of [development support](https://developers.google.com/), but unfortunately most of it is paid. That is even more true for the [Maps APIs](https://developers.google.com/maps/documentation).

## Some background information and history 
[Geoff Boeing](https://geoffboeing.com/about/) is a true leader in demystifying urban data analytics, with a strong emphasis on street networks. His [peer-reviewed publications](https://geoffboeing.com/publications/) are open and accompanied by usable demonstrations built on his own [OSMnx](https://geoffboeing.com/2018/03/osmnx-features-roundup/) package.
Professor [Peter Sanders](https://algo2.iti.kit.edu/english/sanders.php) (see also his [Wikipedia](https://en.wikipedia.org/wiki/Peter_Sanders_(computer_scientist)) page) has since moved on to other research areas, but his [route planning](http://algo2.iti.kit.edu/routeplanning.php) project shaped the world of truly scalable road routing algorithms.
Among his alumni, I single out two people:
 * [Dominik Schultes](http://algo2.iti.kit.edu/schultes/), who won the [DIMACS challenge on shortest paths](http://www.diag.uniroma1.it//challenge9/data/tiger/) and made it to the [Scientific American top 50](https://www.scientificamerican.com/article/sciam-50-the-fastest-way/). Before Dominik’s research, scalable shortest-path methods for large national road networks were heuristics; now they are exact and can be computed at world scale.
 * [Dennis Luxen](http://algo2.iti.kit.edu/english/luxen.php), for creating [OSRM](https://github.com/Project-OSRM/osrm-backend), which offers a free, scalable implementation of [contraction hierarchies](https://en.wikipedia.org/wiki/Contraction_hierarchies).
 
Finally, I mention [Fletcher Foti](https://fletcherfoti.weebly.com/), who gave us [pandana](http://udst.github.io/pandana/).
 


## Geocoding

The world is mapped with the [geographic coordinate system](https://en.wikipedia.org/wiki/Geographic_coordinate_system), but few of us remember places by their [latitudes](https://en.wikipedia.org/wiki/Latitude) and [longitudes](https://en.wikipedia.org/wiki/Longitude).

We learn and remember the world better through addresses. Geocoding translates an address into coordinates.


In [2]:
import geopy
    
def FreeLocator():
    return geopy.Photon(user_agent='myGeocoder')

Amsterdam = FreeLocator().geocode('Amsterdam,NL')

# Visualization

In [3]:
import folium
Map = folium.Map(location=(Amsterdam.latitude,Amsterdam.longitude), zoom_start=12)
Map

In [4]:
def locate_geopy(description):
    location = FreeLocator().geocode(description)
    if location is not None:
        return location.latitude, location.longitude
    return None, None

In [5]:
import pandas as pd
pd.options.display.float_format = '{:.6f}'.format

data = {'address': [ 'Centraal Station',
                     'Amsterdam Business School',
                     'Artis',
                     'Arena',
                     'Ziggo Dome' ], 
        'color'  : [ 'blue',
                     'black',                   
                     'green',
                     'red',
                     'purple' ]}
# Create DataFrame.
df = pd.DataFrame(data)
df['city']    = 'Amsterdam'
df['country'] = 'NL'
df

,address,color,city,country
0,Centraal Station,blue,Amsterdam,NL
1,Amsterdam Business School,black,Amsterdam,NL
2,Artis,green,Amsterdam,NL
3,Arena,red,Amsterdam,NL
4,Ziggo Dome,purple,Amsterdam,NL


In [6]:
locations = [ locate_geopy(','.join(row[['address','city','country']])) for _, row in df.iterrows() ]
df['lat'] = [ loc[0] for loc in locations ]
df['lon'] = [ loc[1] for loc in locations ]
df

,address,color,city,country,lat,lon
0,Centraal Station,blue,Amsterdam,NL,52.378901,4.900581
1,Amsterdam Business School,black,Amsterdam,NL,52.365107,4.911718
2,Artis,green,Amsterdam,NL,-30.401207,-56.481521
3,Arena,red,Amsterdam,NL,52.31599,4.942931
4,Ziggo Dome,purple,Amsterdam,NL,52.313629,4.938207
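
The coordinates returned for Artis in the table above (latitude -30.40, longitude -56.48) lie far outside Amsterdam, so a free geocoder's answers are worth checking before they are used for routing. The following is a minimal sketch of such a check, not part of the original notebook: the helper `outside_bbox` and the bounding box values are assumptions chosen for illustration.

```python
# Approximate bounding box around Amsterdam.
# These limits are assumed values for illustration; adjust them as needed.
AMSTERDAM_BBOX = {"lat_min": 52.25, "lat_max": 52.45, "lon_min": 4.70, "lon_max": 5.10}


def outside_bbox(places, bbox=AMSTERDAM_BBOX):
    """Return the names of places whose coordinates are missing or outside bbox.

    `places` is an iterable of (name, lat, lon) tuples.
    """
    suspicious = []
    for name, lat, lon in places:
        if lat is None or lon is None:
            suspicious.append(name)
        elif not (bbox["lat_min"] <= lat <= bbox["lat_max"]
                  and bbox["lon_min"] <= lon <= bbox["lon_max"]):
            # NaN coordinates also end up here, because comparisons with NaN are False.
            suspicious.append(name)
    return suspicious


# Coordinates copied from the table above.
geocoded = [
    ("Centraal Station", 52.378901, 4.900581),
    ("Amsterdam Business School", 52.365107, 4.911718),
    ("Artis", -30.401207, -56.481521),
    ("Arena", 52.31599, 4.942931),
    ("Ziggo Dome", 52.313629, 4.938207),
]
print(outside_bbox(geocoded))
```

With the notebook's DataFrame, the same check could be run as `outside_bbox(df[['address', 'lat', 'lon']].itertuples(index=False))`.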


In [7]:
for _, row in df.iterrows():
    folium.Marker((row.lat,row.lon),icon=folium.Icon(color=row.color), tooltip=row.address).add_to(Map)
Map

In [8]:
import osmnx as ox
import networkx as nx
from IPython.display import display

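# ox.config() is deprecated in later OSMnx 1.x releases; set the options on ox.settings instead.
ox.settings.log_console = True
ox.settings.use_cache = True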



In [9]:
%%time 
G_walk = ox.graph_from_place('Amsterdam, NL', network_type='walk')

CPU times: user 1min, sys: 1.12 s, total: 1min 1s
Wall time: 1min 2s


In [10]:
print( G_walk.number_of_nodes(), G_walk.number_of_edges() )

37079 104568


In [11]:
df['osmnx'] = ox.distance.nearest_nodes(G_walk,df.lon,df.lat)
df

,address,color,city,country,lat,lon,osmnx
0,Centraal Station,blue,Amsterdam,NL,52.378901,4.900581,5629072001
1,Amsterdam Business School,black,Amsterdam,NL,52.365107,4.911718,46356661
2,Artis,green,Amsterdam,NL,-30.401207,-56.481521,1768194163
3,Arena,red,Amsterdam,NL,52.31599,4.942931,4622542635
4,Ziggo Dome,purple,Amsterdam,NL,52.313629,4.938207,1925143759
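
Note that `ox.distance.nearest_nodes` takes the x coordinates (longitude) before the y coordinates (latitude), which is why `df.lon` precedes `df.lat` in the call above. Conceptually, the lookup snaps each point to the graph node at the smallest distance. The sketch below illustrates that idea with a brute-force search over great-circle (haversine) distances; it is not how OSMnx implements it (libraries typically use a spatial index), and the helper names are hypothetical. The three node coordinates are taken from this notebook's node table.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_node(nodes, lat, lon):
    """Return the id of the node closest to (lat, lon).

    `nodes` maps node id -> (x, y) = (longitude, latitude), as in the graph.
    """
    return min(nodes, key=lambda n: haversine_m(lat, lon, nodes[n][1], nodes[n][0]))


# A tiny subset of nodes, stored as (x, y) = (lon, lat).
nodes = {
    6316199: (4.888396, 52.370173),
    25596455: (4.923563, 52.36484),
    25596477: (4.906097, 52.367),
}

# Geocoded position of the Amsterdam Business School from the table above.
print(nearest_node(nodes, lat=52.365107, lon=4.911718))
```

Brute force costs one distance computation per node and per query point, which is fine for a handful of addresses but motivates a spatial index for larger inputs.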


In [12]:
%time route = nx.shortest_path(G_walk,df.iloc[0].osmnx,df.iloc[1].osmnx,weight='length')
print(route)

CPU times: user 29.3 ms, sys: 4.09 ms, total: 33.3 ms
Wall time: 33.7 ms
[5629072001, 5629072000, 5629071975, 4239313191, 4239313081, 4239313075, 4239312638, 4239312409, 9683540350, 4239311933, 4239310797, 3189915006, 3175727789, 1263352467, 9807404628, 9959600367, 9283437392, 9982135036, 9982135039, 9982135032, 9982135049, 9982118415, 9971788071, 9807403578, 46411146, 46410382, 8238495242, 46409003, 46405684, 46402187, 5933448325, 4959039686, 9407636386, 1291587477, 9913510572, 8223730972, 1291587576, 8223730971, 8223752620, 46383339, 46380741, 46379359, 46374887, 46374082, 3781170573, 7995426344, 3781170150, 3781170147, 3781170139, 2021897089, 2021897044, 7223158294, 7223158307, 2021897032, 7223210075, 7223210071, 1291468055, 3160068713, 3160068154, 46356661]
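
When a `weight` is given, `nx.shortest_path` uses Dijkstra's algorithm by default. To make clear what is being computed, here is a minimal sketch of Dijkstra on a toy graph stored as a dictionary of dictionaries. The function `dijkstra_path` and the toy graph are illustrative assumptions, not the networkx implementation.

```python
import heapq


def dijkstra_path(graph, source, target):
    """Shortest path on graph = {node: {neighbor: length}}.

    Returns (total_length, [nodes]) or (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    done = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in done:
            continue  # stale queue entry, u was already settled
        done.add(u)
        if u == target:
            break  # the distance to target is final once it is settled
        for v, length in graph.get(u, {}).items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(queue, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Toy street network with edge lengths in metres (hypothetical values).
toy = {
    "A": {"B": 40.0, "C": 100.0},
    "B": {"A": 40.0, "C": 30.0, "D": 120.0},
    "C": {"A": 100.0, "B": 30.0, "D": 60.0},
    "D": {"B": 120.0, "C": 60.0},
}
length, path = dijkstra_path(toy, "A", "D")
print(length, path)  # 130.0 ['A', 'B', 'C', 'D']
```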


In [13]:
route_map = ox.plot_route_folium(G_walk, route)
display(route_map)

In [14]:
%%time 
nodes = pd.DataFrame.from_dict(dict(G_walk.nodes(data=True)), orient='index')

CPU times: user 144 ms, sys: 14 ms, total: 158 ms
Wall time: 203 ms


In [15]:
%%time 
edges = nx.to_pandas_edgelist(G_walk)

CPU times: user 4.16 s, sys: 55.8 ms, total: 4.22 s
Wall time: 4.25 s


In [16]:
nodes.street_count.describe()

count   37079.000000
mean        2.821840
std         0.994714
min         1.000000
25%         3.000000
50%         3.000000
75%         3.000000
max         7.000000
Name: street_count, dtype: float64

In [17]:
edges.length.describe()

count   104568.000000
mean        52.067069
std         66.243924
min          0.048000
25%         14.511750
50%         32.674000
75%         65.676500
max       1550.813000
Name: length, dtype: float64

In [18]:
edges.loc[edges.length.idxmax()]

source                                              2632720004
target                                              4340833704
width                                                      NaN
service                                                    NaN
junction                                                   NaN
est_width                                                  NaN
ref                                                        NaN
highway                                               tertiary
maxspeed                                                   NaN
tunnel                                                     NaN
geometry     LINESTRING (5.0484437 52.4098672, 5.0492122 52...
area                                                       NaN
bridge                                                     NaN
osmid                                                  7039463
length                                             1550.813000
name                                             Uitdam

In [19]:
%time longest = nx.shortest_path(G_walk,2632720004,46544942,weight='length')
print(longest)

CPU times: user 1.71 ms, sys: 70 µs, total: 1.78 ms
Wall time: 2.61 ms
[2632720004, 4868500506, 46526747, 3646762450, 46515220, 7896482872, 7896482772, 7896482536, 2876000370, 2876030674, 46525998, 2875982072, 2876049179, 2876046111, 2834850861, 3380450298, 3380450325, 3380450323, 1970783754, 1970783822, 1970783783, 1970783792, 1970789262, 1970785372, 46546263, 3478950226, 2869259996, 2876043898, 46544942]


In [20]:
route_map = ox.plot_route_folium(G_walk, longest)
display(route_map)

# Dijkstra on steroids for road networks

In [21]:
%%time
import pandana
network = pandana.Network(nodes['x'], nodes['y'], edges['source'], edges['target'], edges[['length']],twoway=True)

CPU times: user 2.75 s, sys: 44.6 ms, total: 2.8 s
Wall time: 2.98 s


In [22]:
network.nodes_df.head()

,x,y
6316199,4.888396,52.370173
25596455,4.923563,52.36484
25596477,4.906097,52.367
25645989,4.925075,52.365727
25658579,4.930438,52.364544


In [23]:
network.edges_df.head()

,from,to,length
0,6316199,46379627,42.497
1,6316199,46389218,225.577
2,6316199,391355271,62.907
3,25596455,8383889398,1.791
4,25596455,46356773,41.7


In [24]:
df['pandana'] = network.get_node_ids(df.lon, df.lat).values
df

,address,color,city,country,lat,lon,osmnx,pandana
0,Centraal Station,blue,Amsterdam,NL,52.378901,4.900581,5629072001,5629071974
1,Amsterdam Business School,black,Amsterdam,NL,52.365107,4.911718,46356661,46356661
2,Artis,green,Amsterdam,NL,-30.401207,-56.481521,1768194163,1768194163
3,Arena,red,Amsterdam,NL,52.31599,4.942931,4622542635,2928936658
4,Ziggo Dome,purple,Amsterdam,NL,52.313629,4.938207,1925143759,4622542635


In [25]:
%time path_pandana = network.shortest_path(df.iloc[2].pandana, df.iloc[3].pandana)

CPU times: user 10.6 ms, sys: 5.56 ms, total: 16.1 ms
Wall time: 15.3 ms


In [26]:
%time path_nx = nx.shortest_path(G_walk,df.iloc[2].osmnx,df.iloc[3].osmnx,weight='length')

CPU times: user 268 ms, sys: 17.9 ms, total: 286 ms
Wall time: 291 ms


In [27]:
A = set(path_pandana)
B = set(path_nx)
(A | B) - (A & B)

{46232158,
 46237572,
 46240581,
 46241169,
 46241205,
 46242937,
 46242940,
 46245940,
 46247806,
 46255649,
 46260704,
 46264996,
 46265577,
 46265994,
 46266419,
 46267226,
 46267229,
 46274385,
 46274670,
 46275359,
 46275510,
 46278192,
 46278488,
 46280478,
 46280842,
 46281091,
 46281209,
 46281430,
 46281524,
 46281920,
 46283239,
 46283971,
 46284393,
 46284949,
 46285251,
 46285451,
 46288061,
 46291556,
 254477712,
 262890789,
 497304704,
 497304720,
 878805275,
 878806012,
 878806213,
 878808875,
 1101933078,
 1259406528,
 1545015322,
 1548111106,
 1688255243,
 1725810217,
 1725960722,
 1925156931,
 1925156934,
 2801224787,
 2928936658,
 3132312675,
 4382189126,
 4382189139,
 4538188789,
 4622542635,
 4684390284,
 4684390320,
 5133014954,
 5323513026,
 5394045965,
 5394053135,
 5438332847,
 5438332850,
 6614935661,
 6653983649,
 6654004829,
 6879899205,
 6879899216,
 6892142909,
 6892143052,
 6958932853,
 6958932854,
 6958932858,
 7124847108,
 7434739098,
 7434739102,
 8492
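
A non-empty symmetric difference does not by itself mean that one of the two routes is wrong. In the table above, pandana and OSMnx snapped Arena to different nodes (2928936658 versus 4622542635), so the two routes do not even share an endpoint, and two different routes can also have the same total length. Comparing total lengths is therefore more informative than comparing node sets. The sketch below shows the idea on hypothetical toy data; `path_length` is an illustrative helper.

```python
def path_length(path, edge_length):
    """Sum the edge lengths along a path; edge_length maps (u, v) to metres."""
    return sum(edge_length[(u, v)] for u, v in zip(path[:-1], path[1:]))


# Hypothetical toy data: two routes from node 1 to node 4 with equal total length.
edge_length = {(1, 2): 50.0, (2, 4): 70.0, (1, 3): 80.0, (3, 4): 40.0}
path_a = [1, 2, 4]
path_b = [1, 3, 4]

# Nodes used by exactly one of the routes (same as (A | B) - (A & B)).
print(set(path_a) ^ set(path_b))
# Total lengths of both routes.
print(path_length(path_a, edge_length), path_length(path_b, edge_length))
```

For a networkx graph, `nx.path_weight(G, path, weight='length')` should give the corresponding total for a path.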

In [28]:
origs = [o for o in df.pandana for d in df.pandana]
dests = [d for o in df.pandana for d in df.pandana]
%time distances = network.shortest_path_lengths(origs, dests)

CPU times: user 14.1 ms, sys: 3.95 ms, total: 18 ms
Wall time: 16.7 ms


In [29]:
import numpy as np 

n = len(df)
pd.options.display.float_format = '{:.2f}'.format
pd.DataFrame(np.array(list(distances)).reshape(n,n),index=df.address,columns=df.address)

address,Centraal Station,Amsterdam Business School,Artis,Arena,Ziggo Dome
Centraal Station,0.0,2218.2,11501.29,6274.28,6084.35
Amsterdam Business School,2218.2,0.0,11825.68,4130.58,4537.02
Artis,11501.29,11825.68,0.0,14651.95,13954.61
Arena,6274.28,4130.58,14651.95,0.0,4127.14
Ziggo Dome,6084.35,4537.02,13954.61,4127.14,0.0
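
The matrix above relies on the order in which the origin and destination lists are built: all destinations for the first origin come first, then all destinations for the second origin, and so on. That row-major order is what makes `reshape(n,n)` put origins on rows and destinations on columns. The following sketch reproduces the same ordering with `itertools.product`; the helper names are hypothetical and only the standard library is used.

```python
from itertools import product


def od_pairs(nodes):
    """Return (origs, dests) covering every ordered pair of nodes, row by row."""
    pairs = list(product(nodes, repeat=2))
    origs = [o for o, _ in pairs]
    dests = [d for _, d in pairs]
    return origs, dests


def to_matrix(flat, n):
    """Cut a flat list of n*n values into n rows of n values."""
    return [list(flat[i * n:(i + 1) * n]) for i in range(n)]


nodes = ["a", "b", "c"]
origs, dests = od_pairs(nodes)

# Stand-in for the flat result of a batched distance query: one label per pair.
flat = [f"{o}->{d}" for o, d in zip(origs, dests)]
matrix = to_matrix(flat, len(nodes))
print(matrix[1][2])  # b->c : row = origin 'b', column = destination 'c'
```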


In [30]:
np.random.seed(2021)
n = 500
sample = np.random.choice(np.array(network.nodes_df.index.values.tolist()), n, replace=False)
origs = [o for o in sample for d in sample]
dests = [d for o in sample for d in sample]

In [31]:
%time distances = network.shortest_path_lengths(origs, dests)
%time table = pd.DataFrame(np.array(list(distances)).reshape(n,n),index=sample,columns=sample)

CPU times: user 5.23 s, sys: 65.9 ms, total: 5.3 s
Wall time: 5.33 s
CPU times: user 21.1 ms, sys: 297 µs, total: 21.4 ms
Wall time: 21.5 ms


In [32]:
departure = table.max(axis=1).idxmax()
arrival = table.loc[departure].idxmax()
%time path_pandana = network.shortest_path(departure, arrival)
%time path_nx = nx.shortest_path(G_walk,departure,arrival,weight='length')
A = set(path_pandana)
B = set(path_nx)
(A | B) - (A & B)

CPU times: user 4.6 ms, sys: 3.47 ms, total: 8.07 ms
Wall time: 7.24 ms
CPU times: user 594 ms, sys: 17.1 ms, total: 611 ms
Wall time: 617 ms


set()

In [33]:
%time paths = network.shortest_paths(origs,dests)

CPU times: user 12.9 s, sys: 549 ms, total: 13.5 s
Wall time: 13.5 s


In [34]:
sum(map(len,paths))

35377572

In [35]:
for u,v in zip(paths[1][:-1],paths[1][1:]):
    print(G_walk.get_edge_data(u,v)[0].get('name',''))








Oostoever

Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat



Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Jan Evertsenstraat
Orteliuskade
Orteliuskade
Orteliuskade
Van Middellandtstraat
Orteliusstraat
Postjesweg
Van Spilbergenstraat
Postjesweg




Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Hoofdweg
Surinameplein
Surinameplein
Surinamestraat
Surinamestraat
Surinamestraat
['Surinamestraat', 'Overtoom']
Overtoom

Overtoom
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg


Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Amstelveenseweg
Laan der Hesperiden



Stadionplein
Laan der Hesperiden
Laan der Hesperiden
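
The listing above prints one line per edge, so long streets repeat many times, unnamed edges print as empty lines, and an edge can carry a list of names (as with `['Surinamestraat', 'Overtoom']`). A compact route description can be obtained by collapsing consecutive repeats. The sketch below is one possible way to do that with `itertools.groupby`; the helper `street_sequence` is hypothetical and the input is a shortened excerpt in the style of the output above.

```python
from itertools import groupby


def street_sequence(names):
    """Collapse consecutive identical street names into (street, edge_count) pairs.

    Unnamed edges (empty strings) are dropped, so stretches of the same street
    separated only by unnamed edges are merged. List-valued names are joined.
    """
    cleaned = []
    for name in names:
        if isinstance(name, list):
            name = " / ".join(name)
        if name:
            cleaned.append(name)
    return [(street, sum(1 for _ in group)) for street, group in groupby(cleaned)]


# Shortened excerpt in the style of the edge names printed above.
names = ["", "Oostoever", "", "Jan Evertsenstraat", "Jan Evertsenstraat",
         "Surinamestraat", ["Surinamestraat", "Overtoom"], "Overtoom", "", "Overtoom"]

for street, count in street_sequence(names):
    print(f"{street} ({count} edges)")
```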

In [36]:
route_map = ox.plot_route_folium(G_walk, paths[1],color='red',map=Map)
display(route_map)