# Routing with OpenStreetMap

**by Alexander Michels**

September 16, 2022

GGIS 407

Project Assignment 2: Proposal Presentation

## Why Routing?

Routing is the process of figuring out efficient pathways between multiple points. Think Google Maps, Waze, etc.

Routing is important for practical purposes (getting to and from the places you need to visit) and research purposes (calculating costs/distances/times between locations).
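At its core, routing is a shortest-path problem on a graph whose nodes are intersections and whose edges are road segments weighted by a cost such as length or travel time. Below is a minimal sketch of Dijkstra's algorithm (the algorithm behind pgRouting's `pgr_dijkstra`) on a small, made-up directed road graph. The node names and edge lengths are hypothetical and only for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy road network: node -> list of (neighbor, cost in meters).
# Edges are directed, so a one-way street is simply an edge with no reverse.
Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Return (total cost, node path) of the cheapest route from source to target."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry; a cheaper path was already settled
        visited.add(node)
        if node == target:
            break  # the target's cost is final once it is popped
        for neighbor, cost in graph.get(node, []):
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                prev[neighbor] = node
                heapq.heappush(heap, (new_d, neighbor))
    if target not in visited:
        return float("inf"), []  # target unreachable from source
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


roads: Graph = {
    "A": [("B", 400.0), ("C", 150.0)],
    "B": [("D", 200.0)],
    "C": [("B", 100.0), ("D", 500.0)],
    "D": [],
}
print(dijkstra(roads, "A", "D"))  # (450.0, ['A', 'C', 'B', 'D'])
```

The detour through C is cheaper than the direct A to B edge, which is exactly the kind of decision a router makes millions of times on a real network.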

## Routing and Research

**What Kind of Research Questions Care about Routing?**

* Accessibility (can people obtain the goods/services they need)
* Evacuation (routing agents out of an area/away from a hazard)
* Logistics (how to optimize travel plans)
* Urban Planning/Transportation/Land Use

I have done work in this area, and I'm hoping to learn how to make it more efficient.

## Routing Data

One difficulty with routing is getting quality data. The best publicly available source is OpenStreetMap (OSM). The data isn't perfect, but it's plentiful and free!

```python
import folium
import geopandas as gpd
import shapely.wkt
%load_ext sql
```

```python
# Base map centered on Champaign-Urbana, IL
m = folium.Map(location=[40.12, -88.24], zoom_start=12)
m
```

## Routing is Hard

Routing is very computationally intensive because of the size of the data (tens to hundreds of millions of road segments). Recent work on routing between census tracts notes:

> At a national scale, this process is very slow; loading the entire North American OpenStreetMap road network into PostGIS and then routing between all relevant points takes weeks on a single computer.

> A matrix of this size and granularity would cost over half a million dollars through the Google Maps API, and the license would not allow caching this result. A single-term license of ArcGIS NetworkAnalyst costs $600 and—on a single machine—would be incapable of performing the calculation.

- James Saxon & Daniel Snow (2020). A Rational Agent Model for the Spatial Accessibility of Primary Health Care. *Annals of the American Association of Geographers*, 110(1), 205-222. DOI: 10.1080/24694452.2019.1629870

This scale makes the task a good fit for cyberinfrastructure-enabled GIS (cyberGIS), which can spread the computation beyond a single machine.
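A quick back-of-the-envelope calculation shows why the cost explodes: an origin-destination (OD) matrix between n origins and m destinations has n × m entries, and a one-to-many shortest-path search is typically run once per origin over the whole network. The location counts below are hypothetical, chosen only to show the quadratic growth of all-to-all routing.

```python
def od_matrix_size(n_origins: int, n_destinations: int) -> int:
    """Number of origin-destination pairs whose travel cost must be computed."""
    return n_origins * n_destinations


# Hypothetical counts: all-to-all routing among n locations grows quadratically.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} locations -> {od_matrix_size(n, n):>15,} pairs")
```

Each tenfold increase in locations means a hundredfold increase in pairs, before accounting for the size of the road network each search must explore.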

## How to Route?

There are a variety of tools for routing, but I'm going to explore [pgRouting](https://pgrouting.org/). pgRouting extends PostGIS/PostgreSQL (a geospatial database) with graph routing functions such as shortest-path algorithms.


<img src="https://pgrouting.org/_images/pgrouting.png">

This is the same approach used by Saxon and Snow (2020).
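In pgRouting, a shortest-path query passes an inner SQL string that returns the edge list (`id`, `source`, `target`, `cost`, and optionally `reverse_cost`) to a routing function such as `pgr_dijkstra`. The helper below is a minimal sketch that only builds the query text. It assumes the column names that osm2pgrouting creates in the `ways` table (`gid`, `source`, `target`, `cost_s`, `reverse_cost_s`); the start and end values are vertex ids from the network's vertex table, and the example ids are hypothetical.

```python
def dijkstra_sql(start_vid: int, end_vid: int, directed: bool = True) -> str:
    """Build a pgr_dijkstra query over the osm2pgrouting `ways` table.

    Assumes osm2pgrouting column names (gid, source, target, cost_s,
    reverse_cost_s); adjust them if your import differs.
    """
    # Inner edge query: pgr_dijkstra expects id, source, target, cost and,
    # for one-way handling, reverse_cost. Costs here are travel times in seconds.
    edges_sql = (
        "SELECT gid AS id, source, target, "
        "cost_s AS cost, reverse_cost_s AS reverse_cost FROM ways"
    )
    # int() guards against injecting arbitrary text through the vertex ids.
    return (
        "SELECT seq, node, edge, cost, agg_cost "
        f"FROM pgr_dijkstra('{edges_sql}', {int(start_vid)}, {int(end_vid)}, "
        f"directed => {'true' if directed else 'false'});"
    )


print(dijkstra_sql(1, 2))  # hypothetical vertex ids
```

The resulting string can be run through the notebook's existing database connection or any PostgreSQL client; each output row is one step of the route, with `agg_cost` accumulating along the path.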

## Data Pipeline

As a starting point, I'm working from a tutorial in a [FOSS4G Workshop](https://workshop.pgrouting.org/2.7/en/index.html). The pipeline looks like this:

* Download OSM extract from [Geofabrik](https://download.geofabrik.de/)
* Run the database in Docker using the [pgrouting/pgrouting](https://hub.docker.com/r/pgrouting/pgrouting) image.
* Install a few utilities (osm2pgrouting, osmctools, osmium-tool) for data cleaning/clipping/conversion.
* Connect to the database through Jupyter with ipython-sql
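The pipeline steps above can be sketched as a list of shell commands. This is a hedged sketch, not the workshop's exact script: the Geofabrik download path, the bounding box, the container name, and the mapconfig location are assumptions, and the credentials are chosen to match the connection string used below. The function only builds and prints the commands (it runs nothing). It also assumes the `postgis` and `pgrouting` extensions exist in the `routing` database (`CREATE EXTENSION pgrouting;`) before osm2pgrouting runs, in case the image does not create them.

```python
import shlex
from typing import List

# Hypothetical file names and settings (assumptions, not from the workshop).
PBF = "illinois-latest.osm.pbf"  # assumed Geofabrik extract name
CLIP = "champaign.osm"           # osm2pgrouting reads OSM XML, not PBF
BBOX = (-88.33, 40.05, -88.15, 40.17)  # hypothetical (min lon, min lat, max lon, max lat)


def pipeline_commands() -> List[List[str]]:
    """Return the pipeline steps as argument lists (not executed here)."""
    bbox = ",".join(str(v) for v in BBOX)
    return [
        # 1. Download a state extract from Geofabrik (assumed URL path).
        ["wget", f"https://download.geofabrik.de/north-america/us/{PBF}"],
        # 2. Start PostgreSQL + PostGIS + pgRouting in Docker.
        ["docker", "run", "--name", "routing-db", "-d", "-p", "5432:5432",
         "-e", "POSTGRES_USER=root", "-e", "POSTGRES_PASSWORD=cybergis",
         "-e", "POSTGRES_DB=routing", "pgrouting/pgrouting"],
        # 3. Clip to the bounding box with osmium-tool; the .osm suffix selects XML output.
        ["osmium", "extract", "-b", bbox, PBF, "-o", CLIP],
        # 4. Build the routable tables (ways, ways_vertices_pgr, ...).
        ["osm2pgrouting", "-f", CLIP,
         "-c", "/usr/share/osm2pgrouting/mapconfig.xml",  # assumed install path
         "-d", "routing", "-U", "root", "-W", "cybergis",
         "-h", "localhost", "-p", "5432", "--clean"],
    ]


for cmd in pipeline_commands():
    print(shlex.join(cmd))  # shlex.join needs Python 3.8+
```

Keeping the steps as argument lists makes it easy to hand them to `subprocess.run` later, or to swap in osmctools' `osmconvert` for the clipping/conversion step.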

## Preliminary Work and Viz

I've built the Docker container and the scripts that set up the database. We can connect to the database with:

```python
%sql postgresql://root:cybergis@localhost:5432/routing
```

and then obtain the network geometry within a bounding box with the following SQL command:

```sql
%%sql network_geom <<
SELECT ST_AsText(the_geom) AS route_readable FROM ways
WHERE the_geom && ST_MakeEnvelope(-88.23, 40.10, -88.22, 40.11, 4326);
```

```text
 * postgresql://root:***@localhost:5432/routing
3093 rows affected.
Returning data to local variable network_geom
```


...then load the geometry with shapely and convert to a GeoDataFrame...

```python
# Each row of the ipython-sql result holds one WKT string; parse it with Shapely.
geom = [shapely.wkt.loads(row[0]) for row in network_geom]
gdf = gpd.GeoDataFrame(geometry=geom, crs="EPSG:4326")
print(len(gdf))
gdf.head()
```

```text
3093
```

|   | geometry |
|---|---|
| 0 | LINESTRING (-88.22381 40.10357, -88.22348 40.1... |
| 1 | LINESTRING (-88.22381 40.10357, -88.22381 40.1... |
| 2 | LINESTRING (-88.22349 40.10253, -88.22345 40.1... |
| 3 | LINESTRING (-88.22377 40.10152, -88.22350 40.1... |
| 4 | LINESTRING (-88.22400 40.10068, -88.22293 40.1... |


...and create an interactive visualization with GeoPandas `explore()`...

```python
gdf.explore()
```
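The bounding-box query can be generalized to any envelope. One detail worth knowing: the PostGIS `&&` operator compares bounding boxes only, so it is fast and index-assisted but can return ways whose box overlaps the envelope even though the line itself does not cross it. `ST_Intersects` performs the exact geometric test and still uses the spatial index. The helper below is a hypothetical sketch that only builds the query text; the `gid` and `the_geom` column names assume the osm2pgrouting `ways` table.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min lon, min lat, max lon, max lat)


def ways_in_bbox_sql(bbox: BBox, srid: int = 4326, exact: bool = False) -> str:
    """Build a query returning ways (as WKT) inside an arbitrary bounding box."""
    xmin, ymin, xmax, ymax = (float(v) for v in bbox)
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("bbox must be (min lon, min lat, max lon, max lat)")
    # ST_MakeEnvelope(xmin, ymin, xmax, ymax, srid) builds the query rectangle.
    envelope = f"ST_MakeEnvelope({xmin}, {ymin}, {xmax}, {ymax}, {int(srid)})"
    # `&&`: bounding-box overlap only; ST_Intersects: exact geometric test.
    if exact:
        predicate = f"ST_Intersects(the_geom, {envelope})"
    else:
        predicate = f"the_geom && {envelope}"
    return f"SELECT gid, ST_AsText(the_geom) AS wkt FROM ways WHERE {predicate};"


print(ways_in_bbox_sql((-88.23, 40.10, -88.22, 40.11)))
```

With `exact=True` the row count may drop slightly compared with the `&&` version, since segments that only graze the envelope's corner box are excluded.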

## Next Steps

* Perform simple routing on the full network
* Perform cost-based routing (based on time rather than length)
* Perform routing on a vehicle-only network (exclude pedestrian and cycling routes)
* Perform routing between arbitrary coordinates (not node IDs).
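Two of these steps reduce to small calculations. Time-based cost is edge length divided by speed (km/h divided by 3.6 gives m/s), and routing from arbitrary coordinates means first snapping each coordinate to its nearest network vertex. The pure-Python sketch below shows both under stated assumptions: the vertex coordinates and the speed are hypothetical, and distance uses the haversine great-circle formula. In the database, my understanding is that osm2pgrouting already stores time costs (`cost_s`), and the snapping can be done with a PostGIS nearest-neighbor query such as `ORDER BY the_geom <-> ST_SetSRID(ST_MakePoint(lon, lat), 4326) LIMIT 1` against the vertex table (`ways_vertices_pgr`).

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_vertex(lon: float, lat: float, vertices: Dict[int, Tuple[float, float]]) -> int:
    """Snap an arbitrary coordinate to the id of the closest network vertex."""
    return min(vertices, key=lambda vid: haversine_m(lon, lat, *vertices[vid]))


def travel_time_s(length_m: float, speed_kmh: float) -> float:
    """Time cost of an edge in seconds: length divided by speed in m/s."""
    return length_m / (speed_kmh / 3.6)


# Hypothetical vertices: id -> (lon, lat).
verts = {1: (-88.2238, 40.1036), 2: (-88.2235, 40.1025), 3: (-88.2240, 40.1007)}
print(nearest_vertex(-88.2236, 40.1024, verts))  # 2
print(travel_time_s(500.0, 40.0))                # about 45 seconds for 500 m at 40 km/h
```

A linear scan like `nearest_vertex` is fine for a handful of points; for many origins and destinations, the spatially indexed database query scales far better.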

**Stretch Goals:**

* Routing between many origins and destinations
* Load origin and destination tables and route between them
* Implement more data cleaning (remove disconnected components, etc.)
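For the many-origin, many-destination goal, pgRouting offers `pgr_dijkstraCost`, which accepts arrays of start and end vertex ids and returns one `(start_vid, end_vid, agg_cost)` row per reachable pair, a natural fit for loading origin and destination tables. For the data-cleaning goal, `pgr_connectedComponents` labels each vertex with a component id, so disconnected fragments can be found and dropped. The helper below is a hedged sketch that only builds the OD query text; it assumes the osm2pgrouting `ways` column names, and the vertex ids in the example are hypothetical.

```python
from typing import Iterable


def od_cost_matrix_sql(origins: Iterable[int], destinations: Iterable[int]) -> str:
    """Build a many-to-many pgr_dijkstraCost query between vertex ids.

    Assumes osm2pgrouting `ways` columns (gid, source, target, cost_s,
    reverse_cost_s), so agg_cost is a travel time in seconds.
    """
    o = ", ".join(str(int(v)) for v in origins)
    d = ", ".join(str(int(v)) for v in destinations)
    if not o or not d:
        raise ValueError("need at least one origin and one destination")
    edges_sql = (
        "SELECT gid AS id, source, target, "
        "cost_s AS cost, reverse_cost_s AS reverse_cost FROM ways"
    )
    # One call covers every origin-destination pair; pairs with no path
    # are omitted from the result rather than returned with infinite cost.
    return (
        "SELECT start_vid, end_vid, agg_cost "
        f"FROM pgr_dijkstraCost('{edges_sql}', ARRAY[{o}], ARRAY[{d}], directed => true);"
    )


print(od_cost_matrix_sql([101, 102], [201, 202, 203]))  # hypothetical vertex ids
```

Because missing pairs are simply absent, joining the result back to the full list of OD pairs is a quick way to spot origins or destinations stranded on disconnected components.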