# Initial steps in building a traffic simulation: supply, demand data and shortest path calculation
Below is a demonstration of some useful techniques in building a traffic simulation. In Quiz 3 and Assignment 2, you will be asked to simulate the traffic in Berkeley using static and dynamic traffic assignments with given inputs. In this notebook, we will cover some background knowledge on how those inputs are generated. You don't need such knowledge to complete your quiz and assignment (though it would be helpful to look at the examples here). However, knowing these tools and techniques is helpful for you to build traffic simulations for other places, whenever there is such a need or interest.

This demonstration tutorial is divided into three parts: (1) retrieving road network data; (2) establishing travel demands as trip origin-destinations (ODs); and (3) computing the shortest path between a given OD pair. Unlike previous exercises, there is no missing code below. 

---

### 1. Retrieving Road Network Data
Data about the road network, such as the locations of intersections and the length, number of lanes, speed limit, and geometry of each road link, are an integral part of traffic simulation. If we think of traffic modeling as computing outcomes (e.g., traffic flow) under a given road supply and travel demand, the road network defines the supply side of the system. The go-to source for such information is OpenStreetMap (OSM), which provides freely available, community-sourced road network data. In our experience, the data quality is very good for the US and may be somewhat less reliable (but still quite useful) in other countries. Most countries also have official road network information provided to licensed users. If you can get the official information, the overall approach to processing it for traffic analysis is similar to what we do below for OSM.

In this section, we will show you three different ways to retrieve road network data from OSM. You can select the most convenient method depending on the scale of your problem.
- Small and interactive: Overpass-turbo (graphical)
- Large datasets: Overpass API (shell commands)
- Cleaned data, can be used directly for many types of analyses: OSMnx (Python)

##### 1.1 Small network: Overpass-turbo
Overpass-turbo is a handy way to download data for a small area. You can access overpass-turbo at https://overpass-turbo.eu/. Select an area of interest (e.g., UC Berkeley campus) and type your queries on the left.
<img src="https://github.com/UCB-CE170a/Fall2020/blob/master/python-exercises/Traffic%20Exercise%201/overpass-turbo-1-note.png?raw=1" alt="drawing" width="800"/>

Next, click "Run" and see the query results under the "Data" tab. You can copy and paste the result to a text editor for future use.

<table><tr>
<td> <img src="https://github.com/UCB-CE170a/Fall2020/blob/master/python-exercises/Traffic%20Exercise%201/overpass-turbo-2.png?raw=1" alt="Drawing" width="400"/>  </td>
<td> <img src="https://github.com/UCB-CE170a/Fall2020/blob/master/python-exercises/Traffic%20Exercise%201/overpass-turbo-3.png?raw=1" alt="Drawing" width="400"/> </td>
</tr></table>

##### 1.2 Large network: Overpass API
If you want to download the road network data for a relatively large area (e.g., a city), it is better to use the [Overpass API from a command line terminal](http://overpass-api.de/command_line.html). Let's first create a file called `query.osm`, which specifies:
- output format: JSON.
- element type: roadways, including motorway, trunk, primary, secondary, tertiary, and residential roads.
- query area: within a polygon area defined by the lat/lon coordinates of the vertices.

Note: the commands below are for Linux systems (e.g., Google Colab). If you are using a different operating system, the commands may be slightly different.

In [None]:
!echo "data=[out:json];way[highway~'motorway|motorway_link|motorway_junction|trunk|trunk_link|primary|primary_link|secondary|secondary_link|tertiary|tertiary_link|residential'](poly:'37.84615220228875 -122.4537615259554 37.9205424163833 -122.42627473534108 37.97662600406699 -122.27439451079097 37.93944332562459 -122.18153759427354 37.85457143622269 -122.14440323003141 37.78981170543388 -122.19181902134872 37.764909622499076 -122.34990028037001 37.78612226780035 -122.42013291803424 37.84615220228875 -122.4537615259554');(._;>;);out;"> query.osm
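
The query string in the cell above can also be assembled in Python, which is convenient when the polygon or the list of road types changes. The following is a minimal sketch: `build_overpass_query` is a hypothetical helper written for this illustration (it is not part of any Overpass library), and the triangle is a made-up area, not the study area of this notebook. It mirrors the query above by repeating the first vertex at the end of the polygon.

```python
def build_overpass_query(highway_types, vertices):
    """Return an Overpass request body for ways inside a polygon.

    highway_types: list of OSM `highway` tag values to keep.
    vertices: list of (lat, lon) tuples; the poly filter expects "lat lon" order.
    """
    # Close the ring by repeating the first vertex, as in the query above.
    if vertices[0] != vertices[-1]:
        vertices = vertices + [vertices[0]]
    highway_regex = "|".join(highway_types)
    poly = " ".join(f"{lat} {lon}" for lat, lon in vertices)
    return (
        "data=[out:json];"
        f"way[highway~'{highway_regex}'](poly:'{poly}');"
        "(._;>;);out;"
    )


highway_types = ["motorway", "trunk", "primary", "secondary", "tertiary", "residential"]
# A small illustrative triangle (assumed coordinates, for demonstration only).
vertices = [(37.88, -122.27), (37.88, -122.25), (37.87, -122.26)]
query = build_overpass_query(highway_types, vertices)
print(query)
```

Writing `query` to a file named `query.osm` would give a file that can be posted in the same way as the one created by the `echo` command.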

Next, we can download the data specified in `query.osm` into a file called `target.osm` from `https://overpass-api.de/api/interpreter`. `wget` is the Linux command for downloading content from web servers.

In [None]:
!wget -O target.osm --post-file=query.osm "https://overpass-api.de/api/interpreter"

Now you should see the `target.osm` file in your Colab file directory. It is in the JSON format that we introduced briefly in [Python Exercise 5](https://github.com/UCB-CE170a/Fall2020/tree/master/python-exercises/Day%205). You can save the data for further processing.
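
As a rough idea of what such further processing can look like, the sketch below splits Overpass JSON into nodes and ways and expands each way into directed edges. The sample is a tiny hand-written document in the Overpass JSON layout (an `elements` list of `node` and `way` objects), not real OSM data. The handling of the `oneway` tag is deliberately simplified: only `oneway=yes` is treated as one-directional, which ignores cases such as reversed one-way streets and roundabouts.

```python
import json

# Hand-written sample in the Overpass JSON layout (illustrative, not real data).
sample = """
{"elements": [
  {"type": "node", "id": 1, "lat": 37.875, "lon": -122.260},
  {"type": "node", "id": 2, "lat": 37.876, "lon": -122.260},
  {"type": "node", "id": 3, "lat": 37.877, "lon": -122.261},
  {"type": "way", "id": 10, "nodes": [1, 2, 3],
   "tags": {"highway": "residential"}},
  {"type": "way", "id": 11, "nodes": [3, 1],
   "tags": {"highway": "primary", "oneway": "yes"}}
]}
"""
# For the downloaded file, the equivalent would be json.load(open('target.osm')).
data = json.loads(sample)

# Node id -> (lat, lon)
nodes = {e["id"]: (e["lat"], e["lon"]) for e in data["elements"] if e["type"] == "node"}
ways = [e for e in data["elements"] if e["type"] == "way"]

# Expand each way into directed edges between consecutive nodes.
edges = []
for way in ways:
    refs = way["nodes"]
    oneway = way.get("tags", {}).get("oneway") == "yes"  # simplified rule
    for u, v in zip(refs[:-1], refs[1:]):
        edges.append((u, v, way["id"]))
        if not oneway:
            # Two-way streets get an edge in each direction.
            edges.append((v, u, way["id"]))

print(len(nodes), "nodes and", len(edges), "directed edges")
```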

##### 1.3 Cleaned network: OSMnx
Overpass-turbo and the Overpass API introduced above can be used to download raw data from OSM. This is useful if you want to build a customized road network from scratch. However, this is often not the case, as cleaning the raw data can be a time-consuming process. For example, in traffic simulations we need the road network as a directed graph, which the raw OSM data is close to, but not exactly. If you want a cleaned graph, you can use the excellent [OSMnx package](https://geoffboeing.com/2016/11/osmnx-python-street-networks/). It was developed by [Geoff Boeing](https://geoffboeing.com/) while he was a PhD student at UC Berkeley. The [original blog post](https://geoffboeing.com/2016/11/osmnx-python-street-networks/) and [code repo](https://github.com/gboeing/osmnx) provide clear instructions on how to use it. Here we will show how to retrieve the road network of North Berkeley, the one we will use for the homework of this module.

In [None]:
!sudo apt install libspatialindex-dev python-rtree
!pip install geopandas rtree osmnx

In [None]:
import osmnx as ox
import geopandas as gpd
from shapely import geometry

In [None]:
# you can get a cleaned road network by address
G0 = ox.graph_from_address('1878 Euclid Avenue, Berkeley, California', network_type='drive')

# plot the network
ox.plot_graph(G0)

In [None]:
# create a shapely Polygon object. This polygon covers the corner of Berkeley to the northeast of Hearst and Martin Luther King Jr. Way
north_berkeley_shape = gpd.read_file('north_berkeley/north_berkeley.shp')['geometry'].iloc[0]

# get road network within the polygon
G = ox.graph_from_polygon(north_berkeley_shape, network_type='drive')

# plot the road network
ox.plot_graph(G)

In [None]:
# convert the graph into node and edge GeoDataFrames
north_berkeley_nodes, north_berkeley_edges = ox.graph_to_gdfs(G)
display(north_berkeley_nodes.head(2))
display(north_berkeley_edges.head(2))

This is how we retrieved the original network for the quizzes and assignments in this course module. The actual road network you will be using has been modified manually to remove some redundant features. Even without manual modifications, however, the outputs from OSMnx can generally be used directly for graph analysis and simulation.

---
### 2. Building travel demand
For traffic analysis, obtaining travel demand data is among the most challenging tasks. Accurate travel demand inputs are indispensable for building a realistic model. However, we often cannot get such data due to resource or privacy constraints. There are several potential sources of travel demand data: regional or national travel surveys (conducted every 10 years or so, and therefore not reflecting short-term changes) and, increasingly, commercial data providers that derive anonymous and aggregated journey information from mobile phone signals. There are also travel-demand generation models based on land-use information.

| Trip_ID | start_node | end_node | departure_time |
|---------|------------|----------|----------------|
| 1       | 1          | 10       | 0              |
| 2       | 15         | 55       | 10             |
| ..      | ..         | ..       | ..             |

Travel demand inputs are expressed as origin-destination (OD) traffic flow tables. For static analysis, there is usually one OD table for a period of, say, three hours (morning peak OD, mid-day OD, evening peak OD, etc.). For dynamic simulations, there could be an OD table for every 15 minutes or less. In the demonstration below, we will show you how to construct a simple travel demand file for a hypothetical evacuation setting, where we want to evacuate residents in the dangerous area to safe locations.
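
To make the difference between a static and a dynamic OD table concrete, here is a small sketch that aggregates a trip list laid out like the table above. The trips are made up for illustration, and `departure_time` is assumed to be in seconds after the start of the analysis period (the table above does not state its unit).

```python
from collections import Counter

# Hypothetical trips: (trip_id, start_node, end_node, departure_time_in_seconds)
trips = [
    (1, 1, 10, 0),
    (2, 15, 55, 10),
    (3, 1, 10, 600),
    (4, 1, 10, 950),
    (5, 15, 55, 1700),
]

interval = 15 * 60  # length of one time slice: 15 minutes, in seconds

# Static view: a single OD table counting all trips in the period.
static_od = Counter((o, d) for _, o, d, _ in trips)

# Dynamic view: one OD table per 15-minute slice, keyed by slice index.
dynamic_od = {}
for _, o, d, t in trips:
    dynamic_od.setdefault(t // interval, Counter())[(o, d)] += 1

print("static:", dict(static_od))
for k in sorted(dynamic_od):
    print("slice", k, dict(dynamic_od[k]))
```

The static table only records how many trips go from each origin to each destination, while the dynamic tables also keep track of when they depart.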

The key step in our code below, i.e., identifying nodes in an evacuation zone, is inspired by the [blog post](https://geoffboeing.com/2016/10/r-tree-spatial-index-python/) from Geoff Boeing, author of OSMnx.

In [None]:
# get all nodes in the study area
!wget "https://raw.githubusercontent.com/UCB-CE170a/Fall2020/master/traffic_data/berkeley_nodes.csv" -O berkeley_nodes.csv

# import some modules that will be used later
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

In [None]:
# read in the road intersection information - we normally assume a journey starts at a road intersection
# the file was downloaded to the current directory in the previous cell
all_nodes = pd.read_csv('berkeley_nodes.csv')
all_nodes_gdf = gpd.GeoDataFrame(all_nodes, crs='epsg:4326', geometry=[Point(xy) for xy in zip(all_nodes.lon, all_nodes.lat)])
all_nodes_sindex = all_nodes_gdf.sindex

# read in the evacuation area (this area is manually created in QGIS - not real evacuation zones in Berkeley)
evac_zone = gpd.read_file('evacuation_zone/evacuation_zone.shp')
evac_zone_geom = evac_zone['geometry'].iloc[0]
# get all nodes that are inside the evacuation boundary, which would be the origin of the journey
coarse_node_ids = list(all_nodes_sindex.intersection(evac_zone_geom.bounds))
coarse_nodes = all_nodes_gdf.iloc[coarse_node_ids]
precise_nodes = coarse_nodes[coarse_nodes.intersects(evac_zone_geom)]

# suppose there are 50 vehicle trips originating from each origin - this is greatly simplified, where we normally obtain the nodal travel demand from parcel maps
od = pd.DataFrame({'origin_osmid': precise_nodes['node_osmid'].values.tolist()*50})
# the destination node
od['destin_osmid'] = 'vn_sink'
print(od.shape)
display(od.head())
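
The node selection above works in two stages: a cheap bounding-box query (`sindex.intersection`) narrows down the candidates, and an exact geometric test (`intersects`) keeps only the nodes truly inside the zone. The sketch below reproduces the same idea in plain Python so that each stage is visible. It is not how GeoPandas implements it: the coarse stage here is a simple linear scan rather than a spatial index, the exact stage is a basic ray-casting test, and the zone and nodes are made up.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical triangular zone and nodes, in (x, y) order.
zone = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
nodes = {"a": (1.0, 1.0), "b": (3.5, 3.5), "c": (5.0, 1.0), "d": (0.5, 2.0)}

# Stage 1 (coarse): keep nodes inside the bounding box of the zone.
xs, ys = zip(*zone)
minx, miny, maxx, maxy = min(xs), min(ys), max(xs), max(ys)
coarse = {k: p for k, p in nodes.items()
          if minx <= p[0] <= maxx and miny <= p[1] <= maxy}

# Stage 2 (precise): keep only nodes that are really inside the polygon.
precise = {k: p for k, p in coarse.items() if point_in_polygon(p[0], p[1], zone)}

# 50 trips per origin, all heading to the same sink, as in the cell above.
od = [(origin, "vn_sink") for origin in sorted(precise)] * 50

print(sorted(coarse), sorted(precise), len(od))
```

Node `b` passes the bounding-box check but fails the exact test, which is why the second stage is needed.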

---
### 3. Shortest path calculation
Route computation holds a special position in traffic modeling because it maps the travel demand to the network supply. A driver may choose a route based on a variety of factors: time, monetary cost, safety, familiarity, and emissions, just to name a few. It is common to assume that drivers take the fastest path, which is the shortest path on a road network graph weighted by travel time (factoring in the distance as well as congestion status). In sophisticated models, the route choice criteria are certainly a lot more complex than purely the shortest path. However, we will use this simple assumption in this course.
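
As a small illustration of a travel-time weight, the sketch below computes a free-flow travel time for each link from its length and speed limit, ignoring congestion. The link data, the units (meters and miles per hour), and the function name `free_flow_time` are assumptions made for this example; they are not taken from the course data files.

```python
MPH_TO_MPS = 0.44704  # 1 mile = 1609.344 m and 1 hour = 3600 s


def free_flow_time(length_m, speed_mph):
    """Travel time in seconds on an uncongested link."""
    return length_m / (speed_mph * MPH_TO_MPS)


# Hypothetical links: (start_node, end_node, length in meters, speed limit in mph)
links = [(1, 2, 100.0, 25.0), (2, 3, 250.0, 35.0)]

# Edge weights for a shortest-path search, keyed by (start_node, end_node).
weights = {(u, v): free_flow_time(length, speed) for u, v, length, speed in links}

for edge, seconds in weights.items():
    print(edge, round(seconds, 1), "seconds")
```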

There are numerous Python packages that can perform the shortest-path calculation, the most notable being [NetworkX](https://networkx.github.io/) and [python-igraph](https://igraph.org/python/). There are multiple [shortest-path finding algorithms](https://en.wikipedia.org/wiki/Shortest_path_problem), of which [Dijkstra's Algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) is the most commonly used for graphs with non-negative edge weights, such as travel times. In this class, we will use a specially developed shortest-path code, [sp](https://github.com/cb-cities/sp), which implements a priority-queue-based Dijkstra's Algorithm and has been tested to run more efficiently than other packages.

The [sp](https://github.com/cb-cities/sp) code was developed by Dr Krishna Kumar at UT Austin (formerly at the Soga group). It computes shortest paths efficiently using Dijkstra's Algorithm. If your problem has special features, another algorithm can sometimes give you even faster results. The [sp](https://github.com/cb-cities/sp) code is written in C++ with a Python wrapper. Normally you would need to compile the code on your computer. Here on Colab, we provide a compiled dynamic library, `liblsp.so`.
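
For readers who want to see the idea behind a priority-queue-based Dijkstra's Algorithm, here is a textbook-style sketch in pure Python using `heapq`. It is not the `sp` code and will be much slower on large networks; the toy graph and its weights (thought of as travel times in seconds) are made up for illustration.

```python
import heapq


def dijkstra(adj, origin, destin):
    """Return (cost, route), where route is a list of (u, v) edges.

    adj maps each node to a list of (neighbor, non-negative weight) pairs.
    If destin cannot be reached, the result is (inf, []).
    """
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]  # priority queue ordered by tentative cost
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale queue entry; u was already settled at lower cost
        visited.add(u)
        if u == destin:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if destin not in dist:
        return float("inf"), []
    # Walk back from the destination to recover the edges on the path.
    route = []
    node = destin
    while node != origin:
        route.append((prev[node], node))
        node = prev[node]
    route.reverse()
    return dist[destin], route


# Toy directed graph (hypothetical weights).
adj = {
    "A": [("B", 30.0), ("C", 10.0)],
    "C": [("B", 15.0), ("D", 60.0)],
    "B": [("D", 20.0)],
}
travel_time, route = dijkstra(adj, "A", "D")
print(travel_time, route)
```

Here the direct link from A to B is slower than the detour through C, so the returned route passes through C before reaching B and D.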

In [1]:
# install some python packages
!pip install geopandas shapely folium

# retrieve the sp code
!rm -rf sp && mkdir sp
!wget "https://github.com/UCB-CE170a/Fall2020/raw/master/traffic_data/liblsp.so" -O sp/liblsp.so
!wget "https://raw.githubusercontent.com/UCB-CE170a/Fall2020/master/traffic_data/interface.py" -O sp/interface.py

# retrieve the road network
!wget "https://raw.githubusercontent.com/UCB-CE170a/Fall2020/master/traffic_data/berkeley_edges.csv" -O berkeley_edges.csv

# import modules that will be used later
import folium
import pandas as pd
import geopandas as gpd
import shapely.wkt
from shapely.geometry import Point
from sp import interface


In [None]:
# read edges file
edges_df = pd.read_csv('berkeley_edges.csv')
display( edges_df.head(2) )

# create a graph
# supply the name of the edges dataframe, column name of the start node ID, end node ID and graph weights (free-flow travel time) column
g = interface.from_dataframe(edges_df, 'start_node_id', 'end_node_id', 'fft')

Let's get the path from the CEE department (North gate: osmid 53055202) to Cheeseboard Pizza (Shattuck Avenue and Vine Street: osmid 239617031).

In [None]:
print( 'The node id of the start location is: ', edges_df.loc[edges_df['start_osmid']==53055202, 'start_node_id'].unique()[0] )
print( 'The node id of the end location is: ', edges_df.loc[edges_df['end_osmid']==239617031, 'end_node_id'].unique()[0] )

In [None]:
# get path
def get_path(origin, destin):
    # run Dijkstra's algorithm and read off the cost of reaching the destination
    sp = g.dijkstra(origin, destin)
    sp_dist = sp.distance(destin)

    # a very large cost is treated here as "destination not reachable"
    if sp_dist > 10e7:
        route = []
    else:
        # the route is a list of (start_node_id, end_node_id) pairs, one per edge
        route = [(start_sp, end_sp) for (start_sp, end_sp) in sp.route(destin)]
    sp.clear()

    return route, sp_dist

origin = 354  # the origin node id of the trip
destin = 196  # the destination node id of the trip
route, distance = get_path(origin, destin)
print('The trip travel time is {:.2f} minutes.'.format(distance/60))

In [None]:
# visualize
one_path = pd.DataFrame(route, columns=['start_node_id', 'end_node_id']).merge(
    edges_df[['start_node_id', 'end_node_id', 'geometry']])
one_path_gdf = gpd.GeoDataFrame(one_path, crs='epsg:4326', geometry=one_path['geometry'].map(shapely.wkt.loads))
one_path_json = one_path_gdf.to_json()

# first coordinate of the first edge and last coordinate of the last edge, as (lon, lat)
start_json = one_path_gdf.iloc[0]['geometry'].coords[0]
end_json = one_path_gdf.iloc[-1]['geometry'].coords[-1]

berkeley_map = folium.Map([37.88, -122.25], zoom_start=14)
berkeley_map.add_child(folium.features.GeoJson(one_path_json))
folium.Marker(list(start_json)[::-1], icon = folium.Icon(color='blue')).add_to(berkeley_map)
folium.Marker(list(end_json)[::-1], icon = folium.Icon(color='red')).add_to(berkeley_map)
berkeley_map