## Tutorial: Generating a road network graph and plotting a map using OpenStreetMap data
### Part 1: From OSM data to Pandas DataFrame

#### Environment:
- Python 3.5 or higher

#### Major Dependencies: 
- PyOsmium package : [GitHub](https://github.com/osmcode/pyosmium/releases), [website](https://osmcode.org/pyosmium/)
- Osmium command line tool (osmium-tool): [GitHub](https://github.com/osmcode/osmium-tool)  
    - If you have pip: `pip install osmium-tool`
- Basemap (used towards the end of Step 5) 
    
#### Estimated Completion time:
- 8 hours

#### Tutorial does not include:
- An adjacency matrix (instead, the graph is stored as one file for nodes and another file for edges)

In [11]:
import os
import math
import osmium
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## Step 1: Download data 

__Method 1:__ Download the OSM file for your bounding box directly from OpenStreetMap.

__Method 2:__ Use this if your bounding box is too large to download from OpenStreetMap.
- You may have to use an alternative download site.
- Alternative sites may have their own caveats. I used geofabrik.de, which provides maps of entire regions.
- So I had to download the map data for all of Southern California [here](http://download.geofabrik.de/north-america/us/california/socal.html).
- I then cut the desired bounding box out of this big map.

__NOTE:__ The file I downloaded and unzipped is south_cal.osm (10 GB). 
- It is too big to upload to GitHub, so assume that it exists at './OSM_files/south_cal.osm'

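#### Use the 2 cells below only if you used Method 2 in Step 1; if you used Method 1, go directly to Step 2
- Essentially, the 2 cells below cut the desired bounding box out of the large map
- Getting the bounding box right can be tricky because of negative and positive coordinate values. `osmium extract -b` expects `LEFT,BOTTOM,RIGHT,TOP` (minimum longitude, minimum latitude, maximum longitude, maximum latitude), and west of Greenwich the longitudes are negative. See the diagram below.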

![alt text](images/Group.png)
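To avoid mixing up the order and signs, the `-b` argument can be built from any two opposite corners. This is a minimal sketch: `osmium_bbox` is a hypothetical helper, not part of osmium. It assumes the corners are given as (lat, lon) pairs and that the box does not cross the 180° meridian.

```python
def osmium_bbox(corner_a, corner_b):
    """Build the LEFT,BOTTOM,RIGHT,TOP string expected by `osmium extract -b`.

    corner_a and corner_b are (lat, lon) pairs of two opposite corners, in any order.
    Assumes the box does not cross the 180 degree meridian.
    """
    (lat_a, lon_a), (lat_b, lon_b) = corner_a, corner_b
    for lat in (lat_a, lat_b):
        if not -90 <= lat <= 90:
            raise ValueError("latitude out of range: {}".format(lat))
    for lon in (lon_a, lon_b):
        if not -180 <= lon <= 180:
            raise ValueError("longitude out of range: {}".format(lon))
    # West of Greenwich longitudes are negative: the smaller (more negative) one is the left edge
    left, right = min(lon_a, lon_b), max(lon_a, lon_b)
    # South of the equator latitudes are negative: the smaller one is the bottom edge
    bottom, top = min(lat_a, lat_b), max(lat_a, lat_b)
    return "{},{},{},{}".format(left, bottom, right, top)


# Corners of the area used in this tutorial, north-east corner first
print(osmium_bbox((34.5720, -117.7269), (33.8017, -118.6983)))
# → -118.6983,33.8017,-117.7269,34.572
```

Because the helper sorts the values itself, passing the south-west corner first gives the same string.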

In [12]:
# Apply the bounding box (-b = bounding box as LEFT,BOTTOM,RIGHT,TOP, -v = verbose, -o = output file)
# Use the osmium command line tool to cut out the slice of the map that is relevant to us

!osmium extract -v -b -118.6983,33.8017,-117.7269,34.5720 './OSM_files/south_cal.osm' -o './OSM_files/bounding_box.osm'

[ 0:00] Started osmium extract
[ 0:00]   osmium version 1.13.0
[ 0:00]   libosmium version 2.16.0
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file name: ./OSM_files/south_cal.osm
[ 0:00]     file format: 
[ 0:00]   output options:
[ 0:00]     file name: ./OSM_files/bounding_box.osm
[ 0:00]     file format: 
[ 0:00]     generator: osmium/1.13.0
[ 0:00]     overwrite: no
[ 0:00]     fsync: no
[ 0:00]   strategy options:
[ 0:00]     strategy: complete_ways
[ 0:00]     with history: no
[ 0:00]   other options:
[ 0:00]     config file: 
[ 0:00]     output directory: 
[ 0:00] 
[ 0:00] Extracts:
[ 0:00] [01] Output:      ./OSM_files/bounding_box.osm
[ 0:00]      Format:      XML
[ 0:00]      Description: 
[ 0:00]      Envelope:    (-118.6983,33.8017,-117.7269,34.572)
[ 0:00]      Type:        bbox
[ 0:00]      Geometry:    BOX(-118.6983 33.8017,-117.7269 34.572)
[ 0:00] 
[ 0:00] Running 'complete_ways' strategy in two passes...
[ 0:00] First pass (o
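The `!` shell escape only works inside Jupyter/IPython. In a plain Python script, the same call can be made with the standard library's `subprocess` module. This is a sketch: `osmium_extract_cmd` and `run_osmium_extract` are hypothetical helpers, and only the flags already used above (plus osmium's `--overwrite`) appear.

```python
import subprocess


def osmium_extract_cmd(bbox, source, target, overwrite=False):
    """Return the argument list for the `osmium extract` call shown above."""
    cmd = ["osmium", "extract", "-v", "-b", bbox, source, "-o", target]
    if overwrite:
        # By default osmium will not replace an existing output file ("overwrite: no" in the log)
        cmd.append("--overwrite")
    return cmd


def run_osmium_extract(bbox, source, target, overwrite=False):
    """Run osmium; raises subprocess.CalledProcessError if it exits with an error."""
    subprocess.run(osmium_extract_cmd(bbox, source, target, overwrite), check=True)


cmd = osmium_extract_cmd("-118.6983,33.8017,-117.7269,34.5720",
                         "./OSM_files/south_cal.osm",
                         "./OSM_files/bounding_box.osm")
print(" ".join(cmd))
# run_osmium_extract(...) requires osmium-tool to be installed and on the PATH
```

Passing a list instead of a single string avoids shell quoting issues with the negative coordinates.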

In [15]:
# This is still a huge file (about 5.7 GB)
!osmium fileinfo './OSM_files/bounding_box.osm'

File:
  Name: ./OSM_files/bounding_box.osm
  Format: XML
  Compression: none
  Size: 5698020449
Header:
  Bounding boxes:
  With history: no
  Options:
    generator=osmium/1.13.0
    version=0.6


## Step 2: Extract the relevant road types, i.e. create a filter using osmium

- The OpenStreetMap data model consists of nodes, ways and relations
- Everything in OSM is built from nodes. Roads are ways, which are ordered lists of nodes
- The bounding_box.osm file first lists the nodes, each with a node ID and other attributes (including its location)
- The nodes are followed by the list of ways, each with a way ID and references to its node IDs, among other attributes 
- Ways also carry tags (key/value pairs) that describe what the way is
- From bounding_box.osm, we want the ways whose `highway` tag has one of the following 11 values:
    1. motorway
    2. trunk
    3. secondary
    4. tertiary
    5. unclassified
    6. residential
    7. motorway_link
    8. trunk_link
    9. primary_link
    10. secondary_link
    11. tertiary_link
    
- If you inspect the XML file, each of these ways contains a tag whose key is `highway` and whose value is one of the 11 types above, e.g. `<tag k="highway" v="residential"/>`
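To make the node/way/tag layout concrete, here is a sketch that runs on a tiny, hypothetical OSM XML snippet. It uses the standard library's `xml.etree.ElementTree` instead of pyosmium, purely for illustration. For multi-GB files, the streaming pyosmium handler used in this tutorial is the practical choice. The IDs and coordinates are made up.

```python
import xml.etree.ElementTree as ET

# The 11 highway values used in this tutorial (note: 'primary_link' is included, plain 'primary' is not)
HIGHWAY_TYPES = {
    "motorway", "trunk", "secondary", "tertiary", "unclassified", "residential",
    "motorway_link", "trunk_link", "primary_link", "secondary_link", "tertiary_link",
}

# Hypothetical miniature OSM file: nodes first, then ways that reference node IDs and carry tags
SAMPLE_OSM = """
<osm version="0.6">
  <node id="1" lat="34.0000000" lon="-118.0000000"/>
  <node id="2" lat="34.0010000" lon="-118.0000000"/>
  <node id="3" lat="34.0010000" lon="-118.0010000"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
  <way id="11">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="footway"/>
  </way>
</osm>
"""

root = ET.fromstring(SAMPLE_OSM.strip())
sample_highway_nodes = set()
for way in root.iter("way"):
    # Tags are <tag k="..." v="..."/> children of the way
    tags = {tag.get("k"): tag.get("v") for tag in way.iter("tag")}
    if tags.get("highway") in HIGHWAY_TYPES:
        # Ways store only node references; the coordinates live on the <node> elements
        sample_highway_nodes.update(int(nd.get("ref")) for nd in way.iter("nd"))

print(sorted(sample_highway_nodes))  # → [1, 2]
```

Way 11 is a `footway`, which is not in the list, so node 3 is not collected.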

#### Sub-Step: Collect all the nodes that belong to the highway types above
- Get the IDs of all nodes that are part of a way of one of the 11 highway types

In [16]:
highway_nodes = set()

# The handler methods work as callbacks: pyosmium calls way() once for every way in the file
# osmium.SimpleHandler is used here to build the filter
class NodesCollect(osmium.SimpleHandler):
    def __init__(self):
        super(NodesCollect, self).__init__()
        
    def way(self, w):
        if w.tags.get('highway') in ('motorway',
                                     'trunk',
                                     'secondary',
                                     'tertiary',
                                     'unclassified',
                                     'residential',
                                     'motorway_link',
                                     'trunk_link',
                                     'primary_link',
                                     'secondary_link',
                                     'tertiary_link'):
            try:
                # Get all the nodes in this way
                highway_nodes.update([n.ref for n in w.nodes])
            except osmium.InvalidLocationError:
                print("WARNING: way %d incomplete"%w.id)

node_collector = NodesCollect()

# Here we do not need locations
node_collector.apply_file("./OSM_files/bounding_box.osm")

print("Total number of highway nodes = ",len(highway_nodes))

Total number of highway nodes =  840009


#### Sub-Step: Using the set of nodes, write the highway nodes and ways to a new file
- Keep only the nodes collected above and the ways of one of the 11 highway types, and write them to Highways_nodes_ways.osm

In [17]:
writer = osmium.SimpleWriter("./OSM_files/Highways_nodes_ways.osm")

class HighwayFilter(osmium.SimpleHandler):
    def __init__(self):
        super(HighwayFilter, self).__init__()
        
    def node(self, n):

        # now if the node belong to the highway add it to file
        if n.id in highway_nodes:
            writer.add_node(n)
            
    def way(self, w):
        if  w.tags.get('highway') in ('motorway',
                          'trunk', 
                          'secondary', 
                          'tertiary', 
                          'unclassified', 
                          'residential', 
                          'motorway_link',
                          'trunk_link',
                          'primary_link',
                          'secondary_link',
                          'tertiary_link') :
            writer.add_way(w)
            
high_filter = HighwayFilter()

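# Here we do need locations
high_filter.apply_file("./OSM_files/bounding_box.osm", locations=True)

# Close the writer: this flushes buffered data and writes the closing XML tags.
# Without it, Highways_nodes_ways.osm may be incomplete when it is read in Step 3.
writer.close()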

- It may seem strange that we collected nodes and ways separately 
- This is because in OSM data every node has a location (latitude and longitude), but ways do not: a way only references its nodes by ID
- In the next step we add the node locations to all our ways
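Conceptually, "adding locations to ways" is a join from each node reference to that node's coordinates. The sketch below shows the idea on small, hypothetical in-memory dicts. It is not how osmium implements it (osmium uses a node location index, shown as `flex_mem` in the Step 3 log). The `ignore_missing_nodes` parameter only loosely mirrors osmium's `--ignore-missing-nodes` option: osmium marks a missing node with an invalid location rather than `None`.

```python
def add_locations_to_ways(node_locations, ways, ignore_missing_nodes=True):
    """Pair every node reference of every way with that node's (lat, lon).

    node_locations: {node_id: (lat, lon)}
    ways: {way_id: [node_id, ...]}
    A node missing from node_locations gets None if ignore_missing_nodes is True,
    otherwise a KeyError is raised.
    """
    located = {}
    for way_id, node_ids in ways.items():
        if ignore_missing_nodes:
            located[way_id] = [(n, node_locations.get(n)) for n in node_ids]
        else:
            located[way_id] = [(n, node_locations[n]) for n in node_ids]
    return located


# Hypothetical data: way 10 references node 3, which is not in the input
sample_nodes = {1: (34.0, -118.0), 2: (34.001, -118.0)}
sample_ways = {10: [1, 2, 3]}
print(add_locations_to_ways(sample_nodes, sample_ways))
# → {10: [(1, (34.0, -118.0)), (2, (34.001, -118.0)), (3, None)]}
```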

## Step 3: Add locations to ways

In [18]:
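# -o is the output file, -v is verbose
# --ignore-missing-nodes: do not stop with an error when a way references a node that is not in the input file
# (-n / --keep-untagged-nodes would also keep the untagged nodes, making the output much bigger; it is not used here)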

!osmium add-locations-to-ways --ignore-missing-nodes -v -o './OSM_files/highways_final.osm' './OSM_files/Highways_nodes_ways.osm'

[ 0:00] Started osmium add-locations-to-ways
[ 0:00]   osmium version 1.13.0
[ 0:00]   libosmium version 2.16.0
[ 0:00] Command line options and default settings:
[ 0:00]   input options:
[ 0:00]     file names: 
[ 0:00]       ./OSM_files/Highways_nodes_ways.osm
[ 0:00]     file format: 
[ 0:00]   output options:
[ 0:00]     file name: ./OSM_files/highways_final.osm
[ 0:00]     file format: 
[ 0:00]     generator: osmium/1.13.0
[ 0:00]     overwrite: no
[ 0:00]     fsync: no
[ 0:00]   other options:
[ 0:00]     index type (for positive ids): flex_mem
[ 0:00]     index type (for negative ids): flex_mem
[ 0:00]     keep untagged nodes: no
[ 0:00] 
[ 0:00] Copying input file './OSM_files/Highways_nodes_ways.osm'
XML parsing error at line 3454145, column 0: no element found


In [19]:
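# The XML parsing error above means that Highways_nodes_ways.osm ends before its closing tags.
# This typically happens when the osmium.SimpleWriter from Step 2 is not closed with writer.close()
# before the file is read. Osmium still processes the file up to that point, and in this tutorial
# the error did not affect the mapping.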
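One way to check whether an OSM XML file was cut off is to stream it through the standard library's XML parser. This is a sketch: `is_complete_xml` is a hypothetical helper, and the sample bytes are made up. `iterparse` reads the file incrementally, so it also works on large files, although scanning a multi-GB file still takes a while.

```python
import io
import xml.etree.ElementTree as ET


def is_complete_xml(source):
    """Return True if source (a path or binary file object) parses completely as XML."""
    try:
        # iterparse streams the input instead of loading it all at once
        for _event, elem in ET.iterparse(source, events=("end",)):
            elem.clear()  # drop the element's children, attributes and text once parsed
        return True
    except ET.ParseError as err:
        print("Incomplete or invalid XML: {}".format(err))
        return False


complete = b'<osm version="0.6"><node id="1" lat="34.0" lon="-118.0"/></osm>'
truncated = complete[:-len(b"</osm>")]  # what an unclosed writer can leave behind
print(is_complete_xml(io.BytesIO(complete)))   # → True
print(is_complete_xml(io.BytesIO(truncated)))  # prints the parse error, then False
```

In the notebook, `is_complete_xml('./OSM_files/Highways_nodes_ways.osm')` would run the same check on the real file.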

In [20]:
!osmium fileinfo './OSM_files/highways_final.osm' 

File:
  Name: ./OSM_files/highways_final.osm
  Format: XML
  Compression: none
  Size: 129123712
Header:
  Bounding boxes:
  With history: no
  Options:
    generator=osmium/1.13.0
    version=0.6


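## Step 4: Create a Pandas DataFrame from the ways

- There are 2 ways to access the location of a node inside a way: `n.location` and `(n.location.x, n.location.y)`; the code below uses the latter
- If you get a "Ways callback keeps reference" error, restart the kernel. pyosmium does not allow keeping references to OSM objects (such as `w` or `w.nodes`) after a callback returns, so copy the values you need into plain Python objects instead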
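`n.location.x` and `n.location.y` are integers in units of 1e-7 degrees, which is why the code below divides by 10000000. When `--ignore-missing-nodes` was used in Step 3, some node references may carry an undefined location, and dividing blindly would produce a meaningless coordinate. The sketch below shows the conversion with a guard. The value `2147483647` for an undefined coordinate is an assumption based on libosmium's convention. Inside a pyosmium handler, `n.location.valid()` can serve the same purpose.

```python
# libosmium stores coordinates as 32-bit integers in units of 1e-7 degrees
COORDINATE_PRECISION = 10000000
# Assumption: the value libosmium uses for an undefined coordinate (e.g. a missing node)
UNDEFINED_COORDINATE = 2147483647


def fixed_to_degrees(x, y):
    """Convert fixed-point (x, y) = (lon, lat) integers to (lat, lon) in degrees, or None if unusable."""
    if x == UNDEFINED_COORDINATE or y == UNDEFINED_COORDINATE:
        return None
    lon = x / COORDINATE_PRECISION
    lat = y / COORDINATE_PRECISION
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return (lat, lon)


print(fixed_to_degrees(-1184387025, 342599243))  # → (34.2599243, -118.4387025)
print(fixed_to_degrees(UNDEFINED_COORDINATE, UNDEFINED_COORDINATE))  # → None
```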

In [21]:
# Each way references its nodes; after Step 3, every node reference also carries the node's location

class OSM_to_pandas_ways(osmium.SimpleHandler):
    def __init__(self):
        osmium.SimpleHandler.__init__(self)
        self.osm_data = []

    def way(self, w):
        # Copy plain Python values out of w: pyosmium objects must not be kept after the callback returns
        node_ids = [n.ref for n in w.nodes]

        node_locations = []
        for n in w.nodes:
            # Printing n.location loses one decimal place; n.location.x / n.location.y keep
            # full precision as integers in units of 1e-7 degrees
            lon = n.location.x / 10000000
            lat = n.location.y / 10000000
            node_locations.append((lat, lon))

        self.osm_data.append([w.id, node_ids, node_locations])

way_creator = OSM_to_pandas_ways()
way_creator.apply_file("OSM_files/highways_final.osm")

# Note: the direction of travel (e.g. one-way streets) is not captured in this DataFrame
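If direction matters later, one option is to also record each way's `oneway` tag (in the handler, `w.tags.get('oneway')`) and derive directed edges from it. The sketch below assumes the common OSM values `yes`/`true`/`1` (forward only) and `-1` (reverse only). It treats everything else as two-way, so it ignores implied one-way roads such as motorways or roundabouts without an explicit tag. `directed_edges` is a hypothetical helper.

```python
def directed_edges(node_ids, oneway=None):
    """Return directed (from, to) edges between consecutive nodes of one way.

    oneway is the value of the way's 'oneway' tag (None if absent):
    "yes", "true" or "1" -> only in the order the nodes are listed,
    "-1" -> only against that order, anything else -> both directions.
    """
    forward = list(zip(node_ids, node_ids[1:]))
    if oneway in ("yes", "true", "1"):
        return forward
    backward = [(b, a) for a, b in forward]
    if oneway == "-1":
        return backward
    return forward + backward


print(directed_edges([1, 2, 3], "yes"))  # → [(1, 2), (2, 3)]
print(directed_edges([1, 2, 3]))         # → [(1, 2), (2, 3), (2, 1), (3, 2)]
```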

In [24]:
# Column names for Pandas Dataframe
col_names = ['Way ID', 'Node IDs','Lat/Long']
df_ways = pd.DataFrame(way_creator.osm_data, columns=col_names)
df_ways.head()

| | Way ID | Node IDs | Lat/Long |
|---|---|---|---|
| 0 | 2417713 | [297523835, 364042999] | [(34.2599243, -118.4387025), (34.2567447, -118... |
| 1 | 2430283 | [598686507, 1377652371, 268525224, 3768827286,... | [(34.0906254, -118.1460757), (34.0906925, -118... |
| 2 | 2430510 | [27059058, 268524718, 4033681836] | [(34.0875384, -118.1449876), (34.0875564, -118... |
| 3 | 3102198 | [26390344, 14806190, 1132180415, 367134010, 11... | [(34.5541692, -118.6712305), (34.5545403, -118... |
| 4 | 3124334 | [27363713, 371965757, 371965758, 14918242] | [(34.1107781, -118.1799885), (34.1107071, -118... |


## Step 5: Towards a graph

- At this point we have all the ways, the node IDs of each way, and the locations of those nodes
- To create a graph we will create 2 files, one for nodes and one for edges
- From this step on, most of the work is done in Pandas

#### Create new IDs for all nodes 
- We build a list of the unique node IDs and use each node's index in that list as its new node ID

In [26]:
# Collect every distinct node ID in order of first appearance
# New node IDs are the indices of the nodes in this list (starting from 0)

already_processed_nodes = []

# Nested for loop with a membership check on a growing list: very inefficient
for i, (w_id, n_id, lat_long) in df_ways.iterrows():
    #print(i)
    for j in n_id:
        
        #print(n_id[j])
        
        # Make a unique copy of all the nodes
        if j not in already_processed_nodes:
            already_processed_nodes.append(j)

already_processed_nodes = np.array(already_processed_nodes)

# Verify
print(len(already_processed_nodes))
a = np.unique(already_processed_nodes)
print(len(a))

777017
777017
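The slow part of the loop above is `j not in already_processed_nodes`, which scans the whole list for every node. The sketch below keeps a `set` alongside the list, so each check takes constant time on average while the first-appearance order is preserved. `unique_in_order` is a hypothetical helper. It avoids relying on dict ordering because this tutorial targets Python 3.5+.

```python
def unique_in_order(ways_node_ids):
    """Return the distinct node IDs across all ways, in order of first appearance."""
    seen = set()    # fast membership checks
    ordered = []    # preserves the order in which IDs are first seen
    for node_ids in ways_node_ids:
        for node_id in node_ids:
            if node_id not in seen:
                seen.add(node_id)
                ordered.append(node_id)
    return ordered


print(unique_in_order([[10, 20], [20, 30, 10], [40]]))  # → [10, 20, 30, 40]
```

In the notebook this would be used as `already_processed_nodes = np.array(unique_in_order(df_ways['Node IDs']))`, which gives the same order as the original loop.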


In [28]:
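# Go through all the ways again and translate each OSM node ID into its new ID
new_node_ids_column = []

# Again very inefficient: np.where scans the whole array for every single node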
for i, (w_id, n_id, lat_long) in df_ways.iterrows():
    #print(i)
    new_node_ids=[]
    for j in n_id:
        new_node_ids.append(np.where(already_processed_nodes==j)[0][0])
        #print(type(np.where(already_processed_nodes==j)[0]))
    new_node_ids_column.append(new_node_ids)
    
df_ways.insert(loc=2, column='New node IDs', value=new_node_ids_column)
df_ways.head()

| | Way ID | Node IDs | New node IDs | Lat/Long |
|---|---|---|---|---|
| 0 | 2417713 | [297523835, 364042999] | [0, 1] | [(34.2599243, -118.4387025), (34.2567447, -118... |
| 1 | 2430283 | [598686507, 1377652371, 268525224, 3768827286,... | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1... | [(34.0906254, -118.1460757), (34.0906925, -118... |
| 2 | 2430510 | [27059058, 268524718, 4033681836] | [18, 19, 20] | [(34.0875384, -118.1449876), (34.0875564, -118... |
| 3 | 3102198 | [26390344, 14806190, 1132180415, 367134010, 11... | [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 3... | [(34.5541692, -118.6712305), (34.5545403, -118... |
| 4 | 3124334 | [27363713, 371965757, 371965758, 14918242] | [37, 38, 39, 40] | [(34.1107781, -118.1799885), (34.1107071, -118... |
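The mapping step can also avoid `np.where` by building a dictionary from OSM node ID to new ID once, which makes each lookup constant time on average. This is a sketch with a hypothetical helper name. It assumes `unique_node_ids` is the list (or array) of unique IDs built in the previous cell.

```python
def remap_node_ids(ways_node_ids, unique_node_ids):
    """Translate OSM node IDs into new IDs, i.e. their index in unique_node_ids."""
    # Build the lookup table once instead of searching the array for every node
    new_id = {node_id: index for index, node_id in enumerate(unique_node_ids)}
    return [[new_id[node_id] for node_id in node_ids] for node_ids in ways_node_ids]


print(remap_node_ids([[10, 20], [20, 30, 10]], [10, 20, 30]))  # → [[0, 1], [1, 2, 0]]
```

In the notebook the result could be passed directly as `value=remap_node_ids(df_ways['Node IDs'], already_processed_nodes)` in the `df_ways.insert(...)` call.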


In [29]:
# Save the DataFrame so that the slow steps above do not have to be rerun
df_ways.to_pickle('./other_files/new_node_ids_data_frame.pkl') 

In [30]:
new_nodes_ids_df = pd.read_pickle('./other_files/new_node_ids_data_frame.pkl')
new_nodes_ids_df.head()

| | Way ID | Node IDs | New node IDs | Lat/Long |
|---|---|---|---|---|
| 0 | 2417713 | [297523835, 364042999] | [0, 1] | [(34.2599243, -118.4387025), (34.2567447, -118... |
| 1 | 2430283 | [598686507, 1377652371, 268525224, 3768827286,... | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1... | [(34.0906254, -118.1460757), (34.0906925, -118... |
| 2 | 2430510 | [27059058, 268524718, 4033681836] | [18, 19, 20] | [(34.0875384, -118.1449876), (34.0875564, -118... |
| 3 | 3102198 | [26390344, 14806190, 1132180415, 367134010, 11... | [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 3... | [(34.5541692, -118.6712305), (34.5545403, -118... |
| 4 | 3124334 | [27363713, 371965757, 371965758, 14918242] | [37, 38, 39, 40] | [(34.1107781, -118.1799885), (34.1107071, -118... |
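With the new node IDs and locations in place, the two graph files mentioned in Step 5 can be derived. This is a minimal sketch under stated assumptions. The graph is treated as undirected. An edge is any pair of consecutive nodes in a way. The helper names, CSV columns, and file names are all hypothetical.

```python
import csv


def build_node_and_edge_rows(new_node_ids_per_way, lat_long_per_way):
    """Return (node_rows, edge_rows) for a simple undirected road graph.

    node_rows: [node_id, lat, lon], one row per distinct new node ID
    edge_rows: [from_id, to_id], one row per pair of consecutive nodes in a way
    """
    node_location = {}
    edges = set()
    for ids, locations in zip(new_node_ids_per_way, lat_long_per_way):
        for node_id, (lat, lon) in zip(ids, locations):
            node_location[node_id] = (lat, lon)
        for a, b in zip(ids, ids[1:]):
            if a != b:
                # Undirected: store each pair once, smaller ID first
                edges.add((min(a, b), max(a, b)))
    node_rows = [[node_id, lat, lon] for node_id, (lat, lon) in sorted(node_location.items())]
    edge_rows = [[a, b] for a, b in sorted(edges)]
    return node_rows, edge_rows


def write_csv(path, header, rows):
    """Write a header row followed by the data rows to a CSV file."""
    with open(path, "w", newline="") as f:
        csv_writer = csv.writer(f)
        csv_writer.writerow(header)
        csv_writer.writerows(rows)


# Hypothetical sample: two ways sharing node 1
sample_ids = [[0, 1], [2, 1, 3]]
sample_locs = [[(34.0, -118.0), (34.001, -118.0)],
               [(34.002, -118.0), (34.001, -118.0), (34.001, -118.001)]]
node_rows, edge_rows = build_node_and_edge_rows(sample_ids, sample_locs)
print(edge_rows)  # → [[0, 1], [1, 2], [1, 3]]
```

With the real data, the inputs would be `new_nodes_ids_df['New node IDs']` and `new_nodes_ids_df['Lat/Long']`. The results could then be written with, for example, `write_csv('./other_files/nodes.csv', ['node_id', 'lat', 'lon'], node_rows)` and `write_csv('./other_files/edges.csv', ['from_id', 'to_id'], edge_rows)`.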
