# Generating OD matrices for the demand

We're going to work with demand data provided for various types of traffic in Fremont.
The demand datasets **(that we currently have)** fall into three categories based on the **origin** and **destination** of the cars driving through the area:

1. Cars that **start** their trip within internal centroid zones and **end** within internal centroid zones **(internal-internal demand)**
2. Cars that **start** their trip within internal centroid zones and **end** within external centroid zones **(internal-external demand)**
3. Cars that **start** their trip within external centroid zones and **end** within internal centroid zones **(external-internal demand)**

Note that, for now, we have no data on cars that only drive through the area. In other words, we know nothing about cars that **start** their trip within external centroid zones (or come from outside) and **end** their trip within external centroid zones (or are on their way somewhere away from the city).
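The categories differ only in whether each end of a trip falls in an internal or an external zone, so the whole scheme, including the missing through-traffic case, fits in a small classification function. This is a sketch; `leg_category` is a hypothetical helper, not part of the notebook:

```python
def leg_category(origin_is_internal: bool, destination_is_internal: bool) -> str:
    """Return the demand category of one leg (hypothetical helper)."""
    if origin_is_internal and destination_is_internal:
        return "internal-internal"
    if origin_is_internal:
        return "internal-external"
    if destination_is_internal:
        return "external-internal"
    # Through traffic: not covered by the datasets we currently have
    return "external-external"


# Print all four origin/destination combinations
for origin_in in (True, False):
    for dest_in in (True, False):
        print(origin_in, dest_in, leg_category(origin_in, dest_in))
```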

```python
# --- Global variables

# Setting up the Coordinate Reference System up front in the necessary format.
# 'EPSG:4326' replaces the deprecated {'init': 'epsg:4326'} dictionary form.
crs_degree = 'EPSG:4326'  # GCS_WGS_1984 (what GPS uses)

# --- Paths

# Root path of Fremont Dropbox
from fremontdropbox import get_dropbox_location
dbx = get_dropbox_location()

# Temporary! Location of the folder where the restructuring is currently happening
data_path = dbx + '/Private Structured data collection'

# Processing output path
output_path = data_path + '/Processed data'
```
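As an alternative to concatenating path strings, the same folders can be built with `pathlib`, which inserts separators for you. This is a sketch: `build_paths` and the Dropbox root below are hypothetical, and adopting it would mean replacing the `data_path + '...'` concatenations elsewhere with the `/` operator, since a `Path` cannot be added to a `str`.

```python
from pathlib import Path


def build_paths(dropbox_root):
    """Return (data_path, output_path) built with pathlib (hypothetical helper)."""
    data_path = Path(dropbox_root) / 'Private Structured data collection'
    output_path = data_path / 'Processed data'
    return data_path, output_path


# Made-up Dropbox root; in the notebook this would come from get_dropbox_location()
data_dir, output_dir = build_paths('/home/user/Dropbox')
print(data_dir)
print(output_dir / 'Demand' / 'int_int_OD.csv')
```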

The internal and external centroid zones were exported from ArcGIS as shapefiles. Each zone carries a polygon geometry outlining its area, so the files can be loaded with `GeoPandas` as `GeoDataFrame`s:

```python
# Read more about GeoPandas data structures here: http://geopandas.org/data_structures.html
import geopandas as gpd
from geopandas import GeoDataFrame

# Centroid zones (polygons). The spatial joins below assume these layers share the CRS
# of the demand points (crs_degree); if they don't, reproject them with .to_crs(crs_degree).
int_centroid_zones = gpd.read_file(data_path + "/Data processing/Demand/TAZ/InternalCentroidZones.shp")
ext_centroid_zones = gpd.read_file(data_path + "/Data processing/Demand/TAZ/ExternalCentroidZones.shp")
```

Let's load all the Fremont "legs", i.e. all the categories of demand data we have:

```python
import pandas as pd

# Cars that stay within internal centroid zones
internal_legs = pd.read_csv(data_path+'/Data processing/Demand/SFCTA demand data/internal_fremont_legs.csv')

# Cars that start within internal centroid zones and end outside
starting_legs = pd.read_csv(data_path+'/Data processing/Demand/SFCTA demand data/starting_fremont_legs.csv')

# Cars that start outside and end within internal centroid zones
ending_legs = pd.read_csv(data_path+'/Data processing/Demand/SFCTA demand data/ending_fremont_legs.csv')
```

We have to convert latitude and longitude into **`Point` geometries** and turn the Fremont legs `DataFrame`s into `GeoDataFrame`s. To add these geometries to the datasets, we define the `add_point_geometry` function:

```python
def add_point_geometry(df, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column='geometry'):
    """
    Add a new Point geometry column and make it the active geometry.

    Parameters
    ----------
    df : DataFrame or GeoDataFrame
        DataFrame representing demand legs (internal|starting|ending)
    lng_column : string
        Name of the column representing longitude (x)
    lat_column : string
        Name of the column representing latitude (y)
    geometry_column : string
        Name of the new geometry column

    Returns
    -------
    gdf : GeoDataFrame
        GeoDataFrame representing demand legs (internal|starting|ending) with the added
        point geometry set as the active geometry column.
    """
    from geopandas import GeoDataFrame, points_from_xy

    # Work on a copy so the input frame is never modified
    gdf = GeoDataFrame(df.copy())
    # Shapely points are (x, y), i.e. (longitude, latitude)
    gdf[geometry_column] = points_from_xy(df[lng_column], df[lat_column], crs=crs_degree)
    return gdf.set_geometry(geometry_column)
```
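A common pitfall when building these points is the coordinate order: shapely's `Point` takes `(x, y)`, i.e. `(longitude, latitude)`. Fremont lies at roughly 37.5° N, 122° W, so a swapped pair yields a "latitude" of about -122, which is out of range. The hypothetical helper below is a minimal sketch of such a sanity check:

```python
def check_lng_lat(lng: float, lat: float) -> tuple:
    """Return (x, y) in the order shapely expects, i.e. (lng, lat).

    Raises ValueError when the values cannot be valid WGS84 degrees,
    which usually means the latitude and longitude columns were swapped.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} out of range; are lat/lng swapped?")
    if not -180.0 <= lng <= 180.0:
        raise ValueError(f"longitude {lng} out of range")
    return (lng, lat)  # x = longitude, y = latitude


print(check_lng_lat(-121.98, 37.55))  # a point near Fremont, in the right order
```

The check only catches a swap when the absolute longitude exceeds 90, which holds for Fremont but not everywhere.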

```python
# Geometries column names
start_node_geometry_column = 'start_node_geometry'
end_node_geometry_column = 'end_node_geometry'

# Converting each leg into a GeoDataFrame (with two geometries, for the start and end nodes)
int_int_legs = add_point_geometry(internal_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
int_int_legs = add_point_geometry(int_int_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)

int_ext_legs = add_point_geometry(starting_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
int_ext_legs = add_point_geometry(int_ext_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)

ext_int_legs = add_point_geometry(ending_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
ext_int_legs = add_point_geometry(ext_int_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)
```

---

## Spatial joins

In a Spatial Join, observations from GeoDataFrames are combined **based on their spatial relationship to one another**.

Docs: http://geopandas.org/mergingdata.html
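To make the semantics of a left spatial join with the `within` predicate concrete, here is a pure-Python sketch of the same idea for a handful of points. It is an illustration only: the ray-casting test stands in for the GEOS `within` predicate that GeoPandas actually uses (together with a spatial index), it does not treat points lying exactly on a zone boundary the way GEOS does, and the names `point_in_polygon` and `left_join_within` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

XY = Tuple[float, float]  # (x, y), i.e. (longitude, latitude)


def point_in_polygon(pt: XY, polygon: Sequence[XY]) -> bool:
    """Ray casting: pt is inside if a ray towards +x crosses the ring an odd number of times."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y-coordinate can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def left_join_within(points: Dict[int, XY], zones: Dict[str, Sequence[XY]]) -> List[Tuple[int, Optional[str]]]:
    """Emit one (point index, zone id) row per containing zone, or (index, None) if none."""
    rows: List[Tuple[int, Optional[str]]] = []
    for idx, pt in points.items():
        matches = [zone_id for zone_id, ring in zones.items() if point_in_polygon(pt, ring)]
        if matches:
            rows.extend((idx, zone_id) for zone_id in matches)
        else:
            rows.append((idx, None))  # a left join keeps unmatched points
    return rows


# Two overlapping square zones (made-up coordinates) and three points
zones = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(1, 0), (3, 0), (3, 2), (1, 2)],  # overlaps A where 1 < x < 2
}
points = {0: (0.5, 1.0), 1: (1.5, 1.0), 2: (5.0, 5.0)}

rows = left_join_within(points, zones)
print(rows)  # [(0, 'A'), (1, 'A'), (1, 'B'), (2, None)]
```

The point in the overlap of A and B produces two rows with the same index, the same pattern a left spatial join shows when zones overlap; the point outside every zone is kept with no zone ID.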

```python
from geopandas import sjoin

def spatial_join_nodes_with_centroids(gdf, centroid_zones, how='left', predicate='within', type='origin'):
    """
    Attach the ID of the centroid zone containing each leg's origin or destination node.
    `type` is 'origin' (joins on the start node) or 'destination' (joins on the end node).
    """
    # Validate first, so an invalid `type` fails with a clear message
    if type not in ('origin', 'destination'):
        raise ValueError('type must be "origin" or "destination", got {!r}'.format(type))

    geometry_column = start_node_geometry_column if type == 'origin' else end_node_geometry_column
    gdf_to_join = gdf.set_geometry(geometry_column)

    # Use the function's arguments instead of hard-coded values; `predicate` replaces
    # the `op` keyword, which was deprecated in GeoPandas 0.10 and later removed.
    gdf_to_join = sjoin(gdf_to_join, centroid_zones, how=how, predicate=predicate)
    gdf_to_join = gdf_to_join.rename(
        columns={"CentroidID": "CentroidID_O" if type == 'origin' else "CentroidID_D"}
    )

    # Drop join bookkeeping columns; the ones that don't exist are ignored
    return gdf_to_join.drop(columns=['index_left', 'index_right', 'OBJECTID'], errors='ignore')
```

```python
# Internal to internal OD matrix
int_int_start_nodes = spatial_join_nodes_with_centroids(int_int_legs, int_centroid_zones, type='origin')
int_int_end_nodes = spatial_join_nodes_with_centroids(int_int_legs, int_centroid_zones, type='destination')

int_int_OD = int_int_start_nodes.combine_first(int_int_end_nodes)
int_int_OD['OBJECTID'] = int_int_OD.index + 1
```
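`combine_first` is what merges the two partial results: each row keeps its values from the origin join and takes the columns it lacks (here `CentroidID_D`) from the destination join, matching rows by index. A toy sketch with made-up IDs:

```python
import pandas as pd

# Stand-ins for the two spatial-join results (hypothetical values):
# the origin join filled CentroidID_O, the destination join filled CentroidID_D.
start_nodes = pd.DataFrame({"leg_id": [10, 11], "CentroidID_O": [1, 2]})
end_nodes = pd.DataFrame({"leg_id": [10, 11], "CentroidID_D": [7, 8]})

# Align on the index; values come from start_nodes first, and gaps
# (including the whole CentroidID_D column) are filled from end_nodes.
od = start_nodes.combine_first(end_nodes)
od["OBJECTID"] = od.index + 1
print(od)
```

This relies on both joins returning the same index labels; if a join repeats a label (see the note on index 56707 below), the rows may no longer pair up one-to-one.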

```python
# Internal to external OD matrix
int_ext_start_nodes = spatial_join_nodes_with_centroids(int_ext_legs, int_centroid_zones, type='origin')
int_ext_end_nodes = spatial_join_nodes_with_centroids(int_ext_legs, ext_centroid_zones, type='destination')

int_ext_OD = int_ext_start_nodes.combine_first(int_ext_end_nodes)
int_ext_OD['OBJECTID'] = int_ext_OD.index + 1
```

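### Attention! Index 56707 of `int_ext_OD` is duplicated

It's not yet clear whether this is okay. A likely cause: a left spatial join returns one row per matching polygon, so an end node that lies within two overlapping external zones appears twice under the same index.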

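```python
# Code to check it out:
# int_ext_OD.loc[[56705, 56706, 56707, 56708, 56709, 56710]]
# or list every row whose index label appears more than once:
# int_ext_OD[int_ext_OD.index.duplicated(keep=False)]
```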
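`Index.duplicated(keep=False)` flags every row whose label occurs more than once, which makes duplicates like index 56707 easy to list. The sketch below uses a made-up frame; keeping only the first match is one possible policy (an assumption, not a decision from this analysis), since which of two overlapping zones a point "belongs" to is arbitrary, and fixing the overlap in the shapefile may be the better option.

```python
import pandas as pd

# Toy frame mimicking a left spatial join in which leg 101 matched two
# overlapping zones and leg 102 matched none (made-up values).
joined = pd.DataFrame(
    {"leg_id": [100, 101, 101, 102], "CentroidID_D": [5, 6, 9, None]},
    index=[0, 1, 1, 2],
)

# keep=False flags every copy of a duplicated label, not just the repeats
dupes = joined[joined.index.duplicated(keep=False)]
print(dupes)

# One possible fix: keep only the first matching zone per leg
deduped = joined[~joined.index.duplicated(keep="first")]
print(deduped.index.is_unique)
```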

```python
# External to internal OD matrix
ext_int_start_nodes = spatial_join_nodes_with_centroids(ext_int_legs, ext_centroid_zones, type='origin')
ext_int_end_nodes = spatial_join_nodes_with_centroids(ext_int_legs, int_centroid_zones, type='destination')

ext_int_OD = ext_int_start_nodes.combine_first(ext_int_end_nodes)
ext_int_OD['OBJECTID'] = ext_int_OD.index + 1
```
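The three tables above hold one row per leg together with its origin and destination centroid IDs. If a zone-to-zone count matrix is needed downstream, `pd.crosstab` (or `groupby(...).size()` for a long format) can aggregate the legs. This is a sketch on made-up IDs, not part of the original pipeline:

```python
import pandas as pd

# Toy per-leg table with origin/destination centroid IDs (made-up values)
legs = pd.DataFrame({
    "CentroidID_O": [1, 1, 2, 2, 2],
    "CentroidID_D": [3, 3, 3, 4, 4],
})

# Wide format: rows are origin zones, columns are destination zones
od_counts = pd.crosstab(legs["CentroidID_O"], legs["CentroidID_D"])
print(od_counts)

# Long format: one entry per observed (origin, destination) pair
pair_counts = legs.groupby(["CentroidID_O", "CentroidID_D"]).size()
print(pair_counts)
```

Legs with a missing `CentroidID_O` or `CentroidID_D` (a node outside every zone) could not be assigned to a zone, so it's worth counting them before aggregating.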

---

## Export OD matrices as CSV files 

```python
import os

def export_od_matrix_to_csv(df, path):
    """
    Export an origin-destination matrix to CSV.

    Parameters
    ----------
    df : DataFrame
        DataFrame representing the OD matrix
    path : string
        Output file path
    """
    if not path:
        raise ValueError('"path" cannot be empty.')

    # Create the output folder (e.g. .../Processed data/Demand) if it doesn't exist yet
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    # Note: pandas also writes the DataFrame index as the first column by default
    df.to_csv(
        path,
        encoding='utf8',
        columns=["OBJECTID", "leg_id", "start_time", "start_node_lat", "start_node_lng",
                 "end_node_lat", "end_node_lng", "CentroidID_O", "CentroidID_D"]
    )
```
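Since `OBJECTID` already numbers the rows, the unnamed index column that `to_csv` writes by default may be redundant. The sketch below, on a made-up two-row table, shows the difference `index=False` makes to the output; whether downstream tools expect that extra column is an open question, so this is not a recommended change in itself.

```python
import io

import pandas as pd

# Tiny stand-in for an OD table (made-up values)
df = pd.DataFrame({"OBJECTID": [1, 2], "leg_id": [10, 11]})

buf = io.StringIO()
df.to_csv(buf, columns=["OBJECTID", "leg_id"])  # index=True is the pandas default
with_index = buf.getvalue()

buf = io.StringIO()
df.to_csv(buf, columns=["OBJECTID", "leg_id"], index=False)
without_index = buf.getvalue()

print(with_index)     # header starts with an empty index column name
print(without_index)  # header starts with OBJECTID
```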

```python
# Export all OD matrices
export_od_matrix_to_csv(int_int_OD, output_path + '/Demand/int_int_OD.csv')
export_od_matrix_to_csv(int_ext_OD, output_path + '/Demand/int_ext_OD.csv')
export_od_matrix_to_csv(ext_int_OD, output_path + '/Demand/ext_int_OD.csv')
```

---