# WIP - Origin-Destination (OD) demand

### Goal of the notebook

This notebook groups together four parts of the demand pipeline, particularly:
1. Doing a spatial join of internal or external centroid zones with origin and destination locations of car trips, which gives us 3 datasets (OD matrices).
2. Clustering the OD demand data that was spatially joined in 1. into 15-minute and 1-hour timeframes, i.e. summing the number of cars that traveled between each pair of centroid zones within each timeframe.
3. Generating demand matrices between all of the centroids for specified timeframes (X files for X time steps: 1-1:15pm, 1:15-1:30pm, etc.).
4. Estimating the external to external demand from Flow data and comparing it with the Streetlight data.

We're going to work with demand data provided for various types of traffic in Fremont.
Demand datasets fall into three categories based on **origin** and **destination** of the cars driving through the areas:

1. Cars that **start** their trip within internal centroid zones and **end** within internal centroid zones **(internal-internal demand)** (also called "Internal Fremont leg")
2. Cars that **start** their trip within internal centroid zones and **end** within external centroid zones **(internal-external demand)** (also called "Starting Fremont leg")
3. Cars that **start** their trip within external centroid zones and **end** within internal centroid zones **(external-internal demand)** (also called "Ending Fremont leg")

As you can see, for now we have no data on cars that only drive through the area, i.e. cars that **start** their trip within external centroid zones (or come from outside) and **end** their trip within external centroid zones (or are on their way somewhere away from the city).
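
The four possible combinations of origin and destination can be summarized with a small helper. This is only an illustrative sketch: the function name `demand_category` and its boolean arguments are hypothetical and do not appear in the pipeline.

```python
# Hypothetical helper: label a trip by whether its endpoints fall in internal zones.
def demand_category(origin_is_internal, destination_is_internal):
    if origin_is_internal and destination_is_internal:
        return "internal-internal"
    if origin_is_internal:
        return "internal-external"
    if destination_is_internal:
        return "external-internal"
    # Not covered by the three input datasets; estimated separately in section 4.
    return "external-external"

print(demand_category(True, False))   # internal-external
print(demand_category(False, False))  # external-external
```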

***

**Outputs:**

OD matrices:
- int_int_OD.csv
- int_ext_OD.csv
- ext_int_OD.csv

Clustered demand into 15 minute timeframes:
- int_int_OD_cluster_15min.csv
- int_ext_OD_cluster_15min.csv
- ext_int_OD_cluster_15min.csv

Clustered demand into 1 hour timeframes:
- int_int_OD_cluster_1h.csv
- int_ext_OD_cluster_1h.csv
- ext_int_OD_cluster_1h.csv

**Inputs:**

Fremont traffic legs:
- internal_fremont_legs.csv
- starting_fremont_legs.csv
- ending_fremont_legs.csv

Centroid zones:
- InternalCentroidZones.shp
- ExternalCentroidZones.shp

For the external to external OD demand estimation, we're trying to compare the estimates to the Streetlight data (PDF available here: [/Data processing/Raw/Demand/Flow_speed/Street Light data/SR 262 link analysis/SR_262_Streetlight.pdf](https://www.dropbox.com/s/zg5mzys0nb2w7k1/SR_262_Streetlight.pdf?dl=0)).

***

**Work done by the code:**
0. [Getting the data and pre-processing Fremont legs](#0.-Getting-the-data-and-pre-processing-Fremont-legs)
1. [Spatial joins](#1.-Spatial-joins)
2. [Clustering OD matrices](#2.-Clustering-OD-matrices)
3. [Generating demand matrices between all of the centroids](#3.-Generating-demand-matrices-between-all-of-the-centroids)
4. [Estimating the external to external demand](#4.-Estimating-the-external-to-external-demand)
5. [Final exports](#5.-Final-exports)

---

## 0. Getting the data and pre-processing Fremont legs

In [25]:
# --- Global variables

# Setting up the Coordinate Reference Systems up front in the necessary format.
crs_degree = {'init': 'epsg:4326'} # GCS_WGS_1984 (what the GPS uses)

# --- Paths

# Root path of Fremont Dropbox
import os
import sys
# Let this notebook know where to look for the fremontdropbox module
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from fremontdropbox import get_dropbox_location
# Root path of the Dropbox business account
dbx = get_dropbox_location()

# Temporary! Location of the folder where the restructuring is currently happening
data_path = dbx + '/Private Structured data collection'

aux_files = data_path+'/Data processing/Auxiliary files'

# Processing output path
output_path = data_path + '/Data processing/Temporary exports to be copied to processed data'

Internal and external centroid zones are exported from ArcGIS. They contain geometries (polygons drawn around each zone), so they can be loaded with `GeoPandas` as a `GeoDataFrame`:

In [2]:
# Read more about GeoPandas data structures here: http://geopandas.org/data_structures.html
from geopandas import GeoDataFrame

# Centroid zones
int_centroid_zones = GeoDataFrame.from_file(data_path + "/Data processing/Raw/Demand/TAZ/InternalCentroidZones.shp")
ext_centroid_zones = GeoDataFrame.from_file(data_path + "/Data processing/Raw/Demand/TAZ/ExternalCentroidZones.shp")

Let's load all the Fremont legs, i.e. all categories of demand data we have:

In [3]:
import pandas as pd

# Cars that stay within internal centroid zones
internal_legs = pd.read_csv(data_path+'/Data processing/Raw/Demand/SFCTA demand data/internal_fremont_legs.csv')

# Cars that start within internal centroid zones and end outside
starting_legs = pd.read_csv(data_path+'/Data processing/Raw/Demand/SFCTA demand data/starting_fremont_legs.csv')

# Cars that start outside and end within internal centroid zones
ending_legs = pd.read_csv(data_path+'/Data processing/Raw/Demand/SFCTA demand data/ending_fremont_legs.csv')

We have to convert latitude and longitude into **`Point` geometries** and convert the Fremont legs `DataFrame`s into `GeoDataFrame`s. To add geometries to the datasets, we define the `add_point_geometry` function:

In [4]:
def add_point_geometry(df, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column='geometry'):
    """
    Add a new Point geometry column
    Parameters
    ----------
    df : DataFrame
        DataFrame representing demand legs (internal|starting|ending)
    lat_column : string
        Name of the column representing latitude
    lng_column : string
        Name of the column representing longitude
    geometry_column : string
        Name of the column that will represent geometry column

    Returns
    -------
    df_with_geometry : GeoDataFrame
        GeoDataFrame representing demand legs (internal|starting|ending) with added point geometry.
    """
    # Process XY coordinates data as Point geometry
    from shapely.geometry import Point
    points = [Point(xy) for xy in zip(df[lng_column], df[lat_column])]
    
    gdf = GeoDataFrame(df, crs=crs_degree, geometry=points)
    gdf = gdf.rename(columns={'geometry': geometry_column}).set_geometry(geometry_column)
    return gdf

In [5]:
# Geometries column names
start_node_geometry_column = 'start_node_geometry'
end_node_geometry_column = 'end_node_geometry'

# Converting each leg into GeoDataFrame (with two geometries, for start and end nodes)
int_int_legs = add_point_geometry(internal_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
int_int_legs = add_point_geometry(int_int_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)

int_ext_legs = add_point_geometry(starting_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
int_ext_legs = add_point_geometry(int_ext_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)

ext_int_legs = add_point_geometry(ending_legs, lng_column='start_node_lng', lat_column='start_node_lat', geometry_column=start_node_geometry_column)
ext_int_legs = add_point_geometry(ext_int_legs, lng_column='end_node_lng', lat_column='end_node_lat', geometry_column=end_node_geometry_column)

---

## 1. Spatial joins

In a Spatial Join, observations from GeoDataFrames are combined **based on their spatial relationship to one another**.

Docs: http://geopandas.org/mergingdata.html
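
To make the `within` relationship concrete, here is a minimal pure-Python sketch of a point-in-polygon test (ray casting) and a zone lookup built on it. It only illustrates the idea and is not how GeoPandas implements the join; the function names, zone IDs and square polygons are all made up for the example.

```python
def point_within_polygon(point, polygon):
    """Ray-casting test: True if `point` (x, y) lies inside `polygon` (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_zone(point, zones):
    """Return the IDs of all zones containing the point (0, 1 or several)."""
    return [zone_id for zone_id, polygon in zones.items()
            if point_within_polygon(point, polygon)]

# Hypothetical, partially overlapping square zones
zones = {
    1: [(0, 0), (2, 0), (2, 2), (0, 2)],
    2: [(1, 1), (3, 1), (3, 3), (1, 3)],
}
print(assign_zone((0.5, 0.5), zones))  # [1]
print(assign_zone((1.5, 1.5), zones))  # [1, 2]
print(assign_zone((5, 5), zones))      # []
```

A point that lies in two overlapping zones matches both, so a left join would return two rows for it; a point outside every zone matches none.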

In [6]:
from geopandas import sjoin

def spatial_join_nodes_with_centroids(gdf, centroid_zones, how='left', op='within', type='origin'):
    # Validate the join type before using it
    if type not in ['origin', 'destination']:
        raise ValueError('{type} argument is incorrect, use "origin" or "destination"'.format(type=repr(type)))

    # Use the start node geometry for origins and the end node geometry for destinations
    if type == 'origin':
        gdf_to_join = gdf.set_geometry(start_node_geometry_column)
    else:
        gdf_to_join = gdf.set_geometry(end_node_geometry_column)

    centroid_id_column = "CentroidID_O" if type == 'origin' else "CentroidID_D"

    # Pass the caller's `how` and `op` through instead of hardcoding them
    gdf_to_join = sjoin(gdf_to_join, centroid_zones, how=how, op=op)
    gdf_to_join.rename(
        columns={
            "CentroidID": centroid_id_column
        },
        inplace=True
    )

    for column in ['index_left', 'index_right', 'OBJECTID']:
        try:
            gdf_to_join.drop(column, axis=1, inplace=True)
        except KeyError:
            # ignore if there are no index columns
            pass
        
    return gdf_to_join

In [7]:
# Internal to internal OD matrix
int_int_start_nodes = spatial_join_nodes_with_centroids(int_int_legs, int_centroid_zones, type='origin')
int_int_end_nodes = spatial_join_nodes_with_centroids(int_int_legs, int_centroid_zones, type='destination')

int_int_OD = int_int_start_nodes.combine_first(int_int_end_nodes)
int_int_OD['OBJECTID'] = int_int_OD.index + 1
# Parse CentroidIDs to be numeric
int_int_OD["CentroidID_O"] = pd.to_numeric(int_int_OD["CentroidID_O"], downcast='signed')
int_int_OD["CentroidID_D"] = pd.to_numeric(int_int_OD["CentroidID_D"], downcast='signed')
int_int_OD.head()

Unnamed: 0,CentroidID_D,CentroidID_O,FID,Shape_Area,Shape_Leng,end_node_geometry,end_node_lat,end_node_lng,leg_id,start_node_geometry,start_node_lat,start_node_lng,start_time,OBJECTID
0,1,2,16,2379123.0,6270.634124,POINT (-121.93948 37.50078),37.50078,-121.93948,87585773,POINT (-121.94962 37.50937),37.50937,-121.94962,2000-01-01 22:39:00 +0800,1
1,30,30,15,1167626.0,5539.626488,POINT (-121.94508 37.51287),37.51287,-121.94508,87663883,POINT (-121.94493 37.51764),37.51764,-121.94493,2000-01-01 22:52:00 +0800,2
2,30,30,15,1167626.0,5539.626488,POINT (-121.94493 37.51764),37.51764,-121.94493,87663885,POINT (-121.94508 37.51287),37.51287,-121.94508,2000-01-02 00:29:00 +0800,3
3,25,1,2,2260875.0,6766.028824,POINT (-121.92959 37.49598),37.49598,-121.92959,87771074,POINT (-121.95432 37.50237),37.50237,-121.95432,2000-01-01 22:35:00 +0800,4
4,1,25,4,1361209.0,6126.178508,POINT (-121.95432 37.50237),37.50237,-121.95432,87771078,POINT (-121.92959 37.49598),37.49598,-121.92959,2000-01-02 02:48:00 +0800,5


In [8]:
# Internal to external OD matrix
int_ext_start_nodes = spatial_join_nodes_with_centroids(int_ext_legs, int_centroid_zones, type='origin')
int_ext_end_nodes = spatial_join_nodes_with_centroids(int_ext_legs, ext_centroid_zones, type='destination')

int_ext_OD = int_ext_start_nodes.combine_first(int_ext_end_nodes)
int_ext_OD['OBJECTID'] = int_ext_OD.index + 1
# Parse CentroidIDs to be numeric
int_ext_OD["CentroidID_O"] = pd.to_numeric(int_ext_OD["CentroidID_O"], downcast='signed')
int_ext_OD["CentroidID_D"] = pd.to_numeric(int_ext_OD["CentroidID_D"], downcast='signed')
int_ext_OD.head()

Unnamed: 0,CentroidID_D,CentroidID_O,FID,Shape_Area,Shape_Leng,end_node_geometry,end_node_lat,end_node_lng,leg_id,start_node_geometry,start_node_lat,start_node_lng,start_time,OBJECTID
0,4,2,16,2379123.0,6270.634124,POINT (-122.38437 37.59346),37.59346,-122.38437,87531622,POINT (-121.95007 37.51941),37.51941,-121.95007,2000-01-02 03:27:00 +0800,1
1,4,2,16,2379123.0,6270.634124,POINT (-121.96212 37.50821),37.50821,-121.96212,87585767,POINT (-121.94962 37.50937),37.50937,-121.94962,2000-01-01 19:38:00 +0800,2
2,4,2,16,2379123.0,6270.634124,POINT (-122.38779 37.60484),37.60484,-122.38779,87585779,POINT (-121.94962 37.50937),37.50937,-121.94962,2000-01-02 03:07:00 +0800,3
3,4,16,14,571366.0,8958.687162,POINT (-122.47037 37.58723),37.58723,-122.47037,87588447,POINT (-121.93459 37.50618),37.50618,-121.93459,2000-01-02 03:01:00 +0800,4
4,4,18,9,1573981.0,5535.771845,POINT (-122.377 37.58809),37.58809,-122.377,87600069,POINT (-121.92912 37.50737),37.50737,-121.92912,2000-01-02 02:11:00 +0800,5


### Attention! Index 56707 of int_ext_OD is duplicated: its destination point was matched to two external zones (see below). It's unclear whether that is okay.

In [9]:
# Code to check it out:
int_ext_end_nodes.loc[[56705, 56706, 56707, 56708, 56709, 56710]]

Unnamed: 0,leg_id,start_time,start_node_lat,start_node_lng,end_node_lat,end_node_lng,start_node_geometry,end_node_geometry,FID,Shape_Leng,Shape_Area,CentroidID_D
56705,95894917,2000-01-02 01:32:00 +0800,37.53554,-121.95262,37.55931,-122.003,POINT (-121.95262 37.53554),POINT (-122.00300 37.55931),3,21534.252771,8302704.0,5
56706,95894921,2000-01-01 23:44:00 +0800,37.53642,-121.93739,37.40996,-121.8885,POINT (-121.93739 37.53642),POINT (-121.88850 37.40996),5,330568.646352,5429565000.0,20
56707,95894923,2000-01-02 03:41:00 +0800,37.53642,-121.93739,37.60851,-122.06748,POINT (-121.93739 37.53642),POINT (-122.06748 37.60851),1,66950.443866,202540100.0,12
56707,95894923,2000-01-02 03:41:00 +0800,37.53642,-121.93739,37.60851,-122.06748,POINT (-121.93739 37.53642),POINT (-122.06748 37.60851),7,628478.423004,19860190000.0,4
56708,95894926,2000-01-01 18:41:00 +0800,37.53642,-121.93739,37.4364,-121.88063,POINT (-121.93739 37.53642),POINT (-121.88063 37.43640),5,330568.646352,5429565000.0,20
56709,95894928,2000-01-01 18:40:00 +0800,37.54135,-121.94815,37.4255,-121.88448,POINT (-121.94815 37.54135),POINT (-121.88448 37.42550),5,330568.646352,5429565000.0,20
56710,95894932,2000-01-02 06:21:00 +0800,37.53466,-121.93943,37.75336,-122.24107,POINT (-121.93943 37.53466),POINT (-122.24107 37.75336),7,628478.423004,19860190000.0,4
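
A quick way to spot such duplicates is to group the joined rows by index and keep the indices that appear more than once. The sketch below uses plain Python and the `(index, CentroidID_D)` pairs copied from the output above; the variable names are illustrative only. On the real `GeoDataFrame`, `int_ext_end_nodes.index.duplicated(keep=False)` should give an equivalent boolean mask.

```python
from collections import defaultdict

# (index, CentroidID_D) pairs copied from the output above
joined_rows = [
    (56705, 5), (56706, 20), (56707, 12), (56707, 4),
    (56708, 20), (56709, 20), (56710, 4),
]

# Collect every destination zone matched for each leg index
zones_per_leg = defaultdict(list)
for index, centroid_id in joined_rows:
    zones_per_leg[index].append(centroid_id)

# Keep only the legs that were matched to more than one zone
duplicated = {index: ids for index, ids in zones_per_leg.items() if len(ids) > 1}
print(duplicated)  # {56707: [12, 4]}
```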


In [10]:
# External to internal OD matrix
ext_int_start_nodes = spatial_join_nodes_with_centroids(ext_int_legs, ext_centroid_zones, type='origin')
ext_int_end_nodes = spatial_join_nodes_with_centroids(ext_int_legs, int_centroid_zones, type='destination')

ext_int_OD = ext_int_start_nodes.combine_first(ext_int_end_nodes)
ext_int_OD['OBJECTID'] = ext_int_OD.index + 1
# Parse CentroidIDs to be numeric
ext_int_OD["CentroidID_O"] = pd.to_numeric(ext_int_OD["CentroidID_O"], downcast='signed')
ext_int_OD["CentroidID_D"] = pd.to_numeric(ext_int_OD["CentroidID_D"], downcast='signed')
ext_int_OD.head()

Unnamed: 0,CentroidID_D,CentroidID_O,FID,Shape_Area,Shape_Leng,end_node_geometry,end_node_lat,end_node_lng,leg_id,start_node_geometry,start_node_lat,start_node_lng,start_time,OBJECTID
0,2,4,7,19860190000.0,628478.423004,POINT (-121.95007 37.51941),37.51941,-121.95007,87531621,POINT (-122.38437 37.59346),37.59346,-122.38437,2000-01-02 02:18:00 +0800,1
1,8,21,6,7244543000.0,437445.770541,POINT (-121.92245 37.53183),37.53183,-121.92245,87574224,POINT (-121.97604 37.38812),37.38812,-121.97604,2000-01-01 22:02:00 +0800,2
2,2,4,7,19860190000.0,628478.423004,POINT (-121.94962 37.50937),37.50937,-121.94962,87585765,POINT (-122.38779 37.60484),37.60484,-122.38779,2000-01-01 16:32:00 +0800,3
3,2,4,7,19860190000.0,628478.423004,POINT (-121.94962 37.50937),37.50937,-121.94962,87585770,POINT (-121.96212 37.50821),37.50821,-121.96212,2000-01-01 22:19:00 +0800,4
4,2,20,5,5429565000.0,330568.646352,POINT (-121.94962 37.50937),37.50937,-121.94962,87581078,POINT (-121.88313 37.44256),37.44256,-121.88313,2000-01-02 07:05:00 +0800,5


---

## 2. Clustering OD matrices

### Cluster demand data per 15min, set (origin, dest) as index, time as column

In [11]:
from pytz import timezone, utc

local_tz = timezone('US/Pacific')

def cluster_demand_15min(df):
    """
    Counts trips per (destination, origin, 15-minute bin).

    Returns a DataFrame with the following columns:

    -----------------------------------------------
    | CentroidID_D | CentroidID_O | dt_15 | count |
    -----------------------------------------------
    
    Parameters
    ----------
    df : DataFrame
        DataFrame representing OD matrix (must contain a 'start_time' column).
        Note: the 'dt' and 'dt_15' columns are added to df in place.
    """
    demand_df = df
    demand_df['dt'] = pd.to_datetime(demand_df['start_time'])
    dt_15=[]
    for dt in demand_df['dt']:
        # Replace each dt value (start_time) with the start of its 15 minute chunk of the hour
        # (e.g. 22:39 -> 22:30 as it's past 22:30 but before 22:45).
        # Note: replace(tzinfo=utc) relabels the timestamp as UTC without converting it.
        dt_15.append(dt.replace(minute=(dt.minute//15)*15,second = 0).replace(tzinfo=utc))

    demand_df['dt_15'] = dt_15
    grouped_od_demand_15min = demand_df.groupby(['CentroidID_D', 'CentroidID_O', 'dt_15']).size().reset_index(name='count')
    return grouped_od_demand_15min
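
The binning step can be illustrated without pandas. The following is a minimal sketch, assuming made-up trips and a hypothetical helper `floor_to_15_minutes`; it floors each start time to its 15-minute bin and counts trips per (origin, destination, bin).

```python
from collections import Counter
from datetime import datetime

def floor_to_15_minutes(dt):
    """Round a datetime down to the start of its 15-minute bin."""
    return dt.replace(minute=dt.minute // 15 * 15, second=0, microsecond=0)

# Hypothetical trips: (origin, destination, start time); values are made up
trips = [
    (2, 1, datetime(2000, 1, 1, 22, 39)),
    (2, 1, datetime(2000, 1, 1, 22, 44)),
    (2, 1, datetime(2000, 1, 1, 22, 45)),
    (30, 30, datetime(2000, 1, 1, 22, 52)),
]

# Count trips per (origin, destination, 15-minute bin)
counts = Counter((o, d, floor_to_15_minutes(t)) for o, d, t in trips)
for (o, d, bin_start), count in sorted(counts.items()):
    print(o, d, bin_start.strftime("%H:%M"), count)
# 2 1 22:30 2
# 2 1 22:45 1
# 30 30 22:45 1
```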

In [12]:
int_int_OD_demand_cluster_15min = cluster_demand_15min(int_int_OD)
int_int_OD_demand_cluster_15min.head()

Unnamed: 0,CentroidID_D,CentroidID_O,dt_15,count
0,1,1,2000-01-01 14:45:00+00:00,2
1,1,1,2000-01-01 16:45:00+00:00,1
2,1,1,2000-01-01 17:00:00+00:00,1
3,1,1,2000-01-01 18:00:00+00:00,2
4,1,1,2000-01-01 18:15:00+00:00,2


In [13]:
int_ext_OD_demand_cluster_15min = cluster_demand_15min(int_ext_OD)
int_ext_OD_demand_cluster_15min.head()

Unnamed: 0,CentroidID_D,CentroidID_O,dt_15,count
0,4,1,2000-01-01 12:45:00+00:00,1
1,4,1,2000-01-01 13:45:00+00:00,1
2,4,1,2000-01-01 14:30:00+00:00,1
3,4,1,2000-01-01 15:45:00+00:00,1
4,4,1,2000-01-01 16:00:00+00:00,1


In [14]:
ext_int_OD_demand_cluster_15min = cluster_demand_15min(ext_int_OD)
ext_int_OD_demand_cluster_15min.head()

Unnamed: 0,CentroidID_D,CentroidID_O,dt_15,count
0,1,4,2000-01-01 12:30:00+00:00,1
1,1,4,2000-01-01 14:00:00+00:00,4
2,1,4,2000-01-01 14:15:00+00:00,1
3,1,4,2000-01-01 14:30:00+00:00,3
4,1,4,2000-01-01 14:45:00+00:00,3


---

## 3. Generating demand matrices between all of the centroids

In [15]:
def demand_between_all_centroids():
    ext_int_df = ext_int_OD_demand_cluster_15min
    ext_int_df['CentroidID_O_name'] = ['E' + str(i) for i in ext_int_df['CentroidID_O']]
    ext_int_df['CentroidID_D_name'] = ['I' + str(i) for i in ext_int_df['CentroidID_D']]

    int_ext_df = int_ext_OD_demand_cluster_15min
    int_ext_df['CentroidID_O_name'] = ['I' + str(i) for i in int_ext_df['CentroidID_O']]
    int_ext_df['CentroidID_D_name'] = ['E' + str(i) for i in int_ext_df['CentroidID_D']]

    int_int_df = int_int_OD_demand_cluster_15min
    int_int_df['CentroidID_O_name'] = ['I' + str(i) for i in int_int_df['CentroidID_O']]
    int_int_df['CentroidID_D_name'] = ['I' + str(i) for i in int_int_df['CentroidID_D']]

    # Concatenate the three matrices (pd.concat replaces the deprecated DataFrame.append)
    df = pd.concat([ext_int_df, int_ext_df, int_int_df], ignore_index=True)

    # Scale the 15-minute counts by 4 (hourly-equivalent rate)
    df['count'] = df['count']*4

    # Keep only the time steps between 13:00 (inclusive) and 19:00 (exclusive)
    df['min'] = pd.to_datetime(df['dt_15'])
    df = df[(df['min'] >= pd.Timestamp(2000, 1, 1, 13, 00).replace(tzinfo=utc)) & (df['min'] < pd.Timestamp(2000, 1, 1, 19, 00).replace(tzinfo=utc))]

    return df

In [16]:
concatenated_matrices = demand_between_all_centroids()
concatenated_matrices.head(10)

Unnamed: 0,CentroidID_D,CentroidID_O,dt_15,count,CentroidID_O_name,CentroidID_D_name,min
1,1,4,2000-01-01 14:00:00+00:00,16,E4,I1,2000-01-01 14:00:00+00:00
2,1,4,2000-01-01 14:15:00+00:00,4,E4,I1,2000-01-01 14:15:00+00:00
3,1,4,2000-01-01 14:30:00+00:00,12,E4,I1,2000-01-01 14:30:00+00:00
4,1,4,2000-01-01 14:45:00+00:00,12,E4,I1,2000-01-01 14:45:00+00:00
5,1,4,2000-01-01 15:00:00+00:00,16,E4,I1,2000-01-01 15:00:00+00:00
6,1,4,2000-01-01 15:15:00+00:00,16,E4,I1,2000-01-01 15:15:00+00:00
7,1,4,2000-01-01 15:30:00+00:00,12,E4,I1,2000-01-01 15:30:00+00:00
8,1,4,2000-01-01 15:45:00+00:00,16,E4,I1,2000-01-01 15:45:00+00:00
9,1,4,2000-01-01 16:00:00+00:00,28,E4,I1,2000-01-01 16:00:00+00:00
10,1,4,2000-01-01 16:15:00+00:00,28,E4,I1,2000-01-01 16:15:00+00:00
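
Each time step of this long-format table is later turned into an origin x destination matrix (see the final exports). As a rough sketch of that pivot in plain Python, assuming made-up records for a single time step:

```python
# Hypothetical long-format records for one time step: (origin, destination, count)
records = [("E4", "I1", 16), ("I2", "I1", 8), ("I2", "E4", 4), ("E4", "I1", 4)]

origins = sorted({o for o, _, _ in records})
destinations = sorted({d for _, d, _ in records})

# Start from an all-zero matrix, then accumulate counts per (origin, destination)
matrix = {o: {d: 0 for d in destinations} for o in origins}
for o, d, count in records:
    matrix[o][d] += count

# Print the matrix as CSV, with origins as rows and destinations as columns
print("," + ",".join(destinations))
for o in origins:
    print(o + "," + ",".join(str(matrix[o][d]) for d in destinations))
# ,E4,I1
# E4,0,20
# I2,4,8
```

Pairs with no recorded trips stay at 0, which mirrors the `fillna(0)` step used in the export.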


---

## 4. Estimating the external to external demand

### Going from south to north

PeMS detector positioned at the highway entrance on the south side: **403250**.
PeMS detector positioned at the highway exit on the north side: **402799**.

We estimate the external to external demand from PeMS data along the highway (going north), using the **PeMS detector with ID 403250** as origin and the **PeMS detector with ID 402799** as destination.

PeMS vehicle data is sampled at 5-minute intervals, so we first need to aggregate it into 15-minute intervals.
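
The aggregation itself amounts to summing the three 5-minute samples that fall into each 15-minute bin. Here is a minimal sketch in plain Python, with made-up flow values and illustrative variable names:

```python
from datetime import datetime, timedelta

# Hypothetical 5-minute flow samples: (interval start, vehicles per 5 minutes)
start = datetime(2019, 1, 5, 14, 0)
flows = [30, 32, 28, 40, 41, 39]
samples = [(start + timedelta(minutes=5 * i), flow) for i, flow in enumerate(flows)]

# Sum the samples that share the same 15-minute bin
flow_15min = {}
for timestamp, flow in samples:
    bin_start = timestamp.replace(minute=timestamp.minute // 15 * 15, second=0, microsecond=0)
    flow_15min[bin_start] = flow_15min.get(bin_start, 0) + flow

for bin_start, flow in sorted(flow_15min.items()):
    print(bin_start.strftime("%H:%M"), flow)
# 14:00 90
# 14:15 120
```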

In [17]:
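def cluster_pems_by_15_min(path):
    # Aggregates 5-minute PeMS flow samples into 15-minute totals
    pems_dataset = pd.read_excel(path)
    cluster_pems_15_min = pd.DataFrame()
    # Keep only the samples whose day of the month is 5
    pems_1_day = pems_dataset[pems_dataset['5 Minutes'].dt.day.eq(5)]
    pems_1_day.index = pems_1_day['5 Minutes']
    pems_1_day.index.name = '15min'
    # Sum the 5-minute flows within each 15-minute bin
    cluster_pems_15_min['Flow (Veh/15min)'] = pems_1_day['Flow (Veh/5 Minutes)'].resample('15min').sum()
    cluster_pems_15_min.reset_index(inplace=True)
    cluster_pems_15_min.index.name = None
    # Move each bin to 2000-01-01 (the date used in the demand data), keeping only the time of day
    cluster_pems_15_min['df_15'] = cluster_pems_15_min['15min'].map(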
        lambda x: pd.Timestamp(
            year=2000,
            month=1,
            day=1,
            hour=x.hour,
            minute=x.minute,
            second=x.second
        )
    )
    return cluster_pems_15_min

Here we're using PeMS data from 2019:

In [19]:
pems_o_cluster_15_min = cluster_pems_by_15_min(data_path+'/Data processing/Raw/Demand/PeMs/PeMS_2019/403250_2019.xlsx')
pems_d_cluster_15_min = cluster_pems_by_15_min(data_path+'/Data processing/Raw/Demand/PeMs/PeMS_2019/402799_2019.xlsx')

### Demand inference

In [20]:
# estimate ext_ext OD data
# estimate ext20_ext13, Centroid_O = 23 (pems 403250), Centroid_D = 31 (pems 402799), going north along highway
def compute_ext_ext(origin_centroid, dest_centroid, cluster_OD_15min_ext_int, cluster_OD_15min_int_ext):
    """
    Computes the external to external estimation
    """
    display(pems_o_cluster_15_min)
    display(pems_d_cluster_15_min)
    flow_o = pems_o_cluster_15_min['Flow (Veh/15min)']
    flow_d = pems_d_cluster_15_min['Flow (Veh/15min)']
    display(flow_o)
    
    ext_int = cluster_OD_15min_ext_int
    display(ext_int)
    
    # Cars that drive through the origin cluster (Centroid_O = 20) are external to internal...
    # We need the sum of all ext_int demand departing from an external node
    #sum_ext_int_o = ext_int[ext_int['CentroidID_O']==origin_centroid].sum()['count']
    #display(sum_ext_int_o)
    
    # WIP: early return, the code below is not executed yet
    return
    int_ext = cluster_OD_15min_int_ext.reset_index()
    # sum of all int_ext demand arrive at an ext node
    sum_int_ext_d = int_ext[int_ext['CentroidID_D']==dest_centroid].sum()['count'].reset_index()[0]
    display(sum_int_ext_d)
    result1 = flow_o-sum_ext_int_o
    result2 = flow_d-sum_int_ext_d
    display(result1)
    display(result2)
    final_result = result1-result2
    return final_result

#ext_ext_OD_col_15 = compute_ext_ext(23, 31, ext_int_OD_demand_cluster_15min, int_ext_OD_demand_cluster_15min)

In [21]:
#ext_ext_OD_col_15 = compute_ext_ext(20, 13, ext_int_OD_demand_cluster_15min, int_ext_OD_demand_cluster_15min)
#ext_ext_OD_col_15.head()

In [22]:
def compare_with_demand(origin_centroid, dest_centroid, cluster_OD_15min_ext_int, cluster_OD_15min_int_ext):
    """
    Compares inferred external to external demand with int_ext and ext_int data
    """
    # Cars that drive through the origin cluster (Centroid_O = 20) are external to internal...
    # We need the sum of all ext_int demand departing from an external node
    #sum_ext_int_o = ext_int[ext_int['CentroidID_O']==origin_centroid].sum()['count']
    #display(sum_ext_int_o)
    
    # WIP: early return, the code below is not executed yet
    return
    int_ext = cluster_OD_15min_int_ext.reset_index()
    # sum of all int_ext demand arrive at an ext node
    sum_int_ext_d = int_ext[int_ext['CentroidID_D']==dest_centroid].sum()['count'].reset_index()[0]
    display(sum_int_ext_d)
    result1 = flow_o-sum_ext_int_o
    result2 = flow_d-sum_int_ext_d
    display(result1)
    display(result2)
    final_result = result1-result2
    return final_result

#ext_ext_comparison = compare_with_demand(20, 13, ext_int_OD_demand_cluster_15min, int_ext_OD_demand_cluster_15min)

---

## 5. Final exports

In [27]:
from pathlib import Path

### OD matrices

In [23]:
def export_od_matrix_to_csv(df, path):
    """
    Exports an origin-destination matrix into CSV
    Parameters
    ----------
    df : DataFrame
        DataFrame representing OD matrix
    path : string
        Output path
    """
    if path == '' or path is None:
        raise ValueError('"path" cannot be empty.')
        
    df.to_csv(
        path,
        encoding='utf8',
        columns=["OBJECTID", "leg_id","start_time","start_node_lat","start_node_lng","end_node_lat","end_node_lng","CentroidID_O","CentroidID_D"]
    )

In [28]:
# Export all OD matrices
demand_folder_export_path = output_path+'/Demand'
Path(demand_folder_export_path).mkdir(parents=True, exist_ok=True)

export_od_matrix_to_csv(int_int_OD, demand_folder_export_path+'/int_int_OD.csv')
export_od_matrix_to_csv(int_ext_OD, demand_folder_export_path+'/int_ext_OD.csv')
export_od_matrix_to_csv(ext_int_OD, demand_folder_export_path+'/ext_int_OD.csv')

### Clustered matrices

In [29]:
int_int_OD_demand_cluster_15min.to_csv(demand_folder_export_path+'/int_int_OD_demand_cluster_15min.csv')
int_ext_OD_demand_cluster_15min.to_csv(demand_folder_export_path+'/int_ext_OD_demand_cluster_15min.csv')
ext_int_OD_demand_cluster_15min.to_csv(demand_folder_export_path+'/ext_int_OD_demand_cluster_15min.csv')

### Demand matrices between all centroids

In [31]:
# Save by timestamp group
import numpy as np

def export_all_demand_between_centroids(df, output):
    # Create folder if it doesn't exist
    Path(output).mkdir(parents=True, exist_ok=True)

    # Use the df argument rather than the global concatenated_matrices
    for timestamp in df['min'].unique():
        demand_matrices = df.groupby('min').get_group(timestamp)
        demand_matrices = pd.pivot_table(demand_matrices, values='count', index='CentroidID_O_name', columns='CentroidID_D_name', aggfunc=np.sum)
        demand_matrices = demand_matrices.fillna(0)
        demand_matrices.insert(0, '', value=demand_matrices.index)
        demand_matrices.to_csv(output+'/'+str(timestamp.isoformat()).replace(':', '-')+'.csv', index=False)
    print('Datasets with demand between centroids exported.')
        
export_all_demand_between_centroids(concatenated_matrices, demand_folder_export_path+'/OD grouped by timestamp')

Datasets with demand between centroids exported.
