# Trajectories Clustering and Analysis

### Goal of this Notebook
Parse and cluster the available trajectory data, then analyze aggregate measurements.
***
**Outputs:**
- TRAJECTORIES.HTML [Manually Made] KeplerGl Interactive Interface File, pre-configured to load with Trajectories layers and spatial clustering.

**Inputs:**
- _trajectories.csv_ from raw trajectories data. Available on Dropbox in `/Private Structured data collection/Data processing/Auxiliary files/Demand/Flow_speed/Trajectories`.
- _trajectories_clusterd.csv_ from raw trajectories data. Available on Dropbox in `/Private Structured data collection/Data processing/Auxiliary files/Demand/Flow_speed/Trajectories`.
- _InternalCentroidZones.shp_ Shapefile available on Dropbox in `/Private Structured data collection/Data processing/Raw/Demand/OD demand/TAZ`
- _ExternalCentroidZones.shp_ Shapefile available on Dropbox in `/Private Structured data collection/Data processing/Raw/Demand/OD demand/TAZ`

**Temporary Files Within the Pipeline:** 
- No temporary files.

**Dependent Scripts:**
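- `trajectories_utils` (provides `parseTrajectories`, `clusterByZone`, `trajectoriesFromZones`, `showTrajectoriesFromZones`)
- `fremontdropbox` (provides `get_dropbox_location`)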

**Dependent Libraries:**
- numpy
- pandas
- os
- csv
- json
- matplotlib
- shapely
- keplergl
- geopandas
- rtree
- sklearn
- sys
***
**Sections:**
- A. [Parse Raw Trajectory Data into Singular CSV](#section_ID_a)
- B. [Plotting, Mapping, and Analysis](#section_ID_c)

# To dos
- Make sure everything works with Dropbox **DONE**
- Put the functions in a util script **DONE**
- Put the old clustering into another notebook **DONE**

0. Discuss the clustering approach with Michal, as he may be able to reuse some of the code. **DONE**
1. Use the `fremontdropbox` module to get the folders from Dropbox (see next cell). **DONE**
2. Create both files, `trajectories.csv` and `trajectories_condensed.csv`, in the current iPython notebook. **DONE**
    - Put them in `/Private Structured data collection/Data processing/Auxiliary files/Demand/Flow_speed/Trajectories`
3. Use the external and internal TAZs instead of the sklearn clustering to cluster the trajectories by origin and destination. **DONE**
    - The TAZs are shapefiles in `Private Structured data collection/Data processing/Raw/Demand/OD demand/TAZ`
4. Write a function that takes the ids of the origin and destination TAZs as input and displays the corresponding trajectories using Kepler.gl. **DONE**
5. Remove `trajectories.csv` and `trajectories_condensed.csv` from GitHub (they are under NDA) **DONE**
6. Generate all Kepler.gl maps in `/Private Structured data collection/Data processing/Temporary exports to be copied to processed data/Trajectories` **DONE**

### To do later
7. Match paths to road sections (see Jane McFarlan for that).
8. For every O-D pair (where O and D are TAZ ids) and every 15-minute time step, output the corresponding paths used by drivers.
9. Compare the paths used by drivers in the Here data with the ones used by drivers in Aimsun simulations.
10. For the path going from south of I-680N to north of I-680N, deduce the percentage of drivers using local roads instead of staying on the highway at different times of the day.
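
As a starting point for items 8 and 10, the grouping could look roughly like the sketch below. It is a minimal, dependency-free illustration, not existing project code: the record layout, the path labels, and the helper names `time_bin`, `paths_by_od_and_bin`, and `path_share` are all hypothetical, and the sample values are made up rather than taken from the real data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (origin TAZ id, destination TAZ id, departure time, path label).
# These values are invented for illustration only.
trips = [
    (24, 10, "2019-03-05 08:02:00", "path_a"),
    (24, 10, "2019-03-05 08:14:59", "path_b"),
    (24, 10, "2019-03-05 08:15:00", "path_a"),
    (3, 10, "2019-03-05 08:07:30", "path_c"),
]


def time_bin(timestamp, minutes=15):
    """Return the start of the bin containing `timestamp` (`minutes` must divide 60)."""
    t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return t.replace(minute=t.minute - t.minute % minutes, second=0)


def paths_by_od_and_bin(records, minutes=15):
    """Map (origin, dest, bin start) -> {path label: number of trips}."""
    grouped = defaultdict(lambda: defaultdict(int))
    for origin, dest, timestamp, path in records:
        grouped[(origin, dest, time_bin(timestamp, minutes))][path] += 1
    return {key: dict(counts) for key, counts in grouped.items()}


def path_share(counts, path):
    """Fraction of trips in `counts` that used `path` (0.0 if there are no trips)."""
    total = sum(counts.values())
    return counts.get(path, 0) / total if total else 0.0


result = paths_by_od_and_bin(trips)
for (origin, dest, start), counts in sorted(result.items()):
    print(origin, dest, start.strftime("%H:%M"), counts)
```

With real data, the path label would come from the road-section matching in item 7, and `path_share` could then be evaluated per time bin to compare local-road and highway usage.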

In [9]:
import os
import csv
import sys
import json
import rtree
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

from sklearn import metrics
from keplergl import KeplerGl
from sklearn.cluster import DBSCAN
from shapely.geometry import Point, LineString, MultiPoint
from trajectories_utils import parseTrajectories, clusterByZone, trajectoriesFromZones, showTrajectoriesFromZones


In [10]:
module_path = os.path.abspath(os.path.join('../..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from fremontdropbox import get_dropbox_location

dropbox_dir = get_dropbox_location()

rootdir = dropbox_dir + "/Private Structured data collection/Data processing/Raw/Demand/Flow_speed/Here data"
print(rootdir)

C:\Users\jainc\Fremont Dropbox\Theophile Cabannes/Private Structured data collection/Data processing/Raw/Demand/Flow_speed/Here data


<a id="section_ID_a"></a>
## A. Parse Raw Trajectory Data into Singular CSV

In [11]:
# rootdir = './step_019_organize_by_provider'
parseTrajectories('trajectories.csv', rootdir, False)
parseTrajectories('trajectories_condensed.csv', rootdir, True)

All trajectory data has been parsed to trajectories.csv. 3140 files total.
All trajectory data has been parsed to trajectories_condensed.csv. 3140 files total.
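
The implementation of `parseTrajectories` lives in `trajectories_utils` and is not shown in this notebook. For readers unfamiliar with the step, the general idea of merging many per-trip files into one CSV can be sketched as below. This is only an assumption about the overall shape of the operation: the function name `merge_csv_files`, the `source_file` column, and the sample columns `x` and `y` are hypothetical, and the sketch assumes every input file shares the same header.

```python
import csv
import os
import tempfile


def merge_csv_files(root_dir, out_path):
    """Concatenate every .csv file under root_dir into out_path; return the file count.

    Assumes all input files have the same columns and that out_path is
    located outside root_dir.
    """
    count = 0
    with open(out_path, "w", newline="") as out_file:
        writer = None
        for folder, _, names in sorted(os.walk(root_dir)):
            for name in sorted(names):
                if not name.endswith(".csv"):
                    continue
                with open(os.path.join(folder, name), newline="") as in_file:
                    reader = csv.DictReader(in_file)
                    for row in reader:
                        if writer is None:
                            # Build the header from the first file that has rows.
                            fields = ["source_file"] + list(reader.fieldnames)
                            writer = csv.DictWriter(out_file, fieldnames=fields)
                            writer.writeheader()
                        writer.writerow({"source_file": name, **row})
                count += 1
    return count


# Small self-contained demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "raw")
    os.makedirs(raw)
    for i in range(2):
        with open(os.path.join(raw, f"trip_{i}.csv"), "w", newline="") as f:
            f.write("x,y\n1,2\n3,4\n")
    merged = os.path.join(tmp, "merged.csv")
    n_files = merge_csv_files(raw, merged)
    with open(merged, newline="") as f:
        merged_rows = list(csv.DictReader(f))

print(n_files, len(merged_rows))
```

Keeping the source file name on each row makes it possible to trace a merged record back to the raw file it came from.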


<a id="section_ID_c"></a>
## B. Plotting, Mapping, and Analysis

In [13]:
direct = dropbox_dir + "/Private Structured data collection/Data processing/Raw/Demand/OD demand/"
# direct = "./"

int_shapefile = gpd.read_file(direct + "TAZ/InternalCentroidZones.shp")
ext_shapefile = gpd.read_file(direct + "TAZ/ExternalCentroidZones.shp")

# Load the condensed trajectories once; each GeoDataFrame gets its own copy
# so that setting the geometry on one does not affect the other.
df = pd.read_csv("trajectories_condensed.csv")
gdf_origins = gpd.GeoDataFrame(df.copy(), geometry=gpd.points_from_xy(df['Origin X'], df['Origin Y']))
gdf_dests = gpd.GeoDataFrame(df.copy(), geometry=gpd.points_from_xy(df['Dest X'], df['Dest Y']))

int_trajectories_origins = clusterByZone(gdf_origins, int_shapefile, merge=True)
int_trajectories_dests = clusterByZone(gdf_dests, int_shapefile, merge=True)

ext_trajectories_origins = clusterByZone(gdf_origins, ext_shapefile, merge=True)
ext_trajectories_dests = clusterByZone(gdf_dests, ext_shapefile, merge=True)

  "(%s != %s)" % (left_df.crs, right_df.crs)
  "(%s != %s)" % (left_df.crs, right_df.crs)
  "(%s != %s)" % (left_df.crs, right_df.crs)
  "(%s != %s)" % (left_df.crs, right_df.crs)

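
The cell above assigns each origin and destination point to a TAZ polygon. The sketch below illustrates that idea in plain Python with a ray-casting point-in-polygon test. It is not the implementation of `clusterByZone` (which is defined in `trajectories_utils` and presumably relies on a geopandas spatial join); the helper names `point_in_polygon` and `assign_zone`, the zone ids, and the coordinates are hypothetical. The repeated `(%s != %s)` lines in the output appear to be the tail of a CRS-mismatch warning, so it is worth checking that the points and the shapefiles use the same coordinate reference system, which the sketch simply assumes.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`.

    `polygon` is a list of (x, y) vertices in order; points exactly on an
    edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_zone(x, y, zones):
    """Return the id of the first zone containing the point, or None."""
    for zone_id, polygon in zones.items():
        if point_in_polygon(x, y, polygon):
            return zone_id
    return None


# Two hypothetical square zones sharing an edge, in the same coordinates as the points.
zones = {
    1: [(0, 0), (2, 0), (2, 2), (0, 2)],
    2: [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = [(1.0, 1.0), (3.0, 0.5), (5.0, 5.0)]
labels = [assign_zone(x, y, zones) for x, y in points]
print(labels)  # [1, 2, None]
```

A spatial index such as the one provided by `rtree` avoids testing every point against every polygon, which matters once the number of trajectories and zones grows.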

In [14]:
df = pd.read_csv("trajectories_condensed.csv")

origin_zones_map = KeplerGl(height=500) #, config=czm_config)

origin_zones_map.add_data(data=df, name='Trajectories')
origin_zones_map.add_data(data=int_trajectories_origins, name='Origins - Internal Zones')
origin_zones_map.add_data(data=ext_trajectories_origins, name='Origins - External Zones')

origin_zones_map.save_to_html(file_name="origin_zones_map.html")
origin_zones_map

User Guide: https://github.com/keplergl/kepler.gl/blob/master/docs/keplergl-jupyter/user-guide.md
Map saved to origin_zones_map.html!


KeplerGl(data={'Trajectories': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…

In [15]:
dests_zones_map = KeplerGl(height=500) #, config=czm_config)

dests_zones_map.add_data(data=df, name='Trajectories')
dests_zones_map.add_data(data=int_trajectories_dests, name='Destinations - Internal Zones')
dests_zones_map.add_data(data=ext_trajectories_dests, name='Destinations - External Zones')

dests_zones_map.save_to_html(file_name="dests_zones_map.html")
dests_zones_map

User Guide: https://github.com/keplergl/kepler.gl/blob/master/docs/keplergl-jupyter/user-guide.md
Map saved to dests_zones_map.html!


KeplerGl(data={'Trajectories': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…

In [16]:
showTrajectoriesFromZones(origin_id=24, dest_id=10, direct=direct)

  "(%s != %s)" % (left_df.crs, right_df.crs)
  "(%s != %s)" % (left_df.crs, right_df.crs)


User Guide: https://github.com/keplergl/kepler.gl/blob/master/docs/keplergl-jupyter/user-guide.md


  "(%s != %s)" % (left_df.crs, right_df.crs)


Map saved to chosen_zones_map.html!


KeplerGl(data={'Trajectories': {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19…