# Estimate car vs bus travel time

* Pull out parallel routes by running `setup_parallel_trips_with_stops.py`.
* Make the car travel down the same route as the bus.
* `osmx` snaps stops to network nodes, but even when only every 5th bus stop is used, several stops snap to the same node.
* `osrm` could not be installed in the Hub.
* `valhalla`? Kuan Butt's blog?

#### Quick and Dirty Approach
* Based on distance traveled, estimate car travel time with an assumed average speed (35 or 40 mph?).
* For now, estimate car travel with the lower speed assumption so that some viable routes can be pulled. We don't want the bus to look worse than it is: the bus trips are mid-day and free-flowing, and the car travel they are compared against is probably estimated under free-flowing conditions too.
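
To get a feel for how much the speed assumption matters, here is a minimal sketch of the distance-over-speed arithmetic. The function name `car_travel_minutes` and the 5-mile example distance are made up for illustration; only the 35 and 40 mph values come from the notes above.

```python
def car_travel_minutes(miles, mph):
    """Free-flow travel time in minutes: time = distance / speed."""
    if mph <= 0:
        raise ValueError("mph must be positive")
    return miles / mph * 60


# Hypothetical 5-mile route at the two candidate speeds
for mph in (35, 40):
    print(mph, "mph:", round(car_travel_minutes(5, mph), 1), "min")
```

On this toy route the lower speed gives the car the longer trip (about 8.6 minutes versus 7.5), which is the direction that keeps the bus from looking worse than it is.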

Later, swap out this car travel time estimate for other approaches, maybe requests to the Google API.

In [1]:
#https://stackoverflow.com/questions/55162077/how-to-get-the-driving-distance-between-two-geographical-coordinates-using-pytho
import geopandas as gpd
import pandas as pd

from siuba import *

from shared_utils import geography_utils



In [2]:
df = gpd.read_parquet("./data/parallel_trips_with_stops.parquet")

In [3]:
# https://stackoverflow.com/questions/25055712/pandas-every-nth-row
# Maybe not use every bus stop, since bus stops are spaced fairly closely
# Maybe every other, every 3rd? want to mimic the bus route, do not want
# to stray too far

# df = df.iloc[::3]
group_cols = ["calitp_itp_id", "route_id", "trip_id"]
df["stop_rank"] = df.groupby(group_cols).cumcount() + 1

In [4]:
subset = df[df.stop_rank % 3 == 0]
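
One thing to watch with `stop_rank % 3 == 0`: it drops the first stop of every trip, and the last stop too unless the trip's stop count is a multiple of 3, so a car routed through the subset would start and end short of the bus route. A possible variant, sketched below on a toy frame, keeps both endpoints plus every nth stop in between. The helper name `thin_stops` and the toy data are hypothetical.

```python
import pandas as pd


def thin_stops(df, group_cols, every_n=3):
    """Keep the first stop, the last stop, and every nth stop of each trip."""
    # Rank stops within each trip, starting at 1 (rows assumed to be in stop order)
    rank = df.groupby(group_cols).cumcount() + 1
    # Number of stops in the trip, repeated on every row of that trip
    n_stops = df.groupby(group_cols)[group_cols[0]].transform("size")

    keep = (rank == 1) | (rank == n_stops) | (rank % every_n == 0)
    return df[keep]


# Toy example: one trip with 8 stops
toy = pd.DataFrame({
    "trip_id": ["a"] * 8,
    "stop_sequence": range(1, 9),
})
thinned = thin_stops(toy, ["trip_id"])
print(thinned.stop_sequence.tolist())
```

For the 8-stop toy trip this keeps stops 1, 3, 6, and 8.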

In [5]:
keep_trips = [-7505741281882708052]
df[df.trip_key.isin(keep_trips)]

Unnamed: 0,calitp_itp_id,date,trip_key,trip_id,is_in_service,day_name,stop_sequence,stop_id,departure_time,shape_id,route_id,service_hours,geometry,stop_rank
0,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,1,4370,06:41:00,shp-10-10,10,0.45,POINT (-122.16020 37.72127),1
1,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,2,4318,06:41:46,shp-10-10,10,0.45,POINT (-122.15635 37.72287),2
2,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,3,4316,06:42:11,shp-10-10,10,0.45,POINT (-122.15755 37.72473),3
3,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,4,4281,06:42:46,shp-10-10,10,0.45,POINT (-122.15637 37.72521),4
4,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,5,4446,06:43:16,shp-10-10,10,0.45,POINT (-122.15411 37.72361),5
5,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,6,4272,06:43:36,shp-10-10,10,0.45,POINT (-122.15255 37.72249),6
6,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,7,4263,06:44:07,shp-10-10,10,0.45,POINT (-122.15017 37.72068),7
7,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,8,5227,06:44:35,shp-10-10,10,0.45,POINT (-122.14787 37.71899),8
8,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,9,4264,06:44:59,shp-10-10,10,0.45,POINT (-122.14596 37.71750),9
9,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,10,4273,06:45:33,shp-10-10,10,0.45,POINT (-122.14326 37.71548),10


In [6]:
subset[subset.trip_key.isin(keep_trips)]

Unnamed: 0,calitp_itp_id,date,trip_key,trip_id,is_in_service,day_name,stop_sequence,stop_id,departure_time,shape_id,route_id,service_hours,geometry,stop_rank
2,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,3,4316,06:42:11,shp-10-10,10,0.45,POINT (-122.15755 37.72473),3
5,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,6,4272,06:43:36,shp-10-10,10,0.45,POINT (-122.15255 37.72249),6
8,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,9,4264,06:44:59,shp-10-10,10,0.45,POINT (-122.14596 37.71750),9
11,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,12,4251,06:46:27,shp-10-10,10,0.45,POINT (-122.13919 37.71253),12
14,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,15,4473,06:48:29,shp-10-10,10,0.45,POINT (-122.12906 37.70545),15
17,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,18,4702,06:50:22,shp-10-10,10,0.45,POINT (-122.12120 37.69934),18
20,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,21,4765,06:53:09,shp-10-10,10,0.45,POINT (-122.12451 37.69827),21
23,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,24,4777,06:56:25,shp-10-10,10,0.45,POINT (-122.11529 37.69579),24
26,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,27,4785,06:58:36,shp-10-10,10,0.45,POINT (-122.10800 37.69066),27
29,4,2022-01-06,-7505741281882708052,13277020,True,Thursday,30,4897,07:02:05,shp-10-10,10,0.45,POINT (-122.09765 37.68303),30


`osmx` returns the same node for multiple bus stops, even when only every 5th bus stop is used, which isn't what we want.

`osrm` doesn't install because of some `GDAL` dependencies.

Can the Google API be used? We need to check the terms and conditions to see whether we can make requests to calculate travel time, or even grab speed limits, through the
[Python package](https://github.com/googlemaps/google-maps-services-python).

At minimum, we can calculate the distance between stops, sum it up, and for cars assume a speed of 30 mph or 45 mph. If we can't use the Google API to grab speed limits, we will hard-code the speed.
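
As a rough cross-check on stop-to-stop distances that does not need a projected CRS, a great-circle (haversine) calculation on the raw longitude/latitude values could be used. This is only a sketch: the function name, the Earth-radius constant, and the feet conversion are choices made here, and the two points are the first two stops of the trip shown above.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)
FEET_PER_METER = 3.28084


def haversine_feet(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in feet."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    meters = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
    return meters * FEET_PER_METER


# First two stops of the trip shown above (lon, lat)
stop_1 = (-122.16020, 37.72127)
stop_2 = (-122.15635, 37.72287)
print(round(haversine_feet(*stop_1, *stop_2)), "feet")
```

Either way, these are straight-line distances between stops, so they will understate the distance actually driven along the street network.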

In [None]:
def calculate_distance_traveled(df):
    # Group by trip (not just route) so the shift never pairs stops
    # that belong to different trips
    group_cols = ["calitp_itp_id", "route_id", "trip_id"]
    sort_cols = group_cols + ["stop_sequence"]
    
    # geography_utils is imported directly above, so no shared_utils. prefix.
    # Project so that distances come out in feet.
    df = df.to_crs(geography_utils.CA_StatePlane)
    
    # Distance traveled
    df = df.assign(
        # Previous stop's geometry within the trip; missing for the first stop
        start = (df.sort_values(sort_cols)
                 .groupby(group_cols, group_keys=False)["geometry"]
                 .apply(lambda x: x.shift(1))),
    )
    
    # Straight-line distance from the previous stop to the current stop
    df = df.assign(
        feet_traveled = df.geometry.distance(df.start) 
    ).drop(columns = ["start"])
        
    return df
            

In [None]:
# `parallel` is never defined in this notebook; use the dataframe loaded above
df = calculate_distance_traveled(df)

In [None]:
def calculate_time_traveled(df):
    # Use a set of assumptions
    
    # Assumed average car speed, in miles per hour
    AVG_SPEED = 40
    
    # The agency column in this dataset is calitp_itp_id, not itp_id
    trip_cols = ["calitp_itp_id", "route_id", "trip_id"]
    
    # Highest stop_sequence on each trip
    df = df.assign(
        max_stop = (df.groupby(trip_cols)
                    ["stop_sequence"].transform("max"))
    )
    
    # Sum the stop-to-stop distances up to one row per trip.
    # trip_first_departure_ts and trip_last_arrival_ts are not among the
    # columns displayed above, so they must be present on df for this to run.
    df2 = geography_utils.aggregate_by_geography(
        df,
        group_cols = trip_cols + ["trip_first_departure_ts", 
                                  "trip_last_arrival_ts"],
        sum_cols = ["feet_traveled"], 
        mean_cols = ["service_hours", "max_stop"]
    )
    
    df2 = df2.assign(
        miles_traveled = df2.feet_traveled.divide(
            geography_utils.FEET_PER_MI)
    
    )
    
    # speed = distance / time
    # time = distance / speed
    df2 = df2.assign(
        car_trip_time_hr = df2.miles_traveled.divide(AVG_SPEED),
    ).drop(columns = "feet_traveled")
        
    return df2

In [None]:
df2 = calculate_time_traveled(df)
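
Once both times are in hours, the comparison this notebook is after could look something like the sketch below. It assumes `service_hours` is the bus trip's duration in hours (as the column name suggests) and that `car_trip_time_hr` is the estimate from `calculate_time_traveled`. The `bus_multiplier` column name and the toy values are hypothetical.

```python
import pandas as pd

# Toy stand-in for the output of calculate_time_traveled; values are made up
trips = pd.DataFrame({
    "trip_id": ["a", "b"],
    "service_hours": [0.45, 0.60],
    "car_trip_time_hr": [0.15, 0.40],
})

# How many times longer the bus takes than the estimated car trip
trips = trips.assign(
    bus_multiplier = trips.service_hours.divide(trips.car_trip_time_hr)
)
print(trips)
```

A lower assumed car speed raises `car_trip_time_hr` and so lowers the multiplier, which is the direction the notes above want to err in.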