## RT trip diagnostics: thresholds for usable trips 
### Other Questions
* Should thresholds be set at the operator level or the operator-route ID level?
* How do we decide whether a segment is acceptable or not?
* Is `proportion_route_length` tied to usable segments?

In [217]:
import dask.dataframe as dd
import dask_geopandas as dg
import geopandas as gpd
import pandas as pd
from calitp.sql import to_snakecase
from shared_utils import geography_utils, utils, calitp_color_palette as cp, styleguide

# Charts
import altair as alt

In [2]:
# Save files to GCS
from calitp.storage import get_fs
fs = get_fs()

In [3]:
# Record start and end time
import datetime
from loguru import logger

In [4]:
pd.options.display.max_columns = 100
pd.options.display.float_format = "{:.2f}".format
pd.set_option("display.max_rows", None)
pd.set_option("display.max_colwidth", None)

### Load Files

In [5]:
GCS_DASK_PATH = "gs://calitp-analytics-data/data-analyses/dask_test/"

In [6]:
GCS_RT_PATH = "gs://calitp-analytics-data/data-analyses/rt_delay/"

In [7]:
analysis_date = "2022-10-12"

In [8]:
# RT data: read in route lines.
# Gives the actual route length for each shape_id.
routelines = gpd.read_parquet(
    f"{GCS_RT_PATH}compiled_cached_views/routelines_{analysis_date}.parquet"
)

In [9]:
len(routelines), routelines.shape_id.nunique()

(9430, 6353)

In [10]:
# RT data: read in trips.
# Gives the trips run on a particular day across all operators.
trips = pd.read_parquet(
    f"{GCS_RT_PATH}compiled_cached_views/trips_{analysis_date}.parquet"
)

In [11]:
len(trips)

120136

* The `route_dir_identifier` is used to cut segments for both directions in which a route runs.

In [12]:
# Read in longest_shape of each route
# Schedule data, source of truth.
longest_shape = gpd.read_parquet(f"{GCS_DASK_PATH}longest_shape_segments.parquet")

In [13]:
# longest_shape.groupby(['calitp_itp_id','route_id','longest_shape_id']).agg({'segment_sequence':'nunique'}).head()

In [14]:
# longest_shape.sort_values(['calitp_itp_id', 'route_id']).head(25).drop(columns=["geometry", "geometry_arrowized"])

In [15]:
crosswalk = pd.read_parquet(
    f"{GCS_DASK_PATH}segments_route_direction_crosswalk.parquet"
)

In [16]:
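# Vehicle positions joined to segments for operator 4.
# No geometry column here (only lon/lat), so read with pandas.read_parquet() instead of geopandas.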
operator_4 = pd.read_parquet(
    f"{GCS_DASK_PATH}vp_sjoin/vp_segment_4_{analysis_date}.parquet"
)

### Task 1
* Using GTFS schedule data, by route_id-shape_id, calculate the route_length of each shape_id as a proportion of the longest shape_id's length.
* For <b>each route_id</b>, what is the shortest shape_id length as a proportion of the longest shape_id's length? If it's 100%, all shape_ids for that route are the same length. If it's 50%, there is a short trip that runs only 50% of the length and then turns around.

<b>How</b>
* Need the `trips` table from the compiled cached views (shape ID, route ID, and direction ID) -> merge in the segments crosswalk to get the route direction identifier.
* Shapes table -> attach the route dir identifier.
* Merge in the longest shape line on route and direction, then take the fraction.
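
The calculation described in Task 1 can be sketched on toy data as follows. The dataframe and its values are made up for illustration; only the column names `route_id`, `shape_id`, `route_length`, `longest_route_length`, and `proportion_route_length` mirror the ones used in this notebook.

```python
import pandas as pd

# Toy data (hypothetical): one row per shape_id, with its length.
shapes = pd.DataFrame(
    {
        "route_id": ["A", "A", "A", "B", "B"],
        "shape_id": ["a1", "a2", "a3", "b1", "b2"],
        "route_length": [1000.0, 1000.0, 500.0, 800.0, 800.0],
    }
)

# Length of the longest shape on each route, broadcast back to every row.
shapes["longest_route_length"] = shapes.groupby("route_id")[
    "route_length"
].transform("max")

# Each shape's length as a percent of the longest shape on its route.
shapes["proportion_route_length"] = (
    shapes["route_length"] / shapes["longest_route_length"] * 100
)

# Shortest shape per route, as a percent of the longest.
shortest_by_route = shapes.groupby("route_id")["proportion_route_length"].min()
```

Here route A has a short shape that covers 50% of the longest one, while all shapes on route B are the same length (100%).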

#### Step 1. Merge `trips` with `crosswalk`
* Why do we take away `trip_id` from `trips`? 

In [17]:
# Subset
trips2 = trips[
    [
        "calitp_itp_id",
        "route_id",
        "direction_id",
        "shape_id",
    ]
]

In [18]:
len(trips2), len(crosswalk)

(120136, 5150)

In [162]:
trips.calitp_itp_id.nunique() == routelines.calitp_itp_id.nunique() == crosswalk.calitp_itp_id.nunique()

True

In [158]:
trips.head(2)

Unnamed: 0,calitp_itp_id,calitp_url_number,service_date,trip_key,trip_id,route_id,direction_id,shape_id,calitp_extracted_at,calitp_deleted_at,route_short_name,route_long_name,route_desc,route_type
0,4,0,2022-10-12,-8276973624288086181,5204040,U,1,shp-U-06,2022-08-07,2099-01-01,U,Stanford - Dumbarton - Fremont,,3
1,4,0,2022-10-12,-8528689834062779863,12137020,U,1,shp-U-06,2022-08-07,2099-01-01,U,Stanford - Dumbarton - Fremont,,3


In [20]:
trips2.head(2)

Unnamed: 0,calitp_itp_id,route_id,direction_id,shape_id
0,4,U,1,shp-U-06
1,4,U,1,shp-U-06


In [21]:
crosswalk.head(2)

Unnamed: 0,calitp_itp_id,route_id,direction_id,route_dir_identifier
0,372,4ba918e5-58c0-4d4a-9f55-5cadb8564bff,0,255544
1,293,7,0,1269889


##### Help. Is it correct to drop the duplicates?
* We dropped the columns that make each row unique (trip_id, calitp_extracted_at, calitp_deleted_at). 

In [22]:
len(trips2.drop_duplicates()), len(trips2)

(8199, 120136)

In [23]:
trips2 = (trips2.drop_duplicates()).reset_index(drop=True)

* `trips2` has 366 rows with no match in `crosswalk` (and `crosswalk` has 74 rows with no match in `trips2`), even though `calitp_itp_id.nunique()` is the same in both.

In [164]:
trips2.merge(
    crosswalk,
    how="outer",
    on=["calitp_itp_id", "route_id", "direction_id"],
    indicator=True,
)[["_merge"]].value_counts()

_merge    
both          7833
left_only      366
right_only      74
dtype: int64
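
To see which rows fall out of the merge, one option is to keep the `_merge` indicator and filter on it instead of only counting it. This is a minimal sketch on toy data; `trips_toy`, `crosswalk_toy`, and their values are hypothetical stand-ins for `trips2` and `crosswalk`.

```python
import pandas as pd

keys = ["calitp_itp_id", "route_id", "direction_id"]

# Hypothetical stand-in for trips2.
trips_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4, 4],
        "route_id": ["U", "212", "X"],
        "direction_id": [1, 0, 0],
        "shape_id": ["shp-U-06", "shp-212-57", "shp-X-01"],
    }
)

# Hypothetical stand-in for crosswalk.
crosswalk_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4, 4],
        "route_id": ["U", "212", "Z"],
        "direction_id": [1, 0, 0],
        "route_dir_identifier": [1244740981, 648098315, 999],
    }
)

merged = trips_toy.merge(crosswalk_toy, how="outer", on=keys, indicator=True)

# Rows found in trips only: these are dropped by the inner merge.
unmatched_trips = merged.loc[merged["_merge"] == "left_only"]

# Count of unmatched rows by operator, to see whether a few operators drive it.
unmatched_by_operator = unmatched_trips.groupby("calitp_itp_id").size()
```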

In [25]:
trips_m_crosswalk = trips2.merge(
    crosswalk, how="inner", on=["calitp_itp_id", "route_id", "direction_id"]
)

In [26]:
trips_m_crosswalk.head()

Unnamed: 0,calitp_itp_id,route_id,direction_id,shape_id,route_dir_identifier
0,4,U,1,shp-U-06,1244740981
1,4,U,0,shp-U-07,1026952675
2,4,212,1,shp-212-07,1369834141
3,4,212,0,shp-212-57,648098315
4,4,67,0,shp-67-57,3358964048


In [169]:
trips_m_crosswalk.shape_id.value_counts().describe()

count   6002.00
mean       1.31
std        0.94
min        1.00
25%        1.00
50%        1.00
75%        1.00
max        9.00
Name: shape_id, dtype: float64

#### Step 2. Shapes table -> attach route dir identifier 

In [27]:
routelines = routelines.drop(columns=["calitp_url_number"])

In [28]:
# Calculate length of geometry
routelines = routelines.assign(
    actual_route_length=(
        routelines.geometry.to_crs(geography_utils.CA_StatePlane).length
    )
)

##### Help. Why are there duplicates?
* Each route will have at least 2 shape IDs.
* Why does Cal ITP ID 4 have two rows with the same shape ID and the same route length?
* Is it right to drop them?
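
One possible explanation, offered as an assumption rather than a confirmed cause: `calitp_url_number` was dropped above, so rows that differed only by feed number now look identical. The sketch below uses a toy frame in which the row with `calitp_url_number` 1 is hypothetical.

```python
import pandas as pd

# Toy stand-in for routelines before calitp_url_number is dropped.
# The second row (feed 1) is an assumption made for illustration.
routelines_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4, 4],
        "calitp_url_number": [0, 1, 0],
        "shape_id": ["shp-U-06", "shp-U-06", "shp-U-07"],
        "actual_route_length": [111927.21, 111927.21, 111849.47],
    }
)

# With the feed number present, every row is distinct.
n_dupes_before = routelines_toy.duplicated().sum()

# Once the feed number is gone, the two shp-U-06 rows are exact duplicates.
dropped = routelines_toy.drop(columns=["calitp_url_number"])
n_dupes_after = dropped.duplicated().sum()

deduped = dropped.drop_duplicates().reset_index(drop=True)
```

If this is what is happening, dropping the duplicates loses nothing, since the remaining columns are identical across feeds.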

In [29]:
routelines.loc[routelines.shape_id == "shp-U-06"].drop(columns=["geometry"])

Unnamed: 0,calitp_itp_id,shape_id,actual_route_length
111,4,shp-U-06,111927.21
622,4,shp-U-06,111927.21


In [30]:
routelines.shape_id.value_counts().head(15)

2           11
20          10
5           10
16          10
86          10
13          10
p_531955     9
p_531884     9
p_531873     9
p_531875     9
p_531923     9
p_532031     9
p_531918     9
p_531919     9
p_531921     9
Name: shape_id, dtype: int64

In [31]:
# Original routelines with duplicates. 
len(routelines)

9430

In [32]:
# Length if duplicates are dropped
len(routelines.drop_duplicates())

8133

In [33]:
routelines = (routelines.drop_duplicates()).reset_index(drop=True)

In [34]:
routelines.loc[routelines.shape_id == "shp-U-06"].drop(columns=["geometry"])

Unnamed: 0,calitp_itp_id,shape_id,actual_route_length
111,4,shp-U-06,111927.21


In [35]:
routelines.merge(
    trips_m_crosswalk, how="outer", on=["calitp_itp_id", "shape_id"], indicator=True
)[["_merge"]].value_counts()

_merge    
both          8529
left_only      363
right_only       0
dtype: int64

In [36]:
routelines_m_trips = routelines.merge(
    trips_m_crosswalk,
    how="inner",
    on=["calitp_itp_id", "shape_id"],
)

In [38]:
len(routelines_m_trips), len(trips_m_crosswalk), len(routelines)

(8529, 7833, 8133)

In [39]:
routelines_m_trips.loc[routelines_m_trips.route_id == "U"].drop(columns=["geometry"])

Unnamed: 0,calitp_itp_id,shape_id,actual_route_length,route_id,direction_id,route_dir_identifier
1,4,shp-U-07,111849.47,U,0,1026952675
111,4,shp-U-06,111927.21,U,1,1244740981


#### Step 3. Merge in longest shape line on routes and direction.

In [40]:
routelines_m_trips.crs == longest_shape.crs

True

In [None]:
# longest_shape = longest_shape.rename(columns={"route_length": "longest_route_length"})

##### Help. Do I have to aggregate `longest_shape`, since the longest shape ID is broken down by segment and its `route_length` doesn't match the actual route length?
* Route 1244740981 is broken out into 35 segments.
* In `routelines_m_trips`, the actual route length for 1244740981 (shp-U-06) is 111,927.21, in the units of `geography_utils.CA_StatePlane`. However, the `route_length` in `longest_shape` for the same route_dir_identifier is only 34,070.90 meters, repeated on every segment row.
* Do I also sum up the segments in `longest_shape` per `longest_shape_id`?

In [43]:
longest_shape.loc[longest_shape.route_dir_identifier == 1244740981].drop(
    columns=["geometry", "geometry_arrowized"]
).head()

Unnamed: 0,calitp_itp_id,calitp_url_number,route_id,direction_id,longest_shape_id,route_dir_identifier,route_length,segment_sequence
4012,4,0,U,1,shp-U-06,1244740981,34070.9,0
4013,4,0,U,1,shp-U-06,1244740981,34070.9,1
4014,4,0,U,1,shp-U-06,1244740981,34070.9,2
4015,4,0,U,1,shp-U-06,1244740981,34070.9,3
4016,4,0,U,1,shp-U-06,1244740981,34070.9,4


In [170]:
longest_shape.loc[longest_shape.route_dir_identifier == 1244740981][['route_length']].sum()

route_length   1192481.63
dtype: float64

In [44]:
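# Have to aggregate longest_shape, take away calitp url & segments.
# NOTE: route_length repeats on every segment row, so "sum" multiplies it by the number of segments.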
longest_shape_agg = (
    longest_shape.groupby(
        [
            "calitp_itp_id",
            "route_id",
            "direction_id",
            "longest_shape_id",
            "route_dir_identifier",
        ]
    ).agg({"route_length": "sum",
          "segment_sequence":"nunique"})
    .rename(columns = {'route_length':'longest_route_length',
                       'segment_sequence':'total_segments'})
).reset_index()

* Now the longest length for 1244740981 is 1,192,481.63, which is 35 x 34,070.90: the same per-row `route_length` added once per segment. It doesn't match `routelines_m_trips` for 1244740981, where the actual route length is 111,927.21.
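
Since `route_length` appears to be a route-level value repeated on each segment row, taking `"first"` (or `"max"`) instead of `"sum"` would keep it at its original size. A minimal sketch on a toy frame that mimics the 35 rows for 1244740981; the frame is hypothetical, and `summed_length` is a name introduced only for the comparison:

```python
import pandas as pd

# Toy stand-in for longest_shape: 35 segment rows, each carrying the
# same route-level route_length.
longest_toy = pd.DataFrame(
    {
        "route_dir_identifier": [1244740981] * 35,
        "route_length": [34070.9] * 35,
        "segment_sequence": list(range(35)),
    }
)

agg = (
    longest_toy.groupby("route_dir_identifier")
    .agg(
        # What the notebook currently does: 35 x 34,070.9.
        summed_length=("route_length", "sum"),
        # Keeps the route-level value as is.
        longest_route_length=("route_length", "first"),
        total_segments=("segment_sequence", "nunique"),
    )
    .reset_index()
)
```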

In [45]:
longest_shape_agg.loc[longest_shape_agg.route_dir_identifier == 1244740981]

Unnamed: 0,calitp_itp_id,route_id,direction_id,longest_shape_id,route_dir_identifier,longest_route_length,total_segments
244,4,U,1,shp-U-06,1244740981,1192481.63,35


In [46]:
len(routelines_m_trips), len(longest_shape), len(longest_shape_agg)

(8529, 126896, 5150)

In [47]:
# 74 rows are right_only: route directions in longest_shape_agg with no match in routelines_m_trips.
routelines_m_trips.merge(
    longest_shape_agg,
    how="outer",
    on=["calitp_itp_id", "direction_id", "route_id", "route_dir_identifier"],
    indicator=True,
)[["_merge"]].value_counts()

_merge    
both          8529
right_only      74
left_only        0
dtype: int64

In [48]:
routelines_final = routelines_m_trips.merge(
    longest_shape_agg,
    how="inner",
    on=["calitp_itp_id", "direction_id", "route_id", "route_dir_identifier"],
)

In [49]:
# Calculate out proportion of route length against longest.
routelines_final["proportion_route_length"] = (
    routelines_final["actual_route_length"] / routelines_final["longest_route_length"]
) * 100

In [50]:
routelines_final.proportion_route_length.describe()

count   8529.00
mean      33.32
std      144.08
min        0.07
25%        9.81
50%       18.23
75%       32.45
max     7015.79
Name: proportion_route_length, dtype: float64

In [51]:
routelines_final.loc[routelines_final.route_dir_identifier == 1244740981].drop(columns=["geometry"]).head()

Unnamed: 0,calitp_itp_id,shape_id,actual_route_length,route_id,direction_id,route_dir_identifier,longest_shape_id,longest_route_length,total_segments,proportion_route_length
137,4,shp-U-06,111927.21,U,1,1244740981,shp-U-06,1192481.63,35,9.39
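
Besides the summing, a unit mismatch may be worth ruling out. The check below assumes that `geography_utils.CA_StatePlane` is a projection measured in US survey feet and that `route_length` in `longest_shape` is in meters; neither is confirmed in this notebook. Under those assumptions, the two lengths for shp-U-06 nearly agree.

```python
# Exact definition of the US survey foot in meters.
US_SURVEY_FT_TO_M = 1200 / 3937

# Values copied from the outputs above for shp-U-06 / 1244740981.
actual_route_length_ft = 111927.21  # ASSUMED to be in feet
route_length_m = 34070.90  # ASSUMED to be in meters (single row, not summed)

actual_route_length_m = actual_route_length_ft * US_SURVEY_FT_TO_M

# Proportion of the longest shape, with both lengths in meters.
proportion = actual_route_length_m / route_length_m * 100
```

The result is close to 100%, which is what one would expect when a shape is compared against itself as the longest shape. The small remainder could come from the two lengths being measured in different projections.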


### Task 2
* Testing with Agency 4. 
* Calculate time of trips?

<b> Answered questions </b>
* Why are there so many different timestamps, within 30 seconds of each other, in the same segment? GTFS pings every 30 seconds.

In [52]:
def find_operator_info(df):
    """Summarize each trip: first and last ping, segments touched, minutes elapsed."""
    merge_cols = [
        "calitp_itp_id",
        "trip_id",
        "route_dir_identifier",
    ]

    # Get start time, end time, and number of unique segments per trip.
    m1 = (
        df.groupby(merge_cols)
        .agg(
            start=("vehicle_timestamp", "min"),
            end=("vehicle_timestamp", "max"),
            number_of_segments=("segment_sequence", "nunique"),
        )
        .reset_index()
    )

    # Calculate time elapsed
    # https://stackoverflow.com/questions/51491724/calculate-difference-of-2-dates-in-minutes-in-pandas
    m1["minutes_elapsed"] = (m1["end"] - m1["start"]).dt.total_seconds() / 60

    return m1

In [109]:
operator_4.head(2)

Unnamed: 0,calitp_itp_id,calitp_url_number,vehicle_timestamp,trip_id,route_dir_identifier,segment_sequence,lon,lat
0,4,0,2022-10-12 03:57:57,1002020,2062080730,0,-199410.89,-20669.41
1,4,0,2022-10-12 03:58:12,1002020,2062080730,0,-199423.82,-20695.24


In [53]:
operator_4_metrics = find_operator_info(operator_4)

In [54]:
operator_4_metrics.head(2)

Unnamed: 0,calitp_itp_id,trip_id,route_dir_identifier,start,end,number_of_segments,minutes_elapsed
0,4,10000020,4214183996,2022-10-12 21:55:01,2022-10-12 22:24:54,7,29.88
1,4,1000020,2437991552,2022-10-12 16:21:29,2022-10-12 16:32:23,4,10.9


In [55]:
m2 = operator_4_metrics[['calitp_itp_id', 'trip_id', 'route_dir_identifier','number_of_segments', 'minutes_elapsed']].merge(
    routelines_final,
    how="inner",
    on=["calitp_itp_id", "route_dir_identifier"],
)

In [56]:
len(operator_4_metrics), len(m2)

(5202, 7217)
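
The row count grows from 5,202 to 7,217 because `routelines_final` can hold several shape_ids for the same `route_dir_identifier`, so each trip is matched to every shape on its route direction. A small sketch of how pandas can flag this, using hypothetical toy frames:

```python
import pandas as pd

keys = ["calitp_itp_id", "route_dir_identifier"]

# Two trips on the same route direction (toy data).
metrics_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4],
        "trip_id": ["t1", "t2"],
        "route_dir_identifier": [10, 10],
        "minutes_elapsed": [29.88, 10.9],
    }
)

# Two shapes on that same route direction (toy data).
shapes_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4],
        "route_dir_identifier": [10, 10],
        "shape_id": ["s1", "s2"],
    }
)

# Each trip pairs with both shapes: 2 trips become 4 rows.
fanned_out = metrics_toy.merge(shapes_toy, how="inner", on=keys)

# validate="many_to_one" raises if the right-hand keys are not unique.
try:
    metrics_toy.merge(shapes_toy, how="inner", on=keys, validate="many_to_one")
    merge_is_many_to_one = True
except pd.errors.MergeError:
    merge_is_many_to_one = False
```

One way to avoid the fan-out might be to attach each trip's own `shape_id` from `trips` (via `trip_id`) and include it in the merge keys, though that has not been tried here.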

In [58]:
# Drop some columns for now to check out
m2 = m2.drop(columns = ['geometry', 'actual_route_length','longest_route_length'])

In [183]:
# Find the total number of segments in the specific operator file 
# vs. what was recorded in `longest_shape`
m2['segment_proportion'] =((m2.number_of_segments/m2.total_segments)*100).astype('int64')

In [184]:
m2.sort_values(['route_id','shape_id', 'minutes_elapsed']).tail(10)

Unnamed: 0,calitp_itp_id,trip_id,route_dir_identifier,number_of_segments,minutes_elapsed,shape_id,route_id,direction_id,longest_shape_id,total_segments,proportion_route_length,segment_proportion,trip_duration_categories,shape_id_categories
7150,4,3360020,370892320,27,49.25,shp-V-53,V,0,shp-V-53,27,12.17,100,Medium Trip < 66 min,Low Proportion <14%
7151,4,9325020,370892320,27,51.33,shp-V-53,V,0,shp-V-53,27,12.17,100,Medium Trip < 66 min,Low Proportion <14%
6924,4,12446020,2013749239,27,47.55,shp-W-06,W,1,shp-W-06,28,11.74,96,Medium Trip < 66 min,Low Proportion <14%
6923,4,11838040,2013749239,27,52.78,shp-W-06,W,1,shp-W-06,28,11.74,96,Medium Trip < 66 min,Low Proportion <14%
6927,4,6452040,2013749239,27,54.1,shp-W-06,W,1,shp-W-06,28,11.74,96,Medium Trip < 66 min,Low Proportion <14%
6926,4,485020,2013749239,26,56.07,shp-W-06,W,1,shp-W-06,28,11.74,92,Medium Trip < 66 min,Low Proportion <14%
6928,4,6825020,2013749239,27,56.77,shp-W-06,W,1,shp-W-06,28,11.74,96,Medium Trip < 66 min,Low Proportion <14%
6925,4,2436040,2013749239,27,57.63,shp-W-06,W,1,shp-W-06,28,11.74,96,Medium Trip < 66 min,Low Proportion <14%
6769,4,9140020,251686753,29,47.18,shp-W-07,W,0,shp-W-07,29,11.33,100,Medium Trip < 66 min,Low Proportion <14%
6768,4,11396020,251686753,28,50.98,shp-W-07,W,0,shp-W-07,29,11.33,96,Medium Trip < 66 min,Low Proportion <14%


##### Help. Why does 1244740981 not yield any results, even in the original dataframe?
* Filtering `routelines_final` for ITP ID 4 gives 2 more route IDs than `vp_sjoin/vp_segment_4`.
* Wondering why that is.

In [185]:
operator_4.loc[operator_4.route_dir_identifier == 1244740981]

Unnamed: 0,calitp_itp_id,calitp_url_number,vehicle_timestamp,trip_id,route_dir_identifier,segment_sequence,lon,lat


In [186]:
# Can't find 1244740981 in this list. 
# operator_4.route_dir_identifier.unique().tolist()

In [187]:
# Total route ids using longest_shape/trips/routelines. 
routelines_final.loc[routelines_final.calitp_itp_id == 4][['route_id']].nunique()

route_id    129
dtype: int64

In [188]:
m2.route_id.nunique()

127

In [189]:
merged_routeid = set(m2.route_id.unique().tolist())

In [190]:
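routelines_routeid = set(
    routelines_final.loc[routelines_final.calitp_itp_id == 4].route_id.unique().tolist()
)

In [191]:
# Route IDs in routelines_final (operator 4) that are missing from m2.
# routelines_routeid - merged_routeid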

### Ask
* For each operator, what % of RT trip_ids would remain after those thresholds are applied? Make a chart function that takes a single operator, then produce charts for all operators. Is it time or geographic coverage that is driving the exclusion of trips? What is a recommended threshold to use?
* For short trips, do they tend to be 50% of the longest route length? 40%? 30%?
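
A rough sketch of the first ask, on toy data. The function `pct_trips_remaining`, its parameters, and the threshold values are hypothetical, and the frame is assumed to have one row per trip with `minutes_elapsed` and `segment_proportion` columns like the ones built in this notebook.

```python
import pandas as pd


def pct_trips_remaining(df, min_minutes, min_segment_pct):
    """Percent of trips per operator that pass both thresholds.

    Assumes one row per trip.
    """
    usable = (df["minutes_elapsed"] >= min_minutes) & (
        df["segment_proportion"] >= min_segment_pct
    )
    return (
        df.assign(usable=usable)
        .groupby("calitp_itp_id")
        .agg(total_trips=("usable", "size"), usable_trips=("usable", "sum"))
        .assign(pct_remaining=lambda x: x["usable_trips"] / x["total_trips"] * 100)
        .reset_index()
    )


# Toy trips for two operators.
trips_toy = pd.DataFrame(
    {
        "calitp_itp_id": [4, 4, 4, 4, 5, 5, 5],
        "trip_id": ["t1", "t2", "t3", "t4", "t5", "t6", "t7"],
        "minutes_elapsed": [50, 5, 60, 40, 10, 45, 30],
        "segment_proportion": [100, 20, 90, 30, 100, 80, 75],
    }
)

result = pct_trips_remaining(trips_toy, min_minutes=15, min_segment_pct=50)
```

Running the function with one threshold loosened at a time would show whether the time threshold or the segment coverage threshold is excluding more trips.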

In [192]:
m2.proportion_route_length.describe()

count   7217.00
mean      33.25
std       48.82
min        3.88
25%       14.93
50%       20.54
75%       32.87
max      383.27
Name: proportion_route_length, dtype: float64

In [193]:
m2.minutes_elapsed.describe()

count   7217.00
mean      91.35
std      235.41
min        0.17
25%       37.18
50%       52.32
75%       66.35
max     1531.98
Name: minutes_elapsed, dtype: float64

In [194]:
p25_time = m2.minutes_elapsed.quantile(0.25).astype(int)
p50_time = m2.minutes_elapsed.quantile(0.50).astype(int)
p75_time = m2.minutes_elapsed.quantile(0.75).astype(int)

In [195]:
p25_time, p50_time, p75_time

(37, 52, 66)

In [196]:
def trip_duration (row):
    if ((row.minutes_elapsed > 0) and (row.minutes_elapsed <= p25_time)):
         return f"Short Trip < {p25_time} min"
    elif ((row.minutes_elapsed > p25_time) and (row.minutes_elapsed <= p75_time)):
         return f"Medium Trip < {p75_time} min"
    else:
        return "Long Trip"
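
The same binning can also be done without a row-wise `apply`, using `pd.cut`. This is an alternative sketch, not what the notebook runs. The labels here use `<=` to match the bin edges, so they differ from the labels above. Unlike `trip_duration`, values that are zero, negative, or missing come out as NaN instead of "Long Trip".

```python
import numpy as np
import pandas as pd

# Integer cutoffs taken from the percentiles computed above.
p25_time, p75_time = 37, 66

# Toy durations in minutes.
minutes = pd.Series(
    [0.17, 37.0, 37.18, 52.32, 66.0, 66.35, 1531.98], name="minutes_elapsed"
)

# Bins are right-inclusive by default: (0, 37], (37, 66], (66, inf).
categories = pd.cut(
    minutes,
    bins=[0, p25_time, p75_time, np.inf],
    labels=[
        f"Short Trip <= {p25_time} min",
        f"Medium Trip <= {p75_time} min",
        "Long Trip",
    ],
)
```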

In [197]:
m2["trip_duration_categories"] = m2.apply(lambda x: trip_duration(x), axis=1)

In [198]:
m2.trip_duration_categories.value_counts()

Medium Trip < 66 min    3595
Long Trip               1833
Short Trip < 37 min     1789
Name: trip_duration_categories, dtype: int64

In [199]:
for i in [p25_time, p50_time,p75_time]: 
    print(len(m2.loc[m2.minutes_elapsed >= i]))

5428
3646
1833


In [200]:
p25_length = m2.proportion_route_length.quantile(0.25).astype(int)
p75_length = m2.proportion_route_length.quantile(0.75).astype(int)

In [201]:
p25_length, p75_length

(14, 32)

* Flag what's usable.
* Need two aggregations: one for trips that are usable, one for shape_ids.

In [202]:
def shape_id_comparison(row):
    if ((row.proportion_route_length > 0) and (row.proportion_route_length <= p25_length)):
         return f"Low Proportion <{p25_length}%"
    elif ((row.proportion_route_length > p25_length) and (row.proportion_route_length <= p75_length)):
         return f"Medium Proportion <{p75_length}%"
    else:
        return "Substantial Proportion"

In [203]:
m2["shape_id_categories"] = m2.apply(lambda x: shape_id_comparison(x), axis=1)

In [205]:
agg1 = m2.groupby(['trip_duration_categories']).agg({'total_segments':'sum'}).rename(columns = {'total_segments':'grand_total_segments'}).reset_index()

In [220]:
agg2 = m2.groupby(['trip_duration_categories','shape_id_categories']).agg({'trip_id':'nunique', 'total_segments':'sum'}).rename(columns = {'trip_id':'total_trips'}).reset_index()

In [221]:
m2.groupby(['shape_id_categories', ]).agg({'total_segments':'sum'})

shape_id_categories,total_segments
Low Proportion <14%,27049
Medium Proportion <32%,68245
Substantial Proportion,16879


In [208]:
test = agg1.merge(agg2, on = ["trip_duration_categories"])

In [213]:
test = test.assign(segment_portions= ((test.total_segments/test.grand_total_segments)*100).astype('int').astype('str') + '%')

In [214]:
test

Unnamed: 0,trip_duration_categories,grand_total_segments,shape_id_categories,total_trips,total_segments,grand_total_trips,segment_portions
0,Long Trip,39325,Low Proportion <14%,440,14111,1371,35%
1,Long Trip,39325,Medium Proportion <32%,1116,24765,1371,62%
2,Long Trip,39325,Substantial Proportion,34,449,1371,1%
3,Medium Trip < 66 min,56594,Low Proportion <14%,413,11697,2644,20%
4,Medium Trip < 66 min,56594,Medium Proportion <32%,2073,38198,2644,67%
5,Medium Trip < 66 min,56594,Substantial Proportion,507,6699,2644,11%
6,Short Trip < 37 min,16254,Low Proportion <14%,40,1241,1187,7%
7,Short Trip < 37 min,16254,Medium Proportion <32%,344,5282,1187,32%
8,Short Trip < 37 min,16254,Substantial Proportion,919,9731,1187,59%


In [218]:
def chart_with_dropdown(
    df,
    dropdown_list: list,
    dropdown_field: str,
    x_axis_chart1: str,
    y_axis_chart1: str,
    color_col1: str,
    chart1_tooltip_cols: list,
    chart_title: str,
):
    """A bar chart controlled by a dropdown filter.
    Args:
        df: the dataframe
        dropdown_list(list): a list of all the values in the dropdown menu,
        dropdown_field(str): column where the dropdown menu's values are drawn from,
        x_axis_chart1(str): x axis value for chart 1 - encode as Q or N,
        y_axis_chart1(str): y axis value for chart 1 - encode as Q or N,
        color_col1(str): column to color the graphs for chart 1,
        chart1_tooltip_cols(list): list of all the columns to populate the tooltip,
        chart_title(str):chart title,
    """
    # Create drop down menu
    input_dropdown = alt.binding_select(options=dropdown_list, name="Select ")

    # The column tied to the drop down menu
    selection = alt.selection_single(fields=[dropdown_field], bind=input_dropdown)

    chart1 = (
        alt.Chart(df)
        .mark_bar()
        .encode(
            x=x_axis_chart1,
            y=(y_axis_chart1),
            color=alt.Color(
                color_col1, scale=alt.Scale(range=cp.CALITP_CATEGORY_BRIGHT_COLORS)
                , legend = None
            ),
            tooltip=chart1_tooltip_cols,
        )
        .properties(title=chart_title)
        .add_selection(selection)
        .transform_filter(selection)
    )

    chart1 = styleguide.preset_chart_config(chart1)

    return chart1

In [None]:
chart_with_dropdown(

### Already Answered Notes/Questions
* What is the calitp url number? What does 0 or 1 mean? In V1, an operator can have different feeds.
    * 0 could be the primary feed, 1 the backup. This column will be deleted in V2.
* Do you think that most shape IDs are going to be less than 100% of the length of the longest shape ID?
    * Not necessarily, shape ID can be a short version of the trip.
* What’s the difference between direction ID and route dir identifier? What does the 0 and 1 mean in direction ID?
    * We don't know where the bus is going, so just do 0 and 1.
    * Route dir identifier: captures route info and direction it is going to capture all the trips. Helps with groupby. 
    * We don't want to stick with trip id, we need to get to route level. 
    * Don't want to lose info on the direction. 
    * Have to distinguish direction or else it'll look like the bus is going backwards when plotting.
    * RT data comes with direction id and can get which direction it ran in from schedule data. 
    * Attach route, join coordinate data to segments. 
    * Use segments and average out trips that occurred on that segment. 
* Ask about graph on Slack. 
* Should I use `get_routelines` from `A1_vehicle_positions`?
    * Just read it directly from GCS, don't need buffer.
* Why would the same route ID for the other direction have more segments? 
   * Can have a layover. 
   * A segment must be 1000 meters or less.