## Goals of this project:
1. How much impact does being late or too spaced out at the first stop have downstream?
2. What is the impact of the layover at the start of the trip (the difference between the first stop arrival and departure time)? Does more layover lead to more stable headways (lower values for % headway deviation)?
3. How closely does lateness (ADHERENCE) correlate to headway?
4. What is the relationship between distance or time travelled since the start of a given trip and the headway deviation? Does headway become less stable the further along the route the bus has travelled?
5. How much does the driver affect headway and on-time performance? The driver is indicated by the OPERATOR variable.
6. How does direction of travel, route, or location affect the headway and on-time performance?
7. How does time of day or day of week affect headway and on-time performance? Can you detect an impact of school schedule on headway deviation (for certain routes and at certain times of day)?
8. Does weather have any effect on headway or on-time performance? To help answer this question, the file bna_2022.csv contains historical weather data recorded at Nashville International Airport.
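
As a rough illustration of the layover defined in question 2, the sketch below computes it on a small made-up table. The rows are invented, the timestamp format is an assumption, and the percent headway deviation formula (`HDWY_DEV / SCHEDULED_HDWY * 100`) is also an assumption, since the goals above do not define it.

```python
import pandas as pd

# Made-up rows standing in for the first stop of two trips (not real data).
toy = pd.DataFrame({
    'TRIP_ID': [1, 2],
    'ACTUAL_ARRIVAL_TIME': ['2022-08-01 06:00:00', '2022-08-01 06:30:00'],
    'ACTUAL_DEPARTURE_TIME': ['2022-08-01 06:05:30', '2022-08-01 06:31:00'],
    'SCHEDULED_HDWY': [15.0, 30.0],
    'HDWY_DEV': [3.0, -6.0],
})

# Assumption: both columns hold timestamps that pd.to_datetime can parse.
arrival = pd.to_datetime(toy['ACTUAL_ARRIVAL_TIME'])
departure = pd.to_datetime(toy['ACTUAL_DEPARTURE_TIME'])

# Layover = departure minus arrival at the first stop, in minutes.
toy['layover_min'] = (departure - arrival).dt.total_seconds() / 60

# Assumption: percent headway deviation is the deviation relative to the
# scheduled headway.
toy['hdwy_dev_pct'] = toy['HDWY_DEV'] / toy['SCHEDULED_HDWY'] * 100

print(toy[['TRIP_ID', 'layover_min', 'hdwy_dev_pct']])
```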

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
headway = pd.read_csv('../data/Headway Data.csv')
weather = pd.read_csv('../data/bna_weather.csv')

In [None]:
headway_df = headway[['ADHERENCE_ID', 'DATE', 'ROUTE_ABBR', 'BLOCK_ABBR', 'OPERATOR', 'TRIP_ID', 'ROUTE_DIRECTION_NAME', 'TIME_POINT_ABBR', 'ROUTE_STOP_SEQUENCE', 'LATITUDE', 'LONGITUDE', 'SCHEDULED_TIME', 'ACTUAL_ARRIVAL_TIME', 'ACTUAL_DEPARTURE_TIME', 'ADHERENCE', 'SCHEDULED_HDWY', 'ACTUAL_HDWY', 'HDWY_DEV']]
weather_df = weather[['Date', 'temp', 'wx_phrase']]

In [None]:
headway_df.columns = ['adh_id', 'date', 'rte_abbr', 'blk_abbr', 'opr', 'trip_id', 'rte_dir_name', 'time_pt_abbr', 'rte_stop_seq', 'lat', 'log', 'schd_time', 'act_arrvl_time', 'act_depart', 'adh', 'schd_hdwy', 'act_hdwy', 'hdwy_dev']
display(headway_df)

In [None]:
weather_df.columns = ['date', 'temp', 'weather']
display(weather_df)

In [None]:
print(weather_df['weather'].unique())

## Q1. How much impact does being late or too spaced out at the first stop have downstream?

In [None]:
q1df = headway_df[['date', 'trip_id', 'schd_time', 'act_arrvl_time', 'act_depart', 'adh']]
display(q1df)

In [None]:
# Earliest scheduled time per trip and date; the result column is named 'min'.
first_trip = (q1df.groupby(['trip_id', 'date'])['schd_time']
         .agg(['min']))
display(first_trip)

In [None]:
# Keep only the row at each trip's earliest scheduled time (its first stop).
first_trip_adh = pd.merge(
    first_trip,
    q1df[['date', 'trip_id', 'schd_time', 'adh']],
    left_on=['date', 'trip_id', 'min'],
    right_on=['date', 'trip_id', 'schd_time'],
    how='inner',
)
display(first_trip_adh)

In [None]:
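def categorise(row):
    """Label a row by the sign of 'adh': positive is early, negative is late."""
    if row['adh'] > 0:
        return 'early'
    elif row['adh'] < 0:
        return 'late'
    # Zero (or missing) adherence falls through to 'on time'.
    return 'on time'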

In [None]:
first_trip_adh['on_time'] = first_trip_adh.apply(categorise, axis=1)
display(first_trip_adh)
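
Row-wise `apply` can be slow on a large table. One possible alternative, shown here on made-up values rather than the real data, is to build the same labels with `numpy.select`. This is a sketch of a swapped-in technique, not the method used in the cells above.

```python
import numpy as np
import pandas as pd

# Made-up adherence values for illustration only.
toy = pd.DataFrame({'adh': [2.5, -1.0, 0.0]})

# Conditions are checked in order; rows matching neither get the default.
toy['on_time'] = np.select(
    [toy['adh'] > 0, toy['adh'] < 0],
    ['early', 'late'],
    default='on time',
)

print(toy)
```

As with `categorise`, a missing `adh` value matches neither condition and ends up labelled 'on time', so it may be worth checking for missing values first.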

In [None]:
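# drop() returns a new frame, so assign it back to keep the change.
first_trip_adh = first_trip_adh.drop(columns='min')
display(first_trip_adh)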
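
One possible next step for Q1 is to compare each trip's first-stop adherence with its average adherence at the remaining stops. The sketch below uses made-up rows with the same column names as `q1df`. It assumes `schd_time` sorts chronologically within a trip, and the names `first_adh` and `downstream_adh` are hypothetical.

```python
import pandas as pd

# Made-up stops for two trips (not real data).
toy = pd.DataFrame({
    'date': ['2022-08-01'] * 6,
    'trip_id': [1, 1, 1, 2, 2, 2],
    'schd_time': ['06:00', '06:10', '06:20', '07:00', '07:10', '07:20'],
    'adh': [-4.0, -5.0, -6.0, 0.0, -1.0, 1.0],
})

keys = ['date', 'trip_id']

# Assumption: sorting by schd_time puts each trip's stops in visit order.
toy = toy.sort_values(keys + ['schd_time'])

# Position of each row within its trip (0 = first stop).
pos = toy.groupby(keys).cumcount()

# Adherence at the first stop of each trip.
first = toy[pos == 0].set_index(keys)['adh'].rename('first_adh')

# Mean adherence over all later stops of each trip.
downstream = toy[pos > 0].groupby(keys)['adh'].mean().rename('downstream_adh')

# One row per trip, first-stop value next to the downstream average.
summary = pd.concat([first, downstream], axis=1).reset_index()
print(summary)
```

On the real data, a scatter plot or `summary['first_adh'].corr(summary['downstream_adh'])` would give a first impression of how strongly the two are related.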