### Goals of this project:

How much impact does being late or too spaced out at the first stop have downstream?

What is the impact of the layover at the start of the trip (the difference between the first stop's arrival and departure times)?

Does a longer layover lead to more stable headways (lower values of % headway deviation)?
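A minimal sketch of how the layover questions could be approached, assuming first-stop arrival and departure timestamps are available. The column names `ACTUAL_ARRIVAL_TIME`, `ACTUAL_DEPARTURE_TIME` and `DOWNSTREAM_DEV_PCT` (a trip's mean headway deviation % at its later stops), the 5-minute cut-off, and all values are hypothetical placeholders, not taken from the dataset.

```python
import pandas as pd

# Toy first-stop records. ACTUAL_ARRIVAL_TIME, ACTUAL_DEPARTURE_TIME and
# DOWNSTREAM_DEV_PCT are hypothetical column names; the values are made up.
first_stops = pd.DataFrame({
    "TRIP_ID": [101, 102, 103, 104],
    "ACTUAL_ARRIVAL_TIME": pd.to_datetime(["2022-08-01 06:00", "2022-08-01 06:20",
                                           "2022-08-01 06:40", "2022-08-01 07:00"]),
    "ACTUAL_DEPARTURE_TIME": pd.to_datetime(["2022-08-01 06:02", "2022-08-01 06:30",
                                             "2022-08-01 06:41", "2022-08-01 07:08"]),
    "DOWNSTREAM_DEV_PCT": [25.0, 5.0, -30.0, 10.0],
})

# Layover = departure minus arrival at the first stop, in minutes.
first_stops["LAYOVER_MIN"] = (
    first_stops["ACTUAL_DEPARTURE_TIME"] - first_stops["ACTUAL_ARRIVAL_TIME"]
).dt.total_seconds() / 60

# Bin layovers (the 5-minute cut-off is an arbitrary example) and compare the
# mean absolute downstream deviation: lower means more stable headways.
first_stops["LAYOVER_BIN"] = pd.cut(first_stops["LAYOVER_MIN"], bins=[0, 5, 60],
                                    labels=["short (<=5 min)", "long (>5 min)"])
summary = (
    first_stops.assign(ABS_DEV_PCT=first_stops["DOWNSTREAM_DEV_PCT"].abs())
    .groupby("LAYOVER_BIN", observed=True)["ABS_DEV_PCT"]
    .mean()
)
print(summary)
```

With real data, a scatter plot or correlation of `LAYOVER_MIN` against the absolute deviation avoids depending on a single arbitrary cut-off.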

How closely does lateness (ADHERENCE) correlate with headway deviation?
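One way to quantify this, sketched on made-up values: compute both a Pearson (linear) and a Spearman (rank) correlation between `ADHERENCE` and `HDWY_DEV` after dropping missing rows. Spearman is computed here as a Pearson correlation of ranks, which avoids needing SciPy. The sign of the result depends on how the dataset signs lateness and headway gaps, so it should be read against the data dictionary.

```python
import pandas as pd

# Toy rows using the notebook's column names; the values are made up.
sample = pd.DataFrame({
    "ADHERENCE": [-2.0, 0.5, 1.0, -4.0, 3.0],
    "HDWY_DEV": [3.0, -0.5, -1.0, 5.0, -2.5],
})

# Correlations are only meaningful on rows where both values are present.
clean = sample.dropna(subset=["ADHERENCE", "HDWY_DEV"])

# Pearson: strength of a linear relationship.
pearson = clean["ADHERENCE"].corr(clean["HDWY_DEV"])
# Spearman as Pearson on ranks: strength of a monotonic relationship.
spearman = clean["ADHERENCE"].rank().corr(clean["HDWY_DEV"].rank())
print(pearson, spearman)
```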

What is the relationship between distance or time travelled since the start of a given trip and the headway deviation? Does headway become less stable the further along the route the bus has travelled?
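A sketch that uses a stop's position within its trip as a proxy for distance or time travelled. It assumes rows are already in stop order within each `TRIP_ID` (the real data would first need sorting by a time or stop-sequence column) and uses made-up values.

```python
import pandas as pd

# Toy data: two trips, rows assumed to be in stop order within each trip.
stops = pd.DataFrame({
    "TRIP_ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Deviation_Percentage": [2.0, -5.0, 9.0, -14.0, 1.0, 4.0, -8.0, 12.0],
})

# Position of each stop within its trip: 0 = first stop.
stops["STOP_POSITION"] = stops.groupby("TRIP_ID").cumcount()
stops["ABS_DEV_PCT"] = stops["Deviation_Percentage"].abs()

# Mean absolute deviation at each position along the route.
by_position = stops.groupby("STOP_POSITION")["ABS_DEV_PCT"].mean()

# Spearman-style trend (Pearson on ranks): > 0 means deviation grows along the trip.
trend = stops["STOP_POSITION"].rank().corr(stops["ABS_DEV_PCT"].rank())
print(by_position)
print(trend)
```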

How much does the driver, indicated by the OPERATOR variable, affect headway and on-time performance?

How do direction of travel, route, or location affect headway and on-time performance?
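A minimal sketch for comparing operators: aggregate the absolute deviation % and mean `ADHERENCE` per operator within each route, so drivers are compared against others on the same route, and drop operators with too few rows. The `MIN_ROWS` threshold, the operator numbers other than 2088, and all values are hypothetical. The same `groupby` pattern works with `ROUTE_DIRECTION_NAME` or a location column in place of `OPERATOR`.

```python
import pandas as pd

# Toy rows; operator 2088 appears in the notebook, 3150 and 4021 are made up.
sample = pd.DataFrame({
    "ROUTE_ABBR": [22, 22, 22, 22, 22, 22, 22],
    "OPERATOR": [2088, 2088, 2088, 3150, 3150, 3150, 4021],
    "Deviation_Percentage": [10.0, -20.0, 30.0, 5.0, -5.0, 0.0, 80.0],
    "ADHERENCE": [-1.0, -3.0, -2.0, 0.0, 1.0, -1.0, -9.0],
})

MIN_ROWS = 3  # hypothetical threshold: skip operators with too few observations

# Group by route as well, so each operator is compared on the same route.
per_operator = (
    sample.assign(ABS_DEV_PCT=sample["Deviation_Percentage"].abs())
    .groupby(["ROUTE_ABBR", "OPERATOR"])
    .agg(n=("ABS_DEV_PCT", "size"),
         mean_abs_dev_pct=("ABS_DEV_PCT", "mean"),
         mean_adherence=("ADHERENCE", "mean"))
)
per_operator = per_operator[per_operator["n"] >= MIN_ROWS]
print(per_operator)
```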

How does time of day or day of week affect headway and on-time performance? Can you detect an impact of school schedule on headway deviation (for certain routes and at certain times of day)?
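A sketch of an hour-of-day by day-of-week summary. It assumes `SCHEDULED_TIME` holds `HH:MM:SS` strings that can be appended to `DATE` (an assumption about the format), and the values are made up. For the school-schedule question, the same table could be built separately for dates before and after the school term starts and compared at the morning and afternoon peaks.

```python
import pandas as pd

# Toy rows; SCHEDULED_TIME is assumed to be an HH:MM:SS string.
sample = pd.DataFrame({
    "DATE": ["2022-08-01", "2022-08-01", "2022-08-02", "2022-08-06"],
    "SCHEDULED_TIME": ["07:15:00", "15:05:00", "07:45:00", "07:30:00"],
    "Deviation_Percentage": [40.0, -25.0, 30.0, 5.0],
})

# Combine date and time into one timestamp, then derive hour and weekday.
ts = pd.to_datetime(sample["DATE"] + " " + sample["SCHEDULED_TIME"])
sample["HOUR"] = ts.dt.hour
sample["DAY_NAME"] = ts.dt.day_name()
sample["ABS_DEV_PCT"] = sample["Deviation_Percentage"].abs()

# Mean absolute deviation % for each hour (rows) and weekday (columns).
by_hour_day = sample.pivot_table(index="HOUR", columns="DAY_NAME",
                                 values="ABS_DEV_PCT", aggfunc="mean")
print(by_hour_day)
```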


Does weather have any effect on headway or on-time performance? To help answer this question, the file bna_2022.csv contains historical weather data recorded at Nashville International Airport.
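A minimal sketch of joining daily weather onto the headway rows by date and comparing rainy and dry days. The precipitation column `PRCP` is a hypothetical name (the actual columns of the weather file should be checked), and all values are made up. `validate="many_to_one"` makes pandas raise an error if the weather table has more than one row per date.

```python
import numpy as np
import pandas as pd

headway_sample = pd.DataFrame({
    "DATE": ["2022-08-01", "2022-08-01", "2022-08-02"],
    "Deviation_Percentage": [10.0, -30.0, 60.0],
})
# Hypothetical daily weather rows; PRCP is an assumed precipitation column name.
weather_sample = pd.DataFrame({
    "DATE": ["2022-08-01", "2022-08-02"],
    "PRCP": [0.0, 1.2],
})

# Parse both DATE columns so the join keys have the same dtype.
headway_sample["DATE"] = pd.to_datetime(headway_sample["DATE"])
weather_sample["DATE"] = pd.to_datetime(weather_sample["DATE"])

merged = headway_sample.merge(weather_sample, on="DATE", how="left",
                              validate="many_to_one")
# Rows with no weather match (NaN PRCP) would be labelled "dry" here.
merged["WEATHER"] = np.where(merged["PRCP"] > 0, "rain", "dry")
rain_effect = (
    merged.assign(ABS_DEV_PCT=merged["Deviation_Percentage"].abs())
    .groupby("WEATHER")["ABS_DEV_PCT"]
    .mean()
)
print(rain_effect)
```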

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

In [None]:
%matplotlib inline

In [None]:
pd.options.display.max_columns = None

In [None]:
headway = pd.read_csv('../data/Headway Data.csv')

In [None]:
headway.head()

In [None]:
headway.info()

In [None]:
headway.isnull().sum()

In [None]:
headway['ROUTE_ABBR'].nunique(dropna=False)

In [None]:
headway['ROUTE_ABBR'].unique()

In [None]:
headway['OPERATOR'].nunique(dropna=False)

In [None]:
bna_weather = pd.read_csv('../data/bna_weather.csv')

In [None]:
bna_weather.head()

In [None]:
#calculating headway deviation percentage: HDWY_DEV / SCHEDULED_HDWY * 100
headway['Deviation_Percentage'] = headway['HDWY_DEV'] / headway['SCHEDULED_HDWY'] * 100
headway
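If any `SCHEDULED_HDWY` value is zero, the division above yields infinite values (or NaN for 0/0), which would distort means and plots. A minimal sketch of one way to guard against that, on made-up values: treat a zero scheduled headway as missing before dividing.

```python
import numpy as np
import pandas as pd

# Toy rows; the third has a zero scheduled headway, the fourth a missing deviation.
sample = pd.DataFrame({"HDWY_DEV": [3.0, -2.0, 1.5, np.nan],
                       "SCHEDULED_HDWY": [15.0, 10.0, 0.0, 20.0]})

# Treat a zero scheduled headway as undefined rather than producing +/-inf.
scheduled = sample["SCHEDULED_HDWY"].replace(0, np.nan)
sample["Deviation_Percentage"] = sample["HDWY_DEV"] / scheduled * 100
print(sample)
```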

In [None]:
#making a dataframe with only non-negative headway deviations (HDWY_DEV >= 0)
headway1 = headway[['DATE', 'ROUTE_ABBR', 'BLOCK_ABBR', 'OPERATOR', 'TRIP_ID', 'ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 'HDWY_DEV']]
headway1 = headway1.loc[(headway1['HDWY_DEV']>=0)]
headway1

In [None]:
#making a dataframe with only negative headway deviations (HDWY_DEV < 0)
headway2 = headway[['DATE', 'ROUTE_ABBR', 'BLOCK_ABBR', 'OPERATOR', 'TRIP_ID', 'ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 'HDWY_DEV']]
headway2 = headway2.loc[(headway2['HDWY_DEV']<0)]
headway2

In [None]:
#trip edge 1 (presumably the first stop of a trip)
headway3 = headway[['DATE', 'ROUTE_ABBR', 'BLOCK_ABBR', 'OPERATOR', 'TRIP_ID', 'ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 'HDWY_DEV']]
headway3 = headway3.loc[(headway3['TRIP_EDGE']==1)]
headway3

In [None]:
#trip edge 2 (the turnaround)
headway4 = headway[['DATE', 'ROUTE_ABBR', 'BLOCK_ABBR', 'OPERATOR', 'TRIP_ID', 'ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 'HDWY_DEV']]
headway4 = headway4.loc[(headway4['TRIP_EDGE']==2)]
headway4
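Rather than inspecting the two filtered frames separately, a single `groupby` can summarise `HDWY_DEV` for each `TRIP_EDGE` value side by side. A sketch on made-up values:

```python
import pandas as pd

# Toy rows covering the two trip-edge values filtered above.
sample = pd.DataFrame({
    "TRIP_EDGE": [1, 1, 2, 2],
    "HDWY_DEV": [4.0, -2.0, 6.0, 8.0],
})

# One row per TRIP_EDGE: row count, mean signed deviation, mean absolute deviation.
edge_summary = (
    sample.assign(ABS_DEV=sample["HDWY_DEV"].abs())
    .groupby("TRIP_EDGE")
    .agg(n=("HDWY_DEV", "size"),
         mean_dev=("HDWY_DEV", "mean"),
         mean_abs_dev=("ABS_DEV", "mean"))
)
print(edge_summary)
```

On the full data, the same chain could start from `headway[headway['TRIP_EDGE'].isin([1, 2])]`.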

In [None]:
#looking at hdwy_dev and adherence to scheduled time side by side 
headway5 = headway[['DATE', 'OPERATOR', 'TRIP_ID', 'TRIP_EDGE', 'ADHERENCE_ID', 'HDWY_DEV', 'ADHERENCE']]
headway5

In [None]:
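#time of day: trip-edge rows (TRIP_EDGE 1 or 2) for a single operator (2088)
headway6 = headway[['DATE', 'OPERATOR', 'TRIP_ID', 'TRIP_EDGE', 'HDWY_DEV', 'ADHERENCE', 'SCHEDULED_TIME', 'ROUTE_ABBR']]
#isin() selects edge 1 or 2; (a == 2) | a == 1 would parse as ((a == 2) | a) == 1
headway6 = headway6.loc[headway6['TRIP_EDGE'].isin([1, 2])]
#.copy() so that adding columns later does not trigger SettingWithCopyWarning
headway6 = headway6.loc[headway6['OPERATOR'] == 2088].copy()
headway6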
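In Python, `|` binds more tightly than `==`, so a filter written as `(df['TRIP_EDGE']==2) | df['TRIP_EDGE']==1` is evaluated as `((df['TRIP_EDGE']==2) | df['TRIP_EDGE']) == 1`, not as "edge 2 or edge 1". The sketch below confirms the parse with the standard-library `ast` module and shows two correct forms of the filter on toy data (operator 3150 is a made-up value).

```python
import ast

import pandas as pd

# Python parses `a | b == c` as `(a | b) == c`: the top node is a comparison
# whose left operand is the bitwise-or expression.
tree = ast.parse("(edge == 2) | edge == 1", mode="eval")
top_is_compare = isinstance(tree.body, ast.Compare)
left_is_or = isinstance(tree.body.left, ast.BinOp) and isinstance(tree.body.left.op, ast.BitOr)

# Toy data; operator 3150 is a made-up value.
toy = pd.DataFrame({"TRIP_EDGE": [0, 1, 2, 1, 0],
                    "OPERATOR": [2088, 2088, 2088, 3150, 2088]})

# Two correct ways to express "TRIP_EDGE is 1 or 2".
mask_parens = (toy["TRIP_EDGE"] == 2) | (toy["TRIP_EDGE"] == 1)
mask_isin = toy["TRIP_EDGE"].isin([1, 2])

# Combine with the operator filter; .copy() makes later column assignment safe.
subset = toy.loc[mask_isin & (toy["OPERATOR"] == 2088)].copy()
print(top_is_compare, left_is_or, subset.index.tolist())
```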

In [None]:
#hour of the scheduled time; parsing handles single-digit hours that a 2-character slice would break on
headway6['TIMES'] = pd.to_datetime(headway6['SCHEDULED_TIME']).dt.hour
headway6
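Slicing the first two characters of `SCHEDULED_TIME` assumes every time is zero-padded. A sketch, assuming an `HH:MM:SS` string format, of why parsing is more robust: a single-digit hour such as `7:30:00` breaks the slice but parses with `pd.to_datetime`. If the column also contains a date, the `format` argument would need to change accordingly.

```python
import pandas as pd

# Toy SCHEDULED_TIME strings in an assumed HH:MM:SS format; "7:30:00" is not zero-padded.
times = pd.Series(["06:45:00", "17:05:00", "7:30:00"])

# The two-character slice yields "7:" for the last value, which int() rejects.
try:
    times.str[:2].astype(int)
    slice_failed = False
except ValueError:
    slice_failed = True

# Parsing the string as a time handles one- and two-digit hours alike.
hours = pd.to_datetime(times, format="%H:%M:%S").dt.hour
print(slice_failed, hours.tolist())
```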

### Question 1: 
How much impact does being late or too spaced out at the first stop have downstream?