WeGo Public Transit

WeGo Public Transit is a public transit system serving the Greater Nashville and Davidson County area. WeGo provides local and regional bus routes, the WeGo Star train service connecting Lebanon to downtown Nashville, along with several other transit services.

In this project, you'll analyze bus spacing to look for patterns and to identify correlations with controllable or external factors. Specifically, you'll use a dataset of headways, the amount of time between consecutive vehicle arrivals at a stop. The HDWY_DEV column holds the headway deviation (ACTUAL_HDWY minus SCHEDULED_HDWY, in minutes). It is negative when bunching has occurred (a shorter headway than scheduled) and positive when gapping has occurred (a longer headway than scheduled). The headway deviation percentage can be calculated as HDWY_DEV/SCHEDULED_HDWY (multiplied by 100 to express it as a percent).
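A minimal sketch of that calculation, assuming the column names used throughout this notebook. The `describe()` output further down reports a minimum `SCHEDULED_HDWY` of 0.0, so the sketch returns NaN rather than ±inf when the scheduled headway is zero or missing. The helper name `add_headway_dev_pct` and the new column `HDWY_DEV_PCT` are hypothetical.

```python
import numpy as np
import pandas as pd


def add_headway_dev_pct(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with HDWY_DEV_PCT = HDWY_DEV / SCHEDULED_HDWY.

    The result is a fraction (-0.25 means a headway 25% shorter than
    scheduled). Zero or missing scheduled headways yield NaN, not +/-inf.
    """
    out = df.copy()
    # keep only positive scheduled headways; everything else becomes NaN
    sched = out["SCHEDULED_HDWY"].where(out["SCHEDULED_HDWY"] > 0)
    out["HDWY_DEV_PCT"] = out["HDWY_DEV"] / sched
    return out


# first two rows use values shown later in this notebook; the last two are edge cases
sample = pd.DataFrame({
    "SCHEDULED_HDWY": [15.0, 12.0, 0.0, np.nan],
    "HDWY_DEV": [-3.883334, 4.683333, 2.0, np.nan],
})
print(add_headway_dev_pct(sample))
```

On the full data this would be `transit_df = add_headway_dev_pct(transit_df)`; taking `.abs()` of the new column gives a size-only measure of headway instability.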

#### Goals of this project:

How much impact does being late or too spaced out at the first stop have downstream?

What is the impact of the layover at the start of the trip (the difference between the arrival and departure times at the first stop)? Does more layover lead to more stable headways (smaller absolute values of % headway deviation)?
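One way to measure layover, sketched under an assumption: that `TRIP_EDGE == 1` marks a trip's first timepoint. This is inferred from the first rows of the data, where trip 345104 opens at MHSP with `TRIP_EDGE` 1 and closes at MCC5_10 with `TRIP_EDGE` 2. The helper `first_stop_layover` is hypothetical.

```python
import pandas as pd


def first_stop_layover(df: pd.DataFrame) -> pd.DataFrame:
    """Layover in minutes at the first timepoint of each trip.

    Assumes TRIP_EDGE == 1 marks a trip's first timepoint and that
    DATE / ACTUAL_*_TIME are 'YYYY-MM-DD' and 'HH:MM:SS' strings.
    """
    first = df[df["TRIP_EDGE"] == 1].copy()
    arrival = pd.to_datetime(first["DATE"] + " " + first["ACTUAL_ARRIVAL_TIME"])
    departure = pd.to_datetime(first["DATE"] + " " + first["ACTUAL_DEPARTURE_TIME"])
    first["LAYOVER_MINS"] = (departure - arrival).dt.total_seconds() / 60
    return first[["DATE", "TRIP_ID", "LAYOVER_MINS"]]


# trips 345104 and 345105 from the first rows of headway_data_with_routes.csv
sample = pd.DataFrame({
    "DATE": ["2023-08-01"] * 3,
    "TRIP_ID": [345104, 345104, 345105],
    "TRIP_EDGE": [1, 2, 1],
    "ACTUAL_ARRIVAL_TIME": ["04:37:38", "05:03:43", "05:03:43"],
    "ACTUAL_DEPARTURE_TIME": ["04:44:08", "05:03:43", "05:16:35"],
})
print(first_stop_layover(sample))
```

For these two trip starts the result (6.5 and about 12.87 minutes) matches the `DWELL_IN_MINS` values shown in the data, which suggests that column may already hold the layover at trip starts. The layover table could then be merged back on `DATE` and `TRIP_ID` to compare against downstream headway deviation.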

How closely does lateness (ADHERENCE) correlate with headway?
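In the rows shown below, ADHERENCE is scheduled time minus actual time in minutes. For example, a bus scheduled for 15:44 that arrived at 15:56:35 has ADHERENCE -12.58, so negative means late and positive means early. Because `describe()` shows extreme values (ADHERENCE as low as -948.5, HDWY_DEV as high as 565.4), the sketch below reports a rank-based Spearman coefficient next to Pearson. The helper name is hypothetical, and the sample values are made up so that the expected result is obvious.

```python
import pandas as pd


def adherence_headway_corr(df: pd.DataFrame) -> tuple[float, float]:
    """Pearson and Spearman correlation between ADHERENCE and HDWY_DEV.

    Rows missing either value are dropped together so both coefficients
    use the same observations.
    """
    pair = df[["ADHERENCE", "HDWY_DEV"]].dropna()
    pearson = pair["ADHERENCE"].corr(pair["HDWY_DEV"])
    # Spearman = Pearson correlation of the ranks; less sensitive to extreme values
    spearman = pair["ADHERENCE"].rank().corr(pair["HDWY_DEV"].rank())
    return pearson, spearman


# made-up values where HDWY_DEV = -2 * ADHERENCE, so both coefficients are about -1;
# the last row is dropped because ADHERENCE is missing
sample = pd.DataFrame({
    "ADHERENCE": [1.5, 1.0, -2.0, -6.0, 0.5, None],
    "HDWY_DEV": [-3.0, -2.0, 4.0, 12.0, -1.0, 2.0],
})
pearson, spearman = adherence_headway_corr(sample)
print(pearson, spearman)
```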

What is the relationship between distance or time travelled since the start of a given trip and the headway deviation? Does headway become less stable the further along the route the bus has travelled?
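In the first rows of the data, ROUTE_STOP_SEQUENCE counts down (14, 10, 5, 1) along a TO DOWNTOWN trip, so it may not directly measure progress along a trip. One alternative, sketched here, is the scheduled minutes elapsed since each trip's first timepoint. This assumes a trip does not cross midnight on a single DATE, and `add_minutes_into_trip` and `MINS_INTO_TRIP` are hypothetical names.

```python
import pandas as pd


def add_minutes_into_trip(df: pd.DataFrame) -> pd.DataFrame:
    """Add MINS_INTO_TRIP: scheduled minutes since the trip's first timepoint."""
    out = df.copy()
    sched = pd.to_datetime(out["DATE"] + " " + out["SCHEDULED_TIME"])
    # the earliest scheduled time within each (DATE, TRIP_ID) marks the trip start
    trip_start = sched.groupby([out["DATE"], out["TRIP_ID"]]).transform("min")
    out["MINS_INTO_TRIP"] = (sched - trip_start).dt.total_seconds() / 60
    return out


# scheduled times for trips 345104 and 345105 from the first rows of the data
sample = pd.DataFrame({
    "DATE": ["2023-08-01"] * 5,
    "TRIP_ID": [345104, 345104, 345104, 345104, 345105],
    "SCHEDULED_TIME": ["04:42:00", "04:46:00", "04:54:00", "05:10:00", "05:15:00"],
})
print(add_minutes_into_trip(sample)["MINS_INTO_TRIP"].tolist())
# [0.0, 4.0, 12.0, 28.0, 0.0]
```

Binning MINS_INTO_TRIP with `pd.cut` and comparing the spread of HDWY_DEV across bins is one way to check whether headways become less stable later in a trip.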

How much of a factor is the driver in headway and on-time performance? The driver is identified by the OPERATOR variable.
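A first-pass, descriptive sketch: summarize each operator's mean absolute headway deviation and on-time rate, hiding operators with few records. The `min_obs` cutoff of 50 is an arbitrary assumption, `operator_summary` is a hypothetical helper, and the sample values are made up.

```python
import numpy as np
import pandas as pd


def operator_summary(df: pd.DataFrame, min_obs: int = 50) -> pd.DataFrame:
    """Per-operator headway and on-time summary, sorted by mean |HDWY_DEV|.

    min_obs is an arbitrary cutoff that hides operators with too few
    timepoint records to compare meaningfully.
    """
    by_op = df.groupby("OPERATOR")
    summary = pd.DataFrame({
        "n_records": by_op.size(),
        # rows without a headway value (NaN) are skipped by mean()
        "mean_abs_hdwy_dev": df["HDWY_DEV"].abs().groupby(df["OPERATOR"]).mean(),
        "on_time_rate": by_op["ADJUSTED_ONTIME_COUNT"].mean(),
    })
    return summary[summary["n_records"] >= min_obs].sort_values("mean_abs_hdwy_dev")


toy = pd.DataFrame({
    "OPERATOR": [1040, 1040, 1040, 2689, 2689],
    "HDWY_DEV": [-3.0, 1.0, np.nan, 4.0, -6.0],
    "ADJUSTED_ONTIME_COUNT": [1, 0, 1, 0, 0],
})
print(operator_summary(toy, min_obs=2))
```

Operators are not randomly assigned to routes and times of day, so raw differences in this table may largely reflect those assignments. A regression such as `smf.ols("HDWY_DEV ~ C(OPERATOR) + C(ROUTE_ABBR)", data=transit_df)`, using the statsmodels formula API imported above, is one possible way to adjust for route.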

How does direction of travel, route, or location affect the headway and on-time performance?
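A sketch of a route-by-direction comparison using `pivot_table`. It assumes ADJUSTED_LATE_COUNT is a 0/1 flag per record, which the `describe()` output (min 0, max 1) suggests, so its mean is the share of late records. The helper name and sample values are hypothetical.

```python
import pandas as pd


def route_direction_tables(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Median HDWY_DEV and late share for each route/direction pair."""
    median_dev = df.pivot_table(index="ROUTE_ABBR", columns="ROUTE_DIRECTION_NAME",
                                values="HDWY_DEV", aggfunc="median")
    # assuming ADJUSTED_LATE_COUNT is 0/1 per record, its mean is the late share
    late_share = df.pivot_table(index="ROUTE_ABBR", columns="ROUTE_DIRECTION_NAME",
                                values="ADJUSTED_LATE_COUNT", aggfunc="mean")
    return median_dev, late_share


toy = pd.DataFrame({
    "ROUTE_ABBR": ["BORDEAUX"] * 4,
    "ROUTE_DIRECTION_NAME": ["TO DOWNTOWN", "TO DOWNTOWN", "FROM DOWNTOWN", "FROM DOWNTOWN"],
    "HDWY_DEV": [4.0, 6.0, -4.0, -2.0],
    "ADJUSTED_LATE_COUNT": [1, 1, 1, 0],
})
median_dev, late_share = route_direction_tables(toy)
print(median_dev)
print(late_share)
```

Swapping `index="ROUTE_ABBR"` for `index="TIME_POINT_ABBR"` gives a rough per-location view of the same measures.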

How does time of day or day of week affect headway and on-time performance? Can you detect an impact of school schedule on headway deviation (for certain routes and at certain times of day)?

Does weather have any effect on headway or on-time performance? To help answer this question, the file bna_weather.csv contains historical weather data recorded at Nashville International Airport.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

from datetime import datetime, timedelta

In [2]:
transit_df = pd.read_csv(r"C:\Users\ndidi\Documents\NSS_Projects\wego-creepin-it-real-wego\Data\headway_data_with_routes.csv", index_col=0)
pd.set_option('display.max_columns', None)

transit_df.head()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS,ARRIVAL_STATUS
0,1,99457890,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MHSP,14.0,1,36.181248,-86.847705,04:42:00,04:37:38,04:44:08,-2.133333,,,,0,0,1,0,0.0,0,2,6.5,ON TIME
1,1,99457891,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,ELIZ,10.0,0,36.193454,-86.839981,04:46:00,04:48:27,04:48:27,-2.45,,,,0,0,1,0,0.0,0,9,0.0,ON TIME
2,1,99457892,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,04:54:00,04:54:56,04:54:56,-0.933333,,,,0,0,1,0,0.0,0,19,0.0,ON TIME
3,1,99457893,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MCC5_10,1.0,2,36.167091,-86.781923,05:10:00,05:03:43,05:03:43,6.283333,,,,0,0,1,0,,0,35,0.0,ON TIME
4,1,99457894,2023-08-01,BORDEAUX,2200,1040,345105,0,FROM DOWNTOWN,MCC5_10,1.0,1,36.167091,-86.781923,05:15:00,05:03:43,05:16:35,-1.583333,,,,0,0,1,0,0.0,0,36,12.866666,ON TIME


In [3]:
# concatenate date & time variables

transit_df['Scheduled_Time_with_Date'] = transit_df['DATE'] + ' ' + transit_df['SCHEDULED_TIME']
transit_df['Actual_Arrival_with_Date'] = transit_df['DATE'] + ' ' + transit_df['ACTUAL_ARRIVAL_TIME']
transit_df['Actual_Departure_with_Date'] = transit_df['DATE'] + ' ' + transit_df['ACTUAL_DEPARTURE_TIME']

transit_df.head()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS,ARRIVAL_STATUS,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date
0,1,99457890,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MHSP,14.0,1,36.181248,-86.847705,04:42:00,04:37:38,04:44:08,-2.133333,,,,0,0,1,0,0.0,0,2,6.5,ON TIME,2023-08-01 04:42:00,2023-08-01 04:37:38,2023-08-01 04:44:08
1,1,99457891,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,ELIZ,10.0,0,36.193454,-86.839981,04:46:00,04:48:27,04:48:27,-2.45,,,,0,0,1,0,0.0,0,9,0.0,ON TIME,2023-08-01 04:46:00,2023-08-01 04:48:27,2023-08-01 04:48:27
2,1,99457892,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,04:54:00,04:54:56,04:54:56,-0.933333,,,,0,0,1,0,0.0,0,19,0.0,ON TIME,2023-08-01 04:54:00,2023-08-01 04:54:56,2023-08-01 04:54:56
3,1,99457893,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MCC5_10,1.0,2,36.167091,-86.781923,05:10:00,05:03:43,05:03:43,6.283333,,,,0,0,1,0,,0,35,0.0,ON TIME,2023-08-01 05:10:00,2023-08-01 05:03:43,2023-08-01 05:03:43
4,1,99457894,2023-08-01,BORDEAUX,2200,1040,345105,0,FROM DOWNTOWN,MCC5_10,1.0,1,36.167091,-86.781923,05:15:00,05:03:43,05:16:35,-1.583333,,,,0,0,1,0,0.0,0,36,12.866666,ON TIME,2023-08-01 05:15:00,2023-08-01 05:03:43,2023-08-01 05:16:35


In [4]:
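# convert date & time variables to datetime
# note: utc=True labels these (apparently local, Central time) clock times as UTC, which is why
# the outputs below show +00:00; the hour values themselves are not shifted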
transit_df['Scheduled_Time_with_Date'] = pd.to_datetime(transit_df['Scheduled_Time_with_Date'], utc=True)
transit_df['Actual_Arrival_with_Date'] = pd.to_datetime(transit_df['Actual_Arrival_with_Date'], utc=True)
transit_df['Actual_Departure_with_Date'] = pd.to_datetime(transit_df['Actual_Departure_with_Date'], utc=True)


#transit_df['SCHEDULED_TIME'] = pd.to_datetime(transit_df['SCHEDULED_TIME']).dt.time
#transit_df['ACTUAL_ARRIVAL_TIME'] = pd.to_datetime(transit_df['ACTUAL_ARRIVAL_TIME']).dt.time
#transit_df['ACTUAL_DEPARTURE_TIME'] = pd.to_datetime(transit_df['ACTUAL_DEPARTURE_TIME']).dt.time

In [5]:
transit_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 350328 entries, 0 to 350327
Data columns (total 33 columns):
 #   Column                      Non-Null Count   Dtype              
---  ------                      --------------   -----              
 0   SERVICE_ABBR                350328 non-null  int64              
 1   ADHERENCE_ID                350328 non-null  int64              
 2   DATE                        350328 non-null  object             
 3   ROUTE_ABBR                  350328 non-null  object             
 4   BLOCK_ABBR                  350328 non-null  int64              
 5   OPERATOR                    350328 non-null  int64              
 6   TRIP_ID                     350328 non-null  int64              
 7   OVERLOAD_ID                 350328 non-null  int64              
 8   ROUTE_DIRECTION_NAME        350328 non-null  object             
 9   TIME_POINT_ABBR             350328 non-null  object             
 10  ROUTE_STOP_SEQUENCE         350317 non-null 

In [6]:
transit_df.describe()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS
count,350328.0,350328.0,350328.0,350328.0,350328.0,350328.0,350317.0,350328.0,350328.0,350328.0,338860.0,274737.0,266061.0,265892.0,350328.0,350328.0,350328.0,350328.0,279999.0,350328.0,350328.0,338857.0
mean,1.298466,100103800.0,3846.953289,1951.131054,351609.79687,0.006631,7.351302,0.579089,36.158433,-86.769952,-3.188795,18.18604,18.648567,0.518068,0.027902,0.168522,0.77084,0.015263,0.015193,0.011955,325.17328,2.963914
std,0.633101,322750.7,2042.238399,769.828707,1490.949373,0.097167,4.032993,0.793978,0.059113,0.065548,6.898852,12.261828,14.330594,7.161809,0.164694,0.37433,0.420293,0.122597,0.12232,0.108681,235.335703,7.402945
min,1.0,99457890.0,300.0,0.0,345104.0,0.0,1.0,0.0,36.048934,-86.955657,-948.533333,0.0,0.0,-64.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,-208.033333
25%,1.0,99891400.0,2211.0,1391.0,350900.0,0.0,4.0,0.0,36.127172,-86.812719,-4.566666,10.0,11.0,-2.316667,0.0,0.0,1.0,0.0,0.0,0.0,134.0,0.0
50%,1.0,100134600.0,5006.0,2012.0,352001.0,0.0,6.0,0.0,36.15387,-86.774535,-2.0,15.0,16.183333,0.033333,0.0,0.0,1.0,0.0,0.0,0.0,292.0,0.0
75%,1.0,100348100.0,5505.0,2585.0,352669.0,0.0,10.0,1.0,36.179753,-86.726914,-0.333333,20.0,23.25,2.483333,0.0,0.0,1.0,0.0,0.0,0.0,465.0,1.95
max,3.0,100702900.0,9975.0,3173.0,354106.0,4.0,17.0,2.0,36.307973,-86.636496,88.383333,503.0,590.433333,565.433333,1.0,1.0,1.0,1.0,1.0,1.0,1309.0,956.5
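The summary above shows a few values worth checking before modeling: SCHEDULED_HDWY and ACTUAL_HDWY both have a minimum of 0.0, DWELL_IN_MINS goes as low as -208.03, and ADHERENCE ranges from -948.53 to 88.38 minutes. Below is a minimal sketch for flagging such rows. `flag_suspect_rows` is a hypothetical helper, and the 60-minute adherence cutoff is an arbitrary assumption, not a WeGo rule.

```python
import numpy as np
import pandas as pd


def flag_suspect_rows(df: pd.DataFrame, max_abs_adherence: float = 60.0) -> pd.DataFrame:
    """Boolean flags for values the describe() summary suggests are suspicious.

    max_abs_adherence (minutes) is an arbitrary cutoff. NaN values compare
    as False, so missing data is not flagged here.
    """
    return pd.DataFrame({
        "zero_sched_hdwy": df["SCHEDULED_HDWY"] == 0,
        "negative_dwell": df["DWELL_IN_MINS"] < 0,
        "extreme_adherence": df["ADHERENCE"].abs() > max_abs_adherence,
    }, index=df.index)


# extreme values taken from the describe() output, plus one ordinary row
toy = pd.DataFrame({
    "SCHEDULED_HDWY": [15.0, 0.0, np.nan],
    "DWELL_IN_MINS": [6.5, -208.033333, np.nan],
    "ADHERENCE": [-2.133333, -948.533333, 88.383333],
})
print(flag_suspect_rows(toy).sum())
```

On the full data, `flag_suspect_rows(transit_df).sum()` would give a count per flag, and `transit_df[flag_suspect_rows(transit_df).any(axis=1)]` the rows to inspect.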


#### How does time of day or day of week affect headway and on-time performance? Can you detect an impact of school schedule on headway deviation (for certain routes and at certain times of day)?

In [7]:
# subset early arrivals

early_df = transit_df[transit_df.ARRIVAL_STATUS == 'EARLY']
early_df.head()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS,ARRIVAL_STATUS,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date
12,1,99457902,2023-08-01,BORDEAUX,2200,1040,345107,0,FROM DOWNTOWN,CV23,5.0,0,36.18348,-86.81422,06:25:00,06:23:30,06:23:30,1.5,15.0,11.116666,-3.883334,1,0,0,0,0.0,0,112,0.0,EARLY,2023-08-01 06:25:00+00:00,2023-08-01 06:23:30+00:00,2023-08-01 06:23:30+00:00
32,1,99457922,2023-08-01,BORDEAUX,2200,1040,345113,0,FROM DOWNTOWN,CV23,5.0,0,36.18348,-86.81422,09:26:00,09:24:44,09:24:44,1.266666,16.0,10.483333,-5.516667,1,0,0,0,0.0,0,296,0.0,EARLY,2023-08-01 09:26:00+00:00,2023-08-01 09:24:44+00:00,2023-08-01 09:24:44+00:00
38,1,99457928,2023-08-01,BORDEAUX,2200,1040,345115,0,FROM DOWNTOWN,CV23,5.0,0,36.18348,-86.81422,10:21:00,10:19:07,10:19:07,1.883333,12.0,8.85,-3.15,1,0,0,0,0.0,0,356,0.0,EARLY,2023-08-01 10:21:00+00:00,2023-08-01 10:19:07+00:00,2023-08-01 10:19:07+00:00
113,1,99458003,2023-08-01,BORDEAUX,2200,2689,345138,0,FROM DOWNTOWN,MCC5_10,1.0,1,36.167091,-86.781923,23:15:00,23:06:47,23:06:47,8.216666,30.0,19.683333,-10.316667,1,0,0,0,0.0,0,1069,0.0,EARLY,2023-08-01 23:15:00+00:00,2023-08-01 23:06:47+00:00,2023-08-01 23:06:47+00:00
134,1,99458026,2023-08-01,BORDEAUX,2201,1617,345215,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,06:24:00,06:21:57,06:21:57,2.05,15.0,11.933333,-3.066667,1,0,0,1,0.0,0,99,0.0,EARLY,2023-08-01 06:24:00+00:00,2023-08-01 06:21:57+00:00,2023-08-01 06:21:57+00:00


In [9]:
# new df with headway/time/date
headway_early_df = early_df[['Scheduled_Time_with_Date','Actual_Arrival_with_Date', 'Actual_Departure_with_Date', 'SCHEDULED_HDWY', 'ACTUAL_HDWY', 'HDWY_DEV', 'ARRIVAL_STATUS']]
headway_early_df.head()

Unnamed: 0,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ARRIVAL_STATUS
12,2023-08-01 06:25:00+00:00,2023-08-01 06:23:30+00:00,2023-08-01 06:23:30+00:00,15.0,11.116666,-3.883334,EARLY
32,2023-08-01 09:26:00+00:00,2023-08-01 09:24:44+00:00,2023-08-01 09:24:44+00:00,16.0,10.483333,-5.516667,EARLY
38,2023-08-01 10:21:00+00:00,2023-08-01 10:19:07+00:00,2023-08-01 10:19:07+00:00,12.0,8.85,-3.15,EARLY
113,2023-08-01 23:15:00+00:00,2023-08-01 23:06:47+00:00,2023-08-01 23:06:47+00:00,30.0,19.683333,-10.316667,EARLY
134,2023-08-01 06:24:00+00:00,2023-08-01 06:21:57+00:00,2023-08-01 06:21:57+00:00,15.0,11.933333,-3.066667,EARLY


In [10]:
# subset on-time arrivals
on_time_df = transit_df[transit_df.ARRIVAL_STATUS == 'ON TIME']
on_time_df.head()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS,ARRIVAL_STATUS,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date
0,1,99457890,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MHSP,14.0,1,36.181248,-86.847705,04:42:00,04:37:38,04:44:08,-2.133333,,,,0,0,1,0,0.0,0,2,6.5,ON TIME,2023-08-01 04:42:00+00:00,2023-08-01 04:37:38+00:00,2023-08-01 04:44:08+00:00
1,1,99457891,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,ELIZ,10.0,0,36.193454,-86.839981,04:46:00,04:48:27,04:48:27,-2.45,,,,0,0,1,0,0.0,0,9,0.0,ON TIME,2023-08-01 04:46:00+00:00,2023-08-01 04:48:27+00:00,2023-08-01 04:48:27+00:00
2,1,99457892,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,04:54:00,04:54:56,04:54:56,-0.933333,,,,0,0,1,0,0.0,0,19,0.0,ON TIME,2023-08-01 04:54:00+00:00,2023-08-01 04:54:56+00:00,2023-08-01 04:54:56+00:00
3,1,99457893,2023-08-01,BORDEAUX,2200,1040,345104,0,TO DOWNTOWN,MCC5_10,1.0,2,36.167091,-86.781923,05:10:00,05:03:43,05:03:43,6.283333,,,,0,0,1,0,,0,35,0.0,ON TIME,2023-08-01 05:10:00+00:00,2023-08-01 05:03:43+00:00,2023-08-01 05:03:43+00:00
4,1,99457894,2023-08-01,BORDEAUX,2200,1040,345105,0,FROM DOWNTOWN,MCC5_10,1.0,1,36.167091,-86.781923,05:15:00,05:03:43,05:16:35,-1.583333,,,,0,0,1,0,0.0,0,36,12.866666,ON TIME,2023-08-01 05:15:00+00:00,2023-08-01 05:03:43+00:00,2023-08-01 05:16:35+00:00


In [13]:
# new df with headway/time/date
headway_on_time_df = on_time_df[['Scheduled_Time_with_Date','Actual_Arrival_with_Date', 'Actual_Departure_with_Date', 'SCHEDULED_HDWY', 'ACTUAL_HDWY', 'HDWY_DEV', 'ARRIVAL_STATUS']]
headway_on_time_df.head()

Unnamed: 0,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ARRIVAL_STATUS
0,2023-08-01 04:42:00+00:00,2023-08-01 04:37:38+00:00,2023-08-01 04:44:08+00:00,,,,ON TIME
1,2023-08-01 04:46:00+00:00,2023-08-01 04:48:27+00:00,2023-08-01 04:48:27+00:00,,,,ON TIME
2,2023-08-01 04:54:00+00:00,2023-08-01 04:54:56+00:00,2023-08-01 04:54:56+00:00,,,,ON TIME
3,2023-08-01 05:10:00+00:00,2023-08-01 05:03:43+00:00,2023-08-01 05:03:43+00:00,,,,ON TIME
4,2023-08-01 05:15:00+00:00,2023-08-01 05:03:43+00:00,2023-08-01 05:16:35+00:00,,,,ON TIME


In [12]:
# subset late arrivals
late_df = transit_df[transit_df.ARRIVAL_STATUS == 'LATE']
late_df.head()

Unnamed: 0,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS,ARRIVAL_STATUS,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date
69,1,99457959,2023-08-01,BORDEAUX,2200,2374,345124,0,FROM DOWNTOWN,YGKG,13.0,2,36.203239,-86.840636,15:44:00,15:56:35,15:56:35,-12.583333,,,,0,1,0,0,,0,656,0.0,LATE,2023-08-01 15:44:00+00:00,2023-08-01 15:56:35+00:00,2023-08-01 15:56:35+00:00
78,1,99457968,2023-08-01,BORDEAUX,2200,2689,345127,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,17:14:00,17:20:10,17:20:10,-6.166666,12.0,16.683333,4.683333,0,1,0,0,0.0,0,732,0.0,LATE,2023-08-01 17:14:00+00:00,2023-08-01 17:20:10+00:00,2023-08-01 17:20:10+00:00
83,1,99457973,2023-08-01,BORDEAUX,2200,2689,345128,0,FROM DOWNTOWN,HPKL,12.0,0,36.218706,-86.834137,18:06:00,18:13:16,18:13:16,-7.266666,49.0,45.016666,-3.983334,0,1,0,0,0.0,0,785,0.0,LATE,2023-08-01 18:06:00+00:00,2023-08-01 18:13:16+00:00,2023-08-01 18:13:16+00:00
91,1,99457981,2023-08-01,BORDEAUX,2200,2689,345131,0,TO DOWNTOWN,MHSP,14.0,1,36.181248,-86.847705,19:35:00,19:19:31,19:41:31,-6.516666,50.0,54.166666,4.166666,0,1,0,0,0.0,0,851,22.0,LATE,2023-08-01 19:35:00+00:00,2023-08-01 19:19:31+00:00,2023-08-01 19:41:31+00:00
92,1,99457982,2023-08-01,BORDEAUX,2200,2689,345131,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,19:44:00,19:50:44,19:50:44,-6.733333,25.0,30.816666,5.816666,0,1,0,0,0.0,0,867,0.0,LATE,2023-08-01 19:44:00+00:00,2023-08-01 19:50:44+00:00,2023-08-01 19:50:44+00:00


In [14]:
# new df with headway/time/date
headway_late_df = late_df[['Scheduled_Time_with_Date','Actual_Arrival_with_Date', 'Actual_Departure_with_Date', 'SCHEDULED_HDWY', 'ACTUAL_HDWY', 'HDWY_DEV', 'ARRIVAL_STATUS']]
headway_late_df.head()

Unnamed: 0,Scheduled_Time_with_Date,Actual_Arrival_with_Date,Actual_Departure_with_Date,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ARRIVAL_STATUS
69,2023-08-01 15:44:00+00:00,2023-08-01 15:56:35+00:00,2023-08-01 15:56:35+00:00,,,,LATE
78,2023-08-01 17:14:00+00:00,2023-08-01 17:20:10+00:00,2023-08-01 17:20:10+00:00,12.0,16.683333,4.683333,LATE
83,2023-08-01 18:06:00+00:00,2023-08-01 18:13:16+00:00,2023-08-01 18:13:16+00:00,49.0,45.016666,-3.983334,LATE
91,2023-08-01 19:35:00+00:00,2023-08-01 19:19:31+00:00,2023-08-01 19:41:31+00:00,50.0,54.166666,4.166666,LATE
92,2023-08-01 19:44:00+00:00,2023-08-01 19:50:44+00:00,2023-08-01 19:50:44+00:00,25.0,30.816666,5.816666,LATE
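Instead of keeping separate early, on-time, and late subsets, one option is to group the full `transit_df` by scheduled hour and weekday. The sketch below does that. The helper name and the `HOUR` and `DAY_OF_WEEK` columns are hypothetical, and the sample rows reuse scheduled times shown above (2023-08-01 was a Tuesday).

```python
import numpy as np
import pandas as pd


def summarize_by_time(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Arrival-status mix by scheduled hour, and HDWY_DEV stats by weekday and hour.

    Hours are the clock values stored in Scheduled_Time_with_Date; the +00:00
    label added by utc=True does not shift them.
    """
    sched = pd.to_datetime(df["Scheduled_Time_with_Date"])
    out = df.assign(HOUR=sched.dt.hour, DAY_OF_WEEK=sched.dt.day_name())
    # share of EARLY / ON TIME / LATE records within each scheduled hour
    status_share = pd.crosstab(out["HOUR"], out["ARRIVAL_STATUS"], normalize="index")
    hdwy_stats = out.groupby(["DAY_OF_WEEK", "HOUR"])["HDWY_DEV"].agg(["count", "mean", "std"])
    return status_share, hdwy_stats


toy = pd.DataFrame({
    "Scheduled_Time_with_Date": ["2023-08-01 06:25:00", "2023-08-01 06:24:00",
                                 "2023-08-01 06:42:00", "2023-08-01 15:44:00"],
    "ARRIVAL_STATUS": ["EARLY", "EARLY", "ON TIME", "LATE"],
    "HDWY_DEV": [-3.883334, -3.066667, np.nan, np.nan],
})
status_share, hdwy_stats = summarize_by_time(toy)
print(status_share)
print(hdwy_stats)
```

Running `summarize_by_time(transit_df)`, optionally after filtering to one ROUTE_ABBR, gives tables that can be plotted with matplotlib to look for peak-hour or school-hour patterns.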
