In [1]:
import pandas as pd

In [2]:
wego = pd.read_csv("../data/headway.csv")

wego.head()

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,...,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,DWELL_IN_MINS,SCHEDULED_LAYOVER_MINUTES
0,120230801,1,99457890,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,6.5,
1,120230801,1,99457891,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,0.0,
2,120230801,1,99457892,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,0.0,
3,120230801,1,99457893,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,,0,0.0,
4,120230801,1,99457894,2023-08-01,22,2200,1040,345105,0,FROM DOWNTOWN,...,,,0,0,1,0,0.0,0,12.866666,5.0


In [3]:
wego.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350329 entries, 0 to 350328
Data columns (total 30 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   CALENDAR_ID                350329 non-null  int64  
 1   SERVICE_ABBR               350329 non-null  int64  
 2   ADHERENCE_ID               350329 non-null  int64  
 3   DATE                       350329 non-null  object 
 4   ROUTE_ABBR                 350329 non-null  int64  
 5   BLOCK_ABBR                 350329 non-null  int64  
 6   OPERATOR                   350329 non-null  int64  
 7   TRIP_ID                    350329 non-null  int64  
 8   OVERLOAD_ID                350329 non-null  int64  
 9   ROUTE_DIRECTION_NAME       350329 non-null  object 
 10  TIME_POINT_ABBR            350329 non-null  object 
 11  ROUTE_STOP_SEQUENCE        350318 non-null  float64
 12  TRIP_EDGE                  350329 non-null  int64  
 13  LATITUDE                   35

In [5]:
wego.shape

(350329, 30)

Headway is the amount of time between a bus and the previous bus at the same stop. In the dataset, the scheduled headway is stored in the SCHEDULED_HDWY column. It is the difference between a bus's scheduled time at a particular stop and the scheduled time of the previous bus at that same stop.
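A minimal sketch of how a scheduled headway like SCHEDULED_HDWY could be derived from SCHEDULED_TIME. It assumes that "the previous bus at the same stop" means the previous scheduled trip with the same DATE, ROUTE_ABBR, ROUTE_DIRECTION_NAME and TIME_POINT_ABBR. The grouping keys, the `*_CALC` column names and the toy rows are all hypothetical; the source does not say how WeGo computes the column.

```python
import pandas as pd

# Hypothetical toy rows: three buses on one route/direction at the same time point
df = pd.DataFrame({
    "DATE": ["2023-08-01"] * 3,
    "ROUTE_ABBR": [22, 22, 22],
    "ROUTE_DIRECTION_NAME": ["TO DOWNTOWN"] * 3,
    "TIME_POINT_ABBR": ["ELIZ"] * 3,
    "SCHEDULED_TIME": pd.to_datetime([
        "2023-08-01 04:46:00", "2023-08-01 05:16:00", "2023-08-01 05:46:00",
    ]),
    "ACTUAL_DEPARTURE_TIME": pd.to_datetime([
        "2023-08-01 04:48:27", "2023-08-01 05:10:00", "2023-08-01 05:50:00",
    ]),
})

# Assumed grouping: same service date, route, direction and time point
keys = ["DATE", "ROUTE_ABBR", "ROUTE_DIRECTION_NAME", "TIME_POINT_ABBR"]
df = df.sort_values(keys + ["SCHEDULED_TIME"])
groups = df.groupby(keys)

# Minutes since the previous bus in the same group (NaN for the first bus)
df["SCHED_HDWY_CALC"] = groups["SCHEDULED_TIME"].diff().dt.total_seconds() / 60
df["ACTUAL_HDWY_CALC"] = groups["ACTUAL_DEPARTURE_TIME"].diff().dt.total_seconds() / 60

# Assumed sign convention: negative = shorter than scheduled (bunching)
df["HDWY_DEV_CALC"] = df["ACTUAL_HDWY_CALC"] - df["SCHED_HDWY_CALC"]
print(df[["SCHEDULED_TIME", "SCHED_HDWY_CALC", "ACTUAL_HDWY_CALC", "HDWY_DEV_CALC"]])
```

Because rows are ordered by scheduled time, each bus is paired with the bus scheduled before it, even if the two swapped order on the road.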

This dataset contains a column HDWY_DEV, which shows the deviation from the scheduled headway. Bunching occurs when the headway is shorter than scheduled, which appears as a negative HDWY_DEV value. Gapping occurs when the headway is longer than scheduled, which appears as a positive HDWY_DEV value. You can calculate the headway deviation percentage as HDWY_DEV/SCHEDULED_HDWY. The generally accepted range for actual headway is 50% to 150% of the scheduled headway (a deviation of at most ±50%), so if the scheduled headway is 10 minutes, a deviation of up to 5 minutes in either direction would be acceptable (but not ideal).
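The percentage and the 50% to 150% band can be sketched as below. The example values are hypothetical, the labels `bunched`/`gapped`/`acceptable` are illustrative names, and treating exactly ±50% as still acceptable is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical example values in minutes (not taken from the dataset)
df = pd.DataFrame({
    "SCHEDULED_HDWY": [10.0, 10.0, 10.0, 15.0, np.nan],
    "HDWY_DEV":       [-6.0, -5.0, 3.0, 9.0, 2.0],
})

# Headway deviation as a fraction of the scheduled headway
df["HDWY_DEV_PCT"] = df["HDWY_DEV"] / df["SCHEDULED_HDWY"]

# Outside -50%..+50% counts as bunched or gapped; NaN comparisons are False,
# so rows without a scheduled headway fall through to "unknown"
df["HDWY_STATUS"] = np.select(
    [df["HDWY_DEV_PCT"] < -0.5, df["HDWY_DEV_PCT"] > 0.5, df["HDWY_DEV_PCT"].notna()],
    ["bunched", "gapped", "acceptable"],
    default="unknown",
)
print(df)
```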

Another important variable is adherence, stored in the ADHERENCE column, which compares the actual departure time to the scheduled time. A negative adherence value means that a bus left a time point late, and a positive value means that it left early. Buses with adherence below -6 minutes are generally considered late, and those above +1 minute are considered early.

However, staff apply additional waiver logic. Early departures are allowed in some cases, such as an express bus that has already picked up everyone at a park-and-ride lot and is only dropping people off at the remaining stops. Early timepoint records are also allowed for all records where TRIP_EDGE = 2 (end of trip), since it is not a problem if a bus ends its trip early as long as it didn't pass other timepoints early along the way.

Note: When determining whether a bus is early or late, use the 'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', and 'ADJUSTED_ONTIME_COUNT' columns so that these adjustments are taken into account.
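As a sanity check on the definitions, the sketch below recomputes adherence as (scheduled time − actual departure) in minutes and applies the raw thresholds plus the TRIP_EDGE = 2 waiver. The four rows are copied from the output further down (row labels 0, 3, 350324, 350326). `ADHERENCE_CALC` and `STATUS` are hypothetical column names, and boundary inclusivity is an assumption. Other waivers (such as express buses) cannot be reproduced from these columns, which is why the ADJUSTED_* columns remain the ones to use.

```python
import numpy as np
import pandas as pd

# Four rows copied from the notebook output (labels 0, 3, 350324, 350326)
df = pd.DataFrame({
    "TRIP_EDGE": [1, 2, 0, 1],
    "SCHEDULED_TIME": pd.to_datetime([
        "2023-08-01 04:42:00", "2023-08-01 05:10:00",
        "2023-09-30 22:23:00", "2023-09-30 22:45:00",
    ]),
    "ACTUAL_DEPARTURE_TIME": pd.to_datetime([
        "2023-08-01 04:44:08", "2023-08-01 05:03:43",
        "2023-09-30 22:31:26", "2023-09-30 22:49:19",
    ]),
})

# Adherence in minutes: positive = left early, negative = left late
df["ADHERENCE_CALC"] = (
    df["SCHEDULED_TIME"] - df["ACTUAL_DEPARTURE_TIME"]
).dt.total_seconds() / 60

# Raw thresholds: below -6 is late, above +1 is early,
# except that early departures at the end of a trip (TRIP_EDGE == 2) are waived
late = df["ADHERENCE_CALC"] < -6
early = (df["ADHERENCE_CALC"] > 1) & (df["TRIP_EDGE"] != 2)
df["STATUS"] = np.select([late, early], ["late", "early"], default="on time")
print(df[["TRIP_EDGE", "ADHERENCE_CALC", "STATUS"]])
```

For these four rows the result agrees with the ADJUSTED_* flags shown in the output (on time, on time, late, on time).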

#### Goals of this project:

1.) What is the overall on-time performance, and what do the overall distributions of adherence and headway deviation look like?

Calculate the headway deviation percentage as HDWY_DEV/SCHEDULED_HDWY.

2.) How does direction of travel, route, or location affect the headway and on-time performance?

3.) How does time of day or day of week affect headway and on-time performance?

4.) How much does the driver affect headway and on-time performance? The driver is indicated by the OPERATOR variable.

5.) Is there any relationship between lateness (ADHERENCE) and headway deviation?
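Goals 2 to 5 share a pattern: group by a dimension and compare on-time rates, then correlate ADHERENCE with HDWY_DEV. A minimal sketch on hypothetical toy rows (the values, `on_time_pct` helper, and `HOUR`/`WEEKDAY` columns are all illustrative). It takes the hour from SCHEDULED_TIME and the weekday from DATE, because trips after midnight keep the previous day's DATE in this dataset. Spearman is computed as Pearson correlation on ranks, which avoids needing SciPy.

```python
import pandas as pd

# Hypothetical toy rows standing in for `wego`; column names follow the dataset
df = pd.DataFrame({
    "DATE": ["2023-08-01", "2023-08-01", "2023-08-02",
             "2023-08-05", "2023-08-05", "2023-08-06"],
    "SCHEDULED_TIME": ["2023-08-01 04:42:00", "2023-08-01 08:10:00",
                       "2023-08-02 17:30:00", "2023-08-05 09:00:00",
                       "2023-08-05 17:45:00", "2023-08-06 22:23:00"],
    "ROUTE_ABBR": [22, 22, 22, 7, 7, 7],
    "ROUTE_DIRECTION_NAME": ["TO DOWNTOWN", "FROM DOWNTOWN"] * 3,
    "OPERATOR": [1040, 1040, 2001, 2001, 3002, 3002],
    "ADJUSTED_ONTIME_COUNT": [1, 1, 0, 1, 0, 1],
    "ADHERENCE": [-2.1, 0.5, -8.4, -1.0, -7.0, -3.0],
    "HDWY_DEV": [1.0, -0.5, 6.0, 0.0, 5.0, 2.0],
})

# read_csv loads date/time columns as strings unless parse_dates is used
df["DATE"] = pd.to_datetime(df["DATE"])
df["SCHEDULED_TIME"] = pd.to_datetime(df["SCHEDULED_TIME"])


def on_time_pct(frame, by):
    """Percentage of rows with ADJUSTED_ONTIME_COUNT == 1 in each group."""
    return frame.groupby(by)["ADJUSTED_ONTIME_COUNT"].mean().mul(100)


# Goal 2: route and direction
by_route_dir = on_time_pct(df, ["ROUTE_ABBR", "ROUTE_DIRECTION_NAME"])

# Goal 3: hour of the scheduled time, weekday of the service date
df["HOUR"] = df["SCHEDULED_TIME"].dt.hour
df["WEEKDAY"] = df["DATE"].dt.day_name()
by_hour = on_time_pct(df, "HOUR")
by_weekday = on_time_pct(df, "WEEKDAY")

# Goal 4: operator
by_operator = on_time_pct(df, "OPERATOR")

# Goal 5: Pearson correlation, plus Spearman as Pearson on ranks
pearson = df["ADHERENCE"].corr(df["HDWY_DEV"])
spearman = df["ADHERENCE"].rank().corr(df["HDWY_DEV"].rank())
print(by_route_dir, by_hour, by_weekday, by_operator, pearson, spearman, sep="\n\n")
```

For the operator comparison in particular, operators with only a handful of records will produce noisy rates, so it may help to also report group sizes (for example with `.agg(["mean", "count"])`) before drawing conclusions.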

In [7]:
wego[[
        'DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR',
        'TIME_POINT_ABBR', 'TRIP_EDGE',
        'SCHEDULED_TIME', 'ACTUAL_DEPARTURE_TIME', 'ADHERENCE',
        'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', 'ADJUSTED_ONTIME_COUNT'
    ]]

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
0,2023-08-01,120230801,345104,22,MHSP,1,2023-08-01 04:42:00,2023-08-01 04:44:08,-2.133333,0,0,1
1,2023-08-01,120230801,345104,22,ELIZ,0,2023-08-01 04:46:00,2023-08-01 04:48:27,-2.450000,0,0,1
2,2023-08-01,120230801,345104,22,CV23,0,2023-08-01 04:54:00,2023-08-01 04:54:56,-0.933333,0,0,1
3,2023-08-01,120230801,345104,22,MCC5_10,2,2023-08-01 05:10:00,2023-08-01 05:03:43,6.283333,0,0,1
4,2023-08-01,120230801,345105,22,MCC5_10,1,2023-08-01 05:15:00,2023-08-01 05:16:35,-1.583333,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...
350324,2023-09-30,120230930,353448,7,21BK,0,2023-09-30 22:23:00,2023-09-30 22:31:26,-8.433333,0,1,0
350325,2023-09-30,120230930,353448,7,MCC5_9,2,2023-09-30 22:38:00,2023-09-30 22:49:18,-11.300000,0,1,0
350326,2023-09-30,120230930,353449,7,MCC5_9,1,2023-09-30 22:45:00,2023-09-30 22:49:19,-4.316666,0,0,1
350327,2023-09-30,120230930,353449,7,21BK,0,2023-09-30 22:59:00,2023-09-30 23:21:05,-22.083333,0,1,0


## overall on-time performance

In [12]:
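# percentage of all rows flagged on time (len(wego) == 350329, so this avoids hard-coding the row count)
wego['ADJUSTED_ONTIME_COUNT'].sum() / len(wego) * 100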

77.08411236295025
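The same calculation extends to the early and late flags. The sketch below reports all three shares side by side. It also counts rows with none of the three flags set, if any exist, since such rows still sit in the denominator of the figure above. The toy flags are hypothetical; whether unflagged rows occur in `wego` is not shown here.

```python
import pandas as pd

# Hypothetical toy flags; in the notebook this would be `wego`
df = pd.DataFrame({
    "ADJUSTED_EARLY_COUNT":  [0, 1, 0, 0, 0],
    "ADJUSTED_LATE_COUNT":   [0, 0, 1, 0, 0],
    "ADJUSTED_ONTIME_COUNT": [1, 0, 0, 1, 0],
})
flags = ["ADJUSTED_EARLY_COUNT", "ADJUSTED_LATE_COUNT", "ADJUSTED_ONTIME_COUNT"]

# Percentage of all rows carrying each flag (same denominator as above)
shares = df[flags].mean().mul(100)

# Rows with no flag at all; excluding them changes the denominator
unflagged = df[flags].sum(axis=1) == 0
ontime_of_flagged = df.loc[~unflagged, "ADJUSTED_ONTIME_COUNT"].mean() * 100
print(shares)
print(unflagged.sum(), ontime_of_flagged)
```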

## distribution of adherence
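One way to summarize the adherence distribution is `describe()` plus binned counts, with bin edges at the -6 and +1 thresholds from the text. The values below are copied from the nine rows displayed earlier; in the notebook you would pass `wego["ADHERENCE"]` instead. The other bin edges are arbitrary choices. With pandas' default right-closed bins, a value of exactly -6 lands in the (-15, -6] bin.

```python
import numpy as np
import pandas as pd

# ADHERENCE values from the nine rows displayed earlier in the notebook
adherence = pd.Series([-2.133333, -2.45, -0.933333, 6.283333, -1.583333,
                       -8.433333, -11.3, -4.316666, -22.083333], name="ADHERENCE")

# Count, mean, std and quartiles
summary = adherence.describe()

# Binned counts in bin order, including empty bins
edges = [-np.inf, -15, -6, 0, 1, 5, np.inf]
binned = pd.cut(adherence, bins=edges).value_counts(sort=False)
print(summary)
print(binned)
```

With matplotlib installed, `wego["ADHERENCE"].plot.hist(bins=50)` gives a visual version of the same distribution.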

## distribution of headway
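A comparable sketch for headway uses the deviation percentage HDWY_DEV/SCHEDULED_HDWY. Rows without a scheduled headway cannot be scored and are dropped. The rows are hypothetical, and the inner bin edges at ±10% are an arbitrary choice; the ±50% edges correspond to the accepted 50% to 150% band.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; in the notebook use wego[["HDWY_DEV", "SCHEDULED_HDWY"]]
df = pd.DataFrame({
    "SCHEDULED_HDWY": [10.0, 10.0, 15.0, 30.0, 20.0, np.nan],
    "HDWY_DEV":       [-7.0, -2.0, 1.2, 20.0, 0.0, 3.0],
})

# Deviation as a fraction of scheduled headway; rows missing either value are dropped
pct = (df["HDWY_DEV"] / df["SCHEDULED_HDWY"]).dropna()

summary = pct.describe()
# Percentage of scored rows inside the accepted band (inclusive at both ends)
within_range = pct.between(-0.5, 0.5).mean() * 100
binned = pd.cut(pct, bins=[-np.inf, -0.5, -0.1, 0.1, 0.5, np.inf]).value_counts(sort=False)
print(summary)
print(f"{within_range:.1f}% within +/-50% of scheduled headway")
print(binned)
```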

In [24]:
wego[['DATE', 'TRIP_ID', 'TIME_POINT_ABBR','ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 
      'SCHEDULED_TIME', 'SCHEDULED_HDWY',
      'ACTUAL_DEPARTURE_TIME', 'ACTUAL_HDWY', 'HDWY_DEV'
     ]].sort_values(['DATE', 'SCHEDULED_TIME'])

Unnamed: 0,DATE,TRIP_ID,TIME_POINT_ABBR,ROUTE_DIRECTION_NAME,TRIP_EDGE,SCHEDULED_TIME,SCHEDULED_HDWY,ACTUAL_DEPARTURE_TIME,ACTUAL_HDWY,HDWY_DEV
742,2023-08-01,345498,DWMRT,FROM DOWNTOWN,1,2023-08-01 04:20:00,,2023-08-01 04:20:43,,
3853,2023-08-01,347480,HHWM,TO DOWNTOWN,1,2023-08-01 04:23:00,,2023-08-01 04:23:54,,
743,2023-08-01,345498,EDBC,FROM DOWNTOWN,0,2023-08-01 04:29:00,,2023-08-01 04:28:24,,
3854,2023-08-01,347480,MXBELL,TO DOWNTOWN,0,2023-08-01 04:30:00,,2023-08-01 04:30:37,,
4999,2023-08-01,347902,GXRVRGAT,TO DOWNTOWN,1,2023-08-01 04:30:00,,2023-08-01 04:31:17,,
...,...,...,...,...,...,...,...,...,...,...
348271,2023-09-30,352036,HCKP,FROM DOWNTOWN,0,2023-10-01 00:57:00,30.0,,,
349317,2023-09-30,352588,HHWM,FROM DOWNTOWN,2,2023-10-01 01:02:00,,2023-10-02 01:18:58,,
347118,2023-09-30,350940,MP&R,FROM DOWNTOWN,2,2023-10-01 01:04:00,,2023-10-02 01:10:03,,
348272,2023-09-30,352036,LINWAL,FROM DOWNTOWN,0,2023-10-01 01:08:00,30.0,,,
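In the last rows above, ACTUAL_DEPARTURE_TIME falls on 2023-10-02 although SCHEDULED_TIME is around 01:00 on 2023-10-01, so those departures appear roughly 24 hours late. This may be a date-rollover problem in the source data, but that is an assumption; the cause isn't documented. A minimal check that flags such rows is sketched below, using rows copied from the output above. Note that the offset here is actual minus scheduled (lateness positive), the opposite sign of ADHERENCE, and the 12-hour cutoff is an arbitrary assumption.

```python
import pandas as pd

# Rows copied from the output above (labels 742, 348271, 349317, 347118)
df = pd.DataFrame({
    "SCHEDULED_TIME": ["2023-08-01 04:20:00", "2023-10-01 00:57:00",
                       "2023-10-01 01:02:00", "2023-10-01 01:04:00"],
    "ACTUAL_DEPARTURE_TIME": ["2023-08-01 04:20:43", None,
                              "2023-10-02 01:18:58", "2023-10-02 01:10:03"],
}, index=[742, 348271, 349317, 347118])

sched = pd.to_datetime(df["SCHEDULED_TIME"])
actual = pd.to_datetime(df["ACTUAL_DEPARTURE_TIME"])  # None becomes NaT

# Minutes after (positive) or before (negative) the scheduled time; NaN if no departure
offset_min = (actual - sched).dt.total_seconds() / 60

# Flag implausibly large offsets (NaN compares as False, so missing rows are not flagged)
suspect = offset_min.abs() > 12 * 60
print(offset_min)
print(df[suspect])
```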


In [None]:
# headway deviation percentage = HDWY_DEV / SCHEDULED_HDWY (NaN where either value is missing)
wego['HDWY_DEV_PCT'] = wego['HDWY_DEV'] / wego['SCHEDULED_HDWY']
