## WeGo Data Introduction

In [2]:
import pandas as pd

In [152]:
wego = pd.read_csv("../data/Route 50 Timepoint and Headway Data, 1-1-2023 through 5-12-2025.csv")

All the data that you have been provided is from Route 50, Charlotte Pike.

In [154]:
wego['ROUTE_ABBR'].value_counts().sort_index()

ROUTE_ABBR
50    618998
Name: count, dtype: int64

A trip is uniquely identified by the DATE (or CALENDAR_ID) together with the TRIP_ID.  
**Warning:** A TRIP_ID refers to a route and scheduled time, so the same TRIP_ID is reused across multiple days.
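
As a small illustration of why both columns are needed, the sketch below builds a tiny hand-made frame (values taken from the Route 50 examples in this notebook, not the full dataset; the name `sample` is just for illustration) and counts trips with and without CALENDAR_ID in the key.

```python
import pandas as pd

# Tiny illustrative frame: the same TRIP_ID on two service days
sample = pd.DataFrame({
    'CALENDAR_ID': [120230101, 120230101, 120230102, 120230102],
    'TRIP_ID': [332422, 332422, 332422, 332422],
    'TIME_POINT_ABBR': ['WALM', 'MCC5_1', 'WALM', 'MCC5_1'],
})

# Grouping on TRIP_ID alone merges both days into a single "trip"
trips_by_id_only = sample.groupby('TRIP_ID').ngroups

# Grouping on CALENDAR_ID plus TRIP_ID keeps the two days separate
trips_by_day_and_id = sample.groupby(['CALENDAR_ID', 'TRIP_ID']).ngroups

print(trips_by_id_only, trips_by_day_and_id)
```

The same two-column key can be passed to `groupby` on the full `wego` frame when computing per-trip summaries.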

The data contains multiple **time points** for each trip. There are more stops along the route than time points, but the time points are the points with specific scheduled times the bus operators must adhere to.

The first stop of a trip has a TRIP_EDGE of 1, the last has a TRIP_EDGE of 2, and the intermediate stops are TRIP_EDGE 0. 

Here is the first trip in the dataset. It occurred on January 1, 2023 and was scheduled to start at 5:34 and end at 6:05.

In [156]:
(
    wego
    .loc[wego['CALENDAR_ID'] == 120230101]
    .loc[wego['TRIP_ID'] == 332422]
    [['DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR', 'TIME_POINT_ABBR', 'TRIP_EDGE', 'SCHEDULED_TIME']]
)

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME
0,2023-01-01,120230101,332422,50,WALM,1,05:34:00
1,2023-01-01,120230101,332422,50,HLWD,0,05:40:00
2,2023-01-01,120230101,332422,50,WHBG,0,05:47:00
3,2023-01-01,120230101,332422,50,CH46,0,05:50:00
4,2023-01-01,120230101,332422,50,28&CHARL,0,05:54:00
5,2023-01-01,120230101,332422,50,MCC5_1,2,06:05:00


Note that the same TRIP_ID appears on the following day with a different CALENDAR_ID.

In [158]:
(
    wego
    .loc[wego['CALENDAR_ID'] == 120230102]
    .loc[wego['TRIP_ID'] == 332422]
    [['DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR', 'TIME_POINT_ABBR', 'TRIP_EDGE', 'SCHEDULED_TIME']]
)

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME
516,2023-01-02,120230102,332422,50,WALM,1,05:34:00
517,2023-01-02,120230102,332422,50,HLWD,0,05:40:00
518,2023-01-02,120230102,332422,50,WHBG,0,05:47:00
519,2023-01-02,120230102,332422,50,CH46,0,05:50:00
520,2023-01-02,120230102,332422,50,28&CHARL,0,05:54:00
521,2023-01-02,120230102,332422,50,MCC5_1,2,06:05:00


**Adherence** refers to the difference between scheduled time and the actual time that the bus departs from a stop.

A negative value for ADHERENCE indicates that the bus is late, and a positive value indicates that the bus is early. Values are in minutes.
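
The sign convention can be checked by hand. The sketch below takes the scheduled and actual departure times of one time point in this dataset (trip 332423 on 2023-01-01, first time point) and computes scheduled minus actual in minutes. This appears to match the ADHERENCE column for that row, though the exact calculation used upstream is an assumption.

```python
import pandas as pd

# Values copied from the first time point of trip 332423 on 2023-01-01
scheduled = pd.Timestamp('2023-01-01 06:15:00')
actual_departure = pd.Timestamp('2023-01-01 06:20:12')

# Scheduled minus actual, in minutes: negative when the bus leaves late
adherence_minutes = (scheduled - actual_departure).total_seconds() / 60
print(round(adherence_minutes, 2))
```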

Generally, an adherence value less than -6 is considered late, and greater than 1 is considered early, but there are some exceptions. For example, a positive adherence for the end of a trip (TRIP_EDGE 2) is not considered early, since it is not a problem if a bus ends its trip early as long as it didn't pass other timepoints early along the way. You can check whether a trip was considered on-time, early, or late using the ADJUSTED_EARLY_COUNT, ADJUSTED_LATE_COUNT, and ADJUSTED_ONTIME_COUNT columns.
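
The general thresholds can be sketched as a small labeling function. `classify_adherence` is a hypothetical helper written for this illustration, not WeGo's official logic, and it only encodes the one exception described above. The ADJUSTED_* columns remain the authoritative source, and some rows in the data are counted differently from what this simple rule would give.

```python
import pandas as pd

def classify_adherence(adherence, trip_edge, late_cutoff=-6, early_cutoff=1):
    """Rough label from the general thresholds (hypothetical helper)."""
    if pd.isna(adherence):
        return 'unknown'
    if adherence < late_cutoff:
        return 'late'
    # Being early at the final time point (TRIP_EDGE 2) is not counted as early
    if adherence > early_cutoff and trip_edge != 2:
        return 'early'
    return 'on-time'

# ADHERENCE / TRIP_EDGE pairs copied from example rows in this notebook
sample = pd.DataFrame({
    'ADHERENCE': [-5.2, -7.733333, 4.933333, 2.983333],
    'TRIP_EDGE': [1, 1, 1, 2],
})
sample['LABEL'] = [
    classify_adherence(a, e)
    for a, e in zip(sample['ADHERENCE'], sample['TRIP_EDGE'])
]
print(sample)
```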

Here is an example of a trip where all time points are considered on time. Notice that at the end of the trip, the bus was almost 3 minutes early, but was still counted as on-time since this was a trip edge of 2. Also, the bus departed the first time point more than 5 minutes late (ADHERENCE of -5.2) but was still considered on-time, since it did not exceed the 6-minute late threshold.

In [160]:
(
    wego
    .loc[wego['CALENDAR_ID'] == 120230101]
    .loc[wego['TRIP_ID'] == 332423]
    [[
        'DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR',
        'TIME_POINT_ABBR', 'TRIP_EDGE',
        'SCHEDULED_TIME', 'ACTUAL_DEPARTURE_TIME', 'ADHERENCE',
        'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', 'ADJUSTED_ONTIME_COUNT'
    ]]
)

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
6,2023-01-01,120230101,332423,50,MCC5_1,1,06:15:00,06:20:12,-5.2,0,0,1
7,2023-01-01,120230101,332423,50,28&CHARL,0,06:25:00,06:26:55,-1.916666,0,0,1
8,2023-01-01,120230101,332423,50,CH46,0,06:29:00,06:31:29,-2.483333,0,0,1
9,2023-01-01,120230101,332423,50,WHBG,0,06:33:00,06:35:12,-2.2,0,0,1
10,2023-01-01,120230101,332423,50,HLWD,0,06:40:00,06:40:41,-0.683333,0,0,1
11,2023-01-01,120230101,332423,50,WALM,2,06:47:00,06:44:01,2.983333,0,0,1


Here's an example of another trip later that same day that was considered late at the first time point, departing almost 8 minutes behind schedule.

In [162]:
(
    wego
    .loc[wego['CALENDAR_ID'] == 120230101]
    .loc[wego['TRIP_ID'] == 332493]
    [[
        'DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR',
        'TIME_POINT_ABBR', 'TRIP_EDGE',
        'SCHEDULED_TIME', 'ACTUAL_DEPARTURE_TIME', 'ADHERENCE',
        'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', 'ADJUSTED_ONTIME_COUNT'
    ]]
)

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
210,2023-01-01,120230101,332493,50,MCC5_1,1,14:35:00,14:42:44,-7.733333,0,1,0
211,2023-01-01,120230101,332493,50,28&CHARL,0,14:45:00,14:49:29,-4.483333,0,0,1
212,2023-01-01,120230101,332493,50,CH46,0,14:49:00,14:54:25,-5.416666,0,0,1
213,2023-01-01,120230101,332493,50,WHBG,0,14:53:00,14:57:02,-4.033333,0,0,1
214,2023-01-01,120230101,332493,50,HLWD,0,15:01:00,15:03:04,-2.066666,0,0,1
215,2023-01-01,120230101,332493,50,WALM,2,15:10:00,15:06:38,3.366666,0,0,1


This trip departed nearly 5 minutes early from the first time point, so that time point is counted as early.

In [164]:
(
    wego
    .loc[wego['CALENDAR_ID'] == 120230101]
    .loc[wego['TRIP_ID'] == 332431]
    [[
        'DATE', 'CALENDAR_ID', 'TRIP_ID', 'ROUTE_ABBR',
        'TIME_POINT_ABBR', 'TRIP_EDGE',
        'SCHEDULED_TIME', 'ACTUAL_DEPARTURE_TIME', 'ADHERENCE',
        'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', 'ADJUSTED_ONTIME_COUNT'
    ]]
)

Unnamed: 0,DATE,CALENDAR_ID,TRIP_ID,ROUTE_ABBR,TIME_POINT_ABBR,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
54,2023-01-01,120230101,332431,50,MCC5_1,1,12:15:00,12:10:04,4.933333,1,0,0
55,2023-01-01,120230101,332431,50,28&CHARL,0,12:25:00,12:27:55,-2.916666,0,0,1
56,2023-01-01,120230101,332431,50,CH46,0,12:29:00,12:30:48,-1.8,0,0,1
57,2023-01-01,120230101,332431,50,WHBG,0,12:34:00,12:33:37,0.383333,0,0,1
58,2023-01-01,120230101,332431,50,HLWD,0,12:42:00,12:41:42,0.3,0,0,1
59,2023-01-01,120230101,332431,50,WALM,2,12:50:00,12:48:24,1.6,0,0,1


**Headway** is the amount of time between a bus and the prior bus at the same stop. In the dataset, the scheduled headway is contained in the SCHEDULED_HDWY column and is the difference between the scheduled time for a particular stop and the scheduled time for the previous bus at that same stop.

This dataset contains a column HDWY_DEV, which shows the amount of deviation from the scheduled headway. **Bunching** occurs when there is shorter headway than scheduled, which would appear as a negative HDWY_DEV value. **Gapping** is when there is more headway than scheduled and appears as a positive value in the HDWY_DEV column. Note that you can calculate headway deviation percentage as HDWY_DEV/SCHEDULED_HDWY. 

The generally accepted range for actual headway is 50% to 150% of the scheduled headway, so if the scheduled headway is 10 minutes, a headway deviation of up to 5 minutes in either direction would be acceptable (but not ideal).
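
A minimal sketch of the percentage calculation and the 50% to 150% check, using a few SCHEDULED_HDWY and HDWY_DEV values that appear in this dataset. The column names `HDWY_DEV_PCT` and `ACCEPTABLE` are made up for this illustration, and treating a scheduled headway of 0 as missing is an assumption made to avoid dividing by zero.

```python
import pandas as pd

# SCHEDULED_HDWY / HDWY_DEV pairs copied from rows in this dataset
hdwy = pd.DataFrame({
    'SCHEDULED_HDWY': [30.0, 30.0, 30.0, 30.0, 0.0],
    'HDWY_DEV': [-1.383334, 1.05, 4.066666, -3.05, 13.333333],
})

# Treat a scheduled headway of 0 as missing so the division is well defined
scheduled = hdwy['SCHEDULED_HDWY'].where(hdwy['SCHEDULED_HDWY'] > 0)

# Negative fractions indicate bunching, positive fractions indicate gapping
hdwy['HDWY_DEV_PCT'] = hdwy['HDWY_DEV'] / scheduled

# Actual headway within 50% to 150% of scheduled means |fraction| <= 0.5;
# rows with a missing fraction come out False and may need separate handling
hdwy['ACCEPTABLE'] = hdwy['HDWY_DEV_PCT'].abs() <= 0.5
print(hdwy)
```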

Here, you can see consecutive trips at the same time point. Notice that the scheduled headway is based on the scheduled time and that the actual headway is based on the actual departure times.

In [166]:
(
    wego
    .loc[wego['ROUTE_DIRECTION_NAME'] == 'TO DOWNTOWN']
    .loc[wego['TIME_POINT_ABBR'] == 'CH46']
    [['DATE', 'TRIP_ID', 'TIME_POINT_ABBR','ROUTE_DIRECTION_NAME', 'TRIP_EDGE', 
      'SCHEDULED_TIME', 'SCHEDULED_HDWY',
      'ACTUAL_DEPARTURE_TIME', 'ACTUAL_HDWY', 'HDWY_DEV'
     ]]
    .sort_values(['DATE', 'SCHEDULED_TIME'])
    .iloc[:5]
)

Unnamed: 0,DATE,TRIP_ID,TIME_POINT_ABBR,ROUTE_DIRECTION_NAME,TRIP_EDGE,SCHEDULED_TIME,SCHEDULED_HDWY,ACTUAL_DEPARTURE_TIME,ACTUAL_HDWY,HDWY_DEV
3,2023-01-01,332422,CH46,TO DOWNTOWN,0,05:50:00,,05:50:34,,
147,2023-01-01,332482,CH46,TO DOWNTOWN,0,06:20:00,30.0,06:19:11,28.616666,-1.383334
243,2023-01-01,332536,CH46,TO DOWNTOWN,0,06:50:00,30.0,06:50:14,31.05,1.05
15,2023-01-01,332424,CH46,TO DOWNTOWN,0,07:20:00,30.0,07:24:18,34.066666,4.066666
159,2023-01-01,332484,CH46,TO DOWNTOWN,0,07:50:00,30.0,07:51:15,26.95,-3.05
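
To make the relationship concrete, the sketch below rebuilds the three headway columns from the times in the table above by differencing consecutive rows. It assumes the rows are already sorted by scheduled time at a single time point and direction, and it is meant as a cross-check rather than a description of how the columns were originally produced.

```python
import pandas as pd

# Times copied from the CH46 / TO DOWNTOWN rows shown above
stops = pd.DataFrame({
    'SCHEDULED_TIME': ['05:50:00', '06:20:00', '06:50:00', '07:20:00', '07:50:00'],
    'ACTUAL_DEPARTURE_TIME': ['05:50:34', '06:19:11', '06:50:14', '07:24:18', '07:51:15'],
})

scheduled = pd.to_timedelta(stops['SCHEDULED_TIME'])
actual = pd.to_timedelta(stops['ACTUAL_DEPARTURE_TIME'])

# Headway is the gap to the previous bus, in minutes (missing for the first bus)
stops['SCHEDULED_HDWY'] = scheduled.diff().dt.total_seconds() / 60
stops['ACTUAL_HDWY'] = actual.diff().dt.total_seconds() / 60
stops['HDWY_DEV'] = stops['ACTUAL_HDWY'] - stops['SCHEDULED_HDWY']
print(stops.round(2))
```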


In [168]:
wego.columns.tolist()

['CALENDAR_ID',
 'SERVICE_ABBR',
 'ADHERENCE_ID',
 'DATE',
 'ROUTE_ABBR',
 'BLOCK_ABBR',
 'OPERATOR',
 'TRIP_ID',
 'OVERLOAD_ID',
 'ROUTE_DIRECTION_NAME',
 'TIME_POINT_ABBR',
 'ROUTE_STOP_SEQUENCE',
 'TRIP_EDGE',
 'LATITUDE',
 'LONGITUDE',
 'SCHEDULED_TIME',
 'ACTUAL_ARRIVAL_TIME',
 'ACTUAL_DEPARTURE_TIME',
 'ADHERENCE',
 'SCHEDULED_HDWY',
 'ACTUAL_HDWY',
 'HDWY_DEV',
 'ADJUSTED_EARLY_COUNT',
 'ADJUSTED_LATE_COUNT',
 'ADJUSTED_ONTIME_COUNT',
 'STOP_CANCELLED',
 'PREV_SCHED_STOP_CANCELLED',
 'IS_RELIEF',
 'BLOCK_STOP_ORDER',
 'DWELL_IN_MINS']

In [170]:
wego.head()

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,...,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS
0,120230101,3,93549161,2023-01-01,50,5000,2355,332422,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,2,8.133333
1,120230101,3,93549162,2023-01-01,50,5000,2355,332422,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,5,0.0
2,120230101,3,93549163,2023-01-01,50,5000,2355,332422,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,11,0.0
3,120230101,3,93549164,2023-01-01,50,5000,2355,332422,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,13,0.0
4,120230101,3,93549165,2023-01-01,50,5000,2355,332422,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,18,2.15


In [172]:
dates = [120250203, 120250210, 120250428, 120250505, 120250512]

In [214]:
view = (
    wego
    .loc[wego['ROUTE_ABBR'] == 50]
    .loc[wego['CALENDAR_ID'].isin(dates)]  # reuse the list defined above
    [[
        'CALENDAR_ID', 'SERVICE_ABBR', 'DATE', 'ROUTE_ABBR',
        'TRIP_ID', 'OVERLOAD_ID', 'ROUTE_DIRECTION_NAME',
        'ROUTE_STOP_SEQUENCE', 'TRIP_EDGE',
        'SCHEDULED_TIME', 'ACTUAL_ARRIVAL_TIME', 'ACTUAL_DEPARTURE_TIME',
        'ADHERENCE', 'SCHEDULED_HDWY', 'ACTUAL_HDWY', 'HDWY_DEV',
        'ADJUSTED_EARLY_COUNT', 'ADJUSTED_LATE_COUNT', 'ADJUSTED_ONTIME_COUNT'
    ]]
    .copy()  # independent copy so later column assignments do not warn
)

In [176]:
view.sort_values('ACTUAL_ARRIVAL_TIME', ascending = False)

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,DATE,ROUTE_ABBR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,ROUTE_STOP_SEQUENCE,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
618862,120250512,1,2025-05-12,50,429643,0,FROM DOWNTOWN,13,1,1900-01-01 00:15:00,23:59:22,1900-01-01 00:16:28,-1.466666,30.0,28.200000,-1.800000,0,0,1
618861,120250512,1,2025-05-12,50,429642,0,TO DOWNTOWN,13,2,1900-01-01 00:05:00,23:59:22,23:59:22,5.633333,,,,0,0,1
613594,120250505,1,2025-05-05,50,429575,0,FROM DOWNTOWN,15,0,23:57:00,23:59:14,23:59:14,-2.233333,30.0,30.566666,0.566666,0,0,1
613748,120250505,1,2025-05-05,50,429643,0,FROM DOWNTOWN,13,1,1900-01-01 00:15:00,23:59:06,1900-01-01 00:16:18,-1.300000,30.0,28.666666,-1.333334,0,0,1
613747,120250505,1,2025-05-05,50,429642,0,TO DOWNTOWN,13,2,1900-01-01 00:05:00,23:59:06,23:59:06,5.900000,,,,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
618563,120250512,1,2025-05-12,50,429523,0,FROM DOWNTOWN,14,0,1900-01-01 00:53:00,,,,30.0,,,0,0,0
618564,120250512,1,2025-05-12,50,429523,0,FROM DOWNTOWN,15,0,1900-01-01 00:57:00,,,,30.0,,,0,0,0
618565,120250512,1,2025-05-12,50,429523,0,FROM DOWNTOWN,16,0,1900-01-01 01:00:00,,,,30.0,,,0,0,0
618566,120250512,1,2025-05-12,50,429523,0,FROM DOWNTOWN,17,0,1900-01-01 01:06:00,,,,30.0,,,0,0,0


In [182]:
view['ACTUAL_ARRIVAL_TIME'] = view['ACTUAL_ARRIVAL_TIME'].str[-8:]

In [204]:
view['ACTUAL_ARRIVAL_TIME'] = pd.to_datetime(view['ACTUAL_ARRIVAL_TIME'], format='%H:%M:%S')  # explicit format; '%X' depends on the locale
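
Trimming to the last eight characters discards the `1900-01-01` prefix that appears on some times. The sketch below shows one way to keep that information as an offset from the start of the service day. It assumes, and the source does not confirm this, that the prefix marks a time after midnight on the following calendar day. `to_service_offset` is a hypothetical helper name.

```python
import pandas as pd

def to_service_offset(time_str):
    """Hypothetical helper: offset from the start of the service day.

    Assumes a '1900-01-01 ' prefix means the time falls after midnight.
    """
    if pd.isna(time_str):
        return pd.NaT
    text = str(time_str)
    offset = pd.to_timedelta(text[-8:])  # the HH:MM:SS part
    if len(text) > 8:                    # a date prefix is present
        offset += pd.Timedelta(days=1)
    return offset

# Two time strings in the formats seen in this dataset, plus a missing value
times = pd.Series(['23:57:00', '1900-01-01 00:15:00', None])
offsets = pd.to_timedelta(times.map(to_service_offset))
print(offsets)
```

With offsets like these, a time just after midnight sorts after 23:57 instead of before it.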

In [206]:
view.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4032 entries, 546973 to 618997
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   CALENDAR_ID            4032 non-null   int64         
 1   SERVICE_ABBR           4032 non-null   int64         
 2   DATE                   4032 non-null   object        
 3   ROUTE_ABBR             4032 non-null   int64         
 4   TRIP_ID                4032 non-null   int64         
 5   OVERLOAD_ID            4032 non-null   int64         
 6   ROUTE_DIRECTION_NAME   4032 non-null   object        
 7   ROUTE_STOP_SEQUENCE    4032 non-null   int64         
 8   TRIP_EDGE              4032 non-null   int64         
 9   SCHEDULED_TIME         4032 non-null   object        
 10  ACTUAL_ARRIVAL_TIME    3994 non-null   datetime64[ns]
 11  ACTUAL_DEPARTURE_TIME  3993 non-null   object        
 12  ADHERENCE              3993 non-null   float64       
 13  S

In [222]:
start_time = '12:00:00'
end_time = '23:59:59'

In [224]:
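arrival_tod = view['ACTUAL_ARRIVAL_TIME'].astype(str).str[-8:]  # HH:MM:SS whether the column holds strings or datetimes; missing values fall outside the range
tsp_view = view[arrival_tod.between(start_time, end_time)]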

In [226]:
tsp_view

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,DATE,ROUTE_ABBR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,ROUTE_STOP_SEQUENCE,TRIP_EDGE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT
546973,120250203,1,2025-02-03,50,417223,0,TO DOWNTOWN,11,1,15:10:00,14:33:16,15:09:02,0.966666,0.0,13.333333,13.333333,0,0,1
546974,120250203,1,2025-02-03,50,417223,0,TO DOWNTOWN,4,2,15:25:00,15:12:43,15:12:43,12.283333,,,,1,0,0
547036,120250203,1,2025-02-03,50,417735,0,TO DOWNTOWN,17,0,12:07:00,12:15:01,12:15:01,-8.016666,15.0,19.800000,4.800000,0,1,0
547037,120250203,1,2025-02-03,50,417735,0,TO DOWNTOWN,16,0,12:16:00,12:17:45,12:20:07,-4.116666,15.0,17.500000,2.500000,0,0,1
547038,120250203,1,2025-02-03,50,417735,0,TO DOWNTOWN,15,0,12:20:00,12:25:01,12:25:01,-5.016666,15.0,20.483333,5.483333,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
618989,120250512,1,2025-05-12,50,429701,0,TO DOWNTOWN,3,2,15:25:00,15:11:28,15:13:50,11.166666,,,,1,0,0
618994,120250512,1,2025-05-12,50,432353,0,TO DOWNTOWN,7,1,15:05:00,14:38:33,15:05:50,-0.833333,,,,0,0,1
618995,120250512,1,2025-05-12,50,432353,0,TO DOWNTOWN,4,2,15:18:00,15:18:06,15:18:14,-0.233333,,,,0,0,1
618996,120250512,1,2025-05-12,50,432387,0,TO DOWNTOWN,11,1,15:20:00,15:14:10,15:22:20,-2.333333,10.0,10.883333,0.883333,0,0,1
