# About

While trying to further develop the methodology first outlined in notebook 04, I ran into a problematic timing issue with `information_time`.

In [1]:
import sys
sys.path.append("../src/")
import datetime  # used below to inspect timestamps
from processing import fetch_archival_gtfs_realtime_data, parse_gtfs_into_action_log

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
gtfs_r = dict()

# Pull every fifth minute of the 09:00 hour (09:01, 09:06, ..., 09:56) from the archive.
for n in range(0, 60, 5):
    print(n + 1)
    gtfs_r[n] = fetch_archival_gtfs_realtime_data(kind='gtfs', timestamp='2014-09-18-09-' + str(1 + n).zfill(2))

print("Done!")

1
6
11
16
21
26
31
36
41
46
51
56
Done!
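The loop above builds each archive timestamp by zero-padding the minute onto a fixed `'2014-09-18-09-'` prefix, which only works while the pulls stay inside one hour. A minimal sketch of the same strings built with `datetime` arithmetic, assuming the archive format is exactly `YYYY-MM-DD-HH-MM` as in the calls above; `archive_timestamps` is a hypothetical helper, not part of `processing`:

```python
from datetime import datetime, timedelta


def archive_timestamps(start, count, step_minutes):
    """Return 'YYYY-MM-DD-HH-MM' strings, step_minutes apart, starting at start.

    Hypothetical helper: the format is inferred from '2014-09-18-09-01'.
    """
    return [
        (start + timedelta(minutes=step_minutes * i)).strftime("%Y-%m-%d-%H-%M")
        for i in range(count)
    ]


stamps = archive_timestamps(datetime(2014, 9, 18, 9, 1), count=12, step_minutes=5)
print(stamps[0], stamps[-1])  # 2014-09-18-09-01 2014-09-18-09-56
```

Because the minutes are added to a real `datetime`, a range that crosses 10:00 or midnight rolls over correctly instead of producing an invalid minute such as `09-61`.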


In this sample, the nearly completed `047600_1..S02R` trip stays live for two ticks: it is the first entity in both the 09:01 and the 09:06 pulls.

In [6]:
gtfs_r[0].entity[0].trip_update.trip.trip_id

'047600_1..S02R'

In [7]:
gtfs_r[5].entity[0].trip_update.trip.trip_id

'047600_1..S02R'
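Checking `entity[0]` by hand works for one trip. A minimal sketch of finding every trip that stays live across more than one pull, assuming the trip IDs of each pull have already been extracted into plain lists. `trips_seen_in_multiple_pulls` is hypothetical, and apart from `047600_1..S02R` the IDs below are made up for illustration:

```python
from collections import defaultdict


def trips_seen_in_multiple_pulls(trip_ids_by_pull):
    """Map each trip_id seen in two or more pulls to the sorted pull keys it appears in."""
    seen = defaultdict(list)
    for pull, trip_ids in sorted(trip_ids_by_pull.items()):
        # dict.fromkeys drops duplicates while keeping a deterministic order
        for trip_id in dict.fromkeys(trip_ids):
            seen[trip_id].append(pull)
    return {trip_id: pulls for trip_id, pulls in seen.items() if len(pulls) > 1}


# Hypothetical input: only '047600_1..S02R' comes from the outputs above.
trip_ids_by_pull = {
    0: ["047600_1..S02R", "TRIP_A"],
    5: ["047600_1..S02R", "TRIP_B"],
    10: ["TRIP_B"],
}
print(trips_seen_in_multiple_pulls(trip_ids_by_pull))
# {'047600_1..S02R': [0, 5], 'TRIP_B': [5, 10]}
```

With the real feeds, each list could be filled with something like `[e.trip_update.trip.trip_id for e in gtfs_r[n].entity if e.HasField('trip_update')]`, since entities that carry only a vehicle position have no trip update.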

We create our two action logs...

In [61]:
from processing import parse_message_into_action_log, mta_archival_time_to_unix_timestamp

In [70]:
S02R_action_logs = []

for n in [0, 5]:
    information_time = mta_archival_time_to_unix_timestamp('2014-09-18-09-' + str(1 + n).zfill(2))
    S02R_action_logs.append(parse_message_into_action_log(gtfs_r[n].entity[0], 
                                                          gtfs_r[n].entity[1],
                                                          information_time))
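The implementation of `mta_archival_time_to_unix_timestamp` is not shown here. As a minimal sketch of what such a conversion has to do, assuming the archive names are US Eastern wall-clock times (in September 2014 that is EDT, a fixed UTC-4 offset, which is consistent with the `datetime` outputs further down), a hypothetical `archival_time_to_unix` could look like this:

```python
from datetime import datetime, timedelta, timezone

# Assumption: archive timestamps are Eastern wall-clock times. A fixed UTC-4
# offset is only right during daylight saving time; a general version should
# use zoneinfo.ZoneInfo("America/New_York") instead.
EDT = timezone(timedelta(hours=-4))


def archival_time_to_unix(archival_time, tz=EDT):
    """Convert an archive name like '2014-09-18-09-01' into Unix seconds."""
    naive = datetime.strptime(archival_time, "%Y-%m-%d-%H-%M")
    return int(naive.replace(tzinfo=tz).timestamp())


print(archival_time_to_unix("2014-09-18-09-01"))  # 1411045260
print(archival_time_to_unix("2014-09-18-09-06"))  # 1411045560
```

Attaching the time zone explicitly keeps the result independent of the machine's local time zone, which a naive `datetime` would silently depend on.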

In [71]:
S02R_action_logs[0]

|   | trip_id | route_id | action | stop_id | time_assigned | information_time |
|---|---|---|---|---|---|---|
| 0 | 047600_1..S02R | 1 | STOPPED_AT | 137S | 1411045000.0 | 1411045000.0 |
| 1 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 138S | 1411045000.0 | 1411045000.0 |
| 2 | 047600_1..S02R | 1 | EXPECTED_TO_DEPART_AT | 138S | 1411045000.0 | 1411045000.0 |
| 3 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 139S | 1411045000.0 | 1411045000.0 |
| 4 | 047600_1..S02R | 1 | EXPECTED_TO_DEPART_AT | 139S | 1411045000.0 | 1411045000.0 |
| 5 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 140S | 1411045000.0 | 1411045000.0 |


Now here's the problem. `information_time` is *ahead* of `time_assigned` for much of the table.

Here is `information_time`, which is, provisionally, the time at which we learned this information about the state of the system:

In [86]:
datetime.datetime.fromtimestamp(S02R_action_logs[0].information_time[0])

datetime.datetime(2014, 9, 18, 9, 1)

Here is the first timestamp in the table. It is about nine minutes (9 min 2 s) before our `information_time`.

In [85]:
datetime.datetime.fromtimestamp(S02R_action_logs[0].time_assigned[0])

datetime.datetime(2014, 9, 18, 8, 51, 58)

Here is the third timestamp in the table: the projected departure time from `138S`, the next stop in the log after `137S`, where the train is currently stopped. It lies five and a half minutes before `information_time` (and three and a half minutes after the first timestamp):

In [84]:
datetime.datetime.fromtimestamp(S02R_action_logs[0].time_assigned[2])

datetime.datetime(2014, 9, 18, 8, 55, 28)
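Subtracting the `datetime` values printed in `In [84]` through `In [86]` makes the size of the gap explicit. This sketch simply re-enters those three printed values:

```python
from datetime import datetime

information_time = datetime(2014, 9, 18, 9, 1)     # In [86]
first_assigned = datetime(2014, 9, 18, 8, 51, 58)  # In [85], row 0
third_assigned = datetime(2014, 9, 18, 8, 55, 28)  # In [84], row 2

print(information_time - first_assigned)  # 0:09:02
print(information_time - third_assigned)  # 0:05:32
print(third_assigned - first_assigned)    # 0:03:30
```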

This matters because our `parse_tripwise_action_logs_into_trip_log` function concatenates action logs like these into a trip log, using the available information as a guide.

These stale timestamps could mean one of two things:

1. The `GTFS-Realtime` stream is published with a lag of several minutes.
2. Projected arrival times for stalled trains are not updated.

Case 1 can't be ruled out without further examination, but since it would defeat the purpose of providing a realtime stream at all, it seems unlikely. Case 2 makes operational sense. It also jibes with the fact that the train is still reported as stopped at the station in question:

In [90]:
gtfs_r[0].entity[1]

id: "000002"
vehicle {
  trip {
    trip_id: "047600_1..S02R"
    start_date: "20140918"
    route_id: "1"
  }
  current_stop_sequence: 35
  current_status: STOPPED_AT
  timestamp: 1411044718
  stop_id: "137S"
}
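Case 2 suggests a simple check to run over many examples. The `timestamp` of a GTFS-Realtime vehicle position is the moment that position was measured; here it is `1411044718` (08:51:58 EDT), the same instant as the first `time_assigned`. The sketch below is a hypothetical heuristic, not something in `processing.py`: when a vehicle is `STOPPED_AT` a station, it reports how old the position is and which predictions already lie before `information_time`. The `information_time` of `1411045260` assumes 09:01 EDT (UTC-4), row 2's value is converted from the 08:55:28 shown above under the same assumption, and the future prediction is made up:

```python
def frozen_prediction_check(current_status, vehicle_timestamp, predicted_times, information_time):
    """Hypothetical heuristic for 'predictions are frozen while the train is stopped'.

    Returns (age of the vehicle position in seconds, predictions earlier than
    information_time). Stale predictions are only reported for STOPPED_AT vehicles.
    """
    position_age = information_time - vehicle_timestamp
    if current_status != "STOPPED_AT":
        return position_age, []
    stale = [t for t in predicted_times if t < information_time]
    return position_age, stale


information_time = 1411045260   # 2014-09-18 09:01, assuming EDT (UTC-4)
vehicle_timestamp = 1411044718  # `timestamp` in the vehicle message above
predicted_times = [
    1411044928,  # row 2, 08:55:28 EDT
    1411045500,  # hypothetical prediction still in the future
]
print(frozen_prediction_check("STOPPED_AT", vehicle_timestamp, predicted_times, information_time))
# (542, [1411044928])
```

If position ages like these turned up only for stopped trains and never for moving ones, that would point toward case 2 rather than a feed-wide lag.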

We will need to build many example trip logs and examine them in detail to determine what the issue may be.

In [33]:
from processing import parse_tripwise_action_logs_into_trip_log

In [34]:
parse_tripwise_action_logs_into_trip_log(S02R_action_logs)



|   | trip_id | route_id | action | stop_id | timestamp |
|---|---|---|---|---|---|
| 0 | 047600_1..S02R | 1 | STOPPED_AT | 137S | 1411045000.0 |
| 1 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 138S | 1411045000.0 |
| 2 | 047600_1..S02R | 1 | EXPECTED_TO_DEPART_AT | 138S | 1411045000.0 |
| 3 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 139S | 1411045000.0 |
| 4 | 047600_1..S02R | 1 | EXPECTED_TO_DEPART_AT | 139S | 1411045000.0 |
| 5 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 140S | 1411045000.0 |
| 6 | 047600_1..S02R | 1 | EXPECTED_TO_ARRIVE_AT | 140S | 1411046000.0 |
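The trip log above keeps both predicted arrivals at `140S` (rows 5 and 6), one from each pull. One possible policy, shown here as a hypothetical sketch and not as how `parse_tripwise_action_logs_into_trip_log` actually works, is "latest information wins": for each `(action, stop_id)` pair, keep only the row with the largest `information_time`. The rows are plain dicts and the small integer times are made up for illustration:

```python
def merge_action_logs(action_logs):
    """Hypothetical merge of per-pull action logs for a single trip.

    For every (action, stop_id) pair, keep the row with the largest
    information_time, then order the survivors by time_assigned.
    """
    latest = {}
    for log in action_logs:
        for row in log:
            key = (row["action"], row["stop_id"])
            if key not in latest or row["information_time"] >= latest[key]["information_time"]:
                latest[key] = row
    return sorted(latest.values(), key=lambda row: row["time_assigned"])


# Illustrative values only; the real tables above are rounded for display.
log_a = [
    {"action": "STOPPED_AT", "stop_id": "137S", "time_assigned": 10, "information_time": 100},
    {"action": "EXPECTED_TO_ARRIVE_AT", "stop_id": "140S", "time_assigned": 200, "information_time": 100},
]
log_b = [
    {"action": "EXPECTED_TO_ARRIVE_AT", "stop_id": "140S", "time_assigned": 450, "information_time": 400},
]
merged = merge_action_logs([log_a, log_b])
print([(r["stop_id"], r["time_assigned"]) for r in merged])
# [('137S', 10), ('140S', 450)]
```

This removes the duplicate prediction but does nothing about the stale timestamps themselves: a frozen prediction from the newest pull would still win.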


In [40]:
S02R_action_logs[0].time_assigned.value_counts()

1.411045e+09    2
1.411045e+09    1
1.411045e+09    1
1.411045e+09    1
1.411045e+09    1
Name: time_assigned, dtype: int64

In [38]:
S02R_action_logs[1].time_assigned[0]

1411045639.0