# About

To transform action logs into trip logs, we first have to solve an important subproblem: how do we determine which stations a trip actually involves?

The idea here is that each trip update includes an estimate of the stops still ahead on the trip and the times at which the train will make them. That set of stops may change between updates if the train gets rerouted, or is made into a local or an express.

For example, one trip update may say that we are `A B C` bound, and the next one that we are `B D E` bound. What happened? We either stopped at or skipped A; we haven't gotten to B yet; C isn't going to happen; and D and E are the hot new thing.
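The reasoning above can be sketched as a small merge over two station lists. This is a hypothetical re-implementation for illustration only, not the code behind `synthesize_station_lists`; the name `merge_station_lists_sketch` and the rule it applies (cut the old list at the first station it shares with the new list, then follow the new list) are assumptions that happen to match the `A B C` / `B D E` example.

```python
def merge_station_lists_sketch(left, right):
    """Hypothetical sketch: merge an older station list with a newer one.

    Assumes each list holds unique station IDs in travel order.
    """
    for j, station in enumerate(right):
        if station in left:
            i = left.index(station)
            # Keep the stations that preceded the shared one in the old list,
            # then follow the new list from the shared station onward.
            return left[:i] + right[j:]
    # No shared station: treat the old stops as passed and append the new ones.
    return left + right


print(merge_station_lists_sketch(['A', 'B', 'C'], ['B', 'D', 'E']))
# ['A', 'B', 'D', 'E']
```

Under this rule A is kept because we cannot tell whether it was stopped at or skipped, C is dropped because the newer update no longer lists it, and D and E are picked up from the newer update.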

Taken on its own, this is comfortably an abstract CS problem: merging successive lists of stations into a single list.

There are important practical repercussions, however. Again, because we can't truly know whether a stop was made, our theoretical solution will include stops that we know did not happen, either because the time window is too short or because those stops are physically impossible given the layout of the track system. Those considerations are hard practical problems that will need to be solved in a data cleaning step. We will not consider them here.

I am going to call the resulting list of stations, assembled from successive trip updates, the **synthetic route**.

This notebook is the development notebook for this effort.

In [5]:
import sys; sys.path.append("../src/")
from processing import extract_synthetic_route_from_station_lists, synthesize_station_lists

In [2]:
%load_ext autoreload
%autoreload 2

In [9]:
synthesize_station_lists(['A', 'B', 'C'], ['D', 'E', 'F'])

['A', 'B', 'C', 'D', 'E', 'F']

In [10]:
synthesize_station_lists(['A', 'B', 'C'], ['B', 'E', 'F'])

['A', 'B', 'E', 'F']

In [11]:
synthesize_station_lists(['A', 'B', 'E', 'F'], ['B', 'E', 'F'])

['A', 'B', 'E', 'F']

In [12]:
synthesize_station_lists(['A', 'B', 'D'], ['C', 'D', 'E'])

['A', 'B', 'D', 'E']

In [14]:
synthesize_station_lists(['A', 'B', 'C'], ['A', 'B', 'C'])

['A', 'B', 'C']

In [15]:
synthesize_station_lists([], ['A', 'B', 'C'])

['A', 'B', 'C']

In [16]:
synthesize_station_lists([], [])

[]

Below we generate and save a little test case for the associated unit test:

In [19]:
from processing import fetch_archival_gtfs_realtime_data, parse_gtfs_into_action_log

In [21]:
gtfs_r = dict()

# Fetch the two archival snapshots stamped 09-01 and 09-06, keyed by minute offset.
for n in range(0, 10, 5):
    print(n + 1)
    gtfs_r[n] = fetch_archival_gtfs_realtime_data(kind='gtfs', timestamp='2014-09-18-09-' + str(1 + n).zfill(2))

print("Done!")

1
6
Done!


In [22]:
from processing import parse_message_into_action_log, mta_archival_time_to_unix_timestamp

S02R_action_logs = []

for n in [0, 5]:
    information_time = mta_archival_time_to_unix_timestamp('2014-09-18-09-' + str(1 + n).zfill(2))
    S02R_action_logs.append(parse_message_into_action_log(gtfs_r[n].entity[0], 
                                                          gtfs_r[n].entity[1],
                                                          information_time))

In [24]:
S02R_action_logs[0]

Unnamed: 0,trip_id,route_id,action,stop_id,time_assigned,information_time
0,047600_1..S02R,1,STOPPED_AT,137S,1411045000.0,1411045000.0
1,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,138S,1411045000.0,1411045000.0
2,047600_1..S02R,1,EXPECTED_TO_DEPART_AT,138S,1411045000.0,1411045000.0
3,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,139S,1411045000.0,1411045000.0
4,047600_1..S02R,1,EXPECTED_TO_DEPART_AT,139S,1411045000.0,1411045000.0
5,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411045000.0,1411045000.0


In [25]:
S02R_action_logs[0].to_csv("../src/tests/data/S02R_tripwise_action_log_1.csv")

In [26]:
S02R_action_logs[1]

Unnamed: 0,trip_id,route_id,action,stop_id,time_assigned,information_time
0,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411046000.0,1411046000.0


In [27]:
S02R_action_logs[1].to_csv("../src/tests/data/S02R_tripwise_action_log_2.csv")

Finally, we verify that the saved test case produces the expected synthetic route.

In [28]:
from processing import extract_synthetic_route_from_tripwise_action_logs

In [39]:
extract_synthetic_route_from_tripwise_action_logs([S02R_action_logs[0], S02R_action_logs[1]])

['137S', '138S', '139S', '140S']
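One way to picture how a result like this could come out of several tripwise action logs is to reduce each log to its ordered list of unique stop IDs and then fold those lists together pairwise. The sketch below is only an illustration under that assumption: it works on plain lists of `stop_id` values rather than the DataFrames used above, and `merge_station_lists_sketch`, `stations_in_log_sketch`, and `synthetic_route_sketch` are hypothetical names, not functions from `processing`.

```python
from functools import reduce


def merge_station_lists_sketch(left, right):
    """Hypothetical pairwise merge: cut the old list at the first shared station."""
    for j, station in enumerate(right):
        if station in left:
            return left[:left.index(station)] + right[j:]
    return left + right


def stations_in_log_sketch(stop_ids):
    """Unique stop IDs in first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(stop_ids))


def synthetic_route_sketch(logs):
    """Fold the station lists of successive logs into one synthetic route."""
    station_lists = (stations_in_log_sketch(log) for log in logs)
    return reduce(merge_station_lists_sketch, station_lists, [])


# stop_id columns of the two action logs shown above
first_update = ['137S', '138S', '138S', '139S', '139S', '140S']
second_update = ['140S']

print(synthetic_route_sketch([first_update, second_update]))
# ['137S', '138S', '139S', '140S']
```

Folding from an empty list means a single log simply yields its own stations, and no logs at all yield an empty route.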