# About

Here we start working on the tripwise conversion: combining the action logs that successive feed pulls produce for a single trip into one trip log.

In [1]:
import sys; sys.path.append("../src/")
from processing import fetch_archival_gtfs_realtime_data, parse_gtfs_into_action_log

In [2]:
%load_ext autoreload
%autoreload 2

In [3]:
gtfs_r = dict()

for n in range(0, 60, 5):
    print(n + 1)
    gtfs_r[n] = fetch_archival_gtfs_realtime_data(kind='gtfs', timestamp='2014-09-18-09-' + str(1 + n).zfill(2))
    
print("Done!")

1
6
11
16
21
26
31
36
41
46
51
56
Done!


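In this example pull, the nearly completed `047600_1..S02R` trip stays live for two ticks (the 09:01 and 09:06 pulls). For reference, here is a trip update entity followed by its companion vehicle entity, taken from a different trip, `051200_1..N08R`: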

In [42]:
gtfs_r[0].entity[32]

id: "000033"
trip_update {
  trip {
    trip_id: "051200_1..N08R"
    start_date: "20140918"
    route_id: "1"
  }
  stop_time_update {
    arrival {
      time: 1411045236
    }
    departure {
      time: 1411045236
    }
    stop_id: "121N"
  }
  stop_time_update {
    arrival {
      time: 1411045356
    }
    departure {
      time: 1411045356
    }
    stop_id: "120N"
  }
  stop_time_update {
    arrival {
      time: 1411045446
    }
    departure {
      time: 1411045446
    }
    stop_id: "119N"
  }
  stop_time_update {
    arrival {
      time: 1411045536
    }
    departure {
      time: 1411045536
    }
    stop_id: "118N"
  }
  stop_time_update {
    arrival {
      time: 1411045596
    }
    departure {
      time: 1411045596
    }
    stop_id: "117N"
  }
  stop_time_update {
    arrival {
      time: 1411045716
    }
    departure {
      time: 1411045716
    }
    stop_id: "116N"
  }
  stop_time_update {
    arrival {
      time: 1411045836
    }
    stop_id: "115N"
  }
}

In [41]:
gtfs_r[0].entity[33]

id: "000034"
vehicle {
  trip {
    trip_id: "051200_1..N08R"
    start_date: "20140918"
    route_id: "1"
  }
  current_stop_sequence: 20
  current_status: INCOMING_AT
  timestamp: 1411045196
  stop_id: "121N"
}
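Entities 32 and 33 share the trip ID `051200_1..N08R`: one carries the `trip_update` (the forecast stop times), the other the `vehicle` status. A minimal sketch of pairing the two by trip ID, assuming the feed entities have been flattened into plain dicts; the dict layout and `pair_entities_by_trip` are hypothetical, and with real protobuf messages you would test `entity.HasField("trip_update")` rather than key membership:

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for the two feed entities above:
# plain dicts instead of protobuf messages, keeping only a few fields.
entities = [
    {"id": "000033", "trip_update": {"trip_id": "051200_1..N08R"}},
    {"id": "000034", "vehicle": {"trip_id": "051200_1..N08R",
                                 "current_status": "INCOMING_AT",
                                 "stop_id": "121N"}},
]


def pair_entities_by_trip(entities):
    """Group trip_update and vehicle entities that share a trip_id."""
    pairs = defaultdict(dict)
    for entity in entities:
        for kind in ("trip_update", "vehicle"):
            if kind in entity:
                # Key by trip ID so the two halves land in the same slot.
                pairs[entity[kind]["trip_id"]][kind] = entity
    return dict(pairs)


pairs = pair_entities_by_trip(entities)
print(sorted(pairs["051200_1..N08R"]))  # ['trip_update', 'vehicle']
```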

Next, we create two action logs for `047600_1..S02R`, one from each pull in which it appears.

In [6]:
from processing import parse_message_into_action_log, mta_archival_time_to_unix_timestamp

In [7]:
S02R_action_logs = []

# One action log per pull (09:01 and 09:06), built from entities 0 and 1,
# which hold this trip's messages in both pulls.
for n in [0, 5]:
    information_time = mta_archival_time_to_unix_timestamp('2014-09-18-09-' + str(1 + n).zfill(2))
    S02R_action_logs.append(parse_message_into_action_log(gtfs_r[n].entity[0],
                                                          gtfs_r[n].entity[1],
                                                          information_time))
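`mta_archival_time_to_unix_timestamp` turns an archive key such as `'2014-09-18-09-01'` into the Unix time used as `information_time`. A sketch of that conversion, assuming the keys are New York local time and hard-coding the EDT offset (UTC-4) that applies in September 2014; `archival_time_to_unix` is a hypothetical stand-in, not the code in `processing`:

```python
from datetime import datetime, timedelta, timezone

# Assumption: archive keys are New York local time. In September 2014 that is
# EDT, a fixed UTC-4 offset; a general version should use a time zone database
# such as zoneinfo.ZoneInfo("America/New_York").
EDT = timezone(timedelta(hours=-4))


def archival_time_to_unix(archival_time):
    """Convert a 'YYYY-MM-DD-HH-MM' archive key to an integer Unix timestamp."""
    local = datetime.strptime(archival_time, "%Y-%m-%d-%H-%M").replace(tzinfo=EDT)
    return int(local.timestamp())


for n in [0, 5]:
    key = "2014-09-18-09-" + str(1 + n).zfill(2)
    print(key, archival_time_to_unix(key))
# 2014-09-18-09-01 1411045260
# 2014-09-18-09-06 1411045560
```

These values match the `information_time` entries that appear in the logs further down.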

Now we are back where we were earlier.

In [8]:
S02R_action_logs[0]

,trip_id,route_id,action,stop_id,time_assigned,information_time
0,047600_1..S02R,1,STOPPED_AT,137S,1411045000.0,1411045000.0
1,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,138S,1411045000.0,1411045000.0
2,047600_1..S02R,1,EXPECTED_TO_DEPART_AT,138S,1411045000.0,1411045000.0
3,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,139S,1411045000.0,1411045000.0
4,047600_1..S02R,1,EXPECTED_TO_DEPART_AT,139S,1411045000.0,1411045000.0
5,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411045000.0,1411045000.0


In [9]:
S02R_action_logs[1]

,trip_id,route_id,action,stop_id,time_assigned,information_time
0,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411046000.0,1411046000.0


The story here is that, in the five minutes between the two pulls, this train cleared two stations (138S and 139S). The state is also consistent again in the second log, whereas in the first log it is not (see 06).

In [11]:
S02R_action_logs[1].time_assigned[0], S02R_action_logs[1].information_time[0]

(1411045639.0, 1411045560.0)

The information that we can recover from this data is as follows, where time 1 and time 2 are the information times of the first and second pulls:
* At time 1, the train was stopped at 137S.
* Sometime between times 1 and 2, the train stopped at or passed by stops 138S and 139S.
* At time 2, the train was en-route to stop 140S.

In terms of a trip log, what we see here is that:

* The train stopped at 137S, with a known time.
* The train either stopped at or skipped 138S and 139S, though the exact time at which it did these things is uncertain.
* The train is still planned to stop at 140S.

We could tabulate this data as:

```csv
trip_id,route_id,action,stop_id,minimum_time_assigned,maximum_time_assigned,latest_information_time
047600_1..S02R,1,STOPPED_AT,137S,1411044658,1411044718,1411045260
047600_1..S02R,1,STOPPED_OR_SKIPPED,138S,1411044718,1411044918,1411045260
047600_1..S02R,1,STOPPED_OR_SKIPPED,139S,1411044718,1411044918,1411045260
047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411045228,1411045228,1411045260
```
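The proposed schema gives each action a time window instead of a single time. A sketch that reads the table above into a small dataclass and reports the width of each window; `TripLogEntry` and `read_trip_log` are hypothetical names, and the rows are copied verbatim from the table:

```python
import csv
import io
from dataclasses import dataclass

# The hand-built trip log table above, verbatim.
TABLE = """trip_id,route_id,action,stop_id,minimum_time_assigned,maximum_time_assigned,latest_information_time
047600_1..S02R,1,STOPPED_AT,137S,1411044658,1411044718,1411045260
047600_1..S02R,1,STOPPED_OR_SKIPPED,138S,1411044718,1411044918,1411045260
047600_1..S02R,1,STOPPED_OR_SKIPPED,139S,1411044718,1411044918,1411045260
047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411045228,1411045228,1411045260
"""


@dataclass
class TripLogEntry:
    """One row of the proposed trip log; all times are Unix seconds."""
    trip_id: str
    route_id: str
    action: str
    stop_id: str
    minimum_time_assigned: int
    maximum_time_assigned: int
    latest_information_time: int

    @property
    def uncertainty(self):
        """Width of the window in which the action happened, in seconds."""
        return self.maximum_time_assigned - self.minimum_time_assigned


def read_trip_log(text):
    """Parse CSV text with the header above into TripLogEntry objects."""
    return [
        TripLogEntry(
            trip_id=row["trip_id"],
            route_id=row["route_id"],
            action=row["action"],
            stop_id=row["stop_id"],
            minimum_time_assigned=int(row["minimum_time_assigned"]),
            maximum_time_assigned=int(row["maximum_time_assigned"]),
            latest_information_time=int(row["latest_information_time"]),
        )
        for row in csv.DictReader(io.StringIO(text))
    ]


for entry in read_trip_log(TABLE):
    print(entry.stop_id, entry.action, entry.uncertainty)
# 137S STOPPED_AT 60
# 138S STOPPED_OR_SKIPPED 200
# 139S STOPPED_OR_SKIPPED 200
# 140S EXPECTED_TO_ARRIVE_AT 0
```

Storing a window rather than one timestamp lets exact observations (zero width) and inferred passes (non-zero width) share a single column layout.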

In [13]:
from processing import parse_tripwise_action_logs_into_trip_log

In [28]:
import pandas as pd
pd.set_option('display.precision', 10)  # show timestamps in full rather than rounded

In [48]:
# Stack the two logs and keep the first row recorded at each information time.
pd.concat(S02R_action_logs).groupby('information_time').first().reset_index()

,information_time,trip_id,route_id,action,stop_id,time_assigned
0,1411045260.0,047600_1..S02R,1,STOPPED_AT,137S,1411044718.0
1,1411045560.0,047600_1..S02R,1,EXPECTED_TO_ARRIVE_AT,140S,1411045639.0
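`groupby('information_time').first()` collapses each pull to a single row. A plain-Python sketch of the same idea on illustrative rows shaped like the two action logs (abridged); `first_row_per_key` is a hypothetical helper. One difference worth knowing: pandas `first()` takes the first non-null value per column, so with missing values it can stitch together several rows, whereas this sketch always keeps a whole row:

```python
# Illustrative rows shaped like the two S02R action logs (abridged).
rows = [
    {"information_time": 1411045260, "action": "STOPPED_AT", "stop_id": "137S"},
    {"information_time": 1411045260, "action": "EXPECTED_TO_ARRIVE_AT", "stop_id": "138S"},
    {"information_time": 1411045560, "action": "EXPECTED_TO_ARRIVE_AT", "stop_id": "140S"},
]


def first_row_per_key(rows, key):
    """Keep the first row seen for each value of `key`, ordered by that key."""
    firsts = {}
    for row in rows:
        firsts.setdefault(row[key], row)  # later rows with the same key are ignored
    # pandas groupby sorts group keys by default; mirror that here.
    return [firsts[k] for k in sorted(firsts)]


for row in first_row_per_key(rows, "information_time"):
    print(row["information_time"], row["action"], row["stop_id"])
# 1411045260 STOPPED_AT 137S
# 1411045560 EXPECTED_TO_ARRIVE_AT 140S
```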


`parse_tripwise_action_logs_into_trip_log` performs this conversion on the list of action logs for one trip:

In [57]:
parse_tripwise_action_logs_into_trip_log(S02R_action_logs)

,action,latest_information_time,maximum_time,minimum_time,route_id,stop_id,trip_id
0,STOPPED_AT,1411045260.0,1411045260.0,1411045260.0,1,137S,047600_1..S02R
1,STOPPED_OR_SKIPPED,1411045560.0,1411045560.0,1411045260.0,1,138S,047600_1..S02R
2,STOPPED_OR_SKIPPED,1411045560.0,1411045560.0,1411045260.0,1,139S,047600_1..S02R
3,EN_ROUTE_TO,1411045560.0,,1411045560.0,1,140S,047600_1..S02R
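In this output the bounds come from the information times of the two pulls rather than from `time_assigned`: 137S is pinned to the first pull, 138S and 139S fall between the two pulls, and 140S has no upper bound yet. A minimal sketch that reproduces this shape for the two-snapshot case, assuming the trip's ordered stop list is known; `build_trip_log` is hypothetical and is not the implementation of `parse_tripwise_action_logs_into_trip_log`:

```python
def build_trip_log(trip_id, route_id, planned_stops, stopped_at, t1, en_route_to, t2):
    """Sketch: trip log rows from two snapshots of one trip.

    planned_stops: the trip's stop IDs in order (assumed known).
    t1, t2: information times of the first and second snapshots (t1 < t2).
    """
    start = planned_stops.index(stopped_at)
    end = planned_stops.index(en_route_to)

    def row(action, stop_id, minimum_time, maximum_time, latest):
        return {"trip_id": trip_id, "route_id": route_id, "action": action,
                "stop_id": stop_id, "minimum_time": minimum_time,
                "maximum_time": maximum_time, "latest_information_time": latest}

    # Seen stopped here in the first snapshot, so the time is pinned to t1.
    log = [row("STOPPED_AT", stopped_at, t1, t1, t1)]
    # Stops strictly between the two observations were served or skipped
    # at some point in the window [t1, t2]; the feed cannot tell which.
    for stop in planned_stops[start + 1:end]:
        log.append(row("STOPPED_OR_SKIPPED", stop, t1, t2, t2))
    # Not reached yet at t2, so there is no upper bound.
    log.append(row("EN_ROUTE_TO", en_route_to, t2, None, t2))
    return log


log = build_trip_log("047600_1..S02R", "1", ["137S", "138S", "139S", "140S"],
                     "137S", 1411045260, "140S", 1411045560)
for r in log:
    print(r["action"], r["stop_id"], r["minimum_time"], r["maximum_time"])
# STOPPED_AT 137S 1411045260 1411045260
# STOPPED_OR_SKIPPED 138S 1411045260 1411045560
# STOPPED_OR_SKIPPED 139S 1411045260 1411045560
# EN_ROUTE_TO 140S 1411045560 None
```

This only covers a train that is en route at the second pull; a train observed stopped at a station in the later pull needs an extra case.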


To make sure it's working properly, let's look at trip records from a few other trips.

Here's one with three recordsets:

In [61]:
gtfs_r[0].entity[2].trip_update.trip.trip_id

'048150_1..S07R'

In [62]:
gtfs_r[5].entity[2].trip_update.trip.trip_id

'048150_1..S07R'

In [63]:
gtfs_r[10].entity[2].trip_update.trip.trip_id

'048600_1..S02R'
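`entity[2]` holds `048150_1..S07R` in the 09:01 and 09:06 pulls but a different trip in the 09:11 pull, so an entity's position is not stable from one pull to the next. A sketch of locating a trip by ID instead, run on made-up toy feeds; `pulls_containing` and the dict layout are hypothetical:

```python
def pulls_containing(pulls, trip_id):
    """Return (pull, entity_index) pairs for every trip_update of trip_id.

    pulls: pull offset -> list of entity dicts (toy stand-ins for feed messages).
    """
    hits = []
    for offset in sorted(pulls):
        for index, entity in enumerate(pulls[offset]):
            update = entity.get("trip_update")
            if update is not None and update["trip_id"] == trip_id:
                hits.append((offset, index))
    return hits


# Made-up feeds: trip "B" sits at a different position in each pull it appears in.
toy_pulls = {
    0: [{"trip_update": {"trip_id": "A"}}, {"vehicle": {"trip_id": "A"}},
        {"trip_update": {"trip_id": "B"}}, {"vehicle": {"trip_id": "B"}}],
    5: [{"trip_update": {"trip_id": "B"}}, {"vehicle": {"trip_id": "B"}}],
    10: [{"trip_update": {"trip_id": "C"}}, {"vehicle": {"trip_id": "C"}}],
}
print(pulls_containing(toy_pulls, "B"))  # [(0, 2), (5, 0)]
```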

In [66]:
S07R_action_logs = []

# One action log per pull (09:01 and 09:16), built from entities 2 and 3,
# which hold this trip's messages.
for n in [0, 15]:
    information_time = mta_archival_time_to_unix_timestamp('2014-09-18-09-' + str(1 + n).zfill(2))
    S07R_action_logs.append(parse_message_into_action_log(gtfs_r[n].entity[2],
                                                          gtfs_r[n].entity[3],
                                                          information_time))

In [73]:
parse_tripwise_action_logs_into_trip_log(S07R_action_logs)

,action,latest_information_time,maximum_time,minimum_time,route_id,stop_id,trip_id
0,STOPPED_AT,1411045260.0,1411045260.0,1411045260.0,1,136S,048150_1..S07R
1,STOPPED_OR_SKIPPED,1411046160.0,1411046160.0,1411045260.0,1,137S,048150_1..S07R
2,STOPPED_AT,1411046160.0,1411046160.0,1411046160.0,1,138S,048150_1..S07R
3,EN_ROUTE_TO,1411046160.0,,1411046160.0,1,139S,048150_1..S07R
4,EN_ROUTE_TO,1411046160.0,,1411046160.0,1,140S,048150_1..S07R
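One way to check output like this without eyeballing it is to assert a few invariants. The rules below are assumptions about what a well-formed trip log should satisfy, not rules stated by `processing`; the rows are copied from the `048150_1..S07R` output above, with `None` standing in for the blank maximum:

```python
# (action, stop_id, minimum_time, maximum_time) from the S07R trip log above.
TRIP_LOG = [
    ("STOPPED_AT", "136S", 1411045260, 1411045260),
    ("STOPPED_OR_SKIPPED", "137S", 1411045260, 1411046160),
    ("STOPPED_AT", "138S", 1411046160, 1411046160),
    ("EN_ROUTE_TO", "139S", 1411046160, None),
    ("EN_ROUTE_TO", "140S", 1411046160, None),
]


def check_trip_log(rows):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    stops = [stop for _, stop, _, _ in rows]
    if len(stops) != len(set(stops)):
        problems.append("a stop appears more than once")
    previous_min = None
    for action, stop, lo, hi in rows:
        if action == "STOPPED_AT" and lo != hi:
            problems.append(f"{stop}: STOPPED_AT should have an exact time")
        if action == "EN_ROUTE_TO" and hi is not None:
            problems.append(f"{stop}: EN_ROUTE_TO should have no upper bound")
        if hi is not None and lo > hi:
            problems.append(f"{stop}: minimum_time exceeds maximum_time")
        # Along a trip, the earliest possible time should never go backwards.
        if previous_min is not None and lo < previous_min:
            problems.append(f"{stop}: minimum_time goes backwards along the trip")
        previous_min = lo
    return problems


print(check_trip_log(TRIP_LOG))  # []
```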
