# About

This is a development notebook for `sort_feed_messages_by_trip_id`, a method that sorts the messages in a list of feeds into a hash table keyed by trip, where each trip maps to the list of messages about it found in those feeds.

In [1]:
import sys; sys.path.append("../src/")
from processing import fetch_archival_gtfs_realtime_data, parse_gtfs_into_action_log

In [23]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [7]:
gtfs_r = dict()

for n in range(0, 60, 5):
    print(n + 1)
    gtfs_r[n] = fetch_archival_gtfs_realtime_data(kind='gtfs', timestamp='2014-09-18-09-' + str(1 + n).zfill(2))
    
print("Done!")

1
6
11
16
21
26
31
36
41
46
51
56
Done!


In [9]:
gtfs_r[0].entity[0].trip_update.trip.trip_id

'047600_1..S02R'

In [16]:
gtfs_r[0].entity[1].vehicle.trip.trip_id

'047600_1..S02R'

In [14]:
gtfs_r[0].entity[-1]

id: "000493"
alert {
  informed_entity {
    trip {
      trip_id: "047600_1..S02R"
      route_id: "1"
    }
  }
  header_text {
    translation {
      text: "Train delayed"
    }
  }
}
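
The three cells above show where a trip ID lives in each kind of message: `trip_update.trip.trip_id`, `vehicle.trip.trip_id`, and `alert.informed_entity[].trip.trip_id`. As a rough sketch of the grouping we are after, the snippet below uses plain dicts as stand-ins for the protobuf entities. The names `trip_id_of` and `group_entities_by_trip_id` are hypothetical and are not the code in `processing`. Skipping alerts is also an assumption on our part.

```python
from collections import defaultdict


def trip_id_of(entity):
    """Return the trip id of a trip_update or vehicle entity, else None.

    `entity` is a plain dict standing in for a GTFS-Realtime FeedEntity.
    Alerts are skipped in this sketch (an assumption, not confirmed behavior).
    """
    for kind in ("trip_update", "vehicle"):
        if kind in entity:
            return entity[kind]["trip"]["trip_id"]
    return None


def group_entities_by_trip_id(feeds):
    """Map each trip id to the list of entities that mention it."""
    table = defaultdict(list)
    for feed in feeds:
        for entity in feed["entity"]:
            trip_id = trip_id_of(entity)
            if trip_id is not None:
                table[trip_id].append(entity)
    return dict(table)


# Toy feed: one trip with both message kinds, one with a trip update only.
feed = {"entity": [
    {"id": "000001", "trip_update": {"trip": {"trip_id": "047600_1..S02R"}}},
    {"id": "000002", "vehicle": {"trip": {"trip_id": "047600_1..S02R"}}},
    {"id": "000003", "trip_update": {"trip": {"trip_id": "049150_3..S01R"}}},
    {"id": "000004", "alert": {"informed_entity": [
        {"trip": {"trip_id": "047600_1..S02R"}}]}},
]}

grouped = group_entities_by_trip_id([feed])
for trip_id, messages in grouped.items():
    print(trip_id, len(messages))
```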

In [30]:
from processing import sort_feed_messages_by_trip_id

In [29]:
len(gtfs_r[0].entity)

493

In [33]:
sorted_table = sort_feed_messages_by_trip_id([gtfs_r[0]])

Alert!


In [39]:
sorted_table[list(sorted_table.keys())[0]]

[id: "000181"
 trip_update {
   trip {
     trip_id: "049150_3..S01R"
     start_date: "20140918"
     route_id: "3"
   }
   stop_time_update {
     arrival {
       time: 1411045262
     }
     departure {
       time: 1411045262
     }
     stop_id: "234S"
   }
   stop_time_update {
     arrival {
       time: 1411045352
     }
     departure {
       time: 1411045352
     }
     stop_id: "235S"
   }
   stop_time_update {
     arrival {
       time: 1411045412
     }
     departure {
       time: 1411045412
     }
     stop_id: "236S"
   }
   stop_time_update {
     arrival {
       time: 1411045532
     }
     departure {
       time: 1411045532
     }
     stop_id: "237S"
   }
   stop_time_update {
     arrival {
       time: 1411045652
     }
     departure {
       time: 1411045652
     }
     stop_id: "238S"
   }
   stop_time_update {
     arrival {
       time: 1411045742
     }
     departure {
       time: 1411045772
     }
     stop_id: "239S"
   }
   stop_time_update {
    

That was pretty simple. Incidentally, we see that feed messages for trips that are and aren't ongoing come in no particular order; the two are intermixed. Trips with two messages have a vehicle update and are thus en route; trips with only one message do not, and are not.

In [41]:
for key in sorted_table:
    print(len(sorted_table[key]))

2
2
2
2
1
2
2
1
1
1
1
2
2
2
2
2
1
2
2
2
2
2
2
2
2
2
2
2
1
1
2
2
1
2
2
2
2
2
2
2
1
2
2
2
1
1
2
2
1
1
2
2
2
1
2
2
2
2
2
2
2
2
1
2
1
1
1
1
1
2
1
1
2
1
2
2
2
2
2
2
1
2
1
2
1
1
2
2
1
2
2
1
2
2
2
1
2
1
2
1
1
2
2
1
2
2
2
1
1
1
2
2
1
2
2
1
1
2
2
1
1
1
1
2
2
1
2
2
2
1
2
1
1
2
1
2
2
1
1
2
2
1
2
2
1
1
1
1
2
2
1
2
1
1
2
2
1
2
1
2
2
2
1
1
2
2
1
2
1
1
2
2
2
1
1
2
1
2
1
1
2
2
2
2
2
2
2
2
1
2
1
2
1
1
2
2
1
2
2
2
1
1
2
2
1
2
1
2
2
1
1
1
1
1
2
2
2
1
2
1
2
2
2
2
2
1
1
2
1
2
2
2
1
1
2
2
2
2
2
2
2
2
2
1
1
2
2
2
1
2
2
2
1
1
2
2
1
1
1
1
1
1
1
2
1
2
2
2
1
2
2
2
1
2
1
2
1
2
2
2
1
2
2
2
2
1
2
1
2
2
1
1
2
1
2
2
2
1
1
2
1
2
2
2
1
2
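
A tally is easier to read than the raw list of lengths. The sketch below counts how many trips have one message and how many have two, using `collections.Counter`. The helper name `tally_message_counts` and the toy table are hypothetical; with the real data one would pass `sorted_table` instead.

```python
from collections import Counter


def tally_message_counts(table):
    """Count how many trips have 1 message, 2 messages, and so on."""
    return Counter(len(messages) for messages in table.values())


# Toy stand-in for sorted_table: trip id -> list of messages.
example_table = {
    "trip_a": ["trip_update", "vehicle"],
    "trip_b": ["trip_update"],
    "trip_c": ["trip_update", "vehicle"],
}

tally = tally_message_counts(example_table)
print(tally)
```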


Since `DataFrame` objects don't have a `name` attribute, we will stash the name of the trip in question in the trip log, in the `name` of its index.

Now, finally, let's get to parsing feeds into combined trip tables!

In [51]:
from processing import parse_feeds_into_trip_logs, mta_archival_time_to_unix_timestamp

In [70]:
ret = parse_feeds_into_trip_logs([gtfs_r[0]], [mta_archival_time_to_unix_timestamp('2014-09-18-09-01')])

In [73]:
ret[list(ret.keys())[0]]

Unnamed: 0,action,latest_information_time,maximum_time,minimum_time,route_id,stop_id,trip_id
0,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,234S,049150_3..S01R
1,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,235S,049150_3..S01R
2,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,236S,049150_3..S01R
3,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,237S,049150_3..S01R
4,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,238S,049150_3..S01R
5,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,239S,049150_3..S01R
6,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,248S,049150_3..S01R
7,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,249S,049150_3..S01R
8,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,250S,049150_3..S01R
9,EN_ROUTE_TO,1411045000.0,,1411045000.0,3,251S,049150_3..S01R


Now two at a time...

In [92]:
ret2 = parse_feeds_into_trip_logs([gtfs_r[0], gtfs_r[5]], 
                                  [mta_archival_time_to_unix_timestamp('2014-09-18-09-01'),
                                   mta_archival_time_to_unix_timestamp('2014-09-18-09-06')])

In [93]:
ret2[list(ret.keys())[0]]

Unnamed: 0,trip_id,route_id,action,stop_id,minimum_time,maximum_time,latest_information_time
0,049150_3..S01R,3,STOPPED_OR_SKIPPED,234S,1411045000.0,1411046000.0,1411046000.0
1,049150_3..S01R,3,STOPPED_OR_SKIPPED,235S,1411045000.0,1411046000.0,1411046000.0
2,049150_3..S01R,3,STOPPED_OR_SKIPPED,236S,1411045000.0,1411046000.0,1411046000.0
3,049150_3..S01R,3,STOPPED_AT,237S,1411046000.0,1411046000.0,1411046000.0
4,049150_3..S01R,3,EN_ROUTE_TO,238S,1411046000.0,,1411046000.0
5,049150_3..S01R,3,EN_ROUTE_TO,239S,1411046000.0,,1411046000.0
6,049150_3..S01R,3,EN_ROUTE_TO,248S,1411046000.0,,1411046000.0
7,049150_3..S01R,3,EN_ROUTE_TO,249S,1411046000.0,,1411046000.0
8,049150_3..S01R,3,EN_ROUTE_TO,250S,1411046000.0,,1411046000.0
9,049150_3..S01R,3,EN_ROUTE_TO,251S,1411046000.0,,1411046000.0


In [91]:
ret2[list(ret.keys())[0]]['minimum_time'][0], ret2[list(ret.keys())[0]]['maximum_time'][0]

(1411045260.0, 1411045560.0)
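
These two values are 300 seconds apart, which matches the five minutes between the `09-01` and `09-06` feeds. As a sanity check, here is a minimal sketch of how an archival timestamp string could be turned into a Unix timestamp. It is not the code behind `mta_archival_time_to_unix_timestamp`; the function name `archival_time_to_unix` is hypothetical, and the fixed UTC-4 offset is an assumption that only holds while New York is on daylight time, as it was on this date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: archival timestamps are New York local time, which was
# EDT (UTC-4) on 2014-09-18. A fixed offset is wrong in winter.
EDT = timezone(timedelta(hours=-4))


def archival_time_to_unix(timestamp_str, tz=EDT):
    """Parse 'YYYY-MM-DD-HH-MM' as local time in `tz`, return Unix seconds."""
    local = datetime.strptime(timestamp_str, "%Y-%m-%d-%H-%M").replace(tzinfo=tz)
    return local.timestamp()


first = archival_time_to_unix("2014-09-18-09-01")
second = archival_time_to_unix("2014-09-18-09-06")
print(first, second, second - first)
```

Under that assumption the results agree with the `minimum_time` and `maximum_time` values shown above.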

Great!

There is one challenge that is not addressed yet: how to account for the remaining stations that get deleted when a train is taken off the roster before completing its trip.

Such an occurrence can be detected when, given a set of updates in sequence, a trip stops appearing in the updates. This could happen when a train is just about to reach the last stop on its trip and makes it past there within our time window, or when a train is taken out of service mid-track (maybe? I'm not totally sure about its presence in the GTFS-R record after that happens).

Since that train still needs to get to its endpoint station in order to make it to its storage yard, when we find that such a thing occurs we can cut off the stop and skip information for all stations still marked `EN_ROUTE_TO`.
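
The detection step described above can be sketched with plain sets and lists. Everything here is hypothetical: `find_vanished_trips` and `pending_stops` are illustrative names, trip logs are represented as lists of dicts rather than `DataFrame` objects, and what to actually do with the pending stops is left open.

```python
def find_vanished_trips(trip_ids_per_feed):
    """Map each vanished trip id to the index of the first feed missing it.

    `trip_ids_per_feed` is a list of sets of trip ids, one set per feed,
    in chronological order.
    """
    vanished = {}
    for index in range(len(trip_ids_per_feed) - 1):
        earlier = trip_ids_per_feed[index]
        later = trip_ids_per_feed[index + 1]
        for trip_id in earlier - later:
            # Keep only the first disappearance of each trip.
            vanished.setdefault(trip_id, index + 1)
    return vanished


def pending_stops(trip_log):
    """Return the stop ids of rows still marked EN_ROUTE_TO."""
    return [row["stop_id"] for row in trip_log if row["action"] == "EN_ROUTE_TO"]


# Toy data: trip "B" is present in the first two feeds, gone in the third.
feeds = [{"A", "B"}, {"A", "B"}, {"A"}]
log_b = [
    {"stop_id": "237S", "action": "STOPPED_AT"},
    {"stop_id": "238S", "action": "EN_ROUTE_TO"},
    {"stop_id": "239S", "action": "EN_ROUTE_TO"},
]

vanished = find_vanished_trips(feeds)
print(vanished)
print(pending_stops(log_b))
```

The rows returned by `pending_stops` for a vanished trip are the ones whose stop and skip information would be cut off.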