# Using CSP to analyze MTA data

The NYC Metropolitan Transportation Authority provides [an API for developers](https://api.mta.info), and we'll explore the MTA's realtime [GTFS-rt](https://developers.google.com/transit/gtfs-realtime) transportation data feed.

To read and process the GTFS-rt data, we'll use the [nyct-gtfs](https://pypi.org/project/nyct-gtfs/) library, which can be installed from PyPI with:

```
pip install nyct-gtfs
```

The MTA feed can be inspected as follows:

In [2]:
from nyct_gtfs import NYCTFeed

# Load the realtime feed from the MTA site
feed = NYCTFeed("https://api-endpoint.mta.info/Dataservice/mtagtfsfeeds/nyct%2Fgtfs", api_key="")

# feed_id must be a valid feed URL or one of: 
# '1', '2', '3', '4', '5', '6', '7',
# 'S', 'GS', 'A', 'C', 'E', 'H',
# 'FS', 'SF', 'SR',
# 'B', 'D', 'F', 'M', 'G', 'J', 'Z', 'N', 'Q', 'R', 'W', 'L',
# 'SI', 'SS', 'SIR'

Let's explore the data first. `feed` is a `nyct_gtfs.feed.NYCTFeed` instance, and its most important methods and properties are the following:

In [3]:
feed.refresh?

Signature: feed.refresh()
Docstring: Reload this object's feed information from the MTA API
File:      ~/micromamba/envs/csp-vanilla/lib/python3.11/site-packages/nyct_gtfs/feed.py
Type:      method

In [4]:
feed.filter_trips?

Signature:
feed.filter_trips(
    line_id=None,
    travel_direction=None,
    train_assigned=None,
    underway=None,
    shape_id=None,
    headed_for_stop_id=None,
    updated_after=None,
    has_delay_alert=None,
)
Docstring:
Get the list of subway trips from the GTFS-realtime feed, optionally filtering based on one or more parameters.

If more than one filter is specifi

In [5]:
feed.trips?

Type:        property
String form: <property object at 0x7f37ac4d25c0>
Docstring:   Get the list of subway trips from the GTFS-realtime feed. Returns a list of `Trip` objects

In our case, we will (for simplicity) filter the trips to collect information only about the 1, 2 and 3 trains, and we will start with trains going through 34 St-Penn Station.

In [6]:
# This cell can be run multiple times, and data will be refreshed every 30s
feed.refresh()
trains = feed.filter_trips(underway=True, headed_for_stop_id=['128', '128N'])
trains

[{"069950_1..N03R", STOPPED_AT 133N @12:52:08},
 {"070550_1..N03R", STOPPED_AT 137N @12:51:41},
 {"071150_1..N03R", IN_TRANSIT_TO 139N @12:52:23},
 {"067150_2..N09R", STOPPED_AT 128N @12:51:58},
 {"067950_2..N01R", IN_TRANSIT_TO 132N @12:52:26},
 {"068750_2..N09R", IN_TRANSIT_TO 230N @12:52:27},
 {"069550_2..N01R", STOPPED_AT 235N @12:52:05},
 {"070350_2..N09R", STOPPED_AT 241N @12:51:56},
 {"071150_2..N01R", IN_TRANSIT_TO 246N @12:52:22},
 {"066500_3..N01R", STOPPED_AT 132N @12:51:45},
 {"067100_3..N01R", STOPPED_AT 137N @12:51:58},
 {"067900_3..N01R", INCOMING_AT 230N @12:52:26},
 {"068700_3..N01R", IN_TRANSIT_TO 233N @12:52:22},
 {"069500_3..N01R", INCOMING_AT 239N @12:52:25},
 {"070300_3..N01R", STOPPED_AT 252N @12:51:38},
 {"071150_3..N01R", INCOMING_AT 256N @12:52:06}]

We can also show this data in a human-readable way.

In [7]:
for train in trains:
    print(train)

Northbound 1 to Van Cortlandt Park-242 St, departed origin 11:39:30, Currently STOPPED_AT Christopher St-Sheridan Sq, last update at 12:52:08
Northbound 1 to Van Cortlandt Park-242 St, departed origin 11:45:30, Currently STOPPED_AT Chambers St, last update at 12:51:41
Northbound 1 to Van Cortlandt Park-242 St, departed origin 11:51:30, Currently IN_TRANSIT_TO Rector St, last update at 12:52:23
Northbound 2 to Gun Hill Rd, departed origin 11:11:30, Currently STOPPED_AT 34 St-Penn Station, last update at 12:51:58
Northbound 2 to Wakefield-241 St, departed origin 11:19:30, Currently IN_TRANSIT_TO 14 St, last update at 12:52:26
Northbound 2 to Gun Hill Rd, departed origin 11:27:30, Currently IN_TRANSIT_TO Wall St, last update at 12:52:27
Northbound 2 to Wakefield-241 St, departed origin 11:35:30, Currently STOPPED_AT Atlantic Av-Barclays Ctr, last update at 12:52:05
Northbound 2 to Gun Hill Rd, departed origin 11:43:30, Currently STOPPED_AT President St-Medgar Evers College, last update at
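
The raw trip identifiers in the earlier list (such as `069950_1..N03R`) appear to carry part of what the human-readable form prints. The sketch below assumes the layout is a six-digit origin time in hundredths of a minute past midnight, then the route, then a direction letter followed by a path code. This matches the samples shown here (`069950` decodes to 11:39:30), but it is inferred from those samples rather than taken from the `nyct-gtfs` API, and `decode_trip_id` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Assumed layout: <origin time>_<route><one or more dots><direction><path code>
TRIP_ID_PATTERN = re.compile(
    r"^(?P<origin>\d{6})_(?P<route>[^.]+)\.+(?P<direction>[NS])(?P<path>.*)$"
)


def decode_trip_id(trip_id):
    """Split a trip ID into origin time, route, direction and path code."""
    match = TRIP_ID_PATTERN.match(trip_id)
    if match is None:
        raise ValueError(f"Unrecognized trip ID layout: {trip_id!r}")
    # One hundredth of a minute is 600 milliseconds.
    origin = timedelta(milliseconds=int(match["origin"]) * 600)
    return {
        "origin": origin,
        "route": match["route"],
        "direction": "Northbound" if match["direction"] == "N" else "Southbound",
        "path": match["path"],
    }


decoded = decode_trip_id("069950_1..N03R")
print(decoded["direction"], decoded["route"], "departed origin", decoded["origin"])
```

Identifiers that do not fit the assumed layout raise a `ValueError` instead of being guessed at.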

We can now check when these trains are expected to arrive at 34 St-Penn Station.

In [8]:
trains_at_penn = []
print("Station | Line | Direction | Arrival time")
for train in trains:
    for update in train.stop_time_updates:
        if update.stop_id in ['128', '128N']:
            print(f"{update.stop_name} | {train.route_id} | {train.headsign_text} | {update.arrival}")
            trains_at_penn.append((train, update))

Station | Line | Direction | Arrival time
34 St-Penn Station | 1 | Van Cortlandt Park-242 St | 2024-04-17 12:58:08
34 St-Penn Station | 1 | Van Cortlandt Park-242 St | 2024-04-17 13:03:11
34 St-Penn Station | 1 | Van Cortlandt Park-242 St | 2024-04-17 13:07:03
34 St-Penn Station | 2 | Gun Hill Rd | 2024-04-17 12:52:28
34 St-Penn Station | 2 | Wakefield-241 St | 2024-04-17 12:57:40
34 St-Penn Station | 2 | Gun Hill Rd | 2024-04-17 13:07:12
34 St-Penn Station | 2 | Wakefield-241 St | 2024-04-17 13:14:05
34 St-Penn Station | 2 | Gun Hill Rd | 2024-04-17 13:23:56
34 St-Penn Station | 2 | Wakefield-241 St | 2024-04-17 13:32:57
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 12:54:45
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 12:59:28
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 13:04:36
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 13:12:22
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 13:21:34
34 St-Penn Station | 3 | Harlem-148 St | 2024-04-17 13:30:38
34 S
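
One simple analysis of such a table is the headway, the gap between consecutive arrivals on the same line. The sketch below hard-codes some of the predicted arrival times printed above, so it runs without network access; `headways_by_route` and `at` are hypothetical helpers, not part of `nyct-gtfs`.

```python
from collections import defaultdict
from datetime import datetime


def headways_by_route(arrivals):
    """Map each route to the gaps between its consecutive arrival times."""
    by_route = defaultdict(list)
    for route, arrival in arrivals:
        by_route[route].append(arrival)
    result = {}
    for route, times in by_route.items():
        times.sort()
        result[route] = [later - earlier for earlier, later in zip(times, times[1:])]
    return result


def at(clock):
    """Build a datetime on the day of the sample output from an HH:MM:SS string."""
    return datetime.fromisoformat(f"2024-04-17 {clock}")


# (route, predicted arrival) pairs copied from the output above
sample_arrivals = [
    ("1", at("12:58:08")), ("1", at("13:03:11")), ("1", at("13:07:03")),
    ("3", at("12:54:45")), ("3", at("12:59:28")), ("3", at("13:04:36")),
]

headways = headways_by_route(sample_arrivals)
for route, gaps in headways.items():
    print(route, [str(gap) for gap in gaps])
```

With live data, the same function could be fed `(train.route_id, update.arrival)` pairs taken from `trains_at_penn`.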

---

## Using CSP to ingest and analyze the data

When using CSP to ingest and analyze this data, we [start with a graph](https://github.com/Point72/csp/wiki/CSP-Graph) representing the operations we want to perform. A graph is composed of one or more "input" adapters, a set of connected calculation "nodes", and "output" adapters that receive the results. For simplicity, we'll build a very simple graph that shows only trains passing through 34 St-Penn Station.

There are two types of [Input Adapters](https://github.com/Point72/csp/wiki/Adapters): Historical (aka Simulated) adapters and Realtime Adapters. Historical adapters are used to feed historical timeseries data into the graph. Realtime Adapters are used to feed in live event data, generally created from external sources on separate threads.

As you may have guessed, in our case we need to use a [Realtime adapter](https://github.com/Point72/csp/wiki/Write-Realtime-Input-Adapters), which will ingest the live data and periodically refresh it.

In CSP terminology, a single adapter corresponds to a single timeseries edge in the graph. When writing realtime adapters, you will need to implement a "push" adapter, which will get data from a separate thread that drives external events and "pushes" them into the engine as they occur.
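
The "push" pattern can be illustrated without CSP at all. The following is a conceptual sketch using only the standard library: `ToyEngine` is a hypothetical stand-in for the engine, not CSP's implementation, and `external_source` plays the role of the separate thread that drives external events.

```python
import queue
import threading


class ToyEngine:
    """Hypothetical stand-in for an engine that accepts ticks from other threads."""

    def __init__(self):
        self._ticks = queue.Queue()  # thread-safe hand-off point

    def push_tick(self, value):
        self._ticks.put(value)

    def drain(self):
        """Return every tick received so far, in arrival order."""
        ticks = []
        while True:
            try:
                ticks.append(self._ticks.get_nowait())
            except queue.Empty:
                return ticks


def external_source(engine, values):
    """Runs on its own thread and pushes each event as it 'occurs'."""
    for value in values:
        engine.push_tick(value)


engine = ToyEngine()
producer = threading.Thread(
    target=external_source, args=(engine, ["tick-1", "tick-2", "tick-3"])
)
producer.start()
producer.join()
received = engine.drain()
print(received)
```

The key point is the direction of control: the source decides when data arrives, and the engine only reacts to what has been pushed.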

When [writing input adapters](https://github.com/Point72/csp/wiki/Write-Realtime-Input-Adapters#pushinputadapter---python) it is also very important to distinguish between the "graph building time" and "runtime" versions of your adapter. For example, [`csp.adapters.csv` has a `CSVReader` class](https://github.com/Point72/csp/blob/main/csp/adapters/csv.py) that is used at graph building time. Graph build time components solely describe the adapter. They are meant to do little more than keep track of the type of adapter and its parameters, which are then used to construct the actual adapter implementation when the engine is constructed from the graph description. It is the runtime implementation that actually runs during the engine execution phase to process data.
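
To make the build-time versus runtime split concrete, here is a minimal sketch in plain Python. All names (`AdapterDescription`, `PollingAdapterImpl`, `describe_polling_adapter`, `build_engine`) are hypothetical and only mimic the idea; CSP's own wiring classes are more involved.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class AdapterDescription:
    """Build-time object: records which implementation to use and its parameters."""
    impl_class: type
    kwargs: dict


class PollingAdapterImpl:
    """Runtime object: instantiated only when the engine is constructed."""
    instances_created = 0

    def __init__(self, interval):
        PollingAdapterImpl.instances_created += 1
        self.interval = interval


def describe_polling_adapter(interval):
    # Graph building: no implementation object is created here.
    return AdapterDescription(PollingAdapterImpl, {"interval": interval})


def build_engine(descriptions):
    # Engine construction: descriptions are turned into runtime implementations.
    return [d.impl_class(**d.kwargs) for d in descriptions]


description = describe_polling_adapter(timedelta(seconds=30))
created_before = PollingAdapterImpl.instances_created
runtime_adapters = build_engine([description])
print(created_before, PollingAdapterImpl.instances_created)
```

Describing the adapter creates no runtime object; the implementation only comes into existence when the engine is built from the description.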

> Once the graph is constructed, `csp.graph` code is no longer needed. Once the
> graph is run, only inputs, `csp.nodes` and outputs will be active as data flows
> through the graph, driven by input ticks.

In our case, "ticks" correspond to feed refreshes, and we'll observe this data being updated every 30s. We will read 3 minutes of data for the purposes of this demonstration.

In [8]:
import csp
from csp.impl.pushadapter import PushInputAdapter
from csp.impl.wiring import py_push_adapter_def

import nyct_gtfs

import os
import time
import threading
from datetime import datetime, timedelta


class Event(csp.Struct):
    train: nyct_gtfs.trip.Trip
    update: nyct_gtfs.stop_time_update.StopTimeUpdate


class FetchTrainDataAdapter(PushInputAdapter):
    def __init__(self, interval):
        self._interval = interval
        self._thread = None
        self._running = False

    def start(self, starttime, endtime):
        print("FetchTrainDataAdapter::start")
        self._running = True
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def stop(self):
        print("FetchTrainDataAdapter::stop")
        if self._running:
            self._running = False
            self._thread.join()

    def _run(self):
        # This is where we will read and process the real-time data feed
        feed = nyct_gtfs.NYCTFeed("https://api-endpoint.mta.info/Dataservice/mtagtfsfeeds/nyct%2Fgtfs", api_key="")

        while self._running:
            print(f"{datetime.utcnow()}: refreshing MTA feed")
            feed.refresh()
            # trains will contain all trains currently underway and headed to 34 St-Penn Station.
            trains = feed.filter_trips(underway=True, headed_for_stop_id=['128', '128N'])
            # tick whenever feed is refreshed
            for train in trains:
                for update in train.stop_time_updates:
                    if update.stop_id in ['128', '128N']:
                        self.push_tick(Event(train=train, update=update))
            time.sleep(self._interval.total_seconds())

FetchTrainData = py_push_adapter_def("FetchTrainData", FetchTrainDataAdapter, csp.ts[Event], interval=timedelta)

@csp.graph
def mta_graph():
    print("Start of graph building")
    trains_at_penn = FetchTrainData(timedelta(seconds=30))
    csp.print("MTA data", trains_at_penn)
    print("End of graph building")

start = datetime.utcnow()
end = start + timedelta(minutes=3)
csp.run(mta_graph, starttime=start, realtime=True, endtime=end)
print("Done.")

Start of graph building
End of graph building
FetchTrainDataAdapter::start
2024-04-17 13:06:07.738811: refreshing MTA feed
2024-04-17 13:06:08.869066 MTA data:Event( train={"052850_1..N03R", INCOMING_AT 130N @10:05:55}, update={ID: 128N, Arr: 10:09:20, Dep: 10:09:20, Sched: T4, } )
2024-04-17 13:06:08.872556 MTA data:Event( train={"053450_1..N03R", STOPPED_AT 132N @10:05:30}, update={ID: 128N, Arr: 10:10:30, Dep: 10:10:30, Sched: T4, } )
2024-04-17 13:06:08.872654 MTA data:Event( train={"053850_1..N13R", IN_TRANSIT_TO 135N @10:05:55}, update={ID: 128N, Arr: 10:15:40, Dep: 10:15:40, Sched: T4, } )
2024-04-17 13:06:08.872731 MTA data:Event( train={"054150_1..N03R", STOPPED_AT 137N @10:05:50}, update={ID: 128N, Arr: 10:17:20, Dep: 10:17:20, Sched: T4, } )
2024-04-17 13:06:08.872815 MTA data:Event( train={"054000_1..N03R", STOPPED_AT 134N @10:05:21}, update={ID: 128N, Arr: 10:12:51, Dep: 10:12:51, Sched: T4, } )
2024-04-17 13:06:08.872893 MTA data:Event( train={"051050_2..N09R", STOPPED_AT
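
One design detail in the adapter above: because `_run` waits with `time.sleep`, a call to `stop()` may block in `join()` until the current sleep finishes. A possible alternative, sketched below with the standard library only, is to wait on a `threading.Event` so that the loop wakes up as soon as a stop is requested. `PollingWorker` is a hypothetical class for illustration and is not a CSP adapter.

```python
import threading
import time


class PollingWorker:
    """Calls `poll` repeatedly, pausing `interval_seconds` between calls."""

    def __init__(self, interval_seconds, poll):
        self._interval = interval_seconds
        self._poll = poll
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def stop(self):
        self._stop_event.set()  # wakes the worker if it is waiting
        if self._thread is not None:
            self._thread.join()

    def _run(self):
        while not self._stop_event.is_set():
            self._poll()
            # Returns early if stop() is called during the wait.
            self._stop_event.wait(self._interval)


polls = []
first_poll_done = threading.Event()


def record_poll():
    polls.append(time.monotonic())
    first_poll_done.set()


worker = PollingWorker(30.0, record_poll)
worker.start()
first_poll_done.wait(timeout=5)
stop_requested = time.monotonic()
worker.stop()
stop_duration = time.monotonic() - stop_requested
print(f"polled {len(polls)} time(s), stopped in {stop_duration:.3f}s")
```

The same idea could be applied inside `FetchTrainDataAdapter` by replacing the `_running` flag and `time.sleep` call with an event, at the cost of slightly more state to manage.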

---

### References

* https://erikbern.com/2016/04/04/nyc-subway-math
* https://erikbern.com/2016/07/09/waiting-time-math.html
* https://pypi.org/project/nyct-gtfs/
* https://api.mta.info/#/landing
* https://developers.google.com/transit/gtfs-realtime
* https://github.com/Point72/csp/blob/main/examples/4_writing_adapters/e_14_user_adapters_03_pushinput.py
* https://github.com/Point72/csp/wiki/5.-Adapters#realtime-adapters