# About

## mtadeveloperresources

The MTA keeps a Google Groups discussion board for people interested in working with MTA data. The group, `mtadeveloperresources`, is available [here](https://groups.google.com/forum/#!forum/mtadeveloperresources).

We can learn a lot from the prior art of people who have worked with MTA data before. In no particular order, here are some observations in FAQ format.

----

**How often are the [non-GTFS-Realtime] data sets updated?**

MTA schedules are updated approximately four times per year; we’ll notify developers through this list, and will attempt to post the new schedules as early as possible before the changeover. Other data sets, like turnstiles, are updated weekly, while service status information is updated every minute.

----

**What historical data is available?**

[Subway Time historical data](http://web.mta.info/developers/MTA-Subway-Time-historical-data.html) is available for GTFS-Realtime Feed 1 (covering the 1, 2, 3, 4, 5, 6, and S), Feed 2 (L), and Feed 11 (SIR). This data is published both in five-minute updates and as daily aggregations. However, it only covers the period between September 17, 2014 and October 29, 2015 ([source](https://groups.google.com/forum/#!topic/mtadeveloperresources/f8NKzZb5p3Q)).

Nathan Johnson (with some support from BetaNYC) began archiving the data himself. [His archive](http://data.mytransit.nyc/subway_time/) runs from January 31, 2016 to May 31, 2017, at which point it stops. He has not said why it stopped, or whether he intends to resume it. Nathan Johnson's data is far superior to the MTA historical data because it is published at 1-minute resolution. That is the highest resolution easily achievable with typical repeating-job tools (notably `cron`; AWS Lambda has a similar limitation), even though the MTA actually republishes the data every 30 seconds. His archive also has occasional gaps, likely corresponding to load failures.
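To capture every 30-second update rather than every other one, a long-running poller can sidestep `cron`'s one-minute floor. The following is a minimal sketch, not Nathan Johnson's actual setup: `fetch` is a hypothetical zero-argument callable that returns the raw feed bytes (the real feed URL and any API key are left out), and the `feed_<UTC time>.pb` file naming is an assumption.

```python
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def archive_feed(
    fetch: Callable[[], bytes],
    out_dir: Path,
    interval_s: float = 30.0,
    iterations: int = 1,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> list[Path]:
    """Poll `fetch` every `interval_s` seconds and save each response to a
    file named after its UTC download time. Returns the paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(iterations):
        payload = fetch()
        # Name each snapshot by download time so gaps are visible later.
        stamp = now().strftime("%Y%m%dT%H%M%SZ")
        path = out_dir / f"feed_{stamp}.pb"
        path.write_bytes(payload)
        written.append(path)
        # Wait between pulls, but not after the last one.
        if i < iterations - 1:
            sleep(interval_s)
    return written
```

A fixed `sleep` drifts slightly because each fetch takes time; scheduling against wall-clock deadlines would keep the cadence tighter. `sleep` and `now` are injectable so the loop can be tested without waiting.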

----

**How reliable is data publication?**

Data publication suffers from gaps in availability. For example, at the time of writing GTFS-Realtime Feed 1 appears to be completely down for me: requests against that URL simply return a Permission Denied error. Since Nathan Johnson has already archived the data, overall availability could be estimated by checking his archive for missing timestamps.
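Given a list of snapshot times from an archive (how to list them, e.g. by parsing file names, is left out), gaps and an availability ratio fall out of a simple scan. This sketch assumes an expected one-minute cadence and treats any interval longer than 1.5 times that cadence as a gap; both numbers are assumptions to tune.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

ONE_MINUTE = timedelta(minutes=1)


def find_gaps(
    timestamps: Iterable[datetime], expected: timedelta = ONE_MINUTE
) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) pairs around each run of missing snapshots."""
    ts = sorted(timestamps)
    # Allow some jitter: only intervals longer than 1.5x the cadence count.
    threshold = expected * 1.5
    return [(a, b) for a, b in zip(ts, ts[1:]) if b - a > threshold]


def availability(timestamps: Iterable[datetime], expected: timedelta = ONE_MINUTE) -> float:
    """Fraction of expected snapshots present between the first and last one."""
    ts = sorted(set(timestamps))
    if len(ts) < 2:
        return 1.0 if ts else 0.0
    expected_count = (ts[-1] - ts[0]) // expected + 1
    return min(1.0, len(ts) / expected_count)
```

For example, snapshots at minutes 0, 1, 2, 5, and 6 yield one gap (between minutes 2 and 5) and an availability of 5/7.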

----

**How reliably formatted are the MTA GTFS schedules?**

It seems reasonable to expect that each MTA GTFS release (which happens seasonally) will go through several revisions after it first comes out. For example, [this thread](https://groups.google.com/d/msg/mtadeveloperresources/bHm8gJ9-KLM/TLxVJlFpCAAJ) points out that commas were not properly escaped in the most recent release.
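Unescaped commas usually show up as rows with more fields than the header. A minimal check using Python's standard `csv` module might look like this; the sample rows below are made up for illustration, though the column names are standard GTFS `stops.txt` fields.

```python
import csv
import io
from typing import Iterator, List, Tuple


def malformed_rows(text: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, row) for rows whose field count differs from the
    header's, which is how an unescaped comma inside a field tends to appear."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    for row in reader:
        # Skip blank lines; flag anything with the wrong number of fields.
        if row and len(row) != len(header):
            yield reader.line_num, row
```

In practice one would read the file with `open(path, newline="", encoding="utf-8")` and pass its contents in; running this right after each new release is a cheap way to catch a bad revision.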

----

**What format is the MTA GTFS feed released in?**

CSV for Excel (cf. [here](https://groups.google.com/d/msg/mtadeveloperresources/bHm8gJ9-KLM/TLxVJlFpCAAJ)).

----

**How does one get access to MetroNorth and LIRR GTFS-Realtime feeds?**

By emailing John Larson, a developer at the MTA (see [here](https://groups.google.com/forum/#!msg/mtadeveloperresources/slwHaNqAE18/gE-m6RaRGp8J)). However, it seems possible, even likely, that Larson has since left the MTA, as many people have recently complained that they made several inquiries without receiving any response. So for the moment these feeds appear to be simply inaccessible, for purely administrative reasons.

----

**Do stations that re-open after renovations receive the same GTFS identifier?**

Not necessarily! When South Ferry Terminal reopened, it was given a new identifier for operational reasons ([source](https://groups.google.com/forum/#!topic/mtadeveloperresources/slzCyf8Qsx8)). So it seems that extremely heavy renovations occasionally result in IDs being reassigned. However, this should be a rare occurrence.
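One way to catch a reassignment is to diff the `stop_id` sets of two consecutive `stops.txt` releases. This is a sketch: pairing a removed ID with its replacement (for instance by matching `stop_name`) is left as a manual step, and the IDs and names in the test data are invented.

```python
import csv
import io
from typing import Set, Tuple


def stop_ids(stops_txt: str) -> Set[str]:
    """Collect the stop_id values from the text of a GTFS stops.txt file."""
    return {row["stop_id"] for row in csv.DictReader(io.StringIO(stops_txt))}


def diff_stop_ids(old_txt: str, new_txt: str) -> Tuple[Set[str], Set[str]]:
    """Return (removed, added) stop IDs between an old and a new release."""
    old, new = stop_ids(old_txt), stop_ids(new_txt)
    return old - new, new - old
```

A station that shows up in both the removed and added sets under the same name is a good candidate for a renumbering.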

----

**Are individual stations always given a singular ID?**

No. Grand Central, for example, is split into three stations by line. The system seems to be that a station complex served by multiple lines (where a line is a group of services, e.g. the B/Q in Brooklyn) gets a separate identifier for each line, and each of those is further differentiated by N and S, of course, to indicate direction.
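When joining realtime data against stations, it helps to strip the direction suffix back off. This sketch assumes the convention described above: a station ID optionally followed by a single trailing `N` or `S`, with station IDs themselves never ending in either letter. `split_stop_id` is a hypothetical helper, and `101N` is just an example value.

```python
from typing import Optional, Tuple


def split_stop_id(stop_id: str) -> Tuple[str, Optional[str]]:
    """Split a directional stop_id like '101N' into ('101', 'N').

    IDs without a trailing N/S are returned unchanged with direction None.
    """
    if stop_id and stop_id[-1] in ("N", "S"):
        return stop_id[:-1], stop_id[-1]
    return stop_id, None
```

Note that this only collapses direction; the per-line split of a complex like Grand Central still yields separate station IDs.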

----

**If a major event occurs, is the GTFS archive updated to account for this?**

Sometimes, perhaps, but rarely. Buses have received [a number of updates](https://groups.google.com/forum/#!topic/mtadeveloperresources/bHm8gJ9-KLM) after their initial seasonal release, both for error correction and for "fixing" the schedule on special days like Columbus Day. However, it is unclear to what extent the train GTFS feed gets updated after release.

So overall, I would not count on it. The safest assumption is that a GTFS release represents the MTA's best projection, as of its release date, of what service will look like in the upcoming season.

----

**Am I losing important information by sticking to Google's generic GTFS-Realtime parser, and not baking my own Protobuf parser?**

[Maybe](https://groups.google.com/forum/#!topic/mtadeveloperresources/IvcsiJNSgZY). I believe this concern ought to be addressed nearer the end of this project, however. I imagine that incorporating it would be a day or two's work on a branch-and-merge, if it proves necessary.

----

**Are there tools for transforming GTFS feeds?**

[OneBusAway](http://developer.onebusaway.org/modules/onebusaway-gtfs-modules/current/onebusaway-gtfs-transformer-cli.html) has a Java CLI for this purpose. Otherwise, things look pretty sparse on the ground.

----

**What is the supplemented GTFS feed?**

The MTA recently started publishing ([link](https://groups.google.com/forum/#!topic/mtadeveloperresources/14d8DV4hnj4)) a "supplemented GTFS feed", which provides a seven-day look-ahead schedule that is supposedly more accurate than the seasonal one. These schedules are derived from MTA-internal supplements prepared on a weekly basis, but they also reflect artifacts of the way those supplements are planned. This makes them not directly compatible with the seasonal GTFS, and somewhat "quirky" overall. The MTA says it is working on bringing the feed closer to what developers would expect it to contain, but that this will take some time.

----

**Are the timestamps always correct?**

Not necessarily! [This thread](https://groups.google.com/forum/#!topic/mtadeveloperresources/6RlGODDrxSk) from 2014 points out that the timestamps in the MTA GTFS-Realtime feeds diverged significantly from the time at which the data was actually published. That instance was later corrected, but the problem apparently still affects the MetroNorth feed ([source](https://groups.google.com/forum/#!topic/mtadeveloperresources/zXm-BnwoRFQ)).

Note that GTFS-Realtime timestamps are POSIX times (seconds since the epoch, in UTC), per the GTFS-Realtime spec. Even after converting to Eastern time, the timestamps were (and on MetroNorth, still are) wrong.

So to be truly robust, it's a good idea to sanity-check the timestamp immediately after the download step, and to handle the feed differently (for example, by flagging or discarding it) if the timestamp looks wrong.
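A sanity check can compare the feed header's timestamp (the `timestamp` field of the GTFS-Realtime `FeedHeader`, decoded by whatever protobuf parser you use, not shown here) against the time the download finished. The five-minute tolerance below is an arbitrary assumption, and both function names are hypothetical.

```python
from datetime import datetime, timezone


def timestamp_skew_seconds(feed_timestamp: int, downloaded_at: datetime) -> float:
    """Seconds between the feed's POSIX timestamp and a timezone-aware
    download time. Positive means the feed claims to be older than the pull."""
    feed_time = datetime.fromtimestamp(feed_timestamp, tz=timezone.utc)
    return (downloaded_at - feed_time).total_seconds()


def timestamp_looks_wrong(
    feed_timestamp: int, downloaded_at: datetime, max_skew_s: float = 300.0
) -> bool:
    """Flag feeds whose timestamp is too far in the past or the future."""
    return abs(timestamp_skew_seconds(feed_timestamp, downloaded_at)) > max_skew_s
```

Because POSIX time is timezone-independent, comparing in UTC avoids the Eastern-time conversion altogether; a skew of roughly four or five hours would be a hint that someone applied a timezone offset twice.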

----

**How do I get structured service alerts?**

There is an [XML feed](https://groups.google.com/forum/#!topic/mtadeveloperresources/JKLiP5XKZk0) that provides them. It is relatively recent, dating from May of this year.

----

**Is the GTFS-R feed constantly available?**

No. The files are locked on the MTA's server while they are being written, so requests that arrive during that window can hit a truncated-data error ([source](https://groups.google.com/forum/#!topic/mtadeveloperresources/xwEcwi6RVp4)).

The workaround is to implement a timed retry. Retrying the request, say, 10 seconds after a failed first attempt seems reasonable.
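A timed retry can be sketched as follows. `fetch` and `is_complete` are hypothetical callables: with a protobuf parser, `is_complete` might try to parse the payload and return `False` if parsing fails, which is how a truncated download would likely surface. The 10-second delay follows the suggestion above; the three-attempt limit is an assumption.

```python
import time
from typing import Callable, Optional


def fetch_with_retry(
    fetch: Callable[[], bytes],
    is_complete: Callable[[bytes], bool],
    attempts: int = 3,
    delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Call `fetch` until `is_complete` accepts the payload, waiting
    `delay_s` seconds between attempts. Returns None if every attempt fails."""
    for attempt in range(attempts):
        try:
            payload = fetch()
        except OSError:  # network errors (urllib's URLError is an OSError)
            payload = None
        if payload is not None and is_complete(payload):
            return payload
        # Back off before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay_s)
    return None
```

Returning `None` rather than raising lets an archiver record the miss and move on to the next polling interval.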

----

**Is the N/Q/R/W feed currently active?**

Almost certainly, [but I need to make sure](https://groups.google.com/forum/#!topic/mtadeveloperresources/78Gh3w6EfU4), because it was taken down for a while a few months ago, apparently to address technical issues. While it was down, it kept serving stale GTFS-Realtime data from the previous (most recent) pull. That's something I need to be careful about.
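A stale feed can be detected by watching whether the header timestamp stops advancing across pulls. This sketch assumes a frozen feed repeats the same header timestamp; hashing the raw payload bytes would be an alternative if that assumption doesn't hold. `StaleFeedDetector` and its `max_repeats` threshold are hypothetical.

```python
from typing import Optional


class StaleFeedDetector:
    """Flag a feed as stale once the same header timestamp has been seen
    `max_repeats` times in a row."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self.last_timestamp: Optional[int] = None
        self.repeats = 0

    def observe(self, feed_timestamp: int) -> bool:
        """Record one pull; return True if the feed now looks stale."""
        if feed_timestamp == self.last_timestamp:
            self.repeats += 1
        else:
            # A new timestamp resets the streak.
            self.last_timestamp = feed_timestamp
            self.repeats = 1
        return self.repeats >= self.max_repeats
```

With 30-second polling and `max_repeats=3`, a feed is flagged after about a minute without a fresh timestamp.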