## About

### Downloadables

The MTA provides its developer datasets in two forms, both accessible via its data portal, http://datamine.mta.info.

One is its set of "static" assets, which are essentially files you read from. Some of these are plain CSV files, and some are GTFS (General Transit Feed Specification, a unified transit information format from Google that's built around CSV files packaged into a ZIP).

The other is a set of streams, which are provided in the GTFS-Realtime format.

I'm not familiar with the GTFS format, so my starting point is going to be exploring that export specifically.

## Reading in the data

In [1]:
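import io
import zipfile

import requests

# Download the static subway GTFS archive and open it in memory.
GTFS_URL = "http://web.mta.info/developers/data/nyct/subway/google_transit.zip"
response = requests.get(GTFS_URL)
response.raise_for_status()  # stop here on an HTTP error instead of unzipping an error page
subways_zipped = zipfile.ZipFile(io.BytesIO(response.content))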

In [2]:
subways_zipped.extractall(path="../data/gtfs/")

In [3]:
del subways_zipped
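Since a GTFS export is just CSV files packaged into a ZIP, the tables can also be read straight out of the archive, without extracting anything to disk. Here is a minimal sketch of that idea. `read_gtfs_table` is a hypothetical helper name, and the archive is a one-file stand-in built in memory so the snippet runs offline; its single row is copied from this notebook's `agency.txt` output.

```python
import io
import zipfile

import pandas as pd


def read_gtfs_table(archive: zipfile.ZipFile, name: str) -> pd.DataFrame:
    """Read one CSV member (e.g. "agency.txt") of a GTFS archive into a DataFrame."""
    with archive.open(name) as member:
        return pd.read_csv(member)


# Stand-in archive (an assumption for illustration, not the real MTA download).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr(
        "agency.txt",
        "agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone\n"
        "MTA NYCT,MTA New York City Transit,http://www.mta.info,America/New_York,en,718-330-1234\n",
    )

# Reopen the archive for reading, list its members, and load one table.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    members = sorted(archive.namelist())
    agency_demo = read_gtfs_table(archive, "agency.txt")
```

With the real download, the same helper could be pointed at `subways_zipped` before it is deleted.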

`agency.txt`

In [5]:
%ls "../data/gtfs/"

agency.txt          calendar.txt  shapes.txt  stop_times.txt  trips.txt
calendar_dates.txt  routes.txt    stops.txt   transfers.txt


In [4]:
import pandas as pd
agency = pd.read_csv("../data/gtfs/agency.txt")

In [10]:
agency

Unnamed: 0,agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
0,MTA NYCT,MTA New York City Transit,http://www.mta.info,America/New_York,en,718-330-1234


Rather simplistic.

`calendar.txt`

In [12]:
calendar = pd.read_csv("../data/gtfs/calendar.txt")

In [13]:
calendar

Unnamed: 0,service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
0,A20161106WKD,1,1,1,1,1,0,0,20161106,20171231
1,A20161106SAT,0,0,0,0,0,1,0,20161106,20171231
2,A20161106SUN,0,0,0,0,0,0,1,20161106,20171231
3,B20161106WKD,1,1,1,1,1,0,0,20161106,20171231
4,B20161106SAT,0,0,0,0,0,1,0,20161106,20171231
5,B20161106SUN,0,0,0,0,0,0,1,20161106,20171231
6,R20161106WKD,1,1,1,1,1,0,0,20161106,20171231
7,R20161106SAT,0,0,0,0,0,1,0,20161106,20171231
8,R20161106SUN,0,0,0,0,0,0,1,20161106,20171231
9,S20161106MON,1,0,0,0,0,0,0,20161106,20171231


Routes are identified by letters and numbers (A, B, C, 1, 2, 3, et cetera) and are tabulated in `routes.txt`, with `route_id` as the unique key. Each route is serviced by individual trips, tabulated in `trips.txt` (connected to routes by `route_id`, with `trip_id` as the unique key). The `calendar` is the controller for service availability. It stores that information as a series of per-weekday booleans attached to a specific `service_id`, which is referenced from `trips.txt`.

In other words, every train *trip* (`trip_id`) occurs as a part of a *service* (`service_id`) on a *route* (`route_id`). For our purposes, the difference between a service and a route is simply that a service differentiates weekend from weekday service (the seemingly curious exception, `S20161106MON`, never actually runs within our current dataset).
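The trip, service and route relationship can be sketched as two many-to-one joins out of the trips table. The frames below are tiny stand-ins, not the real files: the `route_id` and `service_id` values are taken from the tables in this notebook, while the `trip_id` values are made-up placeholders. I'm also assuming that the real `trips.txt` has `trip_id`, `route_id` and `service_id` columns, as described above.

```python
import pandas as pd

# Stand-in frames; trip_id values are hypothetical placeholders.
routes_demo = pd.DataFrame({
    "route_id": ["1", "2"],
    "route_long_name": ["Broadway - 7 Avenue Local", "7 Avenue Express"],
})
calendar_demo = pd.DataFrame({
    "service_id": ["A20161106WKD", "A20161106SAT"],
    "monday": [1, 0],
    "saturday": [0, 1],
})
trips_demo = pd.DataFrame({
    "trip_id": ["trip-a", "trip-b", "trip-c"],
    "route_id": ["1", "1", "2"],
    "service_id": ["A20161106WKD", "A20161106SAT", "A20161106WKD"],
})

# Each trip points at exactly one route and one service, so both joins are
# many-to-one; validate= raises if a key is duplicated on the right side.
trips_full = (
    trips_demo
    .merge(routes_demo, on="route_id", how="left", validate="many_to_one")
    .merge(calendar_demo, on="service_id", how="left", validate="many_to_one")
)

# Trips whose service is flagged as running on Saturdays.
saturday_trips = trips_full.loc[trips_full["saturday"] == 1, "trip_id"].tolist()
```

The left joins keep every trip even if a key is missing on the other side, which makes dangling references show up as NaN rather than silently vanishing.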

`calendar_dates` tabulates the exceptions to the service terms set out in `calendar.txt`. Since services only transition from local to express and back again in the MTA system, these always occur in pairs. One line tells you which service got switched off, and another tells you which service got switched on.
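To make the interplay between the two files concrete, here is a rough sketch of working out which services run on a given day. It relies on the GTFS convention that `calendar_dates.txt` carries `service_id`, `date` and `exception_type` columns, where `exception_type` 1 adds a service and 2 removes it. `active_services` is a hypothetical helper, the calendar rows are copied from the table above, and the pair of exception rows is invented for illustration rather than taken from the MTA feed.

```python
import datetime as dt

import pandas as pd

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def active_services(calendar: pd.DataFrame, calendar_dates: pd.DataFrame, day: dt.date) -> set:
    """Return the service_ids running on `day`: the regular calendar plus exceptions."""
    stamp = int(day.strftime("%Y%m%d"))  # read_csv parses YYYYMMDD dates as integers
    weekday_col = WEEKDAYS[day.weekday()]  # date.weekday(): Monday is 0
    in_range = (calendar["start_date"] <= stamp) & (stamp <= calendar["end_date"])
    regular = set(calendar.loc[in_range & (calendar[weekday_col] == 1), "service_id"])
    todays = calendar_dates[calendar_dates["date"] == stamp]
    added = set(todays.loc[todays["exception_type"] == 1, "service_id"])
    removed = set(todays.loc[todays["exception_type"] == 2, "service_id"])
    return (regular | added) - removed


calendar_demo = pd.DataFrame(
    [
        ["A20161106WKD", 1, 1, 1, 1, 1, 0, 0, 20161106, 20171231],
        ["A20161106SAT", 0, 0, 0, 0, 0, 1, 0, 20161106, 20171231],
        ["A20161106SUN", 0, 0, 0, 0, 0, 0, 1, 20161106, 20171231],
    ],
    columns=["service_id"] + WEEKDAYS + ["start_date", "end_date"],
)

# Hypothetical exception pair: one service switched off, another switched on.
exceptions_demo = pd.DataFrame({
    "service_id": ["A20161106WKD", "A20161106SUN"],
    "date": [20161124, 20161124],
    "exception_type": [2, 1],
})

ordinary_day = active_services(calendar_demo, exceptions_demo, dt.date(2016, 11, 23))
exception_day = active_services(calendar_demo, exceptions_demo, dt.date(2016, 11, 24))
```

Using sets keeps the combination rule readable: start from the regular schedule, add what was switched on, then drop what was switched off.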

`shapes` is a rather tedious CSV-ification of the route maps.

`stops` contains individual stops.

`stop_times` contains all of the planned stop times, for every trip in the dataset. These are scheduled times, of course, and will not match exactly what happens in reality.

OK, got it.

In [8]:
routes = pd.read_csv("../data/gtfs/routes.txt")
routes

Unnamed: 0,route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
0,1,MTA NYCT,1,Broadway - 7 Avenue Local,Trains operate between 242 St in the Bronx and...,1,http://web.mta.info/nyct/service/pdf/t1cur.pdf,EE352E,
1,2,MTA NYCT,2,7 Avenue Express,"Trains operate between Wakefield-241 St, Bronx...",1,http://web.mta.info/nyct/service/pdf/t2cur.pdf,EE352E,
2,3,MTA NYCT,3,7 Avenue Express,"Trains operate between 148 St, 7 Av, Manhattan...",1,http://web.mta.info/nyct/service/pdf/t3cur.pdf,EE352E,
3,4,MTA NYCT,4,Lexington Avenue Express,Trains operate daily between Woodlawn/Jerome A...,1,http://web.mta.info/nyct/service/pdf/t4cur.pdf,00933C,
4,5,MTA NYCT,5,Lexington Avenue Express,"Weekdays daytime, most trains operate between ...",1,http://web.mta.info/nyct/service/pdf/t5cur.pdf,00933C,
5,5X,MTA NYCT,5X,Lexington Avenue Express,"Weekdays daytime, most trains operate between ...",1,http://web.mta.info/nyct/service/pdf/t5cur.pdf,00933C,
6,6,MTA NYCT,6,Lexington Avenue Local,Local trains operate between Pelham Bay Park/B...,1,http://web.mta.info/nyct/service/pdf/t6cur.pdf,00933C,
7,6X,MTA NYCT,6X,Lexington Avenue Express,Express trains operate between Pelham Bay Park...,1,http://web.mta.info/nyct/service/pdf/t6cur.pdf,00A65C,
8,7,MTA NYCT,7,Flushing Local,"Trains operate between Main St-Flushing, Queen...",1,http://web.mta.info/nyct/service/pdf/t7cur.pdf,B933AD,
9,7X,MTA NYCT,7X,Flushing Express,"Trains operate between Main St-Flushing, Queen...",1,http://web.mta.info/nyct/service/pdf/t7cur.pdf,B933AD,
