
data visualization for MTA turnstile data NYC


MTA TURNSTILE DATA

This is an honest-to-goodness attempt to read the MTA turnstile data published weekly.

wget http://www.mta.info/developers/data/nyct/turnstile/turnstile_130803.txt

The goal is to specialize ajschumacher's "Calendar view" visualization of MTA usage to each train station (e.g. 125 St & Lexington Ave) and each line (e.g. the D train).
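A calendar view needs one number per station per day. Here is a minimal sketch of that aggregation step. It assumes you have already turned the raw counters into per-interval entry counts tagged with a station name. The tuple layout and the name `daily_totals` are hypothetical, and crediting each interval to the day it starts on is also an assumption.

```python
from collections import defaultdict
from datetime import datetime


def daily_totals(intervals):
    """Sum per-interval entries into {(station, date): entries}.

    `intervals` is an iterable of (station, interval_start, entries) tuples.
    This layout is a hypothetical intermediate format, not MTA's own.
    Each interval is credited to the calendar day on which it starts.
    """
    totals = defaultdict(int)
    for station, start, entries in intervals:
        totals[(station, start.date())] += entries
    return dict(totals)


# Illustrative input: three 4-hour intervals at one station.
example = [
    ("LEXINGTON AVE", datetime(2013, 7, 27, 8), 78),
    ("LEXINGTON AVE", datetime(2013, 7, 27, 12), 263),
    ("LEXINGTON AVE", datetime(2013, 7, 28, 0), 40),
]
totals = daily_totals(example)
```

Each (station, date) pair then becomes one cell of the calendar.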

Weekly MetroCard swipe data for the City is also available from 2010 onwards.

Use Cases: Motion & Location

The Heisenberg uncertainty principle says we can't know precisely both where a particle is and where it is going at the same time. Turnstile data has a similar blind spot: it tells us where people enter and exit, not which train they take. The New Yorker combined turnstile data with census data to create a visualization addressing income disparity.

Both data sets lend themselves to mixture models for making guesses such as:

  • Among the people who enter/exit Times Square, what fraction is taking the S train?
  • Among the people who enter/exit Grand Central Station, what fraction is taking the 6 train?
  • At 125 St & Lexington Ave, how many people are heading uptown versus downtown?
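To make the mixture-model idea concrete, here is a minimal sketch of expectation-maximization (EM) for a single unknown. That unknown is the fraction of a station's riders who belong to one group, say S-train riders. The sketch hypothetically assumes you already know each group's time-of-day profile, i.e. the share of that group's riders passing through in each time slot. The profiles and counts below are invented for illustration, not measured. The function name is also hypothetical.

```python
def estimate_mixing_weight(counts, profile_a, profile_b, n_iter=500):
    """Estimate the weight of group A in a two-group mixture by EM.

    counts:    observed riders per time slot at one station
    profile_a: assumed share of group A's riders in each slot (sums to 1)
    profile_b: assumed share of group B's riders in each slot (sums to 1)
    Only the mixing weight is estimated; the profiles stay fixed.
    """
    total = sum(counts)
    pi = 0.5  # initial guess for the fraction of riders in group A
    for _ in range(n_iter):
        # E-step: expected number of riders in each slot that belong to group A
        expected_a = 0.0
        for c, pa, pb in zip(counts, profile_a, profile_b):
            mix = pi * pa + (1 - pi) * pb
            if mix > 0:
                expected_a += c * pi * pa / mix
        # M-step: the new weight is group A's expected share of all riders
        pi = expected_a / total
    return pi


# Invented example: group A rides mostly early, group B mostly late.
shuttle = [0.7, 0.2, 0.1]
other = [0.1, 0.2, 0.7]
observed = [1000 * (0.3 * a + 0.7 * b) for a, b in zip(shuttle, other)]
pi_hat = estimate_mixing_weight(observed, shuttle, other)
```

Because `observed` was built with a weight of 0.3, the estimate should land very close to 0.3. With real data, the hard part is obtaining credible profiles for each group.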

Theoretically the MTA data offers an unprecedented level of granularity:

  • Every 4 hours
  • Every station - or even station entrance - in NYC
  • Weekdays, Saturday, Sunday
  • Every week since June 2010

How to Decipher the Turnstile Data

In order to read MTA's data set you need to understand lines like

R151,G009,STILLWELL AVE,DFNQ,BMT

and

A002,R051,02-00-00,
07-27-13,00:00:00,REGULAR,004209603,001443585,
07-27-13,04:00:00,REGULAR,004209643,001443593,
07-27-13,08:00:00,REGULAR,004209663,001443616,
07-27-13,12:00:00,REGULAR,004209741,001443687,
07-27-13,16:00:00,REGULAR,004210004,001443740,
07-27-13,20:00:00,REGULAR,004210276,001443777,
07-28-13,00:00:00,REGULAR,004210432,001443801,
07-28-13,04:00:00,REGULAR,004210472,001443805 

Example 1: Remote-Booth-Station

Hopefully we recognize the 3rd, 4th & 5th items in R151,G009,STILLWELL AVE,DFNQ,BMT. They say that

  • The station name is Stillwell Ave
  • The D,F,N,Q trains run through this station
  • The station belongs to the BMT division

We New Yorkers can point to this spot on a map! What about R151,G009? These are the Remote (R151) and the Booth (G009); "Stillwell Ave" was the Station.

These two pieces of MTA jargon are hard to decipher. They may roughly map to station entrances.
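A minimal sketch for splitting one Remote-Booth-Station row into named fields. The field names and the `StationRow` type are hypothetical labels for the five positions described above. Treating the line field as one character per train is an assumption that holds for the rows shown here.

```python
import csv
from typing import FrozenSet, NamedTuple


class StationRow(NamedTuple):
    remote: str              # e.g. R151
    booth: str               # e.g. G009
    station: str             # e.g. STILLWELL AVE
    lines: FrozenSet[str]    # one character per train, e.g. {"D", "F", "N", "Q"}
    division: str            # e.g. BMT


def parse_station_row(text):
    """Split one Remote-Booth-Station row into named fields."""
    # csv.reader copes with quoted station names that might contain commas
    remote, booth, station, lines, division = next(csv.reader([text]))
    return StationRow(remote, booth, station, frozenset(lines), division)


row = parse_station_row("R151,G009,STILLWELL AVE,DFNQ,BMT")
```

Storing the lines as a set makes rows comparable even when the same trains are listed in a different order.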

Example 2: Times Square

R032,R145,42 ST-TIMES SQ,1237ACENQRS,IRT
R032,A021,42 ST-TIMES SQ,1237ACENQRS,BMT
R032,R143,42 ST-TIMES SQ,ACENQRS1237,IRT
R032,R146,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R151,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R148,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R150,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R153,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R147,42 ST-TIMES SQ,1237ACENQRS,IRT

The 11 lines 1-2-3-7-A-C-E-N-Q-R-S pass through here (the R143 row lists the same lines in a different order). There are two "remotes", R032 and R033, with 4 and 5 "booths" respectively. This makes sense since Times Square is very large.
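The counts in that paragraph can be checked mechanically. The sketch below groups the Times Square rows by remote and takes the union of the trains listed. The function names are hypothetical, and the rows are copied from the table above.

```python
import csv
from collections import defaultdict

TIMES_SQ_ROWS = """\
R032,R145,42 ST-TIMES SQ,1237ACENQRS,IRT
R032,A021,42 ST-TIMES SQ,1237ACENQRS,BMT
R032,R143,42 ST-TIMES SQ,ACENQRS1237,IRT
R032,R146,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R151,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R148,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R150,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R153,42 ST-TIMES SQ,1237ACENQRS,IRT
R033,R147,42 ST-TIMES SQ,1237ACENQRS,IRT
"""


def booths_by_remote(rows):
    """Map each remote to the list of booths filed under it."""
    groups = defaultdict(list)
    for remote, booth, *_ in rows:
        groups[remote].append(booth)
    return dict(groups)


def lines_served(rows):
    """Union of single-character train names across rows (order-insensitive)."""
    served = set()
    for _remote, _booth, _station, lines, _division in rows:
        served.update(lines)
    return served


rows = list(csv.reader(TIMES_SQ_ROWS.splitlines()))
booths = booths_by_remote(rows)
trains = lines_served(rows)
```

Using a set for the trains is what makes "1237ACENQRS" and "ACENQRS1237" count as the same 11 lines.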

Example 3: 125st + Lexington Ave (4-5-6)

R034,R174,125 ST,1,IRT

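125 St, one of the busiest stations in the MTA system, gets only 1 remote and 1 booth. Note, however, that the line field of this row reads 1 rather than 456, so it may describe the 125 St stop on the 1 train rather than the 4-5-6 station at Lexington Ave.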

Example 4: Turnstile Data - A002

MTA's turnstile data set is comma-separated. The first 3 items identify the station and turnstile, followed by repeating groups of five items (date, time, status, entries, exits).

A typical line, here for the 59 St + Lexington Ave station starting July 27, 2013, has 3 + 5×8 = 43 items:

A002,R051,02-00-00,
07-27-13,00:00:00,REGULAR,004209603,001443585,
07-27-13,04:00:00,REGULAR,004209643,001443593,
07-27-13,08:00:00,REGULAR,004209663,001443616,
07-27-13,12:00:00,REGULAR,004209741,001443687,
07-27-13,16:00:00,REGULAR,004210004,001443740,
07-27-13,20:00:00,REGULAR,004210276,001443777,
07-28-13,00:00:00,REGULAR,004210432,001443801,
07-28-13,04:00:00,REGULAR,004210472,001443805
  • the first item of each group is the date
  • the second is the time stamp; readings are usually taken every 4 hours
  • the third is the status, here REGULAR
  • the fourth and fifth are cumulative counters: at midnight the entries read "004209603" and the exits read "001443585"
  • between 8am and noon on July 27, 741-663=78 passengers entered the station (and 687-616=71 exited) through this turnstile
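A minimal parsing sketch for such a line. It assumes the line arrives as one physical line, with the eight groups joined; the README wraps it here only for display. It also assumes dates are MM-DD-YY, as "07-27-13" suggests. The names `Reading`, `parse_turnstile_line` and `interval_counts` are hypothetical. Because the counters are cumulative, ridership in an interval is the difference between consecutive readings.

```python
from datetime import datetime
from typing import List, NamedTuple, Tuple


class Reading(NamedTuple):
    when: datetime
    status: str
    entries: int  # cumulative entry counter
    exits: int    # cumulative exit counter


def parse_turnstile_line(line: str) -> Tuple[Tuple[str, str, str], List[Reading]]:
    """Split one turnstile line into its 3-item key and its groups of five."""
    fields = [f.strip() for f in line.strip().split(",")]
    while fields and fields[-1] == "":  # tolerate a trailing comma
        fields.pop()
    if len(fields) < 3 or (len(fields) - 3) % 5 != 0:
        raise ValueError(f"expected 3 + 5*k fields, got {len(fields)}")
    key = (fields[0], fields[1], fields[2])
    readings = []
    for i in range(3, len(fields), 5):
        date, time, status, entries, exits = fields[i:i + 5]
        when = datetime.strptime(f"{date} {time}", "%m-%d-%y %H:%M:%S")
        readings.append(Reading(when, status, int(entries), int(exits)))
    return key, readings


def interval_counts(readings):
    """(start, end, entries, exits) between consecutive readings."""
    return [(a.when, b.when, b.entries - a.entries, b.exits - a.exits)
            for a, b in zip(readings, readings[1:])]


SAMPLE = (
    "A002,R051,02-00-00,"
    "07-27-13,00:00:00,REGULAR,004209603,001443585,"
    "07-27-13,04:00:00,REGULAR,004209643,001443593,"
    "07-27-13,08:00:00,REGULAR,004209663,001443616,"
    "07-27-13,12:00:00,REGULAR,004209741,001443687,"
    "07-27-13,16:00:00,REGULAR,004210004,001443740,"
    "07-27-13,20:00:00,REGULAR,004210276,001443777,"
    "07-28-13,00:00:00,REGULAR,004210432,001443801,"
    "07-28-13,04:00:00,REGULAR,004210472,001443805"
)
key, readings = parse_turnstile_line(SAMPLE)
```

The third interval returned by `interval_counts(readings)` is the 8am-to-noon window, giving 78 entries and 71 exits.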

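The three items A002,R051,02-00-00 identify a single turnstile in the entire MTA system. Note that the turnstile file lists the booth (A002) before the remote (R051), while Remote-Booth-Station.csv lists the remote first. Looking them up there, we find that booth A002 is one of 3 booths under remote R051, and that the station has a second remote, R050, with 3 booths of its own: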

R050,R244,59 ST,456NQR,IRT
R050,R244A,59 ST,456NQR,IRT
R050,A004,LEXINGTON AVE,456NQR,BMT
R051,R245,59 ST,456NQR,IRT
R051,R245A,59 ST,456NQR,IRT
R051,A002,LEXINGTON AVE,456NQR,BMT

This station has two names, LEXINGTON AVE and 59 ST. It is a very busy station, carrying passengers between Queens, the Upper East Side, Times Square, and Midtown East.
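A minimal sketch of joining a turnstile back to its station. The lookup is keyed by (remote, booth), so the turnstile's key has to be reordered first. The rows are copied from above. In practice you would read Remote-Booth-Station.csv from disk, and whether that file has a header row is not shown here, so check first. The function names are hypothetical.

```python
import csv

REMOTE_BOOTH_STATION = """\
R050,R244,59 ST,456NQR,IRT
R050,R244A,59 ST,456NQR,IRT
R050,A004,LEXINGTON AVE,456NQR,BMT
R051,R245,59 ST,456NQR,IRT
R051,R245A,59 ST,456NQR,IRT
R051,A002,LEXINGTON AVE,456NQR,BMT
"""


def build_lookup(csv_text):
    """Index Remote-Booth-Station rows by (remote, booth)."""
    lookup = {}
    for remote, booth, station, lines, division in csv.reader(csv_text.splitlines()):
        lookup[(remote, booth)] = (station, lines, division)
    return lookup


def station_for_turnstile(turnstile_key, lookup):
    """Resolve a (booth, remote, turnstile) key from the turnstile file.

    The turnstile file puts the booth first; the lookup is keyed remote first.
    Returns None if the pair is missing from the lookup.
    """
    booth, remote, _turnstile = turnstile_key
    return lookup.get((remote, booth))


lookup = build_lookup(REMOTE_BOOTH_STATION)
match = station_for_turnstile(("A002", "R051", "02-00-00"), lookup)
```

Returning None for unknown pairs, rather than raising, makes it easy to count turnstiles that the lookup file does not cover.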

Turnstiles and their Statuses

You are ready to read the MTA turnstile data. I suggest you stop reading and get your hands dirty; after 30 minutes to an hour, this section will make a lot more sense.

Turnstile IDs

By now, we have rough interpretations of Remote-Booth-Station:

  • Remote is a station entrance
  • Booth probably has to do with subway booths
  • Station is well...

Door Statuses

REGULAR is not the only status code, and turnstiles don't necessarily report every 4 hours. I've seen the statuses DOOR OPEN, RECOVR AUD and LOGON, and time stamps such as 08:40:29. These irregularities complicate any attempt at visualizing MTA's turnstile data.
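One way to cope is sketched below, under stated assumptions. First, keep only REGULAR readings. Because the counters are cumulative, skipping a reading simply merges two intervals. Second, treat a negative or implausibly large jump as a counter reset and drop that interval. Third, record each interval's length so off-schedule time stamps can be normalized. The cap `MAX_PER_INTERVAL` and the function name are hypothetical. Whether DOOR OPEN or RECOVR AUD readings carry usable counts is not established here. The sample readings are invented.

```python
from datetime import datetime
from typing import NamedTuple


class Reading(NamedTuple):
    when: datetime
    status: str
    entries: int  # cumulative entry counter
    exits: int    # cumulative exit counter


# Hypothetical ceiling on plausible riders per turnstile per interval;
# larger jumps are treated as counter resets. Tune this for your data.
MAX_PER_INTERVAL = 5000


def clean_intervals(readings, max_count=MAX_PER_INTERVAL):
    """Keep REGULAR readings and return plausible per-interval counts.

    Each result is (start, end, entries, exits, hours). Intervals with a
    negative or implausibly large change are dropped; `hours` lets you
    normalize intervals that are not exactly 4 hours long.
    """
    regular = sorted((r for r in readings if r.status == "REGULAR"),
                     key=lambda r: r.when)
    intervals = []
    for a, b in zip(regular, regular[1:]):
        d_in, d_out = b.entries - a.entries, b.exits - a.exits
        if 0 <= d_in <= max_count and 0 <= d_out <= max_count:
            hours = (b.when - a.when).total_seconds() / 3600
            intervals.append((a.when, b.when, d_in, d_out, hours))
    return intervals


# Invented readings: one off-schedule DOOR OPEN and one counter reset.
sample = [
    Reading(datetime(2013, 7, 27, 0, 0), "REGULAR", 100, 50),
    Reading(datetime(2013, 7, 27, 4, 0), "REGULAR", 140, 60),
    Reading(datetime(2013, 7, 27, 8, 40, 29), "DOOR OPEN", 150, 62),
    Reading(datetime(2013, 7, 27, 12, 0), "REGULAR", 200, 80),
    Reading(datetime(2013, 7, 27, 16, 0), "REGULAR", 10, 5),  # counter reset
    Reading(datetime(2013, 7, 27, 20, 0), "REGULAR", 60, 30),
]
cleaned = clean_intervals(sample)
```

Dropping an interval loses its riders, so the cleaned totals are lower bounds. Interpolating across a reset is an alternative if that matters for your visualization.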

Progress Report
