# Exploring the TTC Subway Real-time API
The API we're pulling data from powers the TTC's [Next Train Arrivals](http://www.ttc.ca/Subway/next_train_arrivals.jsp) page. A bit of exploration in your browser's developer console shows that the page refreshes its data by sending a request to http://www.ttc.ca/Subway/loadNtas.action

In [2]:
import requests #to handle http requests to the API
from psycopg2 import connect

In [4]:
stationid = 3 
#We'll find out the full range of possible stations further down.
lineid = 1 
#[1,2,4]

In [4]:
# The url for the request
base_url = "http://www.ttc.ca/Subway/loadNtas.action"

In [6]:
# Our query parameters for this API request
payload = {#"subwayLine":lineid,
           "stationId":stationid,
           "searchCriteria":''} # The value in the search box;
           # it has to be included, otherwise the query fails
           #"_":request_epoch} # Great job naming variables...
# subwayLine and _ are redundant parameters.
# We thought we could query historical data using the "_" parameter,
# but it seems not.
r = requests.get(base_url, params = payload)
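
To see what this request looks like on the wire, the query string can be reproduced with the standard library. This is only a sketch: `requests` does its own encoding internally, and `build_url` is a hypothetical helper that is not part of the notebook.

```python
from urllib.parse import urlencode

base_url = "http://www.ttc.ca/Subway/loadNtas.action"

def build_url(station_id, search_criteria=""):
    """Return the full URL for one station query (hypothetical helper)."""
    # searchCriteria is sent even when empty, since the query fails without it
    query = urlencode({"stationId": station_id, "searchCriteria": search_criteria})
    return base_url + "?" + query

print(build_url(3))
# http://www.ttc.ca/Subway/loadNtas.action?stationId=3&searchCriteria=
```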

We've now received our first response from the API, and it is stored in the `requests` response object `r`. From previous examination of the API we know that the response is in JSON format, so the code below parses it and displays the result so we can have a look at the fields.

In [7]:
r.json()

{'allStations': 'success',
 'data': None,
 'defaultDirection': [['YKD1', 'Southbound<br/> To Union', 'YUS'],
  ['YKD2', 'Northbound<br/> To Downsview', 'YUS']],
 'limit': 3,
 'ntasData': [{'createDate': '2017-01-31T20:28:21',
   'id': 12175559601,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'YKD1',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 0.29609142857142856,
   'timeString': '00.30',
   'trainDirection': 'North',
   'trainId': 145,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-01-31T20:28:21',
   'id': 12175559602,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'YKD1',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 4.534429241904761,
   'timeString': '04.53',
   'trainDirection': 'North',
   'trainId': 122,
   'trainMessage': 'Delayed'},
  {'createDate': '2017-01-31T20:28:21',
   'id': 12175559603,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'Y

In [11]:
data = r.json()

In [12]:
data['ntasData'][0]['createDate']

'2017-01-31T20:28:21'
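
The nested response is easier to scan once each prediction is flattened into one row. The sketch below does that for two records copied from the output above. `clean_direction` and `summarize` are hypothetical helpers, and reading `timeInt` as minutes until arrival is an assumption based on how it lines up with `timeString`.

```python
import re

# Two records copied from the response above, trimmed to the fields used here
sample = {
    "ntasData": [
        {"stationId": "YKD1", "stationDirectionText": "Southbound<br/> To Union",
         "timeInt": 0.29609142857142856, "trainId": 145, "trainMessage": "Arriving"},
        {"stationId": "YKD1", "stationDirectionText": "Southbound<br/> To Union",
         "timeInt": 4.534429241904761, "trainId": 122, "trainMessage": "Delayed"},
    ]
}

def clean_direction(text):
    """Replace a <br/> or </br> tag and any whitespace after it with one space."""
    return re.sub(r"</?br\s*/?>\s*", " ", text)

def summarize(response):
    """Return (platform, direction, train id, timeInt rounded, message) per prediction."""
    return [
        (rec["stationId"], clean_direction(rec["stationDirectionText"]),
         rec["trainId"], round(rec["timeInt"], 2), rec["trainMessage"])
        for rec in response["ntasData"]
    ]

for row in summarize(sample):
    print(row)
```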

In [5]:
# Testing whether we have to be explicit about line numbers for stations with multiple lines
payload = {#"subwayLine":lineid,
           "stationId":10, #St. George, Line 1
           "searchCriteria":''} 
r = requests.get(base_url, params = payload)
r.json()

{'allStations': 'success',
 'data': None,
 'defaultDirection': [['SGU1', 'Southbound<br/> To Union', 'YUS'],
  ['SGU2', 'Northbound<br/> To Downsview', 'YUS'],
  ['SGL1', 'Eastbound</br> To Kennedy', 'BD'],
  ['SGL2', 'Westbound<br/> To Kipling', 'BD']],
 'limit': 3,
 'ntasData': [{'createDate': '2017-02-05T16:25:30',
   'id': 12265037214,
   'stationDirectionText': 'Northbound<br/> To Downsview',
   'stationId': 'SGU2',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 0.9045222222222222,
   'timeString': '00.90',
   'trainDirection': 'South',
   'trainId': 124,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-05T16:25:30',
   'id': 12265037215,
   'stationDirectionText': 'Northbound<br/> To Downsview',
   'stationId': 'SGU2',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 4.51758,
   'timeString': '04.52',
   'trainDirection': 'South',
   'trainId': 125,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-05T16:25:30',
   

In [8]:
# Testing whether we have to be explicit about line numbers for stations with multiple lines
payload = {#"subwayLine":lineid,
           "stationId":48, #St. George, Line 2
           "searchCriteria":''} 
r = requests.get(base_url, params = payload)
r.json()

{'allStations': 'success',
 'data': None,
 'defaultDirection': [['SGL1', 'Eastbound</br> To Kennedy', 'BD'],
  ['SGL2', 'Westbound<br/> To Kipling', 'BD'],
  ['SGU1', 'Southbound<br/> To Union', 'YUS'],
  ['SGU2', 'Northbound<br/> To Downsview', 'YUS']],
 'limit': 3,
 'ntasData': [{'createDate': '2017-02-05T16:30:13',
   'id': 12265113796,
   'stationDirectionText': 'Eastbound</br> To Kennedy',
   'stationId': 'SGL1',
   'subwayLine': 'BD',
   'systemMessageType': 'Normal',
   'timeInt': 1.5897975232198145,
   'timeString': '01.59',
   'trainDirection': 'East',
   'trainId': 203,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-05T16:30:13',
   'id': 12265113797,
   'stationDirectionText': 'Eastbound</br> To Kennedy',
   'stationId': 'SGL1',
   'subwayLine': 'BD',
   'systemMessageType': 'Normal',
   'timeInt': 5.93598838885449,
   'timeString': '05.94',
   'trainDirection': 'East',
   'trainId': 204,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-05T16:30:13',
   '
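
Both St. George responses list the same four platform codes in `defaultDirection`, only in a different order, while the visible `ntasData` records belong to the line of the station id that was queried. A small sketch for grouping those platform codes by line, using the list copied from the first St. George response (`platforms_by_line` is a hypothetical helper):

```python
from collections import defaultdict

# defaultDirection as returned for stationId 10 (St. George, Line 1) above
default_direction = [
    ["SGU1", "Southbound<br/> To Union", "YUS"],
    ["SGU2", "Northbound<br/> To Downsview", "YUS"],
    ["SGL1", "Eastbound</br> To Kennedy", "BD"],
    ["SGL2", "Westbound<br/> To Kipling", "BD"],
]

def platforms_by_line(directions):
    """Group platform codes by the subway line code in the third column."""
    grouped = defaultdict(list)
    for platform, _text, line in directions:
        grouped[line].append(platform)
    return dict(grouped)

print(platforms_by_line(default_direction))
# {'YUS': ['SGU1', 'SGU2'], 'BD': ['SGL1', 'SGL2']}
```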

In [10]:
data = r.json()
data['ntasData'][0]['createDate'].replace('T',' ')

'2017-02-05 16:30:13'
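
As an alternative to the string replacement, the timestamp can be parsed into a `datetime` object, which psycopg2 can pass to the database as a timestamp. This is a sketch of that option, not what the notebook does; it needs Python 3.7 or later.

```python
from datetime import datetime

raw = "2017-02-05T16:30:13"
# fromisoformat accepts the 'T' separator directly
parsed = datetime.fromisoformat(raw)
print(parsed)  # 2017-02-05 16:30:13
```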

## Building a scraping script
By opening up the inspector tools in the browser, we can see the full list of station ids by hovering over the `Select a subway station` dropdown list. Stations increase in number from West to East.
![](img/line1_stations.png)  
For Line 1 they are numbered 1-32  
For Line 2 they are numbered 33-63  
For Line 4 they are numbered 64-68  
Thus we can construct a dictionary that will represent every possible API call:

In [24]:
lines = {1: range(1, 33), # range() excludes its end value, so it must be one past the last station id
         2: range(33, 64),
         4: range(64, 69)}
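
A quick way to check that the dictionary covers every station is to build it from the inclusive ranges listed above and count the resulting calls. The names `station_ranges`, `lines_demo` and `calls` are illustrative only.

```python
# Inclusive station id ranges as listed above
station_ranges = {1: (1, 32), 2: (33, 63), 4: (64, 68)}

# range() excludes its end value, hence the +1
lines_demo = {line: range(first, last + 1)
              for line, (first, last) in station_ranges.items()}

# Every (line_id, station_id) pair the scraper would request
calls = [(line_id, station_id)
         for line_id, stations in lines_demo.items()
         for station_id in stations]

print(len(calls))           # 68
print(calls[0], calls[-1])  # (1, 1) (4, 68)
```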

In [27]:
def get_API_response(station_id):
    payload = {"stationId":station_id,
               "searchCriteria":''}
    r = requests.get(base_url, params = payload) 
    return r.json()

def insert_request_info(con, data, line_id, station_id):
    request_row = {}
    request_row['data_'] = data['data']
    request_row['stationid'] = station_id
    request_row['lineid'] = line_id
    request_row['all_stations'] = data['allStations']
    request_row['create_date'] = data['ntasData'][0]['createDate'].replace( 'T', ' ')
    cursor = con.cursor()
    cursor.execute("INSERT INTO public.requests(data_, stationid, lineid, all_stations, create_date) "
                   "VALUES(%(data_)s, %(stationid)s, %(lineid)s, %(all_stations)s, %(create_date)s) "
                   "RETURNING requestid", request_row)
    request_id = cursor.fetchone()[0]
    con.commit()
    return request_id

def insert_ntas_data(con, ntas_data, request_id):
    cursor = con.cursor()
    sql = """INSERT INTO public.ntas_data(
            requestid, id, station_char, subwayline, system_message_type, 
            timint, traindirection, trainid, train_message)
            VALUES (%(requestid)s, %(id)s, %(station_char)s, %(subwayline)s, %(system_message_type)s, 
            %(timint)s, %(traindirection)s, %(trainid)s, %(train_message)s);
          """
    for record in ntas_data:
        record_row ={}
        record_row['requestid'] = request_id
        record_row['id'] = record['id']
        record_row['station_char'] = record['stationId']
        record_row['subwayline'] = record['subwayLine']
        record_row['system_message_type'] = record['systemMessageType']
        record_row['timint'] = record['timeInt']
        record_row['traindirection'] = record['trainDirection']
        record_row['trainid'] = record['trainId']
        record_row['train_message'] = record['trainMessage']
        cursor.execute(sql, record_row)
    con.commit()

def query_all_stations(con):
    data = {}
    for line_id, stations in lines.items():
        for station_id in stations:
            data = get_API_response(station_id)
            request_id = insert_request_info(con, data, line_id, station_id)
            insert_ntas_data(con, data['ntasData'], request_id)
    return data
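
The field-by-field copying in `insert_ntas_data` could also be driven by a single mapping from database column to API key. The sketch below shows that idea in isolation, without a database connection. `COLUMN_TO_KEY` and `record_to_row` are hypothetical names, and the sample record is copied from the first response above.

```python
# Database column -> key in the API record, following insert_ntas_data above
COLUMN_TO_KEY = {
    "id": "id",
    "station_char": "stationId",
    "subwayline": "subwayLine",
    "system_message_type": "systemMessageType",
    "timint": "timeInt",
    "traindirection": "trainDirection",
    "trainid": "trainId",
    "train_message": "trainMessage",
}

def record_to_row(record, request_id):
    """Build the parameter dict for one ntas_data row (hypothetical helper)."""
    row = {column: record[key] for column, key in COLUMN_TO_KEY.items()}
    row["requestid"] = request_id
    return row

sample_record = {
    "createDate": "2017-01-31T20:28:21", "id": 12175559601,
    "stationDirectionText": "Southbound<br/> To Union", "stationId": "YKD1",
    "subwayLine": "YUS", "systemMessageType": "Normal",
    "timeInt": 0.29609142857142856, "timeString": "00.30",
    "trainDirection": "North", "trainId": 145, "trainMessage": "Arriving",
}

print(record_to_row(sample_record, 1))
```

A list of such rows could then be passed to `cursor.executemany(sql, rows)` in one call instead of looping over `cursor.execute`.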

## Database schema 
Based on the response above, I've written a basic schema of two tables to store the API responses. It's in [`create_tables.sql`](create_tables.sql).
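
The contents of `create_tables.sql` are not shown here. As a rough guess at the shape of the two tables, the sketch below infers the columns from the `INSERT` statements above and tries them out in an in-memory SQLite database instead of PostgreSQL. The column types and the foreign key are assumptions, and the real file may differ.

```python
import sqlite3

# Hypothetical schema inferred from the INSERT statements; types are guesses
SCHEMA = """
CREATE TABLE requests (
    requestid    INTEGER PRIMARY KEY,  -- would be a serial column in PostgreSQL
    data_        TEXT,
    stationid    INTEGER,
    lineid       INTEGER,
    all_stations TEXT,
    create_date  TEXT
);
CREATE TABLE ntas_data (
    requestid           INTEGER REFERENCES requests (requestid),
    id                  INTEGER,  -- values like 12175559601 need bigint in PostgreSQL
    station_char        TEXT,
    subwayline          TEXT,
    system_message_type TEXT,
    timint              REAL,
    traindirection      TEXT,
    trainid             INTEGER,
    train_message       TEXT
);
"""

con_demo = sqlite3.connect(":memory:")
con_demo.executescript(SCHEMA)

# One request row, then one prediction row that points back at it
cur = con_demo.execute(
    "INSERT INTO requests (data_, stationid, lineid, all_stations, create_date) "
    "VALUES (?, ?, ?, ?, ?)",
    (None, 3, 1, "success", "2017-01-31 20:28:21"),
)
request_id_demo = cur.lastrowid
con_demo.execute(
    "INSERT INTO ntas_data VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (request_id_demo, 12175559601, "YKD1", "YUS", "Normal",
     0.29609142857142856, "North", 145, "Arriving"),
)
con_demo.commit()

print(con_demo.execute("SELECT requestid, trainid FROM ntas_data").fetchall())
```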

In [10]:
dbsettings = {'database':'ttc',
              'user':'rad'}
#              'host':'localhost'}
con = connect(database = dbsettings['database'],
              user = dbsettings['user'])
              #host = dbsettings['host'])


In [19]:
#data = get_API_response(3)
request_id = insert_request_info(con, data, 1, 3)

In [23]:
insert_ntas_data(con, data['ntasData'], request_id)

In [28]:
query_all_stations(con)

IndexError: list index out of range
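
The traceback is not shown, but the only list indexing in the functions above is `data['ntasData'][0]` in `insert_request_info`, so the most likely cause is a station whose response contains an empty `ntasData` list. A minimal sketch of a guard for that case, assuming that explanation is right (`first_create_date` is a hypothetical helper):

```python
def first_create_date(data):
    """Return the first record's createDate with 'T' replaced by a space,
    or None when the response contains no predictions."""
    ntas_data = data.get("ntasData") or []
    if not ntas_data:
        return None
    return ntas_data[0]["createDate"].replace("T", " ")

print(first_create_date({"ntasData": []}))
# None
print(first_create_date({"ntasData": [{"createDate": "2017-02-02T23:22:42"}]}))
# 2017-02-02 23:22:42
```

Whether a request with no predictions should be stored with a null `create_date` or skipped entirely depends on the schema and on how the data will be used.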

In [9]:
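from datetime import datetime, timedelta
import time

poll_frequency = timedelta(minutes = 1)
# Assumption: stop after one hour; the original stop condition was left unfinished
end_time = datetime.now() + timedelta(hours = 1)
while datetime.now() < end_time:
    query_all_stations(con) # poll every station, then wait for the next cycle
    time.sleep(poll_frequency.total_seconds())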

In [18]:
data

{'allStations': 'success',
 'data': None,
 'defaultDirection': [['YKD1', 'Southbound<br/> To Union', 'YUS'],
  ['YKD2', 'Northbound<br/> To Downsview', 'YUS']],
 'limit': 3,
 'ntasData': [{'createDate': '2017-02-02T23:22:42',
   'id': 12215713332,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'YKD1',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 1.3716,
   'timeString': '01.37',
   'trainDirection': 'North',
   'trainId': 111,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-02T23:22:42',
   'id': 12215713333,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'YKD1',
   'subwayLine': 'YUS',
   'systemMessageType': 'Normal',
   'timeInt': 6.069130281481481,
   'timeString': '06.07',
   'trainDirection': 'North',
   'trainId': 113,
   'trainMessage': 'Arriving'},
  {'createDate': '2017-02-02T23:22:42',
   'id': 12215713334,
   'stationDirectionText': 'Southbound<br/> To Union',
   'stationId': 'YKD1',
   'su

In [15]:
test_string.replace('T','F')

'Fest'