# Bus data fetching pipeline

## Open Issues
- proper way of querying SBS routes through API
- change output csv file naming convention: Line-Date-Time-Duration-DayOfWeek
- move this outside the loop in `plot_tsd`:

```python
# horizontal gray guide line at each stop's distance along the route
for i in df['CallDistanceAlongRoute'].unique():
    ax.plot([df['RecordedAtTime'].min(), df['RecordedAtTime'].max()], [i, i], color='gray', alpha=0.1)
```
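The proposed naming convention in the open issues (Line-Date-Time-Duration-DayOfWeek) could be produced by a small helper like the one below. This is a sketch: the name `csv_filename`, the `yymmdd`/`HHMMSS` date and time formats (borrowed from the current `bus_data` code), and the three-letter English weekday abbreviation are assumptions.

```python
from datetime import datetime

# Fixed English abbreviations, so the file name does not depend on the system
# locale (calendar.day_name and strftime('%a') both follow the locale).
WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def csv_filename(route, duration, ts=None):
    """Return an output name in Line-Date-Time-Duration-DayOfWeek order.

    route:    route reference, e.g. "B54"
    duration: minutes of data fetched
    ts:       start time of the fetch; defaults to now
    """
    ts = ts or datetime.now()
    return "%s-%s-%s-%s.csv" % (
        route,
        ts.strftime("%y%m%d-%H%M%S"),  # Date-Time
        duration,
        WEEKDAYS[ts.weekday()],  # weekday() is 0 for Monday
    )


print(csv_filename("B54", 5, datetime(2018, 4, 3, 17, 3, 54)))  # B54-180403-170354-5-Tue.csv
```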

## Objectives
- create a function for fetching the data for desired period (included in fetchbus.py)

## MTA Bus Data
[Bustime](http://bustime.mta.info/wiki/Developers/Index)

- Version 2 is used for this development

## OneBusAway API
The root of the OBA API in the MTA Bus Time deployment is http://bustime.mta.info/api/where/.

- To get the list of agencies covered by MTA Bus Time, with their metadata, use: http://bustime.mta.info/api/where/agencies-with-coverage.xml?key=YOUR_KEY_HERE
- To get the list of MTA NYCT and MTABC routes covered by MTA Bus Time, with their metadata, use: http://bustime.mta.info/api/where/routes-for-agency/MTA%20NYCT.xml?key=YOUR_KEY_HERE (use `MTABC` in place of `MTA%20NYCT` for MTABC routes)
- For information on one specific stop served by MTA Bus Time, use: http://bustime.mta.info/api/where/stop/MTA_STOP-ID.xml?key=YOUR_KEY_HERE
- For information on the stops that serve a route, use: http://bustime.mta.info/api/where/stops-for-route/MTA%20NYCT_M1.json?key=YOUR_KEY_HERE&includePolylines=false&version=2
- For information on stops near a location, use: http://bustime.mta.info/api/where/stops-for-location.json?lat=40.748433&lon=-73.985656&latSpan=0.005&lonSpan=0.005&key=YOUR_KEY_HERE
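The URLs above share one pattern: root, method, optional resource ID, format extension, then query parameters. A small builder, sketched below for Python 3 (the name `oba_url` is hypothetical), keeps the percent-encoding of IDs such as `MTA NYCT_M1` consistent instead of hand-writing `%20`.

```python
from urllib.parse import quote, urlencode

OBA_ROOT = "http://bustime.mta.info/api/where/"


def oba_url(method, key, resource_id=None, fmt="json", **params):
    """Build a OneBusAway request URL under the MTA Bus Time root.

    method:      API method, e.g. "stops-for-route" or "agencies-with-coverage"
    resource_id: ID appended to the path, e.g. "MTA NYCT_M1"; spaces become %20
    fmt:         response format extension, "json" or "xml"
    params:      extra query parameters, appended after the key in call order
    """
    path = method if resource_id is None else "%s/%s" % (method, quote(resource_id))
    query = urlencode(dict(key=key, **params))
    return "%s%s.%s?%s" % (OBA_ROOT, path, fmt, query)


print(oba_url("stops-for-route", "YOUR_KEY_HERE", "MTA NYCT_M1",
              includePolylines="false", version=2))
```

With the arguments shown, the printed URL matches the stops-for-route example in the list above.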

## SIRI API
http://bustime.mta.info/api/siri/vehicle-monitoring.json

http://api.prod.obanyc.com/api/siri/vehicle-monitoring.json

http://datamine.mta.info/mta_esi.php

Please note that calls made without either a VehicleRef or a LineRef place a heavy load on the system, so use them sparingly. Developers found making repeated calls for all vehicles in the system (e.g., at intervals of less than 30 seconds) may have their keys revoked.
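One way to respect the 30-second guideline is to measure the time since the previous request, rather than sleeping a fixed 30 seconds after each one (as `bus_data` below does), so that request and processing time count toward the interval. The `Throttle` class is a hypothetical helper, not part of any MTA library; its clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


class Throttle(object):
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=30.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # clock reading at the previous request

    def wait(self):
        """Sleep until min_interval has passed since the previous call; return seconds slept."""
        now = self.clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
                now = self.clock()
        self._last = now
        return slept


# Usage (not executed here): call wait() right before each request.
# throttle = Throttle(30)
# while polling:
#     throttle.wait()
#     response = urllib.urlopen(url)
```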

- v **key** - your MTA Bus Time developer API key (required).

- v **version** - which version of the SIRI API to use (1 or 2). Defaults to 1, but 2 is preferable.

- **OperatorRef** - the GTFS agency ID to be monitored (optional). Currently MTA.

- **VehicleRef** - the ID of the vehicle to be monitored (optional). This is the 4-digit number painted on the side of the bus, for example 7560. If omitted, the response includes all buses.

- v **LineRef** - a filter by 'fully qualified' route name, GTFS agency ID + route ID (optional).

- **DirectionRef** - a filter by GTFS direction ID (optional). Either 0 or 1.

- v **VehicleMonitoringDetailLevel** - Level of detail present in response. In order of verbosity:
  - minimum - only available in version 2. Designed for front-end use.
  - basic - only available in version 2. Designed for system-to-system interchange when GTFS is loaded.
  - normal - default.
  - calls - most verbose; the response also includes the stops ("calls" in SIRI-speak) each vehicle is going to make after it serves the selected stop.

- **MaximumNumberOfCallsOnwards** - a limit on the number of OnwardCall elements for each vehicle when VehicleMonitoringDetailLevel=calls.

- **MaximumStopVisits** - an upper bound on the number of buses to return in the results.

- **MinimumStopVisitsPerLine** - a lower bound on the number of buses to return in the results per line/route (assuming that many are available)
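The parameters above map directly onto query-string fields. The builder below is a Python 3 sketch; the function and argument names, and its refusal to build an unfiltered request unless explicitly asked, are assumptions rather than MTA rules. It checks the constraints stated above: version 1 or 2, the `minimum` and `basic` levels only in version 2, DirectionRef 0 or 1, and MaximumNumberOfCallsOnwards sent only with `calls`.

```python
from urllib.parse import quote, urlencode

SIRI_VM_URL = "http://bustime.mta.info/api/siri/vehicle-monitoring.json"
DETAIL_LEVELS = ("minimum", "basic", "normal", "calls")  # least to most verbose
V2_ONLY_LEVELS = ("minimum", "basic")


def vehicle_monitoring_url(key, line_ref=None, vehicle_ref=None, direction_ref=None,
                           version=2, detail="normal", max_onward_calls=None,
                           allow_unfiltered=False):
    """Build a SIRI VehicleMonitoring request URL from the parameters listed above."""
    if version not in (1, 2):
        raise ValueError("version must be 1 or 2")
    if detail not in DETAIL_LEVELS:
        raise ValueError("unknown VehicleMonitoringDetailLevel: %r" % detail)
    if detail in V2_ONLY_LEVELS and version != 2:
        raise ValueError("%r is only available in version 2" % detail)
    if direction_ref not in (None, 0, 1):
        raise ValueError("DirectionRef must be 0 or 1")
    # Guard chosen for this sketch: unfiltered calls load the system heavily.
    if line_ref is None and vehicle_ref is None and not allow_unfiltered:
        raise ValueError("pass line_ref or vehicle_ref, or set allow_unfiltered=True")

    params = [("key", key), ("version", version), ("VehicleMonitoringDetailLevel", detail)]
    optional = [("LineRef", line_ref), ("VehicleRef", vehicle_ref),
                ("DirectionRef", direction_ref)]
    if detail == "calls":
        optional.append(("MaximumNumberOfCallsOnwards", max_onward_calls))
    params += [(name, value) for name, value in optional if value is not None]
    # quote (instead of the default quote_plus) encodes the space in "MTA NYCT_B54" as %20
    return SIRI_VM_URL + "?" + urlencode(params, quote_via=quote)


print(vehicle_monitoring_url("YOUR_KEY_HERE", line_ref="MTA NYCT_B54", version=1, detail="calls"))
```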

## Granularity of data (specified in "VehicleMonitoringDetailLevel")
|Element|Calls|Normal|Basic|Minimum|
|:------|:---:|:----:|:---:|:-----:|
|'LineRef'                |o|o|o| |
|'DirectionRef'           |o|o|o| |
|'FramedVehicleJourneyRef'|o|o|o| |
|'JourneyPatternRef'      |o|o| | |
|'PublishedLineName'      |o|o|o|o|
|'OperatorRef'            |o|o| | |
|'OriginRef'              |o|o| | |
|'DestinationRef'         |o|o|o| |
|'DestinationName'        |o|o|o|o|
|'SituationRef'           |o|o|o|o|
|'Monitored'              |o|o|o|o|
|'VehicleLocation'        |o|o|o|o|
|'Bearing'                |o|o|o|o|
|'ProgressRate'           |o|o|o| |
|'BlockRef'               |o|o| | |
|'VehicleRef'             |o|o|o|o|
|'MonitoredCall'          |o|o|o|o|
|'OnwardCalls'            |o| | | |
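Since each level in the table contains every element of the less verbose levels, the table can be encoded as nested sets and used to pick the cheapest level that still returns the fields an analysis needs. This is a sketch with hypothetical names; remember that `minimum` and `basic` are only available in version 2.

```python
# Elements available at each VehicleMonitoringDetailLevel, transcribed from the table above.
MINIMUM = {"PublishedLineName", "DestinationName", "SituationRef", "Monitored",
           "VehicleLocation", "Bearing", "VehicleRef", "MonitoredCall"}
BASIC = MINIMUM | {"LineRef", "DirectionRef", "FramedVehicleJourneyRef",
                   "DestinationRef", "ProgressRate"}
NORMAL = BASIC | {"JourneyPatternRef", "OperatorRef", "OriginRef", "BlockRef"}
CALLS = NORMAL | {"OnwardCalls"}

# Ordered from least to most verbose
LEVELS = (("minimum", MINIMUM), ("basic", BASIC), ("normal", NORMAL), ("calls", CALLS))


def minimal_detail_level(fields):
    """Return the least verbose level whose response contains every element in fields."""
    wanted = set(fields)
    for level, available in LEVELS:
        if wanted <= available:
            return level
    raise KeyError("not in the table: %s" % ", ".join(sorted(wanted - CALLS)))


print(minimal_detail_level(["VehicleLocation", "BlockRef"]))  # normal
```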

In [1]:
# initialize
from __future__ import print_function, division
__author__ = 'Yuwen Chang (ywnch)'

# import packages
import os
import sys
import json

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import time
import calendar
import collections
from datetime import datetime
from collections import defaultdict

try:
    import urllib2 as urllib
except ImportError:
    import urllib.request as urllib

In [81]:
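# function for dictionary flattening
# code by Imran adopted and modified from
# https://stackoverflow.com/questions/6027558/flatten-nested-python-dictionaries-compressing-keys
try:
    from collections.abc import MutableMapping  # Python 3.3+ (collections.MutableMapping was removed in 3.10)
except ImportError:
    from collections import MutableMapping  # Python 2

def flatten(d):
    """
    Flatten the data of a nested dictionary.

    Unlike the original recipe, nested keys are NOT prefixed with their
    parent keys (e.g., VehicleLocation > Latitude stays 'Latitude'), so a
    key that appears at several depths keeps only its last value.
    Lists (e.g., SituationRef) are left as-is.

    PARAMETERS
    ----------
    d: dictionary
        a nested dictionary

    RETURNS
    -------
    dict(items): dictionary
        a dictionary with all items unpacked from their nests
    """
    items = []
    for k, v in d.items():
        if isinstance(v, MutableMapping):
            # recurse into nested dictionaries and lift their items
            items.extend(flatten(v).items())
        else:
            items.append((k, v))
    return dict(items)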

# function for fetching bus data
def bus_data(apikey, route, duration=5):
    """
    Fetch MTA real-time bus location data for a specified route
    over a given duration.

    PARAMETERS
    ----------
    apikey: string
        API key for MTA data
    route: string
        route reference (e.g., B54)
    duration: integer
        minutes of data to fetch (polled at 30-second intervals)

    RETURNS
    -------
    df: pd.DataFrame
        a dataframe of all SIRI variables for real-time bus trajectories
    filename.csv: csv
        a csv file containing the same data, saved in the local folder
    """

    # name the output csv file: Route-DayOfWeek-yymmdd-HHMMSS-Duration.csv
    ts = datetime.now()
    dow = calendar.day_name[ts.weekday()][:3]
    filename = '%s-%s-%s-%s.csv' % (route, dow, ts.strftime('%y%m%d-%H%M%S'), duration)

    # set up parameters
    t_elapsed = 0  # timer (seconds)
    duration = int(duration) * 60  # minutes to seconds
    url = ("http://bustime.mta.info/api/siri/vehicle-monitoring.json"
           "?key=%s&VehicleMonitoringDetailLevel=calls&LineRef=%s" % (apikey, route))
    rows = []  # one flattened record per vehicle per poll

    # main block for fetching data
    while True:
        # fetch data through MTA API
        response = urllib.urlopen(url)
        data = json.loads(response.read().decode("utf-8"))
        delivery = data['Siri']['ServiceDelivery']['VehicleMonitoringDelivery'][0]

        # check if bus route exists; otherwise the API returns an ErrorCondition
        try:
            activity = delivery['VehicleActivity']
        except KeyError:
            raise ValueError(delivery['ErrorCondition']['Description'])

        # print info of the current query request
        print("\nTime Elapsed: " + str(t_elapsed / 60) + " min(s)")
        print("Bus Line: " + route)
        print("Number of Active Buses: " + str(len(activity)))

        # parse the data of each active vehicle
        for i, v in enumerate(activity):
            record = flatten(v['MonitoredVehicleJourney'])
            record['RecordedAtTime'] = v['RecordedAtTime']
            # SituationRef is present only when a service alert is active
            situations = record.pop('SituationRef', None) or [{}]
            record['SituationSimpleRef'] = situations[0].get('SituationSimpleRef')
            # drop the list of onward calls (absent when a vehicle has none)
            record.pop('OnwardCall', None)

            # print info of the vehicle
            print("Bus %s (#%s) is at latitude %s and longitude %s"
                  % (i + 1, record['VehicleRef'], record['Latitude'], record['Longitude']))

            rows.append(record)

        # rebuild the dataframe once per poll and write/update the csv
        df = pd.DataFrame(rows)
        df.to_csv(filename)

        # check and update timer
        if t_elapsed < duration:
            t_elapsed += 30
            time.sleep(30)
        else:
            return df

In [82]:
bus_data(os.getenv("MTAAPIKEY"), "B54")


Time Elapsed: 0.0 min(s)
Bus Line: B54
Number of Active Buses: 15
Bus 1 (#MTA NYCT_3974) is at latitude 40.693631 and longitude -73.964354
Bus 2 (#MTA NYCT_6548) is at latitude 40.695028 and longitude -73.952276
Bus 3 (#MTA NYCT_7261) is at latitude 40.700651 and longitude -73.910008
Bus 4 (#MTA NYCT_6541) is at latitude 40.69801 and longitude -73.925956
Bus 5 (#MTA NYCT_6578) is at latitude 40.693436 and longitude -73.977829
Bus 6 (#MTA NYCT_6503) is at latitude 40.696654 and longitude -73.93808
Bus 7 (#MTA NYCT_7286) is at latitude 40.698724 and longitude -73.919154
Bus 8 (#MTA NYCT_7283) is at latitude 40.693057 and longitude -73.985538
Bus 9 (#MTA NYCT_6516) is at latitude 40.693717 and longitude -73.963629
Bus 10 (#MTA NYCT_6520) is at latitude 40.693811 and longitude -73.987201
Bus 11 (#MTA NYCT_4520) is at latitude 40.69561 and longitude -73.987131
Bus 12 (#MTA NYCT_6579) is at latitude 40.697033 and longitude -73.934781
Bus 13 (#MTA NYCT_5205) is at latitude 40.695114 and long

Unnamed: 0,Bearing,BlockRef,CallDistanceAlongRoute,DataFrameRef,DatedVehicleJourneyRef,DestinationName,DestinationRef,DirectionRef,DistanceFromCall,ExpectedArrivalTime,...,ProgressRate,ProgressStatus,PublishedLineName,RecordedAtTime,SituationSimpleRef,StopPointName,StopPointRef,StopsFromCall,VehicleRef,VisitNumber
0,186.685970,MTA NYCT_FP_B8-Weekday_C_FP_58440_B54-214,4672.19,2018-04-03,MTA NYCT_FP_B8-Weekday-098600_B54_214,DNTWN BKLYN JAY ST via MYRTLE,MTA_801134,1,12.00,2018-04-03T17:04:18.160-04:00,...,normalProgress,,B54,2018-04-03T17:03:54.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/GRAND AV,MTA_304440,0,MTA NYCT_3974,1
1,186.467670,MTA NYCT_FP_B8-Weekday_C_FP_51900_B54-218,3646.53,2018-04-03,MTA NYCT_FP_B8-Weekday-100200_B54_223,DNTWN BKLYN JAY ST via MYRTLE,MTA_801134,1,16.41,2018-04-03T17:04:18.160-04:00,...,normalProgress,,B54,2018-04-03T17:03:44.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/NOSTRAND AV,MTA_304434,0,MTA NYCT_6548,1
2,329.523380,MTA NYCT_FP_B8-Weekday_C_FP_26040_B26-16,7025.20,2018-04-03,MTA NYCT_FP_B8-Weekday-096600_B54_213,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,25.41,2018-04-03T17:04:18.160-04:00,...,noProgress,layover,B54,2018-04-03T17:04:05.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,PALMETTO ST/ST NICHOLAS AV,MTA_901258,0,MTA NYCT_7261,1
3,6.652087,MTA NYCT_FP_B8-Weekday_C_FP_54780_B54-219,5570.10,2018-04-03,MTA NYCT_FP_B8-Weekday-098200_B54_219,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,25.64,2018-04-03T17:04:18.160-04:00,...,normalProgress,,B54,2018-04-03T17:03:54.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/DE KALB AV,MTA_304409,0,MTA NYCT_6541,1
4,357.805050,MTA NYCT_FP_B8-Weekday_C_FP_17820_B54-205,1168.30,2018-04-03,MTA NYCT_FP_B8-Weekday-101400_B54_222,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,38.94,2018-04-03T17:04:19.247-04:00,...,normalProgress,,B54,2018-04-03T17:04:00.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/ST EDWARDS ST,MTA_306931,0,MTA NYCT_6578,1
5,6.519247,MTA NYCT_FP_B8-Weekday_C_FP_16680_B54-204,4569.29,2018-04-03,MTA NYCT_FP_B8-Weekday-099000_B54_216,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,58.04,2018-04-03T17:04:28.769-04:00,...,normalProgress,,B54,2018-04-03T17:04:06.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/LEWIS AV,MTA_304404,0,MTA NYCT_6503,1
6,6.060223,MTA NYCT_FP_B8-Weekday_C_FP_20760_B54-207,6183.74,2018-04-03,MTA NYCT_FP_B8-Weekday-097400_B26_19,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,60.37,2018-04-03T17:04:18.160-04:00,...,normalProgress,,B54,2018-04-03T17:03:46.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/KNICKERBOCKER AV,MTA_304412,0,MTA NYCT_7286,1
7,177.623700,MTA NYCT_FP_B8-Weekday_C_FP_24780_B54-203,6596.69,2018-04-03,MTA NYCT_FP_B8-Weekday-097800_B54_210,DNTWN BKLYN JAY ST via MYRTLE,MTA_801134,1,72.81,2018-04-03T17:04:30.687-04:00,...,normalProgress,,B54,2018-04-03T17:03:52.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,METROTECH UNDERPASS/LAWRENCE ST,MTA_307935,0,MTA NYCT_7283,1
8,6.607365,MTA NYCT_FP_B8-Weekday_C_FP_48300_B26-23,2407.03,2018-04-03,MTA NYCT_FP_B8-Weekday-100600_B54_220,RIDGEWOOD TERM via MYRTLE,MTA_901258,0,74.39,2018-04-03T17:04:36.257-04:00,...,normalProgress,,B54,2018-04-03T17:04:13.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,MYRTLE AV/STEUBEN ST,MTA_304393,0,MTA NYCT_6516,1
9,87.153380,MTA NYCT_FP_B8-Weekday_C_FP_18360_B54-206,6820.99,2018-04-03,MTA NYCT_FP_B8-Weekday-097000_B54_217,DNTWN BKLYN JAY ST via MYRTLE,MTA_801134,1,75.67,2018-04-03T17:04:28.180-04:00,...,noProgress,layover,B54,2018-04-03T17:03:52.000-04:00,MTABC_3eb852e7-605a-487e-9c33-5e34b9a09b0a,JAY ST/MYRTLE PLZ,MTA_801134,0,MTA NYCT_6520,1


### Issue: currently using version 1 since CallDistanceAlongRoute is not contained in v2

In [2]:
apikey = os.getenv("MTAAPIKEY")
route = "B54"
detail = "normal"

url = ("http://bustime.mta.info/api/siri/vehicle-monitoring.json"
       "?key=%s&version=1&LineRef=%s&VehicleMonitoringDetailLevel=%s" % (apikey, route, detail))

# fetch data
response = urllib.urlopen(url)
data = json.loads(response.read().decode("utf-8"))
delivery = data['Siri']['ServiceDelivery']['VehicleMonitoringDelivery'][0]

# check if bus route exists; otherwise the API returns an ErrorCondition
try:
    data4 = delivery['VehicleActivity']
except KeyError:
    print(delivery['ErrorCondition']['Description'])

In [None]:
# nested dictionaries in version 1 output (paths within MonitoredVehicleJourney)
# ['FramedVehicleJourneyRef']
# ['MonitoredCall']
# ['MonitoredCall']['Extensions']['Distances']
# ['SituationRef'][0]    (a list; index 0 is the first alert)
# ['VehicleLocation']
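Instead of flattening every nested level generically (which could silently overwrite keys that repeat at different depths), the nested paths listed above can be read explicitly. The sketch below assumes the version 1 layout implied by those paths and by the columns in the `bus_data` output; `extract_v1_record` is a hypothetical name, and missing branches (for example `SituationRef` when no alert is active) become `None`.

```python
def extract_v1_record(activity):
    """Pull the nested version 1 fields into one flat dict.

    Assumes `activity` is one element of VehicleActivity: a dict with
    'RecordedAtTime' and 'MonitoredVehicleJourney' keys.
    """
    mvj = activity['MonitoredVehicleJourney']
    journey = mvj.get('FramedVehicleJourneyRef', {})
    call = mvj.get('MonitoredCall', {})
    distances = call.get('Extensions', {}).get('Distances', {})
    location = mvj.get('VehicleLocation', {})
    situations = mvj.get('SituationRef') or [{}]  # list of alerts, may be absent

    return {
        'RecordedAtTime': activity.get('RecordedAtTime'),
        'VehicleRef': mvj.get('VehicleRef'),
        'LineRef': mvj.get('LineRef'),
        'DirectionRef': mvj.get('DirectionRef'),
        'DataFrameRef': journey.get('DataFrameRef'),
        'DatedVehicleJourneyRef': journey.get('DatedVehicleJourneyRef'),
        'StopPointRef': call.get('StopPointRef'),
        'StopPointName': call.get('StopPointName'),
        'CallDistanceAlongRoute': distances.get('CallDistanceAlongRoute'),
        'DistanceFromCall': distances.get('DistanceFromCall'),
        'StopsFromCall': distances.get('StopsFromCall'),
        'SituationSimpleRef': situations[0].get('SituationSimpleRef'),
        'Latitude': location.get('Latitude'),
        'Longitude': location.get('Longitude'),
    }
```

In the notebook this could be applied as `pd.DataFrame([extract_v1_record(v) for v in data4])`.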

In [27]:
d = defaultdict(list)
# iterrows  (unfinished: presumably meant to loop over df.iterrows() and fill d)

## Variables for MTA NYCT Bus Data

- Bearing – Vehicle bearing: 0 is East, increments counter-clockwise.
- BlockRef – Depending on the system’s level of confidence, the GTFS block_id the bus is serving. Please see “Transparency of Block vs. Trip-Level Assignment”.
- CallDistanceAlongRoute – The distance of the stop from the beginning of the trip/route, in meters.
- DataFrameRef – The GTFS service date for the trip the vehicle is serving.
- DatedVehicleJourneyRef – The GTFS trip ID for the trip the vehicle is serving, preceded by the GTFS agency ID.
- DestinationRef – The GTFS stop ID for the last stop on the trip the vehicle is serving, prefixed by Agency ID.
- DirectionRef – The GTFS direction for the trip the vehicle is serving.
- DistanceFromCall – The distance from the vehicle to the stop along the route, in meters.
- Distances – The MTA Bus Time extensions to show distance of the vehicle from the stop.
- Extensions – SIRI container for extensions to the standard.
- FramedVehicleJourneyRef – A compound element uniquely identifying the trip the vehicle is serving.
- JourneyPatternRef – The GTFS Shape ID, prefixed by GTFS Agency ID.
- LineRef – The ‘fully qualified’ route name (GTFS agency ID + route ID) for the trip the vehicle is serving. Not intended to be customer-facing.
- Monitored – Always true.
- MonitoredCall – Call data about a particular stop. In StopMonitoring, it is the stop of interest; in VehicleMonitoring, it is the next stop the bus will make.
- MonitoredStopVisit – SIRI container for data about a particular vehicle servicing the selected stop.
- MonitoredVehicleJourney – A MonitoredVehicleJourney element for a vehicle in revenue service. See the MonitoredVehicleJourney page for a thorough description.
- OnwardCall – A stop that the vehicle is going to make.
- OnwardCalls – The collection of calls that a vehicle is going to make.
- OperatorRef – GTFS Agency_ID.
- OriginAimedDepartureTime – The scheduled departure time of the bus from its origin terminal, in ISO 8601 format.
- OriginRef – The GTFS stop ID for the first stop on the trip the vehicle is serving, prefixed by Agency ID.
- PresentableDistance – The distance displayed in the UI; see below for additional details.
- ProgressRate – Indicator of whether the bus is making progress (i.e. moving, generally), not moving (with value noProgress), laying over before beginning a trip (value layover), or serving a trip prior to one which will arrive (prevTrip).
- ProgressStatus – Optional indicator of vehicle progress status. Set to “layover” when the bus is in a layover waiting for its next trip to start at a terminal, and/or “prevTrip” when the bus is currently serving the previous trip and the information presented ‘wraps around’ to the following scheduled trip.
- PublishedLineName – The GTFS route_short_name.
- RecordedAtTime – The timestamp of the last real-time update from the particular vehicle.
- ResponseTimestamp – The timestamp on the MTA Bus Time server at the time the request was fulfilled.
- SituationExchangeDelivery – The SIRI SituationExchangeDelivery element only appears when there is a service alert active for a route or stop being called on. It is used by the responses to both the VehicleMonitoring and StopMonitoring calls.
- SituationRef – Reference to a service alert; present only if there is an active service alert covering this call.
- StopMonitoringDelivery – SIRI container for StopMonitoring response data.
- StopPointName – The GTFS stop name of the stop.
- StopPointRef – The GTFS stop ID of the stop prefixed by agency_id.
- StopsFromCall – The number of stops on the vehicle’s current trip until the stop in question, starting from 0.
- ValidUntil – The time until which the response data is valid.
- VehicleActivity – SIRI container for data about a particular vehicle.
- VehicleLocation – The most recently recorded or inferred coordinates of this vehicle.
- VehicleMonitoringDelivery – SIRI container for VehicleMonitoring response data.
- VehicleRef – The vehicle ID, preceded by the GTFS agency ID.
- VisitNumber – The ordinal value of the visit of this vehicle to this stop, always 1 in this implementation.
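Because `Bearing` is measured from East and increases counter-clockwise (the mathematical convention), it differs from a compass heading (0 = North, clockwise). A minimal conversion sketch, with hypothetical helper names:

```python
def siri_bearing_to_compass(bearing):
    """Convert a SIRI bearing (0 = East, counter-clockwise, degrees)
    to a compass heading (0 = North, clockwise, degrees)."""
    return (90.0 - bearing) % 360.0


def compass_label(heading):
    """Map a compass heading in degrees to one of eight 45-degree sectors."""
    labels = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")
    # Shift by half a sector so that, e.g., 350 and 10 degrees both map to "N".
    return labels[int((heading + 22.5) // 45) % 8]


# Bus MTA NYCT_3974 in the B54 output (bound for DNTWN BKLYN JAY ST)
print(compass_label(siri_bearing_to_compass(186.685970)))  # W
```

As a sanity check, the westbound B54 buses (toward Jay St) report bearings near 186 and come out as "W", while the eastbound ones (toward Ridgewood) report bearings near 6 and come out as "E".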

#### Transparency of Block vs. Trip-Level Assignment:

MTA Bus Time tries to assign buses to blocks, i.e., sequences of trips that start and end at a depot. This allows the system to make a statement about what a bus will do after it reaches the end of its current trip.

However, there is not always enough affirmative and corresponding evidence to make such a strong statement. In this case, MTA Bus Time falls back to a trip-level assignment, where it just picks a trip from the schedule that is representative of the route and stopping pattern that the bus is likely to pursue.

The SIRI API now reflects this distinction as described here and in other items below. If the assignment is block-level, the new BlockRef field of the MonitoredVehicleJourney is present, and populated with the assigned block id.
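Since block-level assignment is signalled only by the presence of `BlockRef`, the share of block-assigned vehicles in a batch of records can be measured directly. A minimal sketch over flattened vehicle dicts (the helper names are hypothetical):

```python
def assignment_level(record):
    """Return "block" if a flattened vehicle record carries a BlockRef, else "trip"."""
    return "block" if record.get("BlockRef") else "trip"


def block_share(records):
    """Fraction of vehicle records (dicts) with a block-level assignment."""
    records = list(records)
    if not records:
        return 0.0
    blocks = sum(1 for r in records if assignment_level(r) == "block")
    return blocks / float(len(records))
```

This works on plain dicts. On a DataFrame returned by `bus_data`, a missing BlockRef shows up as NaN, which is truthy in Python, so `df['BlockRef'].notna().mean()` is the safer equivalent there.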

#### The PresentableDistance field:

The logic that determines whether stops or miles are shown in the PresentableDistance field is as follows:

- Show distance in **miles** if and only if: (distance in miles to _immediate next stop_ > D) OR (distance in stops to current stop > N AND distance in miles to current stop > E).
- Equivalently, show distance in **stops** if and only if (the negation of the above, by De Morgan's law): (distance in miles to _immediate next stop_ <= D) AND (distance in stops to current stop <= N OR distance in miles to current stop <= E).
- Show "approaching" if and only if: distance to _immediate next stop_ < P.
- Show "at stop" if and only if: distance to _immediate next stop_ < T.

P and T are given in feet; D and E in miles.
Current parameter values:

|Parameter|Value|
|:--------|:----|
|D|0.5 miles|
|N|3 stops|
|E|0.5 miles|
|P|500 feet|
|T|100 feet|
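The PresentableDistance rules and the current parameter values can be sketched as a single function. The rules above do not say which check wins when several apply, so this sketch assumes "at stop" is checked first, then "approaching", then the miles/stops choice; the function name and the returned label strings are illustrative, not the exact UI text.

```python
FEET_PER_MILE = 5280.0

# Current parameter values from the table above
D_MILES = 0.5   # next-stop distance above which miles are shown
N_STOPS = 3     # stop count above which miles may be shown
E_MILES = 0.5   # distance to current stop above which miles may be shown
P_FEET = 500.0  # "approaching" threshold
T_FEET = 100.0  # "at stop" threshold


def presentable_distance(miles_to_next_stop, stops_to_current_stop, miles_to_current_stop):
    """Sketch of the PresentableDistance rules (precedence order is an assumption)."""
    feet_to_next_stop = miles_to_next_stop * FEET_PER_MILE
    if feet_to_next_stop < T_FEET:
        return "at stop"
    if feet_to_next_stop < P_FEET:
        return "approaching"
    show_miles = (miles_to_next_stop > D_MILES) or (
        stops_to_current_stop > N_STOPS and miles_to_current_stop > E_MILES)
    if show_miles:
        return "%.1f miles away" % miles_to_current_stop
    return "%d stops away" % stops_to_current_stop


print(presentable_distance(0.2, 2, 0.8))  # 2 stops away
```

Writing the miles condition once and using its negation for stops keeps the two branches consistent with the De Morgan equivalence stated above.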