# Repeatedly querying Hubway status

```python
from pylab import rcParams
%matplotlib inline
rcParams['figure.figsize'] = (8,6)

import urllib
from lxml import etree
import datetime
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import pandas as pd
import schedule
import time
```

## Periodically print the most recently updated station

The simplest approach is to reload and reanalyze the XML file periodically:

```python
from urllib.request import urlopen  # Python 3 location of urlopen (urllib.urlopen in Python 2)

def printmostrecent():
    # Fetch and parse the live station feed
    data = etree.parse(urlopen('http://www.thehubway.com/data/stations/bikeStations.xml'))
    stations = data.findall('station')
    # One row per <station>, one column per child element
    everything = [[elt.text for elt in station] for station in stations]
    df = pd.DataFrame(everything, columns=[elt.tag for elt in data.find('station')])
    df['latestUpdateTime'] = pd.to_numeric(df['latestUpdateTime'])
    df.set_index('name', inplace=True)
    mostrecent = df.sort_values('latestUpdateTime', ascending=False).head(1)
    recentname = mostrecent.index[0]
    # latestUpdateTime is in milliseconds since the epoch; fromtimestamp wants seconds
    recenttime = datetime.datetime.fromtimestamp(mostrecent['latestUpdateTime'].iloc[0] / 1e3)
    print("Latest updated station was {} at {}.".format(recentname, recenttime))
```
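To try the parsing step without hitting the network, here is a minimal sketch that parses an inline XML sample with the standard library's `xml.etree.ElementTree` and picks the most recently updated station. The sample document, its values, and the helper name `most_recent_station` are hypothetical; only the `station`, `name`, and `latestUpdateTime` tags come from the feed used above. lxml's `etree` exposes the same `findall`/`findtext` calls, so the logic carries over.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the feed: <stations><station>...</station></stations>
SAMPLE_XML = """<stations>
  <station><name>Station A</name><latestUpdateTime>1443628674612</latestUpdateTime></station>
  <station><name>Station B</name><latestUpdateTime>1443628796182</latestUpdateTime></station>
</stations>"""

def most_recent_station(root):
    """Return (name, update_time_ms) for the station with the latest update."""
    rows = [
        (st.findtext('name'), int(st.findtext('latestUpdateTime')))
        for st in root.findall('station')
    ]
    # max() on the timestamp picks the most recently updated station
    return max(rows, key=lambda row: row[1])

root = ET.fromstring(SAMPLE_XML)
name, ms = most_recent_station(root)
# Timestamps are milliseconds since the epoch; fromtimestamp expects seconds
when = datetime.datetime.fromtimestamp(ms / 1e3)
print("Latest updated station was {} at {}.".format(name, when))
```

Using `max` with a key avoids building and sorting a whole DataFrame when only the single newest row is needed.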

Next, use the Python `schedule` module to run one of these lookups repeatedly.

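```python
def repeatmostrecent(seconds):
    schedule.clear()  # drop any jobs left over from earlier runs
    schedule.every(seconds).seconds.do(printmostrecent)
    # Loop forever; interrupt the kernel to stop
    while True:
        schedule.run_pending()  # run any job whose interval has elapsed
        time.sleep(1)
```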
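The `while True` loop above never returns on its own; it has to be stopped by interrupting the kernel. A minimal alternative, sketched with only the standard library instead of `schedule`, runs a job a fixed number of times at a fixed interval and treats an interrupt as a quiet early exit. The helper name `poll` and its parameters are hypothetical.

```python
import time

def poll(job, interval, iterations, *args):
    """Call job(*args) up to `iterations` times, `interval` seconds apart.

    Hypothetical stdlib helper, an alternative to the schedule-based loop.
    Returns the results collected before finishing or being interrupted.
    """
    results = []
    next_run = time.monotonic()
    try:
        for i in range(iterations):
            results.append(job(*args))
            if i == iterations - 1:
                break  # no need to wait after the final run
            # Measure from the planned start time so the job's own runtime
            # does not make the interval drift
            next_run += interval
            time.sleep(max(0.0, next_run - time.monotonic()))
    except KeyboardInterrupt:
        pass  # stop quietly on a kernel interrupt
    return results
```

For example, `poll(printmostrecent, 15, 20)` would print roughly five minutes of updates and then return. Unlike `schedule.every(...).do(...)`, which waits one interval before the first run, this sketch runs the job immediately.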

Similarly, we can count how many stations have updated within the last few minutes.

```python
from urllib.request import urlopen  # Python 3 location of urlopen (urllib.urlopen in Python 2)

def enumerateupdated(minutes):
    data = etree.parse(urlopen('http://www.thehubway.com/data/stations/bikeStations.xml'))
    stations = data.findall('station')
    everything = [[elt.text for elt in station] for station in stations]
    df = pd.DataFrame(everything, columns=[elt.tag for elt in data.find('station')])
    df['latestUpdateTime'] = pd.to_numeric(df['latestUpdateTime'])
    df.set_index('name', inplace=True)
    # Seconds elapsed since each station's last update (timestamps are in ms)
    timeago = time.time() - df['latestUpdateTime'] / 1e3
    updated = timeago <= minutes * 60
    numupdated = int(updated.sum())  # True counts as 1
    print("In the past {} minutes, {} stations have updated.".format(minutes, numupdated))
```
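Because `enumerateupdated` reads both the network and the clock, its counting step is hard to check by hand. Here is a minimal sketch of that step as a pure function with the current time passed in explicitly, assuming millisecond timestamps as in the feed. The name `count_recent` and the sample timestamps are hypothetical.

```python
def count_recent(update_times_ms, minutes, now_s):
    """Count timestamps (ms since epoch) at most `minutes` older than now_s (seconds)."""
    cutoff_s = now_s - minutes * 60
    # Convert ms to s before comparing, mirroring the /1e3 in enumerateupdated;
    # t >= now - window is the same test as now - t <= window
    return sum(1 for ms in update_times_ms if ms / 1e3 >= cutoff_s)

now = 1443629000.0  # fixed "current time" in seconds, for reproducibility
times_ms = [1443628990000, 1443628800000, 1443628600000, 1443628000000]
print(count_recent(times_ms, 5, now))  # → 2
```

In pandas the same test is a vectorized comparison, e.g. `(df['latestUpdateTime'] / 1e3 >= now - minutes * 60).sum()`.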

```python
repeatmostrecent(15)
```

```
Latest updated station was Buswell St. at Park Dr. at 2015-09-30 11:57:54.612000.
Latest updated station was Buswell St. at Park Dr. at 2015-09-30 11:57:54.612000.
Latest updated station was Buswell St. at Park Dr. at 2015-09-30 11:57:54.612000.
Latest updated station was Buswell St. at Park Dr. at 2015-09-30 11:57:54.612000.
Latest updated station was Kenmore Sq / Comm Ave at 2015-09-30 11:59:56.182000.
Latest updated station was Kenmore Sq / Comm Ave at 2015-09-30 11:59:56.182000.
Latest updated station was Kenmore Sq / Comm Ave at 2015-09-30 11:59:56.182000.
Latest updated station was Kenmore Sq / Comm Ave at 2015-09-30 11:59:56.182000.
Latest updated station was Harvard University Gund Hall at Quincy St / Kirkland S at 2015-09-30 12:00:39.561000.
Latest updated station was Harvard University Gund Hall at Quincy St / Kirkland S at 2015-09-30 12:00:39.561000.
Latest updated station was Harvard University Gund Hall at Quincy St / Kirkland S at 2015-09-30 12:00:39.561000.

KeyboardInterrupt:
```

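```python
def repeatupdated(secondsrefresh, minutesback):
    schedule.clear()  # drop any jobs left over from earlier runs
    # Extra arguments to do() are passed through to the job on each run
    schedule.every(secondsrefresh).seconds.do(enumerateupdated, minutesback)
    while True:
        schedule.run_pending()
        time.sleep(1)
```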

```python
repeatupdated(15, 5)
```

```
In the past 5 minutes, 51 stations have updated.
In the past 5 minutes, 46 stations have updated.
In the past 5 minutes, 45 stations have updated.
In the past 5 minutes, 52 stations have updated.
In the past 5 minutes, 50 stations have updated.
In the past 5 minutes, 49 stations have updated.
In the past 5 minutes, 47 stations have updated.
In the past 5 minutes, 49 stations have updated.

KeyboardInterrupt:
```

A little more complicated is comparing the reloaded data with the previous snapshot. This lets you detect, for example, whether a bike has been checked out or returned.
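A minimal sketch of that comparison, assuming each snapshot has already been reduced to a dict mapping station name to the number of available bikes. The bike-count field is not used anywhere above, so which feed tag supplies it (a tag such as `nbBikes`, for instance) is an assumption; the function name `bike_changes` and the sample numbers are hypothetical. A drop in a station's count is read as checkouts and a rise as returns.

```python
def bike_changes(previous, current):
    """Compare two {station_name: bikes_available} snapshots.

    Returns {station_name: (checked_out, returned)} for stations whose count
    changed. Only net changes are visible: a checkout and a return at the same
    station between two polls cancel out.
    """
    changes = {}
    # Only stations present in both snapshots can be compared; sort for stable output
    for name in sorted(previous.keys() & current.keys()):
        delta = current[name] - previous[name]
        if delta < 0:
            changes[name] = (-delta, 0)  # net bikes removed: checkouts
        elif delta > 0:
            changes[name] = (0, delta)   # net bikes added: returns
    return changes

before = {'Station A': 5, 'Station B': 3, 'Station C': 7}
after = {'Station A': 4, 'Station B': 5, 'Station C': 7}
print(bike_changes(before, after))  # → {'Station A': (1, 0), 'Station B': (0, 2)}
```

With a DataFrame indexed by `name`, as in the functions above, a snapshot dict could come from something like `df['nbBikes'].to_dict()`, again assuming that column exists. Polling more often shrinks, but never removes, the chance that opposite moves cancel out.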