# Download Airnow Daily AQI

Downloads Airnow's daily AQI data files (`daily_data_v2.dat`), saving one file per day in `airnow-data/daily-aqi/dat`.
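The log output at the end of this notebook shows the two naming schemes involved: the source URL `https://files.airnowtech.org/airnow/YYYY/YYYYMMDD/daily_data_v2.dat` and the local name `YYYYMMDDHH.dat`. The sketch below reproduces that mapping for one date. The helper names are hypothetical, and the base URL is copied from the log rather than from `AirnowCommon.directory_from_date`, whose implementation is not shown here.

```python
import datetime

# Base URL as it appears in this notebook's log output
# (assumption: it matches what AirnowCommon.directory_from_date builds)
AIRNOW_FILES_BASE = 'https://files.airnowtech.org/airnow'


def daily_source_url(day: datetime.datetime) -> str:
    """Hypothetical helper: remote URL of the daily data file for `day`."""
    return '%s/%s/%s/daily_data_v2.dat' % (
        AIRNOW_FILES_BASE, day.strftime('%Y'), day.strftime('%Y%m%d'))


def daily_local_filename(day: datetime.datetime) -> str:
    """Hypothetical helper: local file name, same format string as mirror_timestamp."""
    return day.strftime('%Y%m%d%H.dat')


day = datetime.datetime(2020, 3, 4)
print(daily_source_url(day))      # https://files.airnowtech.org/airnow/2020/20200304/daily_data_v2.dat
print(daily_local_filename(day))  # 2020030400.dat
```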

Reports to stat.createlab.org under the service name `Airnow Daily AQI - Download (NEW)`.

Docs for the daily data files are here: https://docs.airnowapi.org/docs/DailyDataFactSheet.pdf
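Once mirrored, each file can be read one record per line. The sketch below assumes the records are pipe-delimited, and its field list is our reading of the fact sheet linked above, not something this notebook confirms. Check both against the PDF before relying on them. `DAILY_DATA_FIELDS`, `parse_daily_record`, and `parse_daily_file` are hypothetical names.

```python
from typing import Dict, List

# ASSUMPTION: field order of daily_data_v2.dat as read from the
# DailyDataFactSheet; verify against the PDF before use.
DAILY_DATA_FIELDS: List[str] = [
    'valid_date', 'aqsid', 'site_name', 'parameter_name', 'reporting_units',
    'value', 'averaging_period', 'data_source', 'aqi_value', 'aqi_category',
    'latitude', 'longitude', 'full_aqsid',
]


def parse_daily_record(line: str, fields: List[str] = DAILY_DATA_FIELDS) -> Dict[str, str]:
    """Split one pipe-delimited record into a dict keyed by field name (values stay strings)."""
    values = line.rstrip('\r\n').split('|')
    if len(values) != len(fields):
        raise ValueError('expected %d fields, got %d' % (len(fields), len(values)))
    return dict(zip(fields, values))


def parse_daily_file(path: str) -> List[Dict[str, str]]:
    """Parse every non-blank line of a mirrored .dat file."""
    with open(path, encoding='utf-8', errors='replace') as f:
        return [parse_daily_record(line) for line in f if line.strip()]
```

Values are left as strings on purpose, so converting numeric fields and the date is an explicit, separate step.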

In [1]:
import json, os, dateutil, re, requests, subprocess, datetime, glob, stat, sys, time

from dateutil import rrule, tz, parser

In [2]:
# Boilerplate to load utils.ipynb
# See https://github.com/CMU-CREATE-Lab/python-utils/blob/master/utils.ipynb


def exec_ipynb(filename_or_url):
    """Execute every code cell of a notebook (local path or URL) in this module's globals."""
    if re.match(r'https?:', filename_or_url):
        nb = requests.get(filename_or_url).json()
    else:
        with open(filename_or_url) as f:
            nb = json.load(f)
    # nbformat >= 4 keeps cells at the top level; older formats nest them in worksheets
    if nb['nbformat'] >= 4:
        src = [''.join(cell['source']) for cell in nb['cells'] if cell['cell_type'] == 'code']
    else:
        src = [''.join(cell['input']) for cell in nb['worksheets'][0]['cells'] if cell['cell_type'] == 'code']

    # Write the combined source to a unique temp file so tracebacks can show real source lines
    tmpname = '/tmp/%s-%s-%d.py' % (os.path.basename(filename_or_url),
                                    datetime.datetime.now().strftime('%Y%m%d%H%M%S%f'),
                                    os.getpid())
    src = '\n\n\n'.join(src)
    with open(tmpname, 'w') as f:
        f.write(src)
    code = compile(src, tmpname, 'exec')
    exec(code, globals())


exec_ipynb('./python-utils/utils.ipynb')
exec_ipynb('./airnow-common.ipynb')
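The only notebook-specific step in `exec_ipynb` is pulling the code cells out of the notebook JSON. The pure function below isolates that step so it can be checked without the network or `/tmp`. `code_cell_sources` is a hypothetical name, and the toy notebook dict is made up for illustration.

```python
from typing import Any, Dict, List


def code_cell_sources(nb: Dict[str, Any]) -> List[str]:
    """Return the source of each code cell, handling nbformat >= 4 and older layouts."""
    if nb['nbformat'] >= 4:
        cells, key = nb['cells'], 'source'
    else:
        # nbformat 3 nests cells inside worksheets and calls the source 'input'
        cells, key = nb['worksheets'][0]['cells'], 'input'
    # A cell's source may be stored as one string or as a list of lines;
    # ''.join handles both
    return [''.join(cell[key]) for cell in cells if cell['cell_type'] == 'code']


nb4 = {'nbformat': 4, 'cells': [
    {'cell_type': 'markdown', 'source': ['# Title']},
    {'cell_type': 'code', 'source': ['x = 1\n', 'y = 2']},
]}
print(code_cell_sources(nb4))  # ['x = 1\ny = 2']
```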

In [3]:
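# 2018-07-18 seems to be the earliest daily data file available as of 2020-02-10
EARLIEST_DATA_FILE_DATETIME = datetime.datetime(2018, 7, 18)

# Pause between requests to go easy on the AirNow file server
SECONDS_TO_PAUSE_BETWEEN_DOWNLOADS = 0.5

# Re-check this many days before the newest local file, since already-mirrored files can still change upstream
NUM_TRAILING_DAYS = 30

# mirror_forever() runs once per hour
MIRROR_TIME_PERIOD_SECS = 60 * 60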

STAT_HOSTNAME = 'airnow'
STAT_SHORTNAME = 'airnow-daily-aqi-download'
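`MIRROR_TIME_PERIOD_SECS` sets how often `mirror_forever` runs. Its sleep comes from `sleep_until_next_period` in utils.ipynb, whose implementation is not shown here. The sketch below assumes it aligns wakeups to wall-clock multiples of the period (for an hour, the top of each hour). Both function names are hypothetical.

```python
import time
from typing import Optional


def seconds_until_next_period(period_secs: float, now: Optional[float] = None) -> float:
    """Seconds from `now` (epoch seconds) until the next multiple of period_secs.

    Exactly on a boundary, this returns a full period rather than 0.
    """
    if now is None:
        now = time.time()
    return period_secs - (now % period_secs)


def sleep_until_next_period_sketch(period_secs: float) -> None:
    """Hypothetical stand-in for sleep_until_next_period from utils.ipynb."""
    time.sleep(seconds_until_next_period(period_secs))


# 8100 s after the epoch is 02:15 UTC; the next hour boundary is 45 minutes away
print(seconds_until_next_period(60 * 60, now=8100))  # 2700
```

Aligning to boundaries keeps runs at predictable times even when one `mirror()` pass is slow. The `valid_for_secs=MIRROR_TIME_PERIOD_SECS*1.5` passed to `Stat.up` presumably gives half a period of slack before the status goes stale.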

In [4]:
Stat.set_service('Airnow Daily AQI - Download (NEW)')

In [5]:
def mirror_timestamp(timestamp):
    # Local copies are named YYYYMMDDHH.dat; timestamps here fall on midnight, so HH is 00
    filename = timestamp.strftime('%Y%m%d%H.dat')
    src = AirnowCommon.directory_from_date(timestamp) + '/daily_data_v2.dat'
    dest = AirnowCommon.DAILY_AQI_DAT_DIRECTORY + '/' + filename
    (is_new, message, status_code) = AirnowCommon.mirror_airnow_file(src, dest)

    if is_new:
        Stat.info(message, host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    elif status_code == 304:
        # 304 Not Modified: simply print here to reduce noise sent to stat.createlab.org
        print(message)
        # Stat.info(message, host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    elif status_code < 400:
        Stat.info(message, host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    else:
        Stat.warning(message, host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)


#mirror_timestamp(dateutil.parser.parse('2018-07-18 00:00'))
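The reporting branch in `mirror_timestamp` is easier to test as a pure function. `log_action` is a hypothetical helper that returns the name of the call `mirror_timestamp` would make instead of calling `Stat`. Like the original, it lets `is_new` take precedence over the status code.

```python
def log_action(is_new: bool, status_code: int) -> str:
    """Return which reporting call mirror_timestamp makes: 'info', 'print', or 'warning'."""
    if is_new:
        return 'info'     # a fresh copy was written
    if status_code == 304:
        return 'print'    # Not Modified: kept local to cut noise on stat.createlab.org
    if status_code < 400:
        return 'info'     # other non-error responses are still reported
    return 'warning'      # 4xx/5xx responses
```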

In [6]:
def compute_first_date_to_check():
    files = glob.glob(AirnowCommon.DAILY_AQI_DAT_DIRECTORY + '/[0-9]*.dat')
    if len(files) == 0:
        return EARLIEST_DATA_FILE_DATETIME
    last_file = sorted(files)[-1]
    # Parse the date from the file name alone, so characters in the directory path can't confuse strptime
    last_date = datetime.datetime.strptime(os.path.basename(last_file), '%Y%m%d%H.dat')
    Stat.debug('Most recently mirrored data file is %s (%s)' % (last_file, last_date), host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    first_date_to_check = last_date - datetime.timedelta(days=NUM_TRAILING_DAYS)
    Stat.debug('Checking for updates starting with date %s' % (first_date_to_check), host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    sys.stdout.flush()
    return first_date_to_check


#compute_first_date_to_check()
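This sketch separates the date arithmetic in `compute_first_date_to_check` from the globbing and logging by taking a list of paths directly. `first_date_to_check` is a hypothetical name. Given the newest file from the log below (`2020030400.dat`), it yields the same start date the log reports.

```python
import datetime
import os
from typing import Iterable

EARLIEST_DATA_FILE_DATETIME = datetime.datetime(2018, 7, 18)
NUM_TRAILING_DAYS = 30


def first_date_to_check(paths: Iterable[str]) -> datetime.datetime:
    """Start date for the next mirror run, given paths of already-mirrored files."""
    names = sorted(os.path.basename(p) for p in paths)
    if not names:
        return EARLIEST_DATA_FILE_DATETIME
    # Fixed-width YYYYMMDDHH.dat names sort chronologically as plain strings
    last_date = datetime.datetime.strptime(names[-1], '%Y%m%d%H.dat')
    return last_date - datetime.timedelta(days=NUM_TRAILING_DAYS)


print(first_date_to_check(['dat/2020030300.dat', 'dat/2020030400.dat']))  # 2020-02-03 00:00:00
```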

In [7]:
def mirror():
    start = compute_first_date_to_check()
    now = datetime.datetime.utcnow()
    timestamps_to_mirror = list(rrule.rrule(rrule.DAILY, dtstart=start, until=now))

    Stat.info('Mirroring %d data files, starting with %s... (up-to-date files will not be logged here)' % (len(timestamps_to_mirror), start), host=STAT_HOSTNAME, shortname=STAT_SHORTNAME)
    for timestamp in timestamps_to_mirror:
        mirror_timestamp(timestamp)
        time.sleep(SECONDS_TO_PAUSE_BETWEEN_DOWNLOADS)

    Stat.up('Done mirroring %d data files' % (len(timestamps_to_mirror)), host=STAT_HOSTNAME, shortname=STAT_SHORTNAME, valid_for_secs=MIRROR_TIME_PERIOD_SECS*1.5)

def mirror_forever():
    while True:
        mirror()
        sleep_until_next_period(MIRROR_TIME_PERIOD_SECS)

#mirror_forever()
mirror()
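`rrule.rrule(rrule.DAILY, dtstart=start, until=now)` yields every day from `start` up to and including `now`. The standard-library sketch below enumerates the same dates and shows where the "31 data files" in the log comes from. `daily_timestamps` is a hypothetical name, and the 12:00 run time is an assumption, since the log does not record when `mirror()` ran.

```python
import datetime
from typing import List


def daily_timestamps(start: datetime.datetime, until: datetime.datetime) -> List[datetime.datetime]:
    """Every start + n days that is <= until, like rrule DAILY with an inclusive until."""
    out = []
    t = start
    while t <= until:
        out.append(t)
        t += datetime.timedelta(days=1)
    return out


# Start date from the log; assumed run time of 12:00 UTC on 2020-03-04
days = daily_timestamps(datetime.datetime(2020, 2, 3), datetime.datetime(2020, 3, 4, 12, 0))
print(len(days), days[0].date(), days[-1].date())  # 31 2020-02-03 2020-03-04
```

2020 is a leap year, so the window covers 27 days of February (3rd through 29th) plus 4 days of March.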

Stat.log debug Airnow Daily AQI - Download (NEW) airnow Most recently mirrored data file is ../../airnow-data/daily-aqi/dat/2020030400.dat (2020-03-04 00:00:00) None
Stat.log debug Airnow Daily AQI - Download (NEW) airnow Checking for updates starting with date 2020-02-03 00:00:00 None
Stat.log info Airnow Daily AQI - Download (NEW) airnow Mirroring 31 data files, starting with 2020-02-03 00:00:00... (up-to-date files will not be logged here) None
Local mirror of https://files.airnowtech.org/airnow/2020/20200203/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200204/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200205/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200206/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200207/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200208/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200209/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200210/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200211/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200212/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200213/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200214/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200215/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200216/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200217/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200218/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200219/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200220/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200221/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200222/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200223/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200224/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200225/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200226/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200227/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200228/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200229/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200301/daily_data_v2.dat is up to date.  Skipping.
Local mirror of https://files.airnowtech.org/airnow/2020/20200302/daily_data_v2.dat is up to date.  Skipping.
Wrote 718869 bytes to ../../airnow-data/daily-aqi/dat/2020030300.dat
Stat.log info Airnow Daily AQI - Download (NEW) airnow Successfully mirrored https://files.airnowtech.org/airnow/2020/20200303/daily_data_v2.dat to ../../airnow-data/daily-aqi/dat/2020030300.dat (718869 bytes) None
Wrote 383410 bytes to ../../airnow-data/daily-aqi/dat/2020030400.dat
Stat.log info Airnow Daily AQI - Download (NEW) airnow Successfully mirrored https://files.airnowtech.org/airnow/2020/20200304/daily_data_v2.dat to ../../airnow-data/daily-aqi/dat/2020030400.dat (383410 bytes) None
Stat.log up Airnow Daily AQI - Download (NEW) airnow Done mirroring 31 data files None
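Most lines in the log above report `is up to date.  Skipping.`, which `mirror_timestamp` treats as HTTP 304 Not Modified. `AirnowCommon.mirror_airnow_file` is defined in airnow-common.ipynb and is not shown here. The following is therefore only a sketch of one common way to get 304 responses: a conditional GET whose `If-Modified-Since` header comes from the local file's mtime, using only the standard library. All function names are hypothetical.

```python
import email.utils
import os
import urllib.error
import urllib.request
from typing import Dict, Optional


def if_modified_since_value(mtime: float) -> str:
    """Format an epoch timestamp as an HTTP date, e.g. 'Thu, 01 Jan 1970 00:00:00 GMT'."""
    return email.utils.formatdate(mtime, usegmt=True)


def conditional_headers(dest: str) -> Dict[str, str]:
    """If-Modified-Since header from the local copy's mtime, or {} if there is no local copy."""
    if not os.path.exists(dest):
        return {}
    return {'If-Modified-Since': if_modified_since_value(os.path.getmtime(dest))}


def fetch_if_modified(url: str, dest: str) -> Optional[int]:
    """Download url to dest unless the server answers 304.

    Returns the number of bytes written, or None if the local copy is current.
    """
    req = urllib.request.Request(url, headers=conditional_headers(dest))
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            data = resp.read()
            last_modified = resp.headers.get('Last-Modified')
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None  # urllib surfaces 304 Not Modified as an HTTPError
        raise
    with open(dest, 'wb') as f:
        f.write(data)
    # Stamp the local copy with the server's Last-Modified time, so the next
    # If-Modified-Since compares server time with server time
    if last_modified:
        ts = email.utils.parsedate_to_datetime(last_modified).timestamp()
        os.utime(dest, (ts, ts))
    return len(data)
```

Copying the server's `Last-Modified` onto the local file avoids relying on the local clock, which matters if the mirror host's time drifts.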
