In [1]:
import json
import pandas as pd

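%load_ext autoreload
# reload edited modules before each cell runs; loading the extension alone does not enable this
%autoreload 2
import helper_functions as hlp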

Historical weather data was gathered from the Dark Sky API for the period 1/1/2015 to 11/20/2019. The service allows 1,000 free calls per day, and each call returns a single day of data (i.e., 24 hourly records), so the gathering was split into 3 parts and run over 3 days to avoid charges. It could have been done in 2 parts, but the decision to include 2015 and 2016 data was made after the initial gathering, and separating by year made for clearer data organization.  
  
In a real deployment you would have to rely on weather forecasts rather than observed values, but the free data services do not retain historical forecasts, so I am cheating a little by using historical actuals. The short horizon of my forecasts (1 and 2 hours) should mitigate this data leakage, since weather forecasts are quite accurate at those horizons.  
  
To allow for more genuine model testing, I began gathering forecasts from this API every hour starting 11/19/2019. I use AWS Lambda and S3 to automate this process.   

In [2]:
# values needed to build the lists of times for the API calls
# the API returns each hour's data for the date of the Unix time that is passed
# the exact time within the day does not matter, so I chose noon (UTC) arbitrarily
DAY_LENGTH = 86400

START_2015_2016 = 1420113600
START_2017_2018 = 1483272000
START_2019 = 1546344000

DAYS_2015_2016 = 730
DAYS_2017_2018 = 729
DAYS_2019 = 324
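
As a quick sanity check on the constants above, the start values can be decoded back into calendar dates. This is only a minimal sketch using the standard library; the `STARTS` dictionary and the `to_utc` helper are names introduced here for illustration and are not part of `helper_functions`.

```python
from datetime import datetime, timezone

# Start timestamps copied from the cell above, keyed by the period they open.
STARTS = {
    "2015-2016": 1420113600,
    "2017-2018": 1483272000,
    "2019": 1546344000,
}


def to_utc(unix_time):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(unix_time, tz=timezone.utc)


for label, start in STARTS.items():
    # Each start should decode to noon UTC on January 1 of the first year.
    print(label, to_utc(start).isoformat())
```

Each value decodes to 12:00 UTC on January 1 of the first year in its range, which matches the "noon" choice described in the comments.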

In [3]:
# generating the list of times for which api calls will be made
api_times_2015_2016 = hlp.generate_api_call_times(START_2015_2016, DAY_LENGTH, DAYS_2015_2016)
api_times_2017_2018 = hlp.generate_api_call_times(START_2017_2018, DAY_LENGTH, DAYS_2017_2018)
api_times_2019 = hlp.generate_api_call_times(START_2019, DAY_LENGTH, DAYS_2019)
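
The body of `hlp.generate_api_call_times` is not shown in this notebook, so the following is only a guess at its shape: a list of evenly spaced Unix times starting from the given value. The function name `sketch_call_times` and its `inclusive` flag are hypothetical. The flag exists because it is not visible here whether the third argument is a plain count or an offset to the last day, and the two readings end on different dates.

```python
from datetime import datetime, timezone

DAY_LENGTH = 86400
START_2015_2016 = 1420113600
DAYS_2015_2016 = 730


def sketch_call_times(start, step, n_days, inclusive=True):
    """Return evenly spaced Unix times beginning at `start`.

    With inclusive=True, `n_days` is the offset of the last entry, so the
    list has n_days + 1 items; with inclusive=False it has n_days items.
    Which reading the real helper uses is an assumption either way.
    """
    count = n_days + 1 if inclusive else n_days
    return [start + i * step for i in range(count)]


def last_date(times):
    """Calendar date (UTC) of the final timestamp in the list."""
    return datetime.fromtimestamp(times[-1], tz=timezone.utc).date()


for inclusive in (True, False):
    times = sketch_call_times(START_2015_2016, DAY_LENGTH, DAYS_2015_2016, inclusive)
    print(inclusive, len(times), last_date(times))
```

Because 2016 is a leap year, 2015 and 2016 together span 731 days. Under the inclusive reading the list ends on 2016-12-31; under the plain-count reading it ends on 2016-12-30. Either way each batch stays below the 1,000-call daily cap.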

In [4]:
# retrieving API key from hidden location
with open("/Users/natha/.secret/dark_sky_api.json") as api_key_file:
    api_key = str(json.load(api_key_file)['api_key'])

In [5]:
# establishing relevant strings for use in the API call
url_base = 'https://api.darksky.net/forecast/'
location = '38.8483,-77.0342'
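
For reference, a Dark Sky Time Machine request takes the form `https://api.darksky.net/forecast/[key]/[latitude],[longitude],[time]`. The sketch below shows how the strings above could be combined into such a URL. `build_time_machine_url` is a hypothetical name, the key is a placeholder, and how the real helper assembles its requests is not shown here.

```python
def build_time_machine_url(url_base, api_key, location, unix_time):
    """Assemble a Time Machine request URL for one day of data.

    `url_base` is expected to end with a slash and `location` to be a
    'latitude,longitude' string, as in the cell above.
    """
    return f"{url_base}{api_key}/{location},{unix_time}"


example_url = build_time_machine_url(
    "https://api.darksky.net/forecast/",
    "YOUR_API_KEY",  # placeholder, not a real key
    "38.8483,-77.0342",
    1420113600,  # noon UTC on 2015-01-01
)
print(example_url)
```

No request is made here; the sketch only builds the string, so it does not consume any of the daily call allowance.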

The API requests and the saving of the data below have been commented out so that the code cannot be run by accident and use up part of a day's limited free calls.

In [11]:
# df1 = hlp.historical_dataframe_from_api_calls(api_times_2015_2016, url_base, api_key, location)

In [12]:
# df1.to_csv('data/KDCA_weather_data_2015-2016.csv')

In [13]:
# df2 = hlp.historical_dataframe_from_api_calls(api_times_2017_2018, url_base, api_key, location)

In [14]:
# df2.to_csv('data/KDCA_weather_data_2017-2018.csv')

In [15]:
# df3 = hlp.historical_dataframe_from_api_calls(api_times_2019, url_base, api_key, location)

In [16]:
# df3.to_csv('data/KDCA_weather_data_2019-20191121.csv')
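
An alternative to commenting the cells out is to guard each download so that it runs only when its output file does not exist yet. The sketch below is one possible way to do that and is not what this notebook uses. `fetch_once` and its `fetch` and `save` arguments are hypothetical names, with the actual API call and CSV write passed in as callables.

```python
from pathlib import Path


def fetch_once(csv_path, fetch, save):
    """Run `fetch` and `save` only if `csv_path` does not already exist.

    Returns True when the data was fetched and saved, False when the
    existing file was left alone and no API calls were made.
    """
    path = Path(csv_path)
    if path.exists():
        return False
    save(fetch(), path)
    return True


# Hypothetical usage with the names from this notebook:
# fetch_once(
#     'data/KDCA_weather_data_2015-2016.csv',
#     lambda: hlp.historical_dataframe_from_api_calls(
#         api_times_2015_2016, url_base, api_key, location),
#     lambda df, path: df.to_csv(path),
# )
```

With this guard, re-running the notebook after the files exist would skip the requests, at the cost of having to delete a file by hand to refresh it.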