# Enron 2.0: Predicting California's Energy Consumption

## Team Members:

**Names:** John W. Muhs, Corbett Carell
<br>**Emails:** <u0761102,u0502104>@utah.edu</br>
<br>**Github Repository:** [JohnWMuhs/2019-datascience-project](https://github.com/JohnWMuhs/2019-datascience-project "2019 Data Science Project Github Repo")</br>

## Project Objectives

In this project, we present a method for predicting power consumption for the [California ISO](http://www.caiso.com/Pages/default.aspx) (CAISO). The method uses hourly CAISO load data from the U.S. Energy Information Administration (EIA) and weather data from the Darksky API, and may draw on other sources.

<br>**Project Proposal (Submitted Mar. 1st):** The project was introduced via a project proposal available on Google Docs [here](https://docs.google.com/document/d/1i6FB5gmumkx5CnaKLHzKJk8nae0PirpgJDDWm3NkkZ4/edit?usp=sharing "Project Proposal").</br>

## Data Sources
<br>**Energy Data:** [U.S. Electric System Operating Data](https://www.eia.gov/opendata/qb.php?category=2123635 "EIA API: Electric System Operating Data")</br>
<br>**Weather Data:** [Darksky API](https://darksky.net/dev "Darksky API")</br>

## Project Timeline

<img src="images/ProjectFlowChart_Proposal.png">

# Section 1: Data Import

## Energy Data Import

This section includes the code required to import power consumption from the CAISO Balancing Authori

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import json
import requests

In [2]:
EIA_api_key = '53e6a63887dc05efe150165fa890f8da'

'''
Hourly Electrici
'''

urls = {'CAISO_HourlyLoad':'http://api.eia.gov/series/?api_key='+EIA_api_key+'&series_id=EBA.CISO-ALL.D.H',
        'California_HourlyLoad':'http://api.eia.gov/series/?api_key='+EIA_api_key+'&series_id=EBA.CAL-ALL.D.H'}

newPull = 0
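
Hard-coding the key in the notebook exposes it to anyone who can read the repository. A minimal sketch of one alternative, assuming the key is stored in an environment variable; the variable name `EIA_API_KEY` and the helper `build_eia_url` are hypothetical and not part of the original notebook:

```python
import os
from urllib.parse import urlencode

# Same endpoint as the URLs built in the cell above
EIA_SERIES_ENDPOINT = "http://api.eia.gov/series/"


def build_eia_url(series_id, api_key=None):
    """Return the request URL for one EIA series.

    When no key is passed, it is read from the (hypothetical)
    EIA_API_KEY environment variable.
    """
    if api_key is None:
        api_key = os.environ.get("EIA_API_KEY")
    if not api_key:
        raise ValueError("No EIA API key supplied or found in EIA_API_KEY")
    query = urlencode({"api_key": api_key, "series_id": series_id})
    return EIA_SERIES_ENDPOINT + "?" + query


# Series IDs taken from the urls dictionary above
series_ids = {
    "CAISO_HourlyLoad": "EBA.CISO-ALL.D.H",
    "California_HourlyLoad": "EBA.CAL-ALL.D.H",
}
```

With the environment variable set, `{name: build_eia_url(sid) for name, sid in series_ids.items()}` would produce a dictionary with the same shape as `urls`.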

In [3]:
df = pd.DataFrame()
for key, url in urls.items():
    if newPull == 0:
        # Request the series from the EIA API and parse the JSON response
        EIAData = requests.get(url)
        EIAdict = json.loads(EIAData.content.decode("utf-8"))

    # Each series stores its observations as [timestamp, value] pairs under 'data'
    dfEIA = pd.DataFrame(EIAdict['series'][0]['data'])

    # Build the shared DateTime column once, from the first series
    if 'DateTime' not in df.columns:
        # Assumes UTC timestamp strings ending in 'Z' (YYYYMMDDTHHZ)
        df['DateTime'] = pd.to_datetime(dfEIA[0], format='%Y%m%dT%HZ', utc=True)
        df['DateTime'] = df['DateTime'].dt.tz_convert('America/Los_Angeles')

    df[str(key)] = dfEIA[1]
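
The parsing step can be tried without calling the API. The sketch below runs the same idea on a small made-up payload that mimics the `series` / `data` structure read by the loop. The helper `series_to_frame`, the sample values, and the assumption that timestamps are UTC strings of the form `YYYYMMDDTHHZ` are illustrative, not confirmed by the original notebook:

```python
import pandas as pd

# Made-up payload: a 'series' list whose first item holds
# [timestamp, value] pairs. These are not real EIA readings.
sample_payload = {
    "series": [
        {
            "series_id": "EBA.CISO-ALL.D.H",
            "data": [
                ["20150701T09Z", 100],
                ["20150701T08Z", 110],
                ["20150701T07Z", 120],
            ],
        }
    ]
}


def series_to_frame(payload, column_name, tz="America/Los_Angeles"):
    """Convert one parsed series payload into a time-indexed DataFrame."""
    records = payload["series"][0]["data"]
    frame = pd.DataFrame(records, columns=["DateTime", column_name])
    # Assumption: timestamps are UTC strings such as '20150701T07Z'
    frame["DateTime"] = pd.to_datetime(
        frame["DateTime"], format="%Y%m%dT%HZ", utc=True
    ).dt.tz_convert(tz)
    # Sort so the index runs oldest to newest regardless of input order
    return frame.set_index("DateTime").sort_index()


hourly = series_to_frame(sample_payload, "CAISO_HourlyLoad")
```

Returning one frame per series would also allow the series to be aligned on their timestamps with `pd.concat(..., axis=1)` instead of by row position.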

In [4]:
# Move DateTime into the index so the daily sum covers only the load columns
df = df.set_index('DateTime')

In [5]:
df = df.resample('D').sum()
df.head()

| DateTime | CAISO_HourlyLoad | California_HourlyLoad |
|---|---|---|
| 2015-07-01 00:00:00-07:00 | 759794.0 | 940489 |
| 2015-07-02 00:00:00-07:00 | 780410.0 | 962751 |
| 2015-07-03 00:00:00-07:00 | 719208.0 | 887079 |
| 2015-07-04 00:00:00-07:00 | 678520.0 | 830874 |
| 2015-07-05 00:00:00-07:00 | 649346.0 | 791310 |
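
A daily sum is only comparable across days when each day has a full set of hourly readings. The sketch below, run on synthetic data, counts the readings behind each daily total so that partial days can be identified. The column name `load` and the 24-hour threshold are assumptions made for illustration:

```python
import pandas as pd

# Synthetic hourly load: 36 hours at a constant 10 units, starting at
# local midnight. Values are illustrative only.
index = pd.date_range(
    "2015-07-01 00:00", periods=36, freq="h", tz="America/Los_Angeles"
)
hourly = pd.DataFrame({"load": 10.0}, index=index)

# Aggregate by local calendar day, keeping the number of readings per day
daily = hourly["load"].resample("D").agg(["sum", "count"])

# Keep only days backed by 24 hourly readings
complete_days = daily[daily["count"] == 24]
```

Here the second day has only 12 readings, so it is filtered out. Days on which daylight saving time starts or ends have 23 or 25 local hours, so a strict 24-hour filter would drop them as well.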


In [8]:
df.to_csv('energydata.csv')

## Weather Data (DarkSky API)

In [7]:
test = pd.read_csv('darksky_data.csv')
test.head()

| | time | apparentTemperatureHigh | apparentTemperatureLow | apparentTemperatureMax | apparentTemperatureMin | cloudCover | dewPoint | humidity | moonPhase | precipIntensity | ... | sunsetTime | temperatureHigh | temperatureLow | temperatureMax | temperatureMin | uvIndex | visibility | windBearing | windGust | windSpeed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-01-2016 | 53.216 | 38.102 | 53.216 | 34.786 | 0.176 | 24.94 | 0.522 | 0.74 | 0.0 | ... | 1451696000.0 | 53.556 | 38.97 | 53.556 | 35.8 | 2.4 | 7.656 | 281.2 | 9.988 | 1.55 |
| 1 | 01-01-2017 | 52.314 | 41.958 | 52.314 | 40.05 | 0.708 | 38.928 | 0.768 | 0.11 | 0.00052 | ... | 1483319000.0 | 52.434 | 42.922 | 52.434 | 40.768 | 2.4 | 8.472 | 201.2 | 9.872 | 2.086 |
| 2 | 01-01-2018 | 66.468 | 45.264 | 66.468 | 42.55 | 0.308 | 40.228 | 0.646 | 0.49 | 0.0 | ... | 1514855000.0 | 66.468 | 45.264 | 66.468 | 42.55 | 2.2 | 5.492 | 231.0 | 6.328 | 0.53 |
| 3 | 01-02-2016 | 53.704 | 41.892 | 53.704 | 38.374 | 0.344 | 30.504 | 0.566 | 0.77 | 0.0 | ... | 1451783000.0 | 53.704 | 41.892 | 53.704 | 39.242 | 2.2 | 7.672 | 192.0 | 8.638 | 1.086 |
| 4 | 01-02-2017 | 49.548 | 42.752 | 49.548 | 41.964 | 0.752 | 39.374 | 0.77 | 0.15 | 0.00128 | ... | 1483405000.0 | 50.06 | 43.674 | 50.06 | 42.928 | 2.0 | 8.842 | 137.2 | 8.446 | 1.894 |
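
The `time` column is stored as text, so the rows appear in string order (01-01-2016, 01-01-2017, 01-01-2018, ...) rather than chronological order. A minimal sketch of parsing the dates and joining them to daily energy totals follows. It uses small synthetic stand-ins for both tables, and it assumes that `time` is in month-day-year order, which the first rows suggest but do not confirm:

```python
import pandas as pd

# Synthetic stand-ins for the two tables; values are illustrative only.
weather = pd.DataFrame(
    {
        "time": ["01-01-2016", "01-01-2017", "01-02-2016"],
        "temperatureHigh": [53.0, 52.0, 54.0],
    }
)
energy = pd.DataFrame(
    {"CAISO_HourlyLoad": [600000.0, 610000.0]},
    index=pd.DatetimeIndex(
        ["2016-01-01", "2016-01-02"], tz="America/Los_Angeles", name="DateTime"
    ),
)

# Assumption: 'time' is month-day-year
weather["date"] = pd.to_datetime(weather["time"], format="%m-%d-%Y")
weather = weather.sort_values("date").set_index("date")

# Drop the time zone so both tables share naive calendar-day keys
energy_by_day = energy.copy()
energy_by_day.index = energy_by_day.index.tz_localize(None)

# Inner join keeps only the days present in both tables
combined = energy_by_day.join(weather.drop(columns="time"), how="inner")
```

An inner join silently discards days missing from either source, so comparing `len(combined)` with the length of each input is a quick check for gaps.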
