# Connection Setup
## Description
Although we are permitted to use the Pecan Street data for academic research, the license does not allow it to be freely redistributed. To comply with this requirement, I have set up a private local MySQL server and documented the steps to reproduce the database from the raw files. This notebook demonstrates the basics of connecting to the MySQL server and fetching data from it.

## Imports

In [12]:
import json
import pandas as pd
from sqlalchemy import create_engine

## Load Credentials
This step uses a credentials file that has been excluded from the repository for security reasons. If you require access, please email me at blake.kleinhans@colorado.edu and I can send you a copy.

In [36]:
# load credentials
credentials_file_path = '../credentials.json'
with open(credentials_file_path) as credentials_file:
    credentials = json.load(credentials_file)
    
# create the database engine from the credentials
engine = create_engine('mysql+mysqldb://{user}@{host}/{db}'.format(
    user=credentials['user'],
    host=credentials['host'],
    db=credentials['db']
))

# open a connection to run queries against
conn = engine.connect()
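
The connection cell above reads the keys `user`, `host`, and `db` from the credentials file. As a minimal sketch, assuming the file is a flat JSON object with those keys, a loader could check for them before building the connection URL. The helper names `load_credentials` and `build_url` are hypothetical and not part of the original notebook, and the example values are placeholders rather than real credentials.

```python
import json
import os
import tempfile

# Keys the connection cell reads from the credentials file.
REQUIRED_KEYS = ("user", "host", "db")


def load_credentials(path):
    """Load the JSON credentials file and fail early if a key is missing."""
    with open(path) as credentials_file:
        credentials = json.load(credentials_file)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError("credentials file is missing: " + ", ".join(missing))
    return credentials


def build_url(credentials):
    """Format the same URL pattern used in the notebook (no password field)."""
    return "mysql+mysqldb://{user}@{host}/{db}".format(**credentials)


# Demonstrate with a throwaway file holding placeholder values (assumptions).
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "credentials.json")
    with open(example_path, "w") as example_file:
        json.dump(
            {"user": "reader", "host": "localhost", "db": "pecan_street"},
            example_file,
        )
    example_url = build_url(load_credentials(example_path))

print(example_url)
```

Failing early on a missing key gives a clearer error than a `KeyError` raised from inside the URL formatting call.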

## Basic Query Examples

In [34]:
# show tables
tables = pd.read_sql('SHOW TABLES', conn)
tables

Unnamed: 0,Tables_in_pecan_street
0,onemin
1,temps


In [35]:
# describe table
pd.read_sql('DESC onemin', conn)

Unnamed: 0,Field,Type,Null,Key,Default,Extra
0,index,bigint(20),YES,MUL,,
1,dataid,bigint(20),YES,,,
2,localminute,text,YES,,,
3,air1,double,YES,,,
4,air2,double,YES,,,
...,...,...,...,...,...,...
81,year,bigint(20),YES,,,
82,month,bigint(20),YES,,,
83,day,bigint(20),YES,,,
84,hour,bigint(20),YES,,,


In [38]:
# list the distinct building ids (dataid)
dataids_df = pd.read_sql('SELECT DISTINCT dataid FROM onemin', conn)
dataids = dataids_df['dataid'].values
dataids

array([ 661, 1642, 2335, 2818, 3039, 3456, 3538, 4031, 4373, 4767, 5746,
       6139, 7536, 7719, 7800, 7901, 7951, 8565, 9019, 9278, 8156, 8386,
       2361, 9922, 9160], dtype=int64)

In [None]:
# fetch daily data for one residence
dataid = dataids[0]

# MySQL uses a single '=' for equality comparison
daily_usage = pd.read_sql('SELECT * FROM usage_daily WHERE dataid = {dataid}'.format(dataid = dataid), conn)
daily_usage

In [None]:
# fetch hourly data for one residence
# MySQL uses a single '=' for equality comparison
hourly_usage = pd.read_sql('SELECT * FROM usage_hourly WHERE dataid = {dataid}'.format(dataid = dataid), conn)
hourly_usage
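
The two queries above interpolate `dataid` into the SQL string with `str.format`. One alternative is to pass the value as a bound parameter so that the driver handles quoting. The sketch below illustrates the idea with Python's built-in `sqlite3` module and an in-memory table standing in for the MySQL server. The `usage_daily` columns and rows here are invented for illustration only, and the placeholder syntax (`?` for `sqlite3`) differs between drivers.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
demo_conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, invented purely for this example.
demo_conn.execute(
    "CREATE TABLE usage_daily (dataid INTEGER, day TEXT, use_kwh REAL)"
)
demo_conn.executemany(
    "INSERT INTO usage_daily VALUES (?, ?, ?)",
    [
        (661, "2015-01-01", 20.5),
        (661, "2015-01-02", 18.0),
        (1642, "2015-01-01", 31.2),
    ],
)


def fetch_usage(connection, dataid):
    """Return all rows for one residence using a bound parameter."""
    cursor = connection.execute(
        "SELECT dataid, day, use_kwh FROM usage_daily "
        "WHERE dataid = ? ORDER BY day",
        (int(dataid),),  # int() also converts NumPy integers to plain ints
    )
    return cursor.fetchall()


rows = fetch_usage(demo_conn, 661)
print(rows)
demo_conn.close()
```

`pd.read_sql` also accepts a `params` argument for the same purpose; the exact placeholder style depends on the driver and SQLAlchemy version in use.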