# Dataport Database Extraction for the Overall Energy Demand Notebook: Exploring how much of a home's overall energy demand is attributable to at-home EV charging

## This notebook connects to the database, extracts the data live, and writes it to a compressed zip file in this directory.

<p>We will be using Pecan Street Inc. data from Dataport to calculate how much of a home's overall energy demand goes to electric vehicle charging.<br><br>
Pecan Street's data can be obtained by applying for a Dataport account at https://www.dataport.pecanstreet.org.</p>

<p>The analysis notebook reads the files we've already extracted and prepared for you in the /shared/JupyterHub-Examples-Data/ directory on the JupyterHub server. If you would like to use the files exported by this notebook instead, modify the read_csv calls in the analysis notebook to point at them.</p>

In [None]:
# import packages
import pandas as pd
import psycopg2
import sqlalchemy as sqla
import os
import sys
sys.path.insert(0,'..')
from config.read_config import get_database_config
%matplotlib inline
sys.executable  # shows you your path to the python you're using

In [None]:
# read in db credentials from config/config.txt
# * make sure you add those to the config/config.txt file! *

database_config = get_database_config("../config/config.txt")

In [None]:
# get our DB connection
engine = sqla.create_engine('postgresql://{}:{}@{}:{}/{}'.format(database_config['username'],
                                                                     database_config['password'],
                                                                     database_config['hostname'],
                                                                     database_config['port'],
                                                                     database_config['database']
                                                                     ))
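A password containing characters such as `@` or `/` can break a URL assembled with plain string formatting. The sketch below is one way to percent-encode the credentials first, using only the standard library. The helper name `build_postgres_url` and every value in `example_config` are hypothetical; the real values come from config/config.txt.

```python
from urllib.parse import quote_plus


def build_postgres_url(cfg):
    """Assemble a postgresql:// URL, percent-encoding the username and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(cfg["username"]),
        quote_plus(cfg["password"]),
        cfg["hostname"],
        cfg["port"],
        cfg["database"],
    )


# Hypothetical values for illustration only.
example_config = {
    "username": "reader",
    "password": "p@ss/word",
    "hostname": "db.example.org",
    "port": "5432",
    "database": "dataport",
}

example_url = build_postgres_url(example_config)
print(example_url)
```

The resulting string could be passed to `sqla.create_engine` in place of the formatted URL above.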

In [None]:
# Select Texas homes from the dataport metadata that have an electric vehicle (car1) and data covering all of 2018.
query = """select distinct dataid from other_datasets.metadata 
                                          where car1='yes' and grid='yes'
                                          and egauge_1min_min_time < '2018-01-01' 
                                          and egauge_1min_max_time > '2019-01-01'
                                          and state='Texas'
                                          and (egauge_1min_data_availability like '100%' 
                                               or 
                                               egauge_1min_data_availability like '99%')
                                               LIMIT 25;
         """

df = pd.read_sql_query(sqla.text(query), engine)

In [None]:
# grab dataids and convert them to a string to put into the SQL query
dataids_list = df['dataid'].tolist()
print("{} dataids selected, listed here:".format(len(dataids_list)))
dataids_str = ','.join(map(str, dataids_list))
dataids_list
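Because the id string is pasted directly into the SQL text, it can help to check the ids first. The sketch below is one possible guard, not part of the original notebook. The helper name `ids_to_sql_list` and the sample ids are hypothetical. It also rejects an empty list, since `dataid in ()` is not valid PostgreSQL.

```python
def ids_to_sql_list(ids):
    """Return a comma-separated string of integer ids for use in an IN (...) clause."""
    # int() raises ValueError for non-numeric strings, so stray text cannot reach the query.
    cleaned = [int(i) for i in ids]
    if not cleaned:
        raise ValueError("no dataids selected; an empty IN () clause is invalid SQL")
    return ",".join(str(i) for i in cleaned)


# Hypothetical ids for illustration only.
sample_ids = [26, 77, 93]
sample_ids_str = ids_to_sql_list(sample_ids)
print(sample_ids_str)
```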

In [None]:
# Check data completeness for dataids selected from metadata above.
## Warning: This query takes some time to run.
query2 = """select dataid,count(*) total_rec from electricity.eg_realpower_1min 
            where dataid in ({})""".format(dataids_str)
query2 = query2 + """ and localminute >= '2018-01-01' and localminute < '2019-01-01' group by 1"""

df2 = pd.read_sql_query(sqla.text(query2), engine)

In [None]:
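# Select homes with at least 99% data availability for year 2018.
# 525600 = 365 days * 24 hours * 60 minutes, the number of 1-minute records in a full non-leap year.
df2['perc'] = (df2['total_rec']/525600)*100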
final_dataids = df2[df2['perc'] >= 99]
final_dataids['dataid'].count()
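The denominator 525600 is the number of minutes in 2018. As a quick check, the sketch below derives that count from calendar dates instead of hard-coding it. The helper name `minutes_in_year` is hypothetical, and it uses naive datetimes, so daylight-saving transitions are ignored.

```python
from datetime import datetime


def minutes_in_year(year):
    """Count the minutes between January 1 of `year` and January 1 of the next year."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    return int((end - start).total_seconds() // 60)


expected_records_2018 = minutes_in_year(2018)
print(expected_records_2018)
```

Computing the count this way would also cover a leap year, which has 1440 more minutes.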

In [None]:
# assemble list of selected homes
final_dataids_list = final_dataids['dataid'].tolist()
print("{} dataids selected, listed here:".format(len(final_dataids_list)))
final_dataids_str = ','.join(map(str, final_dataids_list))
final_dataids_list

In [None]:
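# now pull the 1-minute data for the selected homes, from 2018-03-01 up to (not including) 2018-06-01
# note: dataid is not in the select list, so the export does not identify which home each row came from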
data_pull = """select localminute::timestamp,car1,grid,solar 
               from electricity.eg_realpower_1min 
               where localminute >= '2018-03-01' and localminute <  '2018-06-01' """
data_pull = data_pull + """AND dataid in ({})""".format(final_dataids_str)

data_df = pd.read_sql_query(sqla.text(data_pull), engine)

In [None]:
data_df

In [None]:
# export the data to a zip-compressed csv file
# archive_name is the name of the csv file stored inside the zip archive
compression_opts = dict(method='zip',
                        archive_name='ev_overall_household_demand.csv')
data_df.to_csv('ev_overall_household_demand.zip', index=False,
               compression=compression_opts)
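In the export, `archive_name` is the name of the file stored inside the zip, which is separate from the name of the zip itself. The sketch below shows that relationship with the standard library only. It builds a small archive in memory, then lists its members and reads the header row back. The single data row is made up, and the `.csv` member name is an assumption chosen for illustration.

```python
import csv
import io
import zipfile

# Write a tiny CSV into a string buffer (the data row is made up).
text = io.StringIO()
writer = csv.writer(text)
writer.writerow(["localminute", "car1", "grid", "solar"])
writer.writerow(["2018-03-01 00:00:00", "0.0", "1.2", "0.0"])

# Store it as one member of an in-memory zip archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("ev_overall_household_demand.csv", text.getvalue())

# Reopen the archive, list its members, and read the header row back.
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    with archive.open(member_names[0]) as member:
        reader = csv.reader(io.TextIOWrapper(member, encoding="utf-8", newline=""))
        header = next(reader)

print(member_names)
print(header)
```

Running the same inspection on the exported file, by passing its path to `zipfile.ZipFile`, would show which member name the export actually produced.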