## Playing with PUDL
This notebook is meant to help get you up and running with the PUDL database, so you can play with it!

### Importing external code
We need to import a bunch of outside code to do our work here. Sometimes we import entire packages (like `numpy` and `pandas`), and sometimes we just pull in the few pieces we need from a particular part of a large package (like `declarative_base`).

In [1]:
import sys
import os.path
import numpy as np
import pandas as pd
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.engine.url import URL

### Importing our own code
We also need to tell Python where to look to find our own code. Python searches a list of directories (`sys.path`) when it imports something, but our little project isn't in that list unless we add it -- which is what `sys.path.append()` does. You'll need to change this path to reflect where on your computer the PUDL project folder (which you pull down with `git`) lives. The cell below appends one path per contributor; a path that doesn't exist on your machine is simply ignored.

Once Python knows to look in the `pudl` project folder, it will let you import `pudl` modules just like any other Python module. Here we're pulling in the `ferc1`, `pudl`, `models`, `models_ferc1`, `settings`, and `constants` modules from the `pudl` package (which is a directory inside the `pudl` project directory).

In [2]:
sys.path.append('/Users/christinagosnell/code/pudl')
sys.path.append('/Users/zaneselvans/code/catalyst/pudl')
sys.path.append('/Users/Nannerz/Desktop/working/pudl/')
from pudl import ferc1, pudl, models, models_ferc1, settings, constants
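To see the effect of `sys.path.append()` in isolation, here is a minimal sketch that doesn't depend on PUDL at all. The temporary directory and the module name `demo_pudl_helper` are made up for illustration; the import fails before the directory is on the path and succeeds afterwards.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module. Both the directory and
# the module name (demo_pudl_helper) are hypothetical, for illustration only.
project_dir = tempfile.mkdtemp()
with open(os.path.join(project_dir, "demo_pudl_helper.py"), "w") as f:
    f.write("GREETING = 'hello from outside the default path'\n")

# Before the directory is on sys.path, Python cannot find the module.
try:
    importlib.import_module("demo_pudl_helper")
    found_before = True
except ImportError:
    found_before = False

# After appending the directory, the same import succeeds.
sys.path.append(project_dir)
importlib.invalidate_caches()
demo = importlib.import_module("demo_pudl_helper")

print(found_before)
print(demo.GREETING)
```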

### Automatically reloading a work in progress
Because you're probably going to be editing the Python modules related to PUDL while you're working with this notebook, it's useful to have them get automatically reloaded before every cell is executed -- this means you're always using the freshest version of the module, with all your recent edits.

In [3]:
%load_ext autoreload
%autoreload 1
%aimport pudl.pudl
%aimport pudl.ferc1
%aimport pudl.constants
%aimport pudl.settings
%aimport pudl.models
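Conceptually, this is similar to calling `importlib.reload()` on each listed module yourself before running a cell. The sketch below shows that manual version on a scratch module; the module name `demo_reload_target` and its contents are assumptions made up for the example, and the `autoreload` extension does more than this internally.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical scratch module, written to a temporary directory.
scratch_dir = tempfile.mkdtemp()
module_path = os.path.join(scratch_dir, "demo_reload_target.py")
sys.path.append(scratch_dir)
sys.dont_write_bytecode = True  # keep cached bytecode out of this demo


def write_module(source):
    """Overwrite the scratch module's source file on disk."""
    with open(module_path, "w") as f:
        f.write(source)
    importlib.invalidate_caches()


write_module("VALUE = 1\n")
import demo_reload_target

first = demo_reload_target.VALUE

# Edit the file on disk, as you would in a text editor.
write_module("VALUE = 'edited'\n")
stale = demo_reload_target.VALUE   # the already-imported module is unchanged
importlib.reload(demo_reload_target)
fresh = demo_reload_target.VALUE   # after reloading, the edit is visible

print(first, stale, fresh)
```

Without a reload, the notebook keeps using whatever version of the module it imported first, which is why edits can appear to have no effect.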

### Connecting to our databases
We're working with two different databases right now: the FERC Form 1 database (`ferc1`) and our own PUDL database (`pudl`). For this software to work, you'll need to have the PostgreSQL database server running on your computer, and you'll need to have created empty databases to receive the tables and data we're going to create. On a Mac, the easiest Postgres install to get running is probably Postgres.app. You'll need to fire it up at the command line at least once to create the databases (one called `ferc1` and another called `pudl_sandbox`) and a user named `catalyst` with no password. This information is stored in the `settings` module if you need to look it up.

Here are two shortcuts for connecting to the two databases once they're created:

In [4]:
pudl_engine  = pudl.pudl.db_connect_pudl()
ferc1_engine = pudl.ferc1.db_connect_ferc1()
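If you are curious what a connection to these databases looks like in general, the sketch below builds a PostgreSQL connection URL from the settings described above (user `catalyst`, no password, databases `ferc1` and `pudl_sandbox`). The helper `build_postgres_url`, the host `localhost`, and the port `5432` are assumptions for illustration; this is not necessarily how `db_connect_pudl()` or `db_connect_ferc1()` work internally.

```python
from urllib.parse import quote


def build_postgres_url(database, username="catalyst", password="",
                       host="localhost", port=5432):
    """Return a postgresql:// URL string for the given database.

    Host and port defaults are assumptions (a local server on the
    standard PostgreSQL port), not values taken from the PUDL settings.
    """
    credentials = quote(username, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    return f"postgresql://{credentials}@{host}:{port}/{database}"


ferc1_url = build_postgres_url("ferc1")
pudl_url = build_postgres_url("pudl_sandbox")
print(ferc1_url)  # postgresql://catalyst@localhost:5432/ferc1
print(pudl_url)   # postgresql://catalyst@localhost:5432/pudl_sandbox
```

A URL string in this form can be passed to `sqlalchemy.create_engine()`, which was imported at the top of the notebook.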

### Initializing the FERC Form 1 database
Now that you've got an empty database, let's put some data in it!  This function initializes the database by reading in the FERC Form 1 database structure from `refyear` and data from `years` (which can eventually be a list of years, but that's not working yet...). In order for this to work, you need to have the FERC Form 1 data downloaded into the data directory. There's a script called `get_ferc1.sh` down in `data/ferc/form1/` that will get it for you if you don't have it.

In [5]:
pudl.ferc1.init_db(refyear=2015, years=[2015,], ferc1_tables=pudl.constants.ferc1_default_tables)

Defining new FERC Form 1 DB based on 2015...
Ingesting FERC Form 1 Data from 2015...


### Pulling data out of the database!
Now we're ready to pull some data out of one of the databases, just to show that it works. `pd.read_sql()` takes an SQL query and a database connection, and puts the results of the query into a pandas DataFrame you can play with easily. The query below skips records with an empty `plant_name`.

In [None]:
ferc1_steam = pd.read_sql('''SELECT respondent_id, report_year, plant_name, type_const, plant_kind
                             FROM f1_steam WHERE plant_name <> '' ''', ferc1_engine)

In [None]:
ferc1_steam
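The query can be tried without a PostgreSQL server by running it against an in-memory SQLite table. This is only a sketch: the three rows are invented, and the stand-in `f1_steam` table has just the five columns selected above, not the real FERC Form 1 schema.

```python
import sqlite3

# In-memory stand-in for the f1_steam table; rows are invented examples.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE f1_steam (
    respondent_id INTEGER, report_year INTEGER,
    plant_name TEXT, type_const TEXT, plant_kind TEXT)""")
conn.executemany(
    "INSERT INTO f1_steam VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2015, "Example Plant A", "Outdoor", "Coal"),
        (1, 2015, "", "", ""),  # blank filler row, dropped by the WHERE clause
        (2, 2015, "Example Plant B", "Conventional", "Nuclear"),
    ],
)

# Same SELECT as in the notebook cell above.
rows = conn.execute(
    """SELECT respondent_id, report_year, plant_name, type_const, plant_kind
       FROM f1_steam WHERE plant_name <> ''"""
).fetchall()
for row in rows:
    print(row)
conn.close()
```

Only the two rows with a non-empty `plant_name` come back.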

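### Examining the data we pulled
The `unique()` method lists the distinct values in a column, which shows how many different spellings have been used for the same thing. The `sample()` DataFrame method returns a random sample of records from the DataFrame, which is useful for seeing what kinds of things are in there, without always seeing just the first few records.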

In [None]:
ferc1_steam['plant_kind'].unique()

In [None]:
ferc1_steam['type_const'].unique()

In [None]:
ferc1_steam.sample(14)

In [None]:
ferc1_plant_kind_coal = ['Coal']

ferc1_plant_kind_combustion_turbine = ['Combustion Turbine','GT','Gas Turbine',\
'Gas Turbine # 1','Gas turbine','Gas Turbine (Note 1)',\
'Gas Turbines','Simple Cycle','COMBUSTION TURBINE','COMB.TURB.PEAK.UNITS',\
 'GAS TURBINE','Combustion turbine','Com Turbine Peaking',\
'Gas Turbine Peaking', 'Comb Turb Peaking', 'COMBUSTINE TURBINE',\
'Comb. Turine','Conbustion Turbine','Combustine Turbine',\
'Gas Turbine (Leased)','Combustion Tubine','Gas Turb','Gas Turbine Peaker',\
'GTG/Gas','Simple Cycle Turbine','GAS-TURBINE','Gas Turbine-Simple',\
'Gas Turbine - Note 1','Gas Turbine #1','SIMPLE CYCLE','GasTurbine',\
'CombustionTurbine','Gas Turbine (2)','Comb Turb Peak Units','JET ENGINE']

ferc1_plant_kind_combined_cycle = ['COMBINED CYCLE','Combined Cycle','Combined',\
'GAS TURB. & HEAT REC','Combined cycle','Com. Cyc','Com. Cycle',\
'GAS TURB-COMBINED CY','Combined Cycle CTG','Combined Cycle - 40%',\
'Com Cycle Gas Turb','Combined Cycle Oper','Gas Turb/Comb. Cyc',\
'Combine Cycle','CC','Comb. Cycle','Gas Turb-Combined Cy']

ferc1_plant_kind_nuke = ['Nuclear','NUCLEAR','Nuclear (3)']

ferc1_plant_kind_geothermal = ['Steam - Geothermal']

ferc1_plant_kind_internal_combustion = ['IC','Internal Combustion',\
'Diesel Turbine','Int Combust (Note 1)','Int. Combust (Note1)',\
'INT.COMBUSTINE','Comb. Cyc','Internal Comb','DIESEL','Diesel Engine',\
'INTERNAL COMBUSTION','Int Combust - Note 1','Int. Combust - Note1',\
'Internal Comb Recip','Reciprocating Engine','Comb. Turbine']

ferc1_plant_kind_wind = ['Wind','Wind Energy','Wind Turbine','Wind - Turbine']

ferc1_plant_kind_photovoltaic = ['Solar Photovoltaic','Photovoltaic']

ferc1_plant_kind_solar_thermal = ['Solar Thermal']

# Map each canonical plant kind to the raw spellings found in the data.
ferc1_plant_kind_strings = {
                'coal': ferc1_plant_kind_coal,
                'combustion turbine': ferc1_plant_kind_combustion_turbine,
                'combined cycle': ferc1_plant_kind_combined_cycle,
                'nuclear': ferc1_plant_kind_nuke,
                'geothermal': ferc1_plant_kind_geothermal,
                'internal combustion': ferc1_plant_kind_internal_combustion,
                'wind': ferc1_plant_kind_wind,
                'photovoltaic': ferc1_plant_kind_photovoltaic,
                'solar thermal': ferc1_plant_kind_solar_thermal
}

# Spot-check a few entries; in a notebook only the last expression is displayed.
ferc1_plant_kind_solar_thermal[0]
ferc1_plant_kind_photovoltaic[1]
ferc1_plant_kind_wind[3]
ferc1_plant_kind_internal_combustion[4]
ferc1_plant_kind_geothermal[0]
ferc1_plant_kind_strings['coal']
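One way to put a dictionary like this to work is to invert it, so that each raw spelling points to its canonical category. The sketch below uses shortened stand-ins for the lists defined above; the helper name `invert_category_map`, the `'unknown'` fallback label, and the `'Mystery Fuel'` value are assumptions for illustration.

```python
# Shortened stand-ins for the plant kind lists defined above.
plant_kind_strings = {
    "coal": ["Coal"],
    "nuclear": ["Nuclear", "NUCLEAR", "Nuclear (3)"],
    "wind": ["Wind", "Wind Energy", "Wind Turbine", "Wind - Turbine"],
}


def invert_category_map(category_strings):
    """Build a {raw spelling: category} lookup from {category: [spellings]}.

    Raises ValueError if one spelling is listed under two categories.
    """
    lookup = {}
    for category, raw_values in category_strings.items():
        for raw in raw_values:
            if raw in lookup and lookup[raw] != category:
                raise ValueError(
                    f"{raw!r} is listed under both "
                    f"{lookup[raw]!r} and {category!r}"
                )
            lookup[raw] = category
    return lookup


raw_to_category = invert_category_map(plant_kind_strings)

# Values as they might appear in the plant_kind column.
reported = ["NUCLEAR", "Wind Energy", "Coal", "Mystery Fuel"]
cleaned = [raw_to_category.get(value, "unknown") for value in reported]
print(cleaned)  # ['nuclear', 'wind', 'coal', 'unknown']
```

The duplicate check is worth keeping, because a spelling that ends up in two lists would otherwise be assigned to whichever category happens to come last.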

In [None]:
ferc1_steam['type_const'].unique()

In [None]:
ferc1_construction_type_outdoor = [ 'Outdoor','OUTDOOR BOILER','Full Outdoor',\
'Outdoor Boiler','Outdoor Boilers','Outboilers','Fuel outdoor','FULL OUTDOOR',\
'Outdoors','OUTDOOR','Boiler Outdoor& Full','Boiler Outdoor&Full',\
'Outdoor Boiler& Full','Full -Outdoor','Outdoor Steam','Outdoor boiler',\
'OB','Outdoor Automatic','OUTDOOR REPOWER','FULL OUTDOOR BOILER','FO',\
'Outdoor Boiler & Ful','Full-Outdoor','Fuel Outdoor','Outoor','outdoor',\
'Outdoor  Boiler&Full','Boiler Outdoor &Full','Outdoor Boiler &Full',\
'Boiler Outdoor & Ful','Outdoor-Boiler', 'Outdoor - Boiler','Outdoor Const.',\
'4 Outdoor Boilers','3 Outdoor Boilers','Full outdoor','Full Outdoors',\
'Full Oudoors','Outdoor (Auto Oper)', 'Outside Boiler','Outdoor Boiler&Full',\
'OUTDOOR HRSG','Outdoor HRSG']
ferc1_construction_type_conventional = ['Conventional','CONVENTIONAL',\
'Conventional Boiler','Conv-B','Conventionall','CONVENTION','conventional',\
'Coventional','Conven Full Boiler','C0NVENTIONAL','Conventtional','Convential'
]

ferc1_construction_type_strings = {
                'outdoor': ferc1_construction_type_outdoor,
                'conventional':ferc1_construction_type_conventional
}

ferc1_construction_type_conventional[5]
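Many of the variants in these lists differ only in capitalization or spacing. As a rough sketch of how the lists could be shortened, the snippet below normalizes a handful of the outdoor spellings before comparing them. The helper `normalize_label` is hypothetical and not part of PUDL, and it would not catch genuine misspellings such as 'Outoor'.

```python
import re


def normalize_label(raw):
    """Lowercase, trim, and collapse runs of whitespace to a single space."""
    return re.sub(r"\s+", " ", raw.strip().lower())


# A few of the spellings from the outdoor construction type list above.
outdoor_variants = [
    "Outdoor", "OUTDOOR", "outdoor",
    "Full Outdoor", "FULL OUTDOOR", "Full outdoor",
    "Outdoor  Boiler&Full", "Outdoor Boiler&Full",
]

distinct = sorted({normalize_label(v) for v in outdoor_variants})
print(len(outdoor_variants), "raw spellings ->", len(distinct), "normalized")
print(distinct)  # ['full outdoor', 'outdoor', 'outdoor boiler&full']
```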

In [None]:
pudl.constants.ferc_electric_plant_accounts.to_csv('ferc_accounts')

In [None]:
pudl.pudl.init_db()

In [6]:
pudl.pudl.init_db(ferc1_tables=['f1_fuel', 'f1_steam', 'f1_hydro', 'f1_plant_in_srvce','f1_accumdepr_prvsn'], verbose=True, debug=True) 

Ingesting static PUDL tables...
Sniffing EIA923/FERC1 glue tables...


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  plants_ferc1['plant_name_ferc1'] = plants_ferc1['plant_name_ferc1'].str.strip()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  plants_ferc1['plant_name_ferc1'] = plants_ferc1['plant_name_ferc1'].str.title()


Ingesting f1_fuel from FERC Form 1 into PUDL.
Ingesting f1_steam from FERC Form 1 into PUDL.
Ingesting f1_hydro from FERC Form 1 into PUDL.
Ingesting f1_plant_in_srvce from FERC Form 1 into PUDL.
Ingesting f1_accumdepr_prvsn from FERC Form 1 into PUDL.
