## Playing with PUDL
This notebook is meant to help get you up and running with the PUDL database, so you can play with it!

### Importing external code
We need to import a bunch of outside code to do our work here.  Sometimes we import entire packages (like `numpy` and `pandas`) and sometimes we just pull in a couple of pieces we need from a particular part of a large package (like `declarative_base`).

In [2]:
import sys
import os.path
import numpy as np
import pandas as pd
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.engine.url import URL

### Importing our own code
We also need to tell Python where to look to find our own code.  It has a list of directories that it looks in, but our little project isn't in that list, unless we add it -- which is what `sys.path.append()` does.  You'll need to change this path to reflect where on your computer the PUDL project folder (which you pull down with `git`) lives.

Once Python knows to look in the `pudl` project folder, it will let you import `pudl` modules just like any other Python module.  Here we're pulling in several modules (`ferc1`, `pudl`, `models`, `models_ferc1`, `settings`, and `constants`) from the `pudl` package (which is a directory inside the `pudl` project directory).

In [3]:
sys.path.append('/Users/christinagosnell/code/pudl')
sys.path.append('/Users/zaneselvans/code/catalyst/pudl')
sys.path.append('/Users/Nannerz/Desktop/working/pudl/')
from pudl import ferc1, pudl, models, models_ferc1, settings, constants
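Since the path differs on every machine, it can help to append only directories that actually exist. Here's a minimal sketch of that idea; the helper name `add_project_to_path` and the candidate paths are made up for illustration and are not part of PUDL.

```python
import os.path
import sys


def add_project_to_path(project_dir):
    """Append project_dir to sys.path if it exists and isn't already listed.

    Returns True if the directory is on sys.path afterwards, False if the
    directory doesn't exist on this machine.
    """
    project_dir = os.path.abspath(os.path.expanduser(project_dir))
    if not os.path.isdir(project_dir):
        return False
    if project_dir not in sys.path:
        sys.path.append(project_dir)
    return True


# Hypothetical locations -- replace these with wherever your checkout lives.
candidates = ['~/code/pudl', '~/code/catalyst/pudl']
found = [path for path in candidates if add_project_to_path(path)]
```

If `found` comes back empty, none of the candidate directories exist and the `from pudl import ...` line will fail, so that's the first thing to check.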

### Automatically reloading a work in progress
Because you're probably going to be editing the Python modules related to PUDL while you're working with this notebook, it's useful to have them get automatically reloaded before every cell is executed -- this means you're always using the freshest version of the module, with all your recent edits.

In [4]:
%load_ext autoreload
%autoreload 1
%aimport pudl.pudl
%aimport pudl.ferc1
%aimport pudl.constants
%aimport pudl.settings
%aimport pudl.models

### Connecting to our databases
We have two different databases that we're working with right now: the FERC Form 1 database (`ferc1`) and our own PUDL database (`pudl`). For this software to work, you'll need to have the PostgreSQL database server running on your computer, and you'll need to have created empty databases to receive the tables and data we're going to create.  On a Mac, the easiest Postgres install to get running is probably Postgres.app.  You'll need to fire it up at the command line at least once to create the databases (one called `ferc1` and another called `pudl_sandbox`) and a user named `catalyst` with no password.  This information is stored in the `settings` module if you need to look it up.

Here are two shortcuts for connecting to the two databases once they're created:

In [5]:
pudl_engine  = pudl.pudl.db_connect_pudl()
ferc1_engine = pudl.ferc1.db_connect_ferc1()
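For a feel of what those connection settings boil down to, here's a rough sketch that assembles a database URL string of the form SQLAlchemy's `create_engine()` accepts. The user and database names come from the text above; the host, the port (5432 is the PostgreSQL default), the `db_params` dictionary, and the `make_db_url` helper are assumptions for illustration. The real `db_connect_pudl()` may build its connection differently.

```python
# Assumed connection settings; the real ones live in the `settings` module.
db_params = {
    'drivername': 'postgresql',
    'username': 'catalyst',      # user with no password, as described above
    'host': 'localhost',         # assumption: server runs on this machine
    'port': 5432,                # assumption: PostgreSQL's default port
    'database': 'pudl_sandbox',
}


def make_db_url(params):
    """Build a password-less database URL string from a settings dict."""
    return '{drivername}://{username}@{host}:{port}/{database}'.format(**params)


pudl_url = make_db_url(db_params)
# pudl_url could then be handed to sqlalchemy.create_engine(pudl_url)
```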

### Initializing the FERC Form 1 database
Now that you've got an empty database, let's put some data in it!  This function initializes the database by reading in the FERC Form 1 database structure from `refyear` and data from `years` (the cell below passes a range covering 2004 through 2015). In order for this to work, you need to have the FERC Form 1 data downloaded into the data directory. There's a script called `get_ferc1.sh` down in `data/ferc/form1/` that will get it for you if you don't have it.

In [6]:
pudl.ferc1.init_db(refyear=2015, years=range(2004,2016), ferc1_tables=pudl.constants.ferc1_default_tables)

Defining new FERC Form 1 DB based on 2015...
Ingesting FERC Form 1 Data from 2004...
Ingesting FERC Form 1 Data from 2005...
Ingesting FERC Form 1 Data from 2006...
Ingesting FERC Form 1 Data from 2007...
Ingesting FERC Form 1 Data from 2008...
Ingesting FERC Form 1 Data from 2009...
Ingesting FERC Form 1 Data from 2010...
Ingesting FERC Form 1 Data from 2011...
Ingesting FERC Form 1 Data from 2012...
Ingesting FERC Form 1 Data from 2013...
Ingesting FERC Form 1 Data from 2014...
Ingesting FERC Form 1 Data from 2015...


### Pulling data out of the database!
Now we're ready to pull some data out of one of the databases, just to show that it works. `pd.read_sql()` takes an SQL Query and a database connection, and puts the results of the query into a pandas DataFrame you can play with easily.

In [7]:
ferc1_steam = pd.read_sql('''SELECT respondent_id, report_year, plant_name, type_const, plant_kind \
                                FROM f1_steam WHERE plant_name <> '' ''',ferc1_engine)

In [8]:
ferc1_steam

| | respondent_id | report_year | plant_name | type_const | plant_kind |
|---|---|---|---|---|---|
| 0 | 1 | 2004 | ROCKPORT UNIT 1 AEG | Conventional | Steam |
| 1 | 1 | 2004 | ROCKPORT UNIT 2 AEG | Conventional | Steam |
| 2 | 1 | 2004 | ROCKPORT TOTAL AEG | Conventional | Steam |
| 3 | 1 | 2004 | ROCKPORT TOTAL PLANT | Conventional | Steam |
| 4 | 2 | 2004 | Gorgas | Conventional | Steam |
| 5 | 2 | 2004 | Gadsden | Conventional | Steam |
| 6 | 2 | 2004 | Barry | Conventional | Steam |
| 7 | 2 | 2004 | Chickasaw | Conventional | Steam |
| 8 | 2 | 2004 | E.C. Gaston - Unit 5 | Conventional | Steam |
| 9 | 2 | 2004 | Joseph M. Farley | Conventional | Nuclear |
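If you want to see what the `WHERE plant_name <> ''` filter in that query does without a PostgreSQL server running, here's a small sketch using Python's built-in `sqlite3` module and an in-memory table. The table is a stand-in with the same column names; two rows are copied from the output above and the blank row is made up for illustration.

```python
import sqlite3

# In-memory stand-in for the real f1_steam table (same column names only).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE f1_steam ('
    'respondent_id INTEGER, report_year INTEGER, plant_name TEXT, '
    'type_const TEXT, plant_kind TEXT)'
)
rows = [
    (2, 2004, 'Gorgas', 'Conventional', 'Steam'),
    (2, 2004, 'Joseph M. Farley', 'Conventional', 'Nuclear'),
    (2, 2004, '', '', ''),  # made-up blank record, to be filtered out
]
conn.executemany('INSERT INTO f1_steam VALUES (?, ?, ?, ?, ?)', rows)

# Same shape of query as in the notebook cell: skip records with no name.
query = '''SELECT respondent_id, report_year, plant_name, type_const, plant_kind
           FROM f1_steam WHERE plant_name <> '' '''
named_plants = conn.execute(query).fetchall()
conn.close()
```

`pd.read_sql()` also accepts a `sqlite3` connection, so passing the same query and an open connection to it would give you the filtered rows as a DataFrame instead of a list of tuples.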


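### Examining the data we pulled
The `unique()` method lists the distinct values in a column, which shows how inconsistently the free-form `plant_kind` and `type_const` fields were filled in. The `sample()` DataFrame method returns a random sample of records from the DataFrame, which is useful for seeing what kinds of things are in there, without always seeing just the first few records.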

In [12]:
ferc1_steam['plant_kind'].unique()

array(['Steam', 'Nuclear', 'Combustion Turbine', '', 'Combined Cycle',
       'IC', 'GT', 'STEAM', 'Steam Units 1, 2, 3', 'Steam Units 4, 5',
       'Common', 'Steam Units 4 & 6', 'Comb. Turbine', 'Combined',
       'Gas Turbine', 'Internal Combustion', 'Gas Turbine # 1',
       'Diesel Turbine', 'Gas turbine', 'Int Combust (Note 1)',
       'Gas Turbine (Note 1)', 'Resp. Share (Note 2)',
       'Resp Share St Note 3', 'Resp Share Gas Note3',
       'Int. Combust (Note1)', 'Resp. Share (Note 8)',
       'Resp. Share (Note 9)', 'Resp Share (Note 11)',
       'Resp. Share (Note 4)', 'Resp. Share (Note 6)', 'STeam',
       'Steam/Fossil', 'Gas Turbines', 'Simple Cycle',
       'COMBUSTION TURBINE', 'NUCLEAR', 'Gas / Steam',
       'COMB.TURB.PEAK.UNITS', 'GAS TURBINE', 'Steam and CC', 'Wind',
       'Steam Turbine', 'Steam/Gas Turbine', 'Gas Turbine/Steam',
       'Combustion turbine', 'STEAM & GAS TURBINE', 'GAS TURB. & HEAT REC',
       'Steam - Geothermal', 'Gas & Steam Turbine', 'Com 

In [10]:
ferc1_steam['type_const'].unique()

array(['Conventional', '', 'Outdoor', 'SEMI-OUTDOOR', 'CONVENTIONAL',
       'CONVEN & SEMI-OUTDR', 'OUTDOOR BOILER', 'OUTDR & CONVENTNL',
       'Full Outdoor', 'Automatic Oper.', 'Semi Outdoor', 'Cycle', 'N/A',
       'Outdoor Boiler', '1970', 'Conv. & Full Outdoor',
       '1 Conv/ 2ODboilers', 'Outdoor Boilers', '2 Conv /1 ODboilers',
       'Outboilers', 'Fuel outdoor', 'FULL OUTDOOR', 'Semi - Outdoors',
       'Semi-Outdoor', 'Indoor & Outdoor', 'Outdoors', 'Combined Cycle',
       '(PEAK LOAD) INDOOR', 'OUTDOOR', '0.0000', 'Outdoor Steel Encl.',
       'Comb. Cycle Indoor', 'Boiler Outdoor& Full', 'Boiler Outdoor&Full',
       'Outdoor Boiler& Full', 'CONV. & FULL OUTDOOR', 'Full Indoor',
       'Full -Outdoor', 'Outdoor Steam', 'Portable', 'Wind',
       'Convntl,Outdoor Blr', 'Conventionl, Indoor',
       'Conventional, Indoor', 'Conventional Boiler', 'SEMI - OUTDOOR',
       'U1-CONV./U2-SEMI -OD', 'Conv-OB', 'Conv-B', 'Outdoor boiler',
       'PARTIAL OUTDOOR', 'Pressurized 

array(['Conventional', 'Boiler', 'Wind', 'Wind Turbine', 'Outdoor',
       'Indoor', 'Over 50% Outdoor', 'Under 50% Outdoor', '',
       'Semi-outdoor', '1 indoor Boiler', '2 Oil/ Gas', '2 Indoor Boiler',
       '4 Indr/Outdr Boiler', '2 Indoor Boilers', '4 gas/oil trubines',
       '2 Oil / Gas Turbines', '2 Oil/ 4 Gas/Oil Tur',
       '5 gas/oil turbines', '3 Indoor Boilers', 'Pressurized Water',
       '3 Outdoor Boilers', 'N/A (CT)', '1 Indoor Boiler',
       'Gas / Oil Turbines', '2 Oil / Gas', '2 Oil/Gas turbines',
       '2 on 1 Gas Turbine', '3 on 1 Gas Turbine', 'Fixed Tilt PV',
       'Outdoor Boiler', 'Full Outdoors', 'Full Outdoor',
       'Conv & Outdoor Boilr', 'Conv. Outdoor Boiler',
       'GasTurbine No Boiler', 'No Boiler', 'Heated Individually',
       'Outside Boiler', 'Ind Enclosures', 'Outdoor (Auto Oper)',
       '1 Out Boil 2 Conv', 'Outdoor (Auto oper.)', 'Semi-Outdoor',
       'Full outdoor', 'Indoor and Outdoor', 'Convntl, Outdoor Blr',
       'Conventional, 

In [13]:
ferc1_steam.sample(14)

| | respondent_id | report_year | plant_name | type_const | plant_kind |
|---|---|---|---|---|---|
| 3190 | 134 | 2007 | Camas Co-Gen | Outdoor Boiler | Steam |
| 3955 | 55 | 2007 | Suwannee | Conventional | Steam |
| 5782 | 55 | 2009 | Crystal River North | Conventional | Steam |
| 5434 | 133 | 2009 | Gateway Gen. Stn. | Outdoor | Combined Cycle |
| 5343 | 74 | 2009 | Eagle Valley | Conventional | Steam |
| 5145 | 122 | 2009 | Coyote | Conventional | Steam |
| 1190 | 194 | 2005 | Columbia I (WPL) | Conventional | Steam |
| 1923 | 210 | 2005 | Louisa | Conventional | Steam |
| 6999 | 134 | 2010 | Gadsby Gas Peakers | Outdoor | Gas Turbine |
| 5455 | 108 | 2009 | Navajo 1,2,3 | Conv-B | Steam |


In [None]:
# Lists of the free-form plant_kind strings found in FERC Form 1, grouped by
# the standardized category each one should map to. Misspellings are kept
# exactly as they appear in the data.
ferc1_plant_kind_coal = ['Coal']

ferc1_plant_kind_combustion_turbine = [
    'Combustion Turbine', 'GT', 'Gas Turbine',
    'Gas Turbine # 1', 'Gas turbine', 'Gas Turbine (Note 1)',
    'Gas Turbines', 'Simple Cycle', 'COMBUSTION TURBINE', 'COMB.TURB.PEAK.UNITS',
    'GAS TURBINE', 'Combustion turbine', 'Com Turbine Peaking',
    'Gas Turbine Peaking', 'Comb Turb Peaking', 'COMBUSTINE TURBINE',
    'Comb. Turine', 'Conbustion Turbine', 'Combustine Turbine',
    'Gas Turbine (Leased)', 'Combustion Tubine', 'Gas Turb', 'Gas Turbine Peaker',
    'GTG/Gas', 'Simple Cycle Turbine', 'GAS-TURBINE', 'Gas Turbine-Simple',
    'Gas Turbine - Note 1', 'Gas Turbine #1', 'SIMPLE CYCLE', 'GasTurbine',
    'CombustionTurbine', 'Gas Turbine (2)', 'Comb Turb Peak Units', 'JET ENGINE',
    'Comb. Turbine']

ferc1_plant_kind_combined_cycle = [
    'COMBINED CYCLE', 'Combined Cycle', 'Combined',
    'GAS TURB. & HEAT REC', 'Combined cycle', 'Com. Cyc', 'Com. Cycle',
    'GAS TURB-COMBINED CY', 'Combined Cycle CTG', 'Combined Cycle - 40%',
    'Com Cycle Gas Turb', 'Combined Cycle Oper', 'Gas Turb/Comb. Cyc',
    'Combine Cycle', 'CC', 'Comb. Cycle', 'Gas Turb-Combined Cy', 'Comb. Cyc']

ferc1_plant_kind_nuke = ['Nuclear', 'NUCLEAR', 'Nuclear (3)']

ferc1_plant_kind_geothermal = ['Steam - Geothermal']

ferc1_plant_kind_internal_combustion = [
    'IC', 'Internal Combustion',
    'Diesel Turbine', 'Int Combust (Note 1)', 'Int. Combust (Note1)',
    'INT.COMBUSTINE', 'Internal Comb', 'DIESEL', 'Diesel Engine',
    'INTERNAL COMBUSTION', 'Int Combust - Note 1', 'Int. Combust - Note1',
    'Internal Comb Recip', 'Reciprocating Engine']

ferc1_plant_kind_wind = ['Wind', 'Wind Energy', 'Wind Turbine', 'Wind - Turbine']

ferc1_plant_kind_photovoltaic = ['Solar Photovoltaic', 'Photovoltaic']

ferc1_plant_kind_solar_thermal = ['Solar Thermal']

# Standardized category name -> list of raw strings that belong to it.
ferc1_plant_kind_strings = {
    'coal': ferc1_plant_kind_coal,
    'combustion turbine': ferc1_plant_kind_combustion_turbine,
    'combined cycle': ferc1_plant_kind_combined_cycle,
    'nuclear': ferc1_plant_kind_nuke,
    'geothermal': ferc1_plant_kind_geothermal,
    'internal combustion': ferc1_plant_kind_internal_combustion,
    'wind': ferc1_plant_kind_wind,
    'photovoltaic': ferc1_plant_kind_photovoltaic,
    'solar thermal': ferc1_plant_kind_solar_thermal,
}

# Spot-check a few entries (only the last expression is displayed by the notebook).
ferc1_plant_kind_solar_thermal[0]
ferc1_plant_kind_photovoltaic[1]
ferc1_plant_kind_wind[3]
ferc1_plant_kind_internal_combustion[4]
ferc1_plant_kind_geothermal[0]
ferc1_plant_kind_strings['coal']
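One way a category-to-strings dictionary like the one above might be put to use is to flip it into a lookup from each raw string to its category. The sketch below does that on a small subset so it runs on its own; the names `plant_kind_strings`, `plant_kind_lookup`, and `clean_plant_kind`, and the `'unknown'` fallback, are made up for illustration and aren't part of PUDL.

```python
# Small subset of the category -> raw strings mapping, for illustration only.
plant_kind_strings = {
    'nuclear': ['Nuclear', 'NUCLEAR', 'Nuclear (3)'],
    'wind': ['Wind', 'Wind Energy', 'Wind Turbine', 'Wind - Turbine'],
    'geothermal': ['Steam - Geothermal'],
}

# Invert it: each raw string becomes a key pointing at its category.
plant_kind_lookup = {
    raw: category
    for category, raw_strings in plant_kind_strings.items()
    for raw in raw_strings
}


def clean_plant_kind(raw, default='unknown'):
    """Return the standardized category for a raw string, or the default."""
    return plant_kind_lookup.get(raw, default)


cleaned = [clean_plant_kind(s)
           for s in ['NUCLEAR', 'Wind Energy', 'Resp. Share (Note 2)']]
# cleaned is ['nuclear', 'wind', 'unknown']
```

On the DataFrame itself, `ferc1_steam['plant_kind'].map(plant_kind_lookup)` would apply the same kind of lookup to the whole column, leaving `NaN` wherever a string has no category yet.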