## Install PUDL
* Until we get our custom Docker image built, PUDL needs to be installed in your user environment each session.
* If you are using this notebook on the Catalyst JupyterHub, and this is the first notebook you've used this session, then uncomment the commands in the following cell and run it before anything else.

In [1]:
#!conda install --yes --quiet python-snappy
#!pip install --upgrade pip
#!pip install --quiet git+https://github.com/catalyst-cooperative/pudl.git@dev
#!cp ~/shared/shared-pudl.yml ~/.pudl.yml

In [2]:
# import the necessary packages
%load_ext autoreload
%autoreload 2

import logging
import random
import sys

import pandas as pd
import sqlalchemy as sa

import pudl

In [3]:
# setup for python logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

## Set your EIA API Key
Some of the routines in this notebook pull data from the EIA API to fill in missing fuel cost values. For them to work, you'll need to provide an API key. You can [obtain one from EIA here](https://www.eia.gov/opendata/register.php). Then uncomment the code in the next cell and use it to set the `API_KEY_EIA` environment variable to your key. (If you're running this notebook on your own computer and want to set the environment variable outside of the notebook, [see this blog post](https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html).)

In [4]:
# Set the EIA API key. To set it in this notebook, replace the placeholder below with your key and uncomment the line.
# %env API_KEY_EIA=put_your_key_here
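
Before running the slower fuel cost routines, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch that only inspects the environment; the helper name `eia_api_key_is_set` is hypothetical and not part of PUDL, and it does not check that the key is valid.

```python
import os


def eia_api_key_is_set(environ=os.environ):
    """Return True if API_KEY_EIA is present and not blank (hypothetical helper)."""
    return bool(environ.get("API_KEY_EIA", "").strip())


if not eia_api_key_is_set():
    print("API_KEY_EIA is not set; filling fuel costs from the EIA API will not work.")
```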

# Using the PUDL output layer
The PUDL database tables are a clean, [normalized](https://en.wikipedia.org/wiki/Database_normalization) version of US electricity data. Normalized tables are great for databases and storage, but for interactive use, we often want a version of the data that includes plant and utility names and other associated info all in a single dataframe. These are "denormalized" tables. In addition to the referenced names and attributes like latitude and longitude or state, the denormalized tables often contain frequently calculated derived values (like calculating `total_fuel_cost` from `total_heat_content_mmbtu` and `fuel_cost_per_mmbtu`). To make these denormalized tables easy to access, the Catalyst team developed a tool that we call the PUDL output object.
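
To make the idea concrete, here is a small sketch of denormalization in plain pandas. It is an illustration only, not the code PUDL runs: the two toy tables are assumptions, and the derived column assumes that `total_fuel_cost` is the product of `total_heat_content_mmbtu` and `fuel_cost_per_mmbtu`.

```python
import pandas as pd

# Toy normalized tables: names live in one table, fuel records in another.
plants = pd.DataFrame({"plant_id_eia": [3], "plant_name_eia": ["Barry"]})
fuel = pd.DataFrame({
    "plant_id_eia": [3, 3],
    "report_date": pd.to_datetime(["2009-01-01", "2009-02-01"]),
    "fuel_cost_per_mmbtu": [4.52619, 4.096987],
    "total_heat_content_mmbtu": [8900765.245, 9456604.085],
})

# Denormalize: attach the plant name to every fuel record.
denorm = fuel.merge(plants, on="plant_id_eia", how="left")

# Add a derived value (assumed formula: heat content times cost per MMBTU).
denorm["total_fuel_cost"] = (
    denorm["total_heat_content_mmbtu"] * denorm["fuel_cost_per_mmbtu"]
)
```

The output object does this kind of joining for you, so the same merge does not have to be repeated in every analysis.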

## What does the output layer provide?

Right now the output layer provides access to three different kinds of things:
 * denormalized tables
 * analytical outputs
 * partially integrated PUDL datasets that aren't in the database yet

## Why is the output layer useful?
Some benefits of using the output layer:
 * **Standardized denormalization:** You don't have to manually join the same tables together to get access to common attributes.
 * **Table caching:** Many analyses rely on using the same table multiple times. The PUDL output object caches the tables in memory as pandas dataframes so you don't have to read tables from the database over and over again.
 * **Time series aggregation:** Some tables are annual, some monthly, some hourly. When you create a PUDL output object you can tell it to aggregate the data to either monthly or annual resolution for analysis.
 * **Standardized filling-in of missing data:** There's a ton of missing or incomplete data. If requested, the output objects will use rolling averages and data from the EIA API to try to fill in some of that missing data.

# Instantiating Output Objects
* Grab the `pudl_settings`
* Create a connection to the PUDL Database
* Instantiate a `PudlTabl` object with that connection

In [5]:
pudl_settings = pudl.workspace.setup.get_defaults()
pudl_settings

{'pudl_in': '/home/zane/code/catalyst/pudl-work',
 'data_dir': '/home/zane/code/catalyst/pudl-work/data',
 'settings_dir': '/home/zane/code/catalyst/pudl-work/settings',
 'pudl_out': '/home/zane/code/catalyst/pudl-work',
 'sqlite_dir': '/home/zane/code/catalyst/pudl-work/sqlite',
 'parquet_dir': '/home/zane/code/catalyst/pudl-work/parquet',
 'datapkg_dir': '/home/zane/code/catalyst/pudl-work/datapkg',
 'notebook_dir': '/home/zane/code/catalyst/pudl-work/notebook',
 'ferc1_db': 'sqlite:////home/zane/code/catalyst/pudl-work/sqlite/ferc1.sqlite',
 'pudl_db': 'sqlite:////home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite'}

In [6]:
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
pudl_engine

Engine(sqlite:////home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite)

In [7]:
# this configuration will return tables without aggregating by a time frequency... we'll explore that more below.
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

## List the output object methods
* There are dozens of different data access methods within the `PudlTabl` object. If you want to see all of them with their docstrings, you can un-comment and run `help(pudl_out)` in the next cell.
* If you type `pudl_out.` and press `Tab`, you'll see a list of available methods as well.

In [8]:
#help(pudl_out)

This cell prints a simple list of all the public methods available on the `pudl_out` object:

In [9]:
# this is the master list of all of the methods in the pudl_out object
# they all return a table corresponding to their name
methods_pudl_out = [
    method_name for method_name in dir(pudl_out)
    if callable(getattr(pudl_out, method_name))    # if it is a method
    and '__' not in method_name                    # remove the internal methods
]
methods_pudl_out

['adjacency_ba_ferc714',
 'advanced_metering_infrastructure_eia861',
 'balancing_authority_assn_eia861',
 'balancing_authority_eia861',
 'bf_eia923',
 'bga',
 'bga_eia860',
 'capacity_factor',
 'demand_forecast_pa_ferc714',
 'demand_hourly_pa_ferc714',
 'demand_monthly_ba_ferc714',
 'demand_response_eia861',
 'demand_side_management_eia861',
 'description_pa_ferc714',
 'distributed_generation_eia861',
 'distribution_systems_eia861',
 'dynamic_pricing_eia861',
 'energy_efficiency_eia861',
 'etl_eia861',
 'etl_ferc714',
 'fbp_ferc1',
 'frc_eia923',
 'fuel_cost',
 'fuel_ferc1',
 'gen_allocated_eia923',
 'gen_eia923',
 'gen_original_eia923',
 'gen_plants_ba_ferc714',
 'gens_eia860',
 'gf_eia923',
 'green_pricing_eia861',
 'hr_by_gen',
 'hr_by_unit',
 'id_certification_ferc714',
 'interchange_ba_ferc714',
 'lambda_description_ferc714',
 'lambda_hourly_ba_ferc714',
 'mcoe',
 'mergers_eia861',
 'net_energy_load_ba_ferc714',
 'net_metering_eia861',
 'non_net_metering_eia861',
 'operational_dat

## Basic Functionality

### Read a denormalized table
* Each of the output object methods returns a pandas dataframe.
* Most of them correspond to a single database table: they select all the data in that table and automatically join it with some other useful information.
* Many of the access methods use an abbreviated name for the database table. For example, the following reads all the data out of the `generators_eia860` table.

In [10]:
%%time
gens_eia860 = pudl_out.gens_eia860()
gens_eia860.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 271689 entries, 271688 to 0
Data columns (total 98 columns):
 #   Column                                   Non-Null Count   Dtype         
---  ------                                   --------------   -----         
 0   report_date                              271689 non-null  datetime64[ns]
 1   plant_id_eia                             271689 non-null  Int64         
 2   plant_id_pudl                            270224 non-null  Int64         
 3   plant_name_eia                           271686 non-null  object        
 4   utility_id_eia                           269535 non-null  Int64         
 5   utility_id_pudl                          266895 non-null  Int64         
 6   utility_name_eia                         270224 non-null  object        
 7   generator_id                             271689 non-null  object        
 8   associated_combined_heat_power           269341 non-null  object        
 9   balancing_authority_code_e

### Automatic dataframe caching
The `generators_eia860` table is quite long, and the above cell probably took several seconds to read roughly 270,000 records, each with about 100 columns, creating an 800 MB dataframe. If you run the same output routine again, it will complete almost instantly because that dataframe is already stored inside `pudl_out`:

In [11]:
%%time
gens_again_eia860 = pudl_out.gens_eia860()

CPU times: user 14 µs, sys: 2 µs, total: 16 µs
Wall time: 19.6 µs
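
The caching behavior can be pictured with a small sketch. This is an assumption about the general pattern, not the actual `PudlTabl` implementation: the class `CachedTables` and its attributes are hypothetical, and the loader returns a plain list instead of a dataframe to keep the example dependency-free.

```python
class CachedTables:
    """Hypothetical stand-in for an output object that caches tables by name."""

    def __init__(self, loaders):
        self._loaders = loaders  # table name -> zero-argument function that reads it
        self._cache = {}         # table name -> result of the first read
        self.reads = 0           # how many times a loader actually ran

    def get(self, name):
        # Only hit the (slow) loader the first time a table is requested.
        if name not in self._cache:
            self.reads += 1
            self._cache[name] = self._loaders[name]()
        return self._cache[name]


tables = CachedTables(
    {"gens_eia860": lambda: [{"plant_id_eia": 3, "generator_id": "1"}]}
)
first = tables.get("gens_eia860")   # runs the loader
second = tables.get("gens_eia860")  # served from the cache
```

In this sketch both calls return the very same object. If the real output object also hands back the stored dataframe rather than a copy (worth checking before relying on it), modifying the result in place would change what later calls see, so taking a `.copy()` before editing is the safer habit.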


## Exploring pudl_out Arguments
Below, we'll explore the main arguments that are used to customize the PUDL output object. You can mix and match these options.

By default, the output object will read data from all available years, do no time aggregation, and not attempt to fill in missing values.

In [12]:
# here are the default arguments for the pudl_out object
pudl_out = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq=None,               # Desired time grouping to aggregate PUDL tables to.
    start_date=None,         # Beginning date for data to pull from the PUDL DB.
    end_date=None,           # End date for data to pull from the PUDL DB.
    fill_fuel_cost=False,    # Whether to fill in missing fuel costs with EIA monthly state-level averages.
    roll_fuel_cost=False,    # Whether to fill in monthly missing fuel costs with a 12-month rolling average.
    fill_net_gen=False,      # Whether to fill in missing net_generation_mwh by generator based on plant-level generation data.
)

### Time series aggregation
The PUDL output object can aggregate data on a monthly or annual basis, if you set the `freq` argument to `AS` (annual starting at the beginning of the calendar year) or `MS` (monthly starting at the beginning of the month) or [other equivalent frequency abbreviations](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).

**NOTE:** Not all columns can be aggregated, so you may lose access to some kinds of information in aggregated outputs. If you need to retain information that gets lost in the default aggregation / groupby process, you may need to pull the unaggregated data and do your own aggregation.
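
If you do need your own aggregation, a plain pandas groupby is usually enough. The sketch below is an illustration with toy values, not the code the output object runs. It uses the `YS` (year start) alias, which newer pandas releases appear to favor over `AS`; the grouping columns are an assumption about what you want to keep.

```python
import pandas as pd

monthly = pd.DataFrame({
    "report_date": pd.to_datetime(["2009-01-01", "2009-02-01", "2010-01-01"]),
    "plant_id_eia": [3, 3, 3],
    "generator_id": ["1", "1", "1"],
    "net_generation_mwh": [39699.0, 5594.0, 100.0],
})

# Group by generator and by calendar year, then sum the generation.
keys = ["plant_id_eia", "generator_id", pd.Grouper(key="report_date", freq="YS")]
annual = monthly.groupby(keys)["net_generation_mwh"].sum().reset_index()
```

Columns that are not listed in `keys` and are not aggregated simply drop out of the result, which is the same kind of information loss the note above describes.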

In [13]:
pudl_out_as = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='AS',               # Aggregate tables annually
)

In [14]:
gen_eia923_as = pudl_out_as.gen_eia923()
gen_eia923_as.head()

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh
0,2009-01-01,3,32,Barry,195,18,Alabama Power Co,1,221908.0
1,2009-01-01,3,32,Barry,195,18,Alabama Power Co,2,394031.0
2,2009-01-01,3,32,Barry,195,18,Alabama Power Co,3,1286393.0
3,2009-01-01,3,32,Barry,195,18,Alabama Power Co,4,1626547.0
4,2009-01-01,3,32,Barry,195,18,Alabama Power Co,5,4513101.0


In [15]:
pudl_out_ms = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='MS',               # Aggregate tables monthly
)

In [16]:
gen_eia923_ms = pudl_out_ms.gen_eia923()
gen_eia923_ms.head()

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh
0,2009-01-01,3,32,Barry,195,18,Alabama Power Co,1,39699.0
1,2009-02-01,3,32,Barry,195,18,Alabama Power Co,1,5594.0
2,2009-03-01,3,32,Barry,195,18,Alabama Power Co,1,13015.0
3,2009-04-01,3,32,Barry,195,18,Alabama Power Co,1,15858.0
4,2009-05-01,3,32,Barry,195,18,Alabama Power Co,1,68232.0


### Filling in Missing Fuel Costs
 * The original EIA data is often incomplete.
 * Many utilities withhold information about their fuel costs.
 * We have a couple of ways of estimating missing values, if you need complete data.

To fill in missing fuel costs, we can pull monthly state-level average fuel costs from EIA, and we can use rolling averages to fill in short gaps in the data. The output object created in the next cell will attempt to use both of these methods.
* Set `fill_fuel_cost=True` when creating an output object to pull average monthly fuel costs from the EIA API.
* Set `roll_fuel_cost=True` when creating an output object to use a 12-month rolling average based on available data to fill in gaps.
* These options can be used together to fill in as many gaps as possible.
* **NOTE:** You need to have set the `API_KEY_EIA` environment variable to a valid EIA API key for this to work. See instructions at the top of this notebook.
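
As a rough picture of the rolling-average idea, the sketch below fills gaps in a toy monthly series with a trailing 12-month mean of the values that are available. This is an assumption about the general technique; the window alignment and grouping that PUDL actually uses may differ.

```python
import pandas as pd

costs = pd.Series(
    [2.0, 2.2, None, 2.4, None, 2.6],
    index=pd.date_range("2009-01-01", periods=6, freq="MS"),
    name="fuel_cost_per_mmbtu",
)

# Trailing mean over up to 12 months; missing values are skipped inside the window.
rolling_mean = costs.rolling(window=12, min_periods=1).mean()

# Keep reported values and use the rolling mean only where data is missing.
filled = costs.fillna(rolling_mean)
```

Reported values are never overwritten here; only the gaps take the rolling estimate.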

In [17]:
pudl_out_fill = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='MS',               # Aggregate tables monthly
    fill_fuel_cost=True,     # Fill in missing fuel cost records with state-level averages from EIA's API
    roll_fuel_cost=True,     # Fill in missing fuel cost records with a 12-month rolling average.
)

In [18]:
%%time
frc_eia923_filled = pudl_out_fill.frc_eia923()
frc_eia923_filled.head()

filling in fuel cost NaNs EIA APIs monthly state averages
filling in fuel cost NaNs with rolling averages
CPU times: user 3min 4s, sys: 1.62 s, total: 3min 6s
Wall time: 3min 30s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,ash_content_pct,chlorine_content_ppm,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,fuel_qty_units,fuel_type_code_pudl,heat_content_mmbtu_per_unit,mercury_content_ppm,moisture_content_pct,sulfur_content_pct,total_fuel_cost,total_heat_content_mmbtu
0,2009-01-01,3,32,Barry,195,18,Alabama Power Co,10.013475,,False,4.52619,381438.0,coal,23.334763,,,0.938976,40286550.0,8900765.245
1,2009-02-01,3,32,Barry,195,18,Alabama Power Co,9.026785,,False,4.096987,410147.0,coal,23.056621,,,0.822421,38743590.0,9456604.085
2,2009-03-01,3,32,Barry,195,18,Alabama Power Co,6.449671,,False,3.709062,376787.0,coal,22.87676,,,0.487466,31970870.0,8619665.59
3,2009-04-01,3,32,Barry,195,18,Alabama Power Co,7.520152,,False,3.897879,105322.0,coal,23.160458,,,0.523833,9508120.0,2439305.763
4,2009-05-01,3,32,Barry,195,18,Alabama Power Co,6.669016,,False,3.67229,367333.0,coal,22.899997,,,0.621304,30891030.0,8411924.736


Looking at the filled vs. unfilled monthly data in the Fuel Receipts and Costs data from EIA 923, we can see that there are about 190k possible monthly records. Unfilled, we have fuel costs for about 107k of them. With the state level monthly fuel costs and rolling averages, we can get that up to about 116k records. An improvement, but it's not great. Unfortunately this data simply isn't reported publicly.

In [19]:
frc_eia923_ms = pudl_out_ms.frc_eia923()
frc_eia923_ms[["plant_id_eia", "report_date", "fuel_cost_per_mmbtu"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190115 entries, 0 to 190114
Data columns (total 3 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   plant_id_eia         190115 non-null  Int64         
 1   report_date          190115 non-null  datetime64[ns]
 2   fuel_cost_per_mmbtu  106695 non-null  float64       
dtypes: Int64(1), datetime64[ns](1), float64(1)
memory usage: 6.0 MB


In [20]:
frc_eia923_filled[["plant_id_eia", "report_date", "fuel_cost_per_mmbtu"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 190115 entries, 0 to 190114
Data columns (total 3 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   plant_id_eia         190115 non-null  Int64         
 1   report_date          190115 non-null  datetime64[ns]
 2   fuel_cost_per_mmbtu  115786 non-null  float64       
dtypes: Int64(1), datetime64[ns](1), float64(1)
memory usage: 6.0 MB
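
The improvement can be quantified directly from the two `info()` summaries above. The counts below are copied from that output; the variable names are only for illustration.

```python
total_records = 190115      # rows in the monthly fuel receipts and costs table
unfilled_non_null = 106695  # fuel_cost_per_mmbtu values before filling
filled_non_null = 115786    # fuel_cost_per_mmbtu values after filling

unfilled_share = unfilled_non_null / total_records
filled_share = filled_non_null / total_records
records_gained = filled_non_null - unfilled_non_null

print(
    f"unfilled: {unfilled_share:.1%}, "
    f"filled: {filled_share:.1%}, "
    f"gained: {records_gained}"
)
```

Coverage rises from about 56% to about 61% of records, a gain of 9,091 records.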


# Denormalized Output Tables
Below, we'll extract and show a sample of each of the denormalized PUDL output tables.

In [21]:
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

## EIA Forms 860 & 923

In [22]:
# here are all of the EIA tables
tables_eia = [
    t for t in methods_pudl_out 
    if '_eia' in t 
    and '_eia861' not in t       # skip the EIA 861 tables for now because they are preliminary
]
tables_eia

['bf_eia923',
 'bga_eia860',
 'frc_eia923',
 'gen_allocated_eia923',
 'gen_eia923',
 'gen_original_eia923',
 'gens_eia860',
 'gf_eia923',
 'own_eia860',
 'plants_eia860',
 'pu_eia860',
 'utils_eia860']

### EIA Plant Utility Associations

In [23]:
pu_assn_eia = pudl_out.pu_eia860()
pu_assn_eia.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_name_eia,plant_id_pudl,utility_id_eia,utility_name_eia,utility_id_pudl
71281,2018-01-01,57837,Alta Wind IX,6043,59869,"Pinyon Pines Wind II, LLC",2856
69806,2012-01-01,57665,SAF Hydroelectric LLC,5895,56989,SAF Hydroelectric LLC,3072
5296,2019-01-01,724,Terrora,917,7140,Georgia Power Co,123
79354,2013-01-01,58941,Icebreaker Offshore Wind Farm,10533,58804,Lake Erie Energy Development Corp,5085


### EIA 860 Boiler Generator Associations
* **NOTE:** We have filled in many more boiler-generator associations based on additional information. The `bga_source` column indicates where the association came from.

In [24]:
bga_eia860 = pudl_out.bga_eia860()
bga_eia860.sample(4)

Unnamed: 0,plant_id_eia,report_date,generator_id,boiler_id,unit_id_eia,unit_id_pudl,bga_source
82457,55221,2017-01-01,G8,G1,OSW1,1,eia860_org
94441,55841,2014-01-01,ST1,A01,PB1,1,eia860_org
19603,3456,2016-01-01,5CA1,5CT2,5CC,5,eia860_org
33013,10213,2017-01-01,GEN2,GEN1,CC1,1,string_assn


### EIA 860 Plants

In [25]:
plants_eia860 = pudl_out.plants_eia860()
plants_eia860.sample(4)

Unnamed: 0,plant_id_eia,plant_name_eia,balancing_authority_code_eia,balancing_authority_name_eia,city,county,ferc_cogen_status,ferc_exempt_wholesale_generator,ferc_small_power_producer,grid_voltage_kv,...,pipeline_notes,regulatory_status_code,transmission_distribution_owner_id,transmission_distribution_owner_name,transmission_distribution_owner_state,utility_id_eia,water_source,plant_id_pudl,utility_name_eia,utility_id_pudl
42888,50987,Rock Creek I,IPCO,Idaho Power Company,Twin Falls,Twin Falls,False,False,False,14.4,...,,NR,9191,Idaho Power Co,ID,17150,Rock Creek,3885,Shorock Hydro Inc,3158
56814,56188,Pinelawn Power LLC,NYIS,New York Independent System Operator,West Babylon,,False,True,False,69.0,...,,NR,11171,Long Island Power Authority,NY,49837,,4845,Pinelawn Power LLC,2850
27205,7158,Woodsdale,PJM,"PJM Interconnection, LLC",Trenton,Butler,False,False,False,345.0,...,,RE,3542,Duke Energy Ohio Inc,OH,55729,,651,Duke Energy Kentucky Inc,93
78112,58613,ABBK Biomass Plant,SWPP,Southwest Power Pool,Hugoton,Stevens,False,False,True,69.0,...,,NR,15073,"Pioneer Electric Coop, Inc - (KS)",KS,58566,Ogallala Aquifer,6696,Abengoa Bioenergy Biomass of Kansas,381


### EIA 860 Generators

In [26]:
gens_eia860 = pudl_out.gens_eia860()
gens_eia860.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,associated_combined_heat_power,balancing_authority_code_eia,...,timezone,topping_bottoming_code,turbines_inverters_hydrokinetics,turbines_num,ultrasupercritical_tech,uprate_derate_completed_date,uprate_derate_during_year,winter_capacity_mw,winter_estimated_capability_mw,zip_code
113957,2015-01-01,57783,5997,RE Bruceville 1 LLC,57093,2953,RE Bruceville LLC,BRU1,False,BANC,...,America/Los_Angeles,X,10.0,,,,False,5.0,,95757.0
148187,2014-01-01,8106,3023,Tipton,18947,1224,City of Tipton,3,False,MISO,...,America/Chicago,X,,,,,False,1.2,,52772.0
144094,2014-01-01,54834,4268,Fort Greely Power Plant,19272,1517,U S Army-Fort Greely,EN-2,False,,...,US/Alaska,X,,,,,,1.0,,99731.0
212294,2011-01-01,54856,7756,Riverside Manufacturing,22117,4116,Riverside Manufacturing Co,1753,False,SOCO,...,America/New_York,,,,,,,1.1,,31776.0


### EIA 860 Generator-level Ownership

In [27]:
own_eia860 = pudl_out.own_eia860()
own_eia860.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,owner_utility_id_eia,owner_name,fraction_owned,owner_city,owner_state,owner_street_address,owner_zip_code
20621,2014-01-01,6522,935,Yards Creek,9726,156,Jersey Central Power & Lt Co,3,15477,Public Service Elec & Gas Co,0.5,Newark,NJ,"P O Box 57080 Park Plaza, T9B",7102
33859,2017-01-01,2480,2083,Danskammer Generating Station,58971,1437,Danskammer Energy,3,60115,Mercuria Energy America Inc,1.0,Houston,TX,20E Green Way Plaza,77046
22106,2014-01-01,50788,7990,Lafayette Energy Partners LP,21148,3833,Zapco Energy Tactics Corp,HA2,10519,Milton Hydro,1.0,Milton,NH,Hydro Plant Road,3851
24801,2015-01-01,2167,2001,New Madrid,924,539,"Associated Electric Coop, Inc",1,13470,City of New Madrid - (MO),1.0,New Madrid,MO,560 Mott Street,63869


### EIA 923 Generation and Fuel Consumption

In [28]:
gf_eia923 = pudl_out.gf_eia923()
gf_eia923.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,fuel_consumed_for_electricity_mmbtu,fuel_consumed_for_electricity_units,fuel_consumed_mmbtu,fuel_consumed_units,fuel_mmbtu_per_unit,fuel_type,fuel_type_code_aer,fuel_type_code_pudl,net_generation_mwh,nuclear_unit_id,prime_mover_code
510933,2013-10-01,2574,2120,High Dam,14240,2764,Oswego City of,36368.0,0.0,36368.0,0.0,0.0,WAT,HYC,hydro,3811.742,,HY
130836,2010-01-01,1004,173,Edwardsport,15470,92,"Duke Energy Indiana, LLC",2490.0,433.0,2490.0,433.0,5.75,DFO,DFO,oil,183.003,,ST
1270649,2018-06-01,10017,3072,WestRock-West Point Mill,17465,3753,"Smurfit-Stone Container Enterprises, Inc",0.0,0.0,0.0,0.0,0.0,RFO,RFO,oil,0.0,,ST
1466768,2019-09-01,54956,4319,Deercroft Gas Recovery,40211,352,"Wabash Valley Power Assn, Inc",20670.0,37582.0,20670.0,37582.0,0.55,LFG,MLG,gas,1670.341,,IC


### EIA 923 Fuel Receipts and Costs

In [29]:
frc_eia923 = pudl_out.frc_eia923()
frc_eia923.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,ash_content_pct,chlorine_content_ppm,contract_expiration_date,...,moisture_content_pct,natural_gas_delivery_contract_type_code,natural_gas_transport_code,primary_transportation_mode_code,secondary_transportation_mode_code,state,sulfur_content_pct,supplier_name,total_fuel_cost,total_heat_content_mmbtu
249210,2014-04-01,2098,314,Lake Road,56211,161,Evergy Missouri West,0.0,,2015-03-01,...,,,firm,PL,,,0.0,southern star central pipeline,766986.6,161914.0
347976,2017-04-01,1893,69,Clay Boswell,12647,23,Minnesota Power Inc,4.8,0.0,2018-12-01,...,26.2,,,RR,,MT,0.38,spring creek,1621956.0,815873.24
367150,2017-01-01,7237,18,Angus Anson,13781,224,Northern States Power Co - Minnesota,0.0,,NaT,...,,firm,firm,PL,,,0.0,various (natural gas spot purchases only),35197.97,7402.308
290738,2015-10-01,3948,380,Mitchell,22053,162,Kentucky Power Co,10.88,,2015-12-01,...,,,,RV,,WV,0.91,koch,975153.7,371204.316


### EIA 923 Boiler Fuel Consumption

In [30]:
bf_eia923 = pudl_out.bf_eia923()
bf_eia923.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,boiler_id,ash_content_pct,fuel_consumed_units,fuel_mmbtu_per_unit,fuel_type_code,fuel_type_code_pudl,sulfur_content_pct,total_heat_content_mmbtu
353656,2013-05-01,2720,77,Buck,5416,90,"Duke Energy Carolinas, LLC",8,0.0,0.0,0.0,BIT,coal,0.0,0.0
703517,2016-06-01,2527,2090,GMMGreenidge LLC,25,1934,GMM Holdings 1 LLC,6,0.0,0.0,0.0,BIT,coal,0.0,0.0
535932,2014-01-01,54562,4142,Longview Fibre,11169,2295,Longview Fibre Co,PB13,,,,OBL,waste,,
819026,2017-03-01,2876,305,Kyger Creek,14015,236,Ohio Valley Electric Corp,5,8.9,57256.0,25.114,BIT,coal,4.26,1437927.184


### EIA 923 Net Generation by Generator

In [31]:
gen_eia923 = pudl_out.gen_eia923()
gen_eia923.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh
55622,2010-03-01,10328,3204,T B Simon Power Plant,12436,2433,Michigan State University,GEN3,0.0
408868,2018-05-01,50629,3734,Covanta Lake County Energy,4482,1377,Covanta Lake Inc,GEN1,7568.0
30520,2009-05-01,54104,3999,Ashdown,5262,1500,Domtar Industries Inc,GEN2,25057.0
409452,2018-01-01,50772,3794,Viking Energy of Lincoln,19781,3667,Viking Energy Corp,GEN1,12884.0


## FERC Form 1
* Only a small subset of the 100+ tables that exist in the original FERC Form 1 have been cleaned and included in the PUDL DB.
* For tables not included here, you'll need to access the cloned multi-year FERC 1 DB that we produce. See the first tutorial notebook for more information.

In [32]:
# All of the FERC Form 1 tables end with _ferc1
tables_ferc1 = [
    t for t in methods_pudl_out 
    if '_ferc1' in t 
]
tables_ferc1

['fbp_ferc1',
 'fuel_ferc1',
 'plant_in_service_ferc1',
 'plants_hydro_ferc1',
 'plants_pumped_storage_ferc1',
 'plants_small_ferc1',
 'plants_steam_ferc1',
 'pu_ferc1',
 'purchased_power_ferc1']

### FERC 1 Large Steam Plants
The large steam plants report detailed operating expenses in this table, as well as operational characteristics.

In [33]:
plants_steam_ferc1 = pudl_out.plants_steam_ferc1()
plants_steam_ferc1.sample(4)

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_id_ferc1,plant_name_ferc1,asset_retirement_cost,avg_num_employees,capacity_factor,...,opex_steam,opex_steam_other,opex_structures,opex_transfer,peak_demand_mw,plant_capability_mw,plant_hours_connected_while_generating,plant_type,record_id,water_limited_capacity_mw
3625,2004,85,203,National Grid Generation LLC,428,1103,northport,,136.0,0.517327,...,2052055.0,,2016863.0,,1418.0,,27484.0,steam,f1_steam_2004_12_85_1_5,1432.0
5196,2011,126,234,Ohio Edison Company,1209,4828,perry,-2824827.0,,,...,,,,,,158.0,,nuclear,f1_steam_2011_12_126_0_2,155.0
19246,2015,42,317,The Dayton Power and Light Company,184,1201,f. m. tait,,,1.8e-05,...,,,,,290.0,,628.0,combustion_turbine,f1_steam_2015_12_42_0_2,256.0
10121,1999,56,121,Florida Power & Light Company,204,1147,fort myers,,,0.025339,...,,,95359.0,,636.0,,4708.0,combustion_turbine,f1_steam_1999_12_56_0_4,552.0


### FERC 1 Fuel
Fuel consumption by the large steam plants, broken down by plant and fuel type.

In [34]:
fuel_ferc1 = pudl_out.fuel_ferc1()
fuel_ferc1.sample(5)

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_name_ferc1,fuel_consumed_mmbtu,fuel_consumed_total_cost,fuel_cost_per_mmbtu,fuel_cost_per_unit_burned,fuel_cost_per_unit_delivered,fuel_mmbtu_per_unit,fuel_qty_burned,fuel_type_code_pudl,fuel_unit,record_id
14807,1998,108,204,"Nevada Power Company, d/b/a NV Energy",383,mohave 1 & 2,13917720.0,18489260.0,1.33,28.95,0.0,21.792,638662.0,coal,ton,f1_fuel_1998_12_108_0_10
13695,1996,193,363,Wisconsin Electric Power Company,601,valley-total,60041.47,201584.8,3.377,3.391,3.391,1.01,59447.0,gas,mcf,f1_fuel_1996_12_193_0_3
25715,2007,186,349,VIRGINIA ELECTRIC AND POWER COMPANY,228,gordonsville,3520490.0,28086140.0,12.42,8.305,8.69,1.041,3381835.0,gas,mcf,f1_fuel_2007_12_186_1_7
15972,1995,164,301,Southwestern Electric Power Company,3,pirkey 0008,42704.1,108926.4,2.55,2.64,0.0,1.035,41260.0,gas,mcf,f1_fuel_1995_12_164_1_9
30329,2019,55,91,"Duke Energy Florida, Inc.",595,univ of florida,4620.0,97901.89,21.191,122.838,135.589,5.796738,797.0,oil,bbl,f1_fuel_2019_12_55_2_8


### FERC 1 Fuel by Plant
Wide-form aggregated fuel totals by plant and year, identifying the relative cost and heat content proportions of different fuels, as well as the primary fuel for the plant.

In [35]:
fbp_ferc1 = pudl_out.fbp_ferc1()
fbp_ferc1.sample(4)

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_name_ferc1,coal_fraction_cost,coal_fraction_mmbtu,fuel_cost,fuel_mmbtu,...,nuclear_fraction_cost,nuclear_fraction_mmbtu,oil_fraction_cost,oil_fraction_mmbtu,primary_fuel_by_cost,primary_fuel_by_mmbtu,unknown_fraction_cost,unknown_fraction_mmbtu,waste_fraction_cost,waste_fraction_mmbtu
17812,2007,194,364,Wisconsin Power and Light Company,554,s fond du lac unit 2,0.0,0.0,3986261.0,293696.3,...,0.0,0.0,0.002548,0.006008,gas,gas,0.0,0.0,0.0,0.0
16907,2012,193,363,Wisconsin Electric Power Company,127,concord-total,0.0,0.0,4137240.0,1105453.0,...,0.0,0.0,0.0,0.0,gas,gas,0.0,0.0,0.0,0.0
19427,2016,281,150,Interstate Power and Light Company,179,emery,0.0,0.0,34621760.0,12635310.0,...,0.0,0.0,0.000739,0.000139,gas,gas,0.0,0.0,0.0,0.0
8783,2009,101,191,MONONGAHELA POWER COMPANY,1213,pleasants,1.0,1.0,11995680.0,4513956.0,...,0.0,0.0,0.0,0.0,coal,coal,0.0,0.0,0.0,0.0
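
A wide-form table like this can be approximated from long-form fuel records with a pivot. The sketch below is an illustration on toy data, not the PUDL implementation. It assumes the primary fuel is simply the fuel with the largest share of heat content, which may not match the rule PUDL applies.

```python
import pandas as pd

fuel = pd.DataFrame({
    "plant_id_pudl": [1, 1, 2],
    "report_year": [2007, 2007, 2007],
    "fuel_type_code_pudl": ["gas", "oil", "coal"],
    "fuel_consumed_mmbtu": [900.0, 100.0, 500.0],
})

# One row per plant-year, one column per fuel type.
wide = fuel.pivot_table(
    index=["plant_id_pudl", "report_year"],
    columns="fuel_type_code_pudl",
    values="fuel_consumed_mmbtu",
    aggfunc="sum",
    fill_value=0.0,
)

# Share of each fuel in the plant-year total, plus the largest contributor.
fractions = wide.div(wide.sum(axis=1), axis=0).add_suffix("_fraction_mmbtu")
fractions["primary_fuel_by_mmbtu"] = wide.idxmax(axis=1)
```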


### FERC 1 Plant in Service
An accounting of how much electric plant infrastructure exists in each of the many FERC accounts. This is a very wide-form table.

In [36]:
pis_ferc1 = pudl_out.plant_in_service_ferc1()
pis_ferc1.sample(5)

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,record_id,amount_type,distribution_acct360_land,distribution_acct361_structures,distribution_acct362_station_equip,distribution_acct363_storage_battery_equip,...,transmission_acct352_structures,transmission_acct353_station_equip,transmission_acct354_towers,transmission_acct355_poles,transmission_acct356_overhead_conductors,transmission_acct357_underground_conduit,transmission_acct358_underground_conductors,transmission_acct359_1_asset_retirement,transmission_acct359_roads_trails,transmission_total
15120,2003,145,272,Public Service Company of Colorado,f1_plant_in_srvce_2003_12_145_0,transfers,,43806.0,255345.0,,...,,,,,,,,,,
19676,2019,187,35,Avista Corporation,f1_plant_in_srvce_2019_12_187_0,retirements,291.0,112985.0,690631.0,,...,17218.0,638457.0,,887580.0,315401.0,,,,,1928528.0
12327,1996,123,229,Northwestern Wisconsin Electric Company,f1_plant_in_srvce_1996_12_123_0,starting_balance,72595.0,311849.0,620106.0,,...,,1312623.0,,1222874.0,1289952.0,693.0,26392.0,,,3891224.0
24476,2013,309,230,NSTAR Electric Company,f1_plant_in_srvce_2013_12_309_0,transfers,,,549088.0,,...,,-1893542.0,,,-1016687.0,,2312038.0,372699.0,,-225492.0
21862,2000,224,348,Village of Morrisville Water and Light Department,f1_plant_in_srvce_2000_12_224_0,ending_balance,18810.0,4335.0,441524.0,,...,7449.0,598694.0,,804535.0,746792.0,1326.0,,,36908.0,2311463.0


### FERC 1 Purchased Power
A summary of electricity market transactions between utilities. Sadly the sellers are identified only by their names, and not their FERC Utility (Respondent) ID.

In [37]:
purchased_power_ferc1 = pudl_out.purchased_power_ferc1()
purchased_power_ferc1.sample(5)

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,seller_name,record_id,billing_demand_mw,coincident_peak_demand_mw,delivered_mwh,demand_charges,energy_charges,non_coincident_peak_demand_mw,other_charges,purchase_type,purchased_mwh,received_mwh,tariff,total_settlement
36415,2017,70,140,Idaho Power Company,Rock Creek #1 Joint Venture,f1_purchased_pwr_2017_12_70_7_11,,,0.0,552508.0,517268.0,,0.0,long_unit,12516.0,0.0,-,1069776.0
159891,2017,57,123,Georgia Power Company,"Brookfield Energy Marketing, L.P.",f1_purchased_pwr_2017_12_57_0_10,,,0.0,0.0,92157.0,,0.0,other_service,3521.0,0.0,,92157.0
158879,2007,57,123,Georgia Power Company,CARROLL EMC,f1_purchased_pwr_2007_12_57_1_5,,,0.0,0.0,27415.0,,0.0,long_firm,0.0,0.0,V4 529,27415.0
47250,1998,27,96,"Duke Energy Ohio, Inc.","Vitol Gas & Electric, LLC",f1_purchased_pwr_1998_12_27_12_6,,,0.0,1456080.0,19652675.0,,0.0,other_service,621116.0,0.0,(1),21108755.0
93890,2014,164,301,Southwestern Electric Power Company,"Empire District Power Marketing (3,8)",f1_purchased_pwr_2014_12_164_1_6,,,0.0,0.0,218962.0,,9766.0,other_service,6278.0,0.0,,228728.0


# Analysis Outputs
* The PUDL Database is mainly meant to standardize the structure of data that's been reported in different ways over different years, so that it can all be used together.
* We typically don't include calculated values or big modifications to the original data.
* We're compiling a growing library of stock analyses in the `pudl.analysis` subpackage, which operate on data stored in the database.
* Some of these analytical outputs are built into the output object so that they can take advantage of the dataframe caching, and for convenient access.

## The Marginal Cost of Electricity (MCOE)
* One of our first analysis modules calculates fuel costs, heat rates, and capacity factors on a generator by generator basis.
* The long term goal is for it to provide a comprehensive marginal cost of electricity production (MCOE).
* The integration of operating costs from FERC Form 1 is still a work in progress, and hasn't been added in here yet.

### MCOE Requires Aggregation
* Fuel costs and other data need to be aggregated by month or year to calculate MCOE.
* This means we need an output object that aggregates by month or year.
* Because a single `NA` value can wipe out a whole aggregated category, you'll retain more information with a monthly aggregation than with an annual one.
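To see why a single missing value matters, here is a pure-Python sketch with made-up monthly fuel costs. It assumes the aggregation propagates missing values (as `pandas.Series.sum(skipna=False)` does); the figures and the helper name `total_or_nan` are illustrative and not taken from PUDL.

```python
import math

# Hypothetical monthly fuel costs for one generator; March was never reported.
monthly_fuel_cost = {month: 1000.0 for month in range(1, 13)}
monthly_fuel_cost[3] = math.nan


def total_or_nan(values):
    """Sum values, returning NaN if any value is missing (NaN-propagating sum)."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return math.nan
    return sum(values)


# Annual aggregation: the one missing month makes the whole year unusable.
annual_total = total_or_nan(monthly_fuel_cost.values())

# Monthly aggregation: only March is lost; the other 11 months survive.
usable_months = [m for m, v in monthly_fuel_cost.items() if not math.isnan(v)]

print(math.isnan(annual_total), len(usable_months))
```

With annual aggregation the generator contributes nothing for the year, while monthly aggregation keeps every month that was actually reported.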

In [38]:
pudl_out_monthly = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine,
    freq="MS",
    fill_fuel_cost=True,
    roll_fuel_cost=True,
)

### Heat Rate by Generation Unit (MMBTU/MWh)
* A "Generation Unit" (identified by `unit_id_pudl` here) is a group of "boilers" (where fuel is consumed) and "generators" (where electricity is made) which are connected to each other.
* Because the fuel inputs and electricity outputs are commingled, this is the most granular level at which a direct heat rate calculation can be done.

In [39]:
hr_by_unit = pudl_out_monthly.hr_by_unit()
hr_by_unit.sample(4)

Unnamed: 0,report_date,plant_id_eia,unit_id_pudl,net_generation_mwh,total_heat_content_mmbtu,heat_rate_mmbtu_mwh
188030,2015-07-01,55404,1,271919.0,2393111.846,8.800826
78354,2011-10-01,52151,2,3066.0,867123.7,282.819211
261925,2018-05-01,8102,2,820420.0,8377405.485,10.211118
213844,2016-07-01,2104,4,162445.0,1675322.862,10.31317
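Heat rate is fuel energy in divided by electricity out. The sketch below reproduces the `heat_rate_mmbtu_mwh` value in the first sample row above from its other two columns. The function name `heat_rate` is ours, and the handling of zero generation is an assumption chosen to mirror how float division by zero appears in a pandas column.

```python
import math


def heat_rate(total_heat_content_mmbtu, net_generation_mwh):
    """Return the heat rate in MMBTU/MWh: fuel energy in per unit of electricity out."""
    if net_generation_mwh == 0:
        # Assumption: fuel burned with no net generation is reported as infinity,
        # and zero fuel with zero generation as NaN.
        return math.inf if total_heat_content_mmbtu > 0 else math.nan
    return total_heat_content_mmbtu / net_generation_mwh


# Values from the first row of the sample above (plant_id_eia 55404, unit 1).
hr = heat_rate(2393111.846, 271919.0)
print(round(hr, 6))
```

Very large values, such as the 282.8 MMBTU/MWh in the second sample row, arise when a unit reports substantial fuel consumption but very little net generation in a given month.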


### Heat Rate by Generator (MMBTU/MWh)
* However, we do need per-generator heat rates to estimate per-generator fuel costs.

In [40]:
hr_by_gen = pudl_out_monthly.hr_by_gen()
hr_by_gen.sample(4)

Unnamed: 0,report_date,plant_id_eia,heat_rate_mmbtu_mwh,generator_id,fuel_type_code_pudl,fuel_type_count
84251,2011-12-01,6095,10.513318,1,coal,1
330554,2017-03-01,3797,inf,CW8,gas,2
243533,2015-06-01,7314,8.068276,NA1,gas,1
98892,2011-01-01,55320,0.008325,ST1,gas,1


### Per-generator Fuel Costs
* Calculate per-generator fuel costs based on heat rates and fuel deliveries.

In [41]:
fuel_cost = pudl_out_monthly.fuel_cost()
fuel_cost.sample(4)

filling in fuel cost NaNs EIA APIs monthly state averages
filling in fuel cost NaNs with rolling averages


Unnamed: 0,plant_id_eia,report_date,generator_id,plant_name_eia,plant_id_pudl,utility_id_eia,utility_name_eia,utility_id_pudl,fuel_type_count,fuel_type_code_pudl,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,heat_rate_mmbtu_mwh,fuel_cost_per_mwh
87337,10148,2011-02-01,GEN2,White Pine Electric Power,3126,1951,White Pine Electric Power LLC,3769,1,coal,,,16.62722,
267610,56163,2015-11-01,3,KUCC,4833,49805,Kennecott Utah Copper Corporation,2185,3,coal,,,,
315198,57073,2016-07-01,ST1,Ivanpah 2,5465,57499,NRG Energy Services,2681,1,solar,,,33.173646,
99224,55382,2011-09-01,2STG,KGen Murray I and II LLC,4534,55756,OPC Murray,4460,1,gas,False,,2.4546,
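The column names suggest how these fields relate: a cost per MMBTU multiplied by a heat rate in MMBTU/MWh gives a cost per MWh. The sketch below works through that unit arithmetic with made-up numbers. It is not PUDL's implementation, and `fuel_cost_per_mwh` here is just a local helper. It also shows why rows with a missing `fuel_cost_per_mmbtu`, like those sampled above, end up with a missing `fuel_cost_per_mwh`.

```python
import math


def fuel_cost_per_mwh(fuel_cost_per_mmbtu, heat_rate_mmbtu_mwh):
    """($ / MMBTU) * (MMBTU / MWh) = $ / MWh."""
    return fuel_cost_per_mmbtu * heat_rate_mmbtu_mwh


# Hypothetical gas generator: $2.50 per MMBTU of fuel, 8.0 MMBTU burned per MWh.
cost = fuel_cost_per_mwh(2.50, 8.0)
print(cost)  # 20.0

# A missing fuel price propagates: NaN times anything is NaN.
missing = fuel_cost_per_mwh(math.nan, 16.62722)
print(math.isnan(missing))  # True
```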


### Per-generator Capacity Factor

In [42]:
capacity_factor = pudl_out_monthly.capacity_factor()
capacity_factor.sample(4)

Unnamed: 0,plant_id_eia,report_date,generator_id,net_generation_mwh,capacity_mw,capacity_factor
155022,2876,2013-05-01,5,116419.0,217.3,0.720097
49480,2067,2010-06-01,3,40.0,12.6,0.004409
347310,10822,2017-04-01,GEN1,12044.0,38.0,0.440205
63515,3630,2010-11-01,2,0.0,22.0,0.0
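The sampled values are consistent with capacity factor being net generation divided by the energy the generator would have produced running at full capacity for every hour of the reporting period. Below is a minimal sketch of that calculation for monthly data, using only the standard library. The function name is ours, and it assumes `report_date` marks the first day of the month being reported.

```python
import calendar
import datetime


def monthly_capacity_factor(net_generation_mwh, capacity_mw, report_date):
    """Net generation as a fraction of the maximum possible output for the month."""
    days = calendar.monthrange(report_date.year, report_date.month)[1]
    hours_in_month = days * 24
    return net_generation_mwh / (capacity_mw * hours_in_month)


# First row of the sample above: plant 2876, generator 5, May 2013 (744 hours).
cf = monthly_capacity_factor(116419.0, 217.3, datetime.date(2013, 5, 1))
print(round(cf, 6))
```

This simple version ignores the one-hour shifts caused by daylight saving time, which change the true number of hours in the affected months slightly.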


### Per-generator MCOE
* This function uses the cached dataframes that were generated above to produce a huge table of per-generator statistics.
* If you just called this function alone, all of those other dataframes would be automatically generated, and available within the output object.
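The caching behavior described above can be pictured with a small, hypothetical class. This is not how `PudlTabl` is written; the class, its method bodies, and the `calls` list are all invented to illustrate the pattern of building each table once and reusing it.

```python
class CachedOutputs:
    """Toy stand-in for an output object that builds each table at most once."""

    def __init__(self):
        self._cache = {}
        self.calls = []  # records which tables were actually built

    def _get(self, name, builder):
        # Build the table only if it is not already cached.
        if name not in self._cache:
            self.calls.append(name)
            self._cache[name] = builder()
        return self._cache[name]

    def heat_rate(self):
        return self._get("heat_rate", lambda: {"heat_rate_mmbtu_mwh": 10.0})

    def fuel_cost(self):
        return self._get("fuel_cost", lambda: {"fuel_cost_per_mmbtu": 2.5})

    def mcoe(self):
        # Depends on the other tables; they are built on demand if missing.
        def build():
            hr = self.heat_rate()["heat_rate_mmbtu_mwh"]
            price = self.fuel_cost()["fuel_cost_per_mmbtu"]
            return {"fuel_cost_per_mwh": hr * price}

        return self._get("mcoe", build)


out = CachedOutputs()
out.mcoe()  # builds mcoe, which in turn builds heat_rate and fuel_cost
out.mcoe()  # served from the cache; nothing is rebuilt
print(out.calls)
```

After the first call, the intermediate tables are available from the same object without being recomputed, which is the convenience the bullets above describe.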

In [43]:
mcoe = pudl_out_monthly.mcoe()
mcoe.sample(4)

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,unit_id_pudl,generator_id,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,associated_combined_heat_power,...,total_fuel_cost,total_mmbtu,turbines_inverters_hydrokinetics,turbines_num,ultrasupercritical_tech,uprate_derate_completed_date,uprate_derate_during_year,winter_capacity_mw,winter_estimated_capability_mw,zip_code
114772,2012-05-01,3161,2287.0,2.0,2,Eddystone Generating Station,6035.0,1691.0,Exelon Power,False,...,,,,,,,,311.0,,19022.0
447662,2019-03-01,56309,4929.0,1.0,CT-2,Trigen St.Louis,50130.0,3516.0,Ashley Energy LLC,True,...,,0.0,,,,,False,7.8,,63102.0
469050,2017-06-01,8068,,,ST6A,,,,,,...,,,,,,,,,,
267357,2015-10-01,56079,4786.0,1.0,STEC,STEC-S LLC,15597.0,2996.0,Riceland Foods Inc.,True,...,,,,,,,False,18.0,,72160.0


# Preliminary Output Tables 
* Integrating a new dataset into the PUDL database requires many steps (datastore, extract, transform, load, outputs).
* Sometimes we need to use tables from new datasets as soon as possible for analysis.
* The interim extract and transform steps can be hacked into the output object to run on the fly, prior to DB integration.
* The data extraction and transformation can take a while though -- and it will need to be re-run from scratch every time you create a new output object.
* **WARNING:** None of this data has been fully validated, and the structure is likely to change. Some of it (especially the FERC 714) is still in a pretty raw state.

As of December 2020, we have preliminarily integrated EIA 861 and FERC 714 in this format.

## EIA Form 861
* The interim EIA 861 ETL is set up to automatically run in its entirety as soon as you request any EIA 861 table.
* This should take 2-5 minutes if you already have the raw input data available.
* If raw input data needs to be downloaded [from our Zenodo archives](https://zenodo.org/record/4127029) first (which should happen automatically), it will take longer.

In [44]:
# here are all of the EIA 861 tables
methods_eia861 = [t for t in methods_pudl_out if '_eia861' in t and "etl" not in t]
methods_eia861

['advanced_metering_infrastructure_eia861',
 'balancing_authority_assn_eia861',
 'balancing_authority_eia861',
 'demand_response_eia861',
 'demand_side_management_eia861',
 'distributed_generation_eia861',
 'distribution_systems_eia861',
 'dynamic_pricing_eia861',
 'energy_efficiency_eia861',
 'green_pricing_eia861',
 'mergers_eia861',
 'net_metering_eia861',
 'non_net_metering_eia861',
 'operational_data_eia861',
 'reliability_eia861',
 'sales_eia861',
 'service_territory_eia861',
 'utility_assn_eia861',
 'utility_data_eia861']

### EIA 861 Balancing Authorities

In [45]:
ba_eia861 = pudl_out.balancing_authority_eia861()
ba_eia861.sample(4)

Running the interim EIA 861 ETL process! (~2 minutes)
Extracting eia861 spreadsheet data.


The data has not yet been validated, and the structure may change.


Transforming raw EIA 861 DataFrames for service_territory_eia861 concatenated across all years.
Assigned state FIPS codes for 100.00% of records.
Assigned county FIPS codes for 99.65% of records.
Transforming raw EIA 861 DataFrames for balancing_authority_eia861 concatenated across all years.
Started with 37622 missing BA Codes out of 39086 records (96.25%)
Ended with 12674 missing BA Codes out of 39086 records (32.43%)
Transforming raw EIA 861 DataFrames for sales_eia861 concatenated across all years.
Tidying the EIA 861 Sales table.
Dropped 0 duplicate records from EIA 861 Sales table, out of a total of 336550 records (0.0000% of all records). 
Performing value transformations on EIA 861 Sales table.
Transforming raw EIA 861 DataFrames for advanced_metering_infrastructure_eia861 concatenated across all years.
Tidying the EIA 861 Advanced Metering Infrastructure table.
Transforming raw EIA 861 DataFrames for demand_response_eia861 concatenated across all years.
Dropped 0 duplicate rec


Unnamed: 0,report_date,balancing_authority_id_eia,balancing_authority_code_eia,balancing_authority_name_eia
10285,2004-01-01,8901,,Reliant Energy HL&P
18696,2007-01-01,13337,NPPD,Nebraska Public Power District
15525,2006-01-01,17881,,St Joseph Light & Power Co
10292,2004-01-01,3258,,Central Iowa Power Cooperative


### EIA 861 Advanced Metering Infrastructure

In [46]:
ami_eia861 = pudl_out.advanced_metering_infrastructure_eia861()
ami_eia861.sample(5)

Unnamed: 0,utility_id_eia,state,balancing_authority_code_eia,report_date,entity_type,short_form,utility_name_eia,customer_class,advanced_metering_infrastructure,automated_meter_reading,daily_digital_access_customers,direct_load_control_customers,energy_served_ami_mwh,home_area_network,non_amr_ami
50616,12839,IA,UNK,2014-01-01,,,City of Montezuma - (IA),industrial,,2.0,,,,,0.0
22004,3764,MN,UNK,2011-01-01,,,Clearwater-Polk Elec Coop Inc,transportation,,,,,,,
109671,12087,MT,SWPP,2019-01-01,Cooperative,,McKenzie Electric Coop Inc,industrial,,80.0,,,,,
89391,21101,OH,PJM,2017-01-01,,,Village of Yellow Springs - (OH),industrial,,16.0,,,,,
18210,14006,OH,UNK,2010-01-01,,,Ohio Power Co,commercial,,8024.0,,,,,


### EIA 861 Sales
How much electricity did utilities report selling to different types of customers in each year by state?

In [47]:
sales_eia861 = pudl_out.sales_eia861()
sales_eia861.sample(5)

Unnamed: 0,utility_id_eia,state,report_date,balancing_authority_code_eia,business_model,data_observed,entity_type,service_type,short_form,utility_name_eia,customer_class,customers,sales_mwh,sales_revenue
99998,13839,MA,2005-01-01,UNK,retail,True,Municipal,bundled,,City of Norwood,other,,,
358550,4041,WA,2018-01-01,BPAT,retail,True,Cooperative,bundled,,"Columbia Rural Elec Assn, Inc",other,,,
143067,13037,UT,2007-01-01,UNK,retail,True,Municipal,,,City of Mt Pleasant,residential,1744.0,10211.0,1016000.0
380835,4437,TN,2019-01-01,TVA,retail,True,Municipal,bundled,,City of Covington - (TN),residential,3700.0,48534.0,4891000.0
90602,3844,SC,2005-01-01,UNK,retail,True,Cooperative,bundled,,"Coastal Electric Coop, Inc",other,,,


### EIA 861 Service Territories
Which counties (with FIPS codes) each utility reported serving in each year.

In [48]:
st_eia861 = pudl_out.service_territory_eia861()
st_eia861.sample(5)

Unnamed: 0,county,short_form,state,utility_id_eia,utility_name_eia,report_date,state_id_fips,county_id_fips
183639,Roger Mills,,OK,14063,Oklahoma Gas & Electric Co,2016-01-01,40,40129
86363,Horry,,SC,8786,Horry Electric Coop Inc,2008-01-01,45,45051
205281,Chambers,,AL,10570,City of Lafayette - (AL),2018-01-01,1,1017
24452,Catron,,NM,23326,"Sierra Electric Coop, Inc",2003-01-01,35,35003
11253,Warren,,GA,7140,Georgia Power Co,2001-01-01,13,13301
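FIPS codes are fixed-width identifiers: two digits for a state and five for a county (the state code followed by a three-digit county code). If the codes have been read as integers, as in the rendering above where Chambers County, AL appears as `1017`, the leading zero can be restored by zero-padding. The helper names below are ours, not part of PUDL.

```python
def format_state_fips(code):
    """Zero-pad a state FIPS code to its standard 2-character form."""
    return f"{int(code):02d}"


def format_county_fips(code):
    """Zero-pad a county FIPS code to its standard 5-character form."""
    return f"{int(code):05d}"


# Chambers County, Alabama, from the sample above.
print(format_state_fips(1), format_county_fips(1017))  # 01 01017
```

Keeping the codes as zero-padded strings avoids mismatches when joining against other datasets keyed on FIPS codes.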


## FERC Form 714
* **NOTE:** Most of the FERC Form 714 tables have not yet been fully processed.
* We have primarily been focused on the historical hourly demand reported by planning areas.
* The hourly demand by planning area contains ~10 million rows and so it takes a while to process.
* As with the EIA 861, the full interim ETL will be run as soon as you ask for any FERC 714 table.
* Also as with the EIA 861, if you don't have the [raw FERC 714 input files](https://zenodo.org/record/4127101) cached locally already, they might take a minute to download.
* Currently this whole process takes 10-15 minutes so... you might want to go get a snack.

In [49]:
# here are all of the FERC 714 tables
methods_ferc714 = [t for t in methods_pudl_out if '_ferc714' in t and "etl" not in t]
methods_ferc714

['adjacency_ba_ferc714',
 'demand_forecast_pa_ferc714',
 'demand_hourly_pa_ferc714',
 'demand_monthly_ba_ferc714',
 'description_pa_ferc714',
 'gen_plants_ba_ferc714',
 'id_certification_ferc714',
 'interchange_ba_ferc714',
 'lambda_description_ferc714',
 'lambda_hourly_ba_ferc714',
 'net_energy_load_ba_ferc714',
 'respondent_id_ferc714']

### FERC 714 Respondents

In [50]:
%%time
respondent_id_ferc714 = pudl_out.respondent_id_ferc714()
respondent_id_ferc714.sample(5)

Running the interim FERC 714 ETL process! (~11 minutes)
Extracting respondent_id_ferc714 from CSV into pandas DataFrame.


The data has not yet been validated, and the structure may change.


Extracting id_certification_ferc714 from CSV into pandas DataFrame.
Extracting gen_plants_ba_ferc714 from CSV into pandas DataFrame.
Extracting demand_monthly_ba_ferc714 from CSV into pandas DataFrame.
Extracting net_energy_load_ba_ferc714 from CSV into pandas DataFrame.
Extracting adjacency_ba_ferc714 from CSV into pandas DataFrame.
Extracting interchange_ba_ferc714 from CSV into pandas DataFrame.
Extracting lambda_hourly_ba_ferc714 from CSV into pandas DataFrame.
Extracting lambda_description_ferc714 from CSV into pandas DataFrame.
Extracting description_pa_ferc714 from CSV into pandas DataFrame.
Extracting demand_forecast_pa_ferc714 from CSV into pandas DataFrame.
Extracting demand_hourly_pa_ferc714 from CSV into pandas DataFrame.
Transforming respondent_id_ferc714.
Transforming id_certification_ferc714.
Transforming gen_plants_ba_ferc714.
Transforming demand_monthly_ba_ferc714.
Transforming net_energy_load_ba_ferc714.
Transforming adjacency_ba_ferc714.
Transforming interchange_ba_f

Unnamed: 0,respondent_id_ferc714,respondent_name_ferc714,eia_code
4,102,Alabama Power Company,195
11,109,Ameren CILCO,3252
34,138,City of Lafayette Utilities System,9096
133,242,Reedy Creek Improvement District,54849
82,189,Kansas Gas & Electric (KG&E) a Westar Energy c...,10005


### FERC 714 Hourly Demand by Planning Area

In [51]:
demand_hourly_pa_ferc714 = pudl_out.demand_hourly_pa_ferc714()
demand_hourly_pa_ferc714.sample(20)

Unnamed: 0,respondent_id_ferc714,utc_datetime,timezone,demand_mwh,report_date
7172112,148,2006-03-29 07:00:00,America/Chicago,0.0,2006-01-01
10880334,167,2006-10-21 11:00:00,America/New_York,6995.0,2006-01-01
3706588,271,2019-11-14 09:00:00,America/New_York,960.0,2019-01-01
10098268,173,2010-08-01 11:00:00,America/Chicago,1033.0,2010-01-01
7911753,209,2011-03-18 16:00:00,America/Chicago,1409.0,2011-01-01
7471580,250,2011-05-29 02:00:00,America/New_York,3000.0,2011-01-01
13111664,227,2007-05-14 16:00:00,America/Los_Angeles,11445.0,2007-01-01
9477843,141,2012-06-19 10:00:00,America/Chicago,355.0,2012-01-01
4445235,102,2010-02-21 09:00:00,America/Chicago,6365.0,2010-01-01
9244623,230,2016-11-09 21:00:00,America/New_York,86941.01,2016-01-01


# Future Analyses
The output object bundles many different kinds of tables and analyses. As our library of analyses grows, we plan to break them out into their own reusable classes that access the database directly. Work in progress includes constructing historical service territory geometries for both utilities and balancing authorities, and associating that data with the FERC 714 respondents. Exploring that in detail is beyond the scope of this notebook, but see the `pudl.output.ferc714` and `pudl.analysis.service_territory` modules for examples.

In [52]:
%%time
ferc714_out = pudl.output.ferc714.Respondents(pudl_out)
annualized = ferc714_out.annualize()
categorized = ferc714_out.categorize()
summarized = ferc714_out.summarize_demand()
fipsified = ferc714_out.fipsify()
counties_gdf = ferc714_out.georef_counties()


We've already got the 2010 Census GeoDB.
Extracting the GeoDB into a GeoDataFrame



CPU times: user 18.5 s, sys: 806 ms, total: 19.3 s
Wall time: 48.2 s


