## Configure PUDL
The `.pudl.yml` configuration file tells PUDL where to look for data. Uncomment the next cell and run it if you're on our 2i2c JupyterHub.

In [1]:
#!cp ~/shared/shared-pudl.yml ~/.pudl.yml

In [2]:
# import the necessary packages
%load_ext autoreload
%autoreload 2

import logging
import os
import sys

import pandas as pd
import sqlalchemy as sa
import pudl

In [3]:
# set up Python logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

## Set your EIA API Key
Some of the routines in this notebook pull data from the EIA API to fill in missing fuel cost values. For them to work, you'll need to provide an API key.
* [Obtain an EIA API key here](https://www.eia.gov/opendata/register.php)
* If you put your API key in a shell environment variable named `API_KEY_EIA`, the next cell should work as it is. (If you're using the `docker-compose.yml` in the `pudl-tutorial` repository to run our container yourself, `API_KEY_EIA` will be passed in from your shell to the container automatically.)
* You can also uncomment the first line of the next cell, and use it to set the `API_KEY_EIA` environment variable directly.
* If you're running this notebook on your own computer and want to learn more about setting environment variables outside of the notebook, [see this blog post](https://www.twilio.com/blog/2017/01/how-to-set-environment-variables.html).

In [4]:
# os.environ["API_KEY_EIA"] = "put your API key here"
assert os.environ.get("API_KEY_EIA") is not None
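
A bare `assert` fails without saying what went wrong. As a minimal sketch, a small helper can check the variable and raise a more descriptive error instead. The helper name `require_env` and the demo variable `DEMO_API_KEY` are made up for illustration and are not part of PUDL.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a readable error.

    Hypothetical helper for illustration; not part of PUDL.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Set it in your shell or with os.environ before running this cell."
        )
    return value


# Demo with a throwaway variable so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "not-a-real-key"
demo_key = require_env("DEMO_API_KEY")
```

In this notebook the equivalent call would be `require_env("API_KEY_EIA")`.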

# Using the PUDL output layer
The PUDL database tables are a clean, [normalized](https://en.wikipedia.org/wiki/Database_normalization) version of US electricity data. Normalized tables are great for databases and storage, but for interactive use, we often want a version of the data that includes plant and utility names and other associated info all in a single dataframe. These are "denormalized" tables. In addition to the referenced names and attributes like latitude and longitude or state, the denormalized tables often contain frequently calculated derived values (like calculating `total_fuel_cost` from `total_heat_content_mmbtu` and `fuel_cost_per_mmbtu`). To make these tables easy to access, the Catalyst team developed a tool that we call the PUDL output object.
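
To make the idea of denormalization concrete, here is a small sketch using Python's built-in `sqlite3` module. The two toy tables, their column layout, and the values are invented for illustration and do not reflect the real PUDL schema. Only the three fuel cost column names come from the text above.

```python
import sqlite3

# In-memory database with two toy normalized tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE plants (
        plant_id INTEGER PRIMARY KEY,
        plant_name TEXT,
        state TEXT
    );
    CREATE TABLE fuel_receipts (
        plant_id INTEGER REFERENCES plants(plant_id),
        report_date TEXT,
        total_heat_content_mmbtu REAL,
        fuel_cost_per_mmbtu REAL
    );
    INSERT INTO plants VALUES (3, 'Barry', 'AL');
    INSERT INTO fuel_receipts VALUES (3, '2008-01-01', 1000.0, 2.5);
    INSERT INTO fuel_receipts VALUES (3, '2008-02-01', 2000.0, 2.0);
    """
)

# Denormalize: join the plant attributes onto each receipt and add a
# derived column computed from two stored columns.
denormalized = conn.execute(
    """
    SELECT r.report_date,
           r.plant_id,
           p.plant_name,
           p.state,
           r.total_heat_content_mmbtu * r.fuel_cost_per_mmbtu AS total_fuel_cost
    FROM fuel_receipts AS r
    JOIN plants AS p ON p.plant_id = r.plant_id
    ORDER BY r.report_date
    """
).fetchall()

for row in denormalized:
    print(row)
```

The plant name and state are stored once in `plants`, but appear on every row of the joined result. That repetition is what makes the denormalized form convenient for analysis and wasteful for storage.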

## What does the output layer provide?

Right now the output layer provides access to three different kinds of things:
 * denormalized tables
 * analytical outputs
 * partially integrated PUDL datasets that aren't in the database yet

## Why is the output layer useful?
Some benefits of using the output layer:
 * **Standardized denormalization:** You don't have to manually join the same tables together to get access to common attributes.
 * **Table caching:** many analyses rely on using the same table multiple times. The PUDL output object caches the tables in memory as pandas dataframes so you don't have to read tables from the database over and over again.
 * **Time series aggregation:** Some tables are annual, some monthly, some hourly. When you create a PUDL output object you can tell it to aggregate the data to either monthly or annual resolution for analysis.
 * **Standardized filling-in of missing data:** There's a ton of missing or incomplete data. If requested, the output objects will use rolling averages and data from the EIA API to try to fill some of that missing data in.

# Instantiating Output Objects
* Grab the `pudl_settings`
* Create a connection to the PUDL Database
* Instantiate a `PudlTabl` object with that connection

In [5]:
pudl_settings = pudl.workspace.setup.get_defaults()
pudl_settings

{'pudl_in': '/home/zane/code/catalyst/pudl-work',
 'data_dir': '/home/zane/code/catalyst/pudl-work/data',
 'settings_dir': '/home/zane/code/catalyst/pudl-work/settings',
 'pudl_out': '/home/zane/code/catalyst/pudl-work',
 'sqlite_dir': '/home/zane/code/catalyst/pudl-work/sqlite',
 'parquet_dir': '/home/zane/code/catalyst/pudl-work/parquet',
 'ferc1_db': 'sqlite:////home/zane/code/catalyst/pudl-work/sqlite/ferc1.sqlite',
 'pudl_db': 'sqlite:////home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite',
 'censusdp1tract_db': 'sqlite:////home/zane/code/catalyst/pudl-work/sqlite/censusdp1tract.sqlite'}

In [6]:
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
pudl_engine

Engine(sqlite:////home/zane/code/catalyst/pudl-work/sqlite/pudl.sqlite)

In [7]:
# this configuration will return tables without aggregating by a time frequency... we'll explore that more below.
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

## List the output object methods
* There are dozens of different data access methods within the `PudlTabl` object. If you want to see all of them with their docstrings, you can un-comment and run `help(pudl_out)` in the next cell.
* If you type `pudl_out.` and press `Tab`, you'll see a list of available methods as well.

In [8]:
#help(pudl_out)

This cell will print out a simple list of all the available public methods inside the `pudl_out` object:

In [9]:
# this is the master list of all of the methods in the pudl_out object
# they all return a table corresponding to their name
methods_pudl_out = [
    method_name for method_name in dir(pudl_out)
    if callable(getattr(pudl_out, method_name))    # if it is a method
    and '__' not in method_name                    # remove the internal methods
]
methods_pudl_out

['adjacency_ba_ferc714',
 'advanced_metering_infrastructure_eia861',
 'all_plants_ferc1',
 'balancing_authority_assn_eia861',
 'balancing_authority_eia861',
 'bf_eia923',
 'bga_eia860',
 'capacity_factor',
 'demand_forecast_pa_ferc714',
 'demand_hourly_pa_ferc714',
 'demand_monthly_ba_ferc714',
 'demand_response_eia861',
 'demand_response_water_heater_eia861',
 'demand_side_management_ee_dr_eia861',
 'demand_side_management_misc_eia861',
 'demand_side_management_sales_eia861',
 'description_pa_ferc714',
 'distributed_generation_fuel_eia861',
 'distributed_generation_misc_eia861',
 'distributed_generation_tech_eia861',
 'distribution_systems_eia861',
 'dynamic_pricing_eia861',
 'energy_efficiency_eia861',
 'etl_eia861',
 'etl_ferc714',
 'fbp_ferc1',
 'frc_eia923',
 'fuel_cost',
 'fuel_ferc1',
 'gen_eia923',
 'gen_fuel_by_generator_eia923',
 'gen_fuel_by_generator_energy_source_eia923',
 'gen_fuel_by_generator_energy_source_owner_eia923',
 'gen_original_eia923',
 'gen_plants_ba_ferc714',
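
The `'__' not in method_name` filter above removes dunder methods, but it would keep any single-underscore private methods and drop any public name that happens to contain a double underscore. A slightly stricter variant, sketched here on a toy class (the class `DemoOutputs` and its members are made up for illustration), checks the leading underscore instead:

```python
class DemoOutputs:
    """Toy stand-in for an output object (hypothetical, not PudlTabl)."""

    some_attribute = 1  # not callable, so it should be excluded

    def gens_eia860(self):
        return "generators"

    def frc_eia923(self):
        return "fuel receipts"

    def _private_helper(self):
        return "internal"


def public_methods(obj):
    """List callable attributes whose names do not start with an underscore."""
    return [
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


demo_methods = public_methods(DemoOutputs())
print(demo_methods)
```

Because `dir()` returns names in alphabetical order, the resulting list is sorted as well.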

## Basic Functionality

### Read a denormalized table
* Each of the output object methods will return a pandas DataFrame.
* Most of them correspond to a single database table, and will select all the data in that table, and automatically join it with some other useful information.
* Many of the access methods use an abbreviated name for the database table. E.g. the following reads all the data out of the `generators_eia860` table.

In [10]:
%%time
pudl_out.gens_eia860().info()  # .info() prints a summary and returns None, so don't assign it

Filling technology type
Filled technology_type coverage now at 98.1%
<class 'pandas.core.frame.DataFrame'>
Int64Index: 491469 entries, 491468 to 0
Columns: 111 entries, report_date to zip_code
dtypes: Int64(9), boolean(28), datetime64[ns](11), float64(18), int64(1), string(44)
memory usage: 345.4 MB
CPU times: user 38.6 s, sys: 4.95 s, total: 43.6 s
Wall time: 44.2 s


### Automatic dataframe caching
The `generators_eia860` table is quite long, and the above cell probably took the better part of a minute to read about 490,000 records with 111 columns each, creating a DataFrame of roughly 345 MB. If you run the same output routine again, it will complete almost instantly because that dataframe is already stored inside `pudl_out`. This is memory intensive, but can save time in calculations that need to use the same tables several times.

In [11]:
%%time
gens_again_eia860 = pudl_out.gens_eia860()

CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 6.91 µs
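
The caching pattern itself is simple. The following is a minimal sketch of the general idea, not the actual `PudlTabl` implementation: the class `CachingOutputs`, the `slow_loader` function, and the row data are all hypothetical.

```python
class CachingOutputs:
    """Toy object that loads each table once and then serves it from memory."""

    def __init__(self, loader):
        self._loader = loader  # function that does the slow read
        self._cache = {}       # table name -> already-loaded table

    def table(self, name):
        # Only call the slow loader the first time a table is requested.
        if name not in self._cache:
            self._cache[name] = self._loader(name)
        return self._cache[name]


load_calls = []


def slow_loader(name):
    """Stand-in for an expensive database read; records each call."""
    load_calls.append(name)
    return [{"table": name, "row": i} for i in range(3)]


outputs = CachingOutputs(slow_loader)
first = outputs.table("gens_eia860")   # triggers the slow load
second = outputs.table("gens_eia860")  # served from the cache
print(load_calls)
```

In this sketch the second call returns the very same object as the first, so in-place changes to it would also show up in the cache. Whether `PudlTabl` behaves the same way is not something this notebook demonstrates, so copying a table before modifying it is the cautious choice.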


## Exploring `pudl_out` Arguments
Below, we'll explore the main arguments that are used to customize the PUDL output object. You can mix and match these options.

By default, the output object will read data from all available years, do no time aggregation, and not attempt to fill in missing values.

In [12]:
# here are the default arguments for the pudl_out object
pudl_out = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq=None,               # Desired time grouping to aggregate PUDL tables to.
    start_date=None,         # Beginning date for data to pull from the PUDL DB.
    end_date=None,           # End date for data to pull from the PUDL DB.
    fill_fuel_cost=False,    # Whether to fill in missing fuel costs with EIA monthly state-level averages.
    roll_fuel_cost=False,    # Whether to fill in monthly missing fuel costs with a 12-month rolling average.
)

### Time series aggregation
The PUDL output object can aggregate data on a monthly or annual basis, if you set the `freq` argument to `AS` (annual starting at the beginning of the calendar year) or `MS` (monthly starting at the beginning of the month) or [other equivalent frequency abbreviations](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timeseries-offset-aliases).

**NOTE:** Not all columns can be aggregated, so you may lose access to some kinds of information in aggregated outputs. If you need to retain information that gets lost in the default aggregation / groupby process, you may need to pull the unaggregated data and do your own aggregation.
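
As a rough illustration of what annual aggregation does, here is a plain-Python sketch that sums monthly values into records dated at the start of each calendar year. It is a stand-in for the grouping idea only, not PUDL's aggregation code, and the records are made up.

```python
from collections import defaultdict
from datetime import date

# Made-up monthly records for two generators.
monthly = [
    {"report_date": date(2019, 11, 1), "generator_id": "1", "net_generation_mwh": 10.0},
    {"report_date": date(2019, 12, 1), "generator_id": "1", "net_generation_mwh": 20.0},
    {"report_date": date(2020, 1, 1), "generator_id": "1", "net_generation_mwh": 30.0},
    {"report_date": date(2020, 2, 1), "generator_id": "1", "net_generation_mwh": 40.0},
    {"report_date": date(2020, 1, 1), "generator_id": "2", "net_generation_mwh": 5.0},
]


def aggregate_annual(records):
    """Sum net generation per generator, keyed by the first day of the year."""
    totals = defaultdict(float)
    for rec in records:
        year_start = rec["report_date"].replace(month=1, day=1)
        totals[(year_start, rec["generator_id"])] += rec["net_generation_mwh"]
    return dict(totals)


annual = aggregate_annual(monthly)
for key, value in sorted(annual.items()):
    print(key, value)
```

Only the grouping keys and the summed column survive in the result. Any other column in the monthly records would have to be given its own aggregation rule or be dropped, which is the information loss the note above describes.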

In [13]:
pudl_out_as = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='AS',               # Aggregate tables annually
)

In [14]:
pudl_out_as.gen_eia923().head()

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh,unit_id_pudl
0,2008-01-01,3,32,Barry,195,18,Alabama Power Co,1,873997.0,
1,2009-01-01,3,32,Barry,195,18,Alabama Power Co,1,221908.0,1.0
2,2010-01-01,3,32,Barry,195,18,Alabama Power Co,1,435334.0,1.0
3,2011-01-01,3,32,Barry,195,18,Alabama Power Co,1,312130.0,1.0
4,2012-01-01,3,32,Barry,195,18,Alabama Power Co,1,152102.0,1.0


In [15]:
pudl_out_ms = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='MS',               # Aggregate tables monthly
)

In [16]:
pudl_out_ms.gen_eia923().head()

Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh,unit_id_pudl
0,2008-01-01,3,32,Barry,195,18,Alabama Power Co,1,96021.0,
1,2008-02-01,3,32,Barry,195,18,Alabama Power Co,1,79256.0,
2,2008-03-01,3,32,Barry,195,18,Alabama Power Co,1,91687.0,
3,2008-04-01,3,32,Barry,195,18,Alabama Power Co,1,73693.0,
4,2008-05-01,3,32,Barry,195,18,Alabama Power Co,1,68161.0,


### Filling in Missing Fuel Costs
 * The original EIA data is often incomplete.
 * Many utilities withhold information about their fuel costs.
 * We have a couple of ways of estimating missing values, if you need complete data.

To fill in missing fuel costs, we can pull monthly state-level average fuel costs from EIA, and we can use rolling averages to fill in short gaps in the data. The output object created in the next cell will attempt to use both of these methods.
* Set `fill_fuel_cost=True` when creating an output object to pull average monthly fuel costs from the EIA API.
* Set `roll_fuel_cost=True` when creating an output object to use a 12-month rolling average based on available data to fill in gaps.
* These options can be used together to fill in as many gaps as possible.
* **NOTE:** You need to have set the `API_KEY_EIA` environment variable to a valid EIA API key for this to work. See instructions at the top of this notebook.
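
The two filling strategies can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not PUDL's implementation: the function names are invented, the window is assumed to be centered on the gap, and the values are made up. Missing values are represented as `None`.

```python
from statistics import mean


def fill_with_state_average(costs, state_avg):
    """Replace missing costs with the state-level average for the same month."""
    return [avg if cost is None else cost for cost, avg in zip(costs, state_avg)]


def fill_with_rolling_average(costs, window=12):
    """Replace missing costs with the mean of known values in a centered window."""
    half = window // 2
    filled = []
    for i, cost in enumerate(costs):
        if cost is not None:
            filled.append(cost)
            continue
        neighbors = [v for v in costs[max(0, i - half): i + half + 1] if v is not None]
        # Leave the gap in place if nothing nearby is known.
        filled.append(mean(neighbors) if neighbors else None)
    return filled


# Made-up monthly fuel costs ($/MMBtu) with two gaps.
costs = [2.0, None, None, 3.0]
# Made-up state averages; only one month is available.
state_avg = [None, 2.5, None, None]

after_state = fill_with_state_average(costs, state_avg)
filled = fill_with_rolling_average(after_state)
print(after_state)
print(filled)
```

Applying the state averages first and the rolling average second matches the order of the log messages printed by the `frc_eia923()` call below.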

In [17]:
pudl_out_fill = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine, # we always need a pudl_engine
    freq='MS',               # Aggregate tables monthly
    fill_fuel_cost=True,     # Fill in missing fuel cost records with state-level averages from EIA's API
    roll_fuel_cost=True,     # Fill in missing fuel cost records with a 12-month rolling average.
)

In [18]:
%%time
pudl_out_fill.frc_eia923().head()

filling in fuel cost NaNs EIA APIs monthly state averages
filling in fuel cost NaNs with rolling averages
CPU times: user 3min 49s, sys: 1.23 s, total: 3min 50s
Wall time: 4min 15s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,ash_content_pct,chlorine_content_ppm,fuel_consumed_mmbtu,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,fuel_mmbtu_per_unit,fuel_received_units,fuel_type_code_pudl,mercury_content_ppm,moisture_content_pct,sulfur_content_pct,total_fuel_cost
0,2008-01-01,3,32,Barry,195,18,Alabama Power Co,5.450288,,7183512.0,False,2.131684,23.049712,311653.0,coal,,,0.488324,15312980.0
1,2008-02-01,3,32,Barry,195,18,Alabama Power Co,5.5939,,5679395.265,False,2.143524,22.995086,246983.0,coal,,,0.502347,12173920.0
2,2008-03-01,3,32,Barry,195,18,Alabama Power Co,5.51,,6720962.13,False,2.574383,22.987393,292376.0,coal,,,0.506358,17302330.0
3,2008-04-01,3,32,Barry,195,18,Alabama Power Co,5.586936,,8092480.028,False,2.787388,22.919484,353083.0,coal,,,0.500435,22556880.0
4,2008-05-01,3,32,Barry,195,18,Alabama Power Co,5.309342,,7715891.226,False,2.788092,22.886312,337140.0,coal,,,0.528132,21512610.0


Looking at the filled vs. unfilled monthly data in the Fuel Receipts and Costs data from EIA 923 (the next two cells), we can see that there are about 230k monthly records. Unfilled, we have fuel costs for about 128k of them. With the state-level monthly fuel costs and rolling averages, we can get that up to about 208k records. That's a big improvement, but roughly 22k records still have no fuel cost. Unfortunately this data simply isn't reported publicly.

In [19]:
pudl_out_ms.frc_eia923()[["plant_id_eia", "report_date", "fuel_cost_per_mmbtu"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 230057 entries, 0 to 231966
Data columns (total 3 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   plant_id_eia         230057 non-null  Int64         
 1   report_date          230057 non-null  datetime64[ns]
 2   fuel_cost_per_mmbtu  127783 non-null  float64       
dtypes: Int64(1), datetime64[ns](1), float64(1)
memory usage: 7.2 MB


In [20]:
pudl_out_fill.frc_eia923()[["plant_id_eia", "report_date", "fuel_cost_per_mmbtu"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 230057 entries, 0 to 231966
Data columns (total 3 columns):
 #   Column               Non-Null Count   Dtype         
---  ------               --------------   -----         
 0   plant_id_eia         230057 non-null  Int64         
 1   report_date          230057 non-null  datetime64[ns]
 2   fuel_cost_per_mmbtu  208402 non-null  float64       
dtypes: Int64(1), datetime64[ns](1), float64(1)
memory usage: 7.2 MB
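
To turn the two `.info()` summaries above into coverage percentages, a quick calculation helps. The counts below are copied from those outputs; the helper name `coverage_pct` is just for illustration.

```python
# Counts copied from the two .info() outputs above.
total_records = 230057
unfilled_non_null = 127783
filled_non_null = 208402


def coverage_pct(non_null, total):
    """Share of records with a non-null fuel cost, as a percentage."""
    return 100 * non_null / total


unfilled_pct = coverage_pct(unfilled_non_null, total_records)
filled_pct = coverage_pct(filled_non_null, total_records)
newly_filled = filled_non_null - unfilled_non_null

print(f"unfilled coverage: {unfilled_pct:.1f}%")
print(f"filled coverage:   {filled_pct:.1f}%")
print(f"records filled in: {newly_filled}")
```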


## Free Memory
Because this JupyterHub has limited memory, we need to delete the cached dataframes when we're done with them.

In [21]:
del pudl_out
del pudl_out_ms
del pudl_out_as
del pudl_out_fill
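
One caveat: `del` only removes a name. The memory is released once nothing else refers to the data, so any notebook variable that still points at a cached table (such as `gens_again_eia860` from earlier) keeps that table alive. The sketch below shows this with toy classes; `Table` and `Outputs` are hypothetical stand-ins, and the immediate release relies on CPython's reference counting.

```python
import gc
import weakref


class Table:
    """Stand-in for a large DataFrame (weak references need a class instance)."""


class Outputs:
    """Stand-in for an output object holding one cached table."""

    def __init__(self):
        self._cache = {"gens": Table()}

    def gens(self):
        return self._cache["gens"]


out = Outputs()
extra_ref = out.gens()            # like assigning a table to a notebook variable
watcher = weakref.ref(extra_ref)  # lets us observe when the table is freed

del out
gc.collect()
alive_after_del_out = watcher() is not None  # the extra variable still holds it

del extra_ref
gc.collect()
freed_after_del_extra_ref = watcher() is None  # now nothing refers to the table

print(alive_after_del_out, freed_after_del_extra_ref)
```

If memory is tight, deleting such leftover variables along with the output objects is worth considering.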

# Denormalized Output Tables
* Below, we'll extract and show a sample of many of the available denormalized PUDL output tables.
* If you'd like to see more than 5 sample rows, feel free to change `n_samples` below.
* Rather than assigning the results of these functions to a local variable in the notebook, we're showing samples from the cached dataframes to conserve memory, as this JupyterHub has limited RAM available at the moment.

In [22]:
n_samples = 5
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

## EIA Forms 860 & 923

In [23]:
# here are all of the EIA tables
tables_eia = [
    t for t in methods_pudl_out
    if '_eia' in t
    and '_eia861' not in t       # skip the EIA 861 tables for now because they are preliminary
]
tables_eia

['bf_eia923',
 'bga_eia860',
 'frc_eia923',
 'gen_eia923',
 'gen_fuel_by_generator_eia923',
 'gen_fuel_by_generator_energy_source_eia923',
 'gen_fuel_by_generator_energy_source_owner_eia923',
 'gen_original_eia923',
 'gens_eia860',
 'gens_mega_eia',
 'gf_eia923',
 'gf_nonuclear_eia923',
 'gf_nuclear_eia923',
 'own_eia860',
 'plant_parts_eia',
 'plants_eia860',
 'pu_eia860',
 'utils_eia860']

### EIA Plant Utility Associations
This is mostly a helper function, used for adding plant and utility names and IDs into other output tables.

In [24]:
%%time
pudl_out.pu_eia860().sample(n_samples)

CPU times: user 9.67 s, sys: 152 ms, total: 9.83 s
Wall time: 10.4 s


Unnamed: 0,report_date,plant_id_eia,plant_name_eia,plant_id_pudl,utility_id_eia,utility_name_eia,utility_id_pudl
145551,2014-01-01,59246,Goodnight,10063,59056,"Tri Global Energy, LLC",5146
23895,2005-01-01,2039,Elk River,1955,7570,Great River Energy,1926
11113,2006-01-01,796,Ray Roberts,8141,5063,Denton City of,3906
62167,2016-01-01,9842,Newhalem,3049,16868,Seattle City of,1183
93279,2021-01-01,54813,Shepherd Center,4243,17069,Shepherd Center,3150


### EIA 860 Boiler Generator Associations
* **NOTE:** We have filled in many more boiler-generator associations based on additional information. The `bga_source` column indicates where the association came from.

In [25]:
%%time
pudl_out.bga_eia860().sample(n_samples)

CPU times: user 1.2 s, sys: 4.13 ms, total: 1.2 s
Wall time: 1.27 s


Unnamed: 0,plant_id_eia,report_date,generator_id,boiler_id,unit_id_eia,unit_id_pudl,bga_source
45117,50392,2017-01-01,TG4,5A,,1,eia860_org
76323,4040,2013-01-01,ST2,BO12,PWG1,1,eia860_org
39583,10216,2016-01-01,GEN1,PB1,,1,eia860_org
35771,50244,2015-01-01,GEN9,RCB,,1,eia860_org
109070,55221,2019-01-01,G2,G1,OSW1,1,string_assn


### EIA 860 Plants

In [26]:
%%time
pudl_out.plants_eia860().sample(n_samples)

CPU times: user 5.71 s, sys: 96.2 ms, total: 5.81 s
Wall time: 5.93 s


Unnamed: 0,plant_id_eia,plant_name_eia,balancing_authority_code_eia,balancing_authority_name_eia,city,county,ferc_cogen_status,ferc_exempt_wholesale_generator,ferc_small_power_producer,grid_voltage_kv,grid_voltage_2_kv,grid_voltage_3_kv,iso_rto_code,latitude,longitude,primary_purpose_id_naics,sector_name_eia,sector_id_eia,state,street_address,zip_code,timezone,report_date,ash_impoundment,ash_impoundment_lined,ash_impoundment_status,datum,energy_storage,ferc_cogen_docket_no,ferc_exempt_wholesale_generator_docket_no,ferc_small_power_producer_docket_no,liquefied_natural_gas_storage,natural_gas_local_distribution_company,natural_gas_storage,natural_gas_pipeline_name_1,natural_gas_pipeline_name_2,natural_gas_pipeline_name_3,nerc_region,net_metering,pipeline_notes,regulatory_status_code,respondent_frequency,service_area,transmission_distribution_owner_id,transmission_distribution_owner_name,transmission_distribution_owner_state,utility_id_eia,water_source,plant_id_pudl,utility_name_eia,utility_id_pudl
89273,54372,University of Colorado,PSCO,Public Service Company of Colorado,Boulder,Boulder,True,False,False,13.8,,,,40.00759,-105.2692,611,Commercial CHP,5,CO,18th St and Colorado,80309,America/Denver,2020-01-01,,False,,,False,90-160-000,,,,Other - See pipeline notes.,False,,,,WECC,,Xcel Energy,NR,M,,15466,Public Service Co of Colorado,CO,22208,Air Cooled Condensor,4060,University of Colorado,3583
147983,59475,Palo Duro Wind,SWPP,Southwest Power Pool,Perryton,Ochiltree,False,False,False,345.0,,,,36.243889,-101.0014,22,IPP Non-CHP,2,TX,14535 FM 1267,79070,America/Chicago,2020-01-01,,False,,,False,,,,,,,,,,MRO,,,NR,M,,14063,Oklahoma Gas & Electric Co,OK,59238,,7235,Palo Duro Wind,2785
152287,60055,Innovative Solar 16,DUK,Duke Energy Carolinas,Hendersonville,Henderson,False,True,True,22.86,,,,35.363863,-82.35,22,IPP Non-CHP,2,NC,3364 Ridge Road,28792,America/New_York,2016-01-01,,,,,False,,,13-407-001,,,,,,,SERC,,,NR,,,5416,"Duke Energy Carolinas, LLC",NC,59787,,7512,Innovative Solar 16,2102
120108,56636,Mountain Home,IPCO,Idaho Power Company,,Elmore,False,False,True,,,,,43.0272,-115.4656,22,IPP Non-CHP,2,ID,18645 Old Oregon Trail Road,83623,America/Boise,2007-01-01,,,,NAD83,,,,07-37-000,,,,,,,WECC,,,NR,,,9191,,ID,55891,,5155,Hot Springs,2033
5962,418,Kelly Ridge,CISO,California Independent System Operator,Oroville,Butte,False,False,False,60.0,,,CAISO,39.531784,-121.4912,22,Electric Utility,1,CA,Oroville Dam Blvd,95965,America/Los_Angeles,2014-01-01,False,False,,,,,,,,,,,,,WECC,,,RE,,,14328,Pacific Gas & Electric Co,CA,14191,South Fork Feather River,1436,South Feather Water and Power Agency,3237


### EIA 860 Generators

In [27]:
%%time
pudl_out.gens_eia860().sample(n_samples)

Filling technology type
Filled technology_type coverage now at 98.1%
CPU times: user 52.6 s, sys: 3.19 s, total: 55.8 s
Wall time: 59.5 s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,associated_combined_heat_power,balancing_authority_code_eia,balancing_authority_name_eia,bga_source,bypass_heat_recovery,capacity_mw,carbon_capture,city,cofire_fuels,county,current_planned_operating_date,data_source,deliver_power_transgrid,distributed_generation,duct_burners,energy_source_1_transport_1,energy_source_1_transport_2,energy_source_1_transport_3,energy_source_2_transport_1,energy_source_2_transport_2,energy_source_2_transport_3,energy_source_code_1,energy_source_code_2,energy_source_code_3,energy_source_code_4,energy_source_code_5,energy_source_code_6,ferc_cogen_status,ferc_exempt_wholesale_generator,ferc_small_power_producer,fluidized_bed_tech,fuel_type_code_pudl,fuel_type_count,grid_voltage_2_kv,grid_voltage_3_kv,grid_voltage_kv,iso_rto_code,latitude,longitude,minimum_load_mw,multiple_fuels,nameplate_power_factor,operating_date,operating_switch,operational_status,operational_status_code,original_planned_operating_date,other_combustion_tech,other_modifications_date,other_planned_modifications,owned_by_non_utility,ownership_code,planned_derate_date,planned_energy_source_code_1,planned_modifications,planned_net_summer_capacity_derate_mw,planned_net_summer_capacity_uprate_mw,planned_net_winter_capacity_derate_mw,planned_net_winter_capacity_uprate_mw,planned_new_capacity_mw,planned_new_prime_mover_code,planned_repower_date,planned_retirement_date,planned_uprate_date,previously_canceled,primary_purpose_id_naics,prime_mover_code,pulverized_coal_tech,reactive_power_output_mvar,retirement_date,rto_iso_lmp_node_id,rto_iso_location_wholesale_reporting_id,sector_id_eia,sector_name_eia,solid_fuel_gasification,startup_source_code_1,startup_source_code_2,startup_source_code_3,startup_source_code_4,state,stoker_tech,street_address,subcritical_tech,summer_capacity_estimate,summer_capacity_mw,summer_estimated_capability_mw,supercritical_tech,switch_oil_gas,syncronized_transmission_grid,technology_description,time_cold_shutdown_full_load_code,timezone,topping_bottoming_code,turbines_inverters_hydrokinetics,turbines_num,ultrasupercritical_tech,unit_id_pudl,uprate_derate_completed_date,uprate_derate_during_year,winter_capacity_estimate,winter_capacity_mw,winter_estimated_capability_mw,zip_code
181211,2015-01-01,54662,4166,Woodland Landfill Gas Recovery,54843,3798,WM Illinois Renewable Energy LLC,GEN4,False,PJM,"PJM Interconnection, LLC",,False,1.6,,South Elgin,False,Kane,NaT,eia860,,,False,,,,,,,LFG,,,,,,False,False,True,,waste,1,,,13.0,PJM,41.981004,-88.2975,0.8,,0.98,2010-05-01,,existing,OP,2010-06-01,,NaT,,,S,NaT,,,,,,,,,NaT,NaT,NaT,,22,IC,,,NaT,,,2,IPP Non-CHP,False,,,,,IL,,7 N. 500 Route 25,,,1.6,,,False,False,Landfill Gas,1H,America/Chicago,X,,,,,NaT,False,,1.6,,60177
423166,2004-01-01,10586,3297,Cameron Ridge,2818,7814,Cameron Ridge LLC,EXIS,False,CISO,California Independent System Operator,,False,59.6,,Mojave,,Kern,NaT,,True,False,False,,,,,,,WND,,,,,,False,False,True,,wind,1,,,66.6,CAISO,35.075,-118.3158,,,,1984-12-01,,existing,OP,NaT,,NaT,,True,S,NaT,,,,,,,,,NaT,NaT,NaT,,22,WT,,0.0,NaT,,,2,IPP Non-CHP,False,,,,,CA,,10315 Oak Creek Road,,False,59.6,,,,,Onshore Wind Turbine,,America/Los_Angeles,X,,114.0,,,NaT,,False,59.6,,93501
229451,2013-01-01,54661,4165,Pheasant Run Landfill Gas Rec,54842,3799,WM Renewable Energy LLC,GEN9,False,MISO,Midcontinent Independent Transmission System O...,,False,0.8,,Bristol,False,Kenosha,NaT,eia860,,,False,,,,,,,LFG,,,,,,False,False,True,,waste,1,,,28.0,MISO,42.5825,-88.0436,0.5,,0.98,2002-07-01,,existing,OP,2002-07-01,,NaT,,,S,NaT,,,,,,,,,NaT,NaT,NaT,,22,IC,,,NaT,we,we,2,IPP Non-CHP,False,,,,,WI,,19414 60th Street,,,0.8,,,False,False,Landfill Gas,1H,America/Chicago,X,,,,,NaT,False,,0.8,,53104
188024,2015-01-01,4100,2456,Arcadia Electric,765,909,Arcadia City of,1,False,MISO,Midcontinent Independent Transmission System O...,,False,1.3,,Arcadia,False,Trempealeau,NaT,eia860,,,False,,,,,,,DFO,,,,,,False,False,False,,oil,2,,,69.0,MISO,44.2524,-91.5034,1.0,,0.8,1956-01-01,,existing,SB,NaT,,NaT,,,S,NaT,,,,,,,,,NaT,NaT,NaT,,22,IC,,,NaT,,,1,Electric Utility,False,,,,,WI,,115 South Jackson,,,1.3,,,False,False,Petroleum Liquids,1H,America/Chicago,X,,,,,NaT,False,,1.3,,54612
11259,2021-01-01,56754,5215,Goat Wind LP,59883,2679,NRG Energy Gas & Wind Holdings Inc,2,False,ERCO,"Electric Reliability Council of Texas, Inc.",,False,69.6,,Sterling City,,Sterling,NaT,eia860m,,,False,,,,,,,WND,,,,,,False,True,False,,wind,1,,,34.5,ERCOT,31.951944,-95.665,,,,2009-06-01,,existing,(OP) Operating,NaT,,NaT,,,,NaT,,,,,,,,,NaT,NaT,NaT,,22,WT,,,NaT,ercot,ercot,2,IPP Non-CHP,,,,,,TX,,,,,69.6,,,,,Onshore Wind Turbine,,America/Chicago,X,,,,,NaT,,,69.6,,76951


### EIA 860 Generator-level Ownership

In [28]:
%%time
pudl_out.own_eia860().sample(n_samples)

CPU times: user 10.5 s, sys: 128 ms, total: 10.7 s
Wall time: 11.1 s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,owner_utility_id_eia,owner_name,fraction_owned,owner_city,owner_state,owner_street_address,owner_zip_code
47010,2014-01-01,77,879,Eklutna,599,498,Municipality of Anchorage,1,599,Anchorage Municipal Light and Power,0.533,Anchorage,AK,1200 East First Avenue,99501.0
28566,2009-01-01,7858,2920,MEPI GT Facility,12159,187,Midwest Electric Power Inc,2,520,Ameren Energy Generating Co,1.0,,MO,,
3543,2002-01-01,6076,123,Colstrip,15298,3397,PP&L Montana LLC,GEN3,15500,Puget Sound Energy Inc,0.25,,,,
72792,2019-01-01,56987,5380,Hyland LFGTE Facility,50158,2100,Innovative Energy Systems Inc,GEN3,34466,Casella Waste Systems,1.0,Rutland,VT,25 Greens Hill Ln,5701.0
28979,2009-01-01,10751,3366,Camden Plant Holdings LLC,2904,4339,Camden Cogen LP,GEN2,50159,"Morris Energy Group, LLC",1.0,,NJ,,


### EIA 923 Generation and Fuel Consumption

In [29]:
%%time
pudl_out.gf_eia923().sample(n_samples)

report_date is object column. Converting to datetime.
report_date is object column. Converting to datetime.
CPU times: user 57.1 s, sys: 2.18 s, total: 59.3 s
Wall time: 1min 1s


Unnamed: 0,report_date,plant_id_eia,prime_mover_code,energy_source_code,utility_id_eia,plant_name_eia,plant_id_pudl,utility_id_pudl,fuel_type_code_pudl,utility_name_eia,fuel_type_code_aer,fuel_consumed_for_electricity_mmbtu,fuel_consumed_for_electricity_units,fuel_consumed_mmbtu,fuel_consumed_units,net_generation_mwh,fuel_mmbtu_per_unit
737242,2008-03-01,50006,CT,WO,3890,Linden Cogen,3437,1597,oil,Cogen Technologies Linden Vent,WOO,161819.0,37114.0,244400.0,56055.0,21445.942,4.36
1205913,2012-03-01,3775,ST,BIT,733,Clinch River,115,29,coal,Appalachian Power Co,COL,625822.0,24959.0,625822.0,24959.0,51420.433,25.074
1811090,2016-09-01,1188,IC,DFO,18177,Story City,1703,1212,oil,Story City City of,DFO,116.0,20.0,116.0,20.0,10.258,5.8
486587,2005-12-01,54096,ST,WDS,9393,Riverdale Mill,3976,2123,waste,International Paper Co-Riverdl,WWW,38976.98,4283.18,379206.1,41671.0,7916.74812,9.1
203514,2003-04-01,55091,CS,NG,739,Midlothian Energy Facility,4365,6822,gas,IPA Operations Inc,NG,2240156.0,2213593.0,2240156.0,2213593.0,301905.0,1.01


### EIA 923 Fuel Receipts and Costs

In [30]:
%%time
pudl_out.frc_eia923().sample(n_samples)

CPU times: user 26.9 s, sys: 158 ms, total: 27.1 s
Wall time: 28.4 s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,ash_content_pct,chlorine_content_ppm,coalmine_county_id_fips,contract_expiration_date,contract_type_code,energy_source_code,fuel_consumed_mmbtu,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,fuel_group_code,fuel_mmbtu_per_unit,fuel_received_units,fuel_type_code_pudl,mercury_content_ppm,mine_id_msha,mine_name,mine_state,mine_type_code,moisture_content_pct,natural_gas_delivery_contract_type_code,natural_gas_transport_code,primary_transportation_mode_code,secondary_transportation_mode_code,sulfur_content_pct,supplier_name,total_fuel_cost
563265,2020-10-01,2721,113,James E. Rogers Energy Complex,5416,90,Duke Energy Corp,0.0,,,NaT,S,NG,294428.59,False,2.92,natural_gas,1.03,285853.0,gas,0.0,,,,,,firm,firm,PL,,0.0,mercuria,859731.5
537851,2020-01-01,7972,565,Sumpter,20910,368,Wolverine Pwr Supply Coop Inc,0.0,,,NaT,S,NG,20843.968,False,1.796,natural_gas,1.072,19444.0,gas,0.0,,,,,,interruptible,interruptible,PL,,0.0,sequent,37435.77
510789,2019-05-01,889,1608,Baldwin,5517,1554,Dynegy Midwest Generation Inc,0.0,,,NaT,S,DFO,6130.6,False,,petroleum,5.8,1057.0,oil,0.0,,,,,,,,TR,,0.0,goldstar,
432900,2017-05-01,1082,631,Council Bluffs,12341,185,MidAmerican Energy Co,4.1,0.0,56005.0,2018-12-01,C,SUB,552496.347,False,1.392,coal,16.991,32517.0,coal,0.0,4800732.0,belle ayr mine,WY,S,30.48,,,RR,,0.22,contura energy inc.,769074.9
522083,2019-08-01,7846,2909,Brandy Branch,9617,2158,JEA,0.0,,,NaT,S,NG,2537633.065,False,2.742,natural_gas,1.045,2428357.0,gas,0.0,,,,,,interruptible,interruptible,PL,,0.0,southern natural gas,6958190.0


### EIA 923 Boiler Fuel Consumption

In [31]:
%%time
pudl_out.bf_eia923().sample(n_samples)

report_date is object column. Converting to datetime.
CPU times: user 31.9 s, sys: 1.41 s, total: 33.3 s
Wall time: 36 s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,boiler_id,unit_id_pudl,ash_content_pct,energy_source_code,fuel_consumed_mmbtu,fuel_consumed_units,fuel_mmbtu_per_unit,fuel_type_code_pudl,sulfur_content_pct
586553,2014-10-01,50385,3607,Newark Bay Cogen,55846,2594,Newark Bay Cogeneration Partnership,GEN2,1.0,0.0,KER,0.0,0.0,0.0,oil,0.0
341158,2012-03-01,2062,1958,Henderson,7651,1938,Greenwood Utilities Comm,H1,1.0,0.0,NG,0.0,0.0,0.0,gas,0.0
961557,2017-02-01,55470,4544,Green Power 2,17566,3239,South Houston Green Power LLC,TR1,1.0,0.0,NG,1167180.678,1137603.0,1.026,gas,0.0
41321,2008-07-01,10043,3073,Logan,14932,3613,PG&E Operating Service Co,B01,,0.0,DFO,1060.2,186.0,5.7,oil,0.11
430471,2013-12-01,2408,1174,Mercer,15147,268,PSEG Fossil LLC,2,2.0,0.0,NG,6070.224,5882.0,1.032,gas,0.0


### EIA 923 Net Generation by Generator

In [32]:
%%time
pudl_out.gen_eia923().sample(n_samples)

report_date is object column. Converting to datetime.
CPU times: user 19.9 s, sys: 431 ms, total: 20.3 s
Wall time: 21.6 s


Unnamed: 0,report_date,plant_id_eia,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,generator_id,net_generation_mwh,unit_id_pudl
377821,2017-04-01,1710,91,J H Campbell,4254,81,Consumers Energy Company,3,166745.0,3
439937,2018-08-01,10397,3212,Indiana Harbor,56165,519,Cleveland Cliffs,GEN9,17309.65,1
58682,2009-09-01,10670,8126,Deepwater,156,3844,AES Deepwater Inc,GEN1,85473.0,1
423545,2018-08-01,1016,1645,Butler Warner Gen,6235,2920,Fayetteville Public Works Comm,9,-125.0,1
195327,2013-06-01,6073,146,Victor J Daniel Jr,12686,190,Mississippi Power Co,4ST,122065.0,4


## FERC Form 1
* Only a small subset of the 100+ tables that exist in the original FERC Form 1 have been cleaned and included in the PUDL DB.
* For tables not included here, you'll need to access the cloned multi-year FERC 1 DB that we produce. See the first tutorial notebook for more information.

In [33]:
# All of the FERC Form 1 table methods end with _ferc1
tables_ferc1 = [
    t for t in methods_pudl_out
    if t.endswith('_ferc1')
]
tables_ferc1

['all_plants_ferc1',
 'fbp_ferc1',
 'fuel_ferc1',
 'plant_in_service_ferc1',
 'plants_hydro_ferc1',
 'plants_pumped_storage_ferc1',
 'plants_small_ferc1',
 'plants_steam_ferc1',
 'pu_ferc1',
 'purchased_power_ferc1']

### FERC 1 Large Steam Plants
The large steam plants report detailed operating expenses in this table, as well as operational characteristics.

In [34]:
%%time
pudl_out.plants_steam_ferc1().sample(n_samples)

CPU times: user 1.36 s, sys: 23.5 ms, total: 1.38 s
Wall time: 1.54 s


Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_id_ferc1,plant_name_ferc1,asset_retirement_cost,avg_num_employees,capacity_factor,capacity_mw,capex_equipment,capex_land,capex_per_mw,capex_structures,capex_total,construction_type,construction_year,installation_year,net_generation_mwh,not_water_limited_capacity_mw,opex_allowances,opex_boiler,opex_coolants,opex_electric,opex_engineering,opex_fuel,opex_fuel_per_mwh,opex_misc_power,opex_misc_steam,opex_nonfuel,opex_nonfuel_per_mwh,opex_operations,opex_per_mwh,opex_plants,opex_production_total,opex_rents,opex_steam,opex_steam_other,opex_structures,opex_transfer,peak_demand_mw,plant_capability_mw,plant_hours_connected_while_generating,plant_type,record_id,water_limited_capacity_mw
23127,2015,132,243,Otter Tail Power Company,314,474,lake preston,,,0.000686,24.1,3891658.0,12339.0,171528.3,229834.0,4133831.0,conventional,1978.0,1978.0,144.917,20.0,,,,22716.0,6041.0,19949.0,137.658108,2005.0,,90150.0,622.080225,8991.0,759.7,39685.0,110099.0,,,,10712.0,,21.0,20.0,19.0,combustion_turbine,f1_steam_2015_12_132_0_5,20.0
23891,2015,17,97,"Duke Energy Progress, Inc.",515,82,roxboro,375452272.0,285.0,0.401913,2558.2,1810907000.0,8105075.0,943760.3,219863582.0,2414328000.0,outdoor,1966.0,1980.0,9006794.985,2462.0,422797.0,32399683.0,,16782.0,9354713.0,320964615.0,35.63583,13440304.0,7872980.0,100422299.0,11.149615,7422447.0,46.8,14687567.0,421386914.0,,15488996.0,,-683970.0,,2495.0,,8217.0,steam,f1_steam_2015_12_17_1_1,2439.0
579,1994,193,363,Wisconsin Electric Power Company,128,2029,concord-unit 1,,,1.9e-05,95.4,23638600.0,355666.0,264547.1,1243531.0,25237800.0,conventional,1993.0,,15.612,95.0,,,,165121.0,17524.0,591785.0,37905.777607,17583.0,822.0,252201.0,16154.304381,14404.0,54060.1,36541.0,843986.0,,,,206.0,,80.0,,2189.0,combustion_turbine,f1_steam_1994_12_193_7_5,83.0
18197,2010,89,171,Madison Gas and Electric Company,184,1673,elm road,95960.0,146.0,0.269529,51.23,101435400.0,,2393177.4,21071121.0,122602500.0,conventional,2010.0,2010.0,120957.8,51.0,,412805.0,,,367283.0,3718422.0,30.741482,1036105.0,97118.0,2917785.0,24.122339,121721.0,54.9,150552.0,6636207.0,398340.0,95200.0,,238661.0,,,,3750.0,steam,f1_steam_2010_12_89_2_3,51.0
1424,1995,195,365,Wisconsin Public Service Corporation,500,964,pulliam 5,,,0.312457,50.0,18914430.0,55050.0,433376.3,2699335.0,21668820.0,conventional,1949.0,1949.0,136856.0,50.0,,840563.0,,60059.0,,2468127.0,18.034481,14995.0,13679.0,,,,0.0,63155.0,,,149077.0,,5137.0,,,,5476.0,steam,f1_steam_1995_12_195_0_3,50.0


### FERC 1 Fuel
Fuel consumption by the large steam plants, broken down by plant and fuel type.

In [35]:
%%time
pudl_out.fuel_ferc1().sample(n_samples)

CPU times: user 608 ms, sys: 8.09 ms, total: 616 ms
Wall time: 721 ms


Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_name_ferc1,fuel_consumed_mmbtu,fuel_consumed_total_cost,fuel_consumed_units,fuel_cost_per_mmbtu,fuel_cost_per_unit_burned,fuel_cost_per_unit_delivered,fuel_mmbtu_per_unit,fuel_type_code_pudl,fuel_units,record_id
24389,2019,281,150,Interstate Power and Light Company,352,marshalltown,260340.6,614527.6,238189.0,2.36,2.58,2.58,1.093,gas,mcf,f1_fuel_2019_12_281_3_3
17943,2005,182,161,KCP&L Greater Missouri Operations Company,1155,kci,1106.42,17014.03,1129.0,15.387,15.07,15.07,0.98,gas,mcf,f1_fuel_2005_12_182_0_11
1797,2017,44,89,DTE Electric Company,419,monroe,158297900.0,336267400.0,8258445.0,2.121,40.718,38.673,19.168,coal,ton,f1_fuel_2017_12_44_0_10
7324,2014,177,334,UNION ELECTRIC COMPANY,518,rush island,79768950.0,172853200.0,4716707.0,2.167,36.647,37.439,16.912,coal,ton,f1_fuel_2014_12_177_0_7
12496,1994,57,123,Georgia Power Company,250,harllee branch,68581630.0,119588100.0,2781765.0,1.74,42.99,42.99,24.654,coal,ton,f1_fuel_1994_12_57_1_10


### FERC 1 Fuel by Plant
Wide-form aggregated fuel totals by plant and year, identifying the relative cost and heat content proportions of different fuels, as well as the primary fuel for the plant.

In [36]:
%%time
pudl_out.fbp_ferc1().sample(n_samples)

CPU times: user 1.38 s, sys: 16.2 ms, total: 1.4 s
Wall time: 1.63 s


Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_name_ferc1,coal_fraction_cost,coal_fraction_mmbtu,fuel_cost,fuel_mmbtu,gas_fraction_cost,gas_fraction_mmbtu,nuclear_fraction_cost,nuclear_fraction_mmbtu,oil_fraction_cost,oil_fraction_mmbtu,other_fraction_cost,other_fraction_mmbtu,primary_fuel_by_cost,primary_fuel_by_mmbtu,waste_fraction_cost,waste_fraction_mmbtu
18274,2006,194,364,Wisconsin Power and Light Company,553,s fond du lac u2&3,0.0,0.0,5598238.0,391906.1,0.999018,0.997569,0.0,0.0,0.000982,0.002431,0.0,0.0,gas,gas,0.0,0.0
12514,2002,146,273,Public Service Company of New Hampshire,415,newington,0.0,0.0,916308.5,164472.4,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,oil,oil,0.0,0.0
7815,2003,88,169,Louisville Gas and Electric Company,98,cane run,1.0,1.0,41109260.0,37282370.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,coal,coal,0.0,0.0
18818,2012,195,365,Wisconsin Public Service Corporation,470,weston 1,0.924999,0.94109,2759268.0,891413.5,0.075001,0.05891,0.0,0.0,0.0,0.0,0.0,0.0,coal,coal,0.0,0.0
11136,2006,134,246,PacifiCorp,277,hayden,0.996939,0.998697,9609353.0,6594147.0,0.003061,0.001075,0.0,0.0,0.0,0.000228,0.0,0.0,coal,coal,0.0,0.0
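
The fraction and primary-fuel columns above can be illustrated with a small pandas sketch. It is not PUDL's implementation: the input values are invented, and the helper name `fuel_fractions` is hypothetical. It only shows how long-form fuel records (one row per plant and fuel) can be pivoted into per-fuel shares.

```python
import pandas as pd

# Toy long-form fuel records (invented values, one row per plant and fuel).
fuel = pd.DataFrame({
    "plant_id_pudl": [1, 1, 2],
    "fuel_type_code_pudl": ["coal", "gas", "oil"],
    "fuel_consumed_mmbtu": [900.0, 100.0, 50.0],
    "fuel_consumed_total_cost": [1800.0, 600.0, 400.0],
})


def fuel_fractions(df, value_col):
    """Hypothetical helper: each fuel's share of a plant's total for value_col."""
    wide = df.pivot_table(
        index="plant_id_pudl",
        columns="fuel_type_code_pudl",
        values=value_col,
        aggfunc="sum",
        fill_value=0.0,  # fuels a plant did not burn get a zero share
    )
    return wide.div(wide.sum(axis=1), axis=0)


fraction_mmbtu = fuel_fractions(fuel, "fuel_consumed_mmbtu")
fraction_cost = fuel_fractions(fuel, "fuel_consumed_total_cost")

# The primary fuel is the column with the largest share in each row.
primary_fuel_by_mmbtu = fraction_mmbtu.idxmax(axis=1)
primary_fuel_by_cost = fraction_cost.idxmax(axis=1)
```

In this toy data, plant 1 is 90% coal by heat content but only 75% coal by cost, which is why the table reports the two kinds of fractions separately.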


### FERC 1 Plant in Service
An accounting of how much electric plant infrastructure exists in each of the many FERC accounts. This is a very wide form table.

In [37]:
%%time
pudl_out.plant_in_service_ferc1().sample(n_samples)

CPU times: user 1.85 s, sys: 23.7 ms, total: 1.87 s
Wall time: 1.93 s


Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,record_id,amount_type,distribution_acct360_land,distribution_acct361_structures,distribution_acct362_station_equip,distribution_acct363_storage_battery_equip,distribution_acct364_poles_towers,distribution_acct365_overhead_conductors,distribution_acct366_underground_conduit,distribution_acct367_underground_conductors,distribution_acct368_line_transformers,distribution_acct369_services,distribution_acct370_meters,distribution_acct371_customer_installations,distribution_acct372_leased_property,distribution_acct373_street_lighting,distribution_acct374_asset_retirement,distribution_total,electric_plant_in_service_total,electric_plant_purchased_acct102,electric_plant_sold_acct102,experimental_plant_acct103,general_acct389_land,general_acct390_structures,general_acct391_office_equip,general_acct392_transportation_equip,general_acct393_stores_equip,general_acct394_shop_equip,general_acct395_lab_equip,general_acct396_power_operated_equip,general_acct397_communication_equip,general_acct398_misc_equip,general_acct399_1_asset_retirement,general_acct399_other_property,general_subtotal,general_total,hydro_acct330_land,hydro_acct331_structures,hydro_acct332_reservoirs_dams_waterways,hydro_acct333_wheels_turbines_generators,hydro_acct334_accessory_equip,hydro_acct335_misc_equip,hydro_acct336_roads_railroads_bridges,hydro_acct337_asset_retirement,hydro_total,intangible_acct301_organization,intangible_acct302_franchises_consents,intangible_acct303_misc,intangible_total,major_electric_plant_acct101_acct106_total,nuclear_acct320_land,nuclear_acct321_structures,nuclear_acct322_reactor_equip,nuclear_acct323_turbogenerators,nuclear_acct324_accessory_equip,nuclear_acct325_misc_equip,nuclear_acct326_asset_retirement,nuclear_total,other_acct340_land,other_acct341_structures,other_acct342_fuel_accessories,other_acct343_prime_movers,other_acct344_generators,other_acct345_accessory_equip,other_acct346_misc_equip,other_acct347_asset_retirement,other_total,production_total,rtmo_acct380_land,rtmo_acct381_structures,rtmo_acct382_computer_hardware,rtmo_acct383_computer_software,rtmo_acct384_communication_equip,rtmo_acct385_misc_equip,rtmo_total,steam_acct310_land,steam_acct311_structures,steam_acct312_boiler_equip,steam_acct313_engines,steam_acct314_turbogenerators,steam_acct315_accessory_equip,steam_acct316_misc_equip,steam_acct317_asset_retirement,steam_total,transmission_acct350_land,transmission_acct352_structures,transmission_acct353_station_equip,transmission_acct354_towers,transmission_acct355_poles,transmission_acct356_overhead_conductors,transmission_acct357_underground_conduit,transmission_acct358_underground_conductors,transmission_acct359_1_asset_retirement,transmission_acct359_roads_trails,transmission_total
20522,2003,189,15,AEP Texas North Company,f1_plant_in_srvce_2003_12_189_0,starting_balance,879989.0,70725.0,58159752.0,,115548222.0,66062397.0,6841240.0,19603101.0,87692307.0,31675545.0,30320063.0,14861188.0,86896.0,13684879.0,,445486304.0,1162890000.0,,,,2042723.0,27488359.0,9892817.0,7935796.0,393167.0,3889948.0,2402448.0,261688.0,31447516.0,780969.0,,,86535431.0,86535431.0,,,,,,,,,,21968.0,,23275664.0,23297632.0,1162890000.0,,,,,,,,,,52333.0,2542488.0,509877.0,,155264.0,1147.0,,3261109.0,353086940.0,,,,,,,,6497076.0,43383817.0,176362971.0,,88037254.0,25866999.0,9677714.0,,349825831.0,8613562.0,803281.0,94313391.0,527726.0,92572613.0,57649554.0,3319.0,,,,254483446.0
4175,2015,40,80,Consolidated Water Power Company,f1_plant_in_srvce_2015_12_40_0,ending_balance,4772.0,,294637.0,,54490.0,18232.0,,86760.0,34599.0,230.0,76132.0,,,,,569852.0,60916230.0,,,,33889.0,1304107.0,112757.0,,,1029288.0,,802571.0,33373.0,,,,3315985.0,3315985.0,2766451.0,1788655.0,12314326.0,8847725.0,4643311.0,452036.0,,,30812504.0,,1137568.0,,1137568.0,60916230.0,,,,,,,,,,,,,,,,,,30812504.0,,,,,,,,,,,,,,,,,45282.0,909019.0,19115972.0,175595.0,3134207.0,1674742.0,,,21469.0,4039.0,25080325.0
18732,1994,171,21,Alcoa Power Generating Inc.,f1_plant_in_srvce_1994_12_171_0,ending_balance,,,,,,,,,,,,,,,,,71361550.0,,,,,135258.0,491736.0,611513.0,6988.0,264166.0,258822.0,297642.0,3611638.0,97889.0,,,5775652.0,5775652.0,2780403.0,3750786.0,24602139.0,8301787.0,2354760.0,2652406.0,201761.0,,44644042.0,,,1715766.0,1715766.0,71361550.0,,,,,,,,,,,,,,,,,,44644042.0,,,,,,,,,,,,,,,,,103121.0,346510.0,16951832.0,1087230.0,,691797.0,,,,45602.0,19226092.0
24061,2006,274,186,"Midcontinent Independent System Operator, Inc",f1_plant_in_srvce_2006_12_274_0,retirements,,,,,,,,,,,,,,,,,3301227.0,,,,,,,,,,,,,68836.0,,,68836.0,68836.0,,,,,,,,,,,,,,3301227.0,,,,,,,,,,,,,,,,,,,,,2040113.0,1192278.0,,,3232391.0,,,,,,,,,,,,,,,,,,,,
19886,2017,183,344,"Vermont Electric Power Company, Inc.",f1_plant_in_srvce_2017_12_183_0,starting_balance,,,,,,,,,,,,,,,,,569526.0,,,,,,,569526.0,,,,,,,,,569526.0,569526.0,,,,,,,,,,,,,,569526.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


### FERC 1 Purchased Power
A summary of electricity market transactions between utilities. Sadly the sellers are identified only by their names, and not their FERC Utility (Respondent) ID.

In [38]:
%%time
pudl_out.purchased_power_ferc1().sample(n_samples)

CPU times: user 2.44 s, sys: 32.5 ms, total: 2.47 s
Wall time: 2.57 s


Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,seller_name,record_id,billing_demand_mw,coincident_peak_demand_mw,delivered_mwh,demand_charges,energy_charges,non_coincident_peak_demand_mw,other_charges,purchase_type_code,purchased_mwh,received_mwh,tariff,total_settlement
181108,2000,266,143,Illinois Power Generating Company,PP&L Energy Plus LLC,f1_purchased_pwr_2000_12_266_5_9,,,0.0,0.0,78775.0,,0.0,OS,1450.0,0.0,,78775.0
43047,2013,79,159,Kansas City Power & Light Company,Louisiana Energy and Power Authority,f1_purchased_pwr_2013_12_79_2_1,,,0.0,0.0,1629.0,,0.0,OS,31.0,0.0,"WSPP, Sch A",1629.0
50729,2018,30,70,"Cleveland Electric Illuminating Company, The",FirstEnergy Solutions Corp.,f1_purchased_pwr_2018_12_30_0_1,,,0.0,0.0,24591847.0,,0.0,RQ,494339.0,0.0,,24591847.0
127989,2007,107,320,The Narragansett Electric Company,HESS,f1_purchased_pwr_2007_12_107_1_3,,,0.0,0.0,12517973.0,,0.0,RQ,147267.0,0.0,1,12517973.0
92904,2016,161,295,Southern California Edison Company,"GFP ETHANOL, LLC DBA CALGREN RENEW FU",f1_purchased_pwr_2016_12_161_18_11,,,0.0,0.0,1059404.0,,-9000.0,OS,18603.0,0.0,,1050404.0


## Free Memory
Again, because we're on a JupyterHub with limited RAM per user, we need to delete the cached dataframes we've just created.

In [39]:
del pudl_out

# Analysis Outputs
* The PUDL Database is mainly meant to standardize the structure of data that's been reported in different ways over different years, so that it can all be used together.
* We typically don't include calculated values or big modifications to the original data.
* We're compiling a growing library of stock analyses in the `pudl.analysis` subpackage, which operate on data stored in the database.
* Some of these analytical outputs are built into the output object, both for convenient access and so that they can take advantage of the dataframe caching.

## The Marginal Cost of Electricity (MCOE)
* One of our first analysis modules calculates fuel costs, heat rates, and capacity factors on a generator by generator basis.
* The long-term goal is for it to provide a comprehensive marginal cost of electricity production (MCOE).
* The integration of operating costs from FERC Form 1 is still a work in progress, and hasn't been added in here yet.

### MCOE Requires Aggregation
* Fuel costs and other data need to be aggregated by month or year to calculate MCOE.
* This means we need an output object that aggregates by month or year.
* Because a single `NA` value can wipe out a whole aggregated category, you'll get more information with a monthly aggregation, but it currently takes more memory than the JupyterHub has access to.
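
The `NA` problem can be seen in a small pandas sketch. The data is invented, and the use of `skipna=False` is an assumption chosen to mimic strict aggregation; it is not taken from the PUDL source.

```python
import numpy as np
import pandas as pd

# One year of invented monthly fuel costs with a single missing month.
monthly = pd.DataFrame({
    "report_date": pd.date_range("2019-01-01", periods=12, freq="MS"),
    "fuel_cost": [10.0] * 12,
})
monthly.loc[5, "fuel_cost"] = np.nan

# Strict annual total: with skipna=False, one NA nulls out the whole year.
annual_strict = monthly["fuel_cost"].sum(skipna=False)

# Lenient annual total: pandas skips the NA by default, so the year is
# silently undercounted instead.
annual_lenient = monthly["fuel_cost"].sum()

# At monthly granularity, the eleven reported months remain usable.
months_with_data = int(monthly["fuel_cost"].notna().sum())
```

Either annual choice loses something: the strict total discards eleven good months, and the lenient total hides the gap. That trade-off is why monthly aggregation preserves more information.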

In [40]:
pudl_out_annual = pudl.output.pudltabl.PudlTabl(
    pudl_engine=pudl_engine,
    freq="AS",
    fill_fuel_cost=True,
    roll_fuel_cost=True,
)

### Heat Rate by Generation Unit (MMBTU/MWh)
* A "Generation Unit" (identified by `unit_id_pudl` here) is a group of "boilers" (where fuel is consumed) and "generators" (where electricity is made) which are connected to each other.
* Because the fuel inputs and electricity outputs are commingled, this is the most granular level at which a direct heat rate calculation can be done.
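
As a rough sketch of the idea (not the code inside `hr_by_unit()`), the heat rate of a unit is the fuel burned by all of its boilers divided by the electricity produced by all of its generators. The boiler and generator records below are invented; the column names follow the tables shown in this notebook.

```python
import pandas as pd

# An invented unit with two boilers feeding two generators.
boiler_fuel = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "unit_id_pudl": [1, 1],
    "boiler_id": ["B1", "B2"],
    "fuel_consumed_mmbtu": [600.0, 400.0],
})
generation = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "unit_id_pudl": [1, 1],
    "generator_id": ["G1", "G2"],
    "net_generation_mwh": [70.0, 30.0],
})

keys = ["plant_id_eia", "unit_id_pudl"]

# Sum fuel over boilers and generation over generators within each unit.
fuel_by_unit = boiler_fuel.groupby(keys, as_index=False)["fuel_consumed_mmbtu"].sum()
gen_by_unit = generation.groupby(keys, as_index=False)["net_generation_mwh"].sum()

# Heat rate = fuel in (MMBTU) / electricity out (MWh), one value per unit.
hr_by_unit = fuel_by_unit.merge(gen_by_unit, on=keys)
hr_by_unit["heat_rate_mmbtu_mwh"] = (
    hr_by_unit["fuel_consumed_mmbtu"] / hr_by_unit["net_generation_mwh"]
)
```

Nothing in this data says which boiler fed which generator, so the single unit-level value (10 MMBTU/MWh here) is the only heat rate it supports directly.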

In [41]:
%%time
pudl_out_annual.hr_by_unit().sample(n_samples)

CPU times: user 1min 45s, sys: 695 ms, total: 1min 45s
Wall time: 1min 56s


Unnamed: 0,report_date,plant_id_eia,unit_id_pudl,net_generation_mwh,fuel_consumed_mmbtu,heat_rate_mmbtu_mwh
5704,2010-01-01,55451,1,239136.0,488810.19,2.044068
25032,2019-01-01,2682,1,32895.0,192408.313,5.849166
418,2008-01-01,1831,2,213456.0,2997209.7,14.041347
26406,2019-01-01,56786,1,181335.0,4522852.7,24.941973
10916,2013-01-01,642,2,1421.0,44931.563,31.619678


### Heat Rate by Generator (MMBTU/MWh)
* However, we do need per-generator heat rates to estimate per-generator fuel costs.

In [42]:
%%time
pudl_out_annual.hr_by_gen().sample(n_samples)

Filling technology type
Filled technology_type coverage now at 98.1%


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[date_col_name] = pd.to_datetime(df[date_col_name])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.loc[:, "year_temp_for_merge"] = df[date_col_name].dt.year


CPU times: user 54 s, sys: 2.69 s, total: 56.7 s
Wall time: 58.4 s


Unnamed: 0,report_date,plant_id_eia,unit_id_pudl,generator_id,heat_rate_mmbtu_mwh,fuel_type_code_pudl,fuel_type_count
41543,2020-01-01,55086,1,STG,8.662418,gas,1
23818,2018-01-01,10426,1,GEN3,41.902305,waste,1
7639,2011-01-01,50051,1,GEN1,18.805634,waste,1
26423,2020-01-01,56,3,3,,coal,2
18118,2015-01-01,52072,1,GEN3,97.965154,coal,1


### Per-generator Fuel Costs
* Calculate per-generator fuel costs based on heat rates and fuel deliveries.
* Because we told the `pudl_out_annual` object to try to fill in missing values (`fill_fuel_cost=True`), this will request monthly average fuel cost data by date from the EIA API. It might take a minute.
* This also means you'll need to have set your EIA API Key at the top of the notebook.
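
The relationship between the columns in the resulting table can be checked by hand: the fuel cost per MWh is the fuel cost per MMBTU multiplied by the heat rate. The sketch below is only that arithmetic, using two rows copied from the sample output that follows. It does not reproduce the filling or rolling-average steps.

```python
import pandas as pd

# Two rows copied from the fuel_cost() sample output in this notebook.
gens = pd.DataFrame({
    "plant_name_eia": ["Lauderdale", "Sterlington"],
    "generator_id": ["ST4", "7C"],
    "fuel_cost_per_mmbtu": [4.517667, 3.184761],
    "heat_rate_mmbtu_mwh": [8.346676, -64.928654],
})

# $/MWh = ($/MMBTU) * (MMBTU/MWh)
gens["fuel_cost_per_mwh"] = (
    gens["fuel_cost_per_mmbtu"] * gens["heat_rate_mmbtu_mwh"]
)

# Illustrative sanity check (not part of PUDL): a non-positive heat rate
# propagates into a non-positive cost per MWh and deserves a closer look.
suspect = gens[gens["heat_rate_mmbtu_mwh"] <= 0]
```

The products agree with the reported `fuel_cost_per_mwh` values (about 37.71 and -206.78) to within the rounding of the displayed inputs.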

In [43]:
%%time
pudl_out_annual.fuel_cost().sample(n_samples)

filling in fuel cost NaNs EIA APIs monthly state averages
filling in fuel cost NaNs with rolling averages
CPU times: user 55.2 s, sys: 456 ms, total: 55.6 s
Wall time: 1min 7s


Unnamed: 0,report_date,plant_id_eia,generator_id,unit_id_pudl,plant_name_eia,plant_id_pudl,utility_id_eia,utility_name_eia,utility_id_pudl,fuel_type_count,fuel_type_code_pudl,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,heat_rate_mmbtu_mwh,fuel_cost_per_mwh
31624,2018-01-01,613,ST4,1,Lauderdale,321,6452,Florida Power & Light Co,121,1,gas,False,4.517667,8.346676,37.707501
33278,2020-01-01,7242,1CA,1,Polk,471,18454,Tampa Electric Co,313,1,gas,False,2.698441,7.186658,19.392777
28629,2016-01-01,1404,7C,2,Sterlington,560,11241,Entergy Louisiana Inc,107,1,gas,False,3.184761,-64.928654,-206.782234
1329,2009-01-01,3630,2,2,Pearsall,2375,17583,South Texas Electric Coop Inc,3246,1,gas,False,4.259993,20.056803,85.44183
19067,2016-01-01,2291,4,4,North Omaha,2041,14127,Omaha Public Power District,2740,2,coal,False,1.281037,10.959724,14.039811


### Per-generator Capacity Factor

In [44]:
%%time
pudl_out_annual.capacity_factor().sample(n_samples)

CPU times: user 505 ms, sys: 65 µs, total: 505 ms
Wall time: 545 ms


Unnamed: 0,report_date,plant_id_eia,generator_id,net_generation_mwh,capacity_mw,capacity_factor
6913,2010-01-01,2840,6,1851251.0,443.9,0.476076
41719,2019-01-01,55047,CTG3,968043.0,185.0,0.597336
36410,2018-01-01,7343,4,3090593.0,695.9,0.50698
42803,2019-01-01,60100,G-1,0.0,11.5,0.0
22179,2014-01-01,55480,U2,,169.8,
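
The capacity factors above are consistent with the conventional definition: net generation divided by the energy the generator would produce running at full capacity for every hour of the reporting period. The sketch below checks two rows copied from the sample output. Whether PUDL handles partial years or retirements the same way is not verified here.

```python
import pandas as pd

# Two rows copied from the capacity_factor() sample output above.
gens = pd.DataFrame({
    "report_date": pd.to_datetime(["2010-01-01", "2019-01-01"]),
    "net_generation_mwh": [1851251.0, 968043.0],
    "capacity_mw": [443.9, 185.0],
})

# Hours in each annual reporting period (8760, or 8784 in a leap year).
period_end = gens["report_date"] + pd.DateOffset(years=1)
hours = (period_end - gens["report_date"]) / pd.Timedelta(hours=1)

# Capacity factor = actual generation / maximum possible generation.
gens["capacity_factor"] = gens["net_generation_mwh"] / (gens["capacity_mw"] * hours)
```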


### Per-generator MCOE
* This function uses the cached dataframes that were generated above to produce a huge table of per-generator statistics.
* If you just called this function alone, all of those other dataframes would be automatically generated, and available within the output object.

In [45]:
%%time
pudl_out_annual.mcoe().sample(n_samples)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[date_col_name] = pd.to_datetime(df[date_col_name])
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.loc[:, "year_temp_for_merge"] = df[date_col_name].dt.year


CPU times: user 1.78 s, sys: 95 µs, total: 1.78 s
Wall time: 1.85 s


Unnamed: 0,plant_id_eia,generator_id,report_date,unit_id_pudl,plant_id_pudl,plant_name_eia,utility_id_eia,utility_id_pudl,utility_name_eia,fuel_type_code_pudl,capacity_factor,fuel_cost_from_eiaapi,fuel_cost_per_mmbtu,fuel_cost_per_mwh,heat_rate_mmbtu_mwh,net_generation_mwh,total_fuel_cost,total_mmbtu
8268,6011,2,2001-01-01,,2495,Calvert Cliffs,2876,4338,Calvert Cliffs Nuclear PP LLC,nuclear,,,,,,,,
441172,7970,9,2020-01-01,,7894,State St Generating,12807,2432,Michigan South Central Power Agency,oil,,,,,,,,
457681,62908,BA,2020-01-01,,13111,Anoka BESS,62731,6641,"Gopher Energy Storage, LLC",other,,,,,,,,
3659,1716,3,2001-01-01,,799,Rogers,4254,81,Consumers Energy Company,hydro,,,,,,,,
125219,6440,16,2007-01-01,,2629,Wilson Dam,18642,3430,Tennessee Valley Authority,hydro,,,,,,,,


## Free Memory

In [46]:
del pudl_out_annual

# Preliminary Output Tables 
* Integrating a new dataset into the PUDL database requires many steps (datastore, extract, transform, load, outputs).
* Sometimes we need to use tables from new datasets as soon as possible for analysis.
* The interim extract and transform steps can be hacked into the output object to run on the fly, prior to DB integration.
* The data extraction and transformation can take a while though -- and it will need to be re-run from scratch every time you create a new output object.
* **WARNING:** None of this data has been fully validated, and the structure is likely to change. Some of it (especially the FERC 714) is still in a pretty raw state.

As of December 2020, we have preliminarily integrated EIA 861 and FERC 714 in this format.

## EIA Form 861
* The interim EIA 861 ETL is set up to automatically run in its entirety as soon as you request any EIA 861 table.
* This should take 2-5 minutes if you already have the raw input data available.
* If raw input data needs to be downloaded [from our Zenodo archives](https://zenodo.org/record/4127029) first (which should happen automatically), it will take longer.

In [47]:
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

In [48]:
# here are all of the EIA 861 tables
methods_eia861 = [t for t in methods_pudl_out if '_eia861' in t and "etl" not in t]
methods_eia861

['advanced_metering_infrastructure_eia861',
 'balancing_authority_assn_eia861',
 'balancing_authority_eia861',
 'demand_response_eia861',
 'demand_response_water_heater_eia861',
 'demand_side_management_ee_dr_eia861',
 'demand_side_management_misc_eia861',
 'demand_side_management_sales_eia861',
 'distributed_generation_fuel_eia861',
 'distributed_generation_misc_eia861',
 'distributed_generation_tech_eia861',
 'distribution_systems_eia861',
 'dynamic_pricing_eia861',
 'energy_efficiency_eia861',
 'green_pricing_eia861',
 'mergers_eia861',
 'net_metering_customer_fuel_class_eia861',
 'net_metering_misc_eia861',
 'non_net_metering_customer_fuel_class_eia861',
 'non_net_metering_misc_eia861',
 'operational_data_misc_eia861',
 'operational_data_revenue_eia861',
 'reliability_eia861',
 'sales_eia861',
 'service_territory_eia861',
 'utility_assn_eia861',
 'utility_data_misc_eia861',
 'utility_data_nerc_eia861',
 'utility_data_rto_eia861']

### EIA 861 Balancing Authorities

In [49]:
%%time
pudl_out.balancing_authority_eia861().sample(n_samples)

Running the interim EIA 861 ETL process!
Extracting eia861 spreadsheet data.


The data has not yet been validated, and the structure may change.


Transforming raw EIA 861 DataFrames for advanced_metering_infrastructure_eia861 concatenated across all years.
Tidying the EIA 861 Advanced Metering Infrastructure table.
Transforming raw EIA 861 DataFrames for balancing_authority_eia861 concatenated across all years.
Started with 37622 missing BA Codes out of 39290 records (95.75%)
Ended with 12674 missing BA Codes out of 39290 records (32.26%)
Transforming raw EIA 861 DataFrames for demand_response_eia861 concatenated across all years.
Dropped 0 duplicate records from EIA 861 Demand Response Water Heater table, out of a total of 3497 records (0.0000% of all records). 
Tidying the EIA 861 Demand Response table.
Dropped 0 duplicate records from EIA 861 Demand Response table, out of a total of 14020 records (0.0000% of all records). 
Performing value transformations on EIA 861 Demand Response table.
Transforming raw EIA 861 DataFrames for demand_side_management_eia861 concatenated across all years.
The following reported NERC regions ar

  key_col = Index(lvals).where(~mask_left, rvals)


Building an EIA 861 Util-State-Date association table.
Completing normalization of balancing_authority_eia861.
CPU times: user 3min 8s, sys: 928 ms, total: 3min 9s
Wall time: 3min 22s


Unnamed: 0,report_date,balancing_authority_id_eia,balancing_authority_code_eia,balancing_authority_name_eia
11374,2004-01-01,10620,,"Lake Worth, City of"
20,2001-01-01,13501,NYIS,ISO New York
39111,2020-01-01,3522,CEA,Chugach Electric Assn Inc
38,2001-01-01,13407,NEVP,Nevada Power Co
3031,2002-01-01,15248,PGE,Portland General Electric Co


### EIA 861 Advanced Metering Infrastructure

In [50]:
%%time
pudl_out.advanced_metering_infrastructure_eia861().sample(n_samples)

CPU times: user 2.91 ms, sys: 0 ns, total: 2.91 ms
Wall time: 2.75 ms


Unnamed: 0,utility_id_eia,state,balancing_authority_code_eia,report_date,entity_type,short_form,utility_name_eia,customer_class,advanced_metering_infrastructure,automated_meter_reading,daily_digital_access_customers,direct_load_control_customers,energy_served_ami_mwh,home_area_network,non_amr_ami
43919,18546,MA,UNK,2013-01-01,,True,Town of Templeton - (MA),transportation,,,,,,,
42724,15982,VA,UNK,2013-01-01,,True,Town of Richlands - (VA),transportation,0.0,0.0,,,,,
72277,11458,MS,TVA,2016-01-01,,,City of Macon - (MS),residential,0.0,0.0,0.0,0.0,0.0,0.0,930.0
48590,7715,MA,UNK,2014-01-01,,True,Town of Groton - (MA),commercial,432.0,,,,27312.0,,
55692,1172,IA,MISO,2015-01-01,,True,Bancroft Municipal Utilities,residential,,398.0,,,,,


### EIA 861 Sales
How much electricity did utilities report selling to different types of customers in each year by state?

In [51]:
%%time
pudl_out.sales_eia861().sample(n_samples)

CPU times: user 9.71 ms, sys: 0 ns, total: 9.71 ms
Wall time: 9.39 ms


Unnamed: 0,utility_id_eia,state,report_date,balancing_authority_code_eia,business_model,data_observed,entity_type,service_type,short_form,utility_name_eia,customer_class,customers,sales_mwh,sales_revenue
82999,18454,FL,2004-01-01,UNK,retail,True,Investor Owned,bundled,,Tampa Electric Co,industrial,1299.0,2555667.0,165978000.0
193686,18941,OH,2009-01-01,UNK,retail,True,Municipal,bundled,,City of Tipp City,commercial,505.0,22865.0,2162000.0
24859,3611,HI,2002-01-01,UNK,retail,True,Investor Owned,bundled,,Citizens Communications Co,industrial,93.0,131244.0,26118000.0
162767,10620,FL,2008-01-01,UNK,retail,True,Municipal,bundled,,City of Lake Worth,transportation,0.0,0.0,0.0
201993,4100,CA,2010-01-01,UNK,retail,True,Retail Power Marketer,energy,,"Commerce Energy, Inc.",residential,13815.0,119825.0,10065700.0


### EIA 861 Service Territories
Which counties (with FIPS codes) each utility reported serving in each year.

In [52]:
%%time
pudl_out.service_territory_eia861().sample(n_samples)

CPU times: user 5.36 ms, sys: 24 µs, total: 5.39 ms
Wall time: 5.03 ms


Unnamed: 0,county,short_form,state,utility_id_eia,utility_name_eia,report_date,state_id_fips,county_id_fips
147741,Knox,,OH,14006,Ohio Power Co,2013-01-01,39,39083.0
72010,Douglas,,WA,3413,PUD No 1 of Chelan County,2007-01-01,53,53017.0
188902,Sumter,,AL,195,Alabama Power Co,2017-01-01,1,1119.0
41803,Renville,,ND,13694,"North Central Elec Coop, Inc",2004-01-01,38,38075.0
16933,Marshall,,ID,244,Albion City of,2002-01-01,16,
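
In the sample output above the FIPS codes appear as plain numbers, and some counties have no code at all. If you need to join against other county-level datasets, FIPS codes are conventionally handled as zero-padded strings (two digits for states, five for counties). A minimal sketch, assuming the columns arrive as numbers as displayed here:

```python
import pandas as pd

# Rows adapted from the service_territory_eia861() sample output above.
territory = pd.DataFrame({
    "county": ["Knox", "Sumter", "Marshall"],
    "state_id_fips": [39, 1, 16],
    "county_id_fips": [39083.0, 1119.0, None],
})

# State codes: two characters, zero-padded on the left.
territory["state_id_fips"] = territory["state_id_fips"].astype(str).str.zfill(2)

# County codes: go through the nullable Int64 dtype so that missing values
# stay missing rather than becoming the string "nan", then pad to five.
territory["county_id_fips"] = (
    territory["county_id_fips"]
    .astype("Int64")
    .astype("string")
    .str.zfill(5)
)
```

After this, Sumter County, Alabama reads `01119` rather than `1119.0`, and the Marshall row keeps a missing value.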


### Free Memory

In [53]:
del pudl_out

## FERC Form 714
* **NOTE:** Most of the FERC Form 714 tables have not yet been fully processed.
* We have primarily been focused on the historical hourly demand reported by planning areas.
* As with the EIA 861, the full interim ETL will be run as soon as you ask for any FERC 714 table.
* Also as with the EIA 861, if you don't have the [raw FERC 714 input files](https://zenodo.org/record/4127101) cached locally already, they might take a minute to download.

In [54]:
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

In [55]:
# here are all of the FERC 714 tables
methods_ferc714 = [t for t in methods_pudl_out if '_ferc714' in t and "etl" not in t]
methods_ferc714

['adjacency_ba_ferc714',
 'demand_forecast_pa_ferc714',
 'demand_hourly_pa_ferc714',
 'demand_monthly_ba_ferc714',
 'description_pa_ferc714',
 'gen_plants_ba_ferc714',
 'id_certification_ferc714',
 'interchange_ba_ferc714',
 'lambda_description_ferc714',
 'lambda_hourly_ba_ferc714',
 'net_energy_load_ba_ferc714',
 'respondent_id_ferc714']

### FERC 714 Respondents
Currently the processing of the hourly planning area demand table exceeds the available memory on this JupyterHub, so the following cells are commented out.

In [56]:
%%time
respondent_id_ferc714 = pudl_out.respondent_id_ferc714()
respondent_id_ferc714.sample(5)

Running the interim FERC 714 ETL process!
Extracting demand_hourly_pa_ferc714 from CSV into pandas DataFrame.


The data has not yet been validated, and the structure may change.


Extracting respondent_id_ferc714 from CSV into pandas DataFrame.
Transforming demand_hourly_pa_ferc714.
Transforming respondent_id_ferc714.
CPU times: user 28.3 s, sys: 23.3 s, total: 51.6 s
Wall time: 1min


Unnamed: 0,respondent_id_ferc714,respondent_name_ferc714,eia_code
206,321,MISO,56669
171,282,Wisconsin Public Power Inc.,20858
84,191,Lakeland Electric,10623
117,226,Otter Tail Power Company,14232
15,113,KCP&L Greater Missouri Operations Company (For...,12698


### FERC 714 Hourly Demand by Planning Area

In [57]:
demand_hourly_pa_ferc714 = pudl_out.demand_hourly_pa_ferc714()
demand_hourly_pa_ferc714.sample(20)

Unnamed: 0,respondent_id_ferc714,report_date,utc_datetime,timezone,demand_mwh
7437157,124,2010-01-01,2010-04-03 16:00:00,America/New_York,763.0
4363398,183,2014-01-01,2014-01-28 11:00:00,America/New_York,976.0
14854680,231,2015-01-01,2015-10-19 05:00:00,America/Denver,316.0
22209,121,2006-01-01,2006-11-06 07:00:00,America/Denver,193.0
6361170,102,2015-01-01,2015-07-29 15:00:00,America/Chicago,9859.0
9869944,182,2006-01-01,2006-09-17 23:00:00,America/Los_Angeles,546.0
2415510,101,2015-01-01,2015-03-29 09:00:00,America/Chicago,810.8
965193,206,2011-01-01,2011-08-31 09:00:00,America/Los_Angeles,272.0
6223788,232,2012-01-01,2012-05-31 17:00:00,America/Los_Angeles,2428.0
10047073,119,2009-01-01,2009-08-01 23:00:00,America/Los_Angeles,1799.0
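
Because each record carries both a UTC timestamp and the planning area's timezone, local clock time can be recovered when you need it, for example to look at daily load shapes. The sketch below assumes that `utc_datetime` holds timezone-naive UTC timestamps, as the column name suggests. The name `local_datetime` is made up for this example.

```python
import pandas as pd

# Two rows copied from the hourly demand sample output above.
demand = pd.DataFrame({
    "utc_datetime": pd.to_datetime(["2010-04-03 16:00:00", "2015-10-19 05:00:00"]),
    "timezone": ["America/New_York", "America/Denver"],
    "demand_mwh": [763.0, 316.0],
})

# Mark each timestamp as UTC, convert it to the respondent's timezone, then
# drop the timezone so that all rows can share one naive datetime column.
demand["local_datetime"] = [
    ts.tz_localize("UTC").tz_convert(tz).tz_localize(None)
    for ts, tz in zip(demand["utc_datetime"], demand["timezone"])
]
```

This row-by-row version is easy to read but slow. For the full table, which has millions of rows, grouping by `timezone` and converting each group with the vectorized `.dt` accessors would be the more practical route.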


# Future Analyses
The output object contains many different kinds of things. As we accumulate more analyses in our library, we're looking to break them out into their own reusable classes that access the database directly. Some work in progress here involves constructing historical service territory geometries for both utilities and balancing authorities, and associating that data usefully with the FERC 714 respondents. Exploring that in detail is beyond the scope of this notebook, but check out the `pudl.output.ferc714` and `pudl.analysis.service_territory` modules for examples. Unfortunately, because this analysis currently depends on both the interim EIA 861 and the interim FERC 714 datasets, it uses too much memory to be run on the JupyterHub right now.

In [58]:
%%time
ferc714_out = pudl.output.ferc714.Respondents(pudl_out)
annualized = ferc714_out.annualize()
categorized = ferc714_out.categorize()
summarized = ferc714_out.summarize_demand()
fipsified = ferc714_out.fipsify()
counties_gdf = ferc714_out.georef_counties()

Running the interim EIA 861 ETL process!
Extracting eia861 spreadsheet data.


The data has not yet been validated, and the structure may change.


Transforming raw EIA 861 DataFrames for advanced_metering_infrastructure_eia861 concatenated across all years.
Tidying the EIA 861 Advanced Metering Infrastructure table.
Transforming raw EIA 861 DataFrames for balancing_authority_eia861 concatenated across all years.
Started with 37622 missing BA Codes out of 39290 records (95.75%)
Ended with 12674 missing BA Codes out of 39290 records (32.26%)
Transforming raw EIA 861 DataFrames for demand_response_eia861 concatenated across all years.
Dropped 0 duplicate records from EIA 861 Demand Response Water Heater table, out of a total of 3497 records (0.0000% of all records). 
Tidying the EIA 861 Demand Response table.
Dropped 0 duplicate records from EIA 861 Demand Response table, out of a total of 14020 records (0.0000% of all records). 
Performing value transformations on EIA 861 Demand Response table.
Transforming raw EIA 861 DataFrames for demand_side_management_eia861 concatenated across all years.
The following reported NERC regions ar

  key_col = Index(lvals).where(~mask_left, rvals)


Building an EIA 861 Util-State-Date association table.
Completing normalization of balancing_authority_eia861.
CPU times: user 3min 7s, sys: 6.36 s, total: 3min 14s
Wall time: 4min 18s
