# Working with PUDL Output Tables

There are many ways to access the data extracted and manipulated by pudl. You can query the SQL database where the **XX** data live **(link to that notebook)**, use Dask to access the CEMS parquet files, work with cleaned CSV versions of the raw files, or interact with pudl's various output table objects. This notebook provides a walk-through of the output tables, their contents, and their capabilities.

See **HERE** for a broader walk-through of the different ways to access pudl data.
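
As a rough sketch of the first option (querying the SQL database directly), the cell below builds a throwaway in-memory SQLite database and runs a filtered query against it. It assumes the database is SQLite, which the text above does not state. The table `example_plants` and its columns are hypothetical and are not part of the real PUDL schema.

```python
import sqlite3

# Build a throwaway in-memory database so the example runs anywhere.
# The table and column names are hypothetical, not the real PUDL schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example_plants (plant_id INTEGER, plant_name TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO example_plants VALUES (?, ?, ?)",
    [(1, "Plant A", "CO"), (2, "Plant B", "TX"), (3, "Plant C", "CO")],
)

# Select only the rows needed, using a bound parameter for the filter value.
rows = conn.execute(
    "SELECT plant_id, plant_name FROM example_plants "
    "WHERE state = ? ORDER BY plant_id",
    ("CO",),
).fetchall()
conn.close()
print(rows)
```

Against a real database file the same pattern applies: connect to the file instead of `":memory:"` and replace the hypothetical table name with one that actually exists.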


#### What are the output tables?

The database houses the bulk of the data, but it may not be immediately clear how it can or ought to be assembled for analysis. The output tables are select replications and manipulations of the raw data, built from the cleaned and processed data in the database. The database stores tidy data **(link to tidy data)**, and the output tables pipe that data into a more user-friendly format.

The output tables are not just “clean” versions of their raw inputs. Many of the dataframes closely mimic their raw counterparts, but most have undergone data transformations such as changes to units and aggregation level, column condensation, and more. Some of the tables present unique data manipulations, calculations, or combinations of multiple data sources that cannot be "found" in a raw format. The intent of the output tables is to provide quick and intuitive access to relevant groups of data.
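
To make the tidy-versus-output distinction more concrete, here is a minimal sketch of one plausible kind of transformation: joining a descriptive column from one normalized table onto the records of another. The dataframes, column names, and values are invented for illustration, and this is not necessarily how pudl assembles any particular output table.

```python
import pandas as pd

# Hypothetical normalized (tidy) tables; names and values are illustrative only.
plants = pd.DataFrame({"plant_id": [1, 2], "plant_name": ["Plant A", "Plant B"]})
generation = pd.DataFrame(
    {
        "plant_id": [1, 1, 2],
        "report_year": [2017, 2018, 2018],
        "net_generation_mwh": [1200.0, 1500.0, 800.0],
    }
)

# Attach the descriptive plant name to every generation record so the
# result can be read and filtered without further lookups.
output = generation.merge(
    plants, on="plant_id", how="left", validate="many_to_one"
)
print(output)
```

The `validate="many_to_one"` argument makes pandas raise an error if `plant_id` is not unique in `plants`, which guards against accidentally duplicating generation rows during the join.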

**NOTE:** This file presupposes access to ....

#### Notebook Contents:
* **<a href='#setup'>Setup</a>**
* **<a href='#access'>Accessing the Output Tables</a>**
* **<a href='#descrip'>Table Descriptions</a>**
* **<a href='#ferc'>FERC Output Tables</a>**
    - table 1
    - table 2
* **<a href='#eia'>EIA Output Tables</a>**
    - table 1
    - table 2
* **<a href='#epa'>EPA Output Tables</a>**
    - table 1
    - table 2
* **<a href='#misc'>MISC. Output Tables</a>**
    - table 1
    - table 2

Missing stuff about functions

<a id='setup'></a>
## Setup

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Standard libraries
import logging
import sys
import os
import pathlib

# 3rd party libraries
import geopandas as gpd
import dask.dataframe as dd
from dask.distributed import Client
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
import seaborn as sns
import sqlalchemy as sa

# Local libraries
import pudl

In [None]:
# Enable viewing of logging outputs
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

In [None]:
# Display settings
sns.set()
%matplotlib inline
mpl.rcParams['figure.dpi'] = 150
pd.options.display.max_columns = 100
pd.options.display.max_rows = 5

<a id='access'></a>
## Accessing the Output Tables

The output table objects are dataframes created by the pudl_engine at a given frequency. Whew. What that means is.....

Before accessing the output tables programmatically, you need to specify at what frequency you'd like to **aggregate?** the data. **Certain tables only function with a given frequency?**
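
To illustrate what a time frequency means in pandas terms, the sketch below sums hypothetical monthly records into year-start bins. The data and column names are invented, and whether the output tables aggregate in exactly this way is an assumption rather than something this notebook confirms.

```python
import pandas as pd

# Hypothetical monthly records for a single generator.
monthly = pd.DataFrame(
    {
        "report_date": pd.date_range("2017-01-01", periods=24, freq="MS"),
        "net_generation_mwh": [100.0] * 12 + [150.0] * 12,
    }
)

# Group the monthly rows into year-start bins and sum within each bin.
# "YS" is the year-start offset alias; "AS" is an older alias for the same offset.
annual = (
    monthly.groupby(pd.Grouper(key="report_date", freq="YS"))["net_generation_mwh"]
    .sum()
    .reset_index()
)
print(annual)
```

Each output row is labeled with the first day of its year, which is why an annual frequency shows up as dates like January 1st rather than as bare year numbers.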

In [None]:
frequency = 'AS'  # annual (year-start) frequency

In [None]:
# Establish connection to pudl database
from pudl.workspace.setup import PudlPaths

pudl_engine = sa.create_engine(PudlPaths().pudl_db)
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine, freq=frequency)  # annual frequency

In [None]:
# Access a table by name as an object
pudl_out.plants_eia860()

<a id='descrip'></a>
## Table Descriptions

| Output Table Name | Description |
| :- | :- |
|<font color='red'>**FERC Output Tables**</font>|
| **plants_steam_ferc1** |Large thermal generating plants, as reported on page 402 of FERC Form 1. |
|**plants_hydro_ferc1**|Hydroelectric generating plant statistics for large plants. Large plants have an installed nameplate capacity of more than 10 MW. As reported on FERC Form 1, pages 406-407, and extracted from the f1_hydro table in FERC's FoxPro database.|
|**plants_small_ferc1**|Generating plant statistics for small plants, as reported on FERC Form 1 pages 410-411, and extracted from the FERC FoxPro database table f1_gnrt_plant. Small generating plants are defined by having nameplate capacity of less than 25 MW for steam plants, and less than 10 MW for internal combustion, conventional hydro, and pumped storage plants.|
|**plants_pumped_storage_ferc1**|FERC Form 1 Pumped Storage Table.|
|**plant_in_service_ferc1**|FERC Form 1 Plant in Service Table.|
|**fuel_ferc1**|Annual fuel consumed by large thermal generating plants. As reported on page 402 of FERC Form 1.|
|**purchased_power_ferc1**|Purchased Power (Account 555) including power exchanges (i.e. transactions involving a balancing of debits and credits for energy, capacity, etc.) and any settlements for imbalanced exchanges. Reported on pages 326-327 of FERC Form 1. Extracted from the f1_purchased_pwr table in FERC's FoxPro database.|
|**fbp_ferc1**|Calculates useful FERC Form 1 fuel metrics on a per plant-year basis. Each record in the FERC Form 1 corresponds to a particular type of fuel. Many plants -- especially coal plants -- use more than one fuel, with gas and/or diesel serving as startup fuels. In order to be able to classify the type of plant based on relative proportions of fuel consumed or fuel costs it is useful to aggregate these per-fuel records into a single record for each plant. Fuel cost (in nominal dollars) and fuel heat content (in mmBTU) are calculated for each fuel based on the cost and heat content per unit, and the number of units consumed, and then summed by fuel type (there can be more than one record for a given type of fuel in each plant because we are simplifying the fuel categories). The per-fuel records are then pivoted to create one column per fuel type. The total is summed and stored separately, and the individual fuel costs & heat contents are divided by that total, to yield fuel proportions.  Based on those proportions and a minimum threshold that's passed in, a "primary" fuel type is then assigned to the plant-year record and given a string label.|
|**pu_ferc1**|A dataframe of FERC plant-utility associations.|
|<font color='red'>**EIA Output Tables**</font>|
|**gens_eia860**|A dataframe describing generators, as reported in EIA 860.|
|**plants_eia860**|A dataframe of plant level info reported in EIA 860.|
|**utils_eia860**|A dataframe describing utilities reported in EIA 860.|
|**own_eia860**|A dataframe of generator level ownership data from EIA 860.|
|**bga_eia860**|The more complete EIA/PUDL boiler-generator associations.|
|**gen_eia923**|EIA 923 net generation data by generator.|
|**frc_eia923**|EIA 923 fuel receipts and costs data.|
|**gf_eia923**|EIA 923 generation and fuel consumption data.|
|**bf_eia923**|EIA 923 boiler fuel consumption data.|
|**pu_eia**|A dataframe of EIA plant-utility associations.|
|<font color='red'>**EPA Output Tables**</font>|
|**hourly_emissions_epacems**||
|**transmission_single_epaipm**||
|**transmission_joint_epaipm** ||
|**load_curves_epaipm**||
|**plant_region_map_epaipm** ||
|<font color='red'>**Conglomerate Output Tables**</font>|
|**bga**|More complete EIA/PUDL boiler-generator associations.|
|**hr_by_unit**|Calculate and return generation unit level heat rates.|
|**hr_by_gen**|Calculate and return generator level heat rates (mmBTU/MWh).|
|**fuel_cost**|Calculate and return generator level fuel costs per MWh.|
|**capacity_factor**|Calculate and return generator level capacity factors.|
|**mcoe**|Calculate and return generator level MCOE based on EIA data. Eventually this calculation will include non-fuel operating expenses as reported in FERC Form 1, but for now only the fuel costs reported to EIA are included. They are attributed based on the unit-level heat rates and fuel costs.|
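
The aggregation described for **fbp_ferc1** can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not pudl's actual implementation: it covers heat content only (fuel cost would follow the same pattern), the column names and data are hypothetical, and both the `0.5` threshold and the `"unknown"` fallback label are assumed values.

```python
import pandas as pd

# Hypothetical per-fuel records; column names are illustrative, not PUDL's.
fuel = pd.DataFrame(
    {
        "plant_id": [1, 1, 1, 2],
        "report_year": [2018, 2018, 2018, 2018],
        "fuel_type": ["coal", "coal", "gas", "gas"],
        "fuel_qty": [100.0, 50.0, 20.0, 400.0],
        "mmbtu_per_unit": [20.0, 20.0, 1.0, 1.0],
    }
)

# Heat content per record = units consumed * heat content per unit.
fuel["fuel_mmbtu"] = fuel["fuel_qty"] * fuel["mmbtu_per_unit"]

# Sum by plant-year and fuel type, then pivot to one column per fuel type.
by_fuel = fuel.pivot_table(
    index=["plant_id", "report_year"],
    columns="fuel_type",
    values="fuel_mmbtu",
    aggfunc="sum",
    fill_value=0.0,
)

# Divide each fuel's heat content by the plant-year total to get proportions.
fractions = by_fuel.div(by_fuel.sum(axis=1), axis=0)

# Label the plant-year with its largest fuel if that share meets the threshold.
threshold = 0.5  # assumed value for illustration
primary = fractions.idxmax(axis=1).where(
    fractions.max(axis=1) >= threshold, "unknown"
)
print(fractions.assign(primary_fuel=primary))
```

Plant 1 has two coal records that are summed before pivoting, which mirrors the note in the description that a plant can have more than one record for a given fuel type.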

<a id='ferc'></a>
## FERC Output Tables