# Working with the EIA Extract / Transform
This notebook steps through PUDL's extract and transform steps for the EIA 860 and 923 datasets, to make it easier to test and add new years of data, or new tables from the various spreadsheets that haven't been integrated yet.

## Dagster
The extract and transform steps for the EIA 860 and 923 datasets are organized into asset groups, where each asset is a dataframe produced by an ETL step. A Dagster job is created and executed for each asset group. **Jobs will fail if they are not run in the correct order, i.e. the extract asset group jobs must run before the transform jobs.** Dagster stores the asset outputs in a temporary directory that is available for the lifetime of the Jupyter kernel. If you restart the kernel, you will need to rerun all of the extract and transform steps.
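
The ordering requirement can be illustrated with a toy analogy that uses only the standard library. This is a sketch of the idea, not of how Dagster actually stores assets: the `run_extract` and `run_transform` functions, the `raw_plants` file name, and the sample row are all hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the temporary storage directory. It is removed
# when `storage` is cleaned up, much like the asset outputs are lost when
# the kernel restarts.
storage = tempfile.TemporaryDirectory()
storage_dir = Path(storage.name)


def run_extract() -> None:
    """Write a toy 'raw' asset into the temporary storage directory."""
    raw_rows = [{"plant_id_eia": 3, "report_year": 2021}]
    (storage_dir / "raw_plants").write_bytes(pickle.dumps(raw_rows))


def run_transform() -> list[dict]:
    """Read the toy 'raw' asset and derive a 'clean' asset from it."""
    raw_path = storage_dir / "raw_plants"
    if not raw_path.exists():
        raise FileNotFoundError("raw_plants is missing: run the extract step first")
    raw_rows = pickle.loads(raw_path.read_bytes())
    return [{**row, "report_date": f"{row['report_year']}-01-01"} for row in raw_rows]


# Running the transform before the extract fails because its input is missing.
try:
    run_transform()
    transform_failed_first = False
except FileNotFoundError:
    transform_failed_first = True

# Running the steps in the correct order succeeds.
run_extract()
clean_rows = run_transform()
```

The same reasoning applies to a kernel restart: once the temporary directory is gone, the transform step has nothing to read until the extract step is rerun.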

In [1]:
%load_ext autoreload
%autoreload 3
import pudl
import logging
import sys
from pathlib import Path
import pandas as pd
pd.options.display.max_columns = None

pudl_settings is being depcrated in favor of environment variables PUDL_OUTPUT and PUDL_CACHE
pudl_settings is being depcrated in favor of environment variables PUDL_OUTPUT and PUDL_CACHE. For more infosee: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/dev_setup.html
sqlite and parquet directories are no longer being used. Make sure there is a single directory named 'output' at the root of your workspace. For more info see: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/dev_setup.html

In [2]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]
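
The cell above replaces the root logger's handlers with a single handler that prints only the bare message. In the outputs further down, some messages show up twice, once with a timestamp and level prefix and once as the bare message. One plausible explanation (an assumption, not something this notebook confirms) is that a library logger has its own formatted handler and also propagates its records to the root logger. A minimal sketch of that mechanism, with a hypothetical logger name and format string, writing to in-memory buffers instead of `sys.stdout`:

```python
import io
import logging

# Root logger configured like the cell above, but writing to a buffer.
root_stream = io.StringIO()
root = logging.getLogger()
root.setLevel(logging.INFO)
root_handler = logging.StreamHandler(stream=root_stream)
root_handler.setFormatter(logging.Formatter("%(message)s"))
root.handlers = [root_handler]

# Hypothetical library logger that attaches its own formatted handler.
lib_stream = io.StringIO()
lib_logger = logging.getLogger("hypothetical.library")
lib_handler = logging.StreamHandler(stream=lib_stream)
lib_handler.setFormatter(logging.Formatter("[%(levelname)8s] %(name)s %(message)s"))
lib_logger.addHandler(lib_handler)

# One call, two outputs: the library handler formats the record, and then
# the record propagates to the root handler, which prints the bare message.
lib_logger.info("Extracting spreadsheet data.")

root_output = root_stream.getvalue()
lib_output = lib_stream.getvalue()
```

If the duplicated lines become distracting, setting `propagate = False` on the library logger or raising the root logger's level are the usual standard-library ways to silence one of the two copies.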

## Set the scope for the Extract-Transform:

In [3]:
from pudl.settings import Eia860Settings, Eia923Settings, EiaSettings, DatasetsSettings
from pudl.metadata.classes import DataSource

eia860_data_source = DataSource.from_id("eia860")
eia860_settings = Eia860Settings(
    # Limit the years as needed if you're testing only a few of them. E.g.:
    years=[2021],
    # years=eia860_data_source.working_partitions["years"],
)

eia923_data_source = DataSource.from_id("eia923")
eia923_settings = Eia923Settings(
    # Limit the years as needed if you're testing only a few of them. E.g.:
    years=[2021],
    # years=eia923_data_source.working_partitions["years"],
)

eia_settings = EiaSettings(eia860=eia860_settings, eia923=eia923_settings)
eia_settings

EiaSettings(eia860=Eia860Settings(years=[2021], eia860m=True), eia923=Eia923Settings(years=[2021]))

In [4]:
from dagster import AssetSelection, Definitions, JobDefinition, define_asset_job, fs_io_manager

import pudl
from pudl.etl import default_assets

def get_job_from_asset_group(asset_group: str, eia_settings: EiaSettings) -> JobDefinition:
    """Get a job that executes a group of assets.
    
    Args:
        asset_group: the asset group to create a job for.
        eia_settings: the EIA dataset settings used to configure the job.
        
    Returns:
        A job definition for the subset of assets
    """
    # Select the assets in the asset_group
    selection = AssetSelection.groups(asset_group)
    
    # Create dataset configuration
    dataset_settings = DatasetsSettings().dict()
    dataset_settings["eia"] = eia_settings.dict()
    config = {
        "resources": {
            "dataset_settings": {"config": dataset_settings}
        }
    }
    
    jobs = [define_asset_job(f"{asset_group}_job", config=config, selection=selection)]
    
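    # Copy the default resources so that the override below does not
    # mutate the shared pudl.etl.default_resources dictionary.
    resources = dict(pudl.etl.default_resources)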
    
    # Replace the pudl_sqlite_io_manager with the default IO Manager
    # so the final dataframes are loaded to a temporary directory
    # instead of the actual database.
    resources["pudl_sqlite_io_manager"] = fs_io_manager

    return Definitions(
        assets=pudl.etl.default_assets,
        resources=resources,
        jobs=jobs,
    ).get_job_def(f"{asset_group}_job")

def get_asset_group_keys(asset_group: str) -> list[str]:
    """Get a list of asset names in a given asset group.
    
    Args:
        asset_group: the name of the asset group.
    
    Returns:
        A list of asset names in the asset_group.
    """
    asset_keys = AssetSelection.groups(asset_group).resolve(default_assets)
    return [asset.to_python_identifier() for asset in list(asset_keys)]
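
`get_job_from_asset_group` swaps the `pudl_sqlite_io_manager` entry of a resource dictionary for `fs_io_manager`. When one entry of a shared, module-level dictionary is overridden, assigning into the dictionary directly changes it for every other user of that module, while overriding on a copy leaves the defaults intact. A minimal sketch of the difference, using plain strings as hypothetical stand-ins for the real resource objects:

```python
# Hypothetical stand-in for a module-level dictionary of default resources.
DEFAULT_RESOURCES = {
    "pudl_sqlite_io_manager": "sqlite_io_manager",
    "datastore": "datastore",
}


def override_in_place(key: str, value: str) -> dict[str, str]:
    """Override one entry directly on the shared dictionary."""
    resources = DEFAULT_RESOURCES  # same object, not a copy
    resources[key] = value
    return resources


def override_on_copy(key: str, value: str) -> dict[str, str]:
    """Override one entry on a shallow copy, leaving the defaults alone."""
    return {**DEFAULT_RESOURCES, key: value}


# Overriding on a copy does not touch the shared defaults.
copied = override_on_copy("pudl_sqlite_io_manager", "fs_io_manager")
defaults_after_copy = dict(DEFAULT_RESOURCES)

# Overriding in place changes the shared defaults for everyone.
override_in_place("pudl_sqlite_io_manager", "fs_io_manager")
defaults_after_in_place = dict(DEFAULT_RESOURCES)
```

In a notebook the in-place variant is mostly harmless, because the kernel is short-lived, but the copy avoids surprises if other code in the same session expects the original defaults.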

# EIA-860

## Extract just the EIA-860 / EIA-860m

In [5]:
%%time
eia860_raw_assets_job = get_job_from_asset_group("eia860_raw_assets", eia_settings=eia_settings)
eia860_raw_assets_job_result = eia860_raw_assets_job.execute_in_process()
assert eia860_raw_assets_job_result.success

2023-03-03 16:04:56 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - RUN_START - Started execution of run for "eia860_raw_assets_job".
2023-03-03 16:04:56 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - ENGINE_EVENT - Executing steps in process (pid: 72134)
2023-03-03 16:04:56 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - extract_eia860 - RESOURCE_INIT_STARTED - Starting initialization of resources [dataset_settings, datastore, io_manager].
2023-03-03 16:04:56 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - extract_eia860 - RESOURCE_INIT_SUCCESS - Finished initialization of resources [dataset_settings, datastore, io_manager].
2023-03-03 16:04:56 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - LOGS_CAPTURED - Started capturing logs in process (p

Extracting eia860 spreadsheet data.
Boiler Cooling
Boiler Generator
Boiler Mercury
Boiler NOx
Boiler Particulate Matter
Boiler SO2
Boiler Stack Flue
Emissions Control Equipment
Boiler Info & Design Parameters
Cooling
Emission Standards & Strategies
FGD
FGP
Stack Flue
Operable
Proposed
Retired and Canceled
Operable
Proposed
Retired and Canceled
Ownership
Plant
Utility


2023-03-03 16:05:27 [    INFO] catalystcoop.pudl.extract.excel:237 Extracting eia860m spreadsheet data.


Extracting eia860m spreadsheet data.
Operating
Canceled or Postponed
Operating_PR
Planned
Planned_PR
Retired
Retired_PR


2023-03-03 16:05:38 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - extract_eia860 - STEP_OUTPUT - Yielded output "raw_boiler_cooling_eia860" of type "Any". (Type check passed).
2023-03-03 16:05:38 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - extract_eia860 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmp37fkh5hd/storage/raw_boiler_cooling_eia860
2023-03-03 16:05:38 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - extract_eia860 - ASSET_MATERIALIZATION - Materialized value raw_boiler_cooling_eia860.
2023-03-03 16:05:38 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 - 72134 - extract_eia860 - HANDLED_OUTPUT - Handled output "raw_boiler_cooling_eia860" using IO manager "io_manager"
2023-03-03 16:05:38 -0900 - dagster - DEBUG - eia860_raw_assets_job - 44d5d180-f907-469d-8d6d-e0eac2290116 

CPU times: user 41 s, sys: 526 ms, total: 41.6 s
Wall time: 43 s


In [6]:
get_asset_group_keys("eia860_raw_assets")

['raw_ownership_eia860',
 'raw_cooling_equipment_eia860',
 'raw_boiler_stack_flue_eia860',
 'raw_utility_eia860',
 'raw_generator_existing_eia860',
 'raw_boiler_nox_eia860',
 'raw_boiler_pm_eia860',
 'raw_boiler_mercury_eia860',
 'raw_plant_eia860',
 'raw_generator_proposed_eia860',
 'raw_multifuel_existing_eia860',
 'raw_boiler_generator_assn_eia860',
 'raw_fgd_equipment_eia860',
 'raw_multifuel_retired_eia860',
 'raw_boiler_cooling_eia860',
 'raw_emissions_control_equipment_eia860',
 'raw_generator_eia860',
 'raw_generator_retired_eia860',
 'raw_boiler_info_eia860',
 'raw_boiler_so2_eia860',
 'raw_emission_control_strategies_eia860',
 'raw_stack_flue_equipment_eia860',
 'raw_fgp_equipment_eia860']

In [7]:
# Grab the dataframe for a given asset key.
raw_utility_eia860 = eia860_raw_assets_job_result.asset_value("raw_utility_eia860")
raw_utility_eia860.head()

Unnamed: 0,address_2,address_3,attention_line,city,contact_firstname,contact_firstname_2,contact_lastname,contact_lastname_2,contact_title,contact_title_2,data_maturity,entity_type,phone_extension,phone_extension_2,phone_number_first,phone_number_first_2,phone_number_last,phone_number_last_2,phone_number_mid,phone_number_mid_2,plants_reported_asset_manager,plants_reported_operator,plants_reported_other_relationship,plants_reported_owner,regulated,report_year,state,street_address,utility_id_eia,utility_name_eia,zip_code,zip_code_4
0,,,,Decatur,,,,,,,final,IND,,,,,,,,,,,,Y,,2021.0,IL,2200 East Eldorado Street,7.0,"Primary Products Ingredients Americas, LLC",62525,
1,,,,Lafayette,,,,,,,final,IND,,,,,,,,,,,,Y,,2021.0,IN,2245 Sagamore Parkway North,8.0,Tate & Lyle Ingredients Americas Inc,47904,
2,,,,Dresden,,,,,,,final,Q,,,,,,,,,,,,Y,,2021.0,NY,PO Box 187590 Plant Rd,25.0,Greenidge Generation Holdings LLC,14441,
3,,,,Abbeville,,,,,,,final,M,,,,,,,,,,,,Y,,2021.0,SC,P O Box 639,34.0,City of Abbeville - (SC),29620,
4,,,,Cumberland,,,,,,,final,Q,,,,,,,,,,,,Y,,2021.0,MD,11600 Mexico Farms Road,35.0,AES WR Ltd Partnership,21502,


## Transform just the EIA-860 / EIA-860m

In [8]:
%%time
pre_harvested_eia860_assets_job = get_job_from_asset_group("pre_harvested_eia860_assets", eia_settings=eia_settings)
pre_harvested_eia860_assets_job_result = pre_harvested_eia860_assets_job.execute_in_process()
assert pre_harvested_eia860_assets_job_result.success

2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - RUN_START - Started execution of run for "pre_harvested_eia860_assets_job".
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - ENGINE_EVENT - Executing steps in process (pid: 70373)
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - LOGS_CAPTURED - Started capturing logs in process (pid: 70373).
2023-03-03 15:46:47 

Recoding boiler_generator_assn_eia860.boiler_generator_assn_type_code


2023-03-03 15:46:47 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding boiler_generator_assn_eia860.steam_plant_type_code


Recoding boiler_generator_assn_eia860.steam_plant_type_code


2023-03-03 15:46:47 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding boiler_generator_assn_eia860.data_maturity


Recoding boiler_generator_assn_eia860.data_maturity


2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_boiler_generator_assn_eia860 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - clean_boiler_generator_assn_eia860 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_boiler_generator_assn_eia860
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_boiler_generator_assn_eia860 - ASSET_MATERIALIZATION - Materialized value clean_boiler_generator_assn_eia860.
2023-03-03 15:46:47 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_boiler_generator_assn_eia860 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03

Recoding generators_eia860.operational_status_code


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.prime_mover_code


Recoding generators_eia860.prime_mover_code


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_1


Recoding generators_eia860.energy_source_code_1


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_2


Recoding generators_eia860.energy_source_code_2


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_3


Recoding generators_eia860.energy_source_code_3


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_4


Recoding generators_eia860.energy_source_code_4


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_5


Recoding generators_eia860.energy_source_code_5


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_6


Recoding generators_eia860.energy_source_code_6


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_1


Recoding generators_eia860.energy_source_1_transport_1


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_2


Recoding generators_eia860.energy_source_1_transport_2


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_3


Recoding generators_eia860.energy_source_1_transport_3


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_1


Recoding generators_eia860.energy_source_2_transport_1


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_2


Recoding generators_eia860.energy_source_2_transport_2


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_3


Recoding generators_eia860.energy_source_2_transport_3


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.planned_new_prime_mover_code


Recoding generators_eia860.planned_new_prime_mover_code


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.planned_energy_source_code_1


Recoding generators_eia860.planned_energy_source_code_1


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_1


Recoding generators_eia860.startup_source_code_1


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_2


Recoding generators_eia860.startup_source_code_2


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_3


Recoding generators_eia860.startup_source_code_3


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_4


Recoding generators_eia860.startup_source_code_4


2023-03-03 15:46:55 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.data_maturity


Recoding generators_eia860.data_maturity


2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_generators_eia860 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - clean_generators_eia860 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_generators_eia860
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_generators_eia860 - ASSET_MATERIALIZATION - Materialized value clean_generators_eia860.
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_generators_eia860 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets

Recoding ownership_eia860.data_maturity


2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_ownership_eia860 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - clean_ownership_eia860 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_ownership_eia860
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_ownership_eia860 - ASSET_MATERIALIZATION - Materialized value clean_ownership_eia860.
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_ownership_eia860 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:46:56 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job -

Recoding plants_eia860.balancing_authority_code_eia


2023-03-03 15:46:57 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.sector_id_eia


Recoding plants_eia860.sector_id_eia


2023-03-03 15:46:57 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.data_maturity


Recoding plants_eia860.data_maturity


2023-03-03 15:46:57 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_plants_eia860 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:46:57 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - clean_plants_eia860 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_plants_eia860
2023-03-03 15:46:57 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_plants_eia860 - ASSET_MATERIALIZATION - Materialized value clean_plants_eia860.
2023-03-03 15:46:57 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba0-a4d4-9217d183e00d - 70373 - clean_plants_eia860 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:46:57 -0900 - dagster - DEBUG - pre_harvested_eia860_assets_job - dbf85926-b58c-4ba

CPU times: user 9.71 s, sys: 355 ms, total: 10.1 s
Wall time: 10.3 s


In [9]:
get_asset_group_keys("pre_harvested_eia860_assets")

['clean_ownership_eia860',
 'clean_boiler_generator_assn_eia860',
 'clean_utilities_eia860',
 'clean_plants_eia860',
 'clean_generators_eia860']
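
The lists returned by `get_asset_group_keys` are not in alphabetical order, as the output above shows. A small sketch of sorting the names and filtering them by a substring, assuming only that the function returns a list of strings. The list literal is copied from the output above, and the `matching_keys` helper is hypothetical:

```python
# Asset names copied from the output of get_asset_group_keys above.
asset_keys = [
    "clean_ownership_eia860",
    "clean_boiler_generator_assn_eia860",
    "clean_utilities_eia860",
    "clean_plants_eia860",
    "clean_generators_eia860",
]


def matching_keys(keys: list[str], fragment: str) -> list[str]:
    """Return the asset names containing `fragment`, sorted alphabetically."""
    return sorted(key for key in keys if fragment in key)


sorted_keys = sorted(asset_keys)
generator_keys = matching_keys(asset_keys, "generator")
```

Any of the resulting names can then be passed to the job result's `asset_value` method, as in the next cell.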

In [10]:
asset_key = "clean_ownership_eia860"
pre_harvested_eia860_assets_job_result.asset_value(asset_key).head()

Unnamed: 0,plant_id_eia,generator_id,data_maturity,fraction_owned,operational_status_code,owner_city,owner_name,owner_state,owner_street_address,owner_utility_id_eia,owner_zip_code,plant_name_eia,state,utility_id_eia,utility_name_eia,report_date,owner_country
0,10,1,final,0.6,OP,Birmingham,Alabama Power Co,AL,600 North 18th Street,195,35291,Greene County,AL,195,Alabama Power Co,2021-01-01,USA
1,10,1,final,0.4,OP,Gulfport,Mississippi Power Co,MS,2992 West Beach Boulevard,12686,39501,Greene County,AL,195,Alabama Power Co,2021-01-01,USA
2,10,2,final,0.6,OP,Birmingham,Alabama Power Co,AL,600 North 18th Street,195,35291,Greene County,AL,195,Alabama Power Co,2021-01-01,USA
3,10,2,final,0.4,OP,Gulfport,Mississippi Power Co,MS,2992 West Beach Boulevard,12686,39501,Greene County,AL,195,Alabama Power Co,2021-01-01,USA
4,26,1,final,0.5,OP,Birmingham,Alabama Power Co,AL,600 North 18th Street,195,35291,E C Gaston,AL,195,Alabama Power Co,2021-01-01,USA


# EIA-923

## Extract just the EIA-923

In [11]:
%%time
eia923_raw_assets_job = get_job_from_asset_group("eia923_raw_assets", eia_settings=eia_settings)
eia923_raw_assets_job_result = eia923_raw_assets_job.execute_in_process()
assert eia923_raw_assets_job_result.success



2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - RUN_START - Started execution of run for "eia923_raw_assets_job".
2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - ENGINE_EVENT - Executing steps in process (pid: 70373)


2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - RESOURCE_INIT_STARTED - Starting initialization of resources [dataset_settings, datastore, io_manager].
2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - RESOURCE_INIT_SUCCESS - Finished initialization of resources [dataset_settings, datastore, io_manager].
2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - LOGS_CAPTURED - Started capturing logs in process (pid: 70373).
2023-03-03 15:46:58 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - STEP_START - Started execution of step "extract_eia923".
2023-03-03 15:46:58 [    INFO] catalystcoop.pudl.extract.excel:237 Extracting eia923 spreadsheet data.


Extracting eia923 spreadsheet data.
Page 1 Energy Storage
Page 1 Generation and Fuel Data
Page 1 Puerto Rico
Page 2 Oil Stocks Data
Page 2 Stocks Data
Page 3 Boiler Fuel Data
Page 4 Generator Data
Page 5 Fuel Receipts and Costs
Page 6 Plant Frame
Page 6 Plant Frame Puerto Rico


2023-03-03 15:47:25 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - STEP_OUTPUT - Yielded output "raw_boiler_fuel_eia923" of type "Any". (Type check passed).
2023-03-03 15:47:25 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - extract_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/raw_boiler_fuel_eia923
2023-03-03 15:47:25 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - ASSET_MATERIALIZATION - Materialized value raw_boiler_fuel_eia923.
2023-03-03 15:47:25 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - extract_eia923 - HANDLED_OUTPUT - Handled output "raw_boiler_fuel_eia923" using IO manager "io_manager"
2023-03-03 15:47:25 -0900 - dagster - DEBUG - eia923_raw_assets_job - 7b9cbfcc-852a-4bea-b63b-c884c3efce6e - 70373 - ex

CPU times: user 27.4 s, sys: 145 ms, total: 27.5 s
Wall time: 27.7 s


In [12]:
get_asset_group_keys("eia923_raw_assets")

['raw_generator_eia923',
 'raw_generation_fuel_eia923',
 'raw_boiler_fuel_eia923',
 'raw_stocks_eia923',
 'raw_fuel_receipts_costs_eia923']

In [13]:
asset_key = "raw_generator_eia923"
raw_generator_eia923 = eia923_raw_assets_job_result.asset_value(asset_key)
raw_generator_eia923.head()

Unnamed: 0,balancing_authority_code_eia,census_region,combined_heat_power,data_maturity,early_release,generator_id,naics_code,nerc_region,net_generation_mwh_april,net_generation_mwh_august,net_generation_mwh_december,net_generation_mwh_february,net_generation_mwh_january,net_generation_mwh_july,net_generation_mwh_june,net_generation_mwh_march,net_generation_mwh_may,net_generation_mwh_november,net_generation_mwh_october,net_generation_mwh_september,net_generation_mwh_year_to_date,operator_id,operator_name,plant_id_eia,plant_name_eia,plant_state,prime_mover_code,report_year,reporting_frequency_code,sector_id_eia,sector_name_eia
0,SOCO,ESC,N,final,,A2C2,22.0,SERC,117440,125106,77049,104273,135871,122586,121103,133825,126394,110123,125453,106784,1406007.0,195.0,Alabama Power Co,3,Barry,AL,CT,2021.0,M,1.0,Electric Utility
1,SOCO,ESC,N,final,,A1CT2,22.0,SERC,65466,130369,128130,110624,125896,128816,121644,10190,112340,112650,84469,112429,1243023.0,195.0,Alabama Power Co,3,Barry,AL,CT,2021.0,M,1.0,Electric Utility
2,SOCO,ESC,N,final,,A1ST,22.0,SERC,95954,131299,134911,106021,133937,130611,126197,18635,112335,121708,88350,55482,1255440.0,195.0,Alabama Power Co,3,Barry,AL,CA,2021.0,M,1.0,Electric Utility
3,SOCO,ESC,N,final,,2,22.0,SERC,0,-179,-502,310,-437,-175,-179,-46,-116,-386,-178,-127,-2015.0,195.0,Alabama Power Co,3,Barry,AL,ST,2021.0,M,1.0,Electric Utility
4,SOCO,ESC,N,final,,A2C1,22.0,SERC,102454,125593,76781,99333,134091,122663,119548,130806,125651,109456,123974,107128,1377478.0,195.0,Alabama Power Co,3,Barry,AL,CT,2021.0,M,1.0,Electric Utility


## Transform just the EIA-923

In [14]:
%%time
pre_harvested_eia923_assets_job = get_job_from_asset_group("pre_harvested_eia923_assets", eia_settings=eia_settings)
pre_harvested_eia923_assets_job_result = pre_harvested_eia923_assets_job.execute_in_process()
assert pre_harvested_eia923_assets_job_result.success

2023-03-03 15:47:25 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - RUN_START - Started execution of run for "pre_harvested_eia923_assets_job".
2023-03-03 15:47:25 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - ENGINE_EVENT - Executing steps in process (pid: 70373)
2023-03-03 15:47:25 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - RESOURCE_INIT_STARTED - Starting initialization of resources [io_manager].
2023-03-03 15:47:25 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - RESOURCE_INIT_SUCCESS - Finished initialization of resources [io_manager].
2023-03-03 15:47:25 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - LOGS_CAPTURED - Started capturing logs in process (pid: 70373).
2023-03-03 15:47:25 

Recoding boiler_fuel_eia923.energy_source_code


2023-03-03 15:47:26 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_boiler_fuel_eia923 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:47:26 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - clean_boiler_fuel_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_boiler_fuel_eia923
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_boiler_fuel_eia923 - ASSET_MATERIALIZATION - Materialized value clean_boiler_fuel_eia923.
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_boiler_fuel_eia923 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3492 entries, 121 to 8574
Data columns (total 11 columns):
 #   Column               Non-Null Count  Dtype         
---  ------               --------------  -----         
 0   boiler_id            3492 non-null   object        
 1   energy_source_code   3492 non-null   string        
 2   plant_id_eia         3492 non-null   Int64         
 3   prime_mover_code     3492 non-null   object        
 4   sector_id_eia        3492 non-null   float64       
 5   sector_name_eia      3492 non-null   object        
 6   ash_content_pct      3352 non-null   float64       
 7   fuel_consumed_units  3352 non-null   float64       
 8   fuel_mmbtu_per_unit  3352 non-null   float64       
 9   sulfur_content_pct   3352 non-null   float64       
 10  report_date          3492 non-null   datetime64[ns]
dtypes: Int64(1), datetime64[ns](1), float64(5), object(3), string(1)
memory usage: 330.8+ KB
Aggregate boilers: None


2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - ASSET_OBSERVATION - ASSET_OBSERVATION for step clean_coalmine_eia923
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - LOADED_INPUT - Loaded input "raw_fuel_receipts_costs_eia923" using input manager "io_manager"
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - STEP_INPUT - Got input "raw_fuel_receipts_costs_eia923" of type "Any". (Type check passed).
2023-03-03 15:47:27 [    INFO] catalystcoop.pudl.helpers:207 Assigned state FIPS codes for 21.35% of records.
2023-03-03 15:47:27 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.mine_type_code
2023-03-03 15:47:27 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.data_maturity
2023-03-03 15:47:27 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.mine_type_code
2023-03-03 15:47:27 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.data_maturity
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - clean_coalmine_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_coalmine_eia923
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - ASSET_MATERIALIZATION - Materialized value clean_coalmine_eia923.
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_coalmine_eia923 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:47:27 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76

Recoding generation_eia923.data_maturity


2023-03-03 15:47:28 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_generation_eia923 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:47:28 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - clean_generation_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_generation_eia923
2023-03-03 15:47:28 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_generation_eia923 - ASSET_MATERIALIZATION - Materialized value clean_generation_eia923.
2023-03-03 15:47:28 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_generation_eia923 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:47:28 -0900 - dagster - DEBUG - pre_harvested_eia923_assets

Recoding generation_fuel_eia923.energy_source_code


2023-03-03 15:47:31 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generation_fuel_eia923.fuel_type_code_aer
2023-03-03 15:47:31 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generation_fuel_eia923.prime_mover_code
2023-03-03 15:47:31 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generation_fuel_eia923.data_maturity
2023-03-03 15:47:32 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - generation_fuel_eia923 - STEP_OUTPUT - Yielded output "clean_generation_fuel_eia923" of type "Any". (Type check passed).
2023-03-03 15:47:32 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - generation_fuel_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_generation_fuel_eia923
2023-03-03 15:47:32 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - generation_fuel_eia923 - ASSET_MATERIALIZATION - Materialized value clean_generation_fuel_eia923.
2023-03-03 15:47:32 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - generation_fuel_eia923 - HANDLED_OUTPUT - Handled output "clean_generation_fuel_eia923" using IO manager "io_manager"
2023-03-03 15:47:32 -09

Assigned state FIPS codes for 21.35% of records.


2023-03-03 15:47:33 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.mine_type_code
2023-03-03 15:47:33 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding coalmine_eia923.data_maturity
2023-03-03 15:47:34 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding fuel_receipts_costs_eia923.contract_type_code
2023-03-03 15:47:34 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding fuel_receipts_costs_eia923.energy_source_code
2023-03-03 15:47:34 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding fuel_receipts_costs_eia923.primary_transportation_mode_code
2023-03-03 15:47:34 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding fuel_receipts_costs_eia923.secondary_transportation_mode_code
2023-03-03 15:47:34 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding fuel_receipts_costs_eia923.data_maturity
2023-03-03 15:47:34 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_fuel_receipts_costs_eia923 - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2023-03-03 15:47:34 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - clean_fuel_receipts_costs_eia923 - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/clean_fuel_receipts_costs_eia923
2023-03-03 15:47:34 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_fuel_receipts_costs_eia923 - ASSET_MATERIALIZATION - Materialized value clean_fuel_receipts_costs_eia923.
2023-03-03 15:47:34 -0900 - dagster - DEBUG - pre_harvested_eia923_assets_job - 56e76cb2-aed3-4143-8378-4c475bbb6c51 - 70373 - clean_fuel_receipts_costs_eia923 - HANDLED_OUTPUT - Handled output "result" using IO manager "io_manager"
2023-03-03 15:47:34

CPU times: user 8.7 s, sys: 100 ms, total: 8.8 s
Wall time: 8.88 s


In [15]:
get_asset_group_keys("pre_harvested_eia923_assets")

['clean_generation_eia923',
 'clean_generation_fuel_nuclear_eia923',
 'clean_coalmine_eia923',
 'clean_boiler_fuel_eia923',
 'clean_fuel_receipts_costs_eia923',
 'clean_generation_fuel_eia923']

In [16]:
asset_key = "clean_generation_fuel_eia923"
pre_harvested_eia923_assets_job_result.asset_value(asset_key).head()

|  | balancing_authority_code_eia | data_maturity | energy_source_code | fuel_type_code_aer | plant_id_eia | prime_mover_code | reporting_frequency_code | sector_id_eia | sector_name_eia | fuel_consumed_for_electricity_mmbtu | fuel_consumed_for_electricity_units | fuel_consumed_mmbtu | fuel_consumed_units | fuel_mmbtu_per_unit | net_generation_mwh | fuel_type_code_pudl | report_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | final | DFO | DFO | 1 | IC | A | 1.0 | Electric Utility | 2724.0 | 466.0 | 2724.0 | 466.0 | 5.846 | 208.841 | oil | 2021-01-01 |
| 0 |  | final | DFO | DFO | 1 | IC | A | 1.0 | Electric Utility | 3005.0 | 514.0 | 3005.0 | 514.0 | 5.846 | 231.25 | oil | 2021-02-01 |
| 0 |  | final | DFO | DFO | 1 | IC | A | 1.0 | Electric Utility | 3250.0 | 556.0 | 3250.0 | 556.0 | 5.846 | 250.099 | oil | 2021-03-01 |
| 0 |  | final | DFO | DFO | 1 | IC | A | 1.0 | Electric Utility | 3128.0 | 535.0 | 3128.0 | 535.0 | 5.846 | 240.409 | oil | 2021-04-01 |
| 0 |  | final | DFO | DFO | 1 | IC | A | 1.0 | Electric Utility | 1619.0 | 277.0 | 1619.0 | 277.0 | 5.846 | 124.504 | oil | 2021-05-01 |
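
A quick sanity check on a freshly transformed table is to confirm that related columns agree with each other. The sketch below re-enters the five rows displayed above by hand (only three columns) and checks that `fuel_consumed_mmbtu` matches `fuel_consumed_units * fuel_mmbtu_per_unit`. The 0.5 MMBtu tolerance is an assumption, chosen because the reported totals appear to be rounded to whole numbers; it is not a threshold taken from PUDL.

```python
import pandas as pd

# The five rows of clean_generation_fuel_eia923 displayed above,
# re-entered by hand and limited to the columns needed for the check.
gf = pd.DataFrame(
    {
        "fuel_consumed_units": [466.0, 514.0, 556.0, 535.0, 277.0],
        "fuel_mmbtu_per_unit": [5.846] * 5,
        "fuel_consumed_mmbtu": [2724.0, 3005.0, 3250.0, 3128.0, 1619.0],
    }
)

# Heat content implied by physical units times heat content per unit.
implied_mmbtu = gf["fuel_consumed_units"] * gf["fuel_mmbtu_per_unit"]

# ASSUMPTION: a 0.5 MMBtu tolerance, since reported totals look rounded.
gf["mmbtu_consistent"] = (implied_mmbtu - gf["fuel_consumed_mmbtu"]).abs() <= 0.5

print(gf)
```

In the notebook, the same comparison could be run on the full dataframe returned by `asset_value(asset_key)` instead of the hand-entered rows.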


# Combined EIA Data

## Merge EIA-923/860, set dtypes, harvest entities

In [17]:
%%time
eia_harvested_assets_job = get_job_from_asset_group("eia_harvested_assets", eia_settings=eia_settings)
eia_harvested_assets_job_result = eia_harvested_assets_job.execute_in_process()

2023-03-03 15:47:34 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - RUN_START - Started execution of run for "eia_harvested_assets_job".
2023-03-03 15:47:34 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - ENGINE_EVENT - Executing steps in process (pid: 70373)
2023-03-03 15:47:34 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - eia_transform - RESOURCE_INIT_STARTED - Starting initialization of resources [dataset_settings, io_manager, pudl_sqlite_io_manager].
2023-03-03 15:47:34 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - eia_transform - RESOURCE_INIT_SUCCESS - Finished initialization of resources [dataset_settings, io_manager, pudl_sqlite_io_manager].
2023-03-03 15:47:34 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - LOGS_CAPTU

Harvesting IDs & consistently static attributes for EIA plants


2023-03-03 15:47:40 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.balancing_authority_code_eia
2023-03-03 15:47:40 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.reporting_frequency_code
2023-03-03 15:47:40 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.sector_id_eia
2023-03-03 15:47:40 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding plants_eia860.data_maturity
2023-03-03 15:47:46 [    INFO] catalystcoop.pudl.transform.eia:553 Average consistency of static plants values is 99.85%
2023-03-03 15:47:47 [    INFO] catalystcoop.pudl.transform.eia:1192 Harvesting IDs & consistently static attributes for EIA generators
2023-03-03 15:47:47 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.operational_status_code
2023-03-03 15:47:47 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.prime_mover_code
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_1
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_2
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_3
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_4
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_5
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_code_6
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_1
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_2
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_1_transport_3
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_1
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_2
2023-03-03 15:47:48 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.energy_source_2_transport_3
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.planned_new_prime_mover_code
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.planned_energy_source_code_1
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_1
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_2
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_3
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.startup_source_code_4
2023-03-03 15:47:49 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding generators_eia860.data_maturity
2023-03-03 15:48:04 [    INFO] catalystcoop.pudl.transform.eia:553 Average consistency of static generators values is 100.00%
2023-03-03 15:48:04 [    INFO] catalystcoop.pudl.transform.eia:1192 Harvesting IDs & consistently static attributes for EIA utilities
2023-03-03 15:48:04 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding utilities_eia860.data_maturity
2023-03-03 15:48:05 [    INFO] catalystcoop.pudl.transform.eia:553 Average consistency of static utilities values is 100.00%
2023-03-03 15:48:05 [    INFO] catalystcoop.pudl.transform.eia:1192 Harvesting IDs & consistently static attributes for EIA boilers
2023-03-03 15:48:05 [    INFO] catalystcoop.pudl.transform.eia:553 Average consistency of static boilers values is 99.18%
2023-03-03 15:48:05 [    INFO] catalystcoop.pudl.transform.eia:630 Inferring complete EIA boiler-generator associations.
Multiple EIA unit codes:plant_id_eia=10725, unit_id_pudl=1, unit_id_eia=['F801' 'F802']
Multiple EIA unit codes:plant_id_eia=55309, unit_id_pudl=1, unit_id_eia=['SMR2' 'SMR1']
Multiple EIA unit codes:plant_id_eia=57794, unit_id_pudl=1, unit_id_eia=['CC01' 'CC02']
Multiple EIA unit codes:plant_id_eia=60786, unit_id_pudl=1, unit_id_eia=['4343' '4141']
2023-03-03 15:48:10 [    INFO] catalystcoop.pudl.transform.eia:1076 filled 2 balancing authority codes using names.
2023-03-03 15:48:11 [    INFO] catalystcoop.pudl.metadata.classes:1673 Recoding boilers_entity_eia.prime_mover_code
2023-03-03 15:48:11 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - eia_transform - STEP_OUTPUT - Yielded output "boiler_fuel_eia923" of type "Any". (Type check passed).
2023-03-03 15:48:11 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - eia_transform - Writing file at: /var/folders/ts/zf71sqq50nx4d41fy5xtsxhw0000gn/T/tmptiaapj13/storage/boiler_fuel_eia923
2023-03-03 15:48:11 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - eia_transform - ASSET_MATERIALIZATION - Materialized value boiler_fuel_eia923.
2023-03-03 15:48:11 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 70373 - eia_transform - HANDLED_OUTPUT - Handled output "boiler_fuel_eia923" using IO manager "pudl_sqlite_io_manager"
2023-03-03 15:48:11 -0900 - dagster - DEBUG - eia_harvested_assets_job - 70597524-4e93-42e1-ab40-95e562399209 - 703

CPU times: user 35.9 s, sys: 556 ms, total: 36.5 s
Wall time: 36.8 s
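
The log above reports "Harvesting IDs & consistently static attributes" followed by an "Average consistency" figure for each entity. The idea can be illustrated with a toy example: for each entity ID, keep the most frequently reported value of a supposedly static column, then measure what share of the reported records agree with it. This is only a minimal sketch of the concept. The function name `harvest_static`, the toy data, and the way consistency is computed are all assumptions for illustration; the real logic lives in `pudl.transform.eia` and may differ.

```python
import pandas as pd


def harvest_static(df, id_col, static_col):
    """Pick the most frequently reported value of static_col for each id_col.

    Hypothetical helper for illustration only. Returns (harvested, consistency),
    where consistency is the share of non-null reported records that agree
    with the harvested value.
    """
    records = df[[id_col, static_col]].dropna()
    counts = records.groupby([id_col, static_col]).size().rename("n").reset_index()
    # Sort so the most common value for each entity comes first, then keep it.
    harvested = (
        counts.sort_values([id_col, "n"], ascending=[True, False])
        .drop_duplicates(subset=id_col)
        .set_index(id_col)[static_col]
    )
    agree = records[static_col] == records[id_col].map(harvested)
    return harvested, agree.mean()


# Made-up records: plant 1 is reported in two different states across years.
toy = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 1, 2, 2, 2],
        "state": ["AK", "AK", "AL", "TX", "TX", "TX"],
    }
)

harvested, consistency = harvest_static(toy, "plant_id_eia", "state")
print(harvested)
print(f"Consistency: {consistency:.2%}")
```

Here five of the six toy records agree with the harvested value, so one inconsistent report lowers the consistency for the column without changing which value is kept.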


In [18]:
get_asset_group_keys("eia_harvested_assets")

['utilities_eia860',
 'fuel_receipts_costs_eia923',
 'generation_fuel_nuclear_eia923',
 'boiler_generator_assn_eia860',
 'ownership_eia860',
 'generation_fuel_eia923',
 'generators_eia860',
 'plants_eia860',
 'generation_eia923',
 'boiler_fuel_eia923',
 'coalmine_eia923',
 'generators_entity_eia',
 'boilers_entity_eia',
 'plants_entity_eia',
 'utilities_entity_eia']

In [19]:
asset_key = "boiler_generator_assn_eia860"
eia_harvested_assets_job_result.asset_value(asset_key).head()

|  | plant_id_eia | report_date | generator_id | boiler_id | unit_id_eia | bga_source | boiler_generator_assn_type_code | steam_plant_type_code | data_maturity | unit_id_pudl |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2021-01-01 | 1 | 1 |  | eia860_org |  | 1 | final | 1 |
| 1 | 3 | 2021-01-01 | 2 | 2 |  | eia860_org |  | 1 | final | 2 |
| 2 | 3 | 2021-01-01 | 4 | 4 |  | eia860_org |  | 1 | final | 3 |
| 3 | 3 | 2021-01-01 | 5 | 5 |  | eia860_org |  | 1 | final | 4 |
| 5 | 3 | 2021-01-01 | A1ST | 6B | G521 | eia860_org |  | 1 | final | 6 |
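
The "Multiple EIA unit codes" messages in the job output flag cases where one `unit_id_pudl` within a plant maps to more than one `unit_id_eia`. A similar check can be run directly on the association table. The sketch below uses a small hand-built dataframe: the first two rows reuse the plant and unit IDs from one of the logged messages, while the `generator_id` values and the third row are made up for illustration.

```python
import pandas as pd

# Toy boiler-generator association records. Plant 10725 and its unit codes
# come from the logged message; generator IDs and the last row are made up.
bga = pd.DataFrame(
    {
        "plant_id_eia": [10725, 10725, 3],
        "generator_id": ["GEN1", "GEN2", "1"],
        "unit_id_eia": ["F801", "F802", None],
        "unit_id_pudl": [1, 1, 1],
    }
)

# Count distinct non-null EIA unit codes per (plant, PUDL unit).
n_codes = bga.groupby(["plant_id_eia", "unit_id_pudl"])["unit_id_eia"].nunique()

# Keep only the PUDL units associated with more than one EIA unit code.
multiple = n_codes[n_codes > 1]
print(multiple)
```

In the notebook, `bga` could instead be the dataframe returned by `eia_harvested_assets_job_result.asset_value("boiler_generator_assn_eia860")`.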
