# Working with the EIA Extract / Transform
This notebook lets you inspect the dataframes produced by the extract and transform Dagster assets for the EIA 860 and 923 datasets. This makes it easier to test and add new years of data, or new tables from the various spreadsheets that haven't been integrated yet.

**Note: This notebook does not rerun the ETL steps. It just loads the dataframes returned by an asset of the most recent dagster run.** To debug the EIA ETL:

1. Materialize all EIA assets in dagit.
2. Load and inspect the dataframe for an asset of interest in this notebook.
3. Make some code changes to that asset.
4. Rematerialize the asset in dagit. There is no need to rematerialize assets that you didn't update.
5. Load and inspect the dataframe for the asset of interest again.
6. Repeat steps 3 through 5 until the ETL works!

Some assets are written to the database, in which case you can pull the tables into pandas or explore them in the database directly. However, many assets use the default IO Manager, which writes asset values to the `$DAGSTER_HOME/storage/` directory as pickle files. Dagster provides a method for inspecting asset values no matter which IO Manager the asset uses.
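
To make the storage layout concrete, here is a minimal sketch of reading one pickled asset by hand. It assumes the default IO Manager stores each asset as a pickle file named after its asset key directly under `$DAGSTER_HOME/storage/`; that layout is an implementation detail and may differ between Dagster versions. `load_pickled_asset` is a hypothetical helper, not part of PUDL or Dagster.

```python
import os
import pickle
from pathlib import Path


def load_pickled_asset(asset_name, storage_dir=None):
    """Unpickle one asset value written by the default IO Manager.

    Assumption (not checked against any particular Dagster version):
    the file lives at <storage_dir>/<asset_name>.
    """
    if storage_dir is None:
        # Fall back to the location described above.
        storage_dir = Path(os.environ["DAGSTER_HOME"]) / "storage"
    asset_path = Path(storage_dir) / asset_name
    if not asset_path.is_file():
        raise FileNotFoundError(f"No materialized asset found at {asset_path}")
    with asset_path.open("rb") as f:
        return pickle.load(f)
```

For example, `load_pickled_asset("raw_generator_retired_eia860")` would return the stored value if that asset has been materialized and the layout assumption holds. Only unpickle files you trust, and prefer Dagster's own loading method for everyday use since it works for every IO Manager.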

In [None]:
import os

assert os.environ.get("DAGSTER_HOME"), (
    "The DAGSTER_HOME env var is not set so dagster won't be able to find the assets. "
    "Set the DAGSTER_HOME env var in this notebook or kill the jupyter server and set"
    " the DAGSTER_HOME env var in your shell and relaunch jupyter."
)
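
If you would rather set the variable from inside the notebook, one option is sketched below. `ensure_dagster_home` is a hypothetical helper, and whatever directory you pass it must be the same one dagit used when the assets were materialized. It needs to run before any asset is loaded.

```python
import os
from pathlib import Path


def ensure_dagster_home(fallback_dir):
    """Return DAGSTER_HOME, setting it to fallback_dir only if it is unset or empty."""
    if not os.environ.get("DAGSTER_HOME"):
        # Dagster expects an absolute path, so expand "~" and resolve it.
        os.environ["DAGSTER_HOME"] = str(Path(fallback_dir).expanduser().resolve())
    return os.environ["DAGSTER_HOME"]
```

An existing value is never overwritten, so a `DAGSTER_HOME` exported in your shell still takes precedence over the fallback.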

In [None]:
%load_ext autoreload
%autoreload 3
import logging
import sys
from pathlib import Path

import pandas as pd

import pudl

pd.options.display.max_columns = None

In [None]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter("%(message)s")
handler.setFormatter(formatter)
logger.handlers = [handler]

In [None]:
from dagster import AssetKey, AssetSelection

from pudl.etl import default_assets, defs
from pudl.helpers import get_asset_group_keys
from pudl.resources import dataset_settings

# EIA-860

## Inspect the raw EIA-860 / EIA-860m tables

In [None]:
get_asset_group_keys("raw_eia860", default_assets)

In [None]:
%%time
asset_key = "raw_generator_retired_eia860"
df = defs.load_asset_value(AssetKey(asset_key))

df.head()

## Inspect the clean pre-harvested EIA-860 / EIA-860m tables

In [None]:
%%time
get_asset_group_keys("clean_eia860", default_assets)

In [None]:
%%time
asset_key = "clean_generators_eia860"
df = defs.load_asset_value(AssetKey(asset_key))

df.head()

# EIA-923

## Inspect the raw EIA-923 tables

In [None]:
get_asset_group_keys("raw_eia923", default_assets)

In [None]:
%%time
asset_key = "raw_generator_eia923"
df = defs.load_asset_value(AssetKey(asset_key))

df.head()

## Inspect the clean pre-harvested EIA-923 tables

In [None]:
get_asset_group_keys("clean_eia923", default_assets)

In [None]:
%%time
asset_key = "clean_generation_eia923"
df = defs.load_asset_value(AssetKey(asset_key))

df.head()

## Inspect the final harvested EIA tables

In [None]:
get_asset_group_keys("norm_eia", default_assets)

In [None]:
%%time
asset_key = "fuel_receipts_costs_eia923"
df = defs.load_asset_value(AssetKey(asset_key))

df.head()