In [None]:
%load_ext autoreload
%autoreload 3

# Purpose
We are in the process of converting some functions in `pudl.output` to be SQL views. This notebook allows us to compare the outputs of the old python functions with the SQL view.

In [None]:
import os

assert os.environ.get("DAGSTER_HOME"), (
    "The DAGSTER_HOME env var is not set so dagster won't be able to find the assets. "
    "Set the DAGSTER_HOME env var in this notebook or kill the jupyter server and set"
    " the DAGSTER_HOME env var in your terminal and relaunch jupyter."
)
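
If you prefer to set the variable from inside the notebook, a minimal sketch might look like the following; run it before the check above. The helper name `set_dagster_home` and the example path are hypothetical. Point it at the directory your Dagster instance already uses, since a fresh directory will not contain any materialized assets.

```python
import os
from pathlib import Path


def set_dagster_home(path: str) -> Path:
    """Set DAGSTER_HOME to an existing directory (hypothetical helper)."""
    home = Path(path).expanduser().resolve()
    if not home.is_dir():
        raise FileNotFoundError(f"DAGSTER_HOME directory does not exist: {home}")
    os.environ["DAGSTER_HOME"] = str(home)
    return home


# Example (the path is an assumption, replace it with your own):
# set_dagster_home("~/pudl_work/dagster_home")
```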

## Step 1: Create a new asset
Create a new asset in the same module as the existing output table function. Most output tables are just denormalized versions of the normalized tables, so to differentiate them, name the asset `denorm_{output_table_name}`. For example, if you are converting the `pudl.output.eia860.utilities_eia860()` function, name the asset `denorm_utilities_eia860`. **Don't delete the old output table function! We need it later on to test the new asset.**

You can create an asset by writing a new function and adding the `@asset` decorator. For now, the only argument you should pass to the decorator is `compute_kind="Python"`. All this does is add a cute tag to the asset in the DAG to let people know how the asset is being processed.

Next you'll want to figure out what tables the output table depends on. Read through the old output function to see which normalized tables or output functions are being used as inputs to the joins and imputations. Once you have the input table names, add them to the asset function parameters. For example, the `utilities_eia860()` function merges `core_eia__entity_utilities`, `core_eia860__scd_utilities`, and `core_pudl__assn_eia_pudl_utilities` tables together so the asset would look like this:

```python
@asset(compute_kind="Python")
def denorm_utilities_eia860(
    core_eia__entity_utilities: pd.DataFrame,
    core_eia860__scd_utilities: pd.DataFrame,
    core_pudl__assn_eia_pudl_utilities: pd.DataFrame,
):
    ... # joining logic
    return joined_df
```

Dagster will automatically place the `denorm_utilities_eia860` asset downstream of its input assets. **If the old output function depends on an output table function that hasn't been converted to an asset, you'll need to convert that function to an asset first**.
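
Because Dagster matches each parameter name against the name of an existing asset, a misspelled parameter will not resolve to the table you meant. As a rough illustration (this is not how Dagster itself is implemented), the standard library's `inspect` module can list the names a function would request. The undecorated stand-in function, the `KNOWN_ASSETS` set, and the `unknown_inputs` helper below are hypothetical.

```python
import inspect


def denorm_utilities_eia860(
    core_eia__entity_utilities,
    core_eia860__scd_utilities,
    core_pudl__assn_eia_pudl_utilities,
):
    """Stand-in with the same parameters as the asset above (no decorator)."""


# Hypothetical set of asset names that already exist in the project.
KNOWN_ASSETS = {
    "core_eia__entity_utilities",
    "core_eia860__scd_utilities",
    "core_pudl__assn_eia_pudl_utilities",
}


def unknown_inputs(fn, known_assets):
    """Return parameter names of fn that do not match a known asset name."""
    requested = inspect.signature(fn).parameters
    return [name for name in requested if name not in known_assets]


print(unknown_inputs(denorm_utilities_eia860, KNOWN_ASSETS))
```

An empty list means every input name lines up with a known asset; any name it reports is either a typo or an output function that still needs converting.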

Once the asset has been created and the joining logic is copied over, reload the asset definitions in dagit and materialize the new output table asset. Even if the asset is successfully materialized, it won't be present in the database yet. If you don't specify an `io_manager_key` in the asset decorator, the default IO manager is used, which writes the dataframe to a pickle file in your `DAGSTER_HOME` directory.

## Step 2: Create the metadata
As with the normalized tables, we need to keep track of the output table's metadata so the dtypes can be preserved as the table moves between pandas and storage, in this case SQLite. To get a list of field names, load the value of the asset you just created:

In [None]:
from dagster import AssetKey

from pudl.etl import defs

asset_name = "denorm_generators_eia"
df = defs.load_asset_value(AssetKey(asset_name))
df.columns.to_list()

Once you have the field names, find the appropriate module in `pudl.metadata.resources` to add the metadata to. The metadata of an output table should live in the module of its data source. For example, `denorm_utilities_eia860` merges eia860 data together, so its metadata should live in `pudl.metadata.resources.eia860`. Set the `"etl_group"` of the resource to `"outputs"`.

Most output tables just join existing fields together, but some add new fields. If the output table creates a new field, you'll need to add it to the `pudl.metadata.fields` module.
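
Before editing the metadata modules, it can help to see which columns of the new table are not yet described anywhere. The sketch below assumes field metadata can be viewed as a dictionary keyed by field name. The `find_missing_fields` helper, the `known_fields` dictionary, and the column list are toy stand-ins, not the real contents of `pudl.metadata.fields`.

```python
def find_missing_fields(columns, known_fields):
    """Return the columns that have no entry in the field metadata."""
    return [col for col in columns if col not in known_fields]


# Toy field metadata (assumed layout: field name -> type and description).
known_fields = {
    "utility_id_eia": {"type": "integer", "description": "EIA utility ID."},
    "report_date": {"type": "date", "description": "Date reported."},
}

# Toy column list, standing in for df.columns.to_list().
columns = ["utility_id_eia", "report_date", "new_derived_field"]

print(find_missing_fields(columns, known_fields))
```

Any name this reports is a candidate for a new entry in `pudl.metadata.fields`.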

Once the metadata is created, add `io_manager_key="pudl_sqlite_io_manager"` to the asset decorator. This tells the asset to load the returned dataframe to the database instead of a pickle file. **Don't forget this step! If you don't change the `io_manager_key` the table will not be loaded to the database!** Example:

```python
@asset(io_manager_key="pudl_sqlite_io_manager", compute_kind="Python")
def denorm_utilities_eia860(
    core_eia__entity_utilities: pd.DataFrame,
    core_eia860__scd_utilities: pd.DataFrame,
    core_pudl__assn_eia_pudl_utilities: pd.DataFrame,
):
    ... # joining logic
    return joined_df
```
To quickly check for any issues with the new metadata, go to the Deployments tab in Dagster and click "Reload" on the `pudl.etl` module to reload the updated code.

Once the metadata is created, you'll need to delete your `pudl.sqlite` file so the next ETL run can create the new database schema. Then rematerialize all of the assets. If the database flags any data integrity errors in the output table, you can adjust the code in the output asset and just rematerialize the asset to test it out. If you need to update the table metadata, you'll need to delete the `pudl.sqlite` database and rematerialize all of the assets.

## Step 3: Test the output table
Once the output table loads into the database without errors, it is time to compare it to the old output function to make sure the data hasn't changed.

Load the asset value from the database:

In [None]:
from dagster import AssetKey

from pudl.etl import defs

asset_name = "denorm_generators_eia"
new_df = defs.load_asset_value(AssetKey(asset_name))

Create the old output table by calling the old output function:

In [None]:
# Import the old python functions
import pudl
from pudl.io_managers import pudl_sqlite_io_manager

engine = pudl_sqlite_io_manager(None).engine
pudl_out = pudl.output.pudltabl.PudlTabl(engine)
old_df = pudl_out.gens_eia860()

Align the dataframe columns and index, then compare the dataframes:

In [None]:
import pandas as pd

# Sort rows by every column and reset the index so row order can't cause a mismatch
key = list(old_df.columns)
old_df = old_df.sort_values(by=key).reset_index(drop=True)
new_df = new_df.sort_values(by=key).reset_index(drop=True)

# Convert the old dataframe to the dtypes defined in the new table's schema
resource = pudl.metadata.classes.Package.from_resource_ids().get_resource(asset_name)
old_df_schema = resource.enforce_schema(old_df)

pd.testing.assert_frame_equal(old_df_schema, new_df)
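
If the comparison fails only because the two frames order their columns differently, a small helper can line them up first. This is a sketch on toy data, not part of PUDL: the `align_for_comparison` name and the toy frames are hypothetical, and it assumes every column is sortable.

```python
import pandas as pd


def align_for_comparison(old: pd.DataFrame, new: pd.DataFrame):
    """Give two frames the same column order, row order, and index."""
    mismatched = set(old.columns) ^ set(new.columns)
    if mismatched:
        raise ValueError(f"Columns present in only one frame: {sorted(mismatched)}")
    cols = list(old.columns)
    old = old.sort_values(by=cols).reset_index(drop=True)
    # Reorder the new frame's columns to match the old frame before sorting.
    new = new[cols].sort_values(by=cols).reset_index(drop=True)
    return old, new


# Toy data standing in for the real tables.
old_toy = pd.DataFrame({"plant_id": [2, 1], "capacity_mw": [5.0, 10.0]})
new_toy = pd.DataFrame({"capacity_mw": [10.0, 5.0], "plant_id": [1, 2]})

old_aligned, new_aligned = align_for_comparison(old_toy, new_toy)
pd.testing.assert_frame_equal(old_aligned, new_aligned)
```

Raising on mismatched column sets keeps a genuinely missing or extra column from being hidden by the reordering.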

## Step 4: Update the `PudlTabl` class
Wahoo! The output table asset has been created, added to the database and tested against the old function. Now you should:
1. Add a deprecation warning to the old output table function. We will remove these functions once all of the output tables have been converted to assets.
2. Add the table to the `table_method_map` in `PudlTabl._register_output_methods`. Generally, this will just look like:

```python
table_method_map = {
    "table_name": "table_name",
    ...
}
```

In some cases there might be a legacy method for getting the table that uses an abbreviation of the table name in the method name. To preserve the
existing API, you should instead map the table name to the legacy method name:

```python
table_method_map = {
    "table_name": "legacy_method_name",
    ...
}
```

3. Delete the old `PudlTabl` method.
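
The deprecation warning from item 1 can be sketched with the standard library's `warnings` module. The function name, message, and return value below are hypothetical placeholders, and the choice of `FutureWarning` is an assumption rather than an established project convention.

```python
import warnings


def old_output_function():
    """Hypothetical stand-in for an old output table function."""
    warnings.warn(
        "old_output_function() is deprecated and will be removed once all "
        "output tables are assets. Read the corresponding table instead.",
        FutureWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return "old result"  # the existing logic stays in place for now
```

`FutureWarning` is displayed to end users by default, while `DeprecationWarning` is hidden by default unless it is triggered by code running in `__main__`, so which category fits depends on who should see the message.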

All done!