# EIA 860M Update Inspection

To run this notebook, first refresh the changelog data by updating these parameters:
- In `.env`, set `PUDL_VERSION` to the latest release listed [here](https://github.com/catalyst-cooperative/pudl/releases).
- In `src/dbcp/constants.py`, set `PUDL_LATEST_YEAR` to the latest year for which PUDL has complete data.

Then run `make all`.

In [4]:
import pandas as pd

In [5]:
LATEST_MONTH = "2025-04-01"
PREVIOUS_MONTH = "2025-03-01"
LATEST_QUARTER = "2025-04-01"
PREVIOUS_QUARTER = "2025-01-01"

PATH = "../../../data/output"

In [6]:
changelog = pd.read_parquet(f"{PATH}/data_warehouse/pudl_eia860m_changelog.parquet")
changelog.report_date.max()

Timestamp('2025-04-01 00:00:00')

In [8]:
status_codes = pd.read_parquet(f"{PATH}/data_mart/projects_status_codes_eia860m.parquet")

In [58]:
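def get_status_at_cutoff_date(df, cutoff_date=None):
    """Return each generator's most recent record before cutoff_date, or its most recent record overall if none is given."""
    filtered = df[df.report_date < cutoff_date] if cutoff_date else df
    # Keep the row with the latest report_date for each (generator_id, plant_id_eia) pair.
    filtered = filtered.loc[filtered.groupby(["generator_id", "plant_id_eia"])['report_date'].idxmax()]
    # Use `not` rather than `~`: bitwise inversion of a plain Python bool is always truthy.
    assert not filtered.duplicated(subset=["generator_id", "plant_id_eia"]).any()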
    return filtered


def capacity_mw_by_status(df):
    """Sum capacity (MW) for each operational status code."""
    return df.groupby("operational_status_code").capacity_mw.sum()


def pct_change_mw_by_status(df, start_date, end_date):
    """Percent change in total capacity per status code between the snapshots at start_date and end_date."""
    recent_quarter = get_status_at_cutoff_date(df, end_date)
    recent_quarter_mw_by_status = capacity_mw_by_status(recent_quarter)

    previous_quarter = get_status_at_cutoff_date(df, start_date)
    previous_quarter_mw_by_status = capacity_mw_by_status(previous_quarter)

    return ((recent_quarter_mw_by_status - previous_quarter_mw_by_status) / previous_quarter_mw_by_status) * 100
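
The percent-change arithmetic relies on pandas aligning the two Series on their index (the status code). The sketch below uses made-up totals, not values from the changelog, to show one consequence: a status code that appears in only one of the two snapshots yields `NaN` rather than a number.

```python
import pandas as pd

# Made-up capacity totals (MW), indexed by operational_status_code.
previous_mw = pd.Series({1: 100.0, 7: 2000.0, 8: 500.0}, name="capacity_mw")
current_mw = pd.Series({1: 110.0, 6: 50.0, 7: 2020.0}, name="capacity_mw")

# Same arithmetic as pct_change_mw_by_status: the Series are aligned on
# their index, so codes missing from either side become NaN.
pct_change = (current_mw - previous_mw) / previous_mw * 100

print(pct_change)
```

If a status code seems to be missing or `NaN` in the real output, checking whether it exists in both snapshots is a reasonable first step.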
    

How many generators had a status change in this quarterly update? We expect relatively few.

Merge the two quarterly snapshots on `generator_id` and `plant_id_eia`. Each snapshot should contain only one record per generator, so the merge should be one-to-one.
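
As a small illustration of what the one-to-one expectation buys, the sketch below merges two made-up snapshots with `validate="1:1"` and then shows that a duplicated key makes pandas raise `MergeError`. The data frames here are invented for the example.

```python
import pandas as pd

keys = ["generator_id", "plant_id_eia"]

# Made-up snapshots with one row per (generator_id, plant_id_eia).
previous = pd.DataFrame(
    {"generator_id": ["1", "2"], "plant_id_eia": [10, 10], "operational_status_code": [6, 7]}
)
current = pd.DataFrame(
    {"generator_id": ["1", "2"], "plant_id_eia": [10, 10], "operational_status_code": [7, 7]}
)

merged = previous.merge(current, on=keys, validate="1:1", suffixes=("_previous", "_current"))

# Duplicate one key on the right-hand side: the same merge now fails loudly.
current_with_dupe = pd.concat([current, current.iloc[[0]]], ignore_index=True)
try:
    previous.merge(current_with_dupe, on=keys, validate="1:1")
    raised = False
except pd.errors.MergeError:
    raised = True

print(merged)
print("duplicate key raised MergeError:", raised)
```

Note that `merge` defaults to an inner join, so generators present in only one of the two snapshots do not appear in the merged result.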


In [None]:
previous_quarter = get_status_at_cutoff_date(changelog, PREVIOUS_QUARTER)
current_quarter = get_status_at_cutoff_date(changelog, LATEST_QUARTER)

In [11]:
previous_quarter.report_date.max(), current_quarter.report_date.max()

(Timestamp('2024-12-01 00:00:00'), Timestamp('2025-03-01 00:00:00'))
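
The maximum report dates fall one month before each cutoff because the filter uses a strict `<` comparison. A minimal sketch on a made-up changelog, restating the same selection logic in a stand-in function (`latest_before` is a name invented for this example):

```python
import pandas as pd

# Made-up changelog: generator "A" changes status in the January 2025 report.
toy_changelog = pd.DataFrame(
    {
        "generator_id": ["A", "A", "B"],
        "plant_id_eia": [1, 1, 1],
        "report_date": pd.to_datetime(["2024-12-01", "2025-01-01", "2024-11-01"]),
        "operational_status_code": [6, 7, 7],
    }
)


def latest_before(df, cutoff_date):
    """Stand-in with the same selection logic as get_status_at_cutoff_date."""
    filtered = df[df.report_date < cutoff_date]
    latest_idx = filtered.groupby(["generator_id", "plant_id_eia"])["report_date"].idxmax()
    return filtered.loc[latest_idx]


# The 2025-01-01 record is excluded, so "A" is still at status code 6.
snapshot = latest_before(toy_changelog, "2025-01-01")
print(snapshot)
```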

In [12]:
merged_quarters = previous_quarter.merge(current_quarter, on=["generator_id", "plant_id_eia"], validate="1:1", suffixes=("_previous", "_current"))

different_status_codes = merged_quarters["operational_status_code_previous"].ne(merged_quarters["operational_status_code_current"])
different_status_codes.value_counts()


False    36360
True       629
dtype: Int64

For the generators that have a different status in the new update, check whether the status change makes sense ("Operational to Retired", "Under Construction to Operational", etc.). A high-level check is to see whether the status code numbers stay the same or increase, since higher-numbered operational codes represent more advanced stages in a generator's life cycle.

In [13]:
new_status_code_is_greater = merged_quarters["operational_status_code_previous"].le(merged_quarters["operational_status_code_current"])

new_status_code_is_greater.value_counts()

True     36974
False       15
dtype: Int64

In [14]:
merged_quarters[~new_status_code_is_greater][["raw_operational_status_code_previous", "raw_operational_status_code_current"]]

Unnamed: 0,raw_operational_status_code_previous,raw_operational_status_code_current
703,RE,OA
2085,RE,OS
2207,RE,OS
7483,RE,OS
16781,L,P
16782,L,P
16826,T,L
20619,RE,OA
21578,L,P
21711,L,P


Looks like a handful of generators came out of retirement. Let's dig into the status codes of the generators that have a new status in the updated data.

In [15]:
pd.set_option('display.max_colwidth', None)

status_codes

Unnamed: 0,operational_status_code,raw_operational_status_code,description
0,1,P,Planned for installation but regulatory approvals not initiated; Not under construction
1,2,L,Regulatory approvals pending. Not under construction but site preparation could be underway
2,3,T,Regulatory approvals received. Not under construction but site preparation could be underway
3,4,U,"Under construction, less than or equal to 50 percent complete (based on construction time to date of operation)"
4,5,V,"Under construction, more than 50 percent complete (based on construction time to date of operation)"
5,6,TS,"Construction complete, but not yet in commercial operation"
6,7,"OA, OP, OS, SB",Various operational categories
7,8,RE,Retired
8,98,IP,"Planned new generator canceled, indefinitely postponed, or no longer in resource plan"
9,99,OT,Other
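
Going by the descriptions in the table above, codes 98 (`IP`) and 99 (`OT`) arguably do not mark positions in the life cycle, so a plain numeric comparison counts any move into them as an advance and any move out of them as a step back. The sketch below is one possible refinement on made-up transitions; the `review` column and the `NON_LIFECYCLE_CODES` list are assumptions for illustration, not part of the notebook.

```python
import pandas as pd

# Made-up transitions using the numeric codes from the table above.
transitions = pd.DataFrame(
    {
        "operational_status_code_previous": [7, 8, 2, 98, 3],
        "operational_status_code_current": [8, 7, 1, 1, 98],
    }
)

# Assumption: treat IP (98) and OT (99) as outside the life-cycle ordering.
NON_LIFECYCLE_CODES = [98, 99]

prev = transitions["operational_status_code_previous"]
curr = transitions["operational_status_code_current"]

involves_non_lifecycle = prev.isin(NON_LIFECYCLE_CODES) | curr.isin(NON_LIFECYCLE_CODES)
moved_backward = curr.lt(prev) & ~involves_non_lifecycle

# Label each transition so the two kinds of oddities can be reviewed separately.
transitions["review"] = "ok"
transitions.loc[moved_backward, "review"] = "moved backward"
transitions.loc[involves_non_lifecycle, "review"] = "canceled/other"

print(transitions)
```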


In [16]:
merged_quarters[different_status_codes][
    ["raw_operational_status_code_previous", 
     "raw_operational_status_code_current"]
     ].value_counts()

raw_operational_status_code_previous  raw_operational_status_code_current
OP                                    RE                                     173
V                                     OP                                      82
U                                     V                                       82
TS                                    OP                                      43
SB                                    RE                                      39
OS                                    RE                                      31
V                                     TS                                      26
U                                     OP                                      24
P                                     U                                       17
                                      L                                       16
T                                     U                                       14
L                                  

Look at capacity change for each status code.

In [None]:
pct_change_mw_by_status(changelog, PREVIOUS_QUARTER, LATEST_QUARTER)

operational_status_code
1     10.151463
2      4.898029
3      2.522753
4     18.900128
5     12.782260
6     14.288642
7      0.831132
8      0.517567
99    36.065070
Name: capacity_mw, dtype: float64

## Capacity by status by ISO

In [18]:
ISO_REGIONS = ("MISO", "PJM", "CISO", "ERCO", "ISNE", "NYIS", "SWPP") 

In [None]:
merged_quarters.query(
    "balancing_authority_code_eia_previous != balancing_authority_code_eia_current")[
        ["balancing_authority_code_eia_previous","balancing_authority_code_eia_current"]]

Unnamed: 0,balancing_authority_code_eia_previous,balancing_authority_code_eia_current
1954,FPL,SOCO
7308,FPL,SOCO
9826,FPL,SOCO
16822,WACM,PSCO
18322,WALC,BANC
18347,WALC,BANC
18438,LGEE,PJM
18565,LGEE,PJM
18645,LGEE,PJM
20491,FPL,FMPP


In [None]:
eia860_isos = changelog[changelog.balancing_authority_code_eia.isin(ISO_REGIONS)]


for region in ISO_REGIONS:
    print(region)
    capacity_by_status_prev = pd.DataFrame(
        previous_quarter[previous_quarter["balancing_authority_code_eia"] == region]
        .groupby("operational_status_code")
        .capacity_mw.sum().rename("capacity_mw_previous")
        )
    capacity_by_status_current = pd.DataFrame(
        current_quarter[current_quarter["balancing_authority_code_eia"] == region]
        .groupby("operational_status_code")
        .capacity_mw.sum().rename("capacity_mw_current")
    )
    
    pct_change = pd.DataFrame(pct_change_mw_by_status(
        eia860_isos[eia860_isos["balancing_authority_code_eia"] == region],
        PREVIOUS_QUARTER, 
        LATEST_QUARTER
    ).rename("pct_change")
    )
    capacity_by_status = capacity_by_status_prev.join(
        capacity_by_status_current).join(pct_change)
    print(capacity_by_status)
    print()

MISO
                         capacity_mw_previous  capacity_mw_current  pct_change
operational_status_code                                                       
1                                10911.800041         10823.500042   -0.809216
2                                 8168.399993          7956.399993   -2.594637
3                                 5156.299995          4746.999996   -7.934785
4                                 4090.599997          5010.499998   22.470016
5                                 4040.100031          4523.800036   11.946750
6                                  959.900000           583.000000  -39.186941
7                               204069.100216        206688.000199    1.272182
8                                49789.699946         49990.999951    0.377111

PJM
                         capacity_mw_previous  capacity_mw_current  pct_change
operational_status_code                                                       
1                                15981.499

The larger percentage increases in capacity occur where the capacity was small to begin with, so the absolute changes there are not significant.
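
To make the comparison concrete, the sketch below puts absolute and percentage changes side by side. The MISO totals are copied from the printed output above and rounded to 0.1 MW; the `abs_change_mw` column name is an assumption for the example.

```python
import pandas as pd

# MISO totals copied from the printed output above, rounded to 0.1 MW.
miso = pd.DataFrame(
    {
        "capacity_mw_previous": [10911.8, 8168.4, 5156.3, 4090.6, 4040.1, 959.9, 204069.1, 49789.7],
        "capacity_mw_current": [10823.5, 7956.4, 4747.0, 5010.5, 4523.8, 583.0, 206688.0, 49991.0],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6, 7, 8], name="operational_status_code"),
)

miso["abs_change_mw"] = miso["capacity_mw_current"] - miso["capacity_mw_previous"]
miso["pct_change"] = miso["abs_change_mw"] / miso["capacity_mw_previous"] * 100

# Rank by the size of the absolute change, regardless of sign.
ranked = miso.sort_values("abs_change_mw", key=lambda s: s.abs(), ascending=False)
print(ranked[["abs_change_mw", "pct_change"]])
```

Percentages recomputed this way differ slightly from the printed `pct_change` column (for status code 6, about -39.3% versus -39.19%). One possible reason is that the notebook computes `pct_change` from changelog rows filtered by balancing authority before the snapshot is taken, while the two capacity columns filter the snapshots afterward.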

### Dig into significant changes
#### MISO decrease of ~40% in capacity with construction complete but not yet in commercial operation (status code 6)

In [37]:
miso_prev = previous_quarter.query("balancing_authority_code_eia == 'MISO'")
miso_curr = current_quarter.query("balancing_authority_code_eia == 'MISO'")

merged = pd.merge(
    miso_prev, 
    miso_curr, 
    on=["generator_id", "plant_id_eia", "plant_name_eia", "iso_region"],
    how="outer",
    suffixes=("_previous", "_current"))

Generators previously at status code 6 (construction complete, not yet in commercial operation) all became operational (status code 7) in the new data.

In [51]:
merged.query("operational_status_code_previous == 6 and operational_status_code_current != 6")[
    ["plant_name_eia", "generator_id", 
     "utility_name_eia_previous", "operational_status_code_previous", "operational_status_code_current",
      "capacity_mw_previous"]].sort_values(
        by="capacity_mw_previous", ascending=False)

Unnamed: 0,plant_name_eia,generator_id,utility_name_eia_previous,operational_status_code_previous,operational_status_code_current,capacity_mw_previous
3317,Pike County Energy Storage,BAT2,AES Indiana,6,7,200.0
3781,Eldorado Solar I,ELD15,Sol Systems,6,7,150.0
5567,Sauk Solar Park,SAUSP,DTE Electric Company,6,7,150.0
5322,"Salt Creek Township Solar, LLC",PV,"Birch Creek Power, LLC",6,7,50.0
5652,Slocum Energy Storage,SLOES,DTE Electric Company,6,7,14.0
752,Strix Solar,1,Madison Gas & Electric Co,6,7,6.0
5248,Oster Sun CSG,OSTRS,SunShare Management,6,7,1.0
5479,Quarry Sun CSG,QURYS,SunShare Management,6,7,1.0
3397,Buffalo Sun CSG,BUFFS,SunShare Management,6,7,0.9


Some plants that were removed from the MISO data were previously operational

In [53]:
miso_removed_plants = merged.query("report_date_current.isnull()")[
    ["plant_name_eia", "generator_id", "utility_name_eia_previous", "operational_status_code_previous",
      "capacity_mw_previous"]].sort_values(
        by="capacity_mw_previous", ascending=False)

miso_removed_plants.head(10)

Unnamed: 0,plant_name_eia,generator_id,utility_name_eia_previous,operational_status_code_previous,capacity_mw_previous
5423,Duane Arnold Solar II (150MW),PV1,Interstate Power and Light Co,7,150.0
5242,OE_MS4,OEMS4,OE_MS4,3,96.0
5421,Duane Arnold Solar I (50 MW),PV1,Interstate Power and Light Co,7,50.0
5876,International Paper - Orange,TG,International Paper - Orange,7,48.0
4617,Gramercy Holdings LLC,GT4,Gramercy Holdings I LLC,7,24.700001
5750,Gramercy Holdings LLC,ST2,Gramercy Holdings I LLC,7,18.700001
5728,Gramercy Holdings LLC,ST1,Gramercy Holdings I LLC,7,18.700001
4558,Gramercy Holdings LLC,GT1,Gramercy Holdings I LLC,8,16.0
4590,Gramercy Holdings LLC,GT2,Gramercy Holdings I LLC,8,16.0
4604,Gramercy Holdings LLC,GT3,Gramercy Holdings I LLC,7,16.0


New plants were added in MISO data, mostly under construction

In [54]:
miso_added_plants = merged.query("report_date_previous.isnull()")[
    ["plant_name_eia", "generator_id", "utility_name_eia_current",
      "operational_status_code_current", 
     "capacity_mw_current"]
    ].sort_values(
        by="capacity_mw_current", ascending=False)

miso_added_plants.head(15)

Unnamed: 0,plant_name_eia,generator_id,utility_name_eia_current,operational_status_code_current,capacity_mw_current
6251,Chalk Bluff Solar,CHALK,Chalk Bluff Solar Energy LLC,3,450.0
6293,Sunfish Solar 2,SS2,Consumers Energy Co - (MI),4,360.0
6289,Sherco Solar III,SHS03,Northern States Power Co - Minnesota,4,250.0
6253,Dolet Hills Solar,DHS1,Cleco Power LLC,3,240.0
6288,Sherco Solar II,SHS02,Northern States Power Co - Minnesota,5,237.100006
6273,Liberty Solar Energy Center,LSEC,Consumers Energy Co - (MI),4,220.0
6272,Lotus Wind,LOTUS,"Lotus Wind, LLC",1,200.0
6268,Huck Finn Renewable Energy Center,HFREC,Union Electric Co - (MO),7,200.0
6258,Greer Solar Plant,GREER,"Greer Solar, LLC",1,200.0
6276,Wheatland,OEIN1,OE_IN1,4,150.0
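
An alternative to testing `report_date_previous` and `report_date_current` for nulls is pandas' `indicator=True` option, which adds a `_merge` column saying which side each row came from. A minimal sketch on made-up snapshots, merging only on the generator keys:

```python
import pandas as pd

keys = ["generator_id", "plant_id_eia"]

# Made-up regional snapshots: "A" disappears, "C" is new, "B" is in both.
prev = pd.DataFrame({"generator_id": ["A", "B"], "plant_id_eia": [1, 2], "capacity_mw": [50.0, 20.0]})
curr = pd.DataFrame({"generator_id": ["B", "C"], "plant_id_eia": [2, 3], "capacity_mw": [20.0, 450.0]})

merged_toy = prev.merge(
    curr, on=keys, how="outer", suffixes=("_previous", "_current"), indicator=True
)

# "left_only" rows exist only in the previous snapshot, "right_only" only in the current one.
removed = merged_toy[merged_toy["_merge"] == "left_only"]
added = merged_toy[merged_toy["_merge"] == "right_only"]

print(removed[["generator_id", "capacity_mw_previous"]])
print(added[["generator_id", "capacity_mw_current"]])
```

Because the notebook's merge also uses `plant_name_eia` and `iso_region` as keys, a generator whose plant name changed between snapshots would show up as both removed and added, which may be worth ruling out before reading these lists as real additions or removals.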


# Month-over-month changes

Since we are now making monthly updates, we may also want to compare the last few months.

In [63]:
february = get_status_at_cutoff_date(changelog, "2025-03-01")
march = get_status_at_cutoff_date(changelog, "2025-04-01")
april = get_status_at_cutoff_date(changelog, "2025-05-01")

february.report_date.max(), march.report_date.max(), april.report_date.max()

(Timestamp('2025-02-01 00:00:00'),
 Timestamp('2025-03-01 00:00:00'),
 Timestamp('2025-04-01 00:00:00'))
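
One way to compare consecutive months is to repeat the quarterly status-change count for each adjacent pair of snapshots. The sketch below does this on made-up monthly snapshots; `count_status_changes` is a hypothetical helper, not a function from this notebook.

```python
import pandas as pd

keys = ["generator_id", "plant_id_eia"]


def count_status_changes(earlier, later):
    """Hypothetical helper: count generators whose status code differs between two snapshots."""
    both = earlier.merge(later, on=keys, validate="1:1", suffixes=("_previous", "_current"))
    changed = both["operational_status_code_previous"].ne(both["operational_status_code_current"])
    return int(changed.sum())


def make_snapshot(codes):
    """Build a made-up snapshot for three generators with the given status codes."""
    return pd.DataFrame(
        {"generator_id": ["A", "B", "C"], "plant_id_eia": [1, 1, 2], "operational_status_code": codes}
    )


snapshots = {
    "2025-02": make_snapshot([5, 7, 1]),
    "2025-03": make_snapshot([6, 7, 1]),
    "2025-04": make_snapshot([7, 8, 1]),
}

# Compare each month with the one that follows it.
labels = list(snapshots)
changes = {
    f"{a} -> {b}": count_status_changes(snapshots[a], snapshots[b])
    for a, b in zip(labels, labels[1:])
}
print(changes)
```

With the real data, the `february`, `march`, and `april` snapshots would take the place of the made-up frames.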

# Downstream Tables

Separate tables track project status at monthly, quarterly, and yearly resolution.

In [55]:
project_status_monthly = pd.read_parquet(
    f"{PATH}/data_mart/projects_status_monthly_eia860m.parquet")
project_status_quarterly = pd.read_parquet(
    f"{PATH}/data_mart/projects_status_quarterly_eia860m.parquet")
project_status_yearly = pd.read_parquet(
    f"{PATH}/data_mart/projects_status_yearly_eia860m.parquet")