# Introducing: A first step toward SEC 10-K integration

## TODO:

* summarize the implications of the below.
* pull in additional narratives from sec10k-data-review.ipynb

# Subsidiaries case study: Xcel Energy

We can better understand the strengths and weaknesses of the subsidiary links in this dataset if we look at an example.

Xcel Energy is a multi-state utility holding company with several major subsidiary utilities operating in Colorado, Minnesota, New Mexico, and Wisconsin. The following query selects its parent-subsidiary relationships for 2001, 2012, and 2023 from the PUDL outputs:

In [None]:
import pandas as pd
xcel_subs = (
    pd.read_parquet(
        "s3://pudl.catalyst.coop/v2025.9.1/out_sec10k__parents_and_subsidiaries.parquet",
        columns=[
            "report_date",
            "parent_company_central_index_key",
            "parent_company_utility_id_eia",
            "parent_company_name",
            "subsidiary_company_id_sec10k",
            "subsidiary_company_central_index_key",
            "subsidiary_company_utility_id_eia",
            "subsidiary_company_name",
            "fraction_owned",
            "filename_sec10k",
        ],
        dtype_backend="pyarrow"
    ).query("parent_company_central_index_key == '0000072903'")
    .query("report_date.dt.year.isin([2001, 2012, 2023])")
)
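Note that the query above matches the parent CIK against the zero-padded string `'0000072903'`, not an integer. SEC CIKs are conventionally written as 10-digit, zero-padded identifiers, so a CIK copied from elsewhere as a bare number would not match. A minimal sketch of a padding helper follows; `format_cik` is a hypothetical name introduced here for illustration and is not part of PUDL:

```python
from typing import Union


def format_cik(cik: Union[int, str]) -> str:
    """Zero-pad a CIK to the 10-character string form used in the query above.

    Hypothetical helper for illustration only; not a PUDL function.
    """
    return str(cik).strip().zfill(10)


# Xcel Energy's CIK, as it appears in the query above.
xcel_cik = format_cik(72903)
```

The resulting string could then be interpolated into the `.query()` filter in place of the hard-coded literal.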

Looking at this sample of the data:

* We can see that Xcel has successfully been linked to a corresponding EIA Utility ID.

In [None]:
xcel_subs[["parent_company_name","parent_company_central_index_key","parent_company_utility_id_eia"]].drop_duplicates()

* In 2023, 4 of the 18 listed Xcel subsidiaries were associated with their own SEC 10-K filer CIK (Central Index Key). In 2001, it was 3 of 23.


In [None]:
(
    xcel_subs.assign(year=xcel_subs.report_date.dt.year)
    .groupby("year")
    .subsidiary_company_central_index_key
    .agg([
        ("cik_fraction", lambda x: x.notna().mean()),
        ("cik_count", lambda x: x.notna().sum()),
        ("total", lambda x: len(x)),
    ])
    .sort_index(ascending=False)
)

* Across all three sampled years, only 1 subsidiary company was linked to its own EIA Utility ID.

In [None]:
(
    xcel_subs.assign(year=xcel_subs.report_date.dt.year)
    .groupby("year")
    .subsidiary_company_utility_id_eia
    .agg([
        ("eia_count", lambda x: x.notna().sum()),
        ("total", lambda x: len(x))
    ])
    .sort_index(ascending=False)
)

* A subsidiary called "Northern States Power Company" was linked to CIKs in Wisconsin and Minnesota in 2023 and 2012, but not in 2001, likely due to differences in how the name was spelled. The 2001 instance in Wisconsin is the only one linked to an EIA utility.

In [None]:
(
    xcel_subs.assign(year=xcel_subs.report_date.dt.year)
    .loc[xcel_subs.subsidiary_company_name.str.startswith("northern states"),
        ["subsidiary_company_central_index_key","subsidiary_company_utility_id_eia","year","subsidiary_company_name","filename_sec10k","subsidiary_company_id_sec10k"]
    ]
    .sort_values(["subsidiary_company_name","year", "subsidiary_company_central_index_key"])
)
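To illustrate how spelling differences could prevent a name-based link, here is a minimal sketch of name normalization. This is not the matching method used to build the dataset, which is not described here. The function name `normalize_company_name` and the two example spellings are invented for illustration and are not values taken from the table:

```python
import re


def normalize_company_name(name: str) -> str:
    """Reduce a company name to a simplified form for comparison.

    Illustrative only: real record linkage typically needs far more care.
    """
    name = name.lower()
    name = re.sub(r"\([^)]*\)", " ", name)     # drop parenthetical qualifiers
    name = re.sub(r"[^a-z0-9 ]", " ", name)    # drop punctuation
    name = re.sub(r"\bco\b", "company", name)  # expand a common abbreviation
    return re.sub(r"\s+", " ", name).strip()   # collapse whitespace


# Two invented spellings of the same company name.
variant_a = "Northern States Power Co."
variant_b = "Northern States Power Company (a Minnesota corporation)"

# The raw strings differ, but their normalized forms agree.
same_after_normalizing = (
    normalize_company_name(variant_a) == normalize_company_name(variant_b)
)
```

An exact string comparison treats these two spellings as different companies, which is one plausible way a subsidiary could be linked in some years and missed in others.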

* No ownership fraction information was captured for any of the subsidiaries.

In [None]:
(
    xcel_subs.assign(year=xcel_subs.report_date.dt.year)
    .groupby("year")
    .fraction_owned
    .agg([
        ("with_ownership", lambda x: x.notna().sum()),
        ("total", lambda x: len(x))
    ])
    .sort_index(ascending=False)
)
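The per-year coverage summaries above all follow the same pattern: count the non-null values of one column within each report year. A minimal sketch of a reusable helper is shown below. It assumes only a datetime `report_date` column, as in `xcel_subs`. The name `coverage_by_year`, the toy frame, and its `some_id` column are hypothetical and exist only to make the example self-contained:

```python
import pandas as pd


def coverage_by_year(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count non-null values of `column` per report year, newest year first."""
    years = df["report_date"].dt.year.rename("year")
    return (
        df[column]
        .notna()
        .groupby(years)
        .agg(non_null="sum", total="size", fraction="mean")
        .sort_index(ascending=False)
    )


# Toy data (not from the SEC 10-K tables), used only to exercise the helper.
demo = pd.DataFrame(
    {
        "report_date": pd.to_datetime(
            ["2001-12-31", "2001-12-31", "2023-12-31", "2023-12-31"]
        ),
        "some_id": ["a", None, "b", "c"],
    }
)
demo_cov = coverage_by_year(demo, "some_id")
```

With the real data, the same helper could be called as `coverage_by_year(xcel_subs, "fraction_owned")` or with either of the subsidiary ID columns.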