Validate and save csv of all 1:m FERC-EIA matches #2516
Conversation
While validating, made one minor change to Xcel data.
Codecov report (patch coverage):

    @@ Coverage Diff @@
    ##             dev   #2516   +/- ##
    =======================================
    - Coverage   86.9%   86.7%   -0.2%
    =======================================
      Files         86      86
      Lines       9661    9688     +27
    =======================================
    + Hits        8400    8406      +6
    - Misses      1261    1282     +21

    ... and 1 file with indirect coverage changes
This makes me feel like we should have a test for the training data. We test the validation functions that input new training data, but we don't check that the training data itself hasn't been accidentally altered. But this is outside the scope of this issue.
Looks good! My one main comment is about the one_to_many=False param and whether or not it should be a param.
src/pudl/analysis/ferc1_eia_train.py
        pd.DataFrame: A dataframe of 1:m matches formatted to fit into the
            existing validation framework.
    """
    multi_match_cols = [f"record_id_eia_override_{i}" for i in range(2, 4)]
There's a small but non-zero possibility that this could be more than 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to handle all possible numbers of override columns.
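One way to avoid the hard-coded range(2, 4) is to detect the override columns that are actually present in the dataframe. This is just an illustrative sketch, not the actual PUDL implementation; get_multi_match_cols is a hypothetical helper name:

```python
import pandas as pd


def get_multi_match_cols(df: pd.DataFrame) -> list[str]:
    """Collect every record_id_eia_override_N column (N >= 2) actually present."""
    suffix = lambda c: c.rsplit("_", 1)[-1]
    return sorted(
        (
            c
            for c in df.columns
            if c.startswith("record_id_eia_override_")
            and suffix(c).isdigit()
            and int(suffix(c)) >= 2
        ),
        key=lambda c: int(suffix(c)),
    )


df = pd.DataFrame(
    columns=[
        "record_id_eia_override_1",
        "record_id_eia_override_2",
        "record_id_eia_override_3",
        "record_id_eia_override_5",
    ]
)
print(get_multi_match_cols(df))
# ['record_id_eia_override_2', 'record_id_eia_override_3', 'record_id_eia_override_5']
```

This keeps working no matter how many override columns a spreadsheet happens to have.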
src/pudl/analysis/ferc1_eia_train.py
def validate_and_add_to_training(
    utils_eia860,
    ppe,
    ferc1_eia,
    input_dir_path,
    expect_override_overrides=False,
    allow_mismatched_utilities=True,
    one_to_many=False,
I almost feel like this should not be a param. Either that, or it should be set to True by default. Otherwise, if you forget to switch it to True, it will map all one-to-many values to whatever is in record_id_eia_override_1, which we don't want. I can't think of an occasion where we wouldn't want to do this if there were in fact one-to-many matches in a batch. Maybe we could add a conditional that checks the data for one-to-many values with something like if record_id_eia_override_2.notnull().any(), and then executes the one-to-many validation code, instead of having it as a parameter?
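The conditional check suggested above could look something like the sketch below. has_one_to_many is a hypothetical helper, not part of the PUDL API:

```python
import pandas as pd


def has_one_to_many(df: pd.DataFrame) -> bool:
    """Return True if any secondary override column holds a value,
    i.e. the batch contains at least one 1:m match."""
    extra_cols = [
        c
        for c in df.columns
        if c.startswith("record_id_eia_override_") and not c.endswith("_1")
    ]
    if not extra_cols:
        return False
    return bool(df[extra_cols].notnull().any().any())


batch = pd.DataFrame(
    {
        "record_id_eia_override_1": ["a", "b"],
        "record_id_eia_override_2": [None, "c"],
    }
)
print(has_one_to_many(batch))  # True
```

With a check like this, the one-to-many validation path could be triggered automatically instead of relying on a caller-supplied flag.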
I'm going to leave this as a parameter with the default set to True. I agree that it should be the default behavior, but I'd like to keep this functionality available to toggle, given that it's an additional layer of abstraction added onto the existing plant part infrastructure.
Ok, that makes sense. I wonder if we should null out the EIA matches for FERC records when one_to_many=False, so that they aren't mapped at all rather than being mapped to just one of their EIA subcomponents.
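Nulling out those matches could look roughly like this. null_one_to_many_matches is a hypothetical helper sketched for illustration, assuming a single record_id_eia_override_2 column flags the 1:m rows:

```python
import numpy as np
import pandas as pd


def null_one_to_many_matches(df: pd.DataFrame) -> pd.DataFrame:
    """Blank out record_id_eia_override_1 for any FERC record that also has a
    value in record_id_eia_override_2, leaving the record unmapped rather than
    mapped to only one of its EIA subcomponents."""
    out = df.copy()
    is_multi = out["record_id_eia_override_2"].notnull()
    out.loc[is_multi, "record_id_eia_override_1"] = np.nan
    return out


df = pd.DataFrame(
    {
        "record_id_ferc1": ["f1", "f2"],
        "record_id_eia_override_1": ["e1", "e2"],
        "record_id_eia_override_2": [None, "e3"],
    }
)
result = null_one_to_many_matches(df)
```

Here f2 would end up with no EIA match at all instead of being mapped to e2 alone.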
@@ -1,213 +1,6 @@
record_id_eia,record_id_ferc1,signature_1,signature_2,notes
Just checking, but all the deleted records here are one-to-many records that were matched to whatever was in record_id_eia_override_1, and all the added records here are the same record_id_ferc1 but mapped to the new PPE one_to_many record you created?
Hm, no. The 1:m process does not overwrite this CSV, but merely updates the records internally. Any changes here are a result of running the validation process with the complete set of newly integrated data. If this shouldn't be included in this PR, I can remove it (though I'm not sure why the result would be different from what's currently in dev).
That's what I'm curious about. It might be a good idea to compare this file to the one in dev to see what's going on. I don't see why there would be any changes to this file if you're not updating it with 1:m matches. One thought is that the new training data (created when you run the validate-and-add-to-training function) might count as overrides even if it's already in the old training data, i.e., it replaces current data with the same values instead of only replacing values that are different.
Some look exactly the same, some have different signatures, and some only appear in the new data. Which suggests to me that the source data from your PR and this one aren't actually the same.
I'm not sure I'm fully understanding this comment. So far we've only validated the first column of matches. This PR just extends the same checks to the other columns, which is how this issue was initially flagged and I knew to address it. What is the additional check you are proposing?
I wouldn't worry about it because it's definitely outside the scope of this issue. I was just thinking that if you found bad matches already in the training data, that's because we don't run any of our validation functions on the training data itself, just the proposed additions to it. So if something were to accidentally slip through unvalidated or get altered in the training data csv, it would go undetected. I may have read this wrong too - I don't know if those matches you found were already in the training data or came from the new data.
Ok, I went through the values. In the "Files Changed" diff it says there are thousands of changes -- this is because the override capability replaces matches even if they are the same as what is already in the training data. All of these spreadsheets have already been validated and run through the training data, so it makes sense that there are tons of overrides. Some of them (below) are actual overrides with different values, but most of them are the same thing as what's already in the training data. We should probably create a flag in the code that only "overrides" values that actually differ from the existing training data. Otherwise, I think we can just keep these changes as part of this PR.
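Filtering out no-op overrides could be done with an anti-join against the existing training data. This is a sketch under simplifying assumptions (only the two ID columns matter; filter_real_overrides is a hypothetical name, not PUDL code):

```python
import pandas as pd


def filter_real_overrides(new: pd.DataFrame, training: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose (FERC, EIA) pair is not already in the training
    data, so unchanged matches are not re-recorded as overrides."""
    merged = new.merge(
        training[["record_id_ferc1", "record_id_eia"]],
        on=["record_id_ferc1", "record_id_eia"],
        how="left",
        indicator=True,
    )
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


training = pd.DataFrame(
    {"record_id_ferc1": ["f1", "f2"], "record_id_eia": ["e1", "e2"]}
)
new = pd.DataFrame(
    {"record_id_ferc1": ["f1", "f2"], "record_id_eia": ["e1", "e9"]}  # f2 changed
)
real = filter_real_overrides(new, training)
print(real["record_id_ferc1"].tolist())  # ['f2']
```

With a filter like this, only f2 (whose EIA match actually changed) would show up in the diff.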
Background
The Plant Part List (PPL) shows EIA generators at a variety of useful aggregations.
Part of the role of the PPL is to help connect EIA records with FERC records that might be reported at different aggregations. Most of the time, pre-defined aggregations such as plant, unit, prime mover, etc. are good enough, but occasionally a FERC record comprises a unique aggregation of EIA records not currently represented in the PPL.
This becomes apparent when you're going through the programmatically generated overrides sheet for a given utility subset and you notice that a FERC record does not match any one of the existing PPL records, but rather a combination of two or more. When this is the case, we need a way to fabricate a new PPL record that corresponds to that FERC record so we can more accurately map the two datasets.
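Conceptually, fabricating such a record means rolling several existing PPL rows up into one synthetic row. The sketch below is purely illustrative, with made-up column names and a hypothetical helper (the real PPL carries many more columns and aggregation rules):

```python
import pandas as pd


def fabricate_one_to_many_record(
    ppl: pd.DataFrame, eia_ids: list[str], new_id: str
) -> pd.Series:
    """Combine several existing PPL records into one synthetic record whose
    capacity is the sum of its parts (illustrative columns only)."""
    parts = ppl[ppl["record_id_eia"].isin(eia_ids)]
    return pd.Series(
        {
            "record_id_eia": new_id,
            "capacity_mw": parts["capacity_mw"].sum(),
            "plant_part": "one_to_many",
        }
    )


ppl = pd.DataFrame(
    {
        "record_id_eia": ["gen_a", "gen_b", "gen_c"],
        "capacity_mw": [100.0, 50.0, 25.0],
    }
)
rec = fabricate_one_to_many_record(ppl, ["gen_a", "gen_b"], "gen_a_gen_b")
print(rec["capacity_mw"])  # 150.0
```

The fabricated record can then stand in for the combination of EIA generators that the FERC record actually represents.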
The code reuses the existing validation infrastructure with different inputs, so additional new tests are not needed.
Scope of PR
This PR addresses the first part of issue #1555.