Distributed generation eia861 #724

aesharpe · 2020-08-13T17:40:54Z

Here's the data for the distributed generation tables with a few small changes to the utility data tables.
This PR also contains changes to the extract module to address the strange -1 column problem.


cmgosnell

looks good! i have a few comments, but mostly it looks clean (do I sound like a broken record? lol).

cmgosnell · 2020-08-17T16:19:16Z

src/pudl/constants.py

+    'gas',
+    'oil',
+    'other',
+    'renewable',  # needs prefix 'all' to not confuse with 'other'


is this prefix comment still relevant?

not anymore! thanks

cmgosnell · 2020-08-17T16:23:27Z

src/pudl/extract/eia861.py

@@ -41,6 +41,7 @@ def process_raw(self, df, yr, page):
            "The data has not yet been validated, and the structure may change."
        )
        column_map_numeric = self._metadata.get_column_map(yr, page)
+        # print(column_map_numeric)


can you get rid of this little guy?... or change it to a logger.debug with a more informative message if you think it might be helpful in the future.
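
A minimal sketch of the logger.debug alternative suggested above. The function name `log_column_map` and the message wording are hypothetical, not taken from the PR; only the standard library `logging` module is used.

```python
import logging

logger = logging.getLogger(__name__)


def log_column_map(column_map, yr, page):
    """Emit the column map at DEBUG level (hypothetical helper, not PUDL code)."""
    # Lazy %-style arguments: the message is only formatted if DEBUG is enabled.
    logger.debug(
        "Column map for page %r in year %s: %s", page, yr, column_map
    )
```

Unlike a leftover print, this stays silent unless the logger is configured at DEBUG level.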

cmgosnell · 2020-08-17T16:39:49Z

src/pudl/transform/eia861.py

+    # * Remove old pct cols and totals cols
+    ###########################################################################
+
+    # Separate datasets into years with only pct values (pre-2010) and years with only mw values (post-2010)


can you add a brief additional comment here noting why? Like "some of the pre-2010 columns are reported as percentages while the post-2010 columns are reported as MW, so we need to standardize them"

cmgosnell · 2020-08-17T16:40:27Z

src/pudl/transform/eia861.py

+        ).append(df_post_2010_misc)
+        .drop(['distributed_generation_owned_capacity_pct',
+               'backup_capacity_pct',
+               'total_capacity_mw'], axis=1)


why are these columns being dropped here?

When a table goes through the _pct_to_mw() function, which calculates the MW component breakdown from the pct values, the function does not delete the old, now irrelevant _pct columns. That is why these columns are dropped here, after running _pct_to_mw(). The total_capacity_mw column is also dropped here because, as a total column, it is not included in the final output. This is the _misc table, which only handles the distributed_generation_owned_capacity_pct and backup_capacity_pct columns because they are not considered "components" in the same way that hydro_capacity_pct etc. are (this was the issue with the totals not equaling the sum). Below, a few lines of code do the same thing for the _tech table, which houses the other "true" component columns, deleting their _pct counterparts after running _pct_to_mw().
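
To illustrate the convert-then-drop pattern described above, here is a rough sketch. The `pct_to_mw` function below is a hypothetical stand-in, not the actual `_pct_to_mw()` from the PR; the `utility_id_eia` column, the sample values, and the 0-100 percentage scale are all assumptions.

```python
import pandas as pd


def pct_to_mw(df, pct_col, total_col="total_capacity_mw"):
    """Hypothetical stand-in: derive an _mw column from a _pct column."""
    mw_col = pct_col.replace("_pct", "_mw")
    out = df.copy()
    # Assumption: percentages are stored on a 0-100 scale.
    out[mw_col] = out[total_col] * out[pct_col] / 100
    return out


pct_cols = ["distributed_generation_owned_capacity_pct", "backup_capacity_pct"]

pre_2010_misc = pd.DataFrame({
    "utility_id_eia": [1, 2],  # assumed identifier column
    "total_capacity_mw": [200.0, 50.0],
    "distributed_generation_owned_capacity_pct": [25.0, 100.0],
    "backup_capacity_pct": [10.0, 0.0],
})

converted = pre_2010_misc
for col in pct_cols:
    converted = pct_to_mw(converted, col)

# The conversion leaves the source columns in place, so drop them afterwards,
# together with the total column that is not part of the final output.
misc = converted.drop(columns=pct_cols + ["total_capacity_mw"])
```

After the drop, only the identifier and the two derived `_mw` columns remain.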

cmgosnell · 2020-08-17T16:41:20Z

src/pudl/transform/eia861.py

+
+    tfr_dfs["distributed_generation_tech_eia861"] = tidy_dg_tech
+    tfr_dfs["distributed_generation_fuel_eia861"] = tidy_dg_fuel
+    tfr_dfs["distributed_generation_misc_eia861"] = transformed_dg_misc


do we need to delete the original table from tfr_dfs as well??

Yes! you're right
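
A small sketch of that cleanup, using placeholder strings instead of dataframes. The key name "distributed_generation_eia861" for the original combined table is an assumption based on the column map filename, and "utility_data_eia861" is a made-up example of an unrelated entry.

```python
# Placeholder values stand in for the real dataframes.
tfr_dfs = {
    "distributed_generation_eia861": "original combined table",  # assumed key
    "utility_data_eia861": "unrelated table",  # hypothetical entry
}

tfr_dfs["distributed_generation_tech_eia861"] = "tidy tech table"
tfr_dfs["distributed_generation_fuel_eia861"] = "tidy fuel table"
tfr_dfs["distributed_generation_misc_eia861"] = "transformed misc table"

# Remove the original table so it is not carried forward alongside its
# replacements. pop() with a default avoids a KeyError if it is already gone.
tfr_dfs.pop("distributed_generation_eia861", None)
```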

cmgosnell · 2020-08-17T16:42:56Z

src/pudl/package_data/meta/xlsx_maps/eia861/column_maps/distributed_generation_eia861.csv

+pv_capacity_mw,-1,-1,-1,-1,-1,-1,13,13,14,14,14,14
+all_storage_capacity_mw,-1,-1,-1,-1,-1,-1,14,14,15,15,15,15
+other_capacity_mw,-1,-1,-1,-1,-1,-1,15,15,16,16,16,16
+total_capacity_mw_2,-1,-1,-1,-1,-1,-1,16,16,17,17,17,17


ugh, sorry you had to redo all of these! I know you've been redoing lots of them... but each time I see these it makes me think we shouldn't have even done the rough draft version before being able to test them.

cmgosnell · 2020-08-17T16:43:50Z

src/pudl/extract/excel.py

@@ -81,7 +81,7 @@ def get_skipfooter(self, year, page):

    def get_column_map(self, year, page):
        """Returns the dictionary mapping input columns to pudl columns for given year and page."""
-        return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items()}
+        return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items() if v != -1}


this is a nice little fix! i bet it took way longer to track the problem down than to fix it.

Unfortunately yes! But glad it didn't run into too many problems during the test. Initially it was > -1 but that wasn't compatible with the 860 format.
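
For anyone reading later, a toy illustration of why the filter matters, using plain dicts rather than the real metadata object. The sample column names are partly invented, and the guess that the 860 maps hold header strings (which would explain why `> -1` failed) is an assumption, not something confirmed in this thread.

```python
# PUDL column name -> spreadsheet column index for one year; -1 means the
# column is absent that year.
column_indices = {
    "utility_id_eia": 1,  # hypothetical entry
    "pv_capacity_mw": -1,
    "all_storage_capacity_mw": -1,
    "other_capacity_mw": -1,
}

# Inverting without a filter collapses every -1 onto a single key, so the
# last absent column silently "wins".
unfiltered = {v: k for k, v in column_indices.items()}

# Filtering on != -1 keeps only columns that actually exist that year.
filtered = {v: k for k, v in column_indices.items() if v != -1}

# Assumption: some maps hold header strings instead of integer indices.
# Ordering comparisons between str and int raise TypeError in Python 3,
# while != works for both.
name_based = {"plant_name": "Plant Name", "missing_col": -1}
try:
    {v: k for k, v in name_based.items() if v > -1}
    greater_than_failed = False
except TypeError:
    greater_than_failed = True

name_filtered = {v: k for k, v in name_based.items() if v != -1}
```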

zaneselvans · 2020-08-17T18:39:09Z

Here is an overview of what got changed by this pull request:

Issues
======
- Added 2

See the complete overview on Codacy

zaneselvans · 2020-08-17T18:39:12Z

src/pudl/transform/eia861.py

+    )
+
+    logger.info('Tidying Distributed Generation Fuel Table')
+    tidy_dg_fuel, fuel_idx_cols = _tidy_class_dfs(


Codacy found an issue: Unused variable 'fuel_idx_cols'

zaneselvans · 2020-08-17T18:39:13Z

src/pudl/transform/eia861.py

+    ###########################################################################
+
+    logger.info('Tidying Distributed Generation Tech Table')
+    tidy_dg_tech, tech_idx_cols = _tidy_class_dfs(


Codacy found an issue: Unused variable 'tech_idx_cols'
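
One common way to address this kind of warning is to unpack the unused return value into `_`. The sketch below uses a hypothetical stub in place of `_tidy_class_dfs`, whose real signature is not shown in this thread; the `report_date` index column is also an assumption.

```python
def tidy_class_dfs_stub(rows):
    """Hypothetical stand-in returning (tidy_data, idx_cols) like the helper."""
    idx_cols = ["utility_id_eia", "report_date"]  # assumed index columns
    tidy = [dict(row, tidy=True) for row in rows]
    return tidy, idx_cols


# The underscore signals that the second return value is intentionally
# ignored, which linters generally do not report as an unused variable.
tidy_dg_tech, _ = tidy_class_dfs_stub([{"utility_id_eia": 1}])
```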

codecov · 2020-08-17T18:52:06Z

Codecov Report

Merging #724 into sprint21 will decrease coverage by 1.20%.
The diff coverage is 91.03%.

@@             Coverage Diff              @@
##           sprint21     #724      +/-   ##
============================================
- Coverage     75.19%   73.99%   -1.20%     
============================================
  Files            39       39              
  Lines          4825     4726      -99     
============================================
- Hits           3628     3497     -131     
- Misses         1197     1229      +32

Impacted Files	Coverage Δ
src/pudl/extract/ferc1.py	`76.67% <ø> (ø)`
src/pudl/transform/eia861.py	`96.05% <90.28%> (-0.03%)`	⬇️
src/pudl/constants.py	`100.00% <100.00%> (ø)`
src/pudl/extract/excel.py	`94.23% <100.00%> (ø)`
src/pudl/workspace/datastore.py	`61.67% <0.00%> (+0.78%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d4d0d33...b1c1082.
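
The headline percentages can be reproduced from the hits and lines in the coverage diff above. This is only a quick arithmetic check, assuming coverage is computed as hits divided by total lines; `coverage_pct` is a hypothetical helper.

```python
def coverage_pct(hits, lines):
    """Coverage as a percentage of lines hit, rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = coverage_pct(3628, 4825)  # sprint21 before the merge
head = coverage_pct(3497, 4726)  # after merging #724
delta = round(head - base, 2)
```

Here `base` comes out to 75.19, `head` to 73.99, and `delta` to -1.2, matching the reported 1.20% decrease.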

cmgosnell

these minor updates look great!

Austen Sharpe and others added 8 commits August 7, 2020 10:31

instantiate first bit of DG table

f457eea

Merge branch 'reliability_and_utility_data_eia861' into distributed_g…

bb4517a

…eneration_eia861

progress on fixing column association: incomplete

f6874bf

fix bug with -1 values getting read into distributed generation table

d46f2ee

distributed generation table

8b23e85

merge with sprint21

0fbe577

make nerc_class list same as recognized_nerc_region list

71e85b9

update column dtype assignment set nerc cols to reference same list o…

abde875

…f regions

aesharpe added the eia861 label (Anything having to do with EIA Form 861) Aug 13, 2020

aesharpe added this to the PUDL Sprint 21 milestone Aug 13, 2020

aesharpe requested a review from cmgosnell August 13, 2020 17:40

cmgosnell reviewed Aug 17, 2020


Austen Sharpe added 3 commits August 17, 2020 14:27

Address PR comments: delete comments

3fc3917

Merge branch 'sprint21' into distributed_generation_eia861

d0c1fe1

merge with sprint21

b1c1082

zaneselvans reviewed Aug 17, 2020


cmgosnell approved these changes Aug 17, 2020


aesharpe merged commit ab1e80b into sprint21 Aug 17, 2020

aesharpe deleted the distributed_generation_eia861 branch August 18, 2020 16:40
