Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed generation eia861 #724

Merged
merged 11 commits into from
Aug 17, 2020
Merged

Conversation

aesharpe
Copy link
Member

@aesharpe aesharpe commented Aug 13, 2020

Here's the data for the distributed generation tables with a few small changes to the utility data tables.
This PR also contains changes to the extract module to address the strange -1 column problem.

@aesharpe aesharpe added the eia861 Anything having to do with EIA Form 861 label Aug 13, 2020
@aesharpe aesharpe added this to the PUDL Sprint 21 milestone Aug 13, 2020
Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good! i have a few comments, but mostly it looks clean (do I sound like a broken record?, lol).

'gas',
'oil',
'other',
'renewable', # needs prefix 'all' to not confuse with 'other'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this prefix comment still relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not anymore! thanks

@@ -41,6 +41,7 @@ def process_raw(self, df, yr, page):
"The data has not yet been validated, and the structure may change."
)
column_map_numeric = self._metadata.get_column_map(yr, page)
# print(column_map_numeric)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you get rid of this little guy?... or change it to a logger.debug with a more informative message if you think it might be helpful in the future.

# * Remove old pct cols and totals cols
###########################################################################

# Separate datasets into years with only pct values (pre-2010) and years with only mw values (post-2010)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a brief addtional comment here noting why. Like "some the pre-2010 columns are reported as percentages while the post-2010 columns are reported as MWs, so we need to standardize them"

).append(df_post_2010_misc)
.drop(['distributed_generation_owned_capacity_pct',
'backup_capacity_pct',
'total_capacity_mw'], axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these columns being dropped here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a table goes through the _pct_to_mw() function to calculate the mw component breakdown from the pct value. it does not delete the old, now irrelevant _pct columns. This is why these columns are deleted here, after running _pct_to_mw() instead. The total_capacity_mw col is also dropped here because, as a total column, it is not included in the final output. This is the _misc table and only handles the distributed_generation_owned_capacity and backup_capacity_pct columns because they are not considered "components" in the same way that hydro_capacity_pct etc. are (this was the issue with the totals not equaling the sum). Below, there are a few lines of code that do the same thing for the _tech table, housing these other "true" component columns and deleting their _pct column counterparts after running pct_to_mw()


tfr_dfs["distributed_generation_tech_eia861"] = tidy_dg_tech
tfr_dfs["distributed_generation_fuel_eia861"] = tidy_dg_fuel
tfr_dfs["distributed_generation_misc_eia861"] = transformed_dg_misc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to delete the original table from tfr_dfs as well??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! you're right

pv_capacity_mw,-1,-1,-1,-1,-1,-1,13,13,14,14,14,14
all_storage_capacity_mw,-1,-1,-1,-1,-1,-1,14,14,15,15,15,15
other_capacity_mw,-1,-1,-1,-1,-1,-1,15,15,16,16,16,16
total_capacity_mw_2,-1,-1,-1,-1,-1,-1,16,16,17,17,17,17
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, sorry you had to redo all of these! I know you've been redoing lots of them... but each time I see these it makes me think we shouldn't have even done the rough draft version before being able to test them.

@@ -81,7 +81,7 @@ def get_skipfooter(self, year, page):

def get_column_map(self, year, page):
"""Returns the dictionary mapping input columns to pudl columns for given year and page."""
return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items()}
return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items() if v != -1}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this nice little fix! i bet it took way longer to track the problem down then to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately yes! But glad it didn't run into too many problems during the test. Initially it was > -1 but that wasn't compatible with the 860 format.

Copy link
Member

Codacy Here is an overview of what got changed by this pull request:

Issues
======
- Added 2
           

See the complete overview on Codacy

)

logger.info('Tidying Distributed Generation Fuel Table')
tidy_dg_fuel, fuel_idx_cols = _tidy_class_dfs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codacy found an issue: Unused variable 'fuel_idx_cols'

###########################################################################

logger.info('Tidying Distributed Generation Tech Table')
tidy_dg_tech, tech_idx_cols = _tidy_class_dfs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codacy found an issue: Unused variable 'tech_idx_cols'

@codecov
Copy link

codecov bot commented Aug 17, 2020

Codecov Report

Merging #724 into sprint21 will decrease coverage by 1.20%.
The diff coverage is 91.03%.

Impacted file tree graph

@@             Coverage Diff              @@
##           sprint21     #724      +/-   ##
============================================
- Coverage     75.19%   73.99%   -1.20%     
============================================
  Files            39       39              
  Lines          4825     4726      -99     
============================================
- Hits           3628     3497     -131     
- Misses         1197     1229      +32     
Impacted Files Coverage Δ
src/pudl/extract/ferc1.py 76.67% <ø> (ø)
src/pudl/transform/eia861.py 96.05% <90.28%> (-0.03%) ⬇️
src/pudl/constants.py 100.00% <100.00%> (ø)
src/pudl/extract/excel.py 94.23% <100.00%> (ø)
src/pudl/workspace/datastore.py 61.67% <0.00%> (+0.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4d0d33...b1c1082. Read the comment docs.

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these minor updates look great!

@aesharpe aesharpe merged commit ab1e80b into sprint21 Aug 17, 2020
@aesharpe aesharpe deleted the distributed_generation_eia861 branch August 18, 2020 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eia861 Anything having to do with EIA Form 861
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants