-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributed generation eia861 #724
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! i have a few comments, but mostly it looks clean (do I sound like a broken record?, lol).
src/pudl/constants.py
Outdated
'gas', | ||
'oil', | ||
'other', | ||
'renewable', # needs prefix 'all' to not confuse with 'other' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this prefix comment still relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not anymore! thanks
src/pudl/extract/eia861.py
Outdated
@@ -41,6 +41,7 @@ def process_raw(self, df, yr, page): | |||
"The data has not yet been validated, and the structure may change." | |||
) | |||
column_map_numeric = self._metadata.get_column_map(yr, page) | |||
# print(column_map_numeric) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you get rid of this little guy?... or change it to a logger.debug
with a more informative message if you think it might be helpful in the future.
# * Remove old pct cols and totals cols | ||
########################################################################### | ||
|
||
# Separate datasets into years with only pct values (pre-2010) and years with only mw values (post-2010) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a brief addtional comment here noting why. Like "some the pre-2010 columns are reported as percentages while the post-2010 columns are reported as MWs, so we need to standardize them"
).append(df_post_2010_misc) | ||
.drop(['distributed_generation_owned_capacity_pct', | ||
'backup_capacity_pct', | ||
'total_capacity_mw'], axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these columns being dropped here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a table goes through the _pct_to_mw()
function to calculate the mw component breakdown from the pct value. it does not delete the old, now irrelevant _pct
columns. This is why these columns are deleted here, after running _pct_to_mw()
instead. The total_capacity_mw
col is also dropped here because, as a total column, it is not included in the final output. This is the _misc
table and only handles the distributed_generation_owned_capacity
and backup_capacity_pct
columns because they are not considered "components" in the same way that hydro_capacity_pct
etc. are (this was the issue with the totals not equaling the sum). Below, there are a few lines of code that do the same thing for the _tech
table, housing these other "true" component columns and deleting their _pct
column counterparts after running pct_to_mw()
|
||
tfr_dfs["distributed_generation_tech_eia861"] = tidy_dg_tech | ||
tfr_dfs["distributed_generation_fuel_eia861"] = tidy_dg_fuel | ||
tfr_dfs["distributed_generation_misc_eia861"] = transformed_dg_misc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to delete the original table from tfr_dfs
as well??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! you're right
pv_capacity_mw,-1,-1,-1,-1,-1,-1,13,13,14,14,14,14 | ||
all_storage_capacity_mw,-1,-1,-1,-1,-1,-1,14,14,15,15,15,15 | ||
other_capacity_mw,-1,-1,-1,-1,-1,-1,15,15,16,16,16,16 | ||
total_capacity_mw_2,-1,-1,-1,-1,-1,-1,16,16,17,17,17,17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh, sorry you had to redo all of these! I know you've been redoing lots of them... but each time I see these it makes me think we shouldn't have even done the rough draft version before being able to test them.
@@ -81,7 +81,7 @@ def get_skipfooter(self, year, page): | |||
|
|||
def get_column_map(self, year, page): | |||
"""Returns the dictionary mapping input columns to pudl columns for given year and page.""" | |||
return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items()} | |||
return {v: k for k, v in self._column_map[page].T.loc[str(year)].to_dict().items() if v != -1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this nice little fix! i bet it took way longer to track the problem down then to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately yes! But glad it didn't run into too many problems during the test. Initially it was > -1 but that wasn't compatible with the 860 format.
Here is an overview of what got changed by this pull request: Issues
======
- Added 2
See the complete overview on Codacy |
) | ||
|
||
logger.info('Tidying Distributed Generation Fuel Table') | ||
tidy_dg_fuel, fuel_idx_cols = _tidy_class_dfs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codacy found an issue: Unused variable 'fuel_idx_cols'
########################################################################### | ||
|
||
logger.info('Tidying Distributed Generation Tech Table') | ||
tidy_dg_tech, tech_idx_cols = _tidy_class_dfs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codacy found an issue: Unused variable 'tech_idx_cols'
Codecov Report
@@ Coverage Diff @@
## sprint21 #724 +/- ##
============================================
- Coverage 75.19% 73.99% -1.20%
============================================
Files 39 39
Lines 4825 4726 -99
============================================
- Hits 3628 3497 -131
- Misses 1197 1229 +32
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these minor updates look great!
Here's the data for the distributed generation tables with a few small changes to the utility data tables.
This PR also contains changes to the extract module to address the strange -1 column problem.