-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transform eia861 short form #3565
base: main
Are you sure you want to change the base?
Transform eia861 short form #3565
Conversation
Note that 150+ auto-generated RST files that document the API have been commited in this PR. They're not supposed to be part of the repository, and are built as an intermediate step between the Python modules and the HTML output of the docs. |
Okay, I'll backtrack the changes and update the PR |
5c83ffe
to
574dbbc
Compare
This reverts commit 574dbbc. remove index files
6d95595
to
a33045a
Compare
e7e542b
to
dde8ccd
Compare
@zaneselvans I have backtracked the changes and removed the index files that were added before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @Nancy9ice !
See my comments about column names etc. (Many of which are old things we did poorly and would love for you to help fix).
Other Updates:
- Add a note to our
release_notes.rst
file explaining the addition of this table - Add this table to the list of foreign_key_rule exceptions in the schema for
core_eia860__scd_utilities
inmetadata/resources/eia860
], | ||
"primary_key": [ | ||
"utility_id_eia", | ||
"state", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is state definitely a primary key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is state definitely a primary key?
'state' is not the primary key independently. It is one of the variables that contribute to the compound primary key. All the fields stated under the 'primary_key' all come together to form the compound primary key. Am I wrong about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I just wanted to check! (I looked at it in a notebook and it seems like balancing_authority_code_eia
doesn't need to be part of the primary key (the tables are the same length if you drop duplicates without balancing_authority_code_eia
) however, it's possible that in the future this column could be part of the primary key? @cmgosnell what do you think here?
Also @Nancy9ice take a look at #3540 if you haven't already and make sure to address the issues there too. |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Hi @Nancy9ice, thanks for these edits (and sorry it's taken a minute to respond)-- Generally speaking we like to keep our PRs pretty small. When I was suggesting changing the boolean column names, I meant those pertaining to the That way we can test run the whole ETL on the column name changes and not have to worry about |
bfdd24d
to
f9011bd
Compare
For more information, see https://pre-commit.ci
Okay. I have done the edits. This time, I touched only the columns associated with the short-form table. That is, green_pricing, demand_side_management, water_heater, net_metering, and time_responsive_programs. I also ensured that these column changes were reflected in the other tables that bore these columns too. I'll await your feedback as regards these changes while working on the other changes you suggested under this pull request. |
For more information, see https://pre-commit.ci
My guess is the right way to do this is pull the short form data out of all the other tables where it got stashed in 2019, and integrate it with the other short form data. I think the short form is a simplified / less extensive version of a lot of the data and probably doesn't match the schema / meaning of the full form exactly. This would result in the other tables being more uniform across all the years, and the short form data being all in a single place with a single format. But would also probably mean needing to touch all of the existing transforms to exclude the short form data, and then creating another set of short-form-only transforms that feeds into something that combines the different sources of short form data together into a unified table. Which does sound a bit messy. |
balancing_authority_code_eia,-1,4,4,5,5,5,5,-1,5,5,5 | ||
sales_revenue,4,5,5,6,6,6,6,-1,6,6,6 | ||
sales_mwh,5,6,6,7,7,7,7,-1,7,7,7 | ||
customers,6,7,7,8,8,8,8,-1,8,8,8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would have consequences for other tables and so we don't want to deal with it right now, but I think this should probably be num_customers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the src.pudl.metadata.fields, there's the customers key in the JSON fields. There's no num_customers or total_customers. Please check to confirm that this is correct so I'll know if I still need to change this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "customers key"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nancy9ice what I meant here is that I think we originally did a poor job of naming this column, or that it was named before we had thought about our naming conventions, so I was suggesting that we take this opportunity to rename customers
which is pretty vague, to something that provides a bit more context about the column, like customers_num
or num_customers
though searching through fields.py
for other instances of _num
and num_
it doesn't seem as if we have established a strong convention right now, so maybe we should just ignore this for now and deal with it in a future round of renaming / deeper EIA-861 integration.
Hey @Nancy9ice I added one little comment, merged in main, and did some alembic gymnastics. What do you think about @zaneselvans's comment above regarding the 2019 data? |
The fact that the 2019 short form data is interwoven with all of the other data is unfortunate, and seems like it'll take a bit of work to untangle. Probably just getting a table with all the other short form data available in the database is a good incremental step forward, and we should write up a separate issue about removing all the 2019 short form data from the other tables and integrating it into the dedicated short-form table to be addressed in the future. |
I can write up an issue for the last legs here and move forward with merging this. Might be a |
It's a good idea to have a separate issue for that because it would require series of steps just like this issue |
It looks like there's some alembic migration / database schema issues in here that need to be fixed. I tried running the tests locally and got the same multiple heads problem that's causing the unit tests to fail in CI. |
Can I get the full error so I'll see what's wrong? |
@Nancy9ice I think Austen has fixed the alembic issue, but for future reference all of the checks that run on a PR provide "Details" links to the logs generated by that check, and it's a good idea to click through and take a look at what happened when one of them fails. I'm wondering if you've installed our git pre-commit hooks since it looks like the Ruff linter is failing within the |
EDIT sorry, this was my bad. I hadn't rematerialized the furthest upstream extracted EIA-861 assets and there were some column name changes. Now I'm getting a
|
Oh, I think I skipped that. I'll add that now. |
…y9ice/pudl into transform-eia861-short-form
Here's the issue for the 2019 data! #3654 |
I was able to successfully materialize the |
Closes #3540.
What problem does this address?
This addresses the lack of the eia861 short_form table
What did you change?
To-do list
Testing
How did you make sure this worked? How can a reviewer verify this?
I confirmed that this worked after I ran the 'make pytest-coverage' command. @zaneselvans please could you run the
build-deploy-pudl
GitHub Action so that I'll see if there's any other thing to do in the code?