dagsterification of ferc1_eia #2938
Hmm, I had thought that this was just going to be a glue / crosswalk table that provided the necessary FERC & EIA columns for merging together data that had both FERC & EIA planty records.
this is my one question about the pr. i haven't been in y'all's convos about the intention for this table so idk what the intention was here. I believe we could pretty easily make a skinny glue table and a wide marty table.
If the big wide table is what we've been generating, then I assume that's what RMI is depending on downstream, in which case we need to continue providing it in some form. How does this data currently get used? It must have a bajillion columns. Regardless of that, it also seems like the glue table would be a generally useful thing. You could use it to stick together FERC + EIA data in a variety of contexts without needing to read this monster table. What columns would be part of the glue?
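To make the glue-table idea concrete, here is a minimal sketch in plain Python. It assumes the skinny table holds nothing but a pair of record IDs; `record_id_eia`, the data columns, and every value below are hypothetical and only for illustration, not the actual table schema.

```python
# Hypothetical skinny glue table: one row per linked pair of record IDs.
# `record_id_eia` and all values below are assumptions made up for illustration.
glue = [
    {"record_id_ferc1": "f1_a", "record_id_eia": "eia_1"},
    {"record_id_ferc1": "f1_b", "record_id_eia": "eia_2"},
]

# Toy stand-ins for FERC (financial) and EIA (operational) data, keyed by record ID.
ferc1 = {
    "f1_a": {"capex_total": 100.0},
    "f1_b": {"capex_total": 250.0},
}
eia = {
    "eia_1": {"capacity_mw": 50.0},
    "eia_2": {"capacity_mw": 125.0},
}


def join_via_glue(glue, ferc1, eia):
    """Attach FERC and EIA columns to each glue row, skipping rows missing either side."""
    joined = []
    for link in glue:
        ferc_row = ferc1.get(link["record_id_ferc1"])
        eia_row = eia.get(link["record_id_eia"])
        if ferc_row is None or eia_row is None:
            continue
        joined.append({**link, **ferc_row, **eia_row})
    return joined


combined = join_via_glue(glue, ferc1, eia)
```

The point of the sketch is that a consumer only pulls in the FERC and EIA columns they need, rather than reading the full wide table.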
I added all of the extra columns a while back while working in the rmi-ferc1-eia repo because I kept going back and merging in more columns from the EIA or FERC data. If the whole point of this merge is to connect the financial data in FERC and the operational data in EIA... it seems like we should probably publish this big table. Especially because the EIA merge is a merge from the
I think the skinny table could just have:
I think we need to talk more about the structure of the glue.
okay @zaneselvans and i chatted about table design and landed on keeping the wide table because in an ideal world we would follow up on this table with an additional table that is at the level of EIA generators (or EIA generator-owners). it would be a 1:m association table with one FERC1 record being linked to many EIA generator records, and the FERC1 data could be proportionally allocated across those generators. we did a lot of that work over in this repo.
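As a rough illustration of the proportional allocation described here, the sketch below splits one FERC1 value across the generators linked to it. The weighting scheme (share of some per-generator weight such as capacity), the function name, and the numbers are all assumptions for illustration, not the method used in the RMI repo.

```python
def allocate_by_share(ferc1_value, generator_weights):
    """Split one FERC1 value across generators in proportion to their weights.

    `generator_weights` maps a hypothetical generator ID to a non-negative
    weight (for example capacity); which weight to use is an assumption here.
    """
    total = sum(generator_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return {
        gen_id: ferc1_value * weight / total
        for gen_id, weight in generator_weights.items()
    }


# One FERC1 record linked to two EIA generators (1:m), with made-up weights.
weights = {"gen_1": 100.0, "gen_2": 300.0}
allocated = allocate_by_share(1000.0, weights)
```

The allocated pieces sum back to the original FERC1 value, which is a useful sanity check for any real implementation.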
On the Naming of Things, I do think that getting a more descriptive table name in here would be great. Under the new naming convention what would it be? What "data source" are we using for the composite analytical tables?
On the asset groups, to me it feels like
Should we create an issue to get the already-existing FERC1 data allocated to EIA generator-ownership slices table migrated over from the RMI repo and/or the coarse-grained primary-fuel based linkage for the remaining 20% of FERC1 records that aren't getting associated in the current process?
Do you want me to take a stab at a table description, and then you can tell me how it's wrong because I don't totally understand this table?
…udl into dag-the-ferc1-eia
…udl into dag-the-ferc1-eia
Dazhong says the Zenodo fix CI should be done in about 15 minutes!
"sources": ["eia860", "eia923"],
"etl_group": "outputs",
"field_namespace": "eia",
Does this table really have no primary key?
i'm sure it could have some gross composite key but i just copied this over when migrating it. so it did not beforehand and i think determining one and enforcing it feels oos for this pr.
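For whenever a key does get picked, a quick way to test a candidate composite key is to count duplicate key tuples. This is only a sketch: the candidate columns and rows below are hypothetical (`record_id_eia` in particular is an assumed column name), and nothing here is part of this PR.

```python
from collections import Counter


def duplicated_keys(rows, key_cols):
    """Return each key tuple that appears more than once, with its count."""
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up rows; the last two share the same candidate key.
rows = [
    {"record_id_ferc1": "f1_a", "record_id_eia": "eia_1"},
    {"record_id_ferc1": "f1_a", "record_id_eia": "eia_2"},
    {"record_id_ferc1": "f1_a", "record_id_eia": "eia_2"},
]
dupes = duplicated_keys(rows, ["record_id_ferc1", "record_id_eia"])
```

An empty result would mean the candidate columns uniquely identify every row and could serve as a primary key.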
Codecov Report
All modified lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## dev #2938 +/- ##
=======================================
- Coverage 88.5% 88.5% -0.1%
=======================================
Files 90 91 +1
Lines 10797 10805 +8
=======================================
+ Hits 9564 9569 +5
- Misses 1233 1236 +3
PR Overview
record_id_ferc1
Questions
plant_parts_eia and plants_all_ferc1? rn i just added em all... i think that's data marty but idk it is a lot. ANSWER: all of em
ferc1_eia. a group of 1... this seems like our current pattern. i don't love that tbh but i'm going to move forward w/ that and if we come up with a collective good idea around this we should standardize.
PR Checklist
dev).