Use pydantic for ETL settings validation #1292
Conversation
…. Removed old validation functions. Created global pydantic settings config.
Why change the stock settings filename? Shouldn't we just switch both the full and the fast over to using the new names? Or are we not ready to switch yet? I guess this is still a draft.
…lass variable. Replace eia860_ytd with eia860m.
…udl into pydantic-settings
…tions from constants.
Codecov Report

@@            Coverage Diff             @@
##              dev    #1292      +/-   ##
==========================================
+ Coverage   82.64%   83.28%   +0.63%
==========================================
  Files          55       56       +1
  Lines        6626     6583      -43
==========================================
+ Hits         5476     5482       +6
+ Misses       1150     1101      -49
==========================================

Continue to review full report at Codecov.
This is ready for another review.

Pydantic has made our settings validation cleaner and more reusable for future datasets. That being said, we can continue to improve on dynamically generating settings validation for new datasets. I wrote up a couple of issues (#1316, #1264) outlining my ideas. I think dataset settings validation will continue to evolve as we integrate this branch with prefect and the metadata updates.
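For readers who haven't used pydantic for this kind of thing, here is a minimal sketch of the sort of per-dataset model it enables. It assumes pydantic v1; the class name, field names, and year range are illustrative assumptions, not the code merged in this PR.

```python
# Sketch only (pydantic v1 style); names and year ranges are assumptions.
from typing import List

from pydantic import BaseModel, validator


class Eia923Settings(BaseModel):
    """Hypothetical settings for one dataset: which years of data to load."""

    years: List[int] = list(range(2009, 2021))

    @validator("years", each_item=True)
    def year_is_available(cls, year: int) -> int:
        """Reject years outside the assumed range of available EIA 923 data."""
        if not 2009 <= year <= 2020:
            raise ValueError(f"EIA 923 data is not available for {year}")
        return year


# Invalid settings fail loudly at construction time:
Eia923Settings(years=[2018, 2019])           # OK
# Eia923Settings(years=[1999]) would raise a pydantic ValidationError.
```

The point of the sketch is that the validation logic lives on the settings object itself, so any code that receives an Eia923Settings instance can trust it has already been checked.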
Mostly I have questions. Let me know if you'd like to talk in a call. But overall this looks great and way way better than the janky spaghetti we had before.
…ttings. replaced all etl_params with etl_settings.
Testing out one more change to
So far I have created pydantic models for everything but ferc1_to_sqlite and CEMS.
Currently, I am specifying all individual dataset settings as attributes in DatasetsSettings. This works fine for now, but in the future we could dynamically add dataset settings by reading them in from a separate file. I think we would still have to manually create validators to establish dependencies between datasets like EIA 860 and 923.

We could also move these validators into the dataset pipelines on the prefect branch, so that the validators and the actual pipeline are coupled. A super out-there idea is to have prefect dependencies be inferred from the validators, or vice versa.
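As a rough illustration of the cross-dataset validators mentioned above, here is one way a dependency between EIA 860 and EIA 923 could be expressed on a container model. This is a sketch under pydantic v1; the class names, default years, and the specific dependency rule are assumptions, not the implementation in this PR.

```python
# Sketch only (pydantic v1): a container model whose validator enforces an
# assumed dependency (EIA 860 processing needs EIA 923 settings to exist).
from typing import List, Optional

from pydantic import BaseModel, root_validator


class Eia860Settings(BaseModel):
    years: List[int] = [2019, 2020]


class Eia923Settings(BaseModel):
    years: List[int] = [2019, 2020]


class DatasetsSettings(BaseModel):
    """Each dataset's settings are an attribute of this container model."""

    eia860: Optional[Eia860Settings] = None
    eia923: Optional[Eia923Settings] = None

    @root_validator
    def eia860_requires_eia923(cls, values):
        """If only EIA 860 was requested, pull in default EIA 923 settings."""
        if values.get("eia860") is not None and values.get("eia923") is None:
            values["eia923"] = Eia923Settings()
        return values


# Requesting eia860 alone transparently enables eia923 as well:
settings = DatasetsSettings(eia860=Eia860Settings(years=[2020]))
assert settings.eia923 is not None
```

Keeping the dependency in a validator like this (rather than scattered checks in the ETL functions) is what would make it possible to later infer pipeline ordering from the settings model, or vice versa.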