Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pydantic for ETL settings validation #1292

Merged
merged 14 commits into from
Nov 4, 2021
Merged

Use pydantic for ETL settings validation #1292

merged 14 commits into from
Nov 4, 2021

Conversation

bendnorman
Copy link
Member

@bendnorman bendnorman commented Oct 14, 2021

So far I have created pydantic models for everything but ferc1_to_sqlite and CEMS.

Currently, I am specifying all individual dataset settings as attributes in DatasetsSettings. This works fine for now but in the future, we could dynamically add dataset settings by reading them in from a separate file. I think we would still have to manually create validators to establish dependencies between datasets like eia 860 and 923.

We could also move these validators into the dataset pipelines on the prefect branch. That way the validators and actual pipeline are coupled. A super out-there idea is to have perfect dependencies be inferred from the validators or vice versa.

@bendnorman bendnorman self-assigned this Oct 14, 2021
@bendnorman bendnorman marked this pull request as draft October 14, 2021 17:59
…. Removed old validation functions. Created global pydantic settings config.
@bendnorman bendnorman linked an issue Oct 14, 2021 that may be closed by this pull request
13 tasks
Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the stock settings filename? Shouldn't we just switch both the full and the fast over to using the new names? Or are we not ready to switch yet? I guess this is still a draft.

src/pudl/settings.py Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
test/unit/settings_test.py Outdated Show resolved Hide resolved
src/pudl/package_data/settings/etl_full_pydantic.yml Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 2, 2021

Codecov Report

Merging #1292 (4bb5d52) into dev (9b8a672) will increase coverage by 0.63%.
The diff coverage is 95.88%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #1292      +/-   ##
==========================================
+ Coverage   82.64%   83.28%   +0.63%     
==========================================
  Files          55       56       +1     
  Lines        6626     6583      -43     
==========================================
+ Hits         5476     5482       +6     
+ Misses       1150     1101      -49     
Impacted Files Coverage Δ
src/pudl/cli.py 67.50% <50.00%> (+5.28%) ⬆️
src/pudl/convert/ferc1_to_sqlite.py 62.16% <50.00%> (+12.16%) ⬆️
src/pudl/settings.py 96.27% <96.27%> (ø)
src/pudl/constants.py 100.00% <100.00%> (ø)
src/pudl/etl.py 92.52% <100.00%> (+12.30%) ⬆️
src/pudl/transform/eia.py 95.45% <100.00%> (ø)
src/pudl/workspace/datastore.py 68.83% <0.00%> (-1.62%) ⬇️
src/pudl/transform/ferc1.py 91.61% <0.00%> (-0.36%) ⬇️
src/pudl/analysis/timeseries_cleaning.py 88.40% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b8a672...4bb5d52. Read the comment docs.

@bendnorman
Copy link
Member Author

bendnorman commented Nov 2, 2021

This is ready for another review.

Changes

  • GenericDatasetSettings now requires working tables and working partitions class variables. There can be an arbitrary number of partitions. create_dataset_settings(dataset_name) will dynamically create a field for each partition.
  • pc.WORKING_PARTITIONS and pc.PUDL_TABLES remain the one true place partition and table metadata.
  • I updated the development settings files docs.

Pydantic has made our settings validation cleaner and more reusable for future datasets. That being said, we can continue to improve on dynamically generating settings validation for new datasets. I wrote up a couple of issues (#1316, #1264) outlining my ideas.

I think dataset settings validation will continue to evolve as we integrate this branch with prefect and metadata updates.

Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I have questions. Let me know if you'd like to talk in a call. But overall this looks great and way way better than the janky spaghetti we had before.

docs/dev/settings_files.rst Outdated Show resolved Hide resolved
src/pudl/constants.py Outdated Show resolved Hide resolved
src/pudl/constants.py Outdated Show resolved Hide resolved
src/pudl/etl.py Outdated Show resolved Hide resolved
src/pudl/package_data/settings/etl_fast.yml Outdated Show resolved Hide resolved
test/conftest.py Outdated Show resolved Hide resolved
src/pudl/settings.py Show resolved Hide resolved
src/pudl/settings.py Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
src/pudl/settings.py Outdated Show resolved Hide resolved
…ttings. replaced all etl_params with etl_settings.
@bendnorman
Copy link
Member Author

Testing out one more change to etl_full.yml before merging this in.

@bendnorman bendnorman merged commit cc0b0d8 into dev Nov 4, 2021
@bendnorman bendnorman deleted the pydantic-settings branch November 4, 2021 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement pydantic validation of existing settings file
2 participants