Use pydantic for ETL settings validation #1292

bendnorman · 2021-10-14T17:58:52Z

So far I have created pydantic models for everything but ferc1_to_sqlite and CEMS.

Currently, I am specifying all individual dataset settings as attributes in DatasetsSettings. This works fine for now but in the future, we could dynamically add dataset settings by reading them in from a separate file. I think we would still have to manually create validators to establish dependencies between datasets like eia 860 and 923.
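To make the idea of a hand-written cross-dataset validator concrete, here is a minimal sketch. It is not the code from this PR: the class names, the default years, and the specific rule (requesting one EIA dataset pulls in the other with the same years) are assumptions for illustration. It uses the pydantic 1.x-style `root_validator`, which is deprecated but still available in pydantic 2.x.

```python
from typing import List, Optional

from pydantic import BaseModel, root_validator


class Eia860Settings(BaseModel):
    """Hypothetical stand-in for the EIA 860 dataset settings."""

    years: List[int] = [2019, 2020]


class Eia923Settings(BaseModel):
    """Hypothetical stand-in for the EIA 923 dataset settings."""

    years: List[int] = [2019, 2020]


class EiaSettings(BaseModel):
    """Groups both EIA datasets so that a single validator can see them together."""

    eia860: Optional[Eia860Settings] = None
    eia923: Optional[Eia923Settings] = None

    @root_validator(skip_on_failure=True)
    def check_eia_dependencies(cls, values):
        """Assumed rule: if only one EIA dataset is requested, add the other."""
        eia860, eia923 = values.get("eia860"), values.get("eia923")
        if eia923 is not None and eia860 is None:
            values["eia860"] = Eia860Settings(years=eia923.years)
        if eia860 is not None and eia923 is None:
            values["eia923"] = Eia923Settings(years=eia860.years)
        return values


# Only EIA 923 is requested; the validator fills in EIA 860 for the same years.
settings = EiaSettings(eia923={"years": [2020]})
```

Even if the per-dataset models were generated from a separate file, a rule like this would probably still need to be written by hand, since it encodes knowledge about how two datasets relate.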

We could also move these validators into the dataset pipelines on the prefect branch, so that the validators and the actual pipeline are coupled. A super out-there idea is to have Prefect dependencies be inferred from the validators, or vice versa.

test/unit/settings_test.py

zaneselvans

Why change the stock settings filename? Shouldn't we just switch both the full and the fast over to using the new names? Or are we not ready to switch yet? I guess this is still a draft.

src/pudl/settings.py

test/unit/settings_test.py

src/pudl/package_data/settings/etl_full_pydantic.yml

codecov · 2021-11-02T19:19:09Z

Codecov Report

Merging #1292 (4bb5d52) into dev (9b8a672) will increase coverage by 0.63%.
The diff coverage is 95.88%.

@@            Coverage Diff             @@
##              dev    #1292      +/-   ##
==========================================
+ Coverage   82.64%   83.28%   +0.63%     
==========================================
  Files          55       56       +1     
  Lines        6626     6583      -43     
==========================================
+ Hits         5476     5482       +6     
+ Misses       1150     1101      -49

Impacted Files	Coverage Δ
src/pudl/cli.py	`67.50% <50.00%> (+5.28%)`	⬆️
src/pudl/convert/ferc1_to_sqlite.py	`62.16% <50.00%> (+12.16%)`	⬆️
src/pudl/settings.py	`96.27% <96.27%> (ø)`
src/pudl/constants.py	`100.00% <100.00%> (ø)`
src/pudl/etl.py	`92.52% <100.00%> (+12.30%)`	⬆️
src/pudl/transform/eia.py	`95.45% <100.00%> (ø)`
src/pudl/workspace/datastore.py	`68.83% <0.00%> (-1.62%)`	⬇️
src/pudl/transform/ferc1.py	`91.61% <0.00%> (-0.36%)`	⬇️
src/pudl/analysis/timeseries_cleaning.py	`88.40% <0.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9b8a672...4bb5d52.

bendnorman · 2021-11-02T21:05:01Z

This is ready for another review.

Changes

GenericDatasetSettings now requires working tables and working partitions class variables. There can be an arbitrary number of partitions. create_dataset_settings(dataset_name) will dynamically create a field for each partition.
pc.WORKING_PARTITIONS and pc.PUDL_TABLES remain the single source of truth for partition and table metadata.
I updated the development settings files docs.
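The changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in src/pudl/settings.py: the two dictionaries are hypothetical stand-ins for pc.WORKING_PARTITIONS and pc.PUDL_TABLES, the validation rule and the `tables` field are guesses at the intended behavior, and the validator uses the pydantic 1.x-style `root_validator` (deprecated but still available in 2.x).

```python
from typing import Any, ClassVar, Dict, List

from pydantic import BaseModel, create_model, root_validator

# Hypothetical stand-ins for pc.WORKING_PARTITIONS and pc.PUDL_TABLES.
WORKING_PARTITIONS: Dict[str, Dict[str, List[Any]]] = {
    "eia860": {"years": [2019, 2020]},
    "epacems": {"years": [2019, 2020], "states": ["CO", "ID"]},
}
PUDL_TABLES: Dict[str, List[str]] = {
    "eia860": ["plants_eia860", "generators_eia860"],
    "epacems": ["hourly_emissions_epacems"],
}


class GenericDatasetSettings(BaseModel):
    """Base class; concrete datasets supply the two class variables."""

    working_partitions: ClassVar[Dict[str, List[Any]]] = {}
    working_tables: ClassVar[List[str]] = []

    @root_validator(skip_on_failure=True)
    def check_partitions_and_tables(cls, values):
        """Reject any requested partition value or table that is not working."""
        allowed = dict(cls.working_partitions, tables=cls.working_tables)
        for name, working in allowed.items():
            unknown = set(values.get(name) or []) - set(working)
            if unknown:
                raise ValueError(f"{sorted(unknown)} not in working {name}")
        return values


def create_dataset_settings(dataset_name: str):
    """Build a settings model with one field per partition of the dataset."""
    partitions = WORKING_PARTITIONS[dataset_name]
    tables = PUDL_TABLES[dataset_name]
    # Each partition becomes a list field defaulting to all working values.
    fields = {name: (List[Any], vals) for name, vals in partitions.items()}
    fields["tables"] = (List[str], tables)
    model = create_model(
        f"{dataset_name.title()}Settings",
        __base__=GenericDatasetSettings,
        **fields,
    )
    model.working_partitions = partitions
    model.working_tables = tables
    return model


# CEMS has two partitions (years and states), EIA 860 only one.
CemsSettings = create_dataset_settings("epacems")
cems = CemsSettings(years=[2020])
```

Because the fields are derived from the constants, adding a partition to a dataset would only require a change in one place, which is the property the comment above is after.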

Pydantic has made our settings validation cleaner and more reusable for future datasets. That being said, we can continue to improve on dynamically generating settings validation for new datasets. I wrote up a couple of issues (#1316, #1264) outlining my ideas.

I think dataset settings validation will continue to evolve as we integrate this branch with prefect and metadata updates.

zaneselvans

Mostly I have questions. Let me know if you'd like to talk in a call. But overall this looks great and way way better than the janky spaghetti we had before.

docs/dev/settings_files.rst

src/pudl/constants.py

src/pudl/etl.py

src/pudl/package_data/settings/etl_fast.yml

test/conftest.py

src/pudl/settings.py

bendnorman · 2021-11-04T18:59:37Z

Testing out one more change to etl_full.yml before merging this in.

First pass at pydantic etl settings. Pass all unit tests. No CEMS yet.

ca9070f

bendnorman requested a review from zaneselvans October 14, 2021 17:58

bendnorman self-assigned this Oct 14, 2021

bendnorman marked this pull request as draft October 14, 2021 17:59

Added cems and eia settings. Changed etl.py to work with new settings. Removed old validation functions. Created global pydantic settings config.

61ecf33

bendnorman linked an issue Oct 14, 2021 that may be closed by this pull request

Implement pydantic validation of existing settings file #1288

Closed

13 tasks

bendnorman and others added 2 commits October 15, 2021 13:19

Experiment with generic dataset model. Created ferc1tosqlite settings.

bcfdff1

Merge branch 'dev' into pydantic-settings

f964ce6

zaneselvans requested changes Oct 16, 2021

bendnorman added 5 commits October 18, 2021 09:00

Sort working years.

534b279

GenericDatasetSettings now contains more general working_partitions class variable. Replace eia860_ytd to eia860m.

5eab849

Merge branch 'pydantic-settings' of github.com:catalyst-cooperative/pudl into pydantic-settings

fbbe073

Added create_dataset_settings helper function. Using tables and partitions from constants.

a3e19f1

Updated integration tests to use new settings class. Updated the settings files.

273c521

bendnorman added 2 commits November 2, 2021 12:26

Updated settings documentation

5606b83

Merge dev into pydantic-settings.

1fa4956

bendnorman mentioned this pull request Nov 2, 2021

Dynamically create dataset setting validation. #1316

Open

bendnorman mentioned this pull request Nov 2, 2021

EPA CEMS can't process states (AK and PR) that only show up in some years #1264

Closed

bendnorman marked this pull request as ready for review November 2, 2021 21:18

Fixed ferc1_solo_test yaml.

aa6836c

zaneselvans reviewed Nov 3, 2021

Removed refyear from settings file. Added from_yaml() method to EtlSettings. replaced all etl_params with etl_settings.

f7806b8

zaneselvans approved these changes Nov 4, 2021

Updated etl_full.yml to new structure.

4bb5d52

bendnorman merged commit cc0b0d8 into dev Nov 4, 2021

bendnorman deleted the pydantic-settings branch November 4, 2021 21:40

bendnorman mentioned this pull request Nov 4, 2021

Implement pydantic validation of existing settings file #1288

Closed

13 tasks
