Defer validation of PudlTabl datastore to eia861/ferc714 ETL methods #1275

Merged
merged 1 commit into from
Oct 8, 2021

Conversation

zaneselvans

@gschivley's users were having issues with PUDL installed as a dependency: it relied on a raw Datastore being present even though they were only working with data in the SQLite database. This is because of the hack we currently have in place to provide direct access to the ETL outputs for ferc714 and eia861 inside the PudlTabl output object. This PR addresses that problem:

  • Documented all of the inputs to the PudlTabl class, including the Datastore.
  • Moved the instantiation of a default Datastore into the methods within the PudlTabl class where it's actually required (etl_ferc714 and etl_eia861), so that anyone relying solely on a free-floating PUDL SQLite DB doesn't need a working Datastore or PUDL workspace until they want to access the data that actually depends on it.
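
The deferred instantiation described in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the actual PUDL code: the `Datastore` class here is a hypothetical stand-in, and the `ds` argument, the `_require_datastore` helper, and the `plants` method are assumptions made for the example. Only the `PudlTabl` and `etl_eia861` names come from the PR itself.

```python
from typing import Optional


class Datastore:
    """Hypothetical stand-in for a raw-data datastore (not PUDL's real class)."""

    instances_created = 0  # counter used only to make the laziness visible

    def __init__(self) -> None:
        Datastore.instances_created += 1


class PudlTabl:
    """Simplified stand-in for the output object discussed in this PR."""

    def __init__(self, pudl_engine: str, ds: Optional[Datastore] = None) -> None:
        self.pudl_engine = pudl_engine
        # Assumption: the datastore is optional and may stay None until needed.
        self.ds = ds

    def _require_datastore(self) -> Datastore:
        """Create the default datastore on first use (hypothetical helper)."""
        if self.ds is None:
            self.ds = Datastore()
        return self.ds

    def etl_eia861(self) -> str:
        # Only the ETL-backed methods touch the datastore.
        ds = self._require_datastore()
        return f"eia861 extracted using {type(ds).__name__}"

    def plants(self) -> str:
        # Hypothetical SQLite-only output: never needs a datastore.
        return f"plants read from {self.pudl_engine}"
```

With this shape, constructing the object and calling a SQLite-only method never creates a datastore; one is created the first time an ETL-backed method such as `etl_eia861` runs, and it is reused after that.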

@zaneselvans zaneselvans added output Exporting data from PUDL into other platforms or interchange formats. datastore Managing the acquisition and organization of external raw data. ferc714 Anything having to do with FERC Form 714 eia861 Anything having to do with EIA Form 861 labels Oct 8, 2021
codecov bot commented Oct 8, 2021

Codecov Report

Merging #1275 (bae3905) into dev (11d95c4) will decrease coverage by 0.28%.
The diff coverage is 36.36%.

@@            Coverage Diff             @@
##              dev    #1275      +/-   ##
==========================================
- Coverage   81.25%   80.96%   -0.28%     
==========================================
  Files          54       54              
  Lines        6393     6409      +16     
==========================================
- Hits         5194     5189       -5     
- Misses       1199     1220      +21     
Impacted Files                              Coverage Δ
src/pudl/output/pudltabl.py                 74.81% <36.36%> (-2.74%) ⬇️
src/pudl/analysis/timeseries_cleaning.py    85.78% <0.00%> (-3.06%) ⬇️
src/pudl/workspace/datastore.py             70.45% <0.00%> (+1.62%) ⬆️

@bendnorman bendnorman left a comment

Looks good to me! Just so I understand, this datastore logic in etl_eia861 and etl_ferc714 will be removed once they are included in the main ETL?

@zaneselvans

Yes, that's right. This is just a hack because we weren't able to work with the EIA 861 under the current harvesting and entity resolution problem, but we got them to the point of being extracted and transformed and wanted the data to be available in some form.

Ultimately, I think we want to be able to fully separate data production from data use, potentially even splitting the ETL package from a slimmed-down, user-facing analytical package. Or a user might just have the SQLite and Parquet files and be working with some other toolset entirely.
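
As a rough illustration of that last case, a user with only a SQLite file could read it with nothing but the Python standard library. The sketch below uses an in-memory database and a made-up `demo_plants` table so that it runs on its own; the table, its columns, and the `list_tables` helper are assumptions for the example, not PUDL's schema or API.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the connected SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# In real use this would be sqlite3.connect("path/to/file.sqlite");
# an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_plants (plant_id INTEGER PRIMARY KEY, plant_name TEXT)"
)
conn.execute("INSERT INTO demo_plants VALUES (1, 'Example Plant')")

tables = list_tables(conn)
rows = conn.execute("SELECT plant_id, plant_name FROM demo_plants").fetchall()
```

Nothing here depends on the ETL package or a datastore, which is the kind of separation between data production and data use described above.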

@zaneselvans zaneselvans merged commit be1dbe3 into dev Oct 8, 2021
@zaneselvans zaneselvans deleted the pudltabl-datastore branch October 8, 2021 19:28