Filter FERC714 ETL by year #2649

Merged
merged 3 commits into from Jun 10, 2023

@e-belfer e-belfer commented Jun 9, 2023

This PR partially addresses issue #2628 and the blocking issues found in #2550. FERC 714 data is not currently subset by time: the extractor reads in CSV files that are organized on a per-table basis. This change adds the ability to filter by year during extraction, using the record_yr column, which is present in all tables except the respondent_id_ferc714 table. It also updates the fast ETL to include two years of FERC 714 data: 2019 and 2020.

While this is unlikely to improve performance much in the current ETL, it should hopefully help with some of the memory constraints encountered in #2550.
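
The year filter described above can be sketched roughly as follows. This is a minimal, standard-library illustration, not the code in this PR: the function name `read_table_rows`, the toy CSV contents, and the `demand_mwh` and `respondent_name` columns are all hypothetical, and the real extractor may well do this differently. Only the `record_yr` column name and the fact that the respondent ID table lacks it come from the description.

```python
import csv
import io


def read_table_rows(csv_text, years, year_col="record_yr"):
    """Read one per-table CSV and keep only rows from the requested years.

    Tables that lack the year column are returned unfiltered.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    # Some tables (e.g. a respondent ID table) have no year column to filter on.
    if year_col not in (reader.fieldnames or []):
        return rows
    wanted = {str(y) for y in years}  # csv yields strings, so compare as strings
    return [row for row in rows if row[year_col] in wanted]


# Toy data standing in for two per-table CSV files (hypothetical contents).
demand_csv = "respondent_id,record_yr,demand_mwh\n1,2018,100\n1,2019,110\n2,2020,95\n"
respondent_csv = "respondent_id,respondent_name\n1,Utility A\n2,Utility B\n"

demand_rows = read_table_rows(demand_csv, years=[2019, 2020])
respondent_rows = read_table_rows(respondent_csv, years=[2019, 2020])

print([row["record_yr"] for row in demand_rows])
print(len(respondent_rows))
```

Passing the year column check through as a no-op keeps a single code path for every table, which is one plausible way to handle the table without record_yr.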

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures.
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@e-belfer e-belfer changed the base branch from main to dev June 9, 2023 01:30
codecov bot commented Jun 9, 2023

Codecov Report

Patch coverage: 100.0% and project coverage change: -0.1 ⚠️

Comparison is base (5183494) 87.1% compared to head (59cf4c3) 87.1%.

❗ Current head 59cf4c3 differs from pull request most recent head d6f2963. Consider uploading reports for the commit d6f2963 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             dev   #2649     +/-   ##
=======================================
- Coverage   87.1%   87.1%   -0.1%     
=======================================
  Files         86      86             
  Lines      10001   10004      +3     
=======================================
+ Hits        8716    8717      +1     
- Misses      1285    1287      +2     
Impacted Files Coverage Δ
src/pudl/extract/ferc714.py 100.0% <100.0%> (ø)

... and 1 file with indirect coverage changes

@e-belfer e-belfer requested a review from zschira June 9, 2023 12:11
@e-belfer e-belfer linked an issue Jun 9, 2023 that may be closed by this pull request
2 tasks
@e-belfer e-belfer marked this pull request as ready for review June 9, 2023 12:23
@e-belfer e-belfer self-assigned this Jun 9, 2023
@e-belfer e-belfer added this to the 2023 Spring milestone Jun 9, 2023
@zschira zschira left a comment

Looks good!

ds = context.resources.datastore
ferc714_settings = context.resources.dataset_settings.ferc714
years = ", ".join(map(str, ferc714_settings.years))

Very satisfying line to me
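
For readers unfamiliar with the idiom in the line under review, here is a small sketch of what it does. The helper name `describe_years` and the message wording are hypothetical; how the PR actually uses the resulting string is not shown in this conversation.

```python
def describe_years(years):
    """Render a list of integer years as a comma-separated string."""
    # str.join only accepts strings, so each int is converted with str first.
    return ", ".join(map(str, years))


# Hypothetical use: building a human-readable message about the selected years.
message = f"Extracting FERC 714 data for years: {describe_years([2019, 2020])}"
print(message)
```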

@e-belfer e-belfer merged commit 9a7d9a0 into dev Jun 10, 2023
4 of 6 checks passed
@e-belfer e-belfer deleted the 714-year-filter branch June 10, 2023 19:55

Successfully merging this pull request may close these issues.

Enable filtering by year for EIA 861 and FERC 714 ETLs
2 participants