Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Transition CEMS paritions to year_quarter from year and quarter #3139

Merged
merged 12 commits into from Dec 11, 2023

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Dec 8, 2023

Overview

Closes #2973.

What problem does this address?

Really deals with the specific desire to change the partitions.

What did you change?

  • basically just the partitions from year & quarter to year_quarter.
  • THIS IS STILL A DRAFT bc i haven't been able to publish a new archive. I have a draft archive and a local archive that I was able to test everything on but this will break the builds rn because the datapackage.json doesn't actually exist on zenodo right now.

Testing

How did you make sure this worked? How can a reviewer verify this?

  • ran a full etl and validation tests

To-do list

Tasks

Edit tasklist title
Beta Give feedback Tasklist Tasks, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. Make sure full ETL runs & make pytest-integration-full passes locally
    Options
  2. For major data coverage & analysis changes, run data validation tests
    Options
  3. If updating analyses or data processing functions: make sure to update or write data validation tests
    Options
  4. Update the release notes: reference the PR and related issues.
    Options
  5. Review the PR yourself and call out any questions or issues you have
    Options

test/unit/workspace/datastore_test.py Outdated Show resolved Hide resolved
Comment on lines 109 to 112
@property
def year(self):
"""Returns the year associated with this year_quarter partion."""
return pd.to_datetime(self.year_quarter).year
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Using a property to extract the year but self.year_quarter[-1] to extract the quarter feels confusing. Can we do the same thing for both of them?
  • It's not clear here what the expected structure of the "year quarter" is. Is it 2022q1? Or 2022_1 or 2022-1?
  • Given that there's now only a partition dimension, is there a reason for this class to be a NamedTuple? It could use an annotated string type, that explicitly indicates the expected format? Maybe something like:
class EpaCemsPartition(BaseModel):
    year_quarter: Annotated[str, StringConstraints(strict=True, pattern=r"^20\d{2}[qQ][1-4]$")]

    @property
    def year(self: Self) -> str:
        return self.year_quarter[0:4]

    @property
    def quarter(self: Self) -> str:
        return self.year_quarter[-1]

    def get_key(self: Self) -> str:
        return self.year_quarter

    def get_filters(self: Self) -> str:
        return {"year_quarter": self.year_quarter}

    def get_quarterly_file(self: Self) -> Path:
        """Return the name of the CSV file that holds annual hourly data."""
        return Path(f"epacems-{self.year}-{self.quarter}.csv")

Copy link
Member Author

@cmgosnell cmgosnell Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea i thought i was going to use the year property in more contexts but it turned out not to really be necessary. and I didn't make a quarter property because ideally we will be able to make a new cems archive soon with the file name more closely matching the partition so we can just do this:

    def get_quarterly_file(self: Self) -> Path:
        """Return the name of the CSV file that holds annual hourly data."""
        return Path(f"epacems-{self.year_quarter}.csv")

so instead of adding more properties, I'd rather just remove the year.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also doesn't this20\d{2} indicate the year can only be 2000 and on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops. Yes.

    year_quarter: Annotated[str, StringConstraints(strict=True, pattern=r"^(19|20)\d{2}[qQ][1-4]$")]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i am using the year property in other places.

i still think it is too much to add the quarter as an attribute. but i will do the extraction of the quarter in the same way (w/ to_datetime) to make it more consistent.

src/pudl/etl/epacems_assets.py Outdated Show resolved Hide resolved
src/pudl/etl/epacems_assets.py Outdated Show resolved Hide resolved
@cmgosnell cmgosnell marked this pull request as ready for review December 11, 2023 21:18
@cmgosnell cmgosnell merged commit 98861eb into cems-quarterly Dec 11, 2023
16 of 18 checks passed
@cmgosnell cmgosnell deleted the cems-year_quarters branch December 11, 2023 21:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants