WIP Transition CEMS partitions to year_quarter from year and quarter #3139
src/pudl/extract/epacems.py
Outdated
    @property
    def year(self):
        """Returns the year associated with this year_quarter partition."""
        return pd.to_datetime(self.year_quarter).year
- Using a property to extract the `year` but `self.year_quarter[-1]` to extract the `quarter` feels confusing. Can we do the same thing for both of them?
- It's not clear here what the expected structure of the "year quarter" is. Is it `2022q1`? Or `2022_1` or `2022-1`?
- Given that there's now only one partition dimension, is there a reason for this class to be a `NamedTuple`? It could use an annotated string type that explicitly indicates the expected format. Maybe something like:
from pathlib import Path
from typing import Annotated, Self
from pydantic import BaseModel, StringConstraints

class EpaCemsPartition(BaseModel):
    year_quarter: Annotated[str, StringConstraints(strict=True, pattern=r"^20\d{2}[qQ][1-4]$")]

    @property
    def year(self: Self) -> str:
        return self.year_quarter[0:4]

    @property
    def quarter(self: Self) -> str:
        return self.year_quarter[-1]

    def get_key(self: Self) -> str:
        return self.year_quarter

    def get_filters(self: Self) -> dict[str, str]:
        return {"year_quarter": self.year_quarter}

    def get_quarterly_file(self: Self) -> Path:
        """Return the name of the CSV file that holds one quarter of hourly data."""
        return Path(f"epacems-{self.year}-{self.quarter}.csv")
yea i thought i was going to use the year property in more contexts but it turned out not to really be necessary. and I didn't make a quarter property because ideally we will be able to make a new cems archive soon with the file name more closely matching the partition so we can just do this:
def get_quarterly_file(self: Self) -> Path:
    """Return the name of the CSV file that holds one quarter of hourly data."""
    return Path(f"epacems-{self.year_quarter}.csv")
so instead of adding more properties, I'd rather just remove the year.
also doesn't this `20\d{2}` indicate the year can only be 2000 and on?
Whoops. Yes.
year_quarter: Annotated[str, StringConstraints(strict=True, pattern=r"^(19|20)\d{2}[qQ][1-4]$")]
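To make the difference between the two patterns concrete, here is a small standard-library check. It only uses `re`; the candidate strings are illustrative and not taken from the PR's data, and the variable names are made up for this sketch.

```python
import re

# The two patterns quoted in the review comments above.
ORIGINAL = re.compile(r"^20\d{2}[qQ][1-4]$")
REVISED = re.compile(r"^(19|20)\d{2}[qQ][1-4]$")

# Illustrative candidates (assumed for this sketch, not from the PR).
candidates = ["1995q1", "2022q1", "2022Q4", "2022_1", "2022-1", "2022q5"]

# Keep only the strings each pattern accepts.
original_ok = [s for s in candidates if ORIGINAL.match(s)]
revised_ok = [s for s in candidates if REVISED.match(s)]

print(original_ok)  # ['2022q1', '2022Q4']
print(revised_ok)   # ['1995q1', '2022q1', '2022Q4']
```

Only the revised pattern accepts a pre-2000 value such as `1995q1`; both reject `2022_1`, `2022-1`, and an out-of-range quarter.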
so i am using the year property in other places.
i still think it is too much to add the quarter as an attribute. but i will do the extraction of the quarter in the same way (w/ `to_datetime`) to make it more consistent.
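For comparison, a minimal sketch of extracting the year and the quarter through one shared mechanism. This is not the approach taken in the PR (which uses `pd.to_datetime`) nor the pydantic model suggested above: it swaps in a plain `dataclass` and a named-group regex so that it runs with the standard library alone. The class name `YearQuarter` and the `2022q1`-style format are assumptions.

```python
import re
from dataclasses import dataclass

# Named groups let both fields come out of the same match.
_YEAR_QUARTER = re.compile(r"^(?P<year>(?:19|20)\d{2})[qQ](?P<quarter>[1-4])$")


@dataclass(frozen=True)
class YearQuarter:  # hypothetical name, not the class in the PR
    year_quarter: str

    def __post_init__(self) -> None:
        # Reject anything that is not shaped like "2022q1".
        if _YEAR_QUARTER.match(self.year_quarter) is None:
            raise ValueError(
                f"Expected a value like '2022q1', got {self.year_quarter!r}"
            )

    @property
    def year(self) -> int:
        return int(_YEAR_QUARTER.match(self.year_quarter)["year"])

    @property
    def quarter(self) -> int:
        return int(_YEAR_QUARTER.match(self.year_quarter)["quarter"])


part = YearQuarter("2022q1")
print(part.year, part.quarter)  # 2022 1
```

Because validation happens at construction, the two properties can assume the match succeeds.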
…pudl into cems-year_quarters
Overview
Closes #2973.
What problem does this address?
Really deals with the specific desire to change the partitions.
What did you change?
`year` & `quarter` to `year_quarter`.
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
- `make pytest-integration-full` passes locally