-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cems partition tweaks #222
Conversation
# Create zipfile to store year/quarter combinations of files | ||
filename = f"epacems-{year}-{quarter}.csv" | ||
# filename = f"epacems-{year}q{quarter}.csv" | ||
archive_path = self.download_directory / f"epacems-{year}-{quarter}.zip" | ||
# archive_path = self.download_directory / f"epacems-{year}q{quarter}.zip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left these commented out file name and path updates because in an ideal world we would convert all of the file names to be more consistent with the partition. this is not a required change because the datastore knows how to grab a file based on the partition regardless of if the partition and the filename match because of the info stored in the datapackage.json. Nonetheless this would still be a good change to make if we could easily fully update the file names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these comments as-is would be kind of confusing in the future, since they don't have context as to why they're commented out.
When we make the next quarterly updates to the archive, we'll need to re-download and re-upload all the files anyways, right? Could we rename everything then? If we can, then I think we should probably go ahead and change the code now. Otherwise, we should probably leave some more context for these comments - either some words in a # TODO:
or a full GH issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this is a very good point! I guess I am partial to adding a comment in there with a # TODO:
and creating an issue once we talk about how to make re-making this archive a less flaky thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits to pick about your comments but otherwise good! And my condolences for having to deal with long CEMS downloads / Zenodo uploads.
# Create zipfile to store year/quarter combinations of files | ||
filename = f"epacems-{year}-{quarter}.csv" | ||
# filename = f"epacems-{year}q{quarter}.csv" | ||
archive_path = self.download_directory / f"epacems-{year}-{quarter}.zip" | ||
# archive_path = self.download_directory / f"epacems-{year}q{quarter}.zip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these comments as-is would be kind of confusing in the future, since they don't have context as to why they're commented out.
When we make the next quarterly updates to the archive, we'll need to re-download and re-upload all the files anyways, right? Could we rename everything then? If we can, then I think we should probably go ahead and change the code now. Otherwise, we should probably leave some more context for these comments - either some words in a # TODO:
or a full GH issue.
Co-authored-by: Dazhong Xia <dazhong.xia@catalyst.coop>
…udl-archiver into cems_year_quarter
this is attached to catalyst-cooperative/pudl#3139
I triiiiiedd to make a new archive version which both updated the partition from
year
&quarter
->year_quarter
as well as update the file names to be more in line with the partition update.But renaming the file names means a much more difficult archive update because it breaks the connection between the old files and the new ones. So I reverted that and just updated the datapackage.json with @jdangerx 's postman help. I left the desired filenames as commented out lines in the pr.
I also added small i think updates to the readme to add some additional pointers for cats who haven't interacted w/ the archiver in some time.