Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cems partition tweaks #222

Merged
merged 8 commits into from
Dec 13, 2023
Merged

cems partition tweaks #222

merged 8 commits into from
Dec 13, 2023

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Dec 8, 2023

this is attached to catalyst-cooperative/pudl#3139

I triiiiiedd to make a new archive version which both updated the partition from year & quarter -> year_quarter as well as update the file names to be more in line with the partition update.

But renaming the file names means a much more difficult archive update because it breaks the connection between the old files and the new ones. So I reverted that and just updated the datapackage.json with @jdangerx 's postman help. I left the desired filenames as commented out lines in the pr.

I also added small i think updates to the readme to add some additional pointers for cats who haven't interacted w/ the archiver in some time.

Comment on lines 74 to 78
# Create zipfile to store year/quarter combinations of files
filename = f"epacems-{year}-{quarter}.csv"
# filename = f"epacems-{year}q{quarter}.csv"
archive_path = self.download_directory / f"epacems-{year}-{quarter}.zip"
# archive_path = self.download_directory / f"epacems-{year}q{quarter}.zip"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left these commented out file name and path updates because in an ideal world we would convert all of the file names to be more consistent with the partition. this is not a required change because the datastore knows how to grab a file based on the partition regardless of if the partition and the filename match because of the info stored in the datapackage.json. Nonetheless this would still be a good change to make if we could easily fully update the file names.

Copy link
Member

@jdangerx jdangerx Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these comments as-is would be kind of confusing in the future, since they don't have context as to why they're commented out.

When we make the next quarterly updates to the archive, we'll need to re-download and re-upload all the files anyways, right? Could we rename everything then? If we can, then I think we should probably go ahead and change the code now. Otherwise, we should probably leave some more context for these comments - either some words in a # TODO: or a full GH issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is a very good point! I guess I am partial to adding a comment in there with a # TODO: and creating an issue once we talk about how to make re-making this archive a less flaky thing to do.

@cmgosnell cmgosnell marked this pull request as ready for review December 12, 2023 15:18
Copy link
Member

@jdangerx jdangerx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits to pick about your comments but otherwise good! And my condolences for having to deal with long CEMS downloads / Zenodo uploads.

README.md Outdated Show resolved Hide resolved
Comment on lines 74 to 78
# Create zipfile to store year/quarter combinations of files
filename = f"epacems-{year}-{quarter}.csv"
# filename = f"epacems-{year}q{quarter}.csv"
archive_path = self.download_directory / f"epacems-{year}-{quarter}.zip"
# archive_path = self.download_directory / f"epacems-{year}q{quarter}.zip"
Copy link
Member

@jdangerx jdangerx Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these comments as-is would be kind of confusing in the future, since they don't have context as to why they're commented out.

When we make the next quarterly updates to the archive, we'll need to re-download and re-upload all the files anyways, right? Could we rename everything then? If we can, then I think we should probably go ahead and change the code now. Otherwise, we should probably leave some more context for these comments - either some words in a # TODO: or a full GH issue.

@cmgosnell cmgosnell merged commit 6916bbf into main Dec 13, 2023
3 checks passed
@cmgosnell cmgosnell deleted the cems_year_quarter branch December 13, 2023 21:17
@e-belfer e-belfer mentioned this pull request Dec 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

2 participants