cems partition tweaks #222

cmgosnell · 2023-12-08T20:55:36Z

This is attached to catalyst-cooperative/pudl#3139.

I tried to make a new archive version that both updated the partition from year & quarter -> year_quarter and renamed the files to be more in line with the partition update.

But renaming the files makes for a much more difficult archive update because it breaks the connection between the old files and the new ones. So I reverted that and just updated the datapackage.json with @jdangerx's postman help. I left the desired filenames as commented-out lines in the PR.

I also made some small updates to the README to add pointers for cats who haven't interacted with the archiver in some time.
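
For readers following along, the partition change described above (year & quarter -> year_quarter) can be sketched roughly as below. This is a minimal illustration, not the code used in this PR: the `resources` and `parts` keys, the `merge_year_quarter` helper, and the `1995q1` value format are assumptions, with the format inferred from the commented-out filenames in the diff.

```python
import copy


def merge_year_quarter(datapackage: dict) -> dict:
    """Return a copy in which each resource's year/quarter parts become year_quarter.

    Hypothetical helper: the "resources" and "parts" keys are assumed here,
    not taken from the real datapackage.json.
    """
    updated = copy.deepcopy(datapackage)
    for resource in updated.get("resources", []):
        parts = resource.get("parts", {})
        if "year" in parts and "quarter" in parts:
            year = parts.pop("year")
            quarter = parts.pop("quarter")
            # Assumed format, mirroring the f"epacems-{year}q{quarter}" filenames.
            parts["year_quarter"] = f"{year}q{quarter}"
    return updated


# Made-up example resource; note that the file name is left untouched.
example = {
    "resources": [
        {"name": "epacems-1995-1.zip", "parts": {"year": 1995, "quarter": 1}},
    ]
}
converted = merge_year_quarter(example)
```

Only the partition metadata changes in this sketch; the file names stay as they were, which is the trade-off described in the PR description.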

dataset_doi.yaml

cmgosnell · 2023-12-11T17:17:18Z

src/pudl_archiver/archivers/epacems.py

        # Create zipfile to store year/quarter combinations of files
        filename = f"epacems-{year}-{quarter}.csv"
+        # filename = f"epacems-{year}q{quarter}.csv"
        archive_path = self.download_directory / f"epacems-{year}-{quarter}.zip"
+        # archive_path = self.download_directory / f"epacems-{year}q{quarter}.zip"


I left these commented-out file name and path updates because, in an ideal world, we would convert all of the file names to be more consistent with the partition. This is not a required change: the datastore knows how to grab a file based on its partition, regardless of whether the partition and the filename match, because of the info stored in the datapackage.json. Nonetheless, it would still be a good change to make if we could easily update all of the file names.
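
To illustrate why the filename does not need to match the partition, here is a rough sketch of a partition-based lookup. It is not the PUDL datastore's actual implementation: `find_resource`, the `resources`/`parts` layout, and the sample values are all assumptions made for the example.

```python
def find_resource(datapackage: dict, **partition) -> dict:
    """Return the single resource whose parts match every requested partition key.

    Hypothetical helper: it looks only at the "parts" metadata and never
    parses the file name.
    """
    matches = [
        resource
        for resource in datapackage["resources"]
        if all(
            resource.get("parts", {}).get(key) == value
            for key, value in partition.items()
        )
    ]
    if len(matches) != 1:
        raise KeyError(f"Expected exactly one match for {partition}, got {len(matches)}")
    return matches[0]


# Made-up datapackage: old-style file names paired with the new-style partition.
datapackage = {
    "resources": [
        {"name": "epacems-1995-1.zip", "parts": {"year_quarter": "1995q1"}},
        {"name": "epacems-1995-2.zip", "parts": {"year_quarter": "1995q2"}},
    ]
}
resource = find_resource(datapackage, year_quarter="1995q2")
```

Because the match is made on the metadata alone, renaming the files would be a consistency improvement rather than a functional requirement.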

I think these comments as-is would be kind of confusing in the future, since they don't have context as to why they're commented out.

When we make the next quarterly updates to the archive, we'll need to re-download and re-upload all the files anyways, right? Could we rename everything then? If we can, then I think we should probably go ahead and change the code now. Otherwise, we should probably leave some more context for these comments - either some words in a # TODO: or a full GH issue.

Yeah, this is a very good point! I guess I am partial to adding a comment in there with a # TODO: and creating an issue once we talk about how to make re-making this archive less flaky.

src/pudl_archiver/archivers/epacems.py

jdangerx

Nits to pick about your comments but otherwise good! And my condolences for having to deal with long CEMS downloads / Zenodo uploads.

README.md

jdangerx · 2023-12-12T15:49:25Z

cmgosnell and others added 3 commits December 8, 2023 15:54

cems partition tweaks

8762ab4

Update Zenodo API collections

c43986f

Merge branch 'update-zenodo-collections' into cems_year_quarter

30e4e1d

cmgosnell commented Dec 11, 2023

dataset_doi.yaml

cmgosnell commented Dec 11, 2023

src/pudl_archiver/archivers/epacems.py

cmgosnell requested a review from jdangerx December 12, 2023 15:15

cmgosnell marked this pull request as ready for review December 12, 2023 15:18

Merge branch 'main' into cems_year_quarter

172aa51

jdangerx requested changes Dec 12, 2023

cmgosnell and others added 4 commits December 12, 2023 09:00

Update README.md

299276d

Co-authored-by: Dazhong Xia <dazhong.xia@catalyst.coop>

add TODO for updating the CEMS file names

f02ac1b

Merge branch 'main' into cems_year_quarter

16240df

Merge branch 'cems_year_quarter' of github.com:catalyst-cooperative/p…

198d0f6

…udl-archiver into cems_year_quarter

cmgosnell requested a review from jdangerx December 12, 2023 19:05

cmgosnell enabled auto-merge December 12, 2023 19:58

jdangerx approved these changes Dec 13, 2023

cmgosnell merged commit 6916bbf into main Dec 13, 2023
3 checks passed

cmgosnell deleted the cems_year_quarter branch December 13, 2023 21:17

e-belfer mentioned this pull request Dec 18, 2023

Validate cems archive #232

Merged
