Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create github action to sync zenodo caches #1935

Merged
merged 10 commits into from
Sep 22, 2022
Merged

Conversation

bendnorman
Copy link
Member

@bendnorman bendnorman commented Sep 16, 2022

This PR:

  • Sends the pudl-deployments slack channel the log file
  • Loads outputs to nightly-build-outputs.catalyst.coop bucket instead of the pudl-etl-logs bucket
  • Creates a github action that updates and syncs the GCS Zenodo caches and can be run manually on feature branches
  • Deploys new data to datasette if the build was successful

@codecov
Copy link

codecov bot commented Sep 16, 2022

Codecov Report

Base: 82.8% // Head: 82.8% // No change to project coverage 👍

Coverage data is based on head (0e1cf1c) compared to base (2ddd197).
Patch has no changes to coverable lines.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #1935   +/-   ##
=====================================
  Coverage   82.8%   82.8%           
=====================================
  Files         65      65           
  Lines       7436    7436           
=====================================
  Hits        6158    6158           
  Misses      1278    1278           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@zaneselvans
Copy link
Member

Hmm, I guess the service account needs another permission to access the requester-pays bucket.

@bendnorman
Copy link
Member Author

I'm a dingus, I forgot to add the gcloud set up step.

Comment on lines 6 to 8
push:
branches:
- zenodo-cache-action # For testing purposes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make more sense for this action to run on a push to dev instead of on a schedule? @zaneselvans

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's all either uploading data to GCS, or rsync-ing data between different GCS buckets, so there shouldn't be any cost associated with it running right? That does seem like it would be better for usability. Like as soon as you've got stuff working with the new archive locally, you make a PR, and as soon as that PR with the new archive DOIs in it shows up on dev it's available in the cache for CI, the nightly builds, etc.

If it's set run on push to dev, does it also get run when someone makes (or finally merges) a PR into dev? Would it make sense to have it run on pull_request and require the sync to finish before the tests run, so that when you're integrating a new DOI / archive all the CI that happens before the new DOI makes it to dev can still rely on the zenodo cache?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no cost associated with running the action.

IIRC, if it's set to run on push to dev, it will run when someone merges a PR into dev. I think it makes sense for this to run before the ci tests run. What is the best way to do this? Use workflow_run?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the actions are in separate workflows then yeah I think workflow_run is the tool for it.

What I was trying to get at is whether there's a good reason to defer the synchronization until the changes hit dev. If it doesn't take very long and doesn't cost us anything, then getting it synchronized earlier rather than later seems like it would be nice, since it'll avoid flakiness during the development process prior to the changes getting merged into dev. And it would only be syncing new archives that have actually been published on Zenodo, so we should have already eyeballed them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, so are you saying this action should run on any pull request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only reason we'd want to defer the sync until the changes hit dev is if a PR adds a DOI, but it doesn't get merged into dev then we'll have some unused archives in the GCS caches. We can always go in and delete them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experience has been that's pretty rare, so I think not deferring sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm workflow_run doesn't seem to be working :/ I also wonder if running zenodo-cache-sync on push and pull_request is be problematic because we'll have two runners writing and syncing to the GCS buckets.

@bendnorman
Copy link
Member Author

I also made some small changes to tackle #1651.

@bendnorman
Copy link
Member Author

This PR:

  • Sends the pudl-deployments slack channel the log file
  • Loads outputs to nightly-build-outputs.catalyst.coop bucket instead of the pudl-etl-logs bucket
  • Creates a github action that updates and syncs the GCS Zenodo caches and can be run manually on feature branches
  • Deploys new data to datasette if the build was successful

Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤞🏼

@bendnorman bendnorman merged commit 32caab0 into dev Sep 22, 2022
@bendnorman bendnorman deleted the zenodo-cache-action branch September 22, 2022 05:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants