Create github action to sync zenodo caches #1935
Conversation
Codecov Report
Base: 82.8% // Head: 82.8% // No change to project coverage 👍

@@ Coverage Diff @@
##        dev   #1935   +/- ##
=====================================
Coverage    82.8%   82.8%
=====================================
Files          65      65
Lines        7436    7436
=====================================
Hits         6158    6158
Misses      1278    1278

View full report at Codecov.
Hmm, I guess the service account needs another permission to access the requester-pays bucket.
I'm a dingus, I forgot to add the gcloud setup step.
push:
  branches:
    - zenodo-cache-action # For testing purposes
Does it make more sense for this action to run on a push to dev instead of on a schedule? @zaneselvans
Well, it's all either uploading data to GCS or rsync-ing data between different GCS buckets, so there shouldn't be any cost associated with running it, right? That does seem like it would be better for usability. Like, as soon as you've got stuff working with the new archive locally, you make a PR, and as soon as that PR with the new archive DOIs in it shows up on `dev`, it's available in the cache for CI, the nightly builds, etc.

If it's set to run on push to `dev`, does it also get run when someone makes (or finally merges) a PR into `dev`? Would it make sense to have it run on `pull_request` and require the sync to finish before the tests run, so that when you're integrating a new DOI / archive, all the CI that happens before the new DOI makes it to `dev` can still rely on the Zenodo cache?
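For reference, a trigger along the lines being discussed would run the sync on pushes to `dev` rather than on a cron schedule (a sketch only; the branch filter in the PR as written still points at the testing branch):

```yaml
# Hypothetical trigger: sync the cache whenever dev is updated,
# instead of (or in addition to) a schedule.
on:
  push:
    branches:
      - dev
```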
There is no cost associated with running the action.
IIRC, if it's set to run on push to `dev`, it will run when someone merges a PR into `dev`. I think it makes sense for this to run before the CI tests. What's the best way to do that? Use `workflow_run`?
If the actions are in separate workflows then yeah, I think `workflow_run` is the tool for it.

What I was trying to get at is whether there's a good reason to defer the synchronization until the changes hit `dev`. If it doesn't take very long and doesn't cost us anything, then getting it synchronized earlier rather than later seems like it would be nice, since it'll avoid flakiness during the development process prior to the changes getting merged into `dev`. And it would only be syncing new archives that have actually been published on Zenodo, so we should have already eyeballed them.
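If the sync and the tests live in separate workflow files, chaining them could look roughly like this (a sketch; the workflow name `zenodo-cache-sync` is assumed from this discussion). One caveat worth noting: `workflow_run` only fires for workflows on the repository's default branch, which can make it appear broken while testing on a feature branch:

```yaml
# Hypothetical: start CI only after the cache-sync workflow finishes.
on:
  workflow_run:
    workflows: ["zenodo-cache-sync"]
    types:
      - completed
```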
Oh ok, so are you saying this action should run on any pull request?
I think the only reason we'd want to defer the sync until the changes hit `dev` is that if a PR adds a DOI but doesn't get merged into `dev`, we'll end up with some unused archives in the GCS caches. We can always go in and delete them.
My experience has been that's pretty rare, so I think not deferring sounds good.
Hmm, `workflow_run` doesn't seem to be working :/ I also wonder if running `zenodo-cache-sync` on both push and pull_request would be problematic, because we'd have two runners writing and syncing to the GCS buckets.
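One way to keep two runners from writing to the buckets at the same time is a `concurrency` group, which queues runs of the workflow behind one another (a sketch; the group name here is arbitrary):

```yaml
# Hypothetical: serialize runs so only one runner syncs the
# GCS buckets at a time; queued runs wait rather than cancel.
concurrency:
  group: zenodo-cache-sync
  cancel-in-progress: false
```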
I also made some small changes to tackle #1651.
🤞🏼
Force-pushed from 361b4dd to e9ca193.
This PR:
- posts to the `pudl-deployments` slack channel
- saves the log file to the `nightly-build-outputs.catalyst.coop` bucket instead of the `pudl-etl-logs` bucket