Rename s3 bucket #2793
Conversation
Codecov Report

Patch and project coverage have no change.

Additional details and impacted files:

@@          Coverage Diff          @@
##            dev   #2793   +/-   ##
=====================================
  Coverage   88.5%   88.5%
=====================================
  Files         90      90
  Lines      10139   10139
=====================================
  Hits        8982    8982
  Misses      1157    1157

☔ View full report in Codecov by Sentry.
I keep getting this certificate error when I run the
I decided to set up GCP Storage Transfer Service to copy the data from S3 to the GCS bucket.
Ah, running into the ogr failure being debugged in #2849.
Is there a reason not to use a valid domain name in the
The AWS open data startup script creates log buckets using the
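For context, pointing a source bucket at a log bucket with S3 server access logging looks roughly like the following. This is a hypothetical sketch using the bucket names from this PR, not the actual startup script:

# Hypothetical illustration only: the AWS open data startup script
# referenced above is what actually configures logging in this PR.
aws s3api put-bucket-logging \
  --bucket pudl.catalyst.coop \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "pudl.catalyst.coop-logs",
      "TargetPrefix": "logs/"
    }
  }'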
Looks good to me, though I can't see the Storage Transfer job in the Google Cloud console due to permissions stuff. It would be nice to set that up through Terraform, too, instead of manually, but that shouldn't block this already long-suffering PR.
Looks like the storage transfer job is exposed through the Terraform provider: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_transfer_job
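Until that's Terraformed, the manually created job can be approximated with gcloud. A hypothetical sketch (bucket names from this PR; the job name, creds file, and flags are assumptions to verify against `gcloud transfer jobs create --help`):

# Hypothetical sketch of the daily S3 -> GCS log transfer job, created
# via gcloud rather than Terraform. Names and flags are assumptions.
gcloud transfer jobs create \
  s3://pudl.catalyst.coop-logs gs://pudl-s3-logs.catalyst.coop \
  --name=pudl-s3-logs-daily \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=1d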
echo "Copying outputs to GCP intake bucket" | ||
gsutil -m -u $GCP_BILLING_PROJECT cp -r "$PUDL_OUTPUT/*" "gs://intake.catalyst.coop/$GITHUB_REF" | ||
function copy_outputs_to_distribution_bucket() { | ||
echo "Copying outputs to GCP distribution bucket" |
"distribution" means "user-facing", right?
Yes!
docs/dev/nightly_data_builds.rst
nit: There are still a few references to "PUDL Intake catalogs" in the non-code sections of this doc; should we rename them?
Ah good point. I'll remove the references since we aren't supporting the intake catalogs.
Sounds good! Yeah, I wanted to use Terraform but thought it would drag the PR on for longer :/
PR Overview

This PR:

- Renames the output buckets to s3://pudl.catalyst.coop and gs://pudl.catalyst.coop. The build script still writes outputs to s3://intake.catalyst.coop for backward compatibility for folks still using intake.catalyst.coop. We'll deprecate it at a later date.
- Renames intake.catalyst.coop to pudl.catalyst.coop in the docs.

Things that changed in the cloud:

- Created s3://pudl.catalyst.coop and gs://pudl.catalyst.coop.
- Created a logging bucket for s3://pudl.catalyst.coop called pudl.catalyst.coop-logs. The AWS Open Data Program requires we have a 1-month retention policy on the log bucket (see the lifecycle sketch after this list). I created a GCP Storage Transfer job that copies the S3 logs daily to a GCS bucket called pudl-s3-logs.catalyst.coop so we can retain logs for longer than a month.
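A 1-month retention policy can be expressed as an S3 lifecycle rule. The following is a hypothetical sketch using the AWS CLI; the rule ID and prefix are illustrative, not taken from this PR:

# Hypothetical sketch: expire objects in the log bucket after 30 days.
# The rule ID and empty prefix are illustrative; the actual policy used
# for the AWS Open Data Program may differ.
aws s3api put-bucket-lifecycle-configuration \
  --bucket pudl.catalyst.coop-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-logs-after-30-days",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }]
  }'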