Skip to content

POC: Set up culture cron as a cloud function #4

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Conversation

jwbron
Copy link

@jwbron jwbron commented May 12, 2025

Summary:

This implements a cloud function that runs the culture-cron script, which currently runs on toby. The function is triggered via pubsub, and the schedule is set by a cloud scheduler.

Setting up terraform CI is out of scope for this POC. The intent is to show what's required to get a cron-like job up and running in GCP.

Issue: INFRA-10650

Test plan:

@jwbron jwbron self-assigned this May 12, 2025
@jwbron jwbron requested a review from a team May 12, 2025 18:41
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are required in this file to allow it to be called as a cloud function and ensure we're only raising errors that are transient, so only errors that can potentially be recovered from will be retried.

Copy link
Member

@csilvers csilvers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I totally understand how all these parts fit together, so I can't really judge this yet. But what is your take on how this looks compared to the github-actions approach?

"""
try:
with urllib.request.urlopen(_CULTURE_MESSAGES_CSV_URL) as response:
if response.status != 200:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kinda outside the scope of this PR, but 5xx errors should be Transient, and 4xx and 3xx Permanent, like you do below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe ignore the details here, this was quickly thrown together. This is just meant to convey the fact that we'll need to update scripts to have better error handling if we use cloud functions.

member = "serviceAccount:${google_service_account.culture_cron.email}"
}

resource "google_project_iam_member" "secret_accessor" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this let the service account access all secrets? Can we limit it to just the secrets we secrets we care about?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, note that this is still a pretty rough outline that I put together in 5-10 minutes. It's meant to convey the idea. If you think it's worth the time, I can go through and get this more or less prod-ready, but it'll look pretty similar.


resource "google_project_iam_member" "pubsub_publisher" {
project = var.project_id
role = "roles/pubsub.publisher"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, maybe only allow this to publish to a particular topic?

@jwbron
Copy link
Author

jwbron commented May 13, 2025

what is your take on how this looks compared to the github-actions approach?

High-level, my opinion hasn't changed. The github actions setup:

  • was pretty straightforward
  • will have better execution visibility
  • will probably be less intimidating to developers
  • has the logging limitations we've talked about elsewhere

This setup:

  • requires much more knowledge of terraform, which I think isn't as simple to grasp as github actions
  • has similar code visibility but again, requires decent working knowledge of terraform
  • will be faster to start up and run since it's a cloud function vs a cloud run job
  • has the requirement that we update whatever scripts we're running to have robust error handling that only causes functions to fail on transient errors, which we can retry
  • generally has a more complicated setup overhead

My takeaway is:

  • if we want to allow for more flexible execution environments (support for cloud functions, cloud run jobs, tasks, etc.), we may prefer this cloud scheduler setup, but we should note that it will require folks to have decent working knowledge of terraform
  • if we want to focus on a simple setup with minimal code, better execution visibility, and baked-in support for task orchestration, we would prefer github actions

Considering that we're already using github actions for CI and other flows, my recommendation would be that we should support both workflows. If someone wants to setup a workflow that requires complex orchestration that would require something like cloud tasks, it would be way easier to set it up as a github actions workflow. If someone wants a super lightweight, fast-running task, setting up a cloud function like this makes a lot of sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants