feat: add run launcher for GCP Cloud Run Jobs #21864

Open: wants to merge 8 commits into master

timchap commented May 15, 2024

Summary & Motivation

I have been working on a cloud-native Dagster deployment for my company. I have opted to use Cloud Run Jobs for the run launcher(s), as they provide a batch containerised job environment which:

  • is highly managed
  • is highly scalable (consider increasing the Cloud Run Admin API quota, which is 10 req/min by default)
  • has zero idle cost
  • is fast to spin up
  • works elegantly in conjunction with Cloud Run Services for code server hosting

I noticed this discussion indicating that others are interested in a similar approach, which motivated this PR.

Note that on GCP a Cloud Run Job is a fairly persistent resource. That is, instead of creating a new Job resource for each run, it is preferable to have one persistent Job per code location and re-execute this Job once per run with arg/env overrides. As such, the configuration of the Job itself (resources, image, service account) is not managed by Dagster in this implementation. Instead, I manage it with Terraform to ensure the proper coupling between code servers and run launchers. I may publish this as a public Terraform module when I have the chance. Alternatively, if there's a suitable way to share it within the Dagster project, let me know.
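To illustrate the "one persistent Job, re-executed with overrides" pattern, here is a minimal sketch. It is not the code in this PR. The function names, the project/region/job values and the `DAGSTER_RUN_ID` variable are hypothetical, and the payload assumes the `overrides.containerOverrides` shape of the Cloud Run Admin API v2 `jobs.run` request body.

```python
from typing import Dict, List


def job_resource_name(project: str, region: str, job: str) -> str:
    """Fully qualified name of an existing (pre-provisioned) Cloud Run Job."""
    return f"projects/{project}/locations/{region}/jobs/{job}"


def build_run_overrides(args: List[str], env: Dict[str, str]) -> dict:
    """Build per-execution overrides only.

    The Job's image, resources and service account are deliberately absent:
    they stay as configured outside Dagster (e.g. by Terraform).
    """
    return {
        "overrides": {
            "containerOverrides": [
                {
                    "args": list(args),
                    # Sorted so the payload is deterministic.
                    "env": [
                        {"name": key, "value": value}
                        for key, value in sorted(env.items())
                    ],
                }
            ]
        }
    }


# Hypothetical values, for illustration only.
name = job_resource_name("my-project", "europe-west1", "my-code-location")
body = build_run_overrides(
    args=["dagster", "api", "execute_run", "<serialized run args>"],
    env={"DAGSTER_RUN_ID": "1234"},
)
```

Each Dagster run would then correspond to one execution of the same Job, with only `args` and `env` changing between executions.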

How I Tested These Changes

I've been using the CloudRunRunLauncher in our corporate Dagster deployment for several months with no issues. I've included unit tests which mock the GCP clients. I am open to adding more integration-style tests which interact with cloud services directly, but I may need some guidance from the maintainers to do so (e.g. does the CI test environment have permissions for Cloud Run?).
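For readers unfamiliar with the mocking approach, a rough sketch of what such a unit test can look like follows. It is not taken from the PR's test suite: `launch_run` is a hypothetical stand-in for the launcher, and the `run_job(request=...)` call shape is an assumption modelled on the `google-cloud-run` Python client.

```python
from unittest import mock


def launch_run(jobs_client, job_name: str, run_id: str) -> None:
    """Hypothetical stand-in for a launcher: re-executes an existing Job."""
    jobs_client.run_job(
        request={
            "name": job_name,
            "overrides": {
                "container_overrides": [
                    {"env": [{"name": "DAGSTER_RUN_ID", "value": run_id}]}
                ]
            },
        }
    )


def test_launch_run_executes_existing_job() -> None:
    # MagicMock replaces the real GCP client, so no network call is made.
    client = mock.MagicMock()
    launch_run(client, "projects/p/locations/r/jobs/j", "run-1")

    # Exactly one execution is requested, against the existing Job.
    client.run_job.assert_called_once()
    request = client.run_job.call_args.kwargs["request"]
    assert request["name"] == "projects/p/locations/r/jobs/j"
    env = request["overrides"]["container_overrides"][0]["env"]
    assert env == [{"name": "DAGSTER_RUN_ID", "value": "run-1"}]


test_launch_run_executes_existing_job()
```

A test like this checks only that the launcher sends the expected request. It says nothing about IAM permissions or quota behaviour, which is where integration-style tests would add value.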

…sed on the multiple code location feature of the new run launcher
@timchap
Author

timchap commented May 22, 2024

PR has been updated with scripts for an example deployment provided by @baumann-t.

We remain available for any other questions or change requests @garethbrickman. Thanks! 🙏

@clement-chaneching

That's great, thank you for this PR! I was also waiting for Cloud Run Jobs to be supported.
I'm also very interested in the Terraform part. Would it be possible for you to share it?

@AndreaGiardini
Contributor

I would love to see this as well :)

@timchap
Author

timchap commented Jul 3, 2024

For those who were interested in the Terraform module, I have finally organised my deployment into a module. I also prepared a walkthrough to assist in running a demo/POC deployment. It comes with a few caveats as you will see, but with a bit of tweaking you should be able to get a fairly fully-featured prod-ready Dagster deployment running on GCP managed services.

4 participants