feat: add run launcher for GCP Cloud Run Jobs #21864

Open: wants to merge 8 commits into master
16 changes: 16 additions & 0 deletions examples/deploy_cloud_run/Dockerfile
@@ -0,0 +1,16 @@
FROM python:3.10-slim

ARG PIPELINE_DIR

RUN mkdir -p /opt/dagster/app

WORKDIR /opt/dagster/app

COPY ${PIPELINE_DIR} .

RUN pip install \
    dagster \
    dagster-postgres \
    -e .

ENV DAGSTER_HOME=/opt/dagster/app/
68 changes: 68 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/dagster.yaml
@@ -0,0 +1,68 @@
scheduler:
  module: dagster.core.scheduler
  class: DagsterDaemonScheduler

run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator

run_launcher:
  module: dagster_gcp.cloud_run
  class: CloudRunRunLauncher
  config:
    # GCP project that hosts the Cloud Run jobs
    project: my-gcp-project
    # Region the Cloud Run jobs were created in
    region: my-region
    # Code location name -> Cloud Run job that executes its runs
    job_name_by_code_location:
      pipeline1: pipeline1-job
      pipeline2: pipeline2-job

run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      hostname:
        env: DAGSTER_PG_HOST
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db:
      hostname:
        env: DAGSTER_PG_HOST
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      hostname:
        env: DAGSTER_PG_HOST
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      db_name:
        env: DAGSTER_PG_DB
      port: 5432

run_monitoring:
  enabled: true
  poll_interval_seconds: 10
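Every process that builds this instance (the daemon and webserver on the VM, and each Cloud Run job container) has to resolve the same four `DAGSTER_PG_*` variables, which is presumably why both deploy scripts read the same Secret Manager secrets. The sketch below illustrates that resolution step under stated assumptions: `resolve_postgres_db` and `postgres_url` are hypothetical helpers written for illustration, not Dagster's own code, and the example values are made up. It also shows why credentials should be percent-encoded when placed in a `postgresql://` URL.

```python
import os
from urllib.parse import quote

# Hypothetical mapping from postgres_db fields to the env vars named in dagster.yaml.
PG_ENV_VARS = {
    "hostname": "DAGSTER_PG_HOST",
    "username": "DAGSTER_PG_USERNAME",
    "password": "DAGSTER_PG_PASSWORD",
    "db_name": "DAGSTER_PG_DB",
}


def resolve_postgres_db(environ=None, port=5432):
    """Resolve each `env:` field from the environment; fail if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [var for var in PG_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    config = {field: environ[var] for field, var in PG_ENV_VARS.items()}
    config["port"] = port
    return config


def postgres_url(config):
    """Build a postgresql:// URL, percent-encoding credentials such as '@' or '/'."""
    user = quote(config["username"], safe="")
    password = quote(config["password"], safe="")
    return (
        f"postgresql://{user}:{password}@{config['hostname']}:"
        f"{config['port']}/{config['db_name']}"
    )


# Made-up example values, standing in for what Secret Manager would supply.
example_env = {
    "DAGSTER_PG_HOST": "10.0.0.5",
    "DAGSTER_PG_USERNAME": "dagster",
    "DAGSTER_PG_PASSWORD": "p@ss/w",
    "DAGSTER_PG_DB": "dagster",
}
print(postgres_url(resolve_postgres_db(example_env)))
# prints postgresql://dagster:p%40ss%2Fw@10.0.0.5:5432/dagster
```

Failing fast on a missing variable surfaces a misconfigured secret at startup rather than as a connection error later in a run.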
47 changes: 47 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/pipeline1/README.md
@@ -0,0 +1,47 @@
# pipeline1

This is a [Dagster](https://dagster.io/) project scaffolded with [`dagster project scaffold`](https://docs.dagster.io/getting-started/create-new-project).

## Getting started

First, install your Dagster code location as a Python package. The `-e` (`--editable`) flag makes pip install the package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs), so local code changes take effect as you develop without reinstalling.

```bash
pip install -e ".[dev]"
```

Then, start the Dagster UI web server:

```bash
dagster dev
```

Open http://localhost:3000 with your browser to see the project.

You can start writing assets in `pipeline1/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.

## Development

### Adding new Python dependencies

You can specify new Python dependencies in `setup.py`.

### Unit testing

Tests are in the `pipeline1_tests` directory and you can run tests using `pytest`:

```bash
pytest pipeline1_tests
```

### Schedules and sensors

If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

## Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the [Dagster Cloud Documentation](https://docs.dagster.cloud) to learn more.
@@ -0,0 +1,9 @@
from dagster import Definitions, load_assets_from_modules

from . import assets

all_assets = load_assets_from_modules([assets])

defs = Definitions(
    assets=all_assets,
)
@@ -0,0 +1,5 @@
from dagster import asset

@asset
def asset_1():
    pass
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1,6 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "pipeline1"
@@ -0,0 +1,2 @@
[metadata]
name = pipeline1
11 changes: 11 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/pipeline1/setup.py
@@ -0,0 +1,11 @@
from setuptools import find_packages, setup

setup(
    name="pipeline1",
    packages=find_packages(exclude=["pipeline1_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud"
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
47 changes: 47 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/pipeline2/README.md
@@ -0,0 +1,47 @@
# pipeline2

This is a [Dagster](https://dagster.io/) project scaffolded with [`dagster project scaffold`](https://docs.dagster.io/getting-started/create-new-project).

## Getting started

First, install your Dagster code location as a Python package. The `-e` (`--editable`) flag makes pip install the package in ["editable mode"](https://pip.pypa.io/en/latest/topics/local-project-installs/#editable-installs), so local code changes take effect as you develop without reinstalling.

```bash
pip install -e ".[dev]"
```

Then, start the Dagster UI web server:

```bash
dagster dev
```

Open http://localhost:3000 with your browser to see the project.

You can start writing assets in `pipeline2/assets.py`. The assets are automatically loaded into the Dagster code location as you define them.

## Development

### Adding new Python dependencies

You can specify new Python dependencies in `setup.py`.

### Unit testing

Tests are in the `pipeline2_tests` directory and you can run tests using `pytest`:

```bash
pytest pipeline2_tests
```

### Schedules and sensors

If you want to enable Dagster [Schedules](https://docs.dagster.io/concepts/partitions-schedules-sensors/schedules) or [Sensors](https://docs.dagster.io/concepts/partitions-schedules-sensors/sensors) for your jobs, the [Dagster Daemon](https://docs.dagster.io/deployment/dagster-daemon) process must be running. This is done automatically when you run `dagster dev`.

Once your Dagster Daemon is running, you can start turning on schedules and sensors for your jobs.

## Deploy on Dagster Cloud

The easiest way to deploy your Dagster project is to use Dagster Cloud.

Check out the [Dagster Cloud Documentation](https://docs.dagster.cloud) to learn more.
@@ -0,0 +1,9 @@
from dagster import Definitions, load_assets_from_modules

from . import assets

all_assets = load_assets_from_modules([assets])

defs = Definitions(
    assets=all_assets,
)
@@ -0,0 +1,6 @@

from dagster import asset

@asset
def asset_2():
    pass
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1,6 @@
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[tool.dagster]
module_name = "pipeline2"
@@ -0,0 +1,2 @@
[metadata]
name = pipeline2
11 changes: 11 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/pipeline2/setup.py
@@ -0,0 +1,11 @@
from setuptools import find_packages, setup

setup(
    name="pipeline2",
    packages=find_packages(exclude=["pipeline2_tests"]),
    install_requires=[
        "dagster",
        "dagster-cloud"
    ],
    extras_require={"dev": ["dagster-webserver", "pytest"]},
)
7 changes: 7 additions & 0 deletions examples/deploy_cloud_run/cloud_run_pipeline/workspace.yaml
@@ -0,0 +1,7 @@
load_from:
  - python_module:
      module_name: pipeline1
      working_directory: pipeline1
  - python_module:
      module_name: pipeline2
      working_directory: pipeline2
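The keys of `job_name_by_code_location` in `dagster.yaml` have to match the code location names loaded here. Assuming each `python_module` entry takes its `module_name` as its code location name (no `location_name` is set), a small consistency check like the hypothetical one below can catch a location with no Cloud Run job before any run is launched. It mirrors the two config files as Python dictionaries and is only an illustration, not the launcher's actual lookup code.

```python
# Hypothetical mirror of the two config files; not the launcher's own lookup.
JOB_NAME_BY_CODE_LOCATION = {
    "pipeline1": "pipeline1-job",
    "pipeline2": "pipeline2-job",
}
# Assumes each python_module entry's code location name is its module_name.
WORKSPACE_CODE_LOCATIONS = ["pipeline1", "pipeline2"]


def job_name_for(code_location: str, mapping: dict) -> str:
    """Return the Cloud Run job configured for a code location, or fail loudly."""
    try:
        return mapping[code_location]
    except KeyError:
        raise ValueError(
            f"no Cloud Run job configured for code location {code_location!r}"
        ) from None


def unmapped_locations(locations, mapping) -> list:
    """Code locations in the workspace that have no job_name_by_code_location entry."""
    return [loc for loc in locations if loc not in mapping]


print(unmapped_locations(WORKSPACE_CODE_LOCATIONS, JOB_NAME_BY_CODE_LOCATION))
```

Adding a third directory under `cloud_run_pipeline/` without a matching `dagster.yaml` entry would show up in `unmapped_locations` instead of failing at launch time.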
32 changes: 32 additions & 0 deletions examples/deploy_cloud_run/deploy_cloud_run_job.sh
@@ -0,0 +1,32 @@
#!/bin/bash

PROJECT_ID="" # add your project id
FOLDER_NAME="dagster"
REGION="" # add your region
SERVICE_ACCOUNT_EMAIL="" # add your service account
PIPELINE_BASE_DIR="./cloud_run_pipeline"

for PIPELINE_DIR in "$PIPELINE_BASE_DIR"/*; do
  if [ -d "$PIPELINE_DIR" ]; then
    PIPELINE_NAME=$(basename "$PIPELINE_DIR")
    IMAGE_NAME="dagster-$PIPELINE_NAME"
    JOB_NAME="${PIPELINE_NAME}-job"

    echo "Building Docker image for $PIPELINE_NAME..."
    docker build -t "$IMAGE_NAME" -f Dockerfile --build-arg PIPELINE_DIR="$PIPELINE_DIR" --platform=linux/amd64 .

    echo "Tagging Docker image for $PIPELINE_NAME..."
    docker tag "$IMAGE_NAME" "$REGION-docker.pkg.dev/$PROJECT_ID/$FOLDER_NAME/$IMAGE_NAME:latest"

    echo "Pushing Docker image for $PIPELINE_NAME to Google Artifact Registry..."
    docker push "$REGION-docker.pkg.dev/$PROJECT_ID/$FOLDER_NAME/$IMAGE_NAME:latest"

    echo "Creating Cloud Run job for $PIPELINE_NAME..."
    gcloud beta run jobs create "$JOB_NAME" \
      --image="$REGION-docker.pkg.dev/$PROJECT_ID/$FOLDER_NAME/$IMAGE_NAME:latest" \
      --region="$REGION" \
      --service-account="$SERVICE_ACCOUNT_EMAIL" \
      --project="$PROJECT_ID" \
      --set-secrets="DAGSTER_PG_HOST=DAGSTER_PG_HOST:latest,DAGSTER_PG_USERNAME=DAGSTER_PG_USERNAME:latest,DAGSTER_PG_PASSWORD=DAGSTER_PG_PASSWORD:latest,DAGSTER_PG_DB=DAGSTER_PG_DB:latest"
  fi
done
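Because `dagster.yaml` hard-codes `pipeline1-job` and `pipeline2-job`, the names this script derives have to stay in sync with it. The hypothetical helpers below restate the script's naming conventions (Artifact Registry image path, `<pipeline>-job` job name, and the `ENV_VAR=SECRET_NAME:VERSION` pairs passed to `--set-secrets`) so they can be checked in one place. They are an illustration of the conventions visible in the script, not code from this PR; the region and project values are the placeholders from `dagster.yaml`.

```python
# Hypothetical helpers that restate the naming used in deploy_cloud_run_job.sh.
PG_SECRETS = [
    "DAGSTER_PG_HOST",
    "DAGSTER_PG_USERNAME",
    "DAGSTER_PG_PASSWORD",
    "DAGSTER_PG_DB",
]


def image_uri(region: str, project_id: str, pipeline_name: str,
              folder: str = "dagster", tag: str = "latest") -> str:
    """Artifact Registry path the script tags and pushes."""
    return f"{region}-docker.pkg.dev/{project_id}/{folder}/dagster-{pipeline_name}:{tag}"


def job_name(pipeline_name: str) -> str:
    """Cloud Run job name the script creates; dagster.yaml must use the same value."""
    return f"{pipeline_name}-job"


def set_secrets_flag(secret_names, version: str = "latest") -> str:
    """Value for --set-secrets: ENV_VAR=SECRET_NAME:VERSION pairs, comma separated."""
    return ",".join(f"{name}={name}:{version}" for name in secret_names)


for name in ["pipeline1", "pipeline2"]:
    print(job_name(name), image_uri("my-region", "my-gcp-project", name))
```

The script maps each secret to an environment variable of the same name, which is what lets the `env:` references in `dagster.yaml` resolve inside the job container.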
73 changes: 73 additions & 0 deletions examples/deploy_cloud_run/deploy_vm.sh
@@ -0,0 +1,73 @@
#!/bin/bash

VM_NAME="dagster-vm"
ZONE="" # add your zone
PROJECT_ID="" # add your project id
LOCAL_FILE_PATH="./cloud_run_pipeline/*"
DAGSTER_GCP_PATH="../../python_modules/libraries/dagster-gcp/*"
REMOTE_DAGSTER_GCP_PATH="/opt/dagster/app/python_modules/libraries/dagster_gcp"
REMOTE_DIR="/opt/dagster/app"
SERVICE_ACCOUNT_EMAIL="" # add your service account; it must be allowed to run Cloud Run jobs and read secrets from Secret Manager
SCOPES="cloud-platform"

gcloud compute instances create $VM_NAME \
  --zone=$ZONE \
  --machine-type=e2-micro \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --project=$PROJECT_ID \
  --service-account=$SERVICE_ACCOUNT_EMAIL \
  --scopes=$SCOPES

echo "waiting for VM to be created..."
sleep 40

gcloud compute ssh $VM_NAME --zone=$ZONE --command="
sudo mkdir -p $REMOTE_DIR $REMOTE_DAGSTER_GCP_PATH
sudo chown -R $USER $REMOTE_DIR $REMOTE_DAGSTER_GCP_PATH
" --project=$PROJECT_ID

gcloud compute scp --recurse $LOCAL_FILE_PATH ${VM_NAME}:$REMOTE_DIR \
  --zone=$ZONE \
  --project=$PROJECT_ID

gcloud compute scp --recurse $DAGSTER_GCP_PATH ${VM_NAME}:$REMOTE_DAGSTER_GCP_PATH \
  --zone=$ZONE \
  --project=$PROJECT_ID

gcloud compute ssh $VM_NAME --zone=$ZONE --command="
# Update package list and install prerequisites
sudo apt-get update
sudo apt-get install -y python3.10 python3-venv python3-pip python3.10-distutils

python3 -m pip install --upgrade pip

# Create and activate a virtual environment with Python 3.10
python3 -m venv .venv
source .venv/bin/activate

# fetch secrets for postgres db from secret manager
export DAGSTER_PG_HOST=\$(gcloud secrets versions access latest --secret='DAGSTER_PG_HOST' --project=$PROJECT_ID)
export DAGSTER_PG_USERNAME=\$(gcloud secrets versions access latest --secret='DAGSTER_PG_USERNAME' --project=$PROJECT_ID)
export DAGSTER_PG_PASSWORD=\$(gcloud secrets versions access latest --secret='DAGSTER_PG_PASSWORD' --project=$PROJECT_ID)
export DAGSTER_PG_DB=\$(gcloud secrets versions access latest --secret='DAGSTER_PG_DB' --project=$PROJECT_ID)

# Install Dagster
pip install dagster-postgres dagster-webserver
# install dagster-gcp
pip install $REMOTE_DAGSTER_GCP_PATH

echo 'installed dagster'

export DAGSTER_HOME=$REMOTE_DIR
cd $REMOTE_DIR

echo 'starting dagster daemon'
# Start the Dagster daemon
nohup dagster-daemon run &

echo 'starting dagster webserver'
# Start the dagster webserver
nohup dagster-webserver -h 0.0.0.0 -p 3000

" --project=$PROJECT_ID
1 change: 1 addition & 0 deletions pyright/master/requirements-pinned.txt
@@ -238,6 +238,7 @@ google-auth-httplib2==0.2.0
google-auth-oauthlib==1.2.0
google-cloud-bigquery==3.21.0
google-cloud-core==2.4.1
google-cloud-run==0.10.5
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-re2==1.1.20240501
@@ -0,0 +1 @@
from .run_launcher import CloudRunRunLauncher
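The launcher's implementation is not visible in this diff. Presumably, starting a run means targeting one of the jobs created by `deploy_cloud_run_job.sh`, and the Cloud Run Admin API v2 identifies a job by a resource name of the form `projects/{project}/locations/{region}/jobs/{job}`. The sketch below, with hypothetical helper names, builds and parses that name from the `project`, `region`, and `job_name_by_code_location` values in `dagster.yaml`. The generated `google-cloud-run` client likely provides its own path helper (check the `run_v2.JobsClient` reference), so treat this as illustration only.

```python
import re

# Cloud Run Admin API v2 job resource name: projects/{project}/locations/{region}/jobs/{job}
_JOB_PATH_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<region>[^/]+)/jobs/(?P<job>[^/]+)$"
)


def cloud_run_job_path(project: str, region: str, job_name: str) -> str:
    """Build the fully qualified name of a Cloud Run job (hypothetical helper)."""
    return f"projects/{project}/locations/{region}/jobs/{job_name}"


def parse_cloud_run_job_path(path: str) -> dict:
    """Split a job resource name back into its parts; raise on malformed input."""
    match = _JOB_PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a Cloud Run job resource name: {path!r}")
    return match.groupdict()


path = cloud_run_job_path("my-gcp-project", "my-region", "pipeline1-job")
print(path)
# prints projects/my-gcp-project/locations/my-region/jobs/pipeline1-job
```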