# Scheduling Vertex Pipelines with Cloud Scheduler

## Setup

Install the Kubeflow Pipelines SDK ([kfp](https://pypi.org/project/kfp/#history)), version v2.0.0b1 or higher ([required by Artifact Registry](https://cloud.google.com/vertex-ai/docs/pipelines/create-pipeline-template)), along with the Vertex AI SDK (aiplatform) and the other required packages:

In [None]:
%pip install kfp==2.0.0b11 google-cloud-aiplatform==1.19.0 google-api-python-client==1.8.0 \
    --user \
    --index-url https://repository.walmart.com/repository/pypi-proxy/simple/ \
    --default-timeout 300

Restart kernel:

In [None]:
# Automatically restart kernel after installs
import os

if not os.getenv("IS_TESTING"):
    # Shut down the kernel with restart=True so the new packages can be imported
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Verify your Kubeflow Pipelines SDK ([kfp](https://pypi.org/project/kfp/#history)) is version v2.0.0b1 or higher ([required by Artifact Registry](https://cloud.google.com/vertex-ai/docs/pipelines/create-pipeline-template)):

In [None]:
import kfp
kfp.__version__
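
The check above relies on reading the version by eye. A minimal sketch of an automated guard follows, assuming version strings of the form `X.Y.Z` with an optional pre-release tag such as `b11`. The helper names `parse_version` and `meets_minimum` are hypothetical, and this is not a full PEP 440 parser:

```python
import re

# Pre-release phases sort before the final release of the same X.Y.Z.
_PHASE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")


def parse_version(version: str) -> tuple:
    """Turn 'X.Y.Z' or 'X.Y.Z<a|b|rc>N' into a sortable tuple."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"Unsupported version format: {version!r}")
    major, minor, patch, phase, phase_num = match.groups()
    return (int(major), int(minor), int(patch), _PHASE_RANK[phase], int(phase_num or 0))


def meets_minimum(installed: str, minimum: str = "2.0.0b1") -> bool:
    """Return True if `installed` is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# "2.0.0b11" is the version pinned in the install cell.
print(meets_minimum("2.0.0b11"))
```

In the notebook, something like `assert meets_minimum(kfp.__version__)` could then fail fast before any registry calls are made.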

Either manually set your PROJECT_ID or use the below `gcloud` command to retrieve it:

In [None]:
shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
PROJECT_ID=shell_output[0]

PROJECT_ID

Set the region or leave it as `us-central1`.

Note: If you change the region, make sure your network is configured to run in that region.

In [None]:
REGION="us-central1"

Set your GCS bucket name:

In [None]:
GCS_BUCKET_NAME="ADD YOUR GCS BUCKET NAME"
GCS_BUCKET_URI=f"gs://{GCS_BUCKET_NAME}"

**Only if your bucket doesn't already exist:** Run the following cell to create your Cloud Storage bucket:

In [None]:
! gsutil mb -l $REGION $GCS_BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $GCS_BUCKET_URI

## Define and Compile Pipeline

Imports:

In [None]:
from kfp import compiler
from kfp.dsl import pipeline, component, Artifact, Dataset, Model, Input, Output, OutputPath, InputPath
from typing import NamedTuple

Set pipeline inputs or use suggested values set below:

In [None]:
PIPELINE_NETWORK="projects/12856960411/global/networks/vpcnet-private-svc-access-usc1"
PIPELINE_NAME="hello-world-pipeline"
PIPELINE_ROOT=f"{GCS_BUCKET_URI}/pipeline-root/{PIPELINE_NAME}"
PIPELINE_YAML="hello_world_pipeline.yaml"
PIPELINE_PARAMS={"text": "Hello World!"}

Hello world component:

In [None]:
@component
def hello_world(text: str):
    print(text)

Pipeline definition:

In [None]:
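# The function is named differently from the imported `pipeline` decorator
# so that the decorator is not shadowed and this cell can be re-run.
@pipeline(
    name=PIPELINE_NAME,
    description="Hello world example pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def hello_world_pipeline(text: str = "Hello world!"):
    hello_world(text=text)

Compile pipeline into YAML file:

In [None]:
compiler.Compiler().compile(
    pipeline_func=hello_world_pipeline, package_path=PIPELINE_YAML
)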

Take a look at the contents of the pipeline definition YAML:

In [None]:
! cat $PIPELINE_YAML

## Create kfp Artifact Registry and Upload Pipeline Template

Set the name for your kfp Artifact Registry or use the suggested value below:

In [None]:
KFP_REG_NAME="kfp-registry"

Create a kfp Artifact Registry if you don't already have one:

In [None]:
! gcloud artifacts repositories create $KFP_REG_NAME \
    --location=$REGION \
    --repository-format=KFP

Connect to registry via client:

In [None]:
from kfp.registry import RegistryClient

client = RegistryClient(host=f"https://{REGION}-kfp.pkg.dev/{PROJECT_ID}/{KFP_REG_NAME}")

Set pipeline template tags (like version) and generate template URL path from other inputs:

In [None]:
TEMPLATE_TAGS=["v1", "latest"]
TEMPLATE_PATH=f"https://{REGION}-kfp.pkg.dev/{PROJECT_ID}/{KFP_REG_NAME}/{PIPELINE_NAME}/{TEMPLATE_TAGS[0]}"
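
The registry host and the template path share the same URL prefix, so it can help to derive both from one place. A minimal sketch, assuming the URL layout used in the cells above; the helper names `registry_host` and `template_uri` and the project ID `my-project` are hypothetical:

```python
def registry_host(region: str, project_id: str, repo_name: str) -> str:
    """Base URL of a KFP-format Artifact Registry repository."""
    return f"https://{region}-kfp.pkg.dev/{project_id}/{repo_name}"


def template_uri(region: str, project_id: str, repo_name: str,
                 pipeline_name: str, tag: str) -> str:
    """URL of one tagged pipeline template inside that repository."""
    return f"{registry_host(region, project_id, repo_name)}/{pipeline_name}/{tag}"


# Placeholder values for illustration only.
uri = template_uri("us-central1", "my-project", "kfp-registry",
                   "hello-world-pipeline", "v1")
print(uri)
```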

Upload pipeline template to the registry with extra headers like a description:

In [None]:
templateName, versionName = client.upload_pipeline(
    file_name=PIPELINE_YAML,
    tags=TEMPLATE_TAGS,
    extra_headers={"description":"This is an example hello world pipeline template."})

Set the default Compute Engine service account for the pipeline:

In [None]:
shell_output = ! gcloud projects describe $PROJECT_ID
project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
PIPELINE_SA = f"{project_number}-compute@developer.gserviceaccount.com"

PIPELINE_SA
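
The cell above takes the project number from the last line of the `gcloud projects describe` output, which depends on field order. A minimal sketch of a key-based lookup is shown below; the helper name `extract_project_number` and the sample output are hypothetical. Passing `--format 'value(projectNumber)'` to `gcloud`, in the same way the `PROJECT_ID` cell uses `--format`, is another option that avoids parsing altogether.

```python
def extract_project_number(describe_lines):
    """Find the projectNumber value in `gcloud projects describe` output lines."""
    for line in describe_lines:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "projectNumber":
            # Drop surrounding whitespace and YAML quotes.
            return value.strip().strip("'\"")
    raise ValueError("projectNumber not found in describe output")


# Hypothetical sample output; real output has different values and more fields.
sample_output = [
    "createTime: '2020-01-01T00:00:00.000Z'",
    "lifecycleState: ACTIVE",
    "projectId: my-project",
    "projectNumber: '123456789012'",
]

project_number = extract_project_number(sample_output)
pipeline_sa = f"{project_number}-compute@developer.gserviceaccount.com"
print(pipeline_sa)
```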

If you'd like to use a custom service account for the pipeline (rather than the default compute engine service account), then add your custom service account below:

In [None]:
# PIPELINE_SA=""

## Run the Vertex Pipeline via SDK and CURL

Import the Vertex SDK (aiplatform) and run the Vertex pipeline with kfp Artifact Registry template path:

In [None]:
import google.cloud.aiplatform as vertex

In [None]:
job = vertex.PipelineJob(
    display_name=PIPELINE_NAME,
    template_path=TEMPLATE_PATH,
    project=PROJECT_ID,
    location=REGION,
    parameter_values=PIPELINE_PARAMS,
    enable_caching=False)

job.submit(network=PIPELINE_NETWORK, service_account=PIPELINE_SA)

Set the Vertex AI endpoint and auth token, and construct the URL and JSON body for the CURL command (refer to the [pipelineJobs REST API documentation](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.pipelineJobs) for more details):

In [None]:
ENDPOINT=f"https://{REGION}-aiplatform.googleapis.com/v1"
shell_output=!gcloud auth application-default print-access-token
AUTH_TOKEN=shell_output[0]
URL=f"{ENDPOINT}/projects/{PROJECT_ID}/locations/{REGION}/pipelineJobs"

RUNTIME_BODY={
    "displayName": PIPELINE_NAME,
    "runtimeConfig": {
            "parameterValues": PIPELINE_PARAMS,
            "gcsOutputDirectory": PIPELINE_ROOT,
    },
    "network": PIPELINE_NETWORK,
    "templateUri": TEMPLATE_PATH,
    "serviceAccount": PIPELINE_SA
}

RUNTIME_BODY
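
`RUNTIME_BODY` is a Python dict, so when it is interpolated into a shell command as `$RUNTIME_BODY` it is typically rendered as the dict's Python text form (single-quoted strings) rather than strict JSON. If a client or proxy insists on strict JSON, one option is to serialize explicitly with `json.dumps`. The sketch below uses placeholder values (the project, bucket, network, and service account are hypothetical). Because strict JSON uses double quotes, the shell quoting of any command that consumes it would need to be adjusted as well.

```python
import json

# Placeholder request body mirroring the structure of RUNTIME_BODY.
runtime_body = {
    "displayName": "hello-world-pipeline",
    "runtimeConfig": {
        "parameterValues": {"text": "Hello World!"},
        "gcsOutputDirectory": "gs://my-bucket/pipeline-root/hello-world-pipeline",
    },
    "network": "projects/123456789012/global/networks/my-network",
    "templateUri": "https://us-central1-kfp.pkg.dev/my-project/kfp-registry/hello-world-pipeline/v1",
    "serviceAccount": "123456789012-compute@developer.gserviceaccount.com",
}

# Serialize to strict JSON (double-quoted keys and strings).
runtime_body_json = json.dumps(runtime_body)

# Round-trip check: parsing the JSON gives back the original dict.
assert json.loads(runtime_body_json) == runtime_body
print(runtime_body_json)
```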

Run the Vertex pipeline via the below CURL command:

In [None]:
! curl -X POST $URL?pipelineJobId=$PIPELINE_NAME-$(date +%Y%m%d%H%M%S) -d "$RUNTIME_BODY" \
 -H "Content-Type: application/json" \
 -H "Authorization: Bearer $AUTH_TOKEN" -v
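
The same POST request can also be assembled in Python, which sidesteps shell quoting. The following is a minimal sketch using only the standard library; the helper name `build_pipeline_job_request`, the project ID, and the token are placeholders, and the request is built but deliberately not sent:

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_pipeline_job_request(url, body, auth_token, pipeline_name):
    """Build (but do not send) the POST request that creates a pipeline job."""
    # A timestamp suffix keeps each job ID unique, similar to
    # $(date +%Y%m%d%H%M%S) in the curl command (UTC is used here).
    job_id = f"{pipeline_name}-{datetime.now(timezone.utc):%Y%m%d%H%M%S}"
    return urllib.request.Request(
        url=f"{url}?pipelineJobId={job_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )


request = build_pipeline_job_request(
    url="https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/pipelineJobs",
    body={"displayName": "hello-world-pipeline"},
    auth_token="PLACEHOLDER_TOKEN",
    pipeline_name="hello-world-pipeline",
)
print(request.get_method(), request.full_url)
```

With a real token and body, `urllib.request.urlopen(request)` would submit the job; that call is left out here so the sketch has no side effects.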

Note: to disable caching when running the pipeline via the REST API, you must edit the pipeline spec YAML file directly (set `enableCache` to `false`) and then re-upload the pipeline spec to the kfp Artifact Registry. See the example below:
```yaml
root:
  dag:
    tasks:
      hello-world:
        cachingOptions:
          enableCache: false
```
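
For specs with many tasks, the edit can be scripted instead of done by hand. The sketch below assumes the compiled spec has already been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`, which is an assumption and is not imported here) and only touches tasks in the root DAG. The helper name `disable_caching` and the trimmed sample spec are hypothetical:

```python
def disable_caching(pipeline_spec: dict) -> dict:
    """Set cachingOptions.enableCache to False for every task in the root DAG."""
    tasks = pipeline_spec.get("root", {}).get("dag", {}).get("tasks", {})
    for task in tasks.values():
        # Create cachingOptions if the task does not have it yet.
        task.setdefault("cachingOptions", {})["enableCache"] = False
    return pipeline_spec


# Hypothetical, heavily trimmed spec; a real compiled spec has many more keys.
spec = {
    "root": {
        "dag": {
            "tasks": {
                "hello-world": {"cachingOptions": {"enableCache": True}},
            }
        }
    }
}

disable_caching(spec)
print(spec["root"]["dag"]["tasks"]["hello-world"]["cachingOptions"])
```

The modified dict would then be written back to YAML and re-uploaded to the registry as a new template version.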

## Create Cloud Scheduler Job to Run Vertex Pipeline

Set Cloud Scheduler specific inputs like name, cron schedule, time zone, and service account (below command sets the default compute engine service account):

In [None]:
SCHEDULE_NAME=f"{PIPELINE_NAME}-http-schedule-csa"
SCHEDULE_CRON="0 */3 * * *"
SCHEDULE_TIME_ZONE="America/Los_Angeles"  # tz database name for US Pacific time

shell_output = ! gcloud projects describe $PROJECT_ID
project_number = shell_output[-1].split(":")[1].strip().replace("'", "")
SCHEDULE_SERVICE_ACCOUNT = f"{project_number}-compute@developer.gserviceaccount.com"

SCHEDULE_SERVICE_ACCOUNT
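
The cron expression `0 */3 * * *` set in `SCHEDULE_CRON` fires at minute 0 of every third hour. A minimal sketch that expands the minute and hour fields makes this concrete; the helper `expand_cron_field` is hypothetical and only handles `*`, `*/N`, and plain integers, not the full cron syntax:

```python
def expand_cron_field(field: str, low: int, high: int) -> list:
    """Expand a single cron field limited to '*', '*/N', or a plain integer."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    return [int(field)]


# Only the first two fields (minute, hour) are expanded here.
minute, hour, *_ = "0 */3 * * *".split()
print(expand_cron_field(minute, 0, 59))  # [0]
print(expand_cron_field(hour, 0, 23))    # [0, 3, 6, 9, 12, 15, 18, 21]
```

So the job runs eight times a day, at 00:00, 03:00, and so on, interpreted in the configured time zone.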

If you'd like to use a custom service account for the Cloud Scheduler job (rather than the default compute engine service account), then add your custom service account below:

In [None]:
# SCHEDULE_SERVICE_ACCOUNT=""

Note: the Cloud Scheduler service account must have the `iam.serviceAccounts.actAs` permission on the pipeline service account (this permission is included in the Service Account User role, `roles/iam.serviceAccountUser`). See [documentation](https://cloud.google.com/iam/docs/understanding-roles#iam.serviceAccountUser) for more details.

Create Cloud Scheduler job (see [documentation](https://cloud.google.com/sdk/gcloud/reference/scheduler/jobs/create/http) for details on the arguments):

In [None]:
! gcloud scheduler jobs create http $SCHEDULE_NAME \
    --schedule="$SCHEDULE_CRON" \
    --time-zone=$SCHEDULE_TIME_ZONE \
    --uri=$URL \
    --http-method=POST \
    --oauth-service-account-email=$SCHEDULE_SERVICE_ACCOUNT \
    --headers=Content-Type=application/json,User-Agent=Google-Cloud-Scheduler \
    --max-retry-attempts=2 \
    --message-body="$RUNTIME_BODY" \
    --location=$REGION
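
The same command can be assembled as an argument list in Python, so the JSON message body is passed as a single argument without shell quoting. The sketch below uses only the flags from the cell above; the helper name `build_scheduler_create_command` and all values are placeholders, and the command is printed rather than executed:

```python
import json
import shlex


def build_scheduler_create_command(name, cron, time_zone, uri,
                                   service_account, body, region):
    """Return the argv list for `gcloud scheduler jobs create http`."""
    return [
        "gcloud", "scheduler", "jobs", "create", "http", name,
        f"--schedule={cron}",
        f"--time-zone={time_zone}",
        f"--uri={uri}",
        "--http-method=POST",
        f"--oauth-service-account-email={service_account}",
        "--headers=Content-Type=application/json,User-Agent=Google-Cloud-Scheduler",
        "--max-retry-attempts=2",
        # Serialize the body as strict JSON; it stays one argv element.
        f"--message-body={json.dumps(body)}",
        f"--location={region}",
    ]


command = build_scheduler_create_command(
    name="hello-world-pipeline-http-schedule-csa",
    cron="0 */3 * * *",
    time_zone="America/Los_Angeles",
    uri="https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/pipelineJobs",
    service_account="123456789012-compute@developer.gserviceaccount.com",
    body={"displayName": "hello-world-pipeline"},
    region="us-central1",
)
# Show a shell-safe rendering of the command for inspection.
print(shlex.join(command))
```

Running it would be a matter of `subprocess.run(command, check=True)` in an environment where `gcloud` is installed and authenticated.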

Manually trigger Cloud Scheduler job:

In [None]:
! gcloud scheduler jobs run $SCHEDULE_NAME \
    --location $REGION \
    --quiet

Describe the Cloud Scheduler job:

In [None]:
! gcloud scheduler jobs describe $SCHEDULE_NAME \
    --location $REGION

## Cleanup

Delete Cloud Scheduler job:

In [None]:
! gcloud scheduler jobs delete $SCHEDULE_NAME \
    --location $REGION \
    --quiet

**Only run this if you'd like to delete the entire kfp Artifact Registry repository created above:**

In [None]:
! gcloud artifacts repositories delete $KFP_REG_NAME \
    --location $REGION \
    --quiet