In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# E2E ML on GCP: MLOps stage 3 : formalization: get started with Kubeflow Pipelines

<table align="left">
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_kubeflow_pipelines.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
    <td>
        <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_kubeflow_pipelines.ipynb">
        <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
        </a>
  </td>
    
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage3/get_started_with_kubeflow_pipelines.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview


This tutorial demonstrates how to use Vertex AI for E2E MLOps on Google Cloud in production. It covers stage 3, formalization: get started with Kubeflow Pipelines.

### Objective

In this tutorial, you learn how to use `Kubeflow Pipelines` (KFP).

This tutorial uses the following Google Cloud ML services:

- `Vertex AI Pipelines`

The steps performed include:

- Building KFP lightweight Python function components.
- Assembling and compiling KFP components into a pipeline.
- Executing a KFP pipeline using Vertex AI Pipelines.
- Loading component and pipeline definitions from a source code repository.
- Building sequential, parallel, multiple output components.
- Building control flow into pipelines.

### Costs
This tutorial uses billable components of Google Cloud:

- Vertex AI
- Cloud Storage

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage pricing](https://cloud.google.com/storage/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.

## Installations

Install the following packages to run this MLOps notebook.

In [None]:
import os

# The Google Cloud Notebook product has specific requirements
IS_GOOGLE_CLOUD_NOTEBOOK = os.path.exists("/opt/deeplearning/metadata/env_version")

# Google Cloud Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_GOOGLE_CLOUD_NOTEBOOK:
    USER_FLAG = "--user"
    
! pip3 install tensorflow-io==0.18 $USER_FLAG -q
! pip3 install --upgrade google-cloud-aiplatform \
                        pyarrow \
                        kfp $USER_FLAG -q

### Restart the kernel

Once you've installed the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
REGION = "[your-region]"  # @param {type: "string"}

if REGION == "[your-region]":
    REGION = "us-central1"

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
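
As a minimal sketch of how the timestamp might be appended to a resource name: the helper `unique_name` and the base name `hello-world` are hypothetical and are not used elsewhere in this notebook.

```python
from datetime import datetime


def unique_name(base: str, timestamp: str) -> str:
    """Append a session timestamp to a base resource name (hypothetical helper)."""
    return f"{base}-{timestamp}"


# Same format string as the cell above: YYYYMMDDHHMMSS.
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
print(unique_name("hello-world", timestamp))
```

Two users who start their sessions at different seconds get different names, which is what avoids the collisions described above.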

### Authenticate your Google Cloud account

**If you are using Google Cloud Notebooks**, your environment is already authenticated. Skip this step.

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

In the Cloud Console, go to the [Create service account key](https://console.cloud.google.com/apis/credentials/serviceaccountkey) page.

1. Click **Create service account**.

2. In the **Service account name** field, enter a name, and click **Create**.

3. In the **Grant this service account access to project** section, click the Role drop-down list. Type "Vertex" into the filter box, and select **Vertex Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

4. Click **Create**. A JSON file that contains your key downloads to your local environment.

5. Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Google Cloud Notebook, then don't execute this code
if not os.path.exists("/opt/deeplearning/metadata/env_version"):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources are retained across sessions.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = "gs://{}".format(BUCKET_NAME)

In [None]:
if BUCKET_URI == "" or BUCKET_URI is None or BUCKET_URI == "gs://[your-bucket-name]":
    BUCKET_URI = "gs://" + PROJECT_ID + "aip-" + TIMESTAMP
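
Before creating the bucket, it can help to check that the generated name is plausible. The sketch below covers only a subset of the Cloud Storage naming rules (length and allowed characters); the function `looks_like_valid_bucket_name` and the sample project ID are hypothetical, and passing this check says nothing about whether the name is globally unique.

```python
import re

# Subset of the naming rules: 3-63 characters, lowercase letters, digits,
# dashes, underscores, and dots, starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if the name passes the partial format check above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


# Name built the same way as the fallback in the cell above,
# using a made-up project ID and timestamp.
candidate = "my-project" + "aip-" + "20220101120000"
print(looks_like_valid_bucket_name(candidate))
print(looks_like_valid_bucket_name("[your-bucket-name]"))
```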

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $REGION $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

#### Service Account

You use a service account to create Vertex AI Pipeline jobs. If you do not want to use your project's Compute Engine service account, set `SERVICE_ACCOUNT` to another service account ID.

In [None]:
SERVICE_ACCOUNT = "[your-service-account]"  # @param {type:"string"}

In [None]:
if (
    SERVICE_ACCOUNT == ""
    or SERVICE_ACCOUNT == "[your-service-account]"
    or SERVICE_ACCOUNT is None
):
    shell_output = ! gcloud projects describe $PROJECT_ID | sed -nre 's:.*projectNumber\: (.*):\1:p'
    SERVICE_ACCOUNT = (
        shell_output[0].replace("'", "") + "-compute@developer.gserviceaccount.com"
    )

print("Service Account:", SERVICE_ACCOUNT)
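
The cell above builds the Compute Engine default service account from the project number. The same string handling can be sketched in plain Python; the function name and the project number `123456789012` are made up for illustration.

```python
def default_compute_service_account(project_number: str) -> str:
    """Build the service account ID the same way as the cell above."""
    # gcloud may print the project number wrapped in single quotes.
    cleaned = project_number.replace("'", "")
    return f"{cleaned}-compute@developer.gserviceaccount.com"


print(default_compute_service_account("'123456789012'"))
```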

#### Set service account access for Vertex AI Pipelines

Run the following commands to grant your service account access to read and write pipeline artifacts in the bucket that you created in the previous step. You only need to run this step once per service account.

In [None]:
! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectCreator $BUCKET_URI

! gsutil iam ch serviceAccount:{SERVICE_ACCOUNT}:roles/storage.objectViewer $BUCKET_URI

### Import libraries

In [None]:
from typing import NamedTuple

import google.cloud.aiplatform as aiplatform
import tensorflow as tf
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import component

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(project=PROJECT_ID, staging_bucket=BUCKET_URI)

## Pipeline introduction

Vertex AI Pipelines lets you orchestrate your machine learning (ML) workflows in a serverless manner. Pipelines are re-usable, and their executions and artifact generation can be tracked by Vertex AI Experiments and Vertex AI ML Metadata. With pipelines, you do the following:

1. Design the pipeline workflow.
2. Compile the pipeline.
3. Schedule pipeline execution (or run now).
4. Get the pipeline results.

Pipelines are designed using a domain-specific language (DSL). Vertex AI Pipelines supports both the KFP DSL and the TFX DSL for designing pipelines.

In addition to designing components, you can use a wide variety of pre-built Google Cloud Pipeline Components for Vertex AI services.

Learn more about [Building a pipeline](https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline).

## Basic pipeline

This step demonstrates the basics of constructing and executing a pipeline. You do the following:

1. Design a simple Python function based component to output the input string.
2. Construct a pipeline that uses the component.
3. Compile the pipeline.
4. Execute the pipeline.

### Design hello world component

To create a KFP component from a Python function, you add the KFP DSL decorator `@component` to the function. In this example, the decorator takes the following parameters:

- `output_component_file` (optional): Writes the component description to a YAML file so that the component is portable.
- `base_image` (optional): The container image in which the Python function runs. By default, it is a Python 3.7 image.

In [None]:
@component(output_component_file="hello_world.yaml", base_image="python:3.9")
def hello_world(text: str) -> str:
    print(text)
    return text


! cat hello_world.yaml

### Design the hello world pipeline

Next, you design the pipeline for running the hello world component. A pipeline is specified as a Python function with the KFP DSL decorator `@dsl.pipeline`, with the following parameters:

- `name`: Name of the pipeline.
- `description`: Description of the pipeline.
- `pipeline_root`: The artifact repository where KFP stores a pipeline’s artifacts.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/hello_world".format(BUCKET_URI)


@dsl.pipeline(
    name="hello-world",
    description="A simple intro pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline(text: str = "hi there"):
    hello_world_task = hello_world(text)
    return hello_world_task

### Compile the hello world pipeline

Once the design of the pipeline is completed, the next step is to compile it. The pipeline definition is compiled into a JSON formatted file, which is transportable and can be interpreted by both KFP and Vertex AI Pipelines.

Compile the pipeline with the `Compiler().compile()` method using the following parameters:

- `pipeline_func`: The corresponding DSL function that defines the pipeline.
- `package_path`: The JSON file to write the transportable compiled pipeline to.

In [None]:
compiler.Compiler().compile(pipeline_func=pipeline, package_path="hello_world.json")

! cat hello_world.json

### Execute the hello world pipeline

Now that the pipeline is compiled, you can execute it by:

- Creating a Vertex AI PipelineJob with the following parameters:
    - `display_name`: The human readable name for the job.
    - `template_path`: The compiled JSON pipeline definition.
    - `pipeline_root`: Where to write output artifacts to.

Click on the generated link below `INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:` to see your job run in the Cloud Console.

In [None]:
pipeline = aiplatform.PipelineJob(
    display_name="hello_world",
    template_path="hello_world.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

! rm hello_world.json

### View the hello world pipeline execution results

In [None]:
PROJECT_NUMBER = pipeline.gca_resource.name.split("/")[1]
print(PROJECT_NUMBER)


def print_pipeline_output(job, output_task_name):
    """Print and return the first output file found for the named task."""
    job_id = job.name
    print(job_id)
    for task in job.gca_resource.job_detail.task_details:
        # Outputs are expected under <root>/<project number>/<job id>/<task name>_<task id>/
        task_dir = f"{PIPELINE_ROOT}/{PROJECT_NUMBER}/{job_id}/{output_task_name}_{task.task_id}"
        # Check the candidate output files in the same order of preference.
        for filename in ("executor_output.json", "gcp_resources", "evaluation_metrics"):
            output_path = f"{task_dir}/{filename}"
            if tf.io.gfile.exists(output_path):
                ! gsutil cat $output_path
                return output_path

    return None


print_pipeline_output(pipeline, "hello-world")
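
The helper above assumes a particular layout for task outputs under the pipeline root. The sketch below spells that layout out as a pure function. The function name and every value passed to it are hypothetical, and the layout is inferred from the helper rather than taken from a documented guarantee.

```python
def task_output_path(
    pipeline_root: str,
    project_number: str,
    job_id: str,
    task_name: str,
    task_id: int,
    filename: str = "executor_output.json",
) -> str:
    """Compose the path that print_pipeline_output probes for a task's output."""
    return f"{pipeline_root}/{project_number}/{job_id}/{task_name}_{task_id}/{filename}"


# All values below are made up for illustration.
path = task_output_path(
    pipeline_root="gs://my-bucket/pipeline_root/hello_world",
    project_number="123456789012",
    job_id="hello-world-20220101120000",
    task_name="hello-world",
    task_id=42,
)
print(path)
```

Note that the task name passed to the helper uses hyphens (`hello-world`), while the Python function is named `hello_world`; the later calls in this notebook follow the same pattern.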

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### Load a component from YAML definition

By storing the component definition, you can share and reuse the component by loading it from its corresponding YAML file definition:

    hello_world_op = components.load_component_from_file('./hello_world.yaml')

You can also use the `load_component_from_url` method, if your component YAML file is stored online, such as in a git repository.

In [None]:
from kfp import components

PIPELINE_ROOT = "{}/pipeline_root/hello_world-v2".format(BUCKET_URI)

hello_world_op = components.load_component_from_file("./hello_world.yaml")


@dsl.pipeline(
    name="hello-world-v2",
    description="A simple intro pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline(text: str = "hi there"):
    hello_world_task = hello_world_op(text)
    return hello_world_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="hello_world-v2.json")

pipeline = aiplatform.PipelineJob(
    display_name="hello_world-v2",
    template_path="hello_world-v2.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

! rm hello_world-v2.json hello_world.yaml

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### Loading components and pipeline YAML definitions from source control

By storing the component and pipeline definitions in a source repository, like GitHub, you can version control your components and pipelines, as follows:

- Use the method `load_component_from_url()`.

- Pull the raw file format version from the repo. For GitHub, that will be in the form of:

    https://raw.githubusercontent.com/....

- Specify the version of the component/pipeline. For GitHub, that will be the branch.

In [None]:
VERSION = "main"
hello_world_op = components.load_component_from_url(
    f"https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/{VERSION}/notebooks/community/ml_ops/stage3/src/hello_world.yaml"
)

! wget https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/{VERSION}/notebooks/community/ml_ops/stage3/src/hello_world.json -O hello_git_example.json

pipeline = aiplatform.PipelineJob(
    display_name="hello_world-git",
    template_path="hello_git_example.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

! rm -f hello_git_example.json
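
The raw-file URL used above follows a regular pattern, which can be sketched as a small helper. The function name `raw_github_url` is hypothetical; the owner, repository, and path are the ones used in the cell above.

```python
def raw_github_url(owner: str, repo: str, version: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file at a given version."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{version}/{path}"


url = raw_github_url(
    owner="GoogleCloudPlatform",
    repo="vertex-ai-samples",
    version="main",
    path="notebooks/community/ml_ops/stage3/src/hello_world.yaml",
)
print(url)
```

Keeping the version as a separate argument makes it easy to switch from a moving branch such as `main` to a fixed tag or commit, if you want a run to keep using the same component definition.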

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### Package dependencies

Each component is assembled and executed within its own container. If a component has a dependency on one or more Python packages, you specify installing the packages with the parameter `packages_to_install`.

In [None]:
@component(packages_to_install=["numpy"])
def numpy_mean(values: list) -> float:
    import numpy as np

    return np.mean(values)


PIPELINE_ROOT = "{}/pipeline_root/numpy_mean".format(BUCKET_URI)


@dsl.pipeline(
    name="numpy", description="A simple intro pipeline", pipeline_root=PIPELINE_ROOT
)
def pipeline(values: list = [2, 3]):
    numpy_task = numpy_mean(values)
    return numpy_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="numpy_mean.json")

pipeline = aiplatform.PipelineJob(
    display_name="numpy_mean",
    template_path="numpy_mean.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "numpy-mean")

! rm numpy_mean.json

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

## Sequential tasks in pipeline

Next, you design and execute a pipeline with sequential tasks. In this example, the first task adds two integers and the second task divides the result (output) of the add task by 2.

*Note:* The output from the add task is referenced by the property `output`.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/add_div2".format(BUCKET_URI)


@component(output_component_file="add.yaml", base_image="python:3.9")
def add(v1: int, v2: int) -> int:
    return v1 + v2


@component(output_component_file="div2.yaml", base_image="python:3.9")
def div_by_2(v: int) -> int:
    return v // 2


@dsl.pipeline(
    name="add-div2", description="A simple intro pipeline", pipeline_root=PIPELINE_ROOT
)
def pipeline(v1: int = 4, v2: int = 5):
    add_task = add(v1, v2)
    div2_task = div_by_2(add_task.output)
    return div2_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="add_div2.json")

pipeline = aiplatform.PipelineJob(
    display_name="add_div2",
    template_path="add_div2.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "div-by-2")

! rm add.yaml div2.yaml add_div2.json
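
Because the component bodies are plain Python, you can check what the pipeline should produce by calling the same logic locally. This sketch copies the two function bodies without the `@component` decorator; it does not run anything on Vertex AI.

```python
def add(v1: int, v2: int) -> int:
    return v1 + v2


def div_by_2(v: int) -> int:
    # Floor division, so an odd input is rounded down.
    return v // 2


# Same default inputs as the pipeline: v1=4, v2=5.
result = div_by_2(add(4, 5))
print(result)  # 4
```

With the default inputs, the sum is 9 and the floor division yields 4, which is the value you should expect in the `div-by-2` task output.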

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### Multiple output pipeline

Next, you design and execute a pipeline where the first component has multiple outputs, which are then used as inputs to the next component. To distinguish between the outputs when they are used as inputs to the next component, do the following:

1. Set the function return type to `NamedTuple`.
2. In the `NamedTuple`, specify a name and type for each output, in the specified order.
3. In the subsequent component, refer to the named output when using it as input.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/multi_output".format(BUCKET_URI)


@component()
def multi_output(
    text1: str, text2: str
) -> NamedTuple(
    "Outputs",
    [
        ("output_1", str),  # Return parameters
        ("output_2", str),
    ],
):
    output_1 = text1 + " "
    output_2 = text2
    return (output_1, output_2)


@component()
def concat(text1: str, text2: str) -> str:
    return text1 + text2


@dsl.pipeline(
    name="multi-output",
    description="A simple intro pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline(text1: str = "hello", text2: str = "world"):
    multi_output_task = multi_output(text1, text2)
    concat_task = concat(
        multi_output_task.outputs["output_1"],
        multi_output_task.outputs["output_2"],
    )
    return concat_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="multi_output.json")

pipeline = aiplatform.PipelineJob(
    display_name="multi-output",
    template_path="multi_output.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "concat")

! rm multi_output.json
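
If the functional `NamedTuple` syntax is unfamiliar, the following plain-Python sketch shows how the named fields behave outside of a pipeline. It mirrors the `multi_output` component above without the `@component` decorator; inside a pipeline you read the values through `task.outputs["output_1"]` instead of attribute access.

```python
from typing import NamedTuple

# Same field names and types as the component's return annotation.
Outputs = NamedTuple("Outputs", [("output_1", str), ("output_2", str)])


def multi_output(text1: str, text2: str) -> Outputs:
    # Values are matched to field names by position.
    return Outputs(text1 + " ", text2)


outputs = multi_output("hello", "world")
combined = outputs.output_1 + outputs.output_2
print(combined)  # hello world
```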

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

## Parallel tasks in pipeline

Next, you design and execute a pipeline with parallel tasks. In this example, one parallel task adds up a list of integers and another subtracts them. Note that the compiler knows these two tasks can be run in parallel, because neither task's input depends on the other task's output.

Finally, the `add_int` task waits on the two parallel tasks to complete, and then adds together the two outputs.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/parallel".format(BUCKET_URI)


@component()
def add_list(values: list) -> int:
    ret = 0
    for value in values:
        ret = value + ret
    return ret


@component()
def sub_list(values: list) -> int:
    ret = 0
    for value in values:
        ret = ret - value
    return ret


@component()
def add_int(value1: int, value2: int) -> int:
    return value1 + value2


@dsl.pipeline(
    name="parallel", description="A simple intro pipeline", pipeline_root=PIPELINE_ROOT
)
def pipeline(values: list = [1, 2, 3]):
    add_list_task = add_list(values)
    sub_list_task = sub_list(values)
    add_task = add_int(add_list_task.output, sub_list_task.output)
    return add_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="parallel.json")

pipeline = aiplatform.PipelineJob(
    display_name="parallel",
    template_path="parallel.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "add-int")

! rm parallel.json
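
As a local analogy for this dependency structure (not a description of how Vertex AI schedules tasks), the sketch below runs two independent functions concurrently with a thread pool and combines their results once both finish. The functions `sum_values` and `largest_value` are hypothetical stand-ins for the two parallel components.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_values(values):
    return sum(values)


def largest_value(values):
    return max(values)


def add_int(value1, value2):
    return value1 + value2


values = [1, 2, 3]
with ThreadPoolExecutor(max_workers=2) as executor:
    # Neither call needs the other's result, so both can start immediately.
    first = executor.submit(sum_values, values)
    second = executor.submit(largest_value, values)
    # result() blocks, so add_int runs only after both tasks are done.
    total = add_int(first.result(), second.result())

print(total)  # 9
```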

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

## Control flow in pipeline

While Python control statements (e.g., if/else, for) can be used in a component, they cannot be used in a pipeline function. Each task in a pipeline function runs as a node in a graph. Thus a control flow statement also has to run as a graph node. To support this, KFP provides a set of DSL statements that implement control flow as a graph node.

### dsl.ParallelFor

The statement `dsl.ParallelFor()` implements a `for` loop, where each iteration in the `for` loop runs in parallel.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/parallel_for".format(BUCKET_URI)


@component()
def double(val: int) -> int:
    return val * 2


@component()
def echo(val: int) -> int:
    return val


@dsl.pipeline(
    name="parallel-for",
    description="A simple intro pipeline",
    pipeline_root=PIPELINE_ROOT,
)
def pipeline(values: list = [1, 2, 3]):
    with dsl.ParallelFor(values) as item:
        output = double(item).output
        echo_task = echo(output)
    return echo_task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="parallel_for.json")

pipeline = aiplatform.PipelineJob(
    display_name="parallel-for",
    template_path="parallel_for.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "echo")

! rm parallel_for.json
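
To see what the loop computes for the default input, here is a plain-Python analogy that applies the same loop body to each item using a thread pool. It is only a local illustration of "each iteration is independent"; the helper `loop_body` is hypothetical and none of this runs on Vertex AI.

```python
from concurrent.futures import ThreadPoolExecutor


def double(val: int) -> int:
    return val * 2


def echo(val: int) -> int:
    return val


def loop_body(item: int) -> int:
    # Same chain as inside the dsl.ParallelFor block: double, then echo.
    return echo(double(item))


values = [1, 2, 3]
with ThreadPoolExecutor() as executor:
    # map() may run iterations concurrently but returns results in input order.
    results = list(executor.map(loop_body, values))

print(results)  # [2, 4, 6]
```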

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### dsl.Condition

The statement `dsl.Condition()` implements an `if` statement. There is no support for an `else` or `elif` statement. You use a separate `dsl.Condition()` for each value you want to test for. For example, if the output from a task is `1` or `0`, you will have two `dsl.Condition()` statements, one for 1 and one for 0.

The condition in `dsl.Condition()` is evaluated at run-time, not compile time. As such it is not Python code anymore.  The condition is of type `ConditionOperator`. This operator has three parts:

1. PipelineParam or task output
2. == or !=
3. string or integer value

A `dsl.Condition()` can be named using the `name` parameter while defining the condition.

In [None]:
PIPELINE_ROOT = "{}/pipeline_root/condition".format(BUCKET_URI)


@component()
def flip() -> int:
    import random

    return random.randint(0, 1)


@component()
def heads() -> bool:
    print("heads")
    return True


@component()
def tails() -> bool:
    print("tails")
    return False


@dsl.pipeline(
    name="condition", description="A simple intro pipeline", pipeline_root=PIPELINE_ROOT
)
def pipeline():
    flip_task = flip()
    with dsl.Condition(flip_task.output == 1, name="true_clause"):
        task = heads()
    with dsl.Condition(flip_task.output == 0, name="false_clause"):
        task = tails()
    return task


compiler.Compiler().compile(pipeline_func=pipeline, package_path="condition.json")

pipeline = aiplatform.PipelineJob(
    display_name="condition",
    template_path="condition.json",
    pipeline_root=PIPELINE_ROOT,
)

pipeline.run()

print_pipeline_output(pipeline, "flip")

! rm condition.json
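
The two `dsl.Condition()` blocks behave like two independent `if` statements rather than an `if`/`else` pair. The ordinary Python sketch below makes that explicit; the function `run_branches` is hypothetical and only models which branch would execute for a given flip value.

```python
def run_branches(flip_value: int) -> list:
    """Return the names of the branches that would run for a flip value."""
    executed = []
    # Each condition is tested on its own; there is no else clause.
    if flip_value == 1:
        executed.append("heads")
    if flip_value == 0:
        executed.append("tails")
    return executed


print(run_branches(1))  # ['heads']
print(run_branches(0))  # ['tails']
print(run_branches(2))  # []
```

The last call shows the consequence of having no `else`: a value that matches neither condition runs no branch at all.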

### Delete a pipeline job

After a pipeline job is completed, you can delete the pipeline job with the `delete()` method. Prior to completion, a pipeline job can be canceled with the method `cancel()`.

In [None]:
pipeline.delete()

### Caching in pipeline components

When running a pipeline with Vertex AI Pipelines, the outcome state of each task is cached. With caching, if the pipeline is run again, and the compiled definition of the task and state has not changed, the cached output will be used instead of running the task again.

To override caching, i.e., force run the task, you set the parameter `enable_caching` to `False` when creating the Vertex AI Pipeline job.

```
pipeline = aiplatform.PipelineJob(
    display_name="example",
    template_path="example.json",
    pipeline_root=PIPELINE_ROOT,
    enable_caching=False
)
```
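
The idea behind caching can be sketched as a small conceptual model. This is not how Vertex AI Pipelines implements caching: the key scheme (a hash of a definition string plus the inputs) and the names `cache_key` and `run_task` are assumptions made purely for illustration.

```python
import hashlib
import json

_cache = {}  # maps cache key -> previously computed output


def cache_key(component_definition: str, inputs: dict) -> str:
    """Hash the task definition together with its inputs (assumed key scheme)."""
    payload = json.dumps(
        {"definition": component_definition, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(component_definition, inputs, func, enable_caching=True):
    """Run func(**inputs), reusing a cached output when caching is enabled."""
    key = cache_key(component_definition, inputs)
    if enable_caching and key in _cache:
        return _cache[key]
    result = func(**inputs)
    _cache[key] = result
    return result


executions = []


def add(v1, v2):
    executions.append((v1, v2))  # record each real execution
    return v1 + v2


run_task("add-v1", {"v1": 4, "v2": 5}, add)  # executes
run_task("add-v1", {"v1": 4, "v2": 5}, add)  # served from the cache
run_task("add-v1", {"v1": 4, "v2": 5}, add, enable_caching=False)  # forced run
print(len(executions))  # 2
```

In this model, changing either the definition string or the inputs produces a different key, so the task runs again.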

### Asynchronous execution of pipeline

When running a pipeline with the method `run()`, the pipeline is run synchronously. To run asynchronously, you use the method `submit()`. Once the job has started, your Python script can continue to execute. To block execution, you can use the method `wait()`.

### Setting machine resources for pipeline steps

By default, Vertex AI Pipelines automatically finds the best matching machine type to run the component. You can override and specify the machine resources on a per component basis, when you invoke the component in a pipeline, as follows:

```
@dsl.pipeline(name='my-pipeline')
def pipeline():
    # Chain the resource setters on the task returned by the component call.
    task = (
        taskOp()
        .set_cpu_limit('CPU_LIMIT')
        .set_memory_limit('MEMORY_LIMIT')
        .add_node_selector_constraint(SELECTOR_CONSTRAINT)
        .set_gpu_limit(GPU_LIMIT)
    )
```

Learn more about [Specifying machine types in pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/machine-types).

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

### Cloud Storage Bucket

Set `delete_bucket` to `True` to delete the Cloud Storage bucket used in this notebook.

In [None]:
delete_bucket = False

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil rm -r $BUCKET_URI