# Vertex Pipelines: Qwik Start

## Vertex Pipelines settings

There are a few additional libraries you'll need to install in order to use Vertex Pipelines:

- `Kubeflow Pipelines`: This is the SDK used to build the pipeline. Vertex Pipelines supports running pipelines built with either Kubeflow Pipelines or TFX.
- `Google Cloud Pipeline Components`: This library provides pre-built components that make it easier to interact with Vertex AI services from your pipeline steps.

### Step 1: Create Python notebook and install libraries

From the Launcher menu in your Notebook instance, create a notebook by selecting Python 3:

<img src="img/GCP_python.png">

You can access the Launcher menu by clicking on the + sign in the top left of your notebook instance.

To install both libraries needed for this lab, first set the user flag in a notebook cell:

In [1]:
USER_FLAG = "--user"

Then run the following from your notebook:

In [2]:
!pip3 install {USER_FLAG} google-cloud-aiplatform==1.0.0 --upgrade
!pip3 install {USER_FLAG} kfp google-cloud-pipeline-components==0.1.1 --upgrade

Collecting google-cloud-aiplatform==1.0.0
  Downloading google_cloud_aiplatform-1.0.0-py2.py3-none-any.whl (1.8 MB)
Collecting google-cloud-storage<2.0.0dev,>=1.32.0 (from google-cloud-aiplatform==1.0.0)
  Downloading google_cloud_storage-1.44.0-py2.py3-none-any.whl (106 kB)
Collecting google-cloud-bigquery<3.0.0dev,>=1.15.0 (from google-cloud-aiplatform==1.0.0)
  Downloading google_cloud_bigquery-2.34.4-py2.py3-none-any.whl (206 kB)
Collecting packaging>=14.3 (from google-cloud-aiplatform==1.0.0)
  Downloading packaging-21.3-py3-none-any.whl (40 kB)

After installing these packages you'll need to restart the kernel:

In [1]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart the kernel after installation
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Finally, check that you have correctly installed the packages. The KFP SDK version should be >=1.6:

In [55]:
!python3 -c "import kfp; print('KFP SDK version: {}'.format(kfp.__version__))"
!python3 -c "import google_cloud_pipeline_components; print('google_cloud_pipeline_components version: {}'.format(google_cloud_pipeline_components.__version__))"

KFP SDK version: 1.8.22
google_cloud_pipeline_components version: 0.1.1
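The `>=1.6` requirement can also be checked in code rather than by eye. The sketch below is a minimal illustration, not part of the KFP SDK: `version_at_least` is a hypothetical helper, and it assumes purely numeric dotted versions such as `1.8.22` (no pre-release suffixes).

```python
def version_at_least(installed: str, minimum: str, width: int = 4) -> bool:
    """Return True if `installed` is numerically >= `minimum`.

    Assumes purely numeric dotted versions with at most `width` parts.
    """
    def as_tuple(version: str) -> tuple:
        parts = [int(part) for part in version.split(".")]
        # Pad with zeros so "1.6" and "1.6.0" compare as equal.
        return tuple(parts + [0] * (width - len(parts)))

    return as_tuple(installed) >= as_tuple(minimum)


print(version_at_least("1.8.22", "1.6"))
print(version_at_least("1.10.0", "1.6"))
```

Comparing the raw strings would get the second case wrong, because `"1.10.0" >= "1.6"` is `False` as a string comparison.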


### Step 2: Set your project ID and bucket

Throughout this lab you'll reference your Cloud project ID and the bucket you created earlier. Create a variable for each:

In [1]:
PROJECT_ID = "ccbd-ecbdp-bds"
BUCKET_NAME = "gs://ccbd-bds-aa-kfp-test"

### Step 3: Import libraries

Add the following to import the libraries you'll be using throughout this lab:

In [2]:
from typing import NamedTuple

import kfp
from kfp import dsl
from kfp.v2 import compiler
from kfp.v2.dsl import (Artifact, Dataset, Input, InputPath, Model, Output,
                        OutputPath, ClassificationMetrics, Metrics, component)
from kfp.v2.google.client import AIPlatformClient

from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip

### Step 4: Define constants

The last thing you need to do before building the pipeline is define some constant variables. `PIPELINE_ROOT` is the Cloud Storage path where the artifacts created by your pipeline will be written. You're using `asia-northeast1` as the region here:

In [3]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin

REGION="asia-northeast1"
PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
print(PIPELINE_ROOT)

env: PATH=/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/jupyter/.local/bin
gs://ccbd-bds-aa-kfp-test/pipeline_root/


After running the code above, you should see the root directory for your pipeline printed. This is the Cloud Storage location where the artifacts from your pipeline will be written. It will be in the format of `gs://<bucket_name>/pipeline_root/`.
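The `gs://<bucket_name>/pipeline_root/` format is easy to break with a missing prefix or a doubled slash. The following is a minimal sketch of a normalizing helper; `build_pipeline_root` is a hypothetical name introduced for illustration, not something provided by the KFP SDK or Vertex AI.

```python
def build_pipeline_root(bucket: str, folder: str = "pipeline_root") -> str:
    """Join a bucket name (or gs:// URI) and a folder into a gs:// path ending in '/'."""
    # Accept both "my-bucket" and "gs://my-bucket/".
    name = bucket[len("gs://"):] if bucket.startswith("gs://") else bucket
    name = name.strip("/")
    if not name:
        raise ValueError("bucket name must not be empty")
    return f"gs://{name}/{folder.strip('/')}/"


print(build_pipeline_root("gs://ccbd-bds-aa-kfp-test"))
print(build_pipeline_root("ccbd-bds-aa-kfp-test/"))
```

Both calls print the same path as the notebook cell above, regardless of how the bucket was written.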

## Creating your first pipeline

- Create a short pipeline using the KFP SDK. This pipeline doesn't do anything ML related (don't worry, you'll get there!). The exercise teaches you:
    - How to create custom components in the KFP SDK
    - How to run and monitor a pipeline in Vertex Pipelines

- You'll create a pipeline that prints out a sentence using two outputs: a product name and an emoji description. This pipeline will consist of three components:

    - product_name: This component will take a product name as input, and return that string as output.
    - emoji: This component will take the text description of an emoji and convert it to an emoji. For example, the text code for ✨ is "sparkles". This component uses an emoji library to show you how to manage external dependencies in your pipeline.
    - build_sentence: This final component will consume the output of the previous two to build a sentence that uses the emoji. For example, the resulting output might be "Vertex Pipelines is ✨".

### Step 1: Create a Python function based component

Using the KFP SDK, you can create components based on Python functions. First build the `product_name` component, which simply takes a string as input and returns that string.

- Add the following to your notebook:

In [4]:
@component(base_image="python:3.9", output_component_file="first-component.yaml")
def product_name(text: str) -> str:
    return text

Take a closer look at the syntax here:

- The `@component` decorator compiles this function to a component when the pipeline is run. You'll use this anytime you write a custom component.
- The `base_image` parameter specifies the container image this component will use.
- The `output_component_file` parameter is optional, and specifies the yaml file to write the compiled component to. After running the cell you should see that file written to your notebook instance. If you wanted to share this component with someone, you could send them the generated yaml file and have them load it with the following:

In [5]:
product_name_component = kfp.components.load_component_from_file('./first-component.yaml')

The `-> str` after the function definition specifies the output type for this component.
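To see why the annotations matter, it can help to look at what Python itself exposes about a function's signature. The sketch below is a conceptual illustration only: `describe_interface` is a hypothetical helper, and this is not how the `@component` decorator is implemented. It simply shows that input and output types can be read from the annotations of an undecorated copy of `product_name`.

```python
from typing import get_type_hints


def describe_interface(func) -> dict:
    """Summarize a function's annotated inputs and return type."""
    hints = get_type_hints(func)
    # The return annotation is stored under the special key "return".
    output = hints.pop("return", None)
    return {
        "name": func.__name__,
        "inputs": {name: hint.__name__ for name, hint in hints.items()},
        "output": output.__name__ if output is not None else None,
    }


def product_name(text: str) -> str:
    return text


print(describe_interface(product_name))
```

Without the `text: str` and `-> str` annotations, there would be no type information here for a tool to pick up.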

### Step 2: Create two additional components

1. To complete the pipeline, create two more components. The first one takes a string as input, and converts this string to its corresponding emoji if there is one. It returns a tuple with the input text passed, and the resulting emoji:

In [6]:
@component(base_image="python:3.9", output_component_file="second-component.yaml", packages_to_install=["emoji"])
def emoji(
    text: str,
) -> NamedTuple(
    "Outputs",
    [
        ("emoji_text", str),  # Return parameters
        ("emoji", str),
    ],
):
    import emoji

    emoji_text = text
    emoji_str = emoji.emojize(':' + emoji_text + ':', language='alias')
    print("output one: {}; output_two: {}".format(emoji_text, emoji_str))
    return (emoji_text, emoji_str)

This component is a bit more complex than the previous one. Here's what's new:

- The `packages_to_install` parameter tells the component any external library dependencies for this container. In this case, you're using a library called emoji.
- This component returns a `NamedTuple` called `Outputs`. Notice that each string in this tuple has a key: `emoji_text` and `emoji`. You'll use these keys to access the outputs in later steps.
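The named-output idea can be tried in plain Python, outside any pipeline. This is a minimal sketch under stated assumptions: `ALIASES` is a hypothetical one-entry lookup table standing in for the emoji library, and `emoji_lookup` is an illustrative function, not the component defined above.

```python
from typing import NamedTuple

# Hypothetical stand-in for the emoji library, so the example has no dependencies.
ALIASES = {"sparkles": "✨"}

# Same functional NamedTuple syntax as in the component's return annotation.
Outputs = NamedTuple("Outputs", [("emoji_text", str), ("emoji", str)])


def emoji_lookup(text: str) -> Outputs:
    """Return the input text and its emoji (empty string if unknown)."""
    return Outputs(emoji_text=text, emoji=ALIASES.get(text, ""))


result = emoji_lookup("sparkles")
print(result._fields)
print(result.emoji_text, result.emoji)
```

The field names printed by `_fields` are the same keys the pipeline later uses to select an individual output.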

2. The final component in this pipeline will consume the output of the first two and combine them to return a string:

In [7]:
@component(base_image="python:3.9", output_component_file="third-component.yaml")
def build_sentence(
    product: str,
    emoji: str,
    emojitext: str
) -> str:
    print("We completed the pipeline, hooray!")
    end_str = product + " is "
    if len(emoji) > 0:
        end_str += emoji
    else:
        end_str += emojitext
    return end_str
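Because the body of a function-based component is ordinary Python, its logic can be exercised locally before it ever runs in a container. A minimal sketch, assuming an undecorated copy of the function with the `print` call omitted:

```python
def build_sentence(product: str, emoji: str, emojitext: str) -> str:
    """Undecorated copy of the component body, for a quick local check."""
    end_str = product + " is "
    # Prefer the emoji; fall back to its text code when the emoji is empty.
    end_str += emoji if len(emoji) > 0 else emojitext
    return end_str


print(build_sentence("Vertex Pipelines", "✨", "sparkles"))
print(build_sentence("Vertex Pipelines", "", "sparkles"))
```

The second call shows the fallback branch: with an empty `emoji` argument, the sentence ends with the text code instead.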

You might be wondering: how does this component know to use the output from the previous steps you defined?

Good question! You will tie it all together in the next step.

### Step 3: Putting the components together into a pipeline

The component definitions above create factory functions that you can use in a pipeline definition to create steps.

1. To set up a pipeline, use the `@dsl.pipeline` decorator, give the pipeline a name and description, and provide the root path where your pipeline's artifacts should be written. Artifacts are any output files generated by your pipeline. This intro pipeline doesn't generate any, but your next pipeline will.

2. In the next block of code you define an `intro_pipeline` function. This is where you specify the inputs to your initial pipeline steps, and how steps connect to each other:

- product_task takes a product name as input. Here you're passing "Vertex Pipelines" but you can change this to whatever you'd like.
- emoji_task takes the text code for an emoji as input. You can also change this to whatever you'd like. For example, "partying_face" refers to the 🥳 emoji. Because neither this step nor product_task has an upstream step feeding it input, you specify their inputs manually when you define your pipeline.
- The last step in the pipeline, consumer_task, has three input parameters:
    - The output of product_task. Since this step only produces one output, you can reference it via product_task.output.
    - The emoji output of the emoji_task step. See the emoji component defined above where you named the output parameters.
    - Similarly, the emoji_text named output from the emoji component. In case your pipeline is passed text that doesn't correspond with an emoji, it'll use this text to construct a sentence.

In [8]:
@dsl.pipeline(
    name="hello-world",
    description="An intro pipeline",
    pipeline_root=PIPELINE_ROOT,
)
# Change the `text` and `emoji_str` parameters here to change the pipeline's output
def intro_pipeline(text: str = "Vertex Pipelines", emoji_str: str = "sparkles"):
    product_task = product_name(text)
    emoji_task = emoji(emoji_str)
    consumer_task = build_sentence(
        product_task.output,
        emoji_task.outputs["emoji"],
        emoji_task.outputs["emoji_text"],
    )
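The difference between `.output` and `.outputs[...]` can be mimicked without the SDK. The sketch below is purely illustrative: `FakeTask` is a hypothetical class, not a KFP type, and the key `"result"` for the single output is an arbitrary placeholder. It models only the rule described above, that `.output` is a shortcut for steps with exactly one output.

```python
from typing import Any, Dict


class FakeTask:
    """Hypothetical stand-in for a pipeline task; not a KFP class."""

    def __init__(self, outputs: Dict[str, Any]):
        self.outputs = outputs

    @property
    def output(self) -> Any:
        # The shortcut is only unambiguous when there is a single output.
        if len(self.outputs) != 1:
            raise AttributeError("use .outputs[name] when a step has several outputs")
        return next(iter(self.outputs.values()))


product_task = FakeTask({"result": "Vertex Pipelines"})
emoji_task = FakeTask({"emoji_text": "sparkles", "emoji": "✨"})

chosen = emoji_task.outputs["emoji"] or emoji_task.outputs["emoji_text"]
sentence = f"{product_task.output} is {chosen}"
print(sentence)
```

In the real pipeline these values are resolved at run time, once the upstream steps have finished; the stand-in only shows how the references are spelled.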

### Step 4: Compile and run the pipeline

1. With your pipeline defined, you're ready to compile it. The following will generate a JSON file that you'll use to run the pipeline:

In [9]:
compiler.Compiler().compile(
    pipeline_func=intro_pipeline, package_path="intro_pipeline_job.json"
)



2. Next, instantiate an API client:

In [10]:
api_client = AIPlatformClient(
    project_id=PROJECT_ID,
    region=REGION,
)



3. Finally, run the pipeline:

In [11]:
response = api_client.create_run_from_job_spec(
    job_spec_path="intro_pipeline_job.json",
    # pipeline_root=PIPELINE_ROOT  # Required if PIPELINE_ROOT was not set as part of the pipeline definition
)

INFO:googleapiclient.discovery:URL being requested: POST https://asia-northeast1-aiplatform.googleapis.com/v1beta1/projects/ccbd-ecbdp-bds/locations/asia-northeast1/pipelineJobs?pipelineJobId=hello-world-20240116052534&alt=json
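The logged request URL carries the job ID as a `pipelineJobId` query parameter, which in this run appears to be the pipeline name followed by a timestamp. If you want that ID in a variable, for example to find the run later, it can be pulled out with the standard library. A minimal sketch; `extract_job_id` is a hypothetical helper, and the URL is copied from the log line above.

```python
from urllib.parse import parse_qs, urlparse


def extract_job_id(request_url: str) -> str:
    """Return the value of the pipelineJobId query parameter."""
    query = parse_qs(urlparse(request_url).query)
    return query["pipelineJobId"][0]


url = (
    "https://asia-northeast1-aiplatform.googleapis.com/v1beta1/projects/"
    "ccbd-ecbdp-bds/locations/asia-northeast1/pipelineJobs"
    "?pipelineJobId=hello-world-20240116052534&alt=json"
)
print(extract_job_id(url))
```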


Running the pipeline should generate a link to view the pipeline run in your console. It should look like this when complete:

<img src="img/GCP_pipeline.png">

4. This pipeline will take 5-6 minutes to run. When complete, you can click on the `build-sentence` component to see the final output:

<img src="img/GCP_pipelineinfo.png">

Now that you're familiar with how the KFP SDK and Vertex Pipelines work, you're ready to build a pipeline that creates and deploys an ML model using other Vertex AI services.