<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FPipelines%2FVertex%2520AI%2520Pipelines%2520-%2520Components.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Components.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
This is part of a [series of notebook-based workflows](./readme.md) that teach all the ways to use pipelines within Vertex AI. The suggested order, with a description of each workflow, is:

|Link To Section|Notebook Workflow|Description|
|---|---|---|
||[Vertex AI Pipelines - Start Here](./Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb)|What are pipelines? Start here to go from code to pipeline and see it in action.|
||[Vertex AI Pipelines - Introduction](./Vertex%20AI%20Pipelines%20-%20Introduction.ipynb)|Introduction to pipelines with the console and Vertex AI SDK|
|_**This Notebook**_|[Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)|An introduction to all the ways to create pipeline components from your code|
||[Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb)|An overview of all the types of inputs and outputs for pipeline components|
||[Vertex AI Pipelines - Control](./Vertex%20AI%20Pipelines%20-%20Control.ipynb)|An overview of controlling the flow of execution for pipelines|
||[Vertex AI Pipelines - Secret Manager](./Vertex%20AI%20Pipelines%20-%20Secret%20Manager.ipynb)|How to pass sensitive information to pipelines and components|
||[Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)|How to schedule pipeline execution|
||[Vertex AI Pipelines - Notifications](./Vertex%20AI%20Pipelines%20-%20Notifications.ipynb)|How to send email notification of pipeline status.|
||[Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb)|Managing, Reusing, and Storing pipelines and components|
||[Vertex AI Pipelines - Testing](./Vertex%20AI%20Pipelines%20-%20Testing.ipynb)|Strategies for testing components and pipelines locally and remotely to aid development.|
||[Vertex AI Pipelines - Managing Pipeline Jobs](./Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb)|Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs.|


To discover these notebooks as part of an introduction to MLOps orchestration [start here](./readme.md).  To read more about MLOps also check out [the parent folder](../readme.md).

---

# Vertex AI Pipelines - Components 

[Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) is a serverless runner for Kubeflow Pipelines [(KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) and the [TensorFlow Extended (TFX)](https://www.tensorflow.org/tfx/guide/understanding_tfx_pipelines) framework.

Components run the steps of a pipeline.  A pipeline task runs a component with inputs and produces the component's outputs.  Components execute code on compute using a container image.

This notebook will focus on the different types of component construction:
- [Pre-built Google Cloud Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/components-introduction)
- [Custom KFP Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/)
    - Python Components:
        - Lightweight Python Components
        - Containerized Python Components
    - Arbitrary Containers:
        - Container Components
    - Importer Components
        - A provided importer for artifacts created before the pipeline runs

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names.  If the import name is not found, the install name is used to install the package quietly for the current user.

In [3]:
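# tuples of (import name, install name) with an optional third element: min_version
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.artifactregistry_v1', 'google-cloud-artifact-registry'),
    ('kfp', 'kfp'),
    ('google_cloud_pipeline_components', 'google-cloud-pipeline-components'),
    ('docker', 'docker')
]

# import the submodules explicitly; `import importlib` alone does not guarantee they are loaded
import importlib.util
import importlib.metadata

def version_key(version):
    # compare numeric parts so that '2.10.0' sorts after '2.7.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    if not importlib.util.find_spec(package[0]):
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by the install (distribution) name
        if version_key(importlib.metadata.version(package[1])) < version_key(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user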

### API Enablement

In [4]:
!gcloud services enable aiplatform.googleapis.com
!gcloud services enable artifactregistry.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart the code submission can start with the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [7]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-components'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [8]:
import os
import time
import importlib
from google.cloud import aiplatform
from google.cloud import storage
from google.cloud import artifactregistry_v1
import kfp
from typing import NamedTuple
import docker

Clients

In [9]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

# gcs storage client
gcs = storage.Client(project = PROJECT_ID)

# artifact registry client
ar_client = artifactregistry_v1.ArtifactRegistryClient()

Docker Check:

In [10]:
docker_client = docker.from_env()

if docker_client.ping():
    print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
else:
    print('Docker is either not installed or not running - please fix before proceeding.')

Docker is installed and running. Version: 20.10.17


parameters:

In [11]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [12]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:

In [13]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Setup Artifact Registry

[Artifact Registry](https://cloud.google.com/artifact-registry/docs) organizes artifacts with repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).  There is even a registry type specifically for [Kubeflow pipeline templates](https://cloud.google.com/artifact-registry/docs/kfp?hl=en).

### List Repositories

This may be empty if no repositories have been created for this project

In [14]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


### Create/Retrieve Docker Image Repository

Create an Artifact Registry repository to hold the Docker images created by this notebook.  First, check whether it was already created by a previous run and retrieve it if it exists.  Otherwise, create one named for this project.

In [15]:
docker_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}' == repo.name.split('/')[-1]:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {SERIES} series that holds docker images.',
                name = f'{PROJECT_ID}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915


In [16]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915',
 'DOCKER')

In [17]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"

In [18]:
REPOSITORY

'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915'
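
The `REPOSITORY` value above follows the pattern `{region}-docker.pkg.dev/{project}/{repository}`. As a minimal sketch of that mapping, the helper below derives the Docker path from a repository resource name. The function name `docker_repository_uri` is hypothetical and not part of any Google Cloud library; it only restates the string handling used in the cell above, with an added format check.

```python
def docker_repository_uri(repo_resource_name: str) -> str:
    """Build the Docker host path for an Artifact Registry repository.

    Expects 'projects/{project}/locations/{region}/repositories/{repo}'.
    """
    parts = repo_resource_name.split('/')
    # even positions hold the fixed keywords, odd positions hold the values
    if len(parts) != 6 or parts[0::2] != ['projects', 'locations', 'repositories']:
        raise ValueError(f'unexpected resource name: {repo_resource_name!r}')
    project, region, repo = parts[1::2]
    return f'{region}-docker.pkg.dev/{project}/{repo}'


# resource name copied from the repository listing earlier in this notebook
uri = docker_repository_uri(
    'projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915'
)
print(uri)
```

With the resource name printed earlier, this reproduces the same value as `REPOSITORY`.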

---
## Components

[KFP Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/) are the runners for pipeline tasks. They run code in a container as a job.  The container image is set with the [`base_image` parameter](https://www.kubeflow.org/docs/components/pipelines/v2/components/) at component creation, as demonstrated throughout this workflow.  It currently defaults to `python:3.7`.


**Compute Resources** For Components:

Running pipelines on Vertex AI Pipelines runs each component as a Vertex AI Training `CustomJob`.  This defaults to a VM based on `e2-standard-4` (4 core CPU, 16GB memory).  This can be modified at the task level of pipelines to choose different computing resources, including GPUs.  For example:

```python
@kfp.dsl.pipeline()
def pipeline():
    # C, M, A, and G are placeholders described below
    task = component().set_cpu_limit(C).set_memory_limit(M).add_node_selector_constraint(A).set_accelerator_limit(G)
```
The inputs define the [machine configuration for the step](https://cloud.google.com/vertex-ai/docs/pipelines/machine-types):
- C = a string representing the number of CPUs (up to 96).
- M = a string representing the memory limit: an integer followed by K, M, or G (up to 624GB).
- A = a string representing the desired GPU or TPU type.
- G = an integer representing the multiple of A desired.
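
Because the CPU and memory limits are passed as strings, a typo only shows up when the pipeline is compiled or run. The sketch below checks the two string formats described in the list above before they are handed to a task. The helper names `check_cpu_limit` and `check_memory_limit` are hypothetical, whole-number CPU counts are assumed, and the K/M/G suffixes are treated as decimal units, which is an assumption made for this illustration rather than a statement about how Vertex AI interprets them.

```python
import re

MAX_CPUS = 96                                      # upper bound quoted in the list above
UNIT_BYTES = {'K': 10**3, 'M': 10**6, 'G': 10**9}  # decimal units (assumption)
MAX_MEMORY_BYTES = 624 * UNIT_BYTES['G']           # "624GB" read as decimal gigabytes (assumption)


def check_cpu_limit(cpu: str) -> str:
    """Return cpu unchanged if it is a whole number of CPUs between 1 and 96."""
    if not re.fullmatch(r'[0-9]+', cpu) or not 1 <= int(cpu) <= MAX_CPUS:
        raise ValueError(f'invalid CPU limit: {cpu!r}')
    return cpu


def check_memory_limit(memory: str) -> str:
    """Return memory unchanged if it is an integer followed by K, M, or G and within the bound."""
    match = re.fullmatch(r'([0-9]+)([KMG])', memory)
    if match is None:
        raise ValueError(f'invalid memory limit: {memory!r}')
    if int(match.group(1)) * UNIT_BYTES[match.group(2)] > MAX_MEMORY_BYTES:
        raise ValueError(f'memory limit too large: {memory!r}')
    return memory


print(check_cpu_limit('4'), check_memory_limit('16G'))
```

The validated strings could then be passed to `set_cpu_limit` and `set_memory_limit` on a task.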

In [19]:
from google_cloud_pipeline_components.v1.model import ModelGetOp
from google_cloud_pipeline_components.types import artifact_types

### Prebuilt Google Cloud Pipeline Components

Google Cloud provides a growing list of components covering AutoML, Batch Prediction, BigQuery ML, and many more services.

The benefits of prebuilt components include:
- simple debugging
- standardized artifact types that are tracked with Vertex AI ML Metadata, which makes lineage easy
- no need to launch a container that then launches a service, which is more cost effective

These can be reviewed several ways:
- Directly in the documentation: [Google Cloud Pipeline Components List](https://cloud.google.com/vertex-ai/docs/pipelines/gcpc-list)
- At their accompanying [API Reference](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/index.html)
- At their source in the GitHub repository for Kubeflow Pipelines: [GitHub/kubeflow/pipelines/components/google-cloud](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud)

These prebuilt components also include prebuilt artifact types for Google Cloud resources:
- [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/artifact_types.html)

Here, the `ModelGetOp` component is used to retrieve an artifact for a model in the Vertex AI Model Registry.

```python
from google_cloud_pipeline_components.v1.model import ModelGetOp

vertex_model_1 = ModelGetOp(
    model_name = model_name.outputs['model_name'],
    project = project,
    location = region
).set_display_name('Prebuilt Component')
```

### Lightweight Python Components

Lightweight Python components are a simple way to create a component from a Python function.  The container is created at the runtime of the task from the `base_image`, and the `packages_to_install` are installed at that point.

References:
- [Lightweight Python Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/lightweight-python-components/)

Here, a simple function uses the Vertex AI SDK to retrieve a list of all models and passes the versioned resource name of the first one as an output.

In [20]:
@kfp.dsl.component(
    base_image = 'python:3.10',
    packages_to_install = ['google-cloud-aiplatform']
)
def example_lightweight(
    project: str,
    region: str
) -> NamedTuple('lightweight_outputs', model_name = str, model_resource_name = str, uri = str):
    
    # vertex ai client
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # list models in region
    models = aiplatform.Model.list()
    
    outputs = NamedTuple('lightweight_outputs', model_name = str, model_resource_name = str, uri = str)
    
    return outputs(
        models[0].versioned_resource_name.split('/')[-1],
        models[0].versioned_resource_name,
        f"https://{region}-aiplatform.googleapis.com/v1/{models[0].versioned_resource_name}"
    )
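
The three outputs of this component are plain string manipulations of a model's versioned resource name. As a sketch that can be run without any Google Cloud access, the function below reproduces that string handling with a class-based `NamedTuple`. The resource name used in the example is a made-up placeholder, and `outputs_from_resource_name` is a hypothetical helper, not part of the Vertex AI SDK.

```python
from typing import NamedTuple


class LightweightOutputs(NamedTuple):
    model_name: str
    model_resource_name: str
    uri: str


def outputs_from_resource_name(versioned_resource_name: str, region: str) -> LightweightOutputs:
    """Derive the same three values the component returns."""
    return LightweightOutputs(
        # last path segment, for example 'my-model@1'
        model_name=versioned_resource_name.split('/')[-1],
        model_resource_name=versioned_resource_name,
        uri=f'https://{region}-aiplatform.googleapis.com/v1/{versioned_resource_name}',
    )


# placeholder resource name for illustration only
example = outputs_from_resource_name(
    'projects/123/locations/us-central1/models/my-model@1', 'us-central1'
)
print(example.model_name)
print(example.uri)
```

In a pipeline, each field of the returned tuple becomes a named output of the task, which is why later steps can refer to `model_name.outputs['uri']`.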

### Containerized Python Components

Containerized Python components extend the idea of lightweight Python components.  The container for the component is built ahead of time with the `packages_to_install` included, so they are already installed when the task runs.

References:
- [Containerized Python Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/containerized-python-components/)
- [Developer Note](https://github.com/kubeflow/pipelines/issues/9568#issuecomment-1622223720) from a GitHub issue that describes the Containerized Python Components very well.

The container images need to be saved for usage in Google Cloud.  This section makes use of the Artifact Registry docker container repository created/retrieved above.

This example replicates the lightweight Python component above as a containerized Python component.

First, create a local folder to hold the Python source code:

In [21]:
if not os.path.exists(DIR + '/src'):
    os.makedirs(DIR + '/src')

Now, create the Python file(s) in the folder:

In [22]:
%%writefile {DIR}/src/__init__.py
# init file

Overwriting temp/mlops-pipeline-components/src/__init__.py


In [50]:
%%writefile {DIR}/src/my_code.py

def example_function(project, region):

    # vertex ai client
    from google.cloud import aiplatform
    
    # vertex ai initialize SDK
    aiplatform.init(project = project, location = region)
    
    # list models in region
    models = aiplatform.Model.list()
    
    # no trailing commas here: a trailing comma would turn each value into a one-element tuple
    model_name = models[0].versioned_resource_name.split('/')[-1]
    model_resource_name = models[0].versioned_resource_name
    uri = f"https://{region}-aiplatform.googleapis.com/v1/{models[0].versioned_resource_name}"
    
    return [model_name, model_resource_name, uri]

Overwriting temp/mlops-pipeline-components/src/my_code.py


Create the component in the folder also and have it import and use the function:

In [51]:
print(f'{REPOSITORY}/{SERIES}-{EXPERIMENT}')

us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/mlops-pipeline-components


In [104]:
%%writefile {DIR}/src/my_component.py
import kfp
from my_code import example_function
from google_cloud_pipeline_components.types import artifact_types

@kfp.dsl.component(
    base_image = 'python:3.10',
    target_image = 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/mlops-pipeline-components',
    packages_to_install = ['google-cloud-aiplatform', 'google_cloud_pipeline_components']
)
def example_python_container(
    project: str,
    region: str,
    vertex_model: kfp.dsl.Output[artifact_types.VertexModel]
):
    
    response = example_function(project, region)
    vertex_model.uri = response[2]
    vertex_model.metadata['model_resource_name'] = response[1]
    
    return

Overwriting temp/mlops-pipeline-components/src/my_component.py


The source code is created in a structure that now looks like:

```
src/
├── __init__.py
├── my_code.py
└── my_component.py
```

Unlike other component types, this one needs to be built.  Behind the scenes, KFP creates a `Dockerfile` and runs the `docker build` process, then pushes the resulting image to the container repository in Artifact Registry specified by the `target_image`.

Configure authentication to Artifact Registry:

In [105]:
!gcloud auth configure-docker $REGION-docker.pkg.dev


{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


Build the component and push to Artifact Registry:

This uses the KFP CLI command [`kfp component build`](https://kubeflow-pipelines.readthedocs.io/en/stable/source/cli.html#kfp-component-build).

In [106]:
!kfp component build $DIR/src/ --component-filepattern my_component.py --push-image

Building component using KFP package path: kfp==2.7.0
Found 1 component(s) in file /home/jupyter/vertex-ai-mlops/MLOps/temp/mlops-pipeline-components/src/my_component.py:
Example python container: ComponentInfo(name='Example python container', function_name='example_python_container', func=<function example_python_container at 0x7f0cf1095b40>, target_image='us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/mlops-pipeline-components', module_path=PosixPath('/home/jupyter/vertex-ai-mlops/MLOps/temp/mlops-pipeline-components/src/my_component.py'), component_spec=ComponentSpec(name='example-python-container', implementation=Implementation(container=ContainerSpecImplementation(image='us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/mlops-pipeline-components', command=['sh', '-c', '\nif ! [ -x "$(command -v pip)" ]; then\n    python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 pyt

In [107]:
print(f"Review the Custom Container with Artifact Registry in the Google Cloud Console:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{PROJECT_ID}?project={PROJECT_ID}")

Review the Custom Container with Artifact Registry in the Google Cloud Console:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915?project=statmike-mlops-349915


Import the component to use it in the pipeline definition below

>**NOTE:** Re-running this section of the notebook with iterative changes to the functions requires forcing the reload of the function from the file/module.  This is forced here by using the `importlib.reload(my_component)` action.  

In [108]:
%pwd

'/home/jupyter/vertex-ai-mlops/MLOps'

In [109]:
%cd {DIR}/src
import my_component
importlib.reload(my_component)
from my_component import example_python_container
%cd ../../../

/home/jupyter/vertex-ai-mlops/MLOps/temp/mlops-pipeline-components/src
/home/jupyter/vertex-ai-mlops/MLOps


In [110]:
%pwd

'/home/jupyter/vertex-ai-mlops/MLOps'
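
The reload step matters because Python caches imported modules: after the file on disk changes, a second `import` returns the already loaded module. The following self-contained sketch shows this behavior with a throwaway module written to a temporary directory. The module name `reload_demo_module` and the function `import_edit_reload` are hypothetical and exist only for this illustration.

```python
import importlib
import pathlib
import sys
import tempfile


def import_edit_reload():
    """Import a throwaway module, edit its source, then reload it."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / 'reload_demo_module.py'
        source.write_text('GREETING = "first"\n')
        sys.path.insert(0, tmp)
        try:
            import reload_demo_module
            before = reload_demo_module.GREETING

            # edit the file on disk (the new text has a different length)
            source.write_text('GREETING = "second version"\n')

            # a plain import returns the cached module, so the edit is not visible
            import reload_demo_module as cached
            stale = cached.GREETING

            # reload re-executes the edited source in the existing module object
            importlib.reload(reload_demo_module)
            after = reload_demo_module.GREETING
        finally:
            # undo the changes this demo made to the import state
            sys.path.remove(tmp)
            sys.modules.pop('reload_demo_module', None)
    return before, stale, after


before, stale, after = import_edit_reload()
print(before, stale, after)
```

Only the value read after `importlib.reload` reflects the edited file, which is the situation the note above describes for `my_component`.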

### Container Components

Any container can be a component with [Container Components](https://www.kubeflow.org/docs/components/pipelines/v2/components/container-components/).  

This looks a lot like a lightweight Python component, but it orchestrates the running of the specified container image with inputs and outputs.

The example below takes the [alpine docker image](https://hub.docker.com/_/alpine) and runs a simple command that takes the `model_resource_name` of an artifact created by the Importer Component (created below) and prints it as part of a string, which is also returned as an output.  The output type here is [`kfp.dsl.OutputPath`](https://kubeflow-pipelines.readthedocs.io/en/2.0.0b6/source/dsl.html#kfp.dsl.OutputPath), which indicates that the named parameter is a file path (where the container writes its output in this case).
- The shell command `mkdir` is used to create the output location from the `kfp.dsl.OutputPath` variable
- The `echo` command with the `>` redirect writes the value to the `kfp.dsl.OutputPath` file inside the directory just created


In [111]:
@kfp.dsl.container_component
def example_container(
    vertex_model_a: kfp.dsl.Input[artifact_types.VertexModel],
    note: kfp.dsl.OutputPath(str)
):
    return kfp.dsl.ContainerSpec(
        image = 'alpine',
        command = [
            'sh', '-c', '''RESPONSE="The Model is: $0!"\
                            && echo $RESPONSE\
                            && mkdir -p $(dirname $1)\
                            && echo $RESPONSE > $1
                            '''
        ],
        args = [vertex_model_a.metadata['model_resource_name'], note]
    )
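
The shell script in this component refers to its inputs as `$0` and `$1`. With `sh -c`, the first argument after the script string is bound to `$0` and the second to `$1`, so the model resource name and the output path arrive in that order. The sketch below runs the same shell logic locally with `subprocess` to show the mechanism. It assumes a POSIX `sh` is available, uses a placeholder resource name, and the function name `run_like_container` is hypothetical.

```python
import os
import subprocess
import tempfile

# same shell logic as the container command, joined onto one line
SCRIPT = (
    'RESPONSE="The Model is: $0!" '
    '&& echo $RESPONSE '
    '&& mkdir -p $(dirname $1) '
    '&& echo $RESPONSE > $1'
)


def run_like_container(model_resource_name: str, note_path: str) -> str:
    """Run the script with two positional arguments and return what it printed."""
    completed = subprocess.run(
        ['sh', '-c', SCRIPT, model_resource_name, note_path],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    # the parent directory does not exist yet, which is why the script calls mkdir -p
    note_path = os.path.join(tmp, 'outputs', 'note')
    printed = run_like_container('projects/123/locations/us-central1/models/456@1', note_path)
    with open(note_path) as f:
        written = f.read().strip()

print(printed)
```

The printed line and the content of the file at the output path are the same string, which mirrors how the component both logs the message and returns it through `note`.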

### Importer Components

Sometimes the artifact that is needed is created before the pipeline runs.  The `dsl.importer` component is a quick way to import such an artifact. [More on the Importer Component](https://www.kubeflow.org/docs/components/pipelines/v2/components/importer-component/).

While the `dsl.importer` component can be used to import [generic artifacts](https://www.kubeflow.org/docs/components/pipelines/v2/data-types/artifacts/) it can also be used to import predefined [Google Cloud Artifact Types](https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.10.0/api/artifact_types.html) as shown in the Vertex AI documentation page for [Create an ML artifact](https://cloud.google.com/vertex-ai/docs/pipelines/use-components#use_an_importer_node).

Here, the `dsl.importer` component is used to load a model in the Vertex AI Model Registry.

```python
vertex_model_2 = kfp.dsl.importer(
    artifact_uri = model_name.outputs['uri'],
    artifact_class = artifact_types.VertexModel,
    metadata = {'model_resource_name': model_name.outputs['model_resource_name']}
)
```

---
## Vertex AI Pipelines

- [Vertex AI Python SDK for Pipeline Jobs](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob)
- [Specify machine configurations for a component](https://cloud.google.com/vertex-ai/docs/pipelines/machine-types)

### Create Pipeline

In [112]:
@kfp.dsl.pipeline(
    name = f'{SERIES}-{EXPERIMENT}',
    description = 'A simple pipeline for testing',
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root'
)
def example_pipeline(
    project: str,
    region: str,
    
):
    from google_cloud_pipeline_components.types import artifact_types
    from google_cloud_pipeline_components.v1.model import ModelGetOp
    
    # Lightweight Python Components
    model_name = example_lightweight(
        project = project,
        region = region
    ).set_display_name('Lightweight Python Component').set_cpu_limit('1')
    
    # prebuilt Google Cloud Pipeline Component
    vertex_model_1 = ModelGetOp(
        model_name = model_name.outputs['model_name'],
        project = project,
        location = region
    ).set_display_name('Prebuilt Component')
    
    # importer component
    vertex_model_2 = kfp.dsl.importer(
        artifact_uri = model_name.outputs['uri'],
        artifact_class = artifact_types.VertexModel,
        metadata = {'model_resource_name': model_name.outputs['model_resource_name']}
    ).set_display_name('Importer Component')
    
    # container component
    container = example_container(
        vertex_model_a = vertex_model_2.outputs['artifact']
    ).set_display_name('Container Component').set_cpu_limit('1')
    
    # python container component
    python_container = example_python_container(
        project = project,
        region = region
    ).set_display_name('Python Container Component').set_cpu_limit('1').set_caching_options(False)
    

### Compile Pipeline

In [113]:
kfp.compiler.Compiler().compile(
    pipeline_func = example_pipeline,
    package_path = f'{DIR}/{SERIES}-{EXPERIMENT}.yaml'
)

### Create Pipeline Job

In [114]:
parameters = dict(
    project = PROJECT_ID,
    region = REGION,
)

In [115]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{SERIES}-{EXPERIMENT}",
    template_path = f"{DIR}/{SERIES}-{EXPERIMENT}.yaml",
    parameter_values = parameters,
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

### Submit Pipeline Job

In [116]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240317185545
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240317185545')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-components-20240317185545?project=1026793852137


In [117]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/mlops-pipeline-components-20240317185545?project=1026793852137


In [118]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240317185545 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240317185545 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/mlops-pipeline-components-20240317185545


### Retrieve Pipeline Information

In [119]:
aiplatform.get_pipeline_df(pipeline = f'{SERIES}-{EXPERIMENT}')

| |pipeline_name|run_name|param.vmlmd_lineage_integration|param.input:project|param.input:region|
|---|---|---|---|---|---|
|0|mlops-pipeline-components|mlops-pipeline-components-20240317185545|{'pipeline_run_component': {'pipeline_run_id':...|statmike-mlops-349915|us-central1|
|1|mlops-pipeline-components|mlops-pipeline-components-20240317184845|{'pipeline_run_component': {'project_id': 'sta...|statmike-mlops-349915|us-central1|
|2|mlops-pipeline-components|mlops-pipeline-components-20240317184016|{'pipeline_run_component': {'pipeline_run_id':...|statmike-mlops-349915|us-central1|
|3|mlops-pipeline-components|mlops-pipeline-components-20240317183641|{'pipeline_run_component': {'location_id': 'us...|statmike-mlops-349915|us-central1|
|4|mlops-pipeline-components|mlops-pipeline-components-20240317183407|{'pipeline_run_component': {'location_id': 'us...|statmike-mlops-349915|us-central1|
|5|mlops-pipeline-components|mlops-pipeline-components-20240317141033|{'pipeline_run_component': {'task_name': 'mlop...|statmike-mlops-349915|us-central1|
|6|mlops-pipeline-components|mlops-pipeline-components-20240317134615|{'pipeline_run_component': {'pipeline_run_id':...|statmike-mlops-349915|us-central1|
|7|mlops-pipeline-components|mlops-pipeline-components-20240317132352|{'pipeline_run_component': {'location_id': 'us...|statmike-mlops-349915|us-central1|
|8|mlops-pipeline-components|mlops-pipeline-components-20240317131702|{'pipeline_run_component': {'parent_task_names...|statmike-mlops-349915|us-central1|
|9|mlops-pipeline-components|mlops-pipeline-components-20240317125135|{'pipeline_run_component': {'task_name': 'mlop...|statmike-mlops-349915|us-central1|


In [120]:
tasks = {task.task_name: task for task in pipeline_job.task_details}

In [121]:
for task in tasks:
  print(task, tasks[task].state)

example-lightweight State.SKIPPED
mlops-pipeline-components-20240317185545 State.SUCCEEDED
importer State.SUCCEEDED
model-get State.SKIPPED
example-container State.SKIPPED
example-python-container State.SUCCEEDED


In [122]:
for task in tasks:
    print(task)

example-lightweight
mlops-pipeline-components-20240317185545
importer
model-get
example-container
example-python-container


In [124]:
tasks['model-get']

task_id: 388922620231286784
task_name: "model-get"
create_time {
  seconds: 1710701747
  nanos: 11807000
}
start_time {
  seconds: 1710701748
  nanos: 279817000
}
end_time {
  seconds: 1710701748
  nanos: 279817000
}
executor_detail {
  container_detail {
    main_job: "projects/1026793852137/locations/us-central1/customJobs/2955894006044688384"
  }
}
state: SKIPPED
execution {
  name: "projects/1026793852137/locations/us-central1/metadataStores/default/executions/18032667361741787548"
  display_name: "model-get"
  state: CACHED
  etag: "1710701748099"
  create_time {
    seconds: 1710701747
    nanos: 709000000
  }
  update_time {
    seconds: 1710701748
    nanos: 99000000
  }
  schema_title: "system.ContainerExecution"
  schema_version: "0.0.1"
  metadata {
    fields {
      key: "input:location"
      value {
        string_value: "us-central1"
      }
    }
    fields {
      key: "input:model_name"
      value {
        string_value: "model_dev_sklearn-workflow@14"
      }
    }
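
The `create_time`, `start_time`, and `end_time` fields in this output are given as `seconds` and `nanos`. Assuming they follow the protobuf `Timestamp` convention (seconds since the Unix epoch in UTC, plus a nanosecond remainder), the sketch below converts the values shown above into readable datetimes. The helper name `to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_datetime(seconds: int, nanos: int = 0) -> datetime:
    """Convert a seconds/nanos pair into a timezone-aware UTC datetime.

    Python datetimes have microsecond resolution, so nanos are truncated.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(seconds=seconds, microseconds=nanos // 1000)


# values copied from the create_time and start_time fields above
created = to_datetime(1710701747, 11807000)
started = to_datetime(1710701748, 279817000)

print(created.isoformat())
print(started - created)
```

The first value falls on 2024-03-17 at 18:55:47 UTC, which lines up with the `20240317185545` suffix of the pipeline job name.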

---
## More!

Want to schedule a pipeline like this? Check out this workflow:
- [Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)