# 03Tools - Pipeline Example 1

**Deploying the current best model to an endpoint**

The notebooks `03a` through `03f` all train ML models with different techniques using the same training data.  This might represent:
- multiple coworkers working on the same project,
- different approaches being tried at different times, or
- continuous development of multiple techniques in parallel.

In each attempt, the final model is registered to Vertex AI Model Registry as the latest, best attempt for the approach.  All of the models are registered with the same label value for `series`.  Using this label, we can routinely review all the approaches and pick the "current" best model for the domain.

An example workflow might be:

- Candidate selection path:
    - Get list of candidate models: Vertex AI Model Registry where labels.series={SERIES} and version_alias=default
    - Loop (async) over list of candidate models
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Pick the best candidate model
- Current model review path:
    - Check for endpoint, create if needed
    - Get the deployed model with most traffic 
    - Condition: if there is a deployed model
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Condition: if there is not a deployed model
        - deploy best on endpoint
- Compare and update path:
    - Condition: if best > deployed
        - deploy best on endpoint

In this notebook, this outline will be turned into a Vertex AI Pipeline!

<p align="center">
  <img alt="Pipeline Dashboard" src="../architectures/notebooks/03/pipeline_ex1.png" width="45%">
</p>

**Pipelines**
When the workflow includes multiple steps with direct dependencies, we have a pipeline.  Using Vertex AI Pipelines we can run all of these steps as a single serverless job.  Each step of the workflow outlined above turns into a component.  The components are connected through their inputs and outputs.
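
As a rough illustration of components connected through inputs and outputs, the dependencies between this notebook's steps can be written down as a plain-Python graph. This is only a sketch that uses the standard library's `graphlib` (Python 3.9+) rather than the Kubeflow Pipelines SDK, and the edges are my reading of the outline above, not something the pipeline service reports.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps whose outputs it consumes.
# Step names mirror the components defined later in this notebook; the edges
# are an assumption based on the outline.
dependencies = {
    "list_series_models": set(),
    "bqml_evaluate": {"list_series_models"},
    "best_candidate": {"list_series_models", "bqml_evaluate"},
    "get_endpoint": set(),
    "get_deployed_model": {"get_endpoint"},
    "deploy_candidate": {"best_candidate", "get_endpoint", "get_deployed_model"},
}

# A valid execution order: every step appears after the steps it depends on.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Steps with no path between them, such as `list_series_models` and `get_endpoint`, have no ordering constraint, which is why a pipeline runner is free to execute them in parallel.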

**Prerequisites:**
-  One or more of `03a`, `03b`, `03c`, `03d`, `03e`, `03f`
    - Each of these registers a model/version in the Vertex AI Model Registry

**Resources:**
- Pipelines:
    - [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction)
        - [Kubeflow](https://www.kubeflow.org/docs/components/pipelines/v2/author-a-pipeline/)
            - [Install the Kubeflow Pipelines SDK](https://www.kubeflow.org/docs/components/pipelines/v1/sdk/install-sdk/)
        - [Google Cloud Pre-Built Pipeline Components](https://cloud.google.com/vertex-ai/docs/pipelines/gcpc-list)
            - [SDK Reference](https://google-cloud-pipeline-components.readthedocs.io)
- [BigQuery](https://cloud.google.com/bigquery)
    - [Documentation:](https://cloud.google.com/bigquery/docs/query-overview)
    - [API:](https://cloud.google.com/bigquery/docs/reference/libraries-overview)
        - [Clients](https://cloud.google.com/bigquery/docs/reference/libraries)
            - [Python SDK:](https://github.com/googleapis/python-bigquery)
            - [Python Library Reference:](https://cloud.google.com/python/docs/reference/bigquery/latest)
- [Vertex AI](https://cloud.google.com/vertex-ai)
    - [Documentation:](https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform)
    - [API:](https://cloud.google.com/vertex-ai/docs/reference)
        - [Clients:](https://cloud.google.com/vertex-ai/docs/start/client-libraries)
            - [Python SDK:](https://github.com/googleapis/python-aiplatform)
            - [Python Library Reference:](https://cloud.google.com/python/docs/reference/aiplatform/latest)


**Conceptual Flow & Workflow**
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/03tools_pipe1_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/03tools_pipe1_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [2]:
REGION = 'us-central1'
EXPERIMENT = '03_pipeline_ex1'
SERIES = '03'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [3]:
from google.cloud import aiplatform
from google.cloud import bigquery
from datetime import datetime
from typing import NamedTuple

from kfp import dsl
from kfp.v2 import dsl as dsl2
from kfp.v2 import compiler
from kfp.v2.dsl import Artifact, Input, Metrics, ClassificationMetrics, HTML, Output, component

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [5]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}/pipelines"
DIR = f"temp/{EXPERIMENT}"

In [6]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [7]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) then [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this permission to the service account with the provided instructions.

environment:

In [8]:
!rm -rf {DIR}
!mkdir -p {DIR}

---

## Pipeline

This section follows the process I take to build a workflow as a Vertex AI Pipeline.  
1. Create an outline of the workflow
2. Prepare components, custom or prebuilt, for each step
3. Define the pipeline
4. Compile and run the pipeline

The build process can be iterative, where steps 2, 3, and 4 are created and tested repeatedly.

### 1 - Outline Pipeline

Write down in words the flow you want to achieve along with any conditional elements:

- Candidate selection path:
    - Get list of candidate models: Vertex AI Model Registry where labels.series={SERIES} and version_alias=default
    - Loop (async) over list of candidate models
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Pick the best candidate model
- Current model review path:
    - Check for endpoint, create if needed
    - Get the deployed model with most traffic 
    - Condition: if there is a deployed model
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Condition: if there is not a deployed model
        - deploy best on endpoint
- Compare and update path:
    - Condition: if best > deployed
        - deploy best on endpoint
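
The decision logic in this outline can be sketched in plain Python before any pipeline code is written. The function names and scores below are hypothetical and exist only for illustration. Treating `log_loss` as lower-is-better follows the ranking used in the `best_candidate` component later in this notebook.

```python
from typing import Dict, Optional

# Metrics where a smaller value is better; everything else is treated as
# larger-is-better. This split is an assumption of the sketch.
LOWER_IS_BETTER = {"log_loss"}


def pick_best(scores: Dict[str, float], metric: str) -> str:
    """Return the candidate name with the best score for the metric."""
    if not scores:
        raise ValueError("no candidate models to choose from")
    chooser = min if metric in LOWER_IS_BETTER else max
    return chooser(scores, key=scores.get)


def should_deploy(best_score: float, deployed_score: Optional[float], metric: str) -> bool:
    """Deploy when nothing is deployed yet, or when the best candidate strictly beats it."""
    if deployed_score is None:
        return True
    if metric in LOWER_IS_BETTER:
        return best_score < deployed_score
    return best_score > deployed_score


# Hypothetical scores, for illustration only.
candidates = {"model_a": 0.91, "model_b": 0.95, "model_c": 0.93}
best = pick_best(candidates, "accuracy")
print(best, should_deploy(candidates[best], 0.94, "accuracy"))
```

Passing `None` as the deployed score covers the "no deployed model" branch of the outline, so both deployment conditions reduce to one function call.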

### 2 - Prepare Components

#### Component: list_series_models
Get a list of BQML model names that are registered in the Vertex AI Model Registry for the series

In [9]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ["google-cloud-aiplatform"]
)
def list_series_models(
    project: str,
    region: str,
    series: str
) -> NamedTuple('outputs', [('candidates', list)]):

    # setup
    from collections import namedtuple
    result = namedtuple('outputs', ['candidates'])
    
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # get list of candidate models for series
    #candidates = [f"{model.name}" for model in aiplatform.Model.list(filter = f"labels.series={series}")]
    candidates = [f"{model.display_name}_{model.labels['timestamp']}" for model in aiplatform.Model.list(filter = f"labels.series={series}")  if model.display_name[3:5] == '03'] # if statement filter out the unsupervised models in 03g, 03h, 03i and model in pipeline 2

    return result(candidates)
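
To make the slice-based filter above easier to follow, here is a small sketch that applies the same expression to stand-in objects. It assumes display names of the form `{series}_{experiment}` (for example `03_03a`), which is my guess at the naming convention and is not confirmed by this notebook. The model names and timestamps are made up.

```python
from types import SimpleNamespace

# Stand-ins for the objects returned by aiplatform.Model.list(); the names
# and timestamps are hypothetical.
models = [
    SimpleNamespace(display_name="03_03a", labels={"timestamp": "20220901120000"}),
    SimpleNamespace(display_name="03_03f", labels={"timestamp": "20220909135610"}),
    SimpleNamespace(display_name="03_pipeline2", labels={"timestamp": "20220910080000"}),
]

# A model is kept only when the characters at index 3 and 4 of its display
# name are '03'; the candidate name is the display name plus its timestamp.
candidates = [
    f"{m.display_name}_{m.labels['timestamp']}"
    for m in models
    if m.display_name[3:5] == "03"
]
print(candidates)
```

Because the filter depends on character positions, a change to the display-name convention would silently change which models are treated as candidates.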

#### Component: bqml_evaluate
Gather evaluation metrics for a specified BQML model: metrics, confusion matrix, ROC curve

In [10]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ['pandas', 'db-dtypes', 'google-cloud-bigquery', 'google-cloud-storage']
)
def bqml_evaluate(
    project: str,
    region: str,
    bq_project: str,
    bq_dataset: str,
    bq_model: str,
    bq_test_table: str,
    bq_results: str,
    best_metric: str,
    metrics: Output[Metrics],
    class_metrics: Output[ClassificationMetrics]
) -> NamedTuple('outputs', [('best_metric', float)]):

    # setup
    import json
    from google.cloud import bigquery
    bq = bigquery.Client(project = bq_project)
    from google.cloud import storage
    gcs = storage.Client(project = project)
    from collections import namedtuple
    result = namedtuple('outputs', ['best_metric'])
    
    # BQML: ML.EVALUATE
    query = f"""
    SELECT *
    FROM ML.EVALUATE (
        MODEL `{bq_project}.{bq_dataset}.{bq_model}`,
        (SELECT * FROM `{bq_test_table}`)
    )
    """
    bq_eval = bq.query(query = query).to_dataframe()
    bq_eval = bq_eval.to_dict(orient = 'records')[0]
    for key in bq_eval:
        metrics.log_metric(key, bq_eval[key])

    # BQML: ML.CONFUSION_MATRIX
    query = f"""
    SELECT *
    FROM ML.CONFUSION_MATRIX (
        MODEL `{bq_project}.{bq_dataset}.{bq_model}`,
        (SELECT * FROM `{bq_test_table}`)
    )
    """
    bq_cm = bq.query(query = query).to_dataframe()
    classes = ['Not Fraud', 'Fraud']
    # ignore the 'trial_id' column that is included when hyperparameter tuning is used and skip the first=label column
    matrix = bq_cm.loc[:, bq_cm.columns != 'trial_id'].iloc[:, 1:].values.tolist()
    class_metrics.log_confusion_matrix(classes, matrix)

    # BQML: ML.ROC_CURVE
    query = f"""
    SELECT *
    FROM ML.ROC_CURVE (
        MODEL `{bq_project}.{bq_dataset}.{bq_model}`,
        (SELECT * FROM `{bq_test_table}`)
    )
    """
    bq_roc = bq.query(query = query).to_dataframe()
    class_metrics.log_roc_curve(
        bq_roc['false_positive_rate'].tolist(),
        bq_roc['recall'].tolist(),
        bq_roc['threshold'].tolist()
    )
    
    # save bq_eval results to common file {'candidate': bq_eval}
    file = f"bq_eval_{bq_model}.json"
    bq_eval['candidate'] = bq_model
    with open(file, 'w') as fp:
        json.dump(bq_eval, fp)
    
    bucket = gcs.bucket(project)
    path = bq_results.split(f'gs://{project}/')[-1]
    blob = bucket.blob(f"{path}/{file}")
    blob.upload_from_filename(f"{file}")
    
    return result(bq_eval[best_metric])
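
The component above derives the blob path by splitting the `bq_results` URI on `gs://{project}/`, which relies on the bucket being named after the project. The sketch below isolates that string handling so its behavior is easy to check. `blob_path` is a hypothetical helper and the values are made up for illustration.

```python
def blob_path(bq_results: str, project: str, file_name: str) -> str:
    """Mirror the component's path handling: strip 'gs://{project}/' and append the file name."""
    prefix = bq_results.split(f"gs://{project}/")[-1]
    return f"{prefix}/{file_name}"


# Hypothetical values for illustration.
project = "my-project"
bq_results = "gs://my-project/03/03_pipeline_ex1/pipelines/bq_results"
inside = blob_path(bq_results, project, "bq_eval_03_03a_20220901120000.json")
print(inside)

# If the bucket is not named after the project, nothing is stripped and the
# full URI ends up inside the blob name.
outside = blob_path("gs://other-bucket/results", project, "bq_eval.json")
print(outside)
```

The second call shows the failure mode: the split finds no match, so a different bucket name would need separate handling.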

#### Component: best_candidate

In [11]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ['pandas', 'google-cloud-storage', 'pretty_html_table']
)
def best_candidate(
    project: str,
    region: str,
    bq_results: str,
    best_metric: str,
    candidates: list,
    metrics: Output[HTML]
) -> NamedTuple('outputs', [('best_candidate', str), ('best_metric', float)]):

    # setup
    import json
    import pandas as pd
    from pretty_html_table import build_table
    from google.cloud import storage
    gcs = storage.Client(project = project)
    from collections import namedtuple
    result = namedtuple('outputs', ['best_candidate', 'best_metric'])
    
    # retrieve bq_results to a list of dictionaries: bq_evals = [{}, {}, ...]
    path = bq_results.split(f'gs://{project}/')[-1]
    bucket = gcs.bucket(project)
    blobs = list(bucket.list_blobs(prefix = path))
    
    candidate_evals = []
    for blob in blobs:
        file = blob.name.split('/')[-1]
        blob.download_to_filename(file)
        with open(file, 'r') as fp:
            evals = json.load(fp)
        if evals['candidate'] in candidates:
            candidate_evals.append(evals)
            
    # convert list of dictionaries to pandas dataframe:
    candidate_evals = pd.DataFrame(candidate_evals)
    
    # pick best candidate based on best_metric:
    if best_metric in ['accuracy', 'precision', 'recall', 'f1_score', 'roc_auc']:
        candidate_evals['best'] = candidate_evals[best_metric].rank(method = 'dense', ascending = False)
    elif best_metric in ['log_loss']:
        candidate_evals['best'] = candidate_evals[best_metric].rank(method = 'dense', ascending = True)
    else:
        # fail early with a clear message instead of a KeyError on the missing 'best' column
        raise ValueError(f"Unsupported best_metric: {best_metric}")
    best_candidate = candidate_evals.loc[candidate_evals['best'] == 1].iloc[0]
    
    # output evals to HTML Table in metrics:
    with open(metrics.path, 'w') as fp:
        fp.write(build_table(candidate_evals, 'blue_light'))
        
    return result(best_candidate['candidate'], best_candidate[best_metric])
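
The selection step above can also be expressed without pandas. The following is a minimal sketch of the same idea: pick the highest value for metrics where higher is better and the lowest for `log_loss`. The function name and the evaluation records are hypothetical, and it swaps the dense rank for `max`/`min`, which return the first record on ties, much as `.iloc[0]` does.

```python
from typing import Dict, List

HIGHER_IS_BETTER = {"accuracy", "precision", "recall", "f1_score", "roc_auc"}
LOWER_IS_BETTER = {"log_loss"}


def pick_best_candidate(evals: List[Dict], best_metric: str) -> Dict:
    """Return the evaluation record with the best value for best_metric."""
    if not evals:
        raise ValueError("no evaluation records were found")
    if best_metric in HIGHER_IS_BETTER:
        return max(evals, key=lambda e: e[best_metric])
    if best_metric in LOWER_IS_BETTER:
        return min(evals, key=lambda e: e[best_metric])
    raise ValueError(f"unsupported metric: {best_metric}")


# Hypothetical evaluation records, shaped like the JSON files written by bqml_evaluate.
evals = [
    {"candidate": "model_a", "accuracy": 0.91, "log_loss": 0.30},
    {"candidate": "model_b", "accuracy": 0.95, "log_loss": 0.35},
    {"candidate": "model_c", "accuracy": 0.93, "log_loss": 0.25},
]
print(pick_best_candidate(evals, "accuracy")["candidate"])
print(pick_best_candidate(evals, "log_loss")["candidate"])
```

Note that the two metrics can disagree about the winner, so the choice of `best_metric` is a real design decision, not a formality.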

#### Component: get_endpoint

Look for Vertex AI Endpoint for the series and if missing, create it:

In [12]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ['google-cloud-aiplatform']
)
def get_endpoint(
    project: str,
    region: str,
    series: str,
    bq_dataset: str 
) -> NamedTuple('outputs', [('endpoint_resource_name', str)]):

    # setup
    from collections import namedtuple
    result = namedtuple('outputs', ['endpoint_resource_name'])
    
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # retrieve endpoint
    endpoints = aiplatform.Endpoint.list(filter = f"labels.series={series}")
    if endpoints:
        endpoint = endpoints[0]
        print(f"Endpoint Exists: {endpoints[0].resource_name}")
    else:
        endpoint = aiplatform.Endpoint.create(
            display_name = f"{series}",
            labels = {'series' : f"{series}"}    
        )
        print(f"Endpoint Created: {endpoint.resource_name}")
    
    return result(endpoint.resource_name)

#### Component: get_deployed_model

Get the models deployed on the endpoint and return one with most/all traffic:

In [13]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ['google-cloud-aiplatform']
)
def get_deployed_model(
    project: str,
    region: str,
    series: str,
    endpoint_resource_name: str,
) -> NamedTuple('outputs', [('model', str)]):

    # setup
    from collections import namedtuple
    result = namedtuple('outputs', ['model'])
    
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # retrieve endpoint
    endpoint = aiplatform.Endpoint(endpoint_name = endpoint_resource_name)
    
    # retrieve deployed model with most traffic and get BQML model name
    traffic_split = endpoint.traffic_split
    if traffic_split:
        deployed_model_id = max(traffic_split, key = traffic_split.get)
        if deployed_model_id:
            for model in endpoint.list_models():
                if model.id == deployed_model_id:
                    deployed_model = model.model+f'@{model.model_version_id}'
            deployed_model = aiplatform.Model(model_name = deployed_model)
            bq_model = deployed_model.display_name+f"_{deployed_model.labels['timestamp']}"
        else: bq_model = 'none'
    else: bq_model = 'none'
    
    return result(bq_model)
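
The key step in this component is `max(traffic_split, key = traffic_split.get)`, which returns the dictionary key with the largest value. A small sketch with made-up deployed model IDs shows the behavior, including the empty case that the component maps to `'none'`. `busiest_deployed_model` is a hypothetical helper name.

```python
from typing import Dict, Optional


def busiest_deployed_model(traffic_split: Dict[str, int]) -> Optional[str]:
    """Return the deployed model ID that receives the most traffic, or None if there is none."""
    if not traffic_split:
        return None
    # max() iterates over the keys; key=traffic_split.get compares them by traffic percentage.
    return max(traffic_split, key=traffic_split.get)


# Hypothetical deployed model IDs mapped to traffic percentages.
print(busiest_deployed_model({"1111": 20, "2222": 80}))
print(busiest_deployed_model({}))
```

When two deployed models share the top percentage, `max` returns whichever key it encounters first, so the result in that case depends on dictionary order.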

#### Component: deploy_candidate

In [23]:
# from kfp.v2.dsl import Artifact, Input, Metrics, Output, component
@component(
    base_image = "python:3.9",
    packages_to_install = ['google-cloud-aiplatform']
)
def deploy_candidate(
    project: str,
    region: str,
    series: str,
    endpoint_resource_name: str,
    bq_model: str
) -> NamedTuple('outputs', [('model_version', str)]):

    DEPLOY_COMPUTE = 'n1-standard-4'
    
    # setup
    from collections import namedtuple
    result = namedtuple('outputs', ['model_version'])
    
    from google.cloud import aiplatform
    aiplatform.init(project = project, location = region)
    
    # retrieve endpoint
    endpoint = aiplatform.Endpoint(endpoint_name = endpoint_resource_name)
    
    # retrieve model
    model_display_name = ('_').join(bq_model.split('_')[0:-1])
    model_timestamp = bq_model.split('_')[-1]
    model_experiment = bq_model.split('_')[1]
    model = aiplatform.Model.list(filter = f"labels.series={series} AND labels.experiment={model_experiment}")[0]
    
    # get all versions of the model:
    client_options = {"api_endpoint": f"{region}-aiplatform.googleapis.com"}
    model_client = aiplatform.gapic.ModelServiceClient(client_options = client_options)
    model_versions = list(model_client.list_model_versions(name = model.resource_name))
    for version in model_versions:
        if version.labels['timestamp'] == model_timestamp:
            model = aiplatform.Model(model_name = version.name)
    
    # deploy the candidate model to the endpoint with 100% traffic
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 1
    )
    
    # remove models without traffic
    for deployed_model in endpoint.list_models():
        if deployed_model.id in endpoint.traffic_split:
            print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
        else:
            endpoint.undeploy(deployed_model_id = deployed_model.id)
            print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

    return result(model.versioned_resource_name)
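
The component above recovers three pieces from the `bq_model` string by splitting on underscores. The sketch below isolates that parsing. It assumes names of the form `{series}_{experiment}_{timestamp}`, which is my reading of the code, not a documented convention. `ParsedName` and `parse_bq_model` are hypothetical names.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    display_name: str
    experiment: str
    timestamp: str


def parse_bq_model(bq_model: str) -> ParsedName:
    """Split a name like '<series>_<experiment>_<timestamp>' the way deploy_candidate does."""
    parts = bq_model.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least three '_'-separated parts, got: {bq_model}")
    return ParsedName(
        display_name="_".join(parts[:-1]),  # everything except the trailing timestamp
        experiment=parts[1],                # second segment only
        timestamp=parts[-1],
    )


# Hypothetical model name for illustration.
parsed = parse_bq_model("03_03f_20220909135610")
print(parsed)
```

Because the experiment is taken from the second segment only, an experiment name that itself contains an underscore would be cut short, which is worth keeping in mind when naming models.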

### 3 - Define Pipeline

Recall the outline; notice that it is included as comments in the pipeline definition below:
- Candidate selection path:
    - Get list of candidate models: Vertex AI Model Registry where labels.series={SERIES} and version_alias=default
    - Loop (async) over list of candidate models
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Pick the best candidate model
- Current model review path:
    - Check for endpoint, create if needed
    - Get the deployed model with most traffic 
    - Condition: if there is a deployed model
        - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
    - Condition: if there is not a deployed model
        - deploy best on endpoint
- Compare and update path:
    - Condition: if best > deployed
        - deploy best on endpoint

In [24]:
# from kfp import dsl
@dsl.pipeline(
    name = f'series-{SERIES}-endpoint-update',
    description = 'Update endpoint with best model.'
)
def endpoint_update_pipeline(
    project: str,
    region: str,
    series: str,
    experiment: str,
    bq_project: str,
    bq_dataset: str,
    bq_test_table: str,
    bq_results: str,
    best_metric: str
):
    from google_cloud_pipeline_components.types import artifact_types
    from kfp.v2.components import importer_node
    
# - Candidate selection path:
    
    # - Get list of candidate models: Vertex AI Model Registry where labels.series={SERIES} and version_alias=default
    candidate_models = list_series_models(
        project = project,
        region = region,
        series = series
    ).set_display_name('List Models in Series').set_caching_options(True)
    
    # - Loop (async) over list of candidate models
    with dsl.ParallelFor(candidate_models.outputs['candidates']) as candidate:
        
        # - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
        candidate_metrics = bqml_evaluate(
            project = project,
            region = region,
            bq_project = bq_project,
            bq_dataset = bq_dataset,
            bq_model = candidate,
            bq_test_table = bq_test_table,
            bq_results = bq_results,
            best_metric = best_metric
        ).set_display_name('Gather Candidate Metrics').set_caching_options(True)
    
    # - Pick the best candidate model    
    best = best_candidate(
        project = project,
        region = region,
        bq_results = bq_results,
        best_metric = best_metric,
        candidates = candidate_models.outputs['candidates']
    ).after(candidate_metrics).set_display_name('Pick The Best Candidate').set_caching_options(True)

# - Current model review path:

    # - Check for endpoint, create if needed
    endpoint = get_endpoint(
        project = project,
        region = region,
        series = series,
        bq_dataset = bq_dataset 
    ).set_display_name('Get the Endpoint').set_caching_options(True)
    
    # - Get the deployed model with most traffic, if any
    current_model = get_deployed_model(
        project = project,
        region = region,
        series = series,
        endpoint_resource_name = endpoint.outputs['endpoint_resource_name'] 
    ).set_display_name('Get The Deployed Model').set_caching_options(True)
    
    # - Condition: if there is a deployed model
    with dsl.Condition(
        current_model.outputs['model'] != 'none',
        name = 'compare_models'
    ):
    
        # - Gather Evaluation: log metrics: evaluation, confusion matrix, ROC curve
        current_metrics = bqml_evaluate(
            project = project,
            region = region,
            bq_project = bq_project,
            bq_dataset = bq_dataset,
            bq_model = current_model.outputs['model'],
            bq_test_table = bq_test_table,
            bq_results = bq_results,
            best_metric = best_metric
        ).set_display_name('Gather Current Metrics').set_caching_options(True)
    
# - Compare And update Path:

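        # - Condition: if best > deployed
        #   (assumes a higher-is-better metric; with best_metric = 'log_loss' this comparison would need to be reversed)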
        with dsl.Condition(
            best.outputs['best_metric'] > current_metrics.outputs['best_metric'],
            name = 'replace_deployed_model'
        ):
            
            # - deploy best on endpoint
            deploy = deploy_candidate(
                project = project,
                region = region,
                series = series,
                endpoint_resource_name = endpoint.outputs['endpoint_resource_name'],
                bq_model = best.outputs['best_candidate']
            ).set_display_name('Deploy The Candidate Model').set_caching_options(True)
            
    # - Condition: if there is not a deployed model
    with dsl.Condition(
        current_model.outputs['model'] == 'none',
        name = 'deploy_model'
    ):
        
        # - deploy best on endpoint
        deploy = deploy_candidate(
            project = project,
            region = region,
            series = series,
            endpoint_resource_name = endpoint.outputs['endpoint_resource_name'],
            bq_model = best.outputs['best_candidate']
        ).set_display_name('Deploy The Candidate Model').set_caching_options(True)

### 4 - Compile And Run Pipeline

#### Collect Inputs

In [25]:
bq_test_table = f"{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}_TEST"
query = f"""
CREATE OR REPLACE VIEW `{bq_test_table}` AS
    SELECT * EXCEPT({','.join(VAR_OMIT.split())}, splits),
    FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`
    WHERE splits = 'TEST'
"""
job = bq.query(query = query)
job.result()

<google.cloud.bigquery.table._EmptyRowIterator at 0x7fed8e3c9f90>
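
The view definition above turns the space-delimited `VAR_OMIT` string into a comma-separated `EXCEPT` list. The sketch below builds a similar statement as a plain string so the rendered SQL can be inspected without running a query. `build_test_view_sql` and the second column name are hypothetical.

```python
def build_test_view_sql(project: str, dataset: str, table: str, var_omit: str) -> str:
    """Render a CREATE VIEW statement that drops the omitted columns and the splits column."""
    omit = ",".join(var_omit.split())  # 'a b' -> 'a,b'
    view = f"{project}.{dataset}.{table}_TEST"
    return (
        f"CREATE OR REPLACE VIEW `{view}` AS\n"
        f"    SELECT * EXCEPT({omit}, splits)\n"
        f"    FROM `{project}.{dataset}.{table}`\n"
        f"    WHERE splits = 'TEST'"
    )


# Hypothetical project and an extra omitted column, for illustration only.
sql = build_test_view_sql("my-project", "fraud", "fraud_prepped", "transaction_id customer_id")
print(sql)
```

One edge case to be aware of: an empty `var_omit` renders `EXCEPT(, splits)`, which is not valid SQL, so the string should always name at least one column.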

In [26]:
parameter_values = {
    "project" : PROJECT_ID,
    "region" : REGION,
    "experiment" : EXPERIMENT,
    "series": SERIES,
    "bq_project": BQ_PROJECT,
    "bq_dataset": BQ_DATASET,
    "bq_test_table": bq_test_table,
    "bq_results": f"{URI}/bq_results",
    "best_metric": 'accuracy' # accuracy, precision, recall, f1_score, log_loss, roc_auc
}

#### Compile Pipeline

In [27]:
# from kfp.v2 import compiler
compiler.Compiler().compile(
    pipeline_func = endpoint_update_pipeline,
    package_path = f"{DIR}/{EXPERIMENT}.json"
)

#### Define Pipeline Job
Using compiled pipeline:

In [28]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f"{EXPERIMENT}_tournament",
    template_path = f"{DIR}/{EXPERIMENT}.json",
    parameter_values = parameter_values,
    pipeline_root = f"{URI}/pipeline_root",
    enable_caching = False, # overrides all component/task settings
    labels = {'experiment': EXPERIMENT, 'series': SERIES}
)

#### Submit Pipeline Job

In [29]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/series-03-endpoint-update-20221003152920
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/series-03-endpoint-update-20221003152920')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/series-03-endpoint-update-20221003152920?project=1026793852137


Use the following link to view the job in the GCP console:

In [30]:
print(f'The Dashboard can be viewed here:\n{pipeline_job._dashboard_uri()}')

The Dashboard can be viewed here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/series-03-endpoint-update-20221003152920?project=1026793852137


#### Wait On Pipeline Job

In [31]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/series-03-endpoint-update-20221003152920 current state:
PipelineState.PIPELINE_STATE_RUNNING

---
## Dashboard Screenshot:

<img src="../architectures/notebooks/03/pipeline_ex1.png">

---
## Work In Progress

Use Vertex AI ML Metadata Artifact Types:
 - [Reference](https://cloud.google.com/vertex-ai/docs/pipelines/artifact-types)
 - [Consume or produce artifact in your component](https://cloud.google.com/vertex-ai/docs/pipelines/use-components#consume_or_produce_artifacts_in_your_component)

```
    # import prebuilt components
    from google_cloud_pipeline_components.v1.bigquery import (
        BigqueryEvaluateModelJobOp,
        BigqueryMLConfusionMatrixJobOp,
        BigqueryMLRocCurveJobOp
    )
    from google_cloud_pipeline_components.types import artifact_types
    from kfp.v2.components import importer_node

...

    with dsl.ParallelFor(candidate_models.outputs['candidates']) as candidate:
        #candidate.set_display_name(str(candidate_models.outputs['candidates']))
        
        bqml_model = importer_node.importer(
            artifact_uri = artifact_uri,
            artifact_class = artifact_types.BQMLModel,
            metadata = {
                'projectId': "statmike-mlops-349915",
                'datasetId': "fraud",
                'modelId': "03f_fraud_20220909135610"
            }
        )
        bq_eval = BigqueryEvaluateModelJobOp(
            project = project,
            location = region,
            model = bqml_model.outputs['artifact'],
            table_name = bq_test_table
        )
        bq_cm = BigqueryMLConfusionMatrixJobOp(
            project = project,
            location = region,
            model = bqml_model.outputs['artifact'],
            table_name = bq_test_table
        )
        bq_roc = BigqueryMLRocCurveJobOp(
            project = project,
            location = region,
            model = bqml_model.outputs['artifact'],
            table_name = bq_test_table
        ) 
```

---
## Remove Resources
See notebook "99 - Cleanup".