<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Testing.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FPipelines%2FVertex%2520AI%2520Pipelines%2520-%2520Testing.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Testing.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Pipelines/Vertex%20AI%20Pipelines%20-%20Testing.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

---
This is part of a [series of notebook-based workflows](./readme.md) that teach all the ways to use pipelines within Vertex AI. The suggested order, with a description of each workflow, is:

||Notebook Workflow|Description|
|---|---|---|
||[Vertex AI Pipelines - Start Here](./Vertex%20AI%20Pipelines%20-%20Start%20Here.ipynb)|What are pipelines? Start here to go from code to pipeline and see it in action.|
||[Vertex AI Pipelines - Introduction](./Vertex%20AI%20Pipelines%20-%20Introduction.ipynb)|Introduction to pipelines with the console and Vertex AI SDK|
||[Vertex AI Pipelines - Components](./Vertex%20AI%20Pipelines%20-%20Components.ipynb)|An introduction to all the ways to create pipeline components from your code|
||[Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb)|An overview of all the types of inputs and outputs for pipeline components|
||[Vertex AI Pipelines - Control](./Vertex%20AI%20Pipelines%20-%20Control.ipynb)|An overview of controlling the flow of execution for pipelines|
||[Vertex AI Pipelines - Secret Manager](./Vertex%20AI%20Pipelines%20-%20Secret%20Manager.ipynb)|How to pass sensitive information to pipelines and components|
||[Vertex AI Pipelines - Scheduling](./Vertex%20AI%20Pipelines%20-%20Scheduling.ipynb)|How to schedule pipeline execution|
||[Vertex AI Pipelines - Notifications](./Vertex%20AI%20Pipelines%20-%20Notifications.ipynb)|How to send email notification of pipeline status.|
||[Vertex AI Pipelines - Management](./Vertex%20AI%20Pipelines%20-%20Management.ipynb)|Managing, Reusing, and Storing pipelines and components|
|_**This Notebook**_|[Vertex AI Pipelines - Testing](./Vertex%20AI%20Pipelines%20-%20Testing.ipynb)|Strategies for testing components and pipelines locally and remotely to aid development.|
||[Vertex AI Pipelines - Managing Pipeline Jobs](./Vertex%20AI%20Pipelines%20-%20Managing%20Pipeline%20Jobs.ipynb)|Manage runs of pipelines in an environment: list, check status, filtered list, cancel and delete jobs.|


To discover these notebooks as part of an introduction to MLOps orchestration [start here](./readme.md).  To read more about MLOps also check out [the parent folder](../readme.md).

---

# Vertex AI Pipelines - Testing

Developing pipeline components and pipelines is easier when they can be tested locally, and several strategies also help with remote testing (on Vertex AI Pipelines).  This notebook-based workflow covers examples of both the local and the remote strategies.

**Why test locally?**

> To iterate quickly: each change to the code can be run right away, without waiting on the startup time of remote execution.

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [4]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [5]:
try:
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
    print('Colab authorized to GCP')
except Exception:
    print('Not a Colab Environment')
    pass

Not a Colab Environment


---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version.  If the import name is not found then the install name is used to install quietly for the current user.

In [176]:
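# tuples of (import name, install name[, min_version])
packages = [
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('kfp', 'kfp'),
    ('docker', 'docker')
]

import importlib.util
import importlib.metadata
from packaging.version import Version # available in Jupyter environments as an ipykernel dependency

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError: # raised when a parent package, like google.cloud, is missing
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # look up the installed version by install (distribution) name and compare as versions, not strings
        if Version(importlib.metadata.version(package[1])) < Version(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user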
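
Version strings do not order correctly when compared as plain text, which matters for the optional `min_version` check in the cell above. The following is a minimal sketch, assuming simple dotted numeric versions (pre-release tags such as `rc1` are not handled). `version_tuple` is a hypothetical helper written for this illustration, not part of any library:

```python
def version_tuple(version: str) -> tuple:
    """Convert a dotted numeric version like '1.62.0' into (1, 62, 0)."""
    return tuple(int(part) for part in version.split('.'))

# plain string comparison goes character by character, so '6' < '9' decides the result
print('1.62.0' < '1.9.0')                                # True, although 1.62.0 is the newer release
print(version_tuple('1.62.0') < version_tuple('1.9.0'))  # False, the numeric comparison is correct
```

For general version strings, a dedicated parser such as `packaging.version.Version` is the safer choice.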

### Docker

This notebook uses a local Docker installation to run KFP components and pipelines locally for testing.  The cell below checks that Docker is installed and running.  If it is not, install it first:
- [Docker installation](https://docs.docker.com/engine/install/)

In [1]:
import docker
 
try:
    docker_client = docker.from_env()
    if docker_client.ping():
        print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
except Exception as e:
    print('Docker is either not installed or not running - please fix before proceeding.\nhttps://docs.docker.com/engine/install/')

Docker is installed and running. Version: 20.10.17


### API Enablement

In [8]:
!gcloud services enable aiplatform.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart, resume running the notebook from the cell after this one.

In [9]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

Inputs

In [10]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [11]:
REGION = 'us-central1'
EXPERIMENT = 'pipeline-testing'
SERIES = 'mlops'

# gcs bucket
GCS_BUCKET = PROJECT_ID

Packages

In [167]:
import os
import glob
import json
import yaml
import time
import importlib
from google.cloud import aiplatform
import kfp
from typing import NamedTuple
from IPython.display import Markdown

In [128]:
aiplatform.__version__

'1.62.0'

In [129]:
kfp.__version__

'2.7.0'

Clients

In [13]:
# vertex ai clients
aiplatform.init(project = PROJECT_ID, location = REGION)

parameters:

In [14]:
DIR = f"temp/{SERIES}-{EXPERIMENT}"

In [15]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

environment:
- make a local folder for temporary storage

In [16]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Local Testing

The fastest way to test a component or full pipeline is to [test it locally](https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/execute-kfp-pipelines-locally/#overview).  This is made easy by importing the [local module](https://kubeflow-pipelines.readthedocs.io/en/latest/source/local.html) and initializing a runner:
```Python
from kfp import local

local.init(runner = local.DockerRunner())
```

This method has a few limitations, but they are acceptable for a development environment where you need fast iteration.  The following are ignored or not supported during local execution:

- task-level configuration like `.set_memory_limit`, `.set_accelerator_type`
- optimizations set through task methods like `.set_retry`, `.set_caching_options`
- some control flow operators like `kfp.dsl.ParallelFor` and `kfp.dsl.ExitHandler`

Local environment options, **runners**:
- [`DockerRunner`](https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/execute-kfp-pipelines-locally/#runner-dockerrunner)
    - the best option for local runtime environment isolation
    - most similar to remote execution
    - can run all component types
    - requires [Docker to be installed](https://docs.docker.com/engine/install/) but does not require the user to directly use Docker 
- [`SubprocessRunner`](https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/execute-kfp-pipelines-locally/#runner-subprocessrunner)
    - does not support custom images
    - only works with lightweight Python components
    - does install dependencies and uses a virtual environment for them
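
The choice between the two runners can be made at run time. The following is a minimal sketch, assuming that finding a `docker` executable on the `PATH` is a good enough signal; `choose_runner_name` is a hypothetical helper, and the check does not confirm that the Docker daemon is actually running:

```python
import shutil

def choose_runner_name(docker_available=None) -> str:
    """Return the name of the kfp.local runner class to use."""
    if docker_available is None:
        # shutil.which only confirms that a `docker` executable is on PATH
        docker_available = shutil.which('docker') is not None
    return 'DockerRunner' if docker_available else 'SubprocessRunner'

runner_name = choose_runner_name()
print(runner_name)
```

With `kfp` imported, the result could be used as `kfp.local.init(runner = getattr(kfp.local, runner_name)())`. Keep in mind that `SubprocessRunner` only works with lightweight Python components.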

### Initialize Local Execution Session

In [19]:
kfp.local.init(
    runner = kfp.local.DockerRunner(),
    pipeline_root = DIR
)

### Create Component & Test Locally

Components are created as they normally are. Because local execution is already initialized, testing is as easy as creating a task that uses the component.

In [112]:
@kfp.dsl.component(
    base_image = 'python:3.10',
    packages_to_install = ['numpy']
)
def roll_dice(num_dice: int = 1) -> int:
    import numpy as np
    result = sum([np.random.randint(1,7) for n in range(num_dice)])
    return result

In [113]:
test = roll_dice(num_dice = 1)

14:08:09.604 - INFO - Executing task 'roll-dice'
14:08:09.606 - INFO - Streamed logs:

    Found image 'python:3.10'

    
    [notice] A new release of pip is available: 23.0.1 -> 24.2
    [notice] To update, run: pip install --upgrade pip
    [KFP Executor 2024-09-10 14:08:20,209 INFO]: Looking for component `roll_dice` in --component_module_path `/tmp/tmp.8G1yZVY8M8/ephemeral_component.py`
    [KFP Executor 2024-09-10 14:08:20,210 INFO]: Loading KFP component "roll_dice" from /tmp/tmp.8G1yZVY8M8/ephemeral_component.py (directory "/tmp/tmp.8G1yZVY8M8" and module name "ephemeral_component")
    [KFP Executor 2024-09-10 14:08:20,210 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "num_dice": 1
            }
        },
        "outputs": {
            "parameters": {
                "Output": {
                    "outputFile": "/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/roll-dice-2024-09-10-14

In [114]:
test.output

5

In [115]:
test.inputs, test.output

({'num_dice': 1}, 5)

In [116]:
test.name

'roll-dice'

In [117]:
try:
    test.dependent_tasks
except Exception as err:
    print(err)

Task has no dependent tasks since it is executed independently.
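
Because the component returns a random value, a test can only assert on invariants rather than on an exact result. The following is a minimal sketch that uses a plain-Python stand-in for the component body, so it runs without Docker or numpy. `roll_dice_logic` is a hypothetical name used only for this illustration:

```python
import random

def roll_dice_logic(num_dice: int = 1) -> int:
    """Plain-Python stand-in for the body of the roll_dice component."""
    # random.randint includes both endpoints, so each die is 1 through 6
    return sum(random.randint(1, 6) for _ in range(num_dice))

for num_dice in (1, 2, 5):
    for _ in range(100):
        total = roll_dice_logic(num_dice)
        # each die contributes between 1 and 6 to the total
        assert num_dice <= total <= 6 * num_dice
print('all rolls within bounds')
```

The same kind of assertion applies to the result of the local task, for example `assert 1 <= test.output <= 6` when `num_dice = 1`.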


### Create Pipeline & Test Locally

Similar to components, the creation of a pipeline is the same and testing locally is as easy as creating a run of the pipeline since local execution is already initialized.

In [118]:
@kfp.dsl.pipeline(
    name = f'{SERIES}-{EXPERIMENT}',
    description = 'A pipeline built and tested locally first.'
)
def rolling_pipeline(
    num_dice: int
) -> int:
    
    roll_1 = roll_dice(num_dice = num_dice)
    roll_2 = roll_dice(num_dice = roll_1.output)

    return roll_2.output

In [119]:
test_pipeline = rolling_pipeline(num_dice = 1)

14:09:27.420 - INFO - Running pipeline: 'mlops-pipeline-testing'
--------------------------------------------------------------------------------
14:09:27.424 - INFO - Executing task 'roll-dice'
14:09:27.425 - INFO - Streamed logs:

    Found image 'python:3.10'

    
    [notice] A new release of pip is available: 23.0.1 -> 24.2
    [notice] To update, run: pip install --upgrade pip
    [KFP Executor 2024-09-10 14:09:37,977 INFO]: Looking for component `roll_dice` in --component_module_path `/tmp/tmp.5I0DMbUjEJ/ephemeral_component.py`
    [KFP Executor 2024-09-10 14:09:37,977 INFO]: Loading KFP component "roll_dice" from /tmp/tmp.5I0DMbUjEJ/ephemeral_component.py (directory "/tmp/tmp.5I0DMbUjEJ" and module name "ephemeral_component")
    [KFP Executor 2024-09-10 14:09:37,978 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "num_dice": 1
            }
        },
        "outputs": {
            "parameters": {
     

In [120]:
test_pipeline.inputs, test_pipeline.outputs

({'num_dice': 1}, {'Output': 20})

In [121]:
test_pipeline.name

'mlops-pipeline-testing'
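
In this pipeline the total of the first roll becomes the number of dice for the second roll, so for an input of `n` dice the final output must lie between `n` and `36 * n`. The output of 20 shown above for `num_dice = 1` is consistent with that. A minimal sketch of the bound, using hypothetical plain-Python stand-ins (`roll`, `rolling_logic`) instead of the real components:

```python
import random

def roll(num_dice: int) -> int:
    """Sum of num_dice six-sided dice."""
    return sum(random.randint(1, 6) for _ in range(num_dice))

def rolling_logic(num_dice: int) -> int:
    """Stand-in for rolling_pipeline: the first total sets the dice count of the second roll."""
    first = roll(num_dice)
    return roll(first)

for _ in range(200):
    # first roll is in [1, 6], second roll is in [first, 6 * first]
    assert 1 <= rolling_logic(1) <= 36
print('pipeline output within bounds')
```

In the notebook the equivalent check would be made on `test_pipeline.outputs['Output']`.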

### Test Component Returning An Artifact

The example above used simple parameter return values.  Artifacts can be returned and read directly for review in the same way.  In this case the component returns a single, unnamed output, which appears under the default key `Output`.

This example uses the built in generic artifact type from kfp: [`kfp.dsl.Artifact`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Artifact)

In [209]:
@kfp.dsl.component(
    base_image = 'python:3.10'
)
def flip_coins(
    num_coins: int = 1
) -> kfp.dsl.Artifact:
    import random
    flipmap = ['T', 'H']
    flips = [flipmap[random.randint(0, 1)] for n in range(num_coins)]
    
    response = kfp.dsl.Artifact(
        metadata = dict(flips = flips),
        uri = kfp.dsl.get_uri(suffix = ''),
        name = 'Flips History'
    )
    
    return response

In [210]:
test_artifact = flip_coins(num_coins = 10)

22:02:55.874 - INFO - Executing task 'flip-coins'
22:02:55.877 - INFO - Streamed logs:

    Found image 'python:3.10'

    [KFP Executor 2024-09-11 22:03:01,355 INFO]: Looking for component `flip_coins` in --component_module_path `/tmp/tmp.SQTD498jFG/ephemeral_component.py`
    [KFP Executor 2024-09-11 22:03:01,355 INFO]: Loading KFP component "flip_coins" from /tmp/tmp.SQTD498jFG/ephemeral_component.py (directory "/tmp/tmp.SQTD498jFG" and module name "ephemeral_component")
    [KFP Executor 2024-09-11 22:03:01,356 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "num_coins": 10
            }
        },
        "outputs": {
            "artifacts": {
                "Output": {
                    "artifacts": [
                        {
                            "name": "Output",
                            "type": {
                                "schemaTitle": "system.Artifact",
                                "schemaV

In [211]:
test_artifact.inputs

{'num_coins': 10}

In [212]:
test_artifact.outputs

{'Output': <kfp.dsl.types.artifact_types.Artifact at 0x7f9d9e2dc4f0>}

In [213]:
test_artifact.outputs['Output'].path

'/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/flip-coins-2024-09-11-22-02-55-873113/flip-coins/'

In [214]:
test_artifact.outputs['Output'].name

'Flips History'

In [215]:
test_artifact.outputs['Output'].metadata

{'flips': ['T', 'T', 'T', 'H', 'H', 'H', 'T', 'H', 'H', 'T']}

In [216]:
!ls {test_artifact.outputs['Output'].path}

executor_output.json


In [217]:
with open(test_artifact.outputs['Output'].path + '/executor_output.json') as f:
    contents = f.read()

In [218]:
contents = json.loads(contents)
contents

{'artifacts': {'Output': {'artifacts': [{'name': 'Flips History',
     'uri': '/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/flip-coins-2024-09-11-22-02-55-873113/flip-coins/',
     'metadata': {'flips': ['T',
       'T',
       'T',
       'H',
       'H',
       'H',
       'T',
       'H',
       'H',
       'T']}}]}}}

In [161]:
contents['artifacts']['Output']['artifacts'][0]['metadata']['flips']

['T', 'T', 'T', 'H', 'H', 'H', 'T', 'H', 'H', 'T']
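
The steps above (open the file, parse the JSON, index into the nested structure) can be wrapped in a small helper. This is a minimal sketch, assuming the `executor_output.json` layout shown above; `read_output_metadata` is a hypothetical helper, and the sample file is hand-written for the demonstration rather than produced by KFP:

```python
import json
import os
import tempfile

def read_output_metadata(task_dir: str, output_name: str = 'Output') -> dict:
    """Return the metadata of the first artifact stored under output_name."""
    with open(os.path.join(task_dir, 'executor_output.json')) as f:
        contents = json.load(f)
    return contents['artifacts'][output_name]['artifacts'][0]['metadata']

# demonstrate with a small hand-written file that mirrors the structure shown above
with tempfile.TemporaryDirectory() as task_dir:
    sample = {'artifacts': {'Output': {'artifacts': [
        {'name': 'Flips History', 'uri': task_dir, 'metadata': {'flips': ['T', 'H', 'H']}}
    ]}}}
    with open(os.path.join(task_dir, 'executor_output.json'), 'w') as f:
        json.dump(sample, f)
    metadata = read_output_metadata(task_dir)

print(metadata['flips'])
```

In the notebook the helper could be called as `read_output_metadata(test_artifact.outputs['Output'].path)`.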

### Test Component With Input Artifact

Artifacts can also supply input values.  This example modifies the one above so that the number of coins comes from the metadata of an artifact.

This example uses the built in generic artifact type from kfp: [`kfp.dsl.Artifact`](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.Artifact)

In [241]:
@kfp.dsl.component(
    base_image = 'python:3.10'
)
def flip_coins(
    num_coins: int
) -> kfp.dsl.Artifact:
    import random
    flipmap = ['T', 'H']
    flips = [flipmap[random.randint(0, 1)] for n in range(num_coins)]
    
    response = kfp.dsl.Artifact(
        metadata = dict(flips = flips),
        uri = kfp.dsl.get_uri(suffix = ''),
        name = 'Flips History'
    )
    
    return response

In [242]:
num_coins = kfp.dsl.Artifact(metadata = dict(num_coins = 15))

In [243]:
num_coins.metadata['num_coins']

15

In [244]:
test_input_artifact = flip_coins(num_coins = num_coins.metadata['num_coins'])

22:14:37.863 - INFO - Executing task 'flip-coins'
22:14:37.864 - INFO - Streamed logs:

    Found image 'python:3.10'

    [KFP Executor 2024-09-11 22:14:43,213 INFO]: Looking for component `flip_coins` in --component_module_path `/tmp/tmp.s84q9H3zN3/ephemeral_component.py`
    [KFP Executor 2024-09-11 22:14:43,213 INFO]: Loading KFP component "flip_coins" from /tmp/tmp.s84q9H3zN3/ephemeral_component.py (directory "/tmp/tmp.s84q9H3zN3" and module name "ephemeral_component")
    [KFP Executor 2024-09-11 22:14:43,214 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "num_coins": 15
            }
        },
        "outputs": {
            "artifacts": {
                "Output": {
                    "artifacts": [
                        {
                            "name": "Output",
                            "type": {
                                "schemaTitle": "system.Artifact",
                                "schemaV

In [246]:
test_input_artifact.inputs

{'num_coins': 15}

In [247]:
test_input_artifact.outputs

{'Output': <kfp.dsl.types.artifact_types.Artifact at 0x7f9dc1c965c0>}

In [248]:
test_input_artifact.outputs['Output'].path

'/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/flip-coins-2024-09-11-22-14-37-862208/flip-coins/'

In [249]:
test_input_artifact.outputs['Output'].name

'Flips History'

In [250]:
test_input_artifact.outputs['Output'].metadata

{'flips': ['T',
  'T',
  'T',
  'H',
  'T',
  'T',
  'H',
  'H',
  'H',
  'H',
  'H',
  'H',
  'H',
  'T',
  'H']}

### Testing Component Returning Multiple Outputs

As shown in the [Vertex AI Pipelines - IO](./Vertex%20AI%20Pipelines%20-%20IO.ipynb) workflow in this series, multiple outputs are returned with a `NamedTuple` so that each output can be referenced individually.


In [188]:
@kfp.dsl.component(
    base_image = 'python:3.10',
    packages_to_install = ['numpy']
)
def roll_dice_v2(
    num_dice: int = 1
) -> NamedTuple('output', rolls=list, total=int, roll_data=kfp.dsl.Artifact):
    
    import numpy as np
    from typing import NamedTuple
    result = NamedTuple('output', rolls=list, total=int, roll_data=kfp.dsl.Artifact)
    
    rolls = [np.random.randint(1,7) for n in range(num_dice)]
    total = sum(rolls)
    roll_data = kfp.dsl.Artifact(
        #name = 'roll_data',
        uri = kfp.dsl.get_uri(suffix = ''),
        metadata = dict(rolls = rolls, total=total)
    )
    
    
    return result(rolls, total, roll_data)

In [189]:
test_multiple = roll_dice_v2(num_dice = 5)

16:32:12.552 - INFO - Executing task 'roll-dice-v2'
16:32:12.554 - INFO - Streamed logs:

    Found image 'python:3.10'

    
    [notice] A new release of pip is available: 23.0.1 -> 24.2
    [notice] To update, run: pip install --upgrade pip
    [KFP Executor 2024-09-10 16:32:22,897 INFO]: Looking for component `roll_dice_v2` in --component_module_path `/tmp/tmp.5rLVChyCwe/ephemeral_component.py`
    [KFP Executor 2024-09-10 16:32:22,897 INFO]: Loading KFP component "roll_dice_v2" from /tmp/tmp.5rLVChyCwe/ephemeral_component.py (directory "/tmp/tmp.5rLVChyCwe" and module name "ephemeral_component")
    [KFP Executor 2024-09-10 16:32:22,899 INFO]: Got executor_input:
    {
        "inputs": {
            "parameterValues": {
                "num_dice": 5
            }
        },
        "outputs": {
            "parameters": {
                "rolls": {
                    "outputFile": "/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/roll-dice-v2-20

In [190]:
test_multiple.inputs

{'num_dice': 5}

In [191]:
test_multiple.outputs

{'rolls': [5.0, 5.0, 2.0, 1.0, 1.0],
 'total': 14,
 'roll_data': <kfp.dsl.types.artifact_types.Artifact at 0x7f9d9e325f00>}

In [192]:
test_multiple.outputs['roll_data'].name

''

In [193]:
test_multiple.outputs['roll_data'].path

'/home/jupyter/vertex-ai-mlops/MLOps/Pipelines/temp/mlops-pipeline-testing/roll-dice-v2-2024-09-10-16-32-12-550903/roll-dice-v2/'

In [194]:
test_multiple.outputs['roll_data'].metadata

{'rolls': [5.0, 5.0, 2.0, 1.0, 1.0], 'total': 14.0}
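
With several outputs available locally, a test can check that they agree with each other. The following is a minimal sketch using the values from the run above; `RollOutputs` and `check_roll_outputs` are hypothetical names for this illustration, and the artifact output is left out to keep the example free of KFP:

```python
from typing import NamedTuple

class RollOutputs(NamedTuple):
    rolls: list
    total: int

def check_roll_outputs(outputs: RollOutputs, num_dice: int) -> None:
    """Raise AssertionError if the outputs are inconsistent with each other."""
    assert len(outputs.rolls) == num_dice
    assert all(1 <= r <= 6 for r in outputs.rolls)
    assert sum(outputs.rolls) == outputs.total

# values copied from the local run shown above
observed = RollOutputs(rolls = [5.0, 5.0, 2.0, 1.0, 1.0], total = 14)
check_roll_outputs(observed, num_dice = 5)
print('outputs are consistent')
```

In the notebook the tuple could be built from `test_multiple.outputs['rolls']` and `test_multiple.outputs['total']`.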

### Review The Local Pipeline Root

All the local runs used a local pipeline root that was defined during the initialization of the local execution environment.  This directory contains the inputs and outputs from each of the local executions above:

In [195]:
directories = [entry for entry in os.listdir(DIR) if os.path.isdir(os.path.join(DIR, entry))]
directories

['flip-coins-2024-09-10-13-38-15-692210',
 'flip-coins-2024-09-10-13-48-05-571837',
 'mlops-pipeline-testing-2024-09-10-14-09-27-420198',
 'roll-dice-2024-09-10-11-51-05-713378',
 'flip-coins-2024-09-10-13-45-27-121412',
 'flip-coins-2024-09-10-13-52-24-983731',
 'flip-coins-2024-09-10-13-48-43-233520',
 'roll-dice-v2-2024-09-10-15-24-25-002196',
 'roll-dice-v2-2024-09-10-15-19-43-225701',
 'roll-dice-2024-09-10-12-08-01-004102',
 'flip-coins-2024-09-10-13-51-06-315019',
 'roll-dice-2024-09-10-14-08-09-602441',
 'roll-dice-2024-09-10-11-55-44-841933',
 'flip-coins-2024-09-10-13-35-20-581991',
 'roll-dice-v2-2024-09-10-15-18-47-343566',
 'flip-coins-2024-09-10-14-10-39-370425',
 'flip-coins-2024-09-10-13-43-50-913848',
 'roll-dice-v2-2024-09-10-16-32-12-550903',
 'mlops-pipeline-testing-2024-09-10-12-32-46-673825',
 'flip-coins-2024-09-10-13-41-17-343787',
 'flip-coins-2024-09-10-13-36-38-386154']

In [196]:
os.listdir(os.path.join(DIR, directories[-1]))

['flip-coins']

In [197]:
for file_path in glob.iglob(os.path.join(DIR, directories[-1], '**/*'), recursive = True):
    if os.path.isfile(file_path):
        print(file_path)

temp/mlops-pipeline-testing/flip-coins-2024-09-10-13-36-38-386154/flip-coins/executor_output.json
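
`os.listdir` returns entries in arbitrary order, so `directories[-1]` is not necessarily the most recent run (above it is a `flip-coins` run from 13:36, while later runs exist). The following is a minimal sketch for picking the latest run of a task, assuming directory names end in a timestamp formatted as in the listing above; `run_time` and `latest_run` are hypothetical helpers:

```python
from datetime import datetime

STAMP_FORMAT = '%Y-%m-%d-%H-%M-%S-%f'

def run_time(name: str, task_name: str):
    """Parse the timestamp of a run directory, or return None if it is not a run of task_name."""
    prefix = task_name + '-'
    if not name.startswith(prefix):
        return None
    try:
        return datetime.strptime(name[len(prefix):], STAMP_FORMAT)
    except ValueError:
        # for example 'roll-dice-v2-...' also starts with 'roll-dice-'
        return None

def latest_run(directories, task_name: str) -> str:
    """Return the directory name of the most recent run of task_name."""
    runs = [(run_time(d, task_name), d) for d in directories]
    return max((stamp, d) for stamp, d in runs if stamp is not None)[1]

# a few names copied from the listing above
directories = [
    'flip-coins-2024-09-10-13-38-15-692210',
    'roll-dice-2024-09-10-14-08-09-602441',
    'roll-dice-v2-2024-09-10-16-32-12-550903',
    'flip-coins-2024-09-10-14-10-39-370425',
    'roll-dice-2024-09-10-11-51-05-713378',
]
print(latest_run(directories, 'flip-coins'))
print(latest_run(directories, 'roll-dice'))
```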


---
## Remote Testing (On Vertex AI Pipelines)

Remote means the production environment, in this case Vertex AI Pipelines.  You may still need to test in this environment before doing full production runs or scheduling a pipeline.  Several strategies can make testing easier:

- Single Component Pipelines
- Caching

Each of these is covered in the following sections:

---
### Single Component Pipelines

Isolating a single component can be helpful for testing.  Because a component can be compiled and run as a pipeline on its own, this is straightforward: compile the component and submit it as a pipeline run on Vertex AI Pipelines.

Compile the component (last one above) as a pipeline:

In [198]:
type(roll_dice_v2)

kfp.dsl.python_component.PythonComponent

In [199]:
kfp.compiler.Compiler().compile(
    roll_dice_v2,
    package_path = f'{DIR}/component_roll_dice_v2.yaml'
)

In [200]:
list(glob.iglob(DIR + '/*.yaml'))

['temp/mlops-pipeline-testing/component_roll_dice_v2.yaml']

Create a pipeline job from the compiled component:

In [201]:
pipeline_job = aiplatform.PipelineJob(
    display_name = f'{SERIES}-{EXPERIMENT}',
    template_path = f"{DIR}/component_roll_dice_v2.yaml",
    parameter_values = dict(num_dice = 10),
    pipeline_root = f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/pipeline_root',
    enable_caching = None # True (enabled), False (disable), None (defer to component level caching) 
)

In [202]:
response = pipeline_job.submit(
    service_account = SERVICE_ACCOUNT
)

Creating PipelineJob
PipelineJob created. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/roll-dice-v2-20240910163313?project=1026793852137


In [203]:
pipeline_job.wait()

PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob run completed. Resource name: projects/1026793852137/locations/us-central1/pipelineJobs/roll-dice-v2-20240910163313


In [208]:
aiplatform.get_pipeline_df(pipeline = 'roll-dice-v2')

||pipeline_name|run_name|param.output:rolls|param.vertex-ai-pipelines-artifact-argument-binding|param.vmlmd_lineage_integration|param.input:num_dice|param.output:total|metric.rolls|metric.total|
|---|---|---|---|---|---|---|---|---|---|
|0|roll-dice-v2|roll-dice-v2-20240910163313|[6.0, 5.0, 6.0, 5.0, 4.0, 3.0, 1.0, 6.0, 6.0, ...|{'output:roll_data': ['projects/1026793852137/...|{'pipeline_run_component': {'location_id': 'us...|10.0|44.0|[6.0, 5.0, 6.0, 5.0, 4.0, 3.0, 1.0, 6.0, 6.0, ...|44.0|
|1|roll-dice-v2|roll-dice-v2-20240910162251|||{'pipeline_run_component': {'parent_task_names...|10.0||||


**Select The Pipeline Run In The Console:**
<p align="center"><center>
    <img align="center" alt="Pipeline Run" src="../resources/images/screenshots/pipelines/testing-run.png" width="70%">
</center></p>

**Review The Pipeline: Parameters**
<p align="center"><center>
    <img align="center" alt="Pipeline parameters" src="../resources/images/screenshots/pipelines/testing-parameters.png" width="70%">
</center></p>

**Review The Pipeline: Artifacts**
<p align="center"><center>
    <img align="center" alt="Pipeline Artifacts" src="../resources/images/screenshots/pipelines/testing-artifacts.png" width="70%">
</center></p>

---
### Caching

Caching avoids re-running redundant components across multiple runs of a pipeline by reusing earlier results.  A component run is redundant when the same component runs with the same inputs while the earlier output is still available.  KFP provides [caching](https://www.kubeflow.org/docs/components/pipelines/user-guides/core-functions/caching/) options to enable or disable this behavior.

**Task Level**

Within a pipeline definition each individual task can be assigned caching options using the [`.set_caching_options` method](https://kubeflow-pipelines.readthedocs.io/en/latest/source/dsl.html#kfp.dsl.PipelineTask.set_caching_options) which has the parameter `enable_caching` for setting to True or False.  Note that the pipeline level caching options, covered next, might override this setting.

**Pipeline Level**

Pipeline runs on [Vertex AI Pipelines can utilize the pipeline level caching](https://cloud.google.com/vertex-ai/docs/pipelines/configure-caching) with the option `enable_caching` found in [`aiplatform.PipelineJob`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.PipelineJob).  This option has the following behavior:
- Pipelines with the same name share a cache
- `enable_caching` = 
    - `False`: Overrides any task level caching set with `.set_caching_options` and disables caching
    - `True`: Overrides any task level caching set with `.set_caching_options` and enables caching
    - `None` (not set): Defers to the task level caching set with `.set_caching_options`
- Cached results do not have a timeout or time-to-live (TTL).  As long as an entry is not deleted from Vertex AI ML Metadata, it remains usable under the behaviors above.
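
The precedence between the pipeline level and task level settings can be summarized in a few lines. This is a minimal sketch of the rules listed above, not the actual Vertex AI implementation; `effective_caching` is a hypothetical function, and the task level default of `True` is an assumption based on KFP enabling caching by default:

```python
from typing import Optional

def effective_caching(pipeline_level: Optional[bool], task_level: bool = True) -> bool:
    """Resolve whether a task may use the cache, following the rules listed above."""
    if pipeline_level is None:
        # not set on the PipelineJob: defer to the task's own setting
        return task_level
    # True or False on the PipelineJob overrides every task
    return pipeline_level

print(effective_caching(None, task_level = False))   # False
print(effective_caching(True, task_level = False))   # True
print(effective_caching(False, task_level = True))   # False
```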

**When not to use caching:**

When a component is not deterministic, meaning it might produce different outputs for the same inputs, caching should not be used.  Examples:
- Random assignment of variables within the component
- The component fetches information from an external service where the address/url might be the input and remain unchanged but the external content/data changes.