# Running custom model training on Vertex AI Pipelines

In this lab, you will learn how to run a custom model training job using the Kubeflow Pipelines SDK on Vertex AI Pipelines.

## Learning objectives

* Use the Kubeflow Pipelines SDK to build scalable ML pipelines.
* Create and containerize a custom Scikit-learn model training job that uses Vertex AI managed datasets.
* Run a batch prediction job within Vertex AI Pipelines.
* Use pre-built components, which are provided through the google_cloud_pipeline_components library, to interact with Vertex AI services.


## Vertex AI Pipelines setup
There are a few additional libraries you'll need to install in order to use Vertex AI Pipelines:

* __Kubeflow Pipelines__: This is the SDK you'll be using to build your pipeline. Vertex AI Pipelines supports running pipelines built with either Kubeflow Pipelines or TFX.
* __Google Cloud Pipeline Components__: This library provides pre-built components that make it easier to interact with Vertex AI services from your pipeline steps.

Each learning objective corresponds to a __#TODO__ in the notebook, where you complete the cell's code before running it. If you get stuck, refer to the [solution notebook](../solutions/custom_model_training.ipynb).


To install both libraries used in this notebook, first set the user flag in a notebook cell:

In [1]:
USER_FLAG = "--user"

In [2]:
!pip3 install {USER_FLAG} google-cloud-aiplatform==1.7.0 --upgrade
!pip3 install {USER_FLAG} kfp==1.8.9 google-cloud-pipeline-components==0.2.0

Collecting google-cloud-aiplatform==1.7.0
  Downloading google_cloud_aiplatform-1.7.0-py2.py3-none-any.whl (1.6 MB)
Collecting google-cloud-storage<2.0.0dev,>=1.32.0
  Downloading google_cloud_storage-1.44.0-py2.py3-none-any.whl (106 kB)
Installing collected packages: google-cloud-storage, google-cloud-aiplatform
Successfully installed google-cloud-aiplatform-1.7.0 google-cloud-storage-1.44.0
Collecting kfp==1.8.9
  Downloading kfp-1.8.9.tar.gz (296 kB)
  Preparing metadata (setup.py) ... done
Collecting google-cloud-pipeline-components==0.2.0
  Downloading google_cloud_pipe

You may see some warning messages in the install output.

After installing these packages you'll need to restart the kernel:

In [3]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

Finally, check that you have correctly installed the packages. The KFP SDK version should be >=1.8:

In [1]:
!python3 -c # TODO 1: Your code here
!python3 -c "import google_cloud_pipeline_components; print('google_cloud_pipeline_components version: {}'.format(google_cloud_pipeline_components.__version__))"

KFP SDK version: 1.8.9
google_cloud_pipeline_components version: 0.2.0
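If you want to check versions from Python rather than from the shell, the following is a minimal sketch that uses only the standard library. The helper names (`installed_version`, `version_tuple`, `meets_minimum`) are hypothetical and not part of the lab; the only minimum taken from this notebook is KFP >= 1.8.

```python
import re
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Convert '1.8.9' to (1, 8, 9); stops at the first non-numeric piece."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(version, minimum):
    """True when `version` is at least `minimum`, compared numerically."""
    return version_tuple(version) >= version_tuple(minimum)


# Only the KFP minimum comes from the lab text; the other package is just reported.
checks = [("kfp", "1.8"), ("google-cloud-pipeline-components", None)]
for name, minimum in checks:
    found = installed_version(name)
    if found is None:
        print(f"{name}: not installed")
    elif minimum is None:
        print(f"{name}: {found}")
    else:
        print(f"{name}: {found} (at least {minimum}: {meets_minimum(found, minimum)})")
```

Comparing numeric tuples avoids the string-comparison trap where `"1.10"` sorts before `"1.8"`.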


### Set your project ID and bucket
Throughout this notebook, you'll reference your Cloud project ID and the bucket you created earlier. Next you'll create variables for each of those.

If you don't know your project ID you may be able to get it by running the following:

In [2]:
import os
PROJECT_ID = ""

# Get your Google Cloud project ID from gcloud
if not os.getenv("IS_TESTING"):
    shell_output=!gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID: ", PROJECT_ID)

Project ID:  qwiklabs-gcp-03-829279d2a7be


Otherwise, set it here:

In [3]:
if PROJECT_ID == "" or PROJECT_ID is None:
    PROJECT_ID = "your-project-id"  # @param {type:"string"}

Then create a variable to store your bucket name. If you created it in this lab, the following will work. Otherwise, you'll need to set this manually:

In [4]:
BUCKET_NAME="gs://" + PROJECT_ID + "-bucket"

In [5]:
!echo {BUCKET_NAME}

gs://qwiklabs-gcp-03-829279d2a7be-bucket


### Import libraries
Add the following to import the libraries you'll be using throughout this notebook:

In [6]:
from kfp.v2 import compiler, dsl
from kfp.v2.dsl import pipeline

from google.cloud import aiplatform
from google_cloud_pipeline_components import aiplatform as gcc_aip

### Define constants
The last thing you need to do before building your pipeline is define some constant variables. `PIPELINE_ROOT` is the Cloud Storage path where the artifacts created by your pipeline will be written. You're using `us-central1` as the region here, but if you used a different region when you created your bucket, update the `REGION` variable in the code below:

In [7]:
PATH=%env PATH
%env PATH={PATH}:/home/jupyter/.local/bin
REGION="us-central1"

PIPELINE_ROOT = f"{BUCKET_NAME}/pipeline_root/"
PIPELINE_ROOT

env: PATH=/usr/local/cuda/bin:/opt/conda/bin:/opt/conda/condabin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/jupyter/.local/bin


'gs://qwiklabs-gcp-03-829279d2a7be-bucket/pipeline_root/'

After running the code above, you should see the root directory for your pipeline printed. This is the Cloud Storage location where the artifacts from your pipeline will be written. It will be in the format `gs://YOUR-BUCKET-NAME/pipeline_root/`.
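A malformed bucket URI tends to surface only much later, when the pipeline tries to write artifacts. As a rough sketch, you could build the pipeline root through a small helper that checks the `gs://` prefix first. The function name `pipeline_root_for` and the bucket name in the example are assumptions for illustration only.

```python
def pipeline_root_for(bucket_uri, subdir="pipeline_root"):
    """Return '<bucket_uri>/<subdir>/' after checking the gs:// prefix."""
    if not bucket_uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got {bucket_uri!r}")
    # Drop any trailing slash so the result never contains '//' after the bucket.
    return f"{bucket_uri.rstrip('/')}/{subdir}/"


# Hypothetical bucket name, used only to show the resulting format.
print(pipeline_root_for("gs://example-project-bucket"))
# → gs://example-project-bucket/pipeline_root/
```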

## Configuring a custom model training job
Before you set up your pipeline, you need to write the code for your custom model training job. To train the model, you'll use the UCI Machine Learning [Dry beans dataset](https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset), from: KOKLU, M. and OZKAN, I.A., (2020), "Multiclass Classification of Dry Beans Using Computer Vision and Machine Learning Techniques." In Computers and Electronics in Agriculture, 174, 105507. [DOI](https://www.sciencedirect.com/science/article/abs/pii/S0168169919311573?via%3Dihub).

Your first pipeline step will create a managed dataset in Vertex AI using a BigQuery table that contains a version of this beans data. The dataset will be passed as input to your training job. In your training code, you'll read environment variables that point to this managed dataset.

Here's how you'll set up your custom training job:

* Write a Scikit-learn `DecisionTreeClassifier` model to classify bean types in your data.
* Package the training code in a Docker container and push it to Container Registry.

From there, you'll be able to start a Vertex AI Training job directly from your pipeline. Let's get started!

### Define your training code in a Docker container
Run the following to set up a directory where you'll add your containerized code:

In [20]:
!mkdir traincontainer
!touch traincontainer/Dockerfile
!mkdir traincontainer/trainer
!touch traincontainer/trainer/train.py

After running those commands, you should see a directory called traincontainer/ created on the left (you may need to click the refresh icon to see it). You'll see the following in your traincontainer/ directory:

```
+ Dockerfile
+ trainer/
    + train.py
```
Your first step in containerizing your code is to create a Dockerfile. In your Dockerfile you'll include all the commands needed to run your image. It'll install all the libraries you're using and set up the entry point for your training code. Run the following to create the Dockerfile locally in your notebook:

In [23]:
%%writefile traincontainer/Dockerfile
FROM gcr.io/deeplearning-platform-release/sklearn-cpu.0-23
WORKDIR /

# Copies the trainer code to the docker image.
COPY trainer /trainer

RUN pip install scikit-learn google-cloud-bigquery joblib pandas google-cloud-storage

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "-m", "trainer.train"]

Overwriting traincontainer/Dockerfile


Run the following to create the `train.py` file. This script retrieves the data from your managed dataset, puts it into a Pandas DataFrame, trains a Scikit-learn model, and uploads the trained model to Cloud Storage:

In [24]:
%%writefile traincontainer/trainer/train.py
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from google.cloud import bigquery
from google.cloud import storage
from joblib import dump

import os
import pandas as pd

bqclient = bigquery.Client()
storage_client = storage.Client()

def download_table(bq_table_uri: str):
    prefix = "bq://"
    if bq_table_uri.startswith(prefix):
        bq_table_uri = bq_table_uri[len(prefix):]

    table = bigquery.TableReference.from_string(bq_table_uri)
    rows = bqclient.list_rows(
        table,
    )
    return rows.to_dataframe(create_bqstorage_client=False)

# These environment variables are from Vertex AI managed datasets
training_data_uri = os.environ["AIP_TRAINING_DATA_URI"]
test_data_uri = os.environ["AIP_TEST_DATA_URI"]

# Download data into Pandas DataFrames, split into train / test
df = download_table(training_data_uri)
test_df = download_table(test_data_uri)
labels = df.pop("Class").tolist()
data = df.values.tolist()
test_labels = test_df.pop("Class").tolist()
test_data = test_df.values.tolist()

# Define and train the Scikit model
skmodel = DecisionTreeClassifier()
skmodel.fit(data, labels)
score = # TODO 2: Your code here
print('accuracy is:',score)

# Save the model to a local file
dump(skmodel, "model.joblib")

# Upload the saved model file to GCS
bucket = storage_client.get_bucket("YOUR_GCS_BUCKET")
model_directory = os.environ["AIP_MODEL_DIR"]
storage_path = os.path.join(model_directory, "model.joblib")
blob = storage.blob.Blob.from_string(storage_path, client=storage_client)
blob.upload_from_filename("model.joblib")

Overwriting traincontainer/trainer/train.py
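The script reads its inputs from the `AIP_TRAINING_DATA_URI`, `AIP_TEST_DATA_URI`, and `AIP_MODEL_DIR` environment variables, so running it outside a Vertex AI training job fails with a bare `KeyError`. The following is a minimal sketch of how those lookups could be checked up front. The helper names and the example values are hypothetical; only the variable names and the `bq://` prefix handling come from the script above.

```python
REQUIRED_VARS = ("AIP_TRAINING_DATA_URI", "AIP_TEST_DATA_URI", "AIP_MODEL_DIR")


def strip_bq_prefix(uri):
    """Remove a leading 'bq://' so the rest can be parsed as a table reference."""
    prefix = "bq://"
    return uri[len(prefix):] if uri.startswith(prefix) else uri


def read_training_env(environ):
    """Collect the training inputs, naming every missing variable at once."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "training_table": strip_bq_prefix(environ["AIP_TRAINING_DATA_URI"]),
        "test_table": strip_bq_prefix(environ["AIP_TEST_DATA_URI"]),
        "model_dir": environ["AIP_MODEL_DIR"],
    }


# Made-up values standing in for what a training job would receive.
example_env = {
    "AIP_TRAINING_DATA_URI": "bq://example-project.example_dataset.training",
    "AIP_TEST_DATA_URI": "bq://example-project.example_dataset.test",
    "AIP_MODEL_DIR": "gs://example-project-bucket/model/",
}
print(read_training_env(example_env))
```

In the real script you would pass `os.environ` instead of `example_env`.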


Run the following to replace YOUR_GCS_BUCKET from the script above with the name of your Cloud Storage bucket:

In [25]:
BUCKET = BUCKET_NAME[5:] # Trim the 'gs://' before adding to train script
!sed -i -r 's@YOUR_GCS_BUCKET@'"$BUCKET"'@' traincontainer/trainer/train.py

You can also do this manually if you'd prefer. If you do, make sure not to include the gs:// in your bucket name when you update the script.
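If you'd rather stay in Python than call `sed`, the same substitution can be sketched with `pathlib`. The function name `fill_placeholder` and the bucket name are hypothetical, and the demo runs against a temporary file rather than your real `train.py`. Note that `str.replace` substitutes every occurrence, whereas the `sed` expression above replaces only the first match on each line.

```python
import tempfile
from pathlib import Path


def fill_placeholder(path, placeholder, value):
    """Replace every occurrence of `placeholder` in the file; return the count."""
    file_path = Path(path)
    text = file_path.read_text()
    count = text.count(placeholder)
    file_path.write_text(text.replace(placeholder, value))
    return count


def bucket_name_from_uri(bucket_uri):
    """Trim the 'gs://' scheme, since get_bucket() expects a bare bucket name."""
    prefix = "gs://"
    return bucket_uri[len(prefix):] if bucket_uri.startswith(prefix) else bucket_uri


# Demo on a throwaway file so nothing in the lab directory is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "train.py"
    demo.write_text('bucket = storage_client.get_bucket("YOUR_GCS_BUCKET")\n')
    name = bucket_name_from_uri("gs://example-project-bucket")
    replaced = fill_placeholder(demo, "YOUR_GCS_BUCKET", name)
    print(replaced, demo.read_text().strip())
```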

Now your training code is in a Docker container and you're ready to run training in the Cloud.

### Push container to Container Registry

With your training code complete, you're ready to push this to Google Container Registry. Later when you configure the training component of your pipeline, you'll point Vertex AI Pipelines at this container.

The image URI combines your project ID with an image name and tag. Because each `!` command in a notebook runs in its own subshell, a shell variable set on one line is not visible on the next, so store the URI in a Python variable instead:

In [41]:
IMAGE_URI = "gcr.io/{0}/scikit:v1".format(PROJECT_ID)

Then build your container by running the following:

In [86]:
!docker build ./traincontainer -t {IMAGE_URI}

Sending build context to Docker daemon  9.728kB
Step 1/5 : FROM gcr.io/deeplearning-platform-release/sklearn-cpu.0-23
latest: Pulling from deeplearning-platform-release/sklearn-cpu.0-23

Finally, push the container to Container Registry:

In [88]:
!docker push gcr.io/$PROJECT_ID/scikit:v1

The push refers to repository [gcr.io/qwiklabs-gcp-03-829279d2a7be/scikit]
v1: digest: sha256:4bc18a9be14b00f020df23d14b668172bf91f9f831494509a4d77e550ffce3f4 size: 4916


Navigate to the [Container Registry section](https://console.cloud.google.com/gcr/) of your Cloud console to verify your container is there.

## Configuring a batch prediction job
The last step of your pipeline will run a batch prediction job. For this to work, you need to provide a CSV file in Cloud Storage that contains the examples you want to get predictions on. You'll create this CSV file in your notebook and copy it to Cloud Storage using the `gcloud storage` command line tool.

### Copying batch prediction examples to Cloud Storage
The following file contains 3 examples from each class in your beans dataset. The file doesn't include the `Class` column, since that is what your model will be predicting. Run the following to create this CSV file locally in your notebook:

In [89]:
%%writefile batch_examples.csv
Area,Perimeter,MajorAxisLength,MinorAxisLength,AspectRation,Eccentricity,ConvexArea,EquivDiameter,Extent,Solidity,roundness,Compactness,ShapeFactor1,ShapeFactor2,ShapeFactor3,ShapeFactor4
23288,558.113,207.567738,143.085693,1.450653336,0.7244336162,23545,172.1952453,0.8045881703,0.9890847314,0.9395021523,0.8295857874,0.008913077034,0.002604069884,0.6882125787,0.9983578734
23689,575.638,205.9678003,146.7475015,1.403552348,0.7016945718,24018,173.6714472,0.7652721693,0.9863019402,0.8983750474,0.8431970773,0.00869465998,0.002711119968,0.7109813112,0.9978994889
23727,559.503,189.7993849,159.3717704,1.190922235,0.5430731512,24021,173.8106863,0.8037601626,0.9877607094,0.952462433,0.9157600082,0.007999299741,0.003470231343,0.8386163926,0.9987269085
31158,641.105,212.0669751,187.1929601,1.132879009,0.4699241567,31474,199.1773023,0.7813134733,0.989959967,0.9526231013,0.9392188582,0.0068061806,0.003267009878,0.8821320637,0.9993488983
32514,649.012,221.4454899,187.1344232,1.183349841,0.5346736437,32843,203.4652564,0.7849831,0.9899826447,0.9700068737,0.9188051492,0.00681077351,0.002994124691,0.8442029022,0.9989873701
33078,659.456,235.5600775,178.9312328,1.316483846,0.6503915309,33333,205.2223615,0.7877214708,0.9923499235,0.9558229607,0.8712102818,0.007121351881,0.002530662194,0.7590073551,0.9992209221
33680,683.09,256.203255,167.9334938,1.525623324,0.7552213942,34019,207.081404,0.80680321,0.9900349805,0.9070392732,0.8082699962,0.007606985006,0.002002710402,0.6533003868,0.9966903078
33954,716.75,277.3684803,156.3563259,1.773951126,0.825970469,34420,207.9220419,0.7994819873,0.9864613597,0.8305492781,0.7496238998,0.008168948587,0.001591181142,0.5619359911,0.996846984
36322,719.437,272.0582306,170.8914975,1.591993952,0.7780978465,36717,215.0502424,0.7718560075,0.9892420405,0.8818487005,0.7904566678,0.007490177594,0.001803782407,0.6248217437,0.9947124371
36675,742.917,285.8908964,166.8819538,1.713132487,0.8119506999,37613,216.0927123,0.7788277766,0.9750618137,0.8350248381,0.7558572692,0.0077952528,0.001569528272,0.5713202115,0.9787472145
37454,772.679,297.6274753,162.1493177,1.835514817,0.8385619338,38113,218.3756257,0.8016695205,0.9827093118,0.7883332637,0.7337213257,0.007946480356,0.001420623993,0.5383469838,0.9881438654
37789,766.378,313.5680678,154.3409867,2.031657789,0.8704771226,38251,219.3500608,0.7805870567,0.9879218844,0.8085170916,0.6995293312,0.008297866252,0.001225659709,0.4893412853,0.9941740339
47883,873.536,327.9986493,186.5201272,1.758516115,0.822571799,48753,246.9140116,0.7584464543,0.9821549443,0.7885506623,0.7527897207,0.006850002074,0.00135695419,0.5666923636,0.9965376533
49777,861.277,300.7570338,211.6168613,1.42123379,0.7105823885,50590,251.7499649,0.8019106536,0.9839296304,0.843243269,0.8370542883,0.00604208839,0.001829706116,0.7006598815,0.9958014989
49882,891.505,357.1890036,179.8346914,1.986207449,0.8640114945,51042,252.0153467,0.7260210171,0.9772736178,0.7886896753,0.7055518063,0.007160679276,0.001094585314,0.4978033513,0.9887407248
53249,919.923,325.3866286,208.9174205,1.557489212,0.7666552108,54195,260.3818974,0.6966846347,0.9825445152,0.7907120655,0.8002231025,0.00611066177,0.001545654241,0.6403570138,0.9973491406
61129,964.969,369.3481688,210.9473449,1.750902193,0.8208567513,61796,278.9836198,0.7501135067,0.9892064211,0.8249553283,0.7553404711,0.006042110436,0.001213219664,0.5705392272,0.9989583843
61918,960.372,353.1381442,224.0962377,1.575832543,0.7728529173,62627,280.7782864,0.7539207091,0.9886790043,0.8436218213,0.7950947556,0.005703319619,0.00140599258,0.6321756704,0.9962029945
141953,1402.05,524.2311633,346.3974998,1.513380332,0.7505863011,143704,425.1354762,0.7147107987,0.9878152313,0.9074598849,0.8109694843,0.003692991084,0.0009853172185,0.6576715044,0.9953071199
145285,1440.991,524.9567463,353.0769977,1.486805285,0.7400216694,146709,430.0960442,0.7860466375,0.9902937107,0.8792413513,0.8192980608,0.003613289371,0.001004269363,0.6712493125,0.9980170255
146153,1476.383,526.1933264,356.528288,1.475881001,0.7354662103,149267,431.3789276,0.7319360978,0.9791380546,0.8425962592,0.8198107159,0.003600290972,0.001003163512,0.6720896099,0.991924286

Writing batch_examples.csv
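Before uploading, it can be worth confirming that the file has no label column and that every row has as many values as the header. The following is a small sketch using the standard `csv` module. The function name `summarize_batch_csv` is hypothetical, and the two-column sample is a shortened stand-in for the real file.

```python
import csv
import io


def summarize_batch_csv(text, label_column="Class"):
    """Return (column_count, row_count) after basic consistency checks."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if label_column in header:
        raise ValueError(f"Unexpected label column {label_column!r} in header")
    rows = [row for row in reader if row]  # skip blank lines
    for number, row in enumerate(rows, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"Row {number} has {len(row)} values, expected {len(header)}"
            )
    return len(header), len(rows)


# Shortened sample with the first two columns of the first two rows.
sample = "Area,Perimeter\n23288,558.113\n23689,575.638\n"
print(summarize_batch_csv(sample))  # → (2, 2)
```

To check the real file, pass `open("batch_examples.csv").read()`; for the file above that should report 16 columns and 21 rows.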


Then, copy the file to your Cloud Storage bucket:

In [90]:
!gcloud storage cp batch_examples.csv $BUCKET_NAME

Copying file://batch_examples.csv [Content-Type=text/csv]...
/ [1 files][  4.0 KiB/  4.0 KiB]                                                
Operation completed over 1 objects/4.0 KiB.                                      


You'll reference this file in the next step when you define your pipeline.

### Building a pipeline with pre-built components
Now that your training code is in the cloud, you're ready to call it from your pipeline. The pipeline you'll define will use three pre-built components from the `google_cloud_pipeline_components` library you installed earlier. These predefined components simplify the code you need to write to set up your pipeline, and let you use Vertex AI services like model training and batch prediction.

If you can't find a pre-built component for the task you want to accomplish, you can define your own Python-based custom components. To see an example, check out [this codelab](https://codelabs.developers.google.com/vertex-pipelines-intro#5).

Here's what your three-step pipeline will do:

* Create a managed dataset in Vertex AI.
* Run a training job on Vertex AI using the custom container you set up.
* Run a batch prediction job on your trained Scikit-learn classification model.

### Define your pipeline
Because you're using pre-built components, you can set up your entire pipeline in the pipeline definition.

In [91]:
@pipeline(name="automl-beans-custom", pipeline_root=PIPELINE_ROOT)
def pipeline(
    bq_source: str = "bq://sara-vertex-demos.beans_demo.large_dataset",
    bucket: str = BUCKET_NAME,
    project: str = PROJECT_ID,
    gcp_region: str = REGION,
    bq_dest: str = "",
    container_uri: str = "",
    batch_destination: str = ""
):
    dataset_create_op = gcc_aip.TabularDatasetCreateOp(
        display_name="tabular-beans-dataset",
        bq_source=bq_source,
        project=project,
        location=gcp_region
    )

    training_op = gcc_aip.CustomContainerTrainingJobRunOp(
        display_name="pipeline-beans-custom-train",
        container_uri=container_uri,
        project=project,
        location=gcp_region,
        dataset=dataset_create_op.outputs["dataset"],
        staging_bucket=bucket,
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        bigquery_destination=bq_dest,
        model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.0-24:latest",
        model_display_name="scikit-beans-model-pipeline",
        machine_type="n1-standard-4",
    )
    batch_predict_op = gcc_aip.ModelBatchPredictOp(
        project=project,
        location=gcp_region,
        job_display_name="beans-batch-predict",
        model=training_op.outputs["model"],
        gcs_source_uris=["{0}/batch_examples.csv".format(BUCKET_NAME)],
        instances_format="csv",
        gcs_destination_output_uri_prefix=batch_destination,
        machine_type="n1-standard-4"
    )
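Nothing in the definition above states the step order explicitly. The order follows from the data dependencies: the training step consumes `dataset_create_op.outputs["dataset"]`, and the batch prediction step consumes `training_op.outputs["model"]`. As an illustration only (this is not how Kubeflow Pipelines is implemented), the same ordering can be derived from a dependency map with the standard library's `graphlib`, available in Python 3.9 and later:

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose outputs it consumes.
dependencies = {
    "dataset_create_op": set(),
    "training_op": {"dataset_create_op"},
    "batch_predict_op": {"training_op"},
}

execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
# → ['dataset_create_op', 'training_op', 'batch_predict_op']
```

Because the three steps form a single chain, there is only one valid order, which is why the pipeline graph in the console appears as a straight line.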

### Compile and run the pipeline
With your pipeline defined, you're ready to compile it. The following will generate a JSON file that you'll use to run the pipeline:

In [92]:
compiler.Compiler().compile(
    pipeline_func=pipeline, package_path="custom_train_pipeline.json"
)



Next, create a `TIMESTAMP` variable. You'll use this in your job ID:

In [93]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

Then define your pipeline job, passing in a few project-specific parameters:

In [94]:
pipeline_job = aiplatform.PipelineJob(
    display_name="custom-train-pipeline",
    template_path="custom_train_pipeline.json",
    job_id="custom-train-pipeline-{0}".format(TIMESTAMP),
    parameter_values={
        "project": PROJECT_ID,
        "bucket": BUCKET_NAME,
        "bq_dest": "bq://{0}".format(PROJECT_ID),
        "container_uri": "gcr.io/{0}/scikit:v1".format(PROJECT_ID),
        "batch_destination": "{0}/batchpredresults".format(BUCKET_NAME)
    },
    enable_caching=True,
)
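The job ID and the parameter values are all derived from the project ID, the bucket URI, and the timestamp. As a minimal sketch, they could be assembled in one helper so the derived strings stay consistent. The function name `build_job_config` and the example project and bucket are hypothetical; the string formats mirror the cell above.

```python
from datetime import datetime


def build_job_config(project_id, bucket_uri, now=None):
    """Return the job ID and parameter values derived from project and bucket."""
    timestamp = (now or datetime.now()).strftime("%Y%m%d%H%M%S")
    return {
        "job_id": f"custom-train-pipeline-{timestamp}",
        "parameter_values": {
            "project": project_id,
            "bucket": bucket_uri,
            "bq_dest": f"bq://{project_id}",
            "container_uri": f"gcr.io/{project_id}/scikit:v1",
            "batch_destination": f"{bucket_uri}/batchpredresults",
        },
    }


# Fixed timestamp and made-up names so the output is reproducible.
config = build_job_config(
    "example-project",
    "gs://example-project-bucket",
    now=datetime(2022, 4, 12, 15, 29, 31),
)
print(config["job_id"])  # → custom-train-pipeline-20220412152931
```

Including the timestamp in the job ID gives each run a distinct ID, so you can rerun the cell without colliding with an earlier job.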

Finally, run the job to create a new pipeline execution:

In [95]:
# TODO 3: Your code here

INFO:google.cloud.aiplatform.pipeline_jobs:Creating PipelineJob
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob created. Resource name: projects/66861668564/locations/us-central1/pipelineJobs/custom-train-pipeline-20220412152931
INFO:google.cloud.aiplatform.pipeline_jobs:To use this PipelineJob in another session:
INFO:google.cloud.aiplatform.pipeline_jobs:pipeline_job = aiplatform.PipelineJob.get('projects/66861668564/locations/us-central1/pipelineJobs/custom-train-pipeline-20220412152931')
INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/custom-train-pipeline-20220412152931?project=66861668564


After running this cell, you should see logs with a link to view the pipeline run in your console. Navigate to that link. You can also access it by opening your [Pipelines dashboard](https://console.cloud.google.com/vertex-ai/pipelines). This pipeline will take __35-40 minutes__ to run, but you can continue to the next step before it completes. Next you'll learn more about what's happening in each of these pipeline steps.

For further instructions, please refer to the lab manual.