**Check environment dependencies**

In [1]:
! python3 -c "import kfp; print('KFP SDK version: {}'.format(kfp.__version__))"
! python3 -c "import google_cloud_pipeline_components; print('google_cloud_pipeline_components version: {}'.format(google_cloud_pipeline_components.__version__))"
! python3 -c "import sklearn; print('Sklearn version: {}'.format(sklearn.__version__))"

KFP SDK version: 1.8.14
google_cloud_pipeline_components version: 1.0.26
Sklearn version: 1.0.2
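
The three shell commands above fail with a traceback if a package is missing. As an alternative, here is a minimal sketch that reads the installed versions from package metadata and reports missing packages as `None`. The helper name `installed_versions` is hypothetical, and the lookup uses PyPI distribution names, which differ from the import names used above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI (not the import names).
REQUIRED = ["kfp", "google-cloud-pipeline-components", "scikit-learn"]

for package, version in installed_versions(REQUIRED).items():
    print(f"{package}: {version if version is not None else 'NOT INSTALLED'}")
```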


In [8]:
from datetime import datetime

# `aip` and `aiplatform` refer to the same module; both names are used below.
import google.cloud.aiplatform as aip
from google.cloud import aiplatform
import kfp
from kfp.v2 import dsl, compiler

# custom code for data processing and model training
from utils import create_data, train_model

**Define environment variables**

Update <code>BUCKET_NAME</code> to a bucket you own. <code>PROJECT_ID</code> is read from the active gcloud configuration.

In [9]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET_NAME = "black-friday-dataset-test"  # modify
BUCKET_URI = f"gs://{BUCKET_NAME}"
REGION = "us-central1"
PIPELINE_ROOT = f"{BUCKET_URI}/pipeline_root/black_friday"
DISPLAY_NAME = f"black-friday-{TIMESTAMP}"
PACKAGE_PATH = "pipeline.json"
project_id_shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
PROJECT_ID = project_id_shell_output[0]
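
Indexing the shell output with `[0]` raises an `IndexError`, or silently yields an empty string, when no project is set in the gcloud configuration. The following sketch shows one way to fail with a clearer message. The helper name `resolve_project_id` is an assumption and is not part of this notebook's `utils` module.

```python
def resolve_project_id(shell_lines):
    """Return the first non-empty line of captured shell output.

    Raises ValueError when gcloud printed nothing, which usually means no
    default project is configured.
    """
    for line in shell_lines:
        stripped = line.strip()
        if stripped:
            return stripped
    raise ValueError(
        "No project found in the gcloud configuration; "
        "run `gcloud config set project <PROJECT_ID>` first."
    )


# In the notebook this would be called as:
#   PROJECT_ID = resolve_project_id(project_id_shell_output)
print(resolve_project_id(["", "mwpmltr"]))
```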

## Data processing and model training

**Initialize the client**

In [10]:
aip.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

**Define the pipeline**

In [5]:
@dsl.pipeline(
    pipeline_root=PIPELINE_ROOT,
    name="black-friday-pipeline",
)
def pipeline(
    train_file_x: str,
    train_file_y: str,
    test_file_x: str,
    test_file_y: str,
    best_params_file: str,
    metrics_file: str,
    num_iterations: int,
    hp_tune: bool,
):

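    # Create the train/test files; use the configured project rather than a hard-coded ID.
    create_data_task = create_data(
        project_id=PROJECT_ID, bucket_name=BUCKET_NAME, dataset_id="black_friday"
    )

    # Train on the files produced by the data step.
    train_model_task = train_model(
        hp_tune=hp_tune,
        project_id=PROJECT_ID,
        bucket_name=BUCKET_NAME,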
        num_iterations=num_iterations,
        train_file_x=create_data_task.outputs["train_file_x"],
        test_file_x=create_data_task.outputs["test_file_x"],
        train_file_y=create_data_task.outputs["train_file_y"],
        test_file_y=create_data_task.outputs["test_file_y"],
    )

In [6]:
compiler.Compiler().compile(pipeline_func=pipeline, package_path=PACKAGE_PATH)



**Submit the pipeline to Vertex AI Pipeline**

In [7]:
job = aip.PipelineJob(
    display_name=DISPLAY_NAME,
    template_path=PACKAGE_PATH,
    pipeline_root=PIPELINE_ROOT,
    parameter_values={
        "train_file_x": "x_train.csv",
        "train_file_y": "y_train.csv",
        "test_file_x": "x_test.csv",
        "test_file_y": "y_test.csv",
        "best_params_file": "best_params.json",
        "metrics_file": "metrics.json",
        "num_iterations": 100,
        "hp_tune": True,
    },
)

job.run()

Creating PipelineJob
PipelineJob created. Resource name: projects/55590906972/locations/us-central1/pipelineJobs/black-friday-pipeline-20221102205557
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/55590906972/locations/us-central1/pipelineJobs/black-friday-pipeline-20221102205557')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/black-friday-pipeline-20221102205557?project=55590906972
PipelineJob projects/55590906972/locations/us-central1/pipelineJobs/black-friday-pipeline-20221102205557 current state:
PipelineState.PIPELINE_STATE_PENDING
PipelineJob projects/55590906972/locations/us-central1/pipelineJobs/black-friday-pipeline-20221102205557 current state:
PipelineState.PIPELINE_STATE_PENDING
PipelineJob projects/55590906972/locations/us-central1/pipelineJobs/black-friday-pipeline-20221102205557 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/55590906972/location
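
A misspelled or missing key in `parameter_values` only surfaces once the job is submitted. The sketch below compares a parameter dictionary against a function signature before submission. It uses a stand-in function with the same parameters as `pipeline` above, and it assumes the decorated pipeline function still exposes its original Python signature, which is worth confirming for your KFP version. The names `check_parameter_values` and `pipeline_stub` are hypothetical.

```python
import inspect


def check_parameter_values(pipeline_func, parameter_values):
    """Return (missing, unexpected) parameter names, each as a sorted list."""
    params = inspect.signature(pipeline_func).parameters
    required = {
        name for name, p in params.items() if p.default is inspect.Parameter.empty
    }
    supplied = set(parameter_values)
    missing = sorted(required - supplied)
    unexpected = sorted(supplied - set(params))
    return missing, unexpected


# Stand-in with the same parameters as the pipeline function defined earlier.
def pipeline_stub(
    train_file_x: str,
    train_file_y: str,
    test_file_x: str,
    test_file_y: str,
    best_params_file: str,
    metrics_file: str,
    num_iterations: int,
    hp_tune: bool,
):
    pass


example_values = {
    "train_file_x": "x_train.csv",
    "train_file_y": "y_train.csv",
    "test_file_x": "x_test.csv",
    "test_file_y": "y_test.csv",
    "best_params_file": "best_params.json",
    "metrics_file": "metrics.json",
    "num_iterations": 100,
    "hp_tune": True,
}

missing, unexpected = check_parameter_values(pipeline_stub, example_values)
print("missing:", missing, "unexpected:", unexpected)
```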

**Copy the model to your local directory**

Find the model file's URI in the Vertex AI Pipelines UI: select the model artifact of the run and follow its path to the model file.

In [8]:
! gsutil cp gs://black-friday-dataset-test/pipeline_root/black_friday/55590906972/black-friday-pipeline-20221102205557/train-model_-8375679907920871424/model_file.pkl model.pkl



Copying gs://black-friday-dataset-test/pipeline_root/black_friday/55590906972/black-friday-pipeline-20221102205557/train-model_-8375679907920871424/model_file.pkl...
==> NOTE: You are downloading one or more large file(s), which would            
run significantly faster if you enabled sliced object downloads. This
feature is enabled by default but requires that compiled crcmod be
installed (see "gsutil help crcmod").

| [1 files][519.2 MiB/519.2 MiB]   1001 KiB/s                                   
Operation completed over 1 objects/519.2 MiB.                                    
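
If the copy is done from Python rather than with `gsutil`, storage clients typically need the bucket name and the object path as separate values. This is a small sketch of splitting a `gs://` URI with the standard library only; the helper name `split_gcs_uri` is hypothetical.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# The model URI copied in the cell above.
MODEL_URI = (
    "gs://black-friday-dataset-test/pipeline_root/black_friday/55590906972/"
    "black-friday-pipeline-20221102205557/train-model_-8375679907920871424/"
    "model_file.pkl"
)

bucket, object_path = split_gcs_uri(MODEL_URI)
print(bucket)
print(object_path)
```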


## Model Deployment

**Set environment variables for command line arguments**

Custom prediction routines require a Docker image. The variables below are combined into the image's Artifact Registry path.

In [5]:
%env PROJECT_ID={PROJECT_ID}
%env REGION={REGION}
%env REPOSITORY=black-friday-v1
%env IMAGE=black-friday-image

env: PROJECT_ID=mwpmltr
env: REGION=us-central1
env: REPOSITORY=black-friday-v1
env: IMAGE=black-friday-image
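
The build, push, and upload commands that follow all assemble the same image path from these four variables. As a sketch, the same path can be built once in Python so it is easy to inspect. The function name is hypothetical, and the explicit `latest` tag is an assumption that matches Docker's default when no tag is given.

```python
def artifact_registry_image_uri(region, project_id, repository, image, tag="latest"):
    """Build a Docker image URI of the form REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


image_uri = artifact_registry_image_uri(
    region="us-central1",
    project_id="mwpmltr",
    repository="black-friday-v1",
    image="black-friday-image",
)
print(image_uri)
```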


In [6]:
# build image
!docker build --tag=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE .

Sending build context to Docker daemon  544.5MB
Step 1/6 : FROM python:3.9-slim
 ---> f550e60adaa9
Step 2/6 : WORKDIR /app
 ---> Using cache
 ---> 93c16a38e41e
Step 3/6 : COPY . /app
 ---> 24e1f2494e63
Step 4/6 : RUN pip3 install scikit-learn==1.1.3 gunicorn flask flask-cors
 ---> Running in 2ff0371183d3
Collecting scikit-learn==1.1.3
  Downloading scikit_learn-1.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (30.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 30.8/30.8 MB 44.2 MB/s eta 0:00:00
Collecting gunicorn
  Downloading gunicorn-20.1.0-py3-none-any.whl (79 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.5/79.5 KB 12.6 MB/s eta 0:00:00
Collecting flask
  Downloading Flask-2.2.2-py3-none-any.whl (101 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 101.5/101.5 KB 14.3 MB/s eta 0:00:00
Collecting flask-cors
  Downloading Flask_Cors-3.0.10-py2.py3-none-any.whl (14 kB)
Collecting scipy>=1.3.2
  Downloading scipy-1.9.3-cp39-cp39-manylinux_2_17_x86_64.manylinux20

In [11]:
# import os 
# os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'rtc_service_account_key2.json'
# ! gcloud auth activate-service-account rtcrichard@mwpmltr.iam.gserviceaccount.com --key-file=rtc_service_account_key2.json

In [12]:
# create repository in artifact repository
! gcloud artifacts repositories create $REPOSITORY  \
                             --repository-format=docker \
                             --location=$REGION

ERROR: (gcloud.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [7]:
# push docker image to the newly created artifact repository
! docker push $REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE

Using default tag: latest
The push refers to repository [us-central1-docker.pkg.dev/mwpmltr/black-friday-v1/black-friday-image]

e4e6a48f: Preparing
1f48f276: Preparing
a62c7f75: Preparing
e1641be0: Preparing
e0b737bd: Preparing
be188a46: Preparing
b8372e59: Preparing
e4e6a48f: Pushed

In [8]:
# upload model to Vertex AI  model registry
! gcloud ai models upload \
  --region=$REGION \
  --display-name=black-friday-model \
  --container-image-uri=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE \
  --container-ports=5005 \
  --container-health-route=/healthz \
  --container-predict-route=/predict

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Waiting for operation [3834895690652188672]...done.                            
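
The `--container-health-route` and `--container-predict-route` flags tell Vertex AI which paths of the container to call. Prediction requests arrive as JSON with an `instances` list, and the response is expected to be JSON with a `predictions` list. The server code inside the image is not shown in this notebook, so the following is only a framework-free sketch of that request/response contract. `handle_predict`, `handle_health`, and `row_sum_model` are hypothetical names, and the stand-in model is not the trained model.

```python
import json


def handle_health():
    """Health route: any HTTP 200 response signals that the container is ready."""
    return 200, b"ok"


def handle_predict(request_body, predict_fn):
    """Predict route: read {"instances": [...]} and return {"predictions": [...]}."""
    payload = json.loads(request_body)
    instances = payload["instances"]
    predictions = predict_fn(instances)
    return json.dumps({"predictions": predictions}).encode("utf-8")


# Stand-in for the real model: one number per instance.
def row_sum_model(rows):
    return [sum(row) for row in rows]


response = handle_predict(b'{"instances": [[1, 2], [3, 4]]}', row_sum_model)
print(response.decode("utf-8"))
```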


In [9]:
# list models to double check
!gcloud ai models list \
  --region=us-central1 \
  --filter=display_name=black-friday-model

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
MODEL_ID             DISPLAY_NAME
2448194580638597120  black-friday-model
1954628208976461824  black-friday-model


In [10]:
# create a Vertex AI endpoint
!gcloud ai endpoints create \
  --region=us-central1 \
  --display-name=black-friday

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Waiting for operation [4693957319573110784]...done.                            
Created Vertex AI endpoint: projects/55590906972/locations/us-central1/endpoints/3469676067214589952.


**Deploy the model to the endpoint**

The endpoint ID comes from the output of the previous cell; the model ID comes from the model list above.

In [11]:

!gcloud ai endpoints deploy-model 3469676067214589952 \
  --region=us-central1 \
  --model=2448194580638597120 \
  --display-name=black-friday-model \
  --machine-type=n1-standard-4 \
  --min-replica-count=1 \
  --max-replica-count=2 

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Waiting for operation [5178094279515439104]...done.                            
Deployed a model to the endpoint 3469676067214589952. Id of the deployed model: 3031331567545876480.
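
The endpoint ID in the deploy command was copied by hand from the output of the endpoint creation cell. As a sketch of a less error-prone route, the ID can be extracted from the resource name in that output with a regular expression. The helper name `parse_endpoint_resource` is hypothetical.

```python
import re

# Matches resource names such as projects/<project>/locations/<location>/endpoints/<id>
_ENDPOINT_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/endpoints/(?P<endpoint>\d+)"
)


def parse_endpoint_resource(text):
    """Extract project, location, and endpoint ID from text containing a resource name."""
    match = _ENDPOINT_RE.search(text)
    if match is None:
        raise ValueError(f"no endpoint resource name found in: {text!r}")
    return match.groupdict()


created_line = (
    "Created Vertex AI endpoint: "
    "projects/55590906972/locations/us-central1/endpoints/3469676067214589952."
)
parts = parse_endpoint_resource(created_line)
print(parts["endpoint"])
```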


## Sample Prediction

In [1]:
import pandas as pd
from google.cloud import aiplatform
data = pd.read_csv("gs://black-friday-dataset-test/pipeline_root/black_friday/55590906972/black-friday-pipeline-20221102205557/create-data_847692128933904384/test_file_x", header=None)

In [5]:
ENDPOINT_ID = "3469676067214589952"
PROJECT_ID = "55590906972"  # numeric project number
REGION = "us-central1"

In [11]:
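def endpoint_predict_sample(
    project: str, location: str, instances: list, endpoint: str
):
    """Send `instances` to the Vertex AI endpoint with ID `endpoint` and return the response."""
    aiplatform.init(project=project, location=location)

    # Keep the endpoint ID string and the client object under separate names.
    endpoint_client = aiplatform.Endpoint(endpoint)

    prediction = endpoint_client.predict(instances=instances)
    print(prediction)
    return prediction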

In [12]:
endpoint_predict_sample(project=PROJECT_ID, location=REGION, instances=data.iloc[0].tolist(), endpoint=ENDPOINT_ID)

Prediction(predictions=[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]], deployed_model_id='3031331567545876480', model_version_id='1', model_resource_name='projects/55590906972/locations/us-central1/models/2448194580638597120', explanations=None)
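
`endpoint.predict` sends its `instances` argument as a list with one element per instance. The call above passes `data.iloc[0].tolist()`, a flat list of feature values. Whether that is read as one row or as many single-value instances depends on the container's `/predict` handler, which is not shown here. If the handler expects one list per row, a single row can be wrapped before sending. This is a sketch under that assumption; `as_instances` is a hypothetical helper.

```python
def as_instances(rows):
    """Return a list of rows, wrapping a single flat row in an outer list."""
    rows = list(rows)
    if rows and not isinstance(rows[0], (list, tuple)):
        # A flat list of feature values: treat it as exactly one instance.
        return [rows]
    return [list(row) for row in rows]


print(as_instances([0.5, 1.0, 3.0]))
print(as_instances([[0.5, 1.0], [2.0, 4.0]]))
```

With the DataFrame above, this would be used as `instances=as_instances(data.iloc[0].tolist())`.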