
---
---
**Changes In Progress**

Every attempt is being made to ensure the public version of this notebook runs without error while the following enhancements are being made:
- The workflow of the notebook is being adapted to a Kubeflow Pipeline running on Vertex AI Pipelines
- Including Evaluation data within Vertex AI Model Registry
- Update the Vertex AI Experiments integration, which has been greatly simplified within the API over the past few months
- Add client library reference links to each section

Order: notebooks 05 and 05a will be updated first, followed shortly by 05b-05i.

This note will be removed once these changes are complete.

---
---

# 05f - Vertex AI > Training > Training Pipelines - With Custom Container

**05 Series Overview**

>**NOTE:** The notebooks in the `05 - TensorFlow` series demonstrate training, serving and operations for TensorFlow models and take advantage of [Vertex AI TensorBoard](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview) to track training across experiments.  Running these notebooks will create a Vertex AI TensorBoard instance.  Before August 2023 this carried a subscription cost; it is now priced based on storage, and this notebook creates only a minimal amount (<2MB).  See [Vertex AI Pricing](https://cloud.google.com/vertex-ai/pricing#tensorboard).

Where a model gets trained is where it consumes computing resources.  With Vertex AI, you have choices for configuring the computing resources available at training.  The environment running this notebook is itself one example of an execution environment: when it was set up, there were choices for machine type and accelerators (GPUs).  

In the [05 - Vertex AI Custom Model - TensorFlow - in Notebook](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20in%20Notebook.ipynb) notebook, the model training happened directly in the notebook.  The model was then imported to Vertex AI and deployed to an endpoint for online predictions. 

In this `05a-05i` series of demonstrations, the same model is trained using managed computing resources in Vertex AI Training as managed jobs.  These jobs will be demonstrated as:

-  Custom Job that trains and saves (to GCS) a model from a Python script (`05a`), Python source distribution (`05b`), and custom container (`05c`)
-  Training Pipeline that trains and registers a model from a Python script (`05d`), Python source distribution (`05e`), and custom container (`05f`)
-  Hyperparameter Tuning Jobs from a Python script (`05g`), Python source distribution (`05h`), and custom container (`05i`)

**This Notebook (`05f`): An extension of `05c` as a Training Pipeline that saves the model to Vertex AI > Model Registry**

This notebook trains the same TensorFlow Keras model from [05 - Vertex AI Custom Model - TensorFlow - in Notebook](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20in%20Notebook.ipynb) by first modifying and saving the training code to a Python script as shown in [05 - Vertex AI Custom Model - TensorFlow - Notebook to Script](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Notebook%20to%20Script.ipynb). 

A custom container is built that contains the script and required Python libraries.  For more guidance on this process, visit the tip notebook [Python Custom Containers](../Tips/Python%20Custom%20Containers.ipynb).  The script, along with a `requirements.txt` and a `Dockerfile`, is stored in GCS and then used by Cloud Build to extend the pre-built Vertex AI container and store the resulting image in Artifact Registry.  

Compared to the Custom Job in `05c`, the primary differences for this Training Pipeline are:
- the final model is registered to Vertex AI > Model Registry
- the training pipeline triggers a similar custom job in Vertex AI > Training

This job is launched using the Vertex AI client library:
- [Python Cloud Client Libraries](https://cloud.google.com/python/docs/reference)
    - [google-cloud-aiplatform](https://cloud.google.com/python/docs/reference/aiplatform/latest)
        - [`aiplatform` package](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform)
            - [`aiplatform.CustomContainerTrainingJob()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomContainerTrainingJob#google_cloud_aiplatform_CustomContainerTrainingJob)
            - and run with the method [`.run()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomContainerTrainingJob#google_cloud_aiplatform_CustomContainerTrainingJob_run)

**Vertex AI Training**

In Vertex AI Training you can run your training code as a job where you specify the compute resources to use. For tips on preparing code, running training jobs, and workflows for building custom containers with software and training code combined, please visit these [tips notebooks](../Tips/readme.md) in this repository:
- [Python Packages](../Tips/Python%20Packages.ipynb)
- [Python Custom Containers](../Tips/Python%20Custom%20Containers.ipynb)
- [Python Training](../Tips/Python%20Training.ipynb)

<p align="center" width="100%">
    <img src="../architectures/overview/training.png" width="45%">
    &nbsp; &nbsp; &nbsp; &nbsp;
    <img src="../architectures/overview/training2.png" width="45%">
</p>

**Prerequisites:**
-  [01 - BigQuery - Table Data Source](../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb)
-  Understanding:
    -  Model overview in [05 - Vertex AI Custom Model - TensorFlow - in Notebook](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20in%20Notebook.ipynb)
    -  Convert notebook code to Python Script in [05 - Vertex AI Custom Model - TensorFlow - Notebook to Script](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Notebook%20to%20Script.ipynb)

**Resources:**
- [Vertex AI Custom Container For Training](https://cloud.google.com/vertex-ai/docs/training/containers-overview)

**Conceptual Flow & Workflow**

<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05f_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05f_console.png" width="45%">
</p>

---
## Setup

### Package Installs (if needed)

This notebook uses the Python Clients for
- Google Service Usage
    - to enable APIs (Artifact Registry and Cloud Build)
- Artifact Registry
    - to create repositories for Python packages and Docker containers
- Cloud Build
    - to build custom Docker containers

The cells below check whether the required Python libraries are installed.  If a library is missing, the cell prints a message and installs it with the associated pip command.  These installs must be completed before continuing with this notebook.

In [2]:
try:
    import google.cloud.service_usage_v1
except ImportError:
    print('You need to pip install google-cloud-service-usage')
    !pip install google-cloud-service-usage -q

In [3]:
try:
    import google.cloud.artifactregistry_v1
except ImportError:
    print('You need to pip install google-cloud-artifact-registry')
    !pip install google-cloud-artifact-registry -q

In [4]:
try:
    import google.cloud.devtools.cloudbuild
except ImportError:
    print('You need to pip install google-cloud-build')
    !pip install google-cloud-build

### Environment

inputs:

In [5]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [6]:
REGION = 'us-central1'
EXPERIMENT = '05f'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
BASE_IMAGE = 'gcr.io/deeplearning-platform-release/tf-cpu.2-3'
DEPLOY_IMAGE ='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'
TRAIN_COMPUTE = 'n1-standard-4'
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters
EPOCHS = 10
BATCH_SIZE = 100

packages:

In [7]:
from google.cloud import aiplatform
from datetime import datetime
import pkg_resources
from IPython.display import Markdown as md
from google.cloud import service_usage_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import artifactregistry_v1
from google.cloud import storage
from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np
import pandas as pd

clients:

In [12]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client(project=PROJECT_ID)
gcs = storage.Client(project=PROJECT_ID)
su_client = service_usage_v1.ServiceUsageClient()
ar_client = artifactregistry_v1.ArtifactRegistryClient()
cb_client = cloudbuild_v1.CloudBuildClient()

parameters:

In [13]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

In [14]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [15]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) then [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this permission to the service account with the provided instructions.

environment:

In [16]:
!rm -rf {DIR}
!mkdir -p {DIR}

Experiment Tracking:

In [17]:
FRAMEWORK = 'tf'
TASK = 'classification'
MODEL_TYPE = 'dnn'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

### Enable APIs

Using Cloud Build and Artifact Registry requires enabling these APIs for the Google Cloud Project.

There are two options for enabling these APIs; this notebook uses option 2:
 1. Use the APIs & Services page in the console: https://console.cloud.google.com/apis
     - `+ Enable APIs and Services`
     - Search for Cloud Build and Enable
     - Search for Artifact Registry and Enable
 2. Use [Google Service Usage](https://cloud.google.com/service-usage/docs) API from Python
     - [Python Client For Service Usage](https://github.com/googleapis/python-service-usage)
     - [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/serviceusage/latest)
     
The following code cells use the Service Usage Client to:
- get the state of the service
- if 'DISABLED':
    - Try enabling the service and return the state after trying
- if 'ENABLED' print the state for confirmation

#### Artifact Registry

In [18]:
artifactregistry = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
    )
).state.name


if artifactregistry == 'DISABLED':
    print(f'Artifact Registry is currently {artifactregistry} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Artifact Registry is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Artifact Registry already enabled for project: {PROJECT_ID}')

Artifact Registry already enabled for project: statmike-mlops-349915


#### Cloud Build

In [19]:
cloudbuild = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
    )
).state.name


if cloudbuild == 'DISABLED':
    print(f'Cloud Build is currently {cloudbuild} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Build is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Build already enabled for project: {PROJECT_ID}')

Cloud Build already enabled for project: statmike-mlops-349915
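The two cells above repeat the same check-then-enable logic for each API.  A minimal sketch of consolidating it into one function follows; `ensure_service_enabled` is a hypothetical helper (not part of `google-cloud-service-usage`), and it relies on the GAPIC convention that methods such as `get_service` and `enable_service` accept a plain dict in place of the request message.

```python
# Hypothetical helper: consolidates the check-then-enable logic from the cells above.
def ensure_service_enabled(su_client, project_id: str, service: str) -> str:
    """Enable `service` in `project_id` if it is DISABLED and return its final state name."""
    name = f'projects/{project_id}/services/{service}'
    # GAPIC client methods accept a dict in place of the request message
    state = su_client.get_service(request={'name': name}).state.name
    if state == 'DISABLED':
        print(f'{service} is currently {state} for project: {project_id}. Trying to enable...')
        # enable_service returns a long-running operation; result() blocks until it finishes
        response = su_client.enable_service(request={'name': name}).result()
        state = response.service.state.name
    print(f'{service} state for project {project_id}: {state}')
    return state
```

Usage, under the same assumptions: `for api in ['artifactregistry.googleapis.com', 'cloudbuild.googleapis.com']: ensure_service_enabled(su_client, PROJECT_ID, api)`.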


---
## Get Vertex AI Experiments Tensorboard Instance Name
[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview) has managed [TensorBoard](https://www.tensorflow.org/tensorboard) instances in which you can track TensorBoard experiments (a training run or a hyperparameter tuning sweep).  

The training job shows up as an experiment in the TensorBoard instance, named with the ID of the backing custom job.

This code checks whether a TensorBoard instance labeled with this series already exists in the project, retrieves it if so, and creates it otherwise:

In [20]:
tb = aiplatform.Tensorboard.list(filter=f"labels.series={SERIES}")
if tb:
    tb = tb[0]
else: 
    tb = aiplatform.Tensorboard.create(display_name = SERIES, labels = {'series' : f'{SERIES}'})

In [21]:
tb.resource_name

'projects/1026793852137/locations/us-central1/tensorboards/7876136041294331904'

---
## Setup Vertex AI Experiments

The code in this section initializes the experiment and associates it with the TensorBoard instance.  The run for this notebook (`RUN_NAME`) is created, or resumed if it already exists, inside the training script, where model training and evaluation information is logged to the experiment using:
- [.log_params](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_params)
- [.log_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics)
- [.log_time_series_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_time_series_metrics)

In [22]:
aiplatform.init(experiment = EXPERIMENT_NAME, experiment_tensorboard = tb.resource_name)

---
## Training

### Python File for Training

This notebook trains the same TensorFlow Keras model from [05 - Vertex AI Custom Model - TensorFlow - in Notebook](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20in%20Notebook.ipynb) by first modifying and saving the training code to a Python script as shown in [05 - Vertex AI Custom Model - TensorFlow - Notebook to Script](./05%20-%20Vertex%20AI%20Custom%20Model%20-%20TensorFlow%20-%20Notebook%20to%20Script.ipynb), which stores the script in [`./code/train.py`](./code/train.py).

**Review the script:**

In [23]:
SCRIPT_PATH = './code/train.py'

with open(SCRIPT_PATH, 'r') as file:
    data = file.read()
md(f"```python\n\n{data}\n```")

```python


# package import
from tensorflow.python.framework import dtypes
from tensorflow_io.bigquery import BigQueryClient
import tensorflow as tf
from google.cloud import bigquery
from google.cloud import aiplatform
import argparse
import os
import sys

# import argument to local variables
parser = argparse.ArgumentParser()
# the passed param, dest: a name for the param, default: if absent fetch this param from the OS, type: type to convert to, help: description of argument
parser.add_argument('--epochs', dest = 'epochs', default = 10, type = int, help = 'Number of Epochs')
parser.add_argument('--batch_size', dest = 'batch_size', default = 32, type = int, help = 'Batch Size')
parser.add_argument('--var_target', dest = 'var_target', type=str)
parser.add_argument('--var_omit', dest = 'var_omit', type=str, nargs='*')
parser.add_argument('--project_id', dest = 'project_id', type=str)
parser.add_argument('--bq_project', dest = 'bq_project', type=str)
parser.add_argument('--bq_dataset', dest = 'bq_dataset', type=str)
parser.add_argument('--bq_table', dest = 'bq_table', type=str)
parser.add_argument('--region', dest = 'region', type=str)
parser.add_argument('--experiment', dest = 'experiment', type=str)
parser.add_argument('--series', dest = 'series', type=str)
parser.add_argument('--experiment_name', dest = 'experiment_name', type=str)
parser.add_argument('--run_name', dest = 'run_name', type=str)
args = parser.parse_args()

# clients
bq = bigquery.Client(project = args.project_id)
aiplatform.init(project = args.project_id, location = args.region)

# Vertex AI Experiment
if args.run_name in [run.name for run in aiplatform.ExperimentRun.list(experiment = args.experiment_name)]:
    expRun = aiplatform.ExperimentRun(run_name = args.run_name, experiment = args.experiment_name)
else:
    expRun = aiplatform.ExperimentRun.create(run_name = args.run_name, experiment = args.experiment_name)
expRun.log_params({'experiment': args.experiment, 'series': args.series, 'project_id': args.project_id})

# get schema from bigquery source
query = f"SELECT * FROM {args.bq_project}.{args.bq_dataset}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{args.bq_table}'"
schema = bq.query(query).to_dataframe()

# get number of classes from bigquery source
nclasses = bq.query(query = f'SELECT DISTINCT {args.var_target} FROM {args.bq_project}.{args.bq_dataset}.{args.bq_table} WHERE {args.var_target} is not null').to_dataframe()
nclasses = nclasses.shape[0]
expRun.log_params({'data_source': f'bq://{args.bq_project}.{args.bq_dataset}.{args.bq_table}', 'nclasses': nclasses, 'var_split': 'splits', 'var_target': args.var_target})

# Make a list of columns to omit
OMIT = args.var_omit + ['splits']

# use schema to prepare a list of columns to read from BigQuery
selected_fields = schema[~schema.column_name.isin(OMIT)].column_name.tolist()

# all the columns in this data source are either float64 or int64
output_types = [dtypes.float64 if x=='FLOAT64' else dtypes.int64 for x in schema[~schema.column_name.isin(OMIT)].data_type.tolist()]

# remap input data to Tensorflow inputs of features and target
def transTable(row_dict):
    target = row_dict.pop(args.var_target)
    target = tf.one_hot(tf.cast(target, tf.int64), nclasses)
    target = tf.cast(target, tf.float32)
    return(row_dict, target)

# function to setup a bigquery reader with Tensorflow I/O
def bq_reader(split):
    reader = BigQueryClient()

    training = reader.read_session(
        parent = f"projects/{args.project_id}",
        project_id = args.bq_project,
        table_id = args.bq_table,
        dataset_id = args.bq_dataset,
        selected_fields = selected_fields,
        output_types = output_types,
        row_restriction = f"splits='{split}'",
        requested_streams = 3
    )
    
    return training

# setup feed for train, validate and test
train = bq_reader('TRAIN').parallel_read_rows().prefetch(1).map(transTable).shuffle(args.batch_size*10).batch(args.batch_size)
validate = bq_reader('VALIDATE').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
test = bq_reader('TEST').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
expRun.log_params({'training.batch_size': args.batch_size, 'training.shuffle': 10*args.batch_size, 'training.prefetch': 1})

# Logistic Regression

# model input definitions
feature_columns = {header: tf.feature_column.numeric_column(header) for header in selected_fields if header != args.var_target}
feature_layer_inputs = {header: tf.keras.layers.Input(shape = (1,), name = header) for header in selected_fields if header != args.var_target}

# feature columns to a Dense Feature Layer
feature_layer_outputs = tf.keras.layers.DenseFeatures(feature_columns.values(), name = 'feature_layer')(feature_layer_inputs)

# batch normalization of inputs
normalized = tf.keras.layers.BatchNormalization(name = 'batch_normalization_layer')(feature_layer_outputs)

# logistic - using softmax activation to nclasses
logistic = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'logistic')(normalized)

# the model
model = tf.keras.Model(
    inputs = feature_layer_inputs,
    outputs = logistic,
    name = args.experiment
)

# compile
model.compile(
    optimizer = tf.keras.optimizers.SGD(), #SGD or Adam
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = ['accuracy', tf.keras.metrics.AUC(curve = 'PR', name = 'auprc')]
)

# setup tensorboard logs and train
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'], histogram_freq=1)
history = model.fit(train, epochs = args.epochs, callbacks = [tensorboard_callback], validation_data = validate)
expRun.log_params({'training.epochs': history.params['epochs']})
for e in range(0, history.params['epochs']):
    expRun.log_time_series_metrics(
        {
            'train_loss': history.history['loss'][e],
            'train_accuracy': history.history['accuracy'][e],
            'train_auprc': history.history['auprc'][e],
            'val_loss': history.history['val_loss'][e],
            'val_accuracy': history.history['val_accuracy'][e],
            'val_auprc': history.history['val_auprc'][e]
        }
    )

# test evaluations:
loss, accuracy, auprc = model.evaluate(test)
expRun.log_metrics({'test_loss': loss, 'test_accuracy': accuracy, 'test_auprc': auprc})

# val evaluations:
loss, accuracy, auprc = model.evaluate(validate)
expRun.log_metrics({'val_loss': loss, 'val_accuracy': accuracy, 'val_auprc': auprc})

# training evaluations:
loss, accuracy, auprc = model.evaluate(train)
expRun.log_metrics({'train_loss': loss, 'train_accuracy': accuracy, 'train_auprc': auprc})

# output the model save files
model.save(os.getenv("AIP_MODEL_DIR"))
expRun.log_params({'model.save': os.getenv("AIP_MODEL_DIR")})
expRun.end_run()

```
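The script reads its output locations from the `AIP_TENSORBOARD_LOG_DIR` and `AIP_MODEL_DIR` environment variables, which Vertex AI Training sets for the job (here because `tensorboard` and `base_output_dir` are passed to `.run()`).  Outside Vertex AI, `os.environ['AIP_TENSORBOARD_LOG_DIR']` raises a `KeyError` and `os.getenv("AIP_MODEL_DIR")` returns `None`.  A minimal sketch of resolving both with local fallbacks for quick local testing; the fallback paths and the `resolve_output_dirs` name are hypothetical.

```python
import os

# Hypothetical local defaults; on Vertex AI Training the service sets these variables.
LOCAL_MODEL_DIR = './local_run/model'
LOCAL_TB_DIR = './local_run/logs'

def resolve_output_dirs(env=os.environ):
    """Return (model_dir, tensorboard_log_dir), falling back to local paths when unset."""
    model_dir = env.get('AIP_MODEL_DIR', LOCAL_MODEL_DIR)
    tb_dir = env.get('AIP_TENSORBOARD_LOG_DIR', LOCAL_TB_DIR)
    return model_dir, tb_dir
```

In the script, `log_dir=` for the TensorBoard callback and the path given to `model.save` could then come from `resolve_output_dirs()`, leaving behavior on Vertex AI unchanged.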

### Creating a Custom Container with Cloud Build

Cloud Build creates and manages the build on Google Cloud.  A build is created through the API by providing:
- the location of the source
- instructions
- a location to store the built artifacts

The instructions can be provided as:
- a Dockerfile
- a build config file (YAML or JSON)
- Cloud Native Buildpacks

This notebook uses the Python client for Cloud Build and does not reference any local files.  For that reason, the first step is to create a Dockerfile for the workflow and store it in GCS.  The next step runs Cloud Build, specifying the build config through the client rather than a config file.  The steps of the build config start by retrieving the code (git clone, or copy from GCS) along with the Dockerfile.  

There are many workflows for creating containers with ML training code, and many of the most common ones are explored in the tips notebook [Python Custom Containers](../Tips/Python%20Custom%20Containers.ipynb).  The method used here is the simplest: copy the training code directly into the container.  Other methods include packaging the training code as a Python distribution and using `pip install` to install it from GCS, GitHub, or even Artifact Registry as a private repository.

#### Store Resources in Cloud Storage

In [24]:
bucket = gcs.lookup_bucket(PROJECT_ID)
SOURCEPATH = f'{SERIES}/{EXPERIMENT}/training'

#### Copy Training Code

In [25]:
blob = bucket.blob(f'{SOURCEPATH}/{EXPERIMENT}_trainer/train.py')
blob.upload_from_filename(SCRIPT_PATH)

#### Create Requirements.txt File for Python

In [26]:
requirements = f"""tensorflow_io
google-cloud-aiplatform>={aiplatform.__version__}
protobuf=={pkg_resources.get_distribution('protobuf').version}
"""

In [27]:
blob = bucket.blob(f'{SOURCEPATH}/requirements.txt')
blob.upload_from_string(requirements)
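The cell above pins `protobuf` using `pkg_resources`, which setuptools has deprecated.  A minimal sketch of producing the same `requirements.txt` content with the standard-library `importlib.metadata` (Python 3.8+) instead; `build_requirements` and `installed_version` are hypothetical helper names.

```python
from importlib import metadata

def build_requirements(aiplatform_version: str, protobuf_version: str) -> str:
    """Render requirements.txt content with the same pins as the cell above."""
    return (
        'tensorflow_io\n'
        f'google-cloud-aiplatform>={aiplatform_version}\n'
        f'protobuf=={protobuf_version}\n'
    )

def installed_version(dist: str) -> str:
    # importlib.metadata replaces the deprecated pkg_resources.get_distribution lookup
    return metadata.version(dist)
```

In the notebook this would be called as `build_requirements(aiplatform.__version__, installed_version('protobuf'))`.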

#### Create the Dockerfile
A basic Dockerfile that starts from the base image, installs the packages in `requirements.txt`, copies in the training code, and defines an entrypoint (in this case, the Python module to run first).  Add more `RUN` entries to install additional packages.

In [28]:
dockerfile = f"""
FROM {BASE_IMAGE}
WORKDIR /training
# copy requirements and install them
COPY requirements.txt ./
RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt
## Copies the trainer code to the docker image
COPY {EXPERIMENT}_trainer/* ./{EXPERIMENT}_trainer/
## Sets up the entry point to invoke the trainer
ENTRYPOINT ["python", "-m", "{EXPERIMENT}_trainer.train"]
"""

In [29]:
blob = bucket.blob(f'{SOURCEPATH}/Dockerfile')
blob.upload_from_string(dockerfile)
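The `ENTRYPOINT` runs `python -m 05f_trainer.train` from `WORKDIR /training`.  Only `train.py` is copied into `05f_trainer/` (there is no `__init__.py`), so the module is found through namespace-package resolution, and a name starting with a digit is acceptable to the import machinery even though it could not appear in an `import` statement.  A minimal local sketch that approximates `python -m` with `runpy.run_module`, using a stand-in `train.py` rather than the real script; `check_entrypoint` is a hypothetical name.

```python
import importlib
import os
import runpy
import shutil
import sys
import tempfile

def check_entrypoint(package_dir: str, module: str = 'train') -> dict:
    """Approximate `python -m <package_dir>.<module>` from a temporary WORKDIR."""
    workdir = tempfile.mkdtemp()
    pkg_path = os.path.join(workdir, package_dir)
    os.makedirs(pkg_path)
    # Stand-in for train.py; no __init__.py is created, matching the Dockerfile COPY
    with open(os.path.join(pkg_path, f'{module}.py'), 'w') as f:
        f.write("STATUS = 'entrypoint ran'\n")
    sys.path.insert(0, workdir)  # python -m puts the current directory on sys.path
    importlib.invalidate_caches()
    try:
        # returns the executed module's globals
        return runpy.run_module(f'{package_dir}.{module}', run_name='__main__')
    finally:
        sys.path.remove(workdir)
        shutil.rmtree(workdir)
```

For example, `check_entrypoint('05f_trainer')['STATUS']` should be `'entrypoint ran'`.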

#### Setup Artifact Registry

Artifact Registry organizes artifacts into repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages, and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

##### List Repositories

This may be empty if no repositories have been created for this project.

In [30]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


#### Create Docker Image Repository

Create an Artifact Registry repository, named after the project ID, to hold the Docker images created by this notebook.  First, check whether it was already created by a previous run and retrieve it if so; otherwise, create it.

In [31]:
docker_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{PROJECT_ID}' == repo.name.split('/')[-1]:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {EXPERIMENT} experiment that holds docker images.',
                name = f'{PROJECT_ID}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915


In [32]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915',
 'DOCKER')

In [33]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"
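The cell above turns the repository resource name (`projects/P/locations/L/repositories/R`) into the Docker host path `L-docker.pkg.dev/P/R` that image tags use.  A minimal sketch of the same mapping with a sanity check on the input shape; `docker_repository_uri` is a hypothetical helper, and it assumes the resource name carries the project ID (as in the output above) rather than the project number.

```python
def docker_repository_uri(repo_resource_name: str) -> str:
    """Map projects/P/locations/L/repositories/R to L-docker.pkg.dev/P/R."""
    parts = repo_resource_name.split('/')
    # Expected layout: ['projects', P, 'locations', L, 'repositories', R]
    if (len(parts) != 6 or parts[0] != 'projects'
            or parts[2] != 'locations' or parts[4] != 'repositories'):
        raise ValueError(f'Unexpected repository resource name: {repo_resource_name}')
    project, location, repo = parts[1], parts[3], parts[5]
    return f'{location}-docker.pkg.dev/{project}/{repo}'
```

Applied to `docker_repo.name` from above, this yields the same prefix seen in the image tag later in this notebook: `us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915`.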

#### Build Custom Container
Use the Cloud Build client to construct and run the build instructions.  Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder containing the `Dockerfile`.  The resulting image is pushed to Artifact Registry (set up above).

In [34]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{PROJECT_ID}/{SOURCEPATH}/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{EXPERIMENT}_trainer', '/workspace']
    }    
)
# docker push
build.images = [f"{REPOSITORY}/{EXPERIMENT}_trainer"]

In [35]:
build

steps {
  name: "gcr.io/cloud-builders/gsutil"
  args: "cp"
  args: "-r"
  args: "gs://statmike-mlops-349915/05/05f/training/*"
  args: "/workspace"
}
steps {
  name: "gcr.io/cloud-builders/docker"
  args: "build"
  args: "-t"
  args: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05f_trainer"
  args: "/workspace"
}
images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05f_trainer"
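The printed build shows the two steps plus the `images` list, whose entries Cloud Build pushes once all steps succeed.  A minimal sketch of the same configuration expressed as a plain dict, which can be easier to review or reuse; `make_build_spec` is a hypothetical helper.

```python
def make_build_spec(source_uri: str, image_uri: str) -> dict:
    """Return the copy-from-GCS and docker-build steps plus the image to push."""
    return {
        'steps': [
            # step 1: copy Dockerfile, requirements.txt and trainer code into /workspace
            {'name': 'gcr.io/cloud-builders/gsutil',
             'args': ['cp', '-r', f'{source_uri}/*', '/workspace']},
            # step 2: build the image from the Dockerfile in /workspace
            {'name': 'gcr.io/cloud-builders/docker',
             'args': ['build', '-t', image_uri, '/workspace']},
        ],
        # images listed here are pushed after all steps complete successfully
        'images': [image_uri],
    }
```

Assuming the usual proto-plus behavior of accepting a mapping, `cloudbuild_v1.Build(make_build_spec(f'gs://{PROJECT_ID}/{SOURCEPATH}', f'{REPOSITORY}/{EXPERIMENT}_trainer'))` should produce the same message as the step-by-step construction above.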

In [36]:
operation = cb_client.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [37]:
response = operation.result()
response.status, response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/05f_trainer")

In [38]:
print(f"Review the Custom Container with Artifact Registry in the Google Cloud Console:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{docker_repo.name.split('/')[-1]}?project={PROJECT_ID}")

Review the Custom Container with Artifact Registry in the Google Cloud Console:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915?project=statmike-mlops-349915


### Setup Training Job

In [39]:
CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
    "--var_target=" + VAR_TARGET,
    # pass each omitted column as its own token so argparse (nargs='*') builds a list
    "--var_omit", *VAR_OMIT.split(),
    "--project_id=" + PROJECT_ID,
    "--bq_project=" + BQ_PROJECT,
    "--bq_dataset=" + BQ_DATASET,
    "--bq_table=" + BQ_TABLE,
    "--region=" + REGION,
    "--experiment=" + EXPERIMENT,
    "--series=" + SERIES,
    "--experiment_name=" + EXPERIMENT_NAME,
    "--run_name=" + RUN_NAME
]
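The comment on `VAR_OMIT` suggests adding more columns as a space-delimited string.  Because `train.py` declares `--var_omit` with `nargs='*'`, how that string reaches argparse matters: a single `--var_omit=a b` token stays one list element, while separate tokens become separate elements.  A minimal sketch using the same argument definition as the script; the column name `other_col` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
# same definition as in train.py
parser.add_argument('--var_omit', dest='var_omit', type=str, nargs='*')

# one token containing a space: argparse keeps it as a single list element
joined = parser.parse_args(['--var_omit=transaction_id other_col']).var_omit
# separate tokens: each column name becomes its own list element
split = parser.parse_args(['--var_omit', *'transaction_id other_col'.split()]).var_omit

print(joined)  # ['transaction_id other_col']
print(split)   # ['transaction_id', 'other_col']
```

Since the script builds `OMIT = args.var_omit + ['splits']` and filters columns by exact name, only the second form omits both columns.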

In [40]:
trainingJob = aiplatform.CustomContainerTrainingJob(
    display_name = f'{SERIES}_{EXPERIMENT}_{TIMESTAMP}',
    container_uri = f"{REPOSITORY}/{EXPERIMENT}_trainer",
    model_serving_container_image_uri = DEPLOY_IMAGE,
    staging_bucket = f"{URI}/models/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

### Run Training Job AND Upload The Model
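The cell below first checks the Vertex AI Model Registry for an existing model with this display name and labels.  If this `RUN_NAME` is already registered as a version alias, no training is run.  Otherwise the training job runs, uploads the model to the registry (as a new default version of the existing model, if there is one), and returns it as an `aiplatform.Model` object.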

In [41]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name
else:
    print('This is a new model, creating in model registry')
    parent_model = ''
    
if upload_model:
    model = trainingJob.run(
        model_display_name = f'{SERIES}_{EXPERIMENT}',
        model_labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'},
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model = parent_model,
        is_default_version = True,
        model_version_aliases = [RUN_NAME],
        model_version_description = RUN_NAME,
        base_output_dir = f"{URI}/models/{TIMESTAMP}",
        service_account = SERVICE_ACCOUNT,
        args = CMDARGS,
        replica_count = 1,
        machine_type = TRAIN_COMPUTE,
        accelerator_count = 0,
        tensorboard = tb.resource_name
    )

Model Already in Registry:
Loading model as new default version.
Training Output directory:
gs://statmike-mlops-349915/05/05f/models/20230211141850 
View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/6131480056045764608?project=1026793852137
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/6131480056045764608 current state:
PipelineState.PIPELINE_STATE_RUNNING
View backing custom job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/2117541335035543552?project=1026793852137
View tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7876136041294331904+experiments+2117541335035543552
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/6131480056045764608 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomContainerTrainingJob projects/1026793852137/location
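The registry check above decides two things: whether to run the upload at all, and which parent model (if any) the new version belongs to. Below is a minimal sketch of that decision as a pure function. It assumes each existing model is represented as a dict with the hypothetical keys `resource_name` and `version_aliases`, standing in for the `aiplatform.Model` attributes used in the cell.

```python
from typing import List, Optional, Tuple


def plan_registry_upload(existing_models: List[dict], run_name: str) -> Tuple[bool, Optional[str]]:
    """Return (upload_model, parent_model), following the notebook's registry logic.

    existing_models is a hypothetical stand-in for aiplatform.Model.list(...):
    each item is a dict with 'resource_name' and 'version_aliases'.
    """
    if not existing_models:
        # No match in the registry: create a brand-new model with no parent.
        return True, ''
    first = existing_models[0]
    if run_name in first['version_aliases']:
        # This run is already registered as a version: skip the upload.
        return False, None
    # Otherwise upload as a new default version under the existing model.
    return True, first['resource_name']


existing = [{'resource_name': 'projects/p/locations/us-central1/models/model_05_05f',
             'version_aliases': ['default', 'run-1']}]
print(plan_registry_upload(existing, 'run-2'))
```

Because the run name is stored as a version alias, re-running the cell for the same run is a no-op instead of creating a duplicate version.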

Get the backing Custom Job for the Training Pipeline:

In [42]:
clientPL = aiplatform.gapic.PipelineServiceClient(client_options = {'api_endpoint': f'{REGION}-aiplatform.googleapis.com'})

In [43]:
from google.protobuf.json_format import MessageToDict

backingCustomJob = MessageToDict(clientPL.get_training_pipeline(name = trainingJob.resource_name)._pb)['trainingTaskMetadata']['backingCustomJob']

In [44]:
customJob = aiplatform.CustomJob.get(backingCustomJob)
customJob.resource_name, customJob.display_name

('projects/1026793852137/locations/us-central1/customJobs/2117541335035543552',
 '05_05f_20230211141850-custom-job')
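`MessageToDict` turns the training pipeline into a nested dict, and the backing custom job's resource name sits under `trainingTaskMetadata` → `backingCustomJob`. The following is a minimal sketch of reading that key defensively and splitting off the numeric job ID used in console URLs. It runs against a hand-written dict that copies the resource name from the output above. Only the two keys the notebook uses are modeled, and the helper names are hypothetical.

```python
from typing import Optional


def backing_custom_job(pipeline: dict) -> Optional[str]:
    """Return trainingTaskMetadata.backingCustomJob, or None if either key is absent."""
    return pipeline.get('trainingTaskMetadata', {}).get('backingCustomJob')


def resource_id(resource_name: str) -> str:
    """The last '/'-separated segment of a resource name is its ID."""
    return resource_name.rsplit('/', 1)[-1]


pipeline = {
    'trainingTaskMetadata': {
        'backingCustomJob': 'projects/1026793852137/locations/us-central1/customJobs/2117541335035543552'
    }
}
job = backing_custom_job(pipeline)
print(job, resource_id(job))
```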

Create links to the custom job and its TensorBoard experiment:

In [45]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.name.split('/')[-1]}"

print(f'Review the Training Pipeline here:\nhttps://console.cloud.google.com/vertex-ai/training/training-pipelines?project={PROJECT_ID}')
print(f'Review the Custom Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the Training Pipeline here:
https://console.cloud.google.com/vertex-ai/training/training-pipelines?project=statmike-mlops-349915
Review the Custom Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/2117541335035543552/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7876136041294331904+experiments+2117541335035543552
Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_05_05f?project=statmike-mlops-349915
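The TensorBoard links in this notebook all follow one pattern: replace every `/` in the TensorBoard resource name with `+`, then append `+experiments+` and the TensorBoard experiment ID. For the job link that ID is the custom job ID, and for the experiment link it is the experiment name. Here is a small sketch of that string construction. The helper name `tensorboard_link` is hypothetical, and the inputs are copied from the outputs in this notebook.

```python
def tensorboard_link(region: str, tensorboard_resource_name: str, experiment_id: str) -> str:
    # Same pattern as board_link: '/' -> '+' in the TensorBoard resource name,
    # followed by '+experiments+' and the TensorBoard experiment ID.
    return (f"https://{region}.tensorboard.googleusercontent.com/experiment/"
            f"{tensorboard_resource_name.replace('/', '+')}+experiments+{experiment_id}")


tb_name = 'projects/1026793852137/locations/us-central1/tensorboards/7876136041294331904'
job_board = tensorboard_link('us-central1', tb_name, '2117541335035543552')
exp_board = tensorboard_link('us-central1', tb_name, 'experiment-05-05f-tf-classification-dnn')
print(job_board)
print(exp_board)
```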


---
## Serving

### Vertex AI Experiment Update and Review

In [46]:
expRun = aiplatform.ExperimentRun(run_name = RUN_NAME, experiment = EXPERIMENT_NAME)

In [47]:
expRun.log_params({
    'model.uri': model.uri,
    'model.display_name': model.display_name,
    'model.name': model.name,
    'model.resource_name': model.resource_name,
    'model.version_id': model.version_id,
    'model.versioned_resource_name': model.versioned_resource_name,
    'trainingPipelines.display_name': trainingJob.display_name,
    'trainingPipelines.resource_name': trainingJob.resource_name,
    'customJobs.display_name': customJob.display_name,
    'customJobs.resource_name': customJob.resource_name,
    'customJobs.link': job_link,
    'customJobs.tensorboard': board_link
})

Complete the experiment run:

In [48]:
expRun.update_state(state = aiplatform.gapic.Execution.State.COMPLETE)

Retrieve the experiment:

In [49]:
exp = aiplatform.Experiment(experiment_name = EXPERIMENT_NAME)

In [50]:
exp.get_data_frame()

| | experiment_name | run_name | run_type | state | param.training.epochs | param.training.prefetch | param.data_source | param.trainingPipelines.display_name | param.customJobs.link | param.trainingPipelines.resource_name | ... | metric.val_accuracy | metric.train_auprc | metric.train_loss | metric.val_auprc | time_series_metric.train_loss | time_series_metric.val_accuracy | time_series_metric.train_auprc | time_series_metric.train_accuracy | time_series_metric.val_loss | time_series_metric.val_auprc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | experiment-05-05f-tf-classification-dnn | run-20230211141850 | system.ExperimentRun | COMPLETE | 10.0 | 1.0 | bq://statmike-mlops-349915.fraud.fraud_prepped | 05_05f_20230211141850 | https://console.cloud.google.com/vertex-ai/loc... | projects/1026793852137/locations/us-central1/t... | ... | 0.999256 | 0.999471 | 0.005722 | 0.999389 | 0.00484 | 0.999256 | 0.999538 | 0.999224 | 0.005861 | 0.999389 |


Review the Experiments TensorBoard to compare runs:

In [51]:
print(f"The Experiment TensorBoard Link:\nhttps://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{exp.name}")

The Experiment TensorBoard Link:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7876136041294331904+experiments+experiment-05-05f-tf-classification-dnn


In [52]:
expRun.get_time_series_data_frame()

| | step | wall_time | train_loss | val_accuracy | train_auprc | train_accuracy | val_loss | val_auprc |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2023-02-11 14:38:11.823000+00:00 | 0.059591 | 0.999044 | 0.998242 | 0.984903 | 0.01146 | 0.999125 |
| 1 | 2 | 2023-02-11 14:38:11.912000+00:00 | 0.008609 | 0.999186 | 0.999304 | 0.999088 | 0.00837 | 0.999229 |
| 2 | 3 | 2023-02-11 14:38:11.983000+00:00 | 0.006779 | 0.999221 | 0.999382 | 0.999158 | 0.007522 | 0.99927 |
| 3 | 4 | 2023-02-11 14:38:12.085000+00:00 | 0.006082 | 0.999256 | 0.999416 | 0.999167 | 0.007115 | 0.999282 |
| 4 | 5 | 2023-02-11 14:38:12.187000+00:00 | 0.005778 | 0.999256 | 0.999411 | 0.99922 | 0.006795 | 0.999335 |
| 5 | 6 | 2023-02-11 14:38:12.251000+00:00 | 0.005502 | 0.999256 | 0.999422 | 0.999211 | 0.006538 | 0.999338 |
| 6 | 7 | 2023-02-11 14:38:12.343000+00:00 | 0.005277 | 0.999256 | 0.999466 | 0.999211 | 0.006324 | 0.99934 |
| 7 | 8 | 2023-02-11 14:38:12.441000+00:00 | 0.005095 | 0.999256 | 0.999502 | 0.999233 | 0.006119 | 0.999388 |
| 8 | 9 | 2023-02-11 14:38:12.525000+00:00 | 0.004987 | 0.999256 | 0.999491 | 0.999228 | 0.005986 | 0.999389 |
| 9 | 10 | 2023-02-11 14:38:12.605000+00:00 | 0.00484 | 0.999256 | 0.999538 | 0.999224 | 0.005861 | 0.999389 |


### Review Experiment and Run in Console

In [53]:
print(f'Review The Experiment in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/experiments/{EXPERIMENT_NAME}?project={PROJECT_ID}')

Review The Experiment in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/experiments/experiment-05-05f-tf-classification-dnn?project=statmike-mlops-349915


In [54]:
print(f'Review The Experiment Run in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/experiments/{EXPERIMENT_NAME}/runs/{EXPERIMENT_NAME}-{RUN_NAME}?project={PROJECT_ID}')

Review The Experiment Run in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/experiments/experiment-05-05f-tf-classification-dnn/runs/experiment-05-05f-tf-classification-dnn-run-20230211141850?project=statmike-mlops-349915


### Compare This Run Using Experiments

Get a list of all experiments in this project:

In [55]:
experiments = aiplatform.Experiment.list()

Remove experiments not in the SERIES:

In [56]:
experiments = [e for e in experiments if e.name.split('-')[0:2] == ['experiment', SERIES]]

Combine the runs from all experiments in SERIES into a single dataframe:

In [57]:
results = []
for experiment in experiments:
    results.append(experiment.get_data_frame())
    print(experiment.name)
results = pd.concat(results)

experiment-05-05f-tf-classification-dnn
experiment-05-05e-tf-classification-dnn
experiment-05-05d-tf-classification-dnn
experiment-05-05c-tf-classification-dnn
experiment-05-05b-tf-classification-dnn
experiment-05-05a-tf-classification-dnn
experiment-05-05-tf-classification-dnn
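The series filter compares the first two `-`-separated parts of each experiment name to `['experiment', SERIES]`. Unlike a plain `startswith('experiment-05')` check, this does not accidentally match a series such as `050`. A stdlib sketch of the same check follows. The first two names come from the output above. The last three are hypothetical names added to show what gets excluded.

```python
def in_series(experiment_name: str, series: str) -> bool:
    # Mirrors the notebook's check: the first two '-'-separated parts must equal ['experiment', series].
    return experiment_name.split('-')[0:2] == ['experiment', series]


names = [
    'experiment-05-05f-tf-classification-dnn',
    'experiment-05-05-tf-classification-dnn',
    'experiment-03-03a-tf-example',   # hypothetical: another series
    'experiment-050-example',         # hypothetical: would pass a naive startswith('experiment-05')
    'my-experiment-05',               # hypothetical: does not follow the naming convention
]
print([n for n in names if in_series(n, '05')])
```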


Create ranks for models within experiment and across the entire SERIES:

In [58]:
def ranker(metric = 'metric.test_auprc'):
    ranks = results[['experiment_name', 'run_name', 'param.model.display_name', 'param.model.version_id', metric]].copy().reset_index(drop = True)
    ranks['series_rank'] = ranks[metric].rank(method = 'dense', ascending = False)
    ranks['experiment_rank'] = ranks.groupby('experiment_name')[metric].rank(method = 'dense', ascending = False)
    return ranks.sort_values(by = ['experiment_name', 'run_name'])
    
ranks = ranker('metric.test_auprc')
ranks

| | experiment_name | run_name | param.model.display_name | param.model.version_id | metric.test_auprc | series_rank | experiment_rank |
|---|---|---|---|---|---|---|---|
| 7 | experiment-05-05-tf-classification-dnn | run-20230210115433 | 05_05 | 6 | 0.99913 | 8.0 | 1.0 |
| 6 | experiment-05-05a-tf-classification-dnn | run-20230210122632 | 05_05a | 5 | 0.999154 | 7.0 | 2.0 |
| 5 | experiment-05-05a-tf-classification-dnn | run-20230210132930 | 05_05a | 6 | 0.999578 | 1.0 | 1.0 |
| 4 | experiment-05-05b-tf-classification-dnn | run-20230210130602 | 05_05b | 4 | 0.999341 | 6.0 | 1.0 |
| 3 | experiment-05-05c-tf-classification-dnn | run-20230210130701 | 05_05c | 3 | 0.999533 | 2.0 | 1.0 |
| 2 | experiment-05-05d-tf-classification-dnn | run-20230211141913 | 05_05d | 3 | 0.999527 | 4.0 | 1.0 |
| 1 | experiment-05-05e-tf-classification-dnn | run-20230211141838 | 05_05e | 3 | 0.999532 | 3.0 | 1.0 |
| 0 | experiment-05-05f-tf-classification-dnn | run-20230211141850 | 05_05f | 19 | 0.999443 | 5.0 | 1.0 |
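With `method = 'dense'` and `ascending = False`, pandas gives the highest metric rank 1. Tied values share a rank, and no rank numbers are skipped after a tie. The sketch below reproduces that dense ranking in plain Python on the `metric.test_auprc` values from the table above. The function name `dense_rank_desc` is hypothetical, and pandas returns floats (`8.0`) where this sketch returns ints.

```python
from typing import Dict, List


def dense_rank_desc(values: List[float]) -> List[int]:
    """Dense rank, highest value = 1; equal values share a rank and no ranks are skipped."""
    distinct = sorted(set(values), reverse=True)
    rank_of: Dict[float, int] = {v: i + 1 for i, v in enumerate(distinct)}
    return [rank_of[v] for v in values]


# (experiment suffix, metric.test_auprc) pairs copied from the ranks table above
runs = [
    ('05', 0.99913), ('05a', 0.999154), ('05a', 0.999578), ('05b', 0.999341),
    ('05c', 0.999533), ('05d', 0.999527), ('05e', 0.999532), ('05f', 0.999443),
]
series_rank = dense_rank_desc([m for _, m in runs])

# Within-experiment rank: rank each experiment's values separately, like groupby(...).rank().
experiment_rank = []
for exp_name, metric in runs:
    group = [m for e, m in runs if e == exp_name]
    experiment_rank.append(dense_rank_desc(group)[group.index(metric)])

print(series_rank)      # [8, 7, 1, 6, 2, 4, 3, 5]
print(experiment_rank)  # [1, 2, 1, 1, 1, 1, 1, 1]
```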


In [59]:
current_rank = ranks.loc[(ranks['param.model.display_name'] == model.display_name) & (ranks['param.model.version_id'] == model.version_id)]
current_rank

| | experiment_name | run_name | param.model.display_name | param.model.version_id | metric.test_auprc | series_rank | experiment_rank |
|---|---|---|---|---|---|---|---|
| 0 | experiment-05-05f-tf-classification-dnn | run-20230211141850 | 05_05f | 19 | 0.999443 | 5.0 | 1.0 |


In [60]:
print(f"The current model is ranked {current_rank['experiment_rank'].iloc[0]} within this experiment and {current_rank['series_rank'].iloc[0]} across this series.")

The current model is ranked 1.0 within this experiment and 5.0 across this series.


### Create/Retrieve The Endpoint For This Series

In [61]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Endpoint Exists: projects/1026793852137/locations/us-central1/endpoints/8352053052307406848
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/8352053052307406848?project=statmike-mlops-349915


In [62]:
endpoint.display_name

'05'

In [63]:
endpoint.traffic_split

{'5147434054177521664': 100}

In [64]:
deployed_models = endpoint.list_models()
#deployed_models

### Should This Model Be Deployed?
Is it better than the model already deployed on the endpoint?

In [65]:
deploy = False
already_deployed = False
if deployed_models:
    for deployed_model in deployed_models:
        deployed_rank = ranks.loc[(ranks['param.model.display_name'] == deployed_model.display_name) & (ranks['param.model.version_id'] == deployed_model.model_version_id)]['series_rank'].iloc[0]
        model_rank = current_rank['series_rank'].iloc[0]
        if deployed_model.display_name == model.display_name and deployed_model.model_version_id == model.version_id:
            # this exact model version is already serving on the endpoint
            already_deployed = True
            print(f'The current model/version is already deployed.')
            break
        elif model_rank <= deployed_rank:
            # lower series_rank is better: the current model ties or beats a deployed model
            deploy = True
            print(f'The current model is ranked the same or better ({model_rank}) than a currently deployed model ({deployed_rank}).')
            break
    if not deploy and not already_deployed: print(f'The current model is ranked worse ({model_rank}) than a currently deployed model ({deployed_rank})')
else: 
    deploy = True
    print('No models currently deployed.')

The current model is ranked worse (5.0) than a currently deployed model (1.0)
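The deployment rule above treats a lower `series_rank` as better. It deploys when the current model ties or beats at least one model already on the endpoint, always deploys to an empty endpoint, and never redeploys a version that is already serving. Below is a simplified sketch of that rule as a pure function over plain rank numbers. The function and argument names are hypothetical. Unlike the notebook's loop, it does not depend on the order in which deployed models are listed.

```python
from typing import List


def should_deploy(model_rank: float, deployed_ranks: List[float], already_deployed: bool = False) -> bool:
    """Simplified version of the notebook's rule (lower rank number = better model)."""
    if already_deployed:
        # The exact model version is already serving: nothing to do.
        return False
    if not deployed_ranks:
        # Empty endpoint: deploy the current model.
        return True
    # Deploy if the current model ties or beats at least one deployed model.
    return any(model_rank <= r for r in deployed_ranks)


print(should_deploy(5.0, [1.0]))  # False, matching the output above
print(should_deploy(1.0, [5.0]))  # True
print(should_deploy(5.0, []))     # True
```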


### Deploy Model To Endpoint

In [66]:
if deploy:
    print(f'Deploying model with 100% of traffic...')
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 1
    )
else: print(f'Not deploying - current model is worse ({model_rank}) than the currently deployed model ({deployed_rank})') 

Not deploying - current model is worse (5.0) than the currently deployed model (1.0)


### Remove Deployed Models without Traffic

In [67]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 05_05a with version 6 has traffic = 100
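The cleanup cell keeps a deployed model only if its ID is a key in `endpoint.traffic_split`. It checks membership only and does not inspect the percentage value. The sketch below shows that partition in plain Python. It uses the traffic split printed above plus a second deployed-model ID that is hypothetical and receives no traffic.

```python
from typing import Dict, List, Tuple


def partition_by_traffic(deployed_ids: List[str], traffic_split: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Split deployed model IDs into (keep, undeploy) by membership in the traffic split."""
    keep = [i for i in deployed_ids if i in traffic_split]
    undeploy = [i for i in deployed_ids if i not in traffic_split]
    return keep, undeploy


traffic_split = {'5147434054177521664': 100}                   # from endpoint.traffic_split above
deployed_ids = ['5147434054177521664', '1111111111111111111']  # second ID is hypothetical
print(partition_by_traffic(deployed_ids, traffic_split))
```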


In [68]:
endpoint.traffic_split

{'5147434054177521664': 100}

In [69]:
#endpoint.list_models()

---
## Prediction

See many more details on requesting predictions in the [05Tools - Prediction](./05Tools%20-%20Prediction.ipynb) notebook.

### Prepare a record for prediction: instance and parameters lists

In [70]:
n = 10
pred = bq.query(
    query = f"""
        SELECT * EXCEPT({VAR_TARGET}, {VAR_OMIT}, splits)
        FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}
        WHERE splits='TEST'
        LIMIT {n}
        """
).to_dataframe()

In [71]:
pred

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35337 | 1.092844 | -0.01323 | 1.359829 | 2.731537 | -0.707357 | 0.873837 | -0.79613 | 0.437707 | 0.39677 | ... | -0.240428 | 0.037603 | 0.380026 | -0.167647 | 0.027557 | 0.592115 | 0.219695 | 0.03697 | 0.010984 | 0.0 |
| 1 | 60481 | 1.238973 | 0.035226 | 0.063003 | 0.641406 | -0.260893 | -0.580097 | 0.049938 | -0.034733 | 0.405932 | ... | -0.26508 | -0.060003 | -0.053585 | -0.057718 | 0.104983 | 0.537987 | 0.589563 | -0.046207 | -0.006212 | 0.0 |
| 2 | 139587 | 1.870539 | 0.211079 | 0.224457 | 3.889486 | -0.380177 | 0.249799 | -0.577133 | 0.179189 | -0.120462 | ... | -0.374356 | 0.196006 | 0.656552 | 0.180776 | -0.060226 | -0.228979 | 0.080827 | 0.009868 | -0.036997 | 0.0 |
| 3 | 162908 | -3.368339 | -1.980442 | 0.153645 | -0.159795 | 3.847169 | -3.516873 | -1.209398 | -0.292122 | 0.760543 | ... | -0.923275 | -0.545992 | -0.252324 | -1.171627 | 0.214333 | -0.159652 | -0.060883 | 1.294977 | 0.120503 | 0.0 |
| 4 | 165236 | 2.180149 | 0.218732 | -2.637726 | 0.348776 | 1.063546 | -1.249197 | 0.942021 | -0.547652 | -0.087823 | ... | -0.250653 | 0.234502 | 0.825237 | -0.176957 | 0.563779 | 0.730183 | 0.707494 | -0.131066 | -0.090428 | 0.0 |
| 5 | 62606 | 1.199408 | 0.352007 | 0.379645 | 1.372017 | 0.291347 | 0.524919 | -0.117555 | 0.132907 | -0.935169 | ... | -0.042979 | -0.050291 | -0.126609 | -0.022218 | -0.599026 | 0.258188 | 0.928721 | -0.058988 | -0.008856 | 0.0 |
| 6 | 90719 | 1.937447 | 0.337882 | -0.00063 | 3.816486 | 0.276515 | 1.079842 | -0.730626 | 0.197353 | 1.137566 | ... | -0.315667 | -0.038376 | 0.208914 | 0.160189 | -0.015145 | -0.162678 | -0.000843 | -0.018178 | -0.039339 | 0.0 |
| 7 | 113350 | 1.8919 | 0.401086 | -0.119983 | 4.0475 | 0.049952 | 0.192793 | -0.108512 | -0.0404 | -0.390391 | ... | -0.267639 | 0.094177 | 0.613712 | 0.070986 | 0.079543 | 0.135219 | 0.128961 | 0.003667 | -0.045079 | 0.0 |
| 8 | 156499 | 0.060003 | 1.461355 | 0.378915 | 2.835455 | 1.626526 | -0.164732 | 1.551858 | -0.412927 | -1.735264 | ... | -0.175275 | 0.042293 | 0.277536 | -0.123379 | 1.081552 | -0.053079 | -0.149809 | -0.314438 | -0.216539 | 0.0 |
| 9 | 73902 | -1.85926 | 2.158799 | 1.085671 | 2.615483 | 0.24666 | 2.133925 | -1.569015 | -2.612353 | -1.312509 | ... | 0.590142 | -0.867178 | -0.700479 | 0.231972 | -1.374527 | 0.140285 | 0.128806 | 0.153606 | 0.092042 | 0.0 |


In [72]:
newobs = pred.to_dict(orient = 'records')
#newobs[0]

In [73]:
newobs[0]

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}

In [91]:
#instances = [json_format.ParseDict(newobs[0], Value())]

### Get Predictions: Python Client

In [83]:
prediction = endpoint.predict(instances = newobs[0:1])
prediction

Prediction(predictions=[[0.997585893, 0.00241406099]], deployed_model_id='2042624323570630656', model_version_id='3', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05_05h', explanations=None)

In [84]:
prediction = endpoint.predict(instances = newobs)
prediction

Prediction(predictions=[[0.997585893, 0.00241406215], [0.991356134, 0.00864387583], [0.981079, 0.0189210437], [0.995278, 0.00472202944], [0.975121081, 0.0248789042], [0.98930186, 0.0106981583], [0.987706482, 0.0122935064], [0.96555239, 0.0344476253], [0.952486455, 0.0475135632], [0.989052594, 0.010947438]], deployed_model_id='2042624323570630656', model_version_id='3', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05_05h', explanations=None)

In [86]:
prediction.predictions[0]

[0.997585893, 0.00241406215]

In [87]:
np.argmax(prediction.predictions[0])

0
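Each prediction is a pair of class probabilities, and `np.argmax` returns the index of the larger value. The sketch below applies the same conversion in plain Python to the batch of ten predictions returned above. Like `np.argmax`, `max` over the indices returns the first index when values tie.

```python
def to_class(probabilities):
    """Index of the highest probability, like np.argmax on a 1-D list."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Predictions copied from the batch prediction output above
predictions = [
    [0.997585893, 0.00241406215], [0.991356134, 0.00864387583], [0.981079, 0.0189210437],
    [0.995278, 0.00472202944], [0.975121081, 0.0248789042], [0.98930186, 0.0106981583],
    [0.987706482, 0.0122935064], [0.96555239, 0.0344476253], [0.952486455, 0.0475135632],
    [0.989052594, 0.010947438],
]
print([to_class(p) for p in predictions])
```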

### Get Predictions: REST

In [88]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": newobs[0:1]}))

In [89]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    [
      0.997585893,
      0.00241406099
    ]
  ],
  "deployedModelId": "2042624323570630656",
  "model": "projects/1026793852137/locations/us-central1/models/model_05_05h",
  "modelDisplayName": "05_05h",
  "modelVersionId": "3"
}
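The `curl` call POSTs the contents of `request.json` to `https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict` with an OAuth bearer token. The stdlib sketch below builds the same request object without sending it. The token string is a placeholder; in the notebook the token comes from `gcloud auth application-default print-access-token`. The single truncated instance is hypothetical, and the endpoint resource name is taken from the output earlier in this notebook.

```python
import json
import urllib.request


def build_predict_request(region: str, endpoint_resource_name: str, instances: list, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    url = f"https://{region}-aiplatform.googleapis.com/v1/{endpoint_resource_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


req = build_predict_request(
    "us-central1",
    "projects/1026793852137/locations/us-central1/endpoints/8352053052307406848",
    [{"Time": 35337, "Amount": 0.0}],  # hypothetical, truncated instance
    "PLACEHOLDER_TOKEN",               # placeholder, not a real access token
)
print(req.get_method(), req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`, which needs a valid access token and network access.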


### Get Predictions: gcloud (CLI)

In [90]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[[0.997585893, 0.00241406099]]


---
## Remove Resources
See the "99 - Cleanup" notebook.