# Vertex AI > Notebooks - Models Built in Notebooks with TensorFlow


**Prerequisites:**
- [01 - BigQuery - Table Data Source](../01%20-%20Data%20Sources/01%20-%20BigQuery%20-%20Table%20Data%20Source.ipynb)



---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'fifth-sprite-402605'

In [2]:
REGION = 'us-central1'
EXPERIMENT = '05'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'
DEPLOY_IMAGE='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters
EPOCHS = 4
BATCH_SIZE = 100

In [3]:
#!pip install tensorflow==2.10.0 tensorflow-io==0.27.0

packages:

In [4]:
from google.cloud import bigquery

from tensorflow.python.framework import dtypes
from tensorflow_io.bigquery import BigQueryClient
import tensorflow as tf

from google.cloud import aiplatform
from datetime import datetime
import os

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np
import pandas as pd
from sklearn import metrics as metrics

clients:

In [5]:
aiplatform.init(project = PROJECT_ID, location = REGION)
bq = bigquery.Client(project = PROJECT_ID)

parameters:

In [6]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

environment:

In [7]:
!rm -rf {DIR}
!mkdir -p {DIR}

Experiment Tracking:

In [8]:
FRAMEWORK = 'tf'
TASK = 'classification'
MODEL_TYPE = 'dnn'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

---
## Get Vertex AI Experiments Tensorboard Instance Name
[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview) provides managed [TensorBoard](https://www.tensorflow.org/tensorboard) instances that you can use to track TensorBoard experiments (a training run or a hyperparameter tuning sweep).

The training job will show up as an experiment for the Tensorboard instance and have the same name as the training job ID.

This code checks whether a TensorBoard instance labeled with this `SERIES` already exists in the project, retrieves it if so, and creates it otherwise:

In [9]:
tb = aiplatform.Tensorboard.list(filter=f"labels.series={SERIES}")
if tb:
    tb = tb[0]
else: 
    tb = aiplatform.Tensorboard.create(display_name = SERIES, labels = {'series' : f'{SERIES}'})

In [10]:
tb.resource_name

'projects/117917517031/locations/us-central1/tensorboards/775749433861079040'
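
The retrieve-or-create pattern used for the TensorBoard instance can be sketched without any cloud calls. The classes below (`FakeTensorboard`, `FakeRegistry`) are hypothetical in-memory stand-ins, not part of the Vertex AI SDK; they only illustrate why running the cell twice does not create a second instance.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTensorboard:
    """Hypothetical stand-in for a TensorBoard resource (not the SDK class)."""
    display_name: str
    labels: dict = field(default_factory=dict)


class FakeRegistry:
    """Hypothetical in-memory stand-in for the service that stores resources."""

    def __init__(self):
        self._items = []

    def list(self, series):
        # return only the resources whose 'series' label matches
        return [t for t in self._items if t.labels.get('series') == series]

    def create(self, display_name, labels):
        tb = FakeTensorboard(display_name, dict(labels))
        self._items.append(tb)
        return tb


def get_or_create(registry, series):
    """Reuse the first matching resource, otherwise create a labeled one."""
    matches = registry.list(series)
    if matches:
        return matches[0]
    return registry.create(display_name=series, labels={'series': series})


registry = FakeRegistry()
first = get_or_create(registry, '05')
second = get_or_create(registry, '05')  # second call reuses the first resource
```

Because the lookup key is the `series` label, the label set at creation time has to match the filter used for listing, which is what the notebook cell does.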

---
## Setup Vertex AI Experiments

The code in this section initializes the experiment and starts a run that represents this notebook. Throughout the model training and evaluation sections, information is logged to the experiment using:
- [.log_params](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_params)
- [.log_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics)
- [.log_time_series_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_time_series_metrics)

Initialize the Experiment:

In [11]:
aiplatform.init(experiment = EXPERIMENT_NAME, experiment_tensorboard = tb.resource_name)

Create an experiment run:

In [12]:
if RUN_NAME in [run.name for run in aiplatform.ExperimentRun.list(experiment = EXPERIMENT_NAME)]:
    expRun = aiplatform.ExperimentRun(run_name = RUN_NAME, experiment = EXPERIMENT_NAME)
    print('This run already exists, using previous.')
else:
    expRun = aiplatform.ExperimentRun.create(run_name = RUN_NAME, experiment = EXPERIMENT_NAME)
    print('Starting a new run.')

Associating projects/117917517031/locations/us-central1/metadataStores/default/contexts/experiment-05-05-tf-classification-dnn-run-20231023061728 to Experiment: experiment-05-05-tf-classification-dnn
Starting a new run.


Log parameters to the experiment run:

In [13]:
expRun.log_params({'experiment': EXPERIMENT, 'series': SERIES, 'project_id': PROJECT_ID})

---
## Training Data
In this exercise the data source is a table in Google BigQuery. This section uses TensorFlow I/O to connect to BigQuery and read batches of training data in parallel during model training.

In [91]:
query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{BQ_TABLE}'"
schema = bq.query(query).to_dataframe()
schema

Unnamed: 0,table_catalog,table_schema,table_name,column_name,ordinal_position,is_nullable,data_type,is_generated,generation_expression,is_stored,is_hidden,is_updatable,is_system_defined,is_partitioning_column,clustering_ordinal_position,collation_name,column_default,rounding_mode
0,fifth-sprite-402605,fraud,fraud_prepped,Time,1,YES,INT64,NEVER,,,NO,,NO,NO,,,,
1,fifth-sprite-402605,fraud,fraud_prepped,V1,2,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
2,fifth-sprite-402605,fraud,fraud_prepped,V2,3,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
3,fifth-sprite-402605,fraud,fraud_prepped,V3,4,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
4,fifth-sprite-402605,fraud,fraud_prepped,V4,5,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
5,fifth-sprite-402605,fraud,fraud_prepped,V5,6,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
6,fifth-sprite-402605,fraud,fraud_prepped,V6,7,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
7,fifth-sprite-402605,fraud,fraud_prepped,V7,8,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
8,fifth-sprite-402605,fraud,fraud_prepped,V8,9,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,
9,fifth-sprite-402605,fraud,fraud_prepped,V9,10,YES,FLOAT64,NEVER,,,NO,,NO,NO,,,,


### Number of Classes for the Label Column: VAR_TARGET
This is a supervised learning example that classifies examples into the classes found in the label column stored in the variable `VAR_TARGET`.

In [15]:
nclasses = bq.query(query = f'SELECT DISTINCT {VAR_TARGET} FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE {VAR_TARGET} is not null').to_dataframe()
nclasses

Unnamed: 0,Class
0,0
1,1


In [16]:
nclasses = nclasses.shape[0]
nclasses

2

In [17]:
expRun.log_params({'data_source': f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}', 'nclasses': nclasses, 'var_split': 'splits', 'var_target': VAR_TARGET})

### Selected Columns and Data Types

Use the table schema to prepare the TensorFlow model inputs:
- Omit unused columns
- Create list of `selected_fields` from the training source
- Define the data types, `output_types`, using the schema data and remapping to desired precision if needed

In [18]:
# Make a list of columns to omit
OMIT = VAR_OMIT.split() + ['splits']

# use schema to prepare a list of columns to read from BigQuery
selected_fields = schema[~schema.column_name.isin(OMIT)].column_name.tolist()

# all the columns in this data source are either float64 or int64
output_types = [dtypes.float64 if x=='FLOAT64' else dtypes.int64 for x in schema[~schema.column_name.isin(OMIT)].data_type.tolist()]
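
The column selection above can be illustrated on plain Python data. This is a minimal sketch: the schema rows are hypothetical (in particular the `STRING` types are assumed), and the type names are plain strings rather than TensorFlow dtypes. Unlike the one-line conditional in the cell, it raises on a type it does not know instead of silently treating it as `int64`.

```python
# Hypothetical schema rows; the real ones come from INFORMATION_SCHEMA.COLUMNS
schema_rows = [
    {'column_name': 'transaction_id', 'data_type': 'STRING'},
    {'column_name': 'Time', 'data_type': 'INT64'},
    {'column_name': 'V1', 'data_type': 'FLOAT64'},
    {'column_name': 'Class', 'data_type': 'INT64'},
    {'column_name': 'splits', 'data_type': 'STRING'},
]

# Explicit mapping from BigQuery type names to the desired precision
TYPE_MAP = {'FLOAT64': 'float64', 'INT64': 'int64'}


def select_columns(rows, var_omit):
    """Return (selected_fields, output_types) after dropping omitted columns."""
    omit = set(var_omit.split()) | {'splits'}
    kept = [r for r in rows if r['column_name'] not in omit]
    unknown = [r['column_name'] for r in kept if r['data_type'] not in TYPE_MAP]
    if unknown:
        raise ValueError(f'No type mapping for columns: {unknown}')
    fields = [r['column_name'] for r in kept]
    types = [TYPE_MAP[r['data_type']] for r in kept]
    return fields, types


fields, types = select_columns(schema_rows, 'transaction_id')
```

The two lists stay aligned by position, which matters because the reader pairs `selected_fields` and `output_types` element by element.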

---
## Read From BigQuery Using TensorFlow I/O 

### Divide the inputs into features and target

Define a function that remaps the input data for TensorFlow into:
- features
- `target` - one_hot encoded for multi-class classification and also works for binary classification

In [19]:
def transTable(row_dict):
    target = row_dict.pop(VAR_TARGET)
    target = tf.one_hot(tf.cast(target,tf.int64), nclasses)
    target = tf.cast(target, tf.float32)
    return(row_dict, target)
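
To see what this remapping does to a single row, here is a pure-Python sketch of the same idea. The helper names `one_hot` and `split_row` and the sample row are hypothetical; the notebook itself uses `tf.one_hot` on tensors.

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


def split_row(row, target_name, nclasses):
    """Split a row dict into (features, one-hot target)."""
    features = dict(row)                      # copy so the input is not mutated
    label = int(features.pop(target_name))    # remove the label from the features
    return features, one_hot(label, nclasses)


# Hypothetical row with two features and a binary label
features, target = split_row({'Time': 10, 'V1': -1.3, 'Class': 1}, 'Class', 2)
```

With two classes, a label of `1` becomes `[0.0, 1.0]`, so the same function covers binary and multi-class targets.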

### Setup Tensorflow I/O to Read Batches from BigQuery

Set up the TensorFlow I/O reader: client > read session > parallel row reader, then map the rows with `transTable`.
- https://www.tensorflow.org/io/api_docs/python/tfio/bigquery/BigQueryClient

In [20]:
def bq_reader(split):
    reader = BigQueryClient()

    training = reader.read_session(
        parent = f"projects/{PROJECT_ID}",
        project_id = BQ_PROJECT,
        table_id = BQ_TABLE,
        dataset_id = BQ_DATASET,
        selected_fields = selected_fields,
        output_types = output_types,
        row_restriction = f"splits='{split}'",
        requested_streams = 3
    )
    
    return training.parallel_read_rows(sloppy = True, num_parallel_calls = tf.data.experimental.AUTOTUNE)

In [21]:
#bq_reader('TRAIN')#[0]

In [22]:
train = bq_reader('TRAIN').prefetch(1).map(transTable).shuffle(BATCH_SIZE*10).batch(BATCH_SIZE)
validate = bq_reader('VALIDATE').prefetch(1).map(transTable).batch(BATCH_SIZE)
test = bq_reader('TEST').prefetch(1).map(transTable).batch(BATCH_SIZE)

In [23]:
expRun.log_params({'training.batch_size': BATCH_SIZE, 'training.shuffle': 10*BATCH_SIZE, 'training.prefetch': 1})
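
The training dataset uses `shuffle(BATCH_SIZE*10)` followed by `batch(BATCH_SIZE)`. The sketch below approximates what a buffer-based shuffle and batching do, using plain Python generators. It is an illustration of the idea only, not the `tf.data` implementation, and the function names are hypothetical.

```python
import random


def shuffle_buffer(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # fill the buffer first
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # replace it with the incoming one
    rng.shuffle(buffer)                  # drain what is left at end of input
    yield from buffer


def batch(items, batch_size):
    """Group items into lists of `batch_size`; the last batch may be smaller."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current


batches = list(batch(shuffle_buffer(range(25), buffer_size=10), batch_size=4))
flat = [value for b in batches for value in b]
```

An element can only move as far as the buffer allows, so a buffer of ten batches gives local rather than global shuffling. A larger buffer mixes more thoroughly at the cost of memory.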

---
## Define the Model: In The Notebook (local runtime)

In [24]:
# Logistic Regression

# model input definitions
feature_columns = {header: tf.feature_column.numeric_column(header) for header in selected_fields if header != VAR_TARGET}
feature_layer_inputs = {header: tf.keras.layers.Input(shape = (1,), name = header) for header in selected_fields if header != VAR_TARGET}

# feature columns to a Dense Feature Layer
feature_layer_outputs = tf.keras.layers.DenseFeatures(feature_columns.values(), name = 'feature_layer')(feature_layer_inputs)

# batch normalization of inputs
normalized = tf.keras.layers.BatchNormalization(name = 'batch_normalization_layer')(feature_layer_outputs)

# logistic - using softmax activation to nclasses
logistic = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'logistic')(normalized)

# the model
model = tf.keras.Model(
    inputs = feature_layer_inputs,
    outputs = logistic,
    name = EXPERIMENT
)

# compile
model.compile(
    optimizer = tf.keras.optimizers.SGD(), #SGD or Adam
    loss = tf.keras.losses.CategoricalCrossentropy(),
    metrics = ['accuracy', tf.keras.metrics.AUC(curve = 'PR', name = 'auprc')]
)
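
The model above ends in a softmax over `nclasses` outputs. For two classes this is equivalent to logistic regression, because the softmax probability of one class equals the sigmoid of the difference between the two logits. A small numeric check, using hypothetical logit values:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


logits = [0.4, -1.4]          # hypothetical outputs of the final dense layer
probs = softmax(logits)

# categorical cross-entropy for one example whose true class is 0
loss_if_class_0 = -math.log(probs[0])
```

Here `probs[0]` matches `sigmoid(logits[0] - logits[1])`, which is why a two-unit softmax layer and a one-unit sigmoid layer describe the same family of models.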

---
## Train The Model: In The Notebook (local runtime)

Fit the Model:

In [25]:
# setup tensorboard logs and train
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir = os.path.join(DIR, "logs", f'{TIMESTAMP}'), histogram_freq=1)
history = model.fit(train, epochs = EPOCHS, callbacks = [tensorboard_callback], validation_data = validate)

Epoch 1/4


2023-10-23 06:18:28.263730: E tensorflow/core/framework/dataset.cc:580] UNIMPLEMENTED: Cannot compute input sources for dataset of type IO>BigQueryDataset, because the dataset does not implement `InputDatasets`.
2023-10-23 06:18:28.263782: E tensorflow/core/framework/dataset.cc:584] UNIMPLEMENTED: Cannot merge options for dataset of type IO>BigQueryDataset, because the dataset does not implement `InputDatasets`.


   2267/Unknown - 31s 12ms/step - loss: 0.0785 - accuracy: 0.9822 - auprc: 0.9947

Epoch 2/4
Epoch 3/4
Epoch 4/4

In [26]:
history.history['loss'][-1]

0.0061206878162920475

In [27]:
expRun.log_params({'training.epochs': history.params['epochs']})
history.params

{'verbose': 1, 'epochs': 4, 'steps': None}

Log the time series metrics to the experiments TensorBoard:

In [28]:
for e in range(0, history.params['epochs']):
    expRun.log_time_series_metrics(
        {
            'train_loss': history.history['loss'][e],
            'train_accuracy': history.history['accuracy'][e],
            'train_auprc': history.history['auprc'][e],
            'val_loss': history.history['val_loss'][e],
            'val_accuracy': history.history['val_accuracy'][e],
            'val_auprc': history.history['val_auprc'][e]
        }, step = e
    )
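
Keras stores training history as one list per metric, while the logging call above needs one dictionary per step. The sketch below shows that pivot on a hypothetical history dictionary. The `first_step` parameter is an assumption added for illustration; the notebook cell uses zero-based steps.

```python
def history_to_steps(history, first_step=1):
    """Pivot {'metric': [v_epoch0, v_epoch1, ...]} into [(step, {metric: value})]."""
    lengths = {len(values) for values in history.values()}
    if len(lengths) != 1:
        raise ValueError('All metrics must have one value per epoch')
    n_epochs = lengths.pop()
    records = []
    for e in range(n_epochs):
        metrics = {}
        for name, values in history.items():
            # prefix training metrics so they sit next to the val_ metrics
            key = name if name.startswith('val_') else f'train_{name}'
            metrics[key] = values[e]
        records.append((first_step + e, metrics))
    return records


# Hypothetical values, shaped like `history.history`
example_history = {'loss': [0.08, 0.013, 0.007], 'val_loss': [0.05, 0.010, 0.008]}
records = history_to_steps(example_history)
```

Each record can then be passed to a logging call as `(metrics, step)`, one call per epoch.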

---
## Evaluate The Model: In The Notebook (local runtime)

Evaluate the model with the test data:

In [29]:
loss, accuracy, auprc = model.evaluate(test)

In [30]:
expRun.log_metrics({'test_loss': loss, 'test_accuracy': accuracy, 'test_auprc': auprc})

In [31]:
loss, accuracy, auprc = model.evaluate(validate)

In [32]:
expRun.log_metrics({'val_loss': loss, 'val_accuracy': accuracy, 'val_auprc': auprc})

In [33]:
loss, accuracy, auprc = model.evaluate(train)

In [34]:
expRun.log_metrics({'train_loss': loss, 'train_accuracy': accuracy, 'train_auprc': auprc})

---
## Custom Evaluation

Using the test data, calculate a series of metrics using [scikit-learn metrics](https://scikit-learn.org/stable/modules/model_evaluation.html).  Using TFIO to read the batches from BigQuery means the first step is getting the predictions and actual values into numpy arrays:

In [35]:
predictions = model.predict(test)

actuals = np.empty(shape = [0, predictions.shape[1]])
for features, target in test.take(-1): # -1 indicates all batches
    actuals = np.append(actuals, target.numpy(), axis = 0)

predictions_proba = np.max(predictions, axis = 1)
predictions = np.argmax(predictions, axis = 1)
actuals = np.argmax(actuals, axis = 1)


In [36]:
actuals[-20:]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [37]:
predictions[-20:]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

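Calculate metrics. Note that `predictions` holds hard class labels at this point (the `argmax` output), not probabilities, so `log_loss` and `average_precision_score` below score 0/1 labels and are not directly comparable to the probability-based loss and `auprc` reported by Keras: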

In [38]:
metrics.log_loss(actuals, predictions)

0.046691939478934984

In [39]:
metrics.accuracy_score(actuals, predictions)

0.998704572508928

In [40]:
metrics.average_precision_score(actuals, predictions)

0.42490484075445556
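
Log loss and average precision are ranking- and probability-based metrics, so they are usually computed from the predicted probability of the positive class rather than from hard labels. The sketch below is a pure-Python illustration on hypothetical softmax outputs. It does not use scikit-learn, and the simple average precision here assumes there are no tied scores.

```python
import math


def log_loss(y_true, p_pos, eps=1e-15):
    """Binary log loss from positive-class probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def average_precision(y_true, scores):
    """Mean of precision at each positive, ranking by score (no ties assumed)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical softmax outputs: one row per example, columns = [P(0), P(1)]
softmax_out = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
y_true = [0, 1, 1, 0]

p_pos = [row[1] for row in softmax_out]      # probability of the positive class
hard = [int(p >= 0.5) for p in p_pos]        # hard labels, as argmax would give

ll_prob = log_loss(y_true, p_pos)
ll_hard = log_loss(y_true, hard)             # one confident miss dominates
ap_prob = average_precision(y_true, p_pos)
```

In this toy data the third example is a positive scored at 0.4. With probabilities it still ranks above both negatives, while the hard label turns it into a fully confident error. Note also that `np.max(predictions, axis = 1)` gives the probability of the predicted class, which is a different quantity from the positive-class column used here.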

---
## Evaluate The Training With Tensorboard (On Vertex AI Experiments)

Resource: https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview

In [41]:
#!pip install google-cloud-aiplatform[tensorboard] -U -q

In [42]:
aiplatform.upload_tb_log(
    tensorboard_id = tb.name,
    tensorboard_experiment_name = EXPERIMENT_NAME,
    logdir = f'{DIR}/logs',
    experiment_display_name = EXPERIMENT_NAME,
    run_name_prefix = RUN_NAME,
    description = EXPERIMENT_NAME
)

Please consider uploading to a new experiment instead of an existing one, as the former allows for better upload performance.


View your Tensorboard at https://us-central1.tensorboard.googleusercontent.com/experiment/projects+117917517031+locations+us-central1+tensorboards+775749433861079040+experiments+experiment-05-05-tf-classification-dnn
[2023-10-23T06:22:22] Started scanning logdir.
[2023-10-23T06:22:30] Total uploaded: 36 scalars, 48 tensors (17.2 kB), 1 binary objects (102.2 kB)
One time TensorBoard log upload completed...


---
## Save The Model

Create predictions for one batch of the test data and review the first row:

In [43]:
model.predict(test.take(1))[0]

array([0.8589983 , 0.14100175], dtype=float32)

Save the model:

In [44]:
model.save(f'{URI}/models/{TIMESTAMP}/model')



INFO:tensorflow:Assets written to: gs://fifth-sprite-402605/05/05/models/20231023061728/model/assets




In [45]:
expRun.log_params({'model.save': f'{URI}/models/{TIMESTAMP}/model'})

#### TensorFlow Model Load

In [46]:
tf_model = tf.saved_model.load(f'{URI}/models/{TIMESTAMP}/model')

---
## Serving

### Vertex AI Model Registry - Add Model/Version

Check whether this model was previously added to the Vertex AI Model Registry. Add the current model as a new model, or as a new version of an existing model.

In [47]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name

else:
    print('This is a new model, creating in model registry')
    parent_model = ''

if upload_model:
    model = aiplatform.Model.upload(
        display_name = f'{SERIES}_{EXPERIMENT}',
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model =  parent_model,
        serving_container_image_uri = DEPLOY_IMAGE,
        artifact_uri = f"{URI}/models/{TIMESTAMP}/model",
        is_default_version = True,
        version_aliases = [RUN_NAME],
        version_description = RUN_NAME,
        labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}        
    )

This is a new model, creating in model registry
Creating Model
Create Model backing LRO: projects/117917517031/locations/us-central1/models/model_05_05/operations/2439794792838725632
Model created. Resource name: projects/117917517031/locations/us-central1/models/model_05_05@1
To use this Model in another session:
model = aiplatform.Model('projects/117917517031/locations/us-central1/models/model_05_05@1')
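
The registry check above has three outcomes: upload as a new model, upload as a new version of an existing model, or skip because this run's version is already registered. A minimal sketch of that decision as a pure function follows. The name `plan_upload` and the convention that `None` means "no matching model" are assumptions made for illustration.

```python
def plan_upload(existing_aliases, run_name):
    """Return (upload, as_new_version) for a model registry check.

    existing_aliases: version aliases of the matching model,
                      or None when no matching model exists.
    """
    if existing_aliases is None:
        return True, False      # new model
    if run_name in existing_aliases:
        return False, False     # this version is already registered
    return True, True           # new version under the existing parent model


new_model = plan_upload(None, 'run-20231023061728')
already_there = plan_upload(['default', 'run-20231023061728'], 'run-20231023061728')
new_version = plan_upload(['default', 'run-20231023054724'], 'run-20231023061728')
```

Keeping the decision separate from the upload call makes each branch easy to check before anything is written to the registry.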


>**Note** on Version Aliases:
>Expectation is a name starting with `a-z` that can include `[a-zA-Z0-9-]`
>
>**Retrieve a Model Resource**
>[aiplatform.Model()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model)
>```Python
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}') # retrieves default version
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}@run-{TIMESTAMP}') # retrieves specific version by alias
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}', version = f'run-{TIMESTAMP}') # retrieves specific version by alias
>```
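
The alias rule in the note can be checked locally before uploading. The pattern below encodes only what the note states (first character `a-z`, remaining characters from `[a-zA-Z0-9-]`). The service may enforce further constraints, such as a length limit, that this sketch does not cover.

```python
import re

# Encodes only the rule stated in the note; other service limits are not modeled
ALIAS_PATTERN = re.compile(r'[a-z][a-zA-Z0-9-]*')


def is_valid_alias(alias):
    """True if the whole alias matches the stated naming rule."""
    return ALIAS_PATTERN.fullmatch(alias) is not None


checks = {
    'run-20231023061728': is_valid_alias('run-20231023061728'),  # letter first
    '20231023061728': is_valid_alias('20231023061728'),          # digit first
    'run_2023': is_valid_alias('run_2023'),                      # underscore
}
```

This is why the run name carries a `run-` prefix: a bare timestamp starts with a digit and would not satisfy the rule.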

In [48]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_05_05?project=fifth-sprite-402605


### Vertex AI Experiments - Update and Review

In [49]:
expRun.log_params({
    'model.uri': model.uri,
    'model.display_name': model.display_name,
    'model.name': model.name,
    'model.resource_name': model.resource_name,
    'model.version_id': model.version_id,
    'model.versioned_resource_name': model.versioned_resource_name
})

Complete the experiment run:

In [50]:
expRun.update_state(state = aiplatform.gapic.Execution.State.COMPLETE)

Retrieve the experiment:

In [51]:
exp = aiplatform.Experiment(experiment_name = EXPERIMENT_NAME)

In [52]:
exp.backing_tensorboard_resource_name

'projects/117917517031/locations/us-central1/tensorboards/775749433861079040'

In [53]:
exp.get_data_frame()

Unnamed: 0,experiment_name,run_name,run_type,state,param.model.resource_name,param.model.display_name,param.data_source,param.training.batch_size,param.experiment,param.var_target,...,metric.test_loss,metric.test_accuracy,metric.val_auprc,metric.val_loss,time_series_metric.train_loss,time_series_metric.train_accuracy,time_series_metric.train_auprc,time_series_metric.val_loss,time_series_metric.val_accuracy,time_series_metric.val_auprc
0,experiment-05-05-tf-classification-dnn,run-20231023061728,system.ExperimentRun,COMPLETE,projects/117917517031/locations/us-central1/mo...,05_05,bq://fifth-sprite-402605.fraud.fraud_prepped,100.0,5,Class,...,0.008853,0.998705,0.999317,0.007548,0.006121,0.999025,0.999644,0.007548,0.998883,0.999317
1,experiment-05-05-tf-classification-dnn,run-20231023054724,system.ExperimentRun,COMPLETE,projects/117917517031/locations/us-central1/mo...,05_05,bq://fifth-sprite-402605.fraud.fraud_prepped,100.0,5,Class,...,0.008102,0.99909,0.999355,0.007813,0.00599,0.99909,0.999574,0.007813,0.999127,0.999355


Review the Experiments TensorBoard to compare runs:

In [54]:
print(f"The Experiment TensorBoard Link:\nhttps://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{exp.name}")

The Experiment TensorBoard Link:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+117917517031+locations+us-central1+tensorboards+775749433861079040+experiments+experiment-05-05-tf-classification-dnn
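
The link is built by replacing each `/` in the TensorBoard resource name with `+` and appending the experiment name. A small helper makes that explicit. The function name `tensorboard_url` is hypothetical; the URL layout is taken from the cell above.

```python
def tensorboard_url(region, tb_resource_name, experiment_name):
    """Build the experiment TensorBoard link from its parts."""
    encoded = tb_resource_name.replace('/', '+')   # resource path uses '+' in the URL
    return (
        f'https://{region}.tensorboard.googleusercontent.com/experiment/'
        f'{encoded}+experiments+{experiment_name}'
    )


url = tensorboard_url(
    'us-central1',
    'projects/117917517031/locations/us-central1/tensorboards/775749433861079040',
    'experiment-05-05-tf-classification-dnn',
)
```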


In [55]:
expRun.get_time_series_data_frame()

Unnamed: 0,step,wall_time,train_loss,train_accuracy,train_auprc,val_loss,val_accuracy,val_auprc
0,1,2023-10-23 06:21:15.933000+00:00,0.012667,0.998445,0.999255,0.009639,0.998534,0.999453
1,2,2023-10-23 06:21:16.180000+00:00,0.007466,0.998924,0.999635,0.008082,0.998743,0.999375
2,3,2023-10-23 06:21:16.742000+00:00,0.006121,0.999025,0.999644,0.007548,0.998883,0.999317


### Review Experiment and Run in Console

In [56]:
print(f'Review The Experiment in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/experiments/{EXPERIMENT_NAME}?project={PROJECT_ID}')

Review The Experiment in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/experiments/experiment-05-05-tf-classification-dnn?project=fifth-sprite-402605


In [57]:
print(f'Review The Experiment Run in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/experiments/{EXPERIMENT_NAME}/runs/{EXPERIMENT_NAME}-{RUN_NAME}?project={PROJECT_ID}')

Review The Experiment Run in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/experiments/experiment-05-05-tf-classification-dnn/runs/experiment-05-05-tf-classification-dnn-run-20231023061728?project=fifth-sprite-402605


### Compare This Run Using Experiments

Get a list of all experiments in this project:

In [58]:
experiments = aiplatform.Experiment.list()

Remove experiments not in the SERIES:

In [59]:
experiments = [e for e in experiments if e.name.split('-')[0:2] == ['experiment', SERIES]]
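
The filter compares the first two dash-separated tokens of the experiment name rather than a plain prefix. The sketch below shows the effect on a few names; every name except the first is hypothetical.

```python
def in_series(name, series):
    """True if the name starts with the tokens 'experiment' and the series id."""
    return name.split('-')[0:2] == ['experiment', series]


names = [
    'experiment-05-05-tf-classification-dnn',
    'experiment-055-01-tf-classification-dnn',   # hypothetical: different series
    'other-05-05',                               # hypothetical: not an experiment name
]
selected = [n for n in names if in_series(n, '05')]
```

Matching whole tokens keeps a series such as `055` from being picked up when filtering for `05`, which a bare `startswith('experiment-05')` check would not do.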

Combine the runs from all experiments in SERIES into a single dataframe:

In [60]:
results = []
for experiment in experiments:
        results.append(experiment.get_data_frame())
        print(experiment.name)
results = pd.concat(results)

experiment-05-05-tf-classification-dnn


Create ranks for models within experiment and across the entire SERIES:

In [61]:
def ranker(metric = 'metric.test_auprc'):
    ranks = results[['experiment_name', 'run_name', 'param.model.display_name', 'param.model.version_id', metric]].copy().reset_index(drop = True)
    ranks['series_rank'] = ranks[metric].rank(method = 'dense', ascending = False)
    ranks['experiment_rank'] = ranks.groupby('experiment_name')[metric].rank(method = 'dense', ascending = False)
    return ranks.sort_values(by = ['experiment_name', 'run_name'])
    
ranks = ranker('metric.test_auprc')
ranks

Unnamed: 0,experiment_name,run_name,param.model.display_name,param.model.version_id,metric.test_auprc,series_rank,experiment_rank
1,experiment-05-05-tf-classification-dnn,run-20231023054724,05_05,1,0.999256,1.0,1.0
0,experiment-05-05-tf-classification-dnn,run-20231023061728,05_05,1,0.999215,2.0,2.0
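
Dense ranking assigns rank 1 to the best value and gives tied values the same rank without leaving gaps. The sketch below reproduces the two ranks computed by `ranker` in plain Python. The rows and the `auprc` key are hypothetical, and the ranks here are integers whereas pandas returns floats.

```python
def dense_rank(values, ascending=False):
    """Rank values so ties share a rank and ranks have no gaps."""
    distinct = sorted(set(values), reverse=not ascending)
    position = {v: i + 1 for i, v in enumerate(distinct)}
    return [position[v] for v in values]


def add_ranks(rows, metric):
    """Attach a rank across all rows and a rank within each experiment."""
    series_ranks = dense_rank([r[metric] for r in rows])
    by_experiment = {}
    for r in rows:
        by_experiment.setdefault(r['experiment_name'], []).append(r[metric])
    ranked = []
    for r, series_rank in zip(rows, series_ranks):
        group = by_experiment[r['experiment_name']]
        experiment_rank = dense_rank(group)[group.index(r[metric])]
        ranked.append({**r, 'series_rank': series_rank, 'experiment_rank': experiment_rank})
    return ranked


# Hypothetical runs from two experiments
rows = [
    {'experiment_name': 'exp-a', 'run_name': 'run-1', 'auprc': 0.90},
    {'experiment_name': 'exp-a', 'run_name': 'run-2', 'auprc': 0.95},
    {'experiment_name': 'exp-b', 'run_name': 'run-3', 'auprc': 0.93},
]
ranked = add_ranks(rows, 'auprc')
```

A run can be first within its own experiment and still rank lower across the series, as `run-3` does here.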


In [62]:
# match on the run name as well: display name and version id alone can match several runs
current_rank = ranks.loc[(ranks['run_name'] == RUN_NAME) & (ranks['param.model.display_name'] == model.display_name) & (ranks['param.model.version_id'] == model.version_id)]
current_rank

Unnamed: 0,experiment_name,run_name,param.model.display_name,param.model.version_id,metric.test_auprc,series_rank,experiment_rank
0,experiment-05-05-tf-classification-dnn,run-20231023061728,05_05,1,0.999215,2.0,2.0


In [63]:
print(f"The current model is ranked {current_rank['experiment_rank'].iloc[0]} within this experiment and {current_rank['series_rank'].iloc[0]} across this series.")

The current model is ranked 2.0 within this experiment and 2.0 across this series.


### Vertex AI Prediction - Create/Retrieve The Endpoint For This Series

In [64]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Creating Endpoint
Create Endpoint backing LRO: projects/117917517031/locations/us-central1/endpoints/4399262570964320256/operations/1184416396709199872
Endpoint created. Resource name: projects/117917517031/locations/us-central1/endpoints/4399262570964320256
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/117917517031/locations/us-central1/endpoints/4399262570964320256')
Endpoint Created: projects/117917517031/locations/us-central1/endpoints/4399262570964320256
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/4399262570964320256?project=fifth-sprite-402605


In [65]:
endpoint.display_name

'05'

In [66]:
endpoint.traffic_split

{}

In [67]:
deployed_models = endpoint.list_models()
#deployed_models

#### Should This Model Be Deployed?
Deploy the current model only if no model is deployed yet, or if its `series_rank` is at least as good as that of a model already deployed on the endpoint.

In [68]:
deploy = False
already_deployed = False
model_rank = current_rank['series_rank'].iloc[0]
if deployed_models:
    for deployed_model in deployed_models:
        deployed_rank = ranks.loc[(ranks['param.model.display_name'] == deployed_model.display_name) & (ranks['param.model.version_id'] == deployed_model.model_version_id)]['series_rank'].iloc[0]
        if deployed_model.display_name == model.display_name and deployed_model.model_version_id == model.version_id:
            already_deployed = True
            print('The current model/version is already deployed.')
            break
        elif model_rank <= deployed_rank:
            deploy = True
            print(f'The current model is ranked at least as well ({model_rank}) as a currently deployed model ({deployed_rank}).')
            break
    # report "ranked worse" only when that is the actual reason for not deploying
    if not deploy and not already_deployed:
        print(f'The current model is ranked worse ({model_rank}) than a currently deployed model ({deployed_rank})')
else:
    deploy = True
    print('No models currently deployed.')

No models currently deployed.
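
The decision above can also be expressed as a small pure function, which makes the three outcomes (nothing deployed, already deployed, rank comparison) easy to check without an endpoint. This is a sketch under assumptions: `should_deploy` and its tuple layout are hypothetical, and unlike the loop above it compares against the best-ranked deployed model rather than the first one encountered.

```python
def should_deploy(current, deployed):
    """Decide whether to deploy `current` given the models already on the endpoint.

    `current` and each entry of `deployed` are (display_name, version_id, series_rank)
    tuples, where rank 1 is best. Returns a (deploy, reason) pair.
    """
    if not deployed:
        return True, 'No models currently deployed.'
    name, version, rank = current
    # An already-deployed model/version takes priority over any rank comparison.
    if any((n, v) == (name, version) for n, v, _ in deployed):
        return False, 'The current model/version is already deployed.'
    best_deployed = min(r for _, _, r in deployed)
    if rank <= best_deployed:
        return True, f'Current rank {rank} is at least as good as deployed rank {best_deployed}.'
    return False, f'Current rank {rank} is worse than deployed rank {best_deployed}.'

# Empty endpoint, as in the run above
print(should_deploy(('05_05', '1', 1.0), []))
# Hypothetical later version that ranks worse than the deployed one
print(should_deploy(('05_05', '2', 2.0), [('05_05', '1', 1.0)]))
```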


#### Deploy Model To Endpoint

In [69]:
if deploy:
    print(f'Deploying model with 100% of traffic...')
    endpoint.deploy(
        model = model,
        deployed_model_display_name = model.display_name,
        traffic_percentage = 100,
        machine_type = DEPLOY_COMPUTE,
        min_replica_count = 1,
        max_replica_count = 1
    )
else: print(f'Not deploying: the current model (rank {model_rank}) is already deployed or is ranked worse than a deployed model (rank {deployed_rank}).')

Deploying model with 100% of traffic...
Deploying Model projects/117917517031/locations/us-central1/models/model_05_05 to Endpoint : projects/117917517031/locations/us-central1/endpoints/4399262570964320256
Deploy Endpoint model backing LRO: projects/117917517031/locations/us-central1/endpoints/4399262570964320256/operations/2060366524232761344
Endpoint model deployed. Resource name: projects/117917517031/locations/us-central1/endpoints/4399262570964320256


#### Remove Deployed Models without Traffic

In [92]:
for deployed_model in endpoint.list_models():
    # treat a missing entry and an explicit 0 in the traffic split the same way
    traffic = endpoint.traffic_split.get(deployed_model.id, 0)
    if traffic > 0:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {traffic}")
    else:
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")
        endpoint.undeploy(deployed_model_id = deployed_model.id)

Model 05_05 with version 1 has traffic = 100


In [93]:
endpoint.traffic_split

{'1370846908255305728': 100}

In [94]:
endpoint.list_models()

[id: "1370846908255305728"
 model: "projects/117917517031/locations/us-central1/models/model_05_05"
 display_name: "05_05"
 create_time {
   seconds: 1698042230
   nanos: 877416000
 }
 dedicated_resources {
   machine_spec {
     machine_type: "n1-standard-4"
   }
   min_replica_count: 1
   max_replica_count: 1
 }
 model_version_id: "1"]
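
The selection of models to undeploy depends only on the deployed model IDs and the traffic split dictionary, so it can be sketched without an endpoint. `ids_without_traffic` is a hypothetical helper, and the second model ID below is made up for illustration; the first ID and its traffic value come from the output above.

```python
def ids_without_traffic(deployed_ids, traffic_split):
    """Return the deployed model IDs that receive no traffic.

    A missing key and an explicit 0 are both treated as "no traffic".
    """
    return [i for i in deployed_ids if traffic_split.get(i, 0) == 0]

# Traffic split from the output above, plus a hypothetical older deployed model
traffic_split = {'1370846908255305728': 100}
deployed_ids = ['1370846908255305728', 'hypothetical-old-model-id']
print(ids_without_traffic(deployed_ids, traffic_split))
```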

---
## Online Prediction

### Prepare a record for prediction: instance and parameters lists

In [96]:
n = 10
pred = bq.query(
    query = f"""
        SELECT * EXCEPT({VAR_TARGET}, {VAR_OMIT}, splits)
        FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}
        WHERE splits='TEST'
        LIMIT {n}
        """
).to_dataframe()

In [97]:
pred

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V20,V21,V22,V23,V24,V25,V26,V27,V28,Amount
0,129834,-0.340232,2.502893,-2.123336,3.27429,2.781238,0.749164,0.999459,0.734927,-2.169208,...,-0.342277,-0.087134,-0.228322,-0.067413,-0.762246,-0.170996,0.019157,-0.04067,0.023008,0.0
1,113671,1.891091,0.321226,-0.142903,4.058212,-0.048428,0.166566,-0.183301,0.022226,-0.223479,...,-0.338267,0.081503,0.523432,0.084218,0.049362,0.09792,0.132655,-0.003539,-0.047447,0.0
2,63411,-1.808226,1.390167,0.266081,0.518695,-0.209067,-0.165734,-0.218656,0.76549,-0.752461,...,-0.261474,0.200491,0.538654,-0.296581,0.110225,-0.32147,0.533114,-1.076942,-0.595472,0.0
3,86534,1.87165,0.320887,0.255972,3.874758,-0.244906,0.285861,-0.474298,0.093078,-0.349966,...,-0.277242,0.213433,0.780687,0.162582,-0.018727,-0.177694,0.075748,0.019777,-0.033742,0.0
4,154994,1.99483,-0.273504,-0.280883,0.471706,-0.637261,-0.590863,-0.501702,-0.131346,1.207835,...,-0.173661,0.293171,1.16949,0.073709,0.051972,-0.006184,-0.139264,0.049037,-0.038147,0.0
5,3341,-0.286741,1.176355,2.505873,2.804536,0.038571,1.078103,0.011575,0.202091,-0.962279,...,0.261036,-0.030999,0.194469,-0.277251,-0.399436,-0.137051,0.232843,0.182048,0.129298,0.0
6,129787,-0.147554,1.176886,1.811677,4.704735,0.183953,0.74164,0.192683,0.222589,-1.279147,...,0.053832,0.146252,0.655139,-0.307063,0.046346,-0.163411,0.5956,0.179576,0.152155,0.0
7,35377,1.126534,0.289679,1.53731,2.706547,-0.759368,0.161174,-0.529165,0.141162,0.04673,...,-0.136887,-0.081741,-0.059834,0.041017,0.398586,0.312066,-0.045035,0.045694,0.036645,0.0
8,129175,1.871806,0.31011,-0.245871,3.858575,0.22927,0.695356,-0.248002,0.152463,-0.561724,...,-0.273772,0.119279,0.63007,0.023154,-0.314662,0.134237,0.133328,-0.003424,-0.059522,0.0
9,110602,-0.069236,1.14535,2.830041,4.849645,-0.037741,1.174016,-0.098474,0.092617,0.416461,...,0.108998,-0.383778,-0.493459,-0.03169,-0.078265,-0.779379,0.089735,0.028273,-0.056205,0.0


In [100]:
newobs = pred.to_dict(orient = 'records')

In [101]:
newobs[0]

{'Time': 129834,
 'V1': -0.340232044768614,
 'V2': 2.50289309622058,
 'V3': -2.12333626200597,
 'V4': 3.2742903170319404,
 'V5': 2.7812380029133497,
 'V6': 0.749163902015366,
 'V7': 0.999458966494039,
 'V8': 0.7349268096797199,
 'V9': -2.16920796632578,
 'V10': -1.3960327481938501,
 'V11': 0.30931402685906395,
 'V12': -0.9895730577869419,
 'V13': -0.423981971785906,
 'V14': -3.90067839426633,
 'V15': -0.536574827339307,
 'V16': 1.0393406110485301,
 'V17': 3.87122368852666,
 'V18': -0.0464031154773409,
 'V19': -3.28702948675253,
 'V20': -0.34227725879396403,
 'V21': -0.0871335919587312,
 'V22': -0.228321803031733,
 'V23': -0.0674131563149037,
 'V24': -0.7622463272802878,
 'V25': -0.17099567610229902,
 'V26': 0.0191567844693617,
 'V27': -0.0406701348285203,
 'V28': 0.023008467461218998,
 'Amount': 0.0}
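
An online prediction request body is a JSON object with an `instances` list and, optionally, a `parameters` entry. The sketch below builds such a body from a list of records and rejects values that cannot be represented in JSON. `build_request` is a hypothetical helper, and the record is a shortened, illustrative version of `newobs[0]`.

```python
import json
import math

def build_request(records, parameters=None):
    """Build a JSON request body with an `instances` list and optional `parameters`."""
    for i, record in enumerate(records):
        for key, value in record.items():
            # NaN and infinity are not valid JSON numbers
            if isinstance(value, float) and not math.isfinite(value):
                raise ValueError(f'instance {i}: {key} is not a finite number')
    body = {'instances': list(records)}
    if parameters is not None:
        body['parameters'] = parameters
    return json.dumps(body, allow_nan=False)

# Shortened, illustrative record (the real one has Time, V1..V28 and Amount)
record = {'Time': 129834, 'V1': -0.340232044768614, 'Amount': 0.0}
payload = build_request([record])
print(payload)
```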

### Get Predictions: Python Client

In [109]:
prediction = endpoint.predict(instances = newobs)
prediction

Prediction(predictions=[[0.858998299, 0.141001746], [0.996834815, 0.0031651922], [0.999823153, 0.000176848305], [0.992733479, 0.00726648793], [0.999197662, 0.000802317169], [0.998782575, 0.00121737667], [0.997467875, 0.00253215898], [0.998116136, 0.00188391213], [0.998504877, 0.00149511418], [0.998216808, 0.00178311253]], deployed_model_id='1370846908255305728', model_version_id='1', model_resource_name='projects/117917517031/locations/us-central1/models/model_05_05', explanations=None)

In [110]:
prediction.predictions[0]

[0.858998299, 0.141001746]

In [111]:
np.argmax(prediction.predictions[0])

0
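
Each prediction is a pair of probabilities, and `np.argmax` picks the larger one. For a fraud model it can be useful to apply a lower threshold to the positive class instead. The sketch below assumes that index 1 corresponds to `Class = 1`, which the output above does not confirm; `to_labels` is a hypothetical helper and the values are the first three pairs returned above.

```python
def to_labels(predictions, threshold=0.5):
    """Map [p_class0, p_class1] pairs to 0/1 labels using a threshold on p_class1."""
    # Assumption: the second value in each pair is the probability of Class = 1.
    return [int(p[1] >= threshold) for p in predictions]

# First three prediction pairs from the response above
predictions = [
    [0.858998299, 0.141001746],
    [0.996834815, 0.0031651922],
    [0.999823153, 0.000176848305],
]
print(to_labels(predictions))                 # [0, 0, 0]
print(to_labels(predictions, threshold=0.1))  # [1, 0, 0]
```

With two classes and a threshold of 0.5 this agrees with `np.argmax` except at exact ties.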

### Get Predictions: REST

In [112]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": newobs[0:1]}))

In [113]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    [
      0.858998299,
      0.141001746
    ]
  ],
  "deployedModelId": "1370846908255305728",
  "model": "projects/117917517031/locations/us-central1/models/model_05_05",
  "modelDisplayName": "05_05",
  "modelVersionId": "1"
}
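
The same REST call can be assembled in Python with the standard library. The sketch below only builds the request object and does not send it, since sending requires a valid access token. `build_predict_request` is a hypothetical helper, the token is a placeholder, and the instance is shortened for illustration; the URL pattern is the one used in the `curl` command above.

```python
import json
import urllib.request

def build_predict_request(region, endpoint_resource_name, token, instances):
    """Build (without sending) the POST request for the endpoint's :predict method."""
    url = f'https://{region}-aiplatform.googleapis.com/v1/{endpoint_resource_name}:predict'
    return urllib.request.Request(
        url,
        data=json.dumps({'instances': instances}).encode('utf-8'),
        headers={
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json; charset=utf-8',
        },
        method='POST',
    )

request = build_predict_request(
    region='us-central1',
    endpoint_resource_name='projects/117917517031/locations/us-central1/endpoints/4399262570964320256',
    token='PLACEHOLDER_TOKEN',  # placeholder, not a real token
    instances=[{'Time': 129834, 'Amount': 0.0}],  # shortened, illustrative instance
)
print(request.get_method(), request.full_url)
# To send it with a real token: urllib.request.urlopen(request)
```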


### Get Predictions: gcloud (CLI)

In [116]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[[0.858998299, 0.141001746]]


---
## Remove Resources

In [85]:
# remove endpoints

In [86]:
# remove models

In [87]:
# remove experiments

In [88]:
# remove training job

In [89]:
# remove pipeline runs

In [90]:
# remove GCS files
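
The cells above are left as placeholders. As a starting point for the first two, here is a minimal sketch of the order of operations, assuming `endpoint` is an `aiplatform.Endpoint` and `model` is an `aiplatform.Model` as used earlier in this notebook. `remove_endpoint` and `remove_model` are hypothetical helper names; they call only `undeploy_all()` and `delete()` on the objects passed in. Experiments, training jobs, pipeline runs, and GCS files are not covered here.

```python
def remove_endpoint(endpoint):
    """Undeploy every model from the endpoint, then delete the endpoint itself."""
    endpoint.undeploy_all()  # deployed models are removed first
    endpoint.delete()

def remove_model(model):
    """Delete the model; call this only after it is no longer deployed anywhere."""
    model.delete()

# Usage with the objects from this notebook (destructive, so left commented out):
# remove_endpoint(endpoint)
# remove_model(model)
```

Removing the endpoint first matters because a model that is still deployed cannot be deleted.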