
# 02b - Vertex AI - AutoML with clients (code)

Use the Vertex AI Python Client to recreate the no-code approach of (02a) with code (Python).  This builds a custom model with AutoML and deploys it to an Endpoint for predictions and explanations. 

### Video Walkthrough of this notebook:
Includes a conversational walkthrough and more explanatory detail than the notebook:
<p align="center" width="100%"><center><a href="https://youtu.be/GOxHYfCLc6U" target="_blank" rel="noopener noreferrer"><img src="../architectures/thumbnails/playbutton/02b.png" width="40%"></a></center></p>

### Prerequisites:
-  01 - BigQuery - Table Data Source

### Resources:
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
-  [AutoML Tabular Training Job With Python Client](https://cloud.google.com/vertex-ai/docs/training/automl-api#aiplatform_create_training_pipeline_tabular_classification_sample-python)
-  [Interpreting Explanations](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/02b_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/02b_console.png" width="45%">
</p>

---
## Setup

inputs:

In [1]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'ivory-plane-372610'

In [2]:
REGION = 'us-central1'
DATANAME = 'fraud'
NOTEBOOK = '02b'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [3]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [4]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bigquery = bigquery.Client()

parameters:

In [5]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DIR = f"temp/{NOTEBOOK}"
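The timestamp is reused in every display name below, which keeps the dataset, training job, model, and endpoint from a single run easy to find together. A minimal sketch of the naming pattern follows. It uses a fixed datetime (an assumption, chosen to match the `modelDisplayName` shown in the REST output later in this notebook) in place of `datetime.now()`:

```python
from datetime import datetime

NOTEBOOK = '02b'
DATANAME = 'fraud'

# Fixed moment for illustration only; the notebook itself uses datetime.now()
fixed_moment = datetime(2023, 1, 1, 11, 3, 56)
TIMESTAMP = fixed_moment.strftime("%Y%m%d%H%M%S")

# Same pattern the notebook uses for display_name arguments
display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}'
print(display_name)
```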

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Create Dataset (link to BigQuery table)

In [8]:
dataset = aiplatform.TabularDataset.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}', 
    bq_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    labels = {'notebook':f'{NOTEBOOK}'}
)

Creating TabularDataset
Create TabularDataset backing LRO: projects/24006034033/locations/us-central1/datasets/300028135917748224/operations/1847565806842413056
TabularDataset created. Resource name: projects/24006034033/locations/us-central1/datasets/300028135917748224
To use this TabularDataset in another session:
ds = aiplatform.TabularDataset('projects/24006034033/locations/us-central1/datasets/300028135917748224')


---
## Train Model with AutoML

In [9]:
column_specs = list(set(dataset.column_names) - set(VAR_OMIT.split()) - set([VAR_TARGET, 'splits']))

In [10]:
column_specs = dict.fromkeys(column_specs, 'auto')

In [11]:
column_specs

{'V24': 'auto',
 'V4': 'auto',
 'V5': 'auto',
 'V18': 'auto',
 'V2': 'auto',
 'Time': 'auto',
 'V27': 'auto',
 'V11': 'auto',
 'V17': 'auto',
 'V10': 'auto',
 'V13': 'auto',
 'V6': 'auto',
 'V19': 'auto',
 'V25': 'auto',
 'V16': 'auto',
 'Amount': 'auto',
 'V12': 'auto',
 'V28': 'auto',
 'V14': 'auto',
 'V8': 'auto',
 'V23': 'auto',
 'V1': 'auto',
 'V15': 'auto',
 'V26': 'auto',
 'V7': 'auto',
 'V20': 'auto',
 'V3': 'auto',
 'V21': 'auto',
 'V22': 'auto',
 'V9': 'auto'}
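The set subtraction above does not preserve column order, which is why the keys print in an arbitrary sequence. As a sketch of an order-preserving alternative, the following uses a short hypothetical column list in place of `dataset.column_names`:

```python
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id'  # space-delimited string of columns to drop

# Hypothetical column list standing in for dataset.column_names
column_names = ['Time', 'V1', 'V2', 'Amount', 'Class', 'transaction_id', 'splits']

# Columns that must not be used as features
excluded = set(VAR_OMIT.split()) | {VAR_TARGET, 'splits'}

# A list comprehension keeps the source order, unlike set subtraction
features = [c for c in column_names if c not in excluded]
column_specs = dict.fromkeys(features, 'auto')
print(column_specs)
```

Either approach yields the same set of keys; the ordered version only makes the output easier to compare against the table schema.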

Define a Job:
- Consider Weighting
- Model Type
- Optimization Objective

https://googleapis.dev/python/aiplatform/latest/aiplatform.html#google.cloud.aiplatform.AutoMLTabularTrainingJob
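The optimization objective has to be compatible with the model type. A small sketch of a pre-flight check is shown here. The objective names are the ones listed in the SDK reference linked above at the time of writing, so treat them as an assumption to re-check, and note that `check_objective` is a hypothetical helper, not part of the SDK:

```python
# Objectives per prediction type, as listed in the SDK reference at the time
# of writing (an assumption to verify against the linked documentation).
VALID_OBJECTIVES = {
    'classification': {
        'maximize-au-roc', 'minimize-log-loss', 'maximize-au-prc',
        'maximize-precision-at-recall', 'maximize-recall-at-precision',
    },
    'regression': {'minimize-rmse', 'minimize-mae', 'minimize-rmsle'},
}

def check_objective(prediction_type, objective):
    """Hypothetical helper: fail early on an unknown type/objective pair."""
    allowed = VALID_OBJECTIVES.get(prediction_type)
    if allowed is None:
        raise ValueError(f'unknown prediction type: {prediction_type!r}')
    if objective not in allowed:
        raise ValueError(
            f'{objective!r} is not valid for {prediction_type}; '
            f'choose from {sorted(allowed)}'
        )
    return objective

print(check_objective('classification', 'maximize-au-prc'))
```

This notebook uses `maximize-au-prc`, a common choice when the positive class is rare, as it is for fraud.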

In [12]:
tabular_classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    optimization_prediction_type = 'classification',
    optimization_objective = 'maximize-au-prc',
    column_specs = column_specs,
    labels = {'notebook':f'{NOTEBOOK}'}
)

In [13]:
model = tabular_classification_job.run(
    dataset = dataset,
    target_column = VAR_TARGET,
    predefined_split_column_name = 'splits',
    #    training_fraction_split = 0.8,
    #    validation_fraction_split = 0.1,
    #    test_fraction_split = 0.1,
    budget_milli_node_hours = 1000,
    model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    disable_early_stopping = False,
    model_labels = {'notebook':f'{NOTEBOOK}'}
)

View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/7910676237008240640?project=24006034033
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPipelines/7910676237008240640 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPipelines/7910676237008240640 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPipelines/7910676237008240640 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPipelines/7910676237008240640 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPipelines/7910676237008240640 current state:
PipelineState.PIPELINE_STATE_RUNNING
AutoMLTabularTrainingJob projects/24006034033/locations/us-central1/trainingPip
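
The training budget is expressed in milli node hours, so the value of 1000 used above corresponds to one node hour. A minimal sketch of the conversion, with hypothetical helper names:

```python
def milli_to_node_hours(budget_milli_node_hours):
    """Convert a budget in milli node hours to node hours."""
    return budget_milli_node_hours / 1000

def node_hours_to_milli(node_hours):
    """Convert node hours to the integer milli node hours the job expects."""
    return int(round(node_hours * 1000))

print(milli_to_node_hours(1000))   # budget used in this notebook
print(node_hours_to_milli(2.5))    # hypothetical larger budget
```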

---
## Evaluation
Although the model above was trained with AutoML through the API, its evaluation metrics can still be reviewed directly in the Google Cloud Console.  Visit the Models section of the Vertex AI service and select the model to see the evaluation metrics along with many helpful visuals.

It is also possible to retrieve the evaluation metrics for your model using the API.  This section shows how.

<p align="center" width="100%"><center><a href="https://youtu.be/0vhviqmH8Gg" target="_blank" rel="noopener noreferrer"><img src="../architectures/thumbnails/playbutton/02b.png" width="40%"></a></center></p>
<p align="center" width="100%"><center>Part 2 Video</center></p>

For more information review [this page](https://cloud.google.com/vertex-ai/docs/training/evaluating-automl-models).

Set up a model client for the model created by this notebook:

In [14]:
model.resource_name

'projects/24006034033/locations/us-central1/models/8043184842902339584'

In [15]:
model_client = aiplatform.gapic.ModelServiceClient(
    client_options = {
        'api_endpoint' : f'{REGION}-aiplatform.googleapis.com'
    }
)

Retrieve the aggregate model evaluation metrics for the model as a whole.  First, use `.list_model_evaluations` to retrieve the evaluation id, then pass that id to `.get_model_evaluation`:

In [16]:
evaluations = model_client.list_model_evaluations(parent = model.resource_name)
evals = iter(evaluations)
eval_id = next(evals).name
geteval = model_client.get_model_evaluation(name = eval_id)

Review several of the metrics included in the evaluation.  Also, compare these to the results in the console view.

In [17]:
geteval.metrics['auPrc']

0.9999533

In [18]:
cm = geteval.metrics['confusionMatrix']
for spec, row in zip(cm['annotationSpecs'], cm['rows']):
    print('True Label = ', spec['displayName'], ' has Predicted labels = ', row)

True Label =  0  has Predicted labels =  [28489.0, 0.0]
True Label =  1  has Predicted labels =  [8.0, 37.0]
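
The confusion matrix rows printed above are enough to derive precision and recall for the fraud class. The sketch below hard-codes those rows (true label by row, predicted label by column) and treats label `1` as the positive class:

```python
# Rows copied from the output above: true label by row, predicted by column
labels = ['0', '1']
rows = [[28489.0, 0.0],
        [8.0, 37.0]]

pos = labels.index('1')   # index of the positive (fraud) class
neg = labels.index('0')

tp = rows[pos][pos]       # fraud predicted as fraud
fn = rows[pos][neg]       # fraud predicted as not fraud
fp = rows[neg][pos]       # not fraud predicted as fraud

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f'precision={precision:.4f} recall={recall:.4f}')
# prints: precision=1.0000 recall=0.8222
```

With only 45 positive records among more than 28,000 in this evaluation set, recall on the positive class is more informative than overall accuracy.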


For classification models, you can also retrieve the evaluation metrics for each slice of the model (here, one slice per label):

In [19]:
slices = model_client.list_model_evaluation_slices(parent = eval_id)

In [20]:
# use a name other than `slice` to avoid shadowing the Python builtin
for eval_slice in slices:
    print('Label = ', eval_slice.slice_.value, 'has auPrc = ', eval_slice.metrics['auPrc'])

Label =  1 has auPrc =  0.9523295
Label =  0 has auPrc =  0.9999645


---
## Endpoint and Deployment

In [21]:
endpoint = aiplatform.Endpoint.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    labels = {'notebook':f'{NOTEBOOK}'}
)

Creating Endpoint
Create Endpoint backing LRO: projects/24006034033/locations/us-central1/endpoints/6572321958938017792/operations/7444414243757096960
Endpoint created. Resource name: projects/24006034033/locations/us-central1/endpoints/6572321958938017792
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/24006034033/locations/us-central1/endpoints/6572321958938017792')


In [22]:
endpoint.deploy(
    model = model,
    deployed_model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying Model projects/24006034033/locations/us-central1/models/8043184842902339584 to Endpoint : projects/24006034033/locations/us-central1/endpoints/6572321958938017792
Deploy Endpoint model backing LRO: projects/24006034033/locations/us-central1/endpoints/6572321958938017792/operations/6429978427691892736
Endpoint model deployed. Resource name: projects/24006034033/locations/us-central1/endpoints/6572321958938017792


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [23]:
pred = bigquery.query(query = f"SELECT * FROM {DATANAME}.{DATANAME}_prepped WHERE splits='TEST' LIMIT 10").to_dataframe()

In [24]:
pred.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 166236 | -0.62209 | 1.665084 | -0.642326 | -0.01291 | 0.581359 | -0.354218 | 0.343285 | 0.459361 | -0.564147 | ... | -0.431989 | 0.524353 | 0.535777 | 1.052987 | 0.193131 | 0.116374 | 0.0 | 0 | 7df222f9-f61d-414c-9aeb-95ce71e89ea3 | TEST |
| 1 | 123183 | 1.83272 | 0.137409 | -0.128425 | 3.969397 | -0.151112 | 0.491749 | -0.417466 | 0.28936 | -0.22606 | ... | 0.108731 | -0.026719 | -0.016266 | 0.121483 | -0.015474 | -0.061161 | 0.0 | 0 | 3bf10040-5b7f-4875-a91f-2e1d6527881d | TEST |
| 2 | 28140 | -0.728655 | 1.1915 | 0.819239 | 3.014839 | 0.564918 | 2.248181 | -0.573803 | 1.282026 | -1.062475 | ... | 0.082528 | -1.117841 | -0.68206 | 0.346157 | -0.059915 | 0.012991 | 0.0 | 0 | ac80cf44-ef9f-46ea-bfdf-ce2cf68141ac | TEST |
| 3 | 159888 | -3.146402 | 2.543688 | -0.328957 | 2.499684 | -0.112949 | 0.959888 | -0.501032 | 0.632631 | 0.272793 | ... | 0.085183 | 0.21283 | -0.312526 | -0.24838 | -2.73144 | -0.754864 | 0.0 | 0 | fc4c771b-5917-49c4-a1dc-529cd6ef3281 | TEST |


In [26]:
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
newob

{'Time': 166236,
 'V1': -0.622090014217808,
 'V2': 1.6650838791726599,
 'V3': -0.642326132275291,
 'V4': -0.0129096183833018,
 'V5': 0.5813594663827301,
 'V6': -0.35421790825021604,
 'V7': 0.34328484582042107,
 'V8': 0.459361491826778,
 'V9': -0.564147227282772,
 'V10': -1.74389864971886,
 'V11': 1.7610236285681102,
 'V12': 0.430189738645047,
 'V13': 0.0407658617990947,
 'V14': -2.22563406668347,
 'V15': 0.84311316838164,
 'V16': -0.767662664134352,
 'V17': 3.3050180120954598,
 'V18': 1.48884814078476,
 'V19': 3.38615095509971,
 'V20': 0.44883911925511405,
 'V21': -0.172265109408835,
 'V22': -0.204461620364193,
 'V23': -0.431988510463775,
 'V24': 0.5243528397986429,
 'V25': 0.535777135715214,
 'V26': 1.05298718432717,
 'V27': 0.19313136672907502,
 'V28': 0.11637382272405197,
 'Amount': 0.0}

The instance must match the variable formats that the deployed model expects.  AutoML may convert the type of some variables during training.  Here, `Time` is passed as a string rather than an integer:

In [27]:
newob['Time'] = str(newob['Time'])

In [28]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())
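
If more than one variable needs the string conversion applied to `Time` above, the cast can be generalized. The following is a sketch with a hypothetical helper, `cast_fields`, and a shortened record; it returns a copy so the original record is left unchanged:

```python
def cast_fields(record, to_str):
    """Return a copy of record with the named fields converted to strings."""
    return {k: (str(v) if k in to_str else v) for k, v in record.items()}

# Shortened version of the record shown above
record = {'Time': 166236, 'V1': -0.622090014217808, 'Amount': 0.0}

instance = cast_fields(record, to_str=('Time',))
print(instance)
```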

### Get Predictions: Python Client

In [29]:
prediction = endpoint.predict(instances=instances, parameters=parameters)

In [30]:
prediction

Prediction(predictions=[{'scores': [0.9997182488441467, 0.0002818354405462742], 'classes': ['0', '1']}], deployed_model_id='7402136578560098304', model_version_id='1', model_resource_name='projects/24006034033/locations/us-central1/models/8043184842902339584', explanations=None)

In [31]:
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'0'
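
Taking the class with the highest score is one way to turn scores into a label. For a rare positive class, a custom threshold on the positive score is a common alternative. The sketch below uses the prediction returned above; the helper names and the threshold values are assumptions for illustration:

```python
# Scores and classes copied from the prediction above
prediction = {'scores': [0.9997182488441467, 0.0002818354405462742],
              'classes': ['0', '1']}

def predicted_class(pred):
    """Return the class with the highest score (argmax without numpy)."""
    best = max(range(len(pred['scores'])), key=pred['scores'].__getitem__)
    return pred['classes'][best]

def flag_positive(pred, positive='1', threshold=0.5):
    """Return True when the positive-class score meets the threshold."""
    score = pred['scores'][pred['classes'].index(positive)]
    return score >= threshold

print(predicted_class(prediction))
print(flag_positive(prediction))
print(flag_positive(prediction, threshold=0.0001))
```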

### Get Predictions: REST

In [32]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [33]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    {
      "classes": [
        "0",
        "1"
      ],
      "scores": [
        0.99971824884414673,
        0.00028183544054627419
      ]
    }
  ],
  "deployedModelId": "7402136578560098304",
  "model": "projects/24006034033/locations/us-central1/models/8043184842902339584",
  "modelDisplayName": "02b_fraud_20230101110356",
  "modelVersionId": "1"
}
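
The REST response is plain JSON, so it can be handled without the Vertex AI client. As a sketch, the following parses the response body shown above (pasted in as a string) and pairs each class with its score:

```python
import json

# Response body copied from the curl output above
response_text = '''
{
  "predictions": [
    {"classes": ["0", "1"],
     "scores": [0.99971824884414673, 0.00028183544054627419]}
  ],
  "deployedModelId": "7402136578560098304",
  "model": "projects/24006034033/locations/us-central1/models/8043184842902339584",
  "modelDisplayName": "02b_fraud_20230101110356",
  "modelVersionId": "1"
}
'''

response = json.loads(response_text)
first = response['predictions'][0]

# Map each class label to its score
scores_by_class = dict(zip(first['classes'], first['scores']))
predicted = max(scores_by_class, key=scores_by_class.get)
print(predicted, scores_by_class[predicted])
```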


### Get Predictions: gcloud (CLI)

In [34]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'classes': ['0', '1'], 'scores': [0.9997182488441467, 0.0002818354405462742]}]
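
The REST call takes the full endpoint resource name, while the gcloud command takes only the trailing endpoint id. A short sketch of how both are derived, using the resource name printed when the endpoint was created:

```python
REGION = 'us-central1'

# Resource name copied from the endpoint creation output above
resource_name = 'projects/24006034033/locations/us-central1/endpoints/6572321958938017792'

# gcloud wants only the final path segment
endpoint_id = resource_name.rsplit('/', 1)[-1]

# Resource names alternate collection/id, so they unpack into a dict
parts = resource_name.split('/')
fields = dict(zip(parts[0::2], parts[1::2]))

# REST wants the full resource name appended to the regional host
predict_url = f'https://{REGION}-aiplatform.googleapis.com/v1/{resource_name}:predict'

print(endpoint_id)
print(fields)
print(predict_url)
```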


---
## Explanations
Interpretation Guide
- https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular

In [None]:
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [None]:
explanation.predictions

In [None]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)
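
One way to read these fields together: for Shapley-style attributions, the baseline output plus the sum of the feature attributions should approximately equal the instance output, with the approximation error indicating how closely the sampling gets there. The sketch below checks that relationship on hypothetical numbers, not on values returned by this endpoint:

```python
import math

# Hypothetical values standing in for one attribution from endpoint.explain
baseline_output_value = 0.95
instance_output_value = 0.993
feature_attributions = {'V14': 0.03, 'V4': 0.015, 'Amount': -0.002}

# Baseline plus all attributions should land near the instance output
reconstructed = baseline_output_value + sum(feature_attributions.values())
gap = instance_output_value - reconstructed

print(f'reconstructed={reconstructed:.3f}')
print('matches instance output:', math.isclose(reconstructed, instance_output_value, abs_tol=1e-9))
```

A large gap on real output would suggest treating the individual attributions with more caution.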

In [None]:
import matplotlib.pyplot as plt
features = []
scores = []
for k in explanation.explanations[0].attributions[0].feature_attributions:
    features.append(k)
    scores.append(explanation.explanations[0].attributions[0].feature_attributions[k])
features = [x for _, x in sorted(zip(scores, features))]
scores = sorted(scores)
fig, ax = plt.subplots()
fig.set_size_inches(9, 9)
ax.barh(features, scores)
fig.show()
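
The chart above sorts by signed attribution. When the goal is to list the most influential features regardless of direction, sorting by absolute value can be more direct. A sketch with hypothetical attribution values:

```python
# Hypothetical attributions; real values come from explanation.explanations
feature_attributions = {'V14': 0.03, 'V4': 0.015, 'Amount': -0.002, 'V10': -0.02}

# Sort (feature, score) pairs by magnitude, largest first
ranked = sorted(feature_attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)

top_features = [name for name, _ in ranked[:3]]
print(top_features)
# prints: ['V14', 'V10', 'V4']
```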

---
## Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [None]:
batch = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    model_name = model.name,
    instances_format = "bigquery",
    predictions_format = "bigquery",
    bigquery_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    bigquery_destination_prefix = f"{PROJECT_ID}",
    generate_explanation = True,
    labels = {'notebook':f'{NOTEBOOK}'}
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/24006034033/locations/us-central1/batchPredictionJobs/3184711378036326400
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/24006034033/locations/us-central1/batchPredictionJobs/3184711378036326400')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/3184711378036326400?project=24006034033
BatchPredictionJob projects/24006034033/locations/us-central1/batchPredictionJobs/3184711378036326400 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/24006034033/locations/us-central1/batchPredictionJobs/3184711378036326400 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/24006034033/locations/us-central1/batchPredictionJobs/3184711378036326400 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/24006034033/locations/us-central1/batchPredictionJobs/3184711

---
## Remove Resources
See notebook "99 - Cleanup".