# 02b - Vertex AI - AutoML with clients (code)
Use the Vertex AI Python client to recreate the no-code approach of notebook 02a in Python. This builds a custom model with AutoML and deploys it to an Endpoint for online predictions and explanations.

**Prerequisites:**
-  01 - BigQuery - Table Data Source

**Overview:**
-  Use Python client google.cloud.aiplatform for Vertex AI
   -  Create a dataset
      -  aiplatform.TabularDataset
      -  Link BigQuery table
   -  Train Model with AutoML
      -  aiplatform.AutoMLTabularTrainingJob
   -  Evaluate
      -  Review the model in GCP Console > Vertex AI > Models
   -  Deploy to Endpoint
      -  Endpoint = aiplatform.Endpoint
      -  Endpoint.deploy
   -  Online Predictions
      -  Endpoint.predict
   -  Explanations
      -  Endpoint.explain
   -  Batch Prediction Job
      -  aiplatform.BatchPredictionJob

**Resources:**
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
-  [AutoML Tabular Training Job With Python Client](https://cloud.google.com/vertex-ai/docs/training/automl-api#aiplatform_create_training_pipeline_tabular_classification_sample-python)
-  [Interpreting Explanations](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

**Related Training:**
-  todo


---
## Conceptual Architecture

<img src="architectures/statmike-mlops-A2.png">

---
## Setup

inputs:

In [1]:
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
DATANAME = 'digits'
NOTEBOOK = '02b'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'target'
VAR_OMIT = 'target_OE' # add more variables to the string with space delimiters

packages:

In [54]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)
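# Note: this rebinds the name `bigquery` from the imported module to a client instance;
# later cells call bigquery.query(...) on this client.
bigquery = bigquery.Client()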

parameters:

In [4]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DIR = f"temp/{NOTEBOOK}"
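
Every resource created later in this notebook is named `{NOTEBOOK}_{DATANAME}_{TIMESTAMP}`. A minimal sketch of that naming pattern follows; the helper `make_display_name` and the fixed datetime are hypothetical and only keep the example reproducible.

```python
from datetime import datetime

NOTEBOOK = "02b"
DATANAME = "digits"


def make_display_name(notebook: str, dataname: str, now: datetime) -> str:
    """Build a <notebook>_<dataname>_<timestamp> display name (hypothetical helper)."""
    return f"{notebook}_{dataname}_{now.strftime('%Y%m%d%H%M%S')}"


# A fixed datetime is used here instead of datetime.now() so the result is stable.
name = make_display_name(NOTEBOOK, DATANAME, datetime(2021, 9, 19, 14, 30, 5))
print(name)
```

Because the timestamp has one-second resolution, re-running the notebook yields a new name for each run, which keeps resources from different runs distinguishable in the console.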

environment:

In [57]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Create Dataset (link to BigQuery table)

In [5]:
dataset = aiplatform.TabularDataset.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}', 
    bq_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/691911073727/locations/us-central1/datasets/459112075294146560/operations/1185597615794814976
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/691911073727/locations/us-central1/datasets/459112075294146560
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/691911073727/locations/us-central1/datasets/459112075294146560')
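
The `bq_source` argument above is a `bq://project.dataset.table` URI. The sketch below shows one way to build and sanity-check such a URI; `bq_uri` and `parse_bq_uri` are hypothetical helpers, and the parser assumes none of the three parts contains a dot.

```python
def bq_uri(project: str, dataset: str, table: str) -> str:
    """Format a BigQuery table reference as a bq:// URI (hypothetical helper)."""
    return f"bq://{project}.{dataset}.{table}"


def parse_bq_uri(uri: str) -> tuple:
    """Split a bq:// URI into (project, dataset, table); assumes no dots inside a part."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a bq:// URI: {uri}")
    parts = uri[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected bq://project.dataset.table, got: {uri}")
    return tuple(parts)


uri = bq_uri("statmike-mlops", "digits", "digits_prepped")
print(uri)
print(parse_bq_uri(uri))
```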


---
## Train Model with AutoML

In [30]:
column_specs = list(set(dataset.column_names) - set(VAR_OMIT.split()) - set([VAR_TARGET, 'splits']))

In [31]:
column_specs = dict.fromkeys(column_specs, 'auto')
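
The two cells above remove the target, the split column, and any omitted variables from the dataset's columns, then map each remaining column to `'auto'`. A minimal sketch of the same logic with plain Python follows; the `column_names` list is a hypothetical subset standing in for `dataset.column_names`.

```python
VAR_TARGET = "target"
VAR_OMIT = "target_OE"  # space-delimited string, as in the Setup cell

# Hypothetical subset of the real column list, for illustration only.
column_names = ["p0", "p1", "p2", "target", "target_OE", "splits"]

# Columns that must not be used as features.
excluded = set(VAR_OMIT.split()) | {VAR_TARGET, "splits"}

# A list comprehension keeps the original column order (a set difference does not).
feature_columns = [name for name in column_names if name not in excluded]

# 'auto' leaves the choice of transformation for each column to AutoML.
column_specs = dict.fromkeys(feature_columns, "auto")
print(column_specs)
```

The set-based version in the notebook produces the same keys; the only difference is that this variant preserves column order, which can make the resulting dictionary easier to inspect.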

In [32]:
tabular_classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    optimization_prediction_type = 'classification',
    column_specs = column_specs,
    labels = {'notebook':f'{NOTEBOOK}'}
)

In [33]:
# temporary fix for issue on 9/19/21 that can be removed within 1 week
tabular_classification_job._add_additional_experiments(['training_pipeline_version=legacy'])

In [34]:
model = tabular_classification_job.run(
    dataset = dataset,
    target_column = VAR_TARGET,
    predefined_split_column_name = 'splits',
    budget_milli_node_hours = 1000,
    model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    disable_early_stopping = False,
    model_labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/275548609036943360?project=691911073727
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/275548609036943360 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aipl
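
The training budget above is given in milli node hours, so `budget_milli_node_hours = 1000` corresponds to one node hour. The conversion helpers below are hypothetical and only illustrate the unit.

```python
def node_hours_to_milli(node_hours: float) -> int:
    """Convert node hours to milli node hours (1 node hour = 1000 milli node hours)."""
    return int(round(node_hours * 1000))


def milli_to_node_hours(milli_node_hours: int) -> float:
    """Convert milli node hours back to node hours."""
    return milli_node_hours / 1000


print(node_hours_to_milli(1))      # the budget used in this notebook
print(milli_to_node_hours(1000))
```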

---
## Endpoint and Deployment

In [35]:
endpoint = aiplatform.Endpoint.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/691911073727/locations/us-central1/endpoints/4644600998516490240/operations/8028535774607572992
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/691911073727/locations/us-central1/endpoints/4644600998516490240
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/691911073727/locations/us-central1/endpoints/4644600998516490240')


In [36]:
endpoint.deploy(
    model = model,
    deployed_model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

INFO:google.cloud.aiplatform.models:Deploying Model projects/691911073727/locations/us-central1/models/2072139613706649600 to Endpoint : projects/691911073727/locations/us-central1/endpoints/4644600998516490240
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/691911073727/locations/us-central1/endpoints/4644600998516490240/operations/491761798203047936
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/691911073727/locations/us-central1/endpoints/4644600998516490240
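
The deployment above sends 100% of traffic to the single deployed model. If an endpoint later hosts more than one deployed model, the percentages across them need to total 100. The check below is a minimal sketch of that rule; `validate_traffic_split` and the model IDs are hypothetical and not part of the client library.

```python
def validate_traffic_split(traffic_split: dict) -> dict:
    """Return the split unchanged if it is a valid percentage allocation."""
    for model_id, percent in traffic_split.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"{model_id}: percentage must be an integer in [0, 100]")
    total = sum(traffic_split.values())
    if total != 100:
        raise ValueError(f"percentages sum to {total}, expected 100")
    return traffic_split


# Hypothetical deployed-model IDs, for illustration only.
single = validate_traffic_split({"model-a": 100})
canary = validate_traffic_split({"model-a": 90, "model-b": 10})
print(single, canary)
```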


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [37]:
pred = bigquery.query(query = f"SELECT * FROM {DATANAME}.{DATANAME} LIMIT 10").to_dataframe()

In [38]:
pred.head(4)

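|   | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | ... | p56 | p57 | p58 | p59 | p60 | p61 | p62 | p63 | target | target_OE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 5.0 | 16.0 | 15.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | ... | 0.0 | 6.0 | 16.0 | 16.0 | 16.0 | 16.0 | 7.0 | 0.0 | 2 | Even |
| 1 | 0.0 | 5.0 | 16.0 | 12.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | ... | 0.0 | 8.0 | 16.0 | 16.0 | 16.0 | 16.0 | 4.0 | 0.0 | 2 | Even |
| 2 | 0.0 | 5.0 | 15.0 | 16.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | ... | 0.0 | 6.0 | 16.0 | 16.0 | 16.0 | 13.0 | 3.0 | 0.0 | 2 | Even |
| 3 | 0.0 | 4.0 | 15.0 | 15.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 0.0 | 7.0 | 14.0 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | Even |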


In [39]:
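# Keep only feature columns (drop the target and the omitted variables),
# then take the first row as a dict of {column: value}.
feature_columns = pred.columns[~pred.columns.isin(VAR_OMIT.split() + [VAR_TARGET])]
newob = pred[feature_columns].to_dict(orient='records')[0]
#newob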

In [48]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())

### Get Predictions: Python Client

In [49]:
prediction = endpoint.predict(instances=instances, parameters=parameters)

In [50]:
prediction

Prediction(predictions=[{'classes': ['4', '9', '0', '1', '2', '6', '8', '7', '5', '3'], 'scores': [9.097868654592929e-16, 1.578200257653685e-15, 9.30778830051537e-17, 1.305656027739133e-09, 1.0, 6.139234628478596e-13, 1.546341898972514e-08, 4.494506156138556e-11, 2.357233908122178e-14, 4.540160608579313e-11]}], deployed_model_id='8708351994311475200', explanations=None)

In [51]:
import numpy as np
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'2'
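
The response pairs each entry in `classes` with the entry at the same position in `scores`, so the predicted label is the class at the index of the highest score. The sketch below ranks the classes without NumPy, using the values from the response above; `top_k` is a hypothetical helper.

```python
# Values copied from the prediction response shown above.
classes = ['4', '9', '0', '1', '2', '6', '8', '7', '5', '3']
scores = [9.097868654592929e-16, 1.578200257653685e-15, 9.30778830051537e-17,
          1.305656027739133e-09, 1.0, 6.139234628478596e-13,
          1.546341898972514e-08, 4.494506156138556e-11,
          2.357233908122178e-14, 4.540160608579313e-11]


def top_k(classes, scores, k=3):
    """Return the k (class, score) pairs with the highest scores, best first."""
    ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


best_class, best_score = top_k(classes, scores, k=1)[0]
print(best_class, best_score)
print([label for label, _ in top_k(classes, scores, k=3)])
```

Looking at the runners-up as well as the top class can help spot ambiguous inputs, for example when two digits receive comparable scores.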

### Get Predictions: REST

In [58]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [59]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    {
      "scores": [
        9.0978686545929287e-16,
        1.5782002576536849e-15,
        9.30778830051537e-17,
        1.3056560277391329e-09,
        1,
        6.1392346284785959e-13,
        1.5463418989725138e-08,
        4.4945061561385558e-11,
        2.3572339081221781e-14,
        4.5401606085793127e-11
      ],
      "classes": [
        "4",
        "9",
        "0",
        "1",
        "2",
        "6",
        "8",
        "7",
        "5",
        "3"
      ]
    }
  ],
  "deployedModelId": "8708351994311475200"
}
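
The `curl` call above can also be assembled in Python. The sketch below only builds the request object with the standard library and does not send it; `build_predict_request` is a hypothetical helper, the instance is a shortened stand-in for `newob`, and `<ACCESS_TOKEN>` is a placeholder for a real token.

```python
import json
import urllib.request

REGION = "us-central1"
# Endpoint resource name as reported in the endpoint creation log above.
ENDPOINT_RESOURCE = "projects/691911073727/locations/us-central1/endpoints/4644600998516490240"


def build_predict_request(region, endpoint_resource, instances, token):
    """Assemble (but do not send) a POST request for the :predict method."""
    url = f"https://{region}-aiplatform.googleapis.com/v1/{endpoint_resource}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Shortened, hypothetical instance; a real request needs all feature columns.
request = build_predict_request(REGION, ENDPOINT_RESOURCE, [{"p0": 0.0, "p1": 5.0}], "<ACCESS_TOKEN>")
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) would require a valid access token and network access, so it is left out here.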


### Get Predictions: gcloud (CLI)

In [60]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'classes': ['4', '9', '0', '1', '2', '6', '8', '7', '5', '3'], 'scores': [9.097868654592929e-16, 1.578200257653685e-15, 9.30778830051537e-17, 1.305656027739133e-09, 1, 6.139234628478596e-13, 1.546341898972514e-08, 4.494506156138556e-11, 2.357233908122178e-14, 4.540160608579313e-11]}]
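
The `gcloud` command above takes the endpoint ID, which is the last path segment of the endpoint's resource name. A minimal sketch of that extraction follows; `resource_id` is a hypothetical helper.

```python
def resource_id(resource_name: str) -> str:
    """Return the trailing ID of a resource name such as projects/.../endpoints/<id>."""
    return resource_name.rsplit("/", 1)[-1]


full_name = "projects/691911073727/locations/us-central1/endpoints/4644600998516490240"
print(resource_id(full_name))
print(resource_id("4644600998516490240"))  # a bare ID is returned unchanged
```

Because `rsplit` returns the whole string when no `/` is present, the same expression works whether it receives a full resource name or a bare ID.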


### Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [59]:
batch = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    model_name = model.name,
    instances_format = "bigquery",
    predictions_format = "bigquery",
    bigquery_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}',
    bigquery_destination_prefix = f"{PROJECT_ID}",
    generate_explanation = True,
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/7908657396220690432?project=691911073727
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432 current state:
JobState.JOB_STAT
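
The log above shows the client repeatedly reporting the job state until the job finishes. The loop below is a generic sketch of that polling pattern, not the client's implementation: `wait_for_job` and `get_state` are hypothetical, the states are simulated, and the set of terminal states is an assumption that may be incomplete.

```python
import time

# Assumed terminal states; the real JobState enum may define additional ones.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_job(get_state, poll_seconds=0.0, max_polls=100):
    """Call get_state() until it returns a terminal state, then return that state."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Simulated sequence of states standing in for a real job.
simulated = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_job(lambda: next(simulated))
print(final_state)
```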

---
## Explanations
Interpretation guide:
-  [Interpreting Explanations](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

In [61]:
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [62]:
explanation.predictions

[{'scores': [9.097868654592929e-16,
   1.578200257653685e-15,
   9.30778830051537e-17,
   1.305656027739133e-09,
   1.0,
   6.139234628478596e-13,
   1.546341898972514e-08,
   4.494506156138556e-11,
   2.357233908122178e-14,
   4.540160608579313e-11],
  'classes': ['4', '9', '0', '1', '2', '6', '8', '7', '5', '3']}]

In [63]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)

attribution:
baseline output 0.001280912896618247
instance output 1.0
output_index [4]
output display value 2
approximation error 0.026628862793207198
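
One way to read these numbers: `output_index` points at position 4 of `classes`, which is the class `'2'` shown as the display name, and the feature attributions are expected to account for the gap between the baseline output and the instance output. The sketch below works through that arithmetic with the values printed above. It assumes a Shapley-style attribution method, for which the attributions sum approximately to that gap, with the approximation error indicating how loose the match may be.

```python
# Values copied from the attribution output above.
baseline_output = 0.001280912896618247
instance_output = 1.0
output_index = 4
classes = ['4', '9', '0', '1', '2', '6', '8', '7', '5', '3']

# The class being explained is the one at output_index.
explained_class = classes[output_index]

# Under the assumed completeness property, the feature attributions should
# add up to roughly this amount.
expected_total_attribution = instance_output - baseline_output

print(explained_class)
print(expected_total_attribution)
```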


In [64]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.001280912896618247
instance_output_value: 1.0
feature_attributions {
  struct_value {
    fields {
      key: "p0"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p1"
      value {
        number_value: 0.02005322404771245
      }
    }
    fields {
      key: "p10"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p11"
      value {
        number_value: 0.0006086619437805244
      }
    }
    fields {
      key: "p12"
      value {
        number_value: 0.001618836640513369
      }
    }
    fields {
      key: "p13"
      value {
        number_value: 0.0720875047429997
      }
    }
    fields {
      key: "p14"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p15"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p16"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p17"
      value {
        number_value: 0.0
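
The attribution output is long, so it can help to rank features by the size of their attribution. The sketch below does this for the ten values visible in the truncated output above, copied by hand into a plain dictionary; the remaining features are not shown in the output and are therefore not included.

```python
# Attributions visible in the (truncated) output above.
visible_attributions = {
    "p0": 0.0,
    "p1": 0.02005322404771245,
    "p10": 0.0,
    "p11": 0.0006086619437805244,
    "p12": 0.001618836640513369,
    "p13": 0.0720875047429997,
    "p14": 0.0,
    "p15": 0.0,
    "p16": 0.0,
    "p17": 0.0,
}

# Sort by absolute attribution, largest first, and drop features with no contribution.
ranked = sorted(visible_attributions.items(), key=lambda item: abs(item[1]), reverse=True)
nonzero = [(name, value) for name, value in ranked if value != 0.0]

for name, value in nonzero:
    print(f"{name}: {value:.6f}")
```

Sorting by absolute value keeps features with large negative attributions near the top as well, although all visible values here are non-negative.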

---
## Remove Resources
See notebook "XX - Cleanup".