<a href="https://colab.research.google.com/github/FrankGangWang/TFX_Learn/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI SDK for Python: AutoML Tabular training and prediction

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/automl/automl-tabular-classification.ipynb">
        <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview

This tutorial demonstrates how to use the Vertex AI Python client library to train and deploy a tabular classification model for online prediction.

**Note**: you may incur charges for training, prediction, storage, or usage of other Google Cloud products in connection with testing this SDK.

Learn more about [Classification for tabular data](https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/overview).

### Objective

In this tutorial, you learn how to train and make predictions on an AutoML model based on a tabular dataset. Alternatively, you can train and make predictions on models by using the `gcloud` command-line tool or the Google Cloud console.

This tutorial uses the following Google Cloud ML services and resources:

- Vertex AI
- AutoML Tabular

The steps performed include the following:

- Create a Vertex AI model training job.
- Train an AutoML Tabular model.
- Deploy the `Model` resource to a serving `Endpoint` resource.
- Make a prediction by sending data.
- Undeploy the `Model` resource.

### Dataset

This tutorial uses the PetFinder dataset, which is copied from a public Cloud Storage bucket into your own bucket later in this notebook. To learn more about this dataset, visit https://www.kaggle.com/c/petfinder-adoption-prediction.

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the packages required for executing this notebook.

In [None]:
!pip install google-cloud-aiplatform

In [None]:
# -y skips the confirmation prompt, which would otherwise block the cell
!pip uninstall -y shapely
# Restart the runtime after uninstalling

In [2]:
# Install the packages
! pip3 install --quiet --upgrade google-cloud-aiplatform \
                                 google-cloud-storage


### Colab only: Uncomment the following cell to restart the kernel

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

## Before you begin

### Set your project ID

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [4]:
PROJECT_ID = "tfx-template-396920"  # @param {type:"string"}

# Set the project id
! gcloud config set project {PROJECT_ID}

Updated property [core/project].
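A project ID is not the same thing as a project number (the all-digit value that appears in resource names later in this notebook). The sketch below is a rough format check only, assuming the commonly documented rule of 6 to 30 lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. The helper name `looks_like_project_id` is hypothetical and does not confirm that the project exists.

```python
import re

# Assumed format rule: 6-30 characters, lowercase letters, digits, hyphens;
# must start with a letter and must not end with a hyphen.
_PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value: str) -> bool:
    """Return True if `value` has the shape of a project ID (format check only)."""
    return _PROJECT_ID_RE.fullmatch(value) is not None


print(looks_like_project_id("tfx-template-396920"))
print(looks_like_project_id("40548613966"))
```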


#### Region

You can also change the `REGION` variable used by Vertex AI. Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [3]:
REGION = "us-central1"  # @param {type: "string"}

### Authenticate your Google Cloud account

Depending on your Jupyter environment, you may have to manually authenticate. Follow the relevant instructions below.

**1. Vertex AI Workbench**
* Do nothing as you are already authenticated.

**2. Local JupyterLab instance, uncomment and run:**

In [None]:
# ! gcloud auth login

**3. Colab, run:**

In [5]:
from google.colab import auth
auth.authenticate_user()

**4. Service account or other**
* See how to grant Cloud Storage permissions to your service account at https://cloud.google.com/storage/docs/gsutil/commands/iam#ch-examples.

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [6]:
BUCKET_URI = f"gs://bucket-automl-tabular-classify-{PROJECT_ID}-unique"  # @param {type:"string"}

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [7]:
! gsutil mb -l $REGION $BUCKET_URI

Creating gs://bucket-automl-tabular-classify-tfx-template-396920-unique/...
ServiceException: 409 A Cloud Storage bucket named 'bucket-automl-tabular-classify-tfx-template-396920-unique' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
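The 409 response above appears when the bucket name is already taken, which includes the case where you created it in an earlier run. If the name belongs to someone else, one option is to add a random suffix. The following is a minimal sketch, not part of the original notebook: `make_bucket_uri` is a hypothetical helper, and the regular expression encodes only the basic naming rules (3 to 63 characters, lowercase letters, digits, dots, underscores, hyphens, starting and ending with a letter or digit).

```python
import re
import uuid

# Basic bucket-name shape; this does not cover every naming restriction.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def make_bucket_uri(project_id: str, prefix: str = "bucket-automl-tabular-classify") -> str:
    """Build a gs:// URI with a short random suffix to reduce name collisions."""
    suffix = uuid.uuid4().hex[:8]
    name = f"{prefix}-{project_id}-{suffix}".lower()
    if not _BUCKET_NAME_RE.fullmatch(name):
        raise ValueError(f"Not a valid bucket name: {name!r}")
    return f"gs://{name}"


print(make_bucket_uri("tfx-template-396920"))
```

You could assign the result to `BUCKET_URI` before running `gsutil mb`. Because the suffix is random, store the value if you need the same bucket in a later session.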


### Copy dataset into your Cloud Storage bucket

In [8]:
IMPORT_FILE = "petfinder-tabular-classification.csv"
! gsutil cp gs://cloud-samples-data/ai-platform-unified/datasets/tabular/{IMPORT_FILE} {BUCKET_URI}/data/

gcs_source = f"{BUCKET_URI}/data/{IMPORT_FILE}"

Copying gs://cloud-samples-data/ai-platform-unified/datasets/tabular/petfinder-tabular-classification.csv [Content-Type=text/csv]...
/ [1 files][872.8 KiB/872.8 KiB]                                                
Operation completed over 1 objects/872.8 KiB.                                    


In [9]:
gcs_source

'gs://bucket-automl-tabular-classify-tfx-template-396920-unique/data/petfinder-tabular-classification.csv'
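If you later need the bucket name and object path separately (for example, to pass them to a storage client), the URI can be split with plain string handling. This is a small sketch; `split_gcs_uri` is a hypothetical helper name, not an SDK function.

```python
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_path)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Expected a gs:// URI, got {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"No bucket name in {uri!r}")
    return bucket, object_path


bucket, object_path = split_gcs_uri(
    "gs://bucket-automl-tabular-classify-tfx-template-396920-unique"
    "/data/petfinder-tabular-classification.csv"
)
print(bucket)
print(object_path)
```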

### Import Vertex AI SDK for Python

Import the Vertex AI SDK into your Python environment and initialize it.

In [10]:
import os

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

## Tutorial

Now you are ready to create your AutoML Tabular model.

### Create a Managed Tabular Dataset from a CSV

This section creates a managed dataset from the CSV file stored in your Cloud Storage bucket.

In [11]:
# Create a new tabular dataset.
ds = aiplatform.TabularDataset.create(
    display_name="petfinder-tabular-dataset",
    gcs_source=gcs_source,
)

ds.resource_name

INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/40548613966/locations/us-central1/datasets/6955282958404026368/operations/9084757361855299584
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/40548613966/locations/us-central1/datasets/6955282958404026368
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/40548613966/locations/us-central1/datasets/6955282958404026368')


'projects/40548613966/locations/us-central1/datasets/6955282958404026368'

### Launch a training job to create a Model

AutoML does not require a training script. After you define the training job, call its `run` method to create a model. The `run` method creates a training pipeline that trains and creates a `Model` object. After the training pipeline completes, `run` returns the `Model` object.

In [13]:
# Construct an AutoML Tabular training job.
# Only the uncommented columns below get an explicit transformation.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="train-petfinder-automl-1",
    optimization_prediction_type="classification",
    column_transformations=[
        {"categorical": {"column_name": "Type"}},
        {"numeric": {"column_name": "Age"}},
        {"categorical": {"column_name": "Breed1"}},
        #{"categorical": {"column_name": "Color1"}},
        #{"categorical": {"column_name": "Color2"}},
        #{"categorical": {"column_name": "MaturitySize"}},
        #{"categorical": {"column_name": "FurLength"}},
        #{"categorical": {"column_name": "Vaccinated"}},
        #{"categorical": {"column_name": "Sterilized"}},
        #{"categorical": {"column_name": "Health"}},
        #{"numeric": {"column_name": "Fee"}},
        #{"numeric": {"column_name": "PhotoAmt"}},
    ],
)
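The `column_transformations` list above repeats the same dictionary shape for every column. As a minimal sketch, the list could be generated from a column-to-type mapping. `build_column_transformations` is a hypothetical helper, and it accepts only the two transformation types that appear in this tutorial.

```python
from typing import Dict, List

# Only the transformation types used in this tutorial are accepted here.
_ALLOWED_TYPES = {"categorical", "numeric"}


def build_column_transformations(column_types: Dict[str, str]) -> List[Dict[str, Dict[str, str]]]:
    """Turn {"Age": "numeric", ...} into the list-of-dicts shape used above."""
    transformations = []
    for column_name, kind in column_types.items():
        if kind not in _ALLOWED_TYPES:
            raise ValueError(f"Unsupported transformation {kind!r} for column {column_name!r}")
        transformations.append({kind: {"column_name": column_name}})
    return transformations


print(build_column_transformations({"Type": "categorical", "Age": "numeric", "Breed1": "categorical"}))
```

The result could be passed as `column_transformations=` when constructing the training job.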

In [14]:
# Run the training job; `run` returns the trained model.
# This will take around an hour to run
model = job.run(
    dataset=ds,
    target_column="Adopted",
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
    model_display_name="adopted-prediction-model",
    disable_early_stopping=False,
)

INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/3606707720347975680?project=40548613966
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/40548613966/locations/us-central1/trainingPipelines/3606707720347975680 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/40548613966/locations/us-central1/trainingPipelines/3606707720347975680 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/40548613966/locations/us-central1/trainingPipelines/3606707720347975680 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/40548613966/locations/us-central1/trainingPipelines/3606707720347975680 current state:
PipelineState.PIPELINE_STATE_PENDING
INFO:google.cloud.aipl
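The `training_fraction_split`, `validation_fraction_split`, and `test_fraction_split` arguments passed to `job.run` control how rows are divided. The sketch below estimates the row counts per split, assuming the three fractions are meant to sum to 1 as they do in this tutorial. `approximate_split_counts` is a hypothetical helper, and the actual assignment of rows is done by Vertex AI, so treat the numbers as approximations.

```python
import math
from typing import Tuple


def approximate_split_counts(
    n_rows: int, training: float, validation: float, test: float
) -> Tuple[int, int, int]:
    """Estimate (train, validation, test) row counts for the given fractions."""
    fractions = (training, validation, test)
    if any(f < 0 for f in fractions):
        raise ValueError("Split fractions must be non-negative")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"Split fractions sum to {sum(fractions)}, expected 1.0")
    n_train = round(n_rows * training)
    n_validation = round(n_rows * validation)
    # Give the remainder to the test split so the counts add up to n_rows.
    n_test = n_rows - n_train - n_validation
    return n_train, n_validation, n_test


print(approximate_split_counts(1000, 0.8, 0.1, 0.1))
```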

### Deploy your model

Before you use your model to make predictions, you need to deploy it to an `Endpoint`. You can do this by calling the `deploy` function on the `Model` resource. This function does two things:

1. Creates an `Endpoint` resource to which the `Model` resource will be deployed.
2. Deploys the `Model` resource to the `Endpoint` resource.

### NOTE: Wait until the model **FINISHES** deployment before proceeding to prediction.

Deploying a model associates physical resources with the model so it can serve online predictions with low latency.

You can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. For more information about options and use cases, see the Vertex AI documentation on deploying more than one model to the same endpoint.



In [15]:
endpoint = model.deploy(
    machine_type="n1-standard-4",
)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/40548613966/locations/us-central1/endpoints/2106420187237449728/operations/6422215188405026816
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/40548613966/locations/us-central1/endpoints/2106420187237449728
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/40548613966/locations/us-central1/endpoints/2106420187237449728')
INFO:google.cloud.aiplatform.models:Deploying model to Endpoint : projects/40548613966/locations/us-central1/endpoints/2106420187237449728
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/40548613966/locations/us-central1/endpoints/2106420187237449728/operations/4411357954784100352
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/40548613966/loca

In [17]:
model

<google.cloud.aiplatform.models.Model object at 0x7f46f6096bc0> 
resource name: projects/40548613966/locations/us-central1/models/2089184242960433152

In [25]:
endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f46f82bda20> 
resource name: projects/40548613966/locations/us-central1/endpoints/2106420187237449728
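Resource names such as the ones printed above follow an alternating `collection/id` pattern, so the project number, location, and endpoint ID can be pulled out with string handling. This is a small sketch; `parse_resource_name` is a hypothetical helper, not an SDK function.

```python
from typing import Dict


def parse_resource_name(resource_name: str) -> Dict[str, str]:
    """Split 'projects/P/locations/L/endpoints/E' into {'projects': 'P', ...}."""
    parts = resource_name.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Expected collection/id pairs, got {resource_name!r}")
    # Even positions are collection names, odd positions are their IDs.
    return dict(zip(parts[0::2], parts[1::2]))


parsed = parse_resource_name(
    "projects/40548613966/locations/us-central1/endpoints/2106420187237449728"
)
print(parsed["projects"], parsed["locations"], parsed["endpoints"])
```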

### Predict on the endpoint


* This sample instance is taken from an observation in which `Adopted` = **Yes**
* Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your `AutoMLTabularTrainingJob` inform Vertex AI to transform the inputs to their defined types.
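If your own records hold numbers rather than strings, you could convert them before sending a request. The following is a minimal sketch with a hypothetical helper name, `to_string_instance`. It uses Python's default `str()` formatting, which may not match the formatting in your training CSV (for example, `100.0` instead of `100`).

```python
from typing import Any, Dict


def to_string_instance(row: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `row` with every value converted to a string."""
    return {key: str(value) for key, value in row.items()}


print(to_string_instance({"Type": "Cat", "Age": 3, "Breed1": "Tabby"}))
```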


In [16]:
prediction = endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
            "Gender": "Male",
            "Color1": "Black",
            "Color2": "White",
            "MaturitySize": "Small",
            "FurLength": "Short",
            "Vaccinated": "No",
            "Sterilized": "No",
            "Health": "Healthy",
            "Fee": "100",
            "PhotoAmt": "2",
        }
    ]
)

print(prediction)

Prediction(predictions=[{'scores': [0.8063801527023315, 0.1936198770999908], 'classes': ['Yes', 'No']}], deployed_model_id='611233907043467264', model_version_id='1', model_resource_name='projects/40548613966/locations/us-central1/models/2089184242960433152', explanations=None)
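Each prediction in the output above pairs a `classes` list with a `scores` list at matching positions. A short sketch for picking the highest-scoring class follows. `top_class` is a hypothetical helper that works on a plain dictionary shaped like the printed output.

```python
from typing import Any, Dict, Tuple


def top_class(prediction: Dict[str, Any]) -> Tuple[str, float]:
    """Return the (class, score) pair with the highest score."""
    classes = list(prediction["classes"])
    scores = list(prediction["scores"])
    if not scores or len(classes) != len(scores):
        raise ValueError("'classes' and 'scores' must be non-empty and the same length")
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best], scores[best]


example = {"scores": [0.8063801527023315, 0.1936198770999908], "classes": ["Yes", "No"]}
print(top_class(example))
```

With the `prediction` object from the cell above, this would presumably be called as `top_class(prediction.predictions[0])`.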


In [20]:
prediction = endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
        }
    ]
)

print(prediction)

Prediction(predictions=[{'classes': ['Yes', 'No'], 'scores': [0.8063801527023315, 0.1936198770999908]}], deployed_model_id='611233907043467264', model_version_id='1', model_resource_name='projects/40548613966/locations/us-central1/models/2089184242960433152', explanations=None)


In [26]:
#https://github.com/googleapis/python-aiplatform/blob/main/samples/snippets/prediction_service/predict_tabular_classification_sample.py
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START aiplatform_predict_tabular_classification_sample]
from typing import Dict

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value


def predict_tabular_classification_sample(
    project: str,
    endpoint_id: str,
    instance_dict: Dict,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)
    # for more info on the instance schema, please use get_model_sample.py
    # and look at the yaml found in instance_schema_uri
    instance = json_format.ParseDict(instance_dict, Value())
    instances = [instance]
    parameters_dict = {}
    parameters = json_format.ParseDict(parameters_dict, Value())
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters
    )
    print("response")
    print(" deployed_model_id:", response.deployed_model_id)
    # See gs://google-cloud-aiplatform/schema/predict/prediction/tabular_classification_1.0.0.yaml for the format of the predictions.
    predictions = response.predictions
    for prediction in predictions:
        print(" prediction:", dict(prediction))


# [END aiplatform_predict_tabular_classification_sample]
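In the sample above, `location` and `api_endpoint` are separate parameters that both default to `us-central1`, so changing only one of them would leave the two out of sync. A minimal sketch for deriving the endpoint host from the location is shown below, assuming the `{location}-aiplatform.googleapis.com` pattern that the default value follows. `regional_api_endpoint` is a hypothetical helper.

```python
def regional_api_endpoint(location: str) -> str:
    """Build the regional API host name for a given location."""
    if not location:
        raise ValueError("location must be a non-empty string")
    return f"{location}-aiplatform.googleapis.com"


print(regional_api_endpoint("us-central1"))
```

You could then pass `api_endpoint=regional_api_endpoint(location)` when calling `predict_tabular_classification_sample`.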

In [32]:
predict_tabular_classification_sample(
    project="40548613966",
    endpoint_id="2106420187237449728",
    location="us-central1",
    instance_dict={
        "Type": "Cat",
        "Age": "1",
        "Breed1": "Tabby",
    },
)

response
 deployed_model_id: 611233907043467264
 prediction: {'scores': [0.9098551869392395, 0.09014484286308289], 'classes': ['Yes', 'No']}


In [33]:
endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "1",
            "Breed1": "Tabby",
        }
    ]
)

Prediction(predictions=[{'scores': [0.9098551869392395, 0.09014484286308289], 'classes': ['Yes', 'No']}], deployed_model_id='611233907043467264', model_version_id='1', model_resource_name='projects/40548613966/locations/us-central1/models/2089184242960433152', explanations=None)

# Method 1: Predict on the endpoint
endpoint.predict(
    [
        {
            "Type": "Cat",
            "Age": "3",
            "Breed1": "Tabby",
        }
    ]
)
Prediction(predictions=[{'scores': [0.8063801527023315, 0.1936198770999908], 'classes': ['Yes', 'No']}], deployed_model_id='611233907043467264', model_version_id='1', model_resource_name='projects/40548613966/locations/us-central1/models/2089184242960433152', explanations=None)

# Method 2: Predict via a Python request with predict_tabular_classification_sample():
response
 deployed_model_id: 611233907043467264
 prediction: {'classes': ['Yes', 'No'], 'scores': [0.8063801527023315, 0.1936198770999908]}




### Undeploy the model

To undeploy your `Model` resource from the serving `Endpoint` resource, use the endpoint's `undeploy` method with the following parameter:

- `deployed_model_id`: The model deployment identifier returned by the prediction service when the `Model` resource is deployed. You can retrieve the `deployed_model_id` using the prediction object's `deployed_model_id` property.

In [None]:
endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Training Job
- Model
- Endpoint
- Cloud Storage Bucket

**Note**: You must undeploy any `Model` resources from the `Endpoint` resource before deleting the `Endpoint` resource.

In [None]:
# Warning: Setting this to true will delete everything in your bucket
delete_bucket = False

# Delete the training job
job.delete()

# Delete the model
model.delete()

# Delete the endpoint
endpoint.delete()

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil -m rm -r $BUCKET_URI