In [1]:
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# AI Platform (Unified) client library: AutoML image classification model for export to edge

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/showcase_automl_image_classification_export_edge.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/ai-platform-unified/showcase_automl_image_classification_export_edge.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview


This tutorial demonstrates how to use the AI Platform (Unified) Python client library to train an AutoML image classification model and export it as an Edge model.

### Dataset

The dataset used for this tutorial is the [Flowers dataset](https://www.tensorflow.org/datasets/catalog/tf_flowers) from [TensorFlow Datasets](https://www.tensorflow.org/datasets/catalog/overview). The version of the dataset you will use in this tutorial is stored in a public Cloud Storage bucket. The trained model predicts the type of flower an image is from a class of five flowers: daisy, dandelion, rose, sunflower, or tulip.

### Objective

In this notebook, you create an AutoML image classification model from a Python script using the AI Platform (Unified) client library, and then export the model as an Edge model in TFLite format. You can alternatively create models with AutoML using the `gcloud` command-line tool or online using the Google Cloud Console.

The steps performed include:

- Create an AI Platform (Unified) `Dataset` resource.
- Train the model.
- Export the `Edge` model from the `Model` resource to Cloud Storage.
- Download the model locally.
- Make a local prediction.

### Costs

This tutorial uses billable components of Google Cloud (GCP):

* AI Platform (Unified)
* Cloud Storage

Learn about [Cloud AI Platform
pricing](https://cloud.google.com/ai-platform-unified/pricing) and [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Installation

Install the latest (preview) version of AI Platform (Unified) client library.

In [2]:
! pip3 install -U google-cloud-aiplatform --user

Error processing line 1 of /home/jupyter/.local/lib/python3.7/site-packages/google-cloud-aiplatform-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.7/site.py", line 168, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 580, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Error processing line 1 of /opt/conda/lib/python3.7/site-packages/google-cloud-aiplatform-nspkg.pth:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.7/site.py", line 168, in addpackage
      exec(line)
    File "<string>", line 1, in <module>
    File "<frozen importlib._bootstrap>", line 580, in module_from_spec
  AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Requirement already up-to-date: google-cloud-aiplatform in ./.local/lib/python3.7/site-packages (0.6.0)
You should consider upgrading vi

Install the latest GA version of *google-cloud-storage* library as well.

In [3]:
! pip3 install -U google-cloud-storage

Requirement already up-to-date: google-cloud-storage in /opt/conda/lib/python3.7/site-packages (1.36.2)
You should consider upgrading vi

### Restart the kernel

Once you've installed the AI Platform (Unified) client library and the *google-cloud-storage* library, you need to restart the notebook kernel so it can find the packages.

In [4]:
import os
if not os.getenv("AUTORUN"):
    # Automatically restart kernel after installs
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### GPU runtime

*Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select* **Runtime > Change Runtime Type > GPU**

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the AI Platform APIs and Compute Engine APIs.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component)

4. [The Google Cloud SDK](https://cloud.google.com/sdk) is already installed in AI Platform Notebooks.

5. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$` into these commands.

In [5]:
PROJECT_ID = "[your-project-id]" #@param {type:"string"}

In [6]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

Project ID: andy-1234-221921


In [7]:
! gcloud config set project $PROJECT_ID

Updated property [core/project].


#### Region

You can also change the `REGION` variable, which is used for operations
throughout the rest of this notebook.  Below are regions supported for AI Platform (Unified). We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You cannot use a multi-regional bucket for training with AI Platform (Unified). Not all regions provide support for all AI Platform (Unified) services. For the latest support per region, see the [AI Platform (Unified) locations documentation](https://cloud.google.com/ai-platform-unified/docs/general/locations).

In [8]:
REGION = 'us-central1' #@param {type: "string"}

#### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on the resources created, you create a timestamp for each session and append it to the names of the resources you create in this tutorial.

In [9]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
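
As a minimal sketch of how the timestamp is used, the helper `unique_name` below appends a timestamp to a base name, following the same pattern as names such as `"flowers-" + TIMESTAMP` later in this tutorial. The helper name is hypothetical and is not part of the client library.

```python
from datetime import datetime


def unique_name(base, timestamp=None):
    """Return `base` with a YYYYMMDDHHMMSS suffix (hypothetical helper)."""
    if timestamp is None:
        # Same format string as the TIMESTAMP variable in this tutorial
        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    return base + "-" + timestamp


# A fixed timestamp makes the example reproducible
print(unique_name("flowers", "20210322213438"))  # flowers-20210322213438
```

Two users who start their sessions at least one second apart get different suffixes, which is usually enough to keep resource names apart in a shared project.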

### Authenticate your Google Cloud account

**If you are using AI Platform notebooks**, your environment is already
authenticated. Skip this step.

*Note: If you are on an AI Platform notebook and run the cell, the cell knows to skip executing the authentication steps.*

In [10]:
import os
import sys

# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your Google Cloud account. This provides access
# to your Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

# If on AI Platform Notebooks, then don't execute this code
if not os.path.exists('/opt/deeplearning/metadata/env_version'):
    if 'google.colab' in sys.modules:
        from google.colab import auth as google_auth
        google_auth.authenticate_user()

    # If you are running this tutorial in a notebook locally, replace the string
    # below with the path to your service account key and run this cell to
    # authenticate your Google Cloud account.
    else:
        %env GOOGLE_APPLICATION_CREDENTIALS your_path_to_credentials.json

    # Log in to your account on Google Cloud
    ! gcloud auth login

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

This tutorial uses training data stored in a public Cloud Storage bucket and your own Cloud Storage bucket for exporting the trained model. You may alternatively use your own training data stored in your own Cloud Storage bucket.

Set the name of your Cloud Storage bucket below. It must be unique across all Cloud Storage buckets.

In [11]:
BUCKET_NAME = "gs://[your-bucket-name]" #@param {type:"string"}

In [12]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "gs://[your-bucket-name]":
    BUCKET_NAME = "gs://" + PROJECT_ID + "aip-" + TIMESTAMP

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [13]:
! gsutil mb -l $REGION $BUCKET_NAME

Creating gs://andy-1234-221921aip-20210322213438/...


Finally, validate access to your Cloud Storage bucket by examining its contents:

In [14]:
! gsutil ls -al $BUCKET_NAME

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

#### Import AI Platform (Unified) client library

Import the AI Platform (Unified) client library into our Python environment.

In [15]:
import os
import sys
import time

from google.cloud.aiplatform import gapic as aip

from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
from google.protobuf.struct_pb2 import Struct
from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import ParseDict

#### AI Platform (Unified) constants

Set up the following constants for AI Platform (Unified):

- `API_ENDPOINT`: The AI Platform (Unified) API service endpoint for dataset, model, job, pipeline and endpoint services.
- `PARENT`: The AI Platform (Unified) location root path for dataset, model and endpoint resources.

In [16]:
# API service endpoint
API_ENDPOINT = "{0}-aiplatform.googleapis.com".format(REGION)

# AI Platform (Unified) location root path for your dataset, model and endpoint resources
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION
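
The resource names returned later in this tutorial (for example, for the dataset and the training pipeline) extend this location root path with a collection name and an ID. The sketch below shows one way to build the root path and to split such a name into its parts. The function names are hypothetical, and the project and dataset IDs are placeholders.

```python
def location_path(project, region):
    """Build the location root path used as `parent` in requests."""
    return "projects/{}/locations/{}".format(project, region)


def parse_resource_name(resource_name):
    """Split 'collection/id/collection/id/...' into a dictionary."""
    parts = resource_name.split("/")
    # Even positions hold collection names, odd positions hold IDs
    return dict(zip(parts[0::2], parts[1::2]))


# Placeholder values, for illustration only
parent = location_path("my-project", "us-central1")
dataset_name = parent + "/datasets/1234567890"

print(parent)
print(parse_resource_name(dataset_name))
```

The short numeric ID that some request fields expect is simply the last path segment of the fully qualified name.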

#### AutoML constants

Set constants unique to AutoML datasets and training:

- Dataset Schemas: Tells the `Dataset` resource service which type of dataset it is.
- Data Labeling (Annotations) Schemas: Tells the `Dataset` resource service how the data is labeled (annotated).
- Dataset Training Schemas: Tells the `Pipeline` resource service the task (e.g., classification) to train the model for.

In [17]:
# Image Dataset type
DATA_SCHEMA = 'google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml'
# Image Labeling type
LABEL_SCHEMA = "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml"
# Image Training task
TRAINING_SCHEMA = "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml"

# Tutorial

Now you are ready to start creating your own AutoML image classification model.

## Set up clients

The AI Platform (Unified) client library works as a client/server model. On your side (the Python script) you will create a client that sends requests and receives responses from the server (AI Platform).

You will use different clients in this tutorial for different steps in the workflow, so set them all up now.

- Dataset Service for `Dataset` resources.
- Model Service for `Model` resources.
- Pipeline Service for training.

In [18]:
# client options same for all services
client_options = {"api_endpoint": API_ENDPOINT}


def create_dataset_client():
    client = aip.DatasetServiceClient(
        client_options=client_options
    )
    return client


def create_model_client():
    client = aip.ModelServiceClient(
        client_options=client_options
    )
    return client


def create_pipeline_client():
    client = aip.PipelineServiceClient(
        client_options=client_options
    )
    return client



clients = {}
clients['dataset'] = create_dataset_client()
clients['model'] = create_model_client()
clients['pipeline'] = create_pipeline_client()

for client in clients.items():
    print(client)

('dataset', <google.cloud.aiplatform_v1.services.dataset_service.client.DatasetServiceClient object at 0x7f6f941ec210>)
('model', <google.cloud.aiplatform_v1.services.model_service.client.ModelServiceClient object at 0x7f6f941ec310>)
('pipeline', <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f6f941ec110>)


## Dataset

Now that your clients are ready, your first step in training a model is to create a managed dataset instance, and then upload your labeled data to it.

### Create `Dataset` resource instance

Use the helper function `create_dataset` to create the instance of a `Dataset` resource. This function does the following:

1. Uses the dataset client service.
2. Creates an AI Platform (Unified) `Dataset` resource (`aip.Dataset`), with the following parameters:
 - `display_name`: The human-readable name you choose to give it.
 - `metadata_schema_uri`: The schema for the dataset type.
3. Calls the client dataset service method `create_dataset`, with the following parameters:
 - `parent`: The AI Platform (Unified) location root path for your `Dataset`, `Model` and `Endpoint` resources.
 - `dataset`: The AI Platform (Unified) dataset object instance you created.
4. The method returns an `operation` object.

An `operation` object is how AI Platform (Unified) handles asynchronous calls for long running operations. While this step usually goes fast, when you first use it in your project, there is a longer delay due to provisioning.

You can use the `operation` object to get status on the operation (e.g., create `Dataset` resource) or to cancel the operation, by invoking an operation method:

| Method      | Description |
| ----------- | ----------- |
| result()    | Waits for the operation to complete and returns the result object.      |
| running()   | Returns True/False on whether the operation is still running.        |
| done()      | Returns True/False on whether the operation is completed. |
| cancelled() | Returns True/False on whether the operation was cancelled. |
| cancel()    | Cancels the operation (this may take up to 30 seconds). |
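
To illustrate the polling pattern behind these methods without calling the service, here is a minimal sketch. `FakeOperation` is a stand-in written for this example only (it is not a client library class), and `wait_for` is a hypothetical helper that relies only on `done()` and `result()`.

```python
import time


class FakeOperation:
    """Stand-in for a long running operation (illustration only)."""

    def __init__(self, polls_until_done, value):
        self._remaining = polls_until_done
        self._value = value

    def done(self):
        # Each call simulates one status check against the service
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def result(self):
        return self._value


def wait_for(operation, poll_seconds=1.0, max_polls=10):
    """Poll done() until the operation completes, then return result()."""
    for _ in range(max_polls):
        if operation.done():
            return operation.result()
        time.sleep(poll_seconds)
    raise TimeoutError("operation did not complete in time")


print(wait_for(FakeOperation(2, "dataset created"), poll_seconds=0.01))
```

In the tutorial code itself, calling `result()` with a timeout performs the waiting for you, so an explicit loop like this is only needed when you want to do other work between checks.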

In [19]:
TIMEOUT = 90

def create_dataset(name, schema, labels=None, timeout=TIMEOUT):
    start_time = time.time()
    try:
        dataset = aip.Dataset(display_name=name, metadata_schema_uri="gs://" + schema, labels=labels)

        operation = clients['dataset'].create_dataset(parent=PARENT, dataset=dataset)
        print("Long running operation:", operation.operation.name)
        result = operation.result(timeout=timeout)
        print("time:", time.time() - start_time)
        print("response")
        print(" name:", result.name)
        print(" display_name:", result.display_name)
        print(" metadata_schema_uri:", result.metadata_schema_uri)
        print(" metadata:", dict(result.metadata))
        print(" create_time:", result.create_time)
        print(" update_time:", result.update_time)
        print(" etag:", result.etag)
        print(" labels:", dict(result.labels))
        return result
    except Exception as e:
        print("exception:", e)
        return None


result = create_dataset("flowers-" + TIMESTAMP, DATA_SCHEMA)

Long running operation: projects/759209241365/locations/us-central1/datasets/5516962320086990848/operations/4383300960663896064
time: 5.463537216186523
response
 name: projects/759209241365/locations/us-central1/datasets/5516962320086990848
 display_name: flowers-20210322213438
 metadata_schema_uri: gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml
 metadata: {'dataItemSchemaUri': 'gs://google-cloud-aiplatform/schema/dataset/dataitem/image_1.0.0.yaml'}
 create_time: None
 update_time: None
 etag: 
 labels: {'aiplatform.googleapis.com/dataset_metadata_schema': 'IMAGE'}


Now save the unique dataset identifier for the `Dataset` resource instance you created.

In [20]:
# The full unique ID for the dataset
dataset_id = result.name
# The short numeric ID for the dataset
dataset_short_id = dataset_id.split('/')[-1]

print(dataset_id)

projects/759209241365/locations/us-central1/datasets/5516962320086990848


### Data preparation

The AI Platform (Unified) `Dataset` resource for images has some requirements for your data:

- Images must be stored in a Cloud Storage bucket.
- Each image file must be in an image format (PNG, JPEG, BMP, ...).
- There must be an index file stored in your Cloud Storage bucket that contains the path and label for each image.
- The index file must be either CSV or JSONL.

#### CSV

For image classification, the CSV index file has the following requirements:

- No heading.
- First column is the Cloud Storage path to the image.
- Second column is the label.
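
As a small sketch of this layout, the code below parses a few made-up index rows with Python's `csv` module and counts the examples per label. The bucket paths are placeholders, and `count_labels` is a hypothetical helper rather than part of the client library.

```python
import csv
import io
from collections import Counter

# Made-up rows in the same two-column layout as the index file
sample = (
    "gs://my-bucket/flowers/daisy/img1.jpg,daisy\n"
    "gs://my-bucket/flowers/rose/img2.jpg,rose\n"
    "gs://my-bucket/flowers/daisy/img3.jpg,daisy\n"
)


def count_labels(csv_text):
    """Count examples per label in a headerless path,label CSV."""
    counts = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        path, label = row[0], row[1]
        if not path.startswith("gs://"):
            raise ValueError("not a Cloud Storage path: " + path)
        counts[label] += 1
    return counts


print(count_labels(sample))
```

A check like this can catch a stray header row or a local file path before you start a lengthy import.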

#### Location of Cloud Storage training data

Now set the variable `IMPORT_FILE` to the location of the CSV index file in Cloud Storage.

In [21]:
IMPORT_FILE = 'gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv'

#### Quick peek at your data

You will use a version of the Flowers dataset that is stored in a public Cloud Storage bucket, using a CSV index file.

Let's start by doing a quick peek at the data. You count the number of examples by counting the number of rows in the CSV index file  (`wc -l`) and then peek at the first few rows.

In [22]:
if 'IMPORT_FILES' in globals():
    FILE = IMPORT_FILES[0]
else:
    FILE = IMPORT_FILE

count = ! gsutil cat $FILE | wc -l
print("Number of Examples", int(count[0]))

print("First 10 rows")
! gsutil cat $FILE | head

Number of Examples 3669
First 10 rows
gs://cloud-ml-data/img/flower_photos/daisy/100080576_f52e8ee070_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10140303196_b88d3d6cec.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10172379554_b296050f82_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10172567486_2748826a8b.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10172636503_21bededa75_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/102841525_bd6628ae3c.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/1031799732_e7f4008c03.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10391248763_1d16681106_n.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10437754174_22ec990b77_m.jpg,daisy
gs://cloud-ml-data/img/flower_photos/daisy/10437770546_8bb6f7bdd3_m.jpg,daisy


### Import data

Now, import the data into your AI Platform (Unified) `Dataset` resource. Use the helper function `import_data` to import the data. The function does the following:

- Uses the `Dataset` client.
- Calls the client method `import_data`, with the following parameters:
 - `name`: The fully qualified identifier of the `Dataset` resource.
 - `import_configs`: The import configuration.

- `import_configs`: A Python list containing a dictionary, with the key/value entries:
 - `gcs_source`: A dictionary whose `uris` entry is a list of Cloud Storage paths to one or more index files.
 - `import_schema_uri`: The schema identifying the labeling type.

The `import_data()` method returns a long running `operation` object. This will take a few minutes to complete. If you are in a live tutorial, this would be a good time to ask questions, or take a personal break.
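
The import configuration described above can be sketched as a small builder function. `build_import_configs` is a hypothetical helper, and the `gs://` prefix check is an extra safeguard added for this example, not something the client library requires you to write.

```python
def build_import_configs(index_uris, import_schema_uri):
    """Return the list-of-dictionaries structure passed as import_configs."""
    for uri in index_uris:
        if not uri.startswith("gs://"):
            raise ValueError("index file must be in Cloud Storage: " + uri)
    return [{
        "gcs_source": {"uris": list(index_uris)},
        "import_schema_uri": import_schema_uri,
    }]


configs = build_import_configs(
    ["gs://cloud-samples-data/vision/automl_classification/flowers/all_data_v2.csv"],
    "gs://google-cloud-aiplatform/schema/dataset/ioformat/"
    "image_classification_single_label_io_format_1.0.0.yaml",
)
print(configs)
```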

In [23]:
def import_data(dataset, gcs_sources, schema):
    config = [{
        'gcs_source': {'uris': gcs_sources},
        'import_schema_uri': schema
    }]
    print("dataset:", dataset)
    start_time = time.time()
    try:
        operation = clients['dataset'].import_data(name=dataset, import_configs=config)
        print("Long running operation:", operation.operation.name)

        result = operation.result()
        print("result:", result)
        print("time:", int(time.time() - start_time), "secs")
        print("error:", operation.exception())
        print("meta :", operation.metadata)
        print("after: running:", operation.running(), "done:", operation.done(), "cancelled:", operation.cancelled())

        return operation
    except Exception as e:
        print("exception:", e)
        return None


import_data(dataset_id, [IMPORT_FILE], LABEL_SCHEMA)

dataset: projects/759209241365/locations/us-central1/datasets/5516962320086990848
Long running operation: projects/759209241365/locations/us-central1/datasets/5516962320086990848/operations/7015654942861950976
result: 
time: 886 secs
error: None
meta : generic_metadata {
  partial_failures {
    code: 6
  }
  partial_failures {
    code: 6
  }
  create_time {
    seconds: 1616448888
    nanos: 939269000
  }
  update_time {
    seconds: 1616449716
    nanos: 906810000
  }
}

after: running: False done: True cancelled: False


<google.api_core.operation.Operation at 0x7f6f941a4fd0>

## Train the model

Now train an AutoML image classification model using your AI Platform (Unified) `Dataset` resource. To train the model, do the following steps:

1. Create an AI Platform (Unified) training pipeline for the `Dataset` resource.
2. Execute the pipeline to start the training.

### Create a training pipeline

You may ask, what do we use a pipeline for? You typically use pipelines when the job (such as training) has multiple steps, generally in sequential order: do step A, do step B, etc. Putting the steps into a pipeline has the following benefits:

1. The pipeline is reusable for subsequent training jobs.
2. It can be containerized and run as a batch job.
3. It can be distributed.
4. All the steps are associated with the same pipeline job for tracking progress.

Use this helper function `create_pipeline`, which takes the following parameters:

- `pipeline_name`: A human readable name for the pipeline job.
- `model_name`: A human readable name for the model.
- `dataset`: The AI Platform (Unified) fully qualified dataset identifier.
- `schema`: The training task schema, which identifies the task (e.g., classification) to train the model for.
- `task`: A dictionary describing the requirements for the training job.

The helper function calls the `Pipeline` client service's method `create_training_pipeline`, which takes the following parameters:

- `parent`: The AI Platform (Unified) location root path for your `Dataset`, `Model` and `Endpoint` resources.
- `training_pipeline`: the full specification for the pipeline training job.

Let's now look deeper into the *minimal* requirements for constructing a `training_pipeline` specification:

- `display_name`: A human readable name for the pipeline job.
- `training_task_definition`: The training task schema.
- `training_task_inputs`: A dictionary describing the requirements for the training job.
- `model_to_upload`: The model specification, which includes a human readable `display_name` for the model.
- `input_data_config`: The dataset specification.
 - `dataset_id`: The AI Platform (Unified) dataset identifier only (non-fully qualified) -- this is the last part of the fully-qualified identifier.
 - `fraction_split`: If specified, the percentages of the dataset to use for training, test and validation. Otherwise, the percentages are automatically selected by AutoML.
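
A quick sanity check on the split fractions can be sketched as follows. This assumes that each fraction should lie between 0 and 1 and that together they should not exceed 1; that constraint is an assumption of this example, and `make_fraction_split` is a hypothetical helper.

```python
def make_fraction_split(training, validation, test):
    """Return a fraction_split dictionary after basic validation."""
    fractions = (training, validation, test)
    if any(f < 0.0 or f > 1.0 for f in fractions):
        raise ValueError("each fraction must be between 0 and 1")
    # Small tolerance for floating-point rounding when summing
    if sum(fractions) > 1.0 + 1e-9:
        raise ValueError("fractions must not sum to more than 1")
    return {
        "training_fraction": training,
        "validation_fraction": validation,
        "test_fraction": test,
    }


input_config = {
    "dataset_id": "1234567890",  # placeholder short dataset ID
    "fraction_split": make_fraction_split(0.8, 0.1, 0.1),
}
print(input_config)
```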

In [24]:
def create_pipeline(pipeline_name, model_name, dataset, schema, task):

    dataset_id = dataset.split('/')[-1]

    input_config = {'dataset_id': dataset_id,
                    'fraction_split': {
                        'training_fraction': 0.8,
                        'validation_fraction': 0.1,
                        'test_fraction': 0.1
                    }}

    training_pipeline = {
        "display_name": pipeline_name,
        "training_task_definition": schema,
        "training_task_inputs": task,
        "input_data_config": input_config,
        "model_to_upload": {"display_name": model_name},
    }

    try:
        pipeline = clients['pipeline'].create_training_pipeline(parent=PARENT, training_pipeline=training_pipeline)
        print(pipeline)
    except Exception as e:
        print("exception:", e)
        return None
    return pipeline

### Construct the task requirements

Next, construct the task requirements. Unlike other parameters which take a Python (JSON-like) dictionary, the `task` field takes a Google protobuf `Value`, which wraps a structure very similar to a Python dictionary. Use the `json_format.ParseDict` method for the conversion.

The minimal fields we need to specify are:

- `multi_label`: Whether True/False this is a multi-label (vs single) classification.
- `budget_milli_node_hours`: The maximum time to budget (billed) for training the model, where 1000 = 1 hour. For image classification, the budget must be a minimum of 8 hours.
- `model_type`: The type of deployed model:
  - `CLOUD`: For deploying to Google Cloud.
  - `MOBILE_TF_LOW_LATENCY_1`: For deploying to the edge and optimizing for latency (response time).
  - `MOBILE_TF_HIGH_ACCURACY_1`: For deploying to the edge and optimizing for accuracy.
  - `MOBILE_TF_VERSATILE_1`: For deploying to the edge and optimizing for a trade off between latency and accuracy.
- `disable_early_stopping`: Whether True/False to let AutoML use its judgement to stop training early or train for the entire budget.
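
Because the budget is expressed in milli node hours, it is easy to be off by a factor of 1000. The sketch below converts hours to milli node hours and assembles the plain dictionary of task fields before any protobuf conversion. `make_task_inputs` is a hypothetical helper, and the 8 hour minimum it enforces is the one stated above for image classification.

```python
MIN_BUDGET_MILLI_NODE_HOURS = 8000  # 8 hours, per the minimum stated above


def hours_to_milli_node_hours(hours):
    """Convert node hours to milli node hours (1000 = 1 hour)."""
    return int(round(hours * 1000))


def make_task_inputs(hours, model_type="MOBILE_TF_LOW_LATENCY_1",
                     multi_label=False, disable_early_stopping=False):
    """Build the dictionary of task fields (hypothetical helper)."""
    budget = hours_to_milli_node_hours(hours)
    if budget < MIN_BUDGET_MILLI_NODE_HOURS:
        raise ValueError("budget must be at least 8 hours")
    return {
        "multi_label": multi_label,
        "budget_milli_node_hours": budget,
        "model_type": model_type,
        "disable_early_stopping": disable_early_stopping,
    }


print(make_task_inputs(8))
```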

Finally, create the pipeline by calling the helper function `create_pipeline`, which returns an instance of a training pipeline object.

In [25]:
PIPE_NAME = "flowers_pipe-" + TIMESTAMP
MODEL_NAME = "flowers_model-" + TIMESTAMP

task = json_format.ParseDict({'multi_label': False,
                              'budget_milli_node_hours': 8000,
                              'model_type': "MOBILE_TF_LOW_LATENCY_1",
                              'disable_early_stopping': False
                             }, Value())

response = create_pipeline(PIPE_NAME, MODEL_NAME, dataset_id, TRAINING_SCHEMA, task)

name: "projects/759209241365/locations/us-central1/trainingPipelines/4350837879853809664"
display_name: "flowers_pipe-20210322213438"
input_data_config {
  dataset_id: "5516962320086990848"
  fraction_split {
    training_fraction: 0.8
    validation_fraction: 0.1
    test_fraction: 0.1
  }
}
training_task_definition: "gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml"
training_task_inputs {
  struct_value {
    fields {
      key: "budgetMilliNodeHours"
      value {
        string_value: "8000"
      }
    }
    fields {
      key: "modelType"
      value {
        string_value: "MOBILE_TF_LOW_LATENCY_1"
      }
    }
  }
}
model_to_upload {
  display_name: "flowers_model-20210322213438"
}
state: PIPELINE_STATE_PENDING
create_time {
  seconds: 1616449775
  nanos: 383725000
}
update_time {
  seconds: 1616449775
  nanos: 383725000
}



Now save the unique identifier of the training pipeline you created.

In [26]:
# The full unique ID for the pipeline
pipeline_id = response.name
# The short numeric ID for the pipeline
pipeline_short_id = pipeline_id.split('/')[-1]

print(pipeline_id)

projects/759209241365/locations/us-central1/trainingPipelines/4350837879853809664


### Get information on a training pipeline

Now get pipeline information for just this training pipeline instance. The helper function gets the job information for just this job by calling the pipeline client service's `get_training_pipeline` method, with the following parameter:

- `name`: The AI Platform (Unified) fully qualified pipeline identifier.

When the model is done training, the pipeline state will be `PIPELINE_STATE_SUCCEEDED`.

In [27]:
def get_training_pipeline(name, silent=False):
    response = clients['pipeline'].get_training_pipeline(name=name)
    if silent:
        return response

    print("pipeline")
    print(" name:", response.name)
    print(" display_name:", response.display_name)
    print(" state:", response.state)
    print(" training_task_definition:", response.training_task_definition)
    print(" training_task_inputs:", dict(response.training_task_inputs))
    print(" create_time:", response.create_time)
    print(" start_time:", response.start_time)
    print(" end_time:", response.end_time)
    print(" update_time:", response.update_time)
    print(" labels:", dict(response.labels))
    return response


response = get_training_pipeline(pipeline_id)

pipeline
 name: projects/759209241365/locations/us-central1/trainingPipelines/4350837879853809664
 display_name: flowers_pipe-20210322213438
 state: PipelineState.PIPELINE_STATE_PENDING
 training_task_definition: gs://google-cloud-aiplatform/schema/trainingjob/definition/automl_image_classification_1.0.0.yaml
 training_task_inputs: {'budgetMilliNodeHours': '8000', 'modelType': 'MOBILE_TF_LOW_LATENCY_1'}
 create_time: 2021-03-22 21:49:35.383725+00:00
 start_time: None
 end_time: None
 update_time: 2021-03-22 21:49:35.383725+00:00
 labels: {}


# Deployment

Training the above model may take upwards of 30 minutes.

Once your model is done training, you can calculate the actual time it took to train the model by subtracting `start_time` from `end_time`. For your model, you will need to know the fully qualified AI Platform (Unified) `Model` resource identifier, which the pipeline service assigned to it. You can get this from the returned pipeline instance as the field `model_to_upload.name`.
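
The training time is the end time minus the start time. The sketch below shows the arithmetic with made-up timestamps, assuming the pipeline timestamps behave like timezone-aware `datetime` objects. `training_duration` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Made-up timestamps, for illustration only
start_time = datetime(2021, 3, 22, 21, 50, 0, tzinfo=timezone.utc)
end_time = datetime(2021, 3, 22, 22, 25, 30, tzinfo=timezone.utc)


def training_duration(start, end):
    """Return the elapsed time as a timedelta (end minus start)."""
    if end < start:
        raise ValueError("end time is before start time")
    return end - start


elapsed = training_duration(start_time, end_time)
print("Training time:", elapsed)
print("Minutes:", elapsed.total_seconds() / 60)
```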

In [28]:
while True:
    response = get_training_pipeline(pipeline_id, True)
    if response.state != aip.PipelineState.PIPELINE_STATE_SUCCEEDED:
        print("Training job has not completed:", response.state)
        model_to_deploy_id = None
        if response.state == aip.PipelineState.PIPELINE_STATE_FAILED:
            raise Exception("Training Job Failed")
    else:
        model_to_deploy = response.model_to_upload
        model_to_deploy_id = model_to_deploy.name
        print("Training Time:", response.end_time - response.start_time)
        break
    time.sleep(60)

print("model to deploy:", model_to_deploy_id)

Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: PipelineState.PIPELINE_STATE_RUNNING
Training job has not completed: Pi
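
The polling loop above only exits on success or on `PIPELINE_STATE_FAILED`, so any other terminal state would keep it waiting. A minimal sketch of a bounded variant follows. The helper name `wait_for_terminal_state` and all of its parameters are hypothetical and not part of the client library; the commented usage assumes the notebook's `get_training_pipeline` helper and `aip` module.

```python
import time


def wait_for_terminal_state(fetch_state, done_states, failed_states,
                            poll_seconds=60, timeout_seconds=3 * 60 * 60,
                            sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns a terminal state or the timeout elapses.

    Hypothetical helper: fetch_state is any zero-argument callable that
    returns the current state.
    """
    deadline = clock() + timeout_seconds
    while True:
        state = fetch_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"Pipeline ended in state {state}")
        if clock() >= deadline:
            raise TimeoutError(f"Still in state {state} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Possible usage inside the notebook (assumes its helpers are defined):
# wait_for_terminal_state(
#     lambda: get_training_pipeline(pipeline_id, True).state,
#     done_states={aip.PipelineState.PIPELINE_STATE_SUCCEEDED},
#     failed_states={aip.PipelineState.PIPELINE_STATE_FAILED,
#                    aip.PipelineState.PIPELINE_STATE_CANCELLED},
# )
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without waiting in real time.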

## Model information

Now that your model is trained, you can get some information on your model.

## Evaluate the Model resource

Next, check how well your model performs according to the model service. As part of training, some portion of the dataset was set aside as the test (holdout) data, which the pipeline service uses to evaluate the model.

### List evaluations for all slices

Use this helper function `list_model_evaluations`, which takes the following parameter:

- `name`: The AI Platform (Unified) fully qualified model identifier for the `Model` resource.

This helper function uses the model client service's `list_model_evaluations` method, which takes the model identifier as its `parent` parameter. The response object from the call is a list, where each element is a model evaluation.

For each evaluation (you probably only have one), you print all the key names of the metrics in the evaluation, and for a small set (`logLoss` and `auPrc`) you print the result.

In [29]:
def list_model_evaluations(name):
    response = clients['model'].list_model_evaluations(parent=name)
    evaluation_name = None  # stays None if the model has no evaluations
    for evaluation in response:
        print("model_evaluation")
        print(" name:", evaluation.name)
        print(" metrics_schema_uri:", evaluation.metrics_schema_uri)
        metrics = json_format.MessageToDict(evaluation._pb.metrics)
        for metric in metrics.keys():
            print(metric)
        print('logloss', metrics['logLoss'])
        print('auPrc', metrics['auPrc'])
        evaluation_name = evaluation.name

    # Return the name of the last evaluation listed
    return evaluation_name


last_evaluation = list_model_evaluations(model_to_deploy_id)

model_evaluation
 name: projects/759209241365/locations/us-central1/models/5339668269131366400/evaluations/4370802811990966272
 metrics_schema_uri: gs://google-cloud-aiplatform/schema/modelevaluation/classification_metrics_1.0.0.yaml
confusionMatrix
confidenceMetrics
auPrc
logLoss
logloss 0.116246305
auPrc 0.9707106
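
The evaluation also lists a `confidenceMetrics` key, which the cell above does not print. As a rough sketch of how such per-threshold metrics could be summarized, the hypothetical helper below picks the threshold with the highest F1 score. It assumes each entry is a dictionary with `confidenceThreshold`, `precision`, and `recall` keys; check the metrics schema for the exact field names. The sample values are made up for illustration and are not results from this model.

```python
def best_f1_threshold(confidence_metrics):
    """Return (threshold, f1) for the entry with the highest F1 score.

    Assumes each entry has 'confidenceThreshold', 'precision' and 'recall'
    keys; missing keys are treated as 0.0.
    """
    best = None
    for entry in confidence_metrics:
        precision = entry.get("precision", 0.0)
        recall = entry.get("recall", 0.0)
        if precision + recall == 0:
            continue  # F1 is undefined when both are zero
        f1 = 2 * precision * recall / (precision + recall)
        if best is None or f1 > best[1]:
            best = (entry.get("confidenceThreshold", 0.0), f1)
    return best


# Made-up sample values, for illustration only
sample_metrics = [
    {"confidenceThreshold": 0.1, "precision": 0.80, "recall": 1.00},
    {"confidenceThreshold": 0.5, "precision": 0.95, "recall": 0.95},
    {"confidenceThreshold": 0.9, "precision": 1.00, "recall": 0.70},
]
threshold, f1 = best_f1_threshold(sample_metrics)
print("best threshold:", threshold, "f1:", round(f1, 4))
```

In the notebook, the list to pass in would be `metrics['confidenceMetrics']` from inside `list_model_evaluations`.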


## Export as Edge model

You can export an AutoML image classification model as an Edge model, which you can then deploy to an edge device, such as a mobile phone or IoT device, or download locally. Use the helper function `export_model` to export the model to Cloud Storage. It takes the following parameters:

- `name`: The AI Platform (Unified) fully qualified identifier for the `Model` resource.
- `gcs_dest`: The Cloud Storage location to store the exported model artifacts in.

This function calls the `Model` client service's method `export_model`, with the following parameters:

- `name`: The AI Platform (Unified) fully qualified identifier for the `Model` resource.
- `output_config`: The destination information for the exported model.
  - `artifact_destination.output_uri_prefix`: The Cloud Storage location to store the exported model artifacts in.
  - `export_format_id`: The format to export the model in. For AutoML image classification:
   - `tf-saved-model`: TensorFlow SavedModel for deployment to a container.
   - `tflite`: TensorFlow Lite for deployment to an edge or mobile device.
   - `edgetpu-tflite`: TensorFlow Lite for Edge TPU devices.
   - `tf-js`: TensorFlow.js for web clients.
   - `core-ml`: Core ML for iOS devices.

The method returns a long-running operation `response`. You wait synchronously for the operation to complete by calling `response.result()`, which blocks until the model is exported.

In [30]:
MODEL_DIR = BUCKET_NAME + '/' + "flowers"

def export_model(name, gcs_dest):
    output_config = {
        "artifact_destination": {"output_uri_prefix": gcs_dest},
        "export_format_id": "tflite",
    }
    response = clients['model'].export_model(name=name, output_config=output_config)
    print("Long running operation:", response.operation.name)
    # Block until the export finishes, waiting at most 30 minutes
    response.result(timeout=1800)
    # metadata.value holds the serialized operation metadata;
    # slice the Cloud Storage URI out of its string form
    metadata = response.operation.metadata
    artifact_uri = str(metadata.value).split("\\")[-1][4:-1]
    print("Artifact Uri", artifact_uri)
    return artifact_uri


model_package = export_model(model_to_deploy_id, MODEL_DIR)

Long running operation: projects/759209241365/locations/us-central1/models/5339668269131366400/operations/5790112894263754752
Artifact Uri gs://andy-1234-221921aip-20210322213438/flowers/model-5339668269131366400/tflite/2021-03-22T23:03:20.935787Z
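
The `output_config` passed to `export_model` is a nested dictionary, so a mistyped destination only surfaces once the service rejects the request. The sketch below builds the same structure and checks the destination first. The helper `make_output_config` is hypothetical and not part of the client library, and the bucket name in the example is a placeholder.

```python
def make_output_config(gcs_dest, export_format_id="tflite"):
    """Build the output_config dictionary used when exporting a model.

    Hypothetical helper: it only checks that the destination looks like
    a Cloud Storage URI before building the nested structure.
    """
    if not gcs_dest.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got: {gcs_dest!r}")
    return {
        "artifact_destination": {"output_uri_prefix": gcs_dest},
        "export_format_id": export_format_id,
    }


# Placeholder bucket name, for illustration only
config = make_output_config("gs://my-bucket/flowers")
print(config)
```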


#### Download the TFLite model artifacts

Now that you have an exported TFLite version of your model, you can test it locally. First, download it from Cloud Storage.

In [31]:
! gsutil ls $model_package
# Download the model artifacts
! gsutil cp -r $model_package tflite

tflite_path = 'tflite/model.tflite'

gs://andy-1234-221921aip-20210322213438/flowers/model-5339668269131366400/tflite/2021-03-22T23:03:20.935787Z/model.tflite
Copying gs://andy-1234-221921aip-20210322213438/flowers/model-5339668269131366400/tflite/2021-03-22T23:03:20.935787Z/model.tflite...
/ [1 files][553.6 KiB/553.6 KiB]                                                
Operation completed over 1 objects/553.6 KiB.                                    


#### Instantiate a TFLite interpreter

The TFLite version of the model is not in TensorFlow SavedModel format, so you cannot directly use methods like `predict()`. Instead, you use the TFLite interpreter. You must first set up the interpreter for the TFLite model as follows:

- Instantiate a TFLite interpreter for the TFLite model.
- Instruct the interpreter to allocate input and output tensors for the model.
- Get detailed information about the model's input and output tensors, which you will need for prediction.

In [32]:
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path=tflite_path)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_shape = input_details[0]['shape']

print("input tensor shape", input_shape)

input tensor shape [  1 224 224   3]
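
The printed shape can be used to derive the image size the model expects instead of hard-coding it. The sketch below assumes the shape is in batch, height, width, channels order, which matches the `[1 224 224 3]` output above. The helper name `image_size_from_input_shape` is hypothetical.

```python
def image_size_from_input_shape(input_shape):
    """Return (height, width, channels) from a 4-element input shape.

    Assumes batch, height, width, channels ordering, e.g. [1, 224, 224, 3].
    """
    if len(input_shape) != 4:
        raise ValueError(f"Expected a 4-element shape, got {list(input_shape)}")
    _, height, width, channels = (int(dim) for dim in input_shape)
    return height, width, channels


height, width, channels = image_size_from_input_shape([1, 224, 224, 3])
print("resize target:", (height, width), "channels:", channels)
```

In the notebook, the argument would be `input_shape` from the cell above, and `(height, width)` could be passed on when resizing the test image.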


### Get test item

You will use an arbitrary example out of the dataset as a test item. Don't be concerned that the example was likely used in training the model -- we just want to demonstrate how to make a prediction.

In [33]:
test_items = ! gsutil cat $IMPORT_FILE | head -n1
test_item = test_items[0].split(',')[0]

with tf.io.gfile.GFile(test_item, "rb") as f:
    content = f.read()
test_image = tf.io.decode_jpeg(content)
print("test image shape", test_image.shape)

test_image = tf.image.resize(test_image, (224, 224))
print("test image shape", test_image.shape, test_image.dtype)

test_image = tf.cast(test_image, dtype=tf.uint8).numpy()

test image shape (263, 320, 3)
test image shape (224, 224, 3) <dtype: 'float32'>


#### Make a prediction with TFLite model

Finally, make a prediction using your TFLite model, as follows:

- Convert the test image into a batch of a single image (`np.expand_dims`).
- Set the input tensor for the interpreter to your batch of a single image (`data`).
- Invoke the interpreter.
- Retrieve the softmax probabilities for the prediction (`get_tensor`).
- Determine which label had the highest probability (`np.argmax`).

In [34]:
import numpy as np

data = np.expand_dims(test_image, axis=0)

interpreter.set_tensor(input_details[0]['index'], data)

interpreter.invoke()

softmax = interpreter.get_tensor(output_details[0]['index'])

label = np.argmax(softmax)

print(label)

4
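
The prediction above is only a class index. A minimal sketch of turning scores into ranked label names follows. The label order in `LABELS` is an assumption (the five flower classes in alphabetical order) and must be verified against the labels the model was actually trained with. The scores are made up for illustration, and the helper `top_k` is hypothetical.

```python
# ASSUMED label order; verify against the model's actual labels before use
LABELS = ["daisy", "dandelion", "rose", "sunflower", "tulip"]


def top_k(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up scores, for illustration only
example_scores = [2, 5, 10, 8, 230]
for label, score in top_k(example_scores, LABELS, k=2):
    print(label, score)
```

In the notebook, the scores would come from the output tensor. If that tensor carries a leading batch dimension, pass its first row (for example `softmax[0]`).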


# Cleaning up

To clean up all GCP resources used in this project, you can [delete the GCP
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

- Dataset
- Pipeline
- Model
- Endpoint
- Batch Job
- Cloud Storage Bucket

In [35]:
delete_dataset = True
delete_pipeline = True
delete_model = True
delete_endpoint = True
delete_batchjob = True
delete_bucket = True

# Delete the dataset using the AI Platform (Unified) fully qualified identifier for the dataset
try:
    if delete_dataset:
        clients['dataset'].delete_dataset(name=dataset_id)
except Exception as e:
    print(e)

# Delete the training pipeline using the AI Platform (Unified) fully qualified identifier for the pipeline
try:
    if delete_pipeline:
        clients['pipeline'].delete_training_pipeline(name=pipeline_id)
except Exception as e:
    print(e)

# Delete the model using the AI Platform (Unified) fully qualified identifier for the model
try:
    if delete_model:
        clients['model'].delete_model(name=model_to_deploy_id)
except Exception as e:
    print(e)

# Delete the endpoint using the AI Platform (Unified) fully qualified identifier for the endpoint
try:
    if delete_endpoint:
        clients['endpoint'].delete_endpoint(name=endpoint_id)
except Exception as e:
    print(e)

# Delete the batch job using the AI Platform (Unified) fully qualified identifier for the batch job
try:
    if delete_batchjob:
        clients['job'].delete_batch_prediction_job(name=batch_job_id)
except Exception as e:
    print(e)

if delete_bucket and 'BUCKET_NAME' in globals():
    ! gsutil rm -r $BUCKET_NAME

'endpoint'
'job'
Removing gs://andy-1234-221921aip-20210322213438/flowers/model-5339668269131366400/tflite/2021-03-22T23:03:20.935787Z/model.tflite#1616454204503970...
/ [1 objects]                                                                   
Operation completed over 1 objects.                                              
Removing gs://andy-1234-221921aip-20210322213438/...
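
The cleanup cell repeats the same `try`/`except` pattern for each resource and prints only the exception message, which can be terse (the string form of a `KeyError` is just the quoted key). A sketch of a more compact variant follows. The helper `run_cleanup` is hypothetical, and the steps in the example are stand-ins rather than real client calls.

```python
def run_cleanup(steps):
    """Run each enabled cleanup step and collect errors instead of stopping.

    steps is a list of (label, enabled, action) tuples, where action is a
    zero-argument callable. Returns a dict mapping label to the error repr.
    """
    errors = {}
    for label, enabled, action in steps:
        if not enabled:
            continue
        try:
            action()
        except Exception as e:  # keep going so later resources still get deleted
            errors[label] = repr(e)
    return errors


def missing_client():
    # Stand-in for a step that fails because a client was never created
    raise KeyError("endpoint")


# Stand-in steps, for illustration only
steps = [
    ("dataset", True, lambda: None),
    ("endpoint", True, missing_client),
    ("batch job", False, missing_client),
]
for label, error in run_cleanup(steps).items():
    print(f"{label}: {error}")
```

In the notebook, each action could wrap one of the delete calls above, for example `lambda: clients['model'].delete_model(name=model_to_deploy_id)`.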
