In [None]:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

This notebook is an updated version of a notebook contributed by [Mohammad Al-Ansari](https://github.com/Mansari). Special thanks to [Andrew Ferlitsch](https://github.com/andrewferlitsch) for his reviews and edits.

This is an extension of the [Vertex AI SDK for Python: AutoML training text entity extraction model for online prediction notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/sdk_automl_text_entity_extraction_online.ipynb) originally co-authored by [Andrew Ferlitsch](https://github.com/andrewferlitsch) and [Karl Weinmeister](https://github.com/kweinmeister). This version adds the use of `Vision API` and `BigQuery` to preprocess a `Vertex AI AutoML` dataset for text entity extraction model training.

# E2E ML on GCP: MLOps stage 2 : experiments: get started with Vision API text preprocessing and AutoML text model generation

<table align="left">
  <td>
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage2/get_started_with_visionapi_and_automl.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Colab logo"> Run in Colab
    </a>
  </td>
  <td>
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage2/get_started_with_visionapi_and_automl.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      View on GitHub
    </a>
  </td>
  <td>
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/community/ml_ops/stage2/get_started_with_visionapi_and_automl.ipynb">
     <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      Open in Vertex AI Workbench
    </a>
  </td>
</table>
<br/><br/><br/>

## Overview

This tutorial demonstrates how to use `BigQuery`, `Vision AI`, and `Vertex AI SDK` for Python to train a text entity extraction model based on existing training data.

### Objective

In this tutorial, you create an `AutoML` text entity extraction model from pre-existing extracted data by generating a custom import file. You deploy this model for online prediction from a Python script using `BigQuery`, `Vision AI`, Cloud Storage, and the `Vertex AI SDK` for Python. Alternatively, you can create and deploy models using the `gcloud` command-line tool or online using the Cloud Console.

Reusing training data that has already been annotated can be very useful, because it lets you build a larger training set with minimal labeling effort.

This tutorial uses the following Google Cloud services:

- `BigQuery`
- `Vision AI`
- `Vertex AI AutoML`

The steps performed include:

- Preprocess training files using `Vision AI` APIs to extract the text from PDF files.
- Create a custom import file that includes annotation data based on the sample `BigQuery` dataset.
- Create a `Vertex AI Dataset` resource.
- Train the model.
- View the model evaluation.
- Deploy the `Vertex AI Model` resource to a serving `Endpoint` resource.
- Make a prediction.
- Undeploy the `Model`.

### Dataset

The dataset used for this tutorial is the [Patent PDF Samples with Extracted Structured Data](https://console.cloud.google.com/marketplace/product/global-patents/labeled-patents) from Google Public Data Sets. 

This dataset includes data extracted from over 300 patent documents issued in the US and EU. The dataset includes links to Cloud Storage blobs for the first page of each patent, in addition to a number of extracted entities. 

The data is published as a [public dataset](https://cloud.google.com/bigquery/public-data) on `BigQuery`.

### Costs

This tutorial uses billable components of Google Cloud:

* BigQuery
* Vision API
* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Vision API pricing](https://cloud.google.com/vision/pricing), [BigQuery pricing](https://cloud.google.com/bigquery/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

### Set up your local development environment

If you are using Colab or Vertex AI Workbench Notebooks, your environment already meets all the requirements to run this notebook.  

Otherwise, make sure your environment meets this notebook's requirements. You need the following:

- The BigQuery SDK
- The Vision API SDK
- The Vertex AI SDK
- The Cloud Storage SDK
- Git
- Python 3
- virtualenv
- Jupyter notebook running in a virtual environment with Python 3

The Google Cloud guide to [Setting up a Python development environment](https://cloud.google.com/python/setup) and the [Jupyter installation guide](https://jupyter.org/install) provide detailed instructions for meeting these requirements. The following steps provide a condensed set of instructions:

1. [Install and initialize the SDKs](https://cloud.google.com/sdk/docs/).

2. [Install Python 3](https://cloud.google.com/python/setup#installing_python).

3. [Install virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv) and create a virtual environment that uses Python 3.  Activate the virtual environment.

4. To install Jupyter, run `pip3 install jupyter` on the command-line in a terminal shell.

5. To launch Jupyter, run `jupyter notebook` on the command-line in a terminal shell.

6. Open this notebook in the Jupyter Notebook Dashboard.


## Installation

Install the packages required for executing this notebook. You can ignore errors from the `pip` dependency resolver, as they do not affect this notebook.

In [None]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform google-cloud-bigquery google-cloud-vision google-cloud-storage pandas db-dtypes $USER_FLAG -q

### Restart the kernel

Once you've installed the additional packages, you need to restart the notebook kernel so it can find the packages.

In [None]:
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Before you begin

### GPU runtime

*Make sure you're running this notebook in a GPU runtime if you have that option. In Colab, select* **Runtime > Change Runtime Type > GPU**

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the following APIs: BigQuery APIs, Vision API, Vertex AI APIs, Compute Engine APIs, and Cloud Storage.](https://console.cloud.google.com/flows/enableapi?apiid=bigquery.googleapis.com,vision.googleapis.com,aiplatform.googleapis.com,compute_component,storage-component.googleapis.com)

4. If you are running this notebook locally, you need to install the [Cloud SDK](https://cloud.google.com/sdk).

5. Enter your project ID in the cell below. Then run the cell to make sure the
Cloud SDK uses the right project for all the commands in this notebook.

**Note**: Jupyter runs lines prefixed with `!` as shell commands, and it interpolates Python variables prefixed with `$`.

#### Set your project ID

**If you don't know your project ID**, you may be able to get your project ID using `gcloud`.

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

In [None]:
if PROJECT_ID == "" or PROJECT_ID is None or PROJECT_ID == "[your-project-id]":
    # Get your GCP project id from gcloud
    shell_output = ! gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = shell_output[0]
    print("Project ID:", PROJECT_ID)

In [None]:
! gcloud config set project $PROJECT_ID

### Regions

#### Vision AI

You can now specify continent-level data storage and Optical Character Recognition (OCR) processing by setting the `VISION_AI_REGION` variable. You can select one of the following options:

* USA country only: `us`
* The European Union: `eu`

Learn more about [Vision AI regions for OCR](https://cloud.google.com/vision/docs/pdf#regionalization)
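Later in this notebook, the Vision client is created with the endpoint `f"{VISION_AI_REGION}-vision.googleapis.com"`. The following is a minimal sketch of that mapping, with a guard against unsupported values. The helper name `vision_api_endpoint` is hypothetical, and the set of allowed regions is assumed to be exactly the two options listed above.

```python
# Continent-level OCR regions listed in this notebook (assumption: only these two).
SUPPORTED_VISION_REGIONS = {"us", "eu"}


def vision_api_endpoint(region: str) -> str:
    """Return the regional Vision API endpoint for a continent-level region.

    Hypothetical helper: mirrors the f-string used when the Vision client
    is created later in this notebook.
    """
    region = region.strip().lower()
    if region not in SUPPORTED_VISION_REGIONS:
        raise ValueError(
            f"Unsupported Vision region {region!r}; expected one of "
            f"{sorted(SUPPORTED_VISION_REGIONS)}"
        )
    return f"{region}-vision.googleapis.com"


print(vision_api_endpoint("eu"))  # eu-vision.googleapis.com
```

Failing early on a typo such as `"europe"` is easier to debug than a connection error from a non-existent host.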

#### Vertex AI

You can also change the `VERTEX_AI_REGION` variable, which is used for operations throughout the rest of this notebook.  Below are regions supported for Vertex AI. We recommend that you choose the region closest to you.

- Americas: `us-central1`
- Europe: `europe-west4`
- Asia Pacific: `asia-east1`

You may not use a multi-regional bucket for training with Vertex AI. Not all regions provide support for all Vertex AI services.

Learn more about [Vertex AI regions](https://cloud.google.com/vertex-ai/docs/general/locations).

In [None]:
VISION_AI_REGION = "[your-region]"  # @param {type: "string"}

if VISION_AI_REGION == "[your-region]":
    VISION_AI_REGION = "us"

VERTEX_AI_REGION = "[your-region]"  # @param {type: "string"}

if VERTEX_AI_REGION == "[your-region]":
    VERTEX_AI_REGION = "us-central1"

### Timestamp

If you are in a live tutorial session, you might be using a shared test account or project. To avoid name collisions between users on resources created, you create a timestamp for each instance session, and append the timestamp onto the name of resources you create in this tutorial.

In [None]:
from datetime import datetime

TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

### Authenticate your Google Cloud account

**If you are using Vertex AI Workbench**, your environment is already authenticated. Skip this step. If you still receive errors, you may need to grant the service account that your Workbench notebook runs under access to the services listed below.

**If you are using Colab**, run the cell below and follow the instructions when prompted to authenticate your account via OAuth.

**Otherwise**, follow these steps:

In the Cloud Console, go to the [Create service account key](https://console.cloud.google.com/apis/credentials/serviceaccountkey) page.

**Click Create service account**.

In the **Service account name** field, enter a name, and click **Create**.

In the **Grant this service account access to project** section, click the Role drop-down list. Type "Vertex" into the filter box, and select **Vertex AI Administrator**. Type "Storage Object Admin" into the filter box, and select **Storage Object Admin**.

Click Create. A JSON file that contains your key downloads to your local environment.

Enter the path to your service account key as the GOOGLE_APPLICATION_CREDENTIALS variable in the cell below and run the cell.

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account. This provides access to your
# Cloud Storage bucket and lets you submit training jobs and prediction
# requests.

import os
import sys

# If on Vertex AI Workbench, then don't execute this code
IS_COLAB = False
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        IS_COLAB = True
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP
    # account.
    elif not os.getenv("IS_TESTING"):
        %env GOOGLE_APPLICATION_CREDENTIALS ''

### Create a Cloud Storage bucket

**The following steps are required, regardless of your notebook environment.**

When you initialize the Vertex AI SDK for Python, you specify a Cloud Storage staging bucket. The staging bucket is where all the data associated with your dataset and model resources is retained across sessions. This bucket also stores the output of the Vision API PDF-to-text conversion process.

Set the name of your Cloud Storage bucket below. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.
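Cloud Storage bucket names must also follow a few basic rules. They must be 3 to 63 characters long, use only lowercase letters, digits, dashes, underscores, and dots, and start and end with a letter or digit. The sketch below checks only those basic rules. For example, it ignores the longer limit for dotted names and reserved words such as `google`. Treat it as a quick local sanity check, not the authoritative validation that `gsutil mb` performs. The helper name and the example project ID and timestamp are hypothetical.

```python
import re

# Simplified Cloud Storage naming rules: 3-63 characters, lowercase letters,
# digits, '-', '_' and '.', starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the simplified bucket naming rules.

    Hypothetical helper; it does not check every rule Cloud Storage enforces.
    """
    return bool(_BUCKET_NAME_RE.match(name))


# Hypothetical project ID and timestamp, for illustration only.
candidate = "my-project-aip-" + "20220101120000"
print(candidate, looks_like_valid_bucket_name(candidate))
```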

In [None]:
BUCKET_NAME = "[your-bucket-name]"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [None]:
if BUCKET_NAME == "" or BUCKET_NAME is None or BUCKET_NAME == "[your-bucket-name]":
    BUCKET_NAME = PROJECT_ID + "-aip-" + TIMESTAMP
    BUCKET_URI = "gs://" + BUCKET_NAME

**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l $VERTEX_AI_REGION -p $PROJECT_ID $BUCKET_URI

Finally, validate access to your Cloud Storage bucket by examining its contents:

In [None]:
! gsutil ls -al $BUCKET_URI

### Set up variables

Next, set up some variables used throughout the tutorial.

### Import libraries and define constants

In [None]:
import os

from google.cloud import aiplatform, bigquery, storage, vision

### Initialize BigQuery SDK for Python

Create the BigQuery client.

In [None]:
bq_client = bigquery.Client(project=PROJECT_ID)

### Initialize Vision API SDK for Python

In [None]:
vision_client_options = {
    "quota_project_id": PROJECT_ID,
    "api_endpoint": f"{VISION_AI_REGION}-vision.googleapis.com",
}
vision_client = vision.ImageAnnotatorClient(client_options=vision_client_options)

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
aiplatform.init(
    project=PROJECT_ID, location=VERTEX_AI_REGION, staging_bucket=BUCKET_URI
)

### Initialize Cloud Storage SDK for Python

In [None]:
storage_client = storage.Client(project=PROJECT_ID)

## Tutorial

Now you are ready to start creating your own AutoML text entity extraction model with a custom import file.

### Query the Patent PDF Samples dataset

First, you run a query to select a subset of the data for your training purposes.

In [None]:
# Select only patents issued in the US and in English
query = """
    SELECT * FROM `bigquery-public-data.labeled_patents.extracted_data`
    WHERE issuer = 'US' and language = 'EN'
    ORDER BY gcs_path ASC
    """

query_job = bq_client.query(query)

# Convert the results into a pandas dataframe
results_df = query_job.result().to_dataframe()

print(f"Retrieved {len(results_df)} rows")

#### Review the data

You can take a quick peek at the content of the data by showing the first 5 rows.


In [None]:
print(results_df.head(5))

The `gcs_path` column contains the full path to each PDF. To simplify later steps, add another column to the data frame that holds only the file name (without the extension). You do this using a pandas transformation.

Learn more about [pandas text manipulation](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)
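The next cell combines `os.path.basename` and `os.path.splitext` to get the file name without its extension. The sketch below shows an equivalent approach using `pathlib.PurePosixPath.stem`, applied to made-up `gs://` paths rather than rows from the dataset. `PurePosixPath` treats the URI as a plain POSIX-style path, which is enough to extract the last component and drop its extension.

```python
import os
from pathlib import PurePosixPath

# Hypothetical Cloud Storage URIs, for illustration only.
sample_paths = [
    "gs://example-bucket/patents/us_001.pdf",
    "gs://example-bucket/patents/eu_042.pdf",
]

for path in sample_paths:
    # Same approach as the notebook: strip the directory, then the extension.
    with_os_path = os.path.splitext(os.path.basename(path))[0]
    # pathlib alternative: `.stem` is the final component without its suffix.
    with_pathlib = PurePosixPath(path).stem
    print(path, "->", with_pathlib)
    assert with_os_path == with_pathlib
```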

In [None]:
# Apply a transformation and store the result in a new column
results_df["gcs_filename"] = results_df["gcs_path"].apply(
    lambda x: os.path.splitext(os.path.basename(x))[0]
)

# Take a look at the data to see the newly added filename at the end
print(results_df.head(5))

### Create the AutoML dataset import file

Now that the data is ready, you can move to the next step.

#### Preprocess files using Vision API

Vertex AI needs plain text to train a text entity extraction model. However, the sample dataset does not contain the full text that the data was extracted from. Instead, it contains a link to the PDF version.

[Here is a sample of one of the PDF files](https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/notebooks/community/ml_ops/stage2/notebook_resources/get_started_with_visionapi_and_automl/us_001.pdf)

You need to convert the PDF files to text so that you can use the text to build your annotation file. You use `Vision AI` document text detection to do this.

Learn more about [Detect text in files (PDF/TIFF)](https://cloud.google.com/vision/docs/pdf).
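For each file, the OCR job writes JSON results under the `GcsDestination` prefix given in the request. The annotation step later in this notebook expects a single-page PDF to produce `<prefix>output-1-to-1.json`. The sketch below assumes the general pattern is `<prefix>output-<first>-to-<last>.json`, with at most `batch_size` pages per file (assumed to default to 20 in `OutputConfig`), and lists the file names to expect for a given page count. The helper name is hypothetical. Check the Vision API documentation linked above before relying on the multi-page naming.

```python
def expected_ocr_output_names(prefix: str, num_pages: int, batch_size: int = 20):
    """Return the JSON file names the OCR job is expected to write for one PDF.

    Hypothetical helper. Assumes the naming convention
    `<prefix>output-<first>-to-<last>.json`, with at most `batch_size`
    pages per file.
    """
    if num_pages < 1 or batch_size < 1:
        raise ValueError("num_pages and batch_size must be positive")
    names = []
    for first in range(1, num_pages + 1, batch_size):
        last = min(first + batch_size - 1, num_pages)
        names.append(f"{prefix}output-{first}-to-{last}.json")
    return names


# A single-page PDF yields one file, matching the name used by the annotation step.
print(expected_ocr_output_names("ocr-output/us_001-", num_pages=1))
# A hypothetical 45-page PDF with the assumed default batch size yields three files.
print(expected_ocr_output_names("ocr-output/doc-", num_pages=45))
```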

In [None]:
# Specify a destination path inside our bucket
gcs_destination_path = "ocr-output"
gcs_destination_uri = f"{BUCKET_URI}/{gcs_destination_path}"

# Specify the feature for the Vision API processor
feature = vision.Feature(type_=vision.Feature.Type.DOCUMENT_TEXT_DETECTION)

# Create a collection of requests. The SDK requires a separate request for
# each file to extract text from
async_requests = []

# Build the requests array by iterating through our dataset
for i in range(len(results_df)):
    gcs_uri = results_df.loc[i, "gcs_path"]

    # Build input_config
    gcs_source = vision.GcsSource(uri=gcs_uri)
    input_config = vision.InputConfig(
        gcs_source=gcs_source, mime_type="application/pdf"
    )

    # Build output config
    gcs_source_filename = results_df.loc[i, "gcs_filename"]
    gcs_destination = vision.GcsDestination(
        uri=f"{gcs_destination_uri}/{gcs_source_filename}-"
    )
    output_config = vision.OutputConfig(gcs_destination=gcs_destination)

    # Build request object and add to the collection
    async_request = vision.AsyncAnnotateFileRequest(
        features=[feature], input_config=input_config, output_config=output_config
    )

    async_requests.append(async_request)

print(f"Created {len(async_requests)} requests")

# Submit the batch OCR job
print("Submitting the batch OCR job")
operation = vision_client.async_batch_annotate_files(requests=async_requests)

print("Waiting for the operation to finish... this will take a short while")

response = operation.result(timeout=420)

print("Completed!")

### Generate the import file

Since you already have extracted data from the patents, you use this data to generate annotations that you use to train the `AutoML` model. 

You do this by looking up certain extracted entities in the converted text and then specifying the location of that text in the document.
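Each OCR output file is a JSON document with a `responses` list, and the recognized text is under `fullTextAnnotation.text`. The annotation cell reads only `responses[0]` because each patent PDF here is a single page. For multi-page output, a sketch like the one below could join all pages instead. The helper name and the sample payload are hypothetical, and the payload is trimmed to just the fields the helper reads.

```python
import json


def extract_full_text(ocr_json) -> str:
    """Join the OCR text of every page in one Vision output file.

    Hypothetical helper. Pages without recognized text (no
    `fullTextAnnotation`) contribute an empty string instead of raising.
    """
    payload = json.loads(ocr_json)
    pages = [
        response.get("fullTextAnnotation", {}).get("text", "")
        for response in payload.get("responses", [])
    ]
    # Separate pages with a newline so words on adjacent pages do not merge
    return "\n".join(pages)


# Trimmed, made-up payload containing only the fields the helper reads
sample_output = json.dumps(
    {"responses": [{"fullTextAnnotation": {"text": "US 10,143,036 B2\n"}}]}
)
print(repr(extract_full_text(sample_output)))  # 'US 10,143,036 B2\n'
```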

#### Generate the annotations

For the purposes of this tutorial, you annotate only three entities: the publication date, the application number, and the first line of the inventor(s) names. You can extend the code to annotate additional entities that are available in the extracted dataset.
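The search loop in the next cell records every occurrence of a label value as a span from `startOffset` (inclusive) to `endOffset = startOffset + len(value)` (exclusive). The standalone sketch below isolates that logic so you can try it on a short string. It also skips empty or missing values. Searching for an empty string with `str.find` returns the start index itself, so the loop would never advance. The function name, the label `lbl_filing_date`, and the sample text are illustrative only.

```python
def find_label_spans(text, value, display_name):
    """Return one annotation dict per non-overlapping occurrence of `value`.

    Offsets are character indexes into `text`; `endOffset` is exclusive
    (start + len(value)), matching the import file built in this notebook.
    """
    spans = []
    # Skip missing or empty values: find("") would match at every position.
    if not isinstance(value, str) or not value:
        return spans
    start = 0
    while True:
        found = text.find(value, start)
        if found == -1:
            break
        end = found + len(value)
        spans.append(
            {"startOffset": found, "endOffset": end, "displayName": display_name}
        )
        start = end  # continue searching after this match
    return spans


sample_text = "(22) Filed: Jul. 7, 2017\nPriority: Jul. 7, 2017\n"
spans = find_label_spans(sample_text, "Jul. 7, 2017", "lbl_filing_date")
for span in spans:
    print(span, repr(sample_text[span["startOffset"] : span["endOffset"]]))
```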

In [None]:
import json

# Remove the leading gs:// from the bucket URI to get the bare bucket name
gcs_bucket_name = BUCKET_URI.split("gs://")[1]

output_bucket = storage_client.bucket(gcs_bucket_name)
print("Creating annotation set based on output text files and dataset")

annotations = []

# Loop through the dataset
for i in range(len(results_df)):
    # For each row, read the OCR output file that the Vision API wrote for it.
    # Each single-page PDF produces one file named "<prefix>output-1-to-1.json"
    gcs_output_filename = f"{results_df.loc[i, 'gcs_filename']}-output-1-to-1.json"
    blob_name = f"{gcs_destination_path}/{gcs_output_filename}"
    blob = output_bucket.blob(blob_name)
    contents = blob.download_as_bytes()

    json_object = json.loads(contents)

    full_text = json_object["responses"][0]["fullTextAnnotation"]["text"]

    text_segment_annotations = []

    # Prepare labels to be annotated
    labels = [
        {
            "name": "lbl_publication_date",
            "value": results_df.loc[i, "publication_date"],
        },
        {
            "name": "lbl_application_number",
            "value": results_df.loc[i, "application_number"],
        },
        {"name": "lbl_inventor_line_1", "value": results_df.loc[i, "inventor_line_1"]},
    ]

    # Search for the labels in the patent text
    for label in labels:
        value = label["value"]
        # Skip missing or empty values; searching for "" would loop forever
        if not isinstance(value, str) or not value:
            continue

        start_index = 0
        while True:
            found_index = full_text.find(value, start_index)
            if found_index == -1:
                break
            end_offset = found_index + len(value)
            text_segment_annotations.append(
                {
                    "startOffset": found_index,
                    "endOffset": end_offset,
                    "displayName": label["name"],
                }
            )
            # Continue the search after this match to find the next one
            start_index = end_offset

    # Create the annotation object and add to the collection
    annotation = {
        "textSegmentAnnotations": text_segment_annotations,
        "textContent": full_text,
    }

    annotations.append(annotation)

print("Done!")

#### Generate and upload the import file

Now that you have the annotations, you generate the import file based on the schema mentioned in [this guide](https://cloud.google.com/vertex-ai/docs/datasets/prepare-text#entity-extraction).

You upload your import file to your bucket to be used for `AutoML` training.
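For entity extraction, each line of the import file is one JSON object. The document text is in `textContent`, and its labeled spans are in `textSegmentAnnotations`, which matches what the annotation loop assembles. Before uploading, a quick local check like the sketch below can catch spans that fall outside the text or have reversed offsets. The helper names and the sample record are hypothetical. The check assumes exclusive end offsets, as used when the annotations were generated.

```python
import json


def to_jsonl(annotations):
    """Serialize annotation dicts as JSON Lines (one object per line)."""
    return "".join(json.dumps(annotation) + "\n" for annotation in annotations)


def invalid_spans(annotation):
    """Return the segment annotations whose offsets do not fit `textContent`."""
    text_length = len(annotation["textContent"])
    return [
        segment
        for segment in annotation["textSegmentAnnotations"]
        if not 0 <= segment["startOffset"] < segment["endOffset"] <= text_length
    ]


# Hypothetical record in the shape produced by the annotation step.
record = {
    "textSegmentAnnotations": [
        {"startOffset": 11, "endOffset": 21, "displayName": "lbl_application_number"}
    ],
    "textContent": "Appl. No.: 15/644,553\n",
}

jsonl = to_jsonl([record])
print(jsonl, end="")
print(invalid_spans(record))  # []
```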

In [None]:
# Convert array to JSONL content
jsonl_output = ""
for annotation in annotations:
    jsonl_output += json.dumps(annotation) + "\n"

print(f"Created import file based on {len(annotations)} annotations")

# Upload content to GCS to be used in our next step
gcs_annotation_file_name = "annotation_file/import_file.jsonl"
import_file_blob = output_bucket.blob(gcs_annotation_file_name)
import_file_blob.upload_from_string(jsonl_output)

print(f"Uploaded import file to {output_bucket.name}/{gcs_annotation_file_name}")

### Create the Vertex AI Dataset

Next, create the `Dataset` resource using the `create` method for the `TextDataset` class, which takes the following parameters:

- `display_name`: The human readable name for the `Dataset` resource.
- `gcs_source`: A list of one or more dataset index files to import the data items into the `Dataset` resource.
- `import_schema_uri`: The data labeling schema for the data items.

This operation may take ten to twenty minutes.

In [None]:
dataset = aiplatform.TextDataset.create(
    display_name="Patent PDF Samples" + "_" + TIMESTAMP,
    gcs_source=[f"gs://{output_bucket.name}/{gcs_annotation_file_name}"],
    import_schema_uri=aiplatform.schema.dataset.ioformat.text.extraction,
)

print(dataset.resource_name)

Now that the dataset is imported, you can also review the dataset annotations in the Cloud Console to verify that the patent text has been successfully annotated.

### Create and run `AutoML` training pipeline

To train an `AutoML` model, you perform two steps: 1) create an `AutoML` training pipeline, and 2) run the pipeline.

#### Create training pipeline

An `AutoML` training pipeline is created with the `AutoMLTextTrainingJob` class, with the following parameters:

- `display_name`: The human readable name for the `TrainingJob` resource.
- `prediction_type`: The type of task to train the model for.
  - `classification`: A text classification model.
  - `sentiment`: A text sentiment analysis model.
  - `extraction`: A text entity extraction model.
- `multi_label`: If a classification task, whether single (False) or multi-labeled (True).
- `sentiment_max`: If a sentiment analysis task, the maximum sentiment value.


In [None]:
job = aiplatform.AutoMLTextTrainingJob(
    display_name="patent_sample_" + TIMESTAMP, prediction_type="extraction"
)

print("Training job created!")

#### Run the training pipeline

Next, you start the training job by invoking the method `run`, with the following parameters:

- `dataset`: The `Dataset` resource to train the model.
- `model_display_name`: The human readable name for the trained model.
- `training_fraction_split`: The fraction of the dataset to use for training.
- `test_fraction_split`: The fraction of the dataset to use for test (holdout data).
- `validation_fraction_split`: The fraction of the dataset to use for validation.
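The three fractions should add up to 1.0. To illustrate what an 80/10/10 split means in item counts, the sketch below computes approximate sizes for a hypothetical number of documents. Vertex AI assigns items to splits itself, so the actual counts can differ slightly. The helper name is hypothetical, and its rounding rule is only an illustration.

```python
import math


def approximate_split_sizes(num_items, training=0.8, validation=0.1, test=0.1):
    """Return rough (training, validation, test) item counts for given fractions.

    Hypothetical helper for intuition only; it is not how Vertex AI performs
    the split. Any rounding remainder goes to the training set.
    """
    if not math.isclose(training + validation + test, 1.0):
        raise ValueError("fractions must sum to 1.0")
    n_validation = round(num_items * validation)
    n_test = round(num_items * test)
    n_training = num_items - n_validation - n_test
    return n_training, n_validation, n_test


print(approximate_split_sizes(100))  # (80, 10, 10)
```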

When it completes, the `run` method returns the `Model` resource.

The training pipeline can take up to 4 hours to run.

_If you are using Colab, make sure your session does not time out while model training is running. If it does, you may have to reinitialize the notebook and manually retrieve the trained model before reviewing the model evaluation in the next step._

In [None]:
model = job.run(
    dataset=dataset,
    model_display_name="patent_sample_" + TIMESTAMP,
    training_fraction_split=0.8,
    validation_fraction_split=0.1,
    test_fraction_split=0.1,
)

## Review model evaluation scores

After model training has finished, you can review the model's evaluation scores using the `list_model_evaluations()` method. This method returns a list of `ModelEvaluation` objects, one per evaluation slice.

In [None]:
model_evaluations = model.list_model_evaluations()

for model_evaluation in model_evaluations:
    print(model_evaluation.to_dict())

## Deploy the `AutoML` model to a `Vertex AI Endpoint` resource

Next, deploy your `AutoML` model to a `Vertex AI Endpoint` resource for online prediction. To deploy the model, you invoke the `deploy` method. 

This will take a few minutes.

In [None]:
endpoint = model.deploy()

## Send an online prediction request

Send an online prediction request to your deployed model.

### Make a test item

Next, you run a test using the extracted text from the first page of a patent that was not part of the original dataset. [Here is a link to the PDF file that was converted to text below.](https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/notebooks/community/ml_ops/stage2/notebook_resources/get_started_with_visionapi_and_automl/sample_patent.pdf)

The file was downloaded from the United States Patent and Trademark Office (USPTO) and follows the same format as your training set.

The text below was previously extracted from the PDF using the Vision API document text detection process.

In [None]:
test_item = (
    "US010143036B2\n(12) United States Patent\nKhan\n(10) Patent No.: US 10,143,036 B2\n(45) Date of Patent: Nov. 27, 2018\n(54) MILLIMETER WAVE WIRELESS SYSTEM\nUSING LICENSED AND UNLICENSED\nFREQUENCY SPRECTRUM\n(58) Field of Classification Search\nCPC . H04W 74/006; H04W 80/02; H04W 34/005;\nHOW 84/04; HOW 84/045; H04W\n88/06; H04W 92/20; H04W 74/0816;\nHO4B 7/02\nSee application file for complete search history.\n(71) Applicant: Phazr, Inc., Allen, TX (US)\n(72) Inventor: Farooq Khan, Allen, TX (US)\n(56)\n(73) Assignee: Phazr, Inc., Allen, TX (US)\nReferences Cited\nU.S. PATENT DOCUMENTS\n(*) Notice:\nSubject to any disclaimer, the term of this\npatent is extended or adjusted under 35\nU.S.C. 154(b) by 44 days.\n10,057,898 B2 *\n10,075,852 B2 *\n2013/0072125 A1*\n8/2018 Khan\n9/2018 Nekovee\n3/2013 Yoon\nH04W 72/044\nH04W 16/28\nHO1P 1/10\n455/67.11\nH04B 1/40\n455/78\n(21) Appl. No.: 15/644,553\n2014/0106686 A1*\n4/2014 Higgins\n(22) Filed:\nJul. 7, 2017\n(Continued)\n(65)\nPrior Publication Data\nUS 2018/0035487 A1\nFeb. 1, 2018\nPrimary Examiner — Devan Sandiford\n(74) Attorney, Agent, or Firm — Michael A. Rahman\nRelated U.S. Application Data\n(60) Provisional application No. 62/369,038, filed on Jul.\n30, 2016.\n(51) Int. CI.\nH0W 84/04\n(2009.01)\nH04W 74/00\n(2009.01)\nH04W 84/00\n(2009.01)\nH04W 92/20\n(2009.01)\nH04W 80/02\n(2009.01)\nH0W 88/06\n(2009.01)\nH04B 7/02\n(2018.01)\nH04W 74/08\n(2009.01)\n(52) U.S. CI.\nCPC\nH04W 84/045 (2013.01); H04B 7/02\n(2013.01); H04W 74/006 (2013.01); H04W\n80/02 (2013.01); H04W 84/005 (2013.01);\nH04W 84/04 (2013.01); H04W 88/06\n(2013.01); H04W 92/20 (2013.01); H04W\n74/0816 (2013.01)\n(57)\nABSTRACT\nA method of wireless communication includes transmitting\nin a downlink direction on a licensed millimeter wave band,\nby a radio base station, a first millimeter wave band signal\nat high transmit equivalent isotropically radiated power\n(EIRP) using a multiple input multiple output transmit\nantenna array. "
    "The method includes receiving by a commu-\nnications device the first millimeter wave band signal. The\nmethod includes transmitting in an uplink direction on an\nunlicensed millimeter wave band, by the communications\ndevice, a second millimeter wave band signal at low transmit\nequivalent isotropically radiated power (EIRP) using a mul-\ntiple input multiple output transmit antenna array. The\nmethod includes receiving on the unlicensed millimeter\nwave band, by the radio base station, the second millimeter\nwave band signal at a high receive gain using a multiple\ninput multiple output receive antenna array.\n26 Claims, 11 Drawing Sheets\n100\nHigh-Gain\nantenna array\nHigh power\nPower amplifiers\nComplex\nTX\nTX\nLow Transmit\nEIRP\nLicensed\nSpectrum.\nRX\nLow Receive\nGain\nUnlicensed\nSpectrum\nHighly\nsensitive\nRX\nCommunication Device\nDevice\nC1\nLow-Noise\nFigure\nLNA\nAccess Point\nDevice\n00\nDevice\nC2\nAccess\nPoint\nAO\n"
)

### Make the prediction

Now that your `Model` resource is deployed to an `Endpoint` resource, you can do online predictions by sending prediction requests to the `Endpoint` resource.

#### Request

The format of each instance is:

     { 'content': text_string }

Since the predict() method can take multiple items (instances), send your single test item as a list of one test item.

#### Response

The `predict()` call returns a `Prediction` object. Its `predictions` attribute is a list with one dictionary per instance, each containing the following entries:

- `ids`: The internally assigned unique identifiers for the predictions.
- `displayNames`: The class names for each entity.
- `confidences`: The predicted confidence, between 0 and 1, per entity.
- `textSegmentStartOffsets`: The character offset in the text to the start of the entity.
- `textSegmentEndOffsets`: The character offset in the text to the end of the entity.

Its `deployed_model_id` attribute is the Vertex AI identifier for the deployed `Model` resource that made the predictions.
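Because the offsets point into the submitted text, you can map each predicted entity back to the substring it labels. The sketch below does this for a hand-written prediction dictionary shaped like the entries listed above. The values are made up, not output from the model trained here, and the helper name is hypothetical. It casts offsets with `int()` in case they arrive as floats or strings after JSON conversion. It also assumes end offsets are exclusive, as in the import file built earlier.

```python
def decode_entities(text, prediction, min_confidence=0.0):
    """Pair each predicted label with the substring of `text` it covers.

    Hypothetical helper. Expects a dict with `displayNames`, `confidences`,
    `textSegmentStartOffsets` and `textSegmentEndOffsets` lists of equal
    length, and treats end offsets as exclusive.
    """
    entities = []
    for name, confidence, start, end in zip(
        prediction["displayNames"],
        prediction["confidences"],
        prediction["textSegmentStartOffsets"],
        prediction["textSegmentEndOffsets"],
    ):
        if confidence < min_confidence:
            continue  # drop low-confidence entities
        start, end = int(start), int(end)
        entities.append((name, text[start:end], confidence))
    return entities


sample_text = "(21) Appl. No.: 15/644,553\n(72) Inventor: Farooq Khan, Allen, TX (US)\n"
sample_prediction = {  # made-up values for illustration
    "ids": ["1", "2"],
    "displayNames": ["lbl_application_number", "lbl_inventor_line_1"],
    "confidences": [0.98, 0.41],
    "textSegmentStartOffsets": [16, 42],
    "textSegmentEndOffsets": [26, 53],
}
for entity in decode_entities(sample_text, sample_prediction, min_confidence=0.5):
    print(entity)
```

With the deployed endpoint, you would pass `test_item` and one element of `prediction.predictions` instead of the sample values.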

In [None]:
import json

instances_list = [{"content": test_item}]

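prediction = endpoint.predict(instances=instances_list)

# `prediction.predictions` holds one dictionary per instance
print(json.dumps(prediction.predictions, indent=4))
print("Deployed model ID:", prediction.deployed_model_id)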

## Undeploy the model

When you are done making predictions, you undeploy the model from the `Endpoint` resource. This deprovisions all compute resources and ends billing for the deployed model.

In [None]:
endpoint.undeploy_all()

# Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial.

In [None]:
delete_bucket = False

# Delete the dataset using the Vertex dataset object
dataset.delete()

# Delete the model using the Vertex model object
model.delete()

# Delete the endpoint using the Vertex endpoint object
endpoint.delete()

# Delete the AutoML or Pipeline training job
job.delete()

if delete_bucket or os.getenv("IS_TESTING"):
    ! gsutil -m rm -r $BUCKET_URI

print("Clean up completed!")