# Exporting a BigQuery ML Model for Online Prediction

**Learning Objectives**

1. Train and deploy a logistic regression model - also applies to DNN classifier, DNN regressor, k-means, linear regression, and matrix factorization models.
2. Train and deploy a Boosted Tree classifier model - also applies to Boosted Tree regressor model.
3. Train and deploy an AutoML classifier model - also applies to AutoML regressor model.

## Introduction 
In this notebook, you will learn how to [export a BigQuery ML model](https://cloud.google.com/bigquery-ml/docs/exporting-models) and then deploy the model on AI Platform. You will use the iris table from the BigQuery public datasets and work through the three end-to-end scenarios.

Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solutions/export_a_bigquery_ml_model.ipynb).

## Set up environment variables and load necessary libraries

In [None]:
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst

Check that the Google BigQuery library is installed and if not, install it.

In [1]:
!pip install --user google-cloud-bigquery==1.25.0

Collecting google-cloud-bigquery==1.25.0
Downloading google_cloud_bigquery-1.25.0-py2.py3-none-any.whl (169 kB)
|████████████████████████████████| 169 kB 4.8 MB/s eta 0:00:01
Collecting google-resumable-media<0.6dev,>=0.5.0
Downloading google_resumable_media-0.5.1-py2.py3-none-any.whl (38 kB)
Installing collected packages: google-resumable-media, google-cloud-bigquery
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
google-cloud-storage 1.30.0 requires google-resumable-media<2.0dev,>=0.6.0, but you'll have google-resumable-media 0.5.1 which is incompatible.
Successfully installed google-cloud-bigquery-1.25.0 google-resumable-media-0.5.1


**Note**: Restart your kernel to use updated packages.

You can safely ignore the dependency-resolver warning and the google-cloud-storage incompatibility error shown above.

Import necessary libraries.

In [None]:
import os
from google.cloud import bigquery

## Set environment variables

Set environment variables so that they are available throughout the lab. The bucket name defaults to the project ID, so you only need to change the project and region.

In [None]:
%%bash
export PROJECT=$(gcloud config list project --format "value(core.project)")
echo "Your current GCP Project Name is: "$PROJECT

In [None]:
# TODO: Change environment variables
PROJECT = "cloud-training-demos"  # REPLACE WITH YOUR PROJECT NAME
BUCKET = "BUCKET"  # REPLACE WITH YOUR BUCKET NAME, DEFAULT BUCKET WILL BE PROJECT ID
REGION = "us-central1"  # REPLACE WITH YOUR BUCKET REGION e.g. us-central1

# Do not change these
os.environ["PROJECT"] = PROJECT  # later %%bash cells read $PROJECT
os.environ["BUCKET"] = PROJECT if BUCKET == "BUCKET" else BUCKET  # DEFAULT BUCKET WILL BE PROJECT ID
os.environ["REGION"] = REGION

if PROJECT == "cloud-training-demos":
    print("Don't forget to update your PROJECT name! Currently:", PROJECT)
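
The default-bucket rule in the cell above can be isolated into a small helper, which makes it easy to check before any resource is created. This is only a sketch: `resolve_bucket` is a hypothetical name that is not used elsewhere in the lab, and the placeholder string mirrors the one in the cell.

```python
def resolve_bucket(project: str, bucket: str = "BUCKET") -> str:
    """Return the bucket name, falling back to the project ID.

    Hypothetical helper: the literal placeholder "BUCKET" means
    "use the project ID as the bucket name", as in the cell above.
    """
    return project if bucket == "BUCKET" else bucket


print(resolve_bucket("my-project"))               # placeholder left unchanged
print(resolve_bucket("my-project", "my-bucket"))  # explicit bucket name
```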

## Create a BigQuery Dataset and Google Cloud Storage Bucket

A BigQuery dataset is a container for tables, views, and models built with BigQuery ML. Let's create one called **bqml_tutorial**. We'll do the same for a GCS bucket for our project too.

In [None]:
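%%bash

# Create a BigQuery dataset named bqml_tutorial
bq --location=US mk --dataset \
    --description "bqml_tutorial" \
    $PROJECT:bqml_tutorial
echo "Here are your current datasets:"
bq ls

# Create the Cloud Storage bucket if it does not exist yet
gcloud storage buckets describe gs://$BUCKET > /dev/null 2>&1 || \
    gcloud storage buckets create gs://$BUCKET --location=$REGION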
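
The dataset can also be created from Python with the `google-cloud-bigquery` client that this notebook installs. The following is a minimal sketch rather than part of the lab: `dataset_id` and `create_dataset` are hypothetical helper names, and calling `create_dataset` requires valid credentials for your project.

```python
def dataset_id(project: str, dataset: str = "bqml_tutorial") -> str:
    """Build the fully qualified dataset ID, e.g. "my-project.bqml_tutorial"."""
    return f"{project}.{dataset}"


def create_dataset(project: str, dataset: str = "bqml_tutorial", location: str = "US"):
    """Create the dataset if it does not already exist (needs credentials)."""
    # Imported lazily so dataset_id() can be used without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    ds = bigquery.Dataset(dataset_id(project, dataset))
    ds.location = location
    # exists_ok=True turns the call into a no-op when the dataset is present.
    return client.create_dataset(ds, exists_ok=True)


print(dataset_id("my-project"))
```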

## Train and deploy a logistic regression model

**Train the model**

Train a logistic regression model that predicts iris type using the BigQuery ML [CREATE MODEL](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#create_model_statement) statement. This training job should take approximately 1 minute to complete.

In [None]:
%%bash

bq query --use_legacy_sql=false \
  'CREATE MODEL `bqml_tutorial.iris_model`
  OPTIONS (model_type="logistic_reg",
      max_iterations=10, input_label_cols=["species"])
  AS SELECT
    *
  FROM
    `bigquery-public-data.ml_datasets.iris`;'
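
If you prefer to issue the statement from Python, one option is to build the SQL string in a helper and pass it to the BigQuery client. The sketch below only assembles the statement; `build_create_model_sql` is a hypothetical name, and the commented-out call at the end assumes authenticated access to your project.

```python
def build_create_model_sql(model: str, model_type: str, label: str,
                           max_iterations: int = 10) -> str:
    """Assemble a CREATE MODEL statement over the public iris table."""
    return (
        f'CREATE MODEL `{model}`\n'
        f'OPTIONS (model_type="{model_type}",\n'
        f'    max_iterations={max_iterations}, input_label_cols=["{label}"])\n'
        'AS SELECT *\n'
        'FROM `bigquery-public-data.ml_datasets.iris`'
    )


sql = build_create_model_sql("bqml_tutorial.iris_model", "logistic_reg", "species")
print(sql)

# To run it (requires credentials):
# from google.cloud import bigquery
# bigquery.Client().query(sql).result()
```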


**Export the model**

Export the model to a Cloud Storage bucket using the [bq command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool). For additional ways to export models, see [Exporting BigQuery ML models](https://cloud.google.com/bigquery-ml/docs/exporting-models#exporting_models). This extract job should take less than 1 minute to complete.

In [None]:
%%bash
bq extract -m bqml_tutorial.iris_model gs://$BUCKET/iris_model

**Local deployment and serving**

You can deploy exported TensorFlow models using the TensorFlow Serving Docker container. The following steps require you to install [Docker](https://hub.docker.com/search/?type=edition&offering=community).

**Download the exported model files to a temporary directory**

In [None]:
%%bash
mkdir tmp_dir
gcloud storage cp --recursive gs://$BUCKET/iris_model tmp_dir

**Create a version subdirectory**

This step sets a version number (1 in this case) for the model.

In [None]:
%%bash
mkdir -p serving_dir/iris_model/1
cp -r tmp_dir/iris_model/* serving_dir/iris_model/1
rm -r tmp_dir
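
TensorFlow Serving looks for numbered version subdirectories under the model directory, so it can help to verify the layout before starting the container. The following sketch assumes the export is a TensorFlow SavedModel whose version directory contains a `saved_model.pb` file; `servable_versions` is a hypothetical helper name.

```python
from pathlib import Path


def servable_versions(model_dir):
    """Return the numeric version subdirectories that hold a SavedModel.

    Assumption: each version directory should contain saved_model.pb.
    """
    versions = []
    for child in Path(model_dir).iterdir():
        has_model = (child / "saved_model.pb").is_file()
        if child.is_dir() and child.name.isdigit() and has_model:
            versions.append(int(child.name))
    return sorted(versions)


# Only inspect the directory if the previous cells have been run.
if Path("serving_dir/iris_model").is_dir():
    print("Servable versions:", servable_versions("serving_dir/iris_model"))
```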

**Pull the docker image**

In [None]:
%%bash
docker pull tensorflow/serving

**Run the Docker container**


In [None]:
%%bash
# Publish port 8501, the TensorFlow Serving REST API port that the prediction request below uses.
docker run -p 8501:8501 --mount type=bind,source=`pwd`/serving_dir/iris_model,target=/models/iris_model -e MODEL_NAME=iris_model -t tensorflow/serving &

**Run the prediction**

In [None]:
%%bash
curl -d '{"instances": [{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}]}' -X POST http://localhost:8501/v1/models/iris_model:predict
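
The same request can be sent from Python with only the standard library. This is a sketch under the assumptions of the `curl` command above (container reachable at `localhost:8501`, model named `iris_model`); `build_predict_request` and `predict` are hypothetical helper names, and `predict` only works while the container is running.

```python
import json
import urllib.request


def build_predict_request(instances, model="iris_model", host="http://localhost:8501"):
    """Build the POST request for the TensorFlow Serving REST predict endpoint."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/v1/models/{model}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, **kwargs):
    """Send the request and return the "predictions" list from the response."""
    with urllib.request.urlopen(build_predict_request(instances, **kwargs)) as resp:
        return json.load(resp)["predictions"]


request = build_predict_request(
    [{"sepal_length": 5.0, "sepal_width": 2.0, "petal_length": 3.5, "petal_width": 1.0}]
)
print(request.full_url)
```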

**Online deployment and serving**

This section uses the [gcloud command-line tool](https://cloud.google.com/sdk/gcloud) to deploy and run predictions against the exported model. For more details about deploying a model to AI Platform for online/batch predictions, see [Deploying models](https://cloud.google.com/ai-platform/prediction/docs/deploying-models).


**Note: Run the following commands in Cloud Shell in the Cloud Platform Console, up to and including the Run predict command. To open Cloud Shell, click the Activate Cloud Shell icon and then click Continue.**

**Create a model resource**

In [None]:
MODEL_NAME="IRIS_MODEL"
gcloud ai-platform models create $MODEL_NAME

**Create a model version**

Set the environment variables

In [None]:
# Replace the BUCKET_NAME with your bucket name.
MODEL_DIR="gs://<BUCKET_NAME>/iris_model"
VERSION_NAME="v1"
FRAMEWORK="TENSORFLOW"

Create the version

In [None]:
gcloud ai-platform versions create $VERSION_NAME --model=$MODEL_NAME --origin=$MODEL_DIR --runtime-version=2.1 --framework=$FRAMEWORK

This step might take a few minutes to complete. You should see the message `Creating version (this might take a few minutes)......`.

Get information about your new version.

In [None]:
gcloud ai-platform versions describe $VERSION_NAME --model $MODEL_NAME

**Online prediction**

For details about running online predictions against a deployed model, see [Getting online predictions](https://cloud.google.com/ai-platform/prediction/docs/online-predict#requesting_predictions).

Create a newline-delimited JSON file for inputs, for example an **instances.json** file with the following content.

In [None]:
{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}
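
A newline-delimited JSON file holds one complete JSON object per line, with no enclosing list. As an illustration, the sketch below writes the two instances in that form; `write_json_lines` is a hypothetical helper name, and the demo writes to a temporary directory rather than to the working directory.

```python
import json
import os
import tempfile


def write_json_lines(path, rows):
    """Write one JSON object per line (newline-delimited JSON)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


rows = [
    {"sepal_length": 5.0, "sepal_width": 2.0, "petal_length": 3.5, "petal_width": 1.0},
    {"sepal_length": 5.3, "sepal_width": 3.7, "petal_length": 1.5, "petal_width": 0.2},
]

# Demo location; in Cloud Shell you would write "instances.json" directly.
demo_path = os.path.join(tempfile.mkdtemp(), "instances.json")
write_json_lines(demo_path, rows)
print(open(demo_path, encoding="utf-8").read())
```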

Set up the environment variable for the predict command

In [None]:
INPUT_DATA_FILE="instances.json"

Run predict

In [None]:
gcloud ai-platform predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_DATA_FILE

## Train and deploy a Boosted Tree classifier model

**Train the model**

Train a Boosted Tree classifier model that predicts iris type using the BigQuery ML [CREATE MODEL](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#create_model_statement) statement. This training job should take approximately 7 minutes to complete.

In [None]:
%%bash

bq query --use_legacy_sql=false \
 'CREATE MODEL `bqml_tutorial.boosted_tree_iris_model`
 OPTIONS (model_type="boosted_tree_classifier",
 max_iterations=10, input_label_cols=["species"])
 AS SELECT
 *
 FROM
 `bigquery-public-data.ml_datasets.iris`;'


**Export the model**

Export the model to a Cloud Storage bucket using the [bq command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool). For additional ways to export models, see [Exporting BigQuery ML models](https://cloud.google.com/bigquery-ml/docs/exporting-models#exporting_models).

In [None]:
%%bash
bq extract --destination_format ML_XGBOOST_BOOSTER -m bqml_tutorial.boosted_tree_iris_model gs://$BUCKET/boosted_tree_iris_model

**Local deployment and serving**

The exported files include a main.py file that you can use to run predictions locally.

**Download the exported model files to a local directory**

In [None]:
%%bash
mkdir -p serving_dir
gcloud storage cp --recursive gs://$BUCKET/boosted_tree_iris_model serving_dir


**Extract predictor.py**

In [None]:
%%bash
tar -xvf serving_dir/boosted_tree_iris_model/xgboost_predictor-0.1.tar.gz -C serving_dir/boosted_tree_iris_model/

**Install XGBoost library**

Install the [XGBoost library](https://xgboost.readthedocs.io/en/latest/build.html), version 0.82 or later.

In [None]:
%%bash
pip3 install xgboost

**Run the prediction**

In [None]:
%%bash
cd serving_dir/boosted_tree_iris_model/
python main.py '[{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}]'
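
When the instances come from Python data rather than a hand-written string, serializing them with `json.dumps` avoids shell-quoting mistakes. The sketch below assumes, as in the command above, that main.py takes a single JSON list argument; `main_py_command` is a hypothetical helper name, and the commented-out `subprocess.run` call only works after the export has been downloaded and extracted.

```python
import json
import sys


def main_py_command(instances, python=sys.executable):
    """Build the argument list for the exported main.py script."""
    # json.dumps produces the JSON list that main.py receives as argv[1].
    return [python, "main.py", json.dumps(instances)]


cmd = main_py_command(
    [{"sepal_length": 5.0, "sepal_width": 2.0, "petal_length": 3.5, "petal_width": 1.0}]
)
print(cmd[1:])

# To run it from the notebook:
# import subprocess
# subprocess.run(cmd, cwd="serving_dir/boosted_tree_iris_model", check=True)
```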

**Online deployment and serving**

This section uses the [gcloud command-line tool](https://cloud.google.com/sdk/gcloud) to deploy and run predictions against the exported model.

For more details about deploying a model to AI Platform for online/batch predictions using custom routines, see [Deploying models](https://cloud.google.com/ai-platform/prediction/docs/deploying-models).


**Note: Run the following commands in Cloud Shell in the Cloud Platform Console, up to and including the Run predict command.**

**Create a model resource**

In [None]:
MODEL_NAME="BOOSTED_TREE_IRIS_MODEL"
gcloud ai-platform models create $MODEL_NAME

**Create a model version**

Set the environment variables

In [None]:
# Replace the BUCKET_NAME with your bucket name.
MODEL_DIR="gs://<BUCKET_NAME>/boosted_tree_iris_model"
VERSION_NAME="v1"

Create the version

In [None]:
gcloud beta ai-platform versions create $VERSION_NAME --model=$MODEL_NAME --origin=$MODEL_DIR --package-uris=${MODEL_DIR}/xgboost_predictor-0.1.tar.gz --prediction-class=predictor.Predictor --runtime-version=2.1

This step might take a few minutes to complete. You should see the message `Creating version (this might take a few minutes)......`.

Get information about your new version.

In [None]:
gcloud ai-platform versions describe $VERSION_NAME --model $MODEL_NAME

**Online prediction**

For more details about running online predictions against a deployed model, see [Requesting predictions](https://cloud.google.com/ai-platform/prediction/docs/online-predict#requesting_predictions).

Create a newline-delimited JSON file for inputs, for example an **instances.json** file with the following content. Skip this step if you already created the file.

In [None]:
{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0}
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}

Set up the environment variable for the predict command

In [None]:
INPUT_DATA_FILE="instances.json"

Run predict

In [None]:
gcloud ai-platform predict --model $MODEL_NAME --version $VERSION_NAME --json-instances $INPUT_DATA_FILE

## Train and deploy an AutoML classifier model

**Train the model**

Train an AutoML classifier model that predicts iris type using the BigQuery ML [CREATE MODEL](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create#create_model_statement) statement. AutoML models need at least 1000 rows of input data. Because ml_datasets.iris only has 150 rows, we duplicate the data 10 times. **This training job should take around 2 hours to complete**.

In [None]:
%%bash

bq query --use_legacy_sql=false \
  'CREATE MODEL `bqml_tutorial.automl_iris_model`
  OPTIONS (model_type="automl_classifier",
      budget_hours=1, input_label_cols=["species"])
  AS SELECT
    * EXCEPT(multiplier)
  FROM
    `bigquery-public-data.ml_datasets.iris`, unnest(GENERATE_ARRAY(1, 10)) as multiplier;'
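
The comma join with `unnest(GENERATE_ARRAY(1, 10))` pairs every iris row with each of the ten array elements, which is how 150 rows become 1,500. The following sketch reproduces that arithmetic locally with stand-in rows (the row labels are placeholders, not real data).

```python
from itertools import product

BASE_ROWS = 150              # rows in ml_datasets.iris, as stated above
multipliers = range(1, 11)   # GENERATE_ARRAY(1, 10) is inclusive: 1..10

rows = [f"row{i}" for i in range(BASE_ROWS)]  # stand-in for the real rows

# A cross join yields every (row, multiplier) pair; EXCEPT(multiplier)
# then drops the extra column, leaving ten copies of each row.
duplicated = [row for row, _ in product(rows, multipliers)]

print(len(duplicated))  # 1500
```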


**Export the model**

Export the model to a Cloud Storage bucket using the [bq command-line tool](https://cloud.google.com/bigquery/docs/bq-command-line-tool). For additional ways to export models, see [Exporting BigQuery ML models](https://cloud.google.com/bigquery-ml/docs/exporting-models#exporting_models).

In [None]:
%%bash
bq extract -m bqml_tutorial.automl_iris_model gs://$BUCKET/automl_iris_model

**Local deployment and serving**

For details about building AutoML containers, see [Exporting models](https://cloud.google.com/automl-tables/docs/model-export). The following steps require you to install [Docker](https://hub.docker.com/search/?type=edition&offering=community).

**Copy exported model files to a local directory**

In [None]:
%%bash
mkdir automl_serving_dir
gcloud storage cp --recursive gs://$BUCKET/automl_iris_model/* automl_serving_dir/

**Pull AutoML Docker image**


In [None]:
%%bash
docker pull gcr.io/cloud-automl-tables-public/model_server

**Start Docker container**

In [None]:
%%bash
# Run detached (-d): a notebook cell has no TTY, so -it would fail.
docker run -d -v `pwd`/automl_serving_dir:/models/default/0000001 -p 8080:8080 gcr.io/cloud-automl-tables-public/model_server

**Run the prediction**

Create a JSON file that holds the request body, for example an **input.json** file with the following contents:

In [None]:
{"instances": [{"sepal_length":5.0, "sepal_width":2.0, "petal_length":3.5, "petal_width":1.0},
{"sepal_length":5.3, "sepal_width":3.7, "petal_length":1.5, "petal_width":0.2}]}

Make the predict call


In [None]:
%%bash
curl -X POST --data @input.json http://localhost:8080/predict
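
Unlike the newline-delimited files used with `gcloud ai-platform predict`, the file posted here must be a single JSON document with an `instances` list. A small check before sending the request can catch the wrong format early; the sketch below is one way to do that, and `load_request_body` is a hypothetical helper name.

```python
import json


def load_request_body(path):
    """Load a request file and check that it has the expected shape."""
    with open(path, encoding="utf-8") as f:
        body = json.load(f)  # fails if the file is not one JSON document
    instances = body.get("instances") if isinstance(body, dict) else None
    if not isinstance(instances, list) or not instances:
        raise ValueError('expected a JSON object with a non-empty "instances" list')
    return body
```

Calling `load_request_body("input.json")` returns the parsed body, or raises `ValueError` (of which `json.JSONDecodeError` is a subclass) if the file is newline-delimited or lacks the `instances` list.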

**Online deployment and serving**

Online prediction for AutoML regressor and AutoML classifier models is not supported in AI Platform.

Copyright 2020 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License