# Part 5: Deploy the solution to AI Platform Prediction

This notebook is the fifth of five notebooks that guide you through running the [Real-time Item-to-item Recommendation with BigQuery ML Matrix Factorization and ScaNN](https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/tree/master/retail/recommendation-system/bqml-scann) solution.

Use this notebook to complete the following tasks:

1. Deploy the embedding lookup model to AI Platform Prediction. 
2. Deploy the ScaNN matching service to AI Platform Prediction by using a custom container. The ScaNN matching service is an application that wraps the ANN index model and provides additional functionality, like mapping item IDs to item embeddings.
3. Optionally, export and deploy the matrix factorization model to AI Platform Prediction for exact matching.

Before starting this notebook, you must run the [04_build_embeddings_scann](04_build_embeddings_scann.ipynb) notebook to build an approximate nearest neighbor (ANN) index for the item embeddings.


## Setup

Import the required libraries, configure the environment variables, and authenticate your GCP account.

### Import libraries

In [None]:
import numpy as np
import tensorflow as tf

### Configure GCP environment settings

Update the following variables to reflect the values for your GCP environment:

+ `PROJECT_ID`: The ID of the Google Cloud project you are using to implement this solution.
+ `PROJECT_NUMBER`: The number of the Google Cloud project you are using to implement this solution. You can find this in the **Project info** card on the [project dashboard page](https://console.cloud.google.com/home/dashboard).
+ `BUCKET`: The name of the Cloud Storage bucket you created to use with this solution. The `BUCKET` value should be just the bucket name, so `myBucket` rather than `gs://myBucket`.
+ `REGION`: The region to use for the AI Platform Prediction job.

In [None]:
PROJECT_ID = 'yourProject' # Change to your project.
PROJECT_NUMBER = 'yourProjectNumber' # Change to your project number
BUCKET = 'yourBucketName' # Change to the bucket you created.
REGION = 'yourPredictionRegion' # Change to your AI Platform Prediction region.
ARTIFACTS_REPOSITORY_NAME = 'ml-serving'

EMBEDDNIG_LOOKUP_MODEL_OUTPUT_DIR = f'gs://{BUCKET}/bqml/embedding_lookup_model'
EMBEDDNIG_LOOKUP_MODEL_NAME = 'item_embedding_lookup'
EMBEDDNIG_LOOKUP_MODEL_VERSION = 'v1'

INDEX_DIR = f'gs://{BUCKET}/bqml/scann_index'
SCANN_MODEL_NAME = 'index_server'
SCANN_MODEL_VERSION = 'v1'

KIND = 'song'

!gcloud config set project $PROJECT_ID
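
Because `BUCKET` must be a bare bucket name rather than a `gs://` URI, a small guard can catch the most common mistake before any paths are built. The following is a minimal sketch; `normalize_bucket_name` is a hypothetical helper that is not part of the original notebook.

```python
def normalize_bucket_name(value: str) -> str:
    """Return a bare bucket name from 'myBucket' or 'gs://myBucket/'."""
    name = value.strip()
    # Drop the URI scheme if the caller passed a full gs:// path.
    if name.startswith('gs://'):
        name = name[len('gs://'):]
    name = name.strip('/')
    # Anything with a remaining slash is an object path, not a bucket name.
    if not name or '/' in name:
        raise ValueError(f'Expected a bucket name, got: {value!r}')
    return name


print(normalize_bucket_name('gs://myBucket'))  # myBucket
```

You could apply it as `BUCKET = normalize_bucket_name(BUCKET)` before the `gs://{BUCKET}/...` paths are defined.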

### Authenticate your GCP account
This is required if you run the notebook in Colab. If you use an AI Platform notebook, you should already be authenticated.

In [None]:
try:
  from google.colab import auth
  auth.authenticate_user()
  print("Colab user is authenticated.")
except ImportError:
  # Not running in Colab; assume the environment is already authenticated.
  pass

## Deploy the embedding lookup model to AI Platform Prediction

Create the embedding lookup model resource in AI Platform:

In [None]:
!gcloud ai-platform models create {EMBEDDNIG_LOOKUP_MODEL_NAME} --region={REGION}

Next, deploy the model:

In [None]:
!gcloud ai-platform versions create {EMBEDDNIG_LOOKUP_MODEL_VERSION} \
  --region={REGION} \
  --model={EMBEDDNIG_LOOKUP_MODEL_NAME} \
  --origin={EMBEDDNIG_LOOKUP_MODEL_OUTPUT_DIR} \
  --runtime-version=2.2 \
  --framework=TensorFlow \
  --python-version=3.7 \
  --machine-type=n1-standard-2

print("The model version is deployed to AI Platform Prediction.")

Once the model is deployed, you can verify it in the [AI Platform console](https://console.cloud.google.com/ai-platform/models).

### Test the deployed embedding lookup AI Platform Prediction model

Set the AI Platform Prediction API information:

In [None]:
import googleapiclient.discovery
from google.api_core.client_options import ClientOptions

api_endpoint = f'https://{REGION}-ml.googleapis.com'
client_options = ClientOptions(api_endpoint=api_endpoint)
service = googleapiclient.discovery.build(
    serviceName='ml', version='v1', client_options=client_options)

Define the `caip_embedding_lookup` method to retrieve item embeddings. This method accepts item IDs, calls the embedding lookup model in AI Platform Prediction, and returns the corresponding embedding vectors.


In [None]:
def caip_embedding_lookup(input_items):
  request_body = {'instances': input_items}
  service_name = f'projects/{PROJECT_ID}/models/{EMBEDDNIG_LOOKUP_MODEL_NAME}/versions/{EMBEDDNIG_LOOKUP_MODEL_VERSION}'
  print(f'Calling : {service_name}')
  response = service.projects().predict(
    name=service_name, body=request_body).execute()

  if 'error' in response:
    raise RuntimeError(response['error'])

  return response['predictions']
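
The request and response handling in `caip_embedding_lookup` can be separated from the network call so that it is easy to check offline. The following is a rough sketch: `build_predict_call` and `extract_predictions` are hypothetical helper names, and the sample response is made up for illustration rather than taken from a real deployment.

```python
def build_predict_call(project_id, model_name, model_version, instances):
    """Return the resource name and request body for a predict call."""
    name = f'projects/{project_id}/models/{model_name}/versions/{model_version}'
    body = {'instances': list(instances)}
    return name, body


def extract_predictions(response):
    """Raise on an error payload, otherwise return the predictions list."""
    if 'error' in response:
        raise RuntimeError(response['error'])
    return response['predictions']


# Hypothetical project and a fabricated response, for illustration only.
name, body = build_predict_call(
    'my-project', 'item_embedding_lookup', 'v1', ['2114406'])
fake_response = {'predictions': [[0.1, 0.2, 0.3]]}
print(name)
print(len(extract_predictions(fake_response)))
```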

Test the `caip_embedding_lookup` method with three instances: a single item ID, two space-separated item IDs, and the arbitrary string `abc123`:

In [None]:
input_items = ['2114406', '2114402 2120788', 'abc123']

embeddings = caip_embedding_lookup(input_items)
print(f'Embeddings retrieved: {len(embeddings)}')
for idx, embedding in enumerate(embeddings):
  print(f'{input_items[idx]}: {embedding[:5]}')

## ScaNN matching service

The ScaNN matching service performs the following steps:

1. Receives one or more item IDs from the client.
1. Calls the embedding lookup model to fetch the embedding vectors of those item IDs.
1. Uses these embedding vectors to query the ANN index to find approximate nearest neighbor embedding vectors.
1. Maps the approximate nearest neighbors embedding vectors to their corresponding item IDs.
1. Sends the item IDs back to the client.
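
The steps above can be sketched with in-memory data. This is an illustration only, under several assumptions: the embeddings are made-up two-dimensional vectors, multiple query IDs are combined by averaging (the source does not say how the lookup model combines them), and an exact dot-product scan stands in for the ScaNN index. All function names are hypothetical.

```python
# Made-up embeddings; the real ones come from the embedding lookup model.
ITEM_EMBEDDINGS = {
    'a': [1.0, 0.0],
    'b': [0.9, 0.1],
    'c': [0.0, 1.0],
    'd': [-1.0, 0.0],
}


def lookup_embedding(item_ids):
    """Steps 1-2: fetch vectors and average them (averaging is an assumption)."""
    vectors = [ITEM_EMBEDDINGS[item_id] for item_id in item_ids]
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def match(query_vector, num_matches):
    """Steps 3-4: exact dot-product scan standing in for the ANN index."""
    scored = [
        (sum(q * x for q, x in zip(query_vector, vector)), item_id)
        for item_id, vector in ITEM_EMBEDDINGS.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored[:num_matches]]


def recommend(item_ids, num_matches):
    """Step 5: return the matching item IDs to the caller."""
    return match(lookup_embedding(item_ids), num_matches)


print(recommend(['a'], 2))  # ['a', 'b']
```

Note that the query item itself appears among its own matches in this sketch, because nothing filters it out.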

When the client receives the item IDs of the matches, the song title and artist information is fetched from Datastore in real time and displayed in the client application.

Note: In practice, recommendation systems combine matches (from one or more indices) with user-provided filtering clauses (such as `price <= value AND colour = red`), as well as other item metadata (like item categories, popularity, and recency), to ensure recommendation freshness and diversity. In addition, ranking is commonly applied after generating the matches to decide the order in which they are served to the user.

### ScaNN matching service implementation

The ScaNN matching service is implemented as a [Flask](https://flask.palletsprojects.com/en/1.1.x/quickstart/) application that runs on a [gunicorn](https://gunicorn.org/) web server. This application is implemented in the [main.py](index_server/main.py) module.

The ScaNN matching service application works as follows:

1. Uses environment variables to set configuration information, such as the Google Cloud location of the ScaNN index to load.
1. Loads the ScaNN index when the `ScaNNMatcher` object is initialized.
1. As [required by AI Platform Prediction](https://cloud.google.com/ai-platform/prediction/docs/custom-container-requirements), exposes two HTTP endpoints:
    
    + `health`: a `GET` method to which AI Platform Prediction sends health checks.
    + `predict`: a `POST` method to which AI Platform Prediction forwards prediction requests.

    The `predict` method expects JSON requests in the form `{"instances":[{"query": "item123", "show": 10}]}`, where `query` represents the item ID to retrieve matches for, and `show` represents the number of matches to retrieve.
    
    The `predict` method works as follows:

    1. Validates the received request object.
    1. Extracts the `query` and `show` values from the request object.
    1. Calls `embedding_lookup.lookup` with the given query item ID to get its embedding vector from the embedding lookup model.
    1. Calls `scann_matcher.match` with the query item embedding vector to retrieve its approximate nearest neighbor item IDs from the ANN index.
    1. Formats the list of matching item IDs as JSON and returns it as the response.
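
The validation and extraction steps of `predict` might look roughly like the following. This is a sketch of the idea and not the code in `main.py`: the function name `parse_predict_request`, the error messages, and the default of 10 matches when `show` is omitted are all assumptions.

```python
DEFAULT_SHOW = 10  # Assumed default; the actual service may differ.


def parse_predict_request(payload):
    """Validate a predict payload and return a list of (query, show) pairs."""
    if not isinstance(payload, dict) or 'instances' not in payload:
        raise ValueError('Request must be a JSON object with "instances".')
    instances = payload['instances']
    if not isinstance(instances, list) or not instances:
        raise ValueError('"instances" must be a non-empty list.')

    parsed = []
    for instance in instances:
        if not isinstance(instance, dict):
            raise ValueError('Each instance must be a JSON object.')
        query = instance.get('query')
        if not isinstance(query, str) or not query.strip():
            raise ValueError('"query" must be a non-empty string.')
        show = instance.get('show', DEFAULT_SHOW)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(show, bool) or not isinstance(show, int) or show <= 0:
            raise ValueError('"show" must be a positive integer.')
        parsed.append((query, show))
    return parsed


print(parse_predict_request({'instances': [{'query': 'item123', 'show': 10}]}))
# [('item123', 10)]
```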

## Deploy the ScaNN matching service to AI Platform Prediction

Package the ScaNN matching service application in a custom container and deploy it to AI Platform Prediction.

### Create an Artifact Registry repository for the Docker container image

In [None]:
!gcloud beta artifacts repositories create {ARTIFACTS_REPOSITORY_NAME} \
  --location={REGION} \
  --repository-format=docker

In [None]:
!gcloud beta auth configure-docker {REGION}-docker.pkg.dev --quiet

### Use Cloud Build to build the Docker container image

The container runs the gunicorn HTTP web server and executes the Flask [app](https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/315040032df26d7cef3a26e5def35ca50dd559d6/retail/recommendation-system/bqml-scann/index_server/main.py#L35) variable defined in the `main.py` module.

The container image to deploy to AI Platform Prediction is defined in a [Dockerfile](index_server/Dockerfile), as shown in the following code snippet:

```
FROM python:3.8-slim

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . ./

ARG PORT
ENV PORT=$PORT

CMD exec gunicorn --bind :$PORT main:app  --workers=1 --threads 8 --timeout 1800
```

Build the container image by using Cloud Build and specifying the [cloudbuild.yaml](index_server/cloudbuild.yaml) file:


In [None]:
IMAGE_URL = f'{REGION}-docker.pkg.dev/{PROJECT_ID}/{ARTIFACTS_REPOSITORY_NAME}/{SCANN_MODEL_NAME}:{SCANN_MODEL_VERSION}'
PORT=5001

SUBSTITUTIONS = ''
SUBSTITUTIONS += f'_IMAGE_URL={IMAGE_URL},'
SUBSTITUTIONS += f'_PORT={PORT}'

!gcloud builds submit --config=index_server/cloudbuild.yaml \
  --substitutions={SUBSTITUTIONS} \
  --timeout=1h
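
The image URL and the `--substitutions` value are plain strings, so they can be assembled and checked before the build is submitted. The sketch below uses hypothetical helper names and a made-up project ID. It relies on the Cloud Build rule that user-defined substitution keys start with an underscore and contain only uppercase letters, digits, and underscores, and it rejects commas in values because the flag is a comma-separated list.

```python
import re


def artifact_image_url(region, project_id, repository, image, tag):
    """Compose an Artifact Registry Docker image URL."""
    return f'{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}'


def format_substitutions(substitutions):
    """Join substitutions into the KEY=VALUE,KEY=VALUE form used by gcloud."""
    parts = []
    for key, value in substitutions.items():
        if not re.fullmatch(r'_[A-Z0-9_]+', key):
            raise ValueError(f'Invalid user-defined substitution key: {key}')
        if ',' in str(value):
            raise ValueError(f'Value for {key} must not contain a comma.')
        parts.append(f'{key}={value}')
    return ','.join(parts)


# 'my-project' and 'us-central1' are placeholder values.
url = artifact_image_url(
    'us-central1', 'my-project', 'ml-serving', 'index_server', 'v1')
print(format_substitutions({'_IMAGE_URL': url, '_PORT': 5001}))
```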

Run the following command to verify the container image has been built:


In [None]:
repository_id = f'{REGION}-docker.pkg.dev/{PROJECT_ID}/{ARTIFACTS_REPOSITORY_NAME}'

!gcloud beta artifacts docker images list {repository_id}

### Create a service account for AI Platform Prediction

Create a service account to run the custom container. This [is required](https://cloud.google.com/ai-platform/prediction/docs/custom-service-account#container-default) in cases where you want to grant specific permissions to the service account.

In [None]:
SERVICE_ACCOUNT_NAME = 'caip-serving'
SERVICE_ACCOUNT_EMAIL = f'{SERVICE_ACCOUNT_NAME}@{PROJECT_ID}.iam.gserviceaccount.com'
!gcloud iam service-accounts create {SERVICE_ACCOUNT_NAME} \
  --description="Service account for AI Platform Prediction to access cloud resources." 

Grant the `Cloud ML Engine (AI Platform)` service account the `iam.serviceAccountAdmin` role, and grant the `caip-serving` service account the roles that the ScaNN matching service requires: `storage.objectViewer` and `ml.developer`. The first cell prints the project number so that you can confirm the `PROJECT_NUMBER` value you set earlier.

In [None]:
!gcloud projects describe {PROJECT_ID} --format="value(projectNumber)"

In [None]:
!gcloud projects add-iam-policy-binding {PROJECT_ID} \
  --role=roles/iam.serviceAccountAdmin \
  --member=serviceAccount:service-{PROJECT_NUMBER}@cloud-ml.google.com.iam.gserviceaccount.com

!gcloud projects add-iam-policy-binding {PROJECT_ID} \
  --role=roles/storage.objectViewer \
  --member=serviceAccount:{SERVICE_ACCOUNT_EMAIL}
    
!gcloud projects add-iam-policy-binding {PROJECT_ID} \
  --role=roles/ml.developer \
  --member=serviceAccount:{SERVICE_ACCOUNT_EMAIL}
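
To review which principal receives which role before running the commands, the three bindings can be expressed as data. This is only a sketch: `iam_bindings` and `render_commands` are hypothetical helpers, the project values are placeholders, and the rendered strings simply mirror the `gcloud` commands used in this notebook.

```python
def iam_bindings(project_id, project_number, service_account_name):
    """Return (role, member) pairs matching the bindings in this notebook."""
    caip_agent = (
        f'service-{project_number}@cloud-ml.google.com.iam.gserviceaccount.com')
    serving_account = (
        f'{service_account_name}@{project_id}.iam.gserviceaccount.com')
    return [
        ('roles/iam.serviceAccountAdmin', f'serviceAccount:{caip_agent}'),
        ('roles/storage.objectViewer', f'serviceAccount:{serving_account}'),
        ('roles/ml.developer', f'serviceAccount:{serving_account}'),
    ]


def render_commands(project_id, bindings):
    """Render one add-iam-policy-binding command per binding."""
    return [
        f'gcloud projects add-iam-policy-binding {project_id} '
        f'--role={role} --member={member}'
        for role, member in bindings
    ]


# Placeholder project ID and number.
for command in render_commands(
        'my-project', iam_bindings('my-project', '123456789', 'caip-serving')):
    print(command)
```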

### Deploy the custom container to AI Platform Prediction

Create the ANN index model resource in AI Platform:

In [None]:
!gcloud ai-platform models create {SCANN_MODEL_NAME} --region={REGION}

Deploy the custom container to AI Platform Prediction. Note that you use the `env-vars` parameter to pass environment variables to the Flask application in the container.

In [None]:
HEALTH_ROUTE=f'/v1/models/{SCANN_MODEL_NAME}/versions/{SCANN_MODEL_VERSION}'
PREDICT_ROUTE=f'/v1/models/{SCANN_MODEL_NAME}/versions/{SCANN_MODEL_VERSION}:predict'

ENV_VARIABLES = f'PROJECT_ID={PROJECT_ID},'
ENV_VARIABLES += f'REGION={REGION},'
ENV_VARIABLES += f'INDEX_DIR={INDEX_DIR},'
ENV_VARIABLES += f'EMBEDDNIG_LOOKUP_MODEL_NAME={EMBEDDNIG_LOOKUP_MODEL_NAME},'
ENV_VARIABLES += f'EMBEDDNIG_LOOKUP_MODEL_VERSION={EMBEDDNIG_LOOKUP_MODEL_VERSION}'

!gcloud beta ai-platform versions create {SCANN_MODEL_VERSION} \
  --region={REGION} \
  --model={SCANN_MODEL_NAME} \
  --image={IMAGE_URL} \
  --ports={PORT} \
  --predict-route={PREDICT_ROUTE} \
  --health-route={HEALTH_ROUTE} \
  --machine-type=n1-standard-4 \
  --env-vars={ENV_VARIABLES} \
  --service-account={SERVICE_ACCOUNT_EMAIL}

print("The model version is deployed to AI Platform Prediction.")
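
On the container side, the application needs to read the values passed through `env-vars`. The sketch below shows one way to do that with an early check for missing values. The variable names are the ones set in the deployment cell (including the original `EMBEDDNIG` spelling), but `load_config` is a hypothetical function; how `main.py` actually reads its configuration is not shown in this notebook.

```python
import os

# Names taken from the deployment cell; the loader itself is an assumption.
REQUIRED_ENV_VARS = (
    'PROJECT_ID',
    'REGION',
    'INDEX_DIR',
    'EMBEDDNIG_LOOKUP_MODEL_NAME',
    'EMBEDDNIG_LOOKUP_MODEL_VERSION',
)


def load_config(environ=None):
    """Return the required settings, failing fast if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_ENV_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            'Missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in REQUIRED_ENV_VARS}


# Placeholder values passed as a plain dict instead of the real environment.
config = load_config({
    'PROJECT_ID': 'my-project',
    'REGION': 'us-central1',
    'INDEX_DIR': 'gs://my-bucket/bqml/scann_index',
    'EMBEDDNIG_LOOKUP_MODEL_NAME': 'item_embedding_lookup',
    'EMBEDDNIG_LOOKUP_MODEL_VERSION': 'v1',
})
print(config['INDEX_DIR'])
```

Failing at startup makes a misconfigured deployment visible through the health check instead of on the first prediction request.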

### Test the deployed ScaNN matching service

After deploying the custom container, test it by running the `caip_scann_match` method. This method accepts the parameter `query_items`, whose value is converted into a space-separated string of item IDs and treated as a single query. That is, a single embedding vector is retrieved from the embedding lookup model, and similar item IDs are retrieved from the ScaNN index given this embedding vector.

In [None]:
from google.cloud import datastore
import requests
client = datastore.Client(PROJECT_ID)

In [None]:
def caip_scann_match(query_items, show=10):
  request_body = {
      'instances': [{
          'query': ' '.join(query_items),
          'show': show
      }]
  }
  
  service_name = f'projects/{PROJECT_ID}/models/{SCANN_MODEL_NAME}/versions/{SCANN_MODEL_VERSION}'
  print(f'Calling: {service_name}')  
  response = service.projects().predict(
    name=service_name, body=request_body).execute()

  if 'error' in response:
    raise RuntimeError(response['error'])

  match_tokens = response['predictions']
  keys = [client.key(KIND, int(key)) for key in match_tokens]
  items = client.get_multi(keys)
  return items
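
One caveat with `caip_scann_match`: as far as I know, Datastore's `get_multi` omits entities that are not found and does not appear to guarantee that results come back in the order of the requested keys, so the similarity ranking may be lost. The sketch below restores the ranking. It uses plain dictionaries in place of Datastore entities, and `order_entities` is a hypothetical helper.

```python
def order_entities(match_ids, entities_by_id):
    """Return entities in match order, skipping IDs that were not found."""
    ordered = []
    for match_id in match_ids:
        entity = entities_by_id.get(int(match_id))
        if entity is not None:
            ordered.append(entity)
    return ordered


# Plain dicts stand in for Datastore entities keyed by integer ID.
fetched = {
    1: {'artist': 'Artist A', 'track_title': 'Song 1'},
    3: {'artist': 'Artist C', 'track_title': 'Song 3'},
}
for song in order_entities(['3', '1', '2'], fetched):
    print(f'- {song["artist"]}: {song["track_title"]}')
```

With real entities, the lookup dictionary could be built as `{entity.key.id: entity for entity in items}`.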


Call the `caip_scann_match` method with five item IDs and request five matching items for each:

In [None]:
songs = {
    '2120788': 'Limp Bizkit: My Way',
    '1086322': 'Jacques Brel: Ne Me Quitte Pas',
    '833391': 'Ricky Martin: Livin\' la Vida Loca',
    '1579481': 'Dr. Dre: The Next Episode',
    '2954929': 'Black Sabbath: Iron Man'
}

In [None]:
for item_Id, desc in songs.items():
  print(desc)
  print("==================")
  similar_items = caip_scann_match([item_Id], 5)
  for similar_item in similar_items:
    print(f'- {similar_item["artist"]}: {similar_item["track_title"]}')
  print()

## (Optional) Deploy the matrix factorization model to AI Platform Prediction

Optionally, you can deploy the matrix factorization model in order to perform exact item matching. The model takes an `item1_Id` value as input and outputs the top 50 recommended `item2_Id` values.

Exact matching scores every candidate item, so it returns more accurate results than approximate nearest neighbor matching but takes significantly longer. You might want to use exact item matching when you are working with a very small data set and latency isn't a primary concern.
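
One common way to quantify the accuracy gap between the two approaches is recall at k: the fraction of the exact top-k items that also appear in the approximate top-k. The following is a minimal sketch with made-up item IDs; `recall_at_k` is a hypothetical helper and is not part of this solution.

```python
def recall_at_k(exact_ids, approximate_ids, k):
    """Fraction of the exact top-k items found in the approximate top-k."""
    exact_top = set(exact_ids[:k])
    if not exact_top:
        return 0.0
    hits = sum(1 for item_id in approximate_ids[:k] if item_id in exact_top)
    return hits / len(exact_top)


# Made-up result lists, for illustration only.
exact = ['a', 'b', 'c', 'd']
approximate = ['a', 'c', 'e', 'b']
print(round(recall_at_k(exact, approximate, 3), 2))  # 0.67
```

If both services are deployed, the ID lists returned by the exact model and by the ScaNN matching service for the same query item could be compared this way.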

### Export the model from BigQuery ML to Cloud Storage as a SavedModel

In [None]:
BQ_DATASET_NAME = 'recommendations'
BQML_MODEL_NAME = 'item_matching_model'
BQML_MODEL_VERSION = 'v1' 
BQML_MODEL_OUTPUT_DIR = f'gs://{BUCKET}/bqml/item_matching_model'

!bq --quiet extract -m {BQ_DATASET_NAME}.{BQML_MODEL_NAME} {BQML_MODEL_OUTPUT_DIR}

In [None]:
!saved_model_cli show --dir {BQML_MODEL_OUTPUT_DIR} --tag_set serve --signature_def serving_default

### Deploy the exact matching model to AI Platform Prediction

In [None]:
!gcloud ai-platform models create {BQML_MODEL_NAME} --region={REGION}

In [None]:
!gcloud ai-platform versions create {BQML_MODEL_VERSION} \
  --region={REGION} \
  --model={BQML_MODEL_NAME} \
  --origin={BQML_MODEL_OUTPUT_DIR} \
  --runtime-version=2.2 \
  --framework=TensorFlow \
  --python-version=3.7 \
  --machine-type=n1-standard-2

print("The model version is deployed to AI Platform Prediction.")

In [None]:
def caip_bqml_matching(input_items, show):
  request_body = {'instances': input_items}
  service_name = f'projects/{PROJECT_ID}/models/{BQML_MODEL_NAME}/versions/{BQML_MODEL_VERSION}'
  print(f'Calling : {service_name}')
  response = service.projects().predict(
    name=service_name, body=request_body).execute()

  if 'error' in response:
    raise RuntimeError(response['error'])

  # Keep the top `show` predicted item IDs for the first instance.
  match_tokens = response['predictions'][0]["predicted_item2_Id"][:show]
  # Fetch the matching song entities from Datastore.
  keys = [client.key(KIND, int(key)) for key in match_tokens]
  items = client.get_multi(keys)
  return items

In [None]:
for item_Id, desc in songs.items():
  print(desc)
  print("==================")
  similar_items = caip_bqml_matching([int(item_Id)], 5)
  for similar_item in similar_items:
    print(f'- {similar_item["artist"]}: {similar_item["track_title"]}')
  print()

## License

Copyright 2020 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may obtain a copy of the License at: http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 

See the License for the specific language governing permissions and limitations under the License.

**This is not an official Google product but sample code provided for an educational purpose**