# Fraudfinder - Model Inference

## Overview

This series of labs is built on the [FraudFinder](https://github.com/googlecloudplatform/fraudfinder) repository, which builds an end-to-end real-time fraud detection system on Google Cloud. Throughout the FraudFinder labs, you will learn how to read historical bank transaction data stored in a data warehouse, read from a live stream of new transactions, perform exploratory data analysis (EDA), do feature engineering, ingest features into a feature store, train a model using the feature store, register your model in a model registry, evaluate your model, deploy your model to an endpoint, do real-time inference on your model with the feature store, and monitor your model.

### Objective

In this notebook, we'll focus on deploying our trained fraud detection model as a real-time inference service. We'll be using a serverless architecture, which allows us to build and run applications without having to manage the underlying infrastructure. This means we can focus on our code and let Google Cloud handle the scaling, availability, and maintenance.

Our architecture will be event-driven, leveraging Pub/Sub to trigger our inference service in response to new transaction events. This is a common pattern for real-time systems, as it allows for loose coupling between components and can handle high volumes of data.

To accomplish this, we will create a Cloud Run app that performs model inference against the endpoint deployed in the previous notebooks. The app is triggered by a Pub/Sub push subscription on the live transactions topic, looks up feature values in the Vertex AI Feature Store, and sends the prediction request to the Vertex AI endpoint. You can then view the resulting prediction-response logs in BigQuery.

This lab uses the following Google Cloud services and resources:

- **[Vertex AI](https://cloud.google.com/vertex-ai/)**: For hosting our trained model on an endpoint and for using the Feature Store to retrieve feature values for inference.
- **[BigQuery](https://cloud.google.com/bigquery/)**: For storing and analyzing the prediction logs from our model.
- **[Cloud Run](https://cloud.google.com/run)**: A serverless compute platform that will host our inference application. It automatically scales up and down, and you only pay for the resources you use.
- **[Pub/Sub](https://cloud.google.com/pubsub/)**: A real-time messaging service that will be used to trigger our Cloud Run application whenever a new transaction occurs.

The steps we will take in this notebook are:

1.  **Build and deploy a Cloud Run app for model inference**: We'll containerize our inference code using Docker and deploy it to Cloud Run.
2.  **Create and use a Pub/Sub push subscription to invoke the Cloud Run model inference app**: We'll set up a trigger so that our Cloud Run service is called for each new transaction.

### Load config settings

In [None]:
GCP_PROJECTS = !gcloud config get-value project
PROJECT_ID = GCP_PROJECTS[0]
BUCKET_NAME = f"{PROJECT_ID}-fraudfinder"
config = !gsutil cat gs://{BUCKET_NAME}/config/notebook_env.py
print(config.n)
exec(config.n)
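The cells below rely on names such as `REGION`, `ENDPOINT_NAME`, and `FEATURESTORE_ID` that are only defined by executing the config file. A minimal sketch, assuming you want to fail early if the config is missing one of them: execute the config text into a separate namespace and check for the names you need. `load_notebook_env` is a hypothetical helper, not part of the lab, and like `exec(config.n)` it should only be used on config you trust.

```python
from typing import Dict, Iterable


def load_notebook_env(config_text: str, required: Iterable[str]) -> Dict[str, object]:
    """Execute config text in an isolated namespace and verify the expected names exist.

    Hypothetical helper; exec() runs arbitrary code, so only use it on trusted config.
    """
    namespace: Dict[str, object] = {}
    exec(config_text, namespace)
    missing = [name for name in required if name not in namespace]
    if missing:
        raise ValueError(f"Config is missing: {', '.join(missing)}")
    # Drop interpreter-added entries such as __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

For example, `load_notebook_env(config.n, ["REGION", "ENDPOINT_NAME", "FEATURESTORE_ID"])` would raise a clear error up front instead of a later cell failing with a `NameError`.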

### Define constants

### Import libraries

In [None]:
from google.cloud import aiplatform as vertex_ai

### Initialize Vertex AI for Python

Initialize the Vertex AI SDK for Python for your project and region.

In [None]:
vertex_ai.init(project=PROJECT_ID, location=REGION)

## Build and deploy a Cloud Run app for model inference

To automate prediction, you will use a Cloud Run app that is triggered by live transactions, fetches feature values from Vertex AI Feature Store, and sends the prediction payload to an endpoint. To invoke the Cloud Run app, you will create a Pub/Sub push subscription that reads live transactions from the public Pub/Sub topic and forwards each one to the app.

[Cloud Run](https://cloud.google.com/run) is a serverless compute platform that lets you deploy containers that run each time they are triggered.

### The Cloud Run Application

Before we build and deploy our application, let's take a closer look at the code that will be running on Cloud Run. The application is a simple Flask web server that receives Pub/Sub messages, processes them, and sends a prediction request to our Vertex AI model.

The application code is located in the `notebooks/fraudfinder/scripts/cloud_run_model_inference/` directory and consists of three main files:

*   [`main.py`](./scripts/cloud_run_model_inference/main.py): This is the main application file. It contains a Flask web server with a single endpoint that listens for POST requests. When a request is received, the application:
    1.  Parses the incoming Pub/Sub message to extract the transaction data.
    2.  Retrieves additional features from the Vertex AI Feature Store.
    3.  Constructs a prediction request with the combined features.
    4.  Sends the request to the Vertex AI Endpoint.
    5.  Logs the prediction result.
*   `Dockerfile`: This file defines the container image for our application. It specifies the base Python image, copies the application code, installs the required dependencies from `requirements.txt`, and defines the command to start the Flask web server using `gunicorn`.
*   `requirements.txt`: This file lists the Python dependencies that our application needs to run, including `Flask`, `gunicorn`, and the `google-cloud-aiplatform` library.
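The five steps performed by `main.py` can be sketched as a framework-agnostic function that a Flask route could call. This is a minimal sketch, not the contents of `main.py`: the function name `handle_pubsub_push`, the transaction field names (`TX_ID`, `CUSTOMER_ID`, `TERMINAL_ID`, `TX_AMOUNT`), and the injected `lookup_features` and `predict` callables are all hypothetical. Injecting the Feature Store read and the endpoint call keeps the parsing and merging logic testable without Google Cloud credentials.

```python
import base64
import json
from typing import Any, Callable, Dict, List

# Hypothetical signatures for the two external calls.
FeatureLookup = Callable[[str, str], Dict[str, Any]]  # (customer_id, terminal_id) -> features
Predict = Callable[[List[Dict[str, Any]]], List[Any]]  # instances -> predictions


def handle_pubsub_push(
    envelope: Dict[str, Any],
    lookup_features: FeatureLookup,
    predict: Predict,
) -> Dict[str, Any]:
    """Process one Pub/Sub push request body (assumed field names, see lead-in)."""
    # 1. Parse the push envelope; the transaction JSON is base64-encoded in message.data.
    tx = json.loads(base64.b64decode(envelope["message"]["data"]).decode("utf-8"))

    # 2. Retrieve stored features (stand-in for the Vertex AI Feature Store read).
    features = lookup_features(tx["CUSTOMER_ID"], tx["TERMINAL_ID"])

    # 3. Combine live transaction fields with stored features.
    #    The exact feature set depends on how the model was trained.
    instance = {"tx_amount": tx["TX_AMOUNT"], **features}

    # 4. Send the instance to the endpoint client and take the single prediction.
    prediction = predict([instance])[0]

    # 5. Log the result; Cloud Run forwards stdout to Cloud Logging.
    result = {"tx_id": tx["TX_ID"], "prediction": prediction}
    print(json.dumps(result))
    return result
```

In a real service, `predict` might wrap `vertex_ai.Endpoint(ENDPOINT_ID).predict(instances=...)` and return its `.predictions`, and the Flask route would return an empty `204` response so that Pub/Sub treats the message as acknowledged.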

By containerizing our application, we can ensure that it runs consistently across different environments and can be easily deployed to Cloud Run.

#### Steps to build and deploy the Cloud Run app

To deploy a Cloud Run app, you must:
1. Build a Docker container with your code
2. Deploy your container to Cloud Run

### 1. Build a Docker container with your code

The container code has been prepared for you in the `cloud_run_model_inference/` folder. The first step is to build our Docker container and push it to Google Container Registry (GCR). GCR is a private Docker registry where we can store our container images.

We'll use the `gcloud builds submit` command to do this. This command will:

1.  Compress the application code in the `scripts/cloud_run_model_inference` directory.
2.  Upload the code to a Cloud Storage bucket.
3.  Initiate a build using Cloud Build, which will execute the instructions in our `Dockerfile`.
4.  Tag the resulting image with the name `gcr.io/$PROJECT_ID/cloud_run_model_inference`.
5.  Push the tagged image to Google Container Registry.

This entire process is automated, making it easy to build and publish our container images.

In [1]:
!cat ./scripts/cloud_run_model_inference/Dockerfile

# Copyright 2022 Google, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START cloudrun_pubsub_dockerfile]
# [START run_pubsub_dockerfile]

# Use the official Python image.
# https://hub.docker.com/_/python
FROM python:3.10

# Allow statements and log messages to immediately appear in the Cloud Run logs
ENV PYTHONUNBUFFERED True

# Copy application dependency manifests to the container image.
# Copying this separately prevents re-running pip install on every code change.
COPY requirements.

In [None]:
!gcloud builds submit ./scripts/cloud_run_model_inference --tag gcr.io/$PROJECT_ID/cloud_run_model_inference --quiet

### 2. Deploy your container to Cloud Run

With your container image pushed to Container Registry, you can now deploy it to Cloud Run. To do so, you will pass environment variables so that your Cloud Run app knows which Vertex AI Feature Store and endpoint to use.

First, let's retrieve the endpoint ID of our deployed model:

In [None]:
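# Retrieve the ID of your Vertex AI endpoint
endpoints = vertex_ai.Endpoint.list(
    filter=f"display_name={ENDPOINT_NAME}",  # optional: filter by specific endpoint name
    order_by="update_time",
)
if not endpoints:
    raise ValueError(f"No Vertex AI endpoint found with display name {ENDPOINT_NAME}")

# Results are sorted by update time, so the last entry is the most recently updated endpoint
ENDPOINT_ID = endpoints[-1].name  # .name is the endpoint ID, not the display name
print(ENDPOINT_ID)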
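The SDK's `Endpoint.name` property returns only the endpoint ID, the last segment of the full resource name `projects/PROJECT/locations/REGION/endpoints/ENDPOINT_ID`, which `Endpoint.resource_name` returns in full. The sketch below reproduces the selection logic of the cell above on plain data: keep endpoints with the matching display name, pick the most recently updated one, and take the trailing ID. `EndpointInfo` and `latest_endpoint_id` are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class EndpointInfo:
    # Hypothetical stand-in for the fields read from vertex_ai.Endpoint objects.
    display_name: str
    resource_name: str  # e.g. "projects/123/locations/us-central1/endpoints/456"
    update_time: datetime


def latest_endpoint_id(endpoints: List[EndpointInfo], display_name: str) -> str:
    """Return the ID of the most recently updated endpoint with this display name."""
    matches = [e for e in endpoints if e.display_name == display_name]
    if not matches:
        raise ValueError(f"No endpoint found with display name {display_name!r}")
    # Same effect as order_by="update_time" followed by taking the last element.
    newest = max(matches, key=lambda e: e.update_time)
    # The ID is the final path segment of the full resource name.
    return newest.resource_name.rsplit("/", 1)[-1]
```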

Now we can deploy our container to Cloud Run using the `gcloud run deploy` command. This command will:

1.  Create a new Cloud Run service named `cloud-run-model-inference-app`.
2.  Use the container image we just built and pushed to GCR.
3.  Set the `--no-allow-unauthenticated` flag to ensure that our service can only be invoked by authenticated requests.
4.  Set the `--region` flag to the same region as our other resources.
5.  Use the `--update-env-vars` flag to set the necessary environment variables that our application needs to connect to the Vertex AI Feature Store and Endpoint.

Once deployed, you can check the status of your service on the [Cloud Run console](https://console.cloud.google.com/run).

Note that if you visit the Service URL (which may look like https://cloud-run-model-inference-app-XXXXXX-a.run.app), you should expect to see `Error: Forbidden` followed by `Your client does not have permission to get URL / from this server`. This is expected: the service rejects unauthenticated requests, so the public internet cannot invoke your app.

In [None]:
!gcloud run deploy cloud-run-model-inference-app \
--image gcr.io/{PROJECT_ID}/cloud_run_model_inference \
--no-allow-unauthenticated \
--region $REGION \
--update-env-vars FEATURESTORE_ID=$FEATURESTORE_ID,ENDPOINT_ID=$ENDPOINT_ID,PROJECT_ID=$PROJECT_ID,REGION=$REGION \
--quiet --verbosity=none
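The deploy command passes four settings to the container through `--update-env-vars`. A minimal sketch of how the service might read them at startup, assuming `main.py` uses environment variables (the `InferenceConfig` class and `load_config` function are hypothetical): collect all four, fail fast with a clear message if any is missing, and build the full endpoint resource name, which `vertex_ai.Endpoint(endpoint_name=...)` accepts as well as a bare ID.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ["FEATURESTORE_ID", "ENDPOINT_ID", "PROJECT_ID", "REGION"]


@dataclass(frozen=True)
class InferenceConfig:
    featurestore_id: str
    endpoint_id: str
    project_id: str
    region: str

    @property
    def endpoint_resource_name(self) -> str:
        # Fully qualified endpoint path built from the project, region, and ID.
        return f"projects/{self.project_id}/locations/{self.region}/endpoints/{self.endpoint_id}"


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read the variables set by --update-env-vars, failing fast if any is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return InferenceConfig(
        featurestore_id=env["FEATURESTORE_ID"],
        endpoint_id=env["ENDPOINT_ID"],
        project_id=env["PROJECT_ID"],
        region=env["REGION"],
    )
```

Passing the mapping as a parameter (defaulting to `os.environ`) lets you test the loader with a plain dictionary.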

You have now deployed a Cloud Run app to do model inference. However, it is not currently triggered by anything. In the next section, you will connect your Cloud Run app to the live transactions so you can continuously trigger your model inference app.

## Create and use a Pub/Sub push subscription to invoke the Cloud Run model inference app

Now that our Cloud Run service is deployed, we need a way to trigger it whenever a new transaction occurs. For this, we'll use a Pub/Sub push subscription. This will create a subscription to the `ff-tx` topic, and for each message that is published to the topic, Pub/Sub will send a POST request to our Cloud Run service's endpoint.

However, since we've configured our Cloud Run service to reject unauthenticated requests, Pub/Sub needs a way to invoke it securely. We'll do this by creating a dedicated service account and granting it permission to invoke our Cloud Run service. This is much more secure than allowing unauthenticated access, because only principals that hold the invoker role on the service, such as Pub/Sub acting as this service account, can trigger it.

#### Steps to invoke the Cloud Run app from Pub/Sub
1. Create a service account that can invoke your Cloud Run app with appropriate IAM policies
2. Create the Pub/Sub subscription from the live transactions to invoke the Cloud Run app

### 1. Create a service account that can invoke your Cloud Run app with appropriate IAM policies

First, we'll create a new service account named `cloud-run-invoker`. This service account will be used by our Pub/Sub subscription to authenticate with our Cloud Run service. We'll then grant this service account the `run.invoker` role, which gives it permission to invoke the Cloud Run service.

Finally, we'll grant the Google-managed Pub/Sub service account the `iam.serviceAccountTokenCreator` role. This is a crucial step that allows the Pub/Sub service to create authentication tokens on behalf of our `cloud-run-invoker` service account. When Pub/Sub sends a message to our Cloud Run service, it will include one of these tokens in the request header, which Cloud Run will then use to verify the identity of the caller.
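When Pub/Sub pushes with `--push-auth-service-account`, it sends an `Authorization: Bearer <token>` header containing a Google-signed OIDC ID token. Cloud Run checks that token before the request reaches your code, so the application does not need to verify it. For debugging, it can still help to see which identity a call came from; the sketch below decodes the token's payload without verifying it. The function name is hypothetical, and the claims you see (typically `aud`, `iss`, and `email`) depend on how the subscription is configured.

```python
import base64
import json
from typing import Any, Dict


def unverified_jwt_claims(authorization_header: str) -> Dict[str, Any]:
    """Decode the payload of a 'Bearer <JWT>' header WITHOUT checking its signature.

    Debugging aid only (hypothetical helper): never use these claims for access control.
    """
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        raise ValueError("Expected an 'Authorization: Bearer <JWT>' header")
    payload_b64 = token.split(".")[1]
    # JWT segments use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```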

In [None]:
# Create a service account
!gcloud iam service-accounts create cloud-run-invoker --display-name "Cloud Run Pub/Sub Invoker"

# Retrieve your project number
PROJECT_NUMBER = !gcloud projects describe $PROJECT_ID --format="value(projectNumber)"
PROJECT_NUMBER = PROJECT_NUMBER[0]

# Bind the service account with an IAM policy to invoke the Cloud Run app
!gcloud run services add-iam-policy-binding cloud-run-model-inference-app \
   --member=serviceAccount:cloud-run-invoker@{PROJECT_ID}.iam.gserviceaccount.com \
   --role=roles/run.invoker \
   --region=$REGION

# Let the Google-managed Pub/Sub service agent create OIDC tokens used to authenticate push requests to Cloud Run
!gcloud projects add-iam-policy-binding $PROJECT_ID \
     --member=serviceAccount:service-{PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com \
     --role=roles/iam.serviceAccountTokenCreator
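The two bindings in the cell above use different identifiers and are easy to mix up: the invoker account is addressed by project ID, while the Pub/Sub service agent is addressed by project number. The hypothetical helpers below simply build those principal strings and pair each with the scope and role its grant uses, which can be handy when scripting the same setup for another project.

```python
from typing import List, Tuple

SERVICE_NAME = "cloud-run-model-inference-app"


def invoker_member(project_id: str, account_id: str = "cloud-run-invoker") -> str:
    """Principal that Pub/Sub authenticates as when pushing to Cloud Run."""
    return f"serviceAccount:{account_id}@{project_id}.iam.gserviceaccount.com"


def pubsub_service_agent_member(project_number: str) -> str:
    """Google-managed Pub/Sub service agent that mints the OIDC tokens."""
    return f"serviceAccount:service-{project_number}@gcp-sa-pubsub.iam.gserviceaccount.com"


def required_bindings(project_id: str, project_number: str) -> List[Tuple[str, str, str]]:
    """(scope, member, role) for the two IAM grants made in the cell above."""
    return [
        # Service-level grant: the account may invoke this one Cloud Run service.
        (f"run service {SERVICE_NAME}", invoker_member(project_id), "roles/run.invoker"),
        # Project-level grant: the Pub/Sub service agent may create tokens for service accounts.
        (
            f"project {project_id}",
            pubsub_service_agent_member(project_number),
            "roles/iam.serviceAccountTokenCreator",
        ),
    ]
```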

### 2. Create the Pub/Sub subscription from the live transactions to invoke the Cloud Run app

With the service account created and configured, we can now create the Pub/Sub push subscription. This subscription will connect the live transaction topic (`ff-tx`) to our Cloud Run service. 

When we create the subscription, we'll specify:

*   The name of the topic to subscribe to.
*   The push endpoint, which is the URL of our Cloud Run service.
*   The service account to use for authentication.

This tells Pub/Sub to send an HTTP POST request to our service's URL for every message that arrives on the topic, using the specified service account to authenticate the request. This setup ensures that our service is securely and reliably triggered for each new transaction.
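To exercise the service locally, it helps to know what Pub/Sub actually POSTs: a JSON body with a `message` object (base64-encoded `data`, a `messageId`, a `publishTime`, and optional `attributes`) plus the `subscription` name. The sketch below builds a body of that shape for a test payload and encodes Pub/Sub's push acknowledgement rule: the listed success status codes acknowledge the message, and any other response causes Pub/Sub to redeliver it later. The helper names are hypothetical, and real envelopes may carry extra fields such as snake_case duplicates of the IDs.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Any, Dict

# HTTP status codes that Pub/Sub treats as a successful (acknowledged) push delivery.
ACK_STATUS_CODES = {102, 200, 201, 202, 204}


def make_push_envelope(payload: Dict[str, Any], subscription: str, message_id: str = "1") -> Dict[str, Any]:
    """Build a body shaped like the JSON Pub/Sub POSTs to a push endpoint."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    publish_time = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    return {
        "message": {
            "data": data,
            "messageId": message_id,
            "publishTime": publish_time,
            "attributes": {},
        },
        "subscription": subscription,
    }


def is_ack(status_code: int) -> bool:
    """True if Pub/Sub would consider the message acknowledged; otherwise it is retried."""
    return status_code in ACK_STATUS_CODES
```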

To create the Pub/Sub push subscription, you will first need to retrieve your Cloud Run service URL.

In [None]:
# to get the service URL programmatically
SERVICE_URL = !gcloud run services describe cloud-run-model-inference-app \
  --platform managed \
  --region $REGION \
  --format "value(status.url)"
SERVICE_URL = SERVICE_URL[0]

print(SERVICE_URL)

Now you can create your Pub/Sub push subscription:

In [None]:
!gcloud pubsub subscriptions create push-live-tx-to-cloudrun --topic projects/cymbal-fraudfinder/topics/ff-tx \
   --ack-deadline=600 \
   --push-endpoint=$SERVICE_URL \
   --push-auth-service-account=cloud-run-invoker@{PROJECT_ID}.iam.gserviceaccount.com

Once created, you can do some checks to make sure everything worked successfully:
- On the [Pub/Sub page](https://console.cloud.google.com/cloudpubsub/subscription/list), inspect your new Pub/Sub subscription `push-live-tx-to-cloudrun`
- On the [Cloud Run logs page](https://console.cloud.google.com/run/detail/us-central1/cloud-run-model-inference-app/logs), check the logs of your Cloud Run app to confirm that you see model prediction requests and responses (this link assumes the `us-central1` region; change it if you deployed elsewhere)