### Vertex AI Matching Engine (ANN) for Movie Overview Similarity

In this lab you will use Vertex AI Matching Engine (Approximate Nearest Neighbors) to perform fast, low-latency similarity matching on vectors. The vectors used in this lab are embeddings of movie descriptions generated with Google's Universal Sentence Encoder.

Learning Objectives:
* Learn how to create an index in Vertex AI Matching Engine
* Learn how to create an index endpoint and deploy an index to it

#### Before you begin
In the Google Cloud console, open **Cloud Shell** and run the following command to enable the Service Networking API:


`gcloud services enable servicenetworking.googleapis.com`


**Cloud IAM**: In the Google Cloud console, navigate to IAM and grant the Compute Network Admin role to your Compute Engine default service account.

#### Setup

In [1]:
PROJECT = !(gcloud config get-value core/project)
PROJECT = PROJECT[0]
BUCKET = f"{PROJECT}-matching-engine"
REGION = "us-central1"
NETWORK_NAME = "vertex-matching-engine-vpc-network"
PEERING_RANGE_NAME = "ip-range-1"

%env PROJECT={PROJECT}
%env BUCKET={BUCKET}
%env REGION={REGION}
%env NETWORK_NAME={NETWORK_NAME}
%env PEERING_RANGE_NAME={PEERING_RANGE_NAME}

env: PROJECT=kylesteckler-demo
env: BUCKET=kylesteckler-demo-matching-engine
env: REGION=us-central1
env: NETWORK_NAME=vertex-matching-engine-vpc-network
env: PEERING_RANGE_NAME=ip-range-1


### Set Up VPC Network
To reduce network latency for vector matching online queries, call the Vertex AI service endpoints from your Virtual Private Cloud (VPC) by using Private Service Access. For each Google Cloud project, only one VPC network can be peered with Matching Engine. If you already have a VPC with private services access configured, you can use that VPC to peer with Vertex AI Matching Engine.

Configuring a VPC Network Peering connection is an initial task required only one time per Google Cloud project. After this setup is done, you can make calls to the Matching Engine index from any client running inside your VPC.

The VPC Network Peering connection is required only for vector matching online queries. API calls to create, deploy, and delete indexes do not require a VPC Network Peering connection.

The next cell does the following:
* Creates the VPC network
* Adds firewall rules that allow ICMP, RDP on TCP port 3389, SSH on TCP port 22, and all traffic from the internal source range 10.128.0.0/9
* Reserves an IP range for peering
* Creates the VPC Network Peering connection with `servicenetworking.googleapis.com`

**NOTE** It can take 2-3 minutes for this to complete.

In [2]:
%%bash
gcloud compute networks create ${NETWORK_NAME} \
    --bgp-routing-mode=regional

gcloud compute firewall-rules create ${NETWORK_NAME}-allow-icmp \
    --network ${NETWORK_NAME} \
    --priority 65534 \
    --project ${PROJECT} \
    --allow icmp

gcloud compute firewall-rules create ${NETWORK_NAME}-allow-internal \
    --network ${NETWORK_NAME} \
    --priority 65534 \
    --project ${PROJECT} \
    --allow all \
    --source-ranges 10.128.0.0/9

gcloud compute firewall-rules create ${NETWORK_NAME}-allow-rdp \
    --network ${NETWORK_NAME} \
    --priority 65534 \
    --project ${PROJECT} \
    --allow tcp:3389

gcloud compute firewall-rules create ${NETWORK_NAME}-allow-ssh \
    --network ${NETWORK_NAME} \
    --priority 65534 \
    --project ${PROJECT} \
    --allow tcp:22

gcloud compute addresses create ${PEERING_RANGE_NAME} \
  --global --prefix-length=16 \
  --network=${NETWORK_NAME} \
  --purpose=VPC_PEERING \
  --project=${PROJECT}

gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --network=${NETWORK_NAME} \
  --ranges=${PEERING_RANGE_NAME} 

NAME                                SUBNET_MODE  BGP_ROUTING_MODE  IPV4_RANGE  GATEWAY_IPV4
vertex-matching-engine-vpc-network  AUTO         REGIONAL
NAME                                           NETWORK                             DIRECTION  PRIORITY  ALLOW  DENY  DISABLED
vertex-matching-engine-vpc-network-allow-icmp  vertex-matching-engine-vpc-network  INGRESS    65534     icmp         False
NAME                                               NETWORK                             DIRECTION  PRIORITY  ALLOW  DENY  DISABLED
vertex-matching-engine-vpc-network-allow-internal  vertex-matching-engine-vpc-network  INGRESS    65534     all          False
NAME                                          NETWORK                             DIRECTION  PRIORITY  ALLOW     DENY  DISABLED
vertex-matching-engine-vpc-network-allow-rdp  vertex-matching-engine-vpc-network  INGRESS    65534     tcp:3389        False
NAME                                          NETWORK                             DIRECTION

Created [https://www.googleapis.com/compute/v1/projects/kylesteckler-demo/global/networks/vertex-matching-engine-vpc-network].

Instances on this network will not be reachable until firewall rules
are created. As an example, you can allow all internal traffic between
instances as well as SSH, RDP, and ICMP by running:

$ gcloud compute firewall-rules create <FIREWALL_NAME> --network vertex-matching-engine-vpc-network --allow tcp,udp,icmp --source-ranges <IP_RANGE>
$ gcloud compute firewall-rules create <FIREWALL_NAME> --network vertex-matching-engine-vpc-network --allow tcp:22,tcp:3389,icmp

Creating firewall...
..Created [https://www.googleapis.com/compute/v1/projects/kylesteckler-demo/global/firewalls/vertex-matching-engine-vpc-network-allow-icmp].
done.
Creating firewall...
..Created [https://www.googleapis.com/compute/v1/projects/kylesteckler-demo/global/firewalls/vertex-matching-engine-vpc-network-allow-internal].
done.
Creating firewall...
..Created [https://www.googleapis.com/co
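
As a side note on the networking values used in the setup cell, the sketch below uses Python's standard `ipaddress` module to show what the `--source-ranges 10.128.0.0/9` firewall rule covers and how many addresses a `--prefix-length=16` reservation contains. The helper names `is_internal` and `addresses_in_prefix` and the sample addresses are illustrative only and are not part of the lab.

```python
import ipaddress

# Source range used by the allow-internal firewall rule in the setup cell.
INTERNAL_RANGE = ipaddress.ip_network("10.128.0.0/9")


def is_internal(address: str) -> bool:
    """Return True if `address` falls inside the allow-internal source range."""
    return ipaddress.ip_address(address) in INTERNAL_RANGE


def addresses_in_prefix(prefix_length: int) -> int:
    """Return the number of IPv4 addresses in a block with this prefix length."""
    return 2 ** (32 - prefix_length)


# First and last address covered by the rule.
print(INTERNAL_RANGE[0], "-", INTERNAL_RANGE[-1])

# Hypothetical addresses, chosen only to show one match and one miss.
print(is_internal("10.128.0.2"))
print(is_internal("10.0.0.2"))

# Size of the range reserved with --prefix-length=16.
print(addresses_in_prefix(16))
```

Traffic from an address outside `10.128.0.0/9` is not matched by the allow-internal rule, so it would need to be covered by one of the other rules (ICMP, RDP, or SSH).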

### Vertex AI Matching Engine
Vertex AI Matching Engine is a high-scale, low-latency vector-similarity matching service (also known as approximate nearest neighbor search). It provides tooling to build use cases that involve matching semantically similar items: given a query item, Matching Engine finds the most semantically similar items from a large corpus of candidate items. At a high level, the process of using Matching Engine is as follows:
* Create vectors
* Create an Index
* Create an Endpoint
* Deploy index to endpoint
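
To make the matching task concrete, here is a minimal sketch of the exact (brute-force) version of the problem: score every candidate against the query and keep the top `k`. This is not how Matching Engine works internally. It is only the baseline that an approximate nearest neighbor index trades a little accuracy against in exchange for speed. The function names and the three-dimensional toy vectors are made up for illustration.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def exact_top_k(query, corpus, k):
    """Score every item in `corpus` (id -> vector) and return the k best.

    Returns a list of (id, score) pairs sorted by descending dot product.
    The cost grows linearly with the corpus size, which is what an
    approximate index is designed to avoid.
    """
    scored = ((dot(query, vector), item_id) for item_id, vector in corpus.items())
    best = heapq.nlargest(k, scored)
    return [(item_id, score) for score, item_id in best]


# Toy corpus with made-up 3-dimensional vectors (the lab uses 512 dimensions).
toy_corpus = {
    "movie_a": [1.0, 0.0, 0.0],
    "movie_b": [0.8, 0.6, 0.0],
    "movie_c": [0.0, 1.0, 0.0],
    "movie_d": [0.0, 0.0, 1.0],
}
query = [0.9, 0.1, 0.0]

top_matches = exact_top_k(query, toy_corpus, k=2)
print([item_id for item_id, _ in top_matches])
```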

#### Creating Vectors
Vertex AI Matching Engine provides tooling to generate vector representations of items. More information on this specific tooling can be found [here](https://cloud.google.com/vertex-ai/docs/matching-engine/train-embeddings-two-tower).

You can also bring your own embeddings/vectors and simply leverage the ANN service. In this lab we will use the embeddings created by sending movie overviews through Google's Universal Sentence Encoder. These vectors are of length 512. 

In [3]:
# Create regional GCS bucket
!gsutil mb -l {REGION} gs://{BUCKET}

Creating gs://kylesteckler-demo-matching-engine/...


In [4]:
# Copy the data to your bucket
!gsutil -m cp gs://asl-public/data/movie-descriptions/index-data/* gs://{BUCKET}/data

Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00000-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00001-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00002-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00003-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00005-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00004-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00006-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/index-data/embeddings-00008-of-00009.json [Content-Type=text/plain]...
Copying gs://asl-public/data/movie-descriptions/

In [5]:
!gsutil ls -l gs://{BUCKET}/data/

  30774898  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00000-of-00009.json
  11705375  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00001-of-00009.json
  55784715  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00002-of-00009.json
  63437085  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00003-of-00009.json
  43122420  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00004-of-00009.json
  52669878  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00005-of-00009.json
  28904060  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00006-of-00009.json
  74965967  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00007-of-00009.json
  28517009  2022-06-27T21:56:59Z  gs://kylesteckler-demo-matching-engine/data/embeddings-00008-of-00009.json
TOTAL: 9 objects, 3

As you can see, there are multiple JSON files. Each file contains one JSON object per line with:
* `id`: Movie title
* `embedding`: 512-dimensional vector (the embedded movie description)

Take a look at an example:

In [6]:
!gsutil cat gs://{BUCKET}/data/embeddings-00001-of-00009.json | head -1

{"id": "Hector", "embedding": [0.03524283692240715, -0.05786382779479027, -0.01879088394343853, -0.054597076028585434, -0.023502442985773087, 0.03810478374361992, -0.03932681679725647, -0.04778684675693512, 0.04427504166960716, 0.012033884413540363, 0.06726176291704178, 0.04901351034641266, 0.044581614434719086, -0.05477724224328995, 0.009321354329586029, -0.07125352323055267, -0.07518237084150314, 0.0561932735145092, -0.05556556209921837, -0.07360559701919556, -0.002205073134973645, 0.04406796395778656, 0.0732835903763771, -0.0203841719776392, 0.07131519168615341, 0.03472614660859108, 0.015632878988981247, 0.039959583431482315, 0.03650550916790962, 0.021306287497282028, -0.010188562795519829, 0.06862200796604156, -0.019070487469434738, 0.05627898871898651, 0.030582301318645477, -0.007555294781923294, 0.04445774108171463, 0.017033400014042854, 0.050306130200624466, -0.03371251001954079, 0.06591857969760895, -0.06860781461000443, 0.002915141172707081, -0.04184809699654579, -0.0451970212
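
Before building an index, it can help to confirm that each record has the expected shape. The sketch below parses one JSON line and checks that it carries an `id` and an `embedding` of the expected length. The function `parse_record` and the synthetic sample record are illustrative assumptions. They are not part of the lab, and the sample does not come from the dataset.

```python
import json

EXPECTED_DIMENSIONS = 512  # Vector length stated in this lab.


def parse_record(line: str, dimensions: int = EXPECTED_DIMENSIONS):
    """Parse one JSON line and validate its `id` and `embedding` fields.

    Raises ValueError if a field is missing or the vector has the wrong length.
    """
    record = json.loads(line)
    if "id" not in record or "embedding" not in record:
        raise ValueError("record must contain 'id' and 'embedding'")
    if len(record["embedding"]) != dimensions:
        raise ValueError(
            f"expected {dimensions} dimensions, got {len(record['embedding'])}"
        )
    return record["id"], record["embedding"]


# Synthetic record for illustration only (not taken from the lab data).
sample_line = json.dumps({"id": "Example Movie", "embedding": [0.0] * 512})
movie_id, vector = parse_record(sample_line)
print(movie_id, len(vector))
```

To check real data, the same function could be applied to each line of a locally downloaded copy of one of the embeddings files.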

#### Create an Index
Now that the data is in the GCS bucket in the proper format, you can create an index. To create an index, you first need to configure its parameters. You do this with an index metadata file. We will keep things simple in this lab, but you can find the definition of each possible index metadata field [here](https://cloud.google.com/vertex-ai/docs/matching-engine/configuring-indexes).

The fields we will use:
* `contentsDeltaUri`: Allows inserting, updating or deleting the contents of the Matching Engine Index. Must be a valid GCS directory path.
* `config`
    * `dimensions`: The number of dimensions of the input vectors
    * `approximateNeighborsCount`: The default number of neighbors to find via approximate search.
    * `distanceMeasureType`: The distance measure used in nearest neighbor search. The options are L2 distance (Euclidean), L1 distance (Manhattan), cosine distance (defined as 1 - cosine similarity), or dot product distance (defined as the negative of the dot product).
    * `algorithm_config`
        * `treeAhConfig`: Selects the tree-AH algorithm (shallow tree + asymmetric hashing)
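
The four distance measures listed above can be written out in a few lines of plain Python. This is a sketch of the definitions as stated in the list, not the service's implementation. The function names and the two-dimensional sample vectors are illustrative.

```python
import math


def l2_distance(u, v):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def l1_distance(u, v):
    """Manhattan distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dot_product_distance(u, v):
    """Negative of the dot product, so a larger dot product means 'closer'."""
    return -sum(a * b for a, b in zip(u, v))


# Made-up sample vectors, both of length 5.
u = [3.0, 4.0]
v = [4.0, 3.0]

print("L2:", l2_distance(u, v))
print("L1:", l1_distance(u, v))
print("cosine:", cosine_distance(u, v))
print("dot product:", dot_product_distance(u, v))
```

In every case a smaller value means a closer match. When all vectors have unit length, ranking by dot product distance and ranking by cosine distance give the same order.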

In [7]:
import json

metadata = {
    "contentsDeltaUri": f"gs://{BUCKET}/data/",
    "config": {
        "dimensions": 512,
        "approximateNeighborsCount": 20,
        "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
        "algorithm_config": {"treeAhConfig": {}},
    },
}

with open("metadata.json", "w") as f:
    json.dump(metadata, f)

Create the index:

In [8]:
!gcloud ai indexes create \
    --metadata-file='./metadata.json' \
    --display-name='movie-descriptions' \
    --project={PROJECT} \
    --region={REGION}

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
The create index operation [projects/693210680039/locations/us-central1/indexes/5874048512416546816/operations/7300198175489392640] was submitted successfully.

You may view the status of your operation with the command

  $ gcloud ai operations describe 7300198175489392640 --index=5874048512416546816 [--project=kylesteckler-demo]


**NOTE** Creating the index will likely take around 40 minutes. Run the following cell to list your indexes. If the response does not include the index ID, the index is still being created. You cannot move forward in the lab until index creation has completed.

In [18]:
# Get your index ID. 
result = !gcloud ai indexes list \
    --project={PROJECT} \
    --region={REGION}

INDEX_ID = result[-2].split('/')[-1]

if INDEX_ID == "]":
    print("""Wait for the Index to be created and run this cell again. 
    This can take ~45 minutes.""")
else:
    print(f"Index Created. Index ID: {INDEX_ID}")

Index Created. Index ID: 5874048512416546816
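
The cell above picks the ID out of the CLI output by line position, which depends on the exact layout of that output. A less position-dependent alternative is to search the output for a resource name. The sketch below assumes resource names follow the form seen in this notebook's outputs (`projects/.../locations/.../indexes/<ID>`). The helper `extract_resource_ids` and the `sample_output` lines are made up for illustration and are not real CLI output.

```python
import re

# Assumed pattern, based on the resource names printed in this notebook.
RESOURCE_PATTERN = re.compile(
    r"projects/(?P<project>[^/\s]+)/locations/(?P<region>[^/\s]+)/"
    r"(?P<collection>indexes|indexEndpoints)/(?P<resource_id>\d+)"
)


def extract_resource_ids(output_lines, collection):
    """Return the unique IDs of `collection` resources found in the lines.

    `collection` is either "indexes" or "indexEndpoints". IDs are returned
    in the order they first appear.
    """
    ids = []
    for line in output_lines:
        for match in RESOURCE_PATTERN.finditer(line):
            resource_id = match.group("resource_id")
            if match.group("collection") == collection and resource_id not in ids:
                ids.append(resource_id)
    return ids


# Illustrative lines only; the layout of real CLI output may differ.
sample_output = [
    "Using endpoint [https://us-central1-aiplatform.googleapis.com/]",
    "name: projects/693210680039/locations/us-central1/indexes/5874048512416546816",
]

index_ids = extract_resource_ids(sample_output, "indexes")
if index_ids:
    print(f"Index ID: {index_ids[0]}")
else:
    print("No index found yet.")
```

An empty result plays the same role as the `"]"` check in the cell above: no index resource name appears in the output yet.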


In [19]:
%env INDEX_ID={INDEX_ID}

env: INDEX_ID=5874048512416546816


#### Create Index Endpoint and Deploy
Now that the index has been created, you can create an endpoint and deploy the index to it.

In [20]:
# Need to provide full network resource path
PROJECT_NUMBER =  !(gcloud projects describe {PROJECT} \
                    --format="value(projectNumber)")
    
PROJECT_NUMBER = PROJECT_NUMBER[0]
NETWORK_RESOURCE_NAME = (
    f"projects/{PROJECT_NUMBER}/global/networks/{NETWORK_NAME}"
)
print(f"Network: {NETWORK_RESOURCE_NAME} ")

Network: projects/693210680039/global/networks/vertex-matching-engine-vpc-network 


In [21]:
!gcloud ai index-endpoints create \
  --display-name="movie_index_endpoint" \
  --network={NETWORK_RESOURCE_NAME} \
  --project={PROJECT} \
  --region={REGION}

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
Waiting for operation [540295134806278144]...done.                             
Created Vertex AI index endpoint: projects/693210680039/locations/us-central1/indexEndpoints/1495423774705582080.


In [22]:
result = !gcloud ai index-endpoints list \
    --region={REGION}

ENDPOINT_ID = result[-3].split('/')[-1]
print(f"Index Endpoint Created. Endpoint ID: {ENDPOINT_ID}")

Index Endpoint Created. Endpoint ID: 1495423774705582080


In [23]:
%env ENDPOINT_ID={ENDPOINT_ID}

env: ENDPOINT_ID=1495423774705582080


In [24]:
%%bash
gcloud ai index-endpoints deploy-index ${ENDPOINT_ID} \
  --deployed-index-id="deployed-movie-index" \
  --display-name="deployed-movie-index-endpoint" \
  --index=${INDEX_ID} \
  --project=${PROJECT} \
  --region=${REGION} \
  --min-replica-count=1 \
  --max-replica-count=2

metadata:
  '@type': type.googleapis.com/google.cloud.aiplatform.v1.DeployIndexOperationMetadata
  deployedIndexId: deployed-movie-index
  genericMetadata:
    createTime: '2022-06-27T22:43:52.030913Z'
    updateTime: '2022-06-27T22:43:52.030913Z'
name: projects/693210680039/locations/us-central1/indexEndpoints/1495423774705582080/operations/8034496020983316480


Using endpoint [https://us-central1-aiplatform.googleapis.com/]
The deploy index operation [projects/693210680039/locations/us-central1/indexEndpoints/1495423774705582080/operations/8034496020983316480] was submitted successfully.

You may view the status of your operation with the command

  $ gcloud ai operations describe 8034496020983316480 --index-endpoint=1495423774705582080 [--project=kylesteckler-demo]


#### Querying the Endpoint
To query the endpoint, spin up a Vertex AI notebook in the same VPC network as your deployed index. Then you can walk through `matching_engine_query.ipynb`.