In [1]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

## Setup

Before getting started with the Vertex AI services, we need to set up the following:

* Install Python SDK
* Environment variables
* Authentication (Colab only)
* Enable APIs
* Set IAM permissions

### Install Python SDK

Vertex AI, Cloud Storage and BigQuery APIs can be accessed in multiple ways, including the REST API and the Python SDK. In this tutorial we will use the SDK.

In [5]:
pip install --upgrade --user google-cloud-aiplatform google-cloud-storage google-cloud-bigquery[pandas] 


Note: you may need to restart the kernel to use updated packages.





### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [3]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}


<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>



### Environment variables (Start Running from here)

Set the environment variables. If asked, replace `[your-project-id]` below with your project ID and run the cell.

In [2]:
# get project ID
#PROJECT_ID = ! gcloud config get project
#PROJECT_ID = PROJECT_ID[0]
LOCATION = "us-central1"
#if PROJECT_ID == "(unset)":
#    print(f"Please set the project ID manually below")


    

In [3]:

# define project information
#if PROJECT_ID == "(unset)":
#    PROJECT_ID = "[your-project-id]"  # @param {type:"string"}

# generate a unique ID for this session
from datetime import datetime

UID = datetime.now().strftime("%m%d%H%M")

In [4]:

PROJECT_ID = 'tokyo-country-189103'  
#bq_client = bigquery.Client(project=PROJECT_ID)


### Authentication (Colab only)

If you are running this notebook on Colab, you will need to run the following cell to authenticate. This step is not required if you are using Vertex AI Workbench, as it is pre-authenticated.

In [5]:
import sys

# if it's Colab runtime, authenticate the user with Google Cloud
if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

In [6]:
import os
os.environ['PATH'] += os.pathsep + r'C:\Users\DELL\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin'


### Enable APIs

Run the following to enable APIs for Compute Engine, Vertex AI, Cloud Storage and BigQuery with this Google Cloud project.

In [7]:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com storage.googleapis.com bigquery.googleapis.com --project {PROJECT_ID}

Operation "operations/acat.p2-303276163211-6d598cbd-fcbb-4cb0-a090-13c29ede3293" finished successfully.


## Getting Started with Vertex AI Embeddings for Text

Now we are ready to get started with embeddings!

### Data Preparation

We will be using [the Stack Overflow public dataset](https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow) hosted in the BigQuery table `bigquery-public-data.stackoverflow.posts_questions`. This is a very large dataset with 23 million rows that doesn't fit into memory, so we limit it to 1000 rows for this tutorial.

In [8]:
# new
from google.cloud import bigquery
import os

# Set environment variable for your service account key
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r"C:\Users\DELL\Python projects\DocSync\V2\Key\tokyo-country-189103-4ce23189dd39.json"  # Replace with your actual path

# Initialize BigQuery client
bq_client = bigquery.Client(project=PROJECT_ID) 

In [9]:
# load the BQ Table into a Pandas DataFrame
import pandas as pd
from google.cloud import bigquery

QUESTIONS_SIZE = 1000

bq_client = bigquery.Client(project=PROJECT_ID)
QUERY_TEMPLATE = """
        SELECT distinct q.id, q.title
        FROM (SELECT * FROM `bigquery-public-data.stackoverflow.posts_questions`
        where Score > 0 ORDER BY View_Count desc) AS q
        LIMIT {limit} ;
        """
query = QUERY_TEMPLATE.format(limit=QUESTIONS_SIZE)
query_job = bq_client.query(query)
rows = query_job.result()
df = rows.to_dataframe()

# examine the data
df.head()



Unnamed: 0,id,title
0,73422998,merge rows based on a specific date interval
1,73462590,why real-time DEVS are always in sync in DEVS/SOA
2,73463551,Modernizing and simplifying an old node.js+mon...
3,73485120,Print arabic characters using logstash exec pl...
4,73341249,Auditing packages brings back more errors


### Import the two Excel files to compare

In [18]:
new_df = pd.read_excel(r'C:\Users\DELL\Python projects\DocSync\V2\DocSync\SamplesForTesting\New_Doc.xlsx')
old_df = pd.read_excel(r'C:\Users\DELL\Python projects\DocSync\V2\DocSync\SamplesForTesting\Old_Doc.xlsx')

new_df['Requirement'] = new_df['Requirement'].astype(str)
old_df['Requirement'] = old_df['Requirement'].astype(str)

new_df.head()

Unnamed: 0,Requirement ID,Requirement
0,ID_NEW_1,"The sun set behind the towering mountains, cas..."
1,ID_NEW_2,She smiled warmly as she received the unexpect...
2,ID_NEW_3,"The cat purred contentedly on the windowsill, ..."
3,ID_NEW_4,"Rain tapped gently against the roof, creating ..."
4,ID_NEW_5,He finished the marathon with a triumphant che...


### Call the API to generate embeddings

With the Stack Overflow dataset, we will use the `title` column (the question title) and generate an embedding for each title with the Embeddings for Text API. The API is available under the [vertexai](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai) package of the SDK.

You may see some warning messages from the TensorFlow library but you can ignore them.

In [13]:
# init the vertexai package
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

From the package, import [TextEmbeddingModel](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.TextEmbeddingModel) and get a model.

In [14]:
# Load the text embeddings model
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

In this tutorial we will use the `textembedding-gecko@001` model to get text embeddings. See [Supported models](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings#supported_models) in the documentation for the list of supported models.

Once you get the model, you can call its [get_embeddings](https://cloud.google.com/python/docs/reference/aiplatform/latest/vertexai.language_models.TextEmbeddingModel#vertexai_language_models_TextEmbeddingModel_get_embeddings) function to get embeddings. You can pass up to 5 texts at once in a call, but there is a caveat. By default, the text embeddings API has a "requests per minute" quota set to 60 for new Cloud projects and 600 for projects with usage history (see [Quotas and limits](https://cloud.google.com/vertex-ai/docs/quotas#request_quotas) to check the latest quota value for `base_model:textembedding-gecko`). So, rather than using the function directly, you may want to define a wrapper like the one below, which passes 5 texts per call and sleeps one second between calls to stay within 60 requests per minute.

In [16]:
import time
import tqdm  # to show a progress bar

# get embeddings for a list of texts
BATCH_SIZE = 5


def get_embeddings_wrapper(texts):
    embs = []
    for i in tqdm.tqdm(range(0, len(texts), BATCH_SIZE)):
        time.sleep(1)  # to avoid the quota error
        result = model.get_embeddings(texts[i : i + BATCH_SIZE])
        embs = embs + [e.values for e in result]
    return embs
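The one-second sleep follows from the quota: at 60 requests per minute, calls must be at least one second apart. The following is a minimal sketch of that arithmetic and of the batching. It uses a stand-in `fake_embed` function (hypothetical, not part of the SDK) and an injectable `sleep` argument (also an assumption of this sketch) so that it runs without calling the API.

```python
# Sketch: derive the sleep interval from a requests-per-minute quota and
# split the input into batches. `fake_embed` is a hypothetical stand-in for
# model.get_embeddings; it is not part of the Vertex AI SDK.
import time


def min_interval_seconds(requests_per_minute):
    """Smallest delay between calls that stays within the quota."""
    return 60.0 / requests_per_minute


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


def fake_embed(texts):
    """Return one dummy 2-dimensional vector per text."""
    return [[float(len(t)), 1.0] for t in texts]


def embed_all(texts, batch_size=5, requests_per_minute=60, sleep=time.sleep):
    """Embed all texts batch by batch, pausing before each request."""
    interval = min_interval_seconds(requests_per_minute)
    vectors = []
    for batch in batches(texts, batch_size):
        sleep(interval)  # throttle before each request
        vectors.extend(fake_embed(batch))
    return vectors


texts = [f"text {i}" for i in range(12)]
# Pass a no-op sleep so the demonstration finishes immediately.
vectors = embed_all(texts, sleep=lambda seconds: None)
print(len(vectors), min_interval_seconds(60), min_interval_seconds(600))
```

With a quota of 600 requests per minute the same formula gives a 0.1 second interval, so projects with usage history could shorten the sleep accordingly.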

The following cells get embeddings for the `Requirement` text in each DataFrame and add them as a new column `embedding`. With 20 batches per file and a one-second sleep per batch, each cell takes roughly half a minute.

In [24]:
# get embeddings for the new requirements and add them as "embedding" column
new_df = new_df.assign(embedding=get_embeddings_wrapper(list(new_df.Requirement)))
new_df.head()

100%|██████████| 20/20 [00:34<00:00,  1.71s/it]


Unnamed: 0,Requirement ID,Requirement,embedding
0,ID_NEW_1,"The sun set behind the towering mountains, cas...","[-0.06087258830666542, 0.02693977579474449, -0..."
1,ID_NEW_2,She smiled warmly as she received the unexpect...,"[0.021243004128336906, 0.052161142230033875, -..."
2,ID_NEW_3,"The cat purred contentedly on the windowsill, ...","[0.007877746596932411, 0.0007245641900226474, ..."
3,ID_NEW_4,"Rain tapped gently against the roof, creating ...","[-0.02790035679936409, 0.008891325443983078, 9..."
4,ID_NEW_5,He finished the marathon with a triumphant che...,"[-0.0031797306146472692, 0.003588359570130706,..."


In [35]:
# get embeddings for the old requirements and add them as "embedding" column
old_df = old_df.assign(embedding=get_embeddings_wrapper(list(old_df.Requirement)))
old_df.head()

100%|██████████| 20/20 [00:32<00:00,  1.63s/it]


Unnamed: 0,Requirement ID,Requirement,embedding
0,ID_OLD_1,She found solace watching the sunset from her ...,"[-0.056000467389822006, -0.023922106251120567,..."
1,ID_OLD_2,The background was filled with the steady tick...,"[-0.008554883301258087, 0.012233082205057144, ..."
2,ID_OLD_3,The kitchen was enveloped in the enticing arom...,"[-0.03560804948210716, 0.019117780029773712, 0..."
3,ID_OLD_4,"The room was quiet, save for the gentle hum of...","[0.02677360363304615, -0.014026092365384102, 0..."
4,ID_OLD_5,"By the river's edge, the old man quietly fishe...","[0.023639937862753868, 0.0027894822414964437, ..."


## Look at the embedding similarities

Let's see how these embeddings are organized by meaning in the embedding space by calculating the similarities between them and sorting the results.

As embeddings are vectors, you can calculate the similarity between two embeddings with one of the popular metrics shown in the following figure:

![](https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/8.png)

Which metric should we use? It usually depends on how each model is trained. In the case of the model `textembedding-gecko@001`, we need to use the inner product (dot product).
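As a small worked example, the sketch below computes three commonly used metrics (dot product, cosine similarity and Euclidean distance) for two toy 2-dimensional vectors in plain Python. The vectors and function names are illustrative assumptions, not values from the model.

```python
# Sketch: three common similarity/distance metrics on toy vectors.
import math


def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot_product(v, v))
    return [x / length for x in v]


a = [3.0, 4.0]
b = [4.0, 3.0]
print("dot product:       ", dot_product(a, b))
print("cosine similarity: ", cosine_similarity(a, b))
print("euclidean distance:", euclidean_distance(a, b))

# For unit-length vectors, the dot product equals the cosine similarity.
unit_a, unit_b = normalize(a), normalize(b)
print("dot of unit vectors:", dot_product(unit_a, unit_b))
```

Note that the dot product is sensitive to vector length while cosine similarity is not, which is one reason the right choice depends on how the model was trained.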

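The following code uses the numpy `np.dot` function to calculate the similarity between every requirement in the new file and every requirement in the old file, and keeps the best match for each new requirement. A later cell picks one requirement at random and ranks the other requirements in the same file against it.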

In [37]:
import numpy as np

# Convert the embeddings to numpy arrays for similarity calculation
new_embeddings = np.array(new_df['embedding'].tolist())
old_embeddings = np.array(old_df['embedding'].tolist())

# Calculate the similarity matrix using dot product
similarity_matrix = np.dot(new_embeddings, old_embeddings.T)

# Find the most similar requirement in the old excel for each requirement in the new excel
most_similar_indices = np.argmax(similarity_matrix, axis=1)
most_similar_scores = np.max(similarity_matrix, axis=1)

# Create the comparison dataframe
comparison_df = pd.DataFrame({
    'new_requirement_id': new_df['Requirement ID'],
    'new_requirement_text': new_df['Requirement'],
    'most_similar_requirement_text': old_df['Requirement'].iloc[most_similar_indices].values,
    'most_similar_requirement_id': old_df['Requirement ID'].iloc[most_similar_indices].values,
    'matching_percentage': most_similar_scores
})


In [39]:
comparison_df


Unnamed: 0,new_requirement_id,new_requirement_text,most_similar_requirement_text,most_similar_requirement_id,matching_percentage
0,ID_NEW_1,"The sun set behind the towering mountains, cas...","The sun cast a golden glow over the valley, pa...",ID_OLD_37,0.859175
1,ID_NEW_2,She smiled warmly as she received the unexpect...,"Unexpectedly, she received a gift from her dea...",ID_OLD_90,0.963568
2,ID_NEW_3,"The cat purred contentedly on the windowsill, ...",The contented cat purred softly on the windows...,ID_OLD_36,0.962184
3,ID_NEW_4,"Rain tapped gently against the roof, creating ...",Raindrops provided a soothing rhythm on the roof.,ID_OLD_46,0.884464
4,ID_NEW_5,He finished the marathon with a triumphant che...,Crisp fall mornings invigorated him.,ID_OLD_95,0.630703
...,...,...,...,...,...
95,ID_NEW_96,The aroma of fresh herbs filled the kitchen.,The kitchen was infused with the aromatic scen...,ID_OLD_38,0.965957
96,ID_NEW_97,She enjoyed the cool breeze on a summer evening.,She found solace watching the sunset from her ...,ID_OLD_1,0.711254
97,ID_NEW_98,The sound of waves crashing was a constant rem...,Waves crashing reminded him of the sea's power.,ID_OLD_49,0.889294
98,ID_NEW_99,He loved the crispness of a fall afternoon.,Crisp fall mornings invigorated him.,ID_OLD_95,0.835081


In [26]:
import random
import numpy as np

# pick one of them as a key question
# (randrange excludes the upper bound, so the index is always valid)
key = random.randrange(len(new_df))

# calc dot product between the key and other questions
embs = np.array(new_df.embedding.to_list())
similarities = np.dot(embs[key], embs.T)

# print similarities for the first 5 questions
similarities[:5]

array([0.526026  , 0.55304776, 0.45136549, 0.56357067, 0.4500969 ])

Finally, sort the questions by similarity and print the list.

In [28]:
# print the question
print(f"Key question: {new_df.Requirement[key]}\n")

# sort and print the questions by similarities
sorted_questions = sorted(
    zip(new_df.Requirement, similarities), key=lambda x: x[1], reverse=True
)[:20]
for i, (question, similarity) in enumerate(sorted_questions):
    print(f"{similarity:.4f} {question}")

Key question: She felt the soft sand beneath her feet at the shore.

1.0000 She felt the soft sand beneath her feet at the shore.
0.7720 The warm sand felt good under his feet.
0.6990 She enjoyed the peacefulness of the empty beach.
0.6975 She enjoyed a peaceful walk along the beach at dusk.
0.6560 He felt a sense of peace standing by the ocean.
0.6384 She enjoyed the sweet taste of the ripe fruit.
0.6374 She loved the smell of freshly cut grass in the summer.
0.6316 She found joy in the simplicity of a morning walk.
0.6293 She found comfort in the familiar scent of home.
0.6291 Waves crashed rhythmically on the shore, echoing the heartbeat of the ocean, and creating a mesmerizing, eternal dance.
0.6282 The sound of waves crashing was a constant reminder of the sea.
0.6268 The gentle lapping of the lake was peaceful.
0.6246 She enjoyed the cool breeze on a summer evening.
0.6232 She appreciated the beauty of a starry night.
0.6216 The sound of waves lapping against the boat was soothin

# Find embeddings fast with Vertex AI Vector Search

As we have explained above, you can find similar embeddings by calculating the distance or similarity between the embeddings.

But this isn't easy when you have millions or billions of embeddings. For example, a brute-force search over 1 million embeddings with 768 dimensions requires 1 million x 768 multiply-add operations for every query. This can take seconds per query, which is too slow for interactive search.
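To make the cost concrete, here is a minimal sketch of a brute-force search in plain Python. It scores every stored vector against the query, so the work grows linearly with the number of embeddings. The random vectors, their size and the helper names are assumptions for illustration only.

```python
# Sketch: brute-force nearest neighbor search scans every vector.
import heapq
import math
import random


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def brute_force_top_k(query, vectors, k):
    """Score every vector against the query and keep the k highest dot products."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), idx)
        for idx, vec in enumerate(vectors)
    )
    return heapq.nlargest(k, scored)


# Cost of one query over the example in the text.
num_embeddings = 1_000_000
dimensions = 768
print(f"multiply-adds per query: {num_embeddings * dimensions:,}")

# A much smaller random dataset keeps the demonstration fast.
random.seed(0)
small = [normalize([random.uniform(-1, 1) for _ in range(8)]) for _ in range(1000)]
top = brute_force_top_k(small[42], small, k=3)
print("best match index:", top[0][1])
```

Because the vectors are unit length, the query vector itself scores highest, which is a quick sanity check that the scan works.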

Researchers have therefore been studying a technique called [Approximate Nearest Neighbor (ANN)](https://en.wikipedia.org/wiki/Nearest_neighbor_search) for faster search. ANN uses "vector quantization" to partition the embedding space into multiple subspaces organized in a tree structure. This is similar to an index in a relational database that improves query performance, and it enables very fast and scalable search over billions of embeddings.
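The partitioning idea can be sketched with a toy example. The code below is not ScaNN and not the algorithm used by Vector Search; it is a simplified illustration, under the assumption of randomly chosen centroids, of how grouping vectors into partitions lets a query scan only a few partitions instead of the whole dataset.

```python
# Sketch: toy partition-based approximate search (illustrative only).
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]


def build_partitions(vectors, centroids):
    """Assign every vector index to its most similar centroid."""
    partitions = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        partitions[best].append(idx)
    return partitions


def ann_search(query, vectors, centroids, partitions, k, n_probe):
    """Scan only the n_probe partitions whose centroids best match the query."""
    ranked = sorted(
        range(len(centroids)), key=lambda c: dot(query, centroids[c]), reverse=True
    )
    candidates = [idx for c in ranked[:n_probe] for idx in partitions[c]]
    candidates.sort(key=lambda idx: dot(query, vectors[idx]), reverse=True)
    return candidates[:k], len(candidates)


random.seed(0)
vectors = [normalize([random.gauss(0, 1) for _ in range(16)]) for _ in range(2000)]
centroids = random.sample(vectors, 20)  # assumption: random vectors as centroids
partitions = build_partitions(vectors, centroids)

query = vectors[7]
neighbors, scanned = ann_search(query, vectors, centroids, partitions, k=5, n_probe=3)
print(f"scanned {scanned} of {len(vectors)} vectors; top hit is index {neighbors[0]}")
```

The result is approximate: a true neighbor that falls into a partition that is not probed will be missed. Raising `n_probe` trades speed for recall.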

With the rise of LLMs, ANN is rapidly gaining popularity under the name Vector Search.

![](https://storage.googleapis.com/gweb-cloudblog-publish/images/7._ANN.1143068821171228.max-2200x2200.png)

In 2020, Google Research published a new ANN algorithm called [ScaNN](https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html). It is considered one of the best ANN algorithms in the industry, and it is also the most important foundation for search and recommendation in major Google services such as Google Search, YouTube and many others.


## What is Vertex AI Vector Search?

Google Cloud developers can take full advantage of Google's vector search technology with [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/docs/vector-search/overview) (previously called Matching Engine). With this fully managed service, developers can just add embeddings to its index and issue a search query with a key embedding for blazingly fast vector search. In the case of the Stack Overflow demo, Vector Search can find relevant questions from 8 million embeddings in tens of milliseconds.

![](https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/9.png)

With Vector Search, you don't need to spend much time and money building your own vector search service from scratch or using open source tools if your goal is high scalability, availability and maintainability for production systems.

## Get Started with Vector Search

Once you have the embeddings, getting started with Vector Search is pretty easy. In this section, we will follow the steps below.

### Setting up Vector Search
- Save the embeddings in JSON files on Cloud Storage
- Build an Index
- Create an Index Endpoint
- Deploy the Index to the endpoint

### Use Vector Search

- Query with the endpoint

### **Tip for Colab users**

If you use Colab for this tutorial, you may lose your runtime while you are waiting for the Index building and deployment in the later sections as it takes tens of minutes. In that case, run the following sections again with the new instance to recover the runtime: [Install Python SDK, Environment variables and Authentication](https://colab.research.google.com/drive/1xJhLFEyPqW0qvKiERD6aYgeTHa6_U50N?resourcekey=0-2qUkxckCjt6W03AsqvZHhw#scrollTo=AtXnXhF8U-8R&line=9&uniqifier=1).

Then, use the [Utilities](https://colab.research.google.com/drive/1xJhLFEyPqW0qvKiERD6aYgeTHa6_U50N?resourcekey=0-2qUkxckCjt6W03AsqvZHhw#scrollTo=BE1tELsH-u8N&line=1&uniqifier=1) to recover the Index and Index Endpoint and continue with the rest.

### Save the embeddings in a JSON file
To load the embeddings to Vector Search, we need to save them in JSON files with JSONL format. See more information in the docs at [Input data format and structure](https://cloud.google.com/vertex-ai/docs/matching-engine/match-eng-setup/format-structure#data-file-formats).
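As an illustration of the JSONL layout, the sketch below writes one JSON object per line with an `id` and an `embedding` field, then reads the first lines back in Python (a portable alternative to the `head` command, which is not available on Windows). The IDs follow the naming used in this notebook, but the 3-dimensional vectors and the file name `sample.json` are made-up placeholders.

```python
# Sketch: write and inspect a JSONL file with "id" and "embedding" fields.
import itertools
import json
import os
import tempfile

# Placeholder records; real embeddings from the model have 768 dimensions.
records = [
    {"id": "ID_NEW_1", "embedding": [0.1, 0.2, 0.3]},
    {"id": "ID_NEW_2", "embedding": [0.4, 0.5, 0.6]},
]

path = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(path, "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one JSON object per line

# Read back at most the first 3 lines.
with open(path) as f:
    first_lines = [line.rstrip("\n") for line in itertools.islice(f, 3)]
for line in first_lines:
    print(line)
```

Each line is an independent JSON document, so the file can be processed line by line without loading it all into memory.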

First, export the `Requirement ID` column (renamed to `id`) and the `embedding` column from the DataFrame in JSONL format, and save it.

In [30]:
# save id and embedding as a JSONL file
jsonl_string = (
    new_df[["Requirement ID", "embedding"]]
    .rename(columns={"Requirement ID": "id"})
    .to_json(orient="records", lines=True)
)
with open("questions.json", "w") as f:
    f.write(jsonl_string)

# show the first few lines of the json file
#! head -n 3 questions.json



Then, create a new Cloud Storage bucket and copy the file to it.

In [31]:
BUCKET_URI = f"gs://{PROJECT_ID}-embvs-tutorial-{UID}"
! gsutil mb -l $LOCATION -p {PROJECT_ID} {BUCKET_URI}
! gsutil cp questions.json {BUCKET_URI}

### Create an Index

Now we are ready to load the embeddings into Vector Search. Its APIs are available under the [aiplatform](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform) package of the SDK.

In [32]:
# init the aiplatform package
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

Create a [MatchingEngineIndex](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.MatchingEngineIndex) with its `create_tree_ah_index` function (Matching Engine is the previous name of Vector Search).

In [34]:

# create index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=f"embvs-tutorial-index-{UID}",
    contents_delta_uri=BUCKET_URI,
    dimensions=768,
    approximate_neighbors_count=20,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)

Calling the `create_tree_ah_index` function starts building an Index. This takes a few minutes if the dataset is small, and about 50 minutes or more for larger datasets, depending on their size. You can check the status of the index creation on [the Vector Search Console > INDEXES tab](https://console.cloud.google.com/vertex-ai/matching-engine/indexes).

![](https://storage.googleapis.com/github-repo/img/embeddings/vs-quickstart/creating-index.png)

#### The parameters for creating index

- `contents_delta_uri`: The URI of the Cloud Storage directory where you stored the embedding JSON files.
- `dimensions`: The dimension size of each embedding. In this case, it is 768 as we are using the embeddings from the Text Embeddings API.
- `approximate_neighbors_count`: How many similar items we want to retrieve in typical cases.
- `distance_measure_type`: Which metric to use for measuring distance or similarity between embeddings. In this case it is `DOT_PRODUCT_DISTANCE`.

See [the document](https://cloud.google.com/vertex-ai/docs/vector-search/create-manage-index) for more details on creating Index and the parameters.

#### Batch Update or Streaming Update?
There are two types of index: Index for *Batch Update* (used in this tutorial) and Index for *Streaming Updates*. The Batch Update index can be updated with a batch process whereas the Streaming Update index can be updated in real time. The latter is better suited to use cases where you add or update individual embeddings frequently and it is crucial to serve the latest embeddings, such as e-commerce product search.



### Create Index Endpoint and deploy the Index

To use the Index, you need to create an [Index Endpoint](https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public). It works as a server instance accepting query requests for your Index.

In [None]:
# create IndexEndpoint
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=f"embvs-tutorial-index-endpoint-{UID}",
    public_endpoint_enabled=True,
)

This tutorial utilizes a [Public Endpoint](https://cloud.google.com/vertex-ai/docs/vector-search/setup/setup#choose-endpoint) and does not support [Virtual Private Cloud (VPC)](https://cloud.google.com/vpc/docs/private-services-access). Unless you have a specific requirement for VPC, we recommend using a Public Endpoint. Despite the term "public" in its name, it does not imply open access to the public internet. Rather, it functions like other endpoints in Vertex AI services, which are secured by default through IAM. Without explicit IAM permissions, as we have previously established, no one can access the endpoint.

With the Index Endpoint, deploy the Index by specifying a unique deployed index ID.

In [None]:
DEPLOYED_INDEX_ID = f"embvs_tutorial_deployed_{UID}"

In [None]:
# deploy the Index to the Index Endpoint
my_index_endpoint.deploy_index(index=my_index, deployed_index_id=DEPLOYED_INDEX_ID)

The first time you deploy an Index to an Index Endpoint, it takes around 25 minutes to automatically build and initiate the backend for it. After the first deployment, it will finish in seconds. To see the status of the index deployment, open [the Vector Search Console > INDEX ENDPOINTS tab](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints) and click the Index Endpoint.

<img src="https://storage.googleapis.com/github-repo/img/embeddings/vs-quickstart/deploying-index.png" width="70%">

### Run Query

Finally, we are ready to use Vector Search. The following code creates an embedding for a test question and finds similar questions with Vector Search.

In [None]:
test_embeddings = get_embeddings_wrapper(["How to read JSON with Python?"])

In [None]:
# Test query
response = my_index_endpoint.find_neighbors(
    deployed_index_id=DEPLOYED_INDEX_ID,
    queries=test_embeddings,
    num_neighbors=20,
)

# show the result: look up each neighbor's text by its ID
# (assumes the Index was built from new_df with "Requirement ID" as the id)
for neighbor in response[0]:
    match = new_df[new_df["Requirement ID"] == neighbor.id]
    print(f"{neighbor.distance:.4f} {match['Requirement'].values[0]}")

The `find_neighbors` function only takes milliseconds to fetch the similar items even when you have billions of items on the Index, thanks to the ScaNN algorithm. Vector Search also supports [autoscaling](https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public#autoscaling) which can automatically resize the number of nodes based on the demands of your workloads.

# IMPORTANT: Cleaning Up

In case you are using your own Cloud project, not a temporary project on Qwiklab, please make sure to delete all the Indexes, Index Endpoints and Cloud Storage buckets after finishing this tutorial. Otherwise the remaining objects would **incur unexpected costs**.

If you used Workbench, you may also need to delete the Notebooks from [the console](https://console.cloud.google.com/vertex-ai/workbench).

In [None]:
# wait for a confirmation
input("Press Enter to delete Index Endpoint, Index and Cloud Storage bucket:")

# delete Index Endpoint
my_index_endpoint.undeploy_all()
my_index_endpoint.delete(force=True)

# delete Index
my_index.delete()

# delete Cloud Storage bucket
! gsutil rm -r {BUCKET_URI}

# Summary

## Grounding LLM outputs with Vertex AI Vector Search

As we have seen, by combining the Embeddings API and Vector Search, you can use the embeddings to "ground" LLM outputs to real business data with low latency.

For example, if a user asks a question, the Embeddings API can convert it to an embedding and issue a query to Vector Search to find similar embeddings in its index. Those embeddings represent the actual business data in the databases. As we are just retrieving the business data and not generating any artificial text, there is no risk of hallucinations in the result.
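The flow described above can be sketched end to end with stand-ins. In the code below, `toy_embed` is a hypothetical replacement for the Embeddings API (it counts a few vocabulary words) and a small dictionary plays the role of the Vector Search index; the documents and vocabulary are invented for illustration. The point is only that the answer returned to the user is stored text, not generated text.

```python
# Sketch: question -> embedding -> nearest stored item -> stored text.
import math

# Hypothetical vocabulary for the toy embedding.
VOCABULARY = ["refund", "shipping", "password", "reset", "order", "delivery"]


def toy_embed(text):
    """Hypothetical stand-in for an embeddings API: unit-length word counts."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in VOCABULARY]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


# Invented business data and its in-memory "index".
DOCUMENTS = {
    "doc-1": "How to reset your password",
    "doc-2": "Shipping and delivery times for your order",
    "doc-3": "Refund policy for a cancelled order",
}
INDEX = {doc_id: toy_embed(text) for doc_id, text in DOCUMENTS.items()}


def retrieve(question, k=1):
    """Return the k stored documents most similar to the question."""
    query = toy_embed(question)
    ranked = sorted(
        INDEX,
        key=lambda doc_id: sum(q * v for q, v in zip(query, INDEX[doc_id])),
        reverse=True,
    )
    return [(doc_id, DOCUMENTS[doc_id]) for doc_id in ranked[:k]]


print(retrieve("I forgot my password"))
```

In a real system, `toy_embed` would be replaced by the embedding model and `INDEX` by a deployed Vector Search index, while the lookup of the stored text by ID stays the same.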

![](https://storage.googleapis.com/gweb-cloudblog-publish/original_images/10._grounding.png)

### The difference between the questions and answers

In this tutorial, we have used the Stack Overflow dataset. There is a reason for that choice: because the dataset has many pairs of **questions and answers**, you can find answers to your question simply by finding questions similar to it.

In many business use cases, the semantics (meaning) of questions and answers are different. Also, there could be cases where you would want to add a variety of recommended or personalized items to the results, like product search on e-commerce sites.

In these cases, simple semantic search doesn't work well. It's more like a recommendation system problem where you may want to train a model (e.g. a Two-Tower model) to learn the relationship between the question embedding space and the answer embedding space. Also, many production systems add a reranking phase after the semantic search to achieve higher search quality. Please see [Scaling deep retrieval with TensorFlow Recommenders and Vertex AI Matching Engine](https://cloud.google.com/blog/products/ai-machine-learning/scaling-deep-retrieval-tensorflow-two-towers-architecture) to learn more.

### Hybrid of semantic + keyword search

Another typical challenge you will face in production systems is supporting keyword search combined with semantic search. For example, for e-commerce product search, you may want to let users find a product by entering its product name or model number. As the LLM doesn't memorize those product names or model numbers, semantic search can't handle those "usual" search functionalities.

[Vertex AI Search](https://cloud.google.com/blog/products/ai-machine-learning/vertex-ai-search-and-conversation-is-now-generally-available) is another product you may consider for those requirements. While Vector Search provides only a simple semantic search capability, Vertex AI Search provides an integrated search solution that combines semantic search, keyword search, reranking and filtering, available as an out-of-the-box tool.
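To illustrate why combining the two signals helps, here is a minimal sketch of one simple approach: a weighted sum of a semantic score and a keyword-match score. This is not how Vertex AI Search works internally; the product names, the semantic scores (standing in for dot products from an embedding model) and the weight `alpha` are all made-up assumptions.

```python
# Sketch: weighted combination of semantic and keyword scores (illustrative).


def keyword_score(query, text):
    """Fraction of query tokens that appear verbatim in the text."""
    query_tokens = query.lower().split()
    text_tokens = set(text.lower().split())
    return sum(token in text_tokens for token in query_tokens) / len(query_tokens)


def hybrid_rank(query, items, semantic_scores, alpha=0.5):
    """Rank item IDs by alpha * semantic + (1 - alpha) * keyword score."""
    combined = {
        item_id: alpha * semantic_scores[item_id]
        + (1 - alpha) * keyword_score(query, text)
        for item_id, text in items.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Invented catalog and hypothetical semantic scores for the query "XM-300".
products = {
    "p1": "Acme wireless mouse XM-200",
    "p2": "Acme wireless mouse XM-300",
    "p3": "Bolt wired keyboard",
}
semantic_scores = {"p1": 0.71, "p2": 0.70, "p3": 0.40}

semantic_only = sorted(semantic_scores, key=semantic_scores.get, reverse=True)
hybrid = hybrid_rank("XM-300", products, semantic_scores)
print("semantic only:", semantic_only)
print("hybrid:       ", hybrid)
```

With semantic scores alone the two mice are nearly tied and the wrong model ranks first; the exact keyword match on the model number moves the intended product to the top.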

### What about Retrieval Augmented Generation (RAG)?

In this tutorial, we have looked at the simple combination of LLM embeddings and vector search. From this starting point, you may also extend the design to [Retrieval Augmented Generation (RAG)](https://www.google.com/search?q=Retrieval+Augmented+Generation+(RAG)&oq=Retrieval+Augmented+Generation+(RAG)).

RAG is a popular architecture pattern for implementing grounding with an LLM behind a text chat UI. The idea is to use the LLM text chat UI as a frontend for document retrieval with vector search and summarization of the results.

![](https://storage.googleapis.com/gweb-cloudblog-publish/images/Figure-7-Ask_Your_Documents_Flow.max-529x434.png)

There are some pros and cons between the two solutions.

| | Emb + vector search | RAG |
|---|---|---|
| Design | simple | complex |
| UI | Text search UI | Text chat UI |
| Summarization of result | No | Yes |
| Multi-turn (Context aware) | No | Yes |
| Latency | millisecs | seconds |
| Cost | lower | higher |
| Hallucinations | No risk | Some risk |

The Embedding + vector search pattern we have looked at with this tutorial provides simple, fast and low cost semantic search functionality with the LLM intelligence. RAG adds context-aware text chat experience and result summarization to it. While RAG provides the more "Gen AI-ish" experience, it also adds a risk of hallucination and higher cost and time for the text generation.

To learn more about how to build a RAG solution, you may look at [Building Generative AI applications made easy with Vertex AI PaLM API and LangChain](https://cloud.google.com/blog/products/ai-machine-learning/generative-ai-applications-with-vertex-ai-palm-2-models-and-langchain).

## Resources

To learn more, please check out the following resources:

### Documentation

[Vertex AI Embeddings for Text API documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings)

[Vector Search documentation](https://cloud.google.com/vertex-ai/docs/matching-engine/overview)

### Vector Search blog posts

[Vertex Matching Engine: Blazing fast and massively scalable nearest neighbor search](https://cloud.google.com/blog/products/ai-machine-learning/vertex-matching-engine-blazing-fast-and-massively-scalable-nearest-neighbor-search)

[Find anything blazingly fast with Google's vector search technology](https://cloud.google.com/blog/topics/developers-practitioners/find-anything-blazingly-fast-googles-vector-search-technology)

[Enabling real-time AI with Streaming Ingestion in Vertex AI](https://cloud.google.com/blog/products/ai-machine-learning/real-time-ai-with-google-cloud-vertex-ai)

[Mercari leverages Google's vector search technology to create a new marketplace](https://cloud.google.com/blog/topics/developers-practitioners/mercari-leverages-googles-vector-search-technology-create-new-marketplace)

[Recommending news articles using Vertex AI Matching Engine](https://cloud.google.com/blog/products/ai-machine-learning/recommending-articles-using-vertex-ai-matching-engine)

[What is Multimodal Search: "LLMs with vision" change businesses](https://cloud.google.com/blog/products/ai-machine-learning/multimodal-generative-ai-search)

# Utilities

Sometimes it takes tens of minutes to create or deploy Indexes and you may lose the connection with the Colab runtime. In that case, instead of creating or deploying a new Index again, you can check [the Vector Search Console](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints) and get the existing ones to continue.

## Get an existing Index

To get an Index object that already exists, replace the following `[your-index-id]` with the index ID and run the cell. You can check the ID on [the Vector Search Console > INDEXES tab](https://console.cloud.google.com/vertex-ai/matching-engine/indexes).

In [None]:
my_index_id = "[your-index-id]"  # @param {type:"string"}
my_index = aiplatform.MatchingEngineIndex(my_index_id)

## Get an existing Index Endpoint

To get an Index Endpoint object that already exists, replace the following `[your-index-endpoint-id]` with the Index Endpoint ID and run the cell. You can check the ID on [the Vector Search Console > INDEX ENDPOINTS tab](https://console.cloud.google.com/vertex-ai/matching-engine/index-endpoints).

In [None]:
my_index_endpoint_id = "[your-index-endpoint-id]"  # @param {type:"string"}
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(my_index_endpoint_id)