In the previous demo, we chunked the raw PDF document pages into small sections, computed the embeddings, and saved them as a Delta Lake table. Our dataset is now ready. 

Next, we'll configure Databricks Vector Search to ingest data from this table.

A Vector Search index uses a Vector Search endpoint to serve the embeddings (you can think of it as your Vector Search API endpoint). <br/>
Multiple indexes can use the same endpoint. Let's start by creating one.


**Learning Objectives:**

*By the end of this demo, you will be able to:*

* Set up an endpoint for Vector Search.

* Store the embeddings and their metadata using Vector Search.

* Inspect the Vector Search endpoint and index using the UI. 

* Retrieve documents from the vector store using similarity search.


## REQUIRED - SELECT CLASSIC COMPUTE
Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:
1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

2. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - Click **More** in the drop-down.

   - In the **Attach to an existing compute resource** window, use the first drop-down to select your unique cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

2. Find the triangle icon to the right of your compute cluster name and click it.

3. Wait a few minutes for the cluster to start.

4. Once the cluster is running, complete the steps above to select your cluster.

## Requirements

Please review the following requirements before starting the lesson:

* To run this notebook, you need to use one of the following Databricks runtime(s): **17.3.x-cpu-ml-scala2.13**


**🚨 Important:** This demonstration relies on the resources established in the previous one. Please ensure you have completed the prior demonstration before starting this one.


## Classroom Setup

Install required libraries.

In [0]:
%pip install -U -qqqq databricks-vectorsearch 'mlflow-skinny[databricks]==3.4.0' PyPDF2==3.0.0 databricks-sdk flashrank 
%restart_python

Before starting the demo, run the provided classroom setup script. This script will define configuration variables necessary for the demo. Execute the following cell:

In [0]:
%run ../Includes/Classroom-Setup-03

**Other Conventions:**

Throughout this demo, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"Dataset Location:  {DA.paths.datasets}")

## Demo Overview

As seen in the diagram below, in this demo we will focus on the Vector Search indexing section (highlighted in orange).  



![genai-as-01-rag-pdf-self-managed-3](../Includes/images/genai-as-01-rag-pdf-self-managed-3.png)

## Create a "Self-Managed" Vector Search Index

Setting up a Databricks Vector Search index involves a few key steps. First, decide how the vector embeddings will be provided. Databricks supports three options: 

* providing a source Delta table containing text data, and letting Databricks compute the embeddings
* **providing a source Delta table that contains pre-calculated embeddings**
* using the Direct Vector Access API to create an index that you update yourself through the REST API or SDK, instead of syncing it from a Delta table

In this demo, we will use the second option. 

Next, we will **create a vector search endpoint**. In the final step, we will **create a vector search index** from a Delta table. 




### Set Up a Vector Search Endpoint

The first step for creating a Vector Search index is to create a compute endpoint. This endpoint serves the vector search index. You can query and update the endpoint using the REST API or the SDK. 
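As a minimal sketch of this step, the helper below creates an endpoint only when no endpoint with the given name exists. The function name `ensure_endpoint` is hypothetical. It assumes that `client` behaves like a `VectorSearchClient`, that `list_endpoints()` returns a dictionary with an `endpoints` list, and that `create_endpoint` accepts `name` and `endpoint_type`.

```python
def ensure_endpoint(client, name, endpoint_type="STANDARD"):
    """Create the endpoint if no endpoint with this name exists yet.

    `client` is assumed to behave like
    databricks.vector_search.client.VectorSearchClient.
    Returns True if a create request was sent, False if the endpoint existed.
    """
    # Assumption: the listing is a dict with an "endpoints" list of dicts.
    existing = client.list_endpoints().get("endpoints", [])
    if any(ep.get("name") == name for ep in existing):
        return False
    client.create_endpoint(name=name, endpoint_type=endpoint_type)
    return True
```

With the `VectorSearchClient` instance used in this demo, the call would look like `ensure_endpoint(vsc, vs_endpoint_name)`. In this lab the endpoint is expected to exist already, so the call would typically return `False`.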

**🚨 IMPORTANT:** Vector Search endpoints must be created before running the rest of the demo.

In [0]:
# assign vs search endpoint by username
# vs_endpoint_prefix = "vs_endpoint_"
# vs_endpoint_name = vs_endpoint_prefix+str(get_fixed_integer(DA.unique_name("_")))

vs_endpoint_name = "vs_endpoint_cetpa"

print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")

In [0]:
from databricks.vector_search.client import VectorSearchClient
from databricks.sdk import WorkspaceClient
import databricks.sdk.service.catalog as c

vsc = VectorSearchClient(disable_notice=True)

# check the status of the endpoint
wait_for_vs_endpoint_to_be_ready(vsc, vs_endpoint_name)
print(f"Endpoint named {vs_endpoint_name} is ready.")

### View the Endpoint

After the endpoint is created, you can view your endpoint on the [Vector Search Endpoints UI](#/setting/clusters/vector-search). Click on the endpoint name to see all indexes that are served by the endpoint.

### Connect Delta Table with Vector Search Endpoint

After creating the endpoint, we can create the **vector search index**. The vector search index is created from a Delta table and is optimized to provide real-time approximate nearest neighbor searches. The goal of the search is to identify documents that are similar to the query. 
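To make the idea of a nearest neighbor search concrete, here is a small exact (brute-force) version in plain Python. It is only an illustration: the service performs an approximate search over a much larger index, and the Euclidean distance used here is one common choice, not something this demo specifies. The document ids and vectors are made up.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, vectors, k=2):
    """Return the ids of the k vectors closest to `query`, nearest first."""
    ranked = sorted(vectors.items(), key=lambda item: l2_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy 2-dimensional "embeddings" (hypothetical data).
docs = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.9, 0.1],
    "doc_c": [0.0, 1.0],
}
top = nearest_neighbors([1.0, 0.05], docs, k=2)
print(top)  # ['doc_a', 'doc_b']
```

An approximate index trades a small amount of accuracy for speed by avoiding the full scan that `sorted` performs here.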

**Vector search indexes appear in and are governed by Unity Catalog.**

In [0]:
# the table we'd like to index
source_table_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_embeddings"

# where we want to store our index
vs_index_fullname = f"{DA.catalog_name}.{DA.schema_name}.pdf_text_self_managed_vs_index"

# create or sync the index
if not index_exists(vsc, vs_endpoint_name, vs_index_fullname):
  print(f"Creating index {vs_index_fullname} on endpoint {vs_endpoint_name}...")
  vsc.create_delta_sync_index(
    endpoint_name=vs_endpoint_name,
    index_name=vs_index_fullname,
    source_table_name=source_table_fullname,
    pipeline_type="TRIGGERED",  # sync must be triggered manually
    primary_key="id",
    embedding_dimension=1024,  # must match the embedding model's output size (gte)
    embedding_vector_column="embedding"
  )
else:
  # trigger a sync to update our vs content with the new data saved in the table
  vsc.get_index(vs_endpoint_name, vs_index_fullname).sync()

# let's wait for the index to be ready and all our embeddings to be created and indexed
wait_for_index_to_be_ready(vsc, vs_endpoint_name, vs_index_fullname)

## Search for Similar Content

That's all we have to do. Because the index was created with `pipeline_type="TRIGGERED"`, new entries in your Delta Lake table are picked up the next time a sync is triggered, as the `sync()` call in the cell above does.

Note that, depending on your dataset size and model size, the index can take a few seconds to start and additional time to index your embeddings.
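The `wait_for_index_to_be_ready` helper used above comes from the classroom setup script, and its implementation is not shown here. As a rough sketch of what such polling could look like, the function below repeatedly calls `describe()` on an index object. The function name, the timeout values, and the assumption that the response carries a `status` dictionary with a `ready` flag are all illustrative.

```python
import time

def wait_until_index_ready(index, timeout_s=600, poll_s=10, sleep=time.sleep):
    """Poll index.describe() until its status reports ready, or raise on timeout.

    Assumption: describe() returns a dict like {"status": {"ready": bool, ...}}.
    """
    waited = 0
    while True:
        status = index.describe().get("status", {})
        if status.get("ready"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"Index not ready after {timeout_s}s: {status}")
        sleep(poll_s)
        waited += poll_s
```

In the notebook this could be called as `wait_until_index_ready(vsc.get_index(vs_endpoint_name, vs_index_fullname))`. The `sleep` argument exists only so the wait can be replaced in tests.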

Let's give it a try and search for similar content.

**📌 Note:** `similarity_search` also supports a `filters` parameter. This is useful for adding a security layer to your RAG system: you can filter out sensitive content based on who is making the call (for example, filtering on a specific department based on the user).
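A filtered query might look like the sketch below. It assumes the dictionary filter syntax, in which a list value matches any of the listed values, and it filters on the `pdf_name` column used later in this demo. The helper name `search_with_filter` and any file names passed to it are hypothetical.

```python
def search_with_filter(index, query_vector, pdf_names, num_results=5):
    """Run a similarity search restricted to rows whose pdf_name is in `pdf_names`.

    `index` is assumed to be the object returned by vsc.get_index(...).
    """
    return index.similarity_search(
        query_vector=query_vector,
        columns=["pdf_name", "content"],
        # Assumption: a list value means "match any of these values".
        filters={"pdf_name": list(pdf_names)},
        num_results=num_results,
    )
```

In a real application, the list of allowed values would be derived from the caller's identity rather than passed in by the caller.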


In [0]:
import mlflow.deployments

deploy_client = mlflow.deployments.get_deploy_client("databricks")
question = "How Generative AI impacts humans?"
response = deploy_client.predict(endpoint="databricks-gte-large-en", inputs={"input": [question]})
embeddings = [e["embedding"] for e in response.data]
print(embeddings)

In [0]:
from pprint import pprint

# get the 5 most similar documents
results = vsc.get_index(vs_endpoint_name, vs_index_fullname).similarity_search(
  query_vector=embeddings[0],
  columns=["pdf_name", "content"],
  num_results=5)

# format result to align with reranker lib format. 
passages = []
for doc in results.get("result", {}).get("data_array", []):
    new_doc = {"file": doc[0], "text": doc[1]}
    passages.append(new_doc)

pprint(passages)
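The loop above addresses columns by position. As an alternative sketch, the helper below pairs each row with its column names. It assumes the response also contains a `manifest` entry listing the returned columns, with a similarity `score` as the last column. The sample response is hand-written for illustration and is not real output.

```python
def rows_to_dicts(results):
    """Turn a similarity_search response into a list of {column_name: value} dicts."""
    # Assumption: results["manifest"]["columns"] is a list of {"name": ...} dicts.
    columns = [col["name"] for col in results.get("manifest", {}).get("columns", [])]
    rows = results.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in rows]

# Hand-written sample in the assumed response shape (hypothetical values).
sample = {
    "manifest": {"columns": [{"name": "pdf_name"}, {"name": "content"}, {"name": "score"}]},
    "result": {"data_array": [["a.pdf", "some text", 0.71]]},
}
rows = rows_to_dicts(sample)
```

Keeping the score alongside the text makes it easier to compare the original retrieval order with the order produced by a re-ranker.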

## Re-ranking Search Results

For re-ranking the results, we will use a lightweight library. [**`flashrank`**](https://github.com/PrithivirajDamodaran/FlashRank) is an open-source re-ranking library based on SoTA cross-encoders. The library supports multiple models, and in this example we will use the `rank-T5-flan` model. 

After re-ranking you can review the results to check if the order of the results has changed. 
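One way to check this is to list each passage with its position before and after re-ranking. The helper below is a small sketch that assumes every passage is a dictionary with a unique `text` value; the sample passages and scores are made up.

```python
def rank_changes(original, reranked, key="text"):
    """Return (text, old_position, new_position) for each re-ranked passage."""
    old_pos = {p[key]: i for i, p in enumerate(original)}
    return [(p[key], old_pos[p[key]], new_i) for new_i, p in enumerate(reranked)]

# Hypothetical passages: retrieval order versus re-ranked order.
original = [{"text": "A"}, {"text": "B"}, {"text": "C"}]
reranked = [
    {"text": "C", "score": 0.9},
    {"text": "A", "score": 0.5},
    {"text": "B", "score": 0.1},
]
changes = rank_changes(original, reranked)
print(changes)  # [('C', 2, 0), ('A', 0, 1), ('B', 1, 2)]
```

In this notebook, the equivalent call after the re-ranking cell would be `rank_changes(passages, results)`. Passages with identical text would collide in the lookup, so a unique id is a safer key when one is available.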

**💡 Note:** Re-ranking order varies based on the model used!

In [0]:
from flashrank import Ranker, RerankRequest

# Ensure the model file exists at this path or update the path accordingly
cache_dir = f"{DA.paths.working_dir}/opt"

ranker = Ranker(model_name="rank-T5-flan", cache_dir=cache_dir)

rerankrequest = RerankRequest(query=question, passages=passages)
results = ranker.rerank(rerankrequest)
print(*results[:3], sep="\n\n")


## Clean up Classroom

**🚨 Warning:** Please refrain from deleting tables created in this demo, as they are required for upcoming demonstrations. To clean up the classroom assets, execute the classroom clean-up script provided in the final demo.


## Conclusion

In this demo, the objective was to store pre-computed document embeddings in Vector Search and retrieve them. The first step was to set up a Vector Search index, which required a compute endpoint and an index synchronized with a source Delta table. We then queried the index with a sample question and re-ranked the retrieved passages.