# Vector Search Python SDK example usage

This notebook demonstrates how to use the Vector Search Python SDK, which provides `VectorSearchClient` as the primary API for working with Vector Search.
Alternatively, you can call the REST API directly.
For additional documentation, see:
- https://learn.microsoft.com/en-us/azure/databricks/generative-ai/create-query-vector-search
- https://www.databricks.com/blog/introducing-databricks-vector-search-public-preview

**Requirements**: This notebook assumes the following:
- A Unity Catalog-enabled workspace.
- Serverless compute enabled.
- Change Data Feed enabled on the source table (required if you are reading a preexisting table).
- `CREATE TABLE` privileges on the catalog schema(s) where the indexes will be created.
- Personal access tokens enabled.

In [None]:
%pip install --upgrade --force-reinstall databricks-vectorsearch
dbutils.library.restartPython()

In [None]:
from databricks.vector_search.client import VectorSearchClient

# Automatically generates a personal access token (PAT) for authentication
vsc = VectorSearchClient()

# Alternatively, authenticate as a service principal
# vsc = VectorSearchClient(service_principal_client_id="<CLIENT_ID>", service_principal_client_secret="<CLIENT_SECRET>")
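Hard-coding a client secret in a notebook exposes it to anyone who can read the notebook. The sketch below assumes the service principal credentials are available as environment variables; the variable names `VS_SP_CLIENT_ID` and `VS_SP_CLIENT_SECRET` and the helper `service_principal_kwargs` are hypothetical. In a Databricks workspace you would more likely read them from a secret scope with `dbutils.secrets.get`.

```python
import os
from typing import Mapping, Optional


def service_principal_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build keyword arguments for VectorSearchClient from environment variables.

    The variable names are hypothetical; adjust them to your setup.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("VS_SP_CLIENT_ID", "VS_SP_CLIENT_SECRET") if not env.get(k)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "service_principal_client_id": env["VS_SP_CLIENT_ID"],
        "service_principal_client_secret": env["VS_SP_CLIENT_SECRET"],
    }

# Usage (in the notebook):
# vsc = VectorSearchClient(**service_principal_kwargs())
```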

## Load the sample product dataset into a source Delta table

The following cells create the source Delta table.

In [None]:
source_catalog = "vector_database"
source_schema = "vector_search"
source_table = "product"
source_table_fullname = f"{source_catalog}.{source_schema}.{source_table}"

In [None]:
# Uncomment to start from scratch by dropping the existing table.
# spark.sql(f"DROP TABLE IF EXISTS {source_table_fullname}")

In [None]:
# Mount the ADLS storage location first, then replace the placeholder below with your file path.
source_df = spark.read.option("multiline", "true").json("dbfs:/mnt/<yourspecificfilepath>/product_docs.json")
display(source_df)
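Spark's JSON reader expects JSON Lines by default (one complete JSON object per line); setting `multiline` to `true` lets it parse records that span several lines, such as a pretty-printed array. If you are unsure which layout your file uses, a rough check like the hypothetical `looks_like_json_lines` helper below, run on a small text sample of the file, can tell you whether the option is needed.

```python
import json


def looks_like_json_lines(sample: str) -> bool:
    """Return True if every non-empty line in `sample` parses as a JSON object.

    A heuristic only: a pretty-printed file (one record spread over several
    lines, or a top-level array) fails this check and needs multiline=true.
    """
    lines = [line for line in sample.splitlines() if line.strip()]
    if not lines:
        return False
    for line in lines:
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            return False
        if not isinstance(value, dict):
            return False
    return True
```

In a notebook you could pass it the first few kilobytes returned by `dbutils.fs.head`, keeping in mind that a truncated last line will make the check fail.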

In [None]:
# Create a Delta table in Unity Catalog with Change Data Feed enabled
source_df.write.format("delta").option("delta.enableChangeDataFeed", "true").saveAsTable(source_table_fullname)
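If the source table already exists (the preexisting-table case in the requirements), Change Data Feed has to be switched on with `ALTER TABLE ... SET TBLPROPERTIES`, and it is worth confirming the property before creating a Delta Sync index. The hypothetical helpers below only build the SQL strings and interpret the `SHOW TBLPROPERTIES` result (rows of `key`, `value`); you would run the statements with `spark.sql`.

```python
def enable_cdf_sql(table_fullname: str) -> str:
    """SQL that turns on Change Data Feed for an existing Delta table."""
    return f"ALTER TABLE {table_fullname} SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"


def show_cdf_sql(table_fullname: str) -> str:
    """SQL that returns the current value of the Change Data Feed property."""
    return f"SHOW TBLPROPERTIES {table_fullname} ('delta.enableChangeDataFeed')"


def cdf_enabled(rows) -> bool:
    """Interpret SHOW TBLPROPERTIES output given as (key, value) pairs."""
    props = {key: value for key, value in rows}
    return str(props.get("delta.enableChangeDataFeed", "false")).lower() == "true"

# Usage (in the notebook):
# spark.sql(enable_cdf_sql(source_table_fullname))
# rows = [(r["key"], r["value"]) for r in spark.sql(show_cdf_sql(source_table_fullname)).collect()]
# assert cdf_enabled(rows)
```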

In [None]:
# display(spark.sql(f"SELECT * FROM {source_table_fullname}"))


## Create Vector Search Endpoint

In [None]:
vector_search_endpoint_name = "vector-search-demo-endpoint"

In [None]:
vsc.create_endpoint(
    name=vector_search_endpoint_name,
    endpoint_type="STANDARD"
)
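Re-running the cell above typically fails once an endpoint with the same name exists. A minimal sketch that checks `vsc.list_endpoints()` first; it assumes the response is a dict with an `endpoints` list whose items carry a `name` key, so verify the shape against your SDK version. `endpoint_exists` is a hypothetical helper.

```python
def endpoint_exists(list_response: dict, name: str) -> bool:
    """Check a list_endpoints() response for an endpoint called `name`.

    Assumes the response shape {"endpoints": [{"name": ...}, ...]}.
    """
    return any(e.get("name") == name for e in list_response.get("endpoints", []))

# Usage (in the notebook):
# if not endpoint_exists(vsc.list_endpoints(), vector_search_endpoint_name):
#     vsc.create_endpoint(name=vector_search_endpoint_name, endpoint_type="STANDARD")
```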

In [None]:
# Check the endpoint's status (the index does not exist yet at this point)
endpoint = vsc.get_endpoint(vector_search_endpoint_name)
endpoint


**Please wait for the endpoint to be created before moving to the next step.**
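Rather than re-running the status check by hand, you can poll until the endpoint reports an `ONLINE` state. The sketch below is a generic helper (the name `wait_until` is hypothetical) with a timeout, so a stuck deployment does not block the notebook indefinitely. The usage comment assumes `get_endpoint` returns a dict with `endpoint_status.state`; check the field names against your SDK version.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 1800,
    interval_s: float = 10,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Call `condition` every `interval_s` seconds until it returns True.

    Raises TimeoutError if it is still False after `timeout_s` seconds.
    """
    deadline = clock() + timeout_s
    while not condition():
        if clock() >= deadline:
            raise TimeoutError(f"Condition not met within {timeout_s} seconds")
        sleep(interval_s)

# Usage (in the notebook):
# wait_until(lambda: vsc.get_endpoint(vector_search_endpoint_name)
#            .get("endpoint_status", {}).get("state", "").upper() == "ONLINE")
```

Injecting `sleep` and `clock` keeps the helper easy to test without real waiting.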

## Create Vector Index

In [None]:
# Vector index name and the embedding model endpoint used to compute embeddings
vs_index = "product_vsindex"
vs_index_fullname = f"{source_catalog}.{source_schema}.{vs_index}"
embedding_model_endpoint = "databricks-bge-large-en"
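`create_delta_sync_index` needs the source table to contain the columns passed as `primary_key` (`id` here) and `embedding_source_column` (`content` here). A quick pre-check against the table's column list (in PySpark, `DataFrame.columns` is a list of names) catches a mismatch before the index pipeline starts; the helper name is hypothetical.

```python
from typing import Iterable, List


def missing_columns(columns: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `columns` (case-sensitive)."""
    present = set(columns)
    return [name for name in required if name not in present]

# Usage (in the notebook):
# missing = missing_columns(spark.table(source_table_fullname).columns, ["id", "content"])
# assert not missing, f"Source table is missing columns: {missing}"
```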


In [None]:
index = vsc.create_delta_sync_index(
  endpoint_name=vector_search_endpoint_name,
  source_table_name=source_table_fullname,
  index_name=vs_index_fullname,
  pipeline_type='TRIGGERED',
  primary_key="id",
  embedding_source_column="content",
  embedding_model_endpoint_name=embedding_model_endpoint
)
index.describe()


**Please wait for the vector index to be created before moving to the next step.**

## Get a vector index

Use the `get_index()` method to retrieve the vector index object by name. Call `describe()` on the index object to see a summary of the index's configuration.
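The dictionary returned by `describe()` is verbose. The sketch below pulls out the fields most useful while waiting for an index. It assumes the status sits under a `status` key with `detailed_state`, `ready`, and `indexed_row_count`, falling back to an `index_status`/`status` layout; both shapes should be checked against your SDK version, and `summarize_index` is a hypothetical name.

```python
def summarize_index(desc: dict) -> dict:
    """Extract name, state, readiness, and row count from an index description."""
    # Newer layout: desc["status"]; older layout: desc["index_status"]
    status = desc.get("status", desc.get("index_status", {}))
    return {
        "name": desc.get("name"),
        "state": status.get("detailed_state", status.get("status", "UNKNOWN")),
        "ready": status.get("ready"),
        "indexed_row_count": status.get("indexed_row_count"),
    }

# Usage (in the notebook):
# summarize_index(index.describe())
```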

In [None]:
index = vsc.get_index(endpoint_name=vector_search_endpoint_name, index_name=vs_index_fullname)
index.describe()

In [None]:
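# Wait for the index to come online. Expect this command to take several minutes.
import time

def index_state(desc):
    # Depending on the SDK version, the state is under status.detailed_state or index_status.status
    status = desc.get("status", desc.get("index_status", {}))
    return status.get("detailed_state", status.get("status", "UNKNOWN"))

while not index_state(index.describe()).startswith("ONLINE"):
    print("Waiting for index to be ONLINE...")
    time.sleep(5)
print("Index is ONLINE")
index.describe()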
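Once the index is online you can query it with `similarity_search`, passing the query text (embedded with the index's model endpoint), the columns to return, and the number of results. The sketch below turns the response into one dict per row. It assumes column names arrive under `manifest.columns` and rows under `result.data_array`, with the similarity score appended as the last column; verify the shape against your SDK version. The query string and the `search_rows` helper are illustrative.

```python
from typing import Any, Dict, List


def search_rows(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert a similarity_search response into one dict per matching row.

    Assumes {"manifest": {"columns": [{"name": ...}, ...]},
             "result": {"data_array": [[...], ...]}}.
    """
    names = [col["name"] for col in response.get("manifest", {}).get("columns", [])]
    rows = response.get("result", {}).get("data_array", []) or []
    return [dict(zip(names, row)) for row in rows]

# Usage (in the notebook; the query text is only an example):
# response = index.similarity_search(
#     query_text="waterproof hiking boots",
#     columns=["id", "content"],
#     num_results=3,
# )
# for row in search_rows(response):
#     print(row)
```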

## Delete Vector Index

In [None]:
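# Uncomment to delete the index once you no longer need it
# vsc.delete_index(endpoint_name=vector_search_endpoint_name, index_name=vs_index_fullname)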