# Vector Search Python SDK example usage

This notebook demonstrates usage of the Vector Search Python SDK, which provides a `VectorSearchClient` as a primary API for working with Vector Search.

Alternatively, you may call the REST API directly.
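
As a rough sketch of the REST alternative, the snippet below only builds the HTTP request for a text query and does not send it. The endpoint path `/api/2.0/vector-search/indexes/{index_name}/query`, the body fields, and bearer-token authentication are assumptions based on the public Databricks REST API and should be checked against the API reference for your workspace. The helper name `build_query_request`, the workspace URL, and the token are hypothetical placeholders.

```python
import json
import urllib.request


def build_query_request(workspace_url, token, index_name, query_text, columns, num_results=5):
    """Build (but do not send) a POST request that queries a vector index over REST."""
    # Assumed endpoint path; verify against the REST API reference.
    url = f"{workspace_url.rstrip('/')}/api/2.0/vector-search/indexes/{index_name}/query"
    payload = {"query_text": query_text, "columns": columns, "num_results": num_results}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical workspace URL and token, shown only to illustrate the call shape.
req = build_query_request(
    "https://example.cloud.databricks.com",
    "<TOKEN>",
    "vector_database.vector_search.product_vsindex",
    "Databases",
    ["id", "title", "category"],
)
# To send it, pass req to urllib.request.urlopen and parse the JSON response body.
```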

**Pre-req**: This notebook assumes you have already created a Model Serving endpoint for the embedding model, as well as the vector index. See the companion notebook for creating endpoints.

## Similarity search

Query the Vector Index to find similar documents!

In [None]:
%pip install --upgrade --force-reinstall databricks-vectorsearch
dbutils.library.restartPython()

In [None]:
from databricks.vector_search.client import VectorSearchClient
# Automatically generates a PAT Token for authentication
vsc = VectorSearchClient()

# Alternatively, authenticate with a service principal (uncomment and fill in the placeholders)
# vsc = VectorSearchClient(service_principal_client_id=<CLIENT_ID>, service_principal_client_secret=<CLIENT_SECRET>)

In [None]:
source_catalog = "vector_database"
source_schema = "vector_search"
source_table = "product"
source_table_fullname = f"{source_catalog}.{source_schema}.{source_table}"
vs_index = "product_vsindex"
vector_search_endpoint_name = "vector-search-demo-endpoint"
vs_index_fullname = f"{source_catalog}.{source_schema}.{vs_index}"

In [None]:
index = vsc.get_index(endpoint_name=vector_search_endpoint_name, index_name=vs_index_fullname)
index.describe()

### Performing Similarity Search and converting the results to a dataframe

In [None]:
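from pyspark.sql.functions import asc, col
all_columns = spark.table(source_table_fullname).columns

# Query the index by text and return every column of the source table
results = index.similarity_search(
  query_text="Databases",
  columns=all_columns)

# Each row of data_array holds the requested columns followed by one extra value, named `distance` here
ls_results = results.get('result').get('data_array')
# NOTE: this schema assumes the source table's columns are category, comment, id, title, in that order
df = spark.createDataFrame(data=ls_results, schema="category STRING, comment STRING, id STRING, title STRING, distance STRING")
# Cast distance to double so that sorting on it is numeric
df = df.withColumn('distance', col('distance').cast('double'))
#display(df)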
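
The schema string above is hardcoded, so it breaks if the table's columns change. A possible alternative, sketched here in plain Python, reads the column names from the response itself. It assumes the response dictionary carries a `manifest` entry with a `columns` list of `{"name": ...}` items next to `result.data_array`; that shape is an assumption to confirm with `print(results)`. The helper name `rows_from_response` and the sample data are hypothetical.

```python
def rows_from_response(response):
    """Turn a similarity-search response into a list of dicts keyed by column name."""
    # Column names come from the (assumed) manifest section of the response.
    names = [c["name"] for c in response["manifest"]["columns"]]
    # An empty or missing result yields an empty list instead of raising.
    rows = response.get("result", {}).get("data_array") or []
    return [dict(zip(names, row)) for row in rows]


# Hand-written sample in the assumed response shape (not real query output).
sample = {
    "manifest": {"columns": [{"name": "id"}, {"name": "title"}, {"name": "score"}]},
    "result": {"data_array": [["1", "Intro to SQL", 0.71], ["2", "Graph stores", 0.64]]},
}
rows = rows_from_response(sample)
```

In the notebook, `rows_from_response(results)` would take the place of the manual `data_array` extraction, and the resulting list of dicts could then be handed to `spark.createDataFrame`.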

### Returning the best five search results

In [None]:
# Sort on distance first, then keep only the columns to display
df_result = df.sort(asc('distance')).limit(5).select('title', 'category')
display(df_result)

## Delete vector index

In [None]:
# Uncomment to delete the index once it is no longer needed
# vsc.delete_index(endpoint_name=vector_search_endpoint_name, index_name=vs_index_fullname)