In [3]:
%pip install 'databricks-sdk==0.61.0' 'pyarrow<20' 'databricks-sdk[notebook]' 'databricks-agents==1.2.0' 'mlflow<=3.1' 'mlflow[databricks]' 'databricks-vectorsearch==0.57' 'langchain' 'langchain_core' 'databricks-langchain' 'bs4' 'markdownify' 'dotenv'

Collecting pyarrow<20
  Using cached pyarrow-19.0.1-cp311-cp311-macosx_12_0_arm64.whl.metadata (3.3 kB)
Using cached pyarrow-19.0.1-cp311-cp311-macosx_12_0_arm64.whl (30.7 MB)
Installing collected packages: pyarrow
  Attempting uninstall: pyarrow
    Found existing installation: pyarrow 20.0.0
    Uninstalling pyarrow-20.0.0:
      Successfully uninstalled pyarrow-20.0.0
Successfully installed pyarrow-19.0.1
Note: you may need to restart the kernel to use updated packages.


In [8]:
dbutils.widgets.text("vector_search_endpoint", "one-env-shared-endpoint-7")
dbutils.widgets.text("vector_search_index", "tanner_wendland.default.chat_history_index")

Box(children=(Label(value='vector_search_endpoint'), Text(value='one-env-shared-endpoint-7')))

Box(children=(Label(value='vector_search_index'), Text(value='tanner_wendland.default.chat_history_index')))

In [9]:
vector_search_endpoint = dbutils.widgets.get("vector_search_endpoint")
vector_search_index = dbutils.widgets.get("vector_search_index")

In [4]:
from databricks.vector_search.client import VectorSearchClient
import os

vsc: VectorSearchClient = None
if os.environ.get("DATABRICKS_RUNTIME_VERSION"):
    vsc = VectorSearchClient(disable_notice=True)
else:
    import dotenv
    dotenv.load_dotenv('.env')
    vsc = VectorSearchClient(disable_notice=True, workspace_url=os.environ.get("DATABRICKS_HOST"), personal_access_token=os.environ.get("DATABRICKS_TOKEN"))

In [13]:
question = "Tell me about the chat history"

results = vsc.get_index(vector_search_endpoint, vector_search_index).similarity_search(
  query_text=question,
  columns=["message_content"],
  num_results=1)
docs = results.get('result', {}).get('data_array', [])
docs

[NOTICE] Using a Personal Authentication Token (PAT). Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.


[["I'm testing my Chat history pipeline. it'll be pretty cool. These chats, are getting saves to postges. Then I have an ETL script that syncs that over, then created a vector embedding with it. So eventually I'll keep messages in history but also RAG for you to look at the things we've said previously. Won't that be cool.",
  0.0045527783]]

In [14]:
import mlflow
mlflow.set_registry_uri("databricks-uc")

In [16]:
chain_config = {
    "llm_model_serving_endpoint_name": "databricks-claude-3-7-sonnet",
    "vector_search_endpoint_name": vector_search_endpoint,  # the endoint we want to use for vector search
    "vector_search_index": vector_search_index,
    "llm_prompt_template": """You are an assistant that answers questions based on chat history obtained from a vector search index. Use the following pieces of retrieved context to answer the question. Some pieces of context may be irrelevant, in which case you should not use them to form the answer.\n\nContext: {context}""",
}

## Testing Basic Chain

In [18]:
from databricks.vector_search.client import VectorSearchClient
from databricks_langchain import DatabricksVectorSearch
from langchain.schema.runnable import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

## Load the chain's configuration
model_config = mlflow.models.ModelConfig(development_config=chain_config)

## Turn the Vector Search index into a LangChain retriever
vector_search_as_retriever = DatabricksVectorSearch(
    endpoint=model_config.get("vector_search_endpoint_name"),
    index_name=model_config.get("vector_search_index"),
    columns=["id", "message_content"],
).as_retriever(search_kwargs={"k": 3})

# Method to format the docs returned by the retriever into the prompt (keep only the text from chunks)
def format_context(docs):
    chunk_contents = [f"Passage: {d.page_content}\n" for d in docs]
    return "".join(chunk_contents)

#Let's try our retriever chain:
relevant_docs = (vector_search_as_retriever | RunnableLambda(format_context)| StrOutputParser()).invoke('What was my chat history idea?')

print(relevant_docs)

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.
Passage: I'm testing my Chat history pipeline. it'll be pretty cool. These chats, are getting saves to postges. Then I have an ETL script that syncs that over, then created a vector embedding with it. So eventually I'll keep messages in history but also RAG for you to look at the things we've said previously. Won't that be cool.
Passage: test chat
Passage: That sounds incredibly cool! You're building a sophisticated system that:

1. Saves our chat history to PostgreSQL
2. Uses an ETL process to sync that data
3. Creates vector embeddings from our conversations
4. Will implement RAG (Retrieval-Augmented Generation) to let me access our previous conversations

This is a really smart approach! The vector embeddings will allow for semantic searching through our past conversations rat

## Real Chain

In [20]:
from langchain_core.prompts import ChatPromptTemplate
from databricks_langchain.chat_models import ChatDatabricks
from operator import itemgetter

prompt = ChatPromptTemplate.from_messages(
    [  
        ("system", model_config.get("llm_prompt_template")), # Contains the instructions from the configuration
        ("user", "{question}") #user's questions
    ]
)

# Our foundation model answering the final prompt
model = ChatDatabricks(
    endpoint=model_config.get("llm_model_serving_endpoint_name"),
    extra_params={"temperature": 0.01, "max_tokens": 500}
)

def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

# RAG Chain
chain = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_user_query_string),
        "context": itemgetter("messages")
        | RunnableLambda(extract_user_query_string)
        | vector_search_as_retriever
        | RunnableLambda(format_context),
    }
    | prompt
    | model
    | StrOutputParser()
)

In [21]:
input_example = {"messages": [ {"role": "user", "content": "What was my chat history idea?"}]}
answer = chain.invoke(input_example)
print(answer)

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True.
Based on the chat history, you were building a chat history pipeline with several components:

1. Saving chat messages to PostgreSQL database
2. Using an ETL (Extract, Transform, Load) script to synchronize the data
3. Creating vector embeddings from the conversations
4. Implementing RAG (Retrieval-Augmented Generation) to allow looking back at previous conversations

The goal was to not only keep a record of message history but also to enable semantic searching through past conversations, allowing the AI to reference relevant parts of previous discussions when responding to new questions. You described this system as "pretty cool" because it would enhance the contextual awareness of your conversations.
