
# Ticket-Resolving RAG Application

In this notebook, we create a complete Ticket-Resolving Retrieval-Augmented Generation (RAG) application by assembling and evaluating the components developed so far.

We begin by wiring the Vector Search index into a retriever and connecting all parts of the RAG pipeline, which enables efficient retrieval of relevant knowledge base articles for ticket resolution. We then evaluate the RAG pipeline's performance, register the model, and deploy a Model Serving endpoint to make the RAG system accessible.


Install the required libraries and load the helper functions.

In [0]:
%pip install -U --quiet mlflow==2.14.3 databricks-vectorsearch==0.40 transformers==4.44.0 langchain==0.2.11 langchain-community==0.2.10 pydantic==2.8.2 flashrank==0.2.8 accelerate PyPDF2
dbutils.library.restartPython()

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
%run ../Includes/_helper_functions


### Set Up the Retriever

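We use the Vector Search endpoint and index created in the previous notebook as the retriever. For each query, the retriever fetches the 10 most similar chunks from the index, reranks them with FlashRank (`rank-T5-flan`), and returns the 2 most relevant documents.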
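The retrieve-then-rerank flow can be sketched in plain Python without any Databricks services. This is a toy illustration under stated assumptions: `fake_response` is a hypothetical dict shaped like the one the notebook's `rerank` reads (rows under `result.data_array`, `pdf_name` at index 0, `content` at index 1, with a trailing value assumed to be the similarity score), and `overlap_score` is a simple word-overlap stand-in for FlashRank, not how `rank-T5-flan` actually scores passages.

```python
from typing import Any


def to_passages(response: dict[str, Any]) -> list[dict[str, str]]:
    """Convert vector-search rows into {"file", "text"} passages, as `rerank` does."""
    rows = response.get("result", {}).get("data_array", [])
    return [{"file": row[0], "text": row[1]} for row in rows]


def overlap_score(query: str, text: str) -> float:
    """Toy relevance score: fraction of query words that appear in the text.
    Stand-in for the FlashRank cross-encoder, NOT its real scoring."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def retrieve_then_rerank(query: str, response: dict[str, Any], k: int = 2) -> list[dict[str, str]]:
    """Rescore the wider candidate set and keep only the k best passages."""
    passages = to_passages(response)
    ranked = sorted(passages, key=lambda p: overlap_score(query, p["text"]), reverse=True)
    return ranked[:k]


# Hypothetical search response with 3 candidates (the notebook asks for 10).
fake_response = {"result": {"data_array": [
    ["KB_A.pdf", "how to reset your password", 0.81],
    ["KB_B.pdf", "reduce spam by filtering email", 0.79],
    ["KB_C.pdf", "report spam and phishing email", 0.77],
]}}

top = retrieve_then_rerank("reduce spam email", fake_response, k=2)
print([p["file"] for p in top])  # ['KB_B.pdf', 'KB_C.pdf']
```

Fetching more candidates than needed and then rescoring them is the usual reason to pair a fast vector search with a slower but more precise reranker.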

In [0]:
# Reuse the components created in the previous notebook:
# the Vector Search endpoint and the Vector Search index
vs_endpoint_prefix = "vs_endpoint_"

vs_endpoint_name = vs_endpoint_prefix + "1"
print(f"Assigned Vector Search endpoint name: {vs_endpoint_name}.")

vs_index_fullname = "workspace.default.pdf_text_self_managed_vs_index"

Assigned Vector Search endpoint name: vs_endpoint_1.


In [0]:
host = "https://" + spark.conf.get("spark.databricks.workspaceUrl")
print(host)




https://dbc-540a8fab-a6c9.cloud.databricks.com


In [0]:
from databricks.vector_search.client import VectorSearchClient
from langchain_community.embeddings import DatabricksEmbeddings
from langchain_core.runnables import RunnableLambda
from langchain_core.documents import Document
from flashrank import Ranker, RerankRequest
from langchain_community.chat_models import ChatDatabricks

def get_retriever(cache_dir="/tmp"):

    def retrieve(query, k: int = 10):
        # Accept either a raw string or a dict such as {"input": "..."}
        if isinstance(query, dict):
            query = next(iter(query.values()))

        # Get the Vector Search index
        vsc = VectorSearchClient(disable_notice=True)
        vs_index = vsc.get_index(endpoint_name=vs_endpoint_name, index_name=vs_index_fullname)

        # Embed the query; this model must match the one used to embed the indexed chunks
        embeddings = DatabricksEmbeddings(endpoint="RAGdatabricks-gte-base-en")
        query_vector = embeddings.embed_query(query)

        # Fetch the k most similar chunks
        return query, vs_index.similarity_search(
            query_vector=query_vector,
            columns=["pdf_name", "content"],
            num_results=k)

    def rerank(query, retrieved, cache_dir, k: int = 2):
        # Convert the Vector Search rows into the passage format FlashRank expects
        passages = []
        for doc in retrieved.get("result", {}).get("data_array", []):
            passages.append({"file": doc[0], "text": doc[1]})

        # Load the FlashRank ranker (the model is downloaded into cache_dir on first use)
        ranker = Ranker(model_name="rank-T5-flan", cache_dir=cache_dir)

        # Rerank the candidates and keep the top k
        rerankrequest = RerankRequest(query=query, passages=passages)
        results = ranker.rerank(rerankrequest)[:k]

        # Wrap the results as LangChain Documents so they can be stuffed into the prompt
        return [Document(page_content=r.get("text"), metadata={"source": r.get("file")}) for r in results]

    # The retriever is a runnable sequence: retrieve 10 candidates, then rerank down to 2
    return RunnableLambda(retrieve) | RunnableLambda(lambda x: rerank(x[0], x[1], cache_dir))

# Test the retriever
question = {"input": "What is spam?"}
retriever = get_retriever(cache_dir="/Workspace/Shared/Ticket Resolution using Gen AI/opt")
similar_documents = retriever.invoke(question)
print(f"Relevant documents: {similar_documents}")

Relevant documents: [Document(metadata={'source': 'dbfs:/Volumes/workspace/default/raw_data/KB0000029.pdf'}, page_content="It is critical that you pause and think before replying to \nany spam. Consider the following guidelines: \nSetting up your email account to generate automatic responses while you are away can have the unfortunate side-effect of verifying your email address to \nevery spammer that sends you spam.\xa0If the message appears to come from a legitimate company, the company may have obtained your email address from \nsome transaction between you and the company. In fact, you may have inadvertently provided your email address (e.g., if you didn't check a box \nmarked\xa0Don't send me product updates). In these cases, it is usually safe to reply and ask to be removed from the mailing list.\xa0If it is not a company you \nrecognize, use your judgment. To be safe, copy and paste the link to the company's site into the browser rather than clicking it in the email message.\xa0
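The retriever returned by `get_retriever` is two `RunnableLambda` steps joined with `|`: the first returns a `(query, results)` tuple and the second unpacks it. The toy sketch below mimics that composition so the data flow is easy to follow. `Step`, `fake_retrieve`, and `fake_rerank` are hypothetical names; `Step` is not LangChain's `RunnableLambda`, only a minimal imitation of its `invoke` and `|` behaviour.

```python
from typing import Any, Callable


class Step:
    """Hypothetical stand-in for RunnableLambda: wraps a function and
    supports `a | b`, which feeds a's output into b."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: run self first, then pass its result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))


def fake_retrieve(query):
    # Like `retrieve`: accept {"input": "..."} and return a (query, hits) tuple.
    if isinstance(query, dict):
        query = next(iter(query.values()))
    return query, ["doc about spam", "doc about passwords"]


def fake_rerank(pair):
    # Like `lambda x: rerank(x[0], x[1], ...)`: unpack the tuple from the previous step.
    query, hits = pair
    words = set(query.lower().strip("?").split())
    # Keep only hits that share at least one word with the query.
    return [h for h in hits if words & set(h.split())]


pipeline = Step(fake_retrieve) | Step(fake_rerank)
print(pipeline.invoke({"input": "What is spam?"}))  # ['doc about spam']
```

Returning a tuple from the first step is what lets the second step see both the original query and the search results, which the reranker needs.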


### Set Up the Foundation Model

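Build a Databricks chat model that queries the Llama 3.1 8B Instruct foundation model through the `RAGdatabricks-meta-llama-3-1-8b-instruct` serving endpoint, with responses capped at 300 tokens.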

In [0]:
from langchain_community.chat_models import ChatDatabricks

# test Databricks Foundation LLM model
chat_model = ChatDatabricks(endpoint="RAGdatabricks-meta-llama-3-1-8b-instruct", max_tokens=300)
print(f"Test chat model: {chat_model.invoke('How to reduce the amount of spam you receive?')}")

Test chat model: content="The eternal struggle against spam! Here are some effective ways to reduce the amount of spam you receive:\n\n1. **Unsubscribe from unwanted emails**: Look for the unsubscribe link at the bottom of spam emails and remove yourself from the sender's list. Be cautious, as some spammers may use fake unsubscribe links to harvest more email addresses.\n2. **Use a spam filter**: Most email providers, such as Gmail, Yahoo, and Outlook, have built-in spam filters. You can also use third-party spam filters like SpamAssassin or SpamSieve.\n3. **Report spam**: Mark emails as spam or report them to your email provider. This helps them improve their spam detection algorithms.\n4. **Use a strong password**: Use a unique, complex password for your email account. This will make it harder for spammers to guess or crack your password.\n5. **Don't respond to spam**: Avoid responding to spam emails, as this can confirm to the spammer that your email address is active and may lead t

In [0]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate



TEMPLATE = """You are an assistant for incident resolution within a ServiceNow environment, specializing in providing solutions based on relevant knowledge base (KB) articles. You answer questions related to resolving ServiceNow incidents by referencing similar incidents and solutions documented in KB articles. If the question is outside the scope of ServiceNow incident resolution, kindly decline to answer. If the relevant information is not available, simply state that the answer is not available. Keep the response concise and focused.

Use the following pieces of context from the KB articles to answer the question below and provide the KB article number:

<context>
{context}
</context>

Question: {input}

Answer:
"""
prompt = PromptTemplate(template=TEMPLATE, input_variables=["context", "input"])

# Convert the LangChain Document objects in the context into plain dicts so MLflow can infer the model signature
def unwrap_document(answer):
    return answer | {"context": [{"metadata": r.metadata, "page_content": r.page_content} for r in answer["context"]]}

question_answer_chain = create_stuff_documents_chain(chat_model, prompt)
chain = create_retrieval_chain(get_retriever(), question_answer_chain) | RunnableLambda(unwrap_document)



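For intuition about what `create_stuff_documents_chain` hands to the model, here is a plain-Python sketch of the "stuff" step. It assumes LangChain's defaults for this helper: each document is rendered as just its `page_content`, and the rendered documents are joined with a blank line before being substituted for `{context}`. `TEMPLATE_SKETCH` and `stuff_documents` are hypothetical names, and the template is a shortened version of the notebook's prompt.

```python
# Shortened stand-in for the notebook's TEMPLATE (same {context} and {input} slots).
TEMPLATE_SKETCH = """Use the following context to answer the question.

<context>
{context}
</context>

Question: {input}

Answer:
"""


def stuff_documents(docs: list[dict], question: str, template: str = TEMPLATE_SKETCH) -> str:
    # Assumed defaults: render each document as its page_content only,
    # then join the documents with a blank line.
    context = "\n\n".join(d["page_content"] for d in docs)
    return template.format(context=context, input=question)


docs = [
    {"page_content": "Mark unwanted mail as spam.", "metadata": {"source": "KB_X.pdf"}},
    {"page_content": "Do not reply to spam.", "metadata": {"source": "KB_Y.pdf"}},
]
prompt_text = stuff_documents(docs, "How to reduce spam?")
print(prompt_text)
```

If those defaults apply, the `source` metadata attached by the retriever never reaches the prompt, so the KB article number the template asks for has to come from the chunk text itself (the retrieved chunks printed by the next cell contain identifiers such as `KB0000028`).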


In [0]:
question = {"input": "How to reduce the amount of spam you receive?"}
answer = chain.invoke(question)
print(answer)


{'input': 'How to reduce the amount of spam you receive?', 'context': [{'metadata': {'source': 'dbfs:/Volumes/workspace/default/raw_data/KB0000028.pdf'}, 'page_content': 'Email Tips and Tricks\n2019-02-21 21:49:44 KB0000028 What are phishing \nscams and how can I \navoid them? Email Tips and Tricks\nKnowledge Details Page 4\nRun By : System Administrator 2024-10-21 23:04:57 Pacific Daylight TimeCreated Article Short description Topic Category Comments\n2019-02-21 20:09:49 KB0000028 What are phishing \nscams and how can I \navoid them? Email Tips and Tricks <p>&nbsp;The best \ndefense against spear \nphishing is to carefully, \nsecurely discard \ninformation (i.e., using a \ncross-cut shredder) that \ncould be used in such an \nattack. Further, be \naware of data that may \nbe relatively easily \nobtainable (e.g., your \ntitle at work, your \nfavorite places, or where \nyou bank), and think \nbefore acting on \nseemingly random \nrequests via email or \nphone.</p>\n2019-02-21 20:09:16 K


#### Save the Model to the Model Registry in Unity Catalog (UC)
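`infer_signature(question, answer)` in the next cell derives the model signature from plain Python values, which is why the chain ends with `unwrap_document`: LangChain `Document` objects in the context are replaced by dicts. The sketch below uses `json.dumps` as a rough proxy for "plain, serializable data"; that is an assumption for illustration, not MLflow's actual check, and `FakeDocument` is a hypothetical stand-in for LangChain's `Document`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unwrap_document(answer: dict) -> dict:
    # Same idea as the notebook's helper: swap Document objects for plain dicts.
    return answer | {"context": [{"metadata": d.metadata, "page_content": d.page_content}
                                 for d in answer["context"]]}


raw = {"input": "q", "context": [FakeDocument("text", {"source": "KB.pdf"})], "answer": "a"}

try:
    json.dumps(raw)
except TypeError as err:
    # A custom object inside the dict cannot be serialized as-is.
    print("not serializable:", err)

unwrapped = unwrap_document(raw)
print(json.dumps(unwrapped))
```

The `|` operator returns a new dict, so the original chain output is left untouched while the `context` key is overwritten in the copy.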


In [0]:
from mlflow.models import infer_signature
import mlflow
import langchain


# set model registry to UC
mlflow.set_registry_uri("databricks-uc")
model_name = "workspace.default.rag_app_2v"

with mlflow.start_run(run_name="rag_app_2v") as run:
    signature = infer_signature(question, answer)
    model_info = mlflow.langchain.log_model(
        chain,
        loader_fn=get_retriever, 
        artifact_path="chain",
        registered_model_name=model_name,
        pip_requirements=[
            "mlflow==" + mlflow.__version__,
            "langchain==" + langchain.__version__,
            "databricks-vectorsearch==0.40",
            "langchain-community==0.2.10",
            "flashrank==0.2.8"
        ],
        input_example=question,
        signature=signature
    )




Successfully registered model 'workspace.default.rag_app_2v'.



Created version '1' of model 'workspace.default.rag_app_2v'.
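Rather than typing version strings into `pip_requirements` by hand, the pins can be read from the notebook environment so that the serving environment matches what was tested. A minimal sketch using the standard library's `importlib.metadata`; `pinned_requirements` is a hypothetical helper, and packages that are not installed are left unpinned instead of guessing a version.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages: list[str]) -> list[str]:
    """Return "name==installed_version" for each package, or the bare name
    if the package is not installed in the current environment."""
    reqs = []
    for name in packages:
        try:
            reqs.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            reqs.append(name)  # leave unpinned rather than guess a version
    return reqs


print(pinned_requirements(["mlflow", "langchain", "langchain-community",
                           "databricks-vectorsearch", "flashrank"]))
```

The returned list could then be passed as `pip_requirements` to `mlflow.langchain.log_model`.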
