In [0]:
%pip install -U --quiet databricks-sdk==0.40.0 databricks-agents==0.16.0 databricks-langchain==0.3.0 mlflow[databricks]==2.20.2 databricks-vectorsearch==0.49 langchain==0.3.19 langchain_core==0.3.37 bs4==0.0.2 markdownify==0.14.1 grpcio-status==1.59.3 # Temporary pin: grpcio version to avoid protobuf conflict.
dbutils.library.restartPython()

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.


I noticed a bias with llama-3-1-70b.
Take the case of Marta Suplicy: the association with the topic of sexuality works very well, but the model ignores the transport mafia, even though it appears in her entry.

I will set up another chat with Mixtral to compare.

I will reuse the vector index created in "RAG-DHBB-llama-3-1-70b-instruct".

# Deploy our chatbot model with RAG using Mixtral

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-chain-1.png?raw=true" style="float: right" width="500px">

We've seen how Databricks makes it easy to ingest and prepare your documents, and to deploy a Vector Search index on top of them with just a few clicks.

Now that our Vector Search index is ready, let's deploy a LangChain application.

## Configuring our Chain parameters

Like any application, a RAG chain needs some configuration for each environment (e.g. a different catalog for the test and prod environments).

Databricks makes this easy with Chain Configurations. You can use this object to configure any value within your app, including the different system prompts, which makes it easy to test and deploy newer versions with better prompts.
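As a minimal sketch of the per-environment idea, the snippet below derives a configuration dictionary for each environment from a shared base. The helper `build_chain_config`, the `ENV_OVERRIDES` table, and the `dhbb_test` catalog name are hypothetical and only illustrate the pattern; the base values are the ones used in this notebook.

```python
from copy import deepcopy

# Base configuration shared by every environment (values taken from this notebook).
BASE_CHAIN_CONFIG = {
    "llm_model_serving_endpoint_name": "databricks-mixtral-8x7b-instruct",
    "vector_search_endpoint_name": "dhbb_vs_endpoint",
    "vector_search_index": "dhbb.bronze.dhbb_bronze_vs_index",
}

# Hypothetical per-environment overrides: the "dhbb_test" catalog is an assumption.
ENV_OVERRIDES = {
    "prod": {},
    "test": {"vector_search_index": "dhbb_test.bronze.dhbb_bronze_vs_index"},
}


def build_chain_config(env: str) -> dict:
    """Return a copy of the base config with the overrides for `env` applied."""
    if env not in ENV_OVERRIDES:
        raise ValueError(f"Unknown environment: {env!r}")
    config = deepcopy(BASE_CHAIN_CONFIG)
    config.update(ENV_OVERRIDES[env])
    return config


test_config = build_chain_config("test")
print(test_config["vector_search_index"])
```

The resulting dictionary has the same shape as the `chain_config` defined in the next cell, so it could be passed to `mlflow.models.ModelConfig(development_config=...)` in the same way.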

In [0]:
VECTOR_SEARCH_ENDPOINT_NAME="dhbb_vs_endpoint"
# For this first basic demo, we keep the configuration to a minimum. In a real app, you can make every part of your RAG chain a parameter (such as the prompt template, to easily test different prompts).
chain_config = {
    "llm_model_serving_endpoint_name": "databricks-mixtral-8x7b-instruct",  # the foundation model we want to use
    "vector_search_endpoint_name": VECTOR_SEARCH_ENDPOINT_NAME,  # the endpoint we want to use for vector search
    "vector_search_index": "dhbb.bronze.dhbb_bronze_vs_index",
    # System prompt, kept in Portuguese because the assistant answers in Brazilian Portuguese by default
    "llm_prompt_template": """Você é um assistente que responde perguntas sobre o acervo histórico de pessoas políticas no Brasil. Use os seguintes verbetes como contexto para responder à pergunta. Alguns pedaços de contexto podem ser irrelevantes, nesse caso você não deve usá-los para formar a resposta. Responda em português Brasil por padrão.\n\nContext: {context}""",
}

### Building our Langchain retriever

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-chain-2.png?raw=true" style="float: right" width="500px">

Let's start by building our LangChain retriever.

It will be in charge of:

* Receiving the input question (our Managed Vector Search Index will compute the embeddings for us)
* Calling the vector search index to find similar documents to augment the prompt with

The Databricks LangChain wrapper makes it easy to do this in one step, handling all the underlying logic and API calls for you.
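To make the two retriever steps concrete, here is a toy, self-contained sketch of similarity-based retrieval. It is not how Databricks Vector Search is implemented: the bag-of-words `embed` function stands in for a real embedding model, and the corpus is a made-up placeholder. All names here are assumptions for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical mini-corpus, for illustration only.
corpus = [
    "entry about a composer and conductor",
    "entry about a federal deputy",
    "entry about a state governor",
]
print(retrieve("who was the composer", corpus, k=1))
```

In the notebook, both steps (embedding the question and ranking the documents) are delegated to the managed index through the retriever built in the following cells.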


In [0]:
import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql.functions import col, udf, length, pandas_udf
import os
import mlflow
import yaml
from typing import Iterator
from mlflow import MlflowClient
mlflow.set_registry_uri('databricks-uc')

# Workaround for a bug fix that is in progress
mlflow.spark.autolog(disable=True)

import warnings
warnings.filterwarnings("ignore")
# Disable MLflow warnings
import logging
logging.getLogger('mlflow').setLevel(logging.ERROR)

In [0]:
def display_txt_as_html(txt):
    txt = txt.replace('\n', '<br/>')
    displayHTML(f'<div style="max-height: 150px">{txt}</div>')

In [0]:
from databricks.vector_search.client import VectorSearchClient
from databricks_langchain.vectorstores import DatabricksVectorSearch
from langchain_core.runnables import RunnableLambda
from langchain_core.output_parsers import StrOutputParser

## Enable MLflow Tracing
mlflow.langchain.autolog()

## Load the chain's configuration
model_config = mlflow.models.ModelConfig(development_config=chain_config)

## Turn the Vector Search index into a LangChain retriever
vector_search_as_retriever = DatabricksVectorSearch(
    endpoint=model_config.get("vector_search_endpoint_name"),
    index_name=model_config.get("vector_search_index"),
    columns=["file_name", "content"],
).as_retriever(search_kwargs={"k": 3})

# Method to format the docs returned by the retriever into the prompt (keep only the text from chunks)
def format_context(docs):
    chunk_contents = [f"Passage: {d.page_content}\n" for d in docs]
    return "".join(chunk_contents)

# Let's try our retriever chain:
relevant_docs = (vector_search_as_retriever | RunnableLambda(format_context) | StrOutputParser()).invoke('Quem foi José Siqueira?')

display_txt_as_html(relevant_docs)

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().


Trace(request_id=tr-baed7f83ca494fc995c2baaad4d5e234)

You can see in the results that Databricks automatically traces your chain's details, so you can debug each step and review the documents retrieved.

## Building the Databricks Chat Model to query our demo's foundation LLM

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-chain-3.png?raw=true" style="float: right" width="500px">

Our chatbot will use the Mixtral 8x7B Instruct foundation model set in the chain configuration (`databricks-mixtral-8x7b-instruct`). It could equally use DBRX (_pictured_), Meta's Llama, or any other LLM served on Databricks.

Other types of models that could be utilized include:

- Databricks Foundation models (_what we will use by default in this demo_)
- Your organization's custom, fine-tuned model
- An external model provider (_such as Azure OpenAI_)


In [0]:
from langchain_core.prompts import ChatPromptTemplate
from databricks_langchain.chat_models import ChatDatabricks
from operator import itemgetter

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", model_config.get("llm_prompt_template")),  # Contains the instructions from the configuration
        ("user", "{question}"),  # The user's question
    ]
)

# Our foundation model answering the final prompt
model = ChatDatabricks(
    endpoint=model_config.get("llm_model_serving_endpoint_name"),
    extra_params={"temperature": 0.01, "max_tokens": 1500}
)

#Let's try our prompt:
answer = (prompt | model | StrOutputParser()).invoke({'question':'Quem foi José Siqueira?', 'context': ''})
display_txt_as_html(answer)

Trace(request_id=tr-1a158b12385448db92e4b5d8a27e8a99)


## Putting it together in a final chain, supporting the standard Chat Completion format

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-chain-4.png?raw=true" style="float: right" width="500px">


Let's now merge the retriever and the model into a single LangChain chain.

We will use a custom LangChain template so that our assistant gives proper answers.

We will make sure our chain supports the standard Chat Completion API input schema: `{"messages": [{"role": "user", "content": "What is Retrieval-augmented Generation?"}]}`

Take some time to try different templates and adjust your assistant's tone and personality to your requirements.

*Note that we won't support history in this first version, and will only take the last message as the question. See the advanced demo for a more complete example.*
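As a hedged sketch of what "take the last message as the question" can look like against the Chat Completion schema, the helper below validates the payload and returns the most recent user message. The name `extract_last_user_message` and the role check are assumptions; the notebook's own helper in the next cell simply takes the last element of `messages`.

```python
def extract_last_user_message(payload: dict) -> str:
    """Return the content of the most recent message whose role is "user"."""
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError('Expected a non-empty "messages" list')
    # Walk backwards so that earlier turns (the history) are ignored.
    for message in reversed(messages):
        if message.get("role") == "user":
            return message["content"]
    raise ValueError("No user message found")


payload = {
    "messages": [
        {"role": "user", "content": "What is Retrieval-augmented Generation?"},
        {"role": "assistant", "content": "(earlier answer)"},
        {"role": "user", "content": "Can you give an example?"},
    ]
}
print(extract_last_user_message(payload))
```

Filtering on the role is a design choice: it keeps the chain from treating an assistant turn as the question if a caller sends a history that does not end with a user message.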

In [0]:
# Return the content string of the most recent message in messages: [{...}], to be used as the input question
def extract_user_query_string(chat_messages_array):
    return chat_messages_array[-1]["content"]

# RAG Chain
chain = (
    {
        "question": itemgetter("messages") | RunnableLambda(extract_user_query_string),
        "context": itemgetter("messages")
        | RunnableLambda(extract_user_query_string)
        | vector_search_as_retriever
        | RunnableLambda(format_context),
    }
    | prompt
    | model
    | StrOutputParser()
)
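Read as a plain-Python dataflow, the chain above does roughly the following. The retriever and the model are replaced by stubs (`fake_retriever`, `fake_model`) and the prompt string is simplified; all of these are assumptions for illustration, not the LangChain or Databricks implementations.

```python
def fake_retriever(question: str) -> list[str]:
    """Stand-in for the vector search retriever: returns fixed passages."""
    return ["first passage", "second passage"]


def format_context(passages: list[str]) -> str:
    """Join the retrieved passages into a single context string."""
    return "".join(f"Passage: {p}\n" for p in passages)


def build_prompt(payload: dict) -> str:
    """Fan the last message out into the two prompt variables, then fill the template."""
    question = payload["messages"][-1]["content"]
    variables = {
        "question": question,
        "context": format_context(fake_retriever(question)),
    }
    return "Context: {context}\nQuestion: {question}".format(**variables)


def fake_model(prompt_text: str) -> str:
    """Stand-in for the LLM: reports the size of the prompt it received."""
    return f"(answer generated from {len(prompt_text)} prompt characters)"


def run_chain(payload: dict) -> str:
    """Retriever and question -> prompt -> model, mirroring the pipe order."""
    return fake_model(build_prompt(payload))


example = {"messages": [{"role": "user", "content": "Who was this person?"}]}
print(build_prompt(example))
print(run_chain(example))
```

The key point is the fan-out: the same user question feeds both the `question` variable directly and the `context` variable through the retriever.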


#### Databricks will track all the chain for you

<img src="https://ai-cookbook.io/_images/mlflow_trace2.gif" width="600px" style="float: right; margin-left: 10px">

As you can see in the cell result below, Databricks automatically traces the chain call.

This makes it super easy to debug and improve your chain!

In [0]:
# Let's give it a try:
input_example = {"messages": [ {"role": "user", "content": "Qual é o nome do Tiririca?"}]}
answer = chain.invoke(input_example)
print(answer)

 Francisco Everardo Oliveira Silva


Trace(request_id=tr-1fb9506d918e44318793aad15b848fc6)

## Deploy a RAG Chain to a web-based UI for stakeholder feedback

Our chain is now ready! 

Let's first register the RAG chain model to MLflow and Unity Catalog, and then use Agent Framework to deploy it to the Agent Evaluation stakeholder review application, which is backed by a scalable, production-ready Model Serving endpoint.

In [0]:
from mlflow.models.resources import DatabricksVectorSearchIndex, DatabricksServingEndpoint
# Log the model to MLflow
with mlflow.start_run(run_name="dhbb_rag_bot_mixtral"):
    logged_chain_info = mlflow.langchain.log_model(
        # Note: in classical ML, MLflow serializes the model object. In generative AI, chains often
        # include Python packages that do not serialize. Here we use MLflow's code-based logging:
        # the chain is saved in the `chain` notebook, and MLflow uses that code instead of trying
        # to serialize the object.
        lc_model=os.path.join(os.getcwd(), "chain"),  # Chain code file, e.g. /path/to/the/chain.py
        model_config=chain_config,  # Chain configuration
        artifact_path="chain",  # Required by MLflow; the chain's code and config are saved in this directory
        input_example=input_example,
        example_no_conversion=True,  # Required by MLflow to use the input_example as the chain's schema
        # Specify resources for automatic authentication passthrough
        resources=[
            DatabricksVectorSearchIndex(index_name=model_config.get("vector_search_index")),
            DatabricksServingEndpoint(endpoint_name=model_config.get("llm_model_serving_endpoint_name")),
        ],
    )

MODEL_NAME = "dhbb_rag_mixtral"
MODEL_NAME_FQN = f"dhbb.bronze.{MODEL_NAME}"
# Register to UC
uc_registered_model_info = mlflow.register_model(model_uri=logged_chain_info.model_uri, name=MODEL_NAME_FQN)

[NOTICE] Using a notebook authentication token. Recommended for development only. For improved performance, please use Service Principal based authentication. To disable this message, pass disable_notice=True to VectorSearchClient().


Successfully registered model 'dhbb.bronze.dhbb_rag_mixtral'.


Created version '1' of model 'dhbb.bronze.dhbb_rag_mixtral'.


Let's now deploy the Mosaic AI **Agent Evaluation review application** using the model we just created!

In [0]:
def wait_for_model_serving_endpoint_to_be_ready(ep_name):
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.serving import EndpointStateReady, EndpointStateConfigUpdate
    import time

    # Poll the endpoint state until it is ready (up to 200 checks, 10 seconds apart)
    w = WorkspaceClient()
    state = ""
    for i in range(200):
        state = w.serving_endpoints.get(ep_name).state
        if state.config_update == EndpointStateConfigUpdate.IN_PROGRESS:
            # Still deploying: log progress every 40 iterations, then wait
            if i % 40 == 0:
                print(f"Waiting for endpoint to deploy {ep_name}. Current state: {state}")
            time.sleep(10)
        elif state.ready == EndpointStateReady.READY:
            print('endpoint ready.')
            return
        else:
            # Neither deploying nor ready: stop polling and report the failure
            break
    raise Exception(f"Couldn't start the endpoint (timeout or failed update), please check your endpoint for more details: {state}")
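The helper above bounds the wait by iteration count (200 checks with a 10-second sleep). A deadline-based variant of the same polling pattern could look like the sketch below. The function `poll_until` and its injectable `sleep` and `clock` parameters are hypothetical, added so the loop can be exercised without a real endpoint or real waiting.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call check() every interval_s seconds until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated endpoint that becomes ready on the third status check.
status_checks = {"count": 0}


def endpoint_is_ready() -> bool:
    status_checks["count"] += 1
    return status_checks["count"] >= 3


# A no-op sleep keeps the example instantaneous.
ready = poll_until(endpoint_is_ready, timeout_s=60, interval_s=10, sleep=lambda _: None)
print(ready, status_checks["count"])
```

In real use, `check` would wrap the `w.serving_endpoints.get(ep_name).state` lookup from the helper above.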

In [0]:
from databricks import agents
# Deploy to enable the Review APP and create an API endpoint
# Note: scaling to zero can cause unexpected behavior in the chat app. Set it to False for a production-ready application.
deployment_info = agents.deploy(MODEL_NAME_FQN, model_version=uc_registered_model_info.version, scale_to_zero=True)

instructions_to_reviewer = """## Instruções para testar o chat bot para fazer perguntas aos verbetes disponíveis no DHBB via Mixtral 8x7b

Suas contribuições são inestimáveis para a equipe de desenvolvimento do hackathon ChatFGV. Ao fornecer feedback e correções detalhadas, você nos ajuda a corrigir problemas e melhorar a qualidade geral do aplicativo. Contamos com sua experiência para identificar quaisquer lacunas ou áreas que precisem de aprimoramento."""

# Add the user-facing instructions to the Review App
agents.set_review_instructions(MODEL_NAME_FQN, instructions_to_reviewer)

wait_for_model_serving_endpoint_to_be_ready(deployment_info.endpoint_name)





    Deployment of dhbb.bronze.dhbb_rag_mixtral version 1 initiated.  This can take up to 15 minutes and the Review App & Query Endpoint will not work until this deployment finishes.

    View status: https://fgv-pocs-genie.cloud.databricks.com/ml/endpoints/agents_dhbb-bronze-dhbb_rag_mixtral
    Review App: https://fgv-pocs-genie.cloud.databricks.com/ml/review/dhbb.bronze.dhbb_rag_mixtral/1?o=4008450564360146
    Monitoring guide: https://docs.databricks.com/en/generative-ai/agent-evaluation/evaluating-production-traffic.html?utm_source=agents.deploy&utm_medium=python-sdk
    
Waiting for endpoint to deploy agents_dhbb-bronze-dhbb_rag_mixtral. Current state: EndpointState(config_update=<EndpointStateConfigUpdate.IN_PROGRESS: 'IN_PROGRESS'>, ready=<EndpointStateReady.NOT_READY: 'NOT_READY'>)

# Use the Mosaic AI Agent Evaluation to evaluate your RAG applications

## Chat with your bot and build your validation dataset!

Our chatbot is now live. Databricks provides a built-in chatbot application that you can use to test the chatbot and give feedback on its answers.

You can easily give access to external domain experts and have them test and review the bot.  **Your domain experts do NOT need to have Databricks Workspace access** - you can assign permissions to any user in your SSO if you have enabled [SCIM](https://docs.databricks.com/en/admin/users-groups/scim/index.html).

This is a critical step to build or improve your evaluation dataset: have users ask your bot questions, and have them provide the expected answer when the bot doesn't answer properly.

Your chatbot automatically captures all stakeholder questions and bot responses, including an MLflow trace for each, into Delta tables in your Lakehouse. On top of that, Databricks makes it easy to track feedback from your end users: if the chatbot doesn't give a good answer and the user gives a thumbs-down, their feedback is included in the Delta tables.

Once your evaluation dataset is ready, you can use it for offline evaluation to measure your new chatbot's performance, and potentially to fine-tune your model.
<br/>

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/eval-framework.gif?raw=true" width="1000px">
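As a rough sketch of how reviewer feedback can become an evaluation dataset, the snippet below keeps only the interactions where a reviewer supplied a corrected answer. The record fields (`question`, `bot_answer`, `corrected_answer`) and the output keys (`request`, `expected_response`) are assumptions for illustration; they do not reproduce the actual schema of the Delta tables.

```python
def build_eval_rows(feedback_records: list[dict]) -> list[dict]:
    """Keep reviewed interactions that include a corrected answer."""
    rows = []
    for record in feedback_records:
        corrected = record.get("corrected_answer")
        if corrected:
            rows.append(
                {"request": record["question"], "expected_response": corrected}
            )
    return rows


# Hypothetical feedback records, for illustration only.
feedback = [
    {"question": "Q1", "bot_answer": "A1", "corrected_answer": None},
    {"question": "Q2", "bot_answer": "A2", "corrected_answer": "Better A2"},
]
print(build_eval_rows(feedback))
```

Interactions without a correction are dropped here; another reasonable choice would be to keep answers that received positive feedback as expected responses.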


In [0]:
print(f"\n\nReview App URL to share with your stakeholders: {deployment_info.review_app_url}")



Review App URL to share with your stakeholders: https://fgv-pocs-genie.cloud.databricks.com/ml/review/dhbb.bronze.dhbb_rag_mixtral/1?o=4008450564360146
