# Lab 2: Build an HR Q&A Agent with Pro-Code

## Our agent is composed of:

- [**agent.py**]($./agent.py): in this file, we use LangChain to build an agent that is ready to use.
- [**agent_config.yaml**]($./agent_config.yaml): this file contains our agent configuration, including the system prompt and the LLM endpoint that we'll use.

In [0]:
%pip install -U -qqqq mlflow>=3.1.1 langchain langgraph databricks-langchain pydantic databricks-agents unitycatalog-langchain[databricks] uv databricks-feature-engineering==0.12.1
dbutils.library.restartPython()

In [0]:
%run ./resources/utils


## 1/ Extracting the PDF information
Databricks provides a built-in `ai_parse_document` function that uses AI to analyze PDF files and extract their content as text. This makes it easy to ingest unstructured information!

In [0]:
%sql
-- The path of the files depends on the volume that you used in Lab 1.
SELECT path FROM READ_FILES('/Volumes/wsdb_demos/agent_lab/hr_documents_volume', format => 'binaryFile') 

In [0]:
%sql
SELECT ai_parse_document(content) AS parsed_document
  FROM READ_FILES('/Volumes/wsdb_demos/agent_lab/hr_documents_volume', format => 'binaryFile') limit 2

### 1.1/ Create our knowledge base table

Let's first create our table. We'll enable Change Data Feed so that we can create our vector search on top of it.

In [0]:
%sql
-- Change the catalog and schema to match your environment
USE CATALOG wsdb_demos;
USE SCHEMA agent_lab;
CREATE TABLE IF NOT EXISTS knowledge_base (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  product_name STRING,
  title STRING,
  content STRING,
  doc_uri STRING)
  TBLPROPERTIES (delta.enableChangeDataFeed = true);

### 1.2/ PDF to text with ai_parse_document

Let's now use the Databricks built-in `ai_parse_document` function to automatically parse the PDF documents for us and extract their content.

*Note: in this case, we have relatively small PDF documents, so we'll merge all the pages of each document into a single text field for our RAG system to work properly. Bigger documents might need pre-processing steps to reduce context size and make it possible to search and retrieve more documents, for example splitting them into chunks and ensuring a key identifier (such as the document title or plan name) is present in every chunk to keep the vector search relevant.*
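
The pre-processing idea in the note above can be sketched in plain Python. This is a minimal illustration, not part of the lab: the function name `chunk_with_header`, the character-based sizes, and the sample header are assumptions chosen for the example.

```python
from typing import List


def chunk_with_header(text: str, header: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split `text` into overlapping chunks and prefix each one with `header`.

    Repeating the identifying header in every chunk keeps each chunk
    self-describing once it is embedded and indexed on its own.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks: List[str] = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        body = text[start:start + max_chars]
        chunks.append(f"{header}\n{body}")
        # Stop once this chunk reaches the end of the text
        if start + max_chars >= len(text):
            break
    return chunks


# Tiny sizes so the behavior is easy to follow
chunks = chunk_with_header("abcdefghij" * 3, header="Northwind Health Plus", max_chars=12, overlap=2)
```

In a real pipeline the chunk size would more likely be measured in tokens than in characters, and each chunk would become its own row in the knowledge base table.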

In [0]:
%sql
-- Change the catalog and schema to match your environment
USE CATALOG wsdb_demos;
USE SCHEMA agent_lab;
INSERT OVERWRITE TABLE knowledge_base (product_name, title, content, doc_uri)
SELECT ai_extract.product_name, ai_extract.title, content, doc_uri
FROM (
  SELECT
    ai_extract(content, array('product_name', 'title')) AS ai_extract,
    content,
    doc_uri
  FROM (
    SELECT array_join(
            transform(parsed_document:document.elements::ARRAY<STRUCT<content:STRING>>, x -> x.content), '\n') AS content,
           path as doc_uri
    FROM (
      SELECT ai_parse_document(content) AS parsed_document, path
      FROM READ_FILES('/Volumes/wsdb_demos/agent_lab/hr_documents_volume', format => 'binaryFile') 
    )
  )
);
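
To make the `array_join(transform(...))` step above easier to follow, here is a rough Python equivalent. The sample dictionary is hand-written for illustration and only mimics the `document.elements[].content` path used in the query; the real `ai_parse_document` output contains more fields.

```python
from typing import Any, Dict


def elements_to_text(parsed_document: Dict[str, Any]) -> str:
    """Join the `content` of every parsed element with newlines.

    Elements without content are skipped, similar to how `array_join`
    ignores NULL entries when no replacement value is given.
    """
    elements = parsed_document.get("document", {}).get("elements", [])
    return "\n".join(e["content"] for e in elements if e.get("content") is not None)


# Hand-written sample, not real ai_parse_document output
sample = {
    "document": {
        "elements": [
            {"content": "Benefits overview"},
            {"content": None},
            {"content": "Plan details"},
        ]
    }
}
text = elements_to_text(sample)
```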

In [0]:
%sql
USE CATALOG wsdb_demos;
USE SCHEMA agent_lab;
SELECT * FROM knowledge_base;

## 2/ Create our vector search table

### 2.1/ Vector search Endpoints

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-prep-2.png?raw=true" style="float: right; margin-left: 10px" width="400px">

Vector Search endpoints are the entities where your indexes live. Think of them as the entry point that handles your search requests.

Let's start by creating our first Vector Search endpoint. Once created, you can view it in the [Vector Search Endpoints UI](#/setting/clusters/vector-search). Click on the endpoint name to see all indexes that are served by the endpoint.

In [0]:
from databricks.vector_search.client import VectorSearchClient
vsc = VectorSearchClient(disable_notice=True)

if not endpoint_exists(vsc, VECTOR_SEARCH_ENDPOINT_NAME):
    vsc.create_endpoint(name=VECTOR_SEARCH_ENDPOINT_NAME, endpoint_type="STANDARD")

wait_for_vs_endpoint_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME)
print(f"Endpoint named {VECTOR_SEARCH_ENDPOINT_NAME} is ready.")

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-basic-prep-3.png?raw=true" style="float: right; margin-left: 10px" width="400px">


### 2.2/ Creating the Vector Search Index

Once the endpoint is created, all we have to do is ask Databricks to create the index on top of the existing table.

You just need to specify the text column and our embedding foundation model (`GTE`). Databricks will build and synchronize the index automatically for us.

Note that Databricks provides three types of vector search index:

* **Managed embeddings**: Databricks creates the embeddings for you from a text field and synchronizes the Delta table to your index (what we'll use)
* **Self-managed embeddings**: you compute the embeddings yourself and save them to your Delta table, and Databricks synchronizes the Delta table to your index
* **Direct access**: you manage the index content yourself (no Delta table)

This can be done using the API, or in a few clicks within the Unity Catalog Explorer menu:

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/index_creation.gif?raw=true" width="600px">

In [0]:
from databricks.sdk import WorkspaceClient

#The table we'd like to index
source_table_fullname = f"{catalog}.{dbName}.knowledge_base"
# Where we want to store our index
vs_index_fullname = f"{catalog}.{dbName}.knowledge_base_vs_index"

if not index_exists(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname):
  print(f"Creating index {vs_index_fullname} on endpoint {VECTOR_SEARCH_ENDPOINT_NAME}...")
  vsc.create_delta_sync_index(
    endpoint_name=VECTOR_SEARCH_ENDPOINT_NAME,
    index_name=vs_index_fullname,
    source_table_name=source_table_fullname,
    pipeline_type="TRIGGERED",
    primary_key="id",
    embedding_source_column='content', #The column containing our text
    embedding_model_endpoint_name='databricks-gte-large-en' #The embedding endpoint used to create the embeddings
  )
  #Let's wait for the index to be ready and all our embeddings to be created and indexed
  wait_for_index_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname)
else:
  #Trigger a sync to update our vs content with the new data saved in the table
  wait_for_index_to_be_ready(vsc, VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname)
  vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).sync()

print(f"index {vs_index_fullname} on table {source_table_fullname} is ready")

### 2.3/ Try our VS index: searching for relevant content

That's all we have to do. Databricks will automatically capture and synchronize new entries in your table with the index.

Note that depending on your dataset size and model size, it can take a few seconds for index creation to start and for your embeddings to be indexed.

Let's give it a try and search for similar content.

*Note: `similarity_search` also supports a `filters` parameter. This is useful to add a security layer to your RAG system: you can filter out sensitive content based on who is making the call (for example, filter on a specific department based on the user's profile).*
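
As a sketch of the filtering idea in the note above, the helper below assembles the keyword arguments for a filtered search. It assumes a `department` column, which does not exist in the `knowledge_base` table created in this lab, and it assumes the dictionary filter form used with standard endpoints. The helper name is made up for the example.

```python
from typing import Any, Dict, List


def build_search_kwargs(question: str, user_department: str,
                        columns: List[str], num_results: int = 1) -> Dict[str, Any]:
    """Assemble keyword arguments for `similarity_search`, limited to one department."""
    return {
        "query_text": question,
        "columns": columns,
        "num_results": num_results,
        # Hypothetical column: only rows tagged with the caller's department are searched
        "filters": {"department": user_department},
    }


search_kwargs = build_search_kwargs(
    "What does the Northwind Health Plus benefit plan cover?",
    user_department="engineering",
    columns=["id", "content"],
)
# In the notebook, these would be passed to the index, for example:
# vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).similarity_search(**search_kwargs)
```

The department value should come from a trusted source such as the authenticated user's profile, never from the question text itself.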

In [0]:
question = "What does the Northwind Health Plus benefit plan cover?"

results = vsc.get_index(VECTOR_SEARCH_ENDPOINT_NAME, vs_index_fullname).similarity_search(
  query_text=question,
  columns=["id", "content"],
  num_results=1)
docs = results.get('result', {}).get('data_array', [])
docs
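
The search cell above reads `result.data_array`, which is a list of plain rows. A small helper can pair each row with its column names to make the output easier to read. This sketch assumes the response also carries a `manifest.columns` list of `{"name": ...}` entries and that the similarity score is appended as the last column; the sample response is hand-written, so verify against a real response before relying on it.

```python
from typing import Any, Dict, List


def rows_as_dicts(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn the `data_array` rows of a search response into dictionaries keyed by column name."""
    columns = [c["name"] for c in response.get("manifest", {}).get("columns", [])]
    rows = response.get("result", {}).get("data_array", [])
    return [dict(zip(columns, row)) for row in rows]


# Hand-written sample response (assumed shape)
sample_response = {
    "manifest": {"columns": [{"name": "id"}, {"name": "content"}, {"name": "score"}]},
    "result": {"data_array": [[1, "Northwind Health Plus covers ...", 0.71]]},
}
hits = rows_as_dicts(sample_response)
```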

## 3/ Configure our agent

Now that our index is ready, let's configure our agent to use it as a retriever tool!

We'll reuse the `agent.py` and `agent_config.yaml` files: simply add the retriever configuration and our agent will add it as one of its available tools!

In [0]:
import mlflow
import yaml, sys, os
import mlflow.models

agent_eval_path = os.path.abspath(os.path.join(os.getcwd(), "./ai-agent"))
sys.path.append(agent_eval_path)

mlflow.set_experiment("/Shared/agent_evaluation")
conf_path = os.path.join(agent_eval_path, 'agent_config.yaml')

rag_chain_config = {
    "config_version_name": "model_with_retriever",
    "input_example": [
        {
            "content": "What does the Northwind Health Plus benefit plan cover?",
            "role": "user"
        }
    ],
    "llm_endpoint_name": "databricks-claude-3-7-sonnet",
    "max_history_messages": 20,
    "retriever_config": {
        "description": "Retrieves Contoso human resources documentation, including internal documentation about HR policies, employee resources, benefits, and related topics. Use this tool for any questions about Contoso human resources documentation or HR issues.",
        "index_name": f"{catalog}.{dbName}.knowledge_base_vs_index",
        "num_results": 1,
        "tool_name": "contoso_resource_human_docs_retriever"
    },
    "system_prompt": "You are a Human Resources assistant at Contoso. Answer user questions.",
    "uc_tool_names": [f"{catalog}.{dbName}.*"]
}

# Load existing config if present, then merge/overwrite with new values
if os.path.exists(conf_path):
    with open(conf_path, "r") as f:
        existing_config = yaml.safe_load(f) or {}
    existing_config.update(rag_chain_config)
    config_to_write = existing_config
else:
    config_to_write = rag_chain_config

# Write (create or modify) the YAML configuration file
with open(conf_path, "w") as f:
    yaml.dump(config_to_write, f, sort_keys=False)

model_config = mlflow.models.ModelConfig(development_config=conf_path)
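
One detail of the merge above is worth knowing: `dict.update` is a shallow merge, so a nested section such as `retriever_config` is replaced as a whole rather than merged key by key. The sketch below contrasts both behaviors. The `deep_merge` helper and the `score_threshold` key are hypothetical and only serve the illustration.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where nested dicts are merged recursively instead of replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


existing = {
    "llm_endpoint_name": "old-endpoint",
    "retriever_config": {"num_results": 3, "score_threshold": 0.5},  # hypothetical extra key
}
new_values = {"retriever_config": {"num_results": 1}}

shallow = dict(existing)
shallow.update(new_values)               # what the cell above does: nested dict replaced
deep = deep_merge(existing, new_values)  # nested keys preserved unless overridden
```

For this lab the shallow merge is enough, since the notebook always writes the complete `retriever_config` section.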

In [0]:
from agent import AGENT

# Let's try our retriever to make sure we now have access to the HR documents
request_example = "What does the Northwind Health Plus benefit plan cover?"
answer = AGENT.predict({"input":[{"role": "user", "content": request_example}]})
print(answer)


Now log the new agent to MLflow using `mlflow.pyfunc.log_model()`; it is registered in Unity Catalog in a later step.

In [0]:
# Agent captures required resources for agent execution, note that it now has the VS index referenced
for r in AGENT.get_resources():
  print(f"Resource: {type(r).__name__}:{r.name}")

In [0]:
with mlflow.start_run(run_name=model_config.get('config_version_name')):
  logged_agent_info = mlflow.pyfunc.log_model(
    name="agent",
    python_model=agent_eval_path+"/agent.py",
    model_config=conf_path,
    input_example={"input": [{"role": "user", "content": request_example}]},
    # Determine resources (endpoints, functions, vs...) to specify for automatic auth passthrough for deployment
    resources=AGENT.get_resources(),
    extra_pip_requirements=["databricks-connect"]
    )

## 4/ Evaluate our agent against our documents base

Our new model is available! As usual, the next step is to evaluate our agent against our dataset to make sure we're improving our answers.


### 4.1/ Generate synthetic eval data

Note that our eval dataset doesn't have any entries about our PDF documents yet.

Using Databricks, it's easy to bootstrap our evaluation dataset with synthetic eval data, and then improve this dataset over time.

In [0]:
from databricks.agents.evals import generate_evals_df

docs = spark.table('knowledge_base')
# Describe what our agent is doing
agent_description = """
The Agent is a RAG chatbot that answers questions about Contoso's human resources documentation, including internal documentation about HR policies, employee resources, benefits, and related topics. The Agent has access to a corpus of HR Documents, and its task is to answer the user's questions by retrieving the relevant docs from the corpus and synthesizing a helpful, accurate response.
"""

question_guidelines = """
# User personas
- A Contoso employee asking about HR policies, benefits, or internal resources
- An HR representative seeking information on procedures or role responsibilities

# Example questions and answers
- Q: What does the Northwind Health Plus benefits plan cover?
  A: The Northwind Health Plus plan covers medical, dental, and vision care, prescription drugs, and mental health services.
- Q: What is the procedure for taking maternity leave?
  A: You must notify Human Resources at least 30 days before the expected date and provide the corresponding medical documentation.
- Q: What does the PerksPlus program include?
  A: PerksPlus includes discounts on gyms, wellness memberships, food vouchers, and financial support.
- Q: What are the responsibilities of a data analyst according to the role library?
  A: According to the Role Library, a data analyst is responsible for collecting, cleaning, and analyzing data to generate strategic business reports.
- Q: What are Northwind's standard benefits?
  A: Standard benefits include basic medical insurance, paid vacation, and a 401(k) retirement plan.

# Additional Guidelines
- Questions should be succinct, and human-like
"""

# Generate synthetic eval dataset
evals = generate_evals_df(
    docs,
    num_evals=10,
    agent_description=agent_description,
    question_guidelines=question_guidelines
)
evals["inputs"] = evals["inputs"].apply(lambda x: {"question": x["messages"][0]["content"]})
display(evals)


In [0]:
import mlflow
import mlflow.genai.datasets

eval_dataset_table_name = f"{catalog}.{dbName}.ai_agent_mlflow_eval"

try:
  eval_dataset = mlflow.genai.datasets.get_dataset(eval_dataset_table_name)
except Exception as e:
  # Only create the dataset when it is missing; surface any other error
  if 'does not exist' not in str(e):
    raise
  eval_dataset = mlflow.genai.datasets.create_dataset(eval_dataset_table_name)
  # Add your examples to the evaluation dataset
  eval_dataset.merge_records(evals)
  print("Added records to the evaluation dataset.")

# Preview the dataset
display(eval_dataset.to_df())

In [0]:
display(spark.sql(f"SELECT * FROM {catalog}.{dbName}.ai_agent_mlflow_eval"))

### 4.2/ Running our evaluation
As before, let's run our evaluation using the MLflow dataset. We'll make sure our model still behaves properly on the customer-related questions, and now performs well on our knowledge-base questions!

In [0]:
from mlflow.genai.scorers import RetrievalGroundedness, RelevanceToQuery, Safety, Guidelines
import pandas as pd

eval_dataset = mlflow.genai.datasets.get_dataset(f"{catalog}.{dbName}.ai_agent_mlflow_eval")

#Get the same scorers as previously (function is defined in _resources/01-setup, similar to the previous step)
scorers = get_scorers()

# Load the model and create a prediction function
loaded_model = mlflow.pyfunc.load_model(f"runs:/{logged_agent_info.run_id}/agent")
def predict_wrapper(question):
    # Format for chat-style models
    model_input = pd.DataFrame({
        "input": [[{"role": "user", "content": question}]]
    })
    response = loaded_model.predict(model_input)
    return response['output'][-1]['content'][-1]['text']
    
print("Running evaluation...")
with mlflow.start_run(run_name='eval_with_retriever'):
    results = mlflow.genai.evaluate(data=eval_dataset, predict_fn=predict_wrapper, scorers=scorers)
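
Two details of the evaluation cell above are easy to miss: the keys of each record's `inputs` dictionary are passed to `predict_fn` as keyword arguments (which is why the dataset uses a `question` key), and the answer is read from the last content item of the last output item. The sketch below reproduces both steps without calling a model. The stub function and the sample response are hand-written assumptions about the response shape.

```python
from typing import Any, Dict


def last_text(response: Dict[str, Any]) -> str:
    """Return the text of the last content item of the last output item."""
    return response["output"][-1]["content"][-1]["text"]


def stub_predict(question: str) -> str:
    """Stand-in for predict_wrapper: builds a fake agent response instead of calling the model."""
    response = {
        "output": [
            # Earlier items may be tool calls; only the final message holds the answer
            {"type": "function_call", "name": "contoso_resource_human_docs_retriever"},
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": f"Answer to: {question}"}],
            },
        ]
    }
    return last_text(response)


# One evaluation record, shaped like the rows stored in the eval dataset
record = {"inputs": {"question": "What does the PerksPlus program include?"}}
# The inputs dict is unpacked into keyword arguments, so its keys must match the parameter names
answer = stub_predict(**record["inputs"])
```

If the parameter of the prediction function is renamed, the `inputs` key in the dataset has to change with it.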

### 4.3/ Deploy the final model! 

We're good to go. Let's deploy our model to UC and update our endpoint with the latest version!

In [0]:
from mlflow import MlflowClient
MODEL_NAME = "ai_hr_agent_demo"
UC_MODEL_NAME = f"{catalog}.{dbName}.{MODEL_NAME}"

# register the model to UC
client = MlflowClient()
uc_registered_model_info = mlflow.register_model(model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME, tags={"model": "rh_support_agent", "model_version": "with_retriever"})

client.set_registered_model_alias(name=UC_MODEL_NAME, alias="model-to-deploy", version=uc_registered_model_info.version)
displayHTML(f'<a href="/explore/data/models/{catalog}/{dbName}/{MODEL_NAME}" target="_blank">Open Unity Catalog to see Registered Agent</a>')

In [0]:
from databricks import agents
# Deploy the model to the review app and a model serving endpoint
endpoint_name = f'{MODEL_NAME}_{catalog}_{dbName}'[:60]

if len(agents.get_deployments(model_name=UC_MODEL_NAME, model_version=uc_registered_model_info.version)) == 0:
  agents.deploy(UC_MODEL_NAME, uc_registered_model_info.version, endpoint_name=endpoint_name, tags = {"project": "rh_support_agent"})

## 5/ Deploying our frontend App with Lakehouse Applications

Now that our agent is ready, let's deploy a Gradio application to serve its content to our end users.

The Mosaic AI Agent Evaluation review app is used for collecting stakeholder feedback during your development process. You still need to deploy your own front-end application!

Let's leverage Databricks Lakehouse Applications to build and deploy our first, simple chatbot front-end app.

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-frontend-app.png?raw=true" width="1200px">


<div style="background-color: #d4e7ff; padding: 10px; border-radius: 15px;">
<strong>Note:</strong> In this example, we'll deploy the app using the endpoint. However, if the only use case is the app itself, you can also package your MLflow Chat Agent directly within your application and remove the endpoint entirely!
</div>


## Add your application configuration

Lakehouse apps allow you to work with any Python framework. For our demo, we'll create a simple configuration file containing the model serving endpoint name and save it as `chatbot_app/app.yaml`.

In [0]:
print(f"The Databricks APP will be using the following model serving endpoint: {ENDPOINT_NAME}")

In [0]:
import yaml

# Our frontend application will hit the model endpoint we deployed.
# Because dbdemos lets you change your catalog and database, let's make sure we deploy the app with the proper endpoint name
yaml_app_config = {"command": ["uvicorn", "main:app", "--workers", "1"],
                    "env": [{"name": "MODEL_SERVING_ENDPOINT", "value": ENDPOINT_NAME}]
                  }
try:
    with open('chatbot_app/app.yaml', 'w') as f:
        yaml.dump(yaml_app_config, f)
except Exception as e:
    print(f'pass to work on build job - {e}')

## Capturing feedback through MLflow Tracing and Feedback API

With MLflow 3, it's now easy to capture feedback (thumbs up/down) directly from your application!

```
import mlflow.deployments

client = mlflow.deployments.get_deploy_client("databricks")

input_message = [{"content": "test", "role": "user", "type": "message"}]

response = client.predict(
  endpoint=ENDPOINT_NAME,
  inputs={'input': input_message, "databricks_options": {
      # Return the trace so we can get the trace_id for logging feedback. (return only the id for faster results)
      "return_trace": True
    }}
)
```


Then, simply use `mlflow-tracing` in your chatbot backend to attach the user feedback to the trace:


```
mlflow.log_feedback(
    trace_id=trace_id,  # the trace id present in the response, typically tr-xxxxx
    name='user_feedback',
    value=bool(like_data.liked),
    rationale=None,
    source=mlflow.entities.AssessmentSource(source_type='HUMAN', source_id='user')
)
```

*Note: you can also manage your own IDs - see the [feedback documentation](https://docs.databricks.com/aws/en/mlflow3/genai/tracing/collect-user-feedback/) for more details*

## Let's now create our chatbot application with Gradio on Databricks Applications

## Deploying our application

Our application is made of two files under the `chatbot_app` folder:
- `main.py` containing our python code
- `app.yaml` containing our configuration

All we now have to do is call the API to create a new app and deploy using the `chatbot_app` path:

In [0]:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.apps import App, AppResource, AppResourceServingEndpoint, AppResourceServingEndpointServingEndpointPermission, AppDeployment

w = WorkspaceClient()
app_name = "rh-ai-agent-app"

Lakehouse apps come with an auto-provisioned Service Principal. Let's grant this Service Principal access to our model endpoint before deploying...

In [0]:
serving_endpoint = AppResourceServingEndpoint(name=ENDPOINT_NAME,
                                              permission=AppResourceServingEndpointServingEndpointPermission.CAN_QUERY
                                              )

rag_endpoint = AppResource(name="rag-endpoint", serving_endpoint=serving_endpoint) 

rag_app = App(name=app_name, 
              description="Your Databricks assistant", 
              default_source_code_path=os.path.join(os.getcwd(), 'chatbot_app'),
              resources=[rag_endpoint])
try:
  app_details = w.apps.create_and_wait(app=rag_app)
  print(app_details)
except Exception as e:
  if "already exists" in str(e):
    print("App already exists, you can deploy it")
  else:
    raise e


Once the app is created, we can (re)deploy the code as follows:

In [0]:
import mlflow

# Workspace experiment paths must be absolute
mlflow.set_experiment("/Shared/pdf-rag-tool")

In [0]:
deployment = AppDeployment(source_code_path=os.path.join(os.getcwd(), 'chatbot_app'))

app_details = w.apps.deploy_and_wait(app_name=app_name, app_deployment=deployment)

In [0]:
#Let's access the application
w.apps.get(name=app_name).url

## Your Lakehouse app is ready and deployed!

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/product/chatbot-rag/rag-gradio-app.png?raw=true" width="750px" style="float: right; margin-left:10px">

Open the UI to start querying your chatbot.

As a next step, we could extend our chatbot UI to collect user feedback and send it to Mosaic AI Quality Labs, so that bad answers can be reviewed and improved.

## Conclusion

We saw how Databricks provides an end-to-end platform:
- Building and deploying an endpoint
- Built-in solution to review, analyze and improve our chatbot
- Deploying a front-end GenAI application with Lakehouse apps!