## 👉 START HERE: How to use this notebook

# Step 3: Build, evaluate, & deploy your Agent

Use this notebook to iterate on the code and configuration of your Agent.

By the end of this notebook, you will have 1+ registered versions of your Agent, each coupled with a detailed quality evaluation.

Optionally, you can deploy a version of your Agent that you can interact with in the [Mosaic AI Playground](https://docs.databricks.com/en/large-language-models/ai-playground.html). You can also let business stakeholders who don't have Databricks accounts interact with it & provide feedback in the [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui).


For each version of your agent, you will have an MLflow run inside your MLflow experiment that contains:
- Your Agent's code & config
- Evaluation metrics for cost, quality, and latency


**Important note:** Throughout this notebook, we indicate which cell's code you:
- ✅✏️ should customize - these cells contain code & config with business logic that you should edit to meet your requirements & tune quality.
- 🚫✏️ should not customize - these cells contain boilerplate code required to load/save/execute your Agent

*Cells that don't require customization still need to be run!  You CAN change these cells, but if this is the first time using this notebook, we suggest not doing so.*

### 🚫✏️ Install Python libraries

You do not need to modify this cell unless you need additional Python packages in your Agent.

In [0]:
%pip install -qqqq -U -r requirements.txt
# Restart to load the packages into the Python environment
dbutils.library.restartPython()

### 🚫✏️ Connect to Databricks

If running locally in an IDE using Databricks Connect, connect the Spark client & configure MLflow to use Databricks Managed MLflow.  If this is running in a Databricks Notebook, these values are already set.

In [0]:
from mlflow.utils import databricks_utils as du

if not du.is_in_databricks_notebook():
    from databricks.connect import DatabricksSession
    import os

    spark = DatabricksSession.builder.getOrCreate()
    os.environ["MLFLOW_TRACKING_URI"] = "databricks"

### 🚫✏️ Load the Agent's UC storage locations; set up MLflow experiment

This notebook uses the UC model, MLflow Experiment, and Evaluation Set that you specified in the [Agent setup](02_agent_setup.ipynb) notebook.

In [0]:
import os
import yaml
from pathlib import Path
import mlflow 
from box import Box
from cookbook.databricks_utils import get_table_url
from cookbook.databricks_utils import get_mlflow_experiment_url

# Load the Agent's storage configuration
agent_storage_config = Box(yaml.safe_load(Path("./configs/agent_storage_config.yaml").read_text()))
print(agent_storage_config)

# set the MLflow experiment
experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)
# If running in a local IDE, set the MLflow experiment name as an environment variable
os.environ["MLFLOW_EXPERIMENT_NAME"] = agent_storage_config.mlflow_experiment_name

print(f"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}")
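
Since the rest of this notebook depends on the storage configuration loaded above, it can help to check early that the expected keys are present. The following is a minimal sketch, not part of the cookbook: the helper name `find_missing_keys` and the example values are hypothetical, and the key names are the ones the cells in this notebook read from `agent_storage_config`.

```python
# Keys that the cells in this notebook read from the storage config
REQUIRED_KEYS = ("mlflow_experiment_name", "evaluation_set_uc_table", "uc_model_name")


def find_missing_keys(config: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in required if not config.get(key)]


# Hypothetical values, for illustration only
example_config = {
    "mlflow_experiment_name": "/Users/someone@example.com/my_agent_experiment",
    "uc_model_name": "my_catalog.my_schema.my_agent",
}

missing = find_missing_keys(example_config)
print(f"Missing keys: {missing}")
```

In this notebook you could likely pass `agent_storage_config.to_dict()` to such a helper and stop early if the returned list is not empty.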

### 🚫✏️ Helper method to log the Agent's code & config to MLflow

Before we start, let's define a helper method that logs the Agent's code & config to MLflow.  The logged model is used for evaluation, is what you later register to the Unity Catalog, and is used for deploying to Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) (a chat UI for your stakeholders to test this agent) and, later, for deploying the Agent to production.

In [0]:
import os
import mlflow
from mlflow.types.llm import CHAT_MODEL_INPUT_SCHEMA
from mlflow.models.rag_signatures import StringResponse, ChatCompletionResponse
from cookbook.agents.utils.signatures import STRING_RESPONSE_WITH_MESSAGES
from mlflow.models.signature import ModelSignature


# This helper logs the Agent's code & config to an MLflow run and returns the logged model's info (including its URI).
# If run from inside an mlflow.start_run() block, it logs to that run; otherwise it logs to a new run.
# The logged Agent is ready for deployment, so if you are happy with your evaluation, you can deploy it as-is.
def log_function_calling_agent_to_mlflow(agent_config):
    from cookbook.agents.function_calling_agent import get_resource_dependencies

    # Build the path to the Agent's code file, relative to the current working directory
    agent_code_path = f"{os.getcwd()}/cookbook/agents/function_calling_agent.py"

    # Get the pip requirements from the requirements.txt file
    with open("requirements.txt", "r") as file:
        pip_requirements = [line.strip() for line in file.readlines()] + [
            "pyspark"
        ]  # manually add pyspark

    logged_agent_info = mlflow.langchain.log_model(
        agent_code_path,
        artifact_path="agent",
        input_example=agent_config.input_example,
        model_config=agent_config.to_dict(),
        resources=get_resource_dependencies(
            agent_config
        ),  # This allows the agents.deploy() command to securely provision credentials for the Agent's databricks resources e.g., vector index, model serving endpoints, etc
        signature=ModelSignature(
            inputs=CHAT_MODEL_INPUT_SCHEMA,
            # outputs=STRING_RESPONSE_WITH_MESSAGES #TODO: replace with MLflow signature
            outputs=ChatCompletionResponse(),
        ),
        code_paths=[os.path.join(os.getcwd(), "cookbook")],
        pip_requirements=pip_requirements,
    )

    return logged_agent_info
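
The helper above passes every line of `requirements.txt` to MLflow as-is. If your file contains blank lines or comments, you may want to filter them out first. The sketch below shows one way to do that under that assumption; `parse_requirements` is a hypothetical helper, not part of the cookbook, and the package names in the sample are placeholders.

```python
def parse_requirements(text: str, extra=("pyspark",)) -> list:
    """Return requirement lines from `text`, skipping blank lines and comments."""
    requirements = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip empty lines and full-line comments
        if not stripped or stripped.startswith("#"):
            continue
        requirements.append(stripped)
    # Append extras (such as pyspark) unless the exact same string is already listed
    for package in extra:
        if package not in requirements:
            requirements.append(package)
    return requirements


# Placeholder requirements, for illustration only
sample = """
# a comment line
package-a==1.0

package-b>=2
"""

print(parse_requirements(sample))
```

The duplicate check is an exact string match, so a pinned entry such as `pyspark==3.5.0` would not prevent the plain `pyspark` entry from being appended.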


## 1️⃣ Iterate on the Agent's code & config to improve quality

The cells below run your inner development loop for improving the Agent's quality.

We suggest the following process:
1. Vibe check the Agent for 5 - 10 queries to verify it works
2. Make any necessary changes to the code/config
3. Use Agent Evaluation to evaluate the Agent using your evaluation set, which will provide a quality assessment & identify the root causes of any quality issues
4. Based on that evaluation, make & test changes to the code/config to improve quality
5. 🔁 Repeat steps 3 and 4 until you are satisfied with the Agent's quality
6. Deploy the Agent to Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) for pre-production testing
7. Use the following notebooks to review that feedback (optionally adding new records to your evaluation set) & identify any further quality issues
8. 🔁 Repeat steps 3 and 4 to fix any issues identified in step 7
9. Deploy the Agent to a production-ready REST API endpoint (using the same cells in this notebook as step 6)


#### ✅✏️ Optionally, adjust the Agent's code

Here, we import the Agent's code so we can run the Agent locally within the notebook.  To modify the code, open the Agent's code file in a separate window, enable reload, make your changes, and re-run this cell.

**Typically, when building the first version of your agent, we suggest first trying to tune the configuration (prompts, etc) to improve quality.  If you need more control to fix quality issues, you can then modify the Agent's code.**

In [0]:
from cookbook.agents.function_calling_agent import create_function_calling_agent
import inspect

# Print the Agent code for inspection
print(inspect.getsource(create_function_calling_agent))

In [0]:
%load_ext autoreload
%autoreload 3

#### ✅✏️ 🅰 Vibe check the Agent for a single query

Running this cell will produce an MLflow Trace that you can use to see the Agent's outputs and understand the steps it took to produce that output.

If you are running in a local IDE, browse to the MLflow Experiment page to view the Trace (link to the Experiment UI is at the top of this notebook).  If running in a Databricks Notebook, your trace will appear inline below.

In [0]:
import yaml
from pathlib import Path
from box import Box
from cookbook.databricks_utils import get_mlflow_experiment_traces_url
from cookbook.agents.function_calling_agent import create_function_calling_agent

agent_conf = Box(yaml.safe_load(Path("configs/function_calling_agent_config.yaml").read_text()))

print(agent_conf)

# Load the Agent's code with the above configuration
agent = create_function_calling_agent(agent_config=agent_conf)

# Vibe check the Agent for a single query
output = agent.invoke(input={"messages": [{"role": "user", "content": "How does the blender work?"}]})
# output = agent.predict(model_input={"messages": [{"role": "user", "content": "Translate the sku `OLD-abs-1234` to the new format"}]})

print(f"View the MLflow Traces at {get_mlflow_experiment_traces_url(experiment_info.experiment_id)}")
print(f"Agent's final response:\n----\n{output['choices'][-1]['message']['content']}\n----")
print()
# print(f"Agent's full message history (useful for debugging):\n----\n{json.dumps(output['messages'], indent=2)}\n----")


Now, let's test a multi-turn conversation with the Agent.

In [0]:
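# Build the second turn from the full history: the first user message, the Agent's reply, and the follow-up question
second_turn = {
    "messages": [
        {"role": "user", "content": "How does the blender work?"},
        {"role": "assistant", "content": output["choices"][-1]["message"]["content"]},
        {"role": "user", "content": "How do I turn it on?"},
    ]
}

# Run the Agent again with the extended message history to continue the conversation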
second_turn_output = agent.invoke(input=second_turn)

print(f"View the MLflow Traces at {get_mlflow_experiment_traces_url(experiment_info.experiment_id)}")
print(f"Agent's final response:\n----\n{second_turn_output['choices'][-1]['message']['content']}\n----")
print()
#print(f"Agent's full message history (useful for debugging):\n----\n{json.dumps(second_turn_output['messages'], indent=2)}\n----")
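
For conversations longer than two turns, the history assembly can be factored into a small function. This is a minimal sketch, assuming the Agent returns a chat-completion-style dictionary with `choices[-1]["message"]["content"]`, as the cells above read it. The name `build_next_turn` and the `fake_output` value are hypothetical and only stand in for a real Agent response.

```python
def build_next_turn(previous_messages, output, user_content):
    """Return a request dict: prior history + the Agent's reply + a new user message."""
    assistant_message = output["choices"][-1]["message"]
    reply = {
        # Assumption: fall back to "assistant" if the response carries no role
        "role": assistant_message.get("role", "assistant"),
        "content": assistant_message["content"],
    }
    new_user_message = {"role": "user", "content": user_content}
    return {"messages": list(previous_messages) + [reply, new_user_message]}


# Stand-in for a real Agent response, for illustration only
first_turn = [{"role": "user", "content": "How does the blender work?"}]
fake_output = {"choices": [{"message": {"role": "assistant", "content": "It spins blades."}}]}

next_request = build_next_turn(first_turn, fake_output, "How do I turn it on?")
print([message["role"] for message in next_request["messages"]])
```

The function copies the previous messages rather than mutating them, so each turn's request can be inspected or replayed independently.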

#### ✅✏️ 🅱 Evaluate the Agent using your evaluation set

Note: If you do not have an evaluation set, you can create a synthetic evaluation set by using the 03_synthetic_evaluation notebook.

In [0]:
evaluation_set = spark.table(agent_storage_config.evaluation_set_uc_table)

with mlflow.start_run():
    logged_agent_info = log_function_calling_agent_to_mlflow(agent_conf)

    # Run the agent for these queries, using Agent evaluation to parallelize the calls
    eval_results = mlflow.evaluate(
        model=logged_agent_info.model_uri,  # use the MLflow logged Agent
        data=evaluation_set,  # Evaluate the Agent for every row of the evaluation set
        model_type="databricks-agent",  # use Agent Evaluation
    )

    # Show all outputs.  Click on a row in this table to display the MLflow Trace.
    display(eval_results.tables["eval_results"])

    # Click 'View Evaluation Results' to see the Agent's inputs/outputs + quality evaluation displayed in a UI

## 2️⃣ Deploy a version of your Agent - either to the Review App or Production

Once you have a version of your Agent that has sufficient quality, you will register the Agent's model from the MLflow Experiment into the Unity Catalog & use Agent Framework's `agents.deploy(...)` command to deploy it.  Note that these steps are the same for deploying to pre-production (e.g., the [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui)) or to production.

By the end of this step, you will have deployed a version of your Agent that you can interact with and share with your business stakeholders for feedback, even if they don't have access to your Databricks workspace:

1. A production-ready, scalable REST API deployed as a Model Serving endpoint that logs every request/response/MLflow Trace to a Delta Table.
    - REST API for querying the Agent
    - REST API for sending user feedback from your UI to the Agent
2. Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) connected to these endpoints.
3. [Mosaic AI Playground](https://docs.databricks.com/en/large-language-models/ai-playground.html) connected to these endpoints.

Option 1: Deploy the last agent you logged above

In [0]:
from databricks import agents

# Use Unity Catalog as the model registry
mlflow.set_registry_uri("databricks-uc")

# Register the Agent's model to the Unity Catalog
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=agent_storage_config.uc_model_name
)

# Deploy the model to the review app and a model serving endpoint
agents.deploy(agent_storage_config.uc_model_name, uc_registered_model_info.version)