## 👉 START HERE: How to use this notebook

# Step 3: Build, evaluate, & deploy your Agent

Use this notebook to iterate on the code and configuration of your Agent.

By the end of this notebook, you will have 1+ registered versions of your Agent, each coupled with a detailed quality evaluation.

Optionally, you can deploy a version of your Agent that you can interact with in the [Mosaic AI Playground](https://docs.databricks.com/en/large-language-models/ai-playground.html) and let business stakeholders who don't have Databricks accounts interact with it & provide feedback in the [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui).


For each version of your agent, you will have an MLflow run inside your MLflow experiment that contains:
- Your Agent's code & config
- Evaluation metrics for cost, quality, and latency


**Important note:** Throughout this notebook, we indicate which cells you:
- ✅✏️ should customize - these cells contain code & config with business logic that you should edit to meet your requirements & tune quality.
- 🚫✏️ should not customize - these cells contain boilerplate code required to load/save/execute your Agent.

*Cells that don't require customization still need to be run!  You CAN change these cells, but if this is the first time using this notebook, we suggest not doing so.*

### 🚫✏️ Install Python libraries

You do not need to modify this cell unless you need additional Python packages in your Agent.

In [1]:
# %pip install -qqqq -U -r requirements.txt
# # Restart to load the packages into the Python environment
# dbutils.library.restartPython()

### 🚫✏️ Connect to Databricks

If you are running locally in an IDE using Databricks Connect, connect the Spark client & configure MLflow to use Databricks Managed MLflow.  If this is running in a Databricks Notebook, these values are already set.

In [1]:
# `os` is imported unconditionally because later cells use it in both local and Databricks environments
import os

from mlflow.utils import databricks_utils as du

if not du.is_in_databricks_notebook():
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
    os.environ["MLFLOW_TRACKING_URI"] = "databricks"

### 🚫✏️ Load the Agent's UC storage locations; set up MLflow experiment

This notebook uses the UC model, MLflow Experiment, and Evaluation Set that you specified in the [Agent setup](02_agent_setup.ipynb) notebook.

In [2]:
from cookbook.config.shared.agent_storage_location import AgentStorageConfig
from cookbook.databricks_utils import get_mlflow_experiment_url
from cookbook.config import load_serializable_config_from_yaml_file
import mlflow
import os

# Load the Agent's storage locations
agent_storage_config: AgentStorageConfig = load_serializable_config_from_yaml_file("./configs/agent_storage_config.yaml")

# Show the Agent's storage locations
agent_storage_config.pretty_print()

# set the MLflow experiment
experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)
# If running in a local IDE, set the MLflow experiment name as an environment variable
os.environ["MLFLOW_EXPERIMENT_NAME"] = agent_storage_config.mlflow_experiment_name

print(f"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}")

{
  "uc_model_name": "ep.cookbook_local_test.my_agent",
  "evaluation_set_uc_table": "ep.cookbook_local_test.my_agent_eval_set",
  "mlflow_experiment_name": "/Users/eric.peter@databricks.com/my_agent_mlflow_experiment",
  "class_path": "cookbook.config.shared.agent_storage_location.AgentStorageConfig"
}
View the MLflow Experiment `/Users/eric.peter@databricks.com/my_agent_mlflow_experiment` at https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/3916415516852775


### 🚫✏️ Helper method to log the Agent's code & config to MLflow

Before we start, let's define a helper method that logs the Agent's code & config to MLflow.  We will use it whenever we log the Agent's code & config to MLflow & Unity Catalog.  The logged Agent is used for evaluation, for deploying to Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) (a chat UI for your stakeholders to test this agent), and later, for deploying the Agent to production.

In [3]:

import os

import mlflow
from mlflow.types.llm import CHAT_MODEL_INPUT_SCHEMA
from mlflow.models.rag_signatures import StringResponse
from cookbook.agents.utils.signatures import STRING_RESPONSE_WITH_MESSAGES
from mlflow.models.signature import ModelSignature
from cookbook.agents.function_calling_agent import FunctionCallingAgent
from cookbook.agents.function_calling_agent import FunctionCallingAgentConfig

# This helper logs the Agent's code & config to an MLflow run and returns the logged model's info (including its URI).
# If run from inside an mlflow.start_run() block, it logs to that run; otherwise it logs to a new run.
# The logged Agent is ready for deployment, so if you are happy with your evaluation, you can deploy it as-is.
def log_agent_to_mlflow(agent_config: FunctionCallingAgentConfig):
    # Get the agent's code path from the imported Agent class
    agent_code_path = f"{os.getcwd()}/{FunctionCallingAgent.__module__.replace('.', '/')}.py"

    # Get the pip requirements from the requirements.txt file
    with open("requirements.txt", "r") as file:
        pip_requirements = [line.strip() for line in file.readlines()] + ["pyspark"] # manually add pyspark

    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model=agent_code_path,
        input_example=agent_config.input_example,
        model_config=agent_config.model_dump(),
        # This allows the agents.deploy() command to securely provision credentials for the Agent's Databricks resources e.g., vector index, model serving endpoints, etc
        resources=agent_config.get_resource_dependencies(),
        signature=ModelSignature(
            inputs=CHAT_MODEL_INPUT_SCHEMA,
            # outputs=STRING_RESPONSE_WITH_MESSAGES #TODO: replace with MLflow signature
            outputs=StringResponse(),
        ),
        code_paths=[os.path.join(os.getcwd(), "cookbook")],
        pip_requirements=pip_requirements,
    )

    return logged_agent_info
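
The helper above derives the Agent's code file from the class's `__module__`, which only resolves correctly when the notebook's working directory is the project root. Below is a minimal sketch of that mapping. It uses the standard library's `JSONDecoder` as a stand-in class (an assumption for illustration, not the cookbook's Agent) and shows `inspect.getsourcefile` as an alternative that does not depend on the working directory.

```python
import inspect
import os
from json.decoder import JSONDecoder  # stand-in for FunctionCallingAgent


def module_to_relative_path(cls) -> str:
    """Map a class's dotted module name to a relative .py path, as the helper does."""
    return cls.__module__.replace(".", "/") + ".py"


relative_path = module_to_relative_path(JSONDecoder)
# Joining with os.getcwd() is only correct when the package lives under the working directory
guessed_path = os.path.join(os.getcwd(), relative_path)
# inspect.getsourcefile asks Python where the class was actually loaded from
actual_path = inspect.getsourcefile(JSONDecoder)

print(relative_path)
print(os.path.exists(guessed_path), os.path.exists(actual_path))
```

If the notebook is ever run from a different folder, the `os.getcwd()`-based path silently points at a file that does not exist, whereas the `inspect`-based path still resolves.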

## Create tools

- We will store all tools in the `tools` folder
- First, create a local function & test it with pytest
- Then, deploy it as a UC tool & test it with pytest
- Then, add the tool to the Agent

Enable autoreload so that changes to the tool's code are always picked up without restarting the notebook.

In [6]:
%load_ext autoreload
%autoreload 3
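
Outside IPython, where the `autoreload` extension is not available, a similar effect can be approximated with `importlib.reload`. The sketch below is illustrative only: `hypothetical_tool` is a throwaway module written to a temporary folder, not part of the cookbook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumption: a throwaway module in a temp folder stands in for a file in `tools/`
sys.dont_write_bytecode = True  # skip .pyc caching so quick rewrites are never masked by stale bytecode
tmp_dir = tempfile.mkdtemp()
sys.path.insert(0, tmp_dir)
tool_file = Path(tmp_dir) / "hypothetical_tool.py"

# First version of the tool
tool_file.write_text("def shout(text):\n    return text.upper()\n")
hypothetical_tool = importlib.import_module("hypothetical_tool")
before = hypothetical_tool.shout("sku")

# Edit the tool's code on disk, then reload the already-imported module
tool_file.write_text("def shout(text):\n    return text.upper() + '!'\n")
importlib.invalidate_caches()
hypothetical_tool = importlib.reload(hypothetical_tool)
after = hypothetical_tool.shout("sku")

print(before, after)
```

Note that after a plain `importlib.reload`, names bound earlier with `from module import name` still point at the old objects and must be re-imported.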

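The tool needs a Google-style docstring (with `Args:` and `Returns:` sections) and type hints, because the tool's metadata in Unity Catalog is set automatically from the docstring & type hints when the tool is deployed.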
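
To see why the docstring format matters, here is a simplified sketch of how parameter descriptions can be pulled out of a Google-style docstring. This is not the Unity Catalog client's actual parser; `parse_google_args` and `example_tool` are hypothetical names used only for illustration.

```python
import inspect
import re
from typing import Callable, Dict


def parse_google_args(func: Callable) -> Dict[str, str]:
    """Return {parameter_name: description} from the `Args:` section of a Google-style docstring."""
    doc = inspect.getdoc(func) or ""
    descriptions: Dict[str, str] = {}
    in_args = False
    for line in doc.splitlines():
        stripped = line.strip()
        if stripped == "Args:":
            in_args = True
            continue
        if in_args:
            # An unindented header such as "Returns:" ends the Args section
            if stripped.endswith(":") and not line.startswith(" "):
                break
            # Matches lines like "    old_sku (str): The old SKU ..."
            match = re.match(r"^\s+(\w+)\s*(?:\([^)]*\))?:\s*(.+)$", line)
            if match:
                descriptions[match.group(1)] = match.group(2)
    return descriptions


def example_tool(old_sku: str) -> str:
    """
    Translates an old SKU.

    Args:
        old_sku (str): The old SKU in the format "OLD-XXX-YYYY".

    Returns:
        str: The new SKU.
    """
    return old_sku


parsed = parse_google_args(example_tool)
print(parsed)
```

A docstring without an `Args:` section yields an empty dictionary here, which illustrates how a tool could end up without parameter descriptions for the LLM to read.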

In [12]:
%%writefile tools/sample_tool.py

def sku_sample_translator(old_sku: str) -> str:
    """
    Translates a pre-2024 SKU formatted as "OLD-XXX-YYYY" to the new SKU format "NEW-YYYY-XXX".

    Args:
        old_sku (str): The old SKU in the format "OLD-XXX-YYYY".

    Returns:
        str: The new SKU in the format "NEW-YYYY-XXX".

    Raises:
        ValueError: If the SKU format is invalid, providing specific error details.
    """
    import re

    if not isinstance(old_sku, str):
        raise ValueError("SKU must be a string")

    # Normalize input by removing extra whitespace and converting to uppercase
    old_sku = old_sku.strip().upper()

    # Define the regex pattern for the old SKU format
    pattern = r"^OLD-([A-Z]{3})-(\d{4})$"

    # Match the old SKU against the pattern
    match = re.match(pattern, old_sku)
    if not match:
        if not old_sku.startswith("OLD-"):
            raise ValueError("SKU must start with 'OLD-'")
        # Any other mismatch is a format error (wrong number or kind of characters)
        raise ValueError(
            "SKU format must be 'OLD-XXX-YYYY' where X is a letter and Y is a digit"
        )

    # Extract the letter code and numeric part
    letter_code, numeric_part = match.groups()

    # Additional validation for numeric part
    if not (1 <= int(numeric_part) <= 9999):
        raise ValueError("Numeric part must be between 0001 and 9999")

    # Construct the new SKU
    new_sku = f"NEW-{numeric_part}-{letter_code}"
    return new_sku


Overwriting tools/sample_tool.py


Now, let's import the tool and test it locally.

In [13]:
from tools.sample_tool import sku_sample_translator

sku_sample_translator("OLD-XXX-1234")

'NEW-1234-XXX'

Now, let's write some pytest unit tests for the tool.  These are just samples; you will need to write your own.

In [14]:
%%writefile tools/test_sample_tool.py
import pytest
from tools.sample_tool import sku_sample_translator



def test_valid_sku_translation():
    """Test successful SKU translation with valid input."""
    assert sku_sample_translator("OLD-ABC-1234") == "NEW-1234-ABC"
    assert sku_sample_translator("OLD-XYZ-0001") == "NEW-0001-XYZ"
    assert sku_sample_translator("old-def-5678") == "NEW-5678-DEF"  # Test case insensitivity


def test_whitespace_handling():
    """Test that the function handles extra whitespace correctly."""
    assert sku_sample_translator("  OLD-ABC-1234  ") == "NEW-1234-ABC"
    assert sku_sample_translator("\tOLD-ABC-1234\n") == "NEW-1234-ABC"


def test_invalid_input_type():
    """Test that non-string inputs raise ValueError."""
    with pytest.raises(ValueError, match="SKU must be a string"):
        sku_sample_translator(123)
    with pytest.raises(ValueError, match="SKU must be a string"):
        sku_sample_translator(None)


def test_invalid_prefix():
    """Test that SKUs not starting with 'OLD-' raise ValueError."""
    with pytest.raises(ValueError, match="SKU must start with 'OLD-'"):
        sku_sample_translator("NEW-ABC-1234")
    with pytest.raises(ValueError, match="SKU must start with 'OLD-'"):
        sku_sample_translator("XXX-ABC-1234")


def test_invalid_format():
    """Test various invalid SKU formats."""
    invalid_skus = [
        "OLD-AB-1234",  # Too few letters
        "OLD-ABCD-1234",  # Too many letters
        "OLD-123-1234",  # Numbers instead of letters
        "OLD-ABC-123",  # Too few digits
        "OLD-ABC-12345",  # Too many digits
        "OLD-ABC-XXXX",  # Letters instead of numbers
        "OLD-A1C-1234",  # Mixed letters and numbers in middle
    ]

    for sku in invalid_skus:
        with pytest.raises(
            ValueError,
            match="SKU format must be 'OLD-XXX-YYYY' where X is a letter and Y is a digit",
        ):
            sku_sample_translator(sku)


Overwriting tools/test_sample_tool.py


Now, let's run the tests.

In [15]:
import pytest

# Run tests from tools/test_sample_tool.py
pytest.main(["-v", "tools/test_sample_tool.py"])


platform darwin -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0 -- /Users/eric.peter/Library/Caches/pypoetry/virtualenvs/genai-cookbook-T2SdtsNM-py3.11/bin/python
cachedir: .pytest_cache
rootdir: /Users/eric.peter/Github/genai-cookbook/agent_app_sample_code
configfile: pyproject.toml
plugins: anyio-4.6.2.post1, typeguard-4.3.0
collecting ... collected 5 items

tools/test_sample_tool.py::test_valid_sku_translation PASSED             [ 20%]
tools/test_sample_tool.py::test_whitespace_handling PASSED               [ 40%]
tools/test_sample_tool.py::test_invalid_input_type PASSED                [ 60%]
tools/test_sample_tool.py::test_invalid_prefix PASSED                    [ 80%]
tools/test_sample_tool.py::test_invalid_format PASSED                    [100%]



<ExitCode.OK: 0>

Now, let's deploy the tool to Unity Catalog & wrap it in a `UCTool` that will be used by our Agent.  `UCTool` is a Pydantic base model, serializable to YAML, that loads the tool's metadata from UC and wraps it in a callable object.

In [20]:
from unitycatalog.ai.core.databricks import DatabricksFunctionClient
from tools.sample_tool import sku_sample_translator
from cookbook.tools.uc_tool import UCTool

client = DatabricksFunctionClient()
CATALOG = "ep"  # Change me!
SCHEMA = "cookbook_local_test"  # Change me if you want

# this will deploy the tool to UC, automatically setting the metadata in UC based on the tool's docstring & typing hints
tool_uc_info = client.create_python_function(func=sku_sample_translator, catalog=CATALOG, schema=SCHEMA, replace=True)

# the tool will deploy to a function in UC called `{catalog}.{schema}.{func}` where {func} is the name of the function
# Print the deployed Unity Catalog function name
print(f"Deployed Unity Catalog function name: {tool_uc_info.full_name}")

# wrap the tool into a UCTool which can be passed to our Agent
translate_sku_tool = UCTool(uc_function_name=tool_uc_info.full_name)

Deployed Unity Catalog function name: ep.cookbook_local_test.sku_sample_translator


Now, let's test the UC tool.  The `UCTool` is a directly callable wrapper around the UC function, so it can be used just like a local function.  The output is returned as a dictionary: on success the result is in the `value` key, and if an error is raised its details are in the `error` key.

When an error happens, the UC tool also returns an instruction prompt (in the `error_instructions` key) that shows the agent how to think about handling the error.  This prompt can be changed via the `error_prompt` parameter of the `UCTool`.
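
The value-or-error convention described above can be sketched with a small wrapper. This is an illustrative stand-in, not the cookbook's `UCTool` implementation: `call_tool_safely`, `DEFAULT_ERROR_PROMPT`, and `toy_translate` are hypothetical names, and the prompt text is a placeholder.

```python
from typing import Any, Callable, Dict

# Placeholder prompt; the real UCTool ships its own, longer default
DEFAULT_ERROR_PROMPT = (
    "The tool call raised an exception, detailed in `error`. "
    "Decide whether to retry with different inputs or ask the user for more information."
)


def call_tool_safely(
    func: Callable[..., Any], error_prompt: str = DEFAULT_ERROR_PROMPT, **kwargs: Any
) -> Dict[str, Any]:
    """Call `func` and return its result under `value`, or the exception details under `error`."""
    try:
        return {"value": func(**kwargs), "error": None}
    except Exception as exc:  # the agent should see every failure, so catch broadly
        return {
            "error": {"error_message": f"{type(exc).__name__}: {exc}"},
            "error_instructions": error_prompt,
        }


def toy_translate(old_sku: str) -> str:
    """Toy stand-in for the SKU translator; it validates only the prefix."""
    if not old_sku.startswith("OLD-"):
        raise ValueError("SKU must start with 'OLD-'")
    _, letters, digits = old_sku.split("-")
    return f"NEW-{digits}-{letters}"


ok_result = call_tool_safely(toy_translate, old_sku="OLD-XXX-1234")
bad_result = call_tool_safely(toy_translate, old_sku="OxxLD-XXX-1234")
print(ok_result)
print(bad_result["error"]["error_message"])
```

Returning the error as data, rather than raising, lets the agent's LLM read the message and the instructions and decide on a next step.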


In [22]:
# successful call
translate_sku_tool(old_sku="OLD-XXX-1234")

{'error': None, 'format': 'SCALAR', 'value': 'NEW-1234-XXX', 'truncated': None}

In [23]:
# unsuccessful call
translate_sku_tool(old_sku="OxxLD-XXX-1234")

ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.


{'error': {'stack_trace': 'line 17, in main\n    raise ValueError("SKU must start with \'OLD-\'")',
  'error_message': "ValueError: SKU must start with 'OLD-'"},
 'error_instructions': 'The tool call generated an Exception, detailed in `error`. Think step-by-step following these instructions to determine your next step.\n[1] Is the error due to a problem with the input parameters?\n[2] Could it succeed if retried with exactly the same inputs?\n[3] Could it succeed if retried with modified parameters using the input we already have from the user?\n[4] Could it succeed if retried with modified parameters informed by collecting additional input from the user?  What specific input would we need from the user?\nBased on your thinking, if the error is due to a problem with the input parameters, either call this tool again in a way that avoids this exception or collect additional information from the user to modify the inputs to avoid this exception.'}

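Now, let's convert our pytest tests to work with the UC tool.  This requires a few changes to the test code: the tool is loaded through a fixture, return values are read from the `value` key, and errors are checked in the `error` key rather than with `pytest.raises`.  See the example below.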
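
One alternative way to keep the UC tests close to the local ones is a small adapter that turns the result dictionary back into a return value or an exception. The sketch below assumes the result shapes shown in the outputs above; `unwrap_tool_result` and `ToolCallError` are hypothetical helpers, not part of the cookbook.

```python
from typing import Any, Dict


class ToolCallError(Exception):
    """Raised when a tool result dictionary carries an error."""


def unwrap_tool_result(result: Dict[str, Any]) -> Any:
    """Return result['value'], or raise ToolCallError with the reported error message."""
    error = result.get("error")
    if error:
        # The error may be a dict with an `error_message` key or a plain string
        message = error.get("error_message", str(error)) if isinstance(error, dict) else str(error)
        raise ToolCallError(message)
    return result["value"]


# Result dictionaries copied from the shapes printed earlier in this notebook
success = {"error": None, "format": "SCALAR", "value": "NEW-1234-XXX", "truncated": None}
failure = {"error": {"error_message": "ValueError: SKU must start with 'OLD-'"}}

unwrapped_value = unwrap_tool_result(success)
try:
    unwrap_tool_result(failure)
    raised_message = None
except ToolCallError as exc:
    raised_message = str(exc)

print(unwrapped_value, raised_message)
```

With such an adapter, a test could use `pytest.raises(ToolCallError, match=...)` much like the local tests do. Input-type errors would still differ, since parameter validation happens before the function is called.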

In [25]:
%%writefile tools/test_sample_tool_uc.py
import pytest
from cookbook.tools.uc_tool import UCTool


# Load the function from the UCTool versus locally
@pytest.fixture
def uc_tool():
    """Fixture to translate a UC tool into a local function."""
    UC_FUNCTION_NAME = "ep.cookbook_local_test.sku_sample_translator"
    loaded_tool = UCTool(uc_function_name=UC_FUNCTION_NAME)
    return loaded_tool


# Note: The value will be post processed into the `value` key, so we must check the returned value there.
def test_valid_sku_translation(uc_tool):
    """Test successful SKU translation with valid input."""
    assert uc_tool(old_sku="OLD-ABC-1234")["value"] == "NEW-1234-ABC"
    assert uc_tool(old_sku="OLD-XYZ-0001")["value"] == "NEW-0001-XYZ"
    assert (
        uc_tool(old_sku="old-def-5678")["value"] == "NEW-5678-DEF"
    )  # Test case insensitivity


# Note: The value will be post processed into the `value` key, so we must check the returned value there.
def test_whitespace_handling(uc_tool):
    """Test that the function handles extra whitespace correctly."""
    assert uc_tool(old_sku="  OLD-ABC-1234  ")["value"] == "NEW-1234-ABC"
    assert uc_tool(old_sku="\tOLD-ABC-1234\n")["value"] == "NEW-1234-ABC"


# Note: the input validation happens BEFORE the function is called by Spark, so we will never get these exceptions from the function.
# Instead, we will get invalid parameters errors from Spark.
def test_invalid_input_type(uc_tool):
    """Test that non-string inputs raise ValueError."""
    assert (
        uc_tool(old_sku=123)["error"]["error_message"]
        == """Invalid parameters provided: {'old_sku': "Parameter old_sku should be of type STRING (corresponding python type <class 'str'>), but got <class 'int'>"}."""
    )
    assert (
        uc_tool(old_sku=None)["error"]["error_message"]
        == """Invalid parameters provided: {'old_sku': "Parameter old_sku should be of type STRING (corresponding python type <class 'str'>), but got <class 'NoneType'>"}."""
    )


# Note: The errors will be post processed into the `error_message` key inside the `error` top level key, so we must check for exceptions there.
def test_invalid_prefix(uc_tool):
    """Test that SKUs not starting with 'OLD-' raise ValueError."""
    assert (
        uc_tool(old_sku="NEW-ABC-1234")["error"]["error_message"]
        == "ValueError: SKU must start with 'OLD-'"
    )
    assert (
        uc_tool(old_sku="XXX-ABC-1234")["error"]["error_message"]
        == "ValueError: SKU must start with 'OLD-'"
    )


# Note: The errors will be post processed into the `error_message` key inside the `error` top level key, so we must check for exceptions there.
def test_invalid_format(uc_tool):
    """Test various invalid SKU formats."""
    invalid_skus = [
        "OLD-AB-1234",  # Too few letters
        "OLD-ABCD-1234",  # Too many letters
        "OLD-123-1234",  # Numbers instead of letters
        "OLD-ABC-123",  # Too few digits
        "OLD-ABC-12345",  # Too many digits
        "OLD-ABC-XXXX",  # Letters instead of numbers
        "OLD-A1C-1234",  # Mixed letters and numbers in middle
    ]

    expected_error = "ValueError: SKU format must be 'OLD-XXX-YYYY' where X is a letter and Y is a digit"
    for sku in invalid_skus:
        assert uc_tool(old_sku=sku)["error"]["error_message"] == expected_error


Writing tools/test_sample_tool_uc.py


In [26]:
import pytest

# Run tests from tools/test_sample_tool_uc.py
pytest.main(["-v", "tools/test_sample_tool_uc.py"])


platform darwin -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0 -- /Users/eric.peter/Library/Caches/pypoetry/virtualenvs/genai-cookbook-T2SdtsNM-py3.11/bin/python
cachedir: .pytest_cache
rootdir: /Users/eric.peter/Github/genai-cookbook/agent_app_sample_code
configfile: pyproject.toml
plugins: anyio-4.6.2.post1, typeguard-4.3.0
collecting ... collected 5 items

tools/test_sample_tool_uc.py::test_valid_sku_translation PASSED          [ 20%]
tools/test_sample_tool_uc.py::test_whitespace_handling PASSED            [ 40%]
tools/test_sample_tool_uc.py::test_invalid_input_type PASSED             [ 60%]
tools/test_sample_tool_uc.py::test_invalid_prefix 

ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.


PASSED                 [ 80%]
tools/test_sample_tool_uc.py::test_invalid_format 

ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.
ERROR:root:Error parsing: 'stack', trying alternative approaches to parsing.


PASSED                 [100%]



<ExitCode.OK: 0>


## 1️⃣ Iterate on the Agent's code & config to improve quality

The below cells are used to execute your inner dev loop to improve the Agent's quality.

We suggest the following process:
1. Vibe check the Agent for 5 - 10 queries to verify it works
2. Make any necessary changes to the code/config
3. Use Agent Evaluation to evaluate the Agent using your evaluation set, which will provide a quality assessment & identify the root causes of any quality issues
4. Based on that evaluation, make & test changes to the code/config to improve quality
5. 🔁 Repeat steps 3 and 4 until you are satisfied with the Agent's quality
6. Deploy the Agent to Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) for pre-production testing
7. Use the following notebooks to review that feedback (optionally adding new records to your evaluation set) & identify any further quality issues
8. 🔁 Repeat steps 3 and 4 to fix any issues identified in step 7
9. Deploy the Agent to a production-ready REST API endpoint (using the same cells in this notebook as step 6)


In [4]:
from tools.sku_translator import translate_sku
from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()
CATALOG = "ep"  # Change me!
SCHEMA = "cookbook_local_test"  # Change me if you want
client.create_python_function(func=translate_sku, catalog=CATALOG, schema=SCHEMA, replace=True)

FunctionInfo(browse_only=None, catalog_name='ep', comment='Translates a pre-2024 SKU formatted as "OLD-XXX-YYYY" to the new SKU format "NEW-YYYY-XXX".', created_at=1731220624006, created_by='eric.peter@databricks.com', data_type=<ColumnTypeName.STRING: 'STRING'>, external_language='Python', external_name=None, full_data_type='STRING', full_name='ep.cookbook_local_test.translate_sku', function_id='1b81035a-035c-4cd9-b92f-78e97b953b40', input_params=FunctionParameterInfos(parameters=[FunctionParameterInfo(name='old_sku', type_text='string', type_name=<ColumnTypeName.STRING: 'STRING'>, position=0, comment='The old SKU in the format "OLD-XXX-YYYY".', parameter_default=None, parameter_mode=None, parameter_type=<FunctionParameterType.PARAM: 'PARAM'>, type_interval_type=None, type_json='{"name":"old_sku","type":"string","nullable":true,"metadata":{"comment":"The old SKU in the format \\"OLD-XXX-YYYY\\"."}}', type_precision=0, type_scale=0)]), is_deterministic=False, is_null_call=None, metasto

In [9]:
from cookbook.tools.uc_tool import UCTool

translate_sku_tool = UCTool(uc_function_name="ep.cookbook_local_test.translate_sku")
# translate_sku_tool(old_sku="OLD-XXX-1234")

translate_sku_tool(old_sku="NEW-ABC-1234")['error']



'Job aborted due to stage failure: Task 0 in stage 434.0 failed 4 times, most recent failure: Lost task 0.3 in stage 434.0 (TID 3133) (ip-10-68-129-107.us-west-2.compute.internal executor driver): org.apache.spark.SparkRuntimeException: [UDF_USER_CODE_ERROR.GENERIC] Execution of function ep.cookbook_local_test.translate_sku(NEW-ABC-1234) failed. \n== Error ==\nValueError: SKU must start with \'OLD-\'\n== Stacktrace ==\n  File "<udfbody>", line 17, in main\n    raise ValueError("SKU must start with \'OLD-\'") SQLSTATE: 39000\n== SQL (line 1, position 8) ==\nSELECT `ep`.`cookbook_local_test`.`translate_sku`(\'NEW-ABC-1234\')\n       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n\n\tat com.databricks.sql.execution.safespark.SafesparkErrorMessages$.createSparkRuntimeException(SafesparkErrorMessages.scala:131)\n\tat com.databricks.sql.execution.safespark.SafesparkErrorMessages$.convertToSparkRuntimeException(SafesparkErrorMessages.scala:84)\n\tat com.databricks.sql.execution.s

In [2]:
from tools.sku_translator import translate_sku
from cookbook.config import serializable_config_to_yaml_file

# translate_sku("OLD-XXX-1234")

from cookbook.tools.local_function import LocalFunctionTool
from tools.sku_translator import translate_sku

translate_sku_tool = LocalFunctionTool(func=translate_sku, name="xxx", description="Translates a pre-2024 SKU formatted as 'OLD-XXX-YYYY' to the new SKU format 'NEW-YYYY-XXX'.")
translate_sku_tool.model_dump()

# translate_sku_tool._input_schema

# translate_sku_tool(old_sku="OLD-XXX-1234")

serializable_config_to_yaml_file(translate_sku_tool, "./configs/local_fun.yaml")

translate_sku_tool._get_parameters_schema()


{'properties': {'old_sku': {'description': 'The old SKU in the format "OLD-XXX-YYYY".',
   'title': 'Old Sku',
   'type': 'string'}},
 'required': ['old_sku'],
 'title': 'Translate_SkuInputs',
 'type': 'object'}

In [2]:
from cookbook.config import load_serializable_config_from_yaml_file, serializable_config_to_yaml_file
test = load_serializable_config_from_yaml_file("./configs/local_fun.yaml")
test.model_dump()
# test._get_parameters_schema()
# serializable_config_to_yaml_file(test, "./configs/"+MULTI_AGENT_DEFAULT_YAML_CONFIG_FILE_NAME+"_loaded.yaml")


{'name': 'xxx',
 'description': "Translates a pre-2024 SKU formatted as 'OLD-XXX-YYYY' to the new SKU format 'NEW-YYYY-XXX'.",
 'func_path': 'tools.sku_translator.translate_sku',
 'class_path': 'cookbook.tools.local_function.LocalFunctionTool'}

In [7]:
import importlib

test = importlib.import_module("tools.sku_translator")
getattr(test, "translate_sku")

<function tools.sku_translator.translate_sku(old_sku: str) -> str>

In [5]:
from tools.sku_translator import translate_sku

translate_sku("OLD-XXX-1234")

from cookbook.tools.local_function import LocalFunctionTool
from tools.sku_translator import translate_sku

translate_sku_tool = LocalFunctionTool(func=translate_sku, name="translate_sku", description="Translates a pre-2024 SKU formatted as 'OLD-XXX-YYYY' to the new SKU format 'NEW-YYYY-XXX'.")
translate_sku_tool.model_dump()

f"{translate_sku.__module__}.{translate_sku.__name__}"

import importlib.util
import os

# Get the current working directory
cwd = os.getcwd()

# Construct the full path to the module file
module_path = os.path.join(cwd, "tools", "sku_translator.py")

# Create the spec
spec = importlib.util.spec_from_file_location("sku_translator", module_path)

# Create the module
sku_translator = importlib.util.module_from_spec(spec)

# Execute the module
spec.loader.exec_module(sku_translator)

# Get the translate_sku function
translate_sku = sku_translator.translate_sku


'tools.sku_translator.translate_sku'

In [9]:
module_name, func_name = "tools.sku_translator.translate_sku".rsplit(".", 1)

func_name

'translate_sku'
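
The scratch cells above suggest that a local function tool is serialized as a dotted `func_path` and resolved back to a callable on load. A minimal sketch of that round trip is shown below, using `json.dumps` from the standard library as the example function. `function_to_path` and `load_function_from_path` are hypothetical helper names, not the cookbook's API.

```python
import importlib
import json
from typing import Callable


def function_to_path(func: Callable) -> str:
    """Build the dotted path '<module>.<name>' for a module-level function."""
    return f"{func.__module__}.{func.__name__}"


def load_function_from_path(func_path: str) -> Callable:
    """Import the module part of a dotted path and return the named callable."""
    module_name, func_name = func_path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"{func_path} does not refer to a callable")
    return func


path = function_to_path(json.dumps)
loaded = load_function_from_path(path)
print(path, loaded([1, 2]))
```

This approach only works for functions that are importable by module name in the environment where the config is loaded, so nested functions and lambdas cannot be stored this way.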

In [4]:
# Import Cookbook Agent configurations, which are Pydantic models
from cookbook.config import serializable_config_to_yaml_file
from cookbook.config.agents.function_calling_agent import (
    FunctionCallingAgentConfig,
)
from cookbook.config.data_pipeline import (
    DataPipelineConfig,
)
from cookbook.config.shared.llm import LLMConfig, LLMParametersConfig
from cookbook.config import load_serializable_config_from_yaml_file
from cookbook.tools.vector_search import (
    VectorSearchRetrieverTool,
    VectorSearchSchema,
)
import json
from cookbook.tools.uc_tool import UCTool


########################
# #### 🚫✏️ Load the Vector Index Unity Catalog location from the data pipeline configuration
# Usage:
# - If you used `01_data_pipeline` to create your Vector Index, run this cell.
# - If your Vector Index was created elsewhere, comment out this logic and set the UC location in the Retriever config.
########################

data_pipeline_config: DataPipelineConfig = load_serializable_config_from_yaml_file(
    "./configs/data_pipeline_config.yaml"
)

########################
# #### ✅✏️ Retriever tool that connects to the Vector Search index
########################

retriever_tool = VectorSearchRetrieverTool(
    name="search_product_docs",
    description="Use this tool to search for product documentation.",
    vector_search_index=data_pipeline_config.output.vector_index,
    vector_search_schema=VectorSearchSchema(
        # These columns are the default values used in the `01_data_pipeline` notebook
        # If you used different column names in that notebook OR you are using a pre-built vector index, update the column names here.
        chunk_text="content_chunked",  # Contains the text of each document chunk
        document_uri="doc_uri",  # The document URI of the chunk e.g., "/Volumes/catalog/schema/volume/file.pdf" - displayed as the document ID in the Review App
        additional_metadata_columns=[],  # Additional columns to return from the vector database and present to the LLM
    ),
    # Optional parameters, see VectorSearchRetrieverTool.__doc__ for details.  The default values are shown below.
    # doc_similarity_threshold=0.0,
    # vector_search_parameters=VectorSearchParameters(
    #     num_results=5,
    #     query_type="ann"
    # ),
    # Adding columns here will allow the Agent's LLM to dynamically apply filters based on the user's query.
    # filterable_columns=[]
)

########################
# #### ✅✏️ Add Unity Catalog tools to the Agent
########################

translate_sku_tool = UCTool(uc_function_name="ep.cookbook_local_test.translate_sku")

from cookbook.tools.local_function import LocalFunctionTool
from tools.sku_translator import translate_sku

########################
# #### ✅✏️ Agent's LLM configuration
########################

system_prompt = """
## Role
You are a helpful assistant that answers questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.

## Objective
Your goal is to provide accurate, relevant, and helpful response based solely on the outputs from these tools. You are concise and direct in your responses.

## Instructions
1. **Understand the Query**: Think step by step to analyze the user's question and determine the core need or problem. 

2. **Assess available tools**: Think step by step to consider each available tool and understand their capabilities in the context of the user's query.

3. **Select the appropriate tool(s) OR ask follow up questions**: Based on your understanding of the query and the tool descriptions, decide which tool(s) should be used to generate a response. If you do not have enough information to use the available tools to answer the question, ask the user follow up questions to refine their request.  If you do not have a relevant tool for a question or the outputs of the tools are not helpful, respond with: "I'm sorry, I can't help you with that."
""".strip()

fc_agent_config = FunctionCallingAgentConfig(
    llm_config=LLMConfig(
        llm_endpoint_name="ep-gpt4o-new",  # Model serving endpoint w/ a Chat Completions API
        llm_system_prompt_template=system_prompt,  # System prompt template
        llm_parameters=LLMParametersConfig(
            temperature=0.01, max_tokens=1500
        ),  # LLM parameters
    ),
    # Add one or more tools that comply with the CookbookTool interface
    tools=[retriever_tool, translate_sku_tool],
)

# Print the configuration as a JSON string to see it all together
print(json.dumps(fc_agent_config.model_dump(), indent=4))

########################
# #### Dump the configuration to a YAML file
# Optional step: this allows the Agent's code file to be run by itself (e.g., outside of this notebook) using the above configuration.
########################
# Import the default YAML config file name from the Agent's code file
from cookbook.agents.function_calling_agent import FC_AGENT_DEFAULT_YAML_CONFIG_FILE_NAME

# Dump the configuration to a YAML file
serializable_config_to_yaml_file(fc_agent_config, "./configs/"+FC_AGENT_DEFAULT_YAML_CONFIG_FILE_NAME)

{
    "tools": [
        {
            "class_path": "cookbook.tools.vector_search.VectorSearchRetrieverTool",
            "description": "Use this tool to search for product documentation.",
            "doc_similarity_threshold": 0.0,
            "filterable_columns": [],
            "name": "search_product_docs",
            "retriever_filter_parameter_prompt": "optional filters to apply to the search. An array of objects, each specifying a field name and the filters to apply to that field.",
            "retriever_query_parameter_prompt": "query to look up in retriever",
            "vector_search_index": "ep.cookbook_local_test.product_docs_docs_chunked_index__v1",
            "vector_search_parameters": {
                "num_results": 5,
                "query_type": "ann"
            },
            "vector_search_schema": {
                "additional_metadata_columns": [],
                "chunk_text": "content_chunked",
                "document_uri": "doc_uri"
            }


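The `class_path` field in the dumped configuration above hints at how a loader can rebuild the right Python object from plain YAML/JSON. The following is a minimal sketch of that idea, not the cookbook's actual implementation: `resolve_class_path` and `load_from_dict` are hypothetical helpers, and `types.SimpleNamespace` stands in for a real tool class so the example runs with the standard library only.

```python
import importlib


def resolve_class_path(class_path: str) -> type:
    """Import and return the class named by a dotted path such as 'pkg.module.ClassName'."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected a dotted path, got: {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def load_from_dict(config: dict):
    """Rebuild an object from a dict that carries its own `class_path` (hypothetical helper)."""
    # Every key except `class_path` is passed to the class as a keyword argument.
    kwargs = {key: value for key, value in config.items() if key != "class_path"}
    cls = resolve_class_path(config["class_path"])
    return cls(**kwargs)


# Illustrative config: SimpleNamespace is a stand-in for a real tool class.
tool_config = {
    "class_path": "types.SimpleNamespace",
    "name": "search_product_docs",
    "vector_search_parameters": {"num_results": 5, "query_type": "ann"},
}
tool = load_from_dict(tool_config)
print(tool.name)
```

Storing the class path next to the parameters is what lets a single YAML file describe a list of tools of different types.
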
#### ✅✏️ Optionally, adjust the Agent's code

Here, we import the Agent's code so we can run the Agent locally within the notebook.  To modify the code, open the Agent's code file in a separate window, enable reload, make your changes, and re-run this cell.

**Typically, when building the first version of your Agent, we suggest first tuning the configuration (prompts, etc.) to improve quality.  If you need more control to fix quality issues, you can then modify the Agent's code.**

In [6]:
from cookbook.agents.function_calling_agent import FunctionCallingAgent
import inspect

# Print the Agent code for inspection
print(inspect.getsource(FunctionCallingAgent))

class FunctionCallingAgent(mlflow.pyfunc.PythonModel):
    """
    Class representing an Agent that does function-calling with tools using OpenAI SDK
    """

    def load_context(self, context: PythonModelContext):
        # If context is not None, we are in the serving environment
        if context is not None:
            logging.info(
                f"load_context received context.model_config: {context.model_config}"
            )
            # we intentionally don't catch any errors here so the full logs show in model serving logs
            model_config_as_yaml = yaml.dump(context.model_config)
            self.agent_config = load_serializable_config_from_yaml(model_config_as_yaml)
            logging.info(
                f"Loaded config from context.model_config: {self.agent_config}"
            )

            if self.agent_config is None:
                # we failed, so let's try with mlflow.ModelConfig._read_config()
                model_config_as_yaml = yaml.dump(
     

In [7]:
%load_ext autoreload
%autoreload 2

#### ✅✏️ 🅰 Vibe check the Agent for a single query

Running this cell will produce an MLflow Trace that you can use to see the Agent's outputs and understand the steps it took to produce that output.

If you are running in a local IDE, browse to the MLflow Experiment page to view the Trace (link to the Experiment UI is at the top of this notebook).  If running in a Databricks Notebook, your trace will appear inline below.

In [8]:
from cookbook.databricks_utils import get_mlflow_experiment_traces_url
from cookbook.agents.function_calling_agent import FunctionCallingAgent

# Load the Agent's code with the above configuration
agent = FunctionCallingAgent(fc_agent_config)

# Vibe check the Agent for a single query
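output = agent.predict(model_input={"messages": [{"role": "user", "content": "How does the blender work?"}]})
# Optional second check that exercises the SKU translation tool.
# It is stored in its own variable so that `output` still holds the blender conversation.
sku_output = agent.predict(model_input={"messages": [{"role": "user", "content": "Translate the sku `OLD-abs-1234` to the new format"}]})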

print(f"View the MLflow Traces at {get_mlflow_experiment_traces_url(experiment_info.experiment_id)}")
print(f"Agent's final response:\n----\n{output['content']}\n----")
print()
print(f"Agent's full message history (useful for debugging):\n----\n{json.dumps(output['messages'], indent=2)}\n----")


tools=[VectorSearchRetrieverTool(name='search_product_docs', description='Use this tool to search for product documentation.', vector_search_index='ep.cookbook_local_test.product_docs_docs_chunked_index__v1', filterable_columns=[], vector_search_schema=VectorSearchSchema(chunk_text='content_chunked', document_uri='doc_uri', additional_metadata_columns=[]), doc_similarity_threshold=0.0, vector_search_parameters=VectorSearchParameters(num_results=5, query_type='ann'), retriever_query_parameter_prompt='query to look up in retriever', retriever_filter_parameter_prompt='optional filters to apply to the search. An array of objects, each specifying a field name and the filters to apply to that field.'), UCTool(name='ep__cookbook_local_test__translate_sku', description='Translates a pre-2024 SKU formatted as "OLD-XXX-YYYY" to the new SKU format "NEW-YYYY-XXX".', uc_function_name='ep.cookbook_local_test.translate_sku', error_prompt='Error in generated code.  Please think step-by-step about how 

Now, let's test a multi-turn conversation with the Agent.

In [None]:
second_turn = {'messages': output['messages'] + [{"role": "user", "content": "How do I turn it on?"}]}

# Run the Agent again with the previous message history plus the new user message to continue the conversation
second_turn_output = agent.predict(model_input=second_turn)

print(f"View the MLflow Traces at {get_mlflow_experiment_traces_url(experiment_info.experiment_id)}")
print(f"Agent's final response:\n----\n{second_turn_output['content']}\n----")
print()
print(f"Agent's full message history (useful for debugging):\n----\n{json.dumps(second_turn_output['messages'], indent=2)}\n----")
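
The multi-turn call above works by appending a new user message to the message history returned from the previous turn. A minimal sketch of that pattern follows, using a hypothetical helper `append_user_turn` and a placeholder assistant reply (no Agent is called here):

```python
import copy


def append_user_turn(messages: list[dict], content: str) -> dict:
    """Return a new model_input dict with one more user message appended.

    The incoming history is deep-copied so the previous turn's output is not mutated.
    """
    history = copy.deepcopy(messages)
    history.append({"role": "user", "content": content})
    return {"messages": history}


# Placeholder history standing in for `output['messages']` from the first turn.
first_turn_history = [
    {"role": "user", "content": "How does the blender work?"},
    {"role": "assistant", "content": "(assistant reply placeholder)"},
]

second_turn_input = append_user_turn(first_turn_history, "How do I turn it on?")
print(len(second_turn_input["messages"]))
```

Copying the history keeps each turn's output intact, which helps when you want to replay the same first turn with different follow-up questions.
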

#### ✅✏️ 🅱 Evaluate the Agent using your evaluation set

Note: If you do not have an evaluation set, you can create a synthetic evaluation set by using the 03_synthetic_evaluation notebook.

In [None]:
evaluation_set = spark.table(agent_storage_config.evaluation_set_uc_table)

with mlflow.start_run():
    logged_agent_info = log_agent_to_mlflow(fc_agent_config)

    # Run the Agent for these queries, using Agent Evaluation to parallelize the calls
    eval_results = mlflow.evaluate(
        model=logged_agent_info.model_uri,  # use the MLflow logged Agent
        data=evaluation_set,  # Evaluate the Agent for every row of the evaluation set
        model_type="databricks-agent",  # use Agent Evaluation
    )

    # Show all outputs.  Click on a row in this table to display the MLflow Trace.
    display(eval_results.tables["eval_results"])

    # Click 'View Evaluation Results' to see the Agent's inputs/outputs + quality evaluation displayed in a UI
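
Before running the evaluation, it can help to sanity-check the rows of your evaluation set. The sketch below assumes the `request` and `expected_response` column names from Agent Evaluation's documented input schema; confirm them against the version you are using. `validate_eval_rows` is a hypothetical helper, and the rows are placeholders rather than real data.

```python
REQUIRED_COLUMNS = {"request"}  # assumption: each row must carry the user's question


def validate_eval_rows(rows: list[dict]) -> list[str]:
    """Return a list of human-readable problems found in the evaluation rows."""
    problems = []
    for index, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            problems.append(f"row {index}: missing {sorted(missing)}")
        elif not str(row["request"]).strip():
            problems.append(f"row {index}: empty request")
    return problems


# Placeholder rows; `expected_response` is optional ground truth.
example_rows = [
    {"request": "How does the blender work?", "expected_response": "(ground-truth answer placeholder)"},
    {"request": "   "},
    {"expected_response": "(answer without a question)"},
]

for problem in validate_eval_rows(example_rows):
    print(problem)
```
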

## 2️⃣ Deploy a version of your Agent - either to the Review App or Production

Once you have a version of your Agent that has sufficient quality, you will register the Agent's model from the MLflow Experiment into Unity Catalog & use Agent Framework's `agents.deploy(...)` command to deploy it.  Note that these steps are the same whether you are deploying to pre-production (e.g., the [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui)) or to production.

By the end of this step, you will have deployed a version of your Agent that you can interact with and share with your business stakeholders for feedback, even if they don't have access to your Databricks workspace:

1. A production-ready, scalable REST API deployed as a Model Serving endpoint that logs every request/response/MLflow Trace to a Delta Table.
    - REST API for querying the Agent
    - REST API for sending user feedback from your UI to the Agent
2. Agent Evaluation's [Review App](https://docs.databricks.com/en/generative-ai/agent-evaluation/human-evaluation.html#review-app-ui) connected to these endpoints.
3. [Mosaic AI Playground](https://docs.databricks.com/en/large-language-models/ai-playground.html) connected to these endpoints.
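
As a rough illustration of the REST API for querying the Agent, the sketch below builds (but does not send) an HTTP request to a Model Serving endpoint's `invocations` route. The workspace URL, endpoint name, and token are placeholders, `build_invocation_request` is a hypothetical helper, and the payload shape is assumed to match the `messages` input used earlier in this notebook.

```python
import json
import urllib.request


def build_invocation_request(
    workspace_url: str, endpoint_name: str, token: str, messages: list[dict]
) -> urllib.request.Request:
    """Build a POST request for a Model Serving endpoint without sending it."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are placeholders, not real resources.
req = build_invocation_request(
    workspace_url="https://example.cloud.databricks.com",
    endpoint_name="my-agent-endpoint",
    token="<your-token>",
    messages=[{"role": "user", "content": "How does the blender work?"}],
)
print(req.full_url)

# To actually call the endpoint, you would pass `req` to urllib.request.urlopen(...).
```
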

Option 1: Deploy the last agent you logged above

In [None]:
from databricks import agents

# Use Unity Catalog as the model registry
mlflow.set_registry_uri("databricks-uc")

# Register the Agent's model to the Unity Catalog
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=agent_storage_config.uc_model_name
)

# Deploy the model to the review app and a model serving endpoint
agents.deploy(agent_storage_config.uc_model_name, uc_registered_model_info.version)

Option 2: Log the latest copy of the Agent's code/config and deploy it

In [40]:
from databricks import agents

# Use Unity Catalog as the model registry
mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run():
    logged_agent_info = log_agent_to_mlflow(fc_agent_config)

    # Register the Agent's model to the Unity Catalog
    uc_registered_model_info = mlflow.register_model(
        model_uri=logged_agent_info.model_uri, name=agent_storage_config.uc_model_name+"_3"
    )

# Deploy the model to the review app and a model serving endpoint
# agents.deploy(agent_storage_config.uc_model_name, uc_registered_model_info.version)


Successfully registered model 'ep.cookbook_local_test.my_agent_3'.


Created version '1' of model 'ep.cookbook_local_test.my_agent_3'.
2024/11/06 20:30:34 INFO mlflow.tracking._tracking_service.client: 🏃 View run chill-wasp-396 at: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/3916415516852775/runs/ab806dd976064400af8cb4931d5f41b5.
2024/11/06 20:30:34 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/3916415516852775.


In [41]:
logged_agent_info.model_uri

'runs:/ab806dd976064400af8cb4931d5f41b5/agent'

In [42]:
import mlflow

test = mlflow.pyfunc.load_model(logged_agent_info.model_uri)

test


mlflow.pyfunc.loaded_model:
  artifact_path: agent
  flavor: mlflow.pyfunc.loaders.code_model
  run_id: ab806dd976064400af8cb4931d5f41b5

In [43]:
rest = test.predict({"messages": [{"role": "user", "content": "How does the blender work?"}]})


In [51]:
# Get all attributes and methods of the test object
dir(test)
test.model_config


{'class_path': 'cookbook.config.agents.function_calling_agent.FunctionCallingAgentConfig',
 'input_example': {'messages': [{'content': 'What can you help me with?',
    'role': 'user'}]},
 'llm_config': {'llm_endpoint_name': 'ep-gpt4o-new',
  'llm_parameters': {'max_tokens': 1500, 'temperature': 0.01},
  'llm_system_prompt_template': '## Role\nYou are a helpful assistant that answers questions using a set of tools. If needed, you ask the user follow-up questions to clarify their request.\n\n## Objective\nYour goal is to provide accurate, relevant, and helpful response based solely on the outputs from these tools. You are concise and direct in your responses.\n\n## Instructions\n1. **Understand the Query**: Think step by step to analyze the user\'s question and determine the core need or problem. \n\n2. **Assess available tools**: Think step by step to consider each available tool and understand their capabilities in the context of the user\'s query.\n\n3. **Select the appropriate tool(