# Evaluate Supply Chain Agent
This notebook demonstrates how to evaluate the supply chain agent with Mosaic AI Agent Evaluation.

## Cluster Configuration
This notebook was tested on the following Databricks cluster configuration:
- **Databricks Runtime Version:** 16.4 LTS ML (includes Apache Spark 3.5.2, Scala 2.12)
- **Single Node** 
    - Azure: Standard_DS4_v2 (28 GB Memory, 8 Cores)
    - AWS: m5d.2xlarge (32 GB Memory, 8 Cores)

In [0]:
%pip install -U -qqqq mlflow-skinny[databricks] langgraph==0.3.4 databricks-langchain databricks-agents uv
%pip install -r ../requirements.txt --quiet
dbutils.library.restartPython()

## Load the agent

In [0]:
import mlflow
from databricks import agents

# Connect to the Unity Catalog model registry
mlflow.set_registry_uri("databricks-uc")

catalog = "supply_chain_stress_test"    # Change here
schema = "agents"                       # Change here
agent_name = "supply_chain_agent"       # Change here

# Load the model version that carries the `latest` alias, using the pyfunc flavor
agent = mlflow.pyfunc.load_model(f"models:/{catalog}.{schema}.{agent_name}@latest")

## Test the agent

Interact with the agent to test its output.

In [0]:
import os
import mlflow
from dbruntime.databricks_repl_context import get_context

# TODO: set WORKSPACE_URL manually if it cannot be inferred from the current notebook
WORKSPACE_URL = None
if WORKSPACE_URL is None:
    workspace_url_hostname = get_context().browserHostName
    assert workspace_url_hostname is not None, "Unable to look up current workspace URL. This can happen if running against serverless compute. Manually set WORKSPACE_URL yourself above, or run this notebook against classic compute"
    WORKSPACE_URL = f"https://{workspace_url_hostname}"

# TODO: set secret_scope and secret_key to the secret scope and key that hold your PAT
secret_scope = "ryuta"
secret_key = "token"

os.environ["HOST"] = WORKSPACE_URL
os.environ["TOKEN"] = dbutils.secrets.get(scope=secret_scope, key=secret_key)

In [0]:
from supply_chain_agent import AGENT

agent.predict({"messages": [{"role": "user", "content": "What happens if T2_4 goes down and takes 6 weeks to recover? What should I do?"}]})

## Evaluate the agent with [Agent Evaluation](https://learn.microsoft.com/azure/databricks/mlflow3/genai/eval-monitor)

You can edit the requests or expected responses in your evaluation dataset and rerun the evaluation as you iterate on your agent, using MLflow to track the computed quality metrics. Evaluate your agent with one of the [predefined LLM scorers](https://learn.microsoft.com/azure/databricks/mlflow3/genai/eval-monitor/predefined-judge-scorers), or add your own [custom scorers](https://learn.microsoft.com/azure/databricks/mlflow3/genai/eval-monitor/custom-scorers).
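
A rule such as a word limit does not need an LLM judge; it can be checked deterministically. The sketch below is a plain Python function and is not part of the original notebook: the name `within_word_limit` and its `max_words` parameter are hypothetical. To use it as a custom scorer, one option (an assumption about your setup) is to wrap it with the `scorer` decorator from `mlflow.genai.scorers` and first extract the response text from the agent output.

```python
def within_word_limit(response_text: str, max_words: int = 500) -> bool:
    """Return True when the response has at most `max_words` words.

    Words are counted by splitting on whitespace, which is an approximation
    of how a human reader would count them.
    """
    return len(response_text.split()) <= max_words


# Example: a short answer passes, and a 501-word answer fails the default limit
print(within_word_limit("Site T2_4 feeds three downstream plants."))
print(within_word_limit("word " * 501))
```

A deterministic check like this is cheaper and more repeatable than a guideline judged by an LLM, so it may be worth preferring wherever the rule can be expressed in code.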

In [0]:
# Evaluation dataset
eval_data = [
    {
        "inputs": {
            "messages": [
                {
                    "role": "user",
                    "content": "List all downstream sites for the raw material supplied by T3_10, and include any related information about these sites."
                }
            ],
        },
        "expected_response": None
    },
]
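
As the dataset grows, writing each record by hand gets repetitive. The following is a minimal sketch, assuming every record is a single user message in the same `inputs` / `messages` shape used above; the helper name `build_eval_records` is hypothetical, and expected responses are left out.

```python
def build_eval_records(questions: list[str]) -> list[dict]:
    """Wrap each question in the single-turn `inputs`/`messages` record shape."""
    return [
        {"inputs": {"messages": [{"role": "user", "content": question}]}}
        for question in questions
    ]


# Example usage with two questions about the supply chain
example_records = build_eval_records([
    "List all downstream sites for the raw material supplied by T3_10.",
    "What happens if T2_4 goes down and takes 6 weeks to recover?",
])
print(len(example_records))
```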

In [0]:
import mlflow.genai
from mlflow.genai.scorers import RelevanceToQuery, Guidelines

# Define evaluation scorers
scorers = [
    RelevanceToQuery(),
    Guidelines(
        guidelines="The right tool was used to answer the question.",
        name="tool_usage",
    ),
    Guidelines(
        guidelines="Response must not be longer than 500 words.",
        name="response_length",
    ),
]

In [0]:
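# Run evaluation
# predict_fn receives the keys of each record's "inputs" as keyword arguments,
# so the parameter name must match the "messages" key in eval_data
def evaluate_model(messages) -> dict:
    return agent.predict({"messages": messages})
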
results = mlflow.genai.evaluate(
    data=eval_data,
    predict_fn=evaluate_model,
    scorers=scorers
)

## Assign the `Production` alias to this version of the agent

In [0]:
from mlflow import MlflowClient

client = MlflowClient()
model_name = f"{catalog}.{schema}.{agent_name}"

# Look up the version that carries the `latest` alias and give it the `Production` alias too
model_info = client.get_model_version_by_alias(model_name, "latest")
client.set_registered_model_alias(model_name, "Production", model_info.version)
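
The alias assignment above runs unconditionally. One way to tie it to the evaluation results is a simple quality gate, sketched here as a plain function. This is an illustration rather than part of the original notebook: `passes_quality_gate` is a hypothetical helper, and the metric names and thresholds in the example are made up, so substitute the keys your evaluation run actually reports.

```python
def passes_quality_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Return True only if every thresholded metric is present and meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical metric names and values, for illustration only
example_metrics = {"relevance": 0.92, "tool_usage": 0.75}
example_thresholds = {"relevance": 0.9, "tool_usage": 0.8}
print(passes_quality_gate(example_metrics, example_thresholds))
```

A missing metric is treated as a failure, so a misconfigured run cannot promote a version by accident.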

## Next steps
