

# Production Monitoring: Automated Quality at Scale

MLflow's production monitoring automatically runs quality assessments on a sample of your production traffic, so your GenAI app keeps meeting its quality bar without manual intervention. Because MLflow lets you reuse the metrics you defined for offline evaluation in production, quality is measured consistently across the entire application lifecycle, from development to production.
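A simple way to keep offline and production checks aligned is to define the guideline text once and import it in both places. The sketch below assumes a hypothetical shared module holding a `QUALITY_GUIDELINES` dict (guideline name mapped to a list of rules, the same shape passed to `GuidelinesJudge(guidelines=...)` in the monitor cell) and a hypothetical `validate_guidelines` helper that fails fast on an empty or malformed entry. Neither is part of MLflow.

```python
# Hypothetical shared module (e.g. quality_guidelines.py), imported by both the
# offline evaluation notebook and the monitoring notebook.
# Shape mirrors GuidelinesJudge(guidelines=...): name -> list of rule strings.
QUALITY_GUIDELINES: dict[str, list[str]] = {
    "steps_and_reasoning": [
        "The response must not mention tools, function calls, or intermediate reasoning steps."
    ],
    "no_fabrication": [
        "All factual information must come from the provided data, with no fabrication."
    ],
}


def validate_guidelines(guidelines: dict[str, list[str]]) -> dict[str, list[str]]:
    """Fail fast if the shared guideline set is malformed, then return it unchanged."""
    if not guidelines:
        raise ValueError("at least one guideline is required")
    for name, rules in guidelines.items():
        if not isinstance(name, str) or not name:
            raise ValueError(f"invalid guideline name: {name!r}")
        if not rules or not all(isinstance(r, str) and r.strip() for r in rules):
            raise ValueError(f"guideline {name!r} needs at least one non-empty rule")
    return guidelines


# Offline: hand the same dict to your evaluation scorers.
# Production: hand it to GuidelinesJudge(guidelines=...) in the monitor config.
shared = validate_guidelines(QUALITY_GUIDELINES)
print(sorted(shared))  # ['no_fabrication', 'steps_and_reasoning']
```

Validating at import time means a typo in the shared file breaks both notebooks immediately, instead of silently producing a judge with no rules in one environment only.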

**Key benefits:** 

- Automated evaluation - Run LLM judges on production traces with configurable sampling rates
- Continuous quality assessment - Monitor quality metrics on live traffic without disrupting the user experience
- Cost-effective monitoring - Sampling strategies that balance coverage against computational cost
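Sampling is what keeps monitoring cost proportional to the rate you choose rather than to total traffic. How MLflow samples internally is not described here, so the following is only a sketch of one common technique, deterministic hash-based sampling. `should_sample` is a hypothetical helper: each trace ID is hashed to a number in [0, 1) and kept if that number is below the rate, so the expected number of calls for one judge is roughly traffic × rate.

```python
import hashlib

# Sketch of deterministic hash-based sampling; this is NOT MLflow's implementation.


def should_sample(trace_id: str, rate: float) -> bool:
    """Return True if this trace falls inside the sampled fraction `rate`."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


trace_ids = [f"tr-{i}" for i in range(10_000)]
for rate in (1.0, 0.4, 0.05):
    kept = sum(should_sample(t, rate) for t in trace_ids)
    # Expected judge calls for a single judge ~= len(trace_ids) * rate
    print(f"rate={rate}: {kept} of {len(trace_ids)} traces sampled")
```

Because the decision depends only on the trace ID, a given trace always gets the same answer, and the traces kept at a low rate are a subset of those kept at any higher rate. That makes it easy to raise the rate later without losing continuity with earlier results.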

Production monitoring lets you deploy with confidence: issues are detected proactively, so you can address them before they significantly affect your users.

<img src="https://i.imgur.com/wv4p562.gif">


In [0]:
%pip install -U -qqqq "mlflow[databricks]>=3.1.1" databricks-agents
dbutils.library.restartPython()

In [0]:
%run ../_resources/01-setup


## Let's create our production-grade monitor

You can create the monitor from the UI, or directly with the SDK:


In [0]:
from databricks.agents.monitoring import (
  AssessmentsSuiteConfig,
  GuidelinesJudge,
  create_external_monitor,
  get_external_monitor,
  BuiltinJudge
)
import os

import mlflow

# Let's re-use our main agent experiment
xp_name = os.getcwd().rsplit("/", 1)[0]+"/02-agent-eval/02.1_agent_evaluation"
mlflow.set_experiment(xp_name)


def get_or_create_monitor():
  try:
    external_monitor = get_external_monitor(experiment_name=xp_name)
  except Exception as e:
    if "does not exist" in str(e):
      # Create external monitor for automated production monitoring
      external_monitor = create_external_monitor(
        # Change to a Unity Catalog schema where you have CREATE TABLE permissions.
        catalog_name=catalog,
        schema_name=dbName,
        assessments_config=AssessmentsSuiteConfig(
          sample=1.0,  # sampling rate
          assessments=[
            # Builtin judges
            BuiltinJudge(name="safety"),
            BuiltinJudge(name="groundedness", sample_rate=0.4),
            BuiltinJudge(name="relevance_to_query"),
            # Guidelines can refer to the request and response.
            GuidelinesJudge(guidelines={
              'accuracy': [
                """The response correctly references all factual information from the provided_info based on these rules:
      - All factual information must be directly sourced from the provided data with NO fabrication
      - Names, dates, numbers, and company details must be 100% accurate with no errors
      - Meeting discussions must be summarized with the exact same sentiment and priority as presented in the data
      - Support ticket information must include correct ticket IDs, status, and resolution details when available
      - All product usage statistics must be presented with the same metrics provided in the data
      - No references to CloudFlow features, services, or offerings unless specifically mentioned in the customer data
      - AUTOMATIC FAIL if any information is mentioned that is not explicitly provided in the data"""
              ],
              'steps_and_reasoning': [
                """Reponse must be done without showing reasoning.
      - don't mention that you need to look up things
      - do not mention tools or function used
      - do not tell your intermediate steps or reasoning"""
              ]
            })
          ]
        )
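      )
    else:
      # Re-raise anything other than "monitor not found" (e.g. missing permissions).
      raise
  print(f"Monitor ready: {external_monitor}")
  return external_monitor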


# The monitor creates a run that is refreshed periodically, which incurs a small cost.
# Uncomment to create the monitor in your experiment!
# get_or_create_monitor()
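To get a feel for the cost of the periodic refresh, you can estimate how many judge evaluations the configured monitor runs per batch of traces. This is back-of-the-envelope arithmetic, not an MLflow API: `expected_judge_calls` is hypothetical, it assumes a judge's `sample_rate` replaces the global `sample` rate when set, and it counts the `GuidelinesJudge` as a single judge even though it may issue more than one LLM call per trace. With `sample=1.0` as configured above, reading the per-judge rate as a multiplier on the global rate gives the same numbers.

```python
from typing import Optional

# Hypothetical cost estimate mirroring the monitor config above.
# ASSUMPTION: a per-judge sample_rate, when set, replaces the global `sample` rate.
GLOBAL_SAMPLE = 1.0
JUDGE_RATES: dict[str, Optional[float]] = {
    "safety": None,
    "groundedness": 0.4,
    "relevance_to_query": None,
    "guidelines": None,  # the single GuidelinesJudge (accuracy + steps_and_reasoning)
}


def expected_judge_calls(
    n_traces: int, global_rate: float, judge_rates: dict[str, Optional[float]]
) -> dict[str, float]:
    """Expected number of evaluations per judge for a batch of n_traces traces."""
    return {
        name: n_traces * (global_rate if rate is None else rate)
        for name, rate in judge_rates.items()
    }


calls = expected_judge_calls(10_000, GLOBAL_SAMPLE, JUDGE_RATES)
for name, n in calls.items():
    print(f"{name}: ~{n:,.0f} evaluations")
print(f"total: ~{sum(calls.values()):,.0f} evaluations per 10,000 traces")
```

Lowering the global `sample` rate scales every judge at once, while a per-judge `sample_rate` (as on `groundedness`) lets you throttle only the more expensive checks.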
