## Introduction: Delivering Business Value with GenAI

Delivering business value with GenAI requires answering these questions:
1. **Is my agent producing accurate answers?**
2. **How do I improve my agent's accuracy?**
3. **Is my agent fast and cost-effective?**

We know how to deliver reliable software, but GenAI's non-deterministic nature makes it difficult to directly apply software development best practices:
1. **User inputs evolve without warning**
2. **Assessing output quality requires human expertise**
3. **Quality must be traded off against cost and latency**

**MLflow enables you to apply software development best practices to evaluate and monitor GenAI application quality.**

![MLflow SDLC](https://i.imgur.com/T0uM1No.gif)

### What you will see

- **Implement MLflow Tracing** to observe and debug your GenAI app
- **Customize MLflow's LLM judges** to create quality evaluation criteria that align with business requirements and your domain expert's judgement
- **Create MLflow Evaluation Datasets** to curate production traces into test suites
- **Use MLflow's Evaluation SDK** to iteratively test changes and check for regressions
- **Use MLflow Prompt Registry and App Versioning** to link versions to quality evaluations
- **Use Databricks AI/BI** to link GenAI observability and evaluation metrics to business KPIs

![MLflow GenAI Demo](https://i.imgur.com/MXhaayF.gif)

## Install packages (only required if running in a Databricks Notebook)

In [None]:
%pip install -U -r ../../requirements.txt
dbutils.library.restartPython()

## Environment Setup

Let's start by setting up the environment and checking that everything is working correctly.

In [3]:
import sys
sys.path.append('../')
sys.path.append('../../')

import os
from dotenv import load_dotenv
import mlflow
from mlflow_demo.utils import *

import ipywidgets as widgets
from IPython.display import display, HTML, Markdown
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px

if mlflow.utils.databricks_utils.is_in_databricks_notebook():
    print("Running in Databricks Notebook")
    setup_databricks_notebook_env()
else:
    print("Running in Local IDE")
    setup_local_ide_env()

# Verify key variables are loaded
print('=== Environment Setup ===')
print(f'DATABRICKS_HOST: {os.getenv("DATABRICKS_HOST")}')
print(f'MLFLOW_EXPERIMENT_ID: {os.getenv("MLFLOW_EXPERIMENT_ID")}')
print(f'LLM_MODEL: {os.getenv("LLM_MODEL")}')
print(f'UC_CATALOG: {os.getenv("UC_CATALOG")}')
print(f'UC_SCHEMA: {os.getenv("UC_SCHEMA")}')
print('✅ Environment variables loaded successfully!')

import logging
logging.getLogger("urllib3").setLevel(logging.ERROR)
logging.getLogger("mlflow").setLevel(logging.ERROR)

Running in Local IDE
=== Environment Setup ===
DATABRICKS_HOST: https://e2-dogfood.staging.cloud.databricks.com
MLFLOW_EXPERIMENT_ID: 1021971371054097
LLM_MODEL: databricks-claude-3-7-sonnet
UC_CATALOG: ep
UC_SCHEMA: mlflow_demo
✅ Environment variables loaded successfully!
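
The setup cell prints each variable but reports success unconditionally, even when a value is `None`. A minimal sketch of a stricter check follows. It assumes that all five variables printed above are required, and the helper name `find_missing_env_vars` is hypothetical (it is not part of `mlflow_demo.utils`):

```python
import os

# Variables printed by the setup cell; treating all of them as required is an assumption.
REQUIRED_ENV_VARS = [
    "DATABRICKS_HOST",
    "MLFLOW_EXPERIMENT_ID",
    "LLM_MODEL",
    "UC_CATALOG",
    "UC_SCHEMA",
]


def find_missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = find_missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```

Accepting `env` as a parameter keeps the check testable with a plain dictionary instead of the real process environment.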


## Use Case Overview: Sales Email Generation

We'll create a **sales email generation system** that helps account managers create personalized customer emails. The goal is to drive business KPIs such as sales efficiency, response rate, and revenue generated.

This use case demonstrates real-world challenges:
- **Personalization at scale** - Each email must be tailored to specific customer needs
- **Quality consistency** - Maintain professional tone and accuracy across all emails
- **Business impact measurement** - Link email quality to actual business outcomes

## Interactive Demo Navigation

Follow these steps to learn MLflow GenAI evaluation best practices through hands-on examples.

### Step Navigation

Click the buttons below to navigate to each step of the demo:

In [None]:
# Metadata for each demo step; rendered below as navigation cards
progress_checkboxes = []  # Not used in this cell

steps_info = [
    {
        "title": "🔍 Step 1: Observe with Tracing",
        "description": "Add observability to your GenAI app so you can see what's happening and collect user feedback",
        "notebook": "1_observe_with_traces.ipynb",
        "url_env_var": "NOTEBOOK_URL_1_observe_with_traces"
    },
    {
        "title": "🎯 Step 2: Create Quality Metrics", 
        "description": "Scale your expert's judgment using MLflow's LLM judges to create automated quality metrics",
        "notebook": "2_create_quality_metrics.ipynb",
        "url_env_var": "NOTEBOOK_URL_2_create_quality_metrics"
    },
    {
        "title": "📈 Step 3: Find & Fix Quality Issues",
        "description": "Use production traces and actual user data with evaluation metrics to iteratively test quality fixes",
        "notebook": "3_find_fix_quality_issues.ipynb",
        "url_env_var": "NOTEBOOK_URL_3_find_fix_quality_issues"
    },
    {
        "title": "👁️ Step 4: Human Review",
        "description": "Collect expert feedback through an intuitive UI to systematically improve GenAI quality",
        "notebook": "4_human_review.ipynb",
        "url_env_var": "NOTEBOOK_URL_4_human_review"
    },
    {
        "title": "🔬 Step 5: Production Monitoring",
        "description": "Monitor GenAI quality in production with intelligent sampling",
        "notebook": "5_production_monitoring.ipynb",
        "url_env_var": "NOTEBOOK_URL_5_production_monitoring"
    }
]

def create_step_card(step_num, title, description, notebook_name, url_env_var=None):
    """Create an interactive step card with navigation using actual notebook URLs."""
    
    # Try to get the specific notebook URL from environment variables first
    notebook_url = None
    url_text = ""
    button_color = "#007acc"
    
    if url_env_var:
        notebook_url = os.getenv(url_env_var)
        if notebook_url:
            url_text = f"Open {notebook_name} →"
            button_color = "#007acc"  # Databricks blue
        
    # No URL in the environment: fall back to a disabled, red "error" button
    # instead of raising, so the remaining step cards still render
    if not notebook_url:
        notebook_url = "#"
        url_text = f"⚠️ {notebook_name} - URL not available"
        button_color = "#dc3545"  # Red for error
    
    # Create card HTML
    card_html = f"""
    <div style="
        border: 1px solid #e0e0e0;
        border-radius: 8px;
        padding: 20px;
        margin: 10px 0;
        background: #f9f9f9;
        box-shadow: 0 2px 4px rgba(0,0,0,0.1);
    ">
        <div style="display: flex; align-items: center; margin-bottom: 10px;">
            <div style="
                width: 40px;
                height: 40px;
                border-radius: 50%;
                background: {button_color};
                color: white;
                display: flex;
                align-items: center;
                justify-content: center;
                font-weight: bold;
                margin-right: 15px;
            ">{step_num}</div>
            <h3 style="margin: 0; color: #333;">{title}</h3>
        </div>
        <p style="color: #666; margin: 10px 0; line-height: 1.5;">{description}</p>
        <a href="{notebook_url}" target="_blank" style="
            display: inline-block;
            background: {button_color};
            color: white;
            padding: 8px 16px;
            border-radius: 4px;
            text-decoration: none;
            font-weight: bold;
            margin-right: 10px;
            {'cursor: not-allowed; opacity: 0.6;' if notebook_url == '#' else ''}
        ">{url_text}</a>
    </div>
    """
    
    return card_html

# Create step cards - check if running in Databricks
if mlflow.utils.databricks_utils.is_in_databricks_notebook():
    # Create step cards using the actual notebook URLs and displayHTML
    all_cards_html = "<h3>🚀 Interactive Demo Steps</h3><p>Click on any step below to open the corresponding notebook:</p>"
    
    for i, step in enumerate(steps_info, 1):
        card_html = create_step_card(
            step_num=i,
            title=step["title"],
            description=step["description"],
            notebook_name=step["notebook"],
            url_env_var=step.get("url_env_var")
        )
        all_cards_html += card_html

    displayHTML(all_cards_html)
else:
    # Use widgets for local IDE
    try:
        import ipywidgets as widgets
        from IPython.display import display
        
        # Create step cards using the actual notebook URLs
        step_cards = []
        for i, step in enumerate(steps_info, 1):
            card_html = create_step_card(
                step_num=i,
                title=step["title"],
                description=step["description"],
                notebook_name=step["notebook"],
                url_env_var=step.get("url_env_var")
            )
            step_cards.append(widgets.HTML(card_html))

        # Display all step cards
        navigation_container = widgets.VBox([
            widgets.HTML("<h3>🚀 Interactive Demo Steps</h3>"),
            widgets.HTML("<p>Click on any step below to open the corresponding notebook:</p>"),
            *step_cards
        ])

        display(navigation_container)
    except ImportError:
        # Text fallback when widgets not available
        print("⚠️ Interactive widgets not available.")
        print("📋 Demo Steps (Text Navigation):")
        print("=" * 50)
        
        for i, step in enumerate(steps_info, 1):
            print(f"\n{i}. {step['title']}")
            print(f"   {step['description']}")
            
            # Look up the notebook URL from the environment variable, if one is configured
            notebook_url = None
            if step.get('url_env_var'):
                notebook_url = os.getenv(step['url_env_var'])

            # Print the link, or a warning when no URL is available
            if notebook_url:
                print(f"   🔗 {notebook_url}")
            else:
                print(f"   ⚠️ URL not available for {step['notebook']}")
        
        print("=" * 50)
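
The URL lookup and its fallback can be isolated into one small function, which makes both outcomes (link found, link unavailable) easy to test without rendering any HTML. This is a sketch under assumptions: `resolve_notebook_link` and `NotebookLink` are hypothetical names that do not appear in the demo code, and the colors and labels mirror the ones used in the cell above.

```python
import os
from typing import Mapping, NamedTuple, Optional


class NotebookLink(NamedTuple):
    url: str         # "#" when no URL could be resolved
    label: str       # Text shown on the button
    color: str       # Button background color
    available: bool  # False when the link is a disabled placeholder


def resolve_notebook_link(
    notebook_name: str,
    url_env_var: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> NotebookLink:
    """Look up a notebook URL in `env`, falling back to a disabled link."""
    env = os.environ if env is None else env
    url = env.get(url_env_var) if url_env_var else None
    if url:
        return NotebookLink(url, f"Open {notebook_name} →", "#007acc", True)
    # Unset or empty variable: return the red "error" placeholder instead of raising
    return NotebookLink("#", f"⚠️ {notebook_name} - URL not available", "#dc3545", False)


# Example with an explicit mapping instead of the process environment
link = resolve_notebook_link(
    "1_observe_with_traces.ipynb",
    "NOTEBOOK_URL_1_observe_with_traces",
    env={},
)
print(link.label)
```

Returning a value rather than raising lets one missing URL degrade a single card while the other steps stay navigable.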

---

## Additional Resources

### 📚 Documentation
- [MLflow GenAI Documentation](https://docs.databricks.com/aws/en/mlflow3/genai/)
- [MLflow Tracing Guide](https://docs.databricks.com/aws/en/mlflow3/genai/tracing/)
- [MLflow Evaluation SDK](https://docs.databricks.com/aws/en/mlflow3/genai/evaluation/)

### 🎯 What You'll Learn
By completing this demo, you'll understand how to:
- ✅ Implement comprehensive GenAI observability with MLflow Tracing
- ✅ Create custom quality metrics using LLM judges
- ✅ Build automated evaluation pipelines for continuous quality monitoring
- ✅ Collect and utilize expert feedback for quality improvement
- ✅ Link technical metrics to business KPIs and ROI

### 🔄 Need Help?
If you encounter any issues:
1. Check that your environment variables are properly set
2. Ensure you have the necessary permissions in your Databricks workspace
3. Restart the Python kernel if needed
4. Refer to the individual notebook instructions for step-specific guidance

---

**Happy learning! 🎉**