# Model Output Analysis: Conversation Topic Classifier with Operational Enrichment

This notebook analyzes the AI classification outputs to evaluate:
- **Topic Classification Quality**: Label distribution, confidence calibration, accuracy
- **Operational Insights**: Escalation patterns, risk assessment, root causes
- **Routing Recommendations**: Action mapping, workflow assignment
- **Handler Actionability**: Summary quality, recommended actions

## Notebook Structure

1. **Setup & Data Loading** - Import libraries, load and validate data
2. **Topic Distribution Analysis** - Main drivers of contact
3. **Confidence Analysis** - Model certainty patterns
4. **Escalation Analysis** - Risk and escalation patterns
5. **Routing Analysis** - Operational actions and workflow mapping
6. **Root Cause Analysis** - Primary issue drivers
7. **Handler Actionability** - Summary and action quality
8. **Model Health Dashboard** - Summary metrics
9. **Answers to Taxonomy Goals** - How outputs support business objectives

---
## 1. Setup & Data Loading

In [None]:
"""
Import required libraries for data analysis and visualization.
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ast
import warnings

# Configuration
warnings.filterwarnings('ignore')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', None)

# Chart export settings
EXPORT_CHARTS = False  # Set to True to save charts as PNG files
CHART_DPI = 150

print("Libraries loaded successfully.")

In [None]:
"""
Load the model output dataset with new operational enrichment fields.

Expected columns:
- conversation_id, conversation: Original data
- topic, confidence, rationale: Core classification
- handler_summary: Call-handler-friendly summary
- emotion, difficulty: Customer state assessment
- operational_actions: List of recommended actions
- risk_level, escalation_required, escalation_flags: Risk assessment
- root_cause_code, root_cause_detail: Root cause analysis
"""
# Configuration - adjust paths as needed
AI_LABELS_PATH = "data/conversations_ai_classified_.csv"
MANUAL_LABELS_PATH = "data/conversations_manually_classified.csv"

# Load AI classifications
df = pd.read_csv(AI_LABELS_PATH)

print(f"Dataset loaded: {len(df):,} conversations")
print(f"\nColumns ({len(df.columns)}):")
for col in df.columns:
    print(f"  - {col}")

In [None]:
"""
Validate expected columns exist and identify optional columns.
"""
# Define expected columns by category
REQUIRED_COLUMNS = ['conversation_id', 'conversation', 'topic', 'confidence', 'rationale']
OPERATIONAL_COLUMNS = [
    'handler_summary', 'emotion', 'difficulty', 'operational_actions',
    'risk_level', 'escalation_required', 'escalation_flags',
    'root_cause_code', 'root_cause_detail'
]

# Check required columns
missing_required = [col for col in REQUIRED_COLUMNS if col not in df.columns]
if missing_required:
    raise ValueError(f"Missing required columns: {missing_required}")
print("✓ All required columns present")

# Check operational columns
present_operational = [col for col in OPERATIONAL_COLUMNS if col in df.columns]
missing_operational = [col for col in OPERATIONAL_COLUMNS if col not in df.columns]

print(f"\nOperational columns present ({len(present_operational)}/{len(OPERATIONAL_COLUMNS)}):")
for col in present_operational:
    print(f"  ✓ {col}")

if missing_operational:
    print("\n⚠️  Missing operational columns (will skip related analyses):")
    for col in missing_operational:
        print(f"  - {col}")

# Feature flags for conditional analysis
HAS_ESCALATION = 'escalation_required' in df.columns
HAS_ACTIONS = 'operational_actions' in df.columns
HAS_ROOT_CAUSE = 'root_cause_code' in df.columns
HAS_EMOTION = 'emotion' in df.columns
HAS_HANDLER_SUMMARY = 'handler_summary' in df.columns

In [None]:
"""
Parse list columns (operational_actions, escalation_flags) from string representation.
Pandas serializes lists as strings in CSV - we need to convert them back.
"""
def safe_parse_list(val):
    """Safely parse a string representation of a list."""
    if pd.isna(val):
        return []
    if isinstance(val, list):
        return val
    if isinstance(val, str):
        try:
            parsed = ast.literal_eval(val)
            return parsed if isinstance(parsed, list) else []
        except (ValueError, SyntaxError):
            return []
    return []

# Parse list columns if present
if HAS_ACTIONS:
    df['operational_actions_list'] = df['operational_actions'].apply(safe_parse_list)
    df['num_actions'] = df['operational_actions_list'].apply(len)
    print(f"Parsed operational_actions: {df['num_actions'].sum():,} total actions across {(df['num_actions'] > 0).sum():,} conversations")

if HAS_ESCALATION and 'escalation_flags' in df.columns:
    df['escalation_flags_list'] = df['escalation_flags'].apply(safe_parse_list)
    df['num_flags'] = df['escalation_flags_list'].apply(len)
    print(f"Parsed escalation_flags: {df['num_flags'].sum():,} total flags across {(df['num_flags'] > 0).sum():,} conversations")
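
The need for parsing can be seen in a small round trip. The sketch below uses a made-up two-row frame (the action names are hypothetical) and writes it to an in-memory CSV; on reload, the list column comes back as plain strings that `ast.literal_eval` can convert again.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row frame; the action names are made up for illustration.
original = pd.DataFrame({
    "conversation_id": [1, 2],
    "operational_actions": [["issue_refund", "send_label"], []],
})

# Round-trip through CSV text: the list column comes back as plain strings.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

first_raw = reloaded.loc[0, "operational_actions"]
first_parsed = ast.literal_eval(first_raw)

print(type(first_raw).__name__, repr(first_raw))
print(type(first_parsed).__name__, first_parsed)
```

`ast.literal_eval` only accepts Python literals, so a column written as JSON with `null` or `true` would likely need `json.loads` instead.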

In [None]:
"""
Check for missing values and data quality issues.
"""
# Missing values
missing = df.isnull().sum()
missing_pct = (missing / len(df) * 100).round(1)
missing_report = pd.DataFrame({'Missing': missing, 'Pct': missing_pct})
missing_report = missing_report[missing_report['Missing'] > 0]

print("Missing values per column:")
if len(missing_report) > 0:
    print(missing_report)
else:
    print("  No missing values found.")

# Check for ERROR classifications (failed API calls)
error_count = (df['topic'] == 'ERROR').sum()
if error_count > 0:
    print(f"\n⚠️  Warning: {error_count} conversations failed classification (topic='ERROR')")
else:
    print("\n✓ No classification errors found.")

---
## 2. Topic Distribution Analysis

Understanding how conversations are distributed across topic categories reveals the **main drivers of customer contact**.
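
One way to make "main drivers" concrete is to look at cumulative share: how many topics it takes to cover most of the volume. The sketch below is illustrative only and uses hypothetical topic labels rather than the real dataset.

```python
import pandas as pd

# Hypothetical topic labels, for illustration only.
topics = pd.Series(
    ["Billing", "Billing", "Billing", "Orders", "Orders", "Returns"],
    name="topic",
)

# value_counts sorts by frequency (descending), so cumsum gives a Pareto view.
share = topics.value_counts(normalize=True) * 100
summary = pd.DataFrame({
    "share_pct": share.round(1),
    "cumulative_pct": share.cumsum().round(1),
})
print(summary)
```

If the first two or three rows already reach a high cumulative percentage, those topics are reasonable candidates for the first routing rules.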

In [None]:
"""
Calculate topic distribution statistics.
"""
# Topic counts and percentages
topic_counts = df['topic'].value_counts()
topic_pcts = df['topic'].value_counts(normalize=True) * 100

topic_dist = pd.DataFrame({
    'Count': topic_counts,
    'Percentage': topic_pcts.round(1)
})

print(f"Total unique topics: {len(topic_counts)}")
print(f"\nTopic Distribution:")
topic_dist

In [None]:
"""
Visualize topic distribution as a horizontal bar chart - Main Drivers of Contact.
"""
fig, ax = plt.subplots(figsize=(12, 7))

# Sort by count for better visualization
topic_counts_sorted = topic_counts.sort_values(ascending=True)

# Create horizontal bar chart with color gradient
colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(topic_counts_sorted)))
bars = ax.barh(range(len(topic_counts_sorted)), topic_counts_sorted.values, color=colors)

# Set y-tick labels
ax.set_yticks(range(len(topic_counts_sorted)))
ax.set_yticklabels(topic_counts_sorted.index)

# Add count labels on bars
for i, (bar, count) in enumerate(zip(bars, topic_counts_sorted.values)):
    pct = count / len(df) * 100
    ax.text(bar.get_width() + 5, bar.get_y() + bar.get_height()/2, 
            f'{count} ({pct:.1f}%)', va='center', fontsize=9)

ax.set_xlabel('Number of Conversations')
ax.set_title('Main Drivers of Contact: Topic Distribution', fontsize=14, fontweight='bold')
ax.set_xlim(0, max(topic_counts_sorted.values) * 1.25)

plt.tight_layout()

if EXPORT_CHARTS:
    plt.savefig('charts/topic_distribution.png', dpi=CHART_DPI, bbox_inches='tight')
    
plt.show()

print(f"\n📊 Top 3 Contact Drivers:")
for i, (topic, count) in enumerate(topic_counts.head(3).items(), 1):
    print(f"   {i}. {topic}: {count} ({count/len(df)*100:.1f}%)")

---
## 3. Confidence Analysis

Analyzing confidence levels helps understand:
- Overall model certainty
- Which topics are harder to classify
- Potential ambiguity in the taxonomy
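
Confidence labels are most useful when they track correctness. The setup cell defines `MANUAL_LABELS_PATH`, but the schema of that file is not shown here, so the sketch below assumes a hypothetical layout (a `conversation_id` and a `topic` column) and uses inline toy data to show how agreement per confidence level could be computed.

```python
import pandas as pd

# Hypothetical data: the real manual-label file and its column names may differ.
ai = pd.DataFrame({
    "conversation_id": [1, 2, 3, 4, 5, 6],
    "topic": ["Billing", "Billing", "Orders", "Orders", "Returns", "Billing"],
    "confidence": ["high", "high", "high", "medium", "low", "low"],
})
manual = pd.DataFrame({
    "conversation_id": [1, 2, 3, 4, 5, 6],
    "topic": ["Billing", "Billing", "Orders", "Returns", "Returns", "Orders"],
})

# Join on the shared ID; suffixes keep the two topic columns apart.
merged = ai.merge(manual, on="conversation_id", suffixes=("_ai", "_manual"))
merged["agrees"] = merged["topic_ai"] == merged["topic_manual"]

# Share of AI labels matching the manual label, per confidence level.
agreement = (
    merged.groupby("confidence")["agrees"]
    .agg(agreement_rate="mean", n="count")
    .reindex(["high", "medium", "low"])
)
print(agreement)
```

A well-calibrated classifier should show agreement falling from `high` to `low`; with small `n`, the rates are too noisy to read much into.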

In [None]:
"""
Overall confidence level distribution.
"""
# Confidence counts - handle any confidence level present
confidence_order = ['high', 'medium', 'low']
conf_counts = df['confidence'].value_counts()

# Reindex to a fixed order; levels absent from the data get a count of 0
conf_counts = conf_counts.reindex(confidence_order, fill_value=0)

conf_pcts = (conf_counts / len(df) * 100).round(1)

conf_summary = pd.DataFrame({
    'Count': conf_counts,
    'Percentage': conf_pcts
})

print("Overall Confidence Distribution:")
conf_summary

In [None]:
"""
Visualize overall confidence distribution.
"""
fig, ax = plt.subplots(figsize=(8, 5))

colors = {'high': '#27ae60', 'medium': '#f39c12', 'low': '#e74c3c'}
bar_colors = [colors.get(c, 'gray') for c in conf_counts.index]
bars = ax.bar(conf_counts.index, conf_counts.values, color=bar_colors, edgecolor='white', linewidth=2)

# Add count labels on bars
for bar, count, pct in zip(bars, conf_counts.values, conf_pcts.values):
    if count > 0:
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(conf_counts)*0.02, 
                f'{count}\n({pct}%)', ha='center', fontsize=11, fontweight='bold')

ax.set_xlabel('Confidence Level')
ax.set_ylabel('Number of Conversations')
ax.set_title('Overall Confidence Distribution', fontsize=14, fontweight='bold')
ax.set_ylim(0, max(conf_counts.values) * 1.2)

plt.tight_layout()

if EXPORT_CHARTS:
    plt.savefig('charts/confidence_distribution.png', dpi=CHART_DPI, bbox_inches='tight')
    
plt.show()

In [None]:
"""
Confidence distribution broken down by topic (stacked bar chart).
"""
# Cross-tabulation of topic x confidence
conf_by_topic = pd.crosstab(df['topic'], df['confidence'])

# Ensure all confidence levels are present
for col in ['high', 'medium', 'low']:
    if col not in conf_by_topic.columns:
        conf_by_topic[col] = 0
conf_by_topic = conf_by_topic[['high', 'medium', 'low']]

# Calculate percentages
conf_by_topic_pct = conf_by_topic.div(conf_by_topic.sum(axis=1), axis=0) * 100

print("Confidence Distribution by Topic (counts):")
conf_by_topic

In [None]:
"""
Stacked bar chart showing confidence levels per topic.
"""
fig, ax = plt.subplots(figsize=(12, 7))

# Sort ascending by high-confidence rate so the most confident topics appear at the top of the chart
topic_order = conf_by_topic_pct['high'].sort_values(ascending=True).index
conf_by_topic_sorted = conf_by_topic_pct.reindex(topic_order)

# Create stacked horizontal bar chart
colors = {'high': '#27ae60', 'medium': '#f39c12', 'low': '#e74c3c'}

y_pos = range(len(conf_by_topic_sorted))
left = np.zeros(len(conf_by_topic_sorted))

for conf_level in ['high', 'medium', 'low']:
    values = conf_by_topic_sorted[conf_level].values
    ax.barh(y_pos, values, left=left, label=conf_level.capitalize(), color=colors[conf_level])
    left += values

ax.set_yticks(y_pos)
ax.set_yticklabels(conf_by_topic_sorted.index)
ax.set_xlabel('Percentage')
ax.set_title('Confidence Distribution by Topic', fontsize=14, fontweight='bold')
ax.legend(loc='lower right')
ax.set_xlim(0, 100)

# Add percentage labels for high confidence
for i, topic in enumerate(conf_by_topic_sorted.index):
    high_pct = conf_by_topic_sorted.loc[topic, 'high']
    if high_pct > 15:
        ax.text(high_pct/2, i, f'{high_pct:.0f}%', ha='center', va='center', 
                fontsize=8, color='white', fontweight='bold')

plt.tight_layout()

if EXPORT_CHARTS:
    plt.savefig('charts/confidence_by_topic.png', dpi=CHART_DPI, bbox_inches='tight')
    
plt.show()

In [None]:
"""
Topics with lowest confidence - candidates for taxonomy refinement.
"""
# Calculate low+medium confidence rate
low_med_rate = (conf_by_topic_pct['low'] + conf_by_topic_pct['medium']).sort_values(ascending=False)

low_conf_df = pd.DataFrame({
    'Topic': low_med_rate.index,
    'Low+Med Rate (%)': low_med_rate.values.round(1),
    'High (%)': conf_by_topic_pct['high'].reindex(low_med_rate.index).values.round(1),
    'Total': conf_by_topic.sum(axis=1).reindex(low_med_rate.index).values
}).reset_index(drop=True)

print("⚠️  Topics by Uncertainty (candidates for taxonomy refinement):")
low_conf_df

---
## 4. Escalation Analysis

Analyze escalation patterns to understand:
- Which conversations need human intervention
- Risk level distribution across topics
- Common escalation triggers
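
The escalation cells sum and average `escalation_required`, which works when pandas reads the column as boolean. If the CSV mixes spellings or has missing values, the column may load as strings instead. The helper below is a hypothetical normalizer (`to_bool` is not part of the original notebook) sketched under the assumption that missing or unrecognized values should count as "no escalation".

```python
import pandas as pd

def to_bool(value):
    """Map common CSV spellings of a flag to True/False; unknown or missing -> False."""
    if isinstance(value, bool):
        return value
    if pd.isna(value):
        return False
    return str(value).strip().lower() in {"true", "1", "yes", "y"}

# Hypothetical raw values as they might appear after read_csv.
raw = pd.Series([True, "False", "true", None, "yes", 0], name="escalation_required")
clean = raw.map(to_bool)

print(clean.tolist())
print(f"Escalation rate: {clean.mean() * 100:.1f}%")
```

Whether missing values should default to `False` is a business decision; treating them as escalations would be the more cautious choice.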

In [None]:
"""
Overall escalation statistics.
"""
if HAS_ESCALATION:
    # Escalation rate
    esc_count = df['escalation_required'].sum()
    esc_rate = esc_count / len(df) * 100
    
    print("Escalation Overview")
    print("=" * 50)
    print(f"Conversations requiring escalation: {esc_count:,} ({esc_rate:.1f}%)")
    print(f"Conversations not requiring escalation: {len(df) - esc_count:,} ({100-esc_rate:.1f}%)")
    
    # Risk level distribution
    if 'risk_level' in df.columns:
        print(f"\nRisk Level Distribution:")
        risk_counts = df['risk_level'].value_counts()
        for level, count in risk_counts.items():
            print(f"  {level}: {count} ({count/len(df)*100:.1f}%)")
else:
    print("⚠️  Escalation data not available - skipping escalation analysis.")

In [None]:
"""
Breakdown of escalation flags - what triggers escalation.
"""
if HAS_ESCALATION and 'escalation_flags_list' in df.columns:
    # Flatten all flags and count occurrences
    all_flags = []
    for flags in df['escalation_flags_list']:
        all_flags.extend(flags)
    
    if all_flags:
        flag_counts = pd.Series(all_flags).value_counts()
        
        fig, ax = plt.subplots(figsize=(10, 6))
        
        colors = plt.cm.Reds(np.linspace(0.4, 0.9, len(flag_counts)))
        bars = ax.barh(range(len(flag_counts)), flag_counts.values, color=colors[::-1])
        
        ax.set_yticks(range(len(flag_counts)))
        ax.set_yticklabels(flag_counts.index)
        ax.set_xlabel('Count')
        ax.set_title('Escalation Flags: What Triggers Escalation', fontsize=14, fontweight='bold')
        
        # Add count labels
        for bar, count in zip(bars, flag_counts.values):
            ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2, 
                    f'{count}', va='center', fontsize=9)
        
        ax.set_xlim(0, max(flag_counts.values) * 1.15)
        plt.tight_layout()
        
        if EXPORT_CHARTS:
            plt.savefig('charts/escalation_flags.png', dpi=CHART_DPI, bbox_inches='tight')
        
        plt.show()
        
        print(f"\nTop Escalation Triggers:")
        for flag, count in flag_counts.head(5).items():
            print(f"  • {flag}: {count} occurrences")
    else:
        print("No escalation flags found in the dataset.")
else:
    print("⚠️  Escalation flags not available.")

In [None]:
"""
Escalation rate by topic - which topics require most human intervention.
"""
if HAS_ESCALATION:
    # Calculate escalation rate per topic
    esc_by_topic = df.groupby('topic')['escalation_required'].agg(['sum', 'count'])
    esc_by_topic['rate'] = (esc_by_topic['sum'] / esc_by_topic['count'] * 100).round(1)
    esc_by_topic = esc_by_topic.sort_values('rate', ascending=True)
    esc_by_topic.columns = ['Escalations', 'Total', 'Rate (%)']
    
    fig, ax = plt.subplots(figsize=(12, 7))
    
    # Color bars by escalation rate
    colors = plt.cm.RdYlGn_r(esc_by_topic['Rate (%)'].values / 100)
    bars = ax.barh(range(len(esc_by_topic)), esc_by_topic['Rate (%)'].values, color=colors)
    
    ax.set_yticks(range(len(esc_by_topic)))
    ax.set_yticklabels(esc_by_topic.index)
    ax.set_xlabel('Escalation Rate (%)')
    ax.set_title('Escalation Rate by Topic', fontsize=14, fontweight='bold')
    
    # Add rate labels
    for i, (bar, rate) in enumerate(zip(bars, esc_by_topic['Rate (%)'].values)):
        count = esc_by_topic.iloc[i]['Escalations']
        ax.text(bar.get_width() + 0.5, bar.get_y() + bar.get_height()/2, 
                f'{rate:.0f}% (n={int(count)})', va='center', fontsize=9)
    
    ax.set_xlim(0, max(esc_by_topic['Rate (%)'].values) * 1.3 if max(esc_by_topic['Rate (%)'].values) > 0 else 10)
    plt.tight_layout()
    
    if EXPORT_CHARTS:
        plt.savefig('charts/escalation_by_topic.png', dpi=CHART_DPI, bbox_inches='tight')
    
    plt.show()
    
    print("\nEscalation Rate by Topic:")
    print(esc_by_topic.sort_values('Rate (%)', ascending=False))

In [None]:
"""
Risk level distribution by topic.
"""
if 'risk_level' in df.columns:
    # Cross-tabulation of topic x risk_level
    risk_order = ['none', 'low', 'medium', 'high']
    risk_by_topic = pd.crosstab(df['topic'], df['risk_level'])
    
    # Ensure all risk levels present
    for level in risk_order:
        if level not in risk_by_topic.columns:
            risk_by_topic[level] = 0
    risk_by_topic = risk_by_topic[[col for col in risk_order if col in risk_by_topic.columns]]
    
    # Calculate percentages
    risk_by_topic_pct = risk_by_topic.div(risk_by_topic.sum(axis=1), axis=0) * 100
    
    # Sort by high+medium risk rate
    if 'high' in risk_by_topic_pct.columns and 'medium' in risk_by_topic_pct.columns:
        sort_key = risk_by_topic_pct['high'] + risk_by_topic_pct['medium']
    elif 'high' in risk_by_topic_pct.columns:
        sort_key = risk_by_topic_pct['high']
    else:
        sort_key = risk_by_topic_pct.iloc[:, -1]
    
    risk_by_topic_sorted = risk_by_topic_pct.loc[sort_key.sort_values(ascending=True).index]
    
    fig, ax = plt.subplots(figsize=(12, 7))
    
    risk_colors = {'none': '#95a5a6', 'low': '#27ae60', 'medium': '#f39c12', 'high': '#e74c3c'}
    
    y_pos = range(len(risk_by_topic_sorted))
    left = np.zeros(len(risk_by_topic_sorted))
    
    for level in risk_order:
        if level in risk_by_topic_sorted.columns:
            values = risk_by_topic_sorted[level].values
            ax.barh(y_pos, values, left=left, label=level.capitalize(), color=risk_colors[level])
            left += values
    
    ax.set_yticks(y_pos)
    ax.set_yticklabels(risk_by_topic_sorted.index)
    ax.set_xlabel('Percentage')
    ax.set_title('Risk Level Distribution by Topic', fontsize=14, fontweight='bold')
    ax.legend(loc='lower right')
    ax.set_xlim(0, 100)
    
    plt.tight_layout()
    
    if EXPORT_CHARTS:
        plt.savefig('charts/risk_by_topic.png', dpi=CHART_DPI, bbox_inches='tight')
    
    plt.show()
else:
    print("⚠️  Risk level data not available.")

---
## 5. Routing Analysis

Analyze operational actions to understand:
- What actions are most commonly recommended
- How actions map to topics
- Proposed routing rules for automation
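
A routing rule can be as simple as a lookup with an override. The sketch below is one possible shape, not the notebook's method: the topic names, the workflow names, and the choice to let escalation take priority over the topic rule are all assumptions made for illustration.

```python
# Hypothetical rule set: the topic names and workflow names are placeholders.
ROUTING_RULES = {
    "Billing": "Billing Specialist",
    "Orders": "Order Management Bot",
}
ESCALATION_WORKFLOW = "Senior Agent"
DEFAULT_WORKFLOW = "Unassigned"

def route(topic, escalation_required=False):
    """Pick a workflow: escalations win, then the topic rule, then the default."""
    if escalation_required:
        return ESCALATION_WORKFLOW
    return ROUTING_RULES.get(topic, DEFAULT_WORKFLOW)

print(route("Billing"))
print(route("Billing", escalation_required=True))
print(route("Warranty"))
```

Keeping an explicit default makes unmapped topics visible, so gaps in the rule table show up as a measurable coverage figure.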

In [None]:
"""
Overall operational actions frequency.
"""
if HAS_ACTIONS and 'operational_actions_list' in df.columns:
    # Flatten all actions
    all_actions = []
    for actions in df['operational_actions_list']:
        all_actions.extend(actions)
    
    if all_actions:
        action_counts = pd.Series(all_actions).value_counts()
        
        fig, ax = plt.subplots(figsize=(12, 8))
        
        colors = plt.cm.Greens(np.linspace(0.3, 0.9, len(action_counts)))
        bars = ax.barh(range(len(action_counts)), action_counts.values, color=colors[::-1])
        
        ax.set_yticks(range(len(action_counts)))
        ax.set_yticklabels(action_counts.index)
        ax.set_xlabel('Count')
        ax.set_title('Operational Actions: Overall Frequency', fontsize=14, fontweight='bold')
        
        # Add count labels
        for bar, count in zip(bars, action_counts.values):
            pct = count / len(df) * 100
            ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height()/2, 
                    f'{count} ({pct:.1f}%)', va='center', fontsize=8)
        
        ax.set_xlim(0, max(action_counts.values) * 1.2)
        plt.tight_layout()
        
        if EXPORT_CHARTS:
            plt.savefig('charts/actions_overall.png', dpi=CHART_DPI, bbox_inches='tight')
        
        plt.show()
        
        print(f"\nTotal actions recommended: {len(all_actions):,}")
        print(f"Unique action types: {len(action_counts)}")
        print(f"Average actions per conversation: {len(all_actions)/len(df):.2f}")
    else:
        print("No operational actions found in the dataset.")
else:
    print("⚠️  Operational actions not available.")

In [None]:
"""
Top operational actions by topic.
"""
if HAS_ACTIONS and 'operational_actions_list' in df.columns:
    # Build topic -> action mapping
    topic_actions = {}
    for _, row in df.iterrows():
        topic = row['topic']
        actions = row['operational_actions_list']
        if topic not in topic_actions:
            topic_actions[topic] = []
        topic_actions[topic].extend(actions)
    
    # Get top 3 actions per topic
    print("Top 3 Operational Actions by Topic:")
    print("=" * 70)
    
    summary_data = []
    for topic in topic_counts.index:
        if topic in topic_actions and topic_actions[topic]:
            actions = pd.Series(topic_actions[topic]).value_counts().head(3)
            top_actions = ', '.join([f"{a} ({c})" for a, c in actions.items()])
            summary_data.append({'Topic': topic, 'Top Actions': top_actions})
            print(f"\n{topic}:")
            for action, count in actions.items():
                print(f"  • {action}: {count}")
    
    actions_summary_df = pd.DataFrame(summary_data)

In [None]:
"""
Proposed routing table: Topic → Workflow/Agent mapping.
"""
# Define routing rules based on topic patterns
# This is an example - adjust based on actual business workflows

routing_rules = {
    'Account Access & Customer Profile': 'Auth Support Bot',
    'Orders, Shipping & Delivery': 'Order Management Bot',
    'Returns, Refunds & Exchanges': 'Returns Specialist',
    'Product Defects & Fulfillment Errors': 'Quality Team',
    'Billing, Charges & Price Discrepancies': 'Billing Specialist',
    'Technical & Platform Issues': 'Tech Support Bot',
    'Product Information & Availability': 'Product Info Bot',
    'Promotions, Discounts & Loyalty': 'Loyalty Team',
    'Complaints, Escalations & Negative Feedback': 'Senior Agent',
    'General Enquiries & Multi-Intent': 'General Support Bot'
}

# Build routing coverage table
routing_data = []
for topic in topic_counts.index:
    count = topic_counts[topic]
    pct = count / len(df) * 100
    workflow = routing_rules.get(topic, 'Unassigned')
    routing_data.append({
        'Topic': topic,
        'Count': count,
        'Pct': f"{pct:.1f}%",
        'Suggested Workflow': workflow
    })

routing_df = pd.DataFrame(routing_data)

print("Proposed Routing Table: Topic → Workflow/Agent")
print("=" * 80)
print(routing_df.to_string(index=False))

# Coverage analysis
unassigned = routing_df[routing_df['Suggested Workflow'] == 'Unassigned']['Count'].sum()
coverage = (len(df) - unassigned) / len(df) * 100

print(f"\n📊 Routing Coverage: {coverage:.1f}% of conversations have assigned workflows")

---
## 6. Root Cause Analysis

Understand the underlying reasons for customer contacts:
- Most common root causes overall
- Root cause distribution by topic
- Patterns for proactive issue prevention
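
Raw counts in a topic-by-cause table are dominated by the largest topics, so within-topic percentages are often easier to compare. As a minimal sketch with hypothetical topics and root cause codes, `pd.crosstab` can produce these directly via `normalize="index"`.

```python
import pandas as pd

# Hypothetical topics and root cause codes, for illustration only.
sample = pd.DataFrame({
    "topic": ["Billing", "Billing", "Billing", "Billing", "Orders", "Orders"],
    "root_cause_code": [
        "PRICING_ERROR", "PRICING_ERROR", "PRICING_ERROR", "SYSTEM_BUG",
        "CARRIER_DELAY", "SYSTEM_BUG",
    ],
})

# normalize="index" divides each row by its own total, giving shares within a topic.
within_topic_pct = pd.crosstab(
    sample["topic"], sample["root_cause_code"], normalize="index"
) * 100
print(within_topic_pct.round(1))
```

Note that this normalizes over every root cause code, so each row sums to 100%; normalizing over a subset of columns gives shares within that subset instead.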

In [None]:
"""
Overall root cause code distribution.
"""
if HAS_ROOT_CAUSE:
    root_cause_counts = df['root_cause_code'].value_counts()
    
    fig, ax = plt.subplots(figsize=(12, 8))
    
    colors = plt.cm.Oranges(np.linspace(0.3, 0.9, len(root_cause_counts)))
    bars = ax.barh(range(len(root_cause_counts)), root_cause_counts.values, color=colors[::-1])
    
    ax.set_yticks(range(len(root_cause_counts)))
    ax.set_yticklabels(root_cause_counts.index)
    ax.set_xlabel('Count')
    ax.set_title('Root Cause Codes: Overall Distribution', fontsize=14, fontweight='bold')
    
    # Add count labels
    for bar, count in zip(bars, root_cause_counts.values):
        pct = count / len(df) * 100
        ax.text(bar.get_width() + 1, bar.get_y() + bar.get_height()/2, 
                f'{count} ({pct:.1f}%)', va='center', fontsize=9)
    
    ax.set_xlim(0, max(root_cause_counts.values) * 1.2)
    plt.tight_layout()
    
    if EXPORT_CHARTS:
        plt.savefig('charts/root_cause_overall.png', dpi=CHART_DPI, bbox_inches='tight')
    
    plt.show()
    
    print(f"\nTop 5 Root Causes:")
    for cause, count in root_cause_counts.head(5).items():
        print(f"  {cause}: {count} ({count/len(df)*100:.1f}%)")
else:
    print("⚠️  Root cause data not available.")

In [None]:
"""
Root cause distribution by topic (cross-tab heatmap-style table).
"""
if HAS_ROOT_CAUSE:
    # Cross-tabulation of topic x root_cause_code
    root_cause_by_topic = pd.crosstab(df['topic'], df['root_cause_code'])
    
    # Show as styled table (numeric values)
    print("Topic vs Root Cause Cross-tabulation:")
    print("=" * 80)
    
    # Get the 8 most common root causes for display
    top_causes = root_cause_counts.head(8).index.tolist()
    display_crosstab = root_cause_by_topic[top_causes] if all(c in root_cause_by_topic.columns for c in top_causes) else root_cause_by_topic.iloc[:, :8]
    
    print(display_crosstab)
    
    # Heatmap visualization
    fig, ax = plt.subplots(figsize=(14, 8))
    
    # Normalize by row (topic) to show percentages
    root_cause_pct = display_crosstab.div(display_crosstab.sum(axis=1), axis=0) * 100
    
    # Create heatmap using imshow
    im = ax.imshow(root_cause_pct.values, cmap='YlOrRd', aspect='auto')
    
    # Set tick labels
    ax.set_xticks(range(len(root_cause_pct.columns)))
    ax.set_xticklabels(root_cause_pct.columns, rotation=45, ha='right', fontsize=8)
    ax.set_yticks(range(len(root_cause_pct.index)))
    ax.set_yticklabels(root_cause_pct.index, fontsize=9)
    
    # Add percentage text
    for i in range(len(root_cause_pct.index)):
        for j in range(len(root_cause_pct.columns)):
            val = root_cause_pct.iloc[i, j]
            if val > 5:  # Only show if > 5%
                text_color = 'white' if val > 30 else 'black'
                ax.text(j, i, f'{val:.0f}%', ha='center', va='center', color=text_color, fontsize=7)
    
    ax.set_title('Root Cause Distribution by Topic (% within topic)', fontsize=14, fontweight='bold')
    plt.colorbar(im, ax=ax, label='Percentage')
    plt.tight_layout()
    
    if EXPORT_CHARTS:
        plt.savefig('charts/root_cause_heatmap.png', dpi=CHART_DPI, bbox_inches='tight')
    
    plt.show()

---
## 7. Handler Actionability

Evaluate how actionable the enriched outputs are for call handlers:
- Summary quality and length
- Action recommendations coverage
- Example records for review
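
Summary length is the easiest of these checks to automate. The sketch below uses hypothetical summaries and the 35-word target applied later in this section; it counts whitespace-separated words and treats a missing summary as zero words.

```python
import pandas as pd

MAX_WORDS = 35  # the word-count target used in this section

# Hypothetical summaries, for illustration only.
summaries = pd.Series([
    "Customer charged twice for one order; refund the duplicate payment.",
    None,
    "word " * 40,
])

# Missing summaries become empty strings, which split into zero words.
word_counts = summaries.fillna("").str.split().str.len()
over_limit = word_counts > MAX_WORDS

print(word_counts.tolist())
print(f"Over limit: {over_limit.sum()} of {len(summaries)}")
```

Zero-word summaries are worth reporting separately, since an empty summary is a different failure from an overlong one.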

In [None]:
"""
Handler summary quality metrics.
"""
if HAS_HANDLER_SUMMARY:
    df['summary_words'] = df['handler_summary'].fillna('').str.split().str.len()
    df['summary_chars'] = df['handler_summary'].fillna('').str.len()
    
    print("Handler Summary Statistics:")
    print("=" * 50)
    print(f"Average length: {df['summary_words'].mean():.1f} words ({df['summary_chars'].mean():.0f} chars)")
    print(f"Min/Max words: {df['summary_words'].min()} / {df['summary_words'].max()}")
    print("Target: ≤35 words")
    
    over_limit = (df['summary_words'] > 35).sum()
    print(f"\nSummaries over 35 words: {over_limit} ({over_limit/len(df)*100:.1f}%)")
    
    # Distribution histogram
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.hist(df['summary_words'], bins=20, color='steelblue', edgecolor='white', alpha=0.7)
    ax.axvline(35, color='red', linestyle='--', linewidth=2, label='Target max (35 words)')
    ax.axvline(df['summary_words'].mean(), color='orange', linestyle='--', linewidth=2, label=f'Mean ({df["summary_words"].mean():.0f})')
    ax.set_xlabel('Word Count')
    ax.set_ylabel('Frequency')
    ax.set_title('Handler Summary Length Distribution', fontsize=14, fontweight='bold')
    ax.legend()
    plt.tight_layout()
    plt.show()
else:
    print("⚠️  Handler summary not available.")

In [None]:
"""
Customer emotion distribution.
"""
if HAS_EMOTION:
    emotion_counts = df['emotion'].value_counts()
    
    # Color map for emotions
    emotion_colors = {
        'calm': '#27ae60',
        'confused': '#3498db', 
        'frustrated': '#f39c12',
        'angry': '#e74c3c',
        'anxious': '#9b59b6',
        'urgent': '#c0392b'
    }
    
    fig, ax = plt.subplots(figsize=(10, 6))
    
    colors = [emotion_colors.get(e, '#95a5a6') for e in emotion_counts.index]
    bars = ax.bar(emotion_counts.index, emotion_counts.values, color=colors, edgecolor='white', linewidth=2)
    
    # Add count labels
    for bar, count in zip(bars, emotion_counts.values):
        pct = count / len(df) * 100
        ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(emotion_counts)*0.02,
                f'{count}\n({pct:.1f}%)', ha='center', fontsize=9)
    
    ax.set_xlabel('Customer Emotion')
    ax.set_ylabel('Count')
    ax.set_title('Customer Emotion Distribution', fontsize=14, fontweight='bold')
    ax.set_ylim(0, max(emotion_counts.values) * 1.2)
    plt.xticks(rotation=0)
    plt.tight_layout()
    
    if EXPORT_CHARTS:
        plt.savefig('charts/emotion_distribution.png', dpi=CHART_DPI, bbox_inches='tight')
    
    plt.show()
else:
    print("⚠️  Emotion data not available.")

In [None]:
"""
Example records showing actionable enrichment for handlers.
"""
# Select columns to display
display_cols = ['conversation_id', 'topic']

if HAS_HANDLER_SUMMARY:
    display_cols.append('handler_summary')
if HAS_ACTIONS:
    display_cols.append('operational_actions')
if 'risk_level' in df.columns:
    display_cols.append('risk_level')
if HAS_ESCALATION:
    display_cols.append('escalation_required')
if 'escalation_flags' in df.columns:
    display_cols.append('escalation_flags')

# Sample diverse examples
print("Sample Enriched Records for Handler Review:")
print("=" * 100)

# Get one example per topic (up to 5)
sample_df = df.groupby('topic').head(1).head(5)[display_cols]

for _, row in sample_df.iterrows():
    print(f"\n{'='*80}")
    print(f"Conversation ID: {row['conversation_id']}")
    print(f"Topic: {row['topic']}")
    if HAS_HANDLER_SUMMARY:
        print(f"Summary: {row['handler_summary']}")
    if HAS_ACTIONS:
        print(f"Actions: {row['operational_actions']}")
    if 'risk_level' in df.columns:
        print(f"Risk: {row['risk_level']}")
    if HAS_ESCALATION:
        esc = "Yes" if row['escalation_required'] else "No"
        print(f"Escalation: {esc}")
    # Skip missing (NaN) and empty flag lists; NaN is truthy, so a plain truth test would print it
    flags = row.get('escalation_flags')
    if isinstance(flags, str) and flags not in ('', '[]'):
        print(f"Flags: {flags}")

In [None]:
"""
Examples of conversations requiring escalation.
"""
if HAS_ESCALATION:
    escalated = df[df['escalation_required'] == True]
    
    if len(escalated) > 0:
        print(f"Sample Escalated Conversations ({len(escalated)} total):")
        print("=" * 100)
        
        sample_escalated = escalated.head(3)
        for _, row in sample_escalated.iterrows():
            print(f"\n{'─'*80}")
            print(f"ID: {row['conversation_id']} | Topic: {row['topic']}")
            if 'risk_level' in df.columns:
                print(f"Risk Level: {row['risk_level']}")
            if 'escalation_flags' in df.columns:
                print(f"Flags: {row['escalation_flags']}")
            if HAS_HANDLER_SUMMARY:
                print(f"Summary: {row['handler_summary']}")
    else:
        print("No escalated conversations found.")

---
## 8. Model Health Dashboard

In [None]:
"""
Summary dashboard of key model health and operational metrics.
"""
def get_status(value, good, warning, higher_is_better=True):
    """Return status emoji based on thresholds."""
    if higher_is_better:
        return '✅' if value >= good else ('⚠️' if value >= warning else '❌')
    else:
        return '✅' if value <= good else ('⚠️' if value <= warning else '❌')

# Calculate metrics
high_conf_pct = (df['confidence'] == 'high').mean() * 100
low_conf_pct = (df['confidence'] == 'low').mean() * 100
error_pct = (df['topic'] == 'ERROR').mean() * 100
general_pct = df['topic'].str.contains('General|Multi', case=False, na=False).mean() * 100

dashboard_data = [
    {'Metric': 'Total Conversations', 'Value': f"{len(df):,}", 'Status': '✅', 'Notes': 'Dataset size'},
    {'Metric': 'Unique Topics', 'Value': f"{df['topic'].nunique()}", 'Status': '✅', 'Notes': 'Classification labels'},
    {'Metric': 'High Confidence Rate', 'Value': f"{high_conf_pct:.1f}%", 'Status': get_status(high_conf_pct, 70, 50), 'Notes': 'Target: >70%'},
    {'Metric': 'Low Confidence Rate', 'Value': f"{low_conf_pct:.1f}%", 'Status': get_status(low_conf_pct, 10, 20, False), 'Notes': 'Target: <10%'},
    {'Metric': 'Error Rate', 'Value': f"{error_pct:.1f}%", 'Status': get_status(error_pct, 1, 5, False), 'Notes': 'API failures'},
    {'Metric': 'Catch-All Rate', 'Value': f"{general_pct:.1f}%", 'Status': get_status(general_pct, 15, 25, False), 'Notes': 'Target: <15%'},
]

# Add operational metrics if available
if HAS_ESCALATION:
    esc_rate = df['escalation_required'].mean() * 100
    dashboard_data.append({'Metric': 'Escalation Rate', 'Value': f"{esc_rate:.1f}%", 'Status': '📊', 'Notes': 'Requires human review'})

if 'risk_level' in df.columns:
    high_risk_pct = (df['risk_level'] == 'high').mean() * 100
    dashboard_data.append({'Metric': 'High Risk Rate', 'Value': f"{high_risk_pct:.1f}%", 'Status': '📊', 'Notes': 'High-risk conversations'})

if HAS_ACTIONS:
    action_coverage = (df['num_actions'] > 0).mean() * 100
    dashboard_data.append({'Metric': 'Action Coverage', 'Value': f"{action_coverage:.1f}%", 'Status': get_status(action_coverage, 80, 60), 'Notes': 'Has recommended actions'})

dashboard = pd.DataFrame(dashboard_data)

print("\n" + "=" * 70)
print("📊 MODEL HEALTH DASHBOARD")
print("=" * 70 + "\n")
print(dashboard.to_string(index=False))
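
The threshold comparisons in `get_status` are inclusive, which is easy to misread next to notes such as `Target: >70%`. The sketch below mirrors the helper with plain-text labels so the boundary behaviour can be checked in isolation. The name `status_label` and the `ok`/`warn`/`fail` labels are assumptions made for this illustration and are not part of the notebook.

```python
def status_label(value, good, warning, higher_is_better=True):
    """Plain-text mirror of get_status (labels are illustrative only)."""
    if higher_is_better:
        return 'ok' if value >= good else ('warn' if value >= warning else 'fail')
    return 'ok' if value <= good else ('warn' if value <= warning else 'fail')

# Values exactly on a threshold fall into the better bucket.
for value in (70.0, 69.9, 50.0, 49.9):
    print(f"high-confidence rate {value:>5.1f}% -> {status_label(value, 70, 50)}")

# For lower-is-better metrics the comparison direction flips.
for value in (10.0, 10.1, 20.0, 20.1):
    label = status_label(value, 10, 20, higher_is_better=False)
    print(f"low-confidence rate  {value:>5.1f}% -> {label}")
```

A rate of exactly 70% is therefore reported as healthy, and a low-confidence rate of exactly 20% is still a warning rather than a failure.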

---
## 9. Answers to Taxonomy Goals

This section demonstrates how the classifier outputs support key business objectives:

1. **Summarize main drivers of contact** - Understand why customers reach out
2. **Escalate topics to the ops team** - Flag high-risk conversations for human review
3. **Route to specialized AI workflows** - Direct conversations to appropriate handlers
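
As a rough illustration of how these three goals could interact for a single conversation, the sketch below applies one possible precedence: escalation first, then a low-confidence check, then the topic-to-workflow map. The function name `choose_destination`, the queue names `Ops Team` and `Human Review`, the example topic, and the precedence order itself are all assumptions for illustration. Only the `Unassigned` fallback mirrors the notebook's own default.

```python
def choose_destination(row, routing_rules, default='Unassigned'):
    """Pick a destination for one classified conversation (illustrative precedence)."""
    # 1. Escalations go to people first, whatever the topic.
    if row.get('escalation_required'):
        return 'Ops Team'        # hypothetical queue name
    # 2. Low-confidence labels are not trusted for automated routing.
    if row.get('confidence') == 'low':
        return 'Human Review'    # hypothetical queue name
    # 3. Otherwise fall back to the topic-to-workflow map.
    return routing_rules.get(row.get('topic'), default)

# Hypothetical mapping and rows, not taken from the dataset.
example_rules = {'Order Status': 'Order Management Bot'}
examples = [
    {'topic': 'Order Status', 'confidence': 'high', 'escalation_required': False},
    {'topic': 'Order Status', 'confidence': 'high', 'escalation_required': True},
    {'topic': 'Order Status', 'confidence': 'low', 'escalation_required': False},
    {'topic': 'Something Else', 'confidence': 'high', 'escalation_required': False},
]
for row in examples:
    print(row, '->', choose_destination(row, example_rules))
```

Because the function only uses `.get`, it accepts either a plain dict or a row from `df.iterrows()`.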

In [None]:
"""
GOAL 1: Summarize the main drivers of contact

The topic distribution provides a clear picture of why customers contact support.
Combined with root cause analysis, we can identify systemic issues.
"""
print("="*80)
print("TAXONOMY GOAL 1: Summarize Main Drivers of Contact")
print("="*80)

print("\n📊 TOP CONTACT DRIVERS (by Topic):")
print("-" * 60)
for i, (topic, count) in enumerate(topic_counts.head(5).items(), 1):
    pct = count / len(df) * 100
    bar = '█' * int(pct/2)
    print(f"{i}. {topic}")
    print(f"   {bar} {count} ({pct:.1f}%)")

if HAS_ROOT_CAUSE:
    print("\n🔍 TOP ROOT CAUSES:")
    print("-" * 60)
    for cause, count in root_cause_counts.head(5).items():
        pct = count / len(df) * 100
        print(f"   • {cause}: {count} ({pct:.1f}%)")

print("\n💡 INSIGHT: These topic and root cause distributions enable:")
print("   - Weekly trend reporting on contact drivers")
print("   - Identification of systemic issues for proactive fixes")
print("   - Resource allocation based on topic volume")

In [None]:
"""
GOAL 2: Escalate some topics to the ops team (e.g., Fraud)

The escalation_required flag and escalation_flags enable automatic routing
of high-risk conversations to human agents.
"""
print("="*80)
print("TAXONOMY GOAL 2: Escalate High-Risk Topics to Ops Team")
print("="*80)

if HAS_ESCALATION:
    esc_total = df['escalation_required'].sum()
    esc_rate = esc_total / len(df) * 100
    
    print(f"\n🚨 ESCALATION SUMMARY:")
    print(f"   Total escalations: {esc_total} out of {len(df)} ({esc_rate:.1f}%)")
    
    if 'escalation_flags_list' in df.columns:
        all_flags = [flag for flags in df['escalation_flags_list'] for flag in flags]
        if all_flags:
            flag_counts = pd.Series(all_flags).value_counts()
            print("\n⚠️  ESCALATION TRIGGERS:")
            for flag, count in flag_counts.items():
                print(f"   • {flag}: {count}")
    
    # Topics with highest escalation rates
    print("\n📈 TOPICS REQUIRING MOST ESCALATION:")
    esc_by_topic = df.groupby('topic')['escalation_required'].agg(['sum', 'mean'])
    esc_by_topic['rate'] = (esc_by_topic['mean'] * 100).round(1)
    esc_by_topic = esc_by_topic.sort_values('rate', ascending=False)
    
    for topic in esc_by_topic.head(3).index:
        rate = esc_by_topic.loc[topic, 'rate']
        count = int(esc_by_topic.loc[topic, 'sum'])
        print(f"   • {topic}: {rate}% escalation rate ({count} cases)")
    
    print("\n💡 INSIGHT: Escalation flags enable:")
    print("   - Automatic routing of fraud/abuse cases to specialized teams")
    print("   - Priority queuing for high-risk conversations")
    print("   - Real-time alerting for critical issues")
else:
    print("\n⚠️  Escalation data not available in this dataset.")
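
Ranking topics purely by escalation rate can let a topic with one or two conversations top the list at 100%. One possible guard, sketched below on toy data, is to require a minimum number of conversations per topic before ranking. The `MIN_CONVERSATIONS` threshold, the toy topics, and the column aliases `total`, `escalated`, and `rate` are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

MIN_CONVERSATIONS = 3  # hypothetical minimum support; tune to the dataset size

# Toy data for illustration only (not taken from the real dataset).
toy = pd.DataFrame({
    'topic': ['Fraud', 'Billing', 'Billing', 'Billing', 'Billing',
              'Login', 'Login', 'Login'],
    'escalation_required': [True, True, False, False, False,
                            True, True, False],
})

# Per-topic volume, escalation count, and escalation rate in one pass.
stats = toy.groupby('topic')['escalation_required'].agg(
    total='count', escalated='sum', rate='mean'
)
stats['rate'] = (stats['rate'] * 100).round(1)

# Keep only topics with enough volume, then rank by rate.
ranked = stats[stats['total'] >= MIN_CONVERSATIONS].sort_values('rate', ascending=False)
print(ranked)
```

In the toy data the single `Fraud` conversation is excluded from the ranking even though its rate is 100%. Whether that is desirable depends on the use case, since rare but critical topics may deserve a separate view.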

In [None]:
"""
GOAL 3: Route conversations to different specialized AI workflows/agents

Topic classification combined with operational_actions enables intelligent
routing to specialized bots or human agents.
"""
print("="*80)
print("TAXONOMY GOAL 3: Route to Specialized AI Workflows/Agents")
print("="*80)

print("\n🤖 PROPOSED ROUTING RULES:")
print("-" * 70)
print(f"{'Topic':<45} {'Workflow':<25} {'Share'}")
print("-" * 70)

for topic in topic_counts.index:
    workflow = routing_rules.get(topic, 'Unassigned')
    count = topic_counts[topic]
    pct = count / len(df) * 100
    print(f"{topic:<45} {workflow:<25} ({pct:.1f}%)")

if HAS_ACTIONS:
    print("\n🔧 ACTION-BASED ROUTING INSIGHTS:")
    # Count conversations that carry at least one action from each workflow's action set

    def count_conversations_with(action_set):
        """Number of conversations whose action list contains any action in action_set."""
        return sum(any(a in action_set for a in actions) for actions in df['operational_actions_list'])

    # Auth-related actions
    auth_actions = {'reset_password_or_otp', 'resend_otp_or_verification', 'reactivate_account'}
    auth_count = count_conversations_with(auth_actions)

    # Order-related actions
    order_actions = {'check_order_status', 'provide_tracking_link_or_update', 'cancel_order'}
    order_count = count_conversations_with(order_actions)

    # Returns-related actions
    returns_actions = {'initiate_return', 'initiate_refund', 'initiate_exchange_replacement'}
    returns_count = count_conversations_with(returns_actions)

    print(f"   • Auth Support Bot: {auth_count} conversations with auth actions")
    print(f"   • Order Management Bot: {order_count} conversations with order actions")
    print(f"   • Returns Specialist: {returns_count} conversations with returns actions")

print("\n💡 INSIGHT: Topic + action-based routing enables:")
print("   - First-contact resolution by specialized AI bots")
print("   - Reduced handling time through pre-filled action recommendations")
print("   - Seamless human handoff with full context when needed")

In [None]:
"""
Final summary: How the enriched classifier outputs support business goals.
"""
print("\n" + "=" * 80)
print("📋 SUMMARY: Classifier Outputs → Business Value")
print("=" * 80)

summary_table = """
┌────────────────────────┬────────────────────────────────────────────────────┐
│ Output Field           │ Business Application                               │
├────────────────────────┼────────────────────────────────────────────────────┤
│ topic                  │ Primary routing, analytics, trend reporting        │
│ confidence             │ Quality monitoring, human review triggers          │
│ handler_summary        │ Agent briefing, quick context for handlers         │
│ emotion                │ Prioritization, tone adaptation for bots           │
│ difficulty             │ Workload balancing, SLA management                 │
│ operational_actions    │ Action suggestions, bot automation scripts         │
│ risk_level             │ Priority queuing, resource allocation              │
│ escalation_required    │ Automatic escalation to human agents               │
│ escalation_flags       │ Specialized team routing (fraud, legal, VIP)       │
│ root_cause_code        │ Systemic issue detection, product feedback         │
│ root_cause_detail      │ Specific issue context for resolution              │
└────────────────────────┴────────────────────────────────────────────────────┘
"""
print(summary_table)

print("\n✅ The enriched classifier enables:")
print("   1. Data-driven understanding of contact drivers")
print("   2. Automatic escalation of high-risk conversations")
print("   3. Intelligent routing to specialized AI workflows")
print("   4. Actionable insights for handlers and operations teams")

In [None]:
print("\n" + "=" * 70)
print("✅ Analysis complete!")
print("=" * 70)
print("\nNext steps:")
print("1. Review low-confidence samples for taxonomy refinement")
print("2. Validate escalation flags against actual outcomes")
print("3. Implement routing rules in production system")
print("4. Set up drift monitoring for topic/escalation rate shifts")
print("5. Build handler feedback loop for summary quality")
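
For the drift-monitoring step, one simple starting point is to compare the topic mix of a recent window against a baseline window. The sketch below uses total variation distance between the two topic-share distributions. The function names, the toy topic lists, and the `DRIFT_THRESHOLD` value are assumptions for illustration; a production setup would need thresholds calibrated on historical data.

```python
from collections import Counter

DRIFT_THRESHOLD = 0.1  # hypothetical alert level, not derived from the dataset

def topic_shares(topics):
    """Map each topic to its share of the given list of topic labels."""
    counts = Counter(topics)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("topic list is empty")
    return {topic: count / total for topic, count in counts.items()}

def total_variation_distance(baseline, current):
    """Half the sum of absolute share differences; 0 = identical mix, 1 = disjoint."""
    p, q = topic_shares(baseline), topic_shares(current)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in set(p) | set(q))

# Toy windows for illustration only.
baseline_topics = ['Orders'] * 5 + ['Returns'] * 3 + ['Login'] * 2
current_topics = ['Orders'] * 3 + ['Returns'] * 3 + ['Login'] * 4

tvd = total_variation_distance(baseline_topics, current_topics)
print(f"Total variation distance: {tvd:.2f}")
print("Drift alert" if tvd > DRIFT_THRESHOLD else "Topic mix stable")
```

With the notebook's data, the two lists could come from `df['topic']` split by a date column, if one is available. The same comparison applies to escalation rates by treating `escalation_required` as a two-label distribution.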