# ACGS Data Quality Analysis Dashboard

**Constitutional Hash**: `cdd01ef066bc6cf2`  
**Purpose**: Real-time data quality assessment for ACGS platform  
**Integration**: Uses `services/core/acgs-pgp-v8/data_quality_framework.py`  
**Event-Driven**: Simulates real-time quality monitoring; alert publishing is a placeholder pending NATS integration

## Overview

This notebook provides interactive data quality analysis capabilities for the ACGS platform, including:
- Missing value analysis
- Outlier detection using statistical methods
- Class imbalance measurement
- Feature correlation analysis
- Data freshness monitoring
- Overall quality scoring (target: at least 0.8, matching the `>= 0.8` checks in the code)
- Simulated real-time monitoring with placeholder event publishing for quality alerts
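
As a rough illustration of three of the checks listed above, the following sketch computes a per-column missing rate, an IQR-based outlier rate, and a class imbalance ratio with plain pandas. It is not the implementation in `data_quality_framework.py`; the function names, the 1.5×IQR fence, and the toy values are assumptions made for this example only.

```python
import numpy as np
import pandas as pd


def missing_rates(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column."""
    return df.isna().mean()


def iqr_outlier_rate(values: pd.Series, k: float = 1.5) -> float:
    """Fraction of non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = values.dropna()
    q1, q3 = clean.quantile(0.25), clean.quantile(0.75)
    iqr = q3 - q1
    outside = (clean < q1 - k * iqr) | (clean > q3 + k * iqr)
    return float(outside.mean())


def imbalance_ratio(labels: pd.Series) -> float:
    """Majority class count divided by minority class count."""
    counts = labels.value_counts()
    return float(counts.max() / counts.min())


# Toy data (hypothetical values, column names borrowed from this notebook)
toy = pd.DataFrame(
    {
        "response_time_ms": [100.0, 110.0, 95.0, 105.0, 102.0, 98.0, 101.0, 5000.0],
        "user_satisfaction": [0.9, np.nan, 0.8, 0.85, np.nan, 0.7, 0.95, 0.9],
        "constitutional_compliance": [True, True, True, True, True, True, False, False],
    }
)

rates = missing_rates(toy)
outlier_rate = iqr_outlier_rate(toy["response_time_ms"])
ratio = imbalance_ratio(toy["constitutional_compliance"])

print(rates)
print(f"Outlier rate: {outlier_rate:.3f}, imbalance ratio: {ratio:.1f}")
```

The IQR fence is only one of several common statistical outlier rules; a z-score or a robust median-based rule would slot into the same function shape.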

In [None]:
# Import required libraries
import asyncio
import sys
import warnings
from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots

warnings.filterwarnings("ignore")

# Add ACGS modules to path
sys.path.append("../services/core/acgs-pgp-v8")
from data_quality_framework import DataQualityAssessment, DataQualityMetrics

# Configure plotting
plt.style.use("seaborn-v0_8")
sns.set_palette("husl")
%matplotlib inline

## 1. Initialize Data Quality Assessment Framework

In [None]:
# Initialize the data quality assessment framework
quality_assessor = DataQualityAssessment()

# Constitutional hash used to tag reports and alert events
# (recorded here only; no validation is performed in this notebook)
CONSTITUTIONAL_HASH = "cdd01ef066bc6cf2"
print(f"✅ Constitutional Hash: {CONSTITUTIONAL_HASH}")
print("📊 Data Quality Assessment Framework Initialized")
print("🎯 Target Quality Score: >=0.8")

## 2. Load and Generate Sample Data

In [None]:
# Generate sample ACGS system data for analysis
def generate_acgs_sample_data(n_samples=1000):
    """Generate realistic ACGS system data for quality analysis."""
    np.random.seed(42)

    data = {
        # Lowercase "h" is the hourly alias; uppercase "H" is deprecated in pandas 2.2+
        "timestamp": pd.date_range(start="2025-01-01", periods=n_samples, freq="h"),
        "service_id": np.random.choice(
            ["auth", "ac", "integrity", "fv", "gs", "pgc", "ec"], n_samples
        ),
        "response_time_ms": np.random.lognormal(6, 0.5, n_samples),
        "cost_estimate": np.random.exponential(0.001, n_samples),
        "quality_score": np.random.beta(8, 2, n_samples),
        "complexity_score": np.random.gamma(2, 2, n_samples),
        "content_length": np.random.poisson(1000, n_samples),
        "constitutional_compliance": np.random.choice(
            [True, False], n_samples, p=[0.95, 0.05]
        ),
        "error_rate": np.random.exponential(0.02, n_samples),
        "user_satisfaction": np.random.normal(0.85, 0.1, n_samples),
    }

    df = pd.DataFrame(data)

    # Introduce some missing values for realistic testing
    missing_indices = np.random.choice(
        df.index, size=int(0.05 * len(df)), replace=False
    )
    df.loc[missing_indices, "user_satisfaction"] = np.nan

    # Introduce some outliers
    outlier_indices = np.random.choice(
        df.index, size=int(0.02 * len(df)), replace=False
    )
    df.loc[outlier_indices, "response_time_ms"] *= 10

    return df


# Generate sample data
sample_data = generate_acgs_sample_data(1000)
print(f"📊 Generated {len(sample_data)} sample records")
print(
    f"📅 Date range: {sample_data['timestamp'].min()} to {sample_data['timestamp'].max()}"
)
sample_data.head()

## 3. Comprehensive Data Quality Assessment

In [None]:
# Perform comprehensive data quality assessment
print("🔍 Performing comprehensive data quality assessment...")

quality_metrics = quality_assessor.comprehensive_assessment(
    df=sample_data,
    target_column="constitutional_compliance",
    timestamp_column="timestamp",
)

print("✅ Assessment completed")
print(f"📊 Overall Quality Score: {quality_metrics.overall_score:.3f}")
print(f"🎯 Target Met: {'✅ YES' if quality_metrics.overall_score >= 0.8 else '❌ NO'}")
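
How `comprehensive_assessment` combines its component metrics into `overall_score` is not shown in this notebook. One plausible scheme is a weighted mean of component scores compared against the 0.8 target. The sketch below illustrates that idea only; the component names, the weights, and the input values are all hypothetical.

```python
from typing import Mapping

QUALITY_TARGET = 0.8  # threshold used throughout this notebook


def overall_quality_score(
    components: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Weighted mean of component scores, each clamped to [0, 1]."""
    total_weight = sum(weights[name] for name in components)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        min(max(score, 0.0), 1.0) * weights[name]
        for name, score in components.items()
    )
    return weighted / total_weight


# Hypothetical component scores (higher is better) and weights
components = {
    "completeness": 1 - 0.05,  # 1 - missing value rate
    "outlier_free": 1 - 0.02,  # 1 - outlier rate
    "freshness": 0.90,
    "class_balance": 0.50,
}
weights = {
    "completeness": 0.3,
    "outlier_free": 0.3,
    "freshness": 0.2,
    "class_balance": 0.2,
}

score = overall_quality_score(components, weights)
target_met = score >= QUALITY_TARGET
print(f"Overall score: {score:.3f}, target met: {target_met}")
```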

## 4. Data Quality Visualization Dashboard

In [None]:
# Create comprehensive data quality dashboard
def create_quality_dashboard(metrics: DataQualityMetrics, df: pd.DataFrame):
    """Create interactive data quality dashboard."""

    # Create subplots
    fig = make_subplots(
        rows=3,
        cols=2,
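        # One title per grid cell that holds a plot: row 3 is a single
        # full-width panel, so the grid has five panels in total.
        subplot_titles=[
            "Overall Quality Score",
            "Missing Value Analysis",
            "Outlier Detection",
            "Class Imbalance Analysis",
            "Feature Correlation Heatmap",
        ],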
        specs=[
            [{"type": "indicator"}, {"type": "bar"}],
            [{"type": "scatter"}, {"type": "pie"}],
            [{"type": "heatmap", "colspan": 2}, None],
        ],
        vertical_spacing=0.12,
    )

    # 1. Overall Quality Score Gauge
    fig.add_trace(
        go.Indicator(
            mode="gauge+number+delta",
            value=metrics.overall_score,
            domain={"x": [0, 1], "y": [0, 1]},
            title={"text": "Quality Score"},
            delta={"reference": 0.8},
            gauge={
                "axis": {"range": [None, 1]},
                "bar": {"color": "darkblue"},
                "steps": [
                    {"range": [0, 0.6], "color": "lightgray"},
                    {"range": [0.6, 0.8], "color": "yellow"},
                    {"range": [0.8, 1], "color": "green"},
                ],
                "threshold": {
                    "line": {"color": "red", "width": 4},
                    "thickness": 0.75,
                    "value": 0.8,
                },
            },
        ),
        row=1,
        col=1,
    )

    # 2. Missing Value Analysis (missing rate per feature)
    if metrics.missing_patterns:
        features, missing_rates = zip(*metrics.missing_patterns.items())
        fig.add_trace(
            go.Bar(x=list(features), y=list(missing_rates), name="Missing Rate"),
            row=1,
            col=2,
        )

    # 3. Outlier Detection Scatter
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    if len(numeric_cols) >= 2:
        fig.add_trace(
            go.Scatter(
                x=df[numeric_cols[0]],
                y=df[numeric_cols[1]],
                mode="markers",
                name="Data Points",
                marker=dict(size=5, opacity=0.6),
            ),
            row=2,
            col=1,
        )

    # 4. Class Distribution Pie Chart
    if metrics.class_distribution:
        labels, values = zip(*metrics.class_distribution.items())
        fig.add_trace(
            go.Pie(labels=list(labels), values=list(values), name="Class Distribution"),
            row=2,
            col=2,
        )

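    # 5. Feature Correlation Heatmap (pairwise Pearson correlation of numeric columns)
    corr = df.select_dtypes(include=[np.number]).corr()
    if not corr.empty:
        fig.add_trace(
            go.Heatmap(
                z=corr.values,
                x=list(corr.columns),
                y=list(corr.index),
                zmin=-1,
                zmax=1,
                colorscale="RdBu",
            ),
            row=3,
            col=1,
        )

    # Update layout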
    fig.update_layout(
        height=900,
        title_text="ACGS Data Quality Assessment Dashboard",
        showlegend=False,
    )

    return fig


# Create and display dashboard
dashboard = create_quality_dashboard(quality_metrics, sample_data)
dashboard.show()
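
The "Feature Correlation Heatmap" panel and the `high_correlation_pairs` metric both rest on a pairwise correlation matrix. As a minimal sketch of how such pairs could be extracted with pandas, the example below scans the upper triangle of the absolute correlation matrix. The helper name and the 0.9 threshold are assumptions, not the framework's actual definition, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return (col_a, col_b, |r|) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include=[np.number]).corr().abs()
    cols = list(corr.columns)
    pairs = []
    # Visit each unordered pair once (upper triangle, diagonal excluded)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            value = corr.iloc[i, j]
            if value >= threshold:
                pairs.append((cols[i], cols[j], float(value)))
    return pairs


# Synthetic data: the second column is a noisy linear function of the first
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame(
    {
        "content_length": base,
        "response_time_ms": 2.0 * base + rng.normal(scale=0.1, size=200),
        "error_rate": rng.normal(size=200),
    }
)

pairs = high_correlation_pairs(toy, threshold=0.9)
print(pairs)
```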

## 5. Real-Time Quality Monitoring (Event-Driven)

In [None]:
# Event-driven quality monitoring simulation
async def simulate_real_time_monitoring():
    """Simulate real-time data quality monitoring with event publishing."""

    print("🚀 Starting real-time quality monitoring simulation...")

    for i in range(5):  # Simulate 5 monitoring cycles
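        # Generate new data batch.
        # NOTE: generate_acgs_sample_data() reseeds NumPy with 42 on every call,
        # so each simulated batch (and therefore its score) is identical.
        new_data = generate_acgs_sample_data(100)

        # Assess quality with the same column arguments used in section 3
        metrics = quality_assessor.comprehensive_assessment(
            df=new_data,
            target_column="constitutional_compliance",
            timestamp_column="timestamp",
        )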

        # Check for quality alerts
        if metrics.overall_score < 0.8:
            print(
                f"🚨 QUALITY ALERT - Cycle {i + 1}: Score {metrics.overall_score:.3f} below threshold"
            )
            # In real implementation, this would publish to NATS/Kafka
            await publish_quality_alert(metrics)
        else:
            print(f"✅ Quality OK - Cycle {i + 1}: Score {metrics.overall_score:.3f}")

        # Simulate processing delay
        await asyncio.sleep(1)

    print("✅ Real-time monitoring simulation completed")


async def publish_quality_alert(metrics: DataQualityMetrics):
    """Publish quality alert event (placeholder for NATS integration)."""
    alert_event = {
        "event_type": "data_quality_alert",
        "timestamp": datetime.now().isoformat(),
        "constitutional_hash": CONSTITUTIONAL_HASH,
        "quality_score": metrics.overall_score,
        "missing_rate": metrics.missing_value_rate,
        "outlier_rate": metrics.outlier_rate,
        "severity": "HIGH" if metrics.overall_score < 0.6 else "MEDIUM",
    }

    print(f"📡 Publishing alert event: {alert_event['severity']} severity")
    # TODO: Integrate with NATS message broker (requires `import json`;
    # NATS payloads are bytes, hence the .encode())
    # await nats_client.publish("acgs.quality.alert", json.dumps(alert_event).encode())


# Run simulation
await simulate_real_time_monitoring()
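
Until a broker is connected, the publishing step can be exercised by injecting the publish coroutine. The sketch below uses only the standard library: it serializes an alert to JSON bytes and hands it to whatever `publish(subject, payload)` coroutine is supplied. The `Publisher` type, the helper names, and the in-memory fake are assumptions for illustration; the subject name comes from the TODO comment above. A client whose publish coroutine accepts a subject and a bytes payload, such as the one in nats-py, could presumably be passed in the same way.

```python
import asyncio
import json
from datetime import datetime, timezone
from typing import Awaitable, Callable

# Any coroutine taking (subject, payload bytes) can act as the publisher
Publisher = Callable[[str, bytes], Awaitable[None]]

ALERT_SUBJECT = "acgs.quality.alert"  # subject from the notebook's TODO comment


def build_alert_payload(
    score: float, missing_rate: float, outlier_rate: float, constitutional_hash: str
) -> bytes:
    """Serialize a quality alert event to UTF-8 encoded JSON."""
    event = {
        "event_type": "data_quality_alert",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "constitutional_hash": constitutional_hash,
        "quality_score": score,
        "missing_rate": missing_rate,
        "outlier_rate": outlier_rate,
        "severity": "HIGH" if score < 0.6 else "MEDIUM",
    }
    return json.dumps(event).encode("utf-8")


async def publish_alert(publish: Publisher, payload: bytes) -> None:
    """Send the payload through the injected publisher."""
    await publish(ALERT_SUBJECT, payload)


async def demo() -> list:
    sent = []

    async def fake_publish(subject: str, data: bytes) -> None:
        # In-memory stand-in for a real broker client
        sent.append((subject, data))

    payload = build_alert_payload(0.55, 0.05, 0.02, "cdd01ef066bc6cf2")
    await publish_alert(fake_publish, payload)
    return sent


sent_messages = asyncio.run(demo())
print(sent_messages[0][0], json.loads(sent_messages[0][1])["severity"])
```

Inside a notebook cell, where an event loop is already running, `await demo()` replaces the `asyncio.run(demo())` call.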

## 6. Quality Metrics Summary Report

In [None]:
# Generate comprehensive quality report
def generate_quality_report(metrics: DataQualityMetrics):
    """Generate comprehensive data quality report."""

    report = f"""
# ACGS Data Quality Assessment Report

**Generated**: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}  
**Constitutional Hash**: {CONSTITUTIONAL_HASH}  
**Assessment Framework**: ACGS-PGP v8

## Executive Summary

- **Overall Quality Score**: {metrics.overall_score:.3f}/1.0
- **Quality Target Met**: {"✅ YES" if metrics.overall_score >= 0.8 else "❌ NO"}
- **Missing Value Rate**: {metrics.missing_value_rate:.3f}
- **Outlier Rate**: {metrics.outlier_rate:.3f}
- **Data Freshness Score**: {metrics.freshness_score:.3f}

## Detailed Metrics

### Missing Value Analysis
- **Overall Missing Rate**: {metrics.missing_value_rate:.1%}
- **Features with Missing Values**: {len(metrics.missing_patterns)}

### Outlier Detection
- **Outlier Rate**: {metrics.outlier_rate:.1%}
- **Features with Outliers**: {len(metrics.outlier_features)}

### Class Balance Analysis
- **Imbalance Ratio**: {metrics.imbalance_ratio:.3f}
- **Class Distribution**: {metrics.class_distribution}

### Feature Correlation
- **Max Correlation**: {metrics.max_correlation:.3f}
- **High Correlation Pairs**: {len(metrics.high_correlation_pairs)}

## Recommendations

{"✅ Data quality meets production standards" if metrics.overall_score >= 0.8 else "⚠️ Data quality requires attention"}

---
*Report generated by ACGS Data Quality Assessment Framework*
"""

    return report


# Generate and display report
quality_report = generate_quality_report(quality_metrics)
print(quality_report)

## 7. Integration with ACGS Services

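This notebook is designed to integrate with the ACGS 7-service architecture. The service calls themselves are not yet implemented (see Next Steps); the intended role of each service is: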

- **Authentication Service (8000)**: Quality metrics authentication
- **Constitutional AI Service (8001)**: Constitutional compliance validation
- **Integrity Service (8002)**: Data integrity verification
- **Formal Verification Service (8003)**: Statistical validation
- **Governance Synthesis Service (8004)**: Quality-based governance decisions
- **Policy Governance Service (8005)**: Quality policy enforcement
- **Evolutionary Computation Service (8006)**: Quality optimization
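
For the planned service integration, the port list above could be captured as a small lookup table. The sketch below is an assumption-laden starting point: it pairs the short service IDs used in the sample data (`auth`, `ac`, `integrity`, `fv`, `gs`, `pgc`, `ec`) with the ports listed above in the same order, and it assumes plain HTTP on `localhost`. No endpoint paths are implied.

```python
# Port per service, taken from the list above. The mapping from short IDs to
# services follows the list order and is an assumption.
ACGS_SERVICE_PORTS = {
    "auth": 8000,       # Authentication Service
    "ac": 8001,         # Constitutional AI Service
    "integrity": 8002,  # Integrity Service
    "fv": 8003,         # Formal Verification Service
    "gs": 8004,         # Governance Synthesis Service
    "pgc": 8005,        # Policy Governance Service
    "ec": 8006,         # Evolutionary Computation Service
}


def service_base_url(service_id: str, host: str = "localhost", scheme: str = "http") -> str:
    """Build a base URL for a service; host and scheme are placeholder defaults."""
    try:
        port = ACGS_SERVICE_PORTS[service_id]
    except KeyError:
        raise ValueError(f"unknown ACGS service id: {service_id!r}") from None
    return f"{scheme}://{host}:{port}"


integrity_url = service_base_url("integrity")
print(integrity_url)
```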

### Next Steps

1. **Event-Driven Integration**: Connect to NATS message broker for real-time events
2. **Service Integration**: Add API calls to ACGS services for live data
3. **Automated Alerting**: Implement automated quality alert system
4. **Dashboard Deployment**: Deploy as Streamlit application for production use