# 🔍 DriftWatch - Complete Drift Detection Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/VincentCotella/DriftWatch/blob/main/examples/notebooks/drift_detection_tutorial.ipynb)

This notebook demonstrates how to use **DriftWatch** for ML drift monitoring, including:

1. **Basic Drift Detection** - Detect distribution shifts between reference and production data
2. **Drift Explanation** - Understand *why* drift was detected with detailed statistics
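3. **Visualization** - Visualize distribution shifts with histogram overlays
4. **JSON Export** - Export explanations for logging or further analysis
5. **Custom Thresholds** - Tune how readily each statistical test flags drift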

> **DriftWatch v0.3.0** - Lightweight ML drift monitoring, built for real-world pipelines.

## 📦 Installation

In [None]:
# Install DriftWatch with visualization support
# (quotes keep shells such as zsh from expanding the square brackets)
!pip install -q "driftwatch[viz]"

## 1️⃣ Generate Sample Data

We'll create synthetic datasets:
- **Reference (Training)** - Our baseline distribution
- **Production** - New data with potential drift

In [None]:
import numpy as np
import pandas as pd

np.random.seed(42)

# Reference data (Training distribution)
reference_df = pd.DataFrame({
    'age': np.random.normal(30, 5, 1000),
    'income': np.random.normal(50000, 10000, 1000),
    'risk_score': np.random.beta(2, 5, 1000),
    'credit_score': np.random.normal(700, 50, 1000),
})

# Production data WITH DRIFT
# - age: Mean shifted from 30 to 45 (significant drift!)
# - income: Same distribution (no drift)
# - risk_score: Slight shift (minor drift)
# - credit_score: Same distribution (no drift)
production_df = pd.DataFrame({
    'age': np.random.normal(45, 5, 1000),  # 🔴 DRIFT: Mean 30 → 45
    'income': np.random.normal(50000, 10000, 1000),  # ✅ No drift
    'risk_score': np.random.beta(2.5, 5, 1000),  # ⚠️ Slight drift
    'credit_score': np.random.normal(700, 50, 1000),  # ✅ No drift
})

print("📊 Reference Data:")
print(reference_df.describe().round(2))
print("\n📊 Production Data:")
print(production_df.describe().round(2))

## 2️⃣ Basic Drift Detection

Use `Monitor` to detect drift between reference and production data.

In [None]:
from driftwatch import Monitor

# Initialize monitor with reference data
monitor = Monitor(
    reference_data=reference_df,
    features=['age', 'income', 'risk_score', 'credit_score'],
)

# Check production data for drift
report = monitor.check(production_df)

# Display results
print("⚠️ Drift Detected:", report.has_drift())
print(f"📈 Drift Ratio: {report.drift_ratio()*100:.1f}%")
print(f"🚦 Status: {report.status.value}")
print(f"📋 Drifted Features: {report.drifted_features()}")
print("\n" + "-" * 50)

# Per-feature results
for feature in report.feature_results:
    status = "🔴 DRIFT" if feature.has_drift else "✅ OK"
    print(f"{feature.feature_name.ljust(15)}: {status} (Score {feature.method}: {feature.score:.4f})")
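
To make the idea of a per-feature drift score more concrete, here is a minimal, standard-library sketch of the Population Stability Index (PSI), one of the scores named in the thresholds section of this tutorial. It is an illustration only, not DriftWatch's implementation: the function name `psi`, the equal-width binning, and the `1e-6` floor for empty bins are all assumptions made for this example.

```python
import math
import random


def psi(reference, production, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the reference range; production values
    outside that range are counted in the first or last bin.
    Assumes the reference sample is not constant (non-zero range).
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


# Illustrative samples mirroring the 'age' feature above (hypothetical data)
rng = random.Random(42)
ref_age = [rng.gauss(30, 5) for _ in range(1000)]
same_age = [rng.gauss(30, 5) for _ in range(1000)]
shifted_age = [rng.gauss(45, 5) for _ in range(1000)]

psi_same = psi(ref_age, same_age)
psi_shifted = psi(ref_age, shifted_age)
print(f"PSI, same distribution:     {psi_same:.4f}")
print(f"PSI, mean shifted 30 -> 45: {psi_shifted:.4f}")
```

A sample drawn from the same distribution should score close to zero, while the shifted sample should score far higher, which is the behaviour a PSI threshold relies on.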

## 3️⃣ Drift Explanation (v0.3.0+)

Use `DriftExplainer` to understand *why* drift was detected.

Get detailed statistics:
- Mean shift (absolute and percentage)
- Standard deviation change
- Quantile differences (Q25, Q50, Q75)
- Min/Max changes

In [None]:
from driftwatch.explain import DriftExplainer

# Create explainer
explainer = DriftExplainer(
    reference_data=reference_df,
    production_data=production_df,
    report=report,
)

# Get full explanation
explanation = explainer.explain()

# Display summary
print(explanation.summary())

### Analyze a Specific Feature

In [None]:
# Get explanation for age (drifted feature)
age_exp = explanation['age']

print("📊 AGE Feature Analysis")
print("=" * 40)
print(f"\n🔴 Drift Detected: {age_exp.has_drift}")
print(f"📈 Drift Score ({age_exp.drift_method}): {age_exp.drift_score:.4f}")
print("\n📐 Central Tendency:")
print(f"   Reference Mean: {age_exp.ref_mean:.2f}")
print(f"   Production Mean: {age_exp.prod_mean:.2f}")
print(f"   Mean Shift: {age_exp.mean_shift:+.2f} ({age_exp.mean_shift_percent:+.1f}%)")
print("\n📊 Spread:")
print(f"   Reference Std: {age_exp.ref_std:.2f}")
print(f"   Production Std: {age_exp.prod_std:.2f}")
print(f"   Std Change: {age_exp.std_change:+.2f} ({age_exp.std_change_percent:+.1f}%)")
print("\n📏 Range:")
print(f"   Reference: [{age_exp.ref_min:.2f}, {age_exp.ref_max:.2f}]")
print(f"   Production: [{age_exp.prod_min:.2f}, {age_exp.prod_max:.2f}]")
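
The statistics listed at the start of this section (mean shift, standard deviation change, quantile differences, min/max changes) can be reproduced by hand. The sketch below uses only Python's `statistics` module; the helper `describe_shift` and its dictionary keys are hypothetical names chosen to mirror the attributes printed above, and the way DriftWatch computes these values internally may differ.

```python
import random
import statistics


def describe_shift(reference, production):
    """Summarize how a numeric feature moved between two samples.

    Assumes the reference mean is non-zero (needed for the percentage).
    """
    ref_mean = statistics.fmean(reference)
    prod_mean = statistics.fmean(production)
    ref_std = statistics.stdev(reference)
    prod_std = statistics.stdev(production)
    # quantiles(n=4) returns the three cut points Q25, Q50, Q75
    ref_q = statistics.quantiles(reference, n=4)
    prod_q = statistics.quantiles(production, n=4)
    return {
        "mean_shift": prod_mean - ref_mean,
        "mean_shift_percent": (prod_mean - ref_mean) / abs(ref_mean) * 100,
        "std_change": prod_std - ref_std,
        "quantile_shift": {
            name: p - r
            for name, r, p in zip(("q25", "q50", "q75"), ref_q, prod_q)
        },
        "min_change": min(production) - min(reference),
        "max_change": max(production) - max(reference),
    }


# Illustrative samples mirroring the 'age' feature (hypothetical data)
rng = random.Random(42)
ref_age = [rng.gauss(30, 5) for _ in range(1000)]
prod_age = [rng.gauss(45, 5) for _ in range(1000)]

summary = describe_shift(ref_age, prod_age)
print(f"Mean shift: {summary['mean_shift']:+.2f} "
      f"({summary['mean_shift_percent']:+.1f}%)")
print(f"Std change: {summary['std_change']:+.2f}")
for name, shift in summary["quantile_shift"].items():
    print(f"{name} shift: {shift:+.2f}")
```

For a pure mean shift like this one, the mean and all three quartiles should move by roughly the same amount while the standard deviation stays nearly unchanged.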

## 4️⃣ Visualization (v0.3.0+)

Use `DriftVisualizer` to create histogram overlays comparing distributions.

In [None]:
import matplotlib.pyplot as plt

from driftwatch.explain import DriftVisualizer

# Create visualizer
viz = DriftVisualizer(
    reference_data=reference_df,
    production_data=production_df,
    report=report,
)

# Plot single feature (age - with drift)
fig = viz.plot_feature('age')
plt.show()

In [None]:
# Compare: Plot a feature WITHOUT drift
fig = viz.plot_feature('income')
plt.show()

In [None]:
# Plot ALL features in a grid
fig = viz.plot_all(cols=2)
plt.show()

### Save Visualization

In [None]:
# Save to file
viz.save('drift_report.png', dpi=150)
print("✅ Saved drift_report.png")

# Save single feature
viz.save('age_drift.png', feature_name='age', dpi=150)
print("✅ Saved age_drift.png")

## 5️⃣ Export to JSON

Export the explanation for further analysis or logging.

In [None]:
import json

# Export explanation to dict
explanation_dict = explanation.to_dict()

# Pretty print
print(json.dumps(explanation_dict, indent=2, default=str))
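
A plausible reason for passing `default=str` in the cell above is that an exported dict can hold values the JSON encoder does not handle natively. The standalone sketch below shows that behaviour with a `Decimal` and a `datetime`; the `record` dict is a made-up stand-in with placeholder values, not the actual structure returned by `to_dict()`.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical record; the score is a placeholder, not a real result
record = {
    "feature": "age",
    "has_drift": True,
    "drift_score": Decimal("0.8731"),
    "checked_at": datetime(2024, 1, 1, tzinfo=timezone.utc),
}

# Without default=, json.dumps raises TypeError on Decimal and datetime
try:
    json.dumps(record)
    failed = False
except TypeError:
    failed = True

# default=str converts anything the encoder does not recognize via str()
line = json.dumps(record, default=str, sort_keys=True)
print("Plain json.dumps failed:", failed)
print(line)
```

Note that `default=str` is lossy: the converted values come back as strings when the JSON is parsed, so downstream code has to cast them again if it needs numbers or timestamps.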

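## 6️⃣ Different Drift Detection Methods

DriftWatch supports multiple statistical tests; the thresholds below refer to PSI, the Kolmogorov-Smirnov (KS) test, and the Wasserstein distance. Pass a `thresholds` dict to `Monitor` to control how readily each test flags drift.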

In [None]:
# Custom thresholds
monitor_custom = Monitor(
    reference_data=reference_df,
    thresholds={
        'psi': 0.1,  # More sensitive (default: 0.2)
        'ks_pvalue': 0.01,  # Stricter p-value cutoff, so stronger evidence is needed (default: 0.05)
        'wasserstein': 0.05,  # More sensitive
    },
)

report_custom = monitor_custom.check(production_df)
print("With custom thresholds:")
print(f"Drifted Features: {report_custom.drifted_features()}")
print(f"Drift Ratio: {report_custom.drift_ratio()*100:.1f}%")
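
Score thresholds and p-value thresholds move in opposite directions, which is easy to miss. The sketch below implements a two-sample KS statistic and the standard asymptotic (large-sample) p-value approximation using only the standard library, then compares one borderline result against two cutoffs. It assumes the conventional rule that drift is flagged when the p-value falls below the cutoff; whether DriftWatch applies exactly this rule is an assumption, and `ks_statistic` and `ks_pvalue` are illustrative helpers rather than library functions.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the value is ~1
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(max(2.0 * total, 0.0), 1.0)


# Sanity checks on tiny samples
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # fully separated samples

# A hypothetical borderline statistic for two samples of 1,000 rows each
p_borderline = ks_pvalue(0.065, 1000, 1000)
for cutoff in (0.05, 0.01):
    flagged = p_borderline < cutoff
    print(f"cutoff={cutoff}: p={p_borderline:.4f}, drift flagged={flagged}")
```

Under that rule, lowering a PSI or Wasserstein threshold makes the monitor flag more features, whereas lowering the KS p-value cutoff makes it flag fewer, because a borderline result no longer counts as significant.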

## 🎯 Summary

In this tutorial, you learned how to:

1. **Detect drift** using `Monitor.check()`
2. **Explain drift** using `DriftExplainer` with detailed statistics
3. **Visualize drift** using `DriftVisualizer` with histogram overlays
4. **Export results** to JSON for logging/analysis
5. **Tune sensitivity** by passing custom `thresholds` to `Monitor`

### Next Steps

- 📖 [Full Documentation](https://vincentcotella.github.io/DriftWatch/)
- 🔌 [FastAPI Integration](https://vincentcotella.github.io/DriftWatch/integrations/fastapi/)
- 🔔 [Slack Alerting](https://vincentcotella.github.io/DriftWatch/integrations/slack-alerting/)
- 💻 [CLI Usage](https://vincentcotella.github.io/DriftWatch/cli/)

---

⭐ If you found DriftWatch useful, please star us on [GitHub](https://github.com/VincentCotella/DriftWatch)!