### Detect Data Drift in ML Models
**Objective**: Monitor and detect changes in data distributions that can degrade ML model performance.

**Task**: Categorical Feature Drift

**Steps**:
1. Load the baseline distribution for a categorical feature (e.g., gender) from your training dataset.
2. Load the same feature from your current production data.
3. Use a chi-squared test of independence to compare the two distributions of the categorical feature.
4. If significant drift is detected, investigate the cause and update or retrain the model as needed.
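The steps above can be wrapped in a small reusable helper. This is a sketch under stated assumptions, not the notebook's own code. The function name `detect_categorical_drift`, its `alpha` parameter, and the returned dictionary are hypothetical. The inputs are assumed to be pandas Series of category labels. Counting both Series into one DataFrame aligns the categories automatically, so a category that only appears in production (for example a new value) gets a count of 0 on the baseline side.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def detect_categorical_drift(baseline: pd.Series, production: pd.Series, alpha: float = 0.05) -> dict:
    """Hypothetical helper: chi-squared test of whether a categorical
    feature's distribution differs between baseline and production data."""
    # Count each category in both datasets. The DataFrame index is the union of
    # categories; a category missing from one dataset is filled with 0.
    counts = pd.DataFrame({
        "baseline": baseline.value_counts(),
        "production": production.value_counts(),
    }).fillna(0)

    # Contingency table: rows = datasets, columns = categories
    table = counts.T.to_numpy()
    chi2, p_value, dof, _expected = chi2_contingency(table)

    return {
        "chi2": float(chi2),
        "p_value": float(p_value),
        "dof": int(dof),
        "drift": bool(p_value < alpha),
    }


# Usage with small deterministic samples; production contains a category
# ("Other") that never appears in the baseline.
baseline = pd.Series(["Male"] * 600 + ["Female"] * 400)
production = pd.Series(["Male"] * 450 + ["Female"] * 500 + ["Other"] * 50)
result = detect_categorical_drift(baseline, production)
print(result)
```

Returning a dictionary rather than printing keeps the check easy to run over many features or log from a scheduled monitoring job.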

```python
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency

# Step 1: Load baseline (training) data with 'gender' (simulated here with random samples).
# No random seed is set, so the statistic printed below varies from run to run.
baseline_data = pd.DataFrame({
    'gender': np.random.choice(['Male', 'Female'], size=1000, p=[0.6, 0.4])
})

# Step 2: Load production data with the same feature (simulated, with drift injected)
production_data = pd.DataFrame({
    'gender': np.random.choice(['Male', 'Female'], size=1000, p=[0.45, 0.55])  # Simulated drift
})

# Step 3: Compute frequency counts for each category
baseline_counts = baseline_data['gender'].value_counts().sort_index()
production_counts = production_data['gender'].value_counts().sort_index()

# Align categories across both datasets; a category absent from one gets a count of 0
all_categories = sorted(set(baseline_counts.index).union(set(production_counts.index)))
baseline_freq = [baseline_counts.get(cat, 0) for cat in all_categories]
production_freq = [production_counts.get(cat, 0) for cat in all_categories]

# Step 3 (cont.): Build the contingency table (rows = dataset, columns = category)
# and run the chi-squared test of independence. For a 2x2 table, SciPy applies
# Yates' continuity correction by default.
contingency_table = [baseline_freq, production_freq]
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

# Step 4: Report results and flag drift at the 5% significance level
print(f"Chi-squared Statistic: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")

if p_value < 0.05:
    print("⚠️ Drift detected in categorical feature distribution (statistically significant).")
else:
    print("✅ No significant drift detected in categorical feature distribution.")
```

```text
Chi-squared Statistic: 37.6924
P-value: 0.0000
⚠️ Drift detected in categorical feature distribution (statistically significant).
```
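To see what `chi2_contingency` computes, the statistic can be reproduced by hand. Under the null hypothesis that both datasets share the same distribution, each cell's expected count is (row total × column total) / grand total. The Pearson statistic sums (observed − expected)² / expected over all cells and is compared against a chi-squared distribution with (rows − 1) × (columns − 1) degrees of freedom. The counts below are illustrative and are not the notebook's data. The SciPy call passes `correction=False` because, for a 2×2 table, SciPy applies Yates' continuity correction by default and the plain formula omits it.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist, chi2_contingency

# Illustrative 2x2 table: rows = (baseline, production), columns = (Female, Male)
observed = np.array([[400, 600],
                     [550, 450]])

# Expected counts if both datasets had the same category distribution
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic, degrees of freedom, and upper-tail p-value
stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2_dist.sf(stat, dof)

# SciPy without Yates' correction should give the same numbers
scipy_stat, scipy_p, scipy_dof, scipy_expected = chi2_contingency(observed, correction=False)
print(f"manual: {stat:.4f}, scipy: {scipy_stat:.4f}")  # prints: manual: 45.1128, scipy: 45.1128
```

With the default `correction=True`, SciPy would report a slightly smaller statistic for this 2×2 table, because each |observed − expected| is reduced by 0.5 before squaring.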
