# 🧪 Data Drift Simulation Experiment

**Objective:** Engineer a synthetic data drift scenario to validate the `ModelGuard` detection engine.
**Dataset:** California Housing Prices.
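**Methodology:** Introduce drift by partitioning the data on the target, median house value, to simulate price inflation over time. Selecting on the target shifts its distribution by construction (*label shift*) and shifts correlated features such as median income along with it.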

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# 1. Load Source Data
housing = fetch_california_housing(as_frame=True)
df = housing.frame
print(f"Dataset Loaded: {df.shape[0]} rows, {df.shape[1]} columns")
```

### 2. Engineering the Drift
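We split the dataset into two subsets that stand in for two time periods. The target, `MedHouseVal`, is the median house value per census block group in units of 100,000 US dollars, so the 2.0 cut-off corresponds to 200,000 US dollars:
*   **Reference (Training):** block groups with `MedHouseVal` < 2.0 (cheaper; stands in for the earlier period)
*   **Current (Production):** block groups with `MedHouseVal` >= 2.0 (more expensive; stands in for the post-inflation period)

Because the two target ranges do not overlap, the shift in the target is guaranteed by construction. Features correlated with house value, such as median income (`MedInc`), are expected to shift with it.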
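A minimal sketch of the same split as a reusable helper, assuming you may want to try other columns or cut-offs. `make_threshold_drift` is a hypothetical name, not part of pandas or `ModelGuard`. It adds an explicit size check because `DataFrame.sample` draws without replacement by default and raises if a side has fewer than `n` rows. The usage lines run on a synthetic frame rather than the housing data.

```python
import numpy as np
import pandas as pd


def make_threshold_drift(df: pd.DataFrame, column: str, threshold: float,
                         n: int, seed: int = 42):
    """Hypothetical helper: split `df` on `column` into reference and current samples.

    Reference rows have column < threshold; current rows have column >= threshold.
    """
    low = df[df[column] < threshold]
    high = df[df[column] >= threshold]
    # DataFrame.sample draws without replacement by default, so fail early
    # with a readable message if either side is too small.
    for name, part in (("reference", low), ("current", high)):
        if len(part) < n:
            raise ValueError(f"{name} side has {len(part)} rows; need at least {n}")
    return low.sample(n, random_state=seed), high.sample(n, random_state=seed)


# Usage on a synthetic frame; the notebook would pass `df` from the housing data.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"MedHouseVal": rng.uniform(0.5, 5.0, size=1_000)})
ref, curr = make_threshold_drift(toy, "MedHouseVal", threshold=2.0, n=100)
print(ref["MedHouseVal"].max() < 2.0 <= curr["MedHouseVal"].min())  # True
```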

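```python
# Partition on the target at 2.0 (units of $100k), then draw 2,000 rows from each side.
# DataFrame.sample draws without replacement, so each side needs at least 2,000 rows.
ref_df = df[df['MedHouseVal'] < 2.0].sample(2000, random_state=42)
curr_df = df[df['MedHouseVal'] >= 2.0].sample(2000, random_state=42)

print(f"Reference Set (Low Value): {ref_df.shape}")
print(f"Current Set (High Value): {curr_df.shape}")
```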
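To put a number on the shift before plotting it, one option (an assumption here, not necessarily what `ModelGuard` computes) is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical CDFs. The sketch below implements it with NumPy only; `ks_statistic` is a hypothetical helper, and `scipy.stats.ks_2samp` computes the same statistic along with a p-value. On this notebook's split, `ks_statistic(ref_df['MedHouseVal'], curr_df['MedHouseVal'])` is necessarily 1.0, since every reference value lies below every current value.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between two empirical CDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both ECDFs are step functions that only jump at observed values, so the
    # largest gap is reached at one of the pooled sample points.
    points = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, points, side="right") / a.size
    cdf_b = np.searchsorted(b, points, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


rng = np.random.default_rng(0)
same = rng.normal(size=500)
print(ks_statistic(same, same))  # 0.0: identical samples
print(ks_statistic(rng.uniform(0, 1, 200), rng.uniform(2, 3, 200)))  # 1.0: disjoint ranges
```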

### 3. Visualizing the Distribution Shift
We use kernel density estimate (KDE) plots to visualize how far the reference and current distributions have diverged.
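A KDE places a Gaussian bump on every sample and averages them. The sketch below assumes a Gaussian kernel with bandwidth `h = s * n ** (-1/5)`, the sample standard deviation times the Scott's-rule factor that `scipy.stats.gaussian_kde` uses by default; seaborn's `kdeplot` also defaults to `bw_method="scott"`, though its grid and bandwidth handling may differ in detail. `gaussian_kde_1d` is a hypothetical helper for illustration.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: Gaussian KDE evaluated on `grid`, Scott's-rule bandwidth."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    h = x.std(ddof=1) * n ** (-1 / 5)  # bandwidth: sample std times Scott's factor
    # Standardized distance from every grid point to every sample (shape: grid x samples).
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / h
    # Average of one normal density N(x_i, h^2) per sample.
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=0.3, size=2_000)
grid = np.linspace(0.0, 3.0, 601)
density = gaussian_kde_1d(data, grid)
# A density should integrate to about 1 over a grid that covers the data.
print(float(density.sum() * (grid[1] - grid[0])))
```

The bandwidth sets the smoothness: a larger `h` blurs detail, a smaller one follows individual samples more closely. In `kdeplot`, the `bw_adjust` parameter scales it.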

```python
sns.set_style("darkgrid")  # apply the style before creating the figure
plt.figure(figsize=(10, 6))

# Plot target drift
sns.kdeplot(ref_df['MedHouseVal'], label='Reference (Training)', fill=True, color='blue', alpha=0.3)
sns.kdeplot(curr_df['MedHouseVal'], label='Current (Production)', fill=True, color='red', alpha=0.3)

plt.title("Target Drift: Median House Value", fontsize=14)
plt.xlabel("Median House Value ($100k)")
plt.legend()
plt.show()
```
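One caveat when reading this plot: the reference data stops hard at 2.0, but a Gaussian KDE smooths across that edge, so the reference curve appears to extend past 2.0 and overlap the current curve more than the data does. The sketch below, under the same Gaussian kernel and Scott's-rule assumption, uses a hypothetical helper `kde_mass_above` to compute how much estimated probability mass leaks past a threshold. In seaborn, passing `cut=0` to `kdeplot` truncates the drawn curve at the data limits.

```python
import math

import numpy as np


def kde_mass_above(samples, threshold: float) -> float:
    """Hypothetical helper: probability mass a Gaussian KDE (Scott's rule) puts above `threshold`."""
    x = np.asarray(samples, dtype=float)
    h = x.std(ddof=1) * x.size ** (-1 / 5)
    # Each kernel N(x_i, h^2) contributes its upper tail:
    # P(X > t) = 0.5 * erfc((t - x_i) / (h * sqrt(2))).
    tails = [0.5 * math.erfc((threshold - xi) / (h * math.sqrt(2))) for xi in x]
    return float(np.mean(tails))


rng = np.random.default_rng(0)
truncated = rng.uniform(0.5, 2.0, size=2_000)  # every value lies below 2.0
print(truncated.max() < 2.0)  # True
print(kde_mass_above(truncated, 2.0))  # small but nonzero: the smoothed curve crosses the cut-off
```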

```python
# Plot feature drift in a feature correlated with the target
plt.figure(figsize=(10, 6))

sns.kdeplot(ref_df['MedInc'], label='Reference Income', fill=True, color='green', alpha=0.3)
sns.kdeplot(curr_df['MedInc'], label='Current Income', fill=True, color='orange', alpha=0.3)

plt.title("Covariate Shift: Median Income", fontsize=14)
plt.xlabel("Median Income ($10k)")
plt.legend()
plt.show()
```
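Another widely used drift score is the Population Stability Index (PSI): bin both samples at the reference quantiles, then sum `(p_cur - p_ref) * ln(p_cur / p_ref)` over the bins. The sketch below is for illustration only and is not a description of `ModelGuard`; the helper name `psi`, the ten quantile bins, and the `eps` floor for empty bins are all assumptions made here. Applied to this notebook, `psi(ref_df['MedInc'], curr_df['MedInc'])` would score the income shift plotted above.

```python
import numpy as np


def psi(reference, current, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index, with bins cut at the reference sample's quantiles."""
    ref = np.asarray(reference, dtype=float)
    cur = np.asarray(current, dtype=float)
    # Interior edges at reference quantiles; the two outer bins are open-ended, so
    # current values outside the reference range still fall into a bin.
    edges = np.quantile(ref, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    ref_frac = np.bincount(np.searchsorted(edges, ref, side="right"), minlength=n_bins) / ref.size
    cur_frac = np.bincount(np.searchsorted(edges, cur, side="right"), minlength=n_bins) / cur.size
    # Floor empty bins at eps so the logarithm stays finite.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5_000)
print(psi(baseline, baseline))  # 0.0: identical samples
print(psi(baseline, rng.normal(1.0, 1.0, 5_000)))  # much larger: mean shifted by one std
```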