### Detect Data Drift in ML Models
**Objective**: Monitor and detect changes in data distributions that impact ML model performance.

**Task**: Feature Correlation Drift

**Steps**:
1. Compute the correlation matrix of features in your training dataset.
2. Compute the correlation matrix of the same features in your production data.
3. Compare the two matrices, for example by taking their absolute element-wise difference, and track how that difference changes over time to identify significant deviations.
4. Investigate any significant change in correlation, since it may indicate a problem in the data collection process or a change that breaks the model's assumptions.
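Step 3 asks how correlations change *over time*, which a single training-versus-production comparison cannot show. The sketch below assumes production rows are stored in arrival order. It splits them into consecutive, non-overlapping windows and reduces each window's comparison to one score: the largest absolute change in any pairwise Pearson correlation. The function names `correlation_drift_score` and `drift_over_time`, the window size, and the synthetic data (a relationship that flips sign halfway through production) are illustrative assumptions, not part of the original task.

```python
import numpy as np
import pandas as pd


def correlation_drift_score(reference: pd.DataFrame, current: pd.DataFrame) -> float:
    """Largest absolute change in any pairwise Pearson correlation between two datasets."""
    diff = (reference.corr() - current.corr()).abs().to_numpy()
    # Keep only the upper triangle above the diagonal so each feature pair counts once.
    pairs = diff[np.triu_indices_from(diff, k=1)]
    return float(pairs.max())


def drift_over_time(reference: pd.DataFrame, production: pd.DataFrame, window: int) -> pd.Series:
    """Drift score for each consecutive, non-overlapping window of production rows.

    Assumes `production` is sorted by arrival time; the index of the result is
    the starting row of each window.
    """
    scores = {}
    for start in range(0, len(production) - window + 1, window):
        chunk = production.iloc[start:start + window]
        scores[start] = correlation_drift_score(reference, chunk)
    return pd.Series(scores, name="max_abs_corr_change")


# Hypothetical data: in production, the feature1/feature2 relationship
# flips sign from row 200 onward.
rng = np.random.default_rng(0)
n_ref, n_prod = 500, 400

x_ref = rng.normal(size=n_ref)
reference = pd.DataFrame({
    "feature1": x_ref,
    "feature2": x_ref + rng.normal(scale=0.5, size=n_ref),
})

x_prod = rng.normal(size=n_prod)
sign = np.where(np.arange(n_prod) < 200, 1.0, -1.0)
production = pd.DataFrame({
    "feature1": x_prod,
    "feature2": sign * x_prod + rng.normal(scale=0.5, size=n_prod),
})

scores = drift_over_time(reference, production, window=100)
print(scores)
```

Plotting or alerting on this series shows *when* the drift began, not just whether it occurred. The maximum-change summary is one choice among several. A matrix norm of the difference, for instance, would weigh many small shifts more heavily than a single large one.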

```python
import pandas as pd

# Example training dataset
train_data = {
    'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature2': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'feature3': [5, 3, 6, 2, 7, 1, 8, 0, 9, -1]
}
train_df = pd.DataFrame(train_data)

# Example production dataset: feature1 and feature2 keep their relationship,
# but feature3's correlation with them changes
prod_data = {
    'feature1': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
    'feature2': [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    'feature3': [2, 5, 1, 7, 0, 9, -1, 11, -2, 13]
}
prod_df = pd.DataFrame(prod_data)

# Steps 1-2: compute Pearson correlation matrices (the pandas default method)
train_corr = train_df.corr()
prod_corr = prod_df.corr()

print("Training Data Correlation Matrix:")
print(train_corr)

print("\nProduction Data Correlation Matrix:")
print(prod_corr)

# Step 3: absolute element-wise difference between the two matrices
corr_diff = (train_corr - prod_corr).abs()

print("\nAbsolute difference between correlation matrices:")
print(corr_diff)

# Step 4: flag feature pairs whose correlation changed by more than a threshold
# (0.3 is an example value; tune it for your data)
threshold = 0.3

# The difference matrix is symmetric, so scan only the upper triangle to report
# each feature pair once and skip the diagonal.
features = list(corr_diff.columns)
drifted_pairs = []
for i, f1 in enumerate(features):
    for f2 in features[i + 1:]:
        diff_val = corr_diff.loc[f1, f2]
        if diff_val > threshold:
            drifted_pairs.append((f1, f2, diff_val))

if drifted_pairs:
    print("\nSignificant correlation drift detected between feature pairs:")
    for f1, f2, diff_val in drifted_pairs:
        print(f" - {f1} and {f2}: drift = {diff_val:.3f}")
else:
    print("\nNo significant correlation drift detected.")
```
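A fixed cutoff such as 0.3 ignores sample size. With only ten rows per dataset, sample correlations are noisy enough that a change of that size can arise by chance. One standard alternative is Fisher's z-test for the difference between two correlations from independent samples. It computes `z = (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3))` and compares it against a standard normal distribution. The sketch below rests on three assumptions: the samples are independent, there are no missing values (so `len(df)` is the sample size behind every correlation), and the features are roughly bivariate normal. `fisher_z_test`, `correlation_drift_tests`, and the `alpha` parameter are hypothetical names, and the example data is the notebook's own.

```python
import math
from itertools import combinations

import pandas as pd


def fisher_z_test(r1: float, n1: int, r2: float, n2: int) -> tuple[float, float]:
    """Two-sided Fisher z-test of H0: the two population correlations are equal."""
    eps = 1e-12  # clip r into (-1, 1) so atanh stays finite for perfectly linear data
    z1 = math.atanh(max(min(r1, 1 - eps), -1 + eps))
    z2 = math.atanh(max(min(r2, 1 - eps), -1 + eps))
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # requires n1 > 3 and n2 > 3
    z = (z1 - z2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return z, p_value


def correlation_drift_tests(train_df: pd.DataFrame, prod_df: pd.DataFrame,
                            alpha: float = 0.05) -> pd.DataFrame:
    """Test each feature pair once; assumes shared columns and no missing values."""
    train_corr = train_df.corr()
    prod_corr = prod_df.corr()
    rows = []
    for f1, f2 in combinations(train_corr.columns, 2):
        z, p_value = fisher_z_test(
            train_corr.loc[f1, f2], len(train_df),
            prod_corr.loc[f1, f2], len(prod_df),
        )
        rows.append({"feature_1": f1, "feature_2": f2, "z": z,
                     "p_value": p_value, "drift": p_value < alpha})
    return pd.DataFrame(rows)


# Same example data as the notebook cell
train_df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'feature2': [10, 9, 8, 7, 6, 5, 4, 3, 2, 1],
    'feature3': [5, 3, 6, 2, 7, 1, 8, 0, 9, -1],
})
prod_df = pd.DataFrame({
    'feature1': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19],
    'feature2': [20, 18, 16, 14, 12, 10, 8, 6, 4, 2],
    'feature3': [2, 5, 1, 7, 0, 9, -1, 11, -2, 13],
})

results = correlation_drift_tests(train_df, prod_df)
print(results)
```

On these ten-row datasets, the test does not reject equal correlations at `alpha = 0.05` for any pair, even though the 0.3 threshold flags both pairs involving `feature3`. That is the intended trade-off. The test demands more evidence when samples are small, and with production volumes in the thousands it becomes sensitive to much smaller changes.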


