## Using AI for Anomaly Detection in Data Quality
**Description**: Use a machine-learning anomaly detector (scikit-learn's Isolation Forest) to flag unusual records, and Great Expectations to validate the data and raise alerts.

**Steps**:
1. Use an anomaly detection algorithm:
    - Fit scikit-learn's `IsolationForest` and call `fit_predict`, which labels anomalies as -1 and normal rows as 1.
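
As a rough illustration of how Isolation Forest ranks rows, the sketch below fits the model on made-up data (20 ordinary rows plus one injected extreme income; none of these values come from this notebook) and reads the continuous anomaly scores from `score_samples`, where lower means more anomalous. The data, `n_estimators`, and `random_state` values are assumptions chosen only for the demonstration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Made-up data: 20 ordinary (age, income) rows plus one extreme income.
ages = np.arange(25, 45)
incomes = 50_000 + 1_000 * np.arange(20)
X = np.column_stack([ages, incomes]).astype(float)
X = np.vstack([X, [30.0, 1_000_000.0]])  # injected outlier (last row)

model = IsolationForest(n_estimators=100, random_state=0).fit(X)

# score_samples: the lower the score, the more anomalous the row.
scores = model.score_samples(X)
most_anomalous = int(np.argmin(scores))
print("Most anomalous row:", X[most_anomalous])
```

Scores are useful when a hard -1/1 label is too coarse, for example to sort rows for manual review.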

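**Example data** (the missing income is written as `np.nan` rather than `None`, so the array keeps a numeric dtype):

data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, np.nan], [45, 100000]])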
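
A missing value can be written as `None` or as `np.nan`, and the choice affects the array's dtype. The short sketch below (variable names are illustrative, not part of the notebook) shows the difference and one way to get back to a numeric column.

```python
import numpy as np
import pandas as pd

with_none = np.array([[35, 75000], [40, None]])
with_nan = np.array([[35, 75000], [40, np.nan]])

print(with_none.dtype)  # object
print(with_nan.dtype)   # float64

# pandas treats both None and NaN as missing, so isna() works either way.
df_none = pd.DataFrame(with_none, columns=["age", "income"])
missing_count = int(df_none["income"].isna().sum())
print("Missing incomes:", missing_count)

# Converting the object column to numeric turns None into NaN.
income_numeric = pd.to_numeric(df_none["income"])
print(income_numeric.dtype)
```

Keeping the array numeric from the start avoids an extra conversion step before statistics such as the median are computed.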

2. Integrate with Great Expectations:
    - Generate alerts if anomalies are detected:

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
import great_expectations as ge

# Sample data with potential anomaly (missing income represented as np.nan)
data = np.array([
    [25, 50000],
    [30, 60000],
    [35, 75000],
    [40, np.nan],    # Missing income, potential anomaly
    [45, 100000]
])

# Convert to DataFrame
df = pd.DataFrame(data, columns=['age', 'income'])

# IsolationForest doesn't handle NaNs, so impute the missing income before fitting.
# Here the column median is used; a sentinel value (e.g., -1) is another option.
income_median = df['income'].median()
df['income_filled'] = df['income'].fillna(income_median)

# Prepare data for anomaly detection (using age and filled income)
X = df[['age', 'income_filled']]

# Initialize Isolation Forest
iso_forest = IsolationForest(contamination=0.2, random_state=42)

# Fit model and predict anomalies (-1 means anomaly, 1 means normal)
df['anomaly'] = iso_forest.fit_predict(X)

# Show detected anomalies
print("Detected anomalies:")
print(df[df['anomaly'] == -1])

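# Integrate with Great Expectations: validate that 'income' is non-null and alert on anomalies.
# Note: ge.from_pandas is part of the legacy Pandas API found in older Great Expectations releases.

ge_df = ge.from_pandas(df)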

# Expect no nulls in 'income'
validation_result = ge_df.expect_column_values_to_not_be_null('income')

if not validation_result['success']:
    print("ALERT: Null values detected in 'income' column!")

# Alert if anomaly detected by Isolation Forest
if df['anomaly'].eq(-1).any():
    print("ALERT: Anomalies detected in data based on Isolation Forest!")



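**Note**: Running this cell without Great Expectations installed fails with `ModuleNotFoundError: No module named 'great_expectations'`. Install the package first, for example with `pip install great_expectations`.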
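
Since the alerting logic only needs a pass/fail signal from each check, the same two checks can also be sketched with pandas and scikit-learn alone. The `collect_alerts` helper below is hypothetical (its name, parameters, and message format are assumptions for illustration), and it is not a substitute for the richer validation reports Great Expectations produces.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def collect_alerts(df, feature_cols, contamination=0.2, random_state=42):
    """Hypothetical helper: return a list of alert messages for `df`."""
    alerts = []

    # Check 1: missing values, reported per column.
    null_counts = df[feature_cols].isna().sum()
    for col, count in null_counts[null_counts > 0].items():
        alerts.append(f"ALERT: {count} null value(s) in '{col}'")

    # Check 2: Isolation Forest on median-imputed features.
    filled = df[feature_cols].fillna(df[feature_cols].median())
    model = IsolationForest(contamination=contamination, random_state=random_state)
    labels = model.fit_predict(filled)  # -1 means anomaly, 1 means normal
    n_anomalies = int((labels == -1).sum())
    if n_anomalies:
        alerts.append(f"ALERT: {n_anomalies} row(s) flagged by Isolation Forest")

    return alerts


df = pd.DataFrame(
    {"age": [25, 30, 35, 40, 45], "income": [50000, 60000, 75000, np.nan, 100000]}
)
alerts = collect_alerts(df, ["age", "income"])
for message in alerts:
    print(message)
```

Returning the messages as a list, rather than printing inside the helper, leaves the caller free to log them, send them to a notification channel, or fail a pipeline step.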