## Using AI for Anomaly Detection in Data Quality
**Description**: Use an AI-based approach to detect anomalous records and report them as a data quality check.

**Steps**:
1. Use an Anomaly Detection Algorithm:
    - Use scikit-learn's `IsolationForest`, which labels each row as normal (`1`) or anomalous (`-1`).
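
To make the labels and scores concrete, here is a minimal sketch of Isolation Forest on its own. The toy data (200 points near the origin plus one distant point) and the `contamination=0.01` setting are assumptions chosen for illustration; they are not part of the example in this section.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical toy data: 200 points near the origin plus one far-away point
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
far_point = np.array([[10.0, 10.0]])
X = np.vstack([normal_points, far_point])

# contamination is the expected share of anomalies; it sets the label threshold
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(X)

labels = model.predict(X)        # 1 = normal, -1 = anomaly
scores = model.score_samples(X)  # lower score = more anomalous

print("label of the far point:", labels[-1])
print("row with the lowest score:", int(np.argmin(scores)))
```

`predict` gives a hard label, while `score_samples` gives a continuous score that can be useful when a fixed `contamination` value is hard to justify.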

**Example data:**

data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, np.nan], [45, 100000]])
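
The fourth row has no income value, and how that gap is written matters. The sketch below shows the difference between `None` and `np.nan` in a NumPy array, then fills the gap with the column median, which is the imputation used later in this section. The variable names `with_none` and `with_nan` are illustrative only.

```python
import numpy as np
import pandas as pd

# None forces NumPy to fall back to a generic object array
with_none = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])
print(with_none.dtype)

# np.nan keeps the array numeric (float64)
with_nan = np.array([[25, 50000], [30, 60000], [35, 75000], [40, np.nan], [45, 100000]])
print(with_nan.dtype)

df = pd.DataFrame(with_nan, columns=["age", "income"])

# Median imputation: replace each missing value with its column's median
df_filled = df.fillna(df.median())
print(df_filled)
```

Median imputation is one simple choice among several. It keeps the row in the dataset, but it also hides the fact that the value was missing, so a separate null check may still be worth running.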

2. Integrate with Great Expectations:
    - Validate the anomaly labels with an expectation and generate an alert if too many rows are flagged:

import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest

from great_expectations.core.batch import Batch
from great_expectations.validator.validator import Validator
from great_expectations.execution_engine import PandasExecutionEngine

# 1. Create your dataframe (np.nan marks the missing income)
data = np.array([
    [25, 50000],
    [30, 60000],
    [35, 75000],
    [40, np.nan],
    [45, 100000]
])
df = pd.DataFrame(data, columns=['age', 'income'])

# 2. Fill missing values with the column median before anomaly detection
df_filled = df.fillna(df.median())

# 3. Fit Isolation Forest to detect anomalies
iso_forest = IsolationForest(contamination=0.2, random_state=42)
iso_forest.fit(df_filled)
df['anomaly'] = iso_forest.predict(df_filled)  # -1: anomaly, 1: normal

print("Anomalies detected:")
print(df[['age', 'income', 'anomaly']])

# 4. Create a Batch object with your DataFrame for validation
batch = Batch(data=df)

# 5. Set up a GE validator with PandasExecutionEngine and the batch list
execution_engine = PandasExecutionEngine()
validator = Validator(
    execution_engine=execution_engine,
    batches=[batch]
)

# 6. Define expectation: at least 60% of rows must NOT be labeled -1 (some anomalies are allowed)
expectation_result = validator.expect_column_values_to_not_be_in_set(
    column='anomaly',
    value_set=[-1],
    mostly=0.6
)

print("\nGreat Expectations Validation Result:")
print(expectation_result)

# 7. Alert when the share of anomalies exceeds the allowed limit
if expectation_result.success:
    print("Data quality check passed: anomalies within acceptable limits.")
else:
    print("Alert: Too many anomalies detected!")
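
The pass/fail rule behind `mostly` can also be sketched in plain pandas, which helps when checking the threshold logic without a working Great Expectations install. This is a rough stand-in, not the library's implementation. It assumes `mostly` is the minimum fraction of non-null values that must satisfy the expectation, and the helper names `fraction_not_in_set` and `passes_mostly` are hypothetical.

```python
import pandas as pd


def fraction_not_in_set(values: pd.Series, value_set) -> float:
    """Share of non-null values that are NOT in value_set."""
    non_null = values.dropna()
    if non_null.empty:
        return 1.0  # assumption: nothing to check counts as passing
    return float((~non_null.isin(value_set)).mean())


def passes_mostly(values: pd.Series, value_set, mostly: float) -> bool:
    """True when the passing share reaches the `mostly` threshold."""
    return fraction_not_in_set(values, value_set) >= mostly


# Hypothetical labels: four normal rows and one anomaly
anomaly = pd.Series([1, 1, 1, 1, -1])

fraction = fraction_not_in_set(anomaly, [-1])
print(f"Share of normal rows: {fraction:.2f}")

if passes_mostly(anomaly, [-1], mostly=0.6):
    print("Data quality check passed: anomalies within acceptable limits.")
else:
    print("Alert: Too many anomalies detected!")
```

With one anomaly in five rows the passing share is 0.8, so a threshold of `0.6` passes and a stricter threshold such as `0.9` would trigger the alert.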