## Using AI for Anomaly Detection in Data Quality
**Description**: Implement an AI-based approach that flags anomalous records as part of data quality checks.

**Steps**:
1. Use an Anomaly Detection Algorithm:
    - Use sklearn's Isolation Forest for anomaly detection.
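
As a minimal sketch of this step, assuming scikit-learn is installed, the snippet below fits an Isolation Forest on a small made-up array and ranks the rows by `score_samples`, where lower scores mean more anomalous. The sample values and `contamination=0.2` are illustrative assumptions, not tuned recommendations.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Illustrative (made-up) age/income rows; the last row is deliberately unusual.
X = np.array([
    [25, 50000],
    [30, 60000],
    [35, 75000],
    [45, 100000],
    [120, 500],
], dtype=float)

# contamination is the assumed share of anomalies; random_state makes runs repeatable.
model = IsolationForest(contamination=0.2, random_state=42).fit(X)

labels = model.predict(X)        # -1 = anomaly, 1 = normal
scores = model.score_samples(X)  # lower score = more anomalous

# Row indices ordered from most to least anomalous.
ranking = np.argsort(scores)
for idx in ranking:
    print(f"row {idx}: label={int(labels[idx]):+d}, score={scores[idx]:.3f}")
```

Ranking by score rather than relying only on the -1/1 labels makes it easier to review borderline rows, since the label cutoff depends entirely on the chosen `contamination` value.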

**Example data:**

data = np.array([[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]])
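
The `None` entry in the example data needs care: Isolation Forest expects numeric input without missing values. The sketch below shows one way to load the same rows and impute the gap, assuming median imputation is acceptable for this exercise (dropping the row is an equally valid choice).

```python
import numpy as np
import pandas as pd

rows = [[25, 50000], [30, 60000], [35, 75000], [40, None], [45, 100000]]

# With a None entry, NumPy falls back to a generic object array.
raw = np.array(rows)
print(raw.dtype)  # object

# pandas infers numeric columns from the nested list and stores None as NaN.
df = pd.DataFrame(rows, columns=["age", "income"])
print(df["income"].isna().sum())  # 1

# Median imputation: an assumption made for this sketch, not the only option.
df_filled = df.fillna(df.median())
print(df_filled)
```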

2. Integrate with Great Expectations:
    - Generate alerts if anomalies or validation failures are detected.
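
One possible shape for the alerting step is sketched below. `collect_alerts` is a hypothetical helper, not part of Great Expectations or scikit-learn; it assumes the DataFrame already carries an `anomaly` column with Isolation Forest labels and that the overall validation outcome is available as a boolean.

```python
import logging

import pandas as pd

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("data_quality")


def collect_alerts(df, anomaly_col="anomaly", validation_passed=True):
    """Return a list of alert messages (hypothetical helper for illustration)."""
    alerts = []
    flagged = df[df[anomaly_col] == -1]
    if not flagged.empty:
        alerts.append(
            f"{len(flagged)} anomalous row(s) at index {flagged.index.tolist()}"
        )
    if not validation_passed:
        alerts.append("one or more validation expectations failed")
    return alerts


# Made-up labels standing in for real model output.
df = pd.DataFrame(
    {"age": [25, 30, 120], "income": [50000, 60000, 500], "anomaly": [1, 1, -1]}
)
alerts = collect_alerts(df, validation_passed=True)
for message in alerts:
    logger.warning("Data quality alert: %s", message)
```

Returning messages as a list keeps the detection logic separate from the delivery channel, so the same helper could feed logging, email, or a chat webhook.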

# Write your code from here
import numpy as np
import pandas as pd
import great_expectations as ge
from sklearn.ensemble import IsolationForest

# Sample data; np.nan marks the missing income value
data = np.array([
    [25, 50000],
    [30, 60000],
    [35, 75000],
    [40, np.nan],    # Missing income
    [45, 100000],
    [120, 500],      # Possible anomaly (high age but very low income)
])

# Convert to DataFrame for easier handling
df = pd.DataFrame(data, columns=["age", "income"])

# Handle missing values before anomaly detection (simple fill or drop)
df_filled = df.fillna(df.median())  # Fill NaNs with median values

# Isolation Forest model for anomaly detection
iso_forest = IsolationForest(contamination=0.2, random_state=42)
iso_forest.fit(df_filled)

# Predict anomalies: -1 means anomaly, 1 means normal
df['anomaly'] = iso_forest.predict(df_filled)

# Prepare a Great Expectations batch. The legacy ge.from_pandas() helper is not
# available in Great Expectations 1.x, so the context-based API is used instead.
context = ge.get_context(mode="ephemeral")
data_source = context.data_sources.add_pandas("pandas")
data_asset = data_source.add_dataframe_asset(name="age_income")
batch_definition = data_asset.add_batch_definition_whole_dataframe("whole_dataframe")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Expectations: age and income within reasonable ranges
expectations = [
    ge.expectations.ExpectColumnValuesToBeBetween(column="age", min_value=0, max_value=120),
    ge.expectations.ExpectColumnValuesToBeBetween(column="income", min_value=0),
]

# Validate each expectation against the batch
validation_results = [batch.validate(expectation) for expectation in expectations]
validation_passed = all(res.success for res in validation_results)

# Check anomalies from Isolation Forest
anomalies = df[df['anomaly'] == -1]

print("\n=== Anomaly Detection Results ===")
if not anomalies.empty:
    print(f"Anomalies detected:\n{anomalies}")
else:
    print("No anomalies detected.")

print("\n=== Great Expectations Validation Results ===")
if validation_passed:
    print("Data validation passed!")
else:
    print("Data validation failed!")
    for expectation, res in zip(expectations, validation_results):
        if not res.success:
            print(f"Failed expectation: {type(expectation).__name__} on column '{expectation.column}'")
            print(f"Details: {res.result}")

# Alert if anomalies or validation failures exist
if not anomalies.empty or not validation_passed:
    print("\n*** ALERT: Data quality issues detected! ***")
else:
    print("\nData quality checks passed successfully.")