### Task 1: Automated Data Profiling

**Steps**:
1. Using Pandas-Profiling
    - Generate a profile report for an existing CSV file.
    - Customize the profile report to include correlations.
    - Profile a specific subset of columns.
2. Using Great Expectations
    - Create a basic expectation suite for your data.
    - Validate data against an expectation suite.
    - Add multiple expectations to a suite.

In [2]:
# Write your code from here
import tempfile

import great_expectations as gx
import pandas as pd

# pandas-profiling was renamed to ydata-profiling. The old package imports
# `BaseSettings` from pydantic, which raises PydanticImportError under pydantic 2,
# so prefer the new package and fall back to the old one only if it is missing.
try:
    from ydata_profiling import ProfileReport
except ImportError:
    from pandas_profiling import ProfileReport

# NOTE: the Great Expectations calls below assume the 0.17/0.18 "fluent" API
# (gx.get_context, context.sources, Validator). The 1.x API is different.


def profile_csv_with_pandas_profiling(
    file_path, subset_columns=None, output_path="pandas_profiling_report.html"
):
    """
    Generates a profiling report for a CSV file.

    Args:
        file_path (str): Path to the CSV file.
        subset_columns (list, optional): List of columns to profile. If None, profiles all.
        output_path (str, optional): Where to write the HTML report.
    """
    try:
        df = pd.read_csv(file_path)
        if subset_columns:
            try:
                df = df[subset_columns]
            except KeyError as e:
                print(f"Error: Column not found: {e}")
                return

        # Explicitly enable the correlation matrices to include in the report
        profile = ProfileReport(
            df,
            title="Pandas Profiling Report",
            correlations={
                "pearson": {"calculate": True},
                "spearman": {"calculate": True},
                "kendall": {"calculate": True},
            },
        )
        profile.to_file(output_path)
        print(f"Profiling report generated successfully at {output_path}")
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}")
    except Exception as e:
        print(f"An error occurred: {e}")


def create_gx_context():
    """
    Creates a Great Expectations Data Context in a temporary directory.

    Returns:
        The created (file-backed) Data Context.
    """
    context_path = tempfile.mkdtemp()
    return gx.get_context(project_root_dir=context_path)


def create_expectation_suite(context, suite_name="my_expectation_suite"):
    """
    Creates (or replaces) a Great Expectations Expectation Suite.

    Args:
        context: The Data Context to use.
        suite_name (str, optional): The name of the Expectation Suite. Defaults to "my_expectation_suite".

    Returns:
        ExpectationSuite: The created Expectation Suite.
    """
    return context.add_or_update_expectation_suite(expectation_suite_name=suite_name)


def create_gx_datasource(context, df):
    """
    Registers a Pandas DataFrame as a Great Expectations data asset.

    The DataFrame is passed in memory, so no temporary CSV file is needed.

    Args:
        context: The Great Expectations Data Context.
        df (pd.DataFrame): The Pandas DataFrame.

    Returns:
        BatchRequest: A BatchRequest pointing at the DataFrame.
    """
    datasource = context.sources.add_or_update_pandas(name="pandas_datasource")
    asset = datasource.add_dataframe_asset(name="my_data_asset")
    return asset.build_batch_request(dataframe=df)


def add_expectations(validator, df):
    """
    Adds multiple expectations to the validator's Expectation Suite based on the DataFrame.

    Args:
        validator (Validator): Validator bound to the batch and the suite.
        df (pd.DataFrame): The DataFrame to infer expectations from.

    Returns:
        Validator: The validator, with its suite saved to the Data Context.
    """
    for column in df.columns:
        # Expect the column to exist
        validator.expect_column_to_exist(column=column)

        non_null = df[column].dropna()
        if non_null.empty:
            continue  # Nothing to infer from an all-null column

        # Infer the data type and add an appropriate expectation
        if pd.api.types.is_numeric_dtype(df[column]):
            # Cast NumPy scalars to plain floats so the suite serializes cleanly
            validator.expect_column_values_to_be_between(
                column=column,
                min_value=float(non_null.min()),
                max_value=float(non_null.max()),
            )
        elif pd.api.types.is_datetime64_any_dtype(df[column]):
            # Expect the dates to be in a reasonable range
            validator.expect_column_values_to_be_between(
                column=column,
                min_value=pd.Timestamp("2000-01-01"),
                max_value=pd.Timestamp("2030-01-01"),
            )
        else:
            unique_values = non_null.unique().tolist()
            if len(unique_values) < 50:  # Limit to avoid very long lists
                validator.expect_column_values_to_be_in_set(
                    column=column, value_set=unique_values
                )
            else:
                # For string columns with many unique values, expect non-nullity and a reasonable length
                validator.expect_column_values_to_not_be_null(column=column)
                validator.expect_column_value_lengths_to_be_between(
                    column=column, min_value=1, max_value=255
                )

    # Keep every expectation in the saved suite, including ones that failed on this batch
    validator.save_expectation_suite(discard_failed_expectations=False)
    return validator


def validate_data_with_gx(validator):
    """
    Validates the validator's batch against its Expectation Suite.

    Args:
        validator (Validator): Validator bound to the batch and the suite.

    Returns:
        ExpectationSuiteValidationResult: The validation results.
    """
    return validator.validate()


def display_gx_results(results):
    """
    Displays the Great Expectations validation results.

    Args:
        results (ExpectationSuiteValidationResult): The validation results.
    """
    if results is None:
        print("No validation results to display.")
        return

    for expectation_result in results.results:
        if not expectation_result.success:
            config = expectation_result.expectation_config
            print(f"  - Expectation '{config.expectation_type}' failed:")
            print(f"    - Column: {config.kwargs.get('column', 'N/A')}")
            print(f"    - Details: {expectation_result.result}")
    print(f"  Summary: success={results.success}")


def main():
    """
    Main function to run the data profiling and validation.
    """
    # 1. Using Pandas-Profiling
    file_path = "data.csv"
    profile_csv_with_pandas_profiling(file_path)
    # Write the subset report to its own file so it does not overwrite the full report
    profile_csv_with_pandas_profiling(
        file_path,
        subset_columns=["age", "income"],
        output_path="pandas_profiling_report_subset.html",
    )

    # 2. Using Great Expectations
    # Load data with Pandas
    try:
        df = pd.read_csv(file_path)
    except FileNotFoundError:
        print(f"Error: File not found at {file_path}. Please make sure the file exists before running the script.")
        return  # Exit the program

    context = create_gx_context()
    suite = create_expectation_suite(context)
    batch_request = create_gx_datasource(context, df)

    # A Validator ties the batch of data to the suite; expectations are added through it
    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=suite.expectation_suite_name,
    )
    add_expectations(validator, df)

    results = validate_data_with_gx(validator)
    display_gx_results(results)


if __name__ == "__main__":
    main()


PydanticImportError: `BaseSettings` has been moved to the `pydantic-settings` package. See https://docs.pydantic.dev/2.11/migration/#basesettings-has-moved-to-pydantic-settings for more details.

For further information visit https://errors.pydantic.dev/2.11/u/import-error

### Task 2: Real-time Monitoring of Data Quality

**Steps**:
1. Setting up Alerts for Quality Drops
    - Use the logging library to set up a basic alert on failed expectations.
    - Implement alerts using email notifications.
    - Use a dashboard such as Grafana for visual alerts.
        - Note: this assumes integration with an existing monitoring system.
        - Setting up the alert involves creating a data source and an alert rule in Grafana.
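
The logging and email steps above could be sketched roughly as follows, using only the standard library. The result shape (a list of dicts with `expectation`, `column`, and `success` keys), the function names, the example addresses, and the SMTP host are all assumptions made for illustration; this is not Great Expectations' own result format. The Grafana step is configured in Grafana itself and is not covered here.

```python
import logging
import smtplib
from email.message import EmailMessage

logger = logging.getLogger("data_quality")


def failed_checks(results):
    """Return the checks whose 'success' flag is False."""
    return [check for check in results if not check["success"]]


def log_alerts(results):
    """Log one ERROR record per failed check and return the failure count."""
    failures = failed_checks(results)
    for check in failures:
        logger.error(
            "Data quality check failed: %s on column %s",
            check["expectation"],
            check["column"],
        )
    if not failures:
        logger.info("All %d data quality checks passed", len(results))
    return len(failures)


def build_alert_email(results, sender, recipient):
    """Build (but do not send) an alert email; return None if nothing failed."""
    failures = failed_checks(results)
    if not failures:
        return None
    message = EmailMessage()
    message["Subject"] = f"Data quality alert: {len(failures)} check(s) failed"
    message["From"] = sender
    message["To"] = recipient
    lines = [f"- {c['expectation']} on column {c['column']}" for c in failures]
    message.set_content("The following checks failed:\n" + "\n".join(lines))
    return message


def send_alert_email(message, host="localhost", port=25):
    """Send the message through an SMTP server (host and port are placeholders)."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(message)


# Hypothetical results, shaped as described in the lead-in
sample_results = [
    {"expectation": "expect_column_values_to_not_be_null", "column": "age", "success": False},
    {"expectation": "expect_column_to_exist", "column": "income", "success": True},
]

logging.basicConfig(level=logging.INFO)
failure_count = log_alerts(sample_results)
alert = build_alert_email(sample_results, "monitor@example.com", "team@example.com")
```

Building the message is kept separate from sending it, so the alert logic can be exercised without a mail server; `send_alert_email(alert)` would only be called where an SMTP server is actually reachable.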

### Task 3: Using AI for Data Quality Monitoring

**Steps**:
1. Basic AI Models for Monitoring
    - Train a simple anomaly detection model using Isolation Forest.
    - Write a simple custom rule-based function for outlier detection.
    - Create a monitoring function that uses a pre-trained machine learning model.
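
For the custom outlier-detection step, one common rule is the interquartile range (IQR) fence. The sketch below is only one possible choice of rule; the function name, the multiplier `k=1.5`, and the sample readings are assumptions for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sensor readings with one obvious outlier
readings = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(readings)
print(outliers)  # [95]
```

A rule like this needs no training data, which makes it a useful baseline to compare against a learned model.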

In [None]:
# Write your code from here
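
One possible starting point for the Isolation Forest and pre-trained-model steps is sketched below. It assumes scikit-learn and NumPy are installed; the synthetic training data, the `contamination` value, and the `monitor_batch` helper are illustrative assumptions, and `pickle` merely stands in for however a trained model would really be stored and loaded.

```python
import pickle

import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic "historical" data: a single numeric feature centred around 50
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=5.0, size=(500, 1))

# Train the anomaly detector; contamination is the assumed share of anomalies
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(train)

# Serialize the fitted model to simulate a model trained ahead of time
serialized_model = pickle.dumps(model)


def monitor_batch(model_bytes, batch):
    """Load a pre-trained model and return the values it flags as anomalous."""
    loaded = pickle.loads(model_bytes)
    features = np.asarray(batch, dtype=float).reshape(-1, 1)
    labels = loaded.predict(features)  # -1 marks an anomaly, 1 an inlier
    return [value for value, label in zip(batch, labels) if label == -1]


flagged = monitor_batch(serialized_model, [49.0, 51.5, 50.2, 500.0])
print(flagged)
```

Only unpickle model files from a trusted source, since loading a pickle can execute arbitrary code.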