### Task 1: Validate Data with a Custom Expectation in Great Expectations
**Description**: Create a custom expectation and validate data with Great Expectations.

**Load a sample DataFrame**

data = {
'age': [25, 30, 35, 40, 45],
'income': [50000, 60000, 75000, None, 100000]
}

In [1]:
import great_expectations as ge
import pandas as pd

# Sample data
data = {
    'age': [25, 30, 35, 40, 45],
    'income': [50000, 60000, 75000, None, 100000]
}

# Create a pandas DataFrame
df = pd.DataFrame(data)

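# Convert the pandas DataFrame to a Great Expectations dataset.
# Note: ge.from_pandas belongs to the legacy Dataset API and is not available in Great Expectations 1.x.
ge_df = ge.from_pandas(df)

# Define a helper that checks a column for nulls.
# It wraps the built-in not-null expectation and takes the dataset as an
# argument instead of relying on the global ge_df.
def expect_income_not_null(dataset, column='income'):
    return dataset.expect_column_values_to_not_be_null(column)

# Run the expectation on the 'income' column
result = expect_income_not_null(ge_df, 'income')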

# Print the result
print(result)

# Check if validation passed
if result['success']:
    print("Data validation passed: No null income values found.")
else:
    print("Data validation failed: Null income values found.")


ModuleNotFoundError: No module named 'great_expectations'
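
Because the cell above stops at the import, it may help to see the same not-null rule without the library. The following is a minimal sketch in plain Python, not Great Expectations' own API: `is_null`, `expect_values_not_null`, and the keys of the returned dictionary are hypothetical names chosen for illustration, loosely modeled on the `success` flag used above.

```python
import math

def is_null(value):
    """Treat None and float NaN as missing (pandas stores None as NaN in numeric columns)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def expect_values_not_null(values):
    """Hypothetical helper: return a small result dict with a 'success' flag."""
    unexpected = [i for i, v in enumerate(values) if is_null(v)]
    return {
        "success": not unexpected,          # True only when nothing is missing
        "element_count": len(values),
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }

income = [50000, 60000, 75000, None, 100000]
result = expect_values_not_null(income)
print(result["success"], result["unexpected_index_list"])  # False [3]
```

The check fails for the sample data because the fourth income value (index 3) is missing, which is the outcome the Great Expectations version would be expected to report once the package is installed.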

### Task 2: Implement a Basic Alert System for Data Quality Drops
**Description**: Set up a basic alert system that triggers when data quality drops.

In [2]:
def check_data_quality(dqi, threshold=0.8):
    """
    Checks if the data quality index (DQI) falls below a threshold.
    If yes, triggers an alert.
    
    Args:
        dqi (float): Current data quality index (between 0 and 1).
        threshold (float): Threshold below which alert triggers.
    
    Returns:
        str: Alert message or success message.
    """
    if dqi < threshold:
        alert_msg = f"ALERT: Data Quality Index dropped below threshold! Current DQI: {dqi:.2f}"
        # You can extend this to send email, logs, or notifications
        print(alert_msg)
        return alert_msg
    else:
        msg = f"Data Quality Index is good. Current DQI: {dqi:.2f}"
        print(msg)
        return msg

# Example usage:
current_dqi = 0.75  # Example DQI value
check_data_quality(current_dqi, threshold=0.8)


ALERT: Data Quality Index dropped below threshold! Current DQI: 0.75


'ALERT: Data Quality Index dropped below threshold! Current DQI: 0.75'
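
The example above passes in a hard-coded DQI of 0.75 without showing where such a number comes from. One possible definition, assumed here purely for illustration, is completeness: the share of cells that are not missing. The helper name `completeness_dqi` is hypothetical, and a real DQI would likely combine several dimensions (validity, uniqueness, timeliness).

```python
import math

def completeness_dqi(columns):
    """Return the share of non-missing cells across all columns (0.0 to 1.0)."""
    cells = [v for values in columns.values() for v in values]
    if not cells:
        return 1.0  # assumption: an empty dataset has nothing missing
    missing = sum(
        1 for v in cells
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return 1 - missing / len(cells)

# Sample data from Task 1: 10 cells, 1 of them missing
data = {
    "age": [25, 30, 35, 40, 45],
    "income": [50000, 60000, 75000, None, 100000],
}
dqi = completeness_dqi(data)
print(f"{dqi:.2f}")  # 0.90
```

Passing this value to `check_data_quality(dqi, threshold=0.8)` would take the "good" branch, since 0.90 is above the threshold.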

### Task 3: Real-time Data Quality Monitoring with Python and Great Expectations
**Description**: Implement a system that monitors data quality in near real time by validating simulated incoming batches.

In [3]:
import time
import pandas as pd
import great_expectations as ge

def generate_new_data(batch_num):
    """
    Simulate incoming data batches.
    Each batch has an 'age' and 'income' column.
    Introduce missing data to simulate quality drops.
    """
    data = {
        'age': [25 + batch_num, 30 + batch_num, 35 + batch_num],
        'income': [50000, None if batch_num % 3 == 0 else 60000, 75000]
    }
    return pd.DataFrame(data)

def validate_data(df):
    """
    Validate data quality using Great Expectations.
    Example: Check that 'income' column has no nulls.
    """
    ge_df = ge.from_pandas(df)
    result = ge_df.expect_column_values_to_not_be_null('income')
    return result

def monitor_data_quality(iterations=5, interval=2):
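    """
    Monitor data quality over several simulated batches.

    Args:
        iterations (int): Number of batches to generate and validate.
        interval (float): Seconds to wait between batches.
    """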
    for i in range(1, iterations + 1):
        print(f"\nBatch {i}:")
        df = generate_new_data(i)
        print(df)

        validation_result = validate_data(df)
        if validation_result['success']:
            print("DQI Check Passed: No missing income values.")
        else:
            print("DQI Check Failed: Missing income values detected!")
        
        time.sleep(interval)  # Wait before next batch (simulate real-time)

if __name__ == "__main__":
    monitor_data_quality()


ModuleNotFoundError: No module named 'great_expectations'
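
The monitoring loop and the threshold alert from Task 2 can be combined so that each batch yields a score instead of a pass/fail flag. The sketch below is a dependency-free approximation under stated assumptions: it uses plain dictionaries instead of DataFrames, skips the `time.sleep` delay, and scores each batch by the completeness of its `income` column. The names `generate_batch`, `income_completeness`, and `monitor` are hypothetical.

```python
def generate_batch(batch_num):
    """Same simulation rule as above: every third batch has a missing income."""
    return {
        "age": [25 + batch_num, 30 + batch_num, 35 + batch_num],
        "income": [50000, None if batch_num % 3 == 0 else 60000, 75000],
    }

def income_completeness(batch):
    """Fraction of 'income' values in the batch that are present."""
    values = batch["income"]
    return sum(v is not None for v in values) / len(values)

def monitor(iterations=5, threshold=0.8):
    """Return (batch number, score) pairs for batches scoring below threshold."""
    alerts = []
    for i in range(1, iterations + 1):
        score = income_completeness(generate_batch(i))
        if score < threshold:
            alerts.append((i, score))
    return alerts

alerts = monitor()
for batch_num, score in alerts:
    print(f"ALERT: batch {batch_num} income completeness {score:.2f}")
# ALERT: batch 3 income completeness 0.67
```

Returning the alerts as data, and printing them separately, keeps the loop easy to test and leaves room to route the same records to email or logging later.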