## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset containing customer information to inspect distribution patterns of 'Age' and 'Income' columns.
- Create a Great Expectations data context and load the dataset into it.
- Generate a data asset to inspect the summary statistics.
- View the generated expectation suite to analyze data distributions.
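At its core, the profiling described above summarizes each column and turns the observed spread into candidate bounds. Below is a minimal pandas-only sketch of that idea. The helper `suggest_between_bounds` and the 10% padding are hypothetical illustrations and are not part of Great Expectations. Each returned dict uses the same keyword names as `expect_column_values_to_be_between` (`column`, `min_value`, `max_value`), so after review the values could be copied into that expectation.

```python
import pandas as pd

# Same sample values as the notebook cell (a stand-in for the customer CSV)
data = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [25, 34, 45, 52],
    "Income": [48000, 56000, 72000, 85000],
})

# Summary statistics per column: count, mean, std, min, quartiles, max
summary = data[["Age", "Income"]].describe()
print(summary)


def suggest_between_bounds(series: pd.Series, padding: float = 0.1) -> dict:
    """Hypothetical helper: widen the observed [min, max] range by `padding`
    (a fraction of that range) on each side to get candidate bounds."""
    lo, hi = series.min(), series.max()
    margin = (hi - lo) * padding
    return {"column": series.name, "min_value": lo - margin, "max_value": hi + margin}


# One suggestion per profiled column
suggestions = [suggest_between_bounds(data[col]) for col in ["Age", "Income"]]
for suggestion in suggestions:
    print(suggestion)
```

Padding the range is deliberate. With only four rows, bounds equal to the sample minimum and maximum would reject plausible new customers. Treat the suggestions as a starting point and check them against domain knowledge, such as the 18 to 100 age range used in the notebook.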

In [3]:
# write your code from here
# Written against the Great Expectations 1.x API. The earlier
# "cannot import name 'SimpleCheckpoint'" error indicates a 1.x install, which
# no longer provides the 0.x BatchRequest / RuntimeDataConnector / SimpleCheckpoint workflow.
import pandas as pd
import great_expectations as gx

# Step 0: Create an in-memory dataset (stands in for the customer CSV;
# pd.read_csv(...) would produce the same kind of DataFrame)
data = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Age': [25, 34, 45, 52],
    'Income': [48000, 56000, 72000, 85000],
    'Date': ['2024-01-10', '2024-02-15', 'invalid-date', '2024-04-01']
})

# Step 1: Create an ephemeral (in-memory) data context so the cell can be re-run
# without "already exists" errors from a persisted project directory
context = gx.get_context(mode="ephemeral")

# Step 2: Register a pandas data source, a DataFrame asset, and a batch definition
data_source = context.data_sources.add_pandas(name="my_pandas_datasource")
data_asset = data_source.add_dataframe_asset(name="customer_profile_data")
batch_definition = data_asset.add_batch_definition_whole_dataframe("customer_batch")

# Step 3: Create the expectation suite
suite_name = "customer_data_suite"
suite = context.suites.add(gx.ExpectationSuite(name=suite_name))

# Step 4: Expected value ranges for the profiled columns (Age, Income)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="Age", min_value=18, max_value=100)
)
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToBeBetween(column="Income", min_value=20000, max_value=200000)
)

# Step 5: Require Date values in YYYY-MM-DD form
suite.add_expectation(
    gx.expectations.ExpectColumnValuesToMatchRegex(column="Date", regex=r"^\d{4}-\d{2}-\d{2}$")
)

# Step 6: View the expectation suite
for expectation in suite.expectations:
    print(expectation)

# Step 7: Pair the batch definition with the suite in a validation definition
validation_definition = context.validation_definitions.add(
    gx.ValidationDefinition(name="customer_data_validation", data=batch_definition, suite=suite)
)

# Step 8: Create a checkpoint that runs the validation definition
checkpoint = context.checkpoints.add(
    gx.Checkpoint(name="customer_data_checkpoint", validation_definitions=[validation_definition])
)

# Step 9: Run the checkpoint, passing the DataFrame in as a batch parameter
results = checkpoint.run(batch_parameters={"dataframe": data})
success = results.success

# Step 10: Print summary ('invalid-date' fails the regex, so expect a failure here)
if success:
    print("✅ All data quality checks passed.")
else:
    print("❌ Some data quality checks failed.")

# Step 11: Unit test-like assertions on the CheckpointResult attributes
def test_expectations(results):
    assert results.run_id is not None, "Missing run_id in checkpoint output."
    assert len(results.run_results) == 1, "Expected one validation result."
    assert isinstance(results.success, bool), "Success must be a boolean."

test_expectations(results)

2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Use expect_column_values_to_match_regex to enforce the date format.
- Run the validation and interpret the output.
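`expect_column_values_to_match_regex` tests each non-null value against the pattern and reports the values that do not match. The pandas sketch below repeats that per-row check on the notebook's `Date` values so you can see how to read the output. It is plain pandas, not the library's implementation. The summary keys (`success`, `element_count`, `unexpected_count`, `unexpected_percent`, `partial_unexpected_list`) are modeled on fields that Great Expectations reports, and the helper names are hypothetical. The extra value `2024-13-45` is a hypothetical row, included because a shape-only regex accepts impossible dates. A strict `datetime.strptime` pass catches those.

```python
from datetime import datetime

import pandas as pd

DATE_REGEX = r"^\d{4}-\d{2}-\d{2}$"  # same pattern as the notebook cell

# Notebook Date values plus one hypothetical impossible date (month 13)
dates = pd.Series(
    ["2024-01-10", "2024-02-15", "invalid-date", "2024-04-01", "2024-13-45"],
    name="Date",
)


def regex_check(series: pd.Series, pattern: str) -> dict:
    """Hypothetical helper: per-row regex check summarized in a GX-like shape."""
    non_null = series.dropna()
    matches = non_null.astype(str).str.match(pattern)
    unexpected = non_null[~matches]
    return {
        "success": unexpected.empty,
        "element_count": len(series),
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / len(non_null) if len(non_null) else 0.0,
        "partial_unexpected_list": unexpected.tolist()[:20],
    }


def is_real_date(value: str) -> bool:
    """Hypothetical helper: True only if the string parses as a YYYY-MM-DD calendar date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


result = regex_check(dates, DATE_REGEX)
print(result)

# The regex passes 2024-13-45, but strict parsing rejects it
unparseable = [d for d in dates if not is_real_date(d)]
print(unparseable)
```

How to read the output: `success` is False because one of the five values fails the pattern (20%), and `partial_unexpected_list` names the offending value so you can trace it back to its source row. In Great Expectations, the `mostly` argument (for example `mostly=0.95`) relaxes an expectation so that it passes when at least that fraction of values match.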

In [None]:
# write your code from here