## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset containing customer information to inspect distribution patterns of 'Age' and 'Income' columns.
- Load the dataset using Great Expectations and create a data context.
- Generate a data asset to inspect the summary statistics.
- View the generated expectation suite to analyze data distributions.

In [1]:
# write your code from here
import great_expectations as ge
from great_expectations.data_context import DataContext
from great_expectations.core.batch import BatchRequest

# Step 1: Initialize Great Expectations DataContext (creates or loads GE project)
context = ge.data_context.DataContext()

# Step 2: Add a datasource pointing to your CSV file location (if not already configured)
datasource_name = "my_csv_datasource"

# Register datasource if not existing (adjust base_directory to your CSV folder)
if datasource_name not in context.list_datasources():
    context.add_datasource(
        name=datasource_name,
        class_name="Datasource",
        execution_engine={
            "class_name": "PandasExecutionEngine"
        },
        data_connectors={
            "default_runtime_data_connector_name": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["default_identifier_name"]
            }
        }
    )

# Step 3: Load the CSV file as a runtime batch
batch_request = BatchRequest(
    datasource_name=datasource_name,
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="customer_data",  # arbitrary name for this batch
    runtime_parameters={"path": "./customer_data.csv"},
    batch_identifiers={"default_identifier_name": "default_identifier"}
)

# Step 4: Create or load an expectation suite for profiling
expectation_suite_name = "customer_data_profile_suite"
try:
    suite = context.get_expectation_suite(expectation_suite_name)
except Exception:
    suite = context.create_expectation_suite(expectation_suite_name, overwrite_existing=True)

# Step 5: Get validator to profile data
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name
)

# Step 6: Generate basic profiling expectations (summary stats)
validator.profile()

# Step 7: Save the expectation suite with profiling results
validator.save_expectation_suite()

print(f"Generated expectation suite '{expectation_suite_name}' with summary statistics for 'Age' and 'Income'.")

# Step 8 (optional): View the expectation suite details
expectations = validator.get_expectation_suite().expectations
for exp in expectations:
    if exp['kwargs'].get('column') in ['Age', 'Income']:
        print(exp)


ImportError: cannot import name 'DataContext' from 'great_expectations.data_context' (/home/vscode/.local/lib/python3.10/site-packages/great_expectations/data_context/__init__.py)

2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Utilize expect_column_values_to_match_regex to enforce date format validation.
- Run the validation and interpret the output.

In [None]:
# write your code from here