## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset containing customer information to inspect distribution patterns of 'Age' and 'Income' columns.
- Load the dataset using Great Expectations and create a data context.
- Generate a data asset to inspect the summary statistics.
- View the generated expectation suite to analyze data distributions.
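Before wiring up Great Expectations, it can help to see what "profiling" amounts to for two numeric columns. The following is a minimal standard-library sketch, not Great Expectations code. The inline sample rows and the `profile_column` helper are hypothetical and only stand in for the real customer CSV.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the customer CSV (illustrative values only)
SAMPLE_CSV = """CustomerID,Age,Income
1,25,32000
2,41,58000
3,37,
4,52,91000
5,29,44000
"""

def profile_column(rows, column):
    """Return basic distribution statistics for one numeric column."""
    raw = [row[column] for row in rows]
    # Empty strings are treated as missing values and excluded from the statistics
    values = [float(v) for v in raw if v.strip() != ""]
    return {
        "count": len(values),
        "missing": len(raw) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("Age", "Income"):
    print(column, profile_column(rows, column))
```

Figures such as the observed minimum, maximum, and missing-value count are the kind of observations an expectation suite then records as rules.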

In [1]:
# write your code from here
!pip install great_expectations


Defaulting to user installation because normal site-packages is not writeable


In [2]:
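# The `great_expectations` CLI is not available in this environment
# (`!great_expectations init` fails with "command not found"), so verify the
# installation from Python; the data context is created in the next cell.
import great_expectations as gx
print(gx.__version__)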


In [3]:
import great_expectations as gx
import pandas as pd

# NOTE: this cell assumes the GX Core 1.x API is the installed version.
# `from great_expectations.data_context import DataContext` raises ImportError
# there, and the legacy `ge.read_csv` / `UserConfigurableProfiler` workflow is
# replaced below by pandas summary statistics plus an explicit expectation suite.

# Load the CSV and restrict profiling to the two columns of interest
df = pd.read_csv("/path/to/customers.csv")
columns = ["Age", "Income"]

# Inspect summary statistics (count, mean, std, min, quartiles, max)
print(df[columns].describe())

# Create a data context and register the DataFrame as a data asset
context = gx.get_context()
data_source = context.data_sources.add_pandas("my_pandas_datasource")
data_asset = data_source.add_dataframe_asset(name="customers")
batch_definition = data_asset.add_batch_definition_whole_dataframe("full_csv")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Build an expectation suite from the observed distributions
expectation_suite_name = "customer_data_profile"
suite = context.suites.add(gx.ExpectationSuite(name=expectation_suite_name))
for column in columns:
    suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column=column))
    suite.add_expectation(
        gx.expectations.ExpectColumnValuesToBeBetween(
            column=column,
            min_value=float(df[column].min()),
            max_value=float(df[column].max()),
        )
    )

# Validate the batch against the suite and print a summary
results = batch.validate(suite)
print(f"Generated expectations for {expectation_suite_name}:")
for expectation in suite.expectations:
    print(expectation)
print("All expectations passed:", results.success)

2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Utilize expect_column_values_to_match_regex to enforce date format validation.
- Run the validation and interpret the output.
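What the regex check computes can be sketched in plain Python. This is an illustration under assumptions, not Great Expectations' implementation: the `YYYY-MM-DD` target format, the sample values, and the `check_date_format` helper are hypothetical, and the returned keys are only modeled on the `success` and `unexpected_count` style of fields that a validation result reports.

```python
import re

# Assumed target format: YYYY-MM-DD. The anchors (^ and $) matter; without them
# a value such as "2024-01-15T10:00" would pass on a partial match.
DATE_REGEX = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def check_date_format(values):
    """Summarize which values fail the date pattern."""
    unexpected = [v for v in values if not DATE_REGEX.search(str(v))]
    return {
        "success": not unexpected,
        "element_count": len(values),
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / len(values) if values else 0.0,
        "unexpected_list": unexpected,
    }

# Hypothetical 'Date' column values
dates = ["2024-01-15", "2024-02-30", "15/01/2024", "2024-1-5", "2024-13-01"]
print(check_date_format(dates))
```

Note that `2024-02-30` passes: a regex enforces the format only, not calendar validity. If impossible dates must be rejected, a parsing-based check is needed in addition to the pattern.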

In [None]:
# write your code from here
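One way this cell could be filled in is sketched below, assuming the GX Core 1.x API (`gx.get_context`, `context.data_sources.add_pandas`, `gx.expectations.ExpectColumnValuesToMatchRegex`). The `YYYY-MM-DD` format, the data source and asset names, and the `validate_date_format` helper are assumptions made for illustration. In 1.x the snake_case `expect_column_values_to_match_regex` named in the task corresponds to the `ExpectColumnValuesToMatchRegex` class.

```python
import re

# Assumed format: YYYY-MM-DD, anchored so that partial matches do not pass
DATE_REGEX = r"^\d{4}-\d{2}-\d{2}$"

def validate_date_format(df, column="Date", regex=DATE_REGEX):
    """Validate one DataFrame column against a regex with GX Core 1.x (assumed API)."""
    # Imported here so the pattern above can be sanity-checked without GX installed
    import great_expectations as gx

    context = gx.get_context()
    data_source = context.data_sources.add_pandas("ingestion_source")
    asset = data_source.add_dataframe_asset(name="ingested_csv")
    batch_definition = asset.add_batch_definition_whole_dataframe("full_file")
    batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

    expectation = gx.expectations.ExpectColumnValuesToMatchRegex(column=column, regex=regex)
    return batch.validate(expectation)

# Quick sanity check of the pattern itself before handing it to GX
assert re.search(DATE_REGEX, "2024-01-15")
assert not re.search(DATE_REGEX, "15/01/2024")
```

Calling `validate_date_format` on a DataFrame loaded from the CSV returns a validation result. Its `success` attribute is `True` only when every value matches the pattern, and its `result` dictionary typically reports details such as `unexpected_count`, `unexpected_percent`, and a sample of offending values in `partial_unexpected_list`.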