## Automated Data Quality Monitoring
**Objective**: Use Great Expectations to perform data profiling and write validation rules.

1. Data Profiling with Great Expectations
### Profile a CSV dataset containing customer information to inspect distribution patterns of 'Age' and 'Income' columns.
- Create a data context and load the dataset into Great Expectations.
- Register the data as a data asset and profile it to obtain summary statistics.
- Review the generated expectation suite to analyze the data distributions.
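To see what the profiler is summarizing, here is a minimal sketch that uses only Python's standard `statistics` module to compute the same kind of distribution statistics for the `age` and `income` columns of the sample data. The `summarize` helper is hypothetical and is not part of Great Expectations; it is only a cross-check for the ranges a generated suite proposes.

```python
import statistics

# Same sample values as the customer DataFrame used in cell [1]
customers = {
    "age": [25, 40, 35, 28, 60],
    "income": [50000, 65000, 70000, 48000, 90000],
}

def summarize(values):
    """Return basic distribution statistics for one numeric column (hypothetical helper)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

for column, values in customers.items():
    stats = summarize(values)
    print(column, {name: round(value, 2) for name, value in stats.items()})
```

With only five rows these figures are indicative at best; a real customer CSV would give the profiler far more to work with.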

In [1]:
# write your code from here
import pandas as pd
import great_expectations as gx

# Sample customer data (stands in for the customer CSV; load a real file with pd.read_csv)
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "age": [25, 40, 35, 28, 60],
    "income": [50000, 65000, 70000, 48000, 90000]
})

# Create Great Expectations context
context = gx.get_context()

# Add Pandas datasource (Fluent API style)
datasource = context.sources.add_pandas(name="pandas_src")

# Create DataFrame asset
data_asset = datasource.add_dataframe_asset(name="customer_data", dataframe=df)

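# Build a batch request for the in-memory DataFrame asset
batch_request = data_asset.build_batch_request()

# Run profiling with the Onboarding Data Assistant; run() takes a batch_request, not asset_name
data_assistant_result = context.assistants.onboarding.run(
    batch_request=batch_request,
    exclude_column_names=["customer_id", "name"],  # focus the profile on age and income
)

# Extract the generated expectation suite and save it to the context
expectation_suite = data_assistant_result.get_expectation_suite(
    expectation_suite_name="customer_profile_suite"
)
context.add_or_update_expectation_suite(expectation_suite=expectation_suite)

# View the generated expectations
print(expectation_suite)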
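The Onboarding Data Assistant derives its expectations from statistics observed in the batch. As a much-simplified illustration of that idea, and not the assistant's actual algorithm, the hypothetical helpers below turn each column's observed minimum and maximum into an `expect_column_values_to_be_between` configuration, shaped like the `expectation_config` entries Great Expectations prints, and then check a new batch against those bounds. The `new_batch` values are made up for illustration.

```python
# Observed data used to "profile" the columns (same values as cell [1])
profile_data = {
    "age": [25, 40, 35, 28, 60],
    "income": [50000, 65000, 70000, 48000, 90000],
}

def build_range_expectations(columns):
    """Hypothetical helper: one range expectation per column from observed min/max."""
    suite = []
    for column, values in columns.items():
        suite.append({
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {"column": column, "min_value": min(values), "max_value": max(values)},
        })
    return suite

def validate_ranges(suite, columns):
    """Hypothetical helper: return the out-of-range values for each expectation."""
    report = {}
    for config in suite:
        kwargs = config["kwargs"]
        values = columns[kwargs["column"]]
        report[kwargs["column"]] = [
            v for v in values if not kwargs["min_value"] <= v <= kwargs["max_value"]
        ]
    return report

# A made-up incoming batch to validate against the profiled ranges
new_batch = {"age": [30, 17, 45], "income": [52000, 120000, 61000]}

suite = build_range_expectations(profile_data)
print(validate_ranges(suite, new_batch))
```

Using the observed extremes as hard bounds is strict; with only five rows, a practical profile would usually widen the range or tolerate a small fraction of outliers, which Great Expectations supports through the `mostly` argument on its expectations.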

2. Writing Validation Rules for Data Ingestion
### Write validation rules for a CSV file to ensure the 'Date' column follows a specific date format.
- Use `expect_column_values_to_match_regex` to enforce the date format.
- Run the validation and interpret the output.
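The counts that `expect_column_values_to_match_regex` reports (`element_count`, `unexpected_count`, `unexpected_percent`) can be reproduced with Python's `re` module, which makes its output easier to interpret. The sketch also shows the limit of a regex: `2023-02-30` has the right shape but is not a real date, so it passes the pattern, while a strict parse with `datetime.strptime` rejects it. The helper names are hypothetical; within Great Expectations, `expect_column_values_to_match_strftime_format` is the parsing-based alternative worth checking in the documentation for your version.

```python
import re
from datetime import datetime

# Same Date values as the DataFrame in cell [2]
DATE_REGEX = re.compile(r"^\d{4}-\d{2}-\d{2}$")
dates = ["2023-01-01", "2023-02-30", "2023/03/15", "2023-04-10", "15-05-2023"]

def regex_summary(values, pattern):
    """Mimic the counts reported by expect_column_values_to_match_regex."""
    unexpected = [v for v in values if not pattern.match(v)]
    return {
        "element_count": len(values),
        "unexpected_count": len(unexpected),
        "unexpected_percent": 100.0 * len(unexpected) / len(values),
        "unexpected_list": unexpected,
    }

def calendar_invalid(values, fmt="%Y-%m-%d"):
    """Return values that match the YYYY-MM-DD shape but are not real calendar dates."""
    bad = []
    for v in values:
        if DATE_REGEX.match(v):
            try:
                datetime.strptime(v, fmt)
            except ValueError:
                bad.append(v)
    return bad

print(regex_summary(dates, DATE_REGEX))
print(calendar_invalid(dates))
```

Here `regex_summary` reports 2 of 5 values (40.0%) as unexpected, matching the Great Expectations result for the same data, while `calendar_invalid` flags `2023-02-30`, which the regex alone lets through.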

In [2]:
# write your code from here
import pandas as pd
import great_expectations as gx

# Sample dataset with Date column
df = pd.DataFrame({
    "Date": ["2023-01-01", "2023-02-30", "2023/03/15", "2023-04-10", "15-05-2023"],
    "Value": [100, 200, 150, 300, 250]
})

# Create Great Expectations context
context = gx.get_context()

# Add Pandas datasource and DataFrame asset
datasource = context.sources.add_pandas(name="pandas_src")
data_asset = datasource.add_dataframe_asset(name="date_data", dataframe=df)

# Create the expectation suite, overwriting any existing suite with the same name
expectation_suite = context.create_expectation_suite("date_validation_suite", overwrite_existing=True)

# Get validator
validator = context.get_validator(
    batch_request=data_asset.build_batch_request(),
    expectation_suite=expectation_suite,
)

# Expect every Date value to match YYYY-MM-DD (checks the format only, not calendar validity)
validator.expect_column_values_to_match_regex(
    column="Date",
    regex=r"^\d{4}-\d{2}-\d{2}$"
)

# Validate data
results = validator.validate()

# Print validation results
print(results)


{
  "success": false,
  "results": [
    {
      "success": false,
      "expectation_config": {
        "expectation_type": "expect_column_values_to_match_regex",
        "kwargs": {
          "column": "Date",
          "regex": "^\\d{4}-\\d{2}-\\d{2}$",
          "batch_id": "pandas_src-date_data"
        },
        "meta": {}
      },
      "result": {
        "element_count": 5,
        "unexpected_count": 2,
        "unexpected_percent": 40.0,
        "partial_unexpected_list": [
          "2023/03/15",
          "15-05-2023"
        ],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 40.0,
        "unexpected_percent_nonmissing": 40.0
      },
      "meta": {},
      "exception_info": {
        "raised_exception": false,
        "exception_traceback": null,
        "exception_message": null
      }
    }
  ],
  "evaluation_parameters": {},
  "statistics": {
    "evaluated_expectations": 1,
    "successful_expectations": 0,
    "unsu