## Automate Data Quality Checks with Great Expectations
**Introduction**: In this activity, you will learn how to automate data quality checks using the Great Expectations framework. This includes setting up expectations and generating validation reports.

### Task 1: Setup and Initial Expectations

1. Objective: Set up Great Expectations and create initial expectations for a dataset.
2. Steps:
    - Install Great Expectations using pip.
    - Initialize a data context.
    - Create basic expectations on a sample dataset.
    - E.g., implement a basic setup with expectations for column presence and type.

In [1]:
import great_expectations as gx
import pandas as pd

context = gx.get_context()

# Sample dataset
data = {
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "age": [30, 24, 35, 29, 42],
    "city": ["New York", "Los Angeles", "Chicago", "Houston", "Miami"],
    "is_active": [True, True, False, True, False],
}
df = pd.DataFrame(data)

# In GX 1.x, add_dataframe_asset() accepts only the asset name; passing
# dataframe=df raises "TypeError: ... unexpected keyword argument 'dataframe'".
# The DataFrame is supplied later, when the batch is requested.
datasource = context.data_sources.add_pandas("my_initial_pandas_datasource")
data_asset = datasource.add_dataframe_asset(name="my_initial_dataframe_asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("my_initial_batch_definition")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

# Expected pandas dtype for each column
expected_types = {"id": "int64", "name": "object", "age": "int64", "city": "object", "is_active": "bool"}

# Build the suite: every column must exist and hold values of the expected type
suite = context.suites.add(gx.ExpectationSuite(name="initial_data_expectations"))
for column, type_name in expected_types.items():
    suite.add_expectation(gx.expectations.ExpectColumnToExist(column=column))
    suite.add_expectation(gx.expectations.ExpectColumnValuesToBeOfType(column=column, type_=type_name))

# Run the suite against the batch and report overall success
results = batch.validate(suite)
print(results.success)
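
Which calls are available depends on the installed Great Expectations version: `context.data_sources` belongs to the 1.x API, while the 0.18 releases exposed the same entry point as `context.sources`. A minimal sketch for checking the installed major version before running the cells, using only the standard library. The helper name `installed_major_version` is hypothetical and not part of Great Expectations.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_major_version(package: str) -> Optional[int]:
    """Return the major version of an installed package, or None if it is absent."""
    try:
        # version() returns a string such as "1.3.0"; keep the leading number
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("great_expectations")
if major is None:
    print("great_expectations is not installed; run: pip install great_expectations")
else:
    print(f"great_expectations major version: {major}")
```

If the reported major version is 0, the 1.x calls used in this activity may not be available, and upgrading with pip is likely the simplest fix.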

### Task 2: Validate Datasets and Generate Reports

1. Objective: Validate a dataset against defined expectations and generate a report.
2. Steps:
    - Execute the validation process on the dataset.
    - Review the validation results and generate a report.
    - E.g., validate completeness and consistency expectations, and view the results.


In [None]:
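# Write your code from here
import pandas as pd

df = pd.read_csv('dataset.csv')

# Completeness: percentage of non-null values in each column
completeness = df.notnull().mean() * 100

# Consistency: percentage of rows with a non-negative age
# (missing ages compare as False, so they count against this metric)
consistency = {}
if 'Age' in df.columns:
    consistency['Age_positive'] = (df['Age'] >= 0).mean() * 100

# Combine both sets of metrics into a single report table
report = pd.DataFrame({
    'Metric': list(completeness.index) + list(consistency.keys()),
    'Value': list(completeness.values) + list(consistency.values())
})

print(report)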


### Task 3: Advanced Expectations and Scheduling

1. Objective: Create advanced expectations for conditional checks and automate the validation.
2. Steps:
    - Define advanced expectations based on complex conditions.
    - Use scheduling tools to automate periodic checks.
    - E.g., define an expectation that customer IDs must be unique, and schedule a daily check.

In [None]:
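# Write your code from here
import pandas as pd
import schedule
import time

def validate_dataset():
    """Load the dataset, evaluate each check, and print a small report."""
    df = pd.read_csv('dataset.csv')

    expectations = {}

    # Customer IDs must be unique (True/False)
    if 'CustomerID' in df.columns:
        expectations['CustomerID_uniqueness'] = df['CustomerID'].is_unique

    # Percentage of emails containing '@'; na=False counts missing emails as invalid
    if 'Email' in df.columns:
        expectations['Valid_Emails'] = df['Email'].str.contains('@', na=False).mean() * 100

    # Wrap the dict views in lists so pandas receives plain sequences
    report = pd.DataFrame({
        'Expectation': list(expectations.keys()),
        'Result': list(expectations.values())
    })

    print(report)

# Run the validation every day at 10:00 local time
schedule.every().day.at("10:00").do(validate_dataset)

# Poll once a minute for due jobs; this loop runs until interrupted
while True:
    schedule.run_pending()
    time.sleep(60)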
