## Automate Data Quality Checks with Great Expectations
**Introduction**: In this activity, you will learn how to automate data quality checks using the Great Expectations framework. This includes setting up expectations, validating a dataset, generating validation reports, and scheduling recurring checks.

### Task 1: Setup and Initial Expectations

1. Objective: Set up Great Expectations and create initial expectations for a dataset.
2. Steps:
    - Install Great Expectations using pip.
    - Initialize a data context.
    - Create basic expectations on a sample dataset.
    - E.g., implement a basic setup and expectations for column presence and type.

In [None]:
# Write your code from here

In [1]:
# Install Great Expectations before importing it (if not already installed).
# Assumption: the legacy `ge.from_pandas` API used below is not part of
# Great Expectations 1.0+, so an older release is pinned here.
!pip install "great_expectations<1.0"

import great_expectations as ge
import pandas as pd

# Create a sample dataset
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Gender": ["F", "M", "M"]
}

# Convert the dataset into a Great Expectations dataframe
df_ge = ge.from_pandas(pd.DataFrame(data))

# Create initial expectations: column presence, type, and allowed values
df_ge.expect_column_to_exist("Name")
# pandas stores these integers as int64, so the dtype is named explicitly
df_ge.expect_column_values_to_be_of_type("Age", "int64")
df_ge.expect_column_values_to_be_in_set("Gender", ["F", "M"])

# Validate the dataset against the expectations
validation_results = df_ge.validate()

# Print the validation results
print(validation_results)
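
Importing `great_expectations` raises `ModuleNotFoundError` when the package is missing from the environment the notebook kernel runs in. The following is a minimal sketch of a guard that installs a package only when it cannot be found. The helper name `ensure_package` is hypothetical, and the version pin in the commented usage line is an assumption about which releases still ship the legacy API.

```python
import importlib
import importlib.util
import subprocess
import sys


def ensure_package(module_name, pip_spec=None):
    """Install a package with pip only if its module cannot be found.

    Returns True if an install was attempted, False if the module
    was already available.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False
    # Use the interpreter that runs this kernel so the package lands
    # in the same environment.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_spec or module_name]
    )
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True


# Usage (commented out so that running this block installs nothing):
# ensure_package("great_expectations", "great_expectations<1.0")
```

Calling pip through `sys.executable` avoids the common situation where `pip` on the shell path belongs to a different Python installation than the kernel.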

### Task 2: Validate Datasets and Generate Reports

1. Objective: Validate a dataset against defined expectations and generate a report.
2. Steps:
    - Execute the validation process on the dataset.
    - Review the validation results and generate a report.
    - E.g., validate completeness and consistency expectations, and view the results.


In [None]:
# Write your code from here

In [2]:
from great_expectations.render.renderer import ValidationResultsPageRenderer
from great_expectations.render.view import DefaultJinjaPageView

# Requires df_ge from Task 1; run that cell first.
# Validate the dataset against the defined expectations
validation_results = df_ge.validate()

# Generate a report: render the validation results as HTML
renderer = ValidationResultsPageRenderer()
rendered_content = renderer.render(validation_results)
html_content = DefaultJinjaPageView().render(rendered_content)

# Save the report to an HTML file
with open("validation_report.html", "w", encoding="utf-8") as f:
    f.write(html_content)

print("Validation report generated and saved as 'validation_report.html'.")
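
Reviewing the results does not require the HTML report. The sketch below lists failed expectations from a validation result that has been converted to a plain dictionary. It assumes the dictionary has the shape the legacy API is expected to produce (a top-level `success` flag and a `results` list whose items carry `success` and `expectation_config`). The `sample` dictionary is hand-written for illustration and is not real output, and `summarize_validation` is a hypothetical helper name.

```python
def summarize_validation(result):
    """Return (overall_success, descriptions of failed expectations)."""
    failed = []
    for item in result.get("results", []):
        if item.get("success", False):
            continue
        config = item.get("expectation_config", {})
        expectation_type = config.get("expectation_type", "unknown")
        column = config.get("kwargs", {}).get("column")
        failed.append(f"{expectation_type} on column {column!r}")
    return bool(result.get("success", False)), failed


# Hand-written sample in the assumed shape (illustrative only).
sample = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_to_exist",
                "kwargs": {"column": "Name"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_unique",
                "kwargs": {"column": "Name"},
            },
        },
    ],
}

ok, failed = summarize_validation(sample)
print(ok, failed)
```

With the legacy API, the input would presumably come from something like `validation_results.to_json_dict()`; check the installed version before relying on that call.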

### Task 3: Advanced Expectations and Scheduling

1. Objective: Create advanced expectations for conditional checks and automate the validation.
2. Steps:
    - Define advanced expectations based on complex conditions.
    - Use scheduling tools to automate periodic checks.
    - E.g., define an expectation that customer IDs must be unique, and schedule a daily check.
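
To make concrete what a uniqueness expectation flags, here is a plain Python sketch that finds repeated values in a column. It does not use the Great Expectations API. The customer IDs are hypothetical sample data and `find_duplicates` is an illustrative helper name.

```python
from collections import Counter


def find_duplicates(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(values)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical customer IDs; "C002" appears twice.
customer_ids = ["C001", "C002", "C003", "C002"]
duplicates = find_duplicates(customer_ids)
print(duplicates)
```

A uniqueness expectation on this column would fail because `duplicates` is not empty.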

In [None]:
# Write your code from here

In [3]:
# Requires df_ge from Task 1; run that cell first.
# Define an advanced expectation: ensure that the "Name" column values are unique
df_ge.expect_column_values_to_be_unique("Name")

# Validate the dataset against all expectations, including the new one
advanced_validation_results = df_ge.validate()

# Print the advanced validation results
print(advanced_validation_results)
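
The scheduling step of this task can be sketched with the standard library alone. The example below computes the wait until the next daily run time and calls a check function on each run. The names `seconds_until` and `run_daily`, along with the injectable `sleep` and `clock` parameters, are hypothetical choices made so the loop can be exercised without actually waiting. In practice a tool such as cron or a workflow orchestrator is a more common choice than a long-running loop.

```python
import datetime as dt
import time


def seconds_until(run_at, now):
    """Seconds from `now` until the next occurrence of the time `run_at`."""
    target = dt.datetime.combine(now.date(), run_at)
    if target <= now:
        # Today's slot has passed, so schedule for tomorrow.
        target += dt.timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(check, run_at, max_runs=None, sleep=time.sleep,
              clock=dt.datetime.now):
    """Call `check` once per day at `run_at`; stop after `max_runs` if set."""
    runs = 0
    while max_runs is None or runs < max_runs:
        sleep(seconds_until(run_at, clock()))
        check()
        runs += 1
    return runs


# Demo with a fixed clock and a no-op sleep so it finishes immediately.
calls = []
completed = run_daily(
    check=lambda: calls.append("validated"),  # stand-in for df_ge.validate()
    run_at=dt.time(2, 0),
    max_runs=2,
    sleep=lambda seconds: None,
    clock=lambda: dt.datetime(2024, 1, 1, 1, 0),
)
print(completed, calls)
```

In a real run, `check` would wrap the validation call from the cells above, and `max_runs` would be left as `None` so the loop keeps going.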