# Great Expectations Demo - Following Official Documentation

This notebook follows the patterns in the official [Great Expectations "Try GX Core" documentation](https://docs.greatexpectations.io/docs/core/introduction/try_gx):

1. **Validate data in a DataFrame** - Using a pandas DataFrame
2. **Validate data in a SQL table** - Using a database connection

The code cells target the GX 0.18.22 API, so individual calls may differ from the GX Core 1.x examples in that documentation.

## Part 1: Validate Data in a DataFrame

This example follows the official GX documentation for validating data in a Pandas DataFrame.

In [1]:
# Import required modules from GX library
import great_expectations as gx
import pandas as pd
import json
from pathlib import Path

In [2]:
# Create Data Context
context = gx.get_context()

In [3]:
# Import sample data into Pandas DataFrame
# Using NYC taxi data from the official GX documentation
df = pd.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)

display(df)

Unnamed: 0,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
0,1,2019-01-15 03:36:12,2019-01-15 03:42:19,1,1.00,1,N,230,48,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75,
1,1,2019-01-25 18:20:32,2019-01-25 18:26:55,1,0.80,1,N,112,112,1,6.0,1.0,0.5,1.55,0.0,0.3,9.35,0.0
2,1,2019-01-05 06:47:31,2019-01-05 06:52:19,1,1.10,1,N,107,4,2,6.0,0.0,0.5,0.00,0.0,0.3,6.80,
3,1,2019-01-09 15:08:02,2019-01-09 15:20:17,1,2.50,1,N,143,158,1,11.0,0.0,0.5,3.00,0.0,0.3,14.80,
4,1,2019-01-25 18:49:51,2019-01-25 18:56:44,1,0.80,1,N,246,90,1,6.5,1.0,0.5,1.65,0.0,0.3,9.95,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,2,2019-01-02 07:48:44,2019-01-02 08:00:13,6,1.07,1,N,50,161,2,8.5,0.0,0.5,0.00,0.0,0.3,9.30,
9996,2,2019-01-16 19:06:45,2019-01-16 19:10:05,6,0.35,1,N,234,234,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96,
9997,2,2019-01-02 09:10:44,2019-01-02 09:36:46,6,4.12,1,N,50,236,1,20.0,0.0,0.5,6.24,0.0,0.3,27.04,
9998,2,2019-01-03 13:28:36,2019-01-03 13:36:42,6,1.17,1,N,137,234,1,7.0,0.0,0.5,0.90,0.0,0.3,8.70,


In [4]:
# Connect to data (GX 0.18.22 API)
# Create Data Source, Data Asset, Batch Request, and Batch

# Reuse the data source if it already exists; otherwise create it
data_source_name = 'pandas'

try:
    data_source = context.get_datasource(data_source_name)
except Exception:
    # Lookup failed, so the data source does not exist yet
    data_source = context.sources.add_pandas(data_source_name)

# Reuse the data asset if it already exists; otherwise create it
asset_name = 'pd dataframe asset'
try:
    data_asset = data_source.get_asset(asset_name)
except Exception:
    # Lookup failed, so the data asset does not exist yet
    data_asset = data_source.add_dataframe_asset(name=asset_name)

# Create batch request and batch
batch_request = data_asset.build_batch_request(dataframe=df)
batch_list = data_asset.get_batch_list_from_batch_request(batch_request)
batch = batch_list[0]
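
The get-or-create pattern used for the data source and data asset can be sketched independently of GX. The example below is only an illustration: `Registry` and `get_or_create` are hypothetical names, not GX classes or functions, and the choice of `LookupError` is an assumption made for the sketch. Catching one specific exception keeps unrelated failures visible instead of silently creating a new object.

```python
class Registry:
    """Hypothetical stand-in for an object that stores named items (not a GX class)."""

    def __init__(self):
        self._items = {}

    def get(self, name):
        # Raise a specific error so callers can tell "missing" apart from other failures
        if name not in self._items:
            raise LookupError(f"no item named {name!r}")
        return self._items[name]

    def add(self, name):
        item = {"name": name}
        self._items[name] = item
        return item


def get_or_create(registry, name):
    """Return the existing item, or create it when the lookup fails."""
    try:
        return registry.get(name)
    except LookupError:
        return registry.add(name)


registry = Registry()
first = get_or_create(registry, "pandas")   # created on the first call
second = get_or_create(registry, "pandas")  # reused on the second call
print(first is second)  # prints True
```

Which exception GX raises for a missing data source or asset can vary by version, so it is worth confirming against the installed release before narrowing the `except` clause in the notebook cell.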

In [5]:
# Create Expectation (GX 0.18.22 API)
# Define an Expectation that passenger_count values fall between 1 and 6 (inclusive)
expectation_config = gx.core.ExpectationConfiguration(
    expectation_type='expect_column_values_to_be_between',
    kwargs={
        'column': 'passenger_count',
        'min_value': 1,
        'max_value': 6
    }
)

print(f'✅ Expectation Configuration created: {expectation_config.expectation_type}')
print(f'Column: {expectation_config.kwargs["column"]}')
print(f'Range: {expectation_config.kwargs["min_value"]} to {expectation_config.kwargs["max_value"]}')

✅ Expectation Configuration created: expect_column_values_to_be_between
Column: passenger_count
Range: 1 to 6
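
To make the configured check concrete, here is a plain-Python sketch of what an inclusive range Expectation verifies. It is an approximation and not GX's implementation: `values_between` is a hypothetical helper, the sample values are made up rather than taken from the taxi data, and skipping missing values mirrors what is assumed here to be the default behavior for column value Expectations.

```python
import math


def values_between(values, min_value, max_value):
    """Return (success, unexpected) for an inclusive range check, skipping missing values."""
    unexpected = [
        v
        for v in values
        if v is not None
        and not (isinstance(v, float) and math.isnan(v))
        and not (min_value <= v <= max_value)
    ]
    return len(unexpected) == 0, unexpected


# Made-up sample values, not rows from the taxi dataset
sample_passenger_counts = [1, 1, 6, 0, None, 7]
success, unexpected = values_between(sample_passenger_counts, 1, 6)
print(success, unexpected)  # prints: False [0, 7]
```

Both bounds count as valid, so `1` and `6` pass while `0` and `7` are reported as unexpected.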


In [6]:
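# Create Expectation Suite, Checkpoint, and validate

# Create an empty Expectation Suite in the Data Context
suite = context.add_expectation_suite('dataframe_validation_suite')
print(f'✅ Fluent Expectation Suite created: {suite.expectation_suite_name}')

# Add the Expectation to the in-memory suite object.
# Note: this does not appear to update the copy stored in the Data Context. The
# Checkpoint below refers to the suite by name, so persist the change first
# (for example with context.add_or_update_expectation_suite(expectation_suite=suite))
# if the Expectation should be evaluated.
suite.add_expectation(expectation_config)
print(f'✅ Added expectation to Fluent suite')

# Create Checkpoint from a configuration dictionary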
checkpoint_config = {
    'name': 'dataframe_checkpoint',
    'config_version': 1.0,
    'class_name': 'Checkpoint',
    'run_name_template': '%Y%m%d-%H%M%S-dataframe-run',
    'expectation_suite_name': suite.expectation_suite_name,
    'batch_request': {
        'datasource_name': data_source.name,
        'data_asset_name': data_asset.name
    },
    'action_list': [
        {
            'name': 'store_validation_result',
            'action': {'class_name': 'StoreValidationResultAction'}
        },
        {
            'name': 'update_data_docs',
            'action': {'class_name': 'UpdateDataDocsAction'}
        }
    ]
}

checkpoint = context.add_checkpoint(**checkpoint_config)
print(f'✅ Fluent Checkpoint created: {checkpoint.name}')

# Run the Checkpoint by name
print('\n🔄 Running Fluent Checkpoint...')
checkpoint_result = context.run_checkpoint(checkpoint_name=checkpoint.name)
print(f'✅ Fluent Checkpoint completed: {checkpoint_result.success}')

# Display results
print('\n📊 Fluent Validation Results:')
print(f'Success: {checkpoint_result.success}')
print(f'Statistics: {checkpoint_result.get_statistics()}')


✅ Fluent Expectation Suite created: dataframe_validation_suite
✅ Added expectation to Fluent suite
✅ Fluent Checkpoint created: dataframe_checkpoint

🔄 Running Fluent Checkpoint...


✅ Fluent Checkpoint completed: True

📊 Fluent Validation Results:
Success: True
Statistics: {'data_asset_count': 1, 'validation_result_count': 1, 'successful_validation_count': 1, 'unsuccessful_validation_count': 0, 'successful_validation_percent': 100.0, 'validation_statistics': {ValidationResultIdentifier::dataframe_validation_suite/20251003-153535-dataframe-run/20251003T153535.806696Z/pandas-pd dataframe asset: {'evaluated_expectations': 0, 'successful_expectations': 0, 'unsuccessful_expectations': 0, 'success_percent': None}}}
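
The statistics above report `'evaluated_expectations': 0`, so `Success: True` in this run means that no Expectation was evaluated, not that `passenger_count` passed the range check. A small guard can flag such vacuous passes. The sketch below is a hypothetical helper, not a GX function; it assumes the dictionary shape printed above, and the example key is shortened to a plain string for illustration.

```python
def vacuous_validations(statistics):
    """Return the keys of validations that report zero evaluated expectations."""
    per_validation = statistics.get("validation_statistics", {})
    return [
        str(key)
        for key, stats in per_validation.items()
        if stats.get("evaluated_expectations", 0) == 0
    ]


# Same shape as the statistics printed above; the validation key is shortened
# to a plain string here instead of a ValidationResultIdentifier object.
example_statistics = {
    "data_asset_count": 1,
    "validation_result_count": 1,
    "successful_validation_count": 1,
    "unsuccessful_validation_count": 0,
    "successful_validation_percent": 100.0,
    "validation_statistics": {
        "dataframe_validation_suite (key shortened)": {
            "evaluated_expectations": 0,
            "successful_expectations": 0,
            "unsuccessful_expectations": 0,
            "success_percent": None,
        }
    },
}

vacuous = vacuous_validations(example_statistics)
if vacuous:
    print(f"No expectations were evaluated for: {vacuous}")
```

In the notebook, the same helper could be called as `vacuous_validations(checkpoint_result.get_statistics())` right after the Checkpoint run, so that an empty suite is noticed before the result is trusted.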
