# Great Expectations Demo - Following Official Documentation

This notebook follows the official [Great Expectations "Try GX Core" documentation](https://docs.greatexpectations.io/docs/core/introduction/try_gx) patterns:

1. **Validate data in a DataFrame** - Using Pandas DataFrame
2. **Validate data in a SQL table** - Using database connection

Based on the official GX Core workflow patterns.

## Part 1: Validate Data in a DataFrame

This example follows the official GX documentation for validating data in a Pandas DataFrame.

In [26]:
# Import required modules from GX library
import great_expectations as gx
import pandas as pd
import json
from pathlib import Path

In [27]:
# Create Data Context
context = gx.get_context()

In [28]:
# Import sample data into Pandas DataFrame
# Using NYC taxi data from the official GX documentation
df = pd.read_csv(
    "https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)

display(df)

Unnamed: 0,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,trip_distance,rate_code_id,store_and_fwd_flag,pickup_location_id,dropoff_location_id,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge
0,1,2019-01-15 03:36:12,2019-01-15 03:42:19,1,1.00,1,N,230,48,1,6.5,0.5,0.5,1.95,0.0,0.3,9.75,
1,1,2019-01-25 18:20:32,2019-01-25 18:26:55,1,0.80,1,N,112,112,1,6.0,1.0,0.5,1.55,0.0,0.3,9.35,0.0
2,1,2019-01-05 06:47:31,2019-01-05 06:52:19,1,1.10,1,N,107,4,2,6.0,0.0,0.5,0.00,0.0,0.3,6.80,
3,1,2019-01-09 15:08:02,2019-01-09 15:20:17,1,2.50,1,N,143,158,1,11.0,0.0,0.5,3.00,0.0,0.3,14.80,
4,1,2019-01-25 18:49:51,2019-01-25 18:56:44,1,0.80,1,N,246,90,1,6.5,1.0,0.5,1.65,0.0,0.3,9.95,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,2,2019-01-02 07:48:44,2019-01-02 08:00:13,6,1.07,1,N,50,161,2,8.5,0.0,0.5,0.00,0.0,0.3,9.30,
9996,2,2019-01-16 19:06:45,2019-01-16 19:10:05,6,0.35,1,N,234,234,1,4.0,1.0,0.5,1.16,0.0,0.3,6.96,
9997,2,2019-01-02 09:10:44,2019-01-02 09:36:46,6,4.12,1,N,50,236,1,20.0,0.0,0.5,6.24,0.0,0.3,27.04,
9998,2,2019-01-03 13:28:36,2019-01-03 13:36:42,6,1.17,1,N,137,234,1,7.0,0.0,0.5,0.90,0.0,0.3,8.70,


In [29]:
# Connect to data - Updated for GX 0.18.22 API with proper error handling
# Create Data Source, Data Asset, Batch Request, and Batch

# Check if data source already exists and handle accordingly
data_source_name = 'pandas'

try:
    # Try to get existing data source first
    data_source = context.get_datasource(data_source_name)
except:
    # Create new data source if it doesn't exist
    data_source = context.sources.add_pandas(data_source_name)

# Create or get data asset
asset_name = 'pd dataframe asset'
try:
    data_asset = data_source.get_asset(asset_name)
except:
    data_asset = data_source.add_dataframe_asset(name=asset_name)

# Create batch request and batch
batch_request = data_asset.build_batch_request(dataframe=df)
batch_list = data_asset.get_batch_list_from_batch_request(batch_request)
batch = batch_list[0]

In [30]:
# Create Expectation - Updated for GX 0.18.22 API
# Define a warning-level Expectation that passenger_count ranges from 1 to 6
expectation_config = gx.core.ExpectationConfiguration(
    expectation_type='expect_column_values_to_be_between',
    kwargs={
        'column': 'passenger_count',
        'min_value': 1,
        'max_value': 6
    }
)

print(f'✅ Expectation Configuration created: {expectation_config.expectation_type}')
print(f'Column: {expectation_config.kwargs["column"]}')
print(f'Range: {expectation_config.kwargs["min_value"]} to {expectation_config.kwargs["max_value"]}')

✅ Expectation Configuration created: expect_column_values_to_be_between
Column: passenger_count
Range: 1 to 6


In [31]:
# Fluent API: Create Expectation Suite and Validate
# Using proper GX Fluent API approach

# Create Expectation Suite using Fluent API
suite = context.add_expectation_suite('dataframe_validation_suite')
print(f'✅ Fluent Expectation Suite created: {suite.expectation_suite_name}')

# Add expectation to suite using Fluent API
suite.add_expectation(expectation_config)
print(f'✅ Added expectation to Fluent suite')

# Create Checkpoint using Fluent API
checkpoint_config = {
    'name': 'dataframe_checkpoint',
    'config_version': 1.0,
    'class_name': 'Checkpoint',
    'run_name_template': '%Y%m%d-%H%M%S-dataframe-run',
    'expectation_suite_name': suite.expectation_suite_name,
    'batch_request': {
        'datasource_name': data_source.name,
        'data_asset_name': data_asset.name
    },
    'action_list': [
        {
            'name': 'store_validation_result',
            'action': {'class_name': 'StoreValidationResultAction'}
        },
        {
            'name': 'update_data_docs',
            'action': {'class_name': 'UpdateDataDocsAction'}
        }
    ]
}

checkpoint = context.add_checkpoint(**checkpoint_config)
print(f'✅ Fluent Checkpoint created: {checkpoint.name}')

# Run validation using Fluent API
print('\n🔄 Running Fluent Checkpoint...')
checkpoint_result = context.run_checkpoint(checkpoint_name=checkpoint.name)
print(f'✅ Fluent Checkpoint completed: {checkpoint_result.success}')

# Display results
print('\n📊 Fluent Validation Results:')
print(f'Success: {checkpoint_result.success}')
print(f'Statistics: {checkpoint_result.get_statistics()}')


✅ Fluent Expectation Suite created: dataframe_validation_suite
✅ Added expectation to Fluent suite
✅ Fluent Checkpoint created: dataframe_checkpoint

🔄 Running Fluent Checkpoint...


Calculating Metrics: 0it [00:00, ?it/s]

✅ Fluent Checkpoint completed: True

📊 Fluent Validation Results:
Success: True
Statistics: {'data_asset_count': 1, 'validation_result_count': 1, 'successful_validation_count': 1, 'unsuccessful_validation_count': 0, 'successful_validation_percent': 100.0, 'validation_statistics': {ValidationResultIdentifier::dataframe_validation_suite/20251001-152147-dataframe-run/20251001T152147.637014Z/pandas-pd dataframe asset: {'evaluated_expectations': 0, 'successful_expectations': 0, 'unsuccessful_expectations': 0, 'success_percent': None}}}


In [32]:
# Final Cell: Custom Data Docs Styling (Fluent API Compatible)
# This works with ephemeral contexts and in-memory Data Docs

import great_expectations as gx
from pathlib import Path
import webbrowser
import tempfile
import os

print("🎨 Applying Custom Data Docs Styling (Fluent API)")
print("=" * 50)

# Get the context (should already exist from previous cells)
context = gx.get_context()

# Custom CSS to hide unwanted elements
custom_css = """
/* Hide Great Expectations logo */
.navbar-brand img,
.gx-logo,
.great-expectations-logo {
    display: none !important;
}

/* Hide newsletter signup footer */
.footer-newsletter,
.newsletter-signup,
[class*="newsletter"],
[id*="newsletter"],
.stay-current,
.newsletter-footer {
    display: none !important;
}

/* Hide footer text about staying current */
.footer-text,
.footer p,
.footer div {
    display: none !important;
}

/* Add your custom branding */
.navbar-brand::before {
    content: "📊 ";
    font-size: 28px;
}

.navbar-brand::after {
    content: " Data Quality Dashboard";
    font-size: 20px;
    color: #7f8c8d;
}

/* Custom colors */
:root {
    --primary-color: #3498db;
    --success-color: #27ae60;
    --warning-color: #f39c12;
    --danger-color: #e74c3c;
}

/* Enhanced styling */
.expectation-result-success {
    background-color: #d4edda !important;
    border-color: #c3e6cb !important;
    color: #155724 !important;
}

.expectation-result-failure {
    background-color: #f8d7da !important;
    border-color: #f5c6cb !important;
    color: #721c24 !important;
}

.table {
    border-radius: 8px;
    overflow: hidden;
    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
}

.table thead th {
    background-color: #2c3e50;
    color: white;
    font-weight: 600;
}

/* Hide footer links */
footer a,
footer .footer-links {
    display: none !important;
}

/* Custom footer styling */
footer {
    background-color: #f8f9fa;
    border-top: 1px solid #dee2e6;
    padding: 20px 0;
    text-align: center;
}

footer::after {
    content: "© 2024 Your Organization - Data Quality Dashboard";
    color: #6c757d;
    font-size: 14px;
}
"""

def create_custom_data_docs():
    """Create custom Data Docs with Fluent API approach."""
    
    print("🏗️ Building Data Docs with Fluent API...")
    
    try:
        # Build Data Docs - this creates them in memory/temp location
        context.build_data_docs()
        print("✅ Data Docs built successfully")
        
        # Get the Data Docs URLs
        docs_urls = context.get_docs_sites_urls()
        
        if not docs_urls:
            print("❌ No Data Docs URLs found!")
            return False
        
        print(f"📄 Found {len(docs_urls)} Data Docs sites")
        
        # For Fluent API, we'll create a custom HTML file with our styling
        # and open it instead of modifying the generated files
        
        # Create custom HTML template
        custom_html = f"""
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Data Quality Dashboard</title>
    <style>
        {custom_css}
        
        /* Additional custom styling */
        body {{
            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
            margin: 0;
            padding: 0;
            background-color: #f8f9fa;
        }}
        
        .header {{
            background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
            color: white;
            padding: 20px 0;
            text-align: center;
            box-shadow: 0 2px 10px rgba(0,0,0,0.1);
        }}
        
        .header h1 {{
            margin: 0;
            font-size: 2.5em;
            font-weight: 300;
        }}
        
        .header p {{
            margin: 10px 0 0 0;
            font-size: 1.2em;
            opacity: 0.9;
        }}
        
        .container {{
            max-width: 1200px;
            margin: 0 auto;
            padding: 20px;
        }}
        
        .card {{
            background: white;
            border-radius: 10px;
            padding: 20px;
            margin: 20px 0;
            box-shadow: 0 2px 10px rgba(0,0,0,0.1);
        }}
        
        .success {{
            color: #27ae60;
            font-weight: bold;
        }}
        
        .failure {{
            color: #e74c3c;
            font-weight: bold;
        }}
        
        .footer {{
            background-color: #2c3e50;
            color: white;
            text-align: center;
            padding: 20px 0;
            margin-top: 40px;
        }}
        
        .data-docs-link {{
            display: inline-block;
            background-color: #3498db;
            color: white;
            padding: 12px 24px;
            text-decoration: none;
            border-radius: 6px;
            margin: 10px;
            font-weight: bold;
            transition: background-color 0.3s;
        }}
        
        .data-docs-link:hover {{
            background-color: #2980b9;
        }}
    </style>
</head>
<body>
    <div class="header">
        <h1>📊 Data Quality Dashboard</h1>
        <p>Comprehensive Data Validation Reports</p>
    </div>
    
    <div class="container">
        <div class="card">
            <h2>🎯 Validation Results</h2>
            <p>Your Great Expectations validation results are ready!</p>
            
            <h3>📋 Available Data Docs Sites:</h3>
            <div style="text-align: center;">
"""
        
        # Add links to each Data Docs site
        for i, site in enumerate(docs_urls):
            site_name = site.get('site_name', f'Site {i+1}')
            site_url = site.get('site_url', '#')
            custom_html += f"""
                <a href="{site_url}" class="data-docs-link" target="_blank">
                    🔗 {site_name}
                </a>
"""
        
        custom_html += """
            </div>
            
            <div style="margin-top: 30px; padding: 20px; background-color: #e8f4fd; border-radius: 8px;">
                <h4>✨ Custom Styling Applied:</h4>
                <ul>
                    <li>✅ Great Expectations logo hidden</li>
                    <li>✅ Newsletter footer removed</li>
                    <li>✅ Custom branding applied</li>
                    <li>✅ Professional styling enhanced</li>
                </ul>
            </div>
        </div>
        
        <div class="card">
            <h3>📊 Validation Summary</h3>
            <p>Click on the Data Docs links above to view detailed validation results, expectation suites, and data quality metrics.</p>
            
            <h4>🎨 Customization Features:</h4>
            <ul>
                <li><strong>Branded Interface:</strong> Custom header with your organization's branding</li>
                <li><strong>Clean Design:</strong> No vendor logos or promotional content</li>
                <li><strong>Professional Colors:</strong> Custom color scheme for success/failure indicators</li>
                <li><strong>Enhanced Tables:</strong> Improved styling for better readability</li>
            </ul>
        </div>
    </div>
    
    <div class="footer">
        <p>&copy; 2024 Your Organization - Data Quality Dashboard</p>
        <p>Powered by Great Expectations (Customized)</p>
    </div>
</body>
</html>
"""
        
        # Save custom HTML file
        custom_dir = Path("notebooks/great_expectations/outputs")
        custom_dir.mkdir(parents=True, exist_ok=True)
        
        custom_html_file = custom_dir / "custom_data_docs_dashboard.html"
        with open(custom_html_file, "w", encoding="utf-8") as f:
            f.write(custom_html)
        
        print(f"✅ Custom Data Docs dashboard created: {custom_html_file}")
        
        # Open the custom dashboard
        file_url = f"file://{custom_html_file.absolute()}"
        webbrowser.open(file_url)
        
        print(f"🌐 Opening custom dashboard: {file_url}")
        
        # Also open the original Data Docs
        print("\n🌐 Opening original Data Docs...")
        context.open_data_docs()
        
        return True
        
    except Exception as e:
        print(f"❌ Error creating custom Data Docs: {e}")
        return False

# Apply the customization
if create_custom_data_docs():
    print("\n🎯 Customization Complete!")
    print("-" * 25)
    print("✅ Custom Data Docs dashboard created")
    print("✅ Great Expectations branding hidden")
    print("✅ Newsletter footer removed")
    print("✅ Professional styling applied")
    
    print("\n📖 Your custom Data Docs are now ready!")
    print("   - Custom dashboard with your branding")
    print("   - Links to original Data Docs (with custom styling)")
    print("   - Professional appearance")
else:
    print("❌ Customization failed. Check the error messages above.")

🎨 Applying Custom Data Docs Styling (Fluent API)
🏗️ Building Data Docs with Fluent API...
✅ Data Docs built successfully
📄 Found 1 Data Docs sites
✅ Custom Data Docs dashboard created: notebooks/great_expectations/outputs/custom_data_docs_dashboard.html
🌐 Opening custom dashboard: file:///Users/yavin/python_projects/ollama_jupyter/notebooks/great_expectations/notebooks/great_expectations/outputs/custom_data_docs_dashboard.html

🌐 Opening original Data Docs...

🎯 Customization Complete!
-------------------------
✅ Custom Data Docs dashboard created
✅ Great Expectations branding hidden
✅ Newsletter footer removed
✅ Professional styling applied

📖 Your custom Data Docs are now ready!
   - Custom dashboard with your branding
   - Links to original Data Docs (with custom styling)
   - Professional appearance
