# Great Expectations Expectation Suites Analysis

This notebook reads Great Expectations expectation suite JSON files and presents the data in DataFrames for analysis.

## Overview
- **Purpose**: Analyze expectation suites from Great Expectations
- **Data Source**: JSON files stored in `BirdiDQ/gx/expectations/`
- **Output**: Structured DataFrames for easy analysis and visualization


### Analysis Components:

- **Suite Metadata**: Name, version, expectation count, creation date
- **Individual Expectations**: Detailed breakdown of all expectations
- **Suite Summaries**: Aggregated views by suite and expectation type
- **Column Coverage**: Which columns have expectations and how many
- **Parameter Analysis**: Range values, mostly parameters, value sets
- **Exports**: CSV files for further analysis

### Key Findings:

**Python Code in Expectations:**
- Python code is embedded in the `meta.notes.content` field of many expectations (211 of the 387 analyzed here), not all of them
- Format: Markdown with code blocks showing `validator.expect_*()` calls
- Includes execution engine information (Pandas, SQL, Oracle, PostgreSQL, etc.)
- Generated by Data Assistants or manually created

**SQL Code:**
- Great Expectations translates Python expectations to SQL when using database backends
- SQL is generated at runtime, not stored in the JSON files
- For databases: Python → Great Expectations → SQL queries
- For files: Python → Pandas operations
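
To make the first finding concrete, here is a minimal sketch of how a code block might be pulled out of `meta.notes.content`. The `sample_expectation` dict is hypothetical: it only mimics the layout described above (a Markdown string with a fenced Python block and an `**Execution Engine:**` line), and real suites may differ. The helper name `split_note` is also an assumption made for illustration.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can itself sit inside a code fence

# Hypothetical expectation entry, shaped like the ones described above.
sample_expectation = {
    "expectation_type": "expect_column_values_to_not_be_null",
    "kwargs": {"column": "fare_amount", "mostly": 1.0},
    "meta": {
        "notes": {
            "format": "markdown",
            "content": [
                "**Execution Engine:** PostgreSQL (SQL)\n\n"
                f"{FENCE}python\n"
                "validator.expect_column_values_to_not_be_null(column='fare_amount', mostly=1.0)\n"
                f"{FENCE}"
            ],
        }
    },
}


def split_note(markdown_text):
    """Return (python_code, execution_engine) from one Markdown note; None where absent."""
    code = re.search(FENCE + r"python\n(.*?)\n" + FENCE, markdown_text, re.DOTALL)
    engine = re.search(r"\*\*Execution Engine:\*\* (.+?)(?:\n|$)", markdown_text)
    return (
        code.group(1).strip() if code else None,
        engine.group(1).strip() if engine else None,
    )


note = sample_expectation["meta"]["notes"]["content"][0]
python_code, execution_engine = split_note(note)
print(python_code)
print(execution_engine)
```

The stored text is only the Python call; any SQL is produced later by Great Expectations at validation time, so there is nothing SQL-shaped to extract from the JSON.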


In [11]:
import json
import pandas as pd
from pathlib import Path
import os
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set display options for better DataFrame viewing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', 100)
pd.set_option('display.width', None)


## 1. Locate and Load Expectation Suite JSON Files


In [12]:
# Define the path to the expectations directory
expectations_path = Path('/Users/yavin/python_projects/ollama_jupyter/BirdiDQ/gx/expectations')

# Find all JSON files (excluding hidden files)
json_files = [f for f in expectations_path.glob('*.json') if not f.name.startswith('.')]

print(f"Found {len(json_files)} expectation suite JSON file(s):\n")
for file in sorted(json_files):
    file_size = file.stat().st_size / 1024  # Size in KB
    print(f"   {file.name}")
    print(f"     Size: {file_size:.2f} KB\n")


Found 16 expectation suite JSON file(s):

   Housing_expectation_suite.json
     Size: 0.37 KB

   Housing_onboarding_suite.json
     Size: 0.19 KB

   Housing_onboarding_suite_final.json
     Size: 92.41 KB

   TRANSACTIONS_expectation_suite.json
     Size: 0.74 KB

   TRANSACTIONS_missingness_suite.json
     Size: 0.19 KB

   TRANSACTIONS_missingness_suite_final.json
     Size: 6.08 KB

   TRANSACTIONS_onboarding_suite.json
     Size: 0.19 KB

   TRANSACTIONS_onboarding_suite_final.json
     Size: 47.69 KB

   data_assistant_test_suite.json
     Size: 0.19 KB

   data_assistant_test_suite_final.json
     Size: 47.68 KB

   nyc_taxi_data_expectation_suite.json
     Size: 0.94 KB

   nyc_taxi_data_missingness_suite.json
     Size: 0.19 KB

   nyc_taxi_data_missingness_suite_final.json
     Size: 10.25 KB

   nyc_taxi_data_onboarding_suite.json
     Size: 0.19 KB

   nyc_taxi_data_onboarding_suite_final.json
     Size: 182.59 KB

   test_oracle_suite.json
     Size: 0.63 KB
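
The listing suggests a naming pattern: several suites appear as a near-empty stub (0.19 KB) next to a much larger `_final` file. The sketch below pairs them up using a few names and sizes copied from the output above. The `_final` suffix rule is an assumption inferred from these file names, and `pair_stub_and_final` is a hypothetical helper, not part of the notebook.

```python
# File names and sizes (KB) copied from the listing above.
listing = {
    "Housing_onboarding_suite.json": 0.19,
    "Housing_onboarding_suite_final.json": 92.41,
    "TRANSACTIONS_missingness_suite.json": 0.19,
    "TRANSACTIONS_missingness_suite_final.json": 6.08,
    "test_oracle_suite.json": 0.63,
}


def pair_stub_and_final(sizes_kb):
    """Map each base suite name to its stub size and, if present, its `_final` size."""
    pairs = {}
    for name, size in sizes_kb.items():
        stem = name[: -len(".json")]
        is_final = stem.endswith("_final")
        base = stem[: -len("_final")] if is_final else stem
        entry = pairs.setdefault(base, {"stub_kb": None, "final_kb": None})
        entry["final_kb" if is_final else "stub_kb"] = size
    return pairs


pairs = pair_stub_and_final(listing)
for base, entry in sorted(pairs.items()):
    print(f"{base}: stub={entry['stub_kb']} KB, final={entry['final_kb']} KB")
```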



## 2. Load and Parse Expectation Suites


In [13]:
def load_expectation_json(file_path):
    """Load an expectation suite JSON file and return the parsed data."""
    with open(file_path, 'r', encoding='utf-8') as f:
        return json.load(f)

# Load all expectation suites
expectation_suites = []
for json_file in json_files:
    data = load_expectation_json(json_file)
    expectation_suites.append({
        'file_path': str(json_file),
        'file_name': json_file.name,
        'data': data
    })

print(f" Loaded {len(expectation_suites)} expectation suite(s)")


 Loaded 16 expectation suite(s)


## 3. Extract Suite Metadata


In [14]:
def extract_suite_metadata(suite_data, file_name):
    """Extract high-level metadata from expectation suites."""
    meta = suite_data.get('meta', {})
    
    # Extract citation date if available
    citation_date = 'N/A'
    citations = meta.get('citations', [])
    if citations:
        citation_date = citations[0].get('citation_date', 'N/A')
    
    return {
        'file_name': file_name,
        'suite_name': suite_data.get('expectation_suite_name', 'N/A'),
        'data_asset_type': suite_data.get('data_asset_type', 'N/A'),
        'ge_version': meta.get('great_expectations_version', 'N/A'),
        'expectation_count': len(suite_data.get('expectations', [])),
        'created_date': citation_date,
        'has_citations': len(citations) > 0
    }

# Create metadata DataFrame
metadata_list = [extract_suite_metadata(suite['data'], suite['file_name']) for suite in expectation_suites]
metadata_df = pd.DataFrame(metadata_list)

print("\n Expectation Suite Metadata:")
metadata_df



 Expectation Suite Metadata:


Unnamed: 0,file_name,suite_name,data_asset_type,ge_version,expectation_count,created_date,has_citations
0,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,,0.18.22,11,2025-10-02T12:56:51.210067Z,True
1,data_assistant_test_suite_final.json,data_assistant_test_suite_final,,0.18.22,71,2025-10-02T11:44:11.155755Z,True
2,Housing_expectation_suite.json,Housing_expectation_suite,,0.18.22,1,,False
3,Housing_onboarding_suite.json,Housing_onboarding_suite,,0.18.22,0,,False
4,test_oracle_suite.json,test_oracle_suite,,0.18.22,3,,False
5,nyc_taxi_data_onboarding_suite_final.json,nyc_taxi_data_onboarding_suite_final,,0.18.22,132,2025-10-05T18:01:17.409356Z,True
6,TRANSACTIONS_missingness_suite.json,TRANSACTIONS_missingness_suite,,0.18.22,0,,False
7,TRANSACTIONS_expectation_suite.json,TRANSACTIONS_expectation_suite,,0.18.22,1,,False
8,TRANSACTIONS_onboarding_suite_final.json,TRANSACTIONS_onboarding_suite_final,,0.18.22,71,2025-10-02T12:09:49.133967Z,True
9,nyc_taxi_data_onboarding_suite.json,nyc_taxi_data_onboarding_suite,,0.18.22,0,,False
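
Several suites in the table have an `expectation_count` of 0, so it can help to separate them before further analysis. A minimal sketch, assuming a DataFrame shaped like `metadata_df`; the rows in `metadata_sample` are copied from the table above, and only the columns needed here are kept.

```python
import pandas as pd

# A few rows copied from the metadata table above.
metadata_sample = pd.DataFrame(
    [
        {"suite_name": "TRANSACTIONS_missingness_suite_final", "expectation_count": 11, "has_citations": True},
        {"suite_name": "Housing_onboarding_suite", "expectation_count": 0, "has_citations": False},
        {"suite_name": "test_oracle_suite", "expectation_count": 3, "has_citations": False},
        {"suite_name": "nyc_taxi_data_onboarding_suite_final", "expectation_count": 132, "has_citations": True},
        {"suite_name": "TRANSACTIONS_missingness_suite", "expectation_count": 0, "has_citations": False},
    ]
)

# Suites that contain no expectations at all.
empty_suites = metadata_sample.loc[
    metadata_sample["expectation_count"] == 0, "suite_name"
].tolist()

# Suites with at least one expectation, largest first.
populated = metadata_sample[metadata_sample["expectation_count"] > 0].sort_values(
    "expectation_count", ascending=False
)

print("Empty suites:", empty_suites)
print(populated[["suite_name", "expectation_count"]].to_string(index=False))
```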


## 4. Extract Individual Expectations

This section extracts all individual expectations from each suite for detailed analysis.


In [15]:
def extract_expectations(suite_data, file_name, suite_name):
    """Extract individual expectations from a suite."""
    expectations = suite_data.get('expectations', [])
    
    expectations_list = []
    for idx, expectation in enumerate(expectations):
        kwargs = expectation.get('kwargs', {})
        meta = expectation.get('meta', {})
        
        # Extract profiler details if available
        profiler_details = meta.get('profiler_details', {})
        
        expectations_list.append({
            'file_name': file_name,
            'suite_name': suite_name,
            'expectation_index': idx,
            'expectation_type': expectation.get('expectation_type', 'N/A'),
            'column': kwargs.get('column', 'N/A'),
            'min_value': kwargs.get('min_value', 'N/A'),
            'max_value': kwargs.get('max_value', 'N/A'),
            'mostly': kwargs.get('mostly', 'N/A'),
            'value_set_count': len(kwargs.get('value_set', [])) if 'value_set' in kwargs else 0,
            'has_profiler_details': len(profiler_details) > 0,
            'has_notes': 'notes' in meta
        })
    
    return expectations_list

# Create expectations DataFrame
all_expectations = []
for suite in expectation_suites:
    suite_name = suite['data'].get('expectation_suite_name', 'N/A')
    expectations = extract_expectations(suite['data'], suite['file_name'], suite_name)
    all_expectations.extend(expectations)

expectations_df = pd.DataFrame(all_expectations)
print(f"\n Individual Expectations ({len(expectations_df)} expectations):")
expectations_df.head(20)



 Individual Expectations (387 expectations):


Unnamed: 0,file_name,suite_name,expectation_index,expectation_type,column,min_value,max_value,mostly,value_set_count,has_profiler_details,has_notes
0,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,0,expect_column_values_to_not_be_null,TRANSACTION_ID,,,1.0,0,True,False
1,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,1,expect_column_values_to_not_be_null,CUSTOMER_ID,,,0.9,0,True,False
2,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,2,expect_column_values_to_not_be_null,PRODUCT_ID,,,1.0,0,True,False
3,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,3,expect_column_values_to_not_be_null,AMOUNT,,,1.0,0,True,False
4,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,4,expect_column_values_to_not_be_null,QUANTITY,,,1.0,0,True,False
5,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,5,expect_column_values_to_not_be_null,TIMESTAMP,,,1.0,0,True,False
6,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,6,expect_column_values_to_not_be_null,PAYMENT_METHOD,,,1.0,0,True,False
7,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,7,expect_column_values_to_not_be_null,REGION,,,1.0,0,True,False
8,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,8,expect_column_values_to_not_be_null,IS_FRAUDULENT,,,1.0,0,True,False
9,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,9,expect_column_values_to_not_be_null,CUSTOMER_AGE,,,1.0,0,True,False
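
The overview mentions suite summaries and column coverage; both can be derived from a frame like `expectations_df` with a `groupby`. The sketch below uses hypothetical rows (`suite_a`, `suite_b`) rather than the real data, and assumes table-level expectations carry the `'N/A'` placeholder in `column`, as set by `extract_expectations`.

```python
import pandas as pd

# Hypothetical rows shaped like expectations_df; only the columns needed here.
sample = pd.DataFrame(
    [
        {"suite_name": "suite_a", "expectation_type": "expect_column_values_to_not_be_null", "column": "AMOUNT"},
        {"suite_name": "suite_a", "expectation_type": "expect_column_values_to_be_between", "column": "AMOUNT"},
        {"suite_name": "suite_a", "expectation_type": "expect_column_values_to_not_be_null", "column": "REGION"},
        {"suite_name": "suite_b", "expectation_type": "expect_table_row_count_to_be_between", "column": "N/A"},
    ]
)

# Number of expectations per suite and expectation type.
type_summary = (
    sample.groupby(["suite_name", "expectation_type"]).size().reset_index(name="count")
)

# Column coverage: drop table-level expectations, which have no column.
column_coverage = (
    sample[sample["column"] != "N/A"]
    .groupby("column")
    .size()
    .sort_values(ascending=False)
)

print(type_summary)
print(column_coverage)
```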


## 5. Extract Python Code from Expectations

Many expectations include embedded Python code in their metadata showing how to implement them.


In [16]:
import re

def extract_python_code(suite_data, file_name, suite_name):
    """Extract Python code from expectation notes."""
    expectations = suite_data.get('expectations', [])
    
    code_list = []
    for idx, expectation in enumerate(expectations):
        meta = expectation.get('meta', {})
        notes = meta.get('notes', {})
        content = notes.get('content', [])
        
        # Extract Python code from markdown content
        python_code = None
        execution_engine = None
        
        if content and len(content) > 0:
            markdown_text = content[0] if isinstance(content, list) else content
            
            # Extract Python code block
            code_match = re.search(r'```python\n(.*?)\n```', markdown_text, re.DOTALL)
            if code_match:
                python_code = code_match.group(1).strip()
            
            # Extract execution engine
            engine_match = re.search(r'\*\*Execution Engine:\*\* (.+?)(?:\n|$)', markdown_text)
            if engine_match:
                execution_engine = engine_match.group(1).strip()
        
        if python_code:
            code_list.append({
                'file_name': file_name,
                'suite_name': suite_name,
                'expectation_index': idx,
                'expectation_type': expectation.get('expectation_type', 'N/A'),
                'column': expectation.get('kwargs', {}).get('column', 'N/A'),
                'python_code': python_code,
                'execution_engine': execution_engine or 'N/A',
                'has_code': True
            })
    
    return code_list

# Extract Python code from all expectations
all_code = []
for suite in expectation_suites:
    suite_name = suite['data'].get('expectation_suite_name', 'N/A')
    code = extract_python_code(suite['data'], suite['file_name'], suite_name)
    all_code.extend(code)

code_df = pd.DataFrame(all_code)
print(f"\n🐍 Expectations with Python Code: {len(code_df)}")
if len(code_df) > 0:
    # Create a preview column for the table
    display_df = code_df.copy()
    display_df['code_preview'] = display_df['python_code'].str[:60] + '...'
    
    print(f"\nSample Python Code from Expectations:")
    display(display_df[['suite_name', 'expectation_type', 'column', 'execution_engine', 'code_preview']].head(10))
    
    # Show multiple full examples
    print("\n" + "="*100)
    print("📝 FULL PYTHON CODE EXAMPLES:")
    print("="*100)
    
    for i in range(min(5, len(code_df))):
        print(f"\n[{i+1}] Suite: {code_df.iloc[i]['suite_name']}")
        print(f"    Expectation: {code_df.iloc[i]['expectation_type']}")
        print(f"    Column: {code_df.iloc[i]['column']}")
        print(f"    Engine: {code_df.iloc[i]['execution_engine']}")
        print(f"\n    Python Code:")
        print(f"    {code_df.iloc[i]['python_code']}")
        print("-" * 100)
else:
    print("\nNo Python code found in expectation metadata.")



🐍 Expectations with Python Code: 211

Sample Python Code from Expectations:


Unnamed: 0,suite_name,expectation_type,column,execution_engine,code_preview
0,nyc_taxi_data_onboarding_suite_final,expect_table_row_count_to_be_between,,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_table_row_count_to_be_between(max_value=200...
1,nyc_taxi_data_onboarding_suite_final,expect_table_columns_to_match_set,,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_table_columns_to_match_set(exact_match=None...
2,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,index,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='index'...
3,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,passenger_count,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='passen...
4,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,trip_distance,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='trip_d...
5,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,store_and_fwd_flag,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='store_...
6,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,payment_type,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='paymen...
7,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,fare_amount,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='fare_a...
8,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,extra,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='extra'...
9,nyc_taxi_data_onboarding_suite_final,expect_column_values_to_not_be_null,mta_tax,PostgreSQL (SQL) - Generated by Data Assistant,validator.expect_column_values_to_not_be_null(column='mta_ta...



📝 FULL PYTHON CODE EXAMPLES:

[1] Suite: nyc_taxi_data_onboarding_suite_final
    Expectation: expect_table_row_count_to_be_between
    Column: N/A
    Engine: PostgreSQL (SQL) - Generated by Data Assistant

    Python Code:
    validator.expect_table_row_count_to_be_between(max_value=20000, min_value=20000)
----------------------------------------------------------------------------------------------------

[2] Suite: nyc_taxi_data_onboarding_suite_final
    Expectation: expect_table_columns_to_match_set
    Column: N/A
    Engine: PostgreSQL (SQL) - Generated by Data Assistant

    Python Code:
    validator.expect_table_columns_to_match_set(exact_match=None, column_set=['tip_amount', 'dropoff_location_id', 'improvement_surcharge', 'congestion_surcharge', 'tolls_amount', 'index', 'rate_code_id', 'extra', 'pickup_location_id', 'trip_distance', 'vendor_id', 'payment_type', 'store_and_fwd_flag', 'pickup', 'mta_tax', 'dropoff', 'passenger_count', 'total_amount', 'fare_amount'])
--------

## View All Python Code for a Specific Suite

You can filter and view all Python code for any suite.


In [17]:
# Choose a suite to view all its Python code
if len(code_df) > 0:
    available_suites = code_df['suite_name'].unique()
    print(f"Available suites with Python code: {len(available_suites)}")
    print(", ".join(available_suites[:5]), "...\n")
    
    # Example: Show all code for the first suite
    example_suite = available_suites[0]
    suite_code = code_df[code_df['suite_name'] == example_suite]
    
    print(f"\n{'='*100}")
    print(f"ALL PYTHON CODE FOR SUITE: {example_suite}")
    print(f"{'='*100}")
    print(f"\nTotal expectations with code: {len(suite_code)}\n")
    
    for idx, row in suite_code.iterrows():
        print(f"[{row['expectation_index'] + 1}] {row['expectation_type']}")
        if row['column'] != 'N/A':
            print(f"    Column: {row['column']}")
        print(f"    Engine: {row['execution_engine']}")
        print(f"\n    {row['python_code']}\n")
        print("-" * 100)


Available suites with Python code: 4
nyc_taxi_data_onboarding_suite_final, TRANSACTIONS_expectation_suite, Housing_onboarding_suite_final, nyc_taxi_data_expectation_suite ...


ALL PYTHON CODE FOR SUITE: nyc_taxi_data_onboarding_suite_final

Total expectations with code: 132

[1] expect_table_row_count_to_be_between
    Engine: PostgreSQL (SQL) - Generated by Data Assistant

    validator.expect_table_row_count_to_be_between(max_value=20000, min_value=20000)

----------------------------------------------------------------------------------------------------
[2] expect_table_columns_to_match_set
    Engine: PostgreSQL (SQL) - Generated by Data Assistant

    validator.expect_table_columns_to_match_set(exact_match=None, column_set=['tip_amount', 'dropoff_location_id', 'improvement_surcharge', 'congestion_surcharge', 'tolls_amount', 'index', 'rate_code_id', 'extra', 'pickup_location_id', 'trip_distance', 'vendor_id', 'payment_type', 'store_and_fwd_flag', 'pickup', 'mta_tax', 'dropoff', '
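
Because each extracted snippet is a single method call with literal arguments, it can be turned back into structured data without executing it. The sketch below uses Python's standard `ast` module for that; `parse_validator_call` is a hypothetical helper, and it assumes every keyword value is a plain literal (numbers, strings, lists, `None`), which holds for the examples shown above but may not hold for every suite.

```python
import ast


def parse_validator_call(code):
    """Return (method_name, kwargs) for one `validator.expect_*(...)` call with literal arguments."""
    node = ast.parse(code.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
        raise ValueError("expected a single method call such as validator.expect_*(...)")
    # literal_eval accepts AST nodes and refuses anything that is not a plain literal.
    kwargs = {
        kw.arg: ast.literal_eval(kw.value)
        for kw in node.keywords
        if kw.arg is not None
    }
    return node.func.attr, kwargs


# Code string copied from the output above.
name, kwargs = parse_validator_call(
    "validator.expect_table_row_count_to_be_between(max_value=20000, min_value=20000)"
)
print(name)
print(kwargs)
```

Parsing rather than calling `eval` keeps the notebook from running arbitrary code stored in a JSON file.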

## 6. Generate Descriptions Using Ollama LLM

Use an Ollama-hosted model (Ollama Cloud by default, configured through environment variables) to generate human-readable descriptions for each expectation.


In [23]:
import requests
import os
import time
from dotenv import load_dotenv

# Load environment variables
load_dotenv('/Users/yavin/python_projects/ollama_jupyter/.env')

# Get cloud configuration from environment
OLLAMA_BASE_URL = os.getenv("OLLAMA_CLOUD_BASE_URL", "https://ollama.com")
OLLAMA_MODEL = os.getenv("OLLAMA_CLOUD_MODEL", "gpt-oss:20b")
OLLAMA_API_KEY = os.getenv("OLLAMA_API_KEY", "")

print(f"🤖 Using Ollama Cloud")
print(f"   Base URL: {OLLAMA_BASE_URL}")
print(f"   Model: {OLLAMA_MODEL}")
print(f"   API Key: {'✓ Set' if OLLAMA_API_KEY else '✗ Not Set'}")

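def generate_expectation_description(row):
    """
    Generate a human-readable description for an expectation using the Ollama Cloud API.

    Tries each endpoint and payload format in turn and returns the first non-empty
    answer; falls back to a templated description if none succeeds.
    """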
    # Build context from the row
    expectation_type = row['expectation_type']
    column = row.get('column', 'N/A')
    min_value = row.get('min_value', 'N/A')
    max_value = row.get('max_value', 'N/A')
    mostly = row.get('mostly', 'N/A')
    value_set_count = row.get('value_set_count', 0)
    
    # Create a concise prompt
    prompt = f"""Generate a brief, business-friendly description (max 2 sentences) for this data quality check:

Expectation Type: {expectation_type}
Column: {column}
Min Value: {min_value}
Max Value: {max_value}
Mostly Threshold: {mostly}
Value Set Count: {value_set_count}

Description:"""
    
    try:
        # Try different API endpoints
        endpoints_to_try = [
            f"{OLLAMA_BASE_URL}/api/chat",
            f"{OLLAMA_BASE_URL}/api/generate",
            f"{OLLAMA_BASE_URL}/v1/chat/completions"
        ]
        
        for endpoint in endpoints_to_try:
            try:
                print(f"  → Trying endpoint: {endpoint}")
                
                headers = {
                    "Authorization": f"Bearer {OLLAMA_API_KEY}",
                    "Content-Type": "application/json"
                }
                
                # Try different payload formats
                payloads_to_try = [
                    # Format 1: Standard chat format
                    {
                        "model": OLLAMA_MODEL,
                        "messages": [{"role": "user", "content": prompt}],
                        "stream": False,
                        "options": {"temperature": 0.3, "num_predict": 100}
                    },
                    # Format 2: Generate format
                    {
                        "model": OLLAMA_MODEL,
                        "prompt": prompt,
                        "stream": False,
                        "options": {"temperature": 0.3, "num_predict": 100}
                    },
                    # Format 3: OpenAI-compatible format
                    {
                        "model": OLLAMA_MODEL,
                        "messages": [{"role": "user", "content": prompt}],
                        "max_tokens": 100,
                        "temperature": 0.3
                    }
                ]
                
                for payload in payloads_to_try:
                    try:
                        print(f"    → Trying payload format...")
                        response = requests.post(endpoint, json=payload, headers=headers, timeout=30)
                        
                        print(f"    → Status code: {response.status_code}")
                        
                        if response.status_code == 200:
                            result = response.json()
                            print(f"    → Response keys: {list(result.keys())}")
                            
                            # Extract description from different response formats
                            description = None
                            if 'message' in result and 'content' in result['message']:
                                description = result['message']['content'].strip()
                            elif 'response' in result:
                                description = result['response'].strip()
                            elif 'choices' in result and len(result['choices']) > 0:
                                description = result['choices'][0]['message']['content'].strip()
                            
                            if description:
                                # Remove any markdown or extra formatting
                                description = description.replace('**', '').replace('*', '').strip()
                                print(f"    → Generated: {description[:80]}...")
                                return description
                        
                        elif response.status_code == 502:
                            print(f"    → 502 Bad Gateway - trying next format...")
                            continue
                        else:
                            print(f"    → Error {response.status_code}: {response.text[:100]}")
                            
                    except Exception as e:
                        print(f"    → Payload error: {e}")
                        continue
                
            except Exception as e:
                print(f"  → Endpoint error: {e}")
                continue
        
        # If all attempts failed, return fallback
        print(f"  ✗ All API attempts failed")
        return f"Validates {expectation_type.replace('expect_', '').replace('_', ' ')} for column '{column}'"
    
    except Exception as e:
        print(f"  ✗ Unexpected error: {type(e).__name__}: {e}")
        return f"Validates {expectation_type.replace('expect_', '').replace('_', ' ')} for column '{column}'"

print("\n✨ Ready to generate descriptions using Ollama Cloud API with retry logic...")


🤖 Using Ollama Cloud
   Base URL: https://ollama.com
   Model: gpt-oss:20b
   API Key: ✓ Set

✨ Ready to generate descriptions using Ollama Cloud API with retry logic...


In [24]:
# Generate descriptions for a sample (first 10 expectations)
# You can increase this or apply to all expectations

if len(expectations_df) > 0:
    # Start with a smaller sample to test
    sample_size = min(10, len(expectations_df))
    sample_df = expectations_df.head(sample_size).copy()
    
    print(f"Generating descriptions for {sample_size} expectations...")
    print("This may take a few moments...\n")
    
    # Generate descriptions
    descriptions = []
    for idx, row in sample_df.iterrows():
        desc = generate_expectation_description(row)
        descriptions.append(desc)
        print(f"[{len(descriptions)}/{sample_size}] Generated for: {row['expectation_type']}")
        
        # Add delay between requests to avoid rate limiting
        if len(descriptions) < sample_size:
            print("  → Waiting 2 seconds before next request...")
            time.sleep(2)
    
    # Add descriptions to the dataframe
    sample_df['description'] = descriptions
    
    print(f"\n✅ Generated {len(descriptions)} descriptions!\n")
    print("="*100)
    print("EXPECTATIONS WITH LLM-GENERATED DESCRIPTIONS:")
    print("="*100)
    
    # Display the results
    display(sample_df[['suite_name', 'expectation_type', 'column', 'description']].head(10))
    
else:
    print("No expectations found to generate descriptions for.")


Generating descriptions for 10 expectations...
This may take a few moments...

  → Trying endpoint: https://ollama.com/api/chat
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'message', 'done', 'total_duration', 'prompt_eval_count', 'eval_count']
    → Trying payload format...
    → Status code: 502
    → 502 Bad Gateway - trying next format...
    → Trying payload format...
    → Status code: 200
    → Payload error: Extra data: line 2 column 1 (char 143)
  → Trying endpoint: https://ollama.com/api/generate
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'response', 'done', 'done_reason']
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'response', 'thinking', 'done', 'total_duration', 'prompt_eval_count', 'eval_count']
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'response', 'done

Unnamed: 0,suite_name,expectation_type,column,description
0,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TRANSACTION_ID,Validates column values to not be null for column 'TRANSACTION_ID'
1,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_ID,Validates column values to not be null for column 'CUSTOMER_ID'
2,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PRODUCT_ID,Validates column values to not be null for column 'PRODUCT_ID'
3,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,AMOUNT,Validates column values to not be null for column 'AMOUNT'
4,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,QUANTITY,Validates column values to not be null for column 'QUANTITY'
5,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TIMESTAMP,Validates column values to not be null for column 'TIMESTAMP'
6,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PAYMENT_METHOD,Validates column values to not be null for column 'PAYMENT_METHOD'
7,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,REGION,Validates column values to not be null for column 'REGION'
8,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,IS_FRAUDULENT,Validates column values to not be null for column 'IS_FRAUDULENT'
9,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_AGE,Validates column values to not be null for column 'CUSTOMER_AGE'
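
The log above contains `Payload error: Extra data: line 2 column 1`. That is the message `json.loads` produces when a body holds more than one JSON document, which is what a newline-delimited streamed reply looks like. One plausible cause, stated here as an assumption, is that the third payload omits `"stream": False` while Ollama's native endpoints stream by default. The sketch below shows how such a body could be joined; the two-chunk `body` is hypothetical and real chunks carry more fields.

```python
import json


def join_streamed_chat(body):
    """Concatenate `message.content` from a newline-delimited JSON chat reply."""
    parts = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between chunks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
    return "".join(parts)


# Hypothetical two-chunk body, one JSON document per line.
body = (
    '{"message": {"role": "assistant", "content": "Checks that "}, "done": false}\n'
    '{"message": {"role": "assistant", "content": "AMOUNT is never null."}, "done": true}\n'
)

try:
    json.loads(body)  # treating the whole body as one document fails
except json.JSONDecodeError as exc:
    print("single-document parse fails:", exc.msg)

print(join_streamed_chat(body))
```

Adding `"stream": False` to every payload is likely the simpler fix; the line-by-line parser is only needed if streaming is wanted.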


## Optional: Generate Descriptions for ALL Expectations

**Note:** This cell is optional and slow: the first full run took 76 minutes (see the comment in the cell).
Skip it unless you need descriptions for all expectations.
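
One way to shorten a full run is to call the model once per distinct expectation and reuse the answer for repeats. This is a sketch under assumptions: `describe_with_cache` and `fake_generate` are hypothetical names, the choice of key fields is a guess at what makes two expectations equivalent, and the saving depends on how many expectations actually repeat across suites.

```python
def describe_with_cache(
    rows,
    generate,
    key_fields=("expectation_type", "column", "min_value", "max_value", "mostly"),
):
    """Call `generate(row)` once per distinct combination of `key_fields` and reuse the result."""
    cache = {}
    descriptions = []
    for row in rows:
        # repr() keeps the key hashable even if a field holds a list.
        key = tuple(repr(row.get(field)) for field in key_fields)
        if key not in cache:
            cache[key] = generate(row)
        descriptions.append(cache[key])
    return descriptions, len(cache)


# Stand-in generator so the sketch runs offline.
calls = []


def fake_generate(row):
    calls.append(row["column"])
    return f"Checks {row['column']} via {row['expectation_type']}"


rows = [
    {"expectation_type": "expect_column_values_to_not_be_null", "column": "AMOUNT", "mostly": 1.0},
    {"expectation_type": "expect_column_values_to_not_be_null", "column": "AMOUNT", "mostly": 1.0},
    {"expectation_type": "expect_column_values_to_not_be_null", "column": "REGION", "mostly": 1.0},
]
descriptions, api_calls = describe_with_cache(rows, fake_generate)
print(api_calls, "calls for", len(descriptions), "rows")
```

With the real data, `rows` could be `[row for _, row in expectations_df.iterrows()]` and `generate` could be `generate_expectation_description`.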


In [None]:
# Generates descriptions for ALL expectations.
# WARNING: This is slow; the first full run took 76 minutes.

if len(expectations_df) > 0:
    print(f"Generating descriptions for ALL {len(expectations_df)} expectations...")
    print("Progress will be displayed every 10 expectations...\n")
    
    expectations_with_desc = expectations_df.copy()
    descriptions = []
    
    for idx, row in expectations_with_desc.iterrows():
        desc = generate_expectation_description(row)
        descriptions.append(desc)
        
        # Show progress every 10 expectations
        if (len(descriptions) % 10 == 0):
            print(f"[{len(descriptions)}/{len(expectations_df)}] Processed {len(descriptions)} expectations...")
        
        # Add delay between requests to avoid rate limiting
        if len(descriptions) < len(expectations_df):
            print("  → Waiting 2 seconds before next request...")
            time.sleep(2)
    
    expectations_with_desc['description'] = descriptions
    
    print(f"\n✅ Generated {len(descriptions)} descriptions!\n")
    
    # Display sample
    display(expectations_with_desc[['suite_name', 'expectation_type', 'column', 'description']].head(20))
    
    # Save to CSV. This did not work as expected on the first run (run time: 76 minutes).
    # to_csv does not create missing directories, so create the target folder first.
    # output_path = Path('/Users/yavin/python_projects/ollama_jupyter/notebooks/great_expectations/exports/expectations_with_descriptions.csv')
    # output_path.parent.mkdir(parents=True, exist_ok=True)
    # expectations_with_desc.to_csv(output_path, index=False)
    # print(f"\n💾 Saved to: {output_path}")
else:
    print("No expectations found.")

print("ℹ️  Uncomment the save block above to write the descriptions to CSV.")


Generating descriptions for ALL 387 expectations...
Progress will be displayed every 10 expectations...

  → Trying endpoint: https://ollama.com/api/chat
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'message', 'done', 'total_duration', 'prompt_eval_count', 'eval_count']
    → Trying payload format...
    → Status code: 502
    → 502 Bad Gateway - trying next format...
    → Trying payload format...
    → Status code: 200
    → Payload error: Extra data: line 2 column 1 (char 143)
  → Trying endpoint: https://ollama.com/api/generate
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'response', 'done', 'done_reason']
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'created_at', 'response', 'thinking', 'done', 'total_duration', 'prompt_eval_count', 'eval_count']
    → Trying payload format...
    → Status code: 200
    → Response keys: ['model', 'crea

Unnamed: 0,suite_name,expectation_type,column,description
0,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TRANSACTION_ID,Validates column values to not be null for column 'TRANSACTION_ID'
1,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_ID,Validates column values to not be null for column 'CUSTOMER_ID'
2,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PRODUCT_ID,Validates column values to not be null for column 'PRODUCT_ID'
3,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,AMOUNT,Validates column values to not be null for column 'AMOUNT'
4,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,QUANTITY,Validates column values to not be null for column 'QUANTITY'
5,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TIMESTAMP,Validates column values to not be null for column 'TIMESTAMP'
6,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PAYMENT_METHOD,Validates column values to not be null for column 'PAYMENT_METHOD'
7,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,REGION,Validates column values to not be null for column 'REGION'
8,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,IS_FRAUDULENT,Validates column values to not be null for column 'IS_FRAUDULENT'
9,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_AGE,This check guarantees that every


OSError: Cannot save file into a non-existent directory: '/Users/yavin/python_projects/ollama_jupyter/notebooks/great_expectations/exports'

In [30]:
display(expectations_with_desc[['suite_name', 'expectation_type', 'column', 'description']].head(60))

Unnamed: 0,suite_name,expectation_type,column,description
0,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TRANSACTION_ID,Validates column values to not be null for column 'TRANSACTION_ID'
1,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_ID,Validates column values to not be null for column 'CUSTOMER_ID'
2,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PRODUCT_ID,Validates column values to not be null for column 'PRODUCT_ID'
3,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,AMOUNT,Validates column values to not be null for column 'AMOUNT'
4,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,QUANTITY,Validates column values to not be null for column 'QUANTITY'
5,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,TIMESTAMP,Validates column values to not be null for column 'TIMESTAMP'
6,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,PAYMENT_METHOD,Validates column values to not be null for column 'PAYMENT_METHOD'
7,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,REGION,Validates column values to not be null for column 'REGION'
8,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,IS_FRAUDULENT,Validates column values to not be null for column 'IS_FRAUDULENT'
9,TRANSACTIONS_missingness_suite_final,expect_column_values_to_not_be_null,CUSTOMER_AGE,This check guarantees that every
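
In the table above, nine of the ten visible rows carry the templated fallback even though the logs show HTTP 200 responses. One possible explanation is that those replies had empty content, for example if a reasoning model spent its `num_predict` budget on the `thinking` field that appears in the logged response keys; this is a guess, not something the output confirms. Factoring the response handling into a pure function makes that case easy to test offline. The function name and sample dicts below are hypothetical and only mirror the keys seen in this notebook.

```python
def extract_description(result):
    """Pull the generated text out of a chat-, generate-, or OpenAI-style response dict."""
    text = None
    if isinstance(result.get("message"), dict):
        text = result["message"].get("content")
    elif "response" in result:
        text = result["response"]
    elif result.get("choices"):
        text = result["choices"][0].get("message", {}).get("content")
    if not text or not text.strip():
        return None  # a 200 response with empty content is not a usable description
    # Strip Markdown emphasis markers, as the notebook does.
    return text.replace("**", "").replace("*", "").strip()


chat_style = {"message": {"role": "assistant", "content": "Checks that **AMOUNT** is never null."}}
generate_style = {"response": "", "thinking": "..."}
openai_style = {"choices": [{"message": {"content": "Row count must be 20000."}}]}

for sample in (chat_style, generate_style, openai_style):
    print(extract_description(sample))
```

Returning `None` for empty text lets the caller log "200 but empty" separately from a transport error, which would make runs like the one above easier to diagnose.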


In [31]:
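output_path = Path('/Users/yavin/python_projects/ollama_jupyter/notebooks/great_expectations/notebooks/great_expectations/outputs/expectations_with_descriptions.csv')
# to_csv does not create missing directories, so make sure the target folder exists first
output_path.parent.mkdir(parents=True, exist_ok=True)
expectations_with_desc.to_csv(output_path, index=False)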

In [32]:
ts = pd.read_csv(output_path)
ts.head()

Unnamed: 0,file_name,suite_name,expectation_index,expectation_type,column,min_value,max_value,mostly,value_set_count,has_profiler_details,has_notes,description
0,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,0,expect_column_values_to_not_be_null,TRANSACTION_ID,,,1.0,0,True,False,Validates column values to not be null for column 'TRANSACTION_ID'
1,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,1,expect_column_values_to_not_be_null,CUSTOMER_ID,,,0.9,0,True,False,Validates column values to not be null for column 'CUSTOMER_ID'
2,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,2,expect_column_values_to_not_be_null,PRODUCT_ID,,,1.0,0,True,False,Validates column values to not be null for column 'PRODUCT_ID'
3,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,3,expect_column_values_to_not_be_null,AMOUNT,,,1.0,0,True,False,Validates column values to not be null for column 'AMOUNT'
4,TRANSACTIONS_missingness_suite_final.json,TRANSACTIONS_missingness_suite_final,4,expect_column_values_to_not_be_null,QUANTITY,,,1.0,0,True,False,Validates column values to not be null for column 'QUANTITY'
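
Since the templated fallback always starts with `Validates `, the share of rows that never received an LLM answer can be estimated from the saved CSV. The sketch below is a heuristic under that assumption (a model could also begin a sentence with "Validates"); the sample descriptions are copied from the tables above, the last one truncated as in the source output.

```python
import pandas as pd

# Descriptions copied from the tables above; the last is truncated in the source output.
sample = pd.DataFrame(
    {
        "column": ["TRANSACTION_ID", "CUSTOMER_ID", "CUSTOMER_AGE"],
        "description": [
            "Validates column values to not be null for column 'TRANSACTION_ID'",
            "Validates column values to not be null for column 'CUSTOMER_ID'",
            "This check guarantees that every",
        ],
    }
)

# The fallback string built in generate_expectation_description starts with "Validates ".
sample["is_fallback"] = sample["description"].str.startswith("Validates ")
fallback_share = sample["is_fallback"].mean()
print(f"{fallback_share:.0%} of descriptions are fallbacks")
```

Applied to `ts`, the same two lines would show how much of the 76-minute run produced model output at all.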
