# Test Suite: Schema Creation and Architecture

**Purpose:** Validate the three-tier medallion architecture schema creation

**Scope:**
- Schema existence (bronze, silver, gold)
- Ownership and authorization
- Privilege verification
- Architecture compliance
- Idempotency testing
- No unexpected schemas

**Testing Strategy:**
- Existence validation (all 3 schemas present)
- Naming validation (exact case-sensitive matches)
- Ownership validation (correct user assigned)
- Privilege validation (usage and creation rights)
- Architecture validation (3-tier medallion pattern)
- Isolation validation (no extra schemas)

**Prerequisites:**
- PostgreSQL server running
- sql_retail_analytics_warehouse database exists
- `setup/create_schemas.sql` has been executed
- Connection credentials available
- Required packages: psycopg2, pytest, ipytest, pandas
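
The connection prerequisite can be checked before any test runs. The following is a minimal sketch, not the notebook's own setup: it assumes hypothetical environment variable names `POSTGRES_HOST` and `POSTGRES_USER` (only `POSTGRES_PASSWORD` appears in this notebook), and it fails fast when the password is missing instead of falling back to a placeholder value.

```python
import os


def build_db_config(env=None):
    """Build psycopg2 connection keyword arguments from environment variables.

    POSTGRES_HOST and POSTGRES_USER are hypothetical names chosen for this
    sketch; POSTGRES_PASSWORD is the variable the notebook already reads.
    """
    env = os.environ if env is None else env
    password = env.get("POSTGRES_PASSWORD")
    if not password:
        # Fail early with a clear message instead of attempting a login
        # with a placeholder password.
        raise RuntimeError("POSTGRES_PASSWORD is not set")
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "database": "sql_retail_analytics_warehouse",
        "user": env.get("POSTGRES_USER", "postgres"),
        "password": password,
    }
```

The returned dictionary has the same keys as `DB_CONFIG`, so it could be passed to `psycopg2.connect(**build_db_config())` in the same way.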

## Setup: Import Dependencies & Configure Connection

In [1]:
import os
import psycopg2
from psycopg2 import sql
import pytest
import ipytest
import pandas as pd

# Configure ipytest for notebook usage
ipytest.autoconfig()

# Database connection parameters
DB_CONFIG = {
    'host': 'localhost',
    'database': 'sql_retail_analytics_warehouse',
    'user': 'postgres',
    'password': os.getenv('POSTGRES_PASSWORD', 'your_password_here')
}

# Expected schemas (medallion architecture) - alphabetically sorted
EXPECTED_SCHEMAS = ['bronze', 'gold', 'silver']

print("✅ Dependencies imported successfully")

✅ Dependencies imported successfully


## Fixtures: Database Connections

In [2]:
@pytest.fixture(scope='module')
def db_connection():
    """Connection to sql_retail_analytics_warehouse database."""
    conn = psycopg2.connect(**DB_CONFIG)
    conn.autocommit = True
    yield conn
    conn.close()

@pytest.fixture(scope='module')
def db_cursor(db_connection):
    """Cursor for warehouse database."""
    cursor = db_connection.cursor()
    yield cursor
    cursor.close()

print("✅ Fixtures defined")

✅ Fixtures defined


## Test Suite 1: Schema Existence

**Tests in this suite:**
1. `test_bronze_schema_exists` - Queries information_schema to verify bronze schema exists
2. `test_silver_schema_exists` - Queries information_schema to verify silver schema exists
3. `test_gold_schema_exists` - Queries information_schema to verify gold schema exists
4. `test_all_three_schemas_exist` - Validates all 3 medallion schemas exist together

**How these tests work:**
- **Test 1** queries information_schema for bronze schema:
  - ✅ Success: count = 1 (bronze schema exists)
  - ❌ Failure: count = 0 (schema missing; names are unique per database, so duplicates cannot occur)
  - **Purpose**: Validates raw data layer exists for source system replicas
  - **Bronze layer role**: Stores unprocessed data from CRM and ERP systems
  
- **Test 2** queries information_schema for silver schema:
  - ✅ Success: count = 1 (silver schema exists)
  - ❌ Failure: count = 0 (schema missing)
  - **Purpose**: Validates cleansed data layer exists for data quality processing
  - **Silver layer role**: Stores validated, deduplicated, and type-converted data
  
- **Test 3** queries information_schema for gold schema:
  - ✅ Success: count = 1 (gold schema exists)
  - ❌ Failure: count = 0 (schema missing)
  - **Purpose**: Validates analytics-ready layer exists for business intelligence
  - **Gold layer role**: Stores dimensional models and pre-aggregated metrics
  
- **Test 4** queries all three medallion schemas together:
  - ✅ Success: schemas = ['bronze', 'gold', 'silver'] (alphabetically sorted)
  - ❌ Failure: One or more schemas missing (the `IN` filter already excludes every other schema)
  - **Purpose**: Validates complete 3-tier medallion architecture is in place
  - **Why together**: Ensures no partial deployment (all layers must coexist)
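
The four existence checks above can also be collapsed into one query that reports exactly which tiers are absent. This is a sketch under assumptions: `find_missing_schemas` is a hypothetical helper name, and it relies on psycopg2 adapting a Python list to a PostgreSQL array for the `ANY(%s)` comparison.

```python
def find_missing_schemas(cursor, expected=("bronze", "silver", "gold")):
    """Return the expected schema names that the catalog does not list."""
    # psycopg2 adapts a Python list to a PostgreSQL array, so ANY(%s)
    # matches against every expected name in one round trip.
    cursor.execute(
        """
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name = ANY(%s)
        """,
        (list(expected),),
    )
    found = {row[0] for row in cursor.fetchall()}
    # Preserve the caller's order so messages read bronze, silver, gold.
    return [name for name in expected if name not in found]
```

A test could then assert `find_missing_schemas(db_cursor) == []`, which names the missing tier directly in the failure message.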

**🏗️ Medallion Architecture Pattern:**
The 3-tier architecture separates data processing concerns:
- **Bronze**: Raw data ingestion (no transformations)
- **Silver**: Data quality and standardization
- **Gold**: Business-ready analytics models

This separation enables:
- Independent layer evolution
- Clear data lineage tracking
- Incremental quality improvements
- Simplified debugging and auditing

In [3]:
%%ipytest -vv

def test_bronze_schema_exists(db_cursor):
    """Verify bronze schema exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'bronze'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Bronze schema must exist"

def test_silver_schema_exists(db_cursor):
    """Verify silver schema exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'silver'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Silver schema must exist"

def test_gold_schema_exists(db_cursor):
    """Verify gold schema exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'gold'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Gold schema must exist"

def test_all_three_schemas_exist(db_cursor):
    """Verify all three medallion schemas exist together."""
    db_cursor.execute("""
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
        ORDER BY schema_name
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    assert schemas == EXPECTED_SCHEMAS, \
        f"Expected {EXPECTED_SCHEMAS}, got {schemas}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 4 items

t_218c7ee2481e45818fccc417d2be7161.py::test_bronze_schema_exists PASSED                      [ 25%]
t_218c7ee2481e45818fccc417d2be7161.py::test_silver_schema_exists PASSED                      [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_gold_schema_exists PASSED                        [ 75%]
t_218c7ee2481e45818fccc417d2be7161.py::test_all_three_schemas_exist PASSED                   [100%]

## Test Suite 2: Schema Naming Validation

**Tests in this suite:**
1. `test_schema_names_lowercase` - Verifies all schema names are lowercase (PostgreSQL convention)
2. `test_schema_names_no_special_chars` - Validates schema names contain only alphabetic characters
3. `test_schema_names_exact_match` - Cross-validates schema names from pg_catalog (exact case match)

**How these tests work:**
- **Test 1** checks lowercase convention:
  - Queries all medallion schemas from information_schema
  - Uses Python's `islower()` to validate each schema name
  - ✅ Success: All schemas are lowercase ('bronze', 'silver', 'gold')
  - ❌ Failure: Any schema has uppercase letters (e.g., 'Bronze', 'SILVER')
  - **Purpose**: Ensures PostgreSQL naming best practices (unquoted identifiers are lowercase)
  - **Impact**: A mixed-case name must be double-quoted in every query ("Bronze"); an unquoted reference is folded to lowercase and fails to resolve
  
- **Test 2** validates alphabetic-only naming:
  - Uses Python's `isalpha()` to check for special characters
  - ✅ Success: All schemas contain only letters
  - ❌ Failure: Schema contains numbers, underscores, or special chars (e.g., 'bronze_v2', 'gold$')
  - **Purpose**: Enforces clean naming without version suffixes or special characters
  - **Rationale**: Simple names are easier to type, reference, and maintain
  
- **Test 3** cross-validates from pg_catalog:
  - Queries pg_namespace (system catalog) for authoritative schema names
  - Compares against expected ['bronze', 'gold', 'silver'] (alphabetically sorted)
  - ✅ Success: Exact match on all three schema names
  - ❌ Failure: Name mismatch or case difference
  - **Purpose**: Validates naming consistency across different catalog views
  - **Why pg_catalog**: System catalog is the authoritative source for object names
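
The two string checks described above can be combined into one helper that reports every rule a name breaks. This is an illustrative sketch; `naming_violations` is a hypothetical name, and note that Python's `isalpha()` also accepts non-ASCII letters.

```python
def naming_violations(names):
    """Map each non-compliant schema name to the rules it breaks."""
    violations = {}
    for name in names:
        problems = []
        if not name.islower():
            problems.append("not lowercase")
        if not name.isalpha():
            # Digits, underscores and symbols all fail isalpha().
            problems.append("non-letter characters")
        if problems:
            violations[name] = problems
    return violations
```

For the three medallion schemas the result is an empty dictionary, so a test could simply assert that the returned mapping is empty.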

**📋 Naming Best Practices Enforced:**
- Lowercase only (PostgreSQL standard)
- No special characters (simplicity)
- No version numbers (schemas are environment-specific, not versioned)
- Descriptive names (bronze/silver/gold clearly indicate data quality tier)
- No prefixes/suffixes (avoid schema_bronze, bronze_layer, etc.)

In [4]:
%%ipytest -vv

def test_schema_names_lowercase(db_cursor):
    """Verify schema names are lowercase (PostgreSQL standard)."""
    db_cursor.execute("""
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    
    for schema in schemas:
        assert schema.islower(), f"Schema '{schema}' should be lowercase"

def test_schema_names_no_special_chars(db_cursor):
    """Verify schema names contain only alphabetic characters."""
    db_cursor.execute("""
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    
    for schema in schemas:
        assert schema.isalpha(), \
            f"Schema '{schema}' should contain only letters"

def test_schema_names_exact_match(db_cursor):
    """Verify schema names match exactly (case-sensitive check)."""
    db_cursor.execute("""
        SELECT nspname
        FROM pg_catalog.pg_namespace
        WHERE nspname IN ('bronze', 'silver', 'gold')
        ORDER BY nspname
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    assert schemas == EXPECTED_SCHEMAS, \
        f"Schema names don't match exactly: {schemas}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_218c7ee2481e45818fccc417d2be7161.py::test_schema_names_lowercase PASSED                    [ 33%]
t_218c7ee2481e45818fccc417d2be7161.py::test_schema_names_no_special_chars PASSED             [ 66%]
t_218c7ee2481e45818fccc417d2be7161.py::test_schema_names_exact_match PASSED                  [100%]

## Test Suite 3: Schema Ownership

**Tests in this suite:**
1. `test_bronze_schema_has_owner` - Verifies bronze schema has a valid owner assigned
2. `test_silver_schema_has_owner` - Verifies silver schema has a valid owner assigned
3. `test_gold_schema_has_owner` - Verifies gold schema has a valid owner assigned
4. `test_all_schemas_same_owner` - Validates all medallion schemas share the same owner (consistency)

**How these tests work:**
- **Test 1** queries bronze schema ownership from pg_namespace:
  - Uses `pg_get_userbyid(nspowner)` to convert OID to username
  - ✅ Success: owner is non-NULL and non-empty string
  - ❌ Failure: owner is NULL or empty string
  - **Purpose**: Validates bronze schema has proper ownership for privilege management
  - **Security**: The schema owner can create objects in the schema and drop any object it contains
  
- **Test 2** queries silver schema ownership:
  - Same approach as Test 1 for silver schema
  - ✅ Success: owner is non-NULL and non-empty string
  - ❌ Failure: owner is NULL or empty string
  - **Purpose**: Validates silver schema ownership for ETL pipeline access control
  
- **Test 3** queries gold schema ownership:
  - Same approach as Test 1 for gold schema
  - ✅ Success: owner is non-NULL and non-empty string
  - ❌ Failure: owner is NULL or empty string
  - **Purpose**: Validates gold schema ownership for analytics access control
  
- **Test 4** validates ownership consistency across all schemas:
  - Queries all three schema owners from pg_namespace
  - Creates a set of unique owners
  - ✅ Success: len(set(owners)) == 1 (all schemas have same owner)
  - ❌ Failure: Multiple different owners found
  - **Purpose**: Ensures consistent privilege management across all tiers
  - **Why consistency matters**: 
    - Simplifies permission grants
    - Prevents privilege escalation issues
    - Ensures uniform access control policies
    - Easier to audit and maintain
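
When the consistency check fails, it helps to see which schemas belong to which owner. The sketch below groups the `(nspname, owner)` rows returned by the query in Test 4; the helper names and the `etl_user` role in the usage note are hypothetical.

```python
from collections import defaultdict


def schemas_by_owner(rows):
    """Group (schema_name, owner) rows into {owner: [schema_name, ...]}."""
    grouped = defaultdict(list)
    for schema_name, owner in rows:
        grouped[owner].append(schema_name)
    return dict(grouped)


def ownership_is_consistent(rows):
    """True when every schema in rows has the same single owner."""
    return len(schemas_by_owner(rows)) == 1
```

For example, rows of `("bronze", "postgres")`, `("gold", "etl_user")` and `("silver", "postgres")` group into two owners, which makes the odd schema out visible at a glance.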

**🔐 Ownership Security Model:**
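- **Schema owner** can create objects in the schema and drop any object it contains; reading or altering objects owned by another role still requires grants on those objects
- **Same owner** across tiers simplifies ETL pipeline credentials
- **Owner privileges** include USAGE and CREATE on the schema, plus the right to alter or drop the schema itself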
- **Delegation**: Owner can grant specific privileges to other users/roles
- **Best practice**: Use service account as owner, not individual users

In [5]:
%%ipytest -vv

def test_bronze_schema_has_owner(db_cursor):
    """Verify bronze schema has an owner assigned."""
    db_cursor.execute("""
        SELECT pg_catalog.pg_get_userbyid(nspowner) AS owner
        FROM pg_catalog.pg_namespace
        WHERE nspname = 'bronze'
    """)
    
    owner = db_cursor.fetchone()[0]
    assert owner is not None, "Bronze schema must have an owner"
    assert len(owner) > 0, "Owner name should not be empty"

def test_silver_schema_has_owner(db_cursor):
    """Verify silver schema has an owner assigned."""
    db_cursor.execute("""
        SELECT pg_catalog.pg_get_userbyid(nspowner) AS owner
        FROM pg_catalog.pg_namespace
        WHERE nspname = 'silver'
    """)
    
    owner = db_cursor.fetchone()[0]
    assert owner is not None, "Silver schema must have an owner"
    assert len(owner) > 0, "Owner name should not be empty"

def test_gold_schema_has_owner(db_cursor):
    """Verify gold schema has an owner assigned."""
    db_cursor.execute("""
        SELECT pg_catalog.pg_get_userbyid(nspowner) AS owner
        FROM pg_catalog.pg_namespace
        WHERE nspname = 'gold'
    """)
    
    owner = db_cursor.fetchone()[0]
    assert owner is not None, "Gold schema must have an owner"
    assert len(owner) > 0, "Owner name should not be empty"

def test_all_schemas_same_owner(db_cursor):
    """Verify all medallion schemas have the same owner (consistency)."""
    db_cursor.execute("""
        SELECT
            nspname,
            pg_catalog.pg_get_userbyid(nspowner) AS owner
        FROM pg_catalog.pg_namespace
        WHERE nspname IN ('bronze', 'silver', 'gold')
        ORDER BY nspname
    """)
    
    results = db_cursor.fetchall()
    owners = [row[1] for row in results]
    
    # All owners should be the same
    assert len(set(owners)) == 1, \
        f"All schemas should have same owner, got: {dict(results)}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 4 items

t_218c7ee2481e45818fccc417d2be7161.py::test_bronze_schema_has_owner PASSED                   [ 25%]
t_218c7ee2481e45818fccc417d2be7161.py::test_silver_schema_has_owner PASSED                   [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_gold_schema_has_owner PASSED                     [ 75%]

## Test Suite 4: Schema Privileges

**Tests in this suite:**
1. `test_can_use_bronze_schema` - Validates USAGE privilege on bronze schema (can set search_path)
2. `test_can_create_table_in_bronze` - Validates CREATE privilege on bronze schema (DDL operations)
3. `test_can_create_table_in_silver` - Validates CREATE privilege on silver schema (DDL operations)
4. `test_can_create_table_in_gold` - Validates CREATE privilege on gold schema (DDL operations)

**How these tests work:**
- **Test 1** performs functional USAGE privilege test:
  - Executes `SET search_path TO bronze, public`
  - Queries current search_path setting
  - ✅ Success: 'bronze' appears in search_path
  - ❌ Failure: SQL error or bronze not in search_path
  - **Purpose**: Validates current user can reference objects in bronze schema
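  - **USAGE privilege**: Required to look up objects within a schema
  - **Without USAGE**: Tables exist, but references to them fail with a "permission denied for schema" error
  - **Caveat**: `SET search_path` does not itself check privileges; PostgreSQL accepts names of schemas that are missing or inaccessible, so this step alone cannot prove USAGE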
  
- **Test 2** performs functional CREATE privilege test on bronze:
  1. Creates test table: `bronze.test_privilege_table`
  2. Verifies table exists in information_schema
  3. Cleans up by dropping test table
  - ✅ Success: Table created successfully (count = 1)
  - ❌ Failure: SQL permission error or table not found
  - **Purpose**: Validates current user can create Bronze layer tables (required for ETL ingestion)
  - **Side effects**: Creates and removes test_privilege_table (ephemeral)
  
- **Test 3** performs functional CREATE privilege test on silver:
  - Same approach as Test 2 for silver schema
  - ✅ Success: Can create tables in silver schema
  - ❌ Failure: Permission denied on CREATE
  - **Purpose**: Validates ETL transformations can write to Silver layer
  - **Silver use case**: Data quality scripts must create cleansed tables
  
- **Test 4** performs functional CREATE privilege test on gold:
  - Same approach as Test 2 for gold schema
  - ✅ Success: Can create tables in gold schema
  - ❌ Failure: Permission denied on CREATE
  - **Purpose**: Validates analytics pipelines can write dimensional models
  - **Gold use case**: BI aggregation scripts must create denormalized tables

**🔑 Privilege Hierarchy:**
```
USAGE privilege:
  - Required to access schema at all
  - Allows looking up objects inside the schema by name
  - Does NOT by itself grant privileges on objects within schema
  
CREATE privilege:
  - Allows creating new objects (tables, views, functions)
  - Requires USAGE privilege as prerequisite
  - Schema owner has CREATE by default
  
Object privileges (TABLE):
  - SELECT, INSERT, UPDATE, DELETE
  - Separate from schema-level privileges
  - Granted per table, or in bulk with GRANT ... ON ALL TABLES IN SCHEMA
```
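
The schema-level part of this hierarchy can be read straight from the catalog. The sketch below uses PostgreSQL's two-argument `has_schema_privilege(schema, privilege)` form, which checks the current role; `schema_privileges` is a hypothetical helper name.

```python
def schema_privileges(cursor, schema):
    """Return {'USAGE': bool, 'CREATE': bool} for the current role."""
    cursor.execute(
        """
        SELECT
            has_schema_privilege(%s, 'USAGE'),
            has_schema_privilege(%s, 'CREATE')
        """,
        (schema, schema),
    )
    usage, create = cursor.fetchone()
    return {"USAGE": usage, "CREATE": create}
```

PostgreSQL raises an error if the named schema does not exist, so this check is best run after the existence tests have passed.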

**⚠️ Why Functional Testing:**
- Catalog queries (`has_schema_privilege()`) show theoretical permissions
- Functional tests verify **actual ability** to perform operations
- Catches edge cases: row-level security, connection pooler restrictions
- Tests **effective permissions** (combination of role memberships)
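
A functional probe does not have to leave anything behind. The sketch below is one possible approach, assuming an autocommit connection like the `db_connection` fixture: because DDL is transactional in PostgreSQL, the `CREATE TABLE` runs inside an explicit transaction that is always rolled back. `can_create_table` and the table name `test_privilege_probe` are hypothetical.

```python
ALLOWED_SCHEMAS = ("bronze", "silver", "gold")


def can_create_table(cursor, schema, table="test_privilege_probe"):
    """Try CREATE TABLE inside a transaction that is always rolled back."""
    if schema not in ALLOWED_SCHEMAS or not table.isidentifier():
        # Names are interpolated into SQL below, so accept only known-safe values.
        raise ValueError(f"unexpected schema or table name: {schema}.{table}")
    cursor.execute("BEGIN")
    try:
        cursor.execute(f"CREATE TABLE {schema}.{table} (id integer)")
        return True
    except Exception:
        # A missing privilege surfaces here as a database error.
        return False
    finally:
        # Rolling back discards the probe table whether or not CREATE succeeded.
        cursor.execute("ROLLBACK")
```

Catching every exception keeps the sketch short; a stricter version would catch only the driver's permission error so that unrelated failures still surface.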

In [6]:
%%ipytest -vv

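def test_can_use_bronze_schema(db_cursor):
    """Verify current user has USAGE privilege on bronze schema."""
    # SET search_path succeeds even without USAGE, so check the
    # privilege explicitly for the current role first.
    db_cursor.execute("SELECT has_schema_privilege('bronze', 'USAGE')")
    assert db_cursor.fetchone()[0], "Should have USAGE on bronze schema"
    
    db_cursor.execute("SET search_path TO bronze, public")
    db_cursor.execute("SHOW search_path")
    search_path = db_cursor.fetchone()[0]
    
    # Restore the default so later tests on this connection are unaffected
    db_cursor.execute("RESET search_path")
    
    assert 'bronze' in search_path, "search_path should include bronze"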

def test_can_create_table_in_bronze(db_cursor):
    """Verify current user can create tables in bronze schema."""
    db_cursor.execute("""
        CREATE TABLE IF NOT EXISTS bronze.test_privilege_table (
            id SERIAL PRIMARY KEY,
            test_column TEXT
        )
    """)
    
    try:
        # Verify table was created
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.tables
            WHERE table_schema = 'bronze'
            AND table_name = 'test_privilege_table'
        """)
        
        count = db_cursor.fetchone()[0]
        assert count == 1, "Should be able to create tables in bronze schema"
    finally:
        # Clean up even if the assertion fails
        db_cursor.execute("DROP TABLE IF EXISTS bronze.test_privilege_table")

def test_can_create_table_in_silver(db_cursor):
    """Verify current user can create tables in silver schema."""
    db_cursor.execute("""
        CREATE TABLE IF NOT EXISTS silver.test_privilege_table (
            id SERIAL PRIMARY KEY
        )
    """)
    
    try:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.tables
            WHERE table_schema = 'silver'
            AND table_name = 'test_privilege_table'
        """)
        
        count = db_cursor.fetchone()[0]
        assert count == 1, "Should be able to create tables in silver schema"
    finally:
        # Clean up even if the assertion fails
        db_cursor.execute("DROP TABLE IF EXISTS silver.test_privilege_table")

def test_can_create_table_in_gold(db_cursor):
    """Verify current user can create tables in gold schema."""
    db_cursor.execute("""
        CREATE TABLE IF NOT EXISTS gold.test_privilege_table (
            id SERIAL PRIMARY KEY
        )
    """)
    
    try:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.tables
            WHERE table_schema = 'gold'
            AND table_name = 'test_privilege_table'
        """)
        
        count = db_cursor.fetchone()[0]
        assert count == 1, "Should be able to create tables in gold schema"
    finally:
        # Clean up even if the assertion fails
        db_cursor.execute("DROP TABLE IF EXISTS gold.test_privilege_table")

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 4 items

t_218c7ee2481e45818fccc417d2be7161.py::test_can_use_bronze_schema PASSED                     [ 25%]
t_218c7ee2481e45818fccc417d2be7161.py::test_can_create_table_in_bronze PASSED                [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_can_create_table_in_silver PASSED                [ 75%]

## Test Suite 5: Architecture Compliance

**Tests in this suite:**
1. `test_three_tier_architecture_complete` - Validates exactly 3 medallion schemas exist (no more, no less)
2. `test_schema_order_logical` - Verifies alphabetical ordering matches expected pattern
3. `test_no_missing_tier` - Validates each required tier (bronze/silver/gold) is present

**How these tests work:**
- **Test 1** validates complete architecture:
  - Counts schemas where name IN ('bronze', 'silver', 'gold')
  - ✅ Success: count = 3 (complete 3-tier architecture)
  - ❌ Failure: count < 3 (incomplete deployment); count > 3 cannot occur because schema names are unique within a database
  - **Purpose**: Ensures full medallion architecture is deployed
  - **Critical for ETL**: All three tiers must exist for data flow to work
  - **Prevents partial deployment**: Catches incomplete schema creation scripts
  
- **Test 2** validates logical alphabetical ordering:
  - Queries schemas sorted alphabetically
  - ✅ Success: schemas = ['bronze', 'gold', 'silver'] (alphabetical order)
  - ❌ Failure: Different ordering or missing schemas
  - **Purpose**: Validates schema names follow expected alphabetical pattern
  - **Note**: Alphabetical ≠ logical data flow (bronze→silver→gold)
  - **Why test this**: Ensures no typos in schema names (e.g., 'glod' instead of 'gold')
  
- **Test 3** validates no missing tier:
  - Loops through EXPECTED_SCHEMAS = ['bronze', 'gold', 'silver']
  - Counts each schema individually
  - ✅ Success: Each schema count = 1
  - ❌ Failure: Any schema count = 0 (tier missing)
  - **Purpose**: Ensures each tier exists exactly once
  - **Granular validation**: Identifies which specific tier is missing
  - **Better error messages**: "Tier 'silver' is missing" vs. "Architecture incomplete"
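
Because alphabetical order differs from data-flow order, it can help to keep the flow order explicit in code. This is an illustrative sketch; `FLOW_ORDER`, `upstream_of` and `downstream_of` are hypothetical names that do not appear elsewhere in this notebook.

```python
# Data-flow order of the medallion tiers (not alphabetical order).
FLOW_ORDER = ("bronze", "silver", "gold")


def upstream_of(tier):
    """Return the tiers that feed into `tier`, in flow order."""
    return FLOW_ORDER[: FLOW_ORDER.index(tier)]


def downstream_of(tier):
    """Return the tiers that are rebuilt from `tier`, in flow order."""
    return FLOW_ORDER[FLOW_ORDER.index(tier) + 1 :]
```

Sorting `FLOW_ORDER` yields the alphabetical list that `EXPECTED_SCHEMAS` holds, which is why the two must not be used interchangeably.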

**🏛️ Medallion Architecture Principles:**

**Bronze Layer (Raw Data Zone):**
- Source system replicas without transformations
- Data types match source (often all TEXT/VARCHAR)
- Preserves complete history including bad/duplicate records
- Schema structure mirrors CSV files exactly
- Purpose: Audit trail and reprocessability

**Silver Layer (Cleansed Data Zone):**
- Type-safe conversions (TEXT → INT, DATE, etc.)
- Data quality rules applied (NOT NULL, CHECK constraints)
- Deduplication and validation
- Standardized naming conventions
- Purpose: Trusted data foundation

**Gold Layer (Analytics Zone):**
- Dimensional models (fact/dimension tables)
- Pre-aggregated metrics and KPIs
- Denormalized for query performance
- Business-friendly naming
- Purpose: Fast BI queries and reporting

**🔄 Data Flow Pattern:**
```
CSV Files → Bronze (raw) → Silver (clean) → Gold (analytics)
            ↓              ↓                 ↓
         Ingestion    Validation      Aggregation
         Scripts      Scripts         Scripts
```

**📊 Architecture Benefits:**
- **Separation of concerns**: Each layer has single responsibility
- **Incremental processing**: Can reprocess Silver/Gold without re-ingesting Bronze
- **Data lineage**: Clear path from source to analytics
- **Flexibility**: Can add new transformations without breaking existing layers
- **Rollback capability**: Can recreate downstream layers from upstream data

In [7]:
%%ipytest -vv

def test_three_tier_architecture_complete(db_cursor):
    """Verify complete 3-tier medallion architecture (bronze→silver→gold)."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 3, "Must have exactly 3 medallion schemas"

def test_schema_order_logical(db_cursor):
    """Verify schemas are in logical order: bronze < gold < silver (alphabetically)."""
    db_cursor.execute("""
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
        ORDER BY schema_name
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    
    # When sorted alphabetically: bronze, gold, silver
    assert schemas == ['bronze', 'gold', 'silver'], \
        f"Alphabetical order should be bronze, gold, silver: got {schemas}"

def test_no_missing_tier(db_cursor):
    """Verify no tier is missing from the medallion architecture."""
    for schema in EXPECTED_SCHEMAS:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.schemata
            WHERE schema_name = %s
        """, (schema,))
        
        count = db_cursor.fetchone()[0]
        assert count == 1, f"Tier '{schema}' is missing from architecture"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_218c7ee2481e45818fccc417d2be7161.py::test_three_tier_architecture_complete PASSED          [ 33%]
t_218c7ee2481e45818fccc417d2be7161.py::test_schema_order_logical PASSED                      [ 66%]
t_218c7ee2481e45818fccc417d2be7161.py::test_no_missing_tier PASSED                           [100%]

## Test Suite 6: Isolation and Clean State

**Tests in this suite:**
1. `test_no_unexpected_user_schemas` - Validates only expected schemas exist (warns if extras found)
2. `test_medallion_schemas_empty_or_structured` - Verifies medallion schemas are either empty or contain expected tables

**How these tests work:**
- **Test 1** queries all user schemas (excluding system schemas):
  - Excludes pg_catalog, information_schema, pg_toast (PostgreSQL system schemas)
  - Compares against allowed_schemas = {'bronze', 'silver', 'gold', 'public', 'setup'}
  - ✅ Success: All schemas are in allowed set (or only basic schemas exist)
  - ⚠️ Warning: Prints unexpected schemas but doesn't fail (lenient for dev environments)
  - ❌ Failure: Only fails if NO schemas exist at all (sanity check)
  - **Purpose**: Detects schema pollution or incomplete cleanup
  - **Lenient approach**: Allows experimental schemas during development
  - **Production use**: In production, should only have medallion + public + setup schemas
  
- **Test 2** validates table counts in medallion schemas:
  - Loops through bronze, silver, gold schemas
  - Counts tables in each schema via information_schema
  - ✅ Success: table_count >= 0 (non-negative count)
  - ❌ Failure: Negative count (impossible, would indicate catalog corruption)
  - **Purpose**: Ensures table counts are valid (sanity check)
  - **Expected states**:
    - **Bronze**: May have tables after ingestion (crm_cust_info, erp_CUST_AZ12, etc.)
    - **Silver**: May be empty initially, populated after transformation
    - **Gold**: May be empty initially, populated after aggregation
  - **Not tested**: Specific table names (too rigid for flexible architecture)
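
For environments where extra schemas should fail the run instead of only printing a warning, the comparison in Test 1 could be made strict. The sketch below is one way to do that; `unexpected_schemas` is a hypothetical helper, and it skips every name starting with `pg_` because PostgreSQL reserves that prefix for system schemas such as `pg_temp_N`.

```python
ALLOWED = frozenset({"bronze", "silver", "gold", "public", "setup"})


def unexpected_schemas(found, allowed=ALLOWED):
    """Return sorted schema names that are neither allowed nor system-owned."""
    return sorted(
        name
        for name in found
        if name not in allowed
        and name != "information_schema"
        and not name.startswith("pg_")  # pg_catalog, pg_toast, pg_temp_N, ...
    )
```

A strict test would assert that `unexpected_schemas(schemas)` is empty, while the lenient version above would keep printing the list.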

**🧹 Clean State vs. Working State:**

**Clean State (immediately after setup/create_schemas.sql):**
- Bronze: Empty (no tables)
- Silver: Empty (no tables)
- Gold: Empty (no tables)
- Public: Default PostgreSQL schema (may have extensions)
- Setup: May contain ETL procedures/functions

**Working State (after Bronze layer ingestion):**
- Bronze: 6 tables (crm_cust_info, crm_prd_info, crm_sales_details, erp_CUST_AZ12, erp_LOC_A101, erp_PX_CAT_G1V2)
- Silver: Empty or transforming
- Gold: Empty or aggregating
- Public: Unchanged
- Setup: Unchanged

**⚠️ Troubleshooting Unexpected Schemas:**

Common unexpected schemas and causes:
- `test_*`: Left over from manual testing → DROP if safe
- `temp`, `tmp`: Temporary work schemas → DROP if no active sessions
- `backup_*`: Old backup schemas → Archive or DROP
- `v1`, `v2`: Version-specific schemas → Consolidate or migrate
- User schemas: Personal experimentation → Move to development database

**Cleanup commands** (⚠️ USE WITH CAUTION):
```sql
-- Check schema contents first
SELECT table_name FROM information_schema.tables WHERE table_schema = 'unwanted_schema';

-- Drop empty schema
DROP SCHEMA IF EXISTS unwanted_schema;

-- Drop schema with all contents (⚠️ IRREVERSIBLE)
DROP SCHEMA IF EXISTS unwanted_schema CASCADE;
```

**🔍 Schema Pollution Prevention:**
- Use naming conventions (prefix with username for personal schemas)
- Regular audits (run this test suite weekly)
- CI/CD cleanup (drop test schemas after test runs)
- Documentation (maintain schema registry)

In [8]:
%%ipytest -vv

def test_no_unexpected_user_schemas(db_cursor):
    """Verify no unexpected user schemas exist beyond medallion + setup + public."""
    db_cursor.execute("""
        SELECT schema_name
        FROM information_schema.schemata
        WHERE schema_name NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
        ORDER BY schema_name
    """)
    
    schemas = [row[0] for row in db_cursor.fetchall()]
    
    # Expected: bronze, gold, public, setup (if seed scripts run), silver
    allowed_schemas = {'bronze', 'silver', 'gold', 'public', 'setup'}
    # Ignore PostgreSQL-internal schemas such as pg_temp_N and pg_toast_temp_N
    unexpected = {
        s for s in schemas
        if s not in allowed_schemas and not s.startswith('pg_')
    }
    
    if unexpected:
        print(f"⚠️  Unexpected schemas found: {unexpected}")
        print(f"   All user schemas: {schemas}")
    
    # Informational only: a development database may hold other schemas,
    # so unexpected names are reported without failing the test
    assert len(schemas) > 0, "Should have at least some user schemas"

def test_medallion_schemas_empty_or_structured(db_cursor):
    """Verify medallion schemas are either empty or have expected structure."""
    for schema in EXPECTED_SCHEMAS:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.tables
            WHERE table_schema = %s
        """, (schema,))
        
        table_count = db_cursor.fetchone()[0]
        
        # Bronze may have tables (crm_*, erp_*), others may be empty
        # Just verify count is non-negative
        assert table_count >= 0, f"Table count should be >= 0 for {schema}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_218c7ee2481e45818fccc417d2be7161.py::test_no_unexpected_user_schemas PASSED                [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_medallion_schemas_empty_or_structured PASSED     [100%]





## Test Suite 7: Idempotency Validation

**Tests in this suite:**
1. `test_create_schema_if_not_exists_bronze` - Validates CREATE SCHEMA IF NOT EXISTS is idempotent for bronze
2. `test_create_schema_if_not_exists_silver` - Validates CREATE SCHEMA IF NOT EXISTS is idempotent for silver
3. `test_create_schema_if_not_exists_gold` - Validates CREATE SCHEMA IF NOT EXISTS is idempotent for gold
4. `test_multiple_executions_idempotent` - Validates running create_schemas.sql multiple times is safe

**How these tests work:**
- **Test 1** validates bronze schema idempotency:
  - Executes `CREATE SCHEMA IF NOT EXISTS bronze AUTHORIZATION CURRENT_USER`
  - Counts bronze schemas in information_schema
  - ✅ Success: count = 1 (schema still exists, no duplicate created)
  - ❌ Failure: count ≠ 1 (duplicate created or schema vanished)
  - **Purpose**: Validates schema creation is idempotent (safe to run multiple times)
  - **IF NOT EXISTS**: PostgreSQL feature that skips creation if schema already exists
  - **No error**: Unlike plain CREATE SCHEMA, it raises no error if the schema exists; PostgreSQL emits a NOTICE and skips creation
  
- **Test 2** validates silver schema idempotency:
  - Same approach as Test 1 for silver schema
  - ✅ Success: Silver schema still exists exactly once
  - ❌ Failure: Idempotency violated
  - **Purpose**: Ensures silver layer setup is rerunnable
  
- **Test 3** validates gold schema idempotency:
  - Same approach as Test 1 for gold schema
  - ✅ Success: Gold schema still exists exactly once
  - ❌ Failure: Idempotency violated
  - **Purpose**: Ensures gold layer setup is rerunnable
  
- **Test 4** validates full script idempotency:
  - Executes all three CREATE SCHEMA statements in a loop (3 iterations)
  - Each iteration creates bronze, silver, gold with IF NOT EXISTS
  - Counts total medallion schemas after loop
  - ✅ Success: count = 3 (exactly 3 schemas, no duplicates)
  - ❌ Failure: count ≠ 3 (duplicates created or schemas lost)
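  - **Purpose**: Checks that the CREATE SCHEMA statements expected in `setup/create_schemas.sql` are rerunnable (the test repeats the three statements inline rather than executing the file)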
  - **Real-world scenario**: Setup script accidentally run twice, CI/CD retry, manual rerun

**🔄 Idempotency Principles:**

**What is Idempotency?**
- Operation produces same result regardless of how many times it's executed
- First execution: Creates schema
- Subsequent executions: No-op (do nothing), no errors
- Mathematical property: f(f(x)) = f(x)
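
The property f(f(x)) = f(x) can be illustrated without a database. The sketch below models the schema catalog as a set of names; both function names are hypothetical stand-ins for the two SQL forms, not PostgreSQL APIs.

```python
def create_schema_if_not_exists(catalog, name):
    """Model of CREATE SCHEMA IF NOT EXISTS: a no-op when `name` is present."""
    return catalog | {name}


def create_schema_strict(catalog, name):
    """Model of plain CREATE SCHEMA: fails when `name` is already present."""
    if name in catalog:
        raise ValueError(f'schema "{name}" already exists')
    return catalog | {name}


empty = frozenset()
once = create_schema_if_not_exists(empty, "bronze")
twice = create_schema_if_not_exists(once, "bronze")

# Applying the operation again leaves the state unchanged: f(f(x)) == f(x)
assert once == twice

# The strict variant succeeds once and fails on the second application
strict_once = create_schema_strict(empty, "bronze")
try:
    create_schema_strict(strict_once, "bronze")
except ValueError as exc:
    print(f"Second strict run failed: {exc}")
```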

**Why Idempotency Matters for Data Warehouses:**
1. **Deployment Safety**: Can rerun setup scripts without breaking existing schemas
2. **CI/CD Resilience**: Pipeline failures can safely retry
3. **Manual Recovery**: DBAs can rerun scripts without checking state first
4. **Version Control**: Can apply same SQL file across environments
5. **Disaster Recovery**: Restoration procedures are simpler

**Idempotent vs. Non-Idempotent:**

❌ **Non-Idempotent (Fragile):**
```sql
CREATE SCHEMA bronze;  -- Fails if bronze already exists
```
- First run: ✅ Success
- Second run: ❌ ERROR: schema "bronze" already exists

✅ **Idempotent (Robust):**
```sql
CREATE SCHEMA IF NOT EXISTS bronze;  -- Safe to run multiple times
```
- First run: ✅ Creates schema
- Second run: ✅ Skips creation (no error)
- Third run: ✅ Skips creation (no error)

**🛡️ DevOps Impact:**

**Without Idempotency:**
- Must check if schema exists before creating (complex logic)
- Failed deployments require manual cleanup before retry
- Environment drift (dev/staging/prod inconsistencies)
- Fear of running scripts ("Did I already run this?")

**With Idempotency:**
- Run script whenever needed (no state checking)
- Failed deployments can retry immediately
- Convergent infrastructure (all environments reach same state)
- Confidence in automation ("Safe to run anytime")

**📋 Idempotency Checklist for SQL Scripts:**
- ✅ Use `CREATE SCHEMA IF NOT EXISTS`
- ✅ Use `CREATE TABLE IF NOT EXISTS`
- ✅ Use `INSERT ... ON CONFLICT DO NOTHING`
- ✅ Use `CREATE OR REPLACE FUNCTION/VIEW`
- ✅ Avoid `DROP SCHEMA CASCADE` (destructive)
- ✅ Test by running script 2-3 times in sequence
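
Part of this checklist can be automated with a rough text scan. The sketch below flags `CREATE SCHEMA` and `CREATE TABLE` statements that lack `IF NOT EXISTS`. The function name `find_non_idempotent_statements` is an assumption, and the regex is deliberately simple: it works line by line and does not handle multi-line statements, `CREATE TEMP TABLE`, or `--` inside string literals.

```python
import re

# CREATE SCHEMA / CREATE TABLE not followed by IF NOT EXISTS.
NON_IDEMPOTENT_CREATE = re.compile(
    r"\bCREATE\s+(?:SCHEMA|TABLE)\s+(?!\s*IF\s+NOT\s+EXISTS\b)",
    re.IGNORECASE,
)


def find_non_idempotent_statements(sql_text):
    """Return (line_number, line) pairs for CREATE statements lacking IF NOT EXISTS."""
    findings = []
    for line_number, line in enumerate(sql_text.splitlines(), start=1):
        code = line.split("--", 1)[0]  # drop trailing SQL line comments
        if NON_IDEMPOTENT_CREATE.search(code):
            findings.append((line_number, line.strip()))
    return findings


example_script = """\
CREATE SCHEMA IF NOT EXISTS bronze;
CREATE SCHEMA silver;
CREATE TABLE IF NOT EXISTS bronze.crm_cust_info (id int);
"""

for line_number, statement in find_non_idempotent_statements(example_script):
    print(f"line {line_number}: {statement}")
```

A scan like this complements, but does not replace, the last checklist item: running the script two or three times against a scratch database.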

In [9]:
%%ipytest -vv

def test_create_schema_if_not_exists_bronze(db_cursor):
    """Verify CREATE SCHEMA IF NOT EXISTS is idempotent for bronze."""
    # Try to create again (should succeed without error)
    db_cursor.execute("""
        CREATE SCHEMA IF NOT EXISTS bronze AUTHORIZATION CURRENT_USER
    """)
    
    # Verify still exists
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'bronze'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Bronze schema should still exist after IF NOT EXISTS"

def test_create_schema_if_not_exists_silver(db_cursor):
    """Verify CREATE SCHEMA IF NOT EXISTS is idempotent for silver."""
    db_cursor.execute("""
        CREATE SCHEMA IF NOT EXISTS silver AUTHORIZATION CURRENT_USER
    """)
    
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'silver'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Silver schema should still exist after IF NOT EXISTS"

def test_create_schema_if_not_exists_gold(db_cursor):
    """Verify CREATE SCHEMA IF NOT EXISTS is idempotent for gold."""
    db_cursor.execute("""
        CREATE SCHEMA IF NOT EXISTS gold AUTHORIZATION CURRENT_USER
    """)
    
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name = 'gold'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "Gold schema should still exist after IF NOT EXISTS"

def test_multiple_executions_idempotent(db_cursor):
    """Verify running create_schemas.sql multiple times is safe."""
    # Execute all three CREATE SCHEMA statements multiple times
    for _ in range(3):
        db_cursor.execute("""
            CREATE SCHEMA IF NOT EXISTS bronze AUTHORIZATION CURRENT_USER;
            CREATE SCHEMA IF NOT EXISTS silver AUTHORIZATION CURRENT_USER;
            CREATE SCHEMA IF NOT EXISTS gold AUTHORIZATION CURRENT_USER;
        """)
    
    # Should still have exactly 3 schemas
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.schemata
        WHERE schema_name IN ('bronze', 'silver', 'gold')
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 3, "Should still have exactly 3 schemas after multiple executions"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 4 items

t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_bronze PASSED        [ 25%]
t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_silver PASSED        [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_gold PASSED          [ 75%]
t_218c7ee2481e45818fccc417d2be7161.py::test_multiple_executions_idempotent PASSED            [100%]

## Summary: Run All Tests

**Executes all test suites (24 tests total):**
- Suite 1: Schema Existence (4 tests)
- Suite 2: Schema Naming Validation (3 tests)
- Suite 3: Schema Ownership (4 tests)
- Suite 4: Schema Privileges (4 tests)
- Suite 5: Architecture Compliance (3 tests)
- Suite 6: Isolation and Clean State (2 tests)
- Suite 7: Idempotency Validation (4 tests)

**How this cell works:**
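- Executes `ipytest.run('-vv')`, which runs the pytest functions currently defined in the notebook's global namespace
- With ipytest's default `clean` setting, each `%%ipytest` cell first deletes test functions left by earlier cells, so this run may collect only the most recent suite; calling `ipytest.config(clean=False)` before the suites run is one way to keep them all (an assumption about the installed ipytest version)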
- `-vv` flag provides **very verbose** output showing:
  - Each test function name as it runs
  - PASSED/FAILED status for each test
  - Detailed assertion messages on failure
  - Full traceback on errors
  - Percentage completion progress

**✅ Success Criteria:**
- All 24 tests show `PASSED` status
- Final summary shows: `24 passed in X.XXs`
- No `FAILED`, `ERROR`, or `SKIPPED` statuses
- No warnings about permission issues

**🔧 Troubleshooting Test Failures:**

| Failure Type | Likely Cause | Solution |
|-------------|--------------|----------|
| Schema does not exist | `create_schemas.sql` not run | Execute `setup/create_schemas.sql` |
| Permission denied on CREATE | Insufficient user privileges | As the schema owner or a superuser, grant CREATE to the affected role: `GRANT CREATE ON SCHEMA bronze TO <role>;` |
| Schema not lowercase | Manual schema creation with wrong case | Drop and recreate: `DROP SCHEMA "Bronze" CASCADE; CREATE SCHEMA bronze;` |
| Ownership mismatch | Schemas created by different users | Reassign ownership: `ALTER SCHEMA bronze OWNER TO postgres;` |
| Unexpected schemas | Leftover from previous testing | Drop unwanted schemas: `DROP SCHEMA test_schema CASCADE;` |
| Idempotency failure | Script using CREATE instead of CREATE IF NOT EXISTS | Update script to use IF NOT EXISTS clause |

**📊 Reading Test Output:**
- `[  5%]` indicates progress through test suite
- `PASSED` in green = test succeeded
- `FAILED` in red = assertion failed (see details below)
- `ERROR` in red = test couldn't run (setup issue)
- Final line shows total time - useful for performance tracking
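
When verbose output is captured as text, the color codes show up as raw escape sequences. The sketch below strips them and tallies outcomes per status. The function name `summarize_verbose_output` and the sample string are illustrative assumptions, and the pattern only matches plain test names (no parametrized IDs in brackets).

```python
import re
from collections import Counter

# ANSI color sequences such as ESC[32m (green) and ESC[0m (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# "<file>::<test_name> <STATUS>" as printed by pytest in verbose mode.
STATUS_LINE = re.compile(r"::(\w+)\s+(PASSED|FAILED|ERROR|SKIPPED)\b")


def summarize_verbose_output(text):
    """Count test outcomes in verbose pytest output, ignoring color codes."""
    clean = ANSI_ESCAPE.sub("", text)
    return Counter(status for _name, status in STATUS_LINE.findall(clean))


sample_output = (
    "t_demo.py::test_first \x1b[32mPASSED\x1b[0m [ 50%]\n"
    "t_demo.py::test_second \x1b[31mFAILED\x1b[0m [100%]\n"
)
print(summarize_verbose_output(sample_output))
```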

In [10]:
# Run all tests in this notebook
ipytest.run('-vv')

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 4 items

t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_bronze PASSED        [ 25%]
t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_silver PASSED        [ 50%]
t_218c7ee2481e45818fccc417d2be7161.py::test_create_schema_if_not_exists_gold PASSED          [ 75%]
t_218c7ee2481e45818fccc417d2be7161.py::test_multiple_executions_idempotent PASSED            [100%]

<ExitCode.OK: 0>

## Manual Inspection: Schema Details

**What this cell does:**
1. **Medallion Architecture Schemas** - Displays bronze/silver/gold schema details (owner, description)
2. **Tables per Schema** - Shows table count and table names in each medallion schema
3. **All User Schemas** - Lists all user-created schemas with their purpose
4. **Current User Privileges** - Shows USAGE and CREATE privileges on medallion schemas

Run this cell for a comprehensive visual overview of the schema architecture.

**How this cell works:**

**Step 1: Medallion Architecture Schemas**
- Queries pg_namespace (system catalog) for bronze, silver, gold schemas
- Uses `pg_get_userbyid(nspowner)` to convert owner OID to username
- Retrieves schema descriptions from `obj_description()` (if any comments exist)
- **Output format**: DataFrame with columns [database, schema_name, owner, description]
- **Expected values**:
  - database: 'sql_retail_analytics_warehouse'
  - schema_name: 'bronze', 'gold', 'silver' (alphabetically)
  - owner: Same user for all three schemas (typically 'postgres')
  - description: NULL initially (can be added with COMMENT ON SCHEMA)

**Step 2: Tables per Schema**
- Counts tables in each medallion schema from information_schema.tables
- Aggregates table names into comma-separated string (ordered alphabetically)
- Filters for BASE TABLE type only (excludes views, foreign tables)
- **Output format**: DataFrame with columns [schema_name, table_count, tables]
- **Expected values**:
  - **Clean state**: No rows returned (no tables in any schema)
  - **After Bronze ingestion**: Bronze shows 6 tables (crm_*, erp_*)
  - **After transformations**: Silver/Gold show their respective tables
- **Use case**: Quick overview of data loaded into each layer

**Step 3: All User Schemas**
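- Queries all schemas except `pg_catalog`, `information_schema`, and `pg_toast`; per-session `pg_temp_N` and `pg_toast_temp_N` schemas are not filtered out and appear under 'Other'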
- Categorizes schemas by purpose using CASE statement:
  - 'Medallion Architecture' for bronze/silver/gold
  - 'Default Schema' for public
  - 'Setup Procedures' for setup
  - 'Other' for any additional schemas
- Orders by purpose category, then alphabetically within category
- **Output format**: DataFrame with columns [schema_name, owner, purpose]
- **Expected values**: Medallion schemas first, then public, then setup, then others
- **Use case**: Identify unexpected schemas and their owners for cleanup
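
The categorization and ordering in Step 3 can be mirrored in plain Python, which is handy for checking the expected row order without a database. The names `categorize` and `order_schemas` are hypothetical helpers, not part of the notebook.

```python
MEDALLION_SCHEMAS = {"bronze", "silver", "gold"}

# Sort priority per purpose, mirroring the ORDER BY CASE in the query.
PURPOSE_ORDER = {
    "Medallion Architecture": 1,
    "Default Schema": 2,
    "Setup Procedures": 3,
    "Other": 4,
}


def categorize(schema_name):
    """Return the purpose label assigned by the CASE expression."""
    if schema_name in MEDALLION_SCHEMAS:
        return "Medallion Architecture"
    if schema_name == "public":
        return "Default Schema"
    if schema_name == "setup":
        return "Setup Procedures"
    return "Other"


def order_schemas(schema_names):
    """Sort by purpose category first, then alphabetically within a category."""
    return sorted(schema_names, key=lambda name: (PURPOSE_ORDER[categorize(name)], name))


for name in order_schemas(["setup", "public", "silver", "bronze", "gold", "scratch"]):
    print(f"{name}: {categorize(name)}")
```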

**Step 4: Current User Privileges**
- Uses `HAS_SCHEMA_PRIVILEGE()` to check effective privileges for current session user
- Tests USAGE privilege (can access schema objects)
- Tests CREATE privilege (can create new objects in schema)
- **Output format**: DataFrame with columns [schema_name, can_use, can_create]
- **Expected values**:
  - can_use: TRUE for all medallion schemas
  - can_create: TRUE for all medallion schemas (if user is owner or has grants)
- **Use case**: Diagnose permission issues before running ETL scripts

**Expected console output:**
```
📁 Medallion Architecture Schemas:
[DataFrame with 3 rows: bronze, gold, silver]

📊 Tables per Schema:
[Empty or DataFrame showing table counts]

🗂️  All User Schemas:
[DataFrame showing all user schemas categorized]

🔐 Current User Privileges:
[DataFrame showing TRUE/FALSE for can_use and can_create]

✅ Inspection complete
```

**Use this for:**
- Visual confirmation of all test assertions
- Quick architecture overview (how many tables per layer)
- Permission troubleshooting (verify privileges before ETL)
- Schema cleanup planning (identify unexpected schemas)
- Documentation (screenshot for architecture diagrams)

In [11]:
# Connect to warehouse database
conn = psycopg2.connect(**DB_CONFIG)

# Get comprehensive schema information
df_schemas = pd.read_sql("""
    SELECT
        current_database()                     AS database,
        n.nspname                              AS schema_name,
        pg_catalog.pg_get_userbyid(n.nspowner) AS owner,
        pg_catalog.obj_description(n.oid, 'pg_namespace') AS description
    FROM pg_catalog.pg_namespace AS n
    WHERE n.nspname IN ('bronze', 'silver', 'gold')
    ORDER BY n.nspname
""", conn)

print("\n📁 Medallion Architecture Schemas:")
display(df_schemas)

# Get table counts per schema
df_table_counts = pd.read_sql("""
    SELECT
        table_schema AS schema_name,
        COUNT(*) AS table_count,
        STRING_AGG(table_name, ', ' ORDER BY table_name) AS tables
    FROM information_schema.tables
    WHERE table_schema IN ('bronze', 'silver', 'gold')
    AND table_type = 'BASE TABLE'
    GROUP BY table_schema
    ORDER BY table_schema
""", conn)

print("\n📊 Tables per Schema:")
if len(df_table_counts) > 0:
    display(df_table_counts)
else:
    print("   No tables found in medallion schemas (clean state)")

# Get all user schemas
df_all_schemas = pd.read_sql("""
    SELECT
        n.nspname                              AS schema_name,
        pg_catalog.pg_get_userbyid(n.nspowner) AS owner,
        CASE
            WHEN n.nspname IN ('bronze', 'silver', 'gold') THEN 'Medallion Architecture'
            WHEN n.nspname = 'public' THEN 'Default Schema'
            WHEN n.nspname = 'setup' THEN 'Setup Procedures'
            ELSE 'Other'
        END AS purpose
    FROM pg_catalog.pg_namespace AS n
    WHERE n.nspname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
    ORDER BY
        CASE
            WHEN n.nspname IN ('bronze', 'silver', 'gold') THEN 1
            WHEN n.nspname = 'public' THEN 2
            WHEN n.nspname = 'setup' THEN 3
            ELSE 4
        END,
        n.nspname
""", conn)

print("\n🗂️  All User Schemas:")
display(df_all_schemas)

# Get schema privileges for current user
df_privileges = pd.read_sql("""
    SELECT
        n.nspname AS schema_name,
        HAS_SCHEMA_PRIVILEGE(CURRENT_USER, n.nspname, 'USAGE') AS can_use,
        HAS_SCHEMA_PRIVILEGE(CURRENT_USER, n.nspname, 'CREATE') AS can_create
    FROM pg_catalog.pg_namespace AS n
    WHERE n.nspname IN ('bronze', 'silver', 'gold')
    ORDER BY n.nspname
""", conn)

print("\n🔐 Current User Privileges:")
display(df_privileges)

conn.close()
print("\n✅ Inspection complete")


📁 Medallion Architecture Schemas:


|   | database | schema_name | owner | description |
|---|----------|-------------|-------|-------------|
| 0 | sql_retail_analytics_warehouse | bronze | postgres | |
| 1 | sql_retail_analytics_warehouse | gold | postgres | |
| 2 | sql_retail_analytics_warehouse | silver | postgres | |



📊 Tables per Schema:
   No tables found in medallion schemas (clean state)

🗂️  All User Schemas:


|   | schema_name | owner | purpose |
|---|-------------|-------|---------|
| 0 | bronze | postgres | Medallion Architecture |
| 1 | gold | postgres | Medallion Architecture |
| 2 | silver | postgres | Medallion Architecture |
| 3 | public | pg_database_owner | Default Schema |
| 4 | pg_temp_28 | postgres | Other |
| 5 | pg_temp_66 | postgres | Other |
| 6 | pg_temp_67 | postgres | Other |
| 7 | pg_toast_temp_28 | postgres | Other |
| 8 | pg_toast_temp_66 | postgres | Other |
| 9 | pg_toast_temp_67 | postgres | Other |



🔐 Current User Privileges:


|   | schema_name | can_use | can_create |
|---|-------------|---------|------------|
| 0 | bronze | True | True |
| 1 | gold | True | True |
| 2 | silver | True | True |



✅ Inspection complete


## Architecture Visualization

In [12]:
# Visual representation of the medallion architecture
print("""
╔════════════════════════════════════════════════════════════╗
║           THREE-TIER MEDALLION ARCHITECTURE                ║
╠════════════════════════════════════════════════════════════╣
║                                                            ║
║  ┌──────────────────────────────────────────────────┐    ║
║  │  BRONZE LAYER (Raw Data)                         │    ║
║  │  - Source system replicas (CRM, ERP)             │    ║
║  │  - No transformations                             │    ║
║  │  - Exact CSV mappings                             │    ║
║  └──────────────────────────────────────────────────┘    ║
║                          ↓                                ║
║  ┌──────────────────────────────────────────────────┐    ║
║  │  SILVER LAYER (Cleansed & Conformed)             │    ║
║  │  - Data quality rules applied                     │    ║
║  │  - Type conversions & validation                  │    ║
║  │  - Deduplication                                  │    ║
║  └──────────────────────────────────────────────────┘    ║
║                          ↓                                ║
║  ┌──────────────────────────────────────────────────┐    ║
║  │  GOLD LAYER (Analytics Ready)                     │    ║
║  │  - Dimensional models                             │    ║
║  │  - Pre-aggregated metrics                         │    ║
║  │  - Denormalized for BI tools                      │    ║
║  └──────────────────────────────────────────────────┘    ║
║                                                            ║
╚════════════════════════════════════════════════════════════╝
""")


╔════════════════════════════════════════════════════════════╗
║           THREE-TIER MEDALLION ARCHITECTURE                ║
╠════════════════════════════════════════════════════════════╣
║                                                            ║
║  ┌──────────────────────────────────────────────────┐    ║
║  │  BRONZE LAYER (Raw Data)                         │    ║
║  │  - Source system replicas (CRM, ERP)             │    ║
║  │  - No transformations                             │    ║
║  │  - Exact CSV mappings                             │    ║
║  └──────────────────────────────────────────────────┘    ║
║                          ↓                                ║
║  ┌──────────────────────────────────────────────────┐    ║
║  │  SILVER LAYER (Cleansed & Conformed)             │    ║
║  │  - Data quality rules applied                     │    ║
║  │  - Type conversions & validation                  │    ║
║  │  - Deduplication                                  │    ║
║  └─────