# Test Suite: public.etl_config Configuration Table

**Purpose:** Validate the structure, seeding, and behavior of the `public.etl_config` table.

**Scope:**
- Table existence and schema placement
- Column definitions (names, types, constraints)
- Required configuration keys are seeded
- Path convention compliance (no trailing slashes)
- Idempotency of re-runs
- Primary key constraint enforcement

**Testing Strategy:**
- Structural validation (table, columns, constraints)
- Data validation (required keys, value formats)
- Behavioral testing (idempotency, conflict handling)
- Integration testing (path resolution)

**Prerequisites:**
- ✅ Database `sql_retail_analytics_warehouse` exists (run `setup/create_db.sql`)
- ✅ Database connection configured (credentials in `.env` file)
- ⚠️ **REQUIRED:** `setup/seed/01_etl_config.sql` must be executed BEFORE running these tests
- ✅ Required packages installed: psycopg2, pytest, ipytest, pandas, python-dotenv

**⚠️ If tests fail with "table must exist":**
Run this first in `psql` (`\c` and `\i` are psql meta-commands and are not understood by other clients):
```sql
\c sql_retail_analytics_warehouse
\i setup/seed/01_etl_config.sql
```

Or use the diagnostic cell below to check table existence.

## Setup: Import Dependencies & Configure Connection

### 🔐 Database Password Configuration

**Option 1: Environment Variable (Recommended)**
```powershell
# Set for current session
$env:POSTGRES_PASSWORD = "your_actual_password"

# Set permanently (requires new terminal)
[System.Environment]::SetEnvironmentVariable('POSTGRES_PASSWORD', 'your_actual_password', 'User')
```

**Option 2: Update Default in Code**
- Edit the cell below and change `'postgres'` to your actual password
- ⚠️ **Do NOT commit your password to git!**

**Option 3: Create .env file (Best for team projects)**
- Create `.env` file in project root (already in .gitignore)
- Add: `POSTGRES_PASSWORD=your_actual_password`
- Install python-dotenv: `pip install python-dotenv`
- Load in notebook: `from dotenv import load_dotenv; load_dotenv()`
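
The lookup order behind these options can be sketched in a few lines. This is a minimal illustration, not the notebook's own code: `resolve_password` is a hypothetical helper, and it accepts the environment as a plain mapping so it can be tried without touching real credentials. If a `.env` file is used, `load_dotenv()` would need to run first so that the file's entries are present in `os.environ`.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_password(env: Optional[Mapping[str, str]] = None,
                     default: str = "postgres") -> Tuple[str, str]:
    """Return (password, source), preferring POSTGRES_PASSWORD over the default.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get("POSTGRES_PASSWORD")
    if value:  # an empty string is treated the same as "not set"
        return value, "environment"
    return default, "default"


# Example with an explicit mapping instead of the real environment
password, source = resolve_password({"POSTGRES_PASSWORD": "s3cret"})
print(f"Password loaded from: {source}")
```

Reporting the source alongside the value makes it obvious when the insecure default is in use.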

In [1]:
import os
import psycopg2
from psycopg2 import sql
import pytest
import ipytest
import pandas as pd
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Configure ipytest for notebook usage
ipytest.autoconfig()

# Database connection parameters
DB_CONFIG = {
    'host': os.getenv('POSTGRES_HOST', 'localhost'),
    'database': os.getenv('POSTGRES_DB', 'sql_retail_analytics_warehouse'),
    'user': os.getenv('POSTGRES_USER', 'postgres'),
    'password': os.getenv('POSTGRES_PASSWORD', 'postgres')
}

print("✅ Dependencies imported successfully")
print(f"🔌 Connecting to: {DB_CONFIG['host']}/{DB_CONFIG['database']} as {DB_CONFIG['user']}")
print(f"🔑 Password loaded from: {'.env file' if os.getenv('POSTGRES_PASSWORD') else 'default value (update .env file!)'}")

✅ Dependencies imported successfully
🔌 Connecting to: localhost/sql_retail_analytics_warehouse as postgres
🔑 Password loaded from: .env file


## 🔍 Diagnostic: Check Prerequisites

**Run this cell first to verify prerequisites are met**

In [2]:
import psycopg2
import pandas as pd

try:
    # Try to connect
    conn = psycopg2.connect(**DB_CONFIG)
    print("✅ Database connection successful")
    
    cursor = conn.cursor()
    
    # Check if etl_config table exists
    cursor.execute("""
        SELECT COUNT(*) 
        FROM information_schema.tables 
        WHERE table_schema = 'public' 
          AND table_name = 'etl_config'
    """)
    table_exists = cursor.fetchone()[0]
    
    if table_exists == 1:
        print("✅ public.etl_config table exists")
        
        # Show current data
        cursor.execute("SELECT * FROM public.etl_config ORDER BY config_key")
        rows = cursor.fetchall()
        if rows:
            print(f"✅ Table has {len(rows)} row(s) of data:")
            for row in rows:
                print(f"   - {row[0]}: {row[1]}")
        else:
            print("⚠️  Table exists but is EMPTY - run setup/seed/01_etl_config.sql")
    else:
        print("❌ public.etl_config table DOES NOT EXIST")
        print("\n🔧 To fix this, run the following SQL script:")
        print("   1. Open your PostgreSQL client (psql, pgAdmin, DBeaver, etc.)")
        print("   2. Connect to database: sql_retail_analytics_warehouse")
        print("   3. Execute: setup/seed/01_etl_config.sql")
        print("\n   Example using psql:")
        print("   psql -U postgres -d sql_retail_analytics_warehouse -f setup/seed/01_etl_config.sql")
    
    cursor.close()
    conn.close()
    
except psycopg2.OperationalError as e:
    print(f"❌ Database connection FAILED: {e}")
    print("\n🔧 To fix this:")
    print("   1. Verify PostgreSQL server is running")
    print("   2. Check credentials in .env file")
    print("   3. Verify database exists (run setup/create_db.sql if needed)")
except Exception as e:
    print(f"❌ Unexpected error: {e}")

✅ Database connection successful
✅ public.etl_config table exists
✅ Table has 2 row(s) of data:
   - base_path_crm: C:/Users/Laurent/Studies/sql-ultimate-course/Udemy-SQL-Data-Warehouse-Project/datasets/source_crm
   - base_path_erp: C:/Users/Laurent/Studies/sql-ultimate-course/Udemy-SQL-Data-Warehouse-Project/datasets/source_erp


## Fixtures: Database Connection & Cleanup

In [3]:
@pytest.fixture(scope='module')
def db_connection():
    """Create a database connection for tests."""
    conn = psycopg2.connect(**DB_CONFIG)
    conn.autocommit = True
    yield conn
    conn.close()

@pytest.fixture(scope='module')
def db_cursor(db_connection):
    """Create a cursor for executing queries."""
    cursor = db_connection.cursor()
    yield cursor
    cursor.close()

print("✅ Fixtures defined")

✅ Fixtures defined


## Test Suite 1: Table Structure Validation

**Tests in this suite:**
1. `test_etl_config_table_exists` - Verifies the etl_config table exists in public schema
2. `test_etl_config_in_public_schema` - Confirms table is in public schema (not setup, bronze, etc.)
3. `test_etl_config_column_count` - Validates exactly 2 columns exist

**How these tests work:**
- **Test 1** queries information_schema.tables:
  - ✅ Success: count = 1 (table exists)
  - ❌ Failure: count = 0 (table missing - **run setup/seed/01_etl_config.sql first**)
  - **Purpose**: Fundamental check that DDL script was executed
  
- **Test 2** queries table schema location:
  - ✅ Success: table_schema = 'public'
  - ❌ Failure: table_schema ≠ 'public' or NULL (table in wrong schema or doesn't exist)
  - **Why public schema**: Configuration tables accessible across all layers (bronze, silver, gold)
  
- **Test 3** counts columns:
  - ✅ Success: count = 2 (config_key, config_value)
  - ❌ Failure: count ≠ 2 (schema mismatch or table structure changed)
  - **Purpose**: Detects schema drift or incomplete DDL execution

In [4]:
%%ipytest -vv

def test_etl_config_table_exists(db_cursor):
    """Verify public.etl_config table exists."""
    db_cursor.execute("""
        SELECT COUNT(*) 
        FROM information_schema.tables 
        WHERE table_schema = 'public' 
          AND table_name = 'etl_config'
    """)
    count = db_cursor.fetchone()[0]
    assert count == 1, "public.etl_config table must exist"

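def test_etl_config_in_public_schema(db_cursor):
    """Verify table is in public schema, not other schemas."""
    db_cursor.execute("""
        SELECT table_schema
        FROM information_schema.tables 
        WHERE table_name = 'etl_config'
        ORDER BY table_schema
    """)
    # Collect every schema that holds an etl_config table; fetchone() alone
    # would pick an arbitrary row if the table existed in several schemas.
    schemas = [row[0] for row in db_cursor.fetchall()]
    assert schemas, "etl_config table not found"
    assert schemas == ['public'], \
        f"etl_config must exist only in public schema, found: {schemas}"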

def test_etl_config_column_count(db_cursor):
    """Verify table has exactly 2 columns."""
    db_cursor.execute("""
        SELECT COUNT(*) 
        FROM information_schema.columns 
        WHERE table_schema = 'public' 
          AND table_name = 'etl_config'
    """)
    count = db_cursor.fetchone()[0]
    assert count == 2, "etl_config should have exactly 2 columns (config_key, config_value)"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_etl_config_table_exists PASSED                   [ 33%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_etl_config_in_public_schema PASSED               [ 66%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_etl_config_column_count 

## Test Suite 2: Column Definitions

**Tests in this suite:**
1. `test_etl_config_column_definitions` - Validates column names, data types, lengths, and nullability
2. `test_etl_config_primary_key` - Verifies config_key is the primary key

**How these tests work:**
- **Test 1** queries column metadata from information_schema.columns:
  - **config_key validation**:
    - ✅ Type: character varying (VARCHAR)
    - ✅ Max length: 100
    - ✅ Nullable: NO (primary key constraint)
    - ❌ Failure: Any mismatch in type, length, or nullability
  
  - **config_value validation**:
    - ✅ Type: character varying (VARCHAR)
    - ✅ Max length: 200
    - ✅ Nullable: NO (NOT NULL constraint)
    - ❌ Failure: Any mismatch in type, length, or nullability
  
  - **Purpose**: Ensures DDL matches specification (prevents silent schema changes)
  
- **Test 2** queries primary key from pg_index catalog:
  - ✅ Success: pk_columns = ['config_key']
  - ❌ Failure: No PK, wrong column, or composite PK
  - **Purpose**: Validates uniqueness constraint on config_key (prevents duplicate configuration entries)

In [5]:
%%ipytest -vv

def test_etl_config_column_definitions(db_cursor):
    """Verify column names, types, and nullability."""
    db_cursor.execute("""
        SELECT 
            column_name,
            data_type,
            character_maximum_length,
            is_nullable
        FROM information_schema.columns
        WHERE table_schema = 'public'
          AND table_name = 'etl_config'
        ORDER BY ordinal_position
    """)
    
    columns = db_cursor.fetchall()
    column_dict = {
        col[0]: {
            'type': col[1], 
            'max_length': col[2], 
            'nullable': col[3]
        } for col in columns
    }
    
    # Expected: config_key VARCHAR(100) PRIMARY KEY (NOT NULL)
    assert 'config_key' in column_dict, "config_key column must exist"
    assert column_dict['config_key']['type'] == 'character varying', \
        "config_key should be VARCHAR"
    assert column_dict['config_key']['max_length'] == 100, \
        "config_key should have max length 100"
    assert column_dict['config_key']['nullable'] == 'NO', \
        "config_key should be NOT NULL (primary key)"
    
    # Expected: config_value VARCHAR(200) NOT NULL
    assert 'config_value' in column_dict, "config_value column must exist"
    assert column_dict['config_value']['type'] == 'character varying', \
        "config_value should be VARCHAR"
    assert column_dict['config_value']['max_length'] == 200, \
        "config_value should have max length 200"
    assert column_dict['config_value']['nullable'] == 'NO', \
        "config_value should be NOT NULL"

def test_etl_config_primary_key(db_cursor):
    """Verify config_key is the primary key."""
    db_cursor.execute("""
        SELECT a.attname
        FROM pg_index i
        JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey)
        WHERE i.indrelid = 'public.etl_config'::regclass
          AND i.indisprimary
    """)
    pk_columns = [row[0] for row in db_cursor.fetchall()]
    assert pk_columns == ['config_key'], "Primary key should be 'config_key' only"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_etl_config_column_definitions PASSED             [ 50%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_etl_config_primary_key PASSED                    [100%]

## Test Suite 3: Seeded Configuration Keys

**Tests in this suite:**
1. `test_required_config_keys_exist` - Verifies both base_path_crm and base_path_erp keys are seeded
2. `test_config_values_not_null_or_empty` - Validates all config values are non-NULL and non-empty
3. `test_config_values_are_valid_paths` - Ensures config values look like valid file paths

**How these tests work:**
- **Test 1** queries for required configuration keys:
  - ✅ Success: Both 'base_path_crm' and 'base_path_erp' exist, total count = 2
  - ❌ Failure: One or both keys missing (the query filters on these two keys, so extra keys in the table are not detected)
  - **Purpose**: Validates seed script populated required ETL source paths
  - **Why these keys**: CRM and ERP are the two source systems for the data warehouse
  
- **Test 2** checks for NULL or empty values:
  - ✅ Success: Query returns 0 rows (all values populated)
  - ❌ Failure: Any rows returned (indicates NULL or empty string values)
  - **Purpose**: Prevents runtime errors from missing configuration
  - **TRIM check**: Catches whitespace-only values that would fail in production
  
- **Test 3** validates path structure:
  - Checks each config value contains path separators ('/' or '\')
  - Verifies value is not just a separator (e.g., not just '/' or '\')
  - ✅ Success: All values contain valid path components
  - ❌ Failure: Value doesn't look like a file path
  - **Purpose**: Early detection of malformed paths that would cause ETL failures
  - **Cross-platform**: Accepts both forward slashes (Unix/Linux) and backslashes (Windows)
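
The two checks in Test 3 can be tried without a database. The sketch below is illustrative only: `looks_like_path` is a hypothetical helper that mirrors the checks described above, and the sample values are made up.

```python
def looks_like_path(value: str) -> bool:
    """Return True if value has a path separator and is more than separators."""
    has_separator = "/" in value or "\\" in value
    has_component = len(value.strip("/\\")) > 0
    return has_separator and has_component


# Made-up sample values: two plausible paths, a bare separator, a bare name
samples = ["C:/data/source_crm", r"C:\data\source_erp", "/", "source_crm"]
for candidate in samples:
    print(f"{candidate!r}: {looks_like_path(candidate)}")
```

Note that this is a shape check only: it does not verify that the directory exists on disk.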

In [6]:
%%ipytest -vv

def test_required_config_keys_exist(db_cursor):
    """Verify both required configuration keys are seeded."""
    db_cursor.execute("""
        SELECT config_key
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
        ORDER BY config_key
    """)
    
    keys = [row[0] for row in db_cursor.fetchall()]
    assert 'base_path_crm' in keys, "base_path_crm must be seeded"
    assert 'base_path_erp' in keys, "base_path_erp must be seeded"
    assert len(keys) == 2, "Exactly 2 required keys should be present"

def test_config_values_not_null_or_empty(db_cursor):
    """Verify all config values are non-NULL and non-empty."""
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_value IS NULL 
           OR TRIM(config_value) = ''
    """)
    
    invalid_entries = db_cursor.fetchall()
    assert len(invalid_entries) == 0, \
        f"Found config keys with NULL or empty values: {invalid_entries}"

def test_config_values_are_valid_paths(db_cursor):
    """Verify config values look like valid file paths."""
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    
    for key, value in db_cursor.fetchall():
        # Should contain path separators
        assert ('/' in value or '\\' in value), \
            f"{key} value '{value}' doesn't look like a file path"
        # Should not be just a separator
        assert len(value.strip('/\\')) > 0, \
            f"{key} value '{value}' is invalid"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_required_config_keys_exist PASSED                [ 33%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_config_values_not_null_or_empty PASSED           [ 66%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_config_values_are_valid_paths PASSED             [100%]

## Test Suite 4: Path Convention Compliance

**Tests in this suite:**
1. `test_no_trailing_slashes_in_paths` - Verifies CRITICAL requirement: no trailing slashes
2. `test_paths_use_forward_slashes` - Validates cross-platform path compatibility

**How these tests work:**
- **Test 1** checks for trailing slashes (CRITICAL validation):
  - For each path, validates:
    - ✅ Success: Does NOT end with '/' (forward slash)
    - ✅ Success: Does NOT end with '\' (backslash)
    - ❌ Failure: Ends with any trailing slash
  - **Why this is CRITICAL**:
    - Path concatenation: `base_path + '/' + filename`
    - With trailing slash: `path//filename` (double slash - breaks some systems)
    - Without trailing slash: `path/filename` (correct)
  - **Real-world impact**: Trailing slashes can cause file-not-found errors in ETL pipelines on systems that do not tolerate `//`
  - **Example violation**: `/datasets/source_crm/` should be `/datasets/source_crm`
  
- **Test 2** validates forward slash usage:
  - ✅ Success: Path contains forward slashes '/' (cross-platform standard)
  - ⚠️ Warning: Path contains backslashes '\' (Windows-specific, may fail on Linux/Mac)
  - ❌ Failure: Path contains no forward slash (a backslash-only path fails too)
  - **Purpose**: Ensures portability across Windows, Linux, and macOS
  - **PostgreSQL standard**: Forward slashes work on all platforms
  - **Best practice**: Use forward slashes even on Windows for database-stored paths
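
Both conventions can be expressed as one small checker that reports every violation at once. This is a sketch under assumptions: `path_convention_issues` is a hypothetical helper, not part of the project, and the sample paths are illustrative.

```python
from typing import List


def path_convention_issues(value: str) -> List[str]:
    """List the convention violations found in a stored base path."""
    issues = []
    if value.endswith("/") or value.endswith("\\"):
        issues.append("trailing slash")
    if "/" not in value:
        issues.append("no forward slash")
    return issues


# Illustrative values: compliant, trailing slash, Windows-style with trailing backslash
for path in ["/datasets/source_crm", "/datasets/source_crm/", "C:\\datasets\\source_crm\\"]:
    print(f"{path!r}: {path_convention_issues(path) or 'ok'}")
```

Returning a list rather than failing on the first problem gives a single message that names everything wrong with a value.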

In [7]:
%%ipytest -vv

def test_no_trailing_slashes_in_paths(db_cursor):
    """Verify CRITICAL requirement: no trailing slashes in file paths."""
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    
    for key, value in db_cursor.fetchall():
        assert not value.endswith('/'), \
            f"{key} has trailing forward slash: '{value}' (VIOLATION: must not end with /)"
        assert not value.endswith('\\'), \
            f"{key} has trailing backslash: '{value}' (VIOLATION: must not end with \\)"

def test_paths_use_forward_slashes(db_cursor):
    """Verify paths use forward slashes (cross-platform compatibility)."""
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    
    for key, value in db_cursor.fetchall():
        # Should use forward slashes (PostgreSQL/cross-platform standard)
        assert '/' in value, f"{key} should use forward slashes: '{value}'"
        # Warn if backslashes are present (Windows-specific)
        if '\\' in value:
            print(f"⚠️  Warning: {key} contains backslashes: '{value}'")

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_no_trailing_slashes_in_paths PASSED              [ 50%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_paths_use_forward_slashes PASSED                 [100%]

## Test Suite 5: Idempotency & Conflict Handling

**Tests in this suite:**
1. `test_idempotent_rerun_no_duplicates` - Verifies re-running INSERT doesn't create duplicates
2. `test_existing_values_preserved_on_conflict` - Confirms ON CONFLICT DO NOTHING preserves original values
3. `test_primary_key_constraint_enforced` - Validates PK prevents duplicates without ON CONFLICT

**How these tests work:**
- **Test 1** tests idempotent INSERT behavior:
  1. Count existing rows before re-inserting
  2. Execute INSERT with ON CONFLICT (config_key) DO NOTHING
  3. Count again - should match original count
  - ✅ Success: count_before == count_after (no duplicates created)
  - ❌ Failure: count_after > count_before (duplicates created - ON CONFLICT not working)
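  - **Purpose**: Validates that the seed's `INSERT ... ON CONFLICT DO NOTHING` pattern can be re-run safely (the test replays an equivalent statement; it does not execute the seed script itself)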
  - **DevOps impact**: Enables repeatable deployments without manual cleanup
  
- **Test 2** verifies values aren't overwritten:
  1. Capture original values
  2. Attempt INSERT with different values using ON CONFLICT DO NOTHING
  3. Verify values unchanged
  - ✅ Success: original_values == current_values (values preserved)
  - ❌ Failure: Values changed (would indicate `DO UPDATE` behavior instead of `DO NOTHING`; a missing ON CONFLICT clause would raise a unique violation rather than change values)
  - **Purpose**: Prevents accidental overwriting of production configuration
  - **Safety**: Existing environment-specific paths remain intact on re-deployment
  
- **Test 3** validates primary key enforcement WITHOUT ON CONFLICT:
  - Attempts INSERT of duplicate key without ON CONFLICT clause
  - ✅ Success: Raises psycopg2.errors.UniqueViolation exception
  - ❌ Failure: INSERT succeeds (primary key constraint missing or not enforced)
  - **Purpose**: Confirms database-level constraint works independently of application logic
  - **Database integrity**: PK constraint is last line of defense against duplicates
  - **Technical**: Uses pytest.raises() to assert exception is thrown
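
The three behaviors can be reproduced without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses Python's built-in `sqlite3` module with an in-memory table shaped like `public.etl_config`, relying on SQLite's support for the same `ON CONFLICT (...) DO NOTHING` clause (SQLite 3.24 or newer). It illustrates the pattern only; PostgreSQL raises `UniqueViolation` where SQLite raises `IntegrityError`.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for public.etl_config
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_config ("
    " config_key VARCHAR(100) PRIMARY KEY,"
    " config_value VARCHAR(200) NOT NULL)"
)
conn.execute("INSERT INTO etl_config VALUES ('base_path_crm', '/datasets/source_crm')")

# Re-running a seed-style INSERT is a no-op for a key that already exists
conn.execute(
    "INSERT INTO etl_config (config_key, config_value) "
    "VALUES ('base_path_crm', 'THIS_SHOULD_BE_IGNORED') "
    "ON CONFLICT (config_key) DO NOTHING"
)
rows = conn.execute("SELECT config_key, config_value FROM etl_config").fetchall()

# Without ON CONFLICT, the primary key rejects the duplicate outright
try:
    conn.execute("INSERT INTO etl_config VALUES ('base_path_crm', 'duplicate_attempt')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
conn.close()
```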

In [8]:
%%ipytest -vv

def test_idempotent_rerun_no_duplicates(db_cursor):
    """Verify re-running INSERT with ON CONFLICT doesn't create duplicates."""
    # Count before
    db_cursor.execute("""
        SELECT COUNT(*) FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    count_before = db_cursor.fetchone()[0]
    
    # Re-run the INSERT (should do nothing due to ON CONFLICT)
    db_cursor.execute("""
        INSERT INTO public.etl_config (config_key, config_value)
        VALUES
          ('base_path_crm', 'dummy_path_1'),
          ('base_path_erp', 'dummy_path_2')
        ON CONFLICT (config_key) DO NOTHING
    """)
    
    # Count after
    db_cursor.execute("""
        SELECT COUNT(*) FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    count_after = db_cursor.fetchone()[0]
    
    assert count_before == count_after, \
        f"Idempotency violated: count changed from {count_before} to {count_after}"

def test_existing_values_preserved_on_conflict(db_cursor):
    """Verify ON CONFLICT DO NOTHING preserves existing values."""
    # Get original values
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
        ORDER BY config_key
    """)
    original_values = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    # Attempt to insert different values (should be ignored)
    db_cursor.execute("""
        INSERT INTO public.etl_config (config_key, config_value)
        VALUES
          ('base_path_crm', 'THIS_SHOULD_BE_IGNORED'),
          ('base_path_erp', 'THIS_SHOULD_ALSO_BE_IGNORED')
        ON CONFLICT (config_key) DO NOTHING
    """)
    
    # Verify values unchanged
    db_cursor.execute("""
        SELECT config_key, config_value
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
        ORDER BY config_key
    """)
    current_values = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    assert original_values == current_values, \
        "ON CONFLICT DO NOTHING failed to preserve existing values"

def test_primary_key_constraint_enforced(db_cursor):
    """Verify primary key prevents duplicate keys without ON CONFLICT."""
    from psycopg2 import errors
    
    with pytest.raises(errors.UniqueViolation):
        db_cursor.execute("""
            INSERT INTO public.etl_config (config_key, config_value)
            VALUES ('base_path_crm', 'duplicate_attempt')
        """)

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_idempotent_rerun_no_duplicates PASSED            [ 33%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_existing_values_preserved_on_conflict PASSED     [ 66%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_primary_key_constraint_enforced PASSED           [100%]

## Test Suite 6: Integration - Path Construction

**Tests in this suite:**
1. `test_path_construction_no_double_slashes` - Validates path concatenation doesn't create double slashes
2. `test_realistic_file_path_construction` - Tests path building with real CSV filenames

**How these tests work:**
- **Test 1** simulates runtime path concatenation in SQL:
  - Constructs full paths using SQL concatenation: `config_value || '/' || 'test_file.csv'`
  - For each constructed path:
    - ✅ Success: No '//' (double slash) in full_path
    - ✅ Success: Path ends with 'test_file.csv'
    - ❌ Failure: Contains '//' (indicates trailing slash in base_path)
  - **Purpose**: Simulates actual ETL path construction logic
  - **Why test in SQL**: Validates database-side path building (used in COPY commands)
  - **Example good**: `'/datasets/source_crm' || '/' || 'cust_info.csv'` = `/datasets/source_crm/cust_info.csv` ✅
  - **Example bad**: `'/datasets/source_crm/' || '/' || 'cust_info.csv'` = `/datasets/source_crm//cust_info.csv` ❌
  
- **Test 2** tests with realistic CRM filenames:
  - Test files: ['cust_info.csv', 'prd_info.csv', 'CUST_AZ12.csv']
  - Fetches base_path_crm from database
  - Constructs full paths using Python f-string: `f"{base_path}/{filename}"`
  - Validates each constructed path:
    - ✅ Success: No '//' in path
    - ✅ Success: Path ends with expected filename
    - ❌ Failure: Double slashes or filename corruption
  - **Purpose**: End-to-end validation with actual ETL source filenames
  - **Real-world simulation**: Uses same path construction logic as Bronze layer ETL
  - **Coverage**: Tests both standard naming (cust_info.csv) and uppercase variants (CUST_AZ12.csv)
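
The construction logic described above can be exercised on its own. In this sketch, `build_full_paths` is a hypothetical helper that reproduces the `f"{base_path}/{filename}"` join, and the base paths are illustrative rather than read from the database.

```python
from typing import List


def build_full_paths(base_path: str, filenames: List[str]) -> List[str]:
    """Join base_path and each filename with a single forward slash."""
    return [f"{base_path}/{name}" for name in filenames]


crm_files = ["cust_info.csv", "prd_info.csv", "CUST_AZ12.csv"]

# A compliant base path and one that violates the no-trailing-slash rule
good = build_full_paths("/datasets/source_crm", crm_files)
bad = build_full_paths("/datasets/source_crm/", crm_files)

for path in good + bad:
    status = "double slash" if "//" in path else "ok"
    print(f"{path}: {status}")
```

The comparison shows why the trailing-slash rule is enforced on the stored value: the join itself stays trivial and never needs to normalize its input.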

In [9]:
%%ipytest -vv

def test_path_construction_no_double_slashes(db_cursor):
    """Verify path concatenation doesn't create double slashes."""
    db_cursor.execute("""
        SELECT 
            config_key,
            config_value,
            config_value || '/' || 'test_file.csv' AS full_path
        FROM public.etl_config
        WHERE config_key IN ('base_path_crm', 'base_path_erp')
    """)
    
    for key, base_path, full_path in db_cursor.fetchall():
        assert '//' not in full_path, \
            f"{key}: Path construction created double slash: '{full_path}'"
        assert full_path.endswith('test_file.csv'), \
            f"{key}: Path construction failed: '{full_path}'"

def test_realistic_file_path_construction(db_cursor):
    """Test path construction with realistic filenames."""
    test_files = ['cust_info.csv', 'prd_info.csv', 'CUST_AZ12.csv']
    
    db_cursor.execute("""
        SELECT config_value
        FROM public.etl_config
        WHERE config_key = 'base_path_crm'
    """)
    base_path = db_cursor.fetchone()[0]
    
    for filename in test_files:
        full_path = f"{base_path}/{filename}"
        assert '//' not in full_path, f"Double slash in: {full_path}"
        assert full_path.endswith(filename), f"Filename lost: {full_path}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_path_construction_no_double_slashes PASSED       [ 50%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_realistic_file_path_construction PASSED          [100%]

## Test Suite 7: Data Integrity

**Tests in this suite:**
1. `test_no_duplicate_keys` - Verifies no duplicate config_key values exist
2. `test_config_key_case_sensitivity` - Validates keys use consistent snake_case convention

**How these tests work:**
- **Test 1** checks for duplicates using GROUP BY with HAVING:
  - Groups by config_key and counts occurrences
  - Filters to show only keys appearing more than once
  - ✅ Success: Query returns 0 rows (no duplicates)
  - ❌ Failure: Any rows returned (indicates duplicate keys)
  - **Purpose**: Confirms that the primary key constraint is actually keeping keys unique
  - **Redundant check**: Yes, but important for data integrity auditing
  - **Detects issues**: Catches edge cases such as the constraint having been dropped
  - **Not covered here**: 'base_path_crm' and 'BASE_PATH_CRM' are distinct values (string comparison is case-sensitive), so this query does not flag them; casing is the subject of Test 2
  
- **Test 2** validates naming convention compliance:
  - Fetches all config keys
  - For each key, validates:
    - ✅ Success: key == key.lower() (all lowercase)
    - ✅ Success: No spaces in key
    - ❌ Failure: Mixed case (e.g., 'base_Path_CRM')
    - ❌ Failure: Contains spaces (e.g., 'base path crm')
  - **Purpose**: Enforces consistent naming convention (snake_case)
  - **Why it matters**:
    - Prevents confusion: 'base_path_crm' vs 'Base_Path_CRM' vs 'BASE_PATH_CRM'
    - Easier SQL queries: No need to remember exact casing
    - Standard convention: snake_case is Python/PostgreSQL community standard
  - **Convention**: lowercase_with_underscores (snake_case)
  - **Alternative rejected**: camelCase (not PostgreSQL convention), UPPER_CASE (harder to read)
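
A stricter form of the convention can be written as a regular expression. This is a sketch only: `is_snake_case` and its pattern are assumptions for illustration and go further than the notebook's test, which checks just lowercase and the absence of spaces.

```python
import re

# Assumed pattern: lowercase words of letters/digits joined by single underscores
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(key: str) -> bool:
    """Return True if the whole key matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(key) is not None


for key in ["base_path_crm", "base_Path_CRM", "BASE_PATH_CRM", "base path crm"]:
    print(f"{key!r}: {is_snake_case(key)}")
```

Using `fullmatch` matters here: `match` or `search` would accept a key that merely starts with or contains a valid fragment.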

In [10]:
%%ipytest -vv

def test_no_duplicate_keys(db_cursor):
    """Verify no duplicate config_key values exist."""
    db_cursor.execute("""
        SELECT config_key, COUNT(*)
        FROM public.etl_config
        GROUP BY config_key
        HAVING COUNT(*) > 1
    """)
    
    duplicates = db_cursor.fetchall()
    assert len(duplicates) == 0, f"Found duplicate keys: {duplicates}"

def test_config_key_case_sensitivity(db_cursor):
    """Verify config keys use consistent casing convention."""
    # Fetch every key: filtering on the known lowercase keys would make
    # the assertions below pass trivially.
    db_cursor.execute("""
        SELECT config_key
        FROM public.etl_config
    """)
    
    keys = [row[0] for row in db_cursor.fetchall()]
    for key in keys:
        # Verify lowercase with underscores (snake_case)
        assert key == key.lower(), f"Key '{key}' should be lowercase"
        assert ' ' not in key, f"Key '{key}' should not contain spaces"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_no_duplicate_keys PASSED                         [ 50%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_config_key_case_sensitivity PASSED               [100%]





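The casing test in Suite 7 accepts any key that is lowercase and free of spaces, so a key such as 'base-path-crm' would still pass. A stricter snake_case check could be sketched with a regular expression. The pattern and the helper name `is_snake_case` are assumptions for illustration and are not part of the test suite.

```python
import re

# Assumed definition of snake_case: starts with a lowercase letter, then
# lowercase letters or digits, with single underscores between segments.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(key: str) -> bool:
    """Return True only if the whole key matches the assumed pattern."""
    return SNAKE_CASE.fullmatch(key) is not None


for candidate in ["base_path_crm", "BASE_PATH_CRM", "base path crm", "base-path-crm"]:
    print(candidate, is_snake_case(candidate))
```

Inside the test, `assert is_snake_case(key)` could replace the two separate assertions if this stricter definition is wanted.
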
## Summary: Run All Tests

**Executes all test suites (15 tests total):**
- Suite 1: Table Structure Validation (3 tests)
- Suite 2: Column Definitions (2 tests)
- Suite 3: Seeded Configuration Keys (3 tests)
- Suite 4: Path Convention Compliance (2 tests)
- Suite 5: Idempotency & Conflict Handling (3 tests)
- Suite 6: Integration - Path Construction (2 tests)
- Suite 7: Data Integrity (2 tests)

**How this cell works:**
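- Executes `ipytest.run('-vv')`, which runs the pytest functions currently defined in the notebook namespace
  - By default, each `%%ipytest` cell first removes previously defined tests (names matching `[Tt]est*`), so after running the suite cells one by one only the last cell's tests may remain. This is consistent with the 2 items collected in the recorded run below; calling `ipytest.config(clean=False)` before the suite cells keeps earlier tests available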
- `-vv` flag provides **very verbose** output showing:
  - Each test function name as it runs
  - PASSED/FAILED status for each test
  - Detailed assertion messages on failure
  - Full traceback on errors
  - Percentage completion progress

**✅ Success Criteria:**
- All 15 tests show `PASSED` status
- Final summary shows: `15 passed in X.XXs`
- No `FAILED`, `ERROR`, or `SKIPPED` statuses

**🔧 Troubleshooting Test Failures:**

| Failure Type | Likely Cause | Solution |
|-------------|--------------|----------|
| Table doesn't exist | Seed script not run | Execute `setup/seed/01_etl_config.sql` |
| Column mismatch | Schema drift or wrong version | Verify DDL matches specification |
| Missing config keys | Incomplete seed data | Check INSERT statements in seed script |
| Trailing slash in path | Manual UPDATE or wrong seed | Fix paths: `UPDATE public.etl_config SET config_value = RTRIM(config_value, '/') WHERE config_key IN ('base_path_crm', 'base_path_erp')` |
| Primary key not enforced (duplicate insert succeeds) | Constraint missing | Add constraint: `ALTER TABLE public.etl_config ADD PRIMARY KEY (config_key)` |
| Duplicate keys found | Data corruption | Investigate and remove duplicates manually |
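
The trailing-slash fix in the table relies on `RTRIM(config_value, '/')`, which removes every trailing `/`. A minimal Python sketch of the same normalization, useful for checking a value before it is written to the table, might look like this. The helper name `strip_trailing_slashes` and the example paths are hypothetical.

```python
def strip_trailing_slashes(path: str) -> str:
    """Mirror RTRIM(config_value, '/'): drop every trailing '/'."""
    return path.rstrip("/")


examples = ["/data/source_crm/", "/data/source_erp//", "/data/source_crm"]
for raw in examples:
    print(f"{raw!r} -> {strip_trailing_slashes(raw)!r}")
```

As with `RTRIM`, a value consisting only of slashes becomes an empty string, so a root path would need separate handling.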

**📊 What Good Output Looks Like:**
```
============================= test session starts =============================
collected 15 items

test_01_etl_config.py::test_etl_config_table_exists PASSED              [  6%]
test_01_etl_config.py::test_etl_config_in_public_schema PASSED          [ 13%]
...
test_01_etl_config.py::test_config_key_case_sensitivity PASSED          [100%]
============================== 15 passed in 0.45s ==============================
```
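
If the final summary line is captured as text (for example from a CI log), the success criteria above could be checked with a short sketch like the following. The function name `summary_meets_criteria` is hypothetical, and the regular expression only covers the statuses named in the criteria.

```python
import re


def summary_meets_criteria(summary_line: str, expected_passed: int) -> bool:
    """Return True if exactly `expected_passed` tests passed and none
    failed, errored, or were skipped."""
    # Collect "<count> <status>" pairs, e.g. {"passed": 15}.
    pairs = re.findall(r"(\d+) (passed|failed|errors?|skipped)", summary_line)
    counts = {status: int(number) for number, status in pairs}
    not_passed = sum(n for status, n in counts.items() if status != "passed")
    return counts.get("passed", 0) == expected_passed and not_passed == 0


print(summary_meets_criteria("===== 15 passed in 0.45s =====", 15))
print(summary_meets_criteria("===== 1 failed, 14 passed in 0.50s =====", 15))
```

Inside the notebook, comparing the value returned by `ipytest.run('-vv')` with `0` is the simpler check; parsing text is only needed when the exit code is not available.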

In [11]:
# Run all tests in this notebook
ipytest.run('-vv')

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_setup
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_e25fc1d24d884774a37c61e478b34a90.py::test_no_duplicate_keys PASSED                         [ 50%]
t_e25fc1d24d884774a37c61e478b34a90.py::test_config_key_case_sensitivity PASSED               [100%]





<ExitCode.OK: 0>

## Manual Inspection: View Configuration

**What this cell does:**
1. **Current Configuration** - Displays all config entries with path validation checks
2. **Table Structure** - Shows column definitions, types, lengths, and constraints
3. **Constraints** - Lists all table constraints (PRIMARY KEY, etc.)

**How this cell works:**

**Step 1: Current Configuration Display**
- Queries all config entries and adds a computed `path_check` column
- **Output format**: DataFrame with columns config_key, config_value, value_length, path_check
- **Expected values**:
  - config_key: 'base_path_crm', 'base_path_erp'
  - config_value: Full paths without trailing slashes
  - value_length: Typically 20-100 characters
  - path_check: All should show '✅ No trailing slash'
- **Use case**: Quick visual verification of configuration state

**Step 2: Table Structure**
- Queries column metadata from information_schema
- **Output format**: DataFrame showing schema definition
- **Expected columns**:
  1. config_key: character varying(100), NOT NULL, no default
  2. config_value: character varying(200), NOT NULL, no default
- **Purpose**: Verify DDL matches documentation
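
The comparison against the expected columns in Step 2 can also be done programmatically rather than by eye. The sketch below assumes the rows arrive as plain tuples of `(column_name, data_type, character_maximum_length, is_nullable)`; the names `EXPECTED_COLUMNS` and `schema_mismatches` are hypothetical.

```python
# Expected definition, taken from the list above.
EXPECTED_COLUMNS = {
    "config_key": ("character varying", 100, "NO"),
    "config_value": ("character varying", 200, "NO"),
}


def schema_mismatches(actual_rows):
    """Compare (name, data_type, max_length, is_nullable) tuples with the
    expected definition and return a list of human-readable problems."""
    actual = {name: (dtype, length, nullable)
              for name, dtype, length, nullable in actual_rows}
    problems = []
    for name, spec in EXPECTED_COLUMNS.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != spec:
            problems.append(f"{name}: expected {spec}, found {actual[name]}")
    for name in sorted(actual.keys() - EXPECTED_COLUMNS.keys()):
        problems.append(f"unexpected column: {name}")
    return problems


rows = [
    ("config_key", "character varying", 100, "NO"),
    ("config_value", "character varying", 200, "NO"),
]
print(schema_mismatches(rows))
```

An empty list means the structure matches; each entry otherwise names the column that drifted.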

**Step 3: Constraints**
- Queries constraint definitions from pg_constraint catalog
- **Output format**: DataFrame with constraint details
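- **Expected constraints**:
  - Primary Key on config_key, named `etl_config_pkey` by PostgreSQL's default naming unless the DDL names it explicitly
  - On PostgreSQL 18 and later, which record NOT NULL constraints in pg_constraint, one `n` (NOT NULL) row per column as well; the query's CASE has no branch for `n`, so the raw code is displayed as the constraint type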
- **Purpose**: Validates database-level integrity constraints are in place
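
The constraint query translates four `contype` codes and passes any other code through unchanged. A Python-side lookup that covers more of the codes documented for `pg_constraint` could be sketched as follows. The names `CONTYPE_LABELS` and `label_contype` are hypothetical, and the label wording for `n`, `x`, and `t` is an assumption rather than something the notebook defines.

```python
# Labels for pg_constraint.contype codes; unknown codes are returned as-is,
# matching the ELSE branch of the SQL CASE expression.
CONTYPE_LABELS = {
    "p": "PRIMARY KEY",
    "f": "FOREIGN KEY",
    "u": "UNIQUE",
    "c": "CHECK",
    "n": "NOT NULL",
    "x": "EXCLUSION",
    "t": "CONSTRAINT TRIGGER",
}


def label_contype(code: str) -> str:
    """Translate a one-letter contype code into a readable label."""
    return CONTYPE_LABELS.get(code, code)


for code in ["p", "n", "z"]:
    print(code, "->", label_contype(code))
```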

**Expected console output:**
```
📋 Current Configuration:
[DataFrame showing config_key, config_value, value_length, path_check]

🏗️  Table Structure:
[DataFrame showing column definitions]

🔒 Constraints:
[DataFrame showing PRIMARY KEY constraint]

✅ Inspection complete
```

**Use this for:**
- Visual confirmation of test assertions
- Debugging path configuration issues
- Verifying schema matches specification
- Documentation/screenshots for handoffs
- Troubleshooting trailing slash violations

In [12]:
# Connect and display current configuration
conn = psycopg2.connect(**DB_CONFIG)

# View all configuration entries
df_config = pd.read_sql("""
    SELECT 
        config_key,
        config_value,
        LENGTH(config_value) AS value_length,
        CASE 
            WHEN config_value LIKE '%/' THEN '❌ Has trailing slash'
            ELSE '✅ No trailing slash'
        END AS path_check
    FROM public.etl_config
    ORDER BY config_key
""", conn)

print("\n📋 Current Configuration:")
display(df_config)

# View table structure
df_structure = pd.read_sql("""
    SELECT 
        column_name,
        data_type,
        character_maximum_length,
        is_nullable,
        column_default
    FROM information_schema.columns
    WHERE table_schema = 'public'
      AND table_name = 'etl_config'
    ORDER BY ordinal_position
""", conn)

print("\n🏗️  Table Structure:")
display(df_structure)

# View constraints
df_constraints = pd.read_sql("""
    SELECT 
        conname AS constraint_name,
        CASE contype
            WHEN 'p' THEN 'PRIMARY KEY'
            WHEN 'f' THEN 'FOREIGN KEY'
            WHEN 'u' THEN 'UNIQUE'
            WHEN 'c' THEN 'CHECK'
            ELSE contype::text
        END AS constraint_type,
        pg_get_constraintdef(oid) AS definition
    FROM pg_constraint
    WHERE conrelid = 'public.etl_config'::regclass
    ORDER BY conname
""", conn)

print("\n🔒 Constraints:")
display(df_constraints)

conn.close()
print("\n✅ Inspection complete")


📋 Current Configuration:

| | config_key | config_value | value_length | path_check |
|---|------------|--------------|--------------|------------|
| 0 | base_path_crm | C:/Users/Laurent/Studies/sql-ultimate-course/U... | 97 | ✅ No trailing slash |
| 1 | base_path_erp | C:/Users/Laurent/Studies/sql-ultimate-course/U... | 97 | ✅ No trailing slash |

🏗️  Table Structure:

| | column_name | data_type | character_maximum_length | is_nullable | column_default |
|---|-------------|-----------|--------------------------|-------------|----------------|
| 0 | config_key | character varying | 100 | NO | |
| 1 | config_value | character varying | 200 | NO | |

🔒 Constraints:

| | constraint_name | constraint_type | definition |
|---|-----------------|-----------------|------------|
| 0 | etl_config_config_key_not_null | n | NOT NULL config_key |
| 1 | etl_config_config_value_not_null | n | NOT NULL config_value |
| 2 | etl_config_pkey | PRIMARY KEY | PRIMARY KEY (config_key) |

✅ Inspection complete
