# Test Suite: Bronze Data Tables DDL

**Purpose:** Validate the creation and structure of bronze layer data tables

**Scope:**
- Table existence (all 6 bronze data tables)
- Schema ownership and location
- Column definitions and data types
- Table naming convention compliance
- No constraints validation (raw ingestion layer)
- Source system separation (CRM vs ERP)

**Testing Strategy:**
- Existence validation (all 6 tables created)
- Structure validation (correct columns and types)
- Naming validation (matches CSV file conventions)
- Constraint validation (no PKs, FKs, unique or check constraints, or indexes)
- Isolation validation (tables in bronze schema only)

**Prerequisites:**
- PostgreSQL server running
- sql_retail_analytics_warehouse database exists
- bronze schema exists
- `scripts/bronze/ddl_bronze_tables.sql` has been executed
- Connection credentials available
- Required packages: psycopg2, pytest, ipytest, pandas

## Setup: Import Dependencies & Configure Connection

In [1]:
import os
import psycopg2
from psycopg2 import sql
import pytest
import ipytest
import pandas as pd

# Configure ipytest for notebook usage
ipytest.autoconfig()

# Database connection parameters
DB_CONFIG = {
    'host': 'localhost',
    'database': 'sql_retail_analytics_warehouse',
    'user': 'postgres',
    'password': os.getenv('POSTGRES_PASSWORD', 'your_password_here')
}

# Expected bronze data tables (excludes load_jobs, load_log)
# Note: PostgreSQL converts unquoted identifiers to lowercase
EXPECTED_CRM_TABLES = ['crm_cust_info', 'crm_prd_info', 'crm_sales_details']
EXPECTED_ERP_TABLES = ['erp_cust_az12', 'erp_loc_a101', 'erp_px_cat_g1v2']
EXPECTED_ALL_TABLES = EXPECTED_CRM_TABLES + EXPECTED_ERP_TABLES

print("✅ Dependencies imported successfully")

✅ Dependencies imported successfully


## Fixtures: Database Connections

In [2]:
@pytest.fixture(scope='module')
def db_connection():
    """Connection to sql_retail_analytics_warehouse database."""
    conn = psycopg2.connect(**DB_CONFIG)
    conn.autocommit = True
    yield conn
    conn.close()

@pytest.fixture(scope='module')
def db_cursor(db_connection):
    """Cursor for warehouse database."""
    cursor = db_connection.cursor()
    yield cursor
    cursor.close()

print("✅ Fixtures defined")

✅ Fixtures defined


## Test Suite 1: Table Existence

**Tests in this suite:**
1. `test_all_six_bronze_tables_exist` - Validates total count of bronze data tables
2. `test_crm_cust_info_exists` - Verifies CRM customer information table
3. `test_crm_prd_info_exists` - Verifies CRM product information table
4. `test_crm_sales_details_exists` - Verifies CRM sales transaction table
5. `test_erp_cust_az12_exists` - Verifies ERP customer demographics table
6. `test_erp_loc_a101_exists` - Verifies ERP location mapping table
7. `test_erp_px_cat_g1v2_exists` - Verifies ERP product category table

**How these tests work:**

**Total Count Validation (`test_all_six_bronze_tables_exist`):**
- Queries `information_schema.tables` for BASE TABLE objects in bronze schema
- Excludes metadata tables (`load_jobs`, `load_log`) from count
- ✅ **Success:** Exactly 6 data tables exist in bronze schema
- ❌ **Failure:** Count is not 6, indicating missing or extra tables
- **Purpose:** Ensures DDL script executed completely without partial failures

**Individual Table Validation (6 table-specific tests):**
- Each test queries `information_schema.tables` for a specific table by name
- Filters by schema='bronze' and exact table_name match
- ✅ **Success:** COUNT(*) returns 1 (table exists and is unique)
- ❌ **Failure:** COUNT(*) returns 0 (table missing); a value above 1 cannot occur because relation names are unique within a schema
- **Purpose:** Validates each individual table creation
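When the count check fails, or passes by coincidence, it helps to see which names differ. The following is a minimal sketch of a set comparison between the tables found and the tables expected; `diff_tables` is a hypothetical helper that does not appear in the notebook, and `tmp_scratch` is an invented table name used only for illustration.

```python
# Hypothetical helper; the name and return shape are illustrative only.
EXPECTED_ALL_TABLES = [
    "crm_cust_info", "crm_prd_info", "crm_sales_details",
    "erp_cust_az12", "erp_loc_a101", "erp_px_cat_g1v2",
]

def diff_tables(found, expected=EXPECTED_ALL_TABLES):
    """Return (missing, unexpected) table names as sorted lists."""
    found_set, expected_set = set(found), set(expected)
    missing = sorted(expected_set - found_set)
    unexpected = sorted(found_set - expected_set)
    return missing, unexpected

# Example: one table was never created and a stray table slipped in.
# The total is still 6, so a count-only assertion would pass.
found = ["crm_cust_info", "crm_prd_info", "crm_sales_details",
         "erp_cust_az12", "erp_loc_a101", "tmp_scratch"]
missing, unexpected = diff_tables(found)
print(missing)     # ['erp_px_cat_g1v2']
print(unexpected)  # ['tmp_scratch']
```

In a test, the list passed as `found` would come from the `information_schema.tables` query, and the assertion would require both returned lists to be empty.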

**📝 PostgreSQL Identifier Case Sensitivity Note:**
- PostgreSQL converts **unquoted** identifiers to lowercase
- DDL uses unquoted names: `CREATE TABLE bronze.erp_CUST_AZ12` → stored as `erp_cust_az12`
- Tests must query using lowercase: `'erp_cust_az12'` not `'erp_CUST_AZ12'`
- This is PostgreSQL's documented behavior; the SQL standard folds unquoted identifiers to uppercase, and SQL Server and MySQL apply their own case rules
- To preserve case, DDL would need quotes: `CREATE TABLE bronze."erp_CUST_AZ12"`
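The folding rule can be approximated in a few lines of Python. This is a simplified sketch, not PostgreSQL's actual parser: `fold_identifier` is a hypothetical function, and it ignores details such as identifier length limits.

```python
def fold_identifier(identifier: str) -> str:
    """Approximate how PostgreSQL stores an identifier written in DDL.

    Unquoted identifiers are folded to lowercase; double-quoted
    identifiers keep their case (a doubled "" inside stands for one ").
    """
    quoted = (
        len(identifier) >= 2
        and identifier.startswith('"')
        and identifier.endswith('"')
    )
    if quoted:
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()

print(fold_identifier('erp_CUST_AZ12'))    # erp_cust_az12
print(fold_identifier('"erp_CUST_AZ12"'))  # erp_CUST_AZ12
```

The first result is the name the tests have to use when filtering `information_schema.tables`.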

**🏗️ Bronze Layer Architecture:**
These 6 tables form the raw data ingestion layer:
- **CRM tables (3):** Customer relationship management system data
  - `crm_cust_info`: Customer demographics and profile
  - `crm_prd_info`: Product catalog and lifecycle
  - `crm_sales_details`: Transaction-level sales records
- **ERP tables (3):** Enterprise resource planning system data  
  - `erp_cust_az12`: Customer birth date and gender
  - `erp_loc_a101`: Customer geographic location
  - `erp_px_cat_g1v2`: Product categorization hierarchy

In [3]:
%%ipytest -vv

def test_all_six_bronze_tables_exist(db_cursor):
    """Verify all 6 bronze data tables exist."""
    db_cursor.execute("""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_type = 'BASE TABLE'
        AND table_name NOT IN ('load_jobs', 'load_log')
        ORDER BY table_name
    """)
    
    tables = [row[0] for row in db_cursor.fetchall()]
    assert len(tables) == 6, f"Expected 6 bronze data tables, found {len(tables)}: {tables}"

def test_crm_cust_info_exists(db_cursor):
    """Verify bronze.crm_cust_info table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_cust_info'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.crm_cust_info table must exist"

def test_crm_prd_info_exists(db_cursor):
    """Verify bronze.crm_prd_info table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_prd_info'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.crm_prd_info table must exist"

def test_crm_sales_details_exists(db_cursor):
    """Verify bronze.crm_sales_details table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_sales_details'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.crm_sales_details table must exist"

def test_erp_cust_az12_exists(db_cursor):
    """Verify bronze.erp_cust_az12 table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_cust_az12'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.erp_cust_az12 table must exist"

def test_erp_loc_a101_exists(db_cursor):
    """Verify bronze.erp_loc_a101 table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_loc_a101'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.erp_loc_a101 table must exist"

def test_erp_px_cat_g1v2_exists(db_cursor):
    """Verify bronze.erp_px_cat_g1v2 table exists."""
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_px_cat_g1v2'
    """)
    
    count = db_cursor.fetchone()[0]
    assert count == 1, "bronze.erp_px_cat_g1v2 table must exist"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 7 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_all_six_bronze_tables_exist PASSED               [ 14%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_cust_info_exists PASSED                      [ 28%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_prd_info_exists PASSED                       [ 42%]

## Test Suite 2: CRM Table Structure

**Tests in this suite:**
1. `test_crm_cust_info_columns` - Validates customer information table structure
2. `test_crm_prd_info_columns` - Validates product information table structure
3. `test_crm_sales_details_columns` - Validates sales transaction table structure

**How these tests work:**

**Column Validation Strategy:**
- Queries `information_schema.columns` for each table's column definitions
- Creates dictionary: `{column_name: data_type}` from query results
- Compares against expected column definitions
- ✅ **Success:** All expected columns exist with correct data types
- ❌ **Failure:** Missing columns or type mismatches
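The comparison step can be written as a pure function that needs no database, which makes it easy to reason about. The sketch below is an illustration rather than the notebook's implementation: `compare_columns` is a hypothetical helper, `load_ts` is an invented column, and unlike the tests in this suite it also reports columns that are present but not expected.

```python
def compare_columns(actual, expected):
    """Return a list of human-readable problems; empty means a match."""
    problems = []
    for name, expected_type in expected.items():
        if name not in actual:
            problems.append(f"missing column '{name}'")
        elif actual[name] != expected_type:
            problems.append(
                f"column '{name}': expected {expected_type}, got {actual[name]}"
            )
    # Columns present in the table but absent from the expectation.
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column '{name}'")
    return problems

expected = {"cid": "character varying", "country": "character varying"}
actual = {"cid": "character varying", "country": "text", "load_ts": "date"}
for problem in compare_columns(actual, expected):
    print(problem)
# column 'country': expected character varying, got text
# unexpected column 'load_ts'
```

Collecting every problem before asserting gives one failure message per table instead of stopping at the first mismatch.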

**Test 1: `crm_cust_info` Structure (7 columns)**
- Validates customer demographic and profile data structure
- **Key columns:**
  - `customer_id` (integer): Numeric identifier from source system
  - `customer_key` (varchar): Business key for dimension modeling
  - `customer_first_name`, `customer_last_name` (varchar): Name fields
  - `customer_material_status` (varchar): Marital status
  - `customer_gender` (varchar): Gender designation
  - `customer_create_date` (date): Customer record creation date
- **Purpose:** Ensures 1:1 mapping to source CRM `cust_info.csv` file

**Test 2: `crm_prd_info` Structure (7 columns)**
- Validates product catalog and lifecycle data structure
- **Key columns:**
  - `product_id` (integer): Numeric identifier from source system
  - `product_key` (varchar): Business key for dimension modeling
  - `product_nm` (varchar): Product name/description
  - `product_cost` (integer): Manufacturing or acquisition cost
  - `product_line` (varchar): Product line categorization
  - `product_start_date`, `product_end_date` (timestamp): Product lifecycle dates
- **Purpose:** Ensures 1:1 mapping to source CRM `prd_info.csv` file
- **Note:** Timestamps allow for intraday product changes

**Test 3: `crm_sales_details` Structure (9 columns)**
- Validates transactional sales data structure (highest volume table)
- **Key columns:**
  - `sales_order_number` (varchar): Unique order identifier
  - `sales_product_key` (varchar): Logical reference to product (no FK constraint in bronze)
  - `sales_customer_id` (integer): Logical reference to customer (no FK constraint in bronze)
  - `sales_order_date` (timestamp): When order was placed (intraday precision)
  - `sales_shipping_date`, `sales_due_date` (date): Fulfillment dates
  - `sales_sales` (integer): Total sales amount (revenue)
  - `sales_quantity` (integer): Units sold
  - `sales_price` (integer): Unit price at time of sale
- **Purpose:** Ensures 1:1 mapping to source CRM `sales_details.csv` file
- **Business Logic:** `sales_sales = sales_quantity × sales_price`

**🔍 Data Type Validation Importance:**
- **Exact match required:** PostgreSQL uses specific type names
  - VARCHAR appears as `'character varying'` in information_schema
  - TIMESTAMP appears as `'timestamp without time zone'`
  - INTEGER appears as `'integer'`
- **Type mismatches indicate:**
  - DDL script changes not reflected in tests
  - Manual schema alterations
  - Version drift between environments
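Because the names in `information_schema.columns` differ from the keywords written in DDL, a small lookup can keep expectations readable. This sketch covers only the types used in this notebook; `information_schema_type` is a hypothetical helper and `VARCHAR(50)` is an illustrative length, not one taken from the DDL script.

```python
import re

# Mapping for the types used in this notebook; not an exhaustive list.
DDL_TO_INFORMATION_SCHEMA = {
    "varchar": "character varying",
    "character varying": "character varying",
    "int": "integer",
    "integer": "integer",
    "date": "date",
    "timestamp": "timestamp without time zone",
}

def information_schema_type(ddl_type: str) -> str:
    """Translate a DDL type such as 'VARCHAR(50)' to its data_type name."""
    # Drop any length/precision modifier, e.g. '(50)', then normalize case.
    base = re.sub(r"\(.*?\)", "", ddl_type).strip().lower()
    return DDL_TO_INFORMATION_SCHEMA[base]

print(information_schema_type("VARCHAR(50)"))  # character varying
print(information_schema_type("TIMESTAMP"))    # timestamp without time zone
print(information_schema_type("INT"))          # integer
```

An unknown type raises `KeyError`, which is deliberate here: a silent fallback would hide exactly the drift these tests are meant to catch.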

In [4]:
%%ipytest -vv

def test_crm_cust_info_columns(db_cursor):
    """Verify crm_cust_info has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_cust_info'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'customer_id': 'integer',
        'customer_key': 'character varying',
        'customer_first_name': 'character varying',
        'customer_last_name': 'character varying',
        'customer_material_status': 'character varying',
        'customer_gender': 'character varying',
        'customer_create_date': 'date'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from crm_cust_info"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

def test_crm_prd_info_columns(db_cursor):
    """Verify crm_prd_info has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_prd_info'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'product_id': 'integer',
        'product_key': 'character varying',
        'product_nm': 'character varying',
        'product_cost': 'integer',
        'product_line': 'character varying',
        'product_start_date': 'timestamp without time zone',
        'product_end_date': 'timestamp without time zone'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from crm_prd_info"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

def test_crm_sales_details_columns(db_cursor):
    """Verify crm_sales_details has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'crm_sales_details'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'sales_order_number': 'character varying',
        'sales_product_key': 'character varying',
        'sales_customer_id': 'integer',
        'sales_order_date': 'timestamp without time zone',
        'sales_shipping_date': 'date',
        'sales_due_date': 'date',
        'sales_sales': 'integer',
        'sales_quantity': 'integer',
        'sales_price': 'integer'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from crm_sales_details"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_cust_info_columns PASSED                     [ 33%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_prd_info_columns PASSED                      [ 66%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_sales_details_columns

## Test Suite 3: ERP Table Structure

**Tests in this suite:**
1. `test_erp_cust_az12_columns` - Validates ERP customer demographics structure
2. `test_erp_loc_a101_columns` - Validates ERP customer location structure
3. `test_erp_px_cat_g1v2_columns` - Validates ERP product categorization structure

**How these tests work:**

**Column Validation Strategy:**
- Same approach as CRM tests: query information_schema, build dictionary, compare
- ✅ **Success:** All expected columns exist with correct data types
- ❌ **Failure:** Missing columns or type mismatches

**Test 1: `erp_cust_az12` Structure (3 columns)**
- Validates ERP customer supplemental demographics
- **Key columns:**
  - `cid` (varchar): Customer identifier (joins to CRM customer_key)
  - `date_of_birth` (date): Birth date for age calculations
  - `gender` (varchar): Gender designation
- **Purpose:** Ensures 1:1 mapping to source ERP `CUST_AZ12.csv` file
- **Integration:** Complements `crm_cust_info` with additional demographics
- **Note:** ERP uses abbreviated naming convention (`cid` vs `customer_id`)

**Test 2: `erp_loc_a101` Structure (2 columns)**
- Validates ERP customer geographic location data
- **Key columns:**
  - `cid` (varchar): Customer identifier (joins to CRM customer_key)
  - `country` (varchar): Customer country location
- **Purpose:** Ensures 1:1 mapping to source ERP `LOC_A101.csv` file
- **Integration:** Adds geographic dimension missing from CRM data
- **Simplification:** Bronze layer stores as-is; silver layer may normalize to country dimension

**Test 3: `erp_px_cat_g1v2` Structure (4 columns)**
- Validates ERP product categorization hierarchy
- **Key columns:**
  - `id` (varchar): Product identifier (joins to CRM product_key)
  - `category` (varchar): Top-level product category
  - `subcategory` (varchar): Detailed product subcategory
  - `maintenance` (varchar): Maintenance classification or product status
- **Purpose:** Ensures 1:1 mapping to source ERP `PX_CAT_G1V2.csv` file
- **Integration:** Enriches `crm_prd_info` with category hierarchy
- **Hierarchy:** category → subcategory → product (supports drill-down analytics)

**🔄 ERP vs CRM Schema Differences:**

**Naming Conventions:**
- **CRM:** Descriptive, verbose names (`customer_first_name`, `sales_order_number`)
- **ERP:** Abbreviated, cryptic codes (`cid`, `PX_CAT_G1V2`)
- **Bronze layer:** Preserves source system naming (no standardization)
- **Silver layer:** Harmonizes naming across systems

**Identifier Strategy:**
- **CRM:** Numeric IDs (`customer_id` integer) + business keys (varchar)
- **ERP:** String-based identifiers only (`cid`, `id` varchar)
- **Join strategy:** CRM `customer_key` ↔ ERP `cid`, CRM `product_key` ↔ ERP `id`

**Data Granularity:**
- **CRM:** Transactional detail (sales facts, product lifecycle)
- **ERP:** Master data and reference tables (demographics, categories, locations)
- **Complementary:** ERP enriches CRM transactions with dimensions

In [5]:
%%ipytest -vv

def test_erp_cust_az12_columns(db_cursor):
    """Verify erp_cust_az12 has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_cust_az12'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'cid': 'character varying',
        'date_of_birth': 'date',
        'gender': 'character varying'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from erp_cust_az12"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

def test_erp_loc_a101_columns(db_cursor):
    """Verify erp_loc_a101 has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_loc_a101'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'cid': 'character varying',
        'country': 'character varying'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from erp_loc_a101"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

def test_erp_px_cat_g1v2_columns(db_cursor):
    """Verify erp_px_cat_g1v2 has correct columns."""
    db_cursor.execute("""
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_schema = 'bronze'
        AND table_name = 'erp_px_cat_g1v2'
        ORDER BY ordinal_position
    """)
    
    columns = {row[0]: row[1] for row in db_cursor.fetchall()}
    
    expected_columns = {
        'id': 'character varying',
        'category': 'character varying',
        'subcategory': 'character varying',
        'maintenance': 'character varying'
    }
    
    for col_name, expected_type in expected_columns.items():
        assert col_name in columns, f"Column '{col_name}' missing from erp_px_cat_g1v2"
        assert columns[col_name] == expected_type, \
            f"Column '{col_name}' has wrong type: expected {expected_type}, got {columns[col_name]}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_erp_cust_az12_columns PASSED                     [ 33%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_erp_loc_a101_columns PASSED                      [ 66%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_erp_px_cat_g1v2_columns

## Test Suite 4: Table Naming Convention

**Tests in this suite:**
1. `test_all_tables_in_bronze_schema` - Validates schema placement for all tables
2. `test_crm_tables_follow_naming_convention` - Validates CRM prefix convention
3. `test_erp_tables_follow_naming_convention` - Validates ERP prefix convention

**How these tests work:**

**Test 1: Schema Placement Validation**
- Iterates through all expected tables (CRM + ERP)
- For each table, queries `information_schema.tables` to find its schema
- ✅ **Success:** table_schema = 'bronze' for all tables
- ❌ **Failure:** Table found in different schema or not found at all
- **Purpose:** Prevents accidental table creation in wrong schema (public, silver, gold)
- **Why important:** Schema isolation is critical for medallion architecture

**Test 2: CRM Prefix Convention**
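- Queries all tables in bronze schema with `LIKE 'crm_%'` pattern (in `LIKE`, `_` is itself a single-character wildcard, so this is slightly broader than a literal `crm_` prefix)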
- ✅ **Success:** Exactly 3 tables found, all match expected CRM table list
- ❌ **Failure:** Wrong count or unexpected table names
- **Purpose:** Validates source system identification through naming
- **Convention enforced:**
  - All CRM tables start with `crm_` prefix
  - Table names reflect source file names (e.g., `crm_cust_info` ← `cust_info.csv`)
  - Lowercase only (PostgreSQL standard)

**Test 3: ERP Prefix Convention**
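- Queries all tables in bronze schema with `LIKE 'erp_%'` pattern (in `LIKE`, `_` is itself a single-character wildcard, so this is slightly broader than a literal `erp_` prefix)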
- ✅ **Success:** Exactly 3 tables found, all match expected ERP table list
- ❌ **Failure:** Wrong count or unexpected table names
- **Purpose:** Validates source system identification through naming
- **Convention enforced:**
  - All ERP tables start with `erp_` prefix
  - Table names preserve source system conventions (e.g., `erp_cust_az12` ← `CUST_AZ12.csv`)
  - Lowercase due to PostgreSQL identifier normalization
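One subtlety of the prefix patterns is that `_` is a wildcard in SQL `LIKE`, matching any single character. The sketch below demonstrates the effect and one way to match a literal underscore with an `ESCAPE` clause. It runs against SQLite purely because SQLite ships with Python; the `_` wildcard and `ESCAPE` are also available in PostgreSQL. The table name `crmxcust_info`, the `!` escape character, and the `matches` helper are assumptions made for illustration.

```python
import sqlite3

# SQLite is used only because it ships with Python; the `_` wildcard and
# the ESCAPE clause shown here are standard SQL LIKE features.
conn = sqlite3.connect(":memory:")

def matches(value, where_pattern):
    """Return True if `value LIKE <where_pattern>` holds."""
    row = conn.execute(f"SELECT ? LIKE {where_pattern}", (value,)).fetchone()
    return bool(row[0])

UNESCAPED = "'crm_%'"
ESCAPED = "'crm!_%' ESCAPE '!'"

print(matches("crm_cust_info", UNESCAPED))  # True
print(matches("crmxcust_info", UNESCAPED))  # True: `_` matched the "x"
print(matches("crmxcust_info", ESCAPED))    # False
print(matches("crm_cust_info", ESCAPED))    # True
```

Since the naming tests also compare against the expected table lists, the unescaped pattern is unlikely to cause a wrong result in practice; the escaped form simply states the intent more precisely.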

**📛 Bronze Layer Naming Standards:**

**Source System Prefixes:**
- **`crm_`** - Customer Relationship Management system tables
- **`erp_`** - Enterprise Resource Planning system tables
- **Purpose:** Immediately identify data lineage and source system
- **Future-proof:** Easy to add new source systems (`edi_`, `api_`, `legacy_`, etc.)

**Table Name Composition:**
- **Format:** `{source_prefix}_{original_table_or_file_name}`
- **Examples:**
  - CRM file `cust_info.csv` → `crm_cust_info`
  - ERP file `CUST_AZ12.csv` → `erp_cust_az12` (lowercased)
- **Rationale:** Preserves traceability to source files
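The composition rule is simple enough to express as a function. This is a sketch of the convention described above, with `bronze_table_name` as a hypothetical helper name; the lowercase step mirrors what PostgreSQL does to unquoted identifiers.

```python
from pathlib import PurePath

def bronze_table_name(source_prefix: str, file_name: str) -> str:
    """Build '{source_prefix}_{file stem}' in lowercase."""
    stem = PurePath(file_name).stem  # 'CUST_AZ12.csv' -> 'CUST_AZ12'
    return f"{source_prefix}_{stem}".lower()

print(bronze_table_name("crm", "cust_info.csv"))  # crm_cust_info
print(bronze_table_name("erp", "CUST_AZ12.csv"))  # erp_cust_az12
```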

**Why Naming Conventions Matter:**
- **Data Lineage:** Instantly identify source system from table name
- **Pattern Matching:** Can query all CRM or ERP tables programmatically
- **Collision Avoidance:** Prefix prevents name clashes (both systems may have "customer" table)
- **Metadata Queries:** Easy to aggregate stats by source system using LIKE patterns
- **Documentation:** Self-documenting schema (new team members understand structure)

**🔍 PostgreSQL Case Behavior (Critical):**
- Unquoted identifiers converted to lowercase automatically
- `CREATE TABLE erp_CUST_AZ12` → stored as `erp_cust_az12`
- Query with `'erp_cust_az12'` NOT `'erp_CUST_AZ12'`
- A query literal with the wrong case raises no error; the string comparison simply matches nothing and returns 0 rows

In [6]:
%%ipytest -vv

def test_all_tables_in_bronze_schema(db_cursor):
    """Verify all data tables are in bronze schema."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT table_schema
            FROM information_schema.tables
            WHERE table_name = %s
        """, (table,))
        
        result = db_cursor.fetchone()
        assert result is not None, f"Table '{table}' not found"
        assert result[0] == 'bronze', \
            f"Table '{table}' should be in bronze schema, found in '{result[0]}'"

def test_crm_tables_follow_naming_convention(db_cursor):
    """Verify CRM tables follow crm_ prefix convention."""
    db_cursor.execute("""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name LIKE 'crm_%'
        ORDER BY table_name
    """)
    
    crm_tables = [row[0] for row in db_cursor.fetchall()]
    
    assert len(crm_tables) == 3, \
        f"Expected 3 CRM tables, found {len(crm_tables)}: {crm_tables}"
    
    for table in EXPECTED_CRM_TABLES:
        assert table in crm_tables, f"Expected CRM table '{table}' not found"

def test_erp_tables_follow_naming_convention(db_cursor):
    """Verify ERP tables follow erp_ prefix convention."""
    db_cursor.execute("""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name LIKE 'erp_%'
        ORDER BY table_name
    """)
    
    erp_tables = [row[0] for row in db_cursor.fetchall()]
    
    assert len(erp_tables) == 3, \
        f"Expected 3 ERP tables, found {len(erp_tables)}: {erp_tables}"
    
    for table in EXPECTED_ERP_TABLES:
        assert table in erp_tables, f"Expected ERP table '{table}' not found"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 3 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_all_tables_in_bronze_schema PASSED               [ 33%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_tables_follow_naming_convention PASSED       [ 66%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_erp_tables_follow_naming_convention PASSED       [100%]

## Test Suite 5: No Constraints (Raw Ingestion Layer)

**Tests in this suite:**
1. `test_no_primary_keys_on_bronze_tables` - Validates absence of PK constraints
2. `test_no_foreign_keys_on_bronze_tables` - Validates absence of FK constraints
3. `test_no_unique_constraints_on_bronze_tables` - Validates absence of unique constraints
4. `test_no_check_constraints_on_bronze_tables` - Validates absence of check constraints
5. `test_no_indexes_on_bronze_tables` - Validates absence of indexes

**How these tests work:**

**Constraint Validation (Tests 1-4):**
- Queries `information_schema.table_constraints` for each table
- Filters by constraint_type: 'PRIMARY KEY', 'FOREIGN KEY', 'UNIQUE', 'CHECK'
- ✅ **Success:** COUNT(*) = 0 for each constraint type on each table
- ❌ **Failure:** Any constraint found (indicates violation of bronze layer principles)
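A per-table loop stops at the first table that fails. As an alternative sketch, the rows from a single query selecting `table_name` and `constraint_type` for the bronze schema could be grouped so that every violation is reported at once. `summarize_constraints` is a hypothetical helper, and the rows below are invented sample data, not results from the warehouse.

```python
from collections import defaultdict

def summarize_constraints(rows):
    """Group (table_name, constraint_type) rows into {table: [types]}."""
    found = defaultdict(list)
    for table_name, constraint_type in rows:
        found[table_name].append(constraint_type)
    return {table: sorted(types) for table, types in found.items()}

# Invented sample rows, shaped like a cursor.fetchall() result.
rows = [
    ("crm_cust_info", "PRIMARY KEY"),
    ("crm_sales_details", "FOREIGN KEY"),
    ("crm_sales_details", "CHECK"),
]
violations = summarize_constraints(rows)
print(violations)
# {'crm_cust_info': ['PRIMARY KEY'], 'crm_sales_details': ['CHECK', 'FOREIGN KEY']}
```

An empty dictionary means no constraints were found, which is the passing condition for the bronze layer.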

**Index Validation (Test 5):**
- Queries the `pg_indexes` system view for each table
- ✅ **Success:** COUNT(*) = 0 (no indexes on table)
- ❌ **Failure:** Indexes found (reduces write performance)

**Why Bronze Tables Have NO Constraints:**

**🚫 No Primary Keys:**
- **Reason:** Source data may contain duplicates (data quality issues to fix in silver layer)
- **Example:** Multiple rows for same customer_id due to system errors
- **Bronze philosophy:** Accept data as-is, validate later
- **Silver layer:** Will deduplicate and assign surrogate keys

**🚫 No Foreign Keys:**
- **Reason:** Relationships may be broken in source systems
- **Example:** Sales record referencing non-existent customer_id
- **Bronze philosophy:** Preserve orphaned records for investigation
- **Silver layer:** Will validate referential integrity and flag violations
- **Performance:** FK checks slow down bulk INSERT operations

**🚫 No Unique Constraints:**
- **Reason:** Business keys may not be unique in source data
- **Example:** Duplicate customer_key values from data entry errors
- **Bronze philosophy:** Capture duplicates for data quality analysis
- **Silver layer:** Will identify and resolve duplicates using business rules

**🚫 No Check Constraints:**
- **Reason:** Source data may violate business rules
- **Example:** Negative sales_quantity, future dates, NULL required fields
- **Bronze philosophy:** Schema-on-read (load first, validate later)
- **Silver layer:** Will apply data quality rules and quarantine invalid records

**🚫 No Indexes:**
- **Reason:** Optimize for write performance, not read performance
- **Bronze usage pattern:** Bulk INSERT operations (COPY from CSV)
- **Performance impact:** Every index must be updated on each INSERT, so load time grows with the number of indexes
- **Query pattern:** Bronze tables rarely queried directly (ETL reads, not analytics)
- **Silver/gold layers:** Will add indexes optimized for query patterns

**⚡ Performance Benefits of Constraint-Free Design:**

**Write Performance:**
- No constraint validation overhead during INSERT
- No index maintenance during bulk loads
- Faster COPY operations (direct file → table)
- Reduced lock contention

**Operational Benefits:**
- Loads do not fail on key, uniqueness, or check violations (type conversion errors can still occur)
- Complete audit trail of source data quality issues
- Simplified error recovery (just truncate and reload)
- No cascade delete concerns

**Bronze Layer Loading Philosophy:**
```
Source System → Bronze Layer (no validation)
Bronze Layer → Silver Layer (validation + cleansing)
Silver Layer → Gold Layer (business logic + aggregation)
```

**🔧 When Constraints Are Added:**
- **Silver layer:** Primary keys (surrogate keys), unique constraints (business keys)
- **Gold layer:** Referential integrity (star schema relationships)
- **Never bronze:** Constraints would violate raw data preservation principle

In [7]:
%%ipytest -vv

def test_no_primary_keys_on_bronze_tables(db_cursor):
    """Verify bronze data tables have no primary keys (raw ingestion)."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.table_constraints
            WHERE table_schema = 'bronze'
            AND table_name = %s
            AND constraint_type = 'PRIMARY KEY'
        """, (table,))
        
        pk_count = db_cursor.fetchone()[0]
        assert pk_count == 0, \
            f"Table '{table}' should not have primary key (raw ingestion layer)"

def test_no_foreign_keys_on_bronze_tables(db_cursor):
    """Verify bronze data tables have no foreign keys."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.table_constraints
            WHERE table_schema = 'bronze'
            AND table_name = %s
            AND constraint_type = 'FOREIGN KEY'
        """, (table,))
        
        fk_count = db_cursor.fetchone()[0]
        assert fk_count == 0, \
            f"Table '{table}' should not have foreign keys (relationships in silver/gold)"

def test_no_unique_constraints_on_bronze_tables(db_cursor):
    """Verify bronze data tables have no unique constraints."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.table_constraints
            WHERE table_schema = 'bronze'
            AND table_name = %s
            AND constraint_type = 'UNIQUE'
        """, (table,))
        
        unique_count = db_cursor.fetchone()[0]
        assert unique_count == 0, \
            f"Table '{table}' should not have unique constraints (duplicates allowed)"

def test_no_check_constraints_on_bronze_tables(db_cursor):
    """Verify bronze data tables have no check constraints."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM information_schema.table_constraints
            WHERE table_schema = 'bronze'
            AND table_name = %s
            AND constraint_type = 'CHECK'
        """, (table,))
        
        check_count = db_cursor.fetchone()[0]
        assert check_count == 0, \
            f"Table '{table}' should not have check constraints (validation in silver)"

def test_no_indexes_on_bronze_tables(db_cursor):
    """Verify bronze data tables have no indexes (write-optimized)."""
    for table in EXPECTED_ALL_TABLES:
        db_cursor.execute("""
            SELECT COUNT(*)
            FROM pg_indexes
            WHERE schemaname = 'bronze'
            AND tablename = %s
        """, (table,))
        
        index_count = db_cursor.fetchone()[0]
        assert index_count == 0, \
            f"Table '{table}' should not have indexes (raw ingestion prioritizes write speed)"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 5 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_no_primary_keys_on_bronze_tables PASSED          [ 20%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_no_foreign_keys_on_bronze_tables PASSED          [ 40%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_no_unique_constraints_on_bronze_tables PASSED    [ 60%]
t_1947571c5ca54797

## Test Suite 6: Source System Separation

**Tests in this suite:**
1. `test_crm_and_erp_table_counts` - Validates 3 CRM + 3 ERP table distribution
2. `test_table_names_exact_match` - Validates complete table inventory matches expectations

**How these tests work:**

**Test 1: Source System Distribution**
- Counts CRM tables: `WHERE table_name LIKE 'crm_%'`
- Counts ERP tables: `WHERE table_name LIKE 'erp_%'`
- ✅ **Success:** crm_count = 3 AND erp_count = 3
- ❌ **Failure:** Incorrect counts (missing tables or extras from other sources)
- **Purpose:** Validates balanced source system representation
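
One subtlety of the patterns above: in SQL `LIKE`, `_` is a single-character wildcard, so `'crm_%'` would also match a name such as `crmx_backup`. The sketch below counts by literal prefix in Python over a list of table names instead. `count_by_prefix` is a hypothetical helper, not part of the suite, and the sample names are illustrative only.

```python
def count_by_prefix(table_names, prefixes=("crm_", "erp_")):
    """Count table names per literal prefix (underscore is not a wildcard)."""
    counts = {prefix: 0 for prefix in prefixes}
    for name in table_names:
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break  # a name belongs to at most one source system
    return counts


# Illustrative names only: 'crmx_backup' would match LIKE 'crm_%' in SQL,
# but it does not start with the literal prefix 'crm_'.
sample = ["crm_cust_info", "crm_prd_info", "crmx_backup", "erp_loc_a101"]
print(count_by_prefix(sample))  # {'crm_': 2, 'erp_': 1}
```

To stay in SQL, the underscore can be escaped instead, for example `LIKE 'crm\_%'`, since backslash is PostgreSQL's default `LIKE` escape character.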

**Test 2: Exact Inventory Match**
- Retrieves all bronze data table names (excludes load_jobs, load_log)
- Sorts actual and expected lists alphabetically
- Compares the two sorted lists for equality, so the result does not depend on the order in which the query returns rows
- ✅ **Success:** Lists match exactly (same names, no missing or extra tables)
- ❌ **Failure:** Name differences, missing tables, or extra tables
- **Purpose:** Ensures complete 1:1 match with expected table inventory
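
When the lists differ, it helps to see which names are missing and which are unexpected. A minimal sketch of such a report follows; `diff_table_inventory` is a hypothetical helper, not part of the suite, and the `staging_tmp` name is made up for the example.

```python
def diff_table_inventory(actual, expected):
    """Return (missing, unexpected) table names as sorted lists."""
    actual_set, expected_set = set(actual), set(expected)
    missing = sorted(expected_set - actual_set)      # expected but not found
    unexpected = sorted(actual_set - expected_set)   # found but not expected
    return missing, unexpected


expected = ["crm_cust_info", "crm_prd_info", "erp_loc_a101"]
actual = ["crm_cust_info", "erp_loc_a101", "staging_tmp"]
missing, unexpected = diff_table_inventory(actual, expected)
print(missing)     # ['crm_prd_info']
print(unexpected)  # ['staging_tmp']
```

Using sets is safe here because table names are unique within a schema, so no duplicate information is lost.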

**🗂️ Source System Architecture:**

**CRM System (3 tables - Customer-Facing Data):**
- `crm_cust_info` - Customer master data (demographics, profile)
- `crm_prd_info` - Product catalog (lifecycle, pricing)
- `crm_sales_details` - Transaction facts (orders, revenue)
- **Characteristics:** Transactional, high-volume, customer-centric
- **Update frequency:** Real-time to daily batches
- **Data quality:** Variable (direct customer input, manual entry)

**ERP System (3 tables - Back-Office Data):**
- `erp_cust_az12` - Customer supplemental demographics
- `erp_loc_a101` - Geographic location hierarchy
- `erp_px_cat_g1v2` - Product categorization taxonomy
- **Characteristics:** Master data, low-volume, reference tables
- **Update frequency:** Weekly to monthly batches
- **Data quality:** Higher (controlled by data stewards)

**Why 3+3 Split Matters:**

**System Isolation:**
- **Independent failures:** CRM outage doesn't block ERP data loads
- **Separate schedules:** Different refresh frequencies per system
- **Clear ownership:** CRM team owns CRM tables, ERP team owns ERP tables
- **Simplified troubleshooting:** Issues isolated to source system

**Data Lineage Clarity:**
- **Prefix convention:** Instantly identify which system data came from
- **Audit trails:** Track data quality issues back to source
- **Version control:** Source systems evolve independently
- **Documentation:** Self-documenting schema (prefix = lineage)

**Integration Strategy:**
- **Bronze layer:** Keep systems separate (no joins, no merging)
- **Silver layer:** Integrate via business keys (customer_key ↔ cid, product_key ↔ id)
- **Gold layer:** Unified dimensional model (customer dimension combines CRM + ERP)

**🔄 Why NOT Merge at Bronze Level:**

**Preserve Source Fidelity:**
- Different schemas, naming conventions, data types
- Source system changes don't cascade to integrated tables
- Ability to reload one system from source without affecting the other

**Data Quality Transparency:**
- CRM vs ERP quality metrics tracked separately
- Identify which system is the source of data issues
- Business can prioritize data quality investments

**Operational Flexibility:**
- Load CRM and ERP on different schedules
- Retry failed loads without affecting the other system
- Add new source systems without redesigning bronze layer

**Historical Analysis:**
- Track when each source system's data was loaded
- Compare CRM vs ERP data arrival times
- Analyze cross-system latency and synchronization

**🎯 Expected Table Count Validation:**
- **Total:** 6 data tables (excludes metadata tables)
- **CRM:** 3 tables (customer, product, sales)
- **ERP:** 3 tables (demographics, location, categories)
- **Ratio:** 1:1 balance between transactional (CRM) and master data (ERP)
- **Future growth:** Can add more sources (`edi_`, `api_`, `legacy_`) without changing existing tables
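
The expected counts and the prefix convention can be derived from one registry, so that adding a source touches a single place. This is a sketch under stated assumptions: `EXPECTED_TABLES_BY_SOURCE`, `expected_counts`, and `misnamed_tables` are hypothetical names, and the registry simply regroups the table names already listed in the setup cell.

```python
# Hypothetical registry: source prefix -> expected bronze data tables.
EXPECTED_TABLES_BY_SOURCE = {
    "crm": ["crm_cust_info", "crm_prd_info", "crm_sales_details"],
    "erp": ["erp_cust_az12", "erp_loc_a101", "erp_px_cat_g1v2"],
}


def expected_counts(registry):
    """Return the expected number of tables per source system."""
    return {source: len(tables) for source, tables in registry.items()}


def misnamed_tables(registry):
    """Return tables whose name lacks their source's '<source>_' prefix."""
    return sorted(
        table
        for source, tables in registry.items()
        for table in tables
        if not table.startswith(f"{source}_")
    )


print(expected_counts(EXPECTED_TABLES_BY_SOURCE))  # {'crm': 3, 'erp': 3}
print(misnamed_tables(EXPECTED_TABLES_BY_SOURCE))  # []
```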

In [8]:
%%ipytest -vv

def test_crm_and_erp_table_counts(db_cursor):
    """Verify correct distribution: 3 CRM tables, 3 ERP tables."""
    # Count CRM tables
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name LIKE 'crm_%'
    """)
    crm_count = db_cursor.fetchone()[0]
    
    # Count ERP tables
    db_cursor.execute("""
        SELECT COUNT(*)
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_name LIKE 'erp_%'
    """)
    erp_count = db_cursor.fetchone()[0]
    
    assert crm_count == 3, f"Expected 3 CRM tables, found {crm_count}"
    assert erp_count == 3, f"Expected 3 ERP tables, found {erp_count}"

def test_table_names_exact_match(db_cursor):
    """Verify exact table names (case-sensitive check)."""
    db_cursor.execute("""
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = 'bronze'
        AND table_type = 'BASE TABLE'
        AND table_name NOT IN ('load_jobs', 'load_log')
        ORDER BY table_name
    """)
    
    actual_tables = sorted([row[0] for row in db_cursor.fetchall()])
    expected_tables = sorted(EXPECTED_ALL_TABLES)
    
    assert actual_tables == expected_tables, \
        f"Table names don't match exactly.\nExpected: {expected_tables}\nActual: {actual_tables}"

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_and_erp_table_counts PASSED                  [ 50%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_table_names_exact_match PASSED                   [100%]

## Summary: Run All Tests

In [9]:
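# Re-run the tests currently defined in the notebook namespace.
# Note: by default each %%ipytest cell clears previously defined tests,
# so only the tests from the most recent cell are collected here.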
ipytest.run('-vv')

platform win32 -- Python 3.12.4, pytest-8.4.2, pluggy-1.6.0 -- c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\.venv\Scripts\python.exe
cachedir: .pytest_cache
rootdir: c:\Users\Laurent\Studies\sql-ultimate-course\Udemy-SQL-Data-Warehouse-Project\tests\tests_bronze
plugins: anyio-4.11.0, nbmake-1.5.5
collecting ... collected 2 items

t_1947571c5ca54797bfe227608c1c40b3.py::test_crm_and_erp_table_counts PASSED                  [ 50%]
t_1947571c5ca54797bfe227608c1c40b3.py::test_table_names_exact_match PASSED                   [100%]

<ExitCode.OK: 0>

## Manual Inspection: Table Details

In [10]:
# Connect to warehouse database
conn = psycopg2.connect(**DB_CONFIG)

# Get comprehensive table information
df_tables = pd.read_sql("""
    SELECT
        table_name,
        CASE
            WHEN table_name LIKE 'crm_%' THEN 'CRM'
            WHEN table_name LIKE 'erp_%' THEN 'ERP'
            ELSE 'Other'
        END AS source_system,
        (
            SELECT COUNT(*)
            FROM information_schema.columns c
            WHERE c.table_schema = t.table_schema
            AND c.table_name = t.table_name
        ) AS column_count,
        pg_size_pretty(pg_total_relation_size(format('%I.%I', table_schema, table_name))) AS total_size
    FROM information_schema.tables t
    WHERE table_schema = 'bronze'
    AND table_type = 'BASE TABLE'
    AND table_name NOT IN ('load_jobs', 'load_log')
    ORDER BY source_system, table_name
""", conn)

print("\n📊 Bronze Data Tables:")
display(df_tables)

# Get detailed column information for all tables
df_columns = pd.read_sql("""
    SELECT
        table_name,
        column_name,
        data_type,
        character_maximum_length,
        is_nullable,
        column_default
    FROM information_schema.columns
    WHERE table_schema = 'bronze'
    AND table_name IN ('crm_cust_info', 'crm_prd_info', 'crm_sales_details',
                       'erp_cust_az12', 'erp_loc_a101', 'erp_px_cat_g1v2')
    ORDER BY table_name, ordinal_position
""", conn)

print("\n📋 Column Details:")
display(df_columns)

# Check for any constraints (should be none)
df_constraints = pd.read_sql("""
    SELECT
        table_name,
        constraint_type,
        constraint_name
    FROM information_schema.table_constraints
    WHERE table_schema = 'bronze'
    AND table_name IN ('crm_cust_info', 'crm_prd_info', 'crm_sales_details',
                       'erp_cust_az12', 'erp_loc_a101', 'erp_px_cat_g1v2')
    ORDER BY table_name, constraint_type
""", conn)

print("\n🔒 Constraints (should be empty):")
if len(df_constraints) > 0:
    display(df_constraints)
    print("⚠️  WARNING: Bronze tables should not have constraints!")
else:
    print("   ✅ No constraints found (correct for raw ingestion layer)")

# Check for any indexes (should be none)
df_indexes = pd.read_sql("""
    SELECT
        tablename,
        indexname,
        indexdef
    FROM pg_indexes
    WHERE schemaname = 'bronze'
    AND tablename IN ('crm_cust_info', 'crm_prd_info', 'crm_sales_details',
                      'erp_cust_az12', 'erp_loc_a101', 'erp_px_cat_g1v2')
    ORDER BY tablename, indexname
""", conn)

print("\n📇 Indexes (should be empty):")
if len(df_indexes) > 0:
    display(df_indexes)
    print("⚠️  WARNING: Bronze tables should not have indexes!")
else:
    print("   ✅ No indexes found (correct for write-optimized ingestion)")

# Summary statistics
df_summary = pd.read_sql("""
    SELECT
        CASE
            WHEN table_name LIKE 'crm_%' THEN 'CRM'
            WHEN table_name LIKE 'erp_%' THEN 'ERP'
        END AS source_system,
        COUNT(*) AS table_count,
        SUM((
            SELECT COUNT(*)
            FROM information_schema.columns c
            WHERE c.table_schema = t.table_schema
            AND c.table_name = t.table_name
        )) AS total_columns
    FROM information_schema.tables t
    WHERE table_schema = 'bronze'
    AND table_type = 'BASE TABLE'
    AND table_name NOT IN ('load_jobs', 'load_log')
    GROUP BY source_system
    ORDER BY source_system
""", conn)

print("\n📈 Summary by Source System:")
display(df_summary)

conn.close()
print("\n✅ Inspection complete")


📊 Bronze Data Tables:


| table_name | source_system | column_count | total_size |
|---|---|---|---|
| crm_cust_info | CRM | 7 | 0 bytes |
| crm_prd_info | CRM | 7 | 0 bytes |
| crm_sales_details | CRM | 9 | 0 bytes |
| erp_cust_az12 | ERP | 3 | 0 bytes |
| erp_loc_a101 | ERP | 2 | 0 bytes |
| erp_px_cat_g1v2 | ERP | 4 | 0 bytes |



📋 Column Details:


| table_name | column_name | data_type | character_maximum_length | is_nullable | column_default |
|---|---|---|---|---|---|
| crm_cust_info | customer_id | integer | | YES | |
| crm_cust_info | customer_key | character varying | 50.0 | YES | |
| crm_cust_info | customer_first_name | character varying | 50.0 | YES | |
| crm_cust_info | customer_last_name | character varying | 50.0 | YES | |
| crm_cust_info | customer_material_status | character varying | 50.0 | YES | |
| crm_cust_info | customer_gender | character varying | 50.0 | YES | |
| crm_cust_info | customer_create_date | date | | YES | |
| crm_prd_info | product_id | integer | | YES | |
| crm_prd_info | product_key | character varying | 50.0 | YES | |
| crm_prd_info | product_nm | character varying | 50.0 | YES | |



🔒 Constraints (should be empty):
   ✅ No constraints found (correct for raw ingestion layer)

📇 Indexes (should be empty):
   ✅ No indexes found (correct for write-optimized ingestion)

📈 Summary by Source System:


| source_system | table_count | total_columns |
|---|---|---|
| CRM | 3 | 23.0 |
| ERP | 3 | 9.0 |



✅ Inspection complete
