# **Chapter 31: Database Testing Techniques**

---

## **31.1 Introduction to Database Testing Methodologies**

Database testing ensures that the data persistence layer of an application functions correctly, securely, and efficiently. While SQL skills (covered in Chapter 30) provide the tools to query data, testing methodologies provide the systematic approach to validate database behavior under various conditions.

**The Scope of Database Testing:**

Unlike unit testing, which isolates code components, database testing validates:
- **Structural Integrity:** Tables, columns, constraints, and relationships match design specifications
- **Data Integrity:** Business rules are enforced at the database level, preventing invalid data states
- **Functional Correctness:** Stored procedures, triggers, and functions execute business logic accurately
- **Performance:** Queries execute within acceptable time limits under expected load
- **Security:** Access controls prevent unauthorized data access or modification
- **Recoverability:** Data can be restored accurately after failures or disasters

**Testing Pyramid for Databases:**
```
         ┌─────────────┐
         │   UI/E2E    │  (Verify data flows end-to-end)
         │    Tests    │
        ┌┴─────────────┴┐
        │  Integration   │  (APIs + Database interactions)
        │    Testing     │
       ┌┴───────────────┴┐
       │   Database Unit  │  (Stored procedures, functions)
       │     Testing      │
      ┌┴─────────────────┴┐
      │  Schema/Data Unit  │  (Constraints, triggers, CRUD)
      │     Testing        │
      └────────────────────┘
```

---

## **31.2 Schema Testing**

Schema testing validates that the database structure matches the approved design and supports application requirements.

### **31.2.1 Structural Validation**

Structural validation checks that tables, columns, data types, and indexes exist exactly as specified.

```python
class SchemaTesting:
    """
    Automated schema validation against specifications
    """
    
    def test_table_existence(self, db_connection, expected_schema):
        """
        Verify all required tables exist
        """
        cursor = db_connection.cursor()
        
        # Get actual tables (DATABASE() is MySQL-specific; PostgreSQL uses current_schema())
        cursor.execute("""
            SELECT table_name 
            FROM information_schema.tables 
            WHERE table_schema = DATABASE()
        """)
        actual_tables = {row[0] for row in cursor.fetchall()}
        
        # Compare with expected
        expected_tables = set(expected_schema.keys())
        
        missing_tables = expected_tables - actual_tables
        extra_tables = actual_tables - expected_tables
        
        assert len(missing_tables) == 0, f"Missing tables: {missing_tables}"
        
        # Extra tables might be acceptable (audit logs, etc.), but flag them
        if extra_tables:
            print(f"Warning: Unexpected tables found: {extra_tables}")
    
    def test_column_definitions(self, db_connection, table_name, expected_columns):
        """
        Verify column names, types, and constraints
        """
        cursor = db_connection.cursor()
        
        cursor.execute("""
            SELECT 
                column_name,
                data_type,
                is_nullable,
                column_default,
                character_maximum_length
            FROM information_schema.columns
            WHERE table_schema = DATABASE()
              AND table_name = %s
            ORDER BY ordinal_position
        """, (table_name,))
        
        actual_columns = {}
        for row in cursor.fetchall():
            actual_columns[row[0]] = {
                'type': row[1],
                'nullable': row[2] == 'YES',
                'default': row[3],
                'max_length': row[4]
            }
        
        # Validate each expected column
        for col_name, specs in expected_columns.items():
            assert col_name in actual_columns, f"Column {col_name} missing from {table_name}"
            
            actual = actual_columns[col_name]
            
            # Data type validation (with aliases normalized)
            expected_type = self.normalize_type(specs['type'])
            actual_type = self.normalize_type(actual['type'])
            
            assert actual_type == expected_type, \
                f"{table_name}.{col_name}: Expected {expected_type}, got {actual_type}"
            
            # Nullable constraint
            if specs.get('nullable') is not None:
                assert actual['nullable'] == specs['nullable'], \
                    f"{table_name}.{col_name}: Nullable mismatch"
            
            # Default value
            if 'default' in specs:
                assert actual['default'] == specs['default'], \
                    f"{table_name}.{col_name}: Default value mismatch"
    
    def normalize_type(self, data_type):
        """Normalize type names across databases"""
        type_mapping = {
            'int': 'integer',
            'bigint': 'bigint',
            'varchar': 'character varying',
            'text': 'text',
            'timestamp': 'timestamp without time zone',
            'datetime': 'timestamp without time zone',  # Same canonical name as 'timestamp'
            'bool': 'boolean',
            'decimal': 'numeric',
            'float': 'double precision'
        }
        return type_mapping.get(data_type.lower(), data_type.lower())
    
    def test_index_existence(self, db_connection, expected_indexes):
        """
        Verify required indexes exist for performance
        """
        cursor = db_connection.cursor()
        
        cursor.execute("""
            SELECT 
                table_name,
                index_name,
                column_name
            FROM information_schema.statistics
            WHERE table_schema = DATABASE()
            ORDER BY table_name, index_name, seq_in_index
        """)
        
        actual_indexes = {}
        for row in cursor.fetchall():
            table, idx_name, col = row
            key = f"{table}.{idx_name}"
            if key not in actual_indexes:
                actual_indexes[key] = []
            actual_indexes[key].append(col)
        
        # Check expected indexes exist
        for idx_spec in expected_indexes:
            table = idx_spec['table']
            columns = tuple(idx_spec['columns'])
            
            # An index satisfies the spec if its leading columns match, in order
            found = any(
                key.startswith(f"{table}.") and tuple(idx_columns[:len(columns)]) == columns
                for key, idx_columns in actual_indexes.items()
            )
            
            assert found, f"Index missing on {table}({', '.join(columns)})"
```
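The structural checks above need a live MySQL or PostgreSQL server. To try the idea without one, the same comparison can be sketched against an in-memory SQLite database. SQLite has no `information_schema`, so this sketch reads `PRAGMA table_info` instead; the `users` table, the `EXPECTED_COLUMNS` spec, and the `check_columns` helper are illustrative assumptions, not part of the class above.

```python
import sqlite3

# Hypothetical specification: column name -> (declared type, nullable)
EXPECTED_COLUMNS = {
    "id": ("INTEGER", False),
    "email": ("TEXT", False),
    "nickname": ("TEXT", True),
}


def check_columns(conn, table_name, expected):
    """Return a list of mismatch messages; an empty list means the table matches."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    actual = {row[1]: (row[2].upper(), not row[3]) for row in rows}

    problems = []
    for name, (exp_type, exp_nullable) in expected.items():
        if name not in actual:
            problems.append(f"{table_name}.{name}: missing")
            continue
        act_type, act_nullable = actual[name]
        if act_type != exp_type:
            problems.append(f"{table_name}.{name}: expected {exp_type}, got {act_type}")
        if act_nullable != exp_nullable:
            problems.append(f"{table_name}.{name}: nullable mismatch")
    return problems


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER NOT NULL PRIMARY KEY, email TEXT NOT NULL, nickname TEXT)"
)
schema_problems = check_columns(conn, "users", EXPECTED_COLUMNS)
print(schema_problems)
```

Returning a list of problems instead of asserting on the first one lets a single test run report every mismatch at once.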

### **31.2.2 Constraint Validation**

Constraints enforce business rules at the database level.

```python
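# IntegrityError must be imported from the DB-API driver in use, for example
# `from mysql.connector import IntegrityError` or `from psycopg2 import IntegrityError`.
# Table and column names are interpolated into SQL here, so they must come from
# trusted test configuration, never from user input.

def test_primary_key_constraints(self, db_connection, table_name, pk_columns):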
    """
    Verify primary key constraints (uniqueness and not-null)
    """
    cursor = db_connection.cursor()
    
    # Test 1: Uniqueness
    cursor.execute(f"""
        SELECT {', '.join(pk_columns)}, COUNT(*) as cnt
        FROM {table_name}
        GROUP BY {', '.join(pk_columns)}
        HAVING COUNT(*) > 1
    """)
    
    duplicates = cursor.fetchall()
    assert len(duplicates) == 0, f"Duplicate PKs found in {table_name}: {duplicates}"
    
    # Test 2: NOT NULL
    for col in pk_columns:
        cursor.execute(f"""
            SELECT COUNT(*) FROM {table_name} WHERE {col} IS NULL
        """)
        null_count = cursor.fetchone()[0]
        assert null_count == 0, f"NULL values found in PK column {table_name}.{col}"

def test_foreign_key_integrity(self, db_connection, child_table, fk_column, parent_table, pk_column):
    """
    Verify referential integrity between tables
    """
    cursor = db_connection.cursor()
    
    # Find orphaned records
    cursor.execute(f"""
        SELECT c.{fk_column}
        FROM {child_table} c
        LEFT JOIN {parent_table} p ON c.{fk_column} = p.{pk_column}
        WHERE p.{pk_column} IS NULL
          AND c.{fk_column} IS NOT NULL
    """)
    
    orphans = cursor.fetchall()
    assert len(orphans) == 0, f"Orphaned records in {child_table}: {orphans[:5]}..."

def test_check_constraints(self, db_connection, table_name):
    """
    Verify CHECK constraints are enforced
    """
    cursor = db_connection.cursor()
    
    # Example: Price must be > 0
    try:
        cursor.execute(f"""
            INSERT INTO {table_name} (name, price) VALUES ('Test', -10)
        """)
        db_connection.commit()
        assert False, "CHECK constraint not enforced: Negative price allowed"
    except IntegrityError:
        db_connection.rollback()  # Expected behavior
    
    # Example: Status must be in allowed values
    try:
        cursor.execute(f"""
            INSERT INTO {table_name} (name, status) VALUES ('Test', 'invalid_status')
        """)
        db_connection.commit()
        assert False, "CHECK constraint not enforced: Invalid status allowed"
    except IntegrityError:
        db_connection.rollback()

def test_unique_constraints(self, db_connection, table_name, unique_columns):
    """
    Verify unique constraints (emails, usernames, etc.)
    """
    cursor = db_connection.cursor()
    
    # Find duplicates
    cursor.execute(f"""
        SELECT {', '.join(unique_columns)}, COUNT(*) as cnt
        FROM {table_name}
        GROUP BY {', '.join(unique_columns)}
        HAVING COUNT(*) > 1
    """)
    
    duplicates = cursor.fetchall()
    assert len(duplicates) == 0, f"Unique constraint violations in {table_name}: {duplicates}"
```
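Querying for existing duplicates shows that the data is clean today, but not that the constraint itself is in place. A complementary check is to attempt an invalid write and confirm that the database rejects it. The following is a minimal sketch using SQLite so it runs anywhere; the `products` table and the `insert_is_rejected` helper are assumptions made for illustration.

```python
import sqlite3


def insert_is_rejected(conn, sql, params=()):
    """Return True if the statement violates a constraint, False if it is accepted."""
    try:
        with conn:  # Commits on success, rolls back if an exception is raised
            conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,
        price REAL NOT NULL CHECK (price > 0)
    )
""")

insert_sql = "INSERT INTO products (name, price) VALUES (?, ?)"
valid_rejected = insert_is_rejected(conn, insert_sql, ("Widget", 9.99))
negative_rejected = insert_is_rejected(conn, insert_sql, ("Gadget", -10))
duplicate_rejected = insert_is_rejected(conn, insert_sql, ("Widget", 5.00))

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(valid_rejected, negative_rejected, duplicate_rejected, row_count)
```

Because rejected inserts are rolled back, only the valid row remains afterwards, so the test leaves no invalid data behind.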

---

## **31.3 Data Integrity Testing**

Data integrity testing ensures that business rules are maintained regardless of how data enters the system (UI, API, bulk import, or direct SQL).

### **31.3.1 Entity Integrity**

```python
class DataIntegrityTesting:
    """
    Comprehensive data integrity validation
    """
    
    def test_no_duplicate_records(self, db_connection, table_name, unique_key_fields):
        """
        Detect duplicate business keys (not just surrogate keys)
        """
        cursor = db_connection.cursor()
        
        # Composite unique key check
        fields_str = ', '.join(unique_key_fields)
        
        cursor.execute(f"""
            SELECT {fields_str}, COUNT(*) as duplicate_count
            FROM {table_name}
            GROUP BY {fields_str}
            HAVING COUNT(*) > 1
        """)
        
        duplicates = cursor.fetchall()
        
        if duplicates:
            # Log first few duplicates for analysis
            for dup in duplicates[:5]:
                print(f"Duplicate found: {dict(zip(unique_key_fields, dup))}")
            
            assert False, f"Found {len(duplicates)} duplicate records in {table_name}"
    
    def test_mandatory_fields(self, db_connection, table_name, required_fields):
        """
        Verify critical fields never contain NULL or empty values
        """
        cursor = db_connection.cursor()
        
        for field in required_fields:
            # Check NULL
            cursor.execute(f"""
                SELECT COUNT(*) FROM {table_name} WHERE {field} IS NULL
            """)
            null_count = cursor.fetchone()[0]
            
            # Check empty strings (for text fields)
            cursor.execute(f"""
                SELECT COUNT(*) FROM {table_name} WHERE {field} = ''
            """)
            empty_count = cursor.fetchone()[0]
            
            total_invalid = null_count + empty_count
            
            assert total_invalid == 0, \
                f"{table_name}.{field} has {null_count} NULLs and {empty_count} empty strings"
    
    def test_data_format_consistency(self, db_connection, table_name, field, pattern):
        """
        Verify data follows expected formats (regex validation)
        """
        import re
        
        cursor = db_connection.cursor()
        cursor.execute(f"SELECT {field} FROM {table_name} WHERE {field} IS NOT NULL")
        
        compiled = re.compile(pattern)
        invalid_values = []
        for (value,) in cursor.fetchall():
            # fullmatch requires the whole value to match, not just a prefix
            if not compiled.fullmatch(str(value)):
                invalid_values.append(value)
        
        assert len(invalid_values) == 0, \
            f"Invalid format in {table_name}.{field}: {invalid_values[:10]}"
```
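One subtlety in regex-based format checks is anchoring. Python's `re.match` only anchors at the start of the string, so a pattern without a trailing `$` accepts values with trailing garbage, whereas `re.fullmatch` requires the entire value to match. The sketch below illustrates the difference; `ZIP_PATTERN`, `find_invalid`, and the sample values are hypothetical.

```python
import re

ZIP_PATTERN = r"\d{5}"  # Hypothetical rule: exactly five digits


def find_invalid(values, pattern):
    """Return the values that do not match the pattern in full."""
    compiled = re.compile(pattern)
    return [v for v in values if compiled.fullmatch(str(v)) is None]


samples = ["12345", "12345-extra", "1234", "98765"]

invalid = find_invalid(samples, ZIP_PATTERN)
# For comparison: values that a prefix-only match would accept
prefix_accepted = [v for v in samples if re.match(ZIP_PATTERN, v)]

print(invalid)
print(prefix_accepted)
```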

### **31.3.2 Referential Integrity Across Systems**

When data spans multiple databases or services, no foreign key can enforce the relationship, so tests must compare the two sides directly:

```python
def test_cross_system_referential_integrity(self, source_db, target_db, mapping_config):
    """
    Verify data consistency between OLTP and Data Warehouse, or microservices
    """
    # Example: Verify all users in Auth Service exist in User Profile Service
    source_cursor = source_db.cursor()
    target_cursor = target_db.cursor()
    
    # Get all IDs from source
    source_cursor.execute(f"""
        SELECT {mapping_config['source_id_field']}, {mapping_config['checksum_fields']}
        FROM {mapping_config['source_table']}
        WHERE {mapping_config['filter_condition']}
    """)
    
    source_data = {row[0]: row[1:] for row in source_cursor.fetchall()}
    
    # Get corresponding IDs from target
    target_cursor.execute(f"""
        SELECT {mapping_config['target_id_field']}, {mapping_config['checksum_fields']}
        FROM {mapping_config['target_table']}
    """)
    
    target_data = {row[0]: row[1:] for row in target_cursor.fetchall()}
    
    # Find missing in target
    missing_in_target = set(source_data.keys()) - set(target_data.keys())
    assert len(missing_in_target) == 0, \
        f"Records missing in target: {list(missing_in_target)[:10]}"
    
    # Find data mismatches (if checksums provided)
    if mapping_config.get('verify_checksums'):
        mismatches = []
        for id_key in source_data:
            if source_data[id_key] != target_data.get(id_key):
                mismatches.append(id_key)
        
        assert len(mismatches) == 0, \
            f"Data mismatches for IDs: {mismatches[:10]}"
```
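Comparing full rows across systems can be expensive when rows are wide. One common approach is to reduce each row to a fingerprint and compare fingerprints instead. The following is a minimal sketch of that idea using in-memory rows; `row_checksum`, `compare_datasets`, and the sample data are assumptions for illustration, and a real test would fetch the rows from the two systems.

```python
import hashlib


def row_checksum(row):
    """Hash a row's values into a fixed-length, comparable fingerprint."""
    # repr() keeps 1 and "1" distinct; the separator keeps ("ab", "c") != ("a", "bc")
    payload = "\x1f".join(repr(value) for value in row)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_datasets(source_rows, target_rows):
    """Rows are (id, field1, field2, ...). Returns (missing_ids, mismatched_ids)."""
    source = {row[0]: row_checksum(row[1:]) for row in source_rows}
    target = {row[0]: row_checksum(row[1:]) for row in target_rows}

    missing = sorted(set(source) - set(target))
    mismatched = sorted(
        key for key in source if key in target and source[key] != target[key]
    )
    return missing, mismatched


source_rows = [
    (1, "ann@example.com", "active"),
    (2, "bob@example.com", "active"),
    (3, "cy@example.com", "closed"),
]
target_rows = [
    (1, "ann@example.com", "active"),
    (2, "bob@example.com", "suspended"),
]

missing, mismatched = compare_datasets(source_rows, target_rows)
print(missing, mismatched)
```

Reporting missing and mismatched IDs separately helps distinguish replication gaps from transformation bugs.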

---

## **31.4 Database Security Testing**

Security testing ensures that data is protected from unauthorized access and injection attacks.

### **31.4.1 Privilege and Access Control Testing**

```python
import time


class DatabaseSecurityTesting:
    """
    Security testing for database layer
    """
    
    def test_principle_of_least_privilege(self, db_connection, user_config):
        """
        Verify users only have necessary permissions
        """
        cursor = db_connection.cursor()
        
        # Get actual permissions
        cursor.execute("""
            SELECT table_name, privilege_type
            FROM information_schema.table_privileges
            WHERE grantee = %s
        """, (user_config['username'],))
        
        actual_permissions = {(row[0], row[1]) for row in cursor.fetchall()}
        expected_permissions = set(user_config['expected_permissions'])
        
        # Check for excessive permissions
        excess = actual_permissions - expected_permissions
        assert len(excess) == 0, f"User has excessive permissions: {excess}"
        
        # Check for missing permissions
        missing = expected_permissions - actual_permissions
        assert len(missing) == 0, f"User missing required permissions: {missing}"
    
    def test_sql_injection_prevention(self, db_connection, vulnerable_endpoint):
        """
        Test parameterized queries prevent injection
        """
        injection_attempts = [
            "' OR '1'='1",
            "'; DROP TABLE users; --",
            "1 UNION SELECT * FROM passwords",
            "' OR 1=1--",
            "1; DELETE FROM orders WHERE '1'='1"
        ]
        
        suspected = []
        
        for payload in injection_attempts:
            try:
                # Send the payload through the application's own query method;
                # a safe implementation binds it as a parameter
                result = vulnerable_endpoint.search(payload)
            except Exception:
                # Expected: the query failed safely and returned no data
                continue
            
            # Heuristic: credential or admin data in the result suggests the payload executed
            if result and ('password' in str(result).lower() or 'admin' in str(result).lower()):
                suspected.append(payload)
        
        # Assert outside the try block; inside it, `except Exception` would
        # swallow the AssertionError and the test could never fail
        assert not suspected, f"Potential SQL injection vulnerability with payloads: {suspected}"
    
    def test_sensitive_data_encryption(self, db_connection, table_name, sensitive_fields):
        """
        Verify sensitive data is encrypted at rest
        """
        cursor = db_connection.cursor()
        
        for field in sensitive_fields:
            cursor.execute(f"""
                SELECT {field} 
                FROM {table_name} 
                WHERE {field} IS NOT NULL 
                LIMIT 10
            """)
            
            for (value,) in cursor.fetchall():
                # Heuristic only - adjust based on the encryption method in use
                text = value if isinstance(value, str) else str(value)
                is_encrypted = (
                    isinstance(value, bytes) or               # Binary ciphertext
                    text.startswith('enc:') or                # Prefix marker
                    (len(text) > 100 and not text.isalnum())  # Long random-looking string
                )
                
                # Plaintext that looks like an email or a digit-only identifier is a red flag
                looks_like_pii = '@' in text or text.isdigit()
                
                # Do not echo the sensitive value itself in test output
                assert is_encrypted or not looks_like_pii, \
                    f"Sensitive field {field} appears to be stored unencrypted"
    
    def test_audit_logging(self, db_connection, audit_table):
        """
        Verify sensitive operations are logged
        """
        cursor = db_connection.cursor()
        
        # Perform sensitive operation
        test_user = f"audit_test_{time.time()}"
        cursor.execute("""
            INSERT INTO users (username, email) VALUES (%s, %s)
        """, (test_user, "audit@test.com"))
        db_connection.commit()
        
        # Check audit log
        time.sleep(0.5)  # Allow async logging
        
        cursor.execute(f"""
            SELECT * FROM {audit_table}
            WHERE table_name = 'users'
            AND action = 'INSERT'
            AND new_values LIKE %s
            ORDER BY timestamp DESC
            LIMIT 1
        """, (f"%{test_user}%",))
        
        audit_record = cursor.fetchone()
        assert audit_record is not None, "Audit log entry not created for sensitive operation"
        
        # Verify audit contains required fields
        # (access by column name assumes a dict-style cursor; with a tuple cursor, index by position)
        assert audit_record['user'] is not None, "Audit missing user who performed action"
        assert audit_record['timestamp'] is not None, "Audit missing timestamp"
```
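To make the injection risk concrete, the sketch below contrasts a query built by string concatenation with a parameterized one, using SQLite so it runs without a server. The `users` table and the `search_unsafe` and `search_safe` functions are hypothetical stand-ins for an application's data-access layer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, is_admin INTEGER)")
conn.executemany(
    "INSERT INTO users (username, is_admin) VALUES (?, ?)",
    [("alice", 0), ("root", 1)],
)
conn.commit()


def search_unsafe(conn, username):
    """Deliberately vulnerable: user input becomes part of the SQL text."""
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def search_safe(conn, username):
    """Parameterized: the driver passes the input as data, never as SQL."""
    return conn.execute(
        "SELECT username FROM users WHERE username = ?", (username,)
    ).fetchall()


payload = "' OR '1'='1"
leaked = search_unsafe(conn, payload)   # The WHERE clause becomes always true
blocked = search_safe(conn, payload)    # No user has this literal name

print(len(leaked), len(blocked))
```

A test suite can use this contrast directly: the same payload must return no rows through the safe path, while a legitimate username still returns exactly its own row.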

---

## **31.5 Backup and Recovery Testing**

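Data durability requires that backups are valid and that recovery procedures meet the recovery time objective (RTO, how long a restore may take) and the recovery point objective (RPO, how much recent data may be lost).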

### **31.5.1 Backup Validation**

```python
import os
import time


class BackupTesting:
    """
    Testing database backup and recovery procedures
    """
    
    def test_backup_integrity(self, backup_file_path):
        """
        Verify backup file is valid and restorable
        """
        import subprocess
        
        # Check backup file exists and has content
        assert os.path.exists(backup_file_path), "Backup file not found"
        assert os.path.getsize(backup_file_path) > 0, "Backup file is empty"
        
        # Verify backup format (e.g., SQL dump, binary)
        if backup_file_path.endswith('.sql'):
            # Check SQL syntax validity (basic)
            with open(backup_file_path, 'r') as f:
                header = f.read(1000)
                assert 'CREATE TABLE' in header or 'INSERT INTO' in header or 'COPY' in header
            
            # Try restore to temporary database
            result = subprocess.run([
                'mysql', '--execute', f'SOURCE {backup_file_path}',
                '--database', 'test_restore_db'
            ], capture_output=True, text=True)
            
            assert result.returncode == 0, f"Backup restore failed: {result.stderr}"
    
    def test_point_in_time_recovery(self, db_connection, backup_time, test_scenario):
        """
        Verify database can be restored to specific point in time
        """
        # This requires binary logging (MySQL) or WAL archiving (PostgreSQL)
        
        # 1. Note current state
        cursor = db_connection.cursor()
        cursor.execute("SELECT COUNT(*) FROM transactions")
        count_before = cursor.fetchone()[0]
        
        # 2. Simulate data loss scenario
        cursor.execute("DELETE FROM transactions WHERE created_at > %s", (backup_time,))
        db_connection.commit()
        
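        # 3. Restore from backup to temporary instance
        # 4. Apply binary logs (or WAL) up to just before the deletion
        # Both steps depend on infrastructure. test_scenario['restore'] is a
        # hypothetical callable that performs them and returns a connection
        # to the restored instance.
        restored_connection = test_scenario['restore']()
        
        # 5. Verify data consistency on the restored instance, not the damaged one
        restored_cursor = restored_connection.cursor()
        restored_cursor.execute("SELECT COUNT(*) FROM transactions")
        count_after = restored_cursor.fetchone()[0]
        
        assert count_after == count_before, "Point-in-time recovery did not restore expected data"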
    
    def test_replication_lag(self, primary_db, replica_db, max_lag_seconds=5):
        """
        Verify read replicas are within acceptable lag
        """
        # Write to primary
        primary_cursor = primary_db.cursor()
        test_marker = f"lag_test_{time.time()}"
        
        primary_cursor.execute("""
            INSERT INTO replication_test (marker, created_at) 
            VALUES (%s, NOW())
        """, (test_marker,))
        primary_db.commit()
        write_time = time.time()
        
        # Poll replica until found or timeout
        replica_cursor = replica_db.cursor()
        found = False
        timeout = time.time() + max_lag_seconds
        
        while time.time() < timeout:
            replica_cursor.execute("""
                SELECT 1 FROM replication_test WHERE marker = %s
            """, (test_marker,))
            if replica_cursor.fetchone():
                found = True
                lag = time.time() - write_time
                break
            time.sleep(0.1)
        
        assert found, f"Replication lag exceeded {max_lag_seconds} seconds"
        print(f"Replication lag: {lag:.2f} seconds")

    def test_failover_procedure(self, primary_host, replica_host, app_connection):
        """
        Test automatic failover to replica when primary fails
        """
        # Simulate primary failure (stop service or block network)
        # This is typically done in staging environment
        
        # Verify app switches to replica
        # Check for error rates during transition
        
        # Verify no split-brain (old primary must not accept writes if it comes back)
        pass  # Implementation depends on specific HA setup (PgPool, Patroni, etc.)
```
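The core of backup validation is a restore followed by a comparison, timed against the RTO. The sketch below shows that loop end to end using Python's built-in `sqlite3.Connection.backup`, so it runs without external tooling. The `transactions` table, the `RTO_SECONDS` value, and the `table_counts` helper are assumptions for illustration; a production test would restore a real dump into a scratch instance instead.

```python
import sqlite3
import time

RTO_SECONDS = 5.0  # Hypothetical recovery time objective


def table_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    return {
        name: conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
        for name in names
    }


# Build a source database with some data
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
source.executemany(
    "INSERT INTO transactions (amount) VALUES (?)",
    [(i * 1.5,) for i in range(1000)],
)
source.commit()

# "Restore" into a fresh database and time the operation
restored = sqlite3.connect(":memory:")
started = time.monotonic()
source.backup(restored)
elapsed = time.monotonic() - started

counts_match = table_counts(source) == table_counts(restored)
within_rto = elapsed <= RTO_SECONDS
print(counts_match, within_rto)
```

Row counts are a coarse check; comparing per-table checksums as well would catch corrupted values that leave the counts unchanged.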

---

## **31.6 Concurrency and Lock Testing**

Testing how the database handles simultaneous operations is critical for multi-user applications.

### **31.6.1 Deadlock Detection**

```python
import queue
import threading
import time

class ConcurrencyTesting:
    def test_deadlock_scenario(self, db_pool):
        """
        Test that concurrent updates don't cause deadlocks
        or that deadlocks are handled gracefully
        """
        errors = queue.Queue()
        results = queue.Queue()
        
        def transaction_a():
            """T1: Update account 1, then account 2"""
            conn = None  # Defined up front so `finally` is safe if get_connection() fails
            try:
                conn = db_pool.get_connection()
                cursor = conn.cursor()
                
                cursor.execute("START TRANSACTION")
                cursor.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
                time.sleep(0.1)  # Simulate processing
                cursor.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
                conn.commit()
                results.put(('A', 'success'))
            except Exception as e:
                if conn is not None:
                    conn.rollback()  # Discard the partial update and release locks
                if 'deadlock' in str(e).lower():
                    results.put(('A', 'deadlock'))
                else:
                    errors.put(('A', str(e)))
            finally:
                if conn is not None:
                    conn.close()
        
        def transaction_b():
            """T2: Update account 2, then account 1 (opposite order - classic deadlock)"""
            conn = None
            try:
                conn = db_pool.get_connection()
                cursor = conn.cursor()
                
                cursor.execute("START TRANSACTION")
                cursor.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 2")
                time.sleep(0.1)
                cursor.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 1")
                conn.commit()
                results.put(('B', 'success'))
            except Exception as e:
                if conn is not None:
                    conn.rollback()
                if 'deadlock' in str(e).lower():
                    results.put(('B', 'deadlock'))
                else:
                    errors.put(('B', str(e)))
            finally:
                if conn is not None:
                    conn.close()
        
        # Record the combined balance; transfers between the two accounts must preserve it
        conn = db_pool.get_connection()
        cursor = conn.cursor()
        cursor.execute("SELECT SUM(balance) FROM accounts WHERE id IN (1, 2)")
        total_before = cursor.fetchone()[0]
        conn.close()
        
        # Run both transactions simultaneously
        t1 = threading.Thread(target=transaction_a)
        t2 = threading.Thread(target=transaction_b)
        
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        
        # Collect results
        outcomes = []
        while not results.empty():
            outcomes.append(results.get())
        
        # Errors other than deadlocks are test failures
        unexpected = []
        while not errors.empty():
            unexpected.append(errors.get())
        assert not unexpected, f"Unexpected errors: {unexpected}"
        
        # At least one should succeed, deadlocks should be handled (retried or errored gracefully)
        success_count = sum(1 for _, status in outcomes if status == 'success')
        assert success_count >= 1, "Both transactions failed"
        
        # Verify data consistency (no partial updates), whether or not a deadlock occurred
        conn = db_pool.get_connection()
        cursor = conn.cursor()
        cursor.execute("SELECT SUM(balance) FROM accounts WHERE id IN (1, 2)")
        total_after = cursor.fetchone()[0]
        conn.close()
        assert total_after == total_before, "Combined balance changed: partial update detected"
```
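Handling a deadlock gracefully usually means retrying the victim transaction after a short pause. The following is a minimal sketch of such a retry wrapper with exponential backoff. `DeadlockError` is a hypothetical stand-in for the driver-specific exception, and `run_with_retry` and `flaky_transfer` are illustrative names; real code would catch the error class or error code that its driver raises.

```python
import time


class DeadlockError(Exception):
    """Stand-in for the driver-specific deadlock exception (hypothetical)."""


def run_with_retry(operation, max_attempts=3, base_delay=0.01):
    """Call operation(); on deadlock, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # Give up and surface the error to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))


attempts = []


def flaky_transfer():
    """Simulated transaction that deadlocks twice before succeeding."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("deadlock detected")
    return "committed"


outcome = run_with_retry(flaky_transfer)
print(outcome, len(attempts))
```

The operation passed to the wrapper should contain the whole transaction, since the database rolls back the victim's work when it breaks a deadlock.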

### **31.6.2 Isolation Level Testing**

```python
def test_read_committed_isolation(self, db_connection):
    """
    Verify READ COMMITTED prevents dirty reads but allows non-repeatable reads
    """
    cursor = db_connection.cursor()
    
    # Setup
    cursor.execute("UPDATE accounts SET balance = 1000 WHERE id = 1")
    db_connection.commit()
    
    results = {}
    
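    def reader():
        """Read balance twice inside one transaction, with a delay between reads"""
        conn = get_new_connection()  # Helper assumed to return a fresh connection
        cur = conn.cursor()
        
        # MySQL syntax. Without this the session default applies, which in
        # MySQL/InnoDB is REPEATABLE READ, where the second read still sees 1000.
        cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
        cur.execute("START TRANSACTION")
        
        # First read
        cur.execute("SELECT balance FROM accounts WHERE id = 1")
        first_read = cur.fetchone()[0]
        
        time.sleep(0.5)  # Allow writer to commit
        
        # Second read (may see different value in READ COMMITTED)
        cur.execute("SELECT balance FROM accounts WHERE id = 1")
        second_read = cur.fetchone()[0]
        
        conn.commit()
        results['reader'] = (first_read, second_read)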
    
    def writer():
        """Update and commit while reader is paused"""
        time.sleep(0.2)  # Let reader start first
        
        conn = get_new_connection()
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = 2000 WHERE id = 1")
        conn.commit()
        results['writer'] = 'committed'
    
    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=writer)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    
    first, second = results['reader']
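    # In READ COMMITTED the first read sees the committed 1000 and the second read sees
    # the newly committed 2000 (a non-repeatable read). Demonstrating the absence of dirty
    # reads would additionally require reading while the writer's transaction is still open.
    assert first == 1000, "First read should see the initial committed balance"
    assert second == 2000, "Should see committed data on second read"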

def test_serializable_isolation(self, db_connection):
    """
    Verify SERIALIZABLE prevents phantom reads
    """
    # This is the strictest isolation level
    # Test that range queries are protected from inserts by other transactions
    pass  # Implementation similar to above with range queries
```

---

## **31.7 Database Migration Testing**

Schema changes are high-risk operations requiring thorough validation.

### **31.7.1 Schema Change Validation**

```python
class MigrationTesting:
    def test_migration_idempotency(self, migration_script, db_connection):
        """
        Verify migration can run multiple times without error (idempotent)
        """
        # First run
        result1 = self.run_migration(migration_script, db_connection)
        assert result1['success'], f"First migration failed: {result1['error']}"
        
        # Second run (should succeed or gracefully skip)
        result2 = self.run_migration(migration_script, db_connection)
        
        # Should either succeed with no changes or report already applied
        assert result2['success'] or 'already exists' in str(result2.get('error', '')).lower()
    
    def test_rollback_procedure(self, migration, db_connection):
        """
        Verify downgrade/rollback script works correctly
        """
        # Apply migration
        migration.upgrade(db_connection)
        
        # Add test data in new schema
        cursor = db_connection.cursor()
        cursor.execute("INSERT INTO new_table (col) VALUES ('test')")
        db_connection.commit()
        
        # Rollback
        migration.downgrade(db_connection)
        
        # Verify old schema restored
        cursor.execute("""
            SELECT COUNT(*) FROM information_schema.tables 
            WHERE table_name = 'new_table'
        """)
        count = cursor.fetchone()[0]
        assert count == 0, "Rollback did not remove new table"
        
        # Verify application still works with old schema
        # (Integration test with app)
    
    def test_data_migration_accuracy(self, db_connection, migration_config):
        """
        Verify data transformations during migration are accurate
        """
        cursor = db_connection.cursor()
        
        # Setup pre-migration state
        cursor.execute("""
            INSERT INTO old_orders (total_cents, currency) 
            VALUES (10000, 'USD'), (5000, 'EUR')
        """)
        db_connection.commit()
        
        # Run migration that splits amount into dollars/cents
        self.run_migration(migration_config['script'], db_connection)
        
        # Verify data transformed correctly
        cursor.execute("SELECT total_dollars, total_cents, currency FROM new_orders")
        rows = cursor.fetchall()
        
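        assert len(rows) == 2
        for dollars, cents, currency in rows:
            assert dollars >= 0, "Negative dollars after migration"
            assert 0 <= cents < 100, "Cents out of range"
        
        # Range checks alone pass for wrong values, so also compare exact results
        # (assumes new_orders was empty): 10000 cents -> 100 dollars 0 cents,
        # 5000 cents -> 50 dollars 0 cents
        assert {tuple(row) for row in rows} == {(100, 0, 'USD'), (50, 0, 'EUR')}, \
            f"Unexpected migrated values: {rows}"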
    
    def test_zero_downtime_migration(self, db_connection, blue_green_config):
        """
        Test expand-contract pattern for zero-downtime deployments
        """
        # Phase 1: Expand (add new column alongside old)
        # Verify reads work from both old and new
        # Verify writes update both
        
        # Phase 2: Migrate data in background
        # Verify data consistency between old and new columns
        
        # Phase 3: Contract (remove old column)
        # Verify application uses new column exclusively
        pass
```
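A common way to make migrations idempotent is to record each applied version in a tracking table and skip versions that are already present. The sketch below shows that pattern with SQLite. The `schema_migrations` table name, the `MIGRATIONS` list, and the `apply_migrations` function are assumptions for illustration and do not represent any particular migration tool.

```python
import sqlite3

# Hypothetical migrations: (version, SQL statement)
MIGRATIONS = [
    (1, "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"),
    (2, "ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'"),
]


def apply_migrations(conn, migrations):
    """Apply pending migrations in version order; return the versions applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}

    newly_applied = []
    for version, statement in sorted(migrations):
        if version in applied:
            continue  # Already applied in an earlier run
        conn.execute(statement)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)
second_run = apply_migrations(conn, MIGRATIONS)

columns = [row[1] for row in conn.execute("PRAGMA table_info(orders)")]
print(first_run, second_run, columns)
```

An idempotency test then reduces to running the migrations twice and asserting that the second run applies nothing and leaves the schema unchanged.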

---

## **31.8 NoSQL Database Testing**

NoSQL databases require different testing approaches due to schema flexibility and consistency models.
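Because a schemaless store accepts any document shape, shape checks move from the database into the tests. The following is a minimal sketch of that idea in plain Python, with no database or third-party library involved; `REQUIRED_FIELDS`, `find_violations`, and the sample documents are hypothetical.

```python
# Hypothetical contract: field name -> expected Python type
REQUIRED_FIELDS = {"_id": str, "email": str, "age": int}


def find_violations(documents, required):
    """Return (document id, field, problem) for every contract violation."""
    violations = []
    for doc in documents:
        for field, expected_type in required.items():
            if field not in doc:
                violations.append((doc.get("_id"), field, "missing"))
            elif not isinstance(doc[field], expected_type):
                violations.append((doc.get("_id"), field, "wrong type"))
    return violations


docs = [
    {"_id": "a1", "email": "ann@example.com", "age": 31},
    {"_id": "b2", "email": "bob@example.com"},                  # age missing
    {"_id": "c3", "email": "cy@example.com", "age": "forty"},   # age is a string
]

violations = find_violations(docs, REQUIRED_FIELDS)
print(violations)
```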

### **31.8.1 Document Database Testing (MongoDB)**

```python
class MongoDBTesting:
    """
    Testing for document databases
    """
    
    def test_document_schema_validation(self, db, collection_name, schema):
        """
        Verify documents match expected schema (even if DB is schemaless)
        """
        from jsonschema import validate, ValidationError
        
        collection = db[collection_name]
        errors = []
        
        for doc in collection.find().limit(1000):  # Sample first 1000
            try:
                validate(instance=doc, schema=schema)
            except ValidationError as e:
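                errors.append({
                    'id': doc.get('_id'),
                    'error': e.message,       # Short message without the full schema dump
                    'field': list(e.path)     # Path to the offending field as a plain list
                })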
        
        assert len(errors) == 0, f"Schema violations found: {errors[:5]}"
    
    def test_replica_set_consistency(self, primary_db, secondary_db):
        """
        Verify replication lag to a secondary stays within an acceptable bound.
        Assumes secondary_db is configured to read from a secondary member.
        """
        import time
        
        max_lag_seconds = 5
        
        # Write to primary
        test_doc = {"test_id": f"consistency_{time.time()}", "value": "test"}
        write_time = time.monotonic()
        primary_db.test_collection.insert_one(test_doc)
        
        # Poll the secondary until the document appears or the deadline passes
        deadline = write_time + max_lag_seconds
        
        while time.monotonic() < deadline:
            doc = secondary_db.test_collection.find_one({"test_id": test_doc['test_id']})
            if doc:
                lag = time.monotonic() - write_time
                assert doc['value'] == test_doc['value']
                print(f"Eventual consistency lag: {lag:.3f}s")
                return
            time.sleep(0.05)  # Avoid hammering the secondary in a tight loop
        
        raise AssertionError(f"Secondary did not replicate within {max_lag_seconds} seconds")
    
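    def test_sharding_distribution(self, db, collection_name):
        """
        Verify data is evenly distributed across shards
        """
        import statistics
        
        # MongoDB specific: compare the per-shard data size reported by collStats
        stats = db.command("collStats", collection_name)
        
        # Fail loudly rather than pass silently on an unsharded collection
        assert 'shards' in stats, f"{collection_name} is not sharded"
        shard_sizes = {shard: info['size'] for shard, info in stats['shards'].items()}
        sizes = list(shard_sizes.values())
        
        # statistics.stdev raises StatisticsError with fewer than two values
        assert len(sizes) >= 2, "Need at least two shards to measure distribution"
        
        # Calculate coefficient of variation (standard deviation relative to the mean)
        mean_size = statistics.mean(sizes)
        stdev = statistics.stdev(sizes)
        cv = stdev / mean_size if mean_size > 0 else 0
        
        # CV should be low for even distribution
        assert cv < 0.3, f"Uneven shard distribution: {shard_sizes}"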
    
    def test_atomic_operations(self, db):
        """
        Test findAndModify atomicity
        """
        collection = db.counter_test
        
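        # Initialize counter (upsert resets any document left over from a previous run;
        # a plain insert_one would raise DuplicateKeyError on the second run)
        collection.replace_one({"_id": "counter"}, {"value": 0}, upsert=True)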
        
        # Concurrent increments
        import concurrent.futures
        
        def increment():
            for _ in range(100):
                collection.find_one_and_update(
                    {"_id": "counter"},
                    {"$inc": {"value": 1}}
                )
        
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
            futures = [executor.submit(increment) for _ in range(10)]
            for future in concurrent.futures.as_completed(futures):
                future.result()  # Re-raise any exception from a worker thread
        
        # Verify final count (should be exactly 1000)
        final = collection.find_one({"_id": "counter"})
        assert final['value'] == 1000, f"Atomicity failure: expected 1000, got {final['value']}"
```
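
To see how the 0.3 threshold in `test_sharding_distribution` behaves, the coefficient of variation can be computed on plain numbers without a cluster. This is an illustrative sketch only: the `shard_size_cv` helper and both sample dictionaries are made up for the example, and the right threshold depends on the shard key and workload.

```python
import statistics


def shard_size_cv(shard_sizes):
    """Return the coefficient of variation (stdev / mean) of per-shard sizes."""
    sizes = list(shard_sizes.values())
    if len(sizes) < 2:
        raise ValueError("need at least two shards to measure distribution")
    mean_size = statistics.mean(sizes)
    if mean_size == 0:
        return 0.0
    return statistics.stdev(sizes) / mean_size


# Hypothetical per-shard sizes in bytes
balanced = {"shard0": 1000, "shard1": 1100, "shard2": 900}
skewed = {"shard0": 2700, "shard1": 200, "shard2": 100}

balanced_cv = shard_size_cv(balanced)  # mean 1000, sample stdev 100
skewed_cv = shard_size_cv(skewed)      # one shard holds most of the data

print(f"balanced: {balanced_cv:.2f}, skewed: {skewed_cv:.2f}")
```

The balanced sample yields a CV of 0.1 and passes the check, while the skewed sample yields roughly 1.47 and fails it, which is the hot-spot case the test is meant to catch.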

### **31.8.2 Key-Value Store Testing (Redis)**

```python
class RedisTesting:
    def test_data_expiration(self, redis_client):
        """
        Verify TTL (Time To Live) is respected
        """
        import time
        
        # Set key with 2 second expiration
        redis_client.setex("test_key", 2, "test_value")
        
        # Should exist immediately
        assert redis_client.get("test_key") == b"test_value"
        
        # Should exist after 1 second
        time.sleep(1)
        assert redis_client.get("test_key") == b"test_value"
        
        # Should be gone 3 seconds after the write (the TTL was 2 seconds)
        time.sleep(2)
        assert redis_client.get("test_key") is None
    
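    def test_persistence(self, redis_client):
        """
        Test RDB persistence (snapshot written by BGSAVE)
        """
        import time
        
        # Write data
        redis_client.set("persistent_key", "persistent_value")
        
        # Trigger BGSAVE and wait for the background snapshot to finish
        redis_client.bgsave()
        deadline = time.monotonic() + 10
        while redis_client.info("persistence")["rdb_bgsave_in_progress"]:
            assert time.monotonic() < deadline, "BGSAVE did not finish in time"
            time.sleep(0.1)
        assert redis_client.info("persistence")["rdb_last_bgsave_status"] == "ok"
        
        # Reconnecting alone does not exercise persistence, because the key is
        # still in server memory. Restart the Redis server at this point
        # (for example through a container fixture) before reading the key back.
        new_client = get_redis_connection()  # Helper assumed to be defined in the test suite
        
        # Verify data survived
        value = new_client.get("persistent_key")
        assert value == b"persistent_value", "Persistence failed"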
    
    def test_concurrent_access(self, redis_client):
        """
        Test race conditions with INCR (atomic operation)
        """
        import threading
        
        redis_client.set("counter", 0)
        
        def increment():
            for _ in range(100):
                redis_client.incr("counter")
        
        threads = [threading.Thread(target=increment) for _ in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        
        final = int(redis_client.get("counter"))
        assert final == 1000, f"Race condition: expected 1000, got {final}"
```
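
Timing-dependent checks such as TTL expiry and replication lag tend to be flaky when they rely on fixed `time.sleep` calls. One common alternative is to poll a condition until it holds or a deadline passes. The sketch below is an assumption-laden illustration: `wait_until` is a hypothetical helper rather than part of redis-py or PyMongo, and a timestamp comparison stands in for a real expiring key so the example runs without a server.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns (result, elapsed_seconds), where result is the last value returned.
    """
    start = time.monotonic()
    while True:
        result = condition()
        elapsed = time.monotonic() - start
        if result or elapsed >= timeout:
            return result, elapsed
        time.sleep(interval)


# Stand-in for a key that expires 0.2 seconds from now (no Redis server needed)
expires_at = time.monotonic() + 0.2


def key_is_gone():
    return time.monotonic() >= expires_at


gone, elapsed = wait_until(key_is_gone, timeout=2.0)
print(f"key gone: {gone} after {elapsed:.2f}s")
```

In a real TTL test the condition would be something like `lambda: redis_client.get("test_key") is None`, and the returned elapsed time can be asserted against the expected expiry window instead of sleeping for a fixed worst-case duration.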

---

## **Chapter Summary**

### **Key Takeaways from Chapter 31:**

**Schema Testing:**
- **Automated validation:** Compare actual schema against specifications using information_schema
- **Constraint verification:** Ensure PK uniqueness, FK referential integrity, CHECK constraints, and UNIQUE constraints are enforced
- **Index validation:** Verify performance-critical indexes exist on foreign keys and frequently queried columns

**Data Integrity Testing:**
- **Entity integrity:** No duplicate business keys, mandatory fields populated, format consistency (regex validation)
- **Referential integrity:** No orphaned records, cascading deletes work correctly, cross-system consistency maintained
- **Domain integrity:** Values within valid ranges, valid enumeration values, proper data types

**Security Testing:**
- **Privilege validation:** Principle of least privilege enforced, no excessive grants
- **SQL injection prevention:** Parameterized queries required, input sanitization verified
- **Encryption at rest:** Sensitive fields (PII, credentials) stored encrypted, not plaintext
- **Audit logging:** Sensitive operations (DDL, DCL, DML on critical tables) logged with user, timestamp, and before/after values

**Backup and Recovery:**
- **Backup integrity:** Files valid, restorable, corruption-free (checksum validation)
- **RPO/RTO validation:** Point-in-time recovery tested, replication lag within the agreed SLA (for example, under 5 seconds)
- **Failover testing:** Automatic failover to replicas works, no split-brain scenarios

**Concurrency Testing:**
- **Deadlock detection:** Concurrent transactions don't cause indefinite blocking; deadlocks resolved via timeout or retry
- **Isolation levels:** READ COMMITTED prevents dirty reads; SERIALIZABLE prevents phantom reads (verify based on requirements)
- **Lock contention:** Long-running transactions don't block critical operations

**Migration Testing:**
- **Idempotency:** Migrations can be re-run safely without errors
- **Rollback capability:** Downgrade scripts tested and restore previous schema without data loss
- **Zero-downtime:** Expand-contract pattern verified for high-availability systems
- **Data transformation:** Migration scripts correctly transform data types and relationships

**NoSQL Testing:**
- **Schema validation:** Even schemaless DBs need document structure validation (JSON Schema)
- **Consistency models:** Eventual consistency lag measured and within acceptable bounds; strong consistency verified where required
- **Sharding:** Data evenly distributed; no hot spots
- **Atomic operations:** findAndModify and multi-document transactions tested for race conditions
- **TTL/Persistence:** Expiration works correctly; persistence survives restarts

---

## **📖 Next Chapter: Chapter 32 - Test Data Management**

With database testing techniques mastered, **Chapter 32** will focus on **Test Data Management strategies** to ensure you have the right data for comprehensive testing without compromising security or privacy.

In **Chapter 32**, you'll learn:

- **Test Data Strategy:** Deterministic vs. random data, production cloning vs. synthetic generation, subsetting strategies
- **Data Masking and Anonymization:** Techniques for GDPR/CCPA compliance (k-anonymity, l-diversity), tokenization, and format-preserving encryption
- **Synthetic Data Generation:** Using tools like Faker, Tonic, or Delphix to create realistic but fake datasets
- **Test Data Subsetting:** Extracting representative samples from production while maintaining referential integrity
- **Data Refresh Strategies:** Automated refresh pipelines, golden datasets, and self-service test data portals
- **PII Handling:** Detecting and protecting personally identifiable information across test environments
- **Test Data Pools:** Managing shared test data across teams, reservation systems, and data versioning

**Chapter 32** completes your database testing expertise by ensuring you can efficiently create, manage, and maintain test data that enables thorough validation while protecting sensitive information.

**Continue to Chapter 32 to master test data management and ensure your testing environments are both effective and compliant!**

