# Silver B - Delta Live Tables

This notebook implements the Silver B stage using Delta Live Tables (DLT).
Silver B performs business transformations on Silver A tables based on configurable business rules.

## Key Features:
- Ingests Silver A delta tables
- Applies transformation rules from business_rules.transform_definitions
- Creates materialized Silver B tables for each source/lob/domain combination
- Uses DLT decorators for automatic table creation and dependency management
- Supports priority-based transformation ordering

## Setup and Configuration

In [None]:
# Import required libraries
import dlt
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType
from typing import Dict, List, Any
import logging

# Import pipeline modules
from utils.config_loader import config_loader
import silver_pipeline_stages as stages

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load configuration
config = config_loader.load_config()
source_combinations = config_loader.get_source_combinations()

# Get active Spark session
spark = SparkSession.getActiveSession()

print("Silver B DLT Pipeline - Configuration loaded")
print(f"Processing {len(source_combinations)} source combinations")
print(f"Silver schema: {config.silver_schema}")
print(f"Transform rules table: {config.transform_rules_table}")

## Helper Functions for Silver B Processing

In [None]:
def create_silver_b_table(source: str, lob: str, domain: str) -> DataFrame:
    """
    Create Silver B table for a specific source/lob/domain combination
    
    Args:
        source: Source identifier
        lob: Line of business
        domain: Domain (pharmacy, medical, member)
        
    Returns:
        Processed Silver B DataFrame
    """
    logger.info(f"Creating Silver B table for {source}/{lob}/{domain}")
    
    # Read Silver A table using DLT
    silver_a_table_name = f"{source}_{lob}_{domain}_silver_a"
    
    try:
        # Use DLT read to get the Silver A table
        silver_a_df = dlt.read(silver_a_table_name)
        logger.info(f"Successfully read Silver A table: {silver_a_table_name}")
        
        # Get row count for logging (count() is a Spark action and triggers a job,
        # so consider removing it for large tables)
        row_count = silver_a_df.count()
        logger.info(f"Silver A table contains {row_count} rows")
        
    except Exception as e:
        logger.error(f"Failed to read Silver A table {silver_a_table_name}: {e}")
        # Return empty DataFrame if Silver A doesn't exist
        empty_schema = StructType([StructField("placeholder", StringType(), True)])
        return spark.createDataFrame([], empty_schema)
    
    # Get transformation rules from business rules table
    transform_rules = stages.get_transform_rules(config, source, lob, domain)
    logger.info(f"Found {len(transform_rules)} transformation rules for {source}/{lob}/{domain}")
    
    if transform_rules:
        for rule in transform_rules:
            logger.debug(f"Transform rule: {rule['field_name']} -> {rule['transform']} (Priority: {rule['priority']})")
    
    # Apply transformations
    try:
        silver_b_df = stages.apply_transformations(silver_a_df, transform_rules)
        logger.info(f"Silver B processing complete - {silver_b_df.count()} rows, columns: {silver_b_df.columns}")
        return silver_b_df
    except Exception as e:
        logger.error(f"Failed to apply transformations for {source}/{lob}/{domain}: {e}")
        # Return original DataFrame if transformations fail
        return silver_a_df

## Data Quality Expectations for Silver B

In [None]:
def apply_silver_b_quality_checks(df: DataFrame, source: str, lob: str, domain: str) -> DataFrame:
    """
    Apply data quality checks specific to Silver B stage
    
    Args:
        df: Silver B DataFrame
        source: Source identifier
        lob: Line of business
        domain: Domain
        
    Returns:
        DataFrame with quality checks applied
    """
    if not config.validation_enabled:
        logger.info("Data quality validation is disabled")
        return df
    
    logger.info(f"Applying Silver B quality checks for {source}/{lob}/{domain}")
    
    # Apply required field validation
    validated_df = stages.validate_required_fields(df, config, source, lob, domain)
    
    # Additional Silver B specific validations can be added here
    # For example: data type validations after transformations
    
    logger.info(f"Quality checks complete - {validated_df.count()} rows passed validation")
    return validated_df


def validate_transformation_success(df: DataFrame, source: str, lob: str, domain: str) -> bool:
    """
    Additional validation to ensure transformations were applied successfully
    
    Args:
        df: Transformed DataFrame
        source: Source identifier
        lob: Line of business
        domain: Domain
        
    Returns:
        Boolean indicating if transformations were successful
    """
    try:
        # Check if DataFrame has data
        if df.count() == 0:
            logger.warning(f"No data in Silver B table for {source}/{lob}/{domain}")
            return False
        
        # Check if DataFrame has expected columns (based on schema rules)
        schema_rules = stages.get_schema_rules(config, source, lob, domain)
        expected_fields = [rule['field_name'] for rule in schema_rules]
        
        missing_fields = [field for field in expected_fields if field not in df.columns]
        if missing_fields:
            logger.warning(f"Missing expected fields in Silver B for {source}/{lob}/{domain}: {missing_fields}")
            return False
        
        logger.info(f"Transformation validation passed for {source}/{lob}/{domain}")
        return True
        
    except Exception as e:
        logger.error(f"Transformation validation failed for {source}/{lob}/{domain}: {e}")
        return False

## DLT Table Definitions

The cell below dynamically generates one DLT table definition for each source/lob/domain combination.
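
The table definitions are built through a factory function (`make_silver_b_table`) rather than directly in the loop body. A minimal sketch of why, using plain Python and hypothetical combination values: functions defined in a loop look up the loop variables when they are called, not when they are defined, so without a factory every generated function would see the values from the last iteration.

```python
from typing import Callable, List, Tuple

# Hypothetical combinations, for illustration only
combos: List[Tuple[str, str, str]] = [
    ("src1", "commercial", "pharmacy"),
    ("src1", "commercial", "medical"),
]

# Naive version: each lambda reads source/lob/domain at CALL time,
# so all of them end up using the values from the final iteration.
naive: List[Callable[[], str]] = []
for source, lob, domain in combos:
    naive.append(lambda: f"{source}_{lob}_{domain}_silver_b")

# Factory version: the arguments are bound once per call to the factory,
# so each returned function keeps its own source/lob/domain.
def make_name_func(src: str, lb: str, dom: str) -> Callable[[], str]:
    def name_func() -> str:
        return f"{src}_{lb}_{dom}_silver_b"
    return name_func

bound = [make_name_func(src, lb, dom) for src, lb, dom in combos]

naive_names = [f() for f in naive]
bound_names = [f() for f in bound]
print(naive_names)
print(bound_names)
```

In the naive list both functions return the name of the last combination, while the factory version returns one distinct name per combination.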

In [None]:
# Generate DLT table definitions for each source/lob/domain combination
for source, lob, domain in source_combinations:
    
    # Create a closure to capture the current values of source, lob, domain
    def make_silver_b_table(src, lb, dom):
        
        @dlt.table(
            name=f"{src}_{lb}_{dom}_silver_b",
            comment=f"Silver B stage for {src}/{lb}/{dom} - Business transformations based on rules",
            table_properties={
                "quality": "silver",
                "layer": "silver_b",
                "source": src,
                "lob": lb,
                "domain": dom
            }
        )
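        # expect_all_or_drop takes a dict of {expectation name: SQL constraint}.
        # The "true" constraint is a pass-through placeholder; replace it with a real rule.
        @dlt.expect_all_or_drop({"valid_silver_b_data": "true"})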
        def silver_b_table():
            """
            Create Silver B table with transformations from Silver A
            """
            # Create Silver B DataFrame
            silver_b_df = create_silver_b_table(src, lb, dom)
            
            # Validate transformation success
            if not validate_transformation_success(silver_b_df, src, lb, dom):
                logger.warning(f"Transformation validation failed for {src}/{lb}/{dom}")
            
            # Apply quality checks
            validated_df = apply_silver_b_quality_checks(silver_b_df, src, lb, dom)
            
            return validated_df
        
        return silver_b_table
    
    # Create the table function and add it to the global namespace
    table_func = make_silver_b_table(source, lob, domain)
    globals()[f"{source}_{lob}_{domain}_silver_b"] = table_func
    
    print(f"Created DLT table definition: {source}_{lob}_{domain}_silver_b")

print(f"\nTotal Silver B tables defined: {len(source_combinations)}")

## Advanced DLT Expectations for Data Quality

In [None]:
# Additional DLT expectations that can be applied to Silver B tables
# These are optional and can be customized based on business requirements

def create_advanced_expectations() -> Dict[str, str]:
    """
    Build advanced DLT expectations for Silver B tables.

    Returns a dict of {expectation name: SQL constraint} that can be passed to
    @dlt.expect_all(...) on tables containing the referenced columns.
    """
    expectations = {
        # Example: Expect certain fields to not be null after transformation
        "patient_id_not_null": "patient_id IS NOT NULL",
        # Example: Expect transformed dates (stored as strings) to match YYYY-MM-DD
        "valid_date_format": "service_date RLIKE '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'",
        # Example: Expect numeric fields to be within reasonable ranges
        "valid_amount_range": "amount >= 0 AND amount <= 999999.99",
    }

    logger.info("Advanced DLT expectations defined")
    return expectations

# To enable, decorate a table definition with
# @dlt.expect_all(create_advanced_expectations())

print("Advanced expectations available but not enabled by default")
print("Apply @dlt.expect_all(create_advanced_expectations()) to a table to enable")

## Pipeline Monitoring and Logging

In [None]:
# Log pipeline configuration for monitoring
logger.info("="*60)
logger.info("SILVER B DLT PIPELINE CONFIGURATION")
logger.info("="*60)
logger.info("Pipeline stage: Silver B (Business Transformations)")
logger.info(f"Silver schema: {config.silver_schema}")
logger.info(f"Transform rules table: {config.transform_rules_table}")
logger.info(f"Validation enabled: {config.validation_enabled}")
logger.info(f"Source combinations: {len(source_combinations)}")

for i, (source, lob, domain) in enumerate(source_combinations, 1):
    logger.info(f"  {i}. {source}/{lob}/{domain}:")
    logger.info(f"     Input:  {source}_{lob}_{domain}_silver_a")
    logger.info(f"     Output: {source}_{lob}_{domain}_silver_b")

logger.info("="*60)
logger.info("SILVER B DLT PIPELINE READY")
logger.info("="*60)

## Usage Instructions

To use this notebook in Databricks DLT:

1. **Create a new DLT Pipeline** in the Databricks workspace
2. **Set the source** to this notebook (`silver_b_dlt.ipynb`)
3. **Configure pipeline settings**:
   - Target schema: `silver` (or your configured silver schema)
   - Pipeline mode: `Triggered` for batch processing
   - Cluster configuration: Based on your data volume
4. **Ensure dependencies**:
   - Silver A tables must exist (either from Silver A DLT pipeline or other source)
   - Business rules tables must exist (`business_rules.transform_definitions`)
   - Utils modules must be available in the workspace
5. **Pipeline Dependencies**:
   - This pipeline should run AFTER the Silver A pipeline
   - Set up proper scheduling or triggering to ensure Silver A tables are available
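
The settings from step 3 can also be written as a pipeline settings document instead of being entered in the UI. The sketch below builds one as a Python dict and prints it as JSON. The field names (`name`, `target`, `continuous`, `libraries`) follow the Databricks pipeline settings format as commonly documented, but should be checked against your workspace version; the pipeline name and notebook path are hypothetical placeholders.

```python
import json

# Hypothetical values: adjust the name, target schema, and notebook path
pipeline_settings = {
    "name": "silver_b_dlt",
    "target": "silver",        # target schema for the Silver B tables
    "continuous": False,       # False corresponds to Triggered mode
    "libraries": [
        {"notebook": {"path": "/Workspace/path/to/silver_b_dlt"}}
    ],
}

settings_json = json.dumps(pipeline_settings, indent=2)
print(settings_json)
```

Cluster configuration is left out here because it depends on data volume, as noted in step 3.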

**Input Tables**: This pipeline reads from:
- `{silver_schema}.{source}_{lob}_{domain}_silver_a`

**Output Tables**: This pipeline creates:
- `{silver_schema}.{source}_{lob}_{domain}_silver_b`

**Dependencies**: This pipeline depends on:
- Silver A tables: `{silver_schema}.{source}_{lob}_{domain}_silver_a`
- Business rules: `business_rules.transform_definitions`
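
The naming convention above can be captured in a small helper. This is an illustrative sketch only: `silver_table_names` is a hypothetical function that is not part of the pipeline modules, and the example values are made up.

```python
from typing import Tuple

def silver_table_names(silver_schema: str, source: str, lob: str, domain: str) -> Tuple[str, str]:
    """Return the (input, output) table names for one source/lob/domain combination."""
    base = f"{silver_schema}.{source}_{lob}_{domain}"
    return f"{base}_silver_a", f"{base}_silver_b"

# Hypothetical combination, for illustration only
input_table, output_table = silver_table_names("silver", "src1", "commercial", "pharmacy")
print(input_table)
print(output_table)
```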

**Transformation Logic**:
- Transformations are applied in priority order (ascending)
- Multiple transformations can be applied to the same field
- If applying transformations raises an error, the error is logged and the untransformed Silver A data is returned, so the pipeline keeps running
- Supports SQL expressions in transformation rules
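
The ordering behaviour described above can be sketched in plain Python. This is not the implementation of `stages.apply_transformations`, whose internals are not shown in this notebook. It assumes rules are dicts with the `field_name`, `transform`, and `priority` keys seen in the logging code, and it substitutes Python callables for SQL expressions so the example runs without Spark. Skipping a single failed rule is also an assumption made for the sketch.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger("silver_b_sketch")

Row = Dict[str, Any]
Rule = Dict[str, Any]  # assumed keys: field_name, transform, priority

def apply_rules(row: Row, rules: List[Rule]) -> Row:
    """Apply rules in ascending priority; log and skip any rule that fails."""
    result = dict(row)  # work on a copy so the input row is unchanged
    for rule in sorted(rules, key=lambda r: r["priority"]):
        field = rule["field_name"]
        try:
            result[field] = rule["transform"](result[field])
        except Exception as exc:
            logger.error("Rule on %s (priority %s) failed: %s",
                         field, rule["priority"], exc)
    return result

# Two rules target the same field; the lower priority number runs first
rules = [
    {"field_name": "member_name", "transform": str.upper, "priority": 2},
    {"field_name": "member_name", "transform": str.strip, "priority": 1},
    {"field_name": "amount", "transform": float, "priority": 1},
]

out = apply_rules({"member_name": "  jane doe ", "amount": "n/a"}, rules)
print(out)
```

Here `member_name` is stripped first and then upper-cased, while the `amount` rule fails on the non-numeric value, is logged, and leaves the field unchanged.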