# Silver A - Delta Live Tables

This notebook implements the Silver A stage using Delta Live Tables (DLT).
Silver A reads the Bronze tables and selects columns from them according to configurable business rules.

## Key Features
- Ingests Bronze delta tables
- Applies schema selection rules from business_rules.schema_definitions
- Creates materialized Silver A tables for each source/lob/domain combination
- Uses DLT decorators for automatic table creation and dependency management

## Setup and Configuration

In [None]:
# Import required libraries
import dlt
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType
from typing import Dict, List, Any
import logging

# Import pipeline modules
from utils.config_loader import config_loader
import silver_pipeline_stages as stages

# Initialize logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Load configuration
config = config_loader.load_config()
source_combinations = config_loader.get_source_combinations()

# Get active Spark session
spark = SparkSession.getActiveSession()

print("Silver A DLT Pipeline - Configuration loaded")
print(f"Processing {len(source_combinations)} source combinations")
print(f"Bronze schema: {config.bronze_schema}")
print(f"Silver schema: {config.silver_schema}")
print(f"Schema rules table: {config.schema_rules_table}")
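
For reference, the rest of this notebook relies on only a handful of configuration attributes. The dataclass below is a hypothetical stand-in that summarizes them for local experimentation. It is not the object that `config_loader.load_config()` actually returns, and all default values are placeholders (including the assumption that `schema_rules_table` points at `business_rules.schema_definitions`).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SilverAConfig:
    """Hypothetical summary of the config attributes this notebook reads."""
    bronze_schema: str = "bronze"            # placeholder value
    silver_schema: str = "silver"            # placeholder value
    schema_rules_table: str = "business_rules.schema_definitions"  # assumed value
    validation_enabled: bool = True          # toggles the quality checks


# Each combination is a (source, lob, domain) tuple, as unpacked later in the notebook
example_combinations: List[Tuple[str, str, str]] = [("src_a", "lob_x", "pharmacy")]

example_config = SilverAConfig()
print(f"Schema rules table: {example_config.schema_rules_table}")
```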

## Pipeline Parameters

Set pipeline parameters that can be configured when running the DLT pipeline.

In [None]:
# Pipeline parameters - these can be overridden when running the DLT pipeline
# Default year_month for Bronze table lookup
year_month = spark.conf.get("pipeline.year_month", "202407")

print(f"Using year_month parameter: {year_month}")
print("To override, set the pipeline.year_month configuration when running the DLT pipeline")
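
Because the value arrives as a free-form string from the pipeline configuration, it may be worth validating it before it is used to build a Bronze table name. The sketch below shows one way to do that. `validate_year_month` is a hypothetical helper, not part of the pipeline modules.

```python
import re

# Four digits for the year, then a month from 01 to 12
_YEAR_MONTH_PATTERN = re.compile(r"[0-9]{4}(0[1-9]|1[0-2])")


def validate_year_month(value: str) -> str:
    """Return value unchanged if it is a valid YYYYMM string, else raise ValueError."""
    if not _YEAR_MONTH_PATTERN.fullmatch(value):
        raise ValueError(f"year_month must be in YYYYMM format, got {value!r}")
    return value


print(validate_year_month("202407"))
```

Calling it right after `spark.conf.get(...)` would make a mistyped parameter fail early instead of surfacing later as a missing Bronze table.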

## Helper Functions for Silver A Processing

In [None]:
def create_silver_a_table(source: str, lob: str, domain: str, year_month: str) -> DataFrame:
    """
    Create Silver A table for a specific source/lob/domain combination
    
    Args:
        source: Source identifier
        lob: Line of business
        domain: Domain (pharmacy, medical, member)
        year_month: Year-month string (YYYYMM)
        
    Returns:
        Processed Silver A DataFrame
    """
    logger.info(f"Creating Silver A table for {source}/{lob}/{domain}")
    
    # Construct Bronze table name
    bronze_table = config_loader.get_bronze_table_name(source, lob, domain, year_month)
    logger.info(f"Reading from Bronze table: {bronze_table}")
    
    # Check if Bronze table exists
    if not stages.validate_bronze_table_exists(bronze_table):
        logger.warning(f"Bronze table {bronze_table} does not exist, returning empty DataFrame")
        # Return empty DataFrame with placeholder schema
        empty_schema = StructType([StructField("placeholder", StringType(), True)])
        return spark.createDataFrame([], empty_schema)
    
    # Read Bronze table
    try:
        bronze_df = spark.table(bronze_table)
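        # Log the columns rather than bronze_df.count(), which would trigger a full scan just for logging
        logger.info(f"Successfully read Bronze table {bronze_table} with columns: {bronze_df.columns}")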
    except Exception as e:
        logger.error(f"Failed to read Bronze table {bronze_table}: {e}")
        empty_schema = StructType([StructField("placeholder", StringType(), True)])
        return spark.createDataFrame([], empty_schema)
    
    # Get schema rules from business rules table
    schema_rules = stages.get_schema_rules(config, source, lob, domain)
    logger.info(f"Found {len(schema_rules)} schema rules for {source}/{lob}/{domain}")
    
    if schema_rules:
        for rule in schema_rules:
            logger.debug(f"Schema rule: {rule['field_name']} ({rule['data_type']}) [Required: {rule['is_required']}]")
    
    # Apply schema selection
    try:
        silver_a_df = stages.apply_schema_selection(bronze_df, schema_rules)
        # No count() here: an action inside a DLT table definition forces an extra full computation
        logger.info(f"Silver A processing complete - columns: {silver_a_df.columns}")
        return silver_a_df
    except Exception as e:
        logger.error(f"Failed to apply schema selection for {source}/{lob}/{domain}: {e}")
        # Return original DataFrame if schema selection fails
        return bronze_df
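
`stages.apply_schema_selection` is defined elsewhere, so its exact behavior is not shown in this notebook. As a rough illustration of what rule-driven schema selection involves, the sketch below works on plain column names rather than a DataFrame. The rule keys `field_name`, `data_type` and `is_required` match those logged above. `plan_schema_selection`, its return shape, and the sample column names are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def plan_schema_selection(
    bronze_columns: List[str], schema_rules: List[Dict[str, object]]
) -> Tuple[List[str], List[str]]:
    """Split rule fields into columns to select and required columns that are missing."""
    available = set(bronze_columns)
    selected: List[str] = []
    missing_required: List[str] = []
    for rule in schema_rules:
        name = str(rule["field_name"])
        if name in available:
            selected.append(name)  # keep rule order so the output schema is stable
        elif rule.get("is_required"):
            missing_required.append(name)  # absent optional fields are simply skipped
    return selected, missing_required


# Illustrative rules and Bronze columns (not taken from the real rules table)
rules = [
    {"field_name": "member_id", "data_type": "string", "is_required": True},
    {"field_name": "claim_id", "data_type": "string", "is_required": True},
    {"field_name": "ndc_code", "data_type": "string", "is_required": False},
]
selected, missing = plan_schema_selection(["member_id", "ndc_code", "load_ts"], rules)
print(selected)  # ['member_id', 'ndc_code']
print(missing)   # ['claim_id']
```

With a real DataFrame, the selected list could be passed to `bronze_df.select(*selected)`, and a non-empty `missing` list could be logged or treated as an error.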

## Data Quality Expectations for Silver A

In [None]:
def apply_silver_a_quality_checks(df: DataFrame, source: str, lob: str, domain: str) -> DataFrame:
    """
    Apply data quality checks specific to Silver A stage
    
    Args:
        df: Silver A DataFrame
        source: Source identifier
        lob: Line of business
        domain: Domain
        
    Returns:
        DataFrame with quality checks applied
    """
    if not config.validation_enabled:
        logger.info("Data quality validation is disabled")
        return df
    
    logger.info(f"Applying Silver A quality checks for {source}/{lob}/{domain}")
    
    # Apply required field validation
    validated_df = stages.validate_required_fields(df, config, source, lob, domain)
    
    # Avoid validated_df.count() here: it would run the whole query just to produce a log line
    logger.info(f"Required-field validation applied for {source}/{lob}/{domain}")
    return validated_df

## DLT Table Definitions

DLT tables are generated dynamically, one for each source/lob/domain combination.

In [None]:
# Generate DLT table definitions for each source/lob/domain combination
for source, lob, domain in source_combinations:
    
    # Create a closure to capture the current values of source, lob, domain
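    def make_silver_a_table(src, lb, dom):

        # expect_all_or_drop takes a dict of {expectation_name: SQL condition}, not a bare name.
        # Assumption: each required field in the schema rules maps to a NOT NULL constraint.
        required_rules = [r for r in stages.get_schema_rules(config, src, lb, dom) if r["is_required"]]
        expectations = {
            f"{r['field_name']}_not_null": f"`{r['field_name']}` IS NOT NULL"
            for r in required_rules
        }

        @dlt.table(
            name=f"{src}_{lb}_{dom}_silver_a",
            comment=f"Silver A stage for {src}/{lb}/{dom} - Schema selection based on business rules",
            table_properties={
                "quality": "silver",
                "layer": "silver_a",
                "source": src,
                "lob": lb,
                "domain": dom
            }
        )
        @dlt.expect_all_or_drop(expectations)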
        def silver_a_table():
            """
            Create Silver A table with schema selection from Bronze
            """
            # Create Silver A DataFrame
            silver_a_df = create_silver_a_table(src, lb, dom, year_month)
            
            # Apply quality checks
            validated_df = apply_silver_a_quality_checks(silver_a_df, src, lb, dom)
            
            return validated_df
        
        return silver_a_table
    
    # Create the table function and add it to the global namespace
    table_func = make_silver_a_table(source, lob, domain)
    globals()[f"{source}_{lob}_{domain}_silver_a"] = table_func
    
    print(f"Created DLT table definition: {source}_{lob}_{domain}_silver_a")

print(f"\nTotal Silver A tables defined: {len(source_combinations)}")
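
The factory function matters because Python closures look up loop variables when they are called, not when they are defined. The following standalone sketch (no DLT involved, all names and values illustrative) contrasts a function defined directly in the loop with one produced by a factory:

```python
def make_name(src, lb, dom):
    """Factory: each call binds its own src/lb/dom."""
    def table_name():
        return f"{src}_{lb}_{dom}_silver_a"
    return table_name


def demo(combinations):
    late_bound = []
    for source, lob, domain in combinations:
        # No factory: the function reads source/lob/domain when it is called,
        # by which time the loop has already finished
        def table_name():
            return f"{source}_{lob}_{domain}_silver_a"
        late_bound.append(table_name)
    factory_bound = [make_name(*combo) for combo in combinations]
    return [f() for f in late_bound], [f() for f in factory_bound]


late, bound = demo([("src_a", "lob_x", "pharmacy"), ("src_b", "lob_y", "medical")])
print(late)   # ['src_b_lob_y_medical_silver_a', 'src_b_lob_y_medical_silver_a']
print(bound)  # ['src_a_lob_x_pharmacy_silver_a', 'src_b_lob_y_medical_silver_a']
```

Without the factory, every generated table function would process the last combination in the list.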

## Pipeline Monitoring and Logging

In [None]:
# Log pipeline configuration for monitoring
logger.info("="*60)
logger.info("SILVER A DLT PIPELINE CONFIGURATION")
logger.info("="*60)
logger.info("Pipeline stage: Silver A (Schema Selection)")
logger.info(f"Year-month parameter: {year_month}")
logger.info(f"Bronze schema: {config.bronze_schema}")
logger.info(f"Silver schema: {config.silver_schema}")
logger.info(f"Schema rules table: {config.schema_rules_table}")
logger.info(f"Validation enabled: {config.validation_enabled}")
logger.info(f"Source combinations: {len(source_combinations)}")

for i, (source, lob, domain) in enumerate(source_combinations, 1):
    logger.info(f"  {i}. {source}/{lob}/{domain} -> {source}_{lob}_{domain}_silver_a")

logger.info("="*60)
logger.info("SILVER A DLT PIPELINE READY")
logger.info("="*60)

## Usage Instructions

To use this notebook in Databricks DLT:

1. **Create a new DLT Pipeline** in the Databricks workspace
2. **Set the source** to this notebook (`silver_a_dlt.ipynb`)
3. **Configure pipeline settings**:
   - Target schema: `silver` (or your configured silver schema)
   - Pipeline mode: `Triggered` for batch processing
   - Cluster configuration: Based on your data volume
4. **Set pipeline parameters** (optional):
   - `pipeline.year_month`: Override the default year-month for Bronze table lookup
5. **Ensure dependencies**:
   - Business rules tables must exist (`business_rules.schema_definitions`)
   - Bronze tables must exist with the expected naming pattern
   - Utils modules must be available in the workspace
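
Steps 2 to 4 correspond to fields in the pipeline's JSON settings. The sketch below builds such a settings document as a Python dictionary so it can be printed or saved. The pipeline name and notebook path are placeholders, and the fields shown are an assumption about a typical setup; check the settings view of your own pipeline for the exact fields your workspace uses.

```python
import json

pipeline_settings = {
    "name": "silver_a_dlt",  # placeholder pipeline name
    "libraries": [{"notebook": {"path": "/path/to/silver_a_dlt"}}],  # placeholder path
    "target": "silver",  # target schema from step 3
    "continuous": False,  # False corresponds to Triggered mode
    "configuration": {"pipeline.year_month": "202407"},  # read via spark.conf.get
}

print(json.dumps(pipeline_settings, indent=2))
```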

**Output Tables**: This pipeline will create Silver A tables with names like:
- `{silver_schema}.{source}_{lob}_{domain}_silver_a`

**Dependencies**: This pipeline depends on:
- Bronze tables: `{bronze_schema}.{source}_{lob}_{domain}_{year_month}`
- Business rules: `business_rules.schema_definitions`
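
The naming patterns above can be captured in two small helpers. This is only a sketch: `config_loader.get_bronze_table_name` is the function the notebook actually uses, and the helper names and sample values below are illustrative.

```python
def bronze_table_name(bronze_schema: str, source: str, lob: str, domain: str, year_month: str) -> str:
    """Build a Bronze table name following {bronze_schema}.{source}_{lob}_{domain}_{year_month}."""
    return f"{bronze_schema}.{source}_{lob}_{domain}_{year_month}"


def silver_a_table_name(silver_schema: str, source: str, lob: str, domain: str) -> str:
    """Build a Silver A table name following {silver_schema}.{source}_{lob}_{domain}_silver_a."""
    return f"{silver_schema}.{source}_{lob}_{domain}_silver_a"


print(bronze_table_name("bronze", "src_a", "lob_x", "pharmacy", "202407"))
# bronze.src_a_lob_x_pharmacy_202407
print(silver_a_table_name("silver", "src_a", "lob_x", "pharmacy"))
# silver.src_a_lob_x_pharmacy_silver_a
```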