# Silver Layer: ERP Customers Transformation
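This notebook cleans raw data from the `erp_cust_az12` bronze table and writes the result to the `erp_customers` silver table (`workspace.silver.erp_customers`).
- **Engine**: Delegates to the shared `silver_engine` helper (`run_silver_pipeline`), which handles universal trimming and metadata.
- **Cleaning**: Normalizes gender (`M`/`Male` and `F`/`Female`, case-insensitive, become `Male`/`Female`; anything else becomes `n/a`), strips the `NAS` prefix from customer IDs and upper-cases them, and replaces future birth dates with NULL.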
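
The cleaning rules above can be sketched in plain Python for a single record, which may help when reasoning about edge cases without a Spark session. This is only an illustration: the function name `clean_record`, its `today` parameter, and the dictionary return shape are hypothetical and not part of the notebook or of `silver_engine`. It also assumes that whitespace has already been trimmed.

```python
from datetime import date
from typing import Optional


def clean_record(cid: Optional[str], gen: Optional[str],
                 bdate: Optional[date], today: Optional[date] = None) -> dict:
    """Hypothetical single-row version of the three cleaning rules."""
    today = today or date.today()

    # Rule 1: strip a leading "NAS" (case-sensitive), then upper-case the ID.
    customer_number = None
    if cid is not None:
        stripped = cid[3:] if cid.startswith("NAS") else cid
        customer_number = stripped.upper()

    # Rule 2: map known gender codes case-insensitively; everything else is "n/a".
    code = gen.upper() if gen is not None else None
    if code in ("F", "FEMALE"):
        gender = "Female"
    elif code in ("M", "MALE"):
        gender = "Male"
    else:
        gender = "n/a"

    # Rule 3: a birth date in the future is treated as invalid and set to None.
    birth_date = None if (bdate is not None and bdate > today) else bdate

    return {"customer_number": customer_number,
            "birth_date": birth_date,
            "gender": gender}
```

Passing a fixed `today` keeps the future-date rule deterministic, so the same inputs always produce the same result regardless of when the check runs.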

In [0]:
%run ../../helpers/silver_engine.ipynb

In [0]:
%python
import pyspark.sql.functions as F

def logic(df):
    """Table-specific cleaning for ERP customers; passed to run_silver_pipeline."""
    return (
        df
        # 1. ID Cleaning: Strip a leading "NAS" prefix (case-sensitive) and upper-case for cross-system matching.
        #    regexp_replace keeps the full remainder of the ID, whatever its length.
        .withColumn("customer_number",
            F.upper(F.regexp_replace(F.col("cid"), "^NAS", ""))
        )
        # 2. Gender Normalization: Map F/FEMALE and M/MALE (any case) to Female/Male; everything else, including NULL, becomes "n/a"
        .withColumn("gender", 
            F.when(F.upper(F.col("gen")).isin("F", "FEMALE"), "Female")
             .when(F.upper(F.col("gen")).isin("M", "MALE"), "Male")
             .otherwise("n/a")
        )
        # 3. Birth Date Validation: Replacing future dates with NULL to ensure data integrity
        .withColumn("birth_date", 
            F.when(F.col("bdate") > F.current_date(), F.lit(None))
             .otherwise(F.col("bdate"))
        )
        # 4. Final Column Selection
        .select("customer_number", "birth_date", "gender")
    )

# Run the shared silver pipeline: bronze table "erp_cust_az12" -> `logic` -> silver table "erp_customers"
run_silver_pipeline("erp_cust_az12", "erp_customers", logic)

In [0]:
%sql
-- Preview the standardized ERP customer data
SELECT * FROM workspace.silver.erp_customers;