# Silver Customers Transformation

## Overview
This module implements Silver-layer transformation for customer master data.
The Silver layer focuses on data cleansing, standardization, and type enforcement, converting raw Bronze records into analytics-ready, trusted datasets.

Unlike Bronze ingestion, Silver processing runs in batch mode and applies business rules while preserving lineage from the Bronze layer.

## Source Table

- Catalog: `migration_project_db_ws`
- Schema: `bronze`
- Table: `customers`
```sql
migration_project_db_ws.bronze.customers
```

## Target Table
- Catalog: `migration_project_db_ws`
- Schema: `silver`
- Table: `customers`
- Format: Delta Lake
- Processing Type: Batch Transformation

```sql
migration_project_db_ws.silver.customers
```

## Transformation Objectives
The Silver Customers layer ensures:
- Proper data types (dates, identifiers)
- Standardized text fields
- Removal of invalid or incomplete records
- Business-ready customer dimension structure
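These objectives imply a target row shape. The sketch below writes that shape as a Python dataclass. It assumes the six columns renamed in Step 2 and the types enforced in Step 3: an `int` identifier and a `DATE` for `created_date`. The class name `SilverCustomer` is hypothetical. Treating the name, email, and country fields as strings is also an assumption.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SilverCustomer:
    """Assumed shape of one row in migration_project_db_ws.silver.customers (hypothetical)."""
    customer_id: int               # cast to int in Step 3; rows with a null id are dropped
    first_name: Optional[str]
    last_name: Optional[str]
    email: str                     # lowercased in Step 3; rows with a null email are dropped
    country: Optional[str]
    created_date: Optional[date]   # parsed with to_date in Step 3; not filtered, so may be null


example = SilverCustomer(1, "Ada", "Lovelace", "ada@example.com", "UK", date(2024, 1, 1))
```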

## Transformation Logic

### Step 1 -- Read from Bronze Layer
```python
# `spark` is the SparkSession that Databricks notebooks provide automatically
bronze_df = spark.read.table(
    "migration_project_db_ws.bronze.customers"
)
```
---

### Step 2 -- Standardization
```python
# Standardize Column Names

customers_df = (
    bronze_df
    .withColumnRenamed("CUSTOMER_ID", "customer_id")
    .withColumnRenamed("FIRST_NAME", "first_name")
    .withColumnRenamed("LAST_NAME", "last_name")
    .withColumnRenamed("EMAIL", "email")
    .withColumnRenamed("COUNTRY", "country")
    .withColumnRenamed("CREATED_DATE", "created_date")
)
```
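The explicit renames above can also be expressed as one mapping from upper-case Bronze names to lower-case Silver names. The plain-Python sketch below builds that mapping. It assumes every Bronze column should simply be lower-cased, which would also rename any columns beyond the six listed here. The helper name `lowercase_column_mapping` is hypothetical.

```python
def lowercase_column_mapping(columns):
    """Map each column name to its lower-case form, skipping names already lower-case."""
    return {name: name.lower() for name in columns if name != name.lower()}


bronze_columns = ["CUSTOMER_ID", "FIRST_NAME", "LAST_NAME", "EMAIL", "COUNTRY", "CREATED_DATE"]
mapping = lowercase_column_mapping(bronze_columns)
```

In PySpark you can apply such a mapping in three ways:

- Loop over it with `withColumnRenamed`.
- On PySpark 3.4 or later, pass it to `withColumnsRenamed(mapping)`.
- Rename every column at once with `bronze_df.toDF(*[c.lower() for c in bronze_df.columns])`.

The explicit form in Step 2 has one advantage: it documents exactly which Bronze columns the job expects.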
---

### Step 3 -- Data Cleansing
```python
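from pyspark.sql import Window
from pyspark.sql.functions import col, lower, row_number, to_date

# Data cleansing

# Rank records per customer, most recent created_date first
window_spec = Window.partitionBy("customer_id").orderBy(col("created_date").desc())

silver_customers_df = (
    customers_df
    # Enforce types: integer business key, DATE for created_date
    .withColumn("customer_id", col("customer_id").cast("int"))
    .withColumn("created_date", to_date(col("created_date")))
    # Normalize email casing
    .withColumn("email", lower(col("email")))
    # Drop records missing the business key or an email
    .filter(col("customer_id").isNotNull())
    .filter(col("email").isNotNull())
    # Deduplicate: keep only the latest record per customer_id
    .withColumn("row_num", row_number().over(window_spec))
    .filter(col("row_num") == 1)
    .drop("row_num")
)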
```
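Applied rules:
- Cast `customer_id` to `INT` and `created_date` to `DATE`
- Normalize email casing to lowercase
- Remove records without `customer_id` or `email`
- Deduplicate on `customer_id`, keeping the record with the latest `created_date`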
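The Step 3 rules can be unit-tested without a Spark session. The sketch below reimplements the same logic in plain Python over a list of dicts. It is an approximation, not the Spark code:

- `int()` and `date.fromisoformat` stand in for Spark's `cast("int")` and `to_date`. The helpers map bad input to `None`, mirroring Spark's behavior with ANSI mode disabled.
- Ties on `created_date` are broken by input order. `row_number()`, by contrast, picks an arbitrary row among ties.

All function names are hypothetical.

```python
from datetime import date


def _to_int(value):
    """Stand-in for cast("int"): return None instead of raising on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def _to_date(value):
    """Stand-in for to_date on ISO 'yyyy-MM-dd' strings: None on bad input."""
    if isinstance(value, date):
        return value
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def _newer(candidate, existing):
    """Descending order puts nulls last, so a dated row beats an undated one."""
    if candidate is None:
        return False
    return existing is None or candidate > existing


def cleanse_customers(rows):
    """Cast, normalize, filter, and keep the latest row per customer_id."""
    latest = {}
    for row in rows:
        rec = dict(row)
        rec["customer_id"] = _to_int(rec.get("customer_id"))
        rec["created_date"] = _to_date(rec.get("created_date"))
        email = rec.get("email")
        rec["email"] = email.lower() if email is not None else None
        # Drop records missing the business key or an email
        if rec["customer_id"] is None or rec["email"] is None:
            continue
        current = latest.get(rec["customer_id"])
        if current is None or _newer(rec["created_date"], current["created_date"]):
            latest[rec["customer_id"]] = rec
    return list(latest.values())
```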

### Step 4 -- Write to Silver Table
```python
silver_customers_df.write.mode("overwrite").saveAsTable("migration_project_db_ws.silver.customers")
```

#### Why overwrite?
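- Silver represents the latest clean state of the customer master data, not a history of changes
- Re-running the job on the same Bronze input produces the same table (deterministic recomputation)
- Downstream consumers always read one complete snapshot and never need to reconcile partial or duplicate loads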
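Overwrite makes the job idempotent: re-running it on the same input yields the same Silver table, while append would accumulate duplicate rows. The toy plain-Python illustration below uses a dict as a stand-in for the catalog. The `write` helper is hypothetical and only mimics the two save modes; it is not the Spark writer.

```python
def write(catalog, table, rows, mode):
    """Toy stand-in for saveAsTable with mode 'overwrite' or 'append' (hypothetical)."""
    if mode == "overwrite":
        catalog[table] = list(rows)                  # replace the whole table
    elif mode == "append":
        catalog.setdefault(table, []).extend(rows)   # add rows to what is already there
    else:
        raise ValueError(f"unsupported mode: {mode}")


silver_rows = [{"customer_id": 1}, {"customer_id": 2}]

overwrite_catalog, append_catalog = {}, {}
for _ in range(2):  # simulate the job running twice on identical input
    write(overwrite_catalog, "silver.customers", silver_rows, "overwrite")
    write(append_catalog, "silver.customers", silver_rows, "append")
```

The trade-off is that every run rewrites the full table. For very large dimensions, a Delta `MERGE` keyed on `customer_id` is a common alternative.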

## Validation
```sql
SELECT COUNT(*) FROM migration_project_db_ws.silver.customers;
```

Additional checks:
- No null `customer_id` values
- `created_date` values parsed to valid dates (a failed `to_date` conversion shows up as null)
- Email addresses are lowercase and well-formed
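The checks above can be expressed as assertions. The sketch below runs them in plain Python over rows shaped like the Silver output. It also checks that `customer_id` is unique, matching the deduplication in Step 3.

The email pattern is a deliberately loose assumption (lowercase, no whitespace, one `@`, a dot in the domain), not a full RFC 5322 validator. The function name is hypothetical. In a Databricks job, the same predicates would typically run as SQL or DataFrame filters against the Silver table.

```python
import re
from datetime import date

# Assumed, intentionally loose pattern: lowercase, no whitespace, one "@", dotted domain.
EMAIL_PATTERN = re.compile(r"^[^@\sA-Z]+@[^@\sA-Z]+\.[^@\sA-Z]+$")


def validate_silver_customers(rows):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        cid = row.get("customer_id")
        if cid is None:
            problems.append(f"row {i}: null customer_id")
        elif cid in seen_ids:
            problems.append(f"row {i}: duplicate customer_id {cid}")
        else:
            seen_ids.add(cid)
        if not isinstance(row.get("created_date"), date):
            problems.append(f"row {i}: invalid created_date")
        email = row.get("email")
        if email is None or not EMAIL_PATTERN.match(email):
            problems.append(f"row {i}: malformed email {email!r}")
    return problems
```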

## Summary
- Silver layer cleans and standardizes Bronze customer data
- Business rules are applied consistently
- Output is analytics-ready and reusable
- Follows the medallion architecture: Bronze keeps the raw ingested data, and Silver holds the cleansed, conformed version built from it