# SILVER LAYER — PURPOSE 

The Silver layer cleans and standardizes raw clickstream data from Bronze by casting columns to explicit types, dropping rows with a null event_time, and filtering out unrecognized event types.
The result is trusted, analytics-ready data suitable for business logic and aggregations.

## Create Silver Schema

In [0]:
%sql
CREATE SCHEMA IF NOT EXISTS e_commerce_capstone.silver;


## Read from Bronze

In [0]:
from pyspark.sql.functions import col, to_timestamp, when



In [0]:
bronze_df = spark.table("e_commerce_capstone.bronze.raw_events")


## Explicit Data Type Casting

In [0]:
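silver_df = (
    bronze_df
    # Parse event_time into a proper timestamp
    .withColumn("event_time", to_timestamp("event_time"))
    # Numeric price for aggregations
    .withColumn("price", col("price").cast("double"))
    # IDs: user_id as a 64-bit integer, product_id as a string
    .withColumn("user_id", col("user_id").cast("long"))
    .withColumn("product_id", col("product_id").cast("string"))
)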


Silver enforces explicit data types so that downstream aggregations and joins do not depend on how Bronze happened to store each column.
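One consequence of casting is worth keeping in mind: assuming ANSI mode is off (an assumption about the cluster configuration, not something the notebook states), a Spark cast turns unparseable values into nulls instead of raising an error. The plain-Python analogy below sketches that behaviour; `cast_double_or_none` and the sample values are hypothetical and only illustrate the idea.

```python
from typing import Optional


def cast_double_or_none(value: Optional[str]) -> Optional[float]:
    """Mimic a lenient cast: unparseable input becomes None, not an error."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


# Made-up sample values, not taken from the dataset.
raw_prices = ["35.79", "", "n/a", None, "1e3"]
cast_prices = [cast_double_or_none(p) for p in raw_prices]

# Count values that were present before the cast but are null after it.
nulls_introduced = sum(
    1 for raw, cast in zip(raw_prices, cast_prices)
    if raw is not None and cast is None
)
print(cast_prices, nulls_introduced)
```

Comparing null counts before and after a cast in this way is a cheap check that the cast did not silently discard data.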

## Minimal Null Handling

Drop rows where event_time is null.

Keep rows with missing brand or category: gaps like these are normal in real-world data.

In [0]:
silver_df = silver_df.filter(col("event_time").isNotNull())

Event time is mandatory for any time-based analysis.


## Filter Valid Event Types Only

In [0]:
valid_events = ["view", "cart", "remove_from_cart", "purchase"]

silver_df = silver_df.filter(col("event_type").isin(valid_events))
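The two filters applied so far (non-null `event_time`, whitelisted `event_type`) amount to a single row-level rule. As a minimal sketch, the same rule can be mirrored in plain Python so it can be unit-tested without a Spark session. `is_valid_row` and the dict-shaped sample rows are hypothetical and not part of the notebook.

```python
# Plain-Python mirror of the Silver row filters (illustrative only; the
# notebook itself applies these rules with DataFrame.filter).
VALID_EVENTS = {"view", "cart", "remove_from_cart", "purchase"}


def is_valid_row(row: dict) -> bool:
    """Return True if a row would survive both Silver filters."""
    # Rule 1: event_time is mandatory.
    if row.get("event_time") is None:
        return False
    # Rule 2: only whitelisted event types are kept. A null event_type is
    # dropped too, because isin() on a null value yields null (not true)
    # in Spark, and filter() keeps only rows where the condition is true.
    return row.get("event_type") in VALID_EVENTS


# Made-up sample rows covering each case.
rows = [
    {"event_time": "2019-10-01 00:00:00", "event_type": "view", "brand": None},
    {"event_time": None, "event_type": "purchase", "brand": "acme"},
    {"event_time": "2019-10-01 00:00:05", "event_type": "wishlist", "brand": "acme"},
    {"event_time": "2019-10-01 00:00:09", "event_type": None, "brand": "acme"},
]
kept = [r for r in rows if is_valid_row(r)]
print(len(kept))
```

Only the first row survives: a missing brand does not disqualify it, while a missing timestamp, an unknown event type, or a null event type does.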


## Add Derived Columns

event_date → for partitioning

is_purchase → simple downstream metric

In [0]:
from pyspark.sql.functions import to_date

silver_df = (
    silver_df
    .withColumn("event_date", to_date("event_time"))
    .withColumn("is_purchase", when(col("event_type") == "purchase", 1).otherwise(0))
)
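To make the two derived columns concrete, here is a rough plain-Python equivalent of what they compute for a single row. `derive_columns` and the sample row are hypothetical; the notebook derives both columns with `to_date` and `when` as shown above.

```python
from datetime import datetime


def derive_columns(row: dict) -> dict:
    """Return a copy of row with event_date and is_purchase added."""
    out = dict(row)
    # event_date: calendar date of the event, used as the partition key.
    out["event_date"] = row["event_time"].date()
    # is_purchase: 1 for purchase events, 0 for everything else.
    out["is_purchase"] = 1 if row["event_type"] == "purchase" else 0
    return out


# Made-up sample row.
sample = {"event_time": datetime(2019, 10, 1, 23, 59, 59), "event_type": "purchase"}
derived = derive_columns(sample)
print(derived["event_date"], derived["is_purchase"])
```

Because is_purchase is a 0/1 flag, summing it counts purchases and averaging it gives the share of events that are purchases.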


## Write Silver Delta Table

In [0]:
(
    silver_df
    .write
    .format("delta")
    .mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("e_commerce_capstone.silver.events_clean")
)


## Optimize (Partition + Z-Order)

In [0]:
%sql 
OPTIMIZE e_commerce_capstone.silver.events_clean
ZORDER BY (user_id, product_id);


path,metrics
,"List(111, 102, List(45769226, 130478493, 8.528694967567568E7, 111, 9466851414), List(67215848, 282229580, 9.470500431372549E7, 102, 9659910440), 213, List(minCubeSize(107374182400), List(0, 0), List(213, 15756631290), 0, List(102, 9659910440), 102, null), null, 0, 1, 213, 111, false, 0, 0, 1769855457084, 1769855742452, 8, 102, null, List(0, 0), null, 13, 13, 508157, 0, null)"
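The raw metrics struct is hard to read as printed. As a rough sketch, its leading values can be interpreted with a little arithmetic, assuming the field order of Delta Lake's OPTIMIZE metrics (files added, files removed, then min/max/avg/count/total size for each group). That field mapping, and reading the two 13-digit values as start and end times in epoch milliseconds, are assumptions; the output itself carries no field names.

```python
# Values copied from the OPTIMIZE output above; the variable names reflect
# an assumed field mapping, not labels present in the output.
num_files_added = 111
num_files_removed = 102
added_total_bytes = 9_466_851_414
removed_total_bytes = 9_659_910_440
start_ms = 1_769_855_457_084
end_ms = 1_769_855_742_452

GIB = 1024 ** 3
MIB = 1024 ** 2

summary = {
    # Total size of rewritten files, in GiB.
    "added_gib": round(added_total_bytes / GIB, 2),
    "removed_gib": round(removed_total_bytes / GIB, 2),
    # Average size of a newly written file, in MiB.
    "avg_added_file_mib": round(added_total_bytes / num_files_added / MIB, 1),
    # Wall-clock duration of the command, in minutes.
    "duration_min": round((end_ms - start_ms) / 60_000, 2),
}
print(summary)
```

Under these assumptions, the run rewrote a little under 9 GiB of data and took just under five minutes.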


## Validate Silver

In [0]:
%sql
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT user_id) AS users,
  SUM(is_purchase) AS purchases
FROM e_commerce_capstone.silver.events_clean;


total_rows,users,purchases
411709736,15639803,6848824
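The three counts are easier to judge as ratios. The sketch below derives two sanity metrics from the figures in the output above; the choice of metrics and invariants is illustrative, not part of the original validation.

```python
# Figures copied from the validation query output above.
total_rows = 411_709_736
users = 15_639_803
purchases = 6_848_824

# Fraction of all events that are purchases.
purchase_share = purchases / total_rows
# Average number of events per distinct user.
events_per_user = total_rows / users

# Basic invariants that should hold for any run of the pipeline.
assert 0 < purchases <= total_rows
assert 0 < users <= total_rows

print(f"purchase share: {purchase_share:.2%}")
print(f"events per user: {events_per_user:.1f}")
```

Roughly 1.7% of events are purchases and each user generates about 26 events on average. A sharp change in either ratio between runs would be a signal to inspect the upstream filters.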
