SILVER LAYER (CLEANING)
What We Fix

Remove rows with a null CustomerID

Cast columns to the correct data types

Remove rows with non-positive quantities or prices

In [0]:
# ---- Parameters (CI/CD compatible) ----
dbutils.widgets.text("storage_account", "retaildatalaketest")
dbutils.widgets.text("container", "ecommerce")

storage_account = dbutils.widgets.get("storage_account")
container = dbutils.widgets.get("container")

base_path = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
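
The base path follows the ADLS Gen2 pattern abfss://<container>@<storage_account>.dfs.core.windows.net. Both values arrive as free-text widgets, so a small guard can catch an empty or malformed value before any read or write is attempted. The following is a minimal sketch: the helper name build_base_path and the choice of checks are assumptions for illustration and are not part of the original notebook.

```python
import re


def build_base_path(container: str, storage_account: str) -> str:
    """Return the abfss:// root for a container, rejecting obviously bad input."""
    container = container.strip()
    storage_account = storage_account.strip()
    if not container:
        raise ValueError("container must not be empty")
    # Azure storage account names are 3-24 lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]{3,24}", storage_account):
        raise ValueError(f"invalid storage account name: {storage_account!r}")
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net"


# Same values as the widget defaults above.
example_path = build_base_path("ecommerce", "retaildatalaketest")
```

In the notebook, the widget values would be passed in place of the literals, so a typo in a CI/CD parameter fails fast with a readable error.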

In [0]:
from pyspark.sql.functions import col

# bronze_path = "abfss://ecommerce@retaildatalaketest.dfs.core.windows.net/bronze/online_retail_bronze"
# silver_path = "abfss://ecommerce@retaildatalaketest.dfs.core.windows.net/silver/online_retail_silver"

silver_path = f"{base_path}/silver/online_retail_silver"
bronze_path = f"{base_path}/bronze/online_retail_bronze"

# Read bronze
bronze_df = spark.read.format("delta").load(bronze_path)

# Clean and transform to silver
silver_df = (bronze_df
    # Ensure correct data types (in case inferSchema missed something)
    .withColumn("Quantity", col("Quantity").cast("int"))
    # Assumes the bronze price column is named "Price"; if bronze already has "UnitPrice", cast that column instead
    .withColumn("UnitPrice", col("Price").cast("double"))
    .withColumn("CustomerID", col("CustomerID").cast("bigint"))
    
    # Business filters
    .filter(col("Quantity") > 0)
    .filter(col("UnitPrice") > 0)
    .filter(col("CustomerID").isNotNull())   # dropna on CustomerID
    
    # Optional: Add derived columns
    .withColumn("TotalAmount", col("Quantity") * col("UnitPrice"))
)

# Show sample
silver_df.limit(10).display()

# Write to silver layer
silver_df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .save(silver_path)

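# Read the silver table back to confirm the write succeeded
spark.read.format("delta").load(silver_path).limit(10).display()

# Inspect the Delta transaction history for the silver path
spark.sql(f"DESCRIBE HISTORY delta.`{silver_path}`").display()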
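
The business filters above can be reasoned about without a cluster. The sketch below mirrors them in plain Python on a few hypothetical rows (the sample values are invented for illustration, and clean_row is not a function from the notebook). It shows which rows survive and how TotalAmount is derived.

```python
from typing import Optional


def clean_row(row: dict) -> Optional[dict]:
    """Apply the silver rules to one row; return None if the row is dropped."""
    quantity = row.get("Quantity")
    unit_price = row.get("UnitPrice")
    customer_id = row.get("CustomerID")
    # A missing value fails the same filters that a null fails in Spark.
    if quantity is None or quantity <= 0:
        return None
    if unit_price is None or unit_price <= 0:
        return None
    if customer_id is None:
        return None
    return {**row, "TotalAmount": quantity * unit_price}


# Hypothetical sample rows, not taken from the real dataset.
sample_rows = [
    {"Quantity": 6, "UnitPrice": 2.5, "CustomerID": 17850},   # kept
    {"Quantity": -1, "UnitPrice": 2.5, "CustomerID": 17850},  # negative quantity
    {"Quantity": 3, "UnitPrice": 0.0, "CustomerID": 17850},   # zero price
    {"Quantity": 2, "UnitPrice": 1.0, "CustomerID": None},    # missing customer
]

cleaned = [r for r in (clean_row(row) for row in sample_rows) if r is not None]
```

Only the first sample row passes all three filters. Note that the quantity filter also removes zero quantities, not just negative ones.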

Choose or Create a Catalog and Schema
First, decide where the bronze and silver tables should be registered. A common setup:
Catalog: main (default), or create your own, such as ecommerce
Schemas: bronze and silver
Run this SQL to create them if they don't exist:

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS ecommerce
MANAGED LOCATION 'abfss://ecommerce@retaildatalaketest.dfs.core.windows.net/managed/ecommerce/';

CREATE SCHEMA IF NOT EXISTS ecommerce.bronze;
CREATE SCHEMA IF NOT EXISTS ecommerce.silver;

Register the Bronze and Silver Tables

In [0]:
%sql
-- Bronze table
CREATE TABLE IF NOT EXISTS ecommerce.bronze.online_retail_bronze
USING DELTA
LOCATION 'abfss://ecommerce@retaildatalaketest.dfs.core.windows.net/bronze/online_retail_bronze';

-- Silver table
CREATE TABLE IF NOT EXISTS ecommerce.silver.online_retail_silver
USING DELTA
LOCATION 'abfss://ecommerce@retaildatalaketest.dfs.core.windows.net/silver/online_retail_silver';
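
The SQL above hard-codes the storage account and container, while the cleaning cell takes them from widgets. One way to keep the two in step is to build the statements from the same base path. This is a minimal sketch under assumptions: register_table_sql is a hypothetical helper, and it relies on the convention visible above that the schema name and the storage folder are both the layer name.

```python
def register_table_sql(catalog: str, layer: str, table: str, base_path: str) -> str:
    """Build a CREATE TABLE statement for a Delta table at <base_path>/<layer>/<table>."""
    location = f"{base_path}/{layer}/{table}"
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{layer}.{table}\n"
        f"USING DELTA\n"
        f"LOCATION '{location}'"
    )


# In the notebook, base_path would come from the parameters cell.
example_base = "abfss://ecommerce@retaildatalaketest.dfs.core.windows.net"

statements = [
    register_table_sql("ecommerce", "bronze", "online_retail_bronze", example_base),
    register_table_sql("ecommerce", "silver", "online_retail_silver", example_base),
]

# In a Databricks notebook each statement could then be run with spark.sql(statement).
```

The inputs are interpolated directly into SQL, so this is only suitable for trusted pipeline parameters, not for user-supplied text.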

Check that the table exists and can be queried

In [0]:
%sql
SELECT * FROM ecommerce.silver.online_retail_silver LIMIT 10