# Bronze Layer – Online Retail Data Ingestion

## Purpose
This notebook ingests the raw Online Retail dataset into Databricks as a Bronze Delta table.
The Bronze layer holds the source data as loaded: no rows are cleaned or filtered, and the only processing is header parsing and schema inference.

## Data Source
- Dataset: Online Retail Transaction Data
- Source: UCI Machine Learning Repository
- Original format: Excel
- Ingested format: CSV (converted externally due to Databricks Serverless constraints)

## Storage
- Raw file location:
  /Volumes/workspace/repeat_purchase/raw_data/online_retail.csv
- Bronze table:
  workspace.repeat_purchase.bronze_online_retail
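
Both locations follow Unity Catalog naming: a Volume path has the form `/Volumes/<catalog>/<schema>/<volume>/<file>`, and a table uses the three-level name `<catalog>.<schema>.<table>`. As a minimal sketch, the two strings can be derived from shared parts so the catalog and schema stay in sync. The constant names below are hypothetical and are not used elsewhere in this notebook.

```python
# Hypothetical constants; the values mirror the locations listed above.
CATALOG = "workspace"
SCHEMA = "repeat_purchase"
VOLUME = "raw_data"
FILE_NAME = "online_retail.csv"
TABLE = "bronze_online_retail"

# Unity Catalog Volume path: /Volumes/<catalog>/<schema>/<volume>/<file>
input_path = f"/Volumes/{CATALOG}/{SCHEMA}/{VOLUME}/{FILE_NAME}"

# Three-level table name: <catalog>.<schema>.<table>
bronze_table = f"{CATALOG}.{SCHEMA}.{TABLE}"

print(input_path)
print(bronze_table)
```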

## Notes
- No data cleaning or filtering is performed at this stage
- Schema inference is enabled (`inferSchema`), so column types are derived from the data; `InvoiceDate` is read as a string
- All transformations will be handled in the Silver layer
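
Schema inference requires an extra pass over the file, and the inferred types can shift if the source data changes. Should that become a concern, one option is to pass the schema explicitly. The sketch below is an assumption and not what this notebook does: it builds a DDL-formatted schema string from the column types that inference reports in the sanity-check output later in this notebook. `BRONZE_COLUMNS`, `bronze_schema_ddl`, and `read_with_schema` are hypothetical names.

```python
# Column names and types copied from the schema that inference reports for this file.
BRONZE_COLUMNS = [
    ("InvoiceNo", "STRING"),
    ("StockCode", "STRING"),
    ("Description", "STRING"),
    ("Quantity", "INT"),
    ("InvoiceDate", "STRING"),
    ("UnitPrice", "DOUBLE"),
    ("CustomerID", "INT"),
    ("Country", "STRING"),
]

# DDL-formatted schema string: "InvoiceNo STRING, StockCode STRING, ..."
bronze_schema_ddl = ", ".join(f"{name} {dtype}" for name, dtype in BRONZE_COLUMNS)


def read_with_schema(spark, path):
    """Hypothetical helper: read the CSV with a fixed schema instead of inferSchema."""
    return (
        spark.read
        .option("header", "true")
        .schema(bronze_schema_ddl)  # DataFrameReader.schema accepts a DDL string
        .csv(path)
    )
```

In a notebook cell this would be called as `read_with_schema(spark, input_path)`. Keeping `InvoiceDate` as a string leaves date parsing to the Silver layer, in line with the notes above.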


In [0]:
# Read raw Online Retail CSV data from Unity Catalog Volume (Bronze ingestion)

input_path = "/Volumes/workspace/repeat_purchase/raw_data/online_retail.csv"

df_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(input_path)
)


In [0]:
# Basic sanity checks – schema and sample records

df_raw.printSchema()
display(df_raw.limit(10))


root
 |-- InvoiceNo: string (nullable = true)
 |-- StockCode: string (nullable = true)
 |-- Description: string (nullable = true)
 |-- Quantity: integer (nullable = true)
 |-- InvoiceDate: string (nullable = true)
 |-- UnitPrice: double (nullable = true)
 |-- CustomerID: integer (nullable = true)
 |-- Country: string (nullable = true)



InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,01-12-2010 08:26,2.55,17850,United Kingdom
536365,71053,WHITE METAL LANTERN,6,01-12-2010 08:26,3.39,17850,United Kingdom
536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,01-12-2010 08:26,2.75,17850,United Kingdom
536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,01-12-2010 08:26,3.39,17850,United Kingdom
536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,01-12-2010 08:26,3.39,17850,United Kingdom
536365,22752,SET 7 BABUSHKA NESTING BOXES,2,01-12-2010 08:26,7.65,17850,United Kingdom
536365,21730,GLASS STAR FROSTED T-LIGHT HOLDER,6,01-12-2010 08:26,4.25,17850,United Kingdom
536366,22633,HAND WARMER UNION JACK,6,01-12-2010 08:28,1.85,17850,United Kingdom
536366,22632,HAND WARMER RED POLKA DOT,6,01-12-2010 08:28,1.85,17850,United Kingdom
536367,84879,ASSORTED COLOUR BIRD ORNAMENT,32,01-12-2010 08:34,1.69,13047,United Kingdom


In [0]:
# Write raw data to Bronze Delta table
# (overwrite mode replaces any existing contents, so reruns do not duplicate rows)

(
    df_raw.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("workspace.repeat_purchase.bronze_online_retail")
)


In [0]:
# Verify record count in Bronze table

display(
    spark.sql("""
    SELECT COUNT(*) AS row_count
    FROM workspace.repeat_purchase.bronze_online_retail
    """)
)



row_count
541909
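
The count above can also be compared against the source DataFrame programmatically, so a mismatch fails the run instead of relying on a visual check. This is a minimal sketch under that assumption; `verify_row_count` is a hypothetical helper and not part of the original notebook. Note that counting the source DataFrame re-reads the CSV file.

```python
def verify_row_count(spark, df_source, table_name):
    """Hypothetical helper: check that the table holds as many rows as the source DataFrame.

    Returns the row count on success and raises ValueError on a mismatch.
    """
    source_rows = df_source.count()
    table_rows = spark.table(table_name).count()
    if source_rows != table_rows:
        raise ValueError(
            f"Row count mismatch for {table_name}: "
            f"source={source_rows}, table={table_rows}"
        )
    return table_rows
```

In this notebook it would be called as `verify_row_count(spark, df_raw, "workspace.repeat_purchase.bronze_online_retail")`.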
