# Bronze Layer – Multi-Source Ingestion (CRM & ERP)

## Overview

This notebook implements the **Bronze layer** of the Medallion Architecture for the Bike Sales Lakehouse project.

The purpose of this layer is to ingest raw data from multiple source systems (CRM and ERP) into the Lakehouse with minimal transformation, preserving the original structure for traceability and replayability.

## Source Systems

- **CRM System**
  - cust_info.csv
  - prd_info.csv
  - sales_details.csv

- **ERP System**
  - CUST_AZ12.csv
  - LOC_A101.csv
  - PX_CAT_G1V2.csv

## Responsibilities of the Bronze Layer

- Ingest raw CSV files
- Apply schema definitions (in this notebook, inferred from the CSV files via `inferSchema`)
- Add metadata columns (e.g., ingestion timestamp)
- Write data in Delta format
- Store data in the Bronze schema
- Ensure idempotent and repeatable ingestion (each run overwrites the target table, so reruns yield the same result)
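
The metadata step in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's implementation: the helper `add_ingestion_metadata` and the column names `_ingested_at`, `_source_system`, and `_source_file` are hypothetical, and `_metadata.file_path` assumes a Spark version that exposes the hidden file-metadata column for file-based sources.

```python
# Hypothetical audit column names; the notebook does not define them.
INGESTED_AT_COL = "_ingested_at"
SOURCE_SYSTEM_COL = "_source_system"
SOURCE_FILE_COL = "_source_file"
METADATA_COLUMNS = (INGESTED_AT_COL, SOURCE_SYSTEM_COL, SOURCE_FILE_COL)


def add_ingestion_metadata(df, source_system):
    """Return df with three audit columns appended (sketch).

    df is assumed to be the DataFrame returned directly by spark.read,
    so that the hidden _metadata column is still resolvable.
    """
    # Imported lazily so this module also loads where PySpark is absent.
    from pyspark.sql import functions as F

    return (
        # Resolve the hidden file-metadata column first, straight off the read.
        df.withColumn(SOURCE_FILE_COL, F.col("_metadata.file_path"))
          .withColumn(SOURCE_SYSTEM_COL, F.lit(source_system))
          .withColumn(INGESTED_AT_COL, F.current_timestamp())
    )
```

In the ingestion loop this would be applied between the read and the write, for example `df = add_ingestion_metadata(df, item["source"])`.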

## Output

All datasets are written as Delta tables in the Bronze layer and serve as the foundation for downstream Silver transformations.

---

This notebook simulates a production-style ingestion pipeline in a Databricks Lakehouse environment.

## Architectural Principle

The Bronze layer maintains raw data fidelity. No business logic or cleansing is applied at this stage. Transformations and conformance occur in the Silver layer.
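
One way to keep raw values untouched is to read every column as a string and postpone typing to the Silver layer. The sketch below builds a DDL schema string of the kind that `spark.read.schema(...)` accepts. The helper `string_schema_ddl` and the column names are hypothetical, and this approach differs from the notebook, which lets Spark infer types.

```python
def string_schema_ddl(column_names):
    """Build a Spark DDL schema string that types every column as STRING."""
    if not column_names:
        raise ValueError("at least one column name is required")
    # Backticks keep names with spaces or unusual characters intact.
    return ", ".join(f"`{name}` STRING" for name in column_names)


# Hypothetical column names, for illustration only.
ddl = string_schema_ddl(["id", "name"])
print(ddl)
```

The resulting string could then replace schema inference, for example `spark.read.option("header", "true").schema(ddl).csv(path)`.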

## Define Ingestion Configuration

In [0]:
INGESTION_CONFIG = [
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/cust_info.csv",
        "table": "crm_cust_info"   
    },
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/prd_info.csv",
        "table": "crm_prd_info"
    },
    {
        "source": "crm",
        "path": "/Volumes/workspace/bronze/raw_sources/source_crm/sales_details.csv",
        "table": "crm_sales_details"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/CUST_AZ12.csv",
        "table": "erp_cust_az12"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/LOC_A101.csv",
        "table": "erp_loc_a101"
    },
    {
        "source": "erp",
        "path": "/Volumes/workspace/bronze/raw_sources/source_erp/PX_CAT_G1V2.csv",
        "table": "erp_px_cat_g1v2"
    }
]
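
Because every entry drives an overwrite of a table, it can help to check the configuration before any data is read. The following is a minimal sketch, assuming a hypothetical helper `validate_config`; the rule that table names start with the source prefix is inferred from the entries above rather than stated by the notebook, and the sample paths are made up.

```python
from collections import Counter

REQUIRED_KEYS = {"source", "path", "table"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for i, item in enumerate(config):
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if not item["path"].lower().endswith(".csv"):
            problems.append(f"entry {i}: path is not a .csv file")
        # Assumed naming convention: <source>_<name>, as in crm_cust_info.
        if not item["table"].startswith(item["source"] + "_"):
            problems.append(f"entry {i}: table lacks '{item['source']}_' prefix")
    # Two entries writing to the same table would overwrite each other.
    counts = Counter(item.get("table") for item in config)
    for table, n in counts.items():
        if table is not None and n > 1:
            problems.append(f"table '{table}' is targeted {n} times")
    return problems


# Two illustrative entries in the same shape as INGESTION_CONFIG.
sample = [
    {"source": "crm", "path": "/tmp/a.csv", "table": "crm_a"},
    {"source": "erp", "path": "/tmp/b.txt", "table": "crm_a"},
]
for problem in validate_config(sample):
    print(problem)
```

Calling `validate_config(INGESTION_CONFIG)` at the top of the ingestion cell and stopping on a non-empty result would catch a copy-paste mistake before a table is overwritten.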

## Ingest Files into Bronze Tables

In [0]:
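for item in INGESTION_CONFIG:
    target_table = f"workspace.bronze.{item['table']}"
    print(f"Ingesting {item['source']} → {target_table}")

    # Read the raw CSV: the first row holds column names, and types are inferred.
    df = (
        spark.read
             .option("header", "true")
             .option("inferSchema", "true")
             .csv(item["path"])
    )

    # Overwrite the Delta table on every run so that reruns are idempotent.
    (
        df.write
          .mode("overwrite")
          .format("delta")
          .saveAsTable(target_table)
    )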