# Bronze Layer Ingestion

Goal: Ingest raw JSON files in "../data" into Bronze Delta tables

Datasets:
- customers.json  -> bronze.customers_raw
- products        -> bronze.products_raw
- orders          -> bronze.orders_raw
- sales           -> bronze.sales_raw
- countries       -> bronze.countries_raw
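One way to keep this mapping in one place is to hold it as data and derive the fully qualified table names from it. The sketch below is hypothetical: `bronze_targets`, `DATASETS`, `CATALOG` and `SCHEMA` are illustrative names. It also assumes every dataset file is named `<dataset>.json`, although only customers.json and orders.json are confirmed by the cells below.

```python
# Hypothetical sketch: the dataset -> Bronze table mapping kept as data.
# Assumption: every source file is named "<dataset>.json".
CATALOG = "md_sales_dashboard"
SCHEMA = "bronze"

DATASETS = ["customers", "products", "orders", "sales", "countries"]


def bronze_targets(datasets, catalog=CATALOG, schema=SCHEMA):
    """Map each dataset to (source filename, fully qualified table name)."""
    return {
        name: (f"{name}.json", f"{catalog}.{schema}.{name}_raw")
        for name in datasets
    }


# Print the plan, e.g. "customers.json   -> md_sales_dashboard.bronze.customers_raw"
for name, (filename, table) in bronze_targets(DATASETS).items():
    print(f"{filename:<16} -> {table}")
```

In the notebook, each `(filename, table)` pair could be passed to `ingest_json_to_bronze`. `saveAsTable` and `spark.table` also accept a fully qualified `catalog.schema.table` name, which removes the dependence on the `USE` statements.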

## Imports, Context Setting and Paths

Imports and paths

In [0]:
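# Project helpers: load_variable_json reads a JSON file into a pandas DataFrame
from src.utils import load_variable_json, is_json_line
from pyspark.sql.functions import current_timestamp, lit

# Folder with the raw JSON files, relative to the notebook's working directory
RAW_PATH = "../data/"
# Value written to the _source_system metadata column
SOURCE_SYSTEM = "sales test homework"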
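`src.utils` is not shown in this notebook, so what `load_variable_json` and `is_json_line` actually do is an assumption here. Judging only by their names, they appear to accept files that may be either a single JSON document or newline-delimited JSON (JSON Lines). The following stdlib sketch illustrates that idea. `looks_like_json_lines` and `load_json_records` are hypothetical names, not the project's helpers.

```python
import json


def looks_like_json_lines(text):
    """Heuristic: True if every non-blank line parses as its own JSON object."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return False
    for ln in lines:
        try:
            if not isinstance(json.loads(ln), dict):
                return False
        except json.JSONDecodeError:
            return False
    return True


def load_json_records(text):
    """Return a list of dict records from JSON Lines, a JSON array, or one object."""
    if looks_like_json_lines(text):
        return [json.loads(ln) for ln in text.splitlines() if ln.strip()]
    data = json.loads(text)
    # A top-level array is already a list of records; wrap a single object.
    return data if isinstance(data, list) else [data]
```

With pandas available, `pd.DataFrame(load_json_records(text))` would produce the kind of pandas DataFrame that the helper below passes to `spark.createDataFrame`.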

Set Databricks context

In [0]:
spark.sql("USE CATALOG md_sales_dashboard")
spark.sql("USE SCHEMA bronze")



## Helper Function Definition
This helper function standardizes the JSON -> Bronze ingestion process:
1. Reads the JSON file with the project loader ('load_variable_json'), which returns a pandas DataFrame
2. Converts the pandas DataFrame into a Spark DataFrame
3. Adds metadata columns:
    - _ingest_timestamp: when the load ran
    - _ingest_file: the source file name
    - _source_system: the SOURCE_SYSTEM constant
4. Writes the DataFrame as a Delta table in the md_sales_dashboard.bronze namespace
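Step 3 can be illustrated outside Spark. `current_timestamp()` is evaluated once at the start of query evaluation, so every row written by one load carries the same `_ingest_timestamp`. The pure-Python sketch below mirrors that by taking one timestamp per batch and sharing it across all records; `add_bronze_metadata` is a hypothetical name, and plain dicts stand in for DataFrame rows.

```python
from datetime import datetime, timezone


def add_bronze_metadata(records, filename, source_system, ingest_ts=None):
    """Return copies of records with the three Bronze metadata fields added.

    One timestamp is taken per call and shared by every record, mirroring
    how current_timestamp() is evaluated once per Spark query.
    """
    if ingest_ts is None:
        ingest_ts = datetime.now(timezone.utc)
    return [
        {
            **rec,  # copy the original fields; the input dicts are not modified
            "_ingest_timestamp": ingest_ts,
            "_ingest_file": filename,
            "_source_system": source_system,
        }
        for rec in records
    ]


rows = add_bronze_metadata([{"id": 1}, {"id": 2}], "customers.json", "sales test homework")
print(rows[0]["_ingest_file"], rows[0]["_ingest_timestamp"] == rows[1]["_ingest_timestamp"])
# prints: customers.json True
```

Sharing one timestamp per load makes it possible to group or filter Bronze rows by ingestion run.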


In [0]:
def ingest_json_to_bronze(filename: str, table_name: str):
    """
    Load a JSON file from ../data into a Bronze Delta table.

    - Reads JSON with load_variable_json (Pandas)
    - Converts to Spark DataFrame
    - Adds standard Bronze metadata
    - Writes as Delta table using 'table_name' in the current schema
    """
    full_path = RAW_PATH + filename
    pdf = load_variable_json(full_path)
    sdf = spark.createDataFrame(pdf)

    # Add bronze metadata
    bronze_df = (
        sdf
        .withColumn("_ingest_timestamp", current_timestamp())
        .withColumn("_ingest_file", lit(filename))
        .withColumn("_source_system", lit(SOURCE_SYSTEM))
    )

    # Write as a managed Delta table; "overwrite" replaces the previous load
    (
        bronze_df
        .write
        .format("delta")
        .mode("overwrite")
        .saveAsTable(table_name)
    )



## Load Delta Tables 


### *Load Customers*

Ingest Customers

In [0]:
ingest_json_to_bronze("customers.json", "customers_raw")

Check Customers load

In [0]:
display(spark.table("customers_raw").limit(5))
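Displaying five rows confirms the load visually. A small check can also confirm that the Bronze metadata columns made it into the table. This is a minimal sketch; `missing_bronze_columns` and `BRONZE_METADATA_COLUMNS` are hypothetical names.

```python
BRONZE_METADATA_COLUMNS = ("_ingest_timestamp", "_ingest_file", "_source_system")


def missing_bronze_columns(columns, required=BRONZE_METADATA_COLUMNS):
    """Return the required metadata columns that are absent from `columns`."""
    present = set(columns)
    return [col for col in required if col not in present]


print(missing_bronze_columns(["customer_id", "_ingest_timestamp", "_ingest_file"]))
# prints: ['_source_system']
```

In the notebook it could be called as `missing_bronze_columns(spark.table("customers_raw").columns)`, since a Spark DataFrame's `columns` attribute is a list of column names; an empty list means all three metadata columns are present.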

### *Load Orders*

Ingest Orders

In [0]:
ingest_json_to_bronze("orders.json", "orders_raw")

Check Orders load

In [0]:
display(spark.table("orders_raw").limit(5))