# 🥉 Bronze Layer - Raw Data Ingestion

This notebook implements the **Bronze layer** of the Medallion architecture.  
Its purpose is to:

- Ingest CSV files from the **Raw** layer using **Databricks Autoloader**.  
- Handle **incremental ingestion** and **schema evolution**.  
- Store the data in **Delta Lake** format within the `bronzevolume`.  

Each run takes one source name (`src_value`) as a parameter and persists that source's raw data, ready for the **Silver** layer.

---


The value of `src_value` comes from the **SrcParameters** notebook,  
where an array with the different sources is defined:  

`sales, stores, customers, products, salespersons, campaigns, dates, times`

The pipeline therefore runs **dynamically**, executing this notebook once per source.
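
A minimal sketch of what that array might look like, assuming **SrcParameters** builds one dictionary per source keyed by the widget name `src`. The names `SOURCES` and `build_src_parameters` are hypothetical, since the original notebook is not shown here:

```python
# Hypothetical contents of the SrcParameters notebook (assumption, not the original code).
SOURCES = [
    "sales", "stores", "customers", "products",
    "salespersons", "campaigns", "dates", "times",
]


def build_src_parameters(sources):
    """Return one parameter dict per source, keyed by the widget name `src`."""
    return [{"src": name} for name in sources]


src_parameters = build_src_parameters(SOURCES)
print(src_parameters[0])  # {'src': 'sales'}
```

Each dictionary can then be passed to one run of this notebook, where `dbutils.widgets.get("src")` reads the value.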

In [0]:
src_value = dbutils.widgets.get("src")
print(src_value)

sales


In [0]:
%sql
CREATE VOLUME IF NOT EXISTS workspace.bronze.bronzevolume

**Databricks Autoloader** is used to read CSV files from the **Raw** layer:  
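- `schemaLocation`: stores the inferred schema and tracks its changes across runs (here it shares a directory with the stream checkpoint).  
- `schemaEvolutionMode = rescue`: keeps the schema fixed; data in unexpected columns is captured in the `_rescued_data` column instead of failing the stream.  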
- `load`: points to the source path defined by `src_value`.  
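
The options above can also be collected in small helpers, which makes the per-source paths easy to inspect before starting the stream. This is only a sketch: the function names `autoloader_options` and `raw_source_path` and their default arguments are hypothetical, with the path values copied from this notebook:

```python
def autoloader_options(src_value, bronze_root="/Volumes/workspace/bronze/bronzevolume"):
    """Build the Auto Loader reader options for one source (hypothetical helper)."""
    return {
        "cloudFiles.format": "csv",
        # The schema is tracked next to the stream checkpoint, as in this notebook.
        "cloudFiles.schemaLocation": f"{bronze_root}/{src_value}/checkpoint",
        # Unexpected columns go to `_rescued_data`; the schema itself is not evolved.
        "cloudFiles.schemaEvolutionMode": "rescue",
    }


def raw_source_path(src_value, raw_root="/Volumes/workspace/raw/rawvolume/rawdata"):
    """Return the Raw-layer folder that holds the CSV files of one source."""
    return f"{raw_root}/{src_value}/"


print(raw_source_path("sales"))  # /Volumes/workspace/raw/rawvolume/rawdata/sales/
```

In a Databricks session the dictionary could be passed to the reader with `spark.readStream.format("cloudFiles").options(**autoloader_options(src_value)).load(raw_source_path(src_value))`.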

In [0]:
df = spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "csv") \
        .option("cloudFiles.schemaLocation", f"/Volumes/workspace/bronze/bronzevolume/{src_value}/checkpoint") \
        .option("cloudFiles.schemaEvolutionMode", "rescue") \
        .load(f"/Volumes/workspace/raw/rawvolume/rawdata/{src_value}/")

The data is written in **Delta** format inside the Bronze volume:  
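- `outputMode("append")`: appends new records without rewriting existing ones.  
- `trigger(once=True)`: processes all data available at start in a single batch, then stops the stream.  
- `checkpointLocation`: records ingestion progress, so a rerun resumes where the last run stopped and does not reprocess files already ingested.  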
- `path`: final path where the raw data is stored.  
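
As a hedged variation on the write step, the sketch below wraps it in a function and swaps `trigger(once=True)` for `trigger(availableNow=True)`. This is a different trigger from the one this notebook uses: it also stops once the available data is processed, but may split the work into several batches. The names `bronze_sink_paths` and `write_bronze` are hypothetical, and `df` is assumed to be a streaming DataFrame such as the one returned by Auto Loader:

```python
def bronze_sink_paths(src_value, bronze_root="/Volumes/workspace/bronze/bronzevolume"):
    """Return the checkpoint and data paths used for one source (hypothetical helper)."""
    base = f"{bronze_root}/{src_value}"
    return {"checkpoint": f"{base}/checkpoint", "data": f"{base}/data"}


def write_bronze(df, src_value):
    """Append a streaming DataFrame to the Bronze volume and wait for completion."""
    paths = bronze_sink_paths(src_value)
    query = (
        df.writeStream.format("delta")
        .outputMode("append")
        # Assumption: availableNow replaces once=True from the original cell.
        .trigger(availableNow=True)
        .option("checkpointLocation", paths["checkpoint"])
        .option("path", paths["data"])
        .start()
    )
    query.awaitTermination()
    return query
```

Inside the notebook the call would be `write_bronze(df, src_value)`; the function is only defined here, so nothing runs until a streaming DataFrame is passed in.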

In [0]:

df.writeStream.format("delta") \
    .outputMode("append") \
    .trigger(once=True) \
    .option("checkpointLocation", f"/Volumes/workspace/bronze/bronzevolume/{src_value}/checkpoint") \
    .option("path", f"/Volumes/workspace/bronze/bronzevolume/{src_value}/data") \
    .start() \
    .awaitTermination()