# 🥉 Bronze Layer: Raw Data Ingestion

## 1. Overview
The **Bronze Layer** acts as the landing zone for all raw data. The primary goal is to capture the source data in its original state while adding basic metadata to track its entry into the Lakehouse.



## 2. Ingestion Configuration Logic
Instead of hard-coding the load logic for every table, the notebook uses a **metadata-driven** approach: the `ingestion_config` list holds one entry per source file, and each entry has two keys:

* **`path`**: The full path of the source CSV file inside the **Unity Catalog Volume** (`/Volumes/sales/bronz_layer/raw_data/`).
* **`table`**: The destination table name inside the `sales.bronz_layer` schema.
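
As a minimal sketch of how one entry could be checked and resolved to its destination, the snippet below validates the two keys and builds the fully qualified table name. The names `TARGET_SCHEMA`, `REQUIRED_KEYS`, and `resolve_entry` are hypothetical helpers introduced for illustration; they are not part of the notebook.

```python
# Hypothetical helper names; only the schema string comes from the notebook.
TARGET_SCHEMA = "sales.bronz_layer"
REQUIRED_KEYS = ("path", "table")


def resolve_entry(entry: dict) -> tuple[str, str]:
    """Return (source_path, fully_qualified_table) for one config entry."""
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise KeyError(f"config entry {entry!r} is missing keys: {missing}")
    return entry["path"], f"{TARGET_SCHEMA}.{entry['table']}"


example_entry = {
    "path": "/Volumes/sales/bronz_layer/raw_data/Sales.csv",
    "table": "Sales",
}
source_path, target_table = resolve_entry(example_entry)
print(source_path, "->", target_table)
```

Checking the keys up front is one way to surface a malformed entry before any file is read.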

### 📋 Dataset Configuration
The following table maps each source file to its target table:

| Source File Path | Target Table Name |
| :--- | :--- |
| `.../Sales.csv` | **Sales** |
| `.../Products.csv` | **Products** |
| `.../Customers.csv` | **Customers** |
| `.../Stores Locations.csv` | **Stores_Locations** |
| `.../Regions.csv` | **Regions** |
| `.../Sales Teams.csv` | **Sales_Teams** |
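
The target names in this table follow the file names with spaces replaced by underscores. A small sketch of deriving the entries from file names instead of typing each one could look like this; `table_name_from_path` and `build_config` are hypothetical helpers, and the notebook itself lists the entries by hand.

```python
from pathlib import PurePosixPath


def table_name_from_path(path: str) -> str:
    """Derive a table name from a file path: drop the extension, replace spaces."""
    return PurePosixPath(path).stem.replace(" ", "_")


def build_config(base_dir: str, file_names: list[str]) -> list[dict]:
    """Build config entries with the same 'path' and 'table' keys as the notebook."""
    return [
        {"path": f"{base_dir}/{name}", "table": table_name_from_path(name)}
        for name in file_names
    ]


config = build_config(
    "/Volumes/sales/bronz_layer/raw_data",
    ["Sales.csv", "Stores Locations.csv", "Sales Teams.csv"],
)
for entry in config:
    print(entry["path"], "->", entry["table"])
```

This only works while every table name can be derived from its file name; an explicit list stays the safer choice when names diverge.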

---

## 3. Key Ingestion Steps
For each entry in `ingestion_config`, the ingestion script performs the following steps:

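1. **Read Source**: Loads the CSV file from its Volume path into a pandas DataFrame, using `latin1` encoding because the Products file contains special characters.
2. **Column Cleaning**: Replaces spaces in column headers with underscores (`_`), because **Delta Lake** rejects column names that contain spaces by default.
3. **Metadata Tagging**: Adds an `ingestion_data` column containing the timestamp at which the file was ingested.
4. **Delta Write**: Converts the data to a Spark DataFrame and saves it to the Catalog with `.saveAsTable()` in `overwrite` mode, so each run replaces the previous contents of the table.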
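
The column-cleaning step can be sketched as a standalone function. The notebook only replaces spaces; the version below also replaces the other characters that Delta Lake is assumed to reject when column mapping is not enabled (` ,;{}()\n\t=`), and it checks for name collisions after cleaning. Both the character set and the helper names `clean_column_name` and `clean_columns` are assumptions for illustration.

```python
import re

# Assumption: characters rejected in Delta column names without column mapping.
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def clean_column_name(name: str) -> str:
    """Replace each disallowed character with an underscore."""
    return _INVALID_CHARS.sub("_", name)


def clean_columns(names) -> list[str]:
    """Clean all names and fail loudly if two of them collapse to the same name."""
    cleaned = [clean_column_name(str(name)) for name in names]
    duplicates = sorted({name for name in cleaned if cleaned.count(name) > 1})
    if duplicates:
        raise ValueError(f"column names collide after cleaning: {duplicates}")
    return cleaned


print(clean_columns(["Order Date", "Unit Price", "Store ID"]))
```

With a pandas DataFrame `df`, the result could be assigned back with `df.columns = clean_columns(df.columns)`.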


In [0]:
ingestion_config = [
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Sales.csv",
    "table": "Sales"
  },
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Products.csv",
    "table": "Products"
  },
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Customers.csv",
    "table": "Customers"
  },
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Stores Locations.csv",
    "table": "Stores_Locations"
  },
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Regions.csv",
    "table": "Regions"
  },
  {
    "path": "/Volumes/sales/bronz_layer/raw_data/Sales Teams.csv",
    "table": "Sales_Teams"
  }
]


In [0]:
import datetime

import pandas as pd

for item in ingestion_config:
  try:
    print(f"trying to ingest : {item['table']}")
    # Read with latin1 encoding: the Products file contains some special characters
    df = pd.read_csv(item['path'], encoding='latin1')
    # Replace spaces in column names with '_'
    df.columns = [c.replace(' ', '_') for c in df.columns]
    # Tag every row with the ingestion timestamp
    df['ingestion_data'] = datetime.datetime.now()
    # Convert the pandas DataFrame to a Spark DataFrame
    # (`spark` is the session object already available in the notebook)
    spark_df = spark.createDataFrame(df)
    table_path = f"sales.bronz_layer.{item['table']}"
    # Save the Spark DataFrame as a Delta table, replacing any existing data
    spark_df.write.mode("overwrite").format("delta").saveAsTable(table_path)
    print(f"sucess {item['table']} is now in catalog")
  except FileNotFoundError:
    print(f" Error: Could not find the file for {item['table']}.")
  except Exception as e:
    print(f" An error occurred: {e}")

trying to ingest : Sales
sucess Sales is now in catalog
trying to ingest : Products
sucess Products is now in catalog
trying to ingest : Customers
sucess Customers is now in catalog
trying to ingest : Stores_Locations
sucess Stores_Locations is now in catalog
trying to ingest : Regions
sucess Regions is now in catalog
trying to ingest : Sales_Teams
sucess Sales_Teams is now in catalog
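
The loop above reports each result with `print`, so a failed table shows up only in the cell output. One possible variation, sketched here under assumptions, collects the outcome of every table into a summary that later cells could inspect. `run_ingestion` and `fake_ingest` are hypothetical names; `fake_ingest` stands in for the real read-and-write logic so the sketch runs without Spark.

```python
from typing import Callable


def run_ingestion(config: list[dict], ingest_one: Callable[[dict], None]) -> dict:
    """Call ingest_one for every entry and record which tables succeeded or failed."""
    summary = {"succeeded": [], "failed": {}}
    for item in config:
        try:
            ingest_one(item)
            summary["succeeded"].append(item["table"])
        except Exception as exc:
            # Keep going, but remember the error type and message per table.
            summary["failed"][item["table"]] = f"{type(exc).__name__}: {exc}"
    return summary


def fake_ingest(item: dict) -> None:
    """Stand-in for the real ingestion step; fails for one made-up table."""
    if item["table"] == "Missing":
        raise FileNotFoundError(item["path"])


demo_config = [
    {"path": "/tmp/Sales.csv", "table": "Sales"},
    {"path": "/tmp/Missing.csv", "table": "Missing"},
]
summary = run_ingestion(demo_config, fake_ingest)
print(summary)
```

A non-empty `summary["failed"]` could then be used to stop the notebook or raise an alert instead of relying on someone reading the log.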
