# 🥉 Bronze Layer — Raw Ingestion

**Annie's Magic Numbers Medallion Architecture**

This notebook reads the raw CSV files from Azure Data Lake Storage Gen2 and writes them to the Bronze layer as Delta tables.

### 🔐 CELL 0 — ADLS Gen2 Authentication (Storage Account Key)

In [None]:
# ============================================================
# CELL 0 — Azure Data Lake Gen2 Authentication
# ============================================================
# Performance settings for 8 cores
spark.conf.set('spark.sql.shuffle.partitions', '8')
spark.conf.set('spark.databricks.delta.optimizeWrite.enabled', 'true')
spark.conf.set('spark.databricks.delta.autoOptimize.optimizeWrite', 'true')

spark.conf.set(
    "fs.azure.account.key.anniedatalake123.dfs.core.windows.net",
    "<REDACTED_AZURE_STORAGE_KEY>"
)
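
Hard-coding the storage account key in a cell exposes it to anyone who can read the notebook or its revision history. The following is a minimal sketch of the same configuration with the key read from a Databricks secret scope. The scope name `annie-kv` and the key name `adls-account-key` in the usage comment are hypothetical, and `spark` and `dbutils` are passed in as arguments only to keep the helper self-contained.

```python
def configure_adls_key(spark, dbutils, storage_account, scope, key_name):
    """Set the ADLS Gen2 account key for a storage account from a secret scope."""
    # dbutils.secrets.get returns the secret value as a string
    account_key = dbutils.secrets.get(scope=scope, key=key_name)
    conf_name = f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"
    spark.conf.set(conf_name, account_key)
    return conf_name


# Hypothetical usage inside the notebook (scope and key names are assumptions):
# configure_adls_key(spark, dbutils, "anniedatalake123", "annie-kv", "adls-account-key")
```

The secret scope has to exist before this runs; the notebook itself does not say how credentials are managed, so treat this as one possible arrangement.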

### 🟦 CELL 1 — Azure Data Lake Base Paths

In [None]:
# ============================================================
# CELL 1 — Azure Data Lake Gen2 Base Paths
# ============================================================
container_name = "annie-data"
storage_account = "anniedatalake123"

base_path = f"abfss://{container_name}@{storage_account}.dfs.core.windows.net/"
raw_path = base_path + "raw/"
bronze_path = base_path + "bronze/"

### 🟦 CELL 2 — Validate RAW Zone Accessibility

In [None]:
# ============================================================
# CELL 2 — Validate RAW Zone Accessibility
# ============================================================
dbutils.fs.ls(raw_path)
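
Listing the folder confirms that the path is reachable, but not that every file loaded later is present. A small sketch of that extra check is shown here, using the six file names that CELL 4 reads. The helper name `find_missing_files` is an assumption; it relies only on the `name` attribute of the entries returned by `dbutils.fs.ls`.

```python
EXPECTED_RAW_FILES = [
    "SalesFINAL12312016.csv",
    "PurchasesFINAL12312016.csv",
    "2017PurchasePricesDec.csv",
    "BegInvFINAL12312016.csv",
    "EndInvFINAL12312016.csv",
    "InvoicePurchases12312016.csv",
]


def find_missing_files(file_infos, expected_names):
    """Return the expected file names that are absent from a directory listing."""
    present = {info.name for info in file_infos}
    return [name for name in expected_names if name not in present]


# Hypothetical usage inside the notebook:
# missing = find_missing_files(dbutils.fs.ls(raw_path), EXPECTED_RAW_FILES)
# if missing:
#     raise FileNotFoundError(f"Missing RAW files: {missing}")
```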

### 🟦 CELL 3 — Generic CSV Reader Function

In [None]:
# ============================================================
# CELL 3 — Generic CSV Reader Function
# ============================================================
def read_csv(filename):
    return (
        spark.read
             .option("header", True)
             .option("inferSchema", True)
             .csv(raw_path + filename)
    )
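
`inferSchema` makes Spark scan the data an extra time to guess column types, and the guesses can change if the files change. A possible variant that takes an explicit schema as a DDL string is sketched below. The function name and the column names in the usage comment are hypothetical, since the notebook does not list the CSV columns.

```python
def read_csv_with_schema(spark, path, ddl_schema):
    """Read a headered CSV file using an explicit DDL schema instead of inference."""
    return (
        spark.read
             .option("header", True)
             .schema(ddl_schema)   # DataFrameReader.schema accepts a DDL-formatted string
             .csv(path)
    )


# Hypothetical usage (column names and types are placeholders, not taken from the data):
# sales_df = read_csv_with_schema(
#     spark,
#     raw_path + "SalesFINAL12312016.csv",
#     "col_a STRING, col_b INT, col_c DOUBLE",
# )
```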

### 🟦 CELL 4 — Load RAW CSV Files into DataFrames

In [None]:
# ============================================================
# CELL 4 — Load RAW CSV Files into DataFrames
# ============================================================
sales_df = read_csv("SalesFINAL12312016.csv")
purchases_df = read_csv("PurchasesFINAL12312016.csv")
prices_df = read_csv("2017PurchasePricesDec.csv")
begin_inventory_df = read_csv("BegInvFINAL12312016.csv")
end_inventory_df = read_csv("EndInvFINAL12312016.csv")
invoices_df = read_csv("InvoicePurchases12312016.csv")

### 🟦 CELL 5 — Data Inspection & Schema Validation

In [None]:
# ============================================================
# CELL 5 — Data Inspection & Schema Validation
# ============================================================
display(sales_df)
display(purchases_df)
display(prices_df)
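
`display` shows rows, but the schema check named in this cell's title also needs the inferred column types. One way to collect them is sketched below, with a hypothetical helper `summarize_frames` that uses only `DataFrame.count()` and `DataFrame.dtypes`.

```python
def summarize_frames(frames):
    """Return {name: (row_count, [(column, dtype), ...])} for each DataFrame."""
    summary = {}
    for name, df in frames.items():
        # df.dtypes is a list of (column_name, type_string) pairs
        summary[name] = (df.count(), list(df.dtypes))
    return summary


# Hypothetical usage inside the notebook:
# for name, (rows, columns) in summarize_frames(
#     {"sales": sales_df, "purchases": purchases_df, "prices": prices_df}
# ).items():
#     print(name, rows, columns)
```

Note that `count()` triggers a full read of each file, so this is best run once rather than on every execution.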

### 🟦 CELL 6 — Bronze Delta Writer Function

In [None]:
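# ============================================================
# CELL 6 — Bronze Delta Writer Function
# ============================================================
def write_bronze(df, table_name):
    """Overwrite the Delta table at bronze_path + table_name with df."""
    target_path = bronze_path + table_name
    df.write.format('delta').mode('overwrite').save(target_path)
    print(f'   ✅ bronze.{table_name} saved')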


### 🟦 CELL 7 — Persist DataFrames to Bronze Layer

In [None]:
# ============================================================
# CELL 7 — Persist DataFrames to Bronze Layer
# ============================================================
tables = [
    ('sales', sales_df),
    ('purchases', purchases_df),
    ('prices', prices_df),
    ('begin_inventory', begin_inventory_df),
    ('end_inventory', end_inventory_df),
    ('invoices', invoices_df)
]

for name, df in tables:
    print(f"Processing bronze.{name}...")
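    # Repartition to 8 partitions (one per core) and cache before writing
    optimized_df = df.repartition(8).cache()
    optimized_df.count()  # action that materializes the cache
    write_bronze(optimized_df, name)
    optimized_df.unpersist()  # release the cached data before the next table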


### 🟦 CELL 8 — Validate Bronze Folder Structure

In [None]:
# ============================================================
# CELL 8 — Validate Bronze Folder Structure
# ============================================================
dbutils.fs.ls(bronze_path)

### 🟦 CELL 9 — Validate Delta Read from Bronze

In [None]:
# ============================================================
# CELL 9 — Validate Delta Read from Bronze
# ============================================================
sales_bronze_df = (
    spark.read
         .format("delta")
         .load(bronze_path + "sales")
)

display(sales_bronze_df)
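
Displaying one table shows that Delta reads work, but it does not confirm that every table was written completely. A minimal sketch of a row-count comparison between the source DataFrames and their Bronze copies follows. The helper `compare_row_counts` is hypothetical and takes `spark`, the `tables` list from CELL 7, and `bronze_path` as arguments.

```python
def compare_row_counts(spark, tables, bronze_path):
    """Return {name: (source_count, bronze_count)} for tables whose counts differ."""
    mismatches = {}
    for name, source_df in tables:
        bronze_df = spark.read.format("delta").load(bronze_path + name)
        source_count = source_df.count()
        bronze_count = bronze_df.count()
        if source_count != bronze_count:
            mismatches[name] = (source_count, bronze_count)
    return mismatches


# Hypothetical usage inside the notebook:
# mismatches = compare_row_counts(spark, tables, bronze_path)
# assert not mismatches, f"Row count mismatch: {mismatches}"
```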