# Iceberg Transformation: Raw -> Silver
This notebook generically transforms the Iceberg tables in the **raw** zone into **silver** tables (dimensions and facts).

## Steps:
1. Loop over the raw -> silver mapping.
2. Detect the columns with `DESCRIBE TABLE`.
3. Create the silver table if it does not exist (with automatic partitioning if a date column exists).
4. Insert the data if the table already exists.
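
Step 3 mentions automatic partitioning when a date column exists. A minimal sketch of how that choice could be derived from the `(col_name, data_type)` pairs returned by `DESCRIBE TABLE` follows. The helper names `pick_partition_column` and `build_create_sql`, the rule of taking the first date-like column, the use of Iceberg's `days()` transform, and the example column names are all assumptions made for illustration, not necessarily what this notebook does.

```python
# Hypothetical helpers: the names and the "first date-like column" rule are assumptions.
DATE_TYPES = ("date", "timestamp", "timestamp_ntz")


def pick_partition_column(schema):
    """Return the first column whose type is date-like, or None.

    `schema` is a list of (col_name, data_type) pairs, as found in the
    first two fields of each DESCRIBE TABLE row.
    """
    for col_name, data_type in schema:
        if data_type.lower() in DATE_TYPES:
            return col_name
    return None


def build_create_sql(raw_table, silver_table, schema):
    """Build a CREATE TABLE ... AS SELECT statement, partitioned by day when possible."""
    select_cols = ", ".join(name for name, _ in schema)
    part_col = pick_partition_column(schema)
    partition_clause = f" PARTITIONED BY (days({part_col}))" if part_col else ""
    return (
        f"CREATE TABLE IF NOT EXISTS {silver_table} USING iceberg"
        f"{partition_clause} AS SELECT {select_cols} FROM {raw_table}"
    )


# Illustrative schema only; these column names are not taken from the real tables.
example_schema = [("trip_id", "string"), ("dispatch_date", "date"), ("miles", "double")]
print(build_create_sql("lakehouse.raw.trips", "lakehouse.silver.fact_trips", example_schema))
```

The generated statement would be passed to `spark.sql(...)`. Tables without a date-like column simply get no `PARTITIONED BY` clause.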


In [None]:
from pyspark.sql import SparkSession

# -----------------------------------------------------
# 1. Start the Spark session
# -----------------------------------------------------

spark = SparkSession.builder.appName("Iceberg Raw to Warehouse Data Loader").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")
print("✅ Spark session initialized with Iceberg and MinIO.")

# Create the target namespace; the statement runs eagerly and returns no rows to display
spark.sql("CREATE NAMESPACE IF NOT EXISTS lakehouse.silver")
print("Iceberg namespace ready: silver")
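
The session builder above sets no catalog options itself, so the Iceberg and MinIO settings are presumably supplied by the environment (for example through `spark-defaults.conf`). As a rough sketch of what such settings might look like if they were set in code, the following builds them as a plain dictionary. The REST catalog type, the catalog URI, the warehouse path, the MinIO endpoint, and the helper name `iceberg_minio_conf` are assumptions and would have to match the actual deployment. Credentials are deliberately left out.

```python
def iceberg_minio_conf(catalog="lakehouse",
                       warehouse="s3://warehouse/",
                       endpoint="http://minio:9000",
                       catalog_uri="http://rest:8181"):
    """Return Spark options for an Iceberg catalog stored on an S3-compatible endpoint.

    All default values are placeholders (assumptions), not the notebook's real settings.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": catalog_uri,
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.s3.endpoint": endpoint,
        # MinIO is usually addressed with path-style URLs rather than virtual hosts
        f"{prefix}.s3.path-style-access": "true",
    }


conf = iceberg_minio_conf()
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each pair could then be applied with `builder.config(key, value)` before calling `getOrCreate()`.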

In [None]:
# -----------------------------------------------------
# 2. Mapping from raw tables to Silver tables
# -----------------------------------------------------

file_table_map = {
    "lakehouse.raw.drivers": "lakehouse.silver.dim_drivers",
    "lakehouse.raw.trucks": "lakehouse.silver.dim_trucks",
    "lakehouse.raw.trailers": "lakehouse.silver.dim_trailers",
    "lakehouse.raw.customers": "lakehouse.silver.dim_customers",
    "lakehouse.raw.facilities": "lakehouse.silver.dim_facilities",
    "lakehouse.raw.routes": "lakehouse.silver.dim_routes",
    "lakehouse.raw.loads": "lakehouse.silver.fact_loads",
    "lakehouse.raw.trips": "lakehouse.silver.fact_trips",
    "lakehouse.raw.fuel_purchases": "lakehouse.silver.fact_fuel_purchases",
    "lakehouse.raw.maintenance_records": "lakehouse.silver.fact_maintenance_records",
    "lakehouse.raw.delivery_events": "lakehouse.silver.fact_delivery_events",
    "lakehouse.raw.safety_incidents": "lakehouse.silver.fact_safety_incidents",
    "lakehouse.raw.driver_monthly_metrics": "lakehouse.silver.agg_driver_monthly_metrics",
    "lakehouse.raw.truck_utilization_metrics": "lakehouse.silver.agg_truck_utilization_metrics"
}


In [None]:
# -----------------------------------------------------
# 3. Create and load the Silver tables
# -----------------------------------------------------
from pyspark.sql.utils import AnalysisException


def table_exists(name):
    """Return True if the catalog can resolve the table."""
    try:
        spark.table(name)
        return True
    except AnalysisException:
        return False


for raw_table, silver_table in file_table_map.items():
    print(f"➡️ Processing: {raw_table} -> {silver_table}")

    # Check that the raw table exists and retrieve its schema
    try:
        schema_info = spark.sql(f"DESCRIBE TABLE {raw_table}").collect()
    except AnalysisException:
        print(f"⚠️ Table {raw_table} not found, skipping.")
        continue

    # Column rows come first; stop at the first blank or '#'-prefixed row,
    # which marks the start of the partitioning section.
    columns = []
    for row in schema_info:
        if not row.col_name or row.col_name.startswith("#"):
            break
        columns.append(row.col_name)

    select_cols = ", ".join(columns)

    if not table_exists(silver_table):
        # First run: CREATE TABLE ... AS SELECT already loads the data
        spark.sql(f"CREATE TABLE {silver_table} AS SELECT {select_cols} FROM {raw_table}")
        print(f"✅ Table {silver_table} created and loaded.")
    else:
        # Insert only when the table already exists, so a first run does not load the rows twice
        spark.sql(f"INSERT INTO {silver_table} SELECT {select_cols} FROM {raw_table}")
        print(f"✅ Data inserted into {silver_table}.")

print("🎯 All operations have been executed.")
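
Because the loop appends with `INSERT INTO`, running the notebook again adds the same raw rows a second time. One possible way to make reruns repeatable is a full refresh with `INSERT OVERWRITE`, sketched below as a small statement builder. The helper name `build_load_sql`, its `mode` parameter, and the example column names are hypothetical and only illustrate the idea. Whether a full refresh is appropriate depends on how the raw zone is fed.

```python
def build_load_sql(raw_table, silver_table, columns, mode="append"):
    """Build the load statement for one raw -> silver pair.

    mode="append"    keeps existing rows and adds the raw rows (INSERT INTO).
    mode="overwrite" replaces the table contents with the raw rows (INSERT OVERWRITE).
    """
    if mode not in ("append", "overwrite"):
        raise ValueError(f"Unsupported mode: {mode!r}")
    verb = "INSERT INTO" if mode == "append" else "INSERT OVERWRITE"
    select_cols = ", ".join(columns)
    return f"{verb} {silver_table} SELECT {select_cols} FROM {raw_table}"


# Illustrative column names only; they are not taken from the real tables.
print(build_load_sql(
    "lakehouse.raw.drivers",
    "lakehouse.silver.dim_drivers",
    ["driver_id", "first_name"],
    mode="overwrite",
))
```

The returned string would be passed to `spark.sql(...)` in place of the fixed `INSERT INTO` statement.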

In [None]:
# Preview of the data in lakehouse.silver.dim_drivers
spark.sql("SELECT * FROM lakehouse.silver.dim_drivers LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.dim_trucks
spark.sql("SELECT * FROM lakehouse.silver.dim_trucks LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.dim_trailers
spark.sql("SELECT * FROM lakehouse.silver.dim_trailers LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.dim_customers
spark.sql("SELECT * FROM lakehouse.silver.dim_customers LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.dim_facilities
spark.sql("SELECT * FROM lakehouse.silver.dim_facilities LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.dim_routes
spark.sql("SELECT * FROM lakehouse.silver.dim_routes LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_loads
spark.sql("SELECT * FROM lakehouse.silver.fact_loads LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_trips
spark.sql("SELECT * FROM lakehouse.silver.fact_trips LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_fuel_purchases
spark.sql("SELECT * FROM lakehouse.silver.fact_fuel_purchases LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_maintenance_records
spark.sql("SELECT * FROM lakehouse.silver.fact_maintenance_records LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_delivery_events
spark.sql("SELECT * FROM lakehouse.silver.fact_delivery_events LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.fact_safety_incidents
spark.sql("SELECT * FROM lakehouse.silver.fact_safety_incidents LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.agg_driver_monthly_metrics
spark.sql("SELECT * FROM lakehouse.silver.agg_driver_monthly_metrics LIMIT 10").show()

In [None]:
# Preview of the data in lakehouse.silver.agg_truck_utilization_metrics
spark.sql("SELECT * FROM lakehouse.silver.agg_truck_utilization_metrics LIMIT 10").show()
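
The preview cells above all follow the same pattern, so they could also be generated from the mapping itself. A minimal sketch, assuming a hypothetical helper `preview_queries` and using a two-entry excerpt of the mapping so the block runs on its own:

```python
def preview_queries(table_map, limit=10):
    """Return one preview query per silver table, in mapping order."""
    return [f"SELECT * FROM {silver} LIMIT {limit}" for silver in table_map.values()]


# Excerpt of the raw -> silver mapping; the full file_table_map would be used in the notebook
sample_map = {
    "lakehouse.raw.drivers": "lakehouse.silver.dim_drivers",
    "lakehouse.raw.trucks": "lakehouse.silver.dim_trucks",
}

for query in preview_queries(sample_map):
    print(query)
    # In the notebook, each query would be run with: spark.sql(query).show()
```

This keeps the previews in sync with the mapping when tables are added or renamed.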