# 🔄 Bronze to Silver - Risk, Compliance & Audit

This notebook transforms the raw (bronze) data into cleaned (silver) Delta tables in the Lakehouse.

## Steps:
1. Load the CSV files from Files/bronze/
2. Apply the transformations (date typing, cleanup)
3. Write the Delta tables to the Lakehouse

In [None]:
# Cell 1: Load Bronze data
print("📥 Loading bronze data...")

controls = spark.read.csv("Files/bronze/controls.csv", header=True, inferSchema=True)
executions = spark.read.csv("Files/bronze/control_executions.csv", header=True, inferSchema=True)
incidents = spark.read.csv("Files/bronze/incidents.csv", header=True, inferSchema=True)
remediation = spark.read.csv("Files/bronze/remediation_actions.csv", header=True, inferSchema=True)
vendors = spark.read.csv("Files/bronze/vendors.csv", header=True, inferSchema=True)

print(f"✅ Controls: {controls.count()} rows")
print(f"✅ Executions: {executions.count()} rows")
print(f"✅ Incidents: {incidents.count()} rows")
print(f"✅ Remediation: {remediation.count()} rows")
print(f"✅ Vendors: {vendors.count()} rows")
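
The five reads above share the same options, so they can also be written as a loop over file names. The following is a minimal sketch, assuming the `spark` session provided by the notebook runtime; `BRONZE_DIR`, `BRONZE_FILES` and `load_bronze` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
# Hypothetical helper: load every bronze CSV with the same reader options.
BRONZE_DIR = "Files/bronze"

# File stems as used in Cell 1.
BRONZE_FILES = [
    "controls",
    "control_executions",
    "incidents",
    "remediation_actions",
    "vendors",
]


def load_bronze(spark, names=BRONZE_FILES, base_dir=BRONZE_DIR):
    """Return a dict mapping each file stem to the DataFrame read from its CSV."""
    frames = {}
    for name in names:
        path = f"{base_dir}/{name}.csv"
        # Same options as Cell 1: first row is the header, types are inferred.
        frames[name] = spark.read.csv(path, header=True, inferSchema=True)
    return frames
```

In the notebook this would be called as `bronze = load_bronze(spark)`, after which `bronze["vendors"]` plays the role of `vendors`.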

In [None]:
# Cell 2: Transform to Silver (typing, cleanup)
from pyspark.sql.functions import col, to_date

print("🔧 Transforming data...")

# Controls - already clean
controls_silver = controls

# Control Executions - cast date column
executions_silver = executions \
    .withColumn("execution_date", to_date(col("execution_date")))

# Incidents - cast date column
incidents_silver = incidents \
    .withColumn("detection_date", to_date(col("detection_date")))

# Remediation Actions - cast multiple date columns
remediation_silver = remediation \
    .withColumn("start_date", to_date(col("start_date"))) \
    .withColumn("target_completion_date", to_date(col("target_completion_date"))) \
    .withColumn("completion_date", to_date(col("completion_date")))

# Vendors - cast date column and risk_score
vendors_silver = vendors \
    .withColumn("last_audit_date", to_date(col("last_audit_date"))) \
    .withColumn("risk_score", col("risk_score").cast("float"))

print("✅ Transformations applied!")
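
The repeated `withColumn(..., to_date(...))` calls can also be generated from a list of column names. The sketch below builds SQL expression strings intended for `DataFrame.selectExpr`; it assumes the date values are in a format that `to_date` parses without an explicit pattern, and `date_cast_exprs` is a hypothetical helper, not something the original notebook defines.

```python
def date_cast_exprs(all_columns, date_columns):
    """Build selectExpr strings that cast the listed columns to DATE.

    Columns not listed in date_columns pass through unchanged,
    and the original column order is preserved.
    """
    missing = set(date_columns) - set(all_columns)
    if missing:
        raise ValueError(f"Unknown columns: {sorted(missing)}")

    exprs = []
    for name in all_columns:
        if name in date_columns:
            # Backticks keep names with unusual characters valid in SQL.
            exprs.append(f"to_date(`{name}`) AS `{name}`")
        else:
            exprs.append(f"`{name}`")
    return exprs
```

For example, `remediation.selectExpr(*date_cast_exprs(remediation.columns, ["start_date", "target_completion_date", "completion_date"]))` should produce the same columns as `remediation_silver` above.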

In [None]:
# Cell 3: Write to Delta Tables (Silver)
print("💾 Writing Delta tables...")

controls_silver.write.mode("overwrite").format("delta").saveAsTable("controls")
print("✅ Table 'controls' created")

executions_silver.write.mode("overwrite").format("delta").saveAsTable("control_executions")
print("✅ Table 'control_executions' created")

incidents_silver.write.mode("overwrite").format("delta").saveAsTable("incidents")
print("✅ Table 'incidents' created")

remediation_silver.write.mode("overwrite").format("delta").saveAsTable("remediation_actions")
print("✅ Table 'remediation_actions' created")

vendors_silver.write.mode("overwrite").format("delta").saveAsTable("vendors")
print("✅ Table 'vendors' created")

print("\n🎉 Silver tables created successfully!")
print("📊 Check the tables in the 'Tables' section of the Lakehouse")
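
The five writes follow one pattern, so they can be driven from a mapping of table name to DataFrame. This is a sketch, assuming the `*_silver` DataFrames from Cell 2 exist; `write_silver` is a hypothetical helper. It also sets Delta's `overwriteSchema` option, which the original cell does not use, so that a rerun after a column type change can replace the table schema. Pass `overwrite_schema=False` if a schema change should fail instead.

```python
def write_silver(tables, overwrite_schema=True):
    """Overwrite one Delta table per (name, DataFrame) pair; return the names written."""
    written = []
    for name, df in tables.items():
        writer = df.write.mode("overwrite").format("delta")
        if overwrite_schema:
            # Delta option: allow the overwrite to replace the table schema too.
            writer = writer.option("overwriteSchema", "true")
        writer.saveAsTable(name)
        written.append(name)
    return written
```

A call such as `write_silver({"controls": controls_silver, "vendors": vendors_silver})` would write the two tables in the order given.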

In [None]:
# Cell 4: Validation - Preview of the created tables
print("📊 Preview of the created tables:\n")

print("=== CONTROLS ===")
spark.sql("SELECT * FROM controls LIMIT 5").show()

print("\n=== CONTROL EXECUTIONS ===")
spark.sql("SELECT * FROM control_executions LIMIT 5").show()

print("\n=== INCIDENTS ===")
spark.sql("SELECT * FROM incidents LIMIT 5").show()

print("\n=== REMEDIATION ACTIONS ===")
spark.sql("SELECT * FROM remediation_actions LIMIT 5").show()

print("\n=== VENDORS ===")
spark.sql("SELECT * FROM vendors LIMIT 5").show()
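
The `LIMIT 5` previews show the shape of the data but not whether the date casts succeeded: depending on the Spark configuration, `to_date` may return `NULL` for values it cannot parse rather than raising an error. The sketch below builds a per-column null count as a SQL string for `spark.sql`; `null_count_query` is a hypothetical helper and the output column names it generates are illustrative.

```python
def null_count_query(table, columns):
    """Return a SQL query counting total rows and NULLs for each listed column."""
    if not columns:
        raise ValueError("columns must not be empty")

    parts = ["COUNT(*) AS total_rows"]
    for name in columns:
        parts.append(
            f"SUM(CASE WHEN `{name}` IS NULL THEN 1 ELSE 0 END) AS `{name}_nulls`"
        )
    return f"SELECT {', '.join(parts)} FROM {table}"
```

In the notebook, `spark.sql(null_count_query("remediation_actions", ["start_date", "completion_date"])).show()` would print one row of counts. A non-zero count is not necessarily an error, since a column may have been empty in the source file.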