# 03 - Streaming Silver & Silver_ML

Transformation pipeline:
- **Bronze → Silver**: cleaning and enrichment
- **Bronze → Silver_ML**: feature engineering for machine learning

## Configuration

In [5]:
from pyspark.sql.functions import (
    col, from_unixtime, to_timestamp, round,
    lag, avg, stddev, row_number, when, sqrt, pow, lit, min as spark_min, broadcast
)
from pyspark.sql.window import Window
from config import get_s3_path, create_spark_session

BRONZE_PATH = get_s3_path("bronze", "flights")
SILVER_PATH = get_s3_path("silver", "flights")
SILVER_ML_PATH = get_s3_path("silver", "flights_ml")
CHECKPOINT_SILVER = get_s3_path("checkpoints", "silver_flights")
CHECKPOINT_SILVER_ML = get_s3_path("checkpoints", "silver_ml_flights")
AIRPORTS_CSV = "./data/airports.csv"

spark = create_spark_session("StreamingSilver")

print(f"✅ Input:     {BRONZE_PATH}")
print(f"✅ Silver:    {SILVER_PATH}")
print(f"✅ Silver_ML: {SILVER_ML_PATH}")

✅ Spark Session 'StreamingSilver' configured
✅ Input:     s3a://datalake/bronze/flights
✅ Silver:    s3a://datalake/silver/flights
✅ Silver_ML: s3a://datalake/silver/flights_ml


## Loading airport data (for Silver_ML)

In [6]:
# Keep only large and medium airports. Filtering before select() keeps the
# `type` column available; the result is cached because every Silver_ML
# micro-batch broadcasts this small reference table.
df_airports = spark.read.option("header", "true").csv(AIRPORTS_CSV) \
    .filter(col("type").isin("large_airport", "medium_airport")) \
    .select(
        col("ident").alias("airport_icao"),
        col("name").alias("airport_name"),
        col("iso_country").alias("airport_country"),
        col("latitude_deg").cast("double").alias("airport_lat"),
        col("longitude_deg").cast("double").alias("airport_lon")
    ) \
    .cache()

print(f"✅ {df_airports.count()} airports loaded")

✅ 5211 airports loaded
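
The same filter-then-project step can be mirrored in plain Python, which is handy for checking the logic without a Spark session. This is a minimal sketch assuming the column names used above (`ident`, `type`, `name`, `iso_country`, `latitude_deg`, `longitude_deg`); the helper `load_airports` and the sample rows (approximate coordinates, one made-up heliport) are hypothetical and for illustration only.

```python
import csv
import io

# Airport types kept for nearest-airport matching (same as the Spark filter)
KEPT_TYPES = {"large_airport", "medium_airport"}


def load_airports(csv_text):
    """Filter on `type` first, then rename columns and cast coordinates to float."""
    airports = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["type"] not in KEPT_TYPES:
            continue
        airports.append({
            "airport_icao": row["ident"],
            "airport_name": row["name"],
            "airport_country": row["iso_country"],
            "airport_lat": float(row["latitude_deg"]),
            "airport_lon": float(row["longitude_deg"]),
        })
    return airports


# Hypothetical sample rows (approximate coordinates, illustration only)
SAMPLE_CSV = """ident,type,name,latitude_deg,longitude_deg,iso_country
LFPG,large_airport,Paris Charles de Gaulle Airport,49.0128,2.55,FR
LFPB,medium_airport,Paris-Le Bourget Airport,48.9694,2.4414,FR
FR-0001,heliport,Example Heliport,48.8,2.3,FR
"""

airports = load_airports(SAMPLE_CSV)
print([a["airport_icao"] for a in airports])  # ['LFPG', 'LFPB']
```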


## Stream 1: Bronze → Silver

In [7]:
df_bronze_stream = spark.readStream.format("delta").load(BRONZE_PATH)

df_silver = df_bronze_stream \
    .filter(col("icao24").isNotNull()) \
    .filter(col("latitude").isNotNull() & col("longitude").isNotNull()) \
    .withColumn("event_timestamp", to_timestamp(from_unixtime(col("time")))) \
    .withColumn("velocity_kmh", round(col("velocity") * 3.6, 2)) \
    .withColumn("altitude_meters", col("baro_altitude")) \
    .select(
        "event_timestamp", "icao24", "callsign", "origin_country",
        "longitude", "latitude", "velocity_kmh", "altitude_meters",
        "on_ground", "category"
    )

print(f"üöÄ Stream 1: Bronze ‚Üí Silver")

query_silver = df_silver.writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", CHECKPOINT_SILVER) \
    .option("mergeSchema", "true") \
    .start(SILVER_PATH)

🚀 Stream 1: Bronze → Silver


26/01/23 15:38:16 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
26/01/23 15:38:16 WARN StreamingQueryManager: Stopping existing streaming query [id=03cb7525-313b-4658-807f-c1584735ae56, runId=1af062d8-10ee-4697-b766-c1f13030c8bb], as a new run is being started.
26/01/23 15:38:16 ERROR TorrentBroadcast: Store broadcast broadcast_608 fail, remove all pieces of the broadcast
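
Stream 1 is a purely row-level transformation: drop records without an aircraft id or a position, turn the Unix `time` into a timestamp, and convert `velocity` from m/s to km/h. The sketch below mirrors it on a single record held in a dict; `to_silver_row` is a hypothetical helper, and it assumes UTC where Spark would use the session time zone.

```python
from datetime import datetime, timezone


def to_silver_row(bronze):
    """Mirror of the Stream 1 transformation for one Bronze record (a dict).

    Returns None when the record would be filtered out.
    """
    if bronze.get("icao24") is None:
        return None
    if bronze.get("latitude") is None or bronze.get("longitude") is None:
        return None
    velocity = bronze.get("velocity")
    return {
        # Spark's from_unixtime uses the session time zone; UTC is assumed here
        "event_timestamp": datetime.fromtimestamp(bronze["time"], tz=timezone.utc),
        "icao24": bronze["icao24"],
        "callsign": bronze.get("callsign"),
        "origin_country": bronze.get("origin_country"),
        "longitude": bronze["longitude"],
        "latitude": bronze["latitude"],
        # m/s -> km/h (x 3.6), 2 decimals; a missing speed stays None like a Spark NULL.
        # Python's round() rounds ties to even, Spark's round() rounds half up.
        "velocity_kmh": None if velocity is None else round(velocity * 3.6, 2),
        "altitude_meters": bronze.get("baro_altitude"),
        "on_ground": bronze.get("on_ground"),
        "category": bronze.get("category"),
    }


row = to_silver_row({"icao24": "abc123", "time": 1700000000, "latitude": 48.8,
                     "longitude": 2.3, "velocity": 250.0, "baro_altitude": 10000.0,
                     "on_ground": False})
print(row["velocity_kmh"])  # 900.0
```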


## Stream 2: Bronze → Silver_ML (Feature Engineering)

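Feature-engineering transformation for ML, reading directly from Bronze. Each micro-batch is processed with `foreachBatch`, because non-time-based window functions (`lag`, `row_number`, row-based rolling frames) are not supported directly on streaming DataFrames.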
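
The temporal features rely on a window partitioned by `icao24` and ordered by `event_timestamp`: `lag(..., 1)` returns the previous observation of the same aircraft (NULL for the first one) and `row_number()` numbers observations from 1. A pure-Python sketch of that logic for one micro-batch, where `add_temporal_features` is a hypothetical helper and the rows are plain dicts:

```python
from itertools import groupby


def add_temporal_features(rows):
    """Mirror of the lag()/row_number() features over one micro-batch.

    `rows` are dicts with icao24, event_timestamp, altitude_meters and velocity_kmh.
    """
    out = []
    # Equivalent of Window.partitionBy("icao24").orderBy("event_timestamp")
    ordered = sorted(rows, key=lambda r: (r["icao24"], r["event_timestamp"]))
    for _, group in groupby(ordered, key=lambda r: r["icao24"]):
        prev = None
        for rank, row in enumerate(group, start=1):
            feat = dict(row)
            # lag(..., 1) is NULL (None) for the first row of each aircraft
            feat["prev_altitude"] = prev["altitude_meters"] if prev is not None else None
            feat["prev_velocity"] = prev["velocity_kmh"] if prev is not None else None
            feat["altitude_change"] = (
                None if prev is None else row["altitude_meters"] - prev["altitude_meters"]
            )
            feat["velocity_change"] = (
                None if prev is None else row["velocity_kmh"] - prev["velocity_kmh"]
            )
            feat["observation_rank"] = rank
            out.append(feat)
            prev = row
    return out
```

Because `foreachBatch` only hands over the current micro-batch, the first observation of each aircraft in every batch gets `None` deltas and rank 1, even if that aircraft was seen in the previous batch.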

In [8]:
def process_ml_batch(batch_df, batch_id):
    """Process one Silver_ML micro-batch with feature engineering.

    The window features (lag, row_number, rolling stats) only see the rows of
    the current micro-batch, so they restart at every batch boundary.
    """

    if batch_df.isEmpty():
        return

    # Bronze → Silver format (same cleaning as Stream 1)
    df_base = batch_df \
        .filter(col("icao24").isNotNull()) \
        .filter(col("latitude").isNotNull() & col("longitude").isNotNull()) \
        .withColumn("event_timestamp", to_timestamp(from_unixtime(col("time")))) \
        .withColumn("velocity_kmh", round(col("velocity") * 3.6, 2)) \
        .withColumn("altitude_meters", col("baro_altitude"))

    # ML cleaning: plausible altitudes (m) and speeds (km/h) only.
    # between() on NULL is NULL, so rows with a missing altitude or speed are dropped too.
    # Persisted because this DataFrame is reused by several actions below.
    df_clean = df_base \
        .filter(col("altitude_meters").between(-500, 15000)) \
        .filter(col("velocity_kmh").between(0, 1200)) \
        .persist()

    try:
        if df_clean.isEmpty():
            return

        # Temporal features, per aircraft in time order
        window_aircraft = Window.partitionBy("icao24").orderBy("event_timestamp")

        df_temporal = df_clean \
            .withColumn("prev_altitude", lag("altitude_meters", 1).over(window_aircraft)) \
            .withColumn("prev_velocity", lag("velocity_kmh", 1).over(window_aircraft)) \
            .withColumn("altitude_change", col("altitude_meters") - col("prev_altitude")) \
            .withColumn("velocity_change", col("velocity_kmh") - col("prev_velocity")) \
            .withColumn("observation_rank", row_number().over(window_aircraft))

        # Airport join (aircraft on the ground only). eqNullSafe sends rows with a
        # NULL on_ground to the in-flight branch instead of silently dropping them.
        is_ground = col("on_ground").eqNullSafe(True)
        df_on_ground = df_temporal.filter(is_ground)
        # Typed NULLs (string) so the union and the Delta schema stay consistent
        df_in_flight = df_temporal.filter(~is_ground) \
            .withColumn("airport_icao", lit(None).cast("string")) \
            .withColumn("airport_name", lit(None).cast("string")) \
            .withColumn("airport_country", lit(None).cast("string"))

        if not df_on_ground.isEmpty():
            # Planar distance on raw degrees: an approximation, not a geodesic distance
            df_with_airports = df_on_ground.crossJoin(broadcast(df_airports)).withColumn(
                "dist",
                sqrt(pow(col("latitude") - col("airport_lat"), 2) + pow(col("longitude") - col("airport_lon"), 2))
            )

            # Keep the closest airport for each (aircraft, timestamp)
            w = Window.partitionBy("icao24", "event_timestamp")
            df_closest = df_with_airports.withColumn("min_dist", spark_min("dist").over(w)) \
                .filter(col("dist") == col("min_dist")) \
                .drop("dist", "min_dist", "airport_lat", "airport_lon")

            df_enriched = df_closest.unionByName(df_in_flight)
        else:
            df_enriched = df_in_flight

        # Rolling features: current row + 5 preceding rows of the same aircraft
        rolling_window = Window.partitionBy("icao24").orderBy("event_timestamp").rowsBetween(-5, 0)

        df_rolling = df_enriched \
            .withColumn("rolling_avg_altitude", avg("altitude_meters").over(rolling_window)) \
            .withColumn("rolling_std_altitude", stddev("altitude_meters").over(rolling_window)) \
            .withColumn("rolling_avg_velocity", avg("velocity_kmh").over(rolling_window))

        # flight_phase label. A NULL altitude_change (first observation of an
        # aircraft in the batch) matches no branch and falls to TRANSITION.
        df_ml = df_rolling.withColumn(
            "flight_phase",
            when(col("on_ground") == True, "GROUND")
            .when((col("altitude_change") > 50) & (col("altitude_meters") < 3000), "TAKEOFF")
            .when(col("altitude_change") > 20, "CLIMB")
            .when(col("altitude_change").between(-20, 20) & (col("altitude_meters") > 8000), "CRUISE")
            .when(col("altitude_change") < -20, "DESCENT")
            .otherwise("TRANSITION")
        )

        # Final column selection
        df_final = df_ml.select(
            "event_timestamp", "icao24", "callsign", "origin_country",
            "longitude", "latitude", "velocity_kmh", "altitude_meters",
            "on_ground", "category",
            "prev_altitude", "prev_velocity", "altitude_change", "velocity_change", "observation_rank",
            "airport_icao", "airport_name", "airport_country",
            "rolling_avg_altitude", "rolling_std_altitude", "rolling_avg_velocity",
            "flight_phase"
        )

        # Append to the Silver_ML Delta table
        df_final.write.format("delta").mode("append").save(SILVER_ML_PATH)
    finally:
        df_clean.unpersist()
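
The nearest-airport step uses a planar distance on raw latitude/longitude degrees. Away from the equator this overweights longitude differences, since one degree of longitude shrinks with cos(latitude). A great-circle (haversine) distance is a common alternative; the pure-Python sketch below compares both on hypothetical airports `A` and `B`, with hypothetical helper names. It is not what the pipeline above does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_deg(lat1, lon1, lat2, lon2):
    """Distance used by the Spark cell: Euclidean distance on raw degrees."""
    return math.hypot(lat1 - lat2, lon1 - lon2)


def nearest(lat, lon, candidates, dist):
    """Return the candidate airport minimising `dist` to (lat, lon)."""
    return min(candidates, key=lambda a: dist(lat, lon, a["airport_lat"], a["airport_lon"]))


# Hypothetical airports at 60°N: A is 0.9° north (~100 km), B is 1° east (~56 km)
candidates = [
    {"airport_icao": "A", "airport_lat": 60.9, "airport_lon": 0.0},
    {"airport_icao": "B", "airport_lat": 60.0, "airport_lon": 1.0},
]
print(nearest(60.0, 0.0, candidates, planar_deg)["airport_icao"])    # A
print(nearest(60.0, 0.0, candidates, haversine_km)["airport_icao"])  # B
```

For aircraft sitting on a runway the two metrics usually agree; the difference matters mostly at high latitudes or where airports are close together.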

In [9]:
df_bronze_ml_stream = spark.readStream.format("delta").load(BRONZE_PATH)

print(f"üöÄ Stream 2: Bronze ‚Üí Silver_ML (Feature Engineering)")

query_silver_ml = df_bronze_ml_stream.writeStream \
    .foreachBatch(process_ml_batch) \
    .option("checkpointLocation", CHECKPOINT_SILVER_ML) \
    .start()

🚀 Stream 2: Bronze → Silver_ML (Feature Engineering)


26/01/23 15:38:26 WARN ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
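
In `process_ml_batch`, `rowsBetween(-5, 0)` frames the current row plus up to five preceding rows of the same aircraft (six rows at most), and the frame cannot reach into earlier micro-batches. Spark's `stddev` is the sample standard deviation and, in Spark 3.x with default settings, returns NULL for a one-row frame. A minimal pure-Python sketch for one aircraft's time-ordered values; `rolling_features` is a hypothetical helper and assumes non-null inputs, as the `between()` filters guarantee.

```python
import statistics


def rolling_features(values, preceding=5):
    """Mirror of avg/stddev over rowsBetween(-preceding, 0).

    Returns one (rolling_avg, rolling_std) tuple per position; std is the
    sample standard deviation and None for a single-row frame.
    """
    result = []
    for i in range(len(values)):
        frame = values[max(0, i - preceding): i + 1]  # current row + preceding rows
        mean = sum(frame) / len(frame)
        std = statistics.stdev(frame) if len(frame) > 1 else None
        result.append((mean, std))
    return result


altitudes = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0, 1500.0, 1600.0, 1700.0]
print(rolling_features(altitudes)[7][0])  # 1450.0 (mean of the last six values)
```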



## Stream monitoring

In [None]:
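import time

print("📊 Monitoring streams (Ctrl+C to stop)")
print("=" * 60)

try:
    while True:
        print(f"\n⏱️  {time.strftime('%H:%M:%S')}")
        print(f"  Silver:    {query_silver.status}")
        print(f"  Silver_ML: {query_silver_ml.status}")
        # Leave the loop if a query has terminated (e.g. after an error)
        if not (query_silver.isActive and query_silver_ml.isActive):
            print("⚠️  A stream is no longer active:",
                  query_silver.exception() or query_silver_ml.exception())
            break
        time.sleep(30)
except KeyboardInterrupt:
    print("\n⏹️  Stop requested...")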

📊 Monitoring streams (Ctrl+C to stop)

⏱️  15:38:32
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


26/01/23 15:38:39 WARN MemoryManager: Total allocation exceeds 95.00% (1,020,054,720 bytes) of heap memory
Scaling row group sizes to 95.00% for 8 writers
26/01/23 15:38:57 ERROR NonFateSharingFuture: Failed to get result from future  
scala.runtime.NonLocalReturnControl


⏱️  15:39:02
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:39:32
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:40:02
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


26/01/23 15:40:03 WARN S3AInstrumentation: Closing output stream statistics while data is still marked as pending upload in OutputStreamStatistics{counters=((op_abort=0) (stream_write_exceptions_completing_upload=0) (action_executor_acquired.failures=0) (object_multipart_aborted=0) (stream_write_total_data=0) (stream_write_total_time=0) (op_hflush=0) (object_multipart_aborted.failures=0) (stream_write_block_uploads=1) (multipart_upload_completed=0) (stream_write_queue_duration=0) (multipart_upload_completed.failures=0) (stream_write_exceptions=0) (op_hsync=0) (op_abort.failures=0) (action_executor_acquired=0) (stream_write_bytes=43468));
gauges=((stream_write_block_uploads_data_pending=43468) (stream_write_block_uploads_pending=1));
minimums=((multipart_upload_completed.min=-1) (object_multipart_aborted.failures.min=-1) (action_executor_acquired.min=-1) (multipart_upload_completed.failures.min=-1) (op_abort.failures.min=-1) (op_abort.min=-1) (action_executor_acquired.failures.min=-1) (


⏱️  15:40:32
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:41:02
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:41:32
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:42:02
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:42:32
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:43:02
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:43:33
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:44:03
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:44:33
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:45:03
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:45:33
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:46:03
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:46:33
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:47:03
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:47:33
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:48:04
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:48:35
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:49:05
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:49:35
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:50:05
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:50:35
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:51:06
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:51:36
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:52:06
  Silver:    {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:52:36
  Silver:    {'message': 'Waiting for data to arrive', 'isDataAvailable': False, 'isTriggerActive': False}
  Silver_ML: {'message': 'Processing new data', 'isDataAvailable': True, 'isTriggerActive': True}


⏱️  15:53:06
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}
  Silver_ML: {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}

⏱️  15:53:36
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}
  Silver_ML: {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}

⏱️  15:54:06
  Silver:    {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}
  Silver_ML: {'message': 'Getting offsets from DeltaSource[s3a://datalake/bronze/flights]', 'isDataAvailable': False, 'isTriggerActive': True}

⏱️  15:54:36
  Silver:    {'message': 'Waiting for data to arrive', 'isDataAvailabl
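
`query.status` only reports whether a trigger is active and whether data is available, which is why most snapshots above look identical. `StreamingQuery.lastProgress` carries per-batch metrics such as `batchId`, `numInputRows` and `processedRowsPerSecond`, and is `None` before the first batch completes. The formatter below is a hypothetical helper, sketched on a plain dict so it runs without Spark; the sample values are made up.

```python
def summarize_progress(name, progress):
    """Format the main metrics of a lastProgress-style dict (or None)."""
    if progress is None:
        return f"{name}: no completed batch yet"
    rate = progress.get("processedRowsPerSecond", 0.0)
    return (f"{name}: batch {progress['batchId']}, "
            f"{progress['numInputRows']} rows, {rate:.1f} rows/s")


# Made-up sample values, for illustration only
print(summarize_progress("Silver", {"batchId": 12, "numInputRows": 1500,
                                    "processedRowsPerSecond": 250.0}))
# Silver: batch 12, 1500 rows, 250.0 rows/s
```

Inside the monitoring loop this could be called as `summarize_progress("Silver", query_silver.lastProgress)`, assuming a PySpark version where `lastProgress` supports dict-style access.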

## Stopping the streams

In [None]:
query_silver.stop()
query_silver_ml.stop()
print("✅ All streams stopped")

✅ All streams stopped


## Verification

In [8]:
print("üìä Statistiques :")
print(f"  Bronze:    {spark.read.format('delta').load(BRONZE_PATH).count():,} lignes")
print(f"  Silver:    {spark.read.format('delta').load(SILVER_PATH).count():,} lignes")
print(f"  Silver_ML: {spark.read.format('delta').load(SILVER_ML_PATH).count():,} lignes")

print("\nüìä Distribution flight_phase (Silver_ML) :")
spark.read.format("delta").load(SILVER_ML_PATH).groupBy("flight_phase").count().orderBy("count", ascending=False).show()

📊 Statistics:
  Bronze:    98,840 rows
  Silver:    97,950 rows
  Silver_ML: 88,432 rows

📊 flight_phase distribution (Silver_ML):
+------------+-----+
|flight_phase|count|
+------------+-----+
|      CRUISE|37190|
|  TRANSITION|29374|
|     DESCENT|10699|
|       CLIMB| 8877|
|     TAKEOFF| 2035|
|      GROUND|  257|
+------------+-----+
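
The size of the TRANSITION bucket follows from how the `when`/`otherwise` chain treats missing values: a comparison with NULL is NULL, which a `when` branch treats as false. The first observation of each aircraft in a micro-batch therefore has a NULL `altitude_change` and falls through to TRANSITION, as does level flight at or below 8,000 m. A pure-Python mirror of the rule, where `label_flight_phase` is a hypothetical helper and `altitude_meters` is assumed non-null (as the cleaning filters guarantee):

```python
def label_flight_phase(on_ground, altitude_change, altitude_meters):
    """Mirror of the flight_phase when/otherwise chain (None plays the role of NULL)."""
    if on_ground is True:
        return "GROUND"
    if altitude_change is None:
        # Every later branch compares altitude_change, so NULL matches none of them
        return "TRANSITION"
    if altitude_change > 50 and altitude_meters < 3000:
        return "TAKEOFF"
    if altitude_change > 20:
        return "CLIMB"
    if -20 <= altitude_change <= 20 and altitude_meters > 8000:  # between() is inclusive
        return "CRUISE"
    if altitude_change < -20:
        return "DESCENT"
    return "TRANSITION"


print(label_flight_phase(False, None, 10000.0))  # TRANSITION
print(label_flight_phase(False, 0.0, 5000.0))    # TRANSITION (level flight below 8,000 m)
```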

