# Gold Layer – Telemetry Insights

## Objective
This notebook aggregates Silver-layer telemetry into one row of metrics per driver per session.  
The resulting table supports analysis of driving style, speed consistency, and car performance.

## Steps
1. Load the Silver tables `silver.telemetry_data` and `silver.session_info`.
2. Standardize the telemetry: rename `Date` to `telemetry_time` and cast the numeric columns to explicit types.
3. Compute metrics per driver and session:
   - Average, max, min, and approximate median speed
   - Average throttle and brake values
   - Number of distinct gears used
   - Total distance
4. Left-join with session info for contextual enrichment (year, round, circuit, session name, event name).
5. Select the final set of columns.
6. Write the result to the Gold layer as the Delta table `gold.telemetry_insights`.
7. Optimize the table with ZORDER by `(session_key, Driver)`.
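
To make the metrics in step 3 concrete, here is a minimal plain-Python sketch that computes them for a handful of made-up samples from one driver in one session. The sample values, the units, the 0/1 interpretation of the brake signal, and the `summarize` helper are all assumptions for illustration; the notebook itself computes these quantities with Spark aggregations, and its median is approximate rather than exact.

```python
from statistics import mean, median

# Hypothetical telemetry samples for one driver in one session:
# (speed, throttle in percent, brake applied?, gear)
samples = [
    (280.0, 100.0, False, 8),
    (300.0, 100.0, False, 8),
    (120.0, 0.0, True, 3),
    (90.0, 20.0, True, 2),
    (200.0, 80.0, False, 6),
]


def summarize(rows):
    """Return the driver-level metrics for one (session, driver) group."""
    speeds = [r[0] for r in rows]
    return {
        "avg_speed": mean(speeds),
        "max_speed": max(speeds),
        "min_speed": min(speeds),
        "median_speed": median(speeds),
        "avg_throttle": mean(r[1] for r in rows),
        # Averaging a 0/1 brake flag gives the fraction of samples spent braking.
        "avg_brake": mean(1.0 if r[2] else 0.0 for r in rows),
        # Number of different gears seen, not how long each one was used.
        "unique_gears_used": len({r[3] for r in rows}),
    }


metrics = summarize(samples)
print(metrics["avg_speed"], metrics["unique_gears_used"])  # 198.0 4
```

Each key corresponds to one output column of the aggregation, so the same reading applies to a row of `gold.telemetry_insights`.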


In [0]:
from pyspark.sql import functions as F

# 1. Load Silver tables
telemetry = spark.table("silver.telemetry_data")
session_info = spark.table("silver.session_info")

# 2. Clean and standardize telemetry data
telemetry = telemetry.withColumnRenamed("Date", "telemetry_time")

# Cast measurement columns to explicit numeric types so the aggregations
# below operate on numbers rather than strings or booleans
telemetry = (
    telemetry
    .withColumn("Speed", F.col("Speed").cast("double"))
    .withColumn("Throttle", F.col("Throttle").cast("double"))
    .withColumn("Brake", F.col("Brake").cast("double"))
    .withColumn("Distance", F.col("Distance").cast("double"))
    .withColumn("nGear", F.col("nGear").cast("int"))
)

# 3. Aggregate telemetry metrics at driver + session level
telemetry_agg = (
    telemetry.groupBy("session_key", "Driver")
    .agg(
        F.avg("Speed").alias("avg_speed"),
        F.max("Speed").alias("max_speed"),
        F.min("Speed").alias("min_speed"),
        F.avg("Throttle").alias("avg_throttle"),
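        # If Brake is a 0/1 flag in Silver, this is the fraction of samples spent braking
        F.avg("Brake").alias("avg_brake"),
        # Approximate median; exact percentiles are costly on large tables
        F.expr("percentile_approx(Speed, 0.5)").alias("median_speed"),
        # Assumes Distance holds per-sample increments; if Silver stores a
        # cumulative distance, sum() overcounts and another aggregate is needed
        F.sum("Distance").alias("total_distance"),
        # Count of distinct gears engaged, not a per-gear distribution
        F.countDistinct("nGear").alias("unique_gears_used"),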
    )
)

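# 4. Join with session info for enrichment
# A left join keeps every driver/session aggregate even when no matching
# session_info row exists; the metadata columns are then null
telemetry_insights = (
    telemetry_agg.join(session_info, on="session_key", how="left")
)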

# 5. Finalize schema
telemetry_insights = telemetry_insights.select(
    "session_key",
    "Driver",
    "Year",
    "Round",
    "Circuit",
    "SessionName",
    "EventName",
    "avg_speed",
    "max_speed",
    "min_speed",
    "median_speed",
    "avg_throttle",
    "avg_brake",
    "total_distance",
    "unique_gears_used"
)

# 6. Write to Gold layer
(
    telemetry_insights.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("gold.telemetry_insights")
)

# 7. Optimize: compact files and co-locate rows by the expected filter columns
spark.sql("OPTIMIZE gold.telemetry_insights ZORDER BY (session_key, Driver)")