# Gold Layer – Weather Impact

## Objective
This notebook derives insights into how weather conditions affect lap times, tire performance, and race results.

## Steps
1. Load Silver tables:
   - `lap_times` (performance metrics)
   - `weather_data` (conditions per session)
   - `session_info` (metadata enrichment)
2. Align each lap with the nearest weather snapshot in the same session.
3. Compute weather-impact metrics and correlations:
   - Average lap time by rainfall, humidity, and track temperature.
   - Pit stop frequency under different weather conditions.
   - Speed variation across weather states.
4. Join with session info for year, round, and circuit.
5. Write the results to the Gold layer as `gold.weather_impact`.
6. Optimize with ZORDER on `(session_key, Driver)`.
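
Step 2 is a nearest-timestamp match: each lap takes the weather snapshot closest in time to its start. Here is a minimal pure-Python sketch of that rule. It assumes the snapshot times for one session are sorted and expressed as seconds. `nearest_snapshot` and the sample values are hypothetical and not part of the pipeline.

```python
from bisect import bisect_left


def nearest_snapshot(lap_start: float, snapshot_times: list[float]) -> int:
    """Return the index of the weather snapshot closest to lap_start.

    Hypothetical helper: snapshot_times must be sorted ascending. Ties go to
    the earlier snapshot so the match is deterministic.
    """
    if not snapshot_times:
        raise ValueError("no weather snapshots for this session")
    i = bisect_left(snapshot_times, lap_start)
    if i == 0:  # lap starts before (or at) the first snapshot
        return 0
    if i == len(snapshot_times):  # lap starts after the last snapshot
        return len(snapshot_times) - 1
    before, after = snapshot_times[i - 1], snapshot_times[i]
    # Prefer the earlier snapshot when both neighbours are equally close
    return i - 1 if lap_start - before <= after - lap_start else i


# Hypothetical snapshot times (seconds into the session), one per minute
snapshots = [0.0, 60.0, 120.0, 180.0]
lap_starts = [5.0, 89.0, 90.0, 200.0]
print([nearest_snapshot(t, snapshots) for t in lap_starts])  # → [0, 1, 1, 3]
```

Binary search costs O(log n) per lap on a sorted snapshot list. A join on `session_key` followed by a ranking window instead compares every lap with every snapshot in the session. Both approaches apply the same "closest snapshot wins" rule.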


```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# 1. Load Silver tables
laps = spark.table("silver.lap_times")
weather = spark.table("silver.weather_data")
session_info = spark.table("silver.session_info")

# 2. Standardize timestamps
laps = laps.withColumn("lap_start_time", F.col("lap_start_time").cast("timestamp"))
weather = weather.withColumn("Time", F.col("Time").cast("timestamp"))

# 3. Join each lap with the nearest weather snapshot in its session.
#    Snapshots must be ranked per lap (session, driver, lap start), not per session:
#    partitioning by session alone keeps a single row for the whole session.
lap_window = (
    Window.partitionBy("session_key", "Driver", "lap_start_time")
    .orderBy("time_diff", "Time")  # "Time" breaks ties so the match is deterministic
)

lap_weather = (
    laps.join(weather, "session_key")
    # Absolute gap in seconds between lap start and snapshot time
    .withColumn("time_diff", F.abs(F.col("lap_start_time").cast("long") - F.col("Time").cast("long")))
    .withColumn("rn", F.row_number().over(lap_window))
    .filter(F.col("rn") == 1)
    .drop("time_diff", "rn")
)

# 4. Compute weather impact metrics per driver and session
weather_impact = (
    lap_weather.groupBy("session_key", "Driver")
    .agg(
        F.avg("lap_time").alias("avg_lap_time"),
        # Cast so avg() also works if Rainfall is a boolean flag (it then yields the share of rainy laps)
        F.avg(F.col("Rainfall").cast("double")).alias("avg_rainfall"),
        F.avg("TrackTemp").alias("avg_track_temp"),
        F.avg("AirTemp").alias("avg_air_temp"),
        F.avg("Humidity").alias("avg_humidity"),
        F.avg("WindSpeed").alias("avg_wind_speed"),
        # A non-null PitInTime marks a lap that ended in the pit lane
        F.count(F.when(F.col("PitInTime").isNotNull(), 1)).alias("pit_stops"),
    )
)

# 5. Enrich with session metadata (year, round, circuit, names)
weather_impact = weather_impact.join(session_info, "session_key", "left")

# 6. Select final schema
weather_impact = weather_impact.select(
    "session_key",
    "Driver",
    "Year",
    "Round",
    "Circuit",
    "SessionName",
    "EventName",
    "avg_lap_time",
    "avg_rainfall",
    "avg_track_temp",
    "avg_air_temp",
    "avg_humidity",
    "avg_wind_speed",
    "pit_stops",
)

# 7. Write to Gold layer
(
    weather_impact.write.format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("gold.weather_impact")
)

# 8. Optimize: compact files and co-locate rows by session and driver
spark.sql("OPTIMIZE gold.weather_impact ZORDER BY (session_key, Driver)")
```