# Silver Layer Transformation – Weather Data

This notebook processes the raw **bronze.weather_data** table into the **silver.weather_data** table.

### Steps performed:
1. **Read** the raw weather data from the Bronze layer.  
2. **Clean & Transform**:  
   - Remove duplicate records based on `weather_id` and `timestamp`.  
   - Standardize `wind_direction` to uppercase.  
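   - Apply basic data quality filters: keep rows where `track_temperature` and `air_temperature` are above -50 and `humidity` is between 0 and 100 (inclusive). Rows with nulls in these columns are dropped as well, because a null comparison never evaluates to true.  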
3. **Write** the cleaned data into the Silver layer as a Delta table.  

**Output:**  
A curated `silver.weather_data` table ready for downstream analytics.
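
The cleaning rules listed above can be sketched in plain Python, which is handy for unit-testing the logic without a Spark session. This is an illustration only, not the notebook's implementation: the helper names `clean_weather_rows` and `_above` are hypothetical, rows are assumed to be dictionaries keyed by column name, and the sketch keeps the first duplicate it sees, whereas Spark's `dropDuplicates` makes no guarantee about which duplicate survives.

```python
from typing import Dict, Iterable, List, Optional


def _above(value: Optional[float], floor: float) -> bool:
    """Return True only for non-null values strictly greater than floor."""
    return value is not None and value > floor


def clean_weather_rows(rows: Iterable[Dict]) -> List[Dict]:
    """Hypothetical plain-Python mirror of the Silver cleaning rules."""
    seen = set()
    cleaned = []
    for row in rows:
        # 1. De-duplicate on the (weather_id, timestamp) key.
        key = (row.get("weather_id"), row.get("timestamp"))
        if key in seen:
            continue
        seen.add(key)

        # 2. Standardize wind_direction to uppercase (nulls stay null).
        direction = row.get("wind_direction")
        direction = direction.upper() if direction is not None else None

        # 3. Sanity filters: temperatures above -50, humidity within 0..100.
        humidity = row.get("humidity")
        if not _above(row.get("track_temperature"), -50):
            continue
        if not _above(row.get("air_temperature"), -50):
            continue
        if humidity is None or not (0 <= humidity <= 100):
            continue

        cleaned.append({**row, "wind_direction": direction})
    return cleaned


# Usage with made-up sample rows (values are illustrative, not real data).
sample = [
    {"weather_id": 1, "timestamp": "t1", "wind_direction": "nw",
     "track_temperature": 30.5, "air_temperature": 22.0, "humidity": 45.0},
    {"weather_id": 2, "timestamp": "t1", "wind_direction": "s",
     "track_temperature": 28.0, "air_temperature": 21.0, "humidity": 120.0},
]
cleaned = clean_weather_rows(sample)
```

In the sample, the second row fails the humidity check, so only the first row is kept.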


In [0]:
# Databricks notebook source
from pyspark.sql.functions import col, upper

# Read from Bronze
bronze_weather_df = spark.read.table("bronze.weather_data")

# Transformations / Cleaning
silver_weather_df = (
    bronze_weather_df
    # Remove duplicates on the (weather_id, timestamp) key;
    # which of the duplicate rows is kept is not guaranteed
    .dropDuplicates(["weather_id", "timestamp"])
    # Ensure wind_direction is uppercase
    .withColumn("wind_direction", upper(col("wind_direction")))
    # Basic sanity filters (rows with nulls in these columns are dropped too)
    .filter(col("track_temperature") > -50)   # drop implausibly low readings
    .filter(col("air_temperature") > -50)     # drop implausibly low readings
    .filter(col("humidity").between(0, 100))  # valid humidity %, bounds inclusive
)

# Write to Silver
(
    silver_weather_df.write
    .format("delta")
    .mode("overwrite")   # full refresh on each run; switch to "append" for incremental loads
    .saveAsTable("silver.weather_data")
)

print("✅ weather_data table created successfully in silver layer")