# 📅 LoadStaticDimensions - Dates and Times

This notebook loads and transforms the **static dimensions** (`dates` and `times`) through the Medallion architecture:

- **Raw → Bronze**: CSV files are read from the Raw layer and stored as Delta tables in Bronze.  
- **Bronze → Silver**: type conversions (`cast`) are applied through dedicated functions (`silver_transform_dates`, `silver_transform_times`).  
- **Silver → Gold**: the final dimensional tables (`DimDates`, `DimTimes`) are created and made ready for the analytical model.  
 


In [0]:
from pyspark.sql import functions as F
from pyspark.sql.types import DateType, IntegerType, StringType

- `src_values`: list of static dimensions to process (`dates`, `times`).  
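- `raw_base_path`, `bronze_base_path`, `silver_base_path`, `gold_base_path`: base paths for each layer in the Medallion architecture. Only the Raw and Bronze paths are used by the loop below; Silver and Gold are written as managed tables with `saveAsTable`, so `silver_base_path` and `gold_base_path` are not referenced.  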
- `pk_map`: maps each dimension to its primary key (`date_sk`, `time_sk`).  
- `bronze_schema`, `silver_schema`, `gold_schema`: schemas that hold the tables of each layer (`bronze_schema` is not used below, because Bronze is written to a path rather than registered as a table).  
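Every per-dimension location used in the loop is derived from these settings by string formatting. A minimal sketch of that derivation, using a hypothetical helper `resolve_locations` that is not part of the notebook, makes the naming scheme explicit and testable without a Spark session. It capitalizes the dimension name so the Gold name matches the `DimDates`/`DimTimes` names used in this notebook's description.

```python
# Configuration values copied from the cell above.
raw_base_path = "/Volumes/workspace/raw/rawvolume/rawdata"
bronze_base_path = "/Volumes/workspace/bronze/bronzevolume"
silver_schema = "workspace.silver"
gold_schema = "workspace.gold"
pk_map = {"dates": "date_sk", "times": "time_sk"}


def resolve_locations(dim: str) -> dict:
    """Hypothetical helper: Raw/Bronze paths, Silver/Gold table names and PK for one dimension."""
    if dim not in pk_map:
        raise KeyError(f"Unknown dimension: {dim}")
    return {
        "raw_path": f"{raw_base_path}/{dim}",
        "bronze_path": f"{bronze_base_path}/{dim}",
        "silver_table": f"{silver_schema}.silver_{dim}",
        # "Dim" + capitalized dimension name, e.g. DimDates
        "gold_table": f"{gold_schema}.Dim{dim.capitalize()}",
        "pk_col": pk_map[dim],
    }


print(resolve_locations("dates")["gold_table"])  # workspace.gold.DimDates
```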

In [0]:
src_values = ["dates", "times"]  
raw_base_path = "/Volumes/workspace/raw/rawvolume/rawdata"
bronze_base_path = "/Volumes/workspace/bronze/bronzevolume"
silver_base_path = "/Volumes/workspace/silver"
gold_base_path = "/Volumes/workspace/gold"
pk_map = {
    "dates": "date_sk",
    "times": "time_sk"
}
bronze_schema = "workspace.bronze"
silver_schema = "workspace.silver"
gold_schema   = "workspace.gold"

- **`silver_transform_dates(df)`**: Casts `full_date` to `DateType` and `date_sk`, `year`, `month`, `day`, `weekday` and `quarter` to `IntegerType`.

- **`silver_transform_times(df)`**: Casts `time_sk`, `hour` and `minute` to `IntegerType` and keeps `full_time`, `day_part` and `full_time_ampm` as `StringType`.

In [0]:
def silver_transform_dates(df):
    return (df.withColumn("full_date", F.col("full_date").cast(DateType()))
              .withColumn("date_sk", F.col("date_sk").cast(IntegerType()))
              .withColumn("year", F.col("year").cast(IntegerType()))
              .withColumn("month", F.col("month").cast(IntegerType()))
              .withColumn("day", F.col("day").cast(IntegerType()))
              .withColumn("weekday", F.col("weekday").cast(IntegerType()))
              .withColumn("quarter", F.col("quarter").cast(IntegerType())))

def silver_transform_times(df):
    return (df.withColumn("time_sk", F.col("time_sk").cast(IntegerType()))
              .withColumn("full_time", F.col("full_time").cast(StringType()))
              .withColumn("hour", F.col("hour").cast(IntegerType()))
              .withColumn("minute", F.col("minute").cast(IntegerType()))
              .withColumn("day_part", F.col("day_part").cast(StringType()))
              .withColumn("full_time_ampm", F.col("full_time_ampm").cast(StringType())))
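Because the Raw CSVs are read with `inferSchema` set to `false`, every column reaches Bronze as a string, and these functions are where real types are assigned. The plain-Python sketch below mirrors `silver_transform_dates` on a single row (the first row of the Gold output shown later) to make the target types explicit. The `DATE_CASTS` mapping and `type_date_row` helper are hypothetical. One behavioral difference: Python's `int()` raises on malformed input, whereas Spark's `cast` returns `NULL` when ANSI mode is disabled.

```python
from datetime import date

# Hypothetical column -> converter mapping mirroring silver_transform_dates:
# full_date becomes a date, the key and calendar attributes become integers.
DATE_CASTS = {
    "full_date": date.fromisoformat,
    "date_sk": int,
    "year": int,
    "month": int,
    "day": int,
    "weekday": int,
    "quarter": int,
}


def type_date_row(raw_row: dict) -> dict:
    """Convert one all-string CSV row into typed values; unmapped columns pass through unchanged."""
    return {col: DATE_CASTS.get(col, lambda v: v)(value) for col, value in raw_row.items()}


raw_row = {
    "full_date": "2024-01-01", "date_sk": "1", "year": "2024",
    "month": "1", "day": "1", "weekday": "1", "quarter": "1",
}
typed = type_date_row(raw_row)
print(typed["full_date"], type(typed["date_sk"]).__name__)  # 2024-01-01 int
```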


For each dimension in `src_values`, the **Raw → Bronze → Silver → Gold** flow is executed:

1. **Raw → Bronze**  
   - Reads the CSV files from the Raw layer (all columns as strings, since `inferSchema` is disabled).  
   - Deletes the existing Bronze directory and writes the data to it in Delta format.  

2. **Bronze → Silver**  
   - Reads the Bronze table.  
   - Applies the corresponding transformation function (`silver_transform_dates` or `silver_transform_times`).  
   - Saves the result as a table in the Silver schema (`silver_{dim}`).  

3. **Silver → Gold**  
   - Reads the Silver table.  
   - Deletes the previous Gold table if it exists.  
   - Creates the final dimensional table (`DimDates`, `DimTimes`) in the Gold schema.  
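The `if`/`elif` branch that picks the Silver function in the loop can also be written as a lookup table keyed by dimension name, which fails loudly when a dimension is added to `src_values` without a matching transformation. A minimal sketch, with stub functions standing in for the real PySpark transforms (`SILVER_TRANSFORMS`, `transform_for` and the stubs are hypothetical names):

```python
from typing import Callable, Dict


# Stubs standing in for silver_transform_dates / silver_transform_times;
# in the notebook the dictionary values would be the real PySpark functions.
def _stub_dates(df):
    return f"dates-typed({df})"


def _stub_times(df):
    return f"times-typed({df})"


SILVER_TRANSFORMS: Dict[str, Callable] = {
    "dates": _stub_dates,
    "times": _stub_times,
}


def transform_for(dim: str) -> Callable:
    """Look up the Silver transformation for a dimension, failing fast if none is registered."""
    try:
        return SILVER_TRANSFORMS[dim]
    except KeyError:
        raise ValueError(f"No Silver transformation registered for '{dim}'") from None


print(transform_for("times")("bronze_df"))  # times-typed(bronze_df)
```

Inside the loop, the branch would then reduce to a single call such as `typed_df = transform_for(dim)(bronze_df)`.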


In [0]:
for dim in src_values:

    print(f"\n===== Processing {dim.upper()} =====")

    pk_col = pk_map[dim]
    raw_path = f"{raw_base_path}/{dim}"
    bronze_path = f"{bronze_base_path}/{dim}"

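    # Read the Raw CSV files; with inferSchema disabled, every column is loaded as a string
    raw_df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "false")
        .load(raw_path)
    )

    ############################# BRONZE ######################
    # Remove any previous Bronze files, then write a fresh Delta copy of the Raw data
    dbutils.fs.rm(bronze_path, recurse=True)
    raw_df.write.format("delta").mode("overwrite").save(bronze_path)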

    ############################# SILVER ########################
    silver_table = f"{silver_schema}.silver_{dim}"
    bronze_df = spark.read.format("delta").load(bronze_path)

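    # Apply the dimension-specific type casts
    if dim == "dates":
        typed_df = silver_transform_dates(bronze_df)
    elif dim == "times":
        typed_df = silver_transform_times(bronze_df)
    else:
        raise ValueError(f"No Silver transformation defined for '{dim}'")

    typed_df.write.format("delta").mode("overwrite").saveAsTable(silver_table)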

    ############################# GOLD ###################################
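    # "Dim" + capitalized name (DimDates, DimTimes); Unity Catalog stores table names in lowercase
    gold_table = f"{gold_schema}.Dim{dim.capitalize()}"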
    silver_df = spark.table(silver_table)
    
    spark.sql(f"DROP TABLE IF EXISTS {gold_table}")
    silver_df.write.format("delta").mode("overwrite").saveAsTable(gold_table)


===== Processing DATES =====

===== Processing TIMES =====
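`pk_col` is looked up from `pk_map` in the loop but is not used afterwards. One natural use is a uniqueness and non-null check on the key before publishing to Gold. The plain-Python sketch below shows that logic on a list of row dictionaries; the helper name `check_primary_key` is hypothetical. In PySpark the same test could be expressed by comparing `df.count()` with `df.select(pk_col).distinct().count()` and counting `df.filter(F.col(pk_col).isNull())`.

```python
from collections import Counter
from typing import Iterable, List


def check_primary_key(rows: Iterable[dict], pk_col: str) -> List[str]:
    """Hypothetical helper: report NULL and duplicate values in pk_col (empty list = valid key)."""
    rows = list(rows)
    problems = []
    null_count = sum(1 for r in rows if r.get(pk_col) is None)
    if null_count:
        problems.append(f"{null_count} row(s) with NULL {pk_col}")
    counts = Counter(r[pk_col] for r in rows if r.get(pk_col) is not None)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate {pk_col} values: {duplicates}")
    return problems


rows = [{"date_sk": 1}, {"date_sk": 2}, {"date_sk": 2}, {"date_sk": None}]
print(check_primary_key(rows, "date_sk"))
# ['1 row(s) with NULL date_sk', 'duplicate date_sk values: [2]']
```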


In [0]:
%sql
SELECT * FROM gold.dimdates

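| full_date | date_sk | year | month | day | weekday | quarter |
|---|---|---|---|---|---|---|
| 2024-01-01 | 1 | 2024 | 1 | 1 | 1 | 1 |
| 2024-01-02 | 2 | 2024 | 1 | 2 | 2 | 1 |
| 2024-01-03 | 3 | 2024 | 1 | 3 | 3 | 1 |
| 2024-01-04 | 4 | 2024 | 1 | 4 | 4 | 1 |
| 2024-01-05 | 5 | 2024 | 1 | 5 | 5 | 1 |
| 2024-01-06 | 6 | 2024 | 1 | 6 | 6 | 1 |
| 2024-01-07 | 7 | 2024 | 1 | 7 | 7 | 1 |
| 2024-01-08 | 8 | 2024 | 1 | 8 | 1 | 1 |
| 2024-01-09 | 9 | 2024 | 1 | 9 | 2 | 1 |
| 2024-01-10 | 10 | 2024 | 1 | 10 | 3 | 1 |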
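The rows above are consistent with a simple rule set: `date_sk` counts days from 1 starting at 2024-01-01, `weekday` follows ISO numbering (Monday = 1, Sunday = 7; 2024-01-01 is a Monday), and `quarter` is `(month - 1) // 3 + 1`. The notebook does not show how the source CSV was generated, so this is an inferred reconstruction. The sketch below (hypothetical helper `build_date_rows`) regenerates rows under those assumptions, which can be handy for cross-checking the table.

```python
from datetime import date, timedelta


def build_date_rows(start: date, days: int) -> list:
    """Hypothetical helper: generate date-dimension rows under the inferred rules."""
    rows = []
    for offset in range(days):
        d = start + timedelta(days=offset)
        rows.append({
            "full_date": d,
            "date_sk": offset + 1,              # sequential surrogate key starting at 1
            "year": d.year,
            "month": d.month,
            "day": d.day,
            "weekday": d.isoweekday(),          # ISO: Monday = 1 ... Sunday = 7
            "quarter": (d.month - 1) // 3 + 1,
        })
    return rows


for row in build_date_rows(date(2024, 1, 1), 3):
    print(row["full_date"], row["date_sk"], row["weekday"], row["quarter"])
# 2024-01-01 1 1 1
# 2024-01-02 2 2 1
# 2024-01-03 3 3 1
```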


In [0]:
%sql
SELECT * FROM gold.dimtimes

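| time_sk | full_time | hour | minute | day_part | full_time_ampm |
|---|---|---|---|---|---|
| 1 | 00:00 | 0 | 0 | Night | 12:00 AM |
| 2 | 00:01 | 0 | 1 | Night | 12:01 AM |
| 3 | 00:02 | 0 | 2 | Night | 12:02 AM |
| 4 | 00:03 | 0 | 3 | Night | 12:03 AM |
| 5 | 00:04 | 0 | 4 | Night | 12:04 AM |
| 6 | 00:05 | 0 | 5 | Night | 12:05 AM |
| 7 | 00:06 | 0 | 6 | Night | 12:06 AM |
| 8 | 00:07 | 0 | 7 | Night | 12:07 AM |
| 9 | 00:08 | 0 | 8 | Night | 12:08 AM |
| 10 | 00:09 | 0 | 9 | Night | 12:09 AM |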
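Likewise, the time rows are consistent with one row per minute of the day: `time_sk` is the minute of the day plus 1, `full_time` is the 24-hour `HH:MM` string and `full_time_ampm` its 12-hour form. Only the `Night` label for hour 0 is visible here, so the `DAY_PARTS` boundaries in the sketch below are hypothetical placeholders rather than the source data's actual rules, and the zero-padded 12-hour format (`01:05 PM`) is also an assumption. `build_time_rows` and `day_part_for` are hypothetical helpers.

```python
from datetime import time

# Hypothetical day-part boundaries (start hour, inclusive);
# only "Night" at hour 0 is confirmed by the output above.
DAY_PARTS = [(0, "Night"), (6, "Morning"), (12, "Afternoon"), (18, "Evening")]


def day_part_for(hour: int) -> str:
    """Return the label of the last boundary whose start hour is <= hour."""
    label = DAY_PARTS[0][1]
    for start, name in DAY_PARTS:
        if hour >= start:
            label = name
    return label


def build_time_rows() -> list:
    """One row per minute of the day (1,440 rows)."""
    rows = []
    for minute_of_day in range(24 * 60):
        h, m = divmod(minute_of_day, 60)
        t = time(h, m)
        rows.append({
            "time_sk": minute_of_day + 1,
            "full_time": t.strftime("%H:%M"),
            "hour": h,
            "minute": m,
            "day_part": day_part_for(h),
            "full_time_ampm": t.strftime("%I:%M %p"),  # %p yields AM/PM in the default C locale
        })
    return rows


rows = build_time_rows()
print(len(rows), rows[0]["full_time_ampm"], rows[-1]["full_time"])  # 1440 12:00 AM 23:59
```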
