f_OrderDetail

**In some models, a single Sales fact table can hold all the necessary information, which removes the need for separate Order Header and Order Detail fact tables. In this case, however, that approach would lose critical data elements such as status and tax. This model therefore uses two fact tables: Order Header and Order Detail.**
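
To make the grain argument concrete, here is a small plain-Python sketch with made-up numbers. It assumes that `Status` and `TaxAmt` are stored once per order on the header, which this notebook implies but does not show. In a single table at line grain, such values must either be dropped or repeated on every line, and repeating them makes a naive sum overstate the tax.

```python
# Hypothetical header rows: one row per order (all values are made up).
order_headers = [
    {"SalesOrderID": 1, "Status": 5, "TaxAmt": 10.0},
    {"SalesOrderID": 2, "Status": 5, "TaxAmt": 4.0},
]

# Hypothetical detail rows: one row per order line.
order_details = [
    {"SalesOrderID": 1, "SalesOrderDetailID": 101, "LineTotal": 60.0},
    {"SalesOrderID": 1, "SalesOrderDetailID": 102, "LineTotal": 40.0},
    {"SalesOrderID": 2, "SalesOrderDetailID": 103, "LineTotal": 40.0},
]

headers_by_id = {h["SalesOrderID"]: h for h in order_headers}

# A single flattened table at line grain copies the header values onto each line.
flattened = [{**headers_by_id[d["SalesOrderID"]], **d} for d in order_details]

true_tax = sum(h["TaxAmt"] for h in order_headers)
naive_tax = sum(row["TaxAmt"] for row in flattened)

print(true_tax)   # 14.0
print(naive_tax)  # 24.0, because order 1's tax is counted once per line
```

Keeping the header and the detail as separate fact tables lets each measure stay at its own grain.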

In [0]:

from pyspark.sql import DataFrame, Window
from pyspark.sql import functions as F
from pyspark.sql.types import (
    DecimalType, IntegerType, StringType, TimestampType, StructType, StructField
)
from pyspark.sql.functions import col, desc, when, datediff, current_date

In [0]:
%run "/Workspace/Utils/Utils"

In [0]:
#Loading Table

df = spark.table("adlslmcompany_silver.managed_silver.sales_orderdetail")

In [0]:
#Displaying table

df.display()

In [0]:
def gold_clean_f_orderdetail(df): 

    # Drop audit columns that are not needed in this fact table
    df = df.drop("rowguid", "ModifiedDate", "bronze_ingestion_timestamp", "silver_transformed_timestamp")

    # Adds processed timestamp
    df = df.withColumn("processed_timestamp", F.current_timestamp())

    # Cast every column explicitly so the output types match the expected schema
    df = df.select(
        F.col('SalesOrderID').cast(IntegerType()).alias('SalesOrderID'),
        F.col('SalesOrderDetailID').cast(IntegerType()).alias('SalesOrderDetailID'),
        F.col('OrderQty').cast(IntegerType()).alias('OrderQty'),
        F.col('ProductID').cast(IntegerType()).alias('ProductID'),
        F.col('UnitPrice').cast(DecimalType(19,4)).alias('UnitPrice'),
        F.col('UnitPriceDiscount').cast(DecimalType(19,4)).alias('UnitPriceDiscount'),
        F.col('LineTotal').cast(DecimalType(38,6)).alias('LineTotal'),
        F.col('processed_timestamp').cast(TimestampType()).alias('processed_timestamp'),
    )
    return df

In [0]:
#Defining expected schema
expected_schema = StructType([
    StructField("SalesOrderID", IntegerType(), False),
    StructField("SalesOrderDetailID", IntegerType(), False),
    StructField("OrderQty", IntegerType(), False),
    StructField("ProductID", IntegerType(), False),
    StructField("UnitPrice", DecimalType(19,4), False),
    StructField("UnitPriceDiscount", DecimalType(19,4), False),
    StructField("LineTotal", DecimalType(38,6), False),
    StructField("processed_timestamp", TimestampType(), False),
])

In [0]:
#Transforming DF

gold_df = gold_clean_f_orderdetail(df)

In [0]:
#Comparing row counts before and after the transformation

compare_lengths(df, gold_df)
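
The `compare_lengths` helper comes from the `Utils` notebook and is not shown here. As a rough idea of what such a check could look like, the sketch below uses a hypothetical function name, `assert_same_row_count`, and only assumes that both arguments expose a Spark-style `.count()` method. It is not the project's actual implementation.

```python
def assert_same_row_count(source_df, result_df):
    """Raise if the transformation added or removed rows.

    Hypothetical stand-in for a row-count check; works with any object
    that exposes a Spark-style .count() method.
    """
    source_rows = source_df.count()
    result_rows = result_df.count()
    if source_rows != result_rows:
        raise ValueError(
            f"Row count changed: source={source_rows}, result={result_rows}"
        )
    return source_rows
```

In this notebook it would be called as `assert_same_row_count(df, gold_df)`. Because the cleaning function only drops, adds, and casts columns, the two counts are expected to match.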

In [0]:
#Checking the schema 
_validate_schema(gold_df, expected_schema)
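
`_validate_schema` is also defined in the `Utils` notebook, so its exact behavior is not visible here. The sketch below shows one possible way to compare a DataFrame schema against `expected_schema`. The name `schema_differences` and the `check_nullable` flag are assumptions made for illustration. Nullability is ignored by default because Spark frequently reports columns read from a table as nullable, which would otherwise fail a strict comparison against fields declared with `nullable=False`.

```python
def schema_differences(actual, expected, check_nullable=False):
    """List differences between two StructType-like schemas.

    Both arguments need a .fields list whose items expose .name, .dataType
    and .nullable, which pyspark.sql.types.StructType provides.
    """
    actual_fields = {f.name: f for f in actual.fields}
    expected_fields = {f.name: f for f in expected.fields}
    problems = []

    # Columns that are expected but absent, then columns that were not expected.
    for name in expected_fields:
        if name not in actual_fields:
            problems.append(f"missing column: {name}")
    for name in actual_fields:
        if name not in expected_fields:
            problems.append(f"unexpected column: {name}")

    # Compare types (and optionally nullability) for columns present in both.
    for name, exp in expected_fields.items():
        act = actual_fields.get(name)
        if act is None:
            continue
        if act.dataType != exp.dataType:
            problems.append(
                f"type mismatch on {name}: {act.dataType} != {exp.dataType}"
            )
        if check_nullable and act.nullable != exp.nullable:
            problems.append(f"nullability mismatch on {name}")
    return problems
```

A call such as `schema_differences(gold_df.schema, expected_schema)` returns an empty list when names and types line up, and a list of readable messages otherwise.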

**IMPORTANT: This is a simulated project, so the upsert operation is executed within this notebook. In a production environment, a dedicated notebook would contain only the function and its validations, and all function notebooks would be orchestrated by Azure Data Factory (ADF) pipelines or Azure Databricks (ADB) workflows. The upsert method may vary depending on whether Auto Loader, streaming, or Change Data Feed (CDF) is used.**
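
The `upsert_table` helper used in the next cell lives in the `Utils` notebook and is not reproduced here. For a plain batch load, one common way to implement an upsert on Delta tables is a `MERGE` keyed on the primary keys. The sketch below is an illustration under that assumption, not the project's actual helper: the function names `build_merge_condition` and `upsert_with_merge` are hypothetical, and the target table is assumed to already exist as a Delta table.

```python
def build_merge_condition(primary_keys, target_alias="t", source_alias="s"):
    """Build the ON clause that matches target and source rows on every key."""
    if not primary_keys:
        raise ValueError("At least one primary key column is required")
    return " AND ".join(
        f"{target_alias}.`{key}` = {source_alias}.`{key}`" for key in primary_keys
    )


def upsert_with_merge(spark, source_df, full_table_name, primary_keys):
    """Upsert source_df into an existing Delta table (hypothetical helper)."""
    # Imported lazily so the condition builder can be used without Delta installed.
    from delta.tables import DeltaTable

    target = DeltaTable.forName(spark, full_table_name)
    (
        target.alias("t")
        .merge(source_df.alias("s"), build_merge_condition(primary_keys))
        .whenMatchedUpdateAll()      # existing keys: overwrite all columns
        .whenNotMatchedInsertAll()   # new keys: insert the row
        .execute()
    )


print(build_merge_condition(["SalesOrderDetailID"]))
# t.`SalesOrderDetailID` = s.`SalesOrderDetailID`
```

With Auto Loader, streaming, or CDF the same merge would typically run per micro-batch or per change set rather than over the full DataFrame.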


In [0]:
#Loading into the Gold Layer

target_table = "f_orderdetail"
schema = "star_schema"
catalog = "adlscompany_gold"
primary_keys = ["SalesOrderDetailID"]

upsert_table(gold_df, target_table, primary_keys, schema, catalog)