The gold layer is the final layer of the data pipeline. It holds the refined, transformed data that business intelligence and analytics workloads read. The code below does the following:
1. The code creates a Delta table called `dim1` by reading the streaming data from the `data2_cleansed` table. The decorator sets the schema, comment, table properties, and Spark configuration. The function body only returns the stream, so the identity column used as a surrogate key is presumably declared in the schema.
2. Similarly, the code creates a second Delta table called `dim2` by reading the streaming data from the `data3_cleansed` table, with the same decorator settings and the same reliance on the schema for the surrogate key.
3. The code then creates a fact table by joining the `data1_cleansed` stream with `dim1` and `dim2`. It performs inner joins on `key_column`, selects specific columns from each input, casts `old_column_name` to a date, and computes the product of `column4` and `column5`.
4. The fact table is created with its own comment, table properties, and Spark configuration. Unlike the dimensions, it does not pass an explicit schema.

Overall, the code creates two dimension tables (`dim1` and `dim2`) and a fact table that joins them with the streaming data from `data1_cleansed`. All three are stored as Delta tables with the specified properties and configurations.
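
The dimension tables take their surrogate key from the schema passed to `@dlt.table`, which the code refers to as `gold_layer_schema` but does not define here. As a rough sketch only, such a schema could be written as a SQL DDL string with an identity column. The column name `dim1_sk` and all column types below are assumptions for illustration, not taken from the original pipeline. Check the Delta Live Tables documentation for current restrictions on identity columns before relying on this.

```python
# Hypothetical schema for dim1, written as a SQL DDL string.
# dim1_sk and every column type here are assumptions for illustration;
# key_column and column3 are the names used by the fact-table join.
gold_layer_schema = """
    dim1_sk BIGINT GENERATED ALWAYS AS IDENTITY,
    key_column STRING,
    column3 STRING
"""
```

In practice each dimension would need its own schema, since `dim1` and `dim2` carry different columns.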

In [None]:
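import dlt
from pyspark.sql.functions import col, expr

# Create the dimension1 table from the Delta Live Table data2_cleansed.
# The identity column for the surrogate key is expected to be declared in the schema.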
@dlt.table(
    schema=gold_layer_schema,  # assumes gold_layer_schema is defined earlier (DDL string or StructType)
    comment="Load data to dimension1 table",
    table_properties={"quality": "gold", "pipelines.reset.allowed": "true"},
    spark_conf={"pipelines.trigger.interval": "60 seconds"},
    temporary=False,
)
def dim1():
    return dlt.read_stream("data2_cleansed")

# Create the dimension2 table from the Delta Live Table data3_cleansed.
# The identity column for the surrogate key is expected to be declared in the schema.
@dlt.table(
    schema=gold_layer_schema,  # assumes gold_layer_schema is defined earlier (DDL string or StructType)
    comment="Load data to dimension2 table",
    table_properties={"quality": "gold", "pipelines.reset.allowed": "true"},
    spark_conf={"pipelines.trigger.interval": "60 seconds"},
    temporary=False,
)
def dim2():
    return dlt.read_stream("data3_cleansed")

# Create the fact table by joining data1_cleansed with the two dimension tables
@dlt.table(
    comment="load data to fact table",
    table_properties={"quality": "gold", "pipelines.reset.allowed": "true"},
    spark_conf={"pipelines.trigger.interval": "60 seconds"},
    temporary=False,
)
def fact_table():
    s = dlt.read_stream("data1_cleansed").alias("s")
    p = dlt.read_stream("dim1").alias("p")
    c = dlt.read_stream("dim2").alias("c")
    return ( 
        # Inner joins on key_column between the streams s (data1_cleansed), p (dim1), and c (dim2)
        s.join(p, s.key_column == p.key_column, "inner")
        .join(c, s.key_column == c.key_column, "inner")
        # Select columns from each input, cast old_column_name to a date with col(),
        # and compute the product of column4 and column5 with expr()
        .select(
            "s.column1",
            "c.column2",
            "p.column3",
            col("s.old_column_name").cast("date").alias("new_column_name"),
            "s.column4",
            "s.column5",
            # Distinct alias: the name new_column_name is already used by the date column above
            expr("s.column4 * s.column5").alias("new_computed_column"),
        )
    )
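
The effect of the two inner joins can be illustrated without Spark. The following is a minimal plain-Python sketch, not the pipeline itself: the sample rows, the helper `inner_join_fact`, and the output key `product` are all made up for illustration, and the dictionary lookup assumes each `key_column` value appears at most once per dimension.

```python
# Tiny in-memory stand-ins for the three inputs (all values are made up).
data1_cleansed = [
    {"key_column": 1, "column1": "a", "column4": 2, "column5": 10},
    {"key_column": 2, "column1": "b", "column4": 3, "column5": 5},
    {"key_column": 3, "column1": "c", "column4": 4, "column5": 1},
]
dim1 = [{"key_column": 1, "column3": "p1"}, {"key_column": 2, "column3": "p2"}]
dim2 = [{"key_column": 1, "column2": "c1"}, {"key_column": 3, "column2": "c3"}]


def inner_join_fact(facts, d1, d2):
    """Keep only fact rows whose key appears in both dimensions."""
    # Assumes keys are unique within each dimension.
    d1_by_key = {row["key_column"]: row for row in d1}
    d2_by_key = {row["key_column"]: row for row in d2}
    result = []
    for s in facts:
        p = d1_by_key.get(s["key_column"])
        c = d2_by_key.get(s["key_column"])
        if p is None or c is None:
            continue  # an inner join drops rows without a match on either side
        result.append({
            "column1": s["column1"],
            "column2": c["column2"],
            "column3": p["column3"],
            "product": s["column4"] * s["column5"],
        })
    return result


rows = inner_join_fact(data1_cleansed, dim1, dim2)
print(rows)
```

Only the row with `key_column` 1 survives, because keys 2 and 3 are each missing from one of the dimensions. The same applies to the fact table: a row from `data1_cleansed` appears only when both `dim1` and `dim2` contain its key.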
