# Gold Layer
The gold layer is the final layer in the data pipeline, where the refined and transformed data resides for business intelligence and analytics purposes.
1. The code creates a Delta table called `dim1` by reading streaming data from the `dim1_cleansed` table. It adds an identity column as the surrogate key and defines the table properties, schema, and other configurations.
2. Similarly, it creates a second Delta table, `dim2`, from the streaming data in `dim2_cleansed`, again with an identity column as the surrogate key and its own table properties, schema, and configurations.
3. The code then builds the fact table by joining `fact_cleansed` with `dim1` and `dim2`. Both joins are inner joins on `key_column`, so a fact row is kept only when its key appears in both dimensions. The query selects specific columns from each table and computes a new column with expressions and transformations.
4. The joined and transformed result is written as the fact table with the specified table properties, comment, and configurations.
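
To make the join semantics in steps 3 and 4 concrete, here is a minimal plain-Python sketch that mimics the inner joins and the derived columns on a few in-memory rows. It is an illustration only, not the pipeline itself: the sample values, the helper `build_fact`, and the output name `computed_column_name` are hypothetical, and it assumes `key_column` is unique within each dimension.

```python
from datetime import date

# Hypothetical in-memory rows standing in for the three cleansed tables.
dim1_rows = [
    {"key_column": 1, "column2": "north"},
    {"key_column": 2, "column2": "south"},
]
dim2_rows = [
    {"key_column": 1, "column3": "retail"},
    {"key_column": 2, "column3": "online"},
]
fact_rows = [
    {"key_column": 1, "column1": "a", "old_column_name": "2024-01-15", "column4": 3, "column5": 2.5},
    {"key_column": 3, "column1": "b", "old_column_name": "2024-01-16", "column4": 1, "column5": 4.0},
]


def build_fact(facts, dim1, dim2):
    """Inner-join facts to both dimensions on key_column and derive the new columns."""
    # Assumes key_column is unique within each dimension.
    dim1_by_key = {row["key_column"]: row for row in dim1}
    dim2_by_key = {row["key_column"]: row for row in dim2}
    joined = []
    for fact in facts:
        key = fact["key_column"]
        # Inner join: keep the fact only if both dimensions contain its key.
        if key not in dim1_by_key or key not in dim2_by_key:
            continue
        joined.append({
            "column1": fact["column1"],
            "column2": dim1_by_key[key]["column2"],
            "column3": dim2_by_key[key]["column3"],
            # Counterpart of cast("date"): parse the ISO string into a date.
            "new_column_name": date.fromisoformat(fact["old_column_name"]),
            "column4": fact["column4"],
            "column5": fact["column5"],
            # Counterpart of expr("column4 * column5").
            "computed_column_name": fact["column4"] * fact["column5"],
        })
    return joined


gold_fact = build_fact(fact_rows, dim1_rows, dim2_rows)
print(gold_fact)
```

The fact row with key 3 has no match in either dimension, so only the row with key 1 reaches the result. With an inner join, late-arriving dimension rows can therefore cause facts to be dropped rather than held back.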

Overall, the code creates dimension tables (`dim1` and `dim2`) and a fact table by joining the dimensions with the streaming data from `fact_cleansed`. The resulting tables are stored as Delta tables with the specified properties and configurations.
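
The identity column used as the surrogate key is described above but is not visible in the read and write code of this notebook. One way it could be declared is sketched below: a small helper that builds a `CREATE TABLE` statement with a `GENERATED ALWAYS AS IDENTITY` column. This assumes a Delta Lake runtime that supports identity columns (for example Databricks); the helper `identity_dimension_ddl`, the key name `dim1_sk`, and the column types are hypothetical.

```python
def identity_dimension_ddl(table_name, key_name, columns):
    """Return a CREATE TABLE statement whose first column is an identity surrogate key.

    columns is a list of (column_name, sql_type) pairs for the remaining columns.
    """
    # Assumption: the target runtime accepts Delta identity column syntax.
    column_lines = [f"  {key_name} BIGINT GENERATED ALWAYS AS IDENTITY"]
    column_lines += [f"  {name} {sql_type}" for name, sql_type in columns]
    body = ",\n".join(column_lines)
    return f"CREATE TABLE IF NOT EXISTS {table_name} (\n{body}\n) USING DELTA"


# Hypothetical column list for dim1; adjust names and types to the real schema.
ddl = identity_dimension_ddl(
    "dim1",
    "dim1_sk",
    [("key_column", "STRING"), ("column2", "STRING")],
)
print(ddl)
```

The resulting string could then be passed to `spark.sql(ddl)` before the stream starts writing, so the table exists with its surrogate key before the first batch arrives.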

In [None]:
from pyspark.sql.functions import col, expr

# schema={gold_layer_schema}  specifies the schema of the table (placeholder; set it where the table is defined)

# Read data from the streaming source dim1_cleansed.
# maxFilesPerTrigger caps the number of new files considered in each trigger (default: 1000).
dim1 = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 5)
    .option("ignoreChanges", "true")
    .load("{Delta table path for dim1_cleansed}")
)

# Read data from the streaming source dim2_cleansed.
# maxFilesPerTrigger caps the number of new files considered in each trigger (default: 1000).
dim2 = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 5)
    .option("ignoreChanges", "true")
    .load("{Delta table path for dim2_cleansed}")
)

# Read data from the streaming source fact_cleansed.
# maxFilesPerTrigger caps the number of new files considered in each trigger (default: 1000).
fact_cleansed = (
    spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 5)
    .option("ignoreChanges", "true")
    .load("{Delta table path for fact_cleansed}")
)

######## Create the fact DataFrame after the joins and transformations ########

# Alias each stream so that the qualified column names used in select() resolve.
fact_df = (
    fact_cleansed.alias("fact_cleansed")
    .join(dim1.alias("dim1"), col("fact_cleansed.key_column") == col("dim1.key_column"), "inner")
    .join(dim2.alias("dim2"), col("fact_cleansed.key_column") == col("dim2.key_column"), "inner")
    # select() picks the output columns; col(), alias(), and expr() derive the new ones.
    .select(
        "fact_cleansed.column1",
        "dim1.column2",
        "dim2.column3",
        col("fact_cleansed.old_column_name").cast("date").alias("new_column_name"),
        "fact_cleansed.column4",
        "fact_cleansed.column5",
        # Each output column needs a unique name; computed_column_name is a placeholder.
        expr("fact_cleansed.column4 * fact_cleansed.column5").alias("computed_column_name"),
    )
)
    
# Write fact_df into the gold Delta table "fact" (/data/delta/fact).
# A checkpoint directory/location is required to track the streaming updates.
# If not specified, a default checkpoint directory is created at /local_disk0/tmp/.
(
    fact_df.writeStream.format("delta")
    .outputMode("append")
    .option("mergeSchema", "true")
    .trigger(processingTime="30 seconds")
    .option("checkpointLocation", "</data/delta/fact_checkpoint_path>")
    .toTable("fact")
)

###########End of File ##################