## Create Dimension and Fact tables (Gold layer)

The Python code imports the **dlt** library and the **col** and **expr** functions from the **pyspark.sql.functions** module.

The **dlt** library (Delta Live Tables) is used to define tables and to read other datasets defined in the same pipeline.

In [None]:
import dlt
from pyspark.sql.functions import col, expr

The code below defines two dimension tables, **dimension1** and **dimension2**, using the **@dlt.table** decorator. The decorator accepts an optional **schema** argument, given as a SQL DDL string or a PySpark **StructType**; both tables use it to declare their schemas. By default, a table takes its name from the decorated function.

Each dimension table is populated by reading the corresponding cleansed dataset with the **dlt.read** function.

In [None]:
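@dlt.table(
    # Placeholder: replace with the schema argument for this table.
    {Dimension table 1 Schema}
)
def dimension1():
    # The function name becomes the table name, so it must match dlt.read("dimension1").
    return dlt.read("ToBeDimension1_cleansed")

@dlt.table(
    # Placeholder: replace with the schema argument for this table.
    {Dimension table 2 Schema}
)
def dimension2():
    return dlt.read("ToBeDimension2_cleansed")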

The **fact_table** is defined by joining **dimension1**, **dimension2**, and the cleansed fact dataset with the **join** function, then selecting the required columns with the **select** function. The **col** function is used to cast a column to the date type, and the **expr** function to multiply two columns; each result is given a new name with **alias**. Because no schema is passed to the decorator, **fact_table** infers its schema from the query result.

In [None]:
@dlt.table
def fact_table():
    s = dlt.read("dimension1").alias("s")
    p = dlt.read("dimension2").alias("p")
    c = dlt.read("ToBeFact_cleansed").alias("c")
    return (
        s.join(p, s.key_column == p.key_column, "inner")
        .join(c, s.key_column == c.key_column, "inner")
        .select(
            "s.column1",
            "c.column2",
            "p.column3",
            # Derived columns need distinct names; duplicate column names are rejected on write.
            col("s.old_column_name").cast("date").alias("new_column_name1"),
            "s.column4",
            "s.column5",
            expr("s.column4 * s.column5").alias("new_column_name2"),
        )
    )
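
To make the join and the two derived columns concrete, here is a plain-Python sketch of what the query computes on a few in-memory rows. It does not use Spark; the sample rows, the `build_fact_rows` helper, and the output names `date_column` and `product_column` are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical rows standing in for dimension1, dimension2, and the cleansed fact data.
dimension1_rows = [
    {"key_column": 1, "column1": "a", "old_column_name": "2023-01-15",
     "column4": 3, "column5": 2.5},
    {"key_column": 2, "column1": "b", "old_column_name": "2023-02-01",
     "column4": 4, "column5": 1.0},
]
dimension2_rows = [{"key_column": 1, "column3": "x"}]
fact_source_rows = [
    {"key_column": 1, "column2": "m"},
    {"key_column": 3, "column2": "n"},
]


def build_fact_rows(s_rows, p_rows, c_rows):
    """Inner-join the three inputs on key_column and add the derived columns."""
    result = []
    for s in s_rows:
        for p in p_rows:
            if s["key_column"] != p["key_column"]:
                continue  # inner join: drop rows without a match in dimension2
            for c in c_rows:
                if s["key_column"] != c["key_column"]:
                    continue  # inner join: drop rows without a match in the fact data
                result.append({
                    "column1": s["column1"],
                    "column2": c["column2"],
                    "column3": p["column3"],
                    # Counterpart of casting a string column to a date.
                    "date_column": date.fromisoformat(s["old_column_name"]),
                    "column4": s["column4"],
                    "column5": s["column5"],
                    # Counterpart of multiplying two columns.
                    "product_column": s["column4"] * s["column5"],
                })
    return result


fact_rows = build_fact_rows(dimension1_rows, dimension2_rows, fact_source_rows)
```

Only the key present in all three inputs survives the inner joins, so `fact_rows` holds a single row here. Keys 2 and 3 are dropped because each is missing from at least one input.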
     
     