## Transformations (Silver layer)

The silver layer is an intermediate layer where data is refined, cleansed, and transformed for analysis.
Here we load the raw tables, transform them, and write the results to their respective Delta tables (`dim1_cleansed`, `dim2_cleansed`, and `fact_cleansed`).


In [None]:
# Replace the {Schema} placeholder with the schema you want for the silver layer, if it should differ from the source structure.
new_schema = {Schema}
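
As an illustration of what could replace the `{Schema}` placeholder, here is a minimal sketch that assumes the JSON column holds an array of objects. The field names `item_id` and `item_name` are hypothetical and do not come from the source data. `from_json` accepts a DDL-formatted string as well as a `StructType` or `ArrayType` object, and the later `explode` step needs the parsed column to be an array (or map) type, so the sketch uses an array of structs.

```python
# Hypothetical example: the field names and types below are placeholders.
# DDL-formatted schema string describing a JSON array of objects.
new_schema = "ARRAY<STRUCT<item_id: INT, item_name: STRING>>"
```

A string schema like this keeps the cell free of extra imports; an equivalent `ArrayType(StructType([...]))` object from `pyspark.sql.types` should work the same way.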

In [None]:
from pyspark.sql.functions import * # used for data manipulation and transformation in Spark SQL

In [None]:
df1 = (
    spark.read.table("Dim1_raw")  # read the source table from the bronze layer
    .select("*")  # select all columns from the source DataFrame
    .withColumnRenamed("OldColumnName", "New_Column_Name")  # rename a column
    .withColumn("column_name", regexp_replace("column_name", "string_value", "new_string_value"))  # replace text matching a regex pattern with another string
    .withColumn("datetime_column", from_unixtime("formatted_datetime_column"))  # convert Unix epoch seconds to a timestamp string; the source column must hold epoch seconds
    .withColumn("column_name", from_json(col("column_name"), new_schema))  # parse a JSON string column using new_schema
    .withColumn("column_name", explode("column_name"))  # explode the array into one row per element
    {Transformations}  # placeholder for any additional transformations you want to apply to the data
)
# Create a Delta table from the DataFrame
df1.write.format("delta").saveAsTable("Dim1_cleansed")
    


In [None]:
df2 = (
    spark.read.table("Dim2_raw")  # read the source table from the bronze layer
    .select("*")  # select all columns from the source DataFrame
    .withColumnRenamed("OldColumnName", "New_Column_Name")  # rename a column
    .withColumn("datetime_column", from_unixtime("formatted_datetime_column"))  # convert Unix epoch seconds to a timestamp string; the source column must hold epoch seconds
    .withColumn("column_name", regexp_replace("column_name", "string_value", "new_string_value"))  # replace text matching a regex pattern with another string
    .withColumn("column_name", from_json(col("column_name"), new_schema))  # parse a JSON string column using new_schema
    .withColumn("column_name", explode("column_name"))  # explode the array into one row per element
    {Transformations}  # placeholder for any additional transformations you want to apply to the data
)
# Create a Delta table from the DataFrame
df2.write.format("delta").saveAsTable("Dim2_cleansed")

In [None]:
df3 = (
    spark.read.table("Fact_raw")  # read the source table from the bronze layer
    .select("*")  # select all columns from the source DataFrame
    .withColumnRenamed("OldColumnName", "New_Column_Name")  # rename a column
    .withColumn("datetime_column", from_unixtime("formatted_datetime_column"))  # convert Unix epoch seconds to a timestamp string; the source column must hold epoch seconds
    .withColumn("column_name", regexp_replace("column_name", "string_value", "new_string_value"))  # replace text matching a regex pattern with another string
    .withColumn("column_name", from_json(col("column_name"), new_schema))  # parse a JSON string column using new_schema
    .withColumn("column_name", explode("column_name"))  # explode the array into one row per element
    {Transformations}  # placeholder for any additional transformations you want to apply to the data
)
# Create a Delta table from the DataFrame (df3, the fact DataFrame built above)
df3.write.format("delta").saveAsTable("Fact_cleansed")

> Note: All the transformations you apply to a table must be consistent with the new schema that you declared at the beginning.
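
One way to check that a transformed table lines up with the schema you intended is to compare column names and types before writing. The sketch below is a hypothetical helper, not part of the template: `schema_mismatches`, `expected_columns`, and the sample column names are all assumptions for illustration. It works on `(name, type)` pairs, which is the shape returned by a DataFrame's `dtypes` attribute, so it runs here on stand-in data without a Spark session.

```python
def schema_mismatches(expected, actual_dtypes):
    """Describe differences between expected columns and actual (name, type) pairs.

    expected: dict mapping column name -> Spark type string, e.g. {"id": "int"}
    actual_dtypes: list of (name, type) tuples, as returned by DataFrame.dtypes
    """
    actual = dict(actual_dtypes)
    problems = []
    # Columns that are expected but missing or of the wrong type
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(
                f"type mismatch for {name}: expected {dtype}, found {actual[name]}"
            )
    # Columns that are present but were not expected
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical expected columns for a cleansed table (placeholder names and types).
expected_columns = {
    "id": "int",
    "New_Column_Name": "string",
    "datetime_column": "string",
}

# Stand-in for df1.dtypes so the sketch runs without Spark.
sample_dtypes = [
    ("id", "int"),
    ("New_Column_Name", "string"),
    ("datetime_column", "timestamp"),
]

print(schema_mismatches(expected_columns, sample_dtypes))
```

In the notebook, `sample_dtypes` would be replaced by `df1.dtypes` (or the `dtypes` of the other DataFrames), and an empty result would suggest the transformations agree with the expected columns.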