## ✅ List of Operations Performed

1. Load data from the source table `workspace.de_practice_source.sales` using Spark. — [Supporting Document (Azure - `read.table`)](https://learn.microsoft.com/en-us/azure/databricks/data/tables#read-tables)
2. Display the loaded DataFrame.


In [0]:
# Load data from the source table
source = spark.read.table('workspace.de_practice_source.sales')
source.display()

- Primary Key: `fransiseid`
- Hash Key: `RowHash` (created in a later step)
- Created Date: `CreatedDate` (added in a later step)
- Modified Date: `ModifiedDate` (added in a later step)
- Current-record indicator: `IndCurrent`, 0 (not active) or 1 (active)

## ✅ List of Operations Performed

1. Start from the `source` DataFrame loaded in the previous step.
2. Concatenate all columns into a single column named `ConCatValue` using no delimiter. — [Supporting Document (Azure - `concat_ws` Function)](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/concat_ws)



In [0]:
from pyspark.sql import functions as F

# Concatenate all columns into 'ConCatValue' with an empty separator (concat_ws skips NULL values)
source = source.withColumn('ConCatValue', F.concat_ws('', *source.columns))
display(source)
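
One caveat of concatenating without a delimiter is that different rows can collapse into the same string, and therefore the same hash later on. The plain-Python sketch below (no Spark needed) illustrates the effect. `concat_values` is a hypothetical helper that mimics how `concat_ws` skips NULL values, and the `||` delimiter and sample rows are assumptions made for illustration only.

```python
def concat_values(values, sep=""):
    """Join non-null values as strings, skipping None the way concat_ws skips NULLs."""
    return sep.join(str(v) for v in values if v is not None)

# Two different rows (sample data, not from the sales table)
row_a = ("ab", "c")
row_b = ("a", "bc")

no_delim_a = concat_values(row_a)
no_delim_b = concat_values(row_b)
with_delim_a = concat_values(row_a, sep="||")
with_delim_b = concat_values(row_b, sep="||")

print(no_delim_a == no_delim_b)      # True: both become "abc"
print(with_delim_a == with_delim_b)  # False: "ab||c" vs "a||bc"

# Skipped NULLs can still hide a difference, even with a delimiter
shifted_a = concat_values(("x", None), sep="||")
shifted_b = concat_values((None, "x"), sep="||")
print(shifted_a == shifted_b)        # True: both become "x"
```

If such collisions matter for the data at hand, a delimiter plus a placeholder for NULL values before concatenation is one possible way to keep rows distinguishable.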

## ✅ List of Operations Performed

1. Add a new column `IndCurrent` with a constant value of `1`. — [Supporting Document (`lit` Function)](https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.functions.lit.html)
2. Add a new column `CreatedDate` with the current timestamp. — [Supporting Document (Azure - `current_timestamp`)](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/current_timestamp)
3. Add a new column `ModifiedDate` with the current timestamp. — [Supporting Document (Azure - `current_timestamp`)](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/current_timestamp)


In [0]:
# Add IndCurrent, CreatedDate, and ModifiedDate columns
source = source.withColumn("IndCurrent", F.lit(1)) \
    .withColumn("CreatedDate", F.current_timestamp()) \
    .withColumn("ModifiedDate", F.current_timestamp())

In [0]:
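from pyspark.sql.window import Window

# Assign a sequential storage_id (1, 2, 3, ...) to each row.
# Note: a window without partitionBy moves all rows to a single partition,
# so this step can be slow on large tables.
window_spec = Window.orderBy(F.monotonically_increasing_id())
source = source.withColumn("storage_id", F.row_number().over(window_spec))

# Move storage_id to the front of the column list
first_cols = ["storage_id"]
other_cols = [col for col in source.columns if col not in first_cols]
source = source.select(first_cols + other_cols)

display(source)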

## ✅ List of Operations Performed

1. Generate SHA-256 hash from the `ConCatValue` column. — [Supporting Document (Azure)](https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/sha2)
2. Create a new column named `RowHash` to store the hash. — [Supporting Document (Azure)](https://learn.microsoft.com/en-us/azure/databricks/pyspark/basics#create-columns)
3. Drop the intermediate column `ConCatValue`. — [Supporting Document (Azure)](https://learn.microsoft.com/en-us/azure/databricks/pyspark/basics#remove-columns)

In [0]:
# Generate SHA-256 hash of concatenated column values and drop 'ConCatValue'
source = source.withColumn("RowHash", F.sha2(F.col("ConCatValue"), 256)).drop('ConCatValue')
display(source)
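
For intuition about what `RowHash` holds, the following plain-Python sketch computes a SHA-256 hex digest with the standard `hashlib` module. The `row_hash` helper and the sample strings are made up for illustration; the point is that the digest is a fixed-length 64-character hex string and that a one-character change in the input yields a different value.

```python
import hashlib

def row_hash(concatenated: str) -> str:
    """Return the SHA-256 hex digest of a concatenated row string."""
    return hashlib.sha256(concatenated.encode("utf-8")).hexdigest()

# Hypothetical concatenated values for two versions of the same row
h1 = row_hash("1001Widget25.00")
h2 = row_hash("1001Widget25.01")

print(len(h1))   # 64
print(h1 == h2)  # False
```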


## ✅ List of Operations Performed

1. Write data from the source DataFrame to the target table `workspace.de_practice_target.sales` using append mode. — [Supporting Document (Azure - `saveAsTable`)](https://learn.microsoft.com/en-us/azure/databricks/data/tables#write-to-tables)


In [0]:
# Append the DataFrame to the target table
source.write.mode("append").saveAsTable("workspace.de_practice_target.sales")
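
The notebook stops at the initial append and does not show how `RowHash` and `IndCurrent` would be used on later loads. As a rough, assumption-laden sketch of one common use, the plain-Python example below compares key-to-hash mappings to separate new, changed, and unchanged rows. The `classify_rows` helper and the sample keys and hashes are hypothetical and are not part of the notebook.

```python
def classify_rows(current_hashes, incoming_hashes):
    """Split incoming keys into new, changed, and unchanged by comparing hashes."""
    new, changed, unchanged = [], [], []
    for key, incoming_hash in incoming_hashes.items():
        if key not in current_hashes:
            new.append(key)            # key not yet in the target
        elif current_hashes[key] != incoming_hash:
            changed.append(key)        # same key, different content
        else:
            unchanged.append(key)      # identical row, nothing to write
    return new, changed, unchanged

# Hypothetical primary key -> RowHash mappings
current = {1: "aaa", 2: "bbb"}
incoming = {1: "aaa", 2: "ccc", 3: "ddd"}

new, changed, unchanged = classify_rows(current, incoming)
print(new, changed, unchanged)  # [3] [2] [1]
```

Under this assumption, changed keys would be the ones whose existing target rows get `IndCurrent` set to 0 before the new version is appended with `IndCurrent` set to 1.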