# Transactions Transformation - Silver Layer

## Summary
- Purpose: Clean and transform transaction records from the Bronze layer into Silver-level normalized transactions ready for downstream joins and aggregation.
- Inputs: `<catalog>.bronze.transactions` Delta table (raw ingested records; the `catalog` widget defaults to `capstone`)
- Outputs: `<catalog>.silver.transactions` Delta table (cleaned and typed)
- Audit: Calls `audit_log(spark, table_name, log_path)` after the write to record operation metadata under `/Volumes/<catalog>/_meta/history`.

## Key Transformations
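- Drop the `_rescue` column
- Filter out corrupt records using `corrupted_flag`
- Normalize `order_timestamp` to UTC
- Clean the currency column (`price`) and the numeric column (`quantity`)
- Cast `price` to `decimal(10,2)` and `quantity` to `int`
- Select the Silver output columns (`corrupted_flag` is not carried forward)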
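The helpers in `capstone_pipeline.main` are not shown in this notebook. As a rough, plain-Python illustration of the per-value rules the list describes, the sketch below uses hypothetical function names and assumed behavior: `corrupted_flag` holds true/1/yes-like values, timestamps arrive as ISO-8601 strings with naive values already in UTC, and prices look like `"$1,234.50"`. It is not the module's actual implementation.

```python
import re
from datetime import datetime, timezone
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation
from typing import Any, Optional


def is_corrupt(flag: Any) -> bool:
    """Assumed rule: treat true/1/yes-like values of corrupted_flag as corrupt."""
    return str(flag).strip().lower() in {"true", "1", "y", "yes"}


def to_utc(raw: str) -> datetime:
    """Parse an ISO-8601 string; naive values are assumed to already be UTC."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def clean_currency(raw: Optional[str]) -> Optional[Decimal]:
    """Drop currency symbols and thousands separators, then round half-up to 2 places."""
    if raw is None:
        return None
    stripped = re.sub(r"[^0-9.\-]", "", raw)
    try:
        return Decimal(stripped).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    except InvalidOperation:
        return None  # nothing numeric left, e.g. "N/A"


def clean_digits(raw: Optional[str]) -> Optional[int]:
    """Keep digit characters only; deliberately naive, so "3.5" would become 35."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", raw)
    return int(digits) if digits else None
```

For example, `clean_currency("$1,234.505")` returns `Decimal("1234.51")` and `to_utc("2024-03-01T10:00:00+02:00")` returns 08:00 UTC on the same day.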

## Usage
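- Run in Databricks. Set the `catalog` widget (default `capstone`), and make sure the Bronze `transactions` table exists and the `/Volumes/<catalog>/_meta/history` location used for `log_path` is available.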
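A minimal pre-flight check for the requirements above might look like the sketch below. `missing_inputs` is a hypothetical helper, and it assumes the cluster's `spark.catalog.tableExists` accepts a catalog-qualified `catalog.schema.table` name (supported from PySpark 3.4).

```python
from typing import Any, List


def missing_inputs(spark: Any, catalog: str) -> List[str]:
    """Return required input tables that do not exist; an empty list means the notebook can run."""
    required = [f"{catalog}.bronze.transactions"]
    # Assumes spark.catalog.tableExists accepts a three-part name (PySpark 3.4+)
    return [name for name in required if not spark.catalog.tableExists(name)]


# Hypothetical usage at the top of the notebook:
# missing = missing_inputs(spark, dbutils.widgets.get("catalog"))
# if missing:
#     raise RuntimeError(f"Missing Bronze inputs: {missing}")
```

Failing early this way surfaces a clear message instead of an `AnalysisException` deep inside the transformation cell.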


In [None]:
dbutils.widgets.text("catalog", "capstone", "Enter the Catalog: ")

In [None]:
from pyspark.sql.functions import col
from capstone_pipeline.main import (
    audit_log,
    fiter_corrupt_records,  # (sic) spelled as imported from capstone_pipeline.main
    transform_clean_currency,
    transform_clean_digits,
    transform_clean_timestamp,
)


catalog = dbutils.widgets.get("catalog")
table_name = f"{catalog}.silver.transactions"   # Silver output table
log_path = f"/Volumes/{catalog}/_meta/history"  # audit log location (Unity Catalog volume)

In [None]:
dftransactions = spark.table(f'{dbutils.widgets.get("catalog")}.bronze.transactions')
display(dftransactions.columns)

# Expected columns: "order_id", "item_id", "quantity", "price", "order_timestamp", "corrupted_flag", "_ingest_timestamp", "_source_file_name"
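The column list in the comment above can be turned into an explicit guard so that a schema change in Bronze fails loudly. The sketch treats that list as the expected contract (an assumption) and uses a hypothetical helper, `missing_columns`, that works on any list of names such as `dftransactions.columns`.

```python
from typing import Iterable, List, Sequence

# Taken from the column comment above; treated here as the expected Bronze contract (assumption)
EXPECTED_BRONZE_COLUMNS = (
    "order_id", "item_id", "quantity", "price", "order_timestamp",
    "corrupted_flag", "_ingest_timestamp", "_source_file_name",
)


def missing_columns(actual: Iterable[str], expected: Sequence[str] = EXPECTED_BRONZE_COLUMNS) -> List[str]:
    """Return expected columns absent from `actual`, preserving the expected order."""
    present = set(actual)
    return [c for c in expected if c not in present]


# Hypothetical guard in the notebook:
# gaps = missing_columns(dftransactions.columns)
# if gaps:
#     raise ValueError(f"Bronze transactions is missing columns: {gaps}")
```

Extra columns such as `_rescue` are ignored, since only absent columns matter for the downstream `select`.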

In [None]:
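# Drop the _rescue column, remove corrupt rows, normalize timestamps,
# clean price/quantity values, cast them to their final types, and keep only Silver columns.
dftransactions_cleaned = (
    dftransactions
    .drop("_rescue")  # no-op if the column is absent
    .transform(fiter_corrupt_records, "corrupted_flag")
    .transform(transform_clean_timestamp, "order_timestamp")
    .transform(transform_clean_currency, "price")
    .transform(transform_clean_digits, "quantity")
    .withColumn("price", col("price").cast("decimal(10,2)"))
    .withColumn("quantity", col("quantity").cast("int"))
    # corrupted_flag is intentionally not carried into Silver
    .select(
        "order_id", "item_id", "quantity", "price", "order_timestamp",
        "_ingest_timestamp", "_source_file_name",
    )
)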
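The `decimal(10,2)` cast above allows 10 digits in total with 2 after the point, so any price whose absolute value is 100,000,000 or more cannot be stored. Assuming ANSI mode is off (`spark.sql.ansi.enabled=false`), Spark returns null for such values instead of failing and rounds half-up when reducing scale; with ANSI mode on, overflow raises an error. The plain-Python sketch below, using a hypothetical helper `emulate_decimal_cast`, mimics that non-ANSI behavior to show which cleaned prices would survive.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


def emulate_decimal_cast(value: Decimal, precision: int = 10, scale: int = 2) -> Optional[Decimal]:
    """Approximate a non-ANSI Spark cast to decimal(precision, scale): round half-up, None on overflow."""
    rounded = value.quantize(Decimal(10) ** -scale, rounding=ROUND_HALF_UP)
    # decimal(10,2) leaves 10 - 2 = 8 digits before the point, so |x| must stay below 10**8
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        return None
    return rounded
```

If nulls appear in `price` after the write, counting them against the Bronze row count is a quick way to tell overflow from cleaning failures.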


In [None]:
(dftransactions_cleaned
    .write
    .format("delta")
    .mode("overwrite")
    .saveAsTable(table_name))


In [None]:
audit_log(spark, table_name, log_path)