### SILVER LAYER LOAD


**Purpose:**  
This notebook performs the **Silver layer transformation**, converting raw Bronze-level farm datasets into standardized, clean, and ready-to-use Silver tables. These tables serve as the foundation for Gold layer aggregation, AI insights, and downstream analytics.


**Workflow:**  
1. **Load Bronze tables** (soil, crop, market, pest, rainfall) from Delta Lake.  
2. **Standardize column names** to camelCase for consistency.  
3. **Clean string fields** (trim whitespace, lowercase city and crop names).  
4. **Standardize the `date` column** (when present) to a proper date type.  
5. **Unify city and crop names** across all tables for proper joins.  
6. **Write cleaned Silver tables** to Delta Lake (overwrite mode).  
7. **Audit logging**:  
   - Record start/end timestamps, task name, status (SUCCESS/FAILED), and messages in a Delta audit table.
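
Step 2 can be illustrated in plain Python, without Spark. The sketch below mirrors the `to_camel_case` helper defined later in this notebook. The sample column names are hypothetical and are not taken from the Bronze tables.

```python
def to_camel_case(col_name):
    """Convert a snake_case or space-separated column name to camelCase."""
    parts = col_name.replace(" ", "_").split("_")
    return parts[0].lower() + "".join(p.title() for p in parts[1:])


# Hypothetical column names, chosen only to illustrate the behavior.
samples = ["crop_name", "Market Price", "CITY", "cropName"]
renamed = {name: to_camel_case(name) for name in samples}

for original, new in renamed.items():
    print(f"{original!r} -> {new!r}")
```

A name that is already camelCase, such as `cropName`, contains no separator and is lowercased as a whole to `cropname`. The later `cropName` check would then not match it, so it may be worth confirming how the Bronze columns are actually named.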

**Notes:**  
- Update `catalog_name`, `schema_name_bronze`, `schema_name`, and `audit_schema_name` to match your environment, and pass `workflow_job_id` through the notebook widget of the same name.  
- If any Bronze table is missing, the notebook logs a `FAILED` status in the audit table and re-raises the error.  
- These Silver tables feed directly into **Gold layer aggregation** for advanced analytics and AI farm advisory.
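
The audit record described in step 7 can be sketched without Spark as a plain tuple in the column order of the audit table. `build_audit_row`, `MAX_MESSAGE_LEN`, and the job id `job-123` are hypothetical names introduced for illustration. The 4000-character cap mirrors the truncation this notebook applies to the traceback on failure.

```python
import traceback
from datetime import datetime, timezone

MAX_MESSAGE_LEN = 4000  # hypothetical constant; mirrors the notebook's [:4000] slice


def build_audit_row(workflow_job_id, task, status, start_ts, end_ts, message=""):
    """Return one audit record as a tuple, truncating long messages."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError(f"unexpected status: {status}")
    return (workflow_job_id, task, status, start_ts, end_ts, message[:MAX_MESSAGE_LEN])


start_ts = datetime.now(timezone.utc)
try:
    raise RuntimeError("bronze_soil not found")  # simulated failure
except RuntimeError:
    failed_row = build_audit_row(
        "job-123",  # hypothetical job id
        "silver_transform",
        "FAILED",
        start_ts,
        datetime.now(timezone.utc),
        traceback.format_exc(),
    )

print(failed_row[2], len(failed_row[5]) <= MAX_MESSAGE_LEN)
```

In the notebook itself, a tuple of this shape is what `write_audit` passes to `spark.createDataFrame` before appending to the Delta audit table.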

In [0]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim, lower, to_date, current_timestamp
import traceback
import time

In [0]:
%sql
CREATE SCHEMA IF NOT EXISTS databricks_free_edition.databricks_silver;

In [0]:
catalog_name = "databricks_free_edition"
schema_name = "databricks_silver"
schema_name_bronze = "databricks_bronze"
audit_schema_name = "audit_logs"
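# Define the widget with an empty default so interactive runs do not fail;
# a job-supplied parameter of the same name takes precedence.
dbutils.widgets.text("workflow_job_id", "")
workflow_job_id = dbutils.widgets.get("workflow_job_id")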

In [0]:
def write_audit(workflow_job_id, task, status, start_ts, end_ts, message=""):
    """Append one audit record to the Delta audit table."""
    rows = [(workflow_job_id, task, status, start_ts, end_ts, message)]
    schema = (
        "workflow_job_id STRING, task STRING, status STRING, "
        "start_time TIMESTAMP, end_time TIMESTAMP, message STRING"
    )
    spark.createDataFrame(rows, schema=schema).write.format("delta").mode("append").saveAsTable(
        f"{catalog_name}.{audit_schema_name}.error_reporting_audit"
    )

task_name = "silver_transform"
start_ts = spark.sql("SELECT current_timestamp()").collect()[0][0]

try:
    print(f"Starting {task_name} ...")

    # Load Bronze (if missing, this will throw and be captured)
    bronze_soil = spark.table(f"{catalog_name}.{schema_name_bronze}.bronze_soil")
    bronze_crop = spark.table(f"{catalog_name}.{schema_name_bronze}.bronze_crop")
    bronze_market = spark.table(f"{catalog_name}.{schema_name_bronze}.bronze_market")
    bronze_pest = spark.table(f"{catalog_name}.{schema_name_bronze}.bronze_pest")
    bronze_rainfall = spark.table(f"{catalog_name}.{schema_name_bronze}.bronze_rainfall")


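    # Helper: camelCase renamer
    def to_camel_case(col_name):
        """Convert snake_case or space-separated names to camelCase.

        Names without a separator are lowercased as a whole.
        """
        parts = col_name.replace(" ", "_").split("_")
        return parts[0].lower() + "".join(p.title() for p in parts[1:])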

    # Generic cleaning function
    def clean_df(df):
        # rename columns to camelCase
        for c in df.columns:
            df = df.withColumnRenamed(c, to_camel_case(c))
        # trim string columns (city/crop names are lowercased later in std_names)
        # exact match on "string" skips complex types such as array<string>
        string_cols = [f.name for f in df.schema.fields if f.dataType.simpleString() == "string"]
        for sc in string_cols:
            df = df.withColumn(sc, trim(col(sc)))
        # standardize date column if present
        if "date" in df.columns:
            df = df.withColumn("date", to_date(col("date")))
        return df

    soil_silver = clean_df(bronze_soil)
    crop_silver = clean_df(bronze_crop)
    market_silver = clean_df(bronze_market)
    pest_silver = clean_df(bronze_pest)
    rainfall_silver = clean_df(bronze_rainfall)

    # Additional standardization: lowercase city/crop names so join keys match across tables
    def std_names(df):
        if "city" in df.columns:
            df = df.withColumn("city", lower(col("city")))
        if "cropName" in df.columns:
            df = df.withColumn("cropName", lower(col("cropName")))
        return df

    soil_silver = std_names(soil_silver)
    crop_silver = std_names(crop_silver)
    market_silver = std_names(market_silver)
    pest_silver = std_names(pest_silver)
    rainfall_silver = std_names(rainfall_silver)

    # Write Silver tables (overwrite)
    soil_silver.write.format("delta").mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.silver_soil")
    crop_silver.write.format("delta").mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.silver_crop")
    market_silver.write.format("delta").mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.silver_market")
    pest_silver.write.format("delta").mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.silver_pest")
    rainfall_silver.write.format("delta").mode("overwrite").saveAsTable(f"{catalog_name}.{schema_name}.silver_rainfall")


    end_ts = spark.sql("SELECT current_timestamp()").collect()[0][0]
    write_audit(workflow_job_id,task_name, "SUCCESS", start_ts, end_ts, "Silver tables created")
    print("Silver transform completed successfully.")
except Exception as e:
    end_ts = spark.sql("SELECT current_timestamp()").collect()[0][0]
    tb = traceback.format_exc()
    # cap the traceback at 4000 characters to keep the audit message compact
    write_audit(workflow_job_id, task_name, "FAILED", start_ts, end_ts, tb[:4000])
    print("Silver transform failed. See audit table for details.")
    raise
