# Databricks notebook source
# MAGIC %md
# MAGIC # 📘 ADF Ingestion to Silver Pipeline
# MAGIC **Author:** Bruce Jenks  
# MAGIC **Last Updated:** July 7, 2025  
# MAGIC  
# MAGIC This document outlines the process of using **Azure Data Factory (ADF)** to trigger a data pipeline that moves data into **Azure Data Lake**, followed by transformation and promotion into the **Silver layer in Delta Lake** via Databricks.  
# MAGIC  
# MAGIC ---
# MAGIC 
# MAGIC ## 🔁 Overview of the ADF to Databricks Flow
# MAGIC 
# MAGIC 1. **ADF Pipeline Execution**
# MAGIC     - Triggers when a file lands in blob storage (raw or external container).
# MAGIC     - Moves or transforms the file and places it in the **`adf-silver`** container.
# MAGIC 
# MAGIC 2. **Databricks Pipeline**
# MAGIC     - Reads the file from `/mnt/adf-silver/Your_Folder`.
# MAGIC     - Converts it from Parquet (or CSV) to **Delta format**.
# MAGIC     - Writes to `/mnt/delta/silver/your_table_name`.
# MAGIC     - Optionally registers the table in Unity Catalog or the Hive metastore.
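# MAGIC
# MAGIC The path convention in step 2 can be captured in a small helper so every new feed follows the same layout. This is a minimal sketch: `silver_paths` is a hypothetical name, and it assumes the Delta folder is the ADF folder name lower-cased, with any character other than letters, digits, and underscores replaced by `_`. That matches the `Vendor_Registry_Silver` to `vendor_registry_silver` mapping used in this notebook.
# MAGIC
# MAGIC ```python
# MAGIC import posixpath
# MAGIC import re
# MAGIC
# MAGIC # Assumed mount roots, taken from the layout described above.
# MAGIC ADF_SILVER_ROOT = "/mnt/adf-silver"
# MAGIC DELTA_SILVER_ROOT = "/mnt/delta/silver"
# MAGIC
# MAGIC
# MAGIC def silver_paths(adf_folder: str) -> tuple:
# MAGIC     """Return (input_path, output_path) for one ADF output folder (hypothetical helper)."""
# MAGIC     folder = adf_folder.strip("/")
# MAGIC     if not folder:
# MAGIC         raise ValueError("adf_folder must be a non-empty folder name")
# MAGIC     input_path = posixpath.join(ADF_SILVER_ROOT, folder)
# MAGIC     # Replace characters that are awkward in table names (e.g. "-") with "_", then lower-case.
# MAGIC     table_dir = re.sub(r"[^0-9A-Za-z_]", "_", folder).lower()
# MAGIC     output_path = posixpath.join(DELTA_SILVER_ROOT, table_dir)
# MAGIC     return input_path, output_path
# MAGIC
# MAGIC
# MAGIC print(silver_paths("Vendor_Registry_Silver"))
# MAGIC # ('/mnt/adf-silver/Vendor_Registry_Silver', '/mnt/delta/silver/vendor_registry_silver')
# MAGIC ```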
# MAGIC 
# MAGIC ---

# COMMAND ----------

# MAGIC %md
# MAGIC ## 📥 ADF Setup Notes
# MAGIC - **Linked Services:** Azure Blob Storage & Databricks
# MAGIC - **Dataset Format:** Parquet or CSV from ADF
# MAGIC - **Sink Path:** `https://datalakelv426.blob.core.windows.net/adf-silver/<your-folder>`
# MAGIC - **Debugging:** Use `Display Output` in ADF to confirm destination path.
# MAGIC 
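# MAGIC ✅ **Tip:** To name output files dynamically, build the name with an ADF expression such as `@concat(dataset().path, '_', formatDateTime(utcnow(), 'yyyyMMddHHmmss'))`. ADF expressions join strings with `concat()`, not `+`.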
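# MAGIC
# MAGIC When debugging, it helps to translate the ADF sink URL into the path Databricks actually reads. The sketch below assumes each container is mounted at `/mnt/<container>` (as `adf-silver` is mounted at `/mnt/adf-silver` in this notebook); `blob_url_to_mount_path` is a hypothetical helper, not part of ADF or Databricks.
# MAGIC
# MAGIC ```python
# MAGIC from urllib.parse import urlparse
# MAGIC
# MAGIC # Blob and ADLS Gen2 endpoints; assumption: containers are mounted at <mount_root>/<container>.
# MAGIC BLOB_HOST_SUFFIXES = (".blob.core.windows.net", ".dfs.core.windows.net")
# MAGIC
# MAGIC
# MAGIC def blob_url_to_mount_path(url: str, mount_root: str = "/mnt") -> str:
# MAGIC     """Map https://<account>.blob.core.windows.net/<container>/<folder> to a DBFS mount path."""
# MAGIC     parsed = urlparse(url)
# MAGIC     if parsed.scheme != "https" or not parsed.netloc.endswith(BLOB_HOST_SUFFIXES):
# MAGIC         raise ValueError(f"not an Azure Storage URL: {url}")
# MAGIC     parts = [p for p in parsed.path.split("/") if p]
# MAGIC     if not parts:
# MAGIC         raise ValueError(f"URL has no container: {url}")
# MAGIC     container, rest = parts[0], parts[1:]
# MAGIC     return "/".join([mount_root.rstrip("/"), container, *rest])
# MAGIC
# MAGIC
# MAGIC print(blob_url_to_mount_path(
# MAGIC     "https://datalakelv426.blob.core.windows.net/adf-silver/Vendor_Registry_Silver"
# MAGIC ))
# MAGIC # /mnt/adf-silver/Vendor_Registry_Silver
# MAGIC ```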

# COMMAND ----------

# MAGIC %md
# MAGIC ## 📂 Mount Check (optional)
# MAGIC Confirm that `/mnt/adf-silver` is mounted and accessible:

# COMMAND ----------

display(dbutils.fs.ls("/mnt/adf-silver"))
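
# COMMAND ----------

# MAGIC %md
# MAGIC To fail fast instead of eyeballing the listing, a notebook can check the mount table before reading. This is a minimal sketch: `is_mounted` is a hypothetical helper that takes the `mountPoint` strings from `dbutils.fs.mounts()`, so the logic itself can be tested without a cluster.
# MAGIC
# MAGIC ```python
# MAGIC from typing import Iterable
# MAGIC
# MAGIC
# MAGIC def is_mounted(mount_points: Iterable[str], target: str) -> bool:
# MAGIC     """True if target (e.g. "/mnt/adf-silver") is among the given mount points.
# MAGIC
# MAGIC     Trailing slashes are ignored, so "/mnt/adf-silver/" also matches.
# MAGIC     """
# MAGIC     wanted = target.rstrip("/")
# MAGIC     return any(mp.rstrip("/") == wanted for mp in mount_points)
# MAGIC
# MAGIC
# MAGIC # In a Databricks cell (dbutils only exists on a cluster):
# MAGIC #   mount_points = [m.mountPoint for m in dbutils.fs.mounts()]
# MAGIC #   if not is_mounted(mount_points, "/mnt/adf-silver"):
# MAGIC #       raise RuntimeError("/mnt/adf-silver is not mounted; run the mount setup first")
# MAGIC ```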

# COMMAND ----------

# MAGIC %md
# MAGIC ## 🔄 Sample Code to Promote to Silver Layer
# MAGIC This example reads the Parquet output that ADF wrote to `/mnt/adf-silver/Vendor_Registry_Silver` and writes it to the Silver layer as a Delta table:

# COMMAND ----------

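from pyspark.sql import SparkSession
from write_utils import write_df_to_delta  # project helper already in this repo

# In Databricks `spark` already exists; getOrCreate() returns that session.
spark = SparkSession.builder.getOrCreate()

input_path = "/mnt/adf-silver/Vendor_Registry_Silver"     # folder written by the ADF pipeline
output_path = "/mnt/delta/silver/vendor_registry_silver"  # Delta location in the Silver layer

# ADF wrote Parquet here; CSV output would need format("csv") plus header/schema options.
df = spark.read.format("parquet").load(input_path)

# Flags below are those of the repo's write_utils helper:
# overwrite the target, merge schema changes, and register the table in the metastore.
write_df_to_delta(
    df,
    path=output_path,
    mode="overwrite",
    merge_schema=True,
    register_table=True,
    verbose=True
)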
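
# COMMAND ----------

# MAGIC %md
# MAGIC Since ADF may deliver CSV instead of Parquet, one option is to keep the reader options in one place. This sketch assumes CSV files from ADF include a header row; `reader_options` is a hypothetical helper. `inferSchema` costs an extra pass over the data, so an explicit schema is preferable for large or production loads.
# MAGIC
# MAGIC ```python
# MAGIC def reader_options(fmt: str) -> dict:
# MAGIC     """Spark DataFrameReader options for the file format ADF produced (hypothetical helper)."""
# MAGIC     fmt = fmt.lower()
# MAGIC     if fmt == "parquet":
# MAGIC         return {}  # Parquet files carry their own schema
# MAGIC     if fmt == "csv":
# MAGIC         # Assumes ADF writes a header row; inferSchema scans the data an extra time.
# MAGIC         return {"header": "true", "inferSchema": "true"}
# MAGIC     raise ValueError(f"unsupported format: {fmt!r}")
# MAGIC
# MAGIC
# MAGIC # In the promotion cell (requires Spark):
# MAGIC #   fmt = "csv"
# MAGIC #   df = spark.read.format(fmt).options(**reader_options(fmt)).load(input_path)
# MAGIC ```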

# COMMAND ----------

# MAGIC %md
# MAGIC ## 🧪 Validation
# MAGIC After loading to Delta, confirm that the table exists and can be queried:
# MAGIC 
# MAGIC ```sql
# MAGIC SELECT * FROM vendor_registry_silver LIMIT 10
# MAGIC ```
# MAGIC 
# MAGIC Or confirm with Python:
# MAGIC 
# MAGIC ```python
# MAGIC spark.sql("SELECT COUNT(*) FROM vendor_registry_silver").show()
# MAGIC ```
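# MAGIC
# MAGIC Because the load uses `mode="overwrite"`, the Silver table should hold exactly as many rows as the DataFrame read from ADF. A minimal sketch of that check, with `check_row_counts` as a hypothetical helper; for append or merge loads the expected count differs, so this comparison would not apply as written.
# MAGIC
# MAGIC ```python
# MAGIC def check_row_counts(source_count: int, target_count: int) -> None:
# MAGIC     """Raise ValueError if the Silver table and the source DataFrame differ in size."""
# MAGIC     if source_count != target_count:
# MAGIC         raise ValueError(
# MAGIC             f"row count mismatch: source={source_count}, silver={target_count}"
# MAGIC         )
# MAGIC
# MAGIC
# MAGIC # In Databricks, reusing df from the promotion cell:
# MAGIC #   check_row_counts(df.count(), spark.table("vendor_registry_silver").count())
# MAGIC ```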

# COMMAND ----------

# MAGIC %md
# MAGIC ## 📝 Notes and Gotchas
# MAGIC - If the mount isn't working, make sure the `secret_scope_setup` or `azure_key_vault_setup` step has been completed.
# MAGIC - Confirm that the ADF container and folder names match exactly.
# MAGIC - If you get `PATH_NOT_FOUND`, double-check your mount and the blob location in the Azure Portal.
# MAGIC - You can always explore blobs using:
# MAGIC   ```python
# MAGIC   display(dbutils.fs.ls("/mnt/adf-silver"))
# MAGIC   ```
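# MAGIC
# MAGIC Blob names are case-sensitive, so `vendor_registry_silver` and `Vendor_Registry_Silver` are different folders. The hypothetical helper below compares the expected folder with the names returned by `dbutils.fs.ls` (directory names there end with `/`) and flags names that differ only by case:
# MAGIC
# MAGIC ```python
# MAGIC from typing import Iterable
# MAGIC
# MAGIC
# MAGIC def check_folder_name(expected: str, listed_names: Iterable[str]) -> str:
# MAGIC     """Return "exact", "case_mismatch: <actual>", or "missing" (hypothetical helper)."""
# MAGIC     names = [n.rstrip("/") for n in listed_names]  # strip the "/" dbutils adds to directories
# MAGIC     if expected in names:
# MAGIC         return "exact"
# MAGIC     for name in names:
# MAGIC         if name.lower() == expected.lower():
# MAGIC             return f"case_mismatch: {name}"
# MAGIC     return "missing"
# MAGIC
# MAGIC
# MAGIC # In Databricks:
# MAGIC #   names = [f.name for f in dbutils.fs.ls("/mnt/adf-silver")]
# MAGIC #   print(check_folder_name("Vendor_Registry_Silver", names))
# MAGIC ```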

# COMMAND ----------

# MAGIC %md
# MAGIC ## 📌 Summary
# MAGIC - ADF loads data into blob storage under `adf-silver`
# MAGIC - Databricks reads and transforms it to Delta format
# MAGIC - Silver layer stored at `/mnt/delta/silver/...`
# MAGIC - Registered in the metastore for SQL exploration and BI access
