# 🧪 Test: write_df_to_delta Utility

This notebook tests the `write_df_to_delta()` function, which is loaded from the `write_utils` notebook via `%run`.

---

## 📦 1. Load Utility

```python
%run ./write_utils
```

---

## 📄 2. Read Test Data (finance_invoice_data.csv)

```python
from pyspark.sql.functions import input_file_name, lit

df = (
    spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/mnt/raw-ingest/finance_invoice_data.csv")
        # Record each row's source file path and tag the feed for lineage.
        .withColumn("source_file", input_file_name())
        .withColumn("ingestion_type", lit("finance_invoices"))
)

df.show(5)
```

---

## ✅ 3. Basic Write Test

```python
write_df_to_delta(
    df,
    path="/mnt/delta/bronze/test_finance_invoices_basic",
    mode="overwrite",
    merge_schema=True,
    register_table=True,
    verbose=True
)
```
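
A write test is more convincing when the result is read back. The sketch below is one possible check and is not part of `write_utils`: the helper names `round_trip_problems` and `check_round_trip` are hypothetical, and it assumes `write_df_to_delta` produces a standard Delta table at `path` holding the same columns and row count as `df`.

```python
def round_trip_problems(written_columns, written_count, read_columns, read_count):
    """Compare what was written with what was read back; return a list of problems."""
    problems = []
    # Compare column names regardless of order, since only the set of columns matters here.
    if sorted(written_columns) != sorted(read_columns):
        problems.append(
            f"columns differ: wrote {sorted(written_columns)}, read {sorted(read_columns)}"
        )
    if written_count != read_count:
        problems.append(f"row count differs: wrote {written_count}, read {read_count}")
    return problems


def check_round_trip(spark, df, path):
    """Hypothetical helper: read the Delta table at `path` and compare it with `df`."""
    read_back = spark.read.format("delta").load(path)
    return round_trip_problems(
        df.columns, df.count(), read_back.columns, read_back.count()
    )
```

In this notebook the call would be `check_round_trip(spark, df, "/mnt/delta/bronze/test_finance_invoices_basic")`. An empty list means the written and read-back data agree on columns and row count.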

---

## ⚠️ 4. Schema Mismatch (merge_schema=False)

```python
df_bad = df.withColumn("extra_column", lit("test"))

try:
    write_df_to_delta(
        df_bad,
        path="/mnt/delta/bronze/test_finance_invoices_basic",
        mode="append",
        merge_schema=False,
        register_table=False,
        verbose=True
    )
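except Exception as e:
    # Appending an extra column with merge_schema=False should be rejected.
    print("Expected schema mismatch error:")
    print(e)
else:
    # No exception means schema enforcement did not trigger, so fail loudly.
    raise AssertionError("Expected a schema mismatch error, but the write succeeded")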
```
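
To see why this append is expected to fail, it can help to list the columns that differ between the target table and the incoming DataFrame. The following is a minimal sketch with a hypothetical helper name, `schema_diff`. It compares exact column names only and ignores data types and case-insensitive matching.

```python
def schema_diff(target_columns, incoming_columns):
    """Return column names added to, or missing from, the incoming data."""
    target = set(target_columns)
    incoming = set(incoming_columns)
    return {
        # Columns in the incoming DataFrame that the target table does not have.
        "added": sorted(incoming - target),
        # Columns in the target table that the incoming DataFrame lacks.
        "missing": sorted(target - incoming),
    }
```

In the notebook, `schema_diff(df.columns, df_bad.columns)` should report `extra_column` under `"added"`. That is the kind of difference Delta's schema enforcement rejects on append unless schema merging is enabled.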

---

## 🧪 5. Dry Run Test

```python
write_df_to_delta(
    df,
    path="/mnt/delta/bronze/test_finance_invoices_dryrun",
    dry_run=True,
    verbose=True
)
```
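
A dry run is only useful if it leaves storage untouched. The sketch below shows one way to check that, assuming `dry_run=True` means nothing is written to `path`, which this notebook does not confirm. The helper names are hypothetical. `DeltaTable.isDeltaTable` comes from the `delta-spark` package and is imported lazily so the helpers load without it.

```python
def dry_run_left_no_table(table_exists, path):
    """Return True if no Delta table exists at `path`.

    `table_exists` is any callable that takes a path and returns a bool.
    """
    return not table_exists(path)


def spark_table_exists(spark):
    """Hypothetical factory: build a path checker backed by delta-spark."""
    from delta.tables import DeltaTable  # lazy import; requires delta-spark

    return lambda path: DeltaTable.isDeltaTable(spark, path)
```

After the dry run above, `dry_run_left_no_table(spark_table_exists(spark), "/mnt/delta/bronze/test_finance_invoices_dryrun")` would be expected to return `True`, provided no earlier real write targeted the same path.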

---

## 🗂️ 6. Partitioned Write

```python
df_partitioned = df.withColumn("vendor_partition", lit("vendor_test"))

write_df_to_delta(
    df_partitioned,
    path="/mnt/delta/bronze/test_finance_invoices_partitioned",
    partition_by=["vendor_partition"],
    register_table=True,
    verbose=True
)
```
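
Whether the table was actually partitioned can be checked from its metadata. The sketch below uses Delta's `DESCRIBE DETAIL` command, whose result includes a `partitionColumns` array. The helper names `read_partition_columns` and `partitioning_matches` are hypothetical, and the check assumes `partition_by` is passed through to the Delta writer unchanged.

```python
def partitioning_matches(expected, actual):
    """Return True if the partition columns match in name and order."""
    return list(expected) == list(actual)


def read_partition_columns(spark, path):
    """Hypothetical helper: fetch partition columns of the Delta table at `path`."""
    detail = spark.sql(f"DESCRIBE DETAIL delta.`{path}`")
    # DESCRIBE DETAIL returns a single row; partitionColumns is an array of names.
    return list(detail.select("partitionColumns").first()[0])
```

For this test, `partitioning_matches(["vendor_partition"], read_partition_columns(spark, "/mnt/delta/bronze/test_finance_invoices_partitioned"))` would be expected to return `True`.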