# Bronze → Silver → Gold Pipeline - Demo

**Learning objective:** Implement a complete end-to-end pipeline from Bronze through Silver to Gold.

**Topics covered:**
- Bronze: raw load + audit columns (ingest_ts, source_file, ingested_by)
- Silver: cleaning, deduplication, sanity checks, JSON flattening (from_json, explode)
- Gold: KPI modeling, aggregations (daily/weekly/monthly), star schema vs. denormalization
- End-to-end data lineage
- Performance monitoring per layer

## Context and requirements

- **Training day**: Day 2 - Lakehouse & Delta Lake
- **Notebook type**: Demo
- **Technical requirements**:
  - Databricks Runtime 13.0+ (recommended: 14.3 LTS)
  - Unity Catalog enabled
  - Privileges: CREATE TABLE, CREATE SCHEMA, SELECT, MODIFY
  - Cluster: Standard with at least 2 workers

## Theoretical introduction

**Section goal:** Understand a complete data pipeline that implements the Medallion Architecture.

**Core concepts:**
- **End-to-end pipeline**: Automated data flow through all layers
- **Data lineage**: Tracking transformations from source to destination
- **JSON flattening**: Unpacking nested structures into flat tables
- **Star schema**: Dimensional modeling with fact tables and dimension tables

**Why does it matter?**
A production pipeline must handle multiple source formats (JSON, CSV, Parquet) and complex transformations (flattening, joins, aggregations), and it must enforce data quality gates at every stage.

## Per-user isolation

Run the initialization script to set up per-user isolation of catalogs and schemas:

In [None]:
%run ../00_setup

## Configuration

Import the libraries and define the configuration variables (default catalog and source data paths):

In [None]:
from pyspark.sql import functions as F
from pyspark.sql.types import *
from pyspark.sql.window import Window
from datetime import datetime, timedelta

# Show the user context
print("=== User context ===")
print(f"Catalog: {CATALOG}")
print(f"Bronze schema: {BRONZE_SCHEMA}")
print(f"Silver schema: {SILVER_SCHEMA}")
print(f"Gold schema: {GOLD_SCHEMA}")
print(f"User: {raw_user}")

# Set the catalog as the default
spark.sql(f"USE CATALOG {CATALOG}")

# Source data paths
ORDERS_JSON = f"{DATASET_BASE_PATH}/orders/orders_batch.json"
CUSTOMERS_CSV = f"{DATASET_BASE_PATH}/customers/customers.csv"
PRODUCTS_PARQUET = f"{DATASET_BASE_PATH}/products/products.parquet"

print("\n=== Source data paths ===")
print(f"Orders: {ORDERS_JSON}")
print(f"Customers: {CUSTOMERS_CSV}")
print(f"Products: {PRODUCTS_PARQUET}")

---

## Section 1: Bronze Layer - Raw Data Ingestion

**Theoretical introduction:**

The Bronze layer ingests raw data from multiple sources and formats. The key step is adding audit metadata that supports data lineage and troubleshooting.

**Key operations:**
- Read from multiple formats (JSON, CSV, Parquet)
- Add audit columns: ingest_timestamp, source_file, ingested_by
- Write to Delta without transforming business values
- Versioning for incremental loads

**Practical use:**
- Immutable landing zone that allows reprocessing
- Audit trail for compliance
- Multiple source formats in one pipeline

### Example 1.1: Bronze - Orders (JSON)

**Goal:** Ingest orders from JSON into Bronze with audit metadata

In [None]:
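# Example 1.1 - Bronze Orders

spark.sql(f"USE SCHEMA {BRONZE_SCHEMA}")

# Read the raw orders from JSON
orders_raw = (
    spark.read
    .format("json")
    .option("multiLine", "true")
    .load(ORDERS_JSON)
)

# Add Bronze audit metadata
# Note: input_file_name() may be unavailable in some Unity Catalog access modes;
# the _metadata.file_path column is the documented alternative in that case.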
orders_bronze = (
    orders_raw
    .withColumn("_bronze_ingest_timestamp", F.current_timestamp())
    .withColumn("_bronze_source_file", F.input_file_name())
    .withColumn("_bronze_ingested_by", F.lit(raw_user))
    .withColumn("_bronze_version", F.lit(1))
)

print("=== Bronze Orders Schema ===")
orders_bronze.printSchema()

# Write to Bronze
bronze_orders_table = f"{BRONZE_SCHEMA}.orders_bronze"

(
    orders_bronze
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(bronze_orders_table)
)

print(f"\n✓ Bronze Orders: {bronze_orders_table}")
print(f"Record count: {spark.table(bronze_orders_table).count()}")

### Example 1.2: Bronze - Customers (CSV) and Products (Parquet)

**Goal:** Ingest customer and product data from different formats

In [None]:
# Example 1.2 - Bronze Customers (CSV)

# Customers from CSV
customers_raw = (
    spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load(CUSTOMERS_CSV)
)

customers_bronze = (
    customers_raw
    .withColumn("_bronze_ingest_timestamp", F.current_timestamp())
    .withColumn("_bronze_source_file", F.input_file_name())
    .withColumn("_bronze_ingested_by", F.lit(raw_user))
)

bronze_customers_table = f"{BRONZE_SCHEMA}.customers_bronze"

(
    customers_bronze
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(bronze_customers_table)
)

print(f"✓ Bronze Customers: {bronze_customers_table}")
print(f"Record count: {spark.table(bronze_customers_table).count()}")

# Products from Parquet
products_raw = spark.read.format("parquet").load(PRODUCTS_PARQUET)

products_bronze = (
    products_raw
    .withColumn("_bronze_ingest_timestamp", F.current_timestamp())
    .withColumn("_bronze_source_file", F.input_file_name())
    .withColumn("_bronze_ingested_by", F.lit(raw_user))
)

bronze_products_table = f"{BRONZE_SCHEMA}.products_bronze"

(
    products_bronze
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(bronze_products_table)
)

print(f"\n✓ Bronze Products: {bronze_products_table}")
print(f"Record count: {spark.table(bronze_products_table).count()}")

print("\n=== Bronze Layer Summary ===")
print(f"Orders: {spark.table(bronze_orders_table).count()}")
print(f"Customers: {spark.table(bronze_customers_table).count()}")
print(f"Products: {spark.table(bronze_products_table).count()}")

---

## Section 2: Silver Layer - Cleansing & Validation

**Theoretical introduction:**

The Silver layer performs data quality checks, deduplication, standardization, and flattening of nested structures. This is the layer where business rules are enforced.

**Key transformations:**
- Deduplication on the business key
- Validation of NOT NULL constraints, data types, and ranges
- Standardization: dates, letter case, formats
- JSON flattening for nested structures

**Data Quality Gates:**
- Reject invalid records (or flag them)
- Log data quality metrics
- Monitor rejection rates
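
A quality gate can be reduced to two numbers (records in, records out) and a threshold. The sketch below shows one way to turn them into a loggable result. The helper names `rejection_rate` and `evaluate_gate` and the 5% default threshold are assumptions made for illustration, not part of the course material.

```python
def rejection_rate(input_count: int, output_count: int) -> float:
    """Percentage of input records that did not reach the output."""
    if input_count <= 0:
        # Avoid division by zero for an empty input
        return 0.0
    return (input_count - output_count) / input_count * 100


def evaluate_gate(input_count: int, output_count: int,
                  max_rejection_pct: float = 5.0) -> dict:
    """Summarize one quality gate; the 5% default is an assumed threshold."""
    rate = rejection_rate(input_count, output_count)
    return {
        "rejected": input_count - output_count,
        "rejection_pct": round(rate, 2),
        "passed": rate <= max_rejection_pct,
    }


# Example: 1000 Bronze rows in, 980 Silver rows out
gate = evaluate_gate(input_count=1000, output_count=980)
print(gate)
```

In the notebook, the two counts would come from `count()` on the Bronze and Silver DataFrames. The returned dictionary can be written to a metrics table or used to fail the run when `passed` is false.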

### Example 2.1: Silver Orders - Cleansing & Validation

**Goal:** Transform Bronze Orders → Silver with quality checks

In [None]:
# Example 2.1 - Silver Orders

spark.sql(f"USE SCHEMA {SILVER_SCHEMA}")

# Read from Bronze
orders_bronze_df = spark.table(bronze_orders_table)

# Silver transformations
orders_silver = (
    orders_bronze_df

    # NOT NULL validation
    .filter(F.col("order_id").isNotNull())
    .filter(F.col("customer_id").isNotNull())

    # Business validation
    .filter(F.col("order_amount") > 0)

    # Deduplicate on the business key. This runs after validation so that an
    # invalid duplicate cannot displace a valid row with the same order_id.
    .dropDuplicates(["order_id"])

    # Standardize dates
    .withColumn("order_date", F.to_date(F.col("order_date")))

    # Standardize the status
    .withColumn("order_status", F.upper(F.trim(F.col("order_status"))))

    # Categorize amounts (business logic)
    .withColumn(
        "order_value_category",
        F.when(F.col("order_amount") < 100, "LOW")
         .when(F.col("order_amount") < 500, "MEDIUM")
         .otherwise("HIGH")
    )

    # Silver metadata
    .withColumn("_silver_processed_timestamp", F.current_timestamp())
    .withColumn("_data_quality_flag", F.lit("VALID"))
)

# Quality metrics
bronze_count = orders_bronze_df.count()
silver_count = orders_silver.count()
rejected_count = bronze_count - silver_count
rejection_rate = (rejected_count / bronze_count * 100) if bronze_count > 0 else 0

print("=== Silver Orders Quality Metrics ===")
print(f"Bronze input: {bronze_count}")
print(f"Silver output: {silver_count}")
print(f"Rejected: {rejected_count} ({rejection_rate:.2f}%)")

# Write to Silver
silver_orders_table = f"{SILVER_SCHEMA}.orders_silver"

(
    orders_silver
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(silver_orders_table)
)

print(f"\n✓ Silver Orders: {silver_orders_table}")
display(spark.table(silver_orders_table).limit(5))

### Example 2.2: Silver Customers & Products

**Goal:** Cleanse the dimension tables

In [None]:
# Example 2.2 - Silver Customers

customers_bronze_df = spark.table(bronze_customers_table)

customers_silver = (
    customers_bronze_df
    .dropDuplicates(["customer_id"])
    .filter(F.col("customer_id").isNotNull())
    
    # Standardization
    .withColumn("customer_name", F.trim(F.col("customer_name")))
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .withColumn("country", F.upper(F.trim(F.col("country"))))
    
    # Email validation (basic pattern)
    .withColumn(
        "email_valid",
        F.col("email").rlike(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
    )
    
    .withColumn("_silver_processed_timestamp", F.current_timestamp())
)

silver_customers_table = f"{SILVER_SCHEMA}.customers_silver"

(
    customers_silver
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(silver_customers_table)
)

print(f"✓ Silver Customers: {silver_customers_table}")
print(f"Record count: {spark.table(silver_customers_table).count()}")

# Products (minimal cleaning, the source data is already of good quality)
products_bronze_df = spark.table(bronze_products_table)

products_silver = (
    products_bronze_df
    .dropDuplicates(["product_id"])
    .filter(F.col("product_id").isNotNull())
    .withColumn("_silver_processed_timestamp", F.current_timestamp())
)

silver_products_table = f"{SILVER_SCHEMA}.products_silver"

(
    products_silver
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(silver_products_table)
)

print(f"\n✓ Silver Products: {silver_products_table}")
print(f"Record count: {spark.table(silver_products_table).count()}")

---

## Section 3: Gold Layer - Business Modeling

**Theoretical introduction:**

The Gold layer builds business-level aggregates and KPI tables. These tables are often denormalized (joins are pre-computed) to speed up BI tools.

**Key operations:**
- Joins between fact and dimension tables
- Aggregations: daily, weekly, monthly
- Denormalization for BI performance
- KPI calculations

**Design patterns:**
- Star schema: fact table + dimension tables
- Denormalized wide tables
- Pre-aggregated summary tables

### Example 3.1: Gold - Order Fact Table (Denormalized)

**Goal:** Create a denormalized fact table joined to its dimensions

In [None]:
# Example 3.1 - Gold Order Fact Table

spark.sql(f"USE SCHEMA {GOLD_SCHEMA}")

# Read the Silver tables (products are loaded but not joined in this example)
orders_silver_df = spark.table(silver_orders_table)
customers_silver_df = spark.table(silver_customers_table)
products_silver_df = spark.table(silver_products_table)

# Join the fact table with its dimensions (denormalization)
order_fact = (
    orders_silver_df
    
    # Join with customers
    .join(
        customers_silver_df.select(
            F.col("customer_id").alias("cust_id"),
            F.col("customer_name"),
            F.col("country"),
            F.col("email_valid")
        ),
        orders_silver_df.customer_id == F.col("cust_id"),
        "left"
    )
    
    # Add time dimensions
    .withColumn("order_year", F.year("order_date"))
    .withColumn("order_month", F.month("order_date"))
    .withColumn("order_quarter", F.quarter("order_date"))
    .withColumn("order_day_of_week", F.dayofweek("order_date"))
    
    # KPI calculations
    .withColumn(
        "is_high_value",
        F.when(F.col("order_amount") >= 500, True).otherwise(False)
    )
    
    # Gold metadata
    .withColumn("_gold_created_timestamp", F.current_timestamp())
    
    # Select final columns
    .select(
        "order_id",
        "customer_id",
        "customer_name",
        "country",
        "order_date",
        "order_year",
        "order_month",
        "order_quarter",
        "order_day_of_week",
        "order_amount",
        "order_value_category",
        "order_status",
        "is_high_value",
        "_gold_created_timestamp"
    )
)

print("=== Gold Order Fact Schema ===")
order_fact.printSchema()

# Write to Gold
gold_order_fact_table = f"{GOLD_SCHEMA}.order_fact"

(
    order_fact
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(gold_order_fact_table)
)

print(f"\n✓ Gold Order Fact: {gold_order_fact_table}")
print(f"Record count: {spark.table(gold_order_fact_table).count()}")
display(spark.table(gold_order_fact_table).limit(5))

### Example 3.2: Gold - Aggregated Summary Tables

**Goal:** Pre-aggregated tables for dashboards and reports

In [None]:
# Example 3.2 - Daily Sales Summary

order_fact_df = spark.table(gold_order_fact_table)

# Daily aggregation
daily_sales_summary = (
    order_fact_df
    .groupBy("order_date", "country", "order_status")
    .agg(
        F.count("order_id").alias("total_orders"),
        F.sum("order_amount").alias("total_revenue"),
        F.avg("order_amount").alias("avg_order_value"),
        F.min("order_amount").alias("min_order_value"),
        F.max("order_amount").alias("max_order_value"),
        F.countDistinct("customer_id").alias("unique_customers"),
        F.sum(
            F.when(F.col("is_high_value") == True, 1).otherwise(0)
        ).alias("high_value_orders")
    )
    .withColumn("_gold_created_timestamp", F.current_timestamp())
    .orderBy("order_date", "country")
)

gold_daily_summary_table = f"{GOLD_SCHEMA}.daily_sales_summary"

(
    daily_sales_summary
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(gold_daily_summary_table)
)

print(f"✓ Gold Daily Sales Summary: {gold_daily_summary_table}")
display(spark.table(gold_daily_summary_table))

In [None]:
# Monthly aggregation
monthly_sales_summary = (
    order_fact_df
    .groupBy("order_year", "order_month", "country")
    .agg(
        F.count("order_id").alias("total_orders"),
        F.sum("order_amount").alias("total_revenue"),
        F.avg("order_amount").alias("avg_order_value"),
        F.countDistinct("customer_id").alias("unique_customers")
    )
    .withColumn("_gold_created_timestamp", F.current_timestamp())
    .orderBy("order_year", "order_month", "country")
)

gold_monthly_summary_table = f"{GOLD_SCHEMA}.monthly_sales_summary"

(
    monthly_sales_summary
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(gold_monthly_summary_table)
)

print(f"\n✓ Gold Monthly Sales Summary: {gold_monthly_summary_table}")
display(spark.table(gold_monthly_summary_table))

### Example 3.3: Gold - Customer Analytics

**Goal:** Customer lifetime value and segmentation

In [None]:
# Example 3.3 - Customer Analytics

# Customer-level aggregation
customer_analytics = (
    order_fact_df
    .groupBy("customer_id", "customer_name", "country")
    .agg(
        F.count("order_id").alias("total_orders"),
        F.sum("order_amount").alias("lifetime_value"),
        F.avg("order_amount").alias("avg_order_value"),
        F.min("order_date").alias("first_order_date"),
        F.max("order_date").alias("last_order_date"),
        F.sum(
            F.when(F.col("is_high_value") == True, 1).otherwise(0)
        ).alias("high_value_orders_count")
    )
    
    # Customer tenure (days)
    .withColumn(
        "customer_tenure_days",
        F.datediff(F.col("last_order_date"), F.col("first_order_date"))
    )
    
    # Segmentation
    .withColumn(
        "customer_segment",
        F.when(F.col("lifetime_value") >= 1000, "PREMIUM")
         .when(F.col("lifetime_value") >= 500, "GOLD")
         .when(F.col("lifetime_value") >= 200, "SILVER")
         .otherwise("BRONZE")
    )
    
    .withColumn("_gold_created_timestamp", F.current_timestamp())
    .orderBy(F.col("lifetime_value").desc())
)

gold_customer_analytics_table = f"{GOLD_SCHEMA}.customer_analytics"

(
    customer_analytics
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable(gold_customer_analytics_table)
)

print(f"✓ Gold Customer Analytics: {gold_customer_analytics_table}")
print(f"Customer count: {spark.table(gold_customer_analytics_table).count()}")

print("\n=== Top 10 Customers by Lifetime Value ===")
# Sort explicitly: row order is not guaranteed when a table is read back
display(
    spark.table(gold_customer_analytics_table)
    .orderBy(F.col("lifetime_value").desc())
    .limit(10)
)

# Segmentation distribution
print("\n=== Customer Segmentation Distribution ===")
display(
    spark.table(gold_customer_analytics_table)
    .groupBy("customer_segment")
    .agg(
        F.count("*").alias("customer_count"),
        F.sum("lifetime_value").alias("total_revenue")
    )
    .orderBy(F.col("total_revenue").desc())
)

---

## Section 4: Pipeline Monitoring & Lineage

**Theoretical introduction:**

A production pipeline needs monitoring at every stage: data volumes, quality metrics, and processing time.

**Key metrics:**
- Record counts per layer
- Rejection rates
- Processing time
- Data freshness
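
Processing time is listed as a key metric, but the dashboard example in this section reports only counts and rates. A minimal sketch of per-stage timing with a context manager follows. The names `timed_stage` and `stage_timings` are hypothetical, and `time.sleep` stands in for the real work.

```python
import time
from contextlib import contextmanager

# Collected wall-clock durations in seconds, keyed by stage name
stage_timings = {}


@contextmanager
def timed_stage(name):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings[name] = time.perf_counter() - start


# Stand-in for a real stage such as the Silver orders write
with timed_stage("silver_orders"):
    time.sleep(0.01)

for stage, seconds in stage_timings.items():
    print(f"{stage}: {seconds:.3f}s")
```

Because Spark evaluates transformations lazily, the timed block should wrap an action such as `saveAsTable` or `count()`. Timing only the definition of a DataFrame would measure almost nothing.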

### Example 4.1: Pipeline Health Dashboard

**Goal:** Monitor the complete pipeline

In [None]:
# Example 4.1 - Pipeline Monitoring

print("=" * 80)
print("PIPELINE HEALTH DASHBOARD")
print("=" * 80)

# Bronze layer metrics
print("\n[BRONZE LAYER]")
bronze_orders_count = spark.table(bronze_orders_table).count()
bronze_customers_count = spark.table(bronze_customers_table).count()
bronze_products_count = spark.table(bronze_products_table).count()

print(f"  Orders:    {bronze_orders_count:,} records")
print(f"  Customers: {bronze_customers_count:,} records")
print(f"  Products:  {bronze_products_count:,} records")

# Silver layer metrics
print("\n[SILVER LAYER]")
silver_orders_count = spark.table(silver_orders_table).count()
silver_customers_count = spark.table(silver_customers_table).count()
silver_products_count = spark.table(silver_products_table).count()

orders_rejection_rate = ((bronze_orders_count - silver_orders_count) / bronze_orders_count * 100) if bronze_orders_count > 0 else 0
customers_rejection_rate = ((bronze_customers_count - silver_customers_count) / bronze_customers_count * 100) if bronze_customers_count > 0 else 0

print(f"  Orders:    {silver_orders_count:,} records (rejection: {orders_rejection_rate:.2f}%)")
print(f"  Customers: {silver_customers_count:,} records (rejection: {customers_rejection_rate:.2f}%)")
print(f"  Products:  {silver_products_count:,} records")

# Gold layer metrics
print("\n[GOLD LAYER]")
gold_fact_count = spark.table(gold_order_fact_table).count()
gold_daily_count = spark.table(gold_daily_summary_table).count()
gold_monthly_count = spark.table(gold_monthly_summary_table).count()
gold_customer_count = spark.table(gold_customer_analytics_table).count()

print(f"  Order Fact:        {gold_fact_count:,} records")
print(f"  Daily Summary:     {gold_daily_count:,} aggregates")
print(f"  Monthly Summary:   {gold_monthly_count:,} aggregates")
print(f"  Customer Analytics: {gold_customer_count:,} customers")

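# Data quality summary
# Propagation is measured rather than assumed: the left join in Gold
# should keep every Silver order.
propagation_rate = (gold_fact_count / silver_orders_count * 100) if silver_orders_count > 0 else 0
pipeline_healthy = silver_orders_count > 0 and gold_fact_count == silver_orders_count

print("\n[DATA QUALITY]")
print(f"  ✓ Orders rejection rate: {orders_rejection_rate:.2f}%")
print(f"  ✓ Customers rejection rate: {customers_rejection_rate:.2f}%")
print(f"  ✓ Silver-Gold propagation: {propagation_rate:.2f}%")

print("\n" + "=" * 80)
print(f"Pipeline Status: {'✅ HEALTHY' if pipeline_healthy else '⚠️ CHECK REQUIRED'}")
print("=" * 80)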

---

## Best Practices

**Bronze Layer:**
- Always add audit metadata (_bronze_ingest_timestamp, _bronze_source_file)
- Immutable: never UPDATE/DELETE in Bronze
- Use COPY INTO or Auto Loader for idempotency

**Silver Layer:**
- Implement data quality gates early in the pipeline
- Log rejection rates for alerting
- Use MERGE for slowly changing dimensions
- Standardize formats: dates, letter case, whitespace

**Gold Layer:**
- Denormalize for BI performance (pre-compute joins)
- Pre-aggregate at multiple granularities (daily, weekly, monthly)
- Partition by frequently filtered columns
- Use ZORDER BY for multi-dimensional queries

**Monitoring:**
- Monitor record counts per layer
- Alert on spikes in rejection rates
- Track processing time per stage
- Use DESCRIBE HISTORY for auditing

---

## Troubleshooting

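**Problem 1: High rejection rate in Silver**
**Solution:**
```python
# Inspect the rejected records (the inverse of the Silver order filters)
rejected = (
    spark.table(bronze_orders_table)
    .filter(
        F.col("order_id").isNull()
        | F.col("customer_id").isNull()
        | F.col("order_amount").isNull()
        | (F.col("order_amount") <= 0)
    )
)
display(rejected)
```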

**Problem 2: Gold joins cause data loss**
**Solution:**
- Use LEFT JOIN instead of INNER JOIN for dimension lookups
- Monitor unmatched records

**Problem 3: Long processing time for Gold aggregations**
**Solution:**
- Use incremental processing: recompute only the affected dates
- Cache Silver tables before running multiple aggregations
- Partition Gold tables by date

---

## Summary

**In this notebook we built a complete Bronze → Silver → Gold pipeline:**

✅ **Bronze Layer:**
- Multi-format ingestion (JSON, CSV, Parquet)
- Audit metadata dla lineage
- Immutable landing zone

✅ **Silver Layer:**
- Data quality validation
- Deduplication and standardization
- Business rules enforcement
- Quality metrics logging

✅ **Gold Layer:**
- Denormalized fact tables
- Pre-aggregated summaries (daily, monthly)
- Customer analytics and segmentation
- BI-ready tables

✅ **Monitoring:**
- Pipeline health dashboard
- Data quality metrics
- Rejection rate tracking

**Key takeaways:**
1. An end-to-end pipeline requires different transformations in each layer
2. Data quality gates in Silver keep bad data out of Gold
3. Denormalization in Gold improves BI dashboard performance
4. Monitoring is key to production reliability

**Next steps:**
- **Next notebook**: 05_optimization_best_practices.ipynb
- **Hands-on workshop**: 03_end_to_end_bronze_silver_gold_workshop.ipynb
- **Delta Live Tables**: Declarative pipelines with automatic data quality

---

## Cleanup

Clean up the resources created in this notebook:

In [None]:
# Optional cleanup of test resources
# WARNING: Run only if you want to delete all the data created above

# Bronze
# spark.sql(f"DROP TABLE IF EXISTS {bronze_orders_table}")
# spark.sql(f"DROP TABLE IF EXISTS {bronze_customers_table}")
# spark.sql(f"DROP TABLE IF EXISTS {bronze_products_table}")

# Silver
# spark.sql(f"DROP TABLE IF EXISTS {silver_orders_table}")
# spark.sql(f"DROP TABLE IF EXISTS {silver_customers_table}")
# spark.sql(f"DROP TABLE IF EXISTS {silver_products_table}")

# Gold
# spark.sql(f"DROP TABLE IF EXISTS {gold_order_fact_table}")
# spark.sql(f"DROP TABLE IF EXISTS {gold_daily_summary_table}")
# spark.sql(f"DROP TABLE IF EXISTS {gold_monthly_summary_table}")
# spark.sql(f"DROP TABLE IF EXISTS {gold_customer_analytics_table}")

# spark.catalog.clearCache()
# print("Resources have been cleaned up")