# BI & ML Integrations

**KION Training - Day 3**

---

## 📚 Agenda

1. Power BI Integration (Direct Lake vs Direct Query)
2. Databricks SQL Warehouses
3. MLflow Basics: Experiments, Tracking, Registry
4. Feature Store Introduction
5. Gold Layer → ML Dataset Pipeline
6. End-to-End Integration Architecture

---

## 🎯 Training goals

After this module you will be able to:
- Connect Power BI to Databricks
- Use SQL Warehouses for BI workloads
- Track ML experiments with MLflow
- Use the Feature Store
- Prepare the Gold layer for ML and BI

---

## 1️⃣ Power BI Integration

Power BI can work with Databricks data in two main modes, with Import mode as a third alternative:

### 1. Direct Lake (newest, highest performance)
- **Native Delta Lake access**: Power BI reads the Delta tables directly instead of sending SQL to a warehouse
- **High performance**: no intermediate query engine, data is cached in memory
- **Automatic refresh**: changes in the Delta table are picked up automatically
- **Requirements**: Power BI Premium or Microsoft Fabric capacity, with the Delta tables reachable from Fabric (OneLake)

### 2. Direct Query
- **Live connection**: every query goes to a Databricks SQL Warehouse
- **No data caching**: always fresh data
- **Slower than import**: network latency per query
- **Use case**: large datasets, real-time dashboards

### 3. Import Mode (alternative)
- **Data copied to Power BI**: snapshot of data
- **Fast queries**: local cache
- **Scheduled refresh needed**: manual/automatic refresh
- **Size limits**: Power BI dataset size constraints
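
The trade-offs listed above can be condensed into a rough decision helper. This is only a sketch: the function name and its three boolean inputs are hypothetical, and a real choice would also weigh cost, concurrency, and licensing.

```python
def suggest_connection_mode(has_fabric_capacity: bool,
                            needs_live_data: bool,
                            fits_dataset_limit: bool) -> str:
    """Map the bullet points above to one of the three Power BI modes."""
    if has_fabric_capacity:
        # Direct Lake is listed as the fastest option but needs Premium/Fabric.
        return "Direct Lake"
    if needs_live_data or not fits_dataset_limit:
        # Direct Query always reads fresh data and has no dataset size limit.
        return "Direct Query"
    # Small, slowly changing data: a scheduled snapshot is enough.
    return "Import"


print(suggest_connection_mode(has_fabric_capacity=False,
                              needs_live_data=True,
                              fits_dataset_limit=True))
```

Checking the capacity first mirrors the ordering of the list: Direct Lake is preferred whenever its requirements are met, and Import is the fallback for data that is small enough to copy.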

---

## 🛠️ Setup and configuration

### Per-user isolation

Run the initialization script that sets up per-user isolation of catalogs and schemas:

In [None]:
%run ../00_setup

### Library imports and configuration

In [None]:
from pyspark.sql import functions as F
from pyspark.sql.types import *
from datetime import datetime, timedelta

# Display the user context
print("=== User context ===")
print(f"Catalog: {CATALOG}")
print(f"Schema Bronze: {BRONZE_SCHEMA}")
print(f"Schema Silver: {SILVER_SCHEMA}")
print(f"Schema Gold: {GOLD_SCHEMA}")
print(f"User: {raw_user}")

# Set the default catalog and schema
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"USE SCHEMA {SILVER_SCHEMA}")

## 🔌 Connection Setup

### Step 1: Prepare data in Gold layer

In [None]:
from pyspark.sql import functions as F  # explicit alias; a wildcard import would shadow built-ins such as sum, min and max

# Create fact table for Power BI
fact_sales = spark.sql(f"""
    SELECT 
        o.order_id,
        o.order_date,
        o.customer_id,
        o.product_id,
        o.quantity,
        o.total_amount as amount,
        o.payment_method as status,
        YEAR(o.order_date) as year,
        MONTH(o.order_date) as month,
        QUARTER(o.order_date) as quarter,
        DAYOFWEEK(o.order_date) as day_of_week
    FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver o
    WHERE o.payment_method IS NOT NULL
""")

fact_sales.write.format("delta").mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(f"{CATALOG}.{GOLD_SCHEMA}.fact_sales")

print("✅ Fact table created: fact_sales")
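
The date parts in `fact_sales` are easy to misread, because Spark SQL's `DAYOFWEEK` counts from Sunday (1 = Sunday, ..., 7 = Saturday) while Python's `isoweekday()` counts from Monday. The sketch below mirrors the two derived columns in plain Python so the values can be checked by hand; the helper names are hypothetical.

```python
from datetime import date


def spark_dayofweek(d: date) -> int:
    """Mirror Spark SQL DAYOFWEEK: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday, so shift Sunday to the front.
    return d.isoweekday() % 7 + 1


def quarter(d: date) -> int:
    """Mirror Spark SQL QUARTER: months 1-3 -> 1, 4-6 -> 2, and so on."""
    return (d.month - 1) // 3 + 1


sunday = date(2024, 1, 7)
monday = date(2024, 1, 8)
print(spark_dayofweek(sunday), spark_dayofweek(monday), quarter(date(2024, 12, 31)))
```

A Power BI measure that treats `day_of_week = 1` as Monday would therefore be off by one day.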

In [None]:
# Create dimension tables

# Dim Customer
dim_customer = spark.sql(f"""
    SELECT 
        customer_id,
        first_name,
        last_name,
        email,
        country,
        registration_date
    FROM {CATALOG}.{BRONZE_SCHEMA}.customers_bronze
""")

dim_customer.write.format("delta").mode("overwrite").saveAsTable(f"{CATALOG}.{GOLD_SCHEMA}.dim_customer")

# Dim Product  
dim_product = spark.sql(f"""
    SELECT 
        product_id,
        product_name,
        category,
        price
    FROM {CATALOG}.{BRONZE_SCHEMA}.products_bronze
""")

dim_product.write.format("delta").mode("overwrite").saveAsTable(f"{CATALOG}.{GOLD_SCHEMA}.dim_product")

# Dim Date (date dimension)
from datetime import datetime, timedelta

start_date = datetime(2020, 1, 1)
end_date = datetime(2025, 12, 31)
date_list = [start_date + timedelta(days=x) for x in range((end_date - start_date).days + 1)]

date_data = [(
    d.date(),  # plain date (not a timestamp), to match a DATE-typed order_date in joins
    d.year,
    d.month,
    d.day,
    (d.month - 1) // 3 + 1,  # quarter
    d.strftime('%A'),  # day name
    d.strftime('%B'),  # month name
    d.isocalendar()[1]  # week number
) for d in date_list]

dim_date = spark.createDataFrame(
    date_data,
    ["date", "year", "month", "day", "quarter", "day_name", "month_name", "week_number"]
)

dim_date.write.format("delta").mode("overwrite").saveAsTable(f"{CATALOG}.{GOLD_SCHEMA}.dim_date")

print("✅ Dimension tables created: dim_customer, dim_product, dim_date")
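
One detail of the date dimension deserves a check: `isocalendar()` returns the ISO week, and the first days of January can belong to the last ISO week of the previous year. The sketch below builds a single row the same way as the cell above and adds a hypothetical `iso_year` column to make the mismatch visible; whether the dimension needs that column is an assumption about the reporting requirements.

```python
from datetime import date


def date_row(d: date) -> tuple:
    """Build one dim_date row, plus an illustrative iso_year column."""
    iso_year, iso_week, _ = d.isocalendar()
    return (
        d,
        d.year,
        d.month,
        d.day,
        (d.month - 1) // 3 + 1,  # quarter
        d.strftime('%A'),        # day name (locale dependent)
        d.strftime('%B'),        # month name (locale dependent)
        iso_week,                # week number, as in the cell above
        iso_year,                # year that the ISO week belongs to
    )


row = date_row(date(2021, 1, 1))
print(f"calendar year={row[1]}, week_number={row[7]}, iso_year={row[8]}")
```

For 1 January 2021 the calendar year is 2021 but the ISO week is 53 of 2020, so grouping by `year` and `week_number` would put that day into a week "2021-53".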

In [None]:
# Optimize tables for BI performance
spark.sql(f"OPTIMIZE {CATALOG}.{GOLD_SCHEMA}.fact_sales ZORDER BY (order_date, customer_id)")
spark.sql(f"OPTIMIZE {CATALOG}.{GOLD_SCHEMA}.dim_customer ZORDER BY (customer_id)")
spark.sql(f"OPTIMIZE {CATALOG}.{GOLD_SCHEMA}.dim_product ZORDER BY (product_id)")

print("✅ Tables optimized for query performance")

### Step 2: Power BI Connection String

**For Direct Query mode:**

1. Open Power BI Desktop
2. Get Data → More → Azure → Azure Databricks
3. Provide connection details:
   - **Server hostname**: `your-workspace.cloud.databricks.com`
   - **HTTP Path**: `/sql/1.0/warehouses/<warehouse-id>`
   - **Authentication**: Azure AD / Personal Access Token

4. Select tables from your Gold schema:
   - Use catalog: `training_catalog`
   - Use schema: `<your_user>_gold` (created by 00_setup.ipynb)
   - Tables: `fact_sales`, `dim_customer`, `dim_product`, `dim_date`

5. Choose: **Direct Query** (not Import)

**Power BI will now query Databricks SQL Warehouse in real-time!**

**Note**: Your tables live in the schema `{CATALOG}.{GOLD_SCHEMA}`, where `GOLD_SCHEMA` is unique to each participant (e.g. `trainer_gold`).
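
The two connection values from step 3 follow a fixed shape, so they can be assembled and sanity-checked before they are typed into Power BI. This is a minimal sketch: the helper name is hypothetical and `abc123` is a placeholder, not a real warehouse ID.

```python
def warehouse_connection(server_hostname: str, warehouse_id: str) -> dict:
    """Return the hostname and HTTP path in the form the connection dialog expects."""
    host = server_hostname.strip()
    # The dialog wants a bare hostname, so drop any scheme and trailing slash.
    for prefix in ("https://", "http://"):
        if host.startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not host or "/" in host:
        raise ValueError(f"not a bare hostname: {server_hostname!r}")
    if not warehouse_id:
        raise ValueError("warehouse_id must not be empty")
    return {
        "server_hostname": host,
        "http_path": f"/sql/1.0/warehouses/{warehouse_id}",
    }


settings = warehouse_connection("https://your-workspace.cloud.databricks.com/", "abc123")
print(settings)
```

The same pair of values is what other SQL clients ask for as well, so keeping them in one place avoids copying a workspace URL with a stray `https://` into the hostname field.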

---

## 2️⃣ Databricks SQL Warehouses

**SQL Warehouses** = Compute engine optimized for BI queries

### Typy SQL Warehouses:

| Type | Use Case | Performance | Cost |
|------|----------|-------------|------|
| **Serverless** | Ad-hoc queries, dashboards | Starts in seconds, compute managed by Databricks | Billed while the warehouse is running, short auto-stop limits idle cost |
| **Pro** | Production BI, high concurrency | Photon acceleration, slower startup than Serverless | Medium |
| **Classic** | Development, testing | Photon with an entry-level feature set | Low |

### Creating SQL Warehouse (via UI or API):

In [None]:
# SQL Warehouse configuration (example JSON for API)
warehouse_config = {
    "name": "KION_BI_Warehouse",
    "cluster_size": "Medium",  # Small / Medium / Large / X-Large
    "min_num_clusters": 1,
    "max_num_clusters": 3,  # Auto-scaling
    "auto_stop_mins": 10,  # Stop after 10 min idle
    "enable_photon": True,  # Photon acceleration
    "enable_serverless_compute": False,  # Pro warehouse
    "spot_instance_policy": "COST_OPTIMIZED",
    "warehouse_type": "PRO",
    # The SQL Warehouses API takes tags as a list of key/value pairs
    "tags": {
        "custom_tags": [
            {"key": "project", "value": "kion_analytics"},
            {"key": "environment", "value": "production"}
        ]
    }
}

print("SQL Warehouse configuration for BI workloads")
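
A configuration like the one above is sent to the SQL Warehouses REST endpoint (`POST /api/2.0/sql/warehouses`). The sketch below only builds the request with the standard library and does not send it; the helper name, the reduced `example_config`, and the token placeholder are assumptions for illustration.

```python
import json
import urllib.request


def build_create_warehouse_request(host: str, token: str, config: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP request that would create a warehouse."""
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/warehouses",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Reduced example payload; a real call would pass the full configuration.
example_config = {"name": "KION_BI_Warehouse", "cluster_size": "Medium", "warehouse_type": "PRO"}
request = build_create_warehouse_request(
    "your-workspace.cloud.databricks.com", "<personal-access-token>", example_config
)
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would create a billable warehouse.
```

Building the request separately from sending it makes the payload easy to review, which matters for settings such as `auto_stop_mins` that directly affect cost.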

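### Querying through SQL Warehouse:

The cells below run the queries with `spark.sql`, so inside this notebook they execute on the notebook's compute. The same SQL runs on a SQL Warehouse when it is issued from the SQL editor or from Power BI.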

In [None]:
# Example BI query - aggregated sales by month
bi_query = spark.sql(f"""
    SELECT 
        d.year,
        d.month,
        d.month_name,
        COUNT(DISTINCT f.order_id) as total_orders,
        COUNT(DISTINCT f.customer_id) as unique_customers,
        SUM(f.amount) as total_revenue,
        AVG(f.amount) as avg_order_value,
        SUM(f.quantity) as total_quantity
    FROM {CATALOG}.{GOLD_SCHEMA}.fact_sales f
    JOIN {CATALOG}.{GOLD_SCHEMA}.dim_date d ON f.order_date = d.date
    GROUP BY d.year, d.month, d.month_name
    ORDER BY d.year DESC, d.month DESC
""")

bi_query.display()

In [None]:
# Example BI query - customer segmentation
customer_analysis = spark.sql(f"""
    SELECT 
        c.country,
        COUNT(DISTINCT f.customer_id) as customer_count,
        SUM(f.amount) as total_revenue,
        AVG(f.amount) as avg_transaction_value,
        COUNT(f.order_id) / COUNT(DISTINCT f.customer_id) as avg_orders_per_customer
    FROM {CATALOG}.{GOLD_SCHEMA}.fact_sales f
    JOIN {CATALOG}.{GOLD_SCHEMA}.dim_customer c ON f.customer_id = c.customer_id
    GROUP BY c.country
    ORDER BY total_revenue DESC
""")

customer_analysis.display()

### Query result caching:

SQL Warehouses automatically cache query results:
- Identical queries return cached results (seconds)
- Cache invalidated on data changes
- Reduces cost and latency
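
The behaviour in the list above can be pictured with a toy cache keyed by the query text and the version of the underlying table. This is a conceptual sketch only, not how Databricks implements its result cache; `ResultCache`, `fake_warehouse`, and the integer table version are all hypothetical.

```python
class ResultCache:
    """Toy result cache: same query + same table version -> reuse the result."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._results = {}
        self.hits = 0
        self.misses = 0

    def execute(self, query: str, table_version: int):
        key = (query, table_version)
        if key in self._results:
            self.hits += 1
        else:
            # A new table version produces a new key, which models invalidation.
            self.misses += 1
            self._results[key] = self._run_query(query)
        return self._results[key]


executed = []


def fake_warehouse(query: str) -> str:
    """Stand-in for real query execution; records every call."""
    executed.append(query)
    return f"result #{len(executed)}"


cache = ResultCache(fake_warehouse)
sql = "SELECT SUM(amount) FROM fact_sales"
first = cache.execute(sql, table_version=7)
second = cache.execute(sql, table_version=7)  # identical query: served from cache
third = cache.execute(sql, table_version=8)   # data changed: executed again
print(cache.hits, cache.misses)
```

The practical consequence for dashboards is that frequent writes to a Gold table reduce the cache hit rate, so batching updates can make repeated dashboard queries cheaper.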

---

## 3️⃣ MLflow Basics: Experiments, Tracking, Registry

**MLflow** = Open-source platform for ML lifecycle management

### MLflow components:
1. **Tracking**: Log parameters, metrics, artifacts
2. **Models**: Package models for deployment
3. **Registry**: Centralized model repository
4. **Projects**: Reproducible runs

### Basic MLflow workflow:

In [None]:
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

# Set experiment
mlflow.set_experiment("/Users/your.email@company.com/KION_Customer_LTV_Prediction")

print("✅ MLflow experiment set")

In [None]:
# Load training data from Gold layer
training_data = spark.sql(f"""
    SELECT 
        c.customer_id,
        c.country,
        DATEDIFF(CURRENT_DATE(), c.registration_date) as days_since_registration,
        COUNT(o.order_id) as total_orders,
        SUM(o.total_amount) as lifetime_value,
        AVG(o.total_amount) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        DATEDIFF(CURRENT_DATE(), MAX(o.order_date)) as days_since_last_order
    FROM {CATALOG}.{BRONZE_SCHEMA}.customers_bronze c
    LEFT JOIN {CATALOG}.{SILVER_SCHEMA}.orders_silver o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.country, c.registration_date
    HAVING total_orders >= 2  -- Only customers with at least 2 orders
""")

df = training_data.toPandas()
print(f"✅ Training data loaded: {len(df)} customers")

In [None]:
# Feature engineering
df_encoded = pd.get_dummies(df, columns=['country'])

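# Split features and target
# Caveat: lifetime_value is essentially total_orders * avg_order_value, so these two
# features leak the target. That is acceptable for a tracking demo, not for a real model.
X = df_encoded.drop(['customer_id', 'lifetime_value', 'last_order_date'], axis=1)
y = df_encoded['lifetime_value']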

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set: {len(X_train)}, Test set: {len(X_test)}")

In [None]:
# Train model with MLflow tracking
with mlflow.start_run(run_name="RandomForest_v1") as run:
    
    # Log parameters
    n_estimators = 100
    max_depth = 10
    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)
    mlflow.log_param("model_type", "RandomForestRegressor")
    mlflow.log_param("train_size", len(X_train))
    mlflow.log_param("test_size", len(X_test))
    
    # Train model
    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)
    
    # Predictions
    y_pred = model.predict(X_test)
    
    # Calculate metrics
    mse = mean_squared_error(y_test, y_pred)
    rmse = mse ** 0.5
    r2 = r2_score(y_test, y_pred)
    
    # Log metrics
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2_score", r2)
    
    # Log model
    mlflow.sklearn.log_model(
        model,
        "model",
        registered_model_name="kion_customer_ltv_model"
    )
    
    # Log feature importance
    feature_importance = pd.DataFrame({
        'feature': X.columns,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    # Save as artifact
    feature_importance.to_csv('/tmp/feature_importance.csv', index=False)
    mlflow.log_artifact('/tmp/feature_importance.csv')
    
    print("✅ Model trained and logged to MLflow")
    print(f"   RMSE: {rmse:.2f}")
    print(f"   R² Score: {r2:.4f}")
    print(f"   Run ID: {run.info.run_id}")
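
To make the logged metrics easier to interpret, the sketch below recomputes MSE, RMSE and R² by hand for three made-up predictions. The numbers are illustrative only and are not taken from the training run above.

```python
def regression_metrics(y_true: list, y_pred: list) -> dict:
    """Compute the three metrics that the run logs, using plain Python."""
    n = len(y_true)
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    mse = sum(squared_errors) / n
    rmse = mse ** 0.5
    mean_true = sum(y_true) / n
    ss_res = sum(squared_errors)                       # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2_score": r2}


metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0])
print({name: round(value, 4) for name, value in metrics.items()})
```

RMSE is in the same unit as the target (here: currency), which is why it is usually the metric to compare across runs, while R² expresses how much of the variance the model explains.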

In [None]:
# View experiment runs
experiment = mlflow.get_experiment_by_name("/Users/your.email@company.com/KION_Customer_LTV_Prediction")
runs_df = mlflow.search_runs(experiment_ids=[experiment.experiment_id])

# search_runs returns a pandas DataFrame, which has no .display() method; use display() instead
display(runs_df[['run_id', 'params.n_estimators', 'metrics.rmse', 'metrics.r2_score']])

### Loading and using registered model:

In [None]:
# Load model from registry
model_name = "kion_customer_ltv_model"
model_version = 1

loaded_model = mlflow.sklearn.load_model(f"models:/{model_name}/{model_version}")

# Make predictions
predictions = loaded_model.predict(X_test)

print("✅ Model loaded and predictions made")
print(f"Sample predictions: {predictions[:5]}")

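### Model Registry workflow:

```
None → Staging → Production → Archived
```

- **None**: Newly registered versions, typically experimental models
- **Staging**: Models being tested/validated
- **Production**: Active models serving predictions
- **Archived**: Deprecated models

Stages belong to the workspace Model Registry. Recent MLflow releases deprecate them, and models registered in Unity Catalog use aliases (for example `champion`) instead.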
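
The stage lifecycle above can be pictured as a small state table per model version. The sketch below is a toy model, not the MLflow client: `transition` and its `archive_existing` flag are hypothetical, loosely modelled on the option MLflow offers for archiving the versions that currently hold a stage. MLflow names the initial stage `None`.

```python
STAGES = ("None", "Staging", "Production", "Archived")


def transition(versions: dict, version: int, stage: str,
               archive_existing: bool = False) -> dict:
    """Return a new {version: stage} mapping with one version moved to `stage`."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if version not in versions:
        raise KeyError(f"unknown version: {version}")
    updated = dict(versions)
    if archive_existing and stage in ("Staging", "Production"):
        # Retire whichever versions currently hold the target stage.
        for other, current in versions.items():
            if other != version and current == stage:
                updated[other] = "Archived"
    updated[version] = stage
    return updated


registry = {1: "Production", 2: "None"}
registry = transition(registry, 2, "Staging")
registry = transition(registry, 2, "Production", archive_existing=True)
print(registry)
```

Archiving the previous production version in the same step keeps exactly one version in `Production`, which is what a loader that resolves "the production model" relies on.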

---

## 4️⃣ Feature Store Introduction

**Feature Store** = Centralized repository for ML features

### Benefits:
- **Reusability**: Share features across teams
- **Consistency**: Same features in training & serving
- **Discovery**: Browse available features
- **Versioning**: Track feature evolution
- **Online/Offline**: Batch training + real-time serving

### Creating feature table:

In [None]:
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Create feature table
feature_df = spark.sql(f"""
    SELECT 
        customer_id,
        COUNT(order_id) as total_orders,
        SUM(total_amount) as lifetime_value,
        AVG(total_amount) as avg_order_value,
        MAX(order_date) as last_order_date,
        DATEDIFF(CURRENT_DATE(), MAX(order_date)) as days_since_last_order,
        COUNT(DISTINCT YEAR(order_date)) as active_years,
        CURRENT_TIMESTAMP() as feature_timestamp
    FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver
    GROUP BY customer_id
""")

# Write to Feature Store
fs.create_table(
    name=f"{CATALOG}.{GOLD_SCHEMA}.customer_features",
    primary_keys=["customer_id"],
    df=feature_df,
    description="Customer behavioral features for LTV prediction"
)

print("✅ Feature table created in Feature Store")

In [None]:
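# Update features (incremental)
# This would run daily to refresh features.
# Only customers with orders since yesterday are refreshed, but their aggregates are
# recomputed over the full order history. Aggregating only yesterday's rows would make
# the merge overwrite lifetime totals with one-day values.
updated_features = spark.sql(f"""
    SELECT 
        customer_id,
        COUNT(order_id) as total_orders,
        SUM(total_amount) as lifetime_value,
        AVG(total_amount) as avg_order_value,
        MAX(order_date) as last_order_date,
        DATEDIFF(CURRENT_DATE(), MAX(order_date)) as days_since_last_order,
        COUNT(DISTINCT YEAR(order_date)) as active_years,
        CURRENT_TIMESTAMP() as feature_timestamp
    FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver
    WHERE customer_id IN (
        SELECT customer_id
        FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver
        WHERE order_date >= CURRENT_DATE() - INTERVAL 1 DAY
    )
    GROUP BY customer_id
""")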

fs.write_table(
    name=f"{CATALOG}.{GOLD_SCHEMA}.customer_features",
    df=updated_features,
    mode="merge"  # Merge updates
)

print("✅ Features updated")
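
The effect of `mode="merge"` can be pictured as an upsert on the primary key: rows whose key already exists are updated, new keys are inserted, and keys missing from the update stay untouched. The sketch below models that with dictionaries; `merge_features` and the sample values are hypothetical and stand in for the Feature Store call.

```python
def merge_features(table: dict, updates: dict) -> dict:
    """Upsert `updates` into `table`; both map primary key -> feature row."""
    merged = {key: dict(row) for key, row in table.items()}
    for key, row in updates.items():
        # Columns present in the update replace the stored values.
        merged[key] = {**merged.get(key, {}), **row}
    return merged


feature_table = {
    "C1": {"total_orders": 10, "lifetime_value": 500.0},
    "C2": {"total_orders": 3, "lifetime_value": 90.0},
}
daily_update = {
    "C1": {"total_orders": 11, "lifetime_value": 560.0},  # existing customer, new order
    "C3": {"total_orders": 1, "lifetime_value": 25.0},    # first-time customer
}
result = merge_features(feature_table, daily_update)
print(result)
```

Because an update row replaces the stored values, each row written in merge mode has to carry aggregates computed over the customer's full history, not only over the latest day.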

In [None]:
# Read features for training
feature_df = fs.read_table(name=f"{CATALOG}.{GOLD_SCHEMA}.customer_features")
feature_df.display()

### Training with Feature Store:

In [None]:
from databricks.feature_store import FeatureLookup

# Define feature lookups
feature_lookups = [
    FeatureLookup(
        table_name=f"{CATALOG}.{GOLD_SCHEMA}.customer_features",
        feature_names=[
            "total_orders",
            "lifetime_value",
            "avg_order_value",
            "days_since_last_order",
            "active_years"
        ],
        lookup_key="customer_id"
    )
]

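# Training set (just customer_ids + labels)
# Note: a real project would read labels from a dedicated churn labels table.
# This demo derives the label from order recency, so the days_since_last_order
# feature in the lookup above determines the label and should be dropped before real training.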
training_set_df = spark.sql(f"""
    SELECT 
        customer_id,
        CASE 
            WHEN DATEDIFF(CURRENT_DATE(), MAX(order_date)) > 90 THEN 1
            ELSE 0
        END as will_churn
    FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver
    GROUP BY customer_id
""")

# Create training set with features
training_set = fs.create_training_set(
    df=training_set_df,
    feature_lookups=feature_lookups,
    label="will_churn"
)

training_df = training_set.load_df()
training_df.display()

# Features are automatically joined from Feature Store!
print("✅ Training set created with Feature Store lookup")

---

## 5️⃣ Gold Layer → ML Dataset Pipeline

### Best practices for ML-ready data:

In [None]:
# Create ML-optimized table
ml_dataset = spark.sql(f"""
    WITH customer_metrics AS (
        SELECT 
            customer_id,
            COUNT(order_id) as total_orders,
            SUM(total_amount) as lifetime_value,
            AVG(total_amount) as avg_order_value,
            STDDEV(total_amount) as stddev_order_value,
            MIN(order_date) as first_order_date,
            MAX(order_date) as last_order_date,
            DATEDIFF(MAX(order_date), MIN(order_date)) as customer_lifespan_days,
            COUNT(DISTINCT DATE_TRUNC('month', order_date)) as active_months
        FROM {CATALOG}.{SILVER_SCHEMA}.orders_silver
        WHERE payment_method IS NOT NULL
        GROUP BY customer_id
    ),
    customer_demographics AS (
        SELECT 
            customer_id,
            country,
            DATEDIFF(CURRENT_DATE(), registration_date) as days_since_registration
        FROM {CATALOG}.{BRONZE_SCHEMA}.customers_bronze
    )
    SELECT 
        cm.*,
        cd.country,
        cd.days_since_registration,
        -- Derived features
        cm.total_orders / NULLIF(cm.active_months, 0) as orders_per_month,
        cm.lifetime_value / NULLIF(cm.total_orders, 0) as avg_order_value_calc,
        DATEDIFF(CURRENT_DATE(), cm.last_order_date) as recency_days,
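        -- Label: has the customer ordered within the last 90 days?
        -- (a proxy for future activity; recency_days determines it exactly, so exclude that feature when training)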
        CASE 
            WHEN DATEDIFF(CURRENT_DATE(), cm.last_order_date) <= 90 THEN 1
            ELSE 0
        END as is_active_customer
    FROM customer_metrics cm
    JOIN customer_demographics cd ON cm.customer_id = cd.customer_id
    WHERE cm.total_orders >= 2  -- Filter out one-time buyers
""")

ml_dataset.write.format("delta").mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable(f"{CATALOG}.{GOLD_SCHEMA}.customer_activity_prediction_dataset")

print("✅ ML dataset created: customer_activity_prediction_dataset")
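
The derived columns in the final `SELECT` can be verified for a single customer in plain Python. This sketch mirrors the SQL expressions, including the `NULLIF` guard against division by zero; the function name and the sample values are made up for illustration.

```python
from datetime import date


def derive_features(total_orders: int, lifetime_value: float, active_months: int,
                    last_order_date: date, today: date) -> dict:
    """Mirror the derived columns of the ML dataset for one customer."""
    recency_days = (today - last_order_date).days  # DATEDIFF(today, last_order_date)
    return {
        # NULLIF(x, 0) turns a zero divisor into NULL, modelled here as None.
        "orders_per_month": total_orders / active_months if active_months else None,
        "avg_order_value_calc": lifetime_value / total_orders if total_orders else None,
        "recency_days": recency_days,
        "is_active_customer": 1 if recency_days <= 90 else 0,
    }


row = derive_features(total_orders=6, lifetime_value=300.0, active_months=3,
                      last_order_date=date(2024, 5, 1), today=date(2024, 6, 1))
print(row)
```

Passing `today` explicitly, instead of reading the current date inside the function, makes the features reproducible, which is the same reason a dataset snapshot should record the date it was computed for.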

In [None]:
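# Add metadata for ML team
from datetime import datetime, timezone

# TBLPROPERTIES values must be string literals, so the timestamp is rendered in Python
last_updated = datetime.now(timezone.utc).isoformat()

spark.sql(f"""
    ALTER TABLE {CATALOG}.{GOLD_SCHEMA}.customer_activity_prediction_dataset
    SET TBLPROPERTIES (
        'ml_use_case' = 'customer_churn_prediction',
        'target_variable' = 'is_active_customer',
        'feature_count' = '15',
        'last_updated' = '{last_updated}',
        'refresh_frequency' = 'daily',
        'owner' = 'ml-team@kion.com'
    )
""")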

spark.sql(f"""
    COMMENT ON TABLE {CATALOG}.{GOLD_SCHEMA}.customer_activity_prediction_dataset IS
    'ML-ready dataset for predicting customer activity in next 90 days.
     Target: is_active_customer (1 = active, 0 = inactive).
     Updated daily at 3 AM.'
""")

print("✅ Metadata added to ML dataset")

---

## 6️⃣ End-to-End Integration Architecture

### Complete data flow:

```
Source Systems (CSV, JSON, APIs)
    ↓
Bronze Layer (Raw data + Audit)
    ↓
Silver Layer (Cleaned + Validated)
    ↓
    ├→ Gold Layer (BI) → SQL Warehouse → Power BI Dashboards
    ├→ Gold Layer (ML) → Feature Store → MLflow Models
    └→ Delta Sharing → External Partners
```

### Integration checklist:

In [None]:
# Integration health check
# information_schema.tables exposes no size column, so this query only counts tables
integration_check = spark.sql(f"""
    SELECT 
        'BI Layer' as integration_type,
        COUNT(*) as table_count
    FROM system.information_schema.tables
    WHERE table_catalog = '{CATALOG}'
        AND table_schema = '{GOLD_SCHEMA}'
        AND (table_name LIKE 'fact_%' OR table_name LIKE 'dim_%')
    
    UNION ALL
    
    SELECT 
        'ML Layer' as integration_type,
        COUNT(*) as table_count
    FROM system.information_schema.tables
    WHERE table_catalog = '{CATALOG}'
        AND table_schema = '{GOLD_SCHEMA}'
        AND table_name LIKE '%prediction%'
    
    UNION ALL
    
    SELECT 
        'Feature Store' as integration_type,
        COUNT(*) as table_count
    FROM system.information_schema.tables
    WHERE table_catalog = '{CATALOG}'
        AND table_schema = '{GOLD_SCHEMA}'
        AND table_name LIKE '%features%'
""")

integration_check.display()

# Table sizes come from DESCRIBE DETAIL, which reports sizeInBytes per Delta table
for table_name in ["fact_sales", "dim_customer", "dim_product", "dim_date"]:
    detail = spark.sql(f"DESCRIBE DETAIL {CATALOG}.{GOLD_SCHEMA}.{table_name}").first()
    print(f"{table_name}: {detail['sizeInBytes'] / 1024 ** 3:.4f} GB")

### Monitoring integration health:

In [None]:
# Query usage statistics
usage_stats = spark.sql(f"""
    SELECT 
        DATE(event_time) as date,
        request_params.full_name_arg as table_name,
        COUNT(*) as query_count,
        COUNT(DISTINCT user_identity.email) as unique_users
    FROM system.access.audit
    WHERE action_name = 'getTable'
        AND request_params.full_name_arg LIKE '{CATALOG}.{GOLD_SCHEMA}%'
        AND event_date >= CURRENT_DATE() - INTERVAL 7 DAYS
    GROUP BY date, table_name
    ORDER BY date DESC, query_count DESC
""")

usage_stats.display()

print("📊 Gold layer usage by BI and ML teams")

---

## ✅ Summary

### You have learned:

✅ **Power BI Integration**: Direct Lake vs Direct Query  
✅ **SQL Warehouses**: Optimized compute for BI workloads  
✅ **MLflow Tracking**: Log experiments, parameters, metrics  
✅ **Model Registry**: Centralized model management  
✅ **Feature Store**: Reusable ML features  
✅ **Gold → ML Pipeline**: Preparing data for ML use cases  

### Key Takeaways:

1. **Unified Platform**: BI, ML, Analytics on same Lakehouse
2. **SQL Warehouses**: Separate compute for BI = better performance
3. **MLflow**: Track everything - experiments, models, artifacts
4. **Feature Store**: Share features across teams, avoid duplication
5. **Gold Layer**: Serve both BI dashboards AND ML models

### Congratulations! 🎉

You have completed **Day 3** of the KION training!

You covered:
- Advanced transformations (Day 3.1)
- Delta Live Tables (Day 3.2)
- Databricks Jobs (Day 3.3)
- Unity Catalog (Day 3.4)
- BI & ML Integrations (Day 3.5)

**Next step**: Workshop 03 - hands-on exercises!

---

## 📚 Additional resources

- [Power BI Integration Guide](https://docs.databricks.com/partners/bi/power-bi.html)
- [SQL Warehouses Documentation](https://docs.databricks.com/sql/admin/sql-endpoints.html)
- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)
- [Feature Store Guide](https://docs.databricks.com/machine-learning/feature-store/index.html)

---