# dbt Dynamic Loading Example

This notebook demonstrates the new dynamic loading feature for dbt Python classes.

## Overview

The dynamic loading feature allows you to:
- Load and execute dbt SQL models at runtime without code generation
- Hot-reload SQL changes without regenerating Python classes
- Execute DAGs dynamically, based on the manifest loaded at runtime

## Setup

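First, make sure you have compiled your dbt project and generated the necessary artifacts. The dynamic components read the manifest and SQL files at runtime, so these must exist before you run the cells below.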
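
As a quick pre-flight check, the sketch below looks for the manifest that dbt writes to `target/manifest.json` by default and counts its nodes by resource type. The helper `summarize_manifest` and the throwaway demo project are illustrative only and are not part of `ingen_fab`. That the dynamic loader reads from this exact location is an assumption.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_manifest(project_path: Path) -> Counter:
    """Count manifest nodes by resource type (hypothetical helper)."""
    # dbt writes manifest.json to <project>/target by default
    manifest_path = project_path / "target" / "manifest.json"
    if not manifest_path.exists():
        raise FileNotFoundError(
            f"{manifest_path} not found; run `dbt compile` first"
        )
    manifest = json.loads(manifest_path.read_text())
    return Counter(
        node["resource_type"] for node in manifest.get("nodes", {}).values()
    )


# Demo against a throwaway project so the sketch runs on its own
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / "target").mkdir()
    (project / "target" / "manifest.json").write_text(json.dumps({
        "nodes": {
            "model.demo.orders": {"resource_type": "model"},
            "model.demo.customers": {"resource_type": "model"},
            "seed.demo.countries": {"resource_type": "seed"},
        }
    }))
    counts = summarize_manifest(project)
    print(dict(counts))
```

To check a real project, call `summarize_manifest(Path("./sample_project"))`.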

In [None]:
# Import required libraries
from pathlib import Path
from pyspark.sql import SparkSession

# Import dynamic runtime components
from ingen_fab.packages.dbt.runtime.dynamic import (
    DynamicModelLoader,
    DynamicSQLExecutor,
    DynamicDAGExecutor
)

In [None]:
# Initialize Spark session
spark = (
    SparkSession.builder
    .appName("DBT Dynamic Loading Example")
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

print(f"Spark version: {spark.version}")

## 1. Using DynamicModelLoader

The `DynamicModelLoader` reads the dbt manifest and the SQL files at runtime.

In [None]:
# Set your dbt project path
dbt_project_path = Path("./sample_project")

# Create a loader instance
loader = DynamicModelLoader(dbt_project_path)

# Get all nodes from the manifest
all_nodes = loader.get_all_nodes()
print(f"Total nodes in manifest: {len(all_nodes)}")
print(f"Sample nodes: {all_nodes[:5]}")

In [None]:
# Get nodes by type
models = loader.get_nodes_by_type("model")
tests = loader.get_nodes_by_type("test")
seeds = loader.get_nodes_by_type("seed")

print(f"Models: {len(models)}")
print(f"Tests: {len(tests)}")
print(f"Seeds: {len(seeds)}")

In [None]:
# Get metadata for a specific node
if models:
    sample_node = models[0]
    metadata = loader.get_node_metadata(sample_node)
    
    print(f"Node: {sample_node}")
    print(f"Resource Type: {metadata.get('resource_type')}")
    print(f"Path: {metadata.get('path')}")
    print(f"SQL Statements: {metadata.get('sql_count')}")
    print(f"Dependencies: {metadata.get('dependencies', [])}")
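
For orientation, here is a rough sketch of the manifest data these lookups draw on. In a dbt manifest, each entry under `nodes` is keyed by a unique ID and carries a `resource_type` and a `depends_on.nodes` list. The helpers `nodes_by_type` and `upstream` are hypothetical stand-ins written against a hand-made fragment. They are not the loader's actual implementation.

```python
# Hand-written manifest fragment; real manifests carry many more fields
manifest = {
    "nodes": {
        "seed.demo.countries": {
            "resource_type": "seed",
            "depends_on": {"nodes": []},
        },
        "model.demo.customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["seed.demo.countries"]},
        },
        "model.demo.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["model.demo.customers"]},
        },
    }
}


def nodes_by_type(manifest: dict, resource_type: str) -> list[str]:
    """Return sorted unique IDs of nodes with the given resource type."""
    return sorted(
        node_id
        for node_id, node in manifest["nodes"].items()
        if node["resource_type"] == resource_type
    )


def upstream(manifest: dict, node_id: str) -> list[str]:
    """Return the node IDs that `node_id` depends on directly."""
    return manifest["nodes"][node_id].get("depends_on", {}).get("nodes", [])


model_ids = nodes_by_type(manifest, "model")
print(model_ids)
print(upstream(manifest, "model.demo.orders"))
```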

## 2. Using DynamicSQLExecutor

The `DynamicSQLExecutor` executes the SQL statements that the loader reads at runtime.

In [None]:
# Create an executor instance
executor = DynamicSQLExecutor(spark, loader)

# Validate a node before execution
if models:
    validation = executor.validate_node(models[0])
    print(f"Node: {models[0]}")
    print(f"Valid: {validation['valid']}")
    print(f"SQL Count: {validation.get('sql_count')}")
    print(f"Dependencies: {validation.get('dependency_count')} nodes")
    
    if validation['issues']:
        print(f"Issues: {validation['issues']}")
    if validation['warnings']:
        print(f"Warnings: {validation['warnings']}")

In [None]:
# Execute a single node
if models:
    try:
        result = executor.execute_node(models[0])
        print(f"Successfully executed: {models[0]}")
        
        if result is not None:
            print(f"Result schema: {result.schema}")
            print(f"Row count: {result.count()}")
            result.show(5)
    except Exception as e:
        print(f"Execution failed: {e}")

## 3. Using DynamicDAGExecutor

The `DynamicDAGExecutor` orchestrates node execution in dependency order.

In [None]:
# Create a DAG executor
dag_executor = DynamicDAGExecutor(
    spark=spark,
    dbt_project_path=dbt_project_path,
    max_workers=4
)

# Validate the DAG
is_valid, cycles = dag_executor.validate_dag()
print(f"DAG is valid: {is_valid}")
if cycles:
    print(f"Cycles found: {cycles}")

In [None]:
# Get execution plan
plan = dag_executor.get_execution_plan()
print(f"Execution plan has {len(plan)} stages")

for i, stage in enumerate(plan[:3], 1):  # Show first 3 stages
    print(f"\nStage {i}: {len(stage)} nodes can run in parallel")
    for node in stage[:5]:  # Show first 5 nodes
        print(f"  - {node}")
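
To make the idea of stages concrete, the sketch below groups nodes into levels using a Kahn-style topological sort: every node in a stage depends only on nodes from earlier stages, so the nodes within one stage can run in parallel. This is a general technique, not necessarily how `DynamicDAGExecutor` builds its plan. The function `plan_stages` and the sample dependency map are hypothetical, and the sketch assumes every dependency also appears as a key in the map.

```python
def plan_stages(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group nodes into stages; each stage depends only on earlier stages.

    `dependencies` maps every node to the set of nodes it depends on.
    Raises ValueError if a cycle makes a full ordering impossible.
    """
    # Copy the sets so the caller's mapping is left untouched
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    stages = []
    while remaining:
        # Nodes with no unmet dependencies are ready to run now
        ready = sorted(node for node, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(f"Cycle detected among: {sorted(remaining)}")
        stages.append(ready)
        for node in ready:
            del remaining[node]
        for deps in remaining.values():
            deps.difference_update(ready)
    return stages


dependencies = {
    "seed.demo.countries": set(),
    "seed.demo.currencies": set(),
    "model.demo.customers": {"seed.demo.countries"},
    "model.demo.orders": {"model.demo.customers", "seed.demo.currencies"},
}

for i, stage in enumerate(plan_stages(dependencies), 1):
    print(f"Stage {i}: {stage}")
```

A cycle leaves no node without unmet dependencies, which is why the same loop doubles as a validity check.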

In [None]:
# Execute specific resource types
results = dag_executor.execute_dag(
    resource_types=["seed"],  # Only execute seeds
    fail_fast=True
)

print("\nExecution Summary:")
print(f"Executed: {len(results['executed'])} nodes")
print(f"Failed: {len(results['failed'])} nodes")
print(f"Skipped: {len(results['skipped'])} nodes")
print(f"Success Rate: {results['success_rate']:.1%}")
print(f"Total Time: {results['total_time']:.2f}s")
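
The effect of `fail_fast` and `max_workers` can be illustrated with a small stand-alone sketch. It runs each stage's nodes on a thread pool and, when `fail_fast` is set, skips all later stages after the first failure. The names `run_stages` and `run_node` are hypothetical, and the result keys simply mirror the summary printed above. The real executor's behavior may differ; for example, this sketch does not skip the dependents of a failed node when `fail_fast=False`.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stages(stages, run_node, max_workers=4, fail_fast=True):
    """Run each stage's nodes in parallel; stages run one after another."""
    results = {"executed": [], "failed": [], "skipped": []}
    stop = False
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for stage in stages:
            if stop:
                # A previous stage failed and fail_fast is on
                results["skipped"].extend(stage)
                continue
            futures = {node: pool.submit(run_node, node) for node in stage}
            for node, future in futures.items():
                # exception() blocks until the node has finished
                if future.exception() is None:
                    results["executed"].append(node)
                else:
                    results["failed"].append(node)
            if fail_fast and results["failed"]:
                stop = True
    return results


def run_node(node):
    """Stand-in for real node execution; one node fails on purpose."""
    if node == "model.demo.customers":
        raise RuntimeError("simulated failure")


stages = [
    ["seed.demo.countries"],
    ["model.demo.customers"],
    ["model.demo.orders"],
]
summary = run_stages(stages, run_node, fail_fast=True)
print(summary)
```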

## 4. Hot Reload Example

This section shows how to pick up SQL changes without regenerating classes.

In [None]:
# Clear cache to reload from disk
loader.clear_cache()
print("Cache cleared - will reload from disk on next access")

# Reload manifest to pick up any changes
loader.reload_manifest()
print("Manifest reloaded")

# DAG executor can also reload
dag_executor.reload_manifest()
print("DAG executor reloaded")
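
One common way to implement this kind of hot reload is to cache each file's contents together with its modification time and re-read the file when that time changes. The sketch below shows the pattern using only the standard library. `ReloadingSQLCache` is a hypothetical class for illustration, and it is an assumption that the loader's cache works along similar lines.

```python
import os
import tempfile
from pathlib import Path


class ReloadingSQLCache:
    """Cache SQL text per file and re-read it when the file changes."""

    def __init__(self):
        self._cache = {}  # path -> (mtime_ns, sql_text)

    def get(self, path: Path) -> str:
        mtime = path.stat().st_mtime_ns
        cached = self._cache.get(path)
        if cached is None or cached[0] != mtime:
            # File is new or modified since the last read: reload from disk
            self._cache[path] = (mtime, path.read_text())
        return self._cache[path][1]

    def clear(self) -> None:
        """Drop everything so the next get() reads from disk."""
        self._cache.clear()


with tempfile.TemporaryDirectory() as tmp:
    sql_file = Path(tmp) / "orders.sql"
    sql_file.write_text("SELECT 1 AS id")
    cache = ReloadingSQLCache()
    first = cache.get(sql_file)

    sql_file.write_text("SELECT 2 AS id")
    # Bump mtime explicitly so the demo does not depend on clock resolution
    stat = sql_file.stat()
    os.utime(sql_file, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    second = cache.get(sql_file)

print(first)
print(second)
```

An explicit `clear()` remains useful when files are replaced in a way that preserves their modification time.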

## 5. Comparison: Static vs Dynamic Mode

### Generate Classes in Both Modes

In [None]:
%%bash
# Generate static classes (default)
export FABRIC_WORKSPACE_REPO_DIR="./sample_project"
export FABRIC_ENVIRONMENT="development"

ingen_fab dbt create-python-classes sample_project --execution-mode static

In [None]:
%%bash
# Generate dynamic wrappers
export FABRIC_WORKSPACE_REPO_DIR="./sample_project"
export FABRIC_ENVIRONMENT="development"

ingen_fab dbt create-python-classes sample_project --execution-mode dynamic

### Using Generated Dynamic Wrappers

In [None]:
# Import a generated dynamic wrapper class
# (This assumes you've generated classes for your project)
# from ingen_fab.packages.dbt.runtime.projects.sample_project.models import YourModel

# model = YourModel(spark)
# result = model.execute()
# model.reload()  # Hot reload SQL changes

## 6. Performance Monitoring

In [None]:
# Get execution history
history = executor.get_execution_history()

if history:
    print("Execution History:")
    for entry in history[-5:]:  # Last 5 executions
        print(f"\nNode: {entry['node_id']}")
        print(f"Status: {entry['status']}")
        print(f"Time: {entry['execution_time']:.2f}s")
        print(f"Statements: {entry.get('statement_count', 0)}")
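
History entries like these lend themselves to simple aggregation. The sketch below totals execution time per status and lists the slowest nodes. The sample entries are made up and only reuse the keys printed above; the status values `"success"` and `"failed"` are assumptions.

```python
from collections import defaultdict

# Hypothetical entries shaped like the history printed above
history = [
    {"node_id": "seed.demo.countries", "status": "success", "execution_time": 0.8},
    {"node_id": "model.demo.customers", "status": "success", "execution_time": 3.2},
    {"node_id": "model.demo.orders", "status": "failed", "execution_time": 1.5},
]

# Total time spent per status
time_by_status = defaultdict(float)
for entry in history:
    time_by_status[entry["status"]] += entry["execution_time"]

# Two slowest nodes, longest first
slowest = sorted(history, key=lambda e: e["execution_time"], reverse=True)[:2]

for status, total in time_by_status.items():
    print(f"{status}: {total:.2f}s")
for entry in slowest:
    print(f"{entry['node_id']}: {entry['execution_time']:.2f}s")
```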

In [None]:
# Get DAG execution status summary
status_summary = dag_executor.get_status_summary()
print("Execution Status Summary:")
for status, count in status_summary.items():
    print(f"  {status}: {count} nodes")

## 7. Best Practices

### When to Use Dynamic Mode

**Use Dynamic Mode when you:**
- Are developing and iterating on SQL models
- Need hot-reload capability
- Want a smaller generated code footprint
- Work with frequently changing SQL

**Use Static Mode when you:**
- Are deploying to production
- Need maximum performance
- Want self-contained Python packages
- Work in environments without access to dbt artifacts

### Tips

1. **Cache Management**: Use `cache_sql=True` for better performance in production
2. **Parallel Execution**: Adjust `max_workers` based on your Spark cluster size
3. **Error Handling**: Use `fail_fast=False` to continue execution after failures
4. **Monitoring**: Check execution history regularly for performance insights

In [None]:
# Cleanup
spark.stop()
print("Spark session stopped")