# ConnectWise to Microsoft Fabric ETL Notebook

This notebook shows how to run the ConnectWise PSA to Microsoft Fabric OneLake integration pipeline. The pipeline writes Delta tables directly to OneLake and applies its own table naming and partitioning conventions.

## Setup

First, install the package and any dependencies:

In [None]:
# Install the package (adjust path as needed)
%pip install /lakehouse/Files/dist/fabric_api-0.1.0-py3-none-any.whl

# Optional: Install any additional dependencies
%pip install delta-spark sparkdantic
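
Before moving on, it can help to confirm that the wheel is actually importable in the current session. The following is a minimal sketch using only the standard library; the helper `describe_install` is hypothetical, and the distribution name `fabric_api` is an assumption inferred from the wheel filename above.

```python
import importlib.util
from importlib import metadata


def describe_install(module_name: str, dist_name: str) -> str:
    """Return a one-line status for an installed (or missing) package."""
    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: not importable"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name
        version = "unknown version"
    return f"{module_name}: {version}"


# Assumption: the distribution is named "fabric_api", as the wheel filename suggests.
print(describe_install("fabric_api", "fabric_api"))
```

If the module is reported as not importable right after installation, restarting the session and rerunning the install cell is a reasonable first thing to try.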

## Set Environment Variables

Configure logging and verify that the environment variables required for API access are set:

In [None]:
import os
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")

# The ConnectWise credentials are read from environment variables.
# They are expected to be populated from the secrets in your workspace Key Vault.
required_vars = [
    "CW_COMPANY", 
    "CW_PUBLIC_KEY",
    "CW_PRIVATE_KEY",
    "CW_CLIENTID"
]

# Verify all required variables are available
missing_vars = [var for var in required_vars if not os.getenv(var)]
if missing_vars:
    raise ValueError(
        f"Missing required environment variables: {', '.join(missing_vars)}. "
        "Please add these secrets to your workspace Key Vault."
    )

print("All required environment variables are set.")
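
If the variables are not already present in your session, one option is to copy them from a secret store before running the check. The sketch below is an assumption about how that could look, not part of `fabric_api`: `load_secrets_into_env` is a hypothetical helper that takes any callable returning a secret value by name. In a Fabric notebook that callable might wrap `notebookutils.credentials.getSecret` with your vault URL. Azure Key Vault secret names allow only letters, digits, and dashes, so a name such as `CW_COMPANY` would need a mapping (for example to `CW-COMPANY`).

```python
import os
from typing import Callable, Iterable, List


def load_secrets_into_env(
    names: Iterable[str],
    fetch_secret: Callable[[str], str],
) -> List[str]:
    """Copy each named secret into os.environ unless it is already set.

    Returns the names that were newly set.
    """
    newly_set = []
    for name in names:
        if os.getenv(name):
            continue  # keep any value that is already configured
        value = fetch_secret(name)
        if value:
            os.environ[name] = value
            newly_set.append(name)
    return newly_set


# Stand-in fetcher so the sketch runs anywhere. In a Fabric notebook, one
# option (an assumption; VAULT_URL is a placeholder) would be:
#   lambda n: notebookutils.credentials.getSecret(VAULT_URL, n.replace("_", "-"))
demo_vault = {"DEMO_CW_COMPANY": "acme"}
print(load_secrets_into_env(["DEMO_CW_COMPANY"], lambda n: demo_vault.get(n, "")))
```

Passing the fetcher in as a parameter keeps the helper independent of any particular secret store and makes it easy to exercise outside Fabric.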

## Simple ETL Execution

Run a full ETL process to extract all entities and load them to OneLake:

In [None]:
from fabric_api.orchestration import run_onelake_etl

# Run a full ETL process (no date filtering)
table_paths = run_onelake_etl(
    mode="append",      # Use append or overwrite
    max_pages=100       # Limit the number of pages for testing
)

# Display the results
print("\nETL Results:")
for entity_name, path in table_paths.items():
    print(f"  {entity_name}: {path}")

## Incremental ETL with Date Filtering

Run an incremental ETL process to extract data for a specific date range:

In [None]:
# Run an incremental ETL process with date filtering
incremental_results = run_onelake_etl(
    start_date="2025-04-01",  # Start date in YYYY-MM-DD format
    end_date="2025-04-30",    # End date in YYYY-MM-DD format
    mode="append",            # Always use append for incremental updates
    max_pages=100             # Limit the number of pages for testing
)

# Display the results
print("\nIncremental ETL Results:")
for entity_name, path in incremental_results.items():
    print(f"  {entity_name}: {path}")
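
For a scheduled run, hard-coded dates quickly go stale. As a minimal sketch, the window can be computed instead; `previous_month_range` is a hypothetical helper that returns the first and last day of the previous calendar month in the `YYYY-MM-DD` format used above. Whether `end_date` is treated as inclusive by `run_onelake_etl` is not stated here, so check that before relying on the boundaries.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def previous_month_range(today: Optional[date] = None) -> Tuple[str, str]:
    """Return (start_date, end_date) for the month before `today` as YYYY-MM-DD."""
    today = today or date.today()
    first_of_this_month = today.replace(day=1)
    # Stepping back one day from the 1st lands on the last day of the previous month
    last_of_previous = first_of_this_month - timedelta(days=1)
    first_of_previous = last_of_previous.replace(day=1)
    return first_of_previous.isoformat(), last_of_previous.isoformat()


start_date, end_date = previous_month_range(date(2025, 5, 15))
print(start_date, end_date)  # 2025-04-01 2025-04-30
```

The two strings can then be passed as `start_date` and `end_date` in place of the literals in the cell above.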

## Advanced: Processing Specific Entities

If you need more control over the ETL process, you can use the lower-level APIs:

In [None]:
from fabric_api.orchestration import ETLOrchestrator
from pyspark.sql import SparkSession

# Get the active Spark session in Fabric
spark = SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()

# Create an ETL orchestrator with OneLake optimizations
orchestrator = ETLOrchestrator(
    spark=spark,
    write_mode="append",
    use_onelake=True
)

# Process specific entities
results = orchestrator.process_entities(
    entity_names=["Agreement", "TimeEntry"],  # Only process these entities
    max_pages=50                              # Limit the number of pages
)

# Display the results
print("\nSpecific Entity ETL Results:")
for entity_name, (_, row_count, error_count) in results.items():
    print(f"  {entity_name}: {row_count} rows written, {error_count} validation errors")
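
Beyond printing the counts, you may want to flag entities that need a closer look. The sketch below assumes the result shape implied by the loop above, where each entity name maps to a 3-tuple whose second and third items are the row count and the validation error count. `entities_needing_review` is a hypothetical helper, and the sample data is made up for illustration.

```python
from typing import Any, Dict, List, Tuple

# Assumed shape, taken from the loop above: name -> (anything, row_count, error_count)
EntityResult = Tuple[Any, int, int]


def entities_needing_review(results: Dict[str, EntityResult]) -> List[str]:
    """Return entity names that wrote no rows or reported validation errors."""
    flagged = []
    for name, (_, row_count, error_count) in sorted(results.items()):
        if row_count == 0 or error_count > 0:
            flagged.append(name)
    return flagged


# Made-up sample data, for illustration only
sample = {
    "Agreement": (None, 120, 0),
    "TimeEntry": (None, 0, 0),
    "PostedInvoice": (None, 40, 3),
}
print(entities_needing_review(sample))
```

In the notebook, calling `entities_needing_review(results)` on the orchestrator output would give the list of entities to investigate.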

## Query the Loaded Data

Once the data is loaded, you can query it using Spark SQL:

In [None]:
# Query the data using Spark SQL
from fabric_api.onelake_utils import get_table_name

# Get fully qualified table names
agreement_table = get_table_name("Agreement")
invoice_table = get_table_name("PostedInvoice")
time_table = get_table_name("TimeEntry")

# Query agreements (adjust column names to match your table schema)
agreement_df = spark.sql(f"""
SELECT id, name, type, agreementType, startDate, endDate
FROM {agreement_table}
ORDER BY startDate DESC
LIMIT 10
""")

# Display the results
print("Sample Agreements:")
agreement_df.show(truncate=False)

# Join invoices and time entries (adjust column names to match your table schema)
joined_df = spark.sql(f"""
SELECT
    i.invoice_number,
    i.date as invoice_date,
    i.total as invoice_total,
    t.time_entry_id,
    t.hours_worked,
    t.work_date,
    t.work_role_id,
    t.work_type
FROM {invoice_table} i
JOIN {time_table} t ON i.invoice_number = t.invoice_number
ORDER BY i.date DESC
LIMIT 10
""")

# Display the results
print("Sample Joined Data:")
joined_df.show(truncate=False)
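
The two queries above use different naming styles (`agreementType` in one, `invoice_number` in the other), so it is worth checking the column names against the real tables before running them. The following is a minimal sketch with a hypothetical helper, `missing_columns`, that compares a list of required names against the available ones. The comparison ignores case, matching Spark SQL's default case-insensitive column resolution.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required column names absent from `available` (case-insensitive)."""
    have = {name.lower() for name in available}
    return [name for name in required if name.lower() not in have]


# Hypothetical column list; in the notebook it could come from
# spark.table(agreement_table).columns
available = ["id", "name", "startDate", "endDate"]
print(missing_columns(available, ["id", "name", "agreementType", "startDate"]))
```

An empty list means every referenced column exists; anything else names the columns to fix in the query.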

## Next Steps

Now that the data is loaded into OneLake Delta tables, you can:

1. Create Power BI reports directly from the tables
2. Set up scheduled refreshes using Fabric Data Pipelines
3. Create derived tables with additional business logic
4. Export the data to other systems as needed

Because the ETL process is modular, you can rerun or extend it one entity at a time, as the `process_entities` example above shows.