# Fabric Paginated Report Batch Executor

A production-ready framework for executing Microsoft Fabric paginated reports with parameter looping, supporting **four flexible parameter sources**, OneLake storage, and full pipeline integration.

---

## Table of Contents

1. [Features](#features)
2. [Quick Start](#quick-start)
3. [Architecture](#architecture)
4. [Parameter Sources](#parameter-sources)
5. [Configuration Reference](#configuration-reference)
6. [Usage Examples](#usage-examples)
7. [Advanced Features](#advanced-features)
8. [Troubleshooting](#troubleshooting)
9. [Best Practices](#best-practices)

---

## Features

### Core Capabilities
- **Four Flexible Parameter Sources**
  - **Semantic Model** (Power BI Dataset) - Query data with DAX, RLS enforced
  - **Lakehouse** (Delta Lake) - Native Spark SQL integration, best performance
  - **JSON Array** - Simple direct input for testing
  - **Warehouse** (Fabric Data Warehouse) - T-SQL queries for enterprise scenarios

- **Output Options**
  - OneLake storage with date-based folder hierarchy (`Files/reports/archive/YYYY/MM/DD/`)
  - Multiple export formats: PDF, XLSX, DOCX, PPTX, PNG
  - Flexible filename formatting with template support
  - Custom folder structure with template placeholders

- **Enterprise-Ready**
  - Retry logic with exponential backoff (3 attempts: 30s, 60s, 120s)
  - Automatic token refresh for long-running batches (handles 1-hour expiration)
  - Continue on failure (processes all parameters even if some fail)
  - Input validation and SQL injection protection
  - Comprehensive error handling and logging

- **Pipeline Integration**
  - Returns JSON with file list for downstream processing
  - Configurable performance tuning parameters
  - Detailed execution summary with success/failure counts
  - Pipeline ForEach items reference: `@json(activity('NotebookActivity').output.status.Output.result.exitValue).files`

---

## Quick Start

### 1. Choose Your Parameter Source

| Source | Best For | Setup Effort |
|--------|----------|--------------|
| **Semantic Model** | Report-driven parameters, business users | Low |
| **Lakehouse** | Large lists (1000+), data engineers | Low |
| **JSON** | Testing, static lists | Minimal |
| **Warehouse** | Enterprise, complex SQL, RLS | Medium |

### 2. Set Up Parameter Source

**Option A: Semantic Model** (Recommended for business users)
```json
{
  "report_partitioning_source": "semantic_model",
  "semantic_model_workspace_id": "workspace-guid",
  "semantic_model_dataset_id": "dataset-guid",
  "semantic_model_dax_query": "EVALUATE CALCULATETABLE(DISTINCT('DimCustomer'[CustomerName]), 'DimCustomer'[IsActive] = TRUE())"
}
```

**Option B: Lakehouse** (Recommended for data engineers)
```sql
-- Create table in Lakehouse
CREATE TABLE parameter_config (
    Category STRING,
    ParameterValue STRING,
    IsActive BOOLEAN,
    SortOrder INT
) USING DELTA;

-- Insert data
INSERT INTO parameter_config VALUES
    ('MonthlyReportCustomers', 'Acme Corp', true, 1),
    ('MonthlyReportCustomers', 'TechStart Inc', true, 2);
```
```json
{
  "report_partitioning_source": "lakehouse",
  "lakehouse_table": "parameter_config",
  "lakehouse_category": "MonthlyReportCustomers",
  "lakehouse_column": "ParameterValue"
}
```

**Option C: JSON** (Recommended for testing)
```json
{
  "report_partitioning_source": "json",
  "report_partitioning_values": "[\"Acme Corp\", \"TechStart Inc\", \"Global Solutions\"]"
}
```

**Option D: Warehouse** (Recommended for enterprise)
```sql
-- Create table in Warehouse
CREATE TABLE dbo.ParameterConfig (
    Category NVARCHAR(100),
    ParameterValue NVARCHAR(500),
    IsActive BIT,
    SortOrder INT
);

-- Insert data
INSERT INTO dbo.ParameterConfig VALUES
    ('MonthlyReportCustomers', 'Acme Corp', 1, 1);
```
```json
{
  "report_partitioning_source": "warehouse",
  "warehouse_name": "EnterpriseWarehouse",
  "warehouse_table": "dbo.ParameterConfig",
  "warehouse_column": "ParameterValue",
  "warehouse_category": "MonthlyReportCustomers"
}
```

### 3. Create Pipeline

1. Go to **Pipelines** in Fabric workspace
2. Create new pipeline → Add **Notebook activity**
3. Select this notebook
4. Configure parameters (see [Configuration Reference](#configuration-reference))
5. Add triggers (Daily, Weekly, Monthly)
6. Run manually to test
7. Verify files in OneLake: `Files/reports/archive/YYYY/MM/DD/`

---

## Architecture

### Component Structure

```
Pipeline (Scheduled/Manual Trigger)
  │
  ├─ Parameters: workspace_id, report_id, output_format, static_params, etc.
  │
  ▼
Fabric Notebook: Report Batch Executor
  │
  ├─ Cell 1: Parameter Definitions (overridden by pipeline)
  ├─ Cell 2: Configuration Building & OOP Implementation
  │   │
  │   ├─ InputValidator: Validate and sanitize inputs (GUID, SQL, filenames)
  │   ├─ FilenameFormatter: Template-based filename generation with date formatting
  │   ├─ TokenManager: Auto-refresh Power BI tokens (45-min interval)
  │   ├─ ParameterLoader: Load values from 4 sources with retry logic
  │   ├─ PowerBIAPIClient: REST API interactions (export, poll, download)
  │   ├─ OneLakeStorage: File operations with date-based hierarchy
  │   └─ PaginatedReportExecutor: Main orchestrator
  │
  └─ Execution Flow:
      1. Load partitioning parameter values from configured source
      2. FOR EACH parameter value:
         a. Merge static parameters with current value
         b. Initiate report export via Power BI REST API
         c. Poll export status until completion
         d. Download report file
         e. Save to OneLake with formatted filename
         f. Retry up to 3 times on failure
      3. Generate execution summary
      4. Return JSON result to pipeline
```
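
Steps 2b and 2c of the flow can be sketched as follows. This is a minimal illustration, not the notebook's actual `PowerBIAPIClient`: the function names are hypothetical, the request body follows the shape used by the Power BI "Export To File" REST API for paginated reports, and the status callback, sleep function, and clock are injected so the loop can be exercised without network access.

```python
import time
from typing import Callable, Dict, List


def build_export_body(output_format: str, params: Dict[str, str]) -> dict:
    """Build the JSON body for an export request (hypothetical helper)."""
    values: List[dict] = [{"name": k, "value": str(v)} for k, v in params.items()]
    return {
        "format": output_format,
        "paginatedReportConfiguration": {"parameterValues": values},
    }


def poll_until_done(
    get_status: Callable[[], str],
    timeout_seconds: float = 600,
    poll_interval_seconds: float = 5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it reports a terminal state or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in ("Succeeded", "Failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Export timeout after {timeout_seconds} seconds")
        sleep(poll_interval_seconds)
```

In a real run, `get_status` would wrap the HTTP call that reads the export status, and the defaults mirror `export_timeout_seconds` and `poll_interval_seconds`.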

### Execution Flow Details

1. **Parameter Loading**
   - Query configured source (Semantic Model, Lakehouse, JSON, or Warehouse)
   - Deduplicate values while preserving order
   - Validate non-empty result
   - Warn if large batch (100+ values)

2. **Report Generation Loop**
   - Merge static params + current partitioning value
   - Ensure token validity (auto-refresh if needed)
   - Call Power BI REST API to initiate export
   - Poll status every 5 seconds (configurable)
   - Download binary file when ready
   - Continue to next value on failure

3. **File Storage**
   - Generate filename using template formatter
   - Apply date formatting to placeholders
   - Sanitize for filesystem compatibility
   - Save to OneLake with date-based folder structure
   - Log file path and size

4. **Result Summary**
   - Count successes and failures
   - Calculate total size and average duration
   - List all generated file paths
   - Return JSON for pipeline consumption
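
The loading and summary steps above can be sketched in a few lines. This is an illustration under assumptions, not the notebook's code: `run_batch` and `export_one` are hypothetical names, and the summary keys other than `files` (which the pipeline expression in the Features section reads) are made up for the example.

```python
import json
from typing import Callable, Iterable, List


def dedupe_preserving_order(values: Iterable[str]) -> List[str]:
    """Drop duplicates while keeping the first occurrence of each value."""
    return list(dict.fromkeys(values))


def run_batch(values: Iterable[str], export_one: Callable[[str], str]) -> str:
    """Export one report per value, continue on failure, return a JSON summary."""
    files, failures = [], []
    for value in dedupe_preserving_order(values):
        try:
            files.append(export_one(value))  # export_one returns the saved file path
        except Exception as exc:
            # Record the failure and move on to the next value.
            failures.append({"value": value, "error": str(exc)[:200]})
    summary = {
        "total": len(files) + len(failures),
        "succeeded": len(files),
        "failed": len(failures),
        "files": files,
        "failures": failures,
    }
    return json.dumps(summary)
```

The returned string is what the notebook would hand back as its exit value for the pipeline to parse.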

---

## Parameter Sources

### 1. Semantic Model (Recommended for Business Users)

Query data from Power BI semantic models using DAX.

**Advantages:**
- Uses trusted data from existing reports
- Row-Level Security (RLS) automatically enforced
- Includes business logic and calculations
- Fast with semantic layer caching
- Business users can maintain in Power BI

**Example DAX Queries:**

```dax
-- Get all active customers
EVALUATE CALCULATETABLE(
    DISTINCT('DimCustomer'[CustomerName]),
    'DimCustomer'[IsActive] = TRUE()
)

-- Get top 10 customers by sales
EVALUATE TOPN(
    10,
    SUMMARIZECOLUMNS(
        'DimCustomer'[CustomerName],
        "TotalSales", SUM('FactSales'[SalesAmount])
    ),
    [TotalSales], DESC
)

-- Get customers with sales in last 12 months
-- (grouping FactSales keeps only customers that have fact rows in the period)
EVALUATE CALCULATETABLE(
    SUMMARIZE('FactSales', 'DimCustomer'[CustomerName]),
    DATESINPERIOD('DimDate'[Date], TODAY(), -12, MONTH)
)

-- Multiple parameters (Region × Category combinations)
EVALUATE
SUMMARIZECOLUMNS(
    'DimGeography'[Region],
    'DimProduct'[Category]
)
```

**Configuration:**
```json
{
  "report_partitioning_source": "semantic_model",
  "semantic_model_workspace_id": "12345678-1234-1234-1234-123456789abc",
  "semantic_model_dataset_id": "87654321-4321-4321-4321-210987654321",
  "semantic_model_dax_query": "EVALUATE DISTINCT('DimCustomer'[CustomerName])"
}
```
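
One way to run such a query from Python is the Power BI REST API's `executeQueries` endpoint. The sketch below only builds the request and parses a response, so it runs without credentials. The helper names are hypothetical, and taking the first column of each row as the parameter value is an assumption about how the loader behaves.

```python
from typing import List, Tuple


def build_execute_queries_request(
    workspace_id: str, dataset_id: str, dax_query: str
) -> Tuple[str, dict]:
    """Return the URL and JSON body for a DAX executeQueries call."""
    url = (
        "https://api.powerbi.com/v1.0/myorg/groups/"
        f"{workspace_id}/datasets/{dataset_id}/executeQueries"
    )
    body = {
        "queries": [{"query": dax_query}],
        "serializerSettings": {"includeNulls": True},
    }
    return url, body


def first_column_values(response_json: dict) -> List[str]:
    """Extract the first column of the first result table, skipping nulls."""
    rows = response_json["results"][0]["tables"][0]["rows"]
    values = []
    for row in rows:
        if row:
            first_value = next(iter(row.values()))
            if first_value is not None:
                values.append(str(first_value))
    return values
```

Row keys in the response look like `DimCustomer[CustomerName]`, which is why the sketch reads values by position rather than by name.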

---

### 2. Lakehouse (Recommended for Data Engineers)

Query Delta tables in Lakehouse using Spark SQL.

**Advantages:**
- Native Spark SQL (no connection strings needed)
- Best performance for large lists (1000+ values)
- Easy maintenance via Lakehouse UI or notebooks
- ACID compliance and time travel
- Zero authentication overhead
- Highest throughput

**Table Schema:**
```sql
CREATE TABLE parameter_config (
    Category STRING,          -- Report category/group
    ParameterName STRING,     -- Parameter display name
    ParameterValue STRING,    -- Actual value to pass to report
    IsActive BOOLEAN,         -- Enable/disable without deleting
    SortOrder INT,           -- Control execution order
    ValidFrom TIMESTAMP,     -- Optional: date range filtering
    ValidTo TIMESTAMP,       -- Optional: date range filtering
    Notes STRING             -- Optional: documentation
) USING DELTA;

-- Create sample data
INSERT INTO parameter_config VALUES
    ('MonthlyReportCustomers', 'Customer A', 'Acme Corp', true, 1, CURRENT_TIMESTAMP(), NULL, 'Active customer'),
    ('MonthlyReportCustomers', 'Customer B', 'TechStart Inc', true, 2, CURRENT_TIMESTAMP(), NULL, 'Active customer'),
    ('QuarterlyRegions', 'North America', 'North America', true, 1, CURRENT_TIMESTAMP(), NULL, NULL);
```

**Configuration** (`lakehouse_filter` is optional and adds an extra WHERE condition):
```json
{
  "report_partitioning_source": "lakehouse",
  "lakehouse_table": "parameter_config",
  "lakehouse_category": "MonthlyReportCustomers",
  "lakehouse_column": "ParameterValue",
  "lakehouse_filter": ""
}
```

**Advanced Filtering:**
```json
{
  "lakehouse_filter": "ValidFrom <= CURRENT_TIMESTAMP() AND (ValidTo IS NULL OR ValidTo >= CURRENT_TIMESTAMP())"
}
```
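
The configuration keys could be turned into a Spark SQL query roughly like this. It is a sketch under assumptions: `build_lakehouse_query` is a hypothetical helper, the `IsActive` and `SortOrder` handling is inferred from the table schema above, and the named parameter marker (`:category`) assumes a Spark 3.4+ runtime. The `lakehouse_filter` text is inserted verbatim, so it must come from a trusted source.

```python
import re
from typing import Dict, Tuple

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_lakehouse_query(
    table: str, column: str, category: str, extra_filter: str = ""
) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized query and its arguments for the parameter table."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid SQL identifier: {name!r}")
    conditions = ["Category = :category", "IsActive = true"]
    if extra_filter.strip():
        # Optional extra WHERE condition, wrapped so it cannot change precedence.
        conditions.append(f"({extra_filter.strip()})")
    where_clause = " AND ".join(conditions)
    query = f"SELECT {column} FROM {table} WHERE {where_clause} ORDER BY SortOrder"
    return query, {"category": category}
```

In a notebook the result would be passed along as `spark.sql(query, args=args)`, which keeps the category value out of the SQL text.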

---

### 3. JSON Array (Recommended for Testing)

Provide parameters directly as JSON array.

**Advantages:**
- Simplest setup (no infrastructure)
- Perfect for testing and development
- Good for static, small lists
- Version control friendly
- Immediate changes without database updates

**Configuration:**
```json
{
  "report_partitioning_source": "json",
  "report_partitioning_values": "[\"Acme Corp\", \"TechStart Inc\", \"Global Solutions\"]"
}
```
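
Because `report_partitioning_values` arrives as a string, it has to be parsed and checked before use. A minimal sketch, with a hypothetical function name and error messages:

```python
import json
from typing import List


def parse_partitioning_values(raw: str) -> List[str]:
    """Parse the JSON array string and reject anything that is not a non-empty list."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"report_partitioning_values is not valid JSON: {exc}") from exc
    if not isinstance(parsed, list) or not parsed:
        raise ValueError("report_partitioning_values must be a non-empty JSON array")
    return [str(item) for item in parsed]
```

Converting every item to a string is a choice made for the sketch, so that numeric values such as `[2023, 2024]` are passed to the report as text.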

**Use Cases:**
- Testing and proof of concept
- One-time or ad-hoc reports
- Very small lists (< 10 items)
- Quick validation of report parameters

---

### 4. Warehouse (Recommended for Enterprise)

Query Warehouse tables using T-SQL.

**Advantages:**
- Familiar T-SQL syntax
- Complex SQL logic (joins, CTEs, window functions)
- Row-Level Security (RLS)
- Column-Level Security (CLS)
- Integration with existing SQL Server workflows
- Stored procedures support

**Table Schema:**
```sql
CREATE TABLE dbo.ParameterConfig (
    Category NVARCHAR(100),
    ParameterName NVARCHAR(100),
    ParameterValue NVARCHAR(500),
    IsActive BIT DEFAULT 1,
    SortOrder INT,
    CreatedDate DATETIME DEFAULT GETDATE(),
    ModifiedDate DATETIME DEFAULT GETDATE()
);

-- Insert sample data
INSERT INTO dbo.ParameterConfig (Category, ParameterName, ParameterValue, IsActive, SortOrder)
VALUES
    ('MonthlyReportCustomers', 'Customer A', 'Acme Corp', 1, 1),
    ('MonthlyReportCustomers', 'Customer B', 'TechStart Inc', 1, 2);
```

**Configuration:**
```json
{
  "report_partitioning_source": "warehouse",
  "warehouse_name": "EnterpriseWarehouse",
  "warehouse_table": "dbo.ParameterConfig",
  "warehouse_column": "ParameterValue",
  "warehouse_category": "MonthlyReportCustomers"
}
```

**Row-Level Security Example:**
```sql
-- Create RLS function
CREATE FUNCTION dbo.fn_CustomerSecurityPredicate(@AssignedTo NVARCHAR(100))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS Result
WHERE @AssignedTo = USER_NAME() OR USER_NAME() IN (SELECT UserName FROM dbo.Admins);

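-- Apply security policy (the target table must have an AssignedTo column,
-- such as dbo.CustomerReporting in Usage Example 4)
CREATE SECURITY POLICY dbo.CustomerReportingPolicy
ADD FILTER PREDICATE dbo.fn_CustomerSecurityPredicate(AssignedTo)
ON dbo.CustomerReporting WITH (STATE = ON);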
```

---

## Configuration Reference

### Required Parameters

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `workspace_id` | GUID | Fabric workspace GUID | `"12345678-1234-..."` |
| `report_id` | GUID | Paginated report GUID | `"87654321-4321-..."` |
| `output_format` | String | Export format | `"PDF"`, `"XLSX"`, `"DOCX"`, `"PPTX"`, `"PNG"` |
| `static_params` | JSON | Fixed parameters for all reports | `"{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-12-31\"}"` |
| `report_partitioning_column` | String | Parameter name to loop through | `"Customer"`, `"Region"`, `"Department"` |
| `report_partitioning_source` | String | Parameter source type | `"semantic_model"`, `"lakehouse"`, `"json"`, `"warehouse"` |

### Filename Template Parameters (NEW)

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `report_name` | String | Filename template with placeholders | `"Report_<report_partitioning_column>_<report_partitioning_value>_<timestamp:yyyymmdd_hhMMss>"` |
| `lakehouse_folder` | String | Custom folder path template (optional) | `"reports/<report_partitioning_column>/<timestamp:yyyy>/<timestamp:mm>"` |

**Template Placeholders:**
- `<report_partitioning_column>` - Parameter name (e.g., "Customer")
- `<report_partitioning_value>` - Current value (e.g., "Acme Corp")
- `<timestamp>` or `<timestamp:format>` - Current timestamp
- `<param_name>` - Any parameter from `static_params`
- `<param_name:format>` - Date parameter with custom format

**Format Patterns:**
- `yyyy` - 4-digit year (2025)
- `yy` - 2-digit year (25)
- `mm` - 2-digit month (01-12)
- `dd` - 2-digit day (01-31)
- `hh` - 2-digit hour (00-23)
- `MM` - 2-digit minute (00-59)
- `ss` - 2-digit second (00-59)
- `ww` - ISO week number (01-53)

**Examples:**
```python
# Example 1: Simple filename
report_name = "Report_<report_partitioning_value>_<timestamp:yyyymmdd>"
# Output: Report_AcmeCorp_20250130.pdf

# Example 2: Include parameter name
report_name = "<report_partitioning_column>_<report_partitioning_value>_<timestamp:yyyy_mm_dd_hhMMss>"
# Output: Customer_AcmeCorp_2025_01_30_143022.pdf

# Example 3: Include static parameters (date range)
report_name = "SalesReport_<start_date:yyyymmdd>_to_<end_date:yyyymmdd>_<report_partitioning_value>"
# Output: SalesReport_20240101_to_20241231_AcmeCorp.pdf

# Example 4: Custom folder structure
lakehouse_folder = "reports/<report_partitioning_column>/<timestamp:yyyy>/<timestamp:mm>"
# Output path: Files/reports/Customer/2025/01/Report_AcmeCorp.pdf

# Example 5: Organize by parameter value
lakehouse_folder = "archive/<report_partitioning_value>/<timestamp:yyyy_mm>"
# Output path: Files/archive/AcmeCorp/2025_01/Report_AcmeCorp.pdf
```
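
To make the placeholder rules concrete, here is a small sketch of how such templates could be rendered. It is not the notebook's `FilenameFormatter`: the function names, the default timestamp pattern, and the use of ISO-formatted date strings for static parameters are assumptions. The values are expected to be already sanitized.

```python
import re
from datetime import datetime
from typing import Dict

_TOKEN = re.compile(r"yyyy|yy|mm|dd|hh|MM|ss|ww")
_PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)(?::([^<>]+))?>")


def format_date(value: datetime, pattern: str) -> str:
    """Apply the format patterns listed above (mm = month, MM = minute)."""
    replacements = {
        "yyyy": f"{value.year:04d}",
        "yy": f"{value.year % 100:02d}",
        "mm": f"{value.month:02d}",
        "dd": f"{value.day:02d}",
        "hh": f"{value.hour:02d}",
        "MM": f"{value.minute:02d}",
        "ss": f"{value.second:02d}",
        "ww": f"{value.isocalendar()[1]:02d}",
    }
    return _TOKEN.sub(lambda m: replacements[m.group(0)], pattern)


def render_template(template: str, values: Dict[str, str], now: datetime) -> str:
    """Replace <name> and <name:format> placeholders in a template."""

    def resolve(match: "re.Match[str]") -> str:
        name, pattern = match.group(1), match.group(2)
        if name == "timestamp":
            # Default pattern is an assumption made for this sketch.
            return format_date(now, pattern or "yyyymmdd_hhMMss")
        if name not in values:
            raise KeyError(f"Unknown placeholder: <{name}>")
        raw = str(values[name])
        if pattern:
            return format_date(datetime.fromisoformat(raw), pattern)
        return raw

    return _PLACEHOLDER.sub(resolve, template)
```

Passing `now` explicitly keeps every file in a batch on the same timestamp and makes the function easy to test.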

### Source-Specific Parameters

**Semantic Model:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `semantic_model_workspace_id` | GUID | Workspace containing semantic model |
| `semantic_model_dataset_id` | GUID | Semantic model (dataset) GUID |
| `semantic_model_dax_query` | DAX | DAX query to get parameter values |

**Lakehouse:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `lakehouse_table` | String | Delta table name |
| `lakehouse_category` | String | Category filter value |
| `lakehouse_column` | String | Column containing parameter values |
| `lakehouse_filter` | String | Optional: additional WHERE clause |

**JSON:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `report_partitioning_values` | JSON Array | JSON array of parameter values |

**Warehouse:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `warehouse_name` | String | Warehouse name |
| `warehouse_table` | String | Table name (e.g., "dbo.ParameterConfig") |
| `warehouse_column` | String | Column containing parameter values |
| `warehouse_category` | String | Category filter value |

### Execution Options

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `archive_to_onelake` | String | `"true"` | Save files to OneLake |
| `max_retries` | String | `"3"` | Retry attempts per report |
| `export_timeout_seconds` | String | `"600"` | Max seconds to wait for export (10 min) |
| `poll_interval_seconds` | String | `"5"` | Seconds between status polls |
| `retry_backoff_base` | String | `"30"` | Base seconds for exponential backoff (30, 60, 120...) |

### Performance Tuning Parameters

| Parameter | Type | Default | Range | Description |
|-----------|------|---------|-------|-------------|
| `download_chunk_size_mb` | String | `"1"` | 1-100 | Download chunk size in MB |
| `file_size_warning_mb` | String | `"500"` | 10-5000 | File size threshold for warnings |
| `connection_timeout_seconds` | String | `"30"` | 5-300 | API connection timeout |
| `download_timeout_seconds` | String | `"120"` | 30-600 | File download timeout |
| `param_loader_retry_attempts` | String | `"3"` | 1-10 | Parameter loading retry count |
| `param_loader_retry_delay_seconds` | String | `"5"` | 1-60 | Delay between parameter loading retries |
| `token_refresh_interval_minutes` | String | `"45"` | 5-55 | Auto token refresh interval (must be < 60) |
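
All of these values arrive as strings, so each one has to be converted and range-checked before use. A minimal sketch with a hypothetical helper name. Whether the notebook rejects or clamps out-of-range values is not stated here, so this version rejects them.

```python
from typing import Optional


def parse_int_param(
    name: str, raw: Optional[str], default: int, minimum: int, maximum: int
) -> int:
    """Convert a string pipeline parameter to an int within [minimum, maximum]."""
    text = (raw or "").strip()
    if not text:
        return default  # empty or missing value falls back to the default
    try:
        value = int(text)
    except ValueError as exc:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from exc
    if not minimum <= value <= maximum:
        raise ValueError(
            f"{name}={value} is outside the allowed range {minimum}-{maximum}"
        )
    return value
```

For example, `parse_int_param("token_refresh_interval_minutes", "45", 45, 5, 55)` returns `45`, while `"60"` raises a `ValueError`.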

**When to Adjust Performance Parameters:**

**Large Files (> 100MB):**
```json
{
  "download_chunk_size_mb": "5",
  "file_size_warning_mb": "1000",
  "download_timeout_seconds": "300"
}
```

**Slow Network:**
```json
{
  "connection_timeout_seconds": "60",
  "download_timeout_seconds": "240"
}
```

**Long-Running Batches (1000+ reports):**
```json
{
  "token_refresh_interval_minutes": "30"
}
```

**Unreliable Data Source:**
```json
{
  "param_loader_retry_attempts": "5",
  "param_loader_retry_delay_seconds": "10"
}
```

---

## Usage Examples

### Example 1: Monthly Customer Reports (Semantic Model)

**Scenario:** Generate monthly sales report for each active customer using data from Power BI semantic model.

```json
{
  "workspace_id": "12345678-1234-1234-1234-123456789abc",
  "report_id": "87654321-4321-4321-4321-210987654321",
  "output_format": "PDF",
  "static_params": "{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-12-31\"}",
  "report_partitioning_column": "Customer",
  "report_name": "MonthlySales_<report_partitioning_value>_<start_date:yyyymm>",
  "lakehouse_folder": "",
  
  "report_partitioning_source": "semantic_model",
  "semantic_model_workspace_id": "12345678-1234-1234-1234-123456789abc",
  "semantic_model_dataset_id": "dataset-guid",
  "semantic_model_dax_query": "EVALUATE CALCULATETABLE(DISTINCT('DimCustomer'[CustomerName]), 'DimCustomer'[IsActive] = TRUE())",
  
  "archive_to_onelake": "true",
  "max_retries": "3"
}
```

**Result:** PDF reports saved to `Files/reports/archive/2025/01/30/MonthlySales_AcmeCorp_202401.pdf`

---

### Example 2: Regional Reports (Lakehouse)

**Scenario:** Generate quarterly report for each region using parameter list from Lakehouse.

**Setup Lakehouse:**
```sql
INSERT INTO parameter_config (Category, ParameterValue, IsActive, SortOrder)
VALUES
    ('QuarterlyRegions', 'North America', true, 1),
    ('QuarterlyRegions', 'Europe', true, 2),
    ('QuarterlyRegions', 'Asia Pacific', true, 3),
    ('QuarterlyRegions', 'Latin America', true, 4);
```

**Pipeline Configuration:**
```json
{
  "workspace_id": "workspace-guid",
  "report_id": "report-guid",
  "output_format": "XLSX",
  "static_params": "{\"quarter\": \"Q1\", \"year\": \"2024\"}",
  "report_partitioning_column": "Region",
  "report_name": "QuarterlyReport_<year>_<quarter>_<report_partitioning_value>",
  "lakehouse_folder": "reports/<year>/<quarter>",
  
  "report_partitioning_source": "lakehouse",
  "lakehouse_table": "parameter_config",
  "lakehouse_category": "QuarterlyRegions",
  "lakehouse_column": "ParameterValue",
  "lakehouse_filter": ""
}
```

**Result:** Excel reports saved to `Files/reports/2024/Q1/QuarterlyReport_2024_Q1_NorthAmerica.xlsx`

---

### Example 3: Testing with JSON

**Scenario:** Test report generation for 3 specific customers before rolling out to all.

```json
{
  "workspace_id": "workspace-guid",
  "report_id": "report-guid",
  "output_format": "PDF",
  "static_params": "{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-01-31\"}",
  "report_partitioning_column": "Customer",
  "report_name": "TestReport_<report_partitioning_value>_<timestamp:yyyymmdd>",
  
  "report_partitioning_source": "json",
  "report_partitioning_values": "[\"Test Customer A\", \"Test Customer B\", \"Test Customer C\"]"
}
```

**Result:** 3 test reports generated quickly without setting up database tables.

---

### Example 4: Enterprise Warehouse with RLS

**Scenario:** Generate reports for customers visible to current user based on Row-Level Security.

**Warehouse Setup:**
```sql
-- Table with RLS
CREATE TABLE dbo.CustomerReporting (
    CustomerName NVARCHAR(200),
    AssignedTo NVARCHAR(100),
    IsActive BIT
);

-- RLS policy
CREATE FUNCTION dbo.fn_CustomerSecurityPredicate(@AssignedTo NVARCHAR(100))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS Result
WHERE @AssignedTo = USER_NAME() OR USER_NAME() IN (SELECT UserName FROM dbo.Admins);

CREATE SECURITY POLICY dbo.CustomerReportingPolicy
ADD FILTER PREDICATE dbo.fn_CustomerSecurityPredicate(AssignedTo)
ON dbo.CustomerReporting WITH (STATE = ON);
```

**Pipeline Configuration:**
```json
{
  "report_partitioning_source": "warehouse",
  "warehouse_name": "EnterpriseWarehouse",
  "warehouse_table": "dbo.CustomerReporting",
  "warehouse_column": "CustomerName",
  "warehouse_category": "",
  "report_name": "<report_partitioning_value>_Report_<timestamp:yyyy_mm_dd>"
}
```

**Result:** Each user only generates reports for their assigned customers (RLS enforced automatically).

---

## Advanced Features

### 1. Multiple Parameter Combinations

For reports requiring multiple parameters, use one of these approaches:

**Approach A: Semantic Model with SUMMARIZECOLUMNS**
```dax
-- Returns all combinations of Region × Category
EVALUATE
SUMMARIZECOLUMNS(
    'DimGeography'[Region],
    'DimProduct'[Category]
)
```
Result: One row per combination. Notebook loops through each row.

**Approach B: Pre-compute combinations in Lakehouse**
```sql
INSERT INTO parameter_config (Category, ParameterValue, IsActive, SortOrder)
SELECT
    'RegionCategoryCombo',
    CONCAT(Region, '|', Category) AS ParameterValue,
    true,
    ROW_NUMBER() OVER (ORDER BY Region, Category)
FROM region_category_combinations;
```
Then parse the combined value in your paginated report.

---

### 2. Conditional Parameter Lists

Filter parameters based on date, status, or other business logic:

**Semantic Model (DAX):**
```dax
-- Active customers with sales > $10K in last 6 months
EVALUATE
FILTER(
    CALCULATETABLE(
        DISTINCT('DimCustomer'[CustomerName]),
        'DimCustomer'[IsActive] = TRUE()
    ),
    CALCULATE(
        SUM('FactSales'[SalesAmount]),
        DATESINPERIOD('DimDate'[Date], TODAY(), -6, MONTH)
    ) > 10000
)
```

**Lakehouse (SQL):**
```json
{
  "lakehouse_filter": "ValidFrom <= CURRENT_TIMESTAMP() AND (ValidTo IS NULL OR ValidTo >= CURRENT_TIMESTAMP())"
}
```

---

### 3. Dynamic File Naming

The `report_name` parameter supports template placeholders for flexible naming:

```python
# Basic template
report_name = "Report_<report_partitioning_value>_<timestamp:yyyymmdd>"
# Output: Report_AcmeCorp_20250130.pdf

# With parameter name
report_name = "<report_partitioning_column>_<report_partitioning_value>_<timestamp:yyyy_mm_dd_hhMMss>"
# Output: Customer_AcmeCorp_2025_01_30_143022.pdf

# Include static parameters (automatically detects dates)
static_params = "{\"start_date\": \"2024-01-01\", \"end_date\": \"2024-12-31\"}"
report_name = "SalesReport_<start_date:yyyymmdd>_to_<end_date:yyyymmdd>_<report_partitioning_value>"
# Output: SalesReport_20240101_to_20241231_AcmeCorp.pdf
```

**Character Sanitization:**
- Special characters are removed or replaced with underscores
- Spaces become underscores
- Unicode characters are normalized to ASCII
- Length is enforced (max 255 characters)
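
The rules above can be sketched as a single function. This is an illustration that follows the bullets literally, not the notebook's implementation. The set of allowed characters and the `report` fallback for an empty result are assumptions.

```python
import re
import unicodedata


def sanitize_filename(name: str, max_length: int = 255) -> str:
    """Normalize to ASCII, replace unsafe characters, and cap the length."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Spaces (and other whitespace) become underscores.
    ascii_name = re.sub(r"\s+", "_", ascii_name.strip())
    # Everything outside a conservative safe set becomes an underscore.
    ascii_name = re.sub(r"[^A-Za-z0-9._-]", "_", ascii_name)
    # Collapse runs of underscores and trim leading/trailing dots and underscores,
    # which also neutralizes "../" style path traversal.
    ascii_name = re.sub(r"_+", "_", ascii_name).strip("._")
    return ascii_name[:max_length] or "report"
```

Trimming leading dots together with replacing `/` means a value such as `../../etc/passwd` collapses to a plain name instead of escaping the target folder.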

---

### 4. Custom Folder Structure

The `lakehouse_folder` parameter supports template placeholders for organizing files:

```python
# Organize by year and month
lakehouse_folder = "reports/<timestamp:yyyy>/<timestamp:mm>"
# Output: Files/reports/2025/01/Report.pdf

# Organize by parameter name
lakehouse_folder = "reports/<report_partitioning_column>"
# Output: Files/reports/Customer/Report.pdf

# Organize by static parameter and date
lakehouse_folder = "archive/<Region>/<timestamp:yyyy>/<timestamp:mm>"
# Output: Files/archive/EMEA/2025/01/Report.pdf

# Leave empty for default date-based structure
lakehouse_folder = ""
# Output: Files/reports/archive/2025/01/30/Report.pdf
```
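
The fallback to the default date-based folder could look like the sketch below. The function names are hypothetical, and the check that rejects `.` and `..` segments is an added safety assumption rather than documented behavior.

```python
from datetime import date


def default_archive_folder(run_date: date) -> str:
    """Default layout: Files/reports/archive/YYYY/MM/DD."""
    return f"Files/reports/archive/{run_date:%Y/%m/%d}"


def resolve_folder(rendered_folder: str, run_date: date) -> str:
    """Use the rendered lakehouse_folder if given, else the default layout."""
    cleaned = rendered_folder.strip().strip("/")
    if not cleaned:
        return default_archive_folder(run_date)
    if any(part in ("", ".", "..") for part in cleaned.split("/")):
        raise ValueError(f"Unsafe folder path: {rendered_folder!r}")
    return f"Files/{cleaned}"
```

Here `rendered_folder` is the `lakehouse_folder` template after its placeholders have been replaced.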

---

### 5. Error Handling and Retry

**Retry Logic:**
- 3 retry attempts per report (configurable via `max_retries`)
- Exponential backoff delays: 30s, 60s, 120s (configurable via `retry_backoff_base`)
- Token automatically refreshed before each attempt
- Continues to next parameter on final failure

**Error Categories:**
1. **Power BI API Errors:** Authentication, report not found, invalid parameters
2. **Export Failures:** Timeout, data source unavailable, memory issues
3. **Storage Errors:** OneLake write failures, permission issues
4. **Parameter Loading Errors:** Source unavailable, query errors

All errors are logged with detailed context for troubleshooting.
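
The retry schedule can be sketched as follows. This assumes one initial attempt followed by up to `max_retries` retries, with a delay of `retry_backoff_base * 2**n` seconds before retry `n` (30s, 60s, 120s with the defaults). The function names are hypothetical, and the sleep function is injected so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(retry_index: int, base_seconds: float = 30) -> float:
    """Delay before retry number retry_index (0-based): base, 2*base, 4*base, ..."""
    return base_seconds * (2 ** retry_index)


def with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_seconds: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on any exception with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: let the caller record the failure
            sleep(backoff_delay(attempt, base_seconds))
            attempt += 1
```

The caller catches the final exception, logs it, and continues with the next parameter value.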

---

### 6. Token Management

**Automatic Token Refresh:**
- Power BI tokens expire after 60 minutes
- TokenManager tracks token age
- Auto-refreshes every 45 minutes by default (configurable)
- Prevents mid-batch authentication failures
- Logged with timestamp for audit trail

**For Long-Running Batches:**
```json
{
  "token_refresh_interval_minutes": "30"
}
```
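
The age-based refresh described above can be sketched like this. It is an illustration, not the notebook's `TokenManager`: the constructor arguments are assumptions, and `fetch_token` stands in for whatever token helper the runtime provides.

```python
import time
from typing import Callable, Optional


class TokenManager:
    """Cache a token and fetch a new one once it is older than the interval."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        refresh_interval_minutes: float = 45,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._refresh_after = refresh_interval_minutes * 60
        self._clock = clock
        self._token: Optional[str] = None
        self._acquired_at = 0.0

    def get_token(self) -> str:
        """Return a cached token, refreshing it first if it is too old."""
        age = self._clock() - self._acquired_at
        if self._token is None or age >= self._refresh_after:
            self._token = self._fetch_token()
            self._acquired_at = self._clock()
        return self._token
```

Calling `get_token()` before every API request keeps the check cheap while making sure a long batch never sends a token older than the configured interval.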

---

## Troubleshooting

### Common Issues

#### 1. No Parameter Values Loaded

**Symptoms:** 
```
❌ No parameter values loaded from {source}! Check your configuration.
```

**Solutions by Source:**

**Semantic Model:**
- Test DAX query in Power BI Desktop first
- Verify workspace GUID and dataset GUID are correct
- Check semantic model is published and accessible
- Ensure query returns at least one column with values

**Lakehouse:**
- Verify table exists: `SELECT * FROM parameter_config`
- Check `Category` matches exactly (case-sensitive)
- Confirm `IsActive = true` rows exist
- Ensure notebook is attached to correct Lakehouse

**JSON:**
- Validate JSON syntax at jsonlint.com
- Ensure the array is not empty: `[]` is valid JSON but yields no parameter values, so it is rejected
- Check for proper escaping in pipeline parameters

**Warehouse:**
- Verify Warehouse exists and is accessible
- Test query in Warehouse query editor
- Check table name includes schema: `dbo.ParameterConfig`
- Confirm user has SELECT permission

---

#### 2. Authentication Errors

**Symptoms:**
```
❌ Failed to refresh Power BI API token
401 Unauthorized
```

**Solutions:**
- Verify managed identity is enabled for workspace
- Check notebook has permission to:
  - Report workspace (Contributor or Viewer)
  - Semantic model (Build permission)
  - Lakehouse (Read access)
  - Warehouse (CONNECT and SELECT permissions)
- Refresh workspace and reload notebook
- Check if workspace is in trial mode (may have restrictions)

---

#### 3. Report Export Timeout

**Symptoms:**
```
❌ Export timeout after 600 seconds
```

**Solutions:**
- **Simplify Report:** Reduce data volume, remove complex visuals
- **Increase Timeout:** Set `export_timeout_seconds` to `"900"` (15 min)
- **Check Data Sources:** Ensure underlying data sources are responsive
- **Validate Parameters:** Test report manually with same parameters in Power BI
- **Review Report Design:** Complex calculations and large datasets slow exports

---

#### 4. Slow Performance

**Symptoms:**
- Each report takes > 5 minutes
- Batch execution exceeds expected time

**Solutions:**

**Optimize Parameter Source:**
- **Semantic Model:** Use DISTINCT instead of VALUES, add filters
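- **Lakehouse:** Delta tables do not support traditional indexes; keep `parameter_config` compact (for example, run `OPTIMIZE` periodically) and filter on `Category` and `IsActive`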
- **Warehouse:** Optimize query, add indexes

**Optimize Report:**
- Reduce data volume (add filters, limit rows)
- Simplify visuals (remove complex calculations)
- Pre-aggregate data in data source
- Use report-level filters

**Parallel Processing:**
For very large batches (500+ reports), split the work across multiple pipelines, each with its own `lakehouse_filter`:
```jsonc
// Pipeline 1: A-M customers
{"lakehouse_filter": "ParameterValue >= 'A' AND ParameterValue < 'N'"}

// Pipeline 2: N-Z customers
{"lakehouse_filter": "ParameterValue >= 'N'"}
```

---

#### 5. File Not Found in OneLake

**Symptoms:**
- Execution shows success but file missing in OneLake

**Solutions:**
- Check `archive_to_onelake` is set to `"true"`
- Verify Lakehouse is attached to notebook
- Confirm notebook has Write permission to Lakehouse
- Check OneLake path in execution logs
- Look in correct folder: `Files/reports/archive/YYYY/MM/DD/`
- If using custom `lakehouse_folder`, verify the path exists

---

#### 6. Invalid Filename Template

**Symptoms:**
```
❌ Invalid filename template: {error_message}
```

**Solutions:**
- Check for matching angle brackets: `<placeholder>`
- Use only valid format patterns: `yyyy`, `mm`, `dd`, `hh`, `MM`, `ss`, `ww`
- Don't use spaces in placeholder names: `<start_date>` not `<start date>`
- Ensure placeholders reference valid parameters
- Avoid special characters outside placeholders
- Keep total filename under 255 characters

---

### Debug Mode

To run notebook interactively for debugging:

1. Open notebook in Fabric
2. Set default values in **Cell 1** for all parameters
3. Run cells one by one to isolate issues
4. Check output and logs after each cell
5. Add temporary logging statements if needed:
   ```python
   logger.info(f"DEBUG: variable = {variable}")
   ```
6. Fix issues and re-run
7. Remove debug statements before production use

---

### Viewing Detailed Logs

**During Execution:**
- Monitor notebook cell output in real-time
- Watch for warnings (⚠) and errors (❌)
- Note execution duration per report

**After Execution:**
- Review pipeline run history
- Check notebook output in pipeline activity
- Export execution logs if needed
- Review OneLake files to verify outputs

**Log Levels:**
- `INFO` - Normal execution steps
- `WARNING` - Non-critical issues (large files, retries)
- `ERROR` - Failed operations (with retry)
- `CRITICAL` - Unrecoverable failures

---

## Best Practices

### 1. Parameter Source Selection

| Scenario | Recommended Source | Why |
|----------|-------------------|-----|
| Business users maintain list | Semantic Model | Power BI interface, familiar tools |
| Large lists (1000+ values) | Lakehouse | Best performance, scalability |
| Need Row-Level Security | Semantic Model or Warehouse | RLS enforced automatically |
| Testing/Development | JSON | Quick setup, no infrastructure |
| Complex SQL logic needed | Warehouse | Full T-SQL support |
| Highest performance | Lakehouse | Native Spark, zero auth overhead |
| Enterprise security requirements | Warehouse | RLS, CLS, audit trails |

---

### 2. File Organization

**Default Structure (Recommended):**
```
Files/reports/archive/
  /2025/
    /01/
      /30/
        Report_Customer_AcmeCorp_20250130_143022.pdf
        Report_Customer_TechStart_20250130_143035.pdf
```

**Custom Structure Examples:**
```
# By customer and year-month
lakehouse_folder = "reports/<report_partitioning_value>/<timestamp:yyyy_mm>"
Files/reports/AcmeCorp/2025_01/Report.pdf

# By department and ISO week
lakehouse_folder = "reports/<Department>/<timestamp:yyyy>/W<timestamp:ww>"
Files/reports/Sales/2025/W05/Report.pdf

# By region with nested dates
lakehouse_folder = "archive/<Region>/<timestamp:yyyy>/<timestamp:mm>"
Files/archive/EMEA/2025/01/Report.pdf
```

**Cleanup Recommendations:**
- Archive reports older than 1 year to cheaper storage
- Implement retention policy (e.g., delete after 2 years)
- Monitor OneLake capacity regularly
- Consider compressing old files

---

### 3. Scheduling Best Practices

**Recommended Schedules:**
- **Daily Reports:** 6:00 AM (off-peak hours)
- **Weekly Reports:** Monday 7:00 AM
- **Monthly Reports:** 1st or last day of month, 8:00 AM
- **Quarterly Reports:** 1st Monday of new quarter

**Avoid:**
- Running multiple large batches simultaneously (causes throttling)
- Peak business hours (9 AM - 5 PM) for large batches
- Running during platform maintenance windows

**Large Batch Strategies:**
- Split into multiple pipelines running sequentially
- Use pipeline dependencies to chain executions
- Stagger start times (Pipeline 1: 6:00 AM, Pipeline 2: 7:00 AM)
- Monitor execution times and adjust schedules accordingly

---

### 4. Security Best Practices

**Authentication & Authorization:**
- ✅ Always use managed identity (never hardcode credentials)
- ✅ Grant minimal required permissions (principle of least privilege)
- ✅ Use Row-Level Security (RLS) in Semantic Models/Warehouses
- ✅ Separate service accounts for different environments (Dev/Test/Prod)
- ✅ Audit parameter access and report generation regularly

**Input Validation:**
- ✅ All inputs validated by `InputValidator` class
- ✅ GUIDs validated against format (36 chars, hyphens)
- ✅ SQL identifiers validated (letters, digits, underscore, and dot for schema-qualified names; max 128 chars)
- ✅ SQL injection protection (parameterized queries, string escaping)
- ✅ Filename sanitization (removes special chars, path traversal)

**Data Protection:**
- ✅ Sensitive parameters logged at DEBUG level only
- ✅ Error messages truncated to avoid exposing secrets
- ✅ Files stored in workspace-specific Lakehouse (isolated)
- ❌ Never commit credentials or GUIDs to version control
- ❌ Never hardcode connection strings or API keys

**Access Control:**
- Configure Lakehouse permissions carefully
- Limit who can modify parameter tables
- Restrict pipeline execution to authorized users
- Enable audit logging for compliance

---

### 5. Performance Optimization

**Parameter Loading:**
- **Semantic Model:** Optimize DAX queries, use DISTINCT, add filters
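- **Lakehouse:** Delta tables do not use traditional indexes; keep the parameter table compacted with `OPTIMIZE` and filter on `Category` and `IsActive` so little data is scanned
- **Warehouse:** Fabric Warehouse does not support user-defined indexes; keep the parameter table narrow and filter on `Category`, ordering by `SortOrder` in the query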
- Cache parameter lists if values don't change during execution

**Report Design:**
- Minimize data volume (filter at source, not in report)
- Pre-aggregate data in semantic model/warehouse
- Avoid complex DAX calculations in report visuals
- Use efficient visual types (tables instead of complex charts)
- Test report performance with large datasets

**Batch Processing:**
- For 500+ reports, split into multiple pipelines
- Use pipeline parameters to partition work:
  ```json
  // Pipeline A
  {"lakehouse_filter": "ParameterValue < 'M'"}
  
  // Pipeline B
  {"lakehouse_filter": "ParameterValue >= 'M'"}
  ```
- Run pipelines in parallel only if capacity allows; otherwise run them sequentially to avoid throttling
- Monitor throttling and adjust concurrency
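
When the parameter values are not evenly distributed across the alphabet, a filter such as `< 'M'` can produce lopsided batches. One alternative is to split the loaded list into equal chunks and hand each pipeline its own JSON array through `report_partitioning_values`. The sketch below shows that idea; `split_into_batches` is a hypothetical helper and assumes the full list has already been loaded.

```python
import json
from typing import List


def split_into_batches(values: List[str], batch_count: int) -> List[List[str]]:
    """Split values into batch_count contiguous, near-equal chunks (sorted first)."""
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    ordered = sorted(values)
    size, remainder = divmod(len(ordered), batch_count)
    batches, start = [], 0
    for index in range(batch_count):
        # The first `remainder` batches each take one extra value
        end = start + size + (1 if index < remainder else 0)
        batches.append(ordered[start:end])
        start = end
    return batches


customers = ["Delta", "Acme", "Cobalt", "Birch", "Echo"]
for number, batch in enumerate(split_into_batches(customers, 2), start=1):
    # Each string can be passed as report_partitioning_values with source "json"
    print(f"Pipeline {number}: {json.dumps(batch)}")
```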

**Network & Storage:**
- Increase timeouts for slow connections
- Use larger chunk sizes for big files
- Pre-warm connections with a test report
- Place Lakehouse in same region as workspace

---

### 6. Monitoring & Maintenance

**Regular Tasks:**
- **Weekly:** Review pipeline success rates and execution times
- **Monthly:** Check OneLake storage consumption and clean up old files
- **Quarterly:** Optimize parameter sources (remove inactive, reorder by frequency)
- **Annually:** Review and update security permissions

**Metrics to Track:**
- Success rate (target: > 95%)
- Average execution time per report
- Total batch execution time
- Storage consumption growth rate
- Error patterns and frequencies

**Alerts to Configure:**
- Pipeline failure notifications (email/Teams)
- Execution time exceeds threshold (e.g., > 2 hours)
- OneLake storage approaching limit
- Authentication failures
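
Two of these checks can be derived directly from the notebook's exit JSON. The following is a sketch under stated assumptions: `evaluate_run` is a hypothetical helper, the thresholds mirror the examples above (success rate above 95%, duration up to 2 hours), and the field names `total`, `success_count`, and `total_duration_seconds` are taken from the exit JSON documented in this README.

```python
from typing import Any, Dict, List


def evaluate_run(
    summary: Dict[str, Any],
    min_success_rate: float = 0.95,
    max_duration_seconds: float = 7200,
) -> List[str]:
    """Return a list of alert messages; an empty list means the run looks healthy."""
    alerts = []
    total = summary.get("total", 0)
    rate = summary.get("success_count", 0) / total if total else 0.0
    if rate <= min_success_rate:  # target is strictly above the threshold
        alerts.append(f"Success rate {rate:.1%} is not above target {min_success_rate:.0%}")
    duration = summary.get("total_duration_seconds", 0)
    if duration > max_duration_seconds:
        alerts.append(f"Batch took {duration:.0f}s, limit is {max_duration_seconds:.0f}s")
    return alerts


run = {"total": 10, "success_count": 9, "total_duration_seconds": 8000}
for message in evaluate_run(run):
    print(message)
```

The returned messages could then feed whichever notification channel the pipeline already uses.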

**Documentation:**
- Maintain parameter category definitions
- Document report purposes and audiences
- Keep DAX/SQL queries commented
- Record configuration changes in version control

---

## Pipeline Integration

### Consuming Notebook Output

The notebook exits with JSON containing execution results. The `status` field is one of `success`, `partial_success`, or `failed`:

```json
{
  "files": [
    "Files/reports/archive/2025/01/30/Report_Customer_A.pdf",
    "Files/reports/archive/2025/01/30/Report_Customer_B.pdf"
  ],
  "status": "success",
  "total": 10,
  "success_count": 10,
  "fail_count": 0,
  "total_size_mb": 45.7,
  "avg_duration_seconds": 23.4,
  "total_duration_seconds": 234.0,
  "errors": [],
  "timestamp": "2025-01-30T14:30:22Z",
  "parameter_name": "Customer",
  "source_type": "semantic_model"
}
```
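
For orientation, a payload of this shape could be assembled as sketched below. This is not the notebook's actual code: `build_exit_payload` and the per-report result dictionaries (`value`, `path`, `error`) are hypothetical, only a subset of the fields is produced, and the rule used to pick `status` is an assumption.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_exit_payload(
    results: List[Dict[str, Any]], parameter_name: str, source_type: str
) -> str:
    """Summarise per-report results as a JSON string for the pipeline."""
    succeeded = [r for r in results if r.get("path")]
    failed = [r for r in results if not r.get("path")]

    # Assumed rule: all ok -> success, some ok -> partial_success, none ok -> failed
    if succeeded and not failed:
        status = "success"
    elif succeeded:
        status = "partial_success"
    else:
        status = "failed"

    payload = {
        "files": [r["path"] for r in succeeded],
        "status": status,
        "total": len(results),
        "success_count": len(succeeded),
        "fail_count": len(failed),
        "errors": [
            {"value": r.get("value"), "error": r.get("error", "unknown")} for r in failed
        ],
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "parameter_name": parameter_name,
        "source_type": source_type,
    }
    return json.dumps(payload)


exit_value = build_exit_payload(
    [
        {"value": "A", "path": "Files/reports/archive/2025/01/30/Report_Customer_A.pdf"},
        {"value": "B", "error": "export timed out"},
    ],
    parameter_name="Customer",
    source_type="semantic_model",
)
print(exit_value)
```

In a Fabric notebook the resulting string would be handed back to the pipeline with `mssparkutils.notebook.exit(exit_value)`.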

### Using File List in ForEach Activity

**Reference file list in pipeline:**
```json
{
  "items": "@json(activity('NotebookActivity').output.status.Output.result.exitValue).files"
}
```

**Iterate through files:**
```json
{
  "activities": [
    {
      "name": "ProcessEachFile",
      "type": "ForEach",
      "typeProperties": {
        "items": "@json(activity('NotebookActivity').output.status.Output.result.exitValue).files",
        "activities": [
          {
            "name": "SendEmail",
            "type": "WebActivity",
            "inputs": {
              "url": "https://api.sendgrid.com/v3/mail/send",
              "method": "POST",
              "body": {
                "personalizations": [{
                  "to": [{"email": "@{item().recipient}"}]
                }],
                "attachments": [{
                  "filename": "@{item().filename}",
                  "content": "@{base64(activity('ReadFile').output)}"
                }]
              }
            }
          }
        ]
      }
    }
  ]
}
```

### Conditional Logic Based on Status

```json
{
  "activities": [
    {
      "name": "NotebookActivity",
      "type": "SynapseNotebook"
    },
    {
      "name": "IfSuccess",
      "type": "IfCondition",
      "dependsOn": [{"activity": "NotebookActivity", "dependencyConditions": ["Succeeded"]}],
      "typeProperties": {
        "expression": {
          "value": "@equals(json(activity('NotebookActivity').output.status.Output.result.exitValue).status, 'success')",
          "type": "Expression"
        },
        "ifTrueActivities": [
          {"name": "SendSuccessEmail", "type": "WebActivity"}
        ],
        "ifFalseActivities": [
          {"name": "SendFailureAlert", "type": "WebActivity"}
        ]
      }
    }
  ]
}
```

---

## Version History

**v1.0** (2025-01-30)
- Production-ready paginated report batch execution framework
- Four flexible parameter sources: Semantic Model (DAX), Lakehouse (Spark SQL), JSON, Warehouse (T-SQL)
- Automatic token refresh for long-running batches (handles 1-hour token expiration)
- Flexible filename formatting with template placeholders and date formatting
- Custom folder structure support with template placeholders
- OneLake archival with date-based folder hierarchy
- Retry logic with exponential backoff for robust error handling
- Configurable performance tuning parameters for enterprise scenarios
- Comprehensive input validation and SQL injection protection
- Pipeline integration with detailed JSON output for downstream processing
- Continue-on-failure support for large batch operations
- Managed identity authentication
- Object-oriented architecture for maintainability

---

## Additional Resources

**File Structure:**
```
/
├── paginated_report_batch_executor.ipynb  # This notebook
├── README.md                               # Detailed documentation
├── config/
│   ├── example_semantic_model.json        # Semantic Model example
│   ├── example_lakehouse_mode.json        # Lakehouse example
│   ├── example_json_mode.json             # JSON example
│   └── example_warehouse_mode.json        # Warehouse example
└── pipeline/
    └── pipeline_definition.json           # Sample pipeline definition
```

**Support:**
- Check README.md for extended documentation
- Review example configurations in `config/` folder
- Test with small parameter lists first
- Contact your Fabric administrator for permissions issues

---

**Author:** Generated with Claude Code  
**Version:** 1.0  
**Last Updated:** 2025-01-30

---

**Ready to generate thousands of reports with ease!** 🚀

In [None]:
# CELL 1: Parameter Definitions
# These parameters can be overridden by pipeline

# Report configuration
workspace_id = ""                    # Fabric workspace GUID
report_id = ""                       # Paginated report GUID
output_format = "XLSX"                # PDF, XLSX, DOCX, PPTX, PNG
static_params = "{}"                 # JSON: {"start_date": "2024-01-01", "end_date": "2024-12-31"}
report_partitioning_column = "Producer"       # Column to partition reports by (loop through values)
report_name = "Report_<report_partitioning_column>_<report_partitioning_value>_<timestamp:yyyymmdd_hhmmss>"  # Filename template

# Lakehouse folder path (OPTIONAL - uses default if empty)
# Template supports same placeholders as report_name: <report_partitioning_column>, <report_partitioning_value>, <timestamp:format>, <param_name>
# Path is relative to Files/ directory (Files/ is automatically prepended)
# Examples:
#   ""                                              → Files/reports/archive/{YYYY}/{MM}/{DD}/ (default)
#   "reports/custom"                                → Files/reports/custom/
#   "reports/<report_partitioning_column>"          → Files/reports/CustomerName/
#   "reports/<timestamp:yyyy>/<timestamp:mm>"       → Files/reports/2025/01/
#   "archive/<Region>/<timestamp:yyyy>"             → Files/archive/EMEA/2025/
lakehouse_folder = ""

# ============================================================
# SOURCE CONFIGURATION (supports 4 sources)
# ============================================================
report_partitioning_source = "semantic_model"  # "semantic_model" | "lakehouse" | "json" | "warehouse"

# -------------------- OPTION 1: SEMANTIC MODEL (RECOMMENDED) --------------------
semantic_model_workspace_id = ""       # Workspace containing the semantic model
semantic_model_dataset_id = ""         # Semantic model (dataset) GUID
semantic_model_dax_query = ""          # DAX query to get parameter values

# -------------------- OPTION 2: LAKEHOUSE TABLE --------------------
lakehouse_table = "parameter_config"  # Delta table name
lakehouse_category = "ProducerList"   # Category filter
lakehouse_column = "ParameterValue"   # Column containing values
lakehouse_filter = ""                 # Optional: additional WHERE clause

# -------------------- OPTION 3: JSON ARRAY --------------------
report_partitioning_values = "[]"       # JSON: ["Producer A", "Producer B", "Producer C"]

# -------------------- OPTION 4: WAREHOUSE --------------------
warehouse_name = ""                   # Warehouse name
warehouse_table = ""                  # Table name (e.g., "dbo.ParameterConfig")
warehouse_column = ""                 # Column name
warehouse_category = ""               # Category filter

# ============================================================
# EXECUTION OPTIONS
# ============================================================
archive_to_onelake = "true"           # Save to OneLake
max_retries = "3"                     # Retry attempts per report
export_timeout_seconds = "600"        # Max seconds to wait for export (10 minutes)
poll_interval_seconds = "5"           # Seconds between status polls
retry_backoff_base = "30"             # Base seconds for exponential backoff (30, 60, 120...)

# ============================================================
# PERFORMANCE TUNING (OPTIONAL)
# ============================================================
download_chunk_size_mb = "1"          # Download chunk size in MB
file_size_warning_mb = "500"          # File size warning threshold in MB
connection_timeout_seconds = "30"     # API connection timeout in seconds
download_timeout_seconds = "120"      # Download timeout in seconds
param_loader_retry_attempts = "3"     # Parameter loading retry attempts
param_loader_retry_delay_seconds = "5"  # Delay between parameter loading retries
token_refresh_interval_minutes = "45"  # Token refresh interval in minutes

# NOTE: Config dictionary is built in Cell 2 AFTER pipeline parameters are injected

In [None]:
# ============================================================================
# CELL 2: COMPLETE OOP IMPLEMENTATION
# ============================================================================

# ============================================================================
# IMPORTS AND PACKAGE INSTALLATION
# ============================================================================

import requests
import json
import time
import re
import struct
import sys
import os
import tempfile
import unicodedata
from datetime import datetime, timezone, timedelta
from typing import List, Dict, Any, Optional, Tuple

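def install_package(package: str, import_name: Optional[str] = None) -> None:
    """Install a pip package unless its module is already importable.

    import_name is the module name when it differs from the pip name
    (e.g. pip package "semantic-link" provides module "sempy").
    """
    import importlib
    import subprocess
    try:
        importlib.import_module(import_name or package.replace('-', '_'))
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package, "-q"])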

# Install semantic-link for semantic model support
try:
    import sempy.fabric as fabric
except ImportError:
    install_package("semantic-link")
    import sempy.fabric as fabric

# Fabric imports
from notebookutils import mssparkutils

# ============================================================================
# LOGGER SETUP
# ============================================================================

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# ============================================================================
# BUILD CONFIGURATION DICTIONARY
# ============================================================================
# Pipeline parameters are injected between Cell 1 and Cell 2, so we build
# the config dictionary here to capture the injected values

# Parse JSON parameters
static_params_dict = json.loads(static_params)

# Strip whitespace from GUIDs (in case pipeline adds spaces)
workspace_id = workspace_id.strip() if workspace_id else ""
report_id = report_id.strip() if report_id else ""

# Build unified configuration dictionary for executor
config = {
    # Core report configuration
    'workspace_id': workspace_id,
    'report_id': report_id,
    'output_format': output_format.upper(),
    'static_params': static_params_dict,
    'report_partitioning_column': report_partitioning_column,
    'report_partitioning_source': report_partitioning_source,
    'report_name': report_name,
    'lakehouse_folder': lakehouse_folder,  # Optional custom folder template (empty = default date hierarchy)

    # Source-specific configurations
    'semantic_model': {
        'workspace_id': semantic_model_workspace_id.strip() if semantic_model_workspace_id else "",
        'dataset_id': semantic_model_dataset_id.strip() if semantic_model_dataset_id else "",
        'dax_query': semantic_model_dax_query
    },
    'lakehouse': {
        'table': lakehouse_table,
        'category': lakehouse_category,
        'column': lakehouse_column,
        'filter_clause': lakehouse_filter
    },
    'json': {
        'json_values': report_partitioning_values
    },
    'warehouse': {
        'warehouse_name': warehouse_name,
        'table': warehouse_table,
        'column': warehouse_column,
        'category': warehouse_category
    },

    # Execution settings (convert strings to appropriate types)
    'archive_to_onelake': archive_to_onelake.lower() == "true",
    'max_retries': int(max_retries),
    'export_timeout': int(export_timeout_seconds),
    'poll_interval': int(poll_interval_seconds),
    'retry_backoff_base': int(retry_backoff_base),
    'download_chunk_size_mb': int(download_chunk_size_mb),
    'file_size_warning_mb': int(file_size_warning_mb),
    'connection_timeout': int(connection_timeout_seconds),
    'download_timeout': int(download_timeout_seconds),
    'param_loader_max_retries': int(param_loader_retry_attempts),
    'param_loader_retry_delay': int(param_loader_retry_delay_seconds),
    'token_refresh_interval': int(token_refresh_interval_minutes)
}

logger.info("Config dictionary built successfully")
logger.info(f"  workspace_id: {config['workspace_id'][:8]}... (length={len(config['workspace_id'])})")
logger.info(f"  report_id: {config['report_id'][:8]}... (length={len(config['report_id'])})")
logger.info(f"  report_partitioning_source: {config['report_partitioning_source']}")


# ============================================================================
# INPUT VALIDATOR CLASS
# ============================================================================

class InputValidator:
    """Validate and sanitize user inputs to prevent injection attacks"""

    @staticmethod
    def is_valid_guid(value: str) -> bool:
        """Validate GUID format"""
        guid_pattern = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
        return bool(re.match(guid_pattern, value.lower())) if value else False

    @staticmethod
    def is_valid_sql_identifier(value: str) -> bool:
        """Validate SQL identifier (table/column name) - alphanumeric, underscore, dot only"""
        pattern = r'^[a-zA-Z_][a-zA-Z0-9_\.]*$'
        return bool(re.match(pattern, value)) and len(value) <= 128

    @staticmethod
    def is_valid_source_type(value: str) -> bool:
        """Validate parameter source type"""
        return value in ["semantic_model", "lakehouse", "json", "warehouse"]

    @staticmethod
    def is_valid_format(value: str) -> bool:
        """Validate output format"""
        return value.upper() in ["PDF", "XLSX", "DOCX", "PPTX", "PNG"]

    @staticmethod
    def sanitize_sql_string(value: str) -> str:
        """Sanitize string for use in SQL - escape single quotes"""
        if value is None:
            return ""
        return str(value).replace("'", "''")

    @staticmethod
    def validate_filename_template(template: str) -> Tuple[bool, Optional[str]]:
        """Validate filename template syntax

        Returns:
            Tuple of (is_valid, error_message)
        """
        if not template or len(template) == 0:
            return False, "Template cannot be empty"

        if len(template) > 500:
            return False, "Template too long (max 500 characters)"

        # Check for path traversal attempts
        if '../' in template or '..\\'  in template:
            return False, "Template contains path traversal characters"

        # Parse all placeholders using regex: <placeholder> or <placeholder:format>
        placeholder_pattern = r'<([^:>]+)(?::([^>]+))?>'
        placeholders = re.findall(placeholder_pattern, template)

        # Check for unclosed brackets
        open_count = template.count('<')
        close_count = template.count('>')
        if open_count != close_count:
            return False, f"Mismatched brackets: {open_count} '<' but {close_count} '>'"

        # Validate each placeholder
        valid_format_patterns = ['yyyy', 'yy', 'mm', 'dd', 'ww', 'hh', 'MM', 'ss', '_']
        for placeholder_name, format_spec in placeholders:
            if not placeholder_name:
                return False, "Empty placeholder name found"

            # Check placeholder name is valid (alphanumeric and underscore only)
            if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', placeholder_name):
                return False, f"Invalid placeholder name: '{placeholder_name}' (use only letters, numbers, underscores)"

            # If format specified, validate it contains only known patterns
            if format_spec:
                # Check that format only contains valid date/time patterns
                temp_format = format_spec
                for pattern in valid_format_patterns:
                    temp_format = temp_format.replace(pattern, '')
                if temp_format:  # If anything remains, it's invalid
                    return False, f"Invalid format pattern in '{placeholder_name}:{format_spec}' - unknown characters: '{temp_format}'"

        return True, None

    @staticmethod
    def validate_lakehouse_folder_template(template: str) -> Tuple[bool, Optional[str]]:
        """Validate lakehouse folder path template syntax

        Returns:
            Tuple of (is_valid, error_message)
        """
        # Empty string is valid (means use default)
        if not template or len(template.strip()) == 0:
            return True, None

        if len(template) > 500:
            return False, "Folder path template too long (max 500 characters)"

        # Check for path traversal attempts
        if '../' in template or '..\\'  in template:
            return False, "Folder path contains path traversal characters (../)"

        # Check for trailing slashes
        if template.endswith('/') or template.endswith('\\'):
            return False, "Folder path should not end with a slash"

        # Check for file extensions (common ones)
        file_extensions = ['.pdf', '.xlsx', '.docx', '.pptx', '.png', '.csv', '.txt', '.json']
        for ext in file_extensions:
            if template.lower().endswith(ext):
                return False, f"Folder path should not contain file extensions like '{ext}'"

        # Parse all placeholders using regex: <placeholder> or <placeholder:format>
        placeholder_pattern = r'<([^:>]+)(?::([^>]+))?>'

        # Check for unclosed brackets
        open_count = template.count('<')
        close_count = template.count('>')
        if open_count != close_count:
            return False, f"Mismatched brackets: {open_count} '<' but {close_count} '>'"

        placeholders = re.findall(placeholder_pattern, template)

        # Validate each placeholder
        valid_format_patterns = ['yyyy', 'yy', 'mm', 'dd', 'ww', 'hh', 'MM', 'ss', '_', '/']  # Note: '/' added for folder paths
        for placeholder_name, format_spec in placeholders:
            if not placeholder_name:
                return False, "Empty placeholder name found"

            # Check placeholder name is valid (alphanumeric and underscore only)
            if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', placeholder_name):
                return False, f"Invalid placeholder name: '{placeholder_name}' (use only letters, numbers, underscores)"

            # If format specified, validate it contains only known patterns
            if format_spec:
                # Check that format only contains valid date/time patterns and forward slashes
                temp_format = format_spec
                for pattern in valid_format_patterns:
                    temp_format = temp_format.replace(pattern, '')
                if temp_format:  # If anything remains, it's invalid
                    return False, f"Invalid format pattern in '{placeholder_name}:{format_spec}' - unknown characters: '{temp_format}'"

        # Validate characters in the entire path (excluding template placeholders)
        # Remove placeholders temporarily for validation
        path_without_placeholders = re.sub(placeholder_pattern, 'X', template)

        # Check for invalid characters (only allow alphanumeric, underscore, hyphen, forward slash)
        if not re.match(r'^[a-zA-Z0-9_\-/]+$', path_without_placeholders):
            invalid_chars = re.findall(r'[^a-zA-Z0-9_\-/]', path_without_placeholders)
            return False, f"Folder path contains invalid characters: {set(invalid_chars)} (only letters, numbers, underscore, hyphen, forward slash allowed)"

        # Check for double slashes
        if '//' in template:
            return False, "Folder path contains double slashes (//)"

        # Check for leading slash (path should be relative)
        if template.startswith('/'):
            return False, "Folder path should not start with a slash (path is relative to Files/)"

        return True, None


# ============================================================================
# FILENAME FORMATTER CLASS
# ============================================================================

class FilenameFormatter:
    """Process filename templates with parameter substitution and date formatting

    Supports placeholders:
    - <report_partitioning_column> - parameter name
    - <report_partitioning_value> - current value
    - <param_name> - value from static_params
    - <param_name:yyyymmdd> - formatted date from static_params
    - <timestamp> or <timestamp:yyyymmdd_hhmmss> - current timestamp
    """

    def __init__(self, template: str, static_params: Dict[str, Any], report_partitioning_column: str, lakehouse_folder_template: str = ""):
        """Initialize formatter with template and parameters"""
        self.template = template
        self.static_params = static_params
        self.report_partitioning_column = report_partitioning_column
        self.lakehouse_folder_template = lakehouse_folder_template
        self.parsed_dates = {}

        # Compile regex pattern for placeholders
        self.placeholder_pattern = re.compile(r'<([^:>]+)(?::([^>]+))?>')

        # Parse and cache dates from static_params
        self._parse_dates()

        logger.debug(f"FilenameFormatter initialized with template: {template}")
        if lakehouse_folder_template:
            logger.debug(f"  Lakehouse folder template: {lakehouse_folder_template}")
        logger.debug(f"  Found {len(self.parsed_dates)} date parameters")

    def _parse_dates(self):
        """Auto-detect and parse ISO 8601 dates in static_params"""
        for key, value in self.static_params.items():
            if value is None:
                continue

            value_str = str(value)

            # Try parsing as ISO 8601 date formats
            date_formats = [
                '%Y-%m-%d',           # 2025-01-30
                '%Y-%m-%dT%H:%M:%S',  # 2025-01-30T14:30:22
                '%Y-%m-%d %H:%M:%S',  # 2025-01-30 14:30:22
                '%Y%m%d',             # 20250130
            ]

            for fmt in date_formats:
                try:
                    parsed = datetime.strptime(value_str, fmt)
                    self.parsed_dates[key] = parsed
                    logger.debug(f"  Parsed '{key}' as date: {parsed}")
                    break
                except ValueError:
                    continue

    def _format_date(self, dt: datetime, format_spec: str) -> str:
        """Apply custom date format pattern

        Supported patterns:
        - yyyy: 4-digit year
        - yy: 2-digit year
        - mm: 2-digit month
        - dd: 2-digit day
        - ww: ISO week number
        - hh: 2-digit hour
        - MM: 2-digit minute (capital M to distinguish from month)
        - ss: 2-digit second
        """
        result = format_spec

        # Replace patterns
        result = result.replace('yyyy', dt.strftime('%Y'))
        result = result.replace('yy', dt.strftime('%y'))
        result = result.replace('mm', dt.strftime('%m'))
        result = result.replace('dd', dt.strftime('%d'))
        result = result.replace('ww', dt.strftime('%V'))  # ISO week number
        result = result.replace('hh', dt.strftime('%H'))
        result = result.replace('MM', dt.strftime('%M'))  # Minutes
        result = result.replace('ss', dt.strftime('%S'))

        return result

    def _sanitize_component(self, text: str, max_length: int = 100) -> str:
        """Sanitize individual filename component"""
        if not text:
            return ""

        text = str(text)
        # Normalize Unicode
        text = unicodedata.normalize('NFKD', text)
        text = text.encode('ascii', 'ignore').decode('ascii')
        # Remove non-alphanumeric except underscore and hyphen
        text = re.sub(r'[^\w\s-]', '', text)
        # Collapse whitespace
        text = re.sub(r'\s+', '_', text)
        # Remove leading/trailing separators
        text = text.strip('_-')
        # Enforce length
        if len(text) > max_length:
            text = text[:max_length]

        return text

    def format(self, report_partitioning_value: str, output_format: str) -> str:
        """Generate filename from template for specific parameter value

        Args:
            report_partitioning_value: Current value of the partitioning parameter
            output_format: File extension (PDF, XLSX, etc.)

        Returns:
            Sanitized filename with extension
        """
        timestamp = datetime.now(timezone.utc)
        result = self.template

        # Track what was replaced for logging
        replacements = {}

        def replace_placeholder(match):
            """Replace a single placeholder"""
            placeholder_name = match.group(1)
            format_spec = match.group(2)

            # Handle special built-in placeholders
            if placeholder_name == 'report_partitioning_column':
                value = self.report_partitioning_column
            elif placeholder_name == 'report_partitioning_value':
                value = report_partitioning_value
            elif placeholder_name == 'timestamp':
                if format_spec:
                    value = self._format_date(timestamp, format_spec)
                else:
                    value = timestamp.strftime('%Y%m%d_%H%M%S')
            else:
                # Look up in static_params
                if placeholder_name not in self.static_params:
                    # Parameter not found - skip this placeholder (remove it)
                    logger.debug(f"  Skipping placeholder <{placeholder_name}> - not found in static_params")
                    return ''

                param_value = self.static_params[placeholder_name]

                # Check if this is a date parameter and format requested
                if format_spec and placeholder_name in self.parsed_dates:
                    value = self._format_date(self.parsed_dates[placeholder_name], format_spec)
                else:
                    value = str(param_value) if param_value is not None else ''

            # Sanitize the value
            sanitized = self._sanitize_component(value)
            replacements[placeholder_name] = sanitized
            return sanitized

        # Replace all placeholders
        result = self.placeholder_pattern.sub(replace_placeholder, result)

        # Log replacements
        if replacements:
            logger.debug(f"  Template replacements: {replacements}")

        # Clean up any double underscores or hyphens from removed placeholders
        result = re.sub(r'_{2,}', '_', result)
        result = re.sub(r'-{2,}', '-', result)
        result = result.strip('_-')

        # If result is empty or too short, use fallback
        if not result or len(result) < 3:
            result = f"Report_{self._sanitize_component(report_partitioning_value)}_{timestamp.strftime('%Y%m%d_%H%M%S')}"
            logger.warning(f"Template produced empty/short result, using fallback: {result}")

        # Add extension
        extension = output_format.lower()
        filename = f"{result}.{extension}"

        # Final length check (filesystem limit is 255)
        if len(filename) > 255:
            # Truncate the base name
            max_base_length = 255 - len(extension) - 1  # -1 for the dot
            result = result[:max_base_length]
            filename = f"{result}.{extension}"
            logger.warning(f"Filename truncated to {len(filename)} characters")

        return filename

    def _sanitize_folder_component(self, text: str, max_length: int = 100) -> str:
        """Sanitize individual folder path component (preserves folder structure)"""
        if not text:
            return ""

        text = str(text)
        # Normalize Unicode
        text = unicodedata.normalize('NFKD', text)
        text = text.encode('ascii', 'ignore').decode('ascii')
        # Remove non-alphanumeric except underscore, hyphen, and forward slash
        text = re.sub(r'[^\w\s\-/]', '', text)
        # Collapse whitespace to underscore
        text = re.sub(r'\s+', '_', text)
        # Remove leading/trailing separators (but not slashes in the middle)
        text = text.strip('_-')
        # Enforce length
        if len(text) > max_length:
            text = text[:max_length]

        return text

    def format_folder_path(self, report_partitioning_value: str) -> str:
        """Generate folder path from template for specific parameter value

        Args:
            report_partitioning_value: Current value of the partitioning parameter

        Returns:
            Sanitized folder path (without Files/ prefix or trailing slash)
        """
        timestamp = datetime.now(timezone.utc)
        result = self.lakehouse_folder_template

        # Track what was replaced for logging
        replacements = {}

        def replace_placeholder(match):
            """Replace a single placeholder"""
            placeholder_name = match.group(1)
            format_spec = match.group(2)

            # Handle special built-in placeholders
            if placeholder_name == 'report_partitioning_column':
                value = self.report_partitioning_column
            elif placeholder_name == 'report_partitioning_value':
                value = report_partitioning_value
            elif placeholder_name == 'timestamp':
                if format_spec:
                    value = self._format_date(timestamp, format_spec)
                else:
                    value = timestamp.strftime('%Y%m%d_%H%M%S')
            else:
                # Look up in static_params
                if placeholder_name not in self.static_params:
                    # Parameter not found - skip this placeholder (remove it)
                    logger.debug(f"  Skipping placeholder <{placeholder_name}> in folder path - not found in static_params")
                    return ''

                param_value = self.static_params[placeholder_name]

                # Check if this is a date parameter and format requested
                if format_spec and placeholder_name in self.parsed_dates:
                    value = self._format_date(self.parsed_dates[placeholder_name], format_spec)
                else:
                    value = str(param_value) if param_value is not None else ''

            # Sanitize the value (but preserve slashes for folder structure)
            sanitized = self._sanitize_folder_component(value)
            replacements[placeholder_name] = sanitized
            return sanitized

        # Replace all placeholders
        result = self.placeholder_pattern.sub(replace_placeholder, result)

        # Log replacements
        if replacements:
            logger.debug(f"  Folder path template replacements: {replacements}")

        # Clean up any double slashes
        result = re.sub(r'/+', '/', result)

        # Clean up any double underscores or hyphens
        result = re.sub(r'_{2,}', '_', result)
        result = re.sub(r'-{2,}', '-', result)

        # Remove leading/trailing separators
        result = result.strip('_-/')

        return result


# ============================================================================
# TOKEN MANAGER CLASS
# ============================================================================

class TokenManager:
    """Manage Power BI API tokens with automatic refresh for long-running batches

    Power BI tokens typically expire after 1 hour. This class tracks token age
    and refreshes proactively to prevent mid-batch authentication failures.
    """

    def __init__(self, mssparkutils, refresh_interval_minutes: int = 45):
        """Initialize token manager"""
        self.mssparkutils = mssparkutils
        self.refresh_interval = timedelta(minutes=refresh_interval_minutes)
        self.powerbi_token = None
        self.token_acquired_at = None
        self.refresh_tokens()  # Acquire initial tokens

    def refresh_tokens(self) -> Tuple[str, Dict[str, str]]:
        """Refresh Power BI API tokens"""
        try:
            self.powerbi_token = self.mssparkutils.credentials.getToken("pbi")
            self.token_acquired_at = datetime.now(timezone.utc)
            logger.info("✓ Power BI API token refreshed successfully")
            return self.powerbi_token, self.get_headers()
        except Exception as e:
            logger.error("❌ Failed to refresh Power BI API token")
            raise ValueError(f"Token refresh failed: {str(e)[:200]}") from e

    def get_headers(self) -> Dict[str, str]:
        """Get current API headers with bearer token"""
        return {"Authorization": f"Bearer {self.powerbi_token}", "Content-Type": "application/json"}

    def ensure_valid_token(self) -> Dict[str, str]:
        """Ensure token is valid, refresh if needed"""
        if self.token_acquired_at is None:
            return self.refresh_tokens()[1]
        else:
            time_since_refresh = datetime.now(timezone.utc) - self.token_acquired_at
            if time_since_refresh >= self.refresh_interval:
                logger.info(f"🔄 Token is {time_since_refresh.total_seconds()/60:.1f} minutes old, refreshing...")
                self.refresh_tokens()
        return self.get_headers()


# ============================================================================
# PARAMETER LOADER CLASS
# ============================================================================

class ParameterLoader:
    """Load partitioning parameter values from multiple sources with security and retry logic"""

    def __init__(self, mssparkutils, spark=None, max_retries: int = 3, retry_delay: int = 5):
        """Initialize parameter loader"""
        self.mssparkutils = mssparkutils
        self.spark = spark
        self.max_retries = max_retries
        self.retry_delay = retry_delay
        self.validator = InputValidator()

    def load(self, source_type: str, **config) -> List[str]:
        """Load parameters with retry logic and deduplication"""
        if not self.validator.is_valid_source_type(source_type):
            raise ValueError(f"Invalid source type: {source_type}. Must be one of: semantic_model, lakehouse, json, warehouse")

        last_exception = None
        for attempt in range(self.max_retries):
            try:
                if source_type == "semantic_model":
                    values = self._load_from_semantic_model(**config)
                elif source_type == "lakehouse":
                    values = self._load_from_lakehouse(**config)
                elif source_type == "json":
                    values = self._load_from_json(**config)
                elif source_type == "warehouse":
                    values = self._load_from_warehouse(**config)

                # Deduplicate while preserving order
                seen = set()
                unique_values = []
                duplicates = []
                for v in values:
                    if v not in seen:
                        seen.add(v)
                        unique_values.append(v)
                    else:
                        duplicates.append(v)

                if duplicates:
                    logger.warning(f"⚠ Removed {len(duplicates)} duplicate value(s) from parameter list")
                    logger.debug(f"  Duplicates: {duplicates[:10]}")

                return unique_values

            except Exception as e:
                last_exception = e
                if attempt < self.max_retries - 1:
                    logger.warning(f"⚠ Parameter loading attempt {attempt + 1} failed: {str(e)[:200]}")
                    logger.info(f"  → Retrying in {self.retry_delay} seconds...")
                    time.sleep(self.retry_delay)
                else:
                    logger.error(f"❌ All {self.max_retries} parameter loading attempts failed")
                    raise Exception(f"Failed to load parameters from {source_type} after {self.max_retries} attempts: {str(last_exception)[:500]}") from e

    def _load_from_semantic_model(self, workspace_id: str, dataset_id: str, dax_query: str, **kwargs) -> List[str]:
        """Load from Semantic Model (Power BI Dataset) via sempy"""
        if not self.validator.is_valid_guid(workspace_id):
            raise ValueError(f"Invalid workspace GUID format: {workspace_id}")
        if not self.validator.is_valid_guid(dataset_id):
            raise ValueError(f"Invalid dataset GUID format: {dataset_id}")
        if not dax_query or len(dax_query) > 10000:
            raise ValueError("DAX query must be between 1 and 10000 characters")

        logger.info("📊 Querying Semantic Model...")
        logger.info(f"   Workspace: {workspace_id[:8]}...")
        logger.info(f"   Dataset: {dataset_id[:8]}...")

        df = fabric.evaluate_dax(dataset=dataset_id, dax_string=dax_query, workspace=workspace_id)
        if df.empty:
            raise ValueError("Semantic model query returned no results")

        column_name = df.columns[0]
        values = [str(v) for v in df[column_name].tolist() if v is not None and str(v).strip()]
        logger.info(f"✓ Loaded {len(values)} values from Semantic Model")
        return values

    def _load_from_lakehouse(self, table: str, category: str, column: str, filter_clause: str = "", **kwargs) -> List[str]:
        """Load from Lakehouse Delta table via Spark SQL"""
        if self.spark is None:
            raise Exception("Spark session not available for lakehouse source")
        if not self.validator.is_valid_sql_identifier(table):
            raise ValueError(f"Invalid table name: {table}")
        if not self.validator.is_valid_sql_identifier(column):
            raise ValueError(f"Invalid column name: {column}")

        safe_category = self.validator.sanitize_sql_string(category)
        logger.info("📊 Executing Lakehouse query...")

        query = f"SELECT {column} FROM {table} WHERE IsActive = true AND Category = '{safe_category}'"
        if filter_clause:
            dangerous_patterns = [';', '--', '/*', '*/', 'xp_', 'sp_', 'DROP', 'DELETE', 'TRUNCATE', 'ALTER', 'CREATE']
            if any(pattern.lower() in filter_clause.lower() for pattern in dangerous_patterns):
                raise ValueError(f"Filter clause contains potentially dangerous SQL keywords")
            query += f" AND ({filter_clause})"
        query += " ORDER BY SortOrder, ParameterName"

        df = self.spark.sql(query)
        values = [str(row[0]) for row in df.collect() if row[0] is not None]
        logger.info(f"✓ Loaded {len(values)} values from Lakehouse")
        return values

    def _load_from_json(self, json_values: str, **kwargs) -> List[str]:
        """Load from JSON array"""
        logger.info("📊 Loading from JSON array...")
        if not json_values or json_values.strip() == "[]":
            raise ValueError("JSON parameter values cannot be empty")

        try:
            values = json.loads(json_values) if isinstance(json_values, str) else json_values
        except json.JSONDecodeError as e:
            raise ValueError(f"Invalid JSON format: {str(e)}")

        if not isinstance(values, list) or len(values) == 0:
            raise ValueError("JSON parameter values must be a non-empty array")

        if len(values) > 1000:
            logger.warning(f"⚠ JSON array has {len(values)} values. Consider using Lakehouse or Warehouse for large lists.")

        values = [str(v) for v in values if v is not None and str(v).strip()]
        logger.info(f"✓ Loaded {len(values)} values from JSON")
        return values

    def _load_from_warehouse(self, warehouse_name: str, table: str, column: str, category: str = "", **kwargs) -> List[str]:
        """Load from Warehouse via SQL endpoint"""
        import pyodbc

        if not warehouse_name or len(warehouse_name) > 128:
            raise ValueError("Invalid warehouse name")
        if not self.validator.is_valid_sql_identifier(table):
            raise ValueError(f"Invalid table name: {table}")
        if not self.validator.is_valid_sql_identifier(column):
            raise ValueError(f"Invalid column name: {column}")

        logger.info("📊 Querying Warehouse...")
        conn = None
        cursor = None

        try:
            token = self.mssparkutils.credentials.getToken("sql")
            token_bytes = token.encode("UTF-16-LE")
            token_struct = struct.pack(f'<I{len(token_bytes)}s', len(token_bytes), token_bytes)
            SQL_COPT_SS_ACCESS_TOKEN = 1256

            workspace_name = self.mssparkutils.runtime.context.get('workspaceName')
            if not workspace_name:
                raise ValueError("Could not determine workspace name from context")

            conn_str = f"Driver={{ODBC Driver 18 for SQL Server}};Server={workspace_name}.datawarehouse.fabric.microsoft.com;Database={warehouse_name};Encrypt=yes;TrustServerCertificate=no;"
            conn = pyodbc.connect(conn_str, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: token_struct}, timeout=30)
            cursor = conn.cursor()

            safe_category = self.validator.sanitize_sql_string(category)
            query = f"SELECT {column} FROM {table} WHERE IsActive = 1"
            if category:
                query += f" AND Category = '{safe_category}'"
            query += " ORDER BY SortOrder"

            cursor.execute(query)
            values = [str(row[0]) for row in cursor.fetchall() if row[0] is not None]
            logger.info(f"✓ Loaded {len(values)} values from Warehouse")
            return values

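        except Exception as e:
            raise Exception(f"Warehouse query failed: {str(e)[:500]}") from e
        finally:
            # Best-effort cleanup: a close() failure must not mask the original error
            if cursor:
                try:
                    cursor.close()
                except Exception:
                    pass
            if conn:
                try:
                    conn.close()
                except Exception:
                    pass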


# ============================================================================
# POWER BI API CLIENT CLASS
# ============================================================================

class PowerBIAPIClient:
    """Power BI REST API interactions with retry logic"""

    def __init__(self, token_manager: TokenManager, config: Dict):
        """Initialize API client"""
        self.token_manager = token_manager
        self.workspace_id = config['workspace_id']
        self.report_id = config['report_id']
        self.output_format = config['output_format']
        self.connection_timeout = config['connection_timeout']
        self.poll_interval = config['poll_interval']
        self.export_timeout = config['export_timeout']
        self.download_chunk_size_mb = config['download_chunk_size_mb']
        self.file_size_warning_mb = config['file_size_warning_mb']
        self.download_timeout = config['download_timeout']

    def _handle_api_response(self, response: requests.Response, operation: str) -> None:
        """Handle API response with proper error handling and rate limiting"""
        if response.status_code == 429:
            retry_after = int(response.headers.get('Retry-After', '60'))
            logger.warning(f"⚠ API rate limit hit during {operation}")
            logger.info(f"  Waiting {retry_after} seconds before retry...")
            time.sleep(retry_after)
            raise Exception(f"Rate limited: {operation}. Retry after {retry_after}s")
        elif response.status_code >= 400:
            error_msg = f"API error during {operation}: HTTP {response.status_code}"
            try:
                error_detail = response.json().get('error', {}).get('message', response.text[:200])
                error_msg += f" - {error_detail}"
            except Exception:
                error_msg += f" - {response.text[:200]}"
            raise Exception(error_msg)

    def initiate_export(self, parameters: Dict[str, Any]) -> str:
        """Initiate paginated report export via Power BI REST API"""
        export_url = f"https://api.powerbi.com/v1.0/myorg/groups/{self.workspace_id}/reports/{self.report_id}/ExportTo"
        headers = self.token_manager.get_headers()
        body = {
            "format": self.output_format,
            "paginatedReportConfiguration": {
                "parameterValues": [{"name": k, "value": str(v)} for k, v in parameters.items()]
            }
        }
        response = requests.post(export_url, headers=headers, json=body, timeout=self.connection_timeout)
        self._handle_api_response(response, "initiate export")
        return response.json()['id']

    def poll_status(self, export_id: str) -> bool:
        """Poll export status until completion or timeout"""
        status_url = f"https://api.powerbi.com/v1.0/myorg/groups/{self.workspace_id}/reports/{self.report_id}/exports/{export_id}"
        headers = self.token_manager.get_headers()
        start_time = time.time()
        poll_count = 0

        while time.time() - start_time < self.export_timeout:
            poll_count += 1
            try:
                response = requests.get(status_url, headers=headers, timeout=self.connection_timeout)
                self._handle_api_response(response, "poll export status")
                status_data = response.json()
                status = status_data.get('status')

                if status == 'Succeeded':
                    logger.debug(f"Export succeeded after {poll_count} polls ({time.time() - start_time:.1f}s)")
                    return True
                elif status == 'Failed':
                    error = status_data.get('error', {}).get('message', 'Unknown error')
                    raise Exception(f"Export failed: {error}")
                elif status in ['Running', 'NotStarted']:
                    time.sleep(self.poll_interval)
                else:
                    logger.warning(f"⚠ Unknown export status: {status}")
                    time.sleep(self.poll_interval)
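            except Exception as e:
                # Rate limits and terminal export failures must propagate;
                # only transient polling errors are logged and retried.
                if "Rate limited" in str(e) or str(e).startswith("Export failed"):
                    raise
                logger.warning(f"⚠ Error during status poll {poll_count}: {str(e)[:200]}")
                time.sleep(self.poll_interval)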

        raise TimeoutError(f"Export timeout after {self.export_timeout} seconds ({poll_count} polls)")

    def download_file(self, export_id: str) -> bytes:
        """Download exported report file"""
        file_url = f"https://api.powerbi.com/v1.0/myorg/groups/{self.workspace_id}/reports/{self.report_id}/exports/{export_id}/file"
        headers = self.token_manager.get_headers()
        response = requests.get(file_url, headers=headers, stream=True, timeout=self.download_timeout)
        self._handle_api_response(response, "download report file")

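        chunks = []
        total_bytes = 0
        size_warning_logged = False
        chunk_size_bytes = self.download_chunk_size_mb * 1024 * 1024
        file_size_warning_bytes = self.file_size_warning_mb * 1024 * 1024

        for chunk in response.iter_content(chunk_size=chunk_size_bytes):
            if chunk:
                chunks.append(chunk)
                total_bytes += len(chunk)
                # Warn once, not on every chunk after the threshold is crossed
                if total_bytes > file_size_warning_bytes and not size_warning_logged:
                    logger.warning(f"⚠ Report file exceeds {self.file_size_warning_mb}MB, may cause memory issues")
                    size_warning_logged = True

        # Join once at the end to avoid repeated reallocation of the bytes object
        return b''.join(chunks)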


# ============================================================================
# ONELAKE STORAGE CLASS
# ============================================================================

class OneLakeStorage:
    """File storage operations with flexible filename formatting and date-based hierarchy"""

    def __init__(self, mssparkutils, config: Dict, filename_formatter: FilenameFormatter):
        """Initialize storage handler"""
        self.mssparkutils = mssparkutils
        self.output_format = config['output_format']
        self.file_size_warning_mb = config['file_size_warning_mb']
        self.filename_formatter = filename_formatter
        self.lakehouse_folder = config.get('lakehouse_folder', '')

    def save(self, file_content: bytes, report_partitioning_value: str) -> str:
        """Save binary file to OneLake for archival with date-based organization

        This function properly handles binary content (PDF, XLSX, etc.) by:
        1. Generating filename using FilenameFormatter
        2. Writing to a temporary file first
        3. Copying to OneLake using mssparkutils.fs.cp()
        4. Cleaning up the temporary file

        Note: Requires a Lakehouse to be attached to the notebook as a data item.
        """
        timestamp = datetime.now(timezone.utc)

        # Use formatter to generate filename
        filename = self.filename_formatter.format(report_partitioning_value, self.output_format)
        logger.debug(f"  Generated filename: {filename}")

        # Determine folder path
        if self.lakehouse_folder:
            # User specified custom folder path - format it
            folder_path = self.filename_formatter.format_folder_path(report_partitioning_value)
            # Prepend Files/ (path is relative to Files/ directory)
            onelake_path = f"Files/{folder_path}/{filename}"
            logger.debug(f"  Using custom folder: {folder_path}")
        else:
            # Use default: Files/reports/archive/YYYY/MM/DD/
            date_path = timestamp.strftime('%Y/%m/%d')
            onelake_path = f"Files/reports/archive/{date_path}/{filename}"
            logger.debug(f"  Using default folder structure: reports/archive/{date_path}")

        file_size_mb = len(file_content) / (1024 * 1024)
        if file_size_mb > self.file_size_warning_mb:
            logger.warning(f"⚠ File size is {file_size_mb:.1f}MB, which is very large")

        temp_file = None
        temp_path = None
        try:
            # Create a temporary file to write binary content
            with tempfile.NamedTemporaryFile(delete=False, suffix=f".{self.output_format.lower()}") as temp_file:
                temp_file.write(file_content)
                temp_path = temp_file.name

            # Copy from temp file to OneLake using file:// protocol
            temp_url = f"file://{temp_path}"

            # Delete existing file if it exists (since cp doesn't support overwrite parameter)
            try:
                self.mssparkutils.fs.rm(onelake_path)
            except Exception:
                pass  # File doesn't exist, that's fine

            # Use mssparkutils.fs.cp to copy the file to OneLake
            self.mssparkutils.fs.cp(temp_url, onelake_path)

            logger.debug(f"  Saved {file_size_mb:.2f}MB to OneLake")

        except Exception as e:
            error_msg = str(e)
            # Don't include binary content in error message
            if len(error_msg) > 200:
                error_msg = error_msg[:200]
            raise Exception(f"Failed to write to OneLake: {error_msg}")
        finally:
            # Clean up temporary file
            if temp_path and os.path.exists(temp_path):
                try:
                    os.unlink(temp_path)
                except Exception:
                    pass  # Ignore cleanup errors

        return onelake_path


# ============================================================================
# PAGINATED REPORT EXECUTOR CLASS (Main Orchestrator)
# ============================================================================

class PaginatedReportExecutor:
    """Main orchestrator for paginated report batch execution using composition"""

    def __init__(self, config: Dict, mssparkutils, spark=None):
        """Initialize executor and all components"""
        self.config = config
        self.mssparkutils = mssparkutils
        self.spark = spark

        # Log startup banner
        logger.info("\n" + "="*60)
        logger.info("INITIALIZING PAGINATED REPORT EXECUTOR")
        logger.info("="*60 + "\n")

        # Compose helper objects
        self.validator = InputValidator()

        # Validate filename template early
        is_valid, error_msg = self.validator.validate_filename_template(config.get('report_name', ''))
        if not is_valid:
            raise ValueError(f"Invalid filename template: {error_msg}")
        logger.info("✓ Filename template validated")

        # Validate lakehouse folder template if provided
        lakehouse_folder_val = config.get('lakehouse_folder', '')
        if lakehouse_folder_val:
            is_valid, error_msg = self.validator.validate_lakehouse_folder_template(lakehouse_folder_val)
            if not is_valid:
                raise ValueError(f"Invalid lakehouse_folder template: {error_msg}")
            logger.info(f"✓ Lakehouse folder template validated: {lakehouse_folder_val}")
        else:
            logger.info("✓ Using default lakehouse folder structure")

        # Create filename formatter (parse dates once at initialization)
        self.filename_formatter = FilenameFormatter(
            template=config['report_name'],
            static_params=config['static_params'],
            report_partitioning_column=config['report_partitioning_column'],
            lakehouse_folder_template=lakehouse_folder_val
        )
        logger.info("✓ Filename formatter created")

        self.token_manager = TokenManager(
            mssparkutils,
            refresh_interval_minutes=config['token_refresh_interval']
        )
        self.param_loader = ParameterLoader(
            mssparkutils,
            spark,
            max_retries=config['param_loader_max_retries'],
            retry_delay=config['param_loader_retry_delay']
        )
        self.api_client = PowerBIAPIClient(self.token_manager, config)

        # Storage is optional (only if archiving enabled)
        if config['archive_to_onelake']:
            self.storage = OneLakeStorage(mssparkutils, config, self.filename_formatter)
        else:
            self.storage = None

        # Validate all parameters
        self._validate_all_parameters()

        # Log configuration
        self._log_configuration()

        logger.info("✓ Executor initialized successfully\n")

    def _validate_all_parameters(self):
        """Validate all configuration parameters"""
        if not self.validator.is_valid_guid(self.config['workspace_id']):
            raise ValueError(f"Invalid workspace_id format. Must be a valid GUID. Received: '{self.config['workspace_id']}'")
        if not self.validator.is_valid_guid(self.config['report_id']):
            raise ValueError(f"Invalid report_id format. Must be a valid GUID. Received: '{self.config['report_id']}'")
        if not self.validator.is_valid_format(self.config['output_format']):
            raise ValueError(f"Invalid output_format: {self.config['output_format']}. Must be one of: PDF, XLSX, DOCX, PPTX, PNG")
        if not self.validator.is_valid_source_type(self.config['report_partitioning_source']):
            raise ValueError(f"Invalid report_partitioning_source: {self.config['report_partitioning_source']}. Must be one of: semantic_model, lakehouse, json, warehouse")

        # Validate source-specific parameters based on selected source
        logger.info(f"📋 Validating configuration for source type: {self.config['report_partitioning_source']}")

        source_type = self.config['report_partitioning_source']
        if source_type == "semantic_model":
            source_config = self.config['semantic_model']
            if not self.validator.is_valid_guid(source_config['workspace_id']):
                raise ValueError(f"Invalid semantic_model workspace_id format. Required for semantic_model source.")
            if not self.validator.is_valid_guid(source_config['dataset_id']):
                raise ValueError(f"Invalid semantic_model dataset_id format. Required for semantic_model source.")
            if not source_config['dax_query']:
                raise ValueError("semantic_model_dax_query is required for semantic_model source")
            logger.info("✓ Semantic Model configuration validated")

        elif source_type == "lakehouse":
            source_config = self.config['lakehouse']
            if not source_config['table']:
                raise ValueError("lakehouse_table is required for lakehouse source")
            if not source_config['category']:
                raise ValueError("lakehouse_category is required for lakehouse source")
            if not source_config['column']:
                raise ValueError("lakehouse_column is required for lakehouse source")
            logger.info("✓ Lakehouse configuration validated")

        elif source_type == "json":
            source_config = self.config['json']
            if not source_config['json_values'] or source_config['json_values'].strip() == "[]":
                raise ValueError("report_partitioning_values is required for json source and cannot be empty")
            logger.info("✓ JSON configuration validated")

        elif source_type == "warehouse":
            source_config = self.config['warehouse']
            if not source_config['warehouse_name']:
                raise ValueError("warehouse_name is required for warehouse source")
            if not source_config['table']:
                raise ValueError("warehouse_table is required for warehouse source")
            if not source_config['column']:
                raise ValueError("warehouse_column is required for warehouse source")
            logger.info("✓ Warehouse configuration validated")

        # Validate static parameters
        if not isinstance(self.config['static_params'], dict):
            raise ValueError("static_params must be a dictionary")
        logger.info(f"✓ Static parameters validated: {len(self.config['static_params'])} parameter(s)")

        logger.info("✓ All configuration validated successfully")

    def _log_configuration(self):
        """Log startup configuration"""
        logger.info("Configuration:")
        logger.info(f"  Report ID: {self.config['report_id'][:8] if self.config['report_id'] else 'Not set'}...")
        logger.info(f"  Partitioning parameter: {self.config['report_partitioning_column']}")
        logger.info(f"  Source: {self.config['report_partitioning_source']}")
        logger.info(f"  Output format: {self.config['output_format']}")
        logger.info(f"  Filename template: {self.config['report_name']}")
        logger.info(f"  Lakehouse folder: {self.config.get('lakehouse_folder', '(default)')}")
        logger.info(f"  Archive to OneLake: {self.config['archive_to_onelake']}")
        logger.info(f"  Export timeout: {self.config['export_timeout']}s")
        logger.info(f"  Max retries: {self.config['max_retries']}")
        logger.info(f"  Token refresh interval: {self.config['token_refresh_interval']} minutes")

    def _load_partitioning_values(self) -> List[str]:
        """Load partitioning parameter values from configured source"""
        logger.info("\n" + "="*60)
        logger.info("LOADING PARTITIONING PARAMETER VALUES")
        logger.info("="*60 + "\n")

        source_type = self.config['report_partitioning_source']
        source_config = self.config[source_type]

        try:
            report_partitioning_values = self.param_loader.load(source_type, **source_config)
        except Exception as e:
            logger.error(f"❌ Failed to load parameters: {str(e)[:500]}")
            logger.error(f"   Source: {source_type}")
            logger.error(f"   Check your configuration and ensure:")
            logger.error(f"   - Data source exists and is accessible")
            logger.error(f"   - Permissions are granted")
            logger.error(f"   - Parameters are correctly formatted")
            raise

        # Validation (runs before the summary log so an empty or None result
        # raises a clear error instead of failing inside len())
        if not report_partitioning_values:
            raise ValueError(
                f"❌ No parameter values loaded from {source_type}! "
                f"Check your configuration:\n"
                f"  - Ensure the data source has data\n"
                f"  - Verify category/filter settings\n"
                f"  - Check permissions"
            )

        logger.info(f"\n{'='*60}")
        logger.info(f"Parameter '{self.config['report_partitioning_column']}' loaded: {len(report_partitioning_values)} unique values")
        if len(report_partitioning_values) <= 10:
            logger.info(f"Values: {report_partitioning_values}")
        else:
            logger.info(f"First 10 values: {report_partitioning_values[:10]}")
            logger.info(f"... and {len(report_partitioning_values) - 10} more")
        logger.info(f"{'='*60}\n")

        # Warnings for large batches
        if len(report_partitioning_values) > 100:
            estimated_minutes = len(report_partitioning_values) * 2
            logger.warning(f"⚠ Processing {len(report_partitioning_values)} values may take a long time")
            logger.warning(f"  Estimated time: {estimated_minutes} minutes (assuming 2 min per report)")
            logger.warning(f"  Token will auto-refresh every {self.config['token_refresh_interval']} minutes")

        if len(report_partitioning_values) > 500:
            logger.warning(f"⚠ Very large batch detected! This may take hours to complete.")
            logger.warning(f"  Recommendation: Use multiple pipelines to process in parallel")

        logger.info("✓ Partitioning parameter values loaded, deduplicated, and validated")

        return report_partitioning_values

    def _execute_single_report(self, params: Dict, report_partitioning_value: str) -> Dict:
        """Execute report for a single parameter value with retry logic"""
        start_time = datetime.now(timezone.utc)

        for attempt in range(self.config['max_retries']):
            try:
                # Ensure token is valid (refreshes if needed)
                self.token_manager.ensure_valid_token()

                # Step 1: Initiate export
                logger.info(f"  Step 1/4: Initiating report export...")
                export_id = self.api_client.initiate_export(params)
                logger.info(f"    ✓ Export initiated. Export ID: {export_id[:8]}...")

                # Step 2: Poll for completion
                logger.info(f"  Step 2/4: Waiting for export to complete...")
                self.api_client.poll_status(export_id)
                logger.info(f"    ✓ Export completed successfully")

                # Step 3: Download file
                logger.info(f"  Step 3/4: Downloading report file...")
                file_content = self.api_client.download_file(export_id)
                file_size_mb = len(file_content) / (1024 * 1024)
                logger.info(f"    ✓ Downloaded {file_size_mb:.2f} MB")

                # Step 4: Save to OneLake (if enabled)
                onelake_path = None
                if self.storage:
                    logger.info(f"  Step 4/4: Saving to OneLake archive...")
                    onelake_path = self.storage.save(file_content, report_partitioning_value)
                    logger.info(f"    ✓ Saved to: {onelake_path}")

                # Success!
                end_time = datetime.now(timezone.utc)
                duration = (end_time - start_time).total_seconds()

                return {
                    'partitioning_value': report_partitioning_value,
                    'status': 'SUCCESS',
                    'onelake_path': onelake_path,
                    'file_size_mb': round(file_size_mb, 2),
                    'duration_seconds': round(duration, 2),
                    'timestamp': end_time.isoformat(),
                    'attempts': attempt + 1,
                    'error': None
                }

            except Exception as e:
                error_msg = str(e)[:500]

                if attempt < self.config['max_retries'] - 1:
                    wait_time = self.config['retry_backoff_base'] * (2 ** attempt)
                    logger.warning(f"  ✗ Attempt {attempt + 1}/{self.config['max_retries']} failed: {error_msg}")
                    logger.info(f"  → Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    # All retries exhausted
                    end_time = datetime.now(timezone.utc)
                    duration = (end_time - start_time).total_seconds()
                    logger.error(f"  ✗ All {self.config['max_retries']} attempts failed for '{report_partitioning_value}'")
                    logger.error(f"     Final error: {error_msg}")

                    return {
                        'partitioning_value': report_partitioning_value,
                        'status': 'FAILED',
                        'onelake_path': None,
                        'file_size_mb': 0,
                        'duration_seconds': round(duration, 2),
                        'timestamp': end_time.isoformat(),
                        'attempts': self.config['max_retries'],
                        'error': error_msg
                    }

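        # Only reached if max_retries <= 0 (the loop body never runs).
        # Return the same keys as the other result dicts so callers can read them safely.
        return {
            'partitioning_value': report_partitioning_value,
            'status': 'FAILED',
            'onelake_path': None,
            'file_size_mb': 0,
            'duration_seconds': 0,
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'attempts': self.config['max_retries'],
            'error': 'Unknown error - retry loop completed unexpectedly'
        }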

    def _generate_summary(self, results: List[Dict], total_duration: float) -> Dict:
        """Generate execution summary and pipeline result JSON"""
        success_count = len([r for r in results if r['status'] == 'SUCCESS'])
        fail_count = len(results) - success_count
        total_size_mb = sum(r.get('file_size_mb', 0) for r in results)
        avg_duration = sum(r.get('duration_seconds', 0) for r in results) / len(results) if results else 0
        total_attempts = sum(r.get('attempts', 1) for r in results)

        logger.info(f"\n{'='*60}")
        logger.info("EXECUTION SUMMARY")
        logger.info(f"{'='*60}")
        logger.info(f"Total reports processed: {len(results)}")
        logger.info(f"✓ Successful: {success_count}")
        logger.info(f"✗ Failed: {fail_count}")
        logger.info(f"Total size: {total_size_mb:.2f} MB")
        logger.info(f"Average duration per report: {avg_duration:.1f} seconds")
        if results:  # guard against division by zero on an empty batch
            logger.info(f"Success rate: {(success_count/len(results)*100):.1f}%")
            logger.info(f"Total attempts: {total_attempts} (avg {total_attempts/len(results):.1f} per report)")

        # Print successful files
        if success_count > 0:
            logger.info(f"\n{'='*60}")
            logger.info(f"GENERATED FILES (Saved to OneLake): {success_count} files")
            logger.info(f"{'='*60}")
            # Show first 20, then summarize
            for idx, r in enumerate([r for r in results if r['status'] == 'SUCCESS'][:20], 1):
                logger.info(f"  {idx}. {r['onelake_path']} ({r['file_size_mb']} MB)")
            if success_count > 20:
                logger.info(f"  ... and {success_count - 20} more files")

        # Print failures
        if fail_count > 0:
            logger.info(f"\n{'='*60}")
            logger.error(f"FAILURES: {fail_count} reports failed")
            logger.info(f"{'='*60}")
            for idx, r in enumerate([r for r in results if r['status'] == 'FAILED'], 1):
                # Truncate error message for readability
                error_msg = r.get('error') or 'Unknown error'  # also covers error=None
                if len(error_msg) > 200:
                    error_msg = error_msg[:200] + "..."
                logger.error(f"  {idx}. '{r['partitioning_value']}': {error_msg}")

        completion_time = datetime.now(timezone.utc)
        logger.info(f"\n{'='*60}")
        logger.info(f"Execution completed at: {completion_time.strftime('%Y-%m-%d %H:%M:%S UTC')}")
        logger.info(f"{'='*60}\n")

        # Prepare file list for pipeline consumption
        successful_files = [
            r['onelake_path']
            for r in results
            if r['status'] == 'SUCCESS' and r.get('onelake_path')
        ]

        # Prepare result for pipeline (must match pipeline's expected JSON structure)
        pipeline_result = {
            'files': successful_files,
            'status': 'success' if fail_count == 0 else 'partial_success' if success_count > 0 else 'failed',
            'total': len(results),
            'success_count': success_count,
            'fail_count': fail_count,
            'total_size_mb': round(total_size_mb, 2),
            'avg_duration_seconds': round(avg_duration, 2),
            'total_duration_seconds': round(total_duration, 2),
            'errors': [
                {'value': r['partitioning_value'], 'error': r['error'][:200] if r.get('error') else 'Unknown'}
                for r in results if r['status'] == 'FAILED'
            ][:50],  # Limit to first 50 errors to avoid huge JSON
            'timestamp': completion_time.isoformat(),
            'parameter_name': self.config['report_partitioning_column'],
            'source_type': self.config['report_partitioning_source']
        }

        logger.info(f"📤 Returning result to pipeline:")
        logger.info(f"   Status: {pipeline_result['status']}")
        logger.info(f"   Files: {len(successful_files)} paths")
        logger.info(f"   Success: {success_count}/{len(results)}")

        return pipeline_result

    def execute_batch(self) -> Dict:
        """Main execution method - PUBLIC API"""
        logger.info("\n" + "="*60)
        logger.info("STARTING REPORT BATCH EXECUTION")
        logger.info("="*60 + "\n")
        sys.stdout.flush()

        execution_start_time = datetime.now(timezone.utc)

        print(f"\n⏱ Execution started at: {execution_start_time}")
        sys.stdout.flush()

        # Load partitioning parameter values
        report_partitioning_values = self._load_partitioning_values()

        print(f"📊 Total items to process: {len(report_partitioning_values)}")
        print(f"🔄 Starting loop...")
        sys.stdout.flush()

        # Execute batch loop
        results = []
        for idx, partitioning_value in enumerate(report_partitioning_values, 1):
            # IMMEDIATE FEEDBACK - Print to stdout AND logger
            print(f"\n{'='*60}")
            print(f"▶ PROCESSING {idx}/{len(report_partitioning_values)}: {self.config['report_partitioning_column']} = '{partitioning_value}'")
            print(f"{'='*60}")
            sys.stdout.flush()

            logger.info(f"\n{'='*60}")
            logger.info(f"Processing {idx}/{len(report_partitioning_values)}: {self.config['report_partitioning_column']} = '{partitioning_value}'")
            logger.info(f"{'='*60}\n")
            sys.stdout.flush()

            # Merge static parameters with the current partitioning value
            print(f"  ⚙ Merging parameters...")
            sys.stdout.flush()

            all_params = self.config['static_params'].copy()
            all_params[self.config['report_partitioning_column']] = partitioning_value

            print(f"  ✓ Parameters merged")
            logger.info(f"  Parameters:")
            for key, value in all_params.items():
                value_str = str(value)
                if len(value_str) > 100:
                    value_str = value_str[:100] + "..."
                logger.info(f"    {key}: {value_str}")
            logger.info("")
            sys.stdout.flush()

            # Execute report with retry logic
            print(f"  🚀 Calling _execute_single_report...")
            sys.stdout.flush()

            result = self._execute_single_report(all_params, partitioning_value)

            print(f"  ✓ _execute_single_report returned")
            sys.stdout.flush()

            results.append(result)

            # Print result summary with immediate flush
            if result['status'] == 'SUCCESS':
                print(f"  ✅ SUCCESS ({result['duration_seconds']}s, {result['attempts']} attempt(s))")
                sys.stdout.flush()
                logger.info(f"\n  ✅ SUCCESS ({result['duration_seconds']}s, {result['attempts']} attempt(s))")
                logger.info(f"     OneLake: {result['onelake_path']}")
                logger.info(f"     Size: {result['file_size_mb']} MB")
            else:
                print(f"  ❌ FAILED ({result['duration_seconds']}s, {result['attempts']} attempt(s))")
                print(f"     Error: {result['error']}")
                sys.stdout.flush()
                logger.error(f"\n  ❌ FAILED ({result['duration_seconds']}s, {result['attempts']} attempt(s))")
                logger.error(f"     Error: {result['error']}")

            sys.stdout.flush()

            # Progress update for large batches
            if len(report_partitioning_values) > 20 and idx % 10 == 0:
                success_so_far = len([r for r in results if r['status'] == 'SUCCESS'])
                pct_complete = (idx / len(report_partitioning_values)) * 100
                print(f"\n  📊 Progress: {idx}/{len(report_partitioning_values)} ({pct_complete:.1f}%) - {success_so_far} successful")
                sys.stdout.flush()
                logger.info(f"\n  📊 Progress: {idx}/{len(report_partitioning_values)} ({pct_complete:.1f}%) - {success_so_far} successful")

        print("\n" + "=" * 60)
        print("LOOP COMPLETED")
        print("=" * 60)
        sys.stdout.flush()

        execution_end_time = datetime.now(timezone.utc)
        total_duration = (execution_end_time - execution_start_time).total_seconds()

        logger.info(f"\n{'='*60}")
        logger.info("EXECUTION COMPLETE")
        logger.info(f"{'='*60}")
        logger.info(f"Total time: {total_duration:.1f} seconds ({total_duration/60:.1f} minutes)")
        logger.info(f"Average per report: {total_duration/len(results):.1f} seconds")
        sys.stdout.flush()

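        failed_count = sum(1 for r in results if r['status'] != 'SUCCESS')  # loop finished; individual reports may still have failed
        print(f"\n✅ BATCH EXECUTION COMPLETED ({len(results) - failed_count} succeeded, {failed_count} failed)")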
        print(f"   Processed: {len(results)} reports")
        print(f"   Duration: {total_duration:.1f} seconds")
        sys.stdout.flush()

        # Generate and return summary
        return self._generate_summary(results, total_duration)


# ============================================================================
# MAIN EXECUTION
# ============================================================================

if __name__ == "__main__":
    logger.info(f"\n{'='*60}")
    logger.info("PAGINATED REPORT BATCH EXECUTOR v1.0")
    logger.info(f"Started at: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')}")
    logger.info(f"{'='*60}\n")

    # Create executor with all configuration
    executor = PaginatedReportExecutor(
        config=config,
        mssparkutils=mssparkutils,
        spark=spark
    )

    # Execute batch
    result = executor.execute_batch()

    # Exit for pipeline integration
    logger.info("\n📤 Exiting notebook with result for pipeline...")
    logger.info("✓ Notebook completed successfully")
    # notebook.exit() ends the notebook run at this point, so it must be the last statement
    mssparkutils.notebook.exit(json.dumps(result))
