# Monitor Hub Analysis

This notebook provides an interactive environment to run the Monitor Hub Analysis pipeline and explore the results.

## Recent Updates (v0.1.12)
- **Smart Scope Detection**: The pipeline now attempts **Tenant-Wide** extraction first. If Admin permissions are missing, it automatically falls back to **Member-Only** scope.
- **Parquet Support**: The pipeline exports enriched data to Parquet format, enabling faster loading and direct integration with Delta Tables.
- **Datetime Parsing Fix**: The pipeline has been updated to robustly handle mixed timezone formats in activity logs.
- **Pipeline Integration**: The notebook uses the updated `MonitorHubPipeline` class for end-to-end execution.

## Usage
1. Ensure your environment is activated: `conda activate fabric-monitoring`
2. Run the cells below to execute the analysis.
3. The pipeline will:
    - Extract historical data (Tenant-Wide with Fallback).
    - Enrich data with job details.
    - Generate CSV reports in the `exports/monitor_hub_analysis` directory (or configured output).

In [1]:
# ‚úÖ VERIFY INSTALLATION
# Since we have uploaded the .whl to your Fabric Environment, it should be installed automatically.
# Run this cell to confirm the correct version (v0.1.12) is loaded.

import importlib.metadata

try:
    version = importlib.metadata.version("usf_fabric_monitoring")
    print(f"‚úÖ Library found: usf_fabric_monitoring v{version}")
    
    if version >= "0.1.12":
        print("   You are using the correct version.")
    else:
        print(f"‚ö†Ô∏è  WARNING: Expected v0.1.12+ but found v{version}.")
        print("   Please check your Fabric Environment settings and ensure the new wheel is published.")
        
except importlib.metadata.PackageNotFoundError:
    print("‚ùå Library NOT found.")
    print("   Please ensure you have attached the 'Fabric Environment' containing the .whl file to this notebook.")
    print("   Alternatively, upload the .whl file to the Lakehouse 'Files' section and pip install it from there.")

‚úÖ Library found: usf_fabric_monitoring v0.1.6
   You are using the correct version.


# Monitor Hub Analysis Pipeline

## Overview
This notebook executes the **Monitor Hub Analysis Pipeline**, which is designed to provide deep insights into Microsoft Fabric activity. It extracts historical data, calculates key performance metrics, and generates comprehensive reports to help identify:
- Constant failures and reliability issues.
- Excess activity by users, locations, or domains.
- Historical performance trends over the last 90 days.

## Key Features & Recent Updates (v0.1.12)
The pipeline has been enhanced to support enterprise-grade monitoring workflows:

1.  **Smart Scope Detection (v0.1.12)**:
    -   **Primary Strategy**: Attempts to use Power BI Admin APIs for full **Tenant-Wide** visibility.
    -   **Automatic Fallback**: If Admin permissions are missing (401/403), it gracefully reverts to **Member-Only** mode.
    -   **Benefit**: Ensures maximum visibility allowed by your credentials without crashing.

2.  **Parquet Integration (New in v0.1.8)**:
    -   Automatically persists merged activity data (Activities, Workspaces, Items) to Parquet format.
    -   Enables direct integration with Delta Tables and downstream analytics (e.g., Power BI Direct Lake).
    -   Serves as the "Source of Truth" for the analysis steps in this notebook.

3.  **Automatic Persistence & Path Resolution**:
    -   **Automatic Lakehouse Resolution**: Relative paths (e.g., `exports/`) are automatically mapped to `/lakehouse/default/Files/` in Fabric.
    -   **Sequential Orchestration**: Handles the entire data lifecycle (Activity Extraction -> Job Detail Extraction -> Merging -> Analysis).
    -   **Enhanced Reliability**: Ensures JSON exports and CSV reports are saved to persistent storage, not ephemeral nodes.

## How to Use
1. **Install Package**: The first cell installs the `usf_fabric_monitoring` package into the current session.
2. **Configure Credentials**: Ensure your Service Principal credentials (`AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`, `AZURE_TENANT_ID`) are available.
3. **Set Parameters**:
    - `DAYS_TO_ANALYZE`: Number of days of history to fetch (default: 90).
    - `OUTPUT_DIR`: Path where reports will be saved (can now be relative!).
4. **Run Analysis**: Execute the pipeline cell. It will:
    - Fetch data from Fabric APIs.
    - Process and enrich the data.
    - Save CSV reports and Parquet files to the specified `OUTPUT_DIR`.

In [2]:
from usf_fabric_monitoring.core.pipeline import MonitorHubPipeline
import os

In [3]:
import inspect
import os
import usf_fabric_monitoring
from usf_fabric_monitoring.core.pipeline import MonitorHubPipeline

print(f"üì¶ Package Location: {os.path.dirname(usf_fabric_monitoring.__file__)}")

# Verify we are running the NEW code (v0.1.12)
try:
    # Check for the new _save_to_parquet method in pipeline which indicates v0.1.8+
    src = inspect.getsource(MonitorHubPipeline)
    if "_save_to_parquet" in src:
        print("‚úÖ SUCCESS: You are running the updated code (v0.1.12).")
        print("   Feature Verified: Parquet Integration & Smart Scope Detection")
    else:
        print("‚ùå WARNING: You are still running the OLD code.")
        print("   üëâ ACTION: Restart the kernel and run the install cell above again.")
except AttributeError:
    print("‚ùå WARNING: Could not inspect source code. You might be running an optimized .pyc version.")
except Exception as e:
    print(f"‚ö†Ô∏è Could not verify source code: {e}")

üì¶ Package Location: /home/sanmi/miniconda3/envs/fabric-monitoring/lib/python3.11/site-packages/usf_fabric_monitoring
   üëâ ACTION: Restart the kernel and run the install cell above again.


In [4]:
import os
from dotenv import load_dotenv

# --- CREDENTIAL MANAGEMENT ---

# Option 1: Load from .env file (Lakehouse or Local)
# We check the Lakehouse path first, then fallback to local .env
LAKEHOUSE_ENV_PATH = "/lakehouse/default/Files/dot_env_files/.env"
LOCAL_ENV_PATH = ".env"

# Force override=True to ensure we pick up changes to the file even if env vars are already set
if os.path.exists(LAKEHOUSE_ENV_PATH):
    print(f"Loading configuration from Lakehouse: {LAKEHOUSE_ENV_PATH}")
    load_dotenv(LAKEHOUSE_ENV_PATH, override=True)
elif os.path.exists(LOCAL_ENV_PATH):
    print(f"Loading configuration from Local: {os.path.abspath(LOCAL_ENV_PATH)}")
    load_dotenv(LOCAL_ENV_PATH, override=True)
else:
    print(f"Warning: No .env file found at {LAKEHOUSE_ENV_PATH} or {LOCAL_ENV_PATH}")

# Option 2: Load from Azure Key Vault (Best Practice)
# Uncomment and configure this section to use Azure Key Vault
# try:
#     from notebookutils import mssparkutils
#     KEY_VAULT_NAME = "YourKeyVaultName"
#     os.environ["AZURE_CLIENT_ID"] = mssparkutils.credentials.getSecret(KEY_VAULT_NAME, "Fabric-Client-ID")
#     os.environ["AZURE_CLIENT_SECRET"] = mssparkutils.credentials.getSecret(KEY_VAULT_NAME, "Fabric-Client-Secret")
#     os.environ["AZURE_TENANT_ID"] = mssparkutils.credentials.getSecret(KEY_VAULT_NAME, "Fabric-Tenant-ID")
# except ImportError:
#     pass # Not running in Fabric or notebookutils not available
# except Exception as e:
#     print(f"Key Vault access failed: {e}")

# Verify credentials are present
required_vars = ["AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID"]
missing = [v for v in required_vars if not os.getenv(v)]

print("\nüîê IDENTITY CHECK:")
if missing:
    print(f"‚ùå Missing required environment variables: {', '.join(missing)}")
    print("   ‚ö†Ô∏è  System will fallback to DefaultAzureCredential (User Identity or Managed Identity)")
else:
    client_id = os.getenv("AZURE_CLIENT_ID")
    masked_id = f"{client_id[:4]}...{client_id[-4:]}" if client_id and len(client_id) > 8 else "********"
    print(f"‚úÖ Service Principal Configured")
    print(f"   Client ID: {masked_id}")
    print(f"   Tenant ID: {os.getenv('AZURE_TENANT_ID')}")


üîê IDENTITY CHECK:
‚úÖ Service Principal Configured
   Client ID: 4a49...64f9
   Tenant ID: dd29478d-624e-429e-b453-fffc969ac768


In [5]:
# Configuration
DAYS_TO_ANALYZE = 28

# OUTPUT_DIR: Where to save the reports.
# v0.1.6+ Update: You can now provide a relative path (e.g., "monitor_hub_analysis") 
# and it will automatically resolve to "/lakehouse/default/Files/monitor_hub_analysis" 
# when running in Fabric.
OUTPUT_DIR = "monitor_hub_analysis" 

# If you prefer an explicit absolute path, you can still use it:
# OUTPUT_DIR = "/lakehouse/default/Files/monitor_hub_analysis"

In [6]:
pipeline = MonitorHubPipeline(OUTPUT_DIR)
results = pipeline.run_complete_analysis(days=DAYS_TO_ANALYZE)
pipeline.print_results_summary(results)

2025-12-03 23:51:36 | INFO | usf_fabric_monitoring | Monitor Hub Pipeline initialized
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring | Starting Monitor Hub analysis for 28 days (API max 28)
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring | Step 1: Extracting historical activities from Fabric APIs
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring.scripts.extract_historical_data | üîê Authenticating with Microsoft Fabric...
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring.core.auth | Using Service Principal credentials
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring.scripts.extract_historical_data | üì° Initializing Fabric data extractor...
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring.scripts.extract_historical_data | üß™ Testing API connectivity...
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring.core.auth | Acquiring Fabric API access token via Azure Identity
2025-12-03 23:51:36 | INFO | usf_fabric_monitoring | Starting Monitor Hub analysis for 28 days (API max 

KeyboardInterrupt: 

## 5. Advanced Analysis & Visualization (Spark)
The following cells use PySpark to load the raw data generated by the pipeline and provide interactive visualizations of failures, error codes, and trends.

In [None]:
# 1. Setup Spark & Paths
import os
import glob
from usf_fabric_monitoring.core.utils import resolve_path

# Initialize Spark Session (if not already active)
spark = None
try:
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_timestamp, when, count, desc, lit, unix_timestamp, coalesce, abs as abs_val, split, initcap, regexp_replace, element_at, substring, avg, max, min
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    if 'spark' not in locals() or spark is None:
        print("‚öôÔ∏è Initializing Spark Session...")
        spark = SparkSession.builder \
            .appName("FabricFailureAnalysis") \
            .getOrCreate()
        print(f"‚úÖ Spark Session Created: {spark.version}")
except ImportError:
    print("‚ö†Ô∏è PySpark not installed or configured. Skipping Spark-based analysis.")
except Exception as e:
    print(f"‚ö†Ô∏è Failed to initialize Spark: {e}. Skipping Spark-based analysis.")

# Resolve the output directory to an absolute path
# This ensures that if you used a relative path like "monitor_hub_analysis",
# it is correctly resolved to "/lakehouse/default/Files/monitor_hub_analysis" for Spark.
resolved_output_dir = str(resolve_path(OUTPUT_DIR))

BASE_PATH = os.path.join(resolved_output_dir, "fabric_item_details")
AUDIT_LOG_PATH = os.path.join(resolved_output_dir, "raw_data/daily")

print(f"üìÇ Analysis Paths:")
print(f"  - Item Details: {BASE_PATH}")
print(f"  - Audit Logs:   {AUDIT_LOG_PATH}")

In [None]:
# 2. Load Data from Parquet (Source of Truth)

import os
from pyspark.sql.functions import col, to_timestamp, unix_timestamp, coalesce, initcap, regexp_replace, element_at, split, when, lit

PARQUET_PATH = os.path.join(resolved_output_dir, "parquet")

def load_parquet_data():
    """Loads the enriched activity data from Parquet files."""
    try:
        path_pattern = os.path.join(PARQUET_PATH, "activities_*.parquet")
        print(f"üìÇ Loading Parquet files from {path_pattern}...")
        
        # Read Parquet
        df = spark.read.parquet(path_pattern)
        
        # Filter for Failures (checking both case conventions)
        # Detailed jobs use 'status', raw logs use 'Status'
        # We check if columns exist before filtering to avoid AnalysisException
        cols = df.columns
        conditions = []
        if "status" in cols:
            conditions.append(col("status") == "Failed")
        if "Status" in cols:
            conditions.append(col("Status") == "Failed")
            
        if conditions:
            from functools import reduce
            # Combine conditions with OR
            failed_df = df.filter(reduce(lambda x, y: x | y, conditions))
            return failed_df
        else:
            print("‚ö†Ô∏è 'status' column not found in Parquet data.")
            return df # Return all if status not found, or empty?
            
    except Exception as e:
        print(f"‚ö†Ô∏è Could not load Parquet data: {str(e)}")
        return None

# Execute Loading
final_df = load_parquet_data()

if final_df:
    print(f"‚úÖ Successfully loaded {final_df.count()} failure records from Parquet.")
    
    # Handle mixed schema (snake_case from detailed jobs vs PascalCase from raw logs)
    # Detailed jobs (snake_case) are the primary source for failure details.
    
    # Helper to safely get column or null
    def safe_col(c):
        return col(c) if c in final_df.columns else lit(None)

    final_df = final_df.select(
        coalesce(safe_col("workspace_name"), safe_col("WorkSpaceName")).alias("Workspace"),
        coalesce(safe_col("item_name"), safe_col("ItemName")).alias("Item Name"),
        coalesce(safe_col("item_type"), safe_col("ItemType")).alias("Item Type"),
        coalesce(safe_col("activity_type"), safe_col("Operation")).alias("Invoke Type"),
        coalesce(safe_col("start_time"), safe_col("CreationTime")).alias("Start Time"),
        coalesce(safe_col("end_time"), safe_col("EndTime")).alias("End Time"),
        coalesce(safe_col("duration"), safe_col("Duration")).alias("Duration (s)"),
        coalesce(safe_col("submitted_by"), safe_col("UserId")).alias("User ID"),
        
        # User Name Extraction
        coalesce(
            initcap(regexp_replace(element_at(split(coalesce(safe_col("submitted_by"), safe_col("UserId")), "@"), 1), "\\.", " ")),
            safe_col("submitted_by"), 
            safe_col("UserId")
        ).alias("User Name"),
        
        # Error Details (Try to get from failure_reason struct)
        safe_col("failure_reason.errorCode").alias("Error Code"),
        safe_col("failure_reason.message").alias("Error Message")
    )
else:
    print("‚ùå No failure data found.")


In [None]:
# 3. Analysis & Display

if final_df:
    # --- 1. Summary Statistics ---
    total_failures = final_df.count()
    unique_workspaces = final_df.select("Workspace").distinct().count()
    unique_items = final_df.select("Item Name").distinct().count()
    
    print(f"\nüìä SUMMARY STATISTICS")
    print(f"Total Failures: {total_failures}")
    print(f"Affected Workspaces: {unique_workspaces}")
    print(f"Affected Items: {unique_items}")

    # --- 2. Top 10 Failing Items ---
    print("\nüèÜ TOP 10 FAILING ITEMS")
    top_items = final_df.groupBy("Workspace", "Item Name", "Item Type") \
        .count() \
        .orderBy(col("count").desc()) \
        .limit(10)
    top_items.show(truncate=False)

    # --- 3. Failures by User ---
    print("\nüë§ FAILURES BY USER")
    user_stats = final_df.groupBy("User Name") \
        .count() \
        .orderBy(col("count").desc())
    user_stats.show(truncate=False)

    # --- 4. Error Code Distribution ---
    print("\n‚ö†Ô∏è ERROR CODE DISTRIBUTION")
    error_stats = final_df.groupBy("Error Code") \
        .count() \
        .orderBy(col("count").desc())
    error_stats.show(truncate=False)

    # --- 5. Recent Failures (Last 20) ---
    print("\nüïí MOST RECENT FAILURES")
    final_df.select("Start Time", "Workspace", "Item Name", "User Name", "Error Message") \
        .orderBy(col("Start Time").desc()) \
        .show(20, truncate=50)
else:
    print("No data available for analysis.")