# Monitor Hub Analysis (Fix)

This notebook performs the analysis using the raw downloaded data directly, bypassing the potentially incomplete CSV reports.

## Fixes Implemented:
1.  **Workspace & Error Messages**: Merges detailed job history to populate missing fields.
2.  **User ID Recovery (Smart Merge)**: Correlates detailed jobs with base activity logs (by Item ID & Time) to preserve the original `User ID` instead of defaulting to "System".
3.  **Non-Destructive**: Runs entirely within this notebook, leaving the core library untouched to prevent breaking changes.

In [20]:
import os
import pandas as pd
from usf_fabric_monitoring.core.pipeline import MonitorHubPipeline
from usf_fabric_monitoring.core.data_loader import load_activities_from_directory

# Configuration
OUTPUT_DIR = "monitor_hub_analysis" 

# Initialize Pipeline (to access helper methods)
pipeline = MonitorHubPipeline(OUTPUT_DIR)

print(f"üìÇ Output Directory: {pipeline.output_directory}")

2025-12-04 16:03:20 | INFO | usf_fabric_monitoring | Monitor Hub Pipeline initialized
üìÇ Output Directory: monitor_hub_analysis
üìÇ Output Directory: monitor_hub_analysis


In [21]:
# 1. Load Raw Data (Skip API Extraction)

# A. Load Base Activities from 'raw_data/daily'
extraction_dir = pipeline._prepare_extraction_directory()
print(f"Loading raw activities from: {extraction_dir}")
activities = load_activities_from_directory(str(extraction_dir))
print(f"‚úÖ Loaded {len(activities)} base activities.")

# B. Load Detailed Jobs from 'fabric_item_details'
print("Loading detailed job history...")
detailed_jobs = pipeline._load_detailed_jobs()
print(f"‚úÖ Loaded {len(detailed_jobs)} detailed job records.")

# C. Optimized Smart Merge (Pandas)
import pandas as pd
import numpy as np

print("üîÑ Starting Optimized Smart Merge (Pandas)...")

# 1. Convert to DataFrames
df_activities = pd.DataFrame(activities)
df_jobs = pd.DataFrame(detailed_jobs)

# 2. Pre-process for Merge
# Ensure timestamps are datetime and UTC
def to_utc(df, col):
    if col in df.columns:
        df[col] = pd.to_datetime(df[col], utc=True, errors='coerce')
    return df

df_activities = to_utc(df_activities, "start_time")
df_jobs = to_utc(df_jobs, "startTimeUtc")

# Filter out jobs without start time or item id
df_jobs = df_jobs.dropna(subset=["startTimeUtc", "itemId"])

# Rename job columns for merge preparation
# We map 'itemId' to 'item_id' for the join key
df_jobs = df_jobs.rename(columns={
    "startTimeUtc": "job_start_time",
    "itemId": "item_id", 
    "status": "job_status",
    "failureReason": "job_failure_reason"
})

# Sort for merge_asof (required)
df_activities = df_activities.sort_values("start_time")
df_jobs = df_jobs.sort_values("job_start_time")

# 3. Merge Asof
# Find the nearest job for each activity to enrich it
# Tolerance: 5 minutes (API logs vs Job History can drift)
merged_df = pd.merge_asof(
    df_activities,
    df_jobs,
    left_on="start_time",
    right_on="job_start_time",
    by="item_id",
    tolerance=pd.Timedelta("5min"),
    direction="nearest"
)

print(f"   - Merged {len(merged_df)} records.")

# 4. Enrich Data
# Extract error message from the job's failure details
def extract_error_msg(val):
    if pd.isna(val): return None
    if isinstance(val, dict): return val.get("message")
    return str(val)

def extract_error_code(val):
    if pd.isna(val): return None
    if isinstance(val, dict): return val.get("errorCode")
    return "Unknown"

# Ensure target columns exist before filling
for col_name in ["failure_reason", "error_message", "error_code"]:
    if col_name not in merged_df.columns:
        merged_df[col_name] = None

# Apply extraction if job data was found
if "job_failure_reason" in merged_df.columns:
    merged_df["job_error_message"] = merged_df["job_failure_reason"].apply(extract_error_msg)
    merged_df["job_error_code"] = merged_df["job_failure_reason"].apply(extract_error_code)
    
    # Coalesce with existing columns
    # If activity has no error info, take it from the job
    merged_df["failure_reason"] = merged_df["failure_reason"].fillna(merged_df["job_failure_reason"].astype(str))
    merged_df["error_message"] = merged_df["error_message"].fillna(merged_df["job_error_message"])
    merged_df["error_code"] = merged_df["error_code"].fillna(merged_df["job_error_code"])
    
    # Enrich other metadata
    if "_workspace_name" in merged_df.columns:
        merged_df["workspace_name"] = merged_df["workspace_name"].fillna(merged_df["_workspace_name"])
    if "_item_name" in merged_df.columns:
        merged_df["item_name"] = merged_df["item_name"].fillna(merged_df["_item_name"])
    if "_item_type" in merged_df.columns:
        merged_df["item_type"] = merged_df["item_type"].fillna(merged_df["_item_type"])
        
    # Update status: If job failed, the activity failed (even if API said InProgress)
    merged_df.loc[merged_df["job_status"] == "Failed", "status"] = "Failed"

# 5. Convert back to list of dicts for compatibility
merged_activities = merged_df.to_dict(orient="records")

print(f"‚úÖ Smart Merge Complete.")
print(f"   - Total Activities: {len(merged_activities)}")

Loading raw activities from: monitor_hub_analysis/raw_data
‚úÖ Loaded 1194917 base activities.
Loading detailed job history...
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed jobs from monitor_hub_analysis/fabric_item_details/jobs_20251203_170119.json
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed jobs from monitor_hub_analysis/fabric_item_details/jobs_20251203_144406.json
‚úÖ Loaded 1194917 base activities.
Loading detailed job history...
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed jobs from monitor_hub_analysis/fabric_item_details/jobs_20251203_170119.json
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed jobs from monitor_hub_analysis/fabric_item_details/jobs_20251203_144406.json
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed jobs from monitor_hub_analysis/fabric_item_details/jobs_20251203_161006.json
2025-12-04 16:03:38 | INFO | usf_fabric_monitoring | Loading detailed 

In [22]:
# 2. Prepare DataFrame for Analysis (Pandas Fallback)

# Note: We are using Pandas directly because the local Spark environment 
# is experiencing connection issues. The data volume is small enough for Pandas.

import pandas as pd
import numpy as np

print("üîÑ Preparing Analysis DataFrame (Pandas)...")

# Convert to Pandas DataFrame
df_pd = pd.DataFrame(merged_activities)

# Ensure critical columns exist
expected_cols = ["workspace_name", "failure_reason", "error_message", "error_code", "submitted_by", "item_name", "item_type"]
for c in expected_cols:
    if c not in df_pd.columns:
        df_pd[c] = None

# Filter for Failures
final_df = df_pd[df_pd["status"] == "Failed"].copy()

count = len(final_df)
print(f"‚úÖ Filtered to {count} failures.")

üîÑ Preparing Analysis DataFrame (Pandas)...
‚úÖ Filtered to 6792 failures.
‚úÖ Filtered to 6792 failures.


In [23]:
# 3. Prepare Analysis DataFrame (Pandas)

# Helper for Coalesce
def coalesce_series(*series):
    result = series[0].copy()
    for s in series[1:]:
        result = result.fillna(s)
    return result

# Helper for User Name Extraction
def extract_user_name(user_id):
    if pd.isna(user_id) or not isinstance(user_id, str):
        return user_id
    try:
        # Extract part before @ and replace . with space
        name_part = user_id.split('@')[0]
        return name_part.replace('.', ' ').title()
    except:
        return user_id

# Select and Rename columns
analysis_df = pd.DataFrame()

# Workspace
analysis_df["Workspace"] = coalesce_series(
    final_df["workspace_name"], 
    final_df["workspace_id"]
).fillna("Unknown")

# Item Name
analysis_df["Item Name"] = final_df["item_name"].fillna("Unknown")

# Item Type
analysis_df["Item Type"] = final_df["item_type"].fillna("Unknown")

# Invoke Type
analysis_df["Invoke Type"] = final_df["activity_type"]

# Time & Duration
analysis_df["Start Time"] = final_df["start_time"]
analysis_df["End Time"] = final_df["end_time"]
analysis_df["Duration (s)"] = final_df["duration_seconds"]

# User ID
analysis_df["User ID"] = final_df["submitted_by"]

# User Name
analysis_df["User Name"] = final_df["submitted_by"].apply(extract_user_name)
# Fallback to User ID if extraction failed or was null
analysis_df["User Name"] = analysis_df["User Name"].fillna(analysis_df["User ID"])

# Error Details
analysis_df["Error Message"] = coalesce_series(
    final_df["failure_reason"], 
    final_df["error_message"], 
    final_df["error_code"]
).fillna("Unknown Error")

analysis_df["Error Code"] = final_df["error_code"]

print("‚úÖ Analysis DataFrame Prepared.")
print(analysis_df.head(5))

‚úÖ Analysis DataFrame Prepared.
                     Workspace                              Item Name  \
338       ABBA Human Resources  Pipeline_Vanessa_Exit_Interview Final   
340      RE Service - Data Hub             DF_PL_000_Run_ETL_Pipeline   
342       ABBA Human Resources  Pipeline_Vanessa_Exit_Interview Final   
343      RE Service - Data Hub             DF_PL_000_Run_ETL_Pipeline   
2067  ABBA Lakehouse [PRJ UAT]            Orchestrate - Fusion MASTER   

         Item Type  Invoke Type                Start Time End Time  \
338   DataPipeline  RunArtifact 2025-11-06 01:00:03+00:00     None   
340   DataPipeline  RunArtifact 2025-11-06 01:00:08+00:00     None   
342   DataPipeline  RunArtifact 2025-11-06 01:01:02+00:00     None   
343   DataPipeline  RunArtifact 2025-11-06 01:01:02+00:00     None   
2067  DataPipeline  RunArtifact 2025-11-06 04:00:05+00:00     None   

      Duration (s)        User ID      User Name  \
338            0.0   Jaime.melero   Jaime Melero   
340

In [24]:
# 4. Execute Analysis (Pandas)

if not analysis_df.empty:
    # --- 1. Summary Statistics ---
    total_failures = len(analysis_df)
    unique_workspaces = analysis_df["Workspace"].nunique()
    unique_items = analysis_df["Item Name"].nunique()
    
    print(f"\nüìä SUMMARY STATISTICS")
    print(f"Total Failures: {total_failures}")
    print(f"Affected Workspaces: {unique_workspaces}")
    print(f"Affected Items: {unique_items}")

    # --- 2. Top 10 Failing Items ---
    print("\nüèÜ TOP 10 FAILING ITEMS")
    top_items = analysis_df.groupby(["Workspace", "Item Name", "Item Type"]) \
        .size() \
        .reset_index(name="count") \
        .sort_values("count", ascending=False) \
        .head(10)
    print(top_items.to_string(index=False))

    # --- 3. Failures by User ---
    print("\nüë§ FAILURES BY USER")
    user_stats = analysis_df.groupby("User Name") \
        .size() \
        .reset_index(name="count") \
        .sort_values("count", ascending=False)
    print(user_stats.to_string(index=False))

    # --- 4. Error Message Distribution ---
    print("\n‚ö†Ô∏è ERROR MESSAGE DISTRIBUTION")
    error_stats = analysis_df.groupby("Error Message") \
        .size() \
        .reset_index(name="count") \
        .sort_values("count", ascending=False)
    print(error_stats.to_string(index=False))

    # --- 5. Recent Failures (Last 20) ---
    print("\nüïí MOST RECENT FAILURES")
    recent_failures = analysis_df[["Start Time", "Workspace", "Item Name", "User Name", "Error Message"]] \
        .sort_values("Start Time", ascending=False) \
        .head(20)
    
    # Truncate long error messages for display
    pd.set_option('display.max_colwidth', 100)
    print(recent_failures.to_string(index=False))
else:
    print("No failure data found.")


üìä SUMMARY STATISTICS
Total Failures: 6792
Affected Workspaces: 31
Affected Items: 106

üèÜ TOP 10 FAILING ITEMS
             Workspace                                                 Item Name    Item Type  count
EDP HR Ingestion [DEV]                                 NB_Load_API_Data_To_Table     Notebook    804
   EDP Ingestion [DEV]                                010_GraphAPIADGroupMembers DataPipeline    430
EDP HR Ingestion [DEV]                      002_NB_populate_ipeople_date_columns     Notebook    379
EDP HR Ingestion [DEV]                                       010_iPeopleLoopData DataPipeline    230
EDP HR Ingestion [DEV]                               010_CornerStoneLoopDataLoad DataPipeline    223
 RE Finance - Hyperion Actual_Forecast_Budget - Current Month_FY - Process State     Dataflow    206
  ABBA Human Resources                                                  CCLookup     Dataflow    198
   EDP Ingestion [DEV]                                      NB_adgroups_fil

In [25]:
analysis_df.head(5)

Unnamed: 0,Workspace,Item Name,Item Type,Invoke Type,Start Time,End Time,Duration (s),User ID,User Name,Error Message,Error Code
338,ABBA Human Resources,Pipeline_Vanessa_Exit_Interview Final,DataPipeline,RunArtifact,2025-11-06 01:00:03+00:00,,0.0,Jaime.melero,Jaime Melero,"{'requestId': '26e6f207-0857-4b7e-b587-ccdac3373d55', 'errorCode': 'Failed', 'message': 'Operati...",Failed
340,RE Service - Data Hub,DF_PL_000_Run_ETL_Pipeline,DataPipeline,RunArtifact,2025-11-06 01:00:08+00:00,,0.0,Steven.Morris,Steven Morris,"{'requestId': '8fa8da31-7569-4f27-ad9d-043861614be9', 'errorCode': 'Failed', 'message': 'Operati...",Failed
342,ABBA Human Resources,Pipeline_Vanessa_Exit_Interview Final,DataPipeline,RunArtifact,2025-11-06 01:01:02+00:00,,0.0,Jaime.melero,Jaime Melero,"{'requestId': '26e6f207-0857-4b7e-b587-ccdac3373d55', 'errorCode': 'Failed', 'message': 'Operati...",Failed
343,RE Service - Data Hub,DF_PL_000_Run_ETL_Pipeline,DataPipeline,RunArtifact,2025-11-06 01:01:02+00:00,,0.0,Steven.Morris,Steven Morris,"{'requestId': '8fa8da31-7569-4f27-ad9d-043861614be9', 'errorCode': 'Failed', 'message': 'Operati...",Failed
2067,ABBA Lakehouse [PRJ UAT],Orchestrate - Fusion MASTER,DataPipeline,RunArtifact,2025-11-06 04:00:05+00:00,,0.0,Matt.Bailey,Matt Bailey,"{'requestId': 'c03d2d0b-c61c-483b-af49-ea16f422320a', 'errorCode': 'Failed', 'message': 'Operati...",Failed
