# Delta Lake Health - Visual Report

This notebook generates interactive visualizations for Delta Lake table health metrics using Plotly. It analyzes a Delta Lake table once and creates multiple visualizations:

- Health score dashboard
- File size distribution and partition analysis
- Partition skewness visualization
- Historical trends (simulated)
- Operation timeline visualization

The visualizations help identify tables that need maintenance, such as vacuuming, optimization (file compaction), or partition rebalancing.
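
As a rough illustration of how health flags could translate into those maintenance actions, the sketch below maps a plain dictionary of flags to a list of suggestions. The `suggest_maintenance` helper, the flag names, the default skew threshold, and the example values are all hypothetical and chosen for this example; they are not part of the `delta_lake_health` API.

```python
def suggest_maintenance(flags, max_skew, skew_threshold=0.2):
    """Return suggested maintenance actions for a table (illustrative only).

    `flags` is a hypothetical dict of booleans; missing keys count as False.
    """
    actions = []
    # Stale or unreferenced files are normally cleaned up by a vacuum.
    if flags.get("needs_vacuum") or flags.get("has_orphan_files"):
        actions.append("vacuum")
    # Many small files are normally addressed by compaction (optimize).
    if flags.get("needs_optimize"):
        actions.append("optimize")
    # Uneven partitions suggest revisiting the partitioning scheme.
    if max_skew > skew_threshold:
        actions.append("rebalance")
    return actions


# Hypothetical inputs, not produced by the analyzer.
example_flags = {
    "needs_vacuum": True,
    "has_orphan_files": True,
    "needs_optimize": True,
}
suggestions = suggest_maintenance(example_flags, max_skew=0.62)
print(suggestions)
```

Keeping the mapping in one small function makes it easy to adjust the rules once the real flag semantics are known.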

In [1]:
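import os
import sys

# Add the directory three levels above the working directory to the import
# path so the project packages can be imported from this notebook.
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../../../")))

from deltalake import DeltaTable
from delta_lake_health.health_analyzers.delta_analyzer import DeltaAnalyzer
from delta_lake_health.visualization.notebook import (
    health_dashboard,
    skew_analysis,
    delta_operations,
)

import plotly.io as pio

# Use a light Plotly theme for every figure in this notebook.
pio.templates.default = "plotly_white"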

## Select and Analyze Delta Table

We'll analyze the `complex` sample table, which has a history of multiple operations and data skewed across partitions. If the table does not exist yet, the cell below creates it.

In [None]:
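from src.delta_lake_health.demos.populate_sample_delta import get_table_path, create_sample_delta_table

# Reuse the existing "complex" sample table if it can be opened; otherwise create it.
try:
    table_path = get_table_path("complex")
    DeltaTable(table_path)  # raises if no Delta table exists at this path
    print(f"Using existing complex Delta table: {table_path}")
except Exception:
    print("Creating sample Delta tables...")
    table_path = create_sample_delta_table()
    print(f"Created new complex Delta table: {table_path}")

# Analyze the table once; the thresholds control when the analyzer flags
# skew, vacuum needs, orphan files, and small files.
analyzer = DeltaAnalyzer(environment="python")
metrics = analyzer.analyze(
    table_path=table_path,
    skew_threshold=0.2,
    vacuum_size_ratio_threshold=0.8,
    orphan_file_ratio_threshold=0.85,
    small_file_ratio_threshold=0.3,
)

# Keep the path on the metrics object so later cells can read it from there.
metrics.table_path = table_path
metrics.print_results()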

Using existing complex Delta table: /Users/alvaromoure/desarrollo/delta-lake-health/src/delta_lake_health/demos/../../../data/tables/complex_delta

Delta Table Analysis Results:
----------------------------
Health Score: 28.3/100 (very_unhealthy)
Version Count: 16
Record Count: 1,773
Operations: 15 writes, 2 deletes, 0 optimizes
Skewness: 0.62 (Max), 0.28 (Avg)

Partition Skew Metrics:
Partition Columns: day
Partition Count: 5
Max Records: 601 (Partition: Mon)
Min Records: 228 (Partition: Thu)
Table Size: 0.05 MB
Folder Size: 0.12 MB
Total Files: 40 files
Data Files: 22 files
Small Files: 22 files
Orphan Files: 18 files
Needs Vacuum: True
Has Orphan Files: True
Needs Optimize: True
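
To make the printed figures easier to interpret, the sketch below recomputes a few ratios from the values in the output above. The formulas are assumptions chosen for illustration; `DeltaAnalyzer` may define its ratios and skewness differently, and the variable names are not taken from the library. One observation: `(max - min) / max` over the partition record counts happens to reproduce the reported maximum skewness of 0.62.

```python
# Values copied from the printed analysis results.
max_records, min_records = 601, 228
total_files, data_files = 40, 22
small_files, orphan_files = 22, 18
table_size_mb, folder_size_mb = 0.05, 0.12

# Assumed formulas (not confirmed against the analyzer's source code).
max_skew = (max_records - min_records) / max_records
small_file_ratio = small_files / data_files
orphan_file_ratio = orphan_files / total_files
table_to_folder_ratio = table_size_mb / folder_size_mb

print(f"Max skew (max - min) / max:  {max_skew:.2f}")
print(f"Small files / data files:    {small_file_ratio:.2f}")
print(f"Orphan files / total files:  {orphan_file_ratio:.2f}")
print(f"Table size / folder size:    {table_to_folder_ratio:.2f}")
```

Under these assumed definitions, every data file counts as small, 45% of the files in the folder are orphans, and the live table accounts for well under half of the folder size, which is consistent with the three `True` flags in the output.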


In [None]:
dashboard = health_dashboard.create_health_dashboard(metrics)
dashboard

In [None]:
skew_viz = skew_analysis.visualize_skew_analysis(metrics)
skew_viz

In [None]:
operations = delta_operations.visualize_delta_operations(metrics.table_path)
operations