# FORGE Seismic Catalog Data Statistics Analysis

This notebook performs comprehensive statistical analysis on seismic catalog data from the Utah FORGE geothermal field. It generates publication-quality visualizations and statistics for earthquake events.

## Overview
- **Purpose**: Analyze seismic event distributions (depth, magnitude, spatial)
- **Data Source**: CSV files from the GES catalog directory `GES16Aand16BStimulationMonitoringApril2024/` (subfolders listed in `catalogs`)
- **Output**: Statistical plots and summary statistics saved to `data_statistics/` folder

## Key Analyses
1. **Depth Distribution**: Histogram with statistical measures
2. **Spatial Distribution**: X-Y scatter plot with magnitude and depth encoding  
3. **Magnitude Distribution**: Moment magnitude histogram with statistics

In [None]:
# Import required libraries for data analysis and visualization
import os
import pandas as pd
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import plotly.graph_objects as go
from plotly.subplots import make_subplots

print("✅ All libraries imported successfully")

In [None]:
# Configuration parameters for data processing and visualization
catalog_path = 'GES16Aand16BStimulationMonitoringApril2024'  # Directory containing CSV catalog files
catalogs = ["16AStimulationCatalogues", "16BStimulationCatalogues"]
output_folder = 'data_statistics'     # Output directory for generated plots
show_images = False                    # Set to True to display plots inline, False to save only

# Publication-quality figure settings
width_mm = 85                         # Figure width in millimeters (journal standard)
dpi = 300                            # Resolution for saved images
width_in = width_mm / 25.4           # Convert mm to inches
height_in = width_in * 0.75          # Aspect ratio (4:3)

print(f"📁 Input directory: {catalog_path}")
print(f"💾 Output directory: {output_folder}")
print(f"📊 Figure dimensions: {width_mm}mm x {width_mm*0.75:.1f}mm")
print(f"🖼️  Display mode: {'Interactive' if show_images else 'Save to file'}")

In [None]:
os.listdir(catalog_path)  # List files in the catalog directory

In [None]:
# Data Loading and Preprocessing
print("📂 Loading seismic catalog data...")

# Load all CSV files from the catalog directory
dataframes = []
csv_count = 0
files = []

for cat in os.listdir(catalog_path):
    if cat not in catalogs:
        continue
    folder_path = os.path.join(catalog_path, cat)

    for file in os.listdir(folder_path):
        if file.endswith('.csv'):
            file_path = os.path.join(folder_path, file)
            files.append(file_path)

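# `files` already contains only .csv paths, so no second extension check is needed
for file_name in files:
    try:
        df = pd.read_csv(file_name)
    except Exception as e:
        print(f"  ❌ Error loading {file_name}: {e}")
        continue
    if df.empty:
        print(f"  ⚠️  Empty file: {file_name}")
        continue
    dataframes.append(df)
    csv_count += 1
    print(f"  ✅ Loaded: {file_name} ({len(df)} events)")

# Combine all dataframes into a single catalog
# (pd.concat raises ValueError on an empty list, so fail early with a clearer message)
if not dataframes:
    raise FileNotFoundError(f"No non-empty CSV files found under '{catalog_path}'")
catalog_df = pd.concat(dataframes, ignore_index=True)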

# Clean column names (strip surrounding whitespace, replace spaces with underscores)
catalog_df.columns = catalog_df.columns.str.strip().str.replace(' ', '_')

# Data quality filtering: keep only events with positive depth (rows with NaN depth are dropped too)
initial_count = len(catalog_df)
catalog_df = catalog_df[catalog_df['Depth'] > 0]
filtered_count = len(catalog_df)

print(f"\n📊 Data Summary:")
print(f"  • CSV files processed: {csv_count}")
print(f"  • Total events loaded: {initial_count}")
print(f"  • Events after filtering (depth > 0): {filtered_count}")
print(f"  • Events removed: {initial_count - filtered_count}")
print(f"  • Available columns: {list(catalog_df.columns)}")

# Create output directory
os.makedirs(output_folder, exist_ok=True)
print(f"📁 Output directory '{output_folder}' ready")

## 1. Depth Distribution Analysis

Analysis of earthquake depth distribution provides insights into:
- **Seismogenic zone thickness**: The depth range where most earthquakes occur
- **Injection effects**: How stimulation affects different depth levels  
- **Statistical properties**: Mean, standard deviation, median, and the central 95% range of a fitted normal distribution
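
The interval computed in the next cell comes from `stats.norm.interval(0.95, loc=mean, scale=std)`, which returns the central 95% range of a normal distribution with that mean and standard deviation. That describes the spread of individual events, which is different from a confidence interval for the mean. A minimal standard-library sketch of the difference, using made-up depths (the values below are hypothetical, not FORGE data) and a normal approximation for the mean's interval:

```python
import math
import statistics

# Hypothetical depths in metres, for illustration only (not FORGE data)
depths = [2310.0, 2355.0, 2402.0, 2440.0, 2475.0, 2390.0, 2425.0, 2380.0]

n = len(depths)
mean = statistics.mean(depths)
std = statistics.stdev(depths)  # sample standard deviation (ddof=1), like pandas .std()

# z-value enclosing the central 95% of a standard normal (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

# (i) Spread of individual events under a fitted normal: mean +/- z * std
spread_interval = (mean - z * std, mean + z * std)

# (ii) Normal-approximation confidence interval for the mean: mean +/- z * std / sqrt(n)
sem = std / math.sqrt(n)
mean_ci = (mean - z * sem, mean + z * sem)

print(f"Central 95% range of fitted normal: {spread_interval[0]:.1f} - {spread_interval[1]:.1f} m")
print(f"95% CI for the mean:                {mean_ci[0]:.1f} - {mean_ci[1]:.1f} m")
```

The second interval narrows as the number of events grows, while the first does not, so the two should be labeled differently in any reported statistics.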

In [None]:
# Create depth distribution histogram with statistical annotations
print("📈 Generating depth distribution analysis...")

# Create interactive histogram using Plotly
fig_depth = px.histogram(
    catalog_df, 
    x="Depth", 
    nbins=100, 
    title="Distribution of Earthquake Depths",
    labels={'Depth': 'Depth (m)', 'count': 'Number of Events'},
    color_discrete_sequence=['skyblue']
)

# Calculate key statistical measures
depth_mean = catalog_df['Depth'].mean()
depth_std = catalog_df['Depth'].std()
depth_median = catalog_df['Depth'].median()
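# Central 95% range of a normal distribution with this mean and std (mean ± 1.96σ);
# it describes the spread of events, not a confidence interval for the mean
depth_ci = stats.norm.interval(0.95, loc=depth_mean, scale=depth_std)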

# Add statistical reference lines to the plot
fig_depth.add_vline(
    x=depth_mean, 
    line=dict(color="red", dash="dash", width=2), 
    annotation_text=f"Mean: {depth_mean:.1f}m", 
    annotation_position="top left"
)
fig_depth.add_vline(
    x=depth_mean + depth_std, 
    line=dict(color="green", dash="dash", width=1), 
    annotation_text="+1σ", 
    annotation_position="top left"
)
fig_depth.add_vline(
    x=depth_mean - depth_std, 
    line=dict(color="green", dash="dash", width=1), 
    annotation_text="-1σ", 
    annotation_position="top left"
)

# Update layout for inline display (the export size is set separately in write_image below)
fig_depth.update_layout(
    width=800,
    height=600,
    title_x=0.5,
    margin=dict(l=40, r=40, b=40, t=60),
    showlegend=False
)

# Display or save the figure (static export via write_image requires the kaleido package)
if show_images:
    fig_depth.show()
else:
    output_path = os.path.join(output_folder, "depth_histogram.png")
    fig_depth.write_image(output_path, width=int(width_in * dpi), height=int(height_in * dpi), scale=2)
    print(f"💾 Saved: {output_path}")

# Print statistical summary
print(f"\n📊 Depth Statistics:")
print(f"  • Mean depth: {depth_mean:.2f} m")
print(f"  • Standard deviation: {depth_std:.2f} m") 
print(f"  • Median depth: {depth_median:.2f} m")
print(f"  • 95% normal range (mean ± 1.96σ): {depth_ci[0]:.2f} - {depth_ci[1]:.2f} m")
print(f"  • Depth range: {catalog_df['Depth'].min():.1f} - {catalog_df['Depth'].max():.1f} m")

## 2. Spatial Distribution Analysis

Spatial analysis of earthquake locations reveals:
- **Seismic clustering**: Areas of concentrated activity
- **Magnitude-location relationships**: How event size varies spatially
- **Depth-location patterns**: Vertical distribution across the field

The scatter plot uses:
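- **Marker size**: Increases linearly with moment magnitude, offset so the smallest event stays visible
- **Color scale**: Represents depth on the Magma scale (deeper = lighter, since larger depth values map to the bright end)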
- **Position**: X-Y coordinates in field reference system
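
The marker-size rule used in the next cell is `(MomMag - min(MomMag) + 0.4) * 5`. A small sketch of that mapping as a standalone function may make the encoding easier to reason about; the function name `marker_sizes` and the sample magnitudes are hypothetical, while the offset `0.4` and factor `5` are the notebook's values:

```python
def marker_sizes(magnitudes, offset=0.4, scale=5.0):
    """Map magnitudes to marker diameters in pixels: (M - M_min + offset) * scale."""
    m_min = min(magnitudes)
    return [(m - m_min + offset) * scale for m in magnitudes]


# Hypothetical magnitudes, for illustration only
mags = [-1.5, -0.5, 0.0, 1.0]
sizes = marker_sizes(mags)

for m, s in zip(mags, sizes):
    print(f"Mw {m:+.1f} -> diameter {s:.1f} px")
```

Because the mapping is linear in magnitude with a fixed offset, the smallest event always gets a diameter of `offset * scale` pixels, and the sizes depend on the catalog's minimum magnitude rather than on absolute magnitude alone.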

In [None]:
# Create spatial distribution scatter plot with magnitude and depth encoding
print("🗺️  Generating spatial distribution analysis...")

# Create subplot framework for the scatter plot
fig_spatial = make_subplots(rows=1, cols=1, subplot_titles=["Event Locations"])

# Calculate marker sizes based on moment magnitude
# Scale markers to be visible: (magnitude - min_magnitude + offset) * scale_factor
mag_min = catalog_df['MomMag'].min()
mag_max = catalog_df['MomMag'].max()
marker_sizes = (catalog_df['MomMag'] - mag_min + 0.4) * 5

# Create scatter plot with dual encoding (size=magnitude, color=depth)
scatter = go.Scatter(
    x=catalog_df['X'], 
    y=catalog_df['Y'],
    mode='markers',
    name='Seismic Events',
    marker=dict(
        size=marker_sizes,                    # Marker size ~ magnitude
        color=catalog_df['Depth'],           # Color ~ depth
        colorscale='Magma',                  # Color scale (dark purple for low values to pale yellow for high values)
        colorbar=dict(
            title="Depth (m)",
        ),
        line=dict(width=0.5, color='white'), # White outline for visibility
        opacity=0.7,                         # Semi-transparent for overlapping points
        sizemode='diameter'
    ),
    text=[f"Mag: {mag:.2f}<br>Depth: {depth:.1f}m" 
          for mag, depth in zip(catalog_df['MomMag'], catalog_df['Depth'])],
    hovertemplate="<b>Event Location</b><br>" +
                  "X: %{x:.1f}<br>" +
                  "Y: %{y:.1f}<br>" +
                  "%{text}<extra></extra>"
)

# Add scatter trace to the figure
fig_spatial.add_trace(scatter)

# Update layout for publication quality
fig_spatial.update_layout(
    title="Spatial Distribution of Seismic Events",
    width=800,
    height=600,
    title_x=0.5,
    margin=dict(l=60, r=60, b=60, t=80),
    xaxis_title="X Coordinate (m)",
    yaxis_title="Y Coordinate (m)",
    showlegend=False
)

# Ensure equal aspect ratio for spatial data
fig_spatial.update_xaxes(scaleanchor="y", scaleratio=1)

# Display or save the figure
if show_images:
    fig_spatial.show()
else:
    output_path = os.path.join(output_folder, "spatial_distribution.png")
    fig_spatial.write_image(output_path, width=int(width_in * dpi), height=int(height_in * dpi), scale=2)
    print(f"💾 Saved: {output_path}")

# Print spatial statistics
print(f"\n📊 Spatial Distribution Statistics:")
print(f"  • X range: {catalog_df['X'].min():.1f} to {catalog_df['X'].max():.1f} m")
print(f"  • Y range: {catalog_df['Y'].min():.1f} to {catalog_df['Y'].max():.1f} m") 
print(f"  • Magnitude range: {mag_min:.2f} to {mag_max:.2f}")
print(f"  • Total events plotted: {len(catalog_df)}")

## 3. Moment Magnitude Distribution Analysis

Magnitude analysis characterizes the earthquake size distribution:
- **Frequency-magnitude relationship**: How often different sized events occur
- **Statistical properties**: Central tendency and variability measures
- **Seismic hazard assessment**: Understanding the range of event sizes
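
One simple way to look at how often different sized events occur is to count events at or above a set of magnitude thresholds. The sketch below is a plain-Python illustration with made-up magnitudes; the helper name `cumulative_counts` and the thresholds are assumptions for the example. It uses `>=`, whereas the threshold counts printed later in the notebook use a strict `>`.

```python
def cumulative_counts(magnitudes, thresholds):
    """Return {threshold: number of events with magnitude >= threshold}."""
    return {t: sum(1 for m in magnitudes if m >= t) for t in thresholds}


# Hypothetical magnitudes, for illustration only (not FORGE data)
mags = [-1.8, -1.2, -1.1, -0.7, -0.4, -0.3, 0.1, 0.6, 1.2]
counts = cumulative_counts(mags, thresholds=[-2.0, -1.0, 0.0, 1.0])

for t, n_events in counts.items():
    print(f"N(Mw >= {t:+.1f}) = {n_events}")
```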

The moment magnitude (Mw) scale is logarithmic, where each unit increase represents ~32x more energy release.
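
The ~32x figure follows from the standard scaling in which the logarithm of radiated energy grows as 1.5 times the magnitude, so a difference of ΔMw corresponds to an energy ratio of 10^(1.5·ΔMw). A short sketch of that arithmetic (the function name `energy_ratio` is introduced here for illustration):

```python
def energy_ratio(delta_mw):
    """Energy ratio for a magnitude difference, assuming log10(E) = 1.5 * Mw + const."""
    return 10 ** (1.5 * delta_mw)


for delta in (0.1, 0.5, 1.0, 2.0):
    print(f"Delta Mw = {delta:.1f} -> energy ratio ~ {energy_ratio(delta):.1f}")
```

A one-unit step gives 10^1.5 ≈ 31.6, and a two-unit step gives 10^3 = 1000.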

In [None]:
# Create moment magnitude distribution histogram with statistical annotations
print("📊 Generating magnitude distribution analysis...")

# Create interactive histogram for moment magnitude
fig_magnitude = px.histogram(
    catalog_df, 
    x="MomMag", 
    nbins=100, 
    title="Distribution of Moment Magnitudes",
    labels={'MomMag': 'Moment Magnitude (Mw)', 'count': 'Number of Events'},
    color_discrete_sequence=['lightcoral']
)

# Calculate statistical measures for magnitude
mommag_mean = catalog_df['MomMag'].mean()
mommag_std = catalog_df['MomMag'].std()
mommag_median = catalog_df['MomMag'].median()
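# Central 95% range of a normal distribution with this mean and std (mean ± 1.96σ);
# it describes the spread of magnitudes, not a confidence interval for the mean
mommag_ci = stats.norm.interval(0.95, loc=mommag_mean, scale=mommag_std)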

# Add statistical reference lines
fig_magnitude.add_vline(
    x=mommag_mean, 
    line=dict(color="red", dash="dash", width=2), 
    annotation_text=f"Mean: {mommag_mean:.2f}", 
    annotation_position="top left"
)
fig_magnitude.add_vline(
    x=mommag_mean + mommag_std, 
    line=dict(color="green", dash="dash", width=1), 
    annotation_text="+1σ", 
    annotation_position="top left"
)
fig_magnitude.add_vline(
    x=mommag_mean - mommag_std, 
    line=dict(color="green", dash="dash", width=1), 
    annotation_text="-1σ", 
    annotation_position="top left"
)

# Update layout for publication quality
fig_magnitude.update_layout(
    showlegend=False,
    width=800,
    height=600,
    title_x=0.5,
    margin=dict(l=40, r=40, b=40, t=60),
)

# Display or save the figure
if show_images:
    fig_magnitude.show()
else:
    output_path = os.path.join(output_folder, "magnitude_histogram.png")
    fig_magnitude.write_image(output_path, width=int(width_in * dpi), height=int(height_in * dpi), scale=2)
    print(f"💾 Saved: {output_path}")

# Print magnitude statistics
print(f"\n📊 Magnitude Statistics:")
print(f"  • Mean magnitude: {mommag_mean:.3f}")
print(f"  • Standard deviation: {mommag_std:.3f}")
print(f"  • Median magnitude: {mommag_median:.3f}")
print(f"  • 95% normal range (mean ± 1.96σ): {mommag_ci[0]:.3f} - {mommag_ci[1]:.3f}")
print(f"  • Magnitude range: {catalog_df['MomMag'].min():.3f} - {catalog_df['MomMag'].max():.3f}")

# Event counts above magnitude thresholds (these are not completeness estimates)
print(f"\n🔍 Additional Magnitude Metrics:")
print(f"  • Number of events: {len(catalog_df)}")
print(f"  • Events > Mw 0: {len(catalog_df[catalog_df['MomMag'] > 0])}")
print(f"  • Events > Mw 1: {len(catalog_df[catalog_df['MomMag'] > 1])}")
print(f"  • Largest event: Mw {catalog_df['MomMag'].max():.3f}")

## Summary and Conclusions

This analysis provides comprehensive statistical characterization of the FORGE seismic catalog:

### Key Findings:
1. **Depth Distribution**: The histogram and summary statistics show the depth range in which most events occur
2. **Spatial Clustering**: The X-Y scatter plot shows where events concentrate; attributing clusters to fault structures or stimulation effects requires interpretation beyond this notebook
3. **Magnitude Range**: The histogram and threshold counts summarize event sizes; the notebook does not fit or test a frequency-magnitude relation

### Output Files:
- `depth_histogram.png`: Depth distribution with statistical measures
- `spatial_distribution.png`: X-Y scatter plot with magnitude/depth encoding  
- `magnitude_histogram.png`: Magnitude distribution with statistics

### Usage Notes:
- Set `show_images = True` to display plots interactively in the notebook
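- Figures are exported as PNG with a base size computed from an 85 mm width at 300 DPI; `scale=2` in `write_image` then doubles the pixel dimensions
- Statistical measures include mean, standard deviation, median, and the central 95% range of a fitted normal distribution (mean ± 1.96σ)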
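
The export size can be checked with a little arithmetic. The sketch below mirrors the notebook's conversion (`width_mm / 25.4`, a 0.75 aspect ratio, `int(...)` truncation) and assumes, per Plotly's documented behavior, that `scale` multiplies the pixel dimensions passed to `write_image`. The helper name `export_pixels` is hypothetical.

```python
def export_pixels(width_mm, dpi, aspect=0.75, scale=1.0):
    """Return (width_px, height_px) of the exported image."""
    width_in = width_mm / 25.4          # millimetres to inches
    height_in = width_in * aspect       # 4:3 aspect ratio by default
    # Integer base size as passed to write_image, then multiplied by `scale`
    base_w, base_h = int(width_in * dpi), int(height_in * dpi)
    return int(base_w * scale), int(base_h * scale)


base = export_pixels(85, 300)
doubled = export_pixels(85, 300, scale=2)
print(f"scale=1: {base[0]} x {base[1]} px")
print(f"scale=2: {doubled[0]} x {doubled[1]} px")
print(f"Effective DPI at 85 mm with scale=2: {doubled[0] / (85 / 25.4):.0f}")
```

With `scale=2`, the file holds roughly twice the pixels per inch that the `dpi` setting suggests when printed at 85 mm; setting `scale=1` would keep the output at the stated 300 DPI.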

### Data Quality:
- Events with invalid depths (≤ 0) are automatically filtered
- All statistics are calculated on the cleaned dataset
- Progress messages indicate successful data loading and processing
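
The cleaning rules above can be illustrated without pandas. The following is a dependency-free sketch of the same two steps (normalize column names, keep rows with depth > 0) on a made-up catalog excerpt; the sample rows are hypothetical, and the notebook itself performs these steps with pandas.

```python
import csv
import io

# Hypothetical catalog excerpt; values are illustrative only (not FORGE data)
raw = """X, Y, Depth, MomMag
100.0, 200.0, 2400.0, -0.8
110.0, 190.0, -5.0, -1.1
120.0, 210.0, 0.0, -0.3
130.0, 205.0, 2550.0, 0.2
"""

reader = csv.reader(io.StringIO(raw))

# Strip surrounding whitespace and replace inner spaces with underscores
header = [name.strip().replace(" ", "_") for name in next(reader)]

# Parse each non-empty row into a dict of floats keyed by the cleaned column names
rows = [dict(zip(header, (float(v) for v in row))) for row in reader if row]

# Keep only events with strictly positive depth
kept = [r for r in rows if r["Depth"] > 0]

print(f"Loaded {len(rows)} events, kept {len(kept)}, removed {len(rows) - len(kept)}")
```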