# Indonesian Earthquake Markov Chain Model

This notebook implements a discrete-time Markov Chain model for analyzing earthquake patterns in Indonesia using the Kaggle dataset.

## Overview
- **Goal**: Build a Markov Chain model to analyze earthquake state transitions
- **Data**: Indonesian earthquake catalog (TSV format)
- **Approach**: Discretize each event into a state by magnitude (and optionally coarse location), then estimate transition probabilities between consecutive events

## Model Features
1. **Magnitude-only states**: Small, Medium, Large earthquakes
2. **Combined states**: Magnitude + coarse geographical regions (a 3 × 3 latitude/longitude grid, giving up to 27 states)
3. **N-step transition analysis**: Compute multi-step transition probabilities
4. **Exploratory analysis**: State frequencies, most likely next states, state persistence, and summary insights
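For reference, the quantities the notebook estimates can be written compactly. With $n_{ij}$ the number of times state $i$ is immediately followed by state $j$ in the time-ordered catalog, the row-normalized (maximum-likelihood) transition estimate and the $n$-step probabilities are:

$$
\hat{P}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}, \qquad
\Pr\left(X_{t+n} = j \mid X_t = i\right) = \left(\hat{P}^{\,n}\right)_{ij}
$$

Rows with $\sum_k n_{ik} = 0$ carry no data; `compute_transition_matrix` fills them with the uniform value $1/|S|$, where $|S|$ is the number of states.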

In [15]:
# Import required libraries
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

Libraries imported successfully
Pandas version: 2.2.3
NumPy version: 2.1.3


## Configuration

Set up the data path and model parameters. **Edit the DATA_PATH variable to point to your local dataset file.**

In [16]:
# Configuration Section
# ===================

# Data path - EDIT THIS to point to your local dataset
DATA_PATH = "data/katalog_gempa_v2.tsv"  # Update this path as needed

# Magnitude thresholds for state definition
MAG_THRESHOLD_MIN = 2.0  # Minimum magnitude to include
MAG_SMALL_MAX = 4.0      # Small earthquakes: mag < 4.0
MAG_MEDIUM_MAX = 6.0     # Medium earthquakes: 4.0 <= mag < 6.0
                         # Large earthquakes: mag >= 6.0

# Geographical bins for Indonesia (approximate bounds)
# Indonesia roughly spans: lat [-12, 6], lon [90, 150]
LAT_BINS = [-12, -6, 0, 6]      # Southern, Central, Northern regions
LON_BINS = [90, 110, 130, 150]  # Western, Central, Eastern regions

print("Configuration loaded:")
print(f"Data path: {DATA_PATH}")
print(f"Magnitude thresholds: Small < {MAG_SMALL_MAX}, Medium < {MAG_MEDIUM_MAX}, Large >= {MAG_MEDIUM_MAX}")
print(f"Latitude bins: {LAT_BINS}")
print(f"Longitude bins: {LON_BINS}")

Configuration loaded:
Data path: data/katalog_gempa_v2.tsv
Magnitude thresholds: Small < 4.0, Medium < 6.0, Large >= 6.0
Latitude bins: [-12, -6, 0, 6]
Longitude bins: [90, 110, 130, 150]
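With these bins, `assign_region_state` (defined below) numbers regions row-major as `lat_idx * 3 + lon_idx + 1`, so R1 is the southernmost, westernmost cell and R9 the northernmost, easternmost. A small sketch that prints this lookup table, assuming the bin edges above; `region_grid` is a hypothetical helper that is not part of the notebook:

```python
import pandas as pd

# Same bin edges as the configuration cell
LAT_BINS = [-12, -6, 0, 6]
LON_BINS = [90, 110, 130, 150]


def region_grid(lat_bins, lon_bins):
    """Return a DataFrame mapping each lat/lon cell to its region ID (hypothetical helper)."""
    n_lon = len(lon_bins) - 1
    rows = {}
    for lat_idx in range(len(lat_bins) - 1):
        lat_label = f"lat [{lat_bins[lat_idx]}, {lat_bins[lat_idx + 1]})"
        # Row-major numbering, matching assign_region_state
        rows[lat_label] = [f"R{lat_idx * n_lon + lon_idx + 1}" for lon_idx in range(n_lon)]
    lon_labels = [f"lon [{lon_bins[j]}, {lon_bins[j + 1]})" for j in range(n_lon)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=lon_labels)


grid = region_grid(LAT_BINS, LON_BINS)
print(grid)
```

The half-open labels describe interior points only: the original function clamps coordinates outside the grid (and points exactly on the upper edges, 6 and 150) into the nearest edge cell.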


## State Definition Functions

Helper functions to classify earthquakes into discrete states.

In [17]:
def assign_magnitude_state(mag: float) -> str:
    """
    Classify earthquake magnitude into discrete states.
    
    Args:
        mag (float): Earthquake magnitude
        
    Returns:
        str: State label ('Small', 'Medium', 'Large'), or 'Unknown' if mag is NaN
    """
    if pd.isna(mag):
        return 'Unknown'
    elif mag < MAG_SMALL_MAX:
        return 'Small'
    elif mag < MAG_MEDIUM_MAX:
        return 'Medium'
    else:
        return 'Large'


def assign_region_state(lat: float, lon: float) -> str:
    """
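    Assign geographical region based on latitude and longitude bins.
    
    Regions are numbered row-major starting from R1 (southernmost latitude bin,
    westernmost longitude bin). Coordinates outside the bins are clamped into
    the nearest edge cell rather than rejected.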
    
    Args:
        lat (float): Latitude
        lon (float): Longitude
        
    Returns:
        str: Region identifier (e.g., 'R1', 'R2', ...)
    """
    if pd.isna(lat) or pd.isna(lon):
        return 'Unknown'
    
    # Find latitude bin
    lat_idx = np.digitize(lat, LAT_BINS) - 1
    lat_idx = max(0, min(lat_idx, len(LAT_BINS) - 2))
    
    # Find longitude bin
    lon_idx = np.digitize(lon, LON_BINS) - 1
    lon_idx = max(0, min(lon_idx, len(LON_BINS) - 2))
    
    # Create region ID
    region_id = lat_idx * (len(LON_BINS) - 1) + lon_idx + 1
    return f'R{region_id}'


def build_combined_state(row) -> str:
    """
    Create combined state from magnitude and region.
    
    Args:
        row: DataFrame row with magnitude and coordinates
        
    Returns:
        str: Combined state (e.g., 'R1_Small', 'R2_Large')
    """
    mag_state = assign_magnitude_state(row['mag'])
    region_state = assign_region_state(row['lat'], row['lon'])
    
    if mag_state == 'Unknown' or region_state == 'Unknown':
        return 'Unknown'
    
    return f'{region_state}_{mag_state}'


# Test the functions
print("Testing state assignment functions:")
print(f"Magnitude 3.5 -> {assign_magnitude_state(3.5)}")
print(f"Magnitude 5.2 -> {assign_magnitude_state(5.2)}")
print(f"Magnitude 7.1 -> {assign_magnitude_state(7.1)}")
print(f"Coordinates (-6.2, 106.8) -> {assign_region_state(-6.2, 106.8)}")
print(f"Combined state example: {build_combined_state(pd.Series({'mag': 5.2, 'lat': -6.2, 'lon': 106.8}))}")

Testing state assignment functions:
Magnitude 3.5 -> Small
Magnitude 5.2 -> Medium
Magnitude 7.1 -> Large
Coordinates (-6.2, 106.8) -> R1
Combined state example: R1_Medium


## Data Loading and Preprocessing Functions

In [18]:
def load_and_preprocess_data(file_path: str) -> pd.DataFrame:
    """
    Load earthquake data from TSV file and perform preprocessing.
    
    Args:
        file_path (str): Path to the TSV file
        
    Returns:
        pd.DataFrame: Cleaned and processed earthquake data
    """
    try:
        # Load data with tab separator
        print(f"Loading data from: {file_path}")
        df = pd.read_csv(file_path, sep='\t')
        print(f"Initial data shape: {df.shape}")
        print(f"Columns: {list(df.columns)}")
        
        # Display sample of raw data
        print("\nFirst few rows:")
        print(df.head())
        
        # Handle different possible column names
        column_mapping = {}
        
        # Map date columns
        if 'tgl' in df.columns:
            column_mapping['date'] = 'tgl'
        elif 'date' in df.columns:
            column_mapping['date'] = 'date'
        
        # Map origin time columns  
        if 'ot' in df.columns:
            column_mapping['origin_time'] = 'ot'
        elif 'origin_time' in df.columns:
            column_mapping['origin_time'] = 'origin_time'
        elif 'time' in df.columns:
            column_mapping['origin_time'] = 'time'
            
        # Map coordinate columns
        for coord in ['lat', 'lon', 'latitude', 'longitude']:
            if coord in df.columns:
                if coord.startswith('lat'):
                    column_mapping['lat'] = coord
                elif coord.startswith('lon'):
                    column_mapping['lon'] = coord
                    
        # Map magnitude column
        for mag_col in ['mag', 'magnitude', 'Magnitude']:
            if mag_col in df.columns:
                column_mapping['mag'] = mag_col
                break
                
        # Map depth column if available
        if 'depth' in df.columns:
            column_mapping['depth'] = 'depth'
        
        print(f"\nColumn mapping: {column_mapping}")
        
        # Rename columns to standard names
        rename_dict = {v: k for k, v in column_mapping.items()}
        df = df.rename(columns=rename_dict)
        
        # Create unified datetime column
        if 'date' in df.columns and 'origin_time' in df.columns:
            # Combine date and time
            df['datetime_str'] = df['date'].astype(str) + ' ' + df['origin_time'].astype(str)
            df['event_time'] = pd.to_datetime(df['datetime_str'], errors='coerce')
        elif 'origin_time' in df.columns:
            # Use origin_time directly
            df['event_time'] = pd.to_datetime(df['origin_time'], errors='coerce')
        elif 'date' in df.columns:
            # Use date only
            df['event_time'] = pd.to_datetime(df['date'], errors='coerce')
        else:
            raise ValueError("No suitable date/time columns found")
            
        # Convert numeric columns
        for col in ['lat', 'lon', 'mag', 'depth']:
            if col in df.columns:
                df[col] = pd.to_numeric(df[col], errors='coerce')
        
        # Remove rows with missing essential data
        initial_count = len(df)
        df = df.dropna(subset=['event_time', 'lat', 'lon', 'mag'])
        df = df[df['mag'] >= MAG_THRESHOLD_MIN]  # Filter minimum magnitude
        
        print(f"\nData cleaning:")
        print(f"Removed {initial_count - len(df)} rows with missing/invalid data or magnitude below {MAG_THRESHOLD_MIN}")
        print(f"Final data shape: {df.shape}")
        
        # Sort by event time
        df = df.sort_values('event_time').reset_index(drop=True)
        
        # Add state columns
        df['state_mag'] = df['mag'].apply(assign_magnitude_state)
        df['state_region_mag'] = df.apply(build_combined_state, axis=1)
        
        print(f"\nMagnitude distribution:")
        print(df['state_mag'].value_counts())
        print(f"\nDate range: {df['event_time'].min()} to {df['event_time'].max()}")
        
        return df
        
    except Exception as e:
        print(f"Error loading data: {e}")
        print("Please check the file path and format.")
        return None


# Example of how to load data (will be used later)
print("Data loading function defined.")
print("Call load_and_preprocess_data(DATA_PATH) to load your earthquake data.")

Data loading function defined.
Call load_and_preprocess_data(DATA_PATH) to load your earthquake data.
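`load_and_preprocess_data` builds `event_time` by joining the date and origin-time columns with a space and parsing the result with `pd.to_datetime(..., errors='coerce')`. The sketch below runs that step on two invented rows held in memory. The `tgl`/`ot` names mirror the loader's column mapping, but the values and the `YYYY-MM-DD` date format are assumptions, not a sample of the Kaggle file:

```python
import io

import pandas as pd

# Two invented rows; column names follow the loader's 'tgl'/'ot' mapping,
# but the values and the date format are assumptions for illustration only
raw_tsv = (
    "tgl\tot\tlat\tlon\tmag\n"
    "2008-11-01\t21:02:43.058\t-9.18\t119.06\t4.9\n"
    "not-a-date\t??\t-6.20\t106.80\t3.1\n"
)
df = pd.read_csv(io.StringIO(raw_tsv), sep="\t")

# Same combination step as load_and_preprocess_data: date + ' ' + origin time
df["event_time"] = pd.to_datetime(
    df["tgl"].astype(str) + " " + df["ot"].astype(str), errors="coerce"
)
print(df[["tgl", "ot", "event_time"]])

# Unparseable rows become NaT and are removed by the loader's dropna step
clean = df.dropna(subset=["event_time"])
print(f"{len(df) - len(clean)} row(s) dropped")
```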


## Markov Chain Construction Functions

In [19]:
def compute_transition_matrix(states: list, state_order: list = None) -> tuple:
    """
    Build transition count and probability matrices from sequence of states.
    
    Args:
        states (list): Time-ordered sequence of states
        state_order (list): Fixed ordering of states for matrix indexing
        
    Returns:
        tuple: (counts_array, probs_df, counts_df)
            - counts_array: 2D numpy array of transition counts
            - probs_df: DataFrame of transition probabilities  
            - counts_df: DataFrame of transition counts
    """
    # Remove any 'Unknown' states
    clean_states = [s for s in states if s != 'Unknown']
    
    if len(clean_states) < 2:
        raise ValueError("Need at least 2 valid states to build transition matrix")
    
    # Determine state order
    if state_order is None:
        state_order = sorted(list(set(clean_states)))
    
    n_states = len(state_order)
    state_to_idx = {state: idx for idx, state in enumerate(state_order)}
    
    # Initialize count matrix
    counts = np.zeros((n_states, n_states), dtype=int)
    
    # Count transitions
    for i in range(len(clean_states) - 1):
        current_state = clean_states[i]
        next_state = clean_states[i + 1]
        
        if current_state in state_to_idx and next_state in state_to_idx:
            current_idx = state_to_idx[current_state]
            next_idx = state_to_idx[next_state]
            counts[current_idx, next_idx] += 1
    
    # Convert to probability matrix
    # Handle rows with zero transitions
    probs = np.zeros_like(counts, dtype=float)
    for i in range(n_states):
        row_sum = counts[i, :].sum()
        if row_sum > 0:
            probs[i, :] = counts[i, :] / row_sum
        else:
            # If no transitions from this state, use uniform distribution
            probs[i, :] = 1.0 / n_states
    
    # Create DataFrames for better visualization
    counts_df = pd.DataFrame(counts, index=state_order, columns=state_order)
    probs_df = pd.DataFrame(probs, index=state_order, columns=state_order)
    
    return counts, probs_df, counts_df


def n_step_transition_matrix(P: np.ndarray, n: int) -> np.ndarray:
    """
    Compute n-step transition matrix P^n.
    
    Args:
        P (np.ndarray): One-step transition probability matrix
        n (int): Number of steps
        
    Returns:
        np.ndarray: n-step transition matrix
    """
    if n == 0:
        return np.eye(P.shape[0])
    elif n == 1:
        return P.copy()
    else:
        return np.linalg.matrix_power(P, n)


def n_step_transition_probability(P: np.ndarray, state_order: list, 
                                 start_state: str, end_state: str, n: int) -> float:
    """
    Compute probability of transitioning from start_state to end_state in n steps.
    
    Args:
        P (np.ndarray): One-step transition probability matrix
        state_order (list): Ordered list of state names
        start_state (str): Initial state
        end_state (str): Target state  
        n (int): Number of steps
        
    Returns:
        float: Transition probability
    """
    if start_state not in state_order:
        raise ValueError(f"Start state '{start_state}' not in state_order")
    if end_state not in state_order:
        raise ValueError(f"End state '{end_state}' not in state_order")
    
    start_idx = state_order.index(start_state)
    end_idx = state_order.index(end_state)
    
    P_n = n_step_transition_matrix(P, n)
    return P_n[start_idx, end_idx]


print("Markov chain functions defined successfully.")

Markov chain functions defined successfully.
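The Python loop in `compute_transition_matrix` counts each consecutive pair of valid states once. An equivalent vectorized sketch maps labels to integer codes and accumulates pair counts with `np.add.at`; `count_transitions` is a hypothetical name, and unlike the original it returns only the two DataFrames. Like the original, it removes 'Unknown' labels before pairing, so the events on either side of a removed label become adjacent:

```python
import numpy as np
import pandas as pd


def count_transitions(states, state_order=None):
    """Return (counts_df, probs_df) for a time-ordered list of state labels (hypothetical helper)."""
    clean = [s for s in states if s != "Unknown"]
    if len(clean) < 2:
        raise ValueError("Need at least 2 valid states to build transition matrix")
    if state_order is None:
        state_order = sorted(set(clean))
    n_states = len(state_order)
    index = {s: i for i, s in enumerate(state_order)}

    # Encode labels as integers; labels missing from state_order get -1
    codes = np.array([index.get(s, -1) for s in clean])
    src, dst = codes[:-1], codes[1:]
    valid = (src >= 0) & (dst >= 0)  # mirrors the membership check in the loop

    # np.add.at accumulates correctly even when the same (src, dst) pair repeats
    counts = np.zeros((n_states, n_states), dtype=int)
    np.add.at(counts, (src[valid], dst[valid]), 1)

    # Row-normalize; rows with no outgoing transitions fall back to uniform, as in the original
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

    counts_df = pd.DataFrame(counts, index=state_order, columns=state_order)
    probs_df = pd.DataFrame(probs, index=state_order, columns=state_order)
    return counts_df, probs_df


counts_df, probs_df = count_transitions(
    ["Small", "Small", "Medium", "Unknown", "Small", "Large"]
)
print(counts_df)
print(probs_df.round(3))
```

Here the 'Unknown' between Medium and Small is dropped, so a Medium → Small pair is counted, and Large (the last event) has no outgoing transitions, so its row is uniform.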


## Analysis and Visualization Functions

In [20]:
def analyze_markov_chain(df: pd.DataFrame, state_column: str, analysis_name: str):
    """
    Perform comprehensive analysis of a Markov chain.
    
    Args:
        df (pd.DataFrame): Earthquake data
        state_column (str): Column name containing states
        analysis_name (str): Name for the analysis (for printing)
    """
    print(f"\n{'='*60}")
    print(f"MARKOV CHAIN ANALYSIS: {analysis_name}")
    print(f"{'='*60}")
    
    # Extract state sequence
    states = df[state_column].tolist()
    clean_states = [s for s in states if s != 'Unknown']
    
    print(f"Total events: {len(df)}")
    print(f"Valid states for analysis: {len(clean_states)}")
    
    if len(clean_states) < 2:
        print("Insufficient data for Markov chain analysis.")
        return None, None, None
    
    # Build transition matrices
    try:
        counts_array, probs_df, counts_df = compute_transition_matrix(clean_states)
        state_order = probs_df.index.tolist()
        
        print(f"\nState distribution:")
        state_counts = pd.Series(clean_states).value_counts()
        for state in state_order:
            count = state_counts.get(state, 0)
            pct = (count / len(clean_states)) * 100
            print(f"  {state}: {count} ({pct:.1f}%)")
        
        print(f"\nTransition Count Matrix:")
        print(counts_df)
        
        print(f"\nTransition Probability Matrix:")
        print(probs_df.round(4))
        
        # Analyze interesting patterns
        print(f"\nKey Transition Probabilities:")
        for from_state in state_order:
            for to_state in state_order:
                prob = probs_df.loc[from_state, to_state]
                if prob > 0.1:  # Only show significant probabilities
                    print(f"  P({from_state} -> {to_state}) = {prob:.3f}")
        
        # Find most likely transitions from each state
        print(f"\nMost Likely Next State from Each State:")
        for state in state_order:
            next_state = probs_df.loc[state].idxmax()
            prob = probs_df.loc[state, next_state]
            print(f"  From {state}: most likely -> {next_state} (p={prob:.3f})")
        
        # N-step analysis
        print(f"\nMulti-Step Transition Analysis:")
        steps = [2, 5, 10]
        
        for n in steps:
            print(f"\n{n}-step transitions:")
            P_n = n_step_transition_matrix(probs_df.values, n)
            
            # Show some example transitions
            for i, from_state in enumerate(state_order[:3]):  # Limit to first 3 states
                for j, to_state in enumerate(state_order):
                    prob_n = P_n[i, j]
                    if prob_n > 0.05:  # Only significant probabilities
                        prob_1 = probs_df.iloc[i, j]
                        print(f"    P({from_state} -> {to_state} in {n} steps) = {prob_n:.3f} (1-step: {prob_1:.3f})")
        
        return counts_array, probs_df, counts_df
        
    except Exception as e:
        print(f"Error in analysis: {e}")
        return None, None, None


def print_model_assumptions():
    """
    Print key assumptions and limitations of the Markov chain model.
    """
    print("\n" + "="*60)
    print("MODEL ASSUMPTIONS AND LIMITATIONS")
    print("="*60)
    print("""
    ASSUMPTIONS:
    1. First-order Markov property: Next state depends only on current state
    2. Time-homogeneous: Transition probabilities are constant over time
    3. Discrete states: Earthquakes classified into finite categories
    4. Sequential dependence: Order of earthquakes matters
    
    LIMITATIONS:
    1. No temporal spacing: Time between events is not modeled
    2. No spatial correlation beyond the coarse regional binning
    3. No "no-earthquake" states: Only transitions between events
    4. Assumes stationarity: Real seismic patterns may change over time
    5. Limited by historical data: May not capture rare large events
    
    APPLICATIONS:
    - Understanding general earthquake patterns
    - Baseline for more complex models (HMM, PSHA)
    - Educational tool for seismic sequence analysis
    - Not suitable for precise earthquake prediction
    """)

print("Analysis functions defined successfully.")

Analysis functions defined successfully.
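The multi-step section of `analyze_markov_chain` relies on $P^n$ computed with `np.linalg.matrix_power`. A two-state chain gives a quick sanity check, because its $n$-step probabilities have a closed form: for $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$ with $a + b > 0$, $(P^n)_{00} = \frac{b}{a+b} + \frac{a}{a+b}(1-a-b)^n$. The values of `a` and `b` below are arbitrary, not estimated from the catalog:

```python
import numpy as np

a, b = 0.3, 0.1  # arbitrary example rates, not estimated from the catalog
P = np.array([[1 - a, a],
              [b, 1 - b]])

for n in [1, 2, 5, 10]:
    P_n = np.linalg.matrix_power(P, n)
    # Closed form for the top-left entry of P^n
    closed_form = b / (a + b) + a / (a + b) * (1 - a - b) ** n
    print(f"n={n:2d}  matrix_power={P_n[0, 0]:.6f}  closed_form={closed_form:.6f}")
```

As $n$ grows, $(1-a-b)^n$ vanishes and every row approaches $[b/(a+b),\ a/(a+b)] = [0.25,\ 0.75]$. For a chain that mixes quickly, the rows of the 10-step matrix printed by the analysis will likewise tend to look alike.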


## Main Execution

**Note**: Before running this section, make sure to:
1. Download the dataset from: https://www.kaggle.com/datasets/kekavigi/earthquakes-in-indonesia
2. Update the `DATA_PATH` variable above to point to your local file
3. Ensure the file is in TSV (tab-separated) format

In [21]:
# Main execution function
def main():
    """
    Main analysis pipeline for Indonesian earthquake Markov chain model.
    """
    try:
        # Load and preprocess data
        print("Starting Indonesian Earthquake Markov Chain Analysis")
        print("=" * 60)
        
        df = load_and_preprocess_data(DATA_PATH)
        
        if df is None or len(df) == 0:
            print("Failed to load data. Please check the file path and format.")
            return
        
        print(f"\nSuccessfully loaded {len(df)} earthquake events")
        print(f"Date range: {df['event_time'].min()} to {df['event_time'].max()}")
        
        # Analyze magnitude-only Markov chain
        print("\n\n" + "#" * 80)
        print("ANALYSIS 1: MAGNITUDE-ONLY STATES")
        print("#" * 80)
        
        mag_counts, mag_probs, mag_counts_df = analyze_markov_chain(
            df, 'state_mag', 'Magnitude-Only States'
        )
        
        # Analyze combined magnitude+region Markov chain
        print("\n\n" + "#" * 80)
        print("ANALYSIS 2: MAGNITUDE + REGION STATES")
        print("#" * 80)
        
        region_counts, region_probs, region_counts_df = analyze_markov_chain(
            df, 'state_region_mag', 'Magnitude + Region States'
        )
        
        # Print model assumptions
        print_model_assumptions()
        
        # Summary insights
        print("\n" + "="*60)
        print("SUMMARY INSIGHTS")
        print("="*60)
        
        if mag_probs is not None:
            # Find highest probability transitions
            print("\nStrongest transition to a different state (magnitude-only):")
            max_prob = 0
            best_transition = None
            
            for i, from_state in enumerate(mag_probs.index):
                for j, to_state in enumerate(mag_probs.columns):
                    if i != j:  # Exclude self-transitions
                        prob = mag_probs.iloc[i, j]
                        if prob > max_prob:
                            max_prob = prob
                            best_transition = (from_state, to_state)
            
            if best_transition:
                print(f"  Strongest transition: {best_transition[0]} -> {best_transition[1]} (p={max_prob:.3f})")
            
            # Analyze persistence
            print("\nState persistence (probability of staying in same state):")
            for state in mag_probs.index:
                self_prob = mag_probs.loc[state, state]
                print(f"  P({state} -> {state}) = {self_prob:.3f}")
        
        print("\nAnalysis completed successfully!")
        
        # Return results for further analysis if needed
        return {
            'data': df,
            'magnitude_analysis': (mag_counts, mag_probs, mag_counts_df),
            'region_analysis': (region_counts, region_probs, region_counts_df)
        }
        
    except Exception as e:
        print(f"Error in main execution: {e}")
        import traceback
        traceback.print_exc()
        return None


# Display instructions
print("""READY TO RUN ANALYSIS!

To execute the complete analysis:
1. Ensure DATA_PATH points to your earthquake dataset
2. Run: results = main()

The main() function will:
- Load and preprocess the earthquake data
- Build magnitude-only Markov chain
- Build magnitude+region Markov chain  
- Compute transition probabilities
- Display multi-step transition analysis
- Show model assumptions and limitations
""")

READY TO RUN ANALYSIS!

To execute the complete analysis:
1. Ensure DATA_PATH points to your earthquake dataset
2. Run: results = main()

The main() function will:
- Load and preprocess the earthquake data
- Build magnitude-only Markov chain
- Build magnitude+region Markov chain  
- Compute transition probabilities
- Display multi-step transition analysis
- Show model assumptions and limitations
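The summary in `main()` scans the magnitude matrix with a double loop to find the largest off-diagonal probability. A vectorized sketch of the same search masks the diagonal and uses `np.argmax`; the helper name and the example matrix values are made up for illustration. Ties resolve to the first entry in row-major order, matching the loop's strict `>` comparison, but unlike the loop it still returns an entry when every off-diagonal probability is zero:

```python
import numpy as np
import pandas as pd


def strongest_off_diagonal(probs_df: pd.DataFrame):
    """Return (from_state, to_state, probability) for the largest off-diagonal entry (hypothetical helper)."""
    values = probs_df.to_numpy(dtype=float).copy()
    np.fill_diagonal(values, -np.inf)  # exclude self-transitions, as main() does
    i, j = np.unravel_index(np.argmax(values), values.shape)
    return probs_df.index[i], probs_df.columns[j], values[i, j]


# Invented example matrix (rows sum to 1); not results from the catalog
example = pd.DataFrame(
    [[0.10, 0.30, 0.60],
     [0.05, 0.37, 0.58],
     [0.04, 0.26, 0.70]],
    index=["Large", "Medium", "Small"],
    columns=["Large", "Medium", "Small"],
)
print(strongest_off_diagonal(example))
```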



## Demo with Sample Data

If you don't have the dataset yet, here's a demonstration with synthetic data to test the functions:

In [22]:
def create_sample_data(n_events: int = 1000) -> pd.DataFrame:
    """
    Create synthetic earthquake data for testing purposes.
    
    Args:
        n_events (int): Number of synthetic events to generate
        
    Returns:
        pd.DataFrame: Synthetic earthquake data
    """
    np.random.seed(42)  # For reproducible results
    
    # Generate synthetic data with realistic patterns
    # Small earthquakes are most common, large ones are rare
    magnitude_probs = [0.7, 0.25, 0.05]  # Small, Medium, Large
    magnitudes = np.random.choice(['Small', 'Medium', 'Large'], 
                                  n_events, p=magnitude_probs)
    
    # Convert to numeric magnitudes
    mag_mapping = {'Small': (2.0, 4.0), 'Medium': (4.0, 6.0), 'Large': (6.0, 8.0)}
    numeric_mags = []
    for mag_type in magnitudes:
        min_mag, max_mag = mag_mapping[mag_type]
        numeric_mags.append(np.random.uniform(min_mag, max_mag))
    
    # Generate coordinates within Indonesia bounds
    lats = np.random.uniform(-12, 6, n_events)
    lons = np.random.uniform(90, 150, n_events)
    
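    # Generate time sequence: cumulative sum of exponential inter-event gaps (in days)
    start_date = pd.Timestamp('2000-01-01')
    time_deltas = np.random.exponential(1, n_events)  # Random intervals
    cumulative_days = np.cumsum(time_deltas)  # avoids re-summing the prefix for every event
    event_times = [start_date + pd.Timedelta(days=d) for d in cumulative_days]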
    
    # Create DataFrame
    df = pd.DataFrame({
        'event_time': event_times,
        'lat': lats,
        'lon': lons,
        'mag': numeric_mags,
        'depth': np.random.uniform(10, 200, n_events)
    })
    
    # Add state columns
    df['state_mag'] = df['mag'].apply(assign_magnitude_state)
    df['state_region_mag'] = df.apply(build_combined_state, axis=1)
    
    return df


def run_demo():
    """
    Run a demonstration with synthetic data.
    """
    print("RUNNING DEMONSTRATION WITH SYNTHETIC DATA")
    print("="*60)
    
    # Create sample data
    demo_df = create_sample_data(500)  # 500 events for demo
    
    print(f"Created {len(demo_df)} synthetic earthquake events")
    print("\nSample data:")
    print(demo_df[['event_time', 'lat', 'lon', 'mag', 'state_mag', 'state_region_mag']].head(10))
    
    # Run analysis on synthetic data
    mag_counts, mag_probs, mag_counts_df = analyze_markov_chain(
        demo_df, 'state_mag', 'DEMO: Magnitude States'
    )
    
    if mag_probs is not None:
        # Test n-step transitions
        print("\nTesting n-step transition functions:")
        state_order = mag_probs.index.tolist()
        
        # Example: probability of going from Small to Large in various steps
        for n in [1, 3, 5]:
            try:
                prob = n_step_transition_probability(
                    mag_probs.values, state_order, 'Small', 'Large', n
                )
                print(f"P(Small -> Large in {n} steps) = {prob:.4f}")
            except Exception as e:
                print(f"Error computing {n}-step transition: {e}")
    
    print("\nDemo completed successfully!")
    return demo_df

# Run the demo
demo_results = run_demo()

RUNNING DEMONSTRATION WITH SYNTHETIC DATA
Created 500 synthetic earthquake events

Sample data:
                     event_time       lat         lon       mag state_mag  \
0 2000-01-01 07:16:54.859989191 -8.667607  121.144907  3.396323     Small   
1 2000-01-01 14:05:23.247829142 -2.245783  118.750913  7.072193     Large   
2 2000-01-03 22:54:06.949601431  3.713025   91.538524  4.619055    Medium   
3 2000-01-04 05:47:30.418693327  1.180048  110.474870  3.627590     Small   
4 2000-01-04 13:24:32.497869849  2.518101  112.811737  3.369462     Small   
5 2000-01-05 23:35:58.996580644 -0.141899  113.929367  2.325234     Small   
6 2000-01-06 13:56:11.254722585  0.460978  124.810342  3.821854     Small   
7 2000-01-08 01:55:09.599532116  3.285522  122.016153  5.645074    Medium   
8 2000-01-08 03:32:30.277441146 -7.505976  126.474306  3.899600     Small   
9 2000-01-08 19:35:16.755158909 -3.190351  135.892996  5.451439    Medium   

  state_region_mag  
0         R2_Small  
1         R5_L
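`create_sample_data` draws each event's magnitude class independently (70% Small, 25% Medium, 5% Large), so the synthetic sequence has no memory: every row of its estimated transition matrix should sit near those marginal frequencies, and any apparent structure in the demo's matrix is sampling noise. The sketch below checks that property on a longer i.i.d. sequence, using `np.random.default_rng` and `pd.crosstab` rather than the notebook's functions, so its numbers will not match the demo output:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
labels = np.array(["Small", "Medium", "Large"])
seq = rng.choice(labels, size=100_000, p=[0.70, 0.25, 0.05])

# Count consecutive pairs and row-normalize (same estimator as compute_transition_matrix)
counts = pd.crosstab(pd.Series(seq[:-1], name="from"), pd.Series(seq[1:], name="to"))
probs = counts.div(counts.sum(axis=1), axis=0)
print(probs.round(3))
```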

## Run Analysis on Real Data

Once you have downloaded and set up the real earthquake dataset, run this cell to perform the complete analysis:

In [24]:
# Execute the complete analysis on real earthquake data
# Uncomment and run this when you have the real dataset

# results = main()

# If successful, results will contain:
# - results['data']: processed DataFrame
# - results['magnitude_analysis']: transition matrices for magnitude-only states  
# - results['region_analysis']: transition matrices for magnitude+region states

print("Ready to analyze real earthquake data!")
print("Steps to run real analysis:")
print("1. Download dataset from: https://www.kaggle.com/datasets/kekavigi/earthquakes-in-indonesia")
print("2. Update DATA_PATH variable to point to your local file")
print("3. Uncomment and run: results = main()")
print("\nAlternatively, test individual functions:")
print("- load_and_preprocess_data(DATA_PATH)")
print("- analyze_markov_chain(df, 'state_mag', 'Magnitude Analysis')")

Ready to analyze real earthquake data!
Steps to run real analysis:
1. Download dataset from: https://www.kaggle.com/datasets/kekavigi/earthquakes-in-indonesia
2. Update DATA_PATH variable to point to your local file
3. Uncomment and run: results = main()

Alternatively, test individual functions:
- load_and_preprocess_data(DATA_PATH)
- analyze_markov_chain(df, 'state_mag', 'Magnitude Analysis')

## Additional Utilities

Helpful functions for further analysis and exploration:

In [25]:
def save_results_to_file(results: dict, filename: str = "markov_analysis_results.txt"):
    """
    Save analysis results to a text file for future reference.
    
    Args:
        results (dict): Results from main() function
        filename (str): Output filename
    """
    if results is None:
        print("No results to save.")
        return
    
    with open(filename, 'w') as f:
        f.write("Indonesian Earthquake Markov Chain Analysis Results\n")
        f.write("=" * 60 + "\n\n")
        
        df = results['data']
        f.write(f"Dataset Summary:\n")
        f.write(f"Total events: {len(df)}\n")
        f.write(f"Date range: {df['event_time'].min()} to {df['event_time'].max()}\n\n")
        
        # Magnitude analysis
        if results['magnitude_analysis'][1] is not None:
            _, mag_probs, _ = results['magnitude_analysis']
            f.write("Magnitude-Only State Transition Probabilities:\n")
            f.write(str(mag_probs.round(4)))
            f.write("\n\n")
        
        # Region analysis  
        if results['region_analysis'][1] is not None:
            _, region_probs, _ = results['region_analysis']
            f.write("Magnitude+Region State Transition Probabilities:\n")
            f.write(str(region_probs.round(4)))
    
    print(f"Results saved to {filename}")


def compute_steady_state(P: np.ndarray, tolerance: float = 1e-10, max_iterations: int = 1000) -> np.ndarray:
    """
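    Compute the steady-state distribution of a Markov chain by power iteration.
    
    Starts from the uniform distribution and repeatedly applies pi <- pi @ P.
    Periodic chains can oscillate instead of converging; in that case the last
    iterate is returned after a warning.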
    
    Args:
        P (np.ndarray): Transition probability matrix
        tolerance (float): Convergence tolerance
        max_iterations (int): Maximum number of iterations
        
    Returns:
        np.ndarray: Steady-state probability distribution
    """
    n = P.shape[0]
    # Start with uniform distribution
    pi = np.ones(n) / n
    
    for i in range(max_iterations):
        pi_new = pi @ P
        # rtol=0 so that `tolerance` is the actual stopping threshold (default rtol is 1e-5)
        if np.allclose(pi, pi_new, rtol=0.0, atol=tolerance):
            return pi_new
        pi = pi_new
    
    print(f"Warning: Steady state did not converge after {max_iterations} iterations")
    return pi


def analyze_steady_state(probs_df: pd.DataFrame, state_name: str):
    """
    Analyze the steady-state distribution of a Markov chain.
    
    Args:
        probs_df (pd.DataFrame): Transition probability matrix
        state_name (str): Name of the state system for reporting
    """
    print(f"\nSteady-State Analysis for {state_name}:")
    print("-" * 40)
    
    steady_state = compute_steady_state(probs_df.values)
    
    print("Long-run state probabilities:")
    for i, state in enumerate(probs_df.index):
        print(f"  π({state}) = {steady_state[i]:.4f}")
    
    # Find most probable long-term state
    max_idx = np.argmax(steady_state)
    max_state = probs_df.index[max_idx]
    print(f"\nMost probable long-term state: {max_state} ({steady_state[max_idx]:.4f})")


# Example usage functions
print("Additional utility functions defined:")
print("- save_results_to_file(results, filename)")
print("- compute_steady_state(P)")
print("- analyze_steady_state(probs_df, state_name)")
print("\nExample usage after running main():")
print("# results = main()")
print("# save_results_to_file(results)")
print("# analyze_steady_state(results['magnitude_analysis'][1], 'Magnitude States')")

Additional utility functions defined:
- save_results_to_file(results, filename)
- compute_steady_state(P)
- analyze_steady_state(probs_df, state_name)

Example usage after running main():
# results = main()
# save_results_to_file(results)
# analyze_steady_state(results['magnitude_analysis'][1], 'Magnitude States')
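`compute_steady_state` uses power iteration from the uniform distribution, which can oscillate forever on a periodic chain. An alternative sketch solves the stationarity equations $\pi P = \pi$, $\sum_i \pi_i = 1$ directly as a least-squares system. It assumes the chain has a unique stationary distribution (for example, an irreducible chain), and `steady_state_linear` is a hypothetical name:

```python
import numpy as np


def steady_state_linear(P: np.ndarray) -> np.ndarray:
    """Solve pi @ P = pi with sum(pi) = 1 for a row-stochastic matrix P (hypothetical helper)."""
    n = P.shape[0]
    # Stationarity (P^T - I) pi = 0, plus one extra row enforcing sum(pi) = 1
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi


# A periodic chain: states 0 and 2 always lead to 1, and 1 leads to 0 or 2.
# Power iteration from the uniform start alternates between two vectors here.
P_periodic = np.array([[0.0, 1.0, 0.0],
                       [0.5, 0.0, 0.5],
                       [0.0, 1.0, 0.0]])
print(steady_state_linear(P_periodic).round(4))
```

On notebook results it can be called as `steady_state_linear(results['magnitude_analysis'][1].values)`. If `compute_transition_matrix` filled an empty row with uniform probabilities, the resulting distribution partly reflects that modeling choice rather than the data.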
