<a href="https://colab.research.google.com/github/ch192703/MLFinalProject2024/blob/main/IDS6938_DigitalTwin_Earthquake_v5_lynn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Earthquake Prediction Pipeline Documentation**


## **Overview**

The Earthquake Prediction Pipeline is a comprehensive system that automates the collection, processing, and analysis of USGS earthquake data to predict future seismic activity. The pipeline implements a transformer-based model that learns from historical patterns to predict the number of earthquakes likely to occur in the next 24-hour period.

### **Key Features**

*   Automated USGS data collection and processing
*   Daily data segmentation and storage
*   Transformer-based sequence modeling
*   Continuous prediction and evaluation
*   Automated model optimization
*   Performance visualization and tracking
*   Modular architecture with comprehensive error handling



### **System Requirements**

Python 3.x w/Required Libraries:

*   pandas
*   numpy
*   torch (PyTorch)
*   requests
*   matplotlib
*   seaborn
*   scikit-learn


### **Directory Structure**


```
/earthquake_data/
├── data/             # Raw daily earthquake data
│   └── YYYY-MM/      # Organized by year-month
├── models/           # Saved model checkpoints
├── predictions/      # Daily prediction outputs
├── plots/           # Performance visualizations
└── evaluations/     # Evaluation metrics
```

### **Core Components**

1. Data Collection and Processing

*   USGS API Integration: Automated fetching of earthquake data
*   Data Filtering: Configurable magnitude threshold (default: 2.5)
*   Data Storage: Daily CSV files with comprehensive metadata
*   Feature Extraction: Geographic and seismic parameters

2. Model Architecture

*   Type: Transformer-based sequence model

*   Components:
 *   Input projection layer
 *   Positional encoding
 *   Multi-head attention layers
 *   Feed-forward networks
 *   Output projection layer

*   Parameters:
 *   Sequence Length: Configurable (default: 7 days)
 *   Hidden Dimensions: 64
 *   Number of Layers: 2
 *   Attention Heads: 4

3. Training Pipeline
*    **Baseline Training**

 1. Historical Data Processing

  *   Fetches specified number of days (default: 31)
  *   Splits data into daily segments
  *   Creates initial training sequences

 2. Model Training

  *   Sequences created from historical data
  *   Loss function: Mean Squared Error
  *   Optimizer: Adam
  *   Checkpoint saving based on performance

 3. Evaluation

  *   Daily prediction accuracy
  *   Error metrics calculation
  *   Performance visualization
  *   Metadata tracking

*    **Continuous Monitoring**

 1. Automated Data Collection

  *   Configurable update interval (default: 1 hour)
  *   Real-time USGS data integration

 2. Prediction Generation

  *   Daily earthquake count predictions
  *   Confidence interval calculation
  *   Prediction storage and tracking

 3. Model Optimization

  *   Performance evaluation against actual data
  *   Incremental model updates
  *   Automated checkpoint management

4. Performance Metrics

  *   Prediction Error (absolute and relative)
  *   Confidence Interval Coverage
  *   Standard Deviation Analysis
  *   Visualization of Trends

### **File Naming Conventions**

*   Data Files: *earthquake_data_YYYY-MM-DD.csv*
*   Predictions: *predictions_YYYY-MM-DD.csv*
*   Model Checkpoints: *model_checkpoint_YYYYMMDD_HHMMSS.pth*
*   Visualizations: *performance_Ndays_YYYYMMDD_HHMMSS.png*
*   Evaluations: *evaluation_YYYY-MM-DD.json*


### **Usage Examples**
##### **Initialize Pipeline**
```
pipeline = EarthquakePipeline(drive_path='/path/to/base/directory')
```
##### **Run Baseline Training**
```
pipeline.run_baseline_training(days_to_process=31)
```
##### **Start Continuous Monitoring**
```pipeline.run_continuous_monitoring(update_interval=3600)  # 1 hour interval
```

### **Future Enhancements**

*   Integration with additional data sources
*   Enhanced feature engineering
*   Advanced visualization capabilities
*   Automated parameter optimization
*   Real-time alerting system
*   Web interface for monitoring

### **Model Misc Info**
*   Authors: Stephen Moore, Steven Willhelm, Lynn Yingling
*   Version: 4.0
*   Last Updated: 19 November 2024

## 1.Imports

In [42]:
# 1. Required imports
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
from datetime import datetime, timedelta
import requests
import os
import json
import glob
import time
from sklearn.preprocessing import MinMaxScaler
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 2.Create Drive Directory
**Lynn:** Made a few changes here for one key reason: to support regional data organization. Since we're moving from a global to a regional approach, we need separate subdirectories for each seismic region's data, models, predictions, and plots. This organizational structure allows us to:

* Keep each region's data separate and organized
* Store region-specific models and predictions
* Manage regional visualizations independently

The functionality is otherwise unchanged - it still mounts Google Drive and creates the base directory structure, just with added regional subdivisions.

In [2]:
# 2. Create Drive Directory
def setup_drive_directory(base_path='earthquake_data'):
    """
    Mount Google Drive and create necessary directories including regional subdirectories.

    Args:
        base_path (str): Base directory name for earthquake data

    Returns:
        str: Full path to the created directory

    Creates directory structure:
    /base_path/
    ├── data/
    │   ├── pacific_northwest/
    │   ├── california/
    │   ├── alaska/
    │   ├── hawaii/
    │   └── central_us/
    ├── models/
    │   ├── pacific_northwest/
    │   ├── california/
    │   ├── alaska/
    │   ├── hawaii/
    │   └── central_us/
    ├── predictions/
    │   ├── pacific_northwest/
    │   ├── california/
    │   ├── alaska/
    │   ├── hawaii/
    │   └── central_us/
    └── plots/
        ├── pacific_northwest/
        ├── california/
        ├── alaska/
        ├── hawaii/
        └── central_us/
    """
    # Mount Google Drive
    drive.mount('/content/drive')

    # Create base directory path
    full_path = f'/content/drive/My Drive/{base_path}'

    # Create main directories
    subdirs = ['data', 'models', 'predictions', 'plots']

    # Create base directories
    for subdir in subdirs:
        dir_path = os.path.join(full_path, subdir)
        if not os.path.exists(dir_path):
            os.makedirs(dir_path)
            print(f"Created directory: {dir_path}")

        # Create regional subdirectories
        for region in SEISMIC_REGIONS.keys():
            region_path = os.path.join(dir_path, region)
            if not os.path.exists(region_path):
                os.makedirs(region_path)
                print(f"Created regional directory: {region_path}")

    print(f"Directory setup complete at: {full_path}")
    return full_path

## 3.Gather USGS Data
**Lynn:** The revisions to Chunk 3 were made purely for documentation clarity - to highlight that the latitude/longitude data will be used for regional assignment later in the pipeline. No functional changes were needed since the raw data collection requirements remain the same regardless of regional processing.

In [3]:
# 3. Gather USGS Data
def fetch_earthquake_data(self, start_time=None, end_time=None, min_magnitude=2.5):
    """
    Fetch earthquake data from USGS API for a specified time period.
    Returns data suitable for regional processing.

    Args:
        start_time (datetime): Start date for data collection. Defaults to yesterday if None.
        end_time (datetime): End date for data collection. Defaults to today if None.
        min_magnitude (float): Minimum earthquake magnitude to include (default: 2.5)

    Returns:
        pandas.DataFrame: DataFrame containing earthquake data with columns:
            - time: Timestamp of earthquake occurrence
            - magnitude: Earthquake magnitude
            - place: Location description
            - longitude: Geographic longitude
            - latitude: Geographic latitude (needed for regional assignment)
            - depth: Depth in kilometers
            - type: Event type
            - alert: Alert level (if any)
            - tsunami: Tsunami warning flag
            - sig: Significance value

    Raises:
        requests.RequestException: If API request fails
        ValueError: If date parameters are invalid
    """
    try:
        base_url = "https://earthquake.usgs.gov/fdsnws/event/1/query"

        if start_time is None:
            start_time = datetime.now() - timedelta(days=1)

        if end_time is None:
            end_time = datetime.now()

        params = {
            'format': 'geojson',
            'starttime': start_time.strftime('%Y-%m-%d'),
            'endtime': (end_time + timedelta(days=1)).strftime('%Y-%m-%d'),
            'minmagnitude': min_magnitude,
            'orderby': 'time'
        }

        print(f"Fetching data from {params['starttime']} to {params['endtime']}")

        response = requests.get(base_url, params=params)
        response.raise_for_status()

        data = response.json()
        earthquakes = data['features']

        processed_data = []
        for quake in earthquakes:
            properties = quake['properties']
            coordinates = quake['geometry']['coordinates']

            processed_data.append({
                'time': datetime.fromtimestamp(properties['time'] / 1000),
                'magnitude': properties['mag'],
                'place': properties['place'],
                'longitude': coordinates[0],
                'latitude': coordinates[1],
                'depth': coordinates[2],
                'type': properties['type'],
                'alert': properties.get('alert', 'none'),
                'tsunami': properties['tsunami'],
                'sig': properties['sig']
            })

        df = pd.DataFrame(processed_data)

        if len(df) > 0:
            print("\nData Collection Summary:")
            print("-" * 30)
            print(f"Total earthquakes collected: {len(df)}")
            print(f"Date range: {df['time'].min()} to {df['time'].max()}")
            print(f"Magnitude range: {df['magnitude'].min():.1f} to {df['magnitude'].max():.1f}")
            print("-" * 30)

        return df

    except Exception as e:
        print(f"Error fetching data: {e}")
        return None

## 4.Fetch Training Data
**Lynn:** Revisions were made to:

* Add regional data separation by assigning each earthquake to its appropriate region
* Save data in region-specific files rather than one global file
* Support the regional training structure that will be used later in the pipeline

In [4]:
# 4. Fetch training data
def fetch_training_data(self, start_date, end_date):
    """Fetch training data for specified date range and organize by region"""
    df = self.fetch_earthquake_data(
        start_time=start_date,
        end_time=end_date,
        min_magnitude=2.5
    )

    if df is not None:
        # Add region assignment to data
        df['region'] = df.apply(
            lambda row: self.assign_region(row['latitude'], row['longitude']),
            axis=1
        )

        # Create separate files for each region
        for region in SEISMIC_REGIONS.keys():
            region_data = df[df['region'] == region]
            if len(region_data) > 0:
                filename = f'{region}_data_{start_date.strftime("%Y%m%d")}_to_{end_date.strftime("%Y%m%d")}.csv'
                filepath = os.path.join(self.drive_path, 'data', region, filename)
                region_data.to_csv(filepath, index=False)

        return df, filepath
    return None, None

## 5.Fetch New Data
**Lynn:** Revisions were made to:

* Add regional assignment to new data
* Save data separately by region instead of in one file
* Return a dictionary of filepaths organized by region instead of a single filepath

In [5]:
# 5. Fetch new data
def fetch_new_data(self, last_timestamp):
    """Fetch and organize new earthquake data by region since last timestamp"""
    df = self.fetch_earthquake_data(
        start_time=last_timestamp,
        end_time=datetime.now(),
        min_magnitude=2.5
    )

    if df is not None:
        # Filter to only new events and assign regions
        new_data = df[df['time'] > last_timestamp]
        new_data['region'] = new_data.apply(
            lambda row: self.assign_region(row['latitude'], row['longitude']),
            axis=1
        )

        if len(new_data) > 0:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

            # Save data by region
            regional_files = {}
            for region in SEISMIC_REGIONS.keys():
                region_data = new_data[new_data['region'] == region]
                if len(region_data) > 0:
                    filename = f'{region}_new_data_{timestamp}.csv'
                    filepath = os.path.join(self.dirs['data'], region, filename)
                    region_data.to_csv(filepath, index=False)
                    regional_files[region] = filepath

            return new_data, regional_files
    return None, None

## 6.Create Data Structure for Transformer

**Lynn:** Changes made:

* Added 'region' parameter to track data source
* Added explanatory comments for each line of code
* Maintained comprehensive docstrings
* Updated example to show regional usage
* Kept core functionality intact

In [6]:
# 6. Create Data Structure for Transformer
class EarthquakeDataset(Dataset):
    """
    Custom dataset for handling earthquake sequence data.
    This dataset creates sequences of earthquake data for training the transformer
    model, where each sequence consists of multiple days of data points.

    Args:
        features (torch.Tensor): Input features for each earthquake event
        targets (torch.Tensor): Target values for prediction
        seq_length (int): Number of days in each sequence
        region (str): Identifier for the seismic region this data represents

    Attributes:
        features (torch.Tensor): Storage for input features
        targets (torch.Tensor): Storage for target values
        seq_length (int): Length of each sequence
        region (str): Region identifier for tracking and analysis

    Methods:
        __len__: Returns the number of sequences in the dataset
        __getitem__: Returns a sequence and its corresponding target

    Example:
        >>> features = torch.randn(100, 5)  # 100 events with 5 features each
        >>> targets = torch.randn(100, 1)   # Target count for each event
        >>> dataset = EarthquakeDataset(features, targets, seq_length=7, region='california')
        >>> sequence, target = dataset[0]  # Get first sequence and its target
    """
    def __init__(self, features, targets, seq_length, region):
        """
        Initialize the dataset with features, targets, sequence length, and region.

        Args:
            features (torch.Tensor): Input features for each earthquake event
            targets (torch.Tensor): Target values for prediction
            seq_length (int): Number of days to include in each sequence
            region (str): Identifier for the seismic region
        """
        # Store the input features tensor for sequence creation
        self.features = features
        # Store the target values tensor for prediction
        self.targets = targets
        # Store sequence length for windowing the data
        self.seq_length = seq_length
        # Store region identifier for tracking and analysis
        self.region = region

    def __len__(self):
        """
        Return the number of possible sequences in the dataset.
        Accounts for sequence length when calculating available sequences.
        """
        # Calculate maximum number of sequences possible given the data length and sequence length
        return max(0, len(self.features) - self.seq_length)

    def __getitem__(self, idx):
        """
        Get a sequence of features and its corresponding target.

        Args:
            idx (int): Index of the sequence to retrieve

        Returns:
            tuple: (feature_sequence, target) where feature_sequence is a sequence of
                  'seq_length' days of data and target is the next day's parameters
        """
        # Extract sequence of features starting at index
        feature_seq = self.features[idx:idx + self.seq_length]
        # Get corresponding target value
        target = self.targets[idx + self.seq_length - 1]

        return feature_seq, target

## 7.Create Transformer

**Lynn:** Changes made:

* Added clear code comments for each operation
* Added dropout parameter with default value
* Updated documentation to reflect regional context
* Maintained comprehensive docstrings
* Kept core architecture unchanged as it works for both global and regional predictions

In [7]:
# 7. Create Transformer
class TransformerPredictor(nn.Module):
    """
    Transformer-based model for regional earthquake count prediction.

    This model uses a transformer architecture to learn temporal patterns in
    earthquake sequences and predict future occurrence counts for specific regions.

    Architecture:
        - Input projection layer
        - Positional encoding
        - Transformer encoder layers
        - Output projection layers

    Args:
        input_dim (int): Dimension of input features
        hidden_dim (int): Dimension of hidden layers
        num_layers (int): Number of transformer layers
        num_heads (int): Number of attention heads
        max_seq_length (int): Maximum sequence length (default: 7)
        dropout (float): Dropout rate (default: 0.1)

    Attributes:
        hidden_dim (int): Dimension of hidden layers
        input_projection (nn.Linear): Input projection layer
        pos_encoding (nn.Parameter): Positional encoding
        transformer (nn.TransformerEncoder): Transformer encoder
        output_projection (nn.Sequential): Output projection layers

    Example:
        >>> model = TransformerPredictor(
                input_dim=1,
                hidden_dim=64,
                num_layers=2,
                num_heads=4
            )
        >>> input_sequence = torch.randn(32, 7, 1)  # (batch_size, seq_length, features)
        >>> predictions = model(input_sequence)
    """
    def __init__(self, input_dim, hidden_dim, num_layers, num_heads, max_seq_length=7, dropout=0.1):
        super().__init__()
        # Store hidden dimension for use in forward pass
        self.hidden_dim = hidden_dim

        # Project input features to hidden dimension space
        self.input_projection = nn.Linear(input_dim, hidden_dim)

        # Create learnable positional encoding for sequence positions
        self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_length, hidden_dim))

        # Create transformer encoder layer with specified parameters
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim*4,
            dropout=dropout,
            batch_first=True
        )

        # Stack multiple encoder layers
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)

        # Project transformer output to prediction space
        self.output_projection = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, 1)
        )

    def forward(self, x):
        """
        Process input sequence through transformer model.

        Args:
            x (torch.Tensor): Input tensor of shape (batch_size, seq_length, input_dim)

        Returns:
            torch.Tensor: Predictions of shape (batch_size, 1)
        """
        # Ensure input has correct dimensionality
        if len(x.shape) == 2:
            x = x.unsqueeze(-1)

        # Project input to hidden dimension
        x = self.input_projection(x)

        # Add positional encoding to input
        x = x + self.pos_encoding[:, :x.size(1)]

        # Apply transformer layers
        x = self.transformer(x)

        # Take final sequence element for prediction
        x = x[:, -1]

        # Project to output dimension
        x = self.output_projection(x)

        return x

## 8.Regional Pipeline

In [50]:
# 8. Create Regional Pipeline (Lynn)
# Part 1 (Initialization))


class RegionalEarthquakePipeline:
    """
    Enhanced earthquake prediction pipeline with regional prediction capabilities.
    Implements region-based data collection, processing, and model management.
    """
    def __init__(self, drive_path, seq_length=7, prediction_horizon=1):
        """Initialize regional earthquake pipeline."""
        # Store basic configuration
        self.drive_path = drive_path
        self.seq_length = seq_length
        self.prediction_horizon = prediction_horizon

        # Set up regional components
        self.regions = SEISMIC_REGIONS
        self.regional_models = {}
        self.regional_scalers = {}
        self.regional_performance_history = {region: [] for region in SEISMIC_REGIONS.keys()}

        # Create directory structure
        self.dirs = {
            'data': os.path.join(drive_path, 'data'),
            'models': os.path.join(drive_path, 'models'),
            'predictions': os.path.join(drive_path, 'predictions'),
            'plots': os.path.join(drive_path, 'plots'),
            'evaluations': os.path.join(drive_path, 'evaluations')
        }

        # Create regional subdirectories
        for dir_path in self.dirs.values():
            for region_id in self.regions.keys():
                os.makedirs(os.path.join(dir_path, region_id), exist_ok=True)

        # Initialize transformer models for each region
        for region_id in self.regions.keys():
            self.regional_models[region_id] = TransformerPredictor(
                input_dim=1,  # For count prediction
                hidden_dim=64,
                num_layers=2,
                num_heads=4,
                max_seq_length=seq_length
            )

        # Initialize metadata tracking
        self.metadata_path = os.path.join(drive_path, 'pipeline_metadata.json')
        self.model_dates = {
            'last_training_date': None,
            'last_optimization_date': None,
            'latest_data_date': None,
            'prediction_target_date': None,
            'first_data_date': None
        }
        self._load_or_create_metadata()

    def _create_new_metadata(self):
        """Create new metadata structure with region-specific tracking."""
        self.metadata = {
            'creation_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'data_dates': [],
            'model_versions': {region: [] for region in self.regions.keys()},
            'predictions': {region: [] for region in self.regions.keys()},
            'evaluations': {region: [] for region in self.regions.keys()},
            'pipeline_config': {
                'sequence_length': self.seq_length,
                'prediction_horizon': self.prediction_horizon,
                'regions': list(self.regions.keys())
            }
        }
        self._save_metadata()

    def _load_or_create_metadata(self):
        """Initialize or load existing metadata."""
        if os.path.exists(self.metadata_path):
            try:
                with open(self.metadata_path, 'r') as f:
                    self.metadata = json.load(f)
                self._ensure_metadata_structure()
            except Exception as e:
                print(f"Error loading metadata: {str(e)}. Creating new metadata.")
                self._create_new_metadata()
        else:
            self._create_new_metadata()

    def _ensure_metadata_structure(self):
        """Ensure all required fields exist in metadata."""
        required_fields = {
            'creation_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'data_dates': [],
            'model_versions': {region: [] for region in self.regions.keys()},
            'predictions': {region: [] for region in self.regions.keys()},
            'evaluations': {region: [] for region in self.regions.keys()},
            'pipeline_config': {
                'sequence_length': self.seq_length,
                'prediction_horizon': self.prediction_horizon,
                'regions': list(self.regions.keys())
            }
        }

        for key, default_value in required_fields.items():
            if key not in self.metadata:
                self.metadata[key] = default_value
                print(f"Added missing metadata field: {key}")

        for region in self.regions.keys():
            for field in ['model_versions', 'predictions', 'evaluations']:
                if region not in self.metadata[field]:
                    self.metadata[field][region] = []
                    print(f"Added missing {field} for region: {region}")

    def _save_metadata(self, verbose=False):
        """Save pipeline metadata to JSON file."""
        try:
            metadata = {
                'last_update': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'pipeline_info': {
                    'creation_date': self.metadata.get('creation_date'),
                    'sequence_length': self.seq_length,
                    'regions': list(self.regions.keys())
                },
                'data_range': {
                    'start': self.model_dates.get('first_data_date'),
                    'end': self.model_dates.get('latest_data_date'),
                    'total_days_processed': len(self.metadata.get('data_dates', []))
                },
                'training_info': {
                    'last_training': self.model_dates.get('last_training_date'),
                    'last_optimization': self.model_dates.get('last_optimization_date'),
                    'model_versions': self.metadata.get('model_versions', {})
                },
                'regional_data': {
                    region: {
                        'predictions': self.metadata['predictions'][region],
                        'evaluations': self.metadata['evaluations'][region]
                    }
                    for region in self.regions.keys()
                }
            }

            def convert_to_serializable(obj):
                if isinstance(obj, (np.integer, np.floating)):
                    return float(obj)
                elif isinstance(obj, np.ndarray):
                    return obj.tolist()
                elif isinstance(obj, datetime):
                    return obj.strftime('%Y-%m-%d %H:%M:%S')
                elif isinstance(obj, pd.Timestamp):
                    return obj.strftime('%Y-%m-%d %H:%M:%S')
                return obj

            def process_dict(d):
                result = {}
                for k, v in d.items():
                    if isinstance(v, dict):
                        result[k] = process_dict(v)
                    elif isinstance(v, list):
                        result[k] = [
                            process_dict(item) if isinstance(item, dict)
                            else convert_to_serializable(item)
                            for item in v
                        ]
                    else:
                        result[k] = convert_to_serializable(v)
                return result

            serializable_metadata = process_dict(metadata)

            with open(self.metadata_path, 'w') as f:
                json.dump(serializable_metadata, f, indent=4)

            if verbose:
                print(f"Metadata saved to: {self.metadata_path}")

        except Exception as e:
            print(f"Error saving metadata: {str(e)}")

# Part 2 (Data Processing) (Lynn)

    def fetch_earthquake_data(self, start_time=None, end_time=None, min_magnitude=2.5):
        """Fetch earthquake data from USGS API."""
        try:
            base_url = "https://earthquake.usgs.gov/fdsnws/event/1/query"

            if start_time is None:
                start_time = datetime.now() - timedelta(days=1)
            if end_time is None:
                end_time = datetime.now()

            params = {
                'format': 'geojson',
                'starttime': start_time.strftime('%Y-%m-%d'),
                'endtime': (end_time + timedelta(days=1)).strftime('%Y-%m-%d'),
                'minmagnitude': min_magnitude,
                'orderby': 'time'
            }

            print(f"Fetching data from {params['starttime']} to {params['endtime']}")

            response = requests.get(base_url, params=params)
            response.raise_for_status()

            data = response.json()
            earthquakes = data['features']

            processed_data = []
            for quake in earthquakes:
                properties = quake['properties']
                coordinates = quake['geometry']['coordinates']

                processed_data.append({
                    'time': datetime.fromtimestamp(properties['time'] / 1000),
                    'magnitude': properties['mag'],
                    'place': properties['place'],
                    'longitude': coordinates[0],
                    'latitude': coordinates[1],
                    'depth': coordinates[2],
                    'type': properties['type'],
                    'alert': properties.get('alert', 'none'),
                    'tsunami': properties['tsunami'],
                    'sig': properties['sig']
                })

            df = pd.DataFrame(processed_data)

            if len(df) > 0:
                self._log_data_summary(df)

            return df

        except Exception as e:
            print(f"Error fetching data: {e}")
            return None

    def _log_data_summary(self, df):
        """Log summary of fetched data."""
        print("\nData Collection Summary:")
        print("-" * 30)
        print(f"Total earthquakes collected: {len(df)}")
        print(f"Date range: {df['time'].min()} to {df['time'].max()}")
        print(f"Magnitude range: {df['magnitude'].min():.1f} to {df['magnitude'].max():.1f}")
        print("-" * 30)

    def assign_region(self, lat, lon):
        """Assign earthquake to appropriate seismic region based on coordinates."""
        for region_id, region_info in self.regions.items():
            bounds = region_info['bounds']
            if (bounds['min_lat'] <= lat <= bounds['max_lat'] and
                bounds['min_lon'] <= lon <= bounds['max_lon']):
                return region_id
        return 'other'

    def process_regional_data(self, df):
        """Split earthquake data into regional datasets."""
        if df is None or len(df) == 0:
            return {}

        # Assign region to each earthquake
        df['region'] = df.apply(
            lambda row: self.assign_region(row['latitude'], row['longitude']),
            axis=1
        )

        # Split into regional dataframes
        regional_data = {
            region_id: df[df['region'] == region_id].copy()
            for region_id in self.regions.keys()
        }

        # Add 'other' region for events outside main regions
        regional_data['other'] = df[df['region'] == 'other'].copy()

        return regional_data

    def prepare_regional_sequences(self, df, region_id, for_training=True):
        """Process earthquake data into sequences for a specific region."""
        try:
            if df is None or len(df) == 0:
                return None, None

            # Convert to daily counts
            df['date'] = pd.to_datetime(df['time']).dt.date
            daily_counts = df.groupby('date').size().reset_index(name='count')
            daily_counts = daily_counts.sort_values('date')

            # Create sequences with proper length
            sequences = []
            targets = []

            # Ensure we have enough data for a sequence
            if len(daily_counts) >= self.seq_length + 1:
                for i in range(len(daily_counts) - self.seq_length):
                    # Create sequence using proper window
                    seq = daily_counts['count'].iloc[i:i+self.seq_length].values
                    target = daily_counts['count'].iloc[i+self.seq_length]

                    sequences.append(seq)
                    targets.append(target)

                if sequences:
                    sequences = torch.FloatTensor(sequences)
                    targets = torch.FloatTensor(targets).reshape(-1, 1)
                    return sequences, targets

            # Handle cases with insufficient data
            print(f"Warning: Insufficient data for region {region_id}. "
                  f"Need at least {self.seq_length + 1} days, got {len(daily_counts)}")
            return None, None

        except Exception as e:
            print(f"Error preparing sequences for region {region_id}: {str(e)}")
            return None, None

    def save_regional_data(self, df, date_str):
        """Save earthquake data separated by region."""
        if df is None or len(df) == 0:
            return

        year_month = date_str[:7]  # YYYY-MM
        regional_data = self.process_regional_data(df)

        for region_id, region_df in regional_data.items():
            if len(region_df) > 0:
                # Create region-specific directory
                region_dir = os.path.join(self.dirs['data'], region_id, year_month)
                os.makedirs(region_dir, exist_ok=True)

                # Save data
                filename = f'earthquake_data_{date_str}.csv'
                filepath = os.path.join(region_dir, filename)
                region_df.to_csv(filepath, index=False)

                # Save summary
                self._save_regional_summary(region_id, region_df, date_str, region_dir)

        # Update metadata
        self.metadata['data_dates'].append(date_str)
        self.model_dates['latest_data_date'] = date_str
        self._save_metadata()

    def _save_regional_summary(self, region_id, df, date_str, region_dir):
        """Save summary statistics for regional data."""
        summary = {
            'date': date_str,
            'region': region_id,
            'total_events': len(df),
            'magnitude_stats': {
                'min': float(df['magnitude'].min()),
                'max': float(df['magnitude'].max()),
                'mean': float(df['magnitude'].mean())
            },
            'saved_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }

        summary_path = os.path.join(region_dir, f'summary_{date_str}.json')
        with open(summary_path, 'w') as f:
            json.dump(summary, f, indent=4)

    def load_regional_data(self, region_id, date_str):
        """Load earthquake data for a specific region and date."""
        try:
            year_month = date_str[:7]
            filename = f'earthquake_data_{date_str}.csv'
            filepath = os.path.join(self.dirs['data'], region_id, year_month, filename)

            if os.path.exists(filepath):
                df = pd.read_csv(filepath)
                df['time'] = pd.to_datetime(df['time'])
                return df
            else:
                print(f"No data file found for region {region_id} on {date_str}")
                return None

        except Exception as e:
            print(f"Error loading data for region {region_id}: {str(e)}")
            return None


# Part 3 (Training and Prediction) (Lynn)

    def train_regional_model(self, region_id, sequence_tensor, target_tensor, epochs=100, batch_size=32):
        """Train transformer model for a specific region."""
        try:
            print(f"\n🔄 Training model for region: {self.regions[region_id]['name']}")

            model = self.regional_models[region_id]
            criterion = nn.MSELoss()
            optimizer = torch.optim.Adam(model.parameters())

            # Create data loader
            dataset = TensorDataset(sequence_tensor, target_tensor)
            dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

            best_loss = float('inf')
            for epoch in range(epochs):
                total_loss = 0
                for sequences, targets in dataloader:
                    # Forward pass
                    optimizer.zero_grad()
                    predictions = model(sequences)
                    loss = criterion(predictions, targets)

                    # Backward pass
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()

                avg_loss = total_loss / len(dataloader)
                if avg_loss < best_loss:
                    best_loss = avg_loss
                    self.save_model_checkpoint(region_id, epoch, avg_loss)
                    checkpoint_saved = "✓"
                else:
                    checkpoint_saved = " "

                if epoch % 10 == 0:
                    print(f"Epoch {epoch:3d}/{epochs} | Loss: {avg_loss:.4f} {checkpoint_saved}")

            return best_loss

        except Exception as e:
            print(f"Error training model for region {region_id}: {str(e)}")
            return None

    def predict_regional_events(self, region_id, recent_data):
        """Generate predictions for a specific region."""
        try:
            model = self.regional_models[region_id]
            sequence_tensor, _ = self.prepare_regional_sequences(recent_data, region_id, for_training=False)

            if sequence_tensor is not None:
                with torch.no_grad():
                    predicted_count = model(sequence_tensor)
                    last_prediction = predicted_count[-1].item()

                    prediction = {
                        'predicted_count': int(last_prediction),
                        'lower_bound': int(last_prediction * 0.9),
                        'upper_bound': int(last_prediction * 1.1),
                        'region_name': self.regions[region_id]['name']
                    }

                    return prediction
            return None

        except Exception as e:
            print(f"Error generating predictions for region {region_id}: {str(e)}")
            return None

    def optimize_regional_model(self, region_id, new_data, performance_metrics):
        """Optimize model for a specific region using new data."""
        try:
            if performance_metrics is None:
                return

            sequence_tensor, target_tensor = self.prepare_regional_sequences(new_data, region_id)
            if sequence_tensor is None or target_tensor is None:
                return

            print(f"\n🔄 Optimizing model for region: {self.regions[region_id]['name']}")

            model = self.regional_models[region_id]
            optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
            criterion = nn.MSELoss()

            # Run optimization steps
            for step in range(5):
                optimizer.zero_grad()
                predictions = model(sequence_tensor)
                loss = criterion(predictions, target_tensor)
                loss.backward()
                optimizer.step()

                if step % 2 == 0:
                    print(f"Step {step}: Loss = {loss.item():.4f}")

            return loss.item()

        except Exception as e:
            print(f"Error optimizing model for region {region_id}: {str(e)}")
            return None

    def save_model_checkpoint(self, region_id, epoch, loss, metrics=None):
        """Save model checkpoint for a specific region."""
        try:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

            # Create region-specific model directory
            model_dir = os.path.join(self.dirs['models'], region_id)
            if self.model_dates['latest_data_date']:
                model_dir = os.path.join(model_dir, self.model_dates['latest_data_date'][:7])
            os.makedirs(model_dir, exist_ok=True)

            # Save model state
            checkpoint = {
                'epoch': epoch,
                'model_state_dict': self.regional_models[region_id].state_dict(),
                'loss': loss,
                'metrics': metrics,
                'timestamp': timestamp
            }

            model_path = os.path.join(model_dir, f'model_checkpoint_{timestamp}.pth')
            torch.save(checkpoint, model_path)

            # Update metadata
            self.metadata['model_versions'][region_id].append({
                'timestamp': timestamp,
                'loss': float(loss),
                'metrics': metrics
            })
            self._save_metadata()

        except Exception as e:
            print(f"Error saving checkpoint for region {region_id}: {str(e)}")

    def load_regional_models(self):
        """Load latest model checkpoints for all regions."""
        success = True
        for region_id in self.regions.keys():
            try:
                checkpoint_pattern = os.path.join(
                    self.dirs['models'],
                    region_id,
                    '**',
                    'model_checkpoint_*.pth'
                )
                model_files = glob.glob(checkpoint_pattern, recursive=True)

                if model_files:
                    latest_model = max(model_files, key=os.path.getctime)
                    checkpoint = torch.load(latest_model)

                    self.regional_models[region_id].load_state_dict(
                        checkpoint['model_state_dict']
                    )

                    print(f"Loaded model for {self.regions[region_id]['name']}")
                else:
                    print(f"No saved model found for {self.regions[region_id]['name']}")
                    success = False

            except Exception as e:
                print(f"Error loading model for {region_id}: {str(e)}")
                success = False

        return success

# Part 4 (Evaluation and Visualization) (Lynn)

    def evaluate_regional_predictions(self, region_id, predictions, actual_data, prediction_date=None):
        """Evaluate predictions for a specific region."""
        try:
            if predictions is None or actual_data is None:
                return None

            actual_count = len(actual_data)
            pred_count = predictions['predicted_count']

            if prediction_date is None:
                prediction_date = actual_data['time'].dt.date.iloc[0]

            metrics = {
                'date': prediction_date,
                'region': self.regions[region_id]['name'],
                'predicted_count': pred_count,
                'actual_count': actual_count,
                'prediction_error': abs(pred_count - actual_count),
                'within_bounds': (actual_count >= predictions['lower_bound'] and
                                actual_count <= predictions['upper_bound']),
                'relative_error': abs(pred_count - actual_count) / max(1, actual_count) * 100
            }

            # Update performance history and save evaluation
            self.regional_performance_history[region_id].append(metrics)
            self.save_evaluation_metrics(region_id, metrics, prediction_date)

            # Print evaluation summary
            print(f"\n📊 Evaluation for {self.regions[region_id]['name']}:")
            print(f"Predicted Count: {pred_count}")
            print(f"Actual Count:    {actual_count}")
            print(f"Error:           {metrics['prediction_error']} events")
            print(f"Relative Error:  {metrics['relative_error']:.1f}%")
            print(f"Within Bounds:   {'✅' if metrics['within_bounds'] else '❌'}")

            return metrics

        except Exception as e:
            print(f"Error evaluating predictions for region {region_id}: {str(e)}")
            return None

    def save_evaluation_metrics(self, region_id, metrics, date_str):
        """Save evaluation metrics for a specific region."""
        try:
            # Convert date to string if it's a date object
            if hasattr(date_str, 'strftime'):
                date_str = date_str.strftime('%Y-%m-%d')

            year_month = date_str[:7]
            eval_dir = os.path.join(self.dirs['evaluations'], region_id, year_month)
            os.makedirs(eval_dir, exist_ok=True)

            evaluation_data = {
                'date': str(date_str),
                'region': self.regions[region_id]['name'],
                'metrics': {
                    'predicted_count': int(metrics['predicted_count']),
                    'actual_count': int(metrics['actual_count']),
                    'prediction_error': float(metrics['prediction_error']),
                    'relative_error': float(metrics['relative_error']),
                    'within_bounds': bool(metrics['within_bounds'])
                },
                'model_info': {
                    'last_training': str(self.model_dates.get('last_training_date')),
                    'last_optimization': str(self.model_dates.get('last_optimization_date'))
                }
            }

            filepath = os.path.join(eval_dir, f'evaluation_{date_str}.json')
            with open(filepath, 'w') as f:
                json.dump(evaluation_data, f, indent=4)

            self.metadata['evaluations'][region_id].append(evaluation_data)
            self._save_metadata()

        except Exception as e:
            print(f"Error saving evaluation metrics for region {region_id}: {str(e)}")

    def save_regional_visualization(self, start_date=None, end_date=None):
        """Create visualization for each region's performance."""
        try:
            print("\n📈 Generating Regional Performance Visualization")

            # Check for available data
            has_data = any(len(hist) > 0 for hist in self.regional_performance_history.values())
            if not has_data:
                print("No performance data available for visualization")
                return

            # Create subplots for each region
            n_regions = len(self.regions)
            fig, axes = plt.subplots(n_regions, 1, figsize=(15, 5*n_regions))

            for i, (region_id, region_info) in enumerate(self.regions.items()):
                if not self.regional_performance_history[region_id]:
                    continue

                df = pd.DataFrame(self.regional_performance_history[region_id])
                ax = axes[i] if n_regions > 1 else axes

                # Plot predicted vs actual
                self._plot_region_performance(ax, df, region_info)
                self._add_performance_metrics(ax, df)

            plt.tight_layout()

            # Save the plot
            self._save_visualization_plot()

            # Print performance summary
            self._print_performance_summary()

        except Exception as e:
            print(f"Error creating visualization: {str(e)}")
            import traceback
            print(traceback.format_exc())

    def _plot_region_performance(self, ax, df, region_info):
        """Plot performance data for a specific region."""
        # Plot predicted values
        ax.plot(df['date'], df['predicted_count'],
               label='Predicted', marker='o', linestyle='-',
               color=region_info.get('color', 'blue'))

        # Plot actual values
        ax.plot(df['date'], df['actual_count'],
               label='Actual', marker='x', linestyle='-',
               color='green')

        # Add confidence interval
        ax.fill_between(df['date'],
                       df['predicted_count'] * 0.9,
                       df['predicted_count'] * 1.1,
                       alpha=0.2, color=region_info.get('color', 'blue'),
                       label='90% Confidence Interval')

        # Customize plot
        ax.set_title(f'{region_info["name"]} Prediction Performance')
        ax.set_xlabel('Date')
        ax.set_ylabel('Number of Earthquakes')
        ax.legend()
        ax.grid(True, alpha=0.3)
        ax.tick_params(axis='x', rotation=45)

    def _add_performance_metrics(self, ax, df):
        """Add performance metrics to plot."""
        avg_error = df['prediction_error'].mean()
        accuracy = (df['within_bounds'].sum() / len(df)) * 100
        text = f'Avg Error: {avg_error:.1f}\nAccuracy: {accuracy:.1f}%'
        ax.text(0.02, 0.98, text, transform=ax.transAxes,
               verticalalignment='top',
               bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

    def _save_visualization_plot(self):
        """Save visualization plot to file."""
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        plot_path = os.path.join(
            self.dirs['plots'],
            f'regional_performance_{timestamp}.png'
        )
        plt.savefig(plot_path, bbox_inches='tight', dpi=300)
        plt.close()
        print(f"✅ Regional visualization saved: {plot_path}")

    def _print_performance_summary(self):
        """Print summary of performance metrics for all regions."""
        print("\n📊 Overall Performance Summary")
        print("-" * 40)
        for region_id, region_info in self.regions.items():
            if self.regional_performance_history[region_id]:
                df = pd.DataFrame(self.regional_performance_history[region_id])
                print(f"\n{region_info['name']}:")
                print(f"Average Error: {df['prediction_error'].mean():.1f} events")
                print(f"Accuracy: {(df['within_bounds'].sum() / len(df)) * 100:.1f}%")
        print("-" * 40)

# Part 5: Pipeline Execution (Lynn)
    def run_baseline_training(self, days_to_process=31):
        """Execute baseline training for each region."""
        print("\n🚀 Initializing Regional Baseline Training Pipeline")
        print("=" * 60)

        # Set date range
        end_date = datetime.now() - timedelta(days=1)
        start_date = end_date - timedelta(days=days_to_process)

        print(f"\n📅 Processing Range: {start_date.date()} to {end_date.date()}")
        print("-" * 60)

        try:
            # Fetch historical data
            historical_data = self.fetch_earthquake_data(start_date, end_date)
            if historical_data is None:
                print("❌ Failed to fetch historical data")
                return False

            # Process data by region
            regional_data = self.process_regional_data(historical_data)

            # Train models for each region
            for region_id, region_df in regional_data.items():
                if region_id == 'other' or len(region_df) == 0:
                    continue

                print(f"\n🔄 Processing {self.regions[region_id]['name']}")

                # Prepare sequences and train
                sequences, targets = self.prepare_regional_sequences(region_df, region_id)
                if sequences is not None and targets is not None:
                    self.train_regional_model(region_id, sequences, targets)

                    # Generate and evaluate predictions
                    predictions = self.predict_regional_events(region_id, region_df)
                    if predictions:
                        # Get actual data for next day
                        next_day = end_date + timedelta(days=1)
                        actual_data = self.fetch_earthquake_data(
                            start_time=next_day,
                            end_time=next_day + timedelta(days=1)
                        )
                        if actual_data is not None:
                            actual_regional = self.process_regional_data(actual_data)
                            if region_id in actual_regional:
                                self.evaluate_regional_predictions(
                                    region_id,
                                    predictions,
                                    actual_regional[region_id],
                                    next_day.date()
                                )

            # Save final visualization
            self.save_regional_visualization()
            print("\n✅ Regional Baseline Training Completed")
            return True

        except Exception as e:
            print(f"\n❌ Error in baseline training: {str(e)}")
            return False

    def run_continuous_monitoring(self, update_interval=3600):
        """Run continuous monitoring for all regions."""
        try:
            print("\n🔄 Starting Regional Continuous Monitoring")
            print("=" * 60)
            print(f"Update Interval: {update_interval} seconds")

            while True:
                current_time = datetime.now()
                process_date = current_time - timedelta(days=1)

                print(f"\n📅 Processing Data for: {process_date.date()}")
                print("-" * 60)

                # Fetch and process data
                data = self.fetch_earthquake_data(
                    start_time=process_date,
                    end_time=current_time
                )

                if data is not None:
                    regional_data = self.process_regional_data(data)

                    # Process each region
                    for region_id, region_df in regional_data.items():
                        if region_id == 'other' or len(region_df) == 0:
                            continue

                        print(f"\n🔄 Processing {self.regions[region_id]['name']}")

                        # Generate predictions
                        predictions = self.predict_regional_events(region_id, region_df)

                        if predictions:
                            # Get actual data for current period
                            actual_data = self.fetch_earthquake_data(
                                start_time=current_time.replace(hour=0, minute=0, second=0),
                                end_time=current_time
                            )

                            if actual_data is not None:
                                actual_regional = self.process_regional_data(actual_data)
                                if region_id in actual_regional:
                                    # Evaluate predictions
                                    metrics = self.evaluate_regional_predictions(
                                        region_id,
                                        predictions,
                                        actual_regional[region_id],
                                        current_time.date()
                                    )

                                    # Optimize model if needed
                                    if metrics and metrics['relative_error'] > 20:  # 20% threshold
                                        self.optimize_regional_model(
                                            region_id,
                                            actual_regional[region_id],
                                            metrics
                                        )

                    # Update visualizations
                    self.save_regional_visualization()

                # Schedule next update
                next_update = datetime.now() + timedelta(seconds=update_interval)
                print(f"\n⏰ Next Update: {next_update.strftime('%Y-%m-%d %H:%M:%S')}")
                print("=" * 60)
                time.sleep(update_interval)

        except KeyboardInterrupt:
            print("\n👋 Monitoring stopped by user")
        except Exception as e:
            print(f"\n❌ Monitoring error: {str(e)}")
            raise

## 9.Pipeline Initialization and Execution
* After running, you will be prompted to enter (1): Training or (2): Monitoring
** If (1) is selected, you will be prompted to enter a number 1-31 (default is 31) for days to train
** If (2) is selected, you will be prompted to enter the update interval in seconds (default is 3600)

In [52]:
# 9. Pipeline Initialization and Execution

import os
from google.colab import drive

# Define seismic regions
SEISMIC_REGIONS = {
    'pacific_northwest': {
        'name': 'Pacific Northwest',
        'bounds': {'min_lat': 40.0, 'max_lat': 49.0, 'min_lon': -125.0, 'max_lon': -116.0},
        'description': 'Cascadia Subduction Zone region',
        'color': '#1f77b4'
    },
    'california': {
        'name': 'California',
        'bounds': {'min_lat': 32.0, 'max_lat': 42.0, 'min_lon': -124.0, 'max_lon': -114.0},
        'description': 'San Andreas Fault region',
        'color': '#ff7f0e'
    },
    'alaska': {
        'name': 'Alaska',
        'bounds': {'min_lat': 52.0, 'max_lat': 71.0, 'min_lon': -169.0, 'max_lon': -130.0},
        'description': 'Alaska-Aleutian region',
        'color': '#2ca02c'
    },
    'hawaii': {
        'name': 'Hawaii',
        'bounds': {'min_lat': 18.0, 'max_lat': 23.0, 'min_lon': -160.0, 'max_lon': -154.0},
        'description': 'Hawaiian volcanic region',
        'color': '#d62728'
    },
    'central_us': {
        'name': 'Central US',
        'bounds': {'min_lat': 35.0, 'max_lat': 40.0, 'min_lon': -97.0, 'max_lon': -89.0},
        'description': 'New Madrid Seismic Zone',
        'color': '#9467bd'
    }
}

def run_earthquake_pipeline():
    try:
        # Mount Google Drive
        drive.mount('/content/drive')

        # Set up base directory
        base_path = '/content/drive/My Drive/earthquake_data'

        # Initialize pipeline
        pipeline = RegionalEarthquakePipeline(drive_path=base_path)

        print("\nMonitoring the following seismic regions:")
        for region_id, info in SEISMIC_REGIONS.items():
            print(f"- {info['name']}: {info['description']}")

        # Choose pipeline mode
        mode = input("\nSelect mode (1: Training, 2: Monitoring): ").strip()

        if mode == "1":
            days = int(input("Enter number of days for training (default 31): ") or "31")
            print("\nStarting baseline training...")
            pipeline.run_baseline_training(days_to_process=days)
        elif mode == "2":
            interval = int(input("Enter update interval in seconds (default 3600): ") or "3600")
            print("\nStarting continuous monitoring...")
            pipeline.run_continuous_monitoring(update_interval=interval)
        else:
            print("Invalid mode selected")
            return

        print("\nPipeline execution completed successfully")

    except Exception as e:
        print(f"Pipeline execution failed: {str(e)}")
        raise

if __name__ == "__main__":
    run_earthquake_pipeline()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Added missing metadata field: creation_date
Added missing metadata field: data_dates
Added missing metadata field: model_versions
Added missing metadata field: predictions
Added missing metadata field: evaluations
Added missing metadata field: pipeline_config

Monitoring the following seismic regions:
- Pacific Northwest: Cascadia Subduction Zone region
- California: San Andreas Fault region
- Alaska: Alaska-Aleutian region
- Hawaii: Hawaiian volcanic region
- Central US: New Madrid Seismic Zone

Select mode (1: Training, 2: Monitoring): 2
Enter update interval in seconds (default 3600): 15

Starting continuous monitoring...

🔄 Starting Regional Continuous Monitoring
Update Interval: 15 seconds

📅 Processing Data for: 2024-11-19
------------------------------------------------------------
Fetching data from 2024-11-19 to 2024-11-21

Data Collection Summary:
-

Here's what's missing and what needs to be added:

Currently Implemented:

* Deep Learning: Via the transformer model for regression
* Basic Regression: Daily earthquake count prediction

Missing Components:

* Classification (from Steve's document): Could add earthquake severity classification (e.g., minor, moderate, major)
* Dimensionality Reduction: Could apply PCA or t-SNE to analyze patterns in multi-dimensional features (magnitude, depth, location)
* More advanced regression: Could add multivariate regression using additional features