# **Earthquake Prediction Pipeline Documentation**


## **Overview**

The Earthquake Prediction Pipeline is a comprehensive system that automates the collection, processing, and analysis of USGS earthquake data to predict future seismic activity. The pipeline implements a transformer-based model that learns from historical patterns to predict the number of earthquakes likely to occur in the next 24-hour period.

### **Key Features**

*   Automated USGS data collection and processing
*   Daily data segmentation and storage
*   Transformer-based sequence modeling
*   Continuous prediction and evaluation
*   Automated model optimization
*   Performance visualization and tracking
*   Modular architecture with comprehensive error handling



### **System Requirements**

Python 3.x with the following libraries:

*   pandas
*   numpy
*   torch (PyTorch)
*   requests
*   matplotlib
*   seaborn
*   scikit-learn


### **Directory Structure**


```
/earthquake_data/
├── data/             # Raw daily earthquake data
│   └── YYYY-MM/      # Organized by year-month
├── models/           # Saved model checkpoints
├── predictions/      # Daily prediction outputs
├── plots/            # Performance visualizations
└── evaluations/      # Evaluation metrics
```
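
The layout above can be created up front. The following is a minimal sketch rather than the pipeline's own code: `create_layout` and `month_dir` are hypothetical helper names, and the base directory is a temporary folder instead of the real `/earthquake_data/` path.

```python
import tempfile
from datetime import date
from pathlib import Path

# Top-level folders taken from the directory tree above.
SUBDIRS = ("data", "models", "predictions", "plots", "evaluations")


def create_layout(base: Path) -> dict:
    """Create the five top-level folders and return their paths by name."""
    dirs = {name: base / name for name in SUBDIRS}
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def month_dir(data_dir: Path, day: date) -> Path:
    """Return (and create) the YYYY-MM folder that holds a given day's file."""
    path = data_dir / day.strftime("%Y-%m")
    path.mkdir(parents=True, exist_ok=True)
    return path


base = Path(tempfile.mkdtemp())  # stand-in for the real base directory
dirs = create_layout(base)
nov = month_dir(dirs["data"], date(2024, 11, 19))
print(nov.relative_to(base))
```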

### **Core Components**

1. Data Collection and Processing

*   USGS API Integration: Automated fetching of earthquake data
*   Data Filtering: Configurable magnitude threshold (default: 2.5)
*   Data Storage: Daily CSV files with comprehensive metadata
*   Feature Extraction: Geographic and seismic parameters

2. Model Architecture

*   Type: Transformer-based sequence model

*   Components:
 *   Input projection layer
 *   Positional encoding
 *   Multi-head attention layers
 *   Feed-forward networks
 *   Output projection layer

*   Parameters:
 *   Sequence Length: Configurable (default: 7 days)
 *   Hidden Dimensions: 64
 *   Number of Layers: 2
 *   Attention Heads: 4

3. Training Pipeline
*    **Baseline Training**

 1. Historical Data Processing

  *   Fetches specified number of days (default: 31)
  *   Splits data into daily segments
  *   Creates initial training sequences

 2. Model Training

  *   Sequences created from historical data
  *   Loss function: Mean Squared Error
  *   Optimizer: Adam
  *   Checkpoint saving based on performance

 3. Evaluation

  *   Daily prediction accuracy
  *   Error metrics calculation
  *   Performance visualization
  *   Metadata tracking

*    **Continuous Monitoring**

 1. Automated Data Collection

  *   Configurable update interval (default: 1 hour)
  *   Real-time USGS data integration

 2. Prediction Generation

  *   Daily earthquake count predictions
  *   Confidence interval calculation
  *   Prediction storage and tracking

 3. Model Optimization

  *   Performance evaluation against actual data
  *   Incremental model updates
  *   Automated checkpoint management

4. Performance Metrics

  *   Prediction Error (absolute and relative)
  *   Confidence Interval Coverage
  *   Standard Deviation Analysis
  *   Visualization of Trends
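
The error and coverage metrics listed above can be sketched in a few lines. This is an illustration under assumptions, not the pipeline's evaluation code: `prediction_errors` and `interval_coverage` are hypothetical names, relative error is taken as absolute error divided by the actual count, and the numbers are made up.

```python
def prediction_errors(predicted: float, actual: float) -> dict:
    """Absolute and relative error of one daily count prediction."""
    abs_err = abs(predicted - actual)
    # Relative error is undefined when no earthquakes were observed.
    rel_err = abs_err / actual if actual != 0 else None
    return {"absolute_error": abs_err, "relative_error": rel_err}


def interval_coverage(intervals, actuals) -> float:
    """Fraction of days whose actual count fell inside the predicted interval."""
    if not actuals:
        raise ValueError("need at least one evaluated day")
    hits = sum(
        1 for (low, high), actual in zip(intervals, actuals) if low <= actual <= high
    )
    return hits / len(actuals)


# Made-up values, for illustration only.
errors = prediction_errors(predicted=42.0, actual=40.0)
coverage = interval_coverage(
    intervals=[(30, 50), (35, 45), (20, 30)],
    actuals=[40, 50, 25],
)
print(errors, coverage)
```

A coverage value well below the nominal level of the interval would suggest that the intervals are too narrow.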

### **File Naming Conventions**

*   Data Files: *earthquake_data_YYYY-MM-DD.csv*
*   Predictions: *predictions_YYYY-MM-DD.csv*
*   Model Checkpoints: *model_checkpoint_YYYYMMDD_HHMMSS.pth*
*   Visualizations: *performance_Ndays_YYYYMMDD_HHMMSS.png*
*   Evaluations: *evaluation_YYYY-MM-DD.json*
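
As a rough sketch, the naming patterns above map onto `strftime`-style format codes as follows. The helper names are hypothetical, and reading `N` in the visualization pattern as the number of days covered by the plot is an assumption.

```python
from datetime import date, datetime


def data_filename(day: date) -> str:
    return f"earthquake_data_{day:%Y-%m-%d}.csv"


def prediction_filename(day: date) -> str:
    return f"predictions_{day:%Y-%m-%d}.csv"


def checkpoint_filename(ts: datetime) -> str:
    return f"model_checkpoint_{ts:%Y%m%d_%H%M%S}.pth"


def plot_filename(n_days: int, ts: datetime) -> str:
    # Assumption: N is the number of days shown in the plot.
    return f"performance_{n_days}days_{ts:%Y%m%d_%H%M%S}.png"


def evaluation_filename(day: date) -> str:
    return f"evaluation_{day:%Y-%m-%d}.json"


stamp = datetime(2024, 11, 19, 8, 30, 0)
print(data_filename(stamp.date()))
print(checkpoint_filename(stamp))
print(plot_filename(31, stamp))
```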


### **Usage Examples**
##### **Initialize Pipeline**
```
pipeline = EarthquakePipeline(drive_path='/path/to/base/directory')
```
##### **Run Baseline Training**
```
pipeline.run_baseline_training(days_to_process=31)
```
##### **Start Continuous Monitoring**
```
pipeline.run_continuous_monitoring(update_interval=3600)  # 1 hour interval
```

### **Future Enhancements**

*   Integration with additional data sources
*   Enhanced feature engineering
*   Advanced visualization capabilities
*   Automated parameter optimization
*   Real-time alerting system
*   Web interface for monitoring

### **Model Misc Info**
*   Authors: Stephen Moore, Steven Willhelm, Lynn Yingling
*   Version: 4.0
*   Last Updated: 19 November 2024

## Imports

In [69]:
# Required imports
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader, TensorDataset
from datetime import datetime, timedelta
import requests
import os
import json
import glob
import time
from sklearn.preprocessing import MinMaxScaler
import pickle
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import drive

# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## Create Drive Directory

In [70]:
def setup_drive_directory(base_path='earthquake_data'):
    """Mount Google Drive and create necessary directories"""
    drive.mount('/content/drive')
    full_path = f'/content/drive/My Drive/{base_path}'
    if not os.path.exists(full_path):
        os.makedirs(full_path)
        print(f"Created directory: {full_path}")
    else:
        print(f"Directory already exists: {full_path}")
    return full_path

## Gather USGS Data

In [71]:
def fetch_earthquake_data(self, start_time=None, end_time=None, min_magnitude=2.5):
    """
    Fetch earthquake data from USGS API for a specified time period.

    Args:
        start_time (datetime): Start date for data collection. Defaults to yesterday if None.
        end_time (datetime): End date for data collection. Defaults to one day after start_time if None.
        min_magnitude (float): Minimum earthquake magnitude to include (default: 2.5)

    Returns:
        pandas.DataFrame: DataFrame containing earthquake data with columns:
            - time: Timestamp of earthquake occurrence
            - magnitude: Earthquake magnitude
            - place: Location description
            - longitude: Geographic longitude
            - latitude: Geographic latitude
            - depth: Depth in kilometers
            - type: Event type
            - alert: Alert level (if any)
            - tsunami: Tsunami warning flag
            - sig: Significance value

    Note:
        Errors (including failed API requests) are caught and printed, and the
        function then returns None instead of raising to the caller.

    Example:
        >>> start = datetime(2024, 11, 1)
        >>> end = datetime(2024, 11, 2)
        >>> data = pipeline.fetch_earthquake_data(start, end, min_magnitude=3.0)
    """
    try:
        base_url = "https://earthquake.usgs.gov/fdsnws/event/1/query"

        if start_time is None:
            start_time = datetime.now() - timedelta(days=1)

        if end_time is None:
            end_time = start_time + timedelta(days=1)

        params = {
            'format': 'geojson',
            'starttime': start_time.strftime('%Y-%m-%d'),
            'endtime': end_time.strftime('%Y-%m-%d'),
            'minmagnitude': min_magnitude,
            'orderby': 'time'
        }

        print(f"Fetching data for: {start_time.strftime('%Y-%m-%d')}")

        # A timeout keeps a stalled connection from hanging the pipeline
        response = requests.get(base_url, params=params, timeout=30)
        response.raise_for_status()

        data = response.json()
        earthquakes = data['features']

        processed_data = []
        for quake in earthquakes:
            properties = quake['properties']
            coordinates = quake['geometry']['coordinates']

            processed_data.append({
                'time': datetime.fromtimestamp(properties['time'] / 1000),
                'magnitude': properties['mag'],
                'place': properties['place'],
                'longitude': coordinates[0],
                'latitude': coordinates[1],
                'depth': coordinates[2],
                'type': properties['type'],
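                # The key is present with a null value when no alert was issued,
                # so a .get() default alone would not apply
                'alert': properties.get('alert') or 'none',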
                'tsunami': properties['tsunami'],
                'sig': properties['sig']
            })

        df = pd.DataFrame(processed_data)

        if len(df) > 0:
            print("\nData Collection Summary:")
            print("-" * 30)
            print(f"Total earthquakes collected: {len(df)}")
            print(f"Date range: {df['time'].min()} to {df['time'].max()}")
            print(f"Magnitude range: {df['magnitude'].min():.1f} to {df['magnitude'].max():.1f}")
            print("-" * 30)

        return df

    except Exception as e:
        print(f"Error fetching data: {e}")
        return None
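
To make the request and the field mapping easier to inspect without calling the API, here is a small offline sketch. It builds the same query string with the standard library and parses one hand-written GeoJSON feature. `build_query_url` and `parse_feature` are hypothetical helpers, the sample feature is made up, and unlike the function above the timestamp is converted to a timezone-aware UTC value.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def build_query_url(start, end, min_magnitude=2.5):
    """Assemble the query URL from the same parameters the fetcher sends."""
    params = {
        "format": "geojson",
        "starttime": start.strftime("%Y-%m-%d"),
        "endtime": end.strftime("%Y-%m-%d"),
        "minmagnitude": min_magnitude,
        "orderby": "time",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_feature(feature):
    """Flatten one GeoJSON feature into a row dictionary."""
    props = feature["properties"]
    # Coordinates are ordered longitude, latitude, depth.
    lon, lat, depth = feature["geometry"]["coordinates"]
    return {
        # 'time' is milliseconds since the Unix epoch.
        "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
        "magnitude": props["mag"],
        "place": props["place"],
        "longitude": lon,
        "latitude": lat,
        "depth": depth,
        "alert": props.get("alert") or "none",
    }


# Made-up feature, for illustration only.
sample = {
    "properties": {
        "time": 1730419200000,
        "mag": 3.1,
        "place": "hypothetical location",
        "alert": None,
    },
    "geometry": {"coordinates": [-122.5, 38.0, 10.0]},
}

print(build_query_url(datetime(2024, 11, 1), datetime(2024, 11, 2)))
print(parse_feature(sample))
```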

In [72]:
def fetch_training_data(self, start_date, end_date):
    """Fetch training data for specified date range"""
    df = self.fetch_earthquake_data(
        start_time=start_date,
        end_time=end_date,
        min_magnitude=2.5
    )

    if df is not None:
        # Save with date range
        filename = f'earthquake_data_{start_date.strftime("%Y%m%d")}_to_{end_date.strftime("%Y%m%d")}.csv'
        filepath = os.path.join(self.drive_path, filename)
        df.to_csv(filepath, index=False)

        return df, filepath
    return None, None

In [73]:
def fetch_new_data(self, last_timestamp):
    """Fetch only new data since last recorded timestamp"""
    df = self.fetch_earthquake_data(
        start_time=last_timestamp,
        end_time=datetime.now(),
        min_magnitude=2.5
    )

    if df is not None and not df.empty:  # an empty frame has no 'time' column
        # Filter to only new events
        new_data = df[df['time'] > last_timestamp]

        if len(new_data) > 0:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f'new_data_{timestamp}.csv'
            filepath = os.path.join(self.dirs['data'], filename)
            new_data.to_csv(filepath, index=False)

            return new_data, filepath
    return None, None
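
The fetch functions above return one row per event, while the model works on one count per day. A `groupby` on the event date yields rows only for days that have at least one event, so a quiet day would silently disappear from the series. The sketch below counts events per calendar day and keeps zero-event days. `daily_counts` is a hypothetical helper, and treating missing days as zero is an assumption rather than something the pipeline is stated to do.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(event_times, start: date, end: date):
    """Count events per calendar day from start to end (inclusive)."""
    counts = Counter(t.date() for t in event_times)
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    # Days absent from the counter had no events and are reported as 0.
    return [(day, counts.get(day, 0)) for day in days]


# Made-up event times, for illustration only.
events = [
    datetime(2024, 11, 1, 3, 15),
    datetime(2024, 11, 1, 22, 40),
    datetime(2024, 11, 3, 9, 5),
]
for day, count in daily_counts(events, date(2024, 11, 1), date(2024, 11, 3)):
    print(day, count)
```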

## Create Data Structure for Transformer

In [74]:
class EarthquakeDataset(Dataset):
    """
    Custom dataset for handling earthquake sequence data.

    This dataset creates sequences of earthquake data for training the transformer
    model, where each sequence consists of multiple days of data points.

    Args:
        features (torch.Tensor): Input features for each earthquake event
        targets (torch.Tensor): Target values for prediction
        seq_length (int): Number of days in each sequence

    Attributes:
        features (torch.Tensor): Storage for input features
        targets (torch.Tensor): Storage for target values
        seq_length (int): Length of each sequence

    Methods:
        __len__: Returns the number of sequences in the dataset
        __getitem__: Returns a sequence and its corresponding target

    Example:
        >>> features = torch.randn(100, 5)  # 100 days with 5 features each
        >>> targets = torch.randn(100, 1)   # Target count for each day
        >>> dataset = EarthquakeDataset(features, targets, seq_length=7)
        >>> sequence, target = dataset[0]  # Get first sequence and its target
    """

    def __init__(self, features, targets, seq_length):
        """
        Initialize the dataset with features, targets, and sequence length.

        Args:
            features (torch.Tensor): Input features for each earthquake event
            targets (torch.Tensor): Target values for prediction
            seq_length (int): Number of days to include in each sequence
        """
        self.features = features
        self.targets = targets
        self.seq_length = seq_length

    def __len__(self):
        """Return the number of possible sequences in the dataset."""
        return max(0, len(self.features) - self.seq_length)

    def __getitem__(self, idx):
        """
        Get a sequence of features and its corresponding target.

        Args:
            idx (int): Index of the sequence to retrieve

        Returns:
            tuple: (feature_sequence, target) where feature_sequence is a sequence of
                  'seq_length' days of data and target is the next day's parameters
        """
        feature_seq = self.features[idx:idx + self.seq_length]
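        # The target is the entry right after the window (the "next day"),
        # which is the last index that __len__ (len - seq_length) leaves room for.
        target = self.targets[idx + self.seq_length]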
        return feature_seq, target
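
A sliding window pairs each run of `seq_length` consecutive days with the value for the day that follows. The plain-Python sketch below shows the indexing, assuming the inputs and targets come from the same series of daily counts. `make_windows` is a hypothetical helper and the counts are made up.

```python
def make_windows(daily_counts, seq_length):
    """Pair every window of seq_length days with the following day's count."""
    pairs = []
    # The last usable window ends one day before the end of the series,
    # so that a next-day target always exists.
    for start in range(len(daily_counts) - seq_length):
        window = daily_counts[start:start + seq_length]
        target = daily_counts[start + seq_length]
        pairs.append((window, target))
    return pairs


# Made-up daily counts, for illustration only.
counts = [12, 15, 9, 20, 18, 11, 14, 16, 13, 17]
pairs = make_windows(counts, seq_length=7)
for window, target in pairs:
    print(window, "->", target)
```

Ten days of data with a seven-day window give three training pairs. A series no longer than the window gives none.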

## Create Transformer

In [75]:
class TransformerPredictor(nn.Module):
    """
    Transformer-based model for earthquake count prediction.

    This model uses a transformer architecture to learn temporal patterns in
    earthquake sequences and predict future occurrence counts.

    Architecture:
        - Input projection layer
        - Positional encoding
        - Transformer encoder layers
        - Output projection layers

    Args:
        input_dim (int): Dimension of input features
        hidden_dim (int): Dimension of hidden layers
        num_layers (int): Number of transformer layers
        num_heads (int): Number of attention heads
        max_seq_length (int): Maximum sequence length (default: 7)

    Attributes:
        hidden_dim (int): Dimension of hidden layers
        input_projection (nn.Linear): Input projection layer
        pos_encoding (nn.Parameter): Positional encoding
        transformer (nn.TransformerEncoder): Transformer encoder
        output_projection (nn.Sequential): Output projection layers

    Example:
        >>> model = TransformerPredictor(
                input_dim=1,
                hidden_dim=64,
                num_layers=2,
                num_heads=4
            )
        >>> input_sequence = torch.randn(32, 7, 1)  # (batch_size, seq_length, features)
        >>> predictions = model(input_sequence)
    """
    def __init__(self, input_dim, hidden_dim, num_layers, num_heads, max_seq_length=7):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.input_projection = nn.Linear(input_dim, hidden_dim)
        self.pos_encoding = nn.Parameter(torch.randn(1, max_seq_length, hidden_dim))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim,
            nhead=num_heads,
            dim_feedforward=hidden_dim*4,
            dropout=0.1,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)

        self.output_projection = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Linear(hidden_dim//2, 1)
        )

    def forward(self, x):
        # Ensure input is the right shape (batch_size, seq_length, input_dim)
        if len(x.shape) == 2:
            x = x.unsqueeze(-1)

        # Project input to hidden dimension
        x = self.input_projection(x)

        # Add positional encoding
        x = x + self.pos_encoding[:, :x.size(1)]

        # Apply transformer
        x = self.transformer(x)

        # Take the last sequence element and project to output
        x = x[:, -1]
        x = self.output_projection(x)

        return x
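
For a sense of scale, the following back-of-the-envelope sketch counts the trainable parameters implied by the default configuration (`input_dim=1`, `hidden_dim=64`, `num_layers=2`, `num_heads=4`, `max_seq_length=7`). It assumes the standard PyTorch layout of `nn.TransformerEncoderLayer` (a fused query/key/value projection, an attention output projection, two feed-forward linear layers, and two layer norms, all with biases) and no final norm on the encoder. `count_parameters` is a hypothetical helper, not part of the pipeline.

```python
def count_parameters(input_dim=1, hidden_dim=64, num_layers=2,
                     num_heads=4, max_seq_length=7):
    """Estimate the trainable parameter count of the model defined above."""
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")

    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weights plus bias

    ff_dim = hidden_dim * 4  # matches dim_feedforward=hidden_dim*4
    attention = linear(hidden_dim, 3 * hidden_dim) + linear(hidden_dim, hidden_dim)
    feed_forward = linear(hidden_dim, ff_dim) + linear(ff_dim, hidden_dim)
    layer_norms = 2 * (2 * hidden_dim)  # scale and shift for each of two norms
    per_layer = attention + feed_forward + layer_norms

    total = (
        linear(input_dim, hidden_dim)          # input projection
        + max_seq_length * hidden_dim          # learned positional encoding
        + num_layers * per_layer               # encoder stack
        + linear(hidden_dim, hidden_dim // 2)  # output projection, first layer
        + linear(hidden_dim // 2, 1)           # output projection, second layer
    )
    return {
        "head_dim": hidden_dim // num_heads,
        "per_layer": per_layer,
        "total": total,
    }


info = count_parameters()
print(info)
```

Under these assumptions the default model has roughly 100,000 parameters. With the 31-day default for baseline training there are only a few dozen input/target pairs, so a low training loss may reflect memorization more than predictive skill.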

## Create Pipeline

In [76]:
class EarthquakePipeline:
    """
    A comprehensive pipeline for earthquake prediction using USGS data and transformer models.

    The pipeline implements automated data collection, processing, model training, and continuous
    monitoring capabilities for predicting earthquake occurrences.

    Attributes:
        drive_path (str): Base directory path for storing all pipeline data
        seq_length (int): Number of days to use in prediction sequences (default: 7)
        prediction_horizon (int): Days ahead to predict (default: 1)
        dirs (dict): Dictionary of directory paths for different data types
        model_dates (dict): Tracking dates for pipeline operations
        metadata (dict): Pipeline metadata and configuration information
        performance_history (list): List of historical prediction performance metrics

    Directory Structure:
        /drive_path/
        ├── data/          - Raw earthquake data organized by date
        ├── models/        - Model checkpoints and configurations
        ├── predictions/   - Prediction outputs and evaluations
        ├── plots/         - Performance visualizations
        └── evaluations/   - Detailed evaluation metrics

    Example:
        >>> pipeline = EarthquakePipeline('/path/to/data')
        >>> pipeline.run_baseline_training(days_to_process=31)
        >>> pipeline.run_continuous_monitoring(update_interval=3600)
    """

    def __init__(self, drive_path, seq_length=7, prediction_horizon=1):
        """Initialize earthquake prediction pipeline."""
        self.drive_path = drive_path

        # Create directory structure
        self.dirs = {
            'data': os.path.join(drive_path, 'data'),
            'models': os.path.join(drive_path, 'models'),
            'predictions': os.path.join(drive_path, 'predictions'),
            'plots': os.path.join(drive_path, 'plots'),
            'evaluations': os.path.join(drive_path, 'evaluations')
        }

        for dir_path in self.dirs.values():
            os.makedirs(dir_path, exist_ok=True)

        # Initialize parameters
        self.seq_length = seq_length
        self.prediction_horizon = prediction_horizon
        self.performance_history = []

        # Define features and targets (for metadata)
        self.feature_columns = ['count']  # Simplified for count prediction
        self.target_columns = ['count']   # Single target

        # Date tracking system
        self.model_dates = {
            'last_training_date': None,
            'last_optimization_date': None,
            'latest_data_date': None,
            'prediction_target_date': None,
            'first_data_date': None
        }

        # Initialize model
        self.model = TransformerPredictor(
            input_dim=1,  # For count prediction
            hidden_dim=64,
            num_layers=2,
            num_heads=4,
            max_seq_length=seq_length
        )

        # Initialize metadata
        self.metadata_path = os.path.join(drive_path, 'pipeline_metadata.json')
        self._load_or_create_metadata()

        # Save initial setup
        self._save_metadata()

    def _create_new_metadata(self):
        """Create new metadata structure with all required fields."""
        self.metadata = {
            'creation_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'data_dates': [],  # List to store dates of processed data
            'model_versions': [],  # List to store model version information
            'predictions': [],  # List to store prediction records
            'evaluations': [],  # List to store evaluation results
            'pipeline_config': {
                'sequence_length': self.seq_length,
                'prediction_horizon': self.prediction_horizon,
                'feature_columns': self.feature_columns,
                'target_columns': self.target_columns
            }
        }
        self._save_metadata()

    def _ensure_metadata_structure(self):
        """Ensure all required fields exist in metadata."""
        required_fields = {
            'creation_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'data_dates': [],
            'model_versions': [],
            'predictions': [],
            'evaluations': [],
            'pipeline_config': {
                'sequence_length': self.seq_length,
                'prediction_horizon': self.prediction_horizon,
                'feature_columns': self.feature_columns,
                'target_columns': self.target_columns
            }
        }

        # Add any missing fields
        for key, default_value in required_fields.items():
            if key not in self.metadata:
                self.metadata[key] = default_value
                print(f"Added missing metadata field: {key}")

        # Add any missing nested fields in pipeline_config
        if 'pipeline_config' in self.metadata:
            for key, value in required_fields['pipeline_config'].items():
                if key not in self.metadata['pipeline_config']:
                    self.metadata['pipeline_config'][key] = value
                    print(f"Added missing config field: {key}")

    def _save_metadata(self, verbose=False):
        """
        Save pipeline metadata to JSON file.

        Handles serialization of metadata including:
        - Pipeline configuration and status
        - Data tracking and ranges
        - Model versions and training history
        - Prediction history and performance metrics
        - File paths and timestamps
        """
        try:
            from datetime import date, datetime
            import numpy as np
            import pandas as pd

            # Create comprehensive metadata structure
            metadata = {
                'last_update': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'pipeline_info': {
                    'creation_date': self.metadata.get('creation_date'),
                    'sequence_length': self.seq_length,
                    'feature_columns': self.feature_columns,
                    'target_columns': self.target_columns
                },
                'data_range': {
                    'start': self.model_dates.get('first_data_date'),
                    'end': self.model_dates.get('latest_data_date'),
                    'total_days_processed': len(self.metadata.get('data_dates', []))
                },
                'training_info': {
                    'last_training': self.model_dates.get('last_training_date'),
                    'last_optimization': self.model_dates.get('last_optimization_date'),
                    'model_versions': self.metadata.get('model_versions', [])
                },
                'prediction_stats': self.performance_history,
                'file_paths': {
                    'data_directory': self.dirs['data'],
                    'model_directory': self.dirs['models'],
                    'predictions_directory': self.dirs['predictions'],
                    'plots_directory': self.dirs['plots']
                }
            }

            # Ensure metadata is JSON serializable
            def convert_to_serializable(obj):
                if isinstance(obj, np.integer):
                    return int(obj)  # keep counts as integers in the JSON output
                elif isinstance(obj, np.floating):
                    return float(obj)
                elif isinstance(obj, np.ndarray):
                    return obj.tolist()
                elif isinstance(obj, datetime):
                    return obj.strftime('%Y-%m-%d %H:%M:%S')
                elif isinstance(obj, date):  # Add handling for date objects
                    return obj.strftime('%Y-%m-%d')
                elif pd.isnull(obj):  # Changed from isna to isnull
                    return None
                return obj

            # Process all entries recursively
            def process_dict(d):
                result = {}
                for k, v in d.items():
                    if isinstance(v, dict):
                        result[k] = process_dict(v)
                    elif isinstance(v, list):
                        result[k] = [
                            process_dict(item) if isinstance(item, dict)
                            else convert_to_serializable(item)
                            for item in v
                        ]
                    else:
                        result[k] = convert_to_serializable(v)
                return result

            # Process metadata (progress messages are printed only when verbose=True)
            serializable_metadata = process_dict(metadata)

            # Save to file with pretty printing
            if verbose:
                print("Processing metadata for saving...")
            with open(self.metadata_path, 'w') as f:
                json.dump(serializable_metadata, f, indent=4)
            if verbose:
                print(f"Metadata saved successfully to: {self.metadata_path}")

            # Save a backup copy with timestamp
            backup_path = os.path.join(
                os.path.dirname(self.metadata_path),
                f'metadata_backup_{datetime.now().strftime("%Y%m%d_%H%M%S")}.json'
            )
            with open(backup_path, 'w') as f:
                json.dump(serializable_metadata, f, indent=4)
            if verbose:
                print(f"Metadata backup saved to: {backup_path}")

        except Exception as e:
            print(f"\nError saving metadata: {str(e)}")
            print("Metadata path:", self.metadata_path)
            print("Error details:", str(e))

            # Additional debugging information
            print("\nMetadata structure:")
            for key, value in metadata.items():
                print(f"{key}: {type(value)}")

            raise

    def _load_or_create_metadata(self):
        """Initialize or load existing metadata with all required fields."""
        if os.path.exists(self.metadata_path):
            try:
                with open(self.metadata_path, 'r') as f:
                    self.metadata = json.load(f)
                # Ensure all required fields exist even in loaded metadata
                self._ensure_metadata_structure()
            except Exception as e:
                print(f"Error loading metadata: {str(e)}. Creating new metadata.")
                self._create_new_metadata()
        else:
            self._create_new_metadata()

    def get_metadata_summary(self):
        """
        Get a summary of current pipeline metadata.

        Returns:
            dict: Summary of pipeline state and history
        """
        return {
            'creation_date': self.metadata['creation_date'],
            'data_count': len(self.metadata['data_dates']),
            'model_versions': len(self.metadata['model_versions']),
            'predictions_made': len(self.metadata['predictions']),
            'latest_data': self.model_dates['latest_data_date'],
            'last_training': self.model_dates['last_training_date'],
            'last_optimization': self.model_dates['last_optimization_date']
        }

    def fetch_earthquake_data(self, start_time=None, end_time=None, min_magnitude=2.5):
        """
        Fetch earthquake data from USGS API for a specific time period.

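        Args:
            start_time (datetime): Start of time period (default: 24 hours before now)
            end_time (datetime): End of time period (default: now); the query
                extends one day past this date so the full end date is included
            min_magnitude (float): Minimum earthquake magnitude to include

        Returns:
            pandas.DataFrame: One row per earthquake, or None if the request fails
        """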
        try:
            # Construct the query URL for the USGS API
            base_url = "https://earthquake.usgs.gov/fdsnws/event/1/query"

            # Format dates for the API
            if start_time is None:
                start_time = datetime.now() - timedelta(days=1)
            if end_time is None:
                end_time = datetime.now()

            params = {
                'format': 'geojson',
                'starttime': start_time.strftime('%Y-%m-%d'),
                'endtime': (end_time + timedelta(days=1)).strftime('%Y-%m-%d'),  # Add 1 day to include full end date
                'minmagnitude': min_magnitude,
                'orderby': 'time'
            }

            print(f"Fetching data from {params['starttime']} to {params['endtime']}")

            # Make the API request
            # A timeout keeps a stalled connection from hanging the pipeline
            response = requests.get(base_url, params=params, timeout=30)
            response.raise_for_status()

            # Parse the JSON response
            data = response.json()
            earthquakes = data['features']

            processed_data = []
            for quake in earthquakes:
                properties = quake['properties']
                coordinates = quake['geometry']['coordinates']

                processed_data.append({
                    'time': datetime.fromtimestamp(properties['time'] / 1000),
                    'magnitude': properties['mag'],
                    'place': properties['place'],
                    'longitude': coordinates[0],
                    'latitude': coordinates[1],
                    'depth': coordinates[2],
                    'type': properties['type'],
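                    # The key is present with a null value when no alert was issued,
                    # so a .get() default alone would not apply
                    'alert': properties.get('alert') or 'none',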
                    'tsunami': properties['tsunami'],
                    'sig': properties['sig']
                })

            df = pd.DataFrame(processed_data)

            if len(df) > 0:
                print("\nData Collection Summary:")
                print("-" * 30)
                print(f"Total earthquakes collected: {len(df)}")
                print(f"Date range: {df['time'].min()} to {df['time'].max()}")
                print(f"Magnitude range: {df['magnitude'].min():.1f} to {df['magnitude'].max():.1f}")
                print("-" * 30)

            return df

        except Exception as e:
            print(f"Error fetching data: {e}")
            return None

    def prepare_data(self, df, for_training=True):
        """
        Process earthquake data into sequences for model training or prediction.

        Args:
            df (pandas.DataFrame): Raw earthquake data
            for_training (bool): Intended to switch between training and prediction
                mode; currently unused by this method

        Returns:
            tuple: (sequence_tensor, target_tensor), or (None, None) if there is
            no data or an error occurs
                - sequence_tensor: torch.Tensor of shape (n_sequences, 1), holding
                  one day's earthquake count per sequence
                - target_tensor: torch.Tensor of shape (n_sequences, 1), holding
                  the following day's count

        Notes:
            - Earthquakes are grouped by calendar day and counted
            - Counts are used as-is; no scaling is applied in this method
            - Each input is a single day, so seq_length is not used here
            - With only one day of data, that day's count is returned as both
              input and target

        Example:
            >>> data = pipeline.fetch_earthquake_data(start_date, end_date)
            >>> sequences, targets = pipeline.prepare_data(data, for_training=True)
        """
        try:
            if df is None or len(df) == 0:
                print("No data to process")
                return None, None

            # Convert to daily counts (work on a copy so the caller's frame is untouched)
            df = df.copy()
            df['date'] = pd.to_datetime(df['time']).dt.date
            daily_counts = df.groupby('date').size().reset_index(name='count')
            daily_counts = daily_counts.sort_values('date')

            # Create sequences - now supporting single day
            sequences = []
            targets = []

            # For single day, use the count directly
            if len(daily_counts) == 1:
                sequences = torch.FloatTensor([[daily_counts['count'].iloc[0]]])
                targets = torch.FloatTensor([[daily_counts['count'].iloc[0]]])
                return sequences, targets

            # For multiple days, create proper sequences
            for i in range(len(daily_counts) - 1):  # -1 to always have a target
                seq = daily_counts['count'].iloc[i:i+1].values  # Take current day
                target = daily_counts['count'].iloc[i+1]  # Next day is target
                sequences.append(seq)
                targets.append(target)

            if not sequences:
                return None, None

            sequences = torch.FloatTensor(sequences)
            targets = torch.FloatTensor(targets).reshape(-1, 1)

            return sequences, targets

        except Exception as e:
            print(f"Error preparing data: {str(e)}")
            return None, None

    def train_model(self, sequence_tensor, target_tensor, epochs=100, batch_size=32):
        """
        Train the transformer model on earthquake sequence data.

        Args:
            sequence_tensor (torch.Tensor): Input sequences of shape (n_sequences, seq_length, features)
            target_tensor (torch.Tensor): Target values of shape (n_sequences, 1)
            epochs (int): Number of training epochs (default: 100)
            batch_size (int): Batch size for training (default: 32)

        Training Process:
            1. Data is batched and shuffled using DataLoader
            2. Model is trained using MSE loss and Adam optimizer
            3. Best model is saved based on average training loss (no validation split is used)
            4. Progress is logged with detailed metrics

        Returns:
            None: Updates model in-place and saves checkpoints

        Example:
            >>> sequences, targets = pipeline.prepare_data(training_data)
            >>> pipeline.train_model(sequences, targets, epochs=150, batch_size=64)
        """
        try:
            if sequence_tensor is None or target_tensor is None:
                print("\n❌ No valid training data provided")
                return

            print("\n🔄 Starting Model Training")
            print("=" * 50)
            print("Training Details:")
            print(f"- Sequences: {len(sequence_tensor)}")
            print(f"- Batch Size: {batch_size}")
            print(f"- Epochs: {epochs}")
            print("-" * 50)

            criterion = nn.MSELoss()
            optimizer = torch.optim.Adam(self.model.parameters())

            dataset = TensorDataset(sequence_tensor, target_tensor)
            dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

            best_loss = float('inf')
            for epoch in range(epochs):
                total_loss = 0
                for sequences, targets in dataloader:
                    optimizer.zero_grad()
                    predictions = self.model(sequences)
                    loss = criterion(predictions, targets)
                    loss.backward()
                    optimizer.step()
                    total_loss += loss.item()

                avg_loss = total_loss / len(dataloader)
                if avg_loss < best_loss:
                    best_loss = avg_loss
                    self.save_model_checkpoint(epoch, avg_loss)
                    checkpoint_saved = "✓"
                else:
                    checkpoint_saved = " "

                if epoch % 10 == 0:
                    print(f"Epoch {epoch:3d}/{epochs} | Loss: {avg_loss:.4f} {checkpoint_saved}")

            print("\n✅ Training completed")
            print(f"Final Loss: {avg_loss:.4f}")
            print(f"Best Loss: {best_loss:.4f}")
            print("=" * 50)

        except Exception as e:
            print(f"\n❌ Training error: {str(e)}")

    def predict_next_day(self, recent_data):
        """
        Generate earthquake count predictions for the next day.

        Args:
            recent_data (pandas.DataFrame): Recent earthquake data for prediction

        Returns:
            dict: Prediction information including:
                - predicted_count (int): Predicted number of earthquakes
                - lower_bound (int): Lower bound of prediction interval
                - upper_bound (int): Upper bound of prediction interval

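        Notes:
            - Bounds are a fixed ±10% band around the prediction, not a
              statistical confidence interval
            - Predictions use the most recent sequence of data
            - Values are rounded to non-negative integers for practical use
            - Returns None if no input sequence can be built from recent_data

        Example:
            >>> recent_data = pipeline.fetch_earthquake_data(start_date, end_date)
            >>> prediction = pipeline.predict_next_day(recent_data)
            >>> print(f"Predicted earthquakes: {prediction['predicted_count']}")
        """
        sequence_tensor, _ = self.prepare_data(recent_data, for_training=False)
        if sequence_tensor is None:
            print("\n❌ No valid input sequence for prediction")
            return None

        # Switch off dropout for inference, then restore the previous mode
        was_training = self.model.training
        self.model.eval()
        try:
            with torch.no_grad():
                predicted_count = self.model(sequence_tensor)
                # Take the last prediction since we only want the next day
                last_prediction = predicted_count[-1].item()
        finally:
            self.model.train(was_training)

        # An event count cannot be negative, so clamp before rounding
        last_prediction = max(last_prediction, 0.0)
        prediction_range = {
            'predicted_count': int(round(last_prediction)),
            'lower_bound': int(round(last_prediction * 0.9)),
            'upper_bound': int(round(last_prediction * 1.1))
        }

        return prediction_range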

    def evaluate_predictions(self, predictions, actual_data, prediction_date=None):
        """
        Evaluate prediction accuracy against actual earthquake data.

        Args:
            predictions (dict): Prediction data with counts and bounds
            actual_data (pandas.DataFrame): Actual earthquake data for the period
            prediction_date (datetime): Date of the prediction (default: None)

        Returns:
            dict: Evaluation metrics including:
                - date: Prediction date
                - predicted_count: Predicted number of earthquakes
                - actual_count: Actual number of earthquakes
                - prediction_error: Absolute error in count
                - relative_error: Percentage error
                - within_bounds: Boolean indicating if actual was within confidence bounds

        Notes:
            - Metrics are saved to evaluation directory
            - Results are added to performance history
            - Detailed logs are generated for analysis

        Example:
            >>> predictions = pipeline.predict_next_day(recent_data)
            >>> actual = pipeline.fetch_earthquake_data(target_date, target_date + timedelta(days=1))
            >>> metrics = pipeline.evaluate_predictions(predictions, actual, target_date)
        """
        try:
            actual_count = len(actual_data)
            pred_count = predictions['predicted_count']

            if prediction_date is None:
                prediction_date = actual_data['time'].dt.date.iloc[0]

            abs_error = abs(pred_count - actual_count)
            metrics = {
                'date': prediction_date,
                'predicted_count': pred_count,
                'actual_count': actual_count,
                'prediction_error': abs_error,
                'within_bounds': (actual_count >= predictions['lower_bound'] and
                                actual_count <= predictions['upper_bound']),
                # Divide by at least 1 so a day with no events does not raise
                # ZeroDivisionError and discard the whole evaluation
                'relative_error': abs_error / max(actual_count, 1) * 100
            }

            self.performance_history.append(metrics)

            # Save evaluation metrics using string date
            self.save_evaluation_metrics(metrics, prediction_date.strftime('%Y-%m-%d'))

            print("\n📊 Prediction Evaluation")
            print("-" * 40)
            print(f"Date:            {prediction_date}")
            print(f"Predicted Count: {pred_count}")
            print(f"Actual Count:    {actual_count}")
            print(f"Error:           {metrics['prediction_error']} events")
            print(f"Relative Error:  {metrics['relative_error']:.1f}%")
            print(f"Within Bounds:   {'✅' if metrics['within_bounds'] else '❌'}")
            print(f"Days Tracked:    {len(self.performance_history)}")
            print("-" * 40)

            return metrics

        except Exception as e:
            print(f"\n❌ Error evaluating predictions: {str(e)}")
            return None

    def save_evaluation_metrics(self, metrics, date_str):
        """Save detailed evaluation metrics to JSON file"""
        try:
            os.makedirs(self.dirs['evaluations'], exist_ok=True)

            year_month = date_str[:7]
            eval_dir = os.path.join(self.dirs['evaluations'], year_month)
            os.makedirs(eval_dir, exist_ok=True)

            # Convert dates to strings for JSON serialization
            evaluation_data = {
                'date': date_str,
                'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                'metrics': {
                    'predicted_count': int(metrics['predicted_count']),
                    'actual_count': int(metrics['actual_count']),
                    'prediction_error': float(metrics['prediction_error']),
                    'relative_error': float(metrics['relative_error']),
                    'within_bounds': bool(metrics['within_bounds'])
                },
                'model_info': {
                    'last_training': str(self.model_dates.get('last_training_date')),
                    'last_optimization': str(self.model_dates.get('last_optimization_date'))
                }
            }

            filename = f'evaluation_{date_str}.json'
            filepath = os.path.join(eval_dir, filename)
            with open(filepath, 'w') as f:
                json.dump(evaluation_data, f, indent=4)

            print(f"📊 Evaluation metrics saved: {filepath}")

        except Exception as e:
            print(f"❌ Error saving evaluation metrics: {str(e)}")

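    def analyze_trends(self, df):
        """
        Compute rolling statistics over daily earthquake counts.

        Args:
            df (pandas.DataFrame): Earthquake data with a 'time' column

        Returns:
            dict: Rolling pandas Series keyed by statistic name. Days with no
                recorded events are absent from the counts, so each window spans
                observed days rather than calendar days.
        """
        # Group on a derived date key without adding a column to the caller's DataFrame
        dates = pd.to_datetime(df['time']).dt.date.rename('date')
        daily_counts = df.groupby(dates).size()

        trends = {
            'moving_avg_7d': daily_counts.rolling(7).mean(),
            'moving_avg_30d': daily_counts.rolling(30).mean(),
            'std_dev': daily_counts.rolling(7).std(),
            'min_count': daily_counts.rolling(7).min(),
            'max_count': daily_counts.rolling(7).max()
        }

        return trends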

    def optimize_model(self, new_data, metrics):
        """Optimize model with single day of data"""
        try:
            if metrics is None:
                return

            sequence_tensor, target_tensor = self.prepare_data(new_data)
            if sequence_tensor is None or target_tensor is None:
                return

            print("\n🔄 Optimizing Model")
            optimizer = torch.optim.Adam(self.model.parameters(), lr=0.0001)
            criterion = nn.MSELoss()

            # Run a few optimization steps
            for step in range(5):
                optimizer.zero_grad()
                predictions = self.model(sequence_tensor)
                loss = criterion(predictions, target_tensor)
                loss.backward()
                optimizer.step()

                if step % 2 == 0:
                    print(f"Step {step}: Loss = {loss.item():.4f}")

        except Exception as e:
            print(f"Error optimizing model: {str(e)}")

    def save_model_checkpoint(self, epoch, loss, metrics=None):
        """Save model checkpoint with reduced logging"""
        try:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

            if self.model_dates['latest_data_date']:
                model_dir = os.path.join(self.dirs['models'],
                                      self.model_dates['latest_data_date'][:7])
                os.makedirs(model_dir, exist_ok=True)
            else:
                model_dir = self.dirs['models']

            # Save model state
            model_path = os.path.join(model_dir, f'model_checkpoint_{timestamp}.pth')
            config_path = os.path.join(model_dir, f'model_config_{timestamp}.json')

            checkpoint = {
                'epoch': epoch,
                'model_state_dict': self.model.state_dict(),
                'loss': loss,
                'metrics': metrics,
                'timestamp': timestamp,
                'model_dates': self.model_dates.copy()
            }

            torch.save(checkpoint, model_path)

            # Save minimal configuration
            config = {
                'timestamp': timestamp,
                'loss': float(loss),
                'metrics': metrics
            }

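            # default=str serializes non-JSON values such as the date object
            # that evaluate_predictions stores in metrics['date']
            with open(config_path, 'w') as f:
                json.dump(config, f, default=str)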

        except Exception as e:
            print(f"Error saving checkpoint: {str(e)}")

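    def load_latest_model(self):
        """
        Load the most recently created checkpoint found under the models directory.

        Searches the models directory and its subdirectories, restores the model
        weights, and restores the saved model_dates when present.

        Returns:
            bool: True if a checkpoint was loaded, False otherwise
        """
        try: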
            checkpoint_pattern = os.path.join(self.dirs['models'], '**',
                                          'model_checkpoint_*.pth')
            model_files = glob.glob(checkpoint_pattern, recursive=True)

            if not model_files:
                print("No saved models found")
                return False

            latest_model = max(model_files, key=os.path.getctime)
            checkpoint = torch.load(latest_model)

            self.model.load_state_dict(checkpoint['model_state_dict'])
            if 'model_dates' in checkpoint:
                self.model_dates.update(checkpoint['model_dates'])

            print(f"Loaded model from: {latest_model}")
            print(f"Checkpoint epoch: {checkpoint['epoch']}")
            print(f"Loss: {checkpoint['loss']}")

            return True

        except Exception as e:
            print(f"Error loading model: {str(e)}")
            return False

    def save_daily_data(self, df, date=None):
        """
        Save daily earthquake data with comprehensive metadata.

        Args:
            df (DataFrame): Earthquake data to save
            date (str, optional): Specific date for the data. Defaults to current date.

        Example:
            >>> data = pipeline.fetch_earthquake_data('day')
            >>> pipeline.save_daily_data(data, '2024-11-18')
        """
        if date is None:
            date = datetime.now().strftime('%Y-%m-%d')

        # Create dated directory structure
        year_month = date[:7]  # YYYY-MM
        data_dir = os.path.join(self.dirs['data'], year_month)
        os.makedirs(data_dir, exist_ok=True)

        # Save data
        filename = f'earthquake_data_{date}.csv'
        filepath = os.path.join(data_dir, filename)
        df.to_csv(filepath, index=False)

        # Save daily summary
        summary_path = os.path.join(data_dir, f'summary_{date}.json')
        summary = {
            'date': date,
            'total_earthquakes': len(df),
            'magnitude_range': {
                'min': float(df['magnitude'].min()),
                'max': float(df['magnitude'].max()),
                'mean': float(df['magnitude'].mean())
            },
            'location_bounds': {
                'lat': {'min': float(df['latitude'].min()),
                       'max': float(df['latitude'].max())},
                'lon': {'min': float(df['longitude'].min()),
                       'max': float(df['longitude'].max())}
            },
            'depth_stats': {
                'min': float(df['depth'].min()),
                'max': float(df['depth'].max()),
                'mean': float(df['depth'].mean())
            },
            'saved_at': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        }

        with open(summary_path, 'w') as f:
            json.dump(summary, f, indent=4)

        # Update metadata
        self.metadata['data_dates'].append({
            'date': date,
            'filepath': filepath,
            'summary_path': summary_path,
            'stats': summary
        })
        self._save_metadata(verbose=True)

        # Update pipeline date tracking
        self.model_dates['latest_data_date'] = date

        print(f"Data saved: {filepath}")
        print(f"Summary saved: {summary_path}")

    def load_daily_data(self, date_str):
        """
        Load earthquake data for a specific date from saved files.

        Args:
            date_str (str): Date in 'YYYY-MM-DD' format

        Returns:
            pandas.DataFrame: DataFrame containing earthquake data for the specified date
        """
        try:
            # Construct the file path
            year_month = date_str[:7]  # YYYY-MM
            filename = f'earthquake_data_{date_str}.csv'
            filepath = os.path.join(self.dirs['data'], year_month, filename)

            if os.path.exists(filepath):
                # Load the data
                df = pd.read_csv(filepath)

                # Convert time column back to datetime
                df['time'] = pd.to_datetime(df['time'])

                print(f"Loaded data for {date_str}: {len(df)} earthquakes")
                return df
            else:
                print(f"No data file found for {date_str}")
                return None

        except Exception as e:
            print(f"Error loading data for {date_str}: {str(e)}")
            return None

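    def save_predictions(self, predictions, prediction_date, actual_data=None):
        """
        Save predictions and, optionally, a comparison with actual data.

        Args:
            predictions (dict): Output of predict_next_day
            prediction_date (str): Date the prediction applies to (YYYY-MM-DD)
            actual_data (pandas.DataFrame, optional): Observed events for that date.
                When given, a comparison JSON is written and the comparison is
                appended to performance_history.
        """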
        try:
            # Create prediction directory structure
            year_month = prediction_date[:7]  # YYYY-MM
            pred_dir = os.path.join(self.dirs['predictions'], year_month)
            os.makedirs(pred_dir, exist_ok=True)

            # Convert predictions dict to DataFrame
            pred_df = pd.DataFrame([{
                'date': prediction_date,
                'predicted_count': predictions['predicted_count'],
                'lower_bound': predictions['lower_bound'],
                'upper_bound': predictions['upper_bound']
            }])

            # Save predictions
            pred_filename = f'predictions_{prediction_date}.csv'
            pred_filepath = os.path.join(pred_dir, pred_filename)
            pred_df.to_csv(pred_filepath, index=False)
            print(f"\nPredictions saved to: {pred_filepath}")

            # Save comparison if actual data is available
            if actual_data is not None:
                actual_count = len(actual_data)
                comparison = {
                    'date': prediction_date,
                    'predicted_count': predictions['predicted_count'],
                    'actual_count': actual_count,
                    'prediction_error': abs(predictions['predicted_count'] - actual_count),
                    'within_bounds': (actual_count >= predictions['lower_bound'] and
                                    actual_count <= predictions['upper_bound']),
                    'timestamp': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
                }

                # Save comparison summary
                summary_filename = f'comparison_{prediction_date}.json'
                summary_filepath = os.path.join(pred_dir, summary_filename)
                with open(summary_filepath, 'w') as f:
                    json.dump(comparison, f, indent=4)
                print(f"Comparison summary saved to: {summary_filepath}")

                # Update performance history
                self.performance_history.append(comparison)

        except Exception as e:
            print(f"\nError saving predictions: {str(e)}")
            print(f"Prediction date: {prediction_date}")
            print(f"Prediction: {predictions}")
            if actual_data is not None:
                print(f"Number of actual events: {len(actual_data)}")
            raise

    def load_predictions(self, prediction_date):
        """
        Load saved predictions for a specific date.

        Args:
            prediction_date (str): Date to load predictions for (YYYY-MM-DD)

        Returns:
            DataFrame: Loaded predictions, or None if not found
        """
        try:
            # Construct file path
            year_month = prediction_date[:7]  # YYYY-MM
            pred_dir = os.path.join(self.dirs['predictions'], year_month)
            pred_filepath = os.path.join(pred_dir, f'predictions_{prediction_date}.csv')

            if os.path.exists(pred_filepath):
                predictions = pd.read_csv(pred_filepath)
                print(f"\nLoaded predictions for {prediction_date}")
                print(f"Number of predictions: {len(predictions)}")
                return predictions
            else:
                print(f"\nNo predictions found for {prediction_date}")
                return None

        except Exception as e:
            print(f"\nError loading predictions: {str(e)}")
            # pred_filepath may not be assigned yet if the failure happened early
            print(f"Prediction date: {prediction_date}")
            return None

    def plot_performance(self, actual_counts, predicted_counts, dates):
        """
        Plot actual vs. predicted daily counts with a ±10% band around the predictions.

        Draws on a new matplotlib figure; the caller is responsible for showing or saving it.
        """
        plt.figure(figsize=(12, 6))
        plt.plot(dates, actual_counts, label='Actual', marker='o')
        plt.plot(dates, predicted_counts, label='Predicted', marker='x')
        plt.fill_between(dates,
                        [p*0.9 for p in predicted_counts],
                        [p*1.1 for p in predicted_counts],
                        alpha=0.2, label='±10% Band')
        plt.title('Earthquake Count Prediction Performance')
        plt.xlabel('Date')
        plt.ylabel('Number of Earthquakes')
        plt.legend()
        plt.grid(True)

    def save_visualization(self, start_date=None, end_date=None):
        """
        Generate and save visualization of model prediction performance.

        Args:
            start_date (datetime): Start date for visualization window
            end_date (datetime): End date for visualization window

        Creates:
            - Line plot comparing predicted vs actual counts
            - Shaded ±10% band around the predicted counts
            - Absolute prediction error over time

        Saves:
            - PNG file with timestamp in plots directory

        Notes:
            - Uses seaborn for styling
            - start_date and end_date are currently not applied; the plot covers
              the full performance history
            - Summary metrics (mean error, max error, share within bounds) are
              printed, not written to disk

        Example:
            >>> pipeline.save_visualization(
            ...     start_date=datetime(2024, 10, 1),
            ...     end_date=datetime(2024, 11, 1)
            ... )
        """
        if not self.performance_history:
            print("\n❌ No performance data available for visualization")
            return

        try:
            print("\n📈 Generating Performance Visualization")

            # Import required libraries
            import seaborn as sns
            import matplotlib.dates as mdates

            # Create plot directory if needed
            os.makedirs(self.dirs['plots'], exist_ok=True)

            # Convert performance history to DataFrame
            performance_df = pd.DataFrame(self.performance_history)
            performance_df['date'] = pd.to_datetime(performance_df['date'])

            # Set up the figure with better styling
            sns.set_style("whitegrid")
            fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))

            # Format dates for x-axis
            locator = mdates.AutoDateLocator(minticks=5, maxticks=10)
            formatter = mdates.DateFormatter('%Y-%m-%d')

            # Plot predicted vs actual counts with seaborn color palette
            colors = sns.color_palette("deep")
            ax1.plot(performance_df['date'], performance_df['predicted_count'],
                    label='Predicted', marker='o', linestyle='-', markersize=6, color=colors[0])
            ax1.plot(performance_df['date'], performance_df['actual_count'],
                    label='Actual', marker='x', linestyle='-', markersize=6, color=colors[1])
            ax1.fill_between(performance_df['date'],
                            performance_df['predicted_count'] * 0.9,
                            performance_df['predicted_count'] * 1.1,
                            alpha=0.2, label='±10% Band', color=colors[0])

            # Configure first subplot
            ax1.set_title(f'Earthquake Count Prediction Performance\n(Last {len(performance_df)} Days)',
                        pad=20, fontsize=12)
            ax1.set_xlabel('Date', fontsize=10)
            ax1.set_ylabel('Number of Earthquakes', fontsize=10)
            ax1.legend(fontsize=10)
            ax1.xaxis.set_major_locator(locator)
            ax1.xaxis.set_major_formatter(formatter)
            ax1.tick_params(axis='x', rotation=45)

            # Plot prediction error
            sns.lineplot(data=performance_df, x='date', y='prediction_error',
                        marker='o', ax=ax2, color=colors[3], label='Prediction Error')

            # Configure second subplot
            ax2.set_title('Prediction Error Over Time', pad=20, fontsize=12)
            ax2.set_xlabel('Date', fontsize=10)
            ax2.set_ylabel('Absolute Error', fontsize=10)
            ax2.xaxis.set_major_locator(locator)
            ax2.xaxis.set_major_formatter(formatter)
            ax2.tick_params(axis='x', rotation=45)

            # Set date limits explicitly
            min_date = performance_df['date'].min() - pd.Timedelta(days=1)
            max_date = performance_df['date'].max() + pd.Timedelta(days=1)
            ax1.set_xlim(min_date, max_date)
            ax2.set_xlim(min_date, max_date)

            # Add grid to both plots
            ax1.grid(True, alpha=0.3)
            ax2.grid(True, alpha=0.3)

            # Adjust layout and save
            plt.tight_layout()
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            plot_path = os.path.join(
                self.dirs['plots'],
                f'performance_{len(performance_df)}days_{timestamp}.png'
            )
            plt.savefig(plot_path, bbox_inches='tight', dpi=300)
            plt.close()

            print(f"✅ Visualization saved: {plot_path}")

            # Print performance metrics
            metrics_summary = {
                'days_tracked': len(performance_df),
                'mean_error': float(performance_df['prediction_error'].mean()),
                'max_error': float(performance_df['prediction_error'].max()),
                'accuracy_within_bounds': float(performance_df['within_bounds'].mean() * 100)
            }

            print("\n📊 Performance Summary")
            print("-" * 40)
            print(f"Days Tracked:      {metrics_summary['days_tracked']}")
            print(f"Mean Error:        {metrics_summary['mean_error']:.1f} events")
            print(f"Max Error:         {metrics_summary['max_error']:.1f} events")
            print(f"Within Bounds:     {metrics_summary['accuracy_within_bounds']:.1f}%")
            print("-" * 40)

        except Exception as e:
            print(f"\n❌ Error creating visualization: {str(e)}")
            import traceback
            print(traceback.format_exc())

    def run_baseline_training(self, days_to_process=31):
        """
        Execute baseline model training using historical earthquake data.

        Args:
            days_to_process (int): Number of days of historical data to process (default: 31)

        Process:
            1. Fetches historical data for specified period
            2. Processes data into daily sequences
            3. Trains initial model on historical patterns
            4. Generates and evaluates predictions
            5. Creates performance visualizations
            6. Saves model checkpoints and metrics

        Returns:
            bool: True if training completed successfully

        Example:
            >>> success = pipeline.run_baseline_training(days_to_process=60)
            >>> if success:
            ...     print("Baseline training completed successfully")
        """
        print("\n🚀 Initializing Baseline Training Pipeline")
        print("=" * 60)

        end_date = datetime.now() - timedelta(days=1)
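        # The loop below includes both endpoints, so step back one day less
        # to process exactly days_to_process days
        start_date = end_date - timedelta(days=days_to_process - 1)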

        print(f"\n📅 Processing Range: {start_date.date()} to {end_date.date()}")
        print("-" * 60)

        current_date = start_date
        all_data = []

        while current_date <= end_date:
            print(f"\n📅 Processing Date: {current_date.date()}")

            # Get data for current day
            data = self.fetch_earthquake_data(
                start_time=current_date,
                end_time=current_date + timedelta(days=1)
            )

            if data is not None:
                # Save the daily data
                self.save_daily_data(data, current_date.strftime('%Y-%m-%d'))
                all_data.append(data)

                if len(all_data) >= 2:  # Need at least 2 days to train/predict
                    # Train/optimize on available data
                    combined_data = pd.concat(all_data[:-1])  # Use all but last day
                    sequence_tensor, target_tensor = self.prepare_data(combined_data)

                    if sequence_tensor is not None and target_tensor is not None:
                        print("\n🔄 Training/Optimizing Model")
                        self.train_model(sequence_tensor, target_tensor, epochs=50)

                        # combined_data stops before current_date, so the "next day"
                        # being predicted is current_date itself, and all_data[-1]
                        # holds its observed events
                        prediction = self.predict_next_day(combined_data)
                        if prediction is not None:
                            print("\n🔮 Evaluating Prediction")
                            metrics = self.evaluate_predictions(prediction, all_data[-1], current_date.date())

                            # Save prediction and visualization
                            self.save_predictions(
                                prediction,
                                current_date.strftime('%Y-%m-%d')
                            )
                            self.save_visualization(start_date, current_date)

            current_date += timedelta(days=1)
            print("-" * 60)

        print("\n✅ Baseline Training Completed")
        print("=" * 60)

        # Generate final performance visualization
        if len(all_data) > 0:
            print("\n📊 Final Performance Summary")
            self.save_visualization(start_date, end_date)

        return True

    def run_continuous_monitoring(self, update_interval=3600):
        """
        Run continuous monitoring and prediction pipeline.

        Args:
            update_interval (int): Seconds between updates (default: 3600 for hourly)

        Process:
            1. Continuously fetches new earthquake data
            2. Generates predictions for next period
            3. Evaluates predictions against actual data
            4. Optimizes model based on performance
            5. Updates visualizations and metrics
            6. Saves updated model checkpoints

        Notes:
            - Runs indefinitely until interrupted
            - Handles API timeouts and errors
            - Maintains continuous performance logs
            - Automatic model optimization

        Example:
            >>> try:
            ...     pipeline.run_continuous_monitoring(update_interval=7200)  # 2-hour intervals
            ... except KeyboardInterrupt:
            ...     print("Monitoring stopped by user")
        """
        try:
            print("\n🔄 Starting Continuous Monitoring")
            print("=" * 60)
            print(f"Update Interval: {update_interval} seconds")

            while True:
                current_time = datetime.now()
                process_date = current_time - timedelta(days=1)

                print(f"\n📅 Processing Data for: {process_date.date()}")
                print("-" * 60)

                # Fetch yesterday's data
                data = self.fetch_earthquake_data(
                    start_time=process_date,
                    end_time=current_time
                )

                if data is not None:
                    self.save_daily_data(data, process_date.strftime('%Y-%m-%d'))

                    prediction = self.predict_next_day(data)
                    if prediction is not None:
                        self.save_predictions(
                            prediction,
                            current_time.strftime('%Y-%m-%d')
                        )

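                        # Events observed so far today. Early in the day this is a
                        # partial count, so it understates the eventual daily total.
                        actual_data = self.fetch_earthquake_data(
                            start_time=current_time.replace(hour=0, minute=0, second=0, microsecond=0),
                            end_time=current_time
                        )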

                        if actual_data is not None:
                            metrics = self.evaluate_predictions(
                                prediction,
                                actual_data,
                                current_time.date()
                            )
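                            # evaluate_predictions returns None on failure; skip the
                            # model update for this cycle instead of failing on metrics.get
                            if metrics is not None:
                                self.optimize_model(actual_data, metrics)

                                self.save_model_checkpoint(
                                    epoch=None,
                                    loss=metrics.get('relative_error', 0),
                                    metrics=metrics
                                )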

                            vis_start = current_time - timedelta(days=30)
                            self.save_visualization(vis_start, current_time)

                next_update = datetime.now() + timedelta(seconds=update_interval)
                print(f"\n⏰ Next Update: {next_update.strftime('%Y-%m-%d %H:%M:%S')}")
                print("=" * 60)
                time.sleep(update_interval)

        except KeyboardInterrupt:
            print("\n👋 Monitoring stopped by user")
        except Exception as e:
            print(f"\n❌ Monitoring error: {str(e)}")
            raise
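
The multi-day branch of `prepare_data` shown earlier pairs each day's count with the following day's count, which amounts to a window of one day. As a minimal sketch, assuming the 7-day default sequence length listed under Model Architecture is the intended window, the same pairing could be generalized as follows. `make_windows` is a hypothetical helper, not part of the pipeline, and it uses plain lists so that it runs without PyTorch.

```python
from typing import List, Sequence, Tuple


def make_windows(counts: Sequence[float],
                 seq_length: int = 7) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `seq_length` consecutive daily counts with the next day's count."""
    if seq_length < 1:
        raise ValueError("seq_length must be at least 1")

    sequences, targets = [], []
    # The last usable window ends one day before the final count,
    # so every window has a following day to serve as its target.
    for start in range(len(counts) - seq_length):
        sequences.append([float(c) for c in counts[start:start + seq_length]])
        targets.append(float(counts[start + seq_length]))
    return sequences, targets


# Nine days of illustrative (made-up) daily counts
daily = [12, 15, 9, 20, 18, 11, 14, 16, 13]
seqs, tgts = make_windows(daily, seq_length=7)
print(len(seqs), tgts)  # 2 [16.0, 13.0]
```

With `seq_length=1` this reproduces the pairs built by the existing loop. Fewer than `seq_length + 1` days yields empty lists, which a caller would need to treat the same way `prepare_data` treats an empty `sequences` list. To pass the result to `train_model`, the lists would still need converting to tensors, for example `torch.FloatTensor(seqs).unsqueeze(-1)` for an input of shape `(n_sequences, seq_length, 1)`, assuming the model expects a trailing feature dimension.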

## Run Code

In [67]:
# First, mount Google Drive (if using Colab)
from google.colab import drive
drive.mount('/content/drive')

# Set up base directory in Google Drive
base_path = '/content/drive/My Drive/earthquake_data'

# Initialize the pipeline
pipeline = EarthquakePipeline(drive_path=base_path)

# Run baseline training on the last 31 days of historical data
pipeline.run_baseline_training(days_to_process=31)

# OR just run continuous monitoring with existing model
# pipeline.run_continuous_monitoring(update_interval=3600)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Added missing metadata field: creation_date
Added missing metadata field: data_dates
Added missing metadata field: model_versions
Added missing metadata field: predictions
Added missing metadata field: evaluations
Added missing metadata field: pipeline_config

Processing metadata for saving...

🔄 Starting Continuous Monitoring
Update Interval: 3600 seconds

📅 Processing Data for: 2024-11-18
------------------------------------------------------------
Fetching data from 2024-11-18 to 2024-11-20

Data Collection Summary:
------------------------------
Total earthquakes collected: 46
Date range: 2024-11-18 00:00:44.709000 to 2024-11-19 18:48:09.220000
Magnitude range: 2.5 to 5.6
------------------------------

Processing metadata for saving...
Processing metadata for saving...
Metadata saved successfully to: /content/drive/My Drive/earthquake_data/pipeline_metad