
# EQTransformer via Google Colab
**Author:** [Hao Mai](https://github.com/maihao14)<br>
**Date created:** 2021/11/14<br>
**Last modified:** 2021/11/14<br>

## Description

EQTransformer is an AI-based earthquake signal detector and phase (P&S) picker based on a deep neural network with an attention mechanism. It has a hierarchical architecture specifically designed for earthquake signals. EQTransformer has been trained on global seismic data and can perform detection and arrival time picking simultaneously and efficiently. In addition to the prediction probabilities, it can also provide estimated model uncertainties.

The EQTransformer Python 3 package includes modules for downloading continuous seismic data, preprocessing, performing earthquake signal detection and phase (P & S) picking with pre-trained models, building and testing new models, and performing simple phase association.

**Developer:** [S. Mostafa Mousavi](https://github.com/smousavi05/EQTransformer#Contributing) <br>

**Reference:**

Mousavi, S.M., Ellsworth, W.L., Zhu, W., Chuang, L.Y., and Beroza, G.C. Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nat Commun 11, 3952 (2020). https://doi.org/10.1038/s41467-020-17591-w

## Installation from Source
The sources for EQTransformer can be downloaded from the [GitHub repo](https://github.com/smousavi05/EQTransformer).

### Prerequisite package: ObsPy
Once ObsPy is installed, restart the runtime:

Menu -> Runtime -> Restart Runtime

In [4]:
!pip install obspy



### Clone the public repository:



In [2]:
! git clone https://github.com/smousavi05/EQTransformer

Cloning into 'EQTransformer'...
remote: Enumerating objects: 2202, done.
remote: Counting objects: 100% (71/71), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 2202 (delta 37), reused 63 (delta 34), pack-reused 2131
Receiving objects: 100% (2202/2202), 51.32 MiB | 27.26 MiB/s, done.
Resolving deltas: 100% (1215/1215), done.


### Once you have a copy of the source, `cd` into the EQTransformer directory

In [3]:
%cd /content/EQTransformer

/content/EQTransformer


### Edit `setup.py` to change the NumPy version pin:
```
'numpy==1.20.3' -> 'numpy==1.19.2'
```
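If you prefer to make this edit from a notebook cell instead of by hand, a minimal sketch using only the Python standard library follows. It assumes `setup.py` contains the literal string `numpy==1.20.3`; the helper name `pin_numpy` is hypothetical and not part of EQTransformer.

```python
from pathlib import Path

def pin_numpy(setup_path, old="numpy==1.20.3", new="numpy==1.19.2"):
    """Replace the NumPy version pin in setup.py; return True if the file changed."""
    path = Path(setup_path)
    text = path.read_text()
    if old not in text:
        # Nothing to change: the pin may already have been edited
        return False
    path.write_text(text.replace(old, new))
    return True

# Usage from the repository root (/content/EQTransformer):
# pin_numpy("setup.py")
```

Returning a boolean makes the cell safe to re-run: a second call leaves the file untouched.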

### Install

Restart the runtime again once this cell has finished running:

Menu -> Runtime -> Restart Runtime

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [20]:
import pandas as pd
from obspy.core import Stream, Trace, UTCDateTime
import os

def create_daily_miniseeds(df, output_dir):
    # Check the first timestamp value to determine its range and adjust if necessary
    first_timestamp = df['Timestamp'].iloc[0]
    print(f"First timestamp raw value: {first_timestamp}")

    try:
        # Try converting assuming the timestamp is in milliseconds
        df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='ms')
    except Exception as e:
        # If that is out of bounds (or fails otherwise), retry assuming the timestamps are in seconds
        print(f"Timestamp conversion issue: {e}, trying alternative method...")
        df['Timestamp'] = pd.to_datetime(df['Timestamp'], unit='s')  # Assuming the timestamps are in seconds

    # Set the index to the timestamp for resampling
    df.set_index('Timestamp', inplace=True)

    # Resample the data into daily chunks
    daily_groups = df.resample('D')

    # Process each group
    for (day, group) in daily_groups:
        if not group.empty:
            stream = Stream([
                Trace(data=group['X'].values, header={'channel': 'HHX', 'starttime': UTCDateTime(day), 'sampling_rate': 100}),
                Trace(data=group['Y'].values, header={'channel': 'HHY', 'starttime': UTCDateTime(day), 'sampling_rate': 100}),
                Trace(data=group['Z'].values, header={'channel': 'HHZ', 'starttime': UTCDateTime(day), 'sampling_rate': 100})
            ])

            # Write to MiniSeed format
            filename = day.strftime('%Y%m%d') + '.mseed'
            stream.write(os.path.join(output_dir, filename), format='MSEED')

# Example usage
data_path = '/content/drive/MyDrive/testMob/data_1600_accel_phone.csv'
output_directory = 'path_to_output_directory'
data = pd.read_csv(data_path)
create_daily_miniseeds(data, output_directory)


First timestamp raw value: 252207666810782
Timestamp conversion issue: cannot convert input with unit 'ms', trying alternative method...


OutOfBoundsDatetime: cannot convert input with unit 's'

In [21]:
import pandas as pd

# Load the dataset
data_path = '/content/drive/MyDrive/testMob/data_1600_accel_phone.csv'  # Replace with the path to your data file
data = pd.read_csv(data_path)

# Display the first few rows of the dataframe, focusing on the timestamp column
print(data['Timestamp'].head())

# Check the range of the timestamps
print("Timestamp Info:")
print("Min timestamp value:", data['Timestamp'].min())
print("Max timestamp value:", data['Timestamp'].max())


0    252207666810782
1    252207717164786
2    252207767518790
3    252207817872794
4    252207868226798
Name: Timestamp, dtype: int64
Timestamp Info:
Min timestamp value: 241598622217012
Max timestamp value: 254351103296079
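These values explain both failures in the earlier cell. Read as milliseconds, 252207666810782 is roughly 7,992 years after 1970, far beyond the `datetime64[ns]` range pandas uses by default (which ends in 2262); read as seconds it is larger still. The five rows shown are spaced exactly 50,354,004 units apart, which is plausible for nanosecond ticks (about 50 ms per sample, roughly 19.86 Hz) counted from an arbitrary epoch, but not for milliseconds or microseconds. A minimal sketch that summarizes the spacing, assuming nanosecond units (an inference from the spacing, not something the data file states); `describe_timestamps` is a hypothetical helper:

```python
import numpy as np

def describe_timestamps(timestamps, ticks_per_second=1e9):
    """Summarize raw integer timestamps, assuming nanosecond ticks by default."""
    ts = np.sort(np.asarray(timestamps, dtype=np.int64))
    dt = np.median(np.diff(ts)) / ticks_per_second   # median sample spacing (s)
    span = (ts[-1] - ts[0]) / ticks_per_second       # total recording span (s)
    return {"dt_s": dt, "sampling_rate_hz": 1.0 / dt, "span_s": span}

# The first five values printed above
head = [252207666810782, 252207717164786, 252207767518790,
        252207817872794, 252207868226798]
print(describe_timestamps(head))  # sampling_rate_hz is about 19.86
```

On the full column (`describe_timestamps(data['Timestamp'])`), the min and max shown above give a span of about 12,752 s, or roughly 3.5 hours, under the same nanosecond assumption. Note that this rate differs from the `sampling_rate=100` hard-coded in the earlier cell.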


In [23]:
import os
import numpy as np
import pandas as pd
from obspy.core import Stream, Trace, UTCDateTime

def convert_to_datetime(timestamps, reference):
    """ Convert raw timestamps to datetimes, treating them as nanosecond offsets from `reference`.

    The values printed in the previous cell are spaced ~50,354,004 apart, which fits
    nanosecond ticks (~50 ms per sample) rather than microseconds. The absolute epoch
    is unknown, so the resulting dates only preserve relative timing.
    """
    offsets = np.asarray(timestamps, dtype=np.int64) - reference
    return pd.to_datetime(offsets, unit='ns')

def create_miniseed_files(data, output_dir, sampling_rate):
    """ Write one 3-component MiniSeed file per day, keeping every sample. """
    os.makedirs(output_dir, exist_ok=True)

    for day, group in data.groupby(data.index.floor('D')):
        # Each trace starts at the first sample of that day's data
        starttime = UTCDateTime(group.index[0].to_pydatetime())
        traces = []
        for axis in ['X', 'Y', 'Z']:
            traces.append(Trace(data=group[axis].to_numpy(dtype=np.float64), header={
                'network': 'XX',
                'station': 'MOB1',
                'location': '',
                'channel': 'HH' + axis,
                'starttime': starttime,
                # Assumes uniform sampling; gaps in the recording are not handled here
                'sampling_rate': sampling_rate,
            }))

        # Create a stream and write it to a MiniSeed file
        st = Stream(traces=traces)
        filename = os.path.join(output_dir, f"{day.strftime('%Y%m%d')}.mseed")
        st.write(filename, format='MSEED')

def main():
    data_path = '/content/drive/MyDrive/testMob/data_1600_accel_phone.csv'  # Replace with the path to your data file
    output_directory = 'path_to_output_directory'

    # Load data and make sure samples are in time order
    data = pd.read_csv(data_path).sort_values('Timestamp')

    # Estimate the sampling rate from the median spacing (assumes nanosecond ticks)
    sampling_rate = 1e9 / data['Timestamp'].diff().median()

    # Convert timestamps to datetimes relative to the first sample
    data['Datetime'] = convert_to_datetime(data['Timestamp'], data['Timestamp'].min())

    # Set datetime as index and drop the raw timestamp column
    data = data.set_index('Datetime').drop(columns=['Timestamp'])

    # Keep every sample: averaging per day would reduce each axis to a single value
    create_miniseed_files(data, output_directory, sampling_rate)

    print("Data processing complete. MiniSeed files created.")

if __name__ == "__main__":
    main()


Data processing complete. MiniSeed files created.


In [37]:
from EQTransformer.core.mseed_predictor import mseed_predictor
import os

def predict_earthquakes(input_directory, model_path, output_directory):
    """
    Predicts earthquakes using the EQTransformer model from MiniSeed files.

    Args:
    input_directory (str): Path to the directory containing MiniSeed files.
    model_path (str): Path to the pre-trained EQTransformer model.
    output_directory (str): Directory to save the detection outputs.
    """
    # Ensure the output directory exists
    os.makedirs(output_directory, exist_ok=True)

    # Perform detection using EQTransformer
    mseed_predictor(input_dir=input_directory,
                    input_model=model_path,
                    output_dir=output_directory,
                    stations_json='json/station_list.json',
                    detection_threshold=0.2,
                    P_threshold=0.1,
                    S_threshold=0.1,
                    number_of_plots=10,
                    plot_mode='time_frequency',
                    batch_size=500,
                    overlap=0.3)

# Define paths
input_dir = '/content/EQTransformer/downloads_mseeds'
model_path = '/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5' # You need to have a pre-trained EQT model file
output_dir = '/content/EQTransformer/ModelsAndSampleData/detecteddd'

# Run the prediction function
predict_earthquakes(input_dir, model_path, output_dir)

print("Earthquake detection completed and results are saved.")


 --> Type (Yes or y) to create a new empty directory! This will erase your previous results so make a copy if you want them.y


TypeError: Dimensions of C (41, 151) should be one smaller than X(151) and Y(41) while using shading='flat' see help(pcolormesh)
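This `TypeError` is raised by matplotlib's `pcolormesh`, presumably from the spectrogram panels drawn when `plot_mode='time_frequency'`. With `shading='flat'`, `X` and `Y` must be cell *edges*, so each must be one element longer than the matching dimension of `C`; here `X` (151) and `Y` (41) have the same lengths as `C` (41, 151), so they look like cell centers. Recent matplotlib releases reject this input, which older releases tolerated. Possible workarounds include installing an older matplotlib, passing `shading='auto'` (or `'nearest'`) in the plotting call, or avoiding the time-frequency plots (a later cell uses `plot_mode='time'` with `number_of_plots=0`). A minimal numpy sketch of the underlying fix, converting centers to edges; `centers_to_edges` is a hypothetical helper, not part of EQTransformer:

```python
import numpy as np

def centers_to_edges(centers):
    """Return N+1 cell edges for N monotonically spaced cell centers.

    Interior edges are midpoints between neighbouring centers; the two
    outer edges are extrapolated by half a spacing.
    """
    c = np.asarray(centers, dtype=float)
    if c.size == 1:
        return np.array([c[0] - 0.5, c[0] + 0.5])
    mid = (c[:-1] + c[1:]) / 2.0
    first = c[0] - (mid[0] - c[0])
    last = c[-1] + (c[-1] - mid[-1])
    return np.concatenate(([first], mid, [last]))

# Shapes from the error message: C is (41, 151), X has 151 values, Y has 41
# (the coordinate ranges below are placeholders)
x_edges = centers_to_edges(np.linspace(0, 60, 151))
y_edges = centers_to_edges(np.linspace(0, 50, 41))
print(x_edges.shape, y_edges.shape)  # (152,) (42,): valid for shading='flat'
```

With these edges, `pcolormesh(x_edges, y_edges, C)` accepts a `C` of shape (41, 151) under flat shading.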

In [5]:
!pip install -e .

Obtaining file:///content/EQTransformer
  Preparing metadata (setup.py) ... done
Installing collected packages: EQTransformer
  Attempting uninstall: EQTransformer
    Found existing installation: EQTransformer 0.1.61
    Uninstalling EQTransformer-0.1.61:
      Successfully uninstalled EQTransformer-0.1.61
  Running setup.py develop for EQTransformer
Successfully installed EQTransformer


## Downloading Continuous Data
The following will download the information on the stations that are available based on your search criteria:



In [15]:
import os
json_basepath = os.path.join(os.getcwd(),"json/station_list.json")

from EQTransformer.utils.downloader import makeStationList

makeStationList(json_path=json_basepath, client_list=["SCEDC"], min_lat=35.50, max_lat=35.60, min_lon=-117.80, max_lon=-117.40, start_time="2019-09-01 00:00:00.00", end_time="2019-09-03 00:00:00.00", channel_list=["HH[ZNE]", "HH[Z21]", "BH[ZNE]"], filter_network=["SY"], filter_station=[])

GS--CA06
GS--CA10
PB--B921
ZY--SV08
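To check what `makeStationList` found before downloading, you can summarize the JSON file. This is a sketch only: it assumes each top-level key is a station code mapped to an object with `network` and `channels` fields (inspect your own `station_list.json` to confirm), and the helper name `summarize_stations` is hypothetical. The `NET--STA` labels mirror the printed output above.

```python
import json

def summarize_stations(json_path):
    """Return sorted ('NET--STA', channels) pairs from a station list file.

    Assumed layout (verify against your file):
    {station: {"network": ..., "channels": [...], ...}, ...}
    """
    with open(json_path) as f:
        stations = json.load(f)
    summary = []
    for sta, info in stations.items():
        label = f"{info['network']}--{sta}"
        summary.append((label, sorted(info.get("channels", []))))
    return sorted(summary)

# Usage after makeStationList has run:
# for label, channels in summarize_stations(json_basepath):
#     print(label, channels)
```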


The function above generates a `station_list.json` file containing the station information. Next, you can use this file to download two days of data (2019-09-01 to 2019-09-03, in one-day chunks) for the available stations near Ridgecrest, California, from the Southern California Earthquake Data Center (SCEDC) or IRIS:

In [17]:
from EQTransformer.utils.downloader import downloadMseeds
downloadMseeds(client_list=["SCEDC", "IRIS"], stations_json=json_basepath, output_dir="downloads_mseeds", min_lat=35.50, max_lat=35.60, min_lon=-117.80, max_lon=-117.40, start_time="2019-09-01 00:00:00.00", end_time="2019-09-03 00:00:00.00", chunk_size=1, channel_list=[], n_processor=2)

[2024-05-04 22:51:25,600] - obspy.clients.fdsn.mass_downloader - INFO: Initializing FDSN client(s) for SCEDC, IRIS.


####### There are 4 stations in the list. #######


[2024-05-04 22:51:25,894] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 2 client(s): SCEDC, IRIS.
[2024-05-04 22:51:25,904] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:51:25,905] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:51:25,909] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:51:26,268] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.35 seconds)
[2024-05-04 22:51:26,270] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.36 seconds)
[2024-05-04 22:51:26,280] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 0 stations (0 channels).
[2024-05-04 22:51:26,287] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).

** done with --> CA10 -- GS -- 2019-09-01


[2024-05-04 22:51:30,480] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-05-04 22:51:30,580] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-05-04 22:51:30,602] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully downloaded 1 channels (of 1)
[2024-05-04 22:51:30,608] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Launching basic QC checks...

** done with --> CA06 -- GS -- 2019-09-01


[2024-05-04 22:51:55,501] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:51:55,506] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:51:55,836] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.33 seconds)
[2024-05-04 22:51:55,843] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 0 stations (0 channels).

** done with --> CA10 -- GS -- 2019-09-02


[2024-05-04 22:52:02,877] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:52:02,880] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:52:03,137] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.25 seconds)
[2024-05-04 22:52:03,142] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).

** done with --> CA06 -- GS -- 2019-09-02


[2024-05-04 22:52:23,184] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:52:23,188] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:52:23,451] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.26 seconds)
[2024-05-04 22:52:23,485] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).
[2024-05-04 22:52:23,490] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Will attempt to download data from 1 stations.
[2024-05-04 22:52:23,494] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING

** done with --> B921 -- PB -- 2019-09-01


[2024-05-04 22:52:34,534] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:52:34,537] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:52:34,793] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.25 seconds)
[2024-05-04 22:52:34,797] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).
[2024-05-04 22:52:34,800] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Will attempt to download data from 1 stations.
[2024-05-04 22:52:34,805] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Status for 3 time intervals/channels before downloading: NEEDS_DOWNLOADING

** done with --> SV08 -- ZY -- 2019-09-01


[2024-05-04 22:52:56,329] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:52:56,332] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:52:56,621] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.29 seconds)
[2024-05-04 22:52:56,672] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).

** done with --> B921 -- PB -- 2019-09-02


[2024-05-04 22:53:10,069] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0
[2024-05-04 22:53:10,073] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Requesting unreliable availability.
[2024-05-04 22:53:10,333] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Successfully requested availability (0.26 seconds)
[2024-05-04 22:53:10,343] - obspy.clients.fdsn.mass_downloader - INFO: Client 'SCEDC' - Found 1 stations (3 channels).

** done with --> SV08 -- ZY -- 2019-09-02


# Detection and Picking
To perform detection & picking, you need a pre-trained EQTransformer model, which you can find in the `EQTransformer/ModelsAndSampleData/` folder.


EQTransformer provides two different options for performing detection & picking on continuous data:
## Option (I) using pre-processed data (hdf5 files):
This option is recommended for smaller periods (a few days to a month). This allows you to test the performance and explore the effects of different parameters while the provided hdf5 file makes it easy to access the waveforms.

For this option, you first need to convert the MiniSeed files for each station into 1-min long NumPy arrays stored in a single hdf5 file, and generate a CSV file listing the traces in that hdf5 file. You can do this with the following command:

In [None]:
from EQTransformer.utils.hdf5_maker import preprocessor

preprocessor(preproc_dir="preproc", mseed_dir='downloads_mseeds', stations_json=json_basepath, overlap=0.3, n_processor=2)

  * CA06 (1) .. 20190901 --> 20190902 .. 3 components .. sampling rate: 100.0
  * B921 (1) .. 20190901 --> 20190902 .. 3 components .. sampling rate: 100.0
  * B921 (2) .. 20190902 --> 20190903 .. 3 components .. sampling rate: 100.0
  * CA06 (2) .. 20190902 --> 20190903 .. 3 components .. sampling rate: 100.0
 Station CA06 had 2 chuncks of data
4112 slices were written, 4114.0 were expected.
Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 2.
Original samplieng rate: 100.0.
  * SV08 (1) .. 20190901 --> 20190902 .. 3 components .. sampling rate: 100.0 Station B921 had 2 chuncks of data

4112 slices were written, 4114.0 were expected.
Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 2.
Original samplieng rate: 100.0.
  * SV08 (2) .. 20190902 --> 20190903 .. 3 components .. sampling rate: 100.0
 Station SV08 had 2 chuncks of data
4112 slices were written, 4114.0 were expected.
Number of 1-components: 0. Number of 2-components: 0. 
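The expected slice count above can be checked by hand. With 1-minute (60 s) windows and `overlap=0.3`, consecutive windows start 60 × (1 − 0.3) = 42 s apart, so one day (86,400 s) holds ⌊86400 / 42⌋ = 2,057 windows, and the two daily chunks give 4,114, matching the "4114.0 were expected" lines. The sketch below only reproduces that arithmetic; it is not EQTransformer's own code, and counting each day separately with a floor is an assumption.

```python
import math

def expected_windows(chunk_seconds, n_chunks, window_s=60.0, overlap=0.3):
    """Estimate how many overlapping windows fit in n_chunks chunks of data."""
    step = window_s * (1.0 - overlap)  # start-to-start spacing between windows (s)
    return n_chunks * math.floor(chunk_seconds / step)

print(expected_windows(86400, 2))  # 4114
```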

In [None]:
from EQTransformer.core.predictor import predictor

predictor(input_dir= 'downloads_mseeds_processed_hdfs', input_model='EQTransformer/ModelsAndSampleData/EqT_model.h5', output_dir='detections', detection_threshold=0.3, P_threshold=0.1, S_threshold=0.1, number_of_plots=100, plot_mode='time')

Running EqTransformer  0.1.61
 *** Loading the model ...
*** Loading is complete!
 *** /content/detections already exists!
 --> Type (Yes or y) to create a new empty directory! otherwise it will overwrite!   y
######### There are files for 3 stations in downloads_mseeds_processed_hdfs directory. #########



  0%|                                                                         | 0/9 [00:00<?, ?it/s]
 22%|██████████████▍                                                  | 2/9 [01:03<03:41, 31.63s/it]
 33%|█████████████████████▋                                           | 3/9 [01:15<02:34, 25.68s/it]
 44%|████████████████████████████▉                                    | 4/9 [01:26<01:47, 21.51s/it]
 56%|████████████████████████████████████                             | 5/9 [01:38<01:14, 18.63s/it]
 67%|███████████████████████████████████████████▎                     | 6/9 [01:50<00:49, 16.52s/it]
 78%|█████████████

<Figure size 576x432 with 0 Axes>
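If the predictor reports fewer stations than you expect, it can help to confirm that the preprocessor produced both an `.hdf5` and a `.csv` file for every station. A minimal sketch using only `pathlib`; the helper name is hypothetical, and it assumes files are named `<station>.hdf5` and `<station>.csv` inside the preprocessor's output directory (`downloads_mseeds_processed_hdfs` in the predictor cell above).

```python
from pathlib import Path

def check_station_files(hdf_dir):
    """Map each station name to True if both <station>.hdf5 and <station>.csv exist."""
    folder = Path(hdf_dir)
    names = {p.stem for p in folder.glob("*.hdf5")} | {p.stem for p in folder.glob("*.csv")}
    return {name: (folder / f"{name}.hdf5").exists() and (folder / f"{name}.csv").exists()
            for name in sorted(names)}

# Usage:
# print(check_station_files("downloads_mseeds_processed_hdfs"))
```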

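The preprocessor generates one `station_name.hdf5` and one `station_name.csv` file for each station's data and puts them into a directory named after `mseed_dir` with the suffix `_processed_hdfs` (here, `downloads_mseeds_processed_hdfs`). The name of this directory (which contains all of your hdf5 & CSV files) and a model are then passed to `predictor`, as in the cell above. The cell below instead runs `mseed_predictor` directly on the MiniSeed files (Option II).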

In [None]:
from EQTransformer.core.mseed_predictor import mseed_predictor

mseed_predictor(input_dir='downloads_mseeds',
                input_model='EQTransformer/ModelsAndSampleData/EqT_model.h5',
                stations_json='json/station_list.json',
                output_dir='detection_results',
                detection_threshold=0.2,
                P_threshold=0.1,
                S_threshold=0.1,
                number_of_plots=10,
                plot_mode='time_frequency',
                batch_size=500,
                overlap=0.3)

11-14 23:41 [INFO] [EQTransformer] Running EqTransformer  0.1.61
11-14 23:41 [INFO] [EQTransformer] *** Loading the model ...
11-14 23:41 [INFO] [EQTransformer] *** Loading is complete!
11-14 23:41 [INFO] [EQTransformer] *** /content/detection_results already exists!


 --> Type (Yes or y) to create a new empty directory! This will erase your previous results so make a copy if you want them.y


11-15 01:25 [INFO] [EQTransformer] There are files for 3 stations in downloads_mseeds directory.
11-15 01:25 [INFO] [EQTransformer] Started working on B921, 1 out of 3 ...
11-15 01:25 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:27 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 51.34 seconds.
11-15 01:27 [INFO] [EQTransformer] *** Detected: 2926 events.
11-15 01:27 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/B921_outputs "
11-15 01:27 [INFO] [EQTransformer] Started working on CA06, 2 out of 3 ...
11-15 01:27 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:27 [INFO] [EQTransformer] 20190902T000000Z__20190903T000000Z.mseed






11-15 01:28 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 25.2 seconds.
11-15 01:28 [INFO] [EQTransformer] *** Detected: 2879 events.
11-15 01:28 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/CA06_outputs "
11-15 01:28 [INFO] [EQTransformer] Started working on SV08, 3 out of 3 ...
11-15 01:28 [INFO] [EQTransformer] 20190901T000000Z__20190902T000000Z.mseed
11-15 01:29 [INFO] [EQTransformer] 20190902T000000Z__20190903T000000Z.mseed






11-15 01:29 [INFO] [EQTransformer] Finished the prediction in: 0 hours and 1 minutes and 22.6 seconds.
11-15 01:29 [INFO] [EQTransformer] *** Detected: 1648 events.
11-15 01:29 [INFO] [EQTransformer]  *** Wrote the results into --> " /content/detection_results/SV08_outputs "


<Figure size 720x720 with 0 Axes>

## Option (II) directly from mseed files:
You can perform detection & phase picking directly on the downloaded MiniSeed files. This saves both the preprocessing time and the extra space needed for the hdf5 files, and is recommended for larger (longer) datasets. However, it can be more memory intensive, so it is better to keep each MiniSeed file shorter than about one month.

This option also does not allow you to estimate uncertainties, save the prediction probabilities, or take advantage of the hdf5 files, which make it easy to access the raw event waveforms based on the detection results.
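For a rough sense of why long MiniSeed files are memory intensive, the arithmetic below estimates the raw sample memory for one station, assuming three components at 100 Hz (the sampling rate the preprocessor reports for the downloaded data) stored as 4-byte float32 values. The dtype and the idea that a whole file sits in memory at once are assumptions for illustration; the model's own buffers are not counted.

```python
def raw_sample_bytes(days, sampling_rate=100.0, n_components=3, bytes_per_sample=4):
    """Bytes needed to hold `days` of continuous samples for one station."""
    n_samples = days * 86400 * sampling_rate * n_components
    return n_samples * bytes_per_sample

for days in (1, 30):
    print(f"{days:>2} day(s): {raw_sample_bytes(days) / 1e9:.2f} GB")
```

Under these assumptions, one day is about 0.10 GB and a 30-day file about 3.11 GB per station, before any model overhead.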

In [40]:
from EQTransformer.core.mseed_predictor import mseed_predictor

In [39]:

mseed_predictor(input_dir='downloads_mseeds',
                input_model='/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5',
                stations_json='json/station_list.json',
                output_dir='detection_results',
                detection_threshold=0.2,
                P_threshold=0.1,
                S_threshold=0.1,
                number_of_plots=10,
                plot_mode='time_frequency',
                batch_size=500,
                overlap=0.3)

 --> Type (Yes or y) to create a new empty directory! This will erase your previous results so make a copy if you want them.y


TypeError: Dimensions of C (41, 151) should be one smaller than X(151) and Y(41) while using shading='flat' see help(pcolormesh)

In [45]:
import json
import numpy as np
from obspy.core import Stream, Trace, UTCDateTime
import os

def read_json(json_path):
    """ Load JSON data from a file. """
    with open(json_path, 'r') as file:
        data = json.load(file)
    return data

def json_to_mseed(json_data, output_dir, sampling_rate=100):
    """ Convert JSON entries to MiniSEED format, grouping entries into one-hour windows.

    Assumes each entry has a 'start_time' string and that
    entry['data']['timeseries'][axis] holds a single value or a list of samples.
    """
    os.makedirs(output_dir, exist_ok=True)

    # Bucket entries by the hour they start in; this visits only hours that contain
    # data instead of scanning every hour between the first and last timestamp
    hours = {}
    for entry in json_data:
        utc_time = UTCDateTime(entry['start_time'])
        hours.setdefault(int(utc_time.timestamp // 3600), []).append((utc_time, entry))

    for hour_bin in sorted(hours):
        # Sort by time so samples are concatenated in order (gaps are not handled)
        items = sorted(hours[hour_bin], key=lambda item: item[0])
        traces = []
        for component in ['X', 'Y', 'Z']:
            # Flatten scalars and lists into one 1-D array per component
            segment_data = np.concatenate([
                np.atleast_1d(np.asarray(entry['data']['timeseries'][component], dtype=float))
                for _, entry in items
            ])
            trace = Trace(data=segment_data)
            trace.stats.station = "MOB1"
            trace.stats.network = 'XX'
            trace.stats.location = ''
            trace.stats.channel = 'HH' + component  # three-character channel code
            trace.stats.starttime = items[0][0]  # time of the first sample in this hour
            trace.stats.sampling_rate = sampling_rate
            traces.append(trace)

        stream = Stream(traces=traces)
        filename = f"{UTCDateTime(hour_bin * 3600).strftime('%Y%m%dT%H%M%S')}.mseed"
        stream.write(os.path.join(output_dir, filename), format='MSEED')

def main():
    json_path = '/content/EQTransformer/output2.json'
    output_dir = '/content/EQTransformer/sss'
    json_data = read_json(json_path)
    json_to_mseed(json_data, output_dir)
    print("Conversion to MiniSEED completed for each one-hour window.")

if __name__ == "__main__":
    main()


KeyboardInterrupt: 
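The `KeyboardInterrupt` most likely means the run above was stopped by hand before it finished. With one JSON entry per sample, the original loop compares every timestamp, stored as a NumPy array of `UTCDateTime` objects, against each hour-long window, which is slow for long recordings. One way to avoid this, sketched below, is to convert the times to float epoch seconds first (for example with `UTCDateTime(...).timestamp`), sort them once, and locate every window boundary with `np.searchsorted`, so each window becomes a contiguous slice. The helper `hourly_slices` is a hypothetical name and is not part of ObsPy or EQTransformer.

```python
import numpy as np


def hourly_slices(times, window_s=3600.0):
    """Split sorted epoch times (s) into [start, start + window_s) windows.

    Returns a list of (window_start, slice) pairs; empty windows are skipped.
    """
    times = np.asarray(times, dtype=float)
    if times.size == 0:
        return []
    n_windows = int(np.floor((times[-1] - times[0]) / window_s)) + 1
    edges = times[0] + window_s * np.arange(n_windows + 1)
    # Index of the first sample at or after each window edge.
    bounds = np.searchsorted(times, edges, side='left')
    out = []
    for i in range(n_windows):
        lo, hi = int(bounds[i]), int(bounds[i + 1])
        if hi > lo:
            out.append((float(edges[i]), slice(lo, hi)))
    return out


# Example: 100 Hz samples spanning 2.5 hours gives three windows.
t = np.arange(900_000) / 100.0
windows = hourly_slices(t)
print([sl.stop - sl.start for _, sl in windows])  # → [360000, 360000, 180000]
```

Inside a converter such as `json_to_mseed`, `data[c][sl]` would then give each component's samples for a window without any per-window comparison.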

In [None]:
import tensorflow as tf
from EQTransformer.core.mseed_predictor import mseed_predictor

# Run tf.functions eagerly so errors raised during prediction come with a
# full traceback (slower, but useful for debugging).
tf.config.run_functions_eagerly(True)


def run_eqtransformer(input_dir, output_dir, model_path, stations_json):
    """ Run EQTransformer detection and P/S picking on a directory of MiniSEED files. """
    mseed_predictor(
        input_dir=input_dir,
        input_model=model_path,  # path to the .h5 model file itself
        stations_json=stations_json,
        output_dir=output_dir,
        detection_threshold=0.2,
        P_threshold=0.1,
        S_threshold=0.1,
        number_of_plots=0,
        plot_mode='time'
    )


def main():
    input_dir = '/content/EQTransformer/downloads_mseeds'
    output_dir = 'path_to_output_results1234'
    model_path = '/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5'
    stations_json = '/content/EQTransformer/json/station_list.json'  # check that this path exists

    run_eqtransformer(input_dir, output_dir, model_path, stations_json)
    print("EQTransformer processing completed.")


if __name__ == "__main__":
    main()


 --> Type (Yes or y) to create a new empty directory! This will erase your previous results so make a copy if you want them.Y

In [2]:
from EQTransformer.core.mseed_predictor import mseed_predictor

mseed_predictor(input_dir='/content/EQTransformer/test1',
                input_model='/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5',
                stations_json='/content/EQTransformer/stations.json',
                output_dir='path_to_output_results123456',
                detection_threshold=0.2,
                P_threshold=0.1,
                S_threshold=0.1,
                number_of_plots=10,
                plot_mode='time_frequency',
                batch_size=500,
                overlap=0.3)

ValueError: Unexpected result of `predict_function` (Empty batch_outputs). Please use `Model.compile(..., run_eagerly=True)`, or `tf.config.run_functions_eagerly(True)` for more information of where went wrong, or file a issue/bug to `tf.keras`.
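`Empty batch_outputs` means the model's predict step received no batches. One plausible cause, which this run does not confirm, is that `mseed_predictor` found no three-component data it could window for the stations listed in `stations_json`. The sketch below writes a station list in the layout that, as far as we understand it, EQTransformer's `makeStationList` produces: a dictionary keyed by station code with `network`, `channels`, and `coords` ([latitude, longitude, elevation]) fields. The station and network codes match the MiniSEED conversion cell in this notebook, and the coordinates are the example values from the CSV-to-JSON cell. The channel codes `HNE`/`HNN`/`HNZ`, the file name, and the helper `write_station_list` are assumptions; whatever channel codes you choose must match those stored in your MiniSEED files.

```python
import json


def write_station_list(path, station, network, channels, lat, lon, elev):
    """Write a one-station list of the form {station: {network, channels, coords}}."""
    stations = {
        station: {
            "network": network,
            "channels": list(channels),
            "coords": [lat, lon, elev],  # latitude, longitude, elevation
        }
    }
    with open(path, "w") as f:
        json.dump(stations, f, indent=4)
    return stations


# Example values only; replace them with the real station metadata.
stations = write_station_list(
    "station_list_mob1.json", "MOB1", "XX", ["HNE", "HNN", "HNZ"],
    35.6895, -117.8500, 150.0,
)
print(stations)
```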

In [5]:
import tensorflow as tf
from tensorflow.keras.models import load_model
try:
    from keras_self_attention import SeqSelfAttention
except ImportError:
    !pip install keras-self-attention
    from keras_self_attention import SeqSelfAttention

def load_custom_model(path):
    custom_objects = {'SeqSelfAttention': SeqSelfAttention}
    try:
        model = load_model(path, custom_objects=custom_objects)
        print("Model loaded successfully.")
        return model
    except Exception as e:
        print(f"An error occurred: {e}")

model = load_custom_model('/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5')


Collecting keras-self-attention
  Downloading keras-self-attention-0.51.0.tar.gz (11 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: keras-self-attention
  Building wheel for keras-self-attention (setup.py) ... [?25l[?25hdone
  Created wheel for keras-self-attention: filename=keras_self_attention-0.51.0-py3-none-any.whl size=18894 sha256=dba7e6ad79ad60af58f5696427788f9db642a0d4eb1aff87e2fb62fcce7fe4e7
  Stored in directory: /root/.cache/pip/wheels/b8/f7/24/607b483144fb9c47b4ba2c5fba6b68e54aeee2d5bf6c05302e
Successfully built keras-self-attention
Installing collected packages: keras-self-attention
Successfully installed keras-self-attention-0.51.0
An error occurred: Unknown layer: FeedForward. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.
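The `Unknown layer: FeedForward` error arises because the saved model contains several custom layers, and the `SeqSelfAttention` from the `keras-self-attention` package is not the implementation the model was built with. EQTransformer ships its own layers in `EQTransformer.core.EqT_utils`. To our understanding, its predictor modules load the model by passing `SeqSelfAttention`, `FeedForward`, `LayerNormalization`, and the `f1` metric as `custom_objects`. The sketch below follows that pattern; treat the import path and the set of names as assumptions to check against your installed version. The imports sit inside the function (a deliberate choice) so the cell can be defined before EQTransformer is importable, and exceptions are allowed to propagate instead of being printed and turned into `None`.

```python
def load_eqt_model(path):
    """Load an EQTransformer .h5 model with its custom layers registered.

    Assumes EQTransformer.core.EqT_utils exposes the four names below.
    """
    from tensorflow.keras.models import load_model
    from EQTransformer.core.EqT_utils import (
        f1, FeedForward, LayerNormalization, SeqSelfAttention,
    )

    custom_objects = {
        'SeqSelfAttention': SeqSelfAttention,
        'FeedForward': FeedForward,
        'LayerNormalization': LayerNormalization,
        'f1': f1,
    }
    # Let load errors propagate so a failed load is not silently ignored.
    return load_model(path, custom_objects=custom_objects)
```

For example: `model = load_eqt_model('/content/EQTransformer/ModelsAndSampleData/EqT_original_model.h5')`.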


In [49]:
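import os
import re


def rename_files(directory):
    """ Rename 'YYYYMMDDTHHMMSS.mseed' files (as written by json_to_mseed)
    to 'NET_STA__LOC_CHAN__YYYYMMDDHHMMSS.mseed'. """
    for filename in os.listdir(directory):
        # Only touch files that still carry the json_to_mseed naming, so that
        # re-running the cell does not mangle already-renamed files.
        match = re.fullmatch(r'(\d{8})T(\d{6})\.mseed', filename)
        if match is None:
            continue
        date_part, time_part = match.groups()
        new_filename = f"NET_STA__LOC_CHAN__{date_part}{time_part}.mseed"
        os.rename(os.path.join(directory, filename), os.path.join(directory, new_filename))
        print(f"Renamed {filename} to {new_filename}")

directory = '/content/EQTransformer/test1/ttt'  # Update this path
rename_files(directory)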


Renamed 19770902T144342.mseed to NET_STA__LOC_CHAN__19770902144342.mseed
Renamed 19770902T174342.mseed to NET_STA__LOC_CHAN__19770902174342.mseed
Renamed 19770902T164342.mseed to NET_STA__LOC_CHAN__19770902164342.mseed
Renamed 19770902T154342.mseed to NET_STA__LOC_CHAN__19770902154342.mseed
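The `NET_STA__LOC_CHAN__...` names above are placeholders. Judging from the files written by EQTransformer's `downloadMseeds`, `mseed_predictor` appears to expect one sub-directory per station inside `input_dir`, holding files named `NET.STA.LOC.CHA__STARTTIME__ENDTIME.mseed` with times written as `YYYYMMDDTHHMMSSZ`, and to read the time span from the `__`-separated fields. The sketch below only builds such a name. The helper `eqt_mseed_name` is hypothetical, the convention should be checked against files produced by the downloader, and one file per channel is assumed.

```python
from datetime import datetime, timedelta


def eqt_mseed_name(net, sta, loc, cha, start, end):
    """Build 'NET.STA.LOC.CHA__YYYYMMDDTHHMMSSZ__YYYYMMDDTHHMMSSZ.mseed'."""
    fmt = '%Y%m%dT%H%M%SZ'
    return f"{net}.{sta}.{loc}.{cha}__{start.strftime(fmt)}__{end.strftime(fmt)}.mseed"


start = datetime(1977, 9, 2, 14, 43, 42)  # first window in the rename log above
name = eqt_mseed_name('XX', 'MOB1', '', 'HNZ', start, start + timedelta(hours=1))
print(name)  # XX.MOB1..HNZ__19770902T144342Z__19770902T154342Z.mseed
```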


In [44]:
import json

import pandas as pd

def load_data(filepath):
    """ Load data from a CSV file. """
    return pd.read_csv(filepath)

def preprocess_timestamps(df, timestamp_col='Timestamp'):
    """ Convert integer UNIX timestamps (assumed to be in microseconds) to pandas datetimes. """
    # If the resulting dates look implausible, check the unit used by the logger.
    df[timestamp_col] = pd.to_datetime(df[timestamp_col], unit='us')
    return df

def create_stead_entry(df):
    """ Create STEAD-like entries from accelerometer data. """
    stead_entries = []
    for index, row in df.iterrows():
        entry = {
            "trace_id": f"MOB{index}",
            "station": {
                "station_code": "MOB1",
                "location": {
                    "latitude": 35.6895,  # Example latitude
                    "longitude": -117.8500,  # Example longitude
                    "elevation": 150.0  # Example elevation
                },
                "instrument": {
                    "type": "Accelerometer",
                    "band": "High-Gain"
                }
            },
            "sampling_rate": 100,  # Assume 100 Hz sampling rate
            "start_time": str(row['Timestamp']),
            "end_time": str(row['Timestamp'] + pd.Timedelta(seconds=1/100)),  # Assuming each row is a sample
            "event": {
                "origin_time": str(row['Timestamp']),
                "location": {
                    "latitude": 35.7051,  # Hypothetical location
                    "longitude": -117.7635,
                    "depth": 8.0
                },
                "magnitude": 4.5  # Hypothetical magnitude
            },
            "snr": 20.0,  # Hypothetical Signal-to-Noise Ratio
            "phases": {
                "P": str(row['Timestamp']),
                "S": str(row['Timestamp'] + pd.Timedelta(seconds=10))  # Hypothetical S-phase arrival
            },
            "data": {
                "timeseries": {
                    "X": row['X'],
                    "Y": row['Y'],
                    "Z": row['Z']
                }
            }
        }
        stead_entries.append(entry)
    return stead_entries

def save_to_json(data, filepath):
    """ Save data to a JSON file. """
    with open(filepath, 'w') as f:
        json.dump(data, f, indent=4)

def main():
    file_path = '/content/drive/MyDrive/testMob/data_1600_accel_phone.csv'  # Update this path
    output_path = 'output2.json'  # Output JSON file path

    # Load and process the data
    df = load_data(file_path)
    df = preprocess_timestamps(df)

    # Convert to STEAD format
    stead_data = create_stead_entry(df)

    # Save to JSON
    save_to_json(stead_data, output_path)
    print("Data converted and saved to JSON.")

if __name__ == "__main__":
    main()


Data converted and saved to JSON.
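Both `create_stead_entry` and `json_to_mseed` assume a fixed 100 Hz rate, but phone accelerometers often deliver samples at an irregular rate near, not exactly at, the requested one. EQTransformer's input windows are 60 s at 100 Hz, so before writing MiniSEED it may help to estimate the actual rate from the timestamps and, if needed, interpolate onto a uniform 100 Hz grid. The sketch below uses the median sample interval and linear interpolation with `np.interp`. Both choices and both function names are assumptions, not part of EQTransformer, and linear interpolation is only reasonable when upsampling; downsampling would need a low-pass filter first to avoid aliasing.

```python
import numpy as np


def estimate_sampling_rate(times_s):
    """Estimate the sampling rate (Hz) from sample times in seconds via the median interval."""
    dt = np.diff(np.asarray(times_s, dtype=float))
    dt = dt[dt > 0]  # ignore duplicate timestamps
    return 1.0 / np.median(dt)


def resample_uniform(times_s, values, target_hz=100.0):
    """Linearly interpolate sorted, irregular samples onto a uniform grid at target_hz."""
    times_s = np.asarray(times_s, dtype=float)
    n = int(np.floor((times_s[-1] - times_s[0]) * target_hz)) + 1
    grid = times_s[0] + np.arange(n) / target_hz
    return grid, np.interp(grid, times_s, np.asarray(values, dtype=float))


# Synthetic, slightly jittered ~50 Hz samples of a 1 Hz sine wave.
rng = np.random.default_rng(0)
t = np.sort(np.arange(500) / 50.0 + rng.uniform(-0.002, 0.002, 500))
x = np.sin(2 * np.pi * t)

print(round(estimate_sampling_rate(t), 1))
grid, x100 = resample_uniform(t, x, target_hz=100.0)
print(grid.size, x100.size)
```

With the DataFrame from the cell above, sample times in seconds could be obtained as, for instance, `(df['Timestamp'] - df['Timestamp'].iloc[0]).dt.total_seconds()`.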
