# (II) Detection and Picking
This notebook demonstrates how to use EQTransformer to detect earthquake signals and pick seismic phases (P and S) in continuous data. Once you have your seismic data, preferably in mseed format and organized in individual subfolders for each station, you can perform the detection/picking using either of the following options:


### Option (I) on preprocessed (hdf5) files:

This option is recommended for shorter time periods (a few days to a month). It lets you test the performance and explore the effects of different parameters, while the generated hdf5 files make it easy to access the waveforms.

For this option you first need to convert the MiniSeed files of each station into a single hdf5 file plus a csv file containing the list of traces in that hdf5 file.

You can convert the MiniSeed files to hdf5 using the following command:

In [1]:
import os
from EQTransformer.utils.hdf5_maker import preprocessor

json_basepath = os.path.join(os.getcwd(),"json/station_list.json")

preprocessor(preproc_dir="preproc",
             mseed_dir='downloads_mseeds', 
             stations_json=json_basepath, 
             overlap=0.3, 
             n_processor=2)

 Station SV08 had 0 chuncks of data
0 slices were written, 0 were expected.
Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 0.
  * CA06 (1) .. 20190901 --> 20190902 .. 3 components .. sampling rate: 100.0
  * B921 (1) .. 20190901 --> 20190902 .. 3 components .. sampling rate: 100.0
  * B921 (2) .. 20190902 --> 20190903 .. 3 components .. sampling rate: 100.0
  * CA06 (2) .. 20190902 --> 20190903 .. 3 components .. sampling rate: 100.0
 Station B921 had 2 chuncks of data
4112 slices were written, 4114.0 were expected.
Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 2.
Original samplieng rate: 100.0.
 Station CA06 had 2 chuncks of data
4112 slices were written, 4114.0 were expected.
Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 2.
Original samplieng rate: 100.0.
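The "expected" slice counts printed above can be checked with a little arithmetic. EQTransformer works on 60 s windows (6000 samples at 100 Hz); if consecutive windows start 60 x (1 - overlap) seconds apart, then `overlap=0.3` gives a 42 s step and 86400 // 42 = 2057 windows per day, i.e. 4114 for the two daily chunks of B921 or CA06. That matches the 4114.0 printed above, but the stepping rule is our inference rather than a quote of the preprocessor's code, and `expected_slices` is a hypothetical helper:

```python
def expected_slices(duration_s: float, overlap: float, window_s: int = 60) -> int:
    """Rough count of window_s-second windows cut from duration_s seconds of data.

    Assumption (inferred, not taken from EQTransformer's source): consecutive
    windows start window_s * (1 - overlap) seconds apart, rounded to whole seconds.
    """
    step_s = round(window_s * (1 - overlap))  # 60 s windows with overlap 0.3 -> 42 s step
    return int(duration_s // step_s)


one_day = 24 * 60 * 60  # 86400 s
per_day = expected_slices(one_day, overlap=0.3)
print(per_day)      # 2057
print(2 * per_day)  # 4114, for the two daily chunks of B921 or CA06
```

Under the same assumed rule, raising `overlap` shrinks the step and multiplies the number of windows: `overlap=0.9` gives a 6 s step and 14400 windows per day, about seven times as many as with 0.3.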


Running `preprocessor` generates one "station_name.hdf5" and one "station_name.csv" file for each of your stations and puts them into a directory named after `mseed_dir` with a "_processed_hdfs" suffix (here, "downloads_mseeds_processed_hdfs"). Next, pass the name of the directory containing your hdf5 and CSV files, together with a trained model, to `predictor`. You can use relatively low detection and picking thresholds, since EQTransformer is very robust to false positives. Enabling uncertainty estimation, writing out probabilities, or plotting all detected events will slow down the process.
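Conceptually, each threshold is applied to a per-sample probability trace (detection, P or S) that the model outputs for every 60 s window. The toy sketch below illustrates why a lower threshold keeps more candidates; `segments_above` and the synthetic trace are hypothetical, and EQTransformer's own post-processing does more than a simple cut:

```python
import numpy as np


def segments_above(prob: np.ndarray, threshold: float, fs: float = 100.0):
    """Return (start_s, end_s) pairs where prob stays at or above threshold.

    Toy illustration of thresholding a probability trace; this is not
    EQTransformer's detection/picking logic.
    """
    above = prob >= threshold
    # Pad with False so every run of True values has a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, ends = edges[0::2], edges[1::2]  # ends are exclusive sample indices
    return [(float(s / fs), float(e / fs)) for s, e in zip(starts, ends)]


fs = 100.0
prob = np.zeros(6000)    # one synthetic 60 s window at 100 Hz
prob[1000:1200] = 0.5    # weak synthetic candidate between 10 s and 12 s
prob[3000:3500] = 0.9    # strong synthetic candidate between 30 s and 35 s

print(segments_above(prob, 0.3, fs))  # [(10.0, 12.0), (30.0, 35.0)]
print(segments_above(prob, 0.7, fs))  # [(30.0, 35.0)]
```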

In [2]:
from EQTransformer.core.predictor import predictor
predictor(input_dir='downloads_mseeds_processed_hdfs',   
         input_model='../ModelsAndSampleData/EqT_original_model.h5',
         output_dir='detections1',
         estimate_uncertainty=False, 
         output_probabilities=False,
         number_of_sampling=5,
         loss_weights=[0.02, 0.40, 0.58],          
         detection_threshold=0.3,                
         P_threshold=0.3,
         S_threshold=0.3, 
         number_of_plots=10,
         plot_mode='time',
         batch_size=500,
         number_of_cpus=4,
         keepPS=False,
         spLimit=60) 

Running EqTransformer  0.1.61
 *** Loading the model ...


2023-09-25 12:53:43.917539: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


*** Loading is complete!
######### There are files for 3 stations in downloads_mseeds_processed_hdfs directory. #########
  0%|                                                                         | 0/9 [00:00<?, ?it/s]

2023-09-25 12:53:48.257511: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)


100%|█████████████████████████████████████████████████████████████████| 9/9 [03:58<00:00, 30.23s/it]

 *** Finished the prediction in: 0 hours and 4 minutes and 16.88 seconds.
 *** Detected: 5356 events.
 *** Wrote the results into --> " /Users/amirreza/Desktop/Desktop/Project/Final Project/EQTransformer/examples/detections1/B921_outputs "
100%|█████████████████████████████████████████████████████████████████| 9/9 [04:17<00:00, 28.56s/it]


 *** Finished the prediction in: 0 hours and 4 minutes and 20.7 seconds.
 *** Detected: 5448 events.
 *** Wrote the results into --> " /Users/amirreza/Desktop/Desktop/Project/Final Project/EQTransformer/examples/detections1/CA06_outputs "
100%|█████████████████████████████████████████████████████████████████| 9/9 [04:20<00:00, 28.97s/it]


 *** Finished the prediction in: 0 hours and 0 minutes and 0.04 seconds.
 *** Detected: 0 events.
 *** Wrote the results into --> " /Users/amirreza/Desktop/Desktop/Project/Final Project/EQTransformer/examples/dete

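If you are working with local MiniSeed files, you can generate a station_list.json with the `stationListFromMseed` function by supplying the absolute path to a directory containing the MiniSeed files and a dictionary of station locations. In the run below, `mseed_directory` still points to a placeholder path, which is why the call raised a FileNotFoundError; replace it with the absolute path to your own MiniSeed directory: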

In [3]:
from EQTransformer.utils.hdf5_maker import stationListFromMseed

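# Placeholder path: replace with the absolute path to your own MiniSeed directory
mseed_directory = '/Users/username/Downloads/EQTransformer/examples/downloads_mseeds'
# Station code -> [latitude, longitude, elevation]
station_locations = {"CA06": [35.59962, -117.49268, 796.4], "CA10": [35.56736, -117.667427, 835.9]}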
stationListFromMseed(mseed_directory, station_locations)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/username/Downloads/EQTransformer/examples/downloads_mseeds'
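Before calling `stationListFromMseed`, it can help to confirm that the directory exists and that its station subfolders match the keys of `station_locations`. A minimal sketch, assuming one subfolder per station as recommended at the top of this notebook; `check_mseed_layout` is a hypothetical helper, not part of EQTransformer:

```python
import os
import tempfile


def check_mseed_layout(mseed_directory: str, station_locations: dict) -> dict:
    """Hypothetical pre-flight check before calling stationListFromMseed.

    Assumes one subfolder per station and reports station folders without
    coordinates, and coordinates without a matching station folder.
    """
    if not os.path.isdir(mseed_directory):
        raise FileNotFoundError(f"{mseed_directory!r} does not exist; use your own absolute path")
    folders = {name for name in os.listdir(mseed_directory)
               if os.path.isdir(os.path.join(mseed_directory, name))}
    coords = set(station_locations)
    return {
        "missing_coords": sorted(folders - coords),
        "missing_data": sorted(coords - folders),
    }


# Demo on a throwaway directory with two station subfolders.
with tempfile.TemporaryDirectory() as tmp:
    for sta in ("CA06", "B921"):
        os.makedirs(os.path.join(tmp, sta))
    report = check_mseed_layout(tmp, {"CA06": [35.59962, -117.49268, 796.4],
                                      "CA10": [35.56736, -117.667427, 835.9]})
    print(report)  # {'missing_coords': ['B921'], 'missing_data': ['CA10']}
```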

### Option (II) directly on downloaded MiniSeed files:

You can perform the detection/picking directly on the .mseed files.
This saves both the preprocessing time and the extra disk space needed for the hdf5 files. However, it can be more memory intensive, so it is recommended only when the mseed files are one month long or shorter.
This option also does not allow you to estimate uncertainties, write out prediction probabilities, or benefit from the hdf5 files, which make it easy to access the raw event waveforms based on the detection results.
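To see why month-long files are a practical upper limit, a back-of-envelope estimate of the raw data volume helps. Assuming 3 components at 100 Hz stored as 4-byte floats (an assumption; the actual in-memory types and any extra copies made by `mseed_predictor` are not modeled), one station-month is roughly 2.9 GiB before any windowing. `raw_samples_gib` is a hypothetical helper:

```python
def raw_samples_gib(days: float, fs: float = 100.0, components: int = 3,
                    bytes_per_sample: int = 4) -> float:
    """Back-of-envelope size (GiB) of one station's continuous waveforms.

    Assumes every sample is held in memory at once as a bytes_per_sample
    value; this is an illustration, not a measurement of mseed_predictor.
    """
    n_samples = days * 86400 * fs * components
    return n_samples * bytes_per_sample / 2**30


print(f"{raw_samples_gib(1):.2f}")   # 0.10 GiB for one day
print(f"{raw_samples_gib(30):.2f}")  # 2.90 GiB for a 30-day month
```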

In [5]:
from EQTransformer.core.mseed_predictor import mseed_predictor
mseed_predictor(input_dir='downloads_mseeds',   
         input_model='../ModelsAndSampleData/EqT_original_model.h5',
         stations_json=json_basepath,
         output_dir='detections2',
         loss_weights=[0.02, 0.40, 0.58],          
         detection_threshold=0.7,                
         P_threshold=0.3,
         S_threshold=0.3, 
         number_of_plots=10,
         plot_mode='time_frequency',
         normalization_mode='std',
         batch_size=500,
         overlap=0.9,
         gpuid=None,
         gpu_limit=None) 

09-25 13:16 [INFO] [EQTransformer] Running EqTransformer  0.1.61
09-25 13:16 [INFO] [EQTransformer] *** Loading the model ...
09-25 13:16 [INFO] [EQTransformer] *** Loading is complete!
09-25 13:16 [INFO] [EQTransformer] *** /Users/amirreza/Desktop/Desktop/Project/Final Project/EQTransformer/examples/detections2 already exists!


Okay.


Prediction outputs for each station are written to a "station_name_outputs" subfolder of your output directory (here, 'detections1' or 'detections2', e.g. 'detections1/B921_outputs').

'X_report.txt' contains processing information: the input parameters used for the detection/picking and final results such as the running time and the total number of detected events (these are unique events; duplicates have already been removed).

'X_prediction_results.csv' contains the detection/picking results. In the figures folder you can find plots for the number of events you specified with `number_of_plots` in the commands above.
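A quick way to inspect the picks is to read X_prediction_results.csv with Python's `csv` module. This sketch assumes the file has columns named `detection_probability`, `p_arrival_time` and `s_arrival_time`, with empty (or "None") cells where no pick was made; check the header of your own file first. `summarize_predictions` is a hypothetical helper and the two-row sample is made up:

```python
import csv
import io


def _has_pick(value: str) -> bool:
    # Assumption: a missing pick is written as an empty cell or the string "None".
    return value.strip() not in ("", "None")


def summarize_predictions(csv_file, min_detection_prob: float = 0.5) -> dict:
    """Count detections and P/S picks above a detection-probability cutoff.

    Assumes EQTransformer-style column names; verify them against your file.
    """
    counts = {"events": 0, "p_picks": 0, "s_picks": 0}
    for row in csv.DictReader(csv_file):
        if float(row["detection_probability"]) < min_detection_prob:
            continue
        counts["events"] += 1
        counts["p_picks"] += _has_pick(row["p_arrival_time"])
        counts["s_picks"] += _has_pick(row["s_arrival_time"])
    return counts


# Synthetic two-row example (made-up values, not results from this notebook).
sample = io.StringIO(
    "station,event_start_time,detection_probability,p_arrival_time,s_arrival_time\n"
    "B921,2019-09-01 00:10:00.00,0.91,2019-09-01 00:10:01.50,2019-09-01 00:10:04.20\n"
    "B921,2019-09-01 03:22:10.00,0.42,,2019-09-01 03:22:12.80\n"
)
print(summarize_predictions(sample, min_detection_prob=0.3))
# {'events': 2, 'p_picks': 1, 's_picks': 2}
```

For a real station, open the file written above, e.g. `open("detections1/B921_outputs/X_prediction_results.csv", newline="")`, and pass the file object to `summarize_predictions`.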