# RISProcess
## Signal Processing Workflow Control

By William Jenkins
<br>wjenkins [at] ucsd [dot] edu
<br>Scripps Institution of Oceanography, UC San Diego
<br>January 2021

**Introduction & Considerations**
<br>This notebook provides a general framework for using RISProcess. Within the notebook, you can configure the workflow parameters and save the configuration files to disk. These files are required to run the RISProcess processing scripts. Once the configuration files are saved, copy the command (`script code`) that the notebook prints and paste it into Terminal. Ensure that the appropriate Python environment is activated and that the paths and working directories are consistent between the notebook and Terminal.



**Contents:**
<br>1. Download Data
<br>2. Pre-process Data
<br>3. Detect Events & Build Catalogue
<br>4. Build HDF Database from Catalogue

In [1]:
from IPython.display import Markdown as md

from RISProcess.io import config

## 1 Download Data
The download workflow is not yet implemented. The cell below only writes a configuration file for the `download` mode.

In [23]:
basepath = './outputs'
datapath = f"{basepath}/raw"
configpath = basepath

parameters = {
    'start': '20141201T0000',
    'stop': '20141203T0000',
    'mode': 'download',
    'datapath': datapath,
    'network': 'XH',
    'station': '*',
    'channel': 'HH*',
}
config_file = config('w', path=configpath, parameters=parameters)

## 2 Pre-process Data
In this workflow, raw seismic data is read from `datapath`, processed, and saved to `writepath` according to the file structure: `MSEED/Network/Station/Network.Station.Channel.Year.Yearday.mseed`
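
To make the file structure concrete, the following is a minimal sketch of how such a path could be assembled. The helper `mseed_path` and the station code `DR01` are hypothetical and are not part of RISProcess. The sketch also assumes that `Yearday` means the zero-padded day of the year and that `writepath` already ends in `MSEED`, as it does in the configuration cell of this section.

```python
from datetime import datetime
from pathlib import Path


def mseed_path(writepath, network, station, channel, day):
    """Return writepath/Network/Station/Network.Station.Channel.Year.Yearday.mseed."""
    # Assumption: Yearday is the zero-padded day of year (001-366).
    fname = f"{network}.{station}.{channel}.{day:%Y}.{day:%j}.mseed"
    return Path(writepath) / network / station / fname


# "DR01" is a placeholder station code used only for illustration.
example = mseed_path("./outputs/MSEED", "XH", "DR01", "HHZ", datetime(2014, 12, 1))
print(example.as_posix())
```

Grouping files by network and station keeps each directory small and lets a single day of a single channel be located without scanning the archive.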

### 2.1 Set processing configuration parameters and create configuration file.

In [22]:
datapath = '/Volumes/RISData'
basepath = './outputs'
writepath = f"{basepath}/MSEED"
configpath = basepath

parameters = {
    'start': '20141201T0000',
    'stop': '20141203T0000',
    'mode': 'preprocess',
    'sourcepath': datapath,
    'writepath': writepath,
    'network': 'XH',
    'channel': 'HHZ',
    'taper': 60,
    'prefeed': 60,
    'fs2': 50,
    'cutoff': '3, 20',
    'output': 'acc',
    'prefilt': '0.004, 0.01, 500, 1000',
    'waterlevel': 14,
    'detector': 'z',
    'on': 8,
    'off': 4,
    'num_workers': 4
}
config_file = config('w', path=configpath, parameters=parameters)
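
The internals of `config` are not shown in this notebook. As a rough illustration of the idea, the sketch below serializes a parameter dictionary to an INI file with the standard library `configparser`. The function `write_config` and the section name `PARAMETERS` are assumptions made for this example. The `config_<mode>.ini` naming follows the file names that appear in the command output later in this notebook.

```python
import configparser
import tempfile
from pathlib import Path


def write_config(path, parameters, section="PARAMETERS"):
    """Write a parameter dict to <path>/config_<mode>.ini and return the file path."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case (e.g. "NFFT", "T_seg")
    # INI values are strings, so every value is converted explicitly.
    parser[section] = {key: str(value) for key, value in parameters.items()}
    fname = Path(path) / f"config_{parameters['mode']}.ini"
    fname.parent.mkdir(parents=True, exist_ok=True)
    with open(fname, "w") as f:
        parser.write(f)
    return fname


demo = {"mode": "preprocess", "network": "XH", "fs2": 50, "cutoff": "3, 20"}
with tempfile.TemporaryDirectory() as tmp:
    written = write_config(tmp, demo)
    # Read the file back to confirm the round trip.
    check = configparser.ConfigParser()
    check.optionxform = str
    check.read(written)
    print(written.name, dict(check["PARAMETERS"]))
```

Because every value is stored as text, a reader of the file has to convert entries such as `fs2` back to numbers and split comma-separated entries such as `cutoff`.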

### 2.2 Process data.
To execute the script, run the following command in a terminal with the appropriate `conda` environment activated:

In [18]:
md(f"`process {config_file}`")

`process /Users/williamjenkins/Research/Workflows/RIS_Seismic_Processing/Source/RISProcess/outputs/config_preprocess.ini`

## 3 Detect Events & Build Catalogue
In this workflow, raw seismic data in `datapath` is processed in 24-hour segments, and an event detection algorithm is applied.  The results of the event detector are compiled into a catalogue that is saved to disk at `writepath`.  This catalogue serves as a useful pointer for follow-on processing of events of interest, rather than continuous data.
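
The 24-hour segmentation can be sketched as follows. This is only an illustration of splitting the `start` and `stop` strings into daily windows; `daily_segments` is a hypothetical helper, and the actual RISProcess implementation may handle segment boundaries differently.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%dT%H%M"  # matches strings such as '20141201T0000'


def daily_segments(start, stop):
    """Yield (segment_start, segment_stop) pairs covering [start, stop) in 24-hour steps."""
    t0 = datetime.strptime(start, FMT)
    t_end = datetime.strptime(stop, FMT)
    step = timedelta(hours=24)
    while t0 < t_end:
        # The last segment is clipped so that it never runs past `stop`.
        yield t0, min(t0 + step, t_end)
        t0 += step


segments = list(daily_segments("20141201T0000", "20141203T0000"))
for seg_start, seg_stop in segments:
    print(seg_start.isoformat(), "->", seg_stop.isoformat())
```

With the `start` and `stop` values used in this notebook, the range splits into two daily segments, each of which could be handed to a separate worker.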

### 3.1 Set processing configuration parameters and create configuration file.

In [2]:
datapath = '/Volumes/RISData'
basepath = './outputs'
writepath = basepath
configpath = basepath

parameters = {
    'start': '20141201T0000',
    'stop': '20141203T0000',
    'mode': 'detect',
    'sourcepath': datapath,
    'writepath': writepath,
    'network': 'XH',
    'channel': 'HHZ',
    'taper': 60,
    'prefeed': 60,
    'fs2': 50,
    'cutoff': '3, 20',
    'output': 'acc',
    'prefilt': '0.004, 0.01, 500, 1000',
    'waterlevel': 14,
    'detector': 'z',
    'on': 8,
    'off': 4,
    'num_workers': 4
}
config_file = config('w', path=configpath, parameters=parameters)

### 3.2 Run script.
To execute the script, run the following command in a terminal with the appropriate `conda` environment activated:

In [14]:
md(f"`process {config_file}`")

`process /Users/williamjenkins/Research/Workflows/RIS_Seismic_Processing/Source/RISProcess/outputs/config_detect.ini`

### 3.3 Clean catalogue.
Remove duplicate detections and, if desired, any detections that occur within a specified time window (in seconds) following an initial detection.

In [8]:
window = 10

md(f"`cleancat {basepath}/catalogue.csv --dest {basepath}/catalogue2.csv --window {window}`")

`cleancat ./outputs/catalogue.csv --dest ./outputs/catalogue2.csv --window 10`
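
The cleaning rule described above can be sketched in plain Python. This is not the `cleancat` implementation. The function `clean_detections`, the `(station, time)` tuple layout, the station codes, and the choice to apply the window separately per station are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def clean_detections(detections, window=0.0):
    """Drop duplicate (station, time) rows and rows within `window` s of the last kept row."""
    kept = []
    last_kept = {}  # station -> time of the most recently kept detection
    for station, time in sorted(set(detections)):  # set() removes exact duplicates
        previous = last_kept.get(station)
        if previous is not None and (time - previous) <= timedelta(seconds=window):
            continue  # falls inside the window that follows the kept detection
        kept.append((station, time))
        last_kept[station] = time
    return kept


t0 = datetime(2014, 12, 1)
raw = [
    ("DR01", t0),
    ("DR01", t0),                          # exact duplicate
    ("DR01", t0 + timedelta(seconds=4)),   # within 10 s of the first detection
    ("DR01", t0 + timedelta(seconds=15)),  # outside the window, kept
    ("DR02", t0 + timedelta(seconds=2)),   # different station, kept
]
cleaned = clean_detections(raw, window=10)
print(len(raw), "->", len(cleaned))
```

Sorting before filtering guarantees that the earliest detection in each cluster is the one that survives.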

## 4 Build HDF Database from Catalogue
In this workflow, a catalogue of detections at `catalogue` is used to process raw seismic data in `datapath`.  In addition to pre-processing, the traces, spectrograms, and metadata of the detections are saved to an HDF database located at `writepath`.  Because this workflow is implemented in parallel and results are returned asynchronously, a new catalogue is saved to `writepath.csv` that corresponds to the indexing within the HDF dataset.  The index within `writepath.csv` corresponds to the original catalogue at `catalogue`.
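
The need for a second catalogue can be illustrated with a small sketch. It uses threads from the standard library and in-memory lists in place of the real worker pool and HDF file, so `process_event`, `dataset`, and `new_catalogue` are hypothetical stand-ins and not RISProcess names.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_event(cat_index):
    """Stand-in for per-event processing; sleeps briefly to mimic uneven runtimes."""
    time.sleep(random.uniform(0.0, 0.01))
    return {"cat_index": cat_index, "trace": [cat_index] * 3}


catalogue_indices = range(8)  # row numbers in the original catalogue
dataset = []        # stand-in for the HDF dataset
new_catalogue = []  # row i records which original row produced dataset entry i

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(process_event, i) for i in catalogue_indices]
    # Results arrive in completion order, not submission order.
    for future in as_completed(futures):
        result = future.result()
        dataset.append(result["trace"])
        new_catalogue.append(result["cat_index"])

print(new_catalogue)
```

The printed order usually differs from run to run, which is why the original catalogue index has to be stored alongside each dataset entry.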

### 4.1 Set processing configuration parameters and create configuration file.

In [11]:
datapath = '/Volumes/RISData'
basepath = './outputs'
writepath = f"{basepath}/RISData.h5"
catalogue = f"{basepath}/catalogue2.csv"
configpath = basepath

parameters = {
    'start': '20141201T0000',
    'stop': '20141203T0000',
    'mode': 'cat2h5',
    'sourcepath': datapath,
    'writepath': writepath,
    'catalogue': catalogue,
    'network': 'XH',
    'channel': 'HHZ',
    'taper': 10,
    'prefeed': 10,
    'fs2': 50,
    'cutoff': '3, 20',
    'T_seg': 4,
    'NFFT': 256,
    'tpersnap': 0.4,
    'overlap': 0.9,
    'output': 'acc',
    'prefilt': '0.004, 0.01, 500, 1000',
    'waterlevel': 14,
    'detector': 'z',
    'on': 8,
    'off': 4,
    'det_window': 5,
    'num_workers': 4
}
config_file = config('w', path=configpath, parameters=parameters)
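
The notebook does not define the spectrogram parameters, so the following sketch rests on an assumed reading of them: `T_seg` as the trace length in seconds, `tpersnap` as the length of each FFT window in seconds, `overlap` as the fractional overlap between windows, and `fs2` as the sampling rate in Hz after resampling. Under those assumptions, and with no edge padding, the expected spectrogram dimensions can be estimated with the hypothetical helper `spectrogram_shape`.

```python
def spectrogram_shape(fs, T_seg, tpersnap, overlap, NFFT):
    """Estimate (frequency bins, time bins) of a one-sided spectrogram without padding."""
    nperseg = round(tpersnap * fs)       # samples per FFT window
    noverlap = round(nperseg * overlap)  # samples shared by adjacent windows
    hop = nperseg - noverlap             # samples advanced per window
    n_samples = round(T_seg * fs)        # samples in the trace
    n_times = 1 + (n_samples - nperseg) // hop
    n_freqs = NFFT // 2 + 1              # one-sided spectrum
    return n_freqs, n_times


shape = spectrogram_shape(fs=50, T_seg=4, tpersnap=0.4, overlap=0.9, NFFT=256)
print(shape)
```

A check like this is useful when sizing the HDF datasets, because the array dimensions follow directly from the configuration values.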

### 4.2 Run script.
To execute the script, run the following command in a terminal with the appropriate `conda` environment activated:

In [13]:
md(f"`process {config_file}`")

`process /Users/williamjenkins/Research/Workflows/RIS_Seismic_Processing/Source/RISProcess/outputs/config_cat2h5.ini`