# Pyaflowa

`Pyaflowa` can be used standalone, and also provides the necessary interface to work with SeisFlows3. When used within the SeisFlows3 preprocess module, Pyaflowa reduces the overhead required to include Pyatoa functionality into a SeisFlows3 inversion.

## Standalone (w/ SPECFEM3D)

Let's say you want to run forward and adjoint simulations to generate synthetic seismograms and sensitivity kernels using SPECFEM3D. To make this happen, you will need a few input files that correspond to the problem at hand:

* Forward simulations: DATA/CMTSOLUTION, DATA/STATIONS  
* Adjoint simulations: DATA/STATIONS_ADJOINT, /sem/\*.adj 

We can use `Pyatoa` to prepare the necessary files for SPECFEM3D forward simulations, and then use `Pyaflowa` to compare synthetic seismograms with real data for a number of stations and generate adjoint sources that can be fed directly back into a SPECFEM3D adjoint simulation.

### Prep: Generate CMTSOLUTION

First we must generate the source file used to represent the moment tensor in our earthquake simulation. In this example, we'll use my favorite New Zealand [example event, 2018p130600](https://www.geonet.org.nz/earthquake/2018p130600), an M5.2 that occurred in the central North Island, New Zealand while I was doing my PhD in Wellington.

With `ObsPy` + `Pyatoa` we'll be able to gather this event from FDSN webservices and save the resulting moment tensor and event information to the required [CMTSOLUTION format](https://www.globalcmt.org/CMTsearch.html) expected by SPECFEM3D. 

In [1]:
import os
from pyatoa import append_focal_mechanism
from obspy.clients.fdsn import Client

In [2]:
# We will just move to an empty directory for this example problem
print(os.getcwd())
docs_path = os.path.abspath("../tests/test_data/docs_data/pyaflowa_doc")
if not os.path.exists(docs_path):
    os.mkdir(docs_path)
os.chdir(docs_path)
print(os.getcwd())

/home/bchow/REPOSITORIES/pyatoa/pyatoa/docs
/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc


In [3]:
# Using ObsPy's FDSN Client, we can retrieve event information using an event id
c = Client("GEONET")
cat = c.get_events(eventid="2018p130600")

# Using Pyatoa, we can append a focal mechanism from the GeoNet moment tensor catalog
cat[0] = append_focal_mechanism(cat[0], client="GEONET")

# ObsPy has built in support for writing CMTSOLUTION files expected by SPECFEM3D
cat.write("CMTSOLUTION", format="CMTSOLUTION")



In [4]:
# Let's just have a look at the file that's been created, which is a CMTSOLUTION that is ready
# to be used in SPECFEM3D
!cat "CMTSOLUTION"

 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           81EE9F
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
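
Since a CMTSOLUTION is plain text, it is easy to sanity-check before handing it to SPECFEM3D. The block below is a minimal sketch, not part of Pyatoa: the helper `parse_cmtsolution` is a hypothetical name, and the text is copied from the file printed above. It reads the `key: value` lines into a dictionary and checks that the moment tensor trace is (numerically) zero.

```python
# Sketch only: `parse_cmtsolution` is a hypothetical helper, not a Pyatoa function.
# The text is copied from the CMTSOLUTION printed above.
CMTSOLUTION_TEXT = """\
 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           81EE9F
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
"""


def parse_cmtsolution(text):
    """Return (header_line, fields) for a single-event CMTSOLUTION string."""
    lines = [line for line in text.splitlines() if line.strip()]
    header, body = lines[0].strip(), lines[1:]
    fields = {}
    for line in body:
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # The event name is an identifier; every other field is numeric
        fields[key] = value if key == "event name" else float(value)
    return header, fields


header, cmt = parse_cmtsolution(CMTSOLUTION_TEXT)
trace = cmt["Mrr"] + cmt["Mtt"] + cmt["Mpp"]

print(cmt["event name"], cmt["latitude"], cmt["longitude"], cmt["depth"])
print(f"trace relative to |Mrr|: {abs(trace) / abs(cmt['Mrr']):.1e}")
```

A vanishing trace means the moment tensor written here has no isotropic component.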


### Prep: Generate STATIONS file

SPECFEM3D also requires a STATIONS file which defines the locations of receivers for simulation output. 
As in the previous step, we'll generate a list of stations using `ObsPy` and write them into the required STATIONS file using `Pyatoa` and a corresponding `obspy.Inventory` object.

> **__NOTE:__**  
The arguments passed to the ObsPy function `get_stations()` are chosen for the following reasons:
* __network = "NZ"__ refers to the code for New Zealand's permanent seismic network
* __station = "??Z"__ means we only want three-letter station codes that end in Z, which GeoNet usually reserves for broadband seismometers
* __channel = "HH?"__ refers to a broadband (first 'H') seismometer (second 'H'), for any available component (wildcard '?'), usually N/E/Z. This follows [SEED naming convention](https://ds.iris.edu/ds/nodes/dmc/data/formats/seed-channel-naming/).
* The __min__ and __max latitude / longitude__ define a small region where we want to search for stations
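
To get a feel for what the `?` wildcard selects, here is a small local sketch. It uses Python's `fnmatch` module as a stand-in for the matching that the FDSN web service performs on its side; this is an assumption for illustration only, and the two non-matching codes (`ABCZ`, `MRX`) are made up.

```python
from fnmatch import fnmatchcase

station_pattern = "??Z"  # exactly three characters, ending in Z
channel_pattern = "HH?"  # broadband seismometer, any single-character component

# NZ station codes that appear in this tutorial, plus two made-up codes
# (ABCZ, MRX) that should be rejected by the pattern
stations = ["MRZ", "TSZ", "VRZ", "WAZ", "ABCZ", "MRX"]
channels = ["HHE", "HHN", "HHZ", "BHZ", "HNZ"]

matched_stations = [s for s in stations if fnmatchcase(s, station_pattern)]
matched_channels = [c for c in channels if fnmatchcase(c, channel_pattern)]

print("stations:", matched_stations)
print("channels:", matched_channels)
```

Each `?` stands for exactly one character, which is why a four-letter code is rejected even though it ends in Z.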

In [5]:
from pyatoa import write_stations

In [8]:
inv = c.get_stations(network="NZ", station="??Z", channel="HH?",
                     minlatitude=-41, maxlatitude=-39,
                     minlongitude=173, maxlongitude=176)
write_stations(inv, fid="STATIONS")



In [9]:
# Let's have a look at the stations we picked up from FDSN
print(inv)

Inventory created at 2022-02-27T20:54:58.000000Z
	Created by: Delta
		    
	Sending institution: GeoNet (WEL(GNS_Test))
	Contains:
		Networks (1):
			NZ
		Stations (4):
			NZ.MRZ (Mangatainoka River)
			NZ.TSZ (Takapari Road)
			NZ.VRZ (Vera Road)
			NZ.WAZ (Wanganui)
		Channels (0):



In [10]:
# And let's have a look at the STATIONS file that's been created
!cat "STATIONS"

   MRZ    NZ    -40.6605    175.5785    0.0    0.0
   TSZ    NZ    -40.0586    175.9611    0.0    0.0
   VRZ    NZ    -39.1243    174.7585    0.0    0.0
   WAZ    NZ    -39.7546    174.9855    0.0    0.0
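
The STATIONS file is a whitespace-separated table with six columns: station, network, latitude, longitude, elevation and burial depth. A minimal reader is sketched below; `read_stations` and the `Station` tuple are hypothetical names used for illustration, and the text is copied from the file printed above.

```python
from collections import namedtuple

# Hypothetical container; field order follows the STATIONS column order
Station = namedtuple(
    "Station", "station network latitude longitude elevation burial"
)

STATIONS_TEXT = """\
   MRZ    NZ    -40.6605    175.5785    0.0    0.0
   TSZ    NZ    -40.0586    175.9611    0.0    0.0
   VRZ    NZ    -39.1243    174.7585    0.0    0.0
   WAZ    NZ    -39.7546    174.9855    0.0    0.0
"""


def read_stations(text):
    """Parse STATIONS-formatted text into a list of Station tuples."""
    stations = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        sta, net = parts[:2]
        lat, lon, elev, burial = (float(p) for p in parts[2:])
        stations.append(Station(sta, net, lat, lon, elev, burial))
    return stations


stations = read_stations(STATIONS_TEXT)
for s in stations:
    print(f"{s.network}.{s.station}: lat={s.latitude}, lon={s.longitude}")
```

A check like this is a quick way to confirm that every receiver falls inside the search box passed to `get_stations()`.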


### Forward Simulation: Generate synthetics using SPECFEM3D [external]

Unfortunately this cannot be shown in a Jupyter notebook as generating synthetics requires interfacing with the SPECFEM3D code, which usually takes place on a cluster. In this example we assume this step has been completed successfully, with resultant synthetic waveforms produced by SPECFEM3D for the given event and stations defined above. 

> **__NOTE:__**  
Output synthetic seismograms are expected to be formatted as two-column ASCII files, which I have pre-generated for this example. File names follow the expected output from SPECFEM3D. Adherence to this format is very important for running Pyaflowa. 

> **__NOTE:__**
By default synthetic waveform data is expected to be separated by event ID, e.g., PATH/TO/SYNTHETICS/{EVENT_ID}/*semd

In [15]:
# Let's copy the premade synthetic data into our current working directory
!ls ../../synthetics
!mkdir -p synthetics
!cp -r ../../synthetics/201?p?????? ./synthetics

2012p242656  2018p130600  NZ.BFZ.BXE.semd  NZ.BFZ.BXN.semd  NZ.BFZ.BXZ.semd


In [18]:
!head ./synthetics/2018p130600/NZ.MRZ.BX?.semd

==> ./synthetics/2018p130600/NZ.MRZ.BXE.semd <==
  -20.0000000       0.00000000    
  -19.9850006       0.00000000    
  -19.9699993       0.00000000    
  -19.9549999       0.00000000    
  -19.9400005       0.00000000    
  -19.9249992       0.00000000    
  -19.9099998       0.00000000    
  -19.8950005       0.00000000    
  -19.8799992       0.00000000    
  -19.8649998       0.00000000    

==> ./synthetics/2018p130600/NZ.MRZ.BXN.semd <==
  -20.0000000       0.00000000    
  -19.9850006       0.00000000    
  -19.9699993       0.00000000    
  -19.9549999       0.00000000    
  -19.9400005       0.00000000    
  -19.9249992       0.00000000    
  -19.9099998       0.00000000    
  -19.8950005       0.00000000    
  -19.8799992       0.00000000    
  -19.8649998       0.00000000    

==> ./synthetics/2018p130600/NZ.MRZ.BXZ.semd <==
  -20.0000000       0.00000000    
  -19.9850006       0.00000000    
  -19.9699993       0.00000000    
  -19.9549999     
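
The two columns are time (in seconds) and amplitude. As a rough illustration of the format, the sketch below reads the first few samples shown above and estimates the sampling interval. `read_two_column` is a hypothetical helper written for this example; Pyatoa has its own reader for these files.

```python
# First samples of NZ.MRZ.BXE.semd, copied from the listing above
SEMD_TEXT = """\
  -20.0000000       0.00000000
  -19.9850006       0.00000000
  -19.9699993       0.00000000
  -19.9549999       0.00000000
"""


def read_two_column(text):
    """Return (times, values) lists from two-column ASCII text."""
    times, values = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        t, v = line.split()
        times.append(float(t))
        values.append(float(v))
    return times, values


times, values = read_two_column(SEMD_TEXT)

# Average spacing between samples; single-precision output from SPECFEM3D
# means individual differences are only approximately equal
dt = (times[-1] - times[0]) / (len(times) - 1)
print(f"start time: {times[0]} s, approx. sampling interval: {dt:.4f} s")
```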

### Pyaflowa's directory structure

`Pyaflowa` abstracts away the enigmatic inner machinations of `Pyatoa`. To do so it manages an internal directory structure to search for inputs and store outputs.

When used standalone, `Pyaflowa` creates its own directory structure within a given working directory. When used in conjunction with SeisFlows3, `Pyaflowa` will work within the preset internal directory structure of SeisFlows3 (see Pyaflowa + SeisFlows3).

Let's start by initializing `Pyaflowa`. As with any usage of Pyatoa, a `Config` object is required to define internally used parameters, which will in turn be used to control data gathering, waveform processing, and misfit quantification.

In [19]:
from pyatoa import Pyaflowa, Config

In [20]:
cfg = Config(iteration=1, step_count=0, client="GEONET", min_period=10, max_period=30,
             pyflex_preset="nznorth_10-30s")

pf = Pyaflowa(structure="standalone", workdir="./", config=cfg)

In [22]:
# We can take a look at Pyaflowa's DEFAULT internal directory structure with the path_structure attribute
pf.path_structure

cwd          : './'
data         : './input/DATA'
datasets     : './datasets'
figures      : './figures'
logs         : './logs'
ds_file      : './datasets/{source_name}.h5'
stations_file: './{source_name}/STATIONS'
responses    : './input/responses'
waveforms    : './input/waveforms'
synthetics   : './input/synthetics/{source_name}'
adjsrcs      : './adjsrcs/{source_name}'
event_figures: './figures/{source_name}'

If you want to change the Pyaflowa directory structure from the default values shown above, you can simply pass the keys of the `path_structure` attribute as keyword arguments in the initialization of Pyaflowa. Let's generate a non-standard path structure to point to our existing data.

In [23]:
# Make sure we're in the correct directory so that we don't start making dir. randomly
assert os.path.basename(os.getcwd()) == "pyaflowa_doc"

# Custom set directory structure
base_path = os.getcwd()
kwargs = {"workdir": base_path,
          "synthetics": os.path.join(base_path, "synthetics", "{source_name}"),
          "stations_file": os.path.join(base_path, "STATIONS"),
          "data": base_path,
          "datasets": base_path,
         }

pf = Pyaflowa(structure="standalone", config=cfg, **kwargs)
pf.path_structure

cwd          : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
data         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
datasets     : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
figures      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures'
logs         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/logs'
ds_file      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/{source_name}.h5'
stations_file: '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/STATIONS'
responses    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses'
waveforms    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms'
synthetics   : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/doc

----------------------

### The IO (input/output) class

Running `Pyaflowa.setup()` creates the required directory structure defined above and returns an `IO` object. This internally used object stores information related to paths, configuration and processing.

The user does __not__ need to interact with the `IO` object, but we can take a look at it for clarity. It contains the internal directory structure used by `Pyaflowa`, the `Config` object which will control all of the Manager processing that will take place, and internal attributes which keep track of how processing occurs.

In [24]:
io = pf.setup(source_name="2018p130600")

In [25]:
for key, val in io.items():
    print(key, val)

paths cwd          : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
data         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
datasets     : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
figures      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures'
logs         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/logs'
ds_file      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/2018p130600.h5'
stations_file: '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/STATIONS'
responses    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses'
waveforms    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms'
synthetics   : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data
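
Notice that `ds_file` now ends in `2018p130600.h5`: the `{source_name}` placeholder has been filled in. The sketch below shows one way such templates can be resolved, assuming plain `str.format` substitution. That is an assumption made for illustration, the actual mechanism inside Pyaflowa may differ, and `resolve_paths` is a hypothetical helper.

```python
# Templates copied from the default path structure shown earlier
templates = {
    "figures": "./figures",
    "ds_file": "./datasets/{source_name}.h5",
    "stations_file": "./{source_name}/STATIONS",
    "synthetics": "./input/synthetics/{source_name}",
    "adjsrcs": "./adjsrcs/{source_name}",
}


def resolve_paths(templates, source_name):
    """Fill the {source_name} placeholder in every path template."""
    return {
        key: template.format(source_name=source_name)
        for key, template in templates.items()
    }


paths = resolve_paths(templates, source_name="2018p130600")
for key, path in paths.items():
    print(f"{key:<13}: {path}")
```

Paths without a placeholder pass through unchanged, so one dictionary can hold both shared and per-event locations.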

-----------------------
### Running Pyaflowa (gather and process waveforms)

Great, we're all set up to run `Pyaflowa`. Internally `Pyaflowa` knows the event, path structure and stations that we want to use for misfit quantification. Now when we run it, `Pyaflowa` will instantiate `Manager` classes, attempt to gather data from disk or from web services, preprocess data and synthetic waveforms according to the `Config` object, and generate misfit windows and adjoint sources.

Pyaflowa simply abstracts the commands used to run processing for multiple stations, and adds a few internal checks to make sure unexpected errors don't throw the processing step off the rails. The code block below is an example of what it is doing under the hood:

    from pyasdf import ASDFDataSet 
    from pyatoa import Manager

    with ASDFDataSet(io.paths.ds_file) as ds:
        mgmt = Manager(ds=ds, config=io.config)
        for code in ["NZ.BFZ.*.HH?"]:
            mgmt.gather(code=code)
            mgmt.flow()
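
The "internal checks" idea can be illustrated without Pyatoa at all. The sketch below is not Pyaflowa's implementation: `process_station` is a hypothetical stand-in for the `gather()` and `flow()` calls, and the station code `NZ.XXX` is made up to trigger a failure. It shows the general pattern of catching a failure for one station so that the remaining stations are still processed.

```python
def process_station(code):
    """Hypothetical stand-in for `mgmt.gather(code=code); mgmt.flow()`."""
    if code.startswith("NZ.XXX"):
        # Simulate a station for which no data could be gathered
        raise RuntimeError(f"no data found for {code}")
    return 1.0  # placeholder misfit value


def process_all(codes):
    """Process each station code, recording failures instead of stopping."""
    results, failures = {}, {}
    for code in codes:
        try:
            results[code] = process_station(code)
        except Exception as e:
            failures[code] = str(e)
    return results, failures


results, failures = process_all(
    ["NZ.MRZ.*.HH?", "NZ.XXX.*.HH?", "NZ.WAZ.*.HH?"]
)
print("processed:", list(results))
print("failed:   ", failures)
```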

In [28]:
pf.process_event(source_name="2018p130600", loc="*", cha="HH?")

[2022-02-28 12:37:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:37:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:37:27] - pyatoa - INFO: searching local filesystem for event info
[2022-02-28 12:37:27] - pyatoa - DEBUG: searching for event data: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/CMTSOLUTION
[2022-02-28 12:37:27] - pyatoa - INFO: reading source using ObsPy: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/CMTSOLUTION
[2022-02-28 12:37:27] - pyatoa - INFO: retrieved local file:
/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/CMTSOLUTION
[2022-02-28 12:37:27] - pyatoa - DEBUG: event QuakeML added to ASDFDataSet
[2022-02-28 12:37:27] - pyatoa - INFO: 

NZ.MRZ.*.HH?

[2022-02-28 12:37:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:37:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:37:27] - pyatoa - DEBUG: matching event found:

[2022-02-28 12:37:29,930] - pyflex - INFO: Rejection based on travel times retained 10 windows.
[2022-02-28 12:37:29,930] - pyflex - INFO: Global SNR checks passed. Integrated SNR: 13630877755876006.000000, Amplitude SNR: 439117916.987834
[2022-02-28 12:37:29,930] - pyflex - INFO: Rejection based on minimum window length retained 9 windows.
[2022-02-28 12:37:29,931] - pyflex - INFO: Water level rejection retained 3 windows
[2022-02-28 12:37:29,931] - pyflex - INFO: Single phase group rejection retained 3 windows
[2022-02-28 12:37:29,931] - pyflex - INFO: Removing duplicates retains 2 windows.
[2022-02-28 12:37:29,931] - pyflex - INFO: Rejection based on minimum window length retained 2 windows.
[2022-02-28 12:37:29,932] - pyflex - INFO: SN amplitude ratio window rejection retained 2 windows
[2022-02-28 12:37:29,939] - pyflex - INFO: Rejection based on data fit criteria retained 2 windows.
[2022-02-28 12:37:29,939] - pyflex - INFO: Weighted interval schedule optimization retained 1 wind

[2022-02-28 12:37:35] - pyatoa - DEBUG: searching for responses: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses/VRZ.NZ/RESP.NZ.VRZ.*.HH?
[2022-02-28 12:37:35] - pyatoa - DEBUG: querying client GEONET
  version, ", ".join(READABLE_VERSIONS)))
[2022-02-28 12:37:35] - pyatoa - INFO: matching StationXML found
[2022-02-28 12:37:35] - pyatoa - INFO: saved to ASDFDataSet
[2022-02-28 12:37:35] - pyatoa - INFO: gathering synthetic waveforms
[2022-02-28 12:37:35] - pyatoa - INFO: searching ASDFDataSet for synthetics
[2022-02-28 12:37:35] - pyatoa - INFO: searching local filesystem for synthetics
[2022-02-28 12:37:35] - pyatoa - DEBUG: searching for synthetics: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/synthetics/2018p130600/{net}.{sta}.*{cmp}.sem{dva}
[2022-02-28 12:37:35] - pyatoa - INFO: retrieved synthetics locally:
/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/synthetics/2018p13

[2022-02-28 12:37:37] - pyatoa - INFO: 

NZ.WAZ.*.HH?

[2022-02-28 12:37:37] - pyatoa - INFO: gathering data for NZ.WAZ.*.HH?
[2022-02-28 12:37:37] - pyatoa - INFO: gathering observed waveforms
[2022-02-28 12:37:37] - pyatoa - INFO: searching ASDFDataSet for observations
[2022-02-28 12:37:37] - pyatoa - INFO: searching local filesystem for observations
[2022-02-28 12:37:37] - pyatoa - DEBUG: searching for observations: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms/2018/NZ/WAZ/HH?/NZ.WAZ.*.HH?.2018.049
[2022-02-28 12:37:37] - pyatoa - DEBUG: querying client GEONET
[2022-02-28 12:37:39] - pyatoa - INFO: matching observed waveforms found
[2022-02-28 12:37:39] - pyatoa - INFO: saved to ASDFDataSet with tag 'observed'
[2022-02-28 12:37:39] - pyatoa - INFO: gathering StationXML
[2022-02-28 12:37:39] - pyatoa - INFO: searching ASDFDataSet for station info
[2022-02-28 12:37:39] - pyatoa - INFO: searching local filesystem for station info
[2022-02

[2022-02-28 12:37:40,629] - pyflex - INFO: Rejection based on data fit criteria retained 2 windows.
[2022-02-28 12:37:40,630] - pyflex - INFO: Weighted interval schedule optimization retained 1 windows.
[2022-02-28 12:37:40] - pyatoa - INFO: 1 window(s) selected for comp Z
[2022-02-28 12:37:40] - pyatoa - DEBUG: saving misfit windows to ASDFDataSet
[2022-02-28 12:37:40] - pyatoa - INFO: 2 window(s) total found
[2022-02-28 12:37:40] - pyatoa - DEBUG: running Pyadjoint w/ type: cc_traveltime_misfit
[2022-02-28 12:37:40] - pyatoa - INFO: 1.037 misfit for comp E
[2022-02-28 12:37:40] - pyatoa - INFO: 0.293 misfit for comp Z
[2022-02-28 12:37:40] - pyatoa - DEBUG: saving adjoint sources to ASDFDataSet
[2022-02-28 12:37:40] - pyatoa - INFO: total misfit 1.329
[2022-02-28 12:37:40] - pyatoa - INFO: 

	OBS WAVS:  3
	SYN WAVS:  3
	WINDOWS:   2
	MISFIT:    1.33

[2022-02-28 12:37:40] - pyatoa - INFO: saving figure to: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/

0.138628125

### Inspect Pyaflowa outputs

If we have a look at the work directory, we can see the outputs of the Pyaflowa workflow, which will be:
* An ASDFDataSet with waveforms, metadata, misfit windows and adjoint sources
* Waveform figures for all the stations processed
* Adjoint source ASCII files (.adj) required for a SPECFEM3D adjoint simulation
* STATIONS_ADJOINT file required for a SPECFEM3D adjoint simulation
* A text log file that records the processing steps and misfit information

In [29]:
# Here is the working directory with all the inputs and outputs
!ls

2018p130600.h5	CMTSOLUTION  input  STATIONS	      synthetics
adjsrcs		figures      logs   STATIONS_ADJOINT


In [30]:
# Each event will output adjoint source files that can be fed directly into SPECFEM3D
!ls adjsrcs/2018p130600

NZ.BFZ.BXE.adj	NZ.MRZ.BXE.adj	NZ.VRZ.BXE.adj	NZ.WAZ.BXE.adj
NZ.BFZ.BXN.adj	NZ.MRZ.BXN.adj	NZ.VRZ.BXN.adj	NZ.WAZ.BXN.adj
NZ.BFZ.BXZ.adj	NZ.MRZ.BXZ.adj	NZ.VRZ.BXZ.adj	NZ.WAZ.BXZ.adj


In [32]:
# Adjoint source files are created as two-column ASCII files, in the same manner as the synthetics 
# generated by SPECFEM3D
!head adjsrcs/2018p130600/NZ.MRZ.*.adj

==> adjsrcs/2018p130600/NZ.MRZ.BXE.adj <==
-2.000000000000000000e+01 0.000000000000000000e+00
-1.998499999999999943e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.995499999999999829e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.992500000000000071e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.989499999999999957e+01 0.000000000000000000e+00
-1.987999999999999901e+01 0.000000000000000000e+00
-1.986499999999999844e+01 0.000000000000000000e+00

==> adjsrcs/2018p130600/NZ.MRZ.BXN.adj <==
-2.000000000000000000e+01 0.000000000000000000e+00
-1.998499999999999943e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.995499999999999829e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.992500000000000071e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.989499999999999957e+01 
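
For reference, text in this layout can be produced with a few lines of Python. This is only a sketch: the number format is inferred from the listing above (it matches the `%.18e` default of NumPy's `savetxt`), the helper `write_two_column` is hypothetical, and how Pyatoa actually writes these files is not shown here.

```python
import io


def write_two_column(fileobj, times, amplitudes):
    """Write time/amplitude pairs as two space-separated columns."""
    for t, a in zip(times, amplitudes):
        fileobj.write(f"{t:.18e} {a:.18e}\n")


# Start time and sampling interval read off the listing above;
# the adjoint source is zero at these early times
npts, t0, dt = 5, -20.0, 0.015
times = [t0 + i * dt for i in range(npts)]
amplitudes = [0.0] * npts

buffer = io.StringIO()  # stands in for an open .adj file
write_two_column(buffer, times, amplitudes)
print(buffer.getvalue())
```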

In [33]:
# A composite PDF of all waveform figures for each source-receiver pair will be generated
# for the user to quickly evaluate data-synthetic misfit graphically
!ls figures/2018p130600

i01s00_2018p130600.pdf


In [58]:
# This doesn't work, image needs to be stored within the dir. that I opened jupyter notebook with
# from IPython.display import IFrame
# IFrame("figures/2018p130600/i01s00_2018p130600.pdf", width=600, height=300)

In [34]:
# Text log files help the user keep track of all processing steps, and misfit information
!ls logs

i01s00_2018p130600.log


In [35]:
!cat logs/i01s00_2018p130600.log

[2022-02-28 12:37:27] - pyatoa - INFO: 

NZ.MRZ.*.HH?

[2022-02-28 12:37:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:37:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:37:27] - pyatoa - DEBUG: matching event found: 81EE9F
[2022-02-28 12:37:27] - pyatoa - INFO: gathering data for NZ.MRZ.*.HH?
[2022-02-28 12:37:27] - pyatoa - INFO: gathering observed waveforms
[2022-02-28 12:37:27] - pyatoa - INFO: searching ASDFDataSet for observations
[2022-02-28 12:37:27] - pyatoa - INFO: searching local filesystem for observations
[2022-02-28 12:37:27] - pyatoa - DEBUG: searching for observations: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms/2018/NZ/MRZ/HH?/NZ.MRZ.*.HH?.2018.049
[2022-02-28 12:37:27] - pyatoa - DEBUG: querying client GEONET
[2022-02-28 12:37:28] - pyatoa - INFO: matching observed waveforms found
[2022-02-28 12:37:28] - pyatoa - INFO: saved to ASDFDataSet with tag 'observed'
[2022-02-28 1
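
Because the log is plain text, misfit values can be pulled out of it with a regular expression. The sketch below runs on a few lines copied from the processing output shown earlier; the patterns are written against those lines only and are an assumption about the log format in general. `parse_misfits` is a hypothetical helper.

```python
import re

# Lines copied from the processing log shown earlier
LOG_TEXT = """\
[2022-02-28 12:37:40] - pyatoa - INFO: 1 window(s) selected for comp Z
[2022-02-28 12:37:40] - pyatoa - INFO: 2 window(s) total found
[2022-02-28 12:37:40] - pyatoa - DEBUG: running Pyadjoint w/ type: cc_traveltime_misfit
[2022-02-28 12:37:40] - pyatoa - INFO: 1.037 misfit for comp E
[2022-02-28 12:37:40] - pyatoa - INFO: 0.293 misfit for comp Z
[2022-02-28 12:37:40] - pyatoa - INFO: total misfit 1.329
"""

COMP_MISFIT = re.compile(r"INFO: ([0-9.]+) misfit for comp (\w)")
TOTAL_MISFIT = re.compile(r"INFO: total misfit ([0-9.]+)")


def parse_misfits(log_text):
    """Return ({component: misfit}, [total misfits]) found in the log text."""
    per_comp = {
        comp: float(val) for val, comp in COMP_MISFIT.findall(log_text)
    }
    totals = [float(val) for val in TOTAL_MISFIT.findall(log_text)]
    return per_comp, totals


per_comp, totals = parse_misfits(LOG_TEXT)
print("per component:", per_comp)
print("total:", totals)
```

The per-component values sum to the logged total to within the rounding of the printed digits.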

### Multi-processing with Pyaflowa

Pyaflowa supports multiprocessing using Python's `concurrent.futures`. This means that multiple events can be processed in parallel, potentially allowing for a large speedup when running waveform processing. We'll include another event, [2012p242656](https://www.geonet.org.nz/earthquake/2012p242656), in our work directory and show how simple it is to use this functionality.

In [39]:
!ls synthetics/2012p242656

NZ.MRZ.BXE.semd  NZ.TSZ.BXE.semd  NZ.VRZ.BXE.semd  NZ.WAZ.BXE.semd
NZ.MRZ.BXN.semd  NZ.TSZ.BXN.semd  NZ.VRZ.BXN.semd  NZ.WAZ.BXN.semd
NZ.MRZ.BXZ.semd  NZ.TSZ.BXZ.semd  NZ.VRZ.BXZ.semd  NZ.WAZ.BXZ.semd


In [40]:
# Here we initiate almost the same key word arguments, however we add a formatting statement in the synthetics
# kwarg so that Pyaflowa knows which directory to search when looking for synthetics.
# Here the STATION FILE is the same
pf.path_structure

cwd          : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
data         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
datasets     : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
figures      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures'
logs         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/logs'
ds_file      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/{source_name}.h5'
stations_file: '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/STATIONS'
responses    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses'
waveforms    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms'
synthetics   : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/doc

In [43]:
# Now, all we need to do to invoke multiprocessing is input a list of source names to the run function
pf.multi_event_process(source_names=["2018p130600", "2012p242656"], cha="HH?", loc="*")

Beginning parallel processing of 2 events...


[2022-02-28 12:40:08] - pyatoa - DEBUG: gathering event
[2022-02-28 12:40:08] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:40:08] - pyatoa - DEBUG: gathering event
[2022-02-28 12:40:08] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:40:08] - pyatoa - DEBUG: matching event found: 81EE9F
[2022-02-28 12:40:08] - pyatoa - DEBUG: matching event found: 81EE9F
[2022-02-28 12:40:09,105] - pyflex - INFO: Calculated travel times.
[2022-02-28 12:40:09,105] - pyflex - INFO: Calculating envelope of synthetics.
[2022-02-28 12:40:09,106] - pyflex - INFO: Calculating STA/LTA.
[2022-02-28 12:40:09,108] - pyflex - INFO: Initial window selection yielded 2 possible windows.
[2022-02-28 12:40:09,109] - pyflex - INFO: Rejection based on travel times retained 2 windows.
[2022-02-28 12:40:09,109] - pyflex - INFO: Global SNR checks passed. Integrated SNR: 264017440516.493011, Amplitude SNR: 1258959.382416
[2022-02-28 12:40:09,110] - pyflex - INFO: Rejection based o

[2022-02-28 12:40:12,232] - pyflex - INFO: Rejection based on minimum window length retained 2 windows.
[2022-02-28 12:40:12,232] - pyflex - INFO: SN amplitude ratio window rejection retained 2 windows
[2022-02-28 12:40:12,243] - pyflex - INFO: Rejection based on data fit criteria retained 2 windows.
[2022-02-28 12:40:12,244] - pyflex - INFO: Weighted interval schedule optimization retained 1 windows.
[2022-02-28 12:40:13,485] - pyflex - INFO: Calculated travel times.
[2022-02-28 12:40:13,486] - pyflex - INFO: Calculating envelope of synthetics.
[2022-02-28 12:40:13,487] - pyflex - INFO: Calculating STA/LTA.
[2022-02-28 12:40:13,489] - pyflex - INFO: Initial window selection yielded 22 possible windows.
[2022-02-28 12:40:13,489] - pyflex - INFO: Rejection based on travel times retained 22 windows.
[2022-02-28 12:40:13,490] - pyflex - INFO: Global SNR checks passed. Integrated SNR: 28382657308.924114, Amplitude SNR: 494242.727553
[2022-02-28 12:40:13,490] - pyflex - INFO: Rejection base

IndexError: list index out of range

Unfortunately I don't think it's possible to run parallel tasks in a Jupyter notebook, and I'm not prepared to put in the effort to figure out how to do this, so I hope you can take my word for it that this simply executes two parallel processing steps using the `concurrent.futures` multiprocessing machinery. Neat!
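
For readers unfamiliar with `concurrent.futures`, the general pattern looks roughly like the sketch below. It is not Pyaflowa's code: `process_event` here is a hypothetical stand-in function, and the sketch deliberately uses `ThreadPoolExecutor` so that it runs anywhere, including inside a notebook. `ProcessPoolExecutor` exposes the same `submit` interface, but needs the worker function to be importable and, on some platforms, an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_event(source_name):
    """Hypothetical stand-in for the per-event processing work."""
    return f"{source_name}: done"


source_names = ["2018p130600", "2012p242656"]
results = {}

with ThreadPoolExecutor(max_workers=2) as executor:
    # Submit one task per event and remember which future belongs to which event
    futures = {
        executor.submit(process_event, name): name for name in source_names
    }
    for future in as_completed(futures):
        name = futures[future]
        try:
            results[name] = future.result()
        except Exception as e:
            # A failure in one event should not discard the others
            results[name] = f"failed: {e}"

for name in source_names:
    print(results[name])
```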
___

## Automating earthquake-based FWT with Pyaflowa and SeisFlows3

Earthquake-based full waveform tomography (FWT) is a complicated procedure involving performing data-synthetic comparisons for large numbers of source-receiver pairs. This is ideally done in parallel as this processing step is (mostly) identical and independent for each source-receiver pair. 

There are many available workflows and workflow tools for automating FWT, but here we show the combination of `Pyatoa` and [SeisFlows3](https://github.com/bch0w/seisflows3), a Python-based workflow tool for automating full waveform tomography, adjoint tomography, or full waveform inversion (FWI) on a variety of compute systems.

One question that you might have regarding Pyatoa and SeisFlows3 is: why do I need both? Well, SeisFlows was originally written as an automated workflow tool for full waveform inversion. It allowed for generalized interfacing with a number of numerical solvers and compute interfaces. However, it lacked some key features necessary for earthquake-based tomography, namely: data gathering, waveform preprocessing, windowing, flexible adjoint source creation, and figure generation. These tasks can be performed manually with existing tools; however, the name of the game here is automation. This is where Pyatoa steps in, by providing high-level interfacing for SeisFlows to automate the above-named tasks. Pyatoa was written as a standalone package, however, and it can still be code-heavy to write it into the SeisFlows framework. Consequently, Pyaflowa was written to contain the code complexity within the Pyatoa package, while providing SeisFlows (or other workflow tools) with an easy-to-use API for automating the steps above.