# Pyaflowa

Pyaflowa is Pyatoa's workflow management class. It automates misfit quantification using the underlying machinery defined by the Config, Manager and Gatherer classes. Pyaflowa provides a structure to process waveform data en masse, from multiple stations and multiple events, using only a few function calls.

Pyaflowa can be used standalone in conjunction with SPECFEM3D, and it also interfaces with SeisFlows. When used within the SeisFlows preprocess module, Pyaflowa reduces the overhead required to include Pyatoa functionality in a SeisFlows inversion.
____

## Pyaflowa + SPECFEM3D Cartesian 

Let's start with an example of running Pyaflowa standalone. Say you simply want to retrieve an event-specific sensitivity kernel using the machinery of SPECFEM3D Cartesian. To do so, you need to compare observed and synthetic waveforms using misfit quantification, and output adjoint sources that SPECFEM3D will use in its adjoint simulation. Let's walk through this step by step.

### Generate CMTSOLUTION using ObsPy

First we must generate the source file used to represent the moment tensor in our earthquake simulation. We'll use my favorite [example event, 2018p130600](https://www.geonet.org.nz/earthquake/2018p130600), an M5.2 earthquake that occurred in the central North Island, New Zealand.

With ObsPy + Pyatoa we'll be able to gather this event from FDSN webservices and save the resulting moment tensor and event information to the required [CMTSOLUTION format](https://www.globalcmt.org/CMTsearch.html) expected by SPECFEM3D.

In [31]:
from pyatoa import append_focal_mechanism
from obspy.clients.fdsn import Client

In [32]:
# Using ObsPy's FDSN Client, we can retrieve event information using an event id
c = Client("GEONET")
cat = c.get_events(eventid="2018p130600")

# Using Pyatoa, we can append a focal mechanism from the GeoNet moment tensor catalog
cat[0] = append_focal_mechanism(cat[0], client="GEONET")

# ObsPy has built in support for writing CMTSOLUTION files expected by SPECFEM3D
cat.write("/Users/Chow/Desktop/pyaflowa_example/CMTSOLUTION", format="CMTSOLUTION")

[2020-10-14 14:24:03] - pyatoa - INFO: geonet moment tensor found for: 2018p130600
[2020-10-14 14:24:03] - pyatoa - DEBUG: GeoNet moment tensor is in units of Newton*meters
[2020-10-14 14:24:03] - pyatoa - INFO: GeoNet moment tensor appended to Event


In [33]:
# Let's just have a look at the file that's been created
!cat /Users/Chow/Desktop/pyaflowa_example/CMTSOLUTION

 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           5E4E8F
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
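
To check the file programmatically rather than by eye, the `key: value` lines can be parsed with a few lines of plain Python. This is only a sketch: `parse_cmtsolution` is a hypothetical helper (it is not part of Pyatoa or ObsPy), and the text is simply the file contents shown above pasted into a string.

```python
# Hypothetical helper for illustration; not part of Pyatoa or ObsPy.
CMT_TEXT = """\
 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           5E4E8F
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
"""


def parse_cmtsolution(text):
    """Return the header line and a dict of the 'key: value' entries."""
    lines = text.strip().splitlines()
    header, values = lines[0].strip(), {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # The event name is a string; every other entry is numeric
        values[key] = value if key == "event name" else float(value)
    return header, values


header, cmt = parse_cmtsolution(CMT_TEXT)
print(header.split()[0], cmt["event name"], cmt["depth"])
for component in ("Mrr", "Mtt", "Mpp", "Mrt", "Mrp", "Mtp"):
    print(f"{component}: {cmt[component]:+.4e}")
```

A dictionary like this makes it easy to compare the written values against the event information retrieved from the catalog.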


### Generate STATIONS file using ObsPy + Pyatoa

SPECFEM3D also requires a STATIONS file which defines the locations of receivers for simulation output. 
As in the previous step, we'll generate a list of stations using ObsPy and write them into the required STATIONS file using Pyatoa.

In [5]:
from pyatoa import write_stations

In [12]:
inv = c.get_stations(network="NZ", station="??Z", channel="HH?",
                     minlatitude=-41, maxlatitude=-39,
                     minlongitude=173, maxlongitude=176)
write_stations(inv, fid="/Users/Chow/Desktop/pyaflowa_example/STATIONS")



> **__NOTE:__**  
In the ObsPy function get_stations(), the following arguments are provided:
* __network = "NZ"__ refers to the code for New Zealand's permanent seismic network
* __station = "??Z"__ means we only want 3 letter station codes that end in Z, which GeoNet usually reserves for broadband seismometers
* __channel = "HH?"__ refers to a broadband (first 'H') seismometer (second 'H'), for any available component (wildcard '?'), usually N/E/Z. This follows [SEED naming convention](https://ds.iris.edu/ds/nodes/dmc/data/formats/seed-channel-naming/).
* The min and max latitude / longitude define a small region in which we want to search for stations
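
The `?` wildcard in these arguments matches exactly one character. The same matching rule can be tried offline with Python's `fnmatch` module. This is a sketch of the pattern semantics only, not of how the FDSN service evaluates a query, and the candidate codes below are an arbitrary list chosen for illustration.

```python
from fnmatch import fnmatchcase

# Arbitrary example codes, chosen only to illustrate the pattern matching
station_codes = ["MRZ", "TSZ", "WEL", "KHEZ", "Z"]
channel_codes = ["HHE", "HHN", "HHZ", "BHZ", "HNZ", "EHZ"]

# "??Z": exactly three characters, the last of which is Z
stations = [s for s in station_codes if fnmatchcase(s, "??Z")]
# "HH?": starts with HH, followed by any single component character
channels = [c for c in channel_codes if fnmatchcase(c, "HH?")]

print(stations)  # ['MRZ', 'TSZ']
print(channels)  # ['HHE', 'HHN', 'HHZ']
```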

In [13]:
# Let's just have a look at the file that's been created
!cat "/Users/Chow/Desktop/pyaflowa_example/STATIONS"

   MRZ    NZ    -40.6605    175.5785    0.0    0.0
   TSZ    NZ    -40.0586    175.9611    0.0    0.0
   VRZ    NZ    -39.1243    174.7585    0.0    0.0
   WAZ    NZ    -39.7546    174.9855    0.0    0.0
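
Each row lists the station code, network code, latitude and longitude, followed by two values that SPECFEM3D reads as elevation and burial depth (both written as 0.0 here). Below is a minimal sketch of reading the file back, assuming that column order. `read_stations` is a hypothetical helper, not a Pyatoa function.

```python
# Hypothetical helper for illustration; not part of Pyatoa.
STATIONS_TEXT = """\
   MRZ    NZ    -40.6605    175.5785    0.0    0.0
   TSZ    NZ    -40.0586    175.9611    0.0    0.0
   VRZ    NZ    -39.1243    174.7585    0.0    0.0
   WAZ    NZ    -39.7546    174.9855    0.0    0.0
"""


def read_stations(text):
    """Parse STATIONS rows into a dict keyed by 'NET.STA'."""
    stations = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sta, net, lat, lon, elevation, burial = line.split()
        stations[f"{net}.{sta}"] = {
            "latitude": float(lat),
            "longitude": float(lon),
            "elevation": float(elevation),
            "burial": float(burial),
        }
    return stations


stations = read_stations(STATIONS_TEXT)
# Confirm every receiver falls inside the search box passed to get_stations()
inside = all(-41 <= s["latitude"] <= -39 and 173 <= s["longitude"] <= 176
             for s in stations.values())
print(sorted(stations), inside)
```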


### Generate synthetics using SPECFEM3D [EXTERNAL]

Unfortunately this cannot be shown in a Jupyter notebook as generating synthetics requires interfacing with the SPECFEM3D code, usually on a cluster. Here we assume this step has been completed successfully, with resultant synthetic waveforms produced by SPECFEM3D for the given event and stations. 

The output synthetic seismograms are expected to be formatted as two-column ASCII files. I have placed my synthetic waveforms in the following directory. Note that the file names follow the expected output from SPECFEM3D. Adherence to this format is very important for running Pyaflowa:

In [15]:
!ls /Users/Chow/Desktop/pyaflowa_example/synthetics/2018p130600

NZ.MRZ.BXE.semd NZ.TSZ.BXE.semd NZ.VRZ.BXE.semd NZ.WAZ.BXE.semd
NZ.MRZ.BXN.semd NZ.TSZ.BXN.semd NZ.VRZ.BXN.semd NZ.WAZ.BXN.semd
NZ.MRZ.BXZ.semd NZ.TSZ.BXZ.semd NZ.VRZ.BXZ.semd NZ.WAZ.BXZ.semd
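
The file names above encode network, station and channel as `NETWORK.STATION.CHANNEL.semd`. Here is a short sketch that splits the names and checks that every station has all three components. The grouping logic is illustrative and is not taken from Pyatoa's source.

```python
from collections import defaultdict

# File names copied from the directory listing above
filenames = """
NZ.MRZ.BXE.semd NZ.TSZ.BXE.semd NZ.VRZ.BXE.semd NZ.WAZ.BXE.semd
NZ.MRZ.BXN.semd NZ.TSZ.BXN.semd NZ.VRZ.BXN.semd NZ.WAZ.BXN.semd
NZ.MRZ.BXZ.semd NZ.TSZ.BXZ.semd NZ.VRZ.BXZ.semd NZ.WAZ.BXZ.semd
""".split()

components = defaultdict(set)
for name in filenames:
    network, station, channel, extension = name.split(".")
    # The last character of the channel code is the component (E, N or Z)
    components[f"{network}.{station}"].add(channel[-1])

for code in sorted(components):
    print(code, "".join(sorted(components[code])))

complete = all(found == {"E", "N", "Z"} for found in components.values())
print("all stations complete:", complete)
```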


### Initiate Pyaflowa directory structure

Pyaflowa simplifies the enigmatic inner machinations of Pyatoa. To do so, it manages an internal directory structure to search for inputs and store outputs.

When used standalone, Pyaflowa creates its own directory structure within a given working directory. When used in conjunction with SeisFlows, Pyaflowa will work within the bounds of the internal directory structure of SeisFlows.

Let's start by initiating Pyaflowa. As with any usage of Pyatoa, a Config object is required to define internally used parameters.

In [1]:
from pyatoa import Pyaflowa, Config

In [34]:
cfg = Config(iteration=1, step_count=0, client="GEONET", min_period=10, max_period=30,
             pyflex_preset="nznorth_10-30s")

pf = Pyaflowa(structure="standalone", workdir="/Users/Chow/Desktop/pyaflowa_example/workdir", config=cfg)

[2020-10-14 14:57:37] - pyatoa - DEBUG: Component list set to E/N/Z


In [3]:
# We can take a look at Pyaflowa's internal directory structure with the path_structure attribute
pf.path_structure

cwd: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}'
datasets: '/Users/Chow/Desktop/pyaflowa_example/workdir/datasets'
figures: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/figures'
logs: '/Users/Chow/Desktop/pyaflowa_example/workdir/logs'
stations_file: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/STATIONS'
responses: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/responses'
waveforms: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/waveforms'
synthetics: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/synthetics/{source_name}'
adjsrcs: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/adjsrcs'
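
Several of these paths contain a `{source_name}` placeholder, which is filled in separately for each event. The sketch below mimics that substitution with `str.format`. Whether Pyaflowa does it exactly this way internally is an assumption, and `resolve_paths` is a hypothetical helper written for this example.

```python
# Hypothetical helper; mimics the placeholder substitution, not Pyaflowa's code.
workdir = "/Users/Chow/Desktop/pyaflowa_example/workdir"
templates = {
    "cwd": workdir + "/{source_name}",
    "datasets": workdir + "/datasets",
    "figures": workdir + "/{source_name}/figures",
    "synthetics": workdir + "/input/synthetics/{source_name}",
    "adjsrcs": workdir + "/{source_name}/adjsrcs",
}


def resolve_paths(templates, source_name):
    """Fill the {source_name} placeholder in every path template."""
    return {key: path.format(source_name=source_name)
            for key, path in templates.items()}


paths = resolve_paths(templates, "2018p130600")
for key, path in paths.items():
    print(f"{key}: {path}")
```

Paths without a placeholder, such as `datasets`, pass through unchanged, so they are shared by all events.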

In [35]:
# If you want different directories than the defaults shown above, you can simply pass the keys of the
# path_structure attribute as keyword arguments in the initialization of Pyaflowa
kwargs = {"workdir": "/Users/Chow/Desktop/pyaflowa_example/workdir",
          "synthetics": "/Users/Chow/Desktop/pyaflowa_example/synthetics/2018p130600",
          "stations_file": "/Users/Chow/Desktop/pyaflowa_example/STATIONS"
         }

pf = Pyaflowa(structure="standalone", config=cfg, **kwargs)
pf.path_structure

cwd: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}'
datasets: '/Users/Chow/Desktop/pyaflowa_example/workdir/datasets'
figures: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/figures'
logs: '/Users/Chow/Desktop/pyaflowa_example/workdir/logs'
stations_file: '/Users/Chow/Desktop/pyaflowa_example/STATIONS'
responses: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/responses'
waveforms: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/waveforms'
synthetics: '/Users/Chow/Desktop/pyaflowa_example/synthetics/2018p130600'
adjsrcs: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/adjsrcs'

In [4]:
# By running setup, Pyaflowa will generate the required directory structure.
# It will also return an IO object, which is used to store internal attributes
# while each source is processed. You don't need to interact with the IO object,
# but we can take a look at it for clarity
io = pf.setup(source_name="2018p130600")

### Run Pyaflowa (gather and process waveforms)

Great, that's all we need to do for setup. Pyaflowa knows the event, path structure and stations that we want to use for misfit quantification, so all we have to do is run it.

In [5]:
pf.run(source_names="2018p130600")

[2020-10-14 12:49:10] - pyatoa - INFO: 

NZ.MRZ.??.HH?

[2020-10-14 12:49:10] - pyatoa - DEBUG: gathering event
[2020-10-14 12:49:11] - pyatoa - DEBUG: matching event found: 2018p130600
[2020-10-14 12:49:11] - pyatoa - INFO: gathering data for NZ.MRZ.??.HH?
[2020-10-14 12:49:11] - pyatoa - INFO: gathering observed waveforms
[2020-10-14 12:49:11] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:11] - pyatoa - INFO: matching observed waveforms found
[2020-10-14 12:49:11] - pyatoa - INFO: gathering StationXML
[2020-10-14 12:49:11] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:11] - pyatoa - INFO: matching StationXML found
[2020-10-14 12:49:11] - pyatoa - INFO: gathering synthetic waveforms
[2020-10-14 12:49:11] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:11] - pyatoa - DEBUG: searching local filesystem
[2020-10-14 12:49:11] - pyatoa - DEBUG: retrieved local file:
/Users/Chow/Desktop/pyaflowa_example/synthetics/2018p130600/NZ.MRZ.BXZ.semd
[2020-10-14 12:4

[2020-10-14 12:49:15] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:15] - pyatoa - DEBUG: searching local filesystem
[2020-10-14 12:49:15] - pyatoa - DEBUG: querying client GEONET
[2020-10-14 12:49:16] - pyatoa - INFO: matching observed waveforms found
[2020-10-14 12:49:16] - pyatoa - INFO: saved to ASDFDataSet with tag 'observed'
[2020-10-14 12:49:16] - pyatoa - INFO: gathering StationXML
[2020-10-14 12:49:16] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:16] - pyatoa - DEBUG: searching local filesystem
[2020-10-14 12:49:16] - pyatoa - DEBUG: querying client GEONET
[2020-10-14 12:49:17] - pyatoa - INFO: matching StationXML found
[2020-10-14 12:49:17] - pyatoa - INFO: saved to ASDFDataSet
[2020-10-14 12:49:17] - pyatoa - INFO: gathering synthetic waveforms
[2020-10-14 12:49:17] - pyatoa - DEBUG: searching ASDFDataSet
[2020-10-14 12:49:17] - pyatoa - DEBUG: searching local filesystem
[2020-10-14 12:49:17] - pyatoa - DEBUG

[2020-10-14 12:49:23,371] - pyflex - INFO: Calculating envelope of synthetics.
[2020-10-14 12:49:23,374] - pyflex - INFO: Calculating STA/LTA.
[2020-10-14 12:49:23,377] - pyflex - INFO: Initial window selection yielded 13 possible windows.
[2020-10-14 12:49:23,378] - pyflex - INFO: Rejection based on travel times retained 13 windows.
[2020-10-14 12:49:23,379] - pyflex - INFO: Global SNR checks passed. Integrated SNR: 31526068792.084240, Amplitude SNR: 495295.417949
[2020-10-14 12:49:23,381] - pyflex - INFO: Rejection based on minimum window length retained 13 windows.
[2020-10-14 12:49:23,382] - pyflex - INFO: Water level rejection retained 4 windows
[2020-10-14 12:49:23,384] - pyflex - INFO: Single phase group rejection retained 4 windows
[2020-10-14 12:49:23,389] - pyflex - INFO: Removing duplicates retains 3 windows.
[2020-10-14 12:49:23,391] - pyflex - INFO: Rejection based on minimum window length retained 3 windows.
[2020-10-14 12:49:23,395] - pyflex - INFO: SN amplitude ratio wi

[2020-10-14 12:49:29,626] - pyflex - INFO: Calculated travel times.
[2020-10-14 12:49:29,627] - pyflex - INFO: Calculating envelope of synthetics.
[2020-10-14 12:49:29,629] - pyflex - INFO: Calculating STA/LTA.
[2020-10-14 12:49:29,632] - pyflex - INFO: Initial window selection yielded 10 possible windows.
[2020-10-14 12:49:29,634] - pyflex - INFO: Rejection based on travel times retained 10 windows.
[2020-10-14 12:49:29,637] - pyflex - INFO: Global SNR checks passed. Integrated SNR: 60358322016.511925, Amplitude SNR: 576780.923209
[2020-10-14 12:49:29,639] - pyflex - INFO: Rejection based on minimum window length retained 9 windows.
[2020-10-14 12:49:29,641] - pyflex - INFO: Water level rejection retained 3 windows
[2020-10-14 12:49:29,648] - pyflex - INFO: Single phase group rejection retained 3 windows
[2020-10-14 12:49:29,650] - pyflex - INFO: Removing duplicates retains 2 windows.
[2020-10-14 12:49:29,651] - pyflex - INFO: Rejection based on minimum window length retained 2 window

### Inspect Pyaflowa outputs

If we have a look at the work directory, we can see the outputs of the Pyaflowa workflow, which will be:
* An ASDFDataSet with waveforms, metadata, misfit windows and adjoint sources
* Waveform figures for all the stations processed
* Adjoint source ASCII files (.adj) required for a SPECFEM3D adjoint simulation
* STATIONS_ADJOINT file required for a SPECFEM3D adjoint simulation
* The output log which shows the 

In [6]:
# Here is the working directory with all the inputs and outputs
!ls "/Users/Chow/Desktop/pyaflowa_example/workdir"

[1m[32m2018p130600[m[m [1m[32mdatasets[m[m    [1m[32minput[m[m       [1m[32mlogs[m[m


In [7]:
# The ASDFDataSet contains all the data and metadata collected and created during the workflow
# This can be viewed using the functionalities of PyASDF
!ls "/Users/Chow/Desktop/pyaflowa_example/workdir/datasets"

2018p130600.h5


In [15]:
# Each event will output figures and adjoint source files
!ls "/Users/Chow/Desktop/pyaflowa_example/workdir/2018p130600"

[1m[32madjsrcs[m[m [1m[32mfigures[m[m


In [18]:
# Adjoint sources are created for all components. Components that have no measurements will be written with zeros
!ls "/Users/Chow/Desktop/pyaflowa_example/workdir/2018p130600/adjsrcs"

NZ.MRZ.BXE.adj NZ.MRZ.BXZ.adj NZ.VRZ.BXN.adj NZ.WAZ.BXE.adj NZ.WAZ.BXZ.adj
NZ.MRZ.BXN.adj NZ.VRZ.BXE.adj NZ.VRZ.BXZ.adj NZ.WAZ.BXN.adj
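
Only three of the four stations appear in this listing. A quick sketch that compares the station codes in the STATIONS file with those in the adjoint source file names makes the missing one explicit. Both lists are copied from the outputs shown in this walkthrough.

```python
stations_file = ["MRZ", "TSZ", "VRZ", "WAZ"]  # station codes from STATIONS

adjoint_files = """
NZ.MRZ.BXE.adj NZ.MRZ.BXZ.adj NZ.VRZ.BXN.adj NZ.WAZ.BXE.adj NZ.WAZ.BXZ.adj
NZ.MRZ.BXN.adj NZ.VRZ.BXE.adj NZ.VRZ.BXZ.adj NZ.WAZ.BXN.adj
""".split()

# File names follow NETWORK.STATION.CHANNEL.adj
with_adjoint_sources = {name.split(".")[1] for name in adjoint_files}
missing = sorted(set(stations_file) - with_adjoint_sources)

print("stations with adjoint sources:", sorted(with_adjoint_sources))
print("stations without:", missing)
```

The result is consistent with the `STATIONS: 3 / 4` line in the log summary.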


In [19]:
# Adjoint source files are created as two-column ASCII files, in the same manner as the synthetics 
# generated by SPECFEM3D
!head "/Users/Chow/Desktop/pyaflowa_example/workdir/2018p130600/adjsrcs/NZ.MRZ.BXE.adj"

-2.000000000000000000e+01 0.000000000000000000e+00
-1.998499999999999943e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.995499999999999829e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.992500000000000071e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.989499999999999957e+01 0.000000000000000000e+00
-1.987999999999999901e+01 0.000000000000000000e+00
-1.986499999999999844e+01 0.000000000000000000e+00
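
The first column is time and the second is the adjoint source amplitude. Here is a small sketch that parses the ten rows printed above and recovers the sampling interval. Only these ten samples are used, so nothing is implied about the rest of the file.

```python
# The ten rows printed by `head` above, pasted into a string
HEAD = """\
-2.000000000000000000e+01 0.000000000000000000e+00
-1.998499999999999943e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.995499999999999829e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.992500000000000071e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.989499999999999957e+01 0.000000000000000000e+00
-1.987999999999999901e+01 0.000000000000000000e+00
-1.986499999999999844e+01 0.000000000000000000e+00
"""

rows = [tuple(float(v) for v in line.split()) for line in HEAD.splitlines()]
times = [t for t, _ in rows]
amplitudes = [a for _, a in rows]

# Sampling interval from consecutive time stamps
steps = [t1 - t0 for t0, t1 in zip(times, times[1:])]
dt = sum(steps) / len(steps)

print(f"start time: {times[0]:.2f} s, dt: {dt:.3f} s")
print("all zero:", all(a == 0.0 for a in amplitudes))
```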


In [21]:
!ls "/Users/Chow/Desktop/pyaflowa_example/workdir/2018p130600/figures"

i01_s00_2018p130600.pdf


In [12]:
!tail "/Users/Chow/Desktop/pyaflowa_example/workdir/logs/i01s00_2018p130600.log"


SUMMARY

SOURCE NAME: 2018p130600
STATIONS: 3 / 4
WINDOWS: 8
RAW MISFIT: 2.27
UNEXPECTED ERRORS: 0


### Multi-processing with Pyaflowa

Pyaflowa allows for multi-processing using Python's concurrent.futures. This means that multiple events can be processed in parallel, potentially allowing for a large speedup when running waveform processing. We'll include another event, [2012p242656](https://www.geonet.org.nz/earthquake/2012p242656), in our work directory and show how simple it is to use this functionality.

In [22]:
!ls "/Users/Chow/Desktop/pyaflowa_example/synthetics/2012p242656"

NZ.MRZ.BXE.semd NZ.TSZ.BXE.semd NZ.VRZ.BXE.semd NZ.WAZ.BXE.semd
NZ.MRZ.BXN.semd NZ.TSZ.BXN.semd NZ.VRZ.BXN.semd NZ.WAZ.BXN.semd
NZ.MRZ.BXZ.semd NZ.TSZ.BXZ.semd NZ.VRZ.BXZ.semd NZ.WAZ.BXZ.semd


In [26]:
# Here we pass almost the same keyword arguments, but we add a formatting statement in the synthetics
# kwarg so that Pyaflowa knows which directory to search when looking for synthetics.
# The STATIONS file is the same as before
kwargs = {"workdir": "/Users/Chow/Desktop/pyaflowa_example/workdir",
          "synthetics": "/Users/Chow/Desktop/pyaflowa_example/synthetics/{source_name}",
          "stations_file": "/Users/Chow/Desktop/pyaflowa_example/STATIONS"
         }

pf = Pyaflowa(structure="standalone", config=cfg, **kwargs)
pf.path_structure

cwd: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}'
datasets: '/Users/Chow/Desktop/pyaflowa_example/workdir/datasets'
figures: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/figures'
logs: '/Users/Chow/Desktop/pyaflowa_example/workdir/logs'
stations_file: '/Users/Chow/Desktop/pyaflowa_example/STATIONS'
responses: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/responses'
waveforms: '/Users/Chow/Desktop/pyaflowa_example/workdir/input/waveforms'
synthetics: '/Users/Chow/Desktop/pyaflowa_example/synthetics/{source_name}'
adjsrcs: '/Users/Chow/Desktop/pyaflowa_example/workdir/{source_name}/adjsrcs'

In [29]:
# Now, all we need to do to invoke multiprocessing is input a list of source names to the run function
pf.run(source_names=["2018p130600", "2012p242656"])

Unfortunately I don't think it's possible to run parallel tasks in a Jupyter notebook, and I'm not prepared to put in the effort to figure out how to do this, so I hope you can take my word for it: this simply executes two processing tasks in parallel using the concurrent.futures multiprocessing machinery. Neat!
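
For readers who want to see the general pattern outside of Pyaflowa, here is a minimal sketch of fanning a per-event function out over an executor from `concurrent.futures`. It is not Pyaflowa's implementation: `process_event` is a hypothetical stand-in for the per-source processing, and the demo swaps in `ThreadPoolExecutor` so that it can be run anywhere. For CPU-bound work, `ProcessPoolExecutor` offers the same interface, and is typically launched from a script under an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_event(source_name):
    """Hypothetical stand-in for the per-event work; returns a short summary."""
    return {"source_name": source_name, "status": "done"}


def run_all(source_names, max_workers=2):
    """Submit one task per event and collect results as they finish."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_event, name): name
                   for name in source_names}
        for future in as_completed(futures):
            # future.result() re-raises any exception from the worker
            results[futures[future]] = future.result()
    return results


results = run_all(["2018p130600", "2012p242656"])
for name in sorted(results):
    print(name, results[name]["status"])
```

Keying the results by source name means the order in which tasks finish does not matter.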
___

## Pyaflowa + SeisFlows

Coming soon to a docs page near you.