# Pyaflowa

`Pyaflowa` can be used standalone, and also provides the necessary interface to work with SeisFlows3. When used within the SeisFlows3 preprocess module, Pyaflowa reduces the overhead required to include Pyatoa functionality into a SeisFlows3 inversion.

## Standalone (w/ SPECFEM3D)

Let's say you want to run forward and adjoint simulations with SPECFEM3D to generate synthetic seismograms and sensitivity kernels. To make this happen, you will need a few input files that correspond to the problem at hand.

* Forward simulations: DATA/CMTSOLUTION, DATA/STATIONS  
* Adjoint simulations: DATA/STATIONS_ADJOINT, /sem/\*.adj 
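
Before launching a simulation it can help to confirm that these files are actually in place. The snippet below is a minimal sketch, not part of Pyatoa or SPECFEM3D: the helper `missing_inputs` and the `REQUIRED` table are hypothetical names, and the patterns are simply copied from the list above, so adjust them to match your own SPECFEM3D working directory.

```python
import glob
import os
import tempfile

# Patterns copied from the list above (assumption: paths are relative to the
# SPECFEM3D working directory; adjust them to your own setup).
REQUIRED = {
    "forward": ["DATA/CMTSOLUTION", "DATA/STATIONS"],
    "adjoint": ["DATA/STATIONS_ADJOINT", "sem/*.adj"],
}


def missing_inputs(root, kind):
    """Return the patterns for `kind` that match no existing file under `root`."""
    return [pat for pat in REQUIRED[kind]
            if not glob.glob(os.path.join(root, pat))]


# Demonstration on a throwaway directory that holds only the forward inputs
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "DATA"))
    for name in ("CMTSOLUTION", "STATIONS"):
        open(os.path.join(tmp, "DATA", name), "w").close()
    forward_missing = missing_inputs(tmp, "forward")
    adjoint_missing = missing_inputs(tmp, "adjoint")

print("forward missing:", forward_missing)
print("adjoint missing:", adjoint_missing)
```

An empty list means everything needed for that simulation type was found.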

We can use `Pyatoa` to prepare the necessary files for SPECFEM3D forward simulations, and then use `Pyaflowa` to compare synthetic seismograms with real data for a number of stations and generate adjoint sources which can be fed directly back into a SPECFEM3D adjoint simulation.

### Prep: Generate CMTSOLUTION

First we must generate the source file used to represent the moment tensor in our earthquake simulation. In this example, we'll use my favorite New Zealand [example event, 2018p130600](https://www.geonet.org.nz/earthquake/2018p130600), an M5.2 that occurred in the central North Island, New Zealand while I was doing my PhD in Wellington.

With `ObsPy` + `Pyatoa` we'll be able to gather this event from FDSN webservices and save the resulting moment tensor and event information to the required [CMTSOLUTION format](https://www.globalcmt.org/CMTsearch.html) expected by SPECFEM3D. 

In [1]:
import os
from pyatoa import append_focal_mechanism
from obspy.clients.fdsn import Client

In [4]:
# We will just move to an empty directory for this example problem
print(os.getcwd())
docs_path = os.path.abspath("../tests/test_data/docs_data/pyaflowa_doc")
if not os.path.exists(docs_path):
    os.mkdir(docs_path)
os.chdir(docs_path)
print(os.getcwd())

/home/bchow/REPOSITORIES/pyatoa/pyatoa/docs
/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc


In [5]:
# Using ObsPy's FDSN Client, we can retrieve event information using an event id
c = Client("GEONET")
cat = c.get_events(eventid="2018p130600")

# Using Pyatoa, we can append a focal mechanism from the GeoNet moment tensor catalog
cat[0] = append_focal_mechanism(cat[0], client="GEONET")

# ObsPy has built in support for writing CMTSOLUTION files expected by SPECFEM3D
cat.write("CMTSOLUTION", format="CMTSOLUTION")



In [6]:
# Let's have a look at the file that's been created, which is a CMTSOLUTION that is ready
# to be used in SPECFEM3D
!cat "CMTSOLUTION"

 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           CBBE31
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
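
If you want to sanity-check the values in this file from Python, a few lines of standard-library code are enough. This is only a sketch: `parse_cmtsolution` is a hypothetical helper written for this page, not a Pyatoa or ObsPy function, and it assumes the simple `key: value` layout shown above.

```python
# Text copied from the CMTSOLUTION printed above
CMT_TEXT = """\
 PDE 2018 02 18 07 43 48.13  -39.9490  176.2995  20.6 5.2 5.2 NORTH ISLAND, NEW ZEALAND
event name:           CBBE31
time shift:           0.0000
half duration:        0.6989
latitude:           -39.9490
longitude:          176.2995
depth:               20.5946
Mrr:           -2.479380E+23
Mtt:            1.314880E+23
Mpp:            1.164500E+23
Mrt:            5.032500E+22
Mrp:            6.607700E+22
Mtp:            9.359300E+22
"""


def parse_cmtsolution(text):
    """Parse the 'key: value' lines of a CMTSOLUTION string into a dict."""
    lines = text.strip().splitlines()
    fields = {"header": lines[0].strip()}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        value = value.strip()
        try:
            fields[key.strip()] = float(value)
        except ValueError:
            fields[key.strip()] = value  # non-numeric, e.g. the event name
    return fields


cmt = parse_cmtsolution(CMT_TEXT)

# The trace of the moment tensor measures its isotropic part
trace = cmt["Mrr"] + cmt["Mtt"] + cmt["Mpp"]
print(cmt["event name"], cmt["depth"])
print("relative trace:", abs(trace) / abs(cmt["Mrr"]))
```

For this event the diagonal components sum to essentially zero, so the moment tensor has no isotropic part.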


### Prep: Generate STATIONS file

SPECFEM3D also requires a STATIONS file which defines the locations of receivers for simulation output. 
As in the previous step, we'll generate a list of stations using `ObsPy` and write them into the required STATIONS file using `Pyatoa` and a corresponding `obspy.Inventory` object.

> **__NOTE:__**  
The reasoning behind the arguments passed to the ObsPy function `get_stations()` is as follows:
* __network = "NZ"__ refers to the code for New Zealand's permanent seismic network
* __station = "??Z"__ means we only want 3 letter station codes that end in Z, which GeoNet usually reserves for broadband seismometers
* __channel = "HH?"__ refers to a broadband (first 'H') seismometer (second 'H'), for any available component (wildcard '?'), usually N/E/Z. This follows [SEED naming convention](https://ds.iris.edu/ds/nodes/dmc/data/formats/seed-channel-naming/).
* The __min__ and __max latitude / longitude__ arguments define a small region where we want to search for stations
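
To get a feel for what the `?` wildcard selects, we can mimic the matching locally. This is purely an illustration: the FDSN service does its own matching on the server, and here Python's `fnmatch` module stands in for it because it treats `?` the same way (exactly one character). The code lists below are placeholders chosen for contrast.

```python
from fnmatch import fnmatchcase

# Placeholder station and channel codes, used only to show the pattern logic
stations = ["BFZ", "PUZ", "WEL", "KHEZ"]
channels = ["HHZ", "HHN", "HHE", "EHZ", "BNZ"]

# "??Z": exactly three characters, the last one being Z
selected_stations = [s for s in stations if fnmatchcase(s, "??Z")]

# "HH?": band code H, instrument code H, any single component letter
selected_channels = [c for c in channels if fnmatchcase(c, "HH?")]

print(selected_stations)
print(selected_channels)
```

Note that a four-letter code ending in Z does not match `??Z`, because each `?` stands for exactly one character.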

In [7]:
from pyatoa import write_stations

In [38]:
inv = c.get_stations(network="NZ", station="BFZ", channel="HH?")
write_stations(inv, fid="STATIONS")


In [39]:
# Let's have a look at the stations we picked up from FDSN
print(inv)

Inventory created at 2022-02-27T20:54:58.000000Z
	Created by: Delta
		    
	Sending institution: GeoNet (WEL(GNS_Test))
	Contains:
		Networks (1):
			NZ
		Stations (1):
			NZ.BFZ (Birch Farm)
		Channels (0):



In [40]:
# And let's have a look at the STATIONS file that's been created
!cat "STATIONS"

   BFZ    NZ    -40.6796    176.2462    0.0    0.0
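
Each line of a SPECFEM3D STATIONS file holds six whitespace-separated columns: station, network, latitude, longitude, elevation and burial depth. The sketch below reads such lines back into named records; `Station` and `parse_stations` are hypothetical helpers for illustration and are not part of Pyatoa.

```python
from typing import NamedTuple


class Station(NamedTuple):
    station: str
    network: str
    latitude: float
    longitude: float
    elevation: float
    burial: float


def parse_stations(text):
    """Parse whitespace-separated STATIONS lines into Station records."""
    records = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        sta, net = parts[:2]
        records.append(Station(sta, net, *map(float, parts[2:])))
    return records


# The single line written to the STATIONS file above
parsed = parse_stations("   BFZ    NZ    -40.6796    176.2462    0.0    0.0\n")
print(parsed[0])
```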


### Forward Simulation: Generate synthetics using SPECFEM3D [external]

Unfortunately this cannot be shown in a Jupyter notebook as generating synthetics requires interfacing with the SPECFEM3D code, which usually takes place on a cluster. In this example we assume this step has been completed successfully, with resultant synthetic waveforms produced by SPECFEM3D for the given event and stations defined above. 

> **__NOTE:__**  
Output synthetic seismograms are expected to be formatted as two-column ASCII files, which I have pre-generated for this example. File names follow the expected output from SPECFEM3D. Adherence to this format is very important for running Pyaflowa.

> **__NOTE:__**
By default synthetic waveform data is expected to be separated by event ID, e.g., PATH/TO/SYNTHETICS/{EVENT_ID}/*semd

In [11]:
# Let's copy the premade synthetic data into our current working directory
!ls ../../synthetics
!mkdir -p synthetics/2018p130600
!cp ../../synthetics/* ./synthetics/2018p130600

NZ.BFZ.BXE.semd  NZ.BFZ.BXN.semd  NZ.BFZ.BXZ.semd
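
The per-event layout and the `NETWORK.STATION.CHANNEL.semd` file names make it easy to look up synthetics for a given event. The snippet below is a rough sketch of such a lookup using only the standard library; `find_synthetics` is a hypothetical helper, not the routine Pyaflowa uses internally, and the files it creates are empty placeholders.

```python
import tempfile
from pathlib import Path


def find_synthetics(base, event_id, suffix="semd"):
    """Return {(network, station, channel): path} for one event directory."""
    found = {}
    for path in sorted(Path(base, event_id).glob(f"*{suffix}")):
        net, sta, cha = path.name.split(".")[:3]
        found[(net, sta, cha)] = path
    return found


# Demonstration with empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    event_dir = Path(tmp, "2018p130600")
    event_dir.mkdir()
    for comp in "ENZ":
        (event_dir / f"NZ.BFZ.BX{comp}.semd").touch()
    codes = sorted(find_synthetics(tmp, "2018p130600"))
    other_event = find_synthetics(tmp, "2012p242656")

print(codes)
print(other_event)
```

An event with no matching directory simply yields an empty dictionary.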


In [12]:
!head ./synthetics/2018p130600/*

==> ./synthetics/2018p130600/NZ.BFZ.BXE.semd <==
  -20.0000000         0.0000000
  -19.9700000         0.0000000
  -19.9400000         0.0000000
  -19.9100000         0.0000000
  -19.8800000         0.0000000
  -19.8500000         0.0000000
  -19.8200000         0.0000000
  -19.7900000         0.0000000
  -19.7600000         0.0000000
  -19.7300000         0.0000000

==> ./synthetics/2018p130600/NZ.BFZ.BXN.semd <==
  -20.0000000         0.0000000
  -19.9700000         0.0000000
  -19.9400000         0.0000000
  -19.9100000         0.0000000
  -19.8800000         0.0000000
  -19.8500000         0.0000000
  -19.8200000         0.0000000
  -19.7900000         0.0000000
  -19.7600000         0.0000000
  -19.7300000         0.0000000

==> ./synthetics/2018p130600/NZ.BFZ.BXZ.semd <==
  -20.0000000         0.0000000
  -19.9700000         0.0000000
  -19.9400000         0.0000000
  -19.9100000         0.0000000
  -19.8800000         0.0000000
  -19.8500000        
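
The two columns are time (in seconds) and amplitude. As a quick illustration, the sketch below parses the first few samples shown above and recovers the sampling interval. `read_two_column` is a hypothetical helper; in practice `numpy.loadtxt` does the same job on a real file.

```python
# First few lines of a synthetic, copied from the output above
SEMD_HEAD = """\
  -20.0000000         0.0000000
  -19.9700000         0.0000000
  -19.9400000         0.0000000
  -19.9100000         0.0000000
"""


def read_two_column(text):
    """Split two-column ASCII text into parallel lists of times and amplitudes."""
    times, amplitudes = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        t, a = line.split()
        times.append(float(t))
        amplitudes.append(float(a))
    return times, amplitudes


times, amplitudes = read_two_column(SEMD_HEAD)

# Sampling interval in seconds, rounded to suppress floating-point noise
dt = round(times[1] - times[0], 6)
print("start time:", times[0], "dt:", dt)
```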

### Pyaflowa's directory structure

`Pyaflowa` abstracts away the enigmatic inner machinations of `Pyatoa`. To do so it manages an internal directory structure to search for inputs and store outputs.

When used standalone, `Pyaflowa` creates its own directory structure within a given working directory. When used in conjunction with SeisFlows3, `Pyaflowa` will work within the preset internal directory structure of SeisFlows3 (see Pyaflowa + SeisFlows3).

Let's start by initializing `Pyaflowa`. As with any usage of Pyatoa, a `Config` object is required to define internally used parameters, which will in turn be used to control data gathering, waveform processing, and misfit quantification.

In [13]:
from pyatoa import Pyaflowa, Config

In [14]:
cfg = Config(iteration=1, step_count=0, client="GEONET", min_period=10, max_period=30,
             pyflex_preset="nznorth_10-30s")

pf = Pyaflowa(structure="standalone", workdir="./", config=cfg)

In [15]:
# We can take a look at Pyaflowa's internal directory structure with the path_structure attribute
pf.path_structure

cwd          : './'
data         : './input/DATA'
datasets     : './datasets'
figures      : './figures'
logs         : './logs'
ds_file      : './datasets/{source_name}.h5'
stations_file: './{source_name}/STATIONS'
responses    : './input/responses'
waveforms    : './input/waveforms'
synthetics   : './input/synthetics/{source_name}'
adjsrcs      : './adjsrcs/{source_name}'
event_figures: './figures/{source_name}'

If you want different directories than the chosen defaults shown above, you can simply pass the keys of the `path_structure` attribute as keyword arguments in the initialization of Pyaflowa. Let's generate a non-standard path structure to point to our existing data.

In [23]:
# Make sure we're in the correct directory so that we don't start making dir. randomly
assert os.path.basename(os.getcwd()) == "pyaflowa_doc"

# Custom set directory structure
base_path = os.getcwd()
kwargs = {"workdir": base_path,
          "synthetics": os.path.join(base_path, "synthetics", "{source_name}"),
          "stations_file": os.path.join(base_path, "STATIONS"),
          "data": base_path,
          "datasets": base_path,
         }

pf = Pyaflowa(structure="standalone", config=cfg, **kwargs)
pf.path_structure

cwd          : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
data         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
datasets     : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
figures      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures'
logs         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/logs'
ds_file      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/{source_name}.h5'
stations_file: '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/STATIONS'
responses    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses'
waveforms    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms'
synthetics   : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/doc
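
One way to picture the override behaviour is as a dictionary of path templates whose `{source_name}` placeholder is filled in later. The sketch below is a simplified mental model and not Pyaflowa's implementation: `DEFAULT_TEMPLATES` and `build_paths` are hypothetical, only a few keys are included, and it does not reproduce how `ds_file` follows the `datasets` directory in the output above.

```python
# Hypothetical defaults, loosely modelled on the path_structure printed earlier
DEFAULT_TEMPLATES = {
    "datasets": "{workdir}/datasets",
    "ds_file": "{workdir}/datasets/{source_name}.h5",
    "synthetics": "{workdir}/input/synthetics/{source_name}",
    "adjsrcs": "{workdir}/adjsrcs/{source_name}",
}


def build_paths(workdir, source_name, **overrides):
    """Merge user overrides into the defaults, then fill in the placeholders."""
    unknown = set(overrides) - set(DEFAULT_TEMPLATES)
    if unknown:
        raise KeyError(f"unknown path keys: {sorted(unknown)}")
    templates = {**DEFAULT_TEMPLATES, **overrides}
    return {
        key: template.format(workdir=workdir, source_name=source_name)
        for key, template in templates.items()
    }


paths = build_paths(
    "/tmp/work",
    "2018p130600",
    synthetics="/data/synthetics/{source_name}",  # user-supplied override
)
print(paths["synthetics"])
print(paths["ds_file"])
```

Keys that are not overridden keep their default location under the working directory.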

----------------------

### The IO (input/output) class

By running the `Pyaflowa.setup()` function, Pyaflowa will make the required directory structure defined above. It will also return an `IO` object. This internally used object stores information related to paths, configurations and processing.

The user does __not__ need to interact with the `IO` object, but we can take a look at it for clarity. It contains the internal directory structure used by `Pyaflowa`, the `Config` object which will control all of the Manager processing that will take place, and internal attributes which keep track of how processing occurs.

In [41]:
io = pf.setup(source_name="2018p130600")

[2022-02-28 12:08:24] - pyatoa - DEBUG: gathering event
[2022-02-28 12:08:24] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:08:24] - pyatoa - DEBUG: matching event found: 2018p130600


In [26]:
for key, val in io.items():
    print(key, val)

paths cwd          : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
data         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
datasets     : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc'
figures      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures'
logs         : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/logs'
ds_file      : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/2018p130600.h5'
stations_file: '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/STATIONS'
responses    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/responses'
waveforms    : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/input/waveforms'
synthetics   : '/home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data
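
The `IO` object is used like a dictionary here (`io.items()`), while the next section reads it through attributes (`io.paths.ds_file`, `io.config`). A container with both behaviours can be sketched in a few lines. `AttrDict` and `io_like` below are hypothetical and filled with placeholder values; they say nothing about how Pyatoa actually implements its `IO` class.

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            # Raise the error type that attribute access is expected to give
            raise AttributeError(key) from None


# Placeholder values standing in for the real paths and Config object
io_like = AttrDict(
    paths=AttrDict(ds_file="./2018p130600.h5", stations_file="./STATIONS"),
    config={"min_period": 10, "max_period": 30},
)

for key, val in io_like.items():
    print(key, val)
print(io_like.paths.ds_file)
```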

-----------------------
### Running Pyaflowa (gather and process waveforms)

Great, we're all set up to run `Pyaflowa`. Internally `Pyaflowa` knows the event, path structure and stations that we want to use for misfit quantification. Now when we run it, `Pyaflowa` will instantiate `Manager` classes, attempt to gather data from disk or from web services, preprocess data and synthetic waveforms according to the `Config` object, and generate misfit windows and adjoint sources.

The code block below is an example of what Pyaflowa is doing under the hood: it simply abstracts commands that are used to run processing for multiple stations. It also contains a few internal checks to make sure unexpected errors don't throw the processing step off the rails.

    from pyasdf import ASDFDataSet 
    from pyatoa import Manager

    with ASDFDataSet(io.paths.ds_file) as ds:
        mgmt = Manager(ds=ds, config=io.config)
        for code in ["NZ.BFZ.*.*"]:
            mgmt.gather(code=code)
            mgmt.flow()
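
The "internal checks" idea can be sketched as a loop that catches failures per station, so that one bad station does not stop the rest. This is an assumption about the general pattern rather than a copy of Pyaflowa's code: `process_stations` and `fake_process` are hypothetical, and `fake_process` merely stands in for the gather and flow calls above.

```python
def process_stations(codes, process_one):
    """Run `process_one` on each code; collect results and failures separately."""
    results, failures = {}, {}
    for code in codes:
        try:
            results[code] = process_one(code)
        except Exception as exc:
            # Deliberately broad: record the problem and move to the next station
            failures[code] = repr(exc)
    return results, failures


def fake_process(code):
    """Hypothetical stand-in for gather + flow; returns a made-up misfit."""
    if code.startswith("XX"):
        raise ValueError(f"no data for {code}")
    return 1.0


results, failures = process_stations(["NZ.BFZ.*.*", "XX.FAKE.*.*"], fake_process)
print(f"STATIONS: {len(results)} / {len(results) + len(failures)}")
print(f"UNEXPECTED ERRORS: {len(failures)}")
```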

In [42]:
pf.process_event(source_name="2018p130600")

[2022-02-28 12:08:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:08:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:08:27] - pyatoa - DEBUG: matching event found: 2018p130600
[2022-02-28 12:08:27] - pyatoa - INFO: 

NZ.BFZ.*.*

[2022-02-28 12:08:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:08:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:08:28] - pyatoa - DEBUG: matching event found: 2018p130600
[2022-02-28 12:08:28] - pyatoa - INFO: gathering data for NZ.BFZ.*.*
[2022-02-28 12:08:28] - pyatoa - INFO: gathering observed waveforms
[2022-02-28 12:08:28] - pyatoa - INFO: searching ASDFDataSet for observations
[2022-02-28 12:08:28] - pyatoa - INFO: matching observed waveforms found
[2022-02-28 12:08:28] - pyatoa - INFO: gathering StationXML
[2022-02-28 12:08:28] - pyatoa - INFO: searching ASDFDataSet for station info
[2022-02-28 12:08:28] - pyatoa - INFO: matching StationXML found
[2022-02-28 12:08:28] - pyatoa - INFO: saved

[2022-02-28 12:08:28] - pyatoa - INFO: 0.365 misfit for comp E
[2022-02-28 12:08:28] - pyatoa - INFO: 1.620 misfit for comp N
[2022-02-28 12:08:28] - pyatoa - INFO: 3.251 misfit for comp Z
[2022-02-28 12:08:28] - pyatoa - DEBUG: saving adjoint sources to ASDFDataSet
[2022-02-28 12:08:29] - pyatoa - INFO: total misfit 5.237
[2022-02-28 12:08:29] - pyatoa - INFO: 

	OBS WAVS:  3
	SYN WAVS:  3
	WINDOWS:   3
	MISFIT:    5.24

[2022-02-28 12:08:29] - pyatoa - INFO: saving figure to: /home/bchow/REPOSITORIES/pyatoa/pyatoa/tests/test_data/docs_data/pyaflowa_doc/figures/2018p130600/i01_s00_NZ_BFZ.pdf
[2022-02-28 12:08:29] - pyatoa - INFO: 

FINALIZE

[2022-02-28 12:08:29] - pyatoa - INFO: creating single .pdf file of all output figures
[2022-02-28 12:08:29] - pyatoa - INFO: generating STATIONS_ADJOINT file for SPECFEM
[2022-02-28 12:08:29] - pyatoa - INFO: 

SUMMARY

SOURCE NAME: 2018p130600
STATIONS: 1 / 1
WINDOWS: 3
RAW MISFIT: 5.24
UNEXPECTED ERRORS: 0


0.8727564202568289
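
The bare number at the end of the output appears to be the value returned by `process_event()`. Its relation to the logged misfit is not spelled out on this page. From the numbers alone it looks consistent with the raw misfit divided by twice the number of windows, but that is an inference from this single run and should be checked against the Pyatoa source before relying on it. The arithmetic is:

```python
# Values copied from the log output above
component_misfits = {"E": 0.365, "N": 1.620, "Z": 3.251}
n_windows = 3
returned_value = 0.8727564202568289

# Sum of the per-component misfits (the log reports 5.237 from unrounded values)
raw_misfit = sum(component_misfits.values())

# Assumed scaling: raw misfit / (2 * number of windows)
scaled_misfit = raw_misfit / (2 * n_windows)

print(f"raw misfit:    {raw_misfit:.3f}")
print(f"scaled misfit: {scaled_misfit:.4f}")
print(f"returned:      {returned_value:.4f}")
```

The small difference between the two last values comes from the three-decimal rounding of the logged component misfits.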

### Inspect Pyaflowa outputs

If we have a look at the work directory, we can see the outputs of the Pyaflowa workflow, which will be:
* An ASDFDataSet with waveforms, metadata, misfit windows and adjoint sources
* Waveform figures for all the stations processed
* Adjoint source ASCII files (.adj) required for a SPECFEM3D adjoint simulation
* STATIONS_ADJOINT file required for a SPECFEM3D adjoint simulation
* The output log, which records the processing steps and misfit information

In [44]:
# Here is the working directory with all the inputs and outputs
!ls

2018p130600.h5	CMTSOLUTION  input  STATIONS	      synthetics
adjsrcs		figures      logs   STATIONS_ADJOINT


In [49]:
# Each event will output adjoint source files that can be fed directly into SPECFEM3D
!ls adjsrcs/2018p130600

NZ.BFZ.BXE.adj	NZ.BFZ.BXN.adj	NZ.BFZ.BXZ.adj


In [50]:
# Adjoint source files are created as two-column ASCII files, in the same manner as the synthetics 
# generated by SPECFEM3D
!head adjsrcs/2018p130600/*adj

==> adjsrcs/2018p130600/NZ.BFZ.BXE.adj <==
-2.000000000000000000e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.987999999999999901e+01 0.000000000000000000e+00
-1.985000000000000142e+01 0.000000000000000000e+00
-1.982000000000000028e+01 0.000000000000000000e+00
-1.978999999999999915e+01 0.000000000000000000e+00
-1.976000000000000156e+01 0.000000000000000000e+00
-1.973000000000000043e+01 0.000000000000000000e+00

==> adjsrcs/2018p130600/NZ.BFZ.BXN.adj <==
-2.000000000000000000e+01 0.000000000000000000e+00
-1.996999999999999886e+01 0.000000000000000000e+00
-1.994000000000000128e+01 0.000000000000000000e+00
-1.991000000000000014e+01 0.000000000000000000e+00
-1.987999999999999901e+01 0.000000000000000000e+00
-1.985000000000000142e+01 0.000000000000000000e+00
-1.982000000000000028e+01 0.000000000000000000e+00
-1.978999999999999915e+01 
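
Although the number formatting differs between the `.semd` and `.adj` files, the time axes should line up sample for sample. Here is a minimal sketch of such a check on the first two samples printed above; `time_axis` and `same_time_axis` are hypothetical helpers and the tolerance is an arbitrary choice.

```python
import math

# First two samples of a synthetic and of an adjoint source, copied from above
SEMD_LINES = [
    "  -20.0000000         0.0000000",
    "  -19.9700000         0.0000000",
]
ADJ_LINES = [
    "-2.000000000000000000e+01 0.000000000000000000e+00",
    "-1.996999999999999886e+01 0.000000000000000000e+00",
]


def time_axis(lines):
    """Return the first column of two-column ASCII lines as floats."""
    return [float(line.split()[0]) for line in lines]


def same_time_axis(a, b, tol=1e-6):
    """True when both axes have equal length and matching sample times."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(a, b)
    )


axes_match = same_time_axis(time_axis(SEMD_LINES), time_axis(ADJ_LINES))
print(axes_match)
```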

In [52]:
# A composite PDF of all waveform figures for each source-receiver pair will be generated
# for the user to quickly evaluate data-synthetic misfit graphically
!ls figures/2018p130600

i01s00_2018p130600.pdf


In [58]:
# This doesn't work, image needs to be stored within the dir. that I opened jupyter notebook with
# from IPython.display import IFrame
# IFrame("figures/2018p130600/i01s00_2018p130600.pdf", width=600, height=300)

In [59]:
# Text log files help the user keep track of all processing steps, and misfit information
!ls logs

i01s00_2018p130600.log


In [60]:
!cat logs/i01s00_2018p130600.log

[2022-02-28 12:08:27] - pyatoa - INFO: 

NZ.BFZ.*.*

[2022-02-28 12:08:27] - pyatoa - DEBUG: gathering event
[2022-02-28 12:08:27] - pyatoa - INFO: searching ASDFDataSet for event info
[2022-02-28 12:08:28] - pyatoa - DEBUG: matching event found: 2018p130600
[2022-02-28 12:08:28] - pyatoa - INFO: gathering data for NZ.BFZ.*.*
[2022-02-28 12:08:28] - pyatoa - INFO: gathering observed waveforms
[2022-02-28 12:08:28] - pyatoa - INFO: searching ASDFDataSet for observations
[2022-02-28 12:08:28] - pyatoa - INFO: matching observed waveforms found
[2022-02-28 12:08:28] - pyatoa - INFO: gathering StationXML
[2022-02-28 12:08:28] - pyatoa - INFO: searching ASDFDataSet for station info
[2022-02-28 12:08:28] - pyatoa - INFO: matching StationXML found
[2022-02-28 12:08:28] - pyatoa - INFO: saved to ASDFDataSet
[2022-02-28 12:08:28] - pyatoa - INFO: gathering synthetic waveforms
[2022-02-28 12:08:28] - pyatoa - INFO: searching ASDFDataSet for synthetics
[2022-02-28 12:08:28] - pyat

### Multi-processing with Pyaflowa

Pyaflowa allows for multi-processing using Python's `concurrent.futures`. This means that multiple events can be processed in parallel, potentially allowing for a large speedup when running waveform processing. We'll include another event, [2012p242656](https://www.geonet.org.nz/earthquake/2012p242656), in our work directory and show how simple it is to use this functionality.

In [25]:
!ls "/Users/Chow/Desktop/pyaflowa_example/synthetics/2012p242656"

ls: cannot access '/Users/Chow/Desktop/pyaflowa_example/synthetics/2012p242656': No such file or directory


In [26]:
# Here we set almost the same keyword arguments, but we add a format placeholder to the synthetics
# kwarg so that Pyaflowa knows which directory to search when looking for synthetics.
# Here the STATIONS file is the same
kwargs = {"workdir": "/Users/Chow/Desktop/pyaflowa_example/workdir",
          "synthetics": "/Users/Chow/Desktop/pyaflowa_example/synthetics/{source_name}",
          "stations_file": "/Users/Chow/Desktop/pyaflowa_example/STATIONS"
         }

pf = Pyaflowa(structure="standalone", config=cfg, **kwargs)
pf.path_structure

cwd          : '/Users/Chow/Desktop/pyaflowa_example/workdir'
data         : '/Users/Chow/Desktop/pyaflowa_example/workdir/input/DATA'
datasets     : '/Users/Chow/Desktop/pyaflowa_example/workdir/datasets'
figures      : '/Users/Chow/Desktop/pyaflowa_example/workdir/figures'
logs         : '/Users/Chow/Desktop/pyaflowa_example/workdir/logs'
ds_file      : '/Users/Chow/Desktop/pyaflowa_example/workdir/datasets/{source_name}.h5'
stations_file: '/Users/Chow/Desktop/pyaflowa_example/STATIONS'
responses    : '/Users/Chow/Desktop/pyaflowa_example/workdir/input/responses'
waveforms    : '/Users/Chow/Desktop/pyaflowa_example/workdir/input/waveforms'
synthetics   : '/Users/Chow/Desktop/pyaflowa_example/synthetics/{source_name}'
adjsrcs      : '/Users/Chow/Desktop/pyaflowa_example/workdir/adjsrcs/{source_name}'
event_figures: '/Users/Chow/Desktop/pyaflowa_example/workdir/figures/{source_name}'

In [27]:
# Now, all we need to do to invoke multiprocessing is input a list of source names to the run function
pf.run(source_names=["2018p130600", "2012p242656"])

AttributeError: 'Pyaflowa' object has no attribute 'run'

Unfortunately I don't think it's possible to run parallel tasks in a Jupyter notebook, and I'm not prepared to put in the effort to figure out how to do this, so I hope you can take my word for it that this simply executes two parallel processing steps using the `concurrent.futures` multiprocessing machinery. Neat!
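
For readers who have not used `concurrent.futures` before, the general pattern of one task per event looks roughly like this. It is a generic sketch and not Pyaflowa's code: `process_event` here is a hypothetical stand-in that returns a made-up number, and the example uses `ThreadPoolExecutor` to stay lightweight. `ProcessPoolExecutor` offers the same `submit` interface and is the usual choice for CPU-heavy work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_event(source_name):
    """Hypothetical stand-in for per-event processing; returns a made-up misfit."""
    return float(len(source_name))


def process_many(source_names, max_workers=None):
    """Submit one task per source and gather results keyed by source name."""
    misfits = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the source name it belongs to
        futures = {executor.submit(process_event, s): s for s in source_names}
        for future in as_completed(futures):
            misfits[futures[future]] = future.result()
    return misfits


misfits = process_many(["2018p130600", "2012p242656"])
print(sorted(misfits))
```

Because results are collected as tasks finish, they are keyed by source name rather than relying on completion order.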
___

## Pyaflowa + SeisFlows3

Coming soon to a docs page near you.