# 2024 SCOPED Workshop — Wavefield Simulations Using SPECFEM

## Notebook 5: Introduction to SeisFlows

- In this notebook we will introduce two open-source Python packages for facilitating/automating seismic imaging  
- **Objective**: To introduce and tour around SeisFlows and Pyatoa, and see how they can be used to simplify working with SPECFEM     
- These instructions should be run from inside a Docker container, using Jupyter Lab (see instructions [here](https://github.com/adjtomo/adjdocs/blob/main/readmes/docker_image_install.md)).  
-----------

**Relevant Links:** 
- This Notebook: https://github.com/adjtomo/adjdocs/blob/main/workshops/2024-5-21_scoped_uw/5_intro_seisflows.ipynb

**adjTomo Software Suite:** 
- adjTomo: https://github.com/adjtomo
- SeisFlows GitHub Page: https://github.com/adjtomo/seisflows
- SeisFlows Documentation: https://seisflows.readthedocs.io/en/latest/


**Jupyter Quick Tips:**

- **Run cells** one-by-one by hitting the $\blacktriangleright$ button at the top, or by hitting `Shift + Enter`
- **Run all cells** by hitting the $\blacktriangleright\blacktriangleright$ button at the top, or by running `Run -> Run All Cells`
- **Currently running cells** that are still processing will have a `[*]` symbol next to them
- **Finished cells** will have a `[1]` symbol next to them. The number inside the brackets represents what order this cell has been run in.
- Commands that start with `!` are Bash commands (i.e., commands you would run from the terminal)
- Commands that start with `%` are Jupyter Magic commands.

----------
## 0) Background

- Full waveform inversion / adjoint tomography is an algorithmically and computationally complex procedure  
- For real-world regional scale inversion, the number of events and stations can range from tens to hundreds  
- For three-component seismograms, this can reach tens of thousands of waveforms and misfit quantification calculations over the course of hundreds of simulations  
- Automated workflow tools cut down on human time and operator error when running repetitive and iterative inversions  
- They also free up research time to focus on details of an inversion, rather than implementation
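
To make these numbers concrete, here is a small back-of-the-envelope sketch. The event, station, and iteration counts are illustrative assumptions rather than values from a particular study, and the simulation count is a lower bound because line-search steps add further forward simulations.

```python
# Hypothetical problem size (illustrative assumptions only)
n_events = 50       # number of sources
n_stations = 100    # number of receivers recording each event
n_components = 3    # three-component seismograms
n_iterations = 10   # number of model updates

# Upper bound on the waveforms compared in ONE misfit evaluation
waveforms_per_evaluation = n_events * n_stations * n_components

# Lower bound on simulations: one forward and one adjoint run
# per event per iteration (line-search steps would add more)
min_simulations = 2 * n_events * n_iterations

print(f"waveforms per misfit evaluation: {waveforms_per_evaluation}")
print(f"minimum number of simulations:   {min_simulations}")
```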

![fwi_workflow](https://user-images.githubusercontent.com/23055374/194435095-8def121f-edc7-4408-be46-c0b84352ac6c.png)


### SeisFlows

- SeisFlows is one available tool for automating forward and adjoint simulations, as well as seismic inversions. 
- It comes with a built-in command line tool, and is written completely in Python  
- It provides `system` modules that allow it to interact with a variety of compute systems (laptops to HPC) using the same interface  
- SPECFEM2D and local capabilities allow quick prototyping and facilitate transition to 3D runs on clusters
- **BOTTOM LINE**: If you can successfully set up a forward problem yourself, then SeisFlows can take it from there and run the inverse problem.

### Pyatoa

- *Python's Adjoint Tomography Operations Assistant*: an ObsPy-like Python package used for misfit quantification  
- High-level wrapper for Pyflex, Pyadjoint, PyASDF, Pandas and ObsPy, all focused on the seismic imaging problem  
- Handles visualization and inversion assessment; it is integrated directly into SeisFlows but also works as a standalone package 
- Motivated by tools and functionality I wish I had when performing a seismic inversion  

In [None]:
! seisflows -h

---------
## 1) Automating the Forward Problem

- We'll first show how SeisFlows automates the forward problem  
- Users will need to supply a starting model as well as real data or a target model  
- A single `parameters.yaml` file controls all of the SeisFlows workflow  

### 1a) Setting Up a SeisFlows Example Problem

- This setup procedure does exactly what we did in Day 1B: it takes the Tape 2007 example problem and runs a forward simulation 
- This particular example uses the perturbation checkerboard model as the underlying model  
- See https://seisflows.readthedocs.io/en/devel/specfem2d_example.html for more examples
- See https://seisflows.readthedocs.io/en/devel/2D_example_walkthrough.html to figure out what's going on under the hood  

In [None]:
# Required Python packages for today's notebook
from pyasdf import ASDFDataSet
from pyatoa import Inspector
from IPython.display import Image

In [None]:
# Make sure we're in an empty working directory
! rm -rf /home/scoped/work/intro_seisflows
! mkdir -p /home/scoped/work/intro_seisflows

%cd /home/scoped/work/intro_seisflows

# Set up Example 3 (en-masse forward simulations) without running the workflow
! seisflows examples setup 3 -r  /home/scoped/specfem2d/ --with_mpi --nproc 1

In [None]:
# Under the hood, the example SETUP procedure has run a forward simulation to generate our starting model  
! ls 
! echo
# Similar to the working directories we have been using during Days 1-3
! ls specfem2d_workdir
! echo
# OUTPUT_FILES_INIT contains model parameters
! ls specfem2d_workdir/OUTPUT_FILES_INIT
! echo
# The model parameters themselves are stored as binary (.bin) files
! ls specfem2d_workdir/OUTPUT_FILES_INIT/*.bin

### 1b) The SeisFlows `parameters.yaml` file

- Similar to the SPECFEM `Par_file`, the SeisFlows `parameters.yaml` file controls the SeisFlows workflow
- Each 'module' of SeisFlows has a separate set of parameters
- The 'modules' of SeisFlows include: 
    - **Workflow:** the type of workflow and collection of tasks to run (e.g., forward, migration, inversion)  
    - **System:** controls interaction with the compute system (e.g., workstation, Slurm, Chinook)   
    - **Solver:** choose *which* external solver SeisFlows will interact with (i.e., specfem2d, 3d, 3d_globe (W.I.P.))
    - **Preprocess:** the preprocessing module to use for generating adjoint sources (i.e., default, Pyaflowa)
    - **Optimize:** the nonlinear optimization algorithm to use for model updates (e.g., gradient descent, L-BFGS)

In [None]:
# Looking at the available modules
! head -n 33 parameters.yaml

In [None]:
# Workflow: Forward parameters
! head -n 75 parameters.yaml | tail -n 42

In [None]:
# Solver: SPECFEM2D parameters
! head -n 213 parameters.yaml | tail -n 78
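
The `head ... | tail ...` pipes above simply select a range of lines from the parameter file. As a minimal sketch, the same selection can be written in Python; the helper names `head_tail_range` and `show_lines` are hypothetical (they are not part of SeisFlows), and the line ranges that hold each module depend on the SeisFlows version that wrote your `parameters.yaml`.

```python
from pathlib import Path


def head_tail_range(n_head, n_tail):
    """Line range (1-indexed, inclusive) printed by
    `head -n n_head FILE | tail -n n_tail`, assuming the file
    has at least n_head lines."""
    first = max(n_head - n_tail, 0) + 1
    return first, n_head


def show_lines(path, first, last):
    """Return lines first..last (1-indexed, inclusive) of a text file."""
    lines = Path(path).read_text().splitlines()
    return lines[first - 1:last]


# Example usage (requires a parameters.yaml in the current directory):
# first, last = head_tail_range(75, 42)
# print("\n".join(show_lines("parameters.yaml", first, last)))
```

For the two cells above, this gives lines 34 to 75 for the workflow block and lines 136 to 213 for the solver block.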

### 1c) Shared `DATA/` Directory

- SeisFlows borrows files from the SPECFEM *DATA/* directory but requires some special formatting  
- SeisFlows will look for `ntask` events with the prefix `source_prefix`
- Therefore, in this case all source files must be in the form `SOURCE_*`
- The suffix can be event names, ID numbers etc. They will be used to name and identify events during a workflow  
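
The naming convention can be illustrated with a short Python sketch. This is not SeisFlows' own code: `find_event_ids` is a hypothetical helper, and taking the first `ntask` sources in sorted order is an assumption made for illustration.

```python
from pathlib import Path


def find_event_ids(data_dir, source_prefix="SOURCE", ntask=None):
    """List event IDs from files named {source_prefix}_{event_id}."""
    prefix = f"{source_prefix}_"
    event_ids = sorted(
        p.name[len(prefix):]              # strip the prefix to get the ID
        for p in Path(data_dir).glob(f"{prefix}*")
        if p.is_file()
    )
    if ntask is not None:
        event_ids = event_ids[:ntask]     # assumption: first ntask, sorted
    return event_ids


# Example usage inside the working directory:
# print(find_event_ids("specfem2d_workdir/DATA", "SOURCE", ntask=10))
```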


In [None]:
! seisflows par ntask
! seisflows par source_prefix
! echo
! ls specfem2d_workdir/DATA/SOURCE_*

#### Required Paths

- SeisFlows needs to know the path to the *DATA/* directory to grab these files  
- SeisFlows also needs to know the path to the *bin/* directory so it can run SPECFEM executables  
- Finally, SeisFlows needs to know the path to your **model** files. The User is responsible for generating their mesh and model!  
- Additionally, SeisFlows maintains its own internal directory structure  

In [None]:
! tail -n 63 parameters.yaml
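
Before running anything, it can help to confirm that the paths in the parameter file actually exist. The sketch below is a simplified stand-in, not how SeisFlows reads its parameters: it assumes the file consists of flat `key: value` lines with `#` comments, and the helpers `read_flat_params` and `missing_paths` are hypothetical. Relative paths are resolved against the current directory, and some entries (e.g., scratch or output locations) may legitimately not exist before a workflow has run.

```python
from pathlib import Path


def read_flat_params(path):
    """Parse simple `key: value` lines, ignoring comments and blank lines."""
    params = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip().strip("'\"")
    return params


def missing_paths(params):
    """Return the `path_*` entries that are set but do not exist on disk."""
    missing = {}
    for key, value in params.items():
        if not key.startswith("path_") or value in ("", "null"):
            continue
        if not Path(value).exists():
            missing[key] = value
    return missing


# Example usage inside the working directory:
# print(missing_paths(read_flat_params("parameters.yaml")))
```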

### 1d) Swapping Modules

- SeisFlows can easily 'swap' modules from one to another  
- Used to facilitate the transition from a 2D, local, development environment, to a 3D HPC run  
- Replaces parameter set for **one** module only, leaving the others the same 

In [None]:
! seisflows print modules

In [None]:
! seisflows par preprocess

In [None]:
! seisflows swap preprocess default

In [None]:
! head -n 316 parameters.yaml | tail -n 103

In [None]:
# Re-setting the preprocess module to None
! seisflows swap preprocess null

### 1e) Submit a Workflow

- Independent of your system, workflow etc., SeisFlows has only one entry point for running a workflow (`seisflows submit`)
- Under the hood, SeisFlows is doing what we manually did in the workshop, i.e., 
    - Generating working directories for each source
    - Checking acceptability of model parameters  
    - Setting the parameter file correctly for a forward simulation  
    - Running `xmeshfem2D` and `xspecfem2D` for each of the 10 sources  
 

In [None]:
! seisflows submit

The workflow will be **complete** after it runs `xspecfem2D` for `source 010`

### 1f) Understanding a SeisFlows Working Directory

- Similar to SPECFEM, SeisFlows outputs log files, and output files  
- Most of the heavy lifting is done in the *scratch/* directory  
- Any files that should be saved permanently (seismograms, updated models during inversion) are stored in the *output/* directory  
- Any important log information (previously used parameter files, error messages) is stored in the *logs/* directory  
- SeisFlows has an internal checkpointing routine, which takes advantage of the *sfstate.txt* **state** file  
- See https://seisflows.readthedocs.io/en/devel/working_directory.html for more details

In [None]:
# All of the SeisFlows workflow is contained here
! ls

In [None]:
# Model files are stored in the output/ directory
! ls output
! echo
! ls output/MODEL_INIT

# We can use SeisFlows command line tools to plot the initial model
! seisflows plot2d MODEL_INIT vs --save output/m_init_vs.png
Image("output/m_init_vs.png")

In [None]:
# Synthetic seismograms output by the solvers are stored here as well
! ls output/solver
! echo
! ls output/solver/001
! echo
! ls output/solver/001/syn

In [None]:
# We can use the RecSec tool to plot synthetics as a record section
! recsec --syn_path output/solver/001/syn/ --source specfem2d_workdir/DATA/SOURCE_001 --stations specfem2d_workdir/DATA/STATIONS --components Y --scale_by normalize --save output/record_section.png
Image("output/record_section.png")

### `scratch/` directory

- The active working directory of SeisFlows where all of the heavy lifting takes place  
- Each module in the SeisFlows package may have its own sub-directory where it stores temporary work data  
- Additionally, we have two eval*/ directories where objective function evaluation (eval_func) and gradient evaluation (eval_grad) files are stored  

In [None]:
! ls scratch

In [None]:
! ls scratch/solver

In [None]:
# Each solver directory is simply a SPECFEM working directory controlled by SeisFlows
# The main solver is used for tasks which are not mandatory for all events (e.g., smoothing)  
! ls scratch/solver/001

-----------
## 2) Exercise: Run an Inversion w/ SeisFlows

- Okay, now that we have solved the forward problem, we can tackle the inverse problem
- We will take our current working directory and make adjustments to the required modules to run an inversion
- First we'll clean up our working directory prior to getting started

In [None]:
# Move to the SeisFlows working directory
%cd /home/scoped/work/intro_seisflows
! ls 

------------
To run our inversion, we will need a few components we did not have in the forward problem.
The tasks below will guide you through setting up your inversion. Much of the code you will need is available
in previous notebooks. 

### Task 1) Create 'Data'

#### Background
- To run an inversion, we need some kind of 'data' to compare to our synthetics. The data-synthetic differences (i.e., **misfit**) will guide the inversion.
- Often tomographers will run **synthetic inversions**, where the data consist of synthetic waveforms generated using a **target model**.
- In this example, we will take the data we just created in our forward simulations to use as our **target synthetics**.

#### Exercise Tasks
1) Identify `path_data` in the `parameters.yaml` file. This is where SeisFlows expects waveform data  
   - You can open the file with the file manager, or use `seisflows par`
2) Create the required directory structure in `path_data`, which follows the format `{path_data}/{event_id}/`   
   - Each source requires its own sub-directory
   - Follow the source naming convention we covered earlier
3) Move or copy the synthetics generated by the forward problem we just ran into the directories you created in (2)  
   - Remember that synthetics are stored in: `scratch/solver/{event_id}/traces/syn/*`  
   - You can do this manually, with bash commands, or with Python
4) Confirm that you have `ntask` sub-directories in `path_data`, each containing synthetic waveform data
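
One possible way to carry out steps 2 to 4 in Python is sketched below. It assumes the directory layout quoted above (`scratch/solver/{event_id}/traces/syn/*`); the function name `copy_synthetics_as_data` is hypothetical, and skipping symlinked solver directories (such as a main-solver link) is an assumption made to avoid copying the same event twice. Because the synthetics live under *scratch/*, this should be done before cleaning the working directory.

```python
import shutil
from pathlib import Path


def copy_synthetics_as_data(workdir, path_data):
    """Copy scratch/solver/{event_id}/traces/syn/* to {path_data}/{event_id}/.

    Returns a dict mapping each event ID to the number of files copied.
    """
    solver_dir = Path(workdir) / "scratch" / "solver"
    copied = {}
    for syn_dir in sorted(solver_dir.glob("*/traces/syn")):
        event_dir = syn_dir.parents[1]            # scratch/solver/{event_id}
        if event_dir.is_symlink():
            continue                              # assumption: skip linked dirs
        dest = Path(path_data) / event_dir.name
        dest.mkdir(parents=True, exist_ok=True)   # one sub-directory per source
        n_files = 0
        for trace in sorted(syn_dir.iterdir()):
            if trace.is_file():
                shutil.copy2(trace, dest / trace.name)
                n_files += 1
        copied[event_dir.name] = n_files
    return copied


# Example usage (replace the second argument with your own `path_data`):
# counts = copy_synthetics_as_data(".", "/path/to/your/path_data")
# print(len(counts), "event directories:", counts)
```

The length of the returned dictionary should equal `ntask`, and every count should be greater than zero.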

In [None]:
# Space for Task 1

---------------
### Task 2) Generate a new 'Starting Model'

- Because our 'data' was generated using the checkerboard model shown above, we need a new 'starting model'
- If we do not change our starting model, the synthetics we generate will be the same as our **target synthetics**, resulting in 0 misfit
- Let's modify the model located in `specfem2d_workdir`. There are two approaches; Option 1 is easier than Option 2.  

#### Exercise Tasks

**Option 1 (Homogeneous Halfspace):**
1) Change the value of parameter `MODEL` in `specfem2d_workdir/DATA/Par_file` from `gll` -> `default`
    - You can do this manually or use `seisflows sempar`
    - This will tell the internal mesher to use the parameter file definition of the model, which is a homogeneous halfspace
2) Rerun `xmeshfem2D` and `xspecfem2D` to generate the required Model files. You can find the syntax for running these commands in previous notebooks.
3) Reset the `MODEL` parameter to `gll` for the inversion
   - We do this because the actual inversion uses this option to be able to update model parameters
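
If you would rather edit the `Par_file` from Python than by hand, a minimal sketch is shown below. It assumes `Par_file` entries have the form `KEY = value` with an optional trailing `# comment`; the helper `set_par` is hypothetical, and the key you pass must match the spelling used in your own `Par_file`.

```python
import re
from pathlib import Path


def set_par(par_file, key, value):
    """Set `key = value` in a SPECFEM-style Par_file, keeping trailing comments."""
    # Match only uncommented lines whose name is exactly `key`
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s*=\s*)(\S+)(.*)$")
    lines = Path(par_file).read_text().splitlines()
    found = False
    for i, line in enumerate(lines):
        match = pattern.match(line)
        if match:
            lines[i] = f"{match.group(1)}{value}{match.group(3)}"
            found = True
    if not found:
        raise KeyError(f"{key} not found in {par_file}")
    Path(par_file).write_text("\n".join(lines) + "\n")


# Example usage inside the working directory:
# set_par("specfem2d_workdir/DATA/Par_file", "MODEL", "default")
# ... rerun the mesher and solver, then switch back:
# set_par("specfem2d_workdir/DATA/Par_file", "MODEL", "gll")
```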

**Option 2 (Checkerboard Perturbation):**
>Warning: This requires some Python skill
1) Change the value of parameter `MODEL` in `specfem2d_workdir/DATA/Par_file` from `gll` -> `legacy`
    - This will tell the internal mesher to read model values from the file `model_velocity.dat_input`
2) Find the file that defines the legacy model values in `specfem2d_workdir/DATA` 
3) Modify this file in order to perturb the checkerboard model
    - The easiest thing to do is increase or decrease P and S-wave velocity structure by some percentage of their original value (5%?)
    - The column structure of this file is: `index, x-coordinate [m], y-coordinate [m], density, Vp [m/s], Vs [m/s]`
    - Probably best to use Python to read, write and modify the file (e.g., with NumPy `loadtxt` and `savetxt`)
4) Rerun `xmeshfem2D` and `xspecfem2D` to generate the required Model files. You can find these commands in previous notebooks.
5) Reset the `MODEL` parameter to `gll` for the inversion
   - We do this because the actual inversion uses this option to be able to update model parameters
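
For Option 2, the file modification could be sketched as follows. This version uses only the Python standard library (NumPy's `loadtxt` and `savetxt` would work equally well) and assumes the whitespace-separated column layout listed above: index, x, y, density, Vp, Vs. The function name `perturb_velocities` and the output number format are illustrative choices, so check the result against the original file before rerunning the mesher.

```python
from pathlib import Path


def perturb_velocities(src, dst, fraction=0.05):
    """Scale Vp and Vs (columns 5 and 6) by (1 + fraction); keep other columns."""
    out_lines = []
    for line in Path(src).read_text().splitlines():
        cols = line.split()
        if len(cols) < 6:
            out_lines.append(line)        # leave unexpected lines untouched
            continue
        cols[4] = f"{float(cols[4]) * (1 + fraction):.6f}"   # Vp [m/s]
        cols[5] = f"{float(cols[5]) * (1 + fraction):.6f}"   # Vs [m/s]
        out_lines.append(" ".join(cols))
    Path(dst).write_text("\n".join(out_lines) + "\n")


# Example usage: write to a new file first, inspect it, then replace the input
# perturb_velocities("specfem2d_workdir/DATA/model_velocity.dat_input",
#                    "model_velocity_perturbed.dat", fraction=0.05)
```

A positive `fraction` speeds the whole model up and a negative one slows it down; either way the synthetics will no longer match the target synthetics exactly.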


In [None]:
# Space for Task 2

----------------
### Task 3) Set up your SeisFlows Parameter File

- Now we need to modify our existing parameter file to switch our workflow from Forward simulations to Inversion
- Inversion workflows require an additional `preprocess` module for data-synthetic comparisons
- They also require an `optimize` module, which is in charge of model updates
- We will use the `seisflows swap` command which swaps in the set of parameters associated with a given module
- You can use the command `seisflows print modules` to check the available choices for each module

#### Exercise Tasks

1) `Swap` the `preprocess` module to option: `default`
    - SeisFlows currently has two preprocessing modules, 'Default' and 'Pyaflowa'
    - Both modules perform similar functionality, but Pyaflowa provides richer features such as windowing, improved data storage, and plotting
2) `Swap` the `optimize` module to option: `gradient`
    - The optimize module takes care of gradient regularization and model updates
    - Other optimization modules include L-BFGS and Nonlinear Conjugate Gradient (NLCG)
3) `Swap` the `workflow` module to option: `inversion`
    - The `inversion` submodule builds upon the forward simulation and adds in functionality for generating kernels and updating models
    - Other workflow modules include: Forward, Migration (for generating kernels), and NoiseInversion (for ambient noise adjoint tomography)
4) Change the location of `path_model_init` which points to your starting model.  
   - Note: in Task 2 we generated a starting model in `specfem2d_workdir/OUTPUT_FILES`
   - You might use the command `seisflows par` to change parameters from the command line, or do this manually

#### Optional Tasks
- Have a look through the remainder of the parameter file, are there parameters you think would be useful to change?
- You can run the Inversion as is, but advanced Users may play around with filtering (preprocess module) and smoothing (solver module).

In [None]:
# Space for Task 3

--------------
### Task 4) Clean Up The Working Directory

- Run `seisflows clean` to delete all of the files from the previous Forward simulation, getting ready for our inversion.
- You can use the `-f/--force` option to skip over any 'are you sure about that?' prompts.

In [None]:
# Space for Task 4

-------------
### Task 5) Ready to Run? Check and See!

- When your data are ready, and your parameter file is set up, you can perform a sanity check 
- Run `seisflows check` to perform a number of internal checks that make sure paths and parameters are set properly  
- If you receive any error messages from `seisflows check`, please fix them and re-run `seisflows check` to see if new errors pop up.

In [None]:
# Space for Task 5

### Task 6) Let's go!

If you think you're ready, run `seisflows submit` to start your inversion. 

In [None]:
# Space for Task 6