# Specfem2D workstation example

To demonstrate the inversion capabilities of SeisFlows3, we will run a __Specfem2D synthetic-synthetic example__ on a __local machine__ (Linux workstation running CentOS 7). Many of the setup steps here will likely be unique to our OS and workstation, but hopefully they may serve as templates for new Users wanting to explore SeisFlows3. 

The numerical solver we will use is [SPECFEM2D](https://geodynamics.org/cig/software/specfem2d/). We'll also be working in our `seisflows3` [Conda](https://docs.conda.io/en/latest/) environment. See the installation documentation page for instructions on how to install and activate the required Conda environment.

--------------

The following Table of Contents outlines the steps we will take in this tutorial:

1. __[Setup SPECFEM2D](#1.-Setup-SPECFEM2D)__  
    a. [Download and compile codebase](#1a.-Download-and-compile-codebase*)  
    b. [Create a separate SPECFEM2D working directory](#1b.-Create-a-separate-SPECFEM2D-working-directory)  
    c. [Generate initial and target models](#1c.-Generate-initial-and-target-models)  

2. __[Initialize SeisFlows3 (SF3)](#2.-Initialize-SeisFlows3-(SF3))__  
    a. [SF3 working directory and parameter file](#2a.-SF3-working-directory-and-parameter-file)  
    b. [Initialize SF3 working state](#2b.-Initialize-SF3-working-state)  

3. __[Run SeisFlows3](#3.-Run-SeisFlows3)__  
    a. [Forward simulations](#3a.-Forward-simulations)  
    b. [Exploring the SF3 directory structure](#3b.-Exploring-the-SF3-directory-structure)    
    c. [Adjoint simulations](#3c.-Adjoint-simulations)  
    d. [Line search and model update](#3d.-Line-search-and-model-update)  

4. __[Conclusions](#4.-Conclusions)__  



## 1. Setup SPECFEM2D 
### 1a. Download and compile codebase*

>__\*__ If you have already downloaded and compiled SPECFEM2D, you can skip most of this subsection (1a). However, you will still need to edit the first two paths in the following cell (WORKDIR and SPECFEM2D) and execute the cell to define the path structure.

First we'll download and compile SPECFEM2D to generate the binaries necessary to run our simulations. We will then populate a new SPECFEM2D working directory that will be used by SeisFlows3. We'll use the Python `os` module for filesystem operations to keep everything in Python, but the same steps can easily be done in bash.

In [14]:
import os
import glob
import shutil
import numpy as np

In [15]:
# USER MUST EDIT THE FOLLOWING PATHS:
# WORKDIR: points to your own working directory
# SPECFEM2D: points to an existing SPECFEM2D repository, if available (otherwise set to '')
WORKDIR = "/home/bchow/Work/work/sf3_specfem2d_example" 
SPECFEM2D = "/home/bchow/REPOSITORIES/specfem2d"

# ======================================================================================================

# Define the file structure of the SPECFEM2D repository that we will download/reference
SPECFEM2D_ORIGINAL = os.path.join(WORKDIR, "specfem2d")
SPECFEM2D_BIN_ORIGINAL = os.path.join(SPECFEM2D_ORIGINAL, "bin")
SPECFEM2D_DATA_ORIGINAL = os.path.join(SPECFEM2D_ORIGINAL, "DATA")
TAPE_2007_EXAMPLE = os.path.join(SPECFEM2D_ORIGINAL, "EXAMPLES", "Tape2007")

# The SPECFEM2D working directory that we will create separate from the downloaded repo
SPECFEM2D_WORKDIR = os.path.join(WORKDIR, "specfem2d_workdir")
SPECFEM2D_BIN = os.path.join(SPECFEM2D_WORKDIR, "bin")
SPECFEM2D_DATA = os.path.join(SPECFEM2D_WORKDIR, "DATA")
SPECFEM2D_OUTPUT = os.path.join(SPECFEM2D_WORKDIR, "OUTPUT_FILES")

# Pre-defined locations of velocity models we will generate using the solver
SPECFEM2D_MODEL_INIT = os.path.join(SPECFEM2D_WORKDIR, "OUTPUT_FILES_INIT")
SPECFEM2D_MODEL_TRUE = os.path.join(SPECFEM2D_WORKDIR, "OUTPUT_FILES_TRUE")

In [16]:
# Download SPECFEM2D from GitHub, devel branch for latest codebase OR symlink from existing repo
os.chdir(WORKDIR)

if os.path.exists("specfem2d"):
    print("SPECFEM2D repository already found, you may skip this subsection")
elif os.path.exists(SPECFEM2D):
    print("Existing SPECFEM2D repository found, symlinking to working directory")
    os.symlink(SPECFEM2D, "./specfem2d")
else:
    print("Cloning repository from GitHub")
    ! git clone --recursive --branch devel https://github.com/geodynamics/specfem2d.git

SPECFEM2D repository already found, you may skip this subsection
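
The `!` prefix used for the clone step only works inside Jupyter/IPython. As a minimal sketch of the same step in a plain Python script, the clone command could be assembled as an argument list and passed to `subprocess`. The helper name `build_clone_command` and the `dest` argument are illustrative assumptions; the URL and the `--recursive --branch devel` flags are taken from the cell above.

```python
import subprocess


def build_clone_command(url, branch="devel", dest="specfem2d"):
    """Return the git command (as an argument list) that clones a repository.

    `build_clone_command` is a hypothetical helper written for this tutorial.
    """
    return ["git", "clone", "--recursive", "--branch", branch, url, dest]


cmd = build_clone_command("https://github.com/geodynamics/specfem2d.git")
print(" ".join(cmd))

# To actually clone (requires git and network access), uncomment:
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting issues, and `check=True` would raise an exception if the clone failed.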


In [4]:
# Configure SPECFEM2D to generate the Makefile
os.chdir(SPECFEM2D_ORIGINAL)
if not os.path.exists("./config.log"):
    os.system("./configure")

In [5]:
# Run make to generate SPECFEM2D binaries
if not os.path.exists("bin"):
    os.system("make all")

In [6]:
# Check out the binary files that have been created
os.chdir(SPECFEM2D_ORIGINAL)
! pwd
! ls bin/

/home/bchow/REPOSITORIES/specfem2d
xadj_seismogram		      xconvolve_source_timefunction  xspecfem2D
xcheck_quality_external_mesh  xmeshfem2D		     xsum_kernels
xcombine_sem		      xsmooth_sem
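
Note that `os.system` returns the exit status of the command, but the cells above do not check it, so a failed `./configure` or `make` could go unnoticed until the binaries turn out to be missing. A minimal sketch of a stricter alternative, using the standard-library `subprocess.run` with `check=True`, is shown below. The helper name `run_checked` is an assumption made for illustration, and the demonstration runs a harmless Python one-liner rather than the actual build.

```python
import subprocess
import sys


def run_checked(cmd, cwd=None):
    """Run a command and return its stdout.

    Raises subprocess.CalledProcessError if the exit status is non-zero.
    `run_checked` is a hypothetical helper, not part of SeisFlows3.
    """
    result = subprocess.run(cmd, cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


# Demonstration with a harmless command. For SPECFEM2D one might instead call
# run_checked(["./configure"], cwd=SPECFEM2D_ORIGINAL) followed by
# run_checked(["make", "all"], cwd=SPECFEM2D_ORIGINAL).
out = run_checked([sys.executable, "-c", "print('ok')"])
print(out.strip())
```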


### 1b. Create a separate SPECFEM2D working directory

Next we'll create a new SPECFEM2D working directory, separate from the original repository. The intent here is to isolate the original SPECFEM2D repository from our working state, to protect it from things like accidental file deletions or manipulations. This is not a mandatory step for using SeisFlows3, but it helps keep file structure clean in the long run, and is the SeisFlows3 dev team's preferred method of using SPECFEM. 

>__NOTE:__ All that SPECFEM2D/3D/3D_GLOBE needs to run successfully are the __bin/__, __DATA/__, and __OUTPUT_FILES/__ directories. Nothing else in the repository is required to run the binaries.
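
A quick way to confirm that a working directory contains these three directories is sketched below. The helper `missing_specfem_dirs` is a hypothetical name introduced here, and the example runs against a temporary directory rather than a real SPECFEM2D working directory.

```python
import os
import tempfile

# Directories named in the note above
REQUIRED_DIRS = ("bin", "DATA", "OUTPUT_FILES")


def missing_specfem_dirs(workdir):
    """Return the required sub-directories that are absent from `workdir`."""
    return [d for d in REQUIRED_DIRS
            if not os.path.isdir(os.path.join(workdir, d))]


# Demonstration on a throwaway directory that only has bin/ and DATA/
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "bin"))
    os.mkdir(os.path.join(tmp, "DATA"))
    missing = missing_specfem_dirs(tmp)
    print(missing)
```

In this tutorial one would call `missing_specfem_dirs(SPECFEM2D_WORKDIR)` once the working directory has been populated.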

In this tutorial we will be using the [Tape2007 example problem](https://github.com/geodynamics/specfem2d/tree/devel/EXAMPLES/Tape2007) to define our __DATA/__ directory (last tested 3/9/22, cf893667).

In [30]:
# In case we've run this docs page before, delete the working directory before remaking
if os.path.exists(SPECFEM2D_WORKDIR):
    shutil.rmtree(SPECFEM2D_WORKDIR)

os.mkdir(SPECFEM2D_WORKDIR)
os.chdir(SPECFEM2D_WORKDIR)

# Copy the binary files in case we update the source code. These can also be symlinked.
shutil.copytree(SPECFEM2D_BIN_ORIGINAL, "bin")

# Copy the DATA/ directory because we will be making edits here frequently and it's useful to
# retain the original files for reference. We will be running one of the example problems: Tape2007
shutil.copytree(os.path.join(TAPE_2007_EXAMPLE, "DATA"), "DATA")

! pwd
! ls

/home/bchow/Work/work/sf3_specfem2d_example/specfem2d_workdir
bin  DATA


In [10]:
# Run the Tape2007 example to make sure SPECFEM2D is working as expected
os.chdir(TAPE_2007_EXAMPLE)
! ./run_this_example.sh > output_log.txt

assert os.path.exists("OUTPUT_FILES/AA.S000000.BXY.semd"), \
    "Example did not run; the remainder of this doc is unlikely to work"

! tail output_log.txt

 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------
 D a t e : 10 - 03 - 2022                                 T i m e  : 14:36:50
 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------

see results in directory: OUTPUT_FILES/

done
Thu Mar 10 14:36:50 AKST 2022


------------------------------------
Now we need to manually set up our SPECFEM2D working directory. As mentioned in the previous cell, the only required elements of this working directory are the following (these files will form the basis for how SeisFlows3 operates within the SPECFEM2D framework):

1. __bin/__ directory containing SPECFEM2D binaries
2. __DATA/__ directory containing SOURCE and STATION files, as well as a SPECFEM2D Par_file
3. __OUTPUT_FILES/proc??????_*.bin__ files which define the starting (and target) models
 

>__NOTE:__ This file structure is the same for all versions of SPECFEM (2D/3D/3D_GLOBE)
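
The `proc??????_*.bin` pattern in item 3 is a shell-style wildcard, so Python's `glob` module can be used to check which model files are present. The sketch below assumes a hypothetical helper `find_model_files` and uses placeholder file names in a temporary directory. The actual file names depend on what the solver writes.

```python
import glob
import os
import tempfile


def find_model_files(model_dir):
    """Return sorted base names of files matching proc??????_*.bin."""
    pattern = os.path.join(model_dir, "proc??????_*.bin")
    return sorted(os.path.basename(f) for f in glob.glob(pattern))


# Demonstration with placeholder files (names are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("proc000000_vp.bin", "proc000000_vs.bin", "solver_log.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = find_model_files(tmp)
    print(found)
```

An empty list from a model directory would suggest that the solver did not write a binary model there.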

In [31]:
# First we will set the correct SOURCE and STATION files.
# This is the same task as shown in ./run_this_example.sh
os.chdir(SPECFEM2D_DATA)

# Symlink source 001 as our main source
if os.path.exists("SOURCE"):
    os.remove("SOURCE")
os.symlink("SOURCE_001", "SOURCE")

# Copy the correct Par_file so that edits do not affect the original file
if os.path.exists("Par_file"):
    os.remove("Par_file")
shutil.copy("Par_file_Tape2007_onerec", "Par_file")

! ls

interfaces_Tape2007.dat		     SOURCE_003  SOURCE_012  SOURCE_021
model_velocity.dat_checker	     SOURCE_004  SOURCE_013  SOURCE_022
Par_file			     SOURCE_005  SOURCE_014  SOURCE_023
Par_file_Tape2007_132rec_checker     SOURCE_006  SOURCE_015  SOURCE_024
Par_file_Tape2007_onerec	     SOURCE_007  SOURCE_016  SOURCE_025
proc000000_model_velocity.dat_input  SOURCE_008  SOURCE_017  STATIONS
SOURCE				     SOURCE_009  SOURCE_018  STATIONS_checker
SOURCE_001			     SOURCE_010  SOURCE_019
SOURCE_002			     SOURCE_011  SOURCE_020


### 1c. Generate initial and target models

Since we're doing a synthetic-synthetic inversion, we need to manually set up the velocity models with which we generate our synthetic waveforms. The naming conventions for these models are:

1. __MODEL_INIT:__ The initial or starting model. Used to generate the actual synthetic seismograms. This is considered M00.
2. __MODEL_TRUE:__ The target or true model. Used to generate 'data' (also synthetic). This is the reference model that our inversion is trying to resolve.

The starting model is defined as a homogeneous halfspace in the Tape2007 example problem. We will need to run both `xmeshfem2D` and `xspecfem2D` to generate the required velocity model database files. We will generate our target model by slightly perturbing the parameters of the initial model.

>__NOTE:__ We can use the SeisFlows3 command line option `seisflows sempar` to directly edit the SPECFEM2D Par_file in the command line. This will work for the SPECFEM3D Par_file as well.

In [32]:
os.chdir(SPECFEM2D_DATA)

# Ensure that SPECFEM2D outputs the velocity model in the expected binary format
! seisflows sempar setup_with_binary_database 1  # allow creation of .bin files
! seisflows sempar save_model binary  # output model in .bin database format
! seisflows sempar save_ascii_kernels .false.  # output kernels in .bin format, not ASCII


	setup_with_binary_database = 0 -> 1


	SAVE_MODEL = default -> binary


	save_ASCII_kernels = .true. -> .false.
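
Conceptually, each `seisflows sempar` call above rewrites the value on one `key = value` line of the Par_file. The sketch below illustrates that idea in pure Python. It is not the SeisFlows3 implementation: the function `set_par` is hypothetical, the sample text is a made-up excerpt in Par_file style, and case-insensitive key matching is an assumption based on the lower-case keys accepted in the cell above.

```python
import re


def set_par(text, key, value):
    """Replace the value of `key` in Par_file-style text ("KEY = value  # comment").

    Hypothetical helper for illustration; trailing comments are preserved.
    """
    pattern = re.compile(
        rf"^(\s*{re.escape(key)}\s*=\s*)(\S+)",
        re.IGNORECASE | re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in `value`
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value),
                                   text, count=1)
    if count == 0:
        raise KeyError(f"parameter '{key}' not found")
    return new_text


sample = (
    "# made-up excerpt in Par_file style\n"
    "SAVE_MODEL                      = default   # comment\n"
    "setup_with_binary_database      = 0\n"
)
edited = set_par(sample, "save_model", "binary")
print(edited)
```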



In [33]:
# SPECFEM requires that we create the OUTPUT_FILES directory before running
os.chdir(SPECFEM2D_WORKDIR)

if os.path.exists(SPECFEM2D_OUTPUT):
    shutil.rmtree(SPECFEM2D_OUTPUT)
    
os.mkdir(SPECFEM2D_OUTPUT)

! ls

bin  DATA  OUTPUT_FILES


In [34]:
# GENERATE MODEL_INIT
os.chdir(SPECFEM2D_WORKDIR)

# Run the mesher and solver to generate our initial model
! ./bin/xmeshfem2D > OUTPUT_FILES/mesher_log.txt
! ./bin/xspecfem2D > OUTPUT_FILES/solver_log.txt

# Move the model files (*.bin) into the OUTPUT_FILES directory, where SeisFlows3 expects them
! mv DATA/*bin OUTPUT_FILES

# Make sure we don't overwrite this initial model when creating our target model in the next step
! mv OUTPUT_FILES OUTPUT_FILES_INIT

! head OUTPUT_FILES_INIT/solver_log.txt
! tail OUTPUT_FILES_INIT/solver_log.txt


 **********************************************
 **** Specfem 2-D Solver - serial version  ****
 **********************************************

 Running Git version of the code corresponding to commit cf89366717d9435985ba852ef1d41a10cee97884
 dating From Date:   Mon Nov 29 23:20:51 2021 -0800


 NDIM =            2
 -------------------------------------------------------------------------------
 Program SPECFEM2D: 
 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------
 Tape-Liu-Tromp (GJI 2007)
 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------
 D a t e : 10 - 03 - 2022                                 T i m e  : 14:45:55
 -------------------------------------------------------------------------------
 --------------------------------------------------------------------

-----------------------------

Now we want to perturb the initial model to create our target model (__MODEL_TRUE__). The seisflows command line subargument `seisflows sempar velocity_model` will let us view and edit the velocity model. You can also do this manually by editing the Par_file directly. 

In [35]:
# GENERATE MODEL_TRUE
os.chdir(SPECFEM2D_DATA)

# Edit the Par_file by increasing velocities slightly (Vp by ~1.7%, Vs by ~1.4%)
! seisflows sempar velocity_model '1 1 2600.d0 5900.d0 3550.0d0 0 0 10.d0 10.d0 0 0 0 0 0 0'


1 1 2600.d0 5800.d0 3500.0d0 0 0 10.d0 10.d0 0 0 0 0 0 0

->

1 1 2600.d0 5900.d0 3550.0d0 0 0 10.d0 10.d0 0 0 0 0 0 0
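
To quantify the perturbation, the relative change in each velocity can be computed directly from the old and new values printed above (Vp 5800 to 5900, Vs 3500 to 3550). The helper name `percent_change` is introduced here only for illustration.

```python
def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


vp_change = percent_change(5800.0, 5900.0)  # P-wave velocity
vs_change = percent_change(3500.0, 3550.0)  # S-wave velocity
print(f"Vp: {vp_change:.2f}%  Vs: {vs_change:.2f}%")
```

Both perturbations are under 2%, and density (2600) is left unchanged.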


In [36]:
# Re-run the mesher and solver to generate our target velocity model
os.chdir(SPECFEM2D_WORKDIR)

# Make sure the ./OUTPUT_FILES directory exists since we moved the old one
if os.path.exists(SPECFEM2D_OUTPUT):
    shutil.rmtree(SPECFEM2D_OUTPUT)
os.mkdir(SPECFEM2D_OUTPUT)

# Run the binaries to generate MODEL_TRUE
! ./bin/xmeshfem2D > OUTPUT_FILES/mesher_log.txt
! ./bin/xspecfem2D > OUTPUT_FILES/solver_log.txt

# Move all the relevant files into OUTPUT_FILES 
! mv ./DATA/*bin OUTPUT_FILES
! mv OUTPUT_FILES OUTPUT_FILES_TRUE

! head OUTPUT_FILES_TRUE/solver_log.txt
! tail OUTPUT_FILES_TRUE/solver_log.txt


 **********************************************
 **** Specfem 2-D Solver - serial version  ****
 **********************************************

 Running Git version of the code corresponding to commit cf89366717d9435985ba852ef1d41a10cee97884
 dating From Date:   Mon Nov 29 23:20:51 2021 -0800


 NDIM =            2
 -------------------------------------------------------------------------------
 Program SPECFEM2D: 
 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------
 Tape-Liu-Tromp (GJI 2007)
 -------------------------------------------------------------------------------
 -------------------------------------------------------------------------------
 D a t e : 10 - 03 - 2022                                 T i m e  : 14:45:55
 -------------------------------------------------------------------------------
 --------------------------------------------------------------------

In [37]:
# Great, we have all the necessary SPECFEM files to run our SeisFlows3 inversion!
! ls

bin  DATA  OUTPUT_FILES_INIT  OUTPUT_FILES_TRUE


## 2. Initialize SeisFlows3 (SF3)
In this Section we will look at a SeisFlows3 working directory, parameter file, and working state.

### 2a. SF3 working directory and parameter file

As with SPECFEM, SeisFlows3 requires a parameter file (__parameters.yaml__) that controls how an automated workflow will proceed. Because SeisFlows3 is modular, there are a large number of potential parameters which may be present in an SF3 parameter file, as each sub-module may have its own set of unique parameters.

Unlike SPECFEM, which lists all available parameters and leaves it up to the User to determine which ones are relevant, SeisFlows3 dynamically builds its parameter file based on User inputs. In this subsection we will use the built-in SeisFlows3 command line tools to generate and populate the parameter file. 

In the previous section we saw the `sempar` command in action. We can use the `-h` (help) flag to list all available SeisFlows3 commands.

In [38]:
! seisflows -h

usage: seisflows [-h] [-w [WORKDIR]] [-p [PARAMETER_FILE]]
                 [--path_file [PATH_FILE]]
                 {setup,configure,init,submit,resume,restart,clean,par,sempar,check,print,convert,reset,inspect,debug,edit}
                 ...


                     SeisFlows3: Waveform Inversion Package                     


optional arguments:
  -h, --help            show this help message and exit
  -w [WORKDIR], --workdir [WORKDIR]
                        The SeisFlows working directory, default: cwd
  -p [PARAMETER_FILE], --parameter_file [PARAMETER_FILE]
                        Parameters file, default: 'parameters.yaml'
  --path_file [PATH_FILE]
                        Legacy path file, default: 'paths.py'

command:
  Available SeisFlows arguments and their intended usages

    setup               Setup working directory from scratch
    configure           Fill parameter file with defaults
    init                Initiate working environment
    subm

In [41]:
# The command 'setup' creates the 'parameters.yaml' file that controls all of SeisFlows3
os.chdir(WORKDIR)
! seisflows setup
! ls

parameters.yaml  specfem2d  specfem2d_workdir


In [25]:
# Let's have a look at this file, which has not yet been populated
! cat parameters.yaml

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
#                 Seisflows YAML Parameter File and Path Input
#
#  For NoneType, set variables to `None` or `null`. For infinity, set to `inf`
#
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
# These modules correspond to the structure of the source code, and determine
# SeisFlows' behavior at runtime. Check the source code directory for available 
# module names. Each module will require its own set of sub parameters. 
#
# To fill this parameter file with docstrings and default values, run:
#
# > seisflows configure
#
#                                    MODULES
#                                    -------
#
# WORKFLOW:    The method for running SeisFlows. Equivalent to main()
# SOLVER:      External numerical solver to use for waveform simulations.
# SYSTEM:      Computer architecture of the system being used to run SeisFlows
# OPTIMIZE:    Opt

In [26]:
# We can use the `seisflows print modules` command to list out the available options 
! seisflows print modules


SYSTEM
 * seisflows3
	base
	lsf_lg
	serial
	slurm_lg
 * seisflows3-super
	chinook_lg
	maui

PREPROCESS
 * seisflows3
	base
	default
	pyatoa
 * seisflows3-super
	pyatoa_nz

SOLVER
 * seisflows3
	base
	specfem2d
	specfem3d
	specfem3d_globe
 * seisflows3-super
	specfem3d_maui

POSTPROCESS
 * seisflows3
	base
 * seisflows3-super

OPTIMIZE
 * seisflows3
	LBFGS
	NLCG
	base
	steepest_descent
 * seisflows3-super

WORKFLOW
 * seisflows3
	base
	inversion
	migration
 * seisflows3-super
	thrifty_inversion
	thrifty_maui




In [42]:
# For this example, we can use most of the default modules, however we need to 
# change the SOLVER module to let SeisFlows3 know we're using SPECFEM2D (as opposed to 3D)
! seisflows par solver specfem2d
! cat parameters.yaml


	SOLVER: specfem3d -> specfem2d

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
#                 Seisflows YAML Parameter File and Path Input
#
#  For NoneType, set variables to `None` or `null`. For infinity, set to `inf`
#
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
# These modules correspond to the structure of the source code, and determine
# SeisFlows' behavior at runtime. Check the source code directory for available 
# module names. Each module will require its own set of sub parameters. 
#
# To fill this parameter file with docstrings and default values, run:
#
# > seisflows configure
#
#                                    MODULES
#                                    -------
#
# WORKFLOW:    The method for running SeisFlows. Equivalent to main()
# SOLVER:      External numerical solver to use for waveform simulations.
# SYSTEM:      Computer architecture of the system being used to run SeisFlows
# OPTI

-------------------------
The `seisflows configure` command populates the parameter file based on the chosen modules. SeisFlows3 will attempt to fill in all parameters with default values when possible, but values that the User __MUST__ set will be denoted by the value:

>__!!! REQUIRED PARAMETER !!!__

SeisFlows3 will not work until all of these required parameters are set by the User. Docstrings above each module show descriptions and available options for each of these parameters. In the following cell we will use the `seisflows par` command to edit the parameters.yaml file directly, replacing each of the required parameters with a chosen value. Comments next to each command describe the choice made.

In [43]:
! seisflows configure
! cat parameters.yaml

# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
#                 Seisflows YAML Parameter File and Path Input
#
#  For NoneType, set variables to `None` or `null`. For infinity, set to `inf`
#
# !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
#
# These modules correspond to the structure of the source code, and determine
# SeisFlows' behavior at runtime. Check the source code directory for available 
# module names. Each module will require its own set of sub parameters. 
#
# To fill this parameter file with docstrings and default values, run:
#
# > seisflows configure
#
#                                    MODULES
#                                    -------
#
# WORKFLOW:    The method for running SeisFlows. Equivalent to main()
# SOLVER:      External numerical solver to use for waveform simulations.
# SYSTEM:      Computer architecture of the system being used to run SeisFlows
# OPTIMIZE:    Opt

In [44]:
# EDIT THE SEISFLOWS3 PARAMETER FILE
! seisflows par walltime 10  # master job run time in minutes
! seisflows par tasktime 1  # individual job run time in minutes
! seisflows par materials elastic  # how the velocity model is parameterized
! seisflows par density constant  # update density or keep constant
! seisflows par nt 5000  # set by SPECFEM2D Par_file
! seisflows par dt .06  # set by SPECFEM2D Par_file
! seisflows par f0 0.084  # set by SOURCE file
! seisflows par format ascii  # how to output synthetic seismograms
! seisflows par begin 1  # first iteration
! seisflows par end 1  # final iteration -- we will only run 1
! seisflows par case synthetic  # synthetic-synthetic means we need both INIT and TRUE models

# Use Python syntax here to access path constants
os.system(f"seisflows par specfem_bin {SPECFEM2D_BIN}")  # set path to SPECFEM2D binaries
os.system(f"seisflows par specfem_data {SPECFEM2D_DATA}")  # set path to SPECFEM2D DATA/
os.system(f"seisflows par model_init {SPECFEM2D_MODEL_INIT}")  # set path to INIT model
os.system(f"seisflows par model_true {SPECFEM2D_MODEL_TRUE}")  # set path to TRUE model


	WALLTIME: !!! REQUIRED PARAMETER !!! -> 10


	TASKTIME: !!! REQUIRED PARAMETER !!! -> 1


	MATERIALS: !!! REQUIRED PARAMETER !!! -> elastic


	DENSITY: !!! REQUIRED PARAMETER !!! -> constant


	NT: !!! REQUIRED PARAMETER !!! -> 5000


	DT: !!! REQUIRED PARAMETER !!! -> .06


	F0: !!! REQUIRED PARAMETER !!! -> 0.084


	FORMAT: !!! REQUIRED PARAMETER !!! -> ascii


	BEGIN: !!! REQUIRED PARAMETER !!! -> 1


	END: !!! REQUIRED PARAMETER !!! -> 1


	CASE: !!! REQUIRED PARAMETER !!! -> synthetic



0
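
Before moving on, it can be useful to confirm that no required parameters were left unset. The sketch below scans parameter-file text for the `!!! REQUIRED PARAMETER !!!` flag. It assumes the flag appears as the value on a `KEY: value` line, as the output above suggests. The function `unset_required` and the sample text are hypothetical and not part of SeisFlows3.

```python
REQUIRED_FLAG = "!!! REQUIRED PARAMETER !!!"


def unset_required(text):
    """Return the names of parameters whose value is still the required flag."""
    names = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip comment/docstring lines and anything that is not "KEY: value"
        if stripped.startswith("#") or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        if REQUIRED_FLAG in value:
            names.append(key.strip())
    return names


# Made-up excerpt imitating a configured parameters.yaml
sample = """\
# WALLTIME (float): maximum job time in minutes
WALLTIME: !!! REQUIRED PARAMETER !!!
TASKTIME: 1
NT: !!! REQUIRED PARAMETER !!!
"""
print(unset_required(sample))
```

To check the real file, one could pass `open("parameters.yaml").read()` to the same function.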

------------------------------
One last thing: we need to edit the SPECFEM2D Par_file parameter `MODEL` so that `xmeshfem2D` reads our pre-built velocity models (\*.bin files) rather than the meshing parameters defined in the Par_file.

In [45]:
os.chdir(SPECFEM2D_DATA)
! seisflows sempar model gll


	MODEL = default -> gll



### 2b. Initialize SF3 working state

The SeisFlows3 command `seisflows init` will generate a SeisFlows3 working state without submitting any jobs to the system. This is useful for testing whether the User has set an acceptable parameter file and whether SeisFlows3 is working as expected. 

The result of running `seisflows init` is a collection of pickle (\*.p) and JSON files which define the active Python environment. SeisFlows3 relies directly on these files to determine where it is in a workflow. Throughout an active workflow, SeisFlows3 will checkpoint itself to these pickle and JSON files such that if a workflow finishes or crashes, the User can resume a workflow from the last checkpointed state rather than needing to restart the workflow.
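
The checkpointing idea can be illustrated with a generic sketch: write the workflow state to a pickle file and the parameters to a JSON file, then reload both to resume. This is not the SeisFlows3 implementation. The file names, the `checkpoint` and `load_checkpoint` helpers, and the contents of the state dictionary are all assumptions made for illustration.

```python
import json
import os
import pickle
import tempfile


def checkpoint(state, parameters, outdir):
    """Save a state object (pickle) and a parameter dict (JSON) to `outdir`."""
    with open(os.path.join(outdir, "state.p"), "wb") as f:
        pickle.dump(state, f)
    with open(os.path.join(outdir, "parameters.json"), "w") as f:
        json.dump(parameters, f, indent=4, sort_keys=True)


def load_checkpoint(outdir):
    """Reload the state and parameters written by `checkpoint`."""
    with open(os.path.join(outdir, "state.p"), "rb") as f:
        state = pickle.load(f)
    with open(os.path.join(outdir, "parameters.json")) as f:
        parameters = json.load(f)
    return state, parameters


# Hypothetical state after the first stage of a workflow has finished
state = {"iteration": 1, "completed": ["initialize"]}
parameters = {"BEGIN": 1, "END": 1, "CASE": "synthetic"}

with tempfile.TemporaryDirectory() as tmp:
    checkpoint(state, parameters, tmp)
    restored_state, restored_pars = load_checkpoint(tmp)

print(restored_state, restored_pars)
```

JSON suits plain parameter values and stays human-readable, while pickle can store arbitrary Python objects such as module instances.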

>__DEBUG MODE:__ After running `seisflows init` you can explore the SeisFlows3 working state in an interactive IPython environment by running `seisflows debug`. This will open an IPython environment in which the active working state is loaded and accessible. The debug mode is invaluable for exploring the SeisFlows3 working state, debugging errors, and performing manual manipulations to an otherwise automated tool. You can try it yourself by running debug mode and typing 'preprocess' to access the active preprocess module.

In [47]:
os.chdir(WORKDIR)
! seisflows init
! ls output

seisflows_optimize.p	   seisflows_postprocess.p  seisflows_system.p
seisflows_parameters.json  seisflows_preprocess.p   seisflows_workflow.p
seisflows_paths.json	   seisflows_solver.p


In [7]:
# All of the parameters defined in parameters.yaml are saved in this 
# internally-used JSON file
! head output/seisflows_parameters.json

{
    "BACKPROJECT": null,
    "BEGIN": 1,
    "CASE": "synthetic",
    "COMPONENTS": "ZNE",
    "DENSITY": "constant",
    "DT": 0.06,
    "END": 1,
    "F0": 5.196,
    "FILTER": null,


In [8]:
# Similarly, paths that SeisFlows3 uses to navigate the system are stored
# in the seisflows_paths.json file
! head output/seisflows_paths.json

{
    "DATA": null,
    "FUNC": "/home/bchow/Work/work/sf3_specfem2d_example/scratch/evalfunc",
    "GRAD": "/home/bchow/Work/work/sf3_specfem2d_example/scratch/evalgrad",
    "HESS": "/home/bchow/Work/work/sf3_specfem2d_example/scratch/evalhess",
    "LOCAL": null,
    "LOG": "/home/bchow/Work/work/sf3_specfem2d_example/output_seisflows3.txt",
    "MASK": null,
    "MODEL_INIT": "/home/bchow/Work/work/sf3_specfem2d_example/specfem2d_workdir/OUTPUT_FILES_INIT",
    "MODEL_TRUE": "/home/bchow/Work/work/sf3_specfem2d_example/specfem2d_workdir/OUTPUT_FILES_TRUE",


## 3. Run SeisFlows3

In this Section we will run SeisFlows3 to generate synthetic seismograms, kernels, a gradient, and an updated velocity model.

### 3a. Forward simulations

SeisFlows3 is an automated workflow tool, such that once we run `seisflows submit` we should not need to intervene in the workflow. However the package does allow the User flexibility in how they want the workflow to behave.

For example, we can run our workflow in stages by taking advantage of the `stop_after` and `resume_from` parameters. As their names suggest, these parameters allow us to stop and resume the workflow at certain stages (i.e., functions in workflow.main()). 

The available arguments for `stop_after` and `resume_from` are discovered by running the command: `seisflows print flow`, which tells us what functions will be run from main(). 

In [49]:
! seisflows print flow


	FLOW ARGUMENTS
	<class 'seisflows3.workflow.inversion.Inversion'>

	1: initialize
	2: evaluate_gradient
	3: write_gradient
	4: compute_direction
	5: line_search
	6: finalize
	7: clean
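
The effect of `stop_after` and `resume_from` can be pictured as slicing this list of stages. The sketch below is an illustration only: the function `select_stages` is hypothetical, and it assumes that both the `resume_from` stage and the `stop_after` stage are themselves executed, which the SeisFlows3 documentation should be consulted to confirm.

```python
# Stage names as printed by `seisflows print flow` above
FLOW = ["initialize", "evaluate_gradient", "write_gradient",
        "compute_direction", "line_search", "finalize", "clean"]


def select_stages(flow, resume_from=None, stop_after=None):
    """Return the stages that would run (assumed inclusive on both ends)."""
    start = flow.index(resume_from) if resume_from else 0
    stop = flow.index(stop_after) + 1 if stop_after else len(flow)
    return flow[start:stop]


first_run = select_stages(FLOW, stop_after="initialize")
second_run = select_stages(FLOW, resume_from="evaluate_gradient",
                           stop_after="write_gradient")
print(first_run)
print(second_run)
```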



-----------------------------
In an inversion (the workflow we have selected) the flow arguments are described as:

0. __setup:__ Not technically listed in the flow arguments, runs setup() for all SeisFlows3 modules. If running a synthetic-synthetic workflow, solver.setup() will generate "data" by running the forward solver using MODEL_TRUE
1. __initialize:__  
    a. Call numerical solver to run forward simulations using MODEL_INIT, generating synthetics  
    b. Evaluate the objective function by performing waveform comparisons  
    c. Prepare `evaluate_gradient` step by generating adjoint sources and auxiliary files
2. __evaluate_gradient:__ Call numerical solver to run adjoint simulation, generating kernels
3. __write_gradient:__ Combine all event kernels into a misfit kernel. Optionally smooth and mask the misfit kernel
4. __compute_direction:__ Call on the optimization library to scale the misfit kernel into the gradient and compute a search direction
5. __line_search:__ Perform a line search by algorithmically scaling the gradient and evaluating the misfit function (forward simulations and misfit quantification) until misfit is acceptably reduced
6. __finalize:__ Run any finalization steps such as saving traces, kernels, gradients and models to disk, setting up SeisFlows3 for any subsequent iterations.
7. __clean:__ Clean the scratch/ directory in preparation for subsequent iterations.
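
To make the line search step more concrete, the sketch below applies a simple backtracking scheme to a toy two-parameter problem: a trial step is taken along the search direction, the misfit is re-evaluated, and the step length is halved until the misfit decreases. This is a stand-in for illustration and not necessarily the algorithm SeisFlows3 uses. The quadratic `misfit`, the starting step length, and all function names are assumptions. In a real inversion each misfit evaluation requires forward simulations.

```python
# Toy "true" model (Vp, Vs), matching the target velocities used in this tutorial
TARGET = [5900.0, 3550.0]


def misfit(m):
    """Toy quadratic misfit standing in for simulation + waveform comparison."""
    return 0.5 * sum((mi - ti) ** 2 for mi, ti in zip(m, TARGET))


def gradient(m):
    """Gradient of the toy misfit with respect to the model."""
    return [mi - ti for mi, ti in zip(m, TARGET)]


def backtracking_line_search(m, direction, step=4.0, shrink=0.5, max_trials=10):
    """Shrink the step length until the trial model reduces the misfit."""
    f0 = misfit(m)
    for trial in range(1, max_trials + 1):
        m_try = [mi + step * di for mi, di in zip(m, direction)]
        if misfit(m_try) < f0:
            return m_try, step, trial
        step *= shrink
    raise RuntimeError("line search failed to reduce the misfit")


m0 = [5800.0, 3500.0]                      # starting model (Vp, Vs)
direction = [-g for g in gradient(m0)]     # steepest-descent direction
m1, alpha, ntrials = backtracking_line_search(m0, direction)
print(m1, alpha, ntrials)
```

The deliberately large starting step makes the first trials overshoot, so the loop shows several rejected steps before one is accepted.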

Let's set the `stop_after` argument to __initialize__, which will halt the workflow after the initialization step. We'll also set the `verbose` parameter to `False` to keep the logging format relatively simple. We will explore the `verbose: True` option in a later cell.

In [48]:
! seisflows par stop_after initialize
! seisflows par verbose False


	STOP_AFTER:  -> initialize


	VERBOSE: True -> False



-----------------------
Now let's run SeisFlows3. There are a few ways to do this: `submit`, `resume`, and `restart`

1. Since we already ran `seisflows init`, the `seisflows submit` option will not work, as SeisFlows3 considers this an active working state and `submit` can only be run on uninitialized working states.
2. To run a workflow in an active working state `resume` will load the current working state from the output/ directory and submit a workflow given the current parameter file.
3. The `restart` command is simply a convenience function that runs `clean` (to remove an active working state) and `submit` (to submit a fresh working state). 

Since we haven't done anything in this working state, we will go with a modified version of Option 3 by running `clean` and then `submit`. We'll use the `-f` flag (stands for __'force'__) to skip over the standard input prompt that asks the User if they are sure they want to clean and submit.

But first we'll try to run `seisflows submit` to show why Option 1 will not work.

In [11]:
! seisflows submit -f




To delete data and start a new workflow type:
  seisflows restart

To resume existing workflow type:
  seisflows resume




----------------------------
__Okay, let's go!__ In the following cell we will run the SeisFlows3 Inversion workflow. In the output cell we will see the logging statements output by SeisFlows3, both to stdout and to the output log file (defaults to ./output_seisflows3.txt), which detail the progress of our inversion.

In [49]:
! seisflows clean -f
! seisflows submit -f

2022-03-10 14:49:42 | check paths/pars module: preprocess.Default
2022-03-10 14:49:49 | 
                          STARTING INVERSION WORKFLOW                           

2022-03-10 14:49:49 | 
--------------------------------------------------------------------------------
                            PERFORMING MODULE SETUP                             
--------------------------------------------------------------------------------

2022-03-10 14:49:49 | setting up module: preprocess.Default
2022-03-10 14:49:49 | misfit function is: 'waveform'
Appending to this files without deleting them may lead to unintended consequences
2022-03-10 14:49:52 | model parameters (m_new i01s00):
2022-03-10 14:49:52 | 5800.00 <= vp <= 5800.00
2022-03-10 14:49:52 | 3500.00 <= vs <= 3500.00
2022-03-10 14:49:52 | 0.21 <= pr <= 0.21
2022-03-10 14:49:55 | checkpointing working environment to disk
2022-03-10 14:49:57 | intializing solver directories
2022-03-10 14:50:13 | intializing empty adjoint traces
2022-

### 3b. Exploring the SF3 directory structure
This is a good point to have a look around at the SeisFlows3 directory structure, which has been created during the module setup stage. In this subsection we will look at files and directories within an active SeisFlows3 working directory and explain what each file is and what its purpose is within this workflow. 

In [18]:
os.chdir(WORKDIR)
! ls -l

total 20
drwxrwxr-x. 1 bchow bchow    54 Mar  9 15:26 logs
drwxrwxr-x. 1 bchow bchow   372 Mar  9 15:26 output
-rw-rw-r--. 1 bchow bchow  3048 Mar  9 15:26 output_seisflows3.txt
-rw-rw-r--. 1 bchow bchow 10992 Mar  9 15:25 parameters.yaml
drwxrwxr-x. 1 bchow bchow    56 Mar  9 15:26 scratch
lrwxrwxrwx. 1 bchow bchow    35 Mar  2 12:12 specfem2d -> /home/bchow/REPOSITORIES/specfem2d/
drwxrwxr-x. 1 bchow bchow    82 Mar  9 11:27 specfem2d_workdir
drwxrwxr-x. 1 bchow bchow    44 Mar  9 15:20 stats



Directory structure:
- __logs/:__ Where any auxiliary logs are stored, e.g., submitted parameter files, output logs from individual cores (not applicable in this tutorial) 
    - __previous/:__ Old output logs (output_seisflows3.txt) so that they are not overwritten by other workflows
- __output/:__ The current active state of SeisFlows3, containing pickle and JSON files. Also stores any files that are to be saved permanently (e.g., models, kernels, traces).
- __scratch/:__ Active working directory of SeisFlows3; more detail is given in the following subsection.
- __stats/:__ Text files describing the optimization statistics of the current workflow
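
If we want to check a working directory against the list above from within Python, a small helper along these lines may be useful. This is only a sketch: `describe_workdir` is a hypothetical helper (not part of SeisFlows3), the descriptions are paraphrased from the list, and a temporary stand-in directory is used so that the example runs anywhere.

```python
import tempfile
from pathlib import Path

# Descriptions paraphrased from the list above
DESCRIPTIONS = {
    "logs": "auxiliary logs",
    "output": "active state and permanently saved files",
    "scratch": "active working directory",
    "stats": "optimization statistics",
}

def describe_workdir(workdir):
    """Return {name: description} for the known sub-directories that exist."""
    return {p.name: DESCRIPTIONS[p.name]
            for p in sorted(Path(workdir).iterdir())
            if p.is_dir() and p.name in DESCRIPTIONS}

# Stand-in working directory (a real one would be WORKDIR)
demo = Path(tempfile.mkdtemp())
for name in ("logs", "output", "scratch", "stats", "specfem2d_workdir"):
    (demo / name).mkdir()

layout = describe_workdir(demo)
for name, text in layout.items():
    print(f"{name + '/':<10}{text}")
```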

In [19]:
! ls logs

parameters_1-1.yaml  previous


In [20]:
! head output_seisflows3.txt

2022-03-09 15:26:00 | check paths/pars module: preprocess.Default
2022-03-09 15:26:03 | 
                          STARTING INVERSION WORKFLOW                           

2022-03-09 15:26:03 | 
--------------------------------------------------------------------------------
                            PERFORMING MODULE SETUP                             
--------------------------------------------------------------------------------


In [21]:
! ls stats

output.optim  step_count


In [15]:
! cat stats/output.optim

      ITER     STEPLEN      MISFIT  


-------------------------------
#### The SeisFlows3 scratch/ directory

This directory defines the SeisFlows3 working directory. It contains sub-directories defining individual processes and modules within a SeisFlows3 workflow.

In [25]:
! ls scratch

evalgrad  optimize  solver  system


__scratch/evalgrad/:__ Disk storage for files related to gradient evaluation

In [27]:
! ls scratch/evalgrad

model  residuals


In [29]:
# The current model used for gradient evaluation
! ls scratch/evalgrad/model

proc000000_vp.bin  proc000000_vs.bin


In [31]:
# Per-event text files which define the residual or misfit 
! ls scratch/evalgrad/residuals

001


In [33]:
# Each line in a residual file relates to a given source-receiver pair
! cat scratch/evalgrad/residuals/001

2.413801941841247842e-02
2.413801941841247842e-02
2.413801941841247842e-02


----------------
__scratch/optimize/:__ Values relating to the optimization algorithm. Variable names are described in the [base optimization module](https://github.com/bch0w/seisflows3/blob/master/seisflows3/optimize/base.py) and are copied here for reference:

__Optimization Variable Names__:  
- m_new: current model  
- m_old: previous model  
- m_try: line search model  
- f_new: current objective function value  
- f_old: previous objective function value  
- f_try: line search function value  
- g_new: current gradient direction  
- g_old: previous gradient direction  
- p_new: current search direction  
- p_old: previous search direction  

In [34]:
! ls scratch/optimize

f_new  LBFGS  m_new


In [35]:
! head scratch/optimize/f_new

1.747932e-03
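
The value of `f_new` can be related to the residuals file shown earlier. The sketch below assumes that the objective function is the sum of the squared residuals; this assumption is inferred from the numbers printed in this tutorial rather than taken from the SeisFlows3 source. With a single event, any per-event normalization would not be visible here.

```python
import numpy as np

# The three residual values printed from scratch/evalgrad/residuals/001 above
residuals = np.array([2.413801941841247842e-02] * 3)

# Assumption: misfit = sum of squared residuals over all source-receiver pairs
misfit = float(np.sum(residuals ** 2))
print(f"{misfit:e}")
```

For these three residuals the sum of squares formats to `1.747932e-03`, which matches the contents of `f_new`.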


In [42]:
# Internally, SeisFlows3 stores models as vectors in a .npy format (NumPy arrays)
m_new = np.load("scratch/optimize/m_new")
print(m_new[:10])

[5800. 5800. 5800. 5800. 5800. 5800. 5800. 5800. 5800. 5800.]
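
To connect the model vector with the per-parameter ranges logged during module setup, the following sketch splits a vector into one block per parameter and prints the range of each. It assumes that the parameters are stored back to back (vp, then vs) with equal lengths; the array length is made up, and an in-memory buffer stands in for the file on disk.

```python
import io
import numpy as np

# Stand-in for scratch/optimize/m_new: a vp block followed by a vs block.
# Values mirror the homogeneous starting model; 4 points per parameter is made up.
model = np.concatenate([np.full(4, 5800.0), np.full(4, 3500.0)])
buffer = io.BytesIO()
np.save(buffer, model)  # write .npy-formatted bytes
buffer.seek(0)

m_new = np.load(buffer)

# Assumption: parameters are concatenated in this order with equal lengths
parameters = ["vp", "vs"]
ranges = {}
for name, block in zip(parameters, np.split(m_new, len(parameters))):
    ranges[name] = (float(block.min()), float(block.max()))
    print(f"{block.min():.2f} <= {name} <= {block.max():.2f}")
```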


----------------------
__scratch/system__: Storage of any system-related files. Currently empty, but any system errors or system-wide log messages will be sent here.

--------------------------
__scratch/solver__: Solver-related files. Each event has its own directory, which is a copy of the SPECFEM working directory. SeisFlows3 runs the numerical solver by generating individual, embarrassingly parallel working directories for each event/process.

>__NOTE:__ The __mainsolver/__ directory is a symlink pointing to the first (alphabetical) source. This symlink is not strictly necessary (i.e., you can delete it and nothing will break in SeisFlows3), but because source names can vary wildly, it conveniently provides an easy access point for the main solver, which is typically used for non-parallel processes such as kernel summation (xcombine_sem) and gradient smoothing (xsmooth_sem).
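
The layout described in the note can be mimicked in a few lines of Python. This is a sketch of the idea only, using a temporary directory and made-up source names; it does not reproduce how SeisFlows3 creates the link.

```python
import os
import tempfile

# Stand-in for scratch/solver with hypothetical source names
solver_dir = os.path.join(tempfile.mkdtemp(), "solver")
sources = ["003", "001", "002"]
for name in sources:
    os.makedirs(os.path.join(solver_dir, name))

# Point mainsolver at the first source in alphabetical order
first = sorted(sources)[0]
link = os.path.join(solver_dir, "mainsolver")
os.symlink(first, link)  # relative link, resolved inside solver_dir

print(sorted(os.listdir(solver_dir)))
print(os.path.basename(os.path.realpath(link)))
```

Because the link is relative, the solver directory can be moved without breaking it.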

In [45]:
! ls scratch/solver

001  mainsolver


In [47]:
# We can see that each solver sub-directory is simply a SPECFEM2D working directory
! ls scratch/solver/001

bin  DATA  mesher.log  OUTPUT_FILES  solver.log  traces


In [49]:
# The traces/ directory contains all the waveforms that have been generated during the SeisFlows3 inversion.
# These include the observed waveforms (data, or obs), the synthetic waveforms (syn) and the adjoint sources (adj)
! ls scratch/solver/001/traces

adj  obs  syn


In [50]:
# We can take a look at the adjoint sources which were created by the preprocessing module
! ls scratch/solver/001/traces/adj

AA.S0001.BXY.adj


In [51]:
# The adjoint source is created in the same format as the synthetics (two-column ASCII) 
! head scratch/solver/001/traces/adj/AA.S0001.BXY.adj

  -48.0000000         0.0000000
  -47.9400000         0.0000000
  -47.8800000         0.0000000
  -47.8200000         0.0000000
  -47.7600000         0.0000000
  -47.7000000         0.0000000
  -47.6400000         0.0000000
  -47.5800000         0.0000000
  -47.5200000         0.0000000
  -47.4600000         0.0000000
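
Two-column ASCII files like this one can be read directly with NumPy. The sketch below parses the first few lines shown above from an in-memory string and recovers the start time and sampling interval; it assumes that the first column is time in seconds and the second is amplitude.

```python
import io
import numpy as np

# First lines of the adjoint source shown above (time, amplitude)
text = """\
  -48.0000000         0.0000000
  -47.9400000         0.0000000
  -47.8800000         0.0000000
  -47.8200000         0.0000000
"""

time, amplitude = np.loadtxt(io.StringIO(text), unpack=True)
dt = float(np.mean(np.diff(time)))  # sampling interval, assumed to be in seconds
max_amp = float(np.abs(amplitude).max())
print(f"t0 = {time[0]:.2f} s, dt = {dt:.2f} s, max |amplitude| = {max_amp:.1f}")
```

For a file on disk, the file path can be passed to `np.loadtxt` in place of the `StringIO` object.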


In [53]:
# We can also see that we have generated a STATIONS_ADJOINT file, which is required for 
# running the adjoint simulations (i.e., evaluate the gradient)
! head scratch/solver/001/DATA/STATIONS_ADJOINT

S0001    AA       180081.4100000       388768.7100000       0.0         0.0


### 3c. Adjoint simulations

Now that we have all the required files for running an adjoint simulation (\*.adj waveforms and STATIONS_ADJOINT file), we can continue with the SeisFlows3 Inversion workflow. There is no need to edit the Par_file; SeisFlows3 takes care of that under the hood. We simply need to tell the workflow (via the parameters.yaml file) to `resume_from` the correct function. We can have a look at these functions again:

In [54]:
! seisflows print flow


	FLOW ARGUMENTS
	<class 'seisflows3.workflow.inversion.Inversion'>

	1: initialize
	2: evaluate_gradient
	3: write_gradient
	4: compute_direction
	5: line_search
	6: finalize
	7: clean



In [50]:
# We'll stop just before the line search so that we can take a look at the files 
# generated during the middle tasks
! seisflows par resume_from evaluate_gradient
! seisflows par stop_after compute_direction


	RESUME_FROM:  -> evaluate_gradient


	STOP_AFTER: initialize -> compute_direction
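
Conceptually, `resume_from` and `stop_after` select a contiguous slice of the flow printed above. The following sketch illustrates that idea with a hypothetical `select_functions` helper; it is not the SeisFlows3 implementation.

```python
# Function names in the order printed by `seisflows print flow`
flow = ["initialize", "evaluate_gradient", "write_gradient",
        "compute_direction", "line_search", "finalize", "clean"]

def select_functions(flow, resume_from=None, stop_after=None):
    """Return the part of `flow` from `resume_from` through `stop_after`, inclusive."""
    start = flow.index(resume_from) if resume_from else 0
    stop = flow.index(stop_after) + 1 if stop_after else len(flow)
    if start >= stop:
        raise ValueError("resume_from must not come after stop_after")
    return flow[start:stop]

selected = select_functions(flow, "evaluate_gradient", "compute_direction")
print(selected)
```

With the two parameters set above, this selects the three functions from `evaluate_gradient` through `compute_direction`.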



In [51]:
# We can use the `seisflows resume` command to continue an active workflow
# again we use the '-f' flag to skip past the user-input stage.
! seisflows resume -f

2022-03-10 14:53:23 | 
             RESUME ITERATION 1 FROM FUNCTION: 'evaluate_gradient'              

2022-03-10 14:53:23 | 
--------------------------------------------------------------------------------
                                ITERATION 1 / 1                                 
--------------------------------------------------------------------------------

2022-03-10 14:53:23 | 
--------------------------------------------------------------------------------
                              EVALUATING GRADIENT                               
--------------------------------------------------------------------------------

2022-03-10 14:53:23 | evaluating gradient 1 times
2022-03-10 14:53:24 | checkpointing working environment to disk
2022-03-10 14:53:26 | running adjoint simulations
2022-03-10 14:53:40 | exporting kernels to /home/bchow/Work/work/sf3_specfem2d_example/scratch/evalgrad
2022-03-10 14:53:42 | 
----------------------------------------------------------------------

----------------
The functions __evaluate_gradient()__ through __compute_direction()__ have run adjoint simulations to generate event kernels and summed those kernels into the misfit kernel. Because we only have one event, our misfit kernel is identical to our event kernel. Because we did not specify any smoothing lengths (PAR.SMOOTH_H and PAR.SMOOTH_V), no smoothing of the gradient has occurred.

Using the L-BFGS optimization algorithm, SeisFlows3 has computed a search direction, which the line search will use to find a model that reduces the objective function.
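
The kernel summation step can be pictured as a point-by-point sum over events. The sketch below uses small made-up arrays in place of the SPECFEM binary files and a hypothetical `sum_kernels` helper; in the workflow itself the summation is done by SPECFEM's xcombine_sem.

```python
import numpy as np

def sum_kernels(event_kernels):
    """Sum per-event kernels (equal-length arrays) into one misfit kernel."""
    return np.sum(np.stack(list(event_kernels.values())), axis=0)

# Made-up kernel values for a single event named '001', as in this tutorial
single = {"001": np.array([0.0, -1.5e-11, 2.0e-11])}
misfit_kernel = sum_kernels(single)

# With a second (made-up) event, the kernels add point by point
double = {**single, "002": np.array([1.0e-11, 0.5e-11, -2.0e-11])}
summed = sum_kernels(double)
print(misfit_kernel, summed)
```

With one event the sum returns the event kernel unchanged, which is the situation in this tutorial.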

We can take a look at where SeisFlows3 has stored the information relating to kernel generation and the optimization computation.

In [28]:
# Gradient evaluation files are stored here. The kernels are stored separately from the gradient in case
# the user wants to manually manipulate them
! ls scratch/evalgrad

gradient  kernels  model  residuals


In [32]:
# SeisFlows3 stores all kernels and gradient information as SPECFEM binary (.bin) files
! ls scratch/evalgrad/gradient

proc000000_vp_kernel.bin  proc000000_vs_kernel.bin


In [34]:
# Kernels are stored on a per-event basis, and summed together (sum/). If smoothing was performed, 
# we would see both smoothed and unsmoothed versions of the misfit kernel
! ls scratch/evalgrad/kernels

001  sum


In [39]:
# We can see that some new values have been stored in preparation for the line search,
# including g_new (current gradient) and p_new (current search direction). These are also
# stored as vector NumPy arrays (.npy files)
! ls scratch/optimize

f_new  g_new  LBFGS  m_new  p_new


In [41]:
p_new = np.load("scratch/optimize/p_new")
print(p_new)

[-0.00000000e+00 -0.00000000e+00 -0.00000000e+00 ... -4.31527557e-11
 -3.65300012e-11 -5.90630062e-12]
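
A search direction becomes a trial model once it is scaled by a step length. The sketch below uses the general update rule of gradient-based optimization, `m_try = m_new + alpha * p_new`, which we assume here rather than quote from SeisFlows3. All values are made up, and we assume that the leading zero entries of `p_new` belong to vp.

```python
import numpy as np

# Made-up stand-ins: two vp points followed by two vs points
m_new = np.array([5800.0, 5800.0, 3500.0, 3500.0])
p_new = np.array([0.0, 0.0, -4.0e-11, 3.0e-11])  # zero entries leave vp untouched
alpha = 5.0e12                                    # made-up step length

# Assumed update rule: step from the current model along the search direction
m_try = m_new + alpha * p_new
print(m_try)
```

Entries of the search direction that are zero leave the corresponding model values unchanged, whatever the step length.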


--------------------
### 3d. Line search and model update

Let's finish off the inversion by running through the line search, which will generate new models using the gradient, evaluate the objective function by running forward simulations, and compare the evaluated objective function with the value obtained in __initialize__. A satisfactory reduction in the objective function will terminate the line search. We are using a bracketing line search here (CITE RYANS PAPER), which requires finding models which both increase and decrease the misfit with respect to the initial evaluation. Therefore it will likely take more than two trial steps to complete the line search.
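
To make the bracketing idea concrete, here is a toy one-dimensional sketch. It grows the step length while the misfit keeps decreasing, shrinks it if the first trial does not reduce the misfit, and stops once the lowest misfit sits between a smaller and a larger step. The function name, growth factor and toy misfit are all made up, and this simplified logic is not the SeisFlows3 line search.

```python
def bracketing_line_search(misfit, alpha_init, growth=2.0, max_trials=10):
    """Return (step, misfit, number of trial steps) once a minimum is bracketed."""
    trials = [(0.0, misfit(0.0))]  # step length 0 is the current model
    alpha = alpha_init
    for _ in range(max_trials):
        trials.append((alpha, misfit(alpha)))
        ordered = sorted(trials)
        values = [value for _, value in ordered]
        i_min = values.index(min(values))
        # Bracketed: the lowest misfit lies between a smaller and a larger step
        if 0 < i_min < len(ordered) - 1:
            return ordered[i_min][0], ordered[i_min][1], len(trials) - 1
        if i_min == 0:
            alpha = ordered[1][0] / growth   # no reduction yet, shrink the step
        else:
            alpha = ordered[-1][0] * growth  # still decreasing, grow the step
    raise RuntimeError("no bracket found within max_trials")

def toy_misfit(alpha):
    """Made-up misfit along the search direction, with its minimum at alpha = 1."""
    return (alpha - 1.0) ** 2

alpha_best, f_best, n_trials = bracketing_line_search(toy_misfit, alpha_init=0.05)
print(alpha_best, f_best, n_trials)
```

Each call to `misfit` stands in for a set of forward simulations, which is why the number of trial steps matters in practice.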

In [52]:
! seisflows par resume_from line_search  # resume from the line search 
! seisflows par stop_after finalize  # stop before clean() so that we can explore the directory


	RESUME_FROM: evaluate_gradient -> line_search


	STOP_AFTER: compute_direction -> finalize



In [53]:
! seisflows resume -f

2022-03-10 14:54:56 | 
                RESUME ITERATION 1 FROM FUNCTION: 'line_search'                 

2022-03-10 14:54:56 | 
--------------------------------------------------------------------------------
                                ITERATION 1 / 1                                 
--------------------------------------------------------------------------------

2022-03-10 14:54:56 | 
                        CONDUCTING LINE SEARCH (i01s00)                         

2022-03-10 14:54:56 | evaluating bracketing line search
2022-03-10 14:54:56 | step length(s) = 0.00E+00
2022-03-10 14:54:56 | misfit val(s)  = 1.75E-03
2022-03-10 14:54:56 | first iteration, guessing trial step
2022-03-10 14:54:56 | initial step length safegaurd, setting manual step length
2022-03-10 14:54:56 | step length override due to PAR.STEPLENINIT=0.05
2022-03-10 14:54:56 | model parameters (m_try i01s00):
2022-03-10 14:54:56 | 5800.00 <= vp <= 5800.00
2022-03-10 14:54:56 | 3269.01 <= vs <= 3790.00
2022-03-10 1

From the log statements above, we can see that the SeisFlows3 line search required three trial steps, where it modified values of Vs until satisfactory reduction in the objective function was met. This was the final step in the iteration, and so the finalization step was also run, which ran any last-minute functions to prepare for a subsequent iteration. 

In [59]:
# We can see that we have 'new' and 'old' values for each of the optimization values,
# representing the previous model (M00) and the current model (M01).
! ls scratch/optimize

alpha  f_new  f_old  f_try  g_old  LBFGS  m_new  m_old	p_old


In [62]:
# The stats/ directory contains text files describing the optimization/line search
! ls stats

factor		  gradient_norm_L2  output.optim  slope       step_length
gradient_norm_L1  misfit	    restarted	  step_count  theta


In [64]:
# For example we can look at the step length chosen for the accepted trial step in the line search
! cat stats/step_length

6.509678e+09


## 4. Conclusions

We've now seen how SeisFlows3 runs an __Inversion__ workflow using the __Specfem2D__ solver on a __serial__ system (local workstation). More or less, this is all you need to run SeisFlows3 with any combination of modules. The specifics of a system or numerical solver are already handled internally by SeisFlows3, so if you want to use Specfem3D_Cartesian as your solver, you would only need to run `seisflows par solver specfem3d` at the beginning of your workflow (you will also need to set up your Specfem3D models, similar to what we did for Specfem2D here). To run on a Slurm system like Chinook, you can run `seisflows par system chinook`.