# Notebook S2

This notebook walks through running both the template matching and EQT branches on a local server. Note that these steps require a working installation of MPI and its Python wrapper, mpi4py.
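
Before starting, it can help to confirm that both prerequisites are visible from the notebook's environment. The following is a minimal sketch, not part of the seismicloud scripts: the helper name `check_mpi_prerequisites` is hypothetical, and it assumes the MPI launcher is installed under the name `mpirun` or `mpiexec`.

```python
import importlib.util
import shutil


def check_mpi_prerequisites():
    """Report whether an MPI launcher and the mpi4py package can be found."""
    # shutil.which returns the full path of an executable on PATH, or None.
    launcher = shutil.which("mpirun") or shutil.which("mpiexec")
    # find_spec returns None when the package cannot be imported.
    has_mpi4py = importlib.util.find_spec("mpi4py") is not None
    return {"launcher": launcher, "mpi4py": has_mpi4py}


status = check_mpi_prerequisites()
print(status)
```

If `launcher` is `None` or `mpi4py` is `False`, the MPI step later in this notebook will likely fail in this environment.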

In [1]:
import obspy
import json
import sys
import os
import pandas as pd

### Create job list

In [2]:
# Build the command to create the job list

# Path to json config file:
config_path = "/Users/zoekrauss/seismicloud/configs/config_sample.json"
# Path to the python environment which has everything you need:
interpreter_path = sys.executable
# Path to the script that creates the job list:
script_path = '/Users/zoekrauss/seismicloud/scripts/template_matching/create_joblist.py'
# Number of CPUs you want to parallelize across:
nproc = 10 

command = ' '.join([interpreter_path,script_path,'--config',config_path,'--nproc',str(nproc)])

In [None]:
# Execute!

os.system(command)
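
`os.system` returns only an exit status, so a failed run can be easy to miss in a notebook. As an alternative, here is a minimal sketch using `subprocess.run`, which also captures the script's output. The helper `run_command` and the stand-in command `demo_argv` are hypothetical and exist only so that the example runs anywhere; in practice the argument list would hold the interpreter, script path, and flags built above.

```python
import subprocess
import sys


def run_command(argv):
    """Run a command given as a list of arguments and return the completed process."""
    # capture_output=True collects stdout and stderr instead of printing them;
    # text=True decodes them to str.
    return subprocess.run(argv, capture_output=True, text=True)


# Stand-in command (an assumption for illustration): the current interpreter
# printing a short message.
demo_argv = [sys.executable, "-c", "print('job list created')"]
result = run_command(demo_argv)

if result.returncode != 0:
    print("Command failed:", result.stderr)
else:
    print(result.stdout.strip())
```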

#### We see that the created job list distributes every day of detection across the number of CPUs we specified with nproc. The number of days to run is determined by how many days are in the data directory specified in the config file.

In [4]:
# Let's read in and look at the joblist:
job_path = '/Users/zoekrauss/seismicloud/jobs/NV_2017_templatematching_joblist.csv'
df = pd.read_csv(job_path)
df

Unnamed: 0,network,year,doy,fpath,rank
0,NV,2017,152,/fd1/yiyu_data/Endeavour/data//NV/2017/152,0
1,NV,2017,153,/fd1/yiyu_data/Endeavour/data//NV/2017/153,0
2,NV,2017,154,/fd1/yiyu_data/Endeavour/data//NV/2017/154,0
3,NV,2017,155,/fd1/yiyu_data/Endeavour/data//NV/2017/155,1
4,NV,2017,156,/fd1/yiyu_data/Endeavour/data//NV/2017/156,1
5,NV,2017,157,/fd1/yiyu_data/Endeavour/data//NV/2017/157,1
6,NV,2017,158,/fd1/yiyu_data/Endeavour/data//NV/2017/158,2
7,NV,2017,159,/fd1/yiyu_data/Endeavour/data//NV/2017/159,2
8,NV,2017,160,/fd1/yiyu_data/Endeavour/data//NV/2017/160,2
9,NV,2017,161,/fd1/yiyu_data/Endeavour/data//NV/2017/161,3
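
In the rows shown above, the `rank` column increases in contiguous blocks of three days. We have not confirmed how `create_joblist.py` assigns ranks; the sketch below is one simple scheme that reproduces the displayed pattern, under the assumption that the job list covers 30 days (days of year 152 to 181) split across 10 ranks. The function `assign_ranks` is hypothetical.

```python
import math


def assign_ranks(days, nproc):
    """Assign each day to a rank in contiguous blocks of equal size."""
    # Days per rank, rounded up so that every day gets a rank.
    block = math.ceil(len(days) / nproc)
    return [(day, index // block) for index, day in enumerate(days)]


# Assumed input: days of year 152 to 181 (30 days), split across 10 ranks.
assignments = assign_ranks(list(range(152, 182)), nproc=10)
for day, rank in assignments[:10]:
    print(day, rank)
```

Under these assumptions, the first ten assignments match the `doy` and `rank` columns displayed above.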


### With the job list created, we can now run the detection in parallel using MPI

In [11]:
# Build the command

network = 'NV'
year = 2017

script_path = '/Users/Zoe/seismicloud/scripts/template_matching/distributed_detection.py'

command = ' '.join(['mpirun','-np',str(nproc),interpreter_path,script_path,'--config',config_path,'-n',network,'-y',str(year)])

In [12]:
os.system(command)

'mpirun -np 10 /opt/anaconda3/envs/alaska-ml/bin/python /Users/Zoe/seismicloud/scripts/template_matching/distributed_detection.py --config /Users/Zoe/seismicloud/configs/config_sample.json -n NV -y 2017'
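
Joining arguments with spaces works as long as no path contains a space or other shell-special character. A slightly more defensive sketch keeps the command as a list and uses `shlex.join` (Python 3.8+) to render it for inspection. The paths `demo_script` and `demo_config` are hypothetical placeholders, not the real locations.

```python
import shlex
import sys

# Hypothetical placeholder paths; substitute the real script and config paths.
demo_script = "/path/to/distributed_detection.py"
demo_config = "/path/to/config_sample.json"
demo_nproc = 10

# One list element per argument, so no manual quoting is needed.
argv = ["mpirun", "-np", str(demo_nproc), sys.executable, demo_script,
        "--config", demo_config, "-n", "NV", "-y", str(2017)]

# Render the list as a shell-safe string to check it before running.
print(shlex.join(argv))
```

The same list can be passed directly to `subprocess.run(argv)`, which avoids going through the shell at all.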

#### As the process runs, it writes its progress to log files in the /logs folder you specified. We can look at them here:

In [8]:
# Look at the master log

master_log = '/Users/zoekrauss/seismicloud/logs/master.log'
with open(master_log,'r') as f:
    file_contents = f.read()
    print(file_contents)

Submission time:    PDT 10/18/22 17:02:55
Verbose:            0
-----------------------------------
master | 	submit NV.167.2017 to C5 	| PDT 10/18/22 17:02:55
master | 	submit NV.155.2017 to C1 	| PDT 10/18/22 17:02:55
master | 	submit NV.173.2017 to C7 	| PDT 10/18/22 17:02:55
master | 	submit NV.161.2017 to C3 	| PDT 10/18/22 17:02:55
master | 	submit NV.152.2017 to C0 	| PDT 10/18/22 17:02:55
master | 	submit NV.158.2017 to C2 	| PDT 10/18/22 17:02:55
master | 	submit NV.170.2017 to C6 	| PDT 10/18/22 17:02:55
master | 	submit NV.164.2017 to C4 	| PDT 10/18/22 17:02:55
master | 	submit NV.176.2017 to C8 	| PDT 10/18/22 17:02:55
master | 	submit NV.171.2017 to C6 	| PDT 10/18/22 17:03:02
master | 	submit NV.159.2017 to C2 	| PDT 10/18/22 17:03:05
master | 	submit NV.165.2017 to C4 	| PDT 10/18/22 17:03:06
master | 	submit NV.174.2017 to C7 	| PDT 10/18/22 17:03:06
master | 	submit NV.172.2017 to C6 	| PDT 10/18/22 17:03:06
master | 	submit NV.168.2017 to C5 	| PDT 10/18/22 17:03:07


#### We see that the master log above records when each day of detection was submitted and which CPU it was submitted to. If we're interested in the activity on a specific CPU, we can look at that CPU's own log:

In [19]:
# Look at the log for a single CPU (this example opens 9.log)

cpu_log = '/Users/zoekrauss/seismicloud/logs/9.log'
with open(cpu_log,'r') as f:
    file_contents = f.read()
    print(file_contents)

--------------NV.NCHR.2017-----------------
9 | 	master PID 1604743
9 | 	PID 1605343
9 | 	loaded model (ORIGINAL) to GPU:1
9 | 	total 1 days of data
9 | 	2017.181.NV.NCHR 	| Finish, found 33 picks 	 | 12.593 sec
--------------NV.NCHR.2017-----------------



#### We see that the CPU-specific log records progress messages for each day of detection the CPU runs. Any errors it encounters are recorded here. If a detection finishes successfully, the log reports the number of picks or detections found, along with the time the detection took to run.
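
To summarize a CPU-specific log, the "Finish" lines can be extracted with a regular expression. This is a minimal sketch that assumes the line layout shown in the output above and covers only the "picks" wording; lines reporting detections may be worded differently and would need their own pattern. The function `parse_finished` is hypothetical.

```python
import re

# Matches lines such as:
#   9 | <tab>2017.181.NV.NCHR <tab>| Finish, found 33 picks <tab> | 12.593 sec
FINISH_RE = re.compile(
    r"\|\s*(\S+)\s*\|\s*Finish, found (\d+) picks\s*\|\s*([\d.]+) sec"
)


def parse_finished(log_text):
    """Return (label, n_picks, seconds) tuples for every finished day in the log text."""
    rows = []
    for line in log_text.splitlines():
        match = FINISH_RE.search(line)
        if match:
            label, n_picks, seconds = match.groups()
            rows.append((label, int(n_picks), float(seconds)))
    return rows


# Sample text copied from the log output above.
sample_log = (
    "--------------NV.NCHR.2017-----------------\n"
    "9 | \tmaster PID 1604743\n"
    "9 | \ttotal 1 days of data\n"
    "9 | \t2017.181.NV.NCHR \t| Finish, found 33 picks \t | 12.593 sec\n"
)
finished = parse_finished(sample_log)
print(finished)
```

Days that appear in the job list but produce no "Finish" line are the ones worth checking for errors in the same log.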